Topic on API talk:Querypage

[Closed] Odd duplicate entries returned. (potential missing data-returned issue)

MvGulik (talkcontribs)

When test-pulling 3 pages (qplimit=3 each) from the qppage Wantedfiles page (action=query, list=querypage), the API tended to return 8 unique entries instead of the expected 9. (The total number of Wantedfiles entries is greater than 9.)

Comparing the returned data per call, the missing 9th entry is explained by the first and second calls tending to return a duplicate entry. The only significant difference (so far) between the first and second call is that the first call does not yet use the qpoffset setting (the second call uses qpoffset=3).
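For what it's worth, the check described above can be sketched roughly like this. It is only a sketch: `fetch_page` is a hypothetical stand-in for one list=querypage request (qplimit=limit, qpoffset=offset) that returns a list of titles, and the function names are mine, not anything from the API.

```python
from collections import Counter


def pull_pages(fetch_page, limit, pages):
    """Collect `pages` batches of `limit` entries using explicit offsets.

    `fetch_page(limit, offset)` is a hypothetical callable standing in for
    one list=querypage request; it should return a list of titles.
    """
    titles = []
    for i in range(pages):
        titles.extend(fetch_page(limit, i * limit))
    return titles


def duplicate_report(titles):
    """Return (entries returned, unique entries, sorted duplicated titles)."""
    counts = Counter(titles)
    dupes = sorted(t for t, n in counts.items() if n > 1)
    return len(titles), len(counts), dupes
```

With the numbers from this post, `duplicate_report(pull_pages(fetch_page, 3, 3))` would be expected to report 9 returned and 9 unique entries; a report of 9 returned, 8 unique and one duplicated title corresponds to what I observed.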

Probably not really critical, as long as the session pulls all entries (eventually).

Unknown (at this point) if this is a more general querypage thing, or just limited to the Wantedfiles case.

MediaWiki:1.25.2
PHP:5.6.7-1 (fpm-fcgi)
MySQL:5.5.43-0+deb8u1

(more NTS than anything else)

Update: Definitely less benign than one might think. Getting only 75 unique entries back after 5 calls of size 25 on a 107-entry Wantedfiles list kinda triggers a returned-data credibility issue ... well, to some.

RobinHood70 (talkcontribs)

@Anomie: this sounds like something you'd want to be aware of.

Anomie (talkcontribs)

Without an actual reproduction case, it's hard to guess what might be going on here.

But, in general, the API is allowed to return fewer results than expected before continuing, even zero in some cases. This might be due to limitations on the total result size, query efficiency when filtering, or any other reason.
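As a rough illustration of what "continuing" means here, a client would normally feed the `continue` object from each response back into the next request instead of computing offsets itself. The sketch below assumes a hypothetical `request(params)` callable that performs the HTTP call and returns the decoded JSON, and it assumes the results sit under `query.querypage.results`; on older releases such as 1.25, I believe an empty `continue=` parameter has to be sent to opt in to this response format.

```python
def query_all(request, params):
    """Yield list=querypage result rows, following the API's `continue` data.

    `request(params)` is a hypothetical callable: it sends one API request
    and returns the decoded JSON response as a dict.
    """
    params = dict(params)  # work on a copy, leave the caller's dict alone
    while True:
        data = request(params)
        rows = data.get("query", {}).get("querypage", {}).get("results", [])
        for row in rows:
            yield row
        cont = data.get("continue")
        if not cont:
            break  # no continuation data: the result set is exhausted
        params.update(cont)  # e.g. adds qpoffset for the next batch
```

A loop like this copes with batches that are shorter than qplimit. As far as I understand, for list=querypage the continuation value is itself an offset, so it would not by itself protect against a source list whose order changes between requests.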

MvGulik (talkcontribs)

>the API is allowed to return fewer results than expected

The qppage Wantedfiles API calls are actually returning 107 results in total. But of those 107 returned results, 32 turned out to be duplicates, leaving only 75 unique entries and 32 missing ones.

The only thing that makes sense to me here is that the different qpoffset calls are using (or are given) slightly differently ordered source lists.

Looking directly at the Special:WantedFiles page in two browser tabs, one with "limit=20&offset=0" and the other with "limit=20&offset=1", confirms my suspicion. They return significantly different lists (instead of the same list order shifted by one entry).
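The two-tab comparison boils down to a small check like the one below. This is just a sketch; the function name is made up, and the two lists are assumed to be the titles copied from the offset=0 and offset=1 views.

```python
def is_shifted_by(base, shifted, k=1):
    """True if `shifted` is `base` moved forward by k entries.

    Only the overlapping part is compared: base[k:] against the first
    len(base) - k entries of `shifted`.
    """
    overlap = len(base) - k
    return overlap > 0 and base[k:] == shifted[:overlap]
```

For a stable source list, `is_shifted_by(titles_offset0, titles_offset1)` should be true; on the wiki in question it would come out false.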

Unfortunately, doing the same on the MediaWiki Special:WantedFiles page actually returns the expected outcome: the same ordered list, just shifted by one entry. (MediaWiki 1.31.0-wmf.23)

Ergo: It's either limited to some older MediaWiki versions, and/or it's related to the MediaWiki setup (could be anything).

Let me try the same on some other MediaWiki site:

MediaWiki: 1.19.1
PHP: 5.4.38-1~dotdeb.1 (fpm-fcgi)
MySQL: 5.5.41-0+wheezy1

Result: Order was maintained.

Getting the feeling the problem I encountered is probably related to how that particular MediaWiki site was set up. (proverbial needle in a haystack case) :(

MvGulik (talkcontribs)

Checked the other special pages for this behavior (list/data order changing with a different offset setting).

Apart from WantedFiles, the FewestRevisions and WantedTemplates pages also showed this problem.

(skipped (low data count): DoubleRedirects(0), WantedProperties(2).)

MvGulik (talkcontribs)

Time was still a nagging unknown variable here. Seems it plays a role here too.

Although I did not initially encounter this troublesome behavior on the Special:Wantedpages page, I did at a later point in time (for both the API and the wiki page itself).

Think that's about it from me on this subject.

Cheers.

(closing/marking as resolved (ie: works for me) in a couple of days)