
prop=revisions sorts by rev_id, not by rev_timestamp
Closed, Resolved, Public

Description

Something odd is going on, and I'm not sure whether it is the API or an inconsistency in the database. I want to get the first revision of a page (and optionally the n revisions after it). This works for the The Big Bang Theory article: [[http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=ids%7Ctimestamp&rvlimit=1&rvdir=newer&titles=The%20Big%20Bang%20Theory&rvendid=145491640|titles=The Big Bang Theory&rvendid=145491640]]. The rvendid there is not the ID of the first revision but a revision I chose at random. The query does return the first revision, as its parentid shows.

The Main Page is a different story: [[http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=ids%7Ctimestamp&rvlimit=1&rvdir=newer&titles=Main%20Page&rvendid=139871|titles=Main Page&rvendid=139871]]. That query returns the rvendid revision itself, possibly because its parentid is higher than its own ID. Even http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=ids%7Ctimestamp&titles=Main%20Page&rvstartid=139890&rvdir=older&rvlimit=50 returns only two revisions.

Event Timeline

XZise raised the priority of this task from to Needs Triage.
XZise updated the task description.
XZise added a project: MediaWiki-Action-API.
XZise subscribed.
Anomie renamed this task from "Getting last revision with rvendid set doesn't work always" to "prop=revisions sorts by rev_id, not by rev_timestamp". Mar 7 2015, 2:51 PM
Anomie set Security to None.

"rvendid" has nothing to do with getting the first revision, that just tells the API which revision to stop at when listing. rvdir=newer is what's telling it to start at the first revision rather than the latest.
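To make that concrete, here is a minimal sketch of a request for the first revision(s) of a page without any rvendid. The parameters are the ones used in the task description; the endpoint constant, the `format=json` parameter, and the helper name `first_revisions_url` are assumptions added for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint for English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def first_revisions_url(title, limit=1):
    """Build a prop=revisions URL that enumerates from the oldest revision.

    Hypothetical helper: rvdir=newer is what makes the listing start at the
    first revision; rvendid is deliberately left out.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "ids|timestamp",
        "rvlimit": limit,
        "rvdir": "newer",
        "titles": title,
        "format": "json",  # assumption: JSON output requested for convenience
    }
    return API_ENDPOINT + "?" + urlencode(params)


url = first_revisions_url("Main Page")
```

Raising `limit` would list the first n revisions instead of only the first one.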

The problem you're seeing is that prop=revisions sorts revisions by id rather than by timestamp. Normally a revision with a higher timestamp will also have a higher id, but for really old pages that were imported from earlier versions of Wikipedia this doesn't necessarily hold true. If Special:Import preserves timestamps (I forget offhand if it does), that would be another source for a mismatch between id-order and timestamp-order.
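A small illustration of the mismatch, using made-up revision data (the IDs and timestamps below are hypothetical, not taken from the Main Page history): when older revisions were assigned higher IDs, sorting by ID and sorting by timestamp give different orders.

```python
from operator import itemgetter

# Hypothetical revisions: the oldest edit carries the highest ID, as can
# happen when history is imported after the fact.
revisions = [
    {"revid": 300, "timestamp": "2002-01-26T15:28:12Z"},
    {"revid": 100, "timestamp": "2002-03-01T10:00:00Z"},
    {"revid": 200, "timestamp": "2002-02-10T08:30:00Z"},
]

# Order as seen when sorting by ID.
by_id = [r["revid"] for r in sorted(revisions, key=itemgetter("revid"))]

# Chronological order; ISO 8601 strings in the same format sort correctly
# as plain strings.
by_timestamp = [r["revid"] for r in sorted(revisions, key=itemgetter("timestamp"))]

print(by_id)         # [100, 200, 300]
print(by_timestamp)  # [300, 200, 100]
```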

Gerrit change 188843 should fix this, if @Springle says the new queries are good.

Change 188843 had a related patch set uploaded (by Anomie):
API: Improve queries for prop=revisions in enum mode

https://gerrit.wikimedia.org/r/188843

Oh, that would make sense. I already had problems with that page a few days ago, when a query with the default ordering returned revisions that weren't in order (that is, a revision's parentid != the next revision's ID): [[https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=ids%7Ctimestamp&titles=Main%20Page&rvstartid=140204&rvlimit=500|action=query&prop=revisions&rvprop=ids|timestamp&titles=Main Page&rvstartid=140204&rvlimit=500]]

How does 188843 actually fix this bug? Only by switching the where clause additions in lines 243-246 (of PS2)?

Issues of rev_id vs. timestamp order also come up when undeleting material that was deleted before rev_id was included in the archive table.

> How does 188843 actually fix this bug? Only by switching the where clause additions in lines 243-246 (of PS2)?

Yeah. addWhereRange() and addTimestampWhereRange() have a side effect of appending the field to the ORDER BY clause in the query.
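As a rough illustration of why the call order matters, here is a toy model of that side effect. It is not MediaWiki's code: the `ToyQuery` class, its `add_where_range` method, and `order_clause` are hypothetical stand-ins that only mimic the "append the field to ORDER BY" behaviour described above.

```python
class ToyQuery:
    """Toy query builder; a stand-in, not MediaWiki's actual API code."""

    def __init__(self):
        self.where = []
        self.order_by = []

    def add_where_range(self, field, direction, start=None, end=None):
        """Add range conditions on `field` and, as a side effect,
        append `field` to the ORDER BY list."""
        newer = direction == "newer"
        lower, upper = (start, end) if newer else (end, start)
        if lower is not None:
            self.where.append(f"{field} >= {lower!r}")
        if upper is not None:
            self.where.append(f"{field} <= {upper!r}")
        # The side effect: each call adds one more sort key, in call order.
        self.order_by.append(f"{field} {'ASC' if newer else 'DESC'}")


def order_clause(fields):
    """Return the ORDER BY clause produced by calling add_where_range
    on the given fields in the given order."""
    query = ToyQuery()
    for field in fields:
        query.add_where_range(field, "newer")
    return "ORDER BY " + ", ".join(query.order_by)


print(order_clause(["rev_id", "rev_timestamp"]))
# ORDER BY rev_id ASC, rev_timestamp ASC
print(order_clause(["rev_timestamp", "rev_id"]))
# ORDER BY rev_timestamp ASC, rev_id ASC
```

In this model the field passed first becomes the primary sort key, so swapping two calls is enough to change the ordering of the results.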

Aklapper triaged this task as Medium priority. Mar 9 2015, 12:10 PM

Change 188843 merged by jenkins-bot:
API: Improve queries for prop=revisions in enum mode

https://gerrit.wikimedia.org/r/188843

Anomie claimed this task.

This should be deployed to WMF wikis with 1.26wmf4, see https://www.mediawiki.org/wiki/MediaWiki_1.26/Roadmap for the schedule.