
Add Wikidata query service lag to Wikidata maxlag
Closed, Resolved · Public · 8 Estimated Story Points

Description

As a high-volume editor on Wikidata, I want to ensure that my edits do not impair query service responsiveness.

Problem:
maxlag is a parameter that API users can specify to avoid overloading the wiki: if I send an API request with maxlag=5, and the database replicas are currently more than five seconds behind the master, then MediaWiki will immediately refuse the request. Afterwards, I’m supposed to wait for a bit before retrying the request. See https://www.mediawiki.org/wiki/Manual:Maxlag_parameter.
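For illustration, here is a minimal client-side sketch (Python with the requests library; the target item, edit payload, and anonymous token are placeholders, not a recommended edit) of what sending maxlag and detecting the resulting error looks like:

```
import requests

API = "https://www.wikidata.org/w/api.php"   # endpoint used for illustration

resp = requests.post(API, data={
    "action": "wbeditentity",
    "id": "Q4115189",        # Wikidata sandbox item, used here as a placeholder target
    "data": '{"labels": {"en": {"language": "en", "value": "example"}}}',
    "token": "+\\",          # placeholder; a real edit needs a CSRF token from action=query&meta=tokens
    "maxlag": 5,
    "format": "json",
}).json()

if resp.get("error", {}).get("code") == "maxlag":
    # The wiki is too lagged: wait for a bit (the response also carries a
    # Retry-After header) and retry later.
    print("Server lagged, not editing:", resp["error"]["info"])
```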

Last year, we modified the API’s behavior so that this takes into account not just the replication lag, but also the dispatch lag (T194950: Include Wikibase dispatch lag in API "maxlag" enforcing) – if the database replicas are fine, but change dispatching to client wikis is more than 5 minutes behind, then requests with maxlag=5 will still be rejected. (The dispatchLagToMaxLagFactor is configurable, 60 in production, so the threshold for dispatch lag is in minutes instead of seconds.)

However, this does not take the query service lag into account – if updates on some or all of the WDQS servers start to lag behind, edits will continue at full speed as long as database replication and client dispatching are not affected. This can happen because query service lag depends not just on edit rate but also on the size of the entities edited (on each edit, the full entity is reloaded, even if only a small part of it was edited, so editing large items has a disproportionate impact) and the rate of external queries against the server.
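To sketch what the acceptance criterion below would mean in practice, here is a rough, illustrative outline (Python pseudologic, not the actual MediaWiki/Wikibase code; the query service factor name and value are placeholders for what this task still has to decide):

```
# Illustrative pseudologic only, not the actual MediaWiki/Wikibase code.
# The query service factor name and value are placeholders (see the open
# questions below).
DISPATCH_LAG_TO_MAX_LAG_FACTOR = 60          # production value mentioned above
QUERY_SERVICE_LAG_TO_MAX_LAG_FACTOR = 60     # placeholder

def effective_lag(db_replication_lag_s, dispatch_lag_s, query_service_lag_s):
    """Lag value that a request's maxlag parameter would be compared against."""
    return max(
        db_replication_lag_s,
        dispatch_lag_s / DISPATCH_LAG_TO_MAX_LAG_FACTOR,
        query_service_lag_s / QUERY_SERVICE_LAG_TO_MAX_LAG_FACTOR,
    )

# A request with maxlag=5 would then be rejected whenever effective_lag(...) > 5,
# e.g. when the query service is more than 5 * factor seconds behind.
```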

BDD
GIVEN all the WDQS servers are lagged by more than one hour
WHEN I send a wbeditentity API request
AND I set the maxlag parameter to 5 seconds
THEN I should get a maxlag error
AND no edit should be made

(The GIVEN part describes a rather extreme case; once the open questions below are answered, it can perhaps be changed to a more realistic one.)

Acceptance criteria:

  • the effective max lag takes query service update lag into account

Open questions:

  • What should the conversion factor be? (For dispatch lag, it’s 60 – five seconds of replication lag are equivalent to five minutes of dispatch lag.)
  • Lag between different servers can differ significantly. Do we use the mean lag? The median? The maximum? Something else? (For dispatch lag, we seem to use the median; a small illustration of how the options differ follows this list.)
  • maxlag affects all API requests, even ones that shouldn’t have any effect on query service lag, such as action=wbgetentity or action=query&meta=userinfo. Should we try to limit the impact of this change, e. g. by only using query service lag on POST requests? (On the other hand, the same question should apply to dispatch lag and we don’t seem to limit the impact of that as far as I can tell.)
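To make the aggregation question concrete, a purely illustrative comparison (hypothetical per-server lag values, using Python's statistics module) of how the candidates behave when a single server falls badly behind:

```
# Purely illustrative comparison of the aggregation options, using
# hypothetical per-server update lags (in seconds).
import statistics

server_lags = {
    "wdqs-a": 30,
    "wdqs-b": 45,
    "wdqs-c": 7200,   # one badly lagged server
}

lags = list(server_lags.values())
print("mean:  ", statistics.mean(lags))    # pulled up strongly by the outlier
print("median:", statistics.median(lags))  # ignores a single outlier entirely
print("max:   ", max(lags))                # reacts as soon as any one server lags
```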


Details

Repo | Branch | Lines +/-
mediawiki/extensions/Wikidata.org | master | +21 -3
mediawiki/extensions/Wikidata.org | wmf/1.35.0-wmf.28 | +2 -2
operations/mediawiki-config | master | +1 -1
mediawiki/extensions/Wikidata.org | master | +1 -1
mediawiki/extensions/Wikidata.org | wmf/1.35.0-wmf.5 | +1 -1
operations/mediawiki-config | master | +4 -0
operations/puppet | production | +16 -3
mediawiki/extensions/Wikidata.org | wmf/1.35.0-wmf.5 | +47 -0
mediawiki/extensions/Wikidata.org | master | +1 -1
mediawiki/extensions/Wikidata.org | master | +104 -0
mediawiki/extensions/Wikidata.org | wmf/1.35.0-wmf.5 | +104 -0
mediawiki/extensions/Wikidata.org | wmf/1.35.0-wmf.5 | +3 -1
mediawiki/extensions/Wikidata.org | wmf/1.35.0-wmf.5 | +679 -3
mediawiki/extensions/Wikidata.org | master | +47 -0
mediawiki/extensions/Wikidata.org | master | +3 -1
mediawiki/extensions/Wikidata.org | master | +679 -3
mediawiki/extensions/Wikidata.org | master | +32 -0
mediawiki/extensions/Wikibase | master | +854 -0


Event Timeline


Change 551857 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikidata.org@wmf/1.35.0-wmf.5] Maintenance script to update cached wdqs lag

https://gerrit.wikimedia.org/r/551857

Change 551858 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikidata.org@wmf/1.35.0-wmf.5] Add ApiMaxLagInfo hook

https://gerrit.wikimedia.org/r/551858

Just prepared the patches needed to backport this (as otherwise it won't go out with the train and land for 2-3 weeks).

Change 551855 merged by jenkins-bot:
[mediawiki/extensions/Wikidata.org@wmf/1.35.0-wmf.5] Add WDQS lag into request maxlag logic.

https://gerrit.wikimedia.org/r/551855

Change 551856 merged by jenkins-bot:
[mediawiki/extensions/Wikidata.org@wmf/1.35.0-wmf.5] Switch from load_composer_autoloader AutoloadNamespaces

https://gerrit.wikimedia.org/r/551856

Mentioned in SAL (#wikimedia-operations) [2019-11-19T17:11:30Z] <addshore@deploy1001> Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: T221774 - Wikidata.org extension (queryservice maxlag) [[gerrit:551855]] [[gerrit:551856]] (duration: 00m 54s)

Change 551869 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikidata.org@master] onApiMaxLagInfo ttl to 1

https://gerrit.wikimedia.org/r/551869

Change 551857 merged by jenkins-bot:
[mediawiki/extensions/Wikidata.org@wmf/1.35.0-wmf.5] Maintenance script to update cached wdqs lag

https://gerrit.wikimedia.org/r/551857

Mentioned in SAL (#wikimedia-operations) [2019-11-19T17:43:52Z] <addshore@deploy1001> Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: T221774 - Wikidata.org extension (queryservice maxlag, maint script) [[gerrit:551857]] (duration: 00m 52s)

Change 551869 merged by jenkins-bot:
[mediawiki/extensions/Wikidata.org@master] onApiMaxLagInfo ttl to 1

https://gerrit.wikimedia.org/r/551869

Change 551858 merged by jenkins-bot:
[mediawiki/extensions/Wikidata.org@wmf/1.35.0-wmf.5] Add ApiMaxLagInfo hook

https://gerrit.wikimedia.org/r/551858

Mentioned in SAL (#wikimedia-operations) [2019-11-19T18:20:28Z] <addshore@deploy1001> Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: T221774 - Wikidata.org extension (queryservice maxlag, hook) [[gerrit:551858]] (duration: 00m 53s)

Just the cron job to go now
https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/551582/
The default settings mean that 1 hour of query lag = 1 second of maxlag.
I'll look at bringing this setting down once everything (including cron) is actually deployed.

Change 551582 merged by Giuseppe Lavagetto:
[operations/puppet@production] mediawiki/wikidata maint cron for updateQueryServiceLag

https://gerrit.wikimedia.org/r/551582

Change 552069 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] wgWikidataOrgQueryServiceMaxLagFactor 180

https://gerrit.wikimedia.org/r/552069

Change 552069 merged by jenkins-bot:
[operations/mediawiki-config@master] wgWikidataOrgQueryServiceMaxLagFactor 180

https://gerrit.wikimedia.org/r/552069

Change 552071 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikidata.org@master] qsmaxlag, send altered lag info, not only raw lag

https://gerrit.wikimedia.org/r/552071

Change 552072 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikidata.org@wmf/1.35.0-wmf.5] qsmaxlag, send altered lag info, not only raw lag

https://gerrit.wikimedia.org/r/552072

Change 552072 merged by Addshore:
[mediawiki/extensions/Wikidata.org@wmf/1.35.0-wmf.5] qsmaxlag, send altered lag info, not only raw lag

https://gerrit.wikimedia.org/r/552072

Mentioned in SAL (#wikimedia-operations) [2019-11-20T14:50:53Z] <addshore@deploy1001> Synchronized php-1.35.0-wmf.5/extensions/Wikidata.org: T221774 - Wikidata.org extension (use altered lag, not raw lag) [[gerrit:552072]] (duration: 00m 53s)

Mentioned in SAL (#wikimedia-operations) [2019-11-20T14:52:39Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 180 [[gerrit:552069]] (duration: 00m 52s)

Change 552071 merged by jenkins-bot:
[mediawiki/extensions/Wikidata.org@master] qsmaxlag, send altered lag info, not only raw lag

https://gerrit.wikimedia.org/r/552071

Mentioned in SAL (#wikimedia-operations) [2019-11-20T15:36:08Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: RESYNC T221774 - wgWikidataOrgQueryServiceMaxLagFactor 180 [[gerrit:552069]] (duration: 00m 52s)

The first sync didn't seem to do the trick, but the second one did.

The discovery team suggested that perhaps median might not be the best option.
We should investigate using https://config-master.wikimedia.org/pybal/eqiad/wdqs and https://config-master.wikimedia.org/pybal/codfw/wdqs to load the lag state of ONLY pooled servers, and then we could actually use max.

Will probably do this in a separate ticket.
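A rough sketch of that idea (assuming each pybal file lists one host record per line with "host" and "enabled" fields, which should be checked against the real files, and with a hypothetical per-host lag lookup):

```
# Sketch only. Assumes each pybal file lists one record per line with "host"
# and "enabled" fields (check the real files before relying on this), and uses
# a hypothetical per-host lag lookup.
import ast
import json
import requests

POOL_URLS = [
    "https://config-master.wikimedia.org/pybal/eqiad/wdqs",
    "https://config-master.wikimedia.org/pybal/codfw/wdqs",
]

def parse_pool_line(line):
    # Try JSON first, then Python-style literals, since the exact quoting is an assumption.
    try:
        return json.loads(line)
    except ValueError:
        return ast.literal_eval(line)

def pooled_hosts():
    hosts = []
    for url in POOL_URLS:
        for line in requests.get(url).text.splitlines():
            line = line.strip()
            if line:
                record = parse_pool_line(line)
                if record.get("enabled"):
                    hosts.append(record["host"])
    return hosts

def get_lag_seconds(host):
    # Hypothetical lookup of a server's update lag, e.g. from Prometheus.
    raise NotImplementedError

# Worst lag among pooled servers only, as suggested above:
# worst_lag = max(get_lag_seconds(h) for h in pooled_hosts())
```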

Mentioned in SAL (#wikimedia-operations) [2019-11-20T18:42:56Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 170 (duration: 00m 53s)

Mentioned in SAL (#wikimedia-operations) [2019-11-20T18:51:14Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 120 (duration: 00m 54s)

Mentioned in SAL (#wikimedia-operations) [2019-11-20T18:56:43Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: RESYNC T221774 - wgWikidataOrgQueryServiceMaxLagFactor 120 (duration: 00m 50s)

Currently multiple tools are broken because the time it takes for maxlag to return to normal is much longer than the total time the tools spend retrying edits (see this and this).


Isn't this just the change working as intended? Tools should be updated to handle maxlag gracefully. The retry delay should not be fixed, but should increase each time the error is returned: so 5, 10, 20, 40, 80, etc. This also makes sure that not all tools restart at the same time.
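A minimal sketch of that suggestion (make_edit stands in for whatever API call a given tool performs):

```
# Sketch of the suggested back-off: keep maxlag=5 and double the wait after
# every maxlag error (5, 10, 20, 40, 80, ... seconds). make_edit is a
# placeholder for whatever API call the tool performs.
import time

def edit_with_backoff(make_edit, initial_wait=5, max_wait=600):
    wait = initial_wait
    while True:
        resp = make_edit(maxlag=5)
        if resp.get("error", {}).get("code") != "maxlag":
            return resp
        time.sleep(wait)
        wait = min(wait * 2, max_wait)
```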

Thanks for the notification! I would be happy to release a new version of OpenRefine with a patch applied - I can do this in the coming days. The exponential back-off suggested by @Multichill makes sense intuitively - could WMDE confirm that this is the policy they recommend? Happy to adapt the policy as required.

Notable that WDQS lag seems to be taking not a blind bit of notice of this change :(

WDQS 1005-7 all at ~3 hours.

https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&panelId=8&fullscreen


Due to the apparent differences in lag pattern between some of the nodes we need to try to get T238751 done so that we can use the max lagged server rather than the median (which we currently do).

So this is what we get with an exponential back-off (1.5 factor), at the moment:

22:37:27.148 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 5.4916666666667 seconds lagged. -- pausing for 1000 milliseconds. (19338ms)
22:37:28.729 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 5.4916666666667 seconds lagged. -- pausing for 1500 milliseconds. (1581ms)
22:37:33.809 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 5.4916666666667 seconds lagged. -- pausing for 2250 milliseconds. (5080ms)
22:37:37.931 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 5.4916666666667 seconds lagged. -- pausing for 3375 milliseconds. (4122ms)
22:37:42.663 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 5.4916666666667 seconds lagged. -- pausing for 5062 milliseconds. (4732ms)
22:37:49.437 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 5.4916666666667 seconds lagged. -- pausing for 7593 milliseconds. (6774ms)
22:37:58.429 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 5.4916666666667 seconds lagged. -- pausing for 11389 milliseconds. (8992ms)
22:38:18.217 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 6 seconds lagged. -- pausing for 17083 milliseconds. (19788ms)
22:38:36.461 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 6 seconds lagged. -- pausing for 25624 milliseconds. (18244ms)
22:39:05.013 [..baseapi.WbEditingAction] [maxlag] Waiting for all: 6.4666666666667 seconds lagged. -- pausing for 38436 milliseconds. (28552ms)

So it looks like this means no OpenRefine edits at all with these new rules, in the current situation.

Actually maxlag may be more than 5 seconds for up to an hour. Tools should not break (i.e. not "give up") even if no edits can be made.

@Addshore what factor are we using right now? Most bots seem to have stopped so it might be on the low side.

Change 552474 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] wgWikidataOrgQueryServiceMaxLagFactor 60

https://gerrit.wikimedia.org/r/552474

@Addshore what factor are we using right now? Most bots seem to have stopped so it might be on the low side.

I have slowly been lowering the factor, and am just about to set it at 60, so that 1 minute of lag = 1 second of maxlag.
Once we get down this low, we should end up in a position where getting into such a bad state as we have been in over the past days / week is fairly hard, if not impossible.
I would like to think that the spikes of maxlag will be much smaller, but we will see; I'm always open to changing the factor if it doesn't work so well.

image.png (280×913 px, 40 KB)

As can be seen with the maxlag over the past 12 hours, it seems to be working.

This system will be much better when T238751 is also done.

Change 552474 merged by jenkins-bot:
[operations/mediawiki-config@master] wgWikidataOrgQueryServiceMaxLagFactor 60

https://gerrit.wikimedia.org/r/552474

Mentioned in SAL (#wikimedia-operations) [2019-11-22T12:32:27Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 60 (duration: 00m 53s)

Mentioned in SAL (#wikimedia-operations) [2019-11-22T12:34:11Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T221774 - wgWikidataOrgQueryServiceMaxLagFactor 60 RESYNC (duration: 00m 51s)

One problem with the current policy (requesting all automated editing processes to use maxlag=5) is that this creates a binary threshold: either the query service lag is under the threshold, in which case bots will edit at full speed, or the query service lag is above the threshold, in which case they should all stop editing entirely. This is likely to create an oscillating behaviour, where all bots start and stop periodically. This is probably not ideal for either the infrastructure or the users.

Not all automated edits are long-running tasks where stopping for an hour is acceptable: many QuickStatements or OpenRefine batches are under 10 edits for instance. As a user of these tools, I would expect them to slow down when lag is high, but stopping entirely should be extremely rare, as it breaks workflows.

I think it would be preferable to get a continuous throttling behaviour: the higher the lag, the slower the edits.
This can already be achieved by clients by using a small maxlag parameter and increasing it gradually as they retry. For instance, start with maxlag=2, which fails, retry in 1 sec with maxlag=3, which fails, retry in 4 seconds with maxlag=4, which fails, retry in 8 seconds with maxlag=5, which fails, retry in 16 seconds with maxlag=6, which succeeds.

Would such a behaviour be acceptable?
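A minimal sketch of that strategy (again with make_edit standing in for the client's own API call; the starting values are illustrative):

```
# Sketch of the continuous-throttling idea: start with a low maxlag and raise
# it (while also waiting longer) on each maxlag error, so edits slow down
# gradually instead of stopping outright. make_edit is a placeholder for the
# client's own API call; the starting values are illustrative.
import time

def throttled_edit(make_edit, start_maxlag=2, max_maxlag=6):
    wait = 1
    resp = None
    for maxlag in range(start_maxlag, max_maxlag + 1):
        resp = make_edit(maxlag=maxlag)
        if resp.get("error", {}).get("code") != "maxlag":
            return resp
        time.sleep(wait)
        wait *= 2
    return resp  # still lagged even at max_maxlag; the caller decides what to do next
```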

Actually, some tools seem to be doing something like that already, since edits are still going through despite maxlag being above 5 for more than an hour now (Author Disambiguator does this, QuickStatements too probably, Edoderoobot too). So these tools use higher (more aggressive) maxlag values than 5.

I will therefore follow their example and publish a new version of OpenRefine which does that. It might be worth updating the official recommendations about this: if the rule is never to set maxlag higher than 5, then it looks like not many are actually following it…

I think it would be preferable to get a continuous throttling behaviour: the higher the lag, the slower the edits.

Sounds sensible to me; 5 should perhaps still be the point at which to stop.
But if editing slows down while the "maxlag" is between 0 and 5, then maybe we will stop reaching 5 in the first place :)

Actually, some tools seem to be doing something like that already, since edits are still going through despite maxlag being above 5 for more than an hour now (Author Disambiguator does this, QuickStatements too probably, Edoderoobot too). So these tools use higher (more aggressive) maxlag values than 5.

I can probably go ahead and check the values.
It is also possible that some bots and tools do not send maxlag values at all.

The theory behind maxlag is that it reflects how lagged the SQL DBs are.
So, if you have a maxlag of 5, you can wait 5 seconds and in theory that lag might then be gone.
We are now abusing that somewhat by including dispatch lag and query service lag, which don't quite fit that model, which is why conversion factors are applied to them.

OK! If you have ways to check what sort of maxlag values are used, that would be great!

I will try to implement a responsible throttling strategy in OpenRefine, hoping that others will do the same.

I will try to implement a responsible throttling strategy in OpenRefine, hoping that others will do the same.

Once this is done, if by any chance you want to share and explain how you did it, for example in a blog post, so we can share with other tool builders, it would be absolutely wonderful <3

I am first getting in touch with people who seem to be running bots with maxlag greater than 5, or with no maxlag parameter at all, to see whether they would agree to follow @Addshore's advice never to use maxlag greater than 5.

Am I misreading this graph? https://grafana.wikimedia.org/d/000000489/wikidata-query-service?panelId=8&fullscreen&orgId=1&from=now-12h&to=now&refresh=10s It looks like the query service lag for 3 of the servers has been growing steadily for the past roughly 8 hours. However, edits are going through. Did something change in the maxlag logic somewhere earlier today?

maxlag is currently based on median (not max) of lag of all public servers.

@Bugreporter well, something must have changed earlier today - was it previously "mean" and is it now "median"? I'm not sure which is better, but having WDQS hours out of date (we're over 4 hours now) is NOT a good situation, and exactly what this whole task was intended to avoid! @Pintoch any thoughts on this?

Due to the apparent differences in lag pattern between some of the nodes we need to try to get T238751 done so that we can use the max lagged server rather than the median (which we currently do).

I don't know - I stopped working on this task and T240369 since T240374 was declined. I don't think I can contribute to solving this problem in the current state of affairs, sorry!


Excuse me? What does that have to do with it? Don't put this on me. Maxlag is just a configuration setting in Pywikibot and the default is 5; there is no "code to fix". Every once in a while I might set it to a different value while testing so I'm not wasting my time. Generally it's just set to 5.

This does not have anything to do with you indeed! I was just trying to explain that I stopped trying to help solve this issue (therefore unsubscribing from this).

Am I misreading this graph? https://grafana.wikimedia.org/d/000000489/wikidata-query-service?panelId=8&fullscreen&orgId=1&from=now-12h&to=now&refresh=10s It looks like the query service lag for 3 of the servers has been growing steadily for the past roughly 8 hours. However, edits are going through. Did something change in the maxlag logic somewhere earlier today?

Note that this dashboard includes metrics for both pooled and depooled servers.
So whatever you read there will likely also include data for servers that you can't actually query, and thus whose lag you are not seeing via the query service.


However, there was definitely significant lag on some (most?) of the WDQS servers available for querying when I noticed this problem - updates made over an hour earlier were not visible when I queried. But it seems to have resolved for now.

Today maxlag has been 8-10 s for the whole day (WDQS lag 8-10 minutes); none of the Pywikibot tests loading WD succeed within the timeout. See T243701 for more details.


Reference for the people looking here (last 2 days):

image.png (306×1 px, 140 KB)

Anything above the orange line is maxlag greater than 5

Change 589873 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikidata.org@master] Switch query service maxlag to be median +1

https://gerrit.wikimedia.org/r/589873

Change 589874 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikidata.org@wmf/1.35.0-wmf.28] Switch query service maxlag to be median +1

https://gerrit.wikimedia.org/r/589874

Change 589874 abandoned by Addshore:
Switch query service maxlag to be median +1

https://gerrit.wikimedia.org/r/589874

Change 589873 merged by jenkins-bot:
[mediawiki/extensions/Wikidata.org@master] Switch query service maxlag to be median +1

https://gerrit.wikimedia.org/r/589873