
SRE-swift-storage
Component · Active · Public

Members (3)

Details

Description

For media file storage on Wikimedia Foundation sites. Specifically, as implemented via OpenStack Swift.

If you encounter issues relating to user uploads or thumbnails while using MediaWiki (e.g. on Wikimedia Commons), then instead first report to MediaWiki-File-management.

Parent project: SRE

Recent Activity

Wed, Dec 25

MGA73 added a comment to T382750: Cannot move File:فندق قصبة بوزنيقة.jpg to File:Kasbah Hotel in Bouznika.jpg on Commons.

Seems to be similar to the issue I had in T382715: Error using FileImporter and undelete file on Commons because of "local-multiwrite/local-public...is in an inconsistent state within the internal storage backends"

Wed, Dec 25, 8:58 PM · SRE-swift-storage
Reedy added a project to T382764: Media storage error with the re-uploading file in Commons: SRE-swift-storage.
Wed, Dec 25, 7:10 PM · SRE-swift-storage, Commons, MediaWiki-File-management
Reedy added a project to T382765: Some files uploaded on Dec 23 not found on upload.wikimedia.org: SRE-swift-storage.
Wed, Dec 25, 7:10 PM · SRE-swift-storage, Commons, MediaWiki-File-management
Don-vip renamed T382763: Thumbnail generation errors from Thumbail generation errors to Thumbnail generation errors.
Wed, Dec 25, 4:17 PM · SRE-swift-storage, Thumbor
Don-vip created T382763: Thumbnail generation errors.
Wed, Dec 25, 4:16 PM · SRE-swift-storage, Thumbor

Tue, Dec 24

mdaniels5757 created T382750: Cannot move File:فندق قصبة بوزنيقة.jpg to File:Kasbah Hotel in Bouznika.jpg on Commons.
Tue, Dec 24, 2:50 PM · SRE-swift-storage
AntiCompositeNumber merged T382711: Error 404, Not Found when accessing thumbnails into T382705: High amount of 503/504 for swift uploads.
Tue, Dec 24, 2:33 AM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
Bugreporter added a project to T382715: Error using FileImporter and undelete file on Commons because of "local-multiwrite/local-public...is in an inconsistent state within the internal storage backends": SRE-swift-storage.
Tue, Dec 24, 12:55 AM · SRE-swift-storage, Commons, MediaWiki-Page-deletion, MediaWiki-File-management, Move-Files-To-Commons
Sreejithk2000 added a comment to T382694: Unable to restore File:Model 4000-First of Odakyu Electric Railway 2.JPG.

Awesome, thank you guys.

Tue, Dec 24, 12:48 AM · Commons, SRE-swift-storage

Mon, Dec 23

BCornwall closed T382705: High amount of 503/504 for swift uploads as Resolved.

This should be fixed now that ms-be2075 is taken out of the ring. Thanks to @MatthewVernon for doing all the heavy lifting.

Mon, Dec 23, 9:56 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
andrea.denisse added a subtask for T360913: Swift proxy server misbehaviour (no longer calling `accept`?): Unknown Object (Task).
Mon, Dec 23, 9:14 PM · SRE-swift-storage
TheDJ added a comment to T382705: High amount of 503/504 for swift uploads.

(it's been quite a day for swift!)

Mon, Dec 23, 8:59 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
BCornwall added a comment to T382705: High amount of 503/504 for swift uploads.

@TheDJ That was a result of a separate issue that is now resolved (it's been quite a day for swift!)

Mon, Dec 23, 8:58 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
Stashbot added a comment to T382707: Frequent disk resets on ms-be2075.

Mentioned in SAL (#wikimedia-operations) [2024-12-23T20:16:56Z] <Emperor> weighted ms-be2075 to zero T382705 T382707

Mon, Dec 23, 8:17 PM · SRE, DC-Ops, SRE-swift-storage, ops-codfw
Stashbot added a comment to T382705: High amount of 503/504 for swift uploads.

Mentioned in SAL (#wikimedia-operations) [2024-12-23T20:16:56Z] <Emperor> weighted ms-be2075 to zero T382705 T382707

Mon, Dec 23, 8:17 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
TheDJ added a comment to T382705: High amount of 503/504 for swift uploads.

Ehm, is this a problem? Or a side effect of the depool taking effect after that 20:15 window?

Mon, Dec 23, 8:10 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
MatthewVernon closed T382694: Unable to restore File:Model 4000-First of Odakyu Electric Railway 2.JPG as Resolved.

Great, thanks, I'll close this ticket now :)

Mon, Dec 23, 6:45 PM · Commons, SRE-swift-storage
MGA73 added a comment to T382694: Unable to restore File:Model 4000-First of Odakyu Electric Railway 2.JPG.

Ooops. Sorry. I undeleted the file and it worked fine.

Mon, Dec 23, 6:44 PM · Commons, SRE-swift-storage
Pppery added a comment to T382694: Unable to restore File:Model 4000-First of Odakyu Electric Railway 2.JPG.

@Sreejithk2000 Could you try undeleting the file again now?

Mon, Dec 23, 5:26 PM · Commons, SRE-swift-storage
MatthewVernon added a comment to T382694: Unable to restore File:Model 4000-First of Odakyu Electric Railway 2.JPG.

Done, in both clusters.

Mon, Dec 23, 5:23 PM · Commons, SRE-swift-storage
Stashbot added a comment to T382694: Unable to restore File:Model 4000-First of Odakyu Electric Railway 2.JPG.

Mentioned in SAL (#wikimedia-operations) [2024-12-23T17:22:29Z] <Emperor> swift delete wikipedia-commons-local-public.88 8/88/Model_4000-First_of_Odakyu_Electric_Railway_2.JPG T382694

Mon, Dec 23, 5:22 PM · Commons, SRE-swift-storage
Ladsgroup added a comment to T382694: Unable to restore File:Model 4000-First of Odakyu Electric Railway 2.JPG.

If the image exists in the deleted container, I agree: just deleting the file from the public container is the right thing to do. In fact, I think we should actively look for images in the deleted container that are in public too, as these images (for legal reasons such as copyright) are not supposed to be accessible publicly.

Mon, Dec 23, 5:09 PM · Commons, SRE-swift-storage
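The check suggested in the comment above (finding objects that sit in both the deleted and the public container) could be sketched roughly as follows. This is only an illustration under stated assumptions: it takes two container listings in the shape of Swift's JSON listing entries (`name`, `hash`) as plain Python lists, and it matches objects by content hash on the assumption that deleted and public copies are stored under different names. The function name, the sample object names, and the sample hashes are all hypothetical, and large segmented objects may need a different comparison.

```python
def public_objects_also_deleted(public_listing, deleted_listing):
    """Return names of public objects whose content hash also appears
    in the deleted listing (assumption: same bytes means same file)."""
    deleted_hashes = {entry["hash"] for entry in deleted_listing}
    return sorted(
        entry["name"]
        for entry in public_listing
        if entry["hash"] in deleted_hashes
    )


# Hypothetical listings; names and hashes are made up for illustration.
public_sample = [
    {"name": "a/ab/Example_A.jpg", "hash": "aaa"},
    {"name": "c/cd/Example_B.jpg", "hash": "bbb"},
]
deleted_sample = [
    {"name": "x/y/z/xyz-placeholder.jpg", "hash": "aaa"},
]

print(public_objects_also_deleted(public_sample, deleted_sample))
```

Any name this returns would still need a human check before removal, since matching hashes only show that the bytes are identical.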
Pppery added a comment to T382694: Unable to restore File:Model 4000-First of Odakyu Electric Railway 2.JPG.

(with the caveat that I'm not super familiar with this)

Mon, Dec 23, 2:43 PM · Commons, SRE-swift-storage
Maintenance_bot added a project to T382707: Frequent disk resets on ms-be2075: SRE.
Mon, Dec 23, 2:29 PM · SRE, DC-Ops, SRE-swift-storage, ops-codfw
MatthewVernon added a comment to T382705: High amount of 503/504 for swift uploads.

The depool won't entirely help (writes always go to both clusters), but diverting read traffic to eqiad swift should help mitigate user impact a bit. We should restore it before US staff stop work at the end of today, though.

Mon, Dec 23, 2:11 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
Stashbot added a comment to T382705: High amount of 503/504 for swift uploads.

Mentioned in SAL (#wikimedia-operations) [2024-12-23T14:10:02Z] <Emperor> depool codfw swift T382705

Mon, Dec 23, 2:10 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
BCornwall added a comment to T382705: High amount of 503/504 for swift uploads.

ms-be2075 will be effectively removed from the ring (weights set to 0), but a small snag: Swift rings have an enforced minimum time between changes for data integrity reasons and the next availability for application will be at 20:15 UTC. Unfortunately, we'll need to wait.

Mon, Dec 23, 2:02 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
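The enforced waiting period described in the comment above is presumably the ring builder's `min_part_hours` setting. The arithmetic behind "next availability" can be sketched as below. The timestamps and the 24-hour value are hypothetical, not the values used on the production rings, and the helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def next_ring_change(last_change, min_part_hours):
    """Earliest time a further ring change may be applied."""
    return last_change + timedelta(hours=min_part_hours)


def can_change_ring(now, last_change, min_part_hours):
    """True once the minimum interval since the last change has elapsed."""
    return now >= next_ring_change(last_change, min_part_hours)


# Hypothetical values, chosen only to illustrate the calculation.
last_change = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
min_part_hours = 24

print(next_ring_change(last_change, min_part_hours).isoformat())
# prints 2024-01-02T06:00:00+00:00
```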
Maintenance_bot removed a project from T382705: High amount of 503/504 for swift uploads: Patch-For-Review.
Mon, Dec 23, 1:30 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
MoritzMuehlenhoff triaged T382707: Frequent disk resets on ms-be2075 as Medium priority.
Mon, Dec 23, 1:30 PM · SRE, DC-Ops, SRE-swift-storage, ops-codfw
MoritzMuehlenhoff created T382707: Frequent disk resets on ms-be2075.
Mon, Dec 23, 1:30 PM · SRE, DC-Ops, SRE-swift-storage, ops-codfw
gerritbot added a comment to T382705: High amount of 503/504 for swift uploads.

Change #1106303 merged by BCornwall:

[operations/puppet@production] Swift: Mark ms-be2075 as failed, remove from prod

https://gerrit.wikimedia.org/r/1106303

Mon, Dec 23, 1:19 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
TheDJ renamed T382705: High amount of 503/504 for swift uploads from High amount of 503 for swift uploads to High amount of 503/504 for swift uploads.
Mon, Dec 23, 12:47 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
BCornwall added a comment to T382705: High amount of 503/504 for swift uploads.

ms-be2075 has a data link reset a few times a minute:

Mon, Dec 23, 12:45 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
TheDJ added a comment to T382705: High amount of 503/504 for swift uploads.

oh, wrong link, and wrong screenshot, I copied from the wrong browser tab :D

Mon, Dec 23, 12:44 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
gerritbot added a project to T382705: High amount of 503/504 for swift uploads: Patch-For-Review.
Mon, Dec 23, 12:43 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
gerritbot added a comment to T382705: High amount of 503/504 for swift uploads.

Change #1106303 had a related patch set uploaded (by BCornwall; author: BCornwall):

[operations/puppet@production] Swift: Remove ms-be2075 from prod hosts

https://gerrit.wikimedia.org/r/1106303

Mon, Dec 23, 12:43 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
BCornwall changed the status of T382705: High amount of 503/504 for swift uploads from Open to In Progress.
Mon, Dec 23, 12:42 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
TheDJ raised the priority of T382705: High amount of 503/504 for swift uploads from High to Unbreak Now!.
Mon, Dec 23, 12:41 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
TheDJ updated the task description for T382705: High amount of 503/504 for swift uploads.
Mon, Dec 23, 12:37 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
TheDJ added a comment to T328872: Commons: UploadChunkFileException: Error storing file: backend-fail-internal; local-swift-codfw.

Thank you for reporting @Yann. I created T382705 for this one.

Mon, Dec 23, 12:37 PM · API Platform, MediaWiki-File-management, MW-1.41-notes (1.41.0-wmf.25; 2023-09-05), Unstewarded-production-error, MediaWiki-Uploading, Wikimedia-production-error, SRE-swift-storage, Commons
TheDJ triaged T382705: High amount of 503/504 for swift uploads as High priority.
Mon, Dec 23, 12:36 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
TheDJ created T382705: High amount of 503/504 for swift uploads.
Mon, Dec 23, 12:36 PM · Data-Persistence, MediaWiki-Uploading, SRE-swift-storage
TheDJ added a comment to T382445: Check and convert SVGs on commons to have a MIME-type of image/svg+xml.

For any sort of maintenance, we either have to reset the MIME type of all SVGs or, preferably, list files on Swift by a header property and reset only the text/plain files. I don't think we have anything for that last option in MediaWiki, however, and I'm not sure Swift itself even allows that?

Mon, Dec 23, 10:49 AM · SRE-swift-storage, SVG, Commons
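On the question in the comment above: as far as I know, Swift container listings cannot be filtered server-side by content type, but each entry in a JSON container listing carries a `content_type` field, so the filtering could be done client-side. A minimal sketch, assuming the listing has already been fetched and parsed into a list of dicts; the function name and sample entries are hypothetical.

```python
def svgs_with_wrong_type(listing, wrong_type="text/plain"):
    """Return names of .svg objects whose stored content type is wrong_type.

    Any parameters after ';' (for example a charset) are ignored.
    """
    return [
        entry["name"]
        for entry in listing
        if entry["name"].lower().endswith(".svg")
        and entry.get("content_type", "").split(";")[0].strip() == wrong_type
    ]


# Hypothetical listing entries, made up for illustration.
sample_listing = [
    {"name": "a/ab/Good.svg", "content_type": "image/svg+xml"},
    {"name": "c/cd/Old.svg", "content_type": "text/plain"},
    {"name": "e/ef/Notes.txt", "content_type": "text/plain"},
]

print(svgs_with_wrong_type(sample_listing))
# prints ['c/cd/Old.svg']
```

This still requires walking every listing page of every container, so it narrows which objects get reset rather than avoiding the full scan.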
Yann added a comment to T328872: Commons: UploadChunkFileException: Error storing file: backend-fail-internal; local-swift-codfw.

This happened again while trying to upload a new version of https://commons.wikimedia.org/wiki/File:The_Lion_of_the_Moguls_-_Le_Lion_des_Mogols_(1924)_dir._Jean_Epstein.webm

04369: finalize/189> Still waiting for server to publish uploaded file
04374: FAILED: stashfailed: An unknown error occurred in storage backend "local-swift-codfw".
Mon, Dec 23, 10:28 AM · API Platform, MediaWiki-File-management, MW-1.41-notes (1.41.0-wmf.25; 2023-09-05), Unstewarded-production-error, MediaWiki-Uploading, Wikimedia-production-error, SRE-swift-storage, Commons
MatthewVernon added a project to T382694: Unable to restore File:Model 4000-First of Odakyu Electric Railway 2.JPG: Commons.

Both swift clusters do indeed have an object there - in eqiad last modified 2021-06-30, in codfw last modified 2021-01-05, and both objects are the same image.

Mon, Dec 23, 9:11 AM · Commons, SRE-swift-storage
Pppery added a project to T382694: Unable to restore File:Model 4000-First of Odakyu Electric Railway 2.JPG: SRE-swift-storage.
Mon, Dec 23, 5:19 AM · Commons, SRE-swift-storage

Sat, Dec 21

Glrx added a comment to T382445: Check and convert SVGs on commons to have a MIME-type of image/svg+xml.

My vague, ancient memory is that SVG files without an XML processing instruction used to be tagged as text/plain.

Sat, Dec 21, 5:20 PM · SRE-swift-storage, SVG, Commons

Thu, Dec 19

dancy moved T381109: Wikimedia\RequestTimeout\RequestTimeoutException: The maximum execution time of {limit} seconds was exceeded from Untriaged to Nov 2024 on the Wikimedia-production-error board.
Thu, Dec 19, 4:39 PM · SRE-swift-storage, MediaWiki-Uploading, MediaWiki-File-management, Wikimedia-production-error, Commons
dancy added a project to T381109: Wikimedia\RequestTimeout\RequestTimeoutException: The maximum execution time of {limit} seconds was exceeded: SRE-swift-storage.
Thu, Dec 19, 4:39 PM · SRE-swift-storage, MediaWiki-Uploading, MediaWiki-File-management, Wikimedia-production-error, Commons
TheDJ added a comment to T382445: Check and convert SVGs on commons to have a MIME-type of image/svg+xml.

I vaguely remember that this happened for invalid SVGs when MediaWiki did not yet supply the content type to Swift, and we instead relied on the Swift side to determine the content type at upload time.

Thu, Dec 19, 9:45 AM · SRE-swift-storage, SVG, Commons