
cscott (C. Scott Ananian)
Parser whisperer

User Details

User Since
Oct 21 2014, 6:47 PM (508 w, 2 d)
Availability
Available
IRC Nick
cscott
LDAP User
Unknown
MediaWiki User
Cscott

Editor since 2005; WMF developer since 2013. I work on Parsoid and OCG, and dabble with VE, real-time collaboration, and OOjs.

On GitHub: https://github.com/cscott

See https://en.wikipedia.org/wiki/User:cscott for more.

Recent Activity

Today

cscott added a member for Parsoid-Read-Views: cscott.
Thu, Jul 18, 2:27 PM

Yesterday

cscott added a comment to T369898: Reduce the number of resource_change and resource_purge events emitted due to template changes.

For completeness, another option is the varnish "x-key" system, which involves two research projects. The first is that the implementation of x-key in varnish appears to be incomplete, and the second is that the assignment of appropriate x-keys to URLs is non-trivial as well. There are too many templates used on a page like [[Barack Obama]] to naively assign one x-key to every recursively-included template, so we still need to come up with a mechanism to determine which of the templates deserve an x-key, likely based on purge statistics.

Wed, Jul 17, 3:33 PM · Patch-For-Review, serviceops, Performance Issue, MediaWiki-Engineering, MediaWiki-Core-HTTP-Cache, ChangeProp
cscott added a comment to T369898: Reduce the number of resource_change and resource_purge events emitted due to template changes.

Another option is to subdivide pages into two categories, "high traffic pages" and "long tail low traffic pages". The latter would be put effectively into a no-cache state: the cache lifetime would be very short, and we would never emit purges for them, relying on the natural expiration to deal with vandalism. We'd only emit purges for the high traffic pages.
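
A minimal sketch of the two-category idea above, in Python. The view-count cutoff, the TTL values, and the names `CachePolicy` and `policy_for` are all hypothetical, chosen only for illustration; nothing here reflects actual ChangeProp or CDN configuration.

```python
from dataclasses import dataclass

# All three constants are hypothetical placeholders, not real settings.
HIGH_TRAFFIC_THRESHOLD = 1000   # daily views at or above this count as "high traffic"
SHORT_TTL_SECONDS = 300         # long-tail pages: rely on natural expiration
LONG_TTL_SECONDS = 86400        # high-traffic pages: long lifetime plus explicit purges


@dataclass(frozen=True)
class CachePolicy:
    ttl_seconds: int
    emit_purge: bool


def policy_for(daily_views: int) -> CachePolicy:
    """Pick a cache policy from a page's (assumed known) daily view count."""
    if daily_views >= HIGH_TRAFFIC_THRESHOLD:
        # High-traffic page: cache for a long time and purge on change.
        return CachePolicy(LONG_TTL_SECONDS, True)
    # Long-tail page: very short lifetime, never emit a purge.
    return CachePolicy(SHORT_TTL_SECONDS, False)
```

In this sketch only pages above the cutoff would ever generate purge events; how the cutoff would be measured or chosen is left open.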

Wed, Jul 17, 3:29 PM · Patch-For-Review, serviceops, Performance Issue, MediaWiki-Engineering, MediaWiki-Core-HTTP-Cache, ChangeProp

Tue, Jul 16

cscott added a comment to T268250: Decide on a structure for galleries.

I like the <figure> inside <figure> option that @lzno proposed; it makes sense. Switching from inline to block styles makes the semantics clearer for accessibility, in terms of associating the caption with the image.

Tue, Jul 16, 2:25 PM · MediaWiki-Gallery, MW-1.40-notes (1.40.0-wmf.23; 2023-02-13), MW-1.38-notes (1.38.0-wmf.3; 2021-10-05), Parsoid, Parsoid-Media-Structure, Parsing-Active-Work

Mon, Jul 15

cscott moved T370061: Create Confidence Framework report for wikivoyage from Backlog to Current Deploy Target on the Content-Transform-Team-WIP board.
Mon, Jul 15, 3:23 PM · Parsoid-Read-Views, Content-Transform-Team-WIP
cscott added a project to T370061: Create Confidence Framework report for wikivoyage: Parsoid-Read-Views.
Mon, Jul 15, 3:23 PM · Parsoid-Read-Views, Content-Transform-Team-WIP
cscott created T370061: Create Confidence Framework report for wikivoyage.
Mon, Jul 15, 3:15 PM · Parsoid-Read-Views, Content-Transform-Team-WIP
cscott moved T368720: <code> rendering differences (visualdiff testing) from In Progress to Code Review on the Content-Transform-Team-WIP board.
Mon, Jul 15, 3:10 PM · Patch-For-Review, Content-Transform-Team-WIP, Parsoid-Read-Views, Parsoid

Thu, Jul 11

cscott merged T369282: Linter false positives caused by markup generated by <maplink> at the German Wikivoyage into T331655: Support `outputHasCoreMwDomSpecMarkup='mixed'` in Parsoid extension registration.
Thu, Jul 11, 2:26 PM · Parsoid
cscott merged task T369282: Linter false positives caused by markup generated by <maplink> at the German Wikivoyage into T331655: Support `outputHasCoreMwDomSpecMarkup='mixed'` in Parsoid extension registration.
Thu, Jul 11, 2:25 PM · Content-Transform-Team, dark-mode, Maps (Kartographer), MediaWiki-extensions-Linter
cscott added a comment to T369282: Linter false positives caused by markup generated by <maplink> at the German Wikivoyage.

We're going to work on T369454. The other issue related to whether/how linter recurses into extension-generated content I'm going to treat as a dup/subcase of T331655: Support `outputHasCoreMwDomSpecMarkup='mixed'` in Parsoid extension registration. Right now, it is desired behavior because it is catching a legitimate issue, so Not A Bug.

Thu, Jul 11, 2:24 PM · Content-Transform-Team, dark-mode, Maps (Kartographer), MediaWiki-extensions-Linter
cscott updated subscribers of T369614: Parsoid footnote marker fallback must be localized.

This is a won't fix, I believe, although I'm going to punt this to @ssastry for comment when he returns from sabbatical. We shouldn't really be emitting fallback text at all, and to the extent that we do it is desirable that it is consistent for machine-readability of citation markup. No humans should see that.

Thu, Jul 11, 2:18 PM · Parsoid (Tracking), WMDE-References-FocusArea
cscott assigned T369614: Parsoid footnote marker fallback must be localized to ssastry.
Thu, Jul 11, 2:18 PM · Parsoid (Tracking), WMDE-References-FocusArea

Wed, Jul 10

cscott added a comment to T369719: Decide how to use use StatsFactory inside Parsoid.

StatsFactory in particular is the thing we want to use, and it ends up depending on all the metric classes implicitly through its method signatures.

Wed, Jul 10, 2:02 PM · MediaWiki-Engineering, MediaWiki-libs-Stats

Tue, Jul 9

cscott added a comment to T368722: Missing edit-source button rendered in parsoid read views (from visualdiff testing).

Could we change both Parsoid and the legacy parser to agree that === === is not a valid heading? That would avoid the whole "blank ID" issue.
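
To illustrate the kind of check this would imply, here is a toy sketch in Python. The regular expression and the name `is_blank_heading` are assumptions made for illustration; this is not the grammar used by Parsoid or the legacy parser, and it ignores edge cases such as a line made only of `=` signs.

```python
import re

# Toy pattern, NOT the real wikitext grammar: 1-6 '=' signs, some content,
# then the same number of '=' signs, optionally followed by spaces or tabs.
HEADING_RE = re.compile(r"^(={1,6})(.*?)\1[ \t]*$")


def is_blank_heading(line: str) -> bool:
    """Return True if the line looks like a heading with only whitespace inside."""
    m = HEADING_RE.match(line)
    return m is not None and m.group(2).strip() == ""
```

Under this toy rule, `=== ===` would be flagged as blank and could be rejected as a heading, while `== Title ==` would not.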

Tue, Jul 9, 2:52 PM · Content-Transform-Team-WIP, Parsoid-Read-Views (Phase 1 - DiscussionTools support), Patch-For-Review, Parsoid

Jun 17 2024

cscott added a comment to T359002: Lead paragraph is not hoisted in new Parsoid HTML.

For what it's worth, the new OutputTransform pipeline will (hopefully) maintain the document in DOM form throughout the transformation pipeline, eliminating the repeated HTML parsing steps. That's T347062: Create HtmlHolder interface.

Jun 17 2024, 4:19 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Web Team Essential Work 2024, Parsoid (Tracking)

Jun 14 2024

cscott created T367616: Rename data-mw.attrs to extAttrs to avoid confusion with data-mw.attribs.
Jun 14 2024, 10:36 PM · Content-Transform-Team-WIP, Essential-Work, Parsoid
cscott created T367584: After an appropriate time, drop `_complex_` from ParserOutput serialization.
Jun 14 2024, 5:06 PM · MediaWiki-Parser
cscott committed rMLJC5901928e7cfd: Gracefully handle embedded objects passed to ::newFromJsonArray().
Gracefully handle embedded objects passed to ::newFromJsonArray()
Jun 14 2024, 6:03 AM
cscott committed rMLJCe7773c6c39cd: Release v3.0.1.
Release v3.0.1
Jun 14 2024, 6:03 AM
cscott committed rMLJC55603d078daa: Update HISTORY.md after release.
Update HISTORY.md after release
Jun 14 2024, 6:03 AM
cscott committed rMLJC3075e454315f: Refactor JsonCodec::codecFor() to be non-recursive.
Refactor JsonCodec::codecFor() to be non-recursive
Jun 14 2024, 6:01 AM
cscott updated the task description for T367471: CTT tasks week of 2024-06-14.
Jun 14 2024, 12:55 AM · MW-1.43-notes (1.43.0-wmf.10; 2024-06-18), Essential-Work, Content-Transform-Team-WIP
cscott updated subscribers of T367471: CTT tasks week of 2024-06-14.

We have a few issues with regression testing:

  • Out of disk space. I ran the script in https://wikitech.wikimedia.org/wiki/Parsoid/Common_Tasks#Freeing_disk_space but it failed halfway through (just after "Dumping results table") with a permissions error trying to write results.sql.gz. I was in my homedir and I'm pretty sure the script was running as my user, so I don't know what that was about. At any rate, it had freed up enough space by that time to successfully start rt testing, so I didn't investigate further.
  • Between 7:30am-8am EST on Jun 13 we had a server issue, resulting in a large number of the following in the logs:
Zmqsmfn7_ueX-TT4hLc5ZwAAAII] /w/rest.php/fr.wikipedia.org/v3/page/wikitext/Badis_badis   Wikimedia\Rdbms\DBUnexpectedError: Database servers in cluster26 are overloaded. In order to protect application servers, the circuit breaking to databases of this section have been activated. Please try again a few seconds.

This resulted in a large number of 500 failures. With @Arlolra's advice I used the instructions in https://www.mediawiki.org/wiki/Parsing/Visual_Diff_Testing#Retesting_a_subset_of_titles to purge the failing files:

$ mysql -u testreduce -p testreduce
>update pages set claim_hash="",claim_num_tries=0, claim_timestamp=null,latest_stat=null,latest_result=null,latest_score=0,num_fetch_errors=0 where latest_score=1000000;

which (a) affected 166 rows, and (b) resulted in a huge spike in load average on the machine, as apparently all of those pages got requeued at once:

Top - 17:42:05 up 134 days,  7:20,  1 user,  load average: 102.97, 76.62, 43.64
Tasks: 148 total,  38 running, 110 sleeping,   0 stopped,   0 zombie

But eventually things settled down, although some of those pages failed with timeouts again and I had to repeat the sql command to requeue a handful of them for a third time.

Jun 14 2024, 12:37 AM · MW-1.43-notes (1.43.0-wmf.10; 2024-06-18), Essential-Work, Content-Transform-Team-WIP
cscott updated the task description for T367471: CTT tasks week of 2024-06-14.
Jun 14 2024, 12:29 AM · MW-1.43-notes (1.43.0-wmf.10; 2024-06-18), Essential-Work, Content-Transform-Team-WIP

Jun 13 2024

cscott added a subtask for T361013: Update lint tables independently of changeprop/restbase: T361413: Parsoid should perhaps use LinkUpdate job for lints instead of special Linter API/Hook.
Jun 13 2024, 2:33 PM · MW-1.43-notes (1.43.0-wmf.9; 2024-06-11), RESTBase Sunsetting, Content-Transform-Team
cscott added a parent task for T361413: Parsoid should perhaps use LinkUpdate job for lints instead of special Linter API/Hook: T361013: Update lint tables independently of changeprop/restbase.
Jun 13 2024, 2:33 PM · Content-Transform-Team-WIP, MediaWiki-extensions-Linter
cscott added a comment to T365847: Add "JSON with custom renderer" content model.

This seems related to the idea of "page data contexts" which has been floating around (eg T122934#9196348), but as a sort of inverted case -- instead of "an article page has a data context attached", this proposal is closer to "a data page with a presentation context attached". I think both are interesting concepts that I'd like to explore, but we don't have anything similar on our current roadmap. @daniel's proposal is workable and is very similar to the "article page with a data context attached", in that if you could associate a scribunto module with a Title, you could execute that scribunto module to generate the 'data context'. But here we want to execute the scribunto module to generate the article page... generating wikitext from lua is something I'm not a huge fan of. See [[Extension:ArrayFunctions]], which would be a way to generate the wikitext from tabular data.

Jun 13 2024, 2:22 PM · Content-Transform-Team, MediaWiki-Engineering, MediaWiki-Parser, MediaWiki-ContentHandler

Jun 12 2024

cscott committed rMLJC3bc8f56250e8: Release v3.0.0.
Release v3.0.0
Jun 12 2024, 8:17 PM
cscott committed rMLJC797c723056fa: Update HISTORY.md after release.
Update HISTORY.md after release
Jun 12 2024, 8:17 PM
cscott added a comment to T366083: Parser inconsistency when a <hX> tag has an id attribute.

Would be ideal if legacy could match Parsoid...

Jun 12 2024, 8:17 PM · Patch-For-Review, Parsoid, MediaWiki-Parser
cscott committed rMLJC49efc8b73bfe: New class hint suffixes `+` and `-` to control brace selection in output.
New class hint suffixes `+` and `-` to control brace selection in output
Jun 12 2024, 8:13 PM
cscott committed rMLJC076953830dda: Rewrite class hint suffix system to use Hint objects.
Rewrite class hint suffix system to use Hint objects
Jun 12 2024, 8:13 PM
cscott added a comment to T171398: On mobile domain, interwiki links for WMF wikis should be resolved as mobile rather than desktop.

With the resolution of T365483: Links from mobile go back to desktop host when using Parsoid Read Views this bug has been resolved for Parsoid read views, which means it will eventually get resolved on all wikis as we phase out legacy read views.

Jun 12 2024, 7:29 PM · Wikimedia-Interwiki-links, Mobile, MediaWiki-Interwiki
cscott added a comment to T278482: Add message parameter types for user groups, PageIdentity, LinkTarget, and UserIdentity..

We deprecated the 'object' parameter type used by this in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1036758 as it was unused in practice.

Jun 12 2024, 7:28 PM · Patch-For-Review, MW-1.38-notes (1.38.0-wmf.12; 2021-12-06), User-ArielGlenn, MW-1.36-notes (1.36.0-wmf.38; 2021-04-06), MediaWiki-Internationalization, Platform Team Workboards (MW Expedition)
cscott added a project to T359761: Create a parser function to get the direction of a language or script: User-notice.
Jun 12 2024, 7:15 PM · User-notice, Patch-For-Review, RTL, MediaWiki-Internationalization, I18n
cscott added a project to T366623: Create a parser function to get the BCP47 code for a language: User-notice.
Jun 12 2024, 7:15 PM · User-notice, Patch-For-Review, MediaWiki-Parser, MediaWiki-Internationalization
cscott added a comment to T361081: Fix Special:LintErrors to use standard Title selector widget.

Linking bugs up: the original issue was T360865: Slow query in Special:LintErrors. Let's just make sure we're not regressing that bug.

Jun 12 2024, 5:58 PM · Patch-For-Review, MediaWiki-extensions-Linter

Jun 11 2024

cscott added a comment to T269499: [Epic] Make MobileFrontend compatible with Parsoid HTML.

T359002: Lead paragraph is not hoisted in new Parsoid HTML is currently stalling further visualdiff testing on the CTT side.

Jun 11 2024, 3:24 PM · Web-Team-Backlog (Needs Prioritization (Tech)), Web Team Essential Work 2024, Epic, Parsoid (Tracking), MobileFrontend
cscott updated subscribers of T359002: Lead paragraph is not hoisted in new Parsoid HTML.

Addressing this issue is high-ish priority for CTT because we find that inconsistent treatment of the lead paragraph between legacy and Parsoid tends to have knock-on effects with positioning that then throw off visualdiff results all the way down the page. So basically we find it hard to discover any /other/ issues with the MFE+Parsoid rendering because the noise from the lead section drowns out everything else.

Jun 11 2024, 3:23 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Web Team Essential Work 2024, Parsoid (Tracking)
cscott added a comment to T182041: Display math generates div inside of paragraph (HTML5 violation).

Seems reasonable. Be careful not to put any HTML block-level content (<p> or <div> tags, eg, which could get hidden inside a caption or other metadata) inside your <span>, but otherwise using CSS to get the block-level display seems to work fine from our perspective.

Jun 11 2024, 3:19 PM · MW-1.43-notes (1.43.0-wmf.10; 2024-06-18), RESTBase Sunsetting, Content-Transform-Team-WIP, MW-1.41-notes (1.41.0-wmf.3; 2023-04-03), HTML5, Math
cscott created T367141: Replace stdError with Error class inside DataMw.
Jun 11 2024, 5:10 AM · Essential-Work, Parsoid

Jun 10 2024

cscott created T367109: Create and use DataMwPart class.
Jun 10 2024, 6:35 PM · Essential-Work, Content-Transform-Team-WIP, Patch-For-Review, Parsoid, Technical-Debt
cscott created T367093: JsonCodec always serializes objects as `{ ... }`.
Jun 10 2024, 5:18 PM · JsonCodec
cscott moved T367074: Deprecate and remove ParsoidOutputAccess from In Progress to To Deploy on the Content-Transform-Team-WIP board.
Jun 10 2024, 3:45 PM · MW-1.43-notes (1.43.0-wmf.12; 2024-07-02), Parsoid, Patch-For-Review, Content-Transform-Team-WIP, Essential-Work
cscott moved T367074: Deprecate and remove ParsoidOutputAccess from Backlog to In Progress on the Content-Transform-Team-WIP board.
Jun 10 2024, 3:45 PM · MW-1.43-notes (1.43.0-wmf.12; 2024-07-02), Parsoid, Patch-For-Review, Content-Transform-Team-WIP, Essential-Work
cscott added projects to T367074: Deprecate and remove ParsoidOutputAccess: Essential-Work, Content-Transform-Team-WIP.
Jun 10 2024, 3:44 PM · MW-1.43-notes (1.43.0-wmf.12; 2024-07-02), Parsoid, Patch-For-Review, Content-Transform-Team-WIP, Essential-Work
cscott created T367074: Deprecate and remove ParsoidOutputAccess.
Jun 10 2024, 3:44 PM · MW-1.43-notes (1.43.0-wmf.12; 2024-07-02), Parsoid, Patch-For-Review, Content-Transform-Team-WIP, Essential-Work
cscott assigned T365413: Dollar sign as first character of a Wikipedia heading produces backslash-dollar in the table of contents to ihurbain.
Jun 10 2024, 3:41 PM · MW-1.43-notes (1.43.0-wmf.10; 2024-06-18), Essential-Work, Content-Transform-Team-WIP, MediaWiki-Parser
cscott claimed T366395: CommonsMetadata DataCollector::verifyAttributeMetadata should use ParserOutput::getRawText().
Jun 10 2024, 3:10 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Content-Transform-Team-WIP, CommonsMetadata

Jun 7 2024

cscott committed rMLBCabf6e8f560c9: Add homepage link to composer.json authors list.
Add homepage link to composer.json authors list
Jun 7 2024, 9:47 PM
cscott committed rMLIDc2ea49857fac: Add homepage link to composer.json authors list.
Add homepage link to composer.json authors list
Jun 7 2024, 9:37 PM
cscott committed rMLJC60f716e7f6c3: Add homepage link to composer.json authors list.
Add homepage link to composer.json authors list
Jun 7 2024, 8:37 PM
cscott added a comment to T36217: Rename emlwiki -> eglwiki.

Note also that LanguageConverter might be an appropriate solution to allow egl and rgn to co-exist on the same wiki.

Jun 7 2024, 4:49 PM · Patch-For-Review, Wiki-Setup (Rename), Wikimedia-Language-setup

Jun 6 2024

cscott added a comment to T366808: CTT tasks week of 2024-06-07.

Created a "midweek release" to break a Kartographer dependency based on a78e700def47f6733e0d81ea3dca3f566cfb5373, which Isabelle has started rt testing on.

Jun 6 2024, 2:48 PM · MW-1.43-notes (1.43.0-wmf.9; 2024-06-11), Essential-Work, Content-Transform-Team-WIP
cscott updated the task description for T366408: CTT tasks week of 2024-05-31.
Jun 6 2024, 2:44 PM · MW-1.43-notes (1.43.0-wmf.9; 2024-06-11), Essential-Work, Content-Transform-Team-WIP
cscott created T366808: CTT tasks week of 2024-06-07.
Jun 6 2024, 2:43 PM · MW-1.43-notes (1.43.0-wmf.9; 2024-06-11), Essential-Work, Content-Transform-Team-WIP
cscott added a project to T363711: <mw:editsection> markup visible on a specific page on Wikisource when using Parsoid: ProofreadPage.
Jun 6 2024, 2:24 PM · ProofreadPage, Parsoid-Read-Views
cscott added a project to T358818: Check MassMessages for Linter errors before sending: Editing-team.
Jun 6 2024, 2:14 PM · MW-1.43-notes (1.43.0-wmf.11; 2024-06-25), Essential-Work, Content-Transform-Team-WIP, Editing-team, MassMessage, MediaWiki-extensions-Linter
cscott added a comment to T358818: Check MassMessages for Linter errors before sending.

No one owns MassMessage, apparently. :(

Jun 6 2024, 2:10 PM · MW-1.43-notes (1.43.0-wmf.11; 2024-06-25), Essential-Work, Content-Transform-Team-WIP, Editing-team, MassMessage, MediaWiki-extensions-Linter
cscott updated subscribers of T366305: Consider splitting ContentHandler's fillParserOutput method into "fillHTMLParserOutput" and "fillMetadataParserOutput" or similar.

Tagging @daniel on this. In various places we use a null ParserOutput::getRawText() to indicate the "no request html" case. I could use some more context, though -- what are some of the confusion & bugs related to this? We really can't generate metadata output without generating HTML, so I've always been a bit hazy on what sort of optimization we're aiming for here. If we're trying to respond to an HTTP HEAD request (eg) the "metadata" we're talking about is pretty distinct from the "metadata" usually associated with the ParserOutput (page properties, categories, etc) -- they overlap only in cache-related properties, and again (unfortunately) we have no way of computing the cache metadata without doing a full parse of the page.

Jun 6 2024, 2:08 PM · MediaWiki-Engineering, Content-Transform-Team, MediaWiki-Parser

Jun 4 2024

Tacsipacsi awarded T366623: Create a parser function to get the BCP47 code for a language a Like token.
Jun 4 2024, 7:48 PM · User-notice, Patch-For-Review, MediaWiki-Parser, MediaWiki-Internationalization
cscott created T366623: Create a parser function to get the BCP47 code for a language.
Jun 4 2024, 4:54 PM · User-notice, Patch-For-Review, MediaWiki-Parser, MediaWiki-Internationalization

Jun 3 2024

cscott committed rMLJCdaa7445f98d0: Ensure class hints work correctly even when class_alias is used.
Ensure class hints work correctly even when class_alias is used
Jun 3 2024, 9:41 PM
cscott committed rMLJC41ea138f35f4: Release v2.2.3.
Release v2.2.3
Jun 3 2024, 9:41 PM
cscott committed rMLJCb0b499d77c57: Update HISTORY.md after release.
Update HISTORY.md after release
Jun 3 2024, 9:41 PM
cscott updated subscribers of T319053: Parsoid doesn't support T34189 (interwiki link with localized title).

@ihurbain This is probably a task which your i18n fragments can now fix?

Jun 3 2024, 4:57 PM · MW-1.40-notes (1.40.0-wmf.14; 2022-12-12), Patch-For-Review, Parsoid-Read-Views (Phase 2 - testwiki Main namespace support), Parsoid
cscott moved T358950: Local interlanguage links don’t work with Parsoid read views from In Progress to Code Review on the Content-Transform-Team-WIP board.
Jun 3 2024, 3:28 PM · Patch-For-Review, Parsoid-Read-Views (Phase 1 - DiscussionTools support), Content-Transform-Team-WIP, Wikimedia-Interwiki-links, Parsoid
cscott moved T355099: Difference in paragraph wrapping after transclusion end from Current Deploy Target to Code Review on the Content-Transform-Team-WIP board.
Jun 3 2024, 3:24 PM · Patch-For-Review, Parsoid-Read-Views, Content-Transform-Team-WIP, Parsoid
cscott reassigned T293512: ParserOutput::getText() should be removed from ParserOutput from cscott to ihurbain.
Jun 3 2024, 3:19 PM · Patch-For-Review, Parsoid-Read-Views (Phase 1 - DiscussionTools support), Content-Transform-Team-WIP, MediaWiki-Parser, Parsoid
ihurbain awarded T366395: CommonsMetadata DataCollector::verifyAttributeMetadata should use ParserOutput::getRawText() a Fox token.
Jun 3 2024, 7:33 AM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Content-Transform-Team-WIP, CommonsMetadata

May 31 2024

cscott added a comment to T365433: Cannot save VisualEditor content in File namespace.

Strong hypothesis: this happens when CommonsMetadata is involved in the rendering of the page.
Specifically, the DataCollector::verifyAttributionMetadata runs getText, which on VE content is considered Parsoid content, which converts back and forth between content and PageBundle, which loses the version number during that operation.

This is probably also a main path for T365036.

May 31 2024, 10:12 PM · MW-1.43-notes (1.43.0-wmf.11; 2024-06-25), Essential-Work, Content-Transform-Team-WIP, Parsoid, VisualEditor, Multi-Content-Revisions
cscott created T366395: CommonsMetadata DataCollector::verifyAttributeMetadata should use ParserOutput::getRawText().
May 31 2024, 10:10 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Content-Transform-Team-WIP, CommonsMetadata

May 29 2024

cscott added a comment to T366142: REST API /transform/ endpoint api-testing tests break in Wikibase gate-and-submit.
{"reason":"The given page ([0:TransformSource_0ZbZaRyLAW]) does not belong to page ID 383 but actually belongs to 425",

Is this legit? Is the root cause here that we're reusing a page ID and/or a title?

May 29 2024, 4:16 PM · MW-1.43-notes (1.43.0-wmf.9; 2024-06-11), MediaWiki-REST-API, ci-test-error (WMF-deployed Build Failure), Wikidata

May 24 2024

cscott updated the task description for T365808: "Browser search" across related/split articles.
May 24 2024, 3:08 PM · MediaWiki-General, All-and-every-Wikisource
cscott added a comment to T365806: Infinite scroll for articles (split documents on wikisource).

Noting that implementing infinite scrolling will effectively result in a decline of T325607 since infinite scrolling is effectively unsupported by Google bot. :(

May 24 2024, 2:38 PM · MediaWiki-General, All-and-every-Wikisource
cscott added a comment to T365812: Generalized stable link mechanism to both *page* and *section*.

Yes, but the solution is specific to DT. This is an attempt to generalize that solution. I plan on chatting with the editing team next week to figure out to what extent their existing code is generalizable. My understanding is that it is pretty tightly tied to the way they look for "new comments" and do notifications; a generalized feature might need a way for an editor to manually make/fix a particular association between page and section rather than try to magically infer the correct link the way that DT does.

May 24 2024, 2:27 PM · All-and-every-Wikisource
cscott added a comment to T365806: Infinite scroll for articles (split documents on wikisource).

These are proposals/help-wanted for now. The WMF annual planning process also deals better with discrete features/interventions, so if we can collaboratively hammer out the mechanics here it makes it easier to insert into planning to get resources allocated.

May 24 2024, 2:22 PM · MediaWiki-General, All-and-every-Wikisource
cscott added a comment to T275319: Change $wgMaxArticleSize limit from byte-based to character-based.

Please comment on the specific tasks, your concerns are addressed there. But also feel free to suggest other solutions! The point is that @Fuzzy's list of desiderata is an excellent one, and we should continue to work on finding solutions to those issues that don't necessarily involve increasing article size ad infinitum. There are plenty of large topics that could benefit from UX improvements to better tie together a collection of separate pages.

May 24 2024, 2:19 PM · LPL Technical Support, Patch-For-Review, serviceops, SRE, Wikimedia-Site-requests
cscott added a comment to T275319: Change $wgMaxArticleSize limit from byte-based to character-based.

I've created the following five feature requests as strawdog proposals to address each of @Fuzzy's five concerns. Feel free to poke holes in my proposed solutions, suggest improvements or alternatives, etc. But I want to keep the focus on making split documents work better, since we're going to continue to butt up against article size limits.

May 24 2024, 1:36 PM · LPL Technical Support, Patch-For-Review, serviceops, SRE, Wikimedia-Site-requests
cscott created T365819: Shared citations for multiple pages.
May 24 2024, 1:33 PM · Cite
cscott updated the task description for T365812: Generalized stable link mechanism to both *page* and *section*.
May 24 2024, 1:16 PM · All-and-every-Wikisource
cscott created T365812: Generalized stable link mechanism to both *page* and *section*.
May 24 2024, 1:13 PM · All-and-every-Wikisource
cscott updated the task description for T365806: Infinite scroll for articles (split documents on wikisource).
May 24 2024, 1:04 PM · MediaWiki-General, All-and-every-Wikisource
cscott added a parent task for T365806: Infinite scroll for articles (split documents on wikisource): T365810: Export a collection of pages as a single document (PDF, HTML, printable) *client-side*.
May 24 2024, 1:04 PM · MediaWiki-General, All-and-every-Wikisource
cscott added a subtask for T365810: Export a collection of pages as a single document (PDF, HTML, printable) *client-side*: T365806: Infinite scroll for articles (split documents on wikisource).
May 24 2024, 1:03 PM · WS Export, Community-Tech, All-and-every-Wikisource
cscott created T365810: Export a collection of pages as a single document (PDF, HTML, printable) *client-side*.
May 24 2024, 1:03 PM · WS Export, Community-Tech, All-and-every-Wikisource
cscott updated the task description for T365806: Infinite scroll for articles (split documents on wikisource).
May 24 2024, 12:55 PM · MediaWiki-General, All-and-every-Wikisource
cscott created T365808: "Browser search" across related/split articles.
May 24 2024, 12:53 PM · MediaWiki-General, All-and-every-Wikisource
cscott created T365806: Infinite scroll for articles (split documents on wikisource).
May 24 2024, 12:47 PM · MediaWiki-General, All-and-every-Wikisource
cscott added a comment to T365803: Turn Interwiki Map page on meta into JSON.

The interwiki map is (in theory) generated by Extension:Interwiki, which could probably be convinced to export its table as a JSON file as well. It seems like https://noc.wikimedia.org/conf/interwiki.php.txt is actually the target output; I don't see why that can't be generated as an optional export by Extension:Interwiki as well.

May 24 2024, 12:41 PM · MediaWiki-Platform-Team (Radar), MW-1.43-notes (1.43.0-wmf.10; 2024-06-18), MediaWiki-extensions-WikimediaMaintenance, WMF-General-or-Unknown

May 23 2024

cscott updated the task description for T365746: Deploy mni-Beng language converter on mniwiki.
May 23 2024, 5:42 PM · MediaWiki-Language-converter
cscott added a parent task for T313883: Some terms are transliterated in Bengali instead of Meetei Mayek in Wikidata. : T365746: Deploy mni-Beng language converter on mniwiki.
May 23 2024, 5:40 PM · MW-1.43-notes (1.43.0-wmf.14; 2024-07-16), Patch-For-Review, MediaWiki-extensions-CLDR, Wikidata
cscott added a parent task for T357853: LanguageConverter for mni-Beng (Support for both `Beng` script as well as `Mtei` scipt in `mniwiki`): T365746: Deploy mni-Beng language converter on mniwiki.
May 23 2024, 5:40 PM · MW-1.43-notes (1.43.0-wmf.14; 2024-07-16), Wikimedia-Hackathon-2024, MediaWiki-Language-converter
cscott added subtasks for T365746: Deploy mni-Beng language converter on mniwiki: T357853: LanguageConverter for mni-Beng (Support for both `Beng` script as well as `Mtei` scipt in `mniwiki`), T313883: Some terms are transliterated in Bengali instead of Meetei Mayek in Wikidata. .
May 23 2024, 5:39 PM · MediaWiki-Language-converter
cscott created T365746: Deploy mni-Beng language converter on mniwiki.
May 23 2024, 5:39 PM · MediaWiki-Language-converter
cscott renamed T357853: LanguageConverter for mni-Beng (Support for both `Beng` script as well as `Mtei` scipt in `mniwiki`) from Support for both `Beng` script as well as `Mtei` scipt in `mniwiki` to LanguageConverter for mni-Beng (Support for both `Beng` script as well as `Mtei` scipt in `mniwiki`).
May 23 2024, 5:36 PM · MW-1.43-notes (1.43.0-wmf.14; 2024-07-16), Wikimedia-Hackathon-2024, MediaWiki-Language-converter
cscott added a comment to T328695: Parsoid's Cite output could break gadgets, bots, user scripts.

For completeness, @subbu has proposed another alternative to #3 and #4 above, which I'll call "#3.5": add class="reference-text mw-reference-text" to both legacy and Parsoid output; that is, have both parsers emit both classes.

May 23 2024, 4:53 PM · MW-1.43-notes (1.43.0-wmf.9; 2024-06-11), MW-1.41-notes (1.41.0-wmf.18; 2023-07-18), Cite, Content-Transform-Team-WIP, Parsoid-Read-Views (Phase 1 - DiscussionTools support), Parsoid
cscott added a comment to T275319: Change $wgMaxArticleSize limit from byte-based to character-based.

So, having written the above two patches to replace byte-size limits with character-size limits, let me set out why I'm afraid this is a Bad Idea and I've done a Bad Thing by even bringing up the possibility of a technical fix here. Sorry:

  • Solving the five tasks outlined by @Fuzzy in T275319#9818815 above would be a much better solution to the root cause problems here, and provide features generally useful to a number of projects outside this one particular case:
    • Contextual Understanding: solutions to this might involve the sort of 'infinite scroll' UX common on the web, where we could chain separate pages together to provide a seamless reading experience. This could work on any number of pages where there is both a hierarchical or other structure as well as a suggested reading order (API docs, historical events, etc) T365806
    • Internal and External References: the Cite extension could be improved to better handle this case. Having citations stored in a separate page or subpage and then referenced in a way that would allow a final "references" section to combine them would be generally useful. I know the "citations in wikidata" folks are also thinking in this general direction. Other wikisources have also solved this problem with templates which manually insert links to a final end-notes section. Improving support for this sort of pattern would be great. T365819 is one proposal.
    • Link Integrity: Again a super general problem which would benefit from work. We have permalinks and a link shortener, and there are templates like {{anchor}} as well. DiscussionTools has also done some work on maintaining links when sections are renamed and the page title changes (in DT's case, existing topics moved to archive pages). We just need to put the pieces together into a standard solution that lets you write an anchor to a specific section of a page which is robust against both section renaming and the section being moved to a different title. T365812
    • Unsearchable Split Texts: Just bind Ctrl-F to a search query that includes all pages in a certain category or that share a page prefix. @Tgr had some ideas here, apparently the implementation is not hard at all. T365808
    • Exporting Issues: This was the Collection extension's entire raison d'être. It has suffered from neglect and a lack of maintainers -- as well as from DoS issues which touch on common problems we've discussed here. Creating a PDF for a book containing hundreds of pages of content is computationally expensive. Some way of providing that feature while also protecting it from abuse is needed, but this is something which could 100% be built in Wikimedia Labs. T365810
  • Increasing size limits is a one-way ratchet. Once articles of increased size are allowed through and stored in the database, it is really hard to get them back out. For better or worse, MediaWiki's article size limits were built around preventing overly-large articles from being stored in the first place, and the code to deal with over-limit articles which are already coming from the database is comparatively immature. I'd like to say "we'll try out larger limits on wiki X for a while, and if this leads to problems (with resource consumption, DoS attacks, etc) then we'll just bump them back to what they were" but that is, unfortunately, not straightforward from an SRE perspective. We'd need a rollback strategy Just In Case before we actually deployed something like this.
  • @Krinkle mentioned that there are various situations where we *do* actually want/need to know specific byte size limits on different things. The particular approach in the patch above limits byte size to 4x the character limit, due to the way that UTF-8 works, but that's not a particularly tight bound, and there are extensions of UTF-8 which permit 5- or 6-byte characters as well. You could consider counts based on PHP's grapheme_strlen but that could lead to even larger bytes-per-grapheme counts. We would probably want a combination of byte- and character-based limits just to ensure some amount of predictability.
  • As elaborated at length above, these patches are only a band-aid, and it is a near certainty that new source texts will be found which violate any newly-raised limits. Solving @Fuzzy's five tasks would be a permanent solution.
May 23 2024, 3:52 PM · LPL Technical Support, Patch-For-Review, serviceops, SRE, Wikimedia-Site-requests

May 22 2024

cscott added a comment to T189108: Increase the « Post‐expand include size » process up to 2.5 MB.

This is a near-dup of T275319 and further discussion should take place on that ticket. It's not *entirely* the same thing, so I'm not going to formally close this as a dup, but the arguments being made for and against are almost identical so it really doesn't serve much of a point to have that discussion in two different places.

May 22 2024, 6:37 PM · Performance Issue, MediaWiki-Templates, MediaWiki-General