Content Transform Team/Weekly Updates
Week of Feb 21, 2022
Parsoid integration with core
- First draft of setFunctionHook support ready for review https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/761494
- ParserOutput-compatible support in Parsoid is close to landing
- In preparation for 1.38, and to move the Parsoid and core integration along, we're migrating all of Parsoid's extension/* code to the MediaWiki core repo
Maps
- Collaborating with WMDE folks on refactoring kartotherian
- By the end of the week we will mirror requests to eqiad
mobile-html services
- Mobile Preview on hold until next week
Week of Feb 14, 2022
Parsoid integration with core
- First draft of setFunctionHook support in https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/761494
- Some additions needed to the ContentMetadataCollector in Parsoid (https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/761996, https://gerrit.wikimedia.org/r/c/mediawiki/core/+/762008)
Extension Updates
- Linter
  - Fixes submitted for review
  - Adding the namespace database column is nearly done; adding the Tag and Template columns should begin
Performance
- Patch for benchmarking ready for review (T272331)
Maps
- Maps tile pregeneration is throwing errors (https://phabricator.wikimedia.org/T301664)
  - Mitigation in place
Week of Feb 7, 2022
Parsoid integration with core
- Started work on supporting setFunctionHook and bridging Parsoid's Frame object with Parser.php's PPFrame_* object
Extension Updates
- Linter: further work on adding Template and Tag column
Performance
- Further work on roundtrip testing (T272331)
Maps
- Got the green light to switch Maps traffic between clusters
Week of Jan 10, 2022
Media Output changes in core
- inline-media-captions lint turned out to have been a bad idea - disabling it for now
Parsoid integration with core
- Fixes to the signature of addModules/addModuleStyles landed in core
Performance
- Tim's patches from November rolled out on this week's train
Maps
- A bit of firefighting with T299216
Week of Jan 3, 2022
Media Output changes in core
- Linter category for inline captions deployed
  - https://www.mediawiki.org/wiki/Special:LintErrors/inline-media-caption
  - Still waiting on Parsoid to start populating the category though
- Added "resource" attribute to img tags
Parsoid integration with core
- Strip State Handling issues to resolve: T299103
- ContentMetadataCollector interface being implemented in core
Extension Updates
- CI fixes needed to run Parsoid wt2wt and other test modes in extension repositories
- Hiero
Maps
- Supporting WMDE on beta cluster issues
- Bugs reported with borders; fixes are being worked on
Week of Dec 13, 2021
Media Output changes in core
- Linter category for inline captions created and merged. Help page for the category created.
- "resource" attribute being added to img tags to fix T292657; matches Parsoid
  - Concern about bloating HTML payload. See T297984
Extension Updates
- Translate
  - Parsoid changes rolled out to production as part of wmf.13 train.
- InputBox
  - Proof of Concept patch. Blocked on missing support in ParsoidExtensionAPI.
Performance
- All roundtrip regressions from Tim's patch have been fixed and tested. Will roll out to production in Jan
Maps
- Maps 2.0 stack has been rolled out to all wikis
- Last minute issue with overzoom fixed
- Swift backpressure issue
  - When tile pregeneration parallelism is >= 5 workers, cache latency increases not just for pregeneration but for all cache-related operations
  - To be investigated.
Week of Dec 6, 2021
Media Output changes in core
- T287965: Print styles are fixed
- Inline images & alt text handling: See T297443
Parsoid integration with core
- Exploring adding setFunctionHook support to Parsoid - related Parsoid SiteConfig fix along the way
Extension Updates
- Linter
  - Patch to display all lints for a single page is in Gerrit
- Translate
  - Split deployment into two pieces. With wmf.12, only html->wt support was introduced to add forward compatibility for Parsoid HTML version 2.4.0. Translate support will be rolled out on the next train.
  - wmf.12 train got rolled back
- InputBox
  - First proof of concept patch is in Gerrit; progress now requires discussion about ParsoidExtensionAPI
- SyntaxHighlight
  - Exploration of why SyntaxHighlight cares about strip state is in Phabricator. Parsoid's behaviour is more reasonable overall, but a temporary workaround might be needed to deal with Scribunto's use of this mechanism.
Performance
- Tim's last patch merged and sent to rt testing - regressions found and need to be investigated and fixed before it can be deployed (in the new year)
Visual Diffing
- Regressions in wmf.12 in image layout (bottom border) between core & Parsoid. Almost 5% drop in test pages without rendering diffs
  - This shows up readily in visual diff testing, but the impact on wikis is subtle and will mostly go unnoticed by readers and editors.
  - Regression has been fixed in core and merged - will ride the next train.
Maps
- Maps 2.0 stack has been rolled out to frwiki - no complaints and everything stable
Everything else
- Filed T297259 for ServiceOps to run some perf benchmarking for us with newer hardware to estimate what hardware changes might be beneficial when Parsoid is used for read views on all wikis
- C.Scott (with Subbu's input) presented updates from the Parsoid / wikitext parsing world at SWMCon 2021
- WIP to look at better CI and parsertests support for extensions that are updated to work natively with Parsoid APIs
Week of Nov 29, 2021
Extension Updates
- Translate
  - Annotations support rolling out to production in next week's train
- Linter
  - All lints for a single page patch nearing completion
- SyntaxHighlight
  - Initial explorations to have it work with Parsoid's Extension API directly
Maps
- Follow-up patch regarding label cut on Tegola still to be deployed
mobile-html Services
- Issue with graphs came up on Phabricator: T285093
Week of Nov 22, 2021
Parsoid integration with core
- ContentMetadataCollector interface: Basic patch merged in Parsoid
Performance
- Tim's patch for autoInserted* flag detection via Remex cannot be merged until the new train rolls out to production and updates the Remex version on scandium
Week of Nov 15, 2021
Parsoid integration with core
- First phase of ContentMetadataCollector should land this week (just a few methods left to audit) - might be underwhelming since most of the 'exciting' methods got punted to phab tickets
Extension Updates
- Translate
  - RT testing showed a few issues, most of them corner-case-y; all the ones we found are either filed in Phabricator or need to be fixed on the pages themselves
Performance
- Tim's autoInserted* flag detection via Remex - patch in Gerrit for review. CPU and memory benefits expected with rollout
Other
- Subbu met with SRE Data Persistence to discuss ParserCache capacity needs for Parsoid Read Views. TL;DR: after recent server upgrades, ParserCache is at ~30% utilization and should be able to support Parsoid's HTML as well, as long as we roll out to wikis in stages.
Week of Nov 7, 2021
Media output changes in core
- FAQ edited and approved
Extension Updates
- Translate
  - Ran RT-testing, examined regressions and filed patches to fix them. Follow-ups needed.
  - Dirty diffs related to newline changes could impact Translate behavior and need investigation.
Performance
- No new updates. Tim busy working on PHP-VM bug
Maps
Maps v2: T263854
- Most of the tickets are resolved - the ones not resolved are either low priority or docs related
- testwiki is now connected to the Tegola-backed Kartotherian source
- Resolved some event related issues
- cronjob to trigger invalidation on OSM syncs
- Kafka concurrency
  - When we scaled workers, Kafka didn't allow concurrent consuming
- Envoy + Tegola k8s reliability issues
- Re-introduced batching in Tegola pregeneration scripts
- Next steps
  - Test pregeneration with production load
  - Roll out to more wikis
mobile-html Services
- Mobile Preview problem statement submitted for preview - T295348
Other
- Filed TOC Incident report
- Discussed ParserCache implications of ParserOutput work with Amir (database arch)
Week of Nov 1, 2021
Media output changes in core
- Started working on the FAQ for the rollout, please add questions you want to see there
Extension Updates
- Translate
  - Annotations patch merged. Three bugs identified via rt-testing. Investigation done. Patches soon.
Performance
- Tim is working to get rid of the start/end meta additions used to detect tree builder fixups and instead register handlers (via subclassing) with Remex to listen to tree builder events. This has the potential to cut processing time and memory use if it works out.
Visual Diffing
- Something seems to have improved arwiki results a bit in the latest run
Maps
- FYI: WMDE is submitting some patches to Kartographer as part of their tech wishlist
- Still working on tile pregeneration
mobile-html Services
- Phab task to track Dark Themes Preview - T295299
Other
Production incidents
- Regression in ToC output caused a fire drill on Friday
- Should figure out how to pass __NOCONTENTCONVERT__ and some other properties to ParserOutput
- Should document proper mechanism for ParserCache updates
- Maybe zhwiki needs to be group 1 instead of group 2
- Proper versioning for ParserCache would be helpful. (Also RESTBase.)
- Sanitizer interactions with <meta> tags need follow-up this week (ToC and Translate)
- Follow up to rt testing interaction with mediawiki-vendor as well