Parsoid/Deployments/2016

Wednesday, December 21, 2016 around 5:03 am PT: Deployed e7e3a4dc on the deploy-20161221 branch

  • ApiRequest: Clone the request options before modifying them.
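
The clone-before-modify pattern named in this entry can be sketched as follows. This is a minimal illustration, not Parsoid's actual ApiRequest code; the helper `prepareRequestOptions`, the URL, and the header names are hypothetical.

```javascript
// Hypothetical helper: copy caller-supplied options before changing them,
// so that other users of the same options object never see our changes.
function prepareRequestOptions(options, extraHeaders) {
    // Shallow-copy the top level, then build a fresh headers object
    // so the caller's nested headers object is left untouched as well.
    var cloned = Object.assign({}, options);
    cloned.headers = Object.assign({}, options.headers, extraHeaders);
    return cloned;
}

var shared = {
    uri: 'https://example.org/w/api.php',
    headers: { 'User-Agent': 'demo' }
};
var prepared = prepareRequestOptions(shared, { 'X-Request-ID': 'abc' });
```

After the call, `shared` still has only its original header, while `prepared` carries both.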

Tuesday, December 20, 2016 around 7:48 am PT: Deployed 5eb649e8

  • Use mwApiServer as the provider of the full URI of the MW API
  • Add a mwApiServer configuration variable
  • Add arbcom_cswiki to site matrix

Thursday, December 15, 2016 around 10:24 am PT: Deployed 6719e240

  • task T96555: Ignore self-closed tags when extending source
  • Drop native LST altogether
  • Fix DOMDiff annotations
  • Linter:
    • Fix bug in self-closing-tag category + other cleanup
    • Fix crasher when linting a gallery
    • Apply lint sampling when sending it to the logger as well
    • Don't provide 'src'

Wednesday, December 14, 2016 around 1:24 pm PT: Deployed 60ee19ac

wt2html:

  • task T119265: Add more page-level metadata that MCS can use
  • Support extension tags which shadow block-level elements
  • Move section handling to the LST extension
  • task T104523: Prevent infinite recursion
  • task T104662: Allow nested ref tags only in templates

Linting (disabled in production):

  • Use ApiRequest.js to post results
  • Handle MW API errors that come with an HTTP 200

Debugging:

  • Let extensions supply the pp tracing name

Monday, December 12, 2016 around 1:35 pm PT: Updated production config

  • Bump table cell and list item resource limits to 40K (from 30K)

Wednesday, December 7, 2016 around 1:21 pm PT: Deployed 3cf19c6b

  • Bump HTML contentVersion to 1.3.0 (see updated spec)
  • task T151570: Update SiteMatrix data fork for last 3 wiki creations
  • task T149209: Deal with newlines in <td> and <th> cells
  • task T150213: Suppress logs for known unknown contentmodels
  • task T152073: Reduce request timeout to 110s (from 3min) and worker timeout to 115s (from 3min); Increase M/W batcher API timeout to 65s
  • Some configurations moved to vars.yaml in the deploy repo
  • s/warning/warn/ to match service-runner's levels
  • Don't entity escape extension attribute values from data-mw
  • Normalize all extension options, not just native
  • Remove unused package gelf-stream
  • Linter: Add linting of self-closed tags
  • Testing:
    • Remove scrolling by access key
    • require('should') in lintertests.js for standalone runs
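
The timeout values from task T152073 in this entry can be written down and sanity-checked as below. This is only a sketch: the object and property names are hypothetical (not Parsoid's real config keys), and the idea that each inner timeout should fire before the one enclosing it is an assumption about intent, not something the entry states.

```javascript
// Hypothetical names; only the numeric values come from the entry above.
var timeouts = {
    mwBatchApiMs: 65 * 1000,  // MW batcher API timeout
    requestMs: 110 * 1000,    // request timeout (previously 3 min)
    workerMs: 115 * 1000      // worker timeout (previously 3 min)
};

// Assumed intent: an inner timeout fires before the one that encloses it.
function isNested(t) {
    return t.mwBatchApiMs < t.requestMs && t.requestMs < t.workerMs;
}

var nested = isNested(timeouts);
```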

Monday, November 7, 2016 around 1:29 pm PT: Deployed 2c2fe425

  • Cleanup http redirects
  • Send error responses in the requested format
  • Fix processing listeners in node v7.x

Wednesday, November 2, 2016 around 1:27 pm PT: Deployed 173d7e32

  • task T149241: Whitelist content model fallback
  • Testing:
    • Don't expose dev routes in production
    • Get rid of simple debug helpers
    • task T119228: Stop testing on node v0.10.x
  • Linter:
    • Add node name for missing-end-tag
  • Remove higher resource limits (max wikitext page size, max # list items, max # table cells per page) and fall back to default limits.

Also included are the commits from the attempted Oct. 26 deploy (ede4353):

  • task T141723: Bump mediawiki-title
  • task T141905: Fix crasher and other bugs of that category
  • service-runner doesn't recognize warning level
  • Stop asserting that we'll never be encapsulating a flipped range
  • Lots of linter fixes / features (currently, linting is disabled in production though)
  • Remove html5 treebuilder in favour of domino's
  • Bump domino to 1.0.27
  • task T147742: Trim template target after stripping comments
  • task T48580, task T133320: Allow extensions to handle specific contentmodels

Tuesday, November 1, 2016: Parsoid cluster upgraded to node v4.6

Ops upgraded node on the Parsoid eqiad cluster to node v4.6. The (backup) codfw cluster had been upgraded on Monday.

Monday, October 31, 2016 around 1:34 pm PT: Deployed e503e801

Wednesday, October 26, 2016 around 1:15 PT: ede4353 was to be deployed; reverted to 63f1e151 because of contentmodel errors

  • task T141723: Bump mediawiki-title
  • task T141905: Fix crasher and other bugs of that category
  • service-runner doesn't recognize warning level
  • Stop asserting that we'll never be encapsulating a flipped range
  • Lots of linter fixes / features (currently, linting is disabled in production though)
  • Remove html5 treebuilder in favour of domino's
  • Bump domino to 1.0.27
  • task T147742: Trim template target after stripping comments
  • task T48580, task T133320: Allow extensions to handle specific contentmodels

Monday, October 24, 2016 around 1:42 pm PT: Deployed 63f1e151

Wednesday, September 21, 2016 around 1:17 pm PT: Deployed a802de0

  • Tokenizer:
    • Encapsulate protected table attributes from wt
    • Inline generic_attribute_newline_value and table_attribute_value
    • Set srcOffsets for table_attribute and generic_newline_attribute
  • HTTP API:
    • Page id and revid aren't the same thing
    • html2html should require an original or previous revision

Wednesday, September 14, 2016 around 1:11 pm PT: Deployed aed15dda

  • Let native extensions add stylesheets
  • Move getAPIProxy to parsoidConfig
  • Other minor refactorings and parserTest changes

Monday, September 12, 2016 around 1:40 pm PT: Deployed f7c43009

  • Handle HTML tags in attribute text properly
  • AttributeExpander: Tweak check for improved code readability
  • Testing:
    • Bump worker_heartbeat_timeout to 2mins for testing
    • Allow specifying a specific revision for roundtrip-test.js

Tuesday, September 6, 2016 around 10:37 am PT: Deployed 7863e6ad

  • task T142617: Handle invalid titles in transclusions
  • Sanitizer fixes:
    • Decode all char refs in text
    • Ignore some fields when freezing SanitizerConstants for node v6.5 -- no-op for Wikimedia cluster that runs node v4.x
  • node-module updates:
    • Bump service-runner to v2.1.0
    • Remove bunyan
  • Some minor cleanups

Monday, August 29, 2016 around 1:10 pm PT: Deployed 48cf803e

  • Run localSettings.setup after assigning options
  • Use service-runner's metrics reporter in the http api
  • Updates in preparation for supporting version 2.x content in the future -- should be no-op for version 1.x content
    • Support downgrading 2.x content to 1.x
    • No content reuse from semantically different content versions
    • task T143356: Establish precedence for data-mw in 2.0.0 content

Monday, August 22, 2016 around 1:12 pm PT: Deployed df53a991

  • task T142998: html2wt: Fix crasher in DOM normalization code
  • task T141370: Use service-runner's logger as a backend to Parsoid's logger

Wednesday, August 17, 2016 around 1:09 pm PT: Deployed 3cf877bb

  • html2wt: Always emit canonical wikitext for url links
  • html2wt: Emit url-links where appropriate no matter what rel attribute says

Monday, August 15, 2016 around 1:09 pm PT: Deployed f039dcf6

  • migrateTrailingNLs DOM pass: Code simplifications and some subtle edge case bug fixes
  • task T138864: Deal with edge cases serializing links
  • Remove deprecated "disablepp" MediaWiki API param and pass "disablelimitreport" instead
  • Increase resource limits for wikitext size, max table cells, and max list items
    • With the upgrade to node v4, we have more breathing room for parsing large pages

Wednesday, August 10, 2016 around 1:10 pm PT: Deployed 4de49e26

  • Handle caption-like text outside tables
  • Table captions: Remove unneeded mw:TSRMarker meta token + add TSR info in tokenizer which leads to more accurate DSR offsets.
  • When table wikitext shows up outside tables and is converted to strings, strip attached mw:TSRMarker tags
  • computeDSR: Fix source of pathological O(n^2) behavior

Tuesday, August 9, 2016 around 11:15 am PT: Deployed a577d80e

  • Fix crasher in escapeWikitext
  • task T140898: Update site matrix for tcy.wikipedia.org

Tuesday, August 2, 2016 - Tuesday, August 9, 2016: Upgrade Parsoid cluster to node v4.x and Jessie

  • task T135176: Over the week, Operations upgraded the cluster gradually.
    • The eqiad cluster was fully migrated by Friday, August 5th.
    • The codfw cluster was fully migrated by Tuesday, August 9th.

Monday, August 1, 2016 around 1:15 pm PT: Deployed abf396eb

  • Fix title parsing of subpages during initialization (addresses crashers while parsing these pages)
  • Only apply data-* attributes in /pagebundle/ paths (API cleanup)
    • Determines the content version in the html2wt direction, enabling content upgrade

Tuesday, July 26, 2016 around 10:12 am PT: Deployed 285b6983

  • Use mediawiki-title package to replace homegrown Title code (resolves task T113322, task T133425, and task T139135)
  • Reintroduce a 3-minute request timeout
  • Bump some minor / patch level versions of dependencies (addresses a security advisory)
  • Prevent JSON.stringify circular refs in template wrapping trace/error logs
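
One common way to keep `JSON.stringify` from throwing on circular references in log payloads is a replacer that remembers the objects it has already visited. The sketch below illustrates that general technique; it is not the code from this deploy, and `safeStringify` is a hypothetical name.

```javascript
// Hypothetical helper: stringify a value for logging without throwing
// "Converting circular structure to JSON".
function safeStringify(value) {
    var seen = new WeakSet();
    return JSON.stringify(value, function(key, val) {
        if (typeof val === 'object' && val !== null) {
            // Note: this simple version also replaces repeated (non-circular)
            // references to the same object, which is acceptable for logs.
            if (seen.has(val)) { return '[Circular]'; }
            seen.add(val);
        }
        return val;
    });
}

var token = { name: 'template' };
token.self = token; // circular reference
var out = safeStringify(token);
```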

Thursday, July 21, 2016 around 9:30 am PT: Deployed ed2f8228

  • Test deploy to verify trebuchet deployment is not broken after all the tinkering done during the service-runner deploy. The deployed change only affects parser tests.

Wednesday, July 20, 2016 between 7:30 - 8:20 am PT: Deployed 45beb6c0

  • task T90668: Update Parsoid to use the service-runner framework
    • In collaboration with Services & Ops teams
    • wtp1001 and wtp1002 were transitioned over July 19, 2016 between 8:00 - 9:00 am PT

Monday, July 11, 2016 around 1:10 pm PT: Deployed e738c415

  • task T131564: Respect $wgInterwikiMagic setting while parsing lang-links
  • task T139388: DOMDiff: Skip over encapsulated content rather than about-id content (fixes problem with lost edits in content nested in elements with templated attributes)
  • Code cleanup (don't expect functional changes): Use a more appropriate DOM helper (s/hasParsoidAboutId/isEncapsulationWrapper/) where appropriate

Monday, June 27, 2016 around 1:08 pm PT: Deployed dd8e644d

  • Template wrapping: Eliminate pathological tpl-range nesting scenario

Thursday, June 23, 2016 around 10:30 am PT: Deployed 18022c96

  • Emit single newline separator in table wikitext for new content
  • Make the http connect timeout configurable
  • Update many deps by minor version
  • task T137406: Ensure newlines are added where required around thead/tbody/tfoot
  • task T96195: Remove node 0.8 support (does not affect WMF deploy of Parsoid)

Wednesday, June 15, 2016 around 1:10 pm PT: Deployed 3445eceb

Non-functional changes (these will come into play once we move to v2.0.0 of Parsoid HTML):

  • Roundtrip 2.0.0 content
  • task T114413: Provide HTML2HTML endpoint in Parsoid

Monday, June 6, 2016 around 1:15 pm PT: Deployed e8d6092e

  • Normalize all lists to not mix wikitext and HTML list syntax (selser prevents unnecessary dirty diffs in production)

Thursday, June 2, 2016 around 10:40pm PT: Deployed 7188080b

  • task T134389: Serialize content in HTML tables using HTML tags
  • task T125419: Fix selser issues serializing first table row
  • Selser: Bug fix reusing separator text from original source

Wednesday, June 1, 2016 around 1:15 pm PT: Deployed afb0d522

  • Bump core-js from v1.2.6 to v2.4.0
  • Bump yargs from v1.3.1 to v4.7.1
  • Don't use non-standard array generic functions (Array.reduce, etc.) - removed from newer version of core-js
  • Use normalized form of default page "Main_Page" instead of "Main Page"
  • task T135596: Return client error for missing data attributes
  • Fix up the internal forms to use v3 post endpoint
  • Add a page/wikitext/:title route to GET wikitext for a page

Thursday, May 19, 2016 around 11:38am PT: Deployed 67816adf

  • task T100681: Remove deprecated v1/v2 HTTP APIs.
  • task T130638: Content negotiation; Add data-mw as separate JSON blob in the page bundle.
  • Strict Accept header checking is turned off; we will return 1.2.x format if an invalid Accept header is provided (which is allowed by RFC 2616).
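
The lenient Accept handling described in this entry could look roughly like the sketch below. The function name, the regular expression, and the exact default version string `1.2.1` are assumptions for illustration; the entry only says that a 1.2.x format is returned when the header is invalid.

```javascript
// Assumed default; the entry only states that "1.2.x" is returned.
var DEFAULT_HTML_VERSION = '1.2.1';

// Hypothetical helper: pull the requested HTML spec version out of an
// Accept header, falling back to the default instead of rejecting.
function negotiateHtmlVersion(acceptHeader) {
    var re = /profile="[^"]*mediawiki\.org\/specs\/html\/(\d+\.\d+\.\d+)"/;
    var m = re.exec(acceptHeader || '');
    return m ? m[1] : DEFAULT_HTML_VERSION;
}

var requested = negotiateHtmlVersion(
    'text/html; charset=utf-8; profile="mediawiki.org/specs/html/1.2.0"'
);
var fallback = negotiateHtmlVersion('not a valid accept header');
```

With strict checking turned on, the fallback branch would instead produce a client error.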

CLEARED DIRTY REPOS which had this patch applied as root during the restbase/changeprop/parsoid outage:

diff --git a/lib/api/routes.js b/lib/api/routes.js
index 4d08922..d372c2f 100644
--- a/lib/api/routes.js
+++ b/lib/api/routes.js
@@ -377,6 +377,7 @@ module.exports = function(parsoidConfig, processLogger) {
        var v1Wt2html = function(req, res, wt) {
                var env = res.locals.env;
                var p = apiUtils.startWt2html(req, res, wt).then(function(ret) {
+                       if ( ret.oldid === 106801025 ) { return false; }
                        if (typeof ret.wikitext === 'string') {
                                return apiUtils.parseWt(ret)
                                        // .timeout(REQ_TIMEOUT)

Wednesday, May 4, 2016 around 1:15 pm PT: Deployed b0d015fa

Monday, May 2, 2016 around 1:15 pm PT: Deployed 0a26f3a4

  • html -> wt: For invalid links, text doesn't need escaping in link context
  • DOMDiff: Fix marking data-is-block on extra base nodes
  • Add autoload mechanism for user extension code -- proof-of-concept for future use
  • Update shrinkwrap after 23c97752
  • Code cleanup: should not affect functionality
    • Keep the data-* attributes at the edges of the DOM
    • Remove ParsoidCacheRequest
    • Organize post-processors distinguishing handlers
    • Move the dumper to DOMUtils and use more widely

Monday, April 25, 2016 around 1:05 pm PT: Deployed d5363193

  • task T130645: Pass the right title to PHPParseRequest
  • Don't allow unclosed extension tags
  • Code cleanup: should not affect functionality
    • task T95325: Move tsrDelta to dp.tmp
    • Rename DU.serializeChilden to DU.serializeToXML
    • storeDataParsoid is an env variable, not a Parsoid config property

Monday, April 11, 2016 around 1:15pm PT: Deployed e3766b79

  • Count api version use
  • Don't dom-diff on a cloned node
  • task T95325: Migrate temporary data to dp.tmp
  • Suppress errors raised when getting debugging info
  • Code cleanup: should not affect functionality
    • Fix some variable shadowing
    • Stop working on cloned nodes in parserTests
    • Rename timer to stats, since we do counting too
    • Fix regression testing tool
    • Fix crasher and make rt errors more informative

Wednesday, April 6, 2016 around 1:15 pm PT: Deployed 5f6c0c60

  • task T116020, task T53852: Serialize localized image options (already cherry-picked yesterday)
  • Stop suppressing escaping errors
  • Remove the broken_template rule in the PEG tokenizer -- no need to wrap {{, {{{, }}, }}} in <nowiki> spans
  • Code cleanup: should not affect functionality
    • Cleanup some fallback rules in the PEG tokenizer
    • Use Util.placeholder in a few more places
    • Be consistent with dp.src check

Tuesday, April 5, 2016 around 2:40pm PT: Deployed a5be1cdc

  • task T116020, task T53852: Cherry-pick of image option localization patch to match alias reordering in mediawiki core version 1.27.0-wmf.20.
  • Deployed cherry-pick from deploy-20160405 branch.

Monday, April 4, 2016 around 1:10 pm PT: Deployed 579ec3e6

  • Fix log type in cite implementation
  • Code cleanup: should not affect functionality
    • Move dp.src handlers to their respective dom handlers
    • Add new env.normalizeAndResolvePageTitle helper and use it

Wednesday, March 30, 2016 around 1:15 pm PT: Deployed a20ef276

  • Bump HTML version number to 1.2.1
  • Declare charset with <meta charset>
  • Add html/dp version numbers in <head> instead of full content type
  • task T113331: Move auto-generated refs flag from data-parsoid to data-mw
  • Default ParsoidConfig.loadWMF to false
  • Bump node-uuid to 1.4.7 for nsp

Wednesday, March 23, 2016 around 1:15 pm PT: Deployed 5538d868

  • Don't construct regexp with a regexp when flags need to be set
  • Don't export Namespace since it isn't used anywhere else
  • task T129752: Include user agent in request logs
  • Tweak error prefixes for ease of browsing in logstash
  • Promisify the exposed batching methods
  • task T128659: Handle async createSocket
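
For the first item in this entry: passing an existing RegExp object together with a flags argument to the `RegExp` constructor is not accepted by older (ES5) engines, so the portable form rebuilds the expression from its `source` string. A minimal sketch with a hypothetical helper name:

```javascript
// Portable way to re-create a regular expression with different flags:
// pass the pattern's source string rather than the RegExp object itself.
function withFlags(re, flags) {
    return new RegExp(re.source, flags);
}

var caseInsensitive = withFlags(/parsoid/, 'i');
var globalA = withFlags(/a/, 'g');
var stripped = 'banana'.replace(globalA, '');
```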

Monday, March 7, 2016 around 1:15pm PT: Deployed 5db1d28b

  • Cleanup and tweaks of transclusion formatting for clarity and fewer dirty diffs
  • Fix breakage in counting of HTTP status codes (broken by fix for T127983)

Tuesday, March 1, 2016 around 10:50am PT: Deployed 1f7ed5d0

  • task T128319: Fix bug in formatting of transclusions for block-format templates
  • Remove overloading of pipe stop in the PEG tokenizer -- eliminates incorrect parsing of pipes in external links

Monday, February 29, 2016 around 1:25pm PT: Deployed d809ad7a

  • task T127983: Don't crash on misconfigured statsd host
  • task T108134: Match html5 unquoted attribute parsing
  • Break for [[ in table attribute values too

Wednesday, Feb 24, 2016 around 1:15 pm PT: Deployed 581a43c7

  • Bump HTML content-type version to 1.2.0 (from 1.1.0) and data-parsoid content-type version to 0.0.2 (from 0.0.1)
  • Update parsoid content type meta tags in the <head>
    • <meta property="mw:parsoidVersion" content="0"/> is now changed to <meta property="mw:html-content-type" content='text/html; charset=utf-8; profile="mediawiki.org/specs/html/1.2.0"'/> to be more consistent with the version information that is output in the response headers.
    • For the non-pagebundle API endpoints, <meta property="mw:data-parsoid-content-type" content='application/json; charset=utf-8; profile="mediawiki.org/specs/data-parsoid/0.0.2"'/> is also emitted.
  • task T125266: Remove user/contribution information from header
  • task T90479: Assert param value serializes to a string
  • task T104599, task T111674: Fetch and use templatedata while serializing transclusions
    • data-parsoid semantics updated to use 'foo=bar' as the default transclusion arg spacing.
  • Remove data-mw.body.extsrc for the <references> tag (unused, and bloats data-mw)
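
The 'foo=bar' default spacing for transclusion arguments mentioned in this entry means no whitespace around the pipe or the equals sign. A toy serializer, handling named parameters only, shows the shape of the output; the function is hypothetical and far simpler than Parsoid's templatedata-aware serializer.

```javascript
// Hypothetical, simplified serializer: named parameters only,
// using the compact 'name=value' spacing as the default.
function formatInlineTransclusion(target, params) {
    var parts = Object.keys(params).map(function(name) {
        return name + '=' + params[name];
    });
    return '{{' + [target].concat(parts).join('|') + '}}';
}

var wikitext = formatInlineTransclusion('echo', { foo: 'bar' });
```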

Thursday, Feb 18, 2016 around 11:00 am PT: Deployed dfbafb60

Wednesday, Feb 10, 2016 around 1:15 pm PT: Deployed 8976ab93

  • Assert when flipped ranges are expected in template wrapping
    • This should have no functional changes in parsing. At best, it will catch a bug / failed expectation in the template wrapping code.

Monday, Feb 8, 2016 around 1:15 pm PT: Deployed 4d44fcc7

  • Fix worker shutdown code in server.js + use it to restart stuck workers and to shutdown the Parsoid service
    • Expect that this will fix the scenario with stuck worker processes when Parsoid service is restarted during deploys.

Wednesday, Feb 3, 2016 around 2:45 pm PT: Deployed 98619f7f

  • Fix complex single-line nowiki handling
    • More robust algorithm + can eliminate some spurious nowikis
  • task T115289: Disable migrateTrailingNLs if table has had content fostered out of it
  • Some code cleanup
    • Removed some FIXMEs in nowiki escaping in <td>s
    • Tweaks to attribute parsing in the PEG tokenizer
  • Warn if prefix/domain is not unique during configuration
  • ParsoidConfig changes: Don't proxy nonglobal wikis (temporary special handling for labswiki and labstestwiki)
  • Config changes:
    • Remove hardcoded references to internal API LVS endpoint.
    • Removed references to unused parsoidcache.
    • Removed explicit config entry for labswiki (ParsoidConfig handles it now).

Monday, Feb 1, 2016 around 1:15 pm PT: 2fcc841f was to be deployed; deploy cancelled to fix nowiki regressions

  • Warn if prefix/domain is not unique during configuration
  • Fix complex single-line nowiki tests
    • Can eliminate some spurious nowikis
    • But, can introduce spurious nowikis around [{{echo|foo}}] style wikitext -- 0.07% of pages in rt testing were affected, but with selective serialization, we expect impact to be small. We will consider possible solutions to minimize nowikis in this scenario, nevertheless.
  • task T115289: Disable migrateTrailingNLs if table has had content fostered out of it
  • Config changes: Remove hardcoded references to internal API LVS endpoint + removed references to unused parsoidcache.

Wednesday, Jan 20, 2016 around 1:45 pm PT: Deployed f1ddfb88

  • task T122816: Record when a range is subsumed from overlapping
  • Temporarily disable the request timeout (since they don't abort request processing and cancel cpu timeouts as well)
  • Reduce cpu timeout value to 3 minutes

Monday, Jan 11, 2016 around 1:15 pm PT: Deployed 07494cf2

wt2html

  • task T73154: Remove the vestiges of pipetrick entirely
  • task T114225, task T121611: Note that DOM tree building uses restrictive checks (documentation fix)
  • task T122054: Strip nowiki spans from templated / extension content
  • Match permitted attributes to php's getAttribsRegex

html2wt

  • Normalize DOM by stripping \u200e, \u200f next to category links (This is controlled by a config switch that we will turn on, if necessary)
  • Edge case fixes to serializing lists with templated portions

task T119883: Performance fixes (for large DOMs)

  • Use startsWith() instead of regex to match tag names in the DOM
  • Optimise shadow meta deletion
  • Bump domino to 1.0.21 (with performance fixes)
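
The `startsWith()` change in this list swaps an anchored regular expression for a plain prefix comparison. The sketch below shows two equivalent checks; the `mw:` prefix and the function names are placeholders, since the entry does not say which names are matched.

```javascript
// Placeholder prefix; the entry does not name the actual one.
var PREFIX = 'mw:';

// Regex-based check (the older style).
function hasPrefixViaRegex(name) {
    return /^mw:/.test(name);
}

// Plain string check (the newer style), avoiding the regex engine.
function hasPrefixViaStartsWith(name) {
    return name.startsWith(PREFIX);
}

var sameAnswer = ['mw:Transclusion', 'div', 'xmw:foo'].every(function(name) {
    return hasPrefixViaRegex(name) === hasPrefixViaStartsWith(name);
});
```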

Other

  • task T55874: Add a generic extension registration mechanism
  • task T50891: Register <translate> and <tvar> natively
  • task T122062: Update SiteMatrix, another wiki created
  • task T121611: Use httpStatus instead of code as the property on errors