Parsing/QA

This page catalogs all the QA functionality that the parsing team uses as part of its work. This includes custom QA tools that we've built or adapted over the years.

Parser Tests

These tests serve as a specification of the wikitext language: each test records how a piece of wikitext is expected to behave and render as HTML (and vice versa).

The authoritative versions are in tests/parser/parserTests.txt in mediawiki-core and in the parser test files of some extensions. The Parsoid project maintains a copy of these files in the Parsoid repository and periodically syncs them with the primary versions.

Before Parsoid, this was primarily a wikitext to HTML spec. Parsoid has introduced a number of additional test modes.

Besides wt2html, Parsoid added the wt2wt, html2wt, and html2html modes.
Parsoid also added selser (selective serialization) test modes in two forms.
- In the manual form, the test specifies edits to the output DOM (for the input wikitext) and the expected wikitext output for the edited DOM. This form can also optionally indicate whether the edited DOM should be normalized to scrub generated wikitext.
- In the automated form, the test runner generates a number of random edits on the output DOM, passes the edited DOM through both regular wt2wt and selser wt2wt, and compares the wikitext that each produces.
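The four non-selser modes can be read as compositions of a parse step and a serialize step. The following is a minimal Python sketch of that idea only: the functions `wt2html` and `html2wt` here are toy stand-ins that understand nothing but bold markup, and `run_modes` is a hypothetical helper, not part of Parsoid or of the parser test runner.

```python
import re
from typing import Dict


def wt2html(wikitext: str) -> str:
    """Toy parser (assumption): converts only '''bold''' to <b>bold</b>."""
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", wikitext)


def html2wt(html: str) -> str:
    """Toy serializer (assumption): converts only <b>bold</b> to '''bold'''."""
    return re.sub(r"<b>(.+?)</b>", r"'''\1'''", html)


def run_modes(wikitext: str, html: str) -> Dict[str, str]:
    """Produce the output of each test mode for one wikitext/HTML pair."""
    return {
        "wt2html": wt2html(wikitext),             # wikitext -> HTML
        "html2wt": html2wt(html),                 # HTML -> wikitext
        "wt2wt": html2wt(wt2html(wikitext)),      # wikitext round trip
        "html2html": wt2html(html2wt(html)),      # HTML round trip
    }


results = run_modes("'''hello''' world", "<b>hello</b> world")
for mode, output in results.items():
    print(mode, "=>", output)
```

A test in a given mode then amounts to comparing the corresponding entry against the expected output recorded in the test file.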

In the Parsoid codebase, the test runner is bin/parserTests.js and the tests are in tests/*.txt.

From a QA standpoint, these are integration tests that test the entire wt->html or html->wt subsystem of the codebase.

Parsoid-specific mocha tests

Certain kinds of tests are harder to specify in the parser tests format. Examples include tests of API behavior, regression specs, expectations around the use of TemplateData, unit tests for individual token or DOM transformers, linting, and some other behavior. Some of these tests mock the MediaWiki API requests and responses. These tests are found in the tests/mocha directory of the Parsoid codebase.

From a QA standpoint, some of these are unit tests of specific pieces of code (DSR computation, linting, some API tests) and others are more integration tests (some API tests, templatedata tests).

Unit tests to aid porting Parsoid to PHP

We have been developing this testing mode in 2018 in preparation for a port of Parsoid to PHP. The test files are generated by the genTest mode of the bin/parse.js script while parsing pages from live production wikis. Each test specifies the input to a token or DOM transformer and the expected output after that individual transformer has processed the input. The tests are fed to two different test scripts: one for token transformers and another for DOM transformers.
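The replay idea behind these tests can be sketched as follows. This is an illustration under stated assumptions, not the format that genTest emits or the logic of the actual test scripts: `RecordedCase`, `replay`, and the toy transformer `merge_adjacent_text` are all hypothetical names, and tokens are simplified to plain strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Tokens = List[str]


@dataclass
class RecordedCase:
    """One recorded test: transformer name, its input, its expected output."""
    transformer: str
    input_tokens: Tokens
    expected: Tokens


def merge_adjacent_text(tokens: Tokens) -> Tokens:
    """Toy transformer (assumption): joins neighbouring plain-text tokens.

    Any token that starts with '<' is treated as a tag and left alone.
    """
    merged: Tokens = []
    for tok in tokens:
        is_text = not tok.startswith("<")
        if is_text and merged and not merged[-1].startswith("<"):
            merged[-1] += tok
        else:
            merged.append(tok)
    return merged


def replay(
    cases: List[RecordedCase],
    transformers: Dict[str, Callable[[Tokens], Tokens]],
) -> List[RecordedCase]:
    """Run each recorded input through its transformer; return failing cases."""
    failures = []
    for case in cases:
        # Pass a copy so a transformer cannot mutate the recorded input.
        actual = transformers[case.transformer](list(case.input_tokens))
        if actual != case.expected:
            failures.append(case)
    return failures


cases = [
    RecordedCase("merge", ["a", "b", "<b>", "c"], ["ab", "<b>", "c"]),
    RecordedCase("merge", ["x", "y"], ["x", "y"]),  # stale expectation
]
failed = replay(cases, {"merge": merge_adjacent_text})
print(len(failed), "of", len(cases), "cases failed")
```

Because each case carries its own input and expected output, the same recorded files could in principle be replayed against a transformer written in another language, which is the property a port relies on.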

By porting these test runners to PHP first, we expect to be able to validate the correctness (and get a measure of performance) of the porting of individual token or DOM transformers.

Mass round trip testing

See Parsoid/Round-trip_testing - this runs on ruthenium.

Before every Parsoid deploy, the code to be deployed goes through round trip tests. Any semantic test result regressions are pushed through tools/regression-testing.js to rule out false positives because of page edits (since the tests are run against production wiki pages).
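One way to picture the false-positive filtering is to re-run the old and the new code on the same, current page content, so that a page edit between test runs cannot account for a difference. The Python sketch below illustrates that reasoning only. It is an assumption about the general approach, not a description of what tools/regression-testing.js does, and every name in it (`confirm_regressions`, `fetch_current`, the two round-trip callbacks) is hypothetical.

```python
from typing import Callable, Dict, List


def confirm_regressions(
    flagged_titles: List[str],
    fetch_current: Callable[[str], str],
    old_roundtrip_ok: Callable[[str], bool],
    new_roundtrip_ok: Callable[[str], bool],
) -> Dict[str, str]:
    """Classify each flagged page as a real regression or a false positive.

    Both code versions see identical wikitext, so only a behavior change
    in the code can make the old version pass and the new version fail.
    """
    verdicts: Dict[str, str] = {}
    for title in flagged_titles:
        wikitext = fetch_current(title)
        old_ok = old_roundtrip_ok(wikitext)
        new_ok = new_roundtrip_ok(wikitext)
        if old_ok and not new_ok:
            verdicts[title] = "regression"
        else:
            verdicts[title] = "false positive"
    return verdicts


# Toy data (assumption): page B was edited and now fails under both versions.
pages = {"A": "plain text", "B": "text with {{broken", "C": "text with [[link]]"}
verdicts = confirm_regressions(
    ["A", "B", "C"],
    fetch_current=pages.__getitem__,
    old_roundtrip_ok=lambda wt: "{{broken" not in wt,
    new_roundtrip_ok=lambda wt: "{{broken" not in wt and "[[" not in wt,
)
print(verdicts)
```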

Mass visual diff testing comparing Parsoid output to PHP parser output

See Parsoid/Visual Diffs Testing - this runs on ruthenium.

We occasionally run these tests to compare how Parsoid output renders relative to the PHP parser output. The testing normalizes the rendered output in a number of ways to eliminate false positives. We expect to rely on this test modality and use this more extensively as part of the Parsing/Parser Unification project.

Mass visual diff testing for other wikitext and functionality changes

See Parsing/Visual Diff Testing - this runs on parsing-qa-01.

We have used this test mode whenever we have had a need to change some behavior in wikitext.

On parsing-qa-01, we have two different test databases: mwexpts and tidy-vs-remex.

tidy-vs-remex was used as part of the Parsing/Replacing Tidy project, which finished in July 2018. This test mode compared the output of a production wiki page tidied by HTML4-tidy against the output of the same page tidied by RemexHtml. We ran tests on a sample of over 70K pages from 50+ production wikis.

In the mwexpts test mode, we compare MediaWiki rendering on one labs VM (running baseline MediaWiki) with MediaWiki rendering on another labs VM (running MediaWiki with some change we wish to QA). We have used this test mode for changes to language variants, proposed changes to media rendering, initial testing when we wanted to replace Tidy, and changes to the parsing of definition lists. In the future, we expect to use this test mode more heavily to validate proposed changes to wikitext before we release them to production. Currently, we run tests on a 2016 snapshot of a sample of 50K+ pages from 41 production wikis. As we rely on this test mode more heavily, we expect to refresh this sample more regularly and include pages from a wider range of namespaces and wikis.