Parsoid/Parsoid HTML Specification Versioning

The MediaWiki-specific DOM (which eventually serializes to HTML) is defined in a series of specifications, the latest of which can be accessed on Specs/HTML. This document defines the processes around the creation of a new version of this specification.

Context

The Parsoid HTML specification versioning follows the MediaWiki API versioning policy. That page also explains the content negotiation that happens when a client requests a specific version of the content.

We use the terminology defined on https://semver.org when we talk about incrementing the major, minor or patch version (the three components of a version number MAJOR.MINOR.PATCH).
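
As an illustration of this terminology, the following minimal sketch parses a MAJOR.MINOR.PATCH string and increments one component, resetting the lower-order components to zero as semver prescribes. The SpecVersion class and its methods are hypothetical names chosen for this example; they are not part of Parsoid.

```python
from typing import NamedTuple


class SpecVersion(NamedTuple):
    """A MAJOR.MINOR.PATCH version number (hypothetical helper, not Parsoid code)."""

    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "SpecVersion":
        # Split "2.3.0" into its three integer components.
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def bump(self, level: str) -> "SpecVersion":
        """Return the next version; lower-order components reset to 0."""
        if level == "major":
            return SpecVersion(self.major + 1, 0, 0)
        if level == "minor":
            return SpecVersion(self.major, self.minor + 1, 0)
        if level == "patch":
            return SpecVersion(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown level: {level!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"
```

For instance, SpecVersion.parse("2.3.0").bump("minor") gives 2.4.0. Because the class is a tuple, versions compare component by component, so 2.10.0 sorts after 2.9.1.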

When should a new version of the specification be published?

The specification should be modified any time the output of Parsoid is modified in a way that is not compatible with the current specification. This includes the introduction, modification and deprecation of tags and attributes and of their semantics. The questions to answer are:

  • If I send content to a client, is the client able to parse and interpret it with the current specification?
  • If the new version of Parsoid reads output generated by the previous version, is it able to parse and interpret it with the current specification (without explicit modification for, or support of, that previous version)?
  • If the previous version of Parsoid reads output generated by the new version of Parsoid, is it able to parse and interpret it with the current specification?

If the answer to at least one of these questions is no, a new version of the specification should be published. Additionally, some more work may be required to synchronize with clients; see "So you are going to change Parsoid output".

How to decide between a patch, minor or major version?

The major version number should be incremented when the change is not backward-compatible: clients would need an update to be able to parse and interpret the output of Parsoid.

The minor version number should be incremented when clients can still parse and interpret the output of Parsoid, and can still send content readable by Parsoid, without needing an update, but would not necessarily benefit from all the features of the content in question.

The patch version should be incremented when the client would not see a difference compared to the previous specification version. This may happen when a bug fix corrects a deviation from the specification, or when data in data-parsoid is modified (which the client would not typically see anyway). Differences between patch versions are then relevant only for Parsoid's internal use.

Examples

  • Adding <audio> to the possible tags of the output in Specs/HTML/2.0.0 required incrementing the major version, because clients needed an update to handle the newly introduced <audio> tag and there was no previous expectation about this tag.
  • In Specs/HTML/2.3.0, mw:html:version was renamed to mw:htmlVersion, but the deprecated mw:html:version was kept for backward compatibility, which made it possible to increment the minor version instead of the major version. Removing mw:html:version from the output would require incrementing the major version.
  • The annotation semantics were introduced as a minor version change (2.4.0) because the <meta> tags used to delimit the regions were considered transparent, and because elements that have an unknown "typeof" are supposed to be preserved by the client. Hence, the client is able to handle this specification without modification, which removes the need for a new major version. On the other hand, the client needs an update to benefit from the feature, which makes the annotation specification more than a patch version increment.
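
The rules and examples above can be summarized as a small decision function. This is only a sketch of the reasoning: the function name and its two boolean parameters are hypothetical, and answering those two questions for a real change still requires human judgment.

```python
def required_bump(clients_need_update_to_parse: bool,
                  clients_need_update_for_features: bool) -> str:
    """Return which component of MAJOR.MINOR.PATCH to increment.

    Both parameters are hypothetical names for the questions asked above:
    - clients_need_update_to_parse: unmodified clients can no longer parse
      and interpret Parsoid's output (or send content Parsoid can read).
    - clients_need_update_for_features: unmodified clients keep working but
      do not benefit from the new features of the content.
    """
    if clients_need_update_to_parse:
        return "major"
    if clients_need_update_for_features:
        return "minor"
    # The client sees no difference (bug fix, data-parsoid change, ...).
    return "patch"
```

With this sketch, the <audio> example corresponds to required_bump(True, True), the annotation example to required_bump(False, True), and a change limited to data-parsoid to required_bump(False, False).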

What are the compatibility requirements when creating a new version?

Content generated according to a new patch version must be backward-compatible with the specifications of all the previous patch versions of the same minor version. Additionally, content generated according to the previous patch versions of that minor version must be forward-compatible with the new specification.

Content generated according to a new minor version must be backward-compatible with the specifications of all the previous minor versions of the same major version.

There is no expectation of compatibility for a major version; however, versions of Parsoid introducing the new major version must provide a way to serve and handle content for all supported major versions.
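
The three rules above can be read as a predicate answering the question "is content of version X guaranteed to be usable by a reader that implements version Y?". The sketch below encodes only what this page states; the function names are hypothetical, and a False result means "no guarantee is given here", not "known to be incompatible".

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def guaranteed_readable(content_version: str, reader_version: str) -> bool:
    """True if the rules above guarantee that content generated according to
    content_version can be handled under the reader_version specification."""
    content = parse(content_version)
    reader = parse(reader_version)
    if content[0] != reader[0]:
        # No expectation of compatibility across major versions.
        return False
    if content[1] == reader[1]:
        # Patch versions of the same minor are backward- and forward-compatible.
        return True
    # Content of a newer minor is backward-compatible with older minors of the
    # same major; the opposite direction is not stated on this page.
    return content[1] > reader[1]
```

For example, guaranteed_readable("2.4.0", "2.3.0") is True, while guaranteed_readable("3.0.0", "2.4.0") is False, which is why Parsoid must be able to serve and handle every supported major version.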

  Warning: When rolling out a new version of the specification, it is strongly advised to make sure that Parsoid is able to read content that follows the new specification before it starts generating such content. Concretely, this means rolling out the HTML2WT direction of a specification change before rolling out the WT2HTML direction. This way, if the WT2HTML direction gets rolled back, the content it generated (which may survive in caches, including RESTBase) can still be handled by Parsoid.

If incompatible content coming from a later version exists in RESTBase (for instance after the rollback of a train), RESTBase should be purged to avoid errors. This scenario has been explored in T296425 (but had not yet been tested as of 2021-12-20).

How to create a new version?

An example of a minor version bump is provided in Gerrit change 735971:

  • update the AVAILABLE_VERSIONS constant in src/Parsoid.php
  • update the defaultContentVersion in bin/roundtrip-test.js
  • update tests/api-testing/Parsoid.js so that it contains the correct defaultContentVersion and so that its tests 'Accept' the correct version

The logic for downgrading content to an older version of the specification lives in src/Parsoid.php::downgrade.
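
To illustrate why the list of available versions matters, here is a minimal sketch of how a requested version could be matched against such a list: the requested version is satisfied by the highest available version that has the same major number and is not older than the request. This matching rule, the resolve_version function, and the version list used in the usage note are assumptions made for this example; they do not reproduce the actual logic or contents of src/Parsoid.php.

```python
from typing import Optional, Sequence, Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def resolve_version(requested: str, available: Sequence[str]) -> Optional[str]:
    """Pick the available version that can satisfy the requested one.

    Assumed rule (hypothetical): same major number, and at least as recent as
    the requested version. Returns None when nothing can satisfy the request.
    """
    wanted = parse(requested)
    candidates = [
        version
        for version in available
        if parse(version)[0] == wanted[0] and parse(version) >= wanted
    ]
    return max(candidates, key=parse) if candidates else None
```

With an illustrative list ["1.8.0", "2.4.0"], a request for 2.3.0 resolves to 2.4.0, a request for 1.6.1 resolves to 1.8.0 (which is where downgrading from the newest major would be needed), and a request for 3.0.0 resolves to nothing.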

How to test?

The behaviour of Parsoid when provided with/asked for different content versions is tested in tests/api-testing/Parsoid.js. The numerous tests already present in this file can serve as examples of how to test desired behaviours. See MediaWiki API integration tests for more information about the testing library.
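
When writing such tests by hand, it can help to see how a content version travels in HTTP headers. The sketch below assumes that the version is carried in a profile parameter whose value is the URL of the specification (https://www.mediawiki.org/wiki/Specs/HTML/ followed by the version number); this format and both helper functions are assumptions for illustration and should be checked against the content negotiation documentation before being relied on.

```python
import re
from typing import Optional

# Assumed profile URL prefix; verify against the content negotiation docs.
PROFILE_PREFIX = "https://www.mediawiki.org/wiki/Specs/HTML/"


def accept_header(version: str) -> str:
    """Build an Accept header value requesting a given specification version."""
    return f'text/html; profile="{PROFILE_PREFIX}{version}"'


def version_from_content_type(content_type: str) -> Optional[str]:
    """Extract the MAJOR.MINOR.PATCH version from a Content-Type value,
    or return None if no recognizable profile parameter is present."""
    pattern = r'profile="' + re.escape(PROFILE_PREFIX) + r'(\d+\.\d+\.\d+)"'
    match = re.search(pattern, content_type)
    return match.group(1) if match else None
```

A test could then send accept_header("2.4.0") with its request and compare the result of version_from_content_type on the response header with the version it expects to be served.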