(This announcement was posted to wikitech-l on 2024-06-06.)

As the Content Transform Team works towards defaulting to Parsoid for read views on wikis, we have been working to address the long tail of differences that might impact the ecosystem of tools that consume content.

We are rolling out two changes to Parsoid’s Cite output this week and next week. Both changes are meant to ensure that code that uses CSS selectors based on current read view HTML will continue to work properly with Parsoid’s HTML. Since there are also tools written to consume Parsoid HTML, we also want to ensure that those tools don’t break with these changes.

  • If you don’t maintain a tool that consumes Parsoid’s HTML, you will not be impacted by these changes.
  • If you maintain a tool that targets current read view HTML (whether desktop or mobile), these changes will potentially reduce or eliminate breakage when Parsoid’s HTML is used for read views.
  • If you maintain a tool that consumes Parsoid HTML, please read on to check whether you are impacted by these changes, which we believe is quite unlikely.

Changes being rolled out


This week, we are rolling out this change, which adds reference-text as a CSS class wherever Parsoid currently emits mw-reference-text. This is a non-breaking change that should not impact Parsoid clients in any way. Note that mw-reference-text is still the preferred class name to use in new code, but temporarily adding the legacy unprefixed class name as well will improve compatibility for some existing users.
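For tools that match on class names, the practical point is that a token-based class match is unaffected by the additional class, whereas an exact string comparison on the class attribute is not. The following is a minimal sketch using only the Python standard library. The `ReferenceTextCollector` class, the `reference_texts` helper, and the sample markup are illustrative assumptions, not part of Parsoid or of any MediaWiki client library.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ReferenceTextCollector(HTMLParser):
    """Collect the text of elements whose class list contains a given token."""

    def __init__(self, target_class="mw-reference-text"):
        super().__init__()
        self.target_class = target_class
        self.depth = 0    # > 0 while inside a matching element
        self.texts = []   # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
            return
        # Split the class attribute into tokens instead of comparing the
        # whole string, so extra classes such as "reference-text" are harmless.
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def reference_texts(html):
    """Return the text content of every mw-reference-text element in html."""
    collector = ReferenceTextCollector()
    collector.feed(html)
    collector.close()
    return collector.texts


# Simplified, hypothetical markup before and after the change.
before = '<span class="mw-reference-text">Foo</span>'
after = '<span class="mw-reference-text reference-text">Foo <i>bar</i></span>'
print(reference_texts(before), reference_texts(after))
```

A check such as `attrs["class"] == "mw-reference-text"` would stop matching once the second class is present, which is why the sketch compares tokens.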

Next week, we are rolling out this change, which adds the mw-cite-backlink class to Parsoid HTML. By itself, that would be a non-breaking change. However, the patch also adds a <span> wrapper around the back-link HTML for non-named references, to match the HTML structure for backlinks in the current read view HTML. Strictly speaking, this is a breaking change to Parsoid HTML, since altering the DOM tree structure could affect tools written to consume Parsoid HTML. We believe that the chance of this breakage is quite remote, as explained in the commit message of the linked patch.

In the unlikely scenario that your Parsoid-HTML tool is affected


If your tool depends on a CSS selector like li > a[rel="mw:referencedBy"], which might now break because of the intervening span wrapper, you will have to adjust your code to handle the wrapper.
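To illustrate the difference between a child match and a descendant match, here is a minimal sketch using only the Python standard library. It finds backlink anchors anywhere inside an li, which is the equivalent of the selector li a[rel="mw:referencedBy"], and so behaves the same with or without the wrapper. The `BacklinkFinder` class, the `find_backlinks` helper, and the sample markup are hypothetical and heavily simplified; real Parsoid output carries more attributes.

```python
from html.parser import HTMLParser

# Elements that never have an end tag and so never stay on the open-tag stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BacklinkFinder(HTMLParser):
    """Record hrefs of <a rel="mw:referencedBy"> found at any depth in an <li>."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open element names
        self.backlinks = []   # hrefs of matching anchors, in document order

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").split()
        # Descendant match: any <li> ancestor qualifies, not only the parent.
        if tag == "a" and "mw:referencedBy" in rel_tokens and "li" in self.open_tags:
            self.backlinks.append(attr.get("href"))
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, tolerating
        # elements that were left unclosed in the input.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def find_backlinks(html):
    finder = BacklinkFinder()
    finder.feed(html)
    finder.close()
    return finder.backlinks


# Hypothetical markup without and with the span wrapper.
without_wrapper = (
    '<ol><li id="cite_note-1">'
    '<a href="./Page#cite_ref-1" rel="mw:referencedBy">^</a> '
    '<span class="mw-reference-text">Foo</span></li></ol>'
)
with_wrapper = (
    '<ol><li id="cite_note-1"><span class="mw-cite-backlink">'
    '<a href="./Page#cite_ref-1" rel="mw:referencedBy">^</a></span> '
    '<span class="mw-reference-text">Foo</span></li></ol>'
)
print(find_backlinks(without_wrapper) == find_backlinks(with_wrapper))
```

In a selector-based tool, the same effect comes from replacing the child combinator with a descendant combinator, or from selecting on a[rel="mw:referencedBy"] alone.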

The simplest thing to do would be to drop the unnecessary li > prefix in the selector. If for some reason you cannot, note that the HTML is cached, so until the caches roll over (up to 4 weeks) you must be ready to handle HTML both with and without the extra span wrapper. You can look at the data-mw-parsoid-version attribute on the element matching the div.mw-parser-output selector to determine whether these changes are present on a page. If the value of the data-mw-parsoid-version attribute is or later, you are working with Parsoid HTML with the additional wrapper span. For Parsoid version or earlier, you are working with Parsoid HTML without the additional wrapper.
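The version check described above could be sketched as follows, again using only the Python standard library. The helper names are hypothetical, and the cutoff version is a parameter the caller must supply; the values used below are placeholders, not the real cutoff. The comparison is deliberately naive: it assumes that version strings can be ordered by the integers they contain, which may not hold when a pre-release suffix is compared against a plain release.

```python
import re
from html.parser import HTMLParser


class ParsoidVersionReader(HTMLParser):
    """Read data-mw-parsoid-version from the first div.mw-parser-output."""

    def __init__(self):
        super().__init__()
        self.version = None

    def handle_starttag(self, tag, attrs):
        if self.version is not None or tag != "div":
            return
        attr = dict(attrs)
        if "mw-parser-output" in (attr.get("class") or "").split():
            self.version = attr.get("data-mw-parsoid-version")


def parsoid_version(html):
    """Return the version string, or None if the attribute is not found."""
    reader = ParsoidVersionReader()
    reader.feed(html)
    reader.close()
    return reader.version


def version_key(version):
    # Assumption: ordering by the integers in the string is good enough.
    return tuple(int(part) for part in re.findall(r"\d+", version))


def has_backlink_wrapper(html, first_wrapped_version):
    """True or False when the version is known, None when it is missing.

    first_wrapped_version is the caller-supplied cutoff: the first Parsoid
    version whose output includes the span wrapper.
    """
    version = parsoid_version(html)
    if version is None:
        return None
    return version_key(version) >= version_key(first_wrapped_version)


# Placeholder version numbers, chosen only to exercise the comparison.
page = ('<div class="mw-content-ltr mw-parser-output" '
        'data-mw-parsoid-version="9.9.9"><p>x</p></div>')
print(parsoid_version(page), has_backlink_wrapper(page, "9.9.9"))
```

Returning None when the attribute is absent lets the caller fall back to code that handles both HTML shapes rather than guessing.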

Why are we skipping a major version bump?


Normally, breaking changes to HTML structure would require a major version bump to Parsoid’s HTML version (currently at 2.8.0). However, the content negotiation protocol is currently broken in the RESTBase + core REST API + Parsoid combination, and RESTBase is being actively deprecated and removed. So we feel that we should wait to fix the content negotiation protocol implementation at least until RESTBase is out of the picture. At the same time, we do not want to unduly delay the rollout of Parsoid HTML read views. Given the nature of the change (as noted above), we feel that breakage is extremely unlikely.