Parsoid/So you want your extension to work with Parsoid

This page gathers information about developing extensions that work with Parsoid, in the form of a Frequently Asked Questions (FAQ) list.

What's the point?


To make Parsoid the default MediaWiki wikitext engine, the dependencies on the legacy parser must be removed. This means that the Parser.php class and the parser hooks will be deprecated at some point. To prepare for that, affected extensions should transition to the Parsoid API.

Is my extension affected?


An extension is affected if it does any of the following:

  • uses the Parser object
  • has a parser hook
  • uses ParserOutput::getExtensionData on pre-cached content

What do I need to do?


The following steps are documented in later sections of this page; this section is a quick checklist of all the necessary steps.

  • Register the extension with Parsoid
  • Create hooks for Parsoid to reproduce the functionality of the existing parser hooks
  • Refactor the code that references Parser to provide a version that uses the Parsoid extension API for the Parsoid hooks
  • Possibly: add a dependency on Parsoid in the CI of the extension
  • Possibly: refactor code to use ParserOutput::appendExtensionData instead of ::getExtensionData followed by ::setExtensionData

How do I register the extension with Parsoid?


This is documented in detail at Extension registration and configuration. You'll need to edit extension.json to either flesh out the whole configuration there or to refer to a class implementing the Wikimedia\Parsoid\Ext\ExtensionModule interface. There is a small preference among the Parsoid developers for the second option, in order to be consistent with existing code bases and conventions.
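As an illustration of the second option, here is a minimal sketch of a module class. The namespace, the class name MyExtensionModule, the tag name mytag and the handler class MyTagHandler are all hypothetical placeholders. To our understanding, the class is then listed under the ParsoidModules key of extension.json; check Extension registration and configuration for the authoritative list of configuration keys.

```php
<?php
// Hypothetical namespace and class names, used for illustration only.
namespace MediaWiki\Extension\MyExtension\Parsoid;

use Wikimedia\Parsoid\Ext\ExtensionModule;

class MyExtensionModule implements ExtensionModule {
	/**
	 * Describe the extension to Parsoid.
	 * @return array configuration array for this module
	 */
	public function getConfig(): array {
		return [
			// Name under which the module is registered.
			'name' => 'MyExtension',
			// One entry per extension tag; MyTagHandler is a
			// hypothetical class extending ExtensionTagHandler.
			'tags' => [
				[ 'name' => 'mytag', 'handler' => MyTagHandler::class ],
			],
		];
	}
}
```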

How do I create hooks for Parsoid?


This is documented in detail at Extension registration and configuration and Mapping existing parser hooks. You'll need classes that extend Wikimedia\Parsoid\Ext\DOMProcessor or Wikimedia\Parsoid\Ext\ExtensionTagHandler, depending on the hooks you want to implement.
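A minimal sketch of a tag handler could look like the following. It assumes a hypothetical mytag extension tag whose content is ordinary wikitext to be wrapped in a div. The class name and namespace are placeholders, and the options passed to extTagToDOM are an assumption based on how existing extensions appear to use it, so check them against the ParsoidExtensionAPI documentation before relying on them.

```php
<?php
// Hypothetical namespace and class names, used for illustration only.
namespace MediaWiki\Extension\MyExtension\Parsoid;

use Wikimedia\Parsoid\Ext\ExtensionTagHandler;
use Wikimedia\Parsoid\Ext\ParsoidExtensionAPI;

class MyTagHandler extends ExtensionTagHandler {
	/**
	 * Called by Parsoid for each <mytag>...</mytag> found on the page.
	 *
	 * @param ParsoidExtensionAPI $extApi entry point into the Parsoid API
	 * @param string $src wikitext between the opening and closing tags
	 * @param array $extArgs attributes given on the opening tag
	 * @return \Wikimedia\Parsoid\DOM\DocumentFragment|false
	 */
	public function sourceToDom(
		ParsoidExtensionAPI $extApi, string $src, array $extArgs
	) {
		// Parse the tag content as wikitext and wrap the result in a <div>.
		// The option names below are assumptions; verify them before use.
		return $extApi->extTagToDOM( $extArgs, $src, [
			'wrapperTag' => 'div',
			'parseOpts' => [ 'extTag' => 'mytag' ],
		] );
	}
}
```

Note that the handler never touches the Parser object: everything it needs comes from the ParsoidExtensionAPI instance it receives.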

How do I refactor my uses of the Parser class?


The hooks provide access to a Wikimedia\Parsoid\Ext\ParsoidExtensionAPI object, which contains entry points into the Parsoid API. If the needs of your extension are not met by this API, please contact the Content Transform Team by opening a Phabricator ticket so that we can discuss solutions.

How do I use appendExtensionData?


Instead of reading from extension data, modifying the result, and then rewriting it, use a combination of:

  • Writes to distinct keys
  • The ParserOutput::appendExtensionData method to collect lists

You can look at the existing patches linked to T300981 for examples. For example, instead of:

// In Parsoid, this is not guaranteed to fetch data from other
// instances of your extension on this page.
$data = $parserOutput->getExtensionData("my-extension");
$data[] = $someComplexObject;
$parserOutput->setExtensionData("my-extension", $data);

Use a pattern like:

// Generate a stable key, for example by hashing the input.
// ($input is a placeholder for whatever uniquely identifies this item.)
$uuid = md5( $input );
// This will never overwrite an existing key
// (or if it does, $someComplexObject should be identical)
$parserOutput->setExtensionData("myextension-$uuid", $someComplexObject);
// This is a mergeable update:
$parserOutput->appendExtensionData("myextension", $uuid);
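To read the data back later, a sketch along the following lines may help. It assumes that values added with appendExtensionData come back from getExtensionData as the keys of a set-like array, which is our understanding of the current implementation but should be checked against the ParserOutput documentation. The function name collectMyExtensionObjects is hypothetical.

```php
<?php
/**
 * Collect every object stored with the pattern above.
 * (Hypothetical helper, not part of MediaWiki.)
 *
 * @param object $parserOutput a ParserOutput instance
 * @return array map of UUID => stored object
 */
function collectMyExtensionObjects( $parserOutput ): array {
	// Assumption: appended values are returned as the keys of an array.
	$uuids = array_keys( $parserOutput->getExtensionData( 'myextension' ) ?? [] );

	$objects = [];
	foreach ( $uuids as $uuid ) {
		// Each object was written once, under its own distinct key.
		$objects[$uuid] = $parserOutput->getExtensionData( "myextension-$uuid" );
	}
	return $objects;
}
```

Because every write goes to a distinct key or is an append, the result should not depend on the order in which the instances of your extension were processed.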

My CI is complaining about something else: it seems it can't find Parsoid code


Have a look at Parsoid's Continuous Integration section; it might help.

What about testing?


During the continuous integration build of an extension, .txt files in tests/parser/ get added to the list of parser tests to run. These will also be run against Parsoid under two conditions: (a) the test file contains the parsoid-compatible flag, and (b) the ParsoidPageConfigFactory class is available. Condition (a) is handled in the test file itself, by giving it the following header:

!! options
parsoid-compatible
!! end

As of January 2022, (b) requires adding 'parsoid' as a dependency of the extension in the CI configuration. An example of this configuration is provided in Gerrit change 745929. Note that this means the tests are run against the tip of the Parsoid branch corresponding to the extension branch (by default, master) and not against the version of Parsoid referenced by MediaWiki core.

Do you have a minimal example for an extension that was easy to adapt?


Yes – have a look at Gerrit change 751751.