Parsoid/Todo:PHP parser integration

Extension expansion


Most extensions depend on neither call order nor frame state, so they can be expanded in parallel and out of order. The extensions listed below are the exceptions among the 455 extensions in the Wikimedia extensions.git repository.

Extension tags depending on frame state


The following extensions define extension tags (which are not run by the PHP preprocessor) that depend on the frame state (grep -r 'frame->expand' extensions; grep -r 'frame->getArguments' extensions):

  • Arrays (frame->expand, shared state so order-dependent)
  • Carp (debugging extension, low-level frame access)
  • ExtTab / ET_ParserFunction (frame->expand)
  • FacebookOpenGraph (parser->replaceVariables, parser->recursiveTagParse)
  • HTMLTags (parser->replaceVariables)
  • HashTables (frame->expand, frame->getArguments, order-dependent)
  • LabeledSectionTransclusion (frame->expand)
  • Loops (frame->expand, frame->getArgument, order/nesting-dependent)
  • Poem (parser->recursiveTagParse)
  • RSS (parser->recursiveTagParse on an optional per-RSS-item wikitext-based template)
  • SelectTag (parser->recursiveTagParse)
  • SoundManager2Button (parser->recursiveTagParse)
  • Spark (parser->replaceVariables)
  • Validator (parser->recursiveTagParse)
  • WikitextLoggedInOut (parser->recursiveTagParse)

Parser functions depending on frame state


These extensions only define parser functions (which are run by the preprocessor) that depend on the frame state:

  • CreatePage (frame->expand)
  • GeoData (frame->expand)
  • PageInCat (frame->expand)
  • ParserFun (frame->expand, frame stack access, ...)
  • ParserFunctions (frame->expand etc.)
  • RegexFun (low-level frame access)
  • ReplaceSet (frame->expand)
  • Scribunto (frame->getArguments)
  • SemanticForms (frame->expand)
  • SemanticMediaWiki (frame->expand etc.; it is unclear whether it also registers tags)
  • SubpageFun (frame->expand)
  • WikiLovesMonuments (frame->expand, frame->getArgument)

Order-dependent parser functions:

  • UserFunctions (dynamic user-defined parser functions)

Parser functions adding global state:

  • Description2 (frame->expand); also adds an output hook which adds a global meta tag to the parser output

Order-dependent extensions


These typically maintain internal state between calls and expect all hooks in a page to be called sequentially.

Enabled on WMF wikis

  • Cite: We have a strategy for handling this: re-render the references sections and the numbering as a post-processing step on the full DOM.
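
The renumbering part of such a post-processing pass could look roughly like the sketch below. It is an illustration only, not Cite's actual algorithm: it assumes the references have already been collected from the DOM in document order as (name, text) pairs, and the function name number_references is hypothetical.

```python
def number_references(refs):
    """Assign reference numbers in document order.

    refs: list of (name, text) pairs in document order; name is None for
    anonymous references. This input shape is an assumption of the sketch.
    Returns (markers, reflist): the number to show at each reference site,
    and the texts for the references section in numbered order.
    """
    numbers = {}   # reference name -> assigned number
    reflist = []   # reference texts, index = number - 1
    markers = []   # one number per reference site
    for name, text in refs:
        if name is not None and name in numbers:
            # A named reference that was seen before reuses its number.
            n = numbers[name]
            # If the earlier use had no text, take it from this use.
            if text and not reflist[n - 1]:
                reflist[n - 1] = text
        else:
            reflist.append(text)
            n = len(reflist)
            if name is not None:
                numbers[name] = n
        markers.append(n)
    return markers, reflist


markers, reflist = number_references(
    [(None, "A"), ("x", "B"), ("x", ""), (None, "C")]
)
# markers == [1, 2, 2, 3], reflist == ["A", "B", "C"]
```

Because the numbering depends only on document order, it can run after all other fragments have been expanded in parallel.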

Third party


Preprocessor (Function hooks):

  • Arrays: explicitly defines mutable state in arrays. WONTFIX.

Possible solutions

  • Add an expandTemplatesAndMostTagExtensions API method to the MW API that expands all templates and most extensions (possibly all except Cite).
    • Top-level template expansion is probably an acceptable granularity for incremental updates; highly dynamic extensions can still be inserted without a template wrapper to avoid template re-expansions
    • Avoids the need to serialize out & send back frame information for most extensions
    • Should add encapsulation tags around extension output so that we can treat it differently for sanitization purposes
  • Parse all templates in a single action=parse call, separated with unique strings so that the results can be split per template transclusion
    • Problem: Single-threaded, hides a lot of information we would like to have.
  • Instrument the PHP preprocessor to provide a serialized frame parameter for unexpanded extension tags
    • Lets us perform the expansion independently
  • Add API method for direct extension calls rather than action=parse
    • Can support wikitext-returning tag extensions (TODO: find those!)
    • Extension calls still needed for top-level extensions even with an expandTemplatesAndMostTagExtensions API method
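
The "single action=parse call with unique separators" option above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names and the PARSOID-SPLIT- marker prefix are hypothetical, and a real parser may wrap or rewrite the marker lines (for example in paragraph tags), which this sketch does not handle.

```python
import uuid


def join_transclusions(transclusions):
    """Join one or more wikitext snippets with a unique marker.

    Returns (batch, marker). The marker prefix is a made-up name.
    """
    while True:
        marker = "PARSOID-SPLIT-" + uuid.uuid4().hex
        # Regenerate in the unlikely case the marker occurs in the input.
        if not any(marker in t for t in transclusions):
            break
    # Put the marker on its own line so it does not merge with markup.
    return ("\n" + marker + "\n").join(transclusions), marker


def split_expansion(expanded, marker, expected):
    """Split the expanded batch back into per-transclusion results."""
    parts = expanded.split(marker)
    if len(parts) != expected:
        raise ValueError(
            "expected %d parts, got %d" % (expected, len(parts))
        )
    # Drop the newlines that were added around each marker.
    return [p.strip("\n") for p in parts]


batch, marker = join_transclusions(["{{a}}", "{{b|1}}"])
# Stand-in for the real expansion call, which is not modelled here.
fake_output = batch.replace("{{a}}", "A").replace("{{b|1}}", "B1")
results = split_expansion(fake_output, marker, 2)
# results == ["A", "B1"]
```

The sketch also shows the drawback noted above: the split results are plain strings, so per-transclusion information such as the templates used is lost.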

Information we would like to get from action=expandtemplates and extension expansions

  • List of templates and parser functions used in the expansion
    • Lets us track dependencies and cacheability for selective re-rendering (examples: #time-dependent output as used on en:Main Page, or which templates should trigger re-rendering of a fragment).
  • A TTL if the output is time-sensitive (for example, when #time is used): the minimum of the TTLs of all parser functions used, if any.
  • Re-render events this content block depends on. Can be empty or any combination of the events listed in the fragment index section.
  • (Maybe:) Serialized frame for tag extensions in template expansion output
    • Lets us expand those extension tags with the proper frame
    • BUT: Parent frame access is not generally provided by common extensions: User:GWicke/Test
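
How the TTL and re-render events of a content block might be derived from per-expansion metadata is sketched below. The ExpansionInfo record, its field names and the event name used in the example are all hypothetical; the real event names are those of the fragment index section, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpansionInfo:
    """Metadata for one template or parser function (hypothetical shape)."""
    name: str
    ttl: Optional[int] = None          # seconds; None = not time-sensitive
    events: frozenset = frozenset()    # re-render events it depends on


def fragment_ttl(infos):
    """Minimum TTL over all time-sensitive expansions, or None if none."""
    ttls = [info.ttl for info in infos if info.ttl is not None]
    return min(ttls) if ttls else None


def fragment_events(infos):
    """Union of the re-render events of all expansions (may be empty)."""
    events = set()
    for info in infos:
        events |= info.events
    return events


infos = [
    ExpansionInfo("#time", ttl=3600),
    ExpansionInfo("Template:Foo"),
    # "example-event" is a placeholder, not a real event name.
    ExpansionInfo("#time", ttl=60, events=frozenset({"example-event"})),
]
# fragment_ttl(infos) == 60
# fragment_events(infos) == {"example-event"}
```

Taking the minimum means a fragment is re-rendered as soon as its most short-lived input expires; a fragment with no time-sensitive input has no TTL at all.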

See also Parsoid/Page metadata.