Parsing/Notes/Wikitext 2.0

This is a high-level document about how to evolve wikitext. Parsing/Notes/Wikitext 2.0/Strawman Spec has an initial strawman spec that tries to implement the proposal described here.

Wikitext 2.0: Typed wikitext (?)

Think of this as retroactively imposing a type system on top of the existing wikitext markup language. Editors don't need to know about any of this typing except when they want to control how transclusions behave / compose with the surrounding context. This is purely a layered abstraction on top of the string-based model; as such, it won't involve any drastic changes to wikitext itself, but it could lay the path for changes in the future.

Typed wikitext constructs

  • Wikitext constructs (be they a piece of text, links, list markup, nowiki markup, extensions, transclusions, etc.) return a string, but are interpreted as a typed value. The available types are:
  1. string
  2. a key value pair
  3. a DOM tree
  • More generally, 2. and 3. would be a map and a DOM forest. We might also expand 3. into a bunch of specialized types for reasons that will become clear later on. But in the interest of simplicity, let us ignore these details now.
  • The types are treated as constraints that are enforced on the output of the wikitext construct, i.e. if a template declares its output to be a DOM tree, then the output of the template will get that type enforced. Correspondingly, if the output of the template is declared to be a string, all HTML tags inside it will be appropriately escaped so that they render as a string. So, for example, for use cases where HTML attributes need to be templated, you can only use constructs that return a string or a key-value pair. If a DOM tree typed construct is used, it could be ignored (silently, or loudly with error markup in the output). The sketch below illustrates this.
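
A minimal sketch of this typing in TypeScript, assuming a browser-style DOM environment; ConstructValue and the function names below are hypothetical illustrations, not Parsoid APIs.

```typescript
// The three value types, as a tagged union.
type ConstructValue =
  | { type: 'string'; value: string }
  | { type: 'kv'; value: Map<string, string> }
  | { type: 'dom'; value: DocumentFragment };

// Enforcing the 'string' type: HTML special characters in the raw
// output are escaped, so any tags render as literal text.
function enforceStringType(rawOutput: string): ConstructValue {
  const escaped = rawOutput
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  return { type: 'string', value: escaped };
}

// In attribute position, only string- or kv-typed values are usable;
// a dom-typed value is rejected (silently here; error markup in the
// output is the louder alternative).
function useInAttributeContext(v: ConstructValue): string | null {
  switch (v.type) {
    case 'string': return v.value;
    case 'kv':     return [...v.value].map(([k, val]) => `${k}="${val}"`).join(' ');
    case 'dom':    return null;
  }
}
```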

Document composition spec / rules

  • The result of this typed wikitext notion is that for all DOM tree returning constructs, we now have a bunch of DOM fragments that need to be composed together into a top-level document. This composition is trivial in the non-nesting cases. However, where there is nesting, we also need to specify fragment composition rules that define how composition is going to work (see document composability). One way of resolving this is to not specify any composition rules in the wikitext spec and let HTML5 tree building resolve nesting scenarios according to the HTML5 content model. This is the wikitext 1.0 status quo, but we can do much better than that and build upon it. The balanced templates RFC proposes finer-grained types for the output of templates that de facto define how the template output is going to compose with the surrounding context.
  • Markup errors are going to continue to be an issue. The DOM tree type for a construct implicitly enforces fixups of unclosed and mis-nested tags within the output of the construct (ex: unclosed inline/phrase tags are going to get implicitly fixed up within list items, paragraphs, table cells, etc. and won't leak out of those contexts). However, we would nevertheless benefit from a notion of top-level DOM-scopes that bound the range of unclosed tags. One way of doing this is defining an implicit top-level section construct (much like the implicit paragraph wikitext construct) whose output is naturally a DOM tree; see the sketch below. This takes care of unclosed tags within a section. But this potentially eliminates one use case seen on wikis: wrapping the entire page, or multiple sections, in <div> tags. We need to figure out how that fits into this model.
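
A sketch of per-section composition, assuming a DOM environment and a hypothetical parseToFragment() helper that runs HTML5 tree building on a chunk in isolation, so unclosed tags are fixed up inside it and cannot leak out:

```typescript
declare function parseToFragment(html: string): DocumentFragment;

// Each top-level section is an implicit DOM-scope whose output is
// naturally a DOM tree; an unclosed tag in one section cannot affect
// the sections that follow it.
function composeSections(sectionSources: string[]): DocumentFragment {
  const result = document.createDocumentFragment();
  for (const src of sectionSources) {
    const section = document.createElement('section');
    section.appendChild(parseToFragment(src)); // fixups happen per section
    result.appendChild(section);
  }
  return result;
}
```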

Metadata as annotations on the DOM tree

  • In addition to returning a typed value, a construct can also return metadata about the construct / document / page. For example: categories, refs, lang links, etc. The metadata is represented as annotations on nodes of a DOM tree. This is a detail that we can ignore for the moment — to be elaborated upon later.
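
As a rough illustration, metadata could be attached to DOM nodes via a side table; Annotation and annotate below are hypothetical names, not settled design.

```typescript
interface Annotation {
  kind: 'category' | 'ref' | 'langlink';
  value: string;
}

const annotations = new WeakMap<Node, Annotation[]>();

// Categories, refs, lang links, etc. ride along with the node that
// produced them rather than being global side effects of parsing.
function annotate(node: Node, a: Annotation): void {
  const list = annotations.get(node) ?? [];
  list.push(a);
  annotations.set(node, list);
}
```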

Implications of this model

There are a bunch of implications that naturally fall out of the above model / layer.

Decoupling top-level document from extensions and templates

  • The top-level document becomes independent of extensions and templates, i.e. the top-level markup can be seen as producing a document with holes into which the output of extensions and templates gets slotted. So, '''foo {{some-template}} bar''' will *always* parse to a bold tag no matter what {{some-template}} outputs. Since the template output is nested within the tag, document composition rules for nesting will determine what the composed document looks like, but they are not going to alter the fact that at the top level, both foo and bar will be bolded.
  • This is a big change from wikitext 1.0, where you cannot parse the document till all transclusions are expanded. This has implications for performance (you can parse the top-level document, templates, and extensions in parallel and compose the outputs together by relying on document composition rules). It also has implications for how editors reason about wikitext, as explained earlier.
  • You can cache the output of templates and extensions separately from the page they were found on (with the caveat that they must not rely on volatile inputs like the time of day). For example, {{convert|1|km}} can be cached and reused wherever it occurs; the sketch after this list shows the idea.
  • Transclusions, extensions, and any other HTML-producing tools are all identical. They all fill DOM-shaped holes in the top-level document. For all practical purposes, a transclusion can be modelled as an extension that has its own special-purpose shortcut syntax {{..}} instead of <xyz>..</xyz>.
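
A sketch of the parallel expand-and-compose pipeline with caching. This is a sketch under stated assumptions: parseTopLevel, expandConstruct, composeInto, and the Hole shape are all hypothetical.

```typescript
interface Hole { node: Node; call: string; }  // e.g. call = '{{convert|1|km}}'

declare function parseTopLevel(wikitext: string): { tree: DocumentFragment; holes: Hole[] };
declare function expandConstruct(call: string): Promise<DocumentFragment>;
declare function composeInto(hole: Hole, fragment: DocumentFragment): void;

// Non-volatile construct output, cached by its full call string.
const cache = new Map<string, DocumentFragment>();

async function parsePage(wikitext: string): Promise<DocumentFragment> {
  // The top-level document parses without expanding any transclusions...
  const { tree, holes } = parseTopLevel(wikitext);
  // ...and every hole can be filled independently, in parallel.
  await Promise.all(holes.map(async (hole) => {
    let frag = cache.get(hole.call);
    if (!frag) {
      frag = await expandConstruct(hole.call);
      cache.set(hole.call, frag);
    }
    composeInto(hole, frag.cloneNode(true) as DocumentFragment);
  }));
  return tree;
}
```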

Clearer spec for nesting rules

  • The ability to define and spec composition rules lets us deal with pesky problems like links-nested-in-links (common on wikis) and fosterable-content-in-tables (also common on wikis), which introduce a fair bit of complexity in Parsoid and a lot of known rendering errors on wikis.
  • For example, we could spec a behaviour where, when content is nested inside a link, all links within the nested content are stripped.
  • Similarly, we could spec a behaviour where, when content is nested inside <tbody> but outside a table-cell, we either (a) implicitly add any necessary table rows / cells to hold that content, (b) quietly swallow the content and not render it, or (c) swallow the content but render error markup so editors can fix it.
  • These notes about composition constraints are a start towards developing this spec; the sketch below illustrates two candidate rules.
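
A sketch of the two rules above as DOM normalization passes, assuming a browser-style DOM; the chosen behaviours are options from the list above, not a settled spec.

```typescript
// Links-nested-in-links: strip any link inside another link by
// replacing the inner <a> with its children.
function stripNestedLinks(outerLink: HTMLAnchorElement): void {
  for (const inner of Array.from(outerLink.querySelectorAll('a'))) {
    inner.replaceWith(...Array.from(inner.childNodes));
  }
}

// Fosterable content: content inside <tbody> but outside a cell gets
// an implicitly added row + cell to hold it (option (a) above).
function wrapFosterableContent(tbody: HTMLTableSectionElement): void {
  for (const child of Array.from(tbody.childNodes)) {
    if (child instanceof HTMLTableRowElement) continue;
    // Skip insignificant whitespace between rows.
    if (child.nodeType === Node.TEXT_NODE && !child.textContent?.trim()) continue;
    const row = document.createElement('tr');
    const cell = document.createElement('td');
    row.appendChild(cell);
    tbody.replaceChild(row, child); // put the new row where the content was...
    cell.appendChild(child);        // ...and move the content into its cell
  }
}
```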

Well-structured / specified output (?)

  • The DOM tree typing makes supporting tools like VE and CX simpler, without the kind of contortions that Parsoid currently goes through.
  • We can continue to use (a variation of) the Parsoid DOM spec that continues to provide semantic markup on special constructs.
  • With some carefully crafted composition rules, you can even view the original wikitext as a nested tree, which benefits source editing tools.

Fine-grained editing beyond top-level sections

  • Sections are no longer special with respect to sub-page editing. They are simple to support since they are direct children of the <body> tag, and the simple composition rule of concatenating section HTML works because there are no nesting issues involved. But if we are going to define a composition spec for the document anyway, we can use it to enable fine-grained editing at any level in the DOM tree of the top-level document. However, in order to prevent surprises for the editor, the UI might also provide the editor the surrounding page context. For example, the UI might tell the editor that the text being edited is nested inside a link, along with a note that any link added to the current text would be stripped out (assuming that is what our composition spec dictates).
  • This has implications for edit conflicts and real-time collaboration as well, in that more parts of the document could be edited concurrently without stepping on each other's toes. Based on the path of the node in the tree, you can even warn about potential edit conflicts if an editor decides to edit any node on the path from the current node up to the root of the tree (see the sketch below). Or, you could enable editing those parts but flag / lock the node currently being edited by another editor.
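
A sketch of path-based conflict detection, assuming a DOM environment; nodePath and mayConflict are hypothetical helpers.

```typescript
// A node's path is its chain of child indexes down from the root.
function nodePath(node: Node, root: Node): number[] {
  const path: number[] = [];
  let cur: Node = node;
  while (cur !== root && cur.parentNode) {
    path.unshift(Array.prototype.indexOf.call(cur.parentNode.childNodes, cur));
    cur = cur.parentNode;
  }
  return path;
}

// Two concurrent edits can conflict only when one edited node is an
// ancestor of (or the same as) the other, i.e. one path is a prefix
// of the other. Disjoint subtrees are safe to edit concurrently.
function mayConflict(a: number[], b: number[]): boolean {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}
```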

Defining an extension spec

  • Since extensions are going to be useful and will continue to be developed, we need to define an extension spec that is not dependent on the internals of the parsing tools. In the common case, all extensions need to do is the normal thing, i.e. take some input and produce a typed value (usually a DOM tree) which gets composed into the document.
  • But there will continue to be extensions like Cite that need to do some post-processing on the document. So, they can be provided hooks wherein they get to post-process the document. For example, the extension could process metadata annotated onto the DOM tree to collect <ref>s, generate the appropriate <ref> links, and generate <references> output (this is how Parsoid's Cite implementation is structured). So, the spec would also need to provide mechanisms to attach metadata to the source document (a sketch of such an interface follows this list).
  • There may be other use cases that fall out by examining the existing extensions and how they are used.
  • Extensions like <translate> could potentially be modelled as tools that decorate the DOM tree with metadata annotations. These metadata annotations can then be used by editing tools to provide translation interfaces.
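
A sketch of what such an extension interface might look like; the names are hypothetical (reusing the ConstructValue and Annotation types sketched earlier), not an actual MediaWiki API.

```typescript
interface ExtensionResult {
  value: ConstructValue;       // typed output; a DOM tree in the common case
  annotations?: Annotation[];  // metadata to attach, e.g. a collected <ref>
}

interface Extension {
  name: string;
  // The common case: input text plus args in, typed value out.
  toTypedValue(input: string, args: Map<string, string>): ExtensionResult;
  // Optional hook for extensions like Cite that need a pass over the
  // fully composed document (e.g. to generate <references> output).
  postProcessDocument?(doc: DocumentFragment): void;
}
```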

Separating raw parser output from final MediaWiki output

  • It would be useful to separate the raw parser output from the final MediaWiki output, i.e. the parser doesn't concern itself with database state (red links, bad images, resource sizes, etc.). It produces output that is then post-processed to generate the final MediaWiki output. This further decouples the parser from MediaWiki internals and lets different platforms (mobile, desktop, etc.) define their own post-processing transformations. This might primarily be an architectural / processing-model distinction; the implementation might be structured differently. The sketch below shows one such pass.
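
For example, red-link marking could be a post-parse pass over database-independent parser output. The data-target attribute and pageExists helper below are hypothetical, though 'new' is the class MediaWiki actually uses for red links.

```typescript
declare function pageExists(title: string): Promise<boolean>;

interface PostProcessor {
  run(doc: DocumentFragment): Promise<void>;
}

const redLinkPass: PostProcessor = {
  async run(doc) {
    // The parser only recorded each link's target; the DB lookup
    // happens here, outside the parser proper.
    for (const a of Array.from(doc.querySelectorAll('a[data-target]'))) {
      if (!(await pageExists(a.getAttribute('data-target')!))) {
        a.classList.add('new');
      }
    }
  },
};
```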

Getting there

This is just a thought dump for now.

  • Since the value types as well as the document composition rules are layered on top of base wikitext 1.0, we can rely on the content handler abstraction to give us a fairly clean way of transitioning to the newer model.
  • If a page declares itself to be a wikitext 2.0 page, then this abstraction layer kicks in. In practice, the page will be handled by a totally different code path (see the sketch after this list).
  • But, this layering also lets us use today's templates on both wikitext 1.0 and wikitext 2.0 pages and gradually migrate templates and pages over. This is based on the observation that in practice, a large proportion of templates are wikitext 2.0 compliant, i.e. they return well-formed DOMs and in many cases, document composition is trivial without requiring any fix-ups.
  • In addition, Parsoid has sufficient information about existing templates and their uses to identify where we would need to either develop new templates or tweak existing templates to make them usable / compliant with wikitext 2.0.
  • We can rely on 'undefined behaviour' spec magic to let grey areas slide while also providing editors a good way to fix such behaviour.
  • We will need some form of new syntax (heredoc-style transclusion syntax OR a new extension wrapper like <domparse>, <domscope>, <dom>) to deal with use cases where a table (almost always) or other DOM structure is built up from multiple templates (table-start, table-row, table-end). As more pages migrate to using this new formulation, more pages would be wikitext 2.0 compliant.
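
A sketch of how the content handler abstraction could route pages to the right code path; the 'wikitext-2.0' model id and both parse functions are hypothetical stand-ins.

```typescript
declare function parseWikitext1(src: string): DocumentFragment; // today's pipeline
declare function parseWikitext2(src: string): DocumentFragment; // typed-value pipeline

// Pages keep working under the old model until they opt in to 2.0.
function parseByContentModel(model: string, src: string): DocumentFragment {
  switch (model) {
    case 'wikitext':     return parseWikitext1(src);
    case 'wikitext-2.0': return parseWikitext2(src);
    default: throw new Error(`no handler for content model: ${model}`);
  }
}
```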

Related documents

  • Typed Templates - a followup proposal for assigning user-defined types (a la classes in some OO programming languages) to templates