User:SSastry (WMF)/Notes/Document Composability

WIP: This pulls together notes from separate documents (Parsoid/DOM notes, Wikitext notes, notes about incremental parsing) and from different Phabricator tasks into a coherent writeup.

Composability

The main requirement is one of being able to compose a document from DOM fragments, independent of how the fragments are generated (plain markup, transclusions, extensions, widgets, components, etc.)

Given document composability,

it is simpler to reason about the document and the markup
WYSIWYG editability is enabled more easily
edits can be processed a lot more efficiently
the target document can be easily tailored to different device / network contexts
... other benefits ...

So, it is sufficient to focus on this high level goal of enabling composability of a document from DOM fragments.

Questions

1. Do we need the ability to compose DOM fragments? If the answer is no, we can pack up and go home.
2. What are good candidates for DOM fragments?
- 2a. How are they specified in input markup? Are they implicit or explicit? If explicit, are they opt-in or opt-out markers?
- 2b. How are they identified in output markup, if at all?
3. What kind of constraints need to be respected while composing a document from DOM fragments?
4. What approaches exist for enforcing constraints?
5. How are old revisions handled?
6. What pieces of this functionality need to be supported in the core parser and Parsoid?

Wikitext and composability

Wikitext does not have the notion of being able to compose the final document from individual DOM fragments. Even with transclusions and extensions, what is composed is the source markup not the target document, i.e. you construct the complete / final source and derive the target document from it.

What are good DOM fragments?

If we want to support document composition from DOM fragments, we need to figure out what are good candidates for fragments and how they are specified. A related question is: do these fragments need to be marked up specially in output?

task T114444 suggests that sections, lists, tables, and potentially other constructs can all be treated as DOM fragments. The RFC proposes that these boundaries are specified by changing wikitext semantics (DOM scopes). So, there is no explicit markup or opting in required.
task T114445 proposes that templates be treated as fragment generators. Specifically, it proposes that template authors opt in to this behavior by adding markup to template output.
task T114072 proposes that sections be treated as DOM fragments (and specifically requests special output markup). There are some details to be worked out as to whether this behavior is enforceable everywhere or not.
task T105845 is somewhat related. It picks a narrow subset of templates used for navboxes and infoboxes and discusses other DOM-generating components. The primary concern is about the form of the output DOM and requests special output markup as with task T114072.
task T114432 is tangentially related. It proposes heredoc syntax for templates for the multi-template-content-block use cases. The new syntax effectively demarcates DOM fragments as well even if that is not the direct intention.

How are DOM fragments identified during parse?

There are multiple approaches here.

Specified via wikitext semantics: For example, at some point, we can declare that lists and tables produce well-formed DOMs and are processed as such. This is unlikely to work for transclusions as of today.
Opt-in via special markup: task T114445 specifies parser-function style markup added to template source allowing template authors to declare how the output of that template should be processed. The opt-in / typing information can also be added to templatedata.
Opt-out via special markup: This is based on the optimistic assumption that most templates do indeed produce well-formed output, but there might be scenarios where this is not possible (ex: multi-template-content-block use cases, templates used for generating strings to be processed as HTML attribute strings or HTML attribute values). The opt-out information can also be added to templatedata.
Automatically inferred: This is a special case of opt-out/opt-in where template behavior and fragment boundaries are inferred based on information available during regular parses. Conceivably, this derived information could be added to templatedata. So, the difference from opt-in/opt-out is whether a template author adds this information or whether software adds this information.

Which of the above strategies is suitable depends on the requirements of the core parser and Parsoid, the needs of specific clients / products / projects on the Wikimedia cluster, the kind of support we want to provide for 3rd party wikis, and how old content is affected. This needs additional clarity.

How are DOM fragments identified in the output document?

Parsoid uses special markup for transclusions and extensions. T114072 and T105845 propose / request special markup for other DOM fragments. But it looks like this special markup for DOM fragments is a cross-cutting concern. A parser might add additional information (ex: Parsoid in the private data-parsoid attribute) to aid incremental parsing. But, in all cases, it makes sense to come up with output markers that delineate DOM fragments and provide additional information about how they were generated, etc.

Generating DOM fragments: Caveats

For templates that generate a single table row/cell, we need to figure out a way to make sure that parsing to a full DOM doesn't break it, i.e. if a template produces <td>foo</td>, you cannot parse this string via an HTML5 parser without adding a surrounding <table> context. Otherwise, the generated DOM fragment will be "foo" and the surrounding <td> tags will be dropped.

Similar care may be needed for other constructs (dt/dd?). But, it looks like list items can stand alone.
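
One possible way to keep the cell tags is to wrap the template output in a minimal table context before handing it to an HTML5 parser, and to descend past the wrapper afterwards. The sketch below is illustrative only: wrapForFragmentParse and FRAGMENT_CONTEXTS are hypothetical names, not Parsoid or core parser APIs, and the helper only looks at the first tag of the output.

```javascript
// Hypothetical table of wrappers: [markup to prepend, markup to append].
const FRAGMENT_CONTEXTS = {
  td: ['<table><tbody><tr>', '</tr></tbody></table>'],
  th: ['<table><tbody><tr>', '</tr></tbody></table>'],
  tr: ['<table><tbody>', '</tbody></table>'],
  caption: ['<table>', '</table>'],
};

// Returns the string to parse plus the number of wrapper elements that the
// caller has to descend through to get back to the real fragment.
function wrapForFragmentParse(html) {
  const match = /^\s*<([a-zA-Z][a-zA-Z0-9]*)/.exec(html);
  const tag = match ? match[1].toLowerCase() : null;
  const known = tag !== null &&
    Object.prototype.hasOwnProperty.call(FRAGMENT_CONTEXTS, tag);
  if (!known) {
    // No special context needed (ex: <li>, <div>, plain text).
    return { html: html, depth: 0 };
  }
  const [open, close] = FRAGMENT_CONTEXTS[tag];
  return { html: open + html + close, depth: open.split('<').length - 1 };
}
```

A parser that exposes the HTML fragment parsing algorithm with a context element would make the string wrapper unnecessary; the wrapper is just the simplest thing that works with a document-level parse.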

Nesting constraints

Composability, to be useful, has to ensure that the final produced document is always semantically meaningful.

If there were no constraints on DOM fragments and their nesting context, this discussion would be vastly simpler. All that would need to be done is demarcate fragment boundaries (ex: sections, templates, extensions, lists, tables, etc.), parse those to DOM to fix ill-formed markup and compose the final document from top-level output by inserting output of individual fragments at the right places.
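
As a rough illustration of this unconstrained case, composition reduces to substituting fragment output at marked positions in the top-level output. The sketch works on strings rather than DOM nodes to stay self-contained, and the <!--fragment:ID--> marker syntax and the composeDocument name are made up for this example.

```javascript
// topLevelHtml: output of the top-level parse with a hypothetical marker
//   <!--fragment:ID--> wherever a fragment belongs.
// fragments: Map from ID to the already balanced HTML of that fragment.
function composeDocument(topLevelHtml, fragments) {
  return topLevelHtml.replace(/<!--fragment:([\w-]+)-->/g, (marker, id) => {
    if (!fragments.has(id)) {
      throw new Error('No output for fragment ' + id);
    }
    return fragments.get(id);
  });
}
```

For example, composeDocument('<h2>A</h2><!--fragment:f1-->', new Map([['f1', '<ul><li>y</li></ul>']])) yields the heading followed by the list. Nothing here checks where a fragment lands.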

But, there are two kinds of constraints that apply. [ Note that this distinction is probably mostly pedantic. We are, at this point, constructing HTML5 documents as our canonical target (and deriving all other formats: text, PDF, mobile app output, audio readers, etc.) from this canonical form. ]

1. Nesting constraints that are independent of the target domain: For example, it doesn't make any sense to nest a link within a link, a paragraph within a paragraph, or a heading within a heading.

2. Nesting constraints that are imposed by the target domain (HTML5): While HTML5 specifies content model constraints, for the most part, the constraints that are actually enforced are ones that fall out of the tree building algorithm that parses an HTML string into a HTML5 DOM.

For example, see the following output:

> DU.parseHTML('<p><ul><li>x</li></ul></p>').body.outerHTML
'<body><p></p><ul><li>x</li></ul><p></p></body>'

> DU.parseHTML('<p><b><ul><li>x</li></ul></b></p>').body.outerHTML
'<body><p><b></b></p><ul><li><b>x</b></li></ul><p></p></body>'

So, within HTML5, you cannot nest a list within a paragraph. You cannot insert content in a table anywhere except within <td>, <th>, and <caption> tags. Content inserted anywhere else within a table gets moved out of the table (it is foster-parented, i.e. it ends up as a preceding sibling of the <table>). There are likely other such constraints as well.

Possible approaches for handling nesting constraints

a. No-op: This is the ideal scenario where nothing needs to be done because no constraints are violated.

b. Modify fragment: Based on the insertion site, modify the output of the fragment suitably to ensure that constraints are respected. For example, if a fragment is being inserted inside a link, you could convert all links in the fragment to plain text.

c. Change insertion context: For example, if a link-containing fragment is being inserted inside a link, you could insert the fragment *after* the link.

d. Expand the fragment scope: So, when you try to insert a list within a <p> tag, the p-tag is treated as the composable fragment instead of the list. This is how Parsoid deals with transclusions today. It has a notion of "template-affected output" which includes a template's output as well as any enclosing page context that is treated as an indivisible unit for the purpose of editability. Example: Look at Parsoid's output for <p><b>a\n{{echo|*a}}\nb</b></p>

e. Turn off fragment composition: Recognize when document composability cannot be supported and mark the document as such. In this scenario, WYSIWYG editing and incremental parsing would be impaired or disabled on that page. Example: If you try to insert a list within <p><b>a ... b</b></p>, fragment modification would have to convert the list into plain text, which might be considered unacceptable. Similarly, changing the insertion context to insert the list *after* the p-tag could be considered unacceptable in some contexts. So, if strategy d above is not available, another option would be to lose the ability to compose fragments on this part of the document. When the fragment is inserted in the paragraph, there are non-local effects on the document and the p and b-tags are modified as well. So, on any edit to this part, you would have to reparse the entire page. Similarly, inside, say, VE, you may get surprising non-WYSIWYG behaviour. Replacing the list-producing transclusion with plain text might show you 3 paragraphs in VE, but on save, the content of those paragraphs will render as a single paragraph.
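
To make strategy b concrete, here is a minimal sketch of the link-in-link example: links inside the fragment are replaced by their contents before the fragment is inserted inside a link. The tree shape (strings for text, { tag, children } for elements) and the names unlinkFragment, insertIntoLink and serializeNode are assumptions made for this illustration; attributes and escaping are ignored.

```javascript
// Strategy b: return the fragment with every <a> replaced by its children.
// A node is either a string (text) or { tag: string, children: node[] }.
function unlinkFragment(node) {
  if (typeof node === 'string') {
    return [node];
  }
  const children = node.children.flatMap(unlinkFragment);
  if (node.tag === 'a') {
    return children; // drop the link, keep its text
  }
  return [{ tag: node.tag, children: children }];
}

// Insert the modified fragment as the last child of the enclosing link.
function insertIntoLink(link, fragment) {
  return {
    tag: link.tag,
    children: link.children.concat(unlinkFragment(fragment)),
  };
}

// Debug serializer for the toy tree (no attributes, no escaping).
function serializeNode(node) {
  if (typeof node === 'string') {
    return node;
  }
  const inner = node.children.map(serializeNode).join('');
  return '<' + node.tag + '>' + inner + '</' + node.tag + '>';
}
```

Strategy c would leave the fragment alone and return it as a sibling placed after the link instead.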

If we want to treat transclusion output as DOM fragments (which is the direction we seem to be moving towards), my personal instinct is that no single strategy is likely to cover the diverse templating use cases on the Wikimedia wikis. The choice of what strategy is used / applied is also likely going to be determined by what the core parser needs to and can support, what Parsoid will support, and what the needs are for 3rd party wikis.

Situation today in core parser and Parsoid

As it stands today, in the core parser, strategy e. is applied everywhere implicitly even when it is probably true that strategy a. can be used commonly in practice.

Parsoid already supports some ad-hoc document composition via DOM fragments. See below for why it is ad hoc.

It treats extension output, link text, and image thumbnails as DOM fragments. It does this to contain the effects of non-well-formed markup to those fragments. Parsoid uses fragment proxies to deal with nesting constraints -- in other words, strategy e. is applied everywhere implicitly since strategies b, c, d aren't exercised.
Parsoid doesn't treat transclusions as DOM fragments exactly. However, since Parsoid's original target was VE, it applies strategy d. to all transclusions to minimize surprising non-WYSIWYG behavior. So, in other words, Parsoid does treat transclusions as DOM fragments, but only for the purposes of visual editing. The information in data-mw can be used by clients to identify transclusions that behave like fragments. Specifically if data-mw.parts.length === 1, the transclusion does indeed behave like a composable fragment.
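
A client-side check along these lines might look like the sketch below. The helper name isComposableTransclusion is hypothetical, and the sample data-mw values in the usage note are illustrative rather than copied from real Parsoid output.

```javascript
// dataMwJson: the string value of a data-mw attribute on a transclusion.
// Returns true when the transclusion consists of exactly one part, i.e.
// when it behaves like a composable fragment in the sense described above.
function isComposableTransclusion(dataMwJson) {
  const dataMw = JSON.parse(dataMwJson);
  return Array.isArray(dataMw.parts) && dataMw.parts.length === 1;
}
```

With a value such as '{"parts":[{"template":{"target":{"wt":"echo"},"params":{"1":{"wt":"foo"}},"i":0}}]}' the check passes. A value whose parts array holds two templates with interleaved wikitext (the multi-template-content-block case) fails it.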

Given Parsoid's ad-hoc and incomplete handling of document composition, it makes sense to approach this more systematically. This requires moving wikitext semantics to enable DOM fragments and composition.

Nesting constraints for children of <body>

For top-level (children of <body>) DOM fragments, composition is fairly trivial. You can just drop in the fragment in all cases except for <tr>/<td>/<th> fragments, which only make sense in a <table> context. For such fragments, strategy (b) of modifying the fragment to wrap a <table> around it is sufficient. [ Tangent: Given this, one simple incremental parsing solution would be to find the child of <body> that an edit is located in and reparse the wikitext for just that child. However, this strategy won't do much for edits in VE that run into the same nesting constraint issues. ]
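
The tangent above can be sketched as a lookup over the source ranges of the children of <body>. The record shape ({ start, end } as wikitext offsets) and the function name are assumptions for this example; an edit that touches a boundary conservatively selects both neighbours.

```javascript
// children: one { start, end } record per child of <body>, giving the
//   wikitext range it was generated from, in document order.
// editStart, editEnd: wikitext range touched by the edit (may be empty).
// Returns the indexes of the top-level children that need a reparse.
function topLevelChildrenToReparse(children, editStart, editEnd) {
  const hits = [];
  children.forEach((child, index) => {
    // Closed-interval overlap, so boundary edits pick up both sides.
    if (child.start <= editEnd && editStart <= child.end) {
      hits.push(index);
    }
  });
  return hits;
}
```

The wikitext for each selected child can then be reparsed on its own and its output swapped in, provided the new output is still a valid child of <body>.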

Nesting constraints for transclusions and task T114445

The focus of task T114445 is to treat templates as fragment generators. Since transclusions can show up anywhere, constraint handling is trickier. My personal opinion is that all the non-no-op strategies should be on the table. I don't think any single strategy by itself is sufficient to cover all use cases.

task T114445 attempts to tackle this problem as follows.

Where composition is desired, it requires the template author to declare a type for the template's output.
The current prototype attempts to integrate the HTML5 tree builder into the core parser and let things shake out how they will. In this solution, the core parser won't support incremental parsing, but the goal is for Parsoid and the core parser's rendering to be identical.
The RFC proposes type annotations as a unified simple mechanism that determines whether one or both strategies need to be applied to handle nesting constraints.
- For example, a balance:block annotation could force the output to be a block tag (by adding a <div> if necessary), and at the same time, modify the insertion context to prevent any other changes to the template output. But, consider the following wikitext: <p>{{block-tpl}}</p>. If block-tpl produces a <div>foo</div>, then the <p> need not be closed. However, if someone edits the template to produce a <div>foo<p>bar</p></div>, then the enclosing <p> tag would have to be closed during incremental parse. There are 2 approaches to this problem. Approach 1 below is likely the better solution.
  1. Always modify the insertion context, i.e. all p, a, h tags (for example) are closed when inserting the output of a balance:block template, even if it might not be necessary (as in this case). So, this effectively constrains the insertion / use contexts for these templates. But, by doing so, it simplifies handling of these templates in VE, simplifies incremental parsing, and also clearly lets editors know in what contexts these templates can be used.
  2. Record enough information about insertion context to determine whether incremental parsing can be applied on edits. So, in the example above, since the nesting context included a p-tag, and the template output changed and introduced a p-tag (or the other way round as well), incremental parsing would have to be turned off.
- Current work on the prototype is around determining the specifics of strategy (b) for the balance:block type annotation, i.e. in what contexts these templates can be used and what tags will always get closed at the use site.
- On the other hand, a balance:inline annotation might not modify the insertion context at all, but very aggressively apply strategy (b), i.e. modify the output as much as necessary to force it to fit the use site without impacting the use site.
While type annotations as proposed by the RFC are attractive as a simple unified mechanism to determine the strategies for applying nesting constraints, for completeness' sake, here is a variant of the same:
- Let template authors explicitly specify what strategy to apply to a template output. So, if a template author specifies that template output cannot be modified (beyond balancing), the parser would then aggressively modify the insertion context (along the lines of what balance:block proposal does). Or, a template author might explicitly specify that a template can only be used at the top level (i.e. as child of <body>), or that it cannot be used in inline contexts, etc. This is not a fully fleshed out proposal and this might actually be more confusing and complex to support even if it gives a template author finer-grained control over how a template is used. Type annotations as currently proposed seem a more attractive approach.
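
Approach 1 for balance:block above can be sketched as a function of the use site alone. The tag set below just spells out the "p, a, h tags (for example)" from the text and is not a settled list. The names are hypothetical.

```javascript
// Illustrative set of tags that always get closed at the use site of a
// balance:block template (taken from the "p, a, h" example in the text).
const CLOSED_BY_BALANCE_BLOCK = new Set([
  'p', 'a', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
]);

// openTags: names of the open elements at the use site, outermost first.
// Returns the tags to close, innermost first, before inserting the output.
// Everything nested inside the outermost affected tag has to close too.
function tagsToCloseForBlock(openTags) {
  const outermost = openTags.findIndex((tag) => CLOSED_BY_BALANCE_BLOCK.has(tag));
  if (outermost === -1) {
    return [];
  }
  return openTags.slice(outermost).reverse();
}
```

The result never depends on what the template currently produces. That is what keeps an edit to the template from changing the surrounding context, and it is why approach 2 needs the extra bookkeeping.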

... to be continued ....

Implementing incremental parsing in Parsoid

Use relative DSR offsets, [length, start-width, end-width] instead of [start, end, start-width, end-width]
- This eliminates the need to update DSR globally after a reparse.
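
A small sketch of why relative offsets localize updates. It assumes a hypothetical node shape { dsr: [length, startWidth, endWidth], children } and assumes that the children of a node cover its content contiguously. Neither is taken from Parsoid's actual data structures.

```javascript
// Recover [start, end, startWidth, endWidth] for the node reached by
// following `path` (a list of child indexes) from the root.
function absoluteDsr(root, path) {
  let start = 0;
  let node = root;
  for (const index of path) {
    start += node.dsr[1]; // skip the parent's opening markup
    for (let i = 0; i < index; i++) {
      start += node.children[i].dsr[0]; // skip preceding siblings
    }
    node = node.children[index];
  }
  return [start, start + node.dsr[0], node.dsr[1], node.dsr[2]];
}

// After a reparse changes the source length of one node by `delta`, only
// that node and its ancestors are touched; nothing after it is rewritten.
function resizeNode(root, path, delta) {
  let node = root;
  node.dsr[0] += delta;
  for (const index of path) {
    node = node.children[index];
    node.dsr[0] += delta;
  }
}

// Illustrative tree for the 20-character wikitext ''abcd''{{echo|foo}}.
const dsrTree = {
  dsr: [20, 0, 0],
  children: [
    { dsr: [8, 2, 2], children: [{ dsr: [4, 0, 0], children: [] }] },
    { dsr: [12, 0, 0], children: [] },
  ],
};
```

Here absoluteDsr(dsrTree, [1]) gives [8, 20, 0, 0] for the transclusion. After resizeNode(dsrTree, [0, 0], 3) the same call gives [11, 23, 0, 0], even though the transclusion's own record was never modified. The price is that absolute positions are computed on demand rather than read directly.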

Need to maintain a mapping from template names -> DSR offsets in the output.
- Given this, when a template is edited, we can quickly identify what parts of the HTML we need to update and replace the old HTML with the new HTML
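
One simple shape for such a mapping is a Map from template name to the list of recorded locations. The record shape and the buildTemplateIndex name are assumptions for this sketch; the dsr value is stored as an opaque location in whatever form the parser records it.

```javascript
// transclusions: records of the form { template: string, dsr: any },
//   one per transclusion found in the output, in document order.
// Returns a Map from template name to the dsr values of its uses.
function buildTemplateIndex(transclusions) {
  const index = new Map();
  for (const { template, dsr } of transclusions) {
    if (!index.has(template)) {
      index.set(template, []);
    }
    index.get(template).push(dsr);
  }
  return index;
}
```

When a template is edited, index.get(name) lists the parts of the HTML to regenerate, and a missing entry means the page does not use that template.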

Solve problem of how to update global state used by extensions.
- If a template generates <ref>s, we need to update the global references section to replace the old <ref>s with new <ref>s. Besides this, ref pointers need to be updated globally.

Add markers in data-parsoid for transclusions that would need a full parse (or the other way round) -- when we cannot handle constraints via strategies a - c
- when a transclusion's output leads to non-local changes (because of missing output / use-site constraints, for ex.) or other reasons

Algorithm sketch

- find all transclusions in the article that correspond to the edited template.
- for each such transclusion, do the following:
  - find the transclusion's dom in the original html.
  - compute new transclusion output.
  - parse that html output to a dom.
  - enforce output constraints on the dom.
  - if any of the following conditions are true, switch to a full page parse:
    * data-mw.parts.length > 1
    * new toplevel dom node !== old toplevel dom node
    * a-in-a, p-in-p, h-in-h, fostering fixups were done in old html,
      but new dom DOES NOT generate the corresponding a/p/h* tag.
  - replace old dom with new dom.
    - no need to do any use-site fixup since that has been done in the original parse.
    - since we parse output to dom, all tags are balanced and well-nested and won't
      impact surrounding context wrt tag balancing.
    - since data-mw.parts.length = 1, the template doesn't affect surrounding context.
  - if template generates citations,
    - update <ref>links in new dom.
    - update <ref>linkbacks in the <references /> section.
    (alternatively, switch to full page parse)
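
The fallback test in the sketch above can be written as a small predicate. Everything about the input shape is an assumption made for illustration: the parser is presumed to have recorded, for the old output, the data-mw parts count, the name of the top-level node, and the tags for which nesting fixups were applied. "Same top-level node" is read here as "same node name", and fostering fixups are left out because the sketch does not say which tag they correspond to.

```javascript
// oldInfo: { partsLength, topLevelTag, fixups }, recorded in the original
//   parse; fixups lists 'a', 'p' or 'h' for a-in-a, p-in-p, h-in-h fixups.
// newInfo: { topLevelTag, generatedTags }, computed from the new output;
//   generatedTags lists which of 'a', 'p', 'h' the new DOM produces.
function needsFullPageParse(oldInfo, newInfo) {
  if (oldInfo.partsLength > 1) {
    return true; // transclusion is entangled with surrounding content
  }
  if (oldInfo.topLevelTag !== newInfo.topLevelTag) {
    return true; // use-site context may no longer fit
  }
  // A fixup applied for the old output is only still valid if the new
  // output produces the tag that triggered it.
  return oldInfo.fixups.some((tag) => !newInfo.generatedTags.includes(tag));
}
```

When the predicate returns false, the old DOM can be replaced by the new one in place, followed by the citation updates if the template generates <ref>s.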