Requests for comment/HTML content templating

MediaWiki has very rich Wikitext-based templating and scripting facilities. We are interested in exploring how we can provide similar functionality in an HTML-only world.

Why are we interested in storing and processing our content as HTML rather than wikitext? There are several reasons, but the most important are probably performance and ease of use.

Page view latency can be lower when using stored HTML, as no expensive transformations are necessary to produce the desired HTML output. Similarly, templating HTML directly is faster and more efficient than building up a wikitext string, parsing that to HTML & finally cleaning up the HTML string with an external tool (tidy).

HTML with semantic markup is significantly easier to work with than wikitext or presentational HTML without a well-defined interface. This lets us easily extract information from our content, build new ways to interact with it, or modify the way information is presented on different platforms. We can do so using standard tools rather than custom and notoriously complex parsers. We won't rely on unmaintained hacks like tidy any more to balance our tags. Visual editing of transclusions can finally be truly WYSIWYG, rather than something that's only WYSIWYG as long as templates emit balanced content.

Finally, there is the potential to simplify our infrastructure. Wikitext parsing, and thus Parsoid, becomes an optional part of the editing user interface. Third-party users can elect to use HTML editing exclusively, and won't need a Parsoid installation in that case.

Goals

Provide functionality similar to that available in Wikitext templating with parser functions & Scribunto
Separate presentation (templates) from data & logic as much as possible
Generate well-formed HTML and support efficient and principled sanitization of both templates and data
Perform better than the Wikitext pipeline
Integrate well with HTML-only content storage
Ideally, let top-level transclusions of HTML templates be represented as wikitext transclusions for continued wikitext editing support

Implementation sketch

Presentation layer: Templating

Initial implementations: TAssembly.js, TAssembly.php, Knockoff.js

DOM-based compiler; the current front end uses KnockOut.js syntax
- HTML syntax can support visual template editing
Optional DOM-based template sanitization for user-editable templates
Intermediate string-based TAssembly representation with very fast execution engines
Automatic context-sensitive sanitization of model data used in attributes to prevent XSS, without a need for a separate DOM post-processing step (good for performance)
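To make the idea of context-sensitive attribute sanitization more concrete, here is a minimal sketch. It is not the TAssembly API: the names `sanitizeAttribute`, `renderAttributes`, `URL_ATTRIBUTES` and `SAFE_URL_PREFIX` are hypothetical, and the allow-list is deliberately conservative. It assumes attribute names come from an already sanitized template and only the values come from model data.

```javascript
// Hypothetical sketch: the sanitization applied depends on the attribute
// the model value is written into. None of these names are TAssembly API.
const URL_ATTRIBUTES = new Set(['href', 'src']);
// Conservative allow-list of URL prefixes (an assumption for illustration).
const SAFE_URL_PREFIX = /^(?:https?:|mailto:|\/|#)/i;

function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Returns the escaped value, or null if the value must not be emitted.
function sanitizeAttribute(name, value) {
  const lower = name.toLowerCase();
  const text = String(value).trim();
  // Event handler attributes never receive model data.
  if (lower.startsWith('on')) {
    return null;
  }
  // URL-valued attributes only accept allow-listed prefixes.
  if (URL_ATTRIBUTES.has(lower) && !SAFE_URL_PREFIX.test(text)) {
    return null;
  }
  return escapeAttribute(text);
}

// Serializes attributes directly to a string, so no DOM pass is needed.
function renderAttributes(attrs) {
  let out = '';
  for (const [name, value] of Object.entries(attrs)) {
    const safe = sanitizeAttribute(name, value);
    if (safe !== null) {
      out += ' ' + name + '="' + safe + '"';
    }
  }
  return out;
}
```

Because the check happens while the attribute string is built, unsafe values are dropped before any HTML is produced, which is what makes a separate DOM post-processing step unnecessary in this sketch.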

Data access layer

The templating environment needs a way to access information from several sources:

Transclusion parameters
Wikidata queries and other service requests
Other environmental information such as that currently provided by parser functions and magic words

Apart from parameters, this information can't all be set up ahead of time as in a traditional MVC model. Instead, it needs to be available to the template on demand using an accessor interface. The template execution environment should support IO parallelism to reduce the render latency of content with several external data dependencies.
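An on-demand accessor with parallel IO could look roughly like the sketch below. This is an illustration under assumptions, not a proposed interface: `createAccessor`, `resolveAll`, the `source:query` key format and the shape of the `sources` object are all hypothetical.

```javascript
// Hypothetical accessor: data is requested by key when a template needs it.
// `sources` maps a source name to an async fetch function (an assumption).
function createAccessor(sources) {
  const pending = new Map(); // key -> promise, so each key is fetched once
  return {
    get(key) {
      if (!pending.has(key)) {
        const [source, ...rest] = key.split(':');
        const fetcher = sources[source];
        pending.set(key, fetcher
          // Wrapping in a promise turns synchronous throws into rejections.
          ? Promise.resolve().then(() => fetcher(rest.join(':')))
          : Promise.reject(new Error('Unknown source: ' + source)));
      }
      return pending.get(key);
    }
  };
}

// Starts all requests before waiting on any of them, so total latency is
// roughly that of the slowest dependency rather than the sum of all.
async function resolveAll(accessor, keys) {
  const values = await Promise.all(keys.map((key) => accessor.get(key)));
  return Object.fromEntries(keys.map((key, i) => [key, values[i]]));
}
```

A caller would pass something like `{ env: async (name) => ..., wikidata: async (query) => ... }` as `sources`; those source names are placeholders for whatever backends the data layer ends up exposing.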

The data layer should keep track of data dependencies of parts of the content to enable accurate and efficient cache updates.
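One way to picture this dependency tracking is a reverse index from each data dependency to the content fragments that used it. The sketch below is purely illustrative; the `DependencyTracker` class, its method names and the key strings are hypothetical.

```javascript
// Hypothetical reverse index: dependency key -> fragments that used it.
class DependencyTracker {
  constructor() {
    this.dependents = new Map();
  }

  // Called by the data layer whenever a fragment reads a piece of data.
  record(fragmentId, dependencyKey) {
    if (!this.dependents.has(dependencyKey)) {
      this.dependents.set(dependencyKey, new Set());
    }
    this.dependents.get(dependencyKey).add(fragmentId);
  }

  // Lists the fragments whose cached output must be refreshed when the
  // given dependency changes; everything else can stay cached.
  affectedBy(dependencyKey) {
    return Array.from(this.dependents.get(dependencyKey) || []).sort();
  }
}
```

With such an index, a change to one data item only triggers re-rendering of the fragments recorded against it.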

Code modules for data massaging & complex predicates

Complex data massaging and predicate logic should be clearly separated from the templating layer. TAssembly already defines a very restricted abstract expression syntax, which is powerful enough to allow nested function calls with rich parameters, but abstract enough to make backend implementations in different languages straightforward.
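The following sketch shows why a restricted expression language is easy to port between backends: the evaluator only has to handle literals, model lookups and calls to registered helpers. The node shapes used here (`{ path: [...] }` and `{ call: ..., args: [...] }`) are hypothetical and are not the TAssembly expression syntax.

```javascript
// Hypothetical evaluator for a restricted expression tree.
// Supported nodes: literals, { path: [...] } and { call: name, args: [...] }.
function evaluate(expr, model, helpers) {
  // Primitives are literals.
  if (expr === null || typeof expr !== 'object') {
    return expr;
  }
  // Model lookup: only own properties are followed, never the prototype.
  if (Array.isArray(expr.path)) {
    let value = model;
    for (const key of expr.path) {
      if (value === null || value === undefined ||
          !Object.prototype.hasOwnProperty.call(value, key)) {
        return undefined;
      }
      value = value[key];
    }
    return value;
  }
  // Helper call: only explicitly registered helpers can be invoked.
  if (typeof expr.call === 'string') {
    if (!Object.prototype.hasOwnProperty.call(helpers, expr.call)) {
      throw new Error('Unknown helper: ' + expr.call);
    }
    const args = (expr.args || []).map((arg) => evaluate(arg, model, helpers));
    return helpers[expr.call](...args);
  }
  throw new Error('Unsupported expression node');
}
```

Templates never contain code in this model; they can only reference helpers by name, so the data massaging logic stays in separately reviewed modules.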

It would be desirable to share these helpers between the server and client, which would favor implementing them in JavaScript. One potential issue with client-side execution is that arbitrary code editable by untrusted users will need to be executed in a safe environment with limited resources, which is difficult to do efficiently on the client. It is unclear how much of such unreviewed utility code will actually be needed, and thus how much of an issue this would be in practice.

We should also consider leveraging existing Lua code for some of this functionality, in particular server-side data massaging.

Overall, this is clearly the area most in need of research and discussion.

I18n, L10n

Internationalization & localization involve both data access (message loading) and utilities (gender, plural etc.). Messages can be represented using regular content templates, so that they can be efficiently executed by the same runtime.

For client-side execution of templates with localized messages it is desirable to extract a static list of messages used by a template, so that those can be efficiently loaded along with the template. In the current TAssembly spec, this can be supported with a static analysis tool.
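A static analysis pass of this kind can be sketched as a simple tree walk. The template representation assumed here is a simplification for illustration only: static HTML as strings, directives as `[name, argument]` pairs, nested templates under a `tpl` property, and a `msg` directive whose argument is a literal message key. The `msg` directive and the function name `collectMessageKeys` are hypothetical and not taken from the TAssembly spec.

```javascript
// Hypothetical static analysis: collect literal message keys from a
// compiled template without executing it.
function collectMessageKeys(template, found = new Set()) {
  for (const part of template) {
    if (!Array.isArray(part)) {
      continue; // static HTML string, nothing to collect
    }
    const [directive, arg] = part;
    if (directive === 'msg' && typeof arg === 'string') {
      found.add(arg);
    } else if (arg && Array.isArray(arg.tpl)) {
      // Control-flow directives carry a nested template.
      collectMessageKeys(arg.tpl, found);
    }
  }
  return found;
}
```

This only works when message keys are literals in the template; a key computed at run time could not be discovered this way.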