Parsoid/Page metadata

Some common metadata, such as the page title and the revision number, will be available in the head section of the HTML document. Other, internal information will instead be stored in a separate header / index section for efficient processing.

HTML head element

Probably similar to the info in Special:Export XML. See bug 45206.

title
The page name.
revision
The page revision number.

Internal page metadata

page TTL
The minimum of all (non-ESI) fragment TTLs, if any. A TTL (time to live) is the amount of time a fragment is expected to stay fresh. The page TTL sets the HTTP cache headers.
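
A minimal sketch of the page TTL rule described above. The `Fragment` type, its `esi` flag, and the choice of `Cache-Control: max-age` as the header are assumptions made for illustration; the design itself only states that the minimum non-ESI fragment TTL sets the HTTP cache headers.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Fragment(NamedTuple):
    """Hypothetical fragment record, for illustration only."""
    ttl: Optional[int]  # TTL in seconds; None if the fragment has no TTL
    esi: bool           # True if the fragment is delivered through ESI


def page_ttl(fragments: Iterable[Fragment]) -> Optional[int]:
    """Return the minimum TTL over all non-ESI fragments, or None if there is none."""
    ttls = [f.ttl for f in fragments if not f.esi and f.ttl is not None]
    return min(ttls) if ttls else None


def cache_headers(ttl: Optional[int]) -> Dict[str, str]:
    """Map a page TTL to HTTP cache headers (assumed here: Cache-Control max-age)."""
    if ttl is None:
        return {}
    return {"Cache-Control": f"max-age={ttl}"}
```

ESI fragments are skipped because they do not constrain how long the enclosing page can be cached.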

Fragment index

A list of per-fragment index entries, each of which contains:

byte length
Used for efficient seeking / updating without parsing the full page.
update events
Set of conditions under which the fragment might need to be updated:
edit
Re-render on every edit. Examples: magic words such as PAGESIZE, REVISION* etc.
view
Potentially re-render on view if the fragment's TTL has elapsed
rights
Re-render if protection levels have changed
move
Re-render if page is moved / renamed (example: page name dependent templates)
fragment TTL
Time to live for the full fragment.
dependencies
Other resources used in rendering this fragment: any kind of transclusion (including templates / pages, parser functions, magic words, Lua modules etc.) and files.
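
The index entry described above could be modelled roughly as follows. This is a sketch under stated assumptions, not the actual storage format: the class and field names are hypothetical, and `fragment_span` assumes that fragments are stored contiguously in index order, so that a fragment's byte range follows from the lengths of the entries before it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Sequence, Set, Tuple


class UpdateEvent(Enum):
    EDIT = "edit"      # re-render on every edit
    VIEW = "view"      # potentially re-render on view once the TTL has elapsed
    RIGHTS = "rights"  # re-render if protection levels have changed
    MOVE = "move"      # re-render if the page is moved / renamed


@dataclass
class FragmentIndexEntry:
    """Hypothetical per-fragment index entry."""
    byte_length: int
    update_events: Set[UpdateEvent] = field(default_factory=set)
    ttl: Optional[int] = None                              # fragment TTL in seconds
    dependencies: List[str] = field(default_factory=list)  # templates, files, ...


def fragment_span(index: Sequence[FragmentIndexEntry], i: int, base: int = 0) -> Tuple[int, int]:
    """Byte range (start, end) of fragment i, assuming contiguous storage from `base`."""
    start = base + sum(entry.byte_length for entry in index[:i])
    return start, start + index[i].byte_length


def fragments_for_event(index: Sequence[FragmentIndexEntry], event: UpdateEvent) -> List[int]:
    """Positions of all fragments that may need re-rendering when `event` occurs."""
    return [i for i, entry in enumerate(index) if event in entry.update_events]
```

With such an index, an edit handler could call `fragments_for_event(index, UpdateEvent.EDIT)` and then use `fragment_span` to replace only the affected byte ranges, without parsing the rest of the page.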

data-parsoid

In our current implementation we use a private data-parsoid attribute with JSON data to store per-node round-trip information. Since this information is private and not needed by clients, we should move it out of the DOM itself, which will also reduce the size of the returned DOM. To preserve the link between nodes and external metadata we need a stable node key. A simple solution is to add an attribute with a UID to each node. We should try to be somewhat resistant to client-side reassignment of such UIDs, so at least the node type should probably be checked. We might also be able to avoid assigning UIDs to all nodes by using hash trees on DOM subtrees, similar to those used in the XyDiff diff algorithm.
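
The simple UID approach might look roughly like the sketch below, which uses Python's standard `xml.etree.ElementTree` on well-formed markup. The attribute name `data-uid`, the UID format, and the shape of the external store are assumptions for illustration, not part of any existing implementation.

```python
import json
import xml.etree.ElementTree as ET
from itertools import count
from typing import Any, Dict, Optional

UID_ATTR = "data-uid"  # hypothetical name for the stable node key attribute


def externalize(root: ET.Element) -> Dict[str, Dict[str, Any]]:
    """Move data-parsoid JSON out of the tree into a store keyed by node UID."""
    store: Dict[str, Dict[str, Any]] = {}
    ids = count(1)
    for el in root.iter():
        raw = el.attrib.pop("data-parsoid", None)
        if raw is None:
            continue
        uid = f"n{next(ids)}"
        el.set(UID_ATTR, uid)
        # Record the node type so that a reassigned UID can be detected later.
        store[uid] = {"tag": el.tag, "data": json.loads(raw)}
    return store


def lookup(el: ET.Element, store: Dict[str, Dict[str, Any]]) -> Optional[Any]:
    """Return the stored metadata for el, or None if the UID is unknown or the node type differs."""
    entry = store.get(el.get(UID_ATTR))
    if entry is None or entry["tag"] != el.tag:
        return None
    return entry["data"]
```

The tag check only catches a UID that was moved to a node of a different type; a UID copied between two nodes of the same type would go unnoticed, which is one reason subtree hashes may be worth exploring.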

See also
