User:GWicke/PageProperties

The wikitext interface has traditionally mixed page content and metadata liberally. These metadata items, or page properties, can usually be added at any point in the page and don't produce rendered output where they appear:

  • language links are used to render the 'languages' side bar
  • categories are rendered in a separate box below the article
  • Behavior switches [1] like __notoc__ and __forcetoc__ affect the way MediaWiki renders the entire page
  • Similarly, the category default sort key [2] changes how the page sorts within category member lists

Because these properties are accessed often and access needs to be efficient, the PHP parser extracts this information from the page and caches the values for the latest revision of each page in database tables (page_props, along with categorylinks and langlinks).
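To make the extraction step concrete, here is a deliberately simplified Python sketch that pulls categories, the default sort key, and behavior switches out of raw wikitext. The regular expressions and the `PageProps` / `extract_page_props` names are hypothetical; the real PHP parser works on expanded templates and handles localized magic words, comments, `<nowiki>`, language links and many other cases this sketch ignores. Even so, it shows the edge cases mentioned later on this page, such as duplicate properties.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Simplified, hypothetical patterns; real wikitext needs a real parser.
CATEGORY_RE = re.compile(
    r"\[\[\s*Category\s*:\s*([^|\]]+?)\s*(?:\|([^\]]*))?\]\]", re.IGNORECASE
)
DEFAULTSORT_RE = re.compile(r"\{\{\s*DEFAULTSORT\s*:\s*([^}]*?)\s*\}\}")
SWITCH_RE = re.compile(r"__([A-Za-z]+)__")


@dataclass
class PageProps:
    categories: dict[str, str | None] = field(default_factory=dict)  # name -> sort key
    default_sort: str | None = None
    switches: set[str] = field(default_factory=set)


def extract_page_props(wikitext: str) -> PageProps:
    """Scan raw wikitext for a few kinds of page properties."""
    props = PageProps()
    for m in CATEGORY_RE.finditer(wikitext):
        # A duplicate category overwrites the earlier sort key in this sketch.
        props.categories[m.group(1)] = m.group(2)
    for m in DEFAULTSORT_RE.finditer(wikitext):
        # If the sort key is set more than once, the last one wins here.
        props.default_sort = m.group(1)
    for m in SWITCH_RE.finditer(wikitext):
        props.switches.add(m.group(1).upper())
    return props


sample = (
    "Intro __NOTOC__ text.\n"
    "[[Category:Physics|Newton]]\n"
    "{{DEFAULTSORT:Newton, Isaac}}\n"
    "[[Category:Mathematicians]]\n"
)
print(extract_page_props(sample))
```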

Similarly, the VisualEditor extracts these page properties from the page content and presents them in a 'page property' dialog. New properties are usually appended at the end of the page content. Since diffing is still wikitext-based, the page property abstraction breaks down when inspecting changes: a property edit shows up as a raw wikitext change somewhere in the page source.

Once bug 49143 is implemented, we will be able to store page properties separately for each revision. This has several advantages over the status quo:

  • It provides efficient and convenient access to page properties on both current and old revisions. Clients don't need to extract this information from the page content or deal with the associated edge cases (position preservation for round-tripping, duplicate properties, positioning of new properties, etc.).
  • It reduces the page size when Parsoid-generated HTML+RDFa is used for page views. Metadata can be retrieved separately for editing (see bug 52936).
  • A (wiki)text-based interface for page properties can be provided separately from the main content editing interface, so users don't have to navigate a mix of metadata and content. This is especially relevant for bulkier metadata such as category- or page-specific language variant conversion rules.
  • A visual diffing interface for properties can be developed alongside the HTML diffing planned for content, so that users without wikitext knowledge can understand property changes.
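A minimal sketch of per-revision property storage, assuming the properties of each revision can be flattened into a name-to-value map (with hypothetical keys such as `category:Physics`). `RevisionPropStore` and `diff_props` are illustrative names, not an existing MediaWiki API. Reading an old revision's properties becomes a lookup rather than a parse, and the diff function produces the kind of structured property change a visual diff interface could render instead of a wikitext diff.

```python
from __future__ import annotations


class RevisionPropStore:
    """Hypothetical store keeping one property map per revision ID."""

    def __init__(self) -> None:
        self._by_rev: dict[int, dict[str, str]] = {}

    def save(self, rev_id: int, props: dict[str, str]) -> None:
        if rev_id in self._by_rev:
            raise ValueError(f"revision {rev_id} is already stored")
        # Copy so that later changes by the caller cannot alter history.
        self._by_rev[rev_id] = dict(props)

    def get(self, rev_id: int) -> dict[str, str]:
        # Old revisions are read directly; no wikitext parsing is needed.
        return dict(self._by_rev[rev_id])


def diff_props(
    old: dict[str, str], new: dict[str, str]
) -> dict[str, tuple[str | None, str | None]]:
    """Return {name: (old_value, new_value)} for every changed property."""
    changes: dict[str, tuple[str | None, str | None]] = {}
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


store = RevisionPropStore()
store.save(101, {"defaultsort": "Newton", "category:Physics": ""})
store.save(102, {"defaultsort": "Newton, Isaac", "category:Physics": "", "notoc": ""})
print(diff_props(store.get(101), store.get(102)))
```

A `None` on either side of a change marks a property that was added or removed in the newer revision.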

This is all fairly straightforward for 'static' properties: those that are added directly in the page content and only change when the page itself is edited. Properties added by transclusions, however, need to be handled differently:

1) They cannot be edited directly

2) They can change whenever a transclusion is re-rendered

We can reflect this by adding a 'dynamic' marker to page properties that are added only by transclusions.
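One way to compute the 'dynamic' marker, sketched under the assumption that the properties contributed by the page's own content and by each transclusion are available as sets of (name, value) pairs. `PageProperty` and `mark_dynamic` are hypothetical names. A property that the page content contributes stays static (and directly editable) even if a template adds it too; only properties that come exclusively from transclusions are flagged.

```python
from __future__ import annotations

from dataclasses import dataclass

Prop = tuple[str, str]  # (name, value)


@dataclass(frozen=True)
class PageProperty:
    name: str
    value: str
    dynamic: bool  # True if the property comes only from transclusions


def mark_dynamic(
    content_props: set[Prop], transclusion_props: list[set[Prop]]
) -> list[PageProperty]:
    """Merge property sources and flag the ones contributed only by transclusions."""
    from_transclusions: set[Prop] = set().union(*transclusion_props)
    return [
        PageProperty(name, value, dynamic=(name, value) not in content_props)
        for name, value in sorted(content_props | from_transclusions)
    ]


props = mark_dynamic(
    content_props={("category", "Physics")},
    transclusion_props=[
        {("category", "Physics"), ("category", "Living people")},
        {("notoc", "")},
    ],
)
for p in props:
    # Only static properties would be offered for direct editing.
    print(p.name, p.value, "dynamic" if p.dynamic else "editable")
```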

LinksUpdate jobs triggered by template edits are very expensive, as they affect a large number of pages. To avoid overloading the cluster, we currently ignore large LinksUpdate jobs completely. In Parsoid we would like to make this both more efficient and more correct by re-expanding only the transclusions that use the edited template. The associated metadata update can be sped up using reference-counted page properties. For a re-rendered fragment, the difference between the original and new properties can be calculated by looking only at the fragment content. With compact reference counts kept per page, it then becomes cheap to determine the resulting property changes for the page as a whole: a property disappears from the page only when its count drops to zero, i.e. when no remaining part of the page contributes it. It is not necessary to parse and traverse the entire DOM just to retrieve page properties.
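The reference-counting idea can be sketched as follows, assuming the page keeps one count per (name, value) property pair and that the properties a fragment previously contributed are stored alongside that fragment. `RefCountedProps` and its methods are hypothetical names, not Parsoid's implementation. Re-rendering one fragment touches only that fragment's properties, yet yields the page-level additions and removals.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable

Prop = tuple[str, str]  # (name, value)


class RefCountedProps:
    """Hypothetical page-level property counts, one per contributing occurrence."""

    def __init__(self) -> None:
        self._counts: Counter[Prop] = Counter()

    def add_fragment(self, props: Iterable[Prop]) -> set[Prop]:
        """Count a fragment's properties; return those that are new to the page."""
        added: set[Prop] = set()
        for p in props:
            if self._counts[p] == 0:
                added.add(p)
            self._counts[p] += 1
        return added

    def remove_fragment(self, props: Iterable[Prop]) -> set[Prop]:
        """Uncount a fragment's properties; return those that left the page."""
        removed: set[Prop] = set()
        for p in props:
            if self._counts[p] <= 0:
                raise ValueError(f"property {p!r} is not counted")
            self._counts[p] -= 1
            if self._counts[p] == 0:
                del self._counts[p]
                removed.add(p)
        return removed

    def rerender_fragment(
        self, old_props: Iterable[Prop], new_props: Iterable[Prop]
    ) -> tuple[set[Prop], set[Prop]]:
        """Swap one fragment's properties; return page-level (added, removed)."""
        # Counting the new properties first keeps shared ones above zero.
        added = self.add_fragment(new_props)
        removed = self.remove_fragment(old_props)
        return added, removed

    def page_props(self) -> set[Prop]:
        return set(self._counts)


page = RefCountedProps()
page.add_fragment([("category", "Physics")])  # the page's own content
page.add_fragment([("category", "Physics"), ("category", "Stubs")])  # a transclusion
added, removed = page.rerender_fragment(
    old_props=[("category", "Physics"), ("category", "Stubs")],
    new_props=[("category", "Physics"), ("notoc", "")],
)
print(added, removed)
```

Because the new fragment's properties are counted before the old ones are removed, a property shared by both versions never drops to zero and is reported neither as added nor as removed. Here `category:Physics` survives the re-render because the page content still contributes it.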

[1]: Parsoid/MediaWiki DOM spec#Behavior switches
[2]: Parsoid/MediaWiki DOM spec#Category default sort key