VisualEditor/Design/Software overview

MediaWiki allows users to quickly edit web pages. Editing is done by modifying an article’s source code directly within the browser. This source code, called Wikitext, is a combination of three distinct kinds of syntax: macros, shorthand, and HTML. Macros are either templates or hooks, both referred to by name and optionally given arguments which influence their expanded result. Shorthand is a meta syntax for rendering HTML as well as specifying meta information for the page. A subset of actual HTML is also allowed to pass through the rendering process, whereas disallowed HTML tags are escaped and rendered as plain text.

This document specifies the information models and technologies required to interact with Wikitext visually.

Project status

This project, like this document, is in a research and development phase. Details about the project are subject to change, and will evolve as development continues. This document represents how the software and information models work in the most recent prototypes, but is not a complete or final design. For more information about the motivation behind this project, see the Great Movement Projects section of the Product Whitepaper area of the Wikimedia Strategy Wiki.

Objectives

A visual editor should make it easier for new users to contribute productively on a wiki. Studies have shown that entry-level users of MediaWiki have difficulty learning Wikitext, which becomes a factor in their decision to limit or stop contributing. Editing thus tends to be monopolized by those who are able and willing to spend time learning Wikitext, time that could otherwise be spent editing content.

Visual editing should first improve the usability of the most common tasks. Less frequent tasks may still be performed using a source code editing mode. In early versions, a visual editor may only implement a minimal subset of features, so it’s important that these initial features target the most common use cases. Reliance on source code editing should naturally decrease as the software matures.

Visual editing should enhance, not degrade, the ability to inspect what was changed between revisions. "Dirty diffs" are a common pitfall of visual editing systems that work with Wikitext; they occur when portions of the document that the user did not intend to change are modified, obscuring the user's contribution. These unexpected modifications can occur when converting Wikitext to HTML for editing and then from HTML back into Wikitext upon saving. Ideally, a visual editor should be able to more accurately keep track of changes as they are made, and provide information beyond a simple diff, indicating more clearly the user’s intentions. At the very least, a visual editor should not make more work for administrators and editors who are reviewing edits done by others.

Editing a document visually may occasionally involve the use of source code, especially while the editor is under development. When a visual editor is given a complex document that contains syntax it's not designed to handle, one of three choices must be made: ignore the unrecognized syntax and risk losing information; disable the visual editor for the entire page and return to source code editing; or isolate these portions of the document to be edited as source code while leaving other portions of the document editable visually. Over time, more of these edge cases will be resolved as more user interfaces are developed around them. This progressive enhancement approach will allow incremental development and maximize the number of documents which the visual editor can edit while safeguarding against data corruption. To ensure that the editor need not be disabled on a document, some syntax that may have been previously accepted will need to be constrained.

Architecture

VisualEditor is made up of three primary components, and is dependent on and designed around a new alternative parser called Parsoid.

Parser and load/save API (Parsoid): The VisualEditor is designed to load and save HTML documents with special data attributes, not raw Wikitext. The parser's job is to convert a plain-text Wikitext document into this enhanced HTML format. Some of the special data attributes applied to the document are used for converting the document back to raw Wikitext cleanly. If this round-tripping information is omitted from a document, converting it to Wikitext will effectively expand and normalize it. Parsoid is also responsible for providing the HTTP APIs that the VisualEditor uses to load and save documents.

Linear model and transaction system (ve.dm): Once a document is received from the server, it is converted to a linear data model which is optimized for transactional editing. This model is similar to an HTML token stream, except that inline formatting is composed onto each character, which makes arbitrary slicing of content simple and efficient. The transaction system allows modifications of the document to be safe and reversible. Transactions are prepared against the current document state and then committed. Transactions can also be rolled back later, or "undone".

Rendering, selection and input (ve.ce): Documents are made editable by rendering the linear model on the client and then enabling the browser's native ContentEditable feature. The editable portion of the DOM is locked down so that only a highly limited set of operations is allowed to occur without intervention. The DOM is watched constantly for changes in the content, which are then sent to the model as transactions. Selection is handled natively, but is monitored and controlled through the model as well. Most input is allowed to happen natively, but many actions, such as cursor movement or clipboard actions, are overridden or quickly corrected.

Toolbars and inspectors (ve.ui): Changes to document structure and inline formatting are accomplished by using toolbars and inspectors. The basic toolbar floats at the top of the page above the content providing easy access to its tools independent of the document's length or current scroll position. Inspectors are lightweight inline dialogs that provide additional control over more complex formatting, such as link locations or template parameters.

Collaboration Server: This component is being experimentally designed and developed primarily by volunteers. The ve.dm module is being designed with real-time collaboration in mind, but no resources are being allocated to building a collaboration server at this time. Some documentation on real-time collaboration has been started.

Document model

While Wikitext contains HTML, and has traditionally been converted exclusively into HTML, there is not a 1:1 correlation between the two. This is due to a combination of features being present in one but not the other, ambiguities in shorthand syntax, and the generally forgiving nature of Wikitext.

Parsing Wikitext and converting it into a data structure of blocks, each containing content, allows Wikitext to be represented in a sufficiently abstract manner that it can be modified and rendered back into Wikitext without loss of information. The same structure can also be rendered into a variety of other formats, including different styles of HTML (such as HTML4, HTML5, or a simplified form of HTML for mobile devices) and non-HTML formats such as PDF or plain text.

Elements

The structure of a document is described using elements, some of which contain other elements while others contain content, but never both. Additional elements can be added to the system at any time, but these are the elements that are to be initially supported.

Paragraph
Series of lines of content.
Heading
Single line of content and a heading level.
List
Series of items, each containing a single line of content, depth and style information.
Table
Series of rows, each containing a series of cells, each containing a series of elements.
Template
Application controlled content with any number of parameters consisting of content/elements.
Hook
Application controlled content with any number of parameters consisting of plain text.

Content

The meaning or appearance of text can be defined by applying annotations to regions of the text. Additional in-line content can be injected by applying rendering annotations at a specific position within the text, typically occupied by a space character.

Formatting
Bold, italic, internal and external links, etc.
Rendering
Images, templates and hooks.
Meaning
Semantic relationships and comments. (not currently supported in Wikitext)

Data structures

Once parsed, Wikitext will be converted into a serializable data structure, which can be sent to a web browser as JSON data. Once at the client, the VisualEditor converts the serialized form into a data structure optimized for editing and rendering, so that user interactions are responsive. HTML, or any other output format, can also be generated from the serialized form.

At the document level, all content is accessed in a single linear address space. Each block also has a local address space, which is used during rendering and interaction, and is responsible for translating the positions of any aggregate contents, such as in the case of lists and tables.

The visual editor software uses a model-view-controller pattern. The surface object acts as the controller, interpreting events from the browser as actions being taken on the document. These changes are acted on by modifying the model, which in turn updates the views.

The surface contains a document model and a document view. A document model contains both a linear representation of the structural and content data and a space partitioning tree of nodes, which are observable. The document view is a tree of views which observe and mimic changes to the tree of model nodes.
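The observe-and-mimic relationship between the model tree and the view tree can be sketched roughly as follows. This is a minimal illustration, not the editor's actual code: the class names `ModelNode` and `ViewNode`, the `splice` method and the `on_splice` callback are all hypothetical.

```python
class ModelNode:
    """Hypothetical observable node in the model tree."""

    def __init__(self, kind):
        self.kind = kind
        self.children = []
        self._listeners = []

    def on_splice(self, callback):
        # Register an observer to be told about child changes.
        self._listeners.append(callback)

    def splice(self, index, remove_count, *nodes):
        # Replace `remove_count` children at `index` with `nodes`, then notify.
        self.children[index:index + remove_count] = nodes
        for callback in self._listeners:
            callback(index, remove_count, nodes)


class ViewNode:
    """Hypothetical view that mirrors one model node and its children."""

    def __init__(self, model):
        self.model = model
        self.children = [ViewNode(child) for child in model.children]
        model.on_splice(self._mirror)

    def _mirror(self, index, remove_count, nodes):
        # Apply the same structural change to the view tree.
        self.children[index:index + remove_count] = [ViewNode(n) for n in nodes]

    def kinds(self):
        # Nested [kind, children] listing, handy for inspecting the tree shape.
        return [self.model.kind, [child.kinds() for child in self.children]]


document = ModelNode("document")
view = ViewNode(document)
document.splice(0, 0, ModelNode("paragraph"), ModelNode("heading"))
document.splice(0, 1)  # remove the paragraph; the view follows
```

In this sketch the controller only ever touches the model; the view tree stays in sync because it listens for the model's change notifications.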

The linear model contains one item for the opening and one for the closing of each element, plus one item for each character in the document. Element openings and closings are represented as objects. Content data is represented either by a single-character string or, for annotated characters, by an array whose first element is a single-character string, followed by any number of references to objects describing annotations to be applied to the character during rendering. The model tree contains a structure of nodes, each containing a length value, which when summed with previous siblings all the way upstream provides an index of the data the node represents in the linear model.
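As a rough illustration of these structures, the sketch below builds a tiny linear model and a matching model tree. The dictionary shapes, the `BOLD` annotation object, the `/heading` closing convention and the helper names are assumptions made for this example and are not taken from the prototype.

```python
# One shared annotation object, referenced by every character it applies to.
BOLD = {"type": "bold"}

# Hypothetical linear model for a heading "Hi" followed by a paragraph "ab",
# where the "b" is bold.
linear = [
    {"type": "heading", "attributes": {"level": 1}},  # element opening
    "H",
    "i",
    {"type": "/heading"},                             # element closing
    {"type": "paragraph"},
    "a",
    ["b", BOLD],                                      # annotated character
    {"type": "/paragraph"},
]


def plain_text(items):
    """Collect the characters in a slice, skipping element openings/closings."""
    chars = []
    for item in items:
        if isinstance(item, str):
            chars.append(item)
        elif isinstance(item, list):
            chars.append(item[0])
    return "".join(chars)


def annotations_at(items, index):
    """Return the annotation objects attached to the item at `index`."""
    item = items[index]
    return item[1:] if isinstance(item, list) else []


class Node:
    """Model tree node that stores only a length, never an absolute position."""

    def __init__(self, kind, length, children=()):
        self.kind = kind
        self.length = length  # linear items spanned, including opening/closing
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def offset(self):
        """Index of this node's first item in the linear model."""
        if self.parent is None:
            return 0
        # The root has no opening item; any other parent contributes one.
        position = self.parent.offset() + (0 if self.parent.parent is None else 1)
        for sibling in self.parent.children:
            if sibling is self:
                break
            position += sibling.length
        return position


heading = Node("heading", 4)
paragraph = Node("paragraph", 4)
document = Node("document", 8, [heading, paragraph])
```

Because formatting travels with each character, any slice such as `linear[5:7]` is self-describing. Because nodes store lengths rather than positions, an edit inside one node only requires updating the lengths of that node and its ancestors.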

For a detailed list of node types that exist in the data model (linear model and model tree) and their current level of support in the editor, see VisualEditor/Design/Node types.

Transactions

As the user works with a document, their use of the mouse and keyboard is interpreted into a series of transactions which can be applied to the document immediately, reversed and repeated locally (undo/redo), logged for later analysis (playback), or communicated to other users and transformed against their transactions (real-time collaboration).

Types of Transactions

Insert
Adding content/elements.
Remove
Removing a range of content/elements.
Annotate content
Set or clear annotation on a range of content.
Change element attributes
Set or clear attributes on an element.

Transactions are reversible actions, described as lists of operations. All operations in a transaction must be performed in order and completely before other transactions can be applied.

Types of Operations

Retain
Keep existing content/elements and apply annotation operations to them.
Insert
Add new content/elements and apply annotation operations to them.
Remove
Remove a range of content/elements.
Start annotating
Add annotation operations to be applied to retained and inserted content.
Stop annotating
Remove annotation operations to be applied to retained and inserted content.
Change element attributes
Set or clear attributes on an element.
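
To make the idea of a transaction as a reversible list of operations more concrete, here is a minimal sketch covering only the retain, insert and remove operations (annotation and attribute operations are left out). The operation dictionaries and the function names `apply_transaction` and `invert` are assumptions for illustration, not the prototype's API.

```python
def apply_transaction(doc, ops):
    """Apply operations in order to a linear model and return the new model."""
    result, cursor = [], 0
    for op in ops:
        if op["type"] == "retain":
            result.extend(doc[cursor:cursor + op["length"]])
            cursor += op["length"]
        elif op["type"] == "insert":
            result.extend(op["data"])
        elif op["type"] == "remove":
            count = len(op["data"])
            # Storing the removed data makes the operation reversible and lets
            # us detect a transaction prepared against another document state.
            if doc[cursor:cursor + count] != op["data"]:
                raise ValueError("transaction does not match document state")
            cursor += count
        else:
            raise ValueError("unknown operation: %r" % op["type"])
    if cursor != len(doc):
        raise ValueError("operations must span the whole document")
    return result


def invert(ops):
    """Build the reverse transaction by swapping inserts and removes."""
    swap = {"insert": "remove", "remove": "insert"}
    return [dict(op, type=swap.get(op["type"], op["type"])) for op in ops]


doc = [{"type": "paragraph"}, "a", "b", {"type": "/paragraph"}]
replace_b = [
    {"type": "retain", "length": 2},
    {"type": "insert", "data": ["X"]},
    {"type": "remove", "data": ["b"]},
    {"type": "retain", "length": 1},
]
edited = apply_transaction(doc, replace_b)
restored = apply_transaction(edited, invert(replace_b))
```

Undo in this sketch is simply applying the inverted transaction to the edited document, which yields the original model again.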

Preparing Transactions

Transactions are applied to ranges within a linear document model, and after the operations are complete the model tree is updated accordingly. In the simplest cases, these updates are as simple as adjusting the length of a node, and propagating that length change upstream. In more complex cases nodes may need to be removed, inserted, split or merged. To maintain reversibility, transactions must have a predictable effect on both the linear model and the model tree, so it's critical that transactions are prepared in the context of a given document state and adjusted as needed to be able to be performed directly without corrupting the document. Preparing insertion and removal transactions thus becomes the sole area of the system that must be fault-tolerant, allowing all other systems to assume that transactions can be applied safely. A basic protocol for insertions and removals will be used in the absence of code which supports special cases. This protocol is designed to meet user expectations where possible, erring on the side of being conservative about deleting structural elements or splitting at the cursor.
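
One way to picture a conservative removal protocol is sketched below. It assumes a deliberately simplified rule (remove only content characters in the range and retain every structural element), which is an assumption for this example and not necessarily the protocol the editor uses. The function name `prepare_removal` and the operation format are likewise hypothetical.

```python
def is_content(item):
    """Characters are strings, or lists of [character, annotation, ...]."""
    return isinstance(item, (str, list))


def prepare_removal(doc, start, end):
    """Prepare operations removing content in doc[start:end], keeping structure."""
    if not 0 <= start <= end <= len(doc):
        raise IndexError("range is outside the document")
    ops = []

    def retain(count):
        if count == 0:
            return
        if ops and ops[-1]["type"] == "retain":
            ops[-1]["length"] += count  # merge adjacent retains
        else:
            ops.append({"type": "retain", "length": count})

    def remove(item):
        if ops and ops[-1]["type"] == "remove":
            ops[-1]["data"].append(item)  # merge adjacent removes
        else:
            ops.append({"type": "remove", "data": [item]})

    retain(start)
    for item in doc[start:end]:
        if is_content(item):
            remove(item)
        else:
            retain(1)  # never delete element openings or closings here
    retain(len(doc) - end)
    return ops


doc = [
    {"type": "paragraph"}, "a", "b", {"type": "/paragraph"},
    {"type": "paragraph"}, "c", {"type": "/paragraph"},
]
# A selection spanning the end of one paragraph and the start of the next.
ops = prepare_removal(doc, 2, 6)
```

Because the operations are derived from the document as it currently stands, the code that later applies them does not have to guard against unbalanced elements.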

Implementation

Challenges

Template Encapsulation

To enhance the user experience of visual editing of Wikitext, and support a more portable document object model, several constraints will need to be applied to the parsing and rendering systems. Macros, such as templates and hooks, will need to be rendered prior to final resolution in the document, and their resulting HTML structures will need to be complete and valid HTML. This will allow macros to be safely treated as discrete objects while editing visually or when rendering into various alternative formats. Macros that do not expand into valid HTML structures should either be fixed using a best-effort approach, or rejected and replaced with a visible error in the final rendered document.

The sequential example illustrates a poorly factored template design, which uses separate templates to open the table, add rows and cells, and close it out. While the hierarchical document structure can still describe such templates, a visual editor will not be able to present the user with a sensible user interface because there is no clear connection between the opening, contents and closing. Finding and migrating old templates in this style will be an ongoing task, and must be considered when planning how new tools get rolled out.

Ideally templates that generate tables or other arbitrarily long lists of content would be invoked using a method more similar to the encapsulated example. It's important to note that this may require more powerful iteration utilities for the template processor. An added benefit of this approach is that the rendered result could easily encapsulate the content that each template expanded to without breaking the HTML structure.
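
A simple check for whether a macro's expansion is self-contained might look like the sketch below. It only tests that tags are balanced, which is a much weaker condition than "complete and valid HTML", and the function name `is_encapsulated` is hypothetical. It relies on Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class _BalanceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> open and close nothing

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.balanced = False  # closing tag with no matching opening


def is_encapsulated(expanded_html):
    """True if every opened tag is closed, in order, within the fragment."""
    checker = _BalanceChecker()
    checker.feed(expanded_html)
    checker.close()
    return checker.balanced and not checker.stack
```

Under this check, a template that expands to a whole table passes, while one that expands to only an opening `<table>` tag or only a closing `</table>` tag fails and would need to be fixed or replaced with a visible error.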

Dirty-diffs

Converting Wikitext from its plain-text form to the various data structures needed by the VisualEditor and then back again can easily lead to a side-effect where the modified version differs from the original in ways that the user did not intend. This is especially difficult to resolve in cases where Wikitext gives users more than one way to write markup that renders to the same output. There are at least three approaches to this problem.

Annotation
Adding extra information to all representations of the document which can be used when serializing back to Wikitext to ensure a clean diff.
Reconciliation
Comparing the original and modified versions and making selective changes to the original only in places where the resulting output is different.
Normalization
Rendering Wikitext in a standard way before saving.

Attempts to achieve clean diffs through annotation have been shown to handle most cases well, but leave a long tail of minor issues that are increasingly difficult to resolve. Reconciliation and normalization are promising, but not yet implemented. Normalization also has the additional challenge of causing an initial dirty diff the first time it's used on existing content. This can be mitigated by performing this change using a bot and leaving a comment on the change to clarify what has happened. However, diffs that span page history from before this change can still cause problems for reviewers trying to identify what the user's intentions were.
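
Since reconciliation is not yet implemented, the following is only a sketch of how the idea could work at block granularity. It assumes the original and modified documents have already been split into blocks that line up one-to-one, and it uses a toy `render` function that understands nothing but the bold shorthand. Both are assumptions made for this example.

```python
import re


def render(source):
    """Toy renderer: converts only the '''bold''' shorthand to HTML."""
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", source)


def reconcile(original_blocks, modified_blocks, render=render):
    """Keep the original source of every block whose output did not change."""
    if len(original_blocks) != len(modified_blocks):
        # Blocks no longer line up; this sketch gives up and keeps the edit.
        return list(modified_blocks)
    return [
        original if render(original) == render(modified) else modified
        for original, modified in zip(original_blocks, modified_blocks)
    ]


original = ["'''Hi'''", "plain"]
# After a round trip the first block was re-serialized with different markup,
# while the user only intended to change the second block.
round_tripped = ["<b>Hi</b>", "plain text"]
saved = reconcile(original, round_tripped)
```

Here the first block keeps its original markup because both spellings render identically, so the saved diff shows only the user's change to the second block.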