VisualEditor: Getting Started With Development


VisualEditor is a rich-text editor developed primarily for use with MediaWiki. It is nonetheless a general-purpose editor that can be integrated anywhere, so it is maintained in two parts:

  • VisualEditor core
    • The basic editor component
  • MediaWiki-VisualEditor
    • Integration with MediaWiki, including wikitext support and various wiki-specific element types
    • Relies on Parsoid, a service that translates between HTML and wikitext
    • Other MediaWiki extensions include components that add support for their own features (e.g. references are defined in the Cite extension)

This document is concerned primarily with explaining the concepts VisualEditor uses, and will simplify examples somewhat to avoid getting caught in unnecessary details.



VisualEditor has three layers:

  • DM: DataModel
  • CE: ContentEditable
  • UI: UserInterface

As a simplified view, the user can interact directly with the UserInterface and ContentEditable, both of which then translate the user's input into the DataModel. The DataModel is never exposed to the user.

DM: DataModel


This is the underlying model of the document, divorced from knowledge of the DOM. It’s a linear data structure, which breaks down the document as a mixed series of characters and nodes, stored in order.

A simple document might be:

0: <paragraph>
1: 'F'
2: 'o'
3: 'o'
4: </paragraph>

All data in the model is referred to in terms of offsets from the start of the model. E.g. F in the above example is at offset 1. Most interactions with the model involve telling it to make a change at a given offset.

Most block-level elements, such as paragraphs and images, are nodes. Nodes which can contain other data are called branch nodes. Nodes which cannot contain other data are called leaf nodes.

The <paragraph> in the example document is a branch node. An <image> would be a leaf node.
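As a rough illustration of offsets in a linear structure, the sketch below models the example document as a plain JavaScript array. The element objects ({ type: 'paragraph' } and { type: '/paragraph' }) and the helper describeOffset are assumptions made for this example, not VisualEditor's own API.

```javascript
// Hypothetical linear model for <paragraph>Foo</paragraph>.
// Opening and closing elements are objects; characters are strings.
const linearData = [
    { type: 'paragraph' },  // offset 0
    'F',                    // offset 1
    'o',                    // offset 2
    'o',                    // offset 3
    { type: '/paragraph' }  // offset 4
];

// Hypothetical helper: report what is stored at a given offset.
function describeOffset( data, offset ) {
    const item = data[ offset ];
    if ( typeof item === 'string' ) {
        return 'character ' + item;
    }
    return item.type.charAt( 0 ) === '/' ?
        'close ' + item.type.slice( 1 ) :
        'open ' + item.type;
}

console.log( describeOffset( linearData, 0 ) );
console.log( describeOffset( linearData, 1 ) );
console.log( describeOffset( linearData, 4 ) );
```

Because everything lives in one flat array, "make a change at offset 1" needs no tree traversal; it is just an index into the data.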

Any offset can have “annotations”, which are extra data attached to that particular character.

Extending our earlier example, we might bold the Fo of Foo like so:

0: <paragraph>
1: ['F', 'bold']
2: ['o', 'bold']
3: 'o'
4: </paragraph>

Annotations include text styles, links, and language data.
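To make the bolding step concrete, here is a minimal sketch that attaches an annotation name to a range of offsets, producing the [ character, annotation ] pairs shown above. The function annotateRange and the use of plain strings as annotation names are assumptions for this example only.

```javascript
// Hypothetical helper: attach an annotation name to every character in
// the half-open offset range [ start, end ). Returns a new array.
function annotateRange( data, start, end, annotation ) {
    return data.map( function ( item, offset ) {
        if ( offset < start || offset >= end ) {
            return item;
        }
        if ( typeof item === 'string' ) {
            // Plain character: pair it with the annotation
            return [ item, annotation ];
        }
        if ( Array.isArray( item ) ) {
            // Already annotated: append unless the annotation is present
            // (index 0 is the character itself, so search from index 1)
            return item.indexOf( annotation, 1 ) === -1 ?
                item.concat( annotation ) :
                item;
        }
        // Opening or closing element: left untouched in this sketch
        return item;
    } );
}

const plainData = [ { type: 'paragraph' }, 'F', 'o', 'o', { type: '/paragraph' } ];
const boldData = annotateRange( plainData, 1, 3, 'bold' );
console.log( JSON.stringify( boldData ) );
```

Note that the annotation travels with each character rather than wrapping a range, which is what lets copied text bring its styling along.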

The rule of thumb for whether something should be a node or an annotation is whether text contained within it should bring along its annotation when copied around. E.g. italic text should probably still be italic if you move it elsewhere, whereas text in a table cell probably shouldn’t still be in a table cell if you copy it to a paragraph.

It's useful to keep this rule of thumb in mind because it's technically possible to implement most elements as either a node or an annotation. The 'bold' annotation discussed above could instead be implemented as a <bold> node surrounding the bolded text.

CE: ContentEditable


This is the view. It turns the DataModel into a DOM which can be interacted with.

These interactions are based on browser contenteditable support, as you might guess from the name. Because browsers are deeply inconsistent, a great deal of processing is applied to inputs.

Once inputs occur, they are applied to the DataModel to keep it in sync with the DOM.
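The model-to-view direction can be sketched as a function that walks the linear data and emits markup. This is a deliberately naive illustration: it produces an HTML string rather than live DOM nodes, does no escaping, and wraps each annotated character separately. The tag mappings and the name renderToHtml are assumptions for this example.

```javascript
// Assumed mappings, for this sketch only
const nodeTags = { paragraph: 'p' };
const annotationTags = { bold: 'b', italic: 'i' };

// Hypothetical renderer: turn linear data into an HTML string.
function renderToHtml( data ) {
    return data.map( function ( item ) {
        if ( typeof item === 'string' ) {
            return item;
        }
        if ( Array.isArray( item ) ) {
            // [ character, annotation, ... ]: wrap the character in one
            // tag per annotation, with the first annotation outermost
            return item.slice( 1 ).reduceRight( function ( html, name ) {
                const tag = annotationTags[ name ];
                return '<' + tag + '>' + html + '</' + tag + '>';
            }, item[ 0 ] );
        }
        // Element: '/paragraph' closes, 'paragraph' opens
        const closing = item.type.charAt( 0 ) === '/';
        const tag = nodeTags[ closing ? item.type.slice( 1 ) : item.type ];
        return ( closing ? '</' : '<' ) + tag + '>';
    } ).join( '' );
}

const annotatedData = [
    { type: 'paragraph' },
    [ 'F', 'bold' ],
    [ 'o', 'bold' ],
    'o',
    { type: '/paragraph' }
];
console.log( renderToHtml( annotatedData ) );
```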

UI: UserInterface


This is the layer surrounding the editing surface. It handles everything which isn’t direct keyboard input to the surface, such as toolbars, dialogs, and context popups.

Keyboard shortcuts and commands triggered from special inputs also live on this layer.

Changes that the UI makes are applied directly to the DataModel, rather than interacting with the DOM. The DOM is then updated to match the model.
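The "change the model, then let the view catch up" flow might look like the following sketch. SketchModel and SketchView are hypothetical stand-ins written for this example; the real classes are considerably more involved and use OO.EventEmitter rather than a hand-rolled listener list.

```javascript
// Hypothetical model: holds linear data and notifies listeners on change
class SketchModel {
    constructor( data ) {
        this.data = data.slice();
        this.listeners = [];
    }
    onChange( listener ) {
        this.listeners.push( listener );
    }
    // Splice the data, then tell every listener that the model changed
    replace( offset, removeCount, insertItems ) {
        this.data.splice( offset, removeCount, ...insertItems );
        this.listeners.forEach( ( listener ) => listener( this.data ) );
    }
}

// Hypothetical view: keeps a rendered string in sync with the model
class SketchView {
    constructor( model ) {
        this.rendered = '';
        model.onChange( ( data ) => this.render( data ) );
        this.render( model.data );
    }
    render( data ) {
        // Only characters are shown in this sketch; elements are skipped
        this.rendered = data
            .filter( ( item ) => typeof item === 'string' )
            .join( '' );
    }
}

const sketchModel = new SketchModel(
    [ { type: 'paragraph' }, 'F', 'o', 'o', { type: '/paragraph' } ]
);
const sketchView = new SketchView( sketchModel );
// A "toolbar action" changes the model, never the view directly
sketchModel.replace( 1, 3, [ 'B', 'a', 'r' ] );
console.log( sketchView.rendered );
```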

A ui.Surface instance can be considered to be the top-level object when dealing with a VisualEditor instance.

Making changes


Each edit to the document becomes a Transaction that is applied to the DataModel.

A Transaction is an object which contains a set of operations to apply to the model. The operations move an internal cursor across the document, and handle updating model offsets automatically so the entity making the change doesn’t have to think about it.

Allowed operations are:

  • Retain
    • Make no changes, just move the cursor along by X positions
  • Insert
  • Remove
  • Annotate
    • Set / clear annotations
  • Attribute
    • Set an attribute on a node
  • Metadata
    • Retain / replace metadata
    • Metadata is non-document content, which technically exists in the linear model but is entirely hidden from display (this is mostly an artifact of the system being designed for MediaWiki, which has some magic text flags you can include in a document to trigger special page behavior)
    • Note: we’re trying to deprecate and remove the metadata concept

You could think of an example Transaction as “move the cursor to offset 71; remove 3 characters; add ‘Hello’; move the cursor to the end of the document”.
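The cursor-based idea can be sketched with three of the operations. The operation object shapes and the function applyOperations below are assumptions for illustration and do not mirror VisualEditor's internal format.

```javascript
// Hypothetical operation shapes:
//   { type: 'retain', length: n }
//   { type: 'remove', length: n }
//   { type: 'insert', items: [ ... ] }
function applyOperations( data, operations ) {
    const result = [];
    let cursor = 0;
    operations.forEach( function ( op ) {
        if ( op.type === 'retain' ) {
            // Copy the next op.length items unchanged
            result.push( ...data.slice( cursor, cursor + op.length ) );
            cursor += op.length;
        } else if ( op.type === 'remove' ) {
            // Skip the next op.length items
            cursor += op.length;
        } else if ( op.type === 'insert' ) {
            // Add new items without consuming any input
            result.push( ...op.items );
        } else {
            throw new Error( 'Unknown operation: ' + op.type );
        }
    } );
    if ( cursor !== data.length ) {
        throw new Error( 'Operations must cover the whole document' );
    }
    return result;
}

const before = [ { type: 'paragraph' }, 'F', 'o', 'o', { type: '/paragraph' } ];
// "Move to offset 1; remove 3 characters; add 'Hello'; move to the end"
const after = applyOperations( before, [
    { type: 'retain', length: 1 },
    { type: 'remove', length: 3 },
    { type: 'insert', items: [ 'H', 'e', 'l', 'l', 'o' ] },
    { type: 'retain', length: 1 }
] );
console.log( after.slice( 1, 6 ).join( '' ) );
```

Since each operation is expressed relative to the cursor, the caller never recomputes offsets after an insertion or removal shifts the rest of the document.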

Implementation detail: some non-retain operations are actually implemented as a replace operation, which functions as a splice on the data. Unless you’re working unusually close to the metal, this probably won’t matter to you.

When developing for VisualEditor, you rarely write a Transaction directly. Rather, you normally use a helper that creates the correct set of operations for you.

Each transaction must result in a valid DataModel: no unbalanced nodes, and no nodes in places they’re not allowed to be. If you manually create a Transaction that would produce an invalid document, it will refuse to apply and throw an error.

Transactions are stored in the document history, and can be cleanly reversed. This is used for undo/redo functionality, rather than relying on browser behavior.
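Both properties, validity and reversibility, can be sketched in a few lines. The sketch assumes a replace-style operation that records both what it removes and what it inserts, so reversing is just a swap. The names applyTx, reverseTx, and isBalanced are hypothetical, and isBalanced only checks nesting, not which nodes are allowed where.

```javascript
// Hypothetical operation shapes for this sketch:
//   { type: 'retain', length: n }
//   { type: 'replace', remove: [ ... ], insert: [ ... ] }
function applyTx( data, operations ) {
    const result = [];
    let cursor = 0;
    operations.forEach( function ( op ) {
        if ( op.type === 'retain' ) {
            result.push( ...data.slice( cursor, cursor + op.length ) );
            cursor += op.length;
        } else {
            // Replace: skip the removed items, emit the inserted ones
            cursor += op.remove.length;
            result.push( ...op.insert );
        }
    } );
    return result;
}

// Reversing swaps what each replace removes and inserts
function reverseTx( operations ) {
    return operations.map( function ( op ) {
        return op.type === 'retain' ?
            op :
            { type: 'replace', remove: op.insert, insert: op.remove };
    } );
}

// Check that every opening element has a matching, properly nested close
function isBalanced( data ) {
    const open = [];
    for ( const item of data ) {
        if ( item && typeof item === 'object' && !Array.isArray( item ) ) {
            if ( item.type.charAt( 0 ) === '/' ) {
                if ( open.pop() !== item.type.slice( 1 ) ) {
                    return false;
                }
            } else {
                open.push( item.type );
            }
        }
    }
    return open.length === 0;
}

const original = [ { type: 'paragraph' }, 'F', 'o', 'o', { type: '/paragraph' } ];
const tx = [
    { type: 'retain', length: 1 },
    { type: 'replace', remove: [ 'F', 'o', 'o' ], insert: [ 'B', 'a', 'r' ] },
    { type: 'retain', length: 1 }
];
const changed = applyTx( original, tx );
const restored = applyTx( changed, reverseTx( tx ) );
console.log( isBalanced( changed ), JSON.stringify( restored ) === JSON.stringify( original ) );
```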

Note: Although all transactions are stored, undo/redo depends on “breakpoints”, which try to split the history into usable chunks to jump between. E.g. undoing at word-level rather than character-level.
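One way to picture breakpoints is as markers that group consecutive transactions into a single undo step. The class below is a hypothetical sketch of that grouping only (transactions are represented by plain strings) and is not how VisualEditor's history is actually implemented.

```javascript
// Hypothetical history: groups transactions between breakpoints
class SketchHistory {
    constructor() {
        this.done = [];     // completed groups of transactions
        this.current = [];  // transactions since the last breakpoint
    }
    push( tx ) {
        this.current.push( tx );
    }
    // Close the current group so that undo treats it as one step
    breakpoint() {
        if ( this.current.length ) {
            this.done.push( this.current );
            this.current = [];
        }
    }
    // Return the most recent group, newest transaction first,
    // which is the order in which they would need to be reversed
    undo() {
        this.breakpoint();
        const group = this.done.pop() || [];
        return group.slice().reverse();
    }
}

const sketchHistory = new SketchHistory();
// Typing "Hi", a breakpoint at the word boundary, then typing "you"
'Hi'.split( '' ).forEach( ( ch ) => sketchHistory.push( 'insert ' + ch ) );
sketchHistory.breakpoint();
'you'.split( '' ).forEach( ( ch ) => sketchHistory.push( 'insert ' + ch ) );
const undone = sketchHistory.undo();
console.log( undone.join( ', ' ) );
```

A single undo here rolls back the whole second word while leaving the first word's transactions in the history.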

Part of the design goal of this system is to have a format that can be used for collaboration when synchronizing a document being edited by multiple editors.

How it fits together


That's enough theory. Let’s get our feet wet and walk through how VisualEditor initializes itself, looking at the standalone VisualEditor instance in demos/ve/minimal.html from the core VisualEditor repo.

VisualEditor initialization relies on several things, mostly stored in the ve.init namespace. These are:

  • ve.init.Platform
    • Interaction with whatever software platform VisualEditor is running on
    • Hooks up translations, platform-specific config objects, etc
    • Defines what VisualEditor considers to be an external URL
    • Browser checks
  • ve.init.Target
    • Sets up toolbars and other UI framework
    • Manages editing surfaces
  • ve.ui.Surface
    • Glue around a dm.Surface and a ce.Surface

ve.demo.init.js kicks everything off by telling a Platform to initialize itself and to let us know when it’s done:

// Set up the platform and wait for i18n messages to load
new ve.init.sa.Platform( ve.messagePaths ).getInitializedPromise()

If this Promise is resolved, we know that VisualEditor is supported in the current browser, and has received all the platform information it needs to work. E.g. it has loaded all the translations for the current language. Next, it creates the UI framework surrounding the editor on the demo page.

// Create the target
target = new ve.init.sa.Target();
// Append the target to the document
$( '.ve-instance' ).append( target.$element );
// Create a document model for a new surface
target.addSurface(
    ve.dm.converter.getModelFromDom(
        ve.createDocumentFromHtml( '<p><b>Hello,</b> <i>World!</i></p>' ),
        // Optional: Document language, directionality (ltr/rtl)
        { lang: $.i18n().locale, dir: $( 'body' ).css( 'direction' ) }
    )
);

This creates a DOM HTMLDocument from some HTML, makes a DataModel dm.Document from that, and then adds a ui.Surface to the Target based on that dm.Document.

We now have a working VisualEditor instance.



Here are some useful shortcuts into a VisualEditor instance, which you might want to use in the console:

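// The target:
ve.init.target
// Which contains a ui.Surface:
ve.init.target.getSurface()
// Which contains a ce.Surface:
ve.init.target.getSurface().getView()
// Or a dm.Surface:
ve.init.target.getSurface().getModel()
// The data model for the current document is:
ve.init.target.getSurface().getModel().getDocument()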

Below your VisualEditor surface there's a debug bar. If you're using VisualEditor on MediaWiki, you'll also need to enable debug mode first (add debug=1 to the URL). Using this bar you can:

  • Display the entire DataModel
  • Display the transaction history
  • Turn on input debugging, which adds symbols to the document to show you hidden characters that VisualEditor uses to control annotations and cursoring
  • Turn on Filibuster mode, which performs extensive logging as you make changes (and makes VisualEditor run very slowly).

OO + OOUI and you


VisualEditor is built on top of OOjs, and you may want to familiarize yourself with it. In particular, OO.EventEmitter is used extensively.

VisualEditor uses OOjs UI for its UI widgets. As well as all of UserInterface, almost everything in ContentEditable descends from OO.ui.Element.