Markup spec/BNF/Noparse-block

A "noparse block" (my term) is a block whose contents are not parsed as normal wiki markup. Extracting these blocks is the first thing the current parser does after preprocessing, in the "strip" method. Inside such a block, the only markup that is recognised is the matching close tag.

 <noparse-block>           ::= <nowiki-block> | <html-block> | <math-block> | <pre-block> | <html-comment>


  • Nowiki, pre and html-comment are always available.
  • Html is available if $wgRawHtml is true in LocalSettings.php.
  • Math is available if the math extension is installed.
  • Other tags may be available if installed and present in parser->mTagHooks.

Nowiki

The <nowiki> tag prevents special markup (like '' for italics) from being recognized.

 <nowiki-block>            ::= <nowiki-opening-tag> (<whitespace>) <nowiki-body> (<whitespace>) [<nowiki-closing-tag> | (?=EOF) ]
 <nowiki-opening-tag>      ::= "&lt;nowiki" (<whitespace> (<characters>)) "&gt;"
 <nowiki-closing-tag>      ::= "&lt;/nowiki" (<whitespace>) "&gt;"
 <nowiki-body>             ::= <characters>

In words, if a <nowiki> tag is not closed, it is taken to run until the end of the input. (?=EOF) is a look-ahead assertion, as in PCRE: it asserts that an EOF follows, but does not consume the EOF.
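
As a rough illustration of the grammar above, the following sketch matches nowiki blocks with a regular expression. It is not the parser's actual code: the pattern, the name NOWIKI_BLOCK and the helper find_nowiki_bodies are hypothetical, and it assumes that the "&lt;" and "&gt;" terminals stand for literal angle brackets in the input.

```python
import re

# Hypothetical pattern mirroring the <nowiki-block> production: an opening tag
# with optional attributes, a lazily matched body, then either the closing tag
# or the end of the input.
NOWIKI_BLOCK = re.compile(
    r"<nowiki(?:\s[^>]*)?>"    # <nowiki-opening-tag>
    r"(?P<body>.*?)"           # <nowiki-body>
    r"(?:</nowiki\s*>|\Z)",    # <nowiki-closing-tag> | (?=EOF)
    re.DOTALL,
)


def find_nowiki_bodies(text: str) -> list[str]:
    """Return the body of each nowiki block, without the surrounding whitespace."""
    return [m.group("body").strip() for m in NOWIKI_BLOCK.finditer(text)]
```

With this sketch, an unclosed block simply extends to the end of the string, because \Z matches there without consuming anything.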

Translating to HTML

To translate a nowiki tag to HTML, perform the following transformations:

  • <html-unsafe-symbol> terminals within <nowiki-body> are replaced with the appropriate <html-entity> (see Fundamental elements).
  • <nowiki-body> is otherwise output more or less literally. Whitespace is treated as normal: single new lines are ignored, consecutive new lines are converted into p and br elements. Leading and trailing space from each line is removed, and runs of spaces are normalised to a single space within a line.
  • The <whitespace> elements at the top level of <nowiki-block> are discarded.
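
The transformations above can be sketched roughly as follows. This is an illustration under assumptions, not the real implementation: nowiki_body_to_html is a hypothetical helper, <html-unsafe-symbol> is assumed to cover only "<", ">" and "&", and the conversion of new lines into p and br elements is left out.

```python
import html
import re


def nowiki_body_to_html(body: str) -> str:
    """Escape a nowiki body and normalise the spaces on each line."""
    # Discard the whitespace around the body, then replace unsafe symbols
    # (assumed here to be <, > and &) with HTML entities.
    escaped = html.escape(body.strip(), quote=False)
    # Trim each line and collapse runs of spaces to a single space.
    lines = [re.sub(r" +", " ", line.strip(" ")) for line in escaped.split("\n")]
    return "\n".join(lines)
```

Wiki markup such as '' passes through untouched, which is the point of the tag: only the HTML-unsafe symbols are rewritten.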

Pre

The <pre> tag behaves much like nowiki, but generates a literal <pre> tag, which causes different output. Notably, a nowiki is treated literally inside a pre tag, and vice versa.

 <pre-block>               ::= <pre-opening-tag> (<whitespace>) <pre-body> (<whitespace>) [<pre-closing-tag> | (?=EOF) ]
 <pre-opening-tag>         ::= "&lt;pre" (<whitespace> (<characters>)) "&gt;"
 <pre-closing-tag>         ::= "&lt;/pre" (<whitespace>) "&gt;"
 <pre-body>                ::= <characters>


  • The grammar above is not quite accurate: <pre-foo> is recognised as an opening tag, although <prefoo> is not.
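
One way to picture this behaviour is a pattern that requires a word boundary after the tag name. This is only a guess at the cause; the pattern PRE_OPEN below is hypothetical and is not taken from the parser.

```python
import re

# Hypothetical opening-tag pattern: "\b" demands a word boundary after "pre",
# so "-" (a non-word character) is accepted but a following letter is not.
PRE_OPEN = re.compile(r"<pre\b[^>]*>")

recognised = [tag for tag in ("<pre>", "<pre class='x'>", "<pre-foo>", "<prefoo>")
              if PRE_OPEN.fullmatch(tag)]
```

Under that assumption, recognised contains every sample tag except "<prefoo>".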

Translating to HTML

  • <html-unsafe-symbol> terminals are replaced with the appropriate <html-entity>, as for nowiki.
  • New lines are retained literally.
  • The whole block is wrapped in <pre>.
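
A minimal sketch of these three rules, assuming again that <html-unsafe-symbol> means "<", ">" and "&". The helper name pre_body_to_html is hypothetical, and attributes on the opening tag are ignored.

```python
import html


def pre_body_to_html(body: str) -> str:
    """Escape a pre body and wrap it in a literal <pre> element."""
    # New lines and spaces are kept exactly as written; only the unsafe
    # symbols (assumed to be <, > and &) are replaced with entities.
    return "<pre>" + html.escape(body, quote=False) + "</pre>"
```

Because the body is escaped as a whole, a nowiki tag inside it comes out as literal text rather than being interpreted.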

Html

 <html-block>               ::= <html-opening-tag> (<whitespace>) <html-body> (<whitespace>) [<html-closing-tag> | (?=EOF) ]
 <html-opening-tag>         ::= "&lt;html" (<whitespace> (<characters>)) "&gt;"
 <html-closing-tag>         ::= "&lt;/html" (<whitespace>) "&gt;"
 <html-body>                ::= <characters>

Translating to HTML

  • All characters, including whitespace, new lines and <html-unsafe-symbol> terminals, are output literally.
  • The block is not wrapped in anything.

HTML-comment

 <html-comment>            ::= "&lt;!--" (<characters>) "--&gt;"

Translating to HTML

HTML comments are stripped out completely and do not appear in the output. With the new parser this behaviour could possibly be changed: it exists primarily to avoid conflicts with other parts of the parser that generate internal comments, for example to identify section headings.

Note: Unlike in HTML, this stripping is repeated until there is nothing left to strip, e.g. <<!---->!----> becomes (nothing).
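
The repeated stripping described in the note can be sketched as a loop that runs until the text stops changing. The names COMMENT and strip_comments are hypothetical, and the sketch only handles comments that have a closing "-->", as in the grammar above.

```python
import re

# A comment is "<!--", the shortest possible run of characters, then "-->".
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(text: str) -> str:
    """Remove comments repeatedly until a pass leaves the text unchanged."""
    previous = None
    while previous != text:
        previous = text
        text = COMMENT.sub("", text)
    return text
```

For the input <<!---->!---->, the first pass removes the inner comment and leaves <!---->, which the second pass removes as well, so the result is the empty string.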