Extension:Collection/XML Bridge/MWXHTML

This page describes the structure of the XHTML export and how it represents MediaWiki markup and its semantics.

Naming Conventions

We use class attributes to annotate the XHTML with semantic information where it is available.

To avoid naming conflicts, all class names are prefixed with "mwx." (e.g., class="mwx.section").
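
The prefix convention can be sketched in a few lines of Python. This is only an illustration using the standard library's xml.etree.ElementTree; the sample fragment and the helper names (mwx_classes, find_by_class) are assumptions made for this example and are not part of the exporter.

```python
import xml.etree.ElementTree as ET

MWX_PREFIX = "mwx."

def mwx_classes(element):
    """Return the mwx.-prefixed class names carried by one element."""
    tokens = element.get("class", "").split()
    return [t for t in tokens if t.startswith(MWX_PREFIX)]

def find_by_class(root, class_name):
    """Return all elements (root included, document order) that carry class_name."""
    return [el for el in root.iter() if class_name in el.get("class", "").split()]

# Hypothetical fragment for illustration; not actual exporter output.
SAMPLE = """<div class="mwx.section" title="Intro">
  <h2>Intro</h2>
  <p class="note">A class without the prefix is ignored.</p>
  <a href="Category:People" class="mwx.link.category">People</a>
</div>"""

root = ET.fromstring(SAMPLE)
sections = find_by_class(root, "mwx.section")
link = root.find("a")
paragraph = root.find("p")
```

Because the class names contain dots, a CSS selector would have to escape them (.mwx\.section) or use an attribute selector such as [class~="mwx.section"]; an unescaped dot would start a new class selector.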

Page content and meta-data

  • The <title> element reflects the prefixed page name (e.g., Extension:XML).
  • <meta> elements provide the language, namespace, version, page name, and, for redirect pages, the redirection target.
  • Open: an example is still needed.
  • Open: how should the microformats used on the page be indicated in the header?

Sections

The XML converter marks sections of pages using <div class="mwx.section" title="Header of the section">. In addition, each section includes the usual XHTML header element (<h1> through <h6>). A section ends at the next header of the same or a smaller level number (for example, an <h3> section ends at the next <h3>, <h2>, or <h1>) or at the end of the containing element. Sections can be nested.
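
As a rough sketch of how a consumer might read this structure, the following Python snippet builds an outline from nested section elements. The sample markup is an assumption: it supposes that nesting is expressed by nested div elements and that each section's header is its first child, which this page does not spell out. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: assumes nested sections are nested <div> elements.
SAMPLE = """<div class="mwx.section" title="History">
  <h2>History</h2>
  <p>Text of the outer section.</p>
  <div class="mwx.section" title="Early years">
    <h3>Early years</h3>
    <p>Text of the nested section.</p>
  </div>
  <div class="mwx.section" title="Later years">
    <h3>Later years</h3>
  </div>
</div>"""

def is_section(el):
    """True if el is a <div> carrying the mwx.section class."""
    return el.tag == "div" and "mwx.section" in el.get("class", "").split()

def outline(el, depth=0):
    """Return (depth, title) pairs for el and all nested sections, in document order."""
    result = []
    if is_section(el):
        result.append((depth, el.get("title")))
        depth += 1  # children of a section sit one level deeper
    for child in el:
        result.extend(outline(child, depth))
    return result

toc = outline(ET.fromstring(SAMPLE))
for depth, title in toc:
    print("  " * depth + title)
```

Reading the title attribute rather than the header element keeps the outline independent of which header level the converter chose.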

Links

In general, we use XHTML <a...> elements to mark up all links in pages and use classes (starting with "mwx.link.") to describe their type.

Category links and language links are each contained in their own div element:


<div class="mwx.categorylinks"> 
 <a href="Category:People" class="mwx.link.category">People</a>
 ...
</div>

<div class="mwx.languagelinks"> 
 <a href="http://de.wikipedia.org/wiki/Mensch" class="mwx.link.interwiki">Mensch</a>
 ...
</div>



Other link types:

  • mwx.link.internal : Links to resources inside the wiki (internal links)
  • mwx.link.external : Links to resources outside the wiki (external links)
  • mwx.link.fragment : Links to sections of the same page (intra-page links)
  • mwx.link.interwiki : Links to pages in other wikis or other languages (interwiki links)
  • mwx.link.self : Links to the current page (self-links)
  • mwx.link.note : Links to footnotes (created through <ref> markup in wikitext)
  • mwx.link.noteref : Backlinks from a footnote to the reference
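
The link classes listed above lend themselves to simple classification by a consumer. The following Python sketch groups links by the suffix after "mwx.link."; the sample fragment, its href values for internal and fragment links, and the helper name link_type are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter

LINK_PREFIX = "mwx.link."

def link_type(a):
    """Return the part of the class after 'mwx.link.', or None if there is none."""
    for token in a.get("class", "").split():
        if token.startswith(LINK_PREFIX):
            return token[len(LINK_PREFIX):]
    return None

# Hypothetical fragment; the href formats are assumed, not specified on this page.
SAMPLE = """<div>
  <a href="Main_Page" class="mwx.link.internal">Main Page</a>
  <a href="http://example.org/" class="mwx.link.external">Example</a>
  <a href="#Links" class="mwx.link.fragment">Links</a>
  <a href="http://de.wikipedia.org/wiki/Mensch" class="mwx.link.interwiki">Mensch</a>
  <a href="Category:People" class="mwx.link.category">People</a>
  <a href="plain.html">No mwx class</a>
</div>"""

root = ET.fromstring(SAMPLE)
links = [(link_type(a), a.get("href")) for a in root.iter("a")]
counts = Counter(kind for kind, _ in links)
```

Links without an mwx.link.* class are reported with type None, so unannotated anchors remain visible instead of being silently dropped.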

Templates

Templates will probably be represented using the <object> element.

Images

Images included in pages are marked up with XHTML <img> and <a> elements:

  • The <a> element gets class="mwx.link.image" and links to the page describing the image resource.
  • The src attribute of the <img> element is set to the URL of the actual image.
  • If the image is framed, thumbnailed, or floating, it is embedded in a <div> element (with class="mwx.image.frame", "mwx.image.thumb", or "mwx.image.float") together with the optional image caption (class="mwx.imagecaption").
<div class="mwx.image.float">
 <a href="Image:Logo.png" class="mwx.link.image">
  <img src="/resources/images/logo.png"/>
  <span class="mwx.imagecaption">descriptive image caption</span>
 </a>
</div>
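
A consumer could pull the description page, image URL, caption, and layout out of this markup roughly as follows. This is a minimal Python sketch built on the example above; the function names (collect_images, image_info, layout_of) are hypothetical, and the handling of images without a wrapping div is an assumption.

```python
import xml.etree.ElementTree as ET

IMAGE_PREFIX = "mwx.image."

SAMPLE = """<div class="mwx.image.float">
 <a href="Image:Logo.png" class="mwx.link.image">
  <img src="/resources/images/logo.png"/>
  <span class="mwx.imagecaption">descriptive image caption</span>
 </a>
</div>"""

def has_class(el, name):
    return name in el.get("class", "").split()

def layout_of(el):
    """Return 'float', 'frame' or 'thumb' for a wrapper <div>, else None."""
    for token in el.get("class", "").split():
        if token.startswith(IMAGE_PREFIX):
            return token[len(IMAGE_PREFIX):]
    return None

def image_info(a, layout):
    """Collect description page, image URL, caption and layout for one image link."""
    img = a.find("img")
    caption = next((s.text for s in a.iter("span")
                    if has_class(s, "mwx.imagecaption")), None)
    return {"page": a.get("href"),
            "src": img.get("src") if img is not None else None,
            "caption": caption,
            "layout": layout}

def collect_images(el, layout=None):
    """Walk the tree and remember the nearest enclosing image wrapper, if any."""
    if el.tag == "div":
        layout = layout_of(el) or layout
    if el.tag == "a" and has_class(el, "mwx.link.image"):
        return [image_info(el, layout)]
    found = []
    for child in el:
        found.extend(collect_images(child, layout))
    return found

images = collect_images(ET.fromstring(SAMPLE))
```

The layout is tracked while descending because ElementTree elements do not keep a reference to their parent; an image that is not wrapped in a div is reported with layout None.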

Math

The content of <math> markup is converted to MathML.

In addition, the unmodified LaTeX source is kept, either in an <object> element or in a data URI (not yet decided).

Open: details and an example are still needed.

Images

Idea: don't resolve image URLs; instead, set each URL to a service that redirects to the resolved resource. This will be significantly faster.

Timeline, Hiero and other Extensions

Content from these extensions is put unmodified into <object> elements if possible.

Open: details and examples are still needed.

Open: should data URIs be used? Should a timeline server be implemented?

Magic variables

Magic variables are expanded as in MediaWiki's XHTML output (e.g., the page-name variable is rendered as Extension:Collection/XML Bridge/MWXHTML).

Open: can the expanded values be marked in the output?