Parsoid/API
Parsoid provides the following REST API endpoints, which its clients use to convert MediaWiki wikitext to XHTML5 + RDFa and back.
Common HTTP headers supported in all entry points
- Accept-encoding
- Please accept gzip.
- Cookie
- Cookie header that will be forwarded to the MediaWiki API. This makes it possible to use Parsoid with private wikis. Setting a cookie implicitly disables all caching for security reasons, so do not send a cookie for public wikis if you care about caching.
- x-request-id
- A request ID that will be forwarded to the MediaWiki API.
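As a minimal sketch of the headers listed above, the following Python helper assembles them into a dictionary. The function name build_headers and its parameters are hypothetical and not part of Parsoid; only the header names come from this page.

```python
def build_headers(cookie=None, request_id=None):
    """Return the common request headers as a dict (illustrative helper)."""
    # Always advertise gzip support, as this page asks clients to do.
    headers = {"Accept-encoding": "gzip"}
    if cookie is not None:
        # A cookie disables caching, so only pass one for private wikis.
        headers["Cookie"] = cookie
    if request_id is not None:
        # Forwarded by Parsoid to the MediaWiki API.
        headers["x-request-id"] = request_id
    return headers
```

Leaving cookie unset for public wikis keeps responses cacheable.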
v3 API
Common path parameters across all requests
- domain
- The hostname of the wiki.
- title
- Page title. Must be URL-encoded (percent-encoded).
- revision
- Revision id of the title.
- format
- Input / output format of the content: wikitext, html, or pagebundle.
- wikitext
- Plain text that is treated as wikitext. Content type is text/plain.
- html
- Parsoid's XHTML5 + RDFa output, which includes inlined data-parsoid attributes. The HTML conforms to the MediaWiki DOM spec. Content type is text/html.
- pagebundle
- A JSON blob containing the above html with the data-parsoid attributes split out and ids added to each node. Content type is application/json.
Pagebundle blobs have the form,
{
  "html": {
    "headers": {
      "content-type": "text/html;profile='mediawiki.org/specs/html/1.0.0'"
    },
    "body": "<!DOCTYPE html> ... </html>"
  },
  "data-parsoid": {
    "headers": {
      "content-type": "application/json;profile='mediawiki.org/specs/data-parsoid/0.0.1'"
    },
    "body": {
      "counter": n,
      "ids": { ... }
    }
  }
}
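A client typically needs to pull the HTML and the per-node data-parsoid map back out of such a blob. The sketch below assumes only the structure shown above; the function name split_pagebundle, the sample node id "mwAA", and the sample contents are made up for illustration.

```python
import json


def split_pagebundle(blob):
    """Return (html_body, html_content_type, data_parsoid_ids) from a pagebundle JSON string."""
    bundle = json.loads(blob)
    html_body = bundle["html"]["body"]
    content_type = bundle["html"]["headers"]["content-type"]
    # "ids" maps node ids in the HTML to their split-out data-parsoid values.
    ids = bundle["data-parsoid"]["body"]["ids"]
    return html_body, content_type, ids


# Hypothetical sample blob; the id and its contents are illustrative only.
sample = json.dumps({
    "html": {
        "headers": {"content-type": "text/html;profile='mediawiki.org/specs/html/1.0.0'"},
        "body": "<!DOCTYPE html><html><body id=\"mwAA\"></body></html>",
    },
    "data-parsoid": {
        "headers": {"content-type": "application/json;profile='mediawiki.org/specs/data-parsoid/0.0.1'"},
        "body": {"counter": 1, "ids": {"mwAA": {"dsr": [0, 8, 0, 0]}}},
    },
})

html_body, content_type, ids = split_pagebundle(sample)
```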
Common payload / querystring parameters across all formats
For wikitext -> HTML requests
- body_only
- Optional boolean flag. If set, only the HTML body.innerHTML is returned instead of a full document.
For HTML -> wikitext requests
- scrub_wikitext
- Optional boolean flag, which normalizes the DOM to yield cleaner wikitext than might otherwise be generated.
GET
Wikitext -> HTML
GET /:domain/v3/page/:format/:title/:revision?
- revision
- Optional. GET requests without a revision ID should be considered a convenience method: if no revision ID is provided, the request is redirected to the latest revision.
- format
- One of html or pagebundle
Some querystring parameters are also accepted: body_only
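The path template above can be filled in programmatically. This is a sketch only: the helper name page_url and the base parameter (the address of the Parsoid server) are assumptions, and the choice to leave the namespace colon unencoded mirrors the sample URLs in the Examples section rather than any stated requirement.

```python
from urllib.parse import quote, urlencode


def page_url(base, domain, fmt, title, revision=None, body_only=False):
    """Build a GET URL for /:domain/v3/page/:format/:title/:revision?"""
    if fmt not in ("html", "pagebundle"):
        raise ValueError("format must be 'html' or 'pagebundle'")
    # safe=":" keeps the namespace colon readable but encodes "/" as %2F.
    url = f"{base}/{domain}/v3/page/{fmt}/{quote(title, safe=':')}"
    if revision is not None:
        url += f"/{revision}"
    if body_only:
        url += "?" + urlencode({"body_only": "true"})
    return url


url = page_url("http://localhost:8000", "en.wikipedia.org", "html",
               "User:Arlolra/sandbox", 696653152)
```

Omitting revision produces the convenience form, which the server redirects to the latest revision.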
POST
The content type for the POST payload can be application/x-www-form-urlencoded, application/json, or multipart/form-data.
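To illustrate two of these content types, the sketch below encodes the same fields either as JSON or as a URL-encoded form. The helper name encode_payload is hypothetical; multipart/form-data is left out because the Python standard library has no one-line encoder for it.

```python
import json
from urllib.parse import urlencode


def encode_payload(fields, content_type):
    """Encode a dict of payload fields as bytes for the given content type."""
    if content_type == "application/json":
        return json.dumps(fields).encode("utf-8")
    if content_type == "application/x-www-form-urlencoded":
        return urlencode(fields).encode("utf-8")
    raise ValueError(f"unsupported content type in this sketch: {content_type}")


as_json = encode_payload({"wikitext": "== h2 =="}, "application/json")
as_form = encode_payload({"wikitext": "== h2 =="}, "application/x-www-form-urlencoded")
```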
Wikitext -> HTML
POST /:domain/v3/transform/:from/to/:format/:title?/:revision?
- from
- wikitext
- format
- One of html or pagebundle
The payload can contain,
{
  "wikitext": "...", // if omitted, a title is required to fetch the wikitext source
  "body_only": true, // optional
  "original": {
    "title": "...", // optional; alternative to passing it in the path
    "revid": n // optional; alternative to passing it in the path
  }
}
Some other fields exist (including previous for expansion reuse). See Parsoid's API test suite for their use.
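A request for this endpoint can be prepared with Python's standard library as sketched below. The function name and the base parameter are assumptions; the request object is only constructed here, not sent.

```python
import json
from urllib.request import Request


def wikitext_to_html_request(base, domain, wikitext, fmt="html", body_only=False):
    """Prepare (but do not send) a POST to /:domain/v3/transform/wikitext/to/:format/"""
    url = f"{base}/{domain}/v3/transform/wikitext/to/{fmt}/"
    payload = {"wikitext": wikitext}
    if body_only:
        payload["body_only"] = True
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = wikitext_to_html_request("http://localhost:8000", "localhost", "== h2 ==")
```

Passing req to urllib.request.urlopen would send it to a running Parsoid server.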
HTML -> Wikitext
POST /:domain/v3/transform/:from/to/:format/:title?/:revision?
- from
- One of html or pagebundle
- format
- wikitext
The payload can contain,
{
  "html": "...",
  "scrub_wikitext": true, // optional
  "original": {
    "title": "...", // optional; alternative to passing it in the path
    "revid": n, // optional; alternative to passing it in the path
    "wikitext": "...", // optional; this and the following two fields provide the original data used in the selective serialization strategy
    "html": "...",
    "data-parsoid": { ... }
  }
}
Parsoid serializes HTML to a normalized form of wikitext. To avoid "dirty diffs" (differences outside the edited region of content) when serializing HTML generated from a given wikitext source, pass in the revision (either as revision in the path or as original.revid in the payload). Optionally, also pass the source (original.wikitext) and the unedited HTML (original.html and original['data-parsoid']); this is only an optimization, because Parsoid will fetch or generate them if they are missing. This strategy is known as "selective serialization"; an example can be seen in the test suite.
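The payload for a selective-serialization request might be assembled as in the sketch below. The function and parameter names are hypothetical; the original values are passed through unchanged, in whatever shape the caller obtained them, and only the fields actually supplied are included.

```python
def html_to_wikitext_payload(html, revid=None, original_wikitext=None,
                             original_html=None, original_data_parsoid=None,
                             scrub_wikitext=False):
    """Build the JSON-serializable payload for an HTML -> wikitext request."""
    payload = {"html": html}
    if scrub_wikitext:
        payload["scrub_wikitext"] = True
    original = {}
    if revid is not None:
        # Identifies the revision the edited HTML was generated from.
        original["revid"] = revid
    # The remaining fields are an optimization; Parsoid can fetch or
    # regenerate them when they are missing.
    if original_wikitext is not None:
        original["wikitext"] = original_wikitext
    if original_html is not None:
        original["html"] = original_html
    if original_data_parsoid is not None:
        original["data-parsoid"] = original_data_parsoid
    if original:
        payload["original"] = original
    return payload


minimal = html_to_wikitext_payload("<p>foo</p>", revid=12345)
```

Supplying only revid is enough to enable the strategy; the other three fields save the server a round trip.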
HTML -> HTML
POST /:domain/v3/transform/pagebundle/to/pagebundle/:title?/:revision?
Parsoid exposes an API which transforms Parsoid-format HTML (encapsulated as a page bundle) to itself, performing a number of possible transformations. T114413 discusses some of the transformations, both actual and potential.
The payload is of the form:
{
  original: {
    html: {
      headers: {
        'content-type': 'text/html; charset=utf-8; profile="https://mediawiki.org/wiki/Specs/DOM/1.2.1"'
      },
      body: '<html>...</html>'
    }
  },
  updates: {
    transclusions: ...,
    media: ..., // Could specify the exact image to update later.
    redlinks: { ... },
    variant: { ... }
  }
}
The original field is a pagebundle blob, as described above.
The updates field specifies the desired transformations, which are described in more detail below.
Redlinks
XXX: write me
Variant
See T43716.
XXX: write me
Content up/downgrade
XXX: write me
Wikitext -> Lint
POST /:domain/v3/transform/wikitext/to/lint/:title?/:revision?
Parsoid also exposes an API to get wikitext "syntax" errors for a given page, revision or wikitext.
The payload can contain:
{
  "wikitext": "..." // if omitted, a title or revision is required to fetch lint errors
}
Examples
For more intricate examples, see Parsoid's API test suite.
Wikitext -> HTML
GET
Some simple GET requests to a Parsoid HTTP server bound to localhost:8000.
http://localhost:8000/en.wikipedia.org/v3/page/html/User:Arlolra%2Fsandbox/696653152
Returns text/html
http://localhost:8000/en.wikipedia.org/v3/page/pagebundle/User:Arlolra%2Fsandbox/696653152?body_only=true
Returns application/json
POST
POSTing the following blob,
{
"wikitext": "== h2 =="
}
to,
http://localhost:8000/localhost/v3/transform/wikitext/to/html/
returns,
<!DOCTYPE html>
<html prefix="dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/"><head ...>...</head><body data-parsoid='{"dsr":[0,8,0,0]}' lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body mw-body-content mediawiki" dir="ltr"><h2 data-parsoid='{"dsr":[0,8,2,2]}'> h2 </h2></body></html>
HTML -> Wikitext
POST
POSTing the following blob,
{
"html": "<html><body>foo <b>bar</b></body></html>"
}
to http://localhost:8000/localhost/v3/transform/html/to/wikitext/
returns
foo '''bar'''
Wikitext -> Lint
POST
POSTing the following blob
{
"wikitext": "<div/>"
}
to http://localhost:8000/localhost/v3/transform/wikitext/to/lint
returns
[
  {
    "type": "self-closed-tag",
    "params": {
      "name": "div"
    },
    "dsr": [0, 6, 6, 0]
  }
]
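A lint response like the one above can be processed as sketched below. The helper names are hypothetical. The sketch assumes that the first two dsr entries are the start and end offsets of the offending span in the submitted wikitext, which matches this sample, where <div/> is six characters long; this page does not define dsr itself.

```python
import json
from collections import Counter


def count_lint_types(response_text):
    """Count lint errors by their "type" field."""
    return Counter(lint["type"] for lint in json.loads(response_text))


def lint_source(wikitext, lint):
    """Return the wikitext span a lint refers to (assumes dsr[0:2] are offsets)."""
    start, end = lint["dsr"][:2]
    return wikitext[start:end]


wikitext = "<div/>"
response_text = '[{"type": "self-closed-tag", "params": {"name": "div"}, "dsr": [0, 6, 6, 0]}]'
counts = count_lint_types(response_text)
span = lint_source(wikitext, json.loads(response_text)[0])
```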
The following works well with curl. Replace "LinterTest" with the appropriate wiki page; the -L (follow redirects) option makes the request resolve to the most recent revision.
$ curl -X POST http://localhost:8080/rest.php/localhost/v3/transform/wikitext/to/lint/LinterTest -L -H "Content-Type: application/x-www-form-urlencoded" -d ""
Produces:
[{"type":"misnested-tag","dsr":[21,33,3,0],"params":{"name":"i"}},{"type":"misnested-tag","dsr":[78,90,3,0],"params":{"name":"i"}},{"type":"obsolete-tag" ...
Content Negotiation
- Accept
text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/2.0.0"
When making a parse request (wikitext -> HTML), passing an Accept header that defines an acceptable spec version will induce Parsoid to return HTML that satisfies that version, following Semantic Versioning caret semantics, or to fail with a 406 status code.
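Caret semantics mean that a version is acceptable if it is at least the requested one and does not change the leftmost non-zero component. The sketch below illustrates that rule and how a version could be read from a profile parameter; it is not Parsoid's implementation, and the helper names are hypothetical.

```python
import re


def profile_version(header_value):
    """Extract the trailing x.y.z version of a profile="..." parameter as a tuple."""
    match = re.search(r'profile="[^"]*/(\d+)\.(\d+)\.(\d+)"', header_value)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def satisfies_caret(requested, available):
    """Return True if `available` satisfies ^requested (both are (major, minor, patch))."""
    if available < requested:
        return False
    # The leftmost non-zero component of the requested version must not change.
    for i, part in enumerate(requested):
        if part != 0:
            return available[:i + 1] == requested[:i + 1]
    # A request for 0.0.0 only matches 0.0.0 itself.
    return available == requested


accept = 'text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/2.0.0"'
requested = profile_version(accept)
```

Under this rule a client asking for 2.0.0 would accept 2.1.0 but not 3.0.0, in which case the server answers with 406.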
Older entry points
These versions have been deprecated.