Debugging tips for commandline parsing (parse.php script)

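This section assumes you are in the root directory of your Parsoid checkout (the directory that contains bin/), since the commands below invoke php bin/parse.php.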

Debugging the wt2html mode

php bin/parse.php --help is a useful command to remember; read on to find out more about a few of its options. Since Parsoid processes wikitext in a pipeline composed of synchronous and asynchronous phases, it is sometimes useful to examine the contents of the pipeline at various stages.

1. If you want to debug the tokenizer, php bin/parse.php --trace peg is useful. Each time the tokenizer emits a token array to the next stage in the pipeline, this option prints out the token array.

[subbu@earth tests] echo -e "foo bar\nThis is a [[link]]" | php bin/parse.php --trace peg
0-[peg]        | ---->   ["foo bar"]
0-[peg]        | ---->   [{"type":"NlTk","dataAttribs":{"tsr":[7,8]}}]
0-[peg]        | ---->   ["This is a ",{"type":"SelfclosingTagTk","name":"wikilink","attribs":[{"k":"href","v":["link"],"srcOffsets":[20,20,20,24],"vsrc":"link"}],"dataAttribs":{"tsr":[18,26],"src":"[[link]]"}}]
0-[peg]        | ---->   [{"type":"NlTk","dataAttribs":{"tsr":[26,27]}}]

The end of the output is signalled by an EOFTk token (not shown above). Also note that multiple tokenizers might be active at the same time because of concurrent template expansions. A future enhancement of this debugging output would assign debug ids to every tokenizer and use those ids to distinguish output between tokenizers.

2. If you want to look at the fully expanded, in-order token stream, php bin/parse.php --trace tsp is your friend. This emits the tokens as seen by the TokenStreamPatcher handler, which is the very first handler in the in-order, third-phase synchronous transformation passes. So it is a good proxy for the in-order token stream of the top-level document.

[subbu@earth tests] echo -e "foo \n{{1x|bar}}\n[[link]]" | php bin/parse.php --trace tsp
0-[TSP]        | "foo "
0-[TSP]        | {"type":"NlTk","dataAttribs":{"tsr":[4,5]}}
0-[TSP]        | {"type":"SelfclosingTagTk","name":"meta","attribs":[{"k":"typeof","v":"mw:Transclusion"},{"k":"about","v":"#mwt1"}],"dataAttribs":{"tsr":[5,15],"src":"{{1x|bar}}","tmp":{"tplarginfo":"{\"dict\":{\"target\":{\"wt\":\"1x\",\"href\":\"./Template:1x\"},\"params\":{\"1\":{\"wt\":\"bar\"}}},\"paramInfos\":[{\"k\":\"1\",\"srcOffsets\":[10,10,10,13]}]}"}}}
0-[TSP]        | "bar"
0-[TSP]        | {"type":"SelfclosingTagTk","name":"meta","attribs":[{"k":"typeof","v":"mw:Transclusion/End"},{"k":"about","v":"#mwt1"}],"dataAttribs":{"tsr":[null,15]}}
0-[TSP]        | {"type":"NlTk","dataAttribs":{"tsr":[15,16]}}
0-[TSP]        | {"type":"TagTk","name":"a","attribs":[{"k":"rel","v":"mw:WikiLink"},{"k":"href","v":"./Link"},{"k":"title","v":"Link"}],"dataAttribs":{"tsr":[16,24],"stx":"simple","a":{"href":"./Link"},"sa":{"href":"link"}}}
0-[TSP]        | "link"
0-[TSP]        | {"type":"EndTagTk","name":"a","attribs":[],"dataAttribs":{}}
0-[TSP]        | {"type":"NlTk","dataAttribs":{"tsr":[24,25]}}
0-[TSP]        | {"type":"EOFTk"}

3. If you want to look at the fully processed and transformed token stream (post all transformations), php bin/parse.php --trace html is a good proxy. The output is a little bit noisier than it needs to be. Refining it and making it more useful is left as an enhancement.

4. If you want to look at the DOM at different stages of transformation, --dump dom:post-builder, --dump dom:pre-dsr, and --dump dom:pre-encap are useful DOM debug options. They can be combined, as in --dump dom:pre-dsr,dom:pre-encap.

5. Sometimes, it is useful to look at the preprocessed template source that Parsoid then tokenizes. --dump tplsrc is useful in those scenarios.

6. There are a bunch of other handler-specific tracing flags. "php bin/parse.php --help" should tell you what they do. There are tracers for the PreHandler, ListHandler, and ParagraphWrapper. There is no tracer currently for the QuoteTransformer.
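The trace and dump output above can get long. A few small shell helpers make it easier to skim. They are a sketch that assumes trace lines have the layout shown in the examples, and the function names (peg_tokens, tsp_token_types, parse_dump) are hypothetical, not part of Parsoid.

```shell
# Hypothetical helpers for inspecting wt2html traces; not part of Parsoid.
# The examples use 2>&1 in case trace lines are written to stderr.

# peg_tokens: keep only the token arrays from `--trace peg` output, one per
# line, with the "0-[peg] | ---->" prefix removed. Non-trace lines are dropped.
peg_tokens() {
  sed -n 's/^[0-9]*-\[peg\][[:space:]]*|[[:space:]]*---->[[:space:]]*//p'
}

# tsp_token_types: count tokens of each "type" (NlTk, TagTk, EOFTk, ...) in
# `--trace tsp` output. Plain string tokens such as "foo " have no "type" field
# and are not counted; "type" fields nested inside attribute values are.
tsp_token_types() {
  grep -o '"type":"[A-Za-z]*"' | cut -d'"' -f4 | sort | uniq -c
}

# parse_dump: run parse.php with several --dump stages joined into one
# comma-separated value, e.g. `parse_dump dom:pre-dsr dom:pre-encap`.
parse_dump() {
  local stages
  stages=$(IFS=,; printf '%s' "$*")  # subshell, so IFS is not changed globally
  php bin/parse.php --dump "$stages"
}

# Examples (run from the Parsoid checkout):
#   echo -e "foo bar\nThis is a [[link]]" | php bin/parse.php --trace peg 2>&1 | peg_tokens
#   echo -e "foo \n{{1x|bar}}\n[[link]]" | php bin/parse.php --trace tsp 2>&1 | tsp_token_types
#   echo "foo" | parse_dump dom:pre-dsr dom:pre-encap
```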

Debugging the html2wt mode

php bin/parse.php --help lists a few options to help debug the serializer (which converts HTML to wikitext).

1. If you want to trace the actions of the regular serializer, --trace wts is what you want.

$ echo "<p>foo</p>" | php bin/parse.php --html2wt --trace wts

2. If you want to debug the wikitext escaping behavior of the serializer, --trace wt-escape is what you want.

$ echo -e "<p> foo\nbar</p>\n\n<p>*a\n*b</p>" | php bin/parse.php --trace wt-escape --html2wt
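A small wrapper can save some typing when trying serializer traces on different snippets. This is a sketch: html2wt_trace is a hypothetical helper name. It uses printf '%b' so that \n escapes in the snippet are expanded the same way in every shell, which plain echo does not guarantee.

```shell
# html2wt_trace (hypothetical helper): serialize an HTML snippet to wikitext
# with the given trace flags (default: wts).
#   $1: HTML snippet, may contain \n escapes   $2: trace flags
html2wt_trace() {
  local html="$1" flags="${2:-wts}"
  # %b expands backslash escapes such as \n in the snippet.
  printf '%b\n' "$html" | php bin/parse.php --html2wt --trace "$flags"
}

# Examples:
#   html2wt_trace '<p> foo\nbar</p>\n\n<p>*a\n*b</p>' wt-escape
#   html2wt_trace '<p>foo</p>'
```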

Debugging the selective serializer (selser)

In order to test the selective serializer, you need (a) the original wikitext, (b) the original HTML, and (c) the modified HTML. Strictly speaking, (b) is not necessary, since selser reparses (a) to generate (b) as needed. However, in cases where you want to control testing conditions, it is useful to provide the original HTML as well.

Running selser (for HTML with inline data-parsoid)

Let us first look at ways to test the selective serializer on the command line.

$ echo "<p>foo</p><p>boo</p>" > /tmp/wt
$ php bin/parse.php < /tmp/wt > /tmp/orig.html
$ sed "s/foo/bar/g" < /tmp/orig.html > /tmp/edited.html
$ php bin/parse.php --selser --oldtextfile /tmp/wt < /tmp/edited.html
$ php bin/parse.php --selser --oldtextfile /tmp/wt --oldhtmlfile /tmp/orig.html < /tmp/edited.html
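The steps above can be wrapped into one function. This is a sketch with some assumptions: selser_roundtrip is a hypothetical name, the sed expression stands in for a real edit, and the function always passes --oldhtmlfile (the last variant above). The intermediate HTML is kept in a temporary directory that is removed afterwards.

```shell
# selser_roundtrip (hypothetical helper): parse, apply a sed "edit", run selser.
#   $1: wikitext file (e.g. /tmp/wt)
#   $2: sed expression simulating the edit (e.g. 's/foo/bar/g')
selser_roundtrip() {
  local wt="$1" edit="$2" dir status
  dir=$(mktemp -d) || return 1
  php bin/parse.php < "$wt" > "$dir/orig.html" &&
    sed "$edit" < "$dir/orig.html" > "$dir/edited.html" &&
    php bin/parse.php --selser --oldtextfile "$wt" --oldhtmlfile "$dir/orig.html" < "$dir/edited.html"
  status=$?
  rm -rf "$dir"   # clean up the intermediate HTML files
  return "$status"
}

# Example:
#   echo "<p>foo</p><p>boo</p>" > /tmp/wt
#   selser_roundtrip /tmp/wt 's/foo/bar/g'
```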

Running selser on HTML and data-parsoid in separate files

This is useful for testing Parsoid output after dumping the original HTML and the edited HTML from VE (see the instructions for doing that later on this page) and fetching data-parsoid from RESTBase. This effectively simulates the v2 html2wt API endpoint that VE and other clients use via RESTBase.

$ node parse --dpinfile data-parsoid.json --selser --oldhtmlfile old.html --oldtextfile wt.txt < new.html

For very simple examples, there are also options for running selser entirely from the command line. Check php bin/parse.php --help to find out more.

Debugging DOMDiff

Selser first compares the old and new HTML and generates a diff-marked DOM; this is done by the DOMDiff class. There is a command-line script to test and debug this functionality in isolation.

(The output below has been edited to fit the window.)

$ php bin/domdiff.test.php /tmp/orig.html /tmp/edited.html
----- DIFF-marked DOM -----
<html data-parsoid-diff="{&quot;id&quot;:null,&quot;diff&quot;:[&quot;subtree-changed&quot;]}">
<body data-parsoid="{&quot;dsr&quot;:[0,21,0,0]}" data-parsoid-diff="{&quot;id&quot;:null,&quot;diff&quot;:[&quot;subtree-changed&quot;]}">
<p data-parsoid="{&quot;stx&quot;:&quot;html&quot;,&quot;dsr&quot;:[0,10,3,4]}" data-parsoid-diff="{&quot;id&quot;:null,&quot;diff&quot;:[&quot;children-changed&quot;,&quot;subtree-changed&quot;]}">bar</p>
<p data-parsoid="{&quot;stx&quot;:&quot;html&quot;,&quot;dsr&quot;:[10,20,3,4]}">boo</p>

You can see the (currently very) verbose output of DOM diffing by turning on the --debug option.

$ php bin/domdiff.test.php --debug /tmp/orig.html /tmp/edited.html
$ php bin/domdiff.test.php --help
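The data-parsoid-diff markers in the DOMDiff output are hard to read with their &quot; entities. A minimal filter, assuming the attribute format shown in the sample output (domdiff_marks is a hypothetical helper name), pulls the markers out and decodes the quotes.

```shell
# domdiff_marks (hypothetical helper): print each data-parsoid-diff value on
# its own line with &quot; decoded, so the diff JSON is readable.
domdiff_marks() {
  grep -o 'data-parsoid-diff="[^"]*"' |
    sed -e 's/^data-parsoid-diff="//' -e 's/"$//' -e 's/&quot;/"/g'
}

# Example:
#   php bin/domdiff.test.php /tmp/orig.html /tmp/edited.html | domdiff_marks
```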

Debugging selser

To debug the selective serializer, you can use --trace selser. Using this flag will automatically enable tracing of the regular serializer, so there is no need to say --trace wts as well.

$ php bin/parse.php --selser --oldtextfile /tmp/wt --oldhtmlfile /tmp/orig.html --trace selser < /tmp/edited.html

Debugging what's happening on a local mediawiki install

$ MW_INSTALL_PATH=/var/www/html/mediawiki php bin/parse.php --integrated --domain http://localhost --pageName Main_Page < /dev/null > /dev/null

This shows, on the command line, what happens when a given page on the locally installed MediaWiki is parsed.
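To try other pages, the command can be wrapped in a function that takes the page name. This is a sketch: debug_page is a hypothetical name, and the default install path and --domain value are copied from the command above, so adjust them to your setup.

```shell
# debug_page (hypothetical helper): run the integrated-mode command for a page.
# MW_INSTALL_PATH overrides the default install location copied from above.
# stdout (the parse output) is discarded; anything written to stderr stays visible.
debug_page() {
  MW_INSTALL_PATH=${MW_INSTALL_PATH:-/var/www/html/mediawiki} \
    php bin/parse.php --integrated --domain http://localhost --pageName "$1" \
    < /dev/null > /dev/null
}

# Examples:
#   debug_page Main_Page
#   MW_INSTALL_PATH=/srv/mediawiki debug_page Sandbox
```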

Debugging tips for parser tests (parserTests.php script)

Running tests in all modes

parserTests is the script for running parser tests. The following command runs tests in all 5 modes; you can also run any subset of the 5 modes. The default command line runs in 4 modes (all except selser).

$ php bin/parserTests.php --wt2html --wt2wt --html2wt --html2html --selser

All command-line options that the parse.php script accepts can be passed to parserTests as well, so the debugging techniques from the previous section apply here too. In addition, a couple of parser-test-specific options are useful when debugging parser test failures.

Running a subset of tests

parserTests accepts the --filter <string> option which can be used to run a single test or a subset of tests. Examples below:

$ php bin/parserTests.php --wt2html --filter "Tabs don't trigger preformatted text"
$ php bin/parserTests.php --wt2wt --selser --filter "Tabs don't trigger preformatted text" --trace wts,selser
$ php bin/parserTests.php --wt2wt --selser --filter "Tabs don't trigger preformatted text" --trace wts,selser --no-blacklist

The last commandline ignores the blacklist entries and dumps failure output (input, expected output, and rendered output).
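When a test fails in only some modes, it can help to run each mode on its own. The sketch below (parser_test_modes is a hypothetical helper name) loops over the five mode flags listed earlier and passes any extra options, such as --trace wts,selser or --no-blacklist, straight through.

```shell
# parser_test_modes (hypothetical helper): run one filtered parser test once
# per mode so a failure can be attributed to a single mode.
#   $1: test name for --filter; remaining arguments are passed through.
parser_test_modes() {
  local name="$1" mode
  shift
  for mode in --wt2html --wt2wt --html2wt --html2html --selser; do
    echo "=== $mode ==="
    php bin/parserTests.php "$mode" --filter "$name" "$@"
  done
}

# Example:
#   parser_test_modes "Tabs don't trigger preformatted text" --no-blacklist
```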

Running selser with a specific edit

In selser mode, the parserTests script generates edited-DOM tests by making random DOM changes to the HTML and running a selser test on the result. The generated set of changes is called a changetree and is specific to the wt2html output produced for a test's wikitext. To run a selser test for a specific changetree, use the --changetree command-line flag.

$ php bin/parserTests.php --selser --filter "Tabs don't trigger preformatted text" --changetree "[2,0,0]" --no-blacklist

This last command line is usually used to debug a failing selser test recorded in the blacklist file, to determine whether the wt2wt output or the selser output is incorrect. This is easy to determine by dumping the edited DOM after the changetree is applied to the wt2html output, as follows:

$ node parserTests --selser --filter "Tabs don't trigger preformatted text" --changetree "[2,0,0]" --dump dom:post-changes --no-blacklist
WARNING: parserTests.txt not up-to-date with upstream.
ParserTests running with node v0.10.19
Initialisation complete. Now launching tests.
Change tree: [2,0,0]
DOM with changes applied: <body data-parsoid="{&quot;dsr&quot;:[0,75,0,0]}">gzlly4beyrx561or<p data-parsoid="{&quot;dsr&quot;:[0,33,0,0]}">	This is not
	 preformatted text.</p>
<pre data-parsoid="{&quot;dsr&quot;:[34,75,1,0]}">This is preformatted text.
	So is this.</pre></body>
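To compare several edits of the same test, the command can be run in a loop. This sketch assumes that php bin/parserTests.php accepts the same --dump dom:post-changes option as the node example above; selser_changetrees is a hypothetical helper name, and the changetrees in the usage line are placeholders.

```shell
# selser_changetrees (hypothetical helper): dump the edited DOM of one test
# for each changetree given. Assumes parserTests.php supports
# --dump dom:post-changes like the node version shown above.
#   $1: test name for --filter; remaining arguments: changetrees.
selser_changetrees() {
  local name="$1" tree
  shift
  for tree in "$@"; do
    echo "=== changetree $tree ==="
    php bin/parserTests.php --selser --filter "$name" --changetree "$tree" \
      --dump dom:post-changes --no-blacklist
  done
}

# Example (placeholder changetrees):
#   selser_changetrees "Tabs don't trigger preformatted text" "[2,0,0]" "[0,2,0]"
```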

Using script

For debugging selser failures for a specific test with a specific edit, script is your friend. It takes the test name as the first argument and the edit tree as its second argument.

Running a roundtrip test and emitting roundtrip diffs (roundtrip-test.js)

You can use roundtrip-test.js to run a roundtrip test (converting wikitext to HTML and back) on a title from a wiki with a registered wiki prefix. roundtrip-test also supports trace, dump, and debug flags, which are passed through to the parser and the serializer.

$ node roundtrip-test.js --prefix enwiki "Medha Patkar"
.... verbose diff omitted here ....
Semantic differences : 0
Syntactic differences: 1
ALL differences      : 1
$ node roundtrip-test.js --prefix enwiki "Medha Patkar" --trace selser
["WTS:"," DOM ==> \n","<body data-parsoid
.... verbose diff omitted here ....

Other helpful scripts

fetch-wt.js is a useful script to fetch wikitext for a revision. This is useful when you want to debug Parsoid behavior on the commandline for a specific page.

[subbu@earth tests] fetch-wt.js --help
Usage: node ./fetch-wt.js [options] <page-title or rev-id>
If first argument is numeric, it is used as a rev id; otherwise it is
used as a title.  Use the --title option for a numeric title.

  --output  Write page to given file
  --prefix  Which wiki prefix to use; e.g. "enwiki" for English Wikipedia, "eswiki" for Spanish,
            "mediawikiwiki" for  [default: "en"]
  --revid   Page revision to fetch
  --title   Page title to fetch (only if revid is not present)
  --help    Show this message
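Fetching and parsing can be chained. This sketch uses only the --prefix and --output options from the help text above. fetch_and_parse is a hypothetical helper name, and it assumes that both fetch-wt.js and bin/parse.php are reachable from the current directory, so adjust the paths to your checkout.

```shell
# fetch_and_parse (hypothetical helper): fetch wikitext with fetch-wt.js into
# a temporary file, then parse it with parse.php.
# Assumes fetch-wt.js and bin/parse.php are both reachable from here.
#   $1: wiki prefix (e.g. enwiki)   $2: page title or rev id
fetch_and_parse() {
  local prefix="$1" page="$2" wtfile status
  wtfile=$(mktemp) || return 1
  node fetch-wt.js --prefix "$prefix" --output "$wtfile" "$page" &&
    php bin/parse.php < "$wtfile"
  status=$?
  rm -f "$wtfile"   # remove the fetched wikitext
  return "$status"
}

# Example:
#   fetch_and_parse enwiki "Medha Patkar"
```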

Generating PHP parser output (without Tidy) on snippets

Note that running the PHP parser without Tidy enabled has been deprecated and will shortly be removed. In addition, the instructions below are out of date, since the maintenance/parse.php script in core now tidies by default. If you really want to see no-tidy output, you need to pass the --no-tidy option to parse.php.

In the browser

This requires a MediaWiki install with Tidy disabled. The default MediaWiki installation comes without Tidy enabled.

Create a wikipage in your browser and save or look at preview.

View Source in the browser (Inspect Element in the DOM inspector will show you the DOM that your browser helpfully fixed up for you, but that is not what you want).

Via parserTests script on the commandline

Create a test file, in the parser tests format, containing the wikitext snippet you want to generate output for, but leave the result section empty (see the example below).

!! test
Sample test
!! input

!! result
!! end
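If you generate such files often, a small function can write them for you. make_parser_test is a hypothetical helper; it writes exactly the layout above, with your snippet in the input section and an empty result section.

```shell
# make_parser_test (hypothetical helper): write a one-test parser tests file
# with an empty result section.
#   $1: test name   $2: wikitext snippet   $3: output file
make_parser_test() {
  cat > "$3" <<EOF
!! test
$1
!! input
$2
!! result
!! end
EOF
}

# Example:
#   make_parser_test "Sample test" "''italic'' and [[link]]" /tmp/tst
```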

Now run this file through core's parserTests.php script as follows:

$ cd <mediawiki-core-install>
$ php tests/parserTests.php --file=/tmp/tst
This is MediaWiki version 1.23alpha (edac6c3).

Running test Sample test... FAILED!
--- /tmp/mwParser-1026460993-expected	2013-11-15 16:39:57.096375665 -0600
+++ /tmp/mwParser-1026460993-actual	2013-11-15 16:39:57.096375665 -0600
@@ -1 +1,7 @@

Passed 0 of 1 tests (0%)... 1 tests failed!

And you have the PHP parser's output without Tidy getting in your way.

Stepping through the ParsoidService.js (for Parsoid/JS)

Pass `-n 0` to avoid spawning workers with cluster.

npm install -g node-inspector
node --debug-brk api/server.js -n 0
node-inspector &

Dumping HTML DOM in VE

In Chrome or Firefox, you can interactively explore the HTML in the console by typing:

  • for the original HTML before edits, or
  • for the HTML after edits

To copy the HTML as a string, so that you can paste it into a file for further analysis or debugging, use:

  • copy( for the original HTML before edits, or
  • copy( for the HTML after edits.

Good test pages

Purging RESTBase

Use the regular parser cache purge URLs (https://wiki-base-url/wiki/$title?action=purge) to purge stored content from RESTBase as well. Monitor the ETags to determine when the purge took effect.
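Checking the ETag by hand gets tedious. The sketch below makes two assumptions: RESTBase is exposed at <wiki-base-url>/api/rest_v1/ (as on Wikimedia wikis), and the title is already URL-encoded (e.g. Main_Page). restbase_etag is a hypothetical helper name.

```shell
# restbase_etag (hypothetical helper): print the ETag RESTBase returns for a
# page's HTML. Assumes RESTBase lives at $1/api/rest_v1/ and $2 is an
# already URL-encoded title.
restbase_etag() {
  # -D - dumps response headers to stdout; the body is discarded.
  curl -s -D - -o /dev/null "$1/api/rest_v1/page/html/$2" |
    tr -d '\r' |
    sed -n 's/^[Ee][Tt][Aa][Gg]:[[:space:]]*//p'
}

# Example: compare the ETag before and after purging (assuming your wiki
# accepts a POST to action=purge).
#   restbase_etag https://wiki-base-url Main_Page
#   curl -s -X POST "https://wiki-base-url/wiki/Main_Page?action=purge" > /dev/null
#   restbase_etag https://wiki-base-url Main_Page
```

Once the printed value changes, RESTBase is serving the re-rendered content.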

See also