I think you're making some serious mistakes there. I have also been thinking about such a live editor and semantic autoformatting, and instead of starting to hack (OK, I did, and rewrote Preprocessor_DOM in JavaScript) I pondered a lot about parsing and editing. I would have loved to join the hackathon, but I had to study for my exams.
- First of all you're completely forgetting "inclusion zones" (<includeonly> etc.), which are even used in the article namespace (w:de:Wikipedia:WikiProjekt Begriffsklärungsseiten/FAQ#Wie funktioniert das nochmal mit dem Per-Vorlage-Einbinden und dem noinclude.2Fonlyinclude.3F) and on talk pages as a workaround for Extension:Labeled Section Transclusion (see the first sketch below this list).
- The problems described in the section #Constraints are more common than you seem to believe. Lots of things are based on the use of templates as and in attributes, e.g.
{| {{orangetable}} ...
and doing without them is unthinkable (see the second sketch below this list).
Templates are never complete documents; some are even designed to be table starters or new-liners. How would you parse a structure like {{#if:...| {{!-}} ...}}?
And some templates even need to be invalid on their own, because making them valid would make them much, much larger. Examples are de:Wikipedia:Formatvorlage Bahnstrecke#Beispielanwendung or de:Vorlage:Infobox Schiff/DokuOhneTyp#Beispiel.
- Another topic is the editing of template pages. A preview with test parameters (and a test environment) would be nice, as would dynamic nesting of templates etc. I can't see how to deal with such requests in the proposed DOM.
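To make the inclusion-zone point concrete, here is a minimal sketch of what the first parsing pass has to do with <includeonly> and <noinclude>. This is my own simplification, not the real Preprocessor_DOM code; <onlyinclude> is left out and {{ColoredBox}} is an invented template:

<syntaxhighlight lang="javascript">
// Minimal sketch of inclusion-zone handling (my own simplification, not
// the real Preprocessor_DOM logic; <onlyinclude> is ignored). When a page
// is transcluded, <noinclude> bodies are dropped and <includeonly> tags
// are unwrapped; when the page is rendered directly, it is the other way
// round.
function applyInclusionZones(wikitext, isTranscluded) {
    if (isTranscluded) {
        return wikitext
            .replace(/<noinclude>[\s\S]*?<\/noinclude>/g, '')
            .replace(/<\/?includeonly>/g, '');
    }
    return wikitext
        .replace(/<includeonly>[\s\S]*?<\/includeonly>/g, '')
        .replace(/<\/?noinclude>/g, '');
}

// The same source produces two different token streams, so an editor's
// document model cannot pretend those zones are ordinary text.
// ({{ColoredBox}} is an invented template used only for this demo.)
var src = 'Intro<noinclude> (documentation)</noinclude>' +
          '<includeonly>{{ColoredBox|orange}}</includeonly>';
console.log(applyInclusionZones(src, false)); // direct page view
console.log(applyInclusionZones(src, true));  // transcluded view
</syntaxhighlight>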
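And for the attribute and table-fragment point: the template bodies below are invented examples (though {{!-}} is commonly defined as a bare row separator), but they show why block structure can only be decided after expansion:

<syntaxhighlight lang="javascript">
// Sketch of why templates in attributes and table fragments defeat a
// top-down block model (the template bodies are invented examples).
var templates = {
    'orangetable': 'class="wikitable" style="background:orange;"',
    '!-': '|-'   // a row separator, invalid markup on its own
};

function expandTemplates(wikitext) {
    return wikitext.replace(/\{\{([^{}|]+)\}\}/g, function (m, name) {
        var body = templates[name.trim()];
        return body === undefined ? m : body;
    });
}

// Before expansion "{| {{orangetable}}" is not a complete table start, and
// whether {{#if:...|{{!-}}...}} adds a table row is unknown until the #if
// is evaluated. A model that froze the block structure earlier would have
// nowhere to put what the templates contribute.
console.log(expandTemplates('{| {{orangetable}}'));
console.log(expandTemplates('{{!-}}\n| colspan=2 | note'));
</syntaxhighlight>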
At first I also thought about a top-down document model, but I quickly came to the conclusion that this is only doable for very, very simple pages. An autoformatter that sees an unclosed table/div/whatever never knows what is hidden in the following templates. A live parser/autoformatter/semantic lexer has to use a bottom-up model, just like the current parser. The steps would be (a toy sketch follows the list):
- Getting the XML-like tag hooks, comments and inclusion handlers (what to do if they are malformed? Currently: they run to the end)
- Parsing headings, templates and template arguments
- Expanding the templates
- Parsing the wikitext into tables/blocks/images/whatever and doing the text annotations
- Tidying the generated HTML for output
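To make that ordering concrete, here is a deliberately tiny, runnable toy. It is mine and nothing like the real parser: it knows only comments, one hard-wired invented template ({{Hello}}), headings and bold text. The point is only the ordering, i.e. that block structure and annotations are parsed after expansion:

<syntaxhighlight lang="javascript">
// Toy sketch of the bottom-up pass ordering (my own, not MediaWiki code).
function stripCommentsAndTagHooks(text) {            // step 1
    return text.replace(/<!--[\s\S]*?-->/g, '');
}
function expandTemplates(text) {                      // steps 2 and 3
    // a single hard-wired, invented template so the toy runs on its own
    return text.replace(/\{\{Hello\}\}/g, "'''Hello'''");
}
function parseBlocksAndAnnotate(text) {               // step 4
    return text.split('\n').map(function (line) {
        var h = line.match(/^==\s*(.*?)\s*==$/);
        if (h) { return '<h2>' + h[1] + '</h2>'; }
        return '<p>' + line.replace(/'''(.+?)'''/g, '<b>$1</b>') + '</p>';
    }).join('\n');
}
function tidy(html) {                                 // step 5
    return html.replace(/<p><\/p>\n?/g, '');          // drop empty paragraphs
}

var page = "== Demo ==\n<!-- note -->{{Hello}}, this is '''bold''' text.";
console.log(tidy(parseBlocksAndAnnotate(expandTemplates(stripCommentsAndTagHooks(page)))));
// The bold markup coming out of {{Hello}} is only annotated because
// expansion ran before the block/annotation pass.
</syntaxhighlight>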
The current parser does the first two steps together; semantically they could be separated. I'm not sure about the fourth step, since I haven't dived into the source code yet, so maybe I'm writing nonsense about it.
My conclusion is that a semantic lexer has to start at the bottom, while an autoformatter or editing transaction needs to run back down from the top (the generated result). Anything else would narrow the syntax possibilities that are needed today.
Of course, I think it's right to have the document-block-annotatedText model as a data format for saving pages, with the possibility of rendering to HTML4, HTML5, PDF, RSS etc., for quickly generating cached content and, most of all, for creating diffs (a rough sketch follows). But for editing we will have to go deeper into wikitext, which will have to stay as uncomfortable as it is today, and templates should not be part of the DOM.
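Just to illustrate what I mean by that data format, here is a rough sketch of such a saved record and a block-level diff. Every field name is invented by me and is not a proposal for the actual format:

<syntaxhighlight lang="javascript">
// Invented sketch of a document-block-annotatedText record. The same
// structure could feed an HTML4/HTML5/PDF/RSS renderer, the parser cache,
// and a block-level diff.
var doc = {
    blocks: [
        { type: 'heading', level: 2,
          text: { content: 'Demo', annotations: [] } },
        { type: 'paragraph',
          text: { content: 'This is bold text.',
                  annotations: [ { type: 'bold', start: 8, end: 12 } ] } }
    ]
};

// Diffing becomes a comparison of serialised blocks instead of a line
// diff of raw wikitext.
function diffBlocks(a, b) {
    var changed = [];
    var n = Math.max(a.blocks.length, b.blocks.length);
    for (var i = 0; i < n; i++) {
        if (JSON.stringify(a.blocks[i]) !== JSON.stringify(b.blocks[i])) {
            changed.push(i);
        }
    }
    return changed;
}
</syntaxhighlight>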