Will code using these new hooks have access to the ParserOutput object? Many tags and parser functions use it to add tracking categories, page_props, ResourceLoader (RL) modules, and other metadata.
Topic on Talk:Parsoid/Extension API
Access to ParserOutput?
Hook methods (in ExtensionTagHandler and DOMProcessor) and/or ParsoidExtensionAPI need to expose the ParserOutput instance, since extensions currently rely on accessing it and updating its state.
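To make the requirement concrete, here is a minimal sketch of the kind of side effect extensions need. It is written in Python for illustration only (the real classes are PHP), and every name in it (`ParserOutputStub`, `ExtensionAPIStub`, `get_parser_output`, `my_tag_source_to_dom`, the category, property and module names) is a hypothetical stand-in, not the actual Parsoid or MediaWiki API. It assumes the extension API hands the handler the output object directly.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class ParserOutputStub:
    # Hypothetical stand-in for MediaWiki's PHP ParserOutput; only the
    # metadata mentioned in this thread is modelled.
    categories: dict = field(default_factory=dict)  # category -> sort key
    page_props: dict = field(default_factory=dict)  # property -> value
    modules: list = field(default_factory=list)     # ResourceLoader modules

    def add_category(self, name: str, sort_key: str = "") -> None:
        self.categories[name] = sort_key

    def set_page_property(self, name: str, value: str) -> None:
        self.page_props[name] = value

    def add_modules(self, names: list) -> None:
        # Keep insertion order and skip duplicates.
        for name in names:
            if name not in self.modules:
                self.modules.append(name)


class ExtensionAPIStub:
    # Hypothetical stand-in for ParsoidExtensionAPI exposing the output directly.
    def __init__(self, output: ParserOutputStub) -> None:
        self._output = output

    def get_parser_output(self) -> ParserOutputStub:
        return self._output


def my_tag_source_to_dom(ext_api: ExtensionAPIStub, src: str) -> str:
    # Hypothetical tag handler: returns escaped markup and, as a side effect,
    # records a tracking category, a page property and a module.
    out = ext_api.get_parser_output()
    out.add_category("Pages_using_mytag")
    out.set_page_property("mytag", "1")
    out.add_modules(["ext.myTag.styles"])
    return "<span>" + escape(src) + "</span>"


api = ExtensionAPIStub(ParserOutputStub())
markup = my_tag_source_to_dom(api, "hello")
```

The markup is only half of the handler's result; the category, page property and module recorded on the output object are what the question is about, and without some access path to that object the handler has nowhere to put them.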
Good question. The current draft doesn't. But, we could potentially expose both ParserOptions and ParserOutput objects via ParsoidExtensionAPI. @CAnanian (WMF) is working on refactoring the core classes, so the specifics of which methods and properties those classes export might change between now and then. But that is a detail we can ignore here.
Alternatively, we could proxy the desired functionality through ParsoidExtensionAPI object.
I am not yet certain whether proxying or direct exposure of the ParserOptions and ParserOutput classes is better. Thoughts? For example, with the Sanitizer object, we started off with proxying and are now leaning towards directly exposing the Sanitizer class's API.
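The two options can be sketched side by side. This is a hypothetical Python illustration, not Parsoid code: `ParserOutputStub`, `DirectExtensionAPI`, `ProxyingExtensionAPI` and their methods are invented names, and the stub models only two of ParserOutput's many operations.

```python
class ParserOutputStub:
    # Hypothetical stand-in for ParserOutput with two of its many operations.
    def __init__(self) -> None:
        self.categories = {}
        self.cache_expiry = 86400  # seconds

    def add_category(self, name: str, sort_key: str = "") -> None:
        self.categories[name] = sort_key

    def update_cache_expiry(self, seconds: int) -> None:
        # Only ever shorten the expiry.
        self.cache_expiry = min(self.cache_expiry, seconds)


class DirectExtensionAPI:
    # Option 1, direct exposure: extensions receive the whole object, so every
    # public method becomes part of the interface that must stay compatible.
    def __init__(self, output: ParserOutputStub) -> None:
        self._output = output

    def get_parser_output(self) -> ParserOutputStub:
        return self._output


class ProxyingExtensionAPI:
    # Option 2, proxying: only explicitly forwarded operations are reachable,
    # so the wrapped class can change behind them. (The leading underscore is
    # only a Python convention; in PHP this would be a private property.)
    def __init__(self, output: ParserOutputStub) -> None:
        self._output = output

    def add_category(self, name: str, sort_key: str = "") -> None:
        self._output.add_category(name, sort_key)
```

The trade-off is visible in the shape of the code: the direct accessor is one method but commits to the whole class, while the proxy commits only to what it forwards, at the cost of hand-writing (and maintaining) one forwarding method per supported operation.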
For ParserOutput I personally lean towards direct exposure, because as far as I remember, it's already pretty tailored to things that make sense to access or store while parsing and that are cacheable. If there are things in there that don't make sense in a Parsoid world (and are more specific to the wikitext parser world), then maybe that's an issue, but if not, then I think you're probably better off not reinventing the numerous wheels that ParserOutput has acquired over time.
My initial thought here is to proxy the desired functionality, since (a) ParserOptions and ParserOutput may expose more configuration than extensions can usefully use directly, (b) depending on what they expose, they may give extensions backdoor access to parsing functionality, and (c) direct exposure expands the compatibility interface that we will have to maintain and cannot refactor or modify freely.
But, depending on what the ParserOutput refactor yields, it might be possible to make it narrow or abstract enough to avoid these pitfalls.
Heh! "edit conflict" :-) Yes, depending on the specific details of what the interface exposes, direct exposure can be better. We'll review code and update. Thanks for flagging this gap.
I just looked at that class, and it has 100-odd methods and other public properties. So, that spells the end of any proxying desires. Narrowing the interface any further is probably best done at a future date. But, for now, it does make sense to expose the ParserOutput class.