User:OrenBochman/ParserNG/antlr
The preprocessor is tricky business to code.
- It has recursive rules a complicated (for parsing) format.
- double curly expressions can have different semantics
- triple curlies too
- single curlies also on occasion.
next follow some advanced Antlr specific features illustrated by things they could help to fix.
Recursive Lexer / Parser Rules
editWhen creating a recursive expresions definition, the lowest nesting level is the lowest precedence of the operator matched in that rule.
Asociativity
editexpr: mult ('+' mult)* ; // left-associative via (...)* mult: pow ('*' pow)* ; pow : atom ('^' pow)? ; // right-associative via tail recursion atom: ID| INT ;
In The PreProcessor
editthus:
grammar:PreProcessor; members { String myName; boolean isTemplate(String name){...} String getTemplate(String name, List<String> args){...} boolean isTemplateArg(String name){...} String getTemplateArg(String name){...} boolean isMagicWord(String name){...} String getMagicWord(String name, List<String> args){...} boolean isParserFunction(String name){...} String getParserFunction(String name, List<String> args){...} } wikiExpr: curlyExpr; curlyExpr: parserFunction| magicWord | templateVar | template ; angleExpr: ext | nowiki | noInclude | includeOnly | comment | pre ; template: {isTemplate(input.LT(2).text)}?=>'{{' NAME TEMPLATE_ARGS | curlyExpr | /* empty */ '}}'; // right-associative via tail recursion templateVar: {isTemplate(input.LT(2).text)}?=>'{{{' NAME | /* empty */ '}}}'; // right-associative via tail recursion parserFunction : {isParserFunction(input.LT(2).text)}?=>'{{' NAME PF_ARRGS '}}'; //curly atom magicWord : {isMagicWord(input.LT(2).text)}?=>'{{' NAME MW_ARGS '}}'; //curly atom //lexer rules PF_ARRGS: ... ; MW_ARGS: ... ; TEMPLATE_ARGS: ... ;
Semantic and Syntactic Predicates
edittwo advanced ideas are
syntactic predicates
editparts of a template look like:
parts:
title=LITERAL (PIPE (args=LITERAL)*)*;
semantic predicates
editUsing Semantics To Simplify Syntax Parsing
editdoes {{LITERAL| ... }} reffer to a parser function, magic word or a template ?
since the parser should be privy to a symbol table of parser_functions, magic_words, templates, variables, etc they can be used to simplify (read: speed up parsing using a simple semantic check (isMagicWord) instead of a deep syntax scan.) in Antlr speak lookahead is said to be analogous to checking all options of the maze, using predicates is called sending in a trained monkiess to stand in the required junction points.
IncludeOnly and NoInclude semantics
edit- IncludeOnly means include the following text only within the scope of the tansculded page (expanded page).
- NoInclude means include the following text only within the scope of the template page (expanded page).
//todo prototype and paste here :-)
Look Ahead
editThis is an example of using look ahead in the parser. There is also lookahead in the lexer. But it does not return string only ints (of chars).
Implementing some parser functions
editif switch etc are the where parsers shine. the parser rules below use sysntactic predicates:
stat: keyIF expr stat
| keyCALL ID ';'
| ';'
;
/** An ID whose text is "#if" */
keyIF : {input.LT(1).getText().equals("#if")}? ID ;
/** An ID whose text is "#switch" */
keyCALL : {input.LT(1).getText().equals("#switch")}? ID ;
Implementing some Magic Words
editMagic words can be trickier to take care of since. one idea is to keep them in a data structure like a hash or a trie.
@members{
public boolean isMagic(String token){
return (magic.isKey(token))); // a hash of magic words token (and thier functor objects)
}
}
keyCALL : {isMagic(input.LT(1).getText())}? ID ;