User:OrenBochman/ParserNG/Transliterator Antlr

Transliterator Filter Antlr

"To make ANTLR generate lexers that behave like the UNIX utility sed (copy standard in to standard out except as specified by the replace patterns), use a filter rule that does the input to output copying:" - antlr docs [1]

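class cfgSed extends Lexer;
options {
  k=2;
  filter=IGNORE;
  // ASCII only, as in the ANTLR docs; widen (e.g. '\u0003'..'\uFFFE') for Unicode input
  charVocabulary = '\3'..'\177';
}

{
  // if a dictionary is needed; loadDictionary() must be supplied by the author
  java.util.Map<String,String> dictionary = loadDictionary();
}

// example of Unicode-to-Unicode conversion; X and Y are placeholders for the range bounds
protected
ALPHA1 : '\u000X'..'\u000Y';

KENJI  : src:ALPHA1
        { System.out.print(dictionary.get(src.getText())); } // filter output
        ;

// everything not matched by another rule is copied to the output unchanged
protected
IGNORE
  :  ( "\r\n" | '\r' | '\n' )
     {newline(); System.out.println("");}
  |  c:. {System.out.print(c);}
  ;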

based on [2]

The idea is to use a dictionary (map) or a conversion function to replace each character of the detected character set.
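The replacement step can be sketched in plain Java, independent of ANTLR. This is only an illustration under assumptions: the class name DictionaryTransliterator, the method transliterate, and the sample Cyrillic-to-Latin entries in the usage note are hypothetical and do not come from ParserNG. The dictionary is consulted first, and the conversion function handles every code point the dictionary does not list.

```java
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Hypothetical helper: dictionary lookup with a conversion-function fallback.
final class DictionaryTransliterator {

    // Replaces each code point found in the dictionary; any other code point
    // is passed through the conversion function (use cp -> cp to copy it).
    static String transliterate(String input,
                                Map<Integer, String> dictionary,
                                IntUnaryOperator convert) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);          // step over surrogate pairs correctly
            String mapped = dictionary.get(cp);
            if (mapped != null) {
                out.append(mapped);                // dictionary hit
            } else {
                out.appendCodePoint(convert.applyAsInt(cp)); // fallback conversion
            }
        }
        return out.toString();
    }
}
```

With a dictionary mapping U+0434 to "d" and U+0430 to "a", and the identity function as fallback, the input "\u0434\u0430!" comes out as "da!".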

Usage

This filter can be:

  • Integrated into the lexer (one scan would be fastest).
  • Run as a separate step (modular and easily configurable, but slower).
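The second option, a separate pass, might look like the following sketch. It copies a character stream from input to output in the sed-like manner of the filter rule and replaces only the characters listed in a dictionary. The class TransliterationPass and its methods are assumptions made for illustration; a real pipeline would hand the resulting stream to the main lexer afterwards.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;

// Hypothetical stand-alone pass that runs before the main lexer.
final class TransliterationPass {

    // Copies in to out, replacing only the characters present in the dictionary.
    static void copy(Reader in, Writer out, Map<Character, String> dictionary)
            throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            String replacement = dictionary.get((char) c);
            if (replacement != null) {
                out.write(replacement);   // transliterated character
            } else {
                out.write(c);             // everything else is copied unchanged
            }
        }
        out.flush();
    }

    // Convenience wrapper for in-memory strings.
    static String run(String input, Map<Character, String> dictionary) {
        StringWriter out = new StringWriter();
        try {
            copy(new StringReader(input), out, dictionary);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Because the pass only needs a Reader and a Writer, the dictionary can be swapped without regenerating the lexer, which is the configurability the separate-step option trades speed for.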

Issues

  • Works best if the transliteration is a fixed code-point offset or a call to an outside module.
  • A dictionary combined with lookahead provides the most transliteration power.
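Both points can be illustrated with a small Java sketch. The names below (LookaheadTransliterator, hiraganaToKatakana, withLookahead) are hypothetical. The offset case relies on the fact that hiragana U+3041..U+3096 and the corresponding katakana differ by 0x60. The lookahead case tries the longest dictionary key first, up to k characters, which plays the role of the lexer's k option.

```java
import java.util.Map;

// Hypothetical helper contrasting offset-based and dictionary-with-lookahead conversion.
final class LookaheadTransliterator {

    // Offset conversion: hiragana U+3041..U+3096 maps to katakana by adding 0x60.
    static String hiraganaToKatakana(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            out.append(c >= '\u3041' && c <= '\u3096' ? (char) (c + 0x60) : c);
        }
        return out.toString();
    }

    // Dictionary conversion with up to k characters of lookahead:
    // the longest matching key wins, unmatched characters are copied.
    static String withLookahead(String input, Map<String, String> dictionary, int k) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            boolean matched = false;
            for (int len = Math.min(k, input.length() - i); len >= 1; len--) {
                String replacement = dictionary.get(input.substring(i, i + len));
                if (replacement != null) {
                    out.append(replacement);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.append(input.charAt(i++));
            }
        }
        return out.toString();
    }
}
```

With keys "sh", "s" and "a" mapped to U+0448, U+0441 and U+0430, the input "sasha" with k = 2 yields "\u0441\u0430\u0448\u0430". Without lookahead the "sh" pair would be split into two separate characters.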