User:OrenBochman/ParserNG/Transliterator Antlr

Transliterator Filter Antlr

"To make ANTLR generate lexers that behave like the UNIX utility sed (copy standard in to standard out except as specified by the replace patterns), use a filter rule that does the input to output copying:" (ANTLR docs [1])

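header {
  import java.util.Map;
}

class cfgSed extends Lexer;
options {
  k = 2;                          // lookahead needed to tell "\r\n" from '\r'
  filter = IGNORE;                // unmatched input is handed to the IGNORE rule
  charVocabulary = '\3'..'\177';  // widen this if the source characters lie outside ASCII
}

{
  // if a dictionary is needed
  Map<String,String> dictionary = loadDictionary();
}

// example of Unicode-to-Unicode conversion;
// '\u000X' and '\u000Y' stand for the first and last character of the source range
protected
ALPHA1 : '\u000X'..'\u000Y' ;

KENJI  : src:ALPHA1
         { System.out.print(dictionary.get(src.getText())); } // filter output
       ;

// copy everything else to the output unchanged
protected
IGNORE
  :  ( "\r\n" | '\r' | '\n' )
     {newline(); System.out.println("");}
  |  c:. {System.out.print(c);}
  ;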

Based on [2].

The idea is to use a dictionary (map) or a conversion function to replace each character that belongs to the detected character set.
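The same idea can be sketched in plain Java, outside ANTLR. This is only an illustration: the class name DictionaryTransliterator and the two-entry Cyrillic table are assumptions made for the example, standing in for whatever loadDictionary() would return. Mapped characters are replaced and everything else is copied through, as in the sed-like filter.

```java
import java.util.HashMap;
import java.util.Map;

public class DictionaryTransliterator {
    // Hypothetical mapping table; a real one would come from loadDictionary().
    private final Map<Character, String> dictionary;

    public DictionaryTransliterator(Map<Character, String> dictionary) {
        this.dictionary = dictionary;
    }

    /** Replaces every mapped character and copies all others unchanged. */
    public String transliterate(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = dictionary.get(c);
            if (replacement != null) {
                out.append(replacement); // character is in the dictionary
            } else {
                out.append(c);           // filter behaviour: copy through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Character, String> table = new HashMap<>();
        table.put('\u0430', "a"); // Cyrillic small a (U+0430)
        table.put('\u0431', "b"); // Cyrillic small be (U+0431)
        DictionaryTransliterator t = new DictionaryTransliterator(table);
        System.out.println(t.transliterate("\u0430\u0431 ok")); // prints "ab ok"
    }
}
```

A conversion function could replace the map lookup without changing the surrounding loop.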


This filter can be:

  • Integrated into the lexer (fastest, since the input is scanned only once).
  • Run as a separate step (modular and easily configurable, but slower).


  • Works best if the transliteration is a fixed offset or a call to an outside module.
  • A dictionary combined with lookahead provides the most transliteration power.
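The two notes above can be illustrated with a small Java sketch. All names here (LookaheadTransliterator, shift, transliterate, maxKeyLength) are hypothetical and not part of ANTLR. The offset case uses hiragana to katakana, where the two Unicode blocks are 0x60 code points apart. The dictionary case uses lookahead by trying the longest key first, so a multi-character key such as "sh" wins over "s".

```java
import java.util.HashMap;
import java.util.Map;

public class LookaheadTransliterator {
    /** Offset conversion: shifts characters in [first, last] by a fixed delta. */
    public static char shift(char c, char first, char last, int delta) {
        return (c >= first && c <= last) ? (char) (c + delta) : c;
    }

    /** Dictionary conversion with lookahead: the longest matching key wins. */
    public static String transliterate(String input,
                                       Map<String, String> dictionary,
                                       int maxKeyLength) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            boolean matched = false;
            int longest = Math.min(maxKeyLength, input.length() - i);
            // Look ahead up to maxKeyLength characters, longest candidate first.
            for (int len = longest; len >= 1; len--) {
                String replacement = dictionary.get(input.substring(i, i + len));
                if (replacement != null) {
                    out.append(replacement);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.append(input.charAt(i)); // no entry applies: copy through
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hiragana a (U+3042) shifted by 0x60 gives katakana a (U+30A2).
        char katakana = shift('\u3042', '\u3041', '\u3096', 0x60);
        System.out.println(Integer.toHexString(katakana)); // prints "30a2"

        // Latin to Cyrillic: "sh" must be tried before "s".
        Map<String, String> d = new HashMap<>();
        d.put("sh", "\u0448");
        d.put("s", "\u0441");
        d.put("h", "\u0445");
        System.out.println(transliterate("shs", d, 2));
    }
}
```

The offset version needs no table at all, which is why it is cheap enough to run inside the lexer; the dictionary version costs up to maxKeyLength lookups per position.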