ANTLR4: How to exit a grammar rule?
So I'm experimenting with ANTLR v4, and I'm poking at an unusual grammar to get a sense of how it works. Here's my current test case:
I'd like a grammar that consists of the letters a, b, c, and d, in order. The letters may be repeated. I group the a's and b's together, and the c's and d's also, just to make the grammar more interesting. So strings like these are acceptable to the grammar:
aaa
abcd
acccdd
But it's not going well. I think what is happening is that ANTLR needs a better way to exit a grammar rule. It doesn't seem to recognize that, after collecting the a's and b's, the presence of a c means it should go on to the next rule. It's sort of working, but I get error messages, and the resulting parse tree seems to have null elements in it, as if an element were inserted wherever an error message was issued.
Here's an example error message:
line 1:2 extraneous input 'c' expecting {'b', 'a'}
which happens for the input 'abcd'. Something weird is going on when ANTLR sees the c there. Here's the output of the parse tree:
'abcd': (prog (aorb (a a) (aorb (b b) aorb)) (cord (c c) (cord (d d) cord)) <EOF>)
which you can see has an empty aorb element at the end of the first set of elements.
Any idea what's going on? What is ANTLR "thinking" here when it issues the error and adds the empty element? And how might I fix this?
OK, here are the gory details.
My grammar:
    grammar abcd;

    prog : aorb cord EOF ;

    aorb : ( a | b ) aorb ;
    a    : 'a'+ ;
    b    : 'b'+ ;

    cord : ( c | d ) cord ;
    c    : 'c'+ ;
    d    : 'd'+ ;
My test program in Java:

    package antlrtests;

    import antlrtests.grammars.*;
    import org.antlr.v4.runtime.*;
    import org.antlr.v4.runtime.tree.*;

    class AbcdTest {
        // Inputs to parse; "c" and "d" alone exercise an empty aorb group.
        private final String[] testVectors = {
            "a", "aabb", "b", "abcd", "c", "d",
        };

        public void runTests() {
            for (String test : testVectors)
                simpleTest(test);
        }

        // Lex and parse one string, then print its parse tree in LISP form.
        private void simpleTest(String test) {
            ANTLRInputStream ains = new ANTLRInputStream(test);
            // abcdLexer and abcdParser are generated from "grammar abcd;".
            abcdLexer wpl = new abcdLexer(ains);
            CommonTokenStream tokens = new CommonTokenStream(wpl);
            abcdParser wikiParser = new abcdParser(tokens);
            ParseTree parseTree = wikiParser.prog();
            System.out.println("'" + test + "': " + parseTree.toStringTree(wikiParser));
        }
    }
And the output of the test program. Note that the error messages are jumbled with the regular output because ANTLR prints them on standard error.
    run:
    line 1:1 no viable alternative at input '<EOF>'
    'a': (prog (aorb (a a) aorb) cord <EOF>)
    line 1:4 no viable alternative at input '<EOF>'
    'aabb': (prog (aorb (a a a) (aorb (b b b) aorb)) cord <EOF>)
    'b': (prog (aorb (b b) aorb) cord <EOF>)
    line 1:1 no viable alternative at input '<EOF>'
    line 1:2 extraneous input 'c' expecting {'b', 'a'}
    line 1:4 no viable alternative at input '<EOF>'
    'abcd': (prog (aorb (a a) (aorb (b b) aorb)) (cord (c c) (cord (d d) cord)) <EOF>)
    line 1:0 no viable alternative at input 'c'
    line 1:1 no viable alternative at input '<EOF>'
    line 1:0 no viable alternative at input 'd'
    'c': (prog aorb (cord (c c) cord) <EOF>)
    line 1:1 no viable alternative at input '<EOF>'
    'd': (prog aorb (cord (d d) cord) <EOF>)
    BUILD SUCCESSFUL (total time: 0 seconds)
Any help is appreciated.
Is this not what you're after?
    prog : 'a'* 'b'* 'c'* 'd'* EOF ;
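As a quick sanity check of which strings that one-line rule accepts, here is a minimal sketch that uses a plain Java regular expression in place of the grammar. The class name FlatAbcdCheck and the method accepts are made up for illustration; this is not ANTLR code, only the same language written as a regex. Note that it also accepts the empty string.

```java
// Illustration only: a regex counterpart of
//     prog : 'a'* 'b'* 'c'* 'd'* EOF ;
// FlatAbcdCheck and accepts() are hypothetical names, not part of ANTLR.
class FlatAbcdCheck {
    // String.matches() requires the whole string to match, which plays
    // the role of the trailing EOF in the grammar rule.
    static boolean accepts(String s) {
        return s.matches("a*b*c*d*");
    }

    public static void main(String[] args) {
        String[] samples = { "aaa", "abcd", "acccdd", "ba", "abca" };
        for (String s : samples) {
            System.out.println("'" + s + "' accepted: " + accepts(s));
        }
    }
}
```

The three strings from the question are accepted, while out-of-order input such as "ba" is rejected.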
The following rule of your grammar matches an infinitely long series of a and b tokens, because the tail-recursive aorb reference is not optional. Your grammar will either throw a StackOverflowError if the input starts with sufficiently many a and/or b characters, or encounter a syntax error if not.
    aorb : ( a | b ) aorb ;
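To see why that rule has no way out, here is a minimal hand-written sketch of the same recursion in plain Java. It is not the code ANTLR generates, and the names NonTerminatingRuleSketch, run, and recognizes are invented for illustration. The point is only that every call to aorb() demands another aorb() afterwards, so a finite input can never satisfy it.

```java
// Hand-written illustration (not ANTLR output) of the rule
//     aorb : ( a | b ) aorb ;
// All names here are hypothetical.
class NonTerminatingRuleSketch {
    private final String input;
    private int pos = 0;

    NonTerminatingRuleSketch(String input) {
        this.input = input;
    }

    // Mirrors  a : 'a'+ ;  and  b : 'b'+ ;
    // Consumes a run of one letter; returns false if no letter was consumed.
    private boolean run(char letter) {
        int start = pos;
        while (pos < input.length() && input.charAt(pos) == letter) {
            pos++;
        }
        return pos > start;
    }

    // Mirrors  aorb : ( a | b ) aorb ;
    boolean aorb() {
        if (!run('a') && !run('b')) {
            // No 'a' or 'b' here, and the rule has no empty alternative.
            return false;
        }
        // The recursive reference is mandatory, so the rule cannot stop here.
        return aorb();
    }

    static boolean recognizes(String s) {
        return new NonTerminatingRuleSketch(s).aorb();
    }

    public static void main(String[] args) {
        String[] samples = { "a", "aabb", "abcd" };
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + recognizes(s));
        }
    }
}
```

Every sample fails: once the a's and b's run out, the last recursive call finds nothing to match. That lines up with the "no viable alternative" errors and the empty aorb node in the question's parse trees.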
If you want to maintain the groupings, use this grammar instead. I only made changes to the aorb and cord rules. Since the a rule matches a whole sequence of 'a' tokens, the aorb rule uses a? instead of a* (only one instance of a will ever appear, and the entire series of 'a' tokens will be its children).
    grammar abcd;

    prog : aorb cord EOF ;

    aorb : a? b? ;
    a    : 'a'+ ;
    b    : 'b'+ ;

    cord : c? d? ;
    c    : 'c'+ ;
    d    : 'd'+ ;
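For comparison, the grouping that this grammar produces can be sketched by hand in plain Java. This is an illustrative recognizer, not ANTLR-generated code. The names GroupedAbcdSketch, run, pair, and parse are assumptions of the sketch, and the printed tree only loosely imitates ANTLR's toStringTree format (an empty group prints as "(cord)" here).

```java
// Hand-written illustration (not ANTLR output) of
//     prog : aorb cord EOF ;   aorb : a? b? ;   cord : c? d? ;
// All names here are hypothetical.
class GroupedAbcdSketch {
    private final String input;
    private int pos = 0;

    private GroupedAbcdSketch(String input) {
        this.input = input;
    }

    // Mirrors rules such as  a : 'a'+ ;
    // Returns " (a a a)" for a run of letters, or "" when the run is absent,
    // which is how the optional a? / b? / c? / d? references behave.
    private String run(char letter) {
        StringBuilder tokens = new StringBuilder();
        while (pos < input.length() && input.charAt(pos) == letter) {
            tokens.append(' ').append(letter);
            pos++;
        }
        return tokens.length() == 0 ? "" : " (" + letter + tokens + ")";
    }

    // Mirrors  aorb : a? b? ;  and  cord : c? d? ;
    private String pair(String name, char first, char second) {
        return "(" + name + run(first) + run(second) + ")";
    }

    // Mirrors  prog : aorb cord EOF ;  returns null if input is left over.
    static String parse(String s) {
        GroupedAbcdSketch p = new GroupedAbcdSketch(s);
        String tree = "(prog " + p.pair("aorb", 'a', 'b')
                + " " + p.pair("cord", 'c', 'd');
        return p.pos == s.length() ? tree + " <EOF>)" : null;
    }

    public static void main(String[] args) {
        String[] samples = { "aaa", "abcd", "acccdd", "ba" };
        for (String s : samples) {
            System.out.println("'" + s + "': " + parse(s));
        }
    }
}
```

Each rule is entered exactly once and exits as soon as its letters run out, so there is no recursion to terminate and no empty trailing node.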
Here is another grammar that matches the same language (but produces a different parse tree), showing other options for the *, +, and ? quantifiers. I wouldn't recommend using this grammar, but you should look it over to understand what each choice is doing and why it matches the same input as the grammar I gave above.
    grammar abcd;

    prog : aorb cord? EOF ;

    aorb : a* b ;
    a    : 'a' ;
    b    : 'b'* ;

    cord : ( c d* | d+ ) ;
    c    : 'c'+ ;
    d    : 'd' ;