ANTLRWorks, whitespaces, memory leaks, and crashing
I wanted to try a tool, ANTLR, in order to parse some code and refactor it. I tried some small grammars and everything was OK, so I took the next step and started parsing a sort of simplified C#.
Good news: it takes about 10 minutes to understand the basics.
Extremely bad news: it takes hours to understand how to parse two spaces instead of one. Really. This thing hates whitespace and has no shame in telling you so. I was starting to think it was unable to parse it at all, but then I went the right way... or at least I thought so.
Now the problem of spaces comes second to the fact that ANTLRWorks tries to allocate half a GB of RAM and cannot parse anything.
The grammar is not hard, given that I am a beginner:
grammar newemptycombinedgrammar;

tokenendcmd : ';' ;
tokenglobimport : 'import' ;
tokenglobnamespace : 'namespace' ;
tokenclass : 'class' ;
tokensepfloat : ',' ;
tokensepnamespace : '.' ;
fragment tokenemptystring : '' ;
tokenunderscore : '_' ;
tokenargssep : ',' ;
tokenargsopen : '(' ;
tokenargsclose : ')' ;
tokenblockopen : '{' ;
tokenblockclose : '}' ;

// --------------------
digit : [0-9] ;
numberint : digit+ ;
numberfloat : numberint tokensepfloat numberint ;
wordci : [a-zA-Z]+ ;
wordup : [A-Z]+ ;
wordlw : [a-z]+ ;

// -----------------
keyword : (wordci | tokenunderscore+) (numberint | wordci | tokenunderscore)* ;

// ---------------------
spaces : (' ' | '\t')+ ;
spacelns : (' ' | '\t' | '\r' | '\n')+ ;
spacesopt : spaces* ;
spacelnsopt : spacelns* ;

// ---------------------
// e.g. "system" or "system.net.socket"
namepacenamecomposited : keyword (tokensepnamespace keyword)* ;

// import system; import system.io;
globimport : tokenglobimport spaces namepacenamecomposited spacesopt tokenendcmd ;

// class class1 {}
namespaceclass : tokenclass spaces keyword spacelnsopt tokenblockopen spacelnsopt tokenblockclose ;

// "namespace ns1 {}", "namespace ns1.sns2{}"
globnamespace : tokenglobnamespace spaces namepacenamecomposited spacelnsopt tokenblockopen spacelnsopt namespaceclass spacelnsopt tokenblockclose ;

globfile : (globimport | spacelnsopt)* (globnamespace | spacelnsopt)* ;
But still, as soon as globfile or globnamespace is added, the IDE starts allocating memory like there's no tomorrow, and that's a problem.
So:
- Is this way of capturing whitespace right? (I don't want to skip it, that's the point.)
- Is the memory leak caused by some recursion I don't see?
The code this thing should be able to parse looks like:
import system; namespace anamespace{ class aclass{ } }
globfile is the main rule, by the way.
You should define a lexer token that treats whitespace the way you need it to. Lexer rule names must start with an uppercase letter, so the token is called Whitespace here. If you want a group of consecutive space or tab characters to form a single token, use a definition like the following. In this case, reference whitespace in parser rules as Whitespace (required) or Whitespace? (optional).
// ANTLR 3:
Whitespace : (' ' | '\t')+;
// ANTLR 4:
Whitespace : [ \t]+;
If you want every individual whitespace character to be its own token, use the following instead. In this case, reference whitespace in parser rules as Whitespace+ (required) or Whitespace* (optional).
// ANTLR 3:
Whitespace : ' ' | '\t';
// ANTLR 4:
Whitespace : [ \t];
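To make the first option concrete, here is a minimal ANTLR 4 sketch of the question's grammar rebuilt around a single whitespace token. It is an illustration under assumptions, not the asker's grammar. The grammar name and every rule name other than Whitespace are hypothetical. Widening Whitespace to include line breaks is also an assumption, made so that one token can stand in for both spaces and spacelns. Because the lexer matches each run of whitespace greedily as one token, the parser only ever needs Whitespace or Whitespace?, and no loop is wrapped around something that can match nothing.

```antlr
grammar WhitespaceSketch;   // hypothetical name

// Parser rules start with a lowercase letter.
file          : Whitespace? (importDecl Whitespace?)* (namespaceDecl Whitespace?)* EOF ;
importDecl    : KwImport Whitespace qualifiedName Whitespace? Semi ;
namespaceDecl : KwNamespace Whitespace qualifiedName Whitespace?
                LBrace Whitespace? (classDecl Whitespace?)* RBrace ;
classDecl     : KwClass Whitespace Identifier Whitespace? LBrace Whitespace? RBrace ;
qualifiedName : Identifier (Dot Identifier)* ;

// Lexer rules start with an uppercase letter.
// Keywords are listed before Identifier so that they win when both match.
KwImport    : 'import' ;
KwNamespace : 'namespace' ;
KwClass     : 'class' ;
Semi        : ';' ;
Dot         : '.' ;
LBrace      : '{' ;
RBrace      : '}' ;
Identifier  : [a-zA-Z_] [a-zA-Z0-9_]* ;

// Assumption: line breaks are folded into the same token as spaces and tabs.
Whitespace  : [ \t\r\n]+ ;
```

With file as the start rule, this sketch is meant to accept the sample input from the question. The whitespace is kept in the token stream rather than skipped, which was the stated requirement.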
The question about memory leaks belongs on the ANTLRWorks issue tracker.
- ANTLRWorks 1 issue tracker: https://github.com/antlr/antlrworks/issues
- ANTLRWorks 2 issue tracker: https://bitbucket.org/sharwell/antlrworks2/issues