ANTLR: how to debug a misidentified token

I am trying to implement a grammar in Antlr4 for a simple template engine. This engine consists of 3 different clauses:

IF ANSWERED ( variable )

END IF

Variable

A variable can be any sequence of upper- or lowercase letters, and may include whitespace. Both IF ANSWERED and END IF are always uppercase.

I have written the following grammar/lexer rules so far, but my problem is that IF ANSWERED keeps getting recognized as a Variable and not as 2 tokens IF and ANSWERED.

grammar program;

/**grammar */
command: (ifStart | ifEnd | VARIABLE ) EOF;

ifStart: IF ANSWERED '(' VARIABLE ')';

ifEnd: 'END IF';

/** lexer */

IF: 'IF';
ANSWERED: 'ANSWERED';

TEXT: (LOWERCASE | UPPERCASE | NUMBER) ;
VARIABLE: (TEXT | [ \t\r\n])+;

fragment LOWERCASE: [a-z];
fragment UPPERCASE: [A-Z];
fragment NUMBER: [0-9];

If I try to parse IF ANSWERED(Execution date) I get the following output:

[@0,0:10='IF ANSWERED',<VARIABLE>,1:0]
[@1,11:11='(',<'('>,1:11]
[@2,12:25='Execution date',<VARIABLE>,1:12]
[@3,26:26=')',<')'>,1:26]
[@4,27:26='<EOF>',<EOF>,1:27]
line 1:0 mismatched input 'IF ANSWERED' expecting 'IF'

I read that Antlr4 is greedy and tries to match the biggest possible token, but I fail to understand what is the correct approach, or how to think through the problem to find a solution.

Solution:

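Correct: ANTLR’s lexer is greedy, and tries to consume as much input as possible. When several lexer rules match at the same position, the rule that matches the most characters wins, and only on a tie does the rule defined first win. In your grammar VARIABLE can match all 11 characters of IF ANSWERED while IF matches only 2, so the input is tokenised as a single VARIABLE token instead of 2 separate keywords. You’ll need to keep whitespace out of the token that matches names, so that it can never swallow a keyword together with the space that follows it.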

Something like this could get you started:

parse
 : command* EOF
 ;

command
 : (ifStatement | variable)+
 ;

ifStatement
 : IF ANSWERED '(' variable ')' command* END IF
 ;

variable
 : TEXT
 ;

IF       : 'IF';
END      : 'END';
ANSWERED : 'ANSWERED';
TEXT     : [a-zA-Z0-9]+;
SPACES   : [ \t\r\n]+ -> skip;
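
To make the longest-match behaviour concrete, here is a minimal Python sketch that imitates it with regular expressions. It is not ANTLR and not how ANTLR is implemented; `tokenize`, `ORIGINAL_RULES`, `FIXED_RULES`, and the token names `LPAREN` and `RPAREN` are hypothetical names introduced only for this illustration, and the regexes are hand-translated approximations of the lexer rules from the question and from the answer above.

```python
import re

def tokenize(text, rules):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            match = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if match and match.end() - pos > best_len:
                best_name, best_len = name, match.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name != "SPACES":  # imitate '-> skip'
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Approximation of the question's lexer: VARIABLE may contain whitespace.
ORIGINAL_RULES = [
    ("LPAREN", r"\("),            # hypothetical name for the literal '('
    ("RPAREN", r"\)"),            # hypothetical name for the literal ')'
    ("IF", r"IF"),
    ("ANSWERED", r"ANSWERED"),
    ("TEXT", r"[a-zA-Z0-9]"),
    ("VARIABLE", r"[a-zA-Z0-9 \t\r\n]+"),
]

# Approximation of the answer's lexer: whitespace is a separate, skipped token.
FIXED_RULES = [
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("IF", r"IF"),
    ("END", r"END"),
    ("ANSWERED", r"ANSWERED"),
    ("TEXT", r"[a-zA-Z0-9]+"),
    ("SPACES", r"[ \t\r\n]+"),
]

original_tokens = tokenize("IF ANSWERED(Execution date)", ORIGINAL_RULES)
fixed_tokens = tokenize("IF ANSWERED ( FirstName )", FIXED_RULES)
```

With the original rules the first token comes out as `("VARIABLE", "IF ANSWERED")`, mirroring the token dump in the question. With the fixed rules, IF and TEXT both match exactly two characters at offset 0, so the tie goes to IF because it is listed first, and the result is IF, ANSWERED, '(', TEXT, ')'. Two caveats follow from the same logic: an input such as IFFY still becomes one TEXT token because the longer match wins, and a name containing a space, such as Execution date, now arrives as two TEXT tokens, so the parser would have to accept a sequence of them (for example with something like `variable : TEXT+`, which is an assumption and not part of the answer above).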