How to unify similar tokens when constructing parsers and lexers.

Filed under: Articles, Compilers & Interpreters — Leave a comment

May 12, 2014

In this article I’ll show you how syntactically similar tokens can be handled in a unified way in order to simplify both lexical and syntactic analysis. The trick is to use one token for several similar operators in order to keep down the size of the grammar in the parser and to simplify the regular expression rules in the lexer.

Handling the following relational and equality operators in such a way is a very common use.

Relational Operators:

">", "<", ">=", "<="

Equality Operators:

"==", "!="

In order to implement this technique we have to write appropriate code both in lexer and parser. First, we have to define in a shared header file so both the lexer and the parser can have access the enumerations for the relational and equality operators:

enum relational_operator
{
  LESS,
  LESS_OR_EQUAL,
  GREATER,
  GREATER_OR_EQUAL
};

enum equality_operator
{
  EQUAL,
  NOT_EQUAL
};

Here is a code snippet of a simple Flex lexer that uses this common idiom:

%%
... more lexer rules ...

"<"  { yylval.operator = LESS; return RELATIONAL; }
"<=" { yylval.operator = LESS_OR_EQUAL; return RELATIONAL; }
">"  { yylval.operator = GREATER; return RELATIONAL; }
">=" { yylval.operator = GREATER_OR_EQUAL; return RELATIONAL; }

"==" { yylval.operator = EQUAL; return EQUALITY; }
"!=" { yylval.operator = NOT_EQUAL; return EQUALITY; }

... more lexer rules ...
%%

Please, take account that the relational and the equality operators have been separated only because we have chosen to give to equality operators a lower precedence level. We follow this scheme inspired by the C programming language which gives a higher precedence to relational operators.

Next, in order to code appropriately a Bison parser to support this technique we have to use a union type for the “yylval” to store the value of the RELATIONAL and EQUALITY tokens. Here follows a code snippet for the union type:

%union {
  ... more symbol data types ...

  /* which operator to use (for relational and equality operators). */
  int operator;

  ... more symbol data types ...
}

Furthermore, since both the RELATIONAL and EQUALITY tokens have a value indicating a specific operator we have to declare the tokens and their value type. At the same time we specify in the declarations the associativity property (left means left-associative) and the precedence level (which is reflected from the order of the precedence declarations) of the operators:

... more precedence declarations ...

%left <operator> EQUALITY
%left <operator> RELATIONAL

... more precedence declarations ...

Lastly, we add the RELATIONAL and EQUALITY tokens in the grammar of the Bison parser:

%%
... more grammar rules ...

exp
  : exp RELATIONAL exp { $$ = new_relational_ast_node ($2, $1, $3); }
  | exp EQUALITY exp   { $$ = new_equality_ast_node ($2, $1, $3); }

  ... more rule alternatives ...
;

... more grammar rules ...
%%

In the right hand side of the expression BNF grammar rule for RELATIONAL and EQUALITY syntaxes we create the appropriate abstract syntax tree nodes. All equality and relational operators are boolean binary operators that always “compare” two expressions and return either false or true.

Tags: associativity, comparison, compiler, equality, flex, free software, lexer, open source, precedence

Comments RSS feed

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Efstathios Chatzikyriakidis