Using literal character tokens when designing lexers and parsers.

Sometimes, while exploring the source code of various free software Flex lexers and Bison parsers, I see named token declarations for single-character tokens.

Here are the characters I will use later for demonstration:

"+", "-", "*", "/", "=", "|", "(", ")"

Developers usually declare named tokens for these characters. I believe there is no need to declare literal character tokens unless we need to declare the type of their values or their precedence. Rather than giving every token a name, we can use a single-quoted character as a token, with the character's ASCII value serving as the token number. Bison starts numbering named tokens at 258, so the two kinds of token cannot collide. By convention, a literal character token represents an input token consisting of that same character; for example, the token '+' represents the input character +. In practice, literal character tokens are used only for punctuation and operators.
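To make the numbering concrete, here is a minimal C sketch. The names NUMBER, NAME, is_literal_token and token_to_string are my own assumptions for illustration; the enum only imitates the values Bison would assign to two named tokens, and the sketch assumes an ASCII-compatible character set.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical named tokens, numbered the way Bison numbers them:
   starting at 258, above every possible character code. */
enum { NUMBER = 258, NAME = 259 };

/* A literal character token is simply its own character code. */
int is_literal_token(int token)
{
    return token > 0 && token < 256;
}

/* Format a token number for debugging output. */
const char *token_to_string(int token, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    if (token == NUMBER)
        snprintf(buf, size, "NUMBER");
    else if (token == NAME)
        snprintf(buf, size, "NAME");
    else if (is_literal_token(token) && isprint(token))
        snprintf(buf, size, "'%c'", token);   /* e.g. '+' */
    else
        snprintf(buf, size, "token %d", token);
    return buf;
}
```

Because every character code is below 256, a check like is_literal_token is enough to tell the two kinds of token apart.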

There is a common idiom in which a single rule handles all single-character operators and returns yytext[0], the character itself, as the token. Here is a snippet of a simple Flex lexer that uses this idiom:

%%
... more lexer rules ...

"+" |
"-" |
"*" |
"/" |
"=" |
"|" |
"(" |
")" { return yytext[0]; }

... more lexer rules ...
%%
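For readers who want to see what that rule amounts to without running Flex, here is a hand-written C stand-in. It is only a sketch: next_token, the NUMBER token and the -1 return for unrecognized characters are my own assumptions, not something Flex generates.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum { NUMBER = 258 };  /* hypothetical named token, numbered as Bison would */

/* Return the next token from *cursor and advance past it.
   0 means end of input (as with yylex), -1 an unrecognized character. */
int next_token(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;
    int token;

    while (isspace((unsigned char)*p))        /* skip whitespace */
        p++;

    if (*p == '\0') {
        token = 0;                            /* end of input */
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))    /* consume the digit run */
            p++;
        token = NUMBER;                       /* named token */
    } else if (strchr("+-*/=|()", *p) != NULL) {
        token = (unsigned char)*p++;          /* the character is the token */
    } else {
        token = -1;                           /* not part of this sketch */
        p++;
    }

    *cursor = p;
    return token;
}
```

Scanning the input "12 + (3)" with this function yields NUMBER, '+', '(', NUMBER, ')' and finally 0, so the caller can compare tokens against plain character constants.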

A Bison parser can likewise use literal character tokens directly in its BNF grammar rules, written as single-quoted characters. Here is a small grammar rule that describes an expression in a programming language:

%%
... more grammar rules ...

exp
  : exp '+' exp           { $$ = new_ast_node ('+', $1, $3); }
  | exp '-' exp           { $$ = new_ast_node ('-', $1, $3); }
  | exp '*' exp           { $$ = new_ast_node ('*', $1, $3); }
  | exp '/' exp           { $$ = new_ast_node ('/', $1, $3); }
  | '|' exp               { $$ = new_ast_node ('|', $2, NULL); }
  | '(' exp ')'           { $$ = $2; }
  | '-' exp %prec UMINUS  { $$ = new_ast_node ('M', $2, NULL); }
  | NUMBER                { $$ = new_ast_number_node ($1); }
  | NAME                  { $$ = new_ast_symbol_reference_node ($1); }
  | NAME '=' exp          { $$ = new_ast_assignment_node ($1, $3); }
  | NAME '(' ')'          { $$ = new_ast_function_node ($1, NULL); }
  | NAME '(' exp_list ')' { $$ = new_ast_function_node ($1, $3); }
;

... more grammar rules ...
%%
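The actions above pass the same literal characters to new_ast_node, so the character can double as the node type. The original code behind new_ast_node and new_ast_number_node is not shown here, so the following is only a guess at how such helpers might look. The struct layout, the 'K' tag for number nodes, the double value type and eval_ast are all my assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical AST node: node_type holds a literal character token
   ('+', '-', ...) or a made-up letter such as 'M' (unary minus)
   or 'K' (numeric constant). */
struct ast {
    int node_type;
    struct ast *left;
    struct ast *right;
    double number;      /* used only when node_type is 'K' */
};

struct ast *new_ast_node(int node_type, struct ast *left, struct ast *right)
{
    struct ast *node = malloc(sizeof *node);
    if (node == NULL)
        abort();        /* out of memory */
    node->node_type = node_type;
    node->left = left;
    node->right = right;
    node->number = 0.0;
    return node;
}

struct ast *new_ast_number_node(double number)
{
    struct ast *node = new_ast_node('K', NULL, NULL);
    node->number = number;
    return node;
}

/* Evaluate a tree by switching on the same character constants
   that the grammar uses as tokens. */
double eval_ast(const struct ast *node)
{
    double v;
    assert(node != NULL);
    switch (node->node_type) {
    case 'K': return node->number;
    case '+': return eval_ast(node->left) + eval_ast(node->right);
    case '-': return eval_ast(node->left) - eval_ast(node->right);
    case '*': return eval_ast(node->left) * eval_ast(node->right);
    case '/': return eval_ast(node->left) / eval_ast(node->right);
    case '|': v = eval_ast(node->left); return v < 0 ? -v : v;
    case 'M': return -eval_ast(node->left);
    default:  abort();  /* unknown node type */
    }
    return 0.0;         /* not reached */
}
```

With helpers like these, the lexer, the grammar and the tree walker all share one vocabulary of character constants, and no token name has to be kept in sync between them.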