Techniques for resolving common grammar conflicts in parsers.

Filed under: Articles, Compilers & Interpreters — 3 Comments

May 17, 2014

In this article I’ll present to you some common conflicts that usually occur in Bison grammars and ways of resolving these. At first, conflicts in Bison context are situations where a sequence of input can be parsed in multiple ways according to the specified BNF grammar rules.

This behavior of multiple parsing ways for the same sequence of input may be appropriate for some software systems, but is completely inappropriate for constructing parsers for computer programming languages since computer algorithms must always be decisive.

A parser should always have a grammar that can handle a sequence of input in one way only. If the sequence of input cannot be handled by the specified grammar of the parser then a syntax error should be produced by the parser.

Bison can recognize two type of conflicts:

shift/reduce: situation where a token can be shifted and a grammar rule can be reduced
reduce/reduce: situation where two grammar rules can be reduced

It is possible to instruct Bison to resolve shift/reduce and reduce/reduce conflicts when generating a parser.

However, is not always a good idea to “patch” the conflicts of a grammar by instructing Bison how to resolve these. Some conflicts may occur due to a bad grammar design. Maybe, it could be better to rewrite the grammar of the parser in order to be more human readable and more clear without having fuzzy rules.

Take account that there are situations where the grammar is clear without fuzzy rules but Bison cannot handle it because by default it generates LALR(1) parsers with limited look-ahead ability. However, you can instruct Bison to create a GLR parser which can face the problem of limited look-ahead ability and handle a more powerful grammar.

Conflicts in Expression Grammars

Let’s start with conflicts that usually occur when designing the expression grammar of a programming language. In an expression grammar the most common conflicts occur when we forget to implement the associativity property and the precedence of the various tokens.

There are two ways to specify precedence and associativity in a grammar, implicitly and explicitly. When we specify the expression grammar implicitly we have to implement nonterminal symbols for each precedence level. This is perfectly reasonable way to write a grammar, and if Bison didn’t have explicit precedence rules, it would be the only way.

This is how an expression grammar could be designed implicitly for supporting arithmetic expressions:

%%
... more grammar rules ...

exp: factor
  | exp '+' factor { $$ = new_ast_node ('+', $1, $3); }
  | exp '-' factor { $$ = new_ast_node ('-', $1, $3); }
;

factor: term
  | factor '*' term { $$ = new_ast_node ('*', $1, $3); }
  | factor '/' term { $$ = new_ast_node ('/', $1, $3); }
;

term: NUMBER    { $$ = new_ast_number_node ($1); }
  | '(' exp ')' { $$ = $2; }
  | '-' term    { $$ = new_ast_node ('M', $2, NULL); }
;

... more grammar rules ...
%%

However, when there are many precedence levels with various tokens and more complicated expression syntaxes, is very hard to maintain a grammar based in this old fashion structure which implements for each precedence level a different nonterminal symbol (“exp”, “factor”, “term”, etc).

Another better approach for designing an expression grammar could be the following:

%%
... more grammar rules ...

exp
  : exp RELATIONAL exp    { $$ = new_ast_relational_node ($2, $1, $3); }
  | exp EQUALITY exp      { $$ = new_ast_equality_node ($2, $1, $3); }
  | exp '+' exp           { $$ = new_ast_node ('+', $1, $3); }
  | exp '-' exp           { $$ = new_ast_node ('-', $1, $3);}
  | exp '*' exp           { $$ = new_ast_node ('*', $1, $3); }
  | exp '/' exp           { $$ = new_ast_node ('/', $1, $3); }
  | '(' exp ')'           { $$ = $2; }
  | '-' exp %prec UMINUS  { $$ = new_ast_node ('M', $2, NULL); }
  | NUMBER                { $$ = new_ast_number_node ($1); }
  | NAME                  { $$ = new_ast_symbol_reference_node ($1); }
  | NAME '(' ')'          { $$ = new_ast_function_node ($1, NULL); }
  | NAME '(' exp_list ')' { $$ = new_ast_function_node ($1, $3); }
;

... more grammar rules ...
%%

This expression grammar is more elegant and promises evolvable recursive expression syntaxes. However, Bison produces many shift/reduce conflicts due to the fact that we have not provided instructions about the associativity property and the precedence level of the tokens. So, for example the following expression as input “exp – exp – exp” has a shift/reduce conflict since Bison cannot decide the way that input will be parsed. There are two possible ways of parsing:

Case 1: (exp - exp) - exp

Case 2: exp - (exp - exp)

This shift/reduce conflict can be resolved by instructing Bison the associativity property of the ‘-‘ token. Also, the expression “exp – exp * exp” produces a similar conflict but now we have to care also about the precedence level of the tokens ‘-‘ and ‘*’. Again, this conflict can also be resolved by instructing Bison the precedence of the tokens.

So, in order to resolve shift/reduce conflicts that occur in the above grammar we can use the precedence declarations of Bison. By using “%left”, “%right” or “%nonassoc” declarations we can support associativity and complex precedence levels:

%left <operator> EQUALITY
%left <operator> RELATIONAL
%left '+' '-'
%left '*' '/'
%right UMINUS

For the above expression grammar we support five precedence levels from lowest to highest. Each precedence declaration describes both the precedence level of related grouped tokens and their associativity property. Also for the EQUALITY and RELATIONAL tokens we specify their value.

If we want to reuse a token in an alternative syntax and give it a different meaning we can do it without restrictions. But, we should know that the precedence level or the associativity property of the token might not be the appropriate. For example, in our grammar we reuse the ‘-‘ token as a unary operator to support negative expressions. However, the ‘-‘ unary operator has right associativity and a higher precedence. In order to override both properties we use the “%prec” declaration and a fake token UMINUS.

Conflicts in IF/THEN/ELSE Grammars

Another common problem that we might face while designing the grammar of a parser to support conditional branching with “if/then/else” programming structures is the well-known dangling else problem. The dangling else problem dates to ALGOL 60, and has been resolved in various ways in subsequent languages. In LR parsers, the dangling else is the archetypal example of a shift-reduce conflict. So, suppose that we want to create a parser for handling the “if” and “if/else” structures in order to support conditional branching. A reasonable, easy and quick way to do it is to write a grammar like the following (just a code snippet):

%%
... more grammar rules ...

statement
  : iteration_statement
  | compound_statement
  | selection_statement
  | ... other alternatives ...
;

selection_statement
  : IF '(' expression ')' statement
  | IF '(' expression ')' statement ELSE statement
;

... more grammar rules ...
%%

Although, such a grammar is clear and human readable it raises a shift/reduce conflict because of the problem of dangling else. For example, the sequence of input “if (e1) if (e2) s1 else s2” can be parsed with two different ways because of the shift/reduce conflict. The two possible ways of parsing such input are the following:

Case 1: if (e1) { if (e2) s1 else s2 }

Case 2: if (e1) { if (e2) s1 } else s2

Which case is the correct one? None. We should decide. Most of the time compilers of various programming languages are constructed in such a way that associate always the dangling “else” with the closest/nearest “if” (Case 1). It turns out that this is also the default way that Bison selects despite the fact that produces a warning. So, the dangling else problem can be solved easily. We can instruct Bison how to resolve this conflict in various ways:

By using matched/unmatched statements to associate the dangling “else” with the nearest “if”
By using something like “endif” for marking the end of the conditional structure
By using always opening and closing curly braces around statements
By using different precedence rules to associate the dangling “else” with the nearest “if”
By insisting on space indentation

Next follows an example for most of the above solutions.

1. By using matched/unmatched statements to associate the dangling “else” with the nearest “if”

We can leave Bison to do its job 🙂 since it recognizes that this is the dangling else problem and selects the scenario where the dangling “else” is associated with the closest/nearest “if”. However, we can solve this conflict manually and remove the related warning with the following grammar:

%%
... more grammar rules ...

statement
  : matched
  | unmatched
;

matched
  : IF '(' expression ')' matched ELSE matched
  | other_statement
;

unmatched
  : IF '(' expression ')' statement
  | IF '(' expression ')' matched ELSE unmatched
;

other_statement
  : iteration_statement
  | compound_statement
  | ... other alternatives ...
;

... more grammar rules ...
%%

2. By using something like “endif” for marking the end of the conditional structure

The concept of “endif” for marking the end of “if” and “if/else” conditional statements is implemented by many programming languages (ALGOL 68, Ada, Eiffel, PL/SQL, Visual Basic, etc) and can be used for solving the dangling else problem. Also, the Bash programming language forces “fi” (the “if” reversed) instead of “endif” for closing conditional statements which is the same thing. Here follows the grammar implementing the “endif” concept:

%%
... more grammar rules ...

statement
  : iteration_statement
  | compound_statement
  | selection_statement
  | ... other alternatives ...
;

selection_statement
  : IF '(' expression ')' statement ENDIF
  | IF '(' expression ')' statement ELSE statement ENDIF
;

... more grammar rules ...
%%

3. By using always opening and closing curly braces around statements

Another common solution of the dangling else problem that is used by many programming languages is to insist on opening and closing curly braces around statements in the “if” and “if/else” conditional structures. Here follows the grammar implementing this concept:

%%
... more grammar rules ...

statement
  : iteration_statement
  | compound_statement
  | selection_statement
  | ... other alternatives ...
;

selection_statement
  : IF '(' expression ')' '{' statement '}'
  | IF '(' expression ')' '{' statement '}' ELSE '{' statement '}'
;

... more grammar rules ...
%%

4. By using different precedence rules to associate the dangling “else” with the nearest “if”

If the grammar supports the “then” keyword it is possible to solve the dangling else problem by instructing Bison to use different precedence levels for the “THEN” and “ELSE” tokens. Later, you will see that also the problem can be solved with the same technique by eliminating the “then” keyword.

Here are the precedence declarations:

... more precedence declarations ...

%nonassoc THEN
%nonassoc ELSE

... more precedence declarations ...

Here are the grammar rules:

%%
... more grammar rules ...

statement
  : iteration_statement
  | compound_statement
  | selection_statement
  | ... other alternatives ...
;

selection_statement
  : IF '(' expression ')' THEN statement
  | IF '(' expression ')' statement ELSE statement
;

... more grammar rules ...
%%

In case the grammar does not support the “then” keyword it is possible to create a fake token (for example NO_ELSE or even THEN) that will have lower precedence than ELSE token and use it with the “%prec” declaration in order to accomplish the same result.

Here are the precedence declarations:

... more precedence declarations ...

%nonassoc NO_ELSE
%nonassoc ELSE

... more precedence declarations ...

Here are the grammar rules:

%%
... more grammar rules ...

statement
  : iteration_statement
  | compound_statement
  | selection_statement
  | ... other alternatives ...
;

selection_statement
  : IF '(' expression ')' statement %prec NO_ELSE
  | IF '(' expression ')' statement ELSE statement
;

... more grammar rules ...
%%

5. By insisting on space indentation

This is a practice that rarely is used in programming languages. By insisting on space indentation the dangling else problem can be solved. Python does it by providing at the same time readable code. In order to implement this technique it is necessary to use states and special tokens like INDENT and DEDENT for the indentation.

Python does not ignore all whitespace. It ignores indentation just inside parentheses, square brackets and curly braces. So, in the following example space indentation matters and we should be careful with the indentation of the “else” keyword.

Case 1 (the “else” is associated with the inner “if”) :

if e1:
   if e2:
      s1
   else:
      s2

Case 2 (the “else” is associated with the outer “if”) :

if e1:
   if e2:
      s1
else:
   s2

Happy Hacking!

Tags: bison, compiler, conflict, dangling else, expression, flex, free software, grammar, lexer, open source, parser

Comments RSS feed

3 Comments:

Keith Badeau

May 18, 2020 at 23:03

Bison is still new to me so the more I learn, the more problems arise (at approx. a 3 or 4 to 1 ratio). This article helped me to resolve an issue but, more importantly, it gave me insight into Bison grammars and how I can resolve the other conflicts. I narrowed the productions and went from 152 conflicts to 0. Thank you.
Efstathios Chatzikyriakidis

February 28, 2020 at 19:30

I am so glad it helped! Be well! Don’t hesitate to ask for help!
swordofbisnegar@gmail.com

February 28, 2020 at 18:46

I know this is six years old and no one will probably see this comment, but I just wanted to say that this post helped me SO MUCH to resolve conflicts in my compiler design project and I couldn’t find such a well written and helpful guide anywhere else online. Thank you so much.

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Efstathios Chatzikyriakidis