In this article I’ll show you how you can create the abstract syntax tree (AST) of an input stream while parsing it. A parser usually reads various tokens from its input by using a lexer as a helper coroutine and tries to match various grammar rules that specify the syntax of a language (the source language).

## Category: Compilers & Interpreters

In this article I’ll present to you some common conflicts that usually occur in Bison grammars and ways of resolving these. At first, conflicts in Bison context are situations where a sequence of input can be parsed in multiple ways according to the specified BNF grammar rules.

In this article I’ll show you how syntactically similar tokens can be handled in a unified way in order to simplify both lexical and syntactic analysis. The trick is to use one token for several similar operators in order to keep down the size of the grammar in the parser and to simplify the regular expression rules in the lexer.

Sometimes while I exploring the source code of various free software Flex lexers and Bison parsers I see name declarations for single character tokens.

Many programming languages and computer files have a directive, often called “include” (as well as “copy” and “import”), that causes the contents of a second file to be inserted into the original file. These included files are called copybooks or header files. They are often used to define the physical layout of program data, pieces of procedural code and/or forward declarations while promoting encapsulation and the reuse of code.

Below, there are some useful regular expressions for matching C-like primitive data values.

Below, is a simple example for counting words, lines and characters by using a Flex lexer.

In your lexer when lexical analysis is performed you may want to ignore any multiline C-like comments.

This project refers to the development of the procedural programming language YAFL (Yet Another Free Language) as well as its interpreter. The programming language YAFL is a tool for teaching the basic principles of procedural programming to secondary school students and beginners in general.

The following program creates the syntax tree of a mathematical expression in prefix notation. For simplicity, we assume that the terms of the expression are individual characters (digits). Each time the recursive function parse() is called, a new node is created, whose value is the next character of the expression. If the value is a term (digit), we return the new node. However, if it is an operator, we set the left and right pointers to point the tree made (recursively) for the two operands.

To calculate a prefix expression, we either convert a number from ASCII to decimal (in the loop ‘while’ at the end of the program) or implement the operation indicated by the first character of the expressions to the two terms, with a recursive calculation. This function is recursive, but it uses a global array containing the expression and an index number for the current character of the expression. The index number goes beyond each sub-expression calculated.

The following program converts an expression from infix to postfix notation. The conversion is carried out with the help of a stack. For example, to turn the expression (A + B) into the postfix form A B +, we ignore the left parenthesis, convert the A to postfix form, we store the + operator on the stack, we convert B to postfix form and then, when find the right parenthesis, we pop the + operator from the top of the stack. In particular, the following implementation works only for the addition and multiplication of integers.

The following program reads any postfix expression that includes integer multiplication and addition, evaluates the expression and displays the final result on the screen. The program stores the intermediate results in a stack of integers.

The terms (operands) are pushed onto the stack. The operators are applied to the two entries at the top of the stack (the two entries are popped from the stack); the result is pushed back into the stack. Because the order in which the two pop() functions are performed in the expression of this code is not specified in C++, the code for some non-commutative operators, such as the ones for subtraction and division, would be somewhat more complicated.

## Ignoring multiline comments with start states in compilers with Flex.

In a previous article I have presented a way for ignoring multiline comments with an old fashion way.

In this article I’ll demonstrate a more elegant Flex-like way for ignoring multiline comments.

Continue reading →