Many programming languages and computer files have a directive, often called “include” (as well as “copy” and “import”), that causes the contents of a second file to be inserted into the original file. These included files are called copybooks or header files. They are often used to define the physical layout of program data, pieces of procedural code and/or forward declarations while promoting encapsulation and the reuse of code.

Here follows an example in the C programming language which uses the “#include” directive:

#include <stdio.h>

int
main (void)
{
  printf("Hello World!\n");

  return 0;
}

During compilation, the C preprocessor replaces the “#include” line with the contents of the “stdio.h” standard header file. If “stdio.h” itself contains further “#include” directives, the process continues recursively. A file can include more than one other file. However, there must be no unguarded circular references; otherwise the inclusion never ends. A naive implementation would recurse until it overflows the stack or runs out of memory, while real preprocessors typically stop with an error once a nesting limit is reached.
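The recursive expansion and the circular reference problem can be sketched in plain C, without Flex. This is only an illustration: the in-memory file table, the file names, the expand() function and the MAX_DEPTH limit are all hypothetical, and only the quoted form of “#include” at the start of a line is recognized.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory "file system", used only for this sketch. */
struct mem_file { const char *name; const char *text; };

static const struct mem_file files[] = {
  { "prog.c", "#include \"a.h\"\nint answer = 42;\n" },
  { "a.h",    "#include \"b.h\"\nint a;\n" },
  { "b.h",    "int b;\n" },
  { "loop.h", "#include \"loop.h\"\n" },   /* includes itself */
};

#define MAX_DEPTH 16
static const char *active[MAX_DEPTH];  /* files currently being expanded */

static const char *lookup (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof files / sizeof files[0]; i++)
    if (strcmp (files[i].name, name) == 0)
      return files[i].text;
  return NULL;
}

/* Append the expansion of `name` to `out`, a NUL-terminated string with
   capacity `cap`.  Returns 1 on success and 0 on a missing file, a
   circular include, too deep nesting or lack of space. */
int expand (const char *name, char *out, size_t cap, int depth)
{
  const char *text = lookup (name);
  int i;

  if (!text || depth >= MAX_DEPTH)
    return 0;
  for (i = 0; i < depth; i++)
    if (strcmp (active[i], name) == 0)
      return 0;                          /* circular reference */
  active[depth] = name;

  while (*text)
    {
      const char *eol = strchr (text, '\n');
      size_t len = eol ? (size_t) (eol - text) + 1 : strlen (text);
      char included[64];

      if (sscanf (text, "#include \"%63[^\"\n]\"", included) == 1)
        {
          /* Replace the directive line with the included file. */
          if (!expand (included, out, cap, depth + 1))
            return 0;
        }
      else
        {
          /* Ordinary line: copy it to the output unchanged. */
          size_t used = strlen (out);

          if (used + len >= cap)
            return 0;
          memcpy (out + used, text, len);
          out[used + len] = '\0';
        }
      text += len;
    }
  return 1;
}
```

Calling expand ("prog.c", out, sizeof out, 0) on an empty buffer should leave the lines of “b.h”, “a.h” and “prog.c” in that order, while expand ("loop.h", ...) is rejected because “loop.h” is already being expanded.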

To implement this behavior we can build a simple Flex lexer. For a more powerful preprocessor that also supports features such as “#if”, “#else”, “#elif”, “#endif”, “#ifndef”, “#define” and “#undef”, we would additionally need a simple Bison parser to handle the branching, with the Flex lexer acting as a coroutine, plus a symbol table to store the defined identifiers. To protect your program from circular references programmatically, you also need to implement the “#ifndef”, “#define” and “#endif” directives. Otherwise, you should be very careful about which files you include.
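To see why the “#ifndef”, “#define” and “#endif” directives stop repeated or circular inclusion, the sketch below pastes the text of a hypothetical header, “point.h”, twice into one translation unit, as a preprocessor would if the header were reached twice. The names POINT_H, struct point and point_sum() are made up for the illustration.

```c
/* First copy of the hypothetical header "point.h". */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif

/* Second copy: POINT_H is already defined, so everything up to the
   matching #endif is skipped and struct point is not defined twice. */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif

/* The structure is usable as if the header had been included once. */
int point_sum (struct point p)
{
  return p.x + p.y;
}
```

Without the guard, the second copy would redefine struct point and the compilation would fail. In a circular chain, the guard is what makes the second visit to a header expand to nothing.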

However, in the rest of this article we assume that you will always be careful about the files you include 🙂 and, for simplicity, we demonstrate how the “#include” directive can be implemented in a straightforward way with a Flex lexer only.

Flex lexer implementation:

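%{
/* Assumed prototypes; both functions return 0 on failure. */
int push_file (char *filename);
int pop_file (void);
%}

%x INCLUDE_FILE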

%%
^"#"[ \t]*include[ \t]*[\"<] { BEGIN INCLUDE_FILE; }

<INCLUDE_FILE>[^ \t\n\">]+ {
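                             {
                               int c;

                               /* Skip the rest of the line.  input ()
                                  signals end of file with 0 or EOF,
                                  depending on the Flex version. */
                               while ((c = input ()) > 0 && c != '\n') ;
                             }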

                             if (!push_file (strdup(yytext)))
                             {
                               yyterminate ();
                             }

                             BEGIN INITIAL;
                           }

<INCLUDE_FILE>.|\n {
                     fprintf (stderr, "Bad include line found.\n");

                     yyterminate ();
                   }

<<EOF>> {
          if (!pop_file ())
          {
            yyterminate ();
          }
        }

. { ECHO; }
%%

Inside the push_file() and pop_file() functions you can maintain a stack of buffers in order to handle nested included files.

The push_file() function should use the Flex functions yy_create_buffer() and yy_switch_to_buffer() to perform the context switch and start reading input from the buffer of the newly included file.

The pop_file() function should use the Flex functions yy_delete_buffer() and yy_switch_to_buffer() to perform the context switch and continue reading input from the buffer of the parent file.

A possible stack structure for the buffers could be the following:

struct buffer_stack_node
{
  struct buffer_stack_node *previous; /* the previous node in stack. */
  YY_BUFFER_STATE buffer;             /* the input buffer state. */
  char * filename;                    /* the input filename. */
  FILE * file;                        /* the input file. */
} * buffer_stack = 0;
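With this structure, push_file() and pop_file() could look roughly like the sketch below. It is not a definitive implementation and rests on a few assumptions: both functions return 1 on success and 0 on failure, push_file() takes ownership of the filename string (the lexer passes it a strdup() copy), and pop_file() returns 0 once the outermost file has been popped so that the <<EOF>> rule terminates the scanner. The node structure is repeated so that the block compiles by itself, and the #ifndef section contains stand-ins for the Flex functions that are used only when the code is built outside a Flex-generated scanner.

```c
#include <stdio.h>
#include <stdlib.h>

#ifndef YY_BUF_SIZE
/* Stand-ins so that the sketch compiles outside a Flex scanner.  In the
   user code section of a .l file, Flex provides the real definitions. */
typedef struct yy_buffer_state *YY_BUFFER_STATE;
#define YY_BUF_SIZE 16384
static YY_BUFFER_STATE yy_create_buffer (FILE *file, int size)
{ (void) size; return (YY_BUFFER_STATE) file; }
static void yy_switch_to_buffer (YY_BUFFER_STATE b) { (void) b; }
static void yy_delete_buffer (YY_BUFFER_STATE b) { (void) b; }
#endif

struct buffer_stack_node
{
  struct buffer_stack_node *previous; /* the previous node in stack. */
  YY_BUFFER_STATE buffer;             /* the input buffer state. */
  char * filename;                    /* the input filename. */
  FILE * file;                        /* the input file. */
} * buffer_stack = 0;

/* Open `filename`, push it on the stack and make it the current input.
   Takes ownership of `filename` (assumed to be heap-allocated). */
int push_file (char *filename)
{
  struct buffer_stack_node *node;
  FILE *file;

  if (!filename)
    return 0;

  file = fopen (filename, "r");
  if (!file)
    {
      perror (filename);
      free (filename);
      return 0;
    }

  node = malloc (sizeof *node);
  if (!node)
    {
      perror ("malloc");
      fclose (file);
      free (filename);
      return 0;
    }

  node->previous = buffer_stack;
  node->buffer = yy_create_buffer (file, YY_BUF_SIZE);
  node->filename = filename;
  node->file = file;
  buffer_stack = node;

  yy_switch_to_buffer (node->buffer);
  return 1;
}

/* Drop the current file and resume the parent.  Returns 0 when there is
   no parent left, which tells the lexer to terminate. */
int pop_file (void)
{
  struct buffer_stack_node *node = buffer_stack;

  if (!node)
    return 0;

  buffer_stack = node->previous;
  yy_delete_buffer (node->buffer);
  fclose (node->file);
  free (node->filename);
  free (node);

  if (!buffer_stack)
    return 0;

  yy_switch_to_buffer (buffer_stack->buffer);
  return 1;
}
```

Under these assumptions, a driver would push the top-level file once before calling yylex(), so that the <<EOF>> rule finds a parent to return to after each included file and terminates when the top-level file itself ends.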