Thoughts on Automatic Software Repairing and Genetic Programming

In the field of Software Engineering enough emphasis is given on the development of methodologies and mechanisms for the design of optimal software systems. Moreover, the quality of a software system can be assessed by carrying out appropriate metrics. Key features under study during the evaluation of a system are reliability, stability, security, portability and usability. The quality of a software system depends mainly on the time spent, expenses made, debugging and testing techniques used etc.

A key step always present in any software development methodology, which is designed to verify that the software works so far, is testing. Software Testing is very basic because it strongly affects quality. Also, software testing takes place at several stages of the development.

The human factor is most of the time needed for the realization of testing as in all the stages of system development. With the development of technologies, more and more ways are developed to test a system.

Some of the main testing methods are White-Box, Black-Box, Grey-Box. Also, some of the different levels of testing are the modules testing, integration testing, system testing, stability testing, usability testing, and more..

Each of these testing levels are supported either by various methods or the appropriate tools. Several of the tools perform checks automatically while others also need human review anyway.

Generally, the presence of errors in software is something undesirable. The optimization process of a system must eliminate all bugs. The time we will spend to locate an error, to think of a satisfactory solution to eliminate it and to code the solution with appropriate additions or changes to the system source code can cost us a lot.

For this reason, as we have mentioned above, systems have developed that automatically (and with minimal involvement of the human factor) can detect and recognize an error or check the functionality of a system.

Nevertheless, the issue of automatic repairing of an error after locating it remains crucial and important. Solving a problem is a difficult stage because it requires human imagination, scientific though, research, judgment, etc.

Genetic Programming (GP) might very well be the discipline that will enable the development of automated systems for alternative solutions to a software error. GP is a specialization of the Genetic Algorithms (GA). GAs model the evolution of genotypes in a population. Generally, GAs solve optimization and search problems. The GP is also an implementation example of Evolutionary Computing (EC). In turn, the EC (as sub-sector of Evolutionary Intelligence and Machine Learning) models the mechanism of natural evolution which optimizes the species within the dynamically ever changing environment.

The general mechanism of a Genetic Algorithm

The general mechanism of a Genetic Algorithm

The GP aims to create programs in a programming language well or optimally solving a problem. Many times GP is also referred to as Invention Machine because it can give us solutions to problems that would be difficult for the human mind to find.

Sometimes the solution to a problem comes through a combination of many sciences or disciplines. Generally, combinatorial logic can bring satisfactory results. In this case, it is necessary to combine Software Engineering, Artificial Intelligence and Genetics for creating automated systems for error repairing in software.

In GP, we use specialized genetic algorithms. Individuals in the population are represented as executable programs (as syntax trees). In each generation of the evolution, the individuals – that is, the programs – are executed which is then followed by measurement of their performance on each problem.

Each individual of the population in GP represents an executable program as an abstract syntax tree. Because the GP is required to find a program or optimize an existing one to solve a problem, it is required to know the grammar and the definition of the programming language you will use beforehand. Also, we define the set of terminal symbols and the set of functions or operators. As the two sets mentioned increase, as many more programs can be derived from the grammar of the language. The goal of GA is to find a suitable or even optimal program to solve each problem.

Within the population, trees representing the chromosomes can be of static or variable size. In the case of the static representation, all trees have the same size and all the sub-branches of a tree can be extended to a maximum depth. However, in the case of variable representation, there are no such restrictions.

During the initialization of the population, there are randomly generated programs following the limitations, however, on the maximum size of the trees, the grammatical rules of the language etc. so that the programs are syntactically and semantically correct. With optimization rules, during initialization and during development, efforts are being made to simplify the programs.

In the course of evolution where the estimated quality of each program is evaluated (to the function of quality), each program is run and tested on a set of failed use cases as well as in a set of successful use cases. Programs that produce satisfactory results regarding the use cases survive while those that do not produce satisfactory results can not evolve and disappear since the strongest prevails in nature.

To calculate the quality of a program of the population the mean-squared or cross-entropy error is used for the data / use sets of cases.

The function quality can also examine the complexity of possible solutions so as to punish some individuals / programs such as deep or complex testing structures trees with appropriate penalties, and reward others which are more simple.

Moreover, in the course of evolution there are acts of reproduction where the genetic material of individuals recombine for the birth (production) of new chromosomes (programs).

Mutation is important in the case of GP and happens in many different ways. Both recombination and mutation are performed in a probabilistic way with transformations and processes onto the tree data structures as graph theory specifies.

Correcting the error in MS Zune Player

Code with an error that drives the system into an endless loop:

Code exported by the Genetic Algorithm (without the error) :

Tools & Genetic Programming Libraries

  • Open BEAGLE, EC Framework
  • Evolving Objects (EO) : Evolutionary Computation Framework
  • GPC++ – Genetic Programming C++ Class Library
  • GenPro – Genetic Programming
  • deap : Distributed Evolutionary Algorithms In Python
  • pySTEP : Python Strongly Typed Genetic Programming
  • DGPF : Distributed Genetic Programming Framework
  • PMDGP : Object Oriented Framework For Solving GP Problems
  • jGE : Java Grammatical Evolution