DARWIN: A Genetic Algorithm Language: Arslan Arslan and Göktürk Üçoluk
DARWIN: A Genetic Algorithm Language: Arslan Arslan and Göktürk Üçoluk
Abstract This article describes the DARWIN Project, which is a Genetic Algorithm
programming language and its C Cross-Compiler. The primary aim of this project is
to facilitate experimentation of Genetic Algorithm solution representations, operators
and parameters by requiring just a minimal set of definitions and automatically
generating most of the program code. The syntax of the DARWIN language and an
implementational overview of the the cross-compiler will be presented. It is assumed
that the reader is familiar with Genetic Algorithms, Programming Languages and
Compilers.
1 Introduction
The goal of the DARWIN Project is twofold - first, creating a programming language
with support for Genetic Algorithm concepts and second, by requiring just a minimal
set of specifications, generating all the necessary code automatically. The DARWIN
Project thus involves a language design and its compiler implementation.
In general, if creation of a GA solution to a problem is desired, first the data
representation is selected, the set of GA operators designed and the parameters spec-
ified. For example, lets decide that an array will store our chromosome and that we
will implement 1-point cross-over with a binary flip mutation and a simple objective
score function. Up to that point it is simple, but there is a lot to do. Next, we have to
decide what kind of scaling and how selection will be performed. Thus, we create an
array to store the statistics of the current population and define scaling and selection
A. Arslan (B)
LOGO, Ankara, Turkey
e-mail: [email protected]
G. Üçoluk
Middle East Technical University, Ankara, Turkey
e-mail: [email protected]
functions to use the information stored there-in. Finally, we code the genetic
algorithm, to create a population and start evolving it until a termination condi-
tion is satisfied. At that point, we have a completely functioning Genetic Algorithm
solution and we can start experimenting with different operators and GA parame-
ters. This can be a time consuming process, since certain GA operators should be
rewritten and modified, then the system should be tested against different inputs and
parameters. It is custom to have hundreds of runs until a proper set of parameters is
obtained. So a tool or a library is needed to facilitate creation of an initial system and
afterwards help experimenting in the search of a better system.
Although GALib is easy to use, it is not that easy to extend. For example, creating
a chromosome representation matching any of the available chromosome classes
is easy, but when a more complex structure is needed a separate class should be
derived and all the operators be redefined. This is still acceptable, but if creating a
custom population object or even worse, a hand-made genetic algorithm object is
required, then the programmer should be pretty much involved in the GALib object
implementation and interaction details.
Another drawback of GALib comes from the fact that classes are as “general”,
i.e. data structure independent, as possible. So inefficient data storage arises, just
because the implementation fails to specialize. To cover up, GALib provides many
specialized implementations of the same object. For example, GAListGenome,
GATreeGenome, GABinaryString and GAArray objects derive from the base class
named GAGenome and provide support for specific genome representations. But this
is not the end of the story, since the above mentioned four classes are in turn base
classes for more and more specific data representations and this results in a fairly big
object hierarchy.
In general Object Orientation is a handy concept, because implementation details
can be hidden and inheritance is a fast way of complying to the overall library con-
ventions, but expertise in C++ is required to accomplish tasks with even moderate
difficulty. Another potential problem is the fact that inheritance hides the implemen-
tation details of the base class, so the resultant object inherits both the good and bad
sides of his parents. As a result, the cumulative effect of the inefficient implementa-
tion during the path of derivation can add up to a point where performance is severely
reduced.
The General-purpose systems go a step further, compared to a Algorithm Library,
in the sense that they provide a high-level language to program their libraries. These
high-level languages allow extremely fast prototyping, when experimenting with
the system parameters. But, there is another reason for having such languages—
the algorithm libraries are extremely complex and thus difficult to understand and
extend in their low-level implementations. In addition, all such systems require a
special kernel to execute these high-level GA instructions. The kernel may not be
portable and can even rely on some parallel hardware.
As a solution, a programming language naturally recognizing and providing sup-
port for genome representations and their operators can be designed and imple-
mented. By having a separate language and a compiler, all the necessary code can be
generated for the unspecified genetic operators, thus giving a change for the developer
DARWIN: A Genetic Algorithm Language 37
to focus just on genome representation, operators and proper parameter set design.
In addition, since genome data representation is known by the compiler, efficient
code for dealing with the genome data structure can be automatically generated, thus
eliminating inefficiency issues.
First, previous work related to the DARWIN project will be presented. In Sect. 4,
an example of a DARWIN program will be given. In Sect. 5, An overview of the
design of DARWIN and its implementation will be covered. We conclude in Sect. 7.
Genetic Algorithms received wide spread support due to the fact that they are robust
and their complexity is not linear with respect to the number of parameters encoded
in the solution. In fact, Genetic Algorithms were even used in obtaining near optimal
solutions for many of the problems previously considered NP or NP-hard. But the
task of producing a successful Genetic Algorithms system is not simple. This is
due to the fact that experimentation with different GA operators and parameters is
needed. So, some tools for facilitating GA program creation and experimentation
with different genome structures and operators are needed.
In its most general case, the GA Programming Environments can be grouped into
the following major classes [9]:
Application-oriented systems are programming environments hiding all the
implementation details and are targeted at business professionals. These systems
focus on a specific domain such as scheduling, telecommunications, finance, etc.
A typical application-oriented environment contains a menu-driven interface giving
access to parameterized implementations of genetic algorithms targeted at specific
domains. The user can configure the application, change parameters and monitor pro-
gram execution. Examples of application-oriented systems are EVOLVER, OMEGA,
PC/BEABLE, XpertRule GenAsys, etc.
Algorithm-oriented systems are programming environments with support for
specific genetic algorithm:
Algorithm-specific systems contain just a single powerful genetic algorithm.
In general, these systems come in source code and allow the expert user to make
changes. The implementations are usually modular in structure and there are nearly
no user interfaces. Examples of these systems are ESCPADE, GAGA, GAUSCD,
GENESIS and GENITOR.
Algorithm Libraries contain variety of genetic algorithms and operators are
grouped in a library. These systems are modular, allowing the programmer to select
among parameterized implementations of different algorithms and operators. Usually
these libraries are implemented in C/C++ and Lisp. Examples of algorithm libraries
are GALib, EM and OOGA.
Algorithm-oriented systems are often GNU Public Licensed and include free
source code. Thus they can be easily incorporated into user applications.
38 A. Arslan and G. Üçoluk
Tool Kits are programming systems that contain many programming utilities,
algorithms and genetic operators that can be used in a wide range of application
domains. These programming systems subdivide into two:
Educational systems help the novice user to get familiar with Genetic Algorithm
concepts. Typically these systems are very user friendly and support just a few options
for configuring an algorithm. GA Workbench is the only example for an educational
system.
General-purpose systems provide a comprehensive set of tools for programming
any GA and application. These systems may even allow the expert user to customize
any part of the software. Generally, these systems provide:
• Sophisticated user interface
• Parameterized algorithm library
• A high level language to program the system library
• Open architecture for expert user modification.
Examples include EnGENEer, GAME, MicroGA, PeGAsus and Splicer.
From the programmers point of view, General-purpose systems are the best tools
available for programming GA systems. But, since these are not freely available, the
current trend is creating libraries of functions or classes implementing predefined GA
operators. Perhaps the most complete and easy to use Genetic Algorithms package is
GALib. GALib is a C++ Library of Genetic Algorithm Components and is written and
maintained by Matthew Wall, Mechanical Engineering Department of Massachusetts
Institute of Technology [10].
GALib is a hierarchy of objects representing different GA constructs. The top
level objects are GAGenome, GAPopulation and GAGeneticAlgorithm. The Genetic
Algorithms operators are thus implemented as methods of these objects. The library
features a full set of ready to use specialized versions of these top level objects, so
a working program can be fastly prototyped, but experimentation usually requires
modifications in the default behavior implemented by the classes. GALib provides
two extension methods—deriving own classes to implement custom behavior, or
defining new methods and instructing GALib to use them instead of the defaults.
Stated briefly the solution of a problem using a Genetic Algorithm involves these
important steps:
• Defining a representation
• Defining the objective score function
• Defining the genetic operators
The DARWIN language is designed to clearly distinguish Genetic Algorithm
constructs—the genome and its operator set. It’s language compiler has been designed
to be a C cross-compiler. The target language is chosen to be C, since C is the most
DARWIN: A Genetic Algorithm Language 39
commonly used language with compilers present on the widest range of platforms.
By generating standard C code, portability is ensured on the largest scale possible.
The DARWIN language is designed to be simple, GA oriented language. In order
to achieve fast learning and adaptation, the syntax of the language resembles the
standard C language syntax. For example, the DARWIN statements look just the
same in C, with the minor difference that DARWIN is not as rich with build-in types
as C and there are no pointers. The reason for excluding pointers from the language is
that they are not currently needed, since all the storage in the generated code is linear.
This does not lead to inefficient implementation since the generated code contains
C pointers.
The DARWIN language, is capable of distinguishing genetic algorithm constructs.
In the first place, DARWIN syntax provides means for defining genes, chromosomes,
populations and the genetic algorithm.
The best possible way to represent them would be to have object classes. But
since DARWIN generates plain C code, the DARWIN language has a special syn-
tax construct called moderators. Moderator is a function that effectively associates
an operation on a data structure, thus creating the effect of encapsulation present
in Object Oriented programming. By introducing the concept of moderators, the
DARWIN language becomes powerful enough to express a genetic algorithm con-
struct and provide the standard set of operators associated with it. For example, the
gene construct has the print, initialize, crossover, mutate and evaluate operations
defined on it.
chromosome TRectangles
{ TRectangle rectangles[20]; };
population Pop
{ TRectangles individuals[100]; };
algorithm GA
{ Pop population; }; % end of genome
definitions;
5 Implementation
The DARWIN project implementation consists of 4 major parts and an external code
template database. It consists of
• Parser
• Post-processor
• Template Database
• Electra Engine
• Unparser
The general structure of the DARWIN cross-compiler is Fig. 1.
The Parser is the entity responsible for imposing the DARWIN Language syntax
and converting a DARWIN program into a computer readable form—the parse tree.
This parse tree is in turn fed into the Post-Processor, which in turn performs type-
checking and verifies language semantics. The output of the Post-processor is an
abstract syntax tree, equivalent to the parse tree but restructured in a form suitable for
DARWIN Program
Parser
ELECTRA
Unparser
Pure C Code
traversal by the Electra Engine. In addition, the Post-Processor produces the symbol
and type tables. Next, the abstract syntax tree is passed to the Electra Engine, which
performs abstract tree translation and code generation. The Electra Engine relies on
the presence of a template database, which is loaded prior to starting the engine.
The template database is a text file containing abstract tree templates, which are
matched inside the abstract tree of the DARWIN program and translated accordingly.
The output of the Electra Engine is an abstract tree representing valid C code. The
Unparser, takes this abstract tree and produces a formatted C program code.
5.2 Parser
The Parser converts the DARWIN syntax into its equivalent parse tree. The input is
a DARWIN program file and the output of the Parser is a LISP structure, represent-
ing its parse tree. The Parser uses the parsing property list driven engine which is
easily customizable and more efficient than the table-driven approach employed by
LALR(1) parsing.
5.3 Post-Processor
The DARWIN Post-Processor operates in two stages. In the first stage it performs
type-checking, manipulates the symbol table and collects all the necessary informa-
tion that will be needed during the actual code generations. The Post-Processor is
also responsible for tagging each branch in the abstract syntax tree. These tags will
be used during the code generation phase, since the Electra engine will decide upon
them.
After having performed the validity checks of the functions and genetic algorithm
constructs, the Post-Processor enters its next stage, where the moderators specified
in the genetic constructs are extracted from the functions section and are placed at
the section they actually belong. During this phase, each of the genetic constructs
are inspected and all of their moderators are searched in the functions section. Any
inconsistency end up in an exception raise.
Electra is a rule based unification pattern matcher/substituter and is the core code
generating engine of the DARWIN compiler. It effectively traverses the input from
the Post-Processor section by section, and decides when some additional code should
be generated and how a given subtree should be transformed. So the Electra engine
is traversing an abstract syntax tree, matching a given abstract syntax subtree and
transforming it. The output is an abstract syntax tree, organized in sections.
DARWIN: A Genetic Algorithm Language 43
6 Unparser
The Unparser is the abstract syntax to C code translator. The implementation adopts
the one-pass pretty printer developed by Hearn and Norman [8]. The input of the
Unparser is the output of the Electra Engine, where all of the syntactic sugars were
removed, code translations done and any additional code generated. The output is a
pretty printed C program, that corresponds to the user’s DARWIN program.
7 Conclusion
References