AI General Game Player Using Neuroevolution Algorithms
DECLARATION
We hereby declare that the project work entitled AI General Game Player
using Neuroevolution Algorithms has been independently carried out by us under
the guidance of Mr. Guru R, Assistant Professor, Department of Computer Science and
Engineering, Sri Jayachamarajendra College of Engineering, Mysuru, and is a record of
original work done by us. This project work is submitted in partial fulfillment
of the requirements for the award of the degree of Bachelor of Engineering in Computer
Science and Engineering of Visvesvaraya Technological University, Belgaum, during the year
2016-17. The results embodied in this thesis have not been submitted to any other
University or Institute for the award of any degree or diploma.
Basanth Jenu H B
Meghana S B
Sanjana G S
Abstract
The goal of General Game Playing (GGP) is to develop computer programs
that can perform well across various game types. It is natural for humans to transfer
knowledge from games they already know to other, similar games, but the same task
is difficult for computers. GGP research attempts to design systems that work
well across different game types, including previously unknown games. Developing intelligent
agents that can learn a given task by themselves is of great significance in AI research.
Earlier attempts at general game playing have relied on tree-based methods
and heuristics. Recently, there have been attempts to solve the problem using Reinforcement
Learning methods, Q-Learning in particular. Several of these attempts have combined the
latest advances in Deep Learning and Q-Learning and have achieved impressive results.
In this project, a model is designed by combining recent advances in Deep Learning
and Genetic Algorithms. In particular, Conventional Neuroevolution (CNE) and NeuroEvolution
of Augmenting Topologies (NEAT) are implemented and tested on various tasks.
CNE focuses on evolving the weights of a neural network to solve a problem. NEAT goes
a step further and evolves the structure of the neural network along with its weights.
Finally, a model is designed that can learn on its own to play a variety of games
without any prior knowledge about the game. The algorithm is made to play
Flappy Bird, Pong and Super Mario Bros, along with benchmark tests.
ACKNOWLEDGEMENT
We would like to thank the entire management of the Department of Computer Science
and Engineering for having given us an opportunity to carry out this project on our own,
trusting and acknowledging our abilities. It is a great pleasure to express our
deep sense of gratitude to our institution, Sri Jayachamarajendra College of Engineering, Mysuru.
We take this opportunity to thank our project guide Mr. Guru R, Assistant Professor,
Department of Computer Science and Engineering, SJCE, for his suggestions,
valuable support, encouragement and guidance throughout the project.
We would like to thank all the teaching and non-teaching staff of the Computer Science
and Engineering Department. We also convey our gratitude to all those who have
contributed to this project directly or indirectly.
Basanth Jenu H B
Meghana S B
Sanjana G S
Contents

Declaration
Abstract
Acknowledgement
1 Introduction
1.1 Problem Statement
1.2 Objectives
1.3 Introduction to the problem domain
1.3.1 General Game Playing
1.3.2 Neural Networks
1.3.3 Genetic Algorithms
1.4 Applications
1.5 Existing solution methods
1.6 Proposed solution methods
1.7 Time schedule for completion of the project work (Gantt chart)
2 Literature Survey
2.1 A survey of Monte Carlo tree search methods
2.2 A GGP Feature Learning Algorithm
2.3 Training Feedforward Neural Networks Using Genetic Algorithms
2.4 High-level Reinforcement Learning in Strategy Games
2.5 Human-level control through deep reinforcement learning
5 System Design
5.1 Introduction
5.2 Block Diagram
5.2.1 ANN to interact with Environment
5.2.2 Evolution of ANNs
6 System Implementation
6.1 Introduction
6.2 Conventional Neuroevolution
6.3 NeuroEvolution of Augmenting Topologies (NEAT)
6.3.1 Genetic Encoding
6.3.2 Tracking Genes through Historical Markings
6.3.3 Protecting Innovation through Speciation
6.3.4 Minimizing Dimensionality through Incremental Growth from Minimal Structure
6.3.5 NEAT Algorithm
6.3.6 Parameter Settings
Appendix A
Appendix B
References

List of Figures

1.1 Gantt Chart
5.1 Interacting with the environment
5.2 Evolution of ANNs
6.1 Encoding an ANN
6.2 Crossover of ANNs
6.3 Genetic Encoding
6.4 Mutation in NEAT
6.5 Matching of Genes and Crossover
7.1 Solving XOR using CNE
7.2 Solving XOR using NEAT
7.3 Solving XOR using Backpropagation
7.4 Evolution of structure by NEAT
7.5 Cartpole Balancing Environment
7.6 ANN to solve Cartpole Balancing Environment
7.7 NEAT playing Pong
7.8 NEAT playing the Flappy Bird Game
7.9 ANN evolved to play Flappy Bird
7.10 State of the Super Mario Bros game
7.11 NEAT playing Super Mario Bros

List of Tables

6.1 Mutation Rates for NEAT
6.2 Parameters for speciation for NEAT
7.1 XOR Problem
7.2 Results for solving XOR
CHAPTER 1
1 Introduction
1.1 Problem Statement
General Game Playing is about making machines play a variety of games by
themselves without any prior knowledge about the game. Humans are very good
at this task, but the same is difficult for a computer. The project deals with Genetic
Algorithms (GA) and Artificial Neural Networks (ANN). GAs are optimization algorithms
inspired by natural evolution. ANNs are mathematical models that try to predict and
mimic the working of the brain. In this project, Genetic Algorithms and Neural Networks
are combined to make a machine play a variety of games by itself.
1.2 Objectives
The objectives of the project are as follows:
1. Implement Conventional Neuroevolution (CNE) and using the algorithm
1.3 Introduction to the problem domain
1.3.1 General Game Playing
Games have always been an important platform for research on Artificial Intelligence
(AI). Since the early days of AI, many popular board games, such as chess and
checkers, have been used to demonstrate the potential of emerging AI techniques to solve
combinatorial problems. General Game Playing (GGP)[13] was introduced to design
game-playing systems with applicability to more than one specific game. Traditionally, it
is assumed that a game AI program needs to play extremely well on a target game, without
consideration for the AI's general game-playing ability. As a result, a world-champion-level
chess program, such as Deep Blue, has no idea how to play checkers or even a board
game that differs only slightly from chess. This is quite the opposite of the human game-playing
mechanism, which easily adapts to various types of games based on learning the rules
and playing experience.
Some research stresses the importance of human-style game playing instead of sim-
ply unbeatable performance. For example, given a certain board configuration, human
players usually do not check as many possible scenarios as computer players. However,
human players are good at capturing patterns in very complex games, such as Go or
Chess. Generally, the automatic detection of meaningful shapes on boards is essential
to successfully play games with large branching factors. The use of computational intelligence
algorithms to filter out irrelevant paths at an early stage of the search process is an im-
portant and challenging research area. Finally, current research trends are attempting to
imitate the human learning process in game play.
In the context of GGP, the goal of an AI program is not to perfectly solve one game
but to perform well on a variety of different types of games, including games that were
previously unknown. Unlike game-specific AI research, GGP assumes that the AI pro-
gram is not tightly coupled to a game. Such an approach requires a completely different
research approach, which, in turn, leads to new types of general-purpose algorithms. Tra-
ditionally, GGP has focused primarily on two-dimensional board games inspired by chess
or checkers, although several new approaches for General Video Game Playing (GVGP)
have been recently introduced to expand the territory of GGP. The goal of GVGP re-
search is to develop computer algorithms that perform well across different types of video
games. Compared with board games, video games are characterized by uncertainty, con-
tinuous game and action space, occasional real-time properties, and complex gaming rules.
1.3.2 Neural Networks
Neural networks[6] are algorithms for optimization and learning based loosely on
concepts inspired by research into the nature of the brain. They generally consist of five
components:
1. A directed graph, known as the network topology, whose arcs are called links.
2. A state variable associated with each node.
3. A real-valued weight associated with each link.
4. A real-valued bias associated with each node.
5. A transfer function for each node, which determines the state of the node as a function
of its bias, the weights on its incoming links, and the states of the nodes connected to it.
This transfer function usually takes the form of either a sigmoid or a step function.
A feedforward network is one whose topology has no closed paths. Its input nodes
are the ones with no arcs to them, and its output nodes have no arcs away from them.
All other nodes are hidden nodes. When the states of all the input nodes are set, all
the other nodes in the network can also set their states as values propagate through the
network. The operation of a feedforward network consists of calculating outputs given a
set of inputs in this manner. A layered feedforward network is one such that any path
from an input node to an output node traverses the same number of arcs. The nth layer
of such a network consists of all nodes which are n arc traversals from an input node.
A hidden layer is one which contains hidden nodes. Such a network is fully connected if
each node in layer i is connected to all nodes in layer i + 1 for all i.
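To make the forward pass described above concrete, the following is a minimal sketch of a layered feedforward network in Python using NumPy. The layer sizes, random weights and the sigmoid transfer function are illustrative choices and are not values taken from this project.

import numpy as np

def sigmoid(x):
    # Sigmoid transfer function applied to a node's weighted input plus bias.
    return 1.0 / (1.0 + np.exp(-x))

def feedforward(inputs, layers):
    # `layers` is a list of (weight_matrix, bias_vector) pairs, one per layer.
    state = np.asarray(inputs, dtype=float)
    for weights, biases in layers:
        # Each node's state is the transfer function of the weighted sum of
        # the previous layer's states plus the node's bias.
        state = sigmoid(weights @ state + biases)
    return state

# Example: a 2-3-1 network with randomly initialized weights and biases.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(3, 2)), rng.normal(size=3)),   # input -> hidden
          (rng.normal(size=(1, 3)), rng.normal(size=1))]   # hidden -> output
print(feedforward([0.0, 1.0], layers))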
Layered feedforward networks have become very popular for a few reasons. For one,
they have been found in practice to generalize well, i.e. when trained on a relatively sparse
set of data points, they will often provide the right output for an input not in the training
set. Secondly, a training algorithm called backpropagation exists which can often find a
good set of weights (and biases) in a reasonable amount of time. Backpropagation is a
variation on gradient search. It generally uses a least-squares optimality criterion. The
key to backpropagation is a method for calculating the gradient of the error with respect
to the weights for a given input by propagating error backwards through the network.
There are some drawbacks to backpropagation. For one, there is the "scaling prob-
lem". Backpropagation works well on simple training problems. However, as the problem
complexity increases (due to increased dimensionality and/or greater complexity of the
data), the performance of backpropagation falls off rapidly. This makes it infeasible for
many real-world problems. The performance degradation appears to stem from the fact
that complex spaces have nearly global minima which are sparse among the local minima.
Gradient search techniques tend to get trapped at local minima. With a high enough
gain (or momentum), backpropagation can escape these local minima. However, it leaves
them without knowing whether the next one it finds will be better or worse. When the
nearly global minima are well hidden among the local minima, backpropagation can end
up bouncing between local minima without much overall improvement, thus making for
very slow training.
1.3.3 Genetic Algorithms
Genetic algorithms[9] are algorithms for optimization and learning based loosely
on several features of biological evolution. They require five components:
1. A way of encoding solutions to the problem as chromosomes.
2. An evaluation function that returns a rating for each chromosome given to it.
3. A way of initializing the population of chromosomes.
4. Operators that may be applied to parents when they reproduce to alter their genetic
composition. These might include mutation, crossover (i.e. recombination of genetic
material), and domain-specific operators.
5. Parameter settings for the algorithm, the operators, and so forth.
Given these five components, a genetic algorithm operates as follows. The population is
first initialized and each member is evaluated. The population then undergoes reproduction
until a stopping criterion is met, where each cycle of reproduction consists of the following
three steps:
(a) One or more parents are chosen to reproduce. Selection is stochastic, but the
parents with the highest evaluations are favored in the selection.
(b) The operators are applied to the parents to produce children. The parameters
help determine which operators to use.
(c) The children are evaluated and inserted into the population. In some versions
of the genetic algorithm, the entire population is replaced in each cycle of
reproduction. In others, only subsets of the population are replaced.
When a genetic algorithm is run using a representation that usefully encodes solutions
to a problem and operators that can generate better children from good parents, the
algorithm can produce populations of better and better individuals, converging finally on
results close to a global optimum. In many cases the standard operators, mutation and
crossover, are sufficient for performing the optimization. In such cases, genetic algorithms
can serve as a black-box function optimizer not requiring their creator to input any
knowledge about the domain. However, knowledge of the domain can often be exploited to
improve the genetic algorithm's performance through the incorporation of new operators.
Genetic algorithms should not have the same problem with scaling as backpropa-
gation. One reason for this is that they generally improve the current best candidate
monotonically. They do this by keeping the current best individual as part of their popu-
lation while they search for better candidates. Secondly, genetic algorithms are generally
not bothered by local minima. The mutation and crossover operators can step from a
valley across a hill to an even lower valley with no more difficulty than descending directly
into a valley. The field of genetic algorithms was created by John Holland.
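To make the reproduction cycle above concrete, the following is a minimal sketch of a generational genetic algorithm in Python. The chromosome representation (a list of real numbers), the toy fitness function and all parameter values are illustrative assumptions, not the settings used elsewhere in this report.

import random

POP_SIZE, CHROM_LEN, GENERATIONS = 50, 10, 100
MUTATION_PROB = 0.3

def fitness(chrom):
    # Toy evaluation function: higher is better (maximize the sum of genes).
    return sum(chrom)

def mutate(chrom):
    # Randomly perturb some genes of a single parent.
    return [g + random.gauss(0, 0.5) if random.random() < MUTATION_PROB else g
            for g in chrom]

def crossover(a, b):
    # Each gene of the child is copied from one of the two parents at random.
    return [random.choice(pair) for pair in zip(a, b)]

population = [[random.uniform(-1, 1) for _ in range(CHROM_LEN)]
              for _ in range(POP_SIZE)]

for generation in range(GENERATIONS):
    # Rank the population; fitter individuals are kept and favored as parents.
    population.sort(key=fitness, reverse=True)
    parents = population[:POP_SIZE // 2]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    # Keep the best individuals and insert the new children.
    population = parents + children

print(max(fitness(c) for c in population))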
1.4 Applications
1. Game-playing robots
General game-playing software provides a great opportunity to make a relatively
simple robotic system act smart. A robot arm capable of moving pieces on a board,
coupled with a state-of-the-art player, can in principle learn to play arbitrary games
with these pieces. Beyond this, interesting challenges are posed by game-playing
robots that learn to recognise and manipulate new pieces, or mobile robots that
can solve a whole array of new tasks formulated as games.
2. Manufacturing
Take, for instance, the task of picking a device from one box and putting it in a
container. Robots are now training themselves to do this job with great speed and
precision. Fanuc, a Japanese company, takes pride in the industrial robot that is
clever enough to train itself to do this job.
4. Dynamic pricing
Dynamic pricing is a well-suited strategy to adjust prices depending on supply
and demand to maximize revenue from products. Techniques like Q-learning can be
leveraged to provide solutions addressing dynamic pricing problems. Reinforcement
learning algorithms serve businesses to optimize pricing during interactions with
customers.
5. Customer delivery
A manufacturer wants to deliver products for customers with a fleet of trucks
ready to serve customer demands. With the aim to make split deliveries and realize
savings in the process, the manufacturer opts for Split Delivery Vehicle Routing
Problem. The prime objective of the manufacturer is to reduce total fleet cost
while meeting all demands of the customers.
6. E-commerce personalization
For retailers and e-commerce merchants, it has grown into an absolute imperative
to tailor communications and promotions to fit customer purchasing habits.
Personalization is at the core of promoting relevant shopping experiences to cap-
ture customer loyalty. Reinforcement learning algorithms are proving their worth
by allowing e-commerce merchants to learn and analyze customer behaviors and
tailor products and services to suit customer interests.
8. Medical industry
A dynamic treatment regime (DTR) is a subject of medical research setting
rules for finding effective treatments for patients. Diseases like cancer demand
treatments for a long period where drugs and treatment levels are administered
over a long period. Reinforcement learning addresses this DTR problem, where RL
algorithms help in processing clinical data to come up with a treatment strategy,
using various clinical indicators collected from patients as inputs.
competitive aspect when student players are pitted against each other for eval-
uation, which works as a great motivator. With today's available software tools
and online resources for teaching general game playing, this is also much easier to
organise than, say, a full-fledged robotics laboratory.
1.5 Existing solution methods
For a long time, the most influential methods for GGP have been Monte Carlo
Tree Search methods. The algorithm iteratively searches the game tree, starting from the
current state, in a series of iterations until the allotted time runs out. This has proven to
be very successful in the case of board games like Chess and Go. Systems like Deep Blue
were developed that could play Chess and eventually beat the world champion in the
game. But the system was written with heuristics and hard-coded rules, meaning the
algorithm would fail to play any other game.
Recently, the focus in GGP has shifted to combinations of Deep Neural Networks
and Reinforcement Learning. Google's DeepMind achieved impressive results on a set of
Atari games, where the system was able to learn to play the games at superhuman
levels. The company also developed AlphaGo, which beat the 18-time world champion in the
Chinese game of Go. This was a major breakthrough in the field of AI, as the system was
able to learn by itself.
CHAPTER 2
2 Literature Survey
In this section, various methods used earlier to solve the problem are discussed.
Earlier attempts at general game playing have been through tree-based methods
and heuristics. These were very successful in playing board games like Chess. Recently,
there have been attempts to solve the problem using Reinforcement Learning methods,
Q-Learning in particular. A team of researchers at Google combined the latest advances
in Deep Learning and Q-Learning, and the resulting algorithms were able to play a set of Atari
games without any prior knowledge about the games. Very recently, OpenAI combined
Neural Networks and Evolution Strategies to beat the initial benchmark set by Google
on the Atari games.
Some of the literature studied and used in the implementation of this project includes
the literature mentioned below.
2.1 A survey of Monte Carlo tree search methods
Monte Carlo Tree Search[2] has been the algorithm of choice for the most competitive
General Game Playing agents. The algorithm iteratively searches the game tree, starting
from the current state, in a series of iterations until the allotted time runs out. An
iteration consists of the following four steps: selection, expansion, simulation, and back-
propagation.
1. Selection Step - The algorithm starts from the root of the game tree and chooses a
node within the already built part of the tree based on the node's statistics. Actions
which have been performing better so far are tested more frequently.
2. Expansion Step - It means extending the tree by a new node with the first
unvisited state so far, that is, the first state found after leaving the tree.
3. Simulation Step - After leaving the stored fragment of the tree, a random simu-
lation is performed until a game termination is reached.
4. Back-Propagation Step - The scores obtained by all players in the ended game
are fetched and back-propagated to all nodes visited in the selection and expansion
steps.
One of the interesting things about this is that the same core algorithm can be used
for a whole class of games: Chess, Go, Othello, and almost any board game you can
think of. The main limitations are that the game has to have a finite number of possible
moves in each state as well as a finite length. Also, the algorithm does not work if the
upcoming states of the environment are unknown or cannot be analyzed. Top AIs for Go
(a notoriously difficult game to make a good AI for) use some form of MCTS.
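The node statistics used in the selection step are most commonly combined using the UCB1 formula of the UCT algorithm, which the survey [2] covers in detail. The report does not state which selection policy is assumed, so the standard form is reproduced here only for reference:

\[
UCT(v_i) = \frac{w_i}{n_i} + C \sqrt{\frac{\ln N}{n_i}}
\]

where w_i is the total reward accumulated by child node v_i, n_i is its visit count, N is the visit count of its parent, and C is an exploration constant balancing exploitation against exploration.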
2.2 A GGP Feature Learning Algorithm
The goal of the GGP feature learning algorithm (GIFL) is to learn generalized features
which can be used to improve the quality of play[5]. Given a move sequence ending in a
terminal state, there are two stages to the learning process. In the first stage, a 2-player game
tree is built that leads to a terminal state, and states are identified for learning. In the
second stage, general features are extracted from the states to be used during game play. It
has the following stages.
1. Identifying States for Learning - GIFL identifies states for learning by per-
forming random walks in a game until a terminal state is reached. Then, a 2-ply
tree is built around the terminal state to analyze whether learning can occur.
3. Extending GIFL Features Up The Tree - Building the tree identifies possible
states where offensive or defensive features could be learned, and builds generalized
GIFL features from these states. But, ideally, a learner would also learn elsewhere
in the tree. This learning can be performed using the same procedure by looking
at moves higher up in the random walk. Instead of looking for a move which leads
to a goal, GIFL looks for a move which contributes to the offensive feature. A tree
is then built around this move, and similar learning takes place.
The main disadvantage is that some necessary predicates may not be properly generalized
using this method. Moreover, the method is useful only for board games and cannot be
extended to complex video games.
2.3 Training Feedforward Neural Networks Using Genetic Algorithms
In this approach[4], the fittest genomes go on to produce a new population, and mutation
is applied to the new population. Here a genome is a neural network with random weights.
Over many generations, the networks get better and better at performing the given task.
The major disadvantage here is that the method describes only the evolution of the weights
of the ANN; there is no mechanism for evolving the structure of the ANN.
2.4 High-level Reinforcement Learning in Strategy Games
Q-learning - This method updates the value of a state-action pair after the action
has been taken in the state and an immediate reward has been received. Values
of state-action pairs, Q(s, a), are learned because the resulting policy is more easily
recoverable than from learning the values of states alone, V(s). Q-learning will converge
to an optimal value function under the condition of sufficiently visiting each state-action
pair, but often requires many learning episodes to do so. In multi-agent
domains, Q-learning is no longer guaranteed to converge because the environment
is no longer stationary. Nevertheless, it has been shown to be effective in some settings.
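For reference, the one-step Q-learning update referred to above takes the standard form (this equation is taken from the general Q-learning literature rather than from the report itself):

\[
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
\]

where α is the learning rate, γ is the discount factor, r is the immediate reward, and s' is the state reached after taking action a in state s.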
CHAPTER 3
3.1 Introduction
The system should be able to record/store what has been learned, so that the system
won't need to learn again and again.
The environment to be learned should not have a very large search space (e.g., Chess
or Go).
CHAPTER 4
Various tools and technologies have been used for the development of the system.
The whole system was developed using Python and various libraries and frameworks
were used. The game environments were taken from OpenAI Gym and the PyGame
Learning Environment. All the tools are described below.
4.1 Python
4.2 Numpy
NumPy is a library for the Python programming language, adding support for
large, multidimensional arrays and matrices, along with a large collection of high-level
mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric,
was originally created by Jim Hugunin with contributions from several other developers.
In 2005, Travis Oliphant created NumPy by incorporating features of the competing
Numarray into Numeric, with extensive modifications. NumPy is open-source software
and has many contributors.
4.3 Pickle
The pickle module implements a fundamental but powerful algorithm for serializing
and de-serializing a Python object structure. Pickling is the process whereby a
Python object hierarchy is converted into a byte stream, and unpickling is the inverse
operation, whereby a byte stream is converted back into an object hierarchy. Pickling
is used in this project to store the best-performing ANNs so that they can be reloaded later.
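As an illustration of how an evolved network could be stored and reloaded with pickle, consider the sketch below. The network object and the file name are hypothetical placeholders, not the actual objects used in the project code.

import pickle

# best_network stands for any picklable Python object, e.g. the ANN that
# achieved the highest fitness so far.
best_network = {"weights": [0.5, -1.2, 0.7], "topology": (4, 1)}

# Serialize the object hierarchy to a byte stream on disk.
with open("best_network.pkl", "wb") as f:
    pickle.dump(best_network, f)

# Later (or in another run), deserialize it back into an object hierarchy.
with open("best_network.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored == best_network)  # True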
4.4 Graphviz
4.5 Virtualenv
1. The gym open-source library: a collection of test problems (environments) that you
can use to work out your reinforcement learning algorithms. These environments
have a shared interface, allowing you to write general algorithms.
2. The OpenAI Gym service: a site and API allowing people to meaningfully compare
the performance of their trained agents.
The Cart Pole balancing task and the Super Mario Bros environment have been
used from OpenAI Gym for this project.
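The following sketch shows how a Gym environment is typically driven, using the classic Gym interface that was current when this project was written; the random policy is only a placeholder for an evolved network.

import gym

env = gym.make("CartPole-v1")
observation = env.reset()

total_reward, done = 0.0, False
while not done:
    # A real agent would feed `observation` into an ANN here; sampling a
    # random action simply stands in for that decision.
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    total_reward += reward

print("Episode reward:", total_reward)
env.close()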
1. Catcher
2. Monster Kong
3. FlappyBird
4. Pixel Copter
5. Pong
6. PuckWorld
7. Raycast Maze
8. Snake
9. WaterWorld
The Flappy Bird and Pong environments have been used from the PyGame Learning
Environment for this project.
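A minimal sketch of driving a PyGame Learning Environment game is given below. The method names follow PLE's documented interface (init, getActionSet, act, game_over, reset_game, getGameState), while details such as the fps value and the always-flap policy are illustrative assumptions.

from ple import PLE
from ple.games.flappybird import FlappyBird

game = FlappyBird()
env = PLE(game, fps=30, display_screen=False)
env.init()

actions = env.getActionSet()          # for Flappy Bird: flap key and None
while not env.game_over():
    state = game.getGameState()       # player/pipe positions and velocities
    # An evolved ANN would choose between flapping and doing nothing here;
    # always picking the first action is just a placeholder.
    reward = env.act(actions[0])

env.reset_game()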
Git is a Version Control System (VCS) for tracking changes in computer files and
coordinating work on those files among multiple people. It is primarily used for software
development, but it can be used to keep track of changes in any files. As a distributed
revision control system, it is aimed at speed, data integrity, and support for distributed,
non-linear workflows.
Git was created by Linus Torvalds in 2005 for development of the Linux kernel, with
other kernel developers contributing to its initial development. Its current maintainer
since 2005 is Junio Hamano.
As with most other distributed version control systems, and unlike most client-server
systems, every Git directory on every computer is a full-fledged repository with complete
history and full version tracking abilities, independent of network access or a central
server. Like the Linux kernel, Git is free software distributed under the terms of the
GNU General Public License, version 2.
4.11 Latex
LaTeX is a document markup language and document preparation system for the TeX
typesetting program. The term LaTeX refers only to the language in which documents are
written, not to the editor application used to write those documents. In order to create
a document in LaTeX, a .tex file must be created using some form of text editor. While
most text editors can be used to create a LaTeX document, a number of editors have been
created specifically for working with LaTeX. LaTeX is widely used in academia. As a primary
or intermediate format, e.g., for translating DocBook and other XML-based formats to PDF,
LaTeX is used because of the high quality of typesetting achievable by TeX. The type-
setting system offers programmable desktop publishing features and extensive facilities
for automating most aspects of typesetting and desktop publishing, including numbering
and cross-referencing, tables and figures, page layout and bibliographies. LaTeX is intended
to provide a high-level language that accesses the power of TeX. LaTeX essentially
comprises a collection of TeX macros and a program to process LaTeX documents.
CHAPTER 5
5 System Design
5.1 Introduction
System design describes constraints in the system and includes any assumptions
made by the project team during development. It is the process or art of defining the
hardware and software architecture, components, modules, interfaces, and data for a
computer system to satisfy specified requirements. One could see it as the application of
systems theory to computing. The design of the system is essentially a blueprint, or a
plan for a solution for the system. A system is considered to be a set of components with
clearly defined behaviour, which interact with each other in a fixed, defined manner,
to produce some behaviour or services for its environment.
The system developed is a package/library that can learn to play a given game by
itself by combining the strengths of Artificial Neural Networks and Genetic Algorithms.
5.2 Block Diagram
In this section, diagrams are included showing, in schematic form, the general arrangement
of the components of our system. The first part shows how an Artificial
Neural Network is able to interact with the environment. The latter part describes how
the Genetic Algorithm is able to evolve the Neural Networks to do a particular task.
5.2.1 ANN to interact with Environment
Each environment has a different number of input and output parameters. So, to
implement a generalized model that can interact with multiple environments, an Artificial
Neural Network is used. To make an ANN suitable for any environment, it is
just a matter of changing the sizes of the input and output layers to match the environment.
As an example, for a Snake game, the inputs would be (x1, y1), the position of the head of
the snake, and (x2, y2), the position of the food in the environment. So the input layer size
of the ANN would be 4. The possible outputs for the snake would be a decision whether to
go left, right, up or down, so the output layer size of the ANN would also be 4 in this case.
Each node in the output layer represents the probability of taking a particular decision.
The agent selects the decision whose node has the highest probability. The block
diagram is described in figure 5.1.
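The following sketch illustrates this observation-to-action mapping for the Snake example above. The random single-layer network and the helper names are hypothetical placeholders and are not taken from the project code.

import numpy as np

rng = np.random.default_rng(1)
N_INPUTS, N_OUTPUTS = 4, 4          # (x1, y1, x2, y2) -> (left, right, up, down)
ACTIONS = ["left", "right", "up", "down"]

# One weight per (output node, input node) pair plus one bias per output node.
weights = rng.normal(size=(N_OUTPUTS, N_INPUTS))
biases = rng.normal(size=N_OUTPUTS)

def choose_action(observation):
    # Squash node activations into (0, 1) so they can be read as probabilities.
    logits = weights @ np.asarray(observation, dtype=float) + biases
    probabilities = 1.0 / (1.0 + np.exp(-logits))
    # The agent picks the decision whose output node has the highest value.
    return ACTIONS[int(np.argmax(probabilities))]

print(choose_action([2.0, 3.0, 7.0, 5.0]))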
5.2.2 Evolution of ANNs
A population of random ANNs is created, with the input and output layer sizes of
each ANN equal to the number of input and output parameters of the corresponding
environment. The environments are adapted from OpenAI Gym and the PyGame Learning
Environment. All the ANNs are allowed to perform in the environment and
their score (fitness) is recorded. Fit parents are retained and the weak ones are discarded.
A new population is created from the retained fit parents by favouring the parents with
higher fitness for reproduction. Multiple mutation operators are applied to the newly
created population. The same process is continued until a specified termination criterion
is met. After an agent has solved the task or performed better than its predecessors, its
ANN is stored using Pickle. The block diagram is shown in figure 5.2.
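A compressed sketch of this evolve-evaluate-select loop is shown below. The functions create_random_ann, evaluate_in_environment and mutate are hypothetical placeholders for the project's actual implementation; only the overall control flow follows the description above.

import pickle
import random

POP_SIZE, KEEP, GENERATIONS = 50, 10, 200

def create_random_ann():
    # Placeholder: a "network" is represented by a flat list of random weights.
    return [random.uniform(-1, 1) for _ in range(20)]

def evaluate_in_environment(ann):
    # Placeholder fitness: a real implementation would run the game
    # environment and return the score achieved by this network.
    return -sum((w - 0.5) ** 2 for w in ann)

def mutate(ann):
    return [w + random.gauss(0, 0.1) if random.random() < 0.3 else w for w in ann]

population = [create_random_ann() for _ in range(POP_SIZE)]
best_fitness = float("-inf")

for generation in range(GENERATIONS):
    ranked = sorted(population, key=evaluate_in_environment, reverse=True)
    parents = ranked[:KEEP]                      # fit parents are retained
    top_fitness = evaluate_in_environment(parents[0])
    if top_fitness > best_fitness:               # store improved agents with pickle
        best_fitness = top_fitness
        with open("best_ann.pkl", "wb") as f:
            pickle.dump(parents[0], f)
    # New population from the retained parents plus mutation (the report
    # additionally favours fitter parents when selecting).
    population = parents + [mutate(random.choice(parents))
                            for _ in range(POP_SIZE - KEEP)]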
CHAPTER 6
6 System Implementation
6.1 Introduction
Based on the above system design, two algorithms are implemented. The first
algorithm is Conventional Neuroevolution (CNE) and the second is NeuroEvolution of
Augmenting Topologies (NEAT).
CNE focuses on evolution with a static ANN structure, optimizing only the weights
of the ANN. NEAT goes a step further and evolves both the weights and the
structure of the ANN. Both of these algorithms are discussed below.
6.2 Conventional Neuroevolution
1. Chromosome Encoding: The weights (and biases) in the neural network are
encoded as a list of real numbers as shown in figure 6.1.
4. Operators: There are different types of genetic operators. These are grouped into
two basic categories: mutations, crossovers. A mutation operator takes one parent
and randomly changes some of the entries in its chromosome to create a child. A
crossover operator takes two parents and creates one child containing some of the
genetic material of each parent. Each of the operators are discussed individually
below.
Therefore, the weight settings in these parents tend to be better than random
settings. Hence, biasing the probability distribution by the present value of
the weight should give better results than a probability distribution centered
on zero.
CROSSOVER-WEIGHTS: This operator puts a value into each position
of the child's chromosome by randomly selecting one of the two parents and
using the value in the same position on that parent's chromosome, as shown in
Figure 6.2.
5. Parameter Settings: There are different parameters whose values can greatly
influence the performance of the algorithm. Except where stated otherwise, these
were kept constant across runs. The different parameters are:
(a) POPULATION-SIZE = 50
(b) MUTATION-PROBABILITY = 0.3
(c) BIASED-MUTATE-WEIGHTS-PROBABILITY = 0.5
(a) The population is initialized. The result of the initialization is a set of chro-
mosomes. Each member of the population is evaluated. Evaluations may be
normalized. The important thing is to preserve relative ranking of evaluations.
(b) The population undergoes reproduction until a stopping criterion is met.
Reproduction consists of a number of iterations of the following three steps:
i. One or more parents are chosen to reproduce. Selection is stochastic, but
the parents with the highest evaluations are favored in the selection.
ii. The operators are applied to the parents to produce children. The param-
eters help determine which operators to use.
iii. The children are evaluated and inserted into the population. In some ver-
sions of the genetic algorithm, the entire population is replaced in each
cycle of reproduction. In others, only subsets of the population are re-
placed.
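To make the encoding and the operators above concrete, here is a small sketch of the CROSSOVER-WEIGHTS operator and a biased weight mutation operating on flat, real-valued chromosomes. The mutation distribution and parameter values are illustrative, not the exact settings used in the project.

import random

MUTATION_PROBABILITY = 0.3

def crossover_weights(parent_a, parent_b):
    # Each position of the child is copied from the same position of a
    # randomly selected parent, as in Figure 6.2.
    return [random.choice(pair) for pair in zip(parent_a, parent_b)]

def biased_mutate_weights(chromosome):
    # Replace selected weights with values drawn from a distribution centred
    # on the present weight rather than on zero, since parent weights tend
    # to be better than random settings.
    return [random.gauss(w, 0.5) if random.random() < MUTATION_PROBABILITY else w
            for w in chromosome]

parent_a = [random.uniform(-1, 1) for _ in range(6)]
parent_b = [random.uniform(-1, 1) for _ in range(6)]
child = biased_mutate_weights(crossover_weights(parent_a, parent_b))
print(child)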
6.3.2 Tracking Genes through Historical Markings
Two genes with the same historical origin represent the same structure (although possibly
with different weights), since they are both derived from the same ancestral gene at some
point in the past. Thus,
all a system needs to do to know which genes line up with which is to keep track of the
historical origin of every gene in the system.
Tracking the historical origins requires very little computation. Whenever a new
gene appears (through structural mutation), a global innovation number is incremented
and assigned to that gene. The innovation numbers thus represent a chronology of the
appearance of every gene in the system.
The historical markings give NEAT a powerful new capability. The system now knows
exactly which genes match up with which as shown in figure 6.5. When crossing over, the
genes in both genomes with the same innovation numbers are lined up. These genes are
called matching genes. Genes that do not match are either disjoint or excess, depending on
whether they occur within or outside the range of the other parent's innovation numbers.
They represent structure that is not present in the other genome. In composing the
offspring, genes are randomly chosen from either parent at matching genes, whereas all
excess or disjoint genes are always included from the more fit parent. This way, historical
markings allow NEAT to perform crossover using linear genomes without the need for
expensive topological analysis.
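A minimal sketch of how connection genes and a global innovation counter could be represented is given below; the class layout and helper names are illustrative assumptions, not the project's actual data structures.

import itertools
from dataclasses import dataclass

# Global innovation counter: incremented whenever a new structural gene appears.
_innovation_counter = itertools.count(1)
_innovation_history = {}

@dataclass
class ConnectionGene:
    in_node: int
    out_node: int
    weight: float
    enabled: bool
    innovation: int

def new_connection(in_node, out_node, weight):
    # Assign a fresh innovation number the first time this connection appears;
    # reuse the existing number if the same structural change appears again.
    key = (in_node, out_node)
    if key not in _innovation_history:
        _innovation_history[key] = next(_innovation_counter)
    return ConnectionGene(in_node, out_node, weight, True, _innovation_history[key])

g1 = new_connection(1, 4, 0.5)
g2 = new_connection(2, 4, -0.3)
print(g1.innovation, g2.innovation)  # 1 2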
6.3.3 Protecting Innovation through Speciation
Speciating the population allows organisms to compete primarily within their own
niches instead of with the population at large. This way, topological innovations are
protected in a new niche where they have time to optimize their structure through com-
petition within the niche. The idea is to divide the population into species such that
similar topologies are in the same species.
The number of excess and disjoint genes between a pair of genomes is a natural
measure of their compatibility distance. The more disjoint two genomes are, the less
evolutionary history they share, and thus the less compatible they are. Therefore, the
compatibility distance of different structures in NEAT can be measured as a simple
linear combination of the number of excess E and disjoint D genes, as well as the average
weight difference of matching genes W̄, including disabled genes:

\[
\delta = \frac{c_1 E}{N} + \frac{c_2 D}{N} + c_3 \, \bar{W} \qquad (1)
\]
The coefficients c1, c2, and c3 allow us to adjust the importance of the three factors,
and the factor N , the number of genes in the larger genome, normalizes for genome size
(N can be set to 1 if both genomes are small, i.e., consist of fewer than 20 genes).
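A direct translation of equation (1) into code might look as follows. The genome representation (a dict mapping innovation number to connection weight) is an illustrative assumption; the coefficient defaults follow Table 6.2.

def compatibility_distance(genome_a, genome_b, c1=1.0, c2=2.0, c3=0.4):
    # Genomes are dicts mapping innovation number -> connection weight.
    innovations_a, innovations_b = set(genome_a), set(genome_b)
    matching = innovations_a & innovations_b

    # Non-matching genes outside the range of the other genome's innovation
    # numbers are excess; non-matching genes inside that range are disjoint.
    cutoff = min(max(innovations_a), max(innovations_b))
    non_matching = innovations_a ^ innovations_b
    excess = sum(1 for i in non_matching if i > cutoff)
    disjoint = len(non_matching) - excess

    # Average weight difference of matching genes.
    w_bar = (sum(abs(genome_a[i] - genome_b[i]) for i in matching) / len(matching)
             if matching else 0.0)

    n = max(len(genome_a), len(genome_b))
    n = 1 if n < 20 else n   # N can be set to 1 for small genomes
    return c1 * excess / n + c2 * disjoint / n + c3 * w_bar

a = {1: 0.5, 2: -0.3, 4: 0.8}
b = {1: 0.6, 3: 0.1, 5: -0.2, 6: 0.4}
print(compatibility_distance(a, b))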
6.3.4 Minimizing Dimensionality through Incremental Growth from Minimal Structure
NEAT biases the search towards minimal-dimensional spaces by starting out with
a uniform population of networks with zero hidden nodes (i.e., all inputs connect directly
to outputs). New structure is introduced incrementally as structural mutations occur,
and only those structures survive that are found to be useful through fitness evaluations.
In other words, the structural elaborations that occur in NEAT are always justified. Since
the population starts minimally, the dimensionality of the search space is minimized, and
NEAT is always searching through fewer dimensions than other TWEANNs and fixed-
topology NE systems. Minimizing dimensionality gives NEAT a performance advantage
compared to other approaches.
6.3.5 NEAT Algorithm
With the components described in the previous sections, the algorithm to evolve
Neural Networks is described here. A population of random ANNs is created. Then,
the population is divided into species and the fitness is evaluated. Based on this fitness,
the parents to produce the new population are selected. This continues until the task is
solved. The algorithm is described in detail in Algorithm 6.1.
6.3.6 Parameter Settings
The values described in Table 6.1 are the mutation rates set for the NEAT
algorithm. These values can be modified according to the problem domain.
Table 6.1: Mutation Rates for NEAT

Name                     Value
Perturb Weight           0.8
Perturb Bias             0.25
Biased Perturb Weight    0.9
Add New Node             0.03
Add Connection           0.05
Crossover                0.75
Enable Connection        0.2
Disable Connection       0.4
The values described in Table 6.2 are the parameters set for speciation. These can
be modified according to the problem domain.
Table 6.2: Parameters for speciation for NEAT

Name    Value
c1      1
c2      2
c3      0.4
3
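For reference, the values from Tables 6.1 and 6.2 could be collected into a single configuration dictionary such as the one below; the key names are illustrative and do not correspond to any particular NEAT library.

NEAT_CONFIG = {
    # Mutation rates (Table 6.1)
    "perturb_weight": 0.80,
    "perturb_bias": 0.25,
    "biased_perturb_weight": 0.90,
    "add_new_node": 0.03,
    "add_connection": 0.05,
    "crossover": 0.75,
    "enable_connection": 0.20,
    "disable_connection": 0.40,
    # Speciation coefficients (Table 6.2)
    "c1": 1.0,
    "c2": 2.0,
    "c3": 0.4,
}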
CHAPTER 7
7.1 Introduction
Testing is intended to show that a system conforms to its specification. Large systems
are built out of subsystems, which are built out of modules, which are composed
of procedures and functions. The testing process should therefore proceed in stages, where
testing is carried out incrementally in conjunction with system implementation. The testing
of the algorithms was done along with the implementation of the various modules.
This method of testing helps to ensure the proper working of the modules at the time of
their implementation. The existence of program defects or inadequacies is inferred from
unexpected system outputs. For verification and validation, program testing techniques
are used.
Solving XOR is a simple yet difficult problem, as the outputs of the problem
are not linearly separable. So, to solve this, hidden layers are required. In this
section, the standard backpropagation algorithm is compared against the evolutionary
approaches (CNE and NEAT) on the XOR task.
The results for solving XOR over 50 runs are described in Table 7.2:
Table 7.2: Results for solving XOR

Algorithm         Minimum Epochs   Maximum Epochs   Average Epochs
CNE               1                199              34.5
NEAT              3                183              66.36
Backpropagation   205              7276             748
NEAT starts with no hidden layers and so the initial structures would not be able
to solve XOR. Because XOR is not linearly separable, a neural network requires hidden
units to solve it. The two inputs must be combined at some hidden unit, as opposed to
only at the output node, because there is no function over a linear combination of the
inputs that can separate the inputs into the proper classes. These structural requirements
make XOR suitable for testing NEAT's ability to evolve structure. Figure 7.4 shows the
smallest ANN that NEAT evolved to solve the XOR problem.
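A typical fitness function for the XOR task sums how close a candidate network's outputs are to the four target values. The sketch below assumes a callable net(inputs) returning a single output in [0, 1]; it illustrates the idea rather than reproducing the exact fitness used in the report.

XOR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def xor_fitness(net):
    # Higher is better: 4 minus the total squared error over the four cases.
    error = sum((net(inputs) - target) ** 2 for inputs, target in XOR_CASES)
    return 4.0 - error

# Example with a (useless) constant network that always outputs 0.5.
print(xor_fitness(lambda inputs: 0.5))  # 3.0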
The environment for the cartpole balancing task was adopted from OpenAI Gym. The
environment provides 4 observations:
1. Cart Position
2. Cart Velocity
3. Pole Angle
4. Pole Velocity at Tip
Based on these observations, the agent can do 2 things to keep the pole from falling
1. Move left
2. Move right
NEAT is allowed to evolve a network with 4 input nodes, 1 bias node and 1 output node.
If the value in the output node is less than 0.5, the agent decides to move left; otherwise
it moves right. NEAT was easily able to solve the environment in less than 20 seconds (in
the worst case). The environment and the ANN which solves the environment are shown
in figures 7.5 and 7.6 respectively.
1. Player y position.
2. Player velocity.
3. Cpu y position.
4. Ball x position.
5. Ball y position.
6. Ball x velocity.
7. Ball y velocity.
Based on these observations, the agent can do one of the below 3 things to make sure
the ball is returned.
1. Go Up
2. No operation
3. Go Down
NEAT was easily able to learn to play the game in 5 minutes (in the worst case). Figure 7.7
shows NEAT playing against the preprogrammed player.
In the game, the player controls a bird, Faby, which flaps upward each time the player
taps the screen; if the screen is not tapped, Faby falls because of gravity. Each pair of
pipes that he navigates between earns the player
a single point, with medals awarded for the score at the end of the game. No medal
is awarded to scores less than ten. A bronze medal is given to scores between ten and
twenty. In order to receive the silver medal, the player must reach 20 points. The gold
medal is given to those who score higher than thirty points. Players who achieve a score
of forty or higher receive a platinum medal. Android devices enabled the access of world
leaderboards, through Google Play.
The environment for Flappy Bird[11] has been adopted from Pygame Learning Envi-
ronment. The environment returns the following observations as state of the game.
1. Player y position.
2. Player velocity.
1. Go up
2. No operation (so as to go down)
The ANN has 8 input nodes, 1 bias node and 2 output nodes. The output nodes
determine the probability of performing an action (up or no operation). The agent
selects the decision which has the higher probability. NEAT was able to learn to play
this game in about 3 hours of evolution, which reflects the difficulty of the game. NEAT playing
the Flappy Bird game and the ANN that was evolved to play the game are shown in
figures 7.8 and 7.9 respectively.
The system was left to learn to play for about 24 hours on a Google Cloud Compute Engine
machine with 4 vCPUs and 15 GB of memory. A VNC server was set up to create a desktop,
which was accessed by a VNC client, and the learning was started. Figure 7.11 is an image
of the algorithm playing the game.
With this, it is shown that our algorithm was able to learn to play a variety of games on
its own without any help. This shows that machines can indeed learn by themselves
to do a particular task rather than being explicitly programmed.
CHAPTER 8
Appendix A
THE TEAM
Sanjana G S, Meghana S B, Guru R (Guide), Basanth Jenu H B
Appendix B
CO1: Formulate the problem definition, conduct literature review and apply re-
quirements analysis.
CO2: Develop and implement algorithms for solving the problem formulated.
CO3: Comprehend, present and defend the results of exhaustive testing and explain
the major findings.
Program Outcomes:
PO6: Apply computing knowledge to assess societal, health, safety, legal and cul-
tural issues and the consequent responsibilities relevant to professional engineering
practice.
PO7: Analyze the local and global impact of computing on individuals and orga-
nizations for sustainable development.
PO8: Adopt ethical principles and uphold the responsibilities and norms of com-
puter engineering practice.
PO12: Recognize contemporary issues and adapt to technological changes for life-
long learning.
PSO1: Problem Solving Skills: Ability to apply standard practices and mathe-
matical methodologies to solve computational tasks, model real world problems in
the areas of database systems, system software, web technologies and Networking
solutions with an appropriate knowledge of Data structures and Algorithms.
Note:
Justification:
The trending advances in the field of Artificial Intelligence motivated us to choose the
topic and develop the model (PO 1, 2). With the help of our knowledge of computing,
we could come up with a solution to the existing problem. With the help of research
papers, we analysed the different research techniques used earlier (PO 3, 4). New ideas
and design patterns made our code simple and efficient (PO 5). Analyzing the problem
led us to create groundwork for future research and application in real time (PO 4, 6, 7).
The task was divided into multiple modules and each member worked individually and
collaboratively for the success of the project (PO 9, 10). The developed library can be
used for a variety of tasks and has scope for future research (PO 11, 12).
We were able to apply the knowledge from Data Structures, Algorithms and Machine
Learning to implement and improve the model (PSO 1). Knowledge of computer systems
helped us to integrate different components and environments (PSO 2). This project is
a topic under research, and studies in this field add value to the resume for work and
universities (PSO 3, 4).
References
[1] Brockman, Greg, and Vicki Cheung. OpenAI Gym. https://gym.openai.com/envs/CartPole-v1. N.p., 2016. Web. 25 Apr. 2017.
[2] Browne, Cameron B. et al. A Survey Of Monte Carlo Tree Search Methods.
IEEE Transactions on Computational Intelligence and AI in Games 4.1 (2012):
1-43. Web.
[3] Christopher Amato and Guy Shani. High-level Reinforcement Learning in Strategy
Games, Proc. of 9th Int. Conf. on Autonomous Agents and Multiagent Systems
(AAMAS 2010), van der Hoek, Kaminka, Lespérance, Luck and Sen (eds.), May
10-14, 2010, Toronto, Canada.
[4] David J. Montana and Lawrence Davis. Training Feedforward Neural Networks Us-
ing Genetic Algorithms ,BBN Systems and Technologies Corp. 10 Moulton St.
Cambridge, MA 02138
[5] Kirci, Mesut, Nathan Sturtevant, and Jonathan Schaeffer. A GGP Feature Learn-
ing Algorithm. KI - Künstliche Intelligenz 25.1 (2011): 35-42. Web.
[6] Mitchell, Tom M. Machine Learning. 1st ed. Johanneshov: MTM, 2015. Print.
[7] Mnih, Volodymyr et al. Human-Level Control Through Deep Reinforcement Learn-
ing. Nature 518.7540 (2015): 529-533. Web.
[9] Shiffman, Daniel, Shannon Fry, and Zannah Marsh. The Nature Of Code. 1st ed.
2012.
[10] Stanley, Kenneth O., and Risto Miikkulainen. Evolving Neural Networks Through
Augmenting Topologies. Evolutionary Computation 10.2 (2002): 99-127. Web.
[13] Świechowski, Maciej et al. Recent Advances In General Game Playing. The
Scientific World Journal 2015 (2015): 1-22. Web.