
6 Genetic Algorithms

Genetic Algorithms (GAs) are computer simulations to evolve a population of chromosomes that contain at least some very t individuals. Fitness is specied by a tness function that rates each individual in the population. Setting up a GA simulation is fairly easy: we need to represent (or encode) the state of a system in a chromosome that is usually implemented as a set of bits. GA is basically a search operation: searching for a good solution to a problem where the solution is a very t chromosome. The programming technique of using GA is useful for AI systems that must adapt to changing conditions because re-programming can be as simple as dening a new tness function and re-running the simulation. An advantage of GA is that the search process will not often get stuck in local minimum because the genetic crossover process produces radically different chromosomes in new generations while occasional mutations (ipping a random bit in a chromosome) cause small changes. Another aspect of GA is supporting the evolutionary concept of survival of the ttest: by using the tness function we will preferentially breed chromosomes with higher tness values. It is interesting to compare how GAs are trained with how we train neural networks (Chapter 7). We need to manually supervise the training process: for GAs we need to supply a tness function and for the two neural network models used in Chapter 7 we need to supply training data with desired sample outputs for sample inputs.

6.1 Theory
GAs are typically used to search very large and possibly very high dimensional search spaces. If we want to find a solution as a single point in an N dimensional space where a fitness function has a near maximum value, then we have N parameters to encode in each chromosome. In this chapter we will be solving a simple problem that is one-dimensional, so we only need to encode a single number (a floating point number for this example) in each chromosome. Using a GA toolkit, like the one developed in Section 6.2, requires two problem-specific customizations:

- Characterize the search space by a set of parameters that can be encoded in a chromosome (more on this later). GAs work with the coding of a parameter set, not the parameters themselves (Genetic Algorithms in Search, Optimization, and Machine Learning, David Goldberg, 1989).
- Provide a numeric fitness function that allows us to rate the fitness of each chromosome in a population. We will use these fitness values to determine which chromosomes in the population are most likely to survive and reproduce using genetic crossover and mutation operations.

The GA toolkit developed in this chapter treats genes as single bits; while you can consider a gene to be an arbitrary data structure, the approach of using single bit genes and specifying the number of genes (or bits) in a chromosome is very flexible. A population is a set of chromosomes. A generation is defined as one reproductive cycle of replacing some elements of the chromosome population with new chromosomes produced by using a genetic crossover operation followed by optionally mutating a few chromosomes in the population.

We will describe a simple example problem in this section, write a general purpose library in Section 6.2, and finish the chapter in Section 6.3 by solving the problem posed in this section. For a sample problem, suppose that we want to find the maximum value of the function F with one independent variable x, seen in Equation 6.1 and plotted in Figure 6.1:

F(x) = sin(x) * sin(0.4 * x) * sin(3 * x)    (6.1)

Figure 6.1: The test function evaluated over the interval [0.0, 10.0]. The maximum value of 0.56 occurs at x=3.8.
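A quick sanity check of Equation 6.1 and the maximum shown in Figure 6.1 (a throwaway snippet, not part of the example library):

public class CheckEquation61 {
    public static void main(String[] args) {
        double x = 3.8;
        double f = Math.sin(x) * Math.sin(0.4 * x) * Math.sin(3.0 * x);
        System.out.println(f);  // prints roughly 0.562, matching Figure 6.1
    }
}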

The problem that we want to solve is finding a good value of x that yields a value of F(x) close to its maximum. To be clear: we encode a floating point number as a chromosome made up of a specific number of bits, so any chromosome with randomly set bits will represent some random number in the interval [0, 10]. The fitness function is simply the function in Equation 6.1.

Figure 6.2: Crossover operation.

Figure 6.2 shows an example of a crossover operation. A random chromosome bit index is chosen, and the two chromosomes are cut at this index and swap their cut parts. The two original chromosomes in generation n are shown on the left of the figure, and after the crossover operation they produce two new chromosomes in generation n+1, shown on the right of the figure.

In addition to using crossover operations to create new chromosomes from existing chromosomes, we will also use genetic mutation: randomly flipping bits in chromosomes. A fitness function that rates the fitness value of each chromosome allows us to decide which chromosomes to discard and which to use for the next generation: we will use the most fit chromosomes in the population for producing the next generation using crossover and mutation. We will implement a general purpose Java GA library in the next section and then solve the example problem posed in this section in Section 6.3.
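To make the crossover operation concrete, here is a minimal standalone sketch of single-point crossover on bit-string chromosomes (illustrative code only, not part of the Genetic.java library of Section 6.2, which performs the same swap in place on a shared population):

import java.util.Random;

public class CrossoverDemo {
    public static void main(String[] args) {
        Random rng = new Random();
        boolean[] p1 = new boolean[10];
        boolean[] p2 = new boolean[10];            // parent 2: all zero-bits
        java.util.Arrays.fill(p1, true);           // parent 1: all one-bits
        int locus = 1 + rng.nextInt(p1.length - 1); // cut strictly inside
        boolean[] c1 = new boolean[p1.length];
        boolean[] c2 = new boolean[p1.length];
        for (int g = 0; g < p1.length; g++) {
            // genes before the cut come from one parent,
            // genes at or after the cut come from the other:
            c1[g] = (g < locus) ? p1[g] : p2[g];
            c2[g] = (g < locus) ? p2[g] : p1[g];
        }
        System.out.println("cut at " + locus + ": " +
                           java.util.Arrays.toString(c1));
    }
}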

6.2 Java Library for Genetic Algorithms


The full implementation of the GA library is in the Java source file Genetic.java. The following code snippets show the method signatures defining the public API for the library. Note that there are two constructors: the first uses default values for the fraction of chromosomes on which to perform crossover and mutation operations, and the second allows setting explicit values for these parameters:

abstract public class Genetic {
  public Genetic(int num_genes_per_chromosome,
                 int num_chromosomes)
  public Genetic(int num_genes_per_chromosome,
                 int num_chromosomes,
                 float crossover_fraction,
                 float mutation_fraction)

The method sort is used to sort the population of chromosomes in most-fit-first order. The methods getGene and setGene are used to fetch and change the value of any gene (bit) in any chromosome. These methods are protected, but you will probably not need to override them in derived classes.

  protected void sort()
  protected boolean getGene(int chromosome, int gene)
  protected void setGene(int chromosome, int gene,
                         int value)
  protected void setGene(int chromosome, int gene,
                         boolean value)

The methods evolve, doCrossovers, doMutations, and doRemoveDuplicates are utilities for running GA simulations. These methods are also protected, and you will probably not need to override them in derived classes.

  protected void evolve()
  protected void doCrossovers()
  protected void doMutations()
  protected void doRemoveDuplicates()

When you subclass class Genetic you must implement the following abstract method calcFitness that will determine the evolution of chromosomes during the GA simulation.

  // Implement the following method in sub-classes:
  abstract public void calcFitness();
}

The class Chromosome represents a bit set with a specified number of bits and a floating point fitness value.


class Chromosome {
  private Chromosome()
  public Chromosome(int num_genes)
  public boolean getBit(int index)
  public void setBit(int index, boolean value)
  public float getFitness()
  public void setFitness(float value)
  public boolean equals(Chromosome c)
}

The class ChromosomeComparator implements a Comparator interface and is application specific: it is used to sort a population in best-first order:

class ChromosomeComparator
      implements Comparator<Chromosome> {
  public int compare(Chromosome o1, Chromosome o2)
}

The class ChromosomeComparator is used with the Java Collections class static sort method. The class Genetic is an abstract class: you must subclass it and implement the method calcFitness, which uses an application-specific fitness function (that you must supply) to set a fitness value for each chromosome. This GA library provides the following behavior:

- Generates an initial random population with a specified number of bits (or genes) per chromosome and a specified number of chromosomes in the population
- Evaluates each chromosome based on a numeric fitness function
- Creates new chromosomes from the most fit chromosomes in the population using the genetic crossover and mutation operations

The two class constructors for Genetic set up a new GA experiment by setting the number of genes (or bits) per chromosome and the number of chromosomes in the population. The Genetic class constructors also build an array of integers rouletteWheel, which is used to weight the most fit chromosomes in the population when choosing the parents of crossover and mutation operations. When a chromosome is being chosen, a random integer is selected to be used as an index into the rouletteWheel array; the values in the array are all integer indices into the chromosome array. More fit chromosomes are heavily weighted in favor of being chosen as parents for the crossover operations. The algorithm for the crossover operation is fairly simple; here is the implementation:

public void doCrossovers() {
  int num = (int)(numChromosomes * crossoverFraction);
  for (int i = num - 1; i >= 0; i--) {
    // Don't overwrite the "best" chromosome
    // from the current generation:
    int c1 = 1 + (int) ((rouletteWheelSize - 1) *
                        Math.random() * 0.9999f);
    int c2 = 1 + (int) ((rouletteWheelSize - 1) *
                        Math.random() * 0.9999f);
    c1 = rouletteWheel[c1];
    c2 = rouletteWheel[c2];
    if (c1 != c2) {
      int locus = 1 + (int)((numGenesPerChromosome - 2) *
                            Math.random());
      for (int g = 0; g < numGenesPerChromosome; g++) {
        if (g < locus) {
          setGene(i, g, getGene(c1, g));
        } else {
          setGene(i, g, getGene(c2, g));
        }
      }
    }
  }
}

The method doMutations is similar to doCrossovers: we randomly choose chromosomes from the population, and for these selected chromosomes we randomly flip the value of one gene (a gene is a bit in our implementation):

public void doMutations() {
  int num = (int)(numChromosomes * mutationFraction);
  for (int i = 0; i < num; i++) {
    // Don't overwrite the "best" chromosome
    // from the current generation:
    int c = 1 + (int) ((numChromosomes - 1) *
                       Math.random() * 0.99);
    int g = (int) (numGenesPerChromosome *
                   Math.random() * 0.99);
    setGene(c, g, !getGene(c, g));
  }
}

We developed a general purpose library in this section for simulating populations of chromosomes that can evolve to a more fit population given a fitness function that ranks individual chromosomes in order of fitness. In Section 6.3 we will develop an example GA application by defining the size of a population and using the fitness function defined by Equation 6.1.
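The construction of the rouletteWheel array itself is not listed in this chapter. The following standalone sketch shows one plausible rank-based scheme (an assumption for illustration: the real Genetic.java constructor may size and fill the wheel differently, so the slot count printed by the example run in Section 6.3 need not match this scheme):

// Hypothetical illustration of a rank-based roulette wheel.
// Assumes the population is sorted most fit first, so that
// chromosome index 0 receives the most slots.
public class RouletteSketch {
    public static void main(String[] args) {
        int numChromosomes = 10;
        // chromosome i (0 = most fit) gets (numChromosomes - i) slots:
        int size = numChromosomes * (numChromosomes + 1) / 2;
        int[] wheel = new int[size];
        int slot = 0;
        for (int i = 0; i < numChromosomes; i++)
            for (int j = 0; j < numChromosomes - i; j++)
                wheel[slot++] = i;
        // drawing a random slot now returns fit indices more often:
        int pick = wheel[(int) (size * Math.random() * 0.9999)];
        System.out.println("slots=" + size +
                           " picked chromosome " + pick);
    }
}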

6.3 Finding the Maximum Value of a Function


We will use the Java library from the last section to develop an example application that finds the maximum of the function shown in Figure 6.1, which plots Equation 6.1 over the interval [0, 10]. While we could find the maximum value of this function by using Newton's method (or even a simple brute force search over the range of the independent variable x), the GA method scales very well to similar problems of higher dimensionality. The GA also helps us avoid settling for merely locally optimum solutions. In this example we are working in one dimension, so we only need to encode a single variable in a chromosome. As an example of a higher dimensional system, we might have products of sine waves using 20 independent variables x1, x2, ..., x20. Still, the one-dimensional case seen in Figure 6.1 is a good example for showing you how to set up a GA simulation.

Our first task is to characterize the search space as one or more parameters. In general, when we write GA applications we might need to encode several parameters in a single chromosome. For example, if a fitness function has three arguments we would encode three numbers in a single chromosome. In this example problem we have only one parameter, the independent variable x. We will encode the parameter x using ten bits (so we have ten 1-bit genes per chromosome). A good starting place is writing a utility method for converting the 10-bit representation to a floating-point number in the range [0.0, 10.0]:

float geneToFloat(int chromosomeIndex) {
  int base = 1;
  float x = 0;
  for (int j=0; j<numGenesPerChromosome; j++) {
    if (getGene(chromosomeIndex, j)) {
      x += base;
    }
    base *= 2;
  }

After summing up all on bits times their base-2 values, we need to normalize what is an integer in the range [0, 1023] to a floating point number in the approximate range [0, 10]:

  x /= 102.4f;
  return x;
}

Note that we do not need the reverse method! We use the GA library from Section 6.2 to create a population of 10-bit chromosomes; in order to evaluate the fitness of each chromosome in a population, we only have to convert the 10-bit representation to a floating-point number for evaluation using the following fitness function (Equation 6.1):

private float fitness(float x) {
  return (float)(Math.sin(x) *
                 Math.sin(0.4f * x) *
                 Math.sin(3.0f * x));
}

Table 6.1 shows some sample random chromosomes and the floating point numbers that they encode. The first column shows the gene indices where the bit is on, the second column shows the chromosome as an integer represented in binary notation, and the third column shows the floating point number that the chromosome encodes; for example, in the first row the on bits contribute 4 + 32 + 128 + 256 + 512 = 932, and 932 / 102.4 = 9.1015625. The center column in Table 6.1 shows the bits in order where index 0 is the left-most bit and index 9 is the right-most bit; this is the reverse of the normal order for encoding integers, but the GA does not care: it works with any encoding we use. Once again, GAs work with encodings.

On bits in chromosome    As binary      Number encoded
2, 5, 7, 8, 9            0010010111     9.1015625
0, 1, 3, 5, 6            1101011000     1.0449219
0, 3, 5, 6, 7, 8         1001011110     4.7753906

Table 6.1: Random chromosomes and the floating point numbers that they encode

Using the methods geneToFloat and fitness we now implement the abstract method calcFitness from our GA library class Genetic, so that the derived class TestGenetic is not abstract. This method has the responsibility for calculating and setting the fitness value for every chromosome stored in an instance of class Genetic:

public void calcFitness() {
  for (int i=0; i<numChromosomes; i++) {
    float x = geneToFloat(i);
    chromosomes.get(i).setFitness(fitness(x));
  }
}

While it was useful to make this example clearer with a separate geneToFloat method, it would have also been reasonable to simply place the formula in the method fitness in the implementation of the abstract (in the base class) method calcFitness. In any case we are done with coding this example: you can compile the two example Java files Genetic.java and TestGenetic.java, and run the TestGenetic class to verify that the example program quickly finds a near maximum value for this function. You can try setting different numbers of chromosomes in the population, and try setting a non-default crossover rate of 0.85 and a mutation rate of 0.3. We will look at a run with a small number of chromosomes in the population, created with:

genetic_experiment = new MyGenetic(10, 20, 0.85f, 0.3f);
int NUM_CYCLES = 500;
for (int i=0; i<NUM_CYCLES; i++) {
  genetic_experiment.evolve();
  if ((i%(NUM_CYCLES/5))==0 || i==(NUM_CYCLES-1)) {
    System.out.println("Generation " + i);
    genetic_experiment.print();
  }
}

In this experiment 85% of chromosomes will be sliced and diced with a crossover operation and 30% will have one of their genes changed. We specified 10 bits per chromosome and a population size of 20 chromosomes. In this example, I have run 500 evolutionary cycles. After you determine a fitness function to use, you will probably need to experiment with the size of the population and the crossover and mutation rates. Since the simulation uses random numbers (and is thus nondeterministic), you can get different results by simply rerunning the simulation. Here is example program output (with much of the output removed for brevity):

count of slots in roulette wheel=55
Generation 0
Fitness for chromosome 0 is 0.505, occurs at x=7.960
Fitness for chromosome 1 is 0.461, occurs at x=3.945
Fitness for chromosome 2 is 0.374, occurs at x=7.211
Fitness for chromosome 3 is 0.304, occurs at x=3.929
Fitness for chromosome 4 is 0.231, occurs at x=5.375
...
Fitness for chromosome 18 is -0.282, occurs at x=1.265
Fitness for chromosome 19 is -0.495, occurs at x=5.281
Average fitness=0.090 and best fitness for this generation:0.505
...
Generation 499
Fitness for chromosome 0 is 0.561, occurs at x=3.812
Fitness for chromosome 1 is 0.559, occurs at x=3.703
...

This example is simple but is intended to show you how to encode parameters for a problem where you want to search for values that maximize a fitness function that you specify. Using the library developed in this chapter you should be able to set up and run a GA simulation for your own applications.
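For reference, here is how the pieces of this example fit together as a single subclass (a sketch assembled from the snippets above; the actual TestGenetic.java in the book's code bundle may differ in details such as the print method and where the class is nested):

// Sketch of a concrete Genetic subclass for maximizing Equation 6.1.
// Field names (numGenesPerChromosome, numChromosomes, chromosomes)
// follow the earlier snippets and are assumptions about the library.
class MyGenetic extends Genetic {
  public MyGenetic(int num_genes, int num_chromosomes,
                   float crossover_fraction,
                   float mutation_fraction) {
    super(num_genes, num_chromosomes,
          crossover_fraction, mutation_fraction);
  }
  float geneToFloat(int chromosomeIndex) {
    int base = 1;
    float x = 0;
    for (int j = 0; j < numGenesPerChromosome; j++) {
      if (getGene(chromosomeIndex, j)) x += base;
      base *= 2;
    }
    return x / 102.4f;  // normalize [0, 1023] to [0, 10]
  }
  private float fitness(float x) {
    return (float) (Math.sin(x) * Math.sin(0.4f * x) *
                    Math.sin(3.0f * x));
  }
  public void calcFitness() {
    for (int i = 0; i < numChromosomes; i++) {
      chromosomes.get(i).setFitness(fitness(geneToFloat(i)));
    }
  }
}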


7 Neural Networks
Neural networks can be used to efficiently solve many problems that are intractable or difficult to solve using other AI programming techniques. I spent almost two years on a DARPA neural network tools advisory panel, wrote the first version of the ANSim neural network product, and have used neural networks for a wide range of application problems (radar interpretation, bomb detection, and as controllers in computer games). Mastering the use of simulated neural networks will allow you to solve many types of problems that are very difficult to solve using other methods.

Although most of this book is intended to provide practical advice (with some theoretical background) on using AI programming techniques, I cannot imagine being interested in practical AI programming without also wanting to think about the philosophy and mechanics of how the human mind works. I hope that my readers share this interest.

In this book, we have examined techniques for focused problem solving, concentrating on performing one task at a time. However, the physical structure and dynamics of the human brain are inherently parallel and distributed [Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Rumelhart, McClelland, etc. 1986]. We are experts at doing many things at once. For example, I can simultaneously walk, talk with my wife, keep our puppy out of cactus, and enjoy the scenery behind our house in Sedona, Arizona. AI software systems struggle to perform even narrowly defined tasks well, so how is it that we are able to simultaneously perform several complex tasks? There is no clear or certain answer to this question at this time, but certainly the distributed neural architecture of our brains is a requirement for our abilities. Unfortunately, artificial neural network simulations do not currently address multi-tasking (other techniques that do address this issue are multi-agent systems with some form of mediation between agents).

Also interesting is the distinction between instinctual behavior and learned behavior. Our knowledge of GAs from Chapter 6 provides a clue to how the brains of especially lower order animals can be hardwired to provide efficient instinctual behavior under the pressures of evolutionary forces (i.e., likely survival of more fit individuals). This works by using genetic algorithms to design specific neural wiring. I have used genetic algorithms to evolve recurrent neural networks for control applications. This work had only partial success, but it did convince me that biological genetic pressure is probably adequate to pre-wire some forms of behavior in natural (biological) neural networks.



While we will study supervised learning techniques in this chapter, it is possible to evolve both the structure and the attributes of neural networks using other types of neural network models, like Adaptive Resonance Theory (ART), which autonomously learn to classify learning examples without intervention.

Figure 7.1: Physical structure of a neuron.

We will start this chapter by discussing human neuron cells and which features of real neurons we will model. Unfortunately, we do not yet understand all of the biochemical processes that occur in neurons, but there are fairly accurate models available (web search "neuron biochemical"). Neurons are surrounded by thin hair-like structures called dendrites, which serve to accept activation from other neurons. Neurons sum up activation from their dendrites, and each neuron has a threshold value; if the activation summed over all incoming dendrites exceeds this threshold, then the neuron fires, spreading its activation to other neurons. Dendrites are very localized around a neuron. Output from a neuron is carried by an axon, which is thicker than dendrites and potentially much longer than dendrites in order to affect remote neurons. Figure 7.1 shows the physical structure of a neuron; in general, the neuron's axon would be much longer than is seen in Figure 7.1. The axon terminal buttons transfer activation to the dendrites of neurons that are close to the individual button. An individual neuron is connected to up to ten thousand other neurons in this way. The activation absorbed through dendrites is summed together, but the firing of a neuron only occurs when a threshold is passed.
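The threshold behavior just described is simple to express in code. Here is a tiny illustrative sketch (a simplification for intuition only; the networks later in this chapter use smooth Sigmoid activations rather than a hard threshold):

public class ThresholdNeuronDemo {
    // fires (returns 1) when summed weighted input exceeds the threshold
    static int fire(float[] inputs, float[] weights, float threshold) {
        float sum = 0.0f;
        for (int i = 0; i < inputs.length; i++)
            sum += inputs[i] * weights[i];  // activation from each "dendrite"
        return (sum > threshold) ? 1 : 0;
    }
    public static void main(String[] args) {
        float[] in = {1.0f, 0.0f, 1.0f};
        float[] w  = {0.4f, 0.9f, 0.3f};
        System.out.println(fire(in, w, 0.5f));  // prints 1 (0.7 > 0.5)
    }
}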

7.1 Hopfield Neural Networks


Hopfield neural networks implement associative (or content addressable) memory. A Hopfield network is trained using a set of patterns. After training, the network can be shown a pattern similar to one of the training inputs and it will hopefully associate the noisy pattern with the correct input pattern. Hopfield networks are very different from back propagation networks (covered later in Section 7.4) because the training data contains only input examples, unlike back propagation networks, which are trained to associate desired output patterns with input patterns. Internally, the operation of Hopfield neural networks is also very different from that of back propagation networks. We use Hopfield neural networks to introduce the subject of neural nets because they are very easy to simulate with a program and they can also be very useful in practical applications.

The inputs to Hopfield networks can be of any dimensionality. Hopfield networks are often shown as having a two-dimensional input field and are demonstrated recognizing characters, pictures of faces, etc. However, we will lose no generality by implementing a Hopfield neural network toolkit with one-dimensional inputs, because a two-dimensional image can be represented by an equivalent one-dimensional array.

How do Hopfield networks work? A simple analogy will help. The trained connection weights in a neural network represent a high dimensional space. This space is folded and convoluted, with local minima representing areas around training input patterns. For a moment, visualize this very high dimensional space as just being the three dimensional space inside a room. The floor of this room is a convoluted and curved surface. If you pick up a basketball and bounce it around the room, it will settle at a low point in this curved and convoluted floor. Now, consider that the space of input values is a two-dimensional grid a foot above the floor. Any new input is equivalent to a point defined in horizontal coordinates; if we drop our basketball from a position above an input grid point, the basketball will tend to roll downhill into a local gravitational minimum. The shape of the curved and convoluted floor is a calculated function of a set of training input vectors. After the floor has been trained with a set of input vectors, the operation of dropping the basketball from an input grid point is equivalent to mapping a new input into the training example that is closest to this new input, using a neural network.

A common technique in training and using neural networks is to add noise to training data and weights. In the basketball analogy, this is equivalent to shaking the room so that the basketball finds a good minimum to settle into, rather than a non-optimal local minimum. We use this technique later when implementing back propagation networks. The weights of back propagation networks are also best visualized as defining a very high dimensional space with a manifold that is very convoluted near areas of local minima. These local minima are centered near the coordinates defined by each input vector.

7.2 Java Classes for Hopfield Neural Networks


The Hopfield neural network model is defined in the Java source file Hopfield.java. Since this file only contains about 65 lines of code, we will look at the code and discuss the algorithms for storing and recalling patterns at the same time.

In a Hopfield neural network simulation, every neuron is connected to every other neuron. Consider a pair of neurons indexed by i and j. There is a weight W[i,j] between these neurons that corresponds in the code to the array element weight[i][j]. We can define the energy between the associations of these two neurons as:

energy[i][j] = weight[i][j] * activation[i] * activation[j]

In the Hopfield neural network simulator, we store activations (i.e., the input values) as floating point numbers that get clamped in value to -1 (for off) or +1 (for on). In the energy equation, we consider an activation that is not clamped to a value of one to be zero. This energy is like gravitational potential energy in a basketball court analogy: think of a basketball court with an overlaid 2D grid where different grid cells on the floor are at different heights (representing energy levels); as you throw a basketball onto the court, the ball bounces around and finally stops in a low grid cell near the place where you threw it, that is, it settles at a locally low energy level. Hopfield networks function in much the same way: when shown a pattern, the network attempts to settle in a local minimum energy point as defined by a previously seen training example. When training a network with a new input, we are looking for a low energy point near the new input vector. The total energy is the sum of the above equation over all (i, j) pairs.

The class constructor allocates storage for input values, temporary storage, and a two-dimensional array to store weights:

public Hopfield(int numInputs) {
  this.numInputs = numInputs;
  weights = new float[numInputs][numInputs];
  inputCells = new float[numInputs];
  tempStorage = new float[numInputs];
}

Remember that this model is general purpose: multi-dimensional inputs can be converted to an equivalent one-dimensional array. The method addTrainingData is used to store an input data array for later training. All input values get clamped to an off or on value by the utility method adjustInput. The utility method truncate truncates floating-point values to an integer value. The utility method deltaEnergy has one argument: an index into the input vector. The class variable tempStorage is set during training to be the sum of a row of trained weights. So, the method deltaEnergy returns a measure of the energy difference between the input vector in the current input cells and the training input examples:

private float deltaEnergy(int index) {
  float temp = 0.0f;
  for (int j=0; j<numInputs; j++) {
    temp += weights[index][j] * inputCells[j];
  }
  return 2.0f * temp - tempStorage[index];
}
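The utility methods adjustInput and truncate are described above but their bodies are not listed. Plausible one-line implementations, consistent with how they are used, would be the following (an assumption; the actual Hopfield.java may differ):

// clamp an input value to -1 (off) or +1 (on):
private float adjustInput(float x) {
  return (x < 0.0f) ? -1.0f : 1.0f;
}

// truncate a floating-point value to a whole number:
private float truncate(float x) {
  return (float) ((int) x);
}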

The method train is used to set the two-dimensional weights array and the one-dimensional tempStorage array, in which each element is the sum of the corresponding row in the two-dimensional weights array:

public void train() {
  for (int j=1; j<numInputs; j++) {
    for (int i=0; i<j; i++) {
      for (int n=0; n<trainingData.size(); n++) {
        float [] data =
            (float [])trainingData.elementAt(n);
        float temp1 =
            adjustInput(data[i]) * adjustInput(data[j]);
        float temp = truncate(temp1 + weights[j][i]);
        weights[i][j] = weights[j][i] = temp;
      }
    }
  }
  for (int i=0; i<numInputs; i++) {
    tempStorage[i] = 0.0f;
    for (int j=0; j<i; j++) {
      tempStorage[i] += weights[i][j];
    }
  }
}

Once the arrays weights and tempStorage are defined, it is simple to recall an original input pattern from a similar test pattern:

public float [] recall(float [] pattern,
                       int numIterations) {
  for (int i=0; i<numInputs; i++) {
    inputCells[i] = pattern[i];
  }
  for (int ii = 0; ii<numIterations; ii++) {
    for (int i=0; i<numInputs; i++) {
      if (deltaEnergy(i) > 0.0f) {
        inputCells[i] = 1.0f;
      } else {
        inputCells[i] = 0.0f;
      }
    }
  }
  return inputCells;
}

7.3 Testing the Hopfield Neural Network Class


The test program for the Hopfield neural network class is Test_Hopfield. This test program defines three test input patterns, each with ten values:

static float [] data [] = {
  { 1,  1,  1, -1, -1, -1, -1, -1, -1, -1},
  {-1, -1, -1,  1,  1,  1, -1, -1, -1, -1},
  {-1, -1, -1, -1, -1, -1, -1,  1,  1,  1}
};

The following code fragment shows how to create a new instance of the Hopfield class and train it to recognize these three test input patterns:

test = new Hopfield(10);
test.addTrainingData(data[0]);
test.addTrainingData(data[1]);
test.addTrainingData(data[2]);
test.train();

The static method helper is used to slightly scramble an input pattern, then test the trained Hopfield neural network to see if the original pattern is re-created:

helper(test, "pattern 0", data[0]);
helper(test, "pattern 1", data[1]);
helper(test, "pattern 2", data[2]);

The following listing shows an implementation of the method helper (the called method pp simply formats a floating point number for printing by clamping it to zero or one). This version of the code randomly flips one test bit, and we will see that the trained Hopfield network almost always correctly recognizes the original pattern. The version of method helper included in the ZIP file for this book is slightly different in that two bits are randomly flipped (we will later look at sample output with both one and two bits randomly flipped).

private static void helper(Hopfield test, String s,
                           float [] test_data) {
  float [] dd = new float[10];
  for (int i=0; i<10; i++) {
    dd[i] = test_data[i];
  }
  int index = (int)(9.0f * (float)Math.random());
  if (dd[index] < 0.0f) dd[index] = 1.0f;
  else                  dd[index] = -1.0f;
  float [] rr = test.recall(dd, 5);
  System.out.print(s+"\nOriginal data:      ");
  for (int i = 0; i < 10; i++)
    System.out.print(pp(test_data[i]) + " ");
  System.out.print("\nRandomized data:    ");
  for (int i = 0; i < 10; i++)
    System.out.print(pp(dd[i]) + " ");
  System.out.print("\nRecognized pattern: ");
  for (int i = 0; i < 10; i++)
    System.out.print(pp(rr[i]) + " ");
  System.out.println();
}

The following listing shows how to run the program and lists the example output:

java Test_Hopfield
pattern 0
Original data:      1 1 1 0 0 0 0 0 0 0
Randomized data:    1 1 1 0 0 0 1 0 0 0
Recognized pattern: 1 1 1 0 0 0 0 0 0 0
pattern 1
Original data:      0 0 0 1 1 1 0 0 0 0
Randomized data:    1 0 0 1 1 1 0 0 0 0
Recognized pattern: 0 0 0 1 1 1 0 0 0 0
pattern 2
Original data:      0 0 0 0 0 0 0 1 1 1
Randomized data:    0 0 0 1 0 0 0 1 1 1
Recognized pattern: 0 0 0 0 0 0 0 1 1 1

In this listing we see that the three sample training patterns in Test_Hopfield.java are re-created after scrambling the data by changing one randomly chosen value to its opposite value. When you run the test program several times you will see occasional errors when one random bit is flipped, and you will see errors occur more often with two bits flipped. Here is an example with two bits flipped per test: the first pattern is incorrectly reconstructed, while the second and third patterns are reconstructed correctly:

pattern 0
Original data:      1 1 1 0 0 0 0 0 0 0
Randomized data:    0 1 1 0 1 0 0 0 0 0
Recognized pattern: 1 1 1 1 1 1 1 0 0 0
pattern 1
Original data:      0 0 0 1 1 1 0 0 0 0
Randomized data:    0 0 0 1 1 1 1 0 1 0
Recognized pattern: 0 0 0 1 1 1 0 0 0 0
pattern 2
Original data:      0 0 0 0 0 0 0 1 1 1
Randomized data:    0 0 0 0 0 0 1 1 0 1
Recognized pattern: 0 0 0 0 0 0 0 1 1 1

7.4 Back Propagation Neural Networks


The next neural network model that we will use is called back propagation, also known as back-prop or delta rule learning. In this model, neurons are organized into data structures that we call layers. Figure 7.2 shows a simple neural network with two layers; this network is shown in two different views: just the neurons organized as two one-dimensional arrays, and the same two one-dimensional arrays with the connections between the neurons. In our model, a connection between two neurons is characterized by a single floating-point number that we will call the connection's weight. A weight W[i,j] connects input neuron i to output neuron j. In the back propagation model, we always assume that a neuron is connected to every neuron in the previous layer.

Figure 7.2: Two views of the same two-layer neural network; the view on the right shows the connection weights between the input and output layers as a two-dimensional array.

The key thing is to be able to train a back-prop neural network. Training is performed by calculating sets of weights for connecting each layer. As we will see, we will train networks by applying input values to the input layer, allowing these values to propagate through the network using the current weight values, and calculating the errors between desired output values and the output values produced by propagating the input values through the network. Initially, weights are set to small random values. You will get a general idea for how this is done in this section, and then we will look at Java implementation code in Section 7.5.

In Figure 7.2, we only have two neuron layers, one for the input neurons and one for the output neurons. Networks with no hidden layers are not usually useful; I am using the network in Figure 7.2 just to demonstrate layer to layer connections through a weights array. To calculate the activation of the first output neuron O1, we evaluate the sum of the products of the input neurons times the appropriate weight values; this sum is input to a Sigmoid activation function (see Figure 7.3) and the result is the new activation value for O1. Here are the formulas for the simple network in Figure 7.2:

O1 = Sigmoid(I1 * W[1,1] + I2 * W[2,1])
O2 = Sigmoid(I1 * W[1,2] + I2 * W[2,2])

Figure 7.3 shows a plot of the Sigmoid function and the derivative of the Sigmoid function (SigmoidP). We will use the derivative of the Sigmoid function when training a neural network (with at least one hidden neuron layer) with classified data examples.

Figure 7.3: Sigmoid and derivative of the Sigmoid (SigmoidP) functions. This plot was produced by the file src-neural-networks/Graph.java.

A neural network like the one seen in Figure 7.2 is trained by using a set of training data. For back propagation networks, training data consists of matched sets of input values and matching desired output values. We want to train a network to not only produce the same outputs for training data inputs as appear in the training data, but also to generalize its pattern matching ability based on the training data, so that it can match test patterns that are similar to training input patterns. A key here is to balance the size of the network against how much information it must hold. A common mistake when using back-prop networks is to use too large a network: a network that contains too many neurons and connections will simply memorize the training examples, including any noise in the training data. However, if we use a smaller number of neurons with a very large number of training data examples, then we force the network to generalize, ignoring noise in the training data and learning to recognize important traits in input data while ignoring statistical noise.

How do we train a back propagation neural network given that we have a good training data set? The algorithm is quite easy; we will now walk through the simple case of a two-layer network like the one in Figure 7.2, and later in Section 7.5 we will review the algorithm in more detail when we have either one or two hidden neuron layers between the input and output layers. In order to train the network in Figure 7.2, we repeat the following learning cycle several times:

1. Zero out temporary arrays for holding the error at each neuron. The error, starting at the output layer, is the difference between the output value for a specific output layer neuron and the calculated value obtained by setting the input layer neurons' activation values to the input values in the current training example and letting activation spread through the network.

2. Update the weight W[i,j] (where i is the index of an input neuron and j is the index of an output neuron) using the formula W[i,j] += learning_rate * output_error[j] * I[i], where learning_rate is a tunable parameter, output_error[j] was calculated in step 1, and I[i] is the activation of the input neuron at index i.

This process is continued to either a maximum number of learning cycles or until the calculated output errors get very small. We will see later that the algorithm is similar, but slightly more complicated, when we have hidden neuron layers; the difference is that we will back propagate output errors to the hidden layers in order to estimate errors for hidden neurons. We will cover more on this later. This type of neural network is too simple to solve very many interesting problems, and in practical applications we almost always use either one or two additional hidden neuron layers. Figure 7.4 shows the types of problems that can be solved by zero hidden layer, one hidden layer, and two hidden layer networks.

Figure 7.4: Capabilities of zero, one, and two hidden neuron layer neural networks. The grayed areas depict one of two possible output values based on two input neuron activation values. Note that this is a two-dimensional case for visualization purposes; if a network had ten input neurons instead of two, then these plots would have to be ten-dimensional instead of two-dimensional.
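The code in the following sections calls methods named sigmoid and sigmoidP whose bodies are not listed in this chapter. A common pair of definitions consistent with the curves in Figure 7.3 might be the following (an assumption on my part; the actual implementations in the book's source code may scale or shift these curves):

static float sigmoid(float x) {
  return (float) (1.0 / (1.0 + Math.exp(-x)));
}

// Derivative of the Sigmoid. When the argument is already a
// sigmoid output value y, this is often written as y * (1 - y).
static float sigmoidP(float x) {
  float s = sigmoid(x);
  return s * (1.0f - s);
}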

7.5 A Java Class Library for Back Propagation


The back propagation neural network library used in this chapter was written to be easily understood and is useful for many problems. However, one thing that is not in the implementation in this section (it is added in Section 7.6) is something usually called momentum, which speeds up the training process at a cost of doubling the storage requirements for weights. Adding a momentum term not only makes learning faster but also increases the chances of successfully learning more difficult problems. We will concentrate in this section on implementing a back-prop learning algorithm that works for both one and two hidden layer networks. As we saw in Figure 7.4, a network with two hidden layers is capable of arbitrary mappings of input to output values, so there is no theoretical reason that I know of for using networks with three hidden layers.

Figure 7.5: Example backpropagation neural network with one hidden layer.

Figure 7.6: Example backpropagation neural network with two hidden layers.

The source directory src-neural-networks contains example programs for both back propagation neural networks and Hopfield neural networks, which we saw at the beginning of this chapter. The relevant files for the back propagation examples are:

- Neural_1H.java: contains a class for simulating a neural network with one hidden neuron layer
- Test_1H.java: a text-based test program for the class Neural_1H
- GUITest_1H.java: a GUI-based test program for the class Neural_1H
- Neural_2H.java: contains a class for simulating a neural network with two hidden neuron layers
- Neural_2H_momentum.java: contains a class for simulating a neural network with two hidden neuron layers that implements momentum learning (implemented in Section 7.6)
- Test_2H.java: a text-based test program for the class Neural_2H
- GUITest_2H.java: a GUI-based test program for the class Neural_2H
- GUITest_2H_momentum.java: a GUI-based test program for the class Neural_2H_momentum that uses momentum learning (implemented in Section 7.6)
- Plot1DPanel: a Java JFC graphics panel for plotting the values of a one-dimensional array of floating point values
- Plot2DPanel: a Java JFC graphics panel for plotting the values of a two-dimensional array of floating point values

The GUI files are for demonstration purposes only, and we will not discuss the code for these classes; if you are interested in the demo graphics code and do not know JFC Java programming, there are a few good JFC tutorials at the web site java.sun.com.

It is common to implement back-prop libraries to handle either zero, one, or two hidden layers in the same code base. At the risk of having to repeat similar code in two different classes, I decided to make the Neural_1H and Neural_2H classes distinct. I think that this makes the code a little easier for you to understand. As a practical point, you will almost always start solving a neural network problem using only one hidden layer, and only progress to trying two hidden layers if you cannot train a one hidden layer network to solve the problem at hand with sufficiently small error when tested with data that is different than the original training data. One hidden layer networks require less storage space and run faster in simulation than two hidden layer networks.

In this section we will only look at the implementation of the class Neural_2H (class Neural_1H is simpler, and when you understand how Neural_2H works, the simpler class is easy to understand also). This class implements the Serializable interface and contains a utility method save to write a trained network to a disk file:

class Neural_2H implements Serializable {

There is a static factory method that reads a saved network file from disk and builds an instance of Neural_2H, and there is a class constructor that builds a new untrained network in memory, given the number of neurons in each layer:

public static Neural_2H
       Factory(String serialized_file_name)
public Neural_2H(int num_in, int num_hidden1,
                 int num_hidden2, int num_output)

An instance of Neural_2H contains training data as transient data that is not saved by method save:

transient protected ArrayList inputTraining =
    new ArrayList();
transient protected ArrayList outputTraining =
    new ArrayList();

I want the training examples to be native float arrays, so I used generic ArrayList containers. You will usually need to experiment with training parameters in order to solve difficult problems. The learning rate not only controls how large the weight corrections we make in each learning cycle are, but this parameter also affects whether we can break out of a local minimum. Other parameters that affect learning are the ranges of the initial random weight values that are hardwired in the method randomizeWeights() and the small random values that we add to weights during the training cycles; these values are set in slightlyRandomizeWeights(). I usually only need to adjust the learning rate when training back-prop networks:

public float TRAINING_RATE = 0.5f;

I often decrease the learning rate during training, that is, I start with a large learning rate and gradually reduce it during training. The calculation of output neuron values given a set of inputs and the current weight values is simple. I placed the code for calculating a forward pass through the network in a separate method forwardPass() because it is also used later in the method train:


public float[] recall(float[] in) {
  for (int i = 0; i < numInputs; i++)
    inputs[i] = in[i];
  forwardPass();
  float[] ret = new float[numOutputs];
  for (int i = 0; i < numOutputs; i++)
    ret[i] = outputs[i];
  return ret;
}

public void forwardPass() {
  for (int h = 0; h < numHidden1; h++) {
    hidden1[h] = 0.0f;
  }
  for (int h = 0; h < numHidden2; h++) {
    hidden2[h] = 0.0f;
  }
  for (int i = 0; i < numInputs; i++) {
    for (int h = 0; h < numHidden1; h++) {
      hidden1[h] += inputs[i] * W1[i][h];
    }
  }
  for (int i = 0; i < numHidden1; i++) {
    for (int h = 0; h < numHidden2; h++) {
      hidden2[h] += hidden1[i] * W2[i][h];
    }
  }
  for (int o = 0; o < numOutputs; o++)
    outputs[o] = 0.0f;
  for (int h = 0; h < numHidden2; h++) {
    for (int o = 0; o < numOutputs; o++) {
      outputs[o] += sigmoid(hidden2[h]) * W3[h][o];
    }
  }
}

While the code for recall and forwardPass is almost trivial, the training code in method train is more complex and we will go through it in some detail. Before we get to the code, I want to mention that there are two primary techniques for training back-prop networks. The technique that I use is to update the weight arrays after each individual training example. The other technique is to sum all output errors over the entire training set (or part of the training set) and then calculate the weight updates. In the following discussion, I am going to weave my comments on the code into the listing. The private member variable current_example is used to cycle through the training examples: one training example is processed each time that the train method is called:

private int current_example = 0;

public float train(ArrayList ins,
                   ArrayList v_outs) {

Before starting a training cycle for one example, we zero out the arrays used to hold the output layer errors and the errors that are back propagated to the hidden layers. We also need to copy the training example input values and output values:

int i, h, o;
float error = 0.0f;
int num_cases = ins.size();
//for (int example=0; example<num_cases; example++) {
// zero out error arrays:
for (h = 0; h < numHidden1; h++)
  hidden1_errors[h] = 0.0f;
for (h = 0; h < numHidden2; h++)
  hidden2_errors[h] = 0.0f;
for (o = 0; o < numOutputs; o++)
  output_errors[o] = 0.0f;
// copy the input values:
for (i = 0; i < numInputs; i++) {
  inputs[i] = ((float[]) ins.get(current_example))[i];
}
// copy the output values:
float[] outs = (float[]) v_outs.get(current_example);

We need to propagate the training example input values through the hidden layers to the output layer. We use the current values of the weights:

forwardPass();

After propagating the input values to the output layer, we need to calculate the output error for each output neuron. This error is the difference between the desired output and the calculated output; this difference is multiplied by the value of the Sigmoid derivative (seen in Figure 7.3) applied to the calculated output neuron value, which serves to keep the calculated error terms in a reasonable range:

for (o = 0; o < numOutputs; o++) {
  output_errors[o] = (outs[o] - outputs[o]) *
                     sigmoidP(outputs[o]);
}

The errors for the neuron activation values in the second hidden layer (the hidden layer connected to the output layer) are estimated by summing, for each hidden neuron, its contributions to the errors of the output layer neurons. The thing to notice is that if the connection weight value between hidden neuron h and output neuron o is large, then hidden neuron h contributes more to the error of output neuron o than other neurons with smaller connecting weight values:

for (h = 0; h < numHidden2; h++) {
  hidden2_errors[h] = 0.0f;
  for (o = 0; o < numOutputs; o++) {
    hidden2_errors[h] += output_errors[o] * W3[h][o];
  }
}

We estimate the errors in activation energy for the first hidden layer neurons by using the estimated errors for the second hidden layer that we calculated in the last code snippet:

for (h = 0; h < numHidden1; h++) {
  hidden1_errors[h] = 0.0f;
  for (o = 0; o < numHidden2; o++) {
    hidden1_errors[h] += hidden2_errors[o] * W2[h][o];
  }
}

After we have these estimates of the activation energy errors for both hidden layers, we scale the error estimates using the derivative of the sigmoid function evaluated at each hidden neuron's activation energy:


for (h = 0; h < numHidden2; h++) {
  hidden2_errors[h] =
      hidden2_errors[h] * sigmoidP(hidden2[h]);
}
for (h = 0; h < numHidden1; h++) {
  hidden1_errors[h] =
      hidden1_errors[h] * sigmoidP(hidden1[h]);
}

Now that we have estimates for the hidden layer neuron errors, we update the weights connecting to the output layer and each hidden layer by adding the product of the current learning rate, the estimated error of each weight's target neuron, and the value of the weight's source neuron:

// update the hidden2 to output weights:
for (o = 0; o < numOutputs; o++) {
  for (h = 0; h < numHidden2; h++) {
    W3[h][o] += TRAINING_RATE * output_errors[o] * hidden2[h];
    W3[h][o] = clampWeight(W3[h][o]);
  }
}
// update the hidden1 to hidden2 weights:
for (o = 0; o < numHidden2; o++) {
  for (h = 0; h < numHidden1; h++) {
    W2[h][o] += TRAINING_RATE * hidden2_errors[o] * hidden1[h];
    W2[h][o] = clampWeight(W2[h][o]);
  }
}
// update the input to hidden1 weights:
for (h = 0; h < numHidden1; h++) {
  for (i = 0; i < numInputs; i++) {
    W1[i][h] += TRAINING_RATE * hidden1_errors[h] * inputs[i];
    W1[i][h] = clampWeight(W1[i][h]);
  }
}
for (o = 0; o < numOutputs; o++) {
  error += Math.abs(outs[o] - outputs[o]);
}

The last step in this code snippet was to calculate an average error over all output neurons for this training example. This is important so that we can track the training status in real time. For very long running back-prop training experiments I like to be able to see this error graphed in real time to help decide when to stop a training run. This also allows me to experiment with the initial learning rate value and see how fast it decays. The last thing that method train needs to do is to update the training example counter so that the next example is used the next time that train is called:

current_example++;
if (current_example >= num_cases)
  current_example = 0;
return error;
}

You can look at the implementation of the Swing GUI test class GUITest_2H to see how I decrease the training rate during training. I also monitor the summed error rate over all output neurons and occasionally randomize the weights if the network is not converging to a solution to the current problem.
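Putting the public API together, a training driver might look like the following sketch (hypothetical usage code: the layer sizes, cycle count, and decay factor are illustrative, and the actual GUI test classes may schedule the learning rate differently):

// assumes ins and v_outs are ArrayLists of float[] training
// examples, as described in the discussion of method train:
Neural_2H net = new Neural_2H(2, 4, 4, 1);
net.TRAINING_RATE = 0.5f;
for (int cycle = 0; cycle < 10000; cycle++) {
  float error = net.train(ins, v_outs);
  if (cycle % 1000 == 0)
    System.out.println("cycle " + cycle +
                       " error " + error);
  net.TRAINING_RATE *= 0.9995f; // gradually decay the rate
}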

7.6 Adding Momentum to Speed Up Back-Prop Training


We did not use a momentum term in the Java code in Section 7.5. For difficult to train problems, adding a momentum term can drastically reduce the training time, at a cost of doubling the weight storage requirements. To implement momentum, we remember how much each weight was changed in the previous learning cycle and make the weight change larger if the current change in direction is the same as in the last learning cycle. For example, if the change to weight W[i,j] had a large positive value in the last learning cycle and the calculated weight change for W[i,j] is also a large positive value in the current learning cycle, then make the current weight change even larger. Adding a momentum term not only makes learning faster but also increases the chances of successfully learning more difficult problems. I modified two of the classes from Section 7.5 to use momentum:

- Neural_2H_momentum.java: training and recall for two hidden layer back-prop networks. The constructor has an extra argument alpha that is a scaling factor for how much of the previous cycle's weight change to add to the new calculated delta weight values.
- GUITest_2H_momentum.java: a GUI test application that tests the new class Neural_2H_momentum.

The code for class Neural_2H_momentum is similar to the code for Neural_2H that we saw in the last section, so here we will just look at the differences. The class constructor now takes another parameter alpha that determines how strong the momentum correction is when we modify weight values:

// momentum scaling term that is applied
// to last delta weight:
private float alpha = 0f;

While this alpha term is used three times in the training code, it suffices to just look at one of these uses in detail. When we allocated the three weight arrays W1, W2, and W3, we also now allocate three additional arrays of corresponding size: W1_last_delta, W2_last_delta, and W3_last_delta. These three new arrays are used to store the weight changes for use in the next training cycle. Here is the original code to update W3 from the last section:

W3[h][o] += TRAINING_RATE *
            output_errors[o] *
            hidden2[h];

The following code snippet shows the additions required to use momentum:

W3[h][o] += TRAINING_RATE *
            output_errors[o] *
            hidden2[h] +
            // apply the momentum term:
            alpha * W3_last_delta[h][o];
W3_last_delta[h][o] = TRAINING_RATE *
                      output_errors[o] *
                      hidden2[h];

I mentioned in the last section that there are two techniques for training back-prop networks: updating the weights after processing each training example, or waiting to update weights until all training examples are processed. I always use the first method when I don't use momentum. In many cases it is best to use the second method when using momentum.
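The effect of the momentum term is easy to see in isolation. The following toy program (illustrative only, unrelated to the neural network classes) applies the same update rule, current delta plus alpha times the previous delta, to minimize a one-dimensional quadratic:

public class MomentumDemo {
    public static void main(String[] args) {
        float w = 10.0f, rate = 0.05f, alpha = 0.8f;
        float lastDelta = 0.0f;
        for (int i = 0; i < 50; i++) {
            float gradient = -2.0f * w;      // minimize f(w) = w * w
            float delta = rate * gradient;
            w += delta + alpha * lastDelta;  // momentum term added here
            lastDelta = delta;
        }
        System.out.println("w after 50 steps: " + w);
    }
}

With alpha set to 0 the same loop ends noticeably farther from the minimum at zero, which is the speed-up that motivates the momentum term.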

