Genetic Algorithm
Machine Learning Techniques (B.Tech)
9.1 MOTIVATION

Genetic algorithms (GAs) provide a learning method motivated by an analogy to biological evolution. Rather than search from general-to-specific hypotheses, or from simple-to-complex, GAs generate successor hypotheses by repeatedly mutating and recombining parts of the best currently known hypotheses. At each step, a collection of hypotheses called the current population is updated by replacing some fraction of the population by offspring of the most fit current hypotheses. The process forms a generate-and-test beam search of hypotheses, in which variants of the best current hypotheses are most likely to be considered next. The popularity of GAs is motivated by a number of factors, including:

• Evolution is known to be a successful, robust method for adaptation within biological systems.
• GAs can search spaces of hypotheses containing complex interacting parts, where the impact of each part on overall hypothesis fitness may be difficult to model.
• Genetic algorithms are easily parallelized and can take advantage of the decreasing costs of powerful computer hardware.

GA(Fitness, Fitness_threshold, p, r, m)

  Fitness: a function that assigns an evaluation score, given a hypothesis.
  Fitness_threshold: a threshold specifying the termination criterion.
  p: the number of hypotheses to be included in the population.
  r: the fraction of the population to be replaced by Crossover at each step.
  m: the mutation rate.

• Initialize population: P ← generate p hypotheses at random.
• Evaluate: for each h in P, compute Fitness(h).
• While max_h Fitness(h) < Fitness_threshold, create a new generation P_s:
  1. Select: probabilistically select (1 - r)p members of P to add to P_s. The probability Pr(h_i) of selecting hypothesis h_i from P is given by

     Pr(h_i) = \frac{Fitness(h_i)}{\sum_{j=1}^{p} Fitness(h_j)}     (9.1)

  2. Crossover: probabilistically select (r · p)/2 pairs of hypotheses from P, according to Pr(h_i) given above. For each pair (h_1, h_2), produce two offspring by applying the Crossover operator. Add all offspring to P_s.
  3. Mutate: choose m percent of the members of P_s with uniform probability. For each, invert one randomly selected bit in its representation.
  4. Update: P ← P_s.
  5. Evaluate: for each h in P, compute Fitness(h).
• Return the hypothesis from P that has the highest fitness.

TABLE 9.1
A prototypical genetic algorithm. A population containing p hypotheses is maintained. On each iteration, the successor population P_s is formed by probabilistically selecting current hypotheses according to their fitness and by adding new hypotheses. New hypotheses are created by applying a crossover operator to pairs of most fit hypotheses and by creating single-point mutations in the resulting generation of hypotheses. This process is iterated until sufficiently fit hypotheses are discovered. Typical crossover and mutation operators are defined in a subsequent table. A minimal code sketch of this loop appears below.
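The following is a minimal Python sketch of the loop in Table 9.1, not an implementation from the text; the bit-string length and the toy "one-max" fitness function in the usage line are assumptions chosen purely for illustration.

```python
import random

def ga(fitness, fitness_threshold, p, r, m, bits=11):
    """Prototypical GA over bit-string hypotheses (after Table 9.1)."""
    # Initialize population: p random bit strings.
    P = [[random.randint(0, 1) for _ in range(bits)] for _ in range(p)]

    def select(pop, scores):
        # Fitness-proportionate ("roulette wheel") selection, Equation (9.1).
        if sum(scores) == 0:
            return random.choice(pop)
        return random.choices(pop, weights=scores)[0]

    while max(fitness(h) for h in P) < fitness_threshold:
        scores = [fitness(h) for h in P]
        # Select: carry (1 - r) * p members of P into the new generation.
        Ps = [select(P, scores) for _ in range(round((1 - r) * p))]
        # Crossover: (r * p) / 2 selected pairs yield r * p offspring.
        for _ in range(round(r * p / 2)):
            h1, h2 = select(P, scores), select(P, scores)
            point = random.randrange(1, bits)      # single-point crossover
            Ps.append(h1[:point] + h2[point:])
            Ps.append(h2[:point] + h1[point:])
        # Mutate: invert one randomly chosen bit in m percent of members.
        for h in random.sample(Ps, round(m * len(Ps))):
            i = random.randrange(bits)
            h[i] = 1 - h[i]
        P = Ps                                     # update
    return max(P, key=fitness)

# Toy usage (assumed fitness): maximize the number of 1-bits in the string.
best = ga(fitness=sum, fitness_threshold=11, p=50, r=0.6, m=0.05)
```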
9.2.2 Genetic Operators

The generation of successors in a GA is determined by a set of operators that recombine and mutate selected members of the current population. Typical GA operators for manipulating bit-string hypotheses are illustrated in Table 9.2. These operators correspond to idealized versions of the genetic operations found in biological evolution. The two most common operators are crossover and mutation.

The crossover operator produces two new offspring from two parent strings, by copying selected bits from each parent. The bit at position i in each offspring is copied from the bit at position i in one of the two parents. The choice of which parent contributes the bit for position i is determined by an additional string called the crossover mask. To illustrate, consider the single-point crossover operator at the top of Table 9.2, and the topmost of the two offspring in this case. This offspring takes its first five bits from the first parent and its remaining six bits from the second parent, because the crossover mask 11111000000 specifies these choices for each of the bit positions. The second offspring uses the same crossover mask, but switches the roles of the two parents. Therefore, it contains the bits that were not used by the first offspring. In single-point crossover, the crossover mask is always constructed so that it begins with n contiguous 1s, followed by the necessary number of 0s to complete the string. This results in offspring in which the first n bits are contributed by one parent and the remaining bits by the second parent. Each time the single-point crossover operator is applied, the crossover point n is chosen at random, and the crossover mask is then created and applied.

Single-point crossover:
  Initial strings:  11101001000, 00001010101
  Crossover mask:   11111000000
  Offspring:        11101010101, 00001001000

Two-point crossover:
  Initial strings:  11101001000, 00001010101
  Crossover mask:   00111110000
  Offspring:        11001011000, 00101000101

Uniform crossover:
  Initial strings:  11101001000, 00001010101
  Crossover mask:   10011010011
  Offspring:        10001000100, 01101011001

Point mutation:
  11101001000  →  11101011000

TABLE 9.2
Common operators for genetic algorithms. These operators form offspring of hypotheses represented by bit strings. The crossover operators create two descendants from two parents, using the crossover mask to determine which parent contributes which bits. Mutation creates a single descendant from a single parent by changing the value of a randomly chosen bit.

In two-point crossover, offspring are created by substituting intermediate segments of one parent into the middle of the second parent string. Put another way, the crossover mask is a string beginning with n_0 zeros, followed by a contiguous string of n_1 ones, followed by the necessary number of zeros to complete the string. Each time the two-point crossover operator is applied, a mask is generated by randomly choosing the integers n_0 and n_1. For instance, in the example shown in Table 9.2 the offspring are created using a mask for which n_0 = 2 and n_1 = 5. Again, the two offspring are created by switching the roles played by the two parents.

Uniform crossover combines bits sampled uniformly from the two parents, as illustrated in Table 9.2. In this case the crossover mask is generated as a random bit string with each bit chosen at random and independent of the others.

In addition to recombination operators that produce offspring by combining parts of two parents, a second type of operator produces offspring from a single parent. In particular, the mutation operator produces small random changes to the bit string by choosing a single bit at random, then changing its value. Mutation is often performed after crossover has been applied, as in our prototypical algorithm from Table 9.1. A code sketch of these mask-based operators follows.
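A single mask-driven crossover routine covers all three variants in Table 9.2. Below is a brief sketch under that reading; the helper names are mine, not from the text, and the two offspring simply swap parent roles, as the text describes.

```python
import random

def crossover(p1, p2, mask):
    """Return two offspring: where mask[i] == 1 the first child copies p1's
    bit and the second copies p2's; roles are reversed where mask[i] == 0."""
    c1 = [a if m else b for a, b, m in zip(p1, p2, mask)]
    c2 = [b if m else a for a, b, m in zip(p1, p2, mask)]
    return c1, c2

def single_point_mask(length):
    n = random.randrange(1, length)            # crossover point n
    return [1] * n + [0] * (length - n)        # n ones, then zeros

def two_point_mask(length):
    n0 = random.randrange(0, length - 1)       # n_0 leading zeros
    n1 = random.randrange(1, length - n0)      # n_1 contiguous ones
    return [0] * n0 + [1] * n1 + [0] * (length - n0 - n1)

def uniform_mask(length):
    return [random.randint(0, 1) for _ in range(length)]  # independent bits

def point_mutation(h):
    i = random.randrange(len(h))               # flip one randomly chosen bit
    return h[:i] + [1 - h[i]] + h[i + 1:]

# Reproducing the single-point example of Table 9.2:
p1 = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
p2 = [0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1]
crossover(p1, p2, [1] * 5 + [0] * 6)   # -> 11101010101 and 00001001000
```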
Some GA systems employ additional operators, especially operators that are specialized to the particular hypothesis representation used by the system. For example, Grefenstette et al. (1991) describe a system that learns sets of rules for robot control; it uses mutation and crossover, together with an operator for specializing rules. Janikow (1993) describes a system that learns sets of rules using operators that generalize and specialize rules in a variety of directed ways (e.g., by explicitly replacing the condition on an attribute by "don't care").

9.2.3 Fitness Function and Selection

The fitness function defines the criterion for ranking potential hypotheses and for probabilistically selecting them for inclusion in the next generation population. If the task is to learn classification rules, then the fitness function typically has a component that scores the classification accuracy of the rule over a set of provided training examples. Often other criteria may be included as well, such as the complexity or generality of the rule. More generally, when the bit-string hypothesis is interpreted as a complex procedure (e.g., when the bit string represents a collection of if-then rules that will be chained together to control a robotic device), the fitness function may measure the overall performance of the resulting procedure rather than the performance of individual rules.

In our prototypical GA shown in Table 9.1, the probability that a hypothesis will be selected is given by the ratio of its fitness to the fitness of other members of the current population, as seen in Equation (9.1). This method is sometimes called fitness proportionate selection, or roulette wheel selection. Other methods for using fitness to select hypotheses have also been proposed. For example, in tournament selection, two hypotheses are first chosen at random from the current population. With some predefined probability p the more fit of these two is then selected, and with probability (1 - p) the less fit hypothesis is selected. Tournament selection often yields a more diverse population than fitness proportionate selection (Goldberg and Deb 1991). In another method called rank selection, the hypotheses in the current population are first sorted by fitness. The probability that a hypothesis will be selected is then proportional to its rank in this sorted list, rather than its fitness. Minimal sketches of these two alternatives appear below.
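The following is a hedged sketch of the two alternative selection schemes just described; the default value of p and the helper names are assumptions, not from the text.

```python
import random

def tournament_select(pop, fitness, p=0.75):
    """Choose two hypotheses at random; return the fitter one with
    probability p and the less fit one with probability 1 - p."""
    h1, h2 = random.sample(pop, 2)
    fitter, weaker = (h1, h2) if fitness(h1) >= fitness(h2) else (h2, h1)
    return fitter if random.random() < p else weaker

def rank_select(pop, fitness):
    """Sort hypotheses by fitness; selection probability is proportional
    to rank in the sorted list (least fit = rank 1), not to raw fitness."""
    ranked = sorted(pop, key=fitness)
    ranks = list(range(1, len(ranked) + 1))
    return random.choices(ranked, weights=ranks)[0]
```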
9.5 GENETIC PROGRAMMING

Genetic programming (GP) is a form of evolutionary computation in which the individuals in the evolving population are computer programs rather than bit strings. Koza (1992) describes the basic genetic programming approach and presents a broad range of simple programs that can be successfully learned by GP.

9.5.1 Representing Programs

Programs manipulated by a GP are typically represented by trees corresponding to the parse tree of the program. Each function call is represented by a node in the tree, and the arguments to the function are given by its descendant nodes. For example, Figure 9.1 illustrates this tree representation for the function sin(x) + sqrt(x^2 + y). To apply genetic programming to a particular domain, the user must define the primitive functions to be considered (e.g., sin, cos, sqrt, +, -, exponentials), as well as the terminals (e.g., x, y, constants such as 2). The genetic programming algorithm then uses an evolutionary search to explore the vast space of programs that can be described using these primitives.

[FIGURE 9.1: Program tree representation in genetic programming. Arbitrary programs are represented by their parse trees.]

As in a genetic algorithm, the prototypical genetic programming algorithm maintains a population of individuals (in this case, program trees). On each iteration, it produces a new generation of individuals using selection, crossover, and mutation. The fitness of a given individual program in the population is typically determined by executing the program on a set of training data. Crossover operations are performed by replacing a randomly chosen subtree of one parent program by a subtree from the other parent program. Figure 9.2 illustrates a typical crossover operation.

[FIGURE 9.2: Crossover operation applied to two parent program trees (top). Crossover points (nodes shown in bold at top) are chosen at random. The subtrees rooted at these crossover points are then exchanged to create children trees (bottom).]

Koza (1992) describes a set of experiments applying a GP to a number of applications. In his experiments, 10% of the current population, selected probabilistically according to fitness, is retained unchanged in the next generation. The remainder of the new generation is created by applying crossover to pairs of programs from the current generation, again selected probabilistically according to their fitness. The mutation operator was not used in this particular set of experiments.

9.5.2 Illustrative Example

One illustrative example presented by Koza (1992) involves learning an algorithm for stacking the blocks shown in Figure 9.3. The task is to develop a general algorithm for stacking the blocks into a single stack that spells the word "universal," independent of the initial configuration of blocks in the world.

[FIGURE 9.3: A block-stacking problem. The task for GP is to discover a program that can transform an arbitrary initial configuration of blocks into a stack that spells the word "universal." A set of 166 such initial configurations was provided to evaluate fitness of candidate programs (after Koza 1992).]

The actions available for manipulating blocks allow moving only a single block at a time. In particular, the top block on the stack can be moved to the table surface, or a block on the table surface can be moved to the top of the stack. As in most GP applications, the choice of problem representation has a significant impact on the ease of solving the problem. In Koza's formulation, the primitive functions used to compose programs for this task include the following three terminal arguments:

• CS (current stack), which refers to the name of the top block on the stack, or F if there is no current stack.
• TB (top correct block), which refers to the name of the topmost block on the stack, such that it and those blocks beneath it are in the correct order.
• NN (next necessary), which refers to the name of the next block needed above TB in the stack, in order to spell the word "universal," or F if no more blocks are needed.

As can be seen, this particular choice of terminal arguments provides a natural representation for describing programs for manipulating blocks for this task. Imagine, in contrast, the relative difficulty of the task if we were to instead define the terminal arguments to be the x and y coordinates of each block.

In addition to these terminal arguments, the program language in this application included the following primitive functions (a toy interpreter for this language is sketched after the list):

• (MS x) (move to stack): if block x is on the table, this operator moves x to the top of the stack and returns the value T. Otherwise, it does nothing and returns the value F.
• (MT x) (move to table): if block x is somewhere in the stack, this moves the block at the top of the stack to the table and returns the value T. Otherwise, it returns the value F.
• (EQ x y) (equal): returns T if x equals y, and returns F otherwise.
• (NOT x): returns T if x = F, and returns F if x = T.
• (DU x y) (do until): executes the expression x repeatedly until expression y returns the value T.
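To make the semantics of these primitives concrete, here is a toy interpreter; it assumes programs are represented as nested Python tuples and represents F as Python False. This is an illustrative reconstruction, not Koza's Lisp implementation, and the iteration cap on DU is an assumption added to guard against non-terminating loops.

```python
GOAL = list("universal")          # goal stack, bottom to top

def num_correct(stack):
    """Length of the prefix of the stack that already matches GOAL."""
    n = 0
    while n < len(stack) and stack[n] == GOAL[n]:
        n += 1
    return n

def evaluate(expr, state):
    """Evaluate an expression against state = {'stack': [...], 'table': set()}."""
    stack, table = state["stack"], state["table"]
    if expr == "CS":                              # current stack: top block or F
        return stack[-1] if stack else False
    if expr == "TB":                              # top correct block or F
        n = num_correct(stack)
        return stack[n - 1] if n else False
    if expr == "NN":                              # next necessary block or F
        n = num_correct(stack)
        return GOAL[n] if n < len(GOAL) else False
    op, *args = expr
    if op == "MS":                                # move block x to stack
        x = evaluate(args[0], state)
        if x in table:
            table.remove(x); stack.append(x)
            return True
        return False
    if op == "MT":                                # move top of stack to table
        x = evaluate(args[0], state)
        if x in stack:
            table.add(stack.pop())
            return True
        return False
    if op == "EQ":
        return evaluate(args[0], state) == evaluate(args[1], state)
    if op == "NOT":
        return evaluate(args[0], state) is False
    if op == "DU":                                # do x until y returns T
        for _ in range(100):                      # assumed iteration cap
            evaluate(args[0], state)
            if evaluate(args[1], state) is not False:
                return True
        return False

# Example: with stack u,n (bottom to top) the next necessary block is 'i'.
state = {"stack": list("un"), "table": set("iversal")}
evaluate(("MS", "NN"), state)     # moves 'i' onto the stack, returns True
```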
To allow the system to evaluate the fitness of any given program, Koza provided a set of 166 training example problems representing a broad variety of initial block configurations, including problems of differing degrees of difficulty. The fitness of any given program was taken to be the number of these examples solved by the algorithm. The population was initialized to a set of 300 random programs. After 10 generations, the system discovered the following program, which solves all 166 problems:

(EQ (DU (MT CS) (NOT CS)) (DU (MS NN) (NOT NN)))

Notice this program contains a sequence of two DU, or "Do Until," statements. The first repeatedly moves the current top of the stack onto the table, until the stack becomes empty. The second "Do Until" statement then repeatedly moves the next necessary block from the table onto the stack. The role played by the top-level EQ expression here is to provide a syntactically legal way to sequence these two "Do Until" loops. Somewhat surprisingly, after only a few generations, this GP was able to discover a program that solves all 166 training problems. Of course the ability of the system to accomplish this depends strongly on the primitive arguments and functions provided, and on the set of training example cases used to evaluate fitness. Using the toy interpreter sketched above, the discovered program can be exercised directly, as shown below.
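Continuing the hypothetical interpreter example from Section 9.5.2 (the starting configuration here is an arbitrary assumption):

```python
# The discovered program: empty the stack, then add needed blocks in order.
program = ("EQ", ("DU", ("MT", "CS"), ("NOT", "CS")),
                 ("DU", ("MS", "NN"), ("NOT", "NN")))

state = {"stack": list("nr"), "table": set("uivesal")}
evaluate(program, state)
print("".join(state["stack"]))    # -> "universal"
```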
9.6 MODELS OF EVOLUTION AND LEARNING

In many natural systems, individual organisms learn to adapt significantly during their lifetime. At the same time, biological and social processes allow their species to adapt over a time frame of many generations. One interesting question regarding evolutionary systems is: what is the relationship between learning during the lifetime of a single individual, and the longer, species-level learning afforded by evolution?

9.6.1 Lamarckian Evolution

Lamarck was a scientist who, in the early nineteenth century, proposed that evolution over many generations was directly influenced by the experiences of individual organisms during their lifetime. In particular, he proposed that the experiences of a single organism directly affected the genetic makeup of its offspring: if an individual learned during its lifetime to avoid some toxic food, it could pass this trait on genetically to its offspring, which therefore would not need to learn the trait. This is an attractive conjecture, because it would presumably allow for more efficient evolutionary progress than a generate-and-test process (like that of GAs and GPs) that ignores the experience gained during an individual's lifetime. Despite the attractiveness of this theory, current scientific evidence overwhelmingly contradicts Lamarck's model. The currently accepted view is that the genetic makeup of an individual is, in fact, unaffected by the lifetime experience of one's biological parents. Despite this apparent biological fact, recent computer studies have shown that Lamarckian processes can sometimes improve the effectiveness of computerized genetic algorithms (see Grefenstette 1991; Ackley and Littman 1994; and Hart and Belew 1995).

9.6.2 Baldwin Effect

Although Lamarckian evolution is not an accepted model of biological evolution, other mechanisms have been suggested by which individual learning can alter the course of evolution. One such mechanism is called the Baldwin effect, after J. M. Baldwin (1896), who first suggested the idea. The Baldwin effect is based on the following observations:

• If a species is evolving in a changing environment, there will be evolutionary pressure to favor individuals with the capability to learn during their lifetime. For example, if a new predator appears in the environment, then individuals capable of learning to avoid the predator will be more successful than individuals who cannot learn. In effect, the ability to learn allows an individual to perform a small local search during its lifetime to maximize its fitness. In contrast, nonlearning individuals, whose fitness is fully determined by their genetic makeup, will operate at a relative disadvantage.

• Those individuals who are able to learn many traits will rely less strongly on their genetic code to "hard-wire" traits. As a result, these individuals can support a more diverse gene pool, relying on individual learning to overcome the "missing" or "not quite optimized" traits in the genetic code. This more diverse gene pool can, in turn, support more rapid evolutionary adaptation. Thus, the ability of individuals to learn can have an indirect accelerating effect on the rate of evolutionary adaptation for the entire population.

To illustrate, imagine some new change in the environment of some species, such as a new predator. Such a change will selectively favor individuals capable of learning to avoid the predator. As the proportion of such self-improving individuals in the population grows, the population will be able to support a more diverse gene pool, allowing evolutionary processes (even non-Lamarckian generate-and-test processes) to adapt more rapidly. This accelerated adaptation may in turn enable standard evolutionary processes to more quickly evolve a genetic (nonlearned) trait to avoid the predator (e.g., an instinctive fear of this animal). Thus, the Baldwin effect provides an indirect mechanism for individual learning to positively impact the rate of evolutionary progress. By increasing survivability and genetic diversity of the species, individual learning supports more rapid evolutionary progress, thereby increasing the chance that the species will evolve genetic, nonlearned traits that better fit the new environment.

There have been several attempts to develop computational models to study the Baldwin effect. For example, Hinton and Nowlan (1987) experimented with evolving a population of simple neural networks, in which some network weights were fixed during the individual network "lifetime," while others were trainable. The genetic makeup of the individual determined which weights were trainable and which were fixed. In their experiments, when no individual learning ...

• Genetic programming is a variant of genetic algorithms in which the hypotheses being manipulated are computer programs rather than bit strings. Operations such as crossover and mutation are generalized to apply to programs rather than bit strings. Genetic programming has been demonstrated to learn programs for tasks such as simulated robot control (Koza 1992) and recognizing objects in visual scenes (Teller and Veloso 1994).
