Bestintro
1
General description of the method
2
References
J. Han and M. Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, 2003.
Data Mining Techniques: class lecture notes and PowerPoint slides.
http://cs.felk.cvut.cz/~xobitko/ga/
Massachusetts Institute of Technology – Prof. de Weck and Prof. Willcox, Multidisciplinary System Design Optimization course lecture notes on heuristic techniques, “A Basic Introduction to Genetic Algorithms”:
http://ocw.mit.edu/NR/rdonlyres/Aeronautics-and-Astronautics/16-888Spring-2004/D66C4396-90C8-49BE-BF4A-4EBE39CEAE6F/0/MSDO_L11_GA.pdf
3
History of Genetic Algorithms
“Evolutionary Computing” was introduced in the 1960s by I. Rechenberg.
4
What Are Genetic Algorithms (GAs)?
Genetic Algorithms are search and optimization techniques based on Darwin's principle of natural selection:
“problems are solved by an evolutionary process resulting in a best (fittest) solution (the survivor); in other words, the solution is evolved.”
1. Inheritance – Offspring acquire characteristics
2. Mutation – Change, to avoid similarity
3. Natural Selection – Variations improve survival
4. Recombination - Crossover
5
Genetics
Chromosome
All living organisms consist of cells, and each cell contains the same set of chromosomes. Chromosomes are strings of DNA and consist of genes, which are blocks of DNA. Each gene encodes a particular trait, for example the color of the eyes.
Reproduction
During reproduction, recombination (or crossover) occurs first: genes from the parents combine to form a whole new chromosome. The newly created offspring can then be mutated; these changes are mainly caused by errors in copying genes from the parents.
The fitness of an organism is measured by the success of the organism in its life (survival).
Citation:
http://ocw.mit.edu/NR/rdonlyres/Aeronautics-and-Astronautics/16-888Spring-2004/D66C4396-90C8-49BE-BF4A-4EBE39CEAE6F/0/MSDO_L11_GA.pdf
6
Principle Of Natural Selection
“Select The Best, Discard The Rest”
7
GA Elements
Citation:
http://ocw.mit.edu/NR/rdonlyres/Aeronautics-and-Astronautics/16-888Spring-2004/D66C4396-90C8-49BE-BF4A-4EBE39CEAE6F/0/MSDO_L11_GA.pdf
8
Search Space
If we are solving a problem, we are usually looking for the best solution among many. The space of all feasible solutions (the set of objects among which the desired solution lies) is called the search space (also state space). Each point in the search space represents one feasible solution, and each feasible solution can be “marked” by its value, or fitness, for the problem.
Initialization
Initially, many individual solutions are randomly generated to form an initial population that covers the entire range of possible solutions (the search space).
Each point in the search space represents one possible solution, marked by its value (fitness).
Selection
A proportion of the existing population is selected to breed a new generation.
Reproduction
Generate a second generation population of solutions from those selected through genetic
operators: crossover and mutation.
Termination
A solution is found that satisfies minimum criteria
A fixed number of generations is reached
The allocated budget (computation, time/money) is exhausted
The highest-ranking solution's fitness has reached, or is approaching, a plateau, so that successive iterations no longer produce better results
9
Methodology Associated with GAs
[Flowchart: Begin → T = 0 (first step) → Initialize population → Evaluate solutions → Optimum solution? If N: Selection → Mutation → evaluate the new solutions again (loop). If Y: stop.]
10
Citation: http://cs.felk.cvut.cz/~xobitko/ga/
Creating a GA on Computer
Simple_Genetic_Algorithm()
{
    Initialize the Population;
    Calculate Fitness Function;
    While (Fitness Value != Optimal Value) {
        Selection; Crossover; Mutation; Calculate Fitness Function;
    }
}
11
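To make the loop above concrete, here is a minimal, self-contained C sketch (not part of the original slides) that evolves bit-string chromosomes toward the all-ones string – the classic OneMax toy problem. The population size, rates, tournament selection and the helper names (fitness, tournament) are illustrative assumptions, not a definitive implementation.

/* Minimal GA sketch for OneMax: maximize the number of 1-bits in a chromosome. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define POP   20      /* population size              */
#define LEN   32      /* chromosome length in bits    */
#define GENS  100     /* number of generations        */
#define PMUT  0.02    /* per-bit mutation probability */

static int fitness(const char *c) {              /* fitness = count of 1-bits */
    int i, f = 0;
    for (i = 0; i < LEN; i++) f += (c[i] == '1');
    return f;
}

static int tournament(char pop[POP][LEN + 1]) {  /* pick the fitter of two random individuals */
    int a = rand() % POP, b = rand() % POP;
    return fitness(pop[a]) >= fitness(pop[b]) ? a : b;
}

int main(void) {
    char pop[POP][LEN + 1], next[POP][LEN + 1];
    int g, i, j, best = 0;
    srand(42);
    for (i = 0; i < POP; i++) {                  /* random initial population */
        for (j = 0; j < LEN; j++) pop[i][j] = (rand() % 2) ? '1' : '0';
        pop[i][LEN] = '\0';
    }
    for (g = 0; g < GENS; g++) {                 /* elitism omitted for brevity */
        for (i = 0; i < POP; i++) {
            int p1 = tournament(pop), p2 = tournament(pop);
            int cut = rand() % LEN;              /* single-point crossover */
            memcpy(next[i], pop[p1], cut);
            memcpy(next[i] + cut, pop[p2] + cut, LEN - cut);
            next[i][LEN] = '\0';
            for (j = 0; j < LEN; j++)            /* bit-flip mutation */
                if ((double)rand() / RAND_MAX < PMUT)
                    next[i][j] = (next[i][j] == '1') ? '0' : '1';
        }
        memcpy(pop, next, sizeof pop);
    }
    for (i = 1; i < POP; i++) if (fitness(pop[i]) > fitness(pop[best])) best = i;
    printf("best = %s  fitness = %d/%d\n", pop[best], fitness(pop[best]), LEN);
    return 0;
}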
Nature Vs Computer - Mapping
Nature            Computer
Population        Set of solutions
Individual        Solution to a problem
Fitness           Quality of a solution
Chromosome        Encoding for a solution
Gene              Part of the encoding of a solution
Reproduction      Crossover
12
Encoding
The process of representing the solution in
the form of a string that conveys the
necessary information.
13
Encoding Methods
Binary Encoding – Most common method of encoding. Chromosomes are
strings of 1s and 0s and each position in the chromosome represents a
particular characteristic of the problem.
Chromosome A   10110010110011100101
Chromosome B   11111110000000011111
Permutation Encoding – Used in ordering problems (e.g. the travelling salesman problem); each chromosome is a string of numbers that represents a position in a sequence.
Chromosome A   1 5 3 2 6 4 7 9 8
Chromosome B   8 5 6 7 2 3 1 4 9
14
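As a small illustration (an assumption, not from the slides), binary and permutation chromosomes can be represented directly as arrays; the values below reuse the example chromosomes above.

#include <stdio.h>

#define N_BITS   20   /* length of a binary chromosome                    */
#define N_ITEMS  9    /* length of a permutation chromosome (e.g. a tour) */

int main(void) {
    /* Binary encoding: each position is a 0/1 gene (Chromosome A above). */
    int chrom_bin[N_BITS]  = {1,0,1,1,0,0,1,0,1,1,0,0,1,1,1,0,0,1,0,1};

    /* Permutation encoding: the chromosome is an ordering of items,
       e.g. the order in which cities are visited. */
    int chrom_perm[N_ITEMS] = {1,5,3,2,6,4,7,9,8};

    for (int i = 0; i < N_BITS; i++)  printf("%d", chrom_bin[i]);
    printf("\n");
    for (int i = 0; i < N_ITEMS; i++) printf("%d ", chrom_perm[i]);
    printf("\n");
    return 0;
}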
Encoding Methods (contd.)
Value Encoding – Used in problems where complicated values, such as real numbers, are needed and where binary encoding would not suffice. Good for some problems, but it is often necessary to develop problem-specific crossover and mutation operators for these chromosomes.
15
Encoding Methods (contd.)
Tree Encoding – This encoding is used mainly for evolving programs or
expressions, i.e. for Genetic programming.
Tree Encoding - every chromosome is a tree of some objects, such as
values/arithmetic operators or commands in a programming language.
Citation:
http://ocw.mit.edu/NR/rdonlyres/Aeronautics-and-Astronautics/16-888Spring-2004/D66C4396-90C8-49BE-BF4A-4EBE39CEAE6F/0/MSDO_L11_GA.pdf
16
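A sketch (not from the slides) of how a tree-encoded chromosome might be stored and evaluated in C; the node types, the mk helper and the recursive evaluator are illustrative assumptions.

#include <stdio.h>
#include <stdlib.h>

/* A chromosome under tree encoding: each node is either an operator
   (internal node) or a terminal (a constant or the variable x). */
typedef enum { ADD, MUL, CONST, VAR_X } NodeType;

typedef struct Node {
    NodeType     type;
    double       value;            /* used when type == CONST      */
    struct Node *left, *right;     /* used when type == ADD or MUL */
} Node;

static Node *mk(NodeType t, double v, Node *l, Node *r) {
    Node *n = malloc(sizeof *n);
    n->type = t; n->value = v; n->left = l; n->right = r;
    return n;
}

/* Recursively evaluate the program tree for a given x. */
static double eval(const Node *n, double x) {
    switch (n->type) {
    case CONST: return n->value;
    case VAR_X: return x;
    case ADD:   return eval(n->left, x) + eval(n->right, x);
    default:    return eval(n->left, x) * eval(n->right, x);   /* MUL */
    }
}

int main(void) {
    /* The tree for x*x + (x + 1), i.e. x^2 + x + 1; memory is not freed in this short sketch. */
    Node *tree = mk(ADD, 0,
                    mk(MUL, 0, mk(VAR_X, 0, NULL, NULL), mk(VAR_X, 0, NULL, NULL)),
                    mk(ADD, 0, mk(VAR_X, 0, NULL, NULL), mk(CONST, 1, NULL, NULL)));
    printf("f(0.8) = %.2f\n", eval(tree, 0.8));   /* prints 2.44 */
    return 0;
}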
GA Operators
17
References
18
19
Citation: http://www.ewh.ieee.org/soc/es/May2001/14/GA.GIF
Basic GA Operators
Recombination
20
Fitness function
Quantifies the optimality of a solution (that is, a chromosome) so that a particular chromosome can be ranked against all the other chromosomes.
22
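As a small illustration (an assumption, not from the slides, but consistent with the roulette-wheel example a few slides later, where the string 01000 has fitness 64): decode a binary chromosome to an integer x and score it with f(x) = x².

#include <stdio.h>

/* Decode a binary chromosome to an integer and score it with f(x) = x*x. */
static int fitness(const char *chromosome) {
    int x = 0;
    for (int i = 0; chromosome[i] != '\0'; i++)
        x = 2 * x + (chromosome[i] - '0');     /* binary string -> integer */
    return x * x;
}

int main(void) {
    printf("%d\n", fitness("01000"));   /*  8 *  8 =  64 */
    printf("%d\n", fitness("11000"));   /* 24 * 24 = 576 */
    return 0;
}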
So, how to select the best?
Roulette Selection
Rank Selection
Tournament Selection
23
Roulette wheel selection
Main idea: the fitter the solution, the better its chances of being chosen.
How does it work?
24
Example of Roulette wheel selection
No.   String   Fitness   % of total
3     01000    64        5.5
Citation: www.cs.vu.nl/~gusz/
25
Roulette wheel selection
[Pie chart: the wheel is divided among Chromosome 1, Chromosome 2, Chromosome 3 and Chromosome 4 in proportion to their fitness.]
All you have to do is spin the ball and grab the chromosome at the point where it stops.
26
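A sketch (not from the slides) of roulette-wheel selection: each chromosome occupies a slice of the wheel proportional to its fitness, and a random “spin” picks one. The function name roulette_select and the fitness values are illustrative (the 64 matches the 01000 row of the example table; the rest are made up for the demo).

#include <stdio.h>
#include <stdlib.h>

/* Pick an index in [0, n) with probability proportional to fitness[i]. */
static int roulette_select(const double *fitness, int n) {
    double total = 0.0;
    for (int i = 0; i < n; i++) total += fitness[i];

    double spin = ((double)rand() / RAND_MAX) * total;  /* where the ball stops */
    double cumulative = 0.0;
    for (int i = 0; i < n; i++) {
        cumulative += fitness[i];
        if (spin <= cumulative) return i;
    }
    return n - 1;                                        /* guard against rounding */
}

int main(void) {
    double fitness[4] = {169.0, 576.0, 64.0, 361.0};     /* illustrative fitness values */
    int counts[4] = {0, 0, 0, 0};
    srand(1);
    for (int k = 0; k < 10000; k++) counts[roulette_select(fitness, 4)]++;
    for (int i = 0; i < 4; i++)
        printf("chromosome %d selected %d times\n", i + 1, counts[i]);
    return 0;
}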
Crossover
Main idea: combine the genetic material (bits) of two “parent” chromosomes (solutions) to produce a new “child” possessing characteristics of both “parents”.
How does it work?
Several methods…
27
Crossover methods
Single Point Crossover- A random point is chosen on the individual chromosomes (strings) and the genetic material is exchanged at this point.
Citation: http://www.ewh.ieee.org/soc/es/May2001/14/CROSS0.GIF
28
Crossover methods
Two-Point Crossover- Two random points are
chosen on the individual chromosomes (strings) and
the genetic material is exchanged at these points.
29
Crossover methods
Uniform Crossover- Each gene (bit) is selected
randomly from one of the corresponding genes of
the parent chromosomes.
30
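A sketch (not from the slides) of single-point and uniform crossover on bit-string chromosomes; the function names are assumptions, and the parents reuse chromosomes A and B from the encoding slide. Two-point crossover simply exchanges the segment between two random cut points instead of everything after one.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Single-point crossover: exchange everything after a random cut point. */
static void single_point(const char *p1, const char *p2, char *child, int len) {
    int cut = rand() % len;                   /* random crossover point */
    memcpy(child, p1, cut);
    memcpy(child + cut, p2 + cut, len - cut);
    child[len] = '\0';
}

/* Uniform crossover: each gene is taken from either parent with equal probability. */
static void uniform(const char *p1, const char *p2, char *child, int len) {
    for (int i = 0; i < len; i++)
        child[i] = (rand() % 2) ? p1[i] : p2[i];
    child[len] = '\0';
}

int main(void) {
    const char *a = "10110010110011100101";   /* chromosome A from the encoding slide */
    const char *b = "11111110000000011111";   /* chromosome B */
    char c1[32], c2[32];
    srand(7);
    single_point(a, b, c1, 20);
    uniform(a, b, c2, 20);
    printf("single-point child: %s\n", c1);
    printf("uniform child:      %s\n", c2);
    return 0;
}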
Crossover (contd.)
Crossover between 2 good solutions MAY NOT
ALWAYS yield a better or as good a solution.
31
Elitism
Main idea: copy the best chromosomes (solutions) to the new population before applying crossover and mutation.
32
Mutation
Main idea: random inversion of bits in a solution, to maintain diversity in the population.
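A sketch (not from the slides) of bit-inversion mutation: each bit of the chromosome is flipped with a small probability; the function name mutate and the rate are assumptions.

#include <stdio.h>
#include <stdlib.h>

/* Flip each bit of the chromosome with probability pm. */
static void mutate(char *chromosome, double pm) {
    for (int i = 0; chromosome[i] != '\0'; i++)
        if ((double)rand() / RAND_MAX < pm)
            chromosome[i] = (chromosome[i] == '1') ? '0' : '1';
}

int main(void) {
    char c[] = "11111110000000011111";
    srand(3);
    mutate(c, 0.05);                 /* about 1 bit in 20 flipped on average */
    printf("%s\n", c);
    return 0;
}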
Advantages:
Always an answer; answer gets better with time
Good for “noisy” environments
Inherently parallel; easily distributed
Issues:
Performance
Solution is only as good as the evaluation function
(often hardest part)
Termination Criteria
34
Applications – Genetic programming and data mining
By: George Iordache
35
A.A. Freitas. “A survey of evolutionary algorithms for data mining and knowledge
discovery”, Pontificia Universidade Catolica do Parana, Brazil. In A. Ghosh and S.
Tsutsui, editors, Advances in Evolutionary Computation, pages 819--845. Springer-
Verlag, 2002.
http://citeseer.ist.psu.edu/cache/papers/cs/23050/http:zSzzSzwww.ppgia.pucpr.brzSz~alexzSzpub_pape
rs.dirzSzAdvEC-bk.pdf/freitas01survey.pdf
Anita Wasilewska, Course Lecture Notes (2007 and previous years) on Classification
(Data Mining book Chapters 5 and 7) -
http://www.cs.sunysb.edu/~cse634/lecture_notes/07classification.pdf
J. Han, and M. Kamber. “Data Mining: Concepts and Techniques 2nd ed.”, Morgan
Kaufmann Publishers, March 2006. ISBN 1-55860-901-6
R. Mendes, F. Voznika, A. Freitas, and J. Nievola. “Discovering fuzzy classification rules
with genetic programming and co-evolution”, Pontificia Universidade Catolica do
Parana, Brazil. In L. de Raedt and A. Siebes, editors, 5th European Conference on
Principles and Practice of Knowledge Discovery in Databases (PKDD'01), volume 2168 of
LNAI, pages 314--325. Springer Verlag, 2001.
http://citeseer.ist.psu.edu/cache/papers/cs/23050/http:zSzzSzwww.ppgia.pucpr.brzSz~alexzSzpub_papers.dirz
SzPKDD-2001.pdf/mendes01discovering.pdf
36
Genetic Programming
A program in C
int foo (int time)
{
    int temp1, temp2;
    if (time > 10)
        temp1 = 3;
    else
        temp1 = 4;
    temp2 = temp1 + 1 + 2;   /* returns 6 when time > 10, otherwise 7 */
    return (temp2);
}
Citation: www.genetic-programming.com/c2003lecture1modified.ppt
37
Program tree
Citation: www.genetic-programming.com/c2003lecture1modified.ppt
38
Given data
Input: Independent variable X Output: Dependent variable Y
-1.00 1.00
-0.80 0.84
-0.60 0.76
-0.40 0.76
-0.20 0.84
0.00 1.00
0.20 1.24
0.40 1.56
0.60 1.96
0.80 2.44
1.00 3.00
Citation: www.genetic-programming.com/c2003lecture1modified.ppt
39
Problem description
Objective: Find a computer program with one
input (independent variable X) whose
output Y equals the given data
Candidate programs: x + 1,   x² + 1,   2,   x
Mutation:
[Figure: picking “2” as the mutation point.]
43
Citation: part of the pictures used as examples are taken from: www.genetic-programming.com/c2003lecture1modified.ppt
Crossover
Crossover: picking the “+” subtree of parent (a) and the leftmost “x” of parent (b) as crossover points.
[Figures: a copy of (a); a mutant of (c), obtained by picking “2” as the mutation point; the first and second offspring of the crossover of (a) and (b), obtained by picking “+” of parent (a) and the leftmost “x” of parent (b) as crossover points.]
45
Citation: part of the examples is taken from: www.genetic-programming.com/c2003lecture1modified.ppt
Fitness of each candidate program = sum of absolute errors against the given data:

X       Y       x+1    |x+1-Y|    1      |1-Y|    x       |x-Y|    x²+x+1   |x²+x+1-Y|
-1.00   1.00    0.00   1.00       1.00   0.00     -1.00   2.00     1.00     0.00
-0.80   0.84    0.20   0.64       1.00   0.16     -0.80   1.64     0.84     0.00
-0.60   0.76    0.40   0.36       1.00   0.24     -0.60   1.36     0.76     0.00
-0.40   0.76    0.60   0.16       1.00   0.24     -0.40   1.16     0.76     0.00
-0.20   0.84    0.80   0.04       1.00   0.16     -0.20   1.04     0.84     0.00
 0.00   1.00    1.00   0.00       1.00   0.00      0.00   1.00     1.00     0.00
 0.20   1.24    1.20   0.04       1.00   0.24      0.20   1.04     1.24     0.00
 0.40   1.56    1.40   0.16       1.00   0.56      0.40   1.16     1.56     0.00
 0.60   1.96    1.60   0.36       1.00   0.96      0.60   1.36     1.96     0.00
 0.80   2.44    1.80   0.64       1.00   1.44      0.80   1.64     2.44     0.00
 1.00   3.00    2.00   1.00       1.00   2.00      1.00   2.00     3.00     0.00
                       Σ 4.40            Σ 6.00            Σ 15.40          Σ 0.00
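The fitness in the table above is just the sum of absolute errors over the given data points; a small C sketch (assumed, not from the slides, with illustrative function names) that scores the four candidate programs:

#include <stdio.h>
#include <math.h>

#define N 11
static const double X[N] = {-1.0,-0.8,-0.6,-0.4,-0.2,0.0,0.2,0.4,0.6,0.8,1.0};
static const double Y[N] = { 1.0,0.84,0.76,0.76,0.84,1.0,1.24,1.56,1.96,2.44,3.0};

/* Candidate programs evolved by GP. */
static double f_xp1(double x)  { return x + 1.0; }
static double f_one(double x)  { (void)x; return 1.0; }
static double f_x(double x)    { return x; }
static double f_full(double x) { return x * x + x + 1.0; }

/* Fitness = sum of |program(x) - y| over the data; smaller is better. */
static double fitness(double (*prog)(double)) {
    double err = 0.0;
    for (int i = 0; i < N; i++) err += fabs(prog(X[i]) - Y[i]);
    return err;
}

int main(void) {
    printf("x+1       : %.2f\n", fitness(f_xp1));   /*  4.40 */
    printf("1         : %.2f\n", fitness(f_one));   /*  6.00 */
    printf("x         : %.2f\n", fitness(f_x));     /* 15.40 */
    printf("x*x+x+1   : %.2f\n", fitness(f_full));  /*  0.00 */
    return 0;
}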
A rule consists of an Antecedent (the IF part) and a Consequence (the THEN part).
48
Formula representation
Possible rule:
If (NOC = 2) AND ( S > 80000) then GOOD (customer)
Tree representation – Formula: AND( NOC = 2, S > 80000 );  Class: GOOD
Citation: the example is taken from Prof. Anita Wasilewska's previous years' course slides
49
Initial data table
No.   Number of children (NOC)   Salary (S)   Type of customer (C)
1     2                          > 80000      GOOD
2     1                          > 30000      GOOD
3     0                          = 50000      GOOD
4     > 2                        < 10000      BAD
5     = 10                       = 30000      BAD
6     = 5                        < 30000      BAD
50
Initial data (written as rules inferred
from the initial table)
Rule 1: If (NOC = 2) AND ( S > 80000) then C = GOOD
Rule 2: If (NOC = 1) AND ( S > 30000) then C = GOOD
Rule 3: If (NOC = 0) AND ( S = 50000) then C = GOOD
Rule 4: If (NOC > 2) AND ( S < 10000) then C = BAD
Rule 5: If (NOC = 10) AND ( S = 30000) then C = BAD
Rule 6: If (NOC = 5) AND ( S < 30000) then C = BAD
51
Generation 0
Population of 3 randomly created individuals:
If (NOC > 3) AND ( S > 10000) then C = GOOD
If (NOC > 1) AND ( S > 30000) then C = GOOD
If (NOC >= 0) AND ( S < 40000) then C = GOOD
52
Generation 0
Individual 1 – tree representation: an AND node with children (NOC > 3) and (S > 10000), i.e. (NOC > 3) AND ( S > 10000)
53
Fitness function
55
Mutation
56
Crossover
[Tree figures: crossover between two rule trees over NOC and S conditions.]
(NOC > 1) AND ( S > 30000)
(NOC > 1) AND ( S < 40000)
Crossover
[Tree figures: the two trees resulting from the crossover.]
(NOC > 1) AND ( S < 40000)
(NOC >= 0) AND ( S > 30000)
Individual 3 – tree representation: an AND node with children (NOC > 0) and (S < 90000), i.e. (NOC > 0) AND ( S < 90000)
58
Fitness function – Generation 1
Rule 1: If (NOC = 2) AND ( S > 80000) then GOOD
Rule 2: If (NOC = 1) AND ( S > 30000) then GOOD
Rule 3: If (NOC = 0) AND ( S = 50000) then GOOD
Rule 4: If (NOC > 2) AND ( S < 10000) then BAD
Rule 5: If (NOC = 10) AND ( S = 30000) then BAD
Rule 6: If (NOC = 5) AND ( S < 30000) then BAD
59
GA Operators on Rules – Flockhart's paper approach
60
I.W. Flockhart and N.J. Radcliffe. “GA-MINER: parallel data
mining with hierarchical genetic algorithms - final report”.
EPCC-AIKMS-GAMINER -Report 1.0. University of
Edinburgh, UK, 1995.
http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/cache/papers/cs/3487/
http:zSzzSzwww.quadstone.co.ukzSz~ianzSzaikmszSzreport.pdf/flockhart95gaminer.pdf
61
From rules to subset descriptions
Step 1: We have the following rules, which describe part of the data table:
Rule 1: A1 => C
Rule 2: A2 => C
…
Rule n: An => C
Step 2: (A1 U A2 … U An) => C
Step 3: We look only at the antecedent to get
the subset description:
(A1 U A2 … U An)
62
Part of the data table. An example
No.   Age        Hobby     Class C
1     20 .. 30   dancing   GOOD
2     25 .. 55   reading   GOOD
64
Subset description
[Tree: an OR node over two Clauses; each Clause is an AND node over Terms.]
Clause 1: Age = 20 .. 30 and Hobby = dancing
Clause 2: Age = 25 .. 55 and Hobby = reading
65
Subset description
Chromosomes are represented as subset descriptions. A subset description is a disjunction (or) of clauses, and each clause is a conjunction (and) of attribute-value or attribute-range constraints (terms):
Subset Description: {Clause} [or Clause]
Clause: {Term} [and Term]
Term: Attribute in Value Set
| Attribute in Range
E.g.: {Age = 20 .. 30 and Hobby = dancing} or {Age =
25 .. 55 and Hobby = reading}
66
Crossover
Apply crossover at all levels, successively:
Subset description crossover
Clause crossover (uniform or single-point)
Term crossover
67
Subset description crossover
[Figure: crossover applied to the description Clause A1 OR Clause A2 OR Clause A3; the branches are taken with probability rBias % and (1 – rBias) %.]
68
Subset description crossover
Consider the following 2 descriptors
(chromosomes):
A : Clause A1 or Clause A2 or Clause A3
B : Clause B1 or Clause B2 or Clause B4
[Figure: the clauses of A and B are paired and recombined by term crossover, with probabilities rBias % and (1 – rBias) %.]
70
Uniform clause crossover
Consider the clauses:
A : Age = 20 .. 30 and Height = 1.5 .. 2.0
B : Hobby = dancing and Age = 0 .. 25
[Figure: each term of the child clause is taken from one of the two parents; e.g. the child may take Age = 0 .. 25 from B with probability (1 – rBias) %.]
74
Term crossover – range terms
[Figure: parent ranges Age = 20 .. 30 and Age = 0 .. 25; each limit of the child range is taken from the first parent with probability rBias % and from the second with probability (1 – rBias) %.]
75
Term crossover
Used to combine two terms concerning the same
attribute.
Consider the clauses:
A : Hobby = dancing, singing and Age = 20 .. 30
B : Hobby = hiking, dancing and Age = 0 .. 25
How to form child:
Value terms:
Include values common to both parents: e.g.: dancing
Include values unique to one parent with a probability:
e.g.: rBias for singing and 1-rBias for hiking
Range terms:
Select low and high limit with a probability:
Low limit for Age: rBias for value 20 and 1-rBias for value 0
High limit for Age: rBias for value 30 and 1-rBias for value 25
Later prune (discard) invalid ranges; a small code sketch of this limit selection follows below.
76
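A sketch (an assumption, not taken from the GA-MINER paper itself) of the limit-selection rule for range terms described above: each limit of the child range comes from the first parent with probability rBias and from the second with probability 1 – rBias, and invalid ranges (low > high) are pruned. The type Range and the functions coin and range_crossover are hypothetical names.

#include <stdio.h>
#include <stdlib.h>

typedef struct { double low, high; } Range;   /* a range term, e.g. Age = 20 .. 30 */

/* Return 1 with probability p, 0 otherwise. */
static int coin(double p) { return ((double)rand() / RAND_MAX) < p; }

/* Range-term crossover: pick each limit from parent a (prob. rBias) or parent b. */
static int range_crossover(Range a, Range b, double rBias, Range *child) {
    child->low  = coin(rBias) ? a.low  : b.low;
    child->high = coin(rBias) ? a.high : b.high;
    return child->low <= child->high;          /* 0 = invalid range, prune it */
}

int main(void) {
    Range a = {20, 30};        /* Age = 20 .. 30 (parent A) */
    Range b = { 0, 25};        /* Age =  0 .. 25 (parent B) */
    Range child;
    srand(11);
    if (range_crossover(a, b, 0.7, &child))
        printf("child: Age = %.0f .. %.0f\n", child.low, child.high);
    else
        printf("invalid range pruned\n");
    return 0;
}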
Mutation
Apply mutation at all levels, successively:
Subset description mutation
Clause mutation
Term mutation
77
Subset description mutation
[Figure: mutation applied to the description Clause A1 OR Clause A2 OR Clause A3; term mutation is applied inside the clauses.]
81
Term mutation - Value
Term: Hobby = dancing
Do term mutation? – decided with probability rMutTerm %.
If yes: attribute mutation with probability rAvr %, or value mutation with probability (1 – rAvr) %.
82
Term mutation - Range
Term: Age = 10 .. 50
Do term mutation? – decided with probability rMutTerm %.
If yes: attribute mutation with probability rAvr %, or range mutation with probability (1 – rAvr) %.
83
Term mutation
First decide, with probability rMutTerm, whether to mutate this term at all.
If term mutation is decided, perform either attribute mutation or value/range mutation, chosen with probability rAvr / (1 – rAvr).
Consider the following term: Hobby = dancing
Attribute mutation: randomly choose another attribute
available, e.g. occupation, and a random value for it: e.g.
student. New term: occupation = student
Value mutation: randomly choose another value for current
attribute. E.g.: swimming. New term: Hobby = swimming
Consider the following term: Age = 10 .. 50
Range mutation: randomly choose another range for
current attribute. E.g.: 3 .. 25. New term: Age = 3 .. 25
84
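A sketch (assumed, not from the paper) of the term-mutation decision just described for a value term: with probability rMutTerm the term is mutated at all; if it is, either its attribute or its value is replaced, governed by rAvr. The function names are hypothetical, and the "random" replacements are hard-coded to the slide's own examples (Occupation = student, swimming) for clarity.

#include <stdio.h>
#include <stdlib.h>

static int coin(double p) { return ((double)rand() / RAND_MAX) < p; }

/* Mutate a value term such as "Hobby = dancing" following the decision tree above. */
static void mutate_value_term(const char **attr, const char **value,
                              double rMutTerm, double rAvr) {
    if (!coin(rMutTerm)) return;          /* no term mutation this time            */
    if (coin(rAvr)) {                     /* attribute mutation: another attribute */
        *attr  = "Occupation";            /*   and a (here fixed) value for it     */
        *value = "student";
    } else {                              /* value mutation: another value for the */
        *value = "swimming";              /*   current attribute                   */
    }
}

int main(void) {
    const char *attr = "Hobby", *value = "dancing";
    srand(5);
    mutate_value_term(&attr, &value, 0.3, 0.5);
    printf("term after mutation: %s = %s\n", attr, value);
    return 0;
}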