Genetic algorithms [1] are search and optimization
algorithms based on the principles of natural evolution, which
were first introduced by John Holland in the 1970s. Genetic
algorithms implement optimization strategies by simulating
the evolution of species through natural selection.
A genetic algorithm is generally composed of two processes.
The first is the selection of individuals for the production of
the next generation; the second is the manipulation of the
selected individuals to form the next generation by crossover
and mutation techniques [2]. The selection mechanism
determines which individuals are chosen for reproduction and
how many offspring each selected individual produces. The
main principle of the selection strategy is that the better an
individual is, the higher its chance of being a parent.
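The selection principle above, where fitter individuals have a
higher chance of becoming parents, can be illustrated with
fitness-proportionate (roulette-wheel) selection. The sketch below
shows one common scheme and is an assumption made for illustration,
since the text does not name a specific selection method; the function
name `select_parents` and the toy fitness values are hypothetical.

```python
import random


def select_parents(population, fitness, k, rng=random):
    """Draw k parents with probability proportional to fitness.

    Assumes every fitness value is non-negative and at least one is
    positive. An individual may be drawn several times, which is how a
    fitter individual ends up contributing more offspring.
    """
    return rng.choices(population, weights=fitness, k=k)


if __name__ == "__main__":
    rng = random.Random(0)
    population = ["A", "B", "C", "D"]   # hypothetical individuals
    fitness = [1.0, 2.0, 3.0, 14.0]     # hypothetical fitness values
    parents = select_parents(population, fitness, k=1000, rng=rng)
    # "D" holds 70% of the total fitness, so it should dominate the draw.
    for individual in population:
        print(individual, parents.count(individual))
```

Because parents are drawn with replacement, the same call decides both
which individuals reproduce and roughly how many offspring each one
contributes.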
Figure 1. Flowchart of GA System
Genetic algorithms (GAs) are search algorithms based on the
principles of natural selection and genetics, introduced by
J. Holland in the 1970s and inspired by the biological evolution
of living beings. Genetic algorithms abstract the problem space
as a population of individuals and try to find the fittest
individual by producing generations iteratively. A GA evolves a
population of initial individuals into a population of
high-quality individuals, where each individual represents a
solution to the problem to be solved. The quality of each
individual is measured by a fitness function, which quantifies
the individual's adaptation to a certain environment. The
procedure starts from an initial population of randomly
generated individuals. During each generation, three basic
genetic operators are sequentially applied to each individual
with certain probabilities: selection, crossover and
mutation [3].

GAs are computer programs that simulate the heredity and
evolution of living organisms [3]. Because they are multi-point
search methods, GAs can find an optimum solution even for
multimodal objective functions. GAs are also applicable to
discrete search space problems. Thus, a GA is not only very easy
to use but also a very powerful optimization tool [4]. In a GA,
the search space consists of strings, each of which represents a
candidate solution to the problem; these strings are termed
chromosomes. The objective function value of each chromosome is
called its fitness value. A population is a set of chromosomes
along with their associated fitness values. Generations are the
populations generated in successive iterations of the GA [5].
The genetic algorithm procedure for searching a space of
candidate solutions to identify the best one is as follows:

Procedure:
{
1. [Start] Generate random population of n chromosomes
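
The generational loop described above (a random initial population,
then selection, crossover and mutation applied in each generation) can
be sketched as follows. It is a minimal illustration under stated
assumptions rather than the exact procedure listed here: the
bit-counting fitness (the OneMax toy problem), fitness-proportionate
selection, single-point crossover, per-bit mutation, and all names and
parameter values (`run_ga`, `p_crossover`, `p_mutation` and so on) are
hypothetical choices.

```python
import random


def fitness(chromosome):
    """Toy objective (OneMax): number of 1 bits. Assumed for illustration."""
    return sum(chromosome)


def crossover(a, b, rng):
    """Single-point crossover: swap the tails of two parents at a random cut."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(chromosome, p_mutation, rng):
    """Flip each bit independently with probability p_mutation."""
    return [bit ^ 1 if rng.random() < p_mutation else bit for bit in chromosome]


def run_ga(n=30, length=20, generations=60, p_crossover=0.7,
           p_mutation=0.01, seed=0):
    """Evolve n bit strings (n assumed even) and return the best one seen."""
    rng = random.Random(seed)
    # Start: random population of n chromosomes (bit strings).
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Selection: parents drawn with probability proportional to fitness.
        # The +1 keeps weights positive even if every chromosome scores 0.
        weights = [fitness(c) + 1 for c in population]
        parents = rng.choices(population, weights=weights, k=n)
        # Crossover and mutation build the next generation pair by pair.
        next_population = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < p_crossover:
                a, b = crossover(a, b, rng)
            next_population.append(mutate(a, p_mutation, rng))
            next_population.append(mutate(b, p_mutation, rng))
        population = next_population
        # Track the best chromosome seen so far.
        best = max(population + [best], key=fitness)
    return best


if __name__ == "__main__":
    best = run_ga()
    print("best fitness:", fitness(best), "of", len(best))
```

Fixing `seed` makes a run repeatable, which helps when comparing
different operator probabilities on the same problem.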
two parents. Therefore, it contains the bits that were not used
by the first offspring.

2) Two-Point Crossover
In two-point crossover, offspring are created by substituting
intermediate segments of one parent into the middle of the
second parent string. Put another way, the crossover mask is a
string beginning with n0 zeros, followed by a contiguous string
of n1 ones, followed by the necessary number of zeros to
complete the string. Each time the two-point crossover operator
is applied, a mask is generated by randomly choosing the
integers n0 and n1. For instance, in the example shown in
Figure 3.10 the offspring are created using a mask for which
n0 = 2 and n1 = 5. Again, the two offspring are created by
switching the roles played by the two parents.

3) Uniform Crossover
Uniform crossover combines bits sampled uniformly from the two
parents, as illustrated in Figure 3. In this case the crossover
mask is generated as a random bit string with each bit chosen at
random and independent of the others.

Figure 3. Crossover and Mutation Operations

C. Mutation Operations
In addition to recombination operators that produce offspring by
combining parts of two parents, a second type of operator
produces offspring from a single parent. In particular, the
mutation operator produces small random changes to the bit
string by choosing a single bit at random, then changing its
value. Mutation is often performed after crossover as in
Figure 3.

IV. FITNESS FUNCTION
The fitness function defines the criterion for ranking potential
hypotheses and for probabilistically selecting them for
inclusion in the next generation population. If the task is to
learn classification rules, then the fitness function typically
has a component that scores the classification accuracy of the
rule over a set of provided training examples. Often other
criteria may be included as well, such as the complexity or
generality of the rule. More generally, when the bit-string
hypothesis is interpreted as a complex procedure (e.g., when the
bit string represents a collection of if-then rules that will be
chained together to control a robotic device), the fitness
function may measure the overall performance of the resulting
procedure rather than the performance of individual rules.

V. REVIEW OF LITERATURE
A novel hybrid genetic K-means algorithm (GKA) was proposed in
[6] to find a globally optimal partition of a given data set
into a specified number of clusters. The proposed GA circumvents
the expensive crossover operator used to generate valid child
chromosomes from parent chromosomes. It hybridizes the GA with a
classical gradient descent algorithm used in clustering, viz.
the K-means algorithm. In GKA, a K-means operator was defined
and used as a search operator instead of crossover. A biased
mutation operator specific to clustering, called distance-based
mutation, was also defined. The authors used finite Markov chain
theory to prove that the proposed GKA converges to the global
optimum. It was also observed that GKA searches faster than some
of the other evolutionary algorithms used for clustering.

An improved version of GKA, known as the Fast Genetic K-means
Algorithm (FGKA), was proposed in [7]. The proposed GA featured
several improvements over GKA. The experiments in [6][7] showed
that, while the K-means algorithm might converge to a local
optimum, both FGKA and GKA always converge to the global
optimum. FGKA initializes the population to P0, obtains the next
population by applying selection, crossover and mutation
operators, and keeps evolving until some termination condition
is met. Illegal strings are permitted in FGKA during the
initialization phase, but are treated as the most undesirable
solutions by defining their total within-cluster variation
(TWCV) as infinity (+∞). By allowing illegal strings, the
overhead of handling illegal strings in the evolution process
was avoided, which improved the time performance of the
algorithm compared with GKA.

The Incremental Genetic K-means Algorithm (IGKA), proposed in
[8], is an extension of the previously proposed clustering
algorithm, the Fast Genetic K-means Algorithm (FGKA). IGKA was
found to perform better when the mutation probability was small.
IGKA is based on calculating the Total Within-Cluster Variation
(TWCV) and the cluster centroids incrementally whenever the
mutation probability is small for the clustering task. Like
FGKA, IGKA always converges to the global optimum.
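
The total within-cluster variation (TWCV) that FGKA and IGKA use to
rate a chromosome can be written out concretely. The sketch below
assumes the usual definition (the sum of squared Euclidean distances
from each point to the centroid of its assigned cluster) and assumes
that an illegal string is one that leaves at least one of the K
clusters empty; neither definition is spelled out in the text above,
and the function name `twcv` and the sample data are hypothetical.

```python
import math


def twcv(points, assignment, k):
    """Total within-cluster variation of a cluster assignment.

    points: list of equal-length numeric tuples.
    assignment: assignment[i] is the cluster index (0..k-1) of points[i].
    Returns math.inf when any cluster is empty (the assumed meaning of
    "illegal string"), so such a chromosome ranks as the worst solution.
    """
    dims = len(points[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for point, cluster in zip(points, assignment):
        counts[cluster] += 1
        for d in range(dims):
            sums[cluster][d] += point[d]
    if 0 in counts:
        return math.inf
    centroids = [[s / counts[c] for s in sums[c]] for c in range(k)]
    return sum(
        (point[d] - centroids[cluster][d]) ** 2
        for point, cluster in zip(points, assignment)
        for d in range(dims)
    )


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    print(twcv(points, [0, 0, 1, 1], k=2))  # tight clusters, small TWCV
    print(twcv(points, [0, 1, 0, 1], k=2))  # poor grouping, larger TWCV
    print(twcv(points, [0, 0, 0, 0], k=2))  # cluster 1 is empty
```

A lower TWCV means a better partition, so a GA that maximizes fitness
would need to map TWCV to a fitness value in which smaller variation
scores higher.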
A GA-based unsupervised clustering technique was proposed in
[9], which selects cluster centers directly from the data set,
thus speeding up the fitness evaluation process by constructing
a look-up table in advance and saving the distances between all
pairs of data points. Binary representation rather than string
representation is used to encode a variable number of cluster
centers, and more effective operators for selection, crossover,
and mutation were introduced.

A novel clustering algorithm for mixed data was proposed in
[10]. Most of the existing clustering algorithms were efficient
only for numeric data rather than mixed data sets, but the
proposed GA worked efficiently for data sets with mixed values
by modifying the common cost function.

A hybrid genetic-based clustering algorithm, called
HGA-clustering, was proposed in [11] to explore the proper
clustering of data sets. This algorithm, with the cooperation of
tabu list and aspiration criteria, has achieved harmony between
population diversity and convergence speed.

A genetic algorithm was proposed in [12] which designed a
dissimilarity measure, termed the Genetic Distance Measure
(GDM), to improve the performance of the K-modes algorithm,
which is an extension of K-means.

A novel approach combining parallel indexing of the color and
feature extraction of images with a genetic algorithm was
implemented in [13]. Its main functionality is image-to-image
matching and its intended use is still-image retrieval. The
evaluation criteria are provided by the GA and have been
successfully employed as a measure to evaluate the efficacy of
the content-based image retrieval process.

In [14], P.R. Srivastava and Tai presented a method for
optimizing software testing efficiency by identifying the most
critical path clusters in a program. The software under test
(SUT) is converted into a control flow graph (CFG). Weights are
assigned to the edges of the CFG by applying the 80-20 rule:
80 percent of the weight of the incoming credit is given to
loops and branches, and the remaining 20 percent is given to the
edges in the sequential path. The sum of the weights along the
edges comprising a path determines the criticality of the path.
The higher the sum, the more critical the path, and the earlier
it must be tested. By identifying the most critical paths, which
must be tested first, testing efficiency is increased.

In [15], Zhao used a neural network and a GA for functional
testing. The neural network is used to create a model that can
be taken as a function substitute for the SUT. The emphasis is
given to the outputs which exhibit the [...] generated from the
created model are fed to the GA, which is used to find the
corresponding inputs, so that the automation of test case
generation from the output domain is completed.

[16] introduces a new algorithm based on the traditional genetic
algorithm. The new algorithm makes some improvements over the
traditional GA: by introducing a genetic selection strategy, it
decreases the possibility of being trapped in a local optimum.
Compared with the traditional genetic algorithm, the new
algorithm enlarges the search space, and its complexity is not
high. From the test results on benchmark function optimization,
it is concluded that the new algorithm is more efficient than
the traditional genetic algorithm in optimization precision. The
authors also used the new algorithm for data classification, and
the experimental results showed that it outperforms KNN with
greater [...]

AI and GAs have proved useful in managing groundwater supplies.
[17] used GAs to fit the parameters of a model to optimize
pumping locations and schedules for groundwater treatment. They
then combined the GA with a neural network (NN) to model the
complex response functions within the GA [18]. Then [19]
combined Simulated Annealing (SA) and GAs to maximize efficiency
and make good use of the easily applied parallel nature of the
GA. Most recently, [20] together with Peralta used a Pareto GA
to sort optimal solutions for managing surface and groundwater
supplies, together with a fuzzy-penalty function, while using an
Artificial Neural Network (ANN) to model the complex aquifer
systems in the groundwater system responses.

Evolutionary methods have also found their way into
oceanographic experimental design. [21] showed that a genetic
algorithm is faster than simulated annealing and more accurate
than a problem-specific method for optimizing the design of an
oceanographic experiment. [22] found that an evolutionary
programming strategy was more robust than traditional methods
for locating an array of sensors in the ocean after they have
drifted from their initial deployment location.

Genetic algorithms have proved effective at searching complex,
real-world problem spaces. Genetic algorithms adapt to their
environments, which makes this type of method suitable for
changing environments. At present these algorithms are
increasingly applicable. Several improvements must still be made
before GAs can be more generally applicable.
