Abstract
Neural networks are one of many data mining analytical tools that can be utilized to make predictions for
medical data. Model selection for a neural network entails various factors such as selection of the optimal
number of hidden nodes, selection of the relevant input variables and selection of optimal connection
weights. This paper presents the application of a hybrid model that integrates a Genetic Algorithm (GA) and a
Back Propagation Network (BPN), where the GA is used to initialize and optimize the connection weights of the BPN.
Significant features identified by two methods, decision tree and GA-CFS, are used as
input to the hybrid model to diagnose diabetes mellitus. The results show that the GA-optimized BPN
approach outperforms the BPN approach without GA optimization. In addition, the hybrid GA-BPN
with relevant inputs leads to further improved classification accuracy compared to the results produced by
GA-BPN alone with some redundant inputs.
KEYWORDS
Back Propagation Network, Genetic Algorithm, connection weight optimization.
1. INTRODUCTION
With the computerization of hospitals, a huge amount of data is collected. Although human
decision-making is often optimal, it is poor when there are huge amounts of data to be classified.
Medical data mining has great potential for exploring hidden patterns in data sets of the medical
domain. These patterns can be used for clinical diagnosis. Neural networks are one of many data
mining analytical tools that can be utilized to make predictions for medical data. BPN uses a
gradient-based approach which either trains slowly or gets stuck in a local minimum. Instead of
gradient-based learning techniques, one may apply commonly used optimization
methods such as Genetic Algorithms (GAs), Particle Swarm Optimization (PSO) and Ant Colony
Optimization to find the network weights. GA is a stochastic general search method, capable of
effectively exploring large search spaces, and has been used with the Back Propagation Network
(BPN) for determining various parameters such as the number of hidden nodes and hidden
DOI : 10.5121/ijsc.2011.2202
International Journal on Soft Computing ( IJSC ), Vol.2, No.2, May 2011
layers, selecting relevant feature subsets, the learning rate and the momentum, and for initializing and
optimizing the network connection weights. This paper presents the application of a hybrid model
that integrates a Genetic Algorithm and a BPN for diagnosis on the Pima Indians Diabetes Database by
finding the optimal network connection weights. For the sake of completeness, BPN and GA are
explained in sections 2 and 3 respectively. Section 4 elaborates the hybrid GA-BPN model
and its applications in diverse fields. High-dimensional data not only confuses the classifier but
also increases the testing and training time of the BPN. Hence, significant features have been identified by
two different methods: decision tree and correlation-based feature selection. Feature selection
is explained in section 5 and the data model used for the experiment is discussed in
section 6, followed by results and conclusions in sections 7 and 8 respectively.
In each generation, the population is evaluated using a fitness function. Next comes the selection
process, wherein high-fitness chromosomes are used to eliminate low-fitness chromosomes.
The commonly used methods for reproduction or selection are roulette-wheel selection,
Boltzmann selection, tournament selection, rank selection and steady-state selection. However,
selection alone does not introduce any new individuals into the population. Hence selection is
followed by crossover and mutation operations. Crossover is the process by which two selected
chromosomes with high fitness values exchange parts of their genes to generate a new pair of
chromosomes. Crossover tends to push the evolutionary process toward
promising regions of the solution space. The types of crossover used by and large are one-point
crossover, two-point crossover, uniform crossover, multipoint crossover and average
crossover. Mutation is a random change of the value of a gene, which is used to prevent
premature convergence to local optima. The major ways mutation is accomplished are random bit
mutation, random gene mutation, creep mutation, and heuristic mutation. The newly generated
population undergoes further selection, crossover and mutation until the termination criterion is
satisfied. Convergence of the genetic algorithm depends on various criteria, such as the
fitness value achieved or the number of generations [6-7].
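The generational cycle described above can be sketched as a minimal GA loop. The operator choices below (roulette-wheel selection, one-point crossover, creep-style gene mutation) follow the options named in the text, but the rates, population handling and example fitness function are illustrative assumptions, not the paper's exact settings:

```python
import random

def roulette(pop, scores):
    # Roulette-wheel selection: pick a chromosome with probability
    # proportional to its fitness.
    r = random.uniform(0, sum(scores))
    acc = 0.0
    for chrom, s in zip(pop, scores):
        acc += s
        if acc >= r:
            return chrom
    return pop[-1]

def one_point_crossover(a, b):
    # Exchange gene tails after a random cut point.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chrom, rate=0.05):
    # Creep-style mutation: perturb each gene with small probability.
    return [g + random.gauss(0, 0.1) if random.random() < rate else g
            for g in chrom]

def evolve(fitness, pop, generations=100):
    # Repeat evaluation, selection, crossover and mutation until the
    # termination criterion (here: a fixed generation count) is met.
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        nxt = []
        while len(nxt) < len(pop):
            p1, p2 = roulette(pop, scores), roulette(pop, scores)
            c1, c2 = one_point_crossover(p1, p2)
            nxt.extend([mutate(c1), mutate(c2)])
        pop = nxt[:len(pop)]
    return max(pop, key=fitness)
```

A fixed generation count is used here for simplicity; a convergence test on the spread of fitness values would serve equally well as the stopping rule.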
The hybrid GA-ANN has been used in diverse applications. GA has been used to search for
optimal hidden-layer architectures, connectivity, and training parameters (learning rate and
momentum) for an ANN predicting community-acquired pneumonia among patients
with respiratory complaints [15]. GA has been used to initialize and optimize the connection
weights of an ANN to improve its performance, applied to the medical problem of
predicting stroke disease [16]. GA has been used to optimize ANN parameters, namely the
learning rate, momentum coefficient, activation function, number of hidden layers and number
of nodes, for worker assignment into Virtual Manufacturing Cells (VMC) [17]. A GA-
ANN model has been experimented with for the study of the heat transport characteristics of a
nanofluid thermosyphon in a magnetic field, where GA is used to optimize the number of
neurons in the hidden layer, the coefficient of the learning rate and the momentum of the ANN [18].
The current paper illustrates the application of GA for initializing and optimizing the connection
weights of the BPN, experimented on the PIMA dataset. The foremost step of the GA is the
representation of the chromosome. For a BPN with a single hidden layer of m nodes, n input
nodes and p output nodes, the number of weights to be computed is given by (n+p)*m. Each
chromosome is made up of (n+p)*m genes.
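For the topologies reported later (8-20-1, 5-15-1 and 4-10-1), the chromosome length follows directly from this formula; note that bias terms are not counted in (n+p)*m:

```python
def chromosome_length(n_inputs, m_hidden, p_outputs):
    # Weights into the hidden layer (n*m) plus weights out of it (m*p);
    # bias terms are not included in the paper's (n+p)*m count.
    return (n_inputs + p_outputs) * m_hidden

print(chromosome_length(8, 20, 1))  # → 180
```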
Genes are represented by the real-number encoding method. The original population is a set of N
chromosomes, generated randomly. The fitness of each chromosome is computed by a
minimum optimization method: fitness is given by Fitness(Ci) = 1/E for each chromosome Ci of
the population, where E is the error computed as the sum of squared errors at the output layer, as
shown in equation 1, where the summation is performed over all training patterns p and output
nodes j, and t_pj is the desired or target value of output o_pj for a given input vector.
E = (1/2) ∑_p ∑_j ( t_pj − o_pj )^2    (1)
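Equation 1 and the fitness Fitness(Ci) = 1/E can be evaluated directly from a chromosome's network outputs; the target/output arrays below are illustrative:

```python
def error(targets, outputs):
    # E = (1/2) * sum over patterns p and output nodes j of (t_pj - o_pj)^2
    return 0.5 * sum((t - o) ** 2
                     for t_row, o_row in zip(targets, outputs)
                     for t, o in zip(t_row, o_row))

def fitness(targets, outputs):
    # Fitness(Ci) = 1/E; guard against a chromosome with zero error.
    e = error(targets, outputs)
    return float('inf') if e == 0 else 1.0 / e
```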
Once fitness is computed for all the chromosomes, the best-fit chromosomes replace the worst-fit
chromosomes. The crossover step is then experimented with using single-point crossover, two-point
crossover and multipoint crossover. In addition, a new type of crossover called mixed crossover
has been used: for a given number of generations M, the first 60% of the generations use
multipoint crossover, the next 20% use two-point crossover and the remaining generations use
one-point crossover. Finally, mutation is applied as the last step to generate the new
population. The new population is given as input to the BPN to compute the fitness of each
chromosome, followed by the process of selection, reproduction, crossover and mutation to
generate the next population. This process is repeated until more or less all the chromosomes
converge to the same fitness value. The weights represented by the chromosomes in the final
converged population are the optimized connection weights of the BPN. The working of the hybrid
GA-ANN for optimizing connection weights is shown in figure 1.
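For a run of M generations, the mixed-crossover schedule reduces to a lookup on the generation index. This sketch returns operator tags rather than implementing the operators themselves:

```python
def crossover_type(generation, total_generations):
    # Mixed crossover schedule: first 60% of generations use multipoint
    # crossover, the next 20% two-point, and the remaining 20% one-point.
    frac = generation / total_generations
    if frac < 0.6:
        return 'multipoint'
    if frac < 0.8:
        return 'two-point'
    return 'one-point'
```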
5. FEATURE SELECTION
Feature subset selection is of great importance in the field of data mining. High-dimensional
data makes testing and training of general classification methods difficult. Feature selection is an
essential pre-processing method to remove irrelevant and redundant data. It can be applied in both
unsupervised and supervised learning. In supervised learning, feature selection aims to maximize
classification accuracy. The goal of feature selection for unsupervised learning is to find the
smallest feature subset that best uncovers clusters from data according to the preferred criterion
[19]. The authors have used two approaches, namely decision tree and correlation-based feature
selection, for identifying relevant features. A decision tree is a simple tree-like structure where non-
terminal nodes represent tests on one or more attributes and terminal nodes reflect decision
outcomes. The non-terminal nodes are taken as relevant features. The basic decision tree
induction algorithm ID3 was enhanced by C4.5. The WEKA classifier package has its own
version of C4.5 known as J4.8. In the first method adopted for attribute selection, the authors
used J4.8 to identify the significant attributes [20]. In the second method, the authors used
GA and correlation-based feature selection (CFS) in a cascaded fashion, where the GA rendered a
global search of attributes with fitness evaluation effected by CFS: the genetic algorithm is used as the
search method with correlation-based feature selection as the subset-evaluating mechanism [21].
Experimental results show that the feature subsets selected by the CFS filter resulted in a marginal
improvement in back propagation neural network classification accuracy compared to the
feature subset selected by the DT for the PIMA dataset [22].
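CFS scores a candidate subset by rewarding feature-class correlation and penalizing feature-feature redundancy, and a GA can search subsets using this merit as its fitness. A minimal sketch of Hall's merit function follows, using Pearson correlation as a stand-in for the symmetrical-uncertainty measure that WEKA's CFS actually uses:

```python
import math

def pearson(x, y):
    # Pearson correlation coefficient between two equal-length sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cfs_merit(features, target):
    # CFS merit: k*mean|r_cf| / sqrt(k + k*(k-1)*mean|r_ff|), where r_cf
    # is feature-class correlation and r_ff feature-feature correlation.
    k = len(features)
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    if k == 1:
        return r_cf
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = sum(abs(pearson(features[i], features[j]))
               for i, j in pairs) / len(pairs)
    return (k * r_cf) / math.sqrt(k + k * (k - 1) * r_ff)
```

Adding a second, partly redundant feature to a subset can lower the merit even when that feature correlates with the class, which is exactly the behaviour that steers the GA toward small, non-redundant subsets.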
7. RESULTS
The number of generations, population size, and number of input-layer nodes used with different
numbers of hidden nodes, as experimented for the PIMA dataset, are shown in Table 1.
Among the various topologies experimented with, the best performance of GA-BPN with the 4 types of
crossover is obtained for the 8-20-1 topology with all 8 inputs, for the 5-15-1 topology with the 5
inputs (plasma, diastolic blood pressure, body mass index, diabetes pedigree function and age)
identified by DT, and for the 4-10-1 topology with the 4 inputs (plasma, insulin, body mass index
and age) identified by GA-CFS, as shown in tables 2, 3 and 4
respectively. For 8 inputs, two-point and multipoint crossover resulted in slightly improved
accuracy compared to single-point and mixed crossover. With 5 and 4 inputs, single-point
and mixed crossover resulted in slightly improved accuracy compared to two-point and multipoint
crossover. Further, tables 3 and 4 show the significance of the relevant inputs identified by
DT and GA-CFS for improving the classification of GA-BPN when compared with the 8 inputs
shown in table 2. Results of GA-BPN were compared with BPN alone for all 8 inputs, the 5 inputs
identified by DT and the 4 inputs identified by GA-CFS, as shown in table 5.
8. CONCLUSIONS
In this paper, the application of the hybrid GA-BPN has been experimented with for classification of the PIMA
dataset. Back propagation learns by modifying weight values using a gradient
method, starting at the output layer and moving backward through the hidden layers of the
network, and hence is prone to troubles such as the local minimum problem, slow convergence
and convergence unsteadiness in its training procedure. The optimal network connection
weights can be obtained by using the hybrid GA-BPN. GA, a stochastic general search method
capable of effectively exploring large search spaces, is used with the BPN for determining the
optimized connection weights of the BPN. The hybrid GA-BPN shows substantial improvement in the
classification accuracy of the BPN. Significant features selected by DT and GA-CFS further enhanced the
classification accuracy of GA-BPN.
REFERENCES
[1] S. Haykin, (1994), Neural Networks: A Comprehensive Foundation, Macmillan Press, New York.
[2] D.E. Rumelhart, G.E. Hinton, and R.J. Williams, (1986), "Learning internal representations by
error propagation", in D.E. Rumelhart and J.L. McClelland, editors, Parallel Distributed Processing,
Cambridge, MA: MIT Press.
[3] H. Lu, R. Setiono and H. Liu, (1996), "Effective data mining using neural networks", IEEE Trans.
on Knowledge and Data Engineering.
AUTHORS
Asha Gowda Karegowda received her MCA degree and M.Phil in Computer Science in 1998
and 2008 from Bangalore University and Madurai Kamaraj University, India, respectively. She is
currently pursuing her Ph.D under Visvesvaraya Technological University, Belgaum, India. She
is working as Associate Professor in the Dept of Master of Computer Applications, Siddaganga
Institute of Technology, Tumkur, India. Her research interests are soft computing, image analysis
and medical data mining. She has published a few papers in international conferences and
international journals.
A.S. Manjunath received his M.Tech and Ph.D in Computer Science in 1988 and 2003 from Mysore
University and Bangalore University, India, respectively. He is working as Professor in the Dept
of Computer Science and Engineering, Siddaganga Institute of Technology, Tumkur, India. His
research interests are embedded systems and solutions, networking and communications, and soft
computing. He has published a few papers in international conferences and international journals.
M.A. Jayaram received his M.Tech in Civil, MCA degree and Ph.D from Bangalore University,
IGNOU University, and Visvesvaraya Technological University, Belgaum, India in the
years 1987, 2002 and 2008 respectively. He is working as Director in the Dept of Master of
Computer Applications, Siddaganga Institute of Technology, Tumkur, India. His research
interests are soft computing, image analysis and medical data mining. He has published a few
papers in international conferences and international journals.