Neurocomputing 61 (2004) 361–379
www.elsevier.com/locate/neucom
Prediction of colon cancer using an evolutionary neural network
Kyung-Joong Kim, Sung-Bae Cho
Department of Computer Science, Yonsei University, 134 Shinchon-dong, Sudaemoon-ku, Seoul 120-749, South Korea

Received 24 March 2003; received in revised form 8 November 2003; accepted 19 November 2003

Corresponding author. Tel.: +82 2 2123 4803; fax: +82 2 365 2579.

0925-2312/$ – see front matter © 2003 Published by Elsevier B.V.
doi:10.1016/j.neucom.2003.11.008
Abstract

Colon cancer is second only to lung cancer as a cause of cancer-related mortality in Western countries. Colon cancer is a genetic disease, propagated by the acquisition of somatic alterations that influence gene expression. DNA microarray technology provides a format for the simultaneous measurement of the expression levels of thousands of genes in a single hybridization assay. The most exciting result of microarray technology has been the demonstration that patterns of gene expression can distinguish between tumors of different anatomical origin. Standard statistical methodologies for classification and prediction do not work well, or at all, when N (the number of samples) is much smaller than p (the number of genes). Modification of conventional statistical methodologies, or the development of new ones, is needed for the analysis of colon cancer. Recently, designing artificial neural networks by evolutionary algorithms has emerged as a preferred alternative to the common practice of selecting the apparently best network. In this paper, we propose an evolutionary neural network that classifies gene expression profiles as normal or colon cancer cells. Experimental results on colon microarray data show that the proposed method is superior to other classifiers. © 2003 Published by Elsevier B.V.
Keywords: DNA Microarray; Evolutionary neural network; Colon cancer; Feature selection; Information gain
1. Introduction

Recently, techniques based on oligonucleotide or cDNA arrays have allowed the expression levels of thousands of genes to be monitored in parallel.
A critically important element of cancer diagnosis and treatment is the precise prediction of tumors. One of the remarkable advances for molecular biology and for cancer research is DNA microarray technology. DNA microarray datasets have a high dimensionality, corresponding to the large number of genes monitored, and there are often comparatively few samples. In this paper, we address the problem of predicting cancer using a small subset of genes from broad patterns of gene expression data.

In cancer research, microarray technology allows a better understanding of the regulation of the activity of cells and tumors in various states [32]. Prediction, classification, and clustering techniques are used for the analysis and interpretation of microarray data. Colon cancer is the second most common cause of cancer mortality in Western countries [7]. Gene expression in 40 tumor and 22 normal colon tissue samples was analyzed with an Affymetrix oligonucleotide array complementary to more than 6500 human genes. We chose to work only with the 2000 genes of the greatest minimal expression over the samples [1].

Evolutionary artificial neural networks (EANNs) combine the learning of neural networks with the evolution of evolutionary algorithms [8]. A great deal of work has been done on EANNs. For the game of checkers, an evolutionary algorithm can discover a neural network that plays at a near-expert level without injecting expert knowledge about how to play the game [12]. Evolutionary algorithms can be used for various tasks, such as connection weight training, architecture design, learning rule adaptation, input feature selection, connection weight initialization, and rule extraction from ANNs [38].

We propose an evolutionary neural network for classifying (predicting) human tumor samples based on microarray gene expression. The procedure involves dimension reduction with information gain and classification with an EANN. The proposed method is applied to a colon cancer microarray data set containing various human tumor samples, and we compare the evolutionary neural network to well-known classification methods.

The rest of the paper is organized as follows. In Section 2, we describe the microarray technology and related work on the prediction of cancer, including oligonucleotide microarray technology, relevant work with evolutionary neural networks, and the results of previous studies on the colon cancer data set. In Section 3, we present the evolutionary neural network in detail. In Section 4, we examine the performance of the proposed method.

2. Bioinformatics with DNA microarray

Uncovering broad patterns of genetic activity, providing new understanding of gene functions, and generating unexpected insight into biological mechanisms are the impact of microarray-based studies [19]. With the development and application of DNA microarrays, the expression of almost all human genes can now be systematically examined in human malignancies [18]. DNA sequences are initially transcribed into mRNA sequences, and these mRNA sequences are translated into the amino acid sequences of the proteins that perform various functions. Measuring mRNA levels can therefore provide a detailed molecular view of the genes.
Fig. 1. General process of acquiring gene expression data from a DNA microarray. (This example is from leukemia, which has two types of cancer, ALL and AML; each sample comes from a patient.)
Measuring gene expression levels under different conditions is important for expanding our knowledge of gene function, and gene expression data can help in a better understanding of cancer.

2.1. Oligonucleotide DNA microarray

A main goal of the analysis of gene expression data is the identification of sets of genes that can serve as classification features. Understanding cellular responses to drug treatment is another important goal of gene expression profiling. The complexity of microarray data calls for data analysis tools that will effectively aid in biological data mining. DNA microarrays are composed of thousands of individual DNA sequences printed in a high-density array on a glass microscope slide using a robotic arrayer, as shown in Fig. 1. After hybridization of two samples, the slides are imaged using a scanner that makes fluorescence measurements for each dye. In this study, Alon's colon cancer data, monitored using an Affymetrix oligonucleotide array, are used [1]. High-density oligonucleotide chip arrays are made using spatially patterned, light-directed combinatorial chemical synthesis, and contain up to hundreds of thousands of different oligonucleotides on a small glass surface [22]. As the chemical cycle is repeated, each spot on the array comes to contain a short synthetic oligonucleotide, typically 20–25 bases.
Table 1
Relevant works on colon cancer classification

Authors             Feature                        Classifier                  Accuracy (%)
Furey et al. [13]   Signal-to-noise ratio          SVM                         90.3
Li et al. [26]      Genetic algorithm              KNN                         94.1
Ben-Dor et al. [5]  All genes, TNoM score          Nearest neighbor            80.6
                                                   SVM with quadratic kernel   74.2
                                                   AdaBoost                    72.6
Nguyen et al. [30]  Principal component analysis   Logistic discriminant       87.1
                                                   Quadratic discriminant      87.1
                    Partial least squares          Logistic discriminant       93.5
                                                   Quadratic discriminant      91.9
The oligonucleotides are designed based on knowledge of the DNA target sequences to ensure the high affinity and specificity of each oligonucleotide to a particular gene. This allows cross-hybridization with other genes of similar sequence, as well as the local background, to be estimated and subtracted. Oligonucleotide DNA microarrays might eventually eliminate the use of cDNA arrays [4].

2.2. Related works

Derisi et al. [10] reported that the expression patterns of many previously uncharacterized genes provided clues to their possible functions. Eisen et al. [11] showed that clustering gene expression data efficiently groups together genes of known similar function. Shamir [34] described some of the main algorithmic approaches to clustering gene expression data. Getz et al. [14] presented a two-way clustering approach to gene microarray data analysis. Many researchers have attempted to predict colon cancer using various machine learning methods, and they show that the prediction rate for colon cancer can reach approximately 80–90% (Table 1). Sarkar et al. [33] presented a novel and simple method that exhaustively scans microarray data for unambiguous gene expression patterns. Tclass is the program corresponding to a method that incorporates feature selection into Fisher's linear discriminant analysis for gene expression based tumor classification [36]. Li et al. investigated two Bayesian classification algorithms incorporating feature selection, which were applied to the classification of gene expression data derived from cDNA microarrays [25]. Li et al. studied how to decide which and how many genes should be selected [24]. Guyon et al. proposed a new method of gene selection using support vector machines based on recursive feature elimination (RFE) [17]. Xiong et al. reported that using two or three genes, one could achieve more than 90% classification accuracy in colon cancer, breast cancer, and leukemia [37].

There are some related works on EANNs, which combine the advantages of the global search performed by evolutionary algorithms and the local search of ANN learning algorithms (such as backpropagation). Yao [39] proposed an EANN approach, EPNet, based on Fogel's evolutionary programming (EP). EPNet emphasizes the evolution
of ANN behaviors by EP and uses a number of techniques, such as partial training after each architectural mutation and node splitting, to maintain the behavioral link between a parent and its offspring effectively. EPNet also encourages parsimony of evolved ANNs by attempting different mutations sequentially; that is, node or connection deletion is always attempted before addition. EPNet has shown good performance in terms of error rate and ANN size. Cho proposed a new approach to constructing multiple neural networks that uses genetic algorithms with speciation to generate a population of accurate and diverse ANNs. Speciation in a genetic algorithm creates different species, each embodying a sub-solution, which means creating diverse solutions rather than a single best one [2]. Experiments with the breast cancer data from the UCI benchmark datasets show that the method can produce more speciated ANNs and improve performance by combining only representative individuals [3]. Several combination methods have been applied to combine speciated neural networks [23].

3. Evolutionary neural network for cancer classification

A traditional artificial neural network based on the backpropagation algorithm has some limitations. First, the architecture of the neural network is fixed, and a designer needs much knowledge to determine it. Also, the error function of the learning algorithm must be differentiable. Finally, it frequently gets stuck in local optima because it is based on gradient search without a stochastic property. An evolutionary algorithm is a search method inspired by biology that uses a population of multiple individuals. The combination of an evolutionary algorithm and a neural network can overcome these shortcomings.

The design of a near-optimal ANN architecture can be formulated as a search problem in the architecture space, where each point represents an architecture. One major issue in evolving pure architectures is deciding how much information about the architecture should be encoded into a chromosome (genotype). There are two representative encoding schemes for neural networks: direct and indirect methods. In indirect encoding, rules for generating the neural network structure are represented as a chromosome for the evolution [21]. If the phenotype has many overlapped components, indirect encoding is more useful than direct encoding because it can reduce the length of a chromosome through simple rule representation. In our work, recurrent links are not allowed and only feed-forward links are acceptable. Recurrent links are usually used for memorizing information, but in our problem it is not useful to adopt them.

3.1. Feature selection

There are two approaches to reducing the dimensionality of data. In the filtering approach, there is no concern about which classifier is used, and only characteristics of the features are measured for selection; the method is very fast and easily implemented. Meanwhile, the wrapper approach uses a specific classifier in the selection procedure, and the performance of the classifier-feature combination is measured for selection. In
this paper, we adopt the filtering approach because it is computationally inexpensive. Evolutionary computation is usually computationally expensive, so the wrapper approach is not appropriate. Details of the comparison between the two approaches can be found in [28].

The number of genes is too large to manipulate in a learning algorithm, and not all features are useful for classification; only relevant features produce better performance. A feature ranking method is therefore used to rank genes. Information gain is a representative feature ranking and selection method, used in C4.5 [31] to find the most important feature at each step. The definition of information gain is restricted to genes that take on a discrete set of values. This restriction can easily be removed by dynamically defining new discrete-valued genes that partition the continuous gene values into a discrete set of intervals (a threshold c is used for the separation). How to select the optimal threshold c is described in [27]. In the formula below, k is the total number of classes; n is the total number of expression values; n_l is the number of values in the left partition; n_r is the number of values in the right partition; l_i is the number of values that belong to class i in the left partition; and r_i is the number of values that belong to class i in the right partition. The information gain of a gene is defined as follows:
$$IG = \sum_{i=1}^{k} \left( \frac{l_i}{n} \log \frac{l_i}{n_l} + \frac{r_i}{n} \log \frac{r_i}{n_r} \right) - \sum_{i=1}^{k} \frac{l_i + r_i}{n} \log \frac{l_i + r_i}{n}$$
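As an illustration, a minimal sketch of this computation in Python (variable and function names are our own; the candidate thresholds c are taken as midpoints between consecutive sorted expression values, following the approach described in [27]):

```python
import numpy as np

def information_gain(expr, labels, c):
    """Information gain obtained by splitting one gene's expression values at threshold c."""
    expr, labels = np.asarray(expr), np.asarray(labels)
    n = len(expr)
    left, right = labels[expr <= c], labels[expr > c]

    def entropy(part):
        if len(part) == 0:
            return 0.0
        p = np.bincount(part) / len(part)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children  # parent entropy minus weighted child entropy

def best_gain(expr, labels):
    """Scan candidate thresholds (midpoints of sorted unique values), return the best gain."""
    xs = np.unique(expr)
    if len(xs) < 2:
        return 0.0
    cuts = (xs[:-1] + xs[1:]) / 2.0
    return max(information_gain(expr, labels, c) for c in cuts)

# Rank genes and keep the 30 with the highest gain, as done in Section 4.
# X: (samples x genes) expression matrix; y: 0 = normal, 1 = tumor (toy data here).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(62, 2000)), rng.integers(0, 2, size=62)
gains = [best_gain(X[:, g], y) for g in range(X.shape[1])]
top30 = np.argsort(gains)[::-1][:30]
```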
3.2. EANN

The simultaneous evolution of both architectures and weights can be summarized as follows:

(1) Evaluate each individual based on its error and/or other performance criteria, such as its complexity.
(2) Select individuals for reproduction and genetic operation.
(3) Apply genetic operators, such as crossover and mutation, to the ANN architectures and weights, and obtain the next generation.

Fig. 2 shows an overview of evolving a neural network. Each ANN is generated with random initial weights and full connectivity. Then each ANN is trained partially with training data, to help the evolution search for an optimal ANN architecture, and tested with validation data to compute the fitness. The fitness of an ANN is its recognition rate on the validation data. Once the fitness is calculated, selection chooses the best 50% of individuals, to which the genetic operators, crossover and mutation, are applied; a population of the next generation is then created. The process is repeated until the stop criterion is satisfied, and the ANNs in the last generation are trained fully.

Feature selection is used to reduce the dimensionality for the EANN because each feature corresponds to one input node, so a very large number of features requires a very large network. A large network is not good for generalization, which is why dimensionality reduction is needed in the EANN procedure. The data separation procedure divides the data into three distinct sample sets: training, validation, and test sets. Training data are used for partial training and full training. Validation data are used for fitness calculation and full training.
Fig. 2. The procedure for evolving a neural network.
The fitness of each individual in the EANN is solely determined by the inverse of an error value. The selection mechanism used in the EANN is rank-based. Let M be the population size, and let the M sorted individuals be numbered 0, 1, ..., M − 1, with the zeroth being the fittest. Then the (M − j)th individual is selected with probability

$$p(M-j) = \frac{j}{\sum_{k=1}^{M} k}.$$

Each sample (the gene expression data for one person) is used to train or validate each individual of the population. The population is a collection of individuals whose size is fixed at the initial stage; each individual represents one evolutionary neural network. The iteration in Fig. 2 is repeated until the stop criterion is satisfied: it stops when an individual performs better than a pre-defined accuracy (100%) or when the iteration number exceeds the pre-defined maximum number of generations.
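As an illustration, a minimal sketch of this loop in Python. All names are hypothetical; the training, validation, and operator hooks are passed in as functions, since they are described in the following subsections, and the exact way offspring fill the next generation is our assumption rather than a detail stated in the paper:

```python
import random

POP_SIZE, MAX_GEN, TARGET_ACC = 20, 200, 1.0    # parameter values from Section 4

class Individual:
    """Stub for one neural network; the genotype is described in Section 3.2.1."""
    def __init__(self):
        self.fitness = 0.0

def rank_select(ranked):
    """Rank-based selection over individuals sorted fittest-first: the (M - j)th
    individual (index 0 = fittest) is chosen with probability j / sum(1..M),
    i.e. with weight M - index."""
    M = len(ranked)
    return random.choices(ranked, weights=[M - i for i in range(M)], k=1)[0]

def evolve(train, validate, crossover, mutate):
    population = [Individual() for _ in range(POP_SIZE)]
    for gen in range(MAX_GEN):
        for ind in population:
            train(ind, partial=True)             # partial training on training data
            ind.fitness = validate(ind)          # recognition rate on validation data
        ranked = sorted(population, key=lambda i: i.fitness, reverse=True)
        if ranked[0].fitness >= TARGET_ACC:      # stop criterion: perfect accuracy
            break
        parents = ranked[:POP_SIZE // 2]         # keep the best 50%
        children = [mutate(crossover(rank_select(parents), rank_select(parents)))
                    for _ in range(POP_SIZE - len(parents))]
        population = parents + children          # next generation
    for ind in population:
        train(ind, partial=False)                # fully train the last generation
    return max(population, key=lambda i: i.fitness)
```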
Fig. 3. An example of neural network representation. (Panels: connectivity and weight matrices over nodes I1, H1–H3, O1, and the network generated from them, with input, hidden, and output nodes.)
3.2.1. Representation

To evolve an ANN, it must be expressed in a proper form. There are several ways to encode an ANN, such as binary representations, trees, linked lists, and matrices. We have used a matrix to encode an ANN, since it is straightforward to implement and easy to apply genetic operators to [35]. If N is the total number of nodes in an ANN, including input, hidden, and output nodes, the matrix is N × N, and its entries consist of connection links and the corresponding weights. The upper-right triangle (see Fig. 3) holds connection link information: an entry is 1 when there is a connection link and 0 when there is none. The lower-left triangle holds the weight values corresponding to the connection links. There are no connections among input nodes. Architectural crossover and mutation can be implemented easily under such a representation scheme: node deletion and addition involve flipping a bit in the matrix.

Fig. 3 shows an example of the encoding of an ANN that has one input node, three hidden nodes, and one output node. Each input node is mapped to one gene of a sample, and two output nodes are used to indicate cancer. The maximum number of hidden nodes must be pre-defined in this representation; the numbers of input and output nodes depend on the problem, as described before. Although the maximum number of hidden nodes is pre-defined, not all hidden nodes need to be used: hidden nodes that have no useful path to an output node are simply ignored. At the initialization stage, the connectivity information of the matrix is determined randomly, and where a connection value is 1 the corresponding weight is set to a random real value. This representation allows some direct links between input nodes and output nodes.
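A minimal sketch of this matrix genotype, assuming NumPy and hypothetical class and field names:

```python
import numpy as np

class ANNGenome:
    """N x N matrix genotype: connectivity bits live in the upper-right triangle
    (conn[i, j] = 1 means a feed-forward link from node i to node j, j > i),
    and the corresponding weight is stored at the transposed position in the
    lower-left triangle (weight[j, i])."""
    def __init__(self, n_in, n_hidden_max, n_out, p_connect=0.5, rng=None):
        rng = rng or np.random.default_rng()
        self.n_in, self.n_out = n_in, n_out
        self.n = n_in + n_hidden_max + n_out
        self.conn = np.zeros((self.n, self.n), dtype=int)
        self.weight = np.zeros((self.n, self.n))
        for i in range(self.n - n_out):                # sources: input and hidden nodes
            for j in range(max(i + 1, n_in), self.n):  # targets: later, non-input nodes
                if rng.random() < p_connect:           # random initial connectivity
                    self.conn[i, j] = 1
                    self.weight[j, i] = rng.uniform(-1.0, 1.0)
```

Because only node indices are constrained (source before target, no links into input nodes), direct input-to-output links arise naturally, and hidden nodes left without a path to an output node are simply ignored when the phenotype is built.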
Fig. 4. Crossover operation. (a) Node H2 is selected as a crossover point. Two neural networks exchange all the links related to H2 and H3. (b) Matrix representation of the above example.
3.2.2. Crossover

The crossover operator exchanges the architectures of two ANNs in the population in order to search over ANNs with various architectures [29]. The operator randomly selects two distinct ANNs from the population and chooses one hidden node from each selected ANN. These two nodes must be at the same entry of the matrices encoding the two ANNs. Once the nodes are selected, the two ANNs exchange the connection links and the corresponding weight information of those nodes and of the hidden nodes after them. Fig. 4 shows an example of the crossover operation.
Fig. 5. Mutation operation (left: adding a connection between I1 and H3; right: deleting the connection between H3 and O1).
In this example, the two ANNs have one input node, three hidden nodes, and one output node. For simplicity, it is assumed that the maximum number of hidden nodes is 3. Among the hidden nodes, one is randomly selected as the crossover point; in the figure, node H2 is chosen. The hidden nodes whose indices are not smaller than the crossover point are considered for crossover; here, H2 and H3 are considered, and the links related to them are exchanged. Fig. 4(a) shows the topology change after the crossover operation, and Fig. 4(b) shows the change in the matrix representation.

3.2.3. Mutation

The mutation operator changes a connection link and the corresponding weight of a randomly selected ANN from the population. It performs one of two operations: addition of a new connection or deletion of an existing connection. The mutation operator randomly selects an ANN from the population and chooses one connection entry from it. If the connection link does not exist (the connection entry of the ANN matrix is 0), a new connection link with a random weight is added. Otherwise, if the connection link already exists, the connection link and its weight information are deleted. Fig. 5 shows two examples of mutation. In the figure, the entry (I1, H3) of the matrix is selected for mutation and there is no connection between the two nodes, so a new connection is generated and its weight is determined randomly. In the second case, (H3, O1) is selected for mutation and there is already a connection, so the connection is eliminated and there is no longer a link between H3 and O1. In this study we use only these two mutation types, although other methods, such as modifying existing weights, could also be used. EPNet uses only mutations for evolution and no crossover [39]. Mutation is very useful for exploring a broad area of the solution space, but overuse of the operator can hinder convergence of the solution; for this reason, we have adopted only the two mutation types with a small mutation rate.
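A sketch of both architectural operators on the matrix genotype above (again with hypothetical names; for simplicity, this crossover swaps every link with an endpoint at or after the crossover point, a slight coarsening of exchanging only the links of the chosen hidden nodes):

```python
import numpy as np

def crossover(a, b, rng):
    """Exchange the links of all nodes at or after a randomly chosen hidden node:
    connectivity lives in the upper-right triangle (swap columns), weights in the
    lower-left triangle (swap rows)."""
    point = int(rng.integers(a.n_in, a.n - a.n_out))   # crossover point among hidden nodes
    a.conn[:, point:], b.conn[:, point:] = b.conn[:, point:].copy(), a.conn[:, point:].copy()
    a.weight[point:, :], b.weight[point:, :] = b.weight[point:, :].copy(), a.weight[point:, :].copy()
    return a, b

def mutate(g, rng):
    """Toggle one randomly chosen feed-forward connection entry: add it with a
    random weight if absent, delete it (and its weight) if present."""
    i = int(rng.integers(0, g.n - g.n_out))            # source: input or hidden node
    j = int(rng.integers(max(i + 1, g.n_in), g.n))     # target: a later, non-input node
    if g.conn[i, j] == 0:
        g.conn[i, j] = 1
        g.weight[j, i] = rng.uniform(-1.0, 1.0)        # weight kept in lower-left triangle
    else:
        g.conn[i, j] = 0
        g.weight[j, i] = 0.0
    return g
```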
4. Experimental results

The colon dataset consists of 62 samples of colon epithelial cells taken from colon cancer patients. Each sample comes from one person and contains 2000 gene expression levels. Although the original data consist of 6000 gene expression levels, 4000 of them were removed based on the confidence in the measured expression levels. 40 of the 62 samples are colon cancer samples and the remaining 22 are normal samples. The samples were taken from tumors and from normal, healthy parts of the colons of the same patients and were measured using high-density oligonucleotide arrays. In this paper, 31 of the 62 samples were used as training data and the remainder as test data. (Available at http://www.sph.uth.tmc.edu:8052/hgc/default.asp)

As mentioned before, the feature size of the colon dataset is 2000. There is no single optimal number of features for classification, but approximately 20–40 genes are appropriate. In this study, we use the 30 genes with the highest information gain in the feature ranking. There are systematic ways to determine the optimal number of features; an evolutionary approach is also useful for estimating an optimal subset of genes [26]. Table 2 lists the names of the 30 selected genes, and Fig. 6 shows some of the features, with color representing rank.

The parameters of the genetic algorithm are as follows [15]. In the EANN, the population size is 20 and the maximum number of generations is 200. Each ANN is a feed-forward network trained with backpropagation. The learning rate is 0.1; partial training presents the training data 200 times, and full training presents the training data 1000 times. The crossover rate is 0.3 and the mutation rate is 0.1. The fitness function of the EANN is defined as the recognition rate on the validation data. Because the number of samples in the colon data set is very small, we use the test data as the validation set. The parameters of the EANN were determined empirically. A larger population is usually preferable but consumes much more computational resource; an empirical test with a population size of 40 showed no performance improvement, so we set the size to 20.

4.1. Classifiers compared

SASOM (structure-adaptive self-organizing map) [6] is used with a 4 × 4 map with rectangular topology, an initial learning rate of 0.05, an initial maximum iteration count of 1000, an initial radius of 10, a final learning rate of 0.02, a final maximum iteration count of 10000, and a final radius of 3.
Table 2
The 30 genes selected by information gain

1. Human monocyte-derived neutrophil-activating protein (MONAP) mRNA, complete cds.
2. Human desmin gene, complete cds.
3. Myosin heavy chain, nonmuscle (Gallus gallus)
4. Human cysteine-rich protein (CRP) gene, exons 5 and 6.
5. Collagen alpha 2(XI) chain (Homo sapiens)
6. Human gene for heterogeneous nuclear ribonucleoprotein (hnRNP) core protein A1.
7. P03001 Transcription factor IIIA.
8. Myosin regulatory light chain 2, smooth muscle isoform (Human); contains element TAR1 repetitive element.
9. Mitochondrial matrix protein P1 precursor (Human).
10. Human aspartyl-tRNA synthetase alpha-2 subunit mRNA, complete cds.
11. Human cysteine-rich protein (CRP) gene, exons 5 and 6.
12. Human cysteine-rich protein (CRP) gene, exons 5 and 6.
13. Human homeo box c1 protein, mRNA, complete cds.
14. Macrophage migration inhibitory factor (Human).
15. Human splicing factor SRp30c mRNA, complete cds.
16. Complement factor D precursor (Homo sapiens)
17. H.sapiens mRNA for p cadherin.
18. GTP-binding nuclear protein ran (Homo sapiens)
19. Prohibitin (Homo sapiens)
20. Hypothetical protein in trpe 3 region (Spirochaeta aurantia)
21. 40S ribosomal protein S6 (Nicotiana tabacum)
22. Small nuclear ribonucleoprotein associated proteins B and B' (Human).
23. Human DNA polymerase delta small subunit mRNA, complete cds.
24. Human GAP SH3 binding protein mRNA, complete cds.
25. Human (Human).
26. Tropomyosin, fibroblast and epithelial muscle-type (Human).
27. Human serine kinase mRNA, complete cds.
28. Thioredoxin (Human).
29. S-100P protein (Human).
30. Human mRNA for integrin alpha 6.
We have used SVM (support vector machine) [9] with a linear kernel and an RBF (radial basis function) kernel; for the RBF kernel, the gamma parameter was varied over 0.1–0.5. For classification we also used a 3-layered MLP (multilayer perceptron) [20] with 5–15 hidden nodes, 2 output nodes, a learning rate of 0.01–0.50, and a momentum of 0.9. The similarity measures used in KNN (k-nearest neighbor) [27] are Pearson's correlation coefficient and Euclidean distance; KNN has been used with k = 18.

4.2. Results and analysis

We conducted 10 runs of the experiment to obtain averages. Fig. 7 shows the results of the 10 runs: the minimum, maximum, and average accuracy of the 20 individuals in the last generation of each run. Fig. 8 shows the comparison of classifier performance, which confirms that the EANN performs well.
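For reference, a minimal sketch of such a comparison using scikit-learn (not the authors' original implementations; the data here are random placeholders, SASOM has no off-the-shelf equivalent, and only one setting per hyper-parameter range is shown):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Placeholder data: 62 samples x 30 information-gain-selected genes, 31/31 split.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(62, 30)), rng.integers(0, 2, size=62)
X_tr, y_tr, X_te, y_te = X[:31], y[:31], X[31:], y[31:]

classifiers = {
    "SVM (linear)": SVC(kernel="linear"),
    "SVM (RBF, gamma=0.5)": SVC(kernel="rbf", gamma=0.5),
    "KNN (k=18, Euclidean)": KNeighborsClassifier(n_neighbors=18),
    "MLP (10 hidden)": MLPClassifier(hidden_layer_sizes=(10,), solver="sgd",
                                     learning_rate_init=0.01, momentum=0.9,
                                     max_iter=1000),
}
for name, clf in classifiers.items():
    print(name, clf.fit(X_tr, y_tr).score(X_te, y_te))
```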
Fig. 6. Some features of colon cancer data.
In this experiment, all classifiers, including the EANN, use features extracted with the information gain. To show the performance of the method more clearly, 10-fold cross-validation was also conducted; the recognition rate under cross-validation is 75%.
Fig. 7. Max, min and average accuracy of the 10 runs.
Fig. 8. Comparison of classification rate (maximum accuracy for each classifier): 1 EANN 0.94, 2 MLP 0.71, 3 SASOM 0.71, 4 SVM(Linear) 0.71, 5 SVM(RBF) 0.81, 6 KNN(Cosine) 0.71, 7 KNN(Pearson) 0.74. All classifiers use the same features, selected using the information gain.
The neural network that achieves the 94% recognition rate contains 203 connections: 147 from input nodes to hidden nodes, 26 from input nodes to output nodes, 25 from hidden nodes to hidden nodes, and 5 from hidden nodes to output nodes. The network contains 42 nodes: 30 input nodes, 10 hidden nodes, and 2 output nodes. Fig. 9 shows the four different types of connections among nodes, drawn using a graph visualization tool [16].

How to extract meaningful information from the network structure is a challenging task; one attempt is as follows. In Fig. 9(b), there are some direct links between input nodes and output nodes, from which it is possible to estimate relationships between features and the cancer. If there is a high-weight link between a feature and the cancer-indicating output node (which is set to 1.0 when the patient has cancer), it can be inferred that the feature is highly relevant to the cancer. Meanwhile, some features are connected only to the output node indicating a normal person, and some are connected to both output nodes simultaneously. To interpret these relationships correctly, comparison with clinical investigation is required.
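As an illustration of this kind of inspection, a short sketch (hypothetical names, reusing the ANNGenome layout sketched in Section 3.2.1) that lists the direct input-to-output links for one output node, strongest first:

```python
def direct_links(g, gene_names, out_index=0):
    """List direct input->output connections of an evolved genome, sorted by
    absolute weight; out_index selects which output node to inspect (e.g. the
    cancer-indicating node)."""
    j = g.n - g.n_out + out_index               # matrix index of the output node
    links = [(gene_names[i], g.weight[j, i])
             for i in range(g.n_in) if g.conn[i, j]]
    return sorted(links, key=lambda t: abs(t[1]), reverse=True)
```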
Fig. 9. Connection of nodes by the four different types: (a) from input nodes to hidden nodes, (b) from input nodes to output nodes, (c) from hidden nodes to hidden nodes, (d) from hidden nodes to output nodes.
Table 3 summarizes the confusion matrix of the best evolutionary neural network; the network misclassifies samples 24 and 30. The sensitivity of the classifier is 100.0% and its specificity is 81.8%. That is, the classifier never classifies a patient as a normal person, but it classifies a normal person as a patient with a probability of 18.2% (2 of the 11 normal test samples). Put differently, a person whom the classifier labels as a patient is actually normal with a probability of about 9% (2 of the 22 samples predicted as cancer). Specificity and sensitivity are negatively correlated, and the cost of misclassification in each direction is the key consideration in deciding the trade-off between the two measures. The prediction error has two components: classifying a normal person as a cancer patient, and vice versa. The two cases incur different misclassification costs.
Table 3
Confusion matrix of the best EANN

                     Predicted 0 (Normal)   Predicted 1 (Cancer)
Actual 0 (Normal)    9                      2
Actual 1 (Cancer)    0                      20
A confusion matrix contains the information about the actual and predicted classifications produced by a classification system.
If a normal person is diagnosed as a cancer patient, only the small cost of a deeper investigation is incurred, whereas missing a cancer patient can produce a great loss, such as death. The best neural network here means the one that produces the 94% accuracy depicted in Fig. 8.
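For concreteness, the measures discussed above can be read directly off Table 3; a minimal sketch:

```python
# Confusion matrix from Table 3 (rows = actual, columns = predicted).
tn, fp = 9, 2     # actual normal: 9 predicted normal, 2 predicted cancer
fn, tp = 0, 20    # actual cancer: 0 predicted normal, 20 predicted cancer

sensitivity = tp / (tp + fn)   # 20/20 = 1.000: no cancer patient is missed
specificity = tn / (tn + fp)   # 9/11 ~= 0.818
ppv = tp / (tp + fp)           # 20/22 ~= 0.91: ~9% of predicted patients are normal
print(sensitivity, specificity, ppv)
```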
5. Concluding remarks

It is important to distinguish normal from tumor samples. We have introduced an evolutionary neural network for the classification of tumors based on microarray gene expression data. The methodology involves dimension reduction of the high-dimensional gene expression space with information gain, followed by classification with the EANN. We have illustrated the effectiveness of the method in predicting normal and tumor samples in the colon cancer data set; the method can distinguish between normal and tumor samples with high accuracy.

There are many approaches to predicting cancer from such data using machine learning techniques, including SASOM, SVM, MLP, and KNN. The EANN is a hybrid of an evolutionary algorithm and a neural network that finds a solution without expert knowledge. Comparison with the other classifiers shows that the EANN performs very well. In particular, applying feature selection before the evolution procedure avoids overly large network structures, which require huge computational resources and produce low performance.

The advantages of the proposed method can be summarized as follows. First, no prior human knowledge about the neural network structure is needed. Further research may also reveal the relationships between genes and classes from the emerged structure; for example, rule extraction from the neural network can be used for this task. The disadvantage of the method is that it requires more computational resources than conventional methods, because the evolutionary algorithm searches from multiple points.
Acknowledgements

This research was supported by the Biometrics Engineering Research Center and the Brain Science and Engineering Research Program, sponsored by the Korean Ministry of Science and Technology.
References
[1] U. Alon, N. Barkai, D.A. Notterman, K. Gish, S. Ybarra, D. Mack, A.J. Levine, Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proc. Natl. Acad. Sci. USA 96 (1999) 6745–6750.
[2] J.-H. Ahn, S.-B. Cho, Combining multiple neural networks evolved by speciation, ICONIP 2000, 2000, pp. 230–234.
[3] J.-H. Ahn, S.-B. Cho, Speciated neural networks evolved with fitness sharing technique, Proceedings of the 2001 Congress on Evolutionary Computation, Vol. 1, 2001, pp. 390–396.
[4] J.C. Barrett, E.S. Kawasaki, Microarrays: the use of oligonucleotides and cDNA for the analysis of gene expression, Drug Discovery Today 8 (3) (2003) 134–141.
[5] A. Ben-Dor, L. Bruhn, N. Friedman, I. Nachman, M. Schummer, Z. Yakhini, Tissue classification with gene expression profiles, J. Comput. Biol. 7 (2000) 559–584.
[6] S.-B. Cho, Self-organizing map with dynamical node splitting: application to handwritten digit recognition, Neural Comput. 9 (6) (1997) 1345–1355.
[7] G.A. Chung-Faye, D.J. Kerr, L.S. Young, P.F. Searle, Gene therapy strategies for colon cancer, Mol. Med. Today 6 (2) (2000) 82–87.
[8] M. Conrad, Computation: evolutionary, neural, molecular, 2000 IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks, 2000, pp. 1–9.
[9] N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press, Cambridge, 2000.
[10] J. DeRisi, V. Iyer, P. Brown, Exploring the metabolic and genetic control of gene expression on a genomic scale, Science 278 (1997) 680–686.
[11] M.B. Eisen, P.T. Spellman, P.O. Brown, D. Botstein, Cluster analysis and display of genome-wide expression patterns, Proc. Natl. Acad. Sci. USA 95 (1998) 14863–14868.
[12] D.B. Fogel, K. Chellapilla, Verifying Anaconda's expert rating by competing against Chinook: experiments in co-evolving a neural checkers player, Neurocomputing 42 (1–4) (2002) 69–86.
[13] T.S. Furey, N. Cristianini, N. Duffy, D.W. Bednarski, M. Schummer, D. Haussler, Support vector machine classification and validation of cancer tissue samples using microarray expression data, Bioinformatics 16 (10) (2000) 906–914.
[14] G. Getz, E. Levine, E. Domany, Coupled two-way clustering analysis of gene microarray data, Proc. Natl. Acad. Sci. USA 97 (22) (2000) 12079–12084.
[15] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA, 1989.
[16] GraphViz, Graph Visualization Project, http://www.graphviz.org/.
[17] I. Guyon, J. Weston, S. Barnhill, V. Vapnik, Gene selection for cancer classification using support vector machines, Mach. Learning 46 (1–3) (2002) 389–422.
[18] G.M. Hampton, H.F. Frierson Jr., Classifying human cancer by analysis of gene expression, Trends Mol. Med. 9 (1) (2003) 5–19.
[19] C.A. Harrington, C. Rosenow, J. Retief, Monitoring gene expression using DNA microarrays, Curr. Opinion Microbiol. 3 (2000) 285–291.
[20] S.S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd Edition, Prentice-Hall, Englewood Cliffs, NJ, 1998.
[21] H. Kitano, Designing neural networks using genetic algorithms with graph generation system, Complex Syst. 4 (4) (1990) 461–476.
[22] R.J. Lipshutz, S.P.A. Fodor, T.R. Gingeras, D.J. Lockhart, High density synthetic oligonucleotide arrays, Nat. Genet. 21 (1999) 20–24.
[23] S.-I. Lee, J.-H. Ahn, S.-B. Cho, Exploiting diversity of neural ensembles with speciated evolution, International Joint Conference on Neural Networks, Vol. 2, 2001, pp. 808–813.
[24] W. Li, I. Grosse, Gene selection criterion for discriminant microarray data analysis based on extreme value distributions, RECOMB '03: Proceedings of the Seventh Annual International Conference on Computational Biology, 2003.
[25] Y. Li, C. Campbell, M. Tipping, Bayesian automatic relevance determination algorithms for classifying gene expression data, Bioinformatics 18 (2002) 1332–1339.
[26] L. Li, C.R. Weinberg, T.A. Darden, L.G. Pedersen, Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method, Bioinformatics 17 (12) (2001) 1131–1142.
[27] T.M. Mitchell, Machine Learning, McGraw-Hill, New York, 1997.
[28] D. Mladenic, M. Grobelnik, Feature selection on hierarchy of web documents, Decision Support Syst. 35 (1) (2003) 45–87.
[29] D. Montana, L. Davis, Training feedforward neural networks using genetic algorithms, Proceedings of the 11th International Joint Conference on Artificial Intelligence, 1989, pp. 762–767.
[30] D.V. Nguyen, D.M. Rocke, Tumor classification by partial least squares using microarray gene expression data, Bioinformatics 18 (1) (2002) 39–50.
[31] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, Los Altos, CA, 1992.
[32] D.A. Rew, DNA microarray technology in cancer research, Eur. J. Surg. Oncol. 27 (5) (2001) 504–508.
[33] I.N. Sarkar, P.J. Planet, T.E. Bael, S.E. Stanley, M. Siddall, R. DeSalle, D.H. Figurski, Characteristic attributes in cancer microarrays, J. Biomed. Inf. 35 (2) (2002) 111–122.
[34] R. Shamir, R. Sharan, Algorithmic approaches to clustering gene expression data, in: T. Jiang, T. Smith, Y. Xu, M.Q. Zhang (Eds.), Current Topics in Computational Biology, MIT Press, Cambridge, MA, 2001.
[35] D.W. Taylor, D.W. Corne, D.L. Taylor, J. Harkness, Predicting alarms in supermarket refrigeration systems using evolved neural networks and evolved rulesets, Congress on Evolutionary Computation, 2002, pp. 1988–1993.
[36] L. Wuju, X. Momiao, Tclass: tumor classification system based on gene expression profiles, Bioinformatics 18 (2002) 325–326.
[37] M. Xiong, W. Li, J. Zhao, L. Jin, E. Boerwinkle, Feature (gene) selection in gene expression-based tumor classification, Mol. Genet. Metabolism 73 (3) (2001) 239–247.
[38] X. Yao, Evolving artificial neural networks, Proc. IEEE 87 (9) (1999) 1423–1447.
[39] X. Yao, Y. Liu, A new evolutionary system for evolving artificial neural networks, IEEE Trans. Neural Networks 8 (3) (1997) 694–713.

Kyung-Joong Kim received the B.S. and M.S. degrees in computer science from Yonsei University, Seoul, Korea, in 2000 and 2002, respectively. Since 2002, he has been a Ph.D. student in the Department of Computer Science, Yonsei University. His research interests include evolutionary neural networks, robot control, and agent architecture.
Sung-Bae Cho received the B.S. degree in computer science from Yonsei University, Seoul, Korea, in 1988 and the M.S. and Ph.D. degrees in computer science from KAIST (Korea Advanced Institute of Science and Technology), Taejeon, Korea, in 1990 and 1993, respectively. He worked as a Member of the Research Staff at the Center for Artificial Intelligence Research at KAIST from 1991 to 1993. He was an Invited Researcher in the Human Information Processing Research Laboratories at the ATR (Advanced Telecommunications Research) Institute, Kyoto, Japan, from 1993 to 1995, and a Visiting Scholar at the University of New South Wales, Canberra, Australia, in 1998. Since 1995, he has been an Associate Professor in the Department of Computer Science, Yonsei University. His research interests include
neural networks, pattern recognition, intelligent man-machine interfaces, evolutionary computation, and artificial life. Dr. Cho was awarded outstanding paper prizes from the IEEE Korea Section in 1989 and 1992, and another from the Korea Information Science Society in 1990. He was also the recipient of the Richard E. Merwin prize from the IEEE Computer Society in 1993. He was listed in Who's Who in Pattern Recognition by the International Association for Pattern Recognition in 1994, and received the best paper awards at the International Conference on Soft Computing in 1996 and 1998. He also received the best paper award at the World Automation Congress in 1998, and was listed in Marquis Who's Who in Science and Engineering in 2000 and in Marquis Who's Who in the World in 2001. He is a Member of the Korea Information Science Society, INNS, the IEEE Computer Society, and the IEEE Systems, Man, and Cybernetics Society.