
Neurocomputing 101 (2013) 309–318


ACOSampling: An ant colony optimization-based undersampling method for classifying imbalanced DNA microarray data
Hualong Yu a,*, Jun Ni b, Jing Zhao c

a School of Computer Science and Engineering, Jiangsu University of Science and Technology, Mengxi Road No.2, Zhenjiang 212003, China
b Department of Radiology, Carver College of Medicine, The University of Iowa, Iowa City, IA 52242, USA
c College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China

* Corresponding author. Tel.: +86 511 88690470; fax: +86 511 88690471. E-mail address: [email protected] (H. Yu).

Article history: Received 25 December 2011; received in revised form 25 August 2012; accepted 26 August 2012; available online 19 September 2012. Communicated by T. Heskes.

Keywords: DNA microarray; Ant colony optimization; Class imbalance; Undersampling; Support vector machine

Abstract

In DNA microarray data, the class imbalance problem occurs frequently, causing poor prediction performance for minority classes; moreover, the other characteristics of such data, e.g., high dimensionality, small sample size and high noise, intensify this damage. In this study, we propose ACOSampling, a novel undersampling method based on the idea of ant colony optimization (ACO), to address this problem. The algorithm starts with a feature selection step to eliminate noisy genes from the data. Then we randomly and repeatedly divide the original training set into two groups: a training set and a validation set. In each division, a modified ACO algorithm, a variant of our previous work, is conducted to filter less informative majority samples and search for the corresponding optimal training sample subset. Finally, the statistical results from all local optimal training sample subsets are given in the form of a frequency list, where each frequency indicates the importance of the corresponding majority sample. We extract only the high-frequency samples and combine them with all minority samples to construct the final balanced training set. We evaluated the method on four benchmark skewed DNA microarray datasets with a support vector machine (SVM) classifier, showing that the proposed method outperforms many other sampling approaches, which indicates its superiority.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

In the past decade, DNA microarray has been one of the most important molecular biology technologies of the post-genomic era. With this technology, biologists and medical experts can detect the activity of thousands of genes in a cell simultaneously. At present, DNA microarrays have been widely applied to predict gene functions [1], investigate gene regulatory mechanisms [2,3], provide invaluable information for drug discovery [4], classify cancers [5,6] and mine new subtypes of a specific tumor [7–9]. Among these applications, cancer classification has attracted the most attention. However, it is well known that microarray data generally have some particular characteristics, such as high dimensionality, small sample size, high noise and, most importantly, imbalanced class distributions. Skewed class distributions greatly degrade the prediction performance for minority classes and lead to inaccurate evaluation of classification performance, while the other characteristics of microarray data further intensify this damage [10]. Therefore, it is necessary to remedy this bias with effective strategies.

In fact, class imbalance learning has drawn a significant amount of interest since 2000 from artificial intelligence, data mining and machine learning, which is reflected by the launch of several major workshops and special issues [11], including AAAI'00 [12], ICML'03 [13] and ACM SIGKDD Explorations'04 [14]. There are two major approaches to the class imbalance problem: sampling-based strategies and cost-sensitive learning. Sampling, which includes oversampling and undersampling, deals with class imbalance by inserting samples into the minority class or discarding samples of the majority class [15–20], while cost-sensitive learning treats class imbalance by assigning different costs to different classes [21–29]. Recently, some research has also focused on ensemble learning built on multiple different sampling or weighting data sets, which presents excellent performance and generalization ability [30–35]. More details about class imbalance learning methods are presented in Section 2.

In this study, we introduce a novel undersampling method based on the idea of ant colony optimization (ACO), named ACOSampling, to classify skewed DNA microarray data. This method is a modified version of our previous work [36]; the difference is that this work transfers the information selection from feature space to sample space. First, the original training dataset is randomly and repeatedly divided into two groups: a training dataset and a validation dataset.
Then, for each partition, ACOSampling is conducted to find the corresponding optimal majority class sample subset. Different from the traditional ACO algorithm, ACOSampling impels ants to leave the nest, pass all majority class samples one by one via either pathway 0 or pathway 1, and finally reach the food source, where pathway 0 indicates that the corresponding sample is useless and should be filtered, while pathway 1 indicates that it is important and should be selected. Considering the particularity of the classification tasks in this study, the overall accuracy is not a suitable fitness measure, so we construct the fitness function from three weighted indicative metrics, namely F-measure, G-mean and AUC. After that, many local optimal majority class sample subsets are generated by iterative partitions, so the significance of each majority sample can be estimated according to its selection frequency, i.e., the higher the selection frequency, the more information the corresponding sample provides. Next, a globally optimal balanced sample set is created by combining the highly ranked samples of the majority class with all examples of the minority class. Finally, we construct an SVM classifier on the balanced training set to recognize future unlabeled samples.

The remainder of this paper is organized as follows. Section 2 reviews previous work related to the class imbalance problem. In Section 3, the idea and procedure of the ACOSampling method are described in detail. Experimental results and discussion are presented in Section 4. Finally, we conclude this paper in Section 5.

2. Previous work

As mentioned in Section 1, the existing class imbalance learning methods can be roughly categorized into two major groups: sampling strategies and cost-sensitive learning. Here, we pay special attention to sampling strategies because they are more closely related to our study.

Sampling is a re-balancing process for the given imbalanced data set. It can be divided into oversampling and undersampling. Oversampling, as its name indicates, adds samples to the minority class, while undersampling removes examples from the majority class. The simplest sampling methods are Random Over Sampling (ROS) and Random Under Sampling (RUS) [15]. The former tends to make the learner overfit by simply duplicating some minority class samples, while the latter may lose valuable classification information because many majority examples are randomly removed [11]. To overcome their drawbacks, more sophisticated sampling methods were developed. The Synthetic Minority Over-sampling TEchnique (SMOTE), proposed by Chawla et al. [16], creates artificial data based on the feature space similarities between existing minority examples. Specifically, it randomly selects one sample x_i from the minority class and finds its K nearest neighbors belonging to the same class by Euclidean distance. To create a synthetic sample, it randomly selects one of the K nearest neighbors, multiplies the corresponding feature vector difference by a random number in [0, 1], and finally adds this vector to x_i. Han et al. [17] observed that most misclassified samples scatter around the borderline between the two categories, and presented two improved versions of SMOTE, Borderline-SMOTE1 (BSO1) and Borderline-SMOTE2 (BSO2). BSO1 runs SMOTE only on those minority class samples near the borderline, while BSO2 generates synthetic minority class samples between each frontier minority example and one of its K nearest neighbors belonging to the majority class, thus mildly enlarging the decision region of the minority class.
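To make the interpolation step described above concrete, the following Python sketch generates SMOTE-style synthetic minority samples. It is an illustration only, not code from the paper or from reference [16]; the function and variable names are ours, and in practice a library implementation (e.g., the SMOTE class in imbalanced-learn) would normally be used instead.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=None):
    """Create n_new synthetic samples by interpolating between a random
    minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))                   # pick a minority sample x_i
        d = np.linalg.norm(X_min - X_min[i], axis=1)   # Euclidean distances within the class
        neighbours = np.argsort(d)[1:k + 1]            # its k nearest neighbours (skip itself)
        j = rng.choice(neighbours)                     # pick one neighbour at random
        gap = rng.random()                             # random number in [0, 1]
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)
```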
One Side Selection (OSS) follows a very similar idea to BSO2. It shrinks the decision area of the majority class by cleaning noisy samples, redundant samples and boundary examples in the majority category [18]. As another improved oversampling method, Adaptive Synthetic Sampling (ADA-SYN) uses a density distribution as the criterion to automatically decide the number of synthetic samples that need to be generated for each minority example, adaptively changing the weights of different minority class examples to compensate for the skewed distribution [19]. Another sampling method that uses a density distribution is undersampling based on clustering (SBC), presented recently by Yen and Lee [20]. SBC automatically decides how many majority class samples to remove from each cluster according to the corresponding density distribution. García et al. [37] compared the two kinds of sampling strategies and found that oversampling generally produces better classification performance when the dataset is highly skewed, while undersampling is more effective when the imbalance ratio is low. All in all, sampling possesses many advantages, such as being simple and intuitive with low time complexity and low storage cost, so it is convenient to apply in real-world imbalanced classification tasks. In Section 4, we investigate the performance of the proposed ACOSampling method compared with the original data without sampling (ORI) and the benchmark sampling strategies described above: ROS, RUS, SMOTE, BSO1, BSO2, OSS, ADA-SYN and SBC.

Cost-sensitive learning methods consider the costs associated with misclassifying samples [21]. Instead of creating balanced data distributions through sampling, cost-sensitive learning assigns different costs to samples belonging to different classes by creating a cost matrix. Based on the cost matrix, misclassifications of the minority class are more expensive than those of the majority class. Moreover, cost-sensitive learning pursues minimization of the total cost rather than the error rate, so the significance of the minority class is highlighted. There are generally three kinds of cost-sensitive learning methods. The first is based on the translation theorem [22]; it applies misclassification costs to the data set in terms of data-space weighting. The second class, building on the MetaCost framework [23], applies cost-minimizing techniques to the combination schemes of ensemble methods. Some existing research has combined these two strategies, such as the AdaCX series of algorithms [24] and AdaCost [25]. The last class of cost-sensitive learning methods directly designs appropriate cost functions for a specific classifier, including cost-sensitive decision trees [26], cost-sensitive neural networks [27] and cost-sensitive support vector machines [28]. In some application fields, cost-sensitive learning has been demonstrated to be superior to sampling approaches [29]. However, it is difficult to design an appropriate cost function in advance when the class imbalance problem occurs [27].

In recent years, ensemble learning has also become popular for solving class imbalance problems. Generally speaking, an ensemble learning framework is incorporated with a sampling approach or a weighting strategy to acquire better classification performance and generalization capability. Chawla et al. introduced SMOTE into the Boosting ensemble learning framework to develop the SMOTEBoost learning method [30]. Unlike the base classifier generation strategy in traditional Boosting, SMOTEBoost promotes weak classifiers by altering the distributions of the samples of different classes with SMOTE. Liu et al. combined RUS and the AdaBoost classifier to overcome the information loss of the traditional RUS method and presented two ensemble strategies: EasyEnsemble and BalanceCascade [31]. In contrast with the Boosting framework, Bagging seems to leave less room to be modified for the class imbalance problem. However, there are still some improved versions of Bagging, including Asymmetric Bagging (asBagging), which has been used to retrieve images [32] and predict drug molecules [33], and Roughly Balanced Bagging (RB Bagging) based on negative binomial distributions [34]. In literature [35], Khoshgoftaar et al. compared the existing Boosting and Bagging technologies on noisy and imbalanced data and found that Bagging-style algorithms generally outperform Boosting. However, ensemble learning is more time-consuming than both former methods, which restricts its practical applications [35].

3. Methods

3.1. Undersampling based on ant colony optimization

The ant colony optimization (ACO) algorithm, developed by Colorni et al. [38], is an important member of the swarm intelligence family. ACO simulates the foraging behavior of real ant colonies, and in recent years it has been successfully applied to various practical optimization problems, including the TSP [39], parameter optimization [40], path planning [41] and protein folding [42].

In previous work, we designed an ACO algorithm to select tumor-related marker genes in DNA microarray data [36]. In this study, we transfer it from feature space to sample space to search for an undersampling set, which is regarded as the optimal subset estimated on the given validation set. However, this optimal set is not necessarily absolutely balanced. In addition, in the optimization process, to evaluate the performance of each ant fairly, several indicative metrics are jointly used to construct the fitness function.

Fig. 1 describes the sample selection procedure of our ACO algorithm. As indicated in Fig. 1, the process of sample selection may be regarded as the procedure of one ant seeking food. Between the nest and the food, sites are built one by one, and each of them represents one alternative sample of the majority class in the original training set. In the process of moving from nest to food, the ant passes each site via either pathway 0 or pathway 1, where pathway 0 denotes that the next sample will be filtered and pathway 1 denotes that the next sample will be selected. Finally, when the ant arrives at the food, some majority samples have been extracted; they are combined with all minority examples to constitute the corresponding training set. For example, the binary set {1, 0, 1, 0, 0, 1} means that the 1st, 3rd and 6th majority samples have been picked out. The newly created training set is then evaluated according to the fitness on the validation set. Ants cooperate with each other through the intensity of the pheromone left on every pathway to search for the optimal route.

Fig. 1. Sample selection procedure based on the ACO algorithm.

In our ACO algorithm, many ants synchronously search pathways from nest to food. They select pathways according to the quantities of pheromone left on these pathways: the more pheromone is left, the higher the probability that the corresponding pathway is selected. We compute the probability of selecting a pathway by

p_{ij} = τ_{ij} / Σ_j τ_{ij}    (1)

where i represents the ith site, i.e., the ith majority sample in the original training set, and j denotes the pathway, which may be assigned 1 or 0 to denote whether the corresponding sample is selected or not. τ_{ij} is the pheromone intensity of the jth pathway at the ith site, p_{ij} is the probability of selecting the jth pathway at the ith site, and the sum runs over the possible values of pathway j (0 or 1). When an ant arrives at the food source, the corresponding sample subset is evaluated by the fitness function. It is worth noting that the overall classification accuracy is not an indicative measure for imbalanced classification tasks [16]. Therefore, we use a special metric designed by Yang et al. [43] to evaluate classification performance, which is given in formula (2):

fitness = α · F-measure + β · G-mean + γ · AUC,  s.t. α + β + γ = 1    (2)

The fitness function is composed of three weighted metrics, F-measure, G-mean and AUC; we introduce these metrics in Section 4.2. When one cycle finishes, the pheromone of all pathways is updated; the update function is inherited from the literature [38] and is described as follows:

τ_{ij}(t + 1) = ρ · τ_{ij}(t) + Δτ_{ij}    (3)

where ρ is the evaporation coefficient, which controls the decrement of pheromone, and Δτ_{ij} is the pheromone added to some excellent pathways. In this paper, we add pheromone to the pathways of the best 10% of ants after each cycle and store these pathways in a set E. Δτ_{ij} is defined as follows:

Δτ_{ij} = (1 / (0.1 · ant_n)) · fitness,  if pathway_{ij} ∈ E;  Δτ_{ij} = 0,  if pathway_{ij} ∉ E    (4)

In formula (4), ant_n is the size of the ant colony, i.e., the number of ants. When one cycle finishes, the pheromone of some pathways is intensified and that of the others is weakened, which guarantees that the excellent pathways are given more chances in the next cycle. When the ACO algorithm converges, all ants are inclined to select the same pathway, and finally the optimal solution is returned.
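The following Python sketch outlines one possible implementation of this inner ACO search, under the assumption that a user-supplied fitness_fn evaluates a candidate 0/1 selection mask on the validation set according to Eq. (2). It is illustrative only: the function name aco_undersample and the parameter names are ours, and the defaults follow Table 3.

```python
import numpy as np

def aco_undersample(n_maj, fitness_fn, ant_n=50, iters=50, rho=0.8,
                    ph_init=1.0, ph_min=0.5, ph_max=2.0, seed=None):
    """Search a 0/1 selection vector over n_maj majority samples:
    pathway 1 keeps a sample, pathway 0 filters it (Section 3.1)."""
    rng = np.random.default_rng(seed)
    tau = np.full((n_maj, 2), ph_init)               # pheromone on pathways 0 and 1
    best_mask, best_fit = None, -np.inf
    for _ in range(iters):
        # Each ant passes every site and picks pathway 1 with probability
        # tau_i1 / (tau_i0 + tau_i1), i.e. Eq. (1).
        p1 = tau[:, 1] / tau.sum(axis=1)
        masks = (rng.random((ant_n, n_maj)) < p1).astype(int)
        fits = np.array([fitness_fn(m) for m in masks])
        if fits.max() > best_fit:
            best_fit, best_mask = fits.max(), masks[fits.argmax()].copy()
        # Evaporation (Eq. (3)) followed by deposits from the best 10% of ants (Eq. (4)).
        tau *= rho
        for a in np.argsort(fits)[-max(1, ant_n // 10):]:
            tau[np.arange(n_maj), masks[a]] += fits[a] / (0.1 * ant_n)
        np.clip(tau, ph_min, ph_max, out=tau)        # pheromone bounds from Table 3
    return best_mask, best_fit
```

Clamping the pheromone to [ph_min, ph_max] corresponds to the upper and lower boundaries that, as noted below, are imposed to prevent premature convergence.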
In contrast with our previous work, we make a few changes in this study, such as the fitness function and the pheromone update function. On the other hand, it also inherits some advantages from the previous method; for example, we impose upper and lower boundaries on the pheromone of each pathway to prevent the algorithm from sinking into a local optimum prematurely. A pseudo-code description of the algorithm is summarized in Fig. 2.

Fig. 2. Pseudo-code description of the undersampling algorithm based on ACO.

3.2. ACOSampling strategy

With the ACO algorithm described above, an excellent undersampling subset can be extracted as the final training set to construct a classifier and recognize future testing samples. However, to guide the optimization procedure, we have to divide the original training set into two parts, a training set and a validation set, before the ACO algorithm works. Generally, this can cause two severe problems for the constructed classifier: information loss and overfitting due to the employment of the validation set. In particular, when classification tasks are based on a small sample set, these problems become more serious. To solve this problem, we present a novel strategy named ACOSampling to produce a more robust classifier by combining repeated partitions of the original sample set with the ACO algorithm. The frame diagram of the ACOSampling strategy is presented in Fig. 3.

Fig. 3. The frame diagram of the ACOSampling strategy.

As can be observed from Fig. 3, in our design ACOSampling applies 3-fold cross validation to evaluate classification performance, i.e., two thirds of the samples are extracted into the original training set and the rest are used for testing each time; in practical applications, it may easily be modified to fit other validation methods. Meanwhile, to impartially estimate the amount of information in each majority example and to avoid overfitting of the finally generated classifier, the original training set is randomly and repeatedly divided into two groups, a training set (2/3) and a validation set (1/3), 100 times. It is clear that in these 100 repeated partitions each sample has an equal chance of being picked into the training set. Then we conduct the ACO algorithm on each partition to find the corresponding optimal undersampling set and compute the ranking frequency list of the majority class samples based on these local optimal sets (take Fig. 4 as an example: the more times one sample emerges, the more information the sample can provide for classification). Next, a balanced dataset is created by combining the highly ranked samples of the majority class with all examples of the minority class. Finally, we train a classifier on the balanced sample set and evaluate its performance on the testing set. According to the descriptions above, the main loop of the ACOSampling method can be summarized by the pseudo-code in Fig. 5.

Fig. 4. Ranking frequency list for samples of the majority class (excerpt):

Rank   Index   Frequency
1      15      49
2      7       42
...    ...     ...
26     21      4
27     24      1

3.3. Support vector machine

The support vector machine (SVM), introduced by Vapnik [44], is a valuable tool for solving pattern classification problems. In contrast with traditional classification methods, the SVM possesses several prominent advantages: (1) high generalization capability; (2) absence of local minima; and (3) suitability for small-sample datasets. Its main idea is to implicitly map the data to a higher dimensional space via a kernel function and then solve an optimization problem to identify the maximum-margin hyperplane that separates the training instances of the two classes. New instances are classified according to the side of the hyperplane they fall on (see Fig. 6).

Given a training set S = {(x_i, y_i) | x_i ∈ R^d, y_i ∈ {−1, +1}, i = 1, ..., N}, where x_i is a d-dimensional sample, y_i is the corresponding class label and N is the number of samples, the discriminant function of the SVM can be described as follows:

g(x) = sgn( Σ_{i=1}^{sv} α_i y_i K(x, x_i) + b )    (5)

In formula (5), sv represents the number of support vectors, α_i is a Lagrange multiplier, b is the bias of the optimal classification hyperplane, and K(x, x_i) denotes the kernel function. In this work, we conduct the experiments with the radial basis function (RBF) kernel because it generally produces better generalization performance and lower computational cost than other kernel functions in practical applications [45]. The RBF kernel function is described as follows:

K(x_i, x_j) = exp( −‖x_i − x_j‖² / (2σ²) )    (6)

For a detailed description of the theory of the SVM, please refer to [44]. We choose the SVM as the baseline classifier because it generally provides better classification performance than typical classifiers for high-dimensional, small-sample data such as DNA microarray data.

Fig. 5. Pseudo-code description of the ACOSampling strategy.

Fig. 6. SVM constructs a hyperplane (bold line) to maximize the margin between two classes (circle and pentagram). The samples lying on the dashed lines are called support vectors. New instances are classified to the side of the hyperplane they fall on.
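As a minimal illustration of the baseline classifier (not the authors' implementation), an RBF-kernel SVM with the settings later listed in Table 3 (σ = 5, C = 500) can be instantiated with scikit-learn; note that scikit-learn parameterizes the RBF kernel as exp(−γ‖x_i − x_j‖²), so γ = 1/(2σ²) under Eq. (6).

```python
from sklearn.svm import SVC

sigma, C = 5.0, 500.0                                   # kernel width and penalty from Table 3
svm = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma ** 2), C=C)

# Typical use with a balanced training set produced by ACOSampling:
# svm.fit(X_balanced, y_balanced)
# y_pred = svm.predict(X_test)                          # sign of the discriminant in Eq. (5)
```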
3.4. Preprocessing and feature selection of DNA microarray data

Generally, genes have different value ranges in microarray datasets. To evaluate the significance of each gene impartially, it is necessary to conduct a data preprocessing procedure. In this paper, we normalized the expression levels of each gene to mean 0 and variance 1 [46]. The computational formula is given as

g'_{ij} = (g_{ij} − μ_i) / σ_i    (7)

where g_{ij} and g'_{ij} represent the original and normalized expression values of the jth sample on the ith gene, respectively, while μ_i and σ_i are the mean and standard deviation of the ith gene in the original dataset.

Moreover, microarray datasets contain many genes carrying noise, redundancy or information that is irrelevant to the classification task. It is therefore important for achieving excellent performance to select a few feature genes that are strongly related to the classification task [47].

In recent years, many feature gene selection strategies have been proposed [7,36,46–49], and most of them have been proved helpful for improving classification accuracy. These strategies may be grouped into two classes: filter and wrapper. The former evaluates each gene individually and assigns a score reflecting its correlation with the class label according to certain criteria; genes are then ranked by their scores and some top-ranked ones are selected. The wrapper, in contrast, searches for the optimal solution in gene space according to the accuracy returned by a specific classifier. Generally speaking, wrappers can obtain better classification performance but expend more computational cost than filters [48]. For imbalanced microarray data classification tasks, researchers have designed some special and sophisticated feature gene extraction approaches [50–53]. However, since the target of this study is to develop a more effective undersampling method, a simple and efficient strategy named SNR (Signal-to-Noise Ratio) [7] is used. It is described as follows:

SNR(i) = |μ_{i1} − μ_{i2}| / (σ_{i1} + σ_{i2})    (8)

where μ_{in} and σ_{in} stand for the mean and standard deviation calculated over all samples belonging to class n. Taking the Colon dataset [5] as an example, we computed the SNR value for all 2000 genes and ranked them (see Fig. 7). Fig. 7 shows that only quite a few genes have high SNR values, and these can be regarded as feature genes closely related to the classification task. In this study, we select the top-100 ranked genes to conduct the experiments.

Fig. 7. SNR value distribution by gene index (left) and by gene ranking (right).
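A compact sketch of this preprocessing and ranking step is shown below, assuming a samples × genes expression matrix X and a binary label vector y; the function name and the use of NumPy are ours, not the paper's.

```python
import numpy as np

def snr_top_genes(X, y, n_genes=100):
    """Z-score each gene (Eq. (7)), score it by the signal-to-noise ratio of
    Eq. (8), and return the column indices of the top-ranked feature genes."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)          # per-gene mean 0, variance 1
    labels = np.unique(y)                             # the two class labels
    c1, c2 = (y == labels[0]), (y == labels[1])
    snr = np.abs(X[c1].mean(axis=0) - X[c2].mean(axis=0)) / (
        X[c1].std(axis=0) + X[c2].std(axis=0))
    return np.argsort(snr)[::-1][:n_genes]            # e.g. the top-100 genes used in the paper
```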
Table 1
Datasets used in this study.

Dataset      Size   Genes   Maj:Min   Imbalance ratio
Colon [5]     62     2000    40:22     1.82
CNS [54]      60     7129    39:21     1.86
Lung [8]      39     2880    24:15     1.60
Glioma [9]    50    10367    36:14     2.57

Table 2
Confusion matrix.

                        Predicted positive class   Predicted negative class
Actual positive class   TP (True positive)         FN (False negative)
Actual negative class   FP (False positive)        TN (True negative)

Table 3
Initial parameter settings.

Common parameters for ACOSampling                                 Value
ant_n: population size                                             50
ITA: iteration times of the ant colony                             50
ITP: iteration times of partition for the original training set    100
dispose (ρ): evaporation coefficient                               0.8
ph_initial: the initial pheromone in each pathway                  1.0
ph_min: the lower boundary of pheromone                            0.5
ph_max: the upper boundary of pheromone                            2.0
α, β, γ: weights for the three metrics                             1/3

Common parameters for SVM                                          Value
σ: the parameter of the RBF kernel function                        5
C: the penalty factor                                              500

4. Experiments

4.1. Datasets

Four benchmark imbalanced microarray datasets are used in our experiments: the Colon dataset [5], the CNS (Central Nervous System) dataset [54], the Lung cancer dataset [8] and the Glioma dataset [9]. The first three datasets are composed of binary-class samples, while the Glioma dataset consists of four subtypes: cancer glioblastomas (CG), non-cancer glioblastomas (NG), cancer oligodendrogliomas (CO) and non-cancer oligodendrogliomas (NO). For the Glioma dataset, CG is used as the positive class with 14 examples and the others are merged into one class with 36 examples. In these four datasets, the number of samples ranges from 39 to 62, the number of genes from 2000 to 10367, and the imbalance ratio from 1.60 to 2.57. Information about these datasets is summarized in Table 1, and they are available at http://datam.i2r.a-star.edu.sg/datasets/krbd/.

4.2. Evaluation criteria and parameter settings

It is well known that in skewed recognition tasks the overall accuracy (Acc) generally gives a biased evaluation, so other specific evaluation metrics, such as F-measure, G-mean and the area under the receiver operating characteristic curve (AUC), are needed to estimate the classification performance of a learner. F-measure and G-mean may be regarded as functions of the confusion matrix shown in Table 2. They are calculated as follows:

F-measure = 2 · Precision · Recall / (Precision + Recall)    (9)

G-mean = √(TPR · TNR)    (10)

where Precision, Recall, TPR and TNR are further defined as follows:

Precision = TP / (TP + FP)    (11)

Recall = TPR = TP / (TP + FN)    (12)

TNR = TN / (TN + FP)    (13)

and the overall accuracy Acc is computed as:

Acc = (TP + TN) / (TP + TN + FP + FN)    (14)

AUC is the area under the ROC curve, which depicts the performance of a method using (FPR, TPR) pairs. It has been proved to be a reliable performance measure for the class imbalance problem [11].

In this study, the initial parameters used in ACOSampling and the SVM were set empirically according to our previous work [36] (see Table 3). Several parameters, such as ph_min and ph_max, were slightly adjusted based on extensive experimental results.
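The following Python helper computes the three metrics of Eqs. (9)–(14) from a prediction; it is an illustrative sketch (the name imbalance_scores is ours) using scikit-learn only for the confusion matrix and the AUC, with the minority class labelled 1 as the positive class.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def imbalance_scores(y_true, y_pred, y_score):
    """F-measure, G-mean and AUC for a binary task (positive/minority = 1)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    precision = tp / (tp + fp) if tp + fp else 0.0          # Eq. (11)
    recall = tp / (tp + fn) if tp + fn else 0.0             # Eq. (12): TPR
    tnr = tn / (tn + fp) if tn + fp else 0.0                # Eq. (13)
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)            # Eq. (9)
    g_mean = np.sqrt(recall * tnr)                          # Eq. (10)
    auc = roc_auc_score(y_true, y_score)                    # area under the ROC curve
    return f_measure, g_mean, auc
```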
4.3. Results and discussion

First, we conduct experiments on the four imbalanced microarray datasets (see Section 4.1) with the top-100 feature genes extracted by the SNR strategy. To demonstrate the superiority of the proposed method, the performance of several typical sampling approaches, namely the original data without sampling (ORI), ROS, RUS, SMOTE, BSO1, BSO2, OSS, ADA-SYN and SBC, is tested at the same time. The average classification results based on 10 runs of 3-fold cross validation are presented in Table 4 and Fig. 8.

Table 4
Performance comparison of various sampling methods on the four datasets (mean ± standard deviation, %).

Performance (%)   ORI            ROS            RUS            SMOTE          BSO1           BSO2           OSS            ADA-SYN        SBC            ACOSampling

Colon dataset
Acc               83.23 ± 2.72   84.19 ± 2.48   84.52 ± 3.47   85.48 ± 1.61   83.07 ± 1.94   84.03 ± 2.10   85.65 ± 2.84   85.65 ± 2.10   83.23 ± 2.41   85.63 ± 1.83
F-measure         75.24 ± 4.49   76.78 ± 4.78   79.31 ± 4.07   79.37 ± 2.85   74.99 ± 3.29   76.91 ± 3.08   81.13 ± 2.96   79.76 ± 3.29   78.95 ± 2.84   81.13 ± 2.63
G-mean            80.23 ± 3.76   81.54 ± 4.15   84.17 ± 3.22   83.83 ± 2.54   80.01 ± 2.78   81.68 ± 2.45   85.76 ± 2.29   84.21 ± 2.80   84.25 ± 2.47   85.92 ± 2.41
AUC               87.23 ± 2.32   87.76 ± 3.20   89.16 ± 1.88   89.13 ± 1.69   88.20 ± 1.51   88.61 ± 2.72   91.33 ± 1.89   88.82 ± 1.50   90.19 ± 2.19   94.18 ± 1.56

CNS dataset
Acc               82.83 ± 1.98   83.33 ± 2.36   82.00 ± 1.00   84.33 ± 2.49   84.50 ± 2.48   84.33 ± 2.49   83.17 ± 2.29   84.67 ± 2.21   79.50 ± 1.98   83.83 ± 3.42
F-measure         75.50 ± 2.99   76.08 ± 3.29   77.45 ± 1.44   77.56 ± 3.66   78.13 ± 3.23   78.55 ± 3.75   77.58 ± 2.29   78.44 ± 3.32   76.10 ± 2.37   79.75 ± 3.83
G-mean            80.96 ± 2.44   81.32 ± 2.55   83.21 ± 1.31   82.51 ± 3.09   83.08 ± 2.56   83.79 ± 3.24   83.03 ± 1.69   83.43 ± 2.86   81.94 ± 2.12   85.17 ± 3.23
AUC               92.21 ± 1.94   92.26 ± 0.89   92.91 ± 1.47   93.05 ± 1.09   93.36 ± 1.44   93.22 ± 1.74   92.94 ± 1.24   92.81 ± 1.22   92.43 ± 1.08   93.33 ± 1.47

Lung dataset
Acc               65.38 ± 3.29   64.62 ± 2.99   67.44 ± 3.25   65.90 ± 2.82   65.13 ± 2.86   67.44 ± 2.00   68.21 ± 3.08   65.38 ± 2.63   67.18 ± 2.99   71.79 ± 4.59
F-measure         56.10 ± 5.31   53.30 ± 4.10   60.56 ± 4.66   55.58 ± 5.35   55.40 ± 4.38   59.39 ± 2.30   62.50 ± 5.52   54.79 ± 3.53   60.43 ± 4.14   67.86 ± 4.50
G-mean            63.46 ± 4.24   61.48 ± 3.37   66.89 ± 3.82   63.35 ± 4.20   62.93 ± 3.52   66.16 ± 1.90   68.06 ± 4.18   62.67 ± 2.90   66.74 ± 3.45   72.32 ± 4.19
AUC               67.75 ± 2.52   67.36 ± 2.49   71.22 ± 2.33   67.92 ± 2.94   68.14 ± 2.90   69.78 ± 2.74   73.53 ± 3.56   68.00 ± 3.34   73.22 ± 2.06   77.42 ± 4.16

Glioma dataset
Acc               92.80 ± 1.83   94.00 ± 0.89   92.20 ± 3.03   93.60 ± 1.20   94.00 ± 1.79   93.40 ± 1.28   93.20 ± 1.60   94.20 ± 1.40   93.40 ± 1.80   94.40 ± 1.96
F-measure         87.08 ± 3.54   89.35 ± 1.63   87.54 ± 4.01   88.56 ± 2.16   89.32 ± 3.08   88.80 ± 1.59   88.57 ± 2.39   89.77 ± 2.42   88.87 ± 2.73   90.54 ± 3.00
G-mean            90.94 ± 2.96   92.71 ± 1.53   93.16 ± 1.96   91.97 ± 1.76   92.47 ± 2.03   93.19 ± 0.78   93.26 ± 1.50   93.08 ± 1.69   93.44 ± 1.66   94.32 ± 1.30
AUC               98.71 ± 0.34   98.93 ± 0.18   98.75 ± 0.56   98.87 ± 0.47   99.15 ± 0.27   98.73 ± 0.37   98.75 ± 0.90   98.97 ± 0.36   98.77 ± 0.47   99.13 ± 0.16

Fig. 8. Performance comparison of various sampling methods on the four datasets. 1st column: Colon dataset; 2nd column: CNS dataset; 3rd column: Lung dataset; 4th column: Glioma dataset.

As shown in Table 4 and Fig. 8, almost all sampling methods outperform the method using only the original data (ORI), indicating that sampling is effective for improving classification performance in imbalanced, high-dimensional and small-sample classification tasks. This improvement is reflected not only in the assessment criteria particularly designed for imbalanced classification, such as F-measure, G-mean and AUC, but also in the overall accuracy Acc.

Compared with the traditional sampling strategies, we are more interested in the performance of the ACOSampling method. From Table 4 and Fig. 8, we observe that ACOSampling acquires the highest F-measure and G-mean on all datasets.
For the AUC metric, ACOSampling attains the highest value on the Colon and Lung datasets, and it ranks second only to BSO1 on the other two datasets. The results indicate that the proposed ACOSampling strategy is more effective and can extract majority examples carrying more classification information than the typical sampling approaches. At the same time, we find an interesting phenomenon: the proposed method obviously improves classification performance on the Lung dataset, but yields only a slight improvement on the other datasets. This can be explained by the viewpoint of Ref. [31], which partitions class imbalance tasks into two groups, harmful and unharmful, according to whether the classification performance suffers serious degeneration or not. Undoubtedly, the Lung dataset can be regarded as a clearly harmful class imbalance task, and thus ACOSampling performs better on this dataset. However, we have to admit that ACOSampling is more time-consuming because it runs iteratively to estimate the significance of each majority sample. This can be well explained by the "No Free Lunch" theorems of Wolpert et al. [55], which demonstrate that there is no optimization method that outperforms all others in all circumstances.

Moreover, Fig. 8 shows that undersampling generally produces better results than oversampling on our low imbalance ratio datasets, which is similar to the finding of Ref. [37]. This implies that using all majority class samples is not necessary even when the dataset is only slightly skewed.
Among the five oversampling methods, ROS performs worst in most cases, while the other four strategies show comparable performance with each other. Among the undersampling methods other than ACOSampling, OSS performs best on most datasets, while SBC does not reveal enough competitive power compared with RUS. Therefore, in practical applications, if the time complexity is strictly limited, OSS is a reasonable alternative.

Then we investigate the relationship between the number of selected feature genes and the classification performance of the ACOSampling method. The number of feature genes is set to 10, 20, 50, 100, 200, 500 and 1000, respectively. We conduct 10 runs of 3-fold cross validation for each setting and present the results in Table 5 and Fig. 9.

Table 5
Performance comparison of the ACOSampling method with different numbers of feature genes on the four datasets (mean ± standard deviation, %).

Performance (%)   10             20             50             100            200            500            1000

Colon dataset
Acc               82.74 ± 3.95   83.71 ± 2.10   84.84 ± 2.41   85.63 ± 1.83   84.20 ± 2.68   82.10 ± 3.91   75.32 ± 6.08
F-measure         77.27 ± 4.92   78.74 ± 2.66   79.50 ± 3.54   81.13 ± 2.63   78.61 ± 3.47   76.08 ± 4.15   69.42 ± 6.68
G-mean            82.59 ± 4.24   83.94 ± 2.16   84.35 ± 3.05   85.92 ± 2.41   83.58 ± 2.81   81.43 ± 3.30   75.80 ± 5.72
AUC               90.01 ± 3.30   90.31 ± 2.03   91.78 ± 2.15   94.18 ± 1.56   87.70 ± 1.41   87.35 ± 1.45   83.05 ± 3.52

CNS dataset
Acc               69.33 ± 3.18   78.17 ± 4.44   81.67 ± 4.01   83.83 ± 3.42   79.50 ± 3.42   78.67 ± 2.96   73.83 ± 2.59
F-measure         63.47 ± 3.72   74.36 ± 4.06   77.23 ± 4.18   79.75 ± 3.83   73.17 ± 3.70   70.61 ± 2.53   61.43 ± 2.90
G-mean            70.65 ± 3.33   80.16 ± 3.77   82.91 ± 3.49   85.17 ± 3.23   79.35 ± 2.93   77.08 ± 1.99   69.55 ± 2.31
AUC               78.18 ± 3.93   89.27 ± 3.38   92.47 ± 1.66   93.33 ± 1.47   90.31 ± 1.69   84.58 ± 4.08   76.04 ± 1.58

Lung dataset
Acc               68.21 ± 3.28   71.79 ± 3.80   68.20 ± 2.85   71.79 ± 4.59   74.10 ± 1.80   70.26 ± 4.47   68.20 ± 4.89
F-measure         63.10 ± 2.94   66.62 ± 4.13   65.26 ± 3.23   67.86 ± 4.50   69.72 ± 3.14   65.99 ± 5.42   63.85 ± 6.82
G-mean            68.43 ± 2.91   71.77 ± 3.53   69.29 ± 2.75   72.32 ± 4.19   74.59 ± 2.43   70.93 ± 4.71   68.70 ± 5.69
AUC               75.22 ± 4.63   79.78 ± 2.44   74.89 ± 4.32   77.42 ± 4.16   80.22 ± 3.31   73.56 ± 4.18   72.55 ± 3.74

Glioma dataset
Acc               93.40 ± 2.01   96.80 ± 1.83   96.60 ± 2.69   94.40 ± 1.96   89.60 ± 2.50   88.40 ± 2.15   82.40 ± 1.96
F-measure         89.19 ± 3.06   94.68 ± 2.93   94.22 ± 4.45   90.54 ± 3.00   83.69 ± 3.28   82.80 ± 2.85   75.85 ± 2.41
G-mean            94.26 ± 1.83   97.74 ± 1.30   96.95 ± 3.04   94.32 ± 1.30   90.92 ± 1.69   91.40 ± 1.89   86.60 ± 2.01
AUC               99.20 ± 0.53   99.86 ± 0.25   99.68 ± 0.43   99.13 ± 0.16   98.08 ± 1.17   98.06 ± 1.27   91.03 ± 3.50

Fig. 9. Performance comparison of the ACOSampling method with different numbers of feature genes on the four datasets. 1st column: Colon dataset; 2nd column: CNS dataset; 3rd column: Lung dataset; 4th column: Glioma dataset.

Though there are some fluctuations, a trend can still be observed from the curves in Fig. 9: they are high in the middle and low on both sides. This means that selecting either too few or too many feature genes degrades classification performance. We believe the reason is that the former causes a deficiency of useful information while the latter adds much noise and redundant information. Table 5 shows that for the Colon and CNS datasets the best performance is obtained with 100 feature genes, while for the Lung and Glioma datasets the optimal numbers are 200 and 20, respectively. Therefore, for high-dimensional and small-sample classification tasks such as DNA microarray data, it is necessary to extract a few class-related features beforehand, which is also verified by Wasikowski et al. [56].

In contrast with the previous work by Yang et al. [43], which is similar to this study, our ACOSampling has one specific merit: stronger generalization ability derived from the 100 random partitions. Using these random partitions, we can give a fairer evaluation of the significance of each majority class sample, whereas Ref. [43] tries to avoid overfitting by integrating the results from multiple different kinds of classifiers, which introduces more bias into the final classification results than our method. However, we have to admit that the method proposed in this study is more time-consuming than their work. Therefore, we consider the proposed ACOSampling method more suitable for imbalanced classification tasks that also have the characteristic of small sample size.

5. Conclusions

In this paper, we present a novel heuristic undersampling method named ACOSampling to address the imbalanced DNA microarray data classification problem. Extensive experiments have demonstrated that the proposed method is effective and can automatically extract the so-called "informative samples" from the majority class. However, since its sampling procedure is more time-consuming than the typical sampling approaches, it will be more efficient on small-sample classification tasks.

Considering the excessive computational and storage cost of the proposed algorithm, we intend to improve its efficiency by modifying its formation rule in future work. We also expect that our ACOSampling can be applied to other real-world data mining applications that suffer from class imbalance. Moreover, considering the ubiquity of multiclass imbalanced classification tasks in practical applications, we will also investigate the possibility of extending the current ACOSampling to multiclass tasks in future work.

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China under grant No. 60873036, the Ph.D Programs Foundation for young researchers of the Ministry of Education of China under Grant No. 20070217051 and the Ph.D Foundation of Jiangsu University of Science and Technology under Grant No. 35301002.

References

[1] X. Zhou, M.C. Kao, W.H. Wong, From the cover: transitive functional annotation by shortest-path analysis of gene expression data, Proc. Nat. Acad. Sci. U.S.A. 99 (20) (2002) 12783–12788.
[2] D. Husmeier, Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks, Bioinformatics 19 (17) (2003) 2271–2282.
[3] E. Segal, M. Shapira, A. Regev, et al., Module networks: discovering regulatory modules and their condition specific regulators from gene expression data, Nat. Genet. 34 (2) (2003) 166–176.
[4] W.E. Evans, R.K. Guy, Gene expression as a drug discovery tool, Nat. Genet. 36 (3) (2004) 214–215.
[5] U. Alon, N. Barkai, D.A. Notterman, et al., Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide array, Proc. Nat. Acad. Sci. U.S.A. 96 (12) (1999) 6745–6750.
[6] T.P. Conrads, M. Zhou, E.F. Petricoin, et al., Cancer diagnosis using proteomic patterns, Expert Rev. Mol. Diagn. 3 (4) (2003) 411–420.
[7] T.R. Golub, D.K. Slonim, P. Tamayo, et al., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science 286 (5439) (1999) 531–537.
[8] D.A. Wigle, I. Jurisica, N. Radulovich, et al., Molecular profiling of non-small cell lung cancer and correlation with disease-free survival, Cancer Res. 62 (11) (2002) 3005–3008.
[9] C.L. Nutt, D.R. Mani, R.A. Betensky, et al., Gene expression-based classification of malignant gliomas correlates better with survival than histological classification, Cancer Res. 63 (7) (2003) 1602–1607.
[10] R. Blagus, L. Lusa, Class prediction for high-dimensional class-imbalanced data, BMC Bioinf. 11 (523) (2010).
[11] H. He, E.A. Garcia, Learning from imbalanced data, IEEE Trans. Knowl. Data Eng. 21 (9) (2009) 1263–1284.
[12] N. Japkowicz, Workshop on learning from imbalanced data sets, in: Proceedings of the 17th American Association for Artificial Intelligence, Austin, Texas, USA, 2000.
[13] N.V. Chawla, N. Japkowicz, A. Kolcz, Workshop on learning from imbalanced data sets II, in: Proceedings of the 20th International Conference of Machine Learning, Washington, USA, 2003.
[14] N.V. Chawla, N. Japkowicz, A. Kolcz, Editorial: special issue on learning from imbalanced data sets, ACM SIGKDD Explor. Newsl. 6 (1) (2004) 1–6.
[15] C. Ling, C. Li, Data mining for direct marketing problems and solutions, in: Proceedings of the 4th ACM SIGKDD International Conference of Knowledge Discovery and Data Mining, New York, USA, 1998, pp. 73–79.
[16] N.V. Chawla, K.W. Bowyer, L.O. Hall, et al., SMOTE: synthetic minority over-sampling technique, J. Artif. Intell. Res. 16 (1) (2002) 321–357.
[17] H. Han, W.Y. Wang, B.H. Mao, Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning, in: Proceedings of the 2005 International Conference of Intelligent Computing, Hefei, China, 2005, pp. 878–887.
[18] M. Kubat, S. Matwin, Addressing the curse of imbalanced training sets: one-sided selection, in: Proceedings of the 14th International Conference of Machine Learning, Nashville, Tennessee, USA, 1997, pp. 179–186.
[19] H. He, Y. Bai, E.A. Garcia, et al., ADASYN: adaptive synthetic sampling approach for imbalanced learning, in: Proceedings of the 2008 International Joint Conference of Neural Networks, Hong Kong, China, 2008, pp. 1322–1328.
[20] S.J. Yen, Y.S. Lee, Cluster-based under-sampling approaches for imbalanced data distributions, Expert Syst. Appl. 36 (3) (2009) 5718–5727.
[21] C. Elkan, The foundations of cost-sensitive learning, in: Proceedings of the 17th International Joint Conference of Artificial Intelligence, Seattle, Washington, USA, 2001, pp. 973–978.
[22] B. Zadrozny, J. Langford, N. Abe, Cost-sensitive learning by cost-proportionate example weighting, in: Proceedings of the 3rd International Conference of Data Mining, Melbourne, Florida, USA, 2003, pp. 435–442.
[23] P. Domingos, MetaCost: a general method for making classifiers cost-sensitive, in: Proceedings of the 5th ACM SIGKDD International Conference of Knowledge Discovery and Data Mining, San Diego, CA, USA, 1999, pp. 155–164.
[24] Y. Sun, M.S. Kamel, A.K.C. Wong, et al., Cost-sensitive boosting for classification of imbalanced data, Pattern Recognit. 40 (12) (2007) 3358–3378.
[25] W. Fan, S.J. Stolfo, J. Zhang, et al., AdaCost: misclassification cost-sensitive boosting, in: Proceedings of the 16th International Conference of Machine Learning, Bled, Slovenia, 1999, pp. 97–105.
[26] C. Drummond, R.C. Holte, Exploiting the cost (in)sensitivity of decision tree splitting criteria, in: Proceedings of the 17th International Conference of Machine Learning, Stanford, CA, USA, 2000, pp. 239–246.
[27] Z.H. Zhou, X.Y. Liu, Training cost-sensitive neural networks with methods addressing the class imbalance problem, IEEE Trans. Knowl. Data Eng. 18 (1) (2006) 63–77.
[28] R. Akbani, S. Kwek, N. Japkowicz, Applying support vector machines to imbalanced data sets, in: Proceedings of the 15th European Conference on Machine Learning, Pisa, Italy, 2004, pp. 39–50.
[29] X.Y. Liu, Z.H. Zhou, The influence of class imbalance on cost-sensitive learning: an empirical study, in: Proceedings of the 6th IEEE International Conference on Data Mining, Hong Kong, China, 2006, pp. 970–974.
[30] N.V. Chawla, A. Lazarevic, L.O. Hall, et al., SMOTEBoost: improving prediction of the minority class in boosting, in: Proceedings of the 7th European Conference on Principles of Data Mining and Knowledge Discovery, Cavtat–Dubrovnik, Croatia, 2003, pp. 107–119.
[31] X.Y. Liu, J. Wu, Z.H. Zhou, Exploratory undersampling for class-imbalance learning, IEEE Trans. Syst. Man Cybern. Part B Cybern. 39 (2) (2009) 539–550.
[32] D. Tao, X. Tang, X. Li, et al., Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval, IEEE Trans. Pattern Anal. Mach. Intell. 28 (7) (2006) 1088–1099.
[33] G.Z. Li, H.H. Meng, W.C. Lu, et al., Asymmetric bagging and feature selection for activities prediction of drug molecules, BMC Bioinf. 9 (S6) (2008) S7.
[34] S. Hido, H. Kashima, Y. Takahashi, Roughly balanced bagging for imbalanced data, Stat. Anal. Data Min. 2 (5-6) (2009) 412–426.
[35] T.M. Khoshgoftaar, J.V. Hulse, A. Napolitano, Comparing boosting and bagging techniques with noisy and imbalanced data, IEEE Trans. Syst. Man Cybern. Part B Cybern. 41 (3) (2011) 552–568.
[36] H.L. Yu, G.C. Gu, H.B. Liu, et al., A modified ant colony optimization algorithm for tumor marker gene selection, Genomics Proteomics Bioinf. 7 (4) (2009) 200–208.
[37] V. García, J.S. Sánchez, R.A. Mollineda, et al., The class imbalance problem in pattern classification and learning, in: II Congreso Español de Informática, 2007, pp. 283–291.
[38] A. Colorni, M. Dorigo, V. Maniezzo, Distributed optimization by ant colonies, in: Proceedings of the 1st European Conference on Artificial Life, Paris, France, 1991, pp. 134–142.
[39] A. Uğur, D. Aydin, An interactive simulation and analysis software for solving TSP using ant colony optimization algorithms, Adv. Eng. Software 40 (5) (2009) 341–349.
[40] X. Zhang, X. Chen, Z. He, An ACO-based algorithm for parameter optimization of support vector machines, Expert Syst. Appl. 37 (9) (2010) 6618–6628.
[41] H. Duan, Y. Yu, X. Zhang, et al., Three-dimension path planning for UCAV using hybrid meta-heuristic ACO-DE algorithm, Simul. Modell. Pract. Theory 18 (8) (2010) 1104–1115.
[42] A. Shmygelska, H.H. Hoos, An ant colony optimization algorithm for the 2D and 3D hydrophobic polar protein folding problem, BMC Bioinf. 6 (30) (2005).
[43] P. Yang, L. Xu, B. Zhou, et al., A particle swarm based hybrid system for imbalanced medical data sampling, BMC Genomics 10 (S3) (2009) S34.
[44] V. Vapnik, Statistical Learning Theory, Wiley Publishers, New York, USA, 1998.
[45] J.A.K. Suykens, J. Vandewalle, Least squares support vector machine classifiers, Neural Process. Lett. 9 (3) (1999) 293–300.
[46] Y.H. Wang, F.S. Makedon, J.C. Ford, et al., HykGene: a hybrid approach for selecting feature genes for phenotype classification using microarray gene expression data, Bioinformatics 21 (8) (2005) 1530–1537.
[47] E. Xing, M. Jordan, R. Karp, Feature selection for high-dimensional genomic microarray data, in: Proceedings of the 18th International Conference of Machine Learning, Williamstown, MA, USA, 2001, pp. 601–608.
[48] I. Inza, P. Larranaga, R. Blanco, Filter versus wrapper gene selection approaches in DNA microarray domains, Artif. Intell. Med. 31 (2) (2004) 91–104.
[49] J.H. Chiang, S.H. Ho, A combination of rough-based feature selection and RBF neural network for classification using gene expression data, IEEE Trans. Nanobiosci. 7 (1) (2008) 91–99.
[50] K. Yang, Z. Cai, J. Li, et al., A stable gene selection in microarray data analysis, BMC Bioinf. 7 (228) (2006).
[51] G.Z. Li, H.H. Meng, J. Ni, Embedded gene selection for imbalanced microarray data analysis, in: Proceedings of the 3rd International Multi-symposiums on Computer and Computational Sciences, Shanghai, China, 2008, pp. 17–24.
[52] A.H.M. Kamal, X.Q. Zhu, R. Narayanan, Gene selection for microarray expression data with imbalanced sample distributions, in: Proceedings of the 2009 International Conference of Bioinformatics, Systems Biology and Intelligent Computing, Shanghai, China, 2009, pp. 3–9.
[53] Q. Shen, Z. Mei, B.X. Ye, Simultaneous genes and training samples selection by modified particle swarm optimization for gene expression data classification, Comput. Biol. Med. 39 (7) (2009) 646–649.
[54] S.L. Pomeroy, P. Tamayo, M. Gaasenbeek, et al., Prediction of central nervous system embryonal tumour outcome based on gene expression, Nature 415 (6870) (2002) 436–442.
[55] D.H. Wolpert, W.G. Macready, No free lunch theorems for optimization, IEEE Trans. Evol. Comput. 1 (1) (1997) 67–82.
[56] M. Wasikowski, X.W. Chen, Combating the small sample class imbalance problem using feature selection, IEEE Trans. Knowl. Data Eng. 22 (10) (2010) 1388–1400.

Hualong Yu received the B.S. degree from Heilongjiang University, Harbin, China, in 2005, and the M.S. and Ph.D degrees from the College of Computer Science and Technology, Harbin Engineering University, Harbin, China, in 2008 and 2010, respectively. Since 2010, he has been a lecturer and master supervisor at Jiangsu University of Science and Technology, Zhenjiang, China. He is author or co-author of over 20 research papers and 3 books, and a program committee member for ICICSE2012. His research interests mainly include pattern recognition, machine learning and bioinformatics.

Jun Ni received the B.S. degree from Harbin Engineering University, Harbin, China, the M.S. degree from Shanghai Jiaotong University, Shanghai, China, and the Ph.D degree from the University of Iowa, IA, USA. He is currently an associate professor and director of the Medical Imaging HPC and Informatics Lab, Department of Radiology, Carver College of Medicine, the University of Iowa, Iowa City, IA, USA. He has also been a visiting professor at Harbin Engineering University and Shanghai University, China, since 2006 and 2009, respectively. He edited or co-edited 34 books or proceedings and authored or co-authored 115 peer-reviewed journal and conference papers. In addition, he is editor-in-chief of the International Journal of Computational Medicine and Healthcare, associate editor of the IEEE Systems Journal and an editorial board member for 15 other professional journals. Since 2003, he has also served as General/Program Chair for over 50 international conferences. His research interests include distributed computation, parallel computing, medical imaging informatics, computational biology and bioinformatics.

Jing Zhao received the Ph.D degree from Harbin Institute of Technology, Harbin, China, in 2005. She is currently a professor and Ph.D supervisor in the College of Computer Science and Technology, Harbin Engineering University, Harbin, China, and a senior visiting scholar at Duke University, USA. She is author or co-author of over 20 research papers. Her research interests include software reliability, mobile computing and image processing.