
Thomas et al. BMC Bioinformatics 2014, 15:137
http://www.biomedcentral.com/1471-2105/15/137

METHODOLOGY ARTICLE    Open Access

New bandwidth selection criterion for Kernel PCA: Approach to dimensionality reduction and classification problems

Minta Thomas1*, Kris De Brabanter2 and Bart De Moor1

Abstract
Background: DNA microarrays are a potentially powerful technology for improving diagnostic classification, treatment selection, and prognostic assessment. The use of this technology to predict cancer outcome has a history of almost a decade. Disease class predictors can be designed for known disease cases and provide diagnostic confirmation or clarify abnormal cases. The main input to these class predictors is high-dimensional data with many variables and few observations. Dimensionality reduction of these feature sets significantly speeds up the prediction task. Feature selection and feature transformation methods are well-known preprocessing steps in the field of bioinformatics. Several prediction tools are available based on these techniques.

Results: Studies show that a well-tuned Kernel PCA (KPCA) is an efficient preprocessing step for dimensionality reduction, but the available bandwidth selection method for KPCA is computationally expensive. In this paper, we propose a new data-driven bandwidth selection criterion for KPCA, which is related to least squares cross-validation for kernel density estimation. We propose a new prediction model combining a well-tuned KPCA with a Least Squares Support Vector Machine (LS-SVM). We estimate the accuracy of the newly proposed model on 9 case studies. Then, we compare its performance (in terms of test set Area Under the ROC Curve (AUC) and computational time) with other well-known techniques such as whole data set + LS-SVM, PCA + LS-SVM, t-test + LS-SVM, Prediction Analysis of Microarrays (PAM) and the Least Absolute Shrinkage and Selection Operator (Lasso). Finally, we assess the performance of the proposed strategy against an existing KPCA parameter tuning algorithm by means of two additional case studies.

Conclusion: We propose, evaluate, and compare several mathematical/statistical techniques that apply feature transformation/selection for subsequent classification, and consider their application in medical diagnostics. Both feature selection and feature transformation perform well on classification tasks. Due to the dynamic selection property of feature selection, it is hard to define significant features for the classifier that predicts the classes of future samples. Moreover, the proposed strategy enjoys a distinctive advantage with its relatively lower time complexity.

Background
Biomarker discovery and prognosis prediction are essential for improved personalized cancer treatment. Microarray technology is a significant tool for gene expression analysis and cancer diagnosis. Typically, microarray data sets are used for class discovery [1,2] and prediction [3,4]. The high dimensionality of the input feature space in comparison with the relatively small number of subjects is a widespread concern; hence some form of dimensionality reduction is often applied. Feature selection and feature transformation are two commonly used dimensionality reduction techniques. The key difference between feature selection and feature transformation is that in the former only a subset of the original features is selected, while the latter is based on the generation of new features.

In this genomic era, several classification and dimensionality reduction methods are available for analyzing and classifying microarray data. Prediction Analysis of Microarrays (PAM) [5] is a statistical technique for class prediction from gene expression data using Nearest Shrunken Centroids (NSC). PAM identifies subsets of genes that best characterize each class.

*Correspondence: [email protected]
1 KU Leuven, Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics/iMinds Medical IT, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
Full list of author information is available at the end of the article

© 2014 Thomas et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.

LS-SVM is a promising method for classification because of its solid mathematical foundations, which convey several salient properties that other methods hardly provide. A commonly used technique for feature selection, the t-test, assumes that the feature values from two different classes follow normal distributions. Several studies, especially in microarray analysis, have used the t-test and LS-SVM together to improve the prediction performance by selecting key features [6,7]. The Least Absolute Shrinkage and Selection Operator (Lasso) [8] is often used for gene selection and parameter estimation in high-dimensional microarray data [9]. The Lasso shrinks some of the coefficients to zero, and the extent of shrinkage is determined by the tuning parameter, often obtained from cross-validation.

Inductive learning systems have been successfully applied in a number of medical domains, e.g. in localization of primary tumors, prognosis of recurring breast cancer, diagnosis of thyroid diseases, and rheumatology [10]. An induction algorithm is used to learn a classifier, which maps the space of feature values into the set of class values. This classifier is later used to classify new instances with unknown class labels. Researchers and practitioners realize that the effective use of these inductive learning systems requires data preprocessing before a learning algorithm can be applied [11]. Due to the instability of feature selection techniques, it might be difficult or even impossible to remove irrelevant and/or redundant features from a data set. Feature transformation techniques, such as KPCA, discover a new feature space with fewer dimensions through a functional mapping, while keeping as much information as possible in the data set.

KPCA, a generalization of PCA, is a nonlinear dimensionality reduction technique that has proven to be a powerful pre-processing step for classification algorithms. It has been studied intensively over the last several years in the field of machine learning and has claimed success in many applications [12]. An algorithm for classification using KPCA was developed by Liu et al. [13]. KPCA was proposed by Schölkopf and Smola [14], by mapping feature sets to a high-dimensional feature space (possibly infinite) and applying Mercer's theorem. Suykens et al. [15,16] proposed a simple and straightforward primal-dual support vector machine formulation of the PCA problem.

To perform KPCA, the user first transforms the input data x from the original input space F0 into a higher-dimensional feature space F1 with a nonlinear transform x → φ(x), where φ is a nonlinear function. Then a kernel matrix K is formed using the inner products of the new feature vectors. Finally, a PCA is performed on the centralized K, which is an estimate of the covariance matrix of the new feature vectors in F1. One of the commonly used kernel functions is the radial basis function (RBF) kernel, K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2h²)), with bandwidth h. Traditionally, the optimal parameters (bandwidth and number of principal components) of the RBF kernel function are selected in a trial-and-error fashion.

Pochet et al. [17] proposed an optimization algorithm for KPCA with the RBF kernel followed by Fisher Discriminant Analysis (FDA) to find the parameters of KPCA. In this case, the parameter selection is coupled with the corresponding classifier. This means that the performance of the final procedure depends on the chosen classifier. Such a procedure could produce inaccurate results in the case of weak classifiers. In addition, it appears to be a time-consuming procedure when tuning the parameters of KPCA.

Most classification methods have an inherent problem with the high dimensionality of microarray data and hence require dimensionality reduction. The ultimate goal of our work is to design a powerful preprocessing step, decoupled from the classification method, for large dimensional data sets. In this paper, we initially explain an SVM approach to PCA and an LS-SVM approach to KPCA. Next, following the idea of least squares cross-validation in kernel density estimation, we propose a new data-driven bandwidth selection criterion for KPCA. The tuned LS-SVM formulation of KPCA is applied to several data sets and serves as a dimensionality reduction technique for a final classification task. In addition, we compare the proposed strategy with an existing optimization algorithm for KPCA, as well as with other preprocessing steps. Finally, for the sake of comparison, we applied LS-SVM on the whole data sets, PCA + LS-SVM, t-test + LS-SVM, PAM and Lasso. Randomizations on all data sets are carried out in order to get a more reliable idea of the expected performance.

Data sets
In our analysis, we collected 11 publicly available binary class data sets (diseased vs. normal). The data sets are: colon cancer data [18,19], breast cancer data [20], pancreatic cancer premalignant data [21,22], cervical cancer data [23], acute myeloid leukemia data [24], ovarian cancer data [21], head & neck squamous cell carcinoma data [25], early-early stage Duchenne muscular dystrophy (EDMD) data [26], HIV encephalitis data [27], high grade glioma data [28], and breast cancer data [29]. In the breast cancer data [29] and the high grade glioma data, all data samples have already been assigned to a training set or a test set. The breast cancer data in [29] contains missing values; those values have been imputed based on the nearest neighbor method.

An overview of the characteristics of all the data sets can be found in Table 1. In all cases, 2/3rd of the data samples of each class are assigned randomly to the training set and the rest to the test set.

Table 1 Summary of the 11 binary disease data sets

Data set                                   #Samples Class 1   #Samples Class 2   #Genes
1: Colon                                   22                 40                 2000
2: Breast cancer I                         34                 99                 5970
3: Pancreatic                              50                 50                 15154
4: Cervical                                8                  24                 10692
5: Leukemia                                26                 38                 22283
6: Ovarian                                 91                 162                15154
7: Head & neck squamous cell carcinoma     22                 22                 12625
8: Duchenne muscular dystrophy             23                 14                 22283
9: HIV encephalitis                        16                 12                 12625
10: High grade glioma                      29                 21                 12625
11: Breast cancer II                       19                 78                 24188

These randomizations are the same for all numerical experiments on all data sets. The split was performed stratified to ensure that the relative proportion of outcomes sampled in both training and test sets was similar to the original proportion in the full data set. In all cases, the data were standardized to zero mean and unit variance.

Methods
The methods used to set up the case studies can be subdivided into two categories: dimensionality reduction using the proposed criterion and subsequent classification.

SVM formulation to linear PCA
Given a training set {x_i}_{i=1}^N with x_i ∈ R^d (d-dimensional data), one aims at finding projected variables v^T x_i with maximal variance. The SVM formulation of the PCA problem is given in [30] as follows:

max_v Σ_{i=1}^N (0 − v^T x_i)²,

where zero is considered as a single target value. This interpretation of the problem leads to the following primal optimization problem:

max_{v,e} J_P(v, e) = (γ/2) Σ_{i=1}^N e_i² − (1/2) v^T v

such that e_i = v^T x_i, i = 1, ..., N.

This formulation states that one considers the difference between v^T x_i (the projection of the data points onto the target space) and the value 0 as error variables. The projected variables correspond to what one calls the score variables. These error variables are maximized for the given N data points while keeping the norm of v small by the regularization term. The value γ is a positive real constant. The Lagrangian becomes

L(v, e; α) = (γ/2) Σ_{k=1}^N e_k² − (1/2) v^T v − Σ_{k=1}^N α_k (e_k − v^T x_k)

with conditions for optimality

∂L/∂v = 0   →   v = Σ_{k=1}^N α_k x_k
∂L/∂e_k = 0 →   α_k = γ e_k,              k = 1, ..., N
∂L/∂α_k = 0 →   e_k − v^T x_k = 0,        k = 1, ..., N.

By elimination of the variables e and v, one obtains the following symmetric eigenvalue problem:

[ x_1^T x_1  ...  x_1^T x_N ] [ α_1 ]       [ α_1 ]
[    ...      ...    ...    ] [ ... ]  = λ  [ ... ]
[ x_N^T x_1  ...  x_N^T x_N ] [ α_N ]       [ α_N ]

The vector of dual variables α = [α_1; ...; α_N] is an eigenvector of the Gram matrix and λ = 1/γ is the corresponding eigenvalue. The score variable z_n^pca(x) of sample x on the nth eigenvector α^(n) becomes

z_n^pca(x) = v^T x = Σ_{i=1}^N α_i^(n) x_i^T x.    (1)

LS-SVM approach to KPCA
The PCA problem is interpreted as a one-class modeling problem with a target value equal to zero, around which the variance is maximized. This results in a sum of squared error cost function with regularization. The score variables are taken as additional error variables. We now follow the usual SVM methodology of mapping the d-dimensional data from the input space to a high-dimensional feature space φ: R^d → R^{n_h}, where n_h can be infinite, and apply Mercer's theorem [31]. Our objective is the following:

max_v Σ_{k=1}^N (0 − v^T (φ(x_k) − μ̂_φ))²    (2)

with μ̂_φ = (1/N) Σ_{k=1}^N φ(x_k), where v is the eigenvector in the primal space with maximum variance. This formulation states that one considers the difference between v^T (φ(x_k) − μ̂_φ) (the projection of the data points onto the target space) and the value 0 as error variables. The projected variables correspond to what is called the score variables. These error variables are maximized for the given N data points. Next, by adding a regularization term, we also want to keep the norm of v small.

The following optimization problem is now formulated in the primal weight space:

max_{v,e} J_P(v, e) = (γ/2) Σ_{k=1}^N e_k² − (1/2) v^T v    (3)

such that e_k = v^T (φ(x_k) − μ̂_φ), k = 1, ..., N.

The Lagrangian yields

L(v, e; α) = (γ/2) Σ_{k=1}^N e_k² − (1/2) v^T v − Σ_{k=1}^N α_k (e_k − v^T (φ(x_k) − μ̂_φ))

with conditions for optimality

∂L/∂v = 0   →   v = Σ_{k=1}^N α_k (φ(x_k) − μ̂_φ)
∂L/∂e_k = 0 →   α_k = γ e_k,                           k = 1, ..., N
∂L/∂α_k = 0 →   e_k − v^T (φ(x_k) − μ̂_φ) = 0,          k = 1, ..., N.

By elimination of the variables e and v, one obtains

(1/γ) α_k − Σ_{l=1}^N α_l (φ(x_l) − μ̂_φ)^T (φ(x_k) − μ̂_φ) = 0,   k = 1, ..., N.

Defining λ = 1/γ, one obtains the following dual problem:

Ω_c α = λ α,

where Ω_c denotes the centered kernel matrix with ij-th entry

Ω_c,ij = K(x_i, x_j) − (1/N) Σ_{r=1}^N K(x_i, x_r) − (1/N) Σ_{r=1}^N K(x_j, x_r) + (1/N²) Σ_{r=1}^N Σ_{s=1}^N K(x_r, x_s).

Data-driven bandwidth selection for KPCA
Model selection is a prominent issue in all learning tasks, especially in KPCA. Since KPCA is an unsupervised technique, formulating a data-driven bandwidth selection criterion is not trivial. Until now, no such data-driven criterion was available to tune the bandwidth (h) and the number of components (k) for KPCA; typically these parameters are selected by trial and error. Analogous to least squares cross-validation [32,33] in kernel density estimation, we propose a new data-driven selection criterion for KPCA.

Let

z_n(x) = Σ_{i=1}^N α_i^(n) K(x_i, x),

where K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2h²)) (RBF kernel with bandwidth h); set the target equal to 0 and denote by z_n(x) the score variable of sample x on the nth eigenvector α^(n). Here, the score variables are expressed in terms of kernel expansions to which every training point contributes. These expansions are typically dense (nonsparse). In Equation 3, KPCA uses the L2 loss function. Here we choose the L1 loss function to induce sparseness in KPCA. By extending the formulation in Equation 3 to the L1 loss function, the following problem is formulated for kernel PCA [34]:

max_{v,e} J_P(v, e) = γ Σ_{k=1}^N L1(e_k) − (1/2) v^T v

such that e_k = v^T (φ(x_k) − μ̂_φ), k = 1, ..., N.

We propose the following tuning criterion for the bandwidth h, which maximizes the L1 loss function of KPCA:

J(h) = argmax_{h ∈ R_0^+} E ∫ |z_n(x)| dx,    (4)

where E denotes the expectation operator. Maximizing Eq. 4 directly would lead to overfitting, since all the training data are used in the criterion. Instead, we work with a leave-one-out cross-validation (LOOCV) estimate of z_n(x) to obtain the optimal bandwidth h of KPCA, which gives projected variables with maximal variance. A finite approximation to Eq. 4 is given by

J(h) = argmax_{h ∈ R_0^+} (1/N) Σ_{j=1}^N ∫ |z_n^(−j)(x)| dx,    (5)

where N is the number of samples and z_n^(−j) denotes the score variable with the jth observation left out. In case the leave-one-out approach is computationally expensive, one could replace it with a leave-v-groups-out strategy (v-fold cross-validation). The integration can be performed by means of any numerical technique; in our case, we have used the trapezoidal rule. The final model with the optimal bandwidth is constructed as

Ω_{c,ĥ_max} α = λ α,

where ĥ_max = argmax_{h ∈ R_0^+} (1/N) Σ_{j=1}^N ∫ |z_n^(−j)(x)| dx. Figure 1 shows the bandwidth selection for the cervical and colon cancer data sets for a fixed number of components. To also retain the optimal number of components of KPCA, we modify Eq. 5 as follows:

J(h, k) = argmax_{h ∈ R_0^+, k ∈ N_0} (1/N) Σ_{n=1}^k Σ_{j=1}^N ∫ |z_n^(−j)(x)| dx    (6)

where k = 1, ..., N. Figure 2 illustrates the proposed model. Figure 3 shows the surface plot of Eq. 6 for various values of h and k.

Thus, the proposed data-driven model can obtain the optimal bandwidth for KPCA while retaining the minimum number of eigenvectors that capture the majority of the variance of the data. Figure 4 shows a slice of the surface plots; the values of the proposed criterion were re-scaled to a maximum of 1. The parameters that maximize Eq. 6 are h = 70.71 and k = 5 for the cervical cancer data and h = 43.59 and k = 15 for the colon cancer data.

Figure 1 Bandwidth selection of KPCA for a fixed number of components. (a) Retaining 5 components for the cervical cancer data set; (b) retaining 15 components for the colon cancer data set. (Both panels plot the criterion J(h) against the bandwidth h.)

Classification models
The constrained optimization problem for an LS-SVM classifier [16,35] has the following form:

min_{w,b,e} (1/2) w^T w + (γ/2) Σ_{k=1}^N e_k²

subject to

y_k (w^T φ(x_k) + b) = 1 − e_k,   k = 1, ..., N,

where φ(·): R^d → R^{d_h} is a nonlinear function which maps the d-dimensional input vector x from the input space to the d_h-dimensional feature space, possibly infinite. In the dual space the solution is given by

[ 0      y^T          ] [ b ]   [ 0   ]
[ y   Ω + γ^{-1} I    ] [ β ] = [ 1_N ]

with y = [y_1, ..., y_N]^T, 1_N = [1, ..., 1]^T, e = [e_1, ..., e_N]^T, β = [β_1, ..., β_N]^T and Ω_ij = y_i y_j K(x_i, x_j), where K(x_i, x_j) is the kernel function. The classifier in the dual space takes the form

y(x) = sign( Σ_{k=1}^N β_k y_k K(x, x_k) + b ),    (7)

where the β_k are Lagrange multipliers.

Results
First we considered the nine data sets described in Table 1. We have chosen the RBF kernel K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2h²)) for KPCA. In this section all the steps are implemented using Matlab R2012b and the LS-SVMlab v1.8 toolbox [36]. Next, we compared the performance of the proposed method with classical PCA and with an existing tuning algorithm for RBF-KPCA developed by Pochet et al. [17].

Figure 2 Data-driven bandwidth selection for KPCA: leave-one-out cross-validation (LOOCV) for KPCA.
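For completeness, the LS-SVM classifier of Eq. 7 amounts to solving a single linear system in the dual variables. The sketch below is again an independent NumPy illustration (the experiments in this paper use the LS-SVMlab toolbox), it assumes class labels y_k in {-1, +1}, and it reuses the `rbf_kernel` helper defined in the earlier sketch.

import numpy as np

def lssvm_train(X, y, gamma, h):
    # Solve [[0, y^T], [y, Omega + I/gamma]] [b; beta] = [0; 1_N],
    # with Omega_ij = y_i y_j K(x_i, x_j)  (the LS-SVM dual system above)
    N = X.shape[0]
    Omega = np.outer(y, y) * rbf_kernel(X, X, h)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], np.ones(N)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                          # bias b and Lagrange multipliers beta

def lssvm_classify(X_new, X_train, y_train, b, beta, h):
    # Eq. 7: y(x) = sign( sum_k beta_k y_k K(x, x_k) + b )
    latent = rbf_kernel(X_new, X_train, h) @ (beta * y_train) + b
    return np.sign(latent), latent                  # predicted labels and decision values

In the KPCA + LS-SVM pipeline, X_train would hold the KPCA score variables rather than the raw gene expression values, and the continuous decision values are what the test set AUCs reported below are computed from.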

Figure 3 Model selection for KPCA: optimal bandwidth and number of components. (a) Cervical cancer; (b) colon cancer. (Both panels plot the criterion J(h, k) against the bandwidth h and the number of components k.)

Later, with the intention to comprehensively compare PCA + LS-SVM and KPCA + LS-SVM with other classification methods, we applied four widely used classifiers to the microarray data: LS-SVM on the whole data sets, t-test + LS-SVM, PAM and Lasso. To fairly compare kernel functions of the LS-SVM classifier, linear, RBF and polynomial kernel functions are used (referred to in Table 2 as lin/poly/RBF). The average test accuracies and execution times for all these methods when applied to the 9 case studies are shown in Table 2 and Table 3, respectively. Statistical significance test results (two-sided signed rank test), which compare the performance of KPCA with the other classifiers, are given in Table 4. For all these methods, training on 2/3rd of the samples and testing on 1/3rd of the samples was repeated 30 times.

Comparison between the proposed criterion and PCA
For each data set, the proposed methodology is applied. This methodology consists of two steps. First, Eq. 6 is maximized in order to obtain an optimal bandwidth h and the corresponding number of components k. Second, the reduced data set is used to perform a classification task with LS-SVM. We retained 5 and 15 components for the cervical and colon cancer data sets, respectively. For PCA, the optimal number of components was selected by slightly modifying Equation 6, i.e., optimizing only over the number of components k, as follows:

J(k) = argmax_{k ∈ N_0} (1/N) Σ_{n=1}^k Σ_{j=1}^N ∫ |z_n^pca(−j)(x)| dx,    (8)

where z_n^pca(x) is the score corresponding to the variable x in the PCA problem (see Equation 1).

Figure 5 shows the plots of the optimal component selection for PCA. Thus we retained 13 components and 15 components for cervical and colon cancer, respectively, for PCA. Similarly, we obtained the number of components for PCA and the number of components with the corresponding bandwidth for KPCA for the remaining data sets.

The score variables (projections of samples onto the directions of the selected principal components) are used to develop an LS-SVM classification model. The averaged test AUC values over the 30 random repetitions are reported.

The main goal of PCA is the reduction of dimensionality, that is, focusing on a few principal components (PCs) instead of many variables. Several criteria have been proposed for determining how many PCs should be investigated and how many should be ignored.

Figure 4 Slice plot of the model selection for KPCA at the optimal bandwidth. (a) Cervical cancer; (b) colon cancer. (Both panels plot the rescaled criterion against the number of components k.)
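The PCA counterpart used in this comparison follows the Gram-matrix formulation of Eq. 1, with the number of components chosen by Eq. 8. A minimal NumPy sketch, assuming the data have already been standardized to zero mean and unit variance as described in the Data sets section:

import numpy as np

def linear_pca_scores(X_train, X_eval, k):
    # Dual (Gram-matrix) formulation of linear PCA, Eq. 1:
    #   alpha^(n) are eigenvectors of the Gram matrix X X^T and
    #   z_n(x) = sum_i alpha_i^(n) x_i^T x
    G = X_train @ X_train.T                          # Gram matrix
    lam, alpha = np.linalg.eigh(G)
    order = np.argsort(lam)[::-1][:k]                # k leading eigenvectors
    return (X_eval @ X_train.T) @ alpha[:, order]    # score variables of X_eval

# Eq. 8 selects k by the same leave-one-out integral criterion as Eq. 6,
# applied to these linear PCA scores (no bandwidth is involved).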

Table 2 Comparison of classifiers: mean AUC (std) over 30 iterations

Columns Whole data, PCA, KPCA and t-test are preprocessing + LS-SVM classifier with the indicated kernel; PAM and Lasso do not use an LS-SVM kernel and are listed once per data set.

Data set  Kernel  Whole data     PCA            KPCA           t-test (p < 0.05)  PAM            Lasso
I         RBF     0.769(0.127)   0.793(0.081)   0.822(0.088)   0.835(0.078)
I         lin     0.822(0.068)   0.837(0.088)   0.864(0.078)   0.857(0.078)       0.787(0.097)   0.837(0.116)
I         poly    0.818(0.071)   0.732(0.072)   0.825(0.125)   0.845(0.017)
II        RBF     0.637(0.146)   0.749(0.093)   0.780(0.076)   0.779(0.082)
II        lin     0.803(0.059)   0.772(0.094)   0.790(0.075)   0.751(0.071)       0.659(0.084)   0.766(0.074)
II        poly    0.701(0.086)   0.752(0.063)   0.753(0.072)   0.784(0.059)
III       RBF     0.832(0.143)   0.762(0.066)   0.879(0.058)   0.921(0.027)
III       lin     0.915(0.043)   0.785(0.063)   0.878(0.066)   0.941(0.036)       0.707(0.067)   0.9359(0.0374)
III       poly    0.775(0.080)   0.685(0.105)   0.8380(0.068)  0.858(0.042)
IV        RBF     0.615(0.197)   0.853(0.112)   0.867(0.098)   0.808(0.225)
IV        lin     0.953(0.070)   0.917(0.083)   0.929(0.077)   0.987(0.028)       0.759(0.152)   0.707(0.194)
IV        poly    0.762(0.118)   0.811(0.140)   0.840(0.131)   0.779(0.123)
V         RBF     0.807(0.238)   0.790(0.140)   0.976(0.035)   0.998(0.005)
V         lin     0.997(0.005)   0.528(0.134)   0.982(0.022)   0.998(0.006)       0.923(0.062)   0.934(0.084)
V         poly    0.942(0.051)   0.804(0.121)   0.975(0.028)   0.965(0.049)
VI        RBF     0.998(0.001)   0.982(0.002)   0.984(0.012)   0.998(0.004)
VI        lin     0.990(0.005)   0.973(0.002)   0.978(0.013)   0.993(0.013)       0.960(0.016)   0.951(0.045)
VI        poly    0.998(0.006)   0.985(0.016)   0.973(0.018)   0.995(0.011)
VII       RBF     0.946(0.098)   0.941(0.057)   0.932(0.071)   0.967(0.048)
VII       lin     0.983(0.025)   0.947(0.047)   0.954(0.051)   0.987(0.022)       0.931(0.058)   0.952(0.030)
VII       poly    0.785(0.143)   0.903(0.078)   0.915(0.080)   0.920(0.025)
VIII      RBF     0.823(0.159)   0.923(0.096)   0.858(0.113)   0.950(0.150)
VIII      lin     0.840(0.164)   0.969(0.044)   0.800(0.019)   0.999(0.005)       0.982(0.050)   0.890(0.081)
VIII      poly    0.781(0.186)   0.870(0.117)   0.785(0.121)   0.998(0.007)
IX        RBF     0.638(0.210)   0.823(0.159)   0.852(0.180)   0.815(0.200)
IX        lin     0.931(0.126)   0.840(0.164)   0.846(0.143)   0.930(0.139)       0.703(0.175)   0.705(0.174)
IX        poly    0.841(0.176)   0.781(0.186)   0.798(0.193)   0.768(0.193)

p-value: False Discovery Rate (FDR) corrected.
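Each entry in Table 2 is the average over 30 random stratified splits, with 2/3 of each class used for training and 1/3 for testing and with the data standardized on each split, as described above. The following scikit-learn sketch illustrates that evaluation protocol; `fit_and_score` is a hypothetical placeholder for any of the compared pipelines (e.g. the proposed KPCA + LS-SVM) and is not part of the paper, which reuses the same 30 randomizations for every method so that paired tests can be computed.

import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

def averaged_test_auc(X, y, fit_and_score, n_repeats=30, seed=0):
    # Mean and standard deviation of the test AUC over repeated stratified
    # 2/3 - 1/3 splits, mirroring the protocol used for Table 2.
    splitter = StratifiedShuffleSplit(n_splits=n_repeats, test_size=1/3, random_state=seed)
    aucs = []
    for train_idx, test_idx in splitter.split(X, y):
        scaler = StandardScaler().fit(X[train_idx])      # zero mean, unit variance
        X_tr = scaler.transform(X[train_idx])
        X_te = scaler.transform(X[test_idx])
        decision_values = fit_and_score(X_tr, y[train_idx], X_te)
        aucs.append(roc_auc_score(y[test_idx], decision_values))
    return float(np.mean(aucs)), float(np.std(aucs))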

One common criterion is to include all those PCs up to a predetermined total percentage of variance explained, such as 95%. Figure 6 depicts the prediction performance on the colon cancer data with PCA + LS-SVM (RBF) at different fractions of explained total variance. It shows that the results vary with the selected components; the number of retained components depends on the chosen fraction of explained total variance. The proposed approach offers a data-driven selection criterion for the PCA problem, instead of a traditional trial-and-error PC selection.

Comparison between the proposed criterion and an existing optimization algorithm for RBF-KPCA
We selected two experiments from Pochet et al. [17] (the last two data sets in Table 1), being the high-grade glioma and breast cancer II data sets. We repeated the same experiments as reported in Pochet et al. [17] and compared them with the proposed strategy. The results are shown in Table 5. The three-dimensional surface plot of the LOOCV performance of the method proposed by [17] for the high-grade glioma data set is shown in Figure 7, with the optimal h = 114.018 and k = 12.

Table 3 Summary of averaged execution time of classifiers over 30 iterations (in seconds)

Data set                                   Whole data  PCA   KPCA  t-test (p < 0.05)  PAM  Lasso
1: Colon                                   17          10    18    13                 8    72
2: Breast                                  56          38    54    42                 12   258
3: Pancreatic                              17          12    26    19                 20   453
4: Cervical                                43          28    29    33                 43   106
5: Leukemia                                225         185   184   195                28   680
6: Ovarian                                 51          25    39    44                 19   865
7: Head & neck squamous cell carcinoma     59          39    45    47                 30   238
8: Duchenne muscular dystrophy             146         115   113   110                80   20100
9: HIV encephalitis                        45          27    27    28                 88   118

Table 4 Statistical significance test comparing KPCA with the other classifiers: whole data, PCA, t-test, PAM and Lasso

Kernel  Comparison   I          II         III        IV         V          VI         VII        VIII       IX
RBF     Whole data   1.0000     1.0000     0.9250     0.0015     0.5750     0.0400     0.0628     0.0200     0.0150
RBF     PCA          0.0050     0.0021     0.0003     0.0015     2.83E-08   5.00E-07   0.0250     0.0005     0.0140
RBF     t-test       1.0000     1.0000     1.0000     1.0000     6.50E-04   4.35E-04   0.0110     0.0005     1.0000
RBF     PAM          1.0000     6.10E-05   0.0002     0.0800     0.1450     0.0462     1.0000     0.0002     0.0015
RBF     Lasso        0.0278     1.0000     0.0001     0.0498     1.0000     0.0015     1.0000     0.00003    0.0200
lin     Whole data   1.0000     0.3095     1.0000     1.0000     1.0000     1.0000     1.0000     0.0009     1.0000
lin     PCA          7.00E-05   0.0011     1.30E-09   7.70E-09   1.28E-08   2.72E-05   6.15E-07   0.357      0.230
lin     t-test       1.0000     0.2150     0.7200     1.0000     0.0559     0.0443     1.0000     0.5450     1.0000
lin     PAM          0.0400     0.0003     0.0422     0.0015     0.0004     0.0001     0.0015     1.0000     0.0300
lin     Lasso        0.4950     0.4950     0.0049     2.12E-06   0.0005     0.0493     0.0025     1.0000     2.12E-06
poly    Whole data   1.0000     0.0100     1.0000     4.16E-11   0.00450    5.90E-08   7.70E-08   1.0000     1.0000
poly    PCA          0.0130     0.0003     4.35E-07   4.50E-05   7.70E-08   0.0040     3.28E-08   2.72E-05   5.00E-11
poly    t-test       1.0000     1.0000     0.0250     1.0000     0.0443     0.2100     1.0000     0.0005     1.0000
poly    PAM          0.1200     0.0005     0.0100     0.0400     0.0300     1.0000     0.0015     0.0200     0.0650
poly    Lasso        0.0100     1.0000     4.61E-05   1.76E-08   0.5000     1.0000     0.0006     0.0010     0.4350

P-values of the two-sided signed rank test are given.
p-value: False Discovery Rate (FDR) corrected.
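The p-values in Table 4 come from a two-sided signed rank test on paired per-iteration results, corrected for multiple testing with the False Discovery Rate. A small SciPy/statsmodels sketch of that procedure (an illustration only, not the authors' Matlab code):

from scipy.stats import wilcoxon
from statsmodels.stats.multitest import multipletests

def signed_rank_fdr(auc_kpca, auc_competitors):
    # auc_kpca: per-iteration test AUCs of KPCA + LS-SVM (length 30)
    # auc_competitors: dict mapping method name -> per-iteration AUCs (same length)
    names = list(auc_competitors)
    pvals = [wilcoxon(auc_kpca, auc_competitors[name], alternative="two-sided").pvalue
             for name in names]
    _, p_adjusted, _, _ = multipletests(pvals, method="fdr_bh")   # Benjamini-Hochberg FDR
    return dict(zip(names, p_adjusted))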

Figure 5 Plot for the selection of the optimal number of components for PCA. (a) Cervical cancer; (b) colon cancer. (Both panels plot the rescaled criterion against the number of components k.)

Figure 6 The prediction performance on colon cancer data with PCA + LS-SVM (RBF). The number of selected components depends on the chosen fraction of explained total variance; the 15 components selected by the proposed criterion for PCA correspond to 65% of the variance explained. (The panel plots averaged test AUC against the percentage of variance explained.)
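The conventional alternative varied in Figure 6, retaining PCs up to a chosen fraction of explained total variance, reduces to a cumulative sum of squared singular values; a minimal sketch:

import numpy as np

def components_for_variance(X, fraction=0.95):
    # Smallest number of principal components whose cumulative explained
    # variance reaches the requested fraction (the criterion varied in Figure 6).
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)          # singular values
    explained = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(explained, fraction) + 1)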

The optimum parameters obtained by the proposed strategy (see Eq. 6) for the same data set are h = 94.868 and k = 10. When looking at the test AUC in Table 5, both case studies applying the proposed strategy perform better than the method proposed by Pochet et al. [17], with less variability. In addition, the tuning method of Pochet et al. [17] appears to be quite time consuming, whereas the proposed model enjoys a distinctive advantage with its low time complexity to carry out the same process.

Comparison between the proposed criterion and other classifiers
In Table 4, we have highlighted the comparisons in which the proposed method was significantly better. When looking specifically at the performance of each of the discussed methods, we note that LS-SVM performance was slightly low on PCA. On data sets IV, VI and VII the proposed approach performs better than LS-SVM with the RBF kernel and LS-SVM with the linear kernel. The proposed approach is outperformed by t-test + LS-SVM on data sets V and VI, and by both PAM and Lasso on most of the data sets.

Table 5 Comparison of the performance of the proposed criterion with the method proposed by Pochet et al. [17]: averaged test AUC (std) over 30 iterations and execution time in minutes

Data set                 Proposed strategy: Test AUC, Time    Pochet et al. [17]: Test AUC, Time
High-grade glioma data   0.746 (0.071), 2                     0.704 (0.104), 38
Breast cancer II         0.6747 (0.1057), 4                   0.603 (0.157), 459

Discussions
The test AUC obtained by the different classifiers on the nine data sets does not lead to a common conclusion that one method outperforms the others. Instead, it shows that each of these methods has its own advantages in classification tasks. When considering classification problems without dimensionality reduction, the regularized LS-SVM classifier shows good performance on 50 percent of the data sets. Up till now, most microarray data sets are small in terms of the number of features and samples, but it is expected that these data sets might become larger or perhaps represent more complex classification problems in the future. In this situation, dimensionality reduction processes (feature selection and feature transformation) are essential steps for building stable, robust and interpretable classifiers on this kind of data.

The features selected by feature selection methods such as the t-test, PAM and Lasso vary widely for each random iteration. Moreover, the classification performance of these methods on each iteration depends on the number of features selected. Table 6 shows the range, i.e. the minimum and maximum number of features selected over 30 iterations. Although PAM is a user-friendly toolbox for gene selection and classification tasks, its performance depends heavily on the selected features. In addition, it is interesting that the Lasso selected only very small subsets of the actual data sets. But in the Lasso, the amount of shrinkage varies depending on the value of the tuning parameter, which is often determined by cross-validation [37]. The number of genes selected as outcome-predictive genes generally decreases as the value of the tuning parameter increases. The optimal value of the tuning parameter that maximizes the prediction accuracy is determined; however, the set of genes identified using the optimal value contains non-outcome-predictive genes (i.e., false positive genes) in many cases [9].
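The Lasso behaviour described above, where the amount of shrinkage and hence the number of selected genes is governed by a tuning parameter chosen through cross-validation, can be illustrated with scikit-learn. This sketch is not the implementation behind Table 6; it merely shows how the selected gene set is read off the non-zero coefficients (labels are assumed to be coded numerically, e.g. -1/+1).

import numpy as np
from sklearn.linear_model import LassoCV

def lasso_selected_genes(X_train, y_train, n_folds=10, seed=0):
    # Cross-validation picks the tuning parameter; the selected genes are the
    # features with non-zero coefficients at that value (cf. Table 6).
    model = LassoCV(cv=n_folds, random_state=seed).fit(X_train, y_train)
    selected = np.flatnonzero(model.coef_)
    return selected, model.alpha_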

Figure 7 LOOCV performance of the optimization algorithm of [17] on the high-grade glioma data set. (The panel plots the LOO-CV performance against the bandwidth h and the number of components k.)

The test AUC on all nine case studies shows that KPCA performs better than classical PCA, but the parameters of KPCA need to be optimized. Here we have used an LOOCV approach for the parameter selection (bandwidth and number of components) of KPCA. In the optimization algorithm proposed by Pochet et al. [17], the combination of KPCA with an RBF kernel followed by FDA tends to result in overfitting. The proposed parameter selection criterion for KPCA with the RBF kernel often results in test set performances (see Table 4) that are better than those of KPCA with a linear kernel, as reported in Pochet et al. This means that LOOCV in the proposed parameter selection criterion does not encounter overfitting for KPCA with the RBF kernel function. In addition, the optimization algorithm proposed by Pochet et al. is completely coupled with the subsequent classifier and thus appears to be very time-consuming.

In combination with classification methods, microarray data analysis can be useful to guide clinical management in cancer studies. In this study, several mathematical and statistical techniques were evaluated and compared in order to optimize the performance of clinical predictions based on microarray data. Considering the possibility of increasing size and complexity of microarray data sets in the future, dimensionality reduction and nonlinear techniques have their own significance. In many cases, in a specific application context the best feature set is still important (e.g. drug discovery). Considering the stability and performance (both accuracy and execution time) of classifiers, the proposed methodology has its own importance for predicting the classes of future samples of known disease cases.

Finally, this work could be extended further to uncover key features from biological data sets. In several studies, KPCA has been used to obtain biologically relevant features such as genes [38,39] or to detect the association between multiple SNPs and disease [40].

Table 6 Summary of the range (minimum to maximum) of features selected over 30 iterations

Data set                                   t-test (p < 0.05)  PAM        Lasso
1: Colon                                   197-323            15-373     8-36
2: Breast                                  993-1124           13-4718    7-87
3: Pancreatic                              2713-4855          3-1514     12-112
4: Cervical                                5858-6756          2-10692    5-67
5: Leukemia                                1089-2654          137-11453  2-69
6: Ovarian                                 7341-7841          34-278     62-132
7: Head and neck squamous cell carcinoma   307-831            1-12625    3-35
8: Duchenne muscular dystrophy             973-2031           129-22283  8-24
9: HIV encephalitis                        941-1422           1-12625    1-20
p-value: False Discovery Rate (FDR) corrected.

In all these cases, one needs to address the parameter optimization of KPCA. The available bandwidth selection techniques for KPCA are time-consuming with a high computational burden. This could be resolved with the proposed data-driven bandwidth selection criterion for KPCA.

Conclusion
The objective in class prediction with microarray data is an accurate classification of cancerous samples, which allows directed and more successful therapies. In this paper, we proposed a new data-driven bandwidth selection criterion for KPCA (which is a well defined preprocessing technique). In particular, we optimize the bandwidth and the number of components by maximizing the projected variance of KPCA. In addition, we compared several data preprocessing techniques prior to classification. In all the case studies, most of these preprocessing steps performed well on classification, with approximately similar performance. We observed that in feature selection methods the selected features vary widely on each iteration; hence it is difficult, even impossible, to design a stable class predictor for future samples with these methods. Experiments on nine data sets show that the proposed strategy provides a stable preprocessing algorithm for the classification of high dimensional data with good performance on test data.

The advantages of the proposed KPCA + LS-SVM classifier were presented in four aspects. First, we propose a data-driven bandwidth selection criterion for KPCA by tuning the optimum bandwidth and the number of principal components. Second, we illustrate that the performance of the proposed strategy is significantly better than an existing optimization algorithm for KPCA. Third, its classification performance is not sensitive to the number of selected genes, so the proposed method is more stable than others proposed in the literature. Fourth, it reduces the dimensionality of the data while keeping as much information as possible of the original data. This leads to computationally less expensive and more stable results for massive microarray classification.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
MT performed bandwidth selection, subsequent classification and drafted the paper. KDB participated in the design and implementation of the framework. KDB and BDM helped draft the manuscript. All authors read and approved the final manuscript.

Acknowledgements
BDM is full professor at the Katholieke Universiteit Leuven, Belgium. Research supported by Research Council KU Leuven: GOA/10/09 MaNet, KUL PFV/10/016 SymBioSys, PhD/Postdoc grants; Industrial Research fund (IOF): IOF/HB/13/027 Logic Insulin; Flemish Government: FWO: projects: G.0871.12N (Neural circuits), PhD/Postdoc grants; IWT: TBM-Logic Insulin (100793), TBM Rectal Cancer (100783), TBM IETA (130256), PhD/Postdoc grants; Hercules Stichting: Hercules 3: PacBio RS, Hercules 1: The C1 single-cell auto prep system, BioMark HD System and IFC controllers (Fluidigm) for single-cell analyses; iMinds Medical Information Technologies SBO 2014; VLK Stichting E. van der Schueren: rectal cancer; Federal Government: FOD: Cancer Plan 2012-2015 KPC-29-023 (prostate); COST: Action: BM1104: Mass Spectrometry Imaging. The scientific responsibility is assumed by its authors.

Author details
1 KU Leuven, Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics/iMinds Medical IT, Kasteelpark Arenberg 10, 3001 Leuven, Belgium. 2 Iowa State University, Department of Statistics & Computer Science, Ames, IA, USA.

Received: 17 April 2013   Accepted: 24 April 2014
Published: 10 May 2014

References
1. Roth V, Lange T: Bayesian class discovery in microarray data. IEEE Trans Biomed Eng 2004, 51:707–718.
2. Qiu P, Plevritis SK: Simultaneous class discovery and classification of microarray data using spectral analysis. J Comput Biol 2009, 16:935–944.
3. Somorjai RL, Dolenko B, Baumgartner R: Class prediction and discovery using gene microarray and proteomics mass spectroscopy data: curses, caveats, cautions. Bioinformatics 2003, 19:1484–1491.
4. Conde L, Mateos A, Herrero J, Dopazo J: Improved class prediction in DNA microarray gene expression data by unsupervised reduction of the dimensionality followed by supervised learning with a perceptron. J VLSI Signal Process 2003, 35(3):245–253.
5. Tibshirani RJ, Hastie TJ, Narasimhan B, Chu G: Diagnosis of multiple cancer types by shrunken centroids of gene expression. PNAS 2002, 99(10):6567–6572.
6. Chu F, Wang L: Application of support vector machine to cancer classification with microarray data. Int J Neural Syst 2005, 5:475–484.
7. Chun LH, Wen CL: Detecting differentially expressed genes in heterogeneous disease using half Student's t-test. Int J Epidemiol 2010, 10:1–8.
8. Tibshirani R: Regression shrinkage and selection via the lasso. J Roy Statist Soc B 1996, 58:267–288.
9. Kaneko S, Hirakawa A, Hamada C: Gene selection using a high-dimensional regression model with microarrays in cancer prognostic studies. Cancer Inform 2012, 11:29–39.
10. Fayyad U, Piatetsky-Shapiro G, Smyth P, Uthurusamy R: Advances in Knowledge Discovery and Data Mining. Cambridge, MA: AAAI/MIT Press; 1997.
11. Pechenizkiy M, Tsymbal A, Puuronen S: PCA-based feature transformation for classification: issues in medical diagnostics. In Proceedings of the 17th IEEE Symposium on Computer-Based Medical Systems. Washington, DC, USA: IEEE Computer Society; 2004:535–540.
12. Ng A, Jordan M, Weiss Y: On spectral clustering: analysis and an algorithm. In Advances in Neural Information Processing Systems 14; 2001:849–856.
13. Liu Z, Chen D, Bensmail H: Gene expression data classification with kernel principal component analysis. J Biomed Biotechnol 2005, 2:155–159.
14. Schölkopf B, Smola AJ, Müller KR: Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 1998, 10:1299–1319.
15. Suykens JAK, Van Gestel T, De Moor B: A support vector machine formulation to PCA analysis and its kernel version. IEEE Trans Neural Netw 2003, 14:447–450.
16. Suykens JAK, Van Gestel T, De Brabanter J, De Moor B, Vandewalle J: Least Squares Support Vector Machines. Singapore: World Scientific; 2002.
17. Pochet N, De Smet F, Suykens JAK, De Moor B: Systematic benchmarking of microarray data classification: assessing the role of nonlinearity and dimensionality reduction. Bioinformatics 2004, 20:3185–3195.

18. Bioinformatics research group [http://www.upo.es/eps/bigs/datasets.html]
19. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. PNAS 1999, 96(12):6745–6750.
20. Hess KR, Anderson K, Symmans WF, Valero V, Ibrahim N, Mejia JA, Booser D, Theriault RL, Buzdar AU, Dempsey PJ, Rouzier R, Sneige N, Ross JS, Vidaurre T, Gómez HL, Hortobagyi GN, Pusztai L: Pharmacogenomic predictor of sensitivity to preoperative chemotherapy with paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide in breast cancer. J Clin Oncol 2006, 24:4236–4244.
21. FDA-NCI clinical proteomics program databank [http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp]
22. Hingorani SR, Petricoin EF, Maitra A, Rajapakse V, King C, Jacobetz MA, Ross S, Conrads TP, Veenstra TD, Hitt BA, Kawaguchi Y, Johann D, Liotta LA, Crawford HC, Putt ME, Jacks T, Wright CV, Hruban RH, Lowy AM, Tuveson DA: Preinvasive and invasive ductal pancreatic cancer and its early detection in the mouse. Cancer Cell 2003, 4(6):437–50.
23. Wong YF, Selvanayagam ZE, Wei N, Porter J: Expression genomics of cervical cancer: molecular classification and prediction of radiotherapy response by DNA microarray. Clin Cancer Res 2003, 9(15):5486–92.
24. Stirewalt DL, Meshinchi S, Kopecky KJ, Fan W: Identification of genes with abnormal expression changes in acute myeloid leukemia. Genes Chromosomes Cancer 2008, 47(1):8–20.
25. Kuriakose MA, Chen WT, He ZM, Sikora AG: Selection and validation of differentially expressed genes in head and neck cancer. Cell Mol Life Sci 2004, 61(11):1372–83.
26. Pescatori M, Broccolini A, Minetti C, Bertini E: Gene expression profiling in the early phases of DMD: a constant molecular signature characterizes DMD muscle from early postnatal life throughout disease progression. FASEB J 2007, 21(4):1210–26.
27. Masliah E, Roberts ES, Langford D, Everall I: Patterns of gene dysregulation in the frontal cortex of patients with HIV encephalitis. J Neuroimmunol 2004, 157(1–2):163–75.
28. Nutt CL, Mani DR, Betensky RA, Tamayo P, Cairncross JG, Ladd U, Pohl C, Hartmann C, McLaughlin ME, Batchelor TT, Black PM, von Deimling A, Pomeroy SL, Golub TR, Louis DN: Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Res 2003, 63(7):1602–1607.
29. van't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AAM, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernard R, Friend SH: Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002, 415(6871):530–536.
30. Suykens JAK, Van Gestel T, Vandewalle J, De Moor B: A support vector machine formulation to PCA analysis and its kernel version. IEEE Trans Neural Netw 2003, 14(2):447–450.
31. Mercer J: Functions of positive and negative type and their connection with the theory of integral equations. Philos Trans R Soc A 1909, 209:415–446.
32. Bowman AW: An alternative method of cross-validation for the smoothing of density estimates. Biometrika 1984, 71:353–360.
33. Rudemo M: Empirical choice of histograms and kernel density estimators. Scand J Statist 1982, 9:65–78.
34. Alzate C, Suykens JAK: Kernel component analysis using an epsilon-insensitive robust loss function. IEEE Trans Neural Netw 2008, 9(19):1583–98.
35. Suykens JAK, Vandewalle J: Least squares support vector machine classifiers. Neural Process Lett 1999, 9:293–300.
36. De Brabanter K, Karsmakers P, Ojeda F, Alzate C, De Brabanter J, Pelckmans K, De Moor B, Vandewalle J, Suykens JAK: LS-SVMlab toolbox user's guide version 1.8. Internal Report ESAT-SISTA, K.U.Leuven (Leuven, Belgium) 2010:10–146.
37. Verweij PJ, Houwelingen HC: Cross-validation in survival analysis. Stat Med 1993, 12:2305–14.
38. Reverter F, Vegas E, Sánchez P: Mining gene expression profiles: an integrated implementation of kernel principal component analysis and singular value decomposition. Genomics Proteomics Bioinformatics 2010, 3(8):200–210.
39. Gao Q, He Y, Yuan Z, Zhao J, Zhang B, Xue F: Gene- or region-based association study via kernel principal component analysis. BMC Genetics 2011, 12(75):1–8.
40. Wu MC, Kraft P, Epstein MP, Taylor DM, Chanock SJ, Hunter DJ, Lin X: Powerful SNP-set analysis for case-control genome-wide association studies. Am J Hum Genet 2010, 6(86):929–942.

doi:10.1186/1471-2105-15-137
Cite this article as: Thomas et al.: New bandwidth selection criterion for Kernel PCA: Approach to dimensionality reduction and classification problems. BMC Bioinformatics 2014, 15:137.
