
Chemometrics and Intelligent Laboratory Systems 96 (2009) 27–33


Support vector machines (SVM) in near infrared (NIR) spectroscopy: Focus on parameters optimization and model interpretation ☆

Olivier Devos a,⁎, Cyril Ruckebusch a, Alexandra Durand a,b, Ludovic Duponchel a, Jean-Pierre Huvenne a

a Laboratoire de Spectrochimie Infrarouge et Raman (LASIR CNRS UMR 8516), Université des Sciences et Technologies de Lille (USTL), bât C5, 59655 Villeneuve d'Ascq, France
b Department of Analytical Chemistry and Pharmaceutical Technology (FABI), Vrije Universiteit Brussel, Laarbeeklaan 103, B-1090 Brussel, Belgium

Article history: Received 2 July 2008; Accepted 25 November 2008; Available online 6 December 2008

Keywords: Support vector machines; Near infrared spectroscopy; Classification; Model visualization; Interpretation

Abstract

Support vector machines (SVM) are learning algorithms that offer good generalization performance and can model complex non-linear boundaries through the use of adapted kernel functions. They have been introduced recently in chemometrics and have proven to be powerful for NIR spectra classification. One of the major drawbacks of SVM, however, is that training the model requires optimization of the regularization and kernel meta-parameters in order to control the risk of overfitting and the complexity of the boundary. Furthermore, the interpretation of SVM models remains difficult, and these tools are therefore often considered as black-box techniques.
We propose a methodological approach to guide the choice of the SVM parameters, based on a grid search that minimizes the classification error rate but also relies on the visualization of the number of support vectors (SVs). We also demonstrate the interest of visualizing the SVs in principal component subspaces to go deeper into the interpretation of the trained SVM. The proposed methods are applied to two NIR datasets: the first is a slightly non-linear 2-class problem and the second a more complex 3-class task. The optimized SVM models are quite parsimonious, relying on 8 and 35 support vectors respectively, and good classification performance is obtained (classification rates of 98.9% and 91% on the test sets, respectively).

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

NIR spectroscopy is widely used in the food [1] and pharmaceutical industries [2] for analysis and quality control. From the NIR spectrum, quantitative information can be obtained with regression models, or qualitative information with classification models. Many methods exist for sample classification from spectroscopic data. Very often the choice of the classification algorithm depends first on the structure of the data under study and is guided by the prediction performance obtained with the model. Support vector machines (SVM) are part of a new generation of learning algorithms used for classification and regression tasks [3,4]. SVM have been introduced in chemometrics only recently [5] and have been successfully applied to mid and near infrared classification tasks, such as material identification [6,7] and food discrimination [8–11]. In the case of classification, SVM simultaneously minimize the empirical classification error and maximize the inter-class geometric margin [12,13], leading to a unique solution. One of the major features of SVM models is that they can operate in a kernel-induced feature space, allowing non-linear modeling. Furthermore, good generalization performance is obtained even with relatively small datasets [3]. It has been reported that the classification models obtained are robust and less subject to the curse of dimensionality and over-fitting [14], properties that should be emphasized when dealing with spectroscopic data. The major difficulties with SVM are the parameters optimization and the lack of model interpretability. The tuning of the SVM parameters is a critical step, and they are classically optimized using an exhaustive search algorithm. Furthermore, SVM models are quite difficult to interpret and are often used as "black box" methods.

We propose here an approach in which the choice of the SVM meta-parameters relies on two criteria evaluated simultaneously over the same grid search: the cross validation classification rate and the number of support vectors (SVs). To further interpret the classification model, it may be interesting to focus on these SVs, which are particular data points for the SVM model, in terms of their number and repartition. We exemplify these two points on two near infrared (NIR) spectroscopy datasets presenting overlapping classes, the second one also being a multi-class task.

☆ Dedicated to Prof. Jean-Pierre Huvenne on the occasion of his retirement.
⁎ Corresponding author. Tel.: +33 320434748; fax: +33 320436755. E-mail address: [email protected] (O. Devos).

2. Support vector machines

The theory of SVM has been extensively described in the literature [13,15]. Therefore, only a brief description of the concept of SVM in the framework of classification is given here.


Considering a binary classification problem, the objective is to predict for all objects their membership of a class y ∈ {−1, +1}, from m-dimensional input data represented by a vector written x = (x1, x2, …, xm), with xi denoting the ith object of the training set. In the case of spectra, m represents the number of wavelengths. The class prediction first requires training on a data set containing the spectra corresponding to n objects or samples with known class, that is to say n {x, y} pairs.

2.1. Separable data

The idea of linear SVM is to search for the hyperplane which correctly separates the data while maximizing the shortest distances from this hyperplane to the closest training samples of each class (d+ for class {+1}, d− for class {−1}). The distance (d+ + d−) defines the margin associated with the separating hyperplane. This hyperplane is unique and is called the optimal hyperplane. Moreover, maximizing the margin ensures good generalization performance. For linear SVM, the classification rule is thus that all the training objects must be on the good side of the margin borders, as written in Eq. (1):

y_i (w \cdot x_i + b) \ge 1    (1)

where w and b are respectively the normal vector and the bias of the hyperplane. In order to define the margin hyperplanes, the points to consider are the ones which satisfy the equality in Eq. (1), as these points lie exactly on one of the margin borders. The distances from these points to the optimal hyperplane (w·x + b = 0) are therefore d+ = d− = 1/||w||, and the margin width is thus equal to 2/||w||. The optimization of the margin width is obtained by solving the constrained quadratic optimization problem presented in Eq. (2):

\min_{w, b} \; \tfrac{1}{2} \|w\|^2 \quad \text{subject to} \quad y_i (w \cdot x_i + b) - 1 \ge 0    (2)

According to optimization theory [16], the Lagrangian dual formulation, which expresses the importance of each example in the training set, can be used to solve this problem. The implementation of the dual form is more efficient and leads to expressing the optimal hyperplane as a linear combination of the training observations (Eq. (3)):

f(x) = w \cdot x + b = \sum_{i=1}^{n} y_i \alpha_i \, x_i \cdot x + b, \quad \text{with } \alpha_i \ge 0    (3)

The Lagrangian multipliers αi obtained after training are associated with two types of constraints: the constraints for which αi > 0 are said to be active and the constraints for which αi = 0 are inactive. Active constraints correspond to objects whose distance to the optimal hyperplane is exactly equal to half the margin width. These objects are called support vectors (SVs).
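To make the role of the support vectors concrete, the following minimal sketch rebuilds the decision value of Eq. (3) from the support vectors and their Lagrangian multipliers only, and compares it with the value returned by the library. This is an illustration under the assumption of scikit-learn and synthetic 2-D data, not the authors' MATLAB/LIBSVM code.

    # Illustration of Eq. (3): the decision value depends only on the support vectors.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.datasets import make_blobs

    X, y = make_blobs(n_samples=40, centers=2, random_state=0)  # separable toy data
    y = 2 * y - 1                                               # recode labels to {-1, +1}

    clf = SVC(kernel="linear", C=1e6).fit(X, y)                 # very large C ~ hard margin

    # f(x) = sum_i y_i * alpha_i * (x_i . x) + b ; dual_coef_ stores y_i * alpha_i for each SV.
    x_new = X[0]
    f_manual = float(np.dot(clf.dual_coef_[0], clf.support_vectors_ @ x_new) + clf.intercept_[0])
    f_library = float(clf.decision_function(x_new.reshape(1, -1))[0])
    print(f_manual, f_library)                                  # the two values agree
    print("number of SVs:", clf.support_vectors_.shape[0])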
2.2. Non-separable data

In the case of non-separable data, the linear SVM defined previously must be adapted to tolerate errors for some objects i. Their total amount is accounted for through the introduction of slack variables ξi [17] and should be minimized. Each ξi corresponds to the distance between the object i and the corresponding margin hyperplane; it is equal to 0 when the sample i is correctly assigned. Eq. (2) can be generalized to the non-separable case, as presented in Eq. (4):

\min_{w, b, \xi} \; \tfrac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad \xi_i + y_i (w \cdot x_i + b) - 1 \ge 0 \;\; \text{and} \;\; \xi_i \ge 0    (4)

The parameter C is a regularization meta-parameter that balances the penalization of errors and should be optimized beforehand. More precisely, it controls the tradeoff between two conflicting objectives: when C is small, margin maximization is emphasized, whereas when C is large, error minimization is predominant. The classifier function in the dual form is close to the one obtained in the linearly separable case (Eq. (3)) and is given in Eq. (5):

f(x) = w \cdot x + b = \sum_{i=1}^{n} y_i \alpha_i \, x_i \cdot x + b, \quad \text{with } 0 \le \alpha_i \le C    (5)

It should be noticed that a new constraint is added, as the αi values are now limited by the upper bound C; the SVs with αi = C are the ones on the bad side of the margin border.

2.3. Non-linear classification

The SVM classification methodology can be extended to non-linear classification. For this purpose the data are first projected into a high-dimensional feature space by means of a mapping function ϕ. A linear SVM is then applied in the feature space (Eq. (6)), where ideally the data are linearly separable, which corresponds to an implicit non-linear boundary in the input space:

f(x) = \sum_{i=1}^{n} y_i \alpha_i \, \phi(x_i) \cdot \phi(x) + b, \quad \text{with } 0 \le \alpha_i \le C    (6)

The dot product ϕ(xi)·ϕ(x) is often replaced by a so-called kernel function K [15]. Among existing kernel functions, the radial basis function (RBF) kernel is the most widely used, as almost any boundary shape can be obtained with this kernel and good performance is generally obtained. The RBF kernel function is given in Eq. (7), where G is related to the kernel width meta-parameter:

K(x_1, x_2) = \phi(x_1) \cdot \phi(x_2) = \exp\left( \frac{-\|x_1 - x_2\|^2}{2\sigma^2} \right) = \exp\left( -G \|x_1 - x_2\|^2 \right)    (7)

2.4. SVM parameters optimization

Optimization of the meta-parameters C (regularization parameter) and G (RBF kernel parameter) is the key step in SVM, as their combined values determine the boundary complexity and thus the classification performance. In order to perform this optimization, different methods such as grid search [18] or gradient descent algorithms [19,20] can be used. All these methods are usually based on the cross validation classification rate to evaluate the performance of the model and minimize the risk of overfitting.

In this study an exhaustive grid search is used, as this method is easy to use and performs well (fast) when only two parameters require tuning. Furthermore, this approach makes it possible to visualize directly the effect of both parameters and provides useful information. To illustrate the effects of the SVM parameters, we propose in Fig. 1 a simple 2-dimensional classification task where direct visualization of the margin borders and boundary is possible. From Fig. 1, it can be noticed that the data are almost linearly separable, which is confirmed by the satisfying classification performance of a linear discriminant analysis (LDA) (classification rate = 91%). Sixteen different SVM models are built on the basis of a small optimization grid with 16 points (C = 1, 10, 100, 1000 and G = 0.01, 0.1, 1, 10). The support vectors (SVs), boundaries and margins are presented in Fig. 1. The number of SVs and the leave-one-out cross validation classification rate are given in Table 1. Globally, when C is small the margin maximization is emphasized, leading to a large margin and a smooth boundary. The number of support vectors is therefore larger (recall that the SVs correspond to data points on the margin borders, inside the margin, or misclassified). When C is large the error minimization is emphasized, leading to a more complex boundary and a smaller margin. In this case, the misclassified objects are more important for the SVM model and the boundary seems to be "attracted" by these objects, as seen clearly for C = 1000 and G = 0.01 for example (Fig. 1).
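As a quick numerical check of Eq. (7), the following minimal sketch (an illustration only, using scikit-learn rather than the MATLAB/LIBSVM tools employed in this work) verifies that the RBF kernel value computed by hand matches a library implementation, with G playing the role of 1/(2σ²):

    # Eq. (7): the RBF kernel written explicitly and via scikit-learn.
    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel

    x1 = np.array([[0.2, 1.0, 3.1]])   # two illustrative 3-dimensional vectors
    x2 = np.array([[0.0, 1.5, 2.9]])
    G = 0.1                            # kernel width meta-parameter, G = 1 / (2 * sigma**2)

    k_manual = np.exp(-G * np.sum((x1 - x2) ** 2))   # exp(-G * ||x1 - x2||^2)
    k_library = rbf_kernel(x1, x2, gamma=G)[0, 0]    # scikit-learn's 'gamma' corresponds to G
    print(k_manual, k_library)                       # the two values coincide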

Fig. 1. Boundaries (black lines) and class margin borders (gray lines) of SVM models with a RBF kernel and different values of the kernel (G) and regularization (C) meta-parameters.
The support vectors (SVs) are marked in bold.

For small values of G (large kernel bandwidth) the boundary is almost linear. It can be noticed that when both parameters are small (C = 1 and G = 0.01) the boundary is comparable to the LDA one. As the value of G increases, the boundary becomes more complex. Regarding the number of SVs, it presents an optimum for intermediate G values; when G is large (G = 10) the number of support vectors is very high and the value of C has very little influence on the margin and boundary. When focusing on the results in Table 1, it can be seen that many parameter combinations lead to almost identical classification rates, and therefore the choice of the parameters may be difficult when the optimization is based on this criterion only.
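The kind of grid inspection summarized in Table 1 can be sketched as follows. This is illustrative only: scikit-learn on a synthetic, slightly non-linear 2-D set, whereas the paper uses MATLAB/LIBSVM on NIR spectra. For each (C, G) node, the leave-one-out classification rate and the number of SVs are recorded:

    # Small (C, G) grid with leave-one-out CV rate and number of support vectors.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.datasets import make_moons

    X, y = make_moons(n_samples=100, noise=0.25, random_state=1)   # slightly non-linear toy set

    for C in (1, 10, 100, 1000):
        for G in (0.01, 0.1, 1, 10):
            clf = SVC(kernel="rbf", C=C, gamma=G)
            loo_rate = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
            n_sv = clf.fit(X, y).support_vectors_.shape[0]
            print(f"C={C:<5} G={G:<5} LOO rate={loo_rate:.3f}  #SVs={n_sv}")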

3. Experimental

3.1. NIR datasets

Two NIR datasets have been used in this study. The first dataset contains 288 NIR spectra of manufactured samples. The classification task here is to assign the products to one of two classes depending on their composition. This dataset has been split randomly into two subsets: 192 samples in the calibration set and 96 in the validation set. The NIR spectra were measured on a FT-NIR BOMEM MB160 DiffusIR instrument (ABB) in reflectance mode from 3800 to 7500 cm−1 at 7.7 cm−1 apparent resolution (480 data points). The raw spectra are presented in Fig. 2.

Table 1
Number of support vectors (nSV) and leave-one-out cross validation classification rate (in %) for different SVM parameter settings (each cell: nSV / rate)

C \ G     0.01           0.1            1              10
1000      33 / 96.8%     23 / 98.0%     19 / 98.8%     67 / 98.8%
100       48 / 96.4%     31 / 97.2%     21 / 97.6%     68 / 97.6%
10        78 / 92.1%     48 / 95.3%     31 / 97.2%     71 / 96.8%
1         110 / 91.0%    80 / 92.5%     57 / 97.2%     90 / 96.8%

Fig. 2. Example of NIR spectra for the first class (continuous lines) and the second class (dotted lines) of dataset 1.

The second dataset contains 221 NIR spectra of different materials: 130 for the calibration subset and 91 for the validation subset. The purpose here is to determine a discrete physical property which can take 3 values. This 3-class problem is complex, as the membership of a class is not directly related to the composition of the material. The NIR spectra were measured on an XDS Rapid Content Analyzer instrument (FOSS) in reflectance mode from 1100 to 2500 nm at 0.5 nm apparent resolution (2800 data points). Prior to analysis, different pretreatments such as SNV, mean centering and derivatives have been tested, alone or in combination, and the one giving the best SVM classification performance has been used.
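As an illustration of one of the pretreatments mentioned above, the following sketch implements standard normal variate (SNV) scaling row-wise. This is a generic formulation, not the authors' pretreatment code, and the random matrix merely stands in for a NIR data matrix:

    # Standard normal variate: each spectrum is centered and scaled across wavelengths.
    import numpy as np

    def snv(spectra: np.ndarray) -> np.ndarray:
        """Apply SNV row-wise to a (n_samples, n_wavelengths) matrix of spectra."""
        mean = spectra.mean(axis=1, keepdims=True)
        std = spectra.std(axis=1, keepdims=True)
        return (spectra - mean) / std

    X_raw = np.random.default_rng(0).random((288, 480))    # stand-in for dataset 1 spectra
    X_snv = snv(X_raw)
    print(X_snv.mean(axis=1)[:3], X_snv.std(axis=1)[:3])   # ~0 and ~1 for each spectrum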

3.2. Software

All data analyses were performed with MATLAB 7 (The MathWorks Inc., Natick, MA). To perform the LDA and SVM, the LIBSVM toolbox [21] and the Statistical Pattern Recognition Toolbox for Matlab (STPRTool) [22] have been used.
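The analyses reported in this work rely on MATLAB, LIBSVM and STPRTool; the sketches shown in this text use open-source Python equivalents instead, as an assumption of convenience rather than a statement about the authors' implementation. A minimal LDA/SVM call sequence on placeholder data looks as follows:

    # Open-source stand-ins for the LDA and SVM tools used in the paper.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.svm import SVC

    rng = np.random.default_rng(5)
    X, y = rng.random((192, 480)), rng.integers(0, 2, 192)   # placeholder spectra and labels

    lda = LinearDiscriminantAnalysis().fit(X, y)
    svm = SVC(kernel="rbf", C=10.0, gamma=0.01).fit(X, y)
    print(lda.score(X, y), svm.score(X, y))                  # training classification rates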

4. Results and discussion

SVM optimization is performed on the full spectrum; principal component (PC) subspaces are only used for interpretation, in order to visualize the main data structure. In a second step, we propose to use the PC subspace in which the data structure is most clearly observable to represent the results of the SVM model.
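A PC-score projection of the kind used for the visualizations in this section can be sketched as follows (assuming scikit-learn's PCA, with a placeholder matrix in place of the pretreated spectra):

    # Project spectra onto the first principal components for visualization only.
    import numpy as np
    from sklearn.decomposition import PCA

    X_cal = np.random.default_rng(1).random((192, 480))   # stand-in for dataset 1 spectra

    pca = PCA(n_components=3)
    scores = pca.fit_transform(X_cal)                     # columns = PC1, PC2, PC3 scores
    print(scores.shape)                                   # (192, 3)
    print(pca.explained_variance_ratio_)                  # variance captured by each PC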

4.1. Dataset 1

The scores corresponding to the projection of the spectra on PC subspaces (Fig. 3) show that the separation is almost linear when looking at the PC1/PC2 subspace. The linear tendency is confirmed by the fact that a linear discriminant analysis (LDA) on the PC scores performs well, with a classification rate on the test set of 95.6%. For SVM classification, the effect of the parameters C and G is first visualized on an exhaustive grid search with 20 values for C (from 1 to 10^5) and 20 for G (from 10^−5 to 1). To evaluate the performance of the SVM models, the leave-one-out cross validation (CV) classification rate is used. The results are presented in Fig. 4A on decimal logarithmic scales. From this figure, the determination of the optimum values is difficult, as many different combinations of C and G lead to classification rates greater than 98%. The classification rate is rather insensitive to the value of C when G increases, which might be explained by the fact that the data are almost linearly separable and, as mentioned previously, the amount of errors expressed by the sum of the slack variables (see Eq. (4)) is very low, minimizing the effect of C. To go further in the interpretation, we focus on two particular models (models A and B) which give the same cross validation classification rate for very different parameter values. The SVM results and the parameter values for these two models are presented in Table 2. Both SVM models give a cross validation classification rate of 98.5% and a test rate of 98.9%, but it can be observed that model A relies on 73 SVs whereas model B requires only 8 SVs. The repartition of the SVs in the PC1/PC2 subspace is given in Fig. 5. In the case of model A, the 73 SVs are almost uniformly spread in this subspace. Given that SVs are samples on the margin border or on the bad side of the border, it is hardly possible to visualize what the projection of the boundary will be in this subspace.

Fig. 4. 20 × 20 optimization grid results for dataset 1 in terms of A) classification rate obtained for full cross validation, B) number of support vectors.

Table 2
Performance of the LDA and SVM classifiers for dataset 1

Model      Parameters              nSV a    CV % b    Test % c
LDA        –                       –        –         95.6
SVM (A)    C = 3.31, G = 0.55      73       98.5      98.9
SVM (B)    C = 2637, G = 0.0078    8        98.5      98.9

a Number of SVs.
b Leave-one-out cross validation classification rate.
c Classification rate on the test set.

Fig. 3. Dataset 1 projected on PC subspaces with the two classes indicated by 'o' and 'x'.
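The visualization of the SVs in a PC subspace, as in Fig. 5, can be sketched as follows. This is an illustration with random stand-in data and scikit-learn/matplotlib; the parameters of model B are reused only as an example:

    # Highlight the support vectors of a trained RBF-SVM in the PC1/PC2 score plot.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC

    rng = np.random.default_rng(2)
    X = rng.random((192, 480))                    # stand-in for pretreated spectra
    y = rng.integers(0, 2, size=192)              # stand-in class labels

    clf = SVC(kernel="rbf", C=2637, gamma=0.0078).fit(X, y)   # parameters of "model B"
    scores = PCA(n_components=2).fit_transform(X)

    plt.scatter(scores[:, 0], scores[:, 1], c=y, marker="o", alpha=0.5)
    plt.scatter(scores[clf.support_, 0], scores[clf.support_, 1],
                facecolors="none", edgecolors="k", s=120, label="SVs")
    plt.xlabel("PC1"); plt.ylabel("PC2"); plt.legend(); plt.show()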

In the case of model B, the 8 SVs are localized close to the border between the two classes in the PC1/PC2 subspace, and consequently it can be imagined that the boundary lies somewhere between these SVs. Despite similar classification rates, model B should be preferred as it is more parsimonious and, as we have seen, easier to interpret. In fact, the reason for selecting the most parsimonious model is to minimize the potential effects of overfitting, which is likely to be the situation for model A but cannot be assessed from the cross validation results only. In the following, we thus insist on optimizing the parameters in terms of both the classification rate and the number of SVs.

The same grid can be used to visualize the number of support vectors associated with the models built for each node of the grid (each C and G). The results are presented in Fig. 4B. As can be seen, the SVM models with the best CV classification rate correspond to models with a number of SVs spanning from 80 down to 8. It can be noticed that, for the dataset studied here, the number of support vectors is much more influenced by the parameter values than the classification rate: the zone corresponding to the minimum number of SVs is small. Finally, the SVM model with C = 2637 and G = 0.0078, which corresponds to model B, gives the best classification performance for the smallest number of SVs.
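The two-criterion choice advocated here can be sketched as follows (scikit-learn on synthetic data, with a coarser grid than the 20 × 20 one used in the paper): among the grid nodes tied for the best leave-one-out rate, the (C, G) pair giving the most parsimonious model is retained.

    # Retain, among the best-scoring grid nodes, the model with the fewest support vectors.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.datasets import make_moons

    X, y = make_moons(n_samples=100, noise=0.25, random_state=1)
    results = []
    for C in np.logspace(0, 5, 6):               # the paper scans 20 values from 1 to 1e5
        for G in np.logspace(-5, 0, 6):          # and 20 values from 1e-5 to 1
            clf = SVC(kernel="rbf", C=C, gamma=G)
            rate = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
            n_sv = clf.fit(X, y).support_vectors_.shape[0]
            results.append((rate, n_sv, C, G))

    best_rate = max(r[0] for r in results)
    candidates = [r for r in results if r[0] >= best_rate - 1e-12]   # nodes tied for best rate
    rate, n_sv, C, G = min(candidates, key=lambda r: r[1])           # most parsimonious one
    print(f"retained model: C={C:.3g}, G={G:.3g}, LOO rate={rate:.3f}, #SVs={n_sv}")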

4.2. Dataset 2

The classification problem for dataset 2 is more complex, as can be observed from the visualization of the data scores on different PC subspaces (Fig. 6), where overlapping classes are observed and no clear (linear) separation can be drawn. Among the different PC subspaces, the projection on PC2/PC3 will be used for interpretation purposes, because better class separations are observed compared to the other subspaces.

Fig. 5. Dataset 1 projected on the PC1/PC2 subspace with the two classes indicated by 'o' and 'x'. The SVs are marked in bold and correspond to the SVM model with RBF kernel and A) G = 0.55 and C = 3.31, B) G = 0.0078 and C = 2637.

Fig. 6. Dataset 2 projected on PC subspaces with the three classes indicated by 'o', 'x' and '+'.

Fig. 7. 20 × 20 optimization grid results for dataset 2 in terms of A) classification rate obtained for full cross validation, B) number of support vectors.

Table 3
Performance of the SVM classifier for dataset 2

Model    Parameters                nSV a    CV % b    Test % c
SVM      C = 16236, G = 0.00069    35       88.5      91

a Number of SVs.
b Leave-one-out cross validation classification rate.
c Classification rate on the test set.

Table 4
Lagrangian multipliers for the SVs marked in Fig. 8

Sample    α
A1        582
A2        12,286
B1        8856
C1        16,236
A multiclass SVM model has been trained on the full-spectrum dataset. As discussed in the literature, different strategies can be adopted for multiclass SVM. The one chosen here is a "one-against-one" algorithm: for a k-class problem, the first step is to build k(k − 1)/2 two-class classifiers. A "max wins" voting strategy, performing a majority vote to assign the winning class, is then used to make the final decision [23]. It should be noticed that identical parameters (C, G) are used for all the binary classification tasks. These parameters are optimized in the same manner as previously.
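The one-against-one strategy can be sketched with scikit-learn's SVC, which trains the k(k − 1)/2 binary classifiers and applies the majority vote internally. This is an illustration on synthetic 3-class data; only the retained parameter values are taken from this work.

    # One-against-one multiclass SVM: k(k-1)/2 binary sub-problems with shared (C, G).
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=221, n_features=50, n_informative=6,
                               n_classes=3, n_clusters_per_class=1, random_state=4)

    clf = SVC(kernel="rbf", C=16236, gamma=0.00069,      # parameters retained for dataset 2
              decision_function_shape="ovo").fit(X, y)

    k = len(clf.classes_)
    print("binary sub-problems:", k * (k - 1) // 2)                        # 3 classifiers for k = 3
    print("ovo decision values shape:", clf.decision_function(X[:5]).shape)  # (5, 3)
    print("total number of SVs:", clf.support_vectors_.shape[0])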
Fig. 7 presents the optimization grids for 20 values of C and G in terms of cross validation classification rate and number of support vectors. These two figures match quite well for dataset 2: the cross validation results and the number of SVs present an optimum for the same parameter range. Furthermore, it can be noticed that C and G have a larger effect on the two criteria than in the case of dataset 1, which was an almost linearly separable case. The effect of parameter C might be explained by the amount of errors (∑ξi in Eq. (4)), which is probably large here due to the overlapping classes, and that of parameter G by the complex boundary required for good classification performance, as the data are not linearly separable. The parameters and results obtained are given in Table 3. Finally, the retained parameters (C = 16236, G = 0.00069) are the ones leading to the SVM model with the smallest number of support vectors (35) and a good classification performance (88.5% for cross validation and 91% on the test set). The 35 data points corresponding to the SVs are highlighted in the PC2/PC3 subspace in Fig. 8. Even if this is not the original space used for building the SVM model, it can be noticed that most of the SVs are localized on the boundary between two classes.

Once the model is optimized, the SVM decision function can be expressed, according to Eq. (6), as a weighted sum of 35 Gaussians (35 SVs). Only the data points with a non-null Lagrangian coefficient αi correspond to SVs. Four SVs are marked in Fig. 8: A1 and A2 are samples of class 1, B1 of class 2 and C1 of class 3. For these data points the corresponding Lagrangian coefficients are presented in Table 4. The magnitude of αi is an indication of the importance of point i for the boundary. A1 belongs to a subset of samples of the same class without samples of the other classes nearby. As a consequence, A1 has a weak influence on the boundary compared to the other SVs, and therefore the coefficient for this point is small. A2 and B1, which belong to two different classes and face each other in this subspace, are obviously key points for the boundary computation. As a consequence, their coefficients are quite large but remain inferior to C: these two SVs lie on their respective margin borders. For atypical samples such as C1, the coefficient is equal to C, which means that they are on the bad side of the margin border. Looking further at the localization of C1 in this subspace, it can be questioned whether this sample is an outlier or not, and a clear decision can be taken from the C1 spectrum, since SVs are samples of the training set.

Fig. 8. Dataset 2 projected on the PC2/PC3 subspace with the three classes indicated by 'o', 'x' and '+'. The SVs corresponding to the SVM model with a RBF kernel, G = 0.00069 and C = 16236, are marked in bold.
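The kind of coefficient inspection reported in Table 4 can be sketched as follows (synthetic stand-in data, scikit-learn): small |α| flags weakly influential SVs, while |α| reaching the bound C flags SVs lying on the bad side of their margin border.

    # Inspect the dual coefficients of a multiclass SVM, in the spirit of Table 4.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=130, n_features=20, n_informative=5,
                               n_classes=3, n_clusters_per_class=1, random_state=3)

    C = 100.0
    clf = SVC(kernel="rbf", C=C, gamma=0.01, decision_function_shape="ovo").fit(X, y)

    # dual_coef_ stores the signed alpha values of each SV in the binary sub-problems
    # it takes part in (n_classes - 1 rows); the largest magnitude over rows is used
    # here as a rough importance score per SV.
    alpha_mag = np.abs(clf.dual_coef_).max(axis=0)
    for idx, a in zip(clf.support_, alpha_mag):
        flag = "at bound (alpha = C)" if np.isclose(a, C) else ""
        print(f"sample {idx:3d}  |alpha| = {a:10.3f}  {flag}")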
5. Conclusions

In the present study, the effects of the regularization and kernel-width meta-parameters on the cross validation classification rate and on the number of support vectors have been visualized. The importance of taking both criteria into account for parameter optimization is highlighted. This also shows the interest of implementing a two-criteria optimization algorithm to obtain parsimonious models, which will be the purpose of future work. The identification of the SVs, the only samples finally required to compute the boundary function, is useful for interpretation. For this purpose, the projection of the dataset on principal component subspaces, coupled with the values of the Lagrangian coefficients, gives valuable information on the importance of each sample for the classification task and allows the identification of atypical samples.

References

[1] B.G. Osborne, T. Feran, P.H. Hindle, Practical NIR Spectroscopy with Applications in Food and Beverage Analysis, 2nd edn., Prentice Hall / Pearson Education, 1993.
[2] Y. Roggo, P. Chalus, L. Maurer, C. Lema-Martinez, A. Edmond, N. Jent, A review of near infrared spectroscopy and chemometrics in pharmaceutical technologies, J. Pharm. Biomed. Anal. 44 (2007) 683–700.
[3] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.
[4] V. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, 1998.
[5] Y. Xu, S. Zomer, R.G. Brereton, Support vector machines: a recent method for classification in chemometrics, Crit. Rev. Anal. Chem. 36 (2006) 177–188.
[6] A.I. Belousov, S.A. Verzakov, J. Von Frese, Applicational aspects of support vector machines, J. Chemom. 16 (2002) 482–489.
[7] Y. Langeron, M. Doussot, D.J. Hewson, J. Duchene, Classifying NIR spectra of textile products with kernel methods, Eng. Appl. Artif. Intell. 20 (2007) 415–427.
[8] J.A. Fernández Pierna, V. Baeten, P. Dardenne, Screening of compound feeds using NIR hyperspectral data, Chemometr. Intell. Lab. Syst. 84 (2006) 114–118.
[9] M.J. De La Haba, J.A. Fernández Pierna, O. Fumière, A. Garrido-Varo, J.E. Guerrero, C.D. Pérez-Marín, P. Dardenne, V. Baeten, Discrimination of fish bones from other animal bones in the sedimented fraction of compound feeds by near infrared microscopy, J. Near Infrared Spectrosc. 15 (2007) 81–88.
[10] S. Caetano, B. Ustun, S. Hennessy, J. Smeyers-Verbeke, W. Melssen, G. Downey, L. Buydens, Y. Vander Heyden, Geographical classification of olive oils by the application of CART and SVM to their FT-IR, J. Chemom. 21 (2007) 324–334.
[11] Q. Chen, J. Zhao, C.H. Fang, D. Wang, Feasibility study on identification of green, black and Oolong teas using near-infrared reflectance spectroscopy based on support vector machine (SVM), Spectrochim. Acta A 66 (2007) 568–574.
[12] C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Disc. 2 (1998) 121–167.
[13] N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press, Cambridge, 2000.
[14] A.I. Belousov, S.A. Verzakov, J. Von Frese, A flexible classification approach with optimal generalisation performance: support vector machines, Chemom. Intell. Lab. Syst. 64 (2002) 15–25.
[15] B. Schölkopf, A. Smola, Learning with Kernels, MIT Press, Cambridge, 2002.

[16] M.S. Bazaraa, H.D. Sherali, C.M. Shetty, Nonlinear Programming: Theory and Algorithms, 2nd edn., Wiley, New York, 1992.
[17] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (1995) 273–297.
[18] C.-W. Hsu, C.-C. Chang, C.-J. Lin, A practical guide to support vector classification, 2007, http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf.
[19] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.
[20] O. Chapelle, V. Vapnik, O. Bousquet, S. Mukherjee, Choosing multiple parameters for support vector machines, Mach. Learn. 46 (2002) 131–159.
[21] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, 2001, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[22] V. Franc, V. Hlavac, Statistical Pattern Recognition Toolbox for Matlab, 2004, software available at http://cmp.felk.cvut.cz/cmp/software/stprtool/index.html.
[23] C.-W. Hsu, C.-J. Lin, A comparison of methods for multi-class support vector machines, IEEE Trans. Neural Netw. 13 (2002) 415–425.
