Chemometrics and Intelligent Laboratory Systems 49 (1999) 19–31
www.elsevier.com/locate/chemometrics

Multivariate strategies for classification based on NIR-spectra – with application to mayonnaise

Ulf G. Indahl *, Narinder S. Sahni 1, Bente Kirkhus 2, Tormod Næs
MATFORSK, Osloveien 1, N-1430 Ås, Norway
Received 26 January 1998; accepted 28 March 1999

Abstract

The goal of the presented study is two-fold. First, we want to emphasize the power of Near Infrared Reflectance (NIR) spectroscopy for discrimination between mayonnaise samples containing different vegetable oils. Secondly, we want to use our data to compare the performances of different classification procedures. The NIR spectra with 351 variables correspond to equally spaced wavelengths in the 1100–2500 nm area. Feature extraction both by automatic wavelength-selection and by projection onto principal components (PCs) is discussed. The discriminant methods considered are linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) and regression with categorical {0,1}-responses. A dataset containing 162 spectra of mayonnaise samples based on six different vegetable oils is analyzed. By LDA with authentic cross-validation (PC-models re-estimated for each cross-validation segment), only one sample was misclassified. Classification by allocating a sample according to the largest fitted value of a linear regression (Discriminant-Partial least squares (DPLS) or Discriminant-Principal components regression (DPCR)) is demonstrated to be sub-optimal compared to LDA of the corresponding PLS- or PCR-scores. QDA significantly outperforms LDA for projections of the data onto subspaces of moderate size (scores of 7–9 PCs). Two automatic variable-selection procedures choose 16 and 26 wavelengths (variables), respectively, from the spectra. Based on the selected wavelengths, LDA gives considerably better classification than the regression approach. By reporting the performances of several feature extraction techniques in tandem with three of the most common classification methods, we hope that the reader will notice two relevant aspects: (1) By using the DPLS and DPCR (classification by 'dummy' regressions) one is exposed to a significant risk of obtaining sub-optimal classification results; (2) The automatic wavelength selections may give valuable information about what is actually causing a successful discrimination. Such knowledge can, for instance, be used to select the most suited filters for online applications of NIR. Besides demonstrating different classification strategies, our study clearly shows that classification methods with NIR spectra can be used to discriminate between mayonnaise samples of different oil types and fatty acid composition. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Discriminant analysis; Principal components; Automatic variable selection; NIR; Vegetable oils

* Corresponding author. Tel.: +47-22-47-67-45; Fax: +47-22-47-67-95; E-mail: [email protected]
1 Mills DA, P.O. Box 4644, Sofienberg, N-0506 Oslo, Norway.
2 Mills DA, P.O. Box 4644, Sofienberg, N-0506 Oslo, Norway.

0169-7439/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved.
PII: S0169-7439(99)00023-4
U.G. Indahl et al. / Chemometrics and Intelligent Laboratory Systems 49 (1999) 19–31

1. Introduction

Near Infrared Reflectance (NIR) spectroscopy and similar techniques are primarily used for (rapid) detection of chemical compounds. During the last decade, a number of applications as well as theoretical approaches have been reported where the main goal is discrimination between samples belonging to one of several distinct groups based on spectral properties. A number of theoretical approaches deal with regularization strategies. Regularization is necessary to overcome the problems with unstable and singular covariance-estimates due to a large number of variables (possibly highly correlated) together with a moderate number of samples. One such strategy is to project the data onto a smaller subspace spanned by a manageable number of principal components (PCs). A standard application of linear discriminant analysis (LDA) or quadratic discriminant analysis (QDA) can then be executed for the projected data. There are also other methods suggested in the literature. In Ref. [1], Friedman introduces a strategy called Regularized Discriminant Analysis (RDA) for computing class-dependent but biased covariance-estimates resulting in a compromise between LDA and QDA. In Refs. [2,3], the method DASCO (discriminant analysis with shrunken covariances) is introduced and suggested as a substitute for the well-known method SIMCA [4]. Both methods allow individual class-specific PC-models and a shrinking of the residual-spaces, but the allocation rules of the two methods differ. In Ref. [5], a strategy for regularizing the covariance-estimate of LDA in high-dimensional situations is considered. A stepwise procedure for selection of relevant PC scores in the case of initial dimension reduction is demonstrated in Ref. [6]. Other approaches of feature-extraction including estimation of Fourier- and Wavelet-coefficients, as well as variable selection strategies, can be found in Refs. [7,8]. Variable selection in tandem with Neural Network (NN) classifiers is described in Ref. [9]. Practical examples presented in several of the references above demonstrate successful applications of NIR spectroscopy to a number of scientific areas.

NIR spectroscopy has many applications in the food industry and food analysis [10] and it is used for quality control of raw materials and final products. A large number of NIR- and IR-applications are related to quality and authenticity studies of food products. Evans et al. [11] give an application with orange juice, Ellekjær et al. [12] focus on determination of sodium chloride content in sausages, and tenderness of meat is classified in Ref. [13]. In Refs. [14,15], NIR spectroscopy is successfully used to determine the correct origin of basmati rice. Closely related to the application presented in this paper is the work on NIR- and IR-based discrimination between different vegetable oils [16–26].

In the present paper, we address the problem of discriminating between mayonnaise samples based on six different vegetable oils, where the measured NIR spectra yield 351 equally spaced and highly correlated variables corresponding to wavelengths in the 1100–2500 nm range. Various classification techniques are applied to NIR spectra of full fat mayonnaise containing 70% to 80% oil.

Mayonnaise is an emulsion with oil droplets dispersed in water. Since the NIR spectra reflect both absorption characteristics as well as scattering from droplets and other particles, we had to consider interference from other components than the oil itself. Therefore, the six vegetable oils, in pure form, were also analyzed by NIR and Gas Liquid Chromatography (GLC). The results of the two discriminant-analyses will be compared.

Studies have shown that NIR spectra of vegetable oils contain information about their fatty acid composition. Absorption bands around 1600–1800 nm and 2100–2200 nm are assigned to straight carbon chains and cis double-bonds. Attempts have previously been made to classify various vegetable oils based on full-range NIR spectral data [21].

The goal of our study is two-fold. First, we want to emphasize the discriminative power of NIR spectroscopy for the chosen problem. Secondly, we want to discuss and compare the performance of several different discrimination strategies applied to our data and to point out some useful relationships between them.

We will focus on the straightforward approach of projecting the data onto suitable subspaces found by a principal component analysis (PCA) and multi-response partial least squares (PLS2) modelling. The latter approach involves a regression with categorical {0,1}-responses. We also consider variants of a method for automatic variable selection based on estimation of the between-groups variance over within-groups variance ratio (B/W-ratio) for each spectral wavelength of the dataset.

2. Materials and methods

2.1. Multivariate classification

Similar to multivariate regression problems, the goal of a classification-procedure is to extract reliable information from potentially high-dimensional and collinear X-data. This often requires careful shrinking, transformation, regularization or subset-selection from the original variables (wavelengths) before the standard statistical techniques can be successfully applied.

A classification problem can be described as follows: n measurements $x_1, \ldots, x_n$, each of p variables, are arranged in a data matrix X. Group membership is given by a vector $Y \in \{1, 2, \ldots, K\}^n$, where K is the number of groups. From the dataset (X, Y) a classification-rule $CR: \mathbb{R}^p \to \{1, 2, \ldots, K\}$ is designed in such a way that the correct class of a future anonymous sample $x \in \mathbb{R}^p$ is predicted with high probability. Good references on the topic are Ripley [27], Duda and Hart [29], Mardia et al. [30], McLachlan [31], and Bishop [32].

By assuming the different groups generated by K distinct probability-densities $f_k$ for $k \in \{1, \ldots, K\}$ with prior probabilities $\pi_k$, a straightforward argument minimizing the risk of misclassification leads to the rule: $CR(x) = \hat{k}$, where

$$f_{\hat{k}}(x)\,\pi_{\hat{k}} = \max_{k=1,\ldots,K} \{ f_k(x)\,\pi_k \}, \tag{1}$$

i.e., allocation of x to the class of maximal probability score. If the densities involved are assumed multi-normal, i.e.,

$$f_k(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma_k|^{1/2}} \exp\left( -\tfrac{1}{2} (x - \mu_k)^t \Sigma_k^{-1} (x - \mu_k) \right), \tag{2}$$

with prior probabilities $\pi_k$ for $k \in \{1, \ldots, K\}$, the expression in Eq. (1) can be replaced by:

$$d_{\hat{k}}(x) = \min_{k=1,\ldots,K} d_k(x), \tag{3}$$

simply by taking the logarithm of the densities and deleting common terms. When different covariance matrices of each class are assumed, Eq. (3) is applied with:

$$d_k(x) = (x - \mu_k)^t \Sigma_k^{-1} (x - \mu_k) + \ln|\Sigma_k| - 2\ln(\pi_k). \tag{4}$$

Eqs. (3) and (4) together lead to quadratic boundaries between the classes, and the resulting classification rule is known as QDA. We will only assume equal priors of the classes and consequently cancel the last term of Eq. (4). If also equal covariance matrices of the classes are assumed, i.e., $\Sigma_k = \Sigma$, $\forall k \in \{1, \ldots, K\}$, we get a global metric on the variable space, $d(x, y) = (x - y)^t \Sigma^{-1} (x - y)$, with $d_k(x) = (x - \mu_k)^t \Sigma^{-1} (x - \mu_k)$. By simple algebraic manipulations, one can deduce that the quadratic terms of x cancel when these expressions are compared between classes. This leads to linear decision boundaries between the classes. The resulting classification rule is known as LDA.

In most applications of LDA and QDA the class-centers $\mu_k$ and the covariance matrices $\Sigma_k$ are unknown and usually replaced by:

$$\hat{\mu}_k = \frac{1}{n_k} \sum_{\mathrm{class}(i)=k} x_i \tag{5}$$

where $n_k$ is the number of observations in group number k, and:

$$\hat{\Sigma}_k = \frac{1}{n_k} \sum_{\mathrm{class}(i)=k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^t, \tag{6}$$

corresponding to the maximum likelihood (ML) estimates of $\mu_k$ and $\Sigma_k$.

2.1.1. Collinearity and singularity

When dealing with collinear data or more variables than observations, the resulting ML-estimates of covariance matrices will suffer from serious instability or even singularity. This is easily seen by expressing the spectral decomposition of the covariance matrices as:

$$\Sigma_k = \sum_{i=1}^{p} e_{ik}\, z_{ik} z_{ik}^t, \tag{7}$$

where $e_{ik}$ is the ith eigenvalue and $z_{ik}$ the corresponding eigenvector of $\Sigma_k$. The inverse in this representation equals

$$\Sigma_k^{-1} = \sum_{i=1}^{p} \frac{z_{ik} z_{ik}^t}{e_{ik}}, \tag{8}$$
and Eq. (4) takes the form

$$d_k(x) = \sum_{i=1}^{p} \frac{\left[ z_{ik}^t (x - \mu_k) \right]^2}{e_{ik}} + \sum_{i=1}^{p} \ln(e_{ik}) - 2\ln(\pi_k). \tag{9}$$

Eq. (9) is heavily influenced by the smaller eigenvalues and the directions corresponding to their eigenvectors. According to Friedman [1], the eigenvalues of the empirical estimates $\hat{\Sigma}_k$ are biased such that small eigenvalues are underestimated and large eigenvalues are overestimated. When the $n_k$ values are greater than but close to p, the estimated $\hat{\Sigma}_k$ values will consequently be unstable, and when $n_k < p$ the smallest eigenvalues will be equal to 0 and imply singular estimates. In the situation of small groups (compared to the number of variables) LDA based on the pooled covariance-estimate

$$\hat{\Sigma} = \frac{1}{n} \sum_{k=1}^{K} n_k \hat{\Sigma}_k, \tag{10}$$

($n = \sum n_k$) will often outperform QDA because one is better off with a more stable estimate of the common covariance-structure than unstable individual estimates. Sometimes, a better alternative is to look for intermediate solutions where the class-dependent covariance matrices are modified according to certain rules. Such strategies are discussed in Refs. [1,33,34].

2.1.2. Feature extraction by orthogonal decompositions

In the situation with highly multivariate and collinear data such as NIR spectra, and only a moderate number of samples, even the pooled covariance-estimate of LDA may become unstable or singular. A reasonable solution to the problem is to reduce the dimension of the data by decomposing and projecting the samples onto a smaller number of orthogonal components found by PCA, PLS or some other suitable strategy. LDA or QDA can then be applied to the reduced data without stability problems. With some fortune, such procedures can give a drastic reduction of dimension without causing significant loss of discriminative information.

A useful and interesting part of discriminant analysis is the computation of canonical variates, also known as Fisher's linear discriminants. The canonical variates are the solution-vectors z maximizing the B/W-ratio criterion $z^t B z / z^t W z$ under orthogonality constraints. These vectors are found by solving the generalized eigenvalue problem $Bz = \lambda Wz$, where $\lambda$ is a generalized eigenvalue, $W = n\hat{\Sigma}$ the within sum of squares matrix and $B = \sum_{k=1}^{K} (\hat{\mu}_k - \hat{\mu})(\hat{\mu}_k - \hat{\mu})^t$ the between sum of squares matrix ($\hat{\mu} = (1/n) \sum n_k \hat{\mu}_k$). If W is non-singular, z will be an ordinary eigenvector of the expression $W^{-1}B$ (see Mardia et al. [30]). It is proved in Ref. [31] that a direct application of LDA to the original data yields the same results as LDA applied to a projection of the data onto the space spanned by the canonical variates. Plots of the data projected onto two or three of the first canonical variates often give a useful graphical representation of the relationship between the different groups (see Fig. 3).

In the case of multivariate and highly correlated variables typical for spectral data, it is, however, important to do some kind of regularization of the dataset before estimation of canonical variates. Otherwise, both the estimated classification rule and the corresponding graphical representation can be strongly misleading.

2.1.3. Feature extraction by variable selection

An alternative to projecting the data onto PCs is to select a subset of the original variables (wavelengths). The ideal situation is, of course, to identify the variables having the significant discriminative power.

One possible strategy to obtain focus on variables contributing to correct classification is to compute the one-dimensional B/W-ratios separately for each wavelength [8] of the spectra to obtain the scatter-curve (if the full multivariate versions of W and B are already computed, the scatter-curve can be obtained by element-wise division of the diagonal of the matrix B with the diagonal of the matrix W). Instead of exclusively selecting the wavelengths corresponding to the largest values, we suggest selection of variables corresponding to the local maxima of this spectrum (Fig. 4). This idea is sensible considering the high correlation present between adjacent wavelengths of a NIR spectrum. Even if two adjacent variables yield a high ratio, they both supply essentially the same discriminative information. This approach is equivalent to computation of the F-statistic
for each wavelength in a One-way ANOVA where the grouping corresponds to the different oils.

We also consider an extended hybrid of this procedure based on simultaneous selection of local maxima of the scatter-curve and the related variance-curve resulting from estimation of variable-wise variances (Fig. 4). The variance-curve is identical to the diagonal of the total covariance matrix $\hat{\Sigma}_T = (B + W)/n$ of the entire dataset.

2.1.4. Classification by regression

Regression with {0,1}-responses is a strategy for classifying multivariate data apparently different from the probabilistic approaches leading to LDA and QDA. A response matrix Y of n rows (corresponding to the observations of X) and K columns (corresponding to the number of classes) is designed as follows: for the ith row of Y (i = 1, ..., n), put a 1 in the kth column and a 0 in all the other columns if the corresponding ith object of X belongs to class k. By regressing Y onto X, classification of a new sample x is done by selecting the group corresponding to the largest component of the fitted $\hat{y}(x) = (\hat{y}_1(x), \ldots, \hat{y}_K(x))$. Ripley [27] gives an algebraic argument explaining the regression approach as a variant of LDA with equal prior probabilities, where the total covariance matrix is used in place of the within-groups covariance matrix. When the model assumptions of LDA are appropriate, the regression approach is seldom superior. For K = 2 (a two-class problem), it can be shown that LDA and the regression approach are equivalent (see Ref. [27]). However, as explained both by Ripley and Hastie et al. [28], for arbitrarily many classes K, LDA applied with the fitted $\hat{Y}$-values of a linear regression always yields a classification equivalent to LDA applied to the original x-variables. Furthermore, this is equivalent to an application of LDA when the data are projected onto the space spanned by the canonical variates. The equivalence is a consequence of the fact that the fitted values span the same space as the canonical variates, and can be looked upon as a linear transformation of the original data into the canonical space (note that this relationship is no longer exact when LDA is substituted by QDA).

In the cases of PCR and PLS2 regression with {0,1}-responses [36], i.e., DPCR and DPLS, this relationship can be explained as follows: application of LDA to fitted values corresponds to first projecting the data onto canonical variates derived from the space spanned by the selected PCs, followed by an application of LDA in this space.

2.1.5. Validation and model selection

Estimates of success rates for the classifiers are based on cross-validation. Because estimation of PCs is included in the model-specification, extra care must be taken. Some observations of the dataset can potentially have a significant influence on the choice of components; hence, evaluation of classifiers based on an initial decomposition may be misleading. This is analogous to validation by leverage correction of PCR and PLS regression models. An 'authentic cross-validation' is obtained by decomposing the data anew for each cross-validation segment excluded. The actual dataset was divided into segments by leaving out the three replicates of each sample per validation-step for all the methods presented.

2.2. Mayonnaise and oil samples

A two level factorial design ($2^{4-1}_{IV}$ without replicates) was used for the investigation of oil, stabilizer, egg, and sugar in full fat mayonnaise. All four variables were varied at two levels in each of six mayonnaise samples based on different vegetable oils. These were soybean oil, sunflower oil, canola oil, olive oil, corn oil and grapeseed oil. The oil content was varied from 70 to 80%. In addition to eight designed experiments with mayonnaise of a certain oil type, one sample containing 75% soybean oil was always produced immediately after the others to serve as a control. Hence, a total of 54 (6 × 9) samples of mayonnaise were produced. Of the considered six groups, the soybean oil group contained 14 samples (including the six 75% control samples) whereas the other groups contained eight samples each. All the samples were analyzed by NIR.

Six different commercially available vegetable oils, of the same kind as mentioned above, were subjected to a similar analysis as the mayonnaise samples. The reason for doing so was to compare the results using oil type as the discriminating criterion both in pure form and in a finished product (mayonnaise). In order to assure sufficient data-variation, different brands of all oils were bought from
local food stores. Because of restricted local availability of the different oil types, we ended up with six different brands of soybean oil, seven of sunflower oil, three of canola oil, four of olive oil, four of corn oil and three different brands of grapeseed oil, giving a total of 24 different samples. The oils were stored at 4 °C and analyzed by NIR and Gas Chromatography.

2.3. NIR measurements

The mayonnaise samples were analyzed in triplicates (54 × 3 replicates = 162 samples) using log(1/reflectance) spectra (Fig. 1a) at 4 nm intervals from 1100 to 2500 nm with an InfraAlyzer 500 (Bran + Luebbe, Germany). A sample cup with a quartz cover glass was used for sample presentation.

The pure vegetable oils were also analyzed (Fig. 1b) in triplicates (24 × 3 replicates = 72 samples) using the same instrument and specifications as for the mayonnaise samples. A sample cup coated with 40 μm of gold covered by a quartz cover glass was used for sample presentation of the oils.

The spectra were not exposed to any pre-transformations/scatter corrections.

2.4. Gas Liquid Chromatography (GLC)

All the 24 samples of pure vegetable oil were analyzed in triplicates for their fatty acid concentration using a gas chromatograph (HP-5890II) with a split/splitless injector and a flame ionization detector (J & W DB-23 column, 25 m × 0.25 mm i.d., 0.25 μm film thickness). The samples were trans-methylated with methanolic-HCl.

3. Results

Essentially, we considered three different classification methods, i.e., LDA, QDA and regression with {0,1}-responses. The methods were applied to different decompositions of the dataset containing NIR spectra for all the mayonnaise samples. The decompositions considered were PCA, PLS2 and wavelength selection based on the scatter- and variance-curves (QDA was not used with the selected wavelengths). To give an overview of the performances of these three methods in the case of PCA- and PLS2-decomposed data, cross-validated success rates were recorded as a function of the number of PCs included in model-estimation (Fig. 2).
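
How a curve of success rate against the number of PCs might be computed can be sketched in a few lines. The following is only an illustration under stated assumptions, not the authors' implementation: it uses Python with NumPy, synthetic data in place of the mayonnaise spectra, LDA with equal priors only, and hypothetical helper names (fit_pca, fit_lda, predict_lda, authentic_cv_success). What it shows is the 'authentic' part of the cross-validation: the PCA is re-estimated inside every segment, and all replicates of one sample are left out together.

```python
import numpy as np


def fit_pca(X, n_comp):
    """Return the column means and the first n_comp loading vectors of X."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centred data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_comp].T


def fit_lda(T, y):
    """Class labels, class means and inverse pooled ML covariance of scores T."""
    classes = np.unique(y)
    means = np.array([T[y == k].mean(axis=0) for k in classes])
    resid = T - means[np.searchsorted(classes, y)]
    pooled = resid.T @ resid / len(y)
    return classes, means, np.linalg.inv(pooled)


def predict_lda(T, classes, means, precision):
    """Allocate each row to the class with the smallest distance d_k (equal priors)."""
    diff = T[:, None, :] - means[None, :, :]
    d = np.einsum("nka,ab,nkb->nk", diff, precision, diff)
    return classes[d.argmin(axis=1)]


def authentic_cv_success(X, y, groups, n_comp):
    """Success rate when each group is left out once and PCA is refitted each time."""
    correct = 0
    for g in np.unique(groups):
        test = groups == g
        mean, P = fit_pca(X[~test], n_comp)          # decomposition without the segment
        model = fit_lda((X[~test] - mean) @ P, y[~test])
        pred = predict_lda((X[test] - mean) @ P, *model)
        correct += int(np.sum(pred == y[test]))
    return correct / len(y)


# Synthetic stand-in data (an assumption): 3 classes, 6 samples each, 3 replicates.
rng = np.random.default_rng(0)
n_classes, n_per_class, n_reps, n_vars = 3, 6, 3, 40
groups = np.repeat(np.arange(n_classes * n_per_class), n_reps)  # sample identity
y = groups // n_per_class                                        # class of each row
centres = np.zeros((n_classes, n_vars))
centres[1, :10] = 10.0     # class 1 differs in the first ten variables
centres[2, 20:30] = 10.0   # class 2 differs in another band
X = centres[y] + rng.normal(scale=0.1, size=(len(y), n_vars))

rates = {a: authentic_cv_success(X, y, groups, a) for a in (1, 2, 5)}
print(rates)
```

With the data described in this paper, X would hold the 162 spectra (351 variables), y the six oil types and groups the 54 sample identities. A QDA variant would replace the pooled covariance by one covariance matrix per class and add the log-determinant term of Eq. (4).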
Fig. 1. (a) NIR spectra of mayonnaise. (b) NIR spectra of vegetable oils.

For projections of the dataset onto a moderate number (7–9) of PCs, QDA based on PCA- or PLS2-decomposed data gave significantly better results than LDA. However, using 15 (14 for PLS) PCs or more, cross-validated LDA correctly classified 161 out of the 162 samples, outperforming QDA which then started to suffer from unstable covariance-estimates. Fig. 3 shows a plot of the data projected
onto the first two canonical variates estimated from an initial projection onto 15 PCs. By studying the clusters, one member of class 2 (sunflower) can be observed to be well inside class 1 (soybean). The other two replicates of sunflower from the same batch were correctly classified, and indicate that a faulty measurement, probably due to a switch of samples, may be the reason why a perfect classification is not obtained.

Fig. 2. Success rates of the different methods as a function of dimension. PLS-methods (dashed), PCA-methods (solid).

Fig. 3. Canonical plot of mayonnaise-data after projection onto 15 PCA-components.

Direct classification according to the largest estimated response of the PLS2 regression with {0,1}-indicators (PLS discriminant analysis, Ref. [36]) was consistently better than PCR (PCR discriminant analysis) for nine or more PCs, but both regressions were significantly poorer than the corresponding versions of LDA. However, the regressions outperform QDA when the number of PCs included is large.

Fig. 4. Automatically selected wavelengths for the mayonnaise-data based on local maxima of the scatter-curves (solid) and variance-curves (dashed).

Automatic variable selection based on the scatter-curve gave 16 wavelengths corresponding to its local maxima (Fig. 4). LDA applied to the selected wavelengths resulted in a cross-validated success rate of 153/162. Compared to the cross-validated success rate of 126/162 obtained from a classification according to the largest estimated response of multiple linear regression (MLR), the latter approach is clearly unfavourable. The hybrid selection based on maxima of both the scatter- and the variance-curves resulted in selection of 26 wavelengths. LDA applied to these wavelengths gave a cross-validated success rate of 160/162, proving the selected subset to contain the essential information related to our problem. The corresponding linear regression had a cross-validated success rate of 148/162, still significantly poorer than LDA. It should be noted that these results were obtained by direct application of the methods to the set of selected wavelengths. No further reductions of dimension were carried out. (QDA was not considered for the selected wavelengths because of the instability-problems related to individual covariance-estimates when the number of variables is large compared to class size.) A summary of the best
Table 1
Selected success rates of the different methods
Methods Projections
PCA-components PLS2-components Selected variables Selected variables
ŽBrW. Žhybrid.
LDA Ž‘optimal’. ŽLDArPCA. ŽLDArPLS2. ŽLDA. ŽLDA.
15 components 14 components 16 variables 26 variables
161r162 161r162 152r162 160r162
Linear regression ŽPCR. ŽPLS2. ŽMLR. ŽMLR.
Žsame comp. as LDA. 15 components 14 components 16 variables 26 variables
131r162 152r162 126r162 148r162
Linear regression ŽPCR. ŽPLS2.
Ž‘optimal’. 20 components 17 components
154r162 158r162
QDA Ž‘optimal’. ŽQDArPCA. ŽQDArPLS.
9 components 15 components
155r162 155r162

cross-validated success rates for the different meth- canonical variates. To compute the canonical vari-
ods are given in Table 1. ates, class-membership of the samples is required.
Some regularization of the data is also necessary. In
the case of Fig. 3, the canonical variates were ob-
4. Discussion tained by first projecting the data onto 15 PCs Žopti-
mal classification. of a PCA on the entire dataset. For
There are two important aspects related to the data the canonical variates to represent a realistic group-
analyzed in this paper. The first one is methodologi- ing of the data, the number of PCA-components cho-
cal and related to successful discriminant analysis sen was decided according to the cross-validated
based on NIR spectra and similar multivariate data. success rates of LDApca in Fig. 2.
The second one is about the chemical interpretations The different clusterings solve different classifica-
of the information extracted from the NIR spectra to tion problems related to our dataset. To extract the
give good discrimination. The following discussion best possible features for the problem under consid-
will hence be divided into two parts.

4.1. Aspects concerning successful discriminant


analysis

Classification based on spectral data is no trivial


task. One single dataset can reveal several distinct
group structures and geometrical exploration based on
score-plots of PCA does not necessarily show the
clusterings we seek. For the mayonnaise-data, the
PCA score-plot of Fig. 5 shows a three-group clus-
tering, according to the amount of pure oil in the
samples, where the middle cluster represents all the
samples of 75% soybean oil. The score-plot of Fig. 6
shows a two-group clustering where the olive-based
samples are distinctly separated from the rest of the
data. The desired grouping according to oil type, is Fig. 5. Mayonnaise-data projected onto PC1 and PC2 found by
successfully obtained by projecting the data onto PCA.
U.G. Indahl et al.r Chemometrics and Intelligent Laboratory Systems 49 (1999) 19–31 27

of 132r162s 0.815 and 153r162s 0.945, respec-


tively. Hence, the seven components extracted by
PLS2 obviously contains much more discriminative
information than what is utilized by the traditional
PLS-discriminant procedure. The difference in suc-
cess rates between PLS2 and QDA is here, in fact,
55%! The situation is similar for PCR. Discrimina-
tion based on the regression approaches seems to be
clearly sub-optimal compared to application of LDA
(and QDA when the number of selected components is small) with the same data.

Wavelength selection based on local maxima of the scatter- and variance-curves is attractive from both the computational and the interpretational point of view. The technique is a univariate feature-extraction, aiming directly for relevant structure in the data. A natural extension of these procedures will typically involve a standard stepwise variable selection based on statistical criteria as described in Ref. [31]. This can be quite helpful for further reduction and interpretation of the data without reducing classification performance. Multivariate feature-extraction is possibly a more powerful alternative, but not without considerable computational cost. If the selection is sensitive to disturbances in the spectra such as shifts, more robust versions of the wavelength selection may be required. For such problems, introducing extra variables corresponding to small intervals around the chosen wavelengths is a possible extension. Further decomposition of the selected wavelengths by PLS or PCA is another natural option.

Fig. 6. Mayonnaise-data projected onto PC2 and PC5 found by PCA.

eration is both a non-trivial and important task. But the multi-dimensional nature of our data also increases the possibility of doing successful complex classification. Even if the amount of variance accounted for by the difference in ‘oil type’ is small compared to the total variance of the data, it can be filtered out to obtain successful classification. By simultaneous consideration of the two distinct properties ‘oil type’ and ‘oil-content’, i.e., Figs. 5 and 3, it is in fact possible to obtain reliable classification of the data into 2×5 + 3×1 = 13 distinct groups based on the 54 samples (×3 replicates) only.

As mentioned in Section 2.1.4, there is a close relationship between the regression approach and the canonical variates. In fact, the fitted values of a linear regression with {0,1}-responses span the canonical space derived from the covariates. The reader should note the insignificant difference between Figs. 3 and 7. Fig. 3 shows a set of data projected onto the first two canonical variates. Fig. 7 shows a score-plot of the fitted values from a PCR (15 PCs) with {0,1}-responses. The minor difference in the structure of the two plots is caused by the extraction of PC1 and PC2 from the canonical space rather than the first two canonical variates.

In the case of latent covariates (PCR or PLS2-regression) there are some remarks worth making. For PLS2 with seven components (see Fig. 2), direct classification according to the largest fitted response gave a success rate of 64/162 = 0.395. With the same seven PCs, LDA and QDA gave success rates

Fig. 7. PC1 against PC2 of fitted values after PCR-regression of {0,1}-responses onto mayonnaise-data.
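The rule of allocating a sample to the class with the largest fitted value of a regression onto {0,1}-responses can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the study: the function name `classify_by_largest_fit` is hypothetical, the data are synthetic stand-ins for spectra, PCR is used rather than PLS2, and the reported accuracy is a resubstitution figure rather than a cross-validated one.

```python
import numpy as np

def classify_by_largest_fit(X, labels, n_components):
    """PCR onto {0,1}-responses; allocate each sample to the largest fitted value."""
    classes, idx = np.unique(labels, return_inverse=True)
    Y = np.eye(len(classes))[idx]              # n x g matrix of {0,1} indicator responses
    Xc = X - X.mean(axis=0)                    # centre the variables
    # PCA via SVD: rows of Vt are loadings, Xc @ Vt.T gives the scores
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = Xc @ Vt[:n_components].T
    A = np.column_stack([np.ones(len(T)), T])  # intercept plus PC scores
    B, *_ = np.linalg.lstsq(A, Y, rcond=None)  # one least-squares fit per class column
    fitted = A @ B
    return classes[fitted.argmax(axis=1)], fitted

# Synthetic stand-in for spectra (assumption): three groups, 20 variables,
# group centres that are not collinear, unit-variance noise.
rng = np.random.default_rng(0)
centres = np.zeros((3, 20))
centres[1, 0] = 8.0
centres[2, 1] = 8.0
labels = np.repeat(np.array(["A", "B", "C"]), 30)
X = np.vstack([c + rng.normal(size=(30, 20)) for c in centres])

predicted, fitted = classify_by_largest_fit(X, labels, n_components=2)
accuracy = float(np.mean(predicted == labels))  # resubstitution, not cross-validated
```

Because the indicator columns sum to one and the model contains an intercept, the fitted values of each sample also sum to one, which makes them convenient to inspect. Applying LDA to the same scores `T` instead of taking the largest fitted value would be the natural comparison in the spirit of the discussion above.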
It should be noted that, in general, PCs and canonical variates are not invariant under pre-transformations (like multiplicative scatter correction) of the data. This is also true for the variance- and scatter-curves. In particular, the variance-curve is likely to be altered in the location of its peaks. Also, the scatter-curve may be affected, but probably less seriously. As long as a reduction in total variance tends to be quite evenly distributed between the ‘within-samples’ and the ‘between-samples’ variance, peaks are not likely to be lost. However, if the reduction of variance is mostly due to the ‘within-samples’ variance, new peaks may appear. We do not pursue the subject any further here.

Another aspect not considered in this paper is non-chronological (chronological with respect to the amount of corresponding variance/covariance) selection of PCA- and PLS2-components. Such selection could be based on B/W-ratios estimated for each component, or on a stepwise canonical procedure as described in Ref. [6]. We decided not to consider such approaches in our study. Given the conclusions in Ref. [35] and the fact that very good classification results were obtained without any selection strategy, we do not expect component-selection to give significant improvement in classification performance over the methods actually applied in our analysis.

Altogether, the dataset presented seemed very well-suited for discrimination. Except for QDA, which starts to suffer from overparametrization due to the required estimation of individual covariance matrices, the cross-validated success rates of Fig. 2 increase quite consistently with the number of components considered. For LDA with PCA-scores, we confirmed the cross-validated success rate of 161/162 for as many as 130(!) PCs (inclusion of that many scores is of course not recommended).

For completeness, we also tried classification of the mayonnaise-data by the SIMCA-method implemented in the software-package ‘SCAN’. The routine in ‘SCAN’ follows the original description of SIMCA given by Wold [4], and does not consider leverage inside the PC-models. The best cross-validated success rate by SIMCA, 75%, was obtained with seven PCs representing each class. Thus, at least this version of SIMCA seems to be hampered by the small number of samples in each class, and it is significantly outperformed by the other methods of our study.

Generally, one should expect data with a more complicated group-structure than reflected by our example. In such cases, LDA may be too crude, and a dimension-reduction followed by an application of RDA [1] might be preferable. Non-linear multivariate regression onto {0,1}-responses followed by LDA applied to the fitted values, in the spirit of Hastie et al. [37] and Ripley [27], is another useful alternative. Also, the variable-selection approaches described by Wu and Massart [9] and the wavelength-selections suggested in this paper yield good starting-points for discriminant analysis via other flexible methods like NNs.

4.2. Chemical interpretations of the NIR spectra via analysis of pure vegetable oils

NIR is suited for measuring the composition of food constituents [38] such as the amount of fat, water, and protein. The spectrum is a result of light absorption from various functional groups, including –CH, –OH, and –NH, and the wavelength of an absorption band often reveals the nature of the chemical bonds responsible for the absorption. Almost all absorption bands observed in NIR arise from overtones of hydrogenic stretching vibrations of functional groups or combinations involving stretching and bending modes of vibration of the groups. After obtaining a successful classification, it is both natural and often necessary to understand and interpret the discriminative information of the NIR spectra.

Vegetable oils contain different amounts of fatty acids which differ in chain length (addition of –CH2) and number and position of double bonds (Table 2). These differences are likely to be expressed in a number of different wavelength bands of the NIR spectra, making an exact interpretation difficult. However, information about the fatty acid composition in oils and fats seems to be concentrated in the range of 1600–2200 nm, and attempts have been made to determine the fatty acid composition by NIR methodology (see Ref. [22]).

The NIR spectra obtained from the 24×3 = 72 samples of the same six pure vegetable oils being used in the mayonnaise samples were similar to NIR spectra described in the literature [10,22,38]. The
Table 2
The average fatty acid composition of the oil types with standard deviations calculated on the basis of replicates

               Fatty acids
Oil types      C14            C16             C16:1          C18:0          C18:1           C18:2           C18:3           C20–22
Soybean oil    0.0764±0.1046  12.7733±0.3285  0.0918±0.0058  3.9567±0.2606  22.3984±0.8813  54.2878±0.3346  7.3189±0.7410   0.3404±0.1324
Sunflower oil  0.0658±0.0022  6.2441±0.0741   0.0799±0.0064  4.1972±0.1325  23.9858±1.1621  64.8389±1.2101  0.1428±0.0618   0.2598±0.1222
Rapeseed oil   0.0573±0.0068  4.7344±0.4788   0.2229±0.0119  1.7589±0.0523  59.5633±0.2727  21.1522±0.3419  10.0722±0.5128  1.9246±0.0608
Olive oil      0.0023±0.0079  11.1625±1.1979  0.7531±0.1636  2.9558±0.5414  73.5224±1.2924  9.7883±0.10129  0.6623±0.0638   0.7872±0.0722
Corn oil       0.0409±0.0076  10.7617±0.2062  0.1029±0.0042  1.8500±0.0477  28.3832±1.2512  56.8917±1.4124  1.0005±0.3436   0.2641±0.1542
Grapeseed oil  0.0430±0.0016  6.9100±0.1138   0.0957±0.0058  3.7756±0.0682  17.6778±0.4992  70.5522±0.2414  0.5359±0.3439   0.2859±0.0701
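
As a rough numerical reading of Table 2, the following Python sketch looks at how far apart the six oils lie when only the mean C18:1 and C18:3 values are considered. The numbers are copied from the table; the choice of these two columns, the use of Euclidean distance between means, and the helper name `closest_pair` are assumptions made for illustration, and the replicate scatter (the standard deviations) is ignored.

```python
import math
from itertools import combinations

# Mean (C18:1, C18:3) values copied from Table 2; standard deviations omitted.
means = {
    "Soybean": (22.3984, 7.3189),
    "Sunflower": (23.9858, 0.1428),
    "Rapeseed": (59.5633, 10.0722),
    "Olive": (73.5224, 0.6623),
    "Corn": (28.3832, 1.0005),
    "Grapeseed": (17.6778, 0.5359),
}

def closest_pair(points):
    """Return (distance, name_a, name_b) for the two oils with the closest means."""
    return min(
        (math.dist(points[a], points[b]), a, b)
        for a, b in combinations(sorted(points), 2)
    )

distance, oil_a, oil_b = closest_pair(means)
print(f"Closest pair: {oil_a} and {oil_b}, distance {distance:.2f}")
```

Under these assumptions the two oils with the most similar means are corn and sunflower, which still differ by more than four units in this plane, several times the tabulated standard deviations of either oil in these two columns.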


spectra show peaks in the wavelength regions around 1700 nm and 2300–2400 nm (Fig. 1b). The relevance of these wavelength regions is confirmed by corresponding peaks in the scatter-curve for the pure oils (Fig. 8). One should also note that the spectra of pure oils show a shape quite similar to the variance-curve of Fig. 4. This supports the view that the information extracted from the data is related to the oil-component of the mayonnaise.

Fig. 8. Scatter-curve for oil-data showing the discriminative ability of the wavelengths.

A successful discrimination of all six oils based on the NIR spectra is shown in Fig. 9. The canonical variates are computed after projecting the data onto 15 PCA-components. The canonical plots in Figs. 9 and 3 show very similar internal groupings for the two corresponding datasets. The same structure is shown in Fig. 10, where the contents of the monounsaturated acid C18:1 and the polyunsaturated alpha-linolenic acid C18:3 are plotted against each other. Table 2 shows the complete fatty acid composition for the six different oils analyzed by gas chromatography.

Fig. 9. Canonical plot of oil-data after projection onto 15 PCA-components.

Fig. 10. Plot of monounsaturated acids (C18:1) and alpha-linolenic acid (C18:3) of oil-samples.

Based on these observations, we conclude that the first two canonical variates derived from NIR measurements of both datasets are dominated by the amount of C18:1 and C18:3, respectively. Hence, the classification seems mainly to be based on the fatty acid composition, with the number of double bonds being one of the most important factors.

By comparing the discrimination of mayonnaise with the discrimination of pure oils, it is also apparent that the classification is not significantly influenced by the variation of other ingredients (egg, stabilizer, and sugar) or other chemical and physical properties of the mayonnaise. The potential of NIR spectroscopy for the analysis of other ingredients, as well as their effects on the consistency in full fat mayonnaise, will be discussed in another paper.

5. Conclusions

In conclusion, the present study demonstrates that NIR can be used to discriminate between mayonnaise samples with different oil content and oil type, in spite of the fact that a large amount of variance in the data is caused by other phenomena. Classification methods based on NIR data seem to be successful in separating mayonnaise samples according to oil type and fatty acid composition. The results are in agreement with recent studies showing that PCA and discriminant analysis can be used to identify oils (see Refs. [18,21]). It may therefore be possible to utilize NIR for the determination of oil type in full fat mayonnaise as well as detection of minor variations in oil content and variations in the content of individual fatty acids. Furthermore, we have seen that for our data, LDA is consistently superior to the regression approaches based on PLS2 and PCR with {0,1}-responses. However, our experience is that PLS2 can be very useful for extraction of subspaces where both QDA and LDA applied to the projected data yield good discrimination. Wavelength-selection corresponding to local maxima of the scatter- and variance-curves is demonstrated both as a useful data-reduction technique, and as a tool for identification of wavelengths containing discriminative information.

Acknowledgements

This study was completed as a part of the Nordic R&D Programme for the Food Industry, Nordfood project no. P93131, ‘Emulsion quality’, with financial support from the Nordic Industrial Fund and the Norwegian Food-company Mills DA. We are grateful to Grethe Enersen of MATFORSK for measuring the NIR spectra and Palsgaard Industri of Denmark for production of the mayonnaise samples.

References

[1] J.H. Friedman, JASA 84 (1989) 165–175.
[2] I. Frank, Chemometrics and Intelligent Laboratory Systems 4 (1988) 215–222.
[3] I. Frank, J.H. Friedman, Journal of Chemometrics 3 (1989) 463–475.
[4] S. Wold, Pattern Recognition 8 (1976) 127–139.
[5] W.J. Krzanowski, P. Jonathan, W.V. McCarty, M.R. Thomas, Applied Statistics 44 (1995) 101–115.
[6] D. Bertrand, P. Courcoux, J.-C. Autran, P. Robert, Journal of Chemometrics 4 (1990) 411–427.
[7] Y. Mallet, D. Coomans, O. de Vel, Chemometrics and Intelligent Laboratory Systems 35 (1996) 157–173.
[8] W. Wu, B. Walczak, D.L. Massart, K.A. Prebble, I.R. Last, Analytica Chimica Acta 315 (1995) 243–255.
[9] W. Wu, D.L. Massart, Chemometrics and Intelligent Laboratory Systems 35 (1996) 127–135.
[10] P.G. Osborne, T. Fearn, Near Infrared Spectroscopy in Food Analysis, Longman, Essex, 1986.
[11] D.G. Evans, C.N.G. Scotter, L.Z. Day, M.N. Hall, Journal of Near Infrared Spectroscopy 1 (1993) 33–44.
[12] M.R. Ellekjær, K.I. Hildrum, T. Næs, T. Isaksson, Journal of Near Infrared Spectroscopy 1 (1993) 65–75.
[13] T. Næs, K.I. Hildrum, Applied Spectroscopy 51 (3) (1997) 350–357.
[14] B.G. Osborne, B. Mertens, M. Thompson, T. Fearn, Journal of Near Infrared Spectroscopy 1 (1993) 77–84.
[15] W.J. Krzanowski, Journal of Near Infrared Spectroscopy 3 (1995) 111–117.
[16] S. Husain, K. Sita Devi, D. Krishna, P.J. Reddy, Chemometrics and Intelligent Laboratory Systems 35 (1996) 117–126.
[17] Y.W. Lai, E.K. Kemsley, R.H. Wilson, Food Chemistry 53 (1995) 95–98.
[18] K.M. Bewig, A.D. Clarke, C. Roberts, N. Unklesbay, JAOCS 71 (2) (1994) 195–200.
[19] M.F. Devaux, P. Robert, A. Qannari, M. Safar, E. Vigneau, Applied Spectroscopy 47 (7) (1993) 1024–1029.
[20] D.B. Dahlberg, S.M. Lee, S.J. Wenger, J.A. Vargo, Applied Spectroscopy 51 (8) (1997) 1118–1124.
[21] T. Sato, JAOCS 71 (3) (1994) 293–299.
[22] T. Sato, S. Kawano, M. Iwamoto, JAOCS 68 (11) (1991) 827–833.
[23] H. Kamishikiryo, K. Hasegawa, H. Takamura, T. Matoba, Journal of Food Science 57 (5) (1992) 1239–1240.
[24] A.J. Boot, A.J. Speek, Journal of AOAC International 77 (5) (1994) 1184–1189.
[25] I.J. Wesley, R.J. Barnes, A.E.J. McGill, JAOCS 72 (3) (1995) 289–292.
[26] M.A. Czarnecki, Y. Liu, Y. Ozaki, M. Suzuki, M. Iwahashi, Applied Spectroscopy 47 (12) (1993) 2162–2168.
[27] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge, 1996.
[28] T. Hastie, R. Tibshirani, A. Buja, Journal of the American Statistical Association 89 (1994) 1255–1270.
[29] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis, Wiley, 1973.
[30] K.V. Mardia, J.T. Kent, J.M. Bibby, Multivariate Analysis, Academic Press, 1979.
[31] G.J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, Wiley, 1992.
[32] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford, 1995.
[33] B. Flury, M.J. Schmidt, A. Natayanan, Journal of Classification 11 (1994) 101–120.
[34] J.D. Banfield, A.E. Raftery, Biometrics 49 (1993) 803–821.
[35] T. Næs, H. Martens, Journal of Chemometrics 2 (1988) 155–167.
[36] L. Ståhle, S. Wold, Journal of Chemometrics 1 (1987) 185–196.
[37] T. Hastie, A. Buja, R. Tibshirani, Annals of Statistics 23 (1995) 73–102.
[38] P. Williams, K. Norris (Eds.), Near-Infrared Technology in the Agricultural and Food Industries, American Association of Cereal Chemists, St. Paul, 1990.
