Multivariate Strategies For Classificati
19–31
www.elsevier.com/locate/chemometrics
Abstract
The goal of the presented study is two-fold. First, we want to emphasize the power of Near Infrared Reflectance (NIR) spectroscopy for discrimination between mayonnaise samples containing different vegetable oils. Secondly, we want to use our data to compare the performances of different classification procedures. The NIR spectra with 351 variables correspond to equally spaced wavelengths in the 1100–2500 nm area. Feature extraction both by automatic wavelength-selection and by projection onto principal components (PCs) is discussed. The discriminant methods considered are linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) and regression with categorical {0,1}-responses. A dataset containing 162 spectra of mayonnaise samples based on six different vegetable oils is analyzed. By LDA with authentic cross-validation (PC-models re-estimated for each cross-validation segment), only one sample was misclassified. Classification by allocating a sample according to the largest fitted value of a linear regression (Discriminant-Partial least squares (DPLS) or Discriminant-Principal components regression (DPCR)) is shown to be sub-optimal compared to LDA of the corresponding PLS- or PCR-scores. QDA significantly outperforms LDA for projections of the data onto subspaces of moderate size (scores of 7–9 PCs). Two automatic variable-selection procedures choose 16 and 26 wavelengths (variables), respectively, from the spectra. Based on the selected wavelengths, LDA gives considerably better classification than the regression approach. By reporting the performances of several feature extraction techniques in tandem with three of the most common classification methods, we hope that the reader will notice two relevant aspects: (1) By using DPLS and DPCR (classification by ‘dummy’ regressions) one is exposed to a significant risk of obtaining sub-optimal classification results; (2) The automatic wavelength selections may give valuable information about what is actually causing a successful discrimination. Such knowledge can, for instance, be used to select the most suited filters for online applications of NIR. Besides demonstrating different classification strategies, our study clearly shows that classification methods with NIR spectra can be used to discriminate between mayonnaise samples of different oil types and fatty acid composition. © 1999 Elsevier Science B.V. All rights reserved.
Keywords: Discriminant analysis; Principal components; Automatic variable selection; NIR; Vegetable oils
* Corresponding author. Tel.: +47-22-47-67-45; Fax: +47-22-47-67-95; E-mail: [email protected]
1 Mills DA, P.O. Box 4644, Sofienberg, N-0506 Oslo, Norway.
2 Mills DA, P.O. Box 4644, Sofienberg, N-0506 Oslo, Norway.
0169-7439/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved.
PII: S0169-7439(99)00023-4
U.G. Indahl et al. / Chemometrics and Intelligent Laboratory Systems 49 (1999) 19–31
timation of between-groups variance over within-groups variance ratio (B/W-ratio) for each spectral wavelength of the dataset.

2. Materials and methods

2.1. Multivariate classification

Similar to multivariate regression problems, the goal of a classification-procedure is to extract reliable information from potentially high-dimensional and collinear X-data. This often requires careful shrinking, transformation, regularization or subset-selection from the original variables (wavelengths) before the standard statistical techniques can be successfully applied.

A classification problem can be described as follows: $n$ measurements $x_1, \dots, x_n$, each of $p$ variables, are arranged in a data matrix $X$. Group membership is given by a vector $Y \in \{1, 2, \dots, K\}^n$, where $K$ is the number of groups. From the dataset $(X, Y)$ a classification-rule $CR: \mathbb{R}^p \to \{1, 2, \dots, K\}$ is designed in such a way that the correct class of a future anonymous sample $x \in \mathbb{R}^p$ is predicted with high probability. Good references on the topic are Ripley [27], Duda and Hart [29], Mardia et al. [30], McLachlan [31], and Bishop [32].

By assuming the different groups generated by $K$ distinct probability-densities $f_k$ for $k \in \{1, \dots, K\}$ with prior probabilities $\pi_k$, a straightforward argument minimizing the risk of misclassification leads to the rule: $CR(x) = \hat{k}$, where

$$f_{\hat{k}}(x)\,\pi_{\hat{k}} = \max_{k=1,\dots,K} \{ f_k(x)\,\pi_k \}, \tag{1}$$

i.e., allocation of $x$ to the class of maximal probability score. If the densities involved are assumed multi-normal, i.e.,

$$f_k(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma_k|^{1/2}} \exp\left( -\tfrac{1}{2}\,(x - \mu_k)^t\,\Sigma_k^{-1}\,(x - \mu_k) \right), \tag{2}$$

with prior probabilities $\pi_k$ for $k \in \{1, \dots, K\}$, the expression in Eq. (1) can be replaced by:

$$d_{\hat{k}}(x) = \min_{k=1,\dots,K} d_k(x), \tag{3}$$

simply by taking the logarithm of the densities and deleting common terms. When different covariance matrices of each class are assumed, Eq. (3) is applied with:

$$d_k(x) = (x - \mu_k)^t\,\Sigma_k^{-1}\,(x - \mu_k) + \ln|\Sigma_k| - 2\ln(\pi_k). \tag{4}$$

Eqs. (3) and (4) together lead to quadratic boundaries between the classes, and the resulting classification rule is known as QDA. We will only assume equal priors of the classes and consequently cancel the last term of Eq. (4). If also equal covariance matrices of the classes are assumed, i.e., $\Sigma_k = \Sigma$, $\forall k \in \{1, \dots, K\}$, we get a global metric on the variable space, $d(x, y) = (x - y)^t\,\Sigma^{-1}\,(x - y)$, with $d_k(x) = (x - \mu_k)^t\,\Sigma^{-1}\,(x - \mu_k)$. By simple algebraic manipulations, one can deduce that the quadratic terms of $x$ in these expressions cancel. This leads to linear decision boundaries between the classes. The resulting classification rule is known as LDA.

In most applications of LDA and QDA the class-centers $\mu_k$ and the covariance matrices $\Sigma_k$ are unknown and usually replaced by:

$$\hat{\mu}_k = \frac{1}{n_k} \sum_{\mathrm{class}(i)=k} x_i \tag{5}$$

where $n_k$ = ‘the number of observations in group number $k$’ and:

$$\hat{\Sigma}_k = \frac{1}{n_k} \sum_{\mathrm{class}(i)=k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^t, \tag{6}$$

corresponding to the maximum likelihood (ML) estimates of $\mu_k$ and $\Sigma_k$.

2.1.1. Collinearity and singularity

When dealing with collinear data or more variables than observations, the resulting ML-estimates of covariance matrices will suffer from serious instability or even singularity. This is easily seen by expressing the spectral decomposition of the covariance matrices as:

$$\Sigma_k = \sum_{i=1}^{p} e_{ik}\, z_{ik}\, z_{ik}^t, \tag{7}$$

where $e_{ik}$ is the $i$th eigenvalue and $z_{ik}$ the corresponding eigenvector of $\Sigma_k$. The inverse in this representation equals

$$\Sigma_k^{-1} = \sum_{i=1}^{p} \frac{z_{ik}\, z_{ik}^t}{e_{ik}}. \tag{8}$$
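As a rough illustration of Eqs. (3)–(6), the following NumPy sketch computes the ML estimates and the resulting QDA and LDA scores under equal priors. The function names, the synthetic data and the use of an $n_k$-weighted average as the common covariance estimate are assumptions made for this example; none of them is taken from the original study.

```python
import numpy as np

def ml_estimates(X, y):
    """Class means and ML covariances (divide by n_k), as in Eqs. (5)-(6)."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    covs = np.array([
        (X[y == k] - m).T @ (X[y == k] - m) / np.sum(y == k)
        for k, m in zip(classes, means)
    ])
    counts = np.array([np.sum(y == k) for k in classes])
    return classes, means, covs, counts

def qda_scores(x, means, covs):
    """d_k(x) of Eq. (4) with equal priors, so the -2 ln(pi_k) term is dropped."""
    scores = []
    for m, S in zip(means, covs):
        diff = x - m
        _, logdet = np.linalg.slogdet(S)          # ln|Sigma_k|
        scores.append(diff @ np.linalg.solve(S, diff) + logdet)
    return np.array(scores)

def lda_scores(x, means, covs, counts):
    """d_k(x) with one common covariance (assumed: n_k-weighted class average)."""
    pooled = np.einsum('k,kij->ij', counts, covs) / counts.sum()
    return np.array([(x - m) @ np.linalg.solve(pooled, x - m) for m in means])

# Synthetic two-class data in three variables (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 3)), rng.normal(5.0, 1.0, (40, 3))])
y = np.repeat([0, 1], 40)

classes, means, covs, counts = ml_estimates(X, y)
x_new = np.array([5.0, 5.0, 5.0])
qda_class = classes[np.argmin(qda_scores(x_new, means, covs))]          # rule of Eq. (3)
lda_class = classes[np.argmin(lda_scores(x_new, means, covs, counts))]
```

Solving a linear system instead of forming the explicit inverse is a deliberate choice here: with nearly collinear variables the class covariance estimates become ill-conditioned, which is exactly the instability this subsection describes.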
With this representation, Eq. (4) takes the form

$$d_k(x) = \sum_{i=1}^{p} \frac{\left[ z_{ik}^t (x - \mu_k) \right]^2}{e_{ik}} + \sum_{i=1}^{p} \ln(e_{ik}) - 2\ln(\pi_k). \tag{9}$$

Eq. (9) is heavily influenced by the smaller eigenvalues and the directions corresponding to their eigenvectors. According to Friedman [1], the eigenvalues of the empirical estimates $\hat{\Sigma}_k$ are biased such that small eigenvalues are underestimated and large eigenvalues are overestimated. When the $n_k$ values are greater than but close to $p$, the estimated $\hat{\Sigma}_k$ values will consequently be unstable, and when $n_k < p$ the smallest eigenvalues will be equal to 0 and imply singular estimates. In the situation of small groups (compared to the number of variables) LDA based on the pooled covariance-estimate

$$\hat{\Sigma} = \frac{1}{n} \sum_{k=1}^{K} n_k\, \hat{\Sigma}_k, \tag{10}$$

($n = \sum n_k$) will often outperform QDA because one is better off with a more stable estimate of the common covariance-structure than unstable individual estimates. Sometimes, a better alternative is to look for intermediate solutions where the class-dependent covariance matrices are modified according to certain rules. Such strategies are discussed in Refs. [1,33,34].

2.1.2. Feature extraction by orthogonal decompositions

In the situation with highly multivariate and collinear data such as NIR spectra, and only a moderate number of samples, even the pooled covariance-estimate of LDA may become unstable or singular. A reasonable solution to the problem is to reduce the dimension of the data by decomposing and projecting the samples onto a smaller number of orthogonal components found by PCA, PLS or some other suitable strategy. LDA or QDA can then be applied to the reduced data without stability problems. With some fortune, such procedures can give a drastic reduction of dimension without causing significant loss of discriminative information.

A useful and interesting part of discriminant analysis is the computation of canonical variates, also known as Fisher’s linear discriminants. The canonical variates are the solution-vectors $z$ maximizing the B/W-ratio criterion $z^t B z / z^t W z$ under orthogonality constraints. These vectors are found by solving the generalized eigenvalue problem $Bz = \lambda Wz$, where $\lambda$ is a generalized eigenvalue, $W = n\hat{\Sigma}$ is the within sum of squares matrix and $B = \sum_{k=1}^{K} (\hat{\mu}_k - \hat{\mu})(\hat{\mu}_k - \hat{\mu})^t$ is the between sum of squares matrix ($\hat{\mu} = (1/n) \sum n_k \hat{\mu}_k$). If $W$ is non-singular, $z$ will be an ordinary eigenvector of the expression $W^{-1}B$ (see Mardia et al. [30]). It is proved in Ref. [31] that a direct application of LDA to the original data yields the same results as LDA applied to a projection of the data onto the space spanned by the canonical variates. Plots of the data projected onto two or three of the first canonical variates often give a useful graphical representation of the relationship between the different groups (see Fig. 3).

In the case of multivariate and highly correlated variables typical for spectral data, it is, however, important to do some kind of regularization of the dataset before estimation of canonical variates. Otherwise, both the estimated classification rule and the corresponding graphical representation can be strongly misleading.

2.1.3. Feature extraction by variable selection

An alternative to projecting the data onto PCs is to select a subset of the original variables (wavelengths). The ideal situation is, of course, to identify the variables having the significant discriminative power.

One possible strategy to obtain focus on variables contributing to correct classification is to compute the one-dimensional B/W-ratios separately for each wavelength [8] of the spectra to obtain the scatter-curve (if the full multivariate versions of $W$ and $B$ are already computed, the scatter-curve can be obtained by element-wise division of the diagonal of the matrix $B$ by the diagonal of the matrix $W$). Instead of exclusively selecting the wavelengths corresponding to the largest values, we suggest selection of variables corresponding to the local maxima of this spectrum (Fig. 4). This idea is sensible considering the high correlation present between adjacent wavelengths of a NIR spectrum. Even if two adjacent variables yield a high ratio, they both supply essentially the same discriminative information. This approach is equivalent to computation of the F-statistic
for each wavelength in a one-way ANOVA where the grouping corresponds to the different oils.

We also consider an extended hybrid of this procedure based on simultaneous selection of local maxima of the scatter-curve and the related variance-curve resulting from estimation of variable-wise variances (Fig. 4). The variance-curve is identical to the diagonal of the total covariance matrix $\hat{\Sigma}_T = (B + W)/n$ of the entire dataset.

2.1.4. Classification by regression

Regression with {0,1}-responses is a strategy for classifying multivariate data apparently different from the probabilistic approaches leading to LDA and QDA. A response matrix $Y$ of $n$ rows (corresponding to the observations of $X$) and $K$ columns (corresponding to the number of classes) is designed as follows: for the $i$th row of $Y$ ($i = 1, \dots, n$), put a 1 in the $k$th column and a 0 in all the other columns if the corresponding $i$th object of $X$ belongs to class $k$. By regressing $Y$ onto $X$, classification of a new sample $x$ is done by selecting the group corresponding to the largest component of the fitted $\hat{y}(x) = (\hat{y}_1(x), \dots, \hat{y}_K(x))$. Ripley [27] gives an algebraic argument explaining the regression approach as a variant of LDA with equal prior probabilities, where the total covariance matrix is used in place of the within-groups covariance matrix. When the model assumptions of LDA are appropriate, the regression approach is seldom superior. For $K = 2$ (a two-class problem), it can be shown that LDA and the regression approach are equivalent (see Ref. [27]). However, as explained both by Ripley and Hastie et al. [28], for arbitrarily many classes $K$, LDA applied to the fitted $\hat{Y}$-values of a linear regression always yields a classification equivalent to LDA applied to the original x-variables. Furthermore, this is equivalent to an application of LDA when the data are projected onto the space spanned by the canonical variates. The equivalence is a consequence of the fact that the fitted values span the same space as the canonical variates, and can be looked upon as a linear transformation of the original data into the canonical space (note that this relationship is no longer exact when LDA is substituted by QDA).

In the cases of PCR and PLS2 regression with {0,1}-responses [36], i.e., DPCR and DPLS, this relationship can be explained as follows: application of LDA to fitted values corresponds to first projecting the data onto canonical variates derived from the space spanned by the selected PCs, followed by an application of LDA in this space.

2.1.5. Validation and model selection

Estimates of success rates for the classifiers are based on cross-validation. Because estimation of PCs is included in the model-specification, extra care must be taken. Some observations of the dataset can potentially have a significant influence on the choice of components; hence, evaluation of classifiers based on an initial decomposition may be misleading. This is analogous to validation by leverage correction of PCR and PLS regression models. An ‘authentic cross-validation’ is obtained by decomposing the data anew for each cross-validation segment excluded. The actual dataset was divided into segments by leaving out the three replicates of each sample per validation-step for all the methods presented.

2.2. Mayonnaise and oil samples

A two-level factorial design ($2^{4-1}_{IV}$ without replicates) was used for the investigation of oil, stabilizer, egg, and sugar in full fat mayonnaise. All four variables were varied at two levels in each of six mayonnaise samples based on different vegetable oils. These were soybean oil, sunflower oil, canola oil, olive oil, corn oil and grapeseed oil. The oil content was varied from 70 to 80%. In addition to eight designed experiments with mayonnaise of a certain oil type, one sample containing 75% soybean oil was always produced immediately after the others to serve as a control. Hence, a total of 54 (6×9) samples of mayonnaise were produced. Of the considered six groups, the soybean oil group contained 14 samples (including the six 75% control samples) whereas the other groups contained eight samples each. All the samples were analyzed by NIR.

Six different commercially available vegetable oils, of the same kind as mentioned above, were subjected to a similar analysis as the mayonnaise samples. The reason for doing so was to compare the results using oil type as the discriminating criterion both in pure form and in a finished product (mayonnaise). In order to assure sufficient data-variation, different brands of all oils were bought from
local food stores. Because of restricted local availability of the different oil types, we ended up with six different brands of soybean oil, seven of sunflower oil, three of canola oil, four of olive oil, four of corn oil and three different brands of grapeseed oil, giving a total of 24 different samples. The oils were stored at 4°C and analyzed by NIR and Gas Chromatography.

2.3. NIR measurements

The mayonnaise samples were analyzed in triplicate (54×3 replicates = 162 samples) using log(1/reflectance) spectra (Fig. 1a) at 4 nm intervals from 1100 to 2500 nm with an InfraAlyzer 500 (Bran + Luebbe, Germany). A sample cup with a quartz cover glass was used for sample presentation.

The pure vegetable oils were also analyzed (Fig. 1b) in triplicate (24×3 replicates = 72 samples) using the same instrument and specifications as for the mayonnaise samples. A sample cup coated with 40 μm of gold covered by a quartz cover glass was used for sample presentation of the oils.

The spectra were not exposed to any pre-transformations/scatter corrections.

2.4. Gas Liquid Chromatography (GLC)
3. Results
Table 1
Selected success rates of the different methods

| Methods \ Projections | PCA-components | PLS2-components | Selected variables (B/W) | Selected variables (hybrid) |
|---|---|---|---|---|
| LDA (‘optimal’) | LDA/PCA, 15 components: 161/162 | LDA/PLS2, 14 components: 161/162 | LDA, 16 variables: 152/162 | LDA, 26 variables: 160/162 |
| Linear regression (same comp. as LDA) | PCR, 15 components: 131/162 | PLS2, 14 components: 152/162 | MLR, 16 variables: 126/162 | MLR, 26 variables: 148/162 |
| Linear regression (‘optimal’) | PCR, 20 components: 154/162 | PLS2, 17 components: 158/162 | | |
| QDA (‘optimal’) | QDA/PCA, 9 components: 155/162 | QDA/PLS, 15 components: 155/162 | | |
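Success rates of the LDA/PCA kind reported in Table 1 rest on the authentic cross-validation of Section 2.1.5, where the PCA is re-estimated every time a segment of replicates is left out. The sketch below shows one way such a loop might look in NumPy. The data are synthetic, and all function names, dimensions and noise levels are assumptions chosen for illustration; this is not the code or the data used in the study.

```python
import numpy as np

def pca_fit(X, n_comp):
    """Column means and the first n_comp principal axes (loadings) of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_comp].T

def lda_fit(T, y):
    """Class means and pooled within-class covariance of the scores T."""
    classes = np.unique(y)
    means = np.array([T[y == k].mean(axis=0) for k in classes])
    resid = T - means[np.searchsorted(classes, y)]
    return classes, means, resid.T @ resid / len(T)

def lda_predict(T, classes, means, pooled):
    """Allocate each row of T to the class with the smallest Mahalanobis distance."""
    diffs = T[:, None, :] - means[None, :, :]
    d = np.einsum('nkq,qr,nkr->nk', diffs, np.linalg.inv(pooled), diffs)
    return classes[d.argmin(axis=1)]

def authentic_cv(X, y, segments, n_comp):
    """Leave one segment out at a time; PCA and LDA are both refitted without it."""
    correct = 0
    for s in np.unique(segments):
        held_out = segments == s
        mean, P = pca_fit(X[~held_out], n_comp)            # decomposition excludes the segment
        model = lda_fit((X[~held_out] - mean) @ P, y[~held_out])
        pred = lda_predict((X[held_out] - mean) @ P, *model)
        correct += int(np.sum(pred == y[held_out]))
    return correct, len(y)

# Synthetic stand-in for spectra: 3 classes x 4 samples x 3 replicates, 20 variables.
rng = np.random.default_rng(1)
n_classes, n_samples, n_reps, n_vars = 3, 4, 3, 20
class_centers = rng.normal(0.0, 5.0, (n_classes, n_vars))
rows, labels, segment_ids = [], [], []
for k in range(n_classes):
    for s in range(n_samples):
        sample = class_centers[k] + rng.normal(0.0, 0.5, n_vars)
        for _ in range(n_reps):
            rows.append(sample + rng.normal(0.0, 0.1, n_vars))
            labels.append(k)
            segment_ids.append(k * n_samples + s)   # replicates share one segment id
X, y, segments = np.array(rows), np.array(labels), np.array(segment_ids)

correct, total = authentic_cv(X, y, segments, n_comp=3)
print(f"{correct}/{total} correctly classified")
```

Keeping all replicates of a sample in the same segment matters: if one replicate stayed in the training set while another was predicted, the estimated success rate would likely be too optimistic.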
cross-validated success rates for the different methods are given in Table 1.

4. Discussion

There are two important aspects related to the data analyzed in this paper. The first one is methodological and related to successful discriminant analysis based on NIR spectra and similar multivariate data. The second one is about the chemical interpretations of the information extracted from the NIR spectra to give good discrimination. The following discussion will hence be divided into two parts.

canonical variates. To compute the canonical variates, class-membership of the samples is required. Some regularization of the data is also necessary. In the case of Fig. 3, the canonical variates were obtained by first projecting the data onto 15 PCs (optimal classification) of a PCA on the entire dataset. For the canonical variates to represent a realistic grouping of the data, the number of PCA-components chosen was decided according to the cross-validated success rates of LDA/PCA in Fig. 2.

The different clusterings solve different classification problems related to our dataset. To extract the best possible features for the problem under consid-
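The computation of canonical variates from a set of PCA scores, as described for Fig. 3, amounts to solving the generalized eigenvalue problem $Bz = \lambda Wz$ of Section 2.1.2. A minimal NumPy sketch follows, under two assumptions that are not taken from the paper: the between-groups matrix is weighted by class size, and the problem is solved through a Cholesky factor of $W$ (which requires $W$ to be positive definite). The scores are synthetic.

```python
import numpy as np

def scatter_matrices(T, y):
    """Within (W) and between (B) sums-of-squares matrices of the scores T."""
    grand_mean = T.mean(axis=0)
    q = T.shape[1]
    W, B = np.zeros((q, q)), np.zeros((q, q))
    for k in np.unique(y):
        Tk = T[y == k]
        mk = Tk.mean(axis=0)
        W += (Tk - mk).T @ (Tk - mk)
        # Class-size weighting is an assumption; it makes B + W the total sum of squares.
        B += len(Tk) * np.outer(mk - grand_mean, mk - grand_mean)
    return W, B

def canonical_variates(W, B):
    """Solve B z = lambda W z for positive definite W via a Cholesky factor."""
    L = np.linalg.cholesky(W)                 # W = L L^t
    A = np.linalg.solve(L, B)                 # L^{-1} B
    M = np.linalg.solve(L, A.T).T             # L^{-1} B L^{-t}, symmetric
    lam, V = np.linalg.eigh((M + M.T) / 2)    # symmetrize against round-off
    order = np.argsort(lam)[::-1]             # largest B/W-ratio first
    lam, V = lam[order], V[:, order]
    Z = np.linalg.solve(L.T, V)               # back-transform: z = L^{-t} v
    return lam, Z

# Synthetic PCA scores (illustrative only): 3 classes, 10 observations each, 4 components.
rng = np.random.default_rng(2)
centers = np.array([[0, 0, 0, 0], [4, 0, 1, 0], [0, 4, 0, 1]], dtype=float)
y = np.repeat([0, 1, 2], 10)
T = centers[y] + rng.normal(0.0, 1.0, (30, 4))

W, B = scatter_matrices(T, y)
lam, Z = canonical_variates(W, B)
canonical_scores = (T - T.mean(axis=0)) @ Z[:, :2]   # at most K - 1 = 2 useful variates
```

With $K$ classes the matrix $B$ has rank at most $K - 1$, so only the first $K - 1$ eigenvalues can be non-zero; plotting the first two columns of `canonical_scores` would give a display of the kind the text refers to.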
It should be noted that, in general, PCs and canonical variates are not invariant under pre-transformations (like multiplicative scatter correction) of the data. This is also true for the variance- and scatter-curves. In particular, the variance-curve is likely to be altered in the location of its peaks. Also, the scatter-curve may be affected, but probably less seriously. As long as a reduction in total variance tends to be quite evenly distributed between the ‘within-samples’ and the ‘between-samples’ variance, peaks are not likely to be lost. However, if the reduction of variance is mostly due to the ‘within-samples’ variance, new peaks may appear. We do not pursue the subject any further here.

Another aspect not considered in this paper is non-chronological (chronological with respect to the amount of corresponding variance/covariance) selection of PCA- and PLS2-components. Such selection could be based on component-wise B/W-ratios estimated for each component, or on a stepwise canonical procedure as described in Ref. [6]. We decided not to consider such approaches in our study. According to the conclusions in Ref. [35] and the fact that very good classification results were obtained without any selection strategy, we do not expect component-selections to give significant improvement in classification performance over the methods actually applied in our analysis.

Altogether, the dataset presented seemed very well-suited for discrimination. Except for QDA, which starts to suffer from overparametrization due to the required estimation of individual covariance matrices, the cross-validated success rates of Fig. 2 increase quite consistently for the number of components considered. For LDA with PCA-scores, we confirmed the cross-validated success rate of 161/162 for as many as 130(!) PCs (inclusion of that many scores is of course not recommended).

For completeness, we also tried classification of the mayonnaise-data by the SIMCA-method implemented in the software-package ‘SCAN’. The routine in ‘SCAN’ follows the original description of SIMCA given by Wold [4], and does not consider leverage inside the PC-models. The best cross-validated success rate of 75% by SIMCA was obtained with seven PCs representing each class. Thus, at least this version of SIMCA seems to be limping due to the small number of samples in each class, and it is significantly outperformed by the other methods of our study.

Generally, one should expect data of a more complicated group-structure than reflected by our example. In such cases, LDA may be too crude, and a dimension-reduction followed by an application of RDA [1] might be preferable. Non-linear multivariate regression onto {0,1}-responses followed by LDA applied to the fitted values in the spirit of Hastie et al. [37] and Ripley [27] is another useful alternative. Also, the variable-selection approaches described by Wu and Massart [9] and the wavelength-selections suggested in this paper yield good starting-points for discriminant analysis via other flexible methods like NNs.

4.2. Chemical interpretations of the NIR spectra via analysis of pure vegetable oils

NIR is suited for measuring the composition of food constituents [38] such as the amount of fat, water, and protein. The spectrum is a result of light absorption from various functional groups, including –CH, –OH, –NH, and the wavelength of an absorption band often reveals the nature of the chemical bonds responsible for the absorption. Almost all absorption bands observed in NIR arise from overtones of hydrogenic stretching vibrations of functional groups or combinations involving stretching and bending modes of vibration of the groups. After obtaining a successful classification, it is both natural and often necessary to understand and interpret the discriminative information of the NIR spectra.

Vegetable oils contain different amounts of fatty acids which differ in chain length (addition of –CH2) and number and position of double bonds (Table 2). These differences are likely to be expressed in a number of different wavelength bands of the NIR spectra, making an exact interpretation difficult. However, information about the fatty acid composition in oils and fats seems to be concentrated in the range of 1600–2200 nm, and attempts have been made to determine the fatty acid composition by NIR methodology (see Ref. [22]).

The NIR spectra obtained from the 24×3 = 72 samples of the same six pure vegetable oils being used in the mayonnaise samples were similar to NIR spectra described in the literature [10,22,38]. The
Table 2
The average fatty acid composition of the oil types with standard deviations calculated on the basis of replicates

| Oil type | C14 | C16 | C16:1 | C18:0 | C18:1 | C18:2 | C18:3 | C20–22 |
|---|---|---|---|---|---|---|---|---|
| Soybean oil | 0.0764 ± 0.1046 | 12.7733 ± 0.3285 | 0.0918 ± 0.0058 | 3.9567 ± 0.2606 | 22.3984 ± 0.8813 | 54.2878 ± 0.3346 | 7.3189 ± 0.7410 | 0.3404 ± 0.1324 |
| Sunflower oil | 0.0658 ± 0.0022 | 6.2441 ± 0.0741 | 0.0799 ± 0.0064 | 4.1972 ± 0.1325 | 23.9858 ± 1.1621 | 64.8389 ± 1.2101 | 0.1428 ± 0.0618 | 0.2598 ± 0.1222 |
| Rapeseed oil | 0.0573 ± 0.0068 | 4.7344 ± 0.4788 | 0.2229 ± 0.0119 | 1.7589 ± 0.0523 | 59.5633 ± 0.2727 | 21.1522 ± 0.3419 | 10.0722 ± 0.5128 | 1.9246 ± 0.0608 |
| Olive oil | 0.0023 ± 0.0079 | 11.1625 ± 1.1979 | 0.7531 ± 0.1636 | 2.9558 ± 0.5414 | 73.5224 ± 1.2924 | 9.7883 ± 0.10129 | 0.6623 ± 0.0638 | 0.7872 ± 0.0722 |
| Corn oil | 0.0409 ± 0.0076 | 10.7617 ± 0.2062 | 0.1029 ± 0.0042 | 1.8500 ± 0.0477 | 28.3832 ± 1.2512 | 56.8917 ± 1.4124 | 1.0005 ± 0.3436 | 0.2641 ± 0.1542 |
| Grapeseed oil | 0.0430 ± 0.0016 | 6.9100 ± 0.1138 | 0.0957 ± 0.0058 | 3.7756 ± 0.0682 | 17.6778 ± 0.4992 | 70.5522 ± 0.2414 | 0.5359 ± 0.3439 | 0.2859 ± 0.0701 |
5. Conclusions
ful in separating mayonnaise samples according to oil type and fatty acid composition. The results are in agreement with recent studies showing that PCA and discriminant analysis can be used to identify oils (see Refs. [18,21]). It may therefore be possible to utilize NIR for the determination of oil type in full fat mayonnaise as well as detection of minor variations in oil content and variations in the content of individual fatty acids. Furthermore, we have seen that for our data, LDA is consistently superior to the regression approaches based on PLS2 and PCR with {0,1}-responses. However, our experience is that PLS2 can be very useful for extraction of subspaces where both QDA and LDA applied to the projected data yield good discrimination. Wavelength-selection corresponding to local maxima of the scatter- and variance-curves is demonstrated both as a useful data-reduction technique, and as a tool for identification of wavelengths containing discriminative information.

Acknowledgements

This study was completed as a part of the Nordic R&D Programme for the Food Industry, Nordfood project no. P93131, ‘Emulsion quality’, with financial support from the Nordic Industrial Fund and the Norwegian Food-company Mills DA. We are grateful to Grethe Enersen of MATFORSK for measuring the NIR spectra and Palsgaard Industri of Denmark for production of the mayonnaise samples.

References

[1] J.H. Friedman, JASA 84 (1989) 165–175.
[2] I. Frank, Chemometrics and Intelligent Laboratory Systems 4 (1988) 215–222.
[3] I. Frank, J.H. Friedman, Journal of Chemometrics 3 (1989) 463–475.
[4] S. Wold, Pattern Recognition 8 (1976) 127–139.
[5] W.J. Krzanowski, P. Jonathan, W.V. McCarty, M.R. Thomas, Applied Statistics 44 (1995) 101–115.
[6] D. Bertrand, P. Courcoux, J.-C. Autran, P. Robert, Journal of Chemometrics 4 (1990) 411–427.
[7] Y. Mallet, D. Coomans, O. de Vel, Chemometrics and Intelligent Laboratory Systems 35 (1996) 157–173.
[8] W. Wu, B. Walczak, D.L. Massart, K.A. Prebble, I.R. Last, Analytica Chimica Acta 315 (1995) 243–255.
[9] W. Wu, D.L. Massart, Chemometrics and Intelligent Laboratory Systems 35 (1996) 127–135.
[10] P.G. Osborne, T. Fearn, Near Infrared Spectroscopy in Food Analysis, Longman, Essex, 1986.
[11] D.G. Evans, C.N.G. Scotter, L.Z. Day, M.N. Hall, Journal of Near Infrared Spectroscopy 1 (1993) 33–44.
[12] M.R. Ellekjær, K.I. Hildrum, T. Næs, T. Isaksson, Journal of Near Infrared Spectroscopy 1 (1993) 65–75.
[13] T. Næs, K.I. Hildrum, Applied Spectroscopy 51 (3) (1997) 350–357.
[14] B.G. Osborne, B. Mertens, M. Thompson, T. Fearn, Journal of Near Infrared Spectroscopy 1 (1993) 77–84.
[15] W.J. Krzanowski, Journal of Near Infrared Spectroscopy 3 (1995) 111–117.
[16] S. Husain, K. Sita Devi, D. Krishna, P.J. Reddy, Chemometrics and Intelligent Laboratory Systems 35 (1996) 117–126.
[17] Y.W. Lai, E.K. Kemsley, R.H. Wilson, Food Chemistry 53 (1995) 95–98.
[18] K.M. Bewig, A.D. Clarke, C. Roberts, N. Unklesbay, JAOCS 71 (2) (1994) 195–200.
[19] M.F. Devaux, P. Robert, A. Qannari, M. Safar, E. Vigneau, Applied Spectroscopy 47 (7) (1993) 1024–1029.
[20] D.B. Dahlberg, S.M. Lee, S.J. Wenger, J.A. Vargo, Applied Spectroscopy 51 (8) (1997) 1118–1124.
[21] T. Sato, JAOCS 71 (3) (1994) 293–299.
[22] T. Sato, S. Kawano, M. Iwamoto, JAOCS 68 (11) (1991) 827–833.
[23] H. Kamishikiryo, K. Hasegawa, H. Takamura, T. Matoba, Journal of Food Science 57 (5) (1992) 1239–1240.
[24] A.J. Boot, A.J. Speek, Journal of AOAC International 77 (5) (1994) 1184–1189.
[25] I.J. Wesley, R.J. Barnes, A.E.J. McGill, JAOCS 72 (3) (1995) 289–292.
[26] M.A. Czarnecki, Y. Liu, Y. Ozaki, M. Suzuki, M. Iwahashi, Applied Spectroscopy 47 (12) (1993) 2162–2168.
[27] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge, 1996.
[28] T. Hastie, R. Tibshirani, A. Buja, Journal of the American Statistical Association 89 (1994) 1255–1270.
[29] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis, Wiley, 1973.
[30] K.V. Mardia, J.T. Kent, J.M. Bibby, Multivariate Analysis, Academic Press, 1979.
[31] G.J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, Wiley, 1992.
[32] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford, 1995.
[33] B. Flury, M.J. Schmidt, A. Natayanan, Journal of Classification 11 (1994) 101–120.
[34] J.D. Banfield, A.E. Raftery, Biometrics 49 (1993) 803–821.
[35] T. Næs, H. Martens, Journal of Chemometrics 2 (1988) 155–167.
[36] L. Ståhle, S. Wold, Journal of Chemometrics 1 (1987) 185–196.
[37] T. Hastie, A. Buja, R. Tibshirani, Annals of Statistics 23 (1995) 73–102.
[38] P. Williams, K. Norris (Eds.), Near-Infrared Technology in the Agricultural and Food Industries, American Association of Cereal Chemists, St. Paul, 1990.