Spectra
M.V. Carriegos (Universidad de León) and three co-authors
July 2, 2012 14:52 WSPC/INSTRUCTION FILE
MontseHectorMaiteMiguelIJMPB(Rev4a)
[Fig. 1: spectrum plot; vertical axis 3000–14000, horizontal axis 0–3500.]
Our methods are focused on the analysis of perturbed samples (see Fig. 1),
because one rarely finds a perfect trace (see Fig. 2) of a well-known certified explosive.
[Fig. 2: spectrum plot; vertical axis 0–3000, horizontal axis 0–3500.]
On the other hand, our samples show minor calibration uncertainties, which may
be tolerated and corrected by our classification systems.
or mixtures and 212 blank substances). These observations are highly correlated.
Thus our first objective is to filter out redundant information by selecting
a subset of uncorrelated data that nevertheless describes almost all of the
information and thus allows the sample to be classified within the set of spectra.
This is performed using Principal Component Analysis (PCA).
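The dimension-reduction step can be sketched as follows: a minimal PCA via the singular value decomposition. The toy data and the number of retained components here are illustrative assumptions, not the paper's actual spectra:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project rows of X onto the leading principal components.

    X: (n_samples, n_channels) matrix, one spectrum per row.
    Returns the scores, i.e. the new uncorrelated variables.
    """
    Xc = X - X.mean(axis=0)              # center each channel
    # SVD of the centered data: rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T      # scores on the top components

# Toy data: 6 "spectra" with 4 highly correlated channels
rng = np.random.default_rng(0)
base = rng.normal(size=(6, 1))
X = np.hstack([base + 0.01 * rng.normal(size=(6, 1)) for _ in range(4)])
scores = pca_reduce(X, 2)
print(scores.shape)  # (6, 2)
```

Because the score columns are orthogonal by construction, they are (numerically) uncorrelated, which is exactly the redundancy-removal property used in the text.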
Next we determine whether the new variables obtained from the PCA are able to
discriminate between explosive and blank patterns; that is to say, whether the
new variables suffice to simulate the characteristic function of the set of
explosives. We perform parametric tests for independent samples to determine
which components have significantly different means. Of course this step depends
upon the set of samples of explosives and the set of samples of blank substances;
hence the prior choice of both sets is critical.
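The component-selection step can be illustrated with a pooled-variance two-sample t statistic; the scores below are hypothetical, and the critical value depends on the chosen significance level:

```python
import math

def two_sample_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

# Hypothetical scores of one principal component, explosives vs. blanks
explosives = [2.1, 1.8, 2.4, 2.0, 1.9]
blanks = [0.9, 1.1, 0.8, 1.2, 1.0]
t = two_sample_t(explosives, blanks)
# keep the component if |t| exceeds the critical value at the chosen alpha
# (for df = 8 and alpha = 0.05 two-sided, that critical value is 2.306)
print(round(t, 2))  # 8.33
```

A component passing this test carries class-discriminating information and is retained; the others are dropped.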
Once significant principal components are selected for both sets (explosives and
blanks), a logistic regression is performed. Hence the probability of successful
classification is obtained, as well as the probabilities of type I and type II errors.
On the other hand, the correlations between the original variables and these
principal components are nearly zero and not statistically significant (p > 0.05).
Hence we choose the seven principal components as the output of the PCA.
Our goal is to classify a given sample as explosive or not. We reduce the above
set of principal components by selecting those variables with significant
p-values in a two-sample t-test. Only 4 principal components show statistically
significantly different means. This set of principal components is used in a
logistic model as predictor variables (p < 0.01). We also reduce the sample size
and balance the numbers of blank and non-blank (explosive or mixture) samples.
This latter reduction is performed in order to obtain better logistic regression
results.18,19
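A minimal logistic regression fit can be written as plain gradient ascent on the log-likelihood; this is a sketch rather than the paper's actual statistical fit, and the synthetic component scores below stand in for the real data:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Logistic regression by gradient ascent on the log-likelihood (sketch)."""
    X1 = np.hstack([np.ones((len(X), 1)), X])    # add intercept column
    w = np.zeros(X1.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X1 @ w))        # predicted P(y = 1)
        w += lr * X1.T @ (y - p) / len(y)        # log-likelihood gradient step
    return w

def predict(w, X, cutoff=0.5):
    """Classify as 1 (explosive) when P(y = 1) reaches the critical value."""
    X1 = np.hstack([np.ones((len(X), 1)), X])
    return (1.0 / (1.0 + np.exp(-X1 @ w)) >= cutoff).astype(int)

# Hypothetical, balanced component scores: 0 = blank, 1 = explosive
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
w = fit_logistic(X, y)
acc = (predict(w, X) == y).mean()
```

Balancing the two classes, as the text prescribes, keeps the fitted intercept from being dominated by the majority class.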
We assign the value 0 to a substance classified as blank (non-explosive) and 1
to a substance classified as explosive. Performing a logistic regression with
critical classification value p = 0.5 yields the following results.
The following a priori probabilities are obtained from the above data:
sensitivity = p(1|E) = 65.1%, specificity = p(0|B) = 78.8%. Hence the global
probability of successful classification is 71.93%.
The error probabilities are easily obtained: the probability of a false negative
(type II error) is 0.349 = 1 − sensitivity, whereas the probability of a false
positive (type I error) is 0.212 = 1 − specificity.
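These quantities follow directly from a confusion matrix; a small sketch with hypothetical, near-balanced counts (not the paper's raw data):

```python
def rates(tp, fn, fp, tn):
    """Sensitivity, specificity and global success rate from a confusion matrix."""
    sens = tp / (tp + fn)                    # P(classified 1 | explosive)
    spec = tn / (tn + fp)                    # P(classified 0 | blank)
    acc = (tp + tn) / (tp + fn + fp + tn)    # global success probability
    return sens, spec, acc

# Hypothetical counts: 100 explosives, 100 blanks
sens, spec, acc = rates(tp=65, fn=35, fp=21, tn=79)
print(sens, spec, acc)  # 0.65 0.79 0.72
```

With balanced classes, the global success rate is simply the average of sensitivity and specificity, which is why balancing the sample matters for interpreting the 71.93% figure.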
Fig. 4. ROC curve relating specificity vs. sensitivity of the procedure for noisy (non-corrected) data (spectra).
Raising the critical classification value decreases the probability of a false
positive, but increases the probability of a false negative; this is shown, for
the noisy case, in Fig. 4 by means of the ROC curve relating specificity on the
X-axis to sensitivity on the Y-axis.
A compromise is necessary, depending on the real application we are targeting.
Minimizing false negatives (i.e. maximizing sensitivity) is what applies in our
case of detecting explosive substances. Next we write down explicit pairs
(sensitivity, specificity) for several critical classification values p. Note
that these values are obtained from the ROC curve, which is parametrized by p:
critical value (p)   sensitivity   specificity   false positive
0.31                 95%           22%           78%
0.36                 90%           31%           69%
0.5                  65%           79%           21%
0.62                 40%           90%           10%
0.8                  13%           95%           5%
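Such (sensitivity, specificity) pairs can be generated by sweeping the critical value over the fitted probabilities; a self-contained sketch with hypothetical scores and labels:

```python
def roc_points(scores, labels, cutoffs):
    """(sensitivity, specificity) at each critical classification value."""
    pts = []
    pos = sum(labels)
    neg = len(labels) - pos
    for c in cutoffs:
        pred = [1 if s >= c else 0 for s in scores]
        tp = sum(p == 1 and l == 1 for p, l in zip(pred, labels))
        tn = sum(p == 0 and l == 0 for p, l in zip(pred, labels))
        pts.append((tp / pos, tn / neg))
    return pts

# Hypothetical fitted probabilities for 5 explosives (1) and 5 blanks (0)
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.4, 0.2, 0.1, 0.1]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
pts = roc_points(scores, labels, [0.3, 0.5, 0.7])
print(pts)  # [(1.0, 0.6), (0.6, 0.8), (0.6, 1.0)]
```

As the cutoff rises, sensitivity can only fall and specificity can only rise, which is the monotone trade-off visible in the table above.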
[Figure: spectrum plot; vertical axis 3000–11000, horizontal axis 0–3500.]
Many different parameters were tried in order to determine which critical
properties of the signal the expert agent focuses on. The best results are
obtained by taking into account:
(1) Where are the maxima of the slope of the signal located? (µ)
(2) What is the maximum of the intensity? (ι)
(3) Once points (1) and (2) are solved, how significant are the solutions? (σ)
Once again we assign a fuzzy value µ, ι, σ ∈ (0, 1) to each of the above
solutions. We combine the values α, µ, ι, σ in order to obtain a fuzzy value (or
probability) π = π(α, µ, ι, σ) ∈ (0, 1) of being a peak. Finally we set a
critical value π0 and declare a peak if π(α, µ, ι, σ) ≥ π0, and not a peak
otherwise.
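A sketch of this decision rule follows; the weighted geometric mean is only one plausible choice for the combination π, since the text does not make π explicit:

```python
def peak_probability(alpha, mu, iota, sigma, weights=(0.25, 0.25, 0.25, 0.25)):
    """Combine the four fuzzy values into pi in (0, 1).

    Weighted geometric mean: an assumed combination rule, chosen because
    it is monotone in each input and stays in (0, 1).
    """
    pi = 1.0
    for v, w in zip((alpha, mu, iota, sigma), weights):
        pi *= v ** w
    return pi

def is_peak(alpha, mu, iota, sigma, pi0=0.5):
    """Declare a peak when pi reaches the critical value pi0."""
    return peak_probability(alpha, mu, iota, sigma) >= pi0

print(is_peak(0.9, 0.9, 0.9, 0.9))  # True
print(is_peak(0.1, 0.1, 0.1, 0.1))  # False
```

Any monotone combination rule would fit the description equally well; the weights and π0 play the role of the tunable constants discussed next.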
As usual in fuzzy reasoning, it is necessary to fix several constants in order
to translate notions such as big slope, equal intensities, close peaks, most
peaks agree, and so on into fuzzy variables. By modifying these constants we
obtain more or less plasticity for our system; hence more (or fewer) samples are
classified, and therefore more (or fewer) false positives/negatives are
obtained. Once again we need a compromise between fluency and accuracy for our
system.
A nice property of this method is that the system is proved to be robust; that
is to say, small perturbations of the parameters produce small differences in
the results.
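This kind of robustness can be checked empirically by perturbing a parameter and counting how many decisions flip; the sketch below perturbs a single decision threshold on hypothetical scores:

```python
import random

def classify(scores, cutoff):
    """Threshold rule standing in for the full fuzzy decision."""
    return [s >= cutoff for s in scores]

def max_flip_fraction(scores, cutoff, eps, trials=200):
    """Largest fraction of decisions that change under perturbations |d| <= eps."""
    random.seed(0)
    base = classify(scores, cutoff)
    worst = 0.0
    for _ in range(trials):
        d = random.uniform(-eps, eps)
        flips = sum(a != b for a, b in zip(base, classify(scores, cutoff + d)))
        worst = max(worst, flips / len(scores))
    return worst

scores = [0.05, 0.2, 0.35, 0.6, 0.75, 0.9]
# decisions are stable: no score lies within 0.05 of the cutoff 0.5
print(max_flip_fraction(scores, 0.5, 0.05))  # 0.0
```

Robustness in the sense of the text means this flip fraction stays small for small eps, as it does here.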
Acknowledgements
This research was developed in collaboration with Indra Sistemas, León, within
the SEDUCE project framework, and was partially supported by CDTI (MECC, Spain),
MCyT (Spain), and JCyL (Spain).
References
1. J. E. Everett et al., in The Practical Handbook of Genetic Algorithms: Applications, 2nd
ed., ed. L. Chambers (Chapman & Hall/CRC, Boca Raton, USA, 2001).
2. B. Gérard and D. Thierry, “Neural networks for process control and optimization: two
industrial applications”, ISA Transactions 42, 2003, pp. 39–51.
3. M. Iglesias, B. Naudts, A. Verschoren and C. Vidal, Foundations of generic opti-
mization. Volume 1: A combinatorial approach to epistasis (Springer, Dordrecht, The
Netherlands, 2005).
4. M. R. Khoshayanda, H. Abdollahib, M. Shariatpanahia, A. Saadatfarda and A. Mo-
hammadi, “Simultaneous spectrophotometric determination of paracetamol, ibuprofen
and caffeine in pharmaceuticals by chemometric methods”, Spectrochim Acta A Mol
Biomol Spectrosc 70, pp. 491–499.
5. T. Kohonen, S. Kaski, K. Lagus, J. Salojärvi, J. Honkela, V. Paatero and A. Saarela,
“Self Organization of a Massive Document Collection”, IEEE Transactions on Neural
Networks 11(3), 2000, pp. 574–585.
6. C. Krafft, G. Steiner, C. Beleites and R. Salzer, “Disease recognition by infrared and
Raman spectroscopy”, J. Biophoton 2(1-2), 2009, pp. 13–28.
7. M. Lee, “Evolution of behaviours in autonomous robot using artificial neural network
and genetic algorithm”, Information Sciences 155, 2003, pp. 43–60.
8. C. G. Looney, Pattern Recognition Using Neural Networks: Theory and Applications for
Engineers and Scientists (Oxford University Press, USA, 1997).
9. Q. Meng and M. Lee, “Error-driven active learning in growing radial basis function net-
works for early robot learning”, Neurocomputing 71(7-9), 2008, pp. 1449–1461.
10. Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 2nd ed.
(Springer-Verlag, New York, 1994).
11. W. F. Pearman and A. W. Fountain III, “Classification of Chemical and Biological
Warfare Agent Simulants by Surface-Enhanced Raman Spectroscopy and Multivariate
Statistical Techniques”, Applied Spectroscopy 60(4), 2006, pp. 356–365.
12. Y. Roggo, P. Chalus, L. Maurer, C. Lema-Martinez, A. Edmond and N. Jent, “A
review of near infrared spectroscopy and chemometrics in pharmaceutical technologies”,
Journal of Pharmaceutical and Biomedical Analysis 44, 2007, pp. 683–700.
13. P. M. Talaván and J. Yáñez, “Parameter setting of the Hopfield networks applied to
TSP”, Journal of Neural Networks 15, 2002, pp. 363–373.
14. M. Urbano-Cuadrado, M. D. Luque-de-Castro and M. A. Gómez-Nieto, “Study of
spectral analytical data using fingerprints and scaled similarity measurements”, Anal
Bioanal Chem 381, 2005, pp. 953–963.
15. I. Viñuela and I. M. Galván-León, Redes de neuronas artificiales. Un enfoque práctico
(Prentice Hall, Madrid, 2004).
16. J. Weng, “On developmental mental architectures”, Neurocomputing 70, 2007,
pp. 2303–2323.
17. L. Yizeng, W. Hailong, S. Guoli, J. Jianhui, L. Sheng and Y. Ruqin, “Aspects of recent
developments in analytical chemometrics”, Science in China: Series B Chemistry 49(3),
2006, pp. 193–203.
18. A. Agresti, Categorical Data Analysis (Wiley-Interscience, New York, 2002).
19. D. W. Hosmer and S. Lemeshow, Applied Logistic Regression (Wiley, New York, 2000).