Stellar Data Classification Using SVM With Wavelet Transformation
where the λ_i are the Lagrangian multipliers and are nonzero only for the support vectors. Thus the hyperplane parameters (w, b) and the classifier function f(x; w, b) can be computed by the optimization process. The decision function is obtained as follows:

    f(x) = sgn{ Σ_{i=1}^{l} y_i λ_i (x · x_i) + b }.   (8)
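To make equation (8) concrete, the following minimal Python sketch evaluates the decision function from a given dual solution; the support vectors, labels, multipliers, and bias below are hypothetical values standing in for the result of the optimization.

    import numpy as np

    # Hypothetical dual solution standing in for the output of the optimization:
    # support vectors x_i, labels y_i in {-1, +1}, multipliers lambda_i, bias b.
    support_vectors = np.array([[0.2, 1.1], [1.5, 0.3], [0.9, 0.8]])
    y = np.array([1.0, -1.0, 1.0])
    lam = np.array([0.7, 0.4, 0.3])
    b = -0.1

    def decision(x):
        """Equation (8): f(x) = sgn(sum_i y_i * lambda_i * (x . x_i) + b)."""
        return np.sign(np.sum(y * lam * (support_vectors @ x)) + b)

    print(decision(np.array([1.0, 1.0])))   # prints +1.0 or -1.0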
In cases where a linear boundary in the input space is not enough to separate the two classes properly, it is possible to create a hyperplane that allows linear separation in a higher dimension. The method consists in projecting the data into a higher-dimensional space where they are considered to become linearly separable. The transformation into the higher-dimensional feature space is relatively computation-intensive. A kernel can be used to perform this transformation and the dot product in a single step, provided the transformation can be replaced by an equivalent kernel function. This reduces the computational load while retaining the effect of the higher-dimensional transformation. The kernel function K(x_i, x_j) is defined as follows:

    K(x_i, x_j) = Φ(x_i) · Φ(x_j).   (9)

There are some commonly used kernels, for example the radial basis kernel:

    K(x_i, x_j) = exp(−‖x_i − x_j‖² / 2σ²).
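As an illustration of the kernel trick, the radial basis kernel can be computed pairwise between two sample sets without ever constructing Φ explicitly; a minimal NumPy sketch, where σ is a free parameter:

    import numpy as np

    def rbf_kernel(X1, X2, sigma=1.0):
        """Pairwise K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
        # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed for all pairs at once
        sq = (np.sum(X1**2, axis=1)[:, None]
              + np.sum(X2**2, axis=1)[None, :]
              - 2.0 * X1 @ X2.T)
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))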
In general terms, a stellar spectrum consists of a black body continuum light distribution, distorted by the interstellar absorption and re-emission of light and by the presence or absence of absorption and emission lines and molecular bands [12]. Stellar spectra of the seven main types are shown in Figure 2.

Figure 2: Typical spectra of the seven main stellar types

Before analyzing the spectra, all of them are scaled to the flux at wavelength 545.0 nm in order to normalize the flux values.
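This normalization is a one-line operation once the wavelength grid is known; a minimal sketch, where the wavelength array and its sampling are assumptions:

    import numpy as np

    def normalize_flux(wavelength_nm, flux, ref_nm=545.0):
        """Scale a spectrum so that its flux at the reference wavelength is 1."""
        return flux / np.interp(ref_nm, wavelength_nm, flux)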
ψ(t) can be designed such that the set in equation (2) is orthonormal and complete, which implies that a stable and unique reconstruction exists and is given by

    f(t) = Σ_{j,k} ⟨f, ψ_{j,k}⟩ ψ_{j,k}(t).

Here we propose a new wavelet de-noising method for stellar spectra based on a threshold technique.
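Wavelet threshold de-noising of this kind generally follows a decompose-threshold-reconstruct pattern; the sketch below (using the PyWavelets package) shows that generic pattern, with the wavelet choice ('db4'), the decomposition depth, and the universal soft threshold as assumptions rather than the authors' exact design.

    import numpy as np
    import pywt

    def denoise(flux, wavelet="db4", level=7):
        """Decompose, soft-threshold the detail coefficients, reconstruct."""
        coeffs = pywt.wavedec(flux, wavelet, level=level)
        # Noise scale estimated from the finest detail band (Donoho-Johnstone),
        # then the universal threshold sigma * sqrt(2 log N).
        sigma = np.median(np.abs(coeffs[-1])) / 0.6745
        thr = sigma * np.sqrt(2.0 * np.log(len(flux)))
        shrunk = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
        return pywt.waverec(shrunk, wavelet)[: len(flux)]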
An example of a de-noising result using the proposed method is shown in Figure 3.

The wavelet coefficients at different scales are shown in Figure 5. Wavelet coefficients at scales 5, 6 and 7 are extracted and linked together to form a feature vector, which will be used for classification in the following section.
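A sketch of assembling such a feature vector with PyWavelets follows; how the paper's scales 5, 6 and 7 map onto the coefficient list depends on the decomposition depth and indexing convention, so the indices chosen here are an assumption.

    import numpy as np
    import pywt

    def wavelet_features(flux, wavelet="db4", level=7):
        """Link the detail coefficients of three chosen scales into one vector."""
        coeffs = pywt.wavedec(flux, wavelet, level=level)
        # coeffs = [cA7, cD7, cD6, cD5, ..., cD1]; take the three coarsest
        # detail bands as a stand-in for the paper's scales 5, 6 and 7.
        return np.concatenate(coeffs[1:4])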
3.4 Classification

The bootstrap technique [2] is applied in our experiments. The 161 samples are divided into two parts: 10 independent random samples drawn from each class are used to train the SVM classifier, and the remaining samples are used as test samples to calculate the CCR. Each experiment is repeated 25 times with a different random partition, and the mean and standard deviation of the classification accuracy are reported.

Table 2: Mean and standard deviation of the classification accuracy of SVM, PCA+SVM and wavelet+SVM

    Methods    CCR      Std
    SVM        81.66%   3.75
    DN+SVM     93.26%   3.08
    FS+SVM     90.15%   2.98
    FW+SVM     88.81%   2.52
    PCA+SVM    80.34%   2.90
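A minimal sketch of this evaluation protocol, assuming scikit-learn and hypothetical arrays X (features) and y (labels):

    import numpy as np
    from sklearn.svm import SVC

    def evaluate(X, y, n_train=10, n_repeats=25, seed=0):
        """Mean and std of the CCR over repeated random partitions."""
        rng = np.random.default_rng(seed)
        ccrs = []
        for _ in range(n_repeats):
            train = []
            for c in np.unique(y):                      # 10 samples per class
                idx = np.flatnonzero(y == c)
                train.extend(rng.choice(idx, size=n_train, replace=False))
            train = np.asarray(train)
            test = np.setdiff1d(np.arange(len(y)), train)
            clf = SVC(kernel="rbf").fit(X[train], y[train])
            ccrs.append(clf.score(X[test], y[test]))    # CCR on held-out samples
        return float(np.mean(ccrs)), float(np.std(ccrs))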
The SVM is designed to solve two-class problems. For multi-class stellar spectra, a binary tree structure is proposed to solve the multi-class recognition problem. Usually two approaches can be used for this purpose [16, 10]: 1) the one-against-all strategy, to classify between each class and all the remaining ones; 2) the one-against-one strategy, to classify between each pair. We adopt the latter for our multi-class stellar spectral classification: although more SVMs need to be applied, the computing time decreases, because the complexity of the algorithm depends strongly on the number of training samples.
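For the seven stellar classes, one-against-one trains 7 × 6 / 2 = 21 pairwise SVMs, each fitted on the samples of only two classes, which is why training stays cheap even though more machines are built. A sketch with scikit-learn's pairwise wrapper on toy stand-in data:

    import numpy as np
    from sklearn.multiclass import OneVsOneClassifier
    from sklearn.svm import SVC

    # Toy stand-in for the 7-class data: seven Gaussian blobs in 10 dimensions.
    rng = np.random.default_rng(0)
    X = np.concatenate([rng.normal(c, 1.0, size=(20, 10)) for c in range(7)])
    y = np.repeat(np.arange(7), 20)

    # 7 * 6 / 2 = 21 pairwise SVMs; prediction is a vote over the 21 machines.
    ovo = OneVsOneClassifier(SVC(kernel="rbf")).fit(X, y)
    print(ovo.score(X, y))

Note that scikit-learn's SVC already uses the one-against-one decomposition internally for multi-class problems; the explicit wrapper mainly makes the strategy visible.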
Firstly in the experiment, the original data are directly used as the input of an SVM. Table 2 shows that direct classification using the SVM achieves a CCR of 81.66%.

Due to the very low SNR of the spectral data, the de-noised spectra obtained by the proposed de-noising method are chosen as the input of an SVM (denoted DN+SVM). Table 2 indicates that this method achieves a satisfying CCR of 93.26% and a smaller standard deviation than direct classification using the SVM. According to the hypothesis test (t-test) applied, the mean classification accuracy of the DN+SVM method on the validation set is significantly (α = 0.05) larger than that of direct classification using the SVM (p-value = 1.97 × 10^…).

The eigenvalue distribution (Figure 7) shows that the first 10 PCs have only a 0.43% reconstruction error, so we choose them to define a 10-dimensional subspace and map the spectra onto it.

Figure 6: Distribution of all the spectra in the first three principal components space
The feature spectra extracted with the method described in Section 3.2 are also used as the input of an SVM (denoted FS+SVM). A good CCR of 90.15% is obtained too. Because we discard the continuum, which can sometimes support the recognition, the CCR of this method is a little lower than that of DN+SVM.

Then the features in the wavelet domain are used as the input of an SVM (denoted FW+SVM). The results show that wavelet coefficient features can be combined with the SVM. The CCR is a little lower than that of DN+SVM, but the computational efficiency of this method is greatly improved by omitting the reconstruction process.

From Table 2, we can see that the performance of PCA+SVM is not very good. It is even worse than direct classification with the SVM. According to the hypothesis test (t-test) applied, the mean classification accuracy of the PCA+SVM method on the validation set is significantly (α = 0.05) smaller than that of the DN+SVM method (p-value = 2.01 × 10^…). The reason is that the SVM can simulate a nonlinear projection which maps linearly inseparable data into a higher-dimensional space where the classes become linearly separable, so the dimensionality of the input data has little influence on the SVM.
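The PCA+SVM baseline in Table 2 can be sketched as follows; the arrays are random stand-ins for the 161 spectra, and the 10-component subspace mirrors the eigenvalue analysis above (on real spectra the printed reconstruction error would be the quoted 0.43%, not the value this toy data gives).

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(161, 1000))        # stand-in for the 161 flux vectors
    y = rng.integers(0, 7, size=161)        # stand-in labels for the 7 classes

    # Reconstruction error of the 10-PC subspace = variance left outside it.
    pca = PCA(n_components=10).fit(X)
    print(1.0 - pca.explained_variance_ratio_.sum())

    # PCA projection followed by an SVM, as in the PCA+SVM row of Table 2.
    pca_svm = make_pipeline(PCA(n_components=10), SVC(kernel="rbf")).fit(X, y)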
Figure 7: Eigenvalues in decreasing order

The experimental results show that the proposed technique is robust and computationally efficient. It is very suitable for automated recognition of voluminous spectra with low SNR.

Acknowledgements

The research work described in this paper was fully supported by a grant from the National Natural Science Foundation of China (Project No. 60275002) and by the National High Technology Research and Development Program of China (863 Program, Project No. 2003AA133060).

References

[1] A. Webb, Statistical Pattern Recognition, Oxford University Press, London, 1999.

[8] I. T. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 1986.

[9] J. H. Friedman, "Regularized Discriminant Analysis", J. Amer. Statist. Assoc., Vol. 84, pp. 165-175, 1989.

[12] M. V. Zombeck, Handbook of Astronomy and Astrophysics, 2nd ed., Cambridge University Press, London, 1990.

[13] S. Aeberhard, D. Coomans, O. D. Vel, "Comparative Analysis of Statistical Pattern Recognition Methods in High Dimensional Settings", Pattern Recognition, Vol. 27, pp. 1065-1077, 1994.

[14] S. Dumais, "Using SVMs for Text Categorization", IEEE Intelligent Systems, Vol. 13, No. 4, pp. 21-23, 1998.

[15] S. G. Mallat, "A Theory for Multiresolution Signal Decomposition: the Wavelet Representation", IEEE Trans. Pattern Anal. Mach. Intell., Vol. 11, No. 7, pp. 674-693, 1989.

[16] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.