
2004 IEEE International Conference on Systems, Man and Cybernetics

Stellar Data Classification Using SVM with Wavelet Transformation*
Ping Guo, Fei Xing and YuGang Jiang
Department of Computer Science
Beijing Normal University, Beijing, 100875, P.R.China
[email protected]
* 0-7803-8566-7/04/$10.00 © 2004 IEEE.

Abstract - This paper presents a novel stellar spectra recognition technique based on the wavelet transform and support vector machines. Due to the very low signal-to-noise ratio of real world spectral data, a de-noising method for stellar spectra is proposed using the wavelet transform together with the traditional threshold technique. Support vector machines are then adopted to complete the classification. Features in the spatial and wavelet domains are extracted and used as input to the support vector machines. Experimental results show that our technique is robust to noise and computationally efficient. The correct classification rate of the proposed methods is much higher than that obtained using either a support vector machine alone or principal component analysis feature extraction.

Keywords: Wavelet Transform, Support Vector Machine, Principal Component Analysis, Stellar Spectra.

1 Introduction

Spectral recognition has always been an important task in astronomy and astrophysics. The study of the distribution of spectral types and the analysis of spectral data can help us understand the temporal change of the physical conditions of stars from a statistical point of view and, therefore, learn about their evolution. Traditional manual recognition by experts is rather subjective and time-consuming, especially when the number of spectra is very high and a large amount of human effort is required. It is therefore advisable to optimize the procedure by means of an automatic, fast and efficient computational technique.

To automatically recognize stellar spectra, we should first build the classifier with training samples. There are many classification techniques in this research field; among them, discriminant analysis is one of the supervised learning approaches to classifier building. Quadratic discriminant analysis (QDA) is widely used when sufficient training samples can be supplied [1, 13]. Unfortunately, training samples are often hard to acquire, and the dimensionality of spectral data is extremely high, so the estimated covariance matrix becomes singular. Linear discriminant analysis (LDA) can be used as one kind of regularization if the total number of samples is larger than the dimension of the variables: in LDA, the covariance matrix is substituted by a common covariance matrix. However, in the case of small sample sizes, the common covariance matrix is also singular. To address the small-training-sample, high-dimension setting, regularized discriminant analysis (RDA) [9] can be applied. RDA adds the identity matrix as a regularization term to stabilize the matrix estimation, but parameter optimization of RDA is time consuming. The Support Vector Machine (SVM) [3] is a newer technique for data classification; it has been used successfully in many object recognition applications [6, 14]. The SVM is known to generalize well even in high dimensional spaces under small training sample conditions. This characteristic is very appropriate for stellar spectral classification, where such conditions are typically encountered.

Stellar spectra are extremely noisy; if we use the spectral lines directly as input to an SVM without pre-processing, the classification results are poor. In order to obtain a high correct classification rate (CCR), one possible approach is to use the wavelet transform.

The wavelet transform provides a powerful and versatile framework for astronomical signal processing. Each scale of resolution may pick up different types of structure in the signals. It is particularly appropriate for spectral data, since the structures we want to recognize have different scales and resolutions. In order to reduce the noise, a wavelet de-noising method for stellar spectra is proposed. Feature extraction methods in the wavelet domain and the spatial domain are also investigated. Then we adopt an SVM to complete the classification.

The organization of this paper is as follows: In Section 2, the wavelet transform and the SVM are introduced. The experimental strategies are described and the results and comparisons are given in Section 3. The conclusions are presented in the last section.

2 Background

2.1 Wavelet Transform

The wavelet transform is a technique for analyzing signals. It was developed as an alternative to the short time Fourier
transform (STFT) to overcome problems related to its frequency and time resolution properties. The disadvantage of the STFT is its inadequacy for analyzing multi-scale signals, due to its fixed resolution in the space and frequency domains; the solution is to have a flexible time-frequency resolution by trading resolution in time against resolution in frequency. That is, the window should narrow (zoom in) at high center frequency (small scale) to give better accuracy, and widen (zoom out) at low frequency (large scale) to give accurate frequency information. This is exactly what is achieved by the wavelet transform, where the basis functions are obtained from a single prototype wavelet (or mother wavelet) Ψ(t) by translation and dilation:

    Ψ_{a,b}(t) = |a|^(-1/2) Ψ((t - b)/a),   (1)

where a and b are both real numbers quantifying the scaling and translation operations. In this study, we only consider the discrete orthonormal wavelet transform, where the time-frequency plane is discretized based on a logarithmic sampling. By substituting a = 2^m and b = n·2^m, with m, n ∈ Z, the basis functions become

    Ψ_{m,n}(t) = 2^(-m/2) Ψ(2^(-m) t - n).   (2)

With the basis functions constructed, the discrete wavelet transform of a signal x(t) is defined as

    W(m,n) = < x(t), Ψ_{m,n}(t) >.   (3)

In interpretation, the parameter m is referred to as the decomposition level, correlated to the frequency band assessed, and n indicates the steps of the translation. It is proved by Daubechies [7] that a compactly supported mother wavelet Ψ(t) can be designed such that the set in equation (2) is orthonormal and complete, which implies that a stable and unique reconstruction exists and is given by

    x(t) = Σ_{m,n} W(m,n) Ψ*_{m,n}(t),   (4)

where Ψ*_{m,n}(t) is the conjugate of the wavelet function.

The wavelet representation defined by equation (4) allows us to interpret the signal at a certain location t as the weighted sum of the contributions at various scales around that location. It is distinct in that the wavelet bases Ψ_{m,n}(t) have time-widths adapted to their frequency. At the highest level the wavelet basis covers the whole data, and allows us to examine the stationary features of the signal over the whole recorded time period. At each consecutively lower level, the support of the basis function is halved, so that we are exposed to smaller and smaller features of the signal. Also note that the translation step is halved as the extent of the basis wavelet is halved. This means there is no overlap between the supports of the translated wavelets, and hence no redundancy in the information covered. It is clear that the property of good time-frequency localization with a flexible time-frequency resolution makes the discrete wavelet transform superior to other transforms in resolving features of very different scales.

A pyramidal algorithm is described in Reference [15] for efficiently evaluating the discrete wavelet transform. In this paper, we adopt the widely used Daubechies-4 wavelet as the basis function.

2.2 Support Vector Machine

The SVM was introduced by Vapnik on the foundation of statistical learning theory, which he began developing in the late 1960s [16]. In this theory, SVM classification can be traced back to the classical structural risk minimization (SRM) approach, which determines the classification decision function by minimizing the empirical risk.

The SVM uses a linear model to implement nonlinear class boundaries through some nonlinear mapping of the input vectors x into a high-dimensional feature space. The optimal separating hyperplane is determined as the one giving the largest margin of separation between the classes. For the two-class case, this optimal hyperplane bisects the shortest line between the convex hulls of the two classes. The data are separated by a hyperplane defined by a number of support vectors. The SVM attempts to place a linear boundary between the two classes, and to orient it in such a way that the margin is maximized. In Figure 1, a series of points for two different classes of data are shown by circles and crosses respectively. The boundary can be expressed as follows:

    (w · x) + b = 0;   w ∈ R^N, b ∈ R,   (5)

where the vector w defines the boundary, x is the input vector of dimension N, and b is a scalar threshold.

Figure 1: Classification of data by SVM. Patterns from the two classes are represented by × and ○, respectively. The extracted support vectors are shown in boldface.

The optimal hyperplane is required to satisfy the following constrained minimization:

    minimize { (1/2) ||w||² }
    with y_i (w · x_i + b) ≥ 1,  i = 1, 2, ..., l,   (6)

where l is the number of training samples.
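As a concrete illustration of equations (5) and (6), the sketch below fits a linear SVM on synthetic two-class data and checks the margin constraint. It uses scikit-learn's SVC, which is our tooling choice for illustration and not part of the paper; the toy data, the large value of C (which approximates the hard-margin problem), and all variable names are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class problem: two well-separated Gaussian clouds (invented data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)),
               rng.normal(+2.0, 0.5, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

# A linear SVM with a very large C approximates the hard-margin problem (6).
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]                   # normal vector w of the hyperplane (5)
b = clf.intercept_[0]              # scalar threshold b
margin = 2.0 / np.linalg.norm(w)   # geometric margin width 2/||w||

# Every training point satisfies y_i (w . x_i + b) >= 1, with equality
# (up to solver tolerance) exactly at the support vectors.
signed = (2 * y - 1) * (X @ w + b)
print("support vectors:", len(clf.support_vectors_))
print("margin width: %.3f" % margin)
print("min of y_i(w.x_i + b): %.3f" % signed.min())
```

Only the points nearest the boundary end up as support vectors; moving any other point slightly leaves the hyperplane unchanged, which is the geometric content of Figure 1.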
For the linearly non-separable case, the above formulation can be extended by introducing a regularization parameter C as the measure of violation of the constraints, giving the dual problem

    maximize { Σ_{i=1}^{l} λ_i - (1/2) Σ_{i,j=1}^{l} λ_i λ_j y_i y_j (x_i · x_j) }
    with Σ_{i=1}^{l} y_i λ_i = 0,  0 ≤ λ_i ≤ C,  i = 1, 2, ..., l,   (7)

where the λ_i are the Lagrangian multipliers and are nonzero only for the support vectors. Thus, the hyperplane parameters (w, b) and the classifier function f(x; w, b) can be computed by the optimization process. The decision function is obtained as follows:

    f(x) = sgn{ Σ_{i=1}^{l} y_i λ_i (x · x_i) + b }.   (8)

In cases where a linear boundary in the input space is not enough to separate two classes properly, it is possible to create a hyperplane that allows linear separation in a higher dimension. The method consists in projecting the data into a higher dimensional space where they are considered to become linearly separable. The transformation into the higher-dimensional feature space is relatively computation-intensive. A kernel can be used to perform this transformation and the dot product in a single step, provided the transformation can be replaced by an equivalent kernel function. This reduces the computational load while retaining the effect of the higher-dimensional transformation. The kernel function K(x_i, x_j) is defined as follows:

    K(x_i, x_j) = Φ(x_i) · Φ(x_j).   (9)

Some commonly used kernels are:

    Polynomial: K(x_i, x_j) = [(x_i · x_j) + 1]^d.
    Radial basis: K(x_i, x_j) = exp(-||x_i - x_j||² / 2σ²).
    Sigmoid: K(x_i, x_j) = tanh(ν (x_i · x_j) + c).

3 Experiments

3.1 The Stellar Data Set

In the MK classification system, stellar spectra are catalogued into seven main spectral types in order of decreasing temperature, namely: O, B, A, F, G, K, M [11]. Table 1 illustrates the main properties of each spectral type in the system. The stellar spectra used in our experiments are selected from the Astronomical Data Center (ADC). We use 161 stellar spectra contributed by Jacoby et al. [5]. The spectral samples cover all seven classes. The wavelength range is from 351.0 to 742.7 nm and the resolution is 0.14 nm.

Table 1: Main features of spectral types in the MK system

    Type   Color         Prominent Lines
    O      Bluest        Ionized He
    B      Bluish        Neutral He, Neutral H
    A      Blue-white    Neutral H
    F      White         Neutral H, Ionized Ca
    G      Yellow-white  Neutral H, Ionized Ca
    K      Orange        Neutral Metals, Ionized Ca
    M      Red           Molecules & Neutral Metals

In general terms, a stellar spectrum consists of a black body continuum light distribution, distorted by the interstellar absorption and reemission of light and by the presence or absence of absorption and emission lines and molecular bands [12]. Stellar spectra of the seven main types are shown in Figure 2.

Before analyzing the spectra, all of them are scaled to flux 1 at wavelength 545.0 nm in order to normalize the flux values.

Figure 2: Typical seven main types of stellar spectra.

3.2 Feature in Spatial Domain

The signal-to-noise ratio (SNR) of the real world spectra is very low. In order to raise the CCR, we first consider implementing a de-noising process. In our experiments we find that the traditional threshold de-noising technique [4] alone cannot obtain a satisfying result.

Usually the frequency of the noise is much higher than that of the signal. As the wavelet scale increases, the frequency represented by the scale decreases, as described in Section 2.1. So de-noising can be implemented by omitting coefficients from the first several wavelet scales, which belong to the higher frequencies.
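The scale-frequency relationship just described can be checked numerically. The sketch below decomposes a synthetic noisy "spectrum" over seven scales with the Daubechies-4 wavelet; it uses the PyWavelets package (pywt), which is our choice of tooling rather than the paper's, and the test signal is invented for illustration. Because the transform is orthonormal, white noise spreads its energy roughly evenly over all coefficients, while the signal's energy concentrates in the smoothed array and the coarse scales, which is why discarding the finest scales suppresses mostly noise.

```python
import numpy as np
import pywt  # PyWavelets (assumed available); implements Mallat's pyramidal algorithm

# Synthetic "spectrum": a smooth continuum bump plus a narrow line, plus white noise.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 1024)
clean = np.exp(-(t - 0.5) ** 2 / 0.05) + 0.5 * np.exp(-(t - 0.2) ** 2 / 0.001)
noisy = clean + rng.normal(0.0, 0.1, t.size)

# Seven-scale decomposition with the Daubechies-4 basis, as in the paper.
# coeffs = [S, W7, W6, ..., W1]: smoothed array first, then detail scales
# from the coarsest (7, lowest frequency) to the finest (1, highest frequency).
coeffs = pywt.wavedec(noisy, "db4", level=7)

smooth_energy = float(np.mean(coeffs[0] ** 2))   # smoothed array S: mostly signal
finest_energy = float(np.mean(coeffs[-1] ** 2))  # detail scale 1: mostly noise
print(f"mean energy per coefficient, smoothed array: {smooth_energy:.4f}")
print(f"mean energy per coefficient, finest scale:   {finest_energy:.4f}")
```

With a noise standard deviation of 0.1, the per-coefficient energy at the finest scale stays near the noise variance (about 0.01), while the smoothed array carries energy orders of magnitude larger.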
Here we propose a new wavelet de-noising method for stellar spectra based on the threshold technique, with the following steps:

1. Perform wavelet decomposition at seven scales of all the spectra by the Mallat algorithm [15]. The decomposition of each spectral line produces a wavelet coefficient set W_m at each scale m and one smoothed array S at the last scale.

2. Use the traditional threshold de-noising technique to discriminate between the coefficients considered to arise from noise and those considered to arise from features of the signal. Coefficients considered to arise from noise are set to zero.

3. Omit (zero out) the wavelet coefficients belonging to the first four scales.

4. Reconstruct the spectra using the processed wavelet coefficients and the smoothed array S.

An example of a de-noising result using the proposed method is shown in Figure 3.

Figure 3: De-noising results. The top line is the original spectrum, the middle one uses the threshold technique alone, and the bottom one uses our method.

The continuum can be obtained by reconstructing the spectra using only the smoothed array S. Similarly to the steps described above, if we discard the smoothed array S when reconstructing the spectra at Step 4, the feature spectra can be obtained. Figure 4 shows the continuum and feature spectra extracted by this approach.

Figure 4: Feature spectrum. The top line is the original spectrum, the middle one is the continuum, and the bottom one is the feature spectrum.

3.3 Feature in Wavelet Domain

In Section 3.2, we selected some wavelet coefficients at fixed scales and reconstructed the feature spectra. In this section, we try to extract those coefficients and use them directly as input to an SVM. The wavelet coefficients at the different wavelet scales are shown in Figure 5. The wavelet coefficients at scales 5, 6 and 7 are extracted and linked together into a feature vector. They will be used for classification in the following section.

Figure 5: Wavelet coefficients. (a) Original spectrum. (b-h) Wavelet scales 1 to 7, respectively.

Once the wavelet coefficients are obtained by the proposed method, they must be normalized before being presented to the SVM. The inputs of the SVM are standardized in the following way: a global sigmoidal function is applied to all the coefficients, normalizing them into the [0, 1] interval. That is,

    x' = 1 / (1 + e^(-x)),   (10)

where x is an original coefficient and x' is the output result after being normalized.
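Steps 1-4 above, together with the wavelet-domain feature extraction of Section 3.3 and the sigmoid normalization of equation (10), can be sketched as follows. This is a minimal reconstruction, not the authors' code: it uses PyWavelets, and since the paper does not spell out its threshold rule, Donoho's universal soft threshold [4] is substituted for Step 2; the function and variable names are ours.

```python
import numpy as np
import pywt  # PyWavelets; tooling choice for this sketch

def denoise_and_features(spectrum, wavelet="db4", levels=7, drop_scales=4):
    """De-noise a spectrum (Steps 1-4) and build the wavelet-domain feature vector."""
    # Step 1: seven-scale decomposition -> smoothed array S plus details W7..W1.
    coeffs = pywt.wavedec(spectrum, wavelet, level=levels)
    S, details = coeffs[0], coeffs[1:]

    # Step 2: threshold the detail coefficients (universal soft threshold,
    # with the noise level estimated from the finest scale).
    sigma = np.median(np.abs(details[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(spectrum.size))
    details = [pywt.threshold(d, thr, mode="soft") for d in details]

    # Step 3: zero out the coefficients of the first (finest) four scales.
    for i in range(1, drop_scales + 1):
        details[-i] = np.zeros_like(details[-i])

    # Step 4: reconstruct from the processed coefficients and S.
    denoised = pywt.waverec([S] + details, wavelet)[: spectrum.size]

    # Section 3.3: link scales 5, 6, 7 into one feature vector, then apply
    # the global sigmoid of Eq. (10) to squash it into [0, 1].
    features = np.concatenate(details[:3])  # W7, W6, W5
    features = 1.0 / (1.0 + np.exp(-features))
    return denoised, features

# Hypothetical usage on a synthetic noisy signal.
rng = np.random.default_rng(2)
x = np.sin(np.linspace(0.0, 8.0 * np.pi, 1024)) + rng.normal(0.0, 0.2, 1024)
denoised, features = denoise_and_features(x)
print(denoised.shape, features.shape)
```

Discarding S instead of the detail scales in Step 4 would yield the feature spectrum of Figure 4; keeping only S yields the continuum.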
3.4 Classification

The bootstrap technique [2] is applied in our experiments. The 161 samples are divided into two parts: 10 independent random samples drawn from each class are used to train the SVM classifier, and the remaining samples are used as test samples to calculate the CCR. Each experiment is repeated 25 times with different random partitions, and the mean and standard deviation of the classification accuracy are reported.

The SVM is designed to solve two-class problems. For multi-class stellar spectra, a binary tree structure is proposed to solve the multi-class recognition problem. Usually two approaches can be used for this purpose [16, 10]: 1) the one-against-all strategy, which classifies between each class and all the remaining ones; 2) the one-against-one strategy, which classifies between each pair. We adopt the latter for our multi-class stellar spectral classification; although it needs more SVMs, it allows the computing time to be decreased, because the complexity of the algorithm depends strongly on the number of training samples.

First, the original data are directly used as input to an SVM. Table 2 shows that direct classification using the SVM achieves a CCR of 81.66%.

Table 2: Mean and standard deviation of the classification accuracy of SVM, PCA+SVM and wavelet+SVM

    Methods    CCR      Std
    SVM        81.66%   3.75
    DN+SVM     93.26%   3.08
    FS+SVM     90.15%   2.98
    FW+SVM     88.81%   2.52
    PCA+SVM    80.34%   2.90

Due to the very low SNR of the spectral data, the de-noised spectra obtained by the proposed de-noising method are chosen as the input of an SVM (denoted DN+SVM). Table 2 indicates that this method achieves a satisfying CCR of 93.26% and a smaller standard deviation than direct classification using the SVM. According to the hypothesis test (t-test) applied, the DN+SVM method has a mean classification accuracy on the validation set significantly (α = 0.05) larger than direct classification using the SVM (p-value = 1.97 × 10^-…).

The feature spectra extracted with the method described in Section 3.2 are also used as the input of an SVM (denoted FS+SVM). A good CCR of 90.15% is obtained as well. Because we discard the continuum, which sometimes can support the recognition, the CCR of this method is a little lower than that of DN+SVM.

Then the features in the wavelet domain are used as the input of an SVM (denoted FW+SVM). The results show that wavelet coefficient features can be combined with the SVM. The CCR is a little lower than that of DN+SVM, but the computational efficiency of this method is greatly improved by omitting the reconstruction process.

3.5 Comparison with Other Feature Extraction Methods

Principal component analysis (PCA) [8] is a good tool for dimension reduction, data compression and feature extraction. As a comparison, we use the dimension-reduced data from PCA as the input of the SVM (denoted PCA+SVM). The distribution of the spectra in the space of the first three principal components (PCs) is shown in Figure 6. Figure 7 shows that the first 10 PCs have only 0.43% reconstruction error, so we choose them to define a 10-dimensional subspace and map the spectra onto it.

Figure 6: Distribution of all the spectra in the space of the first three principal components.

From Table 2, we can see that the performance of PCA+SVM is not very good; it is even worse than direct classification with the SVM. According to the hypothesis test (t-test) applied, the PCA+SVM method has a mean classification accuracy on the validation set significantly (α = 0.05) smaller than the DN+SVM method (p-value = 2.01 × 10^-…). The reason is that the SVM can simulate a nonlinear projection which maps linearly inseparable data into a higher dimensional space, where the classes are linearly separable. So the dimensionality of the input data has little influence on the SVM.

4 Conclusions

In this paper, the wavelet transform and SVM are applied to stellar spectra recognition. Methods for continuum determination and noise suppression are described in detail. In order to raise the correct classification rate, feature extraction methods in the spatial and wavelet domains are proposed. Experimental results show that our recognition technique is much better than using the SVM alone or PCA+SVM. The results also
show that the proposed technique is robust and computationally efficient. It is very suitable for the automated recognition of voluminous spectra with low SNR.

Figure 7: Eigenvalues in decreasing order.

Acknowledgements

The research work described in this paper was fully supported by a grant from the National Natural Science Foundation of China (Project No. 60275002) and by the National High Technology Research and Development Program of China (863 Program, Project No. 2003AA133060).

References

[1] A. Webb, Statistical Pattern Recognition, Oxford University Press, London, 1999.

[2] B. Efron, R. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall, London, 1993.

[3] C. Cortes, V. Vapnik, "Support Vector Networks", Machine Learning, Vol. 20, pp. 273-297, 1995.

[4] D. L. Donoho, "De-noising by Soft-thresholding", IEEE Trans. on Information Theory, Vol. 41, No. 3, pp. 613-627, 1995.

[5] G. H. Jacoby, D. A. Hunter, and C. A. Christian, "A Library of Stellar Spectra", The Astrophysical Journal Supplement Series, Vol. 56, pp. 257-281, 1984.

[6] H. Drucker, D. Wu, V. Vapnik, "Support Vector Machines for Spam Categorization", IEEE Trans. on Neural Networks, Vol. 10, No. 5, pp. 1048-1054, 1999.

[7] I. Daubechies, "The Wavelet Transform, Time-frequency Localization and Signal Analysis", IEEE Trans. Information Theory, Vol. 36, pp. 961-1005, 1990.

[8] I. T. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 1986.

[9] J. H. Friedman, "Regularized Discriminant Analysis", J. Amer. Statist. Assoc., Vol. 84, pp. 165-175, 1989.

[10] J. Weston, C. Watkins, "Support Vector Machines for Multi-class Pattern Recognition", Proc. of the Seventh European Symposium on Artificial Neural Networks, Bruges, pp. 219-224, 1999.

[11] M. J. Kurtz, "The MK Process and Stellar Classification of Stars", David Dunlap Observatory, Toronto, 1984.

[12] M. V. Zombeck, Handbook of Astronomy and Astrophysics, 2nd ed., Cambridge University Press, London, 1990.

[13] S. Aeberhard, D. Coomans, O. de Vel, "Comparative Analysis of Statistical Pattern Recognition Methods in High Dimensional Settings", Pattern Recognition, Vol. 27, pp. 1065-1077, 1994.

[14] S. Dumais, "Using SVMs for Text Categorization", IEEE Intelligent Systems, Vol. 13, No. 4, pp. 21-23, 1998.

[15] S. G. Mallat, "A Theory for Multiresolution Signal Decomposition: the Wavelet Representation", IEEE Trans. Pattern Anal. Mach. Intell., Vol. 11, No. 7, pp. 674-693, 1989.

[16] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.