

52 (IJCNS) International Journal of Computer and Network Security,

Vol. 1, No. 1, October 2009

Feature Parameter Extraction from Wavelet Sub-band Analysis for the Recognition of Isolated Malayalam Spoken Words
Vimal Krishnan V.R, Babu Anto P
School of Information Science and Technology
Kannur University, Kerala, India. 670 567
[email protected], [email protected]

Abstract: The aim of this work is to improve the recognition rate by finding out good feature parameters based on discrete wavelet transform techniques. The data set was created using Malayalam spoken words collected from twenty individuals at various time intervals. We employed the Daubechies wavelet for the experiment. The feature vector was formed using the parameters extracted by discrete wavelet transform techniques; feature vectors were produced for all words and formed a training set for classification and recognition. Feature vectors of sixteen elements were collected for all the words using the classical wavelet decomposition technique.

Keywords: Wavelet, Speech Recognition, Feature Extraction, Artificial Neural Network.

1. Introduction

For many decades researchers have been trying to come up with new feature parameters that give good recognition results for computer speech recognition. The majority of research activities focus on conventional transform techniques such as the FFT, MFCC, LPC and STFT. Human speech signals are non-stationary in nature, and it is very difficult to analyze such non-stationary signals with these conventional transform techniques, which focus only on the frequency parameters extracted from the speech signal. Large variation in speech signals, together with other factors such as native accent and varying pronunciations, makes the task very difficult. Scientists all over the globe have been working in the speech recognition domain for many decades; it is one of the intensive areas of research [1]. However, automatic speech recognition (ASR) is yet to achieve completely reliable performance, and hence remains a subject of intensive research. Recent advances in soft computing techniques give more importance to automatic speech recognition. ASR is a complex task, and it requires considerable intelligence to achieve a good recognition result. In abstract mathematics it has been known for quite some time that techniques based on Fourier series and Fourier transforms are not adequate for many problems; wavelet-based transform techniques, in contrast, can handle such problems. We have used wavelet-based feature extraction to develop a feature vector. The performance of the overall system depends on pre-processing, feature extraction and classification, and selecting a feature extraction method and classifier often depends on the available resources. Wavelets are functions with compact support capable of representing signals with good time and frequency resolution. The Wavelet Transform is chosen over conventional methods because of its ability to capture localized features [2]. An ANN is an adaptive system that changes its structure based on external or internal information that flows through the network. Here, accuracy has been increased by combining the wavelet transform with an artificial neural network.

The rest of the paper is structured as follows: Section 2 gives a brief review of wavelet-based feature extraction for speech recognition. Section 3 deals with the classifier used for the experiment. Section 4 discusses the creation of the speech database. The experiment and results are summarized in Section 5.

2. Feature Extraction Based On Wavelet

Wavelets are functions that satisfy certain mathematical requirements and are used in representing data or other functions. The basic idea of the wavelet transform is to represent any arbitrary signal S as a superposition of a set of such wavelets or basis functions. These basis functions are obtained from a single prototype wavelet, called the mother wavelet, by dilation (scaling) and translation (shifting). The wavelet transform of a one-dimensional signal can be defined as follows [3]:

c(a, b) = (1/√a) ∫ s(t) Ψ((t − b)/a) dt    (1)

The indexes c(a, b) are called the wavelet coefficients of the signal s(t), where a is the dilation, b is the translation, and Ψ(t) is the transforming function, the mother wavelet. It is so called because the wavelets derived from it analyze the signal at different resolutions (1/a). Low frequencies are examined with low temporal resolution, while high frequencies are examined with more temporal resolution. A wavelet transform combines both low-pass and high-pass filtering in the spectral decomposition of signals [3].

Wavelet analysis is a powerful and popular tool for the analysis of non-stationary signals. The wavelet transform is a joint function of a time series of interest x(t) and an analyzing function or wavelet Ψ(t). This transform isolates signal variability both in time t and in "scale" s, by rescaling and shifting the analyzing wavelet [4].
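As a purely illustrative sketch (not from the paper), the wavelet coefficients c(a, b) described in Section 2 can be approximated numerically by a Riemann sum over a sampled signal. The Ricker ("Mexican hat") wavelet is used here only because it has a simple closed form; the paper itself uses Daubechies wavelets.

```python
import numpy as np

def ricker(t):
    """Ricker ("Mexican hat") wavelet: a stand-in mother wavelet with a
    simple closed form (the paper uses Daubechies wavelets instead)."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def wavelet_coeff(s, t, a, b):
    """Approximate c(a, b) = (1/sqrt(a)) * integral of s(t) * psi((t - b)/a) dt
    with a Riemann sum over the sampled signal s on the grid t."""
    dt = t[1] - t[0]
    return np.sum(s * ricker((t - b) / a)) * dt / np.sqrt(a)

# Toy signal: a short wavelet-shaped burst centred at t = 0.
t = np.linspace(-10, 10, 2001)
s = ricker(t)

# The coefficient is large where scale and shift match the burst,
# and small where the shifted wavelet misses it.
c_match = wavelet_coeff(s, t, a=1.0, b=0.0)
c_off = wavelet_coeff(s, t, a=1.0, b=6.0)
print(c_match, c_off)
```

This is the sense in which the transform "isolates signal variability": c(a, b) measures how much of the signal looks like the analyzing wavelet at shift b and scale a.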
We have used a wavelet-based transform technique to extract features from very complex speech data. Feature extraction involves information retrieval from the audio signal [5]. Here we have used the Daubechies 4 (db4) mother wavelet for feature extraction. Daubechies wavelets are the most popular wavelets; they represent the foundations of wavelet signal processing and are used in numerous applications. They are also called Maxflat wavelets, as their frequency responses have maximum flatness at frequencies 0 and π.

2.1 Discrete Wavelet Transform

The transform of a signal is just another form of representing the signal; it does not change the information content present in the signal. For many signals, the low-frequency part contains the most important information and gives the signal its identity. Consider the human voice: if we remove the high-frequency components, the voice sounds different, but we can still tell what is being said. In wavelet analysis we often speak of approximations and details. The approximations are the high-scale, low-frequency components of the signal; the details are the low-scale, high-frequency components [6]. The DWT is defined by the following equation:

W(j, k) = Σj Σk x(k) 2^(−j/2) Ψ(2^(−j) n − k)    (2)

where Ψ(t) is a time function with finite energy and fast decay, called the mother wavelet. The DWT analysis can be performed using a fast, pyramidal algorithm related to multirate filter banks. As a multirate filter bank, the DWT can be viewed as a constant-Q filter bank with octave spacing between the centers of the filters. Each sub-band contains half the samples of the neighboring higher-frequency sub-band. In the pyramidal algorithm the signal is analyzed at different frequency bands with different resolutions by decomposing it into a coarse approximation and detail information. The coarse approximation is then further decomposed using the same wavelet decomposition step. This is achieved by successive high-pass and low-pass filtering of the time-domain signal, defined by the following equations:

yhigh[k] = Σn x[n] g[2k − n]    (3)

ylow[k] = Σn x[n] h[2k − n]    (4)

Figure 1: Signal x[n] is passed through low-pass and high-pass filters and down-sampled by 2

Figure 2: Decomposition Tree

The Daubechies wavelets have surprising features, such as intimate connections with the theory of fractals. The peculiarity of this wavelet system is that there is no explicit function, so we cannot draw it directly. What we are given are the h(k)s, the coefficients of the refinement relation connecting Ø(t) and the translates of Ø(2t). For the normalized Daubechies-4 wavelet these coefficients are h(0) = (1+√3)/(4√2), h(1) = (3+√3)/(4√2), h(2) = (3−√3)/(4√2) and h(3) = (1−√3)/(4√2), and the refinement relation is:

Ø(t) = h(0)√2 Ø(2t) + h(1)√2 Ø(2t−1) + h(2)√2 Ø(2t−2) + h(3)√2 Ø(2t−3)    (5)

where Ø(t) is expressed in terms of Ø(2t) and its translates [7].

3. Classification

In a general sense, a neural network is a system that emulates the optimal processor for a particular task, something which cannot be done using a conventional digital computer except with a great deal of user input. Optimal processors are sometimes highly complex, nonlinear and parallel information-processing systems. A Multi-Layer Perceptron (MLP) network architecture is used for training and testing. The MLP is a feed-forward network consisting of units arranged in layers, with only forward connections to units in subsequent layers [8]. The connections have weights associated with them, and each signal traveling along a link is multiplied by its weight. The input layer, being the first layer, has input units that distribute the inputs to units in subsequent layers. In the following (hidden) layer, each unit sums its inputs, adds a threshold, and nonlinearly transforms the sum (called the net function) to produce the unit output (called the activation). The output layer units often have linear activations, so that output activations equal net function values [8][9].
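To make the MLP computation of Section 3 concrete, here is a minimal NumPy forward pass, not the authors' implementation: 16 inputs and 20 output classes match the paper's feature vector and vocabulary, while the hidden-layer size, random weights and tanh nonlinearity are our assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 16 inputs (the paper's feature vector length) and 20 outputs (one per
# word class); the hidden size of 24 is an assumption, not the paper's.
n_in, n_hidden, n_out = 16, 24, 20

W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))   # input -> hidden weights
b1 = np.zeros(n_hidden)                             # hidden thresholds
W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))  # hidden -> output weights
b2 = np.zeros(n_out)

def mlp_forward(x):
    """One forward pass: each hidden unit sums its weighted inputs and adds
    a threshold (the "net function"), then squashes the sum nonlinearly
    (the "activation"); the output units are linear, as described above."""
    net_hidden = W1 @ x + b1
    act_hidden = np.tanh(net_hidden)   # nonlinear transform
    return W2 @ act_hidden + b2        # linear output activations

feature_vector = rng.normal(size=n_in)   # stand-in for a real 16-element vector
scores = mlp_forward(feature_vector)
predicted_class = int(np.argmax(scores))  # index of the recognized word
print(scores.shape, predicted_class)
```

In training, the weights W1 and W2 would be adjusted (typically by backpropagation) so that the largest output score corresponds to the correct word class.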
4. Selection of Data Set

For this experiment we have selected the Malayalam language. Twenty Malayalam spoken words were used for the experiment; their International Phonetic Alphabet (IPA) transcriptions are shown in Table 1. We selected words of a particular context, all belonging to the category 'Consonant Vowel Consonant Vowel'. In Malayalam, no consonant is independent; it always stands with a vowel. The selected spoken words are very commonly used in Malayalam. Samples were collected from twenty individuals at various time intervals, with 32 speech samples collected for each word. In total, six hundred and forty samples of twenty different words from twenty individuals were collected for the experiment.

Table 1: Input Data and Phonetic Alphabet

Sl. No | Word in English | IPA
1  | Vegam   | /v//ɛ//ɡ//ɑː//m/
2  | Poayi   | /p//ɒ//eɪ//ɪ/
3  | Chiri   | /tʃ//ɪ//r//ɪ/
4  | Pava    | /p//ɑː//v//ɑː/
5  | Vila    | /v//ɪ//l//ɑː/
6  | Veetham | /v//ɛ//ɛ//θ//ɑː//m/
7  | Niram   | /n//ɪ//r//ɑː//m/
8  | Varam   | /v//ɑː//r//ɑː//m/
9  | Panam   | /p//ɑː//n//ɑː//m/
10 | Nila    | /n//ɪ//l//ɑː/
11 | Paka    | /p//ɑː//k//ɑː/
12 | Patam   | /p//ɑː//t//ɑː//m/
13 | Nayam   | /n//ɑː//j//ɑː//m/
14 | Palam   | /p//ɑː//l//ɑː//m/
15 | Nale    | /n//ɑː//l//ɛ/
16 | Mari    | /m//ɑː//r//ɪ/
17 | Maunam  | /m//ɑː//ʌ//n//ɑː//m/
18 | Nidhi   | /n//ɪ//d//h//ɪ/
19 | Samam   | /s//ɑː//m//ɑː//m/
20 | Tharam  | /θ//ɑː//r//ɑː//m/

5. Experiment and Result

The db4 wavelet is used in the discrete wavelet decomposition technique. After conducting eight levels of decomposition we collected the largest and smallest elements from each sub-band level; that is, from each level of decomposition we collected the maximum and the minimum values. The largest and smallest elements of each decomposition level were found to be the dominant feature values for each sample, and these dominant elements are used to build the feature vector for each sample. Thus a feature vector of size sixteen is obtained. This feature vector is given as input to the ANN classifier for training. A feature vector for testing is developed using the same technique: fifteen samples of the twenty Malayalam words were collected from different individuals and stored under the various word categories. We trained the training set using the multi-layer perceptron network architecture.

From the experiment we obtained an 84% recognition rate (out of 640 speech samples, 538 could be classified correctly) using the Artificial Neural Network. The results are shown in Figures 3 and 4.

Figure 3: Graph plotted with the feature values of speech data used for the experiment

Figure 4: Various decomposition levels of sample speech data
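The feature-extraction recipe of Section 5 can be sketched as follows in NumPy; this is not the authors' code. For self-containment it uses the 4-tap Daubechies filter of the refinement relation in Section 2 (the paper's db4 wavelet has more taps), one common indexing convention for the filtering of Section 2.1, and periodic boundary handling; all of these details are our assumptions.

```python
import numpy as np

# 4-tap Daubechies ("D4") scaling filter; a simplified stand-in for db4.
s3 = np.sqrt(3.0)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))  # low-pass
g = h[::-1] * np.array([1.0, -1.0, 1.0, -1.0])                     # high-pass (QMF)

def dwt_step(x):
    """One pyramidal step: low-pass and high-pass filtering followed by
    down-sampling by 2, using periodic extension at the boundaries."""
    n = len(x)
    approx = np.array([sum(h[m] * x[(2 * k + m) % n] for m in range(4))
                       for k in range(n // 2)])
    detail = np.array([sum(g[m] * x[(2 * k + m) % n] for m in range(4))
                       for k in range(n // 2)])
    return approx, detail

def extract_features(signal, levels=8):
    """Decompose to `levels` levels and keep the (max, min) of each detail
    sub-band, giving 2 * levels features (16 for 8 levels, as in the paper)."""
    features = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, detail = dwt_step(approx)
        features += [detail.max(), detail.min()]
    return np.array(features)

# Stand-in for a recorded word: 4096 samples of a noisy 50 Hz oscillation.
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 50 * np.arange(4096) / 8000.0) + 0.05 * rng.normal(size=4096)
fv = extract_features(x)
print(fv.shape)  # (16,)
```

The resulting 16-element vector is what would be fed to the MLP classifier; in practice a library such as PyWavelets could replace the hand-written decomposition.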

6. Conclusion

From this study we could understand and experience the effectiveness of classical wavelet decomposition. The performance of discrete wavelet decomposition in feature extraction is appreciable. We have also observed that the neural network is an effective tool which can be combined successfully with wavelets. The efficiency of the method remains to be verified on a very large database.

Acknowledgments

The authors would like to thank the Kerala State Council for Science, Technology and Environment, Govt. of Kerala, India, for their support and for the grant awarded for the conduct of this research work.

References

[1] L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Englewood Cliffs, NJ: Prentice-Hall, 1993.
[2] S. Mallat, A Wavelet Tour of Signal Processing, San Diego: Academic Press, 1999.
[3] G. K. Kharate, A. A. Ghatol and P. P. Rege, "Selection of Mother Wavelet for Image Compression on Basis of Image", IEEE ICSCN 2007, MIT Campus, Anna University, Chennai, India, Feb. 22-24, 2007, pp. 281-285.
[4] S. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, 1989, pp. 674-693.
[5] K. P. Soman and K. I. Ramachandran, Insight into Wavelets: From Theory to Practice, Second Edition, PHI, 2005.
[6] S. Kadambe and P. Srinivasan, "Application of Adaptive Wavelets for Speech", Optical Engineering, vol. 33, no. 7, pp. 2204-2211, July 1994.
[7] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, New Delhi: Prentice Hall of India, 2005.
[8] S. N. Sivanandam, S. Sumathi and S. N. Deepa, Introduction to Neural Networks Using Matlab 6.0, New Delhi: Tata McGraw-Hill, 2006.
[9] J. A. Freeman and D. M. Skapura, Neural Networks: Algorithms, Applications and Programming Techniques, Pearson Education, 2006.

Vimal Krishnan V R is working as a Project Fellow in speech processing at the School of Information Science and Technology, Kannur University, India. He received his Master of Science degree in Software Science from Periyar University, Tamil Nadu, India, in 2005, and is pursuing his doctoral degree at Kannur University. His main research interests lie in soft computing techniques, speech and signal processing, and pattern recognition.

Babu Anto P is working as Head of the Department of Information Technology, Kannur University, India. He received his Master of Science degree from Cochin University of Science and Technology, India, in 1982 and was awarded his doctoral degree by Cochin University in 1992. He has a number of international journal and conference papers to his credit and has been guiding doctoral students for many years. His main research interests lie in speech processing, pattern recognition, data mining and visual cryptography.
