
ISSN 2278-3091
Volume 8, No. 1.3, 2019
Mohammed Arif Mazumder et al., International Journal of Advanced Trends in Computer Science and Engineering, 8(1.3), 2019, 285 - 292
International Journal of Advanced Trends in Computer Science and Engineering
Available Online at http://www.warse.org/IJATCSE/static/pdf/file/ijatcse5481.32019.pdf
https://doi.org/10.30534/ijatcse/2019/5481.32019

Feature Extraction Techniques for Speech Processing: A Review


Mohammed Arif Mazumder¹, Rosalina Abdul Salam¹,²
¹Faculty of Science and Technology, Universiti Sains Islam Malaysia (USIM), 71800 Nilai, Negeri Sembilan, Malaysia, [email protected]
²Islamic Science Institute, Universiti Sains Islam Malaysia (USIM), 71800 Nilai, Negeri Sembilan, Malaysia, [email protected]

ABSTRACT

In digital signal processing, speech processing is an area used in many types of applications and is an intensive field of research. A major criterion for a good speech processing system is the selection of the feature extraction technique, which plays a major role in achieving higher accuracy. In this paper, the most commonly used feature extraction techniques, namely Linear Predictive Coefficient (LPC), Mel Frequency Cepstral Coefficient (MFCC), Perceptual Linear Prediction (PLP), Relative Spectral Perceptual Linear Prediction (RASTA-PLP) and Wavelet Transform (WT), are presented. Comparisons that highlight the strengths and weaknesses of these techniques are also presented. Studies show that feature extraction techniques are mainly selected based on the requirements of the application. The wavelet transform outperforms other techniques for the analysis of non-stationary audio signals; enhanced wavelet transform techniques are a way forward, and further studies can focus on their coefficients. Hybrid methods can be further explored to increase performance in speech processing. A number of hybrid methods were reviewed, and studies show that Wavelet-Based Mel-Frequency Cepstral Coefficients (WPCC) provide better results for speech processing applications, with standard coefficients for classification.

Key words: Linear Predictive Coefficient (LPC); Mel Frequency Cepstral Coefficient (MFCC); Perceptual Linear Prediction (PLP); Relative Spectral Perceptual Linear Prediction (RASTA-PLP); Wavelet Transform (WT)

1. INTRODUCTION

Speech processing involves a huge amount of signal data. The speech signal itself is very crucial, as it later affects the classification and recognition stages. A speech signal is by nature non-stationary, which makes the feature extraction stage more complex. Dimensionality reduction is very important to ensure minimum or zero data loss during feature extraction. The time domain waveform of a speech signal gives auditory information about this non-stationary signal, but the waveform alone carries only minimal information about the signal. Feature extraction in speech processing is therefore crucial for accuracy and performance. Currently, many methods are available for feature extraction in speech processing. The most commonly used are Linear Predictive Coefficient (LPC), Perceptual Linear Prediction (PLP), Mel Frequency Cepstral Coefficient (MFCC), Relative Spectral Perceptual Linear Prediction (RASTA-PLP) and Wavelet Transform (WT). These methods are explained and discussed in this paper, and comparative studies of them are provided. Studies show that methods are selected based on their applications. In recent years, hybrid methods have also been introduced, and in most cases hybrid methods outperform single methods; however, suitable methods are selected based on the domain. In the following sections, an overview of feature extraction is given, followed by comparative studies of the most commonly used methods. The hybrid feature extraction methods and their major properties are then discussed.

2. FEATURE EXTRACTION

In speech processing, the process of extracting important information from a speech signal while reducing noise and unwanted information is called feature extraction. The basic operation of feature extraction involves spectral analysis, parametric transformation and statistical modeling [1]. The output is a parameter vector [2]. However, it is normal to lose some useful information while removing unnecessary information [3]. Feature extraction involves the process of converting the speech signal into digital form [4]. This basic step of feature extraction is shown in Figure 1.

Figure 1: Basic Operation of Feature Extraction [1]

Spectral Analysis is the first stage of speech analysis and includes spectro-temporal analysis of the signal [1]. In Parametric Transforms, two fundamental operations, differentiation and concatenation, are applied to create signal parameters from signal measurements [1]. In the Statistical Modeling stage, the signal parameters are modeled as being generated by a few underlying multivariate random processes.
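The spectral analysis stage described above can be sketched in a few lines of Python. This is a minimal illustration of framing, Hamming windowing, and power-spectrum computation; the frame sizes and the test tone are assumed values for illustration, and a direct DFT is used for clarity rather than any cited system's implementation:

```python
import math

def frame_signal(signal, frame_len=64, hop=32):
    """Split a signal into overlapping frames of frame_len samples, hop apart."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def power_spectrum(frame):
    """Power spectrum of a windowed frame via a direct DFT (O(n^2), for clarity)."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hamming(n))]
    spec = []
    for k in range(n // 2 + 1):  # non-redundant half of the spectrum
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(-x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        spec.append(re * re + im * im)
    return spec

# Illustrative 440 Hz tone sampled at 8 kHz (assumed, not from the paper)
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
frames = frame_signal(tone)
spec = power_spectrum(frames[0])
```

In practice an FFT routine replaces the quadratic DFT loop, but the structure (frame, window, transform) is exactly what the pipeline in Figure 1 abstracts.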

3. FEATURE EXTRACTION TECHNIQUES

The speech signal can be retrieved directly from the digitized waveform [5]. The large amount of speech signal data requires suitable and reliable feature extraction techniques, which can improve performance and be computationally more effective. Feature extraction also removes various sources of information, such as whether the sound is voiced or unvoiced, and whether the speech is affected by noise [6].

3.1 Linear Predictive Coding (LPC)
In Linear Predictive Coding (LPC) analysis, a speech sample is approximated as a linear combination of past speech samples. LPC is a frame-based analysis of the speech signal [7]. The LPC feature extraction process is shown in Figure 2. The input speech signal is separated and blocked into frames of samples. To minimize signal discontinuities, each individual frame is windowed [8]. This is followed by autocorrelating each frame of the windowed signal and converting each frame of autocorrelations into an LPC parameter set using Durbin's method [8]. The LPC feature vectors are then created.

Figure 2: Linear Predictive Coding (LPC) Feature Extraction Process [6]

3.2 Mel Frequency Cepstral Coefficient (MFCC)
The Mel-Frequency Cepstral Coefficient (MFCC) technique is mainly used to create a fingerprint of sound files. The MFCC feature extraction process is shown in Figure 3. First, the input speech signal is divided into overlapping frames. Windowing is applied, and the frames are then subjected to the Fast Fourier Transform. In the next step, the frequency domain signal is converted to the Mel frequency scale. Then the log Mel scale spectrum is converted back to the time domain using the Discrete Cosine Transform (DCT) [9]. The result of this conversion is called the Mel Frequency Cepstral Coefficient. MFCC mainly concentrates on the static characteristics of a signal.

3.3 Perceptual Linear Prediction (PLP)
Perceptual Linear Prediction basically discards irrelevant information, such as noise and components not similar to the human voice. PLP is very similar to LPC, but PLP is closer to the human auditory system. The process of PLP is shown in Figure 4 [10].

Figure 4: Perceptual Linear Prediction (PLP) Feature Extraction Process [10]

First, the quantized signal is windowed to limit signal discontinuities; a Hamming window is used, and the power spectrum of the windowed signal is computed using the FFT. The three stages of frequency warping, smoothing and sampling are integrated into a single filter bank called the Bark Filter Bank [11]. To approximate the sensitivity of human hearing, an equal-loudness pre-emphasis is used to weight the filter bank outputs [12]. The output, the auditory warped line spectrum, is then processed by Linear Prediction [13]. The last step is the computation of the Cepstral Coefficients.
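The conversion of a frame's autocorrelations into an LPC parameter set via Durbin's method (Section 3.1) can be sketched as follows. The test signal and model order below are illustrative assumptions, not values from any cited system:

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a (windowed) frame x."""
    return [sum(x[t] * x[t + lag] for t in range(len(x) - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations with the Levinson-Durbin recursion.

    Returns (a, err): error-filter coefficients a[0..order] with a[0] == 1.0,
    and the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this model order
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        a_new = a[:]
        for j in range(1, i):          # symmetric in-place style update
            a_new[j] = a[j] + k * a[i - j]
        a_new[i] = k
        a = a_new
        err *= (1.0 - k * k)
    return a, err

# Illustrative decaying exponential: x[t] = 0.9 ** t, an ideal AR(1) signal
x = [0.9 ** t for t in range(200)]
a, err = levinson_durbin(autocorr(x, 1), order=1)
```

For this signal the recursion recovers a single predictor coefficient close to 0.9, i.e. the model x[t] ≈ 0.9 x[t-1]; real LPC front ends simply run the same recursion at order 10-16 per frame.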

Figure 3: Mel Frequency Cepstral Coefficient (MFCC) Feature Extraction Process [9]

3.4 Relative Spectral Perceptual Linear Prediction (RASTA-PLP)
To remove short-term noise variations, a special band-pass filter is added to each frequency sub-band of the traditional PLP algorithm [14]. The result is called RASTA-PLP, and the filtering is used to remove conventional disturbances. Traditional PLP has limited capability in

dealing with distortions. This is overcome by the introduction of RASTA-PLP. The input speech signal goes through spectral analysis using MFCC or PLP. The result is then modified by compressing the static non-linearity and filtered by a band-pass filter. Another filter bank is then used to expand the non-linearity, and the coefficients are produced. This is shown in Figure 5.

Figure 5: RASTA-PLP Feature Extraction Process [15]

3.5 Wavelet Transform (WT)
The wavelet transform is another method that has a similarity with how the human ear processes sound; it is therefore suitable for speech processing. The Discrete Wavelet Transform (DWT) and Wavelet Packet Decomposition (WPD) are explained and discussed in the next section.

3.5.1 Discrete Wavelet Transform (DWT)
The DWT can extract information from non-stationary signals and is very suitable for speech data. It performs well and is computationally effective and efficient for feature extraction in speech. Because it has varying window sizes, it is efficient across all frequency ranges. The signal is passed through two filters, a low-pass filter and a high-pass filter, producing two signals [17]. The output of the low-pass filter is called the approximation coefficients, and the output of the high-pass filter is called the detail coefficients [17].

3.5.2 Wavelet Packet Decomposition (WPD)
WPD is a generalization of the DWT and is therefore more flexible. As in the DWT, the signal is decomposed into low-frequency and high-frequency components. The difference is that WPD applies the transform step to both the low-pass and high-pass results, whereas the DWT applies it only to the low-pass results [18].

4. COMPARISON OF FEATURE EXTRACTION TECHNIQUES

Feature selection and extraction are very crucial to a speech recognition system. In most cases, the choice is domain- or application-oriented. Table 1 presents the strengths and weaknesses of the most commonly used feature extraction methods. Applications related to each method are also highlighted.

5. HYBRID FEATURE EXTRACTION TECHNIQUES

Earlier, a few single feature extraction techniques were presented with their strengths and weaknesses. Studies show that better performance can be obtained by combining a few methods to extract relevant features, and these hybrid methods can be further investigated. A few significant hybrid feature extraction techniques and their comparison are discussed in the following sections.

5.1 Discrete Wavelet Packet Decomposition (DWPD)
For speech enhancement and to overcome the limitations of DWT and WPD, new hybrid strategies were introduced. One such hybrid technique is called Discrete Wavelet Packet Decomposition (DWPD), and it combines the features of both DWT and WPD. It consists of a three-stage process: first, the speech signal is split into two bands, the high-frequency and low-frequency band signals. Then, WPD is applied to the high-frequency components and DWT is applied to the low-frequency components. Finally, the features produced by the two techniques are combined, and a feature vector set is formed [24].

The hybrid DWPD algorithm has a few advantages; for example, the high-frequency band is decomposed into more partitions. This increases performance, is computationally more effective, and produces higher recognition rates [25], [26].

5.2 Phase Autocorrelation Bark Wavelet Transform (PACWT)
The Phase Autocorrelation Bark Wavelet Transform (PACWT) combines the benefits of phase autocorrelation (PAC) with the Bark wavelet transform. It is a hybrid method and improves robustness based on an alternative measure of autocorrelation. The process of PACWT is shown in Figure 6.

Figure 6: Block diagram of the PACWT Feature Extraction [28]
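The low-pass/high-pass split of Section 3.5.1 and the both-branch recursion of WPD (Section 3.5.2) can be sketched with the simplest wavelet pair, the Haar filters. This is a minimal illustration with an invented test signal, not a full wavelet library:

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt_step(x):
    """One DWT level with Haar filters: the low-pass branch gives the
    approximation coefficients, the high-pass branch the detail
    coefficients, each downsampled by 2."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x) - 1, 2)]
    return approx, detail

def wpd(x, depth):
    """Wavelet Packet Decomposition: unlike the DWT, recurse on BOTH the
    approximation and detail branches, yielding 2**depth sub-bands."""
    if depth == 0:
        return [x]
    a, d = haar_dwt_step(x)
    return wpd(a, depth - 1) + wpd(d, depth - 1)

signal = [1.0, 3.0, 2.0, 2.0, 5.0, 1.0, 0.0, 4.0]  # illustrative frame
a, d = haar_dwt_step(signal)
bands = wpd(signal, 2)
```

Because the Haar pair is orthogonal, the energy of the signal is preserved exactly across the approximation and detail coefficients, which is why the transform compresses without major degradation.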

Table 1: Selected Feature Extraction Techniques

Linear Predictive Coding (LPC)
  Applications: Tonal analysis, musical instrument
  Strengths:
  - The LPC method is easy to implement and the mathematics are very precise and simple [19].
  - Low-dimension feature vectors represent the spectral envelope [19], [21].
  Weaknesses:
  - Feature components are highly correlated [19].
  - The representation of speech production or perception based on linear scales is not adequate [20].
  - A priori information on the speech signal under test cannot be included [19].

Mel Frequency Cepstral Coefficients (MFCC)
  Applications: Voice recognition systems for security purposes
  Strengths:
  - Not based on linear characteristics; hence, similar to the human auditory perception system [19], [20].
  - Low correlation between coefficients [19].
  - Provides good discrimination [19], [20].
  Weaknesses:
  - Limited representation of speech signals, since only the power spectrum is considered [19].
  - Low robustness to noise.

Perceptual Linear Predictive Analysis (PLP)
  Applications: Speech analysis
  Strengths:
  - Low dimensionality of the resultant feature vector [19].
  - Reduced discrepancy between voiced and unvoiced speech [19].
  Weaknesses:
  - The spectral balance is easily altered by the communication channel, noise, and the equipment used [19].
  - Dependent on the whole spectral balance [19].

Relative Spectral Perceptual Linear Prediction (RASTA-PLP)
  Applications: Spectrum factor analysis
  Strengths:
  - Spectral components that change slower or quicker than the rate of change of the speech signal are suppressed [19].
  - These features are best used when there is a mismatch in the analog input channel between the development and fielded systems [20].
  Weaknesses:
  - Poor performance in clean speech environments [22].

Wavelet Transform (WT)
  Applications: Multiresolution analysis, time-frequency localization, and multirate filtering
  Strengths:
  - Capable of compressing a signal without major degradation [19].
  - Able to perform efficient time and frequency localizations [19], [23].
  Weaknesses:
  - Not flexible, as the same basic wavelets have to be used for all speech signals [19].
First, the speech signal is pre-emphasized, and a Hamming window is applied to each frame of the pre-emphasized signal. Computing the correlation coefficients then produces the autocorrelation sequence during the Phase Autocorrelation step. This is followed by applying the Bark wavelet transform to the signal after it passes through the Mel-filter bank. Finally, the PACWT feature coefficients are produced. The first and second derivatives of the time sequence of each base feature are also calculated, and the final PACWT feature coefficient set is produced by concatenating the derivatives with the base feature set.

5.3 Wavelet Based Mel-Frequency Cepstral Coefficients (WPCC)
Wavelet Based Mel-Frequency Cepstral Coefficients (WPCC) is a hybrid of the wavelet transform method and MFCC. First, the wavelet transform is applied to decompose the speech signal into two different frequency channels: the high-frequency channel components carry all the details, and the low-frequency channel carries only the approximations. Then the MFCC of the approximation and detail channels are calculated, which captures the characteristics of individual speakers [29] and eases the calculation of the coefficients. The process of WPCC is shown in Figure 7.

Figure 7: Mel-Frequency Cepstral Coefficient (MFCC) using Mel filter bank and Wavelet Packet Cepstral Coefficient (WPCC) using wavelet packet (WP) filter bank [29]

5.4 Revised Perceptual Linear Prediction (RPLP)
RPLP is a hybrid feature extraction based on PLP and MFCC. It uses the Mel filter bank instead of the Bark filter bank. First, the input signal is pre-emphasized; then segmentation is performed and the FFT spectrum is processed by applying the Mel scale filter bank. The output is converted to cepstral coefficients using LP analysis. The first six steps are the same as the MFCC steps; they are followed by PLP steps, as can be seen in Figure 8. After these steps, IDFT, LP analysis and cepstral analysis are applied in the same way as for PLP features.

Figure 8: Feature Extraction using RPLP [32]

5.5 Bark Frequency Cepstral Coefficients (BFCC)
BFCC is a hybrid of PLP and the Bark filter bank. BFCC is very similar to MFCC except that it uses the Bark filter bank instead of the Mel filter bank [34]. As mentioned earlier, Bark filters are matched to the sensitivity of human hearing. The signal is compressed, and finally the DCT is used to de-correlate the features.

Analysis shows that the wavelet-based DWPD is much more efficient: the performance is higher and the computational complexity is reduced. Dimensionality reduction is efficient with wavelet-based DWPD, and it produces a better vector size. It increases accuracy and is suitable for non-stationary signals. A comparison between the Phase Autocorrelation Bark Wavelet Transform (PACWT) and MFCC shows that PACWT is better for male voice data than for female voice data, because it performs better in low-SNR conditions. Revised Perceptual Linear Prediction Coefficients (RPLP) are mostly used in spoken language identification; they have the advantage of the pre-emphasis filter, Mel scale filter bank, LP and cepstral analysis. MFCC and BFCC show good performance; however, in noisy environments MFCC shows better performance. Wavelet Packets (WPs) show better performance than MFCC due to their rich coverage of time-frequency properties. Table 2 highlights the strengths and the weaknesses of the presented hybrid methods.
Table 2: Hybrid Feature Extraction Techniques

Discrete Wavelet Packet Decomposition (DWPD)
  Applications: Speaker-independent digit recognition
  Strengths:
  - Computational complexity is reduced because it can decompose the high-frequency band into more partitions [24], [26].
  Weaknesses:
  - Performance is reduced for stationary signals [25].

Phase Autocorrelation Bark Wavelet Transform (PACWT)
  Applications: Robust speech recognition and speaker identification
  Strengths:
  - The PACWT feature extraction method is generally noise-robust compared to MFCC, particularly in high-noise (low-SNR) environments [28].
  - Recognition performance was significantly better for male data than for female data [28].
  Weaknesses:
  - In clean speech, MFCC has a higher recognition rate than PACWT [28].

Wavelet-Based Mel-Frequency Cepstral Coefficients (WPCC)
  Applications: Speaker identification systems
  Strengths:
  - For clean speech, it provides better performance compared to MFCC features [30], [31].
  - It reduces the problem of noise and efficiently improves the recognition rate [31].
  Weaknesses:
  - WPCC does not show robust performance in ASR [30], [29].

Revised Perceptual Linear Prediction (RPLP)
  Applications: Spoken language identification
  Strengths:
  - RPLP features increase recognition accuracy relatively better than standard MFCC [32].
  - Improved recognition accuracy compared with PLP under noisy conditions [32].
  Weaknesses:
  - Identification accuracy varies, depending on the classifier.

Bark Frequency Cepstral Coefficients (BFCC)
  Applications: Speech recognition in noisy environments
  Strengths:
  - Higher identification accuracy is produced for infinite distance in comparison with other feature extraction methods [33].
  Weaknesses:
  - MFCC performs better than the conventional BFCC method, and performance sometimes degrades under noisy environments [29], [33].
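The DWPD splitting scheme of Section 5.1 (plain DWT on the low-frequency band, WPD on the high-frequency band, concatenated into one feature vector) can be sketched with Haar filters. The depths and test signal below are illustrative assumptions, not values from the cited work:

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(x):
    """One Haar filter-bank split: (approximation, detail), each half length."""
    a = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x) - 1, 2)]
    d = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x) - 1, 2)]
    return a, d

def dwt(x, depth):
    """Plain DWT: keep recursing on the approximation branch only."""
    coeffs = []
    for _ in range(depth):
        x, d = haar_step(x)
        coeffs.extend(d)
    return coeffs + x

def wpd(x, depth):
    """Wavelet packet decomposition: recurse on both branches (flat output)."""
    if depth == 0:
        return x
    a, d = haar_step(x)
    return wpd(a, depth - 1) + wpd(d, depth - 1)

def dwpd_features(x, depth=2):
    """DWPD sketch: split into low/high bands, apply DWT to the low band and
    WPD to the high band, then concatenate the coefficients [24]."""
    low, high = haar_step(x)  # first split: low / high frequency bands
    return dwt(low, depth) + wpd(high, depth)

signal = [float(i % 5) for i in range(16)]  # illustrative frame
features = dwpd_features(signal)
```

Since every stage is an orthogonal Haar split, the concatenated feature vector has the same length and energy as the input frame; the asymmetry is that the high band receives the finer packet partitioning, which is the stated advantage of DWPD.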
6. CONCLUSION

Speech processing involves a large amount of speech signal data. Therefore, data reduction is very important to reduce computational complexity and increase performance. However, data reduction can result in the loss of important speech information, so the feature extraction technique must be selected carefully to preserve it, and the target application should also be considered. This paper presented a number of commonly used feature extraction methods and a few hybrid methods, and discussed their strengths and weaknesses. FFT, LPC and MFCC have higher computational complexity and are basically better suited for stationary signals than for non-stationary signals. Wavelet-based methods provide lower computational complexity and higher performance, and their accuracy is also higher in comparison with non-wavelet-based methods. From the literature, wavelet-based methods are recommended for speech signals. Different applications require different feature extraction methods, but in most cases wavelet-based methods give better accuracy with higher performance. Hybrid methods provide better results than single methods. Wavelet-Based Mel-Frequency Cepstral Coefficients (WPCC) show higher accuracy for speech processing applications and provide standard coefficients for classification. Further improvement can be achieved by incorporating optimization algorithms, which can provide higher accuracy with reduced computational complexity, especially under noisy conditions.

ACKNOWLEDGEMENT

The authors would like to express their gratitude to Universiti Sains Islam Malaysia (USIM) for the support and facilities provided. This research is sponsored by Universiti Sains Islam Malaysia (USIM) under USIM Competitive Grant [PPP/UTG-0114/FST/30/11414].

REFERENCES

[1] Pooja V. Janse, Ratnadeep R. Deshmukh, 2014, Design and Development of Database and Automatic Speech Recognition System for Travel Purpose in Marathi, IOSR Journal of Computer Engineering (IOSR-JCE), Volume 16, Issue 5, Ver. IV, pp. 97-104. https://doi.org/10.9790/0661-165497104
[2] Urmila Shrawankar, Techniques for Feature Extraction in Speech Recognition System: A Comparative Study. Available from: https://arxiv.org/ftp/arxiv/papers/1305/1305.1145.pdf
[3] Yogita A. More, S. S. Munot (Bhabad), 2016, Effect of Combination of Different Features on Speech Recognition for Abnormal Speech, International Journal of Engineering and Computer Science, ISSN: 2319-7242, Volume 5, Issue 8, pp. 17590-17592.
[4] Monica Mundada, Bharti Gawali, Sangramsing Kayte, 2014, Recognition and Classification of Speech and Its Related Fluency Disorders, IJCSIT, Vol. 5 (5), pp. 6764-6767.
[5] Sayf A. Majeed, Hafizah Husain, Salina A. Samad, 2015, Phase Autocorrelation Bark Wavelet Transform (PACWT) Features for Robust Speech Recognition, PAN-IPPT, Archives of Acoustics, Vol. 40, No. 1, pp. 25-31. https://doi.org/10.1515/aoa-2015-0004
[6] Pratibha Saroj, Shilpa Verma, 2015, Speech Recognition of Deaf and Hard of Hearing People by Using Neural Network, International Journal of Emerging Technology and Innovative Engineering, Volume 1, Issue 8, ISSN: 2394-6598.
[7] Pratik K. Kurzekar, Ratnadeep R. Deshmukh, Vishal B. Waghmare, Pukhraj P. Shrishrimal, 2014, Issues and Challenges of Voice Recognition in Pervasive Environment, International Journal of Innovative Research in Science, Engineering and Technology, Vol. 3, Issue 12.
[8] Hariharan Muthusamy, Kemal Polat, Sazali Yaacob, 2015, Improved Emotion Recognition Using Gaussian Mixture Model and Extreme Learning Machine in Speech and Glottal Signals, Mathematical Problems in Engineering, Volume 2015, Article ID 394083. https://doi.org/10.1155/2015/394083
[9] Pratik K. Kurzekar, Ratnadeep R. Deshmukh, Vishal B. Waghmare, Pukhraj P. Shrishrimal, 2014, A Comparative Study of Feature Extraction Techniques for Speech Recognition System, International Journal of Innovative Research in Science, Engineering and Technology, Vol. 3, Issue 12. https://doi.org/10.15680/IJIRSET.2014.0312034
[10] E. Chandra, K. Manikandan, M. Sivasankar, 2014, A Proportional Study on Feature Extraction Method in Automatic Speech Recognition System, International Journal of Innovative Research in Electrical, Electronics, Instrumentation and Control Engineering, Vol. 2, Issue 1.
[11] Sascha Disch, Harald Popp, 2012, Apparatus and Method for Determining a Plurality of Local Center of Gravity Frequencies of a Spectrum of an Audio Signal, United States Patent Application Publication, US 2012/0008799 A1.
[12] A. Nagesh, 2016, A Comparison of Feature Extraction Methods for Language Identification using GMM, International Journal of Engineering Trends and Technology (IJETT), Volume 31, Number 4. https://doi.org/10.14445/22315381/IJETT-V31P239
[13] Namrata Dave, 2013, Feature Extraction Methods LPC, PLP and MFCC in Speech Recognition, International Journal for Advance Research in Engineering and Technology, Volume 1, Issue VI.
[14] Inshirah Idris, Md Sah Salam, 2016, Improved Speech Emotion Classification from Spectral Coefficient Optimization, Advances in Machine Learning and Signal Processing, pp. 247-257. https://doi.org/10.1007/978-3-319-32213-1_22
[15] Sanjivani S. Bhabad, Kamaraj Naidu, 2014, RASTA-PLP for Speech Recognition of Articulatory Handicapped People, International Journal of
Scientific Research and Education, Volume 2, Issue 11, pp. 2313-2321.
[16] Róisín Loughran, Alexandros Agapitos, Ahmed Kattan, Anthony Brabazon, 2017, Feature Selection for Speaker Verification Using Genetic Programming, Evolutionary Intelligence, Volume 10, Issue 1-2, pp. 1-21. https://doi.org/10.1007/s12065-016-0150-5
[17] Xuechuan Wang, Kuldip K. Paliwal, 2002, A Modified Minimum Classification Error (MCE) Training Algorithm for Dimensionality Reduction, Journal of VLSI Signal Processing 32.
[18] Marko V. Jankovic, Masashi Sugiyama, Probabilistic Principal Component Analysis Based on Joystick Probability Selector. Available from: https://www.researchgate.net/publication/221533826
[19] Wenzhi Liao, Aleksandra Pižurica, Paul Scheunders, Wilfried Philips, Youguo Pi, 2013, Semisupervised Local Discriminant Analysis for Feature Extraction in Hyperspectral Images, IEEE Transactions on Geoscience and Remote Sensing, Vol. 51, No. 1. https://doi.org/10.1109/TGRS.2012.2200106
[20] Lahiru Dinalankara, 2017, Face Detection and Face Recognition Using Open Computer Vision Classifiers. Available from: https://www.researchgate.net/publication/318900718
[21] Anusuya, M., Katti, S., 2011, Front End Analysis of Speech Recognition: A Review, Int. J. Speech Technol., 14, (2), pp. 99-145. https://doi.org/10.1007/s10772-010-9088-7
[22] Sonia Sunny, David Peter S., K Poulose Jacob, A Comparative Study of Wavelet Based Feature Extraction Techniques in Recognizing Isolated Spoken Words. Available from: http://www.ijsps.com/uploadfile/2013/0710/20130710105020955.pdf
[23] S. Kadambe, P. Srinivasan, 1994, Application of Adaptive Wavelets for Speech Coding, Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis.
[24] Korba, M.C.A., Messadeg, D., Djemili, R.H.B., 2004, Robust Speech Recognition Using Perceptual Wavelet Denoising and Mel-Frequency Product Spectrum Cepstral Coefficient Features, Informatica, 32, pp. 283-288.
[25] Zhou, P., Tang, L.Z., Xu, D.F., 2009, Speech Recognition Algorithm of Parallel Subband HMM Based on Wavelet Analysis and Neural Network, Inf. Technol. J., 8, pp. 796-800. https://doi.org/10.3923/itj.2009.796.800
[26] Veisi, H., Sameti, H., 2011, The Integration of Principal Component Analysis and Cepstral Mean Subtraction in Parallel Model Combination for Robust Speech Recognition, Digit. Signal Process., 21, (1), pp. 36-53. https://doi.org/10.1016/j.dsp.2010.07.004
[27] Lee, J.Y., Hung, J., 2011, Exploiting Principal Component Analysis in Modulation Spectrum Enhancement for Robust Speech Recognition, Eighth Int. Conf. on Fuzzy Systems and Knowledge Discovery (FSKD), Shanghai, pp. 1947-1951. https://doi.org/10.1109/FSKD.2011.6019893
[28] Garau, G., Renals, S., 2008, Combining Spectral Representations for Large Vocabulary Continuous Speech Recognition, IEEE Trans. Audio Speech Language Process., 16, (3), pp. 508-518. https://doi.org/10.1109/TASL.2008.916519
[29] Fontaine, V., Ris, C., Leich, H., 1996, Nonlinear Discriminant Analysis with Neural Networks for Speech Recognition, Proc. EUSIPCO 96, EURASIP, pp. 1583-1586.
[30] Venkateswarlu, R.L.K., Kumari, R.V., Jayasri, G.V., 2011, Speech Recognition Using Radial Basis Function Neural Network, Third Int. Conf. on Electronics Computer Technology (ICECT), Kanyakumari, pp. 441-445. https://doi.org/10.1109/ICECTECH.2011.5941788
[31] Dengfeng, K., Shuang, X., Bo, X., 2008, Optimization of Tone Recognition via Applying Linear Discriminant Analysis in Feature Extraction, Third Int. Conf. on Innovative Computing Information and Control (ICICIC), Dalian, Liaoning, China, pp. 528-531.
[32] Sonia Sunny, David Peter S, K Poulose Jacob, 2013, Design of a Novel Hybrid Algorithm for Improved Speech Recognition with Support Vector Machines Classifier, International Journal of Emerging Technology and Advanced Engineering, vol. 3, pp. 249-254.
[33] P. Kumar, A. Biswas, A. N. Mishra and M. Chandra, 2010, Spoken Language Identification Using Hybrid Feature Extraction Methods, Journal of Telecommunication, vol. 1, pp. 11-15.
[34] Shaurya Agarwal, Pushkin Kachroo, Emma Regentova, 2016, A Hybrid Model Using Logistic Regression and Wavelet Transformation to Detect Traffic Incidents, IATSS Research, Volume 40, Issue 1, pp. 56-63. https://doi.org/10.1016/j.iatssr.2016.06.001