Feature Extraction Techniques For Speech Processing A Review
Mohammed Arif Mazumder et al., International Journal of Advanced Trends in Computer Science and Engineering, 8(1.3), 2019, 285-292
Available Online at http://www.warse.org/IJATCSE/static/pdf/file/ijatcse5481.32019.pdf
https://doi.org/10.30534/ijatcse/2019/5481.32019
1. INTRODUCTION
3. FEATURE EXTRACTION TECHNIQUES

The speech signal can be retrieved directly from the digitized waveform [5]. The large amount of data in a speech signal requires suitable and reliable feature extraction techniques, which can improve performance and make computation more effective. Feature extraction also captures various sources of information, such as whether the sound is voiced or unvoiced and whether the speech is affected by noise [6].

3.1 Linear Predictive Coding (LPC)

In Linear Predictive Coding (LPC) analysis, each speech sample is approximated as a linear combination of past speech samples. LPC is a frame-based analysis of the speech signal [7]. The LPC feature extraction process is shown in Figure 2. The input speech signal is blocked into frames of samples, with adjacent frames separated. Each individual frame is windowed in order to minimize signal discontinuities [8]. Each windowed frame is then autocorrelated, and each frame of autocorrelations is converted into an LPC parameter set using Durbin's method [8]. The LPC feature vectors are then created.

3.2 Mel Frequency Cepstral Coefficients (MFCC)

First, the input speech signal is divided into overlapping frames. Windowing is applied, and each frame is subjected to the fast Fourier transform. In the next step, the frequency-domain signal is converted to the Mel frequency scale. The log Mel-scale spectrum is then converted to the time domain using the Discrete Cosine Transform (DCT) [9]. The result of this conversion is called the Mel Frequency Cepstral Coefficients. MFCC mainly concentrates on the static characteristics of a signal.

3.3 Perceptual Linear Prediction (PLP)

Perceptual Linear Prediction discards irrelevant information, such as noise and components not similar to the human voice. PLP is very similar to LPC, but PLP is closer to the human auditory system. The process of PLP is shown in Figure 4 [10].
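The LPC steps above (frame blocking, windowing, autocorrelation, Durbin's recursion) can be sketched in Python. This is a minimal illustration, not the paper's implementation; the frame size, overlap, and model order are arbitrary choices, and the sine-plus-noise input stands in for real speech.

```python
import numpy as np

def lpc_coefficients(frame, order=10):
    """LPC for one windowed frame: autocorrelation followed by
    the Levinson-Durbin recursion (Durbin's method)."""
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this recursion step.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a

# Toy stand-in for one second of speech at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(fs)

# Frame blocking (30 ms frames, 50% overlap) and Hamming windowing.
frame_len, hop = 240, 120
window = np.hamming(frame_len)
frames = [signal[i:i + frame_len] * window
          for i in range(0, len(signal) - frame_len, hop)]
lpc_features = np.array([lpc_coefficients(f, order=10) for f in frames])
print(lpc_features.shape)  # one (order + 1)-element parameter set per frame
```

Each row of `lpc_features` is the LPC parameter set for one frame, which is exactly the per-frame feature vector the text describes.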
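The MFCC pipeline described above (framing, windowing, FFT, Mel filterbank, log, DCT) can likewise be sketched. The sampling rate, filter count, and number of coefficients below are illustrative assumptions, not values from this paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, fs=8000, frame_len=256, hop=128, n_filters=26, n_ceps=13):
    """Framing -> windowing -> FFT -> Mel filterbank -> log -> DCT."""
    window = np.hamming(frame_len)
    frames = np.array([signal[i:i + frame_len] * window
                       for i in range(0, len(signal) - frame_len, hop)])
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular Mel-spaced filterbank over the FFT bins.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, frame_len // 2 + 1))
    for j in range(1, n_filters + 1):
        l, c, r = bins[j - 1], bins[j], bins[j + 1]
        fbank[j - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[j - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log Mel-scale spectrum (small constant avoids log(0)).
    log_mel = np.log(power @ fbank.T + 1e-10)
    # DCT-II to decorrelate, keeping the first n_ceps coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return log_mel @ dct.T

rng = np.random.default_rng(0)
sig = rng.standard_normal(8000)   # toy stand-in for one second of speech
feats = mfcc(sig)
print(feats.shape)                # one n_ceps-element MFCC vector per frame
```

The DCT at the end is what makes the coefficients "cepstral": it compacts the log Mel spectrum into a small, weakly correlated feature vector per frame.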
PLP performs poorly when dealing with distortions. This is overcome by the introduction of RASTA-PLP. The input speech signal goes through spectral analysis using MFCC or PLP. The spectrum is then modified by a compressing static non-linearity and filtered by a band-pass filter. Another filter bank is then used to expand the non-linearity, and the coefficients are produced. This is shown in Figure 5.

5. FEATURE EXTRACTION TECHNIQUES

Earlier, a few feature extraction techniques that are single techniques were discussed, along with their strengths and weaknesses.
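The band-pass filtering step of RASTA can be illustrated on a single log-spectral trajectory. The filter constants below (a zero-sum FIR numerator with one pole near 0.94) follow common RASTA implementations and are an assumption here, not values taken from this paper.

```python
import numpy as np

def rasta_filter(log_spectrum, pole=0.94):
    """Band-pass filter each log-spectral trajectory over time, suppressing
    components that change much slower or faster than speech does.
    log_spectrum: array of shape (num_frames, num_bands)."""
    num = [0.2, 0.1, 0.0, -0.1, -0.2]   # FIR part: a smoothed time derivative
    out = np.zeros_like(log_spectrum)
    for b in range(log_spectrum.shape[1]):
        x = log_spectrum[:, b]
        y = np.zeros_like(x)
        for t in range(len(x)):
            acc = sum(num[k] * x[t - k] for k in range(5) if t - k >= 0)
            y[t] = acc + (pole * y[t - 1] if t > 0 else 0.0)
        out[:, b] = y
    return out

# One band: a constant channel offset plus a speech-rate modulation.
t = np.arange(200)
traj = 5.0 + np.sin(2 * np.pi * t / 20.0)
filtered = rasta_filter(traj[:, None])
print(filtered.shape)
```

Because the FIR numerator sums to zero, the filter's DC gain is zero: after the initial transient, the constant offset (a convolutive channel effect in the log domain) is removed while the speech-rate modulation passes through, which is the behavior the text attributes to RASTA.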
Technique: Mel Frequency Cepstral Coefficients (MFCC)
Application: Voice recognition system for security purposes
Strengths: Not based on linear characteristics, hence similar to the human auditory perception system [19], [20]; low correlation between coefficients [19]; provides good discrimination [19], [20]
Weaknesses: Limited representation of speech signals, since only the power spectrum is considered [19]; low robustness to noise

Technique: Perceptual Linear Predictive Analysis (PLP)
Application: Speech analysis
Strengths: Low-dimensional resultant feature vector [19]; reduced discrepancy between voiced and unvoiced speech [19]
Weaknesses: Spectral balance is easily altered by the communication channel, noise, and the equipment used [19]; dependent on the whole spectral balance [19]

Technique: Relative Spectral Perceptual Linear Prediction (RASTA-PLP)
Application: Spectrum factor analysis
Strengths: Spectral components that change slower or quicker than the rate of change of the speech signal are suppressed [19]; best used when there is a mismatch in the analog input channel between the development and fielded systems [20]
Weaknesses: Poor performance in clean speech environments [22]
Scientific Research and Education, Volume 2, Issue 11, pp. 2313-2321.
[16] Róisín Loughran, Alexandros Agapitos, Ahmed Kattan, Anthony Brabazon, 2017, Feature selection for speaker verification using genetic programming, Evolutionary Intelligence, Volume 10, Issue 1-2, pp. 1-21. https://doi.org/10.1007/s12065-016-0150-5
[17] Xuechuan Wang, Kuldip K. Paliwal, 2002, A Modified Minimum Classification Error (MCE) Training Algorithm for Dimensionality Reduction, Journal of VLSI Signal Processing 32.
[18] Marko V. Jankovic, Masashi Sugiyama, Probabilistic Principal Component Analysis Based on JoyStick Probability Selector. Available from: https://www.researchgate.net/publication/221533826
[19] Wenzhi Liao, Aleksandra Pižurica, Paul Scheunders, Wilfried Philips, Youguo Pi, 2013, Semisupervised Local Discriminant Analysis for Feature Extraction in Hyperspectral Images, IEEE Transactions on Geoscience and Remote Sensing, Vol. 51, No. 1. https://doi.org/10.1109/TGRS.2012.2200106
[20] Lahiru Dinalankara, 2017, Face Detection and Face Recognition Using Open Computer Vision Classifiers. Available from: https://www.researchgate.net/publication/318900718
[21] Anusuya, M., Katti, S., 2011, Front end analysis of speech recognition: a review, Int. J. Speech Technol., 14, (2), pp. 99-145. https://doi.org/10.1007/s10772-010-9088-7
[22] Sonia Sunny, David Peter S., K Poulose Jacob, A Comparative Study of Wavelet Based Feature Extraction Techniques in Recognizing Isolated Spoken Words. Available from: http://www.ijsps.com/uploadfile/2013/0710/20130710105020955.pdf
[23] S. Kadambe, P. Srinivasan, 1994, Application of adaptive wavelets for speech coding, Proceedings of IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis.
[24] Korba, M.C.A., Messadeg, D., Djemili, R.H.B., 2004, Robust speech recognition using perceptual wavelet denoising and mel-frequency product spectrum cepstral coefficient feature, Informatica, 32, pp. 283-288.
[25] Zhou, P., Tang, L.Z., Xu, D.F., 2009, Speech recognition algorithm of parallel subband HMM based on wavelet analysis and neural network, Inf. Technol. J., 8, pp. 796-800. https://doi.org/10.3923/itj.2009.796.800
[26] Veisi, H., Sameti, H., 2011, The integration of principal component analysis and cepstral mean subtraction in parallel model combination for robust speech recognition, Digit. Signal Process., 21, (1), pp. 36-53. https://doi.org/10.1016/j.dsp.2010.07.004
[27] Lee, J.Y., Hung, J., 2011, Exploiting principal component analysis in modulation spectrum enhancement for robust speech recognition, Eighth Int. Conf. on Fuzzy Systems and Knowledge Discovery (FSKD), Shanghai, pp. 1947-1951. https://doi.org/10.1109/FSKD.2011.6019893
[28] Garau, G., Renals, S., 2008, Combining spectral representations for large vocabulary continuous speech recognition, IEEE Trans. Audio Speech Language Process., 16, (3), pp. 508-518. https://doi.org/10.1109/TASL.2008.916519
[29] Fontaine, V., Ris, C., Leich, H., 1996, Nonlinear discriminant analysis with neural networks for speech recognition, Proc. EUSIPCO 96, EURASIP, pp. 1583-1586.
[30] Venkateswarlu, R.L.K., Kumari, R.V., Jayasri, G.V., 2011, Speech recognition using radial basis function neural network, Third Int. Conf. on Electronics Computer Technology (ICECT), Kanyakumari, pp. 441-445. https://doi.org/10.1109/ICECTECH.2011.5941788
[31] Dengfeng, K., Shuang, X., Bo, X., 2008, Optimization of tone recognition via applying linear discriminant analysis in feature extraction, Third Int. Conf. on Innovative Computing Information and Control (ICICIC), Dalian, Liaoning, China, pp. 528-531.
[32] Sonia Sunny, David Peter S., K Poulose Jacob, 2013, Design of a Novel Hybrid Algorithm for Improved Speech Recognition with Support Vector Machines Classifier, International Journal of Emerging Technology and Advanced Engineering, vol. 3, pp. 249-254.
[33] P. Kumar, A. Biswas, A. N. Mishra and M. Chandra, 2010, Spoken Language identification using hybrid feature extraction Methods, Journal of telecommunication, vol. 1, pp. 11-15.
[34] Shaurya Agarwal, Pushkin Kachroo, Emma Regentova, 2016, A hybrid model using logistic regression and wavelet transformation to detect traffic incidents, IATSS Research, Volume 40, Issue 1, pp. 56-63. https://doi.org/10.1016/j.iatssr.2016.06.001