Feature Extraction Techniques For Speech Processing A Review
Mohammed Arif Mazumder et al., International Journal of Advanced Trends in Computer Science and Engineering, 8(1.3), 2019, 285-292
Available Online at http://www.warse.org/IJATCSE/static/pdf/file/ijatcse5481.32019.pdf
https://doi.org/10.30534/ijatcse/2019/5481.32019
1. INTRODUCTION
3. FEATURE EXTRACTION TECHNIQUES

The speech signal can be retrieved directly from the digitized waveform [5]. The large amount of data in a speech signal requires suitable and reliable feature extraction techniques, which can improve performance and make computation more effective. Feature extraction also captures various sources of information, such as whether the sound is voiced or unvoiced and whether the speech is affected by noise [6].

3.1 Linear Predictive Coding (LPC)

In Linear Predictive Coding (LPC) analysis, each speech sample is approximated as a linear combination of past speech samples. LPC is a frame-based analysis of the speech signal [7]. The LPC feature extraction process is shown in Figure 2. The input speech signal is blocked into frames of samples, with adjacent frames separated. Each individual frame is windowed in order to minimize signal discontinuities [8]. Each windowed frame is then autocorrelated, and each frame of autocorrelations is converted into an LPC parameter set using Durbin's method [8]. The LPC feature vectors are then created.

3.2 Mel Frequency Cepstral Coefficients (MFCC)

First, the input speech signal is divided into overlapping frames. Windowing is applied, and each frame is subjected to the fast Fourier transform. In the next step, the frequency-domain signal is converted to the Mel frequency scale. The log Mel-scale spectrum is then converted to the time domain using the Discrete Cosine Transform (DCT) [9]. The result of this conversion is called the Mel Frequency Cepstral Coefficients. MFCC mainly concentrates on the static characteristics of a signal.

3.3 Perceptual Linear Prediction (PLP)

Perceptual Linear Prediction discards irrelevant information, such as noise and components not similar to the human voice. PLP is very similar to LPC, but PLP is closer to the human auditory system. The process of PLP is shown in Figure 4 [10].
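The LPC steps above (frame blocking, windowing, autocorrelation, Durbin's recursion) can be sketched in Python. This is a minimal illustration, not the paper's implementation; the frame size, overlap, and model order are arbitrary choices, and the sine-plus-noise input stands in for real speech.

```python
import numpy as np

def lpc_coefficients(frame, order=10):
    """LPC for one windowed frame: autocorrelation followed by
    the Levinson-Durbin recursion (Durbin's method)."""
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this recursion step.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a

# Toy stand-in for one second of speech at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(fs)

# Frame blocking (30 ms frames, 50% overlap) and Hamming windowing.
frame_len, hop = 240, 120
window = np.hamming(frame_len)
frames = [signal[i:i + frame_len] * window
          for i in range(0, len(signal) - frame_len, hop)]
lpc_features = np.array([lpc_coefficients(f, order=10) for f in frames])
print(lpc_features.shape)  # one (order + 1)-element parameter set per frame
```

Each row of `lpc_features` is the LPC parameter set for one frame, which is exactly the per-frame feature vector the text describes.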
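The MFCC pipeline described above (framing, windowing, FFT, Mel filterbank, log, DCT) can likewise be sketched. The sampling rate, filter count, and number of coefficients below are illustrative assumptions, not values from this paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, fs=8000, frame_len=256, hop=128, n_filters=26, n_ceps=13):
    """Framing -> windowing -> FFT -> Mel filterbank -> log -> DCT."""
    window = np.hamming(frame_len)
    frames = np.array([signal[i:i + frame_len] * window
                       for i in range(0, len(signal) - frame_len, hop)])
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular Mel-spaced filterbank over the FFT bins.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, frame_len // 2 + 1))
    for j in range(1, n_filters + 1):
        l, c, r = bins[j - 1], bins[j], bins[j + 1]
        fbank[j - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[j - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log Mel-scale spectrum (small constant avoids log(0)).
    log_mel = np.log(power @ fbank.T + 1e-10)
    # DCT-II to decorrelate, keeping the first n_ceps coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return log_mel @ dct.T

rng = np.random.default_rng(0)
sig = rng.standard_normal(8000)   # toy stand-in for one second of speech
feats = mfcc(sig)
print(feats.shape)                # one n_ceps-element MFCC vector per frame
```

The DCT at the end is what makes the coefficients "cepstral": it compacts the log Mel spectrum into a small, weakly correlated feature vector per frame.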
PLP performs poorly when dealing with distortions. This is overcome by the introduction of RASTA-PLP. The input speech signal goes through spectral analysis using MFCC or PLP. The spectrum is then modified by a compressing static non-linearity and filtered by a band-pass filter. Another filter bank is then used to expand the non-linearity, and the coefficients are produced. This is shown in Figure 5.

5. FEATURE EXTRACTION TECHNIQUES

Earlier, a few feature extraction techniques that are single techniques were discussed, along with their strengths and weaknesses.
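The band-pass filtering step of RASTA can be illustrated on a single log-spectral trajectory. The filter constants below (a zero-sum FIR numerator with one pole near 0.94) follow common RASTA implementations and are an assumption here, not values taken from this paper.

```python
import numpy as np

def rasta_filter(log_spectrum, pole=0.94):
    """Band-pass filter each log-spectral trajectory over time, suppressing
    components that change much slower or faster than speech does.
    log_spectrum: array of shape (num_frames, num_bands)."""
    num = [0.2, 0.1, 0.0, -0.1, -0.2]   # FIR part: a smoothed time derivative
    out = np.zeros_like(log_spectrum)
    for b in range(log_spectrum.shape[1]):
        x = log_spectrum[:, b]
        y = np.zeros_like(x)
        for t in range(len(x)):
            acc = sum(num[k] * x[t - k] for k in range(5) if t - k >= 0)
            y[t] = acc + (pole * y[t - 1] if t > 0 else 0.0)
        out[:, b] = y
    return out

# One band: a constant channel offset plus a speech-rate modulation.
t = np.arange(200)
traj = 5.0 + np.sin(2 * np.pi * t / 20.0)
filtered = rasta_filter(traj[:, None])
print(filtered.shape)
```

Because the FIR numerator sums to zero, the filter's DC gain is zero: after the initial transient, the constant offset (a convolutive channel effect in the log domain) is removed while the speech-rate modulation passes through, which is the behavior the text attributes to RASTA.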
Technique: Mel Frequency Cepstral Coefficients (MFCC)
Application: Voice recognition system for security purposes
Strengths: Not based on linear characteristics, hence similar to the human auditory perception system [19], [20]; low correlation between coefficients [19]; provides good discrimination [19], [20]
Weaknesses: Limited representation of speech signals, since only the power spectrum is considered [19]; low robustness to noise

Technique: Perceptual Linear Predictive Analysis (PLP)
Application: Speech analysis
Strengths: Low-dimensional resultant feature vector [19]; reduced discrepancy between voiced and unvoiced speech [19]
Weaknesses: Spectral balance is easily altered by the communication channel, noise, and the equipment used [19]; dependent on the whole spectral balance [19]

Technique: Relative Spectral Perceptual Linear Prediction (RASTA-PLP)
Application: Spectrum factor analysis
Strengths: Spectral components that change slower or quicker than the rate of change of the speech signal are suppressed [19]; best used when there is a mismatch in the analog input channel between the development and fielded systems [20]
Weaknesses: Poor performance in clean speech environments [22]
Scientific Research and Education, Volume 2, Issue 11, pp. 2313-2321.
[16] Róisín Loughran, Alexandros Agapitos, Ahmed Kattan, Anthony Brabazon, 2017, Feature selection for speaker verification using genetic programming, Evolutionary Intelligence, Volume 10, Issue 1-2, pp. 1-21. https://doi.org/10.1007/s12065-016-0150-5
[17] Xuechuan Wang, Kuldip K. Paliwal, 2002, A Modified Minimum Classification Error (MCE) Training Algorithm for Dimensionality Reduction, Journal of VLSI Signal Processing 32.
[18] Marko V. Jankovic, Masashi Sugiyama, Probabilistic Principal Component Analysis Based on JoyStick Probability Selector. Available from: https://www.researchgate.net/publication/221533826
[19] Wenzhi Liao, Aleksandra Pižurica, Paul Scheunders, Wilfried Philips, Youguo Pi, 2013, Semisupervised Local Discriminant Analysis for Feature Extraction in Hyperspectral Images, IEEE Transactions on Geoscience and Remote Sensing, Vol. 51, No. 1. https://doi.org/10.1109/TGRS.2012.2200106
[20] Lahiru Dinalankara, 2017, Face Detection and Face Recognition Using Open Computer Vision Classifiers. Available from: https://www.researchgate.net/publication/318900718
[21] Anusuya, M., Katti, S., 2011, Front end analysis of speech recognition: a review, Int. J. Speech Technol., 14, (2), pp. 99-145. https://doi.org/10.1007/s10772-010-9088-7
[22] Sonia Sunny, David Peter S., K Poulose Jacob, A Comparative Study of Wavelet Based Feature Extraction Techniques in Recognizing Isolated Spoken Words. Available from: http://www.ijsps.com/uploadfile/2013/0710/20130710105020955.pdf
[23] S. Kadambe, P. Srinivasan, 1994, Application of adaptive wavelets for speech coding, Proceedings of IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis.
[24] Korba, M.C.A., Messadeg, D., Djemili, R.H.B., 2004, Robust speech recognition using perceptual wavelet denoising and mel-frequency product spectrum cepstral coefficient feature, Informatica, 32, pp. 283-288.
[25] Zhou, P., Tang, L.Z., Xu, D.F., 2009, Speech recognition algorithm of parallel subband HMM based on wavelet analysis and neural network, Inf. Technol. J., 8, pp. 796-800. https://doi.org/10.3923/itj.2009.796.800
[26] Veisi, H., Sameti, H., 2011, The integration of principal component analysis and cepstral mean subtraction in parallel model combination for robust speech recognition, Digit. Signal Process., 21, (1), pp. 36-53. https://doi.org/10.1016/j.dsp.2010.07.004
[27] Lee, J.Y., Hung, J., 2011, Exploiting principal component analysis in modulation spectrum enhancement for robust speech recognition, Eighth Int. Conf. on Fuzzy Systems and Knowledge Discovery (FSKD), Shanghai, pp. 1947-1951. https://doi.org/10.1109/FSKD.2011.6019893
[28] Garau, G., Renals, S., 2008, Combining spectral representations for large vocabulary continuous speech recognition, IEEE Trans. Audio Speech Language Process., 16, (3), pp. 508-518. https://doi.org/10.1109/TASL.2008.916519
[29] Fontaine, V., Ris, C., Leich, H., 1996, Nonlinear discriminant analysis with neural networks for speech recognition, Proc. EUSIPCO 96, EURASIP, pp. 1583-1586.
[30] Venkateswarlu, R.L.K., Kumari, R.V., Jayasri, G.V., 2011, Speech recognition using radial basis function neural network, Third Int. Conf. on Electronics Computer Technology (ICECT), Kanyakumari, pp. 441-445. https://doi.org/10.1109/ICECTECH.2011.5941788
[31] Dengfeng, K., Shuang, X., Bo, X., 2008, Optimization of tone recognition via applying linear discriminant analysis in feature extraction, Third Int. Conf. on Innovative Computing Information and Control (ICICIC), Dalian, Liaoning, China, pp. 528-531.
[32] Sonia Sunny, David Peter S., K Poulose Jacob, 2013, Design of a Novel Hybrid Algorithm for Improved Speech Recognition with Support Vector Machines Classifier, International Journal of Emerging Technology and Advanced Engineering, vol. 3, pp. 249-254.
[33] P. Kumar, A. Biswas, A. N. Mishra and M. Chandra, 2010, Spoken Language identification using hybrid feature extraction Methods, Journal of telecommunication, vol. 1, pp. 11-15.
[34] Shaurya Agarwal, Pushkin Kachroo, Emma Regentova, 2016, A hybrid model using logistic regression and wavelet transformation to detect traffic incidents, IATSS Research, Volume 40, Issue 1, pp. 56-63. https://doi.org/10.1016/j.iatssr.2016.06.001