Speech Recognition Using Discrete Hidden Markov Model: Department of ECE, Saveetha Engineering College, Chennai, India
ISSN 1990-9233
IDOSI Publications, 2015
DOI: 10.5829/idosi.mejsr.2015.23.07.22353
Abstract: In recent years, speech recognition has seen great development in the automation industry.
This paper proposes an Automatic Speech Recognition (ASR) system to facilitate interaction between humans and
electronic components. The main concern of this paper is the suppression of various noises to
achieve a robust speech recognition system. A Discrete Hidden Markov Model is used to increase the speed of
speech recognition. This paper explores the hardware realization of the desired speech recognition system on a
Field Programmable Gate Array (FPGA). The accuracy has to be increased to obtain clear and robust speech
recognition. The speech features can be extracted as cepstral coefficients by using warping filter
banks. The cepstral coefficients are used to increase the robustness of speech recognition. To minimize the
complexity of the desired ASR system, the number of coefficients has to be minimized. Speech-to-text
conversion is the main objective of this paper. This can be achieved by using an in-built function in Matlab
software.
Key words: Feature Extraction, Cepstral Coefficients, Discrete Cosine Transform, Discrete Hidden Markov Model
Middle-East J. Sci. Res., 23 (7): 1506-1511, 2015
(3.1)
Fig. 2: Feature Extraction technique
Modeling of the speech signal can be done under the assumption that such a small segment of speech is sufficiently stationary [4].

Section II describes the proposed methodology of this paper. Section III explores the steps followed in the feature extraction. Section IV explains the theory of the Discrete Hidden Markov Model. Section V explains the hardware architecture designed for implementation on the FPGA. Section VI discusses the results obtained for the desired system. Section VII presents the conclusions of this paper. Section VIII lists the papers that are referred to.

Proposed Methodology: The proposed methodology consists of the feature extraction module and the codebook generation. The proposed architecture is shown in Fig. 1, which lays out the steps followed in this paper. The original signal has to be split into frames.

Applying a Hamming window to each frame keeps the distortion introduced by windowing to a minimum [5].

Fast Fourier Transform: The time-domain signal is converted into the frequency domain by applying the fast Fourier transform (FFT) to every windowed frame. The output of the FFT consists of complex numbers with both real and imaginary parts. Real-time data has to be processed by the speech recognition system. The complex parts of the FFT output can be neglected [5]. Equation (3.2) describes the spectral domain.

(3.2)

Mel Frequency Filter Bank: Mel frequency analysis is preferable because it is based on human perception. The human ear resolves low frequencies more finely than high frequencies, so the perception of a speech signal is not linear in frequency. To obtain a linear scale, the Mel scale is used to warp the signal from the frequency domain onto the Mel scale. The conversion from frequency to the Mel scale can be done using equation (3.3).

(3.3)

The Mel filter bank is applied to the FFT values to convert them from the frequency domain onto the Mel scale. Triangular band-pass filters are used as the filter bank. They are non-uniformly spaced on the linear frequency axis and uniformly spaced on the Mel frequency axis, with a larger number of filters in the low-frequency region and a smaller number in the high-frequency region, as shown in Fig. 3.

Logarithm of Energies: The log-energy is computed as the logarithm of the sum of the filtered components for each filter. Equation (3.4) expresses the logarithm of the weighted sum of spectral values in each filter-bank channel.

(3.4)

At this stage, the number of rows equals the number of frames and the number of columns equals the number of filters in the filter bank.

Discrete Cosine Transform: Cepstral analysis applies the DCT to the log Mel-scale values to obtain the cepstral coefficients. The DCT expresses a finite set of data points as a sum of cosine functions. The DCT is similar to the DFT in its conversion process, but the DCT is preferable.

(3.5)

Discrete Hidden Markov Model: The Discrete Hidden Markov Model (DHMM) is used to accelerate speech recognition. A codebook is first generated for the feature vectors, and the feature vectors are then used to train the DHMM through the codebook. From the training samples, the upper and lower bounds of each element are calculated to generate the codebook. The range between the upper and lower bounds is divided into sub-intervals from which the feature vectors are extracted. The initial codebook is formed by randomizing the same number of vectors as the number of classes. The codebook can be initialized with values obtained from the feature vectors. DHMM is a probability-based classifier, and this paper uses it as a probability-based comparator. Although DHMM is a time-consuming process, it improves the efficiency of the desired system [6].

Hardware Architecture: The desired system can be implemented on the Altera DE1 board, where it can be evaluated and implemented through the System on Chip (SoC) architecture of the FPGA. The SoC architecture is shown in Fig. 4. All the algorithms of the desired methodology can be implemented on the Nios II processor. The Altera DE1 development board, which includes a Cyclone II FPGA, is used for this experiment. A push button is used for noise suppression. A toggle switch can be used as an input to the FPGA board for authentication purposes. The microphone can be used as an input for checking the robustness of the ASR. The Liquid Crystal Display is used to display the words produced by the speech-to-text conversion. An audio controller is used to receive the speech signal. The I2C protocol is used to control the registers of the platform [6].
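The windowing and Fast Fourier Transform steps of the feature extraction can be illustrated with a short Python sketch. It is not the implementation used in this paper: the Hamming coefficients 0.54 and 0.46 are the standard textbook values and are assumed here, the frame length is assumed to be a power of two, and the names `hamming`, `fft` and `power_spectrum` are hypothetical. Keeping only the squared magnitude is one common way of neglecting the complex parts of the FFT output.

```python
import cmath
import math

def hamming(n_samples):
    """Standard Hamming window (assumed coefficients 0.54 and 0.46)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def power_spectrum(frame):
    """Window one frame, transform it, keep the real-valued power per bin."""
    window = hamming(len(frame))
    windowed = [s * w for s, w in zip(frame, window)]
    spectrum = fft(windowed)
    half = len(frame) // 2 + 1          # bins from 0 Hz up to Nyquist
    return [abs(c) ** 2 / len(frame) for c in spectrum[:half]]

# Illustrative frame: a 1 kHz tone sampled at 8 kHz (assumed values).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]
ps = power_spectrum(frame)
peak_bin = max(range(len(ps)), key=ps.__getitem__)
print(peak_bin * sample_rate / len(frame))   # frequency of the strongest bin in Hz
```

Stacking the output of `power_spectrum` for every frame gives a matrix with one row per frame, which is the form consumed by the Mel filter bank stage.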
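The Mel Frequency Filter Bank and Logarithm of Energies steps can be sketched in the same way. The sketch below assumes the widely used mapping Mel(f) = 2595 log10(1 + f/700), which may differ in form from the equation used in this paper. The filter count, FFT size and sampling rate are illustrative values, and the function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Assumed Hz-to-Mel mapping (a widely used form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel axis."""
    top_mel = hz_to_mel(sample_rate / 2.0)
    mel_points = [top_mel * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    n_bins = n_fft // 2 + 1
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, centre):        # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):       # peak and falling edge
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank

def log_energies(power_frames, bank, floor=1e-10):
    """Log of the weighted sum of spectral values in each filter channel."""
    return [[math.log(max(sum(w * p for w, p in zip(row, frame)), floor))
             for row in bank]
            for frame in power_frames]

bank = mel_filter_bank(n_filters=20, n_fft=256, sample_rate=8000)
flat_frames = [[1.0] * 129 for _ in range(3)]    # three dummy power spectra
features = log_energies(flat_frames, bank)
print(len(features), len(features[0]))           # rows = frames, columns = filters
```

Because the filter edges are equally spaced in Mel, most of the filter peaks fall in the lower half of the linear frequency axis, which matches the denser low-frequency coverage described for the filter bank.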
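The Discrete Cosine Transform step can be sketched as a plain DCT-II applied to the log filter-bank energies of one frame. The DCT-II form, the absence of any scaling factor and the name `dct_ii` are assumptions made for illustration. Keeping only the first few outputs is one way to limit the number of cepstral coefficients.

```python
import math

def dct_ii(values, n_coeffs):
    """Unnormalised DCT-II: c[n] = sum_i x[i] * cos(pi * n * (i + 0.5) / M)."""
    m = len(values)
    return [sum(v * math.cos(math.pi * n * (i + 0.5) / m)
                for i, v in enumerate(values))
            for n in range(n_coeffs)]

# Illustrative log-energies for one frame with 8 filters (made-up numbers).
log_energy_frame = [2.0, 1.5, 1.0, 0.8, 0.5, 0.3, 0.2, 0.1]
cepstral = dct_ii(log_energy_frame, n_coeffs=4)   # keep 4 coefficients
print(cepstral)
```

Unlike the DFT, every term here is real, so the result is a real-valued vector that can be passed directly to the codebook stage.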
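The codebook generation described for the Discrete Hidden Markov Model can be read in more than one way. The sketch below shows one possible reading under stated assumptions: the bounds are taken per vector element, the initial codewords are drawn uniformly at random between those bounds (the division into sub-intervals is simplified away), and each feature vector is mapped to its nearest codeword by Euclidean distance. All function names are hypothetical.

```python
import random

def element_bounds(vectors):
    """Lower and upper bound of every element over the training vectors."""
    dims = range(len(vectors[0]))
    lower = [min(v[d] for v in vectors) for d in dims]
    upper = [max(v[d] for v in vectors) for d in dims]
    return lower, upper

def initial_codebook(vectors, n_classes, seed=0):
    """One random codeword per class, drawn inside the element bounds."""
    rng = random.Random(seed)
    lower, upper = element_bounds(vectors)
    return [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
            for _ in range(n_classes)]

def quantize(vector, codebook):
    """Index of the nearest codeword; this index is the discrete symbol."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))

# Made-up two-dimensional training vectors.
training = [[0.0, 1.0], [2.0, 5.0], [4.0, 3.0], [1.0, 2.0]]
codebook = initial_codebook(training, n_classes=3)
symbols = [quantize(v, codebook) for v in training]
print(symbols)
```

The resulting symbol sequence is what a discrete HMM consumes, since its emission probabilities are defined over a finite alphabet rather than over continuous feature vectors.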
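Using the DHMM as a probability-based comparator can be illustrated with the forward algorithm: each word has its own model, and the word whose model assigns the highest likelihood to the observed symbol sequence is chosen. The sketch below is a minimal example under assumptions; the two models and their probabilities are invented for illustration, and the names `forward_likelihood` and `recognise` are hypothetical.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n_states))
                 * emit[s][symbol]
                 for s in range(n_states)]
    return sum(alpha)

def recognise(obs, models):
    """Return the word whose model gives the observation the highest likelihood."""
    return max(models, key=lambda word: forward_likelihood(obs, *models[word]))

# Two invented left-to-right models over a 2-symbol codebook:
# (start probabilities, transition matrix, emission matrix).
models = {
    "yes": ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]),
    "no":  ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.1, 0.9], [0.8, 0.2]]),
}
print(recognise([0, 0, 1, 1], models))
```

For long utterances the raw probabilities underflow, so a practical version would scale `alpha` at each step or work with log-probabilities.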
CONCLUSION
The speech signal can be processed, trained and compared with the feature vectors obtained by processing the speech. The DHMM technique is a slightly time-consuming process, but it provides accuracy for robust speech recognition.
Future work is to implement the system on the Altera DE1 FPGA starter kit, which can also be used to convert speech to text. The research work can be extended to activate voice-controlled devices for authentication purposes.
REFERENCES
1. Yuan Mang, 2004. Speech Recognition on DSP: Algorithm on optimization & performance analysis, The Chinese University of Hong Kong, pp: 1-18.
2. Huggins-Daines, D., M. Kumar, A. Chan, A. Black, M. Ravishankar and A. Rudnicky, 2006. Pocketsphinx: A free, real-time continuous speech recognition system for hand-held devices, in Proceedings of ICASSP.
3. Rumia Sultana and Rajesh Palit, 2014. A Survey on Bengali Speech-To-Text Recognition Techniques, The 9th International Forum on Strategic Technology, Cox's Bazar, Bangladesh.
4. Muda Lindasalwa, Mumtaj Begam and I. Elamvazuthi, 2010. Voice recognition algorithm using MFCC & DTW techniques, Journal of Computing, ISSN 2151-9617, 2(3): 138-143.
5. Joshi, Siddhant C. and Dr. A.N. Cheeran, 2014. MATLAB Based Feature Extraction Using Mel Frequency Cepstrum Coefficients for Automatic Speech Recognition, IJSETR, 3(6).
6. Pan Shing-Tai and Xu-Yu Li, 2012. An FPGA Based Embedded Robust Speech Recognition System Designed By Combining Empirical Mode Decomposition and a Genetic Algorithm, IEEE Trans. on Instrumentation and Measurement, 61(9).