Trainability: Networks are trained to form associations between the input layer and the output layer. This ability can be used to train a network to classify speech patterns into phoneme categories.

Generalization: Networks don't just memorize the training data; rather, they learn the underlying patterns, so they can generalize from the training data to new examples. This is essential in speech recognition, because acoustical patterns are never exactly the same, only similar [17].

B. Perceptrons and multi-layer perceptrons
The perceptron is one of the types of feed-forward neural networks. A perceptron is a simple neuron model that consists of a set of inputs, a weight associated with each input, and an activation function. The neuron applies the activation function to the weighted sum of its inputs before sending the value to its output [17]. The perceptron model is shown in Figure 1, where x is an input vector, w is a weight vector, and the activation function is a step function.
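As a brief illustration of this model, the following is a minimal sketch of a perceptron's forward computation in Python; the input values, weights, and bias are illustrative assumptions, not values taken from the paper:

    import numpy as np

    def perceptron(x, w, bias):
        # Weighted sum of the inputs followed by a step activation:
        # the output is 1 if the sum exceeds zero, and 0 otherwise.
        return 1 if np.dot(w, x) + bias > 0 else 0

    # Illustrative example: a perceptron with three inputs.
    x = np.array([0.5, -1.0, 2.0])   # input vector x
    w = np.array([0.8, 0.2, 0.1])    # weight vector w
    print(perceptron(x, w, bias=-0.3))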
Previous instantiations of the neural network approach have used the backpropagation algorithm to train the neural networks discriminatively [6]. The next sections describe these training schemes in detail.
3.1 Supervised learning
In supervised learning, the training data is first classified, either by an existing speech recognition system or manually. The network is then trained on this data to learn the classification. The network iteratively adjusts its weights to minimize a given cost function E. In the error backpropagation algorithm, the error is first calculated at the output layer and then sent back through the network, where the partial derivatives of the error are computed. Each neuron's weights are updated, and a new iteration begins; this algorithm is used for the weight updates [17].
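The following is a minimal runnable sketch of error backpropagation for a one-hidden-layer network; the sigmoid activation, squared-error cost E, toy data, and hyperparameters are illustrative assumptions, since the paper does not fix a particular activation or cost:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy training set: four 2-dimensional patterns with target labels.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    y = np.array([[0.0], [1.0], [1.0], [0.0]])

    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.5, size=(2, 4))  # input-to-hidden weights
    W2 = rng.normal(scale=0.5, size=(4, 1))  # hidden-to-output weights
    b1, b2 = np.zeros(4), np.zeros(1)
    lr = 0.5                                 # learning rate

    for _ in range(5000):
        h = sigmoid(X @ W1 + b1)    # forward pass: hidden activations
        out = sigmoid(h @ W2 + b2)  # forward pass: network output
        # Error is first calculated at the output layer...
        delta_out = (out - y) * out * (1.0 - out)
        # ...then sent back through the network (partial derivatives of E).
        delta_h = (delta_out @ W2.T) * h * (1.0 - h)
        # Each layer's weights are updated before the next iteration.
        W2 -= lr * (h.T @ delta_out)
        b2 -= lr * delta_out.sum(axis=0)
        W1 -= lr * (X.T @ delta_h)
        b1 -= lr * delta_h.sum(axis=0)

    print(out.round(2))  # outputs approach the targets as E shrinks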
3.2 Unsupervised learning
In unsupervised learning there is no need to define a target output; the network tries to discover the underlying patterns or trends in the input data alone. No labeled data is given to the network. Instead, a similarity measure such as the cosine distance is used to find input vectors whose mutual distance, according to that measure, is small.
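As a simple illustration, the sketch below groups unlabeled vectors by cosine distance; the threshold value and the random data are illustrative assumptions:

    import numpy as np

    def cosine_distance(a, b):
        # 1 - cosine similarity: small when two vectors point the same way.
        return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Place each unlabeled vector in the first group whose representative
    # lies within a chosen distance threshold (threshold is illustrative).
    threshold = 0.2
    groups = []
    for x in np.random.default_rng(1).normal(size=(20, 8)):
        for g in groups:
            if cosine_distance(x, g[0]) < threshold:
                g.append(x)
                break
        else:
            groups.append([x])
    print(len(groups), "groups found")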
4. Hybrid HMM/NN Models
The combination of the hidden Markov model and the neural network emerged as an alternative paradigm for ASR in the late 1980s. In a hybrid NN-HMM model, each output unit of the NN is trained to estimate the posterior probability of a continuous-density HMM state given the acoustic observations [7]. Using a combination of HMM and NN for speech recognition gives better results than GMMs: compared to GMMs, neural networks give the same performance but require a smaller number of parameters [17].
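During decoding, the NN's state posteriors are commonly converted into scaled likelihoods by dividing by the state priors (Bayes' rule), so that they can stand in for GMM likelihoods inside the HMM. The sketch below illustrates this standard conversion; the frame posteriors, priors, and flooring constant are illustrative assumptions:

    import numpy as np

    def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
        # Bayes' rule: p(o|s) = P(s|o) * p(o) / P(s). The p(o) term is the
        # same for every state, so dividing the NN posteriors by the state
        # priors yields "scaled likelihoods" usable in HMM decoding.
        posteriors = np.maximum(posteriors, floor)
        priors = np.maximum(priors, floor)
        return np.log(posteriors) - np.log(priors)

    # Illustrative example: posteriors over 3 HMM states for one frame,
    # with priors estimated from state frequencies in the training data.
    frame_posteriors = np.array([0.7, 0.2, 0.1])  # NN output (softmax)
    state_priors = np.array([0.5, 0.3, 0.2])
    print(scaled_log_likelihoods(frame_posteriors, state_priors))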
Most of the work on the hybrid approach used context-independent phone states as labels for NN training and considered small-vocabulary tasks. ANN-HMMs were later extended to model context-dependent phones and were applied to mid-vocabulary and some large-vocabulary ASR tasks [7]. This hybrid approach has its limitations: using only backpropagation to train the network makes it challenging to exploit more than two hidden layers well. Neural networks are widely used in the fields of language processing and speech recognition, where they have found numerous applications. Neural networks, particularly deep networks with many hidden layers, are capable of modeling complex structures [9].
Three main reasons were responsible for the adoption of neural networks as high-quality acoustic models: (1) making the networks deeper makes them more powerful, hence deep neural networks (DNNs); (2) initializing the weights sensibly and using much faster hardware makes it possible to train deep neural networks effectively; and (3) using a larger number of output units greatly improves their performance [12]. Compared to other networks, deep neural networks have a higher modeling capacity with the same number of parameters, but they are harder to train, both as stochastic top-down generative models and as deterministic bottom-up discriminative models [9]. The DNN architecture can be used for multi-task learning in several different ways, and DNNs are far more effective than GMMs at leveraging data from one task to improve performance on related tasks [12].
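To make the notion of depth concrete, here is a minimal sketch of a deep feed-forward acoustic model in which depth is simply the number of stacked hidden layers; the layer sizes, ReLU hidden units, and softmax output over tied HMM states are illustrative assumptions rather than details fixed by the paper:

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def dnn_forward(frame, weights, biases):
        # A DNN is a stack of hidden layers: depth is just the number
        # of weight matrices applied before the output layer.
        a = frame
        for W, b in zip(weights[:-1], biases[:-1]):
            a = relu(W @ a + b)
        # Output layer: one softmax unit per (tied) HMM state.
        return softmax(weights[-1] @ a + biases[-1])

    # Example: a 5-hidden-layer DNN mapping a 39-dim acoustic frame to
    # posteriors over 1000 HMM states (all sizes are illustrative).
    rng = np.random.default_rng(2)
    sizes = [39] + [512] * 5 + [1000]
    weights = [rng.normal(scale=0.1, size=(m, n))
               for n, m in zip(sizes, sizes[1:])]
    biases = [np.zeros(m) for m in sizes[1:]]
    posteriors = dnn_forward(rng.normal(size=39), weights, biases)
    print(posteriors.sum())  # softmax posteriors sum to 1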
5. Conclusion
This paper has shown that neural networks are among the most important techniques in the field of speech recognition. They have brought a new solution for large databases, and they remain an interesting research area. Neural networks act like a human brain. In this paper we gave an overview of the types of neural networks and of the hybrid HMM/NN architecture. The hybrid architecture of HMM and NN works well for the acoustic model of speech recognition; later, deep neural networks became involved, and they also work well in noise. Research is ongoing in this field to find out more facts about neural networks.

6. References
1. M. A. Anusuya, S. K. Katti, "Speech Recognition by Machine: A Review," (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, pp. 181-205, 2009.
2. Bo Li and Khe Chai Sim, "A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks," IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 22, No. 8, pp. 1296-1305, August 2014.
3. Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, Vol. 2, No. 1, pp. 1-127, 2009.
4. Xiaohui Zhang, Jan Trmal, Daniel Povey, Sanjeev Khudanpur, "Improving Deep Neural Network Acoustic Models Using Generalized Maxout Networks," IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 214-219, 2014.
5. Xicai Yue, Datian Ye, Chongxun Zheng, Xiaoyu Wu, "Neural networks for improved text-independent speaker identification," IEEE Engineering in Medicine and Biology, pp. 53-58, April 2002.
6. Abdel-rahman Mohamed, George E. Dahl, and Geoffrey Hinton, "Acoustic Modeling Using Deep Belief Networks," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, No. 1, pp. 14-22, January 2012.
7. George E. Dahl, Dong Yu, Li Deng, and Alex Acero, "Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, No. 1, pp. 30-42, January 2012.
8. Ke Chen, Ahmad Salman, "Learning Speaker-Specific Characteristics with a Deep Neural Architecture," IEEE Transactions on Neural Networks, Vol. 22, No. 11, pp. 1744-1756, November 2011.
9. Ruhi Sarikaya, Geoffrey E. Hinton, and Anoop Deoras, "Application of Deep Belief Networks for Natural Language Understanding," IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 22, No. 4, pp. 778-784, April 2014.
10. Geoffrey Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, and Brian Kingsbury, "Deep neural networks for acoustic modeling in speech recognition," IEEE Signal Processing Magazine, pp. 82-97, November 2012.
11. Li Deng, Geoffrey Hinton, and Brian Kingsbury, "New Types of Deep Neural Network Learning for Speech Recognition and Related Applications: An Overview," IEEE publication, pp. 8599-8603, 2013.
12. Jonas Gehring, Wonkyum Lee, Kevin Kilgour, Ian Lane, Yajie Miao, Alex Waibel, "Modular Combination of Deep Neural Networks for Acoustic Modeling," INTERSPEECH 2013.