
International Journal of Multidisciplinary Research and Development
Volume: 2, Issue: 6, 226-229
June 2015
www.allsubjectjournal.com
e-ISSN: 2349-4182
p-ISSN: 2349-5979
Impact Factor: 3.762

Speech Recognition Using Neural Networks: A Review

Dhavale Dhanashri, S.B. Dhonde
Dhavale Dhanashri
Department of Electronics, AISSMS-IOIT (affiliated to Savitribai Phule Pune University), Pune, Maharashtra, India.

S.B. Dhonde
Department of Electronics, AISSMS-IOIT, Pune, Maharashtra, India.

Correspondence: Dhavale Dhanashri, Department of Electronics, AISSMS-IOIT (affiliated to Savitribai Phule Pune University), Pune, Maharashtra, India.

Abstract
This review paper first introduces the main types of neural networks and then examines the hybrid architecture that combines hidden Markov models (HMMs) with neural networks (NNs). Developments in the field of neural networks are discussed. Deep neural networks are now widely used in automatic speech recognition (ASR) systems, where they give better performance than the traditional Gaussian mixture model (GMM); used in a hybrid architecture with HMMs, deep neural networks also outperform HMM-GMM systems.

Keywords: Speech recognition system, Neural network, Feedforward neural network, Recurrent neural network

1. Introduction
Speech is considered the easiest means of communication. It is widely used for recognizing people by their voices and for understanding others, and it is a useful interface for interacting with machines. Speech recognition is the process of converting speech signals into words by means of an algorithm implemented as a computer program [1]. Speech recognition technology is very useful for enabling computers to understand human voices and languages. There are three approaches to speech recognition: template based, knowledge based, and statistical [20].

Template-based approaches compare the incoming speech against a number of stored templates, and the best match is taken as the output. Segmentation errors can be avoided in this approach, but because the recorded templates are fixed, covering the variations in speech requires many templates, which quickly becomes impractical. This is the main disadvantage of the approach.

Knowledge-based approaches use linguistic, phonetic and spectrogram information. Speech variations can be modeled in this approach, but the required knowledge is difficult to obtain, so the approach was judged impractical and automatic learning procedures were sought instead.

Statistical approaches model the variations in speech statistically using automatic learning procedures; the hidden Markov model is the standard tool. The statistical models make a priori modeling assumptions, which are liable to be inaccurate and to degrade system performance. This disadvantage can be reduced by using neural networks [20, 1].

People understand speech well, and many speech and speaker recognition systems have been developed on the basis of this observation. Neural networks have a long history in speech recognition and are mostly used in the acoustic model [19]. The purpose of this review paper is to understand the proper usage of neural networks in the field of speech processing. The paper explains the different types of neural networks used for speech recognition [17], giving a basic idea of what they are and why they are used. Much recent work has been done in the field of neural networks, and the results show that neural networks improve the performance of speech recognition systems [16]. This paper deals with the different types of neural networks and with the hybrid HMM/NN model for speech recognition.

2. Neural Networks in the Field of Speech Recognition
A neural network is a mathematical model used to perform a particular function; it works like a human brain. Using machine learning algorithms, computers are trained to perform tasks on their own. Several properties of neural networks deserve attention:

Trainability. Networks can be trained to form associations between an input layer and an output layer. This ability can be used to teach a network how to classify speech patterns into phoneme categories.

Generalization. Networks do not just memorize the training data; rather, they learn the underlying patterns, so they can generalize from the training data to new examples. This is essential in speech recognition, because acoustical patterns are never exactly the same.

Nonlinearity. Networks can compute nonlinear, nonparametric functions of their input, enabling them to perform arbitrarily complex transformations of data. This is useful since speech is a highly nonlinear process.

Robustness. Networks are tolerant of both physical damage and noisy data; in fact, noisy data can help the networks form better generalizations. This is a valuable feature, because speech patterns are notoriously noisy.

Uniformity. Networks offer a uniform computational paradigm which can easily integrate constraints from different types of inputs. This makes it easy to use both basic and differential speech inputs, for example, or to combine acoustic and visual cues in a multimodal system.

Parallelism. Networks are highly parallel in nature, so they are well suited to implementation on massively parallel computers. This will ultimately permit very fast processing of speech or other data [20].

Artificial neural networks are the approach used for machine learning, and these machine learning algorithms have led to many improvements in the field of speech recognition [10]. An artificial neural network is a system of nodes (neurons) interconnected with each other; it works similarly to the human brain [17]. It contains many connected processing elements that influence each other's behavior via a network of weights [20]. Each unit in the network computes a nonlinear weighted sum of its inputs and passes the result on to other units through its outgoing connections [20].

2.1 Different types of neural networks
Researchers have explored many types of neural networks. Neural networks are used to map complex inputs to simple outputs; for example, they perform static pattern recognition such as N-ary classification of the input patterns [20].

A. Feedforward neural networks
In a feedforward network the connections run one way, with no backward loops: a neuron in layer a can only send data to a neuron in layer b if b > a. Adjacent layers can be fully connected, as in multilayer perceptrons, and there may also be shortcut connections between non-adjacent layers [17]. Each input is associated with one output, so the mapping is called static. Time-delay neural networks use feedforward connections with weighted delays to classify sequential data [17]. The mapping from inputs to outputs is fixed, because the weights of a feedforward network do not change after training. The state of any neuron is determined by its input and output pattern and not by its initial or past state; the network is therefore static, with no dynamics involved. It is the most popular type of neural network in use today.

B. Perceptrons and multi-layer perceptrons
The perceptron is one type of feedforward neural network. A perceptron is a simple neuron model consisting of a set of inputs, a weight attached to each input, and an activation function. The neuron applies the activation function to the weighted sum of its inputs before sending the value to its output [17]. The perceptron model is shown in Figure 1, where x is an input vector, w is a weight vector and the activation function is a step function.

Fig 1: Model of perceptron

A multilayer perceptron (MLP) has at least two layers of perceptrons: an input layer, one or more hidden layers, and an output layer. The hidden layer works as a feature extractor, using a nonlinear function such as a sigmoid or a radial basis function [17]; it consists of neurons with nonlinear sigmoidal activation functions [19]. The number of neurons in the hidden layer depends on the amount of input data, the number of neurons in the output layer, the required generalization capacity of the network, and the size of the training set [19]. The outputs of all the neurons in the hidden layer act as inputs to the next layer.
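As a concrete illustration, the following minimal NumPy sketch implements the perceptron of Fig 1 (a step activation over a weighted sum) and the forward pass of a one-hidden-layer MLP with sigmoidal hidden units. It is not taken from the reviewed papers; the function names, weight shapes and bias handling are illustrative assumptions.

```python
import numpy as np

def perceptron(x, w, b=0.0):
    # Step activation applied to the weighted sum of the inputs (Fig 1).
    return 1.0 if np.dot(w, x) + b > 0.0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W_hidden, W_out):
    # The hidden layer acts as a feature extractor with sigmoidal units;
    # its outputs become the inputs of the next (output) layer.
    h = sigmoid(W_hidden @ x)
    return W_out @ h

# Usage sketch: a 2-input perceptron computing logical AND.
print(perceptron(np.array([1.0, 1.0]), np.array([1.0, 1.0]), b=-1.5))  # -> 1.0
```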
C. Recurrent neural networks
In this type of network the output of a neuron is multiplied by a weight and fed back to the neuron through a delay [17]. Recurrent neural networks (RNNs) are a powerful model for sequential data [18]: the input of a neuron includes the previous state of the neuron itself, so RNNs are inherently deep in time, their hidden state being a function of all previous hidden states. RNNs can be trained with end-to-end methods such as connectionist temporal classification (CTC), which is designed for sequence labeling problems where the input-output alignment is unknown [18]. Compared with MLPs, RNNs have achieved better performance in speech recognition, but the training algorithms are more complex and more sensitive, which can cause problems [17].
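To make the recurrence concrete, here is a minimal sketch (not from the paper; names, shapes and the tanh nonlinearity are assumptions chosen for illustration) of a simple recurrent layer, where each hidden state is computed from the current input frame and the delayed, weighted feedback of the previous state:

```python
import numpy as np

def rnn_forward(xs, W_in, W_rec, b):
    # The hidden state depends on the current input and, through the
    # fed-back (delayed) recurrent weights, on all previous hidden states.
    h = np.zeros(W_rec.shape[0])
    states = []
    for x in xs:  # xs: sequence of input vectors, e.g. acoustic feature frames
        h = np.tanh(W_in @ x + W_rec @ h + b)
        states.append(h)
    return states
```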

D. Self-organizing maps
Self-organizing maps (SOMs) are a technique for converting a high-dimensional space into a smaller-dimensional one, in such a way that input vectors which are close to each other correspond to neurons which are close to each other on the map. Each neuron in the network has an associated code vector that points to a corresponding neuron in the map. The network is trained by competitive learning. These maps are often used in speech recognition tasks [17].
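A minimal competitive-learning step is sketched below under assumed names and shapes (none of this comes from the paper): the best-matching neuron wins, and the code vectors of neurons near it on the low-dimensional map grid are pulled toward the input.

```python
import numpy as np

def som_step(codebook, grid, x, lr=0.1, radius=1.0):
    # codebook: (n_neurons, dim) code vectors; grid: (n_neurons, 2) map coordinates.
    bmu = np.argmin(np.linalg.norm(codebook - x, axis=1))  # winning (best-matching) unit
    # Gaussian neighbourhood on the map: neurons close to the winner learn more.
    d = np.linalg.norm(grid - grid[bmu], axis=1)
    h = np.exp(-(d ** 2) / (2.0 * radius ** 2))
    codebook += lr * h[:, None] * (x - codebook)            # pull code vectors toward x
    return codebook
```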
3. Learning
The goal of training a neural network is for it to classify data well without overlearning the details of the training data. A network can be trained in either a supervised or an unsupervised way: in supervised learning the network is given a set of labeled data to learn from, while in unsupervised learning the task of the network is to find clusters of similar data [17]. Previous instantiations of the neural network approach have used the backpropagation algorithm to train the networks discriminatively [6]. The next sections describe both kinds of learning in detail.

3.1 Supervised learning
In supervised learning, the training data is classified first, either by an existing speech recognition system or manually, and the network is then trained on this data to compute the classification. The network iteratively changes its weights to minimize a given cost function E. In the error backpropagation algorithm, the error is first calculated at the output and then sent back through the network, where the partial derivatives of the error are calculated; each neuron's weights are updated and a new iteration starts. This algorithm is used for the weight updates [17].
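As a hedged illustration of one such iteration on a one-hidden-layer network (the squared-error cost E = 0.5·||y − t||², the tanh hidden units and all names are assumptions made for this sketch, not details given in the paper):

```python
import numpy as np

def backprop_step(x, t, W1, W2, lr=0.01):
    # Forward pass: hidden activations, then linear output.
    h = np.tanh(W1 @ x)
    y = W2 @ h
    # Error at the output; this is the gradient of E = 0.5 * ||y - t||^2.
    e = y - t
    # Partial derivatives sent back through the network (chain rule via tanh).
    dW2 = np.outer(e, h)
    dW1 = np.outer((W2.T @ e) * (1 - h ** 2), x)
    # Update every weight, then the next iteration can start.
    W1 -= lr * dW1
    W2 -= lr * dW2
    return W1, W2
```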
3.2 Unsupervised learning
In unsupervised learning there is no need to define a target output, and no labeled data is given to the network; the network tries to discover the underlying patterns or trends in the input data alone. Instead of labels, it uses a similarity measure such as cosine distance to find input vectors whose distance from each other, according to that measure, is small.
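For concreteness, here is a small sketch of the cosine-distance similarity measure and a nearest-centroid cluster assignment; the clustering setup and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cosine_distance(a, b):
    # 1 - cosine similarity: small when the two vectors point the same way.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def nearest_cluster(x, centroids):
    # Assign an unlabeled input to the cluster whose centroid is closest
    # under the cosine-distance measure.
    return min(range(len(centroids)), key=lambda k: cosine_distance(x, centroids[k]))
```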
4. Hybrid HMM/NN Models
The combination of the hidden Markov model and the neural network emerged as an alternative paradigm for ASR in the 1980s. In a hybrid NN-HMM model, each output unit of the NN is trained to estimate the posterior probability of a continuous-density HMM state given the acoustic observations [7]. Using the HMM-NN combination for speech recognition gives better results than GMMs: compared with a GMM, a neural network can achieve the same performance with a smaller number of parameters [17]. Most of the early work on the hybrid approach used context-independent phone states as labels for NN training and considered small-vocabulary tasks; ANN-HMMs were later extended to model context-dependent phones and were applied to mid-vocabulary and some large-vocabulary ASR tasks [7]. The hybrid approach has some limitations: using only backpropagation to train the network makes it challenging to exploit more than two hidden layers well. Neural networks are now used widely in language processing and speech recognition, with numerous applications in these fields; in particular, deep networks with many hidden layers are capable of modeling complex structures [9].
Three main reasons were responsible for the adoption of neural networks as high-quality acoustic models: (1) making the networks deeper makes them more powerful, hence deep neural networks (DNNs); (2) initializing the weights sensibly and using much faster hardware makes it possible to train deep neural networks effectively; and (3) using a larger number of output units greatly improves their performance [12]. Compared with other networks, deep neural networks have a higher modeling capacity for the same number of parameters, but they are harder to train, both as stochastic top-down generative models and as deterministic bottom-up discriminative models [9]. The DNN architecture can be used for multi-task learning in several different ways, and DNNs are far more effective than GMMs at leveraging data from one task to improve performance on related tasks [12].
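A hedged sketch of how the posterior estimate is typically plugged into the HMM decoder: dividing the network's state posteriors by the state priors yields likelihoods up to a constant, which can replace the GMM scores. This prior-division trick is the common hybrid formulation in the literature, not a detail spelled out in this paper, and the names here are assumptions.

```python
import numpy as np

def posteriors_to_log_likelihoods(posteriors, priors, eps=1e-10):
    # posteriors: P(state | frame) from the NN output layer, shape (n_states,).
    # priors: P(state) estimated from the training alignments, shape (n_states,).
    # Returns log p(frame | state) up to a constant, for use in HMM decoding.
    return np.log(posteriors + eps) - np.log(priors + eps)
```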
5. Conclusion
This paper has shown that neural networks are among the most important tools in the field of speech recognition, offering a new solution for large databases and an interesting area of research. Neural networks act like a human brain. We gave an overview of the types of neural networks and of the hybrid HMM/NN architecture, which works well for the acoustic model of speech recognition; more recently, deep neural networks have been introduced into this architecture, and they also work well in noise. Research in this field is ongoing to uncover more facts about neural networks.


6. References
1. M.A. Anusuya, S.K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, pp. 181-205, 2009.
2. Bo Li, Khe Chai Sim, "A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks", IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 22, No. 8, pp. 1296-1305, August 2014.
3. Y. Bengio, "Learning deep architectures for AI", Foundations and Trends in Machine Learning, Vol. 2, No. 1, pp. 1-127, 2009.
4. Xiaohui Zhang, Jan Trmal, Daniel Povey, Sanjeev Khudanpur, "Improving Deep Neural Network Acoustic Models Using Generalized Maxout Networks", IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 214-219, 2014.
5. Xicai Yue, Datian Ye, Chongxun Zheng, Xiaoyu Wu, "Neural networks for improved text-independent speaker identification", IEEE Engineering in Medicine and Biology, pp. 53-58, April 2002.
6. Abdel-rahman Mohamed, George E. Dahl, Geoffrey Hinton, "Acoustic Modeling Using Deep Belief Networks", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, No. 1, pp. 14-22, January 2012.
7. George E. Dahl, Dong Yu, Li Deng, Alex Acero, "Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, No. 1, pp. 30-42, January 2012.
8. Ke Chen, Ahmad Salman, "Learning Speaker-Specific Characteristics with a Deep Neural Architecture", IEEE Transactions on Neural Networks, Vol. 22, No. 11, pp. 1744-1756, November 2011.
9. Ruhi Sarikaya, Geoffrey E. Hinton, Anoop Deoras, "Application of Deep Belief Networks for Natural Language Understanding", IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 22, No. 4, pp. 778-784, April 2014.
10. Geoffrey Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, Brian Kingsbury, "Deep neural networks for acoustic modeling in speech recognition", IEEE Signal Processing Magazine, pp. 82-97, November 2012.
11. Li Deng, Geoffrey Hinton, Brian Kingsbury, "New types of deep neural network learning for speech recognition and related applications: an overview", IEEE ICASSP, pp. 8599-8603, 2013.
12. Jonas Gehring, Wonkyum Lee, Kevin Kilgour, Ian Lane, Yajie Miao, Alex Waibel, "Modular Combination of Deep Neural Networks for Acoustic Modeling", INTERSPEECH, 2013.
13. Veera Ala-Keturi, "Speech Recognition Based on Artificial Neural Networks", Helsinki University of Technology, 2004.
14. Alex Graves, Abdel-rahman Mohamed, Geoffrey Hinton, "Speech Recognition with Deep Recurrent Neural Networks", ICASSP, pp. 6645-6649, 2013.
15. Wouter Gevaert, Georgi Tsenov, Valeri Mladenov, "Neural Networks used for Speech Recognition", Journal of Automatic Control, University of Belgrade, Vol. 20, pp. 1-7, 2010.
16. Joe Tebelskis, "Speech Recognition using Neural Networks", May 1995.

