Isolated-Word Speech Recognition Using Hidden Markov Models

Håkon Sandsmark

December 18, 2010
1 Introduction
Speech recognition is a challenging problem on which much work has been done over the last decades. Some of the most successful results have been obtained with hidden Markov models, as described by Rabiner in 1989 [1].
A well-working generic speech recognizer would enable more efficient communication for everybody, but especially for children, illiterate people, and people with disabilities. A speech recognizer could also be a subsystem in a speech-to-speech translator.
The speech recognition system implemented during this project trains one hidden
Markov model for each word that it should be able to recognize. The models are trained
with labeled training data, and the classification is performed by passing the features to
each model and then selecting the best match.
Figure 1: Flow chart of the system: speech goes through feature extraction, the features are passed to one hidden Markov model per word (apple, banana, kiwi, lime, orange, peach, pineapple), and the best match is selected by the classification step.
2 Background theory
2.1 Hidden Markov models
Basic knowledge of hidden Markov models is assumed, but the two most important
algorithms used in this project will be described.
The observable output from a hidden state is assumed to be generated by a multivariate Gaussian distribution, so there is one mean vector and one covariance matrix per state. We will also assume that the state transition probabilities are independent of time, such that the hidden Markov chain is homogeneous.
We will now define the notation for describing a hidden Markov model as used in this project. There is a total number of N states. An element a_{ss'} of the transition probability matrix A denotes the transition probability from state s to state s', and the probability for the chain to start in state s is π_s. The mean vector and covariance matrix for the multivariate Gaussian distribution modeling the observable output from state s are µ_s and Σ_s, respectively. For an observation o, b_s(o) denotes the probability density of the multivariate Gaussian distribution of state s evaluated at o. We will sometimes denote the collection of parameters describing the hidden Markov model as λ = {A, π, µ, Σ}.
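As a concrete illustration, b_s(o) can be evaluated directly from µ_s and Σ_s. A minimal Matlab sketch (the function name is our own; the published implementation may organize this differently):

    function p = emission_density(o, mu, Sigma)
    % Evaluate b_s(o): the multivariate Gaussian density with mean
    % vector mu and covariance matrix Sigma at the observation o
    % (column vectors of the same dimension).
    d = numel(o);
    dev = o - mu;
    p = exp(-0.5 * (dev' / Sigma) * dev) / sqrt((2*pi)^d * det(Sigma));
    end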
Classification requires the likelihood f(o_1, …, o_T; λ), which can be obtained by summing f(o_1, …, o_T, S_T = s_T; λ) over all s_T. The recursive structure is revealed as we reduce the problem from needing f(o_1, …, o_T, s_T; λ) for all s_T to needing f(o_1, …, o_{T−1}, s_{T−1}; λ) for all s_{T−1}. Let us introduce the forward variable to ease the notation.
α_1(s) ≡ f(o_1, S_1 = s; λ) = b_s(o_1) π_s        (6, 7)

α_t(s) ≡ f(o_1, …, o_t, S_t = s; λ) = b_s(o_t) Σ_{s'} a_{s's} α_{t−1}(s')        (8, 9)
Implemented naïvely top-down (backwards in time), this recursion would take exponential time, since the same forward variables would be recomputed over and over. The naïve algorithm is, however, easily converted to an efficient variant using dynamic programming, where we calculate the forward variables bottom-up (forwards in time): we simply calculate α_t(s) for all states s, first for t = 1 and then all the way up to T. This way, all the forward variables from the previous time step are readily available when needed.
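A minimal Matlab sketch of this bottom-up computation, assuming the emission densities have been precomputed into a T×N matrix B with B(t, s) = b_s(o_t); the scaling of α that is needed in practice to avoid numerical underflow is omitted:

    function alpha = forward(A, prior, B)
    % Forward algorithm by dynamic programming:
    % alpha(t,s) = f(o_1, ..., o_t, S_t = s; lambda).
    % A(s,s') is the transition matrix, prior(s) the initial
    % distribution pi, and B(t,s) = b_s(o_t) the emission densities.
    [T, N] = size(B);
    alpha = zeros(T, N);
    alpha(1, :) = prior(:)' .* B(1, :);                 % equation (7)
    for t = 2:T
        alpha(t, :) = (alpha(t-1, :) * A) .* B(t, :);   % equation (9)
    end
    end

The likelihood of the whole observation sequence is then obtained as sum(alpha(T, :)).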
The parameters λ are estimated with the Baum-Welch algorithm, an instance of expectation maximization (EM) in which the M-step updates the parameters from expected state-occupancy and state-transition counts. The E-step thus consists of calculating these expectations for a fixed λ. Let V_s^(t) denote the event of a transition from state s at time step t, and V_{s,s'}^(t) the event of a transition from s to s' at time step t. Then we calculate the expectations by using indicator functions and the linearity of expectation.
π_s = E{1[V_s^(1)]} = P(V_s^(1))        (15)

a_{ss'} = E{Σ_t 1[V_{s,s'}^(t)]} / E{Σ_t 1[V_s^(t)]} = Σ_t P(V_{s,s'}^(t)) / Σ_t P(V_s^(t))        (16)

µ_s = E{Σ_t 1[V_s^(t)] o_t} / E{Σ_t 1[V_s^(t)]} = Σ_t P(V_s^(t)) o_t / Σ_t P(V_s^(t))        (17)

Σ_s = E{Σ_t 1[V_s^(t)] (o_t o_t^T − µ_s µ_s^T)} / E{Σ_t 1[V_s^(t)]} = Σ_t P(V_s^(t)) o_t o_t^T / Σ_t P(V_s^(t)) − µ_s µ_s^T        (18)
Note that the superscript T denotes the transpose and has nothing to do with the number of time steps T. To be able to calculate these probabilities, we first introduce the backward variable, which is very similar to the forward variable defined previously.
β_T(s) ≡ 1        (19)

β_t(s) ≡ f(o_{t+1}, …, o_T | S_t = s; λ) = Σ_{s'} a_{ss'} b_{s'}(o_{t+1}) β_{t+1}(s')        (20, 21)
The backward variable has its name because it is first calculated for the last time step
and then backwards in time when implemented with dynamic programming (essentially
the reverse procedure of the one described in detail for the forward variable).
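A matching Matlab sketch of the backward recursion, with the same conventions (and the same caveat about scaling) as the forward function above:

    function beta = backward(A, B)
    % Backward algorithm by dynamic programming:
    % beta(t,s) = f(o_{t+1}, ..., o_T | S_t = s; lambda).
    [T, N] = size(B);
    beta = zeros(T, N);
    beta(T, :) = 1;                                         % equation (19)
    for t = T-1:-1:1
        beta(t, :) = (A * (B(t+1, :) .* beta(t+1, :))')';   % equation (21)
    end
    end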
We then rename these probabilities to the symbols used by Rabiner, γ_t(s) ≡ P(V_s^(t)) and ξ_t(s, s') ≡ P(V_{s,s'}^(t)), and express them by the forward and backward variables as

γ_t(s) = α_t(s) β_t(s) / f(o_1, …, o_T; λ)

ξ_t(s, s') = α_t(s) a_{ss'} b_{s'}(o_{t+1}) β_{t+1}(s') / f(o_1, …, o_T; λ)

The re-estimation formulas then become:
π_s = γ_1(s)        (28)

a_{ss'} = Σ_t ξ_t(s, s') / Σ_t γ_t(s)        (29)

µ_s = Σ_t γ_t(s) o_t / Σ_t γ_t(s)        (30)

Σ_s = Σ_t γ_t(s) o_t o_t^T / Σ_t γ_t(s) − µ_s µ_s^T        (31)
To summarize, the E-step boils down to computing γ_t(s) and ξ_t(s, s') for all s, s' and t while the parameters λ are held fixed, and the M-step then updates λ using the quantities computed in the E-step. This is iterated until the likelihood stops improving appreciably or a fixed number of iterations has been performed.
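Put together, one EM iteration could look as follows in Matlab, given alpha and beta from the two recursions above and the emission matrix B. This is a sketch under the same no-scaling caveat; obs is the T×d matrix of observed feature vectors, and the implicit expansion of recent Matlab versions is assumed:

    % E-step: state and transition posteriors.
    likelihood = sum(alpha(T, :));                 % f(o_1, ..., o_T; lambda)
    gamma = alpha .* beta / likelihood;            % gamma(t,s) = P(V_s^(t))
    xi = zeros(N, N, T-1);                         % xi(s,s',t) = P(V_{s,s'}^(t))
    for t = 1:T-1
        xi(:, :, t) = A .* (alpha(t, :)' * (B(t+1, :) .* beta(t+1, :))) / likelihood;
    end

    % M-step: re-estimation according to equations (28)-(31).
    prior = gamma(1, :)';                                    % equation (28)
    A = sum(xi, 3) ./ sum(gamma(1:T-1, :), 1)';              % equation (29)
    d = size(obs, 2);
    mu = zeros(d, N);
    Sigma = zeros(d, d, N);
    for s = 1:N
        w = gamma(:, s) / sum(gamma(:, s));                  % normalized weights
        mu(:, s) = obs' * w;                                 % equation (30)
        dev = obs' - mu(:, s);
        Sigma(:, :, s) = (dev .* w') * dev';                 % equation (31), centered form
    end

The covariance update uses the centered form Σ_t γ_t(s)(o_t − µ_s)(o_t − µ_s)^T / Σ_t γ_t(s), which is algebraically equivalent to equation (31).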
3 System design
3.1 Feature extraction
The source speech is sampled at 8000 Hz and quantized with 16 bits. The signal is split into short frames of 80 samples, corresponding to 10 ms of speech, and neighboring frames overlap by 20 samples on each side. The idea is that the speech is close to stationary during such a short period of time because of the relatively limited flexibility of the throat. We will pick our features from the frequency domain, but before getting there by taking the fast Fourier transform, we multiply each frame by a Hamming window to reduce the spectral leakage caused by the framing of the signal.
Figure 2: (a) Speech signal and Hamming window in the time domain. (b) Single-sided magnitude spectrum of the same speech signal multiplied by the Hamming window.
The D largest local maxima of the single-sided magnitude spectrum are picked as features for each frame; D is an important parameter of the system that will be discussed later.
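A Matlab sketch of the extraction for a single frame; the function name is our own, and findpeaks (from the Signal Processing Toolbox) is one way to locate the local maxima, not necessarily the one used in the published implementation:

    function f = frame_features(frame, fs, D)
    % Frequencies (in Hz) of the D largest local maxima in the
    % single-sided magnitude spectrum of one Hamming-windowed frame.
    n = numel(frame);
    spectrum = abs(fft(frame(:) .* hamming(n)));
    half = spectrum(1:floor(n/2) + 1);           % single-sided spectrum
    [~, locs] = findpeaks(half, 'SortStr', 'descend', 'NPeaks', D);
    f = (locs - 1) * fs / n;                     % FFT bin index -> Hz
    end

With fs = 8000 and 80-sample frames, the frequency resolution is 8000/80 = 100 Hz per bin.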
Figure 3: (a) Speech signal and Hamming window in the time domain. (b) Single-sided magnitude spectrum of the same speech signal multiplied by the Hamming window.
3.2 Training
The training is a combination of supervised and unsupervised techniques. We train one hidden Markov model per word on speech signals that have already been labeled with the correct word. One important choice is the number of states in each model; the goal is that each state should represent a phoneme of the word. The clustering of the Gaussians is, however, unsupervised and will depend on the initial values used for the Baum-Welch algorithm.

For this project, random values, normalized so that they form valid probability distributions, were used to initialize A and π. For Σ_s, the diagonal covariance matrix of the training data was used for all states, and for each state a randomly chosen training data point was used as µ_s. The training examples for each word are concatenated, and Baum-Welch is run for 15 iterations.
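In Matlab, this initialization might look as follows (a sketch; the variable names are our own, and implicit expansion is assumed again):

    % obs is the T-by-d matrix of concatenated training frames for one word.
    [T, d] = size(obs);
    A = rand(N, N);      A = A ./ sum(A, 2);      % random rows that sum to one
    prior = rand(N, 1);  prior = prior / sum(prior);
    Sigma = repmat(diag(var(obs)), [1, 1, N]);    % diagonal data covariance for every state
    mu = obs(randi(T, [N, 1]), :)';               % one random training frame per state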
3.3 Classification
Let λ_i denote the parameter set for word i. When presented with an observation sequence o_1, …, o_T, the word

î = argmax_i f(o_1, …, o_T; λ_i)

is selected, and we recognize that f(o_1, …, o_T; λ_i) is exactly what the forward algorithm computes.
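With the forward function from section 2, the classifier reduces to a few lines. In this sketch, models is a struct array holding the trained parameters of each word, and emission_matrix is a hypothetical helper that evaluates B(t, s) = b_s(o_t) for all frames and states:

    function best = classify(obs, models)
    % Select the word whose model maximizes f(o_1, ..., o_T; lambda_i).
    scores = zeros(numel(models), 1);
    for i = 1:numel(models)
        m = models(i);
        B = emission_matrix(obs, m.mu, m.Sigma);  % B(t,s) = b_s(o_t)
        alpha = forward(m.A, m.prior, B);
        scores(i) = sum(alpha(end, :));           % the likelihood for word i
    end
    [~, best] = max(scores);
    end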
[Four scatter plots, titled ‘Training apple’, ‘Training lime’, ‘Training peach’ and ‘Training orange’, with F1 [Hz] on the horizontal axes and F2 [Hz] on the vertical axes.]
Figure 4: Fitted Gaussians after ten iterations of the Baum-Welch algorithm. We have six states with one Gaussian each. The two most dominant frequencies (features) are shown. Each green plus represents a frame from a training speech signal. The stars are the means of the Gaussians, and the ellipses indicate their 75% confidence regions. Notice the higher frequencies present in the words containing unvoiced phonemes (‘peach’ and ‘orange’) compared to the words that do not (‘apple’ and ‘lime’).
4 Results

The system was evaluated with five-fold cross-validation on the 105 recorded utterances. The two parameters expected to affect the misclassification rate the most were the number of hidden states, N, and the number of frequencies extracted from each frame, D. The cross-validation was therefore run with different values for these parameters, and the results are shown in Table 1.

N \ D      2       3       4       5       6       7       8
  2      21.9%   8.6%
  3      21.0%  15.2%   9.5%  12.4%   1.9%  14.3%   5.7%
  4      16.2%  11.4%   8.6%   5.7%   3.8%   6.7%   4.8%
  5      13.3%   8.6%   9.5%   4.8%   2.9%   5.7%   4.8%
  6      12.4%  10.5%   3.8%   5.7%   7.6%   6.7%  10.5%
  7      15.2%  12.4%   6.7%  10.5%   7.6%   2.9%   8.6%
  8      12.4%   5.7%

Table 1: Misclassification rates for five-fold cross-validation with different values for the number of hidden states, N, and the number of frequencies extracted from each frame, D. Each five-fold cross-validation procedure takes about 7 minutes with the 105 utterances on a 2 GHz Intel Core 2 Duo (serial execution).
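The cross-validation loop itself is straightforward. A sketch of one run for a given (N, D) pair, where feats is a cell array holding one feature matrix per utterance, and train_word_models stands for the training routine described in section 3.2 (the names are our own):

    n = numel(labels);                           % 105 labeled utterances
    fold = mod(randperm(n), 5) + 1;              % random assignment to five folds
    errors = 0;
    for k = 1:5
        models = train_word_models(feats(fold ~= k), labels(fold ~= k));
        for j = find(fold == k)
            errors = errors + (classify(feats{j}, models) ~= labels(j));
        end
    end
    misclassification_rate = errors / n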
5 Discussion
The results are quite good considering the simple approach taken, especially in the feature extraction phase. More advanced features such as Mel-frequency cepstral coefficients were considered, but we decided on simple spectral peak frequencies due to the low misclassification rates achieved. It should be noted that this system would not perform well if trained and tested with different speakers. This is because of the differing frequency characteristics of different voices, especially for speakers of different genders.
We also experimented with increasing the number of training iterations for the Baum-Welch algorithm, including setting a threshold on the likelihood difference between steps. That, however, proved to have little benefit in practice; neither the execution time nor the misclassification rate showed any noticeable improvement over simply fixing the number of iterations to 15. The reason the execution time did not improve significantly is that most of it is spent in feature extraction, not in training.
It is also interesting to note that when N is too small, there are many ‘apple’s
misclassified as ‘pineapple’s, and vice versa, due to the loss of temporal information.
Another important parameter is the number of samples in each frame. If the frame is too small, it becomes hard to pick out meaningful features, and if it is too large, temporal information is lost. However, due to time constraints, we did not test anything other than 80 samples for this project.
The concatenation of the training examples trains a probability of transitioning from the ‘last’ state to the ‘initial’ state that is not needed for classification. Rabiner [1] gives a modified Baum-Welch algorithm for multiple training examples such that concatenation is not necessary, but it was not implemented during this project since the concatenation seemed to work well.
6 Conclusion
The Matlab implementation along with the data set is published as open source and
can be found at http://code.google.com/p/hmm-speech-recognition/.
7 References
[1] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in
speech recognition,” Proceedings of the IEEE, vol. 77, pp. 257–286, Feb 1989.
[2] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 2006; corrected 2nd printing, October 2007.
[3] X. Huang, A. Acero, and H.-W. Hon, Spoken Language Processing: A Guide to
Theory, Algorithm and System Development. Prentice Hall PTR, May 2001.