Lecture 2 (Part I)
CS 753
Instructor: Preethi Jyothi
Recall: Statistical ASR

Let W denote a word sequence and O the sequence of acoustic feature vectors. An ASR decoder solves the following problem:

    W* = arg max_W Pr(W | O) = arg max_W Pr(O | W) P(W)

The acoustic model Pr(O | W) scores O via the sub-word units Q corresponding to the word sequence W, and the language model P(W) provides a prior probability for W.

[Figure: word-level HMMs for "down", "left" and "right", each with states 0-4, transition probabilities a01, a12, a23, a34 (plus self-loops a11, a22, a33) and observation distributions b1(·), b2(·), b3(·); each HMM scores the acoustic feature sequence O1 O2 O3 O4 ... OT to give Pr(O | "down"), Pr(O | "left") and Pr(O | "right").]

Acoustic model: The most commonly used acoustic models in ASR systems today are Hidden Markov Models (HMMs). Please refer to Rabiner (1989) for a comprehensive tutorial of HMMs and their applicability to ASR in the 1980's (with ideas that are largely applicable to systems today). HMMs are used to build probabilistic models for linear sequence labeling problems. Since speech is represented in the form of a sequence of acoustic vectors O, it lends itself to be naturally modeled using HMMs.

Figure 2.1: Standard topology used to represent a phone HMM.

The HMM is defined by specifying transition probabilities (a_ij) and observation (or emission) probability distributions (b_j(O_t)), along with the number of hidden states in the HMM. An HMM makes a transition from state i to state j with probability a_ij. On reaching state j, the observation vector at that state is generated according to the emission distribution b_j.
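To make the definition concrete, here is a minimal Python sketch of such a left-to-right HMM as a container of transition probabilities a_ij and emission distributions b_j. This is an illustrative sketch, not code from the lecture: the class name, state labels and all numbers are made up, and a real acoustic model would use continuous densities for b_j (see the Gaussian models later in this lecture) rather than the toy discrete table used here.

from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class PhoneHMM:
    """Toy HMM container: hidden states, transition probs a_ij, emission probs b_j(o).
    (Hypothetical sketch; the numbers below are illustrative only.)"""
    states: List[int]                      # e.g. 0..4, with 0 and 4 as non-emitting entry/exit states
    trans: Dict[Tuple[int, int], float]    # a_ij = P(next state j | current state i)
    emit: Dict[Tuple[int, str], float]     # b_j(o) = P(observation o | state j), toy discrete version

    def a(self, i: int, j: int) -> float:
        return self.trans.get((i, j), 0.0)

    def b(self, j: int, o: str) -> float:
        return self.emit.get((j, o), 0.0)

# Left-to-right topology as in Figure 2.1, with self-loops on the emitting states 1-3:
hmm_down = PhoneHMM(
    states=[0, 1, 2, 3, 4],
    trans={(0, 1): 1.0,
           (1, 1): 0.6, (1, 2): 0.4,
           (2, 2): 0.5, (2, 3): 0.5,
           (3, 3): 0.7, (3, 4): 0.3},
    emit={(1, "o_a"): 0.5, (1, "o_b"): 0.5,
          (2, "o_a"): 0.2, (2, "o_b"): 0.8,
          (3, "o_a"): 0.9, (3, "o_b"): 0.1},
)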
What are Hidden Markov Models (HMMs)?
The three factors that are multiplied in extending the previous paths to compute the forward probability at time t are:

α_{t-1}(i)   the previous forward path probability from the previous time step
a_ij         the transition probability from previous state q_i to current state q_j
b_j(o_t)     the state observation likelihood of the observation symbol o_t given the current state j

[Figure: forward trellis for the two-state hot/cold (H, C) ice-cream HMM over the observation sequence 3 1 3; e.g. α_1(1) = .02, α_1(2) = .32 and α_2(1) = .32*.2 + .02*.25 = .069.]
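As a quick sanity check of the worked number above, here is the same arithmetic in Python, assuming the two-state hot/cold ice-cream HMM from the trellis (π_H = .8, π_C = .2, P(3|H) = .4, P(3|C) = .1, P(C|H) = .4, P(C|C) = .5, P(1|C) = .5); this is a sketch of the slide's arithmetic, not lecture code:

# t = 1, observation o1 = 3 ice creams
alpha1_H = 0.8 * 0.4          # pi_H * P(3|H) = .32
alpha1_C = 0.2 * 0.1          # pi_C * P(3|C) = .02

# t = 2, observation o2 = 1: forward probability of being in state C
alpha2_C = alpha1_H * 0.4 * 0.5 + alpha1_C * 0.5 * 0.5
print(alpha2_C)               # .32*.2 + .02*.25 ≈ 0.069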
Visualizing the forward recursion
    α_t(j) = Σ_i α_{t-1}(i) a_ij b_j(o_t)

[Figure A.7: the forward trellis; each cell α_t(j) is computed by summing, over all states q_i at time t-1, the previous forward probability α_{t-1}(i) times the transition probability a_ij times the observation likelihood b_j(o_t).]
Forward Algorithm
The forward algorithm, where forward[s,t] represents α_t(s):

1. Initialization:
   α_1(j) = π_j b_j(o_1)                                1 ≤ j ≤ N

2. Recursion:
   α_t(j) = Σ_{i=1}^{N} α_{t-1}(i) a_ij b_j(o_t)        1 ≤ j ≤ N, 1 < t ≤ T

3. Termination:
   P(O | λ) = Σ_{i=1}^{N} α_T(i)
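Below is a compact Python sketch of these three steps (my own implementation, not code from the lecture); it assumes `pi` is a length-N list of initial probabilities π_j, `A` an N×N transition matrix with A[i][j] = a_ij, and `B(j, o)` a function returning the emission likelihood b_j(o):

def forward(obs, pi, A, B):
    """Compute P(O | lambda) with the forward algorithm.
    obs : list of observations o_1..o_T
    pi  : list of N initial probabilities pi_j
    A   : N x N nested list, A[i][j] = a_ij
    B   : function B(j, o) returning the emission likelihood b_j(o)
    """
    N, T = len(pi), len(obs)
    # 1. Initialization: alpha_1(j) = pi_j * b_j(o_1)
    alpha = [pi[j] * B(j, obs[0]) for j in range(N)]
    # 2. Recursion: alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
    for t in range(1, T):
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B(j, obs[t])
                 for j in range(N)]
    # 3. Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)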
Three problems for HMMs

Now that we have seen the structure of an HMM, we turn to algorithms for computing things with them. An influential tutorial by Rabiner (1989), based on tutorials by Jack Ferguson in the 1960s, introduced the idea that hidden Markov models should be characterized by three fundamental problems: computing the likelihood P(O | λ) of an observation sequence (the forward algorithm), decoding the most likely hidden state sequence (the Viterbi algorithm), and learning the HMM parameters from data.

Decoding: The Viterbi Algorithm
Viterbi Path Probability

Note that we represent the most probable path by taking the maximum over all possible previous state sequences, max_{q_1,...,q_{t-1}}. Like other dynamic programming algorithms, Viterbi fills each cell recursively. Given that we had already computed the Viterbi probability of being in every state at time t-1, we compute the Viterbi probability by taking the most probable of the extensions of the paths that lead to the current cell. For a given state q_j at time t, the value v_t(j) is computed as

    v_t(j) = max_{i=1}^{N} v_{t-1}(i) a_ij b_j(o_t)

The three factors that are multiplied in extending the previous paths to compute the Viterbi probability at time t are:

v_{t-1}(i)   the previous Viterbi path probability from the previous time step
a_ij         the transition probability from previous state q_i to current state q_j
b_j(o_t)     the state observation likelihood of the observation symbol o_t given the current state j

[Figure: Viterbi trellis for the two-state (C, H) ice-cream HMM over the observation sequence 3 1 3; e.g. v_1(1) = .02, v_1(2) = .32, v_2(1) = max(.32*.20, .02*.25) = .064 and v_2(2) = max(.32*.12, .02*.10) = .038.]
Viterbi recursion

Finally, we can give a formal definition of the Viterbi recursion as follows:

1. Initialization:
   v_1(j) = π_j b_j(o_1)                                 1 ≤ j ≤ N
   bt_1(j) = 0                                           1 ≤ j ≤ N

2. Recursion:
   v_t(j)  = max_{i=1}^{N} v_{t-1}(i) a_ij b_j(o_t)      1 ≤ j ≤ N, 1 < t ≤ T
   bt_t(j) = argmax_{i=1}^{N} v_{t-1}(i) a_ij b_j(o_t)   1 ≤ j ≤ N, 1 < t ≤ T

3. Termination:
   The best score:          P* = max_{i=1}^{N} v_T(i)
   The start of backtrace:  q_T* = argmax_{i=1}^{N} v_T(i)
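A Python sketch of this recursion, including the backpointers bt_t(j) and the final backtrace (my own implementation under the same `pi`, `A`, `B` conventions as the forward sketch above; not lecture code):

def viterbi(obs, pi, A, B):
    """Return (best_path, best_score) for the observation sequence obs.
    pi[j] = initial prob, A[i][j] = a_ij, B(j, o) = emission likelihood b_j(o)."""
    N, T = len(pi), len(obs)
    # 1. Initialization: v_1(j) = pi_j * b_j(o_1); backpointer bt_1(j) = 0
    v = [pi[j] * B(j, obs[0]) for j in range(N)]
    backptrs = []
    # 2. Recursion: v_t(j) = max_i v_{t-1}(i) * a_ij * b_j(o_t)
    for t in range(1, T):
        new_v, bt = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: v[i] * A[i][j])
            bt.append(best_i)
            new_v.append(v[best_i] * A[best_i][j] * B(j, obs[t]))
        v = new_v
        backptrs.append(bt)
    # 3. Termination: best score P* and start of backtrace q_T*
    q_T = max(range(N), key=lambda i: v[i])
    best_score = v[q_T]
    # Backtrace: follow the backpointers from q_T* back to t = 1
    path = [q_T]
    for bt in reversed(backptrs):
        path.append(bt[path[-1]])
    path.reverse()
    return path, best_score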
Viterbi backtrace

We compute this best state sequence by keeping track of the path of hidden states that led to each state, as suggested in Fig. A.10, and then at the end backtracing the best path to the beginning (the Viterbi backtrace).

[Figure A.10: The Viterbi backtrace on the two-state (C, H) ice-cream trellis over the observation sequence 3 1 3. As we extend each path to a new state to account for the next observation, we keep a backpointer to the best path that led us to this state.]
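Tying this to the trellis above, the `viterbi` sketch from the previous slide can be run on the ice-cream example, assuming state 0 = C, state 1 = H and the probabilities that appear in the figures (π = (.2, .8), P(C|C) = P(H|C) = .5, P(C|H) = .4, P(H|H) = .6, P(1|C) = .5, P(3|C) = .1, P(1|H) = .2, P(3|H) = .4):

states = ["C", "H"]                       # state 0 = cold, state 1 = hot
pi = [0.2, 0.8]
A = [[0.5, 0.5],                          # from C: P(C|C), P(H|C)
     [0.4, 0.6]]                          # from H: P(C|H), P(H|H)
emit = {("C", 1): 0.5, ("C", 3): 0.1,     # P(# ice creams | state)
        ("H", 1): 0.2, ("H", 3): 0.4}
B = lambda j, o: emit[(states[j], o)]

path, score = viterbi([3, 1, 3], pi, A, B)
print([states[j] for j in path], score)   # expected: ['H', 'C', 'H'] with score ≈ 0.0128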
Gaussian Observation Model
• So far, we considered HMMs with discrete outputs
• In acoustic models, HMMs output real valued vectors
• Hence, observation probabilities are defined using probability density functions
• A widely used model: Gaussian distribution
    N(x | μ, σ²) = (1 / √(2πσ²)) exp( -(x - μ)² / (2σ²) )

• HMM emission/observation probabilities b_j(x) = N(x | μ_j, σ_j²), where μ_j is the mean associated with state j and σ_j² is its variance

[Plot: univariate Gaussian densities φ_{μ,σ²}(x) on x ∈ [-5, 5] for (μ = 0, σ² = 0.2), (μ = 0, σ² = 1.0), (μ = 0, σ² = 5.0) and (μ = -2, σ² = 0.5).]
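As a small illustration of such a per-state Gaussian emission (a sketch with made-up `mu`/`var` values, not lecture code):

import math

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density N(x | mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# b_j(x) = N(x | mu_j, var_j): one mean/variance per HMM state j (toy numbers)
mu = [0.0, -2.0, 1.5]
var = [1.0, 0.5, 5.0]

def b(j, x):
    return gaussian_pdf(x, mu[j], var[j])

print(b(0, 0.0))   # densest at the state-0 mean, ~0.3989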
Gaussian Mixture Model
• A single Gaussian observation model assumes that
the observed acoustic feature vectors are unimodal
• More generally, we use a “mixture of Gaussians” to
model multiple modes in the data
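With a mixture, the observation density for state j becomes a weighted sum of Gaussian components, b_j(x) = Σ_m c_{jm} N(x | μ_{jm}, σ_{jm}²), with mixture weights c_{jm} ≥ 0 summing to 1. The sketch below (not lecture code, with made-up parameters) evaluates such a bimodal density:

import math

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_pdf(x, weights, means, variances):
    """Mixture of Gaussians: sum_m c_m * N(x | mu_m, var_m)."""
    return sum(c * gaussian_pdf(x, m, v)
               for c, m, v in zip(weights, means, variances))

# A bimodal emission density for one HMM state (illustrative parameters):
weights   = [0.4, 0.6]
means     = [-1.0, 2.0]
variances = [0.5, 1.0]
print(gmm_pdf(0.0, weights, means, variances))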