
Lecture 14: Hidden Markov Models

Mark Hasegawa-Johnson
All content CC-SA 4.0 unless otherwise specified.

ECE 417: Multimedia Signal Processing, Fall 2021



1 Review: Bayesian Classifiers

2 Hidden Markov Models

3 Recognition: the Forward Algorithm

4 Segmentation: the Backward Algorithm

5 Numerical Example

6 Summary

7 Written Example

Outline

1 Review: Bayesian Classifiers

2 Hidden Markov Models

3 Recognition: the Forward Algorithm

4 Segmentation: the Backward Algorithm

5 Numerical Example

6 Summary

7 Written Example

Bayesian Classifiers

A Bayesian classifier chooses the label, y ∈ {0, . . . , N_Y − 1}, that has
the minimum probability of error given an observation, \vec{x} \in \mathbb{R}^D:

    \hat{y} = \arg\min_y \Pr(Y \neq y \mid \vec{X} = \vec{x})
            = \arg\max_y \Pr(Y = y \mid \vec{X} = \vec{x})
            = \arg\max_y p_{Y|\vec{X}}(y \mid \vec{x})
            = \arg\max_y p_Y(y)\, p_{\vec{X}|Y}(\vec{x} \mid y)
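As a quick illustration (not from the slides), here is a minimal sketch of this decision rule for a made-up one-dimensional, two-class problem; the priors, means, and standard deviations below are arbitrary choices:

```python
import numpy as np

# Hypothetical two-class problem: priors p_Y(y) and 1-D Gaussian likelihoods p_{X|Y}(x|y).
priors = np.array([0.7, 0.3])
means = np.array([0.0, 2.0])
stds = np.array([1.0, 0.5])

def likelihood(x, y):
    """p_{X|Y}(x | y) for a 1-D Gaussian class model (illustrative choice)."""
    return np.exp(-0.5 * ((x - means[y]) / stds[y]) ** 2) / (stds[y] * np.sqrt(2 * np.pi))

def bayes_classify(x):
    """Return argmax_y p_Y(y) * p_{X|Y}(x | y)."""
    scores = [priors[y] * likelihood(x, y) for y in range(len(priors))]
    return int(np.argmax(scores))

print(bayes_classify(1.4))  # picks whichever class has the larger posterior
```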

Review: Learning the Bayesian Probabilities

Here’s how we can estimate the four Bayesian probabilities:

1 Posterior, Evidence: need lots of training data, and a neural
net or kernel estimator.
2 Prior:

    p_Y(y) = \frac{\#\ \text{times}\ Y = y\ \text{occurred in training data}}{\#\ \text{frames in training data}}

3 Likelihood:

    p_{\vec{X}|Y}(\vec{x} \mid y) = \sum_{k=0}^{K-1} c_{y,k}\, \mathcal{N}(\vec{x} \mid \vec{\mu}_{y,k}, \Sigma_{y,k})

where c_{y,k}, \vec{\mu}_{y,k}, \Sigma_{y,k} might be estimated using EM.
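For instance (a sketch under the assumption that scikit-learn is available; none of these names come from the lecture), the prior can be estimated by counting and the GMM likelihood fitted by EM with sklearn's GaussianMixture:

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # EM-based GMM fit

# Hypothetical labelled training data: feats is (num_frames, D), labels is (num_frames,).
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 2))
labels = rng.integers(0, 2, size=200)

# Prior: relative frequency of each label in the training data.
priors = np.bincount(labels) / len(labels)

# Likelihood: one GMM per class; weights c, means mu, covariances Sigma estimated by EM.
gmms = [GaussianMixture(n_components=3, random_state=0).fit(feats[labels == y])
        for y in range(2)]

def log_posterior_scores(x):
    """log p_Y(y) + log p_{X|Y}(x|y) for each class y (unnormalized log posterior)."""
    x = np.atleast_2d(x)
    return np.array([np.log(priors[y]) + gmms[y].score_samples(x)[0] for y in range(2)])

print(np.argmax(log_posterior_scores(feats[0])))
```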

Outline

1 Review: Bayesian Classifiers

2 Hidden Markov Models

3 Recognition: the Forward Algorithm

4 Segmentation: the Backward Algorithm

5 Numerical Example

6 Summary

7 Written Example

Notation: Inputs and Outputs

Let’s assume we have T consecutive observations,
X = [\vec{x}_1, \ldots, \vec{x}_T].
A “hidden Markov model” represents the probability of such a sequence by
assuming a “hidden” state sequence,
Q = [q_1, \ldots, q_T], where q_t is the hidden (unknown) state
variable at time t.
The question is whether we can model these probabilities well enough to solve
problems like:
1 Recognition: What’s p(X) given the model?
2 Segmentation: What state is the model in at time t?
3 Training: Can we learn a model to fit some data?


HMM: Key Concepts

An HMM is a “generative model,” meaning that it models the
joint probability p(Q, X) using a model of the way in which those
data might have been generated. An HMM assumes the following
generative process:
1 Start in state q_1 = i with pmf π_i = p(q_1 = i).
2 Generate an observation, \vec{x}, with pdf b_i(\vec{x}) = p(\vec{x} | q_t = i).
3 Transition to a new state, q_{t+1} = j, according to pmf
a_{ij} = p(q_{t+1} = j | q_t = i).
4 Repeat.
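This generative story is easy to simulate. The sketch below (my own illustration, with made-up parameters for a three-state, discrete-observation HMM) draws a state sequence and an observation sequence exactly as in steps 1-4:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 3-state HMM with 3 discrete observation symbols (parameters are made up).
pi = np.array([0.6, 0.3, 0.1])              # initial state pmf, pi_i = p(q_1 = i)
A = np.array([[0.8, 0.1, 0.1],              # transition pmf, a_ij = p(q_{t+1} = j | q_t = i)
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])
B = np.array([[0.7, 0.2, 0.1],              # observation pmf, b_i(x) = p(x | q_t = i)
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])

def sample_hmm(T):
    """Generate (states, observations) of length T from the HMM above."""
    q = rng.choice(3, p=pi)                 # 1. start in state q_1 ~ pi
    states, obs = [], []
    for _ in range(T):
        obs.append(rng.choice(3, p=B[q]))   # 2. emit an observation from b_q
        states.append(q)
        q = rng.choice(3, p=A[q])           # 3. transition according to a_{q,.}
    return states, obs                      # 4. repeat

print(sample_hmm(10))
```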

HMM: Finite State Diagram

(Figure: a three-state HMM, with transition probabilities a_ij among states 1, 2, and 3, and an observation pdf b_i(\vec{x}) attached to each state.)

1 Start in state q_1 = i, for some 1 ≤ i ≤ N.
2 Generate an observation, \vec{x}, with pdf b_i(\vec{x}).
3 Transition to a new state, q_{t+1} = j, according to pmf a_{ij}.
4 Repeat steps #2 and #3, T times each.

Notation: Model Parameters

Solving an HMM is possible if you carefully keep track of
notation. Here’s standard notation for the parameters:
π_i = p(q_1 = i) is called the initial state probability. Let N
be the number of different states, so that 1 ≤ i ≤ N.
a_ij = p(q_t = j | q_{t−1} = i) is called the transition probability,
1 ≤ i, j ≤ N.
b_j(\vec{x}) = p(\vec{x}_t = \vec{x} | q_t = j) is called the observation
probability. It is usually estimated by a neural network,
though Gaussians, GMMs, and even lookup tables are possible.
Λ is the complete set of model parameters, including all the
π_i’s and a_ij’s, and the Gaussian, GMM, or neural net
parameters necessary to compute b_j(\vec{x}).

The Three Problems for an HMM

1 Recognition: Given two different HMMs, Λ_1 and Λ_2, and an
observation sequence X, which HMM was more likely to have
produced X? In other words, is p(X | Λ_1) > p(X | Λ_2)?
2 Segmentation: What is p(q_t = i | X, Λ)?
3 Training: Given an initial HMM Λ and an observation
sequence X, can we find Λ′ such that p(X | Λ′) > p(X | Λ)?

Outline

1 Review: Bayesian Classifiers

2 Hidden Markov Models

3 Recognition: the Forward Algorithm

4 Segmentation: the Backward Algorithm

5 Numerical Example

6 Summary

7 Written Example

The HMM Recognition Problem

Given
    X = [\vec{x}_1, \ldots, \vec{x}_T] and
    Λ = {π_i, a_ij, b_j(\vec{x}) ∀ i, j},
what is p(X | Λ)?
Let’s solve a simpler problem first:
Given
    X = [\vec{x}_1, \ldots, \vec{x}_T] and
    Q = [q_1, \ldots, q_T] and
    Λ = {π_i, a_ij, b_j(\vec{x}) ∀ i, j},
what is p(Q, X | Λ)?

Joint Probability of State Sequence and Observation Sequence

The joint probability of the state sequence and the observation
sequence is calculated iteratively, from beginning to end:
The probability of the first state, q_1, is π_{q_1}.
Given q_1, the probability of \vec{x}_1 is b_{q_1}(\vec{x}_1).
Given q_1, the probability of q_2 is a_{q_1 q_2}.
. . . and so on. . .

    p(Q, X \mid \Lambda) = \pi_{q_1} b_{q_1}(\vec{x}_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(\vec{x}_t)
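As a sketch (my own illustration), this product is a one-line loop; the parameters below match the gumball-machine example that appears later in the lecture, with observations coded as A = 0, G = 1:

```python
import numpy as np

def joint_prob(pi, A, B, states, obs):
    """p(Q, X | Lambda) for a discrete-observation HMM:
    pi[q1] * b_{q1}(x1) * prod_t a_{q_{t-1} q_t} * b_{q_t}(x_t)."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
    return p

# Parameters from the gumball-machine example later in the lecture.
pi = np.array([0.5, 0.5])
A = np.array([[0.75, 0.25], [0.25, 0.75]])
B = np.array([[0.4, 0.6], [0.9, 0.1]])   # rows: states, columns: observation symbols

print(joint_prob(pi, A, B, states=[0, 1, 0], obs=[0, 1, 0]))
```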

Probability of the Observation Sequence

The probability of the observation sequence, alone, is somewhat
harder, because we have to solve this sum:

    p(X \mid \Lambda) = \sum_{Q} p(Q, X \mid \Lambda) = \sum_{q_T=1}^{N} \cdots \sum_{q_1=1}^{N} p(Q, X \mid \Lambda)

On the face of it, this calculation seems to have complexity
O(N^T). So for a very small 100-frame utterance, with only 10
states, we have a complexity of O(10^{100}) = one googol.
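For a toy model the brute-force sum over all N^T state sequences is still feasible, and it is a handy sanity check on the forward algorithm introduced next. A minimal sketch (my own illustration, using the same toy parameters as above):

```python
import itertools
import numpy as np

# Same toy parameters as before (two states, two observation symbols).
pi = np.array([0.5, 0.5])
A = np.array([[0.75, 0.25], [0.25, 0.75]])
B = np.array([[0.4, 0.6], [0.9, 0.1]])

def brute_force_likelihood(obs):
    """p(X | Lambda) by summing p(Q, X | Lambda) over all N**T state sequences."""
    N, total = len(pi), 0.0
    for states in itertools.product(range(N), repeat=len(obs)):
        p = pi[states[0]] * B[states[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
        total += p
    return total

print(brute_force_likelihood([0, 1, 0]))  # feasible only for small T
```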


The Forward Algorithm

The solution is to use a kind of dynamic programming algorithm,
called “the forward algorithm.” The forward probability is defined
as follows:

    \alpha_t(i) \equiv p(\vec{x}_1, \ldots, \vec{x}_t, q_t = i \mid \Lambda)

Obviously, if we can find α_t(i) for all i and all t, we will have
solved the recognition problem, because

    p(X \mid \Lambda) = p(\vec{x}_1, \ldots, \vec{x}_T \mid \Lambda)
                     = \sum_{i=1}^{N} p(\vec{x}_1, \ldots, \vec{x}_T, q_T = i \mid \Lambda)
                     = \sum_{i=1}^{N} \alpha_T(i)

The Forward Algorithm

So, working with the definition \alpha_t(i) \equiv p(\vec{x}_1, \ldots, \vec{x}_t, q_t = i \mid \Lambda),
let’s see how we can actually calculate α_t(i).
1 Initialize:

    \alpha_1(i) = p(q_1 = i, \vec{x}_1 \mid \Lambda)
               = p(q_1 = i \mid \Lambda)\, p(\vec{x}_1 \mid q_1 = i, \Lambda)
               = \pi_i b_i(\vec{x}_1)

The Forward Algorithm

Definition: \alpha_t(i) \equiv p(\vec{x}_1, \ldots, \vec{x}_t, q_t = i \mid \Lambda).

1 Initialize:

    \alpha_1(i) = \pi_i b_i(\vec{x}_1), \quad 1 \le i \le N

2 Iterate:

    \alpha_t(j) = p(\vec{x}_1, \ldots, \vec{x}_t, q_t = j \mid \Lambda)
               = \sum_{i=1}^{N} p(\vec{x}_1, \ldots, \vec{x}_{t-1}, q_{t-1} = i)\, p(q_t = j \mid q_{t-1} = i)\, p(\vec{x}_t \mid q_t = j)
               = \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\, b_j(\vec{x}_t)

The Forward Algorithm

So, working with the definition \alpha_t(i) \equiv p(\vec{x}_1, \ldots, \vec{x}_t, q_t = i \mid \Lambda),
let’s see how we can actually calculate α_t(i).
1 Initialize:

    \alpha_1(i) = \pi_i b_i(\vec{x}_1), \quad 1 \le i \le N

2 Iterate:

    \alpha_t(j) = \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\, b_j(\vec{x}_t), \quad 1 \le j \le N, \; 2 \le t \le T

3 Terminate:

    p(X \mid \Lambda) = \sum_{i=1}^{N} \alpha_T(i)
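Here is a minimal NumPy sketch of the three steps (my own illustration, not code from the lecture); the toy parameters are the same as in the earlier sketches, so its output can be compared against the brute-force sum:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm for a discrete-observation HMM (a sketch of the three steps above).
    Returns alpha with alpha[t, i] = alpha_{t+1}(i) in the slides' 1-based notation,
    plus p(X | Lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # initialize
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # iterate
    return alpha, alpha[-1].sum()                     # terminate

# Toy check (same parameters as the brute-force sketch).
pi = np.array([0.5, 0.5])
A = np.array([[0.75, 0.25], [0.25, 0.75]])
B = np.array([[0.4, 0.6], [0.9, 0.1]])
alpha, likelihood = forward(pi, A, B, [0, 1, 0])
print(alpha, likelihood)
```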

Visualizing the Forward Algorithm using a Trellis

One way to think about the forward algorithm is by way of a


trellis. A trellis is a matrix in which each time step is a column,
and each row shows a different state. For example, here’s a trellis
with N = 4 states, and T = 5 frames:

Public domain image by Qef, 2009



Visualizing the Forward Algorithm using a Trellis

Using a trellis, the initialize step computes probabilities for the
first column of the trellis:

    \alpha_1(i) = \pi_i b_i(\vec{x}_1), \quad 1 \le i \le N



Visualizing the Forward Algorithm using a Trellis

The iterate step then computes the probabilities in the t-th column
by adding up the probabilities in the (t − 1)-st column, each
multiplied by the corresponding transition probability:

    \alpha_t(j) = \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\, b_j(\vec{x}_t), \quad 1 \le j \le N, \; 2 \le t \le T

Visualizing the Forward Algorithm using a Trellis

The terminate step then computes the likelihood of the model by
adding the probabilities in the last column:

    p(X \mid \Lambda) = \sum_{i=1}^{N} \alpha_T(i)

The Forward Algorithm: Computational Complexity

Most of the computational complexity is in this step:

Iterate:

    \alpha_t(j) = \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\, b_j(\vec{x}_t), \quad 1 \le j \le N, \; 2 \le t \le T

Its complexity is:

For each of T − 1 time steps, 2 ≤ t ≤ T, . . .
we need to calculate N different alpha-variables, α_t(j), for
1 ≤ j ≤ N, . . .
each of which requires a summation with N terms.
So the total complexity is O(TN^2). For example, with N = 10
and T = 100, the complexity is only TN^2 = 10,000 multiplies
(much, much less than N^T!!)

Outline

1 Review: Bayesian Classifiers

2 Hidden Markov Models

3 Recognition: the Forward Algorithm

4 Segmentation: the Backward Algorithm

5 Numerical Example

6 Summary

7 Written Example

The Segmentation Problem

There are different ways to define the segmentation problem. Let’s
define it this way:
We want to find the most likely state, q_t = i, at time t, . . .
given knowledge of the entire sequence X = [\vec{x}_1, \ldots, \vec{x}_T], not
just the current observation. So for example, we don’t want
to recognize state i at time t if the surrounding observations,
\vec{x}_{t−1} and \vec{x}_{t+1}, make it obvious that this choice is impossible.
Also, . . .
given knowledge of the HMM that produced this sequence, Λ.
In other words, we want to find the state posterior probability,
p(q_t = i | X, Λ). Let’s define some new notation for it:

    \gamma_t(i) = p(q_t = i \mid X, \Lambda)



Use Bayes’ Rule

Suppose we already knew the joint probability, p(X, q_t = i | Λ).
Then we could find the state posterior using Bayes’ rule:

    \gamma_t(i) = p(q_t = i \mid X, \Lambda) = \frac{p(X, q_t = i \mid \Lambda)}{\sum_{j=1}^{N} p(X, q_t = j \mid \Lambda)}

Use the Forward Algorithm

Let’s expand this:

    p(X, q_t = i \mid \Lambda) = p(q_t = i, \vec{x}_1, \ldots, \vec{x}_T \mid \Lambda)

We already know about half of that:
\alpha_t(i) = p(q_t = i, \vec{x}_1, \ldots, \vec{x}_t \mid \Lambda). We’re only missing this part:

    p(X, q_t = i \mid \Lambda) = \alpha_t(i)\, p(\vec{x}_{t+1}, \ldots, \vec{x}_T \mid q_t = i, \Lambda)

Again, let’s try the trick of “solve the problem by inventing new
notation.” Let’s define

    \beta_t(i) \equiv p(\vec{x}_{t+1}, \ldots, \vec{x}_T \mid q_t = i, \Lambda)



The Backward Algorithm

Now let’s use the definition βt (i) ≡ p(~xt+1 , . . . , ~xT |qt = i, Λ), and
see how we can compute that.
1 Initialize:
βT (i) = 1, 1 ≤ i ≤ N
This might not seem immediately obvious, but think about it.
Given that there are no more ~x vectors after time T , what is
the probability that there are no more ~x vectors after time T ?
Well, 1, obviously.

The Backward Algorithm

Now let’s use the definition \beta_t(i) \equiv p(\vec{x}_{t+1}, \ldots, \vec{x}_T \mid q_t = i, \Lambda), and
see how we can compute that.
1 Initialize:

    \beta_T(i) = 1, \quad 1 \le i \le N

2 Iterate:

    \beta_t(i) = p(\vec{x}_{t+1}, \ldots, \vec{x}_T \mid q_t = i, \Lambda)
              = \sum_{j=1}^{N} p(q_{t+1} = j \mid q_t = i)\, p(\vec{x}_{t+1} \mid q_{t+1} = j)\, p(\vec{x}_{t+2}, \ldots, \vec{x}_T \mid q_{t+1} = j)
              = \sum_{j=1}^{N} a_{ij}\, b_j(\vec{x}_{t+1})\, \beta_{t+1}(j)

The Backward Algorithm

Now let’s use the definition \beta_t(i) \equiv p(\vec{x}_{t+1}, \ldots, \vec{x}_T \mid q_t = i, \Lambda), and
see how we can compute that.
1 Initialize:

    \beta_T(i) = 1, \quad 1 \le i \le N

2 Iterate:

    \beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(\vec{x}_{t+1})\, \beta_{t+1}(j), \quad 1 \le i \le N, \; 1 \le t \le T-1

3 Terminate:

    p(X \mid \Lambda) = \sum_{i=1}^{N} \pi_i b_i(\vec{x}_1)\, \beta_1(i)
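A matching sketch of the backward recursion (again my own illustration); run on the same toy model, its termination step should return the same p(X | Λ) as the forward algorithm:

```python
import numpy as np

def backward(pi, A, B, obs):
    """Backward algorithm (a sketch of the three steps above).
    Returns beta with beta[t, i] = beta_{t+1}(i) in the slides' 1-based notation,
    plus p(X | Lambda) from the termination step."""
    T, N = len(obs), len(pi)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                      # initialize: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])  # iterate, t = T-1 down to 1
    likelihood = np.sum(pi * B[:, obs[0]] * beta[0])    # terminate
    return beta, likelihood

# Same toy parameters as the forward-algorithm sketch.
pi = np.array([0.5, 0.5])
A = np.array([[0.75, 0.25], [0.25, 0.75]])
B = np.array([[0.4, 0.6], [0.9, 0.1]])
print(backward(pi, A, B, [0, 1, 0])[1])
```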

The Backward Algorithm: Computational Complexity

Most of the computational complexity is in this step:

Iterate:

    \beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(\vec{x}_{t+1})\, \beta_{t+1}(j), \quad 1 \le i \le N, \; 1 \le t \le T-1

Its complexity is:

For each of T − 1 time steps, 1 ≤ t ≤ T − 1, . . .
we need to calculate N different beta-variables, β_t(i), for
1 ≤ i ≤ N, . . .
each of which requires a summation with N terms.
So the total complexity is O(TN^2).


Use Bayes’ Rule

The segmentation probability is then

    \gamma_t(i) = \frac{p(X, q_t = i \mid \Lambda)}{\sum_{k=1}^{N} p(X, q_t = k \mid \Lambda)}
               = \frac{p(\vec{x}_1, \ldots, \vec{x}_t, q_t = i \mid \Lambda)\, p(\vec{x}_{t+1}, \ldots, \vec{x}_T \mid q_t = i, \Lambda)}{\sum_{k=1}^{N} p(\vec{x}_1, \ldots, \vec{x}_t, q_t = k \mid \Lambda)\, p(\vec{x}_{t+1}, \ldots, \vec{x}_T \mid q_t = k, \Lambda)}
               = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{k=1}^{N} \alpha_t(k)\, \beta_t(k)}

What About State Sequences?

Notice a problem: γ_t(i) only tells us about one frame at a
time! It doesn’t tell us anything about the probability of a
sequence of states, covering a sequence of frames!
. . . but we can extend the same reasoning to cover two or
more consecutive frames. For example, let’s define:

    \xi_t(i, j) = p(q_t = i, q_{t+1} = j \mid X, \Lambda)

We can solve for ξ_t(i, j) using the same reasoning that we
used for γ_t(i)!

Segmentation: The Backward Algorithm


In summary, we now have three new probabilities, all of which can
be computed in O(TN^2) time:
1 The Backward Probability:

    \beta_t(i) = p(\vec{x}_{t+1}, \ldots, \vec{x}_T \mid q_t = i, \Lambda)

2 The State Posterior:

    \gamma_t(i) = p(q_t = i \mid X, \Lambda) = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{k=1}^{N} \alpha_t(k)\, \beta_t(k)}

3 The Segment Posterior:

    \xi_t(i, j) = p(q_t = i, q_{t+1} = j \mid X, \Lambda)
               = \frac{\alpha_t(i)\, a_{ij}\, b_j(\vec{x}_{t+1})\, \beta_{t+1}(j)}{\sum_{k=1}^{N} \sum_{\ell=1}^{N} \alpha_t(k)\, a_{k\ell}\, b_\ell(\vec{x}_{t+1})\, \beta_{t+1}(\ell)}
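Putting the pieces together, here is a short sketch (my own illustration) that runs both recursions on the toy model and then forms γ_t(i) and ξ_t(i, j) exactly as defined above:

```python
import numpy as np

# Toy parameters (same as the forward/backward sketches earlier in these notes).
pi = np.array([0.5, 0.5])
A = np.array([[0.75, 0.25], [0.25, 0.75]])
B = np.array([[0.4, 0.6], [0.9, 0.1]])
obs = [0, 1, 0]
T, N = len(obs), len(pi)

# Forward and backward passes.
alpha = np.zeros((T, N)); beta = np.zeros((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

# State posterior gamma_t(i) and segment posterior xi_t(i, j).
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
xi = np.zeros((T - 1, N, N))
for t in range(T - 1):
    # xi_t(i, j) proportional to alpha_t(i) a_ij b_j(x_{t+1}) beta_{t+1}(j)
    xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
    xi[t] /= xi[t].sum()
print(gamma)
```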

Outline

1 Review: Bayesian Classifiers

2 Hidden Markov Models

3 Recognition: the Forward Algorithm

4 Segmentation: the Backward Algorithm

5 Numerical Example

6 Summary

7 Written Example

Example: Gumball Machines

“Gumball machines in a Diner at Dallas, Texas, in 2008,” Andreas Praefcke, public domain image.

Example: Gumball Machines


Observation Probabilities: Suppose we have two gumball
machines, q = 1 and q = 2. Machine #1 contains 60%
Grapefruit gumballs, 40% Apple gumballs. Machine #2
contains 90% Apple, 10% Grapefruit.

    b_1(x) = \begin{cases} 0.4 & x = A \\ 0.6 & x = G \end{cases}, \qquad
    b_2(x) = \begin{cases} 0.9 & x = A \\ 0.1 & x = G \end{cases}

Initial State Probabilities: My friend George flips a coin to
decide which machine to use first.

    \pi_i = 0.5, \quad i \in \{1, 2\}

Transition Probabilities: After he’s used a machine, George
flips two coins, and he only changes machines if both coins
come up heads.

    a_{ij} = \begin{cases} 0.75 & i = j \\ 0.25 & i \neq j \end{cases}

A Segmentation Problem

George bought three gumballs, using three quarters. The


three gumballs are (x1 = A, x2 = G , x3 = A).
Unfortunately, George is a bit of a goofball. The second of the
three “quarters” was actually my 1867 silver “Seated Liberty”
dollar, worth $4467.
Which of the two machines do I need to dismantle in order to
get my coin back?

Image used with permission of the National Numismatic Collection, National Museum of American History.

The Forward Algorithm: t = 1

Remember, the observation sequence is X = (A, G, A).

    \alpha_1(i) = \pi_i b_i(x_1)
               = \begin{cases} (0.5)(0.4) = 0.2 & i = 1 \\ (0.5)(0.9) = 0.45 & i = 2 \end{cases}

The Forward Algorithm: t = 2

Remember, the observation sequence is X = (A, G, A).

    \alpha_2(j) = \sum_{i=1}^{2} \alpha_1(i)\, a_{ij}\, b_j(x_2)
               = \begin{cases} \alpha_1(1) a_{11} b_1(x_2) + \alpha_1(2) a_{21} b_1(x_2) & j = 1 \\ \alpha_1(1) a_{12} b_2(x_2) + \alpha_1(2) a_{22} b_2(x_2) & j = 2 \end{cases}
               = \begin{cases} (0.2)(0.75)(0.6) + (0.45)(0.25)(0.6) = 0.1575 & j = 1 \\ (0.2)(0.25)(0.1) + (0.45)(0.75)(0.1) = 0.03875 & j = 2 \end{cases}

The Backward Algorithm: t = 3

The backward algorithm always starts out with \beta_T(i) = 1!

    \beta_3(i) = 1, \quad i \in \{1, 2\}

The Backward Algorithm: t = 2

Remember, the observation sequence is X = (A, G, A).

    \beta_2(i) = \sum_{j=1}^{2} a_{ij}\, b_j(x_3)\, \beta_3(j)
              = \begin{cases} a_{11} b_1(x_3) + a_{12} b_2(x_3) & i = 1 \\ a_{21} b_1(x_3) + a_{22} b_2(x_3) & i = 2 \end{cases}
              = \begin{cases} (0.75)(0.4) + (0.25)(0.9) = 0.525 & i = 1 \\ (0.25)(0.4) + (0.75)(0.9) = 0.775 & i = 2 \end{cases}

The Solution to the Puzzle

Given that the observation sequence is X = (A, G, A), the posterior
state probability is

    \gamma_2(i) = \frac{\alpha_2(i)\, \beta_2(i)}{\sum_{k=1}^{2} \alpha_2(k)\, \beta_2(k)}
               = \begin{cases} \dfrac{(0.1575)(0.525)}{(0.1575)(0.525) + (0.03875)(0.775)} \approx 0.73 & i = 1 \\[2ex] \dfrac{(0.03875)(0.775)}{(0.1575)(0.525) + (0.03875)(0.775)} \approx 0.27 & i = 2 \end{cases}

So I should dismantle gumball machine #1, hoping to find my rare
1867 silver dollar. Good luck!
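As a numerical check on this hand calculation (my own sketch, not part of the original slides), the same forward-backward recursions can be run directly on the gumball model:

```python
import numpy as np

# Gumball HMM: machines 1 and 2 (indices 0 and 1), observations A = 0, G = 1.
pi = np.array([0.5, 0.5])
A = np.array([[0.75, 0.25], [0.25, 0.75]])
B = np.array([[0.4, 0.6],    # machine 1: p(A) = 0.4, p(G) = 0.6
              [0.9, 0.1]])   # machine 2: p(A) = 0.9, p(G) = 0.1
obs = [0, 1, 0]              # X = (A, G, A)
T, N = len(obs), len(pi)

alpha = np.zeros((T, N)); beta = np.zeros((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
print(gamma[1])   # posterior over the two machines at t = 2
```

The printed posterior is approximately (0.73, 0.27), matching the γ_2 values computed by hand above.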

Outline

1 Review: Bayesian Classifiers

2 Hidden Markov Models

3 Recognition: the Forward Algorithm

4 Segmentation: the Backward Algorithm

5 Numerical Example

6 Summary

7 Written Example

The Forward Algorithm

Definition: \alpha_t(i) \equiv p(\vec{x}_1, \ldots, \vec{x}_t, q_t = i \mid \Lambda). Computation:

1 Initialize:

    \alpha_1(i) = \pi_i b_i(\vec{x}_1), \quad 1 \le i \le N

2 Iterate:

    \alpha_t(j) = \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\, b_j(\vec{x}_t), \quad 1 \le j \le N, \; 2 \le t \le T

3 Terminate:

    p(X \mid \Lambda) = \sum_{i=1}^{N} \alpha_T(i)

The Backward Algorithm

Definition: \beta_t(i) \equiv p(\vec{x}_{t+1}, \ldots, \vec{x}_T \mid q_t = i, \Lambda). Computation:

1 Initialize:

    \beta_T(i) = 1, \quad 1 \le i \le N

2 Iterate:

    \beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(\vec{x}_{t+1})\, \beta_{t+1}(j), \quad 1 \le i \le N, \; 1 \le t \le T-1

3 Terminate:

    p(X \mid \Lambda) = \sum_{i=1}^{N} \pi_i b_i(\vec{x}_1)\, \beta_1(i)

Hidden Markov Model

(Figure: the same three-state HMM diagram, with transition probabilities a_ij among states 1, 2, and 3, and an observation pdf b_i(\vec{x}) attached to each state.)

1 Start in state q_1 = i with pmf π_i.
2 Generate an observation, \vec{x}, with pdf b_i(\vec{x}).
3 Transition to a new state, q_{t+1} = j, according to pmf a_{ij}.
4 Repeat.

Outline

1 Review: Bayesian Classifiers

2 Hidden Markov Models

3 Recognition: the Forward Algorithm

4 Segmentation: the Backward Algorithm

5 Numerical Example

6 Summary

7 Written Example

Written Example

Joe’s magic shop opens at random, and closes at random. To be


more specific, if it’s currently closed, the probability that it will
open any time in the next hour is 10%; if it’s currently open, the
probability that it will close any time in the next hour is 10%.
The shop is in a busy part of town; when the shop is open, the
area gets even busier. If the shop is closed, the area is noisy with a
probability of 40%. If it’s open, the area is noisy with a probability
of 70%.
At 1:00, you notice that the area is noisy, so you go to check;
unfortunately, the shop is closed. At 2:00, the area is still noisy,
but you decide that it’s unlikely that the shop has opened in just
one hour. At 3:00 the area is still noisy, and at 4:00, and at 5:00.
How many hours in a row does the area need to be noisy before
you decide that, with probability greater than 50%, the shop is
open?
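One way to check your written answer is to run the filtering recursion numerically. The sketch below (my own illustration, not part of the exercise) encodes the shop as a two-state HMM, starts from the 1:00 check (the shop is known to be closed), and applies the forward recursion once per additional noisy hour until the posterior probability of "open" exceeds 50%:

```python
import numpy as np

# States: 0 = closed, 1 = open.
A = np.array([[0.9, 0.1],       # closed -> closed/open
              [0.1, 0.9]])      # open   -> closed/open
b_noisy = np.array([0.4, 0.7])  # p(noisy | closed), p(noisy | open)

# At 1:00 we checked: the shop is definitely closed.
alpha = np.array([1.0, 0.0])
t = 0
while True:
    t += 1
    # One more hour passes, and the area is again observed to be noisy.
    alpha = (alpha @ A) * b_noisy
    posterior_open = alpha[1] / alpha.sum()
    print(f"{t} hour(s) after the check: p(open | noisy so far) = {posterior_open:.3f}")
    if posterior_open > 0.5:
        break
```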
