2024 Fall CSE366 12 HMM

The document discusses Hidden Markov Models (HMMs) and their applications in modeling stochastic processes where the system states are not directly observable. It explains the Markov property, the structure of HMMs including transition and observation probabilities, and provides examples of HMMs in word recognition and character recognition. Key problems associated with HMMs such as evaluation, decoding, and learning are also outlined.


Hidden Markov Models
Dr. Raihan Ul Islam
Associate Professor
Department of Computer Science & Engineering
Room No# 256
Email: [email protected]
Mobile: +8801992392611
Markov Models
• In probability theory, a Markov model is a stochastic model used to model randomly changing systems.
• It is assumed that future states depend only on the
current state, not on the events that occurred before it
(that is, it assumes the Markov property).
• Generally, this assumption enables reasoning and
computation with the model that would otherwise
be intractable.
• In the fields of predictive modelling and probabilistic
forecasting, it is desirable for a given model to exhibit
the Markov property.
Markov Models
• Set of states: {s1, s2, ..., sN}.
• The process moves from one state to another, generating a sequence of states si1, si2, ..., sik, ...
• Markov chain property: the probability of each subsequent state depends only on the previous state:
  P(sik | si1, si2, ..., sik-1) = P(sik | sik-1)
• To define a Markov model, the following probabilities have to be specified: transition probabilities aij = P(si | sj) and initial probabilities πi = P(si).
Markov models

                        System state is           System state is
                        fully observable          partially observable
System is autonomous    Markov chain              Hidden Markov model
System is controlled    Markov decision process   Partially observable
                                                  Markov decision process
Example of Markov Model

[Diagram: two states, 'Rain' and 'Dry'; self-loops Rain->Rain 0.3 and Dry->Dry 0.8; cross transitions Rain->Dry 0.7 and Dry->Rain 0.2]

• Two states: 'Rain' and 'Dry'.
• Transition probabilities: P('Rain'|'Rain')=0.3, P('Dry'|'Rain')=0.7, P('Rain'|'Dry')=0.2, P('Dry'|'Dry')=0.8.
• Initial probabilities: say P('Rain')=0.4, P('Dry')=0.6.
Calculation of sequence probability
• By the Markov chain property, the probability of a state sequence can be found by the formula:

  P(si1, si2, ..., sik) = P(sik | si1, si2, ..., sik-1) P(si1, si2, ..., sik-1)
                        = P(sik | sik-1) P(si1, si2, ..., sik-1) = ...
                        = P(sik | sik-1) P(sik-1 | sik-2) ... P(si2 | si1) P(si1)

• Suppose we want to calculate the probability of the state sequence {'Dry','Dry','Rain','Rain'} in our example:

  P({'Dry','Dry','Rain','Rain'}) = P('Rain'|'Rain') P('Rain'|'Dry') P('Dry'|'Dry') P('Dry')
                                 = 0.3 * 0.2 * 0.8 * 0.6 = 0.0288
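The chain-rule computation above can be checked in a few lines of Python; this is an illustrative sketch that simply transcribes the Rain/Dry example's tables:

```python
# Transition table P(next | current) and initial distribution
# from the Rain/Dry example.
trans = {"Rain": {"Rain": 0.3, "Dry": 0.7},
         "Dry":  {"Rain": 0.2, "Dry": 0.8}}
init = {"Rain": 0.4, "Dry": 0.6}

def sequence_probability(states):
    """P(s1, ..., sk) = P(s1) * product of P(s_t | s_{t-1})."""
    p = init[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[prev][cur]
    return p

p = sequence_probability(["Dry", "Dry", "Rain", "Rain"])
# 0.6 * 0.8 * 0.2 * 0.3 = 0.0288
```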
Hidden Markov models
• Set of states: {s1, s2, ..., sN}.
• The process moves from one state to another, generating a sequence of states si1, si2, ..., sik, ...
• Markov chain property: the probability of each subsequent state depends only on the previous state:
  P(sik | si1, si2, ..., sik-1) = P(sik | sik-1)
• States are not visible, but each state randomly generates one of M observations (or visible states) {v1, v2, ..., vM}.
Hidden Markov models
• To define a hidden Markov model, the following probabilities have to be specified:
  - matrix of transition probabilities A = (aij), aij = P(si | sj)
  - matrix of observation probabilities B = (bi(vm)), bi(vm) = P(vm | si)
  - vector of initial probabilities π = (πi), πi = P(si)
• The model is represented by M = (A, B, π).
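In code, the model M = (A, B, π) is just two matrices and a vector. A minimal sketch (the numbers here are placeholders, not the example that follows; note we store A row-wise, a[i][j] = P(next = j | current = i), the transpose of the aij = P(si | sj) convention above):

```python
# Hypothetical 2-state, 2-observation HMM; every row must sum to 1.
A  = [[0.9, 0.1],          # A[i][j] = P(next state j | current state i)
      [0.5, 0.5]]
B  = [[0.7, 0.3],          # B[i][m] = P(observation m | state i)
      [0.2, 0.8]]
pi = [0.6, 0.4]            # pi[i]   = P(initial state i)

for row in A + B + [pi]:
    assert abs(sum(row) - 1.0) < 1e-12
```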


Example of Hidden Markov Model

[Diagram: hidden states 'Low' and 'High' with self-loops 0.3 and 0.8 and cross transitions Low->High 0.7, High->Low 0.2; each state emits the observations 'Rain' and 'Dry']
Example of Hidden Markov Model
• Two hidden states: 'Low' and 'High' atmospheric pressure.
• Two observations: 'Rain' and 'Dry'.
• Transition probabilities: P('Low'|'Low')=0.3, P('High'|'Low')=0.7, P('Low'|'High')=0.2, P('High'|'High')=0.8.
• Observation probabilities: P('Rain'|'Low')=0.6, P('Dry'|'Low')=0.4, P('Rain'|'High')=0.4, P('Dry'|'High')=0.3.
• Initial probabilities: say P('Low')=0.4, P('High')=0.6.


Calculation of observation sequence probability
• Suppose we want to calculate the probability of a sequence of observations in our example, {'Dry','Rain'}.
• Consider all possible hidden state sequences:

  P({'Dry','Rain'}) = P({'Dry','Rain'}, {'Low','Low'}) + P({'Dry','Rain'}, {'Low','High'})
                    + P({'Dry','Rain'}, {'High','Low'}) + P({'Dry','Rain'}, {'High','High'})

where the first term (hidden sequence Low->Low) is:

  P({'Dry','Rain'}, {'Low','Low'})
  = P({'Dry','Rain'} | {'Low','Low'}) P({'Low','Low'})
  = P('Low') P('Dry'|'Low') P('Low'|'Low') P('Rain'|'Low')
  = 0.4 * 0.4 * 0.3 * 0.6 = 0.0288
Calculation of observation sequence probability
• Suppose we want to calculate the probability of a sequence of observations in our example, {'Dry','Rain'}.
• Consider all possible hidden state sequences:

  P({'Dry','Rain'}) = P({'Dry','Rain'}, {'Low','Low'}) + P({'Dry','Rain'}, {'Low','High'})
                    + P({'Dry','Rain'}, {'High','Low'}) + P({'Dry','Rain'}, {'High','High'})

where the second term (hidden sequence Low->High) is:

  P({'Dry','Rain'}, {'Low','High'})
  = P({'Dry','Rain'} | {'Low','High'}) P({'Low','High'})
  = P('Low') P('Dry'|'Low') P('High'|'Low') P('Rain'|'High')
  = 0.4 * 0.4 * 0.7 * 0.4 = 0.0448
Calculation of observation sequence probability
• Suppose we want to calculate the probability of a sequence of observations in our example, {'Dry','Rain'}.
• Consider all possible hidden state sequences:

  P({'Dry','Rain'}) = P({'Dry','Rain'}, {'Low','Low'}) + P({'Dry','Rain'}, {'Low','High'})
                    + P({'Dry','Rain'}, {'High','Low'}) + P({'Dry','Rain'}, {'High','High'})

where the third term (hidden sequence High->Low) is:

  P({'Dry','Rain'}, {'High','Low'})
  = P({'Dry','Rain'} | {'High','Low'}) P({'High','Low'})
  = P('High') P('Dry'|'High') P('Low'|'High') P('Rain'|'Low')
  = 0.6 * 0.3 * 0.2 * 0.6 = 0.0216
Calculation of observation sequence probability
• Suppose we want to calculate the probability of a sequence of observations in our example, {'Dry','Rain'}.
• Consider all possible hidden state sequences:

  P({'Dry','Rain'}) = P({'Dry','Rain'}, {'Low','Low'}) + P({'Dry','Rain'}, {'Low','High'})
                    + P({'Dry','Rain'}, {'High','Low'}) + P({'Dry','Rain'}, {'High','High'})

where the fourth term (hidden sequence High->High) is:

  P({'Dry','Rain'}, {'High','High'})
  = P({'Dry','Rain'} | {'High','High'}) P({'High','High'})
  = P('High') P('Dry'|'High') P('High'|'High') P('Rain'|'High')
  = 0.6 * 0.3 * 0.8 * 0.4 = 0.0576
Summing these contributions gives:

  P({'Dry','Rain'}) = 0.0288 + 0.0448 + 0.0216 + 0.0576 = 0.1528
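The four-term sum can be reproduced by brute-force enumeration of the hidden state sequences (a sketch using the slide's values; note that, as printed, the 'High' emission probabilities 0.4 and 0.3 do not sum to 1):

```python
from itertools import product

# Low/High HMM from the example, transcribed as given on the slides.
trans = {"Low": {"Low": 0.3, "High": 0.7}, "High": {"Low": 0.2, "High": 0.8}}
emit  = {"Low": {"Rain": 0.6, "Dry": 0.4}, "High": {"Rain": 0.4, "Dry": 0.3}}
init  = {"Low": 0.4, "High": 0.6}

def joint(obs, states):
    """P(observations, hidden states) for one hidden state sequence."""
    p = init[states[0]] * emit[states[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= trans[states[t - 1]][states[t]] * emit[states[t]][obs[t]]
    return p

obs = ["Dry", "Rain"]
total = sum(joint(obs, seq) for seq in product(["Low", "High"], repeat=len(obs)))
# 0.0288 + 0.0448 + 0.0216 + 0.0576 = 0.1528
```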
• Evaluation problem. Given the HMM M=(A, B, π) and the observation sequence O=o1 o2 ... oK, calculate the probability that model M has generated sequence O.
• Decoding problem. Given the HMM M=(A, B, π) and the observation sequence O=o1 o2 ... oK, calculate the most likely sequence of hidden states si that produced this observation sequence O.
• Learning problem. Given some training observation sequences O=o1 o2 ... oK and the general structure of the HMM (numbers of hidden and visible states), determine the HMM parameters M=(A, B, π) that best fit the training data.

O=o1 ... oK denotes a sequence of observations, where each ok is one of {v1, ..., vM}.

Word recognition example (1)
• Typed word recognition; assume all characters are separated.
• The character recognizer outputs the probability of the image being a particular character, P(image | character).

[Figure: a character image (hidden state = character, observation = image) with recognizer scores, e.g. 'a' 0.5, 'b' 0.03, 'c' 0.005, ..., 'z' 0.31]
Word recognition example (2)
• Hidden states of the HMM = characters.
• Observations = typed images of characters segmented from the image, v. Note that there is an infinite number of observations.
• Observation probabilities = character recognizer scores: B = (bi(v)) = (P(v | si)).
• Transition probabilities will be defined differently in the two subsequent models.
Word recognition example (3)
• If a lexicon is given, we can construct a separate HMM model for each lexicon word.

  Amherst: a -> m -> h -> e -> r -> s -> t
  Buffalo: b -> u -> f -> f -> a -> l -> o

• Here, recognition of the word image is equivalent to the problem of evaluating a few HMM models.
• This is an application of the Evaluation problem.
Word recognition example (4)
• We can construct a single HMM for all words.
• Hidden states = all characters in the alphabet.
• Transition probabilities and initial probabilities are calculated from a language model.
• Observations and observation probabilities are as before.

[Diagram: a single HMM whose hidden states are alphabet characters (a, m, r, f, t, o, b, h, v, e, s, ...), with transitions between them]

• Here we have to determine the best sequence of hidden states, the one that most likely produced the word image.
• This is an application of the Decoding problem.
Character recognition with HMM example
• The structure of hidden states is chosen.
• Observations are feature vectors extracted from vertical slices.

[Figure: image of the character 'A' divided into vertical slices]

• Probabilistic mapping from hidden state to feature vectors:
  1. use a mixture of Gaussian models, or
  2. quantize the feature vector space.
Exercise: character recognition with HMM (1)
• The structure of hidden states: s1 -> s2 -> s3.
• Observation = number of islands in the vertical slice (1, 2, or 3).
• HMM for character 'A':

  Transition probabilities: {aij} =  ( .8  .2   0 )
                                     (  0  .8  .2 )
                                     (  0   0   1 )

  Observation probabilities: {bjk} = ( .9  .1   0 )
                                     ( .1  .8  .1 )
                                     ( .9  .1   0 )

• HMM for character 'B':

  Transition probabilities: {aij} =  ( .8  .2   0 )
                                     (  0  .8  .2 )
                                     (  0   0   1 )

  Observation probabilities: {bjk} = ( .9  .1   0 )
                                     (  0  .2  .8 )
                                     ( .6  .4   0 )
Exercise: character recognition with HMM (2)
• Suppose that after character image segmentation the following sequence of island numbers in 4 slices was observed: {1, 3, 2, 1}.
• Which HMM is more likely to have generated this observation sequence, the HMM for 'A' or the HMM for 'B'?
Exercise: character recognition with HMM (3)
Consider the likelihood of generating the given observation sequence for each possible sequence of hidden states:
• HMM for character 'A':

  Hidden state sequence   Transition probs   Observation probs
  s1 -> s1 -> s2 -> s3    .8 * .2 * .2     * .9 * 0 * .8 * .9   = 0
  s1 -> s2 -> s2 -> s3    .2 * .8 * .2     * .9 * .1 * .8 * .9  = 0.0020736
  s1 -> s2 -> s3 -> s3    .2 * .2 * 1      * .9 * .1 * .1 * .9  = 0.000324
                                             Total              = 0.0023976

• HMM for character 'B':

  Hidden state sequence   Transition probs   Observation probs
  s1 -> s1 -> s2 -> s3    .8 * .2 * .2     * .9 * 0 * .2 * .6   = 0
  s1 -> s2 -> s2 -> s3    .2 * .8 * .2     * .9 * .8 * .2 * .6  = 0.0027648
  s1 -> s2 -> s3 -> s3    .2 * .2 * 1      * .9 * .8 * .4 * .6  = 0.006912
                                             Total              = 0.0096768

Since 0.0096768 > 0.0023976, the HMM for 'B' is more likely to have generated {1, 3, 2, 1}.
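The two likelihood tables can be verified in code; as on the slide, only the three left-to-right paths from s1 to s3 are summed (a sketch; the state and path encodings are our own):

```python
# Left-to-right HMMs for characters 'A' and 'B'; observation = number of
# islands (1..3) in a slice, stored as b[state][islands - 1].
a = [[0.8, 0.2, 0.0],
     [0.0, 0.8, 0.2],
     [0.0, 0.0, 1.0]]                  # transitions, shared by 'A' and 'B'
b_A = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.9, 0.1, 0.0]]
b_B = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8], [0.6, 0.4, 0.0]]
paths = [(0, 0, 1, 2), (0, 1, 1, 2), (0, 1, 2, 2)]   # the three s1->...->s3 paths
obs = [1, 3, 2, 1]

def path_prob(b, path):
    p = b[path[0]][obs[0] - 1]         # the process starts in s1
    for t in range(1, len(path)):
        p *= a[path[t - 1]][path[t]] * b[path[t]][obs[t] - 1]
    return p

like_A = sum(path_prob(b_A, p) for p in paths)   # 0.0023976
like_B = sum(path_prob(b_B, p) for p in paths)   # 0.0096768, so 'B' wins
```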
Evaluation problem
• Evaluation problem. Given the HMM M=(A, B, π) and the observation sequence O=o1 o2 ... oK, calculate the probability that model M has generated sequence O.
• Trying to find the probability of the observations O=o1 o2 ... oK by considering all hidden state sequences (as was done in the example) is impractical: there are N^K hidden state sequences, i.e., exponential complexity.
• Define the forward variable αk(i) as the joint probability of the partial observation sequence o1 o2 ... ok and that the hidden state at time k is si:

  αk(i) = P(o1 o2 ... ok, qk = si)
Trellis representation of an HMM

[Diagram: trellis with hidden states s1 ... sN stacked vertically and time steps 1 ... K running horizontally; the observations o1 ... oK appear along the top, and every state si at time k connects to state sj at time k+1 by an edge with weight aij]
Forward recursion for HMM
• Initialization:
  α1(i) = P(o1, q1 = si) = πi bi(o1), 1 <= i <= N.
• Forward recursion:
  αk+1(j) = P(o1 o2 ... ok+1, qk+1 = sj)
          = Σi P(o1 o2 ... ok+1, qk = si, qk+1 = sj)
          = Σi P(o1 o2 ... ok, qk = si) aij bj(ok+1)
          = [Σi αk(i) aij] bj(ok+1), 1 <= j <= N, 1 <= k <= K-1.
• Termination:
  P(o1 o2 ... oK) = Σi P(o1 o2 ... oK, qK = si) = Σi αK(i)
• Complexity: N^2 K operations.
The Forward Algorithm
The core idea behind the Forward Algorithm is to compute, for each state and each time step, the probability of being in that state at that time jointly with all the observations up to that time. We then use these probabilities to compute the probability of the entire observed sequence.

Let's define some notation:


•α_t(i): The probability of being in state i at time t, having observed the first
t observations. This is called the forward probability.
•N: The number of hidden states in the HMM.
•T: The length of the observed sequence.
•A: The transition probability matrix (N x N).
•B: The emission probability matrix (N x M).
•π: The initial state probability vector (N x 1).
•O: The observed sequence O = {o_1, o_2, ..., o_T}.
Steps of the Algorithm
Initialization
For each state i (1 ≤ i ≤ N):
α_1(i) = π_i * b_i(o_1)
• This calculates the probability of starting in state i and observing the first
observation o_1.
Recursion
For each time step t (2 ≤ t ≤ T) and each state i (1 ≤ i ≤ N):
α_t(i) = [ ∑_(j=1)^N α_(t-1)(j) * a_ji ] * b_i(o_t)
This calculates the probability of being in state i at time t by summing over all possible previous
states j at time t-1.
We consider the probability of transitioning from state j to state i (a_ji), the probability of being in
state j at time t-1 (α_(t-1)(j)), and the probability of emitting observation o_t from state i (b_i(o_t)).
Termination
P(O | λ) = ∑_(i=1)^N α_T(i)
This sums the probabilities of being in any state at the final time T, giving us the
probability of the entire observed sequence.
The Umbrella HMM
Imagine a simple HMM where the hidden states represent whether it's
raining or not ('Rainy' and 'Sunny'), and the observations represent
whether someone is carrying an umbrella ('Umbrella' and 'No Umbrella').
Initial State Probabilities:
• π_Rainy = 0.5 (50% chance of starting in the 'Rainy' state)
• π_Sunny = 0.5 (50% chance of starting in the 'Sunny' state)

Transition Probabilities:
• a_RainyRainy = 0.7 (70% chance of staying 'Rainy' if it's already 'Rainy')
• a_RainySunny = 0.3 (30% chance of transitioning from 'Rainy' to 'Sunny')
• a_SunnyRainy = 0.4 (40% chance of transitioning from 'Sunny' to 'Rainy')
• a_SunnySunny = 0.6 (60% chance of staying 'Sunny' if it's already 'Sunny')

Emission Probabilities:
• b_RainyUmbrella = 0.9 (90% chance of carrying an umbrella if it's 'Rainy')
• b_RainyNoUmbrella = 0.1 (10% chance of not carrying an umbrella if it's 'Rainy')
• b_SunnyUmbrella = 0.2 (20% chance of carrying an umbrella if it's 'Sunny')
• b_SunnyNoUmbrella = 0.8 (80% chance of not carrying an umbrella if it's 'Sunny')
Let's say we observe the following sequence over three days:
• O = {Umbrella, No Umbrella, Umbrella}

Applying the Forward Algorithm

1. Initialization (t = 1)
• α_1(Rainy) = π_Rainy * b_RainyUmbrella = 0.5 * 0.9 = 0.45
• α_1(Sunny) = π_Sunny * b_SunnyUmbrella = 0.5 * 0.2 = 0.1
2. Recursion
• t = 2
• α_2(Rainy)
◦ = [α_1(Rainy) * a_RainyRainy + α_1(Sunny) * a_SunnyRainy] *b_RainyNoUmbrella
◦ = [(0.45 * 0.7) + (0.1 * 0.4)] * 0.1 = 0.0355
• α_2(Sunny)
◦ = [α_1(Rainy) * a_RainySunny + α_1(Sunny) * a_SunnySunny] * b_SunnyNoUmbrella
◦ = [(0.45 * 0.3) + (0.1 * 0.6)] * 0.8 = 0.156

• t = 3
• α_3(Rainy)
◦ = [α_2(Rainy) * a_RainyRainy + α_2(Sunny) * a_SunnyRainy] * b_RainyUmbrella
◦ = [(0.0355 * 0.7) + (0.156 * 0.4)] * 0.9 = 0.078525
• α_3(Sunny)
◦ = [α_2(Rainy) * a_RainySunny + α_2(Sunny) * a_SunnySunny] * b_SunnyUmbrella
◦ = [(0.0355 * 0.3) + (0.156 * 0.6)] * 0.2 = 0.02085

3. Termination
P(O | λ) = α_3(Rainy) + α_3(Sunny) = 0.078525 + 0.02085 = 0.099375
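The three-day computation above can be reproduced with a direct implementation of the forward recursion (a sketch; states are indexed 0 = Rainy, 1 = Sunny and observations 0 = Umbrella, 1 = No Umbrella):

```python
# Umbrella HMM parameters from the example, stored row-wise:
# A[i][j] = P(state j tomorrow | state i today), B[i][o] = P(obs o | state i).
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]

def forward(obs):
    """P(O | model) via the forward recursion."""
    alpha = [pi[i] * B[i][obs[0]] for i in range(len(pi))]
    for o in obs[1:]:
        alpha = [sum(alpha[j] * A[j][i] for j in range(len(alpha))) * B[i][o]
                 for i in range(len(alpha))]
    return sum(alpha)

p = forward([0, 1, 0])   # {Umbrella, No Umbrella, Umbrella}
# alpha_3 = (0.078525, 0.02085), so p = 0.099375
```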
Backward recursion for HMM
• Define the backward variable βk(i) as the probability of the partial observation sequence ok+1 ok+2 ... oK given that the hidden state at time k is si:
  βk(i) = P(ok+1 ok+2 ... oK | qk = si)
• Initialization:
  βK(i) = 1, 1 <= i <= N.
• Backward recursion:
  βk(j) = P(ok+1 ok+2 ... oK | qk = sj)
        = Σi P(ok+1 ok+2 ... oK, qk+1 = si | qk = sj)
        = Σi P(ok+2 ok+3 ... oK | qk+1 = si) aji bi(ok+1)
        = Σi βk+1(i) aji bi(ok+1), 1 <= j <= N, 1 <= k <= K-1.
• Termination:
  P(o1 o2 ... oK) = Σi P(o1 o2 ... oK, q1 = si)
                  = Σi P(o1 o2 ... oK | q1 = si) P(q1 = si) = Σi β1(i) bi(o1) πi
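As a consistency check, the backward recursion must give the same sequence probability as the forward recursion; a sketch on the same umbrella HMM (0 = Rainy/Umbrella, 1 = Sunny/No Umbrella):

```python
# Same umbrella HMM, row-wise: A[j][i] = P(next i | current j).
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]

def backward(obs):
    """P(O | model) via the backward recursion and its termination formula."""
    n = len(pi)
    beta = [1.0] * n                       # beta_K(i) = 1
    for o in reversed(obs[1:]):            # fold in o_K, ..., o_2
        beta = [sum(A[j][i] * B[i][o] * beta[i] for i in range(n))
                for j in range(n)]
    # Termination: sum_i beta_1(i) * b_i(o_1) * pi_i.
    return sum(beta[i] * B[i][obs[0]] * pi[i] for i in range(n))

p = backward([0, 1, 0])   # 0.099375, same as the forward algorithm
```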
Decoding problem
• Decoding problem. Given the HMM M=(A, B, π) and the observation sequence O=o1 o2 ... oK, calculate the most likely sequence of hidden states si that produced this observation sequence.
• We want to find the state sequence Q = q1 ... qK that maximizes P(Q | o1 o2 ... oK), or equivalently P(Q, o1 o2 ... oK).
• Brute-force consideration of all paths takes exponential time. Use the efficient Viterbi algorithm instead.
• Define the variable δk(i) as the maximum probability of producing the observation sequence o1 o2 ... ok when moving along any hidden state sequence q1 ... qk-1 and getting into qk = si:

  δk(i) = max P(q1 ... qk-1, qk = si, o1 o2 ... ok),

  where the max is taken over all possible paths q1 ... qk-1.
Viterbi algorithm (1)
• General idea: if the best path ending in qk = sj goes through qk-1 = si, then it should coincide with the best path ending in qk-1 = si.

[Diagram: states s1, ..., si, ..., sN at time k-1, each connected to sj at time k by an edge with weight aij]

• δk(j) = max P(q1 ... qk-1, qk = sj, o1 o2 ... ok)
        = maxi [aij bj(ok) max P(q1 ... qk-1 = si, o1 o2 ... ok-1)]
• To backtrack the best path, keep the information that the predecessor of sj was si.
Viterbi algorithm (2)
• Initialization:
  δ1(i) = max P(q1 = si, o1) = πi bi(o1), 1 <= i <= N.
• Forward recursion:
  δk(j) = max P(q1 ... qk-1, qk = sj, o1 o2 ... ok)
        = maxi [aij bj(ok) max P(q1 ... qk-1 = si, o1 o2 ... ok-1)]
        = maxi [aij bj(ok) δk-1(i)], 1 <= j <= N, 2 <= k <= K.
• Termination: choose the best path ending at time K: maxi [δK(i)].
• Backtrack the best path.
• This algorithm is similar to the forward recursion of the evaluation problem, with Σ replaced by max and additional backtracking.
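A compact Viterbi implementation, run on the umbrella HMM from the forward-algorithm example (a sketch; the backpointer bookkeeping is the "additional backtracking" mentioned above):

```python
# Umbrella HMM: states 0 = Rainy, 1 = Sunny; obs 0 = Umbrella, 1 = No Umbrella.
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
names = ["Rainy", "Sunny"]

def viterbi(obs):
    """Most likely hidden state path and its joint probability with obs."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    back = []                              # back[t][j] = best predecessor of j
    for o in obs[1:]:
        pred = [max(range(n), key=lambda i: delta[i] * A[i][j])
                for j in range(n)]
        delta = [delta[pred[j]] * A[pred[j]][j] * B[j][o] for j in range(n)]
        back.append(pred)
    best = max(range(n), key=lambda i: delta[i])
    path = [best]
    for pred in reversed(back):            # follow backpointers to time 1
        path.append(pred[path[-1]])
    return [names[i] for i in reversed(path)], delta[best]

path, p = viterbi([0, 1, 0])
# path = ['Rainy', 'Sunny', 'Rainy'], p = 0.03888
```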
Learning problem (1)
• Learning problem. Given some training observation sequences O=o1 o2 ... oK and the general structure of the HMM (numbers of hidden and visible states), determine the HMM parameters M=(A, B, π) that best fit the training data, that is, maximize P(O | M).
• There is no known algorithm producing globally optimal parameter values.
• Use an iterative expectation-maximization algorithm to find a local maximum of P(O | M): the Baum-Welch algorithm.
Learning problem (2)
• If the training data contains information about the sequence of hidden states (as in the word recognition example), then use maximum likelihood estimation of the parameters:

  aij = P(si | sj) = (Number of transitions from state sj to state si) / (Number of transitions out of state sj)

  bi(vm) = P(vm | si) = (Number of times observation vm occurs in state si) / (Number of times in state si)
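When the hidden state sequence is observed in training, these counts take only a few lines of code (a sketch with made-up toy data; we use the row convention a[i][j] = P(next = j | current = i)):

```python
from collections import Counter

def ml_estimate(states, obs, n_states, n_obs):
    """Count-based ML estimates of transition and emission probabilities."""
    trans = Counter(zip(states, states[1:]))   # (from, to) transition counts
    emit = Counter(zip(states, obs))           # (state, observation) counts
    out = Counter(states[:-1])                 # transitions out of each state
    occ = Counter(states)                      # times in each state
    a = [[trans[(i, j)] / out[i] if out[i] else 0.0 for j in range(n_states)]
         for i in range(n_states)]
    b = [[emit[(i, m)] / occ[i] if occ[i] else 0.0 for m in range(n_obs)]
         for i in range(n_states)]
    return a, b

# Hypothetical labelled training sequence:
states = [0, 0, 1, 1, 0]
obs    = [0, 1, 0, 1, 1]
a, b = ml_estimate(states, obs, 2, 2)
# a[0][0] = 1/2 (one of two transitions out of state 0 stays in state 0)
# b[0][1] = 2/3 (state 0 is visited 3 times, emitting observation 1 twice)
```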
Baum-Welch algorithm
General idea:

  aij = P(si | sj) = (Expected number of transitions from state sj to state si) / (Expected number of transitions out of state sj)

  bi(vm) = P(vm | si) = (Expected number of times observation vm occurs in state si) / (Expected number of times in state si)

  πi = P(si) = (Expected frequency in state si at time k = 1)
Baum-Welch algorithm: expectation step (1)
• Define the variable ξk(i,j) as the probability of being in state si at time k and in state sj at time k+1, given the observation sequence o1 o2 ... oK:

  ξk(i,j) = P(qk = si, qk+1 = sj | o1 o2 ... oK)

  ξk(i,j) = P(qk = si, qk+1 = sj, o1 o2 ... oK) / P(o1 o2 ... oK)
          = P(qk = si, o1 o2 ... ok) aij bj(ok+1) P(ok+2 ... oK | qk+1 = sj) / P(o1 o2 ... oK)
          = αk(i) aij bj(ok+1) βk+1(j) / [Σi Σj αk(i) aij bj(ok+1) βk+1(j)]
Baum-Welch algorithm: expectation step (2)
• Define the variable γk(i) as the probability of being in state si at time k, given the observation sequence o1 o2 ... oK:

  γk(i) = P(qk = si | o1 o2 ... oK)

  γk(i) = P(qk = si, o1 o2 ... oK) / P(o1 o2 ... oK) = αk(i) βk(i) / [Σi αk(i) βk(i)]
Baum-Welch algorithm: expectation step (3)
• We calculated ξk(i,j) = P(qk = si, qk+1 = sj | o1 o2 ... oK) and γk(i) = P(qk = si | o1 o2 ... oK).
• Expected number of transitions from state si to state sj = Σk ξk(i,j)
• Expected number of transitions out of state si = Σk γk(i)
• Expected number of times observation vm occurs in state si = Σk γk(i), where k is such that ok = vm
Baum-Welch algorithm: maximization step

  aij = (Expected number of transitions from state si to state sj) / (Expected number of transitions out of state si) = Σk ξk(i,j) / Σk γk(i)

  bi(vm) = (Expected number of times observation vm occurs in state si) / (Expected number of times in state si) = Σ{k: ok = vm} γk(i) / Σk γk(i)

  πi = (Expected frequency in state si at time k = 1) = γ1(i)
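One full E-step and M-step on the umbrella HMM, written out with the formulas above (a sketch; the checks at the end only verify that γ, ξ and the re-estimated parameters are properly normalized):

```python
# Umbrella HMM (0 = Rainy, 1 = Sunny; obs 0 = Umbrella, 1 = No Umbrella).
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
obs = [0, 1, 0]
N, K = 2, len(obs)

# Forward and backward variables.
alpha = [[0.0] * N for _ in range(K)]
beta = [[1.0] * N for _ in range(K)]
alpha[0] = [pi[i] * B[i][obs[0]] for i in range(N)]
for k in range(1, K):
    alpha[k] = [sum(alpha[k - 1][j] * A[j][i] for j in range(N)) * B[i][obs[k]]
                for i in range(N)]
for k in range(K - 2, -1, -1):
    beta[k] = [sum(A[j][i] * B[i][obs[k + 1]] * beta[k + 1][i] for i in range(N))
               for j in range(N)]
p_obs = sum(alpha[K - 1])

# E-step: gamma_k(i) and xi_k(i, j).
gamma = [[alpha[k][i] * beta[k][i] / p_obs for i in range(N)] for k in range(K)]
xi = [[[alpha[k][i] * A[i][j] * B[j][obs[k + 1]] * beta[k + 1][j] / p_obs
        for j in range(N)] for i in range(N)] for k in range(K - 1)]

# M-step: re-estimated parameters.
new_pi = gamma[0]
new_A = [[sum(xi[k][i][j] for k in range(K - 1)) /
          sum(gamma[k][i] for k in range(K - 1)) for j in range(N)]
         for i in range(N)]
new_B = [[sum(gamma[k][i] for k in range(K) if obs[k] == m) /
          sum(gamma[k][i] for k in range(K)) for m in range(2)]
         for i in range(N)]
```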
Thank you

16-Nov-23
