2024 Fall CSE366 12 HMM
Hidden Markov Models
DR. RAIHAN UL ISLAM
A S S O C I AT E P R O F E S S O R
D E PA R T M E N T O F C O M P U T E R S C I E N C E & E N G I N E E R I N G
ROOM NO# 256
EMAIL: [email protected]
MOBILE: +8801992392611
Markov Models
• In probability theory, a Markov model is a stochastic
model used to model randomly changing systems.
• It is assumed that future states depend only on the
current state, not on the events that occurred before it
(that is, it assumes the Markov property).
• Generally, this assumption enables reasoning and
computation with the model that would otherwise
be intractable.
• In the fields of predictive modelling and probabilistic
forecasting, it is desirable for a given model to exhibit
the Markov property.
Markov Models
• Set of states: {s1, s2, …, sN}
• The process moves from one state to another, generating
a sequence of states: si1, si2, …, sik, …
• Markov chain property: the probability of each subsequent
state depends only on the previous state:
P(sik | si1, si2, …, sik-1) = P(sik | sik-1)
• To define a Markov model, the following probabilities have to be
specified: transition probabilities aij = P(sj | si) and
initial probabilities πi = P(si).
Markov models
• Autonomous system, fully observable state: Markov chain
• Autonomous system, partially observable state: Hidden Markov model
• Controlled system, fully observable state: Markov decision process
• Controlled system, partially observable state: Partially observable Markov decision process
Example of Markov Model
[State diagram: two states ‘Rain’ and ‘Dry’ with transitions
P(‘Rain’|‘Rain’)=0.3, P(‘Dry’|‘Rain’)=0.7, P(‘Rain’|‘Dry’)=0.2, P(‘Dry’|‘Dry’)=0.8.]
[State diagram: hidden states ‘Low’ and ‘High’ emitting observations
‘Rain’ and ‘Dry’; probabilities as specified in the following example.]
Example of Hidden Markov Model
• Two hidden states: ‘Low’ and ‘High’ atmospheric pressure.
• Two observations: ‘Rain’ and ‘Dry’.
• Transition probabilities:
P(‘Low’|‘Low’)=0.3, P(‘High’|‘Low’)=0.7,
P(‘Low’|‘High’)=0.2, P(‘High’|‘High’)=0.8
• Observation probabilities:
P(‘Rain’|‘Low’)=0.6, P(‘Dry’|‘Low’)=0.4,
P(‘Rain’|‘High’)=0.4, P(‘Dry’|‘High’)=0.6.
• P({‘Dry’,‘Rain’}) =
P({‘Dry’,‘Rain’}, {‘Low’,‘Low’}) +
P({‘Dry’,‘Rain’}, {‘Low’,‘High’}) +
P({‘Dry’,‘Rain’}, {‘High’,‘Low’}) +
P({‘Dry’,‘Rain’}, {‘High’,‘High’})
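The decomposition above can be sketched by brute-force enumeration of all hidden state sequences. The transition and emission values come from the slide (with P(‘Dry’|‘High’) taken as 0.6 so each state's emission probabilities sum to 1); the initial probabilities pi are an assumption, since the slide does not state them.

```python
from itertools import product

states = ["Low", "High"]
pi = {"Low": 0.4, "High": 0.6}                      # assumed initial probabilities
a = {("Low", "Low"): 0.3, ("Low", "High"): 0.7,     # transition probabilities
     ("High", "Low"): 0.2, ("High", "High"): 0.8}
b = {("Low", "Rain"): 0.6, ("Low", "Dry"): 0.4,     # emission probabilities
     ("High", "Rain"): 0.4, ("High", "Dry"): 0.6}

obs = ["Dry", "Rain"]
total = 0.0
for q in product(states, repeat=len(obs)):          # all 4 hidden state sequences
    p = pi[q[0]] * b[(q[0], obs[0])]
    for t in range(1, len(obs)):
        p *= a[(q[t - 1], q[t])] * b[(q[t], obs[t])]
    total += p                                      # sum the joint probabilities
print(total)
```

This enumeration costs O(N^K) and is only feasible for tiny examples; the forward recursion later in the deck computes the same quantity in O(N^2 K).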
Word recognition example (2)
• Hidden states of HMM = characters.
[Diagram: the words ‘Amherst’ (a m h e r s t) and ‘Buffalo’ (b u f f a l o) as
sequences of character states, with a trellis of candidate characters for each
observed letter image.]
• Probabilistic mapping from hidden state to feature
vectors:
1. Use a mixture of Gaussians model, or
2. Quantize the feature vector space.
Exercise: character recognition with HMM (1)
• The structure of hidden states: s1 → s2 → s3 (left-to-right).
• Observation = number of islands in the vertical slice.
•HMM for character ‘A’ :
.8 .2
0
Transition probabilities: {aij}= 0 .8 .2
0
.9 .1
0
A
Observation probabilities: {bjk}= 0 .1 .8 .1
.9 .1
1
• HMM for character ‘B’:
Transition probabilities:
{aij} =  .8  .2   0
          0  .8  .2
          0   0   1
Observation probabilities:
{bjk} =  .9  .1   0   0
          0   0  .2  .8
         .9  .1   0   0
Exercise: character recognition with HMM (2)
• Suppose that after character image segmentation the following
sequence of island numbers in 4 slices was observed:
{ 1, 3, 2, 1}
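The exercise can be sketched by scoring the slice sequence { 1, 3, 2, 1} against both character models with the forward algorithm. The matrices below are one reading of the slide's {aij}/{bjk} values; starting always in s1 (pi = (1, 0, 0)) and treating the bjk columns as 1–4 islands are added assumptions.

```python
import numpy as np

trans = np.array([[.8, .2, 0],                    # shared left-to-right structure
                  [0, .8, .2],
                  [0,  0,  1]])
b_A = np.array([[.9, .1, 0, 0],                   # emissions for model 'A'
                [0, .1, .8, .1],
                [.9, .1, 0, 0]])
b_B = np.array([[.9, .1, 0, 0],                   # emissions for model 'B'
                [0,  0, .2, .8],
                [.9, .1, 0, 0]])
pi = np.array([1.0, 0.0, 0.0])                    # assumed: always start in s1

def forward_likelihood(emit, obs):
    alpha = pi * emit[:, obs[0]]                  # alpha_1(i) = pi_i b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]      # [sum_i alpha(i) a_ij] b_j(o)
    return alpha.sum()                            # P(o_1 ... o_K)

obs = [0, 2, 1, 0]                                # islands {1, 3, 2, 1}, zero-indexed
pA = forward_likelihood(b_A, obs)
pB = forward_likelihood(b_B, obs)
print(pA, pB)                                     # the larger likelihood classifies the image
```

Under these assumptions the ‘A’ model scores higher, as expected: its middle state strongly favors the 3-island slice.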
[Trellis diagram: states s1 … sN unrolled over time steps 1 … K; arrows between
columns k and k+1 carry the transition probabilities a1j, a2j, …, aij, …, aNj.]
Forward recursion for HMM
• Define the forward variable αk(i) as the joint probability of the partial
observation sequence o1 o2 … ok and hidden state si at time k:
αk(i) = P(o1 o2 … ok, qk = si).
• Initialization:
α1(i) = P(o1, q1 = si) = πi bi(o1), 1 <= i <= N.
• Forward recursion:
αk+1(j) = P(o1 o2 … ok+1, qk+1 = sj)
= Σi P(o1 o2 … ok+1, qk = si, qk+1 = sj)
= Σi P(o1 o2 … ok, qk = si) aij bj(ok+1)
= [Σi αk(i) aij] bj(ok+1), 1 <= j <= N, 1 <= k <= K-1.
• Termination:
P(o1 o2 … oK) = Σi P(o1 o2 … oK, qK = si) = Σi αK(i).
• Complexity: N²K operations.
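The recursion above can be transcribed directly into a short function; this is a plain-Python sketch assuming pi, a, b are given as lists indexed by states 0 … N-1 and integer observation symbols.

```python
def forward(pi, a, b, obs):
    """Return P(o_1 ... o_K) for an HMM with pi[i], a[i][j], b[j][o]."""
    N = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * b[i][obs[0]] for i in range(N)]
    # Recursion: alpha_{k+1}(j) = [sum_i alpha_k(i) * a_ij] * b_j(o_{k+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * a[i][j] for i in range(N)) * b[j][o]
                 for j in range(N)]
    # Termination: P(o_1 ... o_K) = sum_i alpha_K(i)
    return sum(alpha)
```

Each time step costs N² multiply-adds, giving the N²K total noted above.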
The Forward Algorithm
The core idea behind the Forward Algorithm is to compute, for each time step,
the joint probability of the observations seen so far and being in a particular
state at that time. Summing these probabilities over the states at the final
step gives the probability of the entire observed sequence.
Transition Probabilities:
• a_RainyRainy = 0.7 (70% chance of staying 'Rainy' if it's already 'Rainy')
• a_RainySunny = 0.3 (30% chance of transitioning from 'Rainy' to 'Sunny')
• a_SunnyRainy = 0.4 (40% chance of transitioning from 'Sunny' to 'Rainy')
• a_SunnySunny = 0.6 (60% chance of staying 'Sunny' if it's already 'Sunny’)
Emission Probabilities:
• b_RainyUmbrella = 0.9 (90% chance of carrying an umbrella if it's 'Rainy')
• b_RainyNoUmbrella = 0.1 (10% chance of not carrying an umbrella if it's 'Rainy')
• b_SunnyUmbrella = 0.2 (20% chance of carrying an umbrella if it's 'Sunny')
• b_SunnyNoUmbrella = 0.8 (80% chance of not carrying an umbrella if it's 'Sunny')
Initial Probabilities:
• π_Rainy = 0.5, π_Sunny = 0.5
Let's say we observe the following sequence over three days:
• O = {Umbrella, No Umbrella, Umbrella}
1. Initialization (t = 1)
• α_1(Rainy) = π_Rainy * b_RainyUmbrella = 0.5 * 0.9 = 0.45
• α_1(Sunny) = π_Sunny * b_SunnyUmbrella = 0.5 * 0.2 = 0.1
2. Recursion
• t = 2
• α_2(Rainy)
◦ = [α_1(Rainy) * a_RainyRainy + α_1(Sunny) * a_SunnyRainy] *b_RainyNoUmbrella
◦ = [(0.45 * 0.7) + (0.1 * 0.4)] * 0.1 = 0.0355
• α_2(Sunny)
◦ = [α_1(Rainy) * a_RainySunny + α_1(Sunny) * a_SunnySunny] * b_SunnyNoUmbrella
◦ = [(0.45 * 0.3) + (0.1 * 0.6)] * 0.8 = 0.156
• t = 3
• α_3(Rainy)
◦ = [α_2(Rainy) * a_RainyRainy + α_2(Sunny) * a_SunnyRainy] * b_RainyUmbrella
◦ = [(0.0355 * 0.7) + (0.156 * 0.4)] * 0.9 = 0.078525
• α_3(Sunny)
◦ = [α_2(Rainy) * a_RainySunny + α_2(Sunny) * a_SunnySunny] * b_SunnyUmbrella
◦ = [(0.0355 * 0.3) + (0.156 * 0.6)] * 0.2 = 0.02085
3. Termination
• P(O) = α_3(Rainy) + α_3(Sunny) = 0.078525 + 0.02085 = 0.099375
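The three-day umbrella example can be reproduced numerically step by step, using π_Rainy = π_Sunny = 0.5 as in the initialization step:

```python
# R = Rainy, S = Sunny; U = Umbrella, N = No Umbrella
a = {("R", "R"): 0.7, ("R", "S"): 0.3, ("S", "R"): 0.4, ("S", "S"): 0.6}
b = {("R", "U"): 0.9, ("R", "N"): 0.1, ("S", "U"): 0.2, ("S", "N"): 0.8}
pi = {"R": 0.5, "S": 0.5}

obs = ["U", "N", "U"]                               # Umbrella, No Umbrella, Umbrella
alpha = {s: pi[s] * b[(s, obs[0])] for s in "RS"}   # t = 1: alpha = {R: 0.45, S: 0.1}
for o in obs[1:]:                                   # t = 2, 3
    alpha = {s: sum(alpha[p] * a[(p, s)] for p in "RS") * b[(s, o)]
             for s in "RS"}
# After the loop: alpha_3(R), alpha_3(S); their sum is P(O)
print(alpha["R"], alpha["S"], alpha["R"] + alpha["S"])
```

Running the recursion reproduces α_2 = (0.0355, 0.156) and sums the final column for the termination step.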
Backward recursion for HMM
• Define the backward variable βk(i) as the probability of the partial
observation sequence ok+1 ok+2 … oK given that the hidden state at time k
is si: βk(i) = P(ok+1 ok+2 … oK | qk = si).
• Initialization:
βK(i) = 1, 1 <= i <= N.
• Backward recursion:
βk(j) = P(ok+1 ok+2 … oK | qk = sj)
= Σi P(ok+1 ok+2 … oK, qk+1 = si | qk = sj)
= Σi P(ok+2 ok+3 … oK | qk+1 = si) aji bi(ok+1)
= Σi βk+1(i) aji bi(ok+1), 1 <= j <= N, 1 <= k <= K-1.
• Termination:
P(o1 o2 … oK) = Σi P(o1 o2 … oK, q1 = si)
= Σi P(o1 o2 … oK | q1 = si) P(q1 = si) = Σi β1(i) bi(o1) πi.
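The backward recursion can be checked against the forward result on the same umbrella example (states R/S, observations U/N, π = (0.5, 0.5)); a minimal sketch:

```python
a = {("R", "R"): 0.7, ("R", "S"): 0.3, ("S", "R"): 0.4, ("S", "S"): 0.6}
b = {("R", "U"): 0.9, ("R", "N"): 0.1, ("S", "U"): 0.2, ("S", "N"): 0.8}
pi = {"R": 0.5, "S": 0.5}
obs = ["U", "N", "U"]

beta = {s: 1.0 for s in "RS"}                       # initialization: beta_K(i) = 1
for o in reversed(obs[1:]):                         # k = K-1, ..., 1
    # beta_k(j) = sum_i beta_{k+1}(i) * a_ji * b_i(o_{k+1})
    beta = {j: sum(beta[i] * a[(j, i)] * b[(i, o)] for i in "RS")
            for j in "RS"}
# Termination: P(O) = sum_i beta_1(i) * b_i(o_1) * pi_i
p_obs = sum(beta[i] * b[(i, obs[0])] * pi[i] for i in "RS")
print(p_obs)
```

The result agrees with the forward pass on the same sequence, as the two terminations must.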
Decoding problem
• Decoding problem: given the HMM M = (A, B, π) and the observation
sequence O = o1 o2 … oK, calculate the most likely sequence of hidden
states si that produced this observation sequence.
• We want to find the state sequence Q= q1…qK which maximizes
P(Q | o1 o2 ... oK ) , or equivalently P(Q , o1 o2 ... oK ) .
•Brute force consideration of all paths takes exponential time. Use
efficient Viterbi algorithm instead.
• Define the variable δk(i) as the maximum probability of producing the
observation sequence o1 o2 … ok when moving along any hidden
state sequence q1 … qk-1 and getting into qk = si:
δk(i) = max P(q1 … qk-1, qk = si, o1 o2 … ok),
where the max is taken over all possible paths q1 … qk-1.
Viterbi algorithm (1)
• General idea:
if the best path ending in qk = sj goes through qk-1 = si, then it
should coincide with the best path ending in qk-1 = si.
[Diagram: states s1 … si … sN at time k-1, with arrows a1j, …, aij, …, aNj
into state sj at time k.]
• δk(j) = max P(q1 … qk-1, qk = sj, o1 o2 … ok)
= maxi [aij bj(ok) · max P(q1 … qk-1 = si, o1 o2 … ok-1)]
= maxi [aij bj(ok) δk-1(i)]
• To backtrack the best path, keep the information that the predecessor of sj was si.
Viterbi algorithm (2)
• Initialization:
δ1(i) = max P(q1 = si, o1) = πi bi(o1), 1 <= i <= N.
• Forward recursion:
δk(j) = max P(q1 … qk-1, qk = sj, o1 o2 … ok)
= maxi [aij bj(ok) · max P(q1 … qk-1 = si, o1 o2 … ok-1)]
= maxi [aij bj(ok) δk-1(i)], 1 <= j <= N, 2 <= k <= K.
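Applied to the earlier umbrella example (π = (0.5, 0.5)), the Viterbi recursion with backpointers looks like this; backpointers record each state's best predecessor so the optimal path can be recovered at the end:

```python
# R = Rainy, S = Sunny; U = Umbrella, N = No Umbrella
a = {("R", "R"): 0.7, ("R", "S"): 0.3, ("S", "R"): 0.4, ("S", "S"): 0.6}
b = {("R", "U"): 0.9, ("R", "N"): 0.1, ("S", "U"): 0.2, ("S", "N"): 0.8}
pi = {"R": 0.5, "S": 0.5}
obs = ["U", "N", "U"]

delta = {s: pi[s] * b[(s, obs[0])] for s in "RS"}   # delta_1(i) = pi_i b_i(o_1)
back = []                                           # backpointers per time step
for o in obs[1:]:
    # best predecessor i for each state j: argmax_i delta(i) * a_ij
    prev = {j: max("RS", key=lambda i: delta[i] * a[(i, j)]) for j in "RS"}
    delta = {j: delta[prev[j]] * a[(prev[j], j)] * b[(j, o)] for j in "RS"}
    back.append(prev)

# Backtrack from the best final state
q = max("RS", key=lambda s: delta[s])
path = [q]
for prev in reversed(back):
    q = prev[q]
    path.append(q)
print("".join(reversed(path)))                      # prints "RSR"
```

The most likely weather sequence for {Umbrella, No Umbrella, Umbrella} is Rainy, Sunny, Rainy. Note that maximizing each day independently need not give the same answer as maximizing the whole path, which is why the backpointers are needed.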
• Re-estimation of observation probabilities:
bi(vm) = P(vm | si)
= (expected number of times observation vm occurs in state si)
  / (expected number of times in state si)
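In the supervised case, where the hidden states are actually observed in the training data, the expectations in this formula reduce to simple counts. A minimal sketch with illustrative (state, observation) training pairs, not taken from the slides:

```python
from collections import Counter

# Count how often observation v_m occurs while in state s_i,
# divided by how often s_i occurs at all.
pairs = [("R", "U"), ("R", "U"), ("R", "N"),
         ("S", "N"), ("S", "N"), ("S", "U")]

state_counts = Counter(s for s, _ in pairs)         # times in state s_i
pair_counts = Counter(pairs)                        # times v_m seen in s_i
b = {(s, v): pair_counts[(s, v)] / state_counts[s] for s, v in pair_counts}
print(b)                                            # e.g. b[("R","U")] = 2/3
```

When the states are hidden, the Baum-Welch procedure replaces these hard counts with expected counts computed from the forward and backward variables.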