24f 09 Hidden Markov Models
Reasoning over Time or Space
– Often, we want to reason about a sequence of observations where the
state of the underlying system is changing
● Speech recognition
● Robot localization
● User attention
● Medical monitoring
Markov Models (aka Markov chain/process)
[Figure: chain X0 → X1 → X2 → X3 → …]
Quiz: are Markov models a special case of Bayes nets?
– Yes and no!
– Yes:
● Directed acyclic graph, joint = product of conditionals
– No:
● Infinitely many variables (unless we truncate)
Example: Random walk in one dimension
[Figure: states −4 … 4 on a number line]
Example: n-gram models
We call ourselves Homo sapiens—man the wise—because our intelligence is so important to us.
For thousands of years, we have tried to understand how we think; that is, how a mere handful of matter can
perceive, understand, predict, and manipulate a world far larger and more complicated than itself. ….
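An n-gram model treats each word as a state whose distribution depends on the previous n−1 words. A minimal sketch, assuming a bigram (n = 2) model estimated by counting, with the passage above as a toy corpus:

```python
from collections import Counter, defaultdict

corpus = ("we call ourselves homo sapiens man the wise because our "
          "intelligence is so important to us").split()

# Count bigram transitions for P(next word | current word).
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def bigram_prob(prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev)."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

print(bigram_prob("the", "wise"))  # "the" occurs once, followed by "wise": 1.0
```

Real n-gram models use far larger corpora and smoothing; the Markov structure is the same.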
Example: Weather
– States {rain, sun}
▪ Initial distribution: P(X0) = <sun: 0.5, rain: 0.5>
▪ Transition model P(Xt | Xt-1), shown two new ways (as a table and as a state diagram):

  Xt-1   P(Xt = sun)   P(Xt = rain)
  sun    0.9           0.1
  rain   0.3           0.7

[Figure: two-state diagram, sun and rain, with the same transition probabilities on the arrows]
Weather prediction
– Time 1: P(X1) = <0.6, 0.4>
Weather prediction
– Time 2: P(X2) = <0.66, 0.34>
Forward algorithm (simple form)
– P(Xt+1) = Σxt P(Xt+1 | xt) P(xt)
  ● P(xt): probability from previous iteration
  ● P(Xt+1 | xt): transition model
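The simple-form update can be sketched in a few lines of Python, assuming the weather chain's numbers from the slides (P(sun|sun) = 0.9, P(sun|rain) = 0.3, initial <0.5, 0.5>):

```python
# Transition model: T[prev][nxt] = P(X_{t+1} = nxt | X_t = prev)
T = {"sun": {"sun": 0.9, "rain": 0.1},
     "rain": {"sun": 0.3, "rain": 0.7}}

def forward_step(p):
    """One mini-forward step: P(X_{t+1}) = sum_x P(X_{t+1} | x) P(X_t = x)."""
    return {nxt: sum(p[prev] * T[prev][nxt] for prev in p)
            for nxt in ("sun", "rain")}

p = {"sun": 0.5, "rain": 0.5}   # initial distribution P(X0)
p = forward_step(p)             # time 1: <0.6, 0.4>
p = forward_step(p)             # time 2: <0.66, 0.34>
```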
And the same thing in linear algebra form
● What is the weather like at time 2?
  ○ P(X2) = 0.6 <0.9, 0.1> + 0.4 <0.3, 0.7> = <0.66, 0.34>
● In matrix-vector form:

  ( 0.9  0.3 ) ( 0.6 )   ( 0.66 )
  ( 0.1  0.7 ) ( 0.4 ) = ( 0.34 )

  where the columns of the matrix are the rows of the transition table P(Xt | Xt-1):

  Xt-1   P(Xt = sun)   P(Xt = rain)
  sun    0.9           0.1
  rain   0.3           0.7
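The same computation in NumPy, as a sketch; the matrix below is the transpose of the CPT table, so that column j holds P(Xt | Xt-1 = j):

```python
import numpy as np

# Rows index the next state (sun, rain); columns index the previous state.
T_mat = np.array([[0.9, 0.3],
                  [0.1, 0.7]])

p1 = np.array([0.6, 0.4])   # P(X1)
p2 = T_mat @ p1             # P(X2) = <0.66, 0.34>
```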
Stationary Distributions
– The limiting distribution is called the stationary distribution P∞ of the chain
– It satisfies P∞ = P∞+1 = Tᵀ P∞
– Solving for P∞ in the example:

  ( 0.9  0.3 ) (  p  )   (  p  )
  ( 0.1  0.7 ) ( 1−p ) = ( 1−p )

  0.9p + 0.3(1−p) = p  ⇒  0.3 = 0.4p  ⇒  p = 0.75
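The fixed point can also be found numerically by power iteration, i.e. simply running the forward update until it stops changing; a sketch with the example's matrix:

```python
import numpy as np

# Column-stochastic transition matrix (column j = P(next | prev = j)).
T_mat = np.array([[0.9, 0.3],
                  [0.1, 0.7]])

# Power iteration: repeated forward updates converge to the stationary
# distribution because the second eigenvalue (0.6 here) has magnitude < 1.
p = np.array([0.5, 0.5])
for _ in range(1000):
    p = T_mat @ p
# p is now <0.75, 0.25>, matching the algebraic solution p = 0.75.
```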
Example Run of Mini-Forward Algorithm
▪ From initial observation of sun
[Figure: sequence of distributions P(X1) … P(X∞) converging to the stationary distribution]
Video of Demo Ghostbusters Basic Dynamics
Video of Demo Ghostbusters Circular Dynamics
Video of Demo Ghostbusters Whirlpool Dynamics
Application of Stationary Distributions: Gibbs Sampling*
– Each joint instantiation over all hidden and query variables is a state:
  {X1, …, Xn} = H ∪ Q
– Transitions:
  ● With probability 1/n resample variable Xj according to P(Xj | x1, …, xj−1, xj+1, …, xn, e1, …, em)
– Stationary distribution:
  ● Conditional distribution P(X1, X2, …, Xn | e1, …, em)
  ● Means that when running Gibbs sampling long enough we get a sample from the desired distribution
  ● Requires some proof to show this is true!
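A minimal sketch of this idea on a two-variable example, assuming for concreteness the sun/rain transition model above plus the umbrella sensor used later in the deck (P(u|sun) = 0.2, P(u|rain) = 0.9), with evidence u1 = u2 = true and prior P(X1) = <0.6, 0.4>; each sweep resamples one hidden variable conditioned on all the others:

```python
import random

random.seed(0)
S = ("sun", "rain")
T = {"sun": {"sun": 0.9, "rain": 0.1},   # P(X2 | X1)
     "rain": {"sun": 0.3, "rain": 0.7}}
U = {"sun": 0.2, "rain": 0.9}            # P(umbrella = true | X)
prior = {"sun": 0.6, "rain": 0.4}        # P(X1)

def sample(weights):
    """Draw a state in proportion to the given unnormalized weights."""
    z = sum(weights.values())
    r = random.random() * z
    for s, w in weights.items():
        r -= w
        if r <= 0:
            return s
    return s

x1, x2 = "sun", "sun"                    # arbitrary initial joint state
hits, n = 0, 100_000
for _ in range(n):
    # Resample X1 given X2 and u1:  P(x1) P(u1|x1) P(x2|x1)
    x1 = sample({s: prior[s] * U[s] * T[s][x2] for s in S})
    # Resample X2 given X1 and u2:  P(x2|x1) P(u2|x2)
    x2 = sample({s: T[x1][s] * U[s] for s in S})
    hits += (x2 == "rain")

freq = hits / n   # ≈ P(X2 = rain | u1, u2), which is 11/13 ≈ 0.846 by enumeration
```

The long-run fraction of sweeps with X2 = rain approaches the exact conditional, illustrating that the Gibbs chain's stationary distribution is the posterior.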
MC Example
MC Example: by forward sim (Monte Carlo)
100K steps
MC Example: by Linear Algebra
Pizza day!
[Figure: HMM structure: hidden chain X0 → X1 → … → X5, with an evidence variable Et attached to each Xt (E1 … E5)]
Example: Weather HMM
– An HMM is defined by:
  ● Initial distribution: P(X0)
  ● Transition model: P(Xt | Xt-1)
  ● Sensor model: P(Et | Xt)

  Transition model P(Wt | Wt-1):

  Wt-1   P(Wt = sun)   P(Wt = rain)
  sun    0.9           0.1
  rain   0.3           0.7

  Sensor model P(Ut | Wt):

  Wt     P(Ut = true)  P(Ut = false)
  sun    0.2           0.8
  rain   0.9           0.1

[Figure: hidden chain X0 → X1 → … → X5 with evidence Et below each Xt]
Useful notation: Xa:b = Xa, Xa+1, …, Xb
HMMs: Some Relevant Problems
Real HMM Examples
– Speech recognition HMMs:
● Observations are acoustic signals (continuous valued)
● States are specific positions in specific words (so, tens of thousands)
– Robot tracking:
● Observations are range readings (continuous)
● States are positions on a map (continuous)
– Molecular biology:
● Observations are nucleotides ACGT
● States are coding/non-coding/start/stop/splice-site etc.
Inference tasks
– Filtering: P(Xt | e1:t)
  ● belief state: input to the decision process of a rational agent
– Prediction: P(Xt+k | e1:t) for k > 0
  ● evaluation of possible action sequences; like filtering without the evidence
Inference tasks
[Figure: evidence timelines e1 … e4 illustrating which evidence each inference task conditions on]
Example: Ghostbusters HMM
– P(X1) = uniform (1/9 in each grid square)
– Transition model P(X | X′ = <1,2>) [Figure: grid of transition probabilities]
– Sensor readings Ri,j for each square
[Demo: Ghostbusters – Circular Dynamics – HMM (L14D2)]
Video of Demo Ghostbusters – Circular Dynamics – HMM
Example 1: Weather-Mood (states observed)
– The Kalman filter was invented in the 60s and first implemented as a method of trajectory estimation for the Apollo program; 1,120,000 papers on Google Scholar
Example: Robot Localization
(Example from Michael Pfeiffer)
– t = 0 [Figure: probability map, scale 0 to 1]
– Sensor model: four bits for wall/no-wall in each direction, never more than 1 mistake
– Transition model: action may fail with small prob.
Example: Robot Localization
– t = 1 [Figure: probability map]
– Lighter grey: was possible to get the reading, but less likely (required 1 mistake)
Example: Robot Localization
– t = 2 … 5 [Figures: probability maps concentrating as evidence accumulates]
Inference: Base Cases
– Observation: compute P(X1 | e1) from P(X1) [Figure: X1 with evidence E1]
– Passage of time: compute P(X2) from P(X1) [Figure: X1 → X2]
Passage of Time
– Aim: devise a recursive filtering algorithm of the form: P(Xt+1 | e1:t+1) = g(et+1, P(Xt | e1:t))
▪ Or compactly: B′(Xt+1) = Σxt P(Xt+1 | xt) B(xt)
Observation
– Assume we have current belief B′(Xt+1) = P(Xt+1 | previous evidence e1:t)
– Then, after evidence et+1 arrives: P(Xt+1 | e1:t+1) ∝ P(et+1 | Xt+1) B′(Xt+1)
– Or, compactly: B(Xt+1) ∝ P(et+1 | Xt+1) B′(Xt+1)
Online Belief Updates
– Every time step, we start with current P(X | evidence)
– We update for time: B′(Xt+1) = Σxt P(Xt+1 | xt) B(xt)
– We update for evidence: B(Xt+1) ∝ P(et+1 | Xt+1) B′(Xt+1)
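The two updates translate directly into code; a sketch assuming the weather HMM's transition and umbrella-sensor tables from the earlier slide, with evidence u1 = u2 = true:

```python
T = {"sun": {"sun": 0.9, "rain": 0.1},   # P(X_{t+1} | X_t)
     "rain": {"sun": 0.3, "rain": 0.7}}
U = {"sun": 0.2, "rain": 0.9}            # P(umbrella = true | X)
S = ("sun", "rain")

def predict(B):
    """Time update: B'(X_{t+1}) = sum_x P(X_{t+1} | x) B(x)."""
    return {nxt: sum(B[prev] * T[prev][nxt] for prev in S) for nxt in S}

def observe(Bp, umbrella):
    """Evidence update: B(X) proportional to P(e | X) B'(X), then normalize."""
    w = {s: (U[s] if umbrella else 1 - U[s]) * Bp[s] for s in S}
    z = sum(w.values())
    return {s: w[s] / z for s in S}

B = {"sun": 0.5, "rain": 0.5}            # P(X0)
B = observe(predict(B), umbrella=True)   # P(X1 | u1) = <0.25, 0.75>
B = observe(predict(B), umbrella=True)   # P(X2 | u1, u2) = <2/13, 11/13>
```

Alternating `predict` and `observe` like this is exactly the forward algorithm with evidence.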
Most likely explanation = most probable path
– State trellis: graph of states (sun/rain at each of X0 X1 … XT) and transitions over time
– arg maxx1:t P(x1:t | e1:t)
  = arg maxx1:t α P(x1:t, e1:t)
  = arg maxx1:t P(x1:t, e1:t)
  = arg maxx1:t P(x0) ∏t P(xt | xt-1) P(et | xt)
Forward / Viterbi algorithms
[Figure: state trellis X0 X1 … XT]
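A sketch of the Viterbi algorithm on this trellis, assuming the weather HMM's tables, umbrella evidence u1 = u2 = true, and prior P(X1) = <0.6, 0.4>; it mirrors the forward algorithm but takes a max over predecessors instead of a sum:

```python
T = {"sun": {"sun": 0.9, "rain": 0.1},   # P(X_{t+1} | X_t)
     "rain": {"sun": 0.3, "rain": 0.7}}
U = {"sun": 0.2, "rain": 0.9}            # P(umbrella = true | X)
S = ("sun", "rain")

def viterbi(evidence, prior):
    """Most probable state path given a list of umbrella observations."""
    # m[s] = max over x_{1:t-1} of P(x_{1:t-1}, X_t = s, e_{1:t})
    m = {s: prior[s] * (U[s] if evidence[0] else 1 - U[s]) for s in S}
    back = []                                      # back-pointers per step
    for e in evidence[1:]:
        prev_best = {s: max(S, key=lambda p: m[p] * T[p][s]) for s in S}
        m = {s: m[prev_best[s]] * T[prev_best[s]][s]
                * (U[s] if e else 1 - U[s]) for s in S}
        back.append(prev_best)
    # Follow back-pointers from the best final state.
    path = [max(S, key=lambda s: m[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi([True, True], {"sun": 0.6, "rain": 0.4}))  # ['rain', 'rain']
```

With two umbrella observations the most probable path is rain on both days, consistent with the filtering result.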