
Lecture Slides for

INTRODUCTION
TO
MACHINE
LEARNING
3RD EDITION
ETHEM ALPAYDIN
© The MIT Press, 2014

[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
CHAPTER 15:

HIDDEN MARKOV MODELS

Introduction
• Modeling dependencies in input; no longer iid
• Sequences:
  – Temporal: in speech, phonemes in a word (dictionary), words in a sentence (syntax, semantics of the language); in handwriting, pen movements
  – Spatial: in a DNA sequence, base pairs
Discrete Markov Process
• N states: S1, S2, ..., SN; the state at "time" t is qt = Si
• First-order Markov property:
  P(qt+1=Sj | qt=Si, qt-1=Sk, ...) = P(qt+1=Sj | qt=Si)
• Transition probabilities:
  aij ≡ P(qt+1=Sj | qt=Si), with aij ≥ 0 and Σj=1..N aij = 1
• Initial probabilities:
  πi ≡ P(q1=Si), with Σi=1..N πi = 1
Stochastic Automaton
P(O = Q | A, Π) = P(q1) ∏t=2..T P(qt | qt-1) = πq1 aq1q2 ⋯ aqT-1qT
Example: Balls and Urns
• Three urns, each full of balls of one color:
  S1: red, S2: blue, S3: green

  Π = [0.5, 0.2, 0.3]^T

      | 0.4 0.3 0.3 |
  A = | 0.2 0.6 0.2 |
      | 0.1 0.1 0.8 |

  O = {S1, S1, S3, S3}

  P(O | A, Π) = P(S1) · P(S1 | S1) · P(S3 | S1) · P(S3 | S3)
              = π1 · a11 · a13 · a33
              = 0.5 · 0.4 · 0.3 · 0.8 = 0.048
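A minimal sketch (not from the slides) that computes this sequence probability numerically with NumPy; the array names pi, A, and O simply mirror the quantities above.

```python
import numpy as np

# Probability of an observed state sequence under the observable Markov model
# from the balls-and-urns example above (illustrative sketch).
pi = np.array([0.5, 0.2, 0.3])        # initial probabilities pi_i
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])       # transition probabilities a_ij

O = [0, 0, 2, 2]                      # S1, S1, S3, S3 as 0-based indices

p = pi[O[0]]                          # P(q1)
for t in range(1, len(O)):
    p *= A[O[t - 1], O[t]]            # P(q_t | q_{t-1})

print(p)                              # 0.5 * 0.4 * 0.3 * 0.8 = 0.048
```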
Balls and Urns: Learning
• Given K example sequences of length T:

  π̂i = #{sequences starting with Si} / #{sequences}
      = Σk 1(q1^k = Si) / K

  âij = #{transitions from Si to Sj} / #{transitions from Si}
      = Σk Σt=1..T-1 1(qt^k = Si and qt+1^k = Sj) / Σk Σt=1..T-1 1(qt^k = Si)
Hidden Markov Models
• States are not observable
• Discrete observations {v1, v2, ..., vM} are recorded; a probabilistic function of the current state
• Emission probabilities:
  bj(m) ≡ P(Ot=vm | qt=Sj)
• Example: in each urn, there are balls of different colors, but with different probabilities
• For each observation sequence, there are multiple possible state sequences
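To make the generative view concrete, a small sketch (not from the slides) that samples a sequence from an HMM λ = (A, B, Π); the function name and NumPy usage are illustrative.

```python
import numpy as np

# Sampling from an HMM: hidden states follow the Markov chain (pi, A),
# and each observation is drawn from the emission row B[q] of the current state.
def sample_hmm(pi, A, B, T, rng=None):
    rng = rng or np.random.default_rng()
    N, M = B.shape
    states, obs = [], []
    q = rng.choice(N, p=pi)                    # q1 ~ pi
    for _ in range(T):
        states.append(q)
        obs.append(rng.choice(M, p=B[q]))      # O_t ~ b_q(.)
        q = rng.choice(N, p=A[q])              # q_{t+1} ~ a_q.
    return states, obs
```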
HMM Unfolded in Time
Elements of an HMM
• N: number of states
• M: number of observation symbols
• A = [aij]: N × N state transition probability matrix
• B = [bj(m)]: N × M observation (emission) probability matrix
• Π = [πi]: N × 1 initial state probability vector

λ = (A, B, Π): the parameter set of the HMM
Three Basic Problems of HMMs
1. Evaluation: given λ and O, calculate P(O | λ)
2. State sequence: given λ and O, find Q* such that
   P(Q* | O, λ) = maxQ P(Q | O, λ)
3. Learning: given X = {O^k}k, find λ* such that
   P(X | λ*) = maxλ P(X | λ)
(Rabiner, 1989)
Evaluation
• Forward variable: αt(i) ≡ P(O1 ⋯ Ot, qt = Si | λ)

  Initialization:
    α1(i) = πi bi(O1)
  Recursion:
    αt+1(j) = [Σi=1..N αt(i) aij] bj(Ot+1)
  Termination:
    P(O | λ) = Σi=1..N αT(i)

• Backward variable: βt(i) ≡ P(Ot+1 ⋯ OT | qt = Si, λ)

  Initialization:
    βT(i) = 1
  Recursion:
    βt(i) = Σj=1..N aij bj(Ot+1) βt+1(j)
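A compact NumPy sketch of these two passes, assuming pi is (N,), A is (N, N), B is (N, M), and O is a list of 0-based symbol indices; scaling or log-space arithmetic (needed for long sequences) is omitted for clarity.

```python
import numpy as np

# Forward pass: alpha_t(i) = P(O_1..O_t, q_t = S_i | lambda).
def forward(pi, A, B, O):
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                       # alpha_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]   # recursion
    return alpha, alpha[-1].sum()                    # P(O | lambda) = sum_i alpha_T(i)

# Backward pass: beta_t(i) = P(O_{t+1}..O_T | q_t = S_i, lambda).
def backward(A, B, O):
    T, N = len(O), A.shape[0]
    beta = np.ones((T, N))                           # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta
```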
Finding the State Sequence
γt(i) ≡ P(qt = Si | O, λ) = αt(i) βt(i) / Σj=1..N αt(j) βt(j)

Choose the state that has the highest probability at each time step:
  qt* = argmaxi γt(i)

No! The individually most likely states need not form a feasible state sequence; the resulting path may use transitions with zero probability.
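A short sketch of this per-time-step (posterior) choice, reusing the forward/backward sketches above; it illustrates γt(i), even though the Viterbi algorithm on the next slide is what should be used to find a single best path.

```python
import numpy as np

# Per-time-step state choice q_t* = argmax_i gamma_t(i); may yield an
# infeasible path, hence the Viterbi algorithm that follows.
def posterior_states(pi, A, B, O):
    alpha, _ = forward(pi, A, B, O)             # forward()/backward() from the sketch above
    beta = backward(A, B, O)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)   # gamma_t(i)
    return gamma.argmax(axis=1)
```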
Viterbi’s Algorithm
δt(i) ≡ maxq1 q2 ⋯ qt-1 P(q1 q2 ⋯ qt-1, qt = Si, O1 ⋯ Ot | λ)

• Initialization:
  δ1(i) = πi bi(O1), ψ1(i) = 0
• Recursion:
  δt(j) = maxi δt-1(i) aij bj(Ot), ψt(j) = argmaxi δt-1(i) aij
• Termination:
  p* = maxi δT(i), qT* = argmaxi δT(i)
• Path backtracking:
  qt* = ψt+1(qt+1*), t = T-1, T-2, ..., 1
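A self-contained sketch of the recursion above; as with the forward pass, a practical implementation would usually work in log-space to avoid underflow.

```python
import numpy as np

# Viterbi decoding: delta/psi recursion followed by path backtracking.
def viterbi(pi, A, B, O):
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                   # delta_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A       # delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)           # best predecessor for each j
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    q = np.zeros(T, dtype=int)                   # backtrack the best path
    q[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        q[t] = psi[t + 1, q[t + 1]]
    return q, delta[-1].max()                    # q*, p*
```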
Learning
ξt(i, j) ≡ P(qt = Si, qt+1 = Sj | O, λ)
         = αt(i) aij bj(Ot+1) βt+1(j) / Σk Σl αt(k) akl bl(Ot+1) βt+1(l)

Baum-Welch algorithm (EM) uses the indicator variables
  zi^t  = 1 if qt = Si, 0 otherwise
  zij^t = 1 if qt = Si and qt+1 = Sj, 0 otherwise
Baum-Welch (EM)
E-step:
  E[zi^t] = γt(i),  E[zij^t] = ξt(i, j)

M-step:
  π̂i = Σk γ1^k(i) / K

  âij = Σk Σt=1..Tk-1 ξt^k(i, j) / Σk Σt=1..Tk-1 γt^k(i)

  b̂j(m) = Σk Σt=1..Tk γt^k(j) 1(Ot^k = vm) / Σk Σt=1..Tk γt^k(j)
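A sketch of a single Baum-Welch re-estimation pass for one observation sequence (K = 1), reusing the forward/backward sketches above; variable names are illustrative and no scaling is applied.

```python
import numpy as np

# One EM re-estimation step for a single sequence O (K = 1), following the
# E-step/M-step formulas above; forward()/backward() are the sketches above.
def baum_welch_step(pi, A, B, O):
    T, N = len(O), len(pi)
    alpha, _ = forward(pi, A, B, O)
    beta = backward(A, B, O)

    gamma = alpha * beta                              # E-step: gamma_t(i)
    gamma /= gamma.sum(axis=1, keepdims=True)

    xi = np.zeros((T - 1, N, N))                      # E-step: xi_t(i, j)
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, O[t + 1]] * beta[t + 1]
        xi[t] /= xi[t].sum()

    pi_new = gamma[0]                                 # M-step
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for m in range(B.shape[1]):
        B_new[:, m] = gamma[np.array(O) == m].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new
```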
Continuous Observations
• Discrete observations:
  P(Ot | qt = Sj, λ) = ∏m=1..M bj(m)^(rm^t)   where rm^t = 1 if Ot = vm, 0 otherwise

• Gaussian mixture (discretize using k-means):
  P(Ot | qt = Sj, λ) = Σl=1..L P(Gl | qt = Sj) p(Ot | qt = Sj, Gl, λ),
  with p(Ot | qt = Sj, Gl, λ) ~ N(μl, Σl)

• Continuous (single Gaussian):
  P(Ot | qt = Sj, λ) ~ N(μj, σj²)

• Use EM to learn the parameters, e.g.,
  μ̂j = Σt γt(j) Ot / Σt γt(j)
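A tiny sketch (assuming SciPy is available) of how the discrete emission lookup B[:, O[t]] used in the earlier sketches would be replaced by per-state Gaussian densities in the continuous case.

```python
import numpy as np
from scipy.stats import norm

# Continuous emissions: b_j(O_t) becomes a density value, here a univariate
# Gaussian N(mu_j, sigma_j^2) per state (illustrative; SciPy assumed).
def emission_probs(o_t, mu, sigma):
    return norm.pdf(o_t, loc=mu, scale=sigma)   # vector [b_1(o_t), ..., b_N(o_t)]

# Example: this vector replaces B[:, O[t]] in the forward/backward/Viterbi sketches.
b_t = emission_probs(0.7, mu=np.array([0.0, 1.0]), sigma=np.array([1.0, 0.5]))
```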
HMM with Input
• Input-dependent observations:
  P(Ot | qt = Sj, x^t, λ) ~ N(gj(x^t | θj), σj²)

• Input-dependent transitions (Meila and Jordan, 1996; Bengio and Frasconi, 1996):
  P(qt+1 = Sj | qt = Si, x^t)

• Time-delay input:
  x^t = f(Ot-τ, ..., Ot-1)
HMM as a Graphical Model
Model Selection in HMM
• Left-to-right HMMs:

      | a11 a12 a13  0  |
  A = |  0  a22 a23 a24 |
      |  0   0  a33 a34 |
      |  0   0   0  a44 |

• In classification, for each class Ci, estimate P(O | λi) with a separate HMM and use Bayes' rule:

  P(λi | O) = P(O | λi) P(λi) / Σj P(O | λj) P(λj)
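A brief sketch of this classification rule, reusing the forward sketch above to evaluate P(O | λi) for one HMM per class; the function name and the prior vector are illustrative.

```python
import numpy as np

# Sequence classification: evaluate O under each class HMM with the forward
# pass, weight by the class priors, and normalize (Bayes' rule above).
def classify(O, class_hmms, priors):
    # class_hmms: list of (pi, A, B) triples, one HMM per class
    likelihoods = np.array([forward(pi, A, B, O)[1] for pi, A, B in class_hmms])
    joint = likelihoods * np.asarray(priors)
    posterior = joint / joint.sum()
    return int(posterior.argmax()), posterior
```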