
Lecture Slides for

INTRODUCTION
TO
MACHINE
LEARNING
3RD EDITION
ETHEM ALPAYDIN
The MIT Press, 2014

[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
CHAPTER 15:

HIDDEN MARKOV MODELS


Introduction

Modeling dependencies in input; no longer iid


Sequences:
Temporal: In speech, phonemes in a word (dictionary), words in a sentence (syntax, semantics of the language); in handwriting, pen movements.
Spatial: In a DNA sequence, base pairs.
Discrete Markov Process

N states: S1, S2, ..., SN
State at time t: $q_t = S_i$

First-order Markov property:
$P(q_{t+1}=S_j \mid q_t=S_i, q_{t-1}=S_k, \ldots) = P(q_{t+1}=S_j \mid q_t=S_i)$

Transition probabilities:
$a_{ij} \equiv P(q_{t+1}=S_j \mid q_t=S_i)$ with $a_{ij} \ge 0$ and $\sum_{j=1}^{N} a_{ij} = 1$

Initial probabilities:
$\pi_i \equiv P(q_1=S_i)$ with $\sum_{i=1}^{N} \pi_i = 1$
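As an illustrative sketch (not part of the slides), the following Python/NumPy snippet specifies such a chain and samples a state sequence from it; the values of π and A are the ones used in the balls-and-urns example on the next slide, and the function name is mine.

```python
import numpy as np

# pi[i] = P(q_1 = S_i), A[i, j] = P(q_{t+1} = S_j | q_t = S_i)
pi = np.array([0.5, 0.2, 0.3])
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

def sample_chain(pi, A, T, seed=0):
    """Sample a length-T state sequence (0-based state indices)."""
    rng = np.random.default_rng(seed)
    q = [rng.choice(len(pi), p=pi)]                # draw q_1 from pi
    for _ in range(T - 1):
        q.append(rng.choice(len(pi), p=A[q[-1]]))  # draw q_{t+1} from row q_t of A
    return q

print(sample_chain(pi, A, T=10))
```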
Stochastic Automaton
Probability of an observed state sequence $Q = \{q_1 q_2 \cdots q_T\}$:

$$P(O = Q \mid A, \pi) = P(q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1}) = \pi_{q_1}\, a_{q_1 q_2} \cdots a_{q_{T-1} q_T}$$
Example: Balls and Urns
Three urns, each full of balls of one color:
S1: red, S2: blue, S3: green

$$\pi = [0.5,\; 0.2,\; 0.3]^T \qquad
A = \begin{bmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{bmatrix}$$

$$O = \{S_1, S_1, S_3, S_3\}$$

$$P(O \mid A, \pi) = P(S_1)\, P(S_1 \mid S_1)\, P(S_3 \mid S_1)\, P(S_3 \mid S_3)
= \pi_1\, a_{11}\, a_{13}\, a_{33} = 0.5 \cdot 0.4 \cdot 0.3 \cdot 0.8 = 0.048$$
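The same calculation as a short NumPy sketch (variable names are mine, not from the slides):

```python
import numpy as np

pi = np.array([0.5, 0.2, 0.3])
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
O = [0, 0, 2, 2]                       # S1, S1, S3, S3 as 0-based indices

# P(O | A, pi) = pi_{q1} * product over consecutive transitions
p = pi[O[0]]
for prev, cur in zip(O[:-1], O[1:]):
    p *= A[prev, cur]
print(p)                               # 0.5 * 0.4 * 0.3 * 0.8 = 0.048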
Balls and Urns: Learning
Given K example sequences of length T, estimate the probabilities as relative frequencies:

$$\hat{\pi}_i = \frac{\#\{\text{sequences starting with } S_i\}}{\#\{\text{sequences}\}}
= \frac{\sum_k \mathbf{1}(q_1^k = S_i)}{K}$$

$$\hat{a}_{ij} = \frac{\#\{\text{transitions from } S_i \text{ to } S_j\}}{\#\{\text{transitions from } S_i\}}
= \frac{\sum_k \sum_{t=1}^{T-1} \mathbf{1}(q_t^k = S_i \text{ and } q_{t+1}^k = S_j)}{\sum_k \sum_{t=1}^{T-1} \mathbf{1}(q_t^k = S_i)}$$
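A minimal counting implementation of these estimates (function and variable names are mine; the short example sequences at the bottom are made up for illustration):

```python
import numpy as np

def estimate_markov(sequences, N):
    """ML estimates of pi and A from fully observed state sequences.

    sequences: list of K lists of 0-based state indices.
    N: number of states.
    """
    pi_counts = np.zeros(N)
    trans_counts = np.zeros((N, N))
    for seq in sequences:
        pi_counts[seq[0]] += 1                      # sequence starts with S_i
        for prev, cur in zip(seq[:-1], seq[1:]):
            trans_counts[prev, cur] += 1            # transition S_i -> S_j
    pi_hat = pi_counts / len(sequences)
    A_hat = trans_counts / trans_counts.sum(axis=1, keepdims=True)
    return pi_hat, A_hat

pi_hat, A_hat = estimate_markov([[0, 0, 1], [0, 1, 1], [1, 1, 0]], N=2)
print(pi_hat)  # fraction of sequences starting in each state
print(A_hat)   # row-normalized transition counts
```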
Hidden Markov Models
States are not observable.
Discrete observations {v1, v2, ..., vM} are recorded; they are a probabilistic function of the state.

Emission probabilities:
$b_j(m) \equiv P(O_t = v_m \mid q_t = S_j)$

Example: In each urn, there are balls of different colors, but with different probabilities.
For each observation sequence, there are multiple possible state sequences.
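One way to hold the full parameter set in code, used by the later sketches. A and π reuse the balls-and-urns values; the emission matrix B is an assumed example (the slides give no numeric emission probabilities):

```python
import numpy as np

# lambda = (A, B, Pi) for a 3-state HMM with 3 observation symbols
pi = np.array([0.5, 0.2, 0.3])          # Pi: initial state probabilities
A = np.array([[0.4, 0.3, 0.3],          # A[i, j] = P(q_{t+1}=S_j | q_t=S_i)
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
B = np.array([[0.8, 0.1, 0.1],          # B[j, m] = b_j(m) = P(O_t=v_m | q_t=S_j)
              [0.1, 0.8, 0.1],          # assumed values: each urn favors one color
              [0.2, 0.2, 0.6]])
```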
HMM Unfolded in Time
Elements of an HMM
N: number of states
M: number of observation symbols
A = [a_ij]: N × N state transition probability matrix
B = [b_j(m)]: N × M observation probability matrix
Π = [π_i]: N × 1 initial state probability vector

λ = (A, B, Π): parameter set of the HMM
Three Basic Problems of HMMs
1. Evaluation: Given λ and O, calculate P(O | λ)
2. State sequence: Given λ and O, find Q* such that P(Q* | O, λ) = max_Q P(Q | O, λ)
3. Learning: Given X = {O^k}_k, find λ* such that P(X | λ*) = max_λ P(X | λ)

(Rabiner, 1989)
Evaluation
Forward variable: $\alpha_t(i) \equiv P(O_1 \cdots O_t, q_t = S_i \mid \lambda)$

Initialization:
$$\alpha_1(i) = \pi_i\, b_i(O_1)$$

Recursion:
$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(O_{t+1})$$

$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$

Backward variable: $\beta_t(i) \equiv P(O_{t+1} \cdots O_T \mid q_t = S_i, \lambda)$

Initialization:
$$\beta_T(i) = 1$$

Recursion:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)$$
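A minimal NumPy sketch of these recursions (function names are mine; no scaling is applied, so it is only meant for short sequences; B is the assumed emission matrix from before and O is an arbitrary observation sequence):

```python
import numpy as np

def forward(pi, A, B, O):
    """alpha[t, i] corresponds to alpha_{t+1}(i) in the slides' 1-based notation."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                       # alpha_1(i) = pi_i b_i(O_1)
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
    return alpha

def backward(pi, A, B, O):
    """beta[t, i] corresponds to beta_{t+1}(i) in the slides' 1-based notation."""
    T, N = len(O), len(pi)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                   # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta

pi = np.array([0.5, 0.2, 0.3])
A = np.array([[0.4, 0.3, 0.3], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]])
B = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])  # assumed emissions
O = [0, 1, 2, 2]
print(forward(pi, A, B, O)[-1].sum())                # P(O | lambda) = sum_i alpha_T(i)
```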
Finding the State Sequence
$$\gamma_t(i) \equiv P(q_t = S_i \mid O, \lambda) = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)}$$

Choose the state that has the highest probability, for each time step:
$q_t^* = \arg\max_i \gamma_t(i)$

No! The individually most likely states need not form a likely, or even feasible, state sequence (some of the implied transitions may have zero probability); to find the single best sequence as a whole, use the Viterbi algorithm (next slide).
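Computing γ from the forward and backward arrays of the earlier sketch (again only an illustrative sketch, names mine):

```python
import numpy as np

def gamma(alpha, beta):
    """gamma[t, i] = P(q_t = S_i | O, lambda), from (T, N) forward/backward arrays."""
    g = alpha * beta
    return g / g.sum(axis=1, keepdims=True)      # normalize over states at each t

# Per-time-step most likely states (not necessarily a feasible path):
# g = gamma(forward(pi, A, B, O), backward(pi, A, B, O))
# best_states = g.argmax(axis=1)
```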
Viterbi's Algorithm

$$\delta_t(i) \equiv \max_{q_1 q_2 \cdots q_{t-1}} p(q_1 q_2 \cdots q_{t-1}, q_t = S_i, O_1 \cdots O_t \mid \lambda)$$

Initialization:
$\delta_1(i) = \pi_i\, b_i(O_1)$, $\psi_1(i) = 0$

Recursion:
$\delta_t(j) = \max_i \delta_{t-1}(i)\, a_{ij} \cdot b_j(O_t)$, $\psi_t(j) = \arg\max_i \delta_{t-1}(i)\, a_{ij}$

Termination:
$p^* = \max_i \delta_T(i)$, $q_T^* = \arg\max_i \delta_T(i)$

Path backtracking:
$q_t^* = \psi_{t+1}(q_{t+1}^*)$, $t = T-1, T-2, \ldots, 1$
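A minimal NumPy sketch of the δ/ψ recursion (names mine; unscaled, so only for short sequences; a practical version would work in log-space):

```python
import numpy as np

def viterbi(pi, A, B, O):
    """Most likely state sequence q* (0-based indices) and its joint probability p*."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                       # delta_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A           # delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)               # best predecessor of each j
        delta[t] = scores.max(axis=0) * B[:, O[t]]   # times b_j(O_t)
    q = [int(delta[-1].argmax())]                    # q_T* = argmax_i delta_T(i)
    for t in range(T - 1, 0, -1):                    # path backtracking
        q.append(int(psi[t][q[-1]]))
    return q[::-1], float(delta[-1].max())

pi = np.array([0.5, 0.2, 0.3])
A = np.array([[0.4, 0.3, 0.3], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]])
B = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])  # assumed emissions
print(viterbi(pi, A, B, O=[0, 1, 2, 2]))
```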
Learning
$$\xi_t(i, j) \equiv P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda)
= \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{\sum_k \sum_l \alpha_t(k)\, a_{kl}\, b_l(O_{t+1})\, \beta_{t+1}(l)}$$

Baum-Welch algorithm (EM), with indicator variables

$$z_t^i = \begin{cases} 1 & \text{if } q_t = S_i \\ 0 & \text{otherwise} \end{cases}
\qquad
z_t^{ij} = \begin{cases} 1 & \text{if } q_t = S_i \text{ and } q_{t+1} = S_j \\ 0 & \text{otherwise} \end{cases}$$
Baum-Welch (EM)

E-step:
$$E[z_t^i] = \gamma_t(i) \qquad E[z_t^{ij}] = \xi_t(i, j)$$

M-step:
$$\hat{\pi}_i = \frac{\sum_{k=1}^{K} \gamma_1^k(i)}{K}$$

$$\hat{a}_{ij} = \frac{\sum_{k=1}^{K} \sum_{t=1}^{T_k - 1} \xi_t^k(i, j)}{\sum_{k=1}^{K} \sum_{t=1}^{T_k - 1} \gamma_t^k(i)}$$

$$\hat{b}_j(m) = \frac{\sum_{k=1}^{K} \sum_{t=1}^{T_k} \gamma_t^k(j)\, \mathbf{1}(O_t^k = v_m)}{\sum_{k=1}^{K} \sum_{t=1}^{T_k} \gamma_t^k(j)}$$
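One EM iteration for a single observation sequence (K = 1), as a compact, unscaled NumPy sketch of the E- and M-step formulas above (names mine; a practical implementation would scale α/β or work in log-space and accumulate counts over K sequences):

```python
import numpy as np

def baum_welch_step(pi, A, B, O):
    """One Baum-Welch update of (pi, A, B) from a single discrete sequence O."""
    T, N = len(O), len(pi)
    # E step: forward, backward, then gamma and xi
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
        xi[t] /= xi[t].sum()
    # M step: re-estimate pi, A, B from the expected counts
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for m in range(B.shape[1]):
        B_new[:, m] = gamma[np.array(O) == m].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new
```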
Continuous Observations
Discrete observations:
$$P(O_t \mid q_t = S_j, \lambda) = \prod_{m=1}^{M} b_j(m)^{r_t^m}
\qquad r_t^m = \begin{cases} 1 & \text{if } O_t = v_m \\ 0 & \text{otherwise} \end{cases}$$

Gaussian mixture (discretize using k-means):
$$P(O_t \mid q_t = S_j, \lambda) = \sum_{l=1}^{L} P(G_{jl})\, p(O_t \mid q_t = S_j, G_l, \lambda)
\qquad p(O_t \mid q_t = S_j, G_l, \lambda) \sim \mathcal{N}(\mu_l, \Sigma_l)$$

Continuous observations:
$$p(O_t \mid q_t = S_j, \lambda) \sim \mathcal{N}(\mu_j, \sigma_j^2)$$

Use EM to learn the parameters, e.g.,
$$\hat{\mu}_j = \frac{\sum_t \gamma_t(j)\, O_t}{\sum_t \gamma_t(j)}$$
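The mean update above as a short sketch for scalar Gaussian emissions; the variance update shown is the analogous weighted average and is not on the slide (names mine):

```python
import numpy as np

def gaussian_emission_update(gamma, O):
    """Per-state weighted mean and variance for scalar Gaussian emissions.

    gamma: (T, N) posteriors gamma_t(j); O: (T,) continuous observations.
    """
    O = np.asarray(O, dtype=float)
    w = gamma.sum(axis=0)                                   # sum_t gamma_t(j)
    mu = (gamma * O[:, None]).sum(axis=0) / w               # mu_j
    var = (gamma * (O[:, None] - mu) ** 2).sum(axis=0) / w  # sigma_j^2 (analogous update)
    return mu, var
```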
HMM with Input
Input-dependent observations:
$$P(O_t \mid q_t = S_j, x^t, \lambda) \sim \mathcal{N}\big(g_j(x^t \mid \theta_j), \sigma_j^2\big)$$

Input-dependent transitions (Meila and Jordan, 1996; Bengio and Frasconi, 1996):
$$P(q_{t+1} = S_j \mid q_t = S_i, x^t)$$

Time-delay input:
$$x^t = f(O_{t-\tau}, \ldots, O_{t-1})$$
HMM as a Graphical Model
Model Selection in HMM
Left-to-right HMMs:
$$A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & 0 \\
0 & a_{22} & a_{23} & a_{24} \\
0 & 0 & a_{33} & a_{34} \\
0 & 0 & 0 & a_{44}
\end{bmatrix}$$

In classification, for each class $C_i$, estimate $P(O \mid \lambda_i)$ by a separate HMM and use Bayes' rule:
$$P(\lambda_i \mid O) = \frac{P(O \mid \lambda_i)\, P(\lambda_i)}{\sum_j P(O \mid \lambda_j)\, P(\lambda_j)}$$
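A sketch of this classification rule in code, assuming some likelihood function such as the forward-algorithm sketch from the Evaluation slide (all names and the prior values are mine):

```python
import numpy as np

def classify(O, models, priors, likelihood):
    """Posterior P(lambda_i | O) for per-class HMMs via Bayes' rule.

    models: list of per-class parameter tuples (pi, A, B);
    priors: class prior probabilities P(lambda_i);
    likelihood: function (pi, A, B, O) -> P(O | lambda).
    """
    evidence = np.array([likelihood(*m, O) for m in models]) * np.asarray(priors)
    return evidence / evidence.sum()          # normalize over classes

# Example usage with the earlier forward() sketch as the likelihood:
# posterior = classify(O, models=[model_0, model_1], priors=[0.5, 0.5],
#                      likelihood=lambda pi, A, B, O: forward(pi, A, B, O)[-1].sum())
```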
