Lec7 - 10 - HMM Learning
Markov Models
Example of Markov Model
[Figure: two-state Markov chain with states Rain and Dry; transition probabilities P(Rain→Rain) = 0.3, P(Rain→Dry) = 0.7, P(Dry→Rain) = 0.2, P(Dry→Dry) = 0.8]
Calculation of sequence probability
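For a plain Markov chain, the probability of a state sequence is the initial-state probability times the product of the transition probabilities along the sequence. A minimal sketch using the two-state Rain/Dry model from the figure above; the assignment of 0.3/0.7 and 0.2/0.8 to particular edges and the initial distribution are assumptions, not given explicitly on the slide:

```python
# P(s1, ..., sn) = P(s1) * prod_t P(s_t | s_{t-1})
# Transition numbers follow the Rain/Dry figure; the pairing of probabilities
# to edges and the initial distribution are assumed here for illustration.
trans = {
    ("Rain", "Rain"): 0.3, ("Rain", "Dry"): 0.7,
    ("Dry", "Rain"): 0.2, ("Dry", "Dry"): 0.8,
}
init = {"Rain": 0.4, "Dry": 0.6}  # assumed initial probabilities

def seq_prob(states):
    """Probability of a full state sequence under the Markov chain."""
    p = init[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[(prev, cur)]
    return p

print(seq_prob(["Dry", "Dry", "Rain", "Rain"]))  # 0.6 * 0.8 * 0.2 * 0.3
```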
Hidden Markov Model
In a Markov model, the emission probability is not used.
Hidden Markov Models
HMM Example
HMM Problems
Evaluation Problem
Problems that can be solved using HMMs
1) Calculation of observation sequence probability
[Figure: HMM with hidden states S_1 → S_2 → S_3 and observations O_1, O_2, O_3]

Define the forward variable \alpha_t^i = P(O_1, \ldots, O_t, S_t = i). For example:

\alpha_2^i = P(O_1, O_2, S_2 = i)
\alpha_1^i = P(O_1, S_1 = i)
Calculation Difficulty
Can we find some relationship between \alpha_3^i, \alpha_2^i, and \alpha_1^i?
Yes, recursively!
Can we find the relationship? (Yes)
\alpha_3^{Low} = P(O_1, O_2, O_3, S_3 = Low)
= P(O_1, O_2, O_3, S_3 = Low, S_2 = High) + P(O_1, O_2, O_3, S_3 = Low, S_2 = Low)
= \sum_{j \in \{Low, High\}} P(O_1, O_2, O_3, S_3 = Low, S_2 = j)

By D-separation, each term factors into P(O_1, O_2, S_2 = j), P(O_3 | S_3 = Low), and P(S_3 = Low | S_2 = j).

[Figure: hidden states S_1 → S_2 → S_3 with observations O_1, O_2, O_3]
Forward Algorithm
\alpha_3^{Low} = \sum_{j \in \{Low, High\}} P(O_1, O_2, S_2 = j) \; P(O_3 | S_3 = Low) \; P(S_3 = Low | S_2 = j)
= \sum_{j \in \{Low, High\}} \alpha_2^j \; P(O_3 | S_3 = Low) \; P(S_3 = Low | S_2 = j)
= P(O_3 | S_3 = Low) \sum_{j \in \{Low, High\}} \alpha_2^j \; P(S_3 = Low | S_2 = j)

This is dynamic programming: each \alpha_3 value reuses the already-computed \alpha_2 values.
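The recursion above turns directly into the forward algorithm: compute \alpha_1 from the initial distribution, then roll forward one step at a time. A sketch with a two-state Low/High model; all probability values are illustrative assumptions, not taken from the slides:

```python
# Forward algorithm: alpha_t[i] = P(O_1, ..., O_t, S_t = i), filled in by the
# dynamic-programming recursion derived above. The two-state Low/High model
# and all probability values here are illustrative assumptions.
states = ["Low", "High"]
init = {"Low": 0.4, "High": 0.6}                       # P(S_1)
trans = {"Low": {"Low": 0.3, "High": 0.7},             # P(S_t | S_{t-1})
         "High": {"Low": 0.2, "High": 0.8}}
emit = {"Low": {"Rain": 0.6, "Dry": 0.4},              # P(O_t | S_t)
        "High": {"Rain": 0.4, "Dry": 0.6}}

def forward(obs):
    """Return the table of forward variables alpha_1 .. alpha_T."""
    alpha = [{i: init[i] * emit[i][obs[0]] for i in states}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({i: emit[i][o] * sum(prev[j] * trans[j][i] for j in states)
                      for i in states})
    return alpha

alpha = forward(["Dry", "Rain", "Rain"])
print(sum(alpha[-1].values()))  # P(O_1, O_2, O_3): sum alpha_T over states
```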
Decoding Problem 1
Problems that can be solved by HMM
• Decoding problem 1
[Figure: hidden states S_1 → S_2 → S_3 → S_4 with observations O_1, O_2, O_3, O_4]
P(O_1, O_2, O_3, O_4, S_3 = High) = ?
P(O_1, O_2, O_3, O_4, S_3 = High)
= P(O_1, O_2, O_3, S_3 = High) \; P(O_4 | O_1, O_2, O_3, S_3 = High)
= P(O_1, O_2, O_3, S_3 = High) \; P(O_4 | S_3 = High)
= \alpha_3^{High} \; \beta_3^{High}

Calculation difficulty: for a longer sequence, say observations O_1, \ldots, O_6 with hidden states S_1, \ldots, S_6, the same split requires backward terms such as \beta_4^{High} = P(O_5, O_6 | S_4 = High) and \beta_5^{High} = P(O_6 | S_5 = High), which are expensive to evaluate directly.

[Figure: chain S_1 → \ldots → S_6 with observations O_1, \ldots, O_6]
Can we find some relationship among \beta_3^i, \beta_4^i, \beta_5^i, and \beta_6^i?
Calculate \beta_t^i based on \beta_{t+1}^j. Recursively!
Can we find the relationship? (Yes)
\beta_3^{High} = P(O_4, O_5, O_6 | S_3 = High)
= \sum_{j \in \{Low, High\}} P(O_4, O_5, O_6, S_4 = j | S_3 = High)

By D-separation, each term factors into \beta_4^j, P(O_4 | S_4 = j), and P(S_4 = j | S_3 = High).

[Figure: hidden states S_1 → \ldots → S_6 with observations O_1, \ldots, O_6]
Backward Algorithm
\beta_3^{High} = \sum_{j \in \{Low, High\}} \beta_4^j \; P(O_4 | S_4 = j) \; P(S_4 = j | S_3 = High)

where P(O_4 | S_4 = j) is the emission probability and P(S_4 = j | S_3 = High) is the transition probability.
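The backward recursion likewise becomes a simple loop running right to left, starting from \beta_T = 1. A sketch with the same illustrative two-state Low/High model (all probability values are assumptions):

```python
# Backward algorithm: beta_t[i] = P(O_{t+1}, ..., O_T | S_t = i), computed by
# the recursion above, right to left, with beta_T = 1 for every state.
# The two-state model and all probability values are illustrative assumptions.
states = ["Low", "High"]
trans = {"Low": {"Low": 0.3, "High": 0.7}, "High": {"Low": 0.2, "High": 0.8}}
emit = {"Low": {"Rain": 0.6, "Dry": 0.4}, "High": {"Rain": 0.4, "Dry": 0.6}}

def backward(obs):
    """Return the table of backward variables beta_1 .. beta_T."""
    beta = [{i: 1.0 for i in states}]            # beta_T = 1
    for o in reversed(obs[1:]):                  # o = O_T, O_{T-1}, ..., O_2
        nxt = beta[0]
        beta.insert(0, {i: sum(trans[i][j] * emit[j][o] * nxt[j]
                               for j in states)
                        for i in states})
    return beta

beta = backward(["Dry", "Rain", "Rain"])
print(beta[0])  # beta_1 for each starting state
```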
Backward Probability
Decoding Problem 2
[Figure: hidden states S_1 → S_2 → S_3 with observations O_1, O_2, O_3]

V_3^k: probability of the most likely sequence of states ending at S_3 = k:

V_3^k = \max_{S_1, S_2} P(S_3 = k, S_2, S_1, O_1, O_2, O_3)
By D-separation, P(O_3 | S_3 = k) and P(S_3 = k | S_2 = i) factor out of the maximization:

V_3^k = P(O_3 | S_3 = k) \max_i P(S_3 = k | S_2 = i) \; V_2^i
Viterbi algorithm
V_3^k = P(O_3 | S_3 = k) \max_i P(S_3 = k | S_2 = i) \; V_2^i
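The Viterbi algorithm is the same recursion as the forward algorithm with max in place of sum, plus backpointers to recover the most likely state sequence. A sketch; the two-state model and all probability values are illustrative assumptions:

```python
# Viterbi algorithm: V_t[k] = max over state prefixes of
# P(S_t = k, S_1, ..., S_{t-1}, O_1, ..., O_t), with backpointers.
# The two-state model and all probability values are illustrative assumptions.
states = ["Low", "High"]
init = {"Low": 0.4, "High": 0.6}
trans = {"Low": {"Low": 0.3, "High": 0.7}, "High": {"Low": 0.2, "High": 0.8}}
emit = {"Low": {"Rain": 0.6, "Dry": 0.4}, "High": {"Rain": 0.4, "Dry": 0.6}}

def viterbi(obs):
    """Return (most likely state sequence, its joint probability with obs)."""
    V = [{k: init[k] * emit[k][obs[0]] for k in states}]  # V_1
    back = []                                             # backpointers
    for o in obs[1:]:
        prev, row, ptr = V[-1], {}, {}
        for k in states:
            best = max(states, key=lambda i: prev[i] * trans[i][k])
            ptr[k] = best
            row[k] = emit[k][o] * prev[best] * trans[best][k]
        V.append(row)
        back.append(ptr)
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for ptr in reversed(back):        # follow backpointers right to left
        path.insert(0, ptr[path[0]])
    return path, V[-1][last]

path, p = viterbi(["Rain", "Rain", "Dry"])
print(path, p)
```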
Expectation Maximization (EM)
In Rabiner's notation:
\xi_t(i, j): expected number of transitions from state i to state j at time t.
\gamma_t(i): expected number of times of being in state i at time t.
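These expected counts are the E-step quantities of Baum-Welch, computable from the forward and backward variables. A sketch (Rabiner's \xi and \gamma; the two-state model and all probability values are illustrative assumptions):

```python
# E-step quantities, from the forward (alpha) and backward (beta) tables:
#   gamma_t(i) = P(S_t = i | O)                (being in state i at time t)
#   xi_t(i,j)  = P(S_t = i, S_{t+1} = j | O)   (transition i -> j at time t)
# The two-state model and all probability values are illustrative assumptions.
states = ["Low", "High"]
init = {"Low": 0.4, "High": 0.6}
trans = {"Low": {"Low": 0.3, "High": 0.7}, "High": {"Low": 0.2, "High": 0.8}}
emit = {"Low": {"Rain": 0.6, "Dry": 0.4}, "High": {"Rain": 0.4, "Dry": 0.6}}

def forward(obs):
    a = [{i: init[i] * emit[i][obs[0]] for i in states}]
    for o in obs[1:]:
        prev = a[-1]
        a.append({j: emit[j][o] * sum(prev[i] * trans[i][j] for i in states)
                  for j in states})
    return a

def backward(obs):
    b = [{i: 1.0 for i in states}]
    for o in reversed(obs[1:]):
        nxt = b[0]
        b.insert(0, {i: sum(trans[i][j] * emit[j][o] * nxt[j] for j in states)
                     for i in states})
    return b

def expected_counts(obs):
    """Return (gamma, xi) for one observation sequence."""
    alpha, beta = forward(obs), backward(obs)
    pO = sum(alpha[-1][i] for i in states)      # P(O) from the forward pass
    gamma = [{i: alpha[t][i] * beta[t][i] / pO for i in states}
             for t in range(len(obs))]
    xi = [{(i, j): alpha[t][i] * trans[i][j] * emit[j][obs[t + 1]]
                   * beta[t + 1][j] / pO
           for i in states for j in states}
          for t in range(len(obs) - 1)]
    return gamma, xi

gamma, xi = expected_counts(["Dry", "Rain", "Dry"])
print(gamma[0])
```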
Example
Observation symbols: R (Rain), D (Dry).
[Table: five training pairs, each a state sequence with its corresponding observation sequence (state/obs sequences 1 through 5)]
HMM Parameter Learning
HMM Parameter Learning (given fully labeled sequences)
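With fully labeled sequences, maximum likelihood estimation reduces to counting and normalizing. A sketch over hypothetical training pairs (the deck's actual example sequences are not reproduced here):

```python
# Supervised MLE for HMM parameters: normalized counts of initial states,
# transitions, and emissions. The training pairs below are hypothetical.
from collections import Counter

def mle(pairs):
    """pairs: list of (state_sequence, observation_sequence) tuples."""
    init_c, trans_c, emit_c = Counter(), Counter(), Counter()
    for state_seq, obs_seq in pairs:
        init_c[state_seq[0]] += 1
        for a, b in zip(state_seq, state_seq[1:]):
            trans_c[(a, b)] += 1                # transition a -> b
        for s, o in zip(state_seq, obs_seq):
            emit_c[(s, o)] += 1                 # emission o from state s

    def norm(counts, key):
        """Normalize counts within each group sharing the same key."""
        tot = Counter()
        for k, v in counts.items():
            tot[key(k)] += v
        return {k: v / tot[key(k)] for k, v in counts.items()}

    n = sum(init_c.values())
    return ({k: v / n for k, v in init_c.items()},   # pi
            norm(trans_c, lambda k: k[0]),           # A: P(b | a)
            norm(emit_c, lambda k: k[0]))            # B: P(o | s)

data = [(["High", "High", "Low"], ["Dry", "Dry", "Rain"]),
        (["Low", "High"], ["Rain", "Dry"])]
pi, A, B = mle(data)
print(pi)
```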
HMM Parameter Learning
70
HMM Parameter Learning
71
HMM Parameter Learning
72
HMM Parameter Learning without fully labeled hidden sequences
We have seen the procedure for calculating the optimal parameters when the hidden state sequences are given. When they are not, we apply the principle of MLE to the observed sequences alone to refine the parameters iteratively.
HMM Parameter Learning
By repeating the Expectation and Maximization steps until convergence, we reach a local optimum. We may run the algorithm multiple times with different initializations and finally choose the set of parameters giving the highest likelihood.
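Putting the pieces together, one EM iteration for an HMM (Baum-Welch) computes the \gamma/\xi expected counts in the E-step and renormalizes them into new parameters in the M-step. A minimal sketch; the two-state model, the starting parameters, and the training sequence are all illustrative assumptions:

```python
# One Baum-Welch (EM) iteration: the E-step gets expected counts from a
# forward-backward pass; the M-step re-estimates pi, A (transitions),
# and B (emissions). All starting numbers here are illustrative assumptions.
states = ["Low", "High"]
symbols = ["Rain", "Dry"]
pi = {"Low": 0.5, "High": 0.5}
A = {i: {"Low": 0.5, "High": 0.5} for i in states}
B = {"Low": {"Rain": 0.7, "Dry": 0.3}, "High": {"Rain": 0.3, "Dry": 0.7}}

def forward(obs):
    a = [{i: pi[i] * B[i][obs[0]] for i in states}]
    for o in obs[1:]:
        prev = a[-1]
        a.append({j: B[j][o] * sum(prev[i] * A[i][j] for i in states)
                  for j in states})
    return a

def backward(obs):
    b = [{i: 1.0 for i in states}]
    for o in reversed(obs[1:]):
        nxt = b[0]
        b.insert(0, {i: sum(A[i][j] * B[j][o] * nxt[j] for j in states)
                     for i in states})
    return b

def em_step(obs):
    """Run one E+M step in place; return P(O) under the pre-update parameters."""
    global pi
    al, be = forward(obs), backward(obs)
    pO = sum(al[-1][i] for i in states)
    T = len(obs)
    gamma = [{i: al[t][i] * be[t][i] / pO for i in states} for t in range(T)]
    xi = [{(i, j): al[t][i] * A[i][j] * B[j][obs[t + 1]] * be[t + 1][j] / pO
           for i in states for j in states} for t in range(T - 1)]
    pi = dict(gamma[0])                       # new initial distribution
    for i in states:
        out = sum(gamma[t][i] for t in range(T - 1))
        for j in states:                      # expected transitions / visits
            A[i][j] = sum(xi[t][(i, j)] for t in range(T - 1)) / out
        tot = sum(gamma[t][i] for t in range(T))
        for k in symbols:                     # expected emissions of k from i
            B[i][k] = sum(gamma[t][i] for t in range(T) if obs[t] == k) / tot
    return pO

obs = ["Rain", "Rain", "Dry", "Rain", "Dry", "Dry"]
lik = [em_step(obs) for _ in range(10)]       # likelihood is non-decreasing
print(lik[0], "->", lik[-1])
```

Each call returns the data likelihood under the parameters before that update, so the sequence of returned values illustrates EM's monotone improvement.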
HMM Parameter Learning Example (without fully labeled sequences)
Thank You
• Standard HMM reference:
L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.