
Hidden Markov Models

Dr. Md. Golam Rabiul Alam


BRAC University
Markov Models
Example of Markov Model
[State diagram: two states, Rain and Dry, with self-loop probabilities 0.3 (Rain) and 0.8 (Dry), and cross transitions Rain→Dry 0.7 and Dry→Rain 0.2]
• Two states : ‘Rain’ and ‘Dry’.


• Transition probabilities:
P(‘Rain’|‘Rain’)=0.3 , P(‘Dry’|‘Rain’)=0.7 ,
P(‘Rain’|‘Dry’)=0.2, P(‘Dry’|‘Dry’)=0.8
• Initial probabilities: say P(‘Rain’)=0.4 , P(‘Dry’)=0.6 .

Calculation of sequence probability

State sequence: Dry, Dry, Rain

P(S1 = Dry, S2 = Dry, S3 = Rain)
  = P(S1 = Dry) · P(S2 = Dry | S1 = Dry) · P(S3 = Rain | S2 = Dry)
  = 0.6 × 0.8 × 0.2

A Markov model is just a Bayesian network; in this network, P(S1, S2, S3) = P(S1) · P(S2 | S1) · P(S3 | S2).
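To make the chain-rule computation concrete, here is a minimal Python sketch (not from the slides) that evaluates the same quantity for the Rain/Dry model; the `initial` and `transition` dictionaries simply restate the probabilities listed above.

```python
# Two-state Markov chain from the example above.
initial = {'Rain': 0.4, 'Dry': 0.6}
transition = {                     # transition[prev][cur] = P(cur | prev)
    'Rain': {'Rain': 0.3, 'Dry': 0.7},
    'Dry':  {'Rain': 0.2, 'Dry': 0.8},
}

def sequence_probability(states):
    """P(S1, ..., ST) of a state sequence under a first-order Markov chain."""
    p = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= transition[prev][cur]
    return p

print(sequence_probability(['Dry', 'Dry', 'Rain']))   # 0.6 * 0.8 * 0.2 = 0.096
```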
Hidden Markov Models

• Based on Markov Models


• Differences include
  – The state becomes "hidden"
  – The state information is not available; instead, there are some "observations" which are correlated with the hidden state
Hidden Markov Model

In a Markov model, the emission probability is not used.
Hidden Markov Models
HMM Example

HMM Problems

Evaluation Problem

Hidden Markov Model
Problems can be solved using HMM
1) Calculation of observation sequence probability

[HMM graph: hidden states S1 → S2 → S3, each emitting an observation O1, O2, O3; observed sequence Rain, Rain, Dry]

P(O1 = Rain, O2 = Rain, O3 = Dry) = ?


P(O1 = Rain, O2 = Rain, O3 = Dry)
  = Σ_{S3 ∈ {Low, High}} P(O1 = Rain, O2 = Rain, O3 = Dry, S3)

P(O1 = Rain, O2 = Rain, O3 = Dry, S3 = Low)  ≡ α_3^Low
P(O1 = Rain, O2 = Rain, O3 = Dry, S3 = High) ≡ α_3^High

P(O1 = Rain, O2 = Rain, O3 = Dry) = α_3^Low + α_3^High = Σ_{i ∈ {Low, High}} α_3^i

* Now the problem is how to calculate α_3^Low and α_3^High.
α_3^i = P(O1, O2, O3, S3 = i)
α_2^i = P(O1, O2, S2 = i)
α_1^i = P(O1, S1 = i)
Calculation Difficulty
Can we find some relationship between α_3^i, α_2^i and α_1^i?

If we can find the relationship, then we can:
1) Calculate α_1^i
2) Calculate α_2^i based on α_1^i
3) Calculate α_3^i based on α_2^i

Recursively!
Can we find the relationship? (Yes)
α_3^Low = P(O1, O2, O3, S3 = Low)
  = P(O1, O2, O3, S3 = Low, S2 = High) + P(O1, O2, O3, S3 = Low, S2 = Low)
  = Σ_{j ∈ {Low, High}} P(O1, O2, O3, S3 = Low, S2 = j)
  = Σ_{j ∈ {Low, High}} P(O1, O2, S2 = j) · P(O3 | S3 = Low, O1, O2, S2 = j) · P(S3 = Low | O1, O2, S2 = j)

By d-separation in the HMM graph:
P(O3 | S3 = Low, O1, O2, S2 = j) = P(O3 | S3 = Low)
P(S3 = Low | O1, O2, S2 = j) = P(S3 = Low | S2 = j)
Forward Algorithm
α_3^Low = Σ_{j ∈ {Low, High}} P(O1, O2, S2 = j) · P(O3 | S3 = Low) · P(S3 = Low | S2 = j)
  = Σ_{j ∈ {Low, High}} α_2^j · P(O3 | S3 = Low) · P(S3 = Low | S2 = j)
  = P(O3 | S3 = Low) · Σ_{j ∈ {Low, High}} α_2^j · P(S3 = Low | S2 = j)

This is dynamic programming: each α reuses the α values of the previous time step, combined with an emission probability P(O3 | S3 = Low) and a transition probability P(S3 = Low | S2 = j).
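The recursion above translates directly into code. The following Python sketch is illustrative only: the slides' actual Low/High model parameters are not in the extracted text, so the `start`, `trans` and `emit` values below are made-up placeholders.

```python
def forward(obs, states, start, trans, emit):
    """Forward algorithm: alpha[t][i] = P(O1..Ot, St = i)."""
    alpha = [{i: start[i] * emit[i][obs[0]] for i in states}]
    for t in range(1, len(obs)):
        alpha.append({
            j: emit[j][obs[t]] * sum(alpha[t - 1][i] * trans[i][j] for i in states)
            for j in states
        })
    return alpha, sum(alpha[-1].values())   # P(O1..OT) = sum_i alpha[T][i]

# Placeholder Low/High model (illustrative numbers, not from the slides).
states = ['Low', 'High']
start = {'Low': 0.5, 'High': 0.5}
trans = {'Low': {'Low': 0.7, 'High': 0.3}, 'High': {'Low': 0.4, 'High': 0.6}}
emit  = {'Low': {'Rain': 0.6, 'Dry': 0.4}, 'High': {'Rain': 0.1, 'Dry': 0.9}}

alpha, likelihood = forward(['Rain', 'Rain', 'Dry'], states, start, trans, emit)
print(likelihood)   # P(O1 = Rain, O2 = Rain, O3 = Dry)
```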
HMM Example

Decoding Problem 1
Problems can be Solved by HMM

• Decoding problem 1

[HMM graph: hidden states S1 → S2 → S3 → S4, each emitting an observation O1, O2, O3, O4]

P(O1, O2, O3, O4, S3 = High) = ?
P(O1, O2, O3, O4, S3 = High)
  = P(O1, O2, O3, S3 = High) · P(O4 | O1, O2, O3, S3 = High)
  = P(O1, O2, O3, S3 = High) · P(O4 | S3 = High)
  = α_3^High · β_3^High

We already know how to calculate the first factor, α_3^High; the question is how to calculate β_3^High.
Calculation Difficulty

[Three HMM graphs over states S1, ..., S6 and observations O1, ..., O6, illustrating the direct computation of β_3^High, β_4^High and β_5^High]
Can we find some relationship among β_3^i, β_4^i, β_5^i and β_6^i?

If we can find the relationship, then we can:
1) Calculate β_6^i = 1
2) Calculate β_5^i based on β_6^i
3) Calculate β_4^i based on β_5^i
4) Calculate β_3^i based on β_4^i

Recursively!
Can we find the relationship? (Yes)
β_3^High = P(O4, O5, O6 | S3 = High)
  = Σ_{j ∈ {Low, High}} P(O4, O5, O6, S4 = j | S3 = High)
  = Σ_{j ∈ {Low, High}} P(O5, O6 | S4 = j, O4, S3 = High) · P(O4 | S4 = j, S3 = High) · P(S4 = j | S3 = High)

By d-separation in the HMM graph:
P(O5, O6 | S4 = j, O4, S3 = High) = P(O5, O6 | S4 = j) = β_4^j
P(O4 | S4 = j, S3 = High) = P(O4 | S4 = j)
Backward Algorithm

β_3^High = Σ_{j ∈ {Low, High}} β_4^j · P(O4 | S4 = j) · P(S4 = j | S3 = High)

where P(O4 | S4 = j) is an emission probability and P(S4 = j | S3 = High) is a transition probability.
Backward Probability

Backward Algorithm
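A matching sketch of the backward recursion, written against the same hypothetical Low/High model used in the forward sketch above:

```python
def backward(obs, states, trans, emit):
    """Backward algorithm: beta[t][i] = P(O(t+1)..OT | St = i)."""
    T = len(obs)
    beta = [dict() for _ in range(T)]
    beta[T - 1] = {i: 1.0 for i in states}        # beta_T^i = 1
    for t in range(T - 2, -1, -1):
        beta[t] = {
            i: sum(beta[t + 1][j] * emit[j][obs[t + 1]] * trans[i][j]
                   for j in states)
            for i in states
        }
    return beta

# Together with the forward pass: P(O1..OT, St = i) = alpha[t][i] * beta[t][i],
# which is exactly the alpha * beta decomposition used in Decoding Problem 1.
```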
Decoding Problem 2

[HMM graph: hidden states S1 → S2 → S3, each emitting an observation O1, O2, O3; observed sequence Rain, Rain, Dry]

O1, O2, O3 are known; what is the most probable state sequence S1, S2, S3?

For example: {High, High, Low}, {Low, High, Low}, {Low, High, High}, ...
argmax_{S1, S2, S3} P(S1, S2, S3 | O1, O2, O3)
  = argmax_{S1, S2, S3} P(S1, S2, S3, O1, O2, O3)
  = argmax_k max_{S1, S2} P(S3 = k, S1, S2, O1, O2, O3)

V_3^k = max_{S1, S2} P(S3 = k, S1, S2, O1, O2, O3) is the probability of the most likely sequence of states ending at state S3 = k.
V_3^k = max_{S1, S2} P(S3 = k, S2, S1, O1, O2, O3)
  = max_i max_{S1} P(S3 = k, S2 = i, S1, O1, O2, O3)
  = max_i max_{S1} P(S2 = i, S1, O1, O2) · P(S3 = k, O3 | S2 = i, S1, O1, O2)
  = max_i max_{S1} P(S2 = i, S1, O1, O2) · P(O3 | S3 = k, S2 = i, S1, O1, O2) · P(S3 = k | S2 = i, S1, O1, O2)

By d-separation: P(O3 | S3 = k, S2 = i, S1, O1, O2) = P(O3 | S3 = k) and P(S3 = k | S2 = i, S1, O1, O2) = P(S3 = k | S2 = i). Hence

V_3^k = max_i V_2^i · P(O3 | S3 = k) · P(S3 = k | S2 = i)
      = P(O3 | S3 = k) · max_i P(S3 = k | S2 = i) · V_2^i
Viterbi algorithm

V_3^k = max_i V_2^i · P(O3 | S3 = k) · P(S3 = k | S2 = i) = P(O3 | S3 = k) · max_i P(S3 = k | S2 = i) · V_2^i

V_2^k = P(O2 | S2 = k) · max_i P(S2 = k | S1 = i) · V_1^i
Viterbi algorithm

HMM

What is the most likely sequence of health status for the observation sequence [Normal, Cold, Dizzy]?
Viterbi Algorithm

The most likely sequence of health status is [Healthy, Healthy, Fever] for the observation sequence [Normal, Cold, Dizzy].
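For reference, a compact Viterbi implementation in Python. The Healthy/Fever parameters below are assumed for illustration (the slides' actual numbers are in figures that did not survive extraction); with these assumed values the decoder does return [Healthy, Healthy, Fever] for [Normal, Cold, Dizzy].

```python
def viterbi(obs, states, start, trans, emit):
    """Viterbi: V[t][k] = probability of the best state path ending in state k at time t."""
    V = [{k: start[k] * emit[k][obs[0]] for k in states}]
    back = []
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for k in states:
            prev, p = max(((i, V[t - 1][i] * trans[i][k]) for i in states),
                          key=lambda x: x[1])
            V[t][k] = emit[k][obs[t]] * p
            back[t - 1][k] = prev
    # Backtrack from the best final state.
    path = [max(states, key=lambda k: V[-1][k])]
    for ptr in reversed(back):
        path.insert(0, ptr[path[0]])
    return path

states = ['Healthy', 'Fever']
start = {'Healthy': 0.6, 'Fever': 0.4}                            # assumed
trans = {'Healthy': {'Healthy': 0.7, 'Fever': 0.3},               # assumed
         'Fever':   {'Healthy': 0.4, 'Fever': 0.6}}
emit  = {'Healthy': {'Normal': 0.5, 'Cold': 0.4, 'Dizzy': 0.1},   # assumed
         'Fever':   {'Normal': 0.1, 'Cold': 0.3, 'Dizzy': 0.6}}

print(viterbi(['Normal', 'Cold', 'Dizzy'], states, start, trans, emit))
# -> ['Healthy', 'Healthy', 'Fever']
```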
Expectation Maximization (EM)
Expected number of transitions from state i to state j at time t.

Expected number of times of being in state i at time t.
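These two quantities are the E-step statistics of the EM procedure. A sketch of how they can be obtained from the forward and backward probabilities, reusing the `forward`/`backward` helpers (and hence the same hypothetical parameters) sketched earlier:

```python
def expected_counts(obs, states, trans, emit, alpha, beta, likelihood):
    """E-step statistics.
    gamma[t][i]  = P(St = i | O)            -- expected count of being in state i at time t
    xi[t][i][j]  = P(St = i, St+1 = j | O)  -- expected count of an i -> j transition at time t
    """
    T = len(obs)
    gamma = [{i: alpha[t][i] * beta[t][i] / likelihood for i in states}
             for t in range(T)]
    xi = [{i: {j: alpha[t][i] * trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j] / likelihood
               for j in states}
           for i in states}
          for t in range(T - 1)]
    return gamma, xi
```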
Example

Observations: R, D

Training data: five labeled examples, each a (state sequence, observation sequence) pair — state sequence 1 with obs sequence 1, ..., state sequence 5 with obs sequence 5.
Example

Expected number of transitions from state i to state j at time t.

Expected number of times of being in state i at time t.
HMM Parameter Learning

HMM Parameter Learning (Given fully labeled sequences)
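The slides for this part are figures, but the underlying idea is ordinary maximum-likelihood estimation by counting. A minimal sketch, assuming the training data is given as fully labeled (state sequence, observation sequence) pairs such as the example above:

```python
from collections import Counter, defaultdict

def mle_from_labeled(pairs):
    """MLE of HMM parameters when the hidden state sequences are fully observed.
    `pairs` is a list of (state_sequence, observation_sequence) tuples."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for states, obs in pairs:
        start[states[0]] += 1                      # initial-state counts
        for s, o in zip(states, obs):
            emit[s][o] += 1                        # emission counts
        for s, s_next in zip(states, states[1:]):
            trans[s][s_next] += 1                  # transition counts
    normalize = lambda c: {k: v / sum(c.values()) for k, v in c.items()}
    return (normalize(start),
            {s: normalize(c) for s, c in trans.items()},
            {s: normalize(c) for s, c in emit.items()})
```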
HMM Parameter Learning without fully
labeled hidden sequences

We have seen the procedure to calculate the optimal parameters given the
hidden state sequence.

However, it is common that the hidden state sequence is unknown. In such a case, we first try to "estimate" the "expected" state sequence based on some initial estimates of the parameters.

Then, we use the principles of MLE, as if this expected state sequence had been observed, to refine the parameters.

We apply these two steps iteratively, via an algorithm called Expectation-Maximization.
HMM Parameter Learning
HMM Parameter Learning

By repeating the Expectation and Maximization steps till convergence, we reach a local optimum. We may run the algorithm multiple times with different initializations and finally choose the set of parameters giving the highest likelihood.
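Putting the pieces together, a sketch of one EM (Baum-Welch) iteration for a single observation sequence, built from the `forward`, `backward` and `expected_counts` helpers sketched earlier (the parameters remain illustrative, not slide-supplied):

```python
def baum_welch_step(obs, states, start, trans, emit):
    """One Expectation-Maximization iteration for a single observation sequence."""
    # E-step: expected counts under the current parameters.
    alpha, likelihood = forward(obs, states, start, trans, emit)
    beta = backward(obs, states, trans, emit)
    gamma, xi = expected_counts(obs, states, trans, emit, alpha, beta, likelihood)

    # M-step: re-estimate the parameters from the expected counts.
    T = len(obs)
    new_start = {i: gamma[0][i] for i in states}
    new_trans = {i: {j: sum(xi[t][i][j] for t in range(T - 1)) /
                        sum(gamma[t][i] for t in range(T - 1))
                     for j in states}
                 for i in states}
    new_emit = {i: {o: sum(gamma[t][i] for t in range(T) if obs[t] == o) /
                       sum(gamma[t][i] for t in range(T))
                    for o in set(obs)}
                for i in states}
    return new_start, new_trans, new_emit, likelihood

# Iterate until the likelihood stops improving; restart from several random
# initializations and keep the parameters with the highest final likelihood.
```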
HMM Parameter Learning Example (without fully labeled sequences)

• Assume we have the observations for a single example in our training set from the Fair and Biased coin HMM, like the following:

We wish to compute the parameters using the EM algorithm. Assume that K = 2: Fair coin and Biased coin.
HMM Parameter Learning (Example)
Thank You
• Standard HMM reference:
L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.

• Excellent reference for Dynamic Bayes Nets as a unifying framework for probabilistic temporal models (including HMMs and Kalman filters):
Chapter 15 of Artificial Intelligence: A Modern Approach, 2nd Edition, by Russell & Norvig.
