0% found this document useful (0 votes)

145 views14 pages

Tutorial HMM CI

Hidden Markov models (HMMs) are statistical Markov models where the system being modeled is assumed to be a Markov process with unobserved (hidden) states. HMMs can be used to characterize weather patterns, with weather states as the hidden variables and umbrella-carrying behavior as observed variables. The likelihood of weather state sequences can be calculated based on the observed umbrella sequence using Bayes' rule and the Markov assumption. HMMs allow making probabilistic inferences about the hidden weather states based on the observed umbrella data.

Uploaded by

Trương Tiểu Phàm

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

145 views14 pages

Tutorial HMM CI

Uploaded by

Trương Tiểu Phàm

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 14

Hidden Markov Models

A Tutorial for the Course Computational Intelligence

http://www.igi.tugraz.at/lehre/CI

Barbara Resch (modified by Erhard and Car line Rank)

Signal Processing and Speech Communication Laboratory
Inffeldgasse 16c/II
phone 8734436

Abstract
This tutorial gives a gentle introduction to Markov models and hidden Markov models (HMMs) and relates
them to their use in automatic speech recognition. Additionally, the Viterbi algorithm is considered,
relating the most likely state sequence of a HMM to a given sequence of observations.
Usage
To make full use of this tutorial you should
1. Download the file HMM.zip1 which contains this tutorial and the accompanying Matlab programs.
2. Unzip HMM.zip which will generate a subdirectory named HMM/matlab where you can find all the
Matlab programs.
3. Add the folder HMM/matlab and the subfolders to the Matlab search path with a command
like addpath(C:\Work\HMM\matlab) if you are using a Windows machine or addpath(/home/jack/HMM/matlab) if you are using a Unix/Linux machine.
Sources
This tutorial is based on
Markov Models and Hidden Markov Models - A Brief Tutorial International Computer Science
Institute Technical Report TR-98-041, by Eric Fosler-Lussier,
EPFL lab notes Introduction to Hidden Markov Models by Herve Bourlard, Sacha Krstulovic,
and Mathew Magimai-Doss, and
HMM-Toolbox (also included in BayesNet Toolbox) for Matlab by Kevin Murphy.

Markov Models

Lets talk about the weather. Lets say in Graz, there are three types of weather: sunny , rainy , and
foggy . Lets assume for the moment that the weather lasts all day, i.e., it doesnt change from rainy
to sunny in the middle of the day.
Weather prediction is about trying to guess what the weather will be like tomorrow based on the observations of the weather in the past (the history). Lets set up a statistical model for weather prediction:
We collect statistics on what the weather qn is like today (on day n) depending on what the weather
1 http://www.igi.tugraz.at/lehre/CI/tutorials/HMM.zip

was like yesterday qn1 , the day before qn2 , and so forth. We want to find the following conditional
probabilities
P (qn |qn1 , qn2 , ..., q1 ),
(1)
meaning, the probability of the unknown weather at day n, qn { , , }, depending on the (known)
weather qn1 , qn2 , . . . of the past days.
Using the probability in eq. 1, we can make probabilistic predictions of the type of weather for
tomorrow and the next days using the observations of the weather history. For example, if we knew that
the weather for the past three days was { , , } in chronological order, the probability that tomorrow
would be is given by:
P (q4 = |q3 = , q2 = , q1 = ).
(2)
This probability could be inferred from the relative frequency (the statistics) of past observations of
weather sequences { , ,
}.
Heres one problem: the larger n is, the more observations we must collect. Suppose that n = 6,
then we have to collect statistics for 3(61) = 243 past histories. Therefore, we will make a simplifying
assumption, called the Markov assumption:
For a sequence {q1 , q2 , ..., qn }:
P (qn |qn1 , qn2 , ..., q1 ) = P (qn |qn1 ).

(3)

This is called a first-order Markov assumption: we say that the probability of a certain observation at
time n only depends on the observation qn1 at time n 1. (A second-order Markov assumption would
have the probability of an observation at time n depend on qn1 and qn2 . In general, when people
talk about a Markov assumption, they usually mean the first-order Markov assumption.) A system for
which eq. 3 is true is a (first-order) Markov model, and an output sequence {qi } of such a system is a
(first-order) Markov chain.
We can also express the probability of a certain sequence {q1 , q2 , . . . , qn } (the joint probability of
certain past and current observations) using the Markov assumption:2
P (q1 , ..., qn ) =

n
Y

P (qi |qi1 ).

(4)

i=1

The Markov assumption has a profound affect on the number of histories that we have to find statistics
for we now only need 3 3 = 9 numbers (P (qn |qn1 ) for every possible combination of qn , qn1
{ , , }) to characterize the probabilities of all possible sequences. The Markov assumption may or
may not be a valid assumption depending on the situation (in the case of weather, its probably not
valid), but it is often used to simplify modeling.
So lets arbitrarily pick some numbers for P (qtomorrow |qtoday ), as given in table 1 (note, that whatever
the weather is today there certainly is some kind of weather tomorrow, so the probabilities in every
row of table 1 sum up to one).
Tomorrows weather
Todays weather
0.8

0.05

0.15

0.2

0.6

0.2

0.3

0.5

Table 1: Probabilities p(qn+1 |qn ) of tomorrows weather based on todays weather

For first-order Markov models, we can use these probabilities to draw a probabilistic finite state
automaton. For the weather domain, we would have three states, S = { , , }, and every day there
would be a possibility p(qn |qn1 ) of a transition to a (possibly different) state according to the probabilities
in table 1. Such an automaton would look like shown in figure 1.
2 One question that comes to mind is What is q ? In general, one can think of q as the start word, so P (q |q )
0
0
1 0
is
the
Qn probability that q1 can start a sentence. We can also just multiply a prior probability of q1 with the product of
i=2 P (qi |qi1 ) , its just a matter of definitions.

0.8

0.5
0.15

Sunny

Foggy
0.2
0.05

0.3

0.2

Rainy

0.6
Figure 1: Markov model for the Graz weather with state transition probabilities according to table 1
1.0.1

Examples

1. Given that today the weather is , whats the probability that tomorrow is
and the day after is
?
Using the Markov assumption and the probabilities in table 1, this translates into:
P (q2 =

, q3 =

|q1 =

) = P (q3 = |q2 =
= P (q3 = |q2 =
= 0.05 0.8
= 0.04

, q1 = ) P (q2 = |q1 = )
) P (q2 = |q1 = )
(Markov assumption)

You can also think about this as moving through the automaton (figure 1), multiplying the probabilities along the path you go.
2. Assume, the weather yesterday was q1 =
tomorrow it will be q3 = ?
P (q3 =

|q2 =

, q1 =

, and today it is q2 =

) = P (q3 =
= 0.2.

|q2 =

, what is the probability that

(Markov assumption)

3. Given that the weather today is q1 = , what is the probability that it will be
two days from
now: q3 = . (Hint: There are several ways to get from
today to
two days from now. You
have to sum over these paths.)

Hidden Markov Models (HMMs)

So far we heard of the Markov assumption and Markov models. So, what is a Hidden Markov Model ?
Well, suppose you were locked in a room for several days, and you were asked about the weather outside.
The only piece of evidence you have is whether the person who comes into the room bringing your daily
meal is carrying an umbrella ( ) or not ( ).
Lets suppose the probabilities shown in table 2: The probability that your caretaker carries an
umbrella is 0.1 if the weather is sunny, 0.8 if it is actually raining, and 0.3 if it is foggy.
The equation for the weather Markov process before you were locked in the room was (eq. 4):
P (q1 , ..., qn ) =

n
Y
i=1

P (qi |qi1 ).

Weather

Probability of umbrella

Sunny

0.1

Rainy

0.8

Foggy

0.3

Table 2: Probability P (xi |qi ) of carrying an umbrella (xi = true) based on the weather qi on some day i

However, the actual weather is hidden from you. Finding the probability of a certain weather qi
{ , , } can only be based on the observation xi , with xi = , if your caretaker brought an umbrella
on day i, and xi =
if the caretaker did not bring an umbrella. This conditional probability P (qi |xi )
can be rewritten according to Bayes rule:
P (qi |xi ) =

P (xi |qi )P (qi )

,
P (xi )

or, for n days, and weather sequence Q = {q1 , . . . , qn }, as well as umbrella sequence X = {x1 , , . . . , xn }
P (q1 , . . . , qn |x1 , . . . , xn ) =

P (x1 , . . . , xn |q1 , . . . , qn )P (q1 , . . . , qn )

,
P (x1 , . . . , xn )

using the probability P (q1 , . . . , qn ) of a Markov weather sequence from above, and the a-priori probability
P (x1 , . . . , xn ) of seeing a particular sequenceQof umbrella events (e.g., { , , }). The probability
n
P (x1 , . . . , xn |q1 , . . . , qn ) can be estimated as i=1 P (xi |qi ), if you assume that, for all i, the qi , xi are
independent of all xj and qj , for all j 6= i.
We want to draw conclusions from our observations (if the persons carries an umbrella or not) about
the weather outside. We can therefore omit the a-priori probability of seeing an umbrella P (x1 , . . . , xn )
as it is independent of the weather, that we like to predict. We get a measure for the probability, which
is proportional to the probability, and which we will refer as the likelihood L.
P (q1 , . . . , qn |x1 , . . . , xn )
L(q1 , . . . , qn |x1 , . . . , xn ) = P (x1 , . . . , xn |q1 , . . . , qn ) P (q1 , . . . , qn )

(5)

With our (first order) Markov assumption it turns to:

P (q1 , . . . , qn |x1 , . . . , xn )
L(q1 , . . . , qn |x1 , . . . , xn ) =

n
Y
i=1

2.0.1

P (xi |qi )

n
Y

(6)

P (qi |qi1 )

i=1

Examples

1. Suppose the day you were locked in it was sunny. The next day, the caretaker carried an umbrella
into the room. You would like to know, what the weather was like on this second day.
First we calculate the likelihood for the second day to be sunny:
L(q2 =

|q1 =

, x2 =

) = P (x2 = |q2 =
= 0.1 0.8 = 0.08,

) P (q2 =

|q1 =

) = P (x2 = |q2 = ) P (q2 =

= 0.8 0.05 = 0.04,

|q1 =

then for the second day to be rainy:

L(q2 =

|q1 =

, x2 =

and finally for the second day to be foggy:

L(q2 =

|q1 =

, x2 =

) = P (x2 = |q2 = ) P (q2 =

= 0.3 0.15 = 0.045.

|q1 =

Thus, although the caretaker did carry an umbrella, it is most likely that on the second day the
weather was sunny.
4

2. Suppose you do not know how the weather was when your were locked in. The following three days
the caretaker always comes without an umbrella. Calculate the likelihood for the weather on these
three days to have been {q1 = , q2 = , q3 = }. As you do not know how the weather is on
the first day, you assume the 3 weather situations are equi-probable on this day (cf. footnote on
page 2), and the prior probability for sun on day one is therefore P (q1 = |q0 ) = P (q1 = ) = 1/3.
L(q1 =

, q2 =

, q3 = |x1 = , x2 = , x3 = ) =
P (x1 = |q1 = ) P (x2 = |q2 = ) P (x3 = |q3 = )
P (q1 = ) P (q2 = |q1 = ) P (q3 = |q2 = ) =
0.9 0.7 0.9 1/3 0.15 0.2 = 0.0057
(7)

2.1

HMM Terminology

A HMM Model is specified by:

- The set of states S = {s1 , s2 , . . . , sNs }, (corresponding to the three possible weather conditions
above),
and a set of parameters = {, A, B}:
- The prior probabilities i = P (q1 = si ) are the probabilities of si being the first state of a state
sequence. Collected in a vector . (The prior probabilities were assumed equi-probable in the last
example, i = 1/Ns .)
- The transition probabilities are the probabilities to go from state i to state j: ai,j = P (qn+1 =
sj |qn = si ). They are collected in the matrix A.
- The emission probabilities characterize the likelihood of a certain observation x, if the model is in
state si . Depending on the kind of observation x we have:
- for discrete observations, xn {v1 , . . . , vK }: bi,k = P (xn = vk |qn = si ), the probabilities to
observe vk if the current state is qn = si . The numbers bi,k can be collected in a matrix B.
(This would be the case for the weather model, with K = 2 possible observations v1 =
and
v2 = .)
- for continuous valued observations, e.g., xn RD : A set of functions bi (xn ) = p(xn |qn = si )
describing the probability densities (probability density functions, pdfs) over the observation
space for the system being in state si . Collected in the vector B(x) of functions. Emission
pdfs are often parametrized, e.g, by mixtures of Gaussians.
The operation of a HMM is characterized by
- The (hidden) state sequence Q = {q1 , q2 , . . . , qN },
from day 1 to N ).

qn S, (the sequence of the weather conditions

- The observation sequence X = {x1 , x2 , . . . , xN }.

A HMM allowing for transitions from any emitting state to any other emitting state is called an
ergodic HMM. The other extreme, a HMM where the transitions only go from one state to itself or to a
unique follower is called a left-right HMM.
Useful formula:
- Probability of a state sequence: the probability of a state sequence Q = {q1 , q2 , . . . , qN } coming
from a HMM with parameters corresponds to the product of the transition probabilities from
one state to the following:
P (Q|) = q1

N
1
Y

aqn ,qn+1 = q1 aq1 ,q2 aq2 ,q3 . . . aqN 1 ,qN

n=1

(8)

- Likelihood of an observation sequence given a state sequence, or likelihood of an observation sequence

along a single path: given an observation sequence X = {x1 , x2 , . . . , xN } and a state sequence Q =
{q1 , q2 , . . . , qN } (of the same length) determined from a HMM with parameters , the likelihood
of X along the path Q is equal to:
P (X|Q, ) =

N
Y

p(xn |qn , ) = bq1 ,x1 bq2 ,x2 . . . bqN ,xN

(9)

n=1

i.e., it is the product of the emission probabilities computed along the considered path.
- Joint likelihood of an observation sequence X and a path Q: it is the probability that X and Q occur
simultaneously, p(X, Q|), and decomposes into a product of the two quantities defined previously:
p(X, Q|) = p(X|Q, ) P (Q|)

(Bayes)

(10)

- Likelihood of a sequence with respect to a HMM: the likelihood of an observation sequence X =

{x1 , x2 , . . . , xN } with respect to a Hidden Markov Model with parameters expands as follows:
X
p(X|) =
p(X, Q|)
all Q

i.e., it is the sum of the joint likelihoods of the sequence over all possible state sequences Q allowed
by the model.

2.2

Trellis Diagram

A trellis diagram can be used to visualize likelihood calculations of HMMs. Figure 2 shows such a diagram
for a HMM with 3 states.
b1,k

b1,k

a1,1

b1,k

..........

State 1

..........

a1,2
b2,k

b2,k

a1,3

State 2

..........

b3,k

..........

State 3
Sequence:

b2,k

..........

n=1

n=2

..........

n=i

n=N

time

Figure 2: Trellis diagram

Each column in the trellis shows the possible states of the weather at a certain time n. Each state
in one column is connected to each state in the adjacent columns by the transition likelihood given by
the the elements ai,j of the transition matrix A (shown for state 1 at time 1 in figure 2). At the bottom
is the observation sequence X = {x1 , . . . , xN }. bi,k is the likelihood of the observation xn = vk in state
qn = si at time n.
Figure 3 shows the trellis diagram for Example 2 in sect. 2.0.1. The likelihood of the state sequence
given the observation sequence can be found by simply following the path in the trellis diagram, multiplying the observation and transition likelihoods.
L= b

= 1/3 0.9 0.15 0.7 0.2 0.9

(11)
(12)

= 0.9
,

= 0.15

= 0.2
,

STATES

= 0.7
,

Sequence:

x1 =

x2 =

n=1

n=2

x3 =

n=3

time

Figure 3: Trellis diagram for Example 2 in sect. 2.0.1

2.3
2.3.1

Generating samples from Hidden Markov Models

Question

Find the parameter set = {, A, B} that describes the weather HMM. Lets suppose that the prior
probabilities are the same for each state.
2.3.2

Experiment

Lets generate a sample sequence X coming from our weather HMM. First you have to specify the
parameter set = {, A, B} that describes the model.
>> %Choose the prior probabilities as you like: E.g., first day sunny
>> Pi w=[1 0 0]
The transition matrix is given in table 1:
>> A w = [0.8 0.05 0.15; 0.2 0.6 0.2; 0.2 0.3 0.5]
As the observation probabilities in this case are discrete probabilities, they can be saved in a matrix:
>> B w = [0.1 0.9; 0.8 0.2; 0.3 0.7]
In this observation matrix B, using the numbers from table 2, the rows correspond to the states and the
columns to our 2 discrete observations, i.e., b1,1 = p(umbrella = ) and b1,2 = p(umbrella = ) is the
probability of seeing an umbrella, resp. not seeing an umbrella, if the weather is in state s1 : qn = .
Use the function sample dhmm to do several draws with the weather model. View the resulting samples
and state sequences with the help of the function plotseq w. In the matrix containing the data, 1 stands
for umbrella= and 2 for umbrella= .
>> % drawing 1 sample of length 4 (4 days)
>> [data,hidden] = sample dhmm(Pi w,A w,B w,1,4)
>> plotseq w(data,hidden)

2.3.3

Task

Write a Matlab function prob path(), that computes the joint likelihood for a given observation sequence and an assumed path:
>> prob = prob path(data,path,Pi w,A w,B w)
Input arguments are the observed sequence data, the assumed path path, and the weather HMM parameters (prior probabilities Pi w, transition matrix A w, and emission matrix B w). Use the function obslik
= mk dhmm obs lik(data,obsmat) to produce an observation likelihood matrix where the entries correspond to bi,j in a trellis diagram (see figures 2 and 3). Type help mk dhmm obs lik for more detailed
information.
Produce same random sequences, plot them and calculate their likelihood.

You can test your function to make the likelihood calculation for the 2. Example from section 2.0.1
described above.(Input parameters: data=[2 2 2];path=[1 3 1];prior=[1/3 1/3 1/3].)
>> prob = prob path([2 2 2],[1 3 1], [1/3 1/3 1/3], A w, B w)

2.4

HMMs for speech recognition

Lets forget about the weather and umbrellas for a moment and talk about speech. In automatic speech
given some acoustic input, or:
recognition, the task is to find the most likely sequence of words W
= arg max P (W |X).
W

(13)

W W

Here, X = {x1 , x2 , . . . , xN } is the sequence of acoustic vectors or feature vectors that are
as the sequence of words W (out of all possible
extracted from the speech signal, and we want to find W
word sequences W), that maximizes P (W |X). To compare this to the weather example, the acoustic
feature vectors are our observations, corresponding to the umbrella observations, and the word sequence
corresponds to the weather on successive days (a day corresponds to about 10 ms), i. e., to the hidden
state sequence of a HMM for speech production.
Words are made of ordered sequences of phonemes: /h/ is followed by /e/ and then by /l/ and /O/
in the word hello. This structure can be adequately modeled by a left-right HMM, where each state
corresponds to a phone. Each phoneme can be considered to produce typical feature values according to
a particular probability density (possibly Gaussian) (Note, that the observed feature values xi now are
d-dimensional vectors and continuous valued, xi Rd , and no more discrete values, as for the weather
model where xi could only be true or false: xi { , }!).
In real world speech recognition, the phonemes themselves are often modeled as left-right HMMs
(e.g., to model separately the transition part at the begin of the phoneme, then the stationary part, and
finally the transition at the end). Words are then represented by large HMMs made of concatenations of
smaller phonetic HMMs.
Values used throughout the following experiments:
In the following experiments we will work with HMMs where the observations are drawn from a Gaussian
probability distribution. Instead of an observation matrix (as in the weather example) the observations
of each state are described by the mean value and the variance of a Gaussian density.
The following 2-dimensional Gaussian pdfs will be used to model simulated observations of the vowels
/a/, /i/, and /y/3 . The observed features are the first two formants (maxima of the spectral envelope),
which are characteristic for the vowel identity, e.g., for the vowel /a/ the formants typically occur around
frequencies 730 Hz and 1090 Hz.
"
#
"
#
730
1625
5300
State s1 (for /a/): N/a/ : /a/ =
, /a/ =
1090
5300 53300
"
State s2 (for /i/):

N/i/ :

/i/ =

440

"
,

/i/ =
"

2525

1200

36125

8000

8400

, /y/ =
1020
8400 18500
(Those densities have been used in the previous lab session.) The Gaussian pdfs now take the role of the
emission matrix B in the hidden Markov models for recognition of the three vowels /a/, /i/, and /y/.
The parameters of the densities and of the Markov models are stored in the file data.mat (Use: load
data). A Markov model named, e.g., hmm1 is stored as an object with fields hmm1.means, hmm1.vars,
hmm1.trans,hmm1.pi. The means fields contains a matrix of mean vectors, where each column of the
matrix corresponds to one state si of the HMM (e.g., to access the means of the second state of hmm1 use:
hmm1.means(:,2); the vars field contains a 3 dimensional array of variance matrices, where the third
3 Phonetic

/y/ =

2290
"

State s3 (for /y/): N/y/ :

270

symbol for the usual pronunciation of

b1,k

....

State 1

b2,k

b3,k

n=1

n=2

n=3

....

State 2

....

State 3

Sequence:

b1,k

....
b2,k

b3,k

....
....

b2,k

b3,k

n=i

n=N

time

Figure 4:
dimension corresponds to the state (e.g to access /a/ for state 1 use hmm1.vars(:,:,1); the trans field
contains the transition matrix, and the pi field the prior probabilities.
2.4.1

Task

Load the HMMs and make a sketch of each of the models with the transition probabilities.
2.4.2

Experiment

Generate samples using the HMMs and plot them with plotseq and plotseq2.
Use the functions plotseq and plotseq2 to plot the obtained 2-dimensional data. In the resulting views,
the obtained sequences are represented by a yellow line where each point is overlaid with a colored dot.
The different colors indicate the state from which any particular point has been drawn.
>> %Example:generate sample from HMM1 of length N
>> [X,ST]=sample ghmm(hmm1,N)
>> plotseq(X,ST) % View of both dimensions as separate sequences
>> plotseq2(X,ST,hmm1) %2D view with location of gaussian states
Draw several samples with the same parameters and compare. Compare the Matlab figures with
your sketch of the model.
What is the effect of the different transition matrices of the HMMs on the sequences obtained during the
current experiment? Hence, what is the role of the transition probabilities in the HMM?

Pattern Recognition with HMM

In equation 10 we expressed the joint likelihood p(X, Q|) of an observation sequence X and a path Q
given a model with parameters .
The likelihood of a sequence with respect to a HMM (the likelihood of an observation sequence X =
{x1 , x2 , , xN } for a given hidden Markov model with parameters ) expands as follows:
X
p(X|) =
p(X, Q|),
(14)
every possible Q

i.e., it is the sum of the joint likelihoods of the sequence over all possible state sequences allowed by the
model (see Trellis diagram in figure 3).
Calculating the likelihood in this manner is computationally expensive, particularly for large models
or long sequences. It can be done with a recursive algorithm (forward-backward algorithm), which reduces
the complexity of the problem. For more information about this algorithm see [1]. It is very common
using log-likelihoods and log-probabilities, computing log p(X|) instead of p(X|).

3.0.1

Experiment

Classify the sequences X1 , X2 , X3 , X4 , given in the file Xdata.mat, in a maximum likelihood sense with
respect to the four Markov models. Use the function loglik ghmm to compute the likelihood of a sequence
with respect to a HMM. Store the results in a matrix (they will be used in the next section).
>> load Xdata
>> % Example:
>> logProb(1,1) = loglik ghmm(X1,hmm1)
>> logProb(1,2) = loglik ghmm(X1,hmm2)
etc.
>> logProb(3,2) = loglik ghmm(X3,hmm2)
etc.
Instead of typing these commands for every combination of the four sequences and models, filling the
logProb matrix can be done automatically with the help of loops, using a command string composed of
fixed strings and strings containing the number of sequence/model:
>> for i=1:4,
for j=1:4,
stri = num2str(i);
strj = num2str(j);
cmd = [logProb(,stri,,,strj,) = loglik_ghmm(X,stri,,hmm,strj,);]
eval(cmd);
end;
end;
You can find the maximum value of each row i of the matrix, giving the index of the most likely model
for sequence Xi , with the Matlab function max:
for i=1:4;
[v,index]=max(logProb(i,:));
disp ([X,num2str(i), -> HMM,num2str(index)]);
end

Sequence

log p(Xi |1 )

log p(Xi |2 )

log p(Xi |3 )

log p(Xi |4 )

Most likely
model

Optimal state sequence

In speech recognition and several other pattern recognition applications, it is useful to associate an
optimal sequence of states to a sequence of observations, given the parameters of a model. For instance,
in the case of speech recognition, knowing which frames of features belong to which state allows to
locate the word boundaries across time. This is called alignment of acoustic feature sequences.
A reasonable optimality criterion consists of choosing the state sequence (or path) that has the
maximum likelihood with respect to a given model. This sequence can be determined recursively via the
Viterbi algorithm.
This algorithm makes use of two variables:
n (i) is the highest likelihood of a single path among all the paths ending in state i at time n:
n (i) =

max

q1 ,q2 ,...,qn1

p(q1 , q2 , . . . , qn1 , q n = qi , x1 , x2 , . . . xn |)

(15)

a variable n (i) which allows to keep track of the best path ending in state i at time n:
n (i) = arg max p(q1 , q2 , . . . , qn1 , q n = qi , x1 , x2 , . . . xn |)

(16)

q1 ,q2 ,...,qn1

The idea of the Viterbi algorithm is to find the most probable path for each intermediate and finally
for the terminating state in the trellis. At each time n only the most likely path leading to each state si
survives.

The Viterbi Algorithm

for a HMM with Ns states.
1. Initialization
1 (i) = i bi,x1 ,
1 (i) = 0

i = 1, . . . , Ns

(17)

where i is the prior probability of being in state si at time n = 1.

2. Recursion
n (j) = max (n1 (i) aij ) bj,xn ,
1iNs

n (j) = arg max (n1 (i) aij ) ,

1iNs

2nN
1 j Ns

(18)

2nN
1 j Ns

Optimal policy is composed of optimal sub-policies: find the path that leads to a maximum likelihood considering the best likelihood at the previous step and the transitions from it; then multiply
by the current likelihood given the current state. Hence, the best path is found by induction.
3. Termination
p (X|) = max N (i)
1iNs

(19)

= arg max N (i)

1iNs

Find the best likelihood when the end of the observation sequence t = T is reached.
4. Backtracking

Q = {q1 , . . . , qN
} so that qn = n+1 (qn+1
),

n = N 1, N 2, . . . , 1

(20)

Read (decode) the best sequence of states from the n vectors.

4.0.1

Example

Lets get back to our weather HMM. You dont know how the weather was when your were locked in. On
the first 3 days your umbrella observations are: {no umbrella, umbrella, umbrella} ({ , , }). Find
the most probable weather-sequence using the Viterbi algorithm (assume the 3 weather situations to be
equi-probable on day 1).
1. Initialization
n = 1:
1 ( ) = b

= 1/3 0.9 = 0.3

1 ( ) = 0
1 ( ) = b

= 1/3 0.2 = 0.0667

1 ( ) = 0
1 ( ) =

1 ( ) = 0
11

= 1/3 0.7 = 0.233

2 ( ) = max(1 ( ) a

, 1 ( ) a

STATES

1 = 0.3

2 ( ) =

Sequence:

x1 =

x2 =

n=1

x3 =

n=2

n=3

time

Figure 5: Viterbi algorithm to find the most likely weather sequence. Finding the most likely path to
state sunny at n = 2.
2. Recursion
n=2:
We calculate the likelihood of getting to state from all possible 3 predecessor states, and choose
the most likely one to go on with:
2 ( ) = max(1 ( ) a

, 1 ( ) a

) b

= max(0.3 0.8, 0.0667 0.2, 0.233 0.2) 0.1 = 0.024

2 ( ) =
The likelihood is stored in , the most likely predecessor in . See figure 5.
The same procedure is executed with states

2 ( ) = max(1 ( ) a

and

, 1 ( ) a

) b

= max(0.3 0.05, 0.0667 0.6, 0.233 0.3) 0.8 = 0.056

2 ( ) =
2 ( ) = max(1 ( ) a , , 1 ( ) a , , 1 ( ) a , ) b

= max(0.3 0.15, 0.0667 0.2, 0.233 0.5) 0.3 = 0.0350

2 ( ) =
n=3:
3 ( ) = max(2 ( ) a

, 2 ( ) a

) b

= max(0.024 0.8, 0.056 0.2, 0.035 0.2) 0.1 = 0.0019

3 ( ) =
3 ( ) = max(2 ( ) a , , 2 ( ) a , , 2 ( ) a , ) b ,
= max(0.024 0.05, 0.056 0.6, 0.035 0.3) 0.8 = 0.0269
3 ( ) =
3 ( ) = max(2 ( ) a , , 2 ( ) a , , 2 ( ) a , ) b ,
= max(0.0024 0.15, 0.056 0.2, 0.035 0.5) 0.3 = 0.0052
3 ( ) =
Finally, we get one most likely path ending in each state of the model. See figure 6.
12

STATES

3 ( ) = 0.0019

3 ( ) = 0.0269

3 ( ) = 0.0052

Sequence:

x1 =

n=1

x2 =

n=2

x3 =

n=3

time

Figure 6: Viterbi algorithm to find most likely weather sequence at n = 3.

STATES

3 ( ) = 0.0019

3 ( ) = 0.0269

3 ( ) = 0.0052

Sequence:

x1 =

n=1

x2 =

n=2

x3 =

n=3

time

Figure 7: Viterbi algorithm to find most likely weather sequence. Backtracking.

3. Termination
The globally most likely path is determined, starting by looking for the last state of the most likely
sequence.
P (X|) = max(3 (i)) = 3 ( ) = 0.0269
q3 = arg max(3 (i)) =
4. Backtracking
The best sequence of states can be read from the vectors. See figure 7.
n = N 1 = 2:
q2 = 3 (q3 ) = 3 ( ) =
n = N 1 = 1:
q1 = 2 (q2 ) = 2 ( ) =
Thus the most likely weather sequence is: Q = {q1 , q2 , q3 } = { , , }.
4.0.2

Task

1. Write a Matlab function [loglik,path] = vit ghmm(data,HMM) to implement the Viterbi algorithm for HMMs with Gaussian emissions. Use the function mk ghmm obs lik to calculate the
observation likelihood matrix for a given sequence. Store the and vectors in a matrix in the
format of the observation likelihood matrix. You can either write the function with the algorithm
exactly as given before, or switch to make the calculations in the log likelihood domain, where the

multiplications of the parameters and transform to additions. What are the advantages or disadvantages of the second method? (Think about how to implement it on a real system with limited
computational accuracy, and about a HMM with a large number of states where the probabilities
ai,j and bi,k might be small.)
2. Use the function vit ghmm to determine the most likely path for the sequences X1 , X2 , X3 and X4 .
Compare with the state sequence ST 1, . . . , ST 4 originally used to generate X1 , . . . , X4 . (Use the
function compseq, which provides a view of the first dimension of the observations as a time series,
and allows to compare the original alignment to the Viterbi solution).
>> [loglik,path] = vit ghmm(X1,hmm1); compseq(X1,ST1,path);
>> [loglik,path] = vit ghmm(X2,hmm1); compseq(X2,ST2,path);
Repeat for the remaining sequences.
3. Use the function vit ghmm to compute the probabilities of the sequences X1 , . . . , X4 along the best
paths with respect to each model 1 , . . . , 4 . Note down your results below. Compare with the
log-likelihoods obtained in section 3.0.1 using the forward procedure.
>> diffL=logProb-logProbViterbi
Likelihoods along the best path:

Sequence

log p (Xi |1 )

log p (Xi |2 )

log p (Xi |3 )

log p (Xi |4 )

Most likely
model

Difference between log-likelihoods and likelihoods along the best path:

Sequence

HMM1

HMM2

HMM3

HMM4

X1
X2
X3
X4
Question:
Is the likelihood along the best path a good approximation of the real likelihood of a sequence given a
model?

References
[1] Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech
recognition. Proceedings of the IEEE, 77 (2), 257286.

Learning Python Design Patterns - Second Edition - Sample Chapter
No ratings yet
Learning Python Design Patterns - Second Edition - Sample Chapter
16 pages
Football Analytics
No ratings yet
Football Analytics
15 pages
2024 Fall CSE366 12 HMM
No ratings yet
2024 Fall CSE366 12 HMM
46 pages
Hidden Markov Models
No ratings yet
Hidden Markov Models
4 pages
Hidden Markov Model
No ratings yet
Hidden Markov Model
9 pages
Fingerprint Sensor Based Biometric Attendance System Using Arduino
No ratings yet
Fingerprint Sensor Based Biometric Attendance System Using Arduino
12 pages
Computational Genomics Hidden Markov Models (HMMS)
No ratings yet
Computational Genomics Hidden Markov Models (HMMS)
55 pages
Hidden Markov Models: Julia Hirschberg CS4705
No ratings yet
Hidden Markov Models: Julia Hirschberg CS4705
37 pages
MLRD 8
No ratings yet
MLRD 8
39 pages
Winter Semester 2022-23 CSE3008 ETH AP2022236000448 Reference Material I 26-Apr-2023 HMM Class-1 PDF
No ratings yet
Winter Semester 2022-23 CSE3008 ETH AP2022236000448 Reference Material I 26-Apr-2023 HMM Class-1 PDF
56 pages
CS 4705 Hidden Markov Models: Slides Adapted From Dan Jurafsky, and James Martin
No ratings yet
CS 4705 Hidden Markov Models: Slides Adapted From Dan Jurafsky, and James Martin
35 pages
1.1. Multiple Sequence Alignment
No ratings yet
1.1. Multiple Sequence Alignment
16 pages
HMM
No ratings yet
HMM
41 pages
Parametric Models Hidden Markov Models
No ratings yet
Parametric Models Hidden Markov Models
30 pages
HMM - Extra
No ratings yet
HMM - Extra
17 pages
Pes2014 PC en
No ratings yet
Pes2014 PC en
14 pages
Multiple Sequence Alignment
No ratings yet
Multiple Sequence Alignment
16 pages
HMM Overview
No ratings yet
HMM Overview
16 pages
4 HMM
No ratings yet
4 HMM
52 pages
HMM
No ratings yet
HMM
24 pages
Chapter15 1
No ratings yet
Chapter15 1
36 pages
MarkovHMMPython Intro
No ratings yet
MarkovHMMPython Intro
20 pages
Hidden Markov Models
No ratings yet
Hidden Markov Models
26 pages
Machine Learning For Natural Language Processing: Hidden Markov Models
No ratings yet
Machine Learning For Natural Language Processing: Hidden Markov Models
33 pages
Lecture Week11
No ratings yet
Lecture Week11
24 pages
HMM
No ratings yet
HMM
9 pages
Hidden Markov Models (HMMS) : Prabhleen Juneja Thapar Institute of Engineering & Technology
No ratings yet
Hidden Markov Models (HMMS) : Prabhleen Juneja Thapar Institute of Engineering & Technology
36 pages
cs229 HMM
No ratings yet
cs229 HMM
13 pages
Hidden Markov Models
No ratings yet
Hidden Markov Models
17 pages
Hidden Markov Model
No ratings yet
Hidden Markov Model
35 pages
Hidden Markov Models
No ratings yet
Hidden Markov Models
17 pages
24f 09 Hidden Markov Models
No ratings yet
24f 09 Hidden Markov Models
79 pages
6-AI Markov
No ratings yet
6-AI Markov
24 pages
Definition:: Hidden Markov Model (HMM)
No ratings yet
Definition:: Hidden Markov Model (HMM)
14 pages
Hidden Markov Model Submitted by Sawera Yaseen ROLL NO 1010 Submitted To MR - Hassan University of Okara
No ratings yet
Hidden Markov Model Submitted by Sawera Yaseen ROLL NO 1010 Submitted To MR - Hassan University of Okara
18 pages
HMM Intro
No ratings yet
HMM Intro
23 pages
ML Notes
No ratings yet
ML Notes
41 pages
HMM
No ratings yet
HMM
4 pages
Hidden Markov Model (HMM) Architecture
No ratings yet
Hidden Markov Model (HMM) Architecture
15 pages
BT302 L9 HMM
No ratings yet
BT302 L9 HMM
29 pages
Hidden Markov Model New
No ratings yet
Hidden Markov Model New
7 pages
Sequence Model:: Hidden Markov Models
No ratings yet
Sequence Model:: Hidden Markov Models
60 pages
Lecture 2
No ratings yet
Lecture 2
21 pages
Data Conversion Plan
No ratings yet
Data Conversion Plan
22 pages
W 10 Markov Model
No ratings yet
W 10 Markov Model
13 pages
Bayes and HMM
No ratings yet
Bayes and HMM
2 pages
Introduction To Hidden Markov Models
No ratings yet
Introduction To Hidden Markov Models
56 pages
Hidden Markovnikov Model
No ratings yet
Hidden Markovnikov Model
32 pages
Solution 1
No ratings yet
Solution 1
8 pages
Lec18 HMMs
No ratings yet
Lec18 HMMs
56 pages
BSBXTW401 Assessment Task 2
100% (1)
BSBXTW401 Assessment Task 2
55 pages
08 Odds and Ends v2 Annotated PDF
No ratings yet
08 Odds and Ends v2 Annotated PDF
49 pages
Hidden Markov Model
No ratings yet
Hidden Markov Model
36 pages
Hidden Markov Model
No ratings yet
Hidden Markov Model
5 pages
MSFT Cloud Architecture Identity PDF
No ratings yet
MSFT Cloud Architecture Identity PDF
6 pages
Markov Chain - Exe
No ratings yet
Markov Chain - Exe
6 pages
Viterbi Algorithm Demystified
No ratings yet
Viterbi Algorithm Demystified
15 pages
cs188 Fa22 Note17
No ratings yet
cs188 Fa22 Note17
8 pages
Hidden Markov Models: Ts. Nguyễn Văn Vinh Bộ môn KHMT, Trường ĐHCN, ĐH QG Hà nội
No ratings yet
Hidden Markov Models: Ts. Nguyễn Văn Vinh Bộ môn KHMT, Trường ĐHCN, ĐH QG Hà nội
55 pages
Teacher'S Work Assignment Plan
No ratings yet
Teacher'S Work Assignment Plan
4 pages
Hidden Markov Models
No ratings yet
Hidden Markov Models
51 pages
Buell Expert Report Final Served
No ratings yet
Buell Expert Report Final Served
68 pages
MOVEX
No ratings yet
MOVEX
3 pages
NLP Mod5 Lec1 Markov Model and Pos
No ratings yet
NLP Mod5 Lec1 Markov Model and Pos
21 pages
Document 4 (2) - Merged
No ratings yet
Document 4 (2) - Merged
72 pages
A Revealing Introduction To Hidden Markov Models
No ratings yet
A Revealing Introduction To Hidden Markov Models
20 pages
SSIS
No ratings yet
SSIS
7 pages
Steps For Inbound ALE IDOC
No ratings yet
Steps For Inbound ALE IDOC
5 pages
How To Use A Recordset Filter
No ratings yet
How To Use A Recordset Filter
4 pages
Software Testing Mod 3
No ratings yet
Software Testing Mod 3
67 pages
Checklist - For - Software Verification Plan
No ratings yet
Checklist - For - Software Verification Plan
7 pages
M580-2CH User Manual 20230620
No ratings yet
M580-2CH User Manual 20230620
40 pages
Education Learning: Connections Ities e Business
No ratings yet
Education Learning: Connections Ities e Business
80 pages
Annual Maintenance Contract - IMA, Pune
No ratings yet
Annual Maintenance Contract - IMA, Pune
5 pages
Indoor Intelligence: Product Suite "Industrial Apps"
No ratings yet
Indoor Intelligence: Product Suite "Industrial Apps"
38 pages
FoP Ch10 P1 Object-Oriented Programming
No ratings yet
FoP Ch10 P1 Object-Oriented Programming
38 pages
The Flexible Body Move Better Anywhere Anytime
No ratings yet
The Flexible Body Move Better Anywhere Anytime
1 page
Subdivision Group Segment Revenue Expenses
No ratings yet
Subdivision Group Segment Revenue Expenses
8 pages
Postern of Fate PDF
No ratings yet
Postern of Fate PDF
3 pages
Anshul Final Ultra
No ratings yet
Anshul Final Ultra
3 pages
Encoding Video at The Edge With Intel® Xeon® Processors: Solution Brief
No ratings yet
Encoding Video at The Edge With Intel® Xeon® Processors: Solution Brief
4 pages
ACTIVITY 9 Important Shortcuts
No ratings yet
ACTIVITY 9 Important Shortcuts
5 pages
JessNoLimit's Instant Gusion Guide! Only For Those of You Who Have Quick Fingers! - Dunia Gamesz
No ratings yet
JessNoLimit's Instant Gusion Guide! Only For Those of You Who Have Quick Fingers! - Dunia Gamesz
4 pages
Mexico Localization User Guide Cfdi 4.0 - English
No ratings yet
Mexico Localization User Guide Cfdi 4.0 - English
3 pages
TypeScript - Training Contents-1 Days
No ratings yet
TypeScript - Training Contents-1 Days
1 page
A-level Maths Revision: Cheeky Revision Shortcuts
From Everand
A-level Maths Revision: Cheeky Revision Shortcuts
Scool Revision
3.5/5 (8)
Analytical Methods of Optimization
From Everand
Analytical Methods of Optimization
D. F. Lawden
No ratings yet
Mathematical Foundations of Information Theory
From Everand
Mathematical Foundations of Information Theory
A. Ya. Khinchin
3.5/5 (9)
Mathematical Functions
From Everand
Mathematical Functions
Oliver Linton
No ratings yet
Exercises of Quantum Physics
From Everand
Exercises of Quantum Physics
Simone Malacrida
No ratings yet