Hidden Markov Models
1.1 Introduction
Consider observed data vectors xt that are d-dimensional, where the subscript t indicates discrete time
or position in a sequence, t = 1, . . . , T. For notational simplicity we can assume that the xt vectors are
real-valued, but in general xt could be discrete or a mixture of discrete and real-valued components.
Our data consists of D = {x1, . . . , xT}. Unlike the IID assumption we have made in the past, we would
now like to model the sequential dependence among the xt's. One approach is to use a hidden Markov
model, where we assume that the xt's are noisy stochastic functions of an unobserved (hidden) Markov chain
denoted by zt, where zt is a discrete random variable taking one of K possible values, zt ∈ {1, . . . , K}. zt
is often referred to as the state variable.
The generative model for a hidden Markov model is simple: at each time step t, a data vector xt is
generated conditioned on the state zt , and the Markov chain then transitions to a new state zt+1 to generate
xt+1 , and so on. As with standard Markov chains there is an initial distribution over the K states to
initialize the chain.
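To make this generative process concrete, here is a minimal sketch in Python/NumPy that samples a sequence from an HMM with 1-dimensional Gaussian emissions. The parameter names (pi0, A, means) and the example numbers below are illustrative placeholders, not quantities defined in these notes.

import numpy as np

def sample_hmm(pi0, A, means, T, rng=None):
    """Sample (z_1..z_T, x_1..x_T) from an HMM with K states, initial
    distribution pi0 (length K), transition matrix A (K x K, rows sum to 1),
    and 1-D Gaussian emissions with state-dependent means and unit variance."""
    rng = np.random.default_rng() if rng is None else rng
    K = len(pi0)
    z = np.empty(T, dtype=int)
    x = np.empty(T)
    z[0] = rng.choice(K, p=pi0)              # initial state z_1
    x[0] = rng.normal(means[z[0]], 1.0)      # emit x_1 given z_1
    for t in range(1, T):
        z[t] = rng.choice(K, p=A[z[t - 1]])  # Markov transition z_{t-1} -> z_t
        x[t] = rng.normal(means[z[t]], 1.0)  # emit x_t given z_t
    return z, x

# Example with K = 2 states (all numbers are made up for illustration).
pi0 = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
means = np.array([-2.0, 2.0])
z, x = sample_hmm(pi0, A, means, T=10)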
There are two key assumptions in a hidden Markov model:
1. Observations xt are conditionally independent of all other variables given zt , so the observation at
time t depends only on the current state zt .
2. The zt's form a (first order) Markov chain, i.e., p(zt | zt−1, . . . , z1) = p(zt | zt−1), t = 2, . . . , T. The
chain is also typically assumed to be homogeneous in that the transition probabilities do not depend
on t.
The earliest application of hidden Markov models was in speech recognition in the 1970s, but they have
since been used for various other problems in bioinformatics, language modeling, economics, and climate
modeling.
We will use the shorthand notation x[1,T ] for x1 , . . . , xT , and z[1,T ] for z1 , . . . , zT . Our observed data is
D = {x1 , . . . , xT }. From our graphical model we have
p(x_{[1,T]}, z_{[1,T]}) = \prod_{t=1}^{T} p(x_t \mid z_t)\, p(z_t \mid z_{t-1})
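This factorization can be seen directly from the chain rule of probability together with the two assumptions above, using the convention that p(z_1 \mid z_0) denotes the initial distribution over z_1:

\begin{aligned}
p(x_{[1,T]}, z_{[1,T]}) &= \prod_{t=1}^{T} p(z_t \mid z_{[1,t-1]}, x_{[1,t-1]})\, p(x_t \mid z_{[1,t]}, x_{[1,t-1]}) && \text{(chain rule)} \\
&= \prod_{t=1}^{T} p(z_t \mid z_{t-1})\, p(x_t \mid z_t) && \text{(assumptions 1 and 2)}
\end{aligned}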
An HMM has two sets of parameters:
1. A K × K transition matrix A with entries aij = p(zt+1 = j | zt = i), 1 ≤ i, j ≤ K, where each row sums to 1.
2. K emission distributions/densities p(xt | zt = j), j = 1, . . . , K, e.g., multivariate Gaussian for real-valued xt, usually assumed to be homogeneous (i.e., not depending on t). If xt is very high-dimensional it is common to assume that the components of xt are conditionally independent given zt (see the sketch below).
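As a sketch of the conditional-independence assumption in item 2, the following Python/NumPy function evaluates log p(xt | zt = j) for all j under a diagonal-covariance Gaussian emission model; the function name and argument layout are hypothetical, not part of these notes.

import numpy as np

def diag_gaussian_log_emission(x, means, variances):
    """Log emission densities log p(x_t | z_t = j) for one observation x (length d),
    assuming the d components are conditionally independent given the state:
    each state j has a diagonal-covariance Gaussian with means[j] (length d)
    and variances[j] (length d).  Returns a length-K vector of log densities."""
    # Sum of per-dimension univariate Gaussian log densities for each state.
    log_probs = -0.5 * (np.log(2 * np.pi * variances)
                        + (x - means) ** 2 / variances)
    return log_probs.sum(axis=1)   # shape (K,)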
For simplicity we will assume that the initial distribution on states is known, e.g., set to the uniform
distribution, but if we had multiple sequences this could also be learned from the data. We will let Θ denote
the full set of parameters, i.e., the transition matrix parameters in A and the parameters of the K state-dependent
emission distributions p(xt | zt = j), j = 1, . . . , K.
Note the similarity of the HMM to a finite mixture model with K components. In particular, the HMM
can be viewed as adding Markov dependence to the unobserved component indicator variable z in a mixture
model.
Below we show how to compute the likelihood L(Θ), where both the K emission density parameters and
the transition matrix A are unknown:
L(\Theta) = p(D \mid \Theta) = p(x_{[1,T]} \mid \Theta) = \sum_{z_{[1,T]}} p(x_{[1,T]}, z_{[1,T]} \mid \Theta)
But this sum is intractable to compute directly, since it has complexity O(K^T). However, we can use the
conditional independence assumptions (or equivalently the graphical model structure in the HMM) to carry
out this computation efficiently.
Let αt(j) = p(zt = j, x[1,t]), j = 1, . . . , K (implicitly conditioning on Θ). This is the joint probability
of (a) the unobserved state at time t being in state j and (b) all of the observed xt's up to and including time
t.
Using the law of total probability and the conditional independence assumptions of the HMM, we can write

\alpha_t(j) = p(x_t \mid z_t = j) \sum_{i=1}^{K} p(z_t = j \mid z_{t-1} = i)\, \alpha_{t-1}(i) = p(x_t \mid z_t = j) \sum_{i=1}^{K} a_{ij}\, \alpha_{t-1}(i)

The first term is the evidence from the observation xt at time t, the second term is the transition probability aij,
and the final term is just αt−1(i). This yields a simple recurrence relation for the α's.
We can compute the αt(j)'s recursively using this recurrence, in a single
forward pass from t = 1 up to t = T, initializing the recursion with α0(j) = π(j), where π is the initial
distribution on states. This is the forward part of the well-known forward-backward algorithm for HMMs.
Then, given αT(j), j = 1, . . . , K, the likelihood can be computed by the law of total probability (LTP) as

L(\Theta) = \sum_{j=1}^{K} \alpha_T(j)
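Here is a minimal Python/NumPy sketch of this forward pass. It assumes the emission likelihoods p(xt | zt = j) have already been evaluated into a T x K array (called emission_probs here, a hypothetical input, e.g., from a Gaussian density). The code folds the α0(j) = π(j) initialization into the first step, i.e., α1(j) = π(j) p(x1 | z1 = j). In practice the α's are rescaled at each step, or computed in log space, to avoid numerical underflow; that detail is omitted here.

import numpy as np

def forward_pass(pi0, A, emission_probs):
    """Compute alpha[t, j] = p(z_t = j, x_{[1,t]}) and the likelihood
    L = sum_j alpha[T, j].
    pi0: initial distribution (K,),
    A:   transition matrix (K, K) with A[i, j] = p(z_{t+1} = j | z_t = i),
    emission_probs: (T, K) array of p(x_t | z_t = j)."""
    T, K = emission_probs.shape
    alpha = np.zeros((T, K))
    # Initialization: alpha_1(j) = pi(j) * p(x_1 | z_1 = j).
    alpha[0] = pi0 * emission_probs[0]
    # Recursion: alpha_t(j) = p(x_t | z_t = j) * sum_i alpha_{t-1}(i) * a_{ij}.
    for t in range(1, T):
        alpha[t] = emission_probs[t] * (alpha[t - 1] @ A)
    likelihood = alpha[-1].sum()   # L(Theta) = sum_j alpha_T(j)
    return alpha, likelihood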
If we know αt−1(i), i = 1, . . . , K, we can compute αt(j) for all j in time O(K^2 + K f(d)). The K^2 term arises because
we have to compute the probability for all i, j pairs, and the function f reflects the complexity of computing
the likelihood of the data vector xt for each possible state, e.g., f(d) = O(d^2) for a Gaussian emission
density. The overall complexity of computing all of the α's is O(T K^2 + T K f(d)).
In developing an EM algorithm for HMMs we will want to compute the probability of each possible state at
each time t given all of the observed data, i.e., p(zt = j | x[1,T]) (using all of the data, both before and after
t). We can factor it as follows:

\begin{aligned}
p(z_t = j \mid x_{[1,T]}) &= \frac{p(z_t = j, x_{[1,T]})}{p(x_{[1,T]})} \\
&= \frac{p(x_{[t+1,T]} \mid z_t = j, x_{[1,t]})\, p(z_t = j, x_{[1,t]})}{p(x_{[1,T]})} \\
&= \frac{p(x_{[t+1,T]} \mid z_t = j)\, p(z_t = j, x_{[1,t]})}{p(x_{[1,T]})}
\end{aligned}

Note that given zt = j, the x[1,t] values give us no additional information about x[t+1,T] (which is how we
get from the 2nd to the 3rd line above).
Define βt(j) = p(x[t+1,T] | zt = j), t = 1, . . . , T, j = 1, . . . , K. Then, from above, we have

p(z_t = j \mid x_{[1,T]}) = \frac{\alpha_t(j)\, \beta_t(j)}{p(x_{[1,T]})} = \frac{\alpha_t(j)\, \beta_t(j)}{\sum_{k=1}^{K} \alpha_t(k)\, \beta_t(k)}
Using the same type of recursive decomposition as we used for the αt(j)'s, the βt(j)'s can be computed in time
O(T K^2 + T K f(d)), working backwards from t = T to t = 1 and starting from βT(j) = 1 (there is no data after time T).
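Under the same conventions as the forward-pass sketch, a minimal backward pass; it uses the standard recursion βt(i) = Σj aij p(xt+1 | zt+1 = j) βt+1(j), initialized with βT(j) = 1.

import numpy as np

def backward_pass(A, emission_probs):
    """Compute beta[t, i] = p(x_{[t+1,T]} | z_t = i).
    A: (K, K) transition matrix, emission_probs: (T, K) array of p(x_t | z_t = j).
    As with the alphas, a scaled or log-space version is used in practice."""
    T, K = emission_probs.shape
    beta = np.ones((T, K))          # beta_T(i) = 1: no data after time T
    # Recursion: beta_t(i) = sum_j a_{ij} * p(x_{t+1} | z_{t+1} = j) * beta_{t+1}(j).
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (emission_probs[t + 1] * beta[t + 1])
    return beta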
Thus, to compute p(zt = j | x[1,T]) for t = 1, . . . , T and j = 1, . . . , K, we (1) run the forward pass to compute the αt(j)'s, (2) run the backward pass to compute the βt(j)'s, and (3) for each t, multiply and normalize: p(zt = j | x[1,T]) = αt(j)βt(j) / Σk αt(k)βt(k).
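Combining the two sketches above, these posterior state probabilities (the membership weights used in the E-step below) are obtained by normalizing αt(j)βt(j) over j at each t:

def state_posteriors(alpha, beta):
    """w[t, j] = p(z_t = j | x_{[1,T]})
              = alpha_t(j) * beta_t(j) / sum_k alpha_t(k) * beta_t(k)."""
    joint = alpha * beta                              # p(z_t = j, x_{[1,T]})
    return joint / joint.sum(axis=1, keepdims=True)   # normalize over states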
The EM algorithm for HMMs follows the same general idea as the EM algorithm for finite mixtures.
In the E-step we compute the probabilities wt (j) (or membership weights) of the unobserved states, for
each state j and each time t, conditioned on all of the data x[1,T] and on the current parameters Θ.
In the M-step we compute point estimates of the parameters given the membership weights from the
E-step. There are two different sets of parameters: (1) the emission density parameters p(xt |zt = j), and
(2) the transition parameters aij, 1 ≤ i, j ≤ K.
The estimation of the emission density parameters proceeds in exactly the same manner as for the finite
mixture case. For example, if the emission densities are Gaussian, then the membership weights are used to
generate fractional counts for estimating the mean and covariance for each of the K emission densities.
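As an illustrative sketch (Python/NumPy), here is the weighted estimation for diagonal-covariance Gaussian emissions, matching the earlier emission sketch; X is a hypothetical T x d data matrix and w the T x K matrix of membership weights wt(j) from the E-step.

import numpy as np

def mstep_gaussian_emissions(X, w):
    """Weighted (fractional-count) estimates of the emission parameters.
    X: (T, d) observations, w: (T, K) membership weights w_t(j).
    Returns means (K, d) and per-dimension variances (K, d)."""
    Nj = w.sum(axis=0)                                   # E[N_j], expected count per state
    means = (w.T @ X) / Nj[:, None]                      # weighted means
    # Weighted variances around each state's mean (diagonal covariance).
    sq_dev = (X[None, :, :] - means[:, None, :]) ** 2    # (K, T, d)
    variances = (w.T[:, :, None] * sq_dev).sum(axis=1) / Nj[:, None]
    return means, variances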
For the transition probabilities we proceed as follows. We first compute E[Nj], the expected number of
times in state j, which is

E[N_j] = \sum_{t=1}^{T} w_t(j)
Next, we need to compute E[Nij], the expected number of times we transition from state i to state j:

E[N_{ij}] = \sum_{t=1}^{T-1} p(z_t = i, z_{t+1} = j \mid x_{[1,T]}) \;\propto\; \sum_{t=1}^{T-1} p(z_t = i, z_{t+1} = j, x_{[1,T]})

Define ξt(i, j) = p(zt = i, zt+1 = j, x[1,T]). This joint probability can be decomposed as

\begin{aligned}
\xi_t(i, j) &= p(z_t = i, z_{t+1} = j, x_{[1,T]}) \\
&= p(z_t = i, x_{[1,t]})\, p(z_{t+1} = j \mid z_t = i)\, p(x_{t+1} \mid z_{t+1} = j)\, p(x_{[t+2,T]} \mid z_{t+1} = j) \\
&= \alpha_t(i)\, a_{ij}\, p(x_{t+1} \mid z_{t+1} = j)\, \beta_{t+1}(j)
\end{aligned}
In going from the first to the second line we have used various conditional independence properties that
exist in the model. The final line consists of quantities that can easily be computed: they can be computed
directly from the model (p(xt+1 | zt+1 = j)), or are known parameters (aij), or have already been computed during
the forward-backward computations of the E-step (the α's and β's).
We then normalize the ξt(i, j)'s to get the conditional probabilities we need, i.e.,

p(z_t = i, z_{t+1} = j \mid x_{[1,T]}) = \frac{\xi_t(i, j)}{\sum_{k_1} \sum_{k_2} \xi_t(k_1, k_2)}
Finally, the M-step estimate of the transition probabilities is

\hat{a}_{ij} = \frac{E[N_{ij}]}{E[N_i]}, \qquad 1 \le i, j \le K

where the denominator E[N_i] = \sum_j E[N_{ij}] = \sum_{t=1}^{T-1} w_t(i) is the expected number of transitions out of state i, so that each row of the estimated transition matrix sums to 1.
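Putting the transition update together, here is a sketch under the same array conventions as the earlier passes: ξt(i, j) is formed from the α's, β's, the current A, and the emission likelihoods, summed over t to get E[Nij], and each row is then normalized.

import numpy as np

def mstep_transitions(alpha, beta, A, emission_probs):
    """Expected transition counts E[N_ij] and the updated transition matrix.
    Uses xi_t(i, j) = alpha_t(i) * a_{ij} * p(x_{t+1} | z_{t+1} = j) * beta_{t+1}(j)."""
    T, K = alpha.shape
    expected_Nij = np.zeros((K, K))
    for t in range(T - 1):
        xi = alpha[t][:, None] * A * (emission_probs[t + 1] * beta[t + 1])[None, :]
        xi /= xi.sum()        # normalize to p(z_t = i, z_{t+1} = j | x_{[1,T]})
        expected_Nij += xi
    # M-step: a_hat_{ij} = E[N_ij] / sum_j E[N_ij].
    A_new = expected_Nij / expected_Nij.sum(axis=1, keepdims=True)
    return expected_Nij, A_new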