
CS 188 Introduction to Artificial Intelligence

Fall 2022 Note 17


These lecture notes are heavily based on notes originally written by Nikhil Sharma.
Last updated: October 25, 2022

Markov Models
In previous notes, we talked about Bayes’ nets and how they are a wonderful structure for compactly
representing relationships between random variables. We’ll now cover a closely related structure called a
Markov model, which for the purposes of this course can be thought of as analogous to a chain-like,
infinite-length Bayes’ net. The running example we’ll be working with in this section is the day-to-day
fluctuations in weather patterns. Our weather model will be time-dependent (as are Markov models in
general), meaning we’ll have a separate random variable for the weather on each day. If we define Wi as the
random variable representing the weather on day i, the Markov model for our weather example is the chain
W0 → W1 → W2 → · · ·, in which each variable’s only parent is the variable for the previous day.

What information should we store about the random variables involved in our Markov model? To track
how our quantity under consideration (in this case, the weather) changes over time, we need to know both
its initial distribution at time t = 0 and some sort of transition model that characterizes the probability
of moving from one state to another between timesteps. The initial distribution of a Markov model is
enumerated by the probability table given by P(W0 ) and the transition model of transitioning from state i to
i + 1 is given by P(Wi+1 |Wi ). Note that this transition model implies that the value of Wi+1 is conditionally
dependent only on the value of Wi . In other words, the weather at time t = i + 1 satisfies the Markov
property or memoryless property, and is independent of the weather at all other timesteps besides t = i.
Using our Markov model for weather, if we wanted to reconstruct the joint distribution over W0 , W1 , and
W2 using the chain rule, we would write:

P(W0 ,W1 ,W2 ) = P(W0 )P(W1 |W0 )P(W2 |W1 ,W0 )

However, with our assumption that the Markov property holds true and W0 ⊥⊥ W2 |W1 , the joint simplifies to:

P(W0 ,W1 ,W2 ) = P(W0 )P(W1 |W0 )P(W2 |W1 )

And we have everything we need to calculate this from the Markov model. More generally, Markov models
make the following independence assumption at each timestep: Wi+1 ⊥⊥ {W0 , ...,Wi−1 }|Wi . This allows us
to reconstruct the joint distribution for the first n + 1 variables via the chain rule as follows:
P(W0 , W1 , . . . , Wn ) = P(W0 ) P(W1 |W0 ) P(W2 |W1 ) · · · P(Wn |Wn−1 ) = P(W0 ) ∏_{i=0}^{n−1} P(Wi+1 |Wi )



A final assumption that’s typically made in Markov models is that the transition model is stationary. In
other words, for all values of i (all timesteps), P(Wi+1 |Wi ) is identical. This allows us to represent a Markov
model with only two tables: one for P(W0 ) and one for P(Wi+1 |Wi ).

The Mini-Forward Algorithm


We now know how to compute the joint distribution across timesteps of a Markov model. However, this
doesn’t explicitly help us answer the question of the distribution of the weather on some given day t. Nat-
urally, we can compute the joint then marginalize (sum out) over all other variables, but this is typically
extremely inefficient, since if we have j variables each of which can take on d values, the size of the joint
distribution is O(d j ). Instead, we’ll present a more efficient technique called the mini-forward algorithm.
Here’s how it works. By properties of marginalization, we know that

P(Wi+1 ) = ∑_{wi} P(wi , Wi+1 )

By the chain rule we can re-express this as follows:

P(Wi+1 ) = ∑_{wi} P(Wi+1 |wi ) P(wi )

This equation should make some intuitive sense — to compute the distribution of the weather at timestep
i + 1, we look at the probability distribution at timestep i given by P(Wi ) and "advance" this model a timestep
with our transition model P(Wi+1 |Wi ). With this equation, we can iteratively compute the distribution of the
weather at any timestep of our choice by starting with our initial distribution P(W0 ) and using it to compute
P(W1 ), then in turn using P(W1 ) to compute P(W2 ), and so on. Let’s walk through an example, using the
following initial distribution and transition model:

W0      P(W0)
sun     0.8
rain    0.2

Wi+1    Wi      P(Wi+1 |Wi )
sun     sun     0.6
rain    sun     0.4
sun     rain    0.1
rain    rain    0.9

Using the mini-forward algorithm we can compute P(W1 ) as follows:

P(W1 = sun) = ∑_{w0} P(W1 = sun|w0 ) P(w0 )
            = P(W1 = sun|W0 = sun) P(W0 = sun) + P(W1 = sun|W0 = rain) P(W0 = rain)
            = 0.6 · 0.8 + 0.1 · 0.2 = 0.5

P(W1 = rain) = ∑_{w0} P(W1 = rain|w0 ) P(w0 )
             = P(W1 = rain|W0 = sun) P(W0 = sun) + P(W1 = rain|W0 = rain) P(W0 = rain)
             = 0.4 · 0.8 + 0.9 · 0.2 = 0.5

Hence our distribution for P(W1 ) is

W1 P(W1 )
sun 0.5
rain 0.5
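
To make this concrete, here is a minimal Python sketch of the mini-forward algorithm run on the tables
above; the dictionary encoding and the function name are our own illustrative choices, not part of the
original notes.

    # Tables from the weather example above.
    initial = {"sun": 0.8, "rain": 0.2}                       # P(W0)
    transition = {("sun", "sun"): 0.6, ("rain", "sun"): 0.4,  # P(W_{i+1} | W_i),
                  ("sun", "rain"): 0.1, ("rain", "rain"): 0.9}  # keyed (next, current)

    def mini_forward(dist, transition):
        """One update: P(W_{i+1}) = sum_{w_i} P(W_{i+1} | w_i) P(w_i)."""
        return {w_next: sum(transition[(w_next, w)] * dist[w] for w in dist)
                for w_next in dist}

    dist = initial
    for t in range(1, 4):
        dist = mini_forward(dist, transition)
        print(f"P(W{t}) =", dist)   # P(W1) = {'sun': 0.5, 'rain': 0.5}, as above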



Notably, the probability that it will be sunny has decreased from 80% at time t = 0 to only 50% at time t = 1.
This is a direct result of our transition model, which favors transitioning to rainy days over sunny days. This
gives rise to a natural follow-up question: does the probability of being in a state at a given timestep ever
converge? We’ll address this question in the following section.

Stationary Distribution
To solve the problem stated above, we must compute the stationary distribution of the weather. As the
name suggests, the stationary distribution is one that remains the same after the passage of time, i.e.
P(Wt+1 ) = P(Wt )
We can compute these converged probabilities of being in a given state by combining the above equivalence
with the same equation used by the mini-forward algorithm:
P(Wt+1 ) = P(Wt ) = ∑_{wt} P(Wt+1 |wt ) P(wt )

For our weather example, this gives us the following two equations:

P(Wt = sun) = P(Wt+1 = sun|Wt = sun)P(Wt = sun) + P(Wt+1 = sun|Wt = rain)P(Wt = rain)
= 0.6 · P(Wt = sun) + 0.1 · P(Wt = rain)
P(Wt = rain) = P(Wt+1 = rain|Wt = sun)P(Wt = sun) + P(Wt+1 = rain|Wt = rain)P(Wt = rain)
= 0.4 · P(Wt = sun) + 0.9 · P(Wt = rain)

We now have exactly what we need to solve for the stationary distribution: a system of 2 equations in 2
unknowns. However, these two equations are linearly dependent, so we add a third equation using the fact
that P(Wt ) is a probability distribution and so must sum to 1:

P(Wt = sun) = 0.6 · P(Wt = sun) + 0.1 · P(Wt = rain)


P(Wt = rain) = 0.4 · P(Wt = sun) + 0.9 · P(Wt = rain)
1 = P(Wt = sun) + P(Wt = rain)

The first equation rearranges to 0.4 · P(Wt = sun) = 0.1 · P(Wt = rain), i.e. P(Wt = rain) = 4 · P(Wt = sun);
substituting this into the sum-to-1 constraint yields P(Wt = sun) = 0.2 and P(Wt = rain) = 0.8. Hence the
table for our stationary distribution, which we’ll henceforth denote as P(W∞ ), is the following:

W∞ P(W∞ )
sun 0.2
rain 0.8

To verify this result, let’s apply the transition model to the stationary distribution:

P(W∞+1 = sun) = P(W∞+1 = sun|W∞ = sun)P(W∞ = sun) + P(W∞+1 = sun|W∞ = rain)P(W∞ = rain)
= 0.6 · 0.2 + 0.1 · 0.8 = 0.2
P(W∞+1 = rain) = P(W∞+1 = rain|W∞ = sun)P(W∞ = sun) + P(W∞+1 = rain|W∞ = rain)P(W∞ = rain)
= 0.4 · 0.2 + 0.9 · 0.8 = 0.8

As expected, P(W∞+1 ) = P(W∞ ). In general, if Wt had a domain of size k, the equivalence


P(Wt ) = ∑_{wt} P(Wt+1 |wt ) P(wt )

yields a system of k equations, which we can use to solve for the stationary distribution.
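
One way to find this fixed point numerically is simply to run the mini-forward update until the distribution
stops changing. The sketch below does exactly that (an illustrative choice; one could equally solve the
linear system directly), and it agrees with the algebraic solution above.

    # Iterate the mini-forward update to convergence to find the
    # stationary distribution of the weather chain.
    transition = {("sun", "sun"): 0.6, ("rain", "sun"): 0.4,
                  ("sun", "rain"): 0.1, ("rain", "rain"): 0.9}

    def mini_forward(dist, transition):
        return {w_next: sum(transition[(w_next, w)] * dist[w] for w in dist)
                for w_next in dist}

    dist = {"sun": 0.8, "rain": 0.2}     # any starting distribution works here
    while True:
        new_dist = mini_forward(dist, transition)
        if all(abs(new_dist[w] - dist[w]) < 1e-10 for w in dist):
            break
        dist = new_dist
    print(dist)   # ~{'sun': 0.2, 'rain': 0.8}, the stationary distribution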



Hidden Markov Models
With Markov models, we saw how we could incorporate change over time through a chain of random vari-
ables. For example, if we want to know the weather on day 10 with our standard Markov model from above,
we can begin with the initial distribution P(W0 ) and use the mini-forward algorithm with our transition
model to compute P(W10 ). However, between time t = 0 and time t = 10, we may collect new meteoro-
logical evidence that might affect our belief of the probability distribution over the weather at any given
timestep. In simpler terms, if the weather forecast predicts an 80% chance of rain on day 10, but there are
clear skies on the night of day 9, that 80% probability might drop drastically. This is exactly what the Hidden
Markov Model helps us with: it allows us to observe some evidence at each timestep, which can potentially
affect the belief distribution at each of the states. The Hidden Markov Model for our weather example can
be described using a Bayes’ net structure in which each state variable Wi in the chain has a corresponding
observed evidence variable Fi as its child.

Unlike vanilla Markov models, we now have two different types of nodes. To make this distinction, we’ll
call each Wi a state variable and each weather forecast Fi an evidence variable. Since Wi encodes our belief
of the probability distribution for the weather on day i, it should be a natural result that the weather forecast
for day i is conditionally dependent upon this belief. The model implies similar conditional independence
relationships as standard Markov models, with an additional set of relationships for the evidence variables:

F1 ⊥⊥ W0 |W1
∀i = 2, . . . , n; Wi ⊥⊥ {W0 , . . . ,Wi−2 , F1 , . . . , Fi−1 }|Wi−1
∀i = 2, . . . , n; Fi ⊥⊥ {W0 , . . . ,Wi−1 , F1 , . . . , Fi−1 }|Wi

Just like Markov models, Hidden Markov Models make the assumption that the transition model P(Wi+1 |Wi )
is stationary. Hidden Markov Models make the additional simplifying assumption that the sensor model
P(Fi |Wi ) is stationary as well. Hence any Hidden Markov Model can be represented compactly with just
three probability tables: the initial distribution, the transition model, and the sensor model.
As a final point on notation, we’ll define the belief distribution at time i, given all evidence f1 , . . . , fi
observed so far:
B(Wi ) = P(Wi | f1 , . . . , fi )
Similarly, we’ll define B′ (Wi ) as the belief distribution at time i with evidence f1 , . . . , fi−1 observed:
B′ (Wi ) = P(Wi | f1 , . . . , fi−1 )
Defining ei as evidence observed at timestep i, you might sometimes see the aggregated evidence from
timesteps 1 ≤ i ≤ t reexpressed in the following form:
e1:t = e1 , . . . , et
Under this notation, P(Wi | f1 , . . . , fi−1 ) can be written as P(Wi | f1:(i−1) ). This notation will become relevant
in the upcoming sections, where we’ll discuss time elapse updates that iteratively incorporate new evidence
into our weather model.



The Forward Algorithm
Using the conditional probability assumptions stated above and marginalization properties of conditional
probability tables, we can derive a relationship between B(Wi ) and B′ (Wi+1 ) that’s of the same form as the
update rule for the mini-forward algorithm. We begin by using marginalization:
B′ (Wi+1 ) = P(Wi+1 | f1 , . . . , fi ) = ∑_{wi} P(Wi+1 , wi | f1 , . . . , fi )

This can then be re-expressed using the chain rule as follows:


B′ (Wi+1 ) = P(Wi+1 | f1 , . . . , fi ) = ∑_{wi} P(Wi+1 |wi , f1 , . . . , fi ) P(wi | f1 , . . . , fi )

Noting that P(wi | f1 , . . . , fi ) is simply B(wi ) and that Wi+1 ⊥⊥ { f1 , . . . , fi }|Wi , this simplifies to our final
relationship between B(Wi ) and B′ (Wi+1 ):

B′ (Wi+1 ) = ∑_{wi} P(Wi+1 |wi ) B(wi )

Now let’s consider how we can derive a relationship between B′ (Wi+1 ) and B(Wi+1 ). By application of the
definition of conditional probability (with extra conditioning), we can see that
B(Wi+1 ) = P(Wi+1 | f1 , . . . , fi+1 ) = P(Wi+1 , fi+1 | f1 , . . . , fi ) / P( fi+1 | f1 , . . . , fi )
When dealing with conditional probabilities, a commonly used trick is to delay normalization until we require
the normalized probabilities, a trick we’ll now employ. More specifically, since the denominator in the
above expansion of B(Wi+1 ) is common to every term in the probability table represented by B(Wi+1 ), we
can omit actually dividing by P( fi+1 | f1 , . . . , fi ). Instead, we can simply note that B(Wi+1 ) is proportional to
P(Wi+1 , fi+1 | f1 , . . . , fi ):
B(Wi+1 ) ∝ P(Wi+1 , fi+1 | f1 , . . . , fi )
with a constant of proportionality equal to P( fi+1 | f1 , . . . , fi ). Whenever we decide we want to recover the
belief distribution B(Wi+1 ), we can divide each computed value by this constant of proportionality. Now,
using the chain rule we can observe the following:
B(Wi+1 ) ∝ P(Wi+1 , fi+1 | f1 , . . . , fi ) = P( fi+1 |Wi+1 , f1 , . . . , fi )P(Wi+1 | f1 , . . . , fi )
By the conditional independence assumptions associated with Hidden Markov Models stated previously,
P( fi+1 |Wi+1 , f1 , . . . , fi ) is equivalent to simply P( fi+1 |Wi+1 ) and by definition P(Wi+1 | f1 , . . . , fi ) = B′ (Wi+1 ).
This allows us to express the relationship between B′ (Wi+1 ) and B(Wi+1 ) in its final form:

B(Wi+1 ) ∝ P( fi+1 |Wi+1 )B′ (Wi+1 )


Combining the two relationships we’ve just derived yields an iterative algorithm known as the forward
algorithm, the Hidden Markov Model analog of the mini-forward algorithm from earlier:

B(Wi+1 ) ∝ P( fi+1 |Wi+1 ) ∑_{wi} P(Wi+1 |wi ) B(wi )

The forward algorithm can be thought of as consisting of two distinctive steps: the time elapse update
which corresponds to determining B′ (Wi+1 ) from B(Wi ) and the observation update which corresponds to
determining B(Wi+1 ) from B′ (Wi+1 ). Hence, in order to advance our belief distribution by one timestep
(i.e. compute B(Wi+1 ) from B(Wi )), we must first advance our model’s state by one timestep with the time
elapse update, then incorporate new evidence from that timestep with the observation update. Consider the
following initial distribution, transition model, and sensor model:



W0      B(W0)
sun     0.8
rain    0.2

Wi+1    Wi      P(Wi+1 |Wi )
sun     sun     0.6
rain    sun     0.4
sun     rain    0.1
rain    rain    0.9

Fi      Wi      P(Fi |Wi )
good    sun     0.8
bad     sun     0.2
good    rain    0.3
bad     rain    0.7

To compute B(W1 ), we begin by performing a time elapse update to get B′ (W1 ):

B′ (W1 = sun) = ∑_{w0} P(W1 = sun|w0 ) B(w0 )
              = P(W1 = sun|W0 = sun) B(W0 = sun) + P(W1 = sun|W0 = rain) B(W0 = rain)
              = 0.6 · 0.8 + 0.1 · 0.2 = 0.5

B′ (W1 = rain) = ∑_{w0} P(W1 = rain|w0 ) B(w0 )
               = P(W1 = rain|W0 = sun) B(W0 = sun) + P(W1 = rain|W0 = rain) B(W0 = rain)
               = 0.4 · 0.8 + 0.9 · 0.2 = 0.5

Hence:
W1 B′ (W1 )
sun 0.5
rain 0.5

Next, we’ll assume that the weather forecast for day 1 was good (i.e. F1 = good), and perform an observation
update to get B(W1 ):

B(W1 = sun) ∝ P(F1 = good|W1 = sun)B′ (W1 = sun) = 0.8 · 0.5 = 0.4
B(W1 = rain) ∝ P(F1 = good|W1 = rain)B′ (W1 = rain) = 0.3 · 0.5 = 0.15

The last step is to normalize B(W1 ), noting that the entries in the table for B(W1 ) sum to 0.4 + 0.15 = 0.55:

B(W1 = sun) = 0.4/0.55 = 8/11
B(W1 = rain) = 0.15/0.55 = 3/11
Our final table for B(W1 ) is thus the following:

W1      B(W1)
sun     8/11
rain    3/11

Note the result of observing the weather forecast. Because the weatherman predicted good weather, our
belief that it would be sunny increased from 1/2 after the time elapse update to 8/11 after the observation
update.

As a parting note, the normalization trick discussed above can actually simplify computation significantly
when working with Hidden Markov Models. If we began with some initial distribution and were interested
in computing the belief distribution at time t, we could use the forward algorithm to iteratively compute
B(W1 ), . . . , B(Wt ) and normalize only once at the end by dividing each entry in the table for B(Wt ) by the
sum of its entries.
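
Putting the two updates together, here is a minimal Python sketch of the forward algorithm; the helper
names time_elapse and observe are our own, and the run below reproduces the B(W1 ) = (8/11, 3/11) result
computed above.

    # Forward algorithm: alternate time elapse and observation updates.
    initial = {"sun": 0.8, "rain": 0.2}                       # B(W0) = P(W0)
    transition = {("sun", "sun"): 0.6, ("rain", "sun"): 0.4,  # P(W_{i+1} | W_i)
                  ("sun", "rain"): 0.1, ("rain", "rain"): 0.9}
    sensor = {("good", "sun"): 0.8, ("bad", "sun"): 0.2,      # P(F_i | W_i)
              ("good", "rain"): 0.3, ("bad", "rain"): 0.7}

    def time_elapse(belief):
        """B'(W_{i+1}) = sum_{w_i} P(W_{i+1} | w_i) B(w_i)."""
        return {w_next: sum(transition[(w_next, w)] * belief[w] for w in belief)
                for w_next in belief}

    def observe(belief_prime, evidence):
        """B(W_{i+1}) proportional to P(f_{i+1} | W_{i+1}) B'(W_{i+1}); normalize."""
        unnormalized = {w: sensor[(evidence, w)] * b for w, b in belief_prime.items()}
        total = sum(unnormalized.values())
        return {w: p / total for w, p in unnormalized.items()}

    belief = initial
    for f in ["good"]:              # forecasts for days 1, 2, ... (extend as needed)
        belief = observe(time_elapse(belief), f)
    print(belief)   # {'sun': 0.7272..., 'rain': 0.2727...} = (8/11, 3/11)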



Viterbi Algorithm
In the Forward Algorithm, we used recursion to solve for P(XN |e1:N ), the probability distribution over states
the system could inhabit given the evidence variables observed so far. Another important question related to
Hidden Markov Models is: what is the most likely sequence of hidden states the system followed, given the
observed evidence variables so far? In other words, we would like to solve for argmax_{x1:N} P(x1:N |e1:N ) =
argmax_{x1:N} P(x1:N , e1:N ). This trajectory can also be solved for using dynamic programming with the
Viterbi algorithm.
The algorithm consists of two passes: the first runs forward in time and computes the probability of the best
path to each (state, time) tuple given the evidence observed so far. The second pass runs backwards in time:
first it finds the terminal state that lies on the path with the highest probability, and then traverses backward
through time along the path that leads into this state (which must be the best path).
To visualize the algorithm, consider a state trellis: a graph with one node per (state, timestep) pair and an
edge for each possible transition between consecutive timesteps.

In this HMM with two possible hidden states, sun or rain, we would like to compute the highest-probability
path (an assignment of a state to every timestep) from X1 to XN . The weight of an edge from Xt−1 to Xt is
equal to P(Xt |Xt−1 )P(Et |Xt ), and the probability of a path is computed by taking the product of its edge
weights. The first term in the weight formula represents how likely a particular transition is, and the second
term represents how well the observed evidence fits the resulting state.
Recall that:

P(X1:N , e1:N ) = P(X1 ) P(e1 |X1 ) ∏_{t=2}^{N} P(Xt |Xt−1 ) P(et |Xt )
The Forward Algorithm computes (up to normalization)

P(XN , e1:N ) = ∑_{x1 ,...,xN−1} P(XN , x1:N−1 , e1:N )

In the Viterbi Algorithm, we want to compute

argmax_{x1 ,...,xN} P(x1:N , e1:N )

to find the maximum likelihood estimate of the sequence of hidden states. Notice that each term in the
product is exactly the expression for the edge weight between layer t − 1 and layer t. So, the product of
weights along a path through the trellis gives us the joint probability of that path and the evidence, which
(for fixed evidence) is proportional to the probability of the path given the evidence.
We could solve for a joint probability table over all of the possible hidden states, but this results in an
exponential space cost. Given such a table, we could use dynamic programming to compute the best path in



polynomial time. However, because we can use dynamic programming to compute the best path, we don’t
necessarily need the whole table at any given time.
Define mt [xt ] = max_{x1:t−1} P(x1:t , e1:t ), the maximum probability, over all paths ending in a given xt at
time t, of that path occurring together with the evidence seen so far. This is the same as the weight of the
highest-weight path through the trellis from step 1 to t. Also note that

mt [xt ] = max_{x1:t−1} P(et |xt ) P(xt |xt−1 ) P(x1:t−1 , e1:t−1 )
         = P(et |xt ) max_{xt−1} P(xt |xt−1 ) max_{x1:t−2} P(x1:t−1 , e1:t−1 )
         = P(et |xt ) max_{xt−1} P(xt |xt−1 ) mt−1 [xt−1 ]

This suggests that we can compute mt for all t recursively via dynamic programming. This makes it possible
to determine the last state xN of the most likely path, but we still need a way to backtrack to reconstruct the
entire path. To that end, let’s define at [xt ] = argmax_{xt−1} P(xt |xt−1 ) mt−1 [xt−1 ] to keep track of the last
transition along the best path to xt (the factor P(et |xt ) may be dropped from the argmax, since it does not
depend on xt−1 ). We can now outline the algorithm.
Result: Most likely sequence of hidden states x*_{1:N}

/* Forward pass */
for t = 1 to N do
    for xt ∈ X do
        if t = 1 then
            mt [xt ] = P(xt ) P(e1 |xt )
        else
            at [xt ] = argmax_{xt−1} P(xt |xt−1 ) mt−1 [xt−1 ];
            mt [xt ] = P(et |xt ) P(xt |at [xt ]) mt−1 [at [xt ]];
        end
    end
end
/* Find the most likely path’s ending point */
x*_N = argmax_{xN} mN [xN ];
/* Work backwards through our most likely path and find the hidden states */
for t = N to 2 do
    x*_{t−1} = at [x*_t ];
end
Notice that our a arrays define a set of sequences, one for each possible ending state xN , each of which is
the most likely sequence arriving at that particular end state. Once we finish the forward pass, we compare
the likelihoods of these sequences, pick the best one, and reconstruct it in the backwards pass. We have thus
computed the most likely explanation for our evidence in polynomial space and time.
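
As a sanity check on the pseudocode, here is a minimal Python sketch of the Viterbi algorithm run on the
weather HMM from the forward algorithm section; the three-day evidence sequence and function name are
our own illustrative choices.

    # Viterbi: forward pass builds m_t and a_t, backward pass reconstructs
    # the most likely hidden-state sequence.
    states = ["sun", "rain"]
    initial = {"sun": 0.8, "rain": 0.2}                       # P(X1)
    transition = {("sun", "sun"): 0.6, ("rain", "sun"): 0.4,  # P(X_t | X_{t-1})
                  ("sun", "rain"): 0.1, ("rain", "rain"): 0.9}
    sensor = {("good", "sun"): 0.8, ("bad", "sun"): 0.2,      # P(E_t | X_t)
              ("good", "rain"): 0.3, ("bad", "rain"): 0.7}

    def viterbi(evidence):
        N = len(evidence)
        m = [None] * (N + 1)   # m[t][x]: probability of the best path to x at time t
        a = [None] * (N + 1)   # a[t][x]: best predecessor of x at time t
        m[1] = {x: initial[x] * sensor[(evidence[0], x)] for x in states}
        for t in range(2, N + 1):
            a[t], m[t] = {}, {}
            for x in states:
                a[t][x] = max(states, key=lambda xp: transition[(x, xp)] * m[t - 1][xp])
                m[t][x] = (sensor[(evidence[t - 1], x)]
                           * transition[(x, a[t][x])] * m[t - 1][a[t][x]])
        # Pick the best ending state, then follow the a pointers backwards.
        path = [max(states, key=lambda x: m[N][x])]
        for t in range(N, 1, -1):
            path.append(a[t][path[-1]])
        return list(reversed(path))

    print(viterbi(["good", "bad", "bad"]))   # ['sun', 'rain', 'rain']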

