Lecture: Markov Decision Process
1 Introduction
1.1 Introductory examples
Example 1: snakes and ladders
We give an initial example to build intuition. We highlight some of the key
properties of a Markov chain: how to calculate transition probabilities, how the past affects the current
movement of the process, how to construct a chain, and what the long-run behavior of the
process might look like.
Figure 1: Example 1
We let Xt be the position of the counter on the board after the dice has been
thrown t times. The process X = (Xt : t ∈ Z+) is a discrete time Markov chain. The
following exercises should be fairly straightforward and should build your intuition about
Markov chains.
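To make the construction concrete, here is a minimal Python sketch that simulates such a chain by repeatedly throwing a die and following any snake or ladder. The board layout below (the `JUMPS` dictionary, the board size, and the overshoot rule) is a made-up illustration and is not the board of Figure 1.

```python
import random

# Hypothetical board layout (NOT the board of Figure 1): squares 0..9, with a
# ladder from 2 to 6 and a snake from 8 to 3.
JUMPS = {2: 6, 8: 3}
LAST = 9  # final square

def throw_die():
    return random.randint(1, 6)

def step(x):
    """One transition: throw the die, move, then follow any snake or ladder."""
    y = x + throw_die()
    if y > LAST:       # assumed rule: overshooting the end leaves the counter in place
        y = x
    return JUMPS.get(y, y)

def simulate(t_max, x0=0):
    """Return the trajectory X_0, X_1, ..., X_{t_max}."""
    traj = [x0]
    for _ in range(t_max):
        traj.append(step(traj[-1]))
    return traj

print(simulate(20))
```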
Example 2: Transition probabilities
Given that the counter is in square x, we let Pxy be the probability that you next go to square
y. That is,
Pxy = P(X1 = y | X0 = x)   (1)
Calculate:
a) P14 b) P17 c) P34
Answer:
a) 1/6 + 1/6 = 1/3
b) 1/6 + 1/6 = 1/3
c) 1/6 + 1/6 = 1/3
Example 3: (Markov property)
Show that P(Xt+1 = y | X0 = x0 , · · · , Xt−1 = xt−1 , Xt = x) = P(Xt+1 = y | Xt = x) = Pxy .
2 Markov chain
Let X be a countable set.
Definition: (Initial distribution/Transition matrix)
An initial distribution
λ = (λx : x ∈ X )   (4)
is a non-negative vector whose components sum to one. A transition matrix P = (Pxy : x, y ∈ X )
is a non-negative matrix whose rows sum to one, that is, for each x ∈ X , Σ_{y∈X} Pxy = 1.
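As a quick sanity check of these two definitions, the following sketch tests whether a candidate vector is an initial distribution and whether a candidate matrix is a transition matrix; the numerical values are made-up examples.

```python
lam = [0.5, 0.25, 0.25]          # candidate initial distribution
P = [[0.0, 0.5, 0.5],
     [0.2, 0.3, 0.5],
     [1.0, 0.0, 0.0]]            # candidate transition matrix

def is_distribution(v, tol=1e-12):
    """Non-negative entries that sum to one."""
    return all(p >= 0 for p in v) and abs(sum(v) - 1.0) < tol

def is_transition_matrix(M, tol=1e-12):
    """Every row of a transition matrix must itself be a distribution."""
    return all(is_distribution(row, tol) for row in M)

print(is_distribution(lam), is_transition_matrix(P))
```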
Definition: (Discrete Time Markov Chain)
We say that a sequence of random variables X = (Xt : t ∈ Z+ ) is a discrete time Markov chain,
with initial distribution λ and transition matrix P , if for x0 , · · · , xt+1 ∈ X ,
P(X0 = x0) = λx0   (5)
and
P(Xt+1 = xt+1 | Xt = xt , · · · , X0 = x0 ) = P(Xt+1 = xt+1 | Xt = xt ) = Pxt xt+1 .   (6)
Condition (6) is often called the Markov property.
It states that the past (X0 , · · · , Xt−1 ) and the future Xt+1 are conditionally independent given the
present Xt .
Otherwise stated, it says that, when we know the past and present states (X0 , · · · , Xt ) =
(x0 , · · · , xt ), the distribution of the future states Xt+1 , Xt+2 , · · · is determined only by the
present state Xt = xt .
Think of a board game like snakes and ladders: where you go next is determined only
by where you are now, not by how you got there. This is the Markov property.
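The pair (λ, P) is all that is needed to simulate such a chain: draw X0 from λ, then repeatedly draw Xt+1 from the row of P indexed by the current state. Below is a minimal sketch with a hypothetical two-state chain; the numbers are illustrative only.

```python
import random

def sample_from(dist):
    """Sample an index from a probability vector `dist` (inverse-CDF sampling)."""
    u, cum = random.random(), 0.0
    for i, p in enumerate(dist):
        cum += p
        if u < cum:
            return i
    return len(dist) - 1

def sample_chain(lam, P, T):
    """Sample X_0, ..., X_T: draw X_0 from lam, then X_{t+1} from row P[X_t]."""
    x = sample_from(lam)
    traj = [x]
    for _ in range(T):
        x = sample_from(P[x])   # the next state depends only on the current state
        traj.append(x)
    return traj

# Hypothetical two-state chain:
lam = [1.0, 0.0]
P = [[0.7, 0.3],
     [0.4, 0.6]]
print(sample_chain(lam, P, 10))
```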
Definition: (Plant equation)
The state evolves according to functions ft : X × At × [0, 1] → X as
Xt+1 = ft (Xt , at , Ut ),   (7)
where (Ut )t≥0 are i.i.d. random variables, uniformly distributed on [0, 1]. This is called the plant equation. As noted in
the equivalence above, we will often suppress the dependence on Ut .
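The plant equation says that a (controlled) transition kernel can be realised as a deterministic function of the current state, the action, and an independent uniform random variable, for instance by inverting the cumulative distribution of the next-state probabilities. Here is a sketch under that reading, using a hypothetical two-state, two-action kernel `P_a` chosen only for illustration.

```python
import random

# Hypothetical controlled transition probabilities P_a[a][x][y] for two states
# and two actions; any such kernel can be written in plant-equation form.
P_a = {
    0: [[0.9, 0.1], [0.2, 0.8]],   # action 0
    1: [[0.5, 0.5], [0.6, 0.4]],   # action 1
}

def f(x, a, u):
    """Plant equation X_{t+1} = f(X_t, a_t, U_t): invert the CDF of P_a[a][x] at u."""
    cum = 0.0
    for y, p in enumerate(P_a[a][x]):
        cum += p
        if u < cum:
            return y
    return len(P_a[a][x]) - 1

x = 0
for t in range(5):
    u = random.random()   # U_t ~ Uniform[0, 1], independent over time
    x = f(x, a=0, u=u)    # always play action 0 in this tiny illustration
    print(t, x)
```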
Definition: (Policy):
A policy π chooses an action πt at each time t as a function of past states x0 , · · · , xt and past
actions π0 , · · · , πt−1 . We let P be the set of policies.
A policy, a plant equation, and the resulting sequence of states and rewards describe a Markov
Decision Process. The objective is to find a policy that optimizes the following objective
function.
Definition: (Markov decision problem):
Given initial state x0 , a Markov Decision Problem is the following optimization:
W(x0) = Maximize RT(x0, Π) := E[ Σ_{t=0}^{T−1} rt(Xt, πt) + rT(XT) ]   over Π ∈ P.   (8)
Further, let Rτ (xτ , Π) (respectively, Wτ (xτ )) be the objective (respectively, the optimal objec-
tive) for (MDP) when the summation is started from time t = τ and state Xτ = xτ , rather
than t = 0 and X0 = x0 .
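For a fixed policy, RT(x0, Π) is simply an expectation over trajectories, so it can be estimated by Monte Carlo simulation. The sketch below does this for a hypothetical two-state problem; the dynamics `P_a`, the rewards `r` and `r_T`, and the (Markov) policy are all assumptions made purely for illustration.

```python
import random

# Monte Carlo estimate of R_T(x0, Pi) for a fixed policy on a toy two-state problem.
P_a = {0: [[0.9, 0.1], [0.2, 0.8]],
       1: [[0.5, 0.5], [0.6, 0.4]]}

def r(t, x, a):          # per-stage reward r_t(x, a)
    return 1.0 if x == 0 else 0.0

def r_T(x):              # terminal reward r_T(x)
    return 0.0

def policy(t, x):        # a simple Markov policy pi_t(x)
    return 0 if x == 0 else 1

def step(x, a):
    return random.choices(range(2), weights=P_a[a][x])[0]

def estimate_R(x0, T, n_runs=10_000):
    total = 0.0
    for _ in range(n_runs):
        x, payoff = x0, 0.0
        for t in range(T):
            a = policy(t, x)
            payoff += r(t, x, a)
            x = step(x, a)
        total += payoff + r_T(x)
    return total / n_runs

print(estimate_R(x0=0, T=10))
```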
Definition: (Bellman equation):
Setting WT (x) = rT (x), for t = T − 1, T − 2, · · · , 0,
Wt(xt) = sup_{at ∈ At} { rt(xt, at) + E_{xt,at}[ Wt+1(Xt+1) ] }.   (9)
The above is Bellman’s equation for a Markov Decision Process.
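Equation (9) gives a backward-in-time recursion: starting from WT = rT, each Wt is computed from Wt+1. For finite state and action sets the expectation is a sum over next states, which leads directly to the following dynamic-programming sketch; the transition probabilities, rewards, and horizon are illustrative assumptions, not values from the lecture.

```python
# Finite-horizon backward induction (a direct reading of equation (9)) for a
# toy problem with states {0, 1} and actions {0, 1}.
P_a = {0: [[0.9, 0.1], [0.2, 0.8]],
       1: [[0.5, 0.5], [0.6, 0.4]]}
T = 10

def r(t, x, a):
    return 1.0 if x == 0 else 0.0

def r_T(x):
    return 0.0

states, actions = [0, 1], [0, 1]
W = [r_T(x) for x in states]            # W_T(x) = r_T(x)
policy = {}

for t in range(T - 1, -1, -1):          # t = T-1, T-2, ..., 0
    W_new = []
    for x in states:
        # q[a] = r_t(x, a) + E_{x,a}[ W_{t+1}(X_{t+1}) ]
        q = {a: r(t, x, a) + sum(P_a[a][x][y] * W[y] for y in states)
             for a in actions}
        a_star = max(q, key=q.get)      # maximising action
        policy[(t, x)] = a_star
        W_new.append(q[a_star])
    W = W_new

print("W_0 =", W, "optimal first action from state 0:", policy[(0, 0)])
```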
Example:
You need to sell a car. At every time t = 0, · · · , T − 1, you set a price pt , and a customer then
views the car. The probability that the customer buys the car at price p is D(p). If the car is
not sold by time T , then it is sold for a fixed price WT , WT < 1. Maximize the reward from
selling the car and find the recursion for the optimal reward, when D(p) = (1 − p)+ .
Figure 3: Markov Decision Process problem
Answer:
Let xt = I[Car is not sold by time t]
xt = 0 ⇒ xt+1 = 0, and
xt = 1 ⇒ xt+1 = 0 w.p. D(pt), xt+1 = 1 w.p. 1 − D(pt).   (10)
Let Rt (xt ) be the reward from t to T given xt and Wt (xt ) be the optimal reward from t to T given
xt .
Note: Rt (0) = Wt (0) = 0.
Choosing pT−s optimally, and writing Ct := Wt (1),
CT−s = max over pT−s of { pT−s D(pT−s) + (1 − D(pT−s)) CT−s+1 },
where D(pT−s) = (1 − pT−s)+ . Differentiating over pT−s and setting the derivative to zero, we obtain
pT−s = (CT−s+1 + 1) / 2   (14)
and, substituting back,
CT−s = ( (1 + CT−s+1) / 2 )² .
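The two formulas above give a simple backward recursion, starting from CT = WT. A short numerical sketch, assuming a hypothetical scrap value WT = 0.2 and horizon T = 5 (neither value is specified in the lecture):

```python
# Backward recursion for the car-selling example with assumed parameters.
T, W_T = 5, 0.2

C = W_T                      # C_T = W_T(1) = W_T
prices = []
for s in range(1, T + 1):    # compute C_{T-1}, ..., C_0 backwards
    p = (1 + C) / 2          # optimal price p_{T-s}, from equation (14)
    prices.append(p)
    C = ((1 + C) / 2) ** 2   # C_{T-s} = ((1 + C_{T-s+1}) / 2)^2

print("optimal prices p_{T-1}, ..., p_0:", prices)
print("optimal expected revenue C_0 = W_0(1):", C)
```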