
Finite Markov Decision Process (MDP):

A Markov Decision Process (MDP) is a mathematical framework used to model
decision-making in environments where outcomes are partly random and partly
under the control of a decision-maker. An MDP provides a formal foundation for
many reinforcement learning algorithms.

Components of a Finite MDP

A finite MDP is defined by the following components:

1. States (S):
A finite set of states representing the environment's different situations.
Example: In a grid world, each cell (position) can be a state.

2. Actions (A):
A finite set of actions available to the agent in each state.
Example: In a grid world, the actions could be "up," "down," "left," and "right."

3. Transition Probability (P):
The probability of transitioning from one state to another, given an action. This
is represented as:
\[
P(s' | s, a) = \text{Probability of reaching state } s' \text{ from state } s \text{ by taking action } a.
\]
Example: Moving "up" in the grid world has an 80% chance of moving up, a
10% chance of staying in the same place, and a 10% chance of moving left.

4. Reward Function (R):
The reward received after transitioning from one state to another, given an
action. It is represented as:
\[
R(s, a, s') = \text{Reward received after taking action } a \text{ in state } s \text{ and moving to state } s'.
\]
Example: In a grid world, reaching a goal state might yield a reward of +10,
while all other transitions yield a reward of -1.

5. Discount Factor (γ):
A factor between 0 and 1 that represents the importance of future rewards. It
determines how much future rewards are worth compared to immediate rewards.
- If \( \gamma = 0 \), the agent is shortsighted and only cares about immediate
rewards.
- If \( \gamma = 1 \), the agent is farsighted and values future rewards as much as
immediate ones.

A small sketch of how these five components might be written down follows this list.
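Below is a minimal Python sketch of the components for a tiny grid world. The state
names, the 80/10/10 transition probabilities, and the +10/-1 rewards mirror the
examples above; everything else (the dictionary layout, the reward helper) is an
illustrative assumption, not a fixed convention.

# A toy 2x2 grid world written out as explicit finite-MDP components.
states = ["s0", "s1", "s2", "G"]             # "G" is the goal cell
actions = ["up", "down", "left", "right"]
gamma = 0.9                                  # discount factor (assumed value)

# P[(s, a)] is a list of (next_state, probability) pairs.
# Only a few entries are shown; a complete model would cover every (s, a) pair.
P = {
    ("s0", "up"): [("s2", 0.8), ("s0", 0.1), ("s1", 0.1)],
    ("s2", "right"): [("G", 0.8), ("s2", 0.2)],
}

# R[(s, a, s')] is the reward for that transition; unlisted transitions give -1.
R = {("s2", "right", "G"): 10.0}

def reward(s, a, s_next):
    return R.get((s, a, s_next), -1.0)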

Goal of MDP

The goal in an MDP is to find a policy \( \pi \) that maximizes the expected
cumulative reward over time. A policy is a function that maps states to actions,
\( \pi: S \rightarrow A \).

- Optimal Policy (\( \pi^* \)): The policy that yields the maximum expected
cumulative reward starting from any state \( s \). One way a policy might be
represented in code is sketched below.
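A minimal sketch of how a policy could be represented for the toy grid world above,
either deterministically or as a distribution \( \pi(a | s) \); the particular action
choices and probabilities are made up for illustration.

import random

# Deterministic policy: exactly one action per state.
policy_det = {"s0": "up", "s1": "left", "s2": "right", "G": "up"}

# Stochastic policy: policy_stoch[s][a] = probability of taking action a in state s.
policy_stoch = {
    "s0": {"up": 0.9, "right": 0.1},
    "s1": {"left": 1.0},
    "s2": {"right": 1.0},
    "G": {"up": 1.0},
}

def sample_action(pi, s):
    # Draw an action from a stochastic policy pi at state s.
    acts, probs = zip(*pi[s].items())
    return random.choices(acts, weights=probs, k=1)[0]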

Value Functions in MDPs

Value functions are used to estimate how good it is for an agent to be in a
particular state, or to perform a particular action in that state.

1. State Value Function (V):
The value of a state \( s \) under policy \( \pi \) is the expected cumulative
discounted reward starting from \( s \) and following policy \( \pi \):
\[
V^\pi(s) = \mathbb{E} \left[ \sum_{t=0}^{\infty} \gamma^t R(s_t, a_t, s_{t+1}) \,\big|\, s_0 = s, \pi \right].
\]

2. Action Value Function (Q):
The value of taking action \( a \) in state \( s \) under policy \( \pi \) is the
expected cumulative discounted reward starting from \( s \), taking action \( a \),
and then following policy \( \pi \):
\[
Q^\pi(s, a) = \mathbb{E} \left[ \sum_{t=0}^{\infty} \gamma^t R(s_t, a_t, s_{t+1}) \,\big|\, s_0 = s, a_0 = a, \pi \right].
\]
A small numeric example of the discounted sum inside these expectations is given
below.
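The discounted return inside both expectations can be computed directly from a
recorded sequence of rewards. The episode below is a made-up sample for the toy
grid world, not a real rollout.

# Discounted return G = r_1 + gamma*r_2 + gamma^2*r_3 + ... for one episode.
gamma = 0.9
episode_rewards = [-1.0, -1.0, 10.0]   # made-up rewards along one episode

def discounted_return(rewards, gamma):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g              # work backwards: G_t = r_t + gamma * G_{t+1}
    return g

print(discounted_return(episode_rewards, gamma))   # -1 + 0.9*(-1) + 0.81*10 = 6.2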

Bellman Equations

The Bellman Equation provides a recursive decomposition for value functions,
expressing the value of a state in terms of the values of successor states.

1. Bellman Expectation Equation for \( V^\pi(s) \):
\[
V^\pi(s) = \sum_{a \in A} \pi(a | s) \sum_{s' \in S} P(s' | s, a) \left[ R(s, a, s') + \gamma V^\pi(s') \right].
\]

2. Bellman Expectation Equation for \( Q^\pi(s, a) \):
\[
Q^\pi(s, a) = \sum_{s' \in S} P(s' | s, a) \left[ R(s, a, s') + \gamma \sum_{a' \in A} \pi(a' | s') Q^\pi(s', a') \right].
\]
Applying the first of these recursions repeatedly is sketched below.
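A minimal sketch of iterative policy evaluation, which applies the Bellman
expectation equation for \( V^\pi \) as an update rule until the values stop
changing. It reuses the hypothetical data structures from the earlier sketches
(states, P, reward, policy_stoch); the tolerance parameter tol is also an assumption
of this sketch.

def policy_evaluation(states, P, reward, pi, gamma, tol=1e-6):
    # P[(s, a)] -> list of (next_state, prob); pi[s][a] -> probability of a in s.
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            v_new = 0.0
            for a, p_a in pi.get(s, {}).items():
                for s_next, p in P.get((s, a), []):
                    # Bellman expectation equation, used as an update.
                    v_new += p_a * p * (reward(s, a, s_next) + gamma * V[s_next])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V

# Example (using the hypothetical grid-world objects defined above):
# V = policy_evaluation(states, P, reward, policy_stoch, gamma)

This is the policy-evaluation step that Policy Iteration, mentioned in the next
section, alternates with a policy-improvement step.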

Solving MDPs

1. Dynamic Programming:
Methods like Policy Iteration and Value Iteration are used to compute the
optimal policy. These methods rely on the Bellman equations to iteratively
improve the value functions until convergence.

2. Monte Carlo Methods:
These methods use sample sequences of states, actions, and rewards to
estimate value functions based on the actual experience of the agent. They are
particularly useful when the model (transition probabilities) is unknown.

3. Temporal-Difference (TD) Learning:
TD methods combine ideas from dynamic programming and Monte Carlo methods.
Algorithms like Q-Learning and SARSA learn from incomplete episodes, updating
estimates after each observed transition; a one-step Q-Learning update is
sketched below.
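A minimal sketch of the tabular Q-Learning update for a single observed transition
(s, a, r, s'). The learning rate alpha, the default Q-table, and the example
transition are assumptions made for illustration.

from collections import defaultdict

Q = defaultdict(float)                  # Q[(s, a)] defaults to 0.0
alpha, gamma = 0.1, 0.9                 # learning rate and discount (assumed values)
actions = ["up", "down", "left", "right"]

def q_learning_update(s, a, r, s_next):
    # One tabular Q-Learning step: move Q(s, a) toward the TD target
    # r + gamma * max_a' Q(s', a'), by a fraction alpha of the TD error.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error

# Example: one observed transition in the hypothetical grid world.
q_learning_update("s2", "right", 10.0, "G")

SARSA differs only in the target: it uses the action actually taken in s'
(r + gamma * Q(s', a')) instead of the max over actions.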

Summary

- A Finite MDP consists of states, actions, transition probabilities, rewards, and a
discount factor.
- The objective is to find an optimal policy that maximizes the expected
cumulative reward.
- Value functions (state-value and action-value) are crucial for evaluating the
desirability of states or actions.
- The Bellman equations provide a recursive formula to calculate these value
functions, which can be solved using methods like Dynamic Programming, Monte
Carlo, or Temporal-Difference Learning.
