
Foundations of Machine Learning

DSA 5102 • Lecture 11

Li Qianxiao
Department of Mathematics
So far
We introduced two classes of machine learning problems
• Supervised Learning
• Unsupervised Learning

Today, we will look at another class of problems that lies somewhere in between, called reinforcement learning.
Motivation

Some General Observations of Learning
• Interactions with the environment
• Learning from experience
• Reward vs demonstrations
• Planning
The Reward Hypothesis
All of what we mean by goals and purposes can be well thought of as the maximization of the expected value of the cumulative sum of a received scalar signal (called reward).
Examples
• Studying and getting good grades
• Learning to play a new musical instrument
• Winning at chess
• Navigating a maze
• An infant learning to walk
The Basic Components
[Diagram: the agent-environment interaction loop. The agent performs an action on the environment; an interpreter turns the environment's response into a state and a reward, which are passed back to the agent.]
Examples

Task                Agent    Environment    Interpreter    Reward
Chess               Player   Board state    Vision         Win/loss at the end
Learning to walk    Infant   The world      Senses         Not falling, getting to places
Navigating a maze   Player   The maze       Vision         Getting out of the maze
Key Differences in Reinforcement Learning

• Vs unsupervised learning: not completely unsupervised, due to the reward signal
• Vs supervised learning: not completely supervised, since the optimal actions to take are never given
Example: The Recycling Robot
Actions
• Search for cans
• Pick up or drop cans
• Stop and wait
• Go back and charge

Rewards:
• +10 for each can picked up
• -1 for each meter moved
• -1000 for running out of battery
The Reinforcement Learning Problem
The RL problem can be posed as follows:

An agent navigates an environment through the lens of an interpreter. It interacts with the environment by performing actions, and the environment in turn provides the agent with a reward signal. The agent's goal is to learn through experience how to maximize the long-term accumulated reward.
Finite Markov Decision Processes
Finite State, Discrete Time Markov Chains
• Sequence of time steps: $t = 0, 1, 2, \dots$
• State space: $\mathcal{S} = \{s_1, s_2, \dots, s_n\}$ such that $|\mathcal{S}| = n < \infty$
• States: $S_t \in \mathcal{S}$

The states $\{S_t\}$ form a stochastic process, which evolves according to a transition probability $\mathbb{P}(S_{t+1} = s' \mid S_t = s, S_{t-1}, \dots, S_0)$.
Markov Property and Time Homogeneity
Markov Property
$$\mathbb{P}(S_{t+1} = s' \mid S_t = s, S_{t-1}, \dots, S_0) = \mathbb{P}(S_{t+1} = s' \mid S_t = s)$$

Time-Homogeneous Markov Chain
• The transition probability is independent of time, i.e. $\mathbb{P}(S_{t+1} = s' \mid S_t = s) = p(s' \mid s)$ for all $t$
• The matrix $P$ with entries $P_{ij} = p(s_j \mid s_i)$ is called the transition (probability) matrix
Example
State space: $\mathcal{S} = \{s_1, s_2, s_3\}$
[Figure: a three-state transition diagram over $s_1, s_2, s_3$ with its transition probability matrix; see https://en.wikipedia.org/wiki/Markov_chain]
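As a minimal illustration, the sketch below simulates a time-homogeneous Markov chain from its transition matrix; the matrix entries are made up for illustration, not the ones on the slide's diagram.

```python
import numpy as np

# Illustrative 3-state transition matrix (each row sums to 1); the numbers
# are assumptions, not those from the slide's diagram.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.8, 0.1],
    [0.4, 0.4, 0.2],
])

def simulate(P, s0=0, T=10, seed=0):
    """Sample a trajectory S_0, S_1, ..., S_T of a time-homogeneous Markov chain."""
    rng = np.random.default_rng(seed)
    states = [s0]
    for _ in range(T):
        s = states[-1]
        # The next state is drawn from the row of P for the current state
        states.append(int(rng.choice(len(P), p=P[s])))
    return states

print(simulate(P))
```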
Non-Markovian or Non-Time-Homogeneous Stochastic Processes
Example of a non-Markovian process
• Drawing coins without replacement out of a bag consisting of 10 each of $1, 50c and 10c coins. Let $S_t$ be the total value of the coins drawn up to time $t$. The distribution of $S_{t+1}$ depends on which coins remain in the bag, not only on the current total $S_t$.

Example of a non-time-homogeneous process
• Drawing coins at time $t$, where the drawing rule changes with $t$
Essential Components of Markov Decision Processes
A Markov decision process (MDP) is a generalization of a Markov process, with actions and rewards.

Essential elements
• Sequence of time steps: $t = 0, 1, 2, \dots$
• States: $S_t \in \mathcal{S}$
• Actions: $A_t \in \mathcal{A}(S_t)$, with $\mathcal{A} = \bigcup_s \mathcal{A}(s)$ (union over all states $s$)
• Rewards: $R_t \in \mathcal{R} \subset \mathbb{R}$
State Evolution
[Diagram: at each step the agent observes state $S_t$ and reward $R_t$ and selects action $A_t$; the environment, seen through the interpreter, returns the next state $S_{t+1}$ and reward $R_{t+1}$.]
Transition Probability
For Markov chains, we have the transition probability $p(s' \mid s) = \mathbb{P}(S_{t+1} = s' \mid S_t = s)$.

For Markov decision processes, we additionally need to account for:
• The reward $R_{t+1}$
• The action $A_t$

Hence, we specify the MDP transition probability
$$p(s', r \mid s, a) = \mathbb{P}(S_{t+1} = s', R_{t+1} = r \mid S_t = s, A_t = a)$$
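To make the object $p(s', r \mid s, a)$ concrete, here is a minimal Python sketch that stores it as a lookup table and samples one transition; the state and action names (loosely inspired by the recycling robot) and all probabilities are assumptions for illustration only.

```python
import random

# Finite-MDP transition probability p(s', r | s, a) as a table:
#   (s, a)  ->  list of (s_next, reward, probability).
# All states, actions and numbers below are made up.
p = {
    ("high", "search"):   [("high", 1.0, 0.7), ("low", 1.0, 0.3)],
    ("low",  "search"):   [("low", 1.0, 0.6), ("dead", -1000.0, 0.4)],
    ("low",  "recharge"): [("high", 0.0, 1.0)],
}

def step(s, a, rng=random.Random(0)):
    """Sample (S_{t+1}, R_{t+1}) given S_t = s and A_t = a."""
    outcomes = p[(s, a)]
    weights = [prob for (_, _, prob) in outcomes]
    s_next, r, _ = rng.choices(outcomes, weights=weights, k=1)[0]
    return s_next, r

print(step("low", "search"))
```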


Markov Decision Processes
A Markov decision process (MDP) is the evolution of $(S_t, A_t, R_t)$ according to
$$\mathbb{P}(S_{t+1} = s', R_{t+1} = r \mid S_t = s, A_t = a) = p(s', r \mid s, a)$$

An MDP is finite if $\mathcal{S}$ is finite and $\mathcal{A}(s)$ is finite for each $s \in \mathcal{S}$.
Example: The Recycling Robot
State: (position, charge, weight)
Actions: the available actions depend on the current state, e.g.
• If the charge is zero, the action set is empty
• If the robot has charge remaining and its position has a can, it may pick the can up
• …
Reward: as specified earlier (+10 per can picked up, -1 per meter moved, -1000 for running out of battery)
[Figure: a grid world showing the robot, the cans, and the charging station]
The “Decision” Aspect: The Policy
The only way the agent has control over this system is through the choice of actions.

This is done by specifying a policy
$$\pi(a \mid s) = \mathbb{P}(A_t = a \mid S_t = s)$$

Deterministic policies: $\pi(a \mid s) \in \{0, 1\}$ for all $(s, a)$. Then we write $a = \pi(s)$, i.e. deterministic policies are functions from states to actions.
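A short sketch of both kinds of policy: a stochastic policy as a table of action probabilities, and a deterministic policy as a plain function. The state and action names and the probabilities are made-up placeholders.

```python
import random

# Stochastic policy pi(a | s) as a table; deterministic policy as a function.
# Names and numbers are illustrative assumptions only.
pi_table = {
    "high": {"search": 0.9, "wait": 0.1},
    "low":  {"recharge": 0.8, "search": 0.2},
}

def sample_action(s, rng=random.Random(0)):
    """Draw A_t ~ pi(. | S_t = s) for the stochastic policy above."""
    actions, probs = zip(*pi_table[s].items())
    return rng.choices(actions, weights=probs, k=1)[0]

def pi_deterministic(s):
    """A deterministic policy is just a function from states to actions."""
    return "search" if s == "high" else "recharge"

print(sample_action("low"), pi_deterministic("low"))
```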
The Goal of Choosing a Policy: Returns
We want to maximize long-term rewards…

Define the return
$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$

Here, $\gamma \in [0, 1]$ is the discount rate.

This formulation includes both finite and infinite time MDPs (for a finite horizon, the rewards are zero after the terminal time).
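The return can be computed backwards using the recursion $G_t = R_{t+1} + \gamma G_{t+1}$; a minimal sketch, with a made-up reward sequence:

```python
# Compute G_0 = sum_k gamma^k R_{k+1} for a finite reward sequence,
# using the backward recursion G_t = R_{t+1} + gamma * G_{t+1}.
# The rewards below are made up for illustration.
def discounted_return(rewards, gamma=0.9):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 0.0, -1.0, 10.0], gamma=0.9))
```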


The Objective of RL
The goal of RL is to maximize, by choosing a good policy $\pi$, the expected return
$$\max_{\pi} \; \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right],$$
where we start from some state $s$.

We will consider time-homogeneous cases, where this is the same as maximizing $\mathbb{E}_{\pi}\left[ G_0 \mid S_0 = s \right]$.
Dynamic Programming
Example
[Figure: a small directed graph with a reward on each edge; the task is to find a path from the start node to the goal node with the largest total reward.]

How long does it take to check all possibilities?
The Curse of Dimensionality
A term coined by R. Bellman (1957)

The number of states grows exponentially when the dimensionality of the problem increases.

Can we have a non-brute-force algorithm?


Dynamic Programming Principle
On an optimal path (following the optimal policy), if we start at any state on that path, the rest of the path must again be optimal.
Dynamic Programming in Action
Define the value of a state as the best total reward achievable starting from that state; it can be computed backwards from the goal node.
[Figure: the same graph as before, with the value of each node filled in by the backward recursion.]
The Complexity of Dynamic Programming

We have shown that brute-force search takes a number of steps that grows exponentially with the number of stages.

What about dynamic programming?
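A minimal sketch of the idea on a small graph (the graph below is made up, not the one on the slides): each node's value is computed once and cached, so the cost is proportional to the number of edges rather than the number of paths.

```python
from functools import lru_cache

# Dynamic programming on a small graph with a reward on each edge.
# V(s) = max over edges (s -> s') of [ reward(s, s') + V(s') ],  V(goal) = 0.
edges = {
    "A": [("B", 2.0), ("C", -1.0)],
    "B": [("D", 3.0), ("C", 1.0)],
    "C": [("D", 4.0)],
    "D": [],  # goal / terminal node
}

@lru_cache(maxsize=None)
def V(s):
    if not edges[s]:
        return 0.0
    # Each subproblem is solved once and reused (memoization)
    return max(r + V(s_next) for s_next, r in edges[s])

print(V("A"))  # best achievable total reward from the start node A
```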


Summary of Key Ideas

1. Come up with a measure of “value” of each state
2. Come up with a recursive way to compute the value
3. Find the optimal policy by acting greedily according to the value
Bellman’s Equations and Optimal Policies

Value Function
As motivated earlier, we define the value function
$$v_\pi(s) = \mathbb{E}_\pi\left[ G_t \mid S_t = s \right]$$
and the action-value function
$$q_\pi(s, a) = \mathbb{E}_\pi\left[ G_t \mid S_t = s, A_t = a \right]$$

Our goal: derive a recursion for $v_\pi$ and $q_\pi$.

These are known as Bellman’s equations.
Relationship between $v_\pi$ and $q_\pi$
Using the definitions, we can show the following relationships:
$$v_\pi(s) = \sum_a \pi(a \mid s)\, q_\pi(s, a)$$
$$q_\pi(s, a) = \sum_{s', r} p(s', r \mid s, a)\left[ r + \gamma\, v_\pi(s') \right]$$

Combining, we get
$$v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\left[ r + \gamma\, v_\pi(s') \right]$$

This is known as Bellman’s equation for the value function.
Bellman’s Equation
For finite MDPs, Bellman’s equation can be written in matrix form as
$$v_\pi = r_\pi + \gamma P_\pi v_\pi,$$
where $[P_\pi]_{s s'} = \sum_a \pi(a \mid s) \sum_r p(s', r \mid s, a)$ and $[r_\pi]_s = \sum_a \pi(a \mid s) \sum_{s', r} r\, p(s', r \mid s, a)$.

This is a linear equation, and we can show that there exists a unique solution for $v_\pi$.
In fact, it is just
$$v_\pi = (I - \gamma P_\pi)^{-1} r_\pi,$$
whose existence and uniqueness follow from the invertibility of $I - \gamma P_\pi$, which in turn follows from $\gamma < 1$ (the spectral radius of $\gamma P_\pi$ is at most $\gamma$).
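Since the equation is linear, $v_\pi$ can be obtained with one linear solve. A minimal sketch, where $P_\pi$ and $r_\pi$ are made-up quantities already averaged over the policy:

```python
import numpy as np

# Solve Bellman's equation  v = r + gamma * P v,  i.e.  v = (I - gamma P)^{-1} r.
# P_pi and r_pi below are illustrative assumptions.
gamma = 0.9
P_pi = np.array([[0.7, 0.3],
                 [0.4, 0.6]])   # P_pi[s, s'] = prob of moving s -> s' under pi
r_pi = np.array([1.0, -0.5])    # expected one-step reward in each state

v_pi = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
print(v_pi)  # unique solution, since gamma < 1 makes I - gamma * P_pi invertible
```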
Bellman’s Equation for the Action-Value Function
Using similar methods, one can show that the action-value function satisfies a similar recursion
$$q_\pi(s, a) = \sum_{s', r} p(s', r \mid s, a)\left[ r + \gamma \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a') \right]$$

Exercise: derive this equation and show that there exists a unique solution.
Comparing Policies
We can compare policies via their values
• Given policies $\pi, \pi'$, we say $\pi \geq \pi'$ if $v_\pi(s) \geq v_{\pi'}(s)$ for all $s$
• This is a partial order

Examples
• If $v_\pi(s) \geq v_{\pi'}(s)$ at every state, then $\pi \geq \pi'$
• If $v_\pi(s_1) > v_{\pi'}(s_1)$ but $v_\pi(s_2) < v_{\pi'}(s_2)$, then neither $\pi \geq \pi'$ nor $\pi' \geq \pi$ holds
Optimal Policy
We define an optimal policy $\pi_*$ to be any policy satisfying $\pi_* \geq \pi$ for every policy $\pi$.

In other words, $v_{\pi_*}(s) \geq v_\pi(s)$ for all $s$ and all $\pi$.

• Does such a $\pi_*$ exist?
• Is it unique?
Policy Improvement
We can derive the following result:

For any two policies $\pi, \pi'$, if
$$\sum_a \pi'(a \mid s)\, q_\pi(s, a) \geq v_\pi(s) \quad \text{for all } s,$$
then we must have
$$v_{\pi'}(s) \geq v_\pi(s) \quad \text{for all } s.$$

In addition, if the first inequality is strict for some $s$, then the second inequality is strict for at least one $s$.
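In particular, acting greedily with respect to $q_\pi$ satisfies the hypothesis, so the greedy policy is at least as good as $\pi$. A minimal sketch of this greedy step, with made-up numbers for the table $q_\pi$:

```python
import numpy as np

# One policy-improvement step: given q_pi(s, a) as a table, act greedily,
# i.e. pi'(s) = argmax_a q_pi(s, a).  The numbers below are illustrative.
actions = ["search", "wait"]
q_pi = {
    "high": np.array([5.0, 3.0]),   # q_pi(high, search), q_pi(high, wait)
    "low":  np.array([1.0, 2.5]),
}

pi_new = {s: actions[int(np.argmax(q))] for s, q in q_pi.items()}
print(pi_new)  # by the result above, the greedy policy satisfies v_{pi'} >= v_pi
```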
Bellman’s Optimality Condition
A policy $\pi$ is optimal if and only if for any state-action pair $(s, a)$ such that $\pi(a \mid s) > 0$, we have
$$q_\pi(s, a) = \max_{a'} q_\pi(s, a').$$

This means that an optimal policy must choose actions that maximize its associated action-value function.

This then implies the existence of an optimal policy!
Bellman’s Optimality Equation
Corresponding to an optimal policy $\pi_*$, write $v_* = v_{\pi_*}$ and $q_* = q_{\pi_*}$.

We obtain the following recursions
$$v_*(s) = \max_a \sum_{s', r} p(s', r \mid s, a)\left[ r + \gamma\, v_*(s') \right]$$
$$q_*(s, a) = \sum_{s', r} p(s', r \mid s, a)\left[ r + \gamma \max_{a'} q_*(s', a') \right]$$

These are known as Bellman’s optimality equations.
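Algorithms for solving these equations are the topic of the next lecture; as a preview, here is a minimal value-iteration-style sketch that simply iterates the optimality recursion for $v_*$ on a made-up two-state, two-action MDP.

```python
import numpy as np

# Iterate v(s) <- max_a sum_{s'} P[a, s, s'] * (R[a, s, s'] + gamma * v(s')).
# P[a, s, s'] is the transition probability, R[a, s, s'] the reward;
# all numbers below are illustrative assumptions.
gamma = 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.5, 0.5], [1.0, -1.0]]])

v = np.zeros(2)
for _ in range(1000):
    q = (P * (R + gamma * v)).sum(axis=2)   # q[a, s]
    v_new = q.max(axis=0)                   # maximize over actions
    if np.max(np.abs(v_new - v)) < 1e-8:
        break
    v = v_new

print(v, q.argmax(axis=0))  # approximate v_* and a greedy policy
```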


Some Remarks
The optimal value function $v_*$ is unique. Is an optimal policy unique?

Observe that the policy generated from $q_*$ can be taken to be deterministic, e.g. $\pi_*(s) \in \arg\max_a q_*(s, a)$.

In fact, for every policy $\pi$ there exists a deterministic policy $\pi'$ such that $\pi' \geq \pi$!
Summary
We introduced
• Basic formulation of reinforcement learning
• MDP as the mathematical framework
• Bellman’s equations characterizing optimal policies

Next time: algorithms to solve RL problems
