Lecture Week12
Introduction
Lecture 12
Dr. Syed Maaz Shahid
20th May, 2024
Contents
• Overview
• Elements of Reinforcement Learning
• Use Cases
• Agents and States
• Components of an RL Agent
• RL Agent Taxonomy
• Exploration and Exploitation
• Major Challenges of RL
• Multi-Agent Reinforcement Learning
Assignment
• Find a paper that uses an HMM to solve a problem in your field.
Doan, Nhat Quang, et al. "Deep Reinforcement Learning-Based Battery Management Algorithm for Retired Electric Vehicle Batteries with a Heterogeneous State of
Health in BESSs." Energies 17.1 (2023): 79.
Reinforcement Learning: Use Case
• Deep Q-learning playing Atari
• RL requires exploration
• 0 ≤ 𝛾 ≤ 1: discount rate
• 𝛾 close to 1: rewards further in the future count more → the agent is farsighted
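The effect of the discount rate can be sketched in a few lines of Python. The reward values below are illustrative, not from the lecture; the function computes the discounted return G = r₁ + 𝛾r₂ + 𝛾²r₃ + …

```python
# Discounted return: G = sum over k of gamma^k * reward_{k+1}.
# Computed backwards for numerical simplicity.
def discounted_return(rewards, gamma):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

rewards = [1.0, 1.0, 1.0]
print(discounted_return(rewards, 0.0))  # gamma = 0: only the immediate reward counts -> 1.0
print(discounted_return(rewards, 1.0))  # gamma = 1: all rewards count equally -> 3.0
```

With 𝛾 near 0 the agent is myopic; as 𝛾 approaches 1, distant rewards contribute almost as much as immediate ones.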
Model
• The model describes the environment by a distribution over
rewards and state transitions.
P_{ss'}^a = P(S_{t+1} = s' | S_t = s, A_t = a)
Source: OpenAI Spinning Up
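A tabular version of this transition model is easy to sketch in Python. The states and actions below are made up for illustration: the model is just a lookup from (state, action) to a distribution over next states.

```python
# Tabular model of the environment: P(s' | s, a) stored as a nested dict.
# State and action names are illustrative only.
P = {
    ("s0", "right"): {"s1": 0.9, "s0": 0.1},
    ("s1", "right"): {"s2": 1.0},
}

def transition_prob(s, a, s_next):
    """Return P(S_{t+1} = s_next | S_t = s, A_t = a); 0 if unseen."""
    return P.get((s, a), {}).get(s_next, 0.0)

print(transition_prob("s0", "right", "s1"))  # 0.9
```

A full model would pair this with an expected-reward table R(s, a); together they fully describe the environment's dynamics.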
Reinforcement Learning algorithms
• Model-based algorithms use a model of the environment.
• Model is used to predict future states and rewards.
• The model is either given (e.g. a chessboard) or learned.
• Planning:
• A model of the environment is known→ Markov decision problem
• The agent performs computations with its model
• The agent improves its policy
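Planning with a known model can be sketched with value iteration on a tiny, made-up MDP (the states, rewards, and transitions below are illustrative, not from the lecture): the agent never interacts with the environment, it only computes with its model.

```python
# Value iteration: planning with a known model (transitions + rewards).
states = ["s0", "s1", "terminal"]
actions = ["stay", "go"]
gamma = 0.9

# model[(s, a)] = list of (probability, next_state, reward) outcomes
model = {
    ("s0", "stay"): [(1.0, "s0", 0.0)],
    ("s0", "go"):   [(1.0, "s1", 1.0)],
    ("s1", "stay"): [(1.0, "s1", 0.0)],
    ("s1", "go"):   [(1.0, "terminal", 5.0)],
    ("terminal", "stay"): [(1.0, "terminal", 0.0)],
    ("terminal", "go"):   [(1.0, "terminal", 0.0)],
}

V = {s: 0.0 for s in states}
for _ in range(100):  # sweep the Bellman optimality update until converged
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in model[(s, a)])
                for a in actions)
         for s in states}

print(round(V["s0"], 2))  # value of acting optimally from s0
```

The greedy policy with respect to the converged V is the improved policy the slide refers to.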
Exploration and Exploitation
• Reinforcement learning is like trial-and-error learning.
• The agent should discover a good policy from its experiences of the
environment → without losing too much reward along the way.
• Interesting trade-off:
• immediate reward (exploitation) vs. gaining knowledge that might enable
higher future reward (exploration)
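The standard minimal mechanism for this trade-off is epsilon-greedy action selection (not specific to this lecture): with probability 𝜀 the agent explores with a random action, otherwise it exploits its current value estimates.

```python
# Epsilon-greedy action selection: explore with probability epsilon,
# otherwise exploit the action with the highest estimated value.
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: random action
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

q = [0.2, 0.9, 0.5]
print(epsilon_greedy(q, 0.0))  # epsilon = 0 always exploits -> action 1
```

Annealing 𝜀 from high to low over training shifts the agent from exploration early on to exploitation once its estimates are trustworthy.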
Exploration and Exploitation: Examples
• Restaurant Selection
• Exploitation: Go to your favorite restaurant
• Exploration: Try a new restaurant
• Oil Drilling
• Exploitation: Drill at the best-known location
• Exploration: Drill at a new location
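The restaurant example above can be simulated as a two-armed bandit. The mean "enjoyment" values and noise level below are made up for illustration; the point is that occasional exploration lets the agent discover the better restaurant instead of settling on the first decent one.

```python
# Restaurant choice as a 2-armed bandit with epsilon-greedy selection
# and incremental sample-average value estimates.
import random

random.seed(0)
true_means = [0.5, 0.8]  # hypothetical mean enjoyment of two restaurants
q = [0.0, 0.0]           # estimated value of each restaurant
n = [0, 0]               # visit counts
epsilon = 0.1

for _ in range(1000):
    if random.random() < epsilon:
        a = random.randrange(2)      # explore: try a restaurant at random
    else:
        a = q.index(max(q))          # exploit: go to the current favorite
    reward = random.gauss(true_means[a], 0.1)  # noisy experience of a visit
    n[a] += 1
    q[a] += (reward - q[a]) / n[a]   # incremental mean update

print(q.index(max(q)))  # index of the restaurant the agent now prefers
```

With 𝜀 = 0 the agent can get stuck on the first restaurant it tries; the small amount of exploration is what allows it to find the better one.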
Major Challenges of RL
• Sample efficiency
• RL algorithms require a large amount of data and experience to learn
effectively → costly and time-consuming.
• Non-stationarity of the environment
Fig: Non-stationarity of the environment: Initially (a), the red agent learns to regulate the speed of the traffic by slowing down. Over time, however, the blue agents learn to bypass the red agent (b), invalidating the red agent's previous experiences.
References
• Reinforcement Learning: An Introduction by Richard S. Sutton and
Andrew G. Barto.
• David Silver's course on reinforcement learning.