Lecture Week12
Introduction
Lecture 12
Dr. Syed Maaz Shahid
20th May, 2024
Contents
• Overview
• Elements of Reinforcement Learning
• Use Cases
• Agents and States
• Components of an RL Agent
• RL Agent Taxonomy
• Exploration and Exploitation
• Major Challenges of RL
• Multi-Agent Reinforcement Learning
Assignment
• Find a paper that uses an HMM to solve a problem in your field.
Doan, Nhat Quang, et al. "Deep Reinforcement Learning-Based Battery Management Algorithm for Retired Electric Vehicle Batteries with a Heterogeneous State of
Health in BESSs." Energies 17.1 (2023): 79.
Reinforcement Learning: Use Case
• Deep Q-learning playing Atari
• RL requires exploration
• 0 ≤ 𝛾 ≤ 1: discount rate
• 𝛾 close to 1: rewards further in the future count more → the agent is farsighted
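The effect of the discount rate can be sketched in a few lines of Python. The reward values below are illustrative, not from the lecture; the function computes the discounted return G = r₁ + 𝛾r₂ + 𝛾²r₃ + …

```python
# Discounted return: G = sum over k of gamma^k * reward_{k+1}.
# Computed backwards for numerical simplicity.
def discounted_return(rewards, gamma):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

rewards = [1.0, 1.0, 1.0]
print(discounted_return(rewards, 0.0))  # gamma = 0: only the immediate reward counts -> 1.0
print(discounted_return(rewards, 1.0))  # gamma = 1: all rewards count equally -> 3.0
```

With 𝛾 near 0 the agent is myopic; as 𝛾 approaches 1, distant rewards contribute almost as much as immediate ones.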
Model
• The model describes the environment by a distribution over
rewards and state transitions.
P_{ss'}^a = P(S_{t+1} = s' | S_t = s, A_t = a)
Source: OpenAI Spinning Up
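A tabular version of this transition model is easy to sketch in Python. The states and actions below are made up for illustration: the model is just a lookup from (state, action) to a distribution over next states.

```python
# Tabular model of the environment: P(s' | s, a) stored as a nested dict.
# State and action names are illustrative only.
P = {
    ("s0", "right"): {"s1": 0.9, "s0": 0.1},
    ("s1", "right"): {"s2": 1.0},
}

def transition_prob(s, a, s_next):
    """Return P(S_{t+1} = s_next | S_t = s, A_t = a); 0 if unseen."""
    return P.get((s, a), {}).get(s_next, 0.0)

print(transition_prob("s0", "right", "s1"))  # 0.9
```

A full model would pair this with an expected-reward table R(s, a); together they fully describe the environment's dynamics.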
Reinforcement Learning algorithms
• Model-based algorithms use a model of the environment.
• Model is used to predict future states and rewards.
• The model is either given (e.g. a chessboard) or learned.
• Planning:
• A model of the environment is known→ Markov decision problem
• The agent performs computations with its model
• The agent improves its policy
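Planning with a known model can be sketched with value iteration on a tiny, made-up MDP (the states, rewards, and transitions below are illustrative, not from the lecture): the agent never interacts with the environment, it only computes with its model.

```python
# Value iteration: planning with a known model (transitions + rewards).
states = ["s0", "s1", "terminal"]
actions = ["stay", "go"]
gamma = 0.9

# model[(s, a)] = list of (probability, next_state, reward) outcomes
model = {
    ("s0", "stay"): [(1.0, "s0", 0.0)],
    ("s0", "go"):   [(1.0, "s1", 1.0)],
    ("s1", "stay"): [(1.0, "s1", 0.0)],
    ("s1", "go"):   [(1.0, "terminal", 5.0)],
    ("terminal", "stay"): [(1.0, "terminal", 0.0)],
    ("terminal", "go"):   [(1.0, "terminal", 0.0)],
}

V = {s: 0.0 for s in states}
for _ in range(100):  # sweep the Bellman optimality update until converged
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in model[(s, a)])
                for a in actions)
         for s in states}

print(round(V["s0"], 2))  # value of acting optimally from s0
```

The greedy policy with respect to the converged V is the improved policy the slide refers to.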
Exploration and Exploitation
• Reinforcement learning is like trial-and-error learning.
• The agent should discover a good policy from its experiences of the
environment → without losing too much reward along the way.
• Interesting trade-off:
• immediate reward (exploitation) vs. gaining knowledge that might enable
higher future reward (exploration)
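The standard minimal mechanism for this trade-off is epsilon-greedy action selection (not specific to this lecture): with probability 𝜀 the agent explores with a random action, otherwise it exploits its current value estimates.

```python
# Epsilon-greedy action selection: explore with probability epsilon,
# otherwise exploit the action with the highest estimated value.
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: random action
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

q = [0.2, 0.9, 0.5]
print(epsilon_greedy(q, 0.0))  # epsilon = 0 always exploits -> action 1
```

Annealing 𝜀 from high to low over training shifts the agent from exploration early on to exploitation once its estimates are trustworthy.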
Exploration and Exploitation: Examples
• Restaurant Selection
• Exploitation: Go to your favorite restaurant
• Exploration: Try a new restaurant
• Oil Drilling
• Exploitation: Drill at the best-known location
• Exploration: Drill at a new location
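The restaurant example above can be simulated as a two-armed bandit. The mean "enjoyment" values and noise level below are made up for illustration; the point is that occasional exploration lets the agent discover the better restaurant instead of settling on the first decent one.

```python
# Restaurant choice as a 2-armed bandit with epsilon-greedy selection
# and incremental sample-average value estimates.
import random

random.seed(0)
true_means = [0.5, 0.8]  # hypothetical mean enjoyment of two restaurants
q = [0.0, 0.0]           # estimated value of each restaurant
n = [0, 0]               # visit counts
epsilon = 0.1

for _ in range(1000):
    if random.random() < epsilon:
        a = random.randrange(2)      # explore: try a restaurant at random
    else:
        a = q.index(max(q))          # exploit: go to the current favorite
    reward = random.gauss(true_means[a], 0.1)  # noisy experience of a visit
    n[a] += 1
    q[a] += (reward - q[a]) / n[a]   # incremental mean update

print(q.index(max(q)))  # index of the restaurant the agent now prefers
```

With 𝜀 = 0 the agent can get stuck on the first restaurant it tries; the small amount of exploration is what allows it to find the better one.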
Major Challenges of RL
• Sample efficiency
• RL algorithms require a large amount of data and experience to learn
effectively → costly and time-consuming.
• Non-stationarity of the environment
Fig: Non-stationarity of the environment: Initially (a), the red agent learns to regulate the speed of the traffic by slowing down. Over time, however, the blue agents learn to bypass the red agent (b), invalidating the red agent's previous experiences.
References
• Reinforcement Learning: An Introduction by Richard S. Sutton and
Andrew G. Barto.
• David Silver's course on reinforcement learning.