
Beyond Trial & Error: A Tutorial on Automated Reinforcement Learning

André Biedenkapp & Theresa Eimer

ECAI 2024
Santiago de Compostela, 19 October 2024

André Biedenkapp (University of Freiburg)
Theresa Eimer (University of Hannover)
2
Come to the Next COSEAL Workshop
The next COSEAL workshop will take place 5th - 7th of May 2025 in Porto.
● Algorithm selection
● Algorithm configuration
● Algorithm portfolios
● Performance predictions and empirical performance models
● Bayesian optimization
● Hyperparameter optimization
● Automated machine learning (AutoML)
● Automated reinforcement learning (AutoRL)
● Neural architecture search
● Meta-learning
● Algorithm and parameter control
● Explorative landscape analysis
● Programming by optimization
● Hyper-heuristics
COSEAL.net
3
Outline
Introduction and algorithmic part on AutoRL (André, ~60min; 9:00 - 10:00)
💪Practical Session I (André, ~30min; 10:00 - 10:30)

☕ coffee break (resume 11:00) ☕

Practical guidelines and case studies (Theresa, ~45min; 11:00 - 11:45)


💪Practical Session II: Tools for HPO in RL (Theresa, ~45min; 11:45 - 12:30)

4
Why does AutoRL matter?

5
Primer on Reinforcement Learning
Agents learn by interacting with their world

6
Primer on Reinforcement Learning
Environments/Worlds are typically modelled as Markov Decision Processes (MDPs)

An MDP consists of:

• A state space S ← Observable by the agent
• An action space A ← Known to the agent
• A transition function T: S × A × S → [0, 1] ← Intractable in practice
• A reward function R: S × A → ℝ ← Observable by the agent

(A minimal code sketch of these components follows below.)

7
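To make these components concrete, here is a minimal sketch of a tabular MDP in Python; the two-state example, the class name and all numeric values are illustrative assumptions, not part of the tutorial.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TabularMDP:
    n_states: int     # |S|, the state space size
    n_actions: int    # |A|, the action space size
    T: np.ndarray     # transition probabilities T[s, a, s'] in [0, 1], shape (S, A, S)
    R: np.ndarray     # expected rewards R[s, a], shape (S, A)

# A toy two-state, two-action MDP with made-up numbers.
toy = TabularMDP(
    n_states=2,
    n_actions=2,
    T=np.array([[[0.9, 0.1], [0.2, 0.8]],
                [[0.5, 0.5], [0.1, 0.9]]]),
    R=np.array([[0.0, 1.0],
                [1.0, 0.0]]),
)
# Sanity check: every (s, a) row of T is a probability distribution over next states.
assert np.allclose(toy.T.sum(axis=-1), 1.0)
```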
Primer on Reinforcement Learning
How can we learn by interacting with the world?

1. Initialize a policy π: S × A → [0, 1]

2. Generate rollouts by following this policy.


3. Learn from your collected transitions (s, a, r, s’):
   a. Learn a model of the value (i.e., the expected reward-to-go), V(s) or Q(s, a);
      use this model to find a better policy
   b. Directly update the policy by following a policy gradient
   c. Learn a model of the world (T’: S × A × S → [0, 1]);
      use planning to find a better policy
4. Repeat 2 & 3 until your training budget is exhausted
   (A compact code sketch of steps 1 and 2 follows below.)

8
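A compact sketch of steps 1 and 2 of this loop using the Gymnasium API (gym.make, env.reset and env.step are the library's real calls); the policy placeholder and the learning step are illustrative assumptions, not the tutorial's own code.

```python
import gymnasium as gym

env = gym.make("CartPole-v1")

def collect_rollout(env, policy, max_steps=500):
    """Step 2: generate one rollout of (s, a, r, s') transitions by following `policy`."""
    transitions = []
    obs, _ = env.reset()
    for _ in range(max_steps):
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        transitions.append((obs, action, reward, next_obs))
        obs = next_obs
        if terminated or truncated:
            break
    return transitions

# Step 1 would initialize a learnable policy; here a random placeholder stands in for it.
random_policy = lambda obs: env.action_space.sample()
rollout = collect_rollout(env, random_policy)  # steps 3-4: learn from `rollout`, then repeat
```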
Primer on Reinforcement Learning

9
Primer on Reinforcement Learning
Many well-publicised success stories

10
11
source: https://huggingface.co/learn/deep-rl-course/unit3/hands-on
12
source: https://sonyresearch.github.io/gt_sophy_public/
13
source: UZH Robotics and Perception Group YouTube channel
Reinforcement Learning is Sensitive to Hyperparameters

All previous examples worked well due to lots and lots of domain expertise!

14
Reinforcement Learning is Sensitive to Hyperparameters
RL is sensitive to:

● Hyperparameters
● Network Architecture
● Reward Scale
● Random Seeds & Trials
● Environment Type
● Codebases

15
Henderson et al., AAAI 2018

16
How to Choose a Reinforcement-Learning Algorithm
More and more “heuristics” on how to choose the right algorithm for the task [Caspi et al., GitHub 2017; Bongratz et al., arXiv 2024]

17
Properties of AutoRL Hyperparameter Landscapes

18
Learning by Iteration
Supervised Learning:
1. Initialize model
2. Fit model to data set
3. Observe model performance → compute loss
4. Adapt model based on loss
5. Repeat 2-4 until learning budget is exhausted

Reinforcement Learning:
1. Initialize policy
2. Generate observations
3. Observe policy performance → compute loss
4. Adapt policy based on loss
5. Repeat 2-4 until learning budget is exhausted

19
Reinforcement Learning by Iteration

[Sutton & Barto, MIT Press 2018]
20


Reinforcement Learning by Iteration
Q-Learning example

Parameters that are being learned:
● Model of the value function, Q(s, a)

Hyperparameters:
● Learning rate α
● Discount factor γ

(The corresponding update rule is sketched below.)

21
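For reference, the tabular Q-learning update into which these two hyperparameters enter; the NumPy sketch below is illustrative (table sizes and variable names are assumptions), not the tutorial's own code.

```python
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))  # the learned parameters: a tabular model of the value function

alpha = 0.1    # learning rate (hyperparameter)
gamma = 0.99   # discount factor (hyperparameter)

def q_update(s, a, r, s_next):
    """One Q-learning step: move Q(s, a) towards the bootstrapped target r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
```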
Properties of AutoRL Hyperparameter Landscapes
Landscape analysis of a DQN agent on the CartPole environment by [Mohan et al., AutoML 2023].

We’ll come back to this in the first practical session!
22


Properties of AutoRL Hyperparameter Landscapes

HPO for RL has to deal with much stronger non-stationarity than is the case for supervised learning!

23
AutoRL: Optimizing the Full Pipeline

24
What Can We Automate?

25
What Can We Automate? → The Role of Networks
As in supervised learning:
The architecture choice matters! (recall Henderson et al., AAAI’18)

But:
Deep RL often uses much smaller networks, particularly when the state features are “well engineered”.

This could potentially make RL the perfect playground for NAS with smaller search spaces.
27
What Can We Automate? → The Role of Networks
Recall: (Reinforcement) Learning is done by iteration

In Deep RL we are faced with the Primacy Bias (Nikishin et al., ICML 2022):
• Agents are prone to overfit to early experiences
• This often negatively affects the whole learning process

Why?
Replay Buffers & Replay Ratios: they combat training non-stationarity and increase sample efficiency.
(A minimal replay-buffer sketch follows below.)

28
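A minimal sketch of how a replay buffer and a replay ratio interact in a training loop; the class name, buffer sizes, warm-up threshold and update function are illustrative assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """FIFO store of past transitions; re-using them smooths over the non-stationary data stream."""
    def __init__(self, capacity=100_000):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size=32):
        return random.sample(list(self.storage), batch_size)  # uniform sampling for simplicity

buffer = ReplayBuffer()
REPLAY_RATIO = 4            # gradient updates per environment step
WARMUP_TRANSITIONS = 1_000

def training_step(transition, update_fn):
    """Add one environment transition, then do REPLAY_RATIO gradient updates from the buffer."""
    buffer.add(transition)
    if len(buffer.storage) >= WARMUP_TRANSITIONS:
        for _ in range(REPLAY_RATIO):
            update_fn(buffer.sample())
```

A higher replay ratio extracts more learning from each environment step (better sample efficiency), but it also multiplies the updates made on early data, which is exactly where the primacy bias discussed next enters.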
What Can We Automate? → The Role of Networks
Heavy Priming (Nikishin et al., ICML 2022):

● After collecting only 100 data points, the agent is updated 10^5 times using the resulting replay buffer
● This amounts to heavy overfitting to a single batch of early data

29
What Can We Automate? → The Role of Networks
“Is the data collected by an overfitted agent unusable for learning?” (Nikishin et al., ICML 2022)

Train the same agent twice:
● Once with an empty replay buffer
● Once starting from the final replay buffer of the first run

The collected data is of high quality and enables quick warmstarting.
30
Resetting Combats the Primacy Bias
Partial or even full resetting to the rescue! (Nikishin et al., ICML 2022)
● Periodically resetting the full network tends to harm performance
● Keeping the input layers but periodically resetting the output layers tends to drastically improve performance

When or how often should we reset?

Related idea: resetting internal parameters of optimizers such as Adam (Asadi et al., NeurIPS 2023).
(A minimal sketch of periodic output-layer resetting follows below.)
31
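A minimal sketch of the partial-resetting idea with PyTorch: keep the feature layers and periodically re-initialize the output head. The network sizes and the reset interval are arbitrary assumptions; as the slide notes, when and how often to reset is itself an open hyperparameter.

```python
import torch.nn as nn

# A small value network; the sizes are arbitrary for illustration.
q_net = nn.Sequential(
    nn.Linear(4, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 2),            # output head
)

RESET_INTERVAL = 200_000  # assumed value; choosing it is part of the AutoRL problem

def maybe_reset_head(step: int) -> None:
    """Keep the input/feature layers, but periodically re-initialize the output layer."""
    if step > 0 and step % RESET_INTERVAL == 0:
        q_net[-1].reset_parameters()
```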
What Can We Automate? → The Role of Environments
We can, e.g., optimize:
● Observation Spaces
● Reward Functions
● Action Spaces
● Task Curriculum

32
Q&A: Part 1

33
Practical Session 1 https://s.gwdg.de/kuBEC1

34
Examples of Successful AutoRL, DAC (Dynamic Algorithm Configuration) and Online Configuration Approaches

35
Prioritized Level Replay [Jiang et al. 2021]

Idea: train on levels which are most important for training performance

36
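A highly simplified sketch of the sampling idea behind PLR: keep a score per level (some measure of learning potential, e.g. recent value-loss or advantage magnitude) and sample levels with rank-based priorities. The real method additionally handles unseen levels and staleness; the array size, temperature and EMA factor below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
level_scores = np.zeros(100)  # one "learning potential" score per level

def sample_level(temperature=0.3):
    """Sample a training level; higher-scoring levels get (rank-based) higher probability."""
    ranks = np.argsort(np.argsort(-level_scores)) + 1      # rank 1 = currently most useful level
    weights = (1.0 / ranks) ** (1.0 / temperature)
    probs = weights / weights.sum()
    return rng.choice(len(level_scores), p=probs)

def update_score(level_id, new_score, ema=0.9):
    """After training on a level, refresh its score (e.g., mean |advantage| of the rollout)."""
    level_scores[level_id] = ema * level_scores[level_id] + (1 - ema) * new_score
```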
HOOF: Hyperparameter Optimisation on the Fly [Paul et al. 2019]

Idea: efficient dynamic configuration by approximating hyperparameter quality.

Loop: Partial Training → Query new configuration → Fit existing samples → Approximate cost

For each collected sample s:
Is s more likely to occur under the new configuration than before?  ×  Quality of s
(sketched in code below)
40
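The question on the slide ("is s more likely under the new configuration?" times "quality of s") corresponds to a weighted importance sampling estimate. A minimal sketch, assuming trajectories stored as (states, actions, return) and placeholder log-probability functions; none of these names come from the HOOF codebase.

```python
import numpy as np

def wis_estimate(trajectories, logp_new, logp_old):
    """
    Weighted importance sampling estimate of a candidate configuration's quality,
    using only trajectories already collected under the old (behaviour) policy.
    trajectories: iterable of (states, actions, total_return)
    logp_new / logp_old: log pi(a | s) under the candidate / behaviour policy (placeholders).
    """
    weights, returns = [], []
    for states, actions, ret in trajectories:
        # "Is s more likely to occur under the new configuration than before?"
        log_ratio = sum(logp_new(s, a) - logp_old(s, a) for s, a in zip(states, actions))
        weights.append(np.exp(log_ratio))
        returns.append(ret)                 # "... x quality of s"
    weights = np.asarray(weights, dtype=float)
    return float(np.dot(weights, returns) / (weights.sum() + 1e-8))

# HOOF-style selection: pick the candidate with the best estimate at no extra environment cost, e.g.
# best_cfg = max(candidates, key=lambda cfg: wis_estimate(trajs, cfg.logp, behaviour_logp))
```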
Population-Based Training [Jaderberg et al. 2017]

Idea: dynamic configuration via parallel trials

Loop over the population: Partial Training → Selection & Mutation
(a minimal exploit/explore sketch follows below)


41
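A minimal sketch of one PBT exploit/explore round under common assumptions (truncation selection of the bottom quartile, multiplicative hyperparameter perturbation); the dictionary keys and perturbation factors are illustrative, not the paper's exact settings.

```python
import copy
import random

def pbt_step(population, perturb=(0.8, 1.25)):
    """
    One exploit/explore round after partial training.
    `population` is a list of dicts with keys "weights", "hyperparams", "score" (placeholder names).
    """
    ranked = sorted(population, key=lambda member: member["score"], reverse=True)
    cutoff = max(1, len(ranked) // 4)
    top, bottom = ranked[:cutoff], ranked[-cutoff:]
    for member in bottom:
        parent = random.choice(top)
        member["weights"] = copy.deepcopy(parent["weights"])   # exploit: copy a strong member
        member["hyperparams"] = {                              # explore: mutate its hyperparameters
            name: value * random.choice(perturb)
            for name, value in parent["hyperparams"].items()
        }
    return population
```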
Combining AutoRL Approaches

42
In RL Everything Is Connected

• Most successful AutoRL methods are dynamic

• But: changing one aspect of the learning setting influences others

• Example 1: as we progress to harder levels with PLR, we likely need different hyperparameters and maybe even a different network

• Example 2: when the agent improves in training, we may want to not only adapt the hyperparameters with PBT, but also change the algorithm
46
Combining HPO And NAS: BG-PBT [Wan et al. 2022]

47
Evaluation and Generalization of AutoRL

48
What Is Generalization in AutoRL?

[Four figure-only slides illustrating generalization in AutoRL; images via Freepik]


Evaluating RL

Commonly we measure the return R (a standard definition is sketched after this slide).

Problem: both the environment and the policy can be stochastic.

Common practice: average across multiple rollouts
• Trade-off between extra cost and reliability of the estimate
• Some return distributions are bimodal and lead to large deviations between rollouts
• This has to be done per seed, task variation, etc.
• Usually either the mean or the interquartile mean (IQM) is used as the metric
60
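The return shown as a formula on the slides is, in the standard discounted episodic setting (an assumption about which variant the slides use):

```latex
R(\tau) = \sum_{t=0}^{T} \gamma^{t} r_t ,
\qquad
\hat{J}(\pi) = \frac{1}{N} \sum_{i=1}^{N} R(\tau_i)
```

The Monte-Carlo average over N rollouts is the estimate whose reliability the bullets above discuss.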
Evaluating AutoRL

Problem: unreliable estimates meet unstable target function meet potentially stochastic AutoRL method

The evaluation costs compound (figure on the slides; images via Freepik):
• RL evaluation cost: 1
• RL across seeds: 10
• Across AutoRL seeds: 50
• AutoRL evaluation across task variations: 50 per task
65
HPO for RL

• RL algorithms have many important hyperparameters [Eimer et al. 2023]

• Which hyperparameters matter most is highly domain dependent
  (Figure: DQN hyperparameter importance on Acrobot (left) vs. on MiniGrid 5x5 (right))

• But: in practice, hand tuning and grid search are most common

• There is little systematic work, partly due to the high computational cost

(A minimal HPO sketch follows after this slide.)
70
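A minimal sketch of what moving from grid search to automated HPO can look like, using Optuna (create_study, suggest_float and optimize are real Optuna calls); the objective below is a synthetic stand-in for an actual RL training run, so the numbers mean nothing.

```python
import optuna

def train_and_evaluate(lr: float, gamma: float) -> float:
    # Hypothetical placeholder for "train an RL agent with these hyperparameters and
    # return its mean evaluation return"; a synthetic score keeps the sketch runnable.
    return -abs(lr - 3e-4) * 1e3 - abs(gamma - 0.99) * 10

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.9999)
    return train_and_evaluate(lr, gamma)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)   # each trial would be a full (or partial) RL training run
print(study.best_params)
```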
HPO for RL

HPO for RL outperforms grid-search baselines from state-of-the-art papers!

But…

72
Seeding Matters

[Two figure-only slides on the effect of random seeds]
74
Zero-Cost Benchmarking: HPO-RL-Bench [Shala et al. 2024]

• Tabular benchmark: results for different configurations of 5 algorithms on 22 environments
• “Evaluation” is a single lookup, so essentially free
• Support for dynamic configuration schedules
• Pre-evaluated baselines

AutoML-Conf Best Paper Nominee
76
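To illustrate why a tabular benchmark makes evaluation essentially free, here is a generic lookup sketch; this is not the actual HPO-RL-Bench interface, and all entries are made up.

```python
# Purely illustrative: a tabular benchmark maps (algorithm, environment, configuration)
# to pre-computed learning curves, so "evaluating" a configuration is just a lookup.
# This is NOT the actual HPO-RL-Bench interface, and all numbers are made up.
precomputed = {
    ("PPO", "CartPole-v1", (3e-4, 0.99)): [21.0, 98.5, 195.3, 200.0],
    ("PPO", "CartPole-v1", (1e-2, 0.90)): [19.4, 35.2, 41.0, 38.7],
}

def evaluate(algorithm: str, env: str, config: tuple) -> float:
    """Zero-cost 'evaluation': return the final performance stored in the table."""
    return precomputed[(algorithm, env, config)][-1]

best_key = max(precomputed, key=lambda key: evaluate(*key))
print(best_key, evaluate(*best_key))
```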
Flexible Benchmarking: ARLBench [Becktepe, Dierkes et al., 2024]

• JAX-based RL implementations for efficient online execution
• Subset selection to find 4-5 relevant environments (from 22) for three algorithms
• Speedup factors between 7x and 12x per algorithm
• Support for dynamic configuration and flexible objectives as well as large search spaces

78
Practical Session 2 https://s.gwdg.de/xsIDyi

79
Thank You!

André Biedenkapp (University of Freiburg)
Theresa Eimer (University of Hannover)
