AutoRL Tutorials
ECAI 2024
Santiago de Compostela, 19 October 2024
André Biedenkapp (University of Freiburg)
Theresa Eimer (University of Hannover)
Come to the Next COSEAL Workshop
The next COSEAL workshop will take place 5–7 May 2025 in Porto
● Algorithm selection
● Algorithm configuration
● Algorithm portfolios
● Performance predictions and empirical performance models
● Bayesian optimization
● Hyperparameter optimization
● Automated machine learning (AutoML)
● Automated reinforcement learning (AutoRL)
● Neural architecture search
● Meta-learning
● Algorithm and parameter control
● Explorative landscape analysis
● Programming by optimization
● Hyper-heuristics
COSEAL.net
Outline
Introduction and algorithmic part on AutoRL (André, ~60min; 9:00 - 10:00)
💪 Practical Session I (André, ~30min; 10:00 - 10:30)
Why does AutoRL matter?
Primer on Reinforcement Learning
Agents learn by interacting with their world
Primer on Reinforcement Learning
Environments/Worlds are typically modelled as Markov Decision Processes (MDPs)
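For reference, the standard formalism (the notation here is ours, not from the slide): an MDP is a tuple $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$ with states $\mathcal{S}$, actions $\mathcal{A}$, transition probabilities $P(s' \mid s, a)$, reward function $R(s, a)$, and discount factor $\gamma \in [0, 1)$; the agent seeks a policy $\pi$ maximizing $\mathbb{E}_\pi\!\left[\sum_{t \ge 0} \gamma^t R(s_t, a_t)\right]$.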
Primer on Reinforcement Learning
How can we learn by interacting with the world?
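A minimal sketch of this agent-environment interaction loop, assuming the Gymnasium API and the CartPole environment (neither is prescribed by the slide):

    import gymnasium as gym

    # The loop behind all of RL: observe, act, receive a reward, repeat.
    env = gym.make("CartPole-v1")
    obs, info = env.reset(seed=0)

    for step in range(1000):
        action = env.action_space.sample()  # placeholder for a learned policy
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            obs, info = env.reset()
    env.close()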
Primer on Reinforcement Learning
Many well-publicised success stories
source: https://huggingface.co/learn/deep-rl-course/unit3/hands-on
source: https://sonyresearch.github.io/gt_sophy_public/
source: UZH Robotics and Perception Group YouTube channel
Reinforcement Learning is Sensitive to Hyperparameters
RL is sensitive to (see the sketch after this list):
● Hyperparameters
● Network Architecture
● Reward Scale
● Random Seeds & Trials
● Environment Type
● Codebases
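As a toy illustration of this sensitivity, one can train the same agent with two learning rates over a few seeds and compare returns. A hedged sketch, assuming Stable-Baselines3 PPO and CartPole (neither is named on this slide):

    import numpy as np
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    # Same algorithm, same environment; only the learning rate and the seed change.
    for lr in (3e-4, 3e-3):
        returns = []
        for seed in range(3):
            model = PPO("MlpPolicy", "CartPole-v1", learning_rate=lr, seed=seed, verbose=0)
            model.learn(total_timesteps=20_000)
            mean_ret, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
            returns.append(mean_ret)
        print(f"lr={lr}: {np.mean(returns):.1f} +/- {np.std(returns):.1f}")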
Henderson et al., AAAI 2018
How to Choose a Reinforcement-Learning Algorithm
More and more “heuristics” on how to choose the right algorithm for the task (Caspi et al., GitHub 2017; Bongratz et al., arXiv 2024)
Properties of AutoRL Hyperparameter Landscapes
Learning by Iteration
Supervised Learning vs. Reinforcement Learning
Reinforcement Learning by Iteration
Hyperparameters (see the update-rule sketch after this list):
learning rate
discount factor
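Both hyperparameters appear directly in the classic tabular Q-learning update; a minimal sketch (the state/action sizes and values here are illustrative assumptions, not from the slides):

    import numpy as np

    alpha = 0.1   # learning rate
    gamma = 0.99  # discount factor

    n_states, n_actions = 16, 4
    Q = np.zeros((n_states, n_actions))

    def q_update(s, a, r, s_next):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        td_target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (td_target - Q[s, a])

    q_update(s=0, a=1, r=1.0, s_next=2)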
Properties of AutoRL Hyperparameter Landscapes
Landscape analysis of a DQN agent on the CartPole environment by [Mohan et al., AutoML 2023].
AutoRL: Optimizing the Full Pipeline
What Can We Automate?
What Can We Automate? → The Role of Networks
As in supervised learning:
The architecture choice matters! (recall Henderson et al., AAAI’18)
But:
Deep RL often uses much smaller networks
Particularly when state features are “well engineered”
This could be the perfect playground for NAS with smaller search spaces
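A toy sketch of how small such a search space can be (the ranges and the plain random-search loop are illustrative assumptions, not a method from the slides):

    import random

    # A deliberately small architecture search space for a deep-RL policy network.
    search_space = {
        "n_layers": [1, 2, 3],
        "hidden_units": [32, 64, 128, 256],
        "activation": ["relu", "tanh"],
    }

    def sample_architecture():
        return {name: random.choice(options) for name, options in search_space.items()}

    # Plain random search over 10 candidates; each one would then be trained
    # and evaluated like any other RL run to obtain its score.
    candidates = [sample_architecture() for _ in range(10)]
    print(candidates[0])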
What Can We Automate? → The Role of Networks
Recall: (Reinforcement) Learning is done by iteration
In Deep RL we are faced with the Primacy Bias (Nikishin et al., ICML 2022):
• Agents are prone to overfit to early experiences
• This often negatively affects the whole learning process
Why?
Replay Buffers & Replay Ratios
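The replay ratio (gradient updates per environment step) is central here: the higher it is, the more the agent fits to whatever the buffer contains early on. A minimal sketch of where it enters an off-policy training loop, assuming Gymnasium, CartPole, and a dummy update step (all assumptions, not the paper's code):

    import random
    from collections import deque
    import gymnasium as gym

    replay_ratio = 4   # gradient updates per environment step
    batch_size = 32

    env = gym.make("CartPole-v1")
    buffer = deque(maxlen=10_000)

    def update(batch):
        # placeholder for a DQN / actor-critic gradient step on the sampled batch
        pass

    obs, info = env.reset(seed=0)
    for step in range(1_000):
        action = env.action_space.sample()   # placeholder policy
        next_obs, reward, terminated, truncated, info = env.step(action)
        buffer.append((obs, action, reward, next_obs, terminated))
        obs = next_obs
        if terminated or truncated:
            obs, info = env.reset()

        # Early in training the buffer only holds the very first experiences,
        # so a high replay ratio means many updates fitted to them (primacy bias).
        if len(buffer) >= batch_size:
            for _ in range(replay_ratio):
                update(random.sample(buffer, batch_size))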
What Can We Automate? → The Role of Networks
Heavy Priming: Nikishin et al., ICML 2022
What Can We Automate? → The Role of Networks
“Is the data collected by an overfitted agent unusable for learning?” (Nikishin et al., ICML 2022)
Q&A: Part 1
Practical Session 1: https://s.gwdg.de/kuBEC1
Examples of Successful AutoRL, DAC, and Online Configuration Approaches
Prioritized Level Replay [Jiang et al. 2021]
Idea: train on the levels that are most important for training performance
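A much-simplified sketch of the sampling idea (the paper's rank-based prioritization, staleness term, and value-loss scoring are omitted; names and values here are placeholders):

    import random

    level_scores = {}   # level id -> learning-potential score reported after training on it
    all_levels = list(range(200))
    p_replay = 0.5      # probability of replaying a seen level instead of trying a new one

    def next_level():
        seen = list(level_scores)
        if seen and random.random() < p_replay:
            # replay: favour levels that were most useful for learning so far
            weights = [level_scores[lvl] for lvl in seen]
            return random.choices(seen, weights=weights, k=1)[0]
        # otherwise sample a fresh (possibly unseen) level
        return random.choice(all_levels)

    def report_score(level, score):
        # called after training on a level, e.g. with the magnitude of its value loss
        level_scores[level] = score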
HOOF: Hyperparameter Optimisation on the Fly [Paul et al. 2019]
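HOOF's core trick is to reuse the trajectories just collected: apply the policy-gradient update with several candidate hyperparameter values and score each resulting candidate policy with weighted importance sampling, so no extra rollouts are needed. A rough sketch of that selection step (the policy/trajectory objects and the update function are placeholders, not the authors' code):

    import numpy as np

    def wis_value(candidate_policy, trajectories, behaviour_policy):
        # Weighted importance sampling: score the candidate policy using only
        # trajectories collected by the current (behaviour) policy.
        weights, returns = [], []
        for traj in trajectories:
            ratio = np.prod([candidate_policy.prob(s, a) / behaviour_policy.prob(s, a)
                             for s, a in traj.state_actions])
            weights.append(ratio)
            returns.append(traj.total_return)
        weights = np.asarray(weights)
        return float(np.sum(weights * np.asarray(returns)) / np.sum(weights))

    def hoof_step(policy, trajectories, candidate_lrs, gradient_update):
        # One cheap gradient update per candidate learning rate, all on the same data,
        # then greedily keep the candidate with the best off-policy value estimate.
        candidates = [gradient_update(policy, trajectories, lr) for lr in candidate_lrs]
        scores = [wis_value(c, trajectories, policy) for c in candidates]
        return candidates[int(np.argmax(scores))]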
Population-Based Training [Jaderberg et al. 2017]
Idea: dynamic configuration via parallel trials
[Diagram: population of agents → partial training → selection & mutation → …]
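A much-simplified sketch of one PBT generation (truncation-style selection plus multiplicative perturbation; the train function, quartile split, and perturbation factors are illustrative assumptions):

    import copy
    import random

    def pbt_generation(population, train, steps):
        # population: list of dicts with keys "hparams", "weights", "score";
        # train(weights, hparams, steps) -> (new_weights, score) is a placeholder.
        for member in population:                                     # partial training
            member["weights"], member["score"] = train(member["weights"],
                                                       member["hparams"], steps)
        population.sort(key=lambda m: m["score"], reverse=True)
        n = max(1, len(population) // 4)
        for loser, winner in zip(population[-n:], population[:n]):    # selection
            loser["weights"] = copy.deepcopy(winner["weights"])
            loser["hparams"] = dict(winner["hparams"])
            loser["hparams"]["lr"] *= random.choice([0.8, 1.2])       # mutation
        return population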
Combining AutoRL Approaches
In RL Everything Is Connected
• Example 2: when the agent improves in training, we may want to not only adapt the hyperparameters with PBT, but also change the algorithm
Combining HPO and NAS: BG-PBT [Wan et al. 2022]
Evaluation and Generalization of AutoRL
What Is Generalization in AutoRL?
Evaluating RL
Evaluating AutoRL
Problem: unreliable estimates meet unstable target function meet potentially stochastic AutoRL method
RL evaluation cost: 1
RL across seeds: 10
AutoRL: the cost multiplies again for every configuration (and every AutoRL seed) that has to be evaluated
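To keep the RL-level estimate as reliable as possible before AutoRL multiplies it, results are usually aggregated over several seeds with an uncertainty estimate. A minimal sketch with plain NumPy (the per-seed scores are dummy values):

    import numpy as np

    rng = np.random.default_rng(0)
    # final returns of one RL configuration across 10 training seeds (dummy data)
    scores = np.array([312.0, 287.5, 451.2, 198.4, 305.9,
                       330.1, 276.8, 402.3, 289.0, 351.7])

    # bootstrap confidence interval for the mean return across seeds
    boot_means = [rng.choice(scores, size=scores.size, replace=True).mean()
                  for _ in range(10_000)]
    low, high = np.percentile(boot_means, [2.5, 97.5])
    print(f"mean return: {scores.mean():.1f}, 95% bootstrap CI: [{low:.1f}, {high:.1f}]")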
HPO for RL
Left: DQN HP importance on Acrobot. Right: DQN HP importance on MiniGrid 5x5.
• But: in practice, hand tuning and grid search are the most common approaches
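For contrast, a minimal sketch of black-box HPO for an RL agent, assuming Optuna and Stable-Baselines3 (neither is prescribed here; the search space and budgets are illustrative):

    import optuna
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    def objective(trial):
        # sample a small hyperparameter configuration
        lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
        gamma = trial.suggest_float("gamma", 0.9, 0.9999)
        model = PPO("MlpPolicy", "CartPole-v1", learning_rate=lr, gamma=gamma, verbose=0)
        model.learn(total_timesteps=20_000)
        mean_return, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
        return mean_return  # Optuna maximizes this

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=20)
    print(study.best_params)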
But…
Seeding Matters
Zero-Cost Benchmarking: HPO-RL-Bench [Shala et al. 2024]
Flexible Benchmarking: ARLBench [Becktepe, Dierkes et al., 2024]
Practical Session 2: https://s.gwdg.de/xsIDyi
Thank You!