lect1_introRL
Reinforcement Learning
DAVIDE BACCIU – [email protected]
Preliminaries
✓ Ph.D. Students
✓ Read 3 (or more) relevant papers on a topic of interest for the course and summarize their content
in a report (6-10 pages single column, NeurIPS format)
✓ Sketch/propose a novel RL method/application: report your idea in sufficient detail in a short
paper (6-10 pages single column, NeurIPS format)
✓ Implement an RL-based application and validate it: prepare a short presentation to report the
results (10-15 slides describing the model, the implementation, and the results)
✓ Contact me to agree on an alternative (e.g. using RL in your Ph.D. project, …)
Reference Book
https://www.youtube.com/watch?v=jwSbzNHGflM
✓ The Environment:
✓ Receives action A_t
✓ Emits observation O_{t+1}
✓ Emits scalar reward R_{t+1}
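The environment's side of the interaction loop can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `CoinFlipEnvironment`, the `step` method, and the coin-guessing task are hypothetical and not taken from the course material. The point is the signature: an action A_t goes in, an observation O_{t+1} and a scalar reward R_{t+1} come out.

```python
import random
from typing import Tuple


class CoinFlipEnvironment:
    """Hypothetical toy environment: the agent guesses the outcome of a coin flip."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._coin = self._rng.randint(0, 1)  # hidden outcome for the first step

    def step(self, action: int) -> Tuple[int, float]:
        """Receive action A_t; emit observation O_{t+1} and scalar reward R_{t+1}."""
        reward = 1.0 if action == self._coin else 0.0  # scalar reward
        observation = self._coin  # reveal the coin the agent was guessing
        self._coin = self._rng.randint(0, 1)  # flip again for the next step
        return observation, reward


env = CoinFlipEnvironment(seed=0)
total_reward = 0.0
for t in range(10):
    action = 1  # a fixed, non-learning agent that always guesses 1
    observation, reward = env.step(action)
    total_reward += reward
```

The agent here never learns; a learning agent would use the stream of observations and rewards to change how it picks `action`.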
✓ Holiday planning
✓ Exploitation – The campsite you have been going to since you were born
✓ Exploration – Hitchhike and go with the flow
✓ Game Playing
✓ Exploitation – Play the move you believe is best
✓ Exploration – Play an experimental move
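One common way to trade off the two behaviours is an epsilon-greedy rule, sketched below. The slides do not name this rule at this point, so it is an illustration rather than the method being taught; the function name `epsilon_greedy` and the example value estimates are assumptions. With probability `epsilon` the agent plays an experimental move (exploration), otherwise it plays the move it currently believes is best (exploitation).

```python
import random
from typing import List


def epsilon_greedy(estimates: List[float], epsilon: float, rng: random.Random) -> int:
    """Pick an action index: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Exploration: pick any move uniformly at random.
        return rng.randrange(len(estimates))
    # Exploitation: pick the move with the highest current estimate.
    return max(range(len(estimates)), key=lambda a: estimates[a])


rng = random.Random(0)
estimates = [0.2, 0.8, 0.5]  # hypothetical value estimates for three moves
greedy_choice = epsilon_greedy(estimates, epsilon=0.0, rng=rng)  # pure exploitation
choices = [epsilon_greedy(estimates, epsilon=0.3, rng=rng) for _ in range(1000)]
```

Setting `epsilon` to 0 gives a purely exploiting agent that never tries an alternative; larger values spend more moves on options whose value is still uncertain.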
v∗: What is the optimal value function over all possible policies?
π∗: What is the optimal policy?
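The two questions can be written down compactly. The following uses the standard textbook definitions and assumes that $v_\pi(s)$ denotes the expected return obtained by starting in state $s$ and following policy $\pi$, a symbol that is not defined in this excerpt.

```latex
% Optimal value function: the best value achievable in each state over all policies
v_*(s) = \max_{\pi} v_\pi(s) \qquad \text{for all states } s

% Optimal policy: any policy that attains the optimal value in every state
\pi_* \in \arg\max_{\pi} v_\pi(s) \qquad \text{for all states } s,
\quad \text{so that } v_{\pi_*}(s) = v_*(s)
```

Under these definitions the optimal value function is unique, while several different policies may all be optimal.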