Lecture 35 & 36 - Exploration vs. Exploitation

The document outlines the agenda for lectures 35 and 36 of the AI-832 Reinforcement Learning course, focusing on the exploration vs. exploitation dilemma. It covers the multi-armed bandit problem, regret, and algorithms including greedy, optimistic initialization, and epsilon-greedy, guided by the principle of optimism in the face of uncertainty and upper confidence bounds.


AI-832 Reinforcement Learning

Instructor: Dr. Zuhair Zafar

Lecture # 35 & 36: Exploration vs. Exploitation


Recap

• Model-Based Reinforcement Learning


Today’s Agenda

• Exploration vs. Exploitation Dilemma
• Examples
• Principles
• The Multi-Armed Bandit
• Regret
• Counting Regret
• Linear or Sublinear Regret
• Greedy Algorithm
• Optimistic Initialization
• Epsilon-Greedy Algorithm
• Decaying Epsilon-Greedy Algorithm
• Lower Bound
• Optimism in the Face of Uncertainty
• Upper Confidence Bounds (see the code sketch after this list)
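
Since the slide bodies are not reproduced in this extract, here is a minimal sketch, not taken from the lecture itself, of two of the agenda's algorithms on a toy Bernoulli bandit: epsilon-greedy, which pulls a random arm with probability epsilon and otherwise exploits the arm with the best estimated value, and UCB1, one standard instantiation of the upper-confidence-bound principle (optimism in the face of uncertainty). The arm probabilities, step count, and names (pull, epsilon_greedy, ucb1) are illustrative assumptions. Empirical regret is reported as T times the best arm's mean minus the reward actually collected.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Bernoulli bandit: arm i pays 1 with probability TRUE_MEANS[i].
# These probabilities are made-up illustration values, not from the lecture.
TRUE_MEANS = np.array([0.2, 0.5, 0.7])
N_ARMS = len(TRUE_MEANS)
N_STEPS = 10_000

def pull(arm):
    """Sample a 0/1 reward from the chosen arm."""
    return float(rng.random() < TRUE_MEANS[arm])

def epsilon_greedy(epsilon=0.1):
    """Explore a random arm with probability epsilon, otherwise exploit
    the arm with the highest sample-mean estimate Q(a)."""
    counts = np.zeros(N_ARMS)   # N(a): pulls of each arm so far
    values = np.zeros(N_ARMS)   # Q(a): sample-mean reward of each arm
    total = 0.0
    for _ in range(N_STEPS):
        if rng.random() < epsilon:
            arm = int(rng.integers(N_ARMS))   # explore
        else:
            arm = int(np.argmax(values))      # exploit
        r = pull(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean
        total += r
    return total

def ucb1():
    """Always pull the arm maximising Q(a) + sqrt(2 ln t / N(a)):
    greedy with respect to an optimistic upper confidence bound."""
    counts = np.zeros(N_ARMS)
    values = np.zeros(N_ARMS)
    total = 0.0
    for t in range(1, N_STEPS + 1):
        if t <= N_ARMS:
            arm = t - 1                       # initialise: pull each arm once
        else:
            bonus = np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(values + bonus))
        r = pull(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]
        total += r
    return total

if __name__ == "__main__":
    for name, total in [("epsilon-greedy", epsilon_greedy()), ("UCB1", ucb1())]:
        regret = N_STEPS * TRUE_MEANS.max() - total   # lower is better
        print(f"{name}: empirical regret = {regret:.1f}")
```

Setting epsilon=0 recovers the pure greedy algorithm; replacing the fixed epsilon with a schedule such as epsilon_t = min(1, c/t) gives the decaying variant from the agenda, which with a suitable schedule achieves sublinear rather than linear regret.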
