Reinforcement Learning- Introduction

Uploaded by

rajputdhruvi12
Copyright
© All Rights Reserved

Reinforcement Learning

Definition

• A software agent learns to perform actions in an environment that lead it to maximum reward.
• Exploration and Exploitation
• Multiple Trials
Type of ML

Machine Learning
• Supervised
• Unsupervised
• Reinforcement

Reinforcement:
• Cause and effect
• Agent learns to interact with the environment for reward.
Intuitive example

• Imagine you are supposed to cross an unknown field in the middle of a pitch-black night without a torch.
• There can be pits and stones in the field, and their positions are unknown to you.
• There's a simple rule - if you fall into a pit or hit a stone, you must start again from your initial point.
Block Diagram
Definitions

• Agent: Entity performing actions in the environment to gain reward.
• Action (a): A move made by the agent; the set of all possible moves is the action space.
• Environment (e): Scenario faced by the agent.
• State (s): Current situation returned by the environment.
• Reward (R): An immediate return sent from the environment to evaluate the last action by the agent.
• Policy ($\pi$): Strategy that the agent employs to determine the next action based on state s.
• Value (V): The expected long-term discounted return $V_\pi(s)$ from state s under policy $\pi$, as opposed to the immediate reward R.
• Q value or action value (Q): $Q_\pi(s, a)$ is the expected long-term return from the current state s when taking action a and following policy $\pi$ afterwards.
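The terms defined above fit together in one interaction loop: the agent picks an action, and the environment returns the next state and a reward. The sketch below is only an illustration loosely based on the dark-field example. The class `FieldEnvironment`, the two actions `"step"` and `"jump"`, the pit position, and the reward values are all assumptions made for this example and do not come from the slides.

```python
import random

class FieldEnvironment:
    """Hypothetical 1-D field: start at cell 0, pit at cell 3, goal at cell 5."""

    def __init__(self):
        self.goal = 5
        self.pit = 3
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action and return (next_state, reward, done)."""
        self.state += 1 if action == "step" else 2  # "jump" moves two cells
        if self.state == self.pit:
            self.state = 0                  # rule from the example: start again
            return self.state, -1.0, False  # assumed penalty for falling in
        if self.state >= self.goal:
            return self.state, 10.0, True   # assumed reward for crossing
        return self.state, 0.0, False

def random_policy(state, rng):
    """A policy that ignores the state and explores at random."""
    return rng.choice(["step", "jump"])

env = FieldEnvironment()
rng = random.Random(0)
state = env.reset()
total_reward = 0.0
for _ in range(1000):                       # cap the episode length
    action = random_policy(state, rng)      # agent chooses an action
    state, reward, done = env.step(action)  # environment answers
    total_reward += reward
    if done:
        break
print("total reward collected:", total_reward)
```

A learning agent would replace `random_policy` with a policy that improves from the rewards it observes over multiple trials.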
Types of Reinforcement Learning

Reinforcement Learning
• Value based
• Policy based
• Model based
Value Based
• Try to maximize a value function $V(s)$:
$\max_\pi V_\pi(s)$

• The value of a state s is the total reward the agent expects to gain in the future when starting from s.
• $V_\pi(s) = \mathbb{E}_\pi\left[ R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid S_t = s \right]$
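The quantity inside the expectation is the discounted return of one trajectory. As a minimal sketch, it can be computed for a single reward sequence as follows. The function name `discounted_return` and the sample rewards are assumptions for illustration only. The value of a state would be the average of this quantity over many trajectories starting from that state.

```python
def discounted_return(rewards, gamma):
    """Return R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ... for one sequence."""
    total = 0.0
    # Work backwards so each step needs only one multiply and one add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

rewards = [1.0, 0.0, 2.0]  # hypothetical R_{t+1}, R_{t+2}, R_{t+3}
print(discounted_return(rewards, gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

With a discount factor below 1, later rewards count for less than immediate ones.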
Policy Based
• Try to produce a policy such that the action performed at each state is
optimal to gain maximum reward in the future.
• $\pi(s, a)$

• Deterministic
• At any state s, the policy $\pi$ always produces the same action a.

• Stochastic: $\pi(a \mid s) = P(A_t = a \mid S_t = s)$
• Each action has a certain probability.
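The difference between the two kinds of policy can be sketched with plain dictionaries. The state names `s0` and `s1`, the actions `left` and `right`, and the probabilities are hypothetical values chosen only for this illustration.

```python
import random

# Deterministic policy: each state maps to exactly one action.
deterministic_policy = {"s0": "right", "s1": "left"}

# Stochastic policy: each state maps to a probability for every action,
# i.e. pi(a | s) = P(A_t = a | S_t = s).
stochastic_policy = {
    "s0": {"left": 0.2, "right": 0.8},
    "s1": {"left": 0.5, "right": 0.5},
}

def act_deterministic(state):
    """Always return the same action for a given state."""
    return deterministic_policy[state]

def act_stochastic(state, rng):
    """Sample an action according to the policy's probabilities for this state."""
    probs = stochastic_policy[state]
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(0)
print(act_deterministic("s0"))    # the same action on every call
print(act_stochastic("s0", rng))  # "right" about 80% of the time
```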
Model Based

• In this type of reinforcement learning, a virtual model is created for each environment.
• The agent learns to perform in that specific environment.
• Since the model differs for each environment, there is no single solution or algorithm for this type.
Multi-arm Bandit Problem
• Consider a casino section with 10 slot machines. A sign reads: “Play for free! Max. payout is $10.”
• Each slot machine has a different average payout.

• Goal: Find which machine gives the highest average reward, so as to maximize reward in the shortest time.
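One common way to balance exploration and exploitation in this setting is the epsilon-greedy strategy, which the slides do not name. It is used here only as an illustrative sketch. The payout means, the Gaussian noise model, the clipping to the $10 maximum, and the function name `run_bandit` are all assumptions.

```python
import random

def run_bandit(true_means, steps, epsilon, seed=0):
    """Play `steps` rounds with an epsilon-greedy rule; return estimates, counts, total."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each machine was played
    estimates = [0.0] * n_arms   # running average payout per machine
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda i: estimates[i])  # exploit
        # Assumed payout model: noisy payout around the mean, clipped to [0, 10].
        reward = min(10.0, max(0.0, rng.gauss(true_means[arm], 1.0)))
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen machine.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward

# Hypothetical average payouts for the 10 machines.
true_means = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 2.5, 4.5]
estimates, counts, total = run_bandit(true_means, steps=5000, epsilon=0.1)
best = max(range(len(estimates)), key=lambda i: estimates[i])
print("machine with the highest estimated payout:", best)
print("plays per machine:", counts)
```

A larger `epsilon` spends more plays on exploration, while a smaller one commits sooner to the machine that currently looks best.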
