
Introduction to RL – Part 2

Agent Environment Loop
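The slide presumably showed the usual agent-environment interaction diagram. As a rough sketch, assuming the Gymnasium library and its CartPole-v1 environment (neither is mentioned in the lecture), one episode of the loop looks like this:

import gymnasium as gym

# One episode of the agent-environment loop: the agent picks an action,
# the environment returns the next observation and a reward.
env = gym.make("CartPole-v1")
obs, info = env.reset()

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()          # placeholder "agent": act randomly
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print("episode return:", total_reward)
env.close()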


Action Space
• The set of all valid actions in a given environment is called the action space.
• There are two types of action space (illustrated below):
– Discrete action space
• Only a finite number of actions is possible.
• For example, turning left or right.
– Continuous action space
• An infinite number of actions is possible.
• For instance, choosing a steering angle instead of simply turning left or right.
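A small illustration of the two kinds of action space, assuming the gymnasium.spaces module (an assumption, not something the slides use):

import numpy as np
from gymnasium import spaces

# Discrete action space: a finite number of actions, e.g. 0 = turn left, 1 = turn right.
turn = spaces.Discrete(2)

# Continuous action space: infinitely many actions, e.g. any steering angle in [-30, 30] degrees.
steer = spaces.Box(low=-30.0, high=30.0, shape=(1,), dtype=np.float32)

print(turn.sample())    # e.g. 0 or 1
print(steer.sample())   # e.g. [12.7]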
Rewards
• A reward R_t is a scalar feedback signal.
• It indicates how well the agent is doing at step t.
• The agent’s job is to maximize the cumulative reward, called the return (sketched in code below):
  G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots
  where \gamma is the discount factor, 0 \le \gamma \le 1.
• The return is only about the future: rewards received at or before step t do not appear in G_t.
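A minimal sketch of computing a return from a sequence of future rewards; the reward values and the discount factor gamma = 0.9 are made up for illustration:

# G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ...
def discounted_return(future_rewards, gamma=0.9):
    return sum(gamma ** k * r for k, r in enumerate(future_rewards))

rewards = [1.0, 0.0, 0.0, 5.0]            # R_{t+1}, R_{t+2}, ...
print(discounted_return(rewards))          # 1.0 + 0.9**3 * 5.0 = 4.645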


State Value
• Since the return is about the future, we cannot always know the actual return value.
• Therefore we consider the expected return.
• These values depend on the actions the agent takes.
State Value
• Now the goal is to maximize the expected return value by choosing suitable actions.
• In RL, we cannot say whether a particular action is correct or wrong (since this is not supervised learning).
• However, the sequence of actions taken from a state s to reach the goal produces the state value of s.
Sequential Decision Making
• Selecting actions to maximize total future reward
• Actions may have long-term consequences
• Reward may be delayed
• It may be better to sacrifice immediate reward to gain more long-term reward (see the sketch below)
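A made-up numeric illustration: with gamma = 0.9, an action that gives nothing now but a larger reward later can still have the higher return:

gamma = 0.9
ret = lambda rewards: sum(gamma ** k * r for k, r in enumerate(rewards))

option_a = [1.0, 0.0, 0.0, 0.0]   # +1 immediately, nothing afterwards
option_b = [0.0, 0.0, 0.0, 5.0]   # nothing now, +5 three steps later

print(ret(option_a))   # 1.0
print(ret(option_b))   # 0.9**3 * 5.0 = 3.645  -> the patient option wins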
Sequential Decision Making
• Examples:
– A financial investment (may take months to
mature)
– Refueling a helicopter (might prevent a crash in
several hours)
– Blocking opponent moves (might help winning
chances many moves from now)
Recursive Definitions – Return

G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots

G_{t+1} = R_{t+2} + \gamma R_{t+3} + \dots

Therefore

G_t = R_{t+1} + \gamma G_{t+1}
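A sketch of using the recursion G_t = R_{t+1} + \gamma G_{t+1}: the returns for a whole episode can be computed with one backward sweep over the rewards (the numbers are illustrative):

def returns_from_rewards(rewards, gamma=0.9):
    G = 0.0
    returns = []
    for r in reversed(rewards):      # start from the end of the episode
        G = r + gamma * G            # G_t = R_{t+1} + gamma * G_{t+1}
        returns.append(G)
    returns.reverse()
    return returns                   # returns[t] == G_t

print(returns_from_rewards([1.0, 0.0, 0.0, 5.0]))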
Recursive Definitions – State Value

v(s) = \mathbb{E}[G_t \mid S_t = s]
     = \mathbb{E}[R_{t+1} + \gamma G_{t+1} \mid S_t = s]

Therefore

v(s) = \mathbb{E}[R_{t+1} + \gamma v(S_{t+1}) \mid S_t = s]
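One way to read v(s) = \mathbb{E}[G_t \mid S_t = s] is as an average of observed returns. A minimal Monte Carlo sketch, assuming episodes are given as (state, reward) pairs (the data below is made up):

from collections import defaultdict

def mc_state_values(episodes, gamma=0.9):
    """Average the returns observed from each state (every-visit Monte Carlo)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for episode in episodes:
        G = 0.0
        # Walk backward so that G = R_{t+1} + gamma * G_{t+1} at every step.
        for state, reward in reversed(episode):
            G = reward + gamma * G
            totals[state] += G
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}

# Two toy episodes: from state 'A' the goal reward is reached only sometimes.
episodes = [[("A", 0.0), ("B", 1.0)],
            [("A", 0.0), ("C", 0.0)]]
print(mc_state_values(episodes))    # e.g. {'B': 1.0, 'A': 0.45, 'C': 0.0}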
Policy
• A mapping from states to actions
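A tiny illustration of a policy as a mapping from states to actions (the state and action names are made up):

# Deterministic policy written as a lookup table.
policy_table = {"low_battery": "recharge", "high_battery": "search"}

def policy(state):
    """Policy as a function: state -> action."""
    return policy_table[state]

print(policy("low_battery"))   # recharge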
Action Value
• Values associated with the action to be taken
• Here, we pin down only the first action; the future actions are not decided yet and will follow the policy.
• Therefore, q depends on the policy (see the sketch below)
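A rough Monte Carlo sketch of q(s, a): the first action is pinned to a, every later action comes from the policy pi, so the estimate depends on pi. The toy environment and policy below are invented for illustration:

def estimate_q(state, action, pi, env_step, gamma=0.9, n_rollouts=100, horizon=20):
    """Average return over rollouts that start with (state, action) and then follow pi."""
    total = 0.0
    for _ in range(n_rollouts):
        s, a, G, discount = state, action, 0.0, 1.0
        for _ in range(horizon):
            s, r, done = env_step(s, a)     # environment transition (assumed interface)
            G += discount * r
            discount *= gamma
            if done:
                break
            a = pi(s)                        # all later actions follow the policy
        total += G
    return total / n_rollouts

# Toy two-state environment: "go" from state 1 reaches the goal (reward 1).
def env_step(s, a):
    if s == 1 and a == "go":
        return 1, 1.0, True
    return 1, 0.0, False

pi = lambda s: "go"
print(estimate_q(0, "go", pi, env_step))     # 0.9 (= gamma * 1.0)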
Thank You
