
Lecture_01 - Introduction - I

Reinforcement Learning (RL) is a type of machine learning where an agent learns to perform tasks through trial and error, receiving positive or negative feedback based on its actions. Key concepts include the agent, environment, states, observations, action spaces, and policies, with the ultimate goal of maximizing cumulative rewards. RL is distinct from other learning methods due to its lack of supervision, delayed feedback, and the importance of sequential actions.

Uploaded by

attaurrehman1017
Copyright
© All Rights Reserved

Reinforcement Learning
Supervised (inductive) learning is the simplest and
most studied type of learning
How can an agent learn behaviors when it doesn’t
have a teacher to tell it how to perform?
◼ The agent has a task to perform
◼ It takes some actions in the world
◼ At some later point, it gets feedback telling it how well it did
on performing the task
◼ The agent performs the same task over and over again
This problem is called reinforcement learning:
◼ The agent gets positive reinforcement for tasks done well
◼ The agent gets negative reinforcement for tasks done poorly
Reinforcement Learning (cont.)
The goal is to get the agent to act in the
world so as to maximize its rewards
The agent has to figure out what it did that
made it get the reward/punishment
◼ This is known as the credit assignment problem
Reinforcement learning approaches can be
used to train computers to do many tasks
◼ backgammon and chess playing
◼ Autonomous cars
◼ controlling robot limbs
Characteristics of
Reinforcement Learning
What makes RL different from other
machine learning algorithms?
◼ There is no supervision, only a reward signal
◼ Feedback is delayed, not instantaneous
◼ Time really matters: the data are sequential, not i.i.d.
◼ The agent's actions affect the subsequent data it
receives
Key Concepts and Terminologies
Main characters of RL
◼ Agent
◼ Environment: World that
the agent lives in and
interacts with
◼ At every step of interaction, the agent sees an
observation of the state of the world, and
then decides on an action to take.
◼ The environment changes when the agent
acts on it, but may also change on its own.
Key Concepts and Terminologies
Main characters of RL
◼ Agent
◼ Environment
◼ Reward: Agent perceives a reward signal from
the environment, a number that tells it how
good or bad the current world state is.
◼ The goal of the agent is to maximize its
cumulative reward, called return.
Reinforcement learning methods are ways
that the agent can learn behaviors to
achieve its goal.
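The return can be made concrete with a few lines of code. The sketch below sums a reward sequence into a return; the discount factor gamma is an assumption not stated in the slides (gamma = 1 gives the plain, undiscounted sum):

```python
# Sketch: computing the return (cumulative reward) from a reward sequence.
# The discount factor gamma is an illustrative assumption; gamma = 1 gives
# the plain undiscounted sum of rewards.

def compute_return(rewards, gamma=1.0):
    """Return G = r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    g = 0.0
    for t, r in enumerate(rewards):
        g += (gamma ** t) * r
    return g

print(compute_return([1, 0, -1, 2]))        # undiscounted sum -> 2.0
print(compute_return([1, 0, -1, 2], 0.9))   # discounted: later rewards count less
```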
Key Concepts and Terminologies
To talk more specifically about what RL does,
we need to introduce additional
terminology. We need to talk about
States and observations
Action spaces
Policies
Trajectories
RL optimization problem
Value functions
Key Concepts and Terminologies
State: A state s is a complete description of the world.
No information about the world is hidden from the state.
Observation: An observation o is a partial description of
the state.
Action space: The set of all valid actions in a given
environment is often called the action space.
Some environments, like Atari and Go, have discrete
action spaces, while other environments, like one where
an agent controls a robot in the physical world, have
continuous action spaces. In continuous spaces, actions
are real-valued vectors.
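The distinction can be sketched in plain Python (no RL library); the action names and torque bounds below are illustrative assumptions:

```python
import random

# Sketch: sampling from discrete vs. continuous action spaces.
# Action names and bounds are illustrative assumptions.

# Discrete action space (Atari/Go-style): a finite set of choices.
discrete_actions = ["up", "down", "left", "right"]
a = random.choice(discrete_actions)
assert a in discrete_actions

# Continuous action space (robot-control-style): a real-valued vector,
# here two joint torques, each bounded to [-1.0, 1.0].
low, high, dim = -1.0, 1.0, 2
torques = [random.uniform(low, high) for _ in range(dim)]
assert all(low <= t <= high for t in torques)

print(a, torques)
```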
Key Concepts and Terminologies
Policy Example
[Figure: a 3×4 grid world with rows numbered 1–3 and columns 1–4; the cell at column 4, row 3 holds reward +1 and the cell at column 4, row 2 holds reward -1]
• A policy π is a complete mapping from states to actions
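Such a complete mapping can be written out as a lookup table. A minimal sketch for a small grid world like the one above (the coordinates, action choices, and blocked cell are illustrative assumptions):

```python
# A policy as a complete state -> action table for a small grid world.
# Coordinates are (column, row); the action choices are illustrative,
# not necessarily the optimal policy.

policy = {
    (1, 1): "up",   (1, 2): "up",   (1, 3): "right",
    (2, 1): "left", (2, 3): "right",
    (3, 1): "left", (3, 2): "up",   (3, 3): "right",
    (4, 1): "left",
}
# (4, 3) and (4, 2) are the +1 / -1 terminal cells, so they need no action;
# (2, 2) is assumed to be a blocked cell, as in the classic grid world.

def act(state):
    """Look up the action the policy prescribes for a state."""
    return policy[state]

print(act((1, 1)))  # -> up
```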


Formalization
Given:
◼ a state space S
◼ a set of actions a1, …, ak
◼ reward value at the end of each trial (may
be positive or negative)
Output:
◼ a mapping from states to actions

Example: ALVINN (driving agent)
◼ state: configuration of the car
◼ learn a steering action for each state
Reactive Agent Algorithm
Repeat:
◼ s ← sensed state (the accessible or observable state)
◼ If s is terminal then exit
◼ a ← choose action (given s)
◼ Perform a
RL Agent Algorithm
Repeat:
◼ s ← sensed state
◼ If s is terminal then exit
◼ a ← π(s)
◼ Perform a
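The agent loop above can be sketched with a toy environment; the corridor environment and the fixed policy below are illustrative assumptions, since the slides do not specify either:

```python
# Minimal sketch of the RL agent loop: a 5-state corridor where state 4 is
# terminal and a fixed policy always moves right. Environment and policy
# are illustrative assumptions.

def pi(s):
    """A trivial policy: always move right."""
    return "right"

def perform(s, a):
    """Environment step: move right or left along the corridor."""
    return min(4, s + 1) if a == "right" else max(0, s - 1)

s = 0                      # s <- sensed state
trajectory = [s]
while s != 4:              # if s is terminal then exit
    a = pi(s)              # a <- pi(s)
    s = perform(s, a)      # perform a, then sense the new state
    trajectory.append(s)

print(trajectory)          # -> [0, 1, 2, 3, 4]
```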
Approaches
Learn the policy directly: a function mapping
from states to actions
Learn utility values for states (i.e., the
value function)
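The second approach, learning utility values for states, can be sketched with simple value iteration on a toy chain; the environment, rewards, and discount factor below are illustrative assumptions, not from the slides:

```python
# Sketch: learning utility values for states by value iteration on a toy
# 5-state chain. States 0..4; entering terminal state 4 yields reward +1.
# Environment, rewards, and discount factor are illustrative assumptions.

GAMMA = 0.9

def step(s, a):
    """Deterministic transition: returns (next_state, reward)."""
    s2 = min(4, s + 1) if a == "right" else max(0, s - 1)
    r = 1.0 if s2 == 4 else 0.0
    return s2, r

V = [0.0] * 5                 # utility estimate per state; terminal stays 0
for _ in range(50):           # sweep until the values converge
    for s in range(4):        # skip the terminal state 4
        V[s] = max(r + GAMMA * V[s2]
                   for s2, r in (step(s, a) for a in ("left", "right")))

print([round(v, 3) for v in V])   # -> [0.729, 0.81, 0.9, 1.0, 0.0]
```

States closer to the +1 terminal state end up with higher utility, each step away discounting the value by GAMMA.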
RL Summary
Active area of research
Approaches from both OR (operations research) and AI
There are many more sophisticated
algorithms that we have not discussed
Applicable to game-playing, robot
controllers, others
