
Lecture#1 - RL An Introduction 2023

This document provides an introduction to Reinforcement Learning (RL), emphasizing its nature as a machine learning paradigm focused on learning through interaction with an environment to maximize rewards. It distinguishes RL from supervised and unsupervised learning, highlighting the importance of trial-and-error and the exploration-exploitation dilemma. Key concepts such as state space, action space, policy, reward signals, and value functions are also discussed as fundamental elements of RL systems.


Lecture #1:

Reinforcement Learning RL An
Introduction

All the lectures of this course have been prepared based primarily on Reinforcement Learning:
An Introduction (2nd ed.)
Richard S. Sutton and Andrew G. Barto
The MIT Press Cambridge, Massachusetts London, England

Prepared by: Eng. Nada Jonide, Second Semester 2023


2
Reinforcement Learning: What is Learning?

Learning takes place as a result of interaction between an agent and the world. The idea behind learning is that percepts received by an agent should be used not only for acting, but also for improving the agent's ability to behave optimally in the future to achieve its goal.

3
Reinforcement Learning: What is Reinforcement Learning (RL)?

Reinforcement learning is an area of machine learning, inspired by behaviorist psychology, concerned with how an agent can learn from interactions with an environment.
Wikipedia, Sutton and Barto (1998)

Reinforcement learning is a branch of machine learning dedicated to training agents to operate in an environment, in order to maximize their utility in the pursuit of some goals.

A fundamental challenge in artificial intelligence and machine learning is teaching a machine to make good decisions under uncertainty.

4


Many Faces of Reinforcement Learning

5
Reinforcement Learning

• The idea that we learn by interacting with our environment is probably the first to occur to us when we think about the nature of learning.

• When an infant plays, waves its arms, or looks about, it has no explicit teacher, but it does have a direct sensorimotor connection to its environment.

• Exercising this connection produces a wealth of information about cause and effect, about the consequences of actions, and about what to do in order to achieve goals.
6
Reinforcement Learning

• Throughout our lives, such interactions are undoubtedly a major source of knowledge
about our environment and ourselves.

• Whether we are learning to drive a car or to hold a conversation, we are acutely aware of
how our environment responds to what we do, and we seek to influence what
happens through our behavior.

• Learning from interaction is a foundational idea underlying nearly all theories of learning and intelligence.
7
Reinforcement Learning

• Here we are going to explore a computational approach to learning from interaction.

• The approach we explore, called reinforcement learning, is focused on goal-directed learning from interaction much more than other approaches to machine learning.

• Reinforcement learning is learning what to do [how to map situations to actions] so as to maximize a numerical reward signal.

8
Reinforcement Learning

• The learner is not told which actions to take, but instead must discover which actions yield
the most reward by trying them.

• In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards.

• These two characteristics
  • trial-and-error search
  • delayed reward
are the two most important distinguishing features of reinforcement learning.
9
Reinforcement Learning

• In particular, the distinction between problems and solution methods is very important in reinforcement learning; failing to make this distinction is the source of many confusions.

• We formalize the problem of reinforcement learning using ideas from dynamical systems theory, specifically, as the optimal control of incompletely-known Markov decision processes.

10
Reinforcement Learning

• A learning agent must be able to sense the state of its environment to some extent and must be able to take actions that affect the state.

• The agent also must have a goal or goals relating to the state of the environment.

11
Reinforcement Learning

• Markov decision processes are intended to include just these three aspects
  —sensation
  —action
  —goal
in their simplest possible forms without trivializing any of them.

• Any method that is well suited to solving such problems we consider to be a reinforcement learning method.

12
RL vs SL
RL vs USL

Reinforcement Learning is, like Supervised Learning and Unsupervised Learning, one of the main areas of Machine Learning and Artificial Intelligence.

13
Reinforcement Learning

Up till now we have learned that RL is concerned with the learning process of an arbitrary being, formally known as an Agent, in the world surrounding it, known as the Environment.

The Agent seeks to maximize the Reward it receives from the Environment, and performs different Actions in order to learn how the Environment responds and gain more rewards.

One of the greatest challenges of RL tasks is to associate actions with postponed rewards, which are rewards received by the Agent long after the reward-generating action was made.

It is therefore heavily used to solve different kinds of games, from Tic-Tac-Toe to Chess.
14
Reinforcement Learning

Reinforcement learning is different from supervised learning, the kind of learning studied in most current research in the field of machine learning. Supervised learning is learning from a training set of labeled examples provided by a knowledgeable external supervisor.

Each example is a description of a situation together with a specification (the label) of the correct action the system should take in that situation, which is often to identify a category to which the situation belongs.

The object of this kind of learning is for the system to extrapolate, or generalize, its responses so that it acts correctly in situations not present in the training set.

15
Reinforcement Learning

This is an important kind of learning, but alone it is not adequate for learning from interaction.

In interactive problems it is often impractical to obtain examples of desired behavior that are both correct and representative of all the situations in which the agent has to act.

In uncharted territory, where one would expect learning to be most beneficial, an agent must be able to learn from its own experience.
16
Reinforcement Learning

• Reinforcement learning is also different from what machine learning researchers call unsupervised learning, which is typically about finding structure hidden in collections of unlabeled data.

• The terms supervised learning and unsupervised learning would seem to exhaustively classify machine learning paradigms, but they do not.

17
Reinforcement Learning

• Although one might be tempted to think of reinforcement learning as a kind of unsupervised learning because it does not rely on examples of correct behavior, reinforcement learning is trying to maximize a reward signal instead of trying to find hidden structure.

• Uncovering structure in an agent's experience can certainly be useful in reinforcement learning, but by itself it does not address the reinforcement learning problem of maximizing a reward signal.
18
Reinforcement Learning

• One of the challenges that arise in reinforcement learning, and not in other kinds of learning, is the trade-off between exploration and exploitation.

• Exploration means that you search over the whole sample space (exploring the sample space),
while
• Exploitation means that you are exploiting the promising areas found when you did the exploration.

19
Reinforcement Learning

• To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward. But to discover such actions, it has to try actions that it has not selected before.

• The agent has to exploit what it has already experienced in order to obtain reward, but it also
has to explore in order to make better action selections in the future.
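The exploit-or-explore choice described above is commonly implemented with an ε-greedy rule: with probability ε the agent takes a random action (explore), otherwise it takes the best-looking action (exploit). A minimal sketch in Python; the action-value estimates in `q_values` are hypothetical placeholders, not part of any particular algorithm from these slides:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon (explore);
    otherwise pick the action with the highest estimated reward (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))          # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

# With epsilon = 0 the agent only exploits: it always picks action 1 here.
print(epsilon_greedy([0.1, 0.5, 0.3], epsilon=0.0))  # → 1
```

Raising ε trades reward now for information that may improve the estimates, which is exactly the dilemma the slides describe.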

20
Reinforcement Learning

• The dilemma is that neither exploration nor exploitation can be pursued exclusively without failing at the task.

• The agent must try a variety of actions and progressively favor those that appear to
be best.

• On a stochastic task, each action must be tried many times to gain a reliable estimate
of its expected reward.

21
Reinforcement Learning

• The exploration–exploitation dilemma has been intensively studied by mathematicians for many decades, yet remains unresolved.

• For now, we simply note that the entire issue of balancing exploration and exploitation does not even arise in supervised and unsupervised learning.

22
Reinforcement Learning Example

• A mobile cleaner robot decides whether it should enter a new room in search of more trash
to collect or start trying to find its way back to its battery recharging station.

• It makes its decision based on the current charge level of its battery and how quickly and
easily it has been able to find the recharger in the past.

• An adaptive controller adjusts parameters of a petroleum refinery's operation in real time.

• The controller optimizes the yield/cost/quality trade-off on the basis of specified marginal costs without sticking strictly to the set points originally suggested by engineers.

23
Typical RL Scenario

Agent: The learner and decision maker. Child, dog, robot, program, etc.

RL deals with agents that must sense and act upon their environment. This combines classical AI and machine learning techniques.

Environment: The agent's surroundings, or the things it interacts with. The agent observes the environment and decides to take an action, which changes the environment (it may also change on its own).
24
• The key idea behind Reinforcement Learning: we have an environment, which represents the outside world to the agent, and an agent that takes actions and receives observations from the environment, consisting of a reward for its action and information about its new state.
• That reward informs the agent of how good or bad the action taken was, and the observation tells it what its next state in the environment is. Its actions may also affect not only the immediate rewards but the rewards for subsequent situations.

25
Elements of Reinforcement Learning

Beyond the agent and the environment, one can identify the main sub-elements of a reinforcement learning system:

• State Space
• Action Space
• Policy
• Reward Signal
• Value Function
• Environment Model
26
Elements of Reinforcement Learning-- State Spaces

• State Space: A state space S is a complete description of the states of the world.
• When the agent is able to observe the complete state, the environment is fully observed; when only a partial state is observed, the environment is partially observed.

27
Elements of Reinforcement Learning– Action Spaces

• Action Space: The set of all valid actions the agent is able to perform in a given environment is called the action space.
• When the actions are finite, the action space is called discrete; when they are infinite, it is called continuous.
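As a small illustration of the distinction (the action names and range below are invented for this sketch): a discrete action space can be enumerated, while a continuous one can only be sampled from.

```python
import random

# Discrete action space: a finite set of actions, e.g. moves in a gridworld.
discrete_actions = ["up", "down", "left", "right"]

# Continuous action space: e.g. a steering angle in [-1.0, 1.0].
# We cannot enumerate it; we can only sample from it.
def sample_continuous(low=-1.0, high=1.0, rng=random):
    return rng.uniform(low, high)

print(len(discrete_actions))   # → 4
a = sample_continuous()
print(-1.0 <= a <= 1.0)        # → True
```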

28
Elements of Reinforcement Learning--Policy

• A policy defines the learning agent's way of behaving at a given time.

• We can say that intelligence is the capacity of the agent to select the appropriate strategy in relation to its goals. Strategy, a teleologically-oriented subset of all possible behaviors, is here connected to the idea of "policy".

• A policy is, therefore, a strategy that an agent uses in pursuit of goals. The policy dictates the actions that the agent takes as a function of the agent's state and the environment.

• Roughly speaking, a policy is a mapping from perceived states of the environment to actions to be taken when in those states.
29
Elements of Reinforcement Learning--Policy

• It corresponds to what in psychology would be called a set of stimulus–response rules or associations.

• In some cases the policy may be a simple function or lookup table, whereas in others it may involve extensive computation such as a search process.

• The policy is the core of a reinforcement learning agent in the sense that it alone is sufficient to determine behavior.

• In general, policies may be stochastic, specifying probabilities for each action.
30
Elements of Reinforcement Learning--Policy

• Policy: The agent's behavior function, or simply the agent's behavior. The policy is a mapping from perceived states of the environment to actions to be taken when in those states; in other words, it maps states to actions. There are two types of policy: deterministic and stochastic.

• A deterministic policy is a mapping π: S → A. For each state s ∈ S, it yields the action a ∈ A that the agent will choose while in state s.
• A stochastic policy is a mapping π: S × A → [0, 1]. For each state s ∈ S and action a ∈ A, it yields the probability π(a|s) that the agent chooses action a while in state s.
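The two definitions can be made concrete in a few lines of Python; the state names, actions, and probabilities below are invented for illustration only.

```python
import random

# Deterministic policy π: S → A, here as a simple lookup table.
det_policy = {"s0": "left", "s1": "right"}

# Stochastic policy π: S × A → [0, 1]; probabilities per state sum to 1.
stoch_policy = {
    "s0": {"left": 0.8, "right": 0.2},
    "s1": {"left": 0.1, "right": 0.9},
}

def act(state, rng=random):
    """Sample an action a with probability π(a|s)."""
    actions, weights = zip(*stoch_policy[state].items())
    return rng.choices(actions, weights=weights)[0]

print(det_policy["s0"])   # deterministic: always 'left' in s0
print(act("s1"))          # stochastic: 'right' about 90% of the time
```

The deterministic case is just the stochastic case with all probability mass on a single action.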

31
Elements of Reinforcement Learning--Rewards
• A reward signal defines the goal of a reinforcement learning problem.

• On each time step, the environment sends the reinforcement learning agent a single number called the reward.
• Rewards convey how "good" an agent's actions are, not what the "best" actions would have been.
• If the agent were given instructive feedback (what action it should have taken), this would be a supervised learning problem, not a reinforcement learning problem.

• The agent's sole objective is to maximize the total reward [cumulative reward] it receives over the long run.
• Thus, the reward signal defines what the good and bad events are for the agent.
32
Elements of Reinforcement Learning--Rewards
• In a biological system, we might think of rewards as analogous to the experiences of pleasure or pain.

• They are the immediate and defining features of the problem faced by the agent.

• The reward signal is the primary basis for altering the policy; if an action selected by the policy is followed by low reward, then the policy may be changed to select some other action in that situation in the future.

• In general, reward signals may be stochastic functions of the state of the environment and the actions taken.

33
Elements of Reinforcement Learning--Rewards
• Reward: A reward is a scalar feedback signal; it indicates how well the agent is doing at step t. The agent's sole objective is to maximize the total reward it receives over the long run.
• The reward signal is the primary basis for altering the policy.
• R(s) indicates the reward for simply being in state s.
• R(s, a) indicates the reward for being in state s and taking action a.
• R(s, a, s') indicates the reward for being in state s, taking action a, and ending up in state s'.
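The three signatures can be sketched as ordinary Python functions; the state names, actions, and reward values below are hypothetical, chosen only to show the shapes R(s), R(s, a), and R(s, a, s').

```python
def r_state(s):
    """R(s): reward for simply being in state s (e.g. reaching a goal)."""
    return 10.0 if s == "goal" else 0.0

def r_state_action(s, a):
    """R(s, a): reward for taking action a in state s (e.g. a cost per move)."""
    return -1.0 if a == "move" else 0.0

def r_transition(s, a, s_next):
    """R(s, a, s'): reward for the full transition s --a--> s'."""
    return r_state_action(s, a) + r_state(s_next)

print(r_transition("start", "move", "goal"))  # → 9.0 (move cost -1, goal bonus +10)
```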

34
Elements of Reinforcement Learning--values
• A value function specifies what is good in the long run of a reinforcement learning problem, whereas the reward signal indicates what is good in an immediate sense.

• Roughly speaking, the value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state.

• Whereas rewards determine the immediate, intrinsic desirability of environmental states, values indicate the long-term desirability of states after taking into account the states that are likely to follow and the rewards available in those states.

35
Elements of Reinforcement Learning--values

• Value Function: A prediction of future reward. The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state. It is used to evaluate the goodness or badness of a state. There are two types of value functions: the State Value Function and the Action Value Function.
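The "total amount of reward accumulated over the future" is usually formalized as a discounted return, G = r0 + γ·r1 + γ²·r2 + …, and a state's value is the expected value of G from that state. A minimal sketch for one fixed reward sequence (γ = 0.9 is an arbitrary choice here):

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted return G = r0 + gamma*r1 + gamma^2*r2 + ...
    Folded from the back: g_t = r_t + gamma * g_{t+1}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# A state with zero immediate reward can still be valuable
# if large rewards reliably follow:
print(round(discounted_return([0.0, 0.0, 10.0]), 4))  # → 8.1
```

A value function estimates this quantity in expectation over the policy's and environment's randomness, which is why values must be estimated rather than read off directly.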

36
Elements of Reinforcement Learning--Values

• For example, a state might always yield a low immediate reward but still have a high value because it is regularly followed by other states that yield high rewards. Or the reverse could be true.

• To make a human analogy, rewards are somewhat like pleasure (if high) and pain (if low), whereas values correspond to a more advanced and wise judgment of how pleased or displeased we are that our environment is in a particular state.

37
Elements of Reinforcement Learning--Values

• Rewards are in a sense primary, whereas values, as predictions of rewards, are secondary.

• Without rewards there could be no values, and the only purpose of estimating values is to achieve more reward. Nevertheless, it is values with which we are most concerned when making and evaluating decisions. Action choices are made based on value judgments.

• We seek actions that bring about states of highest value, not highest reward, because these actions obtain the greatest amount of reward for us over the long run.

38
Elements of Reinforcement Learning--Values

• Unfortunately, it is much harder to determine values than it is to determine rewards.

• Rewards are basically given directly by the environment, but values must be estimated and re-estimated from the sequences of observations an agent makes over its entire lifetime.

39
Elements of Reinforcement Learning--Values

• In fact, the most important component of almost all reinforcement learning algorithms we consider is a method for efficiently estimating values.

• The central role of value estimation is arguably the most important thing that has been learned about reinforcement learning over the last six decades.

40
Elements of Reinforcement Learning—Model of the environment

• The final element of some reinforcement learning systems is a model of the environment.

• This is something that mimics the behavior of the environment, or more generally, that allows inferences to be made about how the environment will behave.

• For example, given a state and action, the model might predict the resultant next state and next reward.

41
Elements of Reinforcement Learning—Model of the environment

• A model predicts what the environment will do next. Models are used for planning, by which we mean any way of deciding on a course of action by considering possible future situations before they are actually experienced.
• A Model (sometimes called a Transition Model) gives an action's effect in a state. In particular, T(S, a, S') defines a transition T where being in state S and taking an action a takes us to state S' (S and S' may be the same).
• For stochastic actions (noisy, non-deterministic) we also define a probability P(S'|S, a), which represents the probability of reaching state S' if action a is taken in state S.
42
Elements of Reinforcement Learning—Model of the environment

• Models are used for planning, by which we mean any way of deciding on a course of action by considering possible future situations before they are actually experienced.

• Methods for solving reinforcement learning problems that use models and planning are called model-based methods, as opposed to simpler model-free methods that are explicitly trial-and-error learners, viewed as almost the opposite of planning.
43
Reinforcement Learning-Conclusion

RL is a subfield of machine learning where the machine "lives" in an environment and is capable of perceiving the state of that environment as a vector of features. The machine can execute actions in every state. Different actions bring different rewards and could also move the machine to another state of the environment. The goal of a reinforcement learning algorithm is to learn a policy. A policy is a function (similar to the model in supervised learning) that takes the feature vector of a state as input and outputs an optimal action to execute in that state. The action is optimal if it maximizes the expected average reward.

Reinforcement learning solves a particular kind of problem, where decision making is sequential and the goal is long-term, such as game playing, robotics, resource management, or logistics.

Andriy Burkov _The Hundred-Page Machine Learning Book


44
