Reinforcement Learning and Robotics

Reinforcement learning and robotics are closely related fields. Reinforcement learning involves an agent learning through trial-and-error interactions with an environment, with the aim of maximizing reward. There are three main types of learning: supervised, unsupervised, and reinforcement learning. Reinforcement learning rewards or punishes an agent's actions to influence its behavior, and effective reinforcement learning balances exploration of new actions with exploitation of existing knowledge. Robotics applies these concepts through robotic agents that perceive states and take actions to achieve goals in the physical world. Common robot types include manipulators, mobile robots, and mobile manipulators that combine mobility and manipulation.

What is it?

https://www.youtube.com/watch?v=gn4nRCC9TwQ
Reinforcement learning

• Learning takes place as a result of interaction between an agent and the world. The idea behind learning is that:
  • Percepts received by an agent should be used not only for acting, but also for improving the agent's ability to behave optimally in the future to achieve the goal.
• The task of RL is to learn how to behave successfully to achieve a goal while interacting with an external environment.
• Learn via experiences!
Three types of learning

• Supervised Learning deals with predicting values or classes based on labeled data.
• Unsupervised Learning deals with clustering and finding relations in unlabeled data.
• Reinforcement Learning deals with how some arbitrary being (formally referred to as an "Agent") should act and behave in a given environment. The way it is done is by giving the Agent rewards or punishments based on the actions it has performed in different scenarios.
RL – key features

• Reinforcement learning is learning what to do (how to map situations to actions) so as to maximize a numerical reward signal.
• The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them.
• Actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards.
• These two characteristics are the most important distinguishing features of reinforcement learning:
  • trial-and-error search
  • delayed reward
Exploration vs. exploitation challenge

• To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward.
• But to discover such actions, it has to try actions that it has not selected before.
• The agent has to exploit what it already knows in order to obtain reward, but it also has to explore in order to make better action selections in the future.
• The dilemma is that neither exploration nor exploitation can be pursued exclusively without failing at the task.
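A common way to balance this trade-off in practice is epsilon-greedy action selection: with a small probability the agent explores a random action, and otherwise it exploits the best action it currently knows. A minimal Python sketch, where the Q-table, state names, and epsilon value are illustrative assumptions rather than anything prescribed by the slides:

import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    # Explore a random action with probability epsilon,
    # otherwise exploit the action with the highest known Q value.
    if random.random() < epsilon:
        return random.choice(actions)                              # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))      # exploit

# Illustrative usage with a tiny made-up Q-table
Q = {("s0", "left"): 0.2, ("s0", "right"): 0.8}
print(epsilon_greedy(Q, "s0", ["left", "right"]))                  # usually "right"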
Elements of RL

• Beyond the agent and the environment, one can identify four main subelements of a reinforcement learning system: a policy, a reward function, a value function, and, optionally, a model of the environment.
Elements of RL – policy

(mapping: states or situations → actions)

• A policy defines the learning agent's way of behaving at a given time, denoted πt.
• A policy is a mapping from perceived states of the environment to actions to be taken when in those states.
• It corresponds to a set of stimulus–response rules or associations.
• In some cases the policy may be a simple function or lookup table, whereas in others it may involve extensive computation such as a search process (see the sketch after this list).
• The policy is the core of a reinforcement learning agent in the sense that it alone is sufficient to determine behavior.
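As a concrete illustration of the "simple lookup table" case mentioned above, a policy can be stored as a plain dictionary from perceived states to actions. The state and action names here are made up purely for the example:

# Illustrative lookup-table policy: perceived state -> action
policy = {
    "battery_low":    "return_to_dock",
    "obstacle_ahead": "turn_left",
    "path_clear":     "move_forward",
}

def act(state):
    # Behave according to the policy; fall back to a do-nothing action for unknown states
    return policy.get(state, "stay")

print(act("obstacle_ahead"))   # -> turn_left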
Elements of RL – reward function

• A reward function defines the goal in a reinforcement learning problem.
• It maps each perceived state (or state–action pair) of the environment to a single number, a reward, indicating the intrinsic desirability of that state.
• A reinforcement learning agent's sole objective is to maximize the total reward it receives in the long run.
• The reward function defines what the good and bad events for the agent are.
• Reward may serve as a basis for altering the policy. For example, if an action selected by the policy is followed by low reward, then the policy may be changed to select some other action in that situation in the future.

(mapping: state or (state, action) → a single number)
Elements of RL – value function

• Whereas a reward function indicates what is good in an immediate sense, a value function specifies what is good in the long run.
• Roughly speaking, the value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state.
Values vs. rewards

• For example, a state might always yield a low immediate reward but still have a high value because it is regularly followed by other states that yield high rewards.
• We seek actions that bring about states of highest value, not highest reward.
• Unfortunately, it is much harder to determine values than it is to determine rewards.
• Rewards are basically given directly by the environment, but values must be estimated and re-estimated from the sequences of observations an agent makes over its entire lifetime. (A small numeric sketch follows this list.)
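To make the distinction concrete, here is a tiny sketch with a made-up reward sequence: the first state's own reward is low, but the total reward accumulated from it onward (roughly its value) is high because high-reward states regularly follow it:

# Made-up rewards experienced from some state s onward
rewards_from_s = [0, 10, 10, 10]

immediate_reward = rewards_from_s[0]   # 0: low immediate reward
value_of_s = sum(rewards_from_s)       # 30: high value, because of what follows
print(immediate_reward, value_of_s)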
RL – example

• Take Super Mario as an example: Mario is the Agent interacting with the world (the Environment).
• The states are exactly what we see on the screen.
• An episode is a level: the initial state is how the level begins, and the terminal state is how the level ends, whether we completed it or perished while trying.
• The actions are move forward, move backwards, jump, etc.
• Rewards are given depending on the outcome of actions: when Mario collects coins or bonuses, it receives a positive reward, and when it falls or is hit by an enemy, it receives a negative reward. When Mario just wanders around, the reward it receives is zero, as if to say "you did nothing special".
RL – example

• To be able to collect rewards, some "non-special" actions need to be taken: you have to walk towards the coins before you can collect them.
• So an Agent must learn how to handle postponed rewards by learning to link those rewards to the actions that really caused them.
Learning based on Markov process

• Assume we already know the expected reward for each action at each step.
• How will we choose an action in this case? Quite simply: we'll choose the sequence of actions that will eventually generate the highest reward.
• This cumulative reward is often referred to as the Q Value (an abbreviation of Quality Value), and we can formalize our strategy mathematically as:
Learning based on Markov process

Q(s, a) = r(s, a) + γ · max over a' of Q(s', a')
(r(s, a) is the immediate reward; the max term is the discounted expected future reward, built from the agent's own experience so far)

• The above equation states that the Q Value yielded from being at state s and selecting action a is the immediate reward received, r(s,a), plus the highest Q Value possible from state s' (which is the state we ended up in after taking action a from state s).
• We'll receive the highest Q Value from s' by choosing the action that maximizes the Q Value.
• We also introduce γ, usually called the discount factor, which controls the importance of long-term rewards versus the immediate one.
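As a small worked illustration of the discount factor (with an assumed γ = 0.9): a reward of 10 received three steps in the future contributes 0.9^3 · 10 = 7.29 to the current Q Value, while the same reward received immediately contributes the full 10. A γ close to 0 therefore makes the agent short-sighted, whereas a γ close to 1 makes it weigh long-term rewards almost as heavily as immediate ones.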
Learning based on Markov process

• How would we implement this to solve real-life challenges? One way is to draw a table that stores all possible state–action combinations and use it to save Q Values. A minimal sketch of such a table, together with an update based on the equation above, follows.
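A minimal tabular sketch of this in Python. The actions, grid-cell states, reward, learning rate, and discount value are illustrative assumptions; the update nudges the stored entry towards the quantity given by the Q Value equation above:

actions = ["up", "down", "left", "right"]
gamma, alpha = 0.9, 0.5          # assumed discount factor and learning rate

# Q-table: one entry per (state, action) combination, missing entries treated as 0
Q = {}

def update(s, a, r, s_next):
    # Move Q(s,a) towards r + gamma * max over a' of Q(s', a')
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)

# Illustrative single step: from a made-up grid cell, moving "right" yields reward 1
update(s=(0, 0), a="right", r=1.0, s_next=(0, 1))
print(Q[((0, 0), "right")])      # 0.5 after one update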
Sample robot navigation with Q-learning

• See sentral LMS


Robotics
Basic components

• Robots are physical agents that perform tasks by manipulating the physical world.
• Robots are equipped with effectors such as legs, wheels, joints, and grippers.
• Effectors have a single purpose: to assert physical forces on the environment.
• Robots are also equipped with sensors, which allow them to perceive their environment.
• Present-day robotics employs a diverse set of sensors, including cameras and lasers to measure the environment, and gyroscopes and accelerometers to measure the robot's own motion.
Robot main categories

• Most of today's robots fall into one of three primary categories:
  • manipulators
  • mobile robots
  • mobile manipulators
Manipulators

• Manipulators, or robot arms, are physically anchored to their workplace, for example in a factory assembly line or on the International Space Station.
• Manipulator motion usually involves a chain of controllable joints, enabling such robots to place their effectors in any position within the workplace.
• Manipulators are by far the most common type of industrial robots, with approximately one million units installed worldwide.
• Some mobile manipulators are used in hospitals to assist surgeons.
• Few car manufacturers could survive without robotic manipulators, and some manipulators have even been used to generate original artwork.
Manipulator

https://www.youtube.com/watch?v=sjAZGUcjrP8
Mobile robots

• Mobile robots move about their environment using wheels, legs, or similar mechanisms.
• They have been put to use delivering food in hospitals, moving containers at loading docks, and similar tasks.
• The planetary rover shown explored Mars for a period of 3 months in 1997.
Mobile robots

• Other types of mobile robots include unmanned air vehicles (UAVs), commonly used for surveillance, crop-spraying, and military operations.
Mobile robot

https://www-robotics.jpl.nasa.gov/tasks/taskVideo.cfm?TaskID=34&tdaID=2679&Video=120
Mobile manipulators

• A mobile manipulator combines mobility with manipulation.
• Humanoid robots mimic the human torso.
• Mobile manipulators can apply their effectors further afield than anchored manipulators can, but their task is made harder because they don't have the rigidity that the anchor provides.
Real robots

• Real robots must cope with environments that are partially observable, stochastic, dynamic, and continuous.
• Many robot environments are sequential and multiagent as well.
• Partial observability and stochasticity are the result of dealing with a large, complex world.
• Robot cameras cannot see around corners, and motion commands are subject to uncertainty due to gears slipping, friction, etc.
Robot hardware – sensors

• Sensors are the perceptual interface between robot and environment.
• Passive sensors, such as cameras, are true observers of the environment: they capture signals that are generated by other sources in the environment.
• Active sensors, such as sonar, send energy into the environment. They rely on the fact that this energy is reflected back to the sensor. Active sensors tend to provide more information than passive sensors, but at the expense of increased power consumption and with a danger of interference when multiple active sensors are used at the same time.
Robot hardware – sensors

• Range finders are sensors that measure the distance to nearby objects.
• Sonar sensors emit directional sound waves, which are reflected by objects, with some of the sound making it back into the sensor. The time and intensity of the returning signal indicate the distance to nearby objects (see the sketch after this list).
• Tactile sensors, such as whiskers, bump panels, and touch-sensitive skin, measure range based on physical contact and can be deployed only for sensing objects very close to the robot.
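A minimal sketch of the time-of-flight idea behind sonar range finding. The speed of sound (roughly 343 m/s in air at room temperature) and the example echo time are assumptions for illustration; the pulse travels to the object and back, hence the division by two:

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumed)

def sonar_range(echo_time_s):
    # Estimate distance to an object from the round-trip time of a sonar pulse
    return SPEED_OF_SOUND * echo_time_s / 2.0

print(sonar_range(0.01))  # a 10 ms echo corresponds to roughly 1.7 m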
Robot hardware – sensors

• Location sensors use range sensing as a primary component to determine location.
• Outdoors, the Global Positioning System (GPS) is the most common solution to the localization problem.
• GPS measures the distance to satellites that emit pulsed signals.
• Differential GPS involves a second ground receiver with known location, providing millimeter accuracy under ideal conditions.
• Unfortunately, GPS does not work indoors or underwater.
Robot hardware – sensors

• Proprioceptive sensors inform the robot of its own motion.
• To measure the exact configuration of a robotic joint, motors are often equipped with shaft decoders that count the revolution of motors in small increments. On robot arms, shaft decoders can provide accurate information over any period of time.
• On mobile robots, shaft decoders that report wheel revolutions can be used for odometry, the measurement of distance traveled (see the sketch after this list).
• Unfortunately, wheels tend to drift and slip, so odometry is accurate only over short distances.
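A minimal sketch of wheel odometry from shaft-decoder counts. The ticks-per-revolution and wheel-diameter values are made-up assumptions; the point is simply that distance traveled is revolutions times wheel circumference, which is why slip and drift corrupt the estimate:

import math

TICKS_PER_REV = 512      # assumed shaft-decoder resolution
WHEEL_DIAMETER_M = 0.10  # assumed wheel diameter (10 cm)

def odometry_distance(tick_count):
    # Distance traveled = wheel revolutions * wheel circumference
    revolutions = tick_count / TICKS_PER_REV
    return revolutions * math.pi * WHEEL_DIAMETER_M

print(odometry_distance(5120))  # 10 revolutions -> about 3.14 m (assuming no slip)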
Robot hardware – effectors

• Effectors are the means by which robots move and change the shape of their bodies.
• We count one degree of freedom for each independent direction in which a robot, or one of its effectors, can move.
• For example, a rigid mobile robot such as an AUV has six degrees of freedom: three for its (x, y, z) location in space and three for its angular orientation, known as yaw, roll, and pitch.
• These six degrees define the kinematic state or pose of the robot (see the sketch after this list).
• The dynamic state of a robot includes these six plus an additional six dimensions for the rate of change of each kinematic dimension, that is, their velocities.
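A minimal sketch of these two state representations as data structures; the field names are illustrative, chosen to mirror the six kinematic dimensions and their rates of change:

from dataclasses import dataclass

@dataclass
class Pose:
    # Kinematic state: position in space plus angular orientation (6 degrees of freedom)
    x: float
    y: float
    z: float
    yaw: float
    roll: float
    pitch: float

@dataclass
class DynamicState:
    # Dynamic state: the pose plus the rate of change (velocity) of each kinematic dimension
    pose: Pose
    velocities: Pose  # the same six dimensions, interpreted here as velocities

start = DynamicState(pose=Pose(0, 0, 0, 0, 0, 0), velocities=Pose(0.5, 0, 0, 0, 0, 0))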
Robot hardware – power

• Sensors and effectors alone do not make a robot.
• A complete robot also needs a source of power to drive its effectors.
• The electric motor is the most popular mechanism for both manipulator actuation and locomotion, but pneumatic actuation using compressed gas and hydraulic actuation using pressurized fluids also have their application niches.
Robot perception

• Perception is the process by which robots map sensor measurements into internal representations of the environment.
• Perception is difficult because sensors are noisy, and the environment is partially observable, unpredictable, and often dynamic.
• Good internal representations for robots have three properties:
  • they contain enough information for the robot to make good decisions,
  • they are structured so that they can be updated efficiently,
  • and they are natural in the sense that internal variables correspond to natural state variables in the physical world.
