Reinforcement Learning and Robotics

Reinforcement learning and robotics are closely related fields. Reinforcement learning involves an agent learning through trial-and-error interactions with an environment, with the aim of maximizing reward. There are three main types of learning: supervised, unsupervised, and reinforcement learning. Reinforcement learning rewards or punishes an agent's actions to influence its behavior, and effective reinforcement learning balances exploration of new actions with exploitation of existing knowledge. Robotics applies these concepts through robotic agents that perceive states and take actions to achieve goals in the physical world. Common robot types include manipulators, mobile robots, and mobile manipulators that combine mobility and manipulation.

What is it?

https://www.youtube.com/watch?v=gn4nRCC9TwQ
Reinforcement learning

• Learning takes place as a result of interaction between an agent and the world. The idea behind learning is that:
  • Percepts received by an agent should be used not only for acting, but also for improving the agent's ability to behave optimally in the future to achieve the goal.
• The task of RL is to learn how to behave successfully to achieve a goal while interacting with an external environment.
• Learn via experiences!
Three types of learning

• Supervised Learning deals with predicting values or classes based on labeled data.
• Unsupervised Learning deals with clustering and finding relations in unlabeled data.
• Reinforcement Learning deals with how some arbitrary being (formally referred to as an "Agent") should act and behave in a given environment. The way it is done is by giving the Agent rewards or punishments based on the actions it has performed in different scenarios.
RL – key features

• Reinforcement learning is learning what to do (how to map situations to actions) so as to maximize a numerical reward signal.
• The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them.
• Actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards.
• These two characteristics are the most important distinguishing features of reinforcement learning:
  • trial-and-error search
  • delayed reward
Exploration vs. exploitation challenge

• To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward.
• But to discover such actions, it has to try actions that it has not selected before.
• The agent has to exploit what it already knows in order to obtain reward, but it also has to explore in order to make better action selections in the future.
• The dilemma is that neither exploration nor exploitation can be pursued exclusively without failing at the task.
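A common way to balance this trade-off in practice is epsilon-greedy action selection: with a small probability the agent explores a random action, and otherwise it exploits the best action it currently knows. A minimal Python sketch, where the Q-table, state names, and epsilon value are illustrative assumptions rather than anything prescribed by the slides:

import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    # Explore a random action with probability epsilon,
    # otherwise exploit the action with the highest known Q value.
    if random.random() < epsilon:
        return random.choice(actions)                              # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))      # exploit

# Illustrative usage with a tiny made-up Q-table
Q = {("s0", "left"): 0.2, ("s0", "right"): 0.8}
print(epsilon_greedy(Q, "s0", ["left", "right"]))                  # usually "right"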
Elements of RL

• Beyond the agent and the environment, one can identify four main subelements of a reinforcement learning system: a policy, a reward function, a value function, and, optionally, a model of the environment.
Elements of RL – policy

(mapping: states or situations → actions)

• A policy defines the learning agent's way of behaving at a given time, denoted πt.
• A policy is a mapping from perceived states of the environment to actions to be taken when in those states.
• It corresponds to a set of stimulus–response rules or associations.
• In some cases the policy may be a simple function or lookup table, whereas in others it may involve extensive computation such as a search process (see the sketch after this list).
• The policy is the core of a reinforcement learning agent in the sense that it alone is sufficient to determine behavior.
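As a concrete illustration of the "simple lookup table" case mentioned above, a policy can be stored as a plain dictionary from perceived states to actions. The state and action names here are made up purely for the example:

# Illustrative lookup-table policy: perceived state -> action
policy = {
    "battery_low":    "return_to_dock",
    "obstacle_ahead": "turn_left",
    "path_clear":     "move_forward",
}

def act(state):
    # Behave according to the policy; fall back to a do-nothing action for unknown states
    return policy.get(state, "stay")

print(act("obstacle_ahead"))   # -> turn_left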
Elements of RL – reward function

• A reward function defines the goal in a reinforcement learning problem.
• It maps each perceived state (or state–action pair) of the environment to a single number, a reward, indicating the intrinsic desirability of that state.
• A reinforcement learning agent's sole objective is to maximize the total reward it receives in the long run.
• The reward function defines what the good and bad events for the agent are.
• Reward may serve as a basis for altering the policy. For example, if an action selected by the policy is followed by low reward, then the policy may be changed to select some other action in that situation in the future.

(mapping: state or (state, action) → a single number)
Elements of RL – value function

• Whereas a reward function indicates what is good in an immediate sense, a value function specifies what is good in the long run.
• Roughly speaking, the value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state.
Values vs. rewards

• For example, a state might always yield a low immediate reward but still have a high value because it is regularly followed by other states that yield high rewards.
• We seek actions that bring about states of highest value, not highest reward.
• Unfortunately, it is much harder to determine values than it is to determine rewards.
• Rewards are basically given directly by the environment, but values must be estimated and re-estimated from the sequences of observations an agent makes over its entire lifetime. (A small numeric sketch follows this list.)
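To make the distinction concrete, here is a tiny sketch with a made-up reward sequence: the first state's own reward is low, but the total reward accumulated from it onward (roughly its value) is high because high-reward states regularly follow it:

# Made-up rewards experienced from some state s onward
rewards_from_s = [0, 10, 10, 10]

immediate_reward = rewards_from_s[0]   # 0: low immediate reward
value_of_s = sum(rewards_from_s)       # 30: high value, because of what follows
print(immediate_reward, value_of_s)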
RL – example

• Take Super Mario as an example: Mario is the Agent interacting with the world (the Environment).
• The states are exactly what we see on the screen.
• An episode is a level: the initial state is how the level begins, and the terminal state is how the level ends, whether we completed it or perished while trying.
• The actions are move forward, move backwards, jump, etc.
• Rewards are given depending on the outcome of actions: when Mario collects coins or bonuses, it receives a positive reward, and when it falls or is hit by an enemy, it receives a negative reward. When Mario just wanders around, the reward it receives is zero, as if to say "you did nothing special".
RL – example

• To be able to collect rewards, some "non-special" actions need to be taken: you have to walk towards the coins before you can collect them.
• So an Agent must learn how to handle postponed rewards by learning to link those rewards to the actions that really caused them.
Learning based on Markov process

• Assume we already know the expected reward for each action at each step.
• How will we choose an action in this case? Quite simply: we'll choose the sequence of actions that will eventually generate the highest reward.
• This cumulative reward is often referred to as the Q Value (an abbreviation of Quality Value), and we can formalize our strategy mathematically as:
Learning based on Markov process

Q(s, a) = r(s, a) + γ · max over a' of Q(s', a')
(r(s, a) is the immediate reward; the max term is the discounted expected future reward, built from the agent's own experience so far)

• The above equation states that the Q Value yielded from being at state s and selecting action a is the immediate reward received, r(s,a), plus the highest Q Value possible from state s' (which is the state we ended up in after taking action a from state s).
• We'll receive the highest Q Value from s' by choosing the action that maximizes the Q Value.
• We also introduce γ, usually called the discount factor, which controls the importance of long-term rewards versus the immediate one.
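As a small worked illustration of the discount factor (with an assumed γ = 0.9): a reward of 10 received three steps in the future contributes 0.9^3 · 10 = 7.29 to the current Q Value, while the same reward received immediately contributes the full 10. A γ close to 0 therefore makes the agent short-sighted, whereas a γ close to 1 makes it weigh long-term rewards almost as heavily as immediate ones.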
Learning based on Markov process

• How would we implement this to solve real-life challenges? One way is to draw a table that stores all possible state–action combinations and use it to save Q Values. A minimal sketch of such a table, together with an update based on the equation above, follows.
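A minimal tabular sketch of this in Python. The actions, grid-cell states, reward, learning rate, and discount value are illustrative assumptions; the update nudges the stored entry towards the quantity given by the Q Value equation above:

actions = ["up", "down", "left", "right"]
gamma, alpha = 0.9, 0.5          # assumed discount factor and learning rate

# Q-table: one entry per (state, action) combination, missing entries treated as 0
Q = {}

def update(s, a, r, s_next):
    # Move Q(s,a) towards r + gamma * max over a' of Q(s', a')
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)

# Illustrative single step: from a made-up grid cell, moving "right" yields reward 1
update(s=(0, 0), a="right", r=1.0, s_next=(0, 1))
print(Q[((0, 0), "right")])      # 0.5 after one update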
Sample robot navigation with Q-learning

• See sentral LMS


Robotics
Basic components

• Robots are physical agents that perform tasks by manipulating the physical world.
• Robots are equipped with effectors such as legs, wheels, joints, and grippers.
• Effectors have a single purpose: to assert physical forces on the environment.
• Robots are also equipped with sensors, which allow them to perceive their environment.
• Present-day robotics employs a diverse set of sensors, including cameras and lasers to measure the environment, and gyroscopes and accelerometers to measure the robot's own motion.
Robot main categories

• Most of today's robots fall into one of three primary categories:
  • manipulators
  • mobile robots
  • mobile manipulators
Manipulators

• Manipulators, or robot arms, are physically anchored to their workplace, for example in a factory assembly line or on the International Space Station.
• Manipulator motion usually involves a chain of controllable joints, enabling such robots to place their effectors in any position within the workplace.
• Manipulators are by far the most common type of industrial robots, with approximately one million units installed worldwide.
• Some mobile manipulators are used in hospitals to assist surgeons.
• Few car manufacturers could survive without robotic manipulators, and some manipulators have even been used to generate original artwork.
Manipulator

https://www.youtube.com/watch?v=sjAZGUcjrP8
Mobile robots

• Mobile robots move about their environment using wheels, legs, or similar mechanisms.
• They have been put to use delivering food in hospitals, moving containers at loading docks, and similar tasks.
• The planetary rover shown explored Mars for a period of 3 months in 1997.
Mobile robots

• Other types of mobile robots include unmanned air vehicles (UAVs), commonly used for surveillance, crop-spraying, and military operations.
Mobile robot

https://www-robotics.jpl.nasa.gov/tasks/taskVideo.cfm?TaskID=34&tdaID=2679&Video=120
Mobile manipulators

• A mobile manipulator combines mobility with manipulation.
• Humanoid robots mimic the human torso.
• Mobile manipulators can apply their effectors further afield than anchored manipulators can, but their task is made harder because they don't have the rigidity that the anchor provides.
Real robots

• Real robots must cope with environments that are partially observable, stochastic, dynamic, and continuous.
• Many robot environments are sequential and multiagent as well.
• Partial observability and stochasticity are the result of dealing with a large, complex world.
• Robot cameras cannot see around corners, and motion commands are subject to uncertainty due to gears slipping, friction, etc.
Robot hardware – sensors

• Sensors are the perceptual interface between robot and environment.
• Passive sensors, such as cameras, are true observers of the environment: they capture signals that are generated by other sources in the environment.
• Active sensors, such as sonar, send energy into the environment. They rely on the fact that this energy is reflected back to the sensor. Active sensors tend to provide more information than passive sensors, but at the expense of increased power consumption and with a danger of interference when multiple active sensors are used at the same time.
Robot hardware – sensors

• Range finders are sensors that measure the distance to nearby objects.
• Sonar sensors emit directional sound waves, which are reflected by objects, with some of the sound making it back into the sensor. The time and intensity of the returning signal indicate the distance to nearby objects (see the sketch after this list).
• Tactile sensors, such as whiskers, bump panels, and touch-sensitive skin, measure range based on physical contact and can be deployed only for sensing objects very close to the robot.
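A minimal sketch of the time-of-flight idea behind sonar range finding. The speed of sound (roughly 343 m/s in air at room temperature) and the example echo time are assumptions for illustration; the pulse travels to the object and back, hence the division by two:

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumed)

def sonar_range(echo_time_s):
    # Estimate distance to an object from the round-trip time of a sonar pulse
    return SPEED_OF_SOUND * echo_time_s / 2.0

print(sonar_range(0.01))  # a 10 ms echo corresponds to roughly 1.7 m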
Robot hardware – sensors

• Location sensors use range sensing as a primary component to determine location.
• Outdoors, the Global Positioning System (GPS) is the most common solution to the localization problem.
• GPS measures the distance to satellites that emit pulsed signals.
• Differential GPS involves a second ground receiver with known location, providing millimeter accuracy under ideal conditions.
• Unfortunately, GPS does not work indoors or underwater.
Robot hardware – sensors

• Proprioceptive sensors inform the robot of its own motion.
• To measure the exact configuration of a robotic joint, motors are often equipped with shaft decoders that count the revolution of motors in small increments. On robot arms, shaft decoders can provide accurate information over any period of time.
• On mobile robots, shaft decoders that report wheel revolutions can be used for odometry, the measurement of distance traveled (see the sketch after this list).
• Unfortunately, wheels tend to drift and slip, so odometry is accurate only over short distances.
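A minimal sketch of wheel odometry from shaft-decoder counts. The ticks-per-revolution and wheel-diameter values are made-up assumptions; the point is simply that distance traveled is revolutions times wheel circumference, which is why slip and drift corrupt the estimate:

import math

TICKS_PER_REV = 512      # assumed shaft-decoder resolution
WHEEL_DIAMETER_M = 0.10  # assumed wheel diameter (10 cm)

def odometry_distance(tick_count):
    # Distance traveled = wheel revolutions * wheel circumference
    revolutions = tick_count / TICKS_PER_REV
    return revolutions * math.pi * WHEEL_DIAMETER_M

print(odometry_distance(5120))  # 10 revolutions -> about 3.14 m (assuming no slip)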
Robot hardware – effectors

• Effectors are the means by which robots move and change the shape of their bodies.
• We count one degree of freedom for each independent direction in which a robot, or one of its effectors, can move.
• For example, a rigid mobile robot such as an AUV has six degrees of freedom: three for its (x, y, z) location in space and three for its angular orientation, known as yaw, roll, and pitch.
• These six degrees define the kinematic state or pose of the robot (see the sketch after this list).
• The dynamic state of a robot includes these six plus an additional six dimensions for the rate of change of each kinematic dimension, that is, their velocities.
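A minimal sketch of these two state representations as data structures; the field names are illustrative, chosen to mirror the six kinematic dimensions and their rates of change:

from dataclasses import dataclass

@dataclass
class Pose:
    # Kinematic state: position in space plus angular orientation (6 degrees of freedom)
    x: float
    y: float
    z: float
    yaw: float
    roll: float
    pitch: float

@dataclass
class DynamicState:
    # Dynamic state: the pose plus the rate of change (velocity) of each kinematic dimension
    pose: Pose
    velocities: Pose  # the same six dimensions, interpreted here as velocities

start = DynamicState(pose=Pose(0, 0, 0, 0, 0, 0), velocities=Pose(0.5, 0, 0, 0, 0, 0))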
Robot hardware – power

• Sensors and effectors alone do not make a robot.
• A complete robot also needs a source of power to drive its effectors.
• The electric motor is the most popular mechanism for both manipulator actuation and locomotion, but pneumatic actuation using compressed gas and hydraulic actuation using pressurized fluids also have their application niches.
Robot perception

• Perception is the process by which robots map sensor measurements into internal representations of the environment.
• Perception is difficult because sensors are noisy, and the environment is partially observable, unpredictable, and often dynamic.
• Good internal representations for robots have three properties:
  • they contain enough information for the robot to make good decisions,
  • they are structured so that they can be updated efficiently,
  • and they are natural in the sense that internal variables correspond to natural state variables in the physical world.
