Reinforcement Learning and Robotics
What is it?
https://www.youtube.com/watch?v=gn4nRCC9TwQ
Reinforcement learning
Beyond the agent and the environment, one can identify four main
subelements of a reinforcement learning system: a policy, a reward
function, a value function, and, optionally, a model of the environment.
Elements of RL: policy, states or situations, actions
Q(s,a) = r(s,a) + γ max_a' Q(s',a')

The above equation states that the Q value yielded from being at state s
and selecting action a is the immediate reward received, r(s,a), plus the
highest Q value attainable from state s' (the state we end up in after
taking action a from state s).
We receive the highest Q value from s' by choosing the action a' that
maximizes Q(s',a').
We also introduce γ, usually called the discount factor, which controls
the importance of long-term rewards versus the immediate one.
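For instance, with an immediate reward r(s,a) = 1, a discount factor γ = 0.9,
and a best attainable next-state value max_a' Q(s',a') = 5 (illustrative
numbers, not from the slides), the equation gives Q(s,a) = 1 + 0.9 × 5 = 5.5;
with γ = 0 the same state-action pair would be valued only at its immediate
reward, 1.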
Learning based on a Markov process
How would we implement this to solve real-life challenges? One way is to draw a table
that stores all possible state-action combinations and use it to save the Q values.
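As a concrete illustration, here is a minimal sketch of tabular Q-learning
on a toy 4x4 grid world. The environment, reward scheme, and hyperparameters
below (ALPHA, GAMMA, EPSILON) are illustrative assumptions, not part of the
slides.

import random
from collections import defaultdict

N = 4                                          # grid is N x N
GOAL = (3, 3)                                  # reaching this cell ends an episode
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1          # learning rate, discount, exploration

# The "table" from the slide: one Q value per (state, action) pair,
# defaulting to 0.0 for unseen pairs.
Q = defaultdict(float)

def step(state, action):
    """Apply an action, clip to the grid, and return (next_state, reward)."""
    x, y = state
    dx, dy = action
    nxt = (min(max(x + dx, 0), N - 1), min(max(y + dy, 0), N - 1))
    return nxt, (1.0 if nxt == GOAL else 0.0)

for episode in range(500):
    state = (0, 0)
    for _ in range(100):                       # cap episode length
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward = step(state, action)
        # Q-learning update: move Q(s,a) toward r + γ max_a' Q(s',a')
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt
        if state == GOAL:
            break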
Sample robot navigation with Q-learning
https://www.youtube.com/watch?v=sjAZGUcjrP8
Mobile robot
Mobile robots move about their environment using wheels,
legs, or similar mechanisms.
They have been put to use delivering food in hospitals, moving
containers at loading docks, and similar tasks.
The Sojourner planetary rover explored Mars for a period of three
months in 1997.
Mobile robot
Other types of mobile robot include unmanned aerial vehicles (UAVs),
commonly used for surveillance, crop-spraying, and military operations.
Mobile robot
https://www-robotics.jpl.nasa.gov/tasks/taskVideo.cfm?TaskID=34&tdaID=2679&Video=120
Mobile manipulator
Real robots must cope with environments that are partially observable,
stochastic, dynamic, and continuous.
Many robot environments are sequential and multiagent as well.
Partial observability and stochasticity are the result of dealing with a large,
complex world.
Robot cameras cannot see around corners, and motion commands are
subject to uncertainty due to gears slipping, friction, etc.
Robot hardware - sensors
Sensors are the perceptual interface between the robot and its
environment.
Passive sensors, such as cameras, are true observers of the
environment: they capture signals that are generated by
other sources in the environment.
Active sensors, such as sonar, send energy into the
environment. They rely on the fact that this energy is reflected
back to the sensor. Active sensors tend to provide more
information than passive sensors, but at the expense of
increased power consumption and with a danger of
interference when multiple active sensors are used at the
same time.
Robot hardware - sensors
Range finders are sensors that measure the distance to nearby objects.
Sonar sensors emit directional sound waves, which are reflected by objects, with
some of the sound making it back into the sensor.
The time and intensity of the returning signal indicate the distance
to nearby objects (a rough time-of-flight conversion is sketched below).
Tactile sensors, such as whiskers, bump panels, and touch-sensitive
skin, measure range based on physical contact and can be deployed
only for sensing objects very close to the robot.
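As a rough illustration of the time-of-flight idea behind sonar ranging,
here is a minimal sketch; the speed-of-sound constant and the function
name are assumptions for this example, not from the slides.

SPEED_OF_SOUND = 343.0  # meters per second in dry air at about 20 °C (assumed)

def echo_to_distance(round_trip_seconds: float) -> float:
    """The pulse travels to the object and back, so halve the path length."""
    return SPEED_OF_SOUND * round_trip_seconds / 2.0

# Example: an echo returning after 5.83 ms implies an object about 1 m away.
print(echo_to_distance(0.00583))  # ~1.0 (meters)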
Robot hardware - effectors
Effectors are the means by which robots move and change the
shape of their bodies.
We count one degree of freedom for each independent direction
in which a robot, or one of its effectors, can move.
For example, a rigid mobile robot such as an AUV (autonomous underwater
vehicle) has six degrees of freedom: three for its (x, y, z) location in
space and three for its angular orientation, known as yaw, roll, and pitch.
These six degrees define the kinematic state or pose of the robot.
The dynamic state of a robot includes these six plus an additional six
dimensions for the rate of change of each kinematic dimension,
that is, their velocities.
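To make the distinction between kinematic and dynamic state concrete,
here is a minimal sketch of the two representations; the class and field
names are illustrative assumptions, not from the slides.

from dataclasses import dataclass

# Kinematic state (pose): six degrees of freedom for a rigid free-moving body.
@dataclass
class Pose:
    x: float       # location in space (meters)
    y: float
    z: float
    yaw: float     # angular orientation (radians)
    roll: float
    pitch: float

# Dynamic state: the six pose dimensions plus their rates of change.
@dataclass
class DynamicState:
    pose: Pose
    vx: float          # linear velocities (m/s)
    vy: float
    vz: float
    yaw_rate: float    # angular velocities (rad/s)
    roll_rate: float
    pitch_rate: float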
Robot hardware - power