AI in Robots

BY

MOHAMMED E. EL-TELBANY
Electronics Research Institute
Computers and Systems Department
TABLE OF CONTENTS
Table of Contents
Preface
Chapter I
Intelligence in Autonomous Robots
1.1 Mobile Robots
1.2 Mobile Robots Applications
1.3 Intelligence in Robotics
1.4 Mobile Robot Sensors
1.4.1 Tactile Sensors
1.4.2 Infrared Sensors
1.4.3 Sonar Sensors
1.4.4 Compass Sensors
1.4.5 Shaft Encoder
1.4.6 Vision Sensors
1.4.7 Sensors Fusion
1.5 Mobile Robot Actuators
1.6 Assignments
Chapter II
Behavior-Based Control
2.1 Behavior-Based Robotics
2.2 Subsumption Control Architecture
2.3 Potential Fields
2.4 Schema-Based Reactive Control
2.5 Temporal Sequencing
2.6 Motor Schema Versus Subsumption Architecture
2.7 Other Architectures
2.7.1 Colony Architecture
2.7.2 DAMN Architecture
2.8 Deliberative Control
2.9 Inspiration Sources of Behaviours
2.10 Assignments
Chapter III
Behaviors Design and Programming
3.1 Behavior Encoding
3.2 Concurrent Behaviors Combination
3.3 Subsumption Architecture Behaviors Programming
3.4 Behaviors and Schema Theory
PREFACE
These notes discuss the artificial intelligence approaches implemented in the field of mobile robots. The notes were motivated by the desire to educate students interested in mobile robots and artificial intelligence. They attempt to provide both a foundation and an appreciation for mobile robotics, while at the same time putting newer trends in machine learning techniques in perspective, especially reinforcement learning.
Mohammed E. El-Telbany
CHAPTER I
INTELLIGENCE IN AUTONOMOUS ROBOTS
presented, with examples of the kind of readings that can be obtained from them and their most common applications.
For example, in industry, many different transport systems are established. Besides conveyors and forklifts, other automated transporters called AGVs (Automated Guided Vehicles) carry parts to different stations in production halls. Figure 1.1 shows such a system where the AGVs act automatically but not autonomously. Usually, they follow marked paths to find their goal points with high accuracy, but this precision comes at the cost of autonomy. If one looks for autonomous systems that can perform transportation tasks, very often behavior-based systems can be found. Figure 1.2 shows another robot with a manipulator arm and a payload of about 100 kg. Again, this system is controlled with a behavior-based approach.
Figure 1.2: A mobile robot with arm and a high payload of about 100 kg.
Figure 1.3 shows a service robot as a last example. Such a service task does not require millimeter accuracy. In fact, these systems need a reactive behavior of the kind "the cup is not above the table yet, so move a bit forward".
Figure 1.4: Tactile sensors. (a) A microswitch being used as a short-range digital touch sensor. (b) A whisker and microswitch used as a long-range touch sensor.
time is measured. As the speed of sound is known the distance to the object that reflected
the pulse can be computed from the time that elapsed between emission and reception,
using Equation 1.1.
d = (1/2) v t,        (1.1)

with d being the distance to the nearest object within the sonar cone, v the speed of the emitted signal in the medium (i.e., 344 m/s for sound in air at 20 °C), and t the time elapsed between emission of a pulse and the reception of its echo, as shown in Figure 1.5.
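As a simple illustration of Equation (1.1), the following Java sketch converts a measured round-trip echo time into a distance estimate. The class and method names are illustrative only and are not taken from any particular robot library.

public class SonarRange {
    // Speed of sound in air at about 20 °C, in metres per second.
    static final double SPEED_OF_SOUND = 344.0;

    // Equation (1.1): d = (1/2) * v * t, with t the round-trip echo time in seconds.
    static double distanceFromEchoTime(double echoTimeSeconds) {
        return 0.5 * SPEED_OF_SOUND * echoTimeSeconds;
    }

    public static void main(String[] args) {
        double t = 0.0058; // an echo received after 5.8 ms
        System.out.printf("Estimated distance: %.3f m%n", distanceFromEchoTime(t));
        // prints roughly 0.998 m
    }
}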
There are a number of uncertainties associated with readings from sonar sensors. Firstly, the sensitivity of sonar is cone-shaped as shown in Figure 1.6, and an object detected at distance d could be anywhere within the sonar cone, on an arc at distance d from the robot. So the accuracy of object position measurement is a function of the width of the sonar beam pattern. Secondly, objects with flat surfaces and sharp edges reflect very little echo in most directions and will probably not be noticed; this is called the specular reflection problem. Finally, after being reflected from one surface, the sound may strike another surface and be reflected back to the sensor. The time delay will then not correspond to a physical object, but to a ghost object.
Edge detection is the process of finding the boundaries between areas of different light intensity in the image.
Motion detection is the process of detecting things that are moving in the field of
vision. You can detect motion by comparing two edge-detection results. If an edge has
moved or a new edge has appeared since the previous edge-detection process was
performed, motion has occurred.
Object line detection is the process of connecting detected edges into lines.
Object shape detection is the process of assembling detected lines to see if they form
some known shape.
Image understanding, the final step of vision processing, is where the system assesses whether it is seeing the right things. Color, depth, and texture provide additional information about a viewed scene.
Computer vision requires knowledge of the field of computer graphics and image
processing and will not be addressed in this report.
1.6 Assignments
1. Write a report on the impact of mobile robots on society. (Hint: explore the applications of mobile robots in industry, agriculture, services, etc., by searching the Internet.)
2. What is an intelligent robot?
3. Write a report that surveys the methods and techniques used in sensor fusion.
4. For the Electronics Research Institute mobile robot platform, obtain a copy of the system documentation. How is it powered, recharged, and controlled? Learn how to power the robot on and off and how to control it via software.
CHAPTER II
BEHAVIOR-BASED CONTROL
Building a mobile robot that shows intelligent behavior while executing a complex task is an interesting research challenge. At the same time, there has been an upsurge of interest in learning in autonomous mobile robots within the behavior-based paradigm. This approach concentrates on physical systems situated in the real world and promotes simple associative learning between sensing and acting. Behavior-based control has served as an effective methodology for autonomous mobile robot control and learning in a large number of autonomous robot problem domains. This chapter presents an overview of behavior-based control methods and compares them to show their pros and cons.
Control systems using this architecture solve their task in several steps. First, the sensor input is used to modify the internal representation of the environment. Second, planning is performed based on the internal representation. This results in a series of actions for the robot to take to reach a specified goal. Third, this series of actions is used to control the motors of the robot. This completes the cycle of the control system, and it is restarted to achieve new goals. The first general-purpose mobile robot, Shakey, constructed in the late 1960s at the Stanford Research Institute and shown in Figure 2.2, demonstrated that general-purpose planning systems using STRIPS were not very efficient and much too slow for practical use.
The general planning approach has several problems. Maintaining the model is in many cases difficult because sensors are limited or imperfect. The plans produced by the planner often do not have the effects in the real world that were anticipated, so planning cannot be done using this open-loop approach (Wyatt, 1997; Gat, 1998). Brooks (1990a; 1990b; 1991a; 1991b) argues that this traditional approach is very difficult and slow, and that the resulting programs are very brittle and not generalizable to other robotic platforms or to a different environment. It is clear that a different approach is required if one wants to bring robots into everyday life: creating systems that are more robust and do not require too many modifications to the layout of the environment where the robot will operate. To overcome these limitations the behavior-based control approach was developed (Brooks, 1986). In the behavior-based approach, instead of decomposing the task based on functionality, the decomposition is done based on task-achieving modules or competences, called behaviors, layered on top of each other as shown in Figure 2.3; this is called a vertical control architecture.
The most common behavior-based control approaches are the subsumption (Brooks, 1986) and motor schema (Arkin, 1989; 1998; Arkin and Balch, 1997; Murphy, 2000) architectures, which are made up of several behaviors running in parallel1. Each behavior calculates a mapping from sensor inputs - only the sensor inputs relevant for the task of that behavior are used - to motor outputs (cf. Appendix A). The suggested motor outputs from the behavior with the highest priority are used to control the robot's motors, or the outputs are summed to generate one motor output. These architectures are called behavior-based control approaches and represent methodologies for endowing robots with a collection of intelligent behaviors (Matarić, 1994b; 1995b; 1995c; 1997a; 1998a; Parker, 1996). Behavior-based approaches are an extension of the reactive architecture; their computation is not limited to lookup tables and the execution of simple functional mappings. Behavior-based systems are typically designed so that the effects of the behaviors interact in the environment rather than internally through the system.
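The following Java sketch illustrates, under simplified assumptions, the behavior abstraction and the two coordination styles just described: a competitive, priority-based arbiter and a cooperative, summing arbiter. The interface and class names are illustrative and do not come from any specific robot framework.

import java.util.List;

// Illustrative behavior abstraction: a mapping from sensor inputs to two motor outputs.
interface Behavior {
    double[] motorOutput(double[] sensors); // suggested motor commands
    int priority();                         // used only by competitive arbitration
    boolean isActive(double[] sensors);
}

class Arbiter {
    // Competitive (subsumption-style): the active behavior with the highest priority wins.
    static double[] competitive(List<Behavior> behaviors, double[] sensors) {
        Behavior winner = null;
        for (Behavior b : behaviors) {
            if (b.isActive(sensors) && (winner == null || b.priority() > winner.priority())) {
                winner = b;
            }
        }
        return winner == null ? new double[] {0.0, 0.0} : winner.motorOutput(sensors);
    }

    // Cooperative (motor-schema style): the outputs of all active behaviors are summed.
    static double[] cooperative(List<Behavior> behaviors, double[] sensors) {
        double[] sum = new double[] {0.0, 0.0}; // assumes two motor outputs
        for (Behavior b : behaviors) {
            if (!b.isActive(sensors)) continue;
            double[] out = b.motorOutput(sensors);
            sum[0] += out[0];
            sum[1] += out[1];
        }
        return sum;
    }
}

Subsumption corresponds to the competitive case, while motor schemas correspond to the cooperative, summing case.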
The largest limitation of the subsumption architecture is its complexity. Creating correct subsuming and inhibitory links into other behaviors requires detailed knowledge of how the other modules, and the system as a whole, function. Another important limitation is the lack of explicit goals and goal-handling capabilities (Pirjanian, 1999). Since there is no
1 There are infinitely many possible robot control schemes. These schemes are classified into four classes: reactive control (don't think, react), deliberative control (think, then act), hybrid control (think and act independently in parallel), and behavior-based control (think the way you act). Each of these approaches has its strengths and weaknesses, and all play important and successful roles in certain problems and applications. Behavior-based and hybrid control have the same expressive and computational capabilities: both can store representations and look ahead, but they work in very different ways. Reactive control is also popular in highly stochastic environments that demand very fast responses. Deliberative control is the best choice for domains that require a great deal of strategy and optimization, and in turn search and learning (Matarić, 2002).
centralized control, the higher-level behaviors win the arbitration when they compete against lower-level ones, and the functionality of the subsumption architecture is viewed as an emergent property of the interaction of its behaviors and depends on the static and dynamic properties of the environment.
U(q) = U_att(q) + U_rep(q),        (2.1)

where U_att is the attractive potential and U_rep is a repulsive potential associated with obstacles. The force F = −∇U(q) points in the most promising direction to move in. From Equation (2.1) it is seen that F is the sum of two forces: an attractive force F_att = −∇U_att and a repulsive force F_rep = −∇U_rep. These force vectors will generally point directly towards the goal except in the area around the obstacles, as shown in Figure 2.4. There are many ways to define these elementary functions. For example, the attractive potential field U_att can be defined as a parabolic well:
U_att(q) = (1/2) ξ ||q − q_goal||²,        (2.2)

where ξ > 0 is a scaling factor and q_goal denotes the goal configuration. Figure 2.4(b) is a plot of U_att for a two-dimensional case. The repulsive potential field U_rep can be defined as follows:
U_rep(q) = (1/2) η (1/ρ(q) − 1/ρ₀)²    if ρ(q) ≤ ρ₀
U_rep(q) = 0                           if ρ(q) > ρ₀        (2.3)

where η > 0 is a scaling factor, ρ(q) denotes the distance from q to the nearest obstacle, and ρ₀ is the distance of influence of the obstacles.
Figure 2.5: (a) The configuration of obstacles (black) in the map. (b) The attractive potential associated with the goal configuration. (c) The repulsive potential associated with obstacles. (d) The sum of the attractive and repulsive potentials (adapted from Pirjanian, 1999).
There are two major difficulties with the potential fields method as applied to robot navigation. The first is that this method can be slow and require substantial internal state, as it appears that vectors must be computed based on a map of the entire region. But a potential fields method can be easily adapted to a reactive system. There is no need to factor in forces outside of the robot's perceptual range, though true potential fields require computing vectors based on the entire field. A goal outside the perceptual range may need to be maintained, but obstacles that the robot cannot sense do not necessarily need to affect the motion vectors. Thus the vector associated with the current position of the robot can be computed using only the sensory information from a single time step (Arkin, 1989). This vector computation using only current sensor readings makes the method reactive, in that it requires no internal state, and allows the vector at the current position of the robot to be computed quite quickly. The second potential field problem is local minima. The robot's motion at any moment is the sum of the forces affecting it. What happens, then, when these forces sum perfectly to zero? The robot could remain indefinitely in a single position, perfectly immobilized at a local minimum. If no force
disrupts this equilibrium, movement may cease. Thus any potential fields method must employ strategies for coping with this problem of local minima. One of these strategies is generating a time-varying noise vector; others are to relocate the goal temporarily when stuck in a local minimum, or to mark areas that have already been visited as impassable. All these methods do not remove the existence of local minima and are not always successful. Potential field navigation is very suitable for local navigation, since the environment only has to be known in the vicinity of the vehicle and only a short piece of the path is calculated with each evaluation of the force fields.
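The noise-vector escape strategy mentioned above can be sketched in Java as follows; the vector class and the threshold value are assumptions made for illustration.

import java.util.Random;

// A sketch of the "noise vector" escape strategy for local minima in a potential field.
class Vec2 {
    double x, y;
    Vec2(double x, double y) { this.x = x; this.y = y; }
    double norm() { return Math.hypot(x, y); }
}

class LocalMinimumEscape {
    static final double STUCK_THRESHOLD = 1e-3; // forces nearly cancel out (assumed value)
    static final Random rng = new Random();

    // If the attractive and repulsive forces sum to (almost) zero, add a random
    // perturbation so the robot does not remain immobilized at the local minimum.
    static Vec2 resultingForce(Vec2 attractive, Vec2 repulsive) {
        Vec2 sum = new Vec2(attractive.x + repulsive.x, attractive.y + repulsive.y);
        if (sum.norm() < STUCK_THRESHOLD) {
            double angle = 2 * Math.PI * rng.nextDouble();
            sum = new Vec2(Math.cos(angle), Math.sin(angle)); // unit-length noise vector
        }
        return sum;
    }
}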
normalized. The resultant vector is sent to the robot hardware for execution as shown in
Figure 2.6.
Figure 2.6: Motor schema example. The diagram on the left shows a
vector corresponding to move-to-goal schema; the center diagram shows
avoid-obstacles schema. On the right, the two schemas are summed,
resulting in a complete behavior for reaching the goal.
For example, consider the foraging task shown in Figure 2.7. The forage task for a robot is to wander about the environment looking for items of interest (attractors). Upon encountering one of these attractors, the robot moves towards it, finally attaching itself to it. After attachment the robot navigates to the home base, where it deposits the attractor. In this approach to solving the forage task, a robot can be in one of three behavioral states: wander, acquire and deliver. The robot begins in the wander state. If there are no attractors within the robot's field of view, the robot remains in wander until one is encountered. When an attractor is encountered, a transition to the acquire state is triggered. While in the acquire state the robot moves towards the attractor and, when it is sufficiently close, attaches to it. The last state, deliver, is triggered when the robot attaches to the attractor. While in the deliver state the robot carries the attractor back to home base. Upon reaching home base, the robot deposits the attractor there and reverts to the wander state (Balch, 1998a).
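The temporal sequencing of the three states can be sketched as a small finite-state machine in Java; the boolean predicates used here stand in for the robot's actual perceptual tests and are assumptions for illustration.

// A minimal sketch of the wander/acquire/deliver sequencing described above.
enum ForageState { WANDER, ACQUIRE, DELIVER }

class ForagingController {
    ForageState state = ForageState.WANDER;

    ForageState step(boolean attractorVisible, boolean attachedToAttractor, boolean atHomeBase) {
        switch (state) {
            case WANDER:
                if (attractorVisible) state = ForageState.ACQUIRE;
                break;
            case ACQUIRE:
                if (attachedToAttractor) state = ForageState.DELIVER;
                break;
            case DELIVER:
                if (atHomeBase) state = ForageState.WANDER; // deposit and start over
                break;
        }
        return state;
    }
}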
                              Subsumption      Motor Schemas
Hierarchical                  Yes              No
Reconfigurability/Learning    No               Yes
Arbitration Mechanism         Competitive      Cooperative
Behavior implementation       Hardwired        Software
Behavior Reusability          No               Yes
Figure 2.8: The spectrum of robot control (adapted from Arkin, 1998).
The robot that uses deliberative reasoning in control requires much knowledge about
the world and uses this knowledge to predict the outcome of actions to optimize its
performance. If the information is inaccurate or out of date, the outcome of the reasoning
process is probably incorrect. In a dynamic world, where objects may be moving, it is
potentially dangerous to rely on past information that may no longer be valid.
Representational world models are therefore generally constructed from both prior
knowledge about the environment and incoming sensor data (cf. chapter IV). On the other
side of the control spectrum are the reactive systems. Reactive control is a technique for tightly
coupling perception and action, typically in the context of motor behaviors, to produce timely
robotic responses in dynamic and unstructured worlds.
Both deliberative planning systems and purely reactive control systems have their
limitations, but using both forms of knowledge in a robotic architecture can make
behavior-based navigation more flexible and general. Hybrid deliberative-reactive robotic
architecture can combine the traditional abstract representational knowledge with the
responsiveness, robustness, and flexibility of purely reactive systems. However, building
such a hybrid system requires certain compromises. The interface between deliberation and
reactivity is not yet well understood and research in this area is still ongoing. An
important design issue of hybrid control systems is the number of layers. Usually the hybrid
architecture consists of two or three layers. In the case of two layers the interface between
the layers is very important because it links rapid reactions with long-range planning. In the
case of three layers the middle layer coordinates between the other two layers, much like
the interface in the two-layer architecture. There are four main interface strategies for the
various hybrid architectural designs:
Selection: planning is viewed as configuration and determines the behavioral
composition and parameters used during execution.
Advising: the planner suggests changes that the reactive control system may or may
not use.
Adaptation: the planner continuously alters the ongoing reactive behaviors based on
the changing conditions within the world and task requirements.
Postponing: planning is viewed as a least commitment process. The planner postpones
making decisions on action until as late as possible.
Strong evidence exists that hybrid deliberative and behavior-based systems are found in
biology, so research in this area could lead to new insights. For example, there seem to be
two distinct systems concerned with controlling human behavior, a reactive and automatic
system (e.g. when one's hand touches something hot) and a willed and conscious control
system (e.g. grasping something).
Evolution has shaped animals to fit their niche. As the environment is always changing, a successful animal must be capable of adapting to these changes or it will become extinct.
2.10 Assignments
1. Explain in one or two sentences each of the following terms: reflexes, taxes, and fixed-action pattern.
2. Write a GUI program using Java or C++ to simulate a robot containing the sensors and actuators, which are coupled together in reflexive behaviours to do the following:
a. Reflexive avoid: turn left when the robot touches something (use touch sensors and two motors).
b. Phototaxis: follow a black line (use the IR sensor to detect the difference between light and dark).
c. Fixed-action pattern avoid: back up and turn right when the robot encounters a negative obstacle (a cliff).
3. Describe the difference between robot control using a horizontal decomposition and a
vertical decomposition.
4. Evaluate the subsumption architecture in terms of: support for modularity, niche
targetability, ease of portability to other domains, robustness.
CHAPTER III
BEHAVIORS DESIGN AND PROGRAMMING
To encode the behavioral response that a stimulus evokes, a functional mapping is created from the stimulus plane to the motor plane, by factoring the robot's motor response into orthogonal components: strength and orientation. Strength denotes the magnitude of the response and orientation denotes the direction of action. A behavior can be expressed as a triple (S, R, β), where S denotes the domain of all interpretable stimuli, R denotes the range of possible responses, and β denotes the mapping β: S → R.
S consists of all the perceivable stimuli. Each individual stimulus or percept s ∈ S is represented as a binary tuple (p, λ) having a particular type or perceptual class p and a property of strength λ. The stimulus strength can be defined in a variety of ways: discrete (e.g., binary: absent or present; categorical: absent, weak, medium, strong), real valued, or continuous. The instantaneous response r ∈ R of a mobile robot that moves on flat ground and can rotate only about its central axis has three degrees of freedom, expressed as r = [x, y, θ]. For each active behavior we can formally establish a mapping between stimulus and response using the behavior function β(s) = r. Associated with a particular behavior β may be a scalar gain value g that modifies the magnitude of the overall response r for a given s as r′ = g·r. These gain values are used to compose behaviors by specifying their strengths relative to one another. In the extreme case, g can be used to turn off a behavior by setting it to 0, thus reducing r′ to zero [Arkin, 1998]. β can be defined arbitrarily, but it must be defined over all relevant p in S. The behavior mappings β fall into three general categories:
Null: The stimulus produces no motor response.
Discrete: The stimulus produces a response from an enumerated set of prescribed choices (e.g., turn-right, stop, etc.). For example, β may consist of a finite set of (situation, response) pairs; sensing provides the index for finding the appropriate situation. Another strategy involves the use of a rule-based system, where β is represented as a collection of IF-THEN rules [Matarić, 1994], which can be extended using fuzzy logic by synthesizing fuzzy IF-THEN rules [Saffiotti, 1997; Hoffmann, 1998].
Continuous: The stimulus produces a response that is continuous over R's range. One of the most common methods for implementing continuous responses is based on the potential field technique (cf. Section 2.3).
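As an illustration of a discrete behavior mapping β with a gain g, the following Java sketch maps a simplified stimulus to an enumerated response and scales the result by the behavior's gain; the stimulus and response representations are assumptions made for illustration only.

// A simplified sketch of a discrete behavior mapping beta: S -> R with a gain g.
class Stimulus {
    String perceptualClass; // p, e.g. "obstacle-left"
    double strength;        // lambda
    Stimulus(String p, double lambda) { perceptualClass = p; strength = lambda; }
}

class Response {
    double turn;  // orientation component (degrees)
    double speed; // strength component
    Response(double turn, double speed) { this.turn = turn; this.speed = speed; }
}

class AvoidBehavior {
    double gain = 1.0; // g; setting it to 0 switches the behavior off

    Response beta(Stimulus s) {
        Response r;
        if ("obstacle-left".equals(s.perceptualClass) && s.strength > 0.5) {
            r = new Response(30.0, 0.5);   // turn-right
        } else if ("obstacle-right".equals(s.perceptualClass) && s.strength > 0.5) {
            r = new Response(-30.0, 0.5);  // turn-left
        } else {
            r = new Response(0.0, 0.0);    // null response
        }
        return new Response(gain * r.turn, gain * r.speed); // r' = g * r
    }
}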
Figure 3.3: Subsumption-based model for a foraging robot (adapted from Arkin, 1998).
Coding these behaviors is straightforward. For example, a wander behavior, which makes the robot move in a random direction for some time, can be written as the following Java code segment.
private int wander()
{
    /* tt counts down the time remaining in the current wander period */
    if ( tt > 5 )
    {
        tt--;
        move_random();   /* move in a random direction */
    }
    return 0; /* priority 0: allows the find_goal behavior to take over */
}
The pickup behavior turns towards the sensed attractor and goes forward; if the robot is at the attractor, it closes the gripper. It has the following Java code segment.
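A minimal sketch of such a pickup behavior, written in the same style as the wander() segment and using hypothetical helper functions (at_attractor, close_gripper, attractor_direction, turn, forward) and hypothetical priority values, could look like this:

private int pickup()
{
    if ( at_attractor() )
    {
        close_gripper();                /* attach to the attractor */
        return 2;                       /* hypothetical priority: let deliver take over */
    }
    turn( attractor_direction() );      /* turn towards the sensed attractor */
    forward( 1 );                       /* and go forward */
    return 1;                           /* stay in pickup while approaching */
}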
schema represents the template for the physical activity; the perceptual schema embodies the sensing, as shown in Figure 3.4.
Schema theory implements the motor schema as a vector field, which gives the direction and magnitude of the action. In schema theory, the perceptual schema is permitted to pass both the percept and a gain to the motor schema. The motor schema can use the gain to compute the magnitude of the output action. However, schema theory does not specify how the output from concurrent behaviors is combined: in some circumstances the outputs of concurrent behaviors are combined or summed, in others they occur in a sequence, and sometimes there is a winner-take-all effect.
Any primitive potential field is usually represented by a single function, and the vector acting on the robot is computed at each update. Consider a robot with a single range sensor facing forward. The designer has decided that a repulsive field with a linear drop-off is appropriate. The formula is:

V_direction = -180
V_magnitude = (D − d)/D    for d ≤ D
V_magnitude = 0            for d > D        (3.1)

where D is the maximum range of the field's effect, or the maximum distance at which the robot can detect the obstacle. D is always the range at which the robot should respond to the stimulus. The roboticist sets D to 2 meters and the robot's maximum velocity to 10. This can be coded in Java as follows:
class vector {
    double magnitude;
    double direction;
    public vector() {
        magnitude = 0;
        direction = 0;
    }
}

class perceptual_schema {
    /* returns the current range reading from the forward sonar, in meters */
    double readSonar() {
        return 0.0; /* placeholder: replaced by the real sensor driver */
    }
}

class motor_schema {
    public static final double MAX_DIST = 2.0;      /* D: range of the field's effect */
    public static final double MAX_VELOCITY = 10.0; /* maximum robot velocity */

    /* repulsive field with linear drop-off, Equation (3.1) */
    vector repulsive(double d, double D) {
        vector output = new vector();
        if (d <= D)
        {
            output.direction = -180;           /* point directly away from the obstacle */
            output.magnitude = (D - d) / D;    /* stronger the closer the obstacle */
        }
        /* if d > D, magnitude stays 0: stay where you are */
        return output;
    }
}
To illustrate how the repulsive potential field can be used by a behavior, consider runaway for a robot with a single sensor, implemented as a motor schema. The following code segment clarifies the idea.
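A minimal sketch of such a runaway motor schema, built from the perceptual and motor schema classes defined above (the class and method names here are assumptions for illustration), could look like this:

class runaway_behavior {
    perceptual_schema ps = new perceptual_schema();
    motor_schema ms = new motor_schema();

    vector execute_runaway() {
        double reading = ps.readSonar();                      /* perceptual schema: sense range   */
        return ms.repulsive(reading, motor_schema.MAX_DIST);  /* motor schema: repulsive vector   */
    }
}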
Figure 3.6: Behavior fusion via vector summation (adapted from Arkin, 1998).
For example, consider adding another behavior, move_to_goal, that is represented by an attractive potential field and uses the shaft encoders to sense whether the goal position has been reached. The move_to_goal behavior exerts an attractive field over the entire space; wherever the robot is, it will feel a force from the goal. The runaway behavior exerts a repulsive field in a radius around the obstacle. Extending the example in the previous section, the two behaviors can be combined in Java as follows:
vector output, vec1, vec2;
while (isAlive())
{
    vec1 = execute_runaway();        /* repulsive vector from the obstacle       */
    vec2 = execute_move_to_goal();   /* attractive vector towards the goal       */
    /* the two vectors could also be multiplied by their gain values here */
    output = vector_sum(vec1, vec2); /* combine the two vectors by summation     */
    turn(output.direction);
    forward(output.magnitude * motor_schema.MAX_VELOCITY);
}
3.7 Assignments
1. Describe the difference between the two dominant methods for combining behaviours
in a reactive architecture, subsumption and potential field summation.
2. Suppose you were to construct a library of potential fields of the five primitives using
Java or C++. What parameters would you include as arguments to allow a user to
customize the fields?
CHAPTER IV
4.1 Mapping
Mapping is the process of constructing a model of the environment based on the
sensors measurements. There are different approaches to representing and using spatial
information. On one side, there are purely grid-based maps, also called geometric or metric maps. In these representations, a single, global coordinate system, in which all mapping and navigation takes place, defines the robot's environment. In geometric mapping, low-level
geometric primitives are extracted from sensor data to build up high-level abstractions of the
environment. Thus, the world might be modeled using wire-frame, surface boundary, or
volumetric representations, for example. This approach is appealing because it provides a
concise representation that is easily manipulated and displayed by computers and is readily
understood by humans. However, the geometric paradigm has drawbacks in that it leads to
world models that are potentially fragile and unrepresentative of the sensor data because it
relies heavily upon a priori environment assumptions and heuristics [Elfes, 1987]. An
alternative to the geometric paradigm is the sensor-based paradigm. Here, the environment is
represented as a collection of the sensor readings, obtained through sensor models. No a
priori assumptions or models of the world are employed. Most typical of this approach are
cellular representations such as the occupancy grid [Elfes, 1989]. However, representing each cell of the grid explicitly imposes unrealistic storage requirements; one alternative is to take advantage of the fact that many of the cells will have a similar state, especially those cells that correspond to spatially adjacent regions. The general approach to solving this problem is to represent the space using cells of non-uniform shape and size, such as quadtrees [Nebot and Pagac, 1995]. As shown in Figure 4.1, the map is a grid, with each cell of the grid representing some amount of space in the real world.
These methods make weaker assumptions about the environment than in the geometric
paradigm, and usually have a high sensing to computation ratio, since no high-level
abstractions are generated. These approaches work well within bounded environments
where the robot has opportunities to realign itself with the global coordinate system using
external markers. On the other side are topological maps, also called qualitative maps, in which the robot's environment is represented as places and connections between places. Each node in such a graph corresponds to a distinct place or landmark, as shown in Figure 4.2.
Arcs connect these nodes if there exists a direct path between them. This approach has the advantage that it alleviates the problems of dealing with movement uncertainty: movement errors do not accumulate in topological maps as they do in maps with a global coordinate system, since the robot navigates locally.
Figure 4.3: Cross section of a conical sonar signal. Any object along arc A
will yield a sensed distance of d.
Since the sensor uses merely the first echo it receives for range calculations, it is
impossible to determine the exact angular location of the reflecting object with respect to
the acoustic axis; only its radial distance can be measured. The arc labeled A depicts the
angular uncertainty of the ultrasonic sensor - it is bounded by the width of the ultrasonic
cone (typically 30 degrees). The occupancy grid system [Moravec, 1988; Elfes, 1989] accounts for this uncertainty with a Gaussian probability function that models the error of the ultrasonic sensor. Here, cells close to the sensed distance from the sensor and near to the
ultrasonic sensor. Here, cells close to the sensed distance from the sensor and near to the
acoustic axis have a high likelihood of occupancy. Likewise, cells near to the ultrasound
sensor and close to the acoustic axis have a high likelihood of vacancy. After each sensor
reading, Bayes theorem (Equation 4.1) is applied to update the probability of occupancy of
each of the cells intersecting the ultrasonic cone. A Gaussian model is used as the evidence
of occupancy (OCC) or vacancy (VAC) and the current value of the cell provide the prior
probability of occupancy.
P(occ | s_{k+1}) = P(s_{k+1} | occ) P(occ | s_k) / [ P(s_{k+1} | occ) P(occ | s_k) + P(s_{k+1} | vac) P(vac | s_k) ]        (4.1)

Here P(occ | s_k) is the current value of the cell (after k sensor readings), P(occ | s_{k+1}) will be the new cell value, and P(s_{k+1} | occ) and P(s_{k+1} | vac) are obtained from the sensor model. Initially, the occupancy status of a cell is unknown, so we set P(occ | s_0) = 0.5 for all cells. After several readings by multiple sensors at different locations in the environment, a map representing the objects and empty space of the environment is constructed. The algorithm is summarized in Table 4.1.
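As an illustration of the update in Equation (4.1), the following Java sketch applies the Bayesian rule to a single grid cell; the sensor-model probabilities passed to update() are illustrative values, not the Gaussian sensor model itself.

// A sketch of the Bayesian occupancy update of Equation (4.1) for a single cell.
public class OccupancyCell {
    double pOcc = 0.5; // unknown occupancy before any reading

    // pSgivenOcc = P(s_{k+1} | occ), pSgivenVac = P(s_{k+1} | vac)
    void update(double pSgivenOcc, double pSgivenVac) {
        double numerator = pSgivenOcc * pOcc;
        double denominator = pSgivenOcc * pOcc + pSgivenVac * (1.0 - pOcc);
        pOcc = numerator / denominator; // P(occ | s_{k+1})
    }

    public static void main(String[] args) {
        OccupancyCell cell = new OccupancyCell();
        cell.update(0.8, 0.3); // a reading that supports occupancy (illustrative numbers)
        cell.update(0.8, 0.3); // a second, consistent reading
        System.out.printf("P(occ) after two readings: %.3f%n", cell.pOcc);
    }
}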
Grid maps are memory consuming, since the complexity of the map does not depend on the complexity of the environment; as a result, processing these kinds of maps can be very time consuming, since the position of the robot itself must be known accurately. They also require more pre-processing when used by symbolic problem solvers such as behavior coordination algorithms.
On the other hand, a topological map is difficult to construct and to maintain, since it requires the recognition of landmarks and places; the robot must have the sensory equipment for doing this, and even then it is no simple task. It is efficient and faster for planning, since the exact robot position is not that important. It saves memory and space, since the resolution depends on the complexity of the environment, and it offers a very convenient interface towards symbolic problem solvers. However, the paths calculated from topological maps may not be optimal in terms of energy consumption or distance traveled. Both methods have their advantages and disadvantages; the most important ones are stated in Table 4.2.
                              Grid Maps    Topological Maps
Building / Maintainability    easy         difficult
Memory space                  huge         small
Exact robot position needed   yes          no
Planning speed                slow         fast
4.2 Localization
An important sub-problem in the robot navigation task is that of localization. That is, given accumulated knowledge about the world and current sensor readings, what is the position of the robot within the workspace? Localization is particularly difficult for map-based approaches that learn their maps, since the accuracy of a metric map depends on the alignment of the robot with its map. This would be a trivial problem if the navigation system could rely upon its velocity history (i.e., employ dead reckoning) to calculate its global coordinates. Unfortunately, distance and orientation measurements made from dead reckoning are highly prone to errors due to shaft encoder imprecision, wheel slippage, irregularities in the surface of the workspace, etc., which accumulate over time. Odometry and inertial navigation methods, being relative position measurements, therefore require supplemental information in order to periodically flush the metric error from the system and re-determine the robot's position in the environment [Borenstein et al., 1996].
External measurement techniques, also called absolute measurement techniques, are often used in conjunction with relative measurement techniques to improve the localization estimate. Examples of absolute measurement techniques are:
Guidepath: one of the simplest forms of localization. The guidepath can be, for instance, a wire (which transmits audio or radio signals) or a magnetic strip [Everett, 1995], which a robot can follow. Guidepaths indicate static paths in an environment and are not suitable in applications where mobile robots should be allowed to move flexibly.
Active beacons use a set of transmitters (beacons) whose locations in the workspace are known in advance. The transmitters, in most cases, use light or radio signals to announce their presence. The active beacons may transmit their identity and current location. For a robot to be able to use the active beacons for its localization at some position in the workspace, at least three of the beacons have to be visible or detectable at that position.
Artificial landmark recognition methods try to recognize objects or images with a distinctive shape placed in the workspace. The positions of the landmarks are known, and if three or more landmarks are detectable at a certain position, an estimate of the robot position can be calculated.
4.3 Navigation
The process of generating a sequence of actions that have to be performed in order for a robot to move through its environment (also called workspace) autonomously and without collisions is called navigation. Methods for navigation, in their most simple form, fall into one of two categories: deliberative or reactive. Purely reactive methods offer real-time motion response provided by simple computations and do not use any representation of the environment (i.e., they have no understanding of the layout of the workspace, e.g. where the obstacles are situated). They compute a function using the parameters obtained from external and proprioceptive sensors on the robot. They can only assure reaching the target if the way has already been made and learned, or if they were designed to follow a certain path.
Pure deliberative methods, on the other hand, are highly representation dependent and express a higher level of intelligence. They use a priori knowledge to pre-calculate robot motion. A typical way to utilize this knowledge is to plan a path through the workspace for the robot, so-called path planning. The path planner generates minimum-cost paths to the goal(s) using the map of the workspace. There are a large number of approaches that combine both reactive and deliberative navigation, and these seem to be the most effective ones. They combine the efficiency of planning with the flexibility and adaptive ability of reactive systems. As a result of this integration, deliberative navigation provides intermediate goals to the reactive collision avoidance behavior that controls the velocity and the exact direction of the robot's motion.
The robot path consists of segments. Gateways or landmarks are needed so the robot can tell when it has completed a segment and another should begin (cf. Section 4.1). These distinctive places are derived from the sensor readings, and after the robot determines these observable and ultimately recognizable landmarks, they can be integrated easily into behaviour control. For example, Matarić [1990] demonstrated the integration of qualitative maps and behavior-based robotic systems in a subsumption-based system, as shown in Figure 4.4. Landmarks are derived from sonar data, using features that are stable and consistent over time. In particular, right walls, left walls, and corridors are the landmark features derived from the sensors. Spatial relationships connecting the various landmarks are added by constructing the graph. Map navigation consists of following the boundaries of wall or corridor regions.
reference can be precisely specified. The configuration of the robot is a vector that holds the current values of all the single-valued parameters that need to be specified in order to identify uniquely the robot's exact position. These parameters are broadly known as the generalized coordinates of the robot. For a mobile robot the generalized coordinates might be the Cartesian coordinates of its center and the orientation of its heading. The configuration space C of the robot is simply the space in which its configuration vector lives, i.e. the set of all possible configurations, and its dimension is equal to the number of generalized coordinates [Lozano-Pérez, 1983]. Since the robot can only be in a single configuration at any time, it is represented as a single point in C. Obstacles in the workspace imply that certain areas are not accessible to the robot.
The configuration q of a robot A specifies the exact position and orientation of A relative to a fixed reference frame. The configuration space (often referred to as "C-space") of A is the set of all possible configurations of A. For a robot with n degrees of freedom (DOF), its C-space, C, is an n-dimensional space (typically an n-dimensional manifold), since there is a one-to-one, onto map between q and a specific arrangement of the robot's joints and the robot's position within its (real-world) workspace. Obstacles are mapped into C-space by determining which configurations of the robot produce collisions with an obstacle; these configurations are deemed forbidden. Let A(q) denote the set of points occupied by A when A is in configuration q. The C-space obstacle (or "C-obstacle") associated with a physical obstacle B is defined as:

CB = { q ∈ C | A(q) ∩ B ≠ ∅ }        (4.2)

The freespace F is then:

F = C \ ∪_i CB_i        (4.3)
Motion plans are constructed in F. Note that the closure of freespace includes the obstacle boundaries. Since generally a robot is allowed to touch an obstacle (but not penetrate it), the word "freespace" (and the symbol F) will often be used to refer to the closure of freespace as well. In the case illustrated in Figure 4.5(a), the robot is a rigid body in a 2D environment (its workspace), and it is capable of translating in any direction (i.e., horizontally and vertically) but not rotating. Thus, this robot has 2 DOF, and its configuration space (shown in (b)) is equivalent to ℝ². Configuration space is a useful concept for classical motion planners aimed at creating paths for (holonomic3) multi-DOF robots. Classical motion planning is a two-step process. In the first step, the C-obstacles are computed from knowledge of the physical obstacles and of the robot geometry. The second step constructs a path within F. A path in C of the robot A from the start
3 Holonomic robots are capable of motion directly between configurations: for example, a robot which can move forward, backward, and side-to-side freely in an obstacle-free 2D environment. An automobile is an example of a non-holonomic vehicle; other techniques must be used (perhaps in addition to C-space) to capture the inability of a car to move directly sideways.
the feature in that region than to another feature. The resulting roadmap is the edge
between regions. This edge is the safest possible path as it avoids obstacles by the
maximum distance possible.
In approximate cell decomposition, cells usually have a standard predefined shape but perhaps variable size. Any graph-search algorithm can then be used for path construction. The details of the decomposition, however, can significantly affect the completeness and the computational complexity of the algorithm, as well as the quality of the resulting path. These methods are perhaps the best studied and have been applied widely for robot navigation. Simple instances of such methods are quadtrees [Nebot and Pagac, 1995] and grid-based representations [Moravec and Martin, 1996], as shown in Figures 4.8 and 4.9. One example of the approximate cell decomposition method is the quadtree algorithm. This algorithm represents C-space as a quadtree structure by performing the steps listed in Table 4.2, and the resulting decomposition is shown in Figure 4.9. A path can then be created at any time by building a connectivity graph based on the quadtree and computing a path through the known portions of F.
determined by the field until the goal configuration is reached. Using only our potential energy function U(q), we can determine the path which our robot should take through F using the algorithm listed in Table 4.2.
β_{i,j} = arctan( (y_i − y_r) / (x_i − x_r) )        (4.4)

m_{i,j} = c²_{i,j} (a − b·d²_{i,j})        (4.5)

where c_{i,j} is the certainty of cell C_{i,j} in the grid, d_{i,j} is the distance from the robot to cell C_{i,j}, and a and b are constants.
4.4 Assignments
1. Implement a potential field path-planning method and compute a 2-D trajectory across a rectangular room with several non-convex obstacles in it.
2. Implement a visibility-graph-based planner for an environment composed of polygonal obstacles.
3. An important property of an occupancy grid is that it supports sensor fusion. Discuss this statement.
CHAPTER V
ROBOT LEARNING
In order to build mobile robot systems that come close to what people dream of, researchers have suggested embedding learning techniques in them to improve their efficiency and to prepare them to deal with unforeseen situations and circumstances in their environment. Until now, learning has often been neglected, and the designer of a mobile robot system has generally taken on the extremely difficult task of trying to anticipate all possible contingencies and interactions among the robot components ahead of time. The ultimate goal of machine learning techniques is to deal with the noise in the robot's internal and external sensors and with the inconsistencies in the behavior of the environment. Machine learning focuses on developing approaches and algorithms by which an artificial agent (i.e., a robot) automatically acquires knowledge, develops strategies and modifies its behavioral tendencies through experience.
possess models, and in which the robot may be hampered by incomplete perception and unreliable effectors. On the other hand, control theory gives us methods for constructing closed-loop adaptive control policies that depend on the environment's state and on time [Singh, 1994; Barto et al., 1995]. Techniques for constructing such control policies include analytical methods for simple (linear) systems, and numerical methods such as dynamic programming for more complex (non-linear and stochastic) systems. These methods construct control policies off-line, given an accurate environment model [Wyatt, 1997]. When trying to design adaptive control policies for robots, we typically have no model of the environment. Thus machine learning arises for developing on-line and model-free approaches and algorithms for constructing the robot's closed-loop policy by automatically acquiring knowledge, developing strategies and modifying its behavioral tendencies through experience [Kaelbling et al., 1996; Sutton and Barto, 1998; Keerthi and Ravindran, 1999; Mahadevan, 1996b; Russell, 1996; Mitchell, 1997; Dietterich, 1997]. Reinforcement learning (RL) describes a large class of learning tasks and algorithms characterized by the fact that the robot learns an associative mapping π: X → A (from the state space X to the action space A) by maximizing a scalar evaluation (reward) of its performance received from the environment. The robot learns strategies for interacting with its environment in a closed-loop cycle as shown in Figure 5.1. This is the basic reinforcement-learning model, where the
robot-environment interaction is viewed as a coupled dynamic system in which the outputs of the robot are transformed into inputs to the environment and the outputs of the environment are transformed into inputs to the robot. The inputs to the robot usually do not capture a complete description of the environment's state, and the environment changes continuously through time. Furthermore, the robot influences the future sequence of its inputs by its outputs at each stage. Through this, the robot explores its dynamic environment, attempting to monitor the environment and figure out by itself how it can exploit the acquired experience to maximize its reward, which it will receive based on the sequences of actions it takes.
Learning solely from reinforcement signals makes the environment a powerful force shaping the behaviors of autonomous robots. Robots can also directly shape their environments. Because of this interaction of shaping forces, it is important to develop autonomous robots in a system where all of the forces for change are learnable, to optimize
system performance over its lifetime. When learning control policies, we must be able to
evaluate them with respect to each other. The robot's goal is to find, based on its
experience with the environment, a strategy, or an optimal policy, for choosing actions that
will yield as much reward as possible. The most obvious metric, the sum of all rewards over
the life of the robot,
∑_{t=0}^{∞} r(t),        (5.1)
is generally not used. For robots with infinite lifetimes, all possible sequences of rewards
would sum to infinity. This is not the case for a robot with a finite lifetime, however. In this
case, the obvious metric turns into the finite horizon measure,
∑_{t=0}^{k} r(t).        (5.2)
This measure sums the rewards over some finite number, k, of time steps. The average reward measure,

lim_{k→∞} (1/k) ∑_{t=0}^{k} r(t),        (5.3)

extends this by using the average reward received over the whole lifetime of the robot.
The infinite horizon discounted measure,

∑_{t=0}^{∞} γ^t r(t),        (5.4)
uses a discount factor γ, 0 ≤ γ < 1, to control the relative importance of short-term and long-term reinforcements. As γ → 0 the short-term reinforcement signals become more important. When γ = 0 the only reinforcement signal that matters is the immediate one, and we obtain the one-step greedy policy: the best action is the one that gives the greatest immediate reward. The optimal finite horizon policy may be non-stationary, in the sense that different actions may be selected for the same state at different time steps. In problems with hard deadlines, finite horizon policies are the right choice. In problems without hard deadlines, k → ∞, and an infinite-horizon policy is desired. It is a theorem that the optimal infinite horizon policy is stationary [Haykin, 1999]. Most reinforcement-learning problems are typically cast as Markov decision processes (MDPs)4, as shown in Figure 5.2.
4 Complete observability of the state is necessary for learning methods based on MDPs. In the case of noisy and incomplete information in the robot's observations, the perceptual aliasing or hidden state problem arises, so the MDP framework needs to be extended to the partially observable Markov decision process (POMDP). See Kaelbling et al. (1996) for the strategies implemented to deal with this problem.
In a Markov decision process there is a finite set of states, S, a finite set of actions, A, and time is discrete. The reward function

R: S × A → ℝ        (5.5)

returns an immediate measure of how good an action was. The resulting state, s_{t+1}, depends on the transition function

T: S × A → Π(S),        (5.6)

where T(s, a, s′) denotes the probability that taking action a in state s leads to state s′. The robot seeks a policy that maximizes the expected discounted return from the start state,

V*(s_0) = max_π E[ ∑_{t=0}^{∞} γ^t R(s_t, a_t) ],        (5.7)
where R(s_t, a_t) is the reward for the state-action pair at time t. If we know the functions T and R, then we can define an optimal value function, V*(s), over states:
V*(s) = max_a [ R(s, a) + γ ∑_{s′∈S} T(s, a, s′) V*(s′) ]        (5.8)
This function assigns a value to each state, which is the best immediate reward that we
can get for any action from that state added to the optimal value from each of the possible
resulting states, weighted by their probability. If we know this function, then we can define
the optimal policy, π*, by simply selecting the action, a, that gives the maximum value:
π*(s) = argmax_a [ R(s, a) + γ ∑_{s′∈S} T(s, a, s′) V*(s′) ]        (5.9)
Equation (5.9) is called Bellman's optimality equation. It provides a means of computing the optimal policy for the Markov decision process by performing a one-step look-ahead search
and choosing the action whose backup value is the largest. There are well-understood dynamic programming methods for computing V*, such as value iteration and policy iteration, which lead to a simple procedure for learning the optimal value function, and hence the optimal policy. The two algorithms, policy iteration and value iteration, are off-line and model-based methods: they need the T and R functions [Dietterich, 1997]. Instead of learning T and R, however, we can incrementally learn the optimal value function directly. In the next sections, we describe well-known algorithms that attempt to iteratively approximate the optimal value function.
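To make Equation (5.8) concrete, the following Java sketch runs value iteration on a tiny, made-up tabular MDP; the transition and reward tables, and all constants, are illustrative assumptions and not taken from the text.

// A sketch of value iteration (Equation 5.8) on a small illustrative tabular MDP.
public class ValueIteration {
    public static void main(String[] args) {
        int nStates = 3, nActions = 2;
        double gamma = 0.9, theta = 1e-6;

        double[][] R = new double[nStates][nActions];            // R(s, a)
        double[][][] T = new double[nStates][nActions][nStates]; // T(s, a, s')

        // Toy model: action 1 in state 1 moves to state 2 and yields reward 1.
        for (int s = 0; s < nStates; s++)
            for (int a = 0; a < nActions; a++)
                T[s][a][s] = 1.0;          // default: stay put, zero reward
        T[1][1][1] = 0.0; T[1][1][2] = 1.0; R[1][1] = 1.0;
        T[0][1][0] = 0.0; T[0][1][1] = 1.0;

        double[] V = new double[nStates];
        double delta;
        do {
            delta = 0.0;
            for (int s = 0; s < nStates; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < nActions; a++) {
                    double q = R[s][a];
                    for (int s2 = 0; s2 < nStates; s2++) q += gamma * T[s][a][s2] * V[s2];
                    best = Math.max(best, q);
                }
                delta = Math.max(delta, Math.abs(best - V[s]));
                V[s] = best;
            }
        } while (delta > theta);

        for (int s = 0; s < nStates; s++) System.out.printf("V*(%d) = %.3f%n", s, V[s]);
    }
}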
Initialize:
    π ← policy to be evaluated
    V(s) ← an arbitrary state-value function
    Returns(s) ← an empty list, for all s ∈ S
Do forever:
    Generate an episode using π.
    For each state s appearing in the episode:
        R ← return following the first occurrence of s
        Append R to Returns(s)
        V(s) ← average(Returns(s))
End Do

Table 5.1: First-visit Monte Carlo method for estimating V^π(s).
In order to estimate V^π(s), the value of state s under policy π, given a set of episodes obtained by following π and passing through s, we first define each occurrence of state s in an episode to be a visit to s. The every-visit Monte Carlo method then estimates V^π(s) as the average of the returns following all visits to s in a set of episodes. Another approach, called the first-visit Monte Carlo method (shown in Table 5.1), averages just the returns following the first visit to s. Both first-visit and every-visit Monte Carlo converge to V^π(s) as the number of visits (or first visits) to s goes to infinity [Kaelbling et al., 1996; Sutton and Barto, 1998; Keerthi and Ravindran, 1999].
V(s_t) ← V(s_t) + α_t [ r_{t+1} + γ V(s_{t+1}) − V(s_t) ]        (5.10)
There are two parameters: a learning rate, α, and a discount factor, γ. The learning rate controls how much we change the current estimate of V(s_t) based on each new experience. Equation (5.10) forms the basis of the TD(0) algorithm, which is really an instance of a more general class of algorithms called TD(λ), with λ = 0, where 0 ≤ λ ≤ 1 is a parameter that controls how much of the current difference r_{t+1} + γV(s_{t+1}) − V(s_t) is applied to all previous states. As shown in Table 5.2, TD(0) looks only one step ahead when adjusting value estimates; although it will eventually arrive at the correct answer, it can take quite a while to do so. The general TD(λ) is similar to TD(0) but is generalized to propagate information across trajectories of states according to their eligibility, e_t(s), which is a function that indicates how far in the past and how frequently the state has been visited:
V(s) ← V(s) + α_t [ r_{t+1} + γ V(s_{t+1}) − V(s_t) ] e_t(s)        (5.11)
The value of e_t(s) is updated at each transition for all s ∈ S. There are two forms of update equations. An accumulating trace updates the eligibility of a state using

e_t(s) = γλ e_{t−1}(s) + 1    if s is the current state
e_t(s) = γλ e_{t−1}(s)        otherwise        (5.12)

while a replacing trace uses

e_t(s) = 1                    if s is the current state
e_t(s) = γλ e_{t−1}(s)        otherwise        (5.13)
Under both mechanisms, the eligibility of a state decays away exponentially when the state is unvisited. Under an accumulating trace, the eligibility is increased by a constant every time the state is visited; under a replacing trace, the eligibility of a state is reset to a constant on each visit. An eligibility trace can be thought of as a short-term memory process, initiated the first time a state is visited by the robot. The degree of activation depends on the recency of the most recent visit and on the frequency of visits. Thus, eligibility traces implement two heuristics: a recency heuristic and a frequency heuristic [Sutton and Barto, 1998]. Sutton [1988] and others [Dayan, 1992; Dayan and Sejnowski, 1994] note that TD(λ) converges more quickly for λ > 0. Learning the value function V^π(s) for a fixed policy can be combined with a policy learner to get what is known as an actor-critic or adaptive heuristic critic system [Barto et al., 1983]. This alternates between learning the value function for the current policy and modifying the policy based on the learned value function.
TD(0) Algorithm
For each s ∈ S, initialize the table entry V(s) to 0.
Observe the current state s.
Do forever:
    Select an action a_t using s_t and π, and execute it.
    Receive the immediate reward r_{t+1}.
    Observe the new state s_{t+1}.
    Update the table entry V(s_t) according to Equation (5.10).
    Set s_t to s_{t+1}.
End Do

Table 5.2: TD(0) learning algorithm for estimating V^π(s).
We can also define an optimal action-value function, Q*(s, a), giving the expected discounted return of taking action a in state s and acting optimally thereafter:

Q*(s, a) = R(s, a) + γ ∑_{s′∈S} T(s, a, s′) max_{a′} Q*(s′, a′)        (5.14)
Once the optimal value function is known, the optimal policy, π*(s), can be easily calculated:

π*(s) = argmax_a Q*(s, a)        (5.15)
Starting from random initial values, the Q-function can be approximated incrementally according to the following rule:

Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ max_{a′} Q(s_{t+1}, a′) − Q(s_t, a_t) ]        (5.16)
The Q-learning algorithm is summarized in Table 5.3. Watkins and Dayan [1992] proved that the iteration converges to Q*(s, a) under the condition that the learning set includes an infinite number of episodes for each state and action. If the function is properly computed, an agent can act optimally simply by looking up the best-valued action for any situation. Q-learning chooses the optimal action for a state based on the value of the Q-function for that state. If only the best action is chosen, it is possible that some actions will never be chosen from some states. This is bad, because we can never be certain of having found the best action from a state unless we have tried them all. So we must have some sort of exploration strategy, which encourages us to take some non-optimal actions in order to gain information about the world. This is known as the exploration/exploitation problem [Thrun, 1992a, 1992b; Wiering, 1999].
Q-learning Algorithm
For each pair (s \in S, a \in A), initialize the table entry Q(s, a) to 0.
Observe the current state s_t.
Do forever:
    Select an action a_t and execute it.
    Receive immediate reward r_{t+1}.
    Observe the new state s_{t+1}.
    Update the table entry Q(s_t, a_t) using this real experience, as in Equation (5.16).
    Set s_t to s_{t+1}.
End Do
Table 5.3: Q-learning algorithm.
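A minimal tabular Q-learning sketch following Table 5.3 and Equation (5.16) might look as follows; the environment interface and the epsilon-greedy action selection (discussed in the next paragraph) are assumptions of this example.

import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.  env.reset()
    and env.step(a) -> (next_state, reward, done) are assumed interfaces;
    actions is the finite action set."""
    Q = defaultdict(float)                      # Q[(s, a)], initially 0
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            if random.random() < epsilon:       # explore occasionally
                a = random.choice(actions)
            else:                               # exploit best known action
                a = max(actions, key=lambda act: Q[(s, act)])
            s_next, r, done = env.step(a)
            best_next = 0.0 if done else max(Q[(s_next, act)] for act in actions)
            # one-step Q-learning update, Eq. (5.16)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q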
The simplest exploration strategy follows a policy of optimism in the face of uncertainty: when the effect of an action is poorly understood, it is usually better to take that action in order to reduce this uncertainty. For example, exploration bonuses can be introduced and propagated to artificially raise the estimated value of untried actions [Sutton, 1990a; Wyatt, 1997; Wiering, 1999]. Another approach, the \epsilon-greedy strategy, is to select a random action a certain percentage of the time, \epsilon, and the best action otherwise [Sutton and Barto, 1998]. The last approach attempts to take into account how certain we are that actions are good or bad. It is based on the Boltzmann or Gibbs distribution and is often called soft-max exploration, where at each time step, action a is chosen from state s with the following probability:

\Pr(a \mid s) = \frac{e^{Q(s,a)/T}}{\sum_{a'} e^{Q(s,a')/T}}        (5.17)
where T is a temperature parameter that controls the amount of exploration: there is much exploration in states where the Q-values for different actions are almost equal, and little exploration in states where the Q-values are very different.
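The two exploration strategies just described can be written as small action-selection helpers; Q is assumed to be a table keyed by (state, action) pairs, as in the sketches above.

import math
import random

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    """With probability epsilon pick a random action, otherwise the
    action with the highest estimated Q-value."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def softmax_action(Q, s, actions, T=1.0):
    """Boltzmann (soft-max) exploration, Eq. (5.17): actions with higher
    Q-values are chosen more often; the temperature T controls how
    strongly the distribution favours them."""
    prefs = [math.exp(Q[(s, a)] / T) for a in actions]
    total = sum(prefs)
    probs = [p / total for p in prefs]
    return random.choices(actions, weights=probs, k=1)[0]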
A closely related on-policy algorithm, SARSA, replaces the maximization in the Q-learning target with the value of the action actually taken in the next state:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]        (5.18)

Just like TD, SARSA learns the value for a fixed policy and must be combined with a policy-learning component in order to make a complete reinforcement learning system.
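A one-line sketch of the SARSA update in the same tabular setting (the (state, action)-keyed table is an assumption carried over from the previous examples):

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy SARSA update, Eq. (5.18): the target uses the action
    a_next actually selected by the current policy in s_next, rather
    than the greedy maximum used by Q-learning."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])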
A learned model of the world can also be used to generate hypothetical experiences. Learning from these hypothetical experiences is useful when making errors is costly, when the robot has few real experiences from interaction with the environment, and when speeding up the learning process is urgent [Sutton, 1990a; 1990b; 1991; Clouse, 1997]. The earliest use of planning in reinforcement learning was the Dyna architecture, shown in Table 5.4, which integrates learning and planning and was proposed by Sutton [1990a; 1990b; 1991]. Planning is treated as being virtually identical to reinforcement learning, except that while learning updates the appropriate value-function estimates according to real experience, planning updates these same value-function estimates for hypothetical experience drawn from the world model. The model component of the Dyna algorithm is a table indexed by state-action pairs that contains the resulting state and reward as an entry [Sutton and Barto, 1998]. The table is filled as real experience is gathered. When the state and action spaces are too large, storing all experience becomes impractical. There is a version of Dyna called Prioritized Sweeping [Moore and Atkeson, 1993], and the closely related Queue-Dyna [Peng and Williams, 1993], that store a fixed number of the most useful experiences and update the experience tuples for which there will be large update errors, using prioritized value functions. Lin [1993] proposed the experience replay method, where past useful experiences are stored in a fixed-length list and presented to the reinforcement learner again and again in reverse order. This is also a kind of planning, because some costly experiences from the past are used to learn from once again. It is useful for learning in real worlds, where exploring new experiences is often much more expensive than replaying old experiences.
Q-Dyna Learning Algorithm
For each pair (s \in S, a \in A), initialize the table entry Q(s, a) to 0.
Observe the current state s_t.
Do forever:
    Select an action a_t and execute it.
    Receive immediate reward r_{t+1}.
    Observe the new state s_{t+1}.
    Update the table entry Q(s_t, a_t) using this real experience, as in Equation (5.16).
    Set s_t to s_{t+1}.
    Update the Model using this real experience.
    Repeat k times:
        Select a previously experienced state-action pair (s, a) at random.
        Predict the reward r and next state s' from the Model.
        Update the table entry Q(s, a) using this hypothetical experience, as in Equation (5.16).
    End Repeat
End Do
Table 5.4: Q-Dyna learning algorithm.
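A compact sketch of the Q-Dyna idea, combining the real-experience update with k model-based (hypothetical) updates per step; the environment interface and the random sampling of previously seen state-action pairs for planning are assumptions of this illustration.

import random
from collections import defaultdict

def dyna_q(env, actions, episodes=200, k=10, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Dyna-Q: each real transition updates Q and a table-based
    world model; k hypothetical transitions drawn from the model are
    then replayed with the same update rule, Eq. (5.16)."""
    Q = defaultdict(float)
    model = {}                                   # (s, a) -> (r, s_next, done)
    def select(s):
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])
    def update(s, a, r, s_next, done):
        best = 0.0 if done else max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = select(s)
            s_next, r, done = env.step(a)
            update(s, a, r, s_next, done)        # learn from real experience
            model[(s, a)] = (r, s_next, done)    # update the world model
            for _ in range(k):                   # planning with hypothetical experience
                ps, pa = random.choice(list(model.keys()))
                pr, ps_next, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps_next, pdone)
            s = s_next
    return Q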
The reinforcement function translates the observed state into a reinforcement value. That is, part of the robot's a priori knowledge is the ability to recognize states and assign a value to them. Balch [1998b] provides taxonomies for the various kinds of reinforcement functions that have been developed for multi-robot systems, based on a number of issues. The taxonomy of multi-robot reinforcement functions examines the source of reward and its relation to performance, locality, time, and continuity. In order to define the reinforcement function, the designer distills a large amount of task-specific knowledge into a number. Thus, the reinforcement function is usually closely coupled to the performance metric for a task. Since learning robots strive to maximize the reward signal provided to them, performance is maximized when their reward closely parallels performance. It is sometimes the case, however, in many embedded autonomous agent applications, and especially in multi-robot systems, that robots cannot or should not be rewarded strictly according to overall system performance. For example, the reinforcement function designer will encode a notion of what classes of states represent the robot's goal; however, the robot's sensors may not provide enough information for an accurate computation of performance, reward may be delayed, or performance may depend on the actions of other robots over which the agent has limited knowledge and/or control [Balch, 1998b]. Since the reinforcement function is the robot's only indication of how well it is meeting its goals, it must be designed with great care. For all practical purposes, the reinforcement function defines the robot's task. For complex problems, it may be necessary to refine the reinforcement function as a robot discovers ways to get the reinforcement without accomplishing the task. In order to design a correct reinforcement function, there must first be a clear idea of what the robot's goals should be, and the reinforcement function should parallel the performance metric [Balch, 1998b]. Given a simple goal, it is straightforward to define a reinforcement function leading to the proper behavior. Reinforcement learning gains much of its power from the ability to use the reinforcement function to specify different degrees of goal satisfaction and to identify sub-goals of a complex task. Rewarding sub-goals is one way to encode domain and task knowledge and to speed up learning [Matarić, 1994c]. The relative magnitude of reinforcements needs to reflect the relative importance of the states being rewarded. One could prioritize sub-goals by assigning reinforcements of greater magnitude to the more important ones [Karlsson, 1997].
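As a purely illustrative example of encoding sub-goals, a reinforcement function for a hypothetical foraging robot might reward delivery most, reward grasping a puck as a sub-goal, and penalize collisions; all field names here are invented for the sketch.

def foraging_reward(state):
    """Illustrative shaped reinforcement function for a hypothetical
    foraging robot.  The main goal (delivering a puck to home base)
    earns the largest reward, the sub-goal of grasping a puck earns a
    smaller one, and collisions are penalised."""
    if state.delivered_puck:     # task completed: the actual goal
        return 10.0
    if state.grasped_puck:       # sub-goal: rewarded less than the goal
        return 2.0
    if state.collided:           # undesired state
        return -1.0
    return 0.0                   # all other states are neutral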
Evolutionary learning is another form of adaptation. The main difference between evolution and learning concerns how the adaptation process is carried out in space and time. Evolution is based on a population of individuals, whereas learning takes place within a single individual. Evolution captures relatively slow environmental changes; learning, on the other hand, captures faster environmental changes.
Combining evolution and learning during the life of the robot is one way to overcome the lengthy evolutionary time scale. Learning can improve the search properties of artificial evolution by making the controller more robust to changes that occur very fast. This is typically achieved by evolving neural controllers that can learn with reinforcement learning or back-propagation.
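A minimal sketch of the evolutionary side of such a scheme: a simple generational algorithm over a controller's parameter vector. The fitness function is assumed to evaluate the controller on the robot or in simulation, and could itself include a lifetime learning phase as described above.

import random

def evolve_controller(fitness, n_weights, pop_size=20, generations=50,
                      mutation_std=0.1):
    """Evolve a controller's parameter vector (e.g. neural-network
    weights) with truncation selection and Gaussian mutation.
    fitness(weights) is an assumed scoring function returning a number."""
    population = [[random.gauss(0, 1) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]         # keep the better half
        offspring = []
        while len(offspring) < pop_size - len(parents):
            p = random.choice(parents)
            child = [w + random.gauss(0, mutation_std) for w in p]  # mutate
            offspring.append(child)
        population = parents + offspring
    return max(population, key=fitness)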
[Comparison of learning methods; the original column layout and one column heading were lost in extraction, the two recoverable column headings being Temporal Difference and Statistical Learning:
Memory space: Large / Small / Small
Tracking changes: No / Yes / Yes
Model-based learning: No / Yes / Yes
Learning speed: Slow / Middle / Fast
Optimality of policy: Mostly / Approached / Mostly
Credit assignment: Implicit / Explicit / Explicit
Explore/exploit problem: Yes / Yes / No]
Coordination and cooperation are a vital part of applying multiple robots to a mission in an efficient manner. Cooperation is a form of interaction that makes an individual robot capable of collaborating with other robots to accomplish tasks that are beyond an individual's capabilities. Designing and implementing a cooperative team of robots requires deciding how to resolve organizational problems. Learning to choose an organization scheme is based on the improvement gained in completing the mission cooperatively within the efficiency constraints (i.e., execution time, interference, and robustness). Gaining dynamic organization behavior through learning lets the robots complement each other in a cooperative manner and cope with dynamic environments. Different organizations will exist, each with different behaviors, and these behavioral differences lead to differences in performance. Therefore, no one organization is best suited for every mission. Dynamically assigning each robot a different role through learning allows a group of robots to improve their performance: the roles provide constraints, so that each robot accepts tasks that match its role in the organization, based on that robot's skills and experience. As shown in Figure 5.3, increasing the level of communication and dynamic task allocation leads to a virtual centralized controller that directs the multi-robot system toward coherent behavior. The cooperative and adaptive character of the robots' behavior increases the organization level, which is demonstrated by higher efficiency and decreased execution time, while preserving the autonomy of the robots in the team.
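As a toy illustration of role- and skill-based task allocation (the scoring function and greedy assignment are assumptions of this sketch, not a method from the text):

def allocate_tasks(robots, tasks, skill):
    """Greedily assign each task to the free robot whose learned skill
    estimate for it is highest.  skill(robot, task) is an assumed
    scoring function, e.g. learned from past performance; real systems
    may instead negotiate assignments through auctions or other
    communication mechanisms."""
    assignment = {}
    free = set(robots)
    for task in tasks:
        if not free:
            break
        best = max(free, key=lambda r: skill(r, task))
        assignment[task] = best
        free.remove(best)
    return assignment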
REFERENCES
Borenstein J., and Koren Y., (1989). Real-Time Obstacle Avoidance for Fast Mobile Robots. IEEE Trans. Systems, Man, and Cybernetics, 19(5): 1179-1187.
Borenstein J., and Koren Y., (1991). The Vector Field Histogram - Fast Obstacle Avoidance for Mobile Robots. IEEE Trans. Robotics and Automation, 7(3): 278-288.
Borenstein J., Everett H., and Feng L., (1996). Navigating Mobile Robots - Systems and Techniques. A.K. Peters, Wellesley, MA.
Brooks R. and Stein L. (1994). Building brains for bodies. Journal Autonomous
Robots, 1: 7-25.
Brooks R., (1986). A Robust Layered Control System for a Mobile Robot. IEEE
Journal of Robotics and Automation, 2(1): 14-22.
Brooks R., (1990a). Challenges for Complete Creature Architectures. In:
Proceedings of the first international conference on simulation of adaptive
Behavior, MIT Press, pp. 484-448.
Brooks R., (1990b). Elephants Don't Play Chess. In: Maes P., (Ed.), Designing Autonomous Agents, MIT Press, pp. 3-15.
Brooks R., (1990c). The Behavior Language; User's Guide. Technical Report AIM-1127, MIT Artificial Intelligence Lab.
Brooks R., (1991a). Intelligence Without Reason. In: Proceedings IJCAI-91,
Sydney.
Brooks R., (1991b). Intelligence Without Representation. Artificial Intelligence 47,
139-160.
Canny John (1988). The Complexity of Robot Motion Planning. Cambridge,
Massachusetts: MIT Press.
Cao Y., Fukunaga A., and Kahng A. (1997). Co-operative Mobile Robotics:
Antecedents and Directions. Journal of Autonomous Robotics, 4:1-23.
Clouse J., (1997). On Integrating Apprentice learning and Reinforcement Learning.
Ph.D. Thesis, University of Massachusetts.
Clouse J., and Utgoff P., (1992). A teaching method for reinforcement. Machine
learning: In Proceedings of the Ninth International Conference, pp. 182-189.
Connell J. (1990). Minimalist Mobile Robotics: A Colony Architecture for an Artificial
Creature. Academic Press.
Connell J. (1991). SSS: a hybrid architecture applied to robot navigation. In:
proceedings of IEEE International conference on robotics and automation, France,
pp. 2719-2724.
Dayan P., (1992). The Convergence of TD() for general . Machine Learning, 8:
341-362.
Dayan P., and Sejnowski T., (1994). TD() Converges with Probability 1. Machine
Learning, 14: 295-301.
Dias B., and Stentz A., (2000). A Free Market Architecture for Distributed Control
of a Multirobot System. In: the 6th International Conference on Intelligent
Autonomous Systems, pp. 115-122.
Dietterich T., (1997). Machine Learning Research: Four Current Directions. AI
Magazine 18(4), 97-136.
61
[39]
[40]
[41]
[42]
[43]
[44]
[45]
[46]
[47]
[48]
[49]
[50]
[51]
[52]
[53]
[54]
[55]
Doran J., Franklin S., Jennings N., and Norman T., (1997). On Cooperation in
Multi-Robot Systems. The knowledge Engineering review, 12 (3).
Doyle A., (1995). Algorithms and Computational Techniques for Robot Path Planning.
Ph.D. Thesis, School of Electronic Engineering and Computer Systems, University
of Wales.
Dudek G., and Jenkin M., (2000). Computational Principles of Mobile Robotics. Cambridge University Press.
Elfes A., (1987). Sonar-based real-world mapping and navigation. IEEE Transactions on Robotics and Automation, RA-3(3), pp. 249-265.
Elfes A., (1989). Using occupancy grids for mobile robot perception and navigation. IEEE Computer, pp. 46-57.
El-Telbany M., AbdelWhab A., and Shaheen S. (2001). Learning Multi-robot Cooperation by Expertise Distribution. Proceedings of the International Conference
on Industrial Electronics, Technology & Automation. Cairo, Egypt.
El-Telbany M., AbdelWhab A., and Shaheen S. (2002a). Learning Co-ordination by
Spatial and Expertise Distribution in Multi-agent Systems. The Egyptian Journal of
Engineering and Applied Science, 49 (2), 357-367.
El-Telbany M., AbdelWhab A., and Shaheen S. (2002b). Behaviour-Based Multi-Robots Learning: A Reinforcement Learning Survey. The Egyptian Informatics Journal. (To appear).
El-Telbany M., AbdelWhab A., and Shaheen S. (2002c). Speed up Reinforcement
Learning in Multi-robot Domains by Apprentice Learning. In WSEAS
International conference, Greece.
Everett H., (1995). Sensors for Mobile Robots - Theory and Application. A.K. Peters, Wellesley, MA.
Floreano D., (1997). Evolutionary Mobile Robots. In D. Quagliarelli , J Periaux, C.
Polini, and G. Winter (eds), Genetic Algorithms in Engineering and Computer
Science, Chichester: Johan Wiley and Sons, Ltd.
Fontán M., and Matarić M., (1996). A Study of Territoriality: the Role of Critical Mass in Adaptive Task Division. In: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, pp. 553-561.
Fontán M., and Matarić M., (1998). Territorial Multi-Robot Task Division. IEEE Transactions on Robotics and Automation, 14(5).
Gat E., (1998). On three-layer architectures. In: Bonnasso R., and Murphy P. (Eds.), Artificial Intelligence and Mobile Robots, AAAI Press.
Gerkey B., and Matarić M., (2000). Murdoch: Publish/Subscribe Task Allocation for Heterogeneous Agents. Proceedings, Autonomous Agents 2000, Barcelona, Spain, June 3-7, 203-204.
62
[56]
[57]
[58]
[59]
[60]
[61]
[62]
[63]
[64]
[65]
[66]
[67]
[68]
[69]
[70]
[71]
[72]
[73]
Gerkey B., and Matarić M., (2001). Principled Communication for Dynamic Multi-Robot Task Allocation. In: Rus D., and Singh S., (Eds.) Experimental Robotics VII, 353-362, Springer-Verlag, Berlin.
Gerkey B., and Matarić M., (2002). Pusher-Watcher: an Approach to Fault-Tolerant Tightly Coupled Robot Coordination. In: Proceedings of the IEEE International Conference on Robotics and Automation, Washington.
Gerkey B., and Matarić M., (2002). Sold!: Auction Methods for Multi-Robot Coordination. IEEE Transactions on Robotics and Automation.
Goldberg D., (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley.
Goldberg D., and Matarić M., (1997). Interference as a Tool for Designing and Evaluating Multi-Robot Controllers. In: Proceedings AAAI-97, pp. 637-642.
Goldberg D., and Matarić M., (1999). Coordinating mobile robot group behavior using a model of interaction dynamics. In: Proceedings of the Third International Conference on Autonomous Agents, Washington, ACM Press.
Grefenstette J. and Schultz A. (1994). An Evolutionary approach to learning in
robots. In proceedings Machine Learning Workshop on Robot Learning, New
Brunswick, NJ.
Haykin S., (1999). Neural Networks: A Comprehensive Foundation. Second
Edition, Prentice-Hall, Inc.
Hoffmann F., (1998). Soft computing techniques for the design of the mobile robot
behaviors. Journal of Information Sciences.
Holland J., (1986). Escaping brittleness: the possibilities of general-purpose learning
algorithms applied to parallel rule-based systems. In R.S. Michalski and T.M.
Mitchell (eds.), Machine learning: An Artificial Intelligence Approach, Volume II,
Morgan Kaufmann.
Humphrys M., (1995). W-Learning: Competition among selfish Q-Learners.
Technical Report no. 362, University of Cambridge, Computer Laboratory.
Humphrys M., (1997). Action Selection Methods using Reinforcement Learning.
Ph.D., Thesis, University of Cambridge, Computer Laboratory.
Kaelbling L., Littman M., and Moore A., (1996). Reinforcement Learning: A
Survey. Journal of Artificial Intelligence Research, 4, pp.237-258.
Kaiser M., Dillmann R., and Rogalla O., (1996). Communication as the Basis for
Learning in Multi-Agent Systems. In: Workshop on Learning in Distributed AI
Systems, Budapest, Hungary, 1996.
Karlsson J., (1997). Learning to Solve Multiple Goals. Ph.D. Thesis, University of
Rochester, New York.
Keerthi S., and Ravindran B., (1999) A Tutorial Survey of Reinforcement Learning.
Department of Computer Science, Indian Institute of Science.
Khatib O., (1986). Real-time obstacle avoidance for manipulators and mobile
robots. The International Journal of Robotics Research, 5(1).
Koza J., (1992). Genetic Programming, on the Programming of Computers by
Means of Natural Selection. MIT Press.
63
[74]
[75]
[76]
[77]
[78]
[79]
[80]
[81]
[82]
[83]
[84]
[85]
[86]
[87]
[88]
[89]
[90]
[91]
Kube C. R., and Zhang H., (1992). Collective Robotics Intelligence. In: From
Animal to Animates: Second International conference on Simulation of Adaptive
Behaviors, PP. 460-468.
Kube C. R., and Zhang H., (1993). Collective Robotics: From Social Insects to
Robots. Journal of Adaptive Behavior, 2 (2): 189-219.
Kube C. R., and Zhang H., (1996). The use of Perceptual Cues in Multi-Robot Box-Pushing. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2085-2090.
Kube R., and Bonabeau E., (1998). Cooperative transport by ants and robots.
Journal of Robotics and Autonomous Systems.
Lin Long-Ji, (1992). Self-improving reactive agents based on reinforcement
learning, planning and teaching. Machine learning, 8, 293-321.
Lin Long-Ji, (1993). Scaling up reinforcement learning for robot control.
Proceedings of the 10th International Conference of Machine learning, pp. 182-189.
Lozano-Pérez T., (1983). Spatial Planning: A Configuration Space Approach. IEEE Transactions on Computers, C-32, pp. 108-120.
Lund, H. (1995). Evolving Robot Control System. In Proceedings of the first
NWGA, University of Vaasa, Finland.
Maes P., (1994). Modeling Adaptive Autonomous Agents. Journal of Artificial Life
1(2), 135-162.
Maes P., and Brooks R., (1991). Learning to Coordinate Behaviors. In: Proceedings
of the Eighth National Conference on AI, Boston.
Mahadevan S., (1996a). Machine Learning for robots: A Comparison of Different
Paradigms. In: proceedings of International Conference on Intelligent Robots and
systems.
Mahadevan S., (1996b). The NSF Workshop on Reinforcement Learning: Summary
and Observations. AI Magazine.
Mahadevan S., Connell J., (1992). Automatic Programming of behavior-based
Robots using reinforcement Learning. Artificial Intelligent, pp. 311-365.
Matarić M., (1990). A Distributed Model for Mobile-Robot Environment Learning and Navigation. M.Sc. Thesis, Massachusetts Institute of Technology, Artificial Intelligence Laboratory, Cambridge, USA.
Matarić M., (1992). Minimizing complexity in controlling a collection of mobile robots. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 830-835.
Matarić M., (1994a). Learning Social Behaviors. In: Proceedings of the Third International Conference on Simulation of Adaptive Behavior, pp. 453-462.
Matarić M., (1994b). Interaction and Intelligent Behavior. Ph.D. Thesis, Massachusetts Institute of Technology, Artificial Intelligence Laboratory, Cambridge, USA.
Matarić M., (1994c). Reward Functions for Accelerating Learning. In: Proceedings of the Eleventh International Machine Learning Conference, Morgan Kaufmann Publishers Inc.
Nagayuki Y., Ishii S., and Doya K., (1999). Multi-Agent Reinforcement Learning:
An Approach Based on the Other Agents Internal Model. Advanced
Telecommunications Research Institute, Japan.
Nebot E., and Pagac D., (1995). Quadtree Representations and Ultrasonic Information for
Mapping an Autonomous Guided Vehicles Environment. International Journal of
Computers and Their Applications, 2(3), pp. 160-170.
Nehmzow U., and Mitchell T., (1995). A Prospective Student's Introduction to the Robot Learning Problem. Technical Report UMCS-95-12-6, University of Manchester.
Nehmzow U. and Smithers T., (1992). Using motor actions for location
recognition. In Varela F., and Bourgine P. (Eds.) Toward a practice of autonomous
systems. MIT Press, pp. 96-104.
Ostergaard E., Matarić M., and Sukhatme G., (2001b). Distributed Multi-Robot Task Allocation for Emergency Handling. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Maui, Hawaii, 821-826.
Ostergaard E., Matarić M., and Sukhatme G., (2002). Multi-Robot Task Allocation in the Light of Uncertainty. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA-2002), Washington.
Ostergaard E., Sukhatme G., and Matarić M., (2001a). Emergent Bucket Brigading. Autonomous Agents 2001, Montreal, Canada.
Parker L., (1992). Adaptive Action Selection for Cooperative Agent Teams. In
International Conference on Simulation of Adaptive Behavior, pp. 442-450.
Parker L., (1994). Heterogeneous Multi-robot Cooperation. Ph.D. thesis,
Massachusetts Institute of Technology, Artificial Intelligence laboratory,
Cambridge, USA.
Parker L., (1995a). The Effect of Action Recognition and robot Awareness in
Cooperative Robotic Teams. In proceedings of International Conference on
Intelligent Robotics and Systems, 1, pp. 212-219.
Parker L., (1995b). L-ALLIANCE: A mechanism for adaptive action selection in
heterogeneous multi-robot teams. Technical Report ORNL/TM-13000, Oak Ridge
National laboratory, Tennessee.
Parker L., (1996). On the Design of Behavior-based Multi-Robot Teams. In Journal
of Advanced Robotics, 10, pp. 547-578.
Parker L., (1998a). ALLIANCE: An Architecture for Fault-Tolerant Multi-robot
Cooperation. IEEE Transactions on Robotics and Automation 14(2), pp. 220-240.
Parker L., (1998b). Distributed Control of Multi-robot Teams: Cooperative Baton-Passing Task. In: Proceedings of the 4th International Conference on Information Systems Analysis and Synthesis, Volume 3, pp. 89-94.
Parker L., (2000a). Current State of the art in Distributed Autonomous Mobile
Robotics. In: proceedings of Fifth International Symposium on Distributed
Autonomous Robotic systems.