UNIT V APPLICATIONS
d) AI in Autonomous Vehicles: Just like humans, self-driving cars need sensors to
understand the world around them and a brain to collect, process, and choose specific
actions based on the information gathered. Autonomous vehicles are equipped with advanced
tools to gather information, including long-range radar, cameras, and LIDAR. Each of these
technologies is used in a different capacity, and each collects different information. This
information is useless unless it is processed and some form of action is taken based on
it. This is where artificial intelligence comes into play; it can be compared to the human
brain. AI has several applications for these vehicles, and the more immediate ones are as
follows:
Directing the car to a gas station or recharging station when it is running low on fuel.
Adjusting the trip directions based on known traffic conditions to find the quickest
route.
Incorporating speech recognition for advanced communication with passengers.
Natural language interfaces and virtual assistance technologies.
e) AI for Robotics: AI will allow us to address the challenges of taking care of an aging
population and enable much longer independence. It will drastically reduce, and perhaps even
eliminate, traffic accidents and deaths, as well as enable disaster response for dangerous
situations, for example the nuclear meltdown at the Fukushima power plant.
f) Cyborg Technology: One of the main limitations of being human is simply our own bodies
and brains. Researcher Shimon Whiteson thinks that in the future, we will be able to
augment ourselves with computers and enhance many of our own natural abilities. Though
many of these possible cyborg enhancements would be added for convenience, others may
serve a more practical purpose. Yoky Matsuoka of Nest believes that AI will become useful
for people with amputated limbs, as the brain will be able to communicate with a robotic
limb to give the patient more control. This kind of cyborg technology would significantly
reduce the limitations that amputees deal with daily.
5.1.1 Artificial Intelligence Technologies
The market for artificial intelligence technologies is flourishing. Artificial Intelligence
involves a variety of technologies and tools; some of the recent technologies are as follows:
Natural Language Generation: A tool that produces text from computer data.
Currently used in customer service, report generation, and summarizing business
intelligence insights.
Speech Recognition: Transcribes and transforms human speech into a format useful for
computer applications. Presently used in interactive voice response systems and mobile
applications.
Virtual Agent: A Virtual Agent is a computer-generated, animated, artificial-intelligence
virtual character (usually with an anthropomorphic appearance) that serves as an online
customer service representative. It conducts an intelligent conversation with users, responds
to their questions, and performs appropriate non-verbal behavior. An example of a typical
Virtual Agent is Louise, the Virtual Agent of eBay, created by the French/American
developer VirtuOz.
Machine Learning: Provides algorithms, APIs (Application Programming Interfaces),
development and training toolkits, data, as well as computing power to design, train, and
deploy models into applications, processes, and other machines. Currently used in a wide
range of enterprise applications, mostly involving prediction or classification.
Deep Learning Platforms: A special type of machine learning consisting of artificial
neural networks with multiple abstraction layers. Currently used in pattern recognition
and classification applications supported by very large data sets.
Biometrics: Biometrics uses methods for unique recognition of humans based upon one
or more intrinsic physical or behavioral traits. In computer science, particularly,
biometrics is used as a form of identity access management and access control. It is also
used to identify individuals in groups that are under surveillance. Currently used in
market research.
Robotic Process Automation: Uses scripts and other methods to automate human actions
in order to support efficient business processes. Currently used where it is inefficient for
humans to execute a task.
Text Analytics and NLP: Natural language processing (NLP) uses and supports text
analytics by facilitating the understanding of sentence structure and meaning, sentiment,
and intent through statistical and machine learning methods. Currently used in fraud
detection and security, a wide range of automated assistants, and applications for mining
unstructured data.
5.2 LANGUAGE MODEL
A language model assigns probabilities to sequences of words. Language models are used in:
Speech Recognition
OCR & Handwriting Recognition
Machine Translation
Generation
Context sensitive spelling correction.
A language model also supports predicting the completion of a sentence. Predictive text
input systems can guess what is being typed and provide choices on how to complete it.
5.2.2 N-Gram Word Models
This model is defined over sequences of words, characters, syllables, or other units.
Estimate the probability of each word given the prior context.
An N-gram model uses only N−1 words of prior context.
Unigram: P(phone)
Bigram: P(phone | cell)
Trigram: P(phone | your cell)
The Markov assumption is the presumption that the future behavior of a dynamical
system depends only on its recent history. In particular, in a kth-order Markov model,
the next state depends only on the k most recent states; therefore an N-gram model is an
(N−1)-order Markov model.
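As an illustrative sketch, not part of the original notes, the following Python fragment estimates bigram probabilities such as P(phone | cell) by counting word pairs in a tiny made-up corpus:

from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Estimate P(w_i | w_{i-1}) from a list of tokenized sentences."""
    unigram_counts = Counter()
    bigram_counts = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, curr in zip(padded, padded[1:]):
            unigram_counts[prev] += 1
            bigram_counts[prev][curr] += 1

    def prob(word, prev):
        # Maximum-likelihood estimate; returns 0.0 for unseen contexts.
        if unigram_counts[prev] == 0:
            return 0.0
        return bigram_counts[prev][word] / unigram_counts[prev]

    return prob

# Toy corpus (hypothetical):
corpus = [["recharge", "your", "cell", "phone"],
          ["answer", "your", "cell", "phone"]]
p = train_bigram_model(corpus)
print(p("phone", "cell"))   # P(phone | cell) = 1.0 in this toy corpus
print(p("phone", "your"))   # P(phone | your) = 0.0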
5.2.3 N-gram Character Models
One of the simplest language models is a model over character sequences: P(c_{1:N}).
Language identification: given a text, determine which language it is written in.
Build a trigram character model of each candidate language ℓ: P(c_i | c_{i-2:i-1}, ℓ).
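The corresponding decision rule, reconstructed here in standard notation rather than quoted from the notes, picks the most probable language ℓ* by Bayes' rule:

\[
\ell^{*} = \arg\max_{\ell} P(\ell \mid c_{1:N})
         = \arg\max_{\ell} P(\ell)\, P(c_{1:N} \mid \ell)
         = \arg\max_{\ell} P(\ell) \prod_{i=1}^{N} P(c_i \mid c_{i-2:i-1}, \ell)
\]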
5.3 INFORMATION RETRIEVAL
Information retrieval is concerned with information comprising text, audio, images and video.
While many features of conventional text retrieval systems are equally applicable to multimedia
information retrieval, the specific nature of audio, image and video information has called for
the development of many new tools and techniques for information retrieval.
Modern information retrieval deals with storage, organization and access to text, as well
as multimedia information resources. The concept of information retrieval presupposes that there
are some documents or records containing information that have been organized in an order
suitable for easy retrieval. The documents or records we are concerned with contain
bibliographic information which is quite different from other kinds of information or data. We
may take a simple example. If we have a database of information pertaining to an office, or a
supermarket, all we have are the different kinds of records and related facts, like names of
employees, their positions, salary, and so on, or in the case of a supermarket, names of different
items, prices, quantity, and so on. The main objective of a bibliographic information retrieval
system, however, is to retrieve the information, either the actual information or the documents
containing the information, that fully or partially matches the user's query. The database may
contain abstracts or full texts of documents, like newspaper articles, handbooks, dictionaries,
encyclopedias, legal documents, statistics, etc., as well as audio, images, and video information.
An information retrieval system thus has three major components: the document
subsystem, the user subsystem, and the searching/retrieval subsystem. These divisions are quite
broad and each one is designed to serve one or more functions, such as:
Analysis of documents and organization of information (creation of a document
database)
Analysis of users' queries, preparation of a strategy to search the database
Actual searching or matching of users' queries with the database, and finally
Retrieval of items that fully or partially match the search statement.
IR is a three-step process:
Asking a question (how to use the language to get what we want?)
Building an answer from known data (how to refer to a given text?)
Assessing the answer (does it contain the information we are seeking?)
more subject areas in order to provide it to the user as soon as it is asked for. Belkin presents the
following situation, which clearly reflects the purpose of information retrieval systems:
A writer presents a set of ideas in a document using a set of concepts.
Somewhere there will be some users who require the ideas but may not be able to
identify them. In other words, there will be some persons who lack the ideas put forward
by the author in his/her work.
An information retrieval system serves to match the writer's ideas expressed in the document
with the users' requirements or demands for them.
Thus, an information retrieval system serves as a bridge between the world of creators or
generators of information and the users of that information.
Some terminology
An IR system looks for data matching using some criteria defined by the users in their
queries.
The language used to ask a question is called the query language.
These queries use keywords (atomic items characterizing some data).
The basic unit of data is a document (can be a file, an article, a paragraph, etc.).
A document corresponds to free text (may be unstructured).
All the documents are gathered into a collection (or corpus).
Example:
1 million documents, each containing about 1000 words
If each word is encoded using 6 bytes:
10^6 × 1000 × 6 bytes = 6 × 10^9 bytes ≈ 6 GB
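A minimal sketch, not part of the original notes, of how such a collection of documents can be indexed for keyword queries with a simple inverted index; the document texts and the query below are hypothetical:

from collections import defaultdict

def build_inverted_index(collection):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in collection.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents matching ALL query keywords (boolean AND)."""
    keywords = query.lower().split()
    if not keywords:
        return set()
    results = index.get(keywords[0], set()).copy()
    for word in keywords[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical corpus:
docs = {1: "prices of items in the supermarket",
        2: "names of employees and their salary",
        3: "supermarket items and their prices"}
idx = build_inverted_index(docs)
print(search(idx, "supermarket prices"))   # {1, 3}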
5.3.3 Components of Information Retrieval
In an information retrieval system there are the documents or sources of information on one
side, and on the other there are the users' queries. These two sides are linked through a series of
tasks. Lancaster mentions that an information retrieval system comprises six major subsystems:
The document subsystem
The indexing subsystem
The vocabulary subsystem
The searching subsystem
The service-system interface, and
required. Online IR is nothing but retrieving data from web sites, web pages and servers; the
data may include databases, images, text, tables, and other types of content.
5.3.4 Functions of information retrieval system
An information retrieval system deals with various sources of information on the one hand
and users' requirements on the other. It must:
Analyze the contents of the sources of information as well as the users' queries, and then
Match these to retrieve those items that are relevant.
The major functions of an information retrieval system can be listed as follows:
To identify the information (sources) relevant to the areas of interest of the target user
community
To analyze the contents of the sources (documents)
To represent the contents of the analyzed sources in a way that will be suitable for
matching users' queries
To analyze users' queries and to represent them in a form that will be suitable for
matching with the database
To match the search statement with the stored database
To retrieve the information that is relevant, and
To make necessary adjustments in the system based on feedback from the users.
5.3.5 Features of an information retrieval system
An effective information retrieval system must have provisions for:
Prompt dissemination of information
Filtering of information
The right amount of information at the right time
Active switching of information
Receiving information in an economical way
Browsing
Getting information in an economical way
Current literature
Access to other information systems
Interpersonal communications, and
Personalized help.
5.4 INFORMATION EXTRACTION
In the named entity detection step, we search for mentions of potentially interesting entities
in each sentence. Finally, we use relation detection to search for likely relations between
different entities in the text.
Figure: Simple Pipeline Architecture for an Information Extraction System. This system
takes the raw text of a document as its input, and generates a list of (entity, relation,
entity) tuples as its output.
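As a rough, rule-based illustration of such a pipeline (a simplified sketch, not the system shown in the figure), the fragment below segments sentences, tokenizes them, treats capitalized tokens as candidate entities, and emits (entity, relation, entity) tuples for one hypothetical relation pattern:

import re

def extract_relations(raw_text, relation_words=("of", "in", "at")):
    """Tiny IE sketch: return (entity, relation, entity) tuples from raw text."""
    tuples = []
    # 1. Sentence segmentation (naive split on sentence-ending punctuation).
    for sentence in re.split(r"[.!?]", raw_text):
        # 2. Tokenization (naive whitespace split).
        tokens = sentence.split()
        # 3. Entity detection: treat capitalized tokens as candidate entities.
        for i, token in enumerate(tokens):
            if not token[:1].isupper():
                continue
            # 4. Relation detection: look for an "Entity relation Entity" pattern.
            if (i + 2 < len(tokens)
                    and tokens[i + 1] in relation_words
                    and tokens[i + 2][:1].isupper()):
                tuples.append((token, tokens[i + 1], tokens[i + 2]))
    return tuples

# Hypothetical input text:
text = "Siri of Apple answers questions. Alexa of Amazon plays music."
print(extract_relations(text))
# [('Siri', 'of', 'Apple'), ('Alexa', 'of', 'Amazon')]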
5.4.2 Applications of IE
Enterprise
News tracking
Customer care
Data cleaning
Personal information management
Scientific applications
Web oriented applications
Citation databases
Opinion databases
Community websites
Comparison shopping
Ad placement on webpages
Structured web searches
5.5 NATURAL LANGUAGE PROCESSING
Natural Language Processing is the driving force behind the following common applications:
Language translation applications such as Google Translate
Word processors such as Microsoft Word and Grammarly that employ NLP to check the
grammatical accuracy of texts.
Interactive Voice Response (IVR) applications used in call centers to respond to certain
users' requests.
Personal assistant applications such as OK Google, Siri, Cortana, and Alexa.
5.5.2 NLP Terminology
Phonology − It is the study of organizing sounds systematically.
Morphology − It is the study of the construction of words from primitive meaningful units.
Morpheme − It is the primitive unit of meaning in a language.
Syntax − It refers to arranging words to make a sentence. It also involves determining
the structural role of words in the sentence and in phrases.
Semantics − It is concerned with the meaning of words and how to combine words into
meaningful phrases and sentences.
Pragmatics − It deals with using and understanding sentences in different situations and
how the interpretation of the sentence is affected.
Discourse − It deals with how the immediately preceding sentence can affect the
interpretation of the next sentence.
World Knowledge − It includes the general knowledge about the world.
5.5.3 Steps in NLP
There are five general steps:
1) Lexical Analysis − It involves identifying and analyzing the structure of words. The lexicon
of a language means the collection of words and phrases in that language. Lexical analysis
divides the whole chunk of text into paragraphs, sentences, and words.
2) Syntactic Analysis (Parsing) − Syntax refers to the arrangement of words in a sentence
such that they make grammatical sense. In NLP, syntactic analysis is used to assess how
the natural language aligns with the grammatical rules. Computer algorithms are used to
apply grammatical rules to a group of words and derive meaning from them. Here are
some syntax techniques that can be used:
3) Semantic Analysis − It draws the exact meaning or the dictionary meaning from the text.
The text is checked for meaningfulness. This is done by mapping syntactic structures to objects
in the task domain. The semantic analyzer disregards sentences such as "hot ice-cream".
Semantics refers to the meaning that is conveyed by a text. Semantic analysis is one of the
difficult aspects of Natural Language Processing that has not been fully resolved yet. It involves
applying computer algorithms to understand the meaning and interpretation of words and how
sentences are structured. Here are some techniques in semantic analysis:
Named entity recognition (NER): It involves determining the parts of a text that
can be identified and categorized into preset groups. Examples of such groups
include names of people and names of places (a minimal code sketch follows this list).
Word sense disambiguation: It involves giving meaning to a word based on the
context.
Natural language generation: It involves using databases to derive semantic
intentions and convert them into human language.
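A minimal named entity recognition sketch, assuming the spaCy library and its small English model are installed (illustrative only, not part of the original notes):

import spacy

# Assumes: pip install spacy; python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Yoky Matsuoka of Nest worked on robotic limbs in California.")
for ent in doc.ents:
    # ent.label_ gives the preset group, e.g. PERSON, ORG, GPE (places).
    print(ent.text, ent.label_)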
4) Discourse Integration − The meaning of any sentence depends upon the meaning of the
sentence just before it. In addition, it also brings about the meaning of the immediately
succeeding sentence.
5) Pragmatic Analysis − During this step, what was said is re-interpreted in terms of what it
actually meant. It involves deriving those aspects of language which require real-world
knowledge.
5.5.4 Implementation Aspects of Syntactic Analysis
There are a number of algorithms researchers have developed for syntactic analysis, but
we consider only the following simple methods −
Context-Free Grammar
Top-Down Parser
Context-Free Grammar
It is a grammar that consists of rules with a single symbol on the left-hand side of each
rewrite rule. Let us create a grammar to parse a sentence:
"The bird pecks the grains"
Articles (DET) − a | an | the
Nouns − bird | birds | grain | grains
Noun Phrase (NP) − Article + Noun | Article + Adjective + Noun
= DET N | DET ADJ N
Verbs − pecks | pecking | pecked
Verb Phrase (VP) − NP V | V NP
Adjectives (ADJ) − beautiful | small | chirping
The parse tree breaks down the sentence into structured parts so that the computer can
easily understand and process it. In order for the parsing algorithm to construct this parse tree, a
set of rewrite rules, which describe what tree structures are legal, need to be constructed. These
rules say that a certain symbol may be expanded in the tree by a sequence of other symbols.
Expressed as a rule: if there are two strings, a Noun Phrase (NP) and a Verb Phrase
(VP), then the string formed by NP followed by VP is a sentence. The rewrite rules for the
sentence are as follows:
S → NP VP
NP → DET N | DET ADJ N
VP → V NP
Lexicon:
DET → a | the
ADJ → beautiful | perching
N → bird | birds | grain | grains
V → peck | pecks | pecking
The parse tree can then be created from these rules; a sketch using a parsing toolkit follows.
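A minimal sketch of building this parse tree with the NLTK toolkit, assuming NLTK is installed; the grammar mirrors the rewrite rules and lexicon above, with the input sentence lowercased so that "The" matches the lexicon entry "the":

import nltk

# Grammar mirroring the rewrite rules and lexicon given above.
grammar = nltk.CFG.fromstring("""
S   -> NP VP
NP  -> DET N | DET ADJ N
VP  -> V NP
DET -> 'a' | 'the'
ADJ -> 'beautiful' | 'perching'
N   -> 'bird' | 'birds' | 'grain' | 'grains'
V   -> 'peck' | 'pecks' | 'pecking'
""")

parser = nltk.ChartParser(grammar)
tokens = "the bird pecks the grains".split()   # lowercased input sentence
for tree in parser.parse(tokens):
    tree.pretty_print()
    # (S (NP (DET the) (N bird)) (VP (V pecks) (NP (DET the) (N grains))))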
Now consider the above rewrite rules. Since V can be replaced by both "peck" and
"pecks", sentences such as "The birds pecks the grains" are wrongly permitted; i.e., a
subject-verb agreement error is accepted as correct.
Merit:
It is the simplest style of grammar and is therefore a widely used one.
Demerits:
They are not highly precise. For example, "The grains peck the bird" is syntactically
correct according to the parser; even though it makes no sense, the parser takes it as a
correct sentence.
To achieve high precision, multiple sets of grammar rules need to be prepared. Completely
different sets of rules may be required for parsing singular and plural variations, passive
sentences, etc., which can lead to a huge, unmanageable set of rules.
Top-Down Parser
Here, the parser starts with the S symbol and attempts to rewrite it into a sequence
of terminal symbols that matches the classes of the words in the input sentence until it consists
entirely of terminal symbols. These are then checked against the input sentence to see if they
match. If not, the process is started over again with a different set of rules. This is repeated
until a rule is found which describes the structure of the sentence.
Merit:
It is simple to implement.
Demerits:
It is inefficient, as the search process has to be repeated if an error occurs.
Slow speed of operation.
5.6 MACHINE TRANSLATION
Machine translation is the automatic translation of text from one natural language (the
source) to another (the target). It was one of the first application areas envisioned for computers,
but it is only in the past decade that the technology has seen widespread usage. Here is a
sentence from this book: "AI is one of the newest fields in science and engineering."
And here it is translated from English to Tamil by an online tool, Google Translate:
For those who don't read Tamil, here is the Tamil translated back to English. The words that
came out different are in italics: "AI is one of the new disciplines in science and engineering."
The differences are all reasonable paraphrases, such as new disciplines for newest fields.
This is typical accuracy: of the two sentences, one has an error that would not be made by a
native speaker, yet the meaning is clearly conveyed.
Historically, there have been three main applications of machine translation. Rough
translation, as provided by free online services, gives the "gist" of a foreign sentence or
document, but contains errors. Pre-edited translation is used by companies to publish their
documentation and sales materials in multiple languages. The original source text is written in a
constrained language that is easier to translate automatically, and the results are usually edited by
a human to correct any errors. Restricted-source translation works fully automatically, but only
on highly stereotypical language, such as a weather report.
The problem is that different languages categorize the world differently. For example, the
French word "doux" covers a wide range of meanings corresponding approximately to the
English words "soft," "sweet," and "gentle." A translator (human or machine) often needs to
understand the actual situation described in the source, not just the individual words. For
example, to translate the English word "him" into Tamil, a choice must be made between the
humble and honorific forms, a choice that depends on the social relationship between the speaker
and the referent of "him."
Here the factor P(f) is the target language model for French; it says how probable a
given sentence is in French. P(e | f) is the translation model obtained via Bayes' rule; it says how
probable an English sentence is as a translation of a given French sentence. Similarly, P(f | e) is
a translation model from English to French.
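In standard notation (a reconstruction consistent with this description, not a quotation from the notes), the most probable French translation f* of an English sentence e is chosen by Bayes' rule as:

\[
f^{*} = \arg\max_{f} P(f \mid e)
      = \arg\max_{f} \frac{P(e \mid f)\, P(f)}{P(e)}
      = \arg\max_{f} P(e \mid f)\, P(f)
\]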
Bayes' rule translation is applicable in only a few domains, but statistical machine translation
optimizes a more sophisticated model that takes into account many of the features from the
language model. The translation model is learned from a bilingual corpus, a collection of
parallel texts, each an English/French pair. Now, if we had an infinitely large corpus, then
translating a sentence would just be a lookup task. But of course our resources are finite, and
most of the sentences we will be asked to translate will be novel. However, most sentences are
composed of phrases that have been seen before, so translation is a matter of three steps
(combined into a single formula after the list):
1. Break the English sentence into phrases.
2. For each phrase, choose a corresponding French phrase. We use the notation P(f_i | e_i)
for the phrasal probability that f_i is a translation of e_i.
3. Choose a permutation of the phrases. For each f_i, choose a distortion d_i, which is the
number of words that phrase f_i has moved with respect to e_i.
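Combining these steps, the standard phrase-based model (reconstructed here for reference, not quoted from the notes) assigns to a French sentence f with distortions d, given the English sentence e, the probability:

\[
P(f, d \mid e) = \prod_{i} P(f_i \mid e_i)\, P(d_i)
\]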
5.7 SPEECH RECOGNITION
Speech recognition identifies a spoken word sequence word_{1:t} from an observed sound
sequence sound_{1:t}. Here P(sound_{1:t} | word_{1:t}) is the acoustic model. It describes the
sounds of words, for example that "ceiling" begins with a soft "c" and sounds the same as
"sealing." P(word_{1:t}) is known as the language model. It specifies the prior probability of
each utterance, for example, that "ceiling fan" is about 500 times more likely as a word
sequence than "sealing fan."
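The decision rule that combines these two models can be written (a standard reconstruction of the formula this paragraph refers to) as:

\[
\arg\max_{word_{1:t}} P(word_{1:t} \mid sound_{1:t})
  = \arg\max_{word_{1:t}} P(sound_{1:t} \mid word_{1:t})\, P(word_{1:t})
\]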
This approach was named the noisy channel model by Claude Shannon (1948). He
described a situation in which an original message (the words in our example) is transmitted over
a noisy channel (such as a telephone line) such that a corrupted message (the sounds in our
example) is received at the other end. Shannon showed that no matter how noisy the channel, it
is possible to recover the original message with arbitrarily small error, if we encode the original
message in a redundant enough way.
5.7.1 Acoustic model
Sound waves are periodic changes in pressure that propagate through the air. When these
waves strike the diaphragm of a microphone, the back-and-forth movement generates an electric
current. An analog-to-digital converter measures the size of the current, which approximates the
amplitude of the sound wave at discrete intervals called the sampling rate.
Speech sounds, which are mostly in the range of 100 Hz to 1000 Hz, are typically
sampled at a rate of 8 kHz. The precision of each measurement is determined by the quantization
factor. We only need to distinguish between different speech sounds. Linguists have identified
about 100 speech sounds, or phones, that can be composed to form all the words in all known
human languages. Roughly speaking, a phone is the sound that corresponds to a single vowel or
consonant, but there are some complications: combinations of letters, such as "th" and "ng",
produce single phones, and some letters produce different phones in different contexts (e.g., the
"a" in rat and rate).
Let us see a brief overview of the features in a typical system. First, a Fourier transform
is used to determine the amount of acoustic energy at about a dozen frequencies. Then we
compute a measure called the mel frequency cepstral coefficient (MFCC) for each frequency.
We also compute the total energy in the frame (the signal over a time slice). That gives
thirteen features; for each one we compute the difference between this frame and the previous
frame, and the difference between differences, for a total of 39 features. These are
continuous-valued; the easiest way to fit them into the HMM (Hidden Markov Model)
framework is to discretize the values.
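A minimal sketch of extracting such a 39-dimensional feature vector per frame with the librosa library, assuming librosa is installed and that "speech.wav" is a hypothetical recording:

import numpy as np
import librosa

# Hypothetical 8 kHz speech recording.
signal, sr = librosa.load("speech.wav", sr=8000)

# 13 cepstral coefficients per frame; librosa's 0th coefficient stands in
# for the frame-energy term counted in the text.
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)

# Frame-to-frame differences and differences of differences.
delta = librosa.feature.delta(mfcc)
delta2 = librosa.feature.delta(mfcc, order=2)

features = np.vstack([mfcc, delta, delta2])   # shape: (39, number_of_frames)
print(features.shape)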
language. For task-specific speech recognition, the corpus should be task-specific: to build your
airline reservation system, get transcripts of prior calls. It also helps to have task-specific
vocabulary, such as a list of all the airports and cities served, and all the flight numbers.
5.8 ROBOTICS
Robots are physical agents that perform tasks by manipulating the physical world. To do
so, they are equipped with effectors such as legs, wheels, joints, and grippers. Robots are also
equipped with sensors, which allow them to perceive their environment, including cameras and
lasers to measure the environment, and gyroscopes and accelerometers to measure the robot's
own motion. Most of today's robots fall into one of three primary categories.
The agent architecture consists of sensors, effectors, and processors. The success of real
robots depends heavily on the design of sensors and effectors that are appropriate for the task.
Sensors
Sensors are the perceptual interface between robot and environment. Passive sensors,
such as cameras, are true observers of the environment. They capture signals that are generated
by other sources in the environment. Active sensors, such as sonar, send energy into the
environment. Range finders are sensors that measure the distance to nearby objects. In the early
days of robotics, robots were commonly equipped with sonar sensors. Sonar sensors emit
directional sound waves, which are reflected by objects, with some of the sound making it back
into the sensor. The time and intensity of the returning signal indicate the distance to nearby
objects.
A second important class of sensors is location sensors. The Global Positioning System
(GPS) measures the distance to satellites that emit pulsed signals. GPS receivers can recover the
distance to these satellites by analyzing phase shifts. By triangulating signals from multiple
satellites, receivers can determine their absolute location on Earth to within a few meters. The
third important class is imaging sensors: cameras provide images of the environment that can be
processed using computer vision techniques.
The fourth important class is proprioceptive sensors, which inform the robot of its own
motion. To measure the exact configuration of a robotic joint, motors are often equipped with
shaft decoders that count the revolution of motors in small increments. On mobile robots, shaft
decoders that report wheel revolutions can be used for odometry—the measurement of distance
traveled. Unfortunately, wheels tend to drift and slip, so odometry is accurate only over short
distances.
Other important aspects of robot state are measured by force sensors and torque
sensors. These are indispensable when robots handle fragile objects or objects whose exact
shape and location is unknown.
Effectors
Effectors are the means by which robots move and change the shape of their bodies. We
count one degree of freedom for each independent direction in which a robot, or one of its
effectors, can move. For example, a rigid mobile robot such as an AUV has six degrees of
freedom, three for its (x, y, z) location in space and three for its angular orientation, known as
yaw, roll, and pitch. These six degrees define the kinematic state or pose of the robot. The arm
in Figure 25.4(a) has exactly six degrees of freedom, created by five revolute joints that
generate rotational motion and one prismatic joint that generates sliding motion.
Figure 25.4 (a) The Stanford Manipulator with six degrees of freedom.
(b) Motion of a non-holonomic-four-wheeled vehicle with front-wheel steering.
The car has three effective degrees of freedom but two controllable degrees of
freedom. We say a robot is non-holonomic if it has more effective DOFs than controllable
DOFs and holonomic if the two numbers are the same. Holonomic robots are easier to control,
i.e., it would be much easier to park a car that could move sideways as well as forward and
backward—but holonomic robots are also mechanically more complex.
Differential drive robots possess two independently actuated wheels (or tracks), one on
each side, as on a military tank. If both wheels move at the same velocity, the robot moves on a
straight line. If they move in opposite directions, the robot turns on the spot. An alternative is the
synchro drive, in which each wheel can move and turn around its own axis.
Legged robots have been made to walk, run, and even hop. A hopping robot is dynamically
stable, meaning that it can remain upright while hopping around. A robot that can remain upright
without moving its legs is called statically stable. The electric motor is the most popular
mechanism for both manipulator actuation and locomotion, but pneumatic actuation using
compressed gas and hydraulic actuation using pressurized fluids also have their application.
Perception is the process by which robots map sensor measurements into internal
representations of the environment. Perception is difficult because sensors are noisy, and the
environment is partially observable, unpredictable, and often dynamic. In other words, robots
have all the problems of state estimation (or filtering). As a rule of thumb, good internal
representations for robots have three properties: they contain enough information for the robot to
make good decisions, they are structured so that they can be updated efficiently, and they are
natural in the sense that internal variables correspond to natural state variables in the physical
world.
Robot perception can be viewed as temporal inference from sequences of actions and
measurements, as illustrated by a dynamic Bayes network. For robotics problems, we include
the robot's own past actions as observed variables in the model. Figure 25.7 shows the notation
used here: X_t is the state of the environment (including the robot) at time t, Z_t is the
observation received at time t, and A_t is the action taken after the observation is received.
More difficult is the global localization problem, in which the initial location of the
robot is entirely unknown. In the kidnapping problem, the robot is picked up and moved to an
unknown location while it is trying to localize itself; kidnapping is used to test the robustness
of a localization technique under extreme conditions.
Next, we need a sensor model. We will consider two kinds of sensor model. The first
assumes that the sensors detect stable, recognizable features of the environment called
landmarks. For each landmark, the range and bearing are reported. Suppose the robot's state
is x_t and it senses a landmark whose location is known to be (x_i, y_i)^T. Without noise, the
range and bearing can be calculated by simple geometry.
The Monte Carlo localization (particle filtering) process is illustrated in Figure 25.10 as the
robot finds out where it is inside an office building. In the
first image, the particles are uniformly distributed based on the prior, indicating global
uncertainty about the robot‘s position. In the second image, the first set of measurements arrives
and the particles form clusters in the areas of high posterior belief. In the third, enough
measurements are available to push all the particles to a single location.
The Kalman filter is the other major way to localize. A Kalman filter represents the
posterior P(X_t | z_{1:t}, a_{1:t−1}) by a Gaussian. The mean of this Gaussian will be denoted
μ_t and its covariance Σ_t. The main problem with Gaussian beliefs is that they are only closed
under linear motion models f and linear measurement models h.
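A minimal one-dimensional sketch of a Kalman filter predict/update cycle for localization, with illustrative motion and sensor noise values that are not from the notes:

def kalman_1d(mu, sigma2, u, z, motion_noise, sensor_noise):
    """One predict/update cycle of a 1-D Kalman filter.

    mu, sigma2   : current Gaussian belief (mean and variance) of the position
    u            : commanded motion (linear motion model x' = x + u)
    z            : range measurement of the position (linear sensor model)
    """
    # Prediction step: motion shifts the mean and adds uncertainty.
    mu_pred = mu + u
    sigma2_pred = sigma2 + motion_noise

    # Update step: blend prediction and measurement using the Kalman gain.
    k = sigma2_pred / (sigma2_pred + sensor_noise)
    mu_new = mu_pred + k * (z - mu_pred)
    sigma2_new = (1 - k) * sigma2_pred
    return mu_new, sigma2_new

# Example: robot believes it is at 0.0 m, moves 1.0 m, then senses 1.2 m.
print(kalman_1d(mu=0.0, sigma2=1.0, u=1.0, z=1.2,
                motion_noise=0.5, sensor_noise=0.5))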
Sometimes the navigating robot will have to determine its location relative to a map it
doesn't quite know, while at the same time building this map without quite knowing its actual
location. This mapping problem is often called simultaneous localization and mapping,
abbreviated as SLAM.
5.11 PLANNING TO MOVE
Path planning in continuous spaces adds further complications. There are two main
approaches: cell decomposition and skeletonization. Each reduces the continuous path-planning
problem to a discrete graph-search problem.
5.11.1 Configuration space
Consider a simple representation for a simple robot motion problem. The robot arm
shown in Figure 25.14(a) has two joints that move independently. Moving the joints alters the
(x, y) coordinates of the elbow and the gripper. This suggests that the robot's configuration can
be described by a four-dimensional coordinate: (x_e, y_e) for the location of the elbow relative
to the environment and (x_g, y_g) for the location of the gripper. Clearly, these four coordinates
characterize the full state of the robot. They constitute what is known as the workspace
representation, since the coordinates of the robot are specified in the same coordinate system
as the objects it seeks to manipulate (or to avoid). The alternative is the configuration space
representation, in which the robot's state is described by the configuration of its joints rather
than by workspace coordinates.
The problem with the workspace representation is that not all workspace coordinates are
actually attainable, even in the absence of obstacles. This is because of the linkage constraints
on the space of attainable workspace coordinates.
Figure 25.14 (a) Workspace representation of a robot arm with 2 DOFs. The workspace
is a box with a flat obstacle hanging from the ceiling.
(b) Configuration space of the same robot.
Figure 25.16 (a) Discrete grid cell approximation (b) The Voronoi graph.
5.11.5 Planning Uncertain Movements
In robotics, uncertainty arises from partial observability of the environment and from the
stochastic effects of the robot's actions. Most of today's robots use deterministic algorithms for
decision making, extracting the most likely state from the probability distribution produced by
the state estimation algorithm. Many robots plan paths online during plan execution, using an
online replanning technique.
Robust methods
Uncertainty can also be handled using robust control methods. A robust method is one
that assumes a bounded amount of uncertainty in each aspect of a problem, but does not assign
probabilities to values within the allowed interval. A robust solution is one that works no matter
what actual values occur, provided they are within the assumed interval. An extreme form of
robust method is the conformant planning approach which produces plans that work with no
state information at all.
Fine-motion planning (or FMP) is used in robotic assembly tasks. It involves moving a robot
arm in very close proximity to a static environment object. A fine-motion plan consists of a
series of guarded motions. Each guarded motion consists of (1) a
motion command and (2) a termination condition, which is a predicate on the robot‘s sensor
values, and returns true to indicate the end of the guarded move. The motion commands are
typically compliant motions that allow the effector to slide if the motion command would cause
collision with an obstacle.
5.12 MOVING
Dynamics and control
Dynamic state extends the kinematic state of a robot by its velocity. For example, in
addition to the angle of a robot joint, the dynamic state also captures the rate of change of the
angle, and possibly even its momentary acceleration.
The transition model for a dynamic state representation includes the effect of forces on
this rate of change. Such models are typically expressed via differential equations, which are
equations that relate a quantity (e.g., a kinematic state) to the change of the quantity over time
(e.g., velocity).
Controllers are techniques for generating robot controls in real time using feedback from
the environment, so as to achieve a control objective. If the objective is to keep the robot on a
preplanned path, it is often referred to as a reference controller and the path is called a
reference path. Controllers that optimize a global cost function are known as optimal
controllers. Optimal policies for continuous MDPs are, in effect, optimal controllers.
1. Controllers that provide force in negative proportion to the observed error are known as P
controllers. The letter 'P' stands for proportional, indicating that the actual control is
proportional to the error of the robot manipulator.
2. A robot is said to be strictly stable if it is able to return to and then stay on its reference
path after such perturbations. The simplest controller that achieves strict stability in our
domain is a PD controller. The letter 'P' stands again for proportional, and 'D' stands
for derivative.
3. A controller that calculates the integral of the error over time is called a PID controller
(for proportional integral derivative). PID controllers are widely used in industry for a
variety of control problems; a minimal code sketch follows this list.
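A minimal discrete-time PID controller sketch; the gains, time step, and joint-angle example are illustrative, not from the notes:

class PIDController:
    """u(t) = Kp*e(t) + Ki*integral(e) + Kd*de/dt, computed at fixed time steps."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def control(self, reference, measurement):
        # Error between the reference path and the observed state.
        error = reference - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Keep a joint angle at 1.0 rad, starting from 0.0 rad (illustrative gains).
pid = PIDController(kp=2.0, ki=0.5, kd=0.1, dt=0.01)
print(pid.control(reference=1.0, measurement=0.0))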
Potential-field control
Potential-field control defines an attractive force that pulls the robot towards its goal
configuration and a repellent potential field that pushes the robot away from obstacles. Its single
global minimum is the goal configuration, and the value is the sum of the distance to this goal
configuration and the proximity to obstacles.
Reactive control
Reactive control is a reflex agent architecture. For example, picture a legged robot that
attempts to lift a leg over an obstacle. We could give this robot a rule that says lift the leg a small
height h and move it forward, and if the leg encounters an obstacle, move it back and start again
at a higher height. On rugged terrain, obstacles may prevent a leg from swinging forward. This
problem can be overcome by a remarkably simple control rule: when a leg’s forward motion is
blocked, simply retract it, lift it higher, and try again. The resulting controller is shown in Figure
25.24(b) as a finite state machine.
Figure 25.24 (a) Genghis, a hexapod robot. (b) An augmented finite state machine (AFSM)
for the control of a single leg.
Reinforcement learning control
One particularly exciting form of control is based on the policy search form of
reinforcement learning. Policy search is the simplest of all the methods in this chapter: the idea is
to keep twiddling the policy as long as its performance improves, then stop.
6. Define robots.
Robots are physical agents that perform tasks by manipulating the physical world. To do so, they
are equipped with effectors such as legs, wheels, joints, and grippers. Robots are also equipped
with sensors, which allow them to perceive their environment, including cameras and lasers to
measure the environment, and gyroscopes and accelerometers to measure the robot's own
motion.
Manipulators, or robot arms, are physically anchored to their workplace, for example in a
factory assembly line. Manipulator motion usually involves a chain of controllable joints,
enabling such robots to place their effectors in any position within the workplace.