Unit V-AI-KCS071
ARTIFICIAL INTELLIGENCE
KCS 071
B.TECH IV YR / VII SEM
(2022-23)
Course Outcomes
(On completion of this course, the student will be able to:)
CO1: Understand the basics of the theory and practice of Artificial Intelligence as a discipline and about intelligent agents.
CO2: Understand search techniques and game theory.
CO3: Apply knowledge representation techniques and problem-solving strategies to common AI applications.
CO4: Be aware of techniques used for classification and clustering.
CO5: Be aware of the basics of pattern recognition and the steps required for it.
UNIT 1
Introduction–Definition – Future of Artificial Intelligence – Characteristics of Intelligent
Agents–Typical Intelligent Agents – Problem Solving Approach to Typical AI problems.
UNIT 2
Problem solving Methods – Search Strategies- Uninformed – Informed – Heuristics – Local
Search Algorithms and Optimization Problems – Searching with Partial Observations –
Constraint Satisfaction Problems – Constraint Propagation – Backtracking Search – Game
Playing – Optimal Decisions in Games – Alpha – Beta Pruning – Stochastic Games.
UNIT 3
First Order Predicate Logic – Prolog Programming – Unification – Forward Chaining-
Backward Chaining – Resolution – Knowledge Representation – Ontological Engineering-
Categories and Objects – Events – Mental Events and Mental Objects – Reasoning Systems for
Categories – Reasoning with Default Information.
UNIT 4
Architecture for Intelligent Agents – Agent communication – Negotiation and Bargaining –
Argumentation among Agents – Trust and Reputation in Multi-agent systems.
UNIT 5
AI applications – Language Models – Information Retrieval- Information Extraction – Natural
Language Processing – Machine Translation – Speech Recognition.
UNIT 5
APPLICATIONS
AI applications – Language Models – Information Retrieval- Information Extraction – Natural
Language Processing – Machine Translation – Speech Recognition.
2. Finally, natural languages are difficult to deal with because they are very large and
constantly changing. Thus, our language models are, at best, an approximation. We start
with the simplest possible approximation and move up from there.
We can define the probability of a sequence of characters P(c1:N) under the trigram
model by first factoring with the chain rule and then using the Markov assumption:

P(c1:N) = ∏i=1..N P(ci | c1:i−1) = ∏i=1..N P(ci | ci−2:i−1)
For a trigram character model in a language with 100 characters, P(ci | ci−2:i−1) has a
million entries (100³), and can be accurately estimated by counting character sequences in a
body of text of 10 million characters or more. We call a body of text a corpus (plural
corpora), from the Latin word for body.
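The counting procedure just described can be sketched as follows; the corpus here is a toy stand-in, since a real model needs millions of characters plus smoothing:

```python
from collections import Counter

def trigram_model(corpus):
    """Estimate P(ci | ci-2 ci-1) by counting trigrams and their bigram contexts."""
    tri = Counter(corpus[i:i + 3] for i in range(len(corpus) - 2))
    bi = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
    def prob(c, ctx):
        # ctx is the two preceding characters ci-2 ci-1
        return tri[ctx + c] / bi[ctx] if bi[ctx] else 0.0
    return prob

p = trigram_model("the cat sat on the mat")   # toy corpus, far too small in practice
```

For example, every occurrence of "th" in this toy corpus is followed by "e", so the estimate of P("e" | "th") is 1.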
3. One task for character models is language identification: given a text, determine the
most probable language ℓ*. Applying Bayes' rule and then the trigram Markov assumption:

ℓ* = argmaxℓ P(ℓ | c1:N)
   = argmaxℓ P(ℓ) P(c1:N | ℓ)
   = argmaxℓ P(ℓ) ∏i=1..N P(ci | ci−2:i−1, ℓ)
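A per-language character trigram model gives a simple language identifier; this sketch uses add-one smoothing and a uniform prior over languages, and the two tiny corpora are invented placeholders:

```python
import math
from collections import Counter

def train(corpus):
    tri = Counter(corpus[i:i + 3] for i in range(len(corpus) - 2))
    bi = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
    return tri, bi

def log_prob(text, model, alpha=1.0, vocab=30):
    """Add-alpha smoothed log P(text | language) under a character trigram model."""
    tri, bi = model
    return sum(
        math.log((tri[text[i:i + 3]] + alpha) / (bi[text[i:i + 2]] + alpha * vocab))
        for i in range(len(text) - 2)
    )

models = {
    "en": train("the quick brown fox jumps over the lazy dog and then the end"),
    "es": train("el rapido zorro marron salta sobre el perro perezoso y luego el fin"),
}

def identify(text):
    # uniform prior P(l), so the argmax reduces to log P(text | l) alone
    return max(models, key=lambda l: log_prob(text, models[l]))
```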
4. Other tasks for character models include spelling correction, genre classification, and
named-entity recognition. Genre classification means deciding if a text is a news story, a
legal document, a scientific article, etc. While many features help make this
classification, counts of punctuation and other character n-gram features go a long way
(Kessler et al., 1997).
5.3 SMOOTHING N-GRAM MODELS
1. The major complication of n-gram models is that the training corpus provides only an
estimate of the true probability distribution.
2. For a common character sequence such as “th”, any English corpus will give a good
estimate: about 1.5% of all trigrams.
3. On the other hand, “ht” is very uncommon – no dictionary words start with ht. It is
likely that the sequence would have a count of zero in a training corpus of standard
English. Does that mean we should assign P(“ht”) = 0? If we did, then the text “The
program issues an http request” would have an English probability of zero, which
seems wrong.
It is also possible to have the interpolation weights λ depend on the counts: if we have a
high count of trigrams, then we weigh them relatively more; if only a low count, then we put
more weight on the bigram and unigram models.
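A sketch of this interpolation scheme with fixed weights; count-dependent weights would replace the constants l3, l2, l1:

```python
from collections import Counter

def interp_model(corpus, l3=0.6, l2=0.3, l1=0.1):
    """P(c | c1 c2) = l3*P(c | c1 c2) + l2*P(c | c2) + l1*P(c), weights summing to 1."""
    tri = Counter(corpus[i:i + 3] for i in range(len(corpus) - 2))
    bi = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
    uni = Counter(corpus)
    n = len(corpus)
    def prob(c, ctx):
        p3 = tri[ctx + c] / bi[ctx] if bi[ctx] else 0.0
        p2 = bi[ctx[1] + c] / uni[ctx[1]] if uni[ctx[1]] else 0.0
        p1 = uni[c] / n
        return l3 * p3 + l2 * p2 + l1 * p1
    return prob

p = interp_model("this is the thing")
```

Even though the trigram and bigram counts for the context "ht" are zero in this toy corpus, the unigram term keeps P("t" | "ht") above zero, which is exactly the point of smoothing.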
4. Perplexity can also be thought of as the weighted average branching factor of a model.
Suppose there are 100 characters in our language, and our model says they are all equally
likely. Then for a sequence of any length, the perplexity will be 100. If some characters are
more likely than others, and the model reflects that, then the model will have a
perplexity less than 100.
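Perplexity can be computed directly from the per-symbol probabilities the model assigns to a sequence:

```python
import math

def perplexity(probs):
    """Perplexity of a sequence, given the model's probability for each symbol."""
    n = len(probs)
    return math.exp(-sum(math.log(p) for p in probs) / n)

# 100 equally likely characters: perplexity is 100, whatever the length
uniform = perplexity([1 / 100] * 50)
# a model that captures real regularities assigns higher probabilities, so it scores lower
skewed = perplexity([0.5, 0.3, 0.5, 0.4])
```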
2. All the same mechanisms apply equally to word and character models.
3. The main difference is that the vocabulary— the set of symbols that make up the corpus
and the model—is larger.
4. There are only about 100 characters in most languages, and sometimes we build
character models that are even more restrictive, for example by treating “A” and “a” as
the same symbol or by treating all punctuation as the same symbol. But with word
models we have at least tens of thousands of symbols, and sometimes millions. The
wide range is because it is not clear what constitutes a word.
6. With word models there is always the chance of a new word that was not seen in the
training corpus, so we need to model that explicitly in our language model.
7. This can be done by adding just one new word to the vocabulary: <UNK>, standing for
the unknown word.
8. Sometimes multiple unknown-word symbols are used, for different classes. For
example, any string of digits might be replaced with <NUM>, or any email address with
<EMAIL>.
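The unknown-word replacement step can be sketched as follows; the digit and e-mail patterns are illustrative choices, not a standard:

```python
import re

def apply_unk(tokens, vocab):
    """Map out-of-vocabulary tokens to class-specific unknown-word symbols."""
    out = []
    for t in tokens:
        if t in vocab:
            out.append(t)
        elif t.isdigit():
            out.append("<NUM>")
        elif re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", t):
            out.append("<EMAIL>")
        else:
            out.append("<UNK>")
    return out
```

The model is then trained on the rewritten corpus, so <UNK>, <NUM>, and <EMAIL> get probability estimates like any other vocabulary word.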
The best-known examples of information retrieval systems are search engines on the
World Wide Web. A Web user can type a query such as [AI book] into a search engine and see
a list of relevant pages. In this section, we will see how such systems are built. An information
retrieval (henceforth IR) system can be characterized by
1. A corpus of documents. Each system must decide what it wants to treat as a document:
a paragraph, a page, or a multipage text.
2. Queries posed in a query language. A query specifies what the user wants to know. The
query language can be just a list of words, such as [AI book]; or it can specify a phrase
of words that must be adjacent, as in [“AI book”]; it can contain Boolean operators as
in [AI AND book]; it can include non-Boolean operators such as [AI NEAR book] or
[AI book site: www.aaai.org].
3. A result set. This is the subset of documents that the IR system judges to be relevant to
the query. By relevant, we mean likely to be of use to the person who posed the query,
for the particular information need expressed in the query.
4. A presentation of the result set. This can be as simple as a ranked list of document titles
or as complex as a rotating color map of the result set projected onto a three-
dimensional space, rendered as a two-dimensional display.
The earliest IR systems worked on a Boolean keyword model. Each word in the
document collection is treated as a Boolean feature that is true of a document if the word occurs
in the document and false if it does not.
Advantage
Disadvantages
2. Boolean expressions are unfamiliar to users who are not programmers or logicians.
Users find it unintuitive that when they want to know about farming in the states of
Kansas and Nebraska they need to issue the query [farming (Kansas OR Nebraska)].
3. It can be hard to formulate an appropriate query, even for a skilled user. Suppose we
try [information AND retrieval AND models AND optimization] and get an empty
result set. We could try [information OR retrieval OR models OR optimization], but if
that returns too many results, it is difficult to know what to try next.
1. Most IR systems have abandoned the Boolean model and use models based on the
statistics of word counts.
2. A scoring function takes a document and a query and returns a numeric score; the most
relevant documents have the highest scores
3. In the BM25 scoring function, the score is a linear weighted combination of scores for
each of the words that make up the query
• First, the frequency with which a query term appears in a document (also known as TF
for term frequency). For the query [farming in Kansas], documents that mention
“farming” frequently will have higher scores.
• Second, the inverse document frequency of the term, or IDF. The word “in” appears in
almost every document,
• so it has a high document frequency, and thus a low inverse document frequency, and
thus it is not as important to the query as “farming” or “Kansas.”
• Third, the length of the document. A million-word document will probably mention all
the query words, but may not actually be about the query. A short document that
mentions all the words is a much better candidate.
BM25(dj, q1:N) = ∑i=1..N IDF(qi) · TF(qi, dj) · (k + 1) / (TF(qi, dj) + k · (1 − b + b · |dj| / L))

where |dj| is the length of document dj in words, and L is the average document length
in the corpus, L = ∑i |di| / N. We have two parameters, k and b, that can be tuned by cross-
validation; typical values are k = 2.0 and b = 0.75. IDF(qi) is the inverse document
frequency of word qi, given by IDF(qi) = log((N − DF(qi) + 0.5) / (DF(qi) + 0.5)).
6. Systems create an index ahead of time that lists, for each vocabulary word, the
documents that contain the word. This is called the hit list for the word. Then when
given a query, we intersect the hit lists of the query words and only score the
documents in the intersection.
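A sketch of BM25 scoring over a toy corpus; the documents are invented, and a real system would read TF and DF from the index rather than rescanning documents. Note that on a corpus this small, IDF can go negative for common terms, which disappears on realistic collection sizes:

```python
import math

def bm25_scores(query, docs, k=2.0, b=0.75):
    """BM25 score of each document (a list of word lists) for the query."""
    N = len(docs)
    L = sum(len(d) for d in docs) / N                    # average document length
    idf = {}
    for q in query:
        df = sum(q in d for d in docs)                   # document frequency
        idf[q] = math.log((N - df + 0.5) / (df + 0.5))
    scores = []
    for d in docs:
        s = 0.0
        for q in query:
            tf = d.count(q)
            s += idf[q] * tf * (k + 1) / (tf + k * (1 - b + b * len(d) / L))
        scores.append(s)
    return scores

docs = [
    "farming in kansas is big farming".split(),
    "the weather in kansas".split(),
    "a book about ai".split(),
]
scores = bm25_scores(["farming", "kansas"], docs)
```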
Imagine that an IR system has returned a result set for a single query, for which we know
which documents are and are not relevant, out of a corpus of 100 documents. The document
counts in each category are given in the following table.

Table 5.1
                  In result set    Not in result set
Relevant               30                  20
Not relevant           10                  40
1. Precision measures the proportion of documents in the result set that are actually
relevant. In our example, the precision is 30/(30 + 10) = .75. The false positive rate is
1 − .75 = .25.
2. Recall measures the proportion of all the relevant documents in the collection that
are in the result set. In our example, recall is 30/(30 + 20) = .60. The false negative rate is
1 − .60 = .40.
3. In a very large document collection, such as the World Wide Web, recall is
difficult to compute, because there is no easy way to examine every page on the Web for
relevance.
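Both measures follow directly from the counts in the table:

```python
def precision_recall(tp, fp, fn):
    """tp: relevant docs retrieved; fp: irrelevant docs retrieved; fn: relevant docs missed."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# counts from the example: 30 relevant and 10 irrelevant documents in the
# result set, and 20 relevant documents that were not returned
p, r = precision_recall(30, 10, 20)
```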
5.9 IR REFINEMENTS
PageRank (PR) is an algorithm used by Google Search to rank web pages in their search
engine results. PageRank was named after Larry Page, one of the founders of Google.
PageRank is a way of measuring the importance of website pages. According to Google:
PageRank works by counting the number and quality of links to a page to determine a rough
estimate of how important the website is. The underlying assumption is that more important
websites are likely to receive more links from other websites.
Figure 5.1 The HITS algorithm for computing hubs and authorities with respect to a
query. RELEVANT-PAGES fetches the pages that match the query and EXPAND-
PAGES adds in every page that links to or is linked from one of the relevant pages.
NORMALIZE divides each page’s score by the sum of the squares of all pages’
scores (separately for both the authority and hub scores)
We will see that the recursion bottoms out. The PageRank for a page p is
defined as

PR(p) = (1 − d) / N + d · ∑i PR(ini) / C(ini)

where PR(p) is the PageRank of page p, N is the total number of pages in the corpus,
ini are the pages that link in to p, and C(ini) is the count of the total number of out-links
on page ini. The constant d is a damping factor. It can be understood through the
random surfer model: imagine a Web surfer who starts at some random page and
begins exploring.
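The PageRank recurrence can be iterated to a fixed point; the three-page graph here is a hypothetical toy, and all pages have at least one out-link so no sink handling is needed:

```python
def pagerank(links, d=0.85, iters=50):
    """Iterate PR(p) = (1 - d)/N + d * sum(PR(q)/C(q)) over pages q linking to p."""
    pages = list(links)
    N = len(pages)
    pr = {p: 1.0 / N for p in pages}
    for _ in range(iters):
        pr = {
            p: (1 - d) / N
            + d * sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            for p in pages
        }
    return pr

# hypothetical toy graph: A and C both link to B; B links back to A
pr = pagerank({"A": ["B"], "B": ["A"], "C": ["B"]})
```

B gets the highest rank (two in-links), A inherits some of B's rank, and C, with no in-links at all, stays at the floor value (1 − d)/N.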
1. The Hyperlink-Induced Topic Search algorithm, also known as “Hubs and Authorities”
or HITS, is another influential link-analysis algorithm.
3. HITS first finds a set of pages that are relevant to the query. It does that by intersecting
the hit lists of the query words, and then adding pages in the link neighborhood of these
pages—pages that link to or are linked from one of the pages in the original relevant set.
5. Each page in this set is considered an authority on the query to the degree that other
pages in the relevant set point to it. A page is considered a hub to the degree that it
points to other authoritative pages in the relevant set.
6. Just as with PageRank, we don’t want to merely count the number of links; we want to
give more value to the high-quality hubs and authorities.
7. Thus, as with PageRank, we iterate a process that updates the authority score of a page
to be the sum of the hub scores of the pages that point to it, and the hub score to be the
sum of the authority scores of the pages it points to.
8. Both PageRank and HITS played important roles in developing our understanding of
Web information retrieval. These algorithms and their extensions are used in ranking
billions of queries daily as search engines steadily develop better ways of extracting yet
finer signals of search relevance.
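The alternating update that HITS performs can be sketched as follows, normalizing after each update so the scores stay bounded (the graph and page names are hypothetical):

```python
import math

def hits(links, iters=50):
    """Alternate authority and hub updates, normalizing after each update."""
    pages = list(links)
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iters):
        # authority of p: sum of hub scores of pages pointing at p
        auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
        norm = math.sqrt(sum(a * a for a in auth.values()))
        auth = {p: a / norm for p, a in auth.items()}
        # hub of p: sum of authority scores of pages p points at
        hub = {p: sum(auth[q] for q in links[p]) for p in pages}
        norm = math.sqrt(sum(h * h for h in hub.values()))
        hub = {p: h / norm for p, h in hub.items()}
    return auth, hub

# toy graph: H1 and H2 both point at A, so A is the authority and H1, H2 the hubs
auth, hub = hits({"H1": ["A"], "H2": ["A"], "A": []})
```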
2. One step up from attribute-based extraction systems are relational extraction systems,
which deal with multiple objects and the relations among them.
5. That is, the system consists of a series of small, efficient finite-state automata (FSAs),
where each automaton receives text as input, transduces the text into a different format,
and passes it along to the next automaton. FASTUS consists of five stages:
1. Tokenization
2. Complex-word handling
3. Basic-group handling
4. Complex-phrase handling
5. Structure merging
6. FASTUS’s first stage is tokenization, which segments the stream of characters into
tokens (words, numbers, and punctuation). For English, tokenization can be fairly
simple; just separating characters at white space or punctuation does a fairly good job.
Some tokenizers also deal with markup languages such as HTML, SGML, and XML.
7. The second stage handles complex words, including collocations such as “set up” and
“joint venture,” as well as proper names such as “Bridgestone Sports Co.” These are
recognized by a combination of lexical entries and finite- state grammar rules.
8. The third stage handles basic groups, meaning noun groups and verb groups. The idea
is to chunk these into units that will be managed by the later stages.
9. The fourth stage combines the basic groups into complex phrases. Again, the aim is to
have rules that are finite- state and thus can be processed quickly, and that result in
unambiguous (or nearly unambiguous) output phrases. One type of combination rule
deals with domain-specific events.
10. The final stage merges structures that were built up in the previous step. If the next
sentence says “The joint venture will start production in January,” then this step will
notice that there are two references to a joint venture, and that they should be merged
into one. This is an instance of the identity uncertainty problem.
1. The simplest probabilistic model for sequences with hidden state is the hidden Markov
model, or HMM.
• First, HMMs are probabilistic, and thus tolerant to noise. In a regular expression, if a
single expected character is missing, the regex fails to match; with HMMs there is
graceful degradation with missing characters/words, and we get a probability indicating
the degree of match, not just a Boolean match/fail.
• Second, HMMs can be trained from data; they don’t require laborious engineering of
templates, and thus they can more easily be kept up to date as text changes over time.
4. The other approach is to combine all the individual attributes into one big HMM, which
would then find a path that wanders through different target attributes, first finding a
speaker target, then a date target, etc. Separate HMMs are better when we expect just
one of each attribute in a text and one big HMM is better when the texts are more free-
form and dense with attributes.
5. HMMs have the advantage of supplying probability numbers that can help make the
choice. If some targets are missing, we need to decide if this is an instance of the desired
relation at all, or if the targets found are false positives. A machine learning algorithm
can be trained to make this choice.
• First it is open-ended—we want to acquire facts about all types of domains, not just one
specific domain.
• Second, with a large corpus, this task is dominated by precision, not recall—just as with
question answering on the Web
• Third, the results can be statistical aggregates gathered from multiple sources, rather
than being extracted from one specific text.
2. Here is one of the most productive templates: NP such as NP (, NP)* (,)? ((and | or)
NP)?.
3. Here the bold words and commas must appear literally in the text, but the parentheses
are for grouping, the asterisk means repetition of zero or more, and the question mark
means optional.
4. NP is a variable standing for a noun phrase
5. This template matches the texts “diseases such as rabies affect your dog” and “supports
network protocols such as DNS,” concluding that rabies is a disease and DNS is a
network protocol.
6. Similar templates can be constructed with the key words “including,” “especially,” and
“or other.” Of course these templates will fail to match many relevant passages, like
“Rabies is a disease.” That is intentional.
7. The “NP is a NP” template does indeed sometimes denote a subcategory relation, but
it often means something else, as in “There is a God” or “She is a little tired.” With a
large corpus we can afford to be picky; to use only the high-precision templates.
8. We’ll miss many statements of a subcategory relationship, but most likely we’ll find a
paraphrase of the statement somewhere else in the corpus in a form we can use.
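A rough sketch of matching this template with a regular expression; here \w+ is a single-word stand-in for NP, where a real system would use a noun-phrase chunker, and the comma/and/or tail mirrors the (, NP)* (,)? ((and | or) NP)? part of the template:

```python
import re

TEMPLATE = re.compile(r"(\w+) such as (\w+(?:, \w+)*(?:,? (?:and|or) \w+)?)")

def extract_isa(text):
    """Return (instance, category) pairs matched by the 'NP such as NP' template."""
    pairs = []
    for m in TEMPLATE.finditer(text):
        category = m.group(1)
        for item in re.split(r", | and | or ", m.group(2)):
            pairs.append((item, category))
    return pairs
```

On "diseases such as rabies affect your dog" this yields the pair (rabies, diseases); as the text notes, sentences like "Rabies is a disease" are deliberately not matched.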
5.16 AUTOMATED TEMPLATE CONSTRUCTION
Clearly these are examples of the author–title relation, but the learning system had no
knowledge of authors or titles. The words in these examples were used in a search over a Web
corpus, resulting in 199 matches. Each match is defined as a tuple of seven strings, (Author,
Title, Order, Prefix, Middle, Suffix, URL), where Order is true if the author came first and
false if the title came first, Middle is the characters between the author and title, Prefix is the
10 characters before the match, Suffix is the 10 characters after the match, and URL is the Web
address where the match was made.
1. Unlike a traditional information extraction system that is targeted at a few relations, such
a system is more like a human reader who learns from the text itself; because of this the
field has been called machine reading.
Figure 5.3 Eight general templates that cover about 95% of the ways that
relations are expressed in English
TEXTRUNNER achieves a precision of 88% and recall of 45% (F1 of 60%) on a large
Web corpus. TEXTRUNNER has extracted hundreds of millions of facts from a corpus of half-
billion Web pages.
On a basic level, MT performs simple substitution of words in one language for words
in another, but that alone usually cannot produce a good translation of a text because
recognition of whole phrases and their closest counterparts in the target language is needed.
Solving this problem with corpus-based statistical and neural techniques is a rapidly growing
field that is leading to better translations, handling differences in linguistic typology, translation
of idioms, and the isolation of anomalies.
All translation systems must model the source and target languages, but systems vary
in the type of models they use. Some systems attempt to analyse the source language text all
the way into an interlingua knowledge representation and then generate sentences in the target
language from that representation. This is difficult because it involves three unsolved problems:
creating a complete knowledge representation of everything; parsing into that representation;
and generating sentences from that representation. Other systems are based on a transfer model.
Figure 5.4 The Vauquois triangle: schematic diagram of the choices for a machine
translation system (Vauquois, 1968). We start with English text at the top. An
interlingua-based system follows the solid lines, parsing English first into a syntactic
form, then into a semantic representation and an interlingua representation, and then
through generation to a semantic, syntactic, and lexical form in French. A transfer-
based system uses the dashed lines as a shortcut. Different systems make the transfer
at different points; some make it at multiple points.
1. Find parallel texts: First, gather a parallel bilingual corpus. For example, a Hansard
is a record of parliamentary debate. Canada, Hong Kong, and other countries produce
bilingual Hansards, the European Union publishes its official documents in 11
languages, and the United Nations publishes multilingual documents. Bilingual text is
also available online; some Web sites publish parallel content with parallel URLs, for
example, /en/ for the English page and /fr/ for the corresponding French page. The
leading statistical translation systems train on hundreds of millions of words of parallel
text and billions of words of monolingual text.
2. Segment into sentences: The unit of translation is a sentence, so we will have to break
the corpus into sentences. Periods are strong indicators of the end of a sentence, but
consider “Dr. J.R. Smith of Rodeo Dr. PAID $29.99 ON 9.9.09”; only the final period
ends a sentence. One way to decide if a period ends a sentence is to train a model that
takes as features the surrounding words and their parts of speech. This approach
achieves about 98% accuracy.
3. Align sentences: For each sentence in the English version, determine what sentence(s)
it corresponds to in the French version. Usually, the next sentence of English
corresponds to the next sentence of French in a 1:1 match, but sometimes there is
variation: one sentence in one language will be split into a 2:1 match, or the order of
two sentences will be swapped, resulting in a 2:2 match. By looking at the sentence
lengths alone (i.e. short sentences should align with short sentences), it is possible to
align them (1:1, 1:2, or 2:2, etc.) with accuracy in the 90% to 99% range using a
variation on the Viterbi algorithm.
4. Align phrases: Within a sentence, phrases can be aligned by a process that is similar
to that used for sentence alignment, but requiring iterative improvement. When we start,
we have no way of knowing that “qui dort” aligns with “sleeping”, but we can arrive at
that alignment by a process of aggregation of evidence.
6. Improve estimates with EM: Use expectation – maximization to improve the estimate
of P(f|e) and P(d) values. We compute the best alignments with the current values of
these parameters in the E step, then update the estimates in the M step and iterate the
process until convergence.
SPEECH RECOGNITION
1. Example: The phrase “recognize speech” sounds almost the same as “wreck a nice
beach” when spoken quickly. Even this short example shows several of the issues that
make speech problematic.
2. First, segmentation: written words in English have spaces between them, but in fast
speech there are no pauses in “wreck a nice” that would distinguish it as a multiword
phrase as opposed to the single word “recognize”.
3. Second, coarticulation: when speaking quickly the “s” sound at the end of “nice”
merges with the “b” sound at the beginning of “beach”, yielding something that is close
to a “sp”. Another problem that does not show up in this example is homophones –
words like “to”, “too” and “two” that sound the same but differ in meaning.
Here P(sound1:t | word1:t) is the acoustic model. It describes the sound of words – that
“ceiling” begins with a soft “c” and sounds the same as “sealing”. P(word1:t) is known
as the language model. It specifies the prior probability of each utterance – for
example, that “ceiling fan” is about 500 times more likely as a word sequence than
“sealing fan”.
4. Once we define the acoustic and language models, we can solve for the most likely
sequence of words using the Viterbi algorithm.
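A sketch of Viterbi decoding over a toy two-word homophone model; the transition and emission probabilities and the phone labels (SIY, FAE) are invented for illustration:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state (word) sequence for a sequence of observations."""
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p) for p in states
            )
            V[t][s] = prob
            back[t][s] = prev
    best = max(V[-1], key=V[-1].get)
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# invented model: "ceiling" and "sealing" emit the same first phone "SIY",
# but the language model makes "ceiling fan" far more likely than "sealing fan"
states = ("ceiling", "sealing", "fan")
start = {"ceiling": 0.5, "sealing": 0.5, "fan": 0.0}
trans = {
    "ceiling": {"ceiling": 0.05, "sealing": 0.05, "fan": 0.9},
    "sealing": {"ceiling": 0.45, "sealing": 0.45, "fan": 0.1},
    "fan": {"ceiling": 1 / 3, "sealing": 1 / 3, "fan": 1 / 3},
}
emit = {
    "ceiling": {"SIY": 1.0, "FAE": 0.0},
    "sealing": {"SIY": 1.0, "FAE": 0.0},
    "fan": {"SIY": 0.0, "FAE": 1.0},
}
words = viterbi(("SIY", "FAE"), states, start, trans, emit)
```

Even though the acoustic model cannot tell "ceiling" from "sealing" on its own, the language-model transition probabilities break the tie; a real system works in log space to avoid underflow.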
Acoustic Model
1. An analog-to-digital converter measures the size of the current – which approximates
the amplitude of the sound wave – at discrete intervals determined by the sampling rate.
3. A phoneme is the smallest unit of sound that has a distinct meaning to speakers of a
particular language.
For example the “t” in “stick” sounds similar enough to the “t” in “tick” that speakers
of English consider them the same phoneme.
Figure 5.5 Translating the acoustic signal into a sequence of frames. In this
diagram each frame is described by the discretized values of three acoustic
features; a real system would have dozens of features
1. The quality of a speech recognition system depends on the quality of all of its
components – the language model, the word-pronunciation models, the phone models,
and the signal processing algorithms used to extract spectral features from the acoustic
signals.
2. The systems with the highest accuracy work by training a different model for each
speaker, thereby capturing differences in dialect as well as male/female and other
variations. This training can require several hours of interaction with the speaker, so the
systems with the most widespread adoption do not create speaker-specific models.
3. The accuracy of a system depends on a number of factors. First, the quality of the signal
matters: a high-quality directional microphone aimed at a stationary mouth in a padded
room will do much better than a cheap microphone transmitting a signal over phone
lines from a car in traffic with the radio playing. The vocabulary size matters: when
recognizing digit strings with a vocabulary of 11 words (1–9 plus “oh” and “zero”), the
word error rate will be below 0.5%, whereas it rises to about 10% on news stories with
a 20,000-word vocabulary, and 20% on a corpus with a 64,000-word vocabulary. The
task matters too: when the system is trying to accomplish a specific task – book a flight
or give directions to a restaurant – the task can often be accomplished perfectly even
with a word error rate of 10% or more.
5.19 ROBOT
1. Robots are physical agents that perform tasks by manipulating the physical world.
2. To do so, they are equipped with effectors such as legs, wheels, joints, and grippers.
4. Robots are also equipped with sensors, which allow them to perceive their environment.
5. Present day robotics employs a diverse set of sensors, including cameras and lasers to
measure the environment, and gyroscopes and accelerometers to measure the robot’s
own motion.
6. Most of today’s robots fall into one of three primary categories. Manipulators, or robot
arms are physically anchored to their workplace, for example in a factory assembly line
or on the International Space Station.
Robot Hardware
1. Sensors are the perceptual interface between robot and environment.
2. Passive sensors, such as cameras, are true observers of the environment: they capture
signals that are generated by other sources in the environment.
3. Active sensors, such as sonar, send energy into the environment. They rely on the fact
that this energy is reflected back to the sensor. Active sensors tend to provide more
information than passive sensors, but at the expense of increased power consumption
and with a danger of interference when multiple active sensors are used at the same
time. Whether active or passive, sensors can be divided into three types, depending on
whether they sense the environment, the robot’s location, or the robot’s internal
configuration.
4. Range finders are sensors that measure the distance to nearby objects. In the early days
of robotics, robots were commonly equipped with sonar sensors. Sonar sensors emit
directional sound waves, which are reflected by objects, with some of the sound making
it back to the sensor.
5. Stereo vision relies on multiple cameras to image the environment from slightly
different viewpoints, analyzing the resulting parallax in these images to compute the
range of surrounding objects. For mobile ground robots, sonar and stereo vision are
now rarely used, because they are not reliably accurate.
6. Other range sensors use laser beams and special 1-pixel cameras that can be directed
using complex arrangements of mirrors or rotating elements. These sensors are called
scanning lidars (short for light detection and ranging).
7. Other common range sensors include radar, which is often the sensor of choice for
UAVs. Radar sensors can measure distances of multiple kilometers. On the other
extreme end of range sensing are tactile sensors such as whiskers, bump panels, and
touch-sensitive skin. These sensors measure range based on physical contact, and can
be deployed only for sensing objects very close to the robot.
8. A second important class of sensors is location sensors. Most location sensors use range
sensing as a primary component to determine location. Outdoors, the Global
Positioning System (GPS) is the most common solution to the localization problem.
9. The third important class is proprioceptive sensors, which inform the robot of its own
motion. To measure the exact configuration of a robotic joint, motors are often equipped
with shaft decoders that count the revolution of motors in small increments.
10. Inertial sensors, such as gyroscopes, rely on the resistance of mass to the change of
velocity. They can help reduce uncertainty.
11. Other important aspects of robot state are measured by force sensors and torque sensors.
These are indispensable when robots handle fragile objects or objects whose exact
shape and location is unknown.
Robotic Perception
1. Perception is the process by which robots map sensor measurements into internal
representations of the environment. Perception is difficult because sensors are noisy,
and the environment is partially observable, unpredictable, and often dynamic. In other
words, robots have all the problems of state estimation (or filtering)
2. As a rule of thumb, good internal representations for robots have three properties: they
contain enough information for the robot to make good decisions, they are structured
so that they can be updated efficiently, and they are natural in the sense that internal
variables correspond to natural state variables in the physical world.
3. Adaptive perception techniques enable robots to adjust to such changes. Methods that
make robots collect their own training data (with labels!) are called self-supervised. In
this instance, the robot uses machine learning to leverage a short-range sensor that works
well for terrain classification into a sensor that can see much farther.
1. All of a robot’s deliberations ultimately come down to deciding how to move effectors.
2. The point-to-point motion problem is to deliver the robot or its end effector to a
designated target location.
3. A greater challenge is the compliant motion problem, in which a robot moves while
being in physical contact with an obstacle.
4. It turns out that the configuration space—the space of robot states defined by location,
orientation, and joint angles—is a better place to work than the original 3D space.
5. The path planning problem is to find a path from one configuration to another in
configuration space. We have already encountered various versions of the path-
planning problem throughout this book; the complication added by robotics is that path
planning involves continuous spaces.
6. Here are two main approaches: cell decomposition and skeletonization. Each reduces
the continuous path- planning problem to a discrete graph-search problem. In this
section, we assume that motion is deterministic and that localization of the robot is
exact. Subsequent sections will relax these assumptions.
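The reduction of continuous path planning to discrete graph search described in point 6 can be sketched with a toy cell decomposition: free space becomes an occupancy grid, and planning becomes breadth-first search over free cells. The grid and coordinates below are invented for illustration.

```python
from collections import deque

# Cell-decomposition sketch: a toy occupancy grid (1 = obstacle). Motion is
# assumed deterministic and localization exact, as in the text.
grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]

def plan(grid, start, goal):
    """Breadth-first search from start to goal over free cells."""
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []                 # reconstruct the path backwards
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in came_from):
                came_from[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None                       # no collision-free path exists

path = plan(grid, (0, 0), (3, 3))
```

Finer grids approximate the continuous space more closely, at the cost of a larger graph to search.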
1. Consider a robot arm with two joints that move independently. Moving the joints alters the (x, y) coordinates
of the elbow and the gripper. (The arm cannot move in the z direction.) This suggests
that the robot’s configuration can be described by a four-dimensional coordinate: (xe,
ye) for the location of the elbow relative to the environment and (xg, yg) for the location
of the gripper. Clearly, these four coordinates characterize the full state of the robot.
They constitute what is known as the workspace representation.
2. Configuration spaces have their own problems. The task of a robot is usually expressed
in workspace coordinates, not in configuration space coordinates. This raises the
question of how to map between workspace coordinates and configuration space.
3. These transformations are linear for prismatic joints and trigonometric for revolute
joints. This chain of coordinate transformation is known as kinematics.
4. The inverse problem of calculating the configuration of a robot whose effector location
is specified in workspace coordinates is known as inverse kinematics. The
configuration space can be decomposed into two subspaces: the space of all
configurations that a robot may attain, commonly called free space, and the space of
unattainable configurations, called occupied space.
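The kinematics and inverse kinematics of the two-link arm described above can be sketched in a few lines. The link lengths below are illustrative assumptions; the trigonometric mapping from joint angles to workspace coordinates is the standard closed form for a planar two-link arm.

```python
import math

# Kinematics sketch for a two-link planar arm. L1, L2 are assumed lengths.
L1, L2 = 1.0, 0.8   # upper-arm and forearm lengths (illustrative)

def forward_kinematics(theta1, theta2):
    """Kinematics: map joint angles (configuration space) to workspace
    coordinates of the elbow (xe, ye) and the gripper (xg, yg)."""
    xe = L1 * math.cos(theta1)
    ye = L1 * math.sin(theta1)
    xg = xe + L2 * math.cos(theta1 + theta2)
    yg = ye + L2 * math.sin(theta1 + theta2)
    return (xe, ye), (xg, yg)

def inverse_kinematics(xg, yg):
    """Inverse kinematics: joint angles that place the gripper at (xg, yg),
    or None if the target is unreachable."""
    d2 = xg * xg + yg * yg
    cos_t2 = (d2 - L1 * L1 - L2 * L2) / (2 * L1 * L2)
    if not -1.0 <= cos_t2 <= 1.0:
        return None                      # target outside the reachable set
    theta2 = math.acos(cos_t2)           # elbow-down solution
    theta1 = math.atan2(yg, xg) - math.atan2(L2 * math.sin(theta2),
                                             L1 + L2 * math.cos(theta2))
    return theta1, theta2

# Round trip: solve for a target, then check it with forward kinematics.
angles = inverse_kinematics(1.2, 0.5)
_, gripper = forward_kinematics(*angles)
```

Note that inverse kinematics generally has multiple solutions (elbow-up and elbow-down); the sketch returns only one of them.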
2. Grayscale shading indicates the value of each free-space grid cell—i.e., the cost of the
shortest path from that cell to the goal.
1. A potential field is a function defined over state space whose value grows with the
distance to the closest obstacle.
2. The potential field can be used as an additional cost term in the shortest-path
calculation.
3. This induces an interesting trade-off. On the one hand, the robot seeks to minimize path
length to the goal. On the other hand, it tries to stay away from obstacles by
minimizing the potential function.
4. There exist many other ways to modify the cost function. For example, it may be
desirable to smooth the control parameters over time.
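The trade-off in points 2-3 can be sketched by adding a potential term to each step's cost in a shortest-path search: path length pulls the robot toward the goal while the potential pushes it away from obstacles. The grid, obstacle set, and weight `k` below are illustrative assumptions.

```python
import heapq

# Potential field as an extra cost term in shortest-path search on a grid.
obstacles = {(1, 1), (1, 2), (2, 2)}
rows, cols = 4, 4
k = 2.0   # trade-off weight: larger k keeps the robot farther from obstacles

def potential(cell):
    """Grows as the cell gets closer to the nearest obstacle."""
    d = min(abs(cell[0] - o[0]) + abs(cell[1] - o[1]) for o in obstacles)
    return k / (d + 1)

def plan(start, goal):
    """Uniform-cost search minimizing path length + potential."""
    pq = [(0.0, start, (start,))]
    seen = set()
    while pq:
        cost, cell, path = heapq.heappop(pq)
        if cell == goal:
            return path
        if cell in seen:
            continue
        seen.add(cell)
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and nxt not in obstacles and nxt not in seen):
                heapq.heappush(pq, (cost + 1 + potential(nxt), nxt, path + (nxt,)))
    return None

path = plan((0, 0), (3, 3))
```

With `k = 0` this reduces to ordinary shortest-path search; increasing `k` trades extra path length for more clearance.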
2. Skeletonization algorithms reduce the robot’s free space to a one-dimensional representation, for
which the planning problem is easier.
4. One such skeleton is the Voronoi graph of the free space: the set of all points that are equidistant to two
or more obstacles. To do path planning with a Voronoi graph, the robot first changes
its present configuration to a point on the Voronoi graph.
It then follows the Voronoi graph to the point nearest the target configuration, leaves the
graph, and moves to the target. Again, this final step involves straight-line motion in
configuration space.
Figure 5.8 (a) A repelling potential field pushes the robot away from
obstacles. (b) Path found by simultaneously minimizing path length and
the potential.
Figure 5.9 (a) The Voronoi graph is the set of points equidistant to two
or more obstacles in configuration space. (b) A probabilistic roadmap,
composed of 100 randomly chosen points in free space.
Robust methods
1. A robust method is one that assumes a bounded amount of uncertainty in each aspect
of a problem, but does not assign probabilities to values within the allowed interval.
2. A robust solution is one that works no matter what actual values occur, provided they
are within the assumed intervals.
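The interval style of robust reasoning in points 1-2 can be sketched with a one-dimensional motion check: the commanded velocity is known only to lie within a bounded interval, and a plan is accepted only if every velocity in that interval lands the robot inside the target region. All numbers below are illustrative assumptions.

```python
# Robust-method sketch: bounded uncertainty, no probabilities. The robot
# moves for time T; the actual velocity lies anywhere in [v - dv, v + dv].

def final_position_interval(x0, v, dv, T):
    """Envelope of possible final positions under bounded velocity error."""
    return (x0 + (v - dv) * T, x0 + (v + dv) * T)

def robustly_reaches(x0, v, dv, T, hole):
    """Accept the plan only if the ENTIRE motion envelope lies in the hole."""
    lo, hi = final_position_interval(x0, v, dv, T)
    return hole[0] <= lo and hi <= hole[1]

hole = (9.0, 11.0)
ok = robustly_reaches(0.0, v=1.0, dv=0.05, T=10.0, hole=hole)   # envelope [9.5, 10.5]
bad = robustly_reaches(0.0, v=1.0, dv=0.2, T=10.0, hole=hole)   # envelope [8.0, 12.0]
```

With the tighter uncertainty the whole envelope fits inside the hole and the plan is robust; with the looser one, some admissible velocities miss the hole, so the plan is rejected even though it might often succeed.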
Figure 5.10 A two-dimensional environment, velocity uncertainty cone,
and envelope of possible robot motions. The commanded velocity is known,
but with uncertainty the actual velocity could be anywhere in the cone C,
resulting in a final configuration somewhere in the motion envelope, which
means we would not know whether we hit the hole or not.
This process is highly difficult because the incoming sound must be matched against
stored sound patterns, and spoken input rarely matches the stored patterns exactly, so
further analysis is required. Feature extraction and pattern matching techniques play an
important role in a speech recognition system in maximizing the recognition rate across
different speakers.
1. Isolated Words: An isolated word recognition system recognizes single utterances,
i.e., single words. Isolated word recognition is suitable for situations where the user
is required to give only a one-word response or command. It is the simplest to
implement because word boundaries are obvious and the words tend to be clearly
pronounced, which is the major advantage of this type. E.g., Yes/No, Listen/Not-Listen.
2. Connected Words: A connected words system is similar to isolated words, but it allows
separate utterances to be "run together" with a minimal pause between them.
An utterance is the vocalization of a word or words that represent a single
meaning to the computer. E.g., drive car (action + object), baby book (agent + object).
3. Continuous Speech: A continuous speech recognition system allows users to speak almost
naturally, while the computer determines the content. Continuous speech recognition
systems are difficult to develop. E.g., dictation.
Feature Extraction: The feature extraction step finds the set of parameters of utterances that
have an acoustic correlation with the speech signal; these parameters are computed by
processing the acoustic waveform. They are known as features or feature vectors
(x₁, x₂, x₃, ...). The main focus of the feature extractor is to keep the relevant information and
discard the irrelevant. The feature extractor divides the acoustic signal into frames of
10-25 ms. The data acquired in each frame is multiplied by a window function; many window
functions can be used, such as Hamming, rectangular, Blackman, Welch, or Gaussian.
Common feature extraction methods include Principal Component Analysis (PCA), Linear
Discriminant Analysis (LDA), wavelets, and Independent Component Analysis (ICA).
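The framing-and-windowing step described above can be sketched directly: split the signal into short overlapping frames and multiply each by a Hamming window. The frame and hop sizes below are in samples and purely illustrative; real systems use frames of roughly 10-25 ms of audio.

```python
import math

# Framing + Hamming windowing sketch for feature extraction.

def hamming(n):
    """Standard Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frames(signal, frame_len, hop):
    """Slice the signal into overlapping frames, each multiplied by the window."""
    w = hamming(frame_len)
    out = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        out.append([s * wi for s, wi in zip(frame, w)])
    return out

# A toy "signal": a slow sinusoid standing in for audio samples.
signal = [math.sin(0.1 * t) for t in range(100)]
fs = frames(signal, frame_len=20, hop=10)
```

Each windowed frame would then be passed to a method such as PCA or LDA to produce the feature vector for that time slice; the window tapers the frame edges to reduce spectral leakage.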
TEXT/REFERENCE BOOKS
3. Nils J. Nilsson, “The Quest for Artificial Intelligence”, Cambridge University Press,
2009.
5. Gerhard Weiss, “Multi Agent Systems”, 2nd Edition, MIT Press, 2013.