Module 5 - Part1
Syllabus:
Uncertain knowledge and Reasoning: Quantifying Uncertainty, Acting
under Uncertainty, Basic Probability Notation, Inference Using Full Joint
Distributions, Independence, Bayes Rule and Its Use, The WumpusWorld
Revisited.
Expert Systems: Representing and using domain knowledge, ES shells. Explanation,
knowledge acquisition
Text Book 1: Chapter 13-13.1, 13.2, 13.3, 13.4, 13.5, 13.6
Text Book 2: Chapter 20
13.1 : Acting under Uncertainty
• Agents may need to handle uncertainty, whether due to partial
observability, non-determinism, or a combination of the two.
• An agent may never know for certain what state it’s in or where it
will end up after a sequence of actions.
• Problem-solving agents and logical agents are designed to handle
uncertainty by keeping track of a belief state—a representation of
the set of all possible world states that the agent might be in—and by
generating a contingency plan that handles every possible
eventuality that its sensors may report during execution.
• This approach has significant drawbacks when taken literally as a
recipe for creating agent programs:
• When interpreting partial sensor information, a logical agent must
consider every logically possible explanation for the observations,
no matter how unlikely.
• This leads to impossibly large and complex belief-state
representations.
• A correct contingent plan that handles every eventuality can grow
arbitrarily large and must consider arbitrarily unlikely contingencies.
• Sometimes there is no plan that is guaranteed to achieve the goal—yet
the agent must act.
• It must have some way to compare the merits of plans that are not
guaranteed.
Module 5 BAD402 Artificial Intelligence
• Suppose, for example, that an automated taxi has the goal
of delivering a passenger to the airport on time.
• The agent forms a plan, A90, that involves leaving home 90 minutes
before the flight departs and driving at a reasonable speed.
• Even though the airport is only about 5 miles away, a logical taxi
agent will not be able to conclude with certainty that "Plan A90 will
get us to the airport in time."
• Instead, it reaches the weaker conclusion "Plan A90 will get us to the
airport in time, as long as: the car doesn't break down or run out of
gas, and I don't get into an accident, and there are no accidents on
the bridge, and the plane doesn't leave early, and no meteorite hits
the car, and ..." None of these conditions can be deduced for sure.
• So the plan’s success cannot be inferred.
• This is the qualification problem, for which we so far have seen no
real solution.
• A90 is expected to maximize the agent’s performance measure
(where the expectation is relative to the agent’s knowledge about the
environment).
• The performance measure includes getting to the airport in time for
the flight, avoiding a long, unproductive wait at the airport, and
avoiding speeding tickets along the way.
• The agent’s knowledge cannot guarantee any of these outcomes for
A90, but it can provide some degree of belief that they will be
achieved.
• Other plans, such as A180, might increase the agent’s belief that it
will get to the airport on time, but also increase the likelihood of a
long wait.
• For example, if we assume that each die is fair and the rolls don’t
interfere with each other, then each of the possible worlds (1,1),
(1,2), ..., (6,6) has probability 1/36.
• On the other hand, if the dice conspire to produce the same number,
then the worlds (1,1), (2,2), (3,3), etc., might have higher
probabilities, leaving the others with lower probabilities.
• In AI, the sets are always described by propositions in a formal
language.
• For each proposition, the corresponding set contains just those
possible worlds in which the proposition holds.
• The probability associated with a proposition is defined to be the
sum of the probabilities of the worlds in which it holds: for any
proposition φ, P(φ) = Σ_{ω ∈ φ} P(ω).
• For example, when rolling fair dice, we have P(Total = 11) = P((5,6))
+ P((6,5)) = 1/36 + 1/36 = 1/18.
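The sum-over-worlds definition above can be checked directly by enumerating the 36 possible worlds for two fair dice; a minimal Python sketch:

```python
from fractions import Fraction
from itertools import product

# The 36 equally likely possible worlds for two fair dice.
worlds = {(d1, d2): Fraction(1, 36) for d1, d2 in product(range(1, 7), repeat=2)}

def probability(holds):
    # P(proposition) = sum of the probabilities of the worlds where it holds.
    return sum(p for world, p in worlds.items() if holds(world))

p_total_11 = probability(lambda w: w[0] + w[1] == 11)   # -> 1/18
p_doubles  = probability(lambda w: w[0] == w[1])        # -> 1/6
print(p_total_11, p_doubles)
```

The same `probability` function works for any proposition expressible as a test on a world, which is exactly the point of the definition.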
• Probabilities such as P(Total = 11) and P(doubles) are called
unconditional or prior probabilities (and sometimes just
"priors" for short);
• they refer to degrees of belief in propositions in the absence of
any other information.
• Most of the time, however, we have some information, usually
called evidence, that has already been revealed.
• For example, the probability of rolling doubles given that the first
die is a 5 is written P(doubles | Die1 = 5), where the "|"
is pronounced "given."
• Similarly, if I am going to the dentist for a regular checkup, the
probability P(cavity)=0.2 might be of interest; but if I go to the
dentist because I have a toothache, it’s P(cavity |toothache)=0.6
that matters.
• The domain of Total for two dice is the set {2,...,12} and
the domain of Die1 is {1,...,6}.
• A Boolean random variable has the domain {true, false} (notice
that values are always lowercase); for example, the proposition that
doubles are rolled can be written as Doubles = true.
• By convention, propositions of the form A = true are abbreviated
simply as a, while A = false is abbreviated as ¬a.
• As in Constraint Satisfaction Problems, domains can be sets of
arbitrary tokens;
• we might choose the domain of Age to be {juvenile,teen,adult} and
• the domain of Weather might be {sunny, rain, cloudy, snow}.
• We can express "The probability that the patient has a cavity, given
that she is a teenager with no toothache, is 0.1" as follows:
P(cavity | ¬toothache ∧ teen) = 0.1.
• For example, P(Weather) = ⟨0.6, 0.1, 0.29, 0.01⟩, where the bold
P indicates that the result is a vector of numbers, and where we
assume a predefined ordering ⟨sunny, rain, cloudy, snow⟩ on the
domain of Weather.
• We say that the P statement defines a probability distribution for the
random variable Weather.
• The P notation is also used for conditional distributions: P(X | Y)
gives the values of P(X = xi | Y = yj) for each possible i, j pair.
• For continuous variables, it is not possible to write out the entire
distribution as a vector, because there are infinitely many values.
• Instead, we can define the probability that a random variable takes on
some value x as a parameterized function of x.
• For example, the sentence P(NoonTemp = x) = Uniform[18C, 26C](x)
expresses the belief that the temperature at noon is distributed
uniformly between 18 and 26 degrees Celsius.
• We call this a Probability Density Function.
• Probability density functions (sometimes called pdfs) differ in
meaning from discrete distributions.
• Saying that the probability density is uniform from 18C to 26C means
that there is a 100% chance that the temperature will fall
somewhere in that 8C-wide region and a 50% chance that it will fall
in any 4C-wide region, and so on.
• The intuitive definition of P(x) is the probability that X falls within an
arbitrarily small region beginning at x, divided by the width of the
region: P(x) = lim_{dx→0} P(x ≤ X ≤ x + dx) / dx.
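As a sketch of the uniform-density claims above (100% of the mass in the 8C-wide region, 50% in any 4C-wide sub-region), assuming the Uniform[18C, 26C] density:

```python
# Uniform density on [18, 26] degrees C: density = 1/8 per degree.
LOW, HIGH = 18.0, 26.0
DENSITY = 1.0 / (HIGH - LOW)

def prob_in(a, b):
    """P(a <= NoonTemp <= b) = density times the width of the overlap with [18, 26]."""
    overlap = max(0.0, min(b, HIGH) - max(a, LOW))
    return DENSITY * overlap

print(prob_in(18, 26))  # 1.0 -- the whole 8C-wide region
print(prob_in(20, 24))  # 0.5 -- any 4C-wide sub-region
```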
• As a degenerate case, P(sunny, cavity) has no variables and thus is a
one-element vector that is the probability of a sunny day with a
cavity, which could also be written as P(sunny, cavity) or
P(sunny ∧ cavity).
• A possible world is defined to be an assignment of values to all of the
random variables under consideration.
• It is easy to see that this definition satisfies the basic requirement that
possible worlds be mutually exclusive and exhaustive.
• For example, if the random variables are Cavity, Toothache, and
Weather, then there are 2 × 2 × 4 = 16 possible worlds.
• Furthermore, the truth of any given proposition, no matter how
complex, can be determined easily in such worlds using the same
recursive definition of truth as for formulas in propositional logic.
• A probability model is completely determined by the joint
distribution for all of the random variables—the so-called full
joint probability distribution.
• For example, if the variables are Cavity, Toothache, and Weather,
then the full joint distribution is given by P(Cavity, Toothache,
Weather).
• This joint distribution can be represented as a 2×2×4 table with 16
entries.
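A full joint distribution like P(Cavity, Toothache, Weather) can be held as a 16-entry table; the sketch below fills the table from hypothetical marginals (the numbers are illustrative, chosen only so the entries sum to 1) and shows how marginalization works:

```python
from itertools import product

# Hypothetical marginals, used only to build a 16-entry example table;
# the numbers are illustrative, not from the text.
p_cavity    = {True: 0.2, False: 0.8}
p_toothache = {True: 0.2, False: 0.8}
p_weather   = {'sunny': 0.6, 'rain': 0.1, 'cloudy': 0.29, 'snow': 0.01}

# Full joint P(Cavity, Toothache, Weather): one entry per possible world.
joint = {(c, t, w): p_cavity[c] * p_toothache[t] * p_weather[w]
         for c, t, w in product([True, False], [True, False], p_weather)}

# Marginal probability of cavity: sum over all worlds where Cavity = true.
p_cav = sum(p for (c, t, w), p in joint.items() if c)
print(len(joint), round(p_cav, 3))  # 16 entries; P(cavity) = 0.2
```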
• For example, adding the entries in the first row of the table in
Figure 13.3 gives the unconditional or marginal probability of cavity:
P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2.
• The two values sum to 1.0, as they should. Notice that in these two
calculations the term 1/P(toothache) remains constant, no matter
which value of Cavity we calculate.
• In fact, it can be viewed as a normalization constant for the
distribution P(Cavity | toothache), ensuring that it adds up to 1.
• Throughout the chapters dealing with probability, we use α to
denote such constants. With this notation, we can write the two
preceding equations in one:
P(Cavity | toothache) = α P(Cavity, toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)].
• We begin with the case in which the query involves a single variable,
X (Cavity in the example).
• Let E be the list of evidence variables (just Toothache in the example),
let e be the list of observed values for them, and let Y be the
remaining unobserved variables (just Catch in the example).
• The query is P(X | e) and can be evaluated as
P(X | e) = α P(X, e) = α Σ_y P(X, e, y),
where the summation is over all possible combinations of values y of
the unobserved variables Y.
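Inference by enumeration can be sketched in Python; the joint values below are those of the textbook's three-variable dental example (Figure 13.3), with keys ordered (cavity, toothache, catch):

```python
# Inference by enumeration: P(X | e) = alpha * sum_y P(X, e, y).
# Joint values from the textbook's dental example (Figure 13.3);
# keys are (cavity, toothache, catch).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def query_cavity(toothache):
    # Sum out the hidden variable Catch for each value of Cavity...
    unnormalized = [
        sum(p for (c, t, _), p in joint.items() if c == cav and t == toothache)
        for cav in (True, False)
    ]
    # ...then normalize: alpha = 1 / P(evidence).
    alpha = 1.0 / sum(unnormalized)
    return [alpha * u for u in unnormalized]

dist = query_cavity(toothache=True)
print([round(p, 2) for p in dist])  # [0.6, 0.4]
```

Note that α is computed last, from the unnormalized sums, which is exactly why 1/P(toothache) never has to be known in advance.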
13.4: Independence
• Let us expand the full joint distribution in Figure 13.3 by adding a
fourth variable, Weather.
• The full joint distribution then becomes P(Toothache, Catch,
Cavity, Weather), which has 2 × 2 × 2 × 4 = 32 entries.
• It contains four "editions" of the table shown in Figure 13.3, one for
each kind of weather.
• What relationship do these editions have to each other and to the
original three-variable table?
• For example, how are P(toothache, catch, cavity, cloudy) and
P(toothache, catch, cavity) related?
• We can use the product rule:
P(toothache, catch, cavity, cloudy)
= P(cloudy | toothache, catch, cavity) P(toothache, catch, cavity).
Since the weather does not influence the dental variables,
P(cloudy | toothache, catch, cavity) = P(cloudy), and hence
P(toothache, catch, cavity, cloudy) = P(cloudy) P(toothache, catch, cavity).
• Thus, the 32-element table for four variables can be constructed from
one 8-element table and one 4-element table.
• This decomposition is illustrated schematically in Figure 13.4(a).
• The property we used in Equation (13.10) is called independence
(also marginal independence and absolute independence).
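The space saving from independence can be sketched numerically: a hypothetical 8-entry dental joint and a 4-entry weather prior reconstruct all 32 entries of the four-variable joint (the numbers here are illustrative, not from the text):

```python
from itertools import product

# Illustrative (made-up) factors: an 8-entry dental joint and a
# 4-entry weather prior.
dental  = {k: 0.125 for k in product([True, False], repeat=3)}  # uniform, hypothetical
weather = {'sunny': 0.6, 'rain': 0.1, 'cloudy': 0.29, 'snow': 0.01}

# Independence lets 8 + 4 = 12 stored numbers reconstruct all 32 entries:
# P(t, c, cav, w) = P(t, c, cav) * P(w).
full = {(d, w): dental[d] * weather[w] for d in dental for w in weather}

print(len(full), round(sum(full.values()), 6))  # 32 entries, summing to 1.0
```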
• Lurking somewhere in the cave is the terrible wumpus, a beast that eats anyone
who enters its room.
• The wumpus can be shot by an agent, but the agent has only one arrow.
• Some rooms contain bottomless pits that will trap anyone who wanders into these
rooms (except for the wumpus, which is too big to fall in).
• Performance measure: +1000 for climbing out of the cave with the gold, −1000 for
falling into a pit or being eaten by the wumpus, −1 for each action taken, and −10 for
using up the arrow.
• The game ends either when the agent dies or when the agent climbs out of the
cave.
• Environment: A 4×4 grid of rooms. The agent always starts in the square labeled
[1,1], facing to the right. In addition, each square other than the start can be a pit,
with probability 0.2.
• Uncertainty arises in the wumpus world because the agent’s sensors give only
partial information about the world.
• For example, Figure 13.5 shows a situation in which each of the three reachable
squares—[1,3], [2,2], and [3,1]—might contain a pit.
• Pure logical inference can conclude nothing about which square is most likely to
be safe, so a logical agent might have to choose randomly.
• Our aim is to calculate the probability that each of the three squares contains a
pit.
• Each square other than [1,1] contains a pit with probability 0.2.
• As in the propositional logic case, we want one Boolean variable Pij for each
square, which is true iff square[i,j] actually contains a pit.
• We also have Boolean variables Bij that are true iff square [i,j] is breezy; we
include these variables only for the observed squares—in this case, [1,1], [1,2],
and [2,1].
• This decomposition makes it easy to see what the joint probability values should
be.
• Given a pit configuration, the conditional probability of a breeze configuration is 1
if the breezes are exactly those adjacent to the pits and 0 otherwise.
• Each square contains a pit with probability 0.2, independently of the other squares;
hence P(P1,1, ..., P4,4) = Π_{i,j} P(Pi,j) = 0.2^n × 0.8^(16−n) for a configuration
with exactly n pits.
• In the situation in Figure 13.5(a), the evidence consists of the observed breeze (or
its absence) in each square that is visited, combined with the fact that each such
square contains no pit.
• b = ¬b1,1 ∧ b1,2 ∧ b2,1 and
• known = ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1.
• The query asks: how likely is it that [1,3] contains a pit, given the observations so far?
• Let Unknown be the set of Pi,j variables for squares other than the Known squares
and the query square [1,3].
• Surely, one might ask, aren’t the other squares irrelevant? How could [4,4] affect
whether [1,3] has a pit?
• Let Frontier be the pit variables (other than the query variable) that are adjacent
to visited squares, in this case just [2,2] and [3,1].
• Also, let Other be the pit variables for the other unknown squares; in this case,
there are 10 other squares, as shown in Figure 13.5(b).
• The key insight is that the observed breezes are conditionally independent of
the other variables, given the known, frontier, and query variables.
• To use the insight, we manipulate the query formula into a form in which the
breezes are conditioned on all the other variables, and then we apply conditional
independence:
• By independence, as in Equation (13.20), the prior term can be factored, and then
the terms can be reordered:
• That is, [1,3] (and [3,1] by symmetry) contains a pit with roughly 31%
probability.
• A similar calculation, which the reader might wish to perform, shows that [2,2]
contains a pit with roughly 86% probability.
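The 31% and 86% figures can be reproduced by enumerating just the query and frontier pit variables, weighting each configuration by its prior and keeping only those consistent with the observed breezes (a sketch of the textbook's calculation, with the Other squares cancelled out of the normalization):

```python
from itertools import product

# Query variable [1,3] plus frontier [2,2] and [3,1]; the Other squares
# cancel out of the normalization, so only these three pits matter.
PIT = 0.2

def prior(pits):
    # Independent prior: 0.2 per pit, 0.8 per pit-free square.
    n = sum(pits)
    return PIT ** n * (1 - PIT) ** (len(pits) - n)

def consistent(p13, p22, p31):
    breeze_12 = p13 or p22  # [1,2] is breezy, so a pit must border it
    breeze_21 = p22 or p31  # [2,1] is breezy, so a pit must border it
    return breeze_12 and breeze_21  # (¬b1,1 holds: its neighbours are known pit-free)

worlds = [w for w in product([True, False], repeat=3) if consistent(*w)]
total = sum(prior(w) for w in worlds)

p_pit_13 = sum(prior(w) for w in worlds if w[0]) / total
p_pit_22 = sum(prior(w) for w in worlds if w[1]) / total
print(round(p_pit_13, 2), round(p_pit_22, 2))  # 0.31 0.86
```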