Module 5

The document outlines a syllabus on uncertain knowledge and reasoning, focusing on probability theory and expert systems. It discusses how agents handle uncertainty, the limitations of logical reasoning in uncertain environments, and the application of decision theory and utility theory for rational decision-making. Key concepts include the representation of probabilities, conditional probabilities, and the use of probability density functions in AI contexts.


Syllabus:

Uncertain knowledge and Reasoning: Quantifying Uncertainty, Acting under
Uncertainty, Basic Probability Notation, Inference Using Full Joint
Distributions, Independence, Bayes’ Rule and Its Use, The Wumpus World
Revisited.
Expert Systems: Representing and using domain knowledge, ES shells,
Explanation, knowledge acquisition.
Text Book 1: Chapter 13 (Sections 13.1, 13.2, 13.3, 13.4, 13.5, 13.6)
Text Book 2: Chapter 20

13.1: Acting under Uncertainty


● Agents may need to handle uncertainty, whether due to partial
observability, nondeterminism, or a combination of the two.
● An agent may never know for certain what state it’s in or where it
will end up after a sequence of actions.
● Problem-solving agents and logical agents are designed to handle
uncertainty by keeping track of a belief state—a representation of
the set of all possible world states that the agent might be in—and
generating a contingency plan that handles every possible
eventuality that its sensors may report during execution.
● This approach has significant drawbacks when taken literally as a
recipe for creating agent programs:
● When interpreting partial sensor information, a logical agent must
consider every logically possible explanation for the observations,
no matter how unlikely.


Dept of CSE( AI & ML)


● This leads to impossibly large and complex belief-state
representations.
● A correct contingent plan that handles every eventuality can grow
arbitrarily large and must consider arbitrarily unlikely
contingencies.
● Sometimes there is no plan that is guaranteed to achieve the
goal—yet the agent must act.
● It must have some way to compare the merits of plans that are not
guaranteed.



● Suppose, for example, that an automated taxi has the
goal of delivering a passenger to the airport on time.
● The agent forms a plan, A90, that involves leaving home 90
minutes before the flight departs and driving at a reasonable
speed.
● Even though the airport is only about 5 miles away, a logical taxi
agent will not be able to conclude with certainty that “Plan A90 will
get us to the airport in time.”
● Instead, it reaches the weaker conclusion “Plan A90 will get us
to the airport in time, as long as: the car doesn’t break down or run
out of gas, and I don’t get into an accident, and there are no
accidents on the bridge, and the plane doesn’t leave early, and no
meteorite hits the car, and ....” None of these conditions can be
deduced for sure.
●So the plan’s success cannot be inferred.
● This is the qualification problem, for which we so far have seen no
real solution.
● A90 is expected to maximize the agent’s performance measure
(where the expectation is relative to the agent’s knowledge about
the environment).
● The performance measure includes getting to the airport in time
for the flight, avoiding a long, unproductive wait at the airport, and
avoiding speeding tickets along the way.
● The agent’s knowledge cannot guarantee any of these outcomes

for A90, but it can provide some degree of belief that they will be
achieved.
● Other plans, such as A180, might increase the agent’s belief
that it will get to the airport on time, but also increase the
likelihood of a long wait.



● The right thing to do—the rational decision—therefore depends on
both the relative importance of various goals and the likelihood
that, and degree to which, they will be achieved.
13.1.1 Summarizing uncertainty
● Let’s consider an example of uncertain reasoning: diagnosing a
dental patient’s toothache.
● Diagnosis—whether for medicine, automobile repair, or
whatever— almost always involves uncertainty.
● Let us try to write rules for dental diagnosis using propositional
logic, so that we can see how the logical approach breaks down.
Consider the following simple rule:
● Toothache ⇒ Cavity .
● The problem is that this rule is wrong.
● Not all patients with toothaches have cavities; some of them have
gum disease, an abscess, or one of several other problems:
● Toothache ⇒ Cavity ∨ GumProblem ∨ Abscess ∨ ...

● Unfortunately, in order to make the rule true, we have to add


an almost unlimited list of possible problems.
● We could try turning the rule into a causal rule:
● Cavity ⇒ Toothache .
● But this rule is not right either; not all cavities cause pain.
● Trying to use logic to cope with a domain like medical diagnosis
thus fails for three main reasons:
● Laziness: It is too much work to list the complete set of
antecedents or consequents needed to ensure an exceptionless
rule, and too hard to use such rules.
● Theoretical ignorance: Medical science has no complete theory for
the domain.



● Practical ignorance: Even if we know all the rules, we might be
uncertain about a particular patient because not all the necessary
tests have been or can be run.
● The agent’s knowledge can at best provide only a degree of belief
in the relevant sentences.
● Our main tool for dealing with degrees of belief is probability theory.
● The ontological commitments of logic and probability theory are the
same—that the world is composed of facts that do or do not hold
in any particular case—but the epistemological commitments are
different:
● a logical agent believes each sentence to be true or false or has no
opinion,
● whereas a probabilistic agent may have a numerical degree of
belief between 0 (for sentences that are certainly false) and 1
(certainly true).
● Probability provides a way of summarizing the uncertainty that
comes from our laziness and ignorance, thereby solving the
qualification problem.
● We might not know for sure what afflicts a particular patient, but
we believe that there is, say, an 80% chance—that is, a probability
of 0.8—that the patient who has a toothache has a cavity.
13.1.2 Uncertainty and rational decisions
● Consider again the A90 plan for getting to the airport. Suppose it
gives us a 97% chance of catching our flight.

● Does this mean it is a rational choice? Not necessarily: there might
be other plans, such as A180, with higher probabilities.
● If it is vital not to miss the flight, then it is worth risking the longer
wait at the airport.
● What about A1440, a plan that involves leaving home 24 hours in
advance?
● In most circumstances, this is not a good choice, because although
it almost guarantees getting there on time, it involves an
intolerable wait—not to mention a possibly unpleasant diet of
airport food.
● To make such choices, an agent must first have preferences
between the different possible outcomes of the various plans.
● An outcome is a completely specified state, including such factors
as whether the agent arrives on time and the length of the wait at
the airport.
●We use utility theory to represent and reason with preferences.
● (The term utility is used here in the sense of “the quality of being
useful,” not in the sense of the electric company or water works.)
● Utility theory says that every state has a degree of usefulness, or
utility, to an agent and that the agent will prefer states with higher
utility.
● Preferences, as expressed by utilities, are combined with
probabilities in the general theory of rational decisions called
decision theory:
● Decision theory = probability theory + utility theory .
● The fundamental idea of decision theory is that an agent is
rational if and only if it chooses the action that yields the highest
expected utility, averaged over all the possible outcomes of the
action.
●This is called the principle of Maximum Expected Utility (MEU).

13.2 Basic Probability Notation
For our agent to represent and use probabilistic information, we need
a formal language.
The language of probability theory has traditionally been informal,
written by human mathematicians to other human mathematicians.



13.2.1 What probabilities are about
● In probability theory, the set of all possible worlds is called the
sample space.
● The possible worlds are mutually exclusive and exhaustive—two
possible worlds cannot both be the case, and one possible world
must be the case.
● For example, if we are about to roll two (distinguishable) dice, there
are 36 possible worlds to consider: (1,1), (1,2), ..., (6,6).
● The Greek letter Ω (uppercase omega) is used to refer to the
sample space, and ω (lowercase omega) refers to elements of
the space, that is, particular possible worlds.
● A fully specified probability model associates a numerical
probability P(ω) with each possible world.

● The basic axioms of probability theory say that every possible
world has a probability between 0 and 1 and that the total
probability of the set of possible worlds is 1:
0 ≤ P(ω) ≤ 1 for every ω, and Σω∈Ω P(ω) = 1 .   (13.1)

● For example, if we assume that each die is fair and the rolls don’t
interfere with each other, then each of the possible worlds (1,1),
(1,2), ..., (6,6) has probability 1/36.



● On the other hand, if the dice conspire to produce the same number,
then the worlds (1,1), (2,2), (3,3), etc., might have higher
probabilities, leaving the others with lower probabilities.
● In AI, the sets are always described by propositions in a formal
language.
● For each proposition, the corresponding set contains just those
possible worlds in which the proposition holds.
● The probability associated with a proposition is defined to be the
sum of the probabilities of the worlds in which it holds:
● For any proposition a:
P(a) = Σω∈a P(ω) .   (13.2)
● For example, when rolling fair dice, we have P(Total =11) = P((5,6))
+ P((6,5)) = 1/36 + 1/36 = 1/18.
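Axioms (13.1) and (13.2) and this Total = 11 example can be checked mechanically. A minimal sketch in Python (the variable names are ours, not the textbook’s):

```python
from fractions import Fraction

# Sample space for two distinguishable fair dice: 36 equiprobable worlds.
worlds = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
P = {w: Fraction(1, 36) for w in worlds}

# Axioms: every world's probability lies in [0, 1] and the total is 1.
assert all(0 <= p <= 1 for p in P.values()) and sum(P.values()) == 1

# A proposition's probability is the sum over the worlds where it holds.
p_total_11 = sum(p for (d1, d2), p in P.items() if d1 + d2 == 11)
p_doubles  = sum(p for (d1, d2), p in P.items() if d1 == d2)
print(p_total_11)  # 1/18
print(p_doubles)   # 1/6
```

Using exact `Fraction` arithmetic avoids any floating-point rounding when summing world probabilities.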
● Probabilities such as P(Total =11) and P(doubles) are called
unconditional or prior probabilities (and sometimes just “priors” for
short);
● they refer to degrees of belief in propositions in the absence of any
other information.
● Most of the time, however, we have some information, usually
called evidence, that has already been revealed. In that case, we are
interested in the conditional or posterior probability of a
proposition given the evidence: for example, the probability of
rolling doubles given that the first die shows a 5.
● This probability is written P(doubles | Die1 = 5), where the “|” is
pronounced “given.”
● Similarly, if I am going to the dentist for a regular checkup, the
probability P(cavity) = 0.2 might be of interest; but if I go to the
dentist because I have a toothache, it’s P(cavity | toothache) = 0.6
that matters.



● Note that the precedence of “|” is such that any expression of the
form P(... | ...) always means P((...) | (...)).
●The assertion that P(cavity |toothache)=0.6
● does not mean “Whenever toothache is true, conclude that
cavity is true with probability 0.6” rather it means
● “Whenever toothache is true and we have no further information,
conclude that cavity is true with probability 0.6.”
● The extra condition is important; for example, if we had the further
information that the dentist found no cavities,
● we definitely would not want to conclude that cavity is true with
probability 0.6; instead we need to use
● P(cavity | toothache ∧ ¬cavity) = 0 .

● Mathematically speaking, conditional probabilities are defined in
terms of unconditional probabilities as follows: for any
propositions a and b, we have:
P(a | b) = P(a ∧ b) / P(b) ,   (13.3)
which holds whenever P(b) > 0. This can also be written as the
product rule: P(a ∧ b) = P(a | b) P(b) .
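The definition can be checked on the dice example. A small sketch, assuming the fair-dice sample space from earlier (function names are ours):

```python
from fractions import Fraction

worlds = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
P = {w: Fraction(1, 36) for w in worlds}

def prob(event):
    """P(event): sum the probabilities of the worlds where event holds."""
    return sum(p for w, p in P.items() if event(w))

def cond(a, b):
    """P(a | b) = P(a and b) / P(b), defined whenever P(b) > 0."""
    return prob(lambda w: a(w) and b(w)) / prob(b)

doubles   = lambda w: w[0] == w[1]
die1_is_5 = lambda w: w[0] == 5
print(cond(doubles, die1_is_5))  # 1/6: only (5,5) of the six Die1=5 worlds
```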

13.2.2 The language of propositions in probability assertions


● Probability assertions use a factored representation, in which a
possible world is represented by a set of variable/value pairs.
● Variables in probability theory are called random variables and their
names begin with an uppercase letter.
● Thus, in the dice example, Total and Die1 are random variables.
Every random variable has a domain—the set of possible values it
can take on.



● The domain of Total for two dice is the set {2,...,12} and the domain
of Die1 is {1,...,6}.
● A Boolean random variable has the domain: {true, false} (notice that
values are always lowercase);
●for example, the proposition that doubles are rolled can be written as
Doubles =true.
● By convention, propositions of the form A=true are abbreviated
simply as a, while A=false is abbreviated as ¬a.
● As in Constraint Satisfaction Problems, domains can be sets of
arbitrary tokens;
●we might choose the domain of Age to be {juvenile,teen,adult} and
●the domain of Weather might be {sunny, rain, cloudy, snow}.
● “The probability that the patient has a cavity, given that she is a
teenager with no toothache, is 0.1” can be written as follows:
P(cavity | ¬toothache ∧ teen) = 0.1 .
● Sometimes we will want to talk about the probabilities of all


the possible values of a random variable.
● We could write:
P(Weather) = ⟨P(sunny), P(rain), P(cloudy), P(snow)⟩ ,
a vector of four numbers, one for each value of Weather,


● where the bold P indicates that the result is a vector of numbers,
and where we assume a predefined ordering sunny, rain, cloudy,
snow on the domain of Weather.
● We say that the P statement defines a probability distribution for
the random variable Weather.
● The P notation is also used for conditional distributions: P(X |Y )
gives the values of
●P(X =xi |Y =yj) for each possible i, j pair.
● For continuous variables, it is not possible to write out the entire
distribution as a vector, because there are infinitely many values.
● Instead, we can define the probability that a random variable takes
on some value x as a parameterized function of x.
●For example, the sentence
●P(NoonTemp =x)=Uniform[18C,26C](x)
● Expresses the belief that the temperature at noon is
distributed uniformly between 18 and 26 degrees Celsius.
●We call this a Probability Density Function.
● Probability density functions (sometimes called pdfs) differ in
meaning from discrete distributions.
● Saying that the probability density is uniform from 18C to 26C
means that there is a 100% chance that the temperature will fall
somewhere in that 8C-wide region and a 50% chance that it will
fall in any 4C- wide region, and so on.
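The uniform-density claims above can be sketched numerically. This assumes only the Uniform[18C, 26C] example from the text; the helper names are ours:

```python
# A uniform density on [18, 26] assigns density 1/8 per degree, so the
# probability of an interval is its width times 1/8.
def uniform_pdf(x, lo=18.0, hi=26.0):
    """Density of Uniform[lo, hi] at x: constant inside, zero outside."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

def prob_interval(a, b, lo=18.0, hi=26.0):
    """P(a <= X <= b): integral of the constant density over the overlap."""
    a, b = max(a, lo), min(b, hi)
    return max(b - a, 0.0) / (hi - lo)

assert prob_interval(18, 26) == 1.0   # 100% chance inside the 8-degree region
assert prob_interval(20, 24) == 0.5   # 50% chance in any 4-degree region
print(uniform_pdf(20.18))             # 0.125: a density, not a probability
```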
● The intuitive definition of P(x) is the probability that X falls within
an arbitrarily small region beginning at x, divided by the width of
the region:
P(x) = lim dx→0 P(x ≤ X ≤ x + dx) / dx ,



●where C stands for centigrade (not for a constant).
● In P(NoonTemp = 20.18C) = 1/8C, note that 1/8C is not a
probability; it is a probability density.
● The probability that NoonTemp is exactly 20.18C is zero, because
20.18C is a region of width 0.
● For example, P(Weather, Cavity) denotes the probabilities of all
combinations of the values of Weather and Cavity.
● This is a 4×2 table of probabilities called the joint probability
distribution of Weather and Cavity.
● We can also mix variables with and without values; P(sunny,
Cavity) would be a two-element vector giving the probabilities of a
sunny day with a cavity and a sunny day with no cavity.
● The P notation makes certain expressions much more
concise than they might otherwise be.
● For example, the product rules for all possible values of
Weather and Cavity can be written as a single equation:
● P(Weather,Cavity)=P(Weather|Cavity)P(Cavity).



● As a degenerate case, P(sunny, cavity) has no variables and thus is
a one-element vector that is the probability of a sunny day with a
cavity, which could also be written as P(sunny ∧ cavity).
● A possible world is defined to be an assignment of values to all of
the random variables under consideration.
● It is easy to see that this definition satisfies the basic requirement
that possible worlds be mutually exclusive and exhaustive .
● For example, if the random variables are Cavity, Toothache, and
Weather, then there are 2 × 2 × 4 = 16 possible worlds.
● Furthermore, the truth of any given proposition, no matter how
complex, can be determined easily in such worlds using the same
recursive definition of truth as for formulas in propositional logic.
● A probability model is completely determined by the joint distribution
for all of the random variables—the so-called full joint probability
distribution.
● For example, if the variables are Cavity, Toothache, and Weather,
then the full joint distribution is given by P(Cavity, Toothache,
Weather).
● This joint distribution can be represented as a 2×2×4 table
with 16 entries.



● Because every proposition’s probability is a sum over possible
worlds, a full joint distribution suffices, in principle, for calculating
the probability of any proposition.
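In principle, then, a full joint table answers every query. A minimal sketch over (Toothache, Catch, Cavity), assuming the standard entry values of the textbook’s Figure 13.3 (the figure itself is not reproduced in these notes):

```python
# The full joint distribution as a lookup table; world tuples are
# (toothache, catch, cavity).  Entry values assumed from Figure 13.3.
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9   # axiom: entries sum to 1

def prob(event):
    """P(event): sum the joint entries over worlds where the event holds."""
    return sum(p for (t, c, cav), p in joint.items() if event(t, c, cav))

assert abs(prob(lambda t, c, cav: cav) - 0.2) < 1e-9        # P(cavity)
assert abs(prob(lambda t, c, cav: cav or t) - 0.28) < 1e-9  # P(cavity or toothache)
```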
13.2.3 Probability axioms and their reasonableness

● The basic axioms of probability (Equations (13.1) and (13.2)) imply


certain relationships among the degrees of belief that can be
accorded to logically related propositions.

● For example, we can derive the familiar relationship between the
probability of a proposition and the probability of its negation:
P(¬a) = 1 − P(a) .
● We can also derive the well-known formula for the probability of a
disjunction, sometimes called the inclusion–exclusion principle:
P(a ∨ b) = P(a) + P(b) − P(a ∧ b) .   (13.4)
● Equations (13.1) and (13.4) are often called Kolmogorov’s axioms
in honor of the Russian mathematician Andrei Kolmogorov, who
showed how to build up the rest of probability theory from this
simple foundation and how to handle the difficulties caused by
continuous variables.
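The negation and disjunction identities can be verified exhaustively on the dice sample space. A small sketch (the propositions chosen are ours):

```python
from fractions import Fraction

worlds = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
p = Fraction(1, 36)                      # each world is equiprobable
a = {w for w in worlds if w[0] == 5}     # proposition a: Die1 = 5
b = {w for w in worlds if w[0] == w[1]}  # proposition b: doubles

# P(not a) = 1 - P(a)
assert 1 - len(a) * p == len(set(worlds) - a) * p
# Inclusion-exclusion: P(a or b) = P(a) + P(b) - P(a and b)
lhs = len(a | b) * p
rhs = len(a) * p + len(b) * p - len(a & b) * p
assert lhs == rhs == Fraction(11, 36)    # 6 + 6 - 1 = 11 worlds
```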
●But de Finetti proved something much stronger:



● If an agent has some degree of belief in a proposition a, then the
agent should be able to state odds at which it is indifferent to a
bet for or against a. Think of it as a game between two agents:
Agent 1 states, “my degree of belief in event a is 0.4.” Agent 2 is
then free to choose whether to wager for or against a at stakes
that are consistent with the stated degree of belief.

● If Agent 1 expresses a set of degrees of belief that violate the
axioms of probability theory, then there is a combination of bets by
Agent 2 that guarantees that Agent 1 will lose money every time.

13.3: Inference Using Full Joint Distributions


● In this section we describe a simple method for probabilistic
inference—
● that is, the computation of posterior probabilities for query
propositions given observed evidence.
● We use the full joint distribution as the “knowledge base” from
which answers to all questions may be derived.
● Along the way we also introduce several useful techniques for
manipulating equations involving probabilities.
WHERE DO PROBABILITIES COME FROM?
● There has been endless debate over the source and status of
probability numbers.
● The frequentist position is that the numbers can come only
from experiments: if we test 100 people and find that 10 of them
have a



cavity, then we can say that the probability of a cavity is
approximately 0.1. In this view, the assertion “the probability of a
cavity is 0.1” means that 0.1 is the fraction that would be observed in
the limit of infinitely many samples. From any finite sample, we can
estimate the true fraction and also calculate how accurate our
estimate is likely to be.
● The objectivist view is that probabilities are real aspects of
the universe—propensities of objects to behave in certain ways—
rather than being just descriptions of an observer’s degree of
belief. For example, the fact that a fair coin comes up heads with
probability 0.5 is a propensity of the coin itself.
● The subjectivist view describes probabilities as a way of
characterizing an agent’s beliefs, rather than as having any
external physical significance.

● In the end, even a strict frequentist position involves subjective
analysis because of the reference class problem: to apply a
frequency, one must decide which class of previously observed
cases a new case belongs to.

● We begin with a simple example: a domain consisting of just the


three Boolean variables Toothache, Cavity, and Catch (the
dentist’s nasty steel probe catches in my tooth).
● The full joint distribution is a 2×2×2 table as shown in Figure 13.3.
● Notice that the probabilities in the joint distribution sum to 1, as
required by the axioms of probability.
● For example, there are six possible worlds
in which cavity ∨ toothache holds:
P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28 .



● For example, adding the entries in the first row gives the
unconditional or marginal probability of cavity:
P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2 .
● This process is called marginalization, or summing out—because
we sum up the probabilities for each possible value of the other
variables, thereby taking them out of the equation.

● We can write the following general marginalization rule for any
sets of variables Y and Z:
P(Y) = Σz P(Y, z) ,
where Σz sums over all possible combinations of values of Z.
● Using the product rule, we can replace P(Y, z) by P(Y | z) P(z),
giving:
P(Y) = Σz P(Y | z) P(z) .
● This rule is called conditioning.


● Marginalization and conditioning turn out to be useful rules for all
kinds of derivations involving probability expressions.
● For example, we can compute the probability of a cavity, given
evidence of a toothache, as follows:
P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
= (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.6 ;
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / 0.2 = 0.4 .


● The two values sum to 1.0, as they should. Notice that in these two
calculations the term 1/P(toothache) remains constant, no matter
which value of Cavity we calculate.
● In fact, it can be viewed as a normalization constant for the
distribution P(Cavity | toothache), ensuring that it adds up to 1.

● Throughout the chapters dealing with probability, we use α to
denote such constants. With this notation, we can write the two
preceding equations in one:
P(Cavity | toothache) = α P(Cavity, toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩] = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩ .
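Normalization can be sketched directly on the joint table, again assuming the Figure 13.3 entry values (not reproduced in these notes):

```python
# World tuples: (toothache, catch, cavity); values assumed from Figure 13.3.
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

# Unnormalized P(Cavity, toothache): sum out Catch for each value of Cavity.
unnorm = [
    sum(p for (t, c, cav), p in joint.items() if t and cav),      # cavity
    sum(p for (t, c, cav), p in joint.items() if t and not cav),  # no cavity
]
alpha = 1.0 / sum(unnorm)          # normalization constant; equals 1/P(toothache)
dist = [alpha * u for u in unnorm]
# dist is approximately [0.6, 0.4]: P(toothache) never had to be looked up separately
```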

● We begin with the case in which the query involves a single
variable, X (Cavity in the example).

variable, X (Cavity in the example).
● Let E be the list of evidence variables (just Toothache in the
example), let e be the list of observed values for them, and let Y be
the remaining unobserved variables (just Catch in the example).
● The query is P(X | e) and can be evaluated as:
P(X | e) = α P(X, e) = α Σy P(X, e, y) ,   (13.9)
where the summation is over all possible combinations of values y
of the unobserved variables Y.
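This evaluation strategy—match the evidence, sum out the unobserved variables, then normalize—can be written as a short generic procedure. A sketch, again assuming the Figure 13.3 entry values and our own function name:

```python
# World tuples: (toothache, catch, cavity); values assumed from Figure 13.3.
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def query(x_index, evidence, joint):
    """Return P(X | e) by summing out all unobserved variables.

    x_index: tuple position of the query variable X.
    evidence: dict mapping tuple positions to observed values.
    """
    unnorm = {}
    for world, p in joint.items():
        if all(world[i] == v for i, v in evidence.items()):
            unnorm[world[x_index]] = unnorm.get(world[x_index], 0.0) + p
    alpha = 1.0 / sum(unnorm.values())       # normalization constant
    return {x: alpha * p for x, p in unnorm.items()}

dist = query(2, {0: True}, joint)            # P(Cavity | toothache)
assert abs(dist[True] - 0.6) < 1e-9          # Catch was summed out
```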



13.4: Independence
● Let us expand the full joint distribution in Figure 13.3 by adding a
fourth variable, Weather.
● The full joint distribution then becomes P(Toothache, Catch,
Cavity, Weather), which has 2 × 2 × 2 × 4 = 32 entries.
● It contains four “editions” of the table shown in Figure 13.3, one for
each kind of weather.
● What relationship do these editions have to each other and to the
original three-variable table?
●For example, how are P(toothache, catch, cavity, cloudy)
● and P(toothache, catch, cavity) related?
● We can use the product rule:
P(toothache, catch, cavity, cloudy)
= P(cloudy | toothache, catch, cavity) P(toothache, catch, cavity) .
● Since the weather does not influence the dental variables, this
simplifies to:
P(toothache, catch, cavity, cloudy) = P(cloudy) P(toothache, catch, cavity) .   (13.10)
● Thus, the 32-element table for four variables can be constructed


from one 8-element table and one 4-element table.
● This decomposition is illustrated schematically in Figure 13.4(a).

● The property we used in Equation (13.10) is called


independence (also marginal independence and absolute
independence).



● In particular, the weather is independent of one’s dental problems.
Independence between propositions a and b can be written as:
● P(a | b) = P(a) or P(b | a) = P(b) or P(a ∧ b) = P(a) P(b) .   (13.11)
● Independence between variables X and Y can be written as follows
(again, these are all equivalent):
● P(X |Y)=P(X) or P(Y |X)=P(Y) or P(X,Y)=P(X)P(Y).
● Independence assertions are usually based on knowledge of
the domain.
● As the toothache–weather example illustrates, they can
dramatically reduce the amount of information necessary to
specify the full joint distribution.

● For example, the full joint distribution on the outcome of n


independent coin flips, P(C1, ..., Cn), has 2^n entries,
● but it can be represented as the product of n single-variable
distributions P(Ci).
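The saving is easy to see in code. A minimal sketch with n = 10 flips (the 0.5 bias is our assumed example value):

```python
from itertools import product

# n independent coin flips: independence lets n numbers stand in for
# the 2**n-entry full joint table.
n = 10
p_heads = [0.5] * n                     # the n single-variable distributions

def joint_prob(outcome):                # outcome: tuple of n booleans
    prob = 1.0
    for heads, p in zip(outcome, p_heads):
        prob *= p if heads else 1.0 - p
    return prob

# Tabulating the full joint just to check: 2**n = 1024 entries summing
# to 1, all reconstructed from only the n = 10 numbers above.
full_joint = {o: joint_prob(o) for o in product([True, False], repeat=n)}
assert len(full_joint) == 2 ** n
assert abs(sum(full_joint.values()) - 1.0) < 1e-9
```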



13.5: Bayes’ Rule and Its Use
● The product rule can actually be written in two forms:
P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a) .
● Equating the two right-hand sides and dividing by P(a) gives:
P(b | a) = P(a | b) P(b) / P(a) .   (13.12)
● This equation is known as Bayes’ rule (also Bayes’ law or Bayes’


theorem).
● This simple equation underlies most modern AI systems
for probabilistic inference.
● The more general case of Bayes’ rule for multivalued variables can
be written in the P notation as follows:
P(Y | X) = P(X | Y) P(Y) / P(X) .   (13.13)
● We will also have occasion to use a more general version
conditionalized on some background evidence e:
P(Y | X, e) = P(X | Y, e) P(Y | e) / P(X | e) .   (13.14)
13.5.1 Applying Bayes’ rule: The simple case


●On the surface, Bayes’ rule does not seem very useful.
● It allows us to compute the single term P(b | a) in terms of three
terms: P(a | b), P(b), and P(a).



● The conditional probability P(effect |cause) quantifies the
relationship in the causal direction,
●whereas P(cause |effect) describes the diagnostic direction.

● The general form of Bayes’ rule with normalization is:
● P(Y | X) = α P(X | Y) P(Y) ,   (13.15)
● where α is the normalization constant needed to make the entries
in P(Y |X) sum to 1.
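Equation (13.15) can be sketched on the running dental example. The inputs below are assumptions consistent with the chapter’s figures: P(toothache | cavity) = 0.6, P(toothache | ¬cavity) = 0.1, prior P(cavity) = 0.2:

```python
# Bayes' rule with normalization: P(Y | X) = alpha * P(X | Y) * P(Y).
likelihood = {True: 0.6, False: 0.1}    # P(toothache | Cavity), assumed
prior      = {True: 0.2, False: 0.8}    # P(Cavity), assumed

unnorm = {cav: likelihood[cav] * prior[cav] for cav in (True, False)}
alpha = 1.0 / sum(unnorm.values())      # makes the entries sum to 1
posterior = {cav: alpha * u for cav, u in unnorm.items()}
# posterior[True] = 0.12 / 0.20 = 0.6, agreeing with P(cavity | toothache) = 0.6
```

Note that P(toothache) itself is never supplied: normalization recovers it as the sum of the unnormalized entries.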
13.5.2 Using Bayes’ rule: Combining evidence
● We have seen that Bayes’ rule can be useful for answering
probabilistic queries conditioned on one piece of evidence—for
example, the stiff neck. In particular, we have argued that
probabilistic information is often available in the form P(effect |
cause).
● What happens when we have two or more pieces of evidence?
● For example, what can a dentist conclude if her nasty steel probe
catches in the aching tooth of a patient?
● If we know the full joint distribution (Figure 13.3), we can read off
the answer:
● P(Cavity | toothache ∧ catch) = α ⟨0.108, 0.016⟩ ≈ ⟨0.871, 0.129⟩ .
● We know, however, that such an approach does not scale up
to larger numbers of variables.
● We can try using Bayes’ rule to reformulate the problem:
● P(Cavity | toothache ∧ catch)



= α P(toothache ∧ catch | Cavity) P(Cavity) .   (13.16)
● P(toothache ∧ catch | Cavity) =
P(toothache | Cavity) P(catch | Cavity) .   (13.17)
● This equation expresses the conditional independence of
toothache and catch given Cavity.
● We can plug it into Equation (13.16) to obtain the probability of
a cavity:
● P(Cavity | toothache ∧ catch)
= α P(toothache | Cavity) P(catch | Cavity) P(Cavity) .   (13.18)
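Equation (13.18) can be evaluated numerically. The conditional probabilities below are assumptions derived from the textbook’s Figure 13.3 (not shown in these notes):

```python
# Combining two conditionally independent pieces of evidence via (13.18).
p_cavity        = {True: 0.2, False: 0.8}   # prior P(Cavity), assumed
p_toothache_cav = {True: 0.6, False: 0.1}   # P(toothache | Cavity), assumed
p_catch_cav     = {True: 0.9, False: 0.2}   # P(catch | Cavity), assumed

unnorm = {cav: p_toothache_cav[cav] * p_catch_cav[cav] * p_cavity[cav]
          for cav in (True, False)}
alpha = 1.0 / sum(unnorm.values())
posterior = {cav: alpha * u for cav, u in unnorm.items()}
# unnorm = {True: 0.108, False: 0.016}, so posterior is approximately
# {True: 0.871, False: 0.129} -- matching the value read off the full joint
```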

● The general definition of conditional independence of two variables
X and Y, given a third variable Z, is:
P(X, Y | Z) = P(X | Z) P(Y | Z) .

● In the dentist domain, for example, it seems reasonable to assert
conditional independence of the variables Toothache and Catch,
given Cavity:
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity) ,
● which is somewhat stronger than Equation (13.17): that equation
asserts independence only for specific values of Toothache
and Catch.

● As with absolute independence in Equation (13.11), conditional
independence has the equivalent forms:
P(X | Y, Z) = P(X | Z) and P(Y | X, Z) = P(Y | Z) .



● conditional independence assertions can allow probabilistic
systems to scale up; moreover, they are much more commonly
available than absolute independence assertions.
● Conceptually, Cavity separates Toothache and Catch because it is
a direct cause of both of them.
● The decomposition of large probabilistic domains into weakly
connected subsets through conditional independence is one of the
most important developments in the recent history of AI.
● The dentistry example illustrates a commonly occurring pattern in
which a single cause directly influences a number of effects, all of
which are conditionally independent, given the cause.
● The full joint distribution can be written as:
● P(Cause, Effect1, ..., Effectn) = P(Cause) ∏i P(Effecti | Cause) .
● Such a probability distribution is called a naive Bayes model—


“naive” because it is often
● used (as a simplifying assumption) in cases where the “effect”
variables are not actually conditionally independent given the
cause variable.
● (The naive Bayes model is sometimes called a Bayesian classifier,
a somewhat careless usage that has prompted true Bayesians to
call it the idiot Bayes model.)
