Quantifying Uncertainty

The document presents a lecture on quantifying uncertainty in artificial intelligence, focusing on decision-making under uncertainty, basic probability notation, and the use of Bayes' Rule. It discusses the challenges of handling uncertainty in real-world scenarios, such as partial observability and nondeterminism, and introduces concepts like belief states and contingency plans. Additionally, it emphasizes the importance of rational decision-making and the application of probability theory to represent uncertainty in various contexts, including medical diagnosis.

Uploaded by

Gaming with Joel
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views80 pages

Quantifying Uncertainty

The document presents a lecture on quantifying uncertainty in artificial intelligence, focusing on decision-making under uncertainty, basic probability notation, and the use of Bayes' Rule. It discusses the challenges of handling uncertainty in real-world scenarios, such as partial observability and nondeterminism, and introduces concepts like belief states and contingency plans. Additionally, it emphasizes the importance of rational decision-making and the application of probability theory to represent uncertainty in various contexts, including medical diagnosis.

Uploaded by

Gaming with Joel
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 80

Slide 1

Quantifying Uncertainty
Introduction to Artificial Intelligence
Hamdi Abdurhman, PhD

401892 LECTURE 6 11

Slide 2

Last Time

• Knowledge-Based Agents
• Wumpus World
• Logic
• Propositional Logic: A Very Simple Logic

Slide 3

Today
• Acting under Uncertainty
• Basic Probability Notation
• Inference Using Full Joint Distributions
• Independence
• Bayes’ Rule and Its Use
• Naive Bayes Models
• The Wumpus World Revisited

Slide 4

Introduction to Uncertainty
• Real World Uncertainty:
• Partial observability, nondeterminism, and adversaries.
• Agents may not know current or future states.
• Handling Uncertainty:
• Belief state: Represents possible world states.
• In partially observable environments, agents can't determine their exact state.
• In nondeterministic environments, agents use belief states to account for multiple
possible state transitions.
• Contingency plan: Handles every possible sensor observation.
• In partially observable and nondeterministic environments, the solution to a problem is
no longer a sequence, but rather a conditional plan (sometimes called contingency plan
or a strategy)

Slide 5

Drawbacks of Contingency Plans


• Despite its many virtues, this approach has significant drawbacks:
• With partial information, an agent must consider every possible eventuality,
no matter how unlikely. This leads to impossibly large and complex belief-
state representations
• A correct contingent plan that handles every eventuality can grow arbitrarily
large and must consider arbitrarily unlikely contingencies.
• Sometimes there is no plan that is guaranteed to achieve the goal—yet the
agent must act. It must have some way to compare the merits of plans that
are not guaranteed.

Slide 6

Uncertainty
• Uncertainty is everywhere. Consider the following proposition.
• A_t: Leaving t minutes before the flight will get me to the airport on time.
• Problems:
1. Partial observability (road state, other drivers' plans, etc.)
2. Noisy sensors (radio traffic reports)
3. Uncertainty in action outcomes (a flat tire, etc.)
4. Immense complexity of modelling and predicting traffic

Slide 7

Example -
Automated Taxi

Slide 8

Rational Decision-Making
• Performance Measure:
• Timeliness, avoiding unproductive waits, avoiding speeding tickets.
• Comparing Plans:
• Plan A90: Maximizes performance measure based on knowledge.
• Plan A180: Increases belief in success but has trade-offs.
• Rational Decision:
• Depends on goal importance and likelihood of achievement.
• Expected to maximize performance based on environmental knowledge.

Slide 9

A Visit to the Dentist


• We'll use medical/dental diagnosis examples extensively
• Our new prototype problem: does a dental patient have a cavity or not?
• The process of diagnosis always involves uncertainty, and this leads to difficulty with
logical representations (propositional logic examples):
1. Toothache ⇒ Cavity
2. Toothache ⇒ Cavity ∨ GumProblem ∨ Abscess ∨ …
3. Cavity ⇒ Toothache
1. is just wrong, since other things also cause toothaches
2. would need to list all possible causes
3. tries a causal rule, but it is not always the case that cavities cause toothaches, and
fixing the rule requires making it logically exhaustive
Slide 10

Representation for Diagnosis


• Logic is not sufficient for medical diagnosis, due to
• Our laziness: it's too hard to list all possible antecedents or consequents to make the
rule have no exceptions
• Our theoretical ignorance: generally, there is no complete theory of the domain, no
complete model
• Our practical ignorance: even if the rules were complete, in any particular case it's
impractical or impossible to do all the necessary tests, to have all relevant evidence
• The example relationship between toothache & cavities is not a logical
consequence in either direction
• Instead, knowledge of the domain provides a degree of belief in diagnostic
sentences & the way to represent this is with probability theory

Slide 11

Epistemological Commitment
• Ontological commitment
• What a representational language assumes about the nature of reality - logic
& probability theory agree in this, that facts do or do not hold
• Epistemological commitment
• The possible states of knowledge
• For logic, sentences are true/false/unknown
• For probability theory, there's a numerical degree of belief in sentences, between 0
(certainly false) and 1 (certainly true)

Slide 12

Knowledge representation
Language                                  | Main elements (Ontological Commitment) | Assignments (Epistemological Commitment)
------------------------------------------|----------------------------------------|-----------------------------------------
Propositional logic                       | Facts                                  | T, F, unknown
First-order logic                         | Facts, objects, relations              | T, F, unknown
Temporal logic                            | Facts, objects, relations, times       | T, F, unknown
Temporal constraint satisfaction problems | Time points                            | Time intervals
Fuzzy logic                               | Set membership                         | Degree of truth
Probability theory                        | Facts                                  | Degree of belief

• The first three do not represent uncertainty, while the last three do.

Slide 13

The Qualification Problem


• For a logical representation
• The success of a plan can't be inferred because of all the conditions that could
interfere but can't be deduced not to happen (this is the qualification
problem)
• Probability is a way of dealing with the qualification problem by numerically
summarizing the uncertainty that derives from laziness &/or ignorance
• Returning to the toothache & cavity problem
• In the real world, the patient either does or does not have a cavity
• A probabilistic agent makes statements with respect to the knowledge state, & these
may change as the state of knowledge changes
• For example, an agent initially may believe there's an 80% chance (probability 0.8) that
the patient with the toothache has a cavity, but subsequently revises that as additional
evidence is available

Slide 14

Rational Decisions
• Making choices among plans/actions when the probabilities of their
success differ
• This requires additional knowledge of preferences among outcomes
• This is the domain of utility theory: every state has a degree of
utility/usefulness to the agent & the agent will prefer those with higher utility
• Utilities are specific to an agent, to the extent that they can even encompass perverse or
altruistic preferences

Slide 15

Rational Decisions
• Making choices among plans/actions when the probabilities of their
success differ
• We can combine preferences (utilities) with probabilities to get a general
theory of rational decisions: decision theory
• A rational agent chooses actions to yield the highest expected utility
averaged over all possible outcomes of the action
• This is the maximum expected utility (MEU) principle
• Expected = average of the possible outcomes of an action weighted by their probabilities
• Choice of action = the one with highest expected utility
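The MEU principle can be sketched in a few lines of Python. The action names, probabilities, and utilities below are purely illustrative (not from the lecture): each action leads to outcomes with known probabilities and utilities, and the agent picks the action whose probability-weighted average utility is highest.

```python
# Hypothetical airport-departure example: the numbers are made up for illustration.
# Each action maps to a list of (probability, utility) outcome pairs.
actions = {
    "leave_90_min_early":  [(0.95, 100), (0.05, -1000)],   # usually on time, small risk of missing the flight
    "leave_180_min_early": [(0.999, 70), (0.001, -1000)],  # almost surely on time, but a long wait
}

def expected_utility(outcomes):
    # Average of the outcome utilities, weighted by their probabilities.
    return sum(p * u for p, u in outcomes)

# MEU principle: choose the action with the highest expected utility.
best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)  # leave_180_min_early (EU 68.93 vs. 45.0)
```

Note how the large negative utility of missing the flight dominates the comparison even though its probability is small.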

Slide 16

1.2 Uncertainty and Rational Decisions:


Decision-Theoretic Agent Structure

• Belief state:
• Reflects the history of percepts to date.
• Includes probabilities of possible world states.
• Probabilistic Predictions:
• Agent predicts action outcomes probabilistically.
• Selects actions with the highest expected utility.

Figure 1. A decision-theoretic agent that selects rational actions [1].

Slide 17

2. Basic Probability Notation


Represent and use probabilistic information.
Traditional presentations are informal, written by and for mathematicians.
Our approach is tailored for AI and linked to formal logic.

Slide 18

2.1 What Probabilities are About


• Sample Space:
• Set of all possible worlds (Ω).
• Each ω is a particular configuration or outcome.
• Example: Rolling two dice gives 36 possible worlds.
▪ Sample space (Ω): { (1,1), (1,2), ..., (6,6) }
▪ Specific outcome (ω): (5,6).
• Probability Model:
• Assigns numerical probability P(ω) to each possible world.
• Axioms: 0 ≤ P(ω) ≤ 1 for every ω, and Σ_{ω ∈ Ω} P(ω) = 1 ……(Equations 1)
• Example:
• Fair dice: Each world (1,1), (1,2), ..., (6,6) has P(ω)=1/36.
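The two-dice sample space and its probability model can be checked directly in Python (a minimal sketch; the variable names are ours):

```python
from itertools import product

# Sample space Omega: all 36 possible worlds for rolling two dice.
omega = list(product(range(1, 7), repeat=2))

# Probability model for fair dice: each world gets P(w) = 1/36.
P = {w: 1 / 36 for w in omega}

print(len(omega))       # 36
print((5, 6) in omega)  # True: a specific outcome ω
# The axioms hold: every P(w) lies in [0, 1] and they sum to 1.
print(abs(sum(P.values()) - 1) < 1e-9)  # True
```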

Slide 19

Events and Propositions


• Assertions & queries in probabilistic reasoning
• Events:
• Sets of possible worlds.
• Example: Probability of two dice adding up to 11.
• Propositions:
• Correspond to sets of possible worlds in logic.
• Probability of a proposition: Sum of probabilities of worlds where it holds.
• Use the Greek letter φ (phi) for a proposition
• Equation (2): P(φ) = Σ_{ω ∈ φ} P(ω)
• Example:
• Rolling dice: P(Total = 11) = P((5,6)) + P((6,5)) = 1/36 + 1/36 = 2/36 = 1/18
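Equation (2) says a proposition's probability is the sum over the worlds where it holds; the dice example can be verified with a small helper (a sketch; `prob` is our name for it):

```python
from itertools import product

# Fair two-dice probability model: 36 equally likely worlds.
P = {w: 1 / 36 for w in product(range(1, 7), repeat=2)}

def prob(phi):
    # Equation (2): sum the probabilities of the worlds where phi holds.
    return sum(p for w, p in P.items() if phi(w))

# P(Total = 11): only the worlds (5,6) and (6,5) qualify.
p_total_11 = prob(lambda w: w[0] + w[1] == 11)
print(abs(p_total_11 - 1 / 18) < 1e-9)  # True: 2/36 = 1/18
```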

Slide 20

Unconditional Probabilities
• Prior Probabilities:
o Also called "unconditional probabilities" or "priors."
o Degree of belief in propositions without additional information.
o When rolling fair dice, if we assume that each die is fair and the rolls don't
interfere with each other:
o The set of possible worlds: (1,1), (1,2), (1,3), ..., (2,1), (2,2), ..., (6,5), (6,6)
o P(ω) = 1/36 for each possible world

Slide 21

Conditional Probabilities
• Posterior Probabilities:
• Also called "conditional probabilities" or "posteriors."
• The probability of a certain event happening, given the effect of another
event(called evidence).
• For example, the first die may be already showing 5 and we are waiting for
the other die to settle down
• In that case, we are interested in the probability of the other die given the
first one is 5
• Example: P(doubles ∣ Die1 = 5).
• Notation:
• P(A∣B) is read as "Probability of A given B."
Slide 22

Understanding Conditional Probability


• Example Context:
• Prior probability: P(cavity) = 0.2 for a regular checkup.
• Conditional probability: P(cavity | toothache) = 0.6 for a toothache.
• Valid but Not Useful:
• P(cavity) = 0.2 remains valid but less useful after observing a toothache.
• Decisions should condition on all observed evidence.
• Difference from Logical Implication:
• P(cavity |toothache) = 0.6 does not mean “If toothache, then cavity with probability
0.6”.
• It means “If toothache and no further information, then cavity with probability 0.6”.
• Further evidence (e.g., the dentist finds no cavity) updates the probability:
P(cavity | toothache ∧ ¬cavity) = 0.

Slide 23

Defining Conditional Probability


• Mathematical Definition:
• For propositions a and b:
• Equation (3): P(a | b) = P(a ∧ b) / P(b), defined whenever P(b) > 0.
• Example: P(doubles | Die1 = 5) = P(doubles ∧ Die1 = 5) / P(Die1 = 5).
• Product rule:
• Equation 3 can be written in a different form called the product rule:
• Equation (4): P(a ∧ b) = P(a | b) P(b)
• Easier to Remember:
• For a and b to be true:
• b must be true.
• a must be true given b.
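Equations (3) and (4) can be checked numerically on the dice model (a sketch under the same fair-dice assumptions; helper names are ours):

```python
from itertools import product

P = {w: 1 / 36 for w in product(range(1, 7), repeat=2)}

def prob(phi):
    # Probability of a proposition: sum over the worlds where it holds.
    return sum(p for w, p in P.items() if phi(w))

def cond_prob(a, b):
    # Equation (3): P(a | b) = P(a AND b) / P(b), defined when P(b) > 0.
    return prob(lambda w: a(w) and b(w)) / prob(b)

def doubles(w):
    return w[0] == w[1]

def die1_is_5(w):
    return w[0] == 5

p = cond_prob(doubles, die1_is_5)
print(abs(p - 1 / 6) < 1e-9)  # True: only (5,5) among the six worlds with Die1 = 5

# Equation (4), the product rule: P(a AND b) = P(a | b) * P(b).
lhs = prob(lambda w: doubles(w) and die1_is_5(w))
print(abs(lhs - p * prob(die1_is_5)) < 1e-12)  # True
```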

Slide 24

Random Variables
• Factored representation of possible worlds: sets of
(variable, value) pairs
• Variables in probability theory: random variables
• Domain: the set of possible values a variable can take on
• Names begin with an uppercase letter (e.g., Weather, Die1).
• A random variable is a function mapping from possible worlds (Ω) to a range of values.
• Example Ranges:
• Weather: {sunny, rain, cloudy, snow}, Die1: {1, ..., 6}, Odd: {true, false}
• Value Naming Conventions:
• Lowercase for values (e.g., ∑x P(X = x) sums over the values of X).
• Boolean variable ranges: {true, false} or {0, 1} (Bernoulli distribution).
Slide 25

Variable Ranges and Propositions


• Ranges can be sets of arbitrary tokens.
• Examples of Ranges:
• Age: {juvenile, teen, adult}.
• Weather: {sun, rain, cloud, snow}.
• Abbreviations:
• A = true: simply a.
• A = false: ¬a.
• Using Values for Propositions:
• When no ambiguity is possible, it is common to use a value by itself to stand for the
proposition that a particular variable has that value;
• Example: sun can stand for Weather = sun.
• Infinite Ranges:
• Discrete (e.g., integers) or continuous (e.g., reals).

Slide 26

Combining Propositions and Probability


Distributions
• Elementary Propositions:
• Combine using propositional logic connectives.
• Example: cavity ∧ ¬toothache.
• Conjunction Notation:
• Common to use a comma (e.g., P(cavity | ¬toothache, teen)).
• Probability Distributions:
• Bold P is used as notational shorthand to
• list the probabilities of all possible values of a variable.
• Example: P(Weather) = ⟨0.6, 0.1, 0.29, 0.01⟩ for {sun, rain, cloud, snow}.
• P(Weather = sun) = 0.6
• The bold P indicates that the result is a vector of numbers,
• and we assume a predefined order (sun, rain, cloud, snow) on the range of Weather.
• We say that the P statement defines a probability distribution for the random variable Weather.
• We can use a similar shorthand for conditional distributions.
• Conditional Distributions:
• P(X | Y) gives the values of P(X = xi | Y = yj) for each i, j pair.
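The bold-P vector notation can be sketched in code as a value-to-probability map, using the slide's Weather example and its assumed order (sun, rain, cloud, snow):

```python
# The bold-P vector notation as code: P(Weather) maps each value in the
# assumed order (sun, rain, cloud, snow) to its probability.
P_weather = {"sun": 0.6, "rain": 0.1, "cloud": 0.29, "snow": 0.01}

# The distribution sums to 1, as the axioms require.
assert abs(sum(P_weather.values()) - 1.0) < 1e-9

# P(Weather = sun) selects one component of the vector.
print(P_weather["sun"])  # 0.6
```
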

Slide 27

Continuous Variable and Probability Density


Functions (PDFs)
• Infinite Values: Continuous variables have infinitely many possible values, making it impossible to
list out the entire distribution as a vector.
• Probability Density Function (PDF): Instead of exact probabilities, we use PDFs to describe the
likelihood of a random variable taking on a specific value
• Uniform Distribution Example:
• The probability density from 18°C to 26°C is uniform:
• P(NoonTemp = x) = Uniform(18C, 26C)(x) = 1/8 per degree centigrade, for 18°C ≤ x ≤ 26°C (and 0 elsewhere).

• Probability Density P(x):
• P(x) = lim(dx→0) P(x ≤ X ≤ x + dx) / dx
• For NoonTemp: the density is 0.125 per °C everywhere on the interval and 0 outside it.
Slide 28

Continuous Variable and Probability Density


Functions (PDFs) cont.
• Probabilities vs. Densities: Probabilities are unitless, while densities have units
(e.g., reciprocal degrees centigrade).
• Understanding Densities:
• A probability density of 1/8 per °C means there is a 100% chance the temperature will be within the
8°C range (18°C to 26°C).
• The density function varies with units. For example, the same temperature range in degrees
Fahrenheit (18°C = 64.4°F, 26°C = 78.8°F) has a width of 14.4°F and a density of 1/14.4 ≈ 0.069 per °F.
• Implications:
• The exact probability of NoonTemp being 20.18°C is zero because it is an exact point, not an
interval.
• The density function provides a way to understand probabilities over intervals, not specific
points.
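The units point can be verified numerically: the same uniform NoonTemp distribution has different density values in °C and °F, yet the total probability mass over the interval is 1 either way.

```python
# A numeric check of the units point: the same uniform NoonTemp
# distribution has different density values in °C and °F, yet the total
# probability over the interval is 1 in both unit systems.
low_c, high_c = 18.0, 26.0
density_c = 1.0 / (high_c - low_c)         # 0.125 per °C

to_f = lambda c: c * 9 / 5 + 32
low_f, high_f = to_f(low_c), to_f(high_c)  # 64.4°F, 78.8°F
density_f = 1.0 / (high_f - low_f)         # 1/14.4 ≈ 0.069 per °F

assert abs(density_c * (high_c - low_c) - 1.0) < 1e-12
assert abs(density_f * (high_f - low_f) - 1.0) < 1e-12
print(round(density_c, 3), round(density_f, 3))  # 0.125 0.069
```
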

Slide 29

Distribution Notation
• for distributions on multiple variables
o we use commas between the variables: so P(Weather, Cavity) denotes the
probabilities of all combinations of values of the 2 variables
o for discrete random variables we can use a tabular representation, in this
case yielding a 4x2 table of probabilities; this gives the joint probability
distribution of Weather & Cavity
o it tabulates the probabilities for all combinations of values

Slide 30

Distribution Notation
• for distributions on multiple variables
o the notation also allows mixing variables & values
▪ P(sunny, Cavity) is just a 2-vector of probabilities
o the distribution notation, P, allows compact expressions
▪ for example, here are the product rules for all possible combinations of Weather &
Cavity
▪ P(Weather, Cavity) = P(Weather | Cavity)P(Cavity)
• the distribution notation summarizes what otherwise would be 8 separate equations, each of
the form P(Weather = sun ∧ Cavity = true) = P(Weather = sun | Cavity = true) P(Cavity = true), etc.
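The compact distribution equation can be sketched by expanding it into its scalar instances. The numbers below are illustrative assumptions for the sketch (the slides do not give a joint Weather/Cavity table), and the variable names are our own:

```python
# The single distribution equation P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
# expanded into its 8 scalar instances. The numbers are illustrative
# assumptions, not values given in the slides.
weathers = ["sun", "rain", "cloud", "snow"]
P_cavity = {True: 0.2, False: 0.8}
# Assumed P(Weather | Cavity); each conditional distribution sums to 1.
P_w_given_c = {
    True:  {"sun": 0.6, "rain": 0.1, "cloud": 0.29, "snow": 0.01},
    False: {"sun": 0.6, "rain": 0.1, "cloud": 0.29, "snow": 0.01},
}

# The product rule applied elementwise fills in the 4x2 joint table.
P_joint = {(w, c): P_w_given_c[c][w] * P_cavity[c]
           for c in P_cavity for w in weathers}

assert abs(sum(P_joint.values()) - 1.0) < 1e-9
print(round(P_joint[("sun", True)], 3))  # 0.12
```
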

Slide 31

Full joint Distribution (FJD)


• now we fill in some details
o of the semantics of the probability of a proposition as the sum of probabilities
for the possible worlds in which it holds
▪ possible worlds are analogous to those in propositional logic
▪ each possible world is specified by an assignment of values to all of the random variables
under consideration
o for the random variables Cavity, Toothache & Weather there are 16 possible
worlds (2x2x4) & the value of a given proposition is determined in the same
recursive fashion as for formulas in propositional logic

Slide 32

Full joint Distribution


• semantics of a proposition
o the probability model is determined by the joint distribution for all the
random variables: the full joint probability distribution
▪ for the Cavity, Toothache, Weather domain, the notation is:
▪ P(Cavity, Toothache, Weather)
▪ this can be represented as a 2x2x4 table
o given the definition of the probability of a proposition as a sum over possible
worlds, the full joint distribution allows calculating the probability of any
proposition over its variables by summing entries in the FJD

Slide 33

Probability Axioms
• We can derive
o some additional relationships for degrees of belief among logically related propositions,
from the axioms (equations 1 and 2) and some algebraic manipulation.
o For example, P(¬a) = 1 − P(a), the relationship between the probability of a
proposition & its negation
o and also the axiom (eq. 5) for the probability of a disjunction,
referred to as the inclusion-exclusion principle
o Equation 1: 0 ≤ P(ω) ≤ 1 for every ω, and ∑ω∈Ω P(ω) = 1
o Equation 5: P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
o Together, equations 1 and 5 are referred to as Kolmogorov’s axioms, in honor of the
Russian mathematician Andrey Kolmogorov, who showed how to build up the rest of
probability theory from them, including issues related to handling continuous variables.
Slide 34

Is Probability the Answer?


• Historically
o there's been a debate over whether probabilities are the only viable
mechanism for describing degrees of belief
o the degree of belief in a proposition can be reformulated as betting odds for
establishing amounts of wagers on outcomes of events
o Bruno de Finetti (1931, 1993) proved that if an agent's set of degrees of belief
is inconsistent with the probability axioms, then, when formulated as bets
on outcomes of events, there is a combination of bets by an opposing agent
that will cause the agent to lose money every time

Slide 35

Rationality & Probability Axioms


• apparently then
o no rational agent will have beliefs that violate the axioms of probability
▪ a common rebuttal to this argument is that betting is a poor metaphor & the agent could
just refuse to bet
▪ which itself is countered by pointing out that betting is just a model for the decision-
making that goes on, inevitably, all the time
o other authors have constructed similar arguments to support those of Bruno
de Finetti
o furthermore, in the "real world", AI reasoning systems based on probability
have been highly successful

Slide 36

Don't Mess with the Probability Axioms


• From Figure 12.2 AIMA 4e
o Evidence for the rationality of probability

Figure 2. Agent 1's inconsistent beliefs allow Agent 2 to set up bets to guarantee Agent 1 loses, independent of outcome of a and b

o So, for example, Agent 1's degree of belief in a is 0.4, so will bet "against" it &
pay 6 to Agent 2 if a is the outcome, receive 4 from Agent 2 if it is not, and so
on

Slide 37

Inference Using Full Joint Distributions


• Using the full joint distributions for inference
o We use FJD as the KB from which answers to all questions may be derived.
o Here's the FJD for the Toothache, Cavity, Catch domain of 3 Boolean variables
o The FJD is a 2×2×2 table of probabilities

Figure 3. A full joint distribution for the Toothache, Cavity, Catch world.

o As required by the axioms, the probabilities sum to 1.0


o When available, the FJD gives a direct means of calculating the probability of
any proposition
o Just sum the probabilities for all the possible worlds in which the proposition is true

Slide 38

Inference Using Full Joint Distributions


• An example of using the FJD for inference

o To calculate: P(cavity ∨ toothache)
o cavity ∨ toothache holds for 6 possible worlds
o The corresponding sum is 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28.
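This summation over possible worlds can be sketched directly in code. The table values below are assumed to be AIMA's standard Toothache/Cavity/Catch numbers, which the slide's Figure 3 reproduces (they are consistent with the 0.28 result):

```python
# Inference by enumeration over the full joint distribution. The table
# values are AIMA's standard Toothache/Cavity/Catch numbers (the slide's
# Figure 3).
fjd = {
    # (toothache, cavity, catch): probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (False, False, True):  0.144, (False, False, False): 0.576,
}
assert abs(sum(fjd.values()) - 1.0) < 1e-9  # axioms: entries sum to 1

def prob(event):
    # P(proposition) = sum over the possible worlds in which it holds
    return sum(p for world, p in fjd.items() if event(*world))

# cavity ∨ toothache holds in 6 of the 8 worlds; the sum is 0.28.
print(round(prob(lambda t, cav, cat: cav or t), 3))  # 0.28
```
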

Slide 39

Inference Using Full Joint Distributions


• Using the FJD for inference

• A common task is to state the distribution over a single variable or a subset of
variables: sum over the other variables to get the unconditional or marginal
probability.
• For example, P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
• The terminology for this is: “marginalization” or “summing out”
• It takes the other variables out of the equation
• For sets of variables Y and Z: P(Y) = ∑z P(Y, Z = z)
• ∑z means to sum over all the possible combinations of values of the set of
variables Z
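Marginalization is a one-line sum in code. The joint values below are assumed to be AIMA's standard table (the slide's Figure 3):

```python
# Marginalization ("summing out"): P(Cavity) from the full joint
# distribution. Joint values follow AIMA's standard table (the slide's
# Figure 3).
fjd = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def marginal_cavity(value):
    # P(Cavity = value) = sum over all values of Toothache and Catch
    return sum(p for (t, cav, cat), p in fjd.items() if cav == value)

print(round(marginal_cavity(True), 3), round(marginal_cavity(False), 3))  # 0.2 0.8
```
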

Slide 40

Inference Using Full Joint Distributions


• Using the FJD for inference
• A variant considers conditional probabilities instead of joint probabilities and uses the
product rule; this is referred to as conditioning:
• P(Y) = ∑z P(Y | z) P(z)
• The common scenario is to want conditional probabilities of some variable given
evidence about others
• Use the product rule (Equation 3), P(a ∧ b) = P(a | b) P(b), to get an expression in terms of
unconditional probabilities, then sum appropriately in the FJD.
• For example: the probability of a cavity, given evidence of a toothache
• P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
= (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.12 / 0.2 = 0.6

Slide 41

Inference Using Full Joint Distributions


• as a check we might
• compute the probability of no cavity, given a toothache
• P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache) = (0.016 + 0.064) / 0.2 = 0.4

• as they should, the probabilities sum to 1.0

• we note that P(toothache) is the denominator for both, & as part of the calculation of both
values for Cavity, can be viewed as a normalization constant for the distribution
• both terms have P(toothache) as denominator, ensuring they sum to 1
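The check can be run in code: both conditional probabilities share the denominator P(toothache) and sum to 1. Joint values are assumed to be AIMA's standard table (the slide's Figure 3):

```python
# Conditioning check: P(cavity | toothache) and P(¬cavity | toothache)
# share the denominator P(toothache) and sum to 1. Joint values follow
# AIMA's standard table (the slide's Figure 3).
fjd = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(event):
    # sum over the worlds (toothache, cavity, catch) where the event holds
    return sum(p for world, p in fjd.items() if event(*world))

p_toothache = prob(lambda t, cav, cat: t)  # 0.2, the shared denominator
p_cav_given_t = prob(lambda t, cav, cat: cav and t) / p_toothache
p_nocav_given_t = prob(lambda t, cav, cat: not cav and t) / p_toothache

assert abs(p_cav_given_t + p_nocav_given_t - 1.0) < 1e-9   # they sum to 1
print(round(p_cav_given_t, 3), round(p_nocav_given_t, 3))  # 0.6 0.4
```
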

Slide 42

Normalization Constant
• note that P(toothache) was the denominator
• for calculating both conditional probabilities
• it functions as a normalization constant for the distribution
P(Cavity | toothache), ensuring the probabilities add to 1
• in AIMA, this constant is denoted by α, and we use it to mean a normalizing
constant, where the probabilities must add to 1
• since the sum for the distribution must be 1, we can just sum the raw values
obtained and then use 1/sum for α
• this may make calculations simpler, and might even allow them when some
probability assessment (such as P(toothache)) is not available

Slide 43

Normalization Constant
• an example of using the normalization constant α

• P(Cavity | toothache) = P(Cavity, toothache) / P(toothache)
= α P(Cavity, toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
= α ⟨0.12, 0.08⟩
= ⟨0.6, 0.4⟩
• since the probabilities must add to 1.0, the calculation can be done without knowing P(toothache), just
normalizing at the end
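The α trick can be sketched directly: sum the raw joint entries for each value of Cavity, then normalize, never computing P(toothache) explicitly. Joint values are assumed to be AIMA's standard table (the slide's Figure 3):

```python
# The α trick: sum raw joint entries for each value of Cavity, then
# normalize, without computing P(toothache) explicitly. Joint values
# follow AIMA's standard table (the slide's Figure 3).
raw = {
    True:  0.108 + 0.012,  # P(cavity, toothache, catch) + P(cavity, toothache, ¬catch)
    False: 0.016 + 0.064,  # the same two entries for ¬cavity
}
alpha = 1.0 / sum(raw.values())  # 1 / 0.2 = 5.0
P_cavity_given_toothache = {v: alpha * p for v, p in raw.items()}
print(round(P_cavity_given_toothache[True], 3),
      round(P_cavity_given_toothache[False], 3))  # 0.6 0.4
```
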

Slide 44

Generalization of Inference
• given a query, the generalized version of the process for a conditional
probability distribution is:
• for a single variable X (Cavity in the preceding example), let E be the list of
evidence variables (just Toothache in the example), e the list of observed
values for them, and Y the unobserved variables (Catch in the example)
• the query P(X | e) is calculated by summing out over the unobserved
variables:
• Equation (9): P(X | e) = α P(X, e) = α ∑y P(X, e, y)
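Equation (9) can be sketched as a small enumeration procedure. The joint values are assumed to be AIMA's standard table (the slide's Figure 3); the helper name `enumerate_query` is our own, not from the slides:

```python
# A sketch of Equation (9), P(X | e) = α Σ_y P(X, e, y), by direct
# enumeration over the full joint distribution. Joint values follow
# AIMA's standard table (the slide's Figure 3).
# Variable order in each world tuple: (Toothache, Cavity, Catch).
fjd = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def enumerate_query(fjd, x_index, evidence):
    """Return P(X | e): fix the evidence variables, sum out the rest,
    then normalize with alpha = 1 / (sum of the raw values)."""
    raw = {}
    for world, p in fjd.items():
        if all(world[i] == v for i, v in evidence.items()):
            raw[world[x_index]] = raw.get(world[x_index], 0.0) + p
    alpha = 1.0 / sum(raw.values())
    return {x: alpha * p for x, p in raw.items()}

# P(Cavity | Toothache = true): X is index 1, evidence fixes index 0.
dist = enumerate_query(fjd, x_index=1, evidence={0: True})
print({k: round(v, 3) for k, v in dist.items()})  # {True: 0.6, False: 0.4}
```
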

Slide 45

Inference for Probability


• given the full joint distribution & Equation 9
• we can answer all probability queries for discrete variables
• are we left with any unresolved issues?
• well, given n variables, and d as an upper bound on the number of values, the full
joint distribution table size & the corresponding processing of it are O(d^n), exponential in n
• since n might be 100 or more for real problems, this is often simply not
practical
• as a result, the FJD is not the implementation of choice for real
systems, but functions more as the theoretical reference point
(analogous to the role of truth tables for propositional logic)
• the next sections we look at are foundational for developing practical
systems
Slide 46

Independence

Slide 47

Independence
• consider a new version of our example domain
• now defined in terms of 4 random variables: Toothache, Catch, Cavity & Weather,
where Weather has 4 values (sunny, rain, cloudy, snow)
• so it has a FJD with 2 × 2 × 2 × 4 = 32
entries
• one way to display it would be as four tables, 1 for each value of
Weather
• how are they related?
• for example: how is P(toothache, catch, cavity, cloudy) related to P(toothache, catch, cavity)?

Slide 48

Independence
• in the 4-variable domain
• what is the relationship between P(toothache, catch, cavity, cloudy) and P(toothache, catch, cavity)?
• given what we know about relating probabilities (the product rule)
• P(toothache, catch, cavity, cloudy) = P(cloudy | toothache, catch, cavity) P(toothache, catch, cavity)
• but we "know" that dental problems don't influence the weather
• P(cloudy | toothache, catch, cavity) = P(cloudy)
• & we know weather doesn't seem to influence dental variables
• so
• P(toothache, catch, cavity, cloudy) = P(cloudy) P(toothache, catch, cavity)
• & similarly for each entry in P(Toothache, Catch, Cavity, Weather)
• thus the 32-element table for 4 variables reduces to an 8-element table & a 4-element table
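As a sketch of this factoring (a minimal Python example; the probability values are illustrative textbook-style numbers, not given on this slide), the 32-entry joint can be rebuilt from an 8-entry dental table and a 4-entry Weather table:

```python
# Hypothetical 8-entry joint P(Toothache, Catch, Cavity); illustrative
# values that sum to 1, keyed by (toothache, catch, cavity).
p_dental = {
    (True, True, True): 0.108, (True, False, True): 0.012,
    (False, True, True): 0.072, (False, False, True): 0.008,
    (True, True, False): 0.016, (True, False, False): 0.064,
    (False, True, False): 0.144, (False, False, False): 0.576,
}
# 4-entry distribution P(Weather); again illustrative values.
p_weather = {"sunny": 0.6, "rain": 0.1, "cloudy": 0.29, "snow": 0.01}

# Because Weather is independent of the dental variables, every entry of
# the 32-entry full joint is just a product of the two smaller tables.
p_full = {(w, t, c, cav): p_weather[w] * p_dental[(t, c, cav)]
          for w in p_weather for (t, c, cav) in p_dental}

assert len(p_full) == 32
assert abs(sum(p_full.values()) - 1.0) < 1e-9
```

Storing 8 + 4 = 12 numbers instead of 32 is exactly the reduction described above.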

Slide 49

Independence

• The property of independence
• also called marginal independence or absolute independence
• notationally, in terms of propositions or random variables, it is:
• P(a ∧ b) = P(a) P(b), equivalently P(a|b) = P(a) or P(b|a) = P(b)
• P(X, Y) = P(X) P(Y)
• from our knowledge of the domain, we can simplify the full joint distribution, dividing
variables into independent subsets with separate distributions
• as an example, for the Dentistry-Weather domain:
• P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)

Figure 4. Two examples of factoring a large joint distribution into smaller distributions, using
absolute independence. (a) Weather and dental problems are independent. (b) Coin flips are
independent.

Slide 50

Independence

• Absolute independence
• while very powerful for simplifying probability
representation & inference, absolute independence is
unfortunately rare
• though, for example, for n independent coin tosses:
P(C_1, ..., C_n), the full joint distribution with 2^n
entries, becomes n single-variable
distributions P(C_i)
• this is an artificial example, and the converse is more
likely the case for real domains
• that is, within a large domain like dentistry there are
likely dozens of diseases & hundreds of symptoms, all
interrelated

Slide 52

Bayes’ Rule and Its Use

Slide 53

Bayes’ Rule and Its Use


• From the product rule, for propositions a & b: P(a ∧ b) = P(a|b) P(b)
• This expresses the probability of both a and b happening in terms of the conditional
probability P(a|b) and the probability of b.
• Bayes’ Rule
• Now, let’s also express P(a ∧ b) in another way, by switching the roles of a and b:
• P(a ∧ b) = P(b|a) P(a)
• Since P(a ∧ b) is the same regardless of the order, we can set these two expressions equal to each
other:
• P(a|b) P(b) = P(b|a) P(a)
• To derive Bayes’ Rule, we solve for P(a|b) by dividing both sides by P(b):
• P(a|b) = P(b|a) P(a) / P(b)
• This simple equation allows us to update the probability of an event based on new evidence.
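The derivation above can be sanity-checked numerically; the probabilities below are arbitrary made-up values:

```python
# Pick any P(b) and P(a|b), build P(a ∧ b) via the product rule,
# then recover P(a|b) again using Bayes' rule.
p_b = 0.3
p_a_given_b = 0.5
p_a_given_not_b = 0.2   # needed only to compute P(a) by marginalization

p_a_and_b = p_a_given_b * p_b                             # product rule
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)     # marginalize over b
p_b_given_a = p_a_and_b / p_a                             # product rule, other order

# Bayes' rule: P(a|b) = P(b|a) P(a) / P(b)
recovered = p_b_given_a * p_a / p_b
assert abs(recovered - p_a_given_b) < 1e-12
```

Whatever values are chosen, recovering P(a|b) via Bayes' rule reproduces the number we started from.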

Slide 54

Bayes’ Rule and Its Use Cont.


• in the general case of multivalued variables, in distribution form:
• P(Y|X) = P(X|Y) P(Y) / P(X)
• representing the set of equations, each for specific values of the variables
• & finally, a version indicating conditionalizing on background evidence e:
• P(Y|X, e) = P(X|Y, e) P(Y|e) / P(X|e)

Slide 55

Bayes’ Rule
• Bayes' rule
• is the basis of most AI systems of probabilistic inference
• it allows us to compute the single term P(b|a) in terms of three terms: P(a|b),
P(b), and P(a)
• finding diagnostic probability from causal probability:
• P(cause|effect) = P(effect|cause) P(cause) / P(effect)
• P(effect|cause) specifies the relationship in the causal direction
• P(cause|effect) describes the diagnostic direction
• in the medical domain, it is common to have conditional probabilities on
causal relationships
• P(symptoms | disease)

Slide 56

Bayes’ Rule
• Bayes’ rule: a medical example
• here's a medical domain example
• a patient presents with a stiff neck, a known symptom of the disease meningitis
• the physician "knows" the prior probabilities of stiff neck (P(s) = 0.01) & meningitis (P(m) =
0.00002, i.e. 1 in 50,000)
• in addition, the physician knows that 70% of patients with meningitis have a stiff neck: P(s|m)
= 0.7
• P(m|s) = 0.7 × 0.00002 / 0.01
• = 0.0014

Slide 57

Bayes’ Rule Example


• Bayes’ rule & the meningitis example
P(m|s) = P(s|m) P(m) / P(s)
= 0.7 × 0.00002 / 0.01
= 0.0014
• So we should expect only 1 in 700 patients with a stiff neck to have meningitis,
reflecting the much higher prior probability of stiff neck than of meningitis
• Note: normalization can be applied when using Bayes' Rule
• P(Y|X) = α P(X|Y) P(Y)
• where α is a normalization constant so entries in P(Y|X) sum to 1
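A minimal sketch of the meningitis computation, including the normalized form (P(s|¬m) is not given on the slide, so it is back-derived here from P(s) purely for illustration):

```python
p_s = 0.01          # prior probability of a stiff neck
p_m = 0.00002       # prior probability of meningitis (1 in 50,000)
p_s_given_m = 0.7   # 70% of meningitis patients have a stiff neck

# Bayes' rule: P(m|s) = P(s|m) P(m) / P(s)  ->  about 1 in 700
p_m_given_s = p_s_given_m * p_m / p_s
assert abs(p_m_given_s - 0.0014) < 1e-9

# Normalized form: P(M|s) = alpha * [P(s|m)P(m), P(s|~m)P(~m)].
# P(s|~m) is back-derived so the numbers stay consistent with P(s) = 0.01.
p_s_given_not_m = (p_s - p_s_given_m * p_m) / (1 - p_m)
unnorm = [p_s_given_m * p_m, p_s_given_not_m * (1 - p_m)]
alpha = 1 / sum(unnorm)
# normalization reproduces the same posterior without using P(s) directly
assert abs(alpha * unnorm[0] - p_m_given_s) < 1e-12
```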

Slide 58

Bayes’ Rule: n Evidence Variables


• Bayes’ rule & the dental diagnosis: scaling up
• for the combining of evidence from multiple sources/variables, how does use
of Bayes' Rule scale up, compared to using the FJD?
• the sample problem:
• what does the dentist conclude about a cavity when the patient has a
toothache & the probe catches in the sore tooth?
• Equation (16):
• P(Cavity | toothache ∧ catch) = α P(toothache ∧ catch | Cavity) P(Cavity)
• there's not an issue with just 2 sources, but if there are n evidence variables, then we have
2^n possible combinations of observed values & we need to know the
conditional probabilities for each (no better than needing the full joint
distribution)
Slide 59

Bayes’ Rule: n Evidence Variables


• Bayes' rule & the dental diagnosis: scaling up
• we return to the idea of independence
• in the example, Toothache & Catch are not absolutely independent, but are independent given either the
presence or absence of a cavity (each is caused by the cavity, but otherwise they are independent)
• expressing the conditional independence given Cavity we get
• Equation (17): P(toothache ∧ catch | Cavity) = P(toothache | Cavity) P(catch | Cavity)
• (16): P(Cavity | toothache ∧ catch) = α P(toothache ∧ catch | Cavity) P(Cavity)
• substituting (17) into (16) yields the following, reflecting the conditional independence of
Toothache and Catch
• P(Cavity | toothache ∧ catch) = α P(toothache | Cavity) P(catch | Cavity) P(Cavity)
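A small sketch of this combination of evidence (the conditional probabilities are assumed illustrative values, not taken from the slide):

```python
# Illustrative numbers: prior P(Cavity) and the per-symptom conditionals
# that conditional independence lets us store separately.
p_cavity = {True: 0.2, False: 0.8}
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch_given = {True: 0.9, False: 0.2}       # P(catch | Cavity)

# P(Cavity | toothache ∧ catch)
#   = α P(toothache | Cavity) P(catch | Cavity) P(Cavity)
unnorm = {c: p_toothache_given[c] * p_catch_given[c] * p_cavity[c]
          for c in (True, False)}
alpha = 1 / sum(unnorm.values())
posterior = {c: alpha * v for c, v in unnorm.items()}

# the two entries of the posterior distribution sum to 1
assert abs(sum(posterior.values()) - 1.0) < 1e-12
```

With these illustrative numbers the two observed symptoms push the posterior probability of a cavity to roughly 0.87, well above the 0.2 prior.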

Slide 60

Conditional Independence
• the general form of the conditional independence rule
• here are the most general form & the one for the dental diagnosis domain
• P(X, Y | Z) = P(X | Z) P(Y | Z)
• (Eq.19): P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
• conditional independence also allows decomposition
• for the dental problem, algebraically, given (Eq.19), we have
• P(Toothache, Catch, Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)

Slide 61

Conditional Independence
• implications of the conditional independence rule
• we decompose the original large table, which has 2^3 − 1 = 7 independent
entries, into 3 smaller tables
• 2 of the tables are of the form P(T|C) with 2 rows, each of which must sum to 1, so each row has 1
independent number (2 per table)
• 1 table with 1 row for the prior distribution P(C), having 1 more independent number
• for our Toothache, Catch, Cavity domain, we've gone from 7 to 5 independent
values in total, a small gain for a small problem
• but if there were n symptoms, all conditionally independent given Cavity, the size of the
resulting representation would be linear in n instead of exponential

Slide 62

Conditional Independence
• summary: conditional independence
• allows scaling up to real problems since the representational complexity can
go from exponential to linear
• is more often applicable than absolute independence assertions
• yields this net gain: the decomposition of large domains into weakly
connected subsets
• is illustrated in a prototypical way by the dental domain: one cause influences
multiple effects, which are conditionally independent, given that cause

Slide 63

Conditional Independence
• summary: conditional independence
• with multiple effects, which are conditionally independent given the cause,
the full joint distribution then is rewritten as
• P(Cause, Effect_1, ..., Effect_n) = P(Cause) Π_i P(Effect_i | Cause)
• this is called the naïve Bayes model
• it makes the simplifying assumption that all effects are conditionally
independent
• it is naïve in that it is applied to many problems even though the effect variables
are not precisely conditionally independent given the cause variable
• nevertheless, such systems often work well in practice
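A minimal naive Bayes sketch of the model above; storage is linear in the number of effects (one prior plus one conditional per effect), and all numbers are invented for illustration:

```python
def naive_bayes_posterior(prior, cond, observed):
    """P(Cause | observed effects) under the naive Bayes model.

    prior[c]    = P(Cause = c)
    cond[i][c]  = P(Effect_i = true | Cause = c)
    observed[i] = True/False observation of Effect_i
    """
    unnorm = {}
    for c, p in prior.items():
        # multiply in P(Effect_i | Cause) for each observed effect
        for i, obs in enumerate(observed):
            p *= cond[i][c] if obs else (1 - cond[i][c])
        unnorm[c] = p
    alpha = 1 / sum(unnorm.values())   # normalize so entries sum to 1
    return {c: alpha * p for c, p in unnorm.items()}

# two symptoms, both observed present (illustrative values)
prior = {True: 0.2, False: 0.8}
cond = [{True: 0.6, False: 0.1}, {True: 0.9, False: 0.2}]
post = naive_bayes_posterior(prior, cond, [True, True])
assert abs(sum(post.values()) - 1.0) < 1e-12
```

Adding an (n+1)-th symptom only adds one more 2-entry conditional table, which is the linear scaling claimed above.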

Slide 64

The Wumpus World Revisited

Slide 65

The Wumpus World Revisited


• recall the Wumpus World agent
• the agent explores the grid world to grab the gold while attempting to avoid being
eaten by the Wumpus or falling into a bottomless Pit
• we used propositional logic for representation & inference
• now we'll explore an example
• that uses probability in Wumpus World
• we'll simplify by restricting our WW hazards only to Pits
• recall that
1. the percept of a breeze in a square indicates a pit in a neighbouring square
2. the logical representation allowed some conclusions about whether a square was
safe but not a quantitative measure of risk if not absolutely safe
• the "is it safe" problem can be reformulated to use our new probability
tools

Slide 66

The Wumpus World Revisited


• the world
• incomplete information about the presence of Pits leads
to uncertainty, & the agent should choose the best next
move
• Figure 5 shows a situation in which each of the three
unvisited but reachable squares—[1,3], [2,2], and [3,1]—
might contain a pit
• Our aim is to calculate the probability that each of the
three squares contains a pit (for this example, we ignore
the Wumpus and the gold)

Figure 5 [1]. (a) After finding a breeze in both [1,2] and [2,1], the agent is
stuck—there is no safe place to explore. (b) Division of the squares into
Known, Frontier, and Other, for a query about [1,3].

Slide 67

The Wumpus World Revisited


• The relevant properties of the Wumpus world are that:
1. a pit causes breezes in all neighbouring squares
2. each square other than [1,1] contains a pit with probability 0.2
• The first step is to identify the set of random variables we need;
here are the random variables in the problem:
1. one Boolean variable P_i,j for each square, which is true iff [i,j] contains a
pit
2. one Boolean variable B_i,j per observed square, true iff [i,j] is breezy; we include these
variables only for the observed squares—in this case [1,1], [1,2], [2,1]—
so we include only B_1,1, B_1,2, B_2,1 in the probability model
• The next step is specifying the full joint distribution

Slide 68

Probabilities in Wumpus World


• We begin with the full joint distribution
• P(P_1,1, ..., P_4,4, B_1,1, B_1,2, B_2,1)
• Applying the product rule yields
• P(B_1,1, B_1,2, B_2,1 | P_1,1, ..., P_4,4) P(P_1,1, ..., P_4,4)
• 1st term: the conditional probability of a breeze configuration given a pit
configuration
• its values are 1 if the breezes are adjacent to the pits, 0 otherwise
• 2nd term: the prior probability of a pit configuration
• pits are placed randomly, independent of each other, with probability 0.2 for any square, so
• Equation 22: P(P_1,1, ..., P_4,4) = Π_i,j P(P_i,j)
• for a particular configuration with exactly n pits, the probability is 0.2^n × 0.8^(16−n)
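The product-form prior (Equation 22) can be sanity-checked by enumeration: over all 2^16 pit configurations of the 4×4 grid, the probabilities 0.2^n × 0.8^(16−n) must sum to 1.

```python
from itertools import product

P_PIT = 0.2
N_SQUARES = 16  # 4x4 grid

def prior(config):
    """Prior of one pit configuration (a tuple of 16 zeros/ones):
    pits are independent, so the prior is 0.2^n * 0.8^(16-n)."""
    n = sum(config)
    return P_PIT ** n * (1 - P_PIT) ** (N_SQUARES - n)

# enumerate all 2**16 = 65536 configurations and check normalization
total = sum(prior(cfg) for cfg in product((0, 1), repeat=N_SQUARES))
assert abs(total - 1.0) < 1e-9
```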

Slide 69

Probabilities in Wumpus World


• in the example, we have observed evidence
• a breeze or not in each visited square + no pit in any
visited square, abbreviated as b & known:
• b = ¬b_1,1 ∧ b_1,2 ∧ b_2,1
• known = ¬p_1,1 ∧ ¬p_1,2 ∧ ¬p_2,1
• an example query concerns the safety of other squares:
what's the probability of a pit at [1,3], given the evidence
so far? that is, P(P_1,3 | known, b)
• we could answer by summing over cells in the FJD

Slide 70

Probabilities in Wumpus World


• to use summation over the FJD
• let Unknown be the set of variables for squares other than Known & [1,3]
• so from (Equation 9) we have
P(P1,3 | known, b) = α Σunknown P(P1,3, unknown, known, b)
• that is, we can just sum over the entries in the Full Joint Distribution
• but with 12 unknown squares we have 2^12 = 4096 terms in the summation,
so the calculation is exponential in the number of squares
• so we'll need to simplify from insight about independence
• we note: not all unknown squares are equally relevant to the query

Slide 71

Probabilities in Wumpus World


• since summations over the FJD are exponential
• we need to simplify, given insight about independence
• to begin, we note that not all unknown squares are equally relevant to the
query
• first, some terminology about partitioning the pit variables
• frontier are those pit variables (besides the query variable) neighbouring the visited
squares
• other are the remaining pit variables
• with this partition, we see that the observed breezes are conditionally
independent of the other variables, given the known, frontier & query
variables

Slide 72

Probabilities in Wumpus World


• using conditional independence:
P(b | P1,3, known, unknown) = P(b | P1,3, known, frontier)
• note that the figures use Frontier to name the relevant squares neighbouring the visited
squares ([2,2] & [3,1])
• then we'll need to manipulate our query into a form where we can use this
• the query: P(P1,3 | known, b)
• the world: the 4×4 grid with visited squares [1,1], [1,2] & [2,1] (figure not reproduced)

Slide 73

Using Conditional Independence


• using the conditional independence simplification
• the query, from Eq. 3:
P(P1,3 | known, b) = α Σunknown P(P1,3, known, b, unknown)
• then by the product rule:
= α Σunknown P(b | P1,3, known, unknown) P(P1,3, known, unknown)
• then partitioning unknown into frontier & other:
= α Σfrontier Σother P(b | known, P1,3, frontier, other) P(P1,3, known, frontier, other)
• then using the conditional independence of b from other, given known, P1,3 &
frontier (& so dropping other from the first term):
= α Σfrontier Σother P(b | known, P1,3, frontier) P(P1,3, known, frontier, other)
• since the first term now does not depend on other, move the summation inward:
= α Σfrontier P(b | known, P1,3, frontier) Σother P(P1,3, known, frontier, other)

Slide 74

Using Conditional Independence


• Manipulating the query to get efficient computation
• we began with
P(P1,3 | known, b) = α Σunknown P(P1,3, known, b, unknown)
• so far, we have
= α Σfrontier P(b | known, P1,3, frontier) Σother P(P1,3, known, frontier, other)
• using independence as in (Equ. 22) to factor the prior term:
= α Σfrontier P(b | known, P1,3, frontier) Σother P(P1,3) P(known) P(frontier) P(other)
• then reorder the terms:
= α P(known) P(P1,3) Σfrontier P(b | known, P1,3, frontier) P(frontier) Σother P(other)
• fold P(known) into the normalizing constant, & use Σother P(other) = 1:
P(P1,3 | known, b) = α′ P(P1,3) Σfrontier P(b | known, P1,3, frontier) P(frontier)
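The final expression can be transcribed almost literally into code. A sketch under the same assumptions as before (pit prior 0.2, frontier = {[2,2], [3,1]}, breezes observed at [1,2] and [2,1]); `b_consistent` and `query_prob` are hypothetical names, not from the text:

```python
PIT = 0.2   # assumed pit prior per square

def b_consistent(p13, p22, p31):
    # P(b | known, P1,3, frontier) is 1 iff every observed breeze is explained
    breeze_12 = p13 or p22          # a pit in [1,3] or [2,2] explains the breeze at [1,2]
    breeze_21 = p22 or p31          # a pit in [2,2] or [3,1] explains the breeze at [2,1]
    return breeze_12 and breeze_21

def query_prob():
    # alpha' * P(P1,3) * sum over frontier of P(b | ...) * P(frontier)
    weight = {}
    for p13 in (True, False):
        s = sum((PIT if p22 else 1 - PIT) * (PIT if p31 else 1 - PIT)
                for p22 in (True, False) for p31 in (True, False)
                if b_consistent(p13, p22, p31))
        weight[p13] = (PIT if p13 else 1 - PIT) * s
    return weight[True] / (weight[True] + weight[False])   # normalization plays the role of alpha'

print(round(query_prob(), 2))   # ≈ 0.31
```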

Slide 75

Probabilities in Wumpus World


• using conditional independence & independence
• has yielded an expression with just 4 terms in the summation over the frontier variables,
eliminating the other squares:
P(P1,3 | known, b) = α′ P(P1,3) Σfrontier P(b | known, P1,3, frontier) P(frontier)
• the term P(b | known, P1,3, frontier) is 1 when the frontier is consistent with the breeze
observations, 0 otherwise
• so to get each value of P(P1,3 | known, b) we sum over the logical models for the frontier
variables that are consistent with the known facts
• this figure shows the models & the associated priors
Figure 6. Consistent models for the frontier variables P2,2 and P3,1, showing P(frontier) for each model: (a) three models with P1,3 = true showing two or three pits, and (b) two models with P1,3 = false showing one or two pits [1].
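The figure's model counts can be checked by enumerating the four frontier assignments; a sketch assuming a 0.2 pit prior and that each observed breeze ([1,2], [2,1]) must be explained by a pit in the query square [1,3] or the frontier squares [2,2]/[3,1]:

```python
PIT = 0.2   # assumed pit prior per square

# collect P(frontier) for every frontier model consistent with the breezes,
# grouped by the value of the query variable P1,3
consistent_models = {True: [], False: []}
for p13 in (True, False):
    for p22 in (True, False):
        for p31 in (True, False):
            # a model is consistent when both observed breezes are explained
            if (p13 or p22) and (p22 or p31):
                prior = (PIT if p22 else 1 - PIT) * (PIT if p31 else 1 - PIT)
                consistent_models[p13].append(prior)

for p13, priors in consistent_models.items():
    # three models with P1,3 = true, two with P1,3 = false, matching the figure
    print(f"P1,3={p13}: {len(priors)} models, P(frontier) = {[round(p, 2) for p in priors]}")
```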

Slide 76

Probabilities in Wumpus World


• using conditional independence & independence
• has yielded an expression with just 4 terms in the summation over the
frontier variables, eliminating the other squares
Figure 6. Consistent models for the frontier variables P2,2 and P3,1, showing P(frontier) for each model: (a) three models with P1,3 = true showing two or three pits, and (b) two models with P1,3 = false showing one or two pits [1].

Slide 77

Using Conditional Independence


• note that P1,3 & P3,1 are symmetric
• so by symmetry, [3,1] would contain a pit about 31% of the time:
P(P3,1 | known, b) = ⟨0.31, 0.69⟩
• & by a similar calculation, [2,2] can be shown to contain a pit with about 0.86
probability:
P(P2,2 | known, b) = ⟨0.86, 0.14⟩
• it is clear to the probabilistic agent where not to go next
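Both posteriors can be verified with one small enumeration over the three pit variables that matter ([1,3], [2,2], [3,1]); a sketch assuming the 0.2 pit prior, with `posterior_pit` a hypothetical helper name:

```python
from itertools import product

PIT = 0.2   # assumed pit prior per square

def posterior_pit(query):
    # enumerate the query square plus the squares that can explain the breezes
    squares = [(1, 3), (2, 2), (3, 1)]
    weight = {True: 0.0, False: 0.0}
    for bits in product((True, False), repeat=3):
        pits = dict(zip(squares, bits))
        breeze_12 = pits[(1, 3)] or pits[(2, 2)]   # breeze observed at [1,2]
        breeze_21 = pits[(2, 2)] or pits[(3, 1)]   # breeze observed at [2,1]
        if breeze_12 and breeze_21:                # keep only consistent models
            p = 1.0
            for sq in squares:
                p *= PIT if pits[sq] else 1 - PIT
            weight[pits[query]] += p
    return weight[True] / (weight[True] + weight[False])

print(round(posterior_pit((3, 1)), 2))   # ≈ 0.31, matching [1,3] by symmetry
print(round(posterior_pit((2, 2)), 2))   # ≈ 0.86
```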

Slide 78

Probability in Wumpus World


• the logical agent & the probabilistic agent
• strictly logical inference can only yield known safe / known unsafe / unknown
• the probabilistic agent knows which move is relatively safer, & which
relatively more dangerous
• for efficient probabilistic solutions we can use independence & conditional
independence among variables to simplify the summations involved
• fortunately, these often match our natural understanding of how the problem should be
decomposed

Slide 79

Reference
• [1] Russell, S., & Norvig, P. (2020). Artificial Intelligence: A Modern
Approach (4th Edition). Pearson.

Slide 80

Next Class
• Introduction to Machine Learning

