Module 5

The document discusses how agents can manage uncertainty through degrees of belief and probability, outlining key concepts such as probabilistic inference, conditional planning, and decision-making under uncertainty. It emphasizes the importance of probability in summarizing ignorance and laziness in decision-making processes, as well as the role of utility theory in representing preferences. Examples, including a medical diagnosis and travel planning, illustrate the complexities of acting under uncertainty and the necessity of probabilistic reasoning.


BAD402-Module 5

UNCERTAINTY

In which we see how an agent can tame uncertainty with degrees of belief.
Outline

1 Acting Under Uncertainty
2 Basics on Probability
3 Probabilistic Inference via Enumeration
4 Independence and Conditional Independence
5 Applying Bayes’ Rule
6 An Example: The Wumpus World Revisited
The real world: Things go wrong
Consider a plan for changing a tire after getting a flat, using operators RemoveTire(x), PutOnTire(x), InflateTire(x)

• Incomplete information
– Unknown preconditions, e.g., is the spare actually intact?
– Disjunctive effects, e.g., inflating a tire with a pump may cause the tire to inflate, or a slow hiss, or the tire may burst, or ...

• Incorrect information
– Current state incorrect, e.g., spare NOT intact
– Missing/incorrect postconditions (effects) in operators

• Qualification problem:
– we can never finish listing all the required preconditions and possible conditional outcomes of actions
Possible Solutions
• Conditional planning
– Plan to obtain information (observation actions)
– Subplan for each contingency, e.g.,
[Check(Tire1), IF Intact(Tire1) THEN [Inflate(Tire1)] ELSE [CallAAA]]
– Expensive because it plans for many unlikely cases

• Monitoring/Replanning
– Assume normal states, outcomes
– Check progress during execution, replan if necessary
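The contingent plan above can be sketched as straight-line code: an observation action followed by a branch. This is a minimal illustration; the function names are hypothetical stand-ins for the operators named on the slide, not a real planning library.

```python
# Minimal sketch of the contingent tire plan; check_intact and
# execute_contingent_plan are illustrative names, not from the slides.

def check_intact(tire):
    """Observation action Check(Tire1): reveals whether the tire is intact."""
    return tire["intact"]

def execute_contingent_plan(tire):
    # [Check(Tire1), IF Intact(Tire1) THEN [Inflate(Tire1)] ELSE [CallAAA]]
    if check_intact(tire):
        return "Inflate(Tire1)"
    return "CallAAA"

print(execute_contingent_plan({"intact": True}))   # Inflate(Tire1)
print(execute_contingent_plan({"intact": False}))  # CallAAA
```

Note the cost the slide points out: a full conditional plan needs one such branch per contingency, even for very unlikely ones.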
Dealing with Uncertainty Head-on: Probability
• Let action At = leave for airport t minutes before flight from Lindbergh Field
• Will At get me there on time?

Problems:
1. Partial observability (road state, other drivers' plans, etc.)
2. Noisy sensors (traffic reports)
3. Uncertainty in action outcomes (turn key, car doesn't start, etc.)
4. Immense complexity of modeling and predicting traffic

Hence a purely logical approach either
1) risks falsehood: "A90 will get me there on time," or
2) leads to conclusions that are too weak for decision making:
"A90 will get me there on time if there's no accident on I-5 and it doesn't rain and my tires remain intact, etc., etc."
(A1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport ...)
Probability
Probabilistic assertions summarize effects of
Ignorance: lack of relevant facts, initial conditions, etc.
Laziness: failure to enumerate exceptions, qualifications, etc.

Subjective or Bayesian probability:
Probabilities relate propositions to one's own state of knowledge, e.g.,
P(A90 succeeds | no reported accidents) = 0.97

These are NOT assertions about the world, but represent belief about whether the assertion is true.

Probabilities of propositions change with new evidence, e.g.,
P(A90 | no reported accidents, 5 a.m.) = 0.99

(Analogous to logical entailment status, i.e., does KB ⊨ α?)
Making decisions under uncertainty

• Suppose I believe the following:

P(A30 gets me there on time | ...) = 0.05
P(A60 gets me there on time | ...) = 0.70
P(A100 gets me there on time | ...) = 0.95
P(A1440 gets me there on time | ...) = 0.9999

• Which action to choose?

• Depends on my preferences for missing flight vs. airport cuisine, etc.

• Utility theory is used to represent and infer preferences

• Decision theory = utility theory + probability theory

Acting Under Uncertainty
• Agents often make decisions based on incomplete information
• partial observability
• nondeterministic actions
• Partial solution: maintain belief states representing the set of all possible world states the agent might be in, generating a contingency plan handling every possible eventuality
• Several drawbacks:
• must consider every possible explanation for the observation (even very unlikely ones) ⇒ impossibly complex belief states
• contingent plans handling every eventuality grow arbitrarily large; sometimes there is no plan that is guaranteed to achieve the goal
• Agent's knowledge cannot guarantee a successful outcome ...
• ... but can provide some degree of belief (likelihood)
Acting Under Uncertainty: Example (2)
A medical diagnosis
• Given the symptoms (toothache), infer the cause (cavity). How to encode this relation in logic?
• diagnostic rules:
• Toothache → Cavity (wrong)
• Toothache → (Cavity ∨ GumProblem ∨ Abscess ∨ ...) (too many possible causes, some very unlikely)
• causal rules:
• Cavity → Toothache (wrong)
• (Cavity ∧ ...) → Toothache (many possible (con)causes)
• Problems in specifying the correct logical rules:
• Complexity: too many possible antecedents or consequents
• Theoretical ignorance: no complete theory for the domain
• Practical ignorance: no complete knowledge of the patient
Summarizing Uncertainty
• Probability allows us to summarize the uncertainty due to
• laziness: failure to enumerate exceptions, qualifications, etc.
• ignorance: lack of relevant facts, initial conditions, etc.
• Probability can be derived from
• statistical data (ex: 80% of toothache patients so far had cavities)
• some knowledge (ex: 80% of toothache patients have cavities)
• a combination thereof
• Probability statements are made with respect to a state of knowledge (aka evidence), not with respect to the real world
• e.g., "The probability that the patient has a cavity, given that she has a toothache, is 0.8":
• P(HasCavity(patient) | hasToothAche(patient)) = 0.8
• Probabilities of propositions change with new evidence:
• "The probability that the patient has a cavity, given that she has a toothache and a history of gum disease, is 0.4":
• P(HasCavity(patient) | hasToothAche(patient) ∧ HistoryOfGum(patient)) = 0.4
Making Decisions Under Uncertainty
• Ex: Suppose I believe:
• P(A25 gets me there on time | ...) = 0.04
• P(A90 gets me there on time | ...) = 0.70
• P(A120 gets me there on time | ...) = 0.95
• P(A1440 gets me there on time | ...) = 0.9999
• Which action to choose?
• Depends on tradeoffs among preferences:
• missing flight vs. costs (airport cuisine, sleep overnight in airport)
• When there are conflicting goals the agent may express preferences among them by means of a utility function.
• Utilities are combined with probabilities in the general theory of rational decisions, aka decision theory:
• Decision theory = Probability theory + Utility theory
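The decision rule "Decision theory = Probability theory + Utility theory" can be sketched as maximizing expected utility. Only the probabilities below come from the slide; the utility numbers (value of catching the flight, cost of waiting at the airport) are made-up assumptions for illustration.

```python
# Sketch: pick the action with maximum expected utility.
# Probabilities are from the slide; utilities are assumed for illustration.

p_on_time = {"A25": 0.04, "A90": 0.70, "A120": 0.95, "A1440": 0.9999}
wait_cost = {"A25": 2.5, "A90": 9.0, "A120": 12.0, "A1440": 144.0}  # assumed

def expected_utility(action, flight_value=100.0):
    # EU(a) = P(on time | a) * U(catch flight) - cost of waiting at the airport
    p = p_on_time[action]
    return p * flight_value - wait_cost[action]

best = max(p_on_time, key=expected_utility)
print(best)   # A120
```

With these assumed utilities, A1440 loses despite its near-certain on-time probability, because the waiting cost dominates: exactly the "airport cuisine" tradeoff the slide describes.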
Outline

1 Acting Under Uncertainty
2 Basics on Probability
3 Probabilistic Inference via Enumeration
4 Independence and Conditional Independence
5 Applying Bayes’ Rule
6 An Example: The Wumpus World Revisited
Probabilities Basics: an AI-ish Introduction
• Probabilistic assertions: state how likely possible worlds are
• Sample space Ω: the set of all possible worlds
• ω ∈ Ω is a possible world (aka sample point or atomic event), ex: the dice roll (1,4)
• the possible worlds are mutually exclusive and exhaustive
• ex: the 36 possible outcomes of rolling two dice: (1,1), (1,2), ...
• A probability model (aka probability space) is a sample space with an assignment P(ω) for every ω ∈ Ω s.t.
• 0 ≤ P(ω) ≤ 1, for every ω ∈ Ω
• Σω∈Ω P(ω) = 1
• Ex: 1-die roll: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
• An event A is any subset of Ω, with P(A) = Σω∈A P(ω)
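The two-dice probability model can be written out directly: a sample space of 36 equiprobable worlds, with events as subsets. A minimal sketch:

```python
from fractions import Fraction
from itertools import product

# Sample space Ω: the 36 possible outcomes of rolling two dice,
# each assigned probability 1/36.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}
assert sum(P.values()) == 1          # Σ_{ω∈Ω} P(ω) = 1

# An event A is a subset of Ω; P(A) = Σ_{ω∈A} P(ω).
double = [w for w in omega if w[0] == w[1]]
print(sum(P[w] for w in double))     # 1/6
```

Using `Fraction` keeps the arithmetic exact, so the event probabilities come out as the same fractions used on the slides.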
Random Variables

• Factored representation of possible worlds: sets of ⟨variable, value⟩ pairs
• Variables in probability theory: random variables
• domain: the set of possible values a variable can take on
• ex: Die: {1, 2, 3, 4, 5, 6}, Weather: {sunny, rain, cloudy, snow}, Odd: {true, false}
• a r.v. can be seen as a function from sample points to the domain: ex: Die(ω), Weather(ω), ... ("(ω)" typically omitted)
• Probability distribution of a random variable X: gives the probabilities of all its possible values:
P(X = xi) =def Σω: X(ω)=xi P(ω)
ex: P(Odd = true) = P(1) + P(3) + P(5) = 1/6 + 1/6 + 1/6 = 1/2
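The definition P(X = xi) = Σω: X(ω)=xi P(ω) can be checked directly for the Odd variable on a single die roll, treating the random variable as a function from sample points to its domain:

```python
from fractions import Fraction

# One fair die: six equiprobable sample points.
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}

# A random variable is a function from sample points to its domain.
def Odd(w):
    return w % 2 == 1

def distribution(X):
    """P(X = x) = Σ_{ω : X(ω) = x} P(ω)"""
    d = {}
    for w in omega:
        d[X(w)] = d.get(X(w), Fraction(0)) + P[w]
    return d

print(distribution(Odd)[True])   # 1/2
```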
Propositions and Probabilities
• We think of a proposition a as the event A (set of sample points) where the proposition is true
• Odd is a propositional random variable of range {true, false}
• notation: a ⇐⇒ "A = true"
• Given Boolean random variables A and B:
• a: set of sample points where A(ω) = true
• ¬a: set of sample points where A(ω) = false
• a ∧ b: set of sample points where A(ω) = true, B(ω) = true
• =⇒ with Boolean random variables, sample points are PL models
• Proposition: disjunction of the sample points in which it is true
• ex: (a ∨ b) ≡ (¬a ∧ b) ∨ (a ∧ ¬b) ∨ (a ∧ b)
• =⇒ P(a ∨ b) = P(¬a ∧ b) + P(a ∧ ¬b) + P(a ∧ b)
Probability Distributions
Probability distribution: gives the probabilities of all the possible values of a random variable
ex: Weather: {sunny, rain, cloudy, snow}
P(Weather = sunny) = 0.6
P(Weather = rain) = 0.1
P(Weather = cloudy) = 0.29
P(Weather = snow) = 0.01
⇐⇒ P(Weather) = ⟨0.6, 0.1, 0.29, 0.01⟩ (normalized: their sum is 1)
Joint Probability Distribution for multiple variables: gives the probability of every sample point
ex: P(Weather, Cavity) =

                Weather =  sunny   rain   cloudy   snow
Cavity = true              0.144   0.02   0.016    0.02
Cavity = false             0.576   0.08   0.064    0.08

Every event is a sum of sample points, =⇒ its probability is determined by the joint distribution
Probability for Continuous Variables
Continuous probability distributions are expressed by density functions f(x) ≥ 0 s.t. ∫_{−∞}^{+∞} f(x) dx = 1
P(x ∈ [a, b]) = ∫_a^b f(x) dx
=⇒ P(x ∈ [val, val]) = 0, P(x ∈ [−∞, +∞]) = 1
ex: with a uniform density f(x) = 0.125: P(x ∈ [20, 22]) = ∫_{20}^{22} 0.125 dx = 0.25
Density: f(x) =def lim_{dx→0} P(X ∈ [x, x + dx])/dx
ex: f(20.1) = lim_{dx→0} P(X ∈ [20.1, 20.1 + dx])/dx = 0.125
note: f(v) ≠ P(x ∈ [v, v]) = 0
Conditional Probabilities
• Unconditional or prior probabilities refer to degrees of belief in propositions in the absence of any other information (evidence)
• ex: P(cavity) = 0.2, P(Total = 11) = 1/18, P(double) = 1/6
• Conditional or posterior probabilities refer to degrees of belief in proposition a given some evidence b: P(a|b)
• evidence: information already revealed
• ex: P(cavity|toothache) = 0.6: p. of a cavity given a toothache (assuming no other information is provided!)
• ex: P(Total = 11|die1 = 5) = 1/6: p. of total 11 given first die is 5
• =⇒ restricts the set of possible worlds to those where the first die is 5
• Note: P(a|... ∧ a) = 1, P(a|... ∧ ¬a) = 0
• ex: P(cavity|toothache ∧ cavity) = 1, P(cavity|toothache ∧ ¬cavity) = 0
• Less specific belief still valid after more evidence arrives
• ex: P(cavity) = 0.2 holds even if P(cavity|toothache) = 0.6
• New evidence may be irrelevant, allowing for simplification
• ex: P(cavity|toothache, 49ersWin) = P(cavity|toothache) = 0.6
Conditional Probabilities [cont.]
Conditional probability: P(a|b) =def P(a ∧ b)/P(b), s.t. P(b) > 0
ex: P(Total = 11|die1 = 5) = P(Total = 11 ∧ die1 = 5)/P(die1 = 5) = (1/6 · 1/6)/(1/6) = 1/6
observing b restricts the possible worlds to those where b is true
Product rule: P(a ∧ b) = P(a|b) · P(b) = P(b|a) · P(a)
for whole distributions: P(X, Y) = P(X|Y) · P(Y)
ex: P(Weather, Cavity) = P(Weather|Cavity)P(Cavity), that is:
P(sunny, cavity) = P(sunny|cavity)P(cavity)
...
P(snow, ¬cavity) = P(snow|¬cavity)P(¬cavity)
a 4 × 2 set of equations, not matrix multiplication!
Chain rule is derived by successive application of the product rule:
P(X1, ..., Xn)
= P(X1, ..., Xn−1)P(Xn|X1, ..., Xn−1)
= P(X1, ..., Xn−2)P(Xn−1|X1, ..., Xn−2)P(Xn|X1, ..., Xn−1)
= ...
= Πi=1..n P(Xi|X1, ..., Xi−1)
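The chain rule can be checked numerically on any joint distribution. This sketch builds an arbitrary normalized joint over three Boolean variables and verifies P(X1, X2, X3) = P(X1) P(X2|X1) P(X3|X1, X2) entry by entry:

```python
import random
from itertools import product

# An arbitrary joint distribution over three Boolean variables:
# random values, normalized so they sum to 1.
random.seed(0)
raw = {w: random.random() for w in product([0, 1], repeat=3)}
total = sum(raw.values())
P = {w: v / total for w, v in raw.items()}

def marginal(prefix):
    """P(X1=prefix[0], ..., Xk=prefix[k-1]), summing out the rest."""
    return sum(p for w, p in P.items() if w[:len(prefix)] == prefix)

for w in P:
    # P(X1,X2,X3) = P(X1) * P(X2|X1) * P(X3|X1,X2)
    chain = (marginal(w[:1])
             * marginal(w[:2]) / marginal(w[:1])
             * P[w] / marginal(w[:2]))
    assert abs(chain - P[w]) < 1e-12
print("chain rule holds on every entry")
```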
Logic vs. Probability

Logic                          Probability
a                              P(a) = 1
¬a                             P(a) = 0
a → b                          P(b|a) = 1
(a, a → b) ⊢ b                 P(a) = 1, P(b|a) = 1 =⇒ P(b) = 1
(a → b, b → c) ⊢ a → c         P(b|a) = 1, P(c|b) = 1 =⇒ P(c|a) = 1

Proof of P(b|a) = 1, P(c|b) = 1 =⇒ P(c|a) = 1:
P(b|a) = 1 =⇒ P(¬b, a) = P(¬b|a)P(a) = 0
P(c|b) = 1 =⇒ P(¬c, b) = P(¬c|b)P(b) = 0
P(¬c, a) = P(¬c, a, b) + P(¬c, a, ¬b) ≤ P(¬c, b) + P(¬b, a) = 0
P(¬c|a) = P(¬c, a)/P(a) = 0
P(c|a) = 1 − P(¬c|a) = 1
Outline

1 Acting Under Uncertainty
2 Basics on Probability
3 Probabilistic Inference via Enumeration
4 Independence and Conditional Independence
5 Applying Bayes’ Rule
6 An Example: The Wumpus World Revisited
Probabilistic Inference via Enumeration

Basic Ideas
Start with the joint distribution P(Toothache, Catch, Cavity)
For any proposition ϕ, sum the atomic events where ϕ is true: P(ϕ) = Σω: ω⊨ϕ P(ω)
Probabilistic Inference via Enumeration: Example
Example: Generic Inference
Start with the joint distribution P(Toothache, Catch, Cavity)
For any proposition ϕ, sum the atomic events where ϕ is true: P(ϕ) = Σω: ω⊨ϕ P(ω)
Ex: P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
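Inference by enumeration can be implemented in a few lines. The table below contains the full joint P(Toothache, Catch, Cavity) whose entries the slides' examples add up (the eight values sum to 1):

```python
# Full joint distribution P(Toothache, Catch, Cavity);
# keys are (toothache, catch, cavity), values from the slides' examples.
JOINT = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def P(phi):
    """P(φ) = Σ_{ω ⊨ φ} P(ω): sum the atomic events where φ is true."""
    return sum(p for w, p in JOINT.items() if phi(*w))

print(round(P(lambda t, c, cav: cav or t), 2))   # P(cavity ∨ toothache) = 0.28
```

Representing a proposition as a predicate over worlds makes any query a one-liner, at the cost of touching all 2^n entries.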



Marginalization

Start with the joint distribution P(Toothache, Catch, Cavity)

Marginalization (aka summing out): sum up the probabilities for each possible value of the other variables:
P(Y) = Σz∈Z P(Y, z)
Ex: P(Toothache) = Σz∈{Catch,Cavity} P(Toothache, z)
Conditioning: variant of marginalization, involving conditional probabilities instead of joint probabilities (using the product rule):
P(Y) = Σz∈Z P(Y|z)P(z)
Ex: P(Toothache) = Σz∈{Catch,Cavity} P(Toothache|z)P(z)
Marginalization: Example
Start with the joint distribution P(Toothache, Catch, Cavity)
Marginalization (aka summing out): sum up the probabilities for each possible value of the other variables:
P(Y) = Σz∈Z P(Y, z)
Ex: P(Toothache) = Σz∈{Catch,Cavity} P(Toothache, z)
P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
P(¬toothache) = 1 − P(toothache) = 1 − 0.2 = 0.8
=⇒ P(Toothache) = ⟨0.2, 0.8⟩
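Summing out is a direct loop over the joint table. A minimal sketch, using the same dentist joint (keys are (toothache, catch, cavity)):

```python
# Marginalization: P(Toothache) = Σ_z P(Toothache, z),
# summing out Catch and Cavity from the slides' joint table.
JOINT = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def marginal(var_index):
    """Sum out every variable except the one at var_index."""
    d = {True: 0.0, False: 0.0}
    for w, p in JOINT.items():
        d[w[var_index]] += p
    return d

print({k: round(v, 3) for k, v in marginal(0).items()})
# {True: 0.2, False: 0.8} = P(Toothache)
```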
Conditional Probability via Enumeration: Example
Start with the joint distribution P(Toothache, Catch, Cavity)
Conditional probability:
Ex: P(¬cavity|toothache) = P(¬cavity ∧ toothache)/P(toothache)
= (0.016 + 0.064)/(0.108 + 0.012 + 0.016 + 0.064) = 0.4
Ex: P(cavity|toothache) = P(cavity ∧ toothache)/P(toothache) = ... = 0.6
Normalization
Let X be all the variables. Typically, we want P(Y|E = e): the conditional joint distribution of the query variables Y given specific values e for the evidence variables E
Let the hidden variables be H =def X \ (Y ∪ E)
The summation of joint entries is done by summing out the hidden variables:
P(Y|E = e) = αP(Y, E = e) = α Σh∈H P(Y, E = e, H = h)
where α =def 1/P(E = e) (different α's for different values of e)
=⇒ it is easy to compute α by normalization
note: the terms in the summation are joint entries, because Y, E, H together exhaust the set of random variables X
Idea: compute the whole distribution on the query variable by:
fixing evidence variables and summing over hidden variables
normalizing the final distribution, so that Σ ... = 1
Complexity: O(2^n), n number of propositions =⇒ impractical for large n's
Normalization: Example
α =def 1/P(toothache) can be viewed as a normalization constant
Idea: compute the whole distribution on the query variable by:
fixing evidence variables and summing over hidden variables
normalizing the final distribution, so that Σ ... = 1
Ex:
P(Cavity|toothache) = αP(Cavity ∧ toothache)
= α[P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α[⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
= α⟨0.12, 0.08⟩ = (normalization) = ⟨0.6, 0.4⟩ [α = 5]
P(Cavity|¬toothache) = ... = α⟨0.08, 0.72⟩ = ⟨0.1, 0.9⟩ [α = 1.25]
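The full query-with-normalization pattern, P(Y|E = e) = α Σh P(Y, E = e, H = h), can be sketched for this example: Cavity is the query variable, Toothache the evidence, and Catch the hidden variable that gets summed out.

```python
# P(Cavity | Toothache=t) = α Σ_catch P(Cavity, t, catch),
# with α fixed by normalization. Keys: (toothache, catch, cavity).
JOINT = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def query_cavity(toothache):
    """Distribution over Cavity given the toothache evidence value."""
    unnorm = {}
    for cavity in (True, False):
        unnorm[cavity] = sum(JOINT[(toothache, catch, cavity)]
                             for catch in (True, False))
    alpha = 1 / sum(unnorm.values())          # α = 1/P(toothache)
    return {cav: alpha * p for cav, p in unnorm.items()}

print({k: round(v, 1) for k, v in query_cavity(True).items()})
# {True: 0.6, False: 0.4}, with α = 1/0.2 = 5
```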
Outline

1 Acting Under Uncertainty
2 Basics on Probability
3 Probabilistic Inference via Enumeration
4 Independence and Conditional Independence
5 Applying Bayes’ Rule
6 An Example: The Wumpus World Revisited
Independence
Variables X and Y are independent iff P(X, Y) = P(X)P(Y)
(or equivalently, iff P(X|Y) = P(X) or P(Y|X) = P(Y))
ex: P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity)P(Weather)
=⇒ e.g. P(toothache, catch, cavity, cloudy) = P(toothache, catch, cavity)P(cloudy)
typically based on domain knowledge
May drastically reduce the number of entries and computation
=⇒ ex: 32-element table decomposed into one 8-element and one 4-element table
Unfortunately, absolute independence is quite rare
Conditional Independence

Variables X and Y are conditionally independent given Z iff P(X, Y|Z) = P(X|Z)P(Y|Z)
(or equivalently, iff P(X|Y, Z) = P(X|Z) or P(Y|X, Z) = P(Y|Z))
Consider P(Toothache, Cavity, Catch)
if I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache: P(catch|toothache, cavity) = P(catch|cavity)
the same independence holds if I haven't got a cavity:
P(catch|toothache, ¬cavity) = P(catch|¬cavity)
=⇒ Catch is conditionally independent of Toothache given Cavity:
P(Catch|Toothache, Cavity) = P(Catch|Cavity)
or, equivalently: P(Toothache|Catch, Cavity) = P(Toothache|Cavity), or
P(Toothache, Catch|Cavity) = P(Toothache|Cavity)P(Catch|Cavity)
Hint: Toothache and Catch are two (mutually independent) effects of the same cause Cavity
Conditional Independence [cont.]
In many cases, the use of conditional independence reduces the size of the representation of the joint distribution dramatically (even from exponential to linear!)
Ex:
P(Toothache, Catch, Cavity)
= P(Toothache|Catch, Cavity)P(Catch, Cavity)
= P(Toothache|Catch, Cavity)P(Catch|Cavity)P(Cavity)
= P(Toothache|Cavity)P(Catch|Cavity)P(Cavity)
⇒ passes from 7 to 2+2+1 = 5 independent numbers
P(Toothache, Catch, Cavity) contains 7 independent entries (the 8th can be obtained as 1 − Σ ...)
P(Toothache|Cavity), P(Catch|Cavity) contain 2 independent entries each (2 × 2 matrices, each row sums to 1)
P(Cavity) contains 1 independent entry
General case: if one cause has n independent effects, the representation needs 2n + 1 independent entries
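The claimed conditional independence can be verified numerically on the dentist joint: P(t, c|cav) should equal P(t|cav) · P(c|cav) for every combination of values. A sketch:

```python
# Check Toothache ⊥ Catch | Cavity on the slides' joint table.
# Keys are (toothache, catch, cavity).
JOINT = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def P(pred):
    return sum(p for w, p in JOINT.items() if pred(*w))

for cav in (True, False):
    p_cav = P(lambda t, c, v: v == cav)
    for t in (True, False):
        for c in (True, False):
            joint = P(lambda t_, c_, v: (t_, c_, v) == (t, c, cav)) / p_cav
            prod = (P(lambda t_, c_, v: t_ == t and v == cav) / p_cav) * \
                   (P(lambda t_, c_, v: c_ == c and v == cav) / p_cav)
            assert abs(joint - prod) < 1e-12
print("Toothache and Catch are conditionally independent given Cavity")
```

The check passes for all eight combinations, which is exactly why the joint factors as P(Toothache|Cavity)P(Catch|Cavity)P(Cavity).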
Exercise

Consider the joint probability distribution described in the table in the previous section (slide 20 onwards): P(Toothache, Catch, Cavity)
Consider the example in the previous slide:
P(Toothache, Catch, Cavity)
= P(Toothache|Catch, Cavity)P(Catch, Cavity)
= P(Toothache|Catch, Cavity)P(Catch|Cavity)P(Cavity)
= P(Toothache|Cavity)P(Catch|Cavity)P(Cavity)
Compute separately the distributions P(Toothache|Catch, Cavity), P(Catch|Cavity), P(Cavity), P(Toothache|Cavity).
Recompute P(Toothache, Catch, Cavity) in two ways:
P(Toothache|Catch, Cavity)P(Catch|Cavity)P(Cavity)
P(Toothache|Cavity)P(Catch|Cavity)P(Cavity)
and compare the result with P(Toothache, Catch, Cavity)
Outline

1 Acting Under Uncertainty
2 Basics on Probability
3 Probabilistic Inference via Enumeration
4 Independence and Conditional Independence
5 Applying Bayes’ Rule
6 An Example: The Wumpus World Revisited
Bayes’ Rule

Bayes’ rule/theorem/law:
P(a|b) = P(a ∧ b)/P(b) = P(b|a)P(a)/P(b)
In distribution form:
P(Y|X) = P(X|Y)P(Y)/P(X) = αP(X|Y)P(Y)
α =def 1/P(X): normalization constant to make P(Y|X) entries sum to 1 (different α's for different values of X)
A version conditionalized on some background evidence e:
P(Y|X, e) = P(X|Y, e)P(Y|e)/P(X|e)
Using Bayes’ Rule: The Simple Case
Used to assess diagnostic probability from causal probability:
P(cause|effect) = P(effect|cause)P(cause)/P(effect)
P(cause|effect) goes from effect to cause (diagnostic direction)
P(effect|cause) goes from cause to effect (causal direction)

Example
An expert doctor is likely to have causal knowledge ... P(symptoms|disease) (i.e., P(effect|cause))
... and needs to produce diagnostic knowledge P(disease|symptoms) (i.e., P(cause|effect))
Ex: let m be meningitis, s be stiff neck
P(m) = 1/50000, P(s) = 0.01 (prior knowledge, from statistics)
"meningitis causes the patient a stiff neck in 70% of cases": P(s|m) = 0.7 (doctor's experience)
=⇒ P(m|s) = P(s|m)P(m)/P(s) = (0.7 · 1/50000)/0.01 = 0.0014
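The meningitis computation is a one-line application of Bayes' rule, using exactly the numbers from the slide:

```python
# Diagnostic probability from causal probability: P(m|s) = P(s|m) P(m) / P(s)
p_m = 1 / 50000      # prior on meningitis (from statistics)
p_s = 0.01           # prior on stiff neck (from statistics)
p_s_given_m = 0.7    # causal knowledge (doctor's experience)

p_m_given_s = p_s_given_m * p_m / p_s
print(round(p_m_given_s, 4))   # 0.0014
```

Even though meningitis almost always causes a stiff neck, the tiny prior keeps the posterior small: the standard lesson of the diagnostic direction.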
Using Bayes’ Rule: Combining Evidence
A naive Bayes model is a probability model that assumes the effects are conditionally independent, given the cause
=⇒ total number of parameters is linear in n
ex: P(Cavity, Toothache, Catch) = P(Cavity)P(Toothache|Cavity)P(Catch|Cavity)
Q: How can we compute P(Cause|Effect1, ..., Effectk)?
ex: P(Cavity|toothache ∧ catch)?
Using Bayes’ Rule: Combining Evidence [cont.]

Q: How can we compute P(Cause|Effect1, ..., Effectk)?
ex: P(Cavity|toothache ∧ catch)?
A: Apply Bayes’ Rule:
P(Cavity|toothache ∧ catch)
= P(toothache ∧ catch|Cavity)P(Cavity)/P(toothache ∧ catch)
= αP(toothache ∧ catch|Cavity)P(Cavity)
= αP(toothache|Cavity)P(catch|Cavity)P(Cavity)
α =def 1/P(toothache ∧ catch), not computed explicitly
General case: P(Cause|Effect1, ..., Effectn) = αP(Cause) Πi P(Effecti|Cause)
α =def 1/P(Effect1, ..., Effectn), not computed explicitly (one α value for every value of Effect1, ..., Effectn)
=⇒ reduces from 2^(n+1) − 1 to 2n + 1 independent entries
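The naive Bayes combination for this example can be sketched numerically. The conditional probabilities below are the ones derivable from the dentist joint table used earlier (e.g., P(toothache|cavity) = 0.12/0.2 = 0.6):

```python
# P(Cavity | toothache ∧ catch) = α P(toothache|Cavity) P(catch|Cavity) P(Cavity)
p_cavity = {True: 0.2, False: 0.8}
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch_given = {True: 0.9, False: 0.2}       # P(catch | Cavity)

unnorm = {cav: p_toothache_given[cav] * p_catch_given[cav] * p_cavity[cav]
          for cav in (True, False)}
alpha = 1 / sum(unnorm.values())              # α fixed by normalization
posterior = {cav: alpha * p for cav, p in unnorm.items()}
print(round(posterior[True], 3))              # 0.871
```

The same answer falls out of full enumeration over the joint, 0.108/(0.108 + 0.016), confirming that the conditional independence assumption is exact for this table.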
Outline

1 Acting Under Uncertainty
2 Basics on Probability
3 Probabilistic Inference via Enumeration
4 Independence and Conditional Independence
5 Applying Bayes’ Rule
6 An Example: The Wumpus World Revisited
An Example: The Wumpus World
A probability model of the Wumpus World
Consider again the Wumpus World (restricted to pit detection)
Evidence: no pit in (1,1), (1,2), (2,1); breezy in (1,2), (2,1)
Q. Given the evidence, what is the probability of having a pit in (1,3), (2,2) or (3,1)?
Two groups of variables:
Pij = true iff [i,j] contains a pit ("causes")
Bij = true iff [i,j] is breezy ("effects", consider only B1,1, B1,2, B2,1)
Joint distribution: P(P1,1, ..., P4,4, B1,1, B1,2, B2,1)
Known facts (evidence):
b∗ =def ¬b1,1 ∧ b1,2 ∧ b2,1
p∗ =def ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1
Queries: P(P1,3|b∗, p∗)? P(P2,2|b∗, p∗)? (P(P3,1|b∗, p∗) is analogous to P(P1,3|b∗, p∗))
An Example: The Wumpus World [cont.]

Specifying the probability model

Apply the product rule to the joint distribution:
P(P1,1, ..., P4,4, B1,1, B1,2, B2,1) = P(B1,1, B1,2, B2,1|P1,1, ..., P4,4) P(P1,1, ..., P4,4)
P(B1,1, B1,2, B2,1|P1,1, ..., P4,4): 1 if the breezes are consistent with the pits (a square is breezy iff a pit is adjacent), 0 otherwise
P(P1,1, ..., P4,4): pits are placed randomly, except in (1,1):
P(Pi,j = true) = 0.2 if (i,j) ≠ (1,1), 0 otherwise
ex: P(p1,1, ..., p4,4) = 0.2^3 · 0.8^(15−3) ≈ 0.00055 if there are exactly 3 pits
An Example: The Wumpus World [cont.]
Inference by enumeration
Case P1,3:
General form of query: P(Y|E = e) = αP(Y, E = e) = α Σh P(Y, E = e, H = h)
Y: query vars; E, e: evidence vars/values; H, h: hidden vars/values
Our case: P(P1,3|p∗, b∗), s.t. the evidence is
b∗ =def ¬b1,1 ∧ b1,2 ∧ b2,1
p∗ =def ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1
Sum over hidden variables:
P(P1,3|p∗, b∗) = α Σunknown P(P1,3, p∗, b∗, unknown)
unknown are all Pij's s.t. (i,j) ∉ {(1,1), (1,2), (2,1), (1,3)}
2^(16−4) = 4096 terms of the sum!
Grows exponentially in the number of hidden variables H! Inefficient
An Example: The Wumpus World [cont.]
Using conditional independence
Basic insight: given the fringe squares (see below), b∗ is conditionally independent of the other hidden squares
Unknown =def Fringe ∪ Other
P(b∗|p∗, P1,3, Unknown) = P(b∗|p∗, P1,3, Fringe, Other) = P(b∗|p∗, P1,3, Fringe)
Next: manipulate the query into a form where this equation can be used
An Example: The Wumpus World [cont.]
Derivation steps (one per slide):
1. P(p∗, b∗) is a scalar; use it as a normalization constant
2. Sum over the unknowns
3. Use the product rule
4. Separate unknown into fringe and other
5. b∗ is conditionally independent of other given fringe
6. Move P(b∗|p∗, P1,3, fringe) outward
7. All of the pit locations are independent
8. Move P(p∗), P(P1,3), and P(fringe) outward
9. Remove Σother P(other) because it equals 1
10. P(p∗) is a scalar, so make it part of the normalization constant
An Example: The Wumpus World [cont.]
We have obtained: P(P1,3|p∗, b∗) = α′ P(P1,3) Σfringe P(b∗|p∗, P1,3, fringe)P(fringe)
We know that P(P1,3) = ⟨0.2, 0.8⟩
We can compute the normalization coefficient α′ afterwards
Σfringe P(b∗|p∗, P1,3, fringe)P(fringe): only 4 possible fringes
Start by rewriting as two separate equations:
P(p1,3|p∗, b∗) = α′ P(p1,3) Σfringe P(b∗|p∗, p1,3, fringe)P(fringe)
P(¬p1,3|p∗, b∗) = α′ P(¬p1,3) Σfringe P(b∗|p∗, ¬p1,3, fringe)P(fringe)
An Example: The Wumpus World [cont.]
Start by rewriting as two separate equations:
P(p1,3|p∗, b∗) = α′ P(p1,3) Σfringe P(b∗|p∗, p1,3, fringe)P(fringe)
P(¬p1,3|p∗, b∗) = α′ P(¬p1,3) Σfringe P(b∗|p∗, ¬p1,3, fringe)P(fringe)
For each of them, P(b∗|...) is 1 if the breezes occur, 0 otherwise:
Σfringe P(b∗|p∗, p1,3, fringe)P(fringe) = 1 · 0.04 + 1 · 0.16 + 1 · 0.16 + 0 · 0.64 = 0.36
Σfringe P(b∗|p∗, ¬p1,3, fringe)P(fringe) = 1 · 0.04 + 1 · 0.16 + 0 · 0.16 + 0 · 0.64 = 0.2
P(P1,3|p∗, b∗) = α′ P(P1,3) Σfringe P(b∗|p∗, P1,3, fringe)P(fringe)
= α′ ⟨0.2, 0.8⟩⟨0.36, 0.2⟩ = α′ ⟨0.072, 0.16⟩
= (normalization, s.t. α′ ≈ 4.31) ≈ ⟨0.31, 0.69⟩
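The same answer can be reproduced by brute-force enumeration over all unknown pit squares (no fringe trick needed at this grid size). This sketch follows the slides' model: a 4×4 grid, pit probability 0.2 everywhere except (1,1), and a square is breezy iff some adjacent square contains a pit.

```python
from itertools import product

# Brute-force enumeration of P(P1,3 | p*, b*) over unknown pit squares.
SQUARES = [(i, j) for i in range(1, 5) for j in range(1, 5)]
KNOWN_NO_PIT = {(1, 1), (1, 2), (2, 1)}                  # evidence p*
UNKNOWN = [s for s in SQUARES if s not in KNOWN_NO_PIT]  # includes (1,3)

def adjacent(i, j):
    return [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 1 <= i + di <= 4 and 1 <= j + dj <= 4]

def consistent(pits):
    """Evidence b*: breeze in (1,2) and (2,1), no breeze in (1,1)."""
    def breezy(sq):
        return any(n in pits for n in adjacent(*sq))
    return breezy((1, 2)) and breezy((2, 1)) and not breezy((1, 1))

def prior(pits):
    """Pits are placed independently with probability 0.2, except (1,1)."""
    p = 1.0
    for s in UNKNOWN:
        p *= 0.2 if s in pits else 0.8
    return p

num = den = 0.0
for assignment in product([False, True], repeat=len(UNKNOWN)):
    pits = {s for s, has_pit in zip(UNKNOWN, assignment) if has_pit}
    if consistent(pits):
        w = prior(pits)
        den += w
        if (1, 3) in pits:
            num += w

print(round(num / den, 2))   # 0.31, matching the fringe-based derivation
```

The squares outside the fringe marginalize out automatically, which is exactly why the conditional-independence derivation above reaches the same ⟨0.31, 0.69⟩ with only 4 terms instead of 2^13.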
Exercise

Compute P(P2,2|p∗, b∗) in the same way.
