
Fundamentals of Artificial Intelligence

Chapter 13: Quantifying Uncertainty

Roberto Sebastiani
DISI, Università di Trento, Italy – [email protected]
http://disi.unitn.it/rseba/DIDATTICA/fai_2020/

Teaching assistant: Mauro Dragoni – [email protected]


http://www.maurodragoni.com/teaching/fai/

M.S. Course “Artificial Intelligence Systems”, academic year 2020-2021


Last update: Thursday 17th December, 2020, 10:28

Copyright notice: Most examples and images displayed in the slides of this course are taken from
[Russell & Norvig, “Artificial Intelligence, a Modern Approach”, 3rd ed., Pearson],
including explicitly figures from the above-mentioned book; their copyright is held by the authors.
Some other material (text, figures, examples) is authored by (in alphabetical order):
Pieter Abbeel, Bonnie J. Dorr, Anca Dragan, Dan Klein, Nikita Kitaev, Tom Lenaerts, Michela Milano, Dana Nau, Maria
Simi, who hold its copyright. These slides cannot be displayed in public without the permission of the author.
1 / 44
Outline

1 Acting Under Uncertainty

2 Basics on Probability

3 Probabilistic Inference via Enumeration

4 Independence and Conditional Independence

5 Applying Bayes’ Rule

6 An Example: The Wumpus World Revisited

2 / 44
Acting Under Uncertainty
Agents often make decisions based on incomplete information
partial observability
nondeterministic actions
Partial solution (see previous chapters): maintain belief states
represent the set of all possible world states the agent might be in
generating a contingency plan handling every possible eventuality
Several drawbacks:
must consider every possible explanation for the observation
(even very unlikely ones) =⇒ impossibly complex belief states
contingent plans handling every eventuality grow arbitrarily large
sometimes there is no plan that is guaranteed to achieve the goal
Agent’s knowledge cannot guarantee a successful outcome ...
... but can provide some degree of belief (likelihood) on it
A rational decision depends on both the relative importance of
(sub)goals and the likelihood that they will be achieved
Probability theory offers a clean way to quantify likelihood
4 / 44
Acting Under Uncertainty: Example
Automated taxi to Airport
Goal: deliver a passenger to the airport on time
Action A_t: leave for the airport t minutes before the flight
How can we be sure that A_90 will succeed?
Too many sources of uncertainty:
partial observability (ex: road state, other drivers’ plans, etc.)
uncertainty in action outcome (ex: flat tire, etc.)
noisy sensors (ex: unreliable traffic reports)
complexity of modelling and predicting traffic
=⇒ With a purely logical approach it is difficult to anticipate everything
that can go wrong:
risks falsehood: “A_25 will get me there on time”, or
leads to conclusions that are too weak for decision making:
“A_25 will get me there on time if there’s no accident on the bridge,
and it doesn’t rain, and my tires remain intact, and ...”
Over-cautious choices are not rational solutions either
ex: A_1440 causes staying overnight at the airport
5 / 44
Acting Under Uncertainty: Example (2)

A medical diagnosis
Given the symptoms (toothache) infer the cause (cavity)
How to encode this relation in logic?
diagnostic rules:
Toothache → Cavity (wrong)
Toothache → (Cavity ∨ GumProblem ∨ Abscess ∨ ...)
(too many possible causes, some very unlikely)
causal rules:
Cavity → Toothache (wrong)
(Cavity ∧ ...) → Toothache (many possible (con)causes)
Problems in specifying the correct logical rules:
Complexity: too many possible antecedents or consequents
Theoretical ignorance: no complete theory for the domain
Practical ignorance: no complete knowledge of the patient

6 / 44
Summarizing Uncertainty
Probability allows us to summarize the uncertainty due to:
laziness: failure to enumerate exceptions, qualifications, etc.
ignorance: lack of relevant facts, initial conditions, etc.
Probability can be derived from
statistical data (ex: 80% of toothache patients so far had cavities)
general knowledge (ex: 80% of toothache patients have cavities)
or a combination thereof
Probability statements are made with respect to a state of
knowledge (aka evidence), not with respect to the real world
e.g., “The probability that the patient has a cavity, given that she
has a toothache, is 0.8”:
P(HasCavity (patient) | hasToothAche(patient)) = 0.8
Probabilities of propositions change with new evidence:
“The probability that the patient has a cavity, given that she has a
toothache and a history of gum disease, is 0.4”:
P(HasCavity (patient)
| hasToothAche(patient) ∧ HistoryOfGum(patient)) = 0.4
7 / 44
Making Decisions Under Uncertainty
Ex: Suppose I believe:
P(A_25 gets me there on time | ...) = 0.04
P(A_90 gets me there on time | ...) = 0.70
P(A_120 gets me there on time | ...) = 0.95
P(A_1440 gets me there on time | ...) = 0.9999
Which action to choose?
=⇒ Depends on tradeoffs among preferences:
missing the flight vs. costs (airport cuisine, sleeping overnight at the airport)
When there are conflicting goals the agent may express
preferences among them by means of a utility function.
Utilities are combined with probabilities in the general theory of
rational decisions, aka decision theory:
Decision theory = Probability theory + Utility theory
Maximum Expected Utility (MEU): an agent is rational if and only
if it chooses the action that yields the maximum expected utility,
averaged over all the possible outcomes of the action.
8 / 44
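A minimal Python sketch of the MEU rule for this example (not from the slides): the success probabilities are the ones above, while the utility numbers are hypothetical values invented purely for illustration.

  # MEU sketch: pick the action maximizing expected utility.
  p_on_time = {"A_25": 0.04, "A_90": 0.70, "A_120": 0.95, "A_1440": 0.9999}
  U_CATCH, U_MISS = 1000, 0              # assumed utilities of the two outcomes
  wait_cost = {"A_25": 0, "A_90": -10,   # assumed disutility of waiting
               "A_120": -20, "A_1440": -500}

  def expected_utility(a):
      p = p_on_time[a]
      return p * U_CATCH + (1 - p) * U_MISS + wait_cost[a]

  best = max(p_on_time, key=expected_utility)
  print(best)  # "A_120" under these assumed utilities

Under these made-up numbers the rational choice is A_120: A_1440 almost surely catches the flight, but its waiting cost outweighs the gain.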
Outline

1 Acting Under Uncertainty

2 Basics on Probability

3 Probabilistic Inference via Enumeration

4 Independence and Conditional Independence

5 Applying Bayes’ Rule

6 An Example: The Wumpus World Revisited

9 / 44
Probabilities Basics: an AI-sh Introduction

Probabilistic assertions: state how likely possible worlds are


Sample space Ω: the set of all possible worlds
ω ∈ Ω is a possible world (aka sample point or atomic event)
ex: the dice roll (1,4)
the possible worlds are mutually exclusive and exhaustive
ex: the 36 possible outcomes of rolling two dice: (1,1), (1,2), ...
A probability model (aka probability space) is a sample space
with an assignment P(ω) for every ω ∈ Ω s.t.
0 ≤ P(ω) ≤ 1, for every ω ∈ Ω
Σ_{ω∈Ω} P(ω) = 1
Ex: 1-die roll: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
An event A is any subset of Ω, with P(A) = Σ_{ω∈A} P(ω)
events can be described by propositions in some formal language
ex: P(Total = 11) = P(5, 6) + P(6, 5) = 1/36 + 1/36 = 1/18
ex: P(doubles) = P(1, 1) + P(2, 2) + ... + P(6, 6) = 6/36 = 1/6

10 / 44
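The dice examples can be checked directly; here is a minimal Python sketch (ours, not from the slides) that enumerates the 36-world sample space and sums the probabilities of the worlds in an event.

  from fractions import Fraction

  # Sample space: 36 equiprobable possible worlds for two fair dice.
  omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
  P = {w: Fraction(1, 36) for w in omega}

  def prob(event):  # event: a predicate over possible worlds
      return sum(P[w] for w in omega if event(w))

  print(prob(lambda w: w[0] + w[1] == 11))  # Fraction(1, 18)
  print(prob(lambda w: w[0] == w[1]))       # doubles: Fraction(1, 6)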
Random Variables

Factored representation of possible worlds: sets of ⟨variable, value⟩ pairs
Variables in probability theory: random variables
domain: the set of possible values a variable can take on
ex: Die: {1, 2, 3, 4, 5, 6}, Weather: {sunny, rain, cloudy, snow},
Odd: {true, false}
a r.v. can be seen as a function from sample points to the domain:
ex: Die(ω), Weather(ω), ... (“(ω)” typically omitted)
Probability distribution: gives the probabilities of all the possible
values of a random variable X: P(X = x_i) =def Σ_{ω : X(ω) = x_i} P(ω)
ex: P(Odd = true) = P(1) + P(3) + P(5) = 1/6 + 1/6 + 1/6 = 1/2

11 / 44
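As a quick illustration of a random variable as a function from sample points, a small Python sketch (names are ours) computing P(Odd = true) for one die roll:

  from fractions import Fraction

  omega = range(1, 7)                     # one fair die
  P = {w: Fraction(1, 6) for w in omega}
  Odd = lambda w: w % 2 == 1              # the r.v. Odd(ω)

  def dist(X):
      """P(X = x) = sum of P(ω) over the ω with X(ω) = x."""
      d = {}
      for w, p in P.items():
          d[X(w)] = d.get(X(w), Fraction(0)) + p
      return d

  print(dist(Odd))  # {True: Fraction(1, 2), False: Fraction(1, 2)}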
Propositions and Probabilities

We think of a proposition a as the event A (the set of sample points)
where the proposition is true
Odd is a propositional random variable of range {true, false}
notation: a ⇐⇒ “A = true”
Given Boolean random variables A and B:
event a: set of sample points where A(ω) = true
event ¬a: set of sample points where A(ω) = false
event a ∧ b: set of sample points where A(ω) = true, B(ω) = true
=⇒ with Boolean random variables, sample points are PL models
Proposition: disjunction of the sample points in which it is true
ex: (a ∨ b) ≡ (¬a ∧ b) ∨ (a ∧ ¬b) ∨ (a ∧ b)
=⇒ P(a ∨ b) = P(¬a ∧ b) + P(a ∧ ¬b) + P(a ∧ b)
Some derived facts:
P(¬a) = 1 − P(a)
P(a ∨ b) = P(a) + P(b) − P(a ∧ b)

12 / 44
Probability Distributions

Probability distribution: gives the probabilities of all the possible
values of a random variable
ex: Weather: {sunny, rain, cloudy, snow}
P(Weather) = ⟨0.6, 0.1, 0.29, 0.01⟩, i.e.:
P(Weather = sunny) = 0.6
P(Weather = rain) = 0.1
P(Weather = cloudy) = 0.29
P(Weather = snow) = 0.01
normalized: their sum is 1
Joint probability distribution for multiple variables:
gives the probability of every sample point
ex: P(Weather, Cavity) =
                  Weather = sunny   rain   cloudy   snow
  Cavity = true             0.144   0.02   0.016    0.02
  Cavity = false            0.576   0.08   0.064    0.08
Every event is a sum of sample points,
=⇒ its probability is determined by the joint distribution
13 / 44
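The joint distribution above can be stored as a plain table; a Python sketch (ours) checking that it is normalized and recovering P(cavity) as a sum of sample points:

  # Joint P(Weather, Cavity), keyed by (weather, cavity).
  joint_wc = {
      ("sunny", True): 0.144, ("rain", True): 0.02,
      ("cloudy", True): 0.016, ("snow", True): 0.02,
      ("sunny", False): 0.576, ("rain", False): 0.08,
      ("cloudy", False): 0.064, ("snow", False): 0.08,
  }
  assert abs(sum(joint_wc.values()) - 1.0) < 1e-12  # normalized

  # P(cavity): sum the sample points where Cavity = true.
  print(sum(p for (_, cav), p in joint_wc.items() if cav))  # ≈ 0.2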
Probability for Continuous Variables
Express continuous probability distributions:
density functions f(x) ≥ 0 s.t. ∫_{−∞}^{+∞} f(x) dx = 1
(note: a density may exceed 1; only its integral must be 1)
P(x ∈ [a, b]) = ∫_a^b f(x) dx
=⇒ P(x ∈ [val, val]) = 0, P(x ∈ (−∞, +∞)) = 1
ex: P(x ∈ [20, 22]) = ∫_20^22 0.125 dx = 0.25
Density: P(x) = P(X = x) =def lim_{dx→0} P(X ∈ [x, x + dx])/dx
ex: P(20.1) = lim_{dx→0} P(X ∈ [20.1, 20.1 + dx])/dx = 0.125
note: P(v) ≠ P(x ∈ [v, v]) = 0

(© S. Russell & P. Norvig, AIMA) 14 / 44
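A numeric check of the example above, assuming (as in the AIMA figure these slides refer to) a density that is 0.125 on [18, 26] and 0 elsewhere; the integration is a plain midpoint Riemann sum:

  def f(x):  # assumed uniform density on [18, 26]
      return 0.125 if 18 <= x <= 26 else 0.0

  def prob_interval(a, b, n=100_000):
      """P(x in [a, b]) = integral of f over [a, b], by midpoint Riemann sum."""
      dx = (b - a) / n
      return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

  print(prob_interval(20, 22))      # ≈ 0.25
  print(prob_interval(18, 26))      # ≈ 1.0
  print(prob_interval(20.1, 20.1))  # 0.0: single points carry no mass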


Conditional Probabilities
Unconditional or prior probabilities refer to degrees of belief in
propositions in the absence of any other information (evidence)
ex: P(cavity) = 0.2, P(Total = 11) = 1/18, P(doubles) = 1/6
Conditional or posterior probabilities refer to degrees of belief in
proposition a given some evidence b: P(a|b)
evidence: information already revealed
ex: P(cavity |toothache) = 0.6: p. of a cavity given a toothache
(assuming no other information is provided!)
ex: P(Total = 11 | die_1 = 5) = 1/6: p. of total 11 given that the first die is 5
=⇒ restricts the set of possible worlds to those where the first die is 5
Note: P(a|... ∧ a) = 1, P(a|... ∧ ¬a) = 0
ex: P(cavity |toothache ∧ cavity ) = 1,
P(cavity |toothache ∧ ¬cavity ) = 0
Less specific belief still valid after more evidence arrives
ex: P(cavity ) = 0.2 holds even if P(cavity |toothache) = 0.6
New evidence may be irrelevant, allowing for simplification
ex: P(cavity|toothache, 49ersWin) = P(cavity|toothache) = 0.6
15 / 44
Conditional Probabilities [cont.]
Conditional probability: P(a|b) =def P(a ∧ b)/P(b), defined when P(b) > 0
ex: P(Total = 11 | die_1 = 5) = P(Total = 11 ∧ die_1 = 5)/P(die_1 = 5)
    = (1/6 · 1/6)/(1/6) = 1/6
observing b restricts the possible worlds to those where b is true
Product rule: P(a ∧ b) = P(a|b) · P(b) = P(b|a) · P(a)
Product rule for whole distributions: P(X, Y) = P(X|Y) · P(Y)
ex: P(Weather, Cavity) = P(Weather|Cavity) P(Cavity), that is:
P(sunny, cavity) = P(sunny|cavity) P(cavity)
...
P(snow, ¬cavity) = P(snow|¬cavity) P(¬cavity)
a 4 × 2 set of equations, not matrix multiplication!
Chain rule, derived by successive application of the product rule:
P(X_1, ..., X_n)
= P(X_1, ..., X_{n−1}) P(X_n | X_1, ..., X_{n−1})
= P(X_1, ..., X_{n−2}) P(X_{n−1} | X_1, ..., X_{n−2}) P(X_n | X_1, ..., X_{n−1})
= ...
= ∏_{i=1}^{n} P(X_i | X_1, ..., X_{i−1})

16 / 44
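A Python sketch (ours) verifying the conditional-probability definition and the product rule on the two-dice sample space:

  from fractions import Fraction

  omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
  P = {w: Fraction(1, 36) for w in omega}

  def prob(e):
      return sum(P[w] for w in omega if e(w))

  def cond(a, b):  # P(a|b) = P(a and b) / P(b)
      return prob(lambda w: a(w) and b(w)) / prob(b)

  total11 = lambda w: w[0] + w[1] == 11
  die1_5 = lambda w: w[0] == 5

  print(cond(total11, die1_5))  # Fraction(1, 6)
  # product rule: P(a and b) = P(a|b) * P(b)
  assert prob(lambda w: total11(w) and die1_5(w)) == cond(total11, die1_5) * prob(die1_5)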
Logic vs. Probability

Logic                                  Probability
a                                      P(a) = 1
¬a                                     P(a) = 0
a → b                                  P(b|a) = 1
from (a, a → b) infer b                from P(a) = 1, P(b|a) = 1 infer P(b) = 1
from (a → b, b → c) infer a → c        from P(b|a) = 1, P(c|b) = 1 infer P(c|a) = 1

Proof of P(b|a) = 1, P(c|b) = 1 =⇒ P(c|a) = 1:
P(b|a) = 1 =⇒ P(¬b, a) = P(¬b|a) P(a) = 0
P(c|b) = 1 =⇒ P(¬c, b) = P(¬c|b) P(b) = 0
P(¬c, a) = P(¬c, a, b) + P(¬c, a, ¬b) ≤ P(¬c, b) + P(a, ¬b) = 0
(both upper-bounding terms are 0 by the two lines above)
P(¬c|a) = P(¬c, a)/P(a) = 0
P(c|a) = 1 − P(¬c|a) = 1
(Courtesy of Maria Simi, UniPI)
17 / 44
Outline

1 Acting Under Uncertainty

2 Basics on Probability

3 Probabilistic Inference via Enumeration

4 Independence and Conditional Independence

5 Applying Bayes’ Rule

6 An Example: The Wumpus World Revisited

18 / 44
Probabilistic Inference via Enumeration

Basic Ideas
Start with the joint distribution P(Toothache, Catch, Cavity )
For any proposition ϕ, sum the atomic events where ϕ is true:
P(ϕ) = Σ_{ω : ω ⊨ ϕ} P(ω)

19 / 44
Probabilistic Inference via Enumeration: Example

Example: Generic Inference


Start with the joint distribution P(Toothache, Catch, Cavity )
For any proposition ϕ, sum the atomic events where ϕ is true:
P(ϕ) = Σ_{ω : ω ⊨ ϕ} P(ω):
Ex: P(cavity ∨ toothache) =
0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28

[Joint distribution table (© S. Russell & P. Norvig, AIMA, Fig. 13.3),
reconstructed from the sums used in these slides:
                toothache             ¬toothache
                catch    ¬catch       catch    ¬catch
  cavity        0.108    0.012        0.072    0.008
  ¬cavity       0.016    0.064        0.144    0.576 ]

20 / 44
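The whole example can be reproduced in a few lines of Python (ours), enumerating the joint table above:

  # Full joint P(Toothache, Catch, Cavity); keys: (toothache, catch, cavity).
  joint = {
      (True, True, True): 0.108,  (True, True, False): 0.016,
      (True, False, True): 0.012, (True, False, False): 0.064,
      (False, True, True): 0.072, (False, True, False): 0.144,
      (False, False, True): 0.008, (False, False, False): 0.576,
  }

  def prob(phi):
      """P(phi) = sum over the atomic events where phi is true."""
      return sum(p for w, p in joint.items() if phi(w))

  print(prob(lambda w: w[2] or w[0]))  # P(cavity or toothache) ≈ 0.28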
Marginalization

Start with the joint distribution P(Toothache, Catch, Cavity )


Marginalization (aka summing out): sum up the probabilities for
each possible value of the other variables:
P(Y) = Σ_{z∈Z} P(Y, z)
Ex: P(Toothache) = Σ_{z∈{Catch,Cavity}} P(Toothache, z)
Conditioning: a variant of marginalization, involving conditional
probabilities instead of joint probabilities (using the product rule):
P(Y) = Σ_{z∈Z} P(Y|z) P(z)
Ex: P(Toothache) = Σ_{z∈{Catch,Cavity}} P(Toothache|z) P(z)

21 / 44
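A sketch of marginalization over the same distribution, reusing the `joint` dictionary from the enumeration sketch above:

  def marginal(var_index):
      """Sum out all variables except the one at var_index."""
      d = {}
      for w, p in joint.items():
          d[w[var_index]] = d.get(w[var_index], 0.0) + p
      return d

  print(marginal(0))  # P(Toothache): {True: ≈0.2, False: ≈0.8}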
Marginalization: Example

Start with the joint distribution P(Toothache, Catch, Cavity )


Marginalization (aka summing out): sum up the probabilities for
each possible value of the other variables:
P(Y) = Σ_{z∈Z} P(Y, z)
Ex: P(Toothache) = Σ_{z∈{Catch,Cavity}} P(Toothache, z):
P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
P(¬toothache) = 1 − P(toothache) = 1 − 0.2 = 0.8
=⇒ P(Toothache) = ⟨0.2, 0.8⟩

(© S. Russell & P. Norvig, AIMA)


22 / 44
Conditional Probability via Enumeration: Example

Start with the joint distribution P(Toothache, Catch, Cavity )


Conditional probability:
Ex: P(¬cavity|toothache) = P(¬cavity ∧ toothache)/P(toothache)
    = (0.016 + 0.064)/(0.108 + 0.012 + 0.016 + 0.064) = 0.4
Ex: P(cavity|toothache) = P(cavity ∧ toothache)/P(toothache) = ... = 0.6

(© S. Russell & P. Norvig, AIMA)

23 / 44
Normalization

Let X be all the variables. Typically, we want P(Y|E = e):
the conditional joint distribution of the query variables Y,
given specific values e for the evidence variables E
let the hidden variables be H =def X \ (Y ∪ E)
The summation of joint entries is done by summing out the
hidden variables:
P(Y|E = e) = α P(Y, E = e) = α Σ_{h∈H} P(Y, E = e, H = h),
where α =def 1/P(E = e) (a different α for each value of e)
=⇒ it is easy to compute α by normalization
note: the terms in the summation are joint entries,
because Y, E, H together exhaust the set of random variables X
Complexity: O(2^n), where n is the number of propositions =⇒ impractical

24 / 44
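Inference by enumeration with normalization, as a Python sketch over the dental joint from the earlier sketches (here Y = Cavity, E = Toothache, H = Catch); the numbers match the worked example on the next slide:

  def query_cavity(toothache):
      """P(Cavity | Toothache = toothache) = alpha * sum over Catch."""
      unnorm = {True: 0.0, False: 0.0}
      for (t, _, cavity), p in joint.items():
          if t == toothache:        # fix the evidence
              unnorm[cavity] += p   # sum out the hidden variable Catch
      alpha = 1.0 / sum(unnorm.values())
      return {v: alpha * p for v, p in unnorm.items()}

  print(query_cavity(True))   # {True: ≈0.6, False: ≈0.4},  alpha = 5
  print(query_cavity(False))  # {True: ≈0.1, False: ≈0.9},  alpha = 1.25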
Normalization: Example
α =def 1/P(toothache) can be viewed as a normalization constant
Idea: compute the whole distribution on the query variable by:
fixing the evidence variables and summing over the hidden variables
normalizing the final distribution, so that its entries sum to 1
Ex:
P(Cavity|toothache) = α P(Cavity ∧ toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
= α ⟨0.12, 0.08⟩ = (normalization) = ⟨0.6, 0.4⟩ [α = 5]
P(Cavity|¬toothache) = ... = α ⟨0.08, 0.72⟩ = ⟨0.1, 0.9⟩ [α = 1.25]

(© S. Russell & P. Norvig, AIMA)


25 / 44
Outline

1 Acting Under Uncertainty

2 Basics on Probability

3 Probabilistic Inference via Enumeration

4 Independence and Conditional Independence

5 Applying Bayes’ Rule

6 An Example: The Wumpus World Revisited

26 / 44
Independence
Variables X and Y are independent iff P(X, Y) = P(X) P(Y)
(or equivalently, iff P(X|Y) = P(X) or P(Y|X) = P(Y))
ex: P(Toothache, Catch, Cavity, Weather) =
P(Toothache, Catch, Cavity) P(Weather)
=⇒ e.g. P(toothache, catch, cavity, cloudy) =
P(toothache, catch, cavity) P(cloudy)
typically based on domain knowledge
May drastically reduce the number of entries and the computation
=⇒ ex: a 32-element table decomposes into one 8-element and one
4-element table
Unfortunately, absolute independence is quite rare

(© S. Russell & P. Norvig, AIMA)


27 / 44
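A small Python check (ours) that a joint built under the independence assumption indeed factors into its marginals; the Weather marginal is the one from the earlier distribution slide, and P(cavity) = 0.2 as before:

  import itertools

  P_weather = {"sunny": 0.6, "rain": 0.1, "cloudy": 0.29, "snow": 0.01}
  P_cavity = {True: 0.2, False: 0.8}
  # 8 joint entries collapse to 4 + 2 numbers under independence:
  joint_ind = {(w, c): P_weather[w] * P_cavity[c]
               for w, c in itertools.product(P_weather, P_cavity)}

  def independent(j):
      """Test P(X, Y) = P(X) * P(Y) for a joint over pairs (x, y)."""
      px, py = {}, {}
      for (x, y), p in j.items():
          px[x] = px.get(x, 0.0) + p
          py[y] = py.get(y, 0.0) + p
      return all(abs(j[(x, y)] - px[x] * py[y]) < 1e-9 for (x, y) in j)

  print(independent(joint_ind))  # True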
Conditional Independence

Variables X and Y are conditionally independent given Z iff
    P(X, Y|Z) = P(X|Z)P(Y|Z)
(or equivalently, iff P(X|Y, Z) = P(X|Z) or P(Y|X, Z) = P(Y|Z))
Consider P(Toothache, Cavity, Catch)
  if I have a cavity, the probability that the probe catches in it doesn't
  depend on whether I have a toothache:
    P(catch|toothache, cavity) = P(catch|cavity)
  the same independence holds if I haven't got a cavity:
    P(catch|toothache, ¬cavity) = P(catch|¬cavity)
=⇒ Catch is conditionally independent of Toothache given Cavity:
    P(Catch|Toothache, Cavity) = P(Catch|Cavity)
  or, equivalently:
    P(Toothache|Catch, Cavity) = P(Toothache|Cavity), or
    P(Toothache, Catch|Cavity) = P(Toothache|Cavity)P(Catch|Cavity)

28 / 44
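A quick numeric check of this conditional independence on the same table (a sketch; joint_tcc is the 8-entry dentist table, repeated for self-containment):

joint_tcc = {  # P(Toothache, Catch, Cavity)
    (True, True, True): 0.108,   (True, False, True): 0.012,
    (False, True, True): 0.072,  (False, False, True): 0.008,
    (True, True, False): 0.016,  (True, False, False): 0.064,
    (False, True, False): 0.144, (False, False, False): 0.576,
}

def p(pred):  # probability of the event selected by pred(t, c, cav)
    return sum(v for k, v in joint_tcc.items() if pred(*k))

for cav in (True, False):
    p_catch_cav = p(lambda t, c, x: c and x == cav) / p(lambda t, c, x: x == cav)
    p_catch_tooth_cav = (p(lambda t, c, x: t and c and x == cav)
                         / p(lambda t, c, x: t and x == cav))
    # equal in both cases: 0.9 vs 0.9 (cav=True), 0.2 vs 0.2 (cav=False)
    print(cav, p_catch_cav, p_catch_tooth_cav)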
Conditional Independence [cont.]

In many cases, the use of conditional independence reduces the
size of the representation of the joint distribution dramatically
  even from exponential to linear!
Ex:
  P(Toothache, Catch, Cavity)
  = P(Toothache|Catch, Cavity) P(Catch, Cavity)
  = P(Toothache|Catch, Cavity) P(Catch|Cavity) P(Cavity)
  = P(Toothache|Cavity) P(Catch|Cavity) P(Cavity)
=⇒ Passes from 7 to 2+2+1=5 independent numbers
  P(Toothache, Catch, Cavity) contains 7 independent entries
  (the 8th can be obtained as 1 − Σ of the others)
  P(Toothache|Cavity), P(Catch|Cavity) contain 2 independent
  entries each (2 × 2 matrix, each row sums to 1)
  P(Cavity) contains 1 independent entry
General Case: if one cause has n independent effects:
  P(Cause, Effect_1, ..., Effect_n) = P(Cause) Π_i P(Effect_i|Cause)
=⇒ reduces from 2^(n+1) − 1 to 2n + 1 independent entries
29 / 44
Exercise

Consider the joint probability distribution described in the table in the
previous section (slide 20 onwards): P(Toothache, Catch, Cavity)
Consider the example in the previous slide:
  P(Toothache, Catch, Cavity)
  = P(Toothache|Catch, Cavity) P(Catch, Cavity)
  = P(Toothache|Catch, Cavity) P(Catch|Cavity) P(Cavity)
  = P(Toothache|Cavity) P(Catch|Cavity) P(Cavity)
Compute separately the distributions
  P(Toothache|Catch, Cavity), P(Catch|Cavity), P(Cavity),
  P(Toothache|Cavity).
Recompute P(Toothache, Catch, Cavity) in two ways:
  P(Toothache|Catch, Cavity) P(Catch|Cavity) P(Cavity)
  P(Toothache|Cavity) P(Catch|Cavity) P(Cavity)
and compare the result with P(Toothache, Catch, Cavity)
(A helper for reading such conditional distributions off the table is
sketched below.)

30 / 44
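A sketch of a generic helper for checking the exercise (names are my own); it reads conditional distributions off the joint table, with variable order (Toothache, Catch, Cavity):

joint_tcc = {  # P(Toothache, Catch, Cavity)
    (True, True, True): 0.108,   (True, False, True): 0.012,
    (False, True, True): 0.072,  (False, False, True): 0.008,
    (True, True, False): 0.016,  (True, False, False): 0.064,
    (False, True, False): 0.144, (False, False, False): 0.576,
}

def conditional(target, given):
    """P(variable 'target' is True | all (index, value) pairs in 'given')."""
    ok = lambda k: all(k[i] == v for i, v in given)
    denom = sum(v for k, v in joint_tcc.items() if ok(k))
    return sum(v for k, v in joint_tcc.items() if ok(k) and k[target]) / denom

print(conditional(0, [(2, True)]))             # P(toothache|cavity) = 0.6
print(conditional(1, [(2, True)]))             # P(catch|cavity)     = 0.9
print(conditional(0, [(1, True), (2, True)]))  # P(toothache|catch, cavity) = 0.6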
Outline

1 Acting Under Uncertainty

2 Basics on Probability

3 Probabilistic Inference via Enumeration

4 Independence and Conditional Independence

5 Applying Bayes’ Rule

6 An Example: The Wumpus World Revisited

31 / 44
Bayes' Rule

Bayes' Rule/Theorem/Law

Bayes' rule: P(a|b) = P(a ∧ b)/P(b) = P(b|a)P(a)/P(b)

In distribution form: P(Y|X) = P(X|Y)P(Y)/P(X) = α P(X|Y)P(Y)
  α =def 1/P(X): normalization constant to make P(Y|X) entries sum
  to 1 (different α's for different values of X)
A version conditionalized on some background evidence e:
    P(Y|X, e) = P(X|Y, e)P(Y|e)/P(X|e)

32 / 44
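A tiny sketch of the distribution form, recomputing P(Cavity|toothache) from causal knowledge (the two likelihoods 0.6 and 0.1 can be read off the dentist table):

p_cavity = {True: 0.2, False: 0.8}              # prior P(Cavity)
p_tooth_given_cavity = {True: 0.6, False: 0.1}  # P(toothache|Cavity)

unnorm = {cav: p_tooth_given_cavity[cav] * p_cavity[cav]
          for cav in (True, False)}
alpha = 1.0 / sum(unnorm.values())  # 1/P(toothache), never computed explicitly
print({cav: alpha * v for cav, v in unnorm.items()})  # {True: 0.6, False: 0.4}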
Using Bayes' Rule: The Simple Case
Used to assess diagnostic probability from causal probability:

    P(cause|effect) = P(effect|cause)P(cause)/P(effect)

  P(cause|effect) goes from effect to cause (diagnostic direction)
  P(effect|cause) goes from cause to effect (causal direction)

Example
An expert doctor is likely to have causal knowledge ...
  P(symptoms|disease) (i.e., P(effect|cause))
... and needs to produce diagnostic knowledge
  P(disease|symptoms) (i.e., P(cause|effect))
Ex: let m be meningitis, s be stiff neck
  P(m) = 1/50000, P(s) = 0.01 (prior knowledge, from statistics)
  "meningitis causes a stiff neck in 70% of cases":
  P(s|m) = 0.7 (doctor's experience)
  =⇒ P(m|s) = P(s|m)P(m)/P(s) = (0.7 · 1/50000)/0.01 = 0.0014
33 / 44
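The meningitis computation, as a one-screen sketch:

p_m = 1 / 50000    # prior P(meningitis)
p_s = 0.01         # prior P(stiff neck)
p_s_given_m = 0.7  # causal knowledge P(s|m)

p_m_given_s = p_s_given_m * p_m / p_s  # Bayes' rule, diagnostic direction
print(p_m_given_s)                     # 0.0014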
Using Bayes' Rule: Combining Evidence
A naive Bayes model is a probability model that assumes the
effects are conditionally independent, given the cause
=⇒ P(Cause, Effect_1, ..., Effect_n) = P(Cause) Π_i P(Effect_i|Cause)
  total number of parameters is linear in n
  ex: P(Cavity, Toothache, Catch) =
      P(Cavity) P(Toothache|Cavity) P(Catch|Cavity)
Q: How can we compute P(Cause|Effect_1, ..., Effect_k)?
  ex: P(Cavity|toothache ∧ catch)?

(© S. Russell & P. Norvig, AIMA)

34 / 44
Using Bayes' Rule: Combining Evidence [cont.]

Q: How can we compute P(Cause|Effect_1, ..., Effect_k)?
  ex: P(Cavity|toothache ∧ catch)?
A: Apply Bayes' Rule:
  P(Cavity|toothache ∧ catch)
  = P(toothache ∧ catch|Cavity) P(Cavity) / P(toothache ∧ catch)
  = α P(toothache ∧ catch|Cavity) P(Cavity)
  = α P(toothache|Cavity) P(catch|Cavity) P(Cavity)
  α =def 1/P(toothache ∧ catch) not computed explicitly
General case:
  P(Cause|Effect_1, ..., Effect_n) = α P(Cause) Π_i P(Effect_i|Cause)
  α =def 1/P(Effect_1, ..., Effect_n) not computed explicitly
  (one α value for every value of Effect_1, ..., Effect_n)
=⇒ reduces from 2^(n+1) − 1 to 2n + 1 independent entries

35 / 44
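A sketch of the naive Bayes combination for P(Cavity|toothache ∧ catch); all five numbers below are derivable from the dentist table:

p_cavity = {True: 0.2, False: 0.8}
p_tooth = {True: 0.6, False: 0.1}  # P(toothache|Cavity=cav)
p_catch = {True: 0.9, False: 0.2}  # P(catch|Cavity=cav)

# alpha * P(Cavity) * P(toothache|Cavity) * P(catch|Cavity)
unnorm = {cav: p_cavity[cav] * p_tooth[cav] * p_catch[cav]
          for cav in (True, False)}
alpha = 1.0 / sum(unnorm.values())
print({cav: alpha * v for cav, v in unnorm.items()})
# -> {True: ~0.871, False: ~0.129}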
Outline

1 Acting Under Uncertainty

2 Basics on Probability

3 Probabilistic Inference via Enumeration

4 Independence and Conditional Independence

5 Applying Bayes’ Rule

6 An Example: The Wumpus World Revisited

36 / 44
An Example: The Wumpus World
A probability model of the Wumpus World
Consider again the Wumpus World (restricted to pit detection)
Evidence: no pit in (1,1), (1,2), (2,1); breezy in (1,2), (2,1)
Q. Given the evidence, what is the probability of having a pit in
(1,3), (2,2) or (3,1)?
Two groups of variables:
  P_i,j = true iff [i, j] contains a pit ("causes")
  B_i,j = true iff [i, j] is breezy
  ("effects", consider only B1,1, B1,2, B2,1)
Joint Distribution:
  P(P1,1, ..., P4,4, B1,1, B1,2, B2,1)
Known facts (evidence):
  b* =def ¬b1,1 ∧ b1,2 ∧ b2,1
  p* =def ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1

(© S. Russell & P. Norvig, AIMA)

37 / 44
An Example: The Wumpus World [cont.]

Specifying the probability model

Apply the product rule to the joint distribution:
  P(P1,1, ..., P4,4, B1,1, B1,2, B2,1) =
    P(B1,1, B1,2, B2,1 | P1,1, ..., P4,4) P(P1,1, ..., P4,4)
P(B1,1, B1,2, B2,1 | P1,1, ..., P4,4):
  1 if every breezy square has an adjacent pit and every
  non-breezy square has none, 0 otherwise
P(P1,1, ..., P4,4): pits are placed randomly, except in (1,1):
  P(P1,1, ..., P4,4) = Π_{i=1..4} Π_{j=1..4} P(Pi,j)
  P(pi,j) = 0.2 if (i, j) ≠ (1, 1), 0 otherwise
  ex: P(P1,1, ..., P4,4) = 0.2³ · 0.8^(15−3) ≈ 0.00055 if there are 3 pits

38 / 44
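A sketch of this prior as code (the set encoding of pit configurations is my own):

P_PIT = 0.2

def pit_prior(pits):
    """P of a full configuration; 'pits' is the set of squares with pits."""
    if (1, 1) in pits:
        return 0.0                                   # no pit in (1,1)
    n = len(pits)
    return (P_PIT ** n) * ((1 - P_PIT) ** (15 - n))  # 15 free squares

print(pit_prior({(3, 1), (3, 3), (4, 4)}))  # 3 pits: ~0.00055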
An Example: The Wumpus World [cont.]

Inference by enumeration
General form of query:
  P(Y|E = e) = α P(Y, E = e) = α Σ_h P(Y, E = e, H = h)
  Y: query vars; E, e: evidence vars/values; H, h: hidden vars/values
Our case: P(P1,3 | p*, b*), s.t. the evidence is
  b* =def ¬b1,1 ∧ b1,2 ∧ b2,1
  p* =def ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1
Sum over the hidden variables:
  P(P1,3 | p*, b*) = α Σ_unknown P(P1,3, p*, b*, unknown)
  unknown are all Pi,j's s.t. (i, j) ∉ {(1,1), (1,2), (2,1), (1,3)}
=⇒ 2^(16−4) = 4096 terms in the sum!
Grows exponentially with the number of hidden variables H!
=⇒ Inefficient

(© S. Russell & P. Norvig, AIMA)

39 / 44
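For concreteness, a brute-force sketch of this enumeration (grid encoding and names are my own); it sums over all 2^12 hidden pit configurations and reproduces the ⟨0.31, 0.69⟩ answer derived more cheaply in the following slides:

from itertools import product

P_PIT = 0.2
squares = [(x, y) for x in range(1, 5) for y in range(1, 5)]
no_pit = {(1, 1), (1, 2), (2, 1)}                         # p*
breeze_obs = {(1, 1): False, (1, 2): True, (2, 1): True}  # b*
query = (1, 3)
hidden = [s for s in squares if s not in no_pit and s != query]  # 12 squares

def adjacent(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1

def consistent(pits):
    # a square is breezy iff some adjacent square contains a pit
    return all(any(adjacent(sq, p) for p in pits) == b
               for sq, b in breeze_obs.items())

unnorm = {}
for q in (True, False):
    total = 0.0
    for vals in product((True, False), repeat=len(hidden)):
        pits = {s for s, v in zip(hidden, vals) if v} | ({query} if q else set())
        if consistent(pits):
            n = len(pits)  # pits among the 13 free squares
            total += (P_PIT ** n) * ((1 - P_PIT) ** (13 - n))
    unnorm[q] = total

alpha = 1.0 / sum(unnorm.values())
print({q: round(alpha * v, 2) for q, v in unnorm.items()})
# -> {True: 0.31, False: 0.69}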
An Example: The Wumpus World [cont.]
Using conditional independence
Basic insight: given the fringe squares (see figure), b* is
conditionally independent of the other hidden squares
  Unknown =def Fringe ∪ Other
=⇒ P(b*|p*, P1,3, Unknown) = P(b*|p*, P1,3, Fringe, Other)
  = P(b*|p*, P1,3, Fringe)
Next: manipulate the query into a form
where this equation can be used

(© S. Russell & P. Norvig, AIMA)

40 / 44
An Example: The Wumpus World [cont.]

Derivation (one step per line; the manipulation used is in brackets):

P(P1,3 | p*, b*)
 = α P(P1,3, p*, b*)
   [P(p*, b*) is scalar; use it as a normalization constant]
 = α Σ_unknown P(P1,3, p*, b*, unknown)
   [sum over the unknowns]
 = α Σ_unknown P(b*|p*, P1,3, unknown) P(p*, P1,3, unknown)
   [use the product rule]
 = α Σ_fringe Σ_other P(b*|p*, P1,3, fringe, other) P(p*, P1,3, fringe, other)
   [separate unknown into fringe and other]
 = α Σ_fringe Σ_other P(b*|p*, P1,3, fringe) P(p*, P1,3, fringe, other)
   [b* is conditionally independent of other given fringe]
 = α Σ_fringe P(b*|p*, P1,3, fringe) Σ_other P(p*, P1,3, fringe, other)
   [move P(b*|p*, P1,3, fringe) outward]
 = α Σ_fringe P(b*|p*, P1,3, fringe) Σ_other P(p*) P(P1,3) P(fringe) P(other)
   [all of the pit locations are independent]
 = α P(p*) P(P1,3) Σ_fringe P(b*|p*, P1,3, fringe) P(fringe) Σ_other P(other)
   [move P(p*), P(P1,3), and P(fringe) outward]
 = α P(p*) P(P1,3) Σ_fringe P(b*|p*, P1,3, fringe) P(fringe)
   [remove Σ_other P(other) because it equals 1]
 = α′ P(P1,3) Σ_fringe P(b*|p*, P1,3, fringe) P(fringe)
   [P(p*) is scalar, so make it part of the normalization constant α′]

(© of Dana Nau, CMSC421, U. Maryland, Licensed under Creative Commons)

41 / 44
An Example: The Wumpus World [cont.]

We have obtained:
  P(P1,3|p*, b*) = α′ P(P1,3) Σ_fringe P(b*|p*, P1,3, fringe) P(fringe)

We know that P(P1,3) = ⟨0.2, 0.8⟩
We can compute the normalization coefficient α′ afterwards
Σ_fringe P(b*|p*, P1,3, fringe) P(fringe): only 4 possible fringes
Start by rewriting as two separate equations:
  P( p1,3|p*, b*) = α′ P( p1,3) Σ_fringe P(b*|p*,  p1,3, fringe) P(fringe)
  P(¬p1,3|p*, b*) = α′ P(¬p1,3) Σ_fringe P(b*|p*, ¬p1,3, fringe) P(fringe)

(© S. Russell & P. Norvig, AIMA)

42 / 44
An Example: The Wumpus World [cont.]
Start by rewriting as two separate equations:
  P( p1,3|p*, b*) = α′ P( p1,3) Σ_fringe P(b*|p*,  p1,3, fringe) P(fringe)
  P(¬p1,3|p*, b*) = α′ P(¬p1,3) Σ_fringe P(b*|p*, ¬p1,3, fringe) P(fringe)
For each of them, P(b*|...) is 1 if the breezes occur, 0 otherwise:
  Σ_fringe P(b*|p*,  p1,3, fringe) P(fringe) = 1·0.04 + 1·0.16 + 1·0.16 + 0 = 0.36
  Σ_fringe P(b*|p*, ¬p1,3, fringe) P(fringe) = 1·0.04 + 1·0.16 + 0 + 0 = 0.2
=⇒ P(P1,3|p*, b*) = α′ P(P1,3) Σ_fringe P(b*|p*, P1,3, fringe) P(fringe)
  = α′ ⟨0.2, 0.8⟩⟨0.36, 0.2⟩ = α′ ⟨0.072, 0.16⟩
  = (normalization, s.t. α′ ≈ 4.31) ≈ ⟨0.31, 0.69⟩

(© S. Russell & P. Norvig, AIMA)

43 / 44
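The same computation as a fringe-only sketch (four fringe cases instead of 4096 terms; the breeze tests encode b* under p*, with fringe = {(2,2), (3,1)}):

P_PIT = 0.2

def b_star(p13, p22, p31):
    # b11 is false automatically (no pits adjacent to (1,1) under p*);
    # b12 needs a pit in (1,3) or (2,2); b21 needs one in (2,2) or (3,1)
    return (p13 or p22) and (p22 or p31)

unnorm = {}
for p13 in (True, False):
    total = sum((P_PIT if p22 else 1 - P_PIT) * (P_PIT if p31 else 1 - P_PIT)
                for p22 in (True, False) for p31 in (True, False)
                if b_star(p13, p22, p31))
    unnorm[p13] = (P_PIT if p13 else 1 - P_PIT) * total  # 0.2*0.36, 0.8*0.2

alpha = 1.0 / sum(unnorm.values())                       # alpha' ~ 4.31
print({p13: round(alpha * v, 2) for p13, v in unnorm.items()})
# -> {True: 0.31, False: 0.69}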
Exercise

Compute P(P2,2|p*, b*) in the same way.

44 / 44
