
Introduction to Artificial Intelligence

Probabilistic Reasoning

University of Technology and Applied Sciences


Computing and Information Sciences
Outline

1. Quantifying Uncertainty

2. Bayesian Networks

3. Probabilistic Inference Over Time

CCIS@UTAS CSDS3203 2
Acting Under Uncertainty

• Real-world applications contain uncertainties due to:


◦ Partial observability
◦ Nondeterminism, or
◦ Adversaries
• Example of dental diagnosis using propositional logic:
◦ 𝑇𝑜𝑜𝑡ℎ𝑎𝑐ℎ𝑒 → 𝐶𝑎𝑣𝑖𝑡𝑦
• However, the above is inaccurate. Not all patients with toothaches have cavities.
◦ 𝑇𝑜𝑜𝑡ℎ𝑎𝑐ℎ𝑒 → 𝐶𝑎𝑣𝑖𝑡𝑦 ∨ 𝐺𝑢𝑚𝑃𝑟𝑜𝑏𝑙𝑒𝑚 ∨ 𝐴𝑏𝑠𝑐𝑒𝑠𝑠 …
• To make the rule true, we would have to add an almost unlimited list of possible
problems.
• The only way to fix the rule is to make it logically exhaustive.

CCIS@UTAS CSDS3203 3
Acting Under Uncertainty, Cont’d

• The right thing for an agent to do—the rational decision—depends on both the relative
importance of its various goals and the likelihood that, and degree to which, they will be
achieved.

• Attempts to build purely logical agents for large domains such as medical diagnosis fail for three main reasons:


◦ Laziness: It is too much work to list the complete set of antecedents or consequents needed to
ensure an exceptionless rule.
◦ Theoretical ignorance: Medical science has no complete theory for the domain
◦ Practical ignorance: Even if we know all the rules, we might be uncertain about a particular patient
because not all the necessary tests have been or can be run.

• An agent only has a degree of belief in the relevant sentences.

CCIS@UTAS CSDS3203 4
Acting Under Uncertainty, Cont’d

• Probability theory
◦ a tool for dealing with degrees of belief in relevant sentences
◦ summarizes the uncertainty that comes from our laziness and ignorance
• Uncertainty and rational decisions
◦ An agent requires preferences among the different possible outcomes of various plans
◦ Utility theory: captures how useful an outcome is
− Every state has a degree of usefulness, or utility
− Higher utility is preferred
◦ Decision theory: preferences (utility theory) combined with probabilities
− Decision theory = probability theory + utility theory
− An agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all
the possible outcomes of the action
− This is the principle of maximum expected utility (MEU).

CCIS@UTAS CSDS3203 5
Basic Probability Notation
• For our agent to represent and use probabilistic information, we need a formal
language.
• Sample space (Ω): the set of all possible worlds
◦ The possible worlds are mutually exclusive and exhaustive.
• A fully specified probability model associates a numerical probability P(ω) with each
possible world.
• The basic axioms of probability theory say that every possible world has a probability
between 0 and 1 and that the total probability of the set of possible worlds is 1:

0 ≤ P(ω) ≤ 1 for every ω ∈ Ω, and ∑ω∈Ω P(ω) = 1

• Unconditional or prior probability: degrees of belief in propositions in the absence of


any other information.

CCIS@UTAS CSDS3203 6
Basic Probability Notation, Cont’d
• Conditional or posterior probability: given evidence that an event has occurred, the
degree of belief in a new event
◦ Makes use of unconditional probabilities

• Probability of a given b:

P(a | b) = P(a ∧ b) / P(b)

• Can also be written as (the product rule):

P(a ∧ b) = P(a | b) P(b)

• Example: when rolling two fair dice, the probability of doubles given that the first die shows 5:

P(doubles | Die1 = 5) = P(doubles ∧ Die1 = 5) / P(Die1 = 5)
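As a quick sanity check, the same conditional probability can be computed by brute-force enumeration of the 36 equally likely outcomes of two dice. A minimal Python sketch (not part of the original slides):

```python
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

p_first_is_5 = sum(1 for d1, d2 in outcomes if d1 == 5) / len(outcomes)
p_doubles_and_first_is_5 = sum(1 for d1, d2 in outcomes if d1 == 5 and d1 == d2) / len(outcomes)

# P(doubles | Die1 = 5) = P(doubles AND Die1 = 5) / P(Die1 = 5)
print(p_doubles_and_first_is_5 / p_first_is_5)  # 1/6 ≈ 0.1667
```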
CCIS@UTAS CSDS3203 7
Random Variables and Probability Distribution

• Variables in probability are called random variables.


◦ A random variable is a variable whose possible values are the outcomes of a random
phenomenon.
• A random variable is associated with a probability distribution that prescribes the
probabilities of its values. Here is an example of a probability distribution of the random
variable 𝑊𝑒𝑎𝑡ℎ𝑒𝑟:
◦ P(Weather = sun) = 0.6
◦ P(Weather = rain) = 0.1
◦ P(Weather = cloud) = 0.29
◦ P(Weather = snow) = 0.01
(Note: the probability values in a probability distribution must add up to 1.)

• The above can be abbreviated as:


◦ P(Weather) = ⟨0.6, 0.1, 0.29, 0.01⟩
• The boldface P symbol denotes a probability distribution over the values of the random variable Weather.
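In code, such a distribution can be represented as a simple mapping from values to probabilities; a minimal sketch using the Weather values above:

```python
# P(Weather) as a dictionary; the probabilities must sum to 1.
P_weather = {"sun": 0.6, "rain": 0.1, "cloud": 0.29, "snow": 0.01}

assert abs(sum(P_weather.values()) - 1.0) < 1e-9   # distribution is normalized
print(P_weather["rain"])                            # P(Weather = rain) = 0.1
```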

CCIS@UTAS CSDS3203 8
Inference Using Full Joint Distribution, Cont’d
• Start with the joint distribution
𝑡𝑜𝑜𝑡ℎ𝑎𝑐ℎ𝑒 ¬𝑡𝑜𝑜𝑡ℎ𝑎𝑐ℎ𝑒
𝑐𝑎𝑡𝑐ℎ ¬𝑐𝑎𝑡𝑐ℎ 𝑐𝑎𝑡𝑐ℎ ¬𝑐𝑎𝑡𝑐ℎ
𝑐𝑎𝑣𝑖𝑡𝑦 0.108 0.012 0.072 0.008
¬𝑐𝑎𝑣𝑖𝑡𝑦 0.016 0.064 0.144 0.576

• Note, probabilities in a full joint distribution add up to 1.


• For any proposition φ, sum the probabilities of the possible worlds in which it is true.
• Examples:
◦ 𝑃 𝑡𝑜𝑜𝑡ℎ𝑎𝑐ℎ𝑒 = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
◦ 𝑃 𝑐𝑎𝑣𝑖𝑡𝑦 ∨ 𝑡𝑜𝑜𝑡ℎ𝑎𝑐ℎ𝑒 = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
◦ P(¬cavity ∧ toothache) = ?
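This "sum the worlds where the proposition holds" rule is easy to express in code. A minimal sketch over the full joint table above (the tuple keys and the prob() helper are illustrative choices, not part of the slides):

```python
# Full joint distribution P(Cavity, Toothache, Catch) from the table above.
# Keys are (cavity, toothache, catch) truth assignments.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(event):
    """Sum the joint entries of all possible worlds in which `event` holds."""
    return sum(p for world, p in joint.items() if event(*world))

print(prob(lambda cavity, toothache, catch: toothache))            # 0.2
print(prob(lambda cavity, toothache, catch: cavity or toothache))  # 0.28
```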

CCIS@UTAS CSDS3203 9
Inference Using Full Joint Distribution, Cont’d
• Start with the joint distribution
𝑡𝑜𝑜𝑡ℎ𝑎𝑐ℎ𝑒 ¬𝑡𝑜𝑜𝑡ℎ𝑎𝑐ℎ𝑒
𝑐𝑎𝑡𝑐ℎ ¬𝑐𝑎𝑡𝑐ℎ 𝑐𝑎𝑡𝑐ℎ ¬𝑐𝑎𝑡𝑐ℎ
𝑐𝑎𝑣𝑖𝑡𝑦 0.108 0.012 0.072 0.008
¬𝑐𝑎𝑣𝑖𝑡𝑦 0.016 0.064 0.144 0.576

• We can also compute conditional probabilities:

P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
                       = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
                       = 0.08 / 0.2 = 0.4

CCIS@UTAS CSDS3203 10
Normalization
𝑡𝑜𝑜𝑡ℎ𝑎𝑐ℎ𝑒 ¬𝑡𝑜𝑜𝑡ℎ𝑎𝑐ℎ𝑒
𝑐𝑎𝑡𝑐ℎ ¬𝑐𝑎𝑡𝑐ℎ 𝑐𝑎𝑡𝑐ℎ ¬𝑐𝑎𝑡𝑐ℎ
𝑐𝑎𝑣𝑖𝑡𝑦 0.108 0.012 0.072 0.008
¬𝑐𝑎𝑣𝑖𝑡𝑦 0.016 0.064 0.144 0.576

• We can also compute the distribution of a query variable (given some evidence)
• What is the probability distribution of Cavity given toothache?
• General idea: we compute the distribution on a query variable (𝐶𝑎𝑣𝑖𝑡𝑦) by fixing
evidence variables (𝑡𝑜𝑜𝑡ℎ𝑎𝑐ℎ𝑒) and summing over hidden variables (𝐶𝑎𝑡𝑐ℎ).

CCIS@UTAS CSDS3203 11
Normalization, Cont’d
𝑡𝑜𝑜𝑡ℎ𝑎𝑐ℎ𝑒 ¬𝑡𝑜𝑜𝑡ℎ𝑎𝑐ℎ𝑒
𝑐𝑎𝑡𝑐ℎ ¬𝑐𝑎𝑡𝑐ℎ 𝑐𝑎𝑡𝑐ℎ ¬𝑐𝑎𝑡𝑐ℎ
𝑐𝑎𝑣𝑖𝑡𝑦 0.108 0.012 0.072 0.008
¬𝑐𝑎𝑣𝑖𝑡𝑦 0.016 0.064 0.144 0.576

• The denominator can be viewed as a normalization constant α:

P(Cavity | toothache) = P(Cavity, toothache) / P(toothache) = α P(Cavity, toothache)
                      = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
                      = α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
                      = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩

Side note: in probability notation the AND operator (∧) is often replaced by a comma (,).
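The same normalization can be written as a short Python sketch, reusing a dictionary keyed by (cavity, toothache, catch) for the full joint (an illustrative sketch, not the only way to structure it):

```python
# Full joint P(Cavity, Toothache, Catch); keys are (cavity, toothache, catch).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

# Fix the evidence toothache = true and sum out the hidden variable Catch.
unnormalized = {
    cavity: sum(joint[(cavity, True, catch)] for catch in (True, False))
    for cavity in (True, False)
}

alpha = 1.0 / sum(unnormalized.values())                  # normalization constant
print({v: alpha * p for v, p in unnormalized.items()})    # {True: 0.6, False: 0.4}
```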

CCIS@UTAS CSDS3203 12
Normalization, Exercise
𝑡𝑜𝑜𝑡ℎ𝑎𝑐ℎ𝑒 ¬𝑡𝑜𝑜𝑡ℎ𝑎𝑐ℎ𝑒
𝑐𝑎𝑡𝑐ℎ ¬𝑐𝑎𝑡𝑐ℎ 𝑐𝑎𝑡𝑐ℎ ¬𝑐𝑎𝑡𝑐ℎ
𝑐𝑎𝑣𝑖𝑡𝑦 0.108 0.012 0.072 0.008
¬𝑐𝑎𝑣𝑖𝑡𝑦 0.016 0.064 0.144 0.576

• Find the probability distribution P(Toothache | cavity).

CCIS@UTAS CSDS3203 13
Inference Using Full Joint Distribution
• Let X be all the variables. Typically we want:
◦ The joint distribution of the query variables 𝑌
◦ Given specific values e for the evidence variables 𝐸
• Let the hidden variables be H = X − Y − E
• The required summation of joint entries is done by summing out the hidden variables:
◦ P(Y | E = e) = α P(Y, E = e) = α Σₕ P(Y, E = e, H = h)
• The terms in the summation are joint entries because Y, E, and H together exhaust the
set of random variables.
• Obvious problems:
◦ The worst-case time complexity is O(dⁿ), where d is the largest arity (e.g., 2 in the case of Boolean
variables) and n is the number of variables
◦ Space complexity is O(dⁿ) to store the joint distribution
◦ How do we obtain the numbers for the O(dⁿ) entries?

CCIS@UTAS CSDS3203 14
Independence
• Independence is the knowledge that the occurrence of one event does not affect the
probability of the other event. For example,
◦ One’s dental problems have nothing to do with the weather.
◦ Coin flips are independent.

• If events a and b are independent, then:

◦ P(a | b) = P(a)
◦ P(a, b) = P(a) P(b)
• P(cloud | toothache, catch, cavity) = P(cloud)
• P(toothache, catch, cavity, cloud) = P(cloud) P(toothache, catch, cavity)

CCIS@UTAS CSDS3203 15
Bayes’ Rule and Its Use
• Bayes' rule is derived from the product rule
• P(a, b) = P(a | b) P(b) and P(a, b) = P(b | a) P(a)
• Equating the two right-hand sides and dividing by P(b), we get:

P(a | b) = P(b | a) P(a) / P(b)

• Often, we perceive as evidence the effect of some unknown cause and we would like
to determine that cause. In that case, Bayes’ rule becomes:

P(cause | effect) = P(effect | cause) P(cause) / P(effect)
• The conditional probability 𝑃(𝑒𝑓𝑓𝑒𝑐𝑡|𝑐𝑎𝑢𝑠𝑒) quantifies the relationship in the causal
direction, whereas 𝑃(𝑐𝑎𝑢𝑠𝑒|𝑒𝑓𝑓𝑒𝑐𝑡) describes the diagnostic direction.

CCIS@UTAS CSDS3203 16
Bayes’ Rule Example

• A doctor knows that the disease meningitis causes a patient to have a stiff neck, say,
70% of the time. The doctor also knows some unconditional facts: the prior probability
that any patient has meningitis is 1/50,000, and the prior probability that any patient
has a stiff neck is 1%. Letting 𝑠 be the proposition that the patient has a stiff neck and
𝑚 be the proposition that the patient has meningitis, what is the probability of
meningitis given that someone has a stiff neck?

CCIS@UTAS CSDS3203 17
Bayes’ Rule Solution

• A doctor knows that the disease meningitis causes a patient to have a stiff neck, say,
70% of the time. The doctor also knows some unconditional facts: the prior probability
that any patient has meningitis is 1/50,000, and the prior probability that any patient
has a stiff neck is 1%. Letting 𝑠 be the proposition that the patient has a stiff neck and
𝑚 be the proposition that the patient has meningitis, what is the probability of
meningitis given that someone has a stiff neck?

• P(s | m) = 0.7
• P(m) = 1/50000
• P(s) = 0.01
• P(m | s) = P(s | m) P(m) / P(s) = (0.7 × 1/50000) / 0.01 = 0.0014
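A quick check of the arithmetic in Python (a sketch, not part of the slides):

```python
p_s_given_m, p_m, p_s = 0.7, 1 / 50000, 0.01

# Bayes' rule: P(m | s) = P(s | m) P(m) / P(s)
print(p_s_given_m * p_m / p_s)   # 0.0014
```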

CCIS@UTAS CSDS3203 18
Outline

1. Quantifying Uncertainty

2. Bayesian Networks

3. Probabilistic Inference Over Time

CCIS@UTAS CSDS3203 19
What is a Bayesian Network?

• A Bayesian network is a graphical model that represents conditional


dependencies among random variables.
• Formally, a Bayesian network is a directed graph in which each node is
annotated with quantitative probability information. The full specification is as
follows:
◦ Each node represents a random variable, which may be discrete or continuous.
◦ Edges connect pairs of nodes. If there is an edge from node X to node Y, X is said
to be a parent of Y. The graph has no directed cycles and hence is a directed acyclic
graph, or DAG.
◦ Each node Xᵢ has an associated conditional probability distribution P(Xᵢ | Parents(Xᵢ)) that
quantifies the effect of the parents on the node.

CCIS@UTAS CSDS3203 20
Example

• Dr. Khamis Juma Al Sabti is out of the country for a conference. Before
leaving he installed a new alarm system in his house. The alarm can go off
for one of two causes: either the house is burgled or there is an
earthquake. If the alarm goes off, one of Dr. Khamis’s siblings (Moza
or Salim) will call. Note that Moza and Salim will not confer with each
other before calling.

• Draw the Bayesian network for this scenario.

CCIS@UTAS CSDS3203 21
Bayesian Network: Alarm Example

Burglary → Alarm ← Earthquake
Alarm → Salim Calls
Alarm → Moza Calls

CCIS@UTAS CSDS3203 22
Exercise

• The stock price of an oil company (SP), which is assumed to be a
multivalued discrete variable, is affected by both the condition of the oil
industry (OI) and the stock market (SM). Both OI and SM are binary
variables with values good and bad. Further, the interest rate (IR) set by
the central bank impacts the performance of the stock market. For
simplicity, assume that IR can be either high or low.

• Draw a Bayesian network for this domain

CCIS@UTAS CSDS3203 23
Conditional Probability Tables (CPTs)

• The probability distribution of each node in Bayesian network is specified as a


conditional probability table (CPT).
• A conditional probability table shows the probability of each possible outcome of a
random variable, given the values of one or more other random variables.
• For example, a conditional probability table could be used to show the probability that a
person has a certain Disease, given their Age and Gender.
• In this case, the table would have three columns: one for the outcome (the presence or
absence of the disease), one for the age of the person, and one for their gender.

CCIS@UTAS CSDS3203 24
CPTs: Alarm Example
Network structure: Burglary → Alarm ← Earthquake; Alarm → Salim Calls; Alarm → Moza Calls.

P(B): true 0.001, false 0.999
P(E): true 0.002, false 0.998

P(A | B, E):
B  E    A=true   A=false
t  t    0.70     0.30
t  f    0.01     0.99
f  t    0.70     0.30
f  f    0.01     0.99

P(S | A):
A    S=true   S=false
t    0.90     0.10
f    0.05     0.95

P(M | A):
A    M=true   M=false
t    0.70     0.30
f    0.01     0.99

CCIS@UTAS CSDS3203 25
Exercise

• Define probability tables for the Bayesian network created in the exercise on slide 23.
You can give any plausible probability values.

CCIS@UTAS CSDS3203 26
The Semantics of Bayesian Networks

• A Bayesian network allows us to calculate the joint probability of n random variables
using the following decomposition:

P(X₁, X₂, …, Xₙ) = ∏ᵢ₌₁ⁿ P(Xᵢ | Parents(Xᵢ))

• For example, we calculate the probability of
B = true, E = true, A = false, M = false, S = false:

P(b, e, ¬a, ¬m, ¬s) = P(¬m | ¬a) P(¬s | ¬a) P(¬a | b, e) P(b) P(e)
                    = 0.99 × 0.95 × 0.30 × 0.001 × 0.002
                    = 5.64 × 10⁻⁷
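A minimal Python sketch of this decomposition for the alarm network, with the CPTs from slide 25 encoded as dictionaries (the variable names and dictionary layout are illustrative assumptions):

```python
# CPTs of the alarm network (slide 25): probability that each variable is true.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.70, (True, False): 0.01,
       (False, True): 0.70, (False, False): 0.01}   # P(A = true | B, E)
P_S = {True: 0.90, False: 0.05}                     # P(S = true | A)
P_M = {True: 0.70, False: 0.01}                     # P(M = true | A)

def joint(b, e, a, s, m):
    """P(B=b, E=e, A=a, S=s, M=m) as the product of P(X_i | Parents(X_i))."""
    p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p_s = P_S[a] if s else 1 - P_S[a]
    p_m = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * p_a * p_s * p_m

print(joint(True, True, False, False, False))   # ≈ 5.64e-07
```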

CCIS@UTAS CSDS3203 27
Probabilistic Inference Using Bayesian Networks

• Probabilistic inference is a method of reasoning that allows us to


determine the likelihood of something based on incomplete or uncertain
information.

• The outcome of probabilistic inference is either a probability value of some


event or a probability distribution for a random variable.

• Inference in Bayesian networks is a process that uses the structure of the


network and the probabilities of the variables to make predictions or
determine the likelihood of certain events.

CCIS@UTAS CSDS3203 28
Probabilistic Inference Using Bayesian Networks

• Inference in Bayesian networks utilizes various components to compute probability


distributions.
• Types of variables
◦ Query Variable (𝑌): This is the target variable for which the probability distribution is desired.
◦ Evidence Variables (𝐸): These are the observed variables that provide context and information for
inference. For instance, if an earthquake is observed, it informs the likelihood for related events.
◦ Hidden Variables (𝐻): These variables are neither directly observed (evidence) nor the current focus
of the query, but they may influence other variables within the network. For example, the occurrence
of an earthquake may or may not trigger an alarm, and if we haven't observed the alarm, it remains a
hidden variable.
• The objective of inference is to determine the probability distribution of the query
variable given the evidence, denoted as P(Y | E). For example, we may aim to assess the
probability of Moza making a call given the evidence of an earthquake occurrence.

CCIS@UTAS CSDS3203 29
Inference through Enumeration in Bayesian Networks
• This approach involves calculating the probability for a specific variable or a group of
variables within a Bayesian network.
• The process entails listing all possible value combinations for the network’s variables
and computing the exact probability for each possible combination.
• To compute the probability distribution of a query variable Y given the evidence E and
accounting for any hidden variables H, the following formula is applied:

P(Y | E) = α Σₕ P(Y, E, H = h)   (summing over all values h of the hidden variables H)

• where α is a normalizing constant ensuring that the probabilities sum to 1.

• Note: This technique is also known as Exact Inference.
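A sketch of inference by enumeration for the alarm network, reusing the joint() function idea from the earlier sketch (the helper names are illustrative; this is unoptimized exact inference over all 2⁵ worlds):

```python
from itertools import product

# CPTs of the alarm network (slide 25); variable order: B, E, A, S, M.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.70, (True, False): 0.01,
       (False, True): 0.70, (False, False): 0.01}
P_S = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, s, m):
    p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p_s = P_S[a] if s else 1 - P_S[a]
    p_m = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * p_a * p_s * p_m

def enumerate_query(query_var, evidence):
    """P(query_var | evidence): sum joint entries consistent with the evidence, then normalize."""
    names = ["B", "E", "A", "S", "M"]
    dist = {True: 0.0, False: 0.0}
    for values in product([True, False], repeat=5):
        world = dict(zip(names, values))
        if all(world[var] == val for var, val in evidence.items()):
            dist[world[query_var]] += joint(*values)
    alpha = 1.0 / sum(dist.values())
    return {v: alpha * p for v, p in dist.items()}

# e.g., the probability distribution of Moza calling given an earthquake
print(enumerate_query("M", {"E": True}))
```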

CCIS@UTAS CSDS3203 30
Exercise

Using the CPTs on slide 25, calculate the following probabilities using
inference by enumeration:
◦ Calculate the probability that a burglary occurs with no earthquake, the alarm
does not sound, only Salim calls, and Moza does not.
◦ Given that both a burglary and an earthquake have occurred, what is the
probability distribution of Salim calling?

CCIS@UTAS CSDS3203 31
Homework

• Given that both a burglary and an earthquake have occurred, what is the probability
distribution of Moza calling?

Challenging:

• Given that both Salim and Moza call, calculate the probability distribution of Burglary.

CCIS@UTAS CSDS3203 32
Computational Complexity of Enumeration-Based Inference
• While enumeration is a valuable technique for calculating probabilities in Bayesian
networks, its computational demands increase with the network's complexity due to the
growing number of variable combinations.
• Complexity Considerations
◦ Single-Connection Networks:
− Time and space complexity is linear (𝑂(𝑛)), making enumeration tractable.
− Defined as networks where a single undirected path exists between any two nodes.
◦ Multiple-Connection Networks
− Complexity can skyrocket to exponential levels in the worst case (O(cⁿ)), where c is a constant factor and
n is the count of nodes.
− These are networks with several undirected paths between nodes.
• Strategies for Complex Networks
◦ To manage complex networks more effectively, we must employ more efficient computational
strategies, such as sampling methods.

CCIS@UTAS CSDS3203 33
Sampling as an Approximation Technique in Bayesian Networks

• For intricate networks, sampling provides an efficient means to approximate


probabilities, circumventing the high computational costs of exact methods.
• Sampling involves generating random samples from the network and then leveraging
these samples to estimate event probabilities.
• The steps for performing sampling in Bayesian networks are as follows:
◦ Select an event for which the probability is sought.
◦ Identify all variables pertinent to the chosen event.
◦ Use the network's distributions to produce a substantial quantity of random samples.
◦ Determine the proportion of samples where the chosen event occurs and use this ratio to
approximate the probability of the event.
• Note: This approach allows for probability estimations that are close to true values,
especially with a large number of samples, thus facilitating analysis of complex
networks.
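A sketch of this procedure (often called prior sampling) for the alarm network, assuming the CPTs from slide 25:

```python
import random

# CPTs of the alarm network (slide 25): P(variable = true | parents).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.70, (True, False): 0.01,
       (False, True): 0.70, (False, False): 0.01}
P_S = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def bernoulli(p):
    return random.random() < p   # True with probability p

def sample_once():
    """Sample every variable in topological order (parents before children)."""
    b = bernoulli(P_B)
    e = bernoulli(P_E)
    a = bernoulli(P_A[(b, e)])
    s = bernoulli(P_S[a])
    m = bernoulli(P_M[a])
    return {"B": b, "E": e, "A": a, "S": s, "M": m}

samples = [sample_once() for _ in range(10_000)]
print(sum(s["A"] for s in samples) / len(samples))   # estimate of P(A = true)
```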

CCIS@UTAS CSDS3203 34
Car Insurance Bayesian Network

Sampling methods become useful when dealing with large, complex networks such as the
car-insurance network, whose nodes include SocioEcon, Age, GoodStudent, ExtraCar,
Mileage, RiskAversion, VehicleYear, SeniorTrain, DrivingSkill, MakeModel, DrivingHist,
Antilock, DrivQuality, Airbag, CarValue, HomeBase, AntiTheft, Ruggedness, Accident,
Theft, OwnDamage, Cushioning, OwnCost, OtherCost, MedicalCost, LiabilityCost, and
PropertyCost.

CCIS@UTAS CSDS3203 35
Example: Sampling from the Alarm Network

We use the alarm network and its CPTs from slide 25 (P(B), P(E), P(A | B, E), P(S | A), P(M | A)).
A sample is generated by drawing a value for each variable in topological order, filling in a
row (B, E, A, S, M).
CCIS@UTAS CSDS3203 36
Example

Sample B from P(B): true 0.001, false 0.999.
Sample row (B, E, A, S, M): empty so far.
CCIS@UTAS CSDS3203 37
Example

Sample E from P(E): true 0.002, false 0.998.
Sample row (B, E, A, S, M): B = false.
CCIS@UTAS CSDS3203 38
Example

Sample A from P(A | B, E):
B  E    A=true   A=false
t  t    0.70     0.30
t  f    0.01     0.99
f  t    0.70     0.30
f  f    0.01     0.99
Sample row (B, E, A, S, M): B = false, E = true.
CCIS@UTAS CSDS3203 39
Example

Sample S from P(S | A):
A    S=true   S=false
t    0.90     0.10
f    0.05     0.95
Sample row (B, E, A, S, M): B = false, E = true, A = true.
CCIS@UTAS CSDS3203 40
Example

Sample M from P(M | A):
A    M=true   M=false
t    0.70     0.30
f    0.01     0.99
Sample row (B, E, A, S, M): B = false, E = true, A = true, S = true.
CCIS@UTAS CSDS3203 41
Example

The completed sample:
B = false, E = true, A = true, S = true, M = false.
CCIS@UTAS CSDS3203 42
Generated Samples
The previous slides showed how to generate a single sample. The process is repeated
many times to generate a large number of samples.
B E A S M B E A S M
false true true true false false true true true true

B E A S M B E A S M
false true true false false true true true true false

B E A S M B E A S M
true false true false false false false false true true

B E A S M B E A S M
true true true false false false false true true false

……

CCIS@UTAS CSDS3203 43
Using Samples to Estimate Probabilities
To use sampling to calculate the probability distribution of an earthquake given that Moza
calls (i.e., P(E | M = true)), follow these steps:
1. Collect a large number of samples using the method described in the previous slides,
say 10,000 samples.
2. Filter Samples: Go through the 10,000 samples and keep only those in which
Moza calls (i.e., M = true).
3. Count Relevant Samples: Count the number of filtered samples where the
earthquake occurred (i.e., E = true).
4. Calculate Probability: Calculate the conditional probability P(E = true | M = true) by
dividing the count of relevant samples by the total number of samples where Moza
calls:

P(E = true | M = true) = (number of samples with E = true and M = true) / (total number of samples with M = true)

Repeat steps 3 and 4 with E = false.
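A sketch of this estimate (rejection sampling) for the alarm network, reusing the sample_once() generator from the earlier sampling sketch:

```python
import random

# CPTs of the alarm network (slide 25): P(variable = true | parents).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.70, (True, False): 0.01,
       (False, True): 0.70, (False, False): 0.01}
P_S = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def sample_once():
    b = random.random() < P_B
    e = random.random() < P_E
    a = random.random() < P_A[(b, e)]
    s = random.random() < P_S[a]
    m = random.random() < P_M[a]
    return {"B": b, "E": e, "A": a, "S": s, "M": m}

samples = [sample_once() for _ in range(100_000)]
kept = [s for s in samples if s["M"]]                  # step 2: keep samples with M = true
p_true = sum(s["E"] for s in kept) / len(kept)         # steps 3-4: fraction with E = true
print({True: p_true, False: 1 - p_true})               # estimate of P(E | M = true)
```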

CCIS@UTAS CSDS3203 44
Outline

1. Quantifying Uncertainty

2. Bayesian Networks

3. Probabilistic Inference Over Time

CCIS@UTAS CSDS3203 45
Probabilistic Inference Over Time

• So far, we have primarily discussed probabilities based on observed information,


without incorporating the dimension of time.

• To integrate time, we introduce a variable X indexed by time and representing the event
of interest, such that Xₜ is the current event, Xₜ₊₁ is the next event, and so on.

• For predicting future events, we will utilize Markov models that leverage the
time-structured variable X.

CCIS@UTAS CSDS3203 46
Why Inference Over Time?
• Inference over time is crucial in many fields that require modeling sequential or
time-dependent phenomena.

• Examples:
◦ Weather Forecasting: Predicting weather conditions helps in planning agricultural activities,
construction projects, and disaster preparedness.

◦ Healthcare Monitoring: Continuous monitoring of patient vitals can predict critical health events,
allowing for timely intervention.

◦ Text Prediction: In applications like email or messaging, predicting the next word or sentence helps
in faster and more efficient communication.

◦…

CCIS@UTAS CSDS3203 47
The Markov Assumption
• Markov Assumption Basics: The Markov assumption posits that the current state
depends only on a limited, fixed number of previous states, simplifying the prediction
process.
• Practical Necessity: Considering all past data (e.g., a year's weather) for predictions is
impractical due to computational limitations and diminishing relevance of older data.
• Application Example: In weather forecasting, using the Markov assumption allows the
consideration of only recent data (e.g., the previous few days) rather than the entire
historical record.
• Simplification and Efficiency: By applying the Markov assumption, predictions become
more computationally feasible and manageably approximate, although they might be
less precise.
• Specific Model Use: Markov models often utilize data from the most recent event (e.g.,
using today’s weather to predict tomorrow's) to efficiently forecast future states.

CCIS@UTAS CSDS3203 48
Markov Chains

• Definition: A Markov chain consists of a sequence of random variables, each influenced


only by the immediate preceding state in accordance with the Markov assumption.

• Sequential Dependence: The occurrence of each event in the chain is determined by


the state of the event directly before it, reflecting a dependency solely on this prior
state.

• Construction Requirements: To build a Markov chain, a transition model is essential,


detailing the probability distributions for future events based on the current event's
state.
◦ The transition model specifies the probability of moving from one state to the next.

CCIS@UTAS CSDS3203 49
Transition Model: Example

Transition model P(Xₜ₊₁ | Xₜ):

Today (Xₜ)     Tomorrow = sun   Tomorrow = rain
sun            0.8              0.2
rain           0.3              0.7

CCIS@UTAS CSDS3203 50
First Markov Chain
Given the transition model in the previous slide we can calculate the probability of a
sequence of events, such as observing two sunny days followed by four rainy days.

◦ Assume that the initial probability of both events (rainy or sunny) is 0.5.

X₀ = sun, X₁ = sun, X₂ = rain, X₃ = rain, X₄ = rain, X₅ = rain
(per-step probabilities: 0.5, 0.8, 0.2, 0.7, 0.7, 0.7)

𝑃 𝑠𝑢𝑛, 𝑠𝑢𝑛, 𝑟𝑎𝑖𝑛, 𝑟𝑎𝑖𝑛, 𝑟𝑎𝑖𝑛, 𝑟𝑎𝑖𝑛 = 0.5 × 0.8 × 0.2 × 0.7 × 0.7 × 0.7 = 0.02744

CCIS@UTAS CSDS3203 51
Inference with Markov Chains
• Given some sequence X of length t, we can compute how probable the sequence is
under a Markov chain model using the following formula:

P(X) = P(x₁) ∏ᵢ₌₂ᵗ P(xᵢ | xᵢ₋₁)

• A key property of a (first-order) Markov chain is that the probability of each xᵢ depends
only on the value of xᵢ₋₁.
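A minimal sketch of this computation for the weather chain (initial distribution 0.5/0.5 and the transition model from the earlier slide):

```python
# Transition model P(X_t+1 | X_t) and initial distribution P(X_0).
transition = {"sun": {"sun": 0.8, "rain": 0.2},
              "rain": {"sun": 0.3, "rain": 0.7}}
initial = {"sun": 0.5, "rain": 0.5}

def sequence_probability(states):
    """P(X) = P(x_1) * product over i >= 2 of P(x_i | x_{i-1})."""
    p = initial[states[0]]
    for prev, curr in zip(states, states[1:]):
        p *= transition[prev][curr]
    return p

print(sequence_probability(["sun", "sun", "rain", "rain", "rain", "rain"]))   # 0.02744
```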

CCIS@UTAS CSDS3203 52
Hidden Markov Models

• In some applications, the observations we make are influenced by states that are hidden from us.
• Here we need a model that allows us to relate observed outcomes to those hidden states.
◦ We can observe the events generated by the states, but not the states themselves.
• Hidden Markov models (HMMs) are probabilistic models that involve underlying states
that are not directly observable but that influence observable events.
• Applications:
◦ In speech recognition: spoken words (hidden states) are inferred from sound waves (observations).
◦ In Web search: user engagement with the results is a hidden state, which is inferred from
clickthrough logs (observations).

CCIS@UTAS CSDS3203 53
Using HMMs
• Consider having a camera outside your home that records people carrying umbrellas.
People carry umbrellas both on sunny and rainy days. However, some people never
carry umbrellas, regardless of the weather. In this scenario, observing someone with or
without an umbrella is the observable event, while the actual weather condition (sunny
or rainy) represents the hidden state.

Sensor model P(Eₜ | Xₜ):

State (Xₜ)    Eₜ = umbrella   Eₜ = no umbrella
sun           0.2             0.8
rain          0.9             0.1

CCIS@UTAS CSDS3203 54
Sensor Markov Assumption

• The sensor Markov assumption assumes that the observable evidence depends solely
on the corresponding hidden state.

• In our example, it's assumed that carrying an umbrella is dictated only by the weather
condition.

• Limitation of Reality: The assumption may not capture all factors, as some individuals
carry umbrellas irrespective of weather based on personal habits or preferences.

• By ignoring individual behavioral nuances, the sensor Markov assumption simplifies


the relationship between observations and hidden states.

CCIS@UTAS CSDS3203 55
HMMs A Two Layers View
A hidden Markov model can be depicted as a two-layer Markov chain, where the top
layer, variable 𝑋, represents the hidden state, and the bottom layer, variable 𝐸,
represents the observable evidence.

X₀ → X₁ → X₂ → X₃ → X₄   (hidden states)
 ↓     ↓     ↓     ↓     ↓
E₀    E₁    E₂    E₃    E₄   (observable evidence)

CCIS@UTAS CSDS3203 56
Inference on HMMs
Hidden Markov models facilitate several key tasks:

• Filtering: Computes the current state's probability distribution based on all prior
observations, such as determining if it's raining today based on historical umbrella
usage.

• Prediction: Estimates future state probabilities using past and present observations.

• Smoothing: Determines past state probabilities using data up to the present, like
predicting yesterday’s weather from today’s umbrella sightings.

• Most Likely Explanation: Identifies the most probable sequence of events based on
observed data.
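As an illustration of filtering, here is a sketch of the forward algorithm for the umbrella HMM, using the transition and sensor models from the earlier slides (the dictionary layout and function names are illustrative assumptions):

```python
# Umbrella HMM: hidden weather state, observed umbrella carrying.
transition = {"sun": {"sun": 0.8, "rain": 0.2},
              "rain": {"sun": 0.3, "rain": 0.7}}           # P(X_t | X_{t-1})
sensor = {"sun": {"umbrella": 0.2, "no_umbrella": 0.8},
          "rain": {"umbrella": 0.9, "no_umbrella": 0.1}}   # P(E_t | X_t)
prior = {"sun": 0.5, "rain": 0.5}                          # P(X_0)

def normalize(dist):
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}

def forward_step(belief, observation):
    """One filtering step: predict with the transition model, then weight by the evidence."""
    predicted = {s: sum(belief[prev] * transition[prev][s] for prev in belief)
                 for s in transition}
    return normalize({s: sensor[s][observation] * predicted[s] for s in predicted})

def filtering(observations):
    belief = prior
    for obs in observations:
        belief = forward_step(belief, obs)
    return belief

# P(weather on day 2 | umbrella observed on days 1 and 2)
print(filtering(["umbrella", "umbrella"]))
```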
CCIS@UTAS CSDS3203 57
Recommended Reading

For more on the topics covered in this lecture please refer to the following
sources:
• Russell-Norvig Book (Russell & Norvig, 2020): Sections 12.1 – 12.5
(Quantifying Uncertainty), Sections 13.1 – 13.3 (Probabilistic
Reasoning), Sections 14.1 – 14.3 (Probabilistic Reasoning Over
Time).

CCIS@UTAS CSDS3203 58
References

Russell, S., & Norvig, P. (2020). Artificial Intelligence: A Modern Approach (4th ed.).
Pearson. http://aima.cs.berkeley.edu/

Yu, B., & Malan, D. J. (2024). Lecture 2: Uncertainty. CS50's Introduction to Artificial
Intelligence with Python. Retrieved May 11, 2024, from
https://cs50.harvard.edu/ai/2024/notes/2/

CCIS@UTAS CSDS3203 59
