
CMPT310: Probability, Bayes Nets
Chp: 13 and 14

Reza Nezami
[These slides were extracted from the course taught by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at …]
Uncertainty
■ The real world is rife with uncertainty!
■ E.g., if I leave for SFO 60 minutes before my flight, will I be there in time?
■ Problems:
■ partial observability (road state, other drivers’ plans, etc.)
■ noisy sensors (radio traffic reports, Google maps)
■ immense complexity of modelling and predicting traffic, security line, etc.
■ lack of knowledge of world dynamics (will a tire burst? will I get in a crash?)
■ Probabilistic assertions summarize the effects of ignorance and laziness
■ Combine probability theory + utility theory -> decision theory
■ Maximize expected utility: a* = argmax_a Σ_s P(s | a) U(s)
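
A minimal Python sketch of the MEU principle above. The actions, outcome probabilities, and utilities are made up for illustration (they are not from the slides); it simply picks the action a maximizing Σ_s P(s | a) U(s).

# Maximum expected utility (MEU) action selection; numbers below are illustrative only.
outcome_probs = {                                    # P(s | a) for each action a
    "leave_60_min_early": {"on_time": 0.7, "late": 0.3},
    "leave_120_min_early": {"on_time": 0.95, "late": 0.05},
}
utility = {"on_time": 100, "late": -200}             # U(s)

def expected_utility(action):
    """Compute sum_s P(s | a) * U(s) for one action."""
    return sum(p * utility[s] for s, p in outcome_probs[action].items())

best_action = max(outcome_probs, key=expected_utility)   # a* = argmax_a EU(a)
print(best_action, expected_utility(best_action))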
Basic laws of probability (discrete)
■ Begin with a set Ω of possible worlds
■ E.g., 6 possible rolls of a die, {1, 2, 3, 4, 5, 6}

■ A probability model assigns a number P(ω) to each world ω
■ E.g., P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6

■ These numbers must satisfy:
■ 0 <= P(ω) <= 1
■ Σ_ω P(ω) = 1
Basic laws contd.
■ An event is any subset of Ω
■ E.g., "roll < 4" is the set {1, 2, 3}
■ E.g., "roll is odd" is the set {1, 3, 5}

■ The probability of an event is the sum of probabilities over its worlds
■ P(A) = Σ_{ω in A} P(ω)
■ E.g., P(roll < 4) = P(1) + P(2) + P(3) = 1/2

■ De Finetti (1931): anyone who bets according to probabilities that
violate these laws can be forced to lose money on every set of bets
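
A minimal Python sketch of the die example above: a set of worlds, a probability model over them, and event probabilities computed by summing over worlds.

from fractions import Fraction

# P(ω) for each possible world ω of the die
P = {w: Fraction(1, 6) for w in [1, 2, 3, 4, 5, 6]}

def prob(event):
    """P(A) = sum of P(ω) over the worlds ω in the event A."""
    return sum(P[w] for w in event)

roll_less_than_4 = {1, 2, 3}
roll_is_odd = {1, 3, 5}
print(prob(roll_less_than_4))   # 1/2
print(prob(roll_is_odd))        # 1/2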
Probability Distributions
■ Associate a probability with each value; sums to 1

■ Temperature: P(T)        ■ Weather: P(W)
   T     P                    W       P
   hot   0.5                  sun     0.6
   cold  0.5                  rain    0.1
                              fog     0.3
                              meteor  0.0

■ Joint distribution P(T, W), whose marginal distributions are the tables above:
                 T = hot   T = cold
   W = sun        0.45       0.15
   W = rain       0.02       0.08
   W = fog        0.03       0.27
   W = meteor     0.00       0.00

■ Can't deduce the joint from the marginals
■ Can deduce the marginals from the joint
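
A minimal Python sketch of the "marginals from the joint" point, using the P(T, W) table above: summing out one variable recovers each marginal.

from collections import defaultdict

joint = {  # P(T, W) from the slide
    ("hot", "sun"): 0.45, ("cold", "sun"): 0.15,
    ("hot", "rain"): 0.02, ("cold", "rain"): 0.08,
    ("hot", "fog"): 0.03, ("cold", "fog"): 0.27,
    ("hot", "meteor"): 0.00, ("cold", "meteor"): 0.00,
}

P_T, P_W = defaultdict(float), defaultdict(float)
for (t, w), p in joint.items():
    P_T[t] += p   # sum out W
    P_W[w] += p   # sum out T

print(dict(P_T))  # ≈ {'hot': 0.5, 'cold': 0.5}
print(dict(P_W))  # ≈ {'sun': 0.6, 'rain': 0.1, 'fog': 0.3, 'meteor': 0.0}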
Probability Distributions
• Unobserved random variables have distributions

   P(T)            P(W)
   T     P         W       P
   hot   0.5       sun     0.6
   cold  0.5       rain    0.1
                   fog     0.3
                   meteor  0.0

• A distribution is a TABLE of probabilities of values
• A probability (lower case value) is a single number, e.g. P(W = rain) = 0.1
• Must have: P(X = x) >= 0 for every x, and Σ_x P(X = x) = 1
Joint Distributions
• A joint distribution over a set of random variables X1, X2, …, Xn
  specifies a real number for each assignment (or outcome): P(X1 = x1, X2 = x2, …, Xn = xn)

   T     W     P
   hot   sun   0.4
   hot   rain  0.1
   cold  sun   0.2
   cold  rain  0.3

• Must obey: P(x1, …, xn) >= 0 and Σ_{x1,…,xn} P(x1, …, xn) = 1

• Size of the distribution if n variables with domain sizes d?  d^n

• For all but the smallest distributions, impractical to write out!
Probabilistic Models
• A probabilistic model is a joint distribution over a set of random variables

• Probabilistic models:
  • (Random) variables with domains
  • Assignments are called outcomes
  • Joint distribution: says whether assignments (outcomes) are likely or not
  • Normalized: sums to 1.0
  • Ideally: only certain variables directly interact

   Distribution over T, W
   T     W     P
   hot   sun   0.4
   hot   rain  0.1
   cold  sun   0.2
   cold  rain  0.3
Events
• An event is a set E of outcomes: P(E) = Σ_{(x1,…,xn) in E} P(x1, …, xn)

• From a joint distribution, we can calculate the probability of any event

   T     W     P
   hot   sun   0.4
   hot   rain  0.1
   cold  sun   0.2
   cold  rain  0.3

• Probability that it's hot AND sunny?
• Probability that it's hot?
• Probability that it's hot OR sunny?

• Typically, the events we care about are partial assignments, like P(T=hot)
Marginal Distributions
• Marginal distributions are sub-tables which eliminate variables
• Marginalization (summing out): combine collapsed rows by adding
  P(t) = Σ_w P(t, w)  and  P(w) = Σ_t P(t, w)

   T     W     P          T     P        W     P
   hot   sun   0.4        hot   0.5      sun   0.6
   hot   rain  0.1        cold  0.5      rain  0.4
   cold  sun   0.2
   cold  rain  0.3
Conditional Probabilities
• A simple relation between joint and conditional probabilities
• In fact, this is taken as the definition of a conditional probability:

  P(a | b) = P(a, b) / P(b)

   T     W     P
   hot   sun   0.4
   hot   rain  0.1
   cold  sun   0.2
   cold  rain  0.3

• E.g., P(W = sun | T = cold) = P(W = sun, T = cold) / P(T = cold) = 0.2 / 0.5 = 0.4
Normalizing a distribution
■ (Dictionary) To bring or restore to a normal condition

■ Procedure: make all entries sum to ONE
■ Multiply each entry by α = 1 / (sum over all entries)

■ Example: P(W | T=c) = P(W, T=c) / P(T=c) = α P(W, T=c), with α = 1/0.50 = 2

   P(W, T)                       select T=c     normalize
                T = hot  T = cold  P(W, T=c)     P(W | T=c)
   W = sun       0.45     0.15       0.15          0.30
   W = rain      0.02     0.08       0.08          0.16
   W = fog       0.03     0.27       0.27          0.54
   W = meteor    0.00     0.00       0.00          0.00
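
A minimal Python sketch of the normalization trick above: condition on T = cold by selecting the matching entries of P(W, T) and rescaling them so they sum to 1.

joint = {  # P(W, T) from the slide: (weather, temperature) -> probability
    ("sun", "hot"): 0.45, ("sun", "cold"): 0.15,
    ("rain", "hot"): 0.02, ("rain", "cold"): 0.08,
    ("fog", "hot"): 0.03, ("fog", "cold"): 0.27,
    ("meteor", "hot"): 0.00, ("meteor", "cold"): 0.00,
}

# Step 1: select entries consistent with the evidence T = cold
selected = {w: p for (w, t), p in joint.items() if t == "cold"}

# Step 2: normalize (multiply by alpha = 1 / sum of selected entries)
alpha = 1.0 / sum(selected.values())          # 1 / 0.50 = 2
P_W_given_cold = {w: alpha * p for w, p in selected.items()}

print(P_W_given_cold)  # ≈ {'sun': 0.3, 'rain': 0.16, 'fog': 0.54, 'meteor': 0.0}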
The Product Rule:  P(x, y) = P(x | y) P(y)

• Example:  P(D, W) = P(D | W) P(W)

   P(W)              P(D | W)                P(D, W)
   W     P           D     W     P           D     W     P
   sun   0.8         wet   sun   0.1         wet   sun   0.08
   rain  0.2         dry   sun   0.9         dry   sun   0.72
                     wet   rain  0.7         wet   rain  0.14
                     dry   rain  0.3         dry   rain  0.06
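
A minimal Python sketch of the product-rule table above: building the joint P(D, W) = P(D | W) P(W) entry by entry.

P_W = {"sun": 0.8, "rain": 0.2}                        # P(W)
P_D_given_W = {                                         # P(D | W)
    ("wet", "sun"): 0.1, ("dry", "sun"): 0.9,
    ("wet", "rain"): 0.7, ("dry", "rain"): 0.3,
}

# P(d, w) = P(d | w) * P(w) for every assignment
P_DW = {(d, w): p * P_W[w] for (d, w), p in P_D_given_W.items()}

print(P_DW)
# ≈ {('wet', 'sun'): 0.08, ('dry', 'sun'): 0.72, ('wet', 'rain'): 0.14, ('dry', 'rain'): 0.06}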
The Chain Rule
• More generally, we can always write any joint distribution as an
  incremental product of conditional distributions:

  P(x1, x2, …, xn) = P(x1) P(x2 | x1) P(x3 | x1, x2) … = Π_i P(xi | x1, …, x_{i-1})

• Why is this always true? Each factor is just the definition of a conditional
  probability, so the product telescopes back to the joint.
Probabilistic Inference
• Probabilistic inference: compute a desired
probability from other known probabilities (e.g.
conditional from joint)

• We generally compute conditional probabilities


• P(on time | no reported accidents) = 0.90
• These represent the agent’s beliefs given the evidence

• Probabilities change with new evidence:


• P(on time | no accidents, 5 a.m.) = 0.95
• P(on time | no accidents, 5 a.m., raining) = 0.80
• Observing new evidence causes beliefs to be updated
Inference by Enumeration
• General case:
  • Evidence variables:  E1, …, Ek = e1, …, ek
  • Query* variable:     Q
  • Hidden variables:    H1, …, Hr      (together, all the variables)
  • We want:  P(Q | e1, …, ek)
  * Works fine with multiple query variables, too

• Step 1: Select the entries consistent with the evidence
• Step 2: Sum out H to get the joint of the Query and the evidence:
  P(Q, e1, …, ek) = Σ_{h1,…,hr} P(Q, h1, …, hr, e1, …, ek)
• Step 3: Normalize:  P(Q | e1, …, ek) = P(Q, e1, …, ek) / P(e1, …, ek)
Inference by Enumeration example:

   S       T     W     P
   summer  hot   sun   0.30
   summer  hot   rain  0.05
   summer  cold  sun   0.10
   summer  cold  rain  0.05
   winter  hot   sun   0.10
   winter  hot   rain  0.05
   winter  cold  sun   0.15
   winter  cold  rain  0.20

• P(W)?
• P(W | winter)?
• P(W | winter, hot)?
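
A minimal Python sketch of inference by enumeration on the (S, T, W) table above: select the rows consistent with the evidence, sum out the hidden variables, and normalize. The helper name query_W is just for illustration.

from collections import defaultdict

joint = {  # P(S, T, W) from the slide
    ("summer", "hot", "sun"): 0.30, ("summer", "hot", "rain"): 0.05,
    ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
    ("winter", "hot", "sun"): 0.10, ("winter", "hot", "rain"): 0.05,
    ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20,
}

def query_W(evidence):
    """P(W | evidence), where evidence maps position 0 (S) or 1 (T) to a value."""
    unnormalized = defaultdict(float)
    for (s, t, w), p in joint.items():
        row = {0: s, 1: t}
        if all(row[i] == v for i, v in evidence.items()):  # Step 1: select
            unnormalized[w] += p                           # Step 2: sum out hidden vars
    z = sum(unnormalized.values())
    return {w: p / z for w, p in unnormalized.items()}     # Step 3: normalize

print(query_W({}))                       # P(W)               ≈ {'sun': 0.65, 'rain': 0.35}
print(query_W({0: "winter"}))            # P(W | winter)      ≈ {'sun': 0.5, 'rain': 0.5}
print(query_W({0: "winter", 1: "hot"}))  # P(W | winter, hot) ≈ {'sun': 0.67, 'rain': 0.33}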
Bayes' Rule
• Two ways to factor a joint distribution over two variables:

  P(x, y) = P(x | y) P(y) = P(y | x) P(x)          That's my rule!

• Dividing, we get:

  P(x | y) = P(y | x) P(x) / P(y)

• Why is this at all helpful?
• Lets us build one conditional from its reverse
• Often one conditional is tricky but the other one is simple
• Foundation of many probabilistic learning systems.
Inference with Bayes' Rule
• Example: diagnostic probability from causal probability:

  P(cause | effect) = P(effect | cause) P(cause) / P(effect)

• Example:
  • M: meningitis, S: stiff neck
  • P(+m | +s) = P(+s | +m) P(+m) / P(+s), computed from the example's givens:
    the causal probability P(+s | +m), the prior P(+m), and the evidence probability P(+s)

• Note: the posterior probability of meningitis is still very small
• Note: you should still get stiff necks checked out! Why?
Example: Tom's test result
• Tom takes a test for leukemia.
• It's known that patients with leukemia test positive 98% of the time.
• If a patient doesn't have leukemia, the result may still be positive 3% of the time.
• It is known that 0.8% of the population has leukemia.
• Tom's test comes back positive!
• What is the likelihood that Tom has leukemia? P(+L | +test)

  Prior: P(+L) = 0.008, P(-L) = 0.992
  P(+test | +L) = 0.98, P(+test | -L) = 0.03

  P(+L | +test) = P(+test | +L) * P(+L) / P(+test)

  P(+test) = P(+test | +L) * P(+L) + P(+test | -L) * P(-L)
           = 0.98 * 0.008 + 0.03 * 0.992 = 0.0078 + 0.0298 = 0.0376

  P(+L | +test) = (0.98 * 0.008) / 0.0376 ≈ 0.21

  so it means P(-L | +test) ≈ 0.79
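
A minimal Python sketch of the Bayes' rule computation above for Tom's test.

p_L = 0.008               # prior P(+L)
p_not_L = 1 - p_L         # P(-L) = 0.992
p_pos_given_L = 0.98      # P(+test | +L)
p_pos_given_not_L = 0.03  # P(+test | -L)

# Total probability of a positive test
p_pos = p_pos_given_L * p_L + p_pos_given_not_L * p_not_L   # ≈ 0.0376

# Bayes' rule
p_L_given_pos = p_pos_given_L * p_L / p_pos
print(round(p_L_given_pos, 2))      # 0.21
print(round(1 - p_L_given_pos, 2))  # 0.79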


Quiz: Bayes' Rule
• Given:

   P(W)              P(D | W)
   W     P           D     W     P
   sun   0.8         wet   sun   0.1
   rain  0.2         dry   sun   0.9
                     wet   rain  0.7
                     dry   rain  0.3

• What is P(W | dry)?
Independence
• Two variables are independent if:

  ∀x, y:  P(x, y) = P(x) P(y)

• This says that their joint distribution factors into a product of two simpler distributions

• Another form:

  ∀x, y:  P(x | y) = P(x)

• We write:  X ⊥ Y

• Independence is a simplifying modeling assumption

• Empirical joint distributions: at best "close" to independent

• What could we assume for {Weather, Traffic, Cavity, Toothache}?
Example: Independence?

   P(T)              P1(T, W)                P2(T, W)
   T     P           T     W     P           T     W     P
   hot   0.5         hot   sun   0.4         hot   sun   0.3
   cold  0.5         hot   rain  0.1         hot   rain  0.2
                     cold  sun   0.2         cold  sun   0.3
   P(W)              cold  rain  0.3         cold  rain  0.2
   W     P
   sun   0.6
   rain  0.4
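
A minimal Python sketch checking which of the two joint tables above factors as P(T) * P(W), i.e. which one makes T and W independent.

import itertools

P_T = {"hot": 0.5, "cold": 0.5}
P_W = {"sun": 0.6, "rain": 0.4}

P1 = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
      ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}
P2 = {("hot", "sun"): 0.3, ("hot", "rain"): 0.2,
      ("cold", "sun"): 0.3, ("cold", "rain"): 0.2}

def independent(joint):
    """True if joint(t, w) == P_T(t) * P_W(w) for every assignment."""
    return all(abs(joint[(t, w)] - P_T[t] * P_W[w]) < 1e-9
               for t, w in itertools.product(P_T, P_W))

print(independent(P1))  # False: e.g. 0.4 != 0.5 * 0.6
print(independent(P2))  # True: every entry equals the product of the marginals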
Example: Smoke alarm
 Variables:
  F: There is fire
  S: There is smoke
  A: Alarm sounds
Conditional Independence Examples
 What about this domain:

 Fire
 Smoke
 Alarm
Conditional Independence
• P(Toothache, Cavity, Catch)

• If I have a cavity, the probability that the probe catches it
  doesn't depend on whether I have a toothache:
  • P(+catch | +toothache, +cavity) = P(+catch | +cavity)

• The same independence holds if I don't have a cavity:
  • P(+catch | +toothache, -cavity) = P(+catch | -cavity)

• Catch is conditionally independent of Toothache given Cavity:


• P(Catch | Toothache, Cavity) = P(Catch | Cavity)
 Equivalent statements:
 P(Toothache | Catch , Cavity) = P(Toothache | Cavity)
 P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
 One can be derived from the other easily
Conditional Independence
• Unconditional (absolute) independence is very rare (why?)

• Conditional independence is our most basic and robust form of
  knowledge about uncertain environments.

• X is conditionally independent of Y given Z:  X ⊥ Y | Z

  if and only if:  ∀x, y, z:  P(x, y | z) = P(x | z) P(y | z)

  or, equivalently, if and only if:  ∀x, y, z:  P(x | z, y) = P(x | z)
Conditional Independence and the Chain Rule
• Chain rule:  P(X, Y, Z) = P(Z) P(Y | Z) P(X | Z, Y)

• Trivial decomposition: the chain rule holds for any ordering of the variables

• With the assumption of conditional independence (X ⊥ Y | Z):
  P(X, Y, Z) = P(Z) P(Y | Z) P(X | Z)

• Bayes' nets / graphical models help us express conditional independence assumptions
Bayes’ Nets: Big Picture
• Problems with using full joint distribution tables as our
probabilistic models:
• Unless there are only a few variables, the joint is WAY too big
to represent explicitly
• Hard to learn (estimate) anything empirically about more than
a few variables at a time

• Bayes’ nets: a technique for describing complex joint


distributions (models) using simple, local distributions
(conditional probabilities)
• More properly called graphical models
• We describe how variables locally interact
• Local interactions chain together to give global, indirect
interactions
• For now, we’ll be vague about how these interactions are
specified
Graphical Model Notation
• Nodes: variables (with domains)
• Can be assigned (observed) or unassigned
(unobserved)

• Arcs: interactions
• Similar to CSP constraints
• Indicate “direct influence” between variables
• Formally: encode conditional independence
(more later)

• For now: imagine that arrows mean direct


causation (in general, they don’t!)
Example: Traffic
• Variables:
  • R: It rains
  • T: There is traffic

• Model 1: independence          (two unconnected nodes:  R    T)
• Model 2: rain causes traffic   (an arc  R -> T)

• Why is an agent using model 2 better?
Bayes' Net Examples
Bayes' Net Examples: Car won't Start!
Example Bayes' Net: Insurance
Bayes' Net Semantics
• A set of nodes, one per variable X

• A directed, acyclic graph

• A conditional distribution for each node X given its parent
  variables A1, …, An in the graph:  P(X | A1, …, An)

• A collection of distributions over X, one for each combination of
  the parents' values

• CPT: conditional probability table

• Description of a noisy "causal" process

A Bayes net = Topology (graph) + Local Conditional Probabilities
Exercise: Alarm Network
• Variables
• B: Burglary
• A: Alarm goes off
• M: Mary calls
• J: John calls
• E: Earthquake!
Example: Bayesian network vs joint distribution parameter count
Probabilities in BNs
• Bayes' nets implicitly encode joint distributions

• As a product of local conditional distributions

• To see what probability a BN gives to a full assignment, multiply all the
  relevant conditionals together:

  P(x1, x2, …, xn) = Π_i P(xi | parents(Xi))

• The chain rule gives the same product once the conditional independence assumptions are applied

• Example (alarm network, next slide):
  P(+b, -e, +a, +j, +m) = P(+b) P(-e) P(+a | +b, -e) P(+j | +a) P(+m | +a)
Example: Alarm Network
   Graph:  Burglary -> Alarm <- Earthquake,  Alarm -> JohnCalls,  Alarm -> MaryCalls

   B    P(B)           E    P(E)
   +b   0.001          +e   0.002
   -b   0.999          -e   0.998

   B    E    A    P(A|B,E)        A    J    P(J|A)       A    M    P(M|A)
   +b   +e   +a   0.95            +a   +j   0.9          +a   +m   0.7
   +b   +e   -a   0.05            +a   -j   0.1          +a   -m   0.3
   +b   -e   +a   0.94            -a   +j   0.05         -a   +m   0.01
   +b   -e   -a   0.06            -a   -j   0.95         -a   -m   0.99
   -b   +e   +a   0.29
   -b   +e   -a   0.71
   -b   -e   +a   0.001
   -b   -e   -a   0.999
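
A minimal Python sketch of the point above: the probability the alarm network assigns to one full assignment, e.g. P(+b, -e, +a, +j, +m), is the product of the local conditionals from the CPTs.

P_B = {"+b": 0.001, "-b": 0.999}
P_E = {"+e": 0.002, "-e": 0.998}
P_A_given_BE = {("+b", "+e", "+a"): 0.95, ("+b", "+e", "-a"): 0.05,
                ("+b", "-e", "+a"): 0.94, ("+b", "-e", "-a"): 0.06,
                ("-b", "+e", "+a"): 0.29, ("-b", "+e", "-a"): 0.71,
                ("-b", "-e", "+a"): 0.001, ("-b", "-e", "-a"): 0.999}
P_J_given_A = {("+a", "+j"): 0.9, ("+a", "-j"): 0.1, ("-a", "+j"): 0.05, ("-a", "-j"): 0.95}
P_M_given_A = {("+a", "+m"): 0.7, ("+a", "-m"): 0.3, ("-a", "+m"): 0.01, ("-a", "-m"): 0.99}

def full_assignment_prob(b, e, a, j, m):
    """P(b, e, a, j, m) = P(b) P(e) P(a | b, e) P(j | a) P(m | a)."""
    return (P_B[b] * P_E[e] * P_A_given_BE[(b, e, a)]
            * P_J_given_A[(a, j)] * P_M_given_A[(a, m)])

# 0.001 * 0.998 * 0.94 * 0.9 * 0.7 ≈ 0.000591
print(full_assignment_prob("+b", "-e", "+a", "+j", "+m"))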
Bayes Net vs Joint Distribution Table
■ Both give you the power to calculate P(X1, X2, …, XN)

■ BNs encode joint distributions as a product of conditional distributions on
  each variable:  P(X1, …, Xn) = Π_i P(Xi | Parents(Xi))

■ How big is a joint distribution over N variables, each with d values?  d^N

■ How big is an N-node BN if nodes have at most k parents?  O(N * d^k)

■ Bayes nets: huge space savings with sparsity, since usually the number of parents is small!
■ It's easier to elicit local CPTs
■ BNs are faster to answer queries (coming)
Conditional independence in BNs
■ Compare the Bayes net global semantics
  P(X1, …, Xn) = Π_i P(Xi | Parents(Xi))

  with the chain rule identity
  P(X1, …, Xn) = Π_i P(Xi | X1, …, Xi-1)

■ Assume (without loss of generality) that X1, …, Xn are sorted in topological order according to
  the graph (i.e., parents before children), so Parents(Xi) ⊆ {X1, …, Xi-1}
■ So the Bayes net asserts the conditional independences P(Xi | X1, …, Xi-1) = P(Xi | Parents(Xi))
■ To ensure these are valid, choose parents for node Xi that "shield" it from other predecessors
Conditional independence semantics
■ Every variable is conditionally independent of its non-descendants given its parents
■ Conditional independence semantics <=> global semantics

  (Figure: a node X with parents U1, …, Um, children Y1, …, Yn, and the children's other parents Z1j, …, Znj)
Example: Burglary

■ Burglary      P(B):  true 0.001, false 0.999
■ Earthquake    P(E):  true 0.002, false 0.998
■ Alarm         P(A | B, E):

   B      E       P(A=true | B, E)   P(A=false | B, E)
   true   true         0.95               0.05
   true   false        0.94               0.06
   false  true         0.29               0.71
   false  false        0.001              0.999
Causality?
■ When Bayes’ nets reflect the true causal patterns:
■ Often simpler (nodes have fewer parents)
■ Often easier to think about
■ Often easier to elicit from experts

■ BNs need not actually be causal


■ Sometimes no causal net exists over the domain
(especially if variables are missing)
■ E.g. consider the variables Traffic and Rain
■ End up with arrows that reflect correlation, not
causation

■ What do the arrows really mean?


■ Topology may happen to encode causal
structure
■ Topology really encodes conditional
independence
Summary
■ Independence and conditional independence are
important forms of probabilistic knowledge

■ Bayes nets encode joint distributions efficiently by
  taking advantage of conditional independence
■ Global joint probability = product of local conditionals

■ Allows for flexible tradeoff between model accuracy


and memory/compute efficiency by picking most
important correlations and ignoring the rest!
(Figure: a spectrum of models over the same variables, from Strict Independence to Naïve Bayes to a Sparse Bayes Net to the full Joint Distribution)


Inference
• Inference: calculating some useful quantity from a joint probability distribution

• Examples:
  • Posterior probability:      P(Q | E1 = e1, …, Ek = ek)
  • Most likely explanation:    argmax_q P(Q = q | E1 = e1, …, Ek = ek)
Operation 1: Joining Factors

   P(R)              P(T | R)             P(L | T)
   +r  0.1           +r  +t  0.8          +t  +l  0.3
   -r  0.9           +r  -t  0.2          +t  -l  0.7
                     -r  +t  0.1          -t  +l  0.1
                     -r  -t  0.9          -t  -l  0.9

   Join R: P(R, T) = P(R) P(T | R)        Join T: P(R, T, L) = P(R, T) P(L | T)
   +r  +t  0.08                           +r  +t  +l  0.024     -r  +t  +l  0.027
   +r  -t  0.02                           +r  +t  -l  0.056     -r  +t  -l  0.063
   -r  +t  0.09                           +r  -t  +l  0.002     -r  -t  +l  0.081
   -r  -t  0.81                           +r  -t  -l  0.018     -r  -t  -l  0.729
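
A minimal Python sketch of the join operation above: multiply factors that share variables, here producing P(R, T, L) = P(R) * P(T | R) * P(L | T).

P_R = {"+r": 0.1, "-r": 0.9}
P_T_given_R = {("+r", "+t"): 0.8, ("+r", "-t"): 0.2, ("-r", "+t"): 0.1, ("-r", "-t"): 0.9}
P_L_given_T = {("+t", "+l"): 0.3, ("+t", "-l"): 0.7, ("-t", "+l"): 0.1, ("-t", "-l"): 0.9}

# Join R: P(R, T)
P_RT = {(r, t): P_R[r] * P_T_given_R[(r, t)] for (r, t) in P_T_given_R}

# Join T: P(R, T, L), matching rows on the shared variable T
P_RTL = {(r, t, l): P_RT[(r, t)] * P_L_given_T[(t2, l)]
         for (r, t) in P_RT for (t2, l) in P_L_given_T if t2 == t}

print(P_RTL[("+r", "+t", "+l")])  # 0.1 * 0.8 * 0.3 ≈ 0.024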
Operation 2: Eliminate
• Second basic operation: marginalization

• Take a table and sum out a variable
• Shrinks a table to a smaller one
• A projection operation

• Example: summing R out of P(R, T) gives P(T)

   P(R, T)             P(T)
   +r  +t  0.08        +t  0.17
   +r  -t  0.02        -t  0.83
   -r  +t  0.09
   -r  -t  0.81
Multiple Elimination

   P(R, T, L)           sum out R     P(T, L)        sum out T     P(L)
   +r  +t  +l  0.024    --------->    +t  +l  0.051  --------->    +l  0.134
   +r  +t  -l  0.056                  +t  -l  0.119                -l  0.866
   +r  -t  +l  0.002                  -t  +l  0.083
   +r  -t  -l  0.018                  -t  -l  0.747
   -r  +t  +l  0.027
   -r  +t  -l  0.063
   -r  -t  +l  0.081
   -r  -t  -l  0.729
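
A minimal Python sketch of summing variables out of the joint P(R, T, L) from the table above, reproducing P(T, L) and then P(L).

from collections import defaultdict

P_RTL = {  # P(R, T, L) from the slide
    ("+r", "+t", "+l"): 0.024, ("+r", "+t", "-l"): 0.056,
    ("+r", "-t", "+l"): 0.002, ("+r", "-t", "-l"): 0.018,
    ("-r", "+t", "+l"): 0.027, ("-r", "+t", "-l"): 0.063,
    ("-r", "-t", "+l"): 0.081, ("-r", "-t", "-l"): 0.729,
}

def sum_out(factor, position):
    """Sum out the variable at the given tuple position (marginalization)."""
    result = defaultdict(float)
    for assignment, p in factor.items():
        reduced = assignment[:position] + assignment[position + 1:]
        result[reduced] += p
    return dict(result)

P_TL = sum_out(P_RTL, 0)   # sum out R -> P(T, L): ≈ {('+t','+l'): 0.051, ...}
P_L = sum_out(P_TL, 0)     # sum out T -> P(L):    ≈ {('+l',): 0.134, ('-l',): 0.866}
print(P_L)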
General Variable Elimination
• Query:  P(Q | E1 = e1, …, Ek = ek)

• Start with initial factors:
  • the local CPTs (but instantiated by the evidence)

• While there are still hidden variables (not Q or evidence):
  • Pick a hidden variable H
  • Join all factors mentioning H
  • Eliminate (sum out) H

• Join all remaining factors and normalize
Example Process

  Choose A: join all factors mentioning A, then sum out A
  Choose E: join all factors mentioning E, then sum out E
  Finish with B
  Normalize
Marginalizing
  Join R, then sum out R, then join T, then sum out T:

   P(R)              P(T | R)             P(L | T)
   +r  0.1           +r  +t  0.8          +t  +l  0.3
   -r  0.9           +r  -t  0.2          +t  -l  0.7
                     -r  +t  0.1          -t  +l  0.1
                     -r  -t  0.9          -t  -l  0.9

   Join R: P(R, T)     Sum out R: P(T)    Join T: P(T, L)     Sum out T: P(L)
   +r  +t  0.08        +t  0.17           +t  +l  0.051       +l  0.134
   +r  -t  0.02        -t  0.83           +t  -l  0.119       -l  0.866
   -r  +t  0.09                           -t  +l  0.083
   -r  -t  0.81                           -t  -l  0.747
Example 2: P(B | +a)

   Start / Select            Join on B            Normalize
   P(B)                      P(+a, B)             P(B | +a)
   +b  0.1                   +a  +b  0.08         +a  +b  8/17
   -b  0.9                   +a  -b  0.09         +a  -b  9/17

   P(A | B)
   +b  +a  0.8
   +b  -a  0.2
   -b  +a  0.1
   -b  -a  0.9
Causal Chains
• This configuration is a "causal chain":  X -> Y -> Z
  X: Low pressure, Y: Rain, Z: Traffic

• Guaranteed X independent of Z given Y?  Yes!
• Evidence along the chain "blocks" the influence
Common Causes
• This configuration is a "common cause":  X <- Y -> Z
  Y: Project due, X: Canvas active, Z: Lab full

• Guaranteed X independent of Z?  No!
• One example set of CPTs for which X is not independent of Z is sufficient
  to show this independence is not guaranteed.

• Example: the project being due causes both Canvas to be busy and the lab to be full
Common Cause
• This configuration is a "common cause":  X <- Y -> Z
  Y: Project due, X: Forums busy, Z: Lab full

• Guaranteed X and Z independent given Y?  Yes!
• Observing the cause blocks influence between the effects.
Common Effect
• Two causes of one effect (v-structure):  X -> Z <- Y
  X: Raining, Y: Hockey game, Z: Traffic

• Are X and Y independent?
• Yes: the hockey game and the rain cause traffic, but they are not correlated
• Proof:  P(x, y) = Σ_z P(x) P(y) P(z | x, y) = P(x) P(y) Σ_z P(z | x, y) = P(x) P(y)
Common Effect
• Two causes of one effect (v-structure):  X -> Z <- Y
  X: Raining, Y: Hockey game, Z: Traffic

• Are X and Y independent given Z?
• No: seeing traffic puts the rain and the hockey game in competition as explanations.

• This is backwards from the other cases
• Observing an effect activates influence between possible causes.
Naïve Bayes (Towards Machine Learning fundamentals)
• A general Naïve Bayes model:

  P(Y, F1, …, Fn) = P(Y) Π_i P(Fi | Y)

  Graph: class Y with features F1, F2, …, Fn as its children
  • Full joint table: |Y| x |F|^n values
  • Naïve Bayes: |Y| parameters for P(Y), plus n x |F| x |Y| parameters for the P(Fi | Y) tables

• We only have to specify how each feature depends on the class
• The total number of parameters is linear in n
• The model is very simplistic, but often works anyway
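
A minimal Python sketch of a tiny Naïve Bayes model P(Y, F1, …, Fn) = P(Y) Π_i P(Fi | Y). The spam/ham classes, the two binary features, and all the numbers are made up for illustration (not from the slides); the posterior P(Y | f1, …, fn) is computed by multiplying the factors and normalizing.

P_Y = {"spam": 0.3, "ham": 0.7}                        # class prior P(Y)
P_F_given_Y = [                                         # one table P(Fi | Y) per feature
    {("spam", True): 0.8, ("spam", False): 0.2, ("ham", True): 0.1, ("ham", False): 0.9},
    {("spam", True): 0.6, ("spam", False): 0.4, ("ham", True): 0.3, ("ham", False): 0.7},
]

def posterior(features):
    """P(Y | f1, ..., fn) via the Naïve Bayes factorization, then normalization."""
    scores = {}
    for y, prior in P_Y.items():
        score = prior
        for table, f in zip(P_F_given_Y, features):
            score *= table[(y, f)]                      # multiply in P(fi | y)
        scores[y] = score
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

print(posterior([True, True]))   # ≈ {'spam': 0.87, 'ham': 0.13}

Note how the parameter count matches the slide: one prior table plus n small conditional tables, so the total grows linearly in the number of features n.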
