
CSE 473: Artificial Intelligence


Probability Review… → Markov Models

Daniel Weld
University of Washington
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Outline
§ Probability
§ Random Variables
§ Joint and Marginal Distributions
§ Conditional Distribution
§ Product Rule, Chain Rule, Bayes’ Rule
§ Inference
§ Independence & Conditional Independence
§ … Markov Models

§ You’ll need all this stuff A LOT for the next few weeks, so make sure you go over it now!

Joint Distributions
§ A joint distribution over a set of random variables X1, …, Xn specifies a
probability P(x1, …, xn) for each assignment (or outcome):

T    W    P
hot  sun  0.4
hot  rain 0.1
cold sun  0.2
cold rain 0.3

§ Must obey: 0 ≤ P(x1, …, xn) ≤ 1, and the probabilities of all assignments sum to 1

§ Size of joint distribution if n variables with domain sizes d? dⁿ entries

§ For all but the smallest distributions, impractical to write out!
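As a concrete sketch (plain Python; the dict representation and the `table_size` helper are my own, not from the slides), a joint distribution can be stored as a map from assignments to probabilities, and the dⁿ blow-up is easy to see:

```python
# The joint distribution P(T, W) from the table above, stored as a dict
# mapping each full assignment (outcome) to its probability.
joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

# Every outcome gets a probability in [0, 1], and they must sum to one.
assert all(0.0 <= p <= 1.0 for p in joint.values())
assert abs(sum(joint.values()) - 1.0) < 1e-9

# With n variables, each with domain size d, the table needs d**n rows:
def table_size(n, d):
    return d ** n

print(table_size(2, 2))    # this tiny table: 4 rows
print(table_size(30, 2))   # 30 binary variables: over a billion rows
```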

Marginal Distributions
§ Marginal distributions are sub-tables which eliminate variables
§ Marginalization (summing out): Combine collapsed rows by adding

Joint P(T, W):        Marginals:
T    W    P           T    P          W    P
hot  sun  0.4         hot  0.5        sun  0.6
hot  rain 0.1         cold 0.5        rain 0.4
cold sun  0.2
cold rain 0.3
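A minimal sketch of summing out, assuming the same dict representation of the joint used above (the `marginalize` helper is illustrative, not from the slides):

```python
from collections import defaultdict

# Joint P(T, W) from the table above.
joint = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
         ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def marginalize(joint, keep):
    """Sum out every variable except the one at tuple position `keep`."""
    marginal = defaultdict(float)
    for assignment, p in joint.items():
        marginal[assignment[keep]] += p   # combine collapsed rows by adding
    return dict(marginal)

print(marginalize(joint, 0))  # P(T): hot 0.5, cold 0.5
print(marginalize(joint, 1))  # P(W): sun ≈ 0.6, rain ≈ 0.4
```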

Conditional Distributions
§ Conditional distributions are probability distributions over some variables
given fixed values of others

Joint Distribution P(T, W):     Conditional Distributions:
T    W    P                     P(W | T = hot):   sun 0.8, rain 0.2
hot  sun  0.4                   P(W | T = cold):  sun 0.4, rain 0.6
hot  rain 0.1
cold sun  0.2
cold rain 0.3

Normalization Trick

SELECT the joint probabilities matching the evidence, then NORMALIZE the
selection (make it sum to one).

Joint P(T, W):       Selected (T = cold):     Normalized P(W | T = cold):
T    W    P          T    W    P              W    P
hot  sun  0.4        cold sun  0.2            sun  0.4
hot  rain 0.1        cold rain 0.3            rain 0.6
cold sun  0.2
cold rain 0.3

§ Why does this work? Sum of selection is P(evidence)! (P(T=cold), here)
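The select-then-normalize trick, sketched in the same dict representation (the `condition` helper is my own illustration and assumes a two-variable joint):

```python
# Joint P(T, W) from the table above.
joint = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
         ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def condition(joint, var, value):
    """P(other variable | var=value) for a two-variable joint."""
    # SELECT the joint probabilities matching the evidence.
    selected = {a: p for a, p in joint.items() if a[var] == value}
    # NORMALIZE: the divisor Z is exactly P(evidence).
    z = sum(selected.values())
    return {a[1 - var]: p / z for a, p in selected.items()}

print(condition(joint, 0, "cold"))  # P(W | T=cold): sun 0.4, rain 0.6
```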

Probabilistic Inference
§ Probabilistic inference =
“compute a desired probability from other known
probabilities (e.g. conditional from joint)”

§ We generally compute conditional probabilities


§ P(on time | no reported accidents) = 0.90
§ These represent the agent’s beliefs given the evidence

§ Probabilities change with new evidence:


§ P(on time | no accidents, 5 a.m.) = 0.95
§ P(on time | no accidents, 5 a.m., raining) = 0.80
§ Observing new evidence causes beliefs to be updated

Inference by Enumeration
§ General case:
  § Evidence variables: E1 … Ek = e1 … ek
  § Query* variable: Q
  § Hidden variables: H1 … Hr
  (together: all the variables)
§ We want: P(Q | e1 … ek)
  * Works fine with multiple query variables, too

§ Step 1: Select the entries consistent with the evidence
§ Step 2: Sum out H to get joint of Query and evidence
§ Step 3: Normalize (multiply by 1/Z, where Z is the sum from Step 2)
Example: Inference by Enumeration
P(W=sun | S=winter)?
1. Select data consistent with evidence

S      T    W    P
summer hot  sun  0.30
summer hot  rain 0.05
summer cold sun  0.10
summer cold rain 0.05
winter hot  sun  0.10
winter hot  rain 0.05
winter cold sun  0.15
winter cold rain 0.20

Example: Inference by Enumeration
P(W=sun | S=winter)?
1. Select data consistent with evidence
2. Marginalize away hidden variables (sum out temperature)

S      T    W    P
summer hot  sun  0.30
summer hot  rain 0.05
summer cold sun  0.10
summer cold rain 0.05
winter hot  sun  0.10
winter hot  rain 0.05
winter cold sun  0.15
winter cold rain 0.20

Example: Inference by Enumeration
P(W=sun | S=winter)?
1. Select data consistent with evidence
2. Marginalize away hidden variables (sum out temperature)
3. Normalize

After selecting (S=winter) and summing out T:

S      W    P
winter sun  0.25
winter rain 0.25

Example: Inference by Enumeration
P(W=sun | S=winter)?
1. Select data consistent with evidence
2. Marginalize away hidden variables (sum out temperature)
3. Normalize

After normalizing:

S      W    P
winter sun  0.50
winter rain 0.50
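The three steps above can be sketched in Python (the `infer` helper is a hypothetical illustration, not from the slides; the joint is the season/temperature/weather table used in this example):

```python
from collections import defaultdict

joint = {
    ("summer", "hot", "sun"): 0.30, ("summer", "hot", "rain"): 0.05,
    ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
    ("winter", "hot", "sun"): 0.10, ("winter", "hot", "rain"): 0.05,
    ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20,
}

def infer(joint, query, evidence):
    """P(query variable | evidence), evidence maps tuple positions to values."""
    dist = defaultdict(float)
    for assignment, p in joint.items():
        # Step 1: keep only entries consistent with the evidence.
        if all(assignment[i] == v for i, v in evidence.items()):
            # Step 2: sum out the hidden variables onto the query variable.
            dist[assignment[query]] += p
    # Step 3: normalize (divide by Z = P(evidence)).
    z = sum(dist.values())
    return {value: p / z for value, p in dist.items()}

print(infer(joint, query=2, evidence={0: "winter"}))
# P(W | S=winter): sun 0.5, rain 0.5
```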

Inference by Enumeration

§ Computational problems?
§ Worst-case time complexity O(dⁿ)
§ Space complexity O(dⁿ) to store the joint distribution

Don’t be Fooled
§ It may look cute…

https://fc08.deviantart.net/fs71/i/2010/258/4/4/baby_dragon__charles_by_imsorrybuti-d2yti11.png


The Sword of Conditional Independence!

“Slay the Basilisk!”   “I am a BIG joint distribution!”

harrypotter.wikia.com/

Means: P(x, y | z) = P(x | z) P(y | z)

Or, equivalently: P(x | y, z) = P(x | z)

A Brief Trip Forward in Time…


Preview: Bayes Nets Encode Joint Distributions
§ A set of nodes, one per variable X
§ A directed, acyclic graph (parents A1, …, An with arcs into X)
§ A conditional distribution for each node
  § A collection of distributions over X, one for each combination of parents’ values
  § CPT: conditional probability table
  § Description of a noisy “causal” process

A Bayes net = Topology (graph) + Local Conditional Probabilities

Benefits: Smaller, Allows Fast Inference, Learnable!

Preview: Example Bayes Net - Car

Preview: Dynamic Bayes Nets (DBNs) - Ghosts
§ We want to track multiple variables over time, using
multiple sources of evidence
§ Idea: Repeat a fixed Bayes net structure at each time
§ Generalization of Hidden Markov Models (HMMs)
§ Itself a generalization of Markov Models

§ Variables from time t may condition on those from t-1

(Diagram: ghost variables G1a, G1b; G2a, G2b; G3a, G3b at t = 1, 2, 3, each
conditioning on the previous time step, with evidence nodes E1a, E1b, E2a,
E2b, E3a, E3b below.)

Back to Our Own Universe… (for now)


Ghostbusters, Revisited
§ Let’s say we have two distributions:
§ Prior distribution over ghost location: P(G)
§ Let’s say this is uniform
§ Sensor reading model: P(R | G)
§ Given: we know what our sensors do
§ R = reading color measured at (1,1)
§ E.g. P(R = yellow | G=(1,1)) = 0.1

§ We can calculate the posterior distribution P(G|r) over ghost locations
given a reading using Bayes’ rule:
[Demo: Ghostbuster – with probability (L12D2) ]

What’s Our Probabilistic Model?


§ Random Variables
  § Location of Ghost. Values = {L1,1, L1,2, …, L6,10}
  § Sensor value at locations S1,1, …, S6,10. Values = {R, O, Y, G}
§ Joint Distribution
  § Too big to write down: 60 · 4⁶⁰ ≈ 7.98 × 10³⁷ entries
  § Here’s a schema for a conditional distribution specifying part of it:

    P(red | 3)   P(orange | 3)   P(yellow | 3)   P(green | 3)
    0.05         0.15            0.50            0.30
    ...
    P(red | 0)   P(orange | 0)   P(yellow | 0)   P(green | 0)
    0.70         0.15            0.10            0.05

Model for a Tiny Ghostbuster
§ Random Variables
§ Location of Ghost, G. Values = {L1, L2}

§ Sensor value at locations S1, S2 with values {R, O, Y, G}


§ Joint Distribution: for each ghost location (G = L1, G = L2), a 4×4 table
  over sensor values S1 × S2, each in {R, O, Y, G}
  (Diagram: select the G = L1 table, then sum over S2)
§ Can marginalize to get P(S1 | distance = 0)

Video of Demo Ghostbusters with Probability

The Product Rule
§ Sometimes have conditional distributions but want the joint:

  P(x, y) = P(x | y) P(y)

The Chain Rule

§ More generally, can always write any joint distribution as an incremental
product of conditional distributions:

  P(x1, x2, …, xn) = ∏ᵢ P(xi | x1, …, xi−1)
Bayes’ Rule

§ Two ways to factor a joint distribution over two variables:

  P(x, y) = P(x | y) P(y) = P(y | x) P(x)

That’s my rule!

§ Dividing, we get:

  P(x | y) = P(y | x) P(x) / P(y)

§ Why is this at all helpful?
  § Lets us build one conditional from its reverse
  § Often one conditional is tricky but the other one is simple
  § Foundation of many systems we’ll see later (e.g. ASR, MT)

§ In the running for most important AI equation!
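A one-line sketch of building one conditional from its reverse (the rain/clouds numbers are made up for illustration):

```python
def bayes_rule(p_y_given_x, p_x, p_y):
    """P(x | y) = P(y | x) P(x) / P(y)."""
    return p_y_given_x * p_x / p_y

# Hypothetical numbers: suppose P(rain) = 0.4, P(clouds) = 0.6, and the
# "easy" direction P(clouds | rain) = 0.9 is known.
p_rain_given_clouds = bayes_rule(0.9, 0.4, 0.6)
print(p_rain_given_clouds)  # 0.9 * 0.4 / 0.6 ≈ 0.6
```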

Independence
§ Two variables are independent in a joint distribution if:

  P(X, Y) = P(X) P(Y), i.e. for all x, y: P(x, y) = P(x) P(y)
§ Says the joint distribution factors into a product of two simple ones
§ Usually variables aren’t independent!

§ Can use independence as a modeling assumption


§ Independence can be a simplifying assumption
§ Empirical joint distributions: at best “close” to independent
§ What could we assume for {Weather, Traffic, Cavity}?

§ Independence is like something from CSPs: what?

Independence

P(A∧B) = P(A)P(B)

(Venn diagram: regions A, B, and A∧B inside the sample space)

© Daniel S. Weld

Example: Independence
§ N fair, independent coin flips:

H 0.5 H 0.5 H 0.5


T 0.5 T 0.5 T 0.5

Example: Independence?

P(T):  hot 0.5, cold 0.5        P(W):  sun 0.6, rain 0.4

P1(T, W):             P2(T, W) = P(T) P(W):
T    W    P           T    W    P
hot  sun  0.4         hot  sun  0.3
hot  rain 0.1         hot  rain 0.2
cold sun  0.2         cold sun  0.3
cold rain 0.3         cold rain 0.2
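A brute-force check of the definition, assuming the dict joint representation used above (P1 is the original table, P2 the factored one; the `independent` helper is illustrative):

```python
from collections import defaultdict

def independent(joint, tol=1e-9):
    """True iff P(t, w) = P(t) P(w) for every assignment."""
    pt, pw = defaultdict(float), defaultdict(float)
    for (t, w), p in joint.items():   # compute both marginals
        pt[t] += p
        pw[w] += p
    return all(abs(p - pt[t] * pw[w]) <= tol for (t, w), p in joint.items())

p1 = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
      ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}
p2 = {("hot", "sun"): 0.3, ("hot", "rain"): 0.2,
      ("cold", "sun"): 0.3, ("cold", "rain"): 0.2}

print(independent(p1))  # False: 0.4 != 0.5 * 0.6
print(independent(p2))  # True: every entry factors as P(T) P(W)
```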

Conditional Independence

Conditional Independence
§ Unconditional (absolute) independence very rare

§ Conditional independence is our most basic and robust form of knowledge
about uncertain environments.

§ X is conditionally independent of Y given Z (written X ⊥ Y | Z)
  if and only if:  P(x, y | z) = P(x | z) P(y | z)  for all x, y, z
  or, equivalently, if and only if:  P(x | y, z) = P(x | z)
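The definition can be checked numerically. Below is a sketch on a small hypothetical joint over (X, Y, Z), built as a product P(Z) P(X|Z) P(Y|Z) so that conditional independence holds by construction (all names and numbers are my own illustration):

```python
from collections import defaultdict

def cond_independent(joint, tol=1e-9):
    """True iff P(x, y | z) = P(x | z) P(y | z) for all (x, y, z)."""
    pz, pxz, pyz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in joint.items():
        pz[z] += p
        pxz[(x, z)] += p
        pyz[(y, z)] += p
    return all(
        abs(p / pz[z] - (pxz[(x, z)] / pz[z]) * (pyz[(y, z)] / pz[z])) <= tol
        for (x, y, z), p in joint.items()
    )

# Hypothetical P(Z), P(X|Z), P(Y|Z); the joint is their product.
p_z = {"z0": 0.5, "z1": 0.5}
p_x = {"z0": {"x0": 0.9, "x1": 0.1}, "z1": {"x0": 0.2, "x1": 0.8}}
p_y = {"z0": {"y0": 0.6, "y1": 0.4}, "z1": {"y0": 0.3, "y1": 0.7}}
joint = {(x, y, z): p_z[z] * p_x[z][x] * p_y[z][y]
         for z in p_z for x in p_x[z] for y in p_y[z]}

print(cond_independent(joint))  # True by construction
```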

Conditional Independence

Are A & B independent? Compare P(A|B) with P(A):

  P(A)   = (.25 + .5)/2 = .375
  P(B)   = .75
  P(A|B) = (.25 + .25 + .5)/3 = .3333

Since P(A|B) ≠ P(A), A and B are not independent.

A, B Conditionally Independent Given C

P(A|B,C) = P(A|C)      (C = striped)

  P(A|¬C)   = .5
  P(A|B,¬C) = .5

Conditional Independence
§ What about this domain:
§ Fire
§ Smoke
§ Alarm

Conditional Independence
§ What about this domain:
§ Traffic
§ Umbrella
§ Raining

(Diagram: R with arcs to U and T)

What is Conditional Independence?

“I am a BIG joint distribution!”   “Slay the Basilisk!”

harrypotter.wikia.com/

Probability Recap
§ Conditional probability:  P(x | y) = P(x, y) / P(y)

§ Product rule:  P(x, y) = P(x | y) P(y)

§ Chain rule:  P(x1, …, xn) = P(x1) P(x2 | x1) … = ∏ᵢ P(xi | x1, …, xi−1)

§ Bayes rule:  P(x | y) = P(y | x) P(x) / P(y)

§ X, Y independent if and only if:  ∀x, y: P(x, y) = P(x) P(y)

§ X and Y are conditionally independent given Z if and only if:
  ∀x, y, z: P(x, y | z) = P(x | z) P(y | z)

Markov Models

Reasoning over Time or Space

§ Often, we want to reason about a sequence of observations


§ Speech recognition
§ Robot localization
§ User attention
§ Medical monitoring

§ Need to introduce time (or space) into our models

Markov Models
§ Value of X at a given time is called the state

X1 → X2 → X3 → X4

§ Parameters: called transition probabilities or dynamics, specify how the
state evolves over time (also, initial state probabilities)
§ Stationarity assumption: transition probabilities the same at all times
  § Means P(X5 | X4) = P(X12 | X11) etc.
§ Same as MDP transition model, but no choice of action

Joint Distribution of a Markov Model

X1 → X2 → X3 → X4

§ Joint distribution:
  P(X1, X2, X3, X4) = P(X1) P(X2|X1) P(X3|X2) P(X4|X3)
§ More generally:
  P(X1, X2, …, XT) = P(X1) P(X2|X1) P(X3|X2) … P(XT|XT−1)
                   = P(X1) ∏_{t=2}^{T} P(Xt | Xt−1)
§ Questions to be resolved:
§ Does this indeed define a joint distribution?
§ Can every joint distribution be factored this way, or are we making some assumptions
about the joint distribution by using this factorization?
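The factored joint is cheap to evaluate. A sketch with hypothetical sun/rain dynamics (the transition numbers are illustrative, not from the slides):

```python
def markov_joint(initial, transition, states):
    """P(x1, ..., xT) = P(x1) * product over t of P(x_t | x_{t-1})."""
    p = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= transition[prev][cur]
    return p

initial = {"sun": 0.5, "rain": 0.5}                # P(X1)
transition = {"sun":  {"sun": 0.9, "rain": 0.1},   # P(X_t | X_{t-1}),
              "rain": {"sun": 0.3, "rain": 0.7}}   # same at all times

print(markov_joint(initial, transition, ["sun", "sun", "rain"]))
# 0.5 * 0.9 * 0.1 ≈ 0.045
```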

Chain Rule and Markov Models

X1 → X2 → X3 → X4

§ From the chain rule, every joint distribution over X1, X2, X3, X4 can be
written as:

  P(X1, X2, X3, X4) = P(X1) P(X2|X1) P(X3|X1, X2) P(X4|X1, X2, X3)

§ And, if we assume that

  X3 ⊥ X1 | X2   and   X4 ⊥ X1, X2 | X3

this formula simplifies to

  P(X1, X2, X3, X4) = P(X1) P(X2|X1) P(X3|X2) P(X4|X3)

Chain Rule and Markov Models

X1 → X2 → X3 → X4

§ From the chain rule, every joint distribution over X1, X2, …, XT can be
written as:

  P(X1, X2, …, XT) = P(X1) ∏_{t=2}^{T} P(Xt | X1, X2, …, Xt−1)

§ So, if we assume that for all t:

  Xt ⊥ X1, …, Xt−2 | Xt−1

we get

  P(X1, X2, …, XT) = P(X1) ∏_{t=2}^{T} P(Xt | Xt−1)
