281 Probability Reference Sheet

15-281: AI: Representation and Problem Solving Spring 2020

1 Probability Notation
Suppose we have 3 random variables A, B, and C. Consider the expression
$$P(+b, C) = \sum_{a \in \{a_1, a_2, a_3\}} P(a, +b, C)$$

In this course, we denote discrete random variables by capital letters and use them to represent all
possible disjoint outcomes. In the above example, A, B, and C are random variables.

We use lowercase letters to denote outcomes, i.e. possible values our variables can take on, such as
+b for the variable B, or $a_1$, $a_2$, and $a_3$ for the variable A in the above example.

We also use lowercase letters, such as a, as variables that range over outcomes (a is the summation
index in the example above). Unlike a random variable, such a variable represents only a single
outcome at a time.
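
As a concrete illustration of this notation, the sketch below stores a small joint table P(A, B, C) as a Python dictionary and computes P(+b, C) by summing over the outcomes of A. The numbers and the outcome names for C (`+c`, `-c`) are invented for illustration and are not from the course.

```python
from fractions import Fraction as F

# Hypothetical joint table P(A, B, C); keys are (a, b, c) outcome triples.
# The twelve entries are invented for illustration and sum to one.
joint = {
    ("a1", "+b", "+c"): F(1, 20), ("a1", "+b", "-c"): F(2, 20),
    ("a1", "-b", "+c"): F(1, 20), ("a1", "-b", "-c"): F(3, 20),
    ("a2", "+b", "+c"): F(2, 20), ("a2", "+b", "-c"): F(1, 20),
    ("a2", "-b", "+c"): F(2, 20), ("a2", "-b", "-c"): F(1, 20),
    ("a3", "+b", "+c"): F(3, 20), ("a3", "+b", "-c"): F(1, 20),
    ("a3", "-b", "+c"): F(2, 20), ("a3", "-b", "-c"): F(1, 20),
}

# P(+b, C): B is fixed to the outcome +b, C stays a random variable,
# and the lowercase a ranges over the outcomes a1, a2, a3 of A.
p_b_C = {
    c: sum(joint[(a, "+b", c)] for a in ("a1", "a2", "a3"))
    for c in ("+c", "-c")
}
```

The result has one entry per outcome of C, because C is the only capital letter left in P(+b, C).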

2 Basic Rules
Definition of Conditional Probability:

$$P(X \mid Y) = \frac{P(X, Y)}{P(Y)}$$

Product Rule:

$$\begin{aligned}
P(X, Y) &= P(X \mid Y)\,P(Y) \\
        &= P(Y \mid X)\,P(X) \\
P(X_1, X_2, X_3) &= P(X_1, X_2 \mid X_3)\,P(X_3) \\
                 &= P(X_1 \mid X_2, X_3)\,P(X_2, X_3)
\end{aligned}$$
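
The definition of conditional probability and the product rule can be checked numerically on a small table. The sketch below assumes a made-up joint table over two binary variables; exact fractions are used so the equalities hold without rounding error.

```python
from fractions import Fraction as F

# Hypothetical joint table P(X, Y) over two binary variables.
joint = {
    ("+x", "+y"): F(3, 10), ("+x", "-y"): F(1, 10),
    ("-x", "+y"): F(2, 10), ("-x", "-y"): F(4, 10),
}
xs, ys = ("+x", "-x"), ("+y", "-y")

# P(X) and P(Y), obtained by summing the joint table.
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Definition of conditional probability: P(X | Y) = P(X, Y) / P(Y).
p_x_given_y = {(x, y): joint[(x, y)] / p_y[y] for x in xs for y in ys}
p_y_given_x = {(y, x): joint[(x, y)] / p_x[x] for x in xs for y in ys}

# Product rule, both factorizations, checked entry by entry.
product_rule_holds = all(
    joint[(x, y)] == p_x_given_y[(x, y)] * p_y[y] == p_y_given_x[(y, x)] * p_x[x]
    for x in xs for y in ys
)
```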

Bayes’ Theorem:

$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$$
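
A small numeric sketch of Bayes' theorem follows. The scenario (Y is a rare condition, X is a test result) and all of the numbers are assumptions chosen for illustration.

```python
from fractions import Fraction as F

# Hypothetical prior P(Y) and likelihoods P(+x | Y).
p_y = {"+y": F(1, 100), "-y": F(99, 100)}
p_x_given_y = {("+x", "+y"): F(9, 10),   # P(+x | +y)
               ("+x", "-y"): F(1, 10)}   # P(+x | -y)

# P(+x), using the product rule and a sum over the outcomes of Y.
p_pos_x = sum(p_x_given_y[("+x", y)] * p_y[y] for y in p_y)

# Bayes' theorem: P(+y | +x) = P(+x | +y) P(+y) / P(+x).
p_y_given_pos_x = p_x_given_y[("+x", "+y")] * p_y["+y"] / p_pos_x
```

With these numbers P(+x) = 27/250 and P(+y | +x) = 1/12, so the posterior stays small even though P(+x | +y) is large, because the prior P(+y) is small.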

Normalization:
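$$P(Y \mid X) = \frac{P(X, Y)}{P(X)} = \frac{P(X, Y)}{\sum_y P(X, y)}$$

$$P(Y \mid X) \propto P(X, Y)$$

$$P(Y \mid X) = \alpha\,P(X, Y)$$

Note the difference between ∝ (proportional to) and α (the normalization constant):

$$\alpha = \frac{1}{P(X)} = \frac{1}{\sum_y P(X, y)}$$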
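
Normalization can be sketched in a few lines. The example below fixes X to a single outcome +x, so there is one α; the three outcomes of Y and their joint probabilities are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical entries P(+x, Y) for a fixed outcome +x.
# They sum to P(+x), not to one.
p_x_and_Y = {"y1": F(1, 10), "y2": F(3, 20), "y3": F(1, 4)}

# alpha = 1 / P(+x) = 1 / sum_y P(+x, y)
alpha = 1 / sum(p_x_and_Y.values())

# P(Y | +x) = alpha * P(+x, Y)
p_Y_given_x = {y: alpha * p for y, p in p_x_and_Y.items()}
```

Here α = 2, and the rescaled entries 1/5, 3/10, and 1/2 sum to one. When X is left as a random variable, each outcome x has its own α.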

Chain Rule:

$$\begin{aligned}
P(X_1, X_2, X_3) &= P(X_1 \mid X_2, X_3)\,P(X_2, X_3) \\
                 &= P(X_1 \mid X_2, X_3)\,P(X_2 \mid X_3)\,P(X_3)
\end{aligned}$$

$$P(X_1, \ldots, X_N) = \prod_{n=1}^{N} P(X_n \mid X_1, \ldots, X_{n-1})$$
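
The three-variable chain rule can be verified on an arbitrary joint table. The sketch below uses a made-up table and a hypothetical helper `marginal`; it follows the ordering of the three-variable expansion above, and any other ordering of the variables would work equally well.

```python
from fractions import Fraction as F
from itertools import product

vals = ("+", "-")
# Hypothetical joint P(X1, X2, X3); the weights 1..8 are arbitrary.
keys = list(product(vals, repeat=3))
joint = {k: F(w, 36) for k, w in zip(keys, range(1, 9))}

def marginal(fixed):
    """Sum the joint over every entry that agrees with `fixed`,
    a dict mapping variable position (0, 1, 2) to an outcome."""
    return sum(p for k, p in joint.items()
               if all(k[i] == v for i, v in fixed.items()))

def chain_rule(x1, x2, x3):
    # P(x1 | x2, x3) * P(x2 | x3) * P(x3)
    p3 = marginal({2: x3})
    p2_given_3 = marginal({1: x2, 2: x3}) / p3
    p1_given_23 = joint[(x1, x2, x3)] / marginal({1: x2, 2: x3})
    return p1_given_23 * p2_given_3 * p3

# The factorization reproduces every entry of the joint table.
chain_rule_holds = all(chain_rule(*k) == joint[k] for k in keys)
```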


All of these basic probability rules hold when conditioning on a set of random variables or outcomes. To
make this work, the conditioned variables need to be included in each term in the rule. For example, take
Bayes’ Theorem from above, but now conditioned upon variables A and B:

$$P(Y \mid X, A, B) = \frac{P(X \mid Y, A, B)\,P(Y \mid A, B)}{P(X \mid A, B)}$$
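
One way to see why the conditioned form holds is to apply the definition of conditional probability and then the product rule, keeping A and B behind the bar in every term. The derivation below is a short sketch of that argument, offered as a consistency check rather than as part of the original sheet:

```latex
\begin{aligned}
P(Y \mid X, A, B)
  &= \frac{P(X, Y \mid A, B)}{P(X \mid A, B)}
     && \text{(definition of conditional probability, given } A, B\text{)} \\
  &= \frac{P(X \mid Y, A, B)\,P(Y \mid A, B)}{P(X \mid A, B)}
     && \text{(product rule, given } A, B\text{)}
\end{aligned}
```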

3 Marginalization
Marginalization uses the law of total probability to “sum out” variables from a joint distribution. This is
useful when we are given the joint probability distribution and want to find the probability distribution over
just a subset of the variables. Marginalization has the following forms:

To sum out a single variable:


$$P(X) = \sum_y P(X, y)$$

To sum out multiple variables:


$$P(X) = \sum_z \sum_y P(X, y, z)$$
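
Both forms of summing out can be sketched on a small joint table. The table below is hypothetical: three binary variables with arbitrary weights.

```python
from fractions import Fraction as F
from itertools import product

vals = ("+", "-")
# Hypothetical joint P(X, Y, Z); keys are (x, y, z), weights are arbitrary.
keys = list(product(vals, repeat=3))
joint = {k: F(w, 36) for k, w in zip(keys, range(1, 9))}

# Sum out one variable: P(X, Y) = sum_z P(X, Y, z)
p_xy = {(x, y): sum(joint[(x, y, z)] for z in vals)
        for x in vals for y in vals}

# Sum out two variables: P(X) = sum_z sum_y P(X, y, z)
p_x = {x: sum(joint[(x, y, z)] for z in vals for y in vals)
       for x in vals}
```

Each step shrinks the table: P(X, Y, Z) has 8 entries, P(X, Y) has 4, and P(X) has 2, and every one of these tables still sums to one.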

This also works for conditional distributions when summing out a variable that is not conditioned upon,
i.e. a variable to the left of the |:
$$P(A \mid C, d) = \sum_b P(A, b \mid C, d)$$

This does NOT work when summing over a variable that is conditioned upon, i.e. a variable to the right of
the |:
$$P(A, b \mid C) \neq \sum_d P(A, b \mid C, d)$$
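
The contrast between the two cases can be checked numerically. To keep the sketch short it drops C and uses a hypothetical joint table over A, B, and D only; the helpers `p` and `cond` are illustrative names.

```python
from fractions import Fraction as F
from itertools import product

vals = ("+", "-")
keys = list(product(vals, repeat=3))   # (a, b, d) triples
joint = {k: F(w, 36) for k, w in zip(keys, range(1, 9))}

def p(a=None, b=None, d=None):
    """Probability of the given outcomes, summing out unspecified variables."""
    want = (a, b, d)
    return sum(q for k, q in joint.items()
               if all(w is None or w == v for w, v in zip(want, k)))

def cond(a, b, d):
    """P(a, b | d) from the definition of conditional probability."""
    return p(a, b, d) / p(d=d)

# Works: summing out b, which sits to the left of the bar.
left_ok = all(sum(cond(a, b, d) for b in vals) == p(a=a, d=d) / p(d=d)
              for a in vals for d in vals)

# Does not work: summing over d, which sits to the right of the bar.
wrong = sum(cond("+", "+", d) for d in vals)   # sum_d P(+a, +b | d)
right = p(a="+", b="+")                        # P(+a, +b)

# Weighting each term by P(d) first (law of total probability) does work.
fixed = sum(cond("+", "+", d) * p(d=d) for d in vals)
```

With this table the naive sum gives 13/80 while the true value is 1/12. Multiplying each term by P(d) turns the conditional back into a joint, which can then be summed out as usual.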

4 Independence
If two variables X and Y are independent ($X \perp\!\!\!\perp Y$), by definition the following are true:
• $P(X, Y) = P(X)\,P(Y)$
• $P(X) = P(X \mid Y)$
• $P(Y) = P(Y \mid X)$
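
The first property gives a direct test for independence on a joint table. The sketch below uses a hypothetical helper `is_independent` and two made-up tables, one built as a product of marginals and one that is not.

```python
from fractions import Fraction as F

def is_independent(joint, xs, ys):
    """Check P(x, y) == P(x) P(y) for every pair of outcomes."""
    p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(joint[(x, y)] == p_x[x] * p_y[y] for x in xs for y in ys)

xs, ys = ("+x", "-x"), ("+y", "-y")

# Built from P(+x) = 2/5 and P(+y) = 3/10, so X and Y are independent.
independent = {
    ("+x", "+y"): F(3, 25), ("+x", "-y"): F(7, 25),
    ("-x", "+y"): F(9, 50), ("-x", "-y"): F(21, 50),
}

# Here P(+x, +y) = 3/10 but P(+x) P(+y) = 1/5, so X and Y are dependent.
dependent = {
    ("+x", "+y"): F(3, 10), ("+x", "-y"): F(1, 10),
    ("-x", "+y"): F(2, 10), ("-x", "-y"): F(4, 10),
}

independent_ok = is_independent(independent, xs, ys)
dependent_ok = is_independent(dependent, xs, ys)
```

Exact fractions make the equality test reliable; with floating-point tables a tolerance-based comparison would be needed instead.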

If two variables X and Y are conditionally independent given Z ($X \perp\!\!\!\perp Y \mid Z$), by definition the
following are true:
• $P(X, Y \mid Z) = P(X \mid Z)\,P(Y \mid Z)$
• $P(X \mid Y, Z) = P(X \mid Z)$
• $P(Y \mid X, Z) = P(Y \mid Z)$
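
Conditional independence does not imply ordinary independence. The sketch below builds a hypothetical joint table as P(X | Z) P(Y | Z) P(Z), which makes X and Y conditionally independent given Z by construction, and then checks both properties. All numbers and the helper `p` are assumptions for illustration.

```python
from fractions import Fraction as F
from itertools import product

vals = ("+", "-")
p_z = {"+": F(1, 2), "-": F(1, 2)}
# Keys are (x, z) and (y, z) respectively.
p_x_given_z = {("+", "+"): F(9, 10), ("-", "+"): F(1, 10),
               ("+", "-"): F(1, 5),  ("-", "-"): F(4, 5)}
p_y_given_z = {("+", "+"): F(4, 5),  ("-", "+"): F(1, 5),
               ("+", "-"): F(3, 10), ("-", "-"): F(7, 10)}

joint = {(x, y, z): p_x_given_z[(x, z)] * p_y_given_z[(y, z)] * p_z[z]
         for x, y, z in product(vals, repeat=3)}

def p(x=None, y=None, z=None):
    """Probability of the given outcomes, summing out unspecified variables."""
    want = (x, y, z)
    return sum(q for k, q in joint.items()
               if all(w is None or w == v for w, v in zip(want, k)))

# P(x, y | z) == P(x | z) P(y | z) for every combination of outcomes.
cond_indep = all(
    p(x, y, z) / p(z=z) == (p(x=x, z=z) / p(z=z)) * (p(y=y, z=z) / p(z=z))
    for x, y, z in product(vals, repeat=3)
)

# P(x, y) == P(x) P(y) fails once Z is summed out.
marg_indep = all(p(x=x, y=y) == p(x=x) * p(y=y)
                 for x in vals for y in vals)
```

With these numbers P(+x, +y) = 39/100 while P(+x) P(+y) = 121/400, so X and Y are dependent even though they are conditionally independent given Z.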


5 Probability Tables
When representing probabilities with capital letters, e.g. $P(A, B)$, we are referring to all the combinations
of outcomes that the discrete random variables can have. Thus, we have a table of probabilities rather than
a single value. This is also true for conditional probabilities, e.g. $P(A, B \mid C)$. When there is a mixture of
capital letters and lowercase letters, e.g. $P(A, b \mid C, d)$, the table contains all the combinations of outcomes
for the random variables A and C (while the discrete values b and d are fixed).
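
The size of a table follows from the capital letters alone. The sketch below enumerates the rows of a few tables, assuming hypothetical domains for A, B, C, and D; the helper `table_rows` is an illustrative name.

```python
from itertools import product

# Hypothetical domains for the random variables.
domain = {"A": ("a1", "a2", "a3"), "B": ("+b", "-b"),
          "C": ("+c", "-c"), "D": ("+d", "-d")}

def table_rows(capital_vars):
    """List one row per combination of outcomes of the capital-letter variables."""
    return list(product(*(domain[v] for v in capital_vars)))

rows_AB = table_rows(["A", "B"])          # P(A, B): 3 * 2 rows
rows_AB_C = table_rows(["A", "B", "C"])   # P(A, B | C): 3 * 2 * 2 rows
rows_A_b_C_d = table_rows(["A", "C"])     # P(A, b | C, d): b and d are fixed
```

Fixed outcomes such as b and d select which probabilities appear in the table but add no rows.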

6 Important Note About Conditional Probability Tables


It is important to understand when a probability table contains the complete distribution, or in other words,
when a probability table sums to one.

A probability table will sum to one when:


1. there is exactly one specific combination of outcomes that is conditioned upon and
2. we are considering all possible combinations of the other random variables.

Another way to phrase this: a probability table will sum to one when:
1. there are no capital letters on the right-hand side of the |, and
2. there are only capital letters on the left-hand side.
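
These two conditions can be checked on a small example. The sketch below assumes a made-up joint table P(A, B) and sums four different tables derived from it.

```python
from fractions import Fraction as F

# Hypothetical joint table P(A, B).
joint = {("+a", "+b"): F(3, 10), ("+a", "-b"): F(1, 10),
         ("-a", "+b"): F(2, 10), ("-a", "-b"): F(4, 10)}
As, Bs = ("+a", "-a"), ("+b", "-b")
p_b = {b: sum(joint[(a, b)] for a in As) for b in Bs}

def cond(a, b):
    """P(a | b) from the definition of conditional probability."""
    return joint[(a, b)] / p_b[b]

# P(A | +b): one fixed condition, all of A on the left. Sums to one.
sum_A_given_b = sum(cond(a, "+b") for a in As)

# P(A | B): capital letter on the right, so one distribution per outcome of B.
sum_A_given_B = sum(cond(a, b) for a in As for b in Bs)

# P(+a | B): lowercase on the left, capital on the right.
sum_a_given_B = sum(cond("+a", b) for b in Bs)

# P(A, +b): lowercase on the left. Sums to P(+b), not one.
sum_A_and_b = sum(joint[(a, "+b")] for a in As)
```

With this table the four sums are 1, 2, 4/5, and 1/2. Only P(A | +b) satisfies both conditions, and it is the only one that sums to one.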
