CS229 - Probability Theory Review: Taide Ding, Fereshte Khani
Based on CS229 Review of Probability Theory by Arian Maleki and Tom Do.
Additional material by Zahra Koochak and Jeremy Irvin.
Conditional Probability
\[
P(A \mid B) := \frac{P(A \cap B)}{P(B)}
\]
Bayes' Rule
\[
P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{P(A \cap B)}{P(A)} = \frac{P(B)\,P(A \mid B)}{P(A)}
\]
Conditioning everything on a third event C gives the conditioned Bayes' rule:
\[
P(A \mid B, C) = \frac{P(B \mid A, C)\,P(A \mid C)}{P(B \mid C)}
\]
If \(B_1, \ldots, B_n\) partition the sample space, then by the Law of Total Probability:
\[
P(B_k \mid A) = \frac{P(B_k)\,P(A \mid B_k)}{P(A)} = \frac{P(B_k)\,P(A \mid B_k)}{\sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)}
\]
Example
Treasure chest A holds 100 gold coins. Treasure chest B holds 60
gold and 40 silver coins.
Choose a treasure chest uniformly at random, and pick a coin from that chest uniformly at random. If the coin is gold, what is the probability that you chose chest A?¹
Solution:
\[
P(A \mid G) = \frac{P(A)\,P(G \mid A)}{P(A)\,P(G \mid A) + P(B)\,P(G \mid B)} = \frac{0.5 \times 1}{0.5 \times 1 + 0.5 \times 0.6} = 0.625
\]
¹ Question based on slides by Koochak & Irvin
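As a sanity check, here is a short Monte Carlo sketch (ours, not part of the original handout); the function name `simulate`, its defaults, and the use of Python's `random` module are illustrative choices:

```python
import random

def simulate(n_trials=1_000_000, seed=0):
    rng = random.Random(seed)
    a_and_gold = 0
    gold = 0
    for _ in range(n_trials):
        chest = rng.choice("AB")          # choose a chest uniformly at random
        if chest == "A":
            is_gold = True                # chest A holds 100 gold coins
        else:
            is_gold = rng.random() < 0.6  # chest B holds 60 gold, 40 silver
        if is_gold:
            gold += 1
            if chest == "A":
                a_and_gold += 1
    return a_and_gold / gold              # Monte Carlo estimate of P(A | G)

print(simulate())  # ≈ 0.625, matching the Bayes' rule calculation
```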
Chain Rule
\[
P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_2 \cap A_1) \cdots P(A_n \mid A_{n-1} \cap A_{n-2} \cap \cdots \cap A_1)
\]
Independence
Events A and B are independent if and only if
\[
P(A \cap B) = P(A)\,P(B)
\]
Probability Mass Function (PMF)
\[
p_X(x) := P(X = x)
\]
For a valid PMF, \(\sum_{x \in \mathrm{Val}(X)} p_X(x) = 1\).
Cumulative Distribution Function (CDF)
\[
F_X(x) := P(X \le x)
\]
Probability Density Function (PDF)
\[
f_X(x) := \frac{dF_X(x)}{dx}
\]
Thus,
\[
P(a \le X \le b) = F_X(b) - F_X(a) = \int_a^b f_X(x)\,dx
\]
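This identity is easy to verify numerically, assuming SciPy is available; the choice of Exponential(λ = 2) and the interval [0.5, 1.5] below are ours:

```python
from scipy import stats, integrate

# X ~ Exponential(λ = 2); SciPy parameterizes the exponential by scale = 1/λ.
lam = 2.0
dist = stats.expon(scale=1 / lam)
a, b = 0.5, 1.5

via_cdf = dist.cdf(b) - dist.cdf(a)          # F(b) - F(a)
via_pdf, _ = integrate.quad(dist.pdf, a, b)  # numerically integrate f over [a, b]
print(via_cdf, via_pdf)                      # both ≈ 0.318
```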
Linearity of Expectation
\[
E\left[\sum_{i=1}^{n} f_i(X)\right] = \sum_{i=1}^{n} E[f_i(X)]
\]
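A quick empirical illustration (ours); \(f_1(x) = x^2\) and \(f_2(x) = \sin x\) are arbitrary choices, and on shared samples the two sides agree exactly because the sample mean is itself linear:

```python
import numpy as np

# X ~ N(0, 1), with E[X**2] = 1 and E[sin X] = 0, so both sides come out ≈ 1.
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

lhs = np.mean(x**2 + np.sin(x))            # estimate of E[f1(X) + f2(X)]
rhs = np.mean(x**2) + np.mean(np.sin(x))   # estimate of E[f1(X)] + E[f2(X)]
print(lhs, rhs)                            # both ≈ 1.0
```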
Law of Total Expectation
\[
E[E[X \mid Y]] = E[X]
\]
N.B. \(E[X \mid Y = y] = \sum_{x \in \mathrm{Val}(X)} x\,p_{X|Y}(x \mid y)\), so \(E[X \mid Y]\) is a function of Y.
See Appendix for details :)
Example of Law of Total Expectation
El Goog sources two batteries, A and B, for its phone. A phone
with battery A runs on average 12 hours on a single charge, but
only 8 hours on average with battery B. El Goog puts battery A in
80% of its phones and battery B in the rest. If you buy a phone
from El Goog, how many hours do you expect it to run on a single
charge?
Solution: Let L be the time your phone runs on a single charge, and let X ∈ {A, B} denote its battery type. We know the following:
- \(p_X(A) = 0.8\), \(p_X(B) = 0.2\),
- \(E[L \mid A] = 12\), \(E[L \mid B] = 8\).
Then, by the Law of Total Expectation,
\[
E[L] = E[E[L \mid X]] = \sum_{x \in \{A, B\}} E[L \mid X = x]\,p_X(x) = 12 \times 0.8 + 8 \times 0.2 = 11.2 \text{ hours.}
\]
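A minimal simulation sketch of this answer: only the means 12 and 8 and the 80/20 split come from the problem; the exponential run-time distributions are our assumption (any distributions with those means would do):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
has_a = rng.random(n) < 0.8                 # 80% of phones ship with battery A
hours = np.where(has_a,
                 rng.exponential(12.0, n),  # battery A: mean 12 hours (shape assumed)
                 rng.exponential(8.0, n))   # battery B: mean 8 hours (shape assumed)
print(hours.mean())                         # ≈ 0.8 * 12 + 0.2 * 8 = 11.2
```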
| Distribution | PDF or PMF | Mean | Variance |
| --- | --- | --- | --- |
| Bernoulli(\(p\)) | \(p\) if \(x = 1\); \(1 - p\) if \(x = 0\) | \(p\) | \(p(1 - p)\) |
| Binomial(\(n, p\)) | \(\binom{n}{k} p^k (1 - p)^{n - k}\) for \(k = 0, 1, \ldots, n\) | \(np\) | \(np(1 - p)\) |
| Geometric(\(p\)) | \(p(1 - p)^{k - 1}\) for \(k = 1, 2, \ldots\) | \(\frac{1}{p}\) | \(\frac{1 - p}{p^2}\) |
| Poisson(\(\lambda\)) | \(\frac{e^{-\lambda} \lambda^k}{k!}\) for \(k = 0, 1, \ldots\) | \(\lambda\) | \(\lambda\) |
| Uniform(\(a, b\)) | \(\frac{1}{b - a}\) for all \(x \in (a, b)\) | \(\frac{a + b}{2}\) | \(\frac{(b - a)^2}{12}\) |
| Gaussian(\(\mu, \sigma^2\)) | \(\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x - \mu)^2}{2\sigma^2}}\) for all \(x \in (-\infty, \infty)\) | \(\mu\) | \(\sigma^2\) |
| Exponential(\(\lambda\)) | \(\lambda e^{-\lambda x}\) for all \(x \ge 0\), \(\lambda > 0\) | \(\frac{1}{\lambda}\) | \(\frac{1}{\lambda^2}\) |

² Table reproduced from Maleki & Do's review handout by Koochak & Irvin
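If SciPy is available, the table's closed-form moments can be spot-checked against its frozen distributions; the parameter values below are arbitrary, and SciPy's `geom` uses the same support k = 1, 2, ... as the table:

```python
from scipy import stats

n, p, lam = 10, 0.3, 4.0
print(stats.binom(n, p).stats(moments="mv"))         # mean np = 3.0, var np(1-p) = 2.1
print(stats.geom(p).stats(moments="mv"))             # mean 1/p ≈ 3.33, var (1-p)/p² ≈ 7.78
print(stats.poisson(lam).stats(moments="mv"))        # mean λ = 4.0, var λ = 4.0
print(stats.expon(scale=1/lam).stats(moments="mv"))  # mean 1/λ = 0.25, var 1/λ² = 0.0625
```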
Joint and Marginal Distributions
- Joint PMF for discrete RVs X, Y: \(p_{XY}(x, y) := P(X = x, Y = y)\)
- For continuous X, Y with joint PDF \(f_{XY}\):
\[
E[g(X, Y)] := \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\,f_{XY}(x, y)\,dx\,dy
\]
Conditional Distributions
- For discrete X, Y:
\[
p_{Y|X}(y \mid x) = \frac{p_{XY}(x, y)}{p_X(x)}
\]
- For continuous X, Y:
\[
f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)}
\]
- In general, for continuous \(X_1, \ldots, X_n\), a conditional density is the joint density divided by the marginal density of the conditioning variables.
Bayes' Rule for RVs
- For discrete X, Y:
\[
p_{Y|X}(y \mid x) = \frac{p_{X|Y}(x \mid y)\,p_Y(y)}{\sum_{y' \in \mathrm{Val}(Y)} p_{X|Y}(x \mid y')\,p_Y(y')}
\]
- For continuous X, Y:
\[
f_{Y|X}(y \mid x) = \frac{f_{X|Y}(x \mid y)\,f_Y(y)}{\int_{-\infty}^{\infty} f_{X|Y}(x \mid y')\,f_Y(y')\,dy'}
\]
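A small sketch of the discrete case, with a made-up prior \(p_Y\) and likelihood table \(p_{X|Y}\) (both ours):

```python
import numpy as np

p_y = np.array([0.5, 0.3, 0.2])          # made-up prior over Val(Y) = {0, 1, 2}
p_x_given_y = np.array([[0.9, 0.1],      # row y: p_{X|Y}(· | y), each row sums to 1
                        [0.4, 0.6],
                        [0.2, 0.8]])

x = 1                                    # observed value of X
numer = p_x_given_y[:, x] * p_y          # p_{X|Y}(x | y) p_Y(y) for every y
posterior = numer / numer.sum()          # divide by the sum over y' (the denominator)
print(posterior, posterior.sum())        # a valid PMF over y: sums to 1
```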
Chain Rule for RVs
\[
f(x_1, x_2, \ldots, x_n) = f(x_1)\,f(x_2 \mid x_1) \cdots f(x_n \mid x_1, x_2, \ldots, x_{n-1}) = f(x_1) \prod_{i=2}^{n} f(x_i \mid x_1, \ldots, x_{i-1})
\]
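This factorization can be checked numerically on any discrete joint distribution; the random 2×3×2 table below is our stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((2, 3, 2))
joint /= joint.sum()                     # normalize into a valid joint PMF p(x1, x2, x3)

p1 = joint.sum(axis=(1, 2))              # marginal p(x1)
p12 = joint.sum(axis=2)                  # marginal p(x1, x2)

x1, x2, x3 = 1, 2, 0
chain = (p1[x1]
         * p12[x1, x2] / p1[x1]              # p(x2 | x1)
         * joint[x1, x2, x3] / p12[x1, x2])  # p(x3 | x1, x2)
print(np.isclose(chain, joint[x1, x2, x3]))  # True
```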
Independence for RVs
\(X_1, \ldots, X_n\) are independent if and only if
\[
f(x_1, \ldots, x_n) = f(x_1)\,f(x_2) \cdots f(x_n) = \prod_{i=1}^{n} f(x_i)
\]
Expectation for Random Vectors
Given \(g : \mathbb{R}^n \to \mathbb{R}^m\), we have:
\[
g(x) = \begin{bmatrix} g_1(x) \\ g_2(x) \\ \vdots \\ g_m(x) \end{bmatrix}, \qquad
E[g(X)] = \begin{bmatrix} E[g_1(X)] \\ E[g_2(X)] \\ \vdots \\ E[g_m(X)] \end{bmatrix}
\]
Covariance Matrices
For a random vector \(X \in \mathbb{R}^n\), we define its covariance matrix \(\Sigma\) as the \(n \times n\) matrix whose ij-th entry contains the covariance between \(X_i\) and \(X_j\).
\[
\Sigma = \begin{bmatrix}
\mathrm{Cov}[X_1, X_1] & \cdots & \mathrm{Cov}[X_1, X_n] \\
\vdots & \ddots & \vdots \\
\mathrm{Cov}[X_n, X_1] & \cdots & \mathrm{Cov}[X_n, X_n]
\end{bmatrix}
\]
Properties:
- \(\Sigma\) is symmetric and positive semi-definite (PSD)
- If \(X_i \perp X_j\) for all \(i \ne j\), then \(\Sigma = \mathrm{diag}(\mathrm{Var}[X_1], \ldots, \mathrm{Var}[X_n])\)
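Both properties are easy to observe empirically; the data-generating choice below (a random linear map of iid normals) is ours:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))                     # arbitrary mixing matrix (our choice)
X = rng.standard_normal((100_000, 4)) @ A.T         # rows are draws of a correlated vector

Sigma = np.cov(X, rowvar=False)                     # 4 × 4 sample covariance matrix
print(np.allclose(Sigma, Sigma.T))                  # symmetric: True
print((np.linalg.eigvalsh(Sigma) >= -1e-10).all())  # all eigenvalues ≥ 0, i.e. PSD
```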
Multivariate Gaussian
The multivariate Gaussian \(X \sim \mathcal{N}(\mu, \Sigma)\), \(X \in \mathbb{R}^n\):
\[
p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} \det(\Sigma)^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)
\]
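Evaluating this formula directly should match a library implementation, assuming SciPy is available; \(\mu\), \(\Sigma\), and \(x\) below are arbitrary examples:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -1.0])               # example parameters (ours)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, 0.0])

d = x - mu
n = len(mu)
by_hand = np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)) / (
    (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))
print(by_hand, multivariate_normal(mu, Sigma).pdf(x))  # the two values agree
```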
Appendix: A Proof of the Law of Total Expectation
\[
\begin{aligned}
E[E[X \mid Y]] &= E\Big[\sum_{x \in \mathrm{Val}(X)} x\,p_{X|Y}(x \mid Y)\Big] && (1) \\
&= \sum_{y \in \mathrm{Val}(Y)} \Big(\sum_{x \in \mathrm{Val}(X)} x\,p_{X|Y}(x \mid y)\Big)\,p_Y(y) && (2) \\
&= \sum_{y} \sum_{x} x\,p_{XY}(x, y) && (3) \\
&= \sum_{x} x\,p_X(x) && (4) \\
&= E[X] && (5)
\end{aligned}
\]
where (1), (2), and (5) result from the definition of expectation, (3) results from the definition of conditional probability, and (4) results from marginalizing out Y.
³ From slides by Koochak & Irvin
Appendix: A Proof of Conditioned Bayes' Rule
\[
P(A \mid B, C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} = \frac{P(B \mid A, C)\,P(A \mid C)\,P(C)}{P(B \mid C)\,P(C)} = \frac{P(B \mid A, C)\,P(A \mid C)}{P(B \mid C)}
\]
where the second step expands the numerator and the denominator by the chain rule, and the factors \(P(C)\) cancel.

⁴ From slides by Koochak & Irvin