Probability Cheatsheet
Compiled by William Chen (http://wzchen.com) and Joe Blitzstein,
with contributions from Sebastian Chiu, Yuan Jiang, Yuqi Hou, and
Jessy Hwang. Material based on Joe Blitzstein's (@stat110) lectures
(http://stat110.net) and Blitzstein/Hwang's Introduction to
Probability textbook (http://bit.ly/introprobability). Licensed
under CC BY-NC-SA 4.0. Please share comments, suggestions, and errors
at http://github.com/wzchen/probability_cheatsheet.
Last Updated February 26, 2016.
Counting

Multiplication Rule
If a compound experiment has r components, where the 1st component has n1 possible outcomes, the 2nd has n2, ..., and the rth has nr, then overall there are n1 · n2 ⋯ nr possibilities for the whole experiment. (Figure: a tree diagram pairing each cone type, cake or waffle, with each flavor, chocolate (C), vanilla (V), or strawberry (S), giving 2 · 3 = 6 possibilities.)

Thinking Conditionally

Independence
Independent Events: A and B are independent if P(A ∩ B) = P(A)P(B). Equivalently, P(A|B) = P(A), and also P(B|A) = P(B).

De Morgan's Laws
(A ∪ B)^c = A^c ∩ B^c
(A ∩ B)^c = A^c ∪ B^c

Law of Total Probability (LOTP)
For a partition B1, B2, ..., Bn of the sample space:
P(A) = P(A ∩ B1) + P(A ∩ B2) + ⋯ + P(A ∩ Bn)
In particular, for any event B:
P(A) = P(A ∩ B) + P(A ∩ B^c)

Bayes' Rule
Bayes' Rule, and with extra conditioning (just add in C!):
P(A|B) = P(B|A)P(A) / P(B)
P(A|B, C) = P(B|A, C)P(A|C) / P(B|C)
Odds form of Bayes' Rule:
P(A|B) / P(A^c|B) = [P(B|A)P(A)] / [P(B|A^c)P(A^c)]
The posterior odds of A are the likelihood ratio times the prior odds.
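As a quick numeric illustration in R, here is a minimal sketch with hypothetical numbers (a test with 95% sensitivity, 10% false-positive rate, 1% prevalence; all three values are made up):

    # Hypothetical numbers for a diagnostic-test example
    p_A    <- 0.01   # prior P(A): prevalence
    p_B_A  <- 0.95   # likelihood P(B|A): P(positive | condition)
    p_B_Ac <- 0.10   # P(B|A^c): false-positive rate

    # LOTP gives the denominator P(B)
    p_B <- p_B_A * p_A + p_B_Ac * (1 - p_A)

    # Bayes' rule: posterior P(A|B)
    p_A_B <- p_B_A * p_A / p_B

    # Odds form: posterior odds = likelihood ratio * prior odds
    post_odds <- (p_B_A / p_B_Ac) * (p_A / (1 - p_A))

    p_A_B                        # about 0.088
    post_odds / (1 + post_odds)  # same posterior, recovered from the odds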
Unions via Inclusion-Exclusion
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).

Simpson's Paradox
It is possible to have
P(A | B, C) < P(A | B^c, C) and P(A | B, C^c) < P(A | B^c, C^c), yet P(A | B) > P(A | B^c).
(Figure: Dr. Hibbert and Dr. Nick each perform two types of operations, heart surgery and band-aid removal; one doctor can have the better success rate within each type yet the worse rate overall.)

Sampling Table
The sampling table gives the number of possible samples of size k out of a population of size n, under various assumptions about how the sample is collected. Here C(n, k) = n!/(k!(n − k)!) denotes a binomial coefficient.

                          Order Matters      Order Doesn't Matter
    With Replacement      n^k                C(n + k − 1, k)
    Without Replacement   n!/(n − k)!        C(n, k)
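A minimal sketch of the four counts in R, for illustrative values n = 5 and k = 3 (chosen arbitrarily):

    n <- 5; k <- 3                    # population and sample size (arbitrary)

    n^k                               # with replacement, order matters: 125
    choose(n + k - 1, k)              # with replacement, order doesn't: 35
    factorial(n) / factorial(n - k)   # without replacement, order matters: 60
    choose(n, k)                      # without replacement, order doesn't: 10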
Random Variables

PMF (Probability Mass Function)
pX(x) = P(X = x)
A valid PMF is nonnegative and sums to 1:
Σ_x pX(x) = 1

CDF (Cumulative Distribution Function)
FX(x) = P(X ≤ x)

Expected Value and Linearity
E(X) = Σ_x x P(X = x) (for discrete X)
Linearity: E(X + Y) = E(X) + E(Y) for any X and Y, dependent or not. (The figure illustrated this with columns of realizations xi, yi, and xi + yi: (1/n) Σ (xi + yi) = (1/n) Σ xi + (1/n) Σ yi, so averaging the sums matches summing the averages.)

Indicator Random Variables
I_A is 1 if A occurs and 0 otherwise. Note that I_A² = I_A, I_A I_B = I_{A∩B}, and I_{A∪B} = I_A + I_B − I_A I_B. The fundamental bridge: E(I_A) = P(A).

LOTUS
The Law of the Unconscious Statistician: for discrete X,
E(g(X)) = Σ_x g(x) P(X = x)
What's the point? You don't need to know the PMF/PDF of g(X) to find its expected value. All you need is the PMF/PDF of X.
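A sketch of LOTUS in action in R, with the arbitrary choices g(x) = x² and X ~ Bin(10, 1/2):

    # LOTUS check for X ~ Bin(10, 1/2) and g(x) = x^2
    n <- 10; p <- 0.5
    x <- 0:n                    # support of X
    pmf <- dbinom(x, n, p)      # P(X = x)

    sum(x^2 * pmf)              # E(g(X)) via LOTUS: 27.5 (= np(1-p) + (np)^2)
    mean(rbinom(1e6, n, p)^2)   # Monte Carlo estimate, close to 27.5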
Universality of the Uniform
When you plug any CRV into its own CDF, you get a Uniform(0,1) random variable. When you plug a Uniform(0,1) r.v. into an inverse CDF, you get an r.v. with that CDF. For example, let's say that a random variable X has CDF
F(x) = 1 − e^{−x}, for x > 0
By universality of the Uniform, plugging X into its own CDF gives a uniformly distributed random variable: F(X) = 1 − e^{−X} ~ Unif(0, 1). Conversely, if U ~ Unif(0, 1), then F^{−1}(U) = −log(1 − U) has CDF F.
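A sketch of both directions in R, using the Expo(1) CDF F(x) = 1 − e^{−x} from the example:

    set.seed(110)
    # Direction 1: plug an Expo(1) r.v. into its own CDF -> Unif(0, 1)
    x <- rexp(1e5, rate = 1)
    u <- 1 - exp(-x)              # F(X)
    mean(u <= 0.3)                # ~0.3, as the Unif(0,1) CDF predicts

    # Direction 2: plug Unif(0,1) into the inverse CDF F^{-1}(u) = -log(1-u)
    x2 <- -log(1 - runif(1e5))    # has CDF F, i.e., Expo(1)
    quantile(x2, 0.5); log(2)     # sample median vs. true Expo(1) median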
Continuous Random Variables and the PDF
The PDF is the derivative of the CDF: F′(x) = f(x). A PDF is nonnegative and integrates to 1, and probabilities come from integrating it:
P(a ≤ X ≤ b) = ∫_a^b f(x) dx
Expected value (continuous): E(X) = ∫ x f(x) dx
LOTUS (continuous): E(g(X)) = ∫ g(x) f(x) dx

Independence of Random Variables
X and Y are independent if for all x and y:
P(X = x, Y = y) = P(X = x) P(Y = y)

Marginal Distributions
To find the distribution of one (or more) random variables from a joint PMF/PDF, sum/integrate over the unwanted random variables.

Multivariate LOTUS
LOTUS in more than one dimension is analogous to the 1D LOTUS. For discrete random variables:
E(g(X, Y)) = Σ_x Σ_y g(x, y) P(X = x, Y = y)
For continuous random variables, replace the sums by integrals against the joint PDF.

Moments
The kth moment of X is μ'_k = E(X^k). Moments can be read off the MGF, described next.
Moment Generating Functions (MGFs)
The MGF of X is
MX(t) = E(e^{tX}) = Σ_{k=0}^∞ E(X^k) t^k / k!
so the kth moment of X is the coefficient of t^k/k! in the Taylor expansion. Equivalently,
μ'_k = E(X^k) = M_X^{(k)}(0),
the kth derivative of the MGF at 0.
Location-scale: M_{aX+b}(t) = E(e^{t(aX+b)}) = e^{bt} E(e^{(at)X}) = e^{bt} MX(at)
Sums: if X and Y are independent, then M_{X+Y}(t) = E(e^{tX}) E(e^{tY}) = MX(t) MY(t).
The MGF of the sum of two independent random variables is the product of the MGFs of those two random variables.
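For instance, X ~ Expo(1) has MGF M(t) = 1/(1 − t) for t < 1 (a standard fact, tabulated below), and numerical derivatives at 0 recover the moments. A sketch:

    M <- function(t) 1 / (1 - t)       # MGF of Expo(1), valid for t < 1

    h <- 1e-5
    (M(h) - M(-h)) / (2 * h)           # central difference ~ M'(0) = E(X) = 1
    (M(h) - 2 * M(0) + M(-h)) / h^2    # ~ M''(0) = E(X^2) = 2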
Joint Distributions
The joint CDF of X and Y is
F(x, y) = P(X ≤ x, Y ≤ y)
In the discrete case, X and Y have a joint PMF
pX,Y(x, y) = P(X = x, Y = y).
In the continuous case, they have a joint PDF
fX,Y(x, y) = ∂²/∂x∂y FX,Y(x, y).

Transformations
One Variable Transformations: Let's say that we have a random variable X with PDF fX(x), but we are also interested in some function of X. We call this function Y = g(X). Also let y = g(x). If g is differentiable and strictly increasing (or strictly decreasing), then the PDF of Y is
fY(y) = fX(x) |dx/dy| = fX(g^{−1}(y)) |d/dy g^{−1}(y)|
Two Variable Transformations: for a transformation (X, Y) → (U, V), let
∂(u, v)/∂(x, y) = [ ∂u/∂x  ∂u/∂y ; ∂v/∂x  ∂v/∂y ]
be the Jacobian matrix. If the entries in this matrix exist and are continuous, and the determinant of the matrix is never 0, then
fX,Y(x, y) = fU,V(u, v) |∂(u, v)/∂(x, y)|
The inner bars tell us to take the matrix's determinant, and the outer bars tell us to take the absolute value. For a 2 × 2 matrix with rows (a, b) and (c, d), the determinant is ad − bc.
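A quick numerical check of the one-variable formula above, sketched in R with the arbitrary choice Y = e^X for X ~ N(0, 1), so that g^{−1}(y) = log y and |d/dy g^{−1}(y)| = 1/y:

    set.seed(110)
    y <- exp(rnorm(1e5))                 # Y = g(X) = e^X, X ~ N(0,1)

    # Transformation formula: f_Y(y) = f_X(log y) * (1/y)
    f_Y <- function(y) dnorm(log(y)) / y

    d <- density(y, from = 0.5, to = 3)  # empirical density estimate
    max(abs(d$y - f_Y(d$x)))             # small (sampling + smoothing error)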
Conditional Distributions
The conditional PDF of Y given X = x is
fY|X(y|x) = fX,Y(x, y) / fX(x)
and Bayes' rule carries over to densities: fY|X(y|x) = fX|Y(x|y) fY(y) / fX(x).
Hybrid Bayes' rule (conditioning a CRV on an event A with P(A) > 0):
fX(x|A) = P(A | X = x) fX(x) / P(A)

Covariance and Correlation
Covariance: Cov(X, Y) = E(XY) − E(X)E(Y).
Correlation: Corr(X, Y) = Cov(X, Y) / √(Var(X)Var(Y))
If X ⊥ Y, then Cov(X, Y) = 0 and E(XY) = E(X)E(Y); the converse does not hold in general.
Properties:
Cov(X, Y) = Cov(Y, X)
Cov(X + a, Y + b) = Cov(X, Y)
Cov(aX, bY) = ab Cov(X, Y)
Cov(W + X, Y + Z) = Cov(W, Y) + Cov(W, Z) + Cov(X, Y) + Cov(X, Z)
Correlation is location-invariant and scale-invariant: for any constants a, b, c, d with ac > 0, Corr(aX + b, cY + d) = Corr(X, Y); if ac < 0, the correlation flips sign.

Variance of a Sum
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
Var(X1 + ⋯ + Xn) = Σ_{i=1}^n Var(Xi) + 2 Σ_{i<j} Cov(Xi, Xj)
If X ⊥ Y, then Var(X + Y) = Var(X) + Var(Y).
If X1, X2, ..., Xn are identically distributed and have the same covariance relationships (often by symmetry), then
Var(X1 + X2 + ⋯ + Xn) = n Var(X1) + 2 C(n, 2) Cov(X1, X2)

Convolutions
Convolution Integral: If you want to find the PDF of the sum of two independent CRVs X and Y, you can do the following integral:
fX+Y(t) = ∫_{−∞}^{∞} fX(x) fY(t − x) dx
Example: Let X, Y ~ N(0, 1) be i.i.d. Then
fX+Y(t) = ∫_{−∞}^{∞} (1/√(2π)) e^{−x²/2} · (1/√(2π)) e^{−(t−x)²/2} dx
By completing the square and using the fact that a Normal PDF integrates to 1, this works out to fX+Y(t) being the N(0, 2) PDF.
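The same answer can be checked numerically in R; a sketch with integrate doing the convolution at one arbitrary point, t = 1.3:

    # f_{X+Y}(t) for independent N(0,1) X and Y, via the convolution integral
    conv <- function(t) {
      integrate(function(x) dnorm(x) * dnorm(t - x), -Inf, Inf)$value
    }

    conv(1.3)                  # numerical convolution at t = 1.3
    dnorm(1.3, sd = sqrt(2))   # N(0, 2) density at 1.3, matches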
Poisson Process
Definition: We have a Poisson process of rate λ arrivals per unit time if the following conditions hold:
1. The number of arrivals in a time interval of length t is Pois(λt).
2. The numbers of arrivals in disjoint time intervals are independent.
(Figure: a timeline marking the successive arrival times T1, T2, T3, T4, T5.)
Count-time duality: the time of the first arrival satisfies
P(T1 > t) = P(no arrivals in [0, t]) = e^{−λt}, so P(T1 ≤ t) = 1 − e^{−λt},
i.e., T1 ~ Expo(λ).
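A sketch simulating a rate-λ Poisson process in R via i.i.d. Expo(λ) inter-arrival times (λ = 3 is arbitrary):

    set.seed(110)
    lambda <- 3                          # arrival rate (arbitrary)
    sim_count <- replicate(1e4, {
      arrivals <- cumsum(rexp(100, rate = lambda))  # arrival times T1 < T2 < ...
      sum(arrivals <= 1)                 # number of arrivals in [0, 1]
    })

    mean(sim_count); var(sim_count)      # both ~ lambda, as Pois(lambda) predicts
    mean(sim_count == 0); exp(-lambda)   # P(no arrival by t = 1) = P(T1 > 1)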
Order Statistics
Definition: The order statistics of X1, X2, ..., Xn are the sorted values X(1) ≤ X(2) ≤ ⋯ ≤ X(n), so X(1) is the minimum and X(n) is the maximum.
Note that the order statistics are dependent, e.g., learning X(4) = 42 gives us the information that X(1), X(2), X(3) are ≤ 42 and X(5), X(6), ..., X(n) are ≥ 42.
Distribution: Taking n i.i.d. random variables X1, X2, ..., Xn with CDF F(x) and PDF f(x), the CDF and PDF of X(i) are:
F_{X(i)}(x) = P(X(i) ≤ x) = Σ_{k=i}^{n} C(n, k) F(x)^k (1 − F(x))^{n−k}
f_{X(i)}(x) = n C(n − 1, i − 1) F(x)^{i−1} (1 − F(x))^{n−i} f(x)
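Since F_{X(i)}(x) is the probability that at least i of the n values land at or below x, it is a Binomial tail probability and can be computed with pbinom. A sketch checking this for Unif(0,1) order statistics (n = 5, i = 2, x = 0.4 are arbitrary):

    set.seed(110)
    n <- 5; i <- 2; x <- 0.4    # arbitrary illustrative values

    # P(X_(i) <= x) = P(Bin(n, F(x)) >= i); here F(x) = x for Unif(0, 1)
    1 - pbinom(i - 1, n, x)     # formula: 0.66304

    samples <- matrix(runif(1e5 * n), ncol = n)
    mean(apply(samples, 1, function(r) sort(r)[i]) <= x)   # simulation, close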
Conditional Expectation
Conditioning on an Event: We can find E(Y|A), the expected value of Y given that event A occurred. A very important case is when A is the event X = x. Note that E(Y|A) is a number. For example:
Discrete Y: E(Y) = Σ_y y P(Y = y) and E(Y|A) = Σ_y y P(Y = y|A)
Continuous Y: E(Y) = ∫ y fY(y) dy and E(Y|A) = ∫ y f(y|A) dy

Central Limit Theorem (CLT)
Approximation using CLT: Below, "≈" denotes "is approximately distributed". We can use the Central Limit Theorem to approximate the distribution of a random variable Y = X1 + X2 + ⋯ + Xn that is a sum of n i.i.d. random variables Xi. Let E(Y) = μY and Var(Y) = σY². The CLT says
Y ≈ N(μY, σY²)
If the Xi are i.i.d. with mean μX and variance σX², then μY = nμX and σY² = nσX². For the sample mean X̄n = (1/n)(X1 + X2 + ⋯ + Xn), the CLT says
X̄n ≈ N(μX, σX²/n)
Asymptotic distribution: as n → ∞,
√n (X̄n − μX)/σX → N(0, 1) in distribution.
In other words, the CDF of the left-hand side goes to the standard Normal CDF, Φ.
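A sketch of the CLT in R, using i.i.d. Expo(1) summands (so μ = σ = 1; the choice of distribution and n = 50 are arbitrary):

    set.seed(110)
    # Standardized sample means of n i.i.d. Expo(1) draws
    n <- 50
    z <- replicate(1e4, sqrt(n) * (mean(rexp(n)) - 1) / 1)
    mean(z <= 1.96); pnorm(1.96)   # empirical CDF at 1.96 vs Phi(1.96)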
Markov Chains
Definition: A Markov chain is a sequence of random variables X0, X1, X2, ... taking values in a state space, with the Markov property: given the present state, the past and future are conditionally independent. In symbols,
P(Xn+1 = j | X0 = i0, ..., Xn = i) = P(Xn+1 = j | Xn = i)
(Figure: a transition diagram of a small chain, with arrows labeled by probabilities such as 1/2, 1/4, 1/3, 1/6, 5/12, 7/12, 7/8, 1/8.)

State Properties
A state is either recurrent or transient.
- If you start at a recurrent state, then you will always return back to that state at some point in the future. "You can check-out any time you like, but you can never leave."
- Otherwise you are at a transient state. There is some positive probability that once you leave you will never return. "You don't have to go home, but you can't stay here."
A state is either periodic or aperiodic.
- If you start at a periodic state of period k, then the GCD of the possible numbers of steps it would take to return back is k > 1.
- Otherwise you are at an aperiodic state. The GCD of the possible numbers of steps it would take to return back is 1.

Transition Matrix
Element qij of the transition matrix Q is the probability that the chain goes from state i to state j in one step:
qij = P(Xn+1 = j | Xn = i)

Chain Properties
To find the probability that the chain goes from state i to state j in exactly m steps, take the (i, j) element of Q^m:
q_ij^(m) = P(Xn+m = j | Xn = i)

Stationary Distribution
A row vector of probabilities s is stationary if sQ = s. To find the stationary distribution, you can solve the matrix equation (Q^T − I) s^T = 0. The stationary distribution is uniform if the columns of Q sum to 1.
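A sketch of both computations in R, with a made-up 3-state transition matrix: m-step probabilities from the matrix power Q^m, and the stationary distribution as an eigenvector of Q^T with eigenvalue 1:

    # Hypothetical 3-state transition matrix (rows sum to 1)
    Q <- matrix(c(0.5, 0.3, 0.2,
                  0.1, 0.6, 0.3,
                  0.2, 0.4, 0.4), nrow = 3, byrow = TRUE)

    # m-step transition probabilities: (i, j) entry of Q^m
    matpow <- function(M, m) Reduce(`%*%`, replicate(m, M, simplify = FALSE))
    matpow(Q, 5)[1, 3]           # P(X_{n+5} = 3 | X_n = 1)

    # Stationary distribution: left eigenvector of Q (top eigenvector of t(Q))
    e <- eigen(t(Q))
    s <- Re(e$vectors[, 1]); s <- s / sum(s)
    s                            # stationary probabilities
    s %*% Q                      # check: equals s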
Continuous Distributions

Uniform Distribution
Example: William throws darts really badly, so his darts are uniform over the whole room because they're equally likely to appear anywhere. William's darts have a Uniform distribution on the surface of the room. The Uniform is the only distribution where the probability of hitting in any specific region is proportional to the length/area/volume of that region, and where the density of occurrence in any one specific spot is constant throughout the whole support.

Normal Distribution
Standardization: if X ~ N(μ, σ²), then
Z = (X − μ)/σ ~ N(0, 1)

Exponential Distribution
Scaling: if Y ~ Expo(λ), then X = λY ~ Expo(1).
Example: The waiting time until the next shooting star is distributed Expo(4) hours. Here λ = 4 is the rate parameter, since shooting stars arrive at a rate of 1 per 1/4 hour on average. The expected time until the next shooting star is 1/λ = 1/4 hour.

Gamma Distribution
(Figure: Gamma PDFs for Gamma(3, 1), Gamma(3, 0.5), Gamma(5, 0.5), and Gamma(10, 1).)
Story: You sit waiting for shooting stars, where the waiting time for a star is distributed Expo(λ). You want to see n shooting stars before you go home. The total waiting time for the nth shooting star is Gamma(n, λ).
Example: You are at a bank, and there are 3 people ahead of you. The serving time for each person is Exponential with mean 2 minutes. Only one person at a time can be served. The distribution of your waiting time until it's your turn to be served is Gamma(3, 1/2).

Beta Distribution
(Figure: Beta PDFs for Beta(0.5, 0.5), Beta(2, 1), Beta(2, 8), and Beta(5, 5).)
Beta-Gamma connection: if X ~ Gamma(a, λ) and Y ~ Gamma(b, λ) are independent, then
X/(X + Y) ~ Beta(a, b)
and X + Y is independent of X/(X + Y).

χ² (Chi-Square) Distribution
Let us say that X is distributed χ²_n. We know the following:
Story: a χ²_n random variable is the sum of the squares of n i.i.d. standard Normals: X = Z1² + Z2² + ⋯ + Zn², with Zi i.i.d. N(0, 1).
Properties and representations: χ²_n is the Gamma(n/2, 1/2) distribution.
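A sketch verifying two of the facts above by simulation in R: the bank example (waiting time ~ Gamma(3, 1/2)), and the Beta-Gamma connection with the arbitrary choices a = 2, b = 3, λ = 1:

    set.seed(110)
    # Bank example: total serving time of 3 customers, each Expo(rate 1/2)
    wait <- replicate(1e5, sum(rexp(3, rate = 1/2)))
    mean(wait <= 5); pgamma(5, shape = 3, rate = 1/2)   # simulation vs Gamma(3, 1/2)

    # Beta-Gamma: X ~ Gamma(2, 1), Y ~ Gamma(3, 1)  =>  X/(X+Y) ~ Beta(2, 3)
    x <- rgamma(1e5, shape = 2, rate = 1)
    y <- rgamma(1e5, shape = 3, rate = 1)
    mean(x / (x + y) <= 0.5); pbeta(0.5, 2, 3)          # close
    cor(x + y, x / (x + y))                             # ~0: independent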
Discrete Distributions

Distributions for sampling schemes (with or without replacement, stopping after a fixed number of trials or after a fixed number of successes):

                                  Replace            No Replace
    Fixed number of trials (n)    Binomial           HGeom
                                  (Bern if n = 1)
    Draws until rth success       NBin               NHGeom
                                  (Geom if r = 1)

Bernoulli Distribution
The Bernoulli distribution is the indicator of success in a single trial: X ~ Bern(p) means P(X = 1) = p and P(X = 0) = 1 − p.
Example: Let X be the indicator of Heads for a fair coin toss. Then X ~ Bern(1/2). Also, 1 − X ~ Bern(1/2) is the indicator of Tails.

Binomial Distribution
X ~ Bin(n, p) counts the number of successes in n independent Bern(p) trials. (Figure: the PMF of Bin(10, 1/2).)

Geometric Distribution
X ~ Geom(p) counts the number of failures before the first success in independent Bern(p) trials.

Hypergeometric Distribution
Story: You have w white balls and b black balls, and you draw n balls without replacement. The number of white balls in your sample is HGeom(w, b, n); the number of black balls is HGeom(b, w, n).
Capture-recapture: A forest has N elk; you capture n of them, tag them, and release them. Then you recapture a new sample of size m. The number of tagged elk in the new sample is HGeom(n, N − n, m).
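The capture-recapture story maps directly onto R's dhyper; a sketch with made-up counts N = 1000 elk, n = 100 tagged, and a recaptured sample of m = 50:

    N <- 1000; n <- 100; m <- 50           # hypothetical elk counts
    # Tagged elk in the recaptured sample ~ HGeom(n, N - n, m)
    dhyper(5, n, N - n, m)                 # P(exactly 5 tagged in the sample)
    sum((0:m) * dhyper(0:m, n, N - n, m))  # mean = m*n/N = 5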
Multivariate Distributions

Multinomial Distribution
Let us say that the random vector X = (X1, X2, X3, ..., Xk) ~ Mult_k(n, p), where p = (p1, p2, ..., pk).
Story: We have n items, which can fall into any one of the k buckets independently with the probabilities p = (p1, p2, ..., pk).
Example: Let us assume that every year, 100 students in the Harry Potter Universe are randomly and independently sorted into one of four houses with equal probability. The number of people in each of the houses is distributed Mult4(100, p), where p = (0.25, 0.25, 0.25, 0.25). Note that X1 + X2 + X3 + X4 = 100, and they are dependent.
Joint PMF: For n = n1 + n2 + ⋯ + nk,
P(X = (n1, ..., nk)) = n!/(n1! n2! ⋯ nk!) · p1^{n1} p2^{n2} ⋯ pk^{nk}
Marginal PMF, Lumping, and Conditionals: Marginally, Xi ~ Bin(n, pi), since we can define "success" to mean category i. If you lump together multiple categories in a Multinomial, then it is still Multinomial. For example, Xi + Xj ~ Bin(n, pi + pj) for i ≠ j, since we can define "success" to mean being in category i or j. Similarly, if k = 6 and we lump categories 1-2 and lump categories 3-5, then
(X1 + X2, X3 + X4 + X5, X6) ~ Mult3(n, (p1 + p2, p3 + p4 + p5, p6))
Conditioning on some Xj also still gives a Multinomial:
X1, ..., Xk−1 | Xk = nk ~ Mult_{k−1}(n − nk, (p1/(1 − pk), ..., p_{k−1}/(1 − pk)))
Variances and Covariances: We have Xi ~ Bin(n, pi) marginally, so Var(Xi) = n pi (1 − pi). Also, Cov(Xi, Xj) = −n pi pj for i ≠ j.
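These moments are easy to check by simulation with R's rmultinom; a sketch using n = 100 and the four equal house probabilities from the example:

    set.seed(110)
    p <- rep(0.25, 4); n <- 100
    X <- rmultinom(1e5, size = n, prob = p)   # 4 x 1e5 matrix, one column per draw

    var(X[1, ]); n * p[1] * (1 - p[1])        # Var(X1) = 18.75
    cov(X[1, ], X[2, ]); -n * p[1] * p[2]     # Cov(X1, X2) = -6.25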
Multivariate Uniform
See the univariate Uniform for stories and examples. For the 2D Uniform on some region, probability is proportional to area. Every point in the support has equal density, of value 1/(area of region). For the 3D Uniform, probability is proportional to volume.

Poisson Distribution
Story: There are rare events (low probability events) that occur many different ways (a high number of possible ways to occur) at an average rate of λ occurrences per unit space or time. The number of events that occur in that unit of space or time is X ~ Pois(λ).
Example: A certain busy intersection has an average of 2 accidents per month. Since an accident is a low probability event that can happen many different ways, it is reasonable to model the number of accidents in a month at that intersection as Pois(2). Then the number of accidents that happen in two months at that intersection is distributed Pois(4).
Properties: Let X ~ Pois(λ1) and Y ~ Pois(λ2) be independent. Then:
1. Sum: X + Y ~ Pois(λ1 + λ2)
2. Conditional: X | (X + Y = n) ~ Bin(n, λ1/(λ1 + λ2))
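The intersection example translates directly into R:

    ppois(2, lambda = 2)   # P(X <= 2) accidents in one month: ~0.677
    dpois(0, lambda = 4)   # P(no accidents in two months)
    exp(-4)                # same thing, from the PMF at k = 0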
Distribution Properties

Important CDFs
Standard Normal: Φ
Exponential(λ): F(x) = 1 − e^{−λx}, for x ∈ (0, ∞)
Uniform(0, 1): F(x) = x, for x ∈ (0, 1)

Special Cases of Distributions
1. Bin(1, p) ~ Bern(p)
2. Beta(1, 1) ~ Unif(0, 1)
3. Gamma(1, λ) ~ Expo(λ)
4. χ²_n ~ Gamma(n/2, 1/2)
5. NBin(1, p) ~ Geom(p)
Inequalities
1. Cauchy-Schwarz: |E(XY)| ≤ √(E(X²) E(Y²))
2. Markov: P(X ≥ a) ≤ E|X|/a, for a > 0
3. Chebyshev: P(|X − μ| ≥ a) ≤ σ²/a², where μ = E(X) and σ² = Var(X)
4. Jensen: E(g(X)) ≥ g(E(X)) for convex g, and E(g(X)) ≤ g(E(X)) for concave g
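A sketch comparing the Markov and Chebyshev bounds with exact probabilities for X ~ Expo(1) (so μ = σ = 1; the threshold a = 3 is arbitrary):

    a <- 3
    # Markov: P(X >= a) <= E|X|/a, with E(X) = 1 for Expo(1)
    1 - pexp(a)      # exact P(X >= 3) = e^{-3}, about 0.0498
    1 / a            # Markov bound: 0.333

    # Chebyshev: P(|X - mu| >= a) <= sigma^2 / a^2
    (1 - pexp(1 + a)) + pexp(1 - a)   # exact P(|X - 1| >= 3); second term is 0
    1 / a^2                           # Chebyshev bound: 0.111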
Formulas

Geometric Series
1 + r + r² + ⋯ + r^{n−1} = Σ_{k=0}^{n−1} r^k = (1 − r^n)/(1 − r)
1 + r + r² + ⋯ = 1/(1 − r), if |r| < 1

Exponential Function
e^x = Σ_{n=0}^∞ x^n/n! = 1 + x + x²/2! + x³/3! + ⋯ = lim_{n→∞} (1 + x/n)^n

Euler's Approximation for Harmonic Sums
1 + 1/2 + 1/3 + ⋯ + 1/n ≈ log n + 0.577...

Stirling's Approximation
n! ≈ √(2πn) (n/e)^n

Example Problems

Coupon collector (First Success and linearity): Each draw yields one of n coupon types, chosen uniformly at random. After j − 1 distinct types have been collected, the number of additional draws until a new type appears is First Success with probability (n − j + 1)/n, with expectation n/(n − j + 1). By linearity, the expected number of draws to collect all n types is
n/n + n/(n − 1) + ⋯ + n/1 = n Σ_{j=1}^{n} 1/j ≈ n(log n + 0.577)
Universality of the Uniform: Let X have a continuous, strictly increasing CDF F. Then F(X) ~ Unif(0, 1), since
P(F(X) ≤ a) = P(X ≤ F^{−1}(a)) = F(F^{−1}(a)) = a
for 0 < a < 1 (and the CDF is 0 for a ≤ 0 and 1 for a ≥ 1), which is the Unif(0, 1) CDF.

LOTUS: For X ~ Pois(λ), find E(1/(X + 1)). Answer: By LOTUS,
E(1/(X + 1)) = Σ_{k=0}^∞ (1/(k + 1)) · e^{−λ} λ^k / k! = (e^{−λ}/λ) Σ_{k=0}^∞ λ^{k+1}/(k + 1)! = (e^{−λ}/λ)(e^λ − 1) = (1 − e^{−λ})/λ
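A quick Monte Carlo confirmation in R (λ = 2 chosen arbitrarily):

    set.seed(110)
    lambda <- 2
    mean(1 / (rpois(1e6, lambda) + 1))   # simulated E(1/(X+1))
    (1 - exp(-lambda)) / lambda          # closed form: ~0.432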
MGF and moments: Let X have MGF M(t) = 1/(1 − t) for t < 1 (the Expo(1) MGF). One could find moments by differentiating M repeatedly, but a much nicer way to use the MGF here is via pattern recognition: note that M(t) looks like it came from a geometric series:
1/(1 − t) = Σ_{n=0}^∞ t^n = Σ_{n=0}^∞ n! · t^n/n!, for |t| < 1
Since MX(t) = Σ_{n=0}^∞ E(X^n) t^n/n!, matching coefficients gives E(X^n) = n!.
Markov chains: Consider the two-state Markov chain on states {0, 1} with transition matrix
Q = [ 1 − α   α ; β   1 − β ], 0 < α, β < 1.
Find the stationary distribution s = (s0, s1). Answer: solving sQ = s, or equivalently the balance condition
s0 q01 = s1 q10, i.e., s0 α = s1 β,
together with s0 + s1 = 1, gives s = (β/(α + β), α/(α + β)).
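A two-line check of this answer in R (the values α = 0.3, β = 0.2 are arbitrary):

    alpha <- 0.3; beta <- 0.2            # arbitrary transition probabilities
    Q <- matrix(c(1 - alpha, alpha,
                  beta, 1 - beta), 2, 2, byrow = TRUE)
    s <- c(beta, alpha) / (alpha + beta) # claimed stationary distribution
    s %*% Q                              # equals s, confirming sQ = s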
Problem-Solving Strategies
Contributions from Jessy Hwang, Yuan Jiang, Yuqi Hou
1. Getting started. Start by defining relevant events and random variables. ("Let A be the event that I pick the fair coin"; "Let X be the number of successes.") Clear notation is important for clear thinking! Then decide what it is that you're supposed to be finding, in terms of your notation ("I want to find P(X = 3 | A)"). Think about what type of object your answer should be (a number? a random variable? a PMF? a PDF?) and what it should be in terms of.
2. Try simple and extreme cases. To make an abstract experiment more concrete, try drawing a picture or making up numbers that could have happened. Pattern recognition: does the structure of the problem resemble something we've seen before?
Biohazards
Contributions from Jessy Hwang
1. Don't misuse the naive definition of probability. When answering "What is the probability that in a group of 3 people, no two have the same birth month?", it is not correct to treat the people as indistinguishable balls being placed into 12 boxes, since that assumes the list of birth months {January, January, January} is just as likely as the list {January, April, June}, even though the latter is six times more likely.
2. Don't confuse unconditional, conditional, and joint probabilities. In applying P(A|B) = P(B|A)P(A)/P(B), it is not correct to say "P(B) = 1 because we know B happened"; P(B) is the prior probability of B. Don't confuse P(A|B) with P(A, B).
3. Don't assume independence without justification. In the matching problem, the probability that card 1 is a match and card 2 is a match is not 1/n². Binomial and Hypergeometric are often confused; the trials are independent in the Binomial story and dependent in the Hypergeometric story.
4. Don't confuse random variables, numbers, and events. It does not make sense to write ∫ F(X) dx, because F(X) is a random variable. It does not make sense to write P(X), because X is not an event.
Recommended Resources
- Introduction to Probability Book (http://bit.ly/introprobability)
- Stat 110 Online (http://stat110.net)
- Stat 110 Quora Blog (https://stat110.quora.com/)
- Quora Probability FAQ (http://bit.ly/probabilityfaq)
- R Studio (https://www.rstudio.com)
- LaTeX File (github.com/wzchen/probability_cheatsheet)
Please share this cheatsheet with friends! http://wzchen.com/probability-cheatsheet
Distributions in R

    Command               What it does
    help(distributions)   shows documentation on distributions
    dbinom(k,n,p)         PMF P(X = k) for X ~ Bin(n, p)
    pbinom(x,n,p)         CDF P(X ≤ x) for X ~ Bin(n, p)
    qbinom(a,n,p)         ath quantile for X ~ Bin(n, p)
    rbinom(r,n,p)         vector of r i.i.d. Bin(n, p) r.v.s
    dgeom(k,p)            PMF P(X = k) for X ~ Geom(p)
    dhyper(k,w,b,n)       PMF P(X = k) for X ~ HGeom(w, b, n)
    dnbinom(k,r,p)        PMF P(X = k) for X ~ NBin(r, p)
    dpois(k,r)            PMF P(X = k) for X ~ Pois(r)
    dbeta(x,a,b)          PDF f(x) for X ~ Beta(a, b)
    dchisq(x,n)           PDF f(x) for X ~ χ²_n
    dexp(x,b)             PDF f(x) for X ~ Expo(b)
    dgamma(x,a,r)         PDF f(x) for X ~ Gamma(a, r)
    dlnorm(x,m,s)         PDF f(x) for X ~ LN(m, s²)
    dnorm(x,m,s)          PDF f(x) for X ~ N(m, s²)
    dt(x,n)               PDF f(x) for X ~ t_n
    dunif(x,a,b)          PDF f(x) for X ~ Unif(a, b)

The table above gives R commands for working with various named distributions. Commands analogous to pbinom, qbinom, and rbinom work for the other distributions in the table. For example, pnorm, qnorm, and rnorm can be used to get the CDF, quantiles, and random generation for the Normal. For the Multinomial, dmultinom can be used for calculating the joint PMF and rmultinom can be used for generating random vectors. For the Multivariate Normal, after installing and loading the mvtnorm package, dmvnorm can be used for calculating the joint PDF and rmvnorm can be used for generating random vectors.
Table of Distributions
For each distribution: the PMF or PDF with its support, the expected value, the variance, and the MGF. Throughout, q = 1 − p and C(n, k) denotes a binomial coefficient.

Bernoulli Bern(p)
PMF: P(X = 1) = p, P(X = 0) = q
Mean: p    Variance: pq    MGF: q + pe^t

Binomial Bin(n, p)
PMF: P(X = k) = C(n, k) p^k q^{n−k}, k ∈ {0, 1, 2, ..., n}
Mean: np    Variance: npq    MGF: (q + pe^t)^n

Geometric Geom(p)
PMF: P(X = k) = q^k p, k ∈ {0, 1, 2, ...}
Mean: q/p    Variance: q/p²    MGF: p/(1 − qe^t), for qe^t < 1

Negative Binomial NBin(r, p)
PMF: P(X = n) = C(r + n − 1, r − 1) p^r q^n, n ∈ {0, 1, 2, ...}
Mean: rq/p    Variance: rq/p²    MGF: (p/(1 − qe^t))^r, for qe^t < 1

Hypergeometric HGeom(w, b, n)
PMF: P(X = k) = C(w, k) C(b, n − k) / C(w + b, n), k ∈ {0, 1, 2, ..., n}
Mean: μ = nw/(w + b)    Variance: ((w + b − n)/(w + b − 1)) · n · (μ/n)(1 − μ/n)    MGF: messy

Poisson Pois(λ)
PMF: P(X = k) = e^{−λ} λ^k / k!, k ∈ {0, 1, 2, ...}
Mean: λ    Variance: λ    MGF: e^{λ(e^t − 1)}

Uniform Unif(a, b)
PDF: f(x) = 1/(b − a), x ∈ (a, b)
Mean: (a + b)/2    Variance: (b − a)²/12    MGF: (e^{tb} − e^{ta})/(t(b − a))

Normal N(μ, σ²)
PDF: f(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}, x ∈ (−∞, ∞)
Mean: μ    Variance: σ²    MGF: e^{tμ + σ²t²/2}

Exponential Expo(λ)
PDF: f(x) = λ e^{−λx}, x ∈ (0, ∞)
Mean: 1/λ    Variance: 1/λ²    MGF: λ/(λ − t), for t < λ

Gamma Gamma(a, λ)
PDF: f(x) = (1/Γ(a)) (λx)^a e^{−λx} x^{−1}, x ∈ (0, ∞)
Mean: a/λ    Variance: a/λ²    MGF: (λ/(λ − t))^a, for t < λ

Beta Beta(a, b)
PDF: f(x) = (Γ(a + b)/(Γ(a)Γ(b))) x^{a−1} (1 − x)^{b−1}, x ∈ (0, 1)
Mean: μ = a/(a + b)    Variance: μ(1 − μ)/(a + b + 1)    MGF: messy

Log-Normal LN(μ, σ²)
PDF: f(x) = (1/(xσ√(2π))) e^{−(log x − μ)²/(2σ²)}, x ∈ (0, ∞)
Mean: θ = e^{μ + σ²/2}    Variance: θ²(e^{σ²} − 1)    MGF: doesn't exist

Chi-Square χ²_n
PDF: f(x) = (1/(2^{n/2} Γ(n/2))) x^{n/2 − 1} e^{−x/2}, x ∈ (0, ∞)
Mean: n    Variance: 2n    MGF: (1 − 2t)^{−n/2}, for t < 1/2

Student-t t_n
PDF: f(x) = (Γ((n + 1)/2)/(√(nπ) Γ(n/2))) (1 + x²/n)^{−(n+1)/2}, x ∈ (−∞, ∞)
Mean: 0 if n > 1    Variance: n/(n − 2) if n > 2    MGF: doesn't exist