A Probability and Statistics Cheatsheet
[Figure: PMFs of the discrete distributions — Uniform(a, b); Binomial with n = 30, p = 0.6 and n = 25, p = 0.9; Geometric with p = 0.5 and p = 0.8; Poisson with λ = 4 and λ = 10. Each panel plots the PMF against x.]
¹ We use the notation γ(s, x) and Γ(x) to refer to the Gamma functions (see §22.1), and use B(x, y) and I_x to refer to the Beta functions (see §22.2).
1.2 Continuous Distributions
For each distribution the table gives: notation, CDF F_X(x), PDF f_X(x), mean E[X], variance V[X], and MGF M_X(s).

Uniform, Unif(a, b)
  F_X(x) = 0 for x < a; (x − a)/(b − a) for a < x < b; 1 for x > b
  f_X(x) = I(a < x < b)/(b − a)
  E[X] = (a + b)/2, V[X] = (b − a)²/12, M_X(s) = (e^{sb} − e^{sa})/(s(b − a))

Normal, N(µ, σ²)
  F_X(x) = Φ(x) = ∫_{−∞}^x φ(t) dt
  f_X(x) = φ(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²))
  E[X] = µ, V[X] = σ², M_X(s) = exp(µs + σ²s²/2)

Log-Normal, ln N(µ, σ²)
  F_X(x) = ½[1 + erf((ln x − µ)/√(2σ²))]
  f_X(x) = (1/(x√(2πσ²))) exp(−(ln x − µ)²/(2σ²))
  E[X] = e^{µ+σ²/2}, V[X] = (e^{σ²} − 1) e^{2µ+σ²}

Multivariate Normal, MVN(µ, Σ)
  f_X(x) = (2π)^{−k/2} |Σ|^{−1/2} exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ))
  E[X] = µ, V[X] = Σ, M_X(s) = exp(µᵀs + ½ sᵀΣs)

Student's t, Student(ν)
  F_X(x) = I_x(ν/2, ν/2)
  f_X(x) = (Γ((ν + 1)/2)/(√(νπ) Γ(ν/2))) (1 + x²/ν)^{−(ν+1)/2}
  E[X] = 0 (ν > 1), V[X] = ν/(ν − 2) (ν > 2)

Chi-square, χ²_k
  F_X(x) = γ(k/2, x/2)/Γ(k/2)
  f_X(x) = (1/(2^{k/2} Γ(k/2))) x^{k/2−1} e^{−x/2}
  E[X] = k, V[X] = 2k, M_X(s) = (1 − 2s)^{−k/2} (s < 1/2)

F, F(d₁, d₂)
  F_X(x) = I_{d₁x/(d₁x+d₂)}(d₁/2, d₂/2)
  f_X(x) = √((d₁x)^{d₁} d₂^{d₂} / (d₁x + d₂)^{d₁+d₂}) / (x B(d₁/2, d₂/2))
  E[X] = d₂/(d₂ − 2) (d₂ > 2), V[X] = 2d₂²(d₁ + d₂ − 2)/(d₁(d₂ − 2)²(d₂ − 4)) (d₂ > 4)

Exponential, Exp(β)
  F_X(x) = 1 − e^{−x/β}, f_X(x) = (1/β) e^{−x/β}
  E[X] = β, V[X] = β², M_X(s) = 1/(1 − βs) (s < 1/β)

Gamma, Gamma(α, β)
  F_X(x) = γ(α, x/β)/Γ(α)
  f_X(x) = (1/(Γ(α) β^α)) x^{α−1} e^{−x/β}
  E[X] = αβ, V[X] = αβ², M_X(s) = (1/(1 − βs))^α (s < 1/β)

Inverse Gamma, InvGamma(α, β)
  F_X(x) = Γ(α, β/x)/Γ(α)
  f_X(x) = (β^α/Γ(α)) x^{−α−1} e^{−β/x}
  E[X] = β/(α − 1) (α > 1), V[X] = β²/((α − 1)²(α − 2)) (α > 2)
  M_X(s) = (2(−βs)^{α/2}/Γ(α)) K_α(√(−4βs))

Dirichlet, Dir(α)
  f_X(x) = (Γ(Σ_{i=1}^k αᵢ)/∏_{i=1}^k Γ(αᵢ)) ∏_{i=1}^k xᵢ^{αᵢ−1}
  E[Xᵢ] = αᵢ/Σ_{i=1}^k αᵢ, V[Xᵢ] = E[Xᵢ](1 − E[Xᵢ])/(Σ_{i=1}^k αᵢ + 1)

Beta, Beta(α, β)
  F_X(x) = I_x(α, β)
  f_X(x) = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1}
  E[X] = α/(α + β), V[X] = αβ/((α + β)²(α + β + 1))
  M_X(s) = 1 + Σ_{k=1}^∞ (∏_{r=0}^{k−1} (α + r)/(α + β + r)) s^k/k!

Weibull, Weibull(λ, k)
  F_X(x) = 1 − e^{−(x/λ)^k}
  f_X(x) = (k/λ)(x/λ)^{k−1} e^{−(x/λ)^k}
  E[X] = λΓ(1 + 1/k), V[X] = λ²Γ(1 + 2/k) − µ², M_X(s) = Σ_{n=0}^∞ (sⁿλⁿ/n!) Γ(1 + n/k)

Pareto, Pareto(x_m, α)
  F_X(x) = 1 − (x_m/x)^α (x ≥ x_m)
  f_X(x) = α x_m^α / x^{α+1} (x ≥ x_m)
  E[X] = αx_m/(α − 1) (α > 1), V[X] = x_m² α/((α − 1)²(α − 2)) (α > 2)
  M_X(s) = α(−x_m s)^α Γ(−α, −x_m s) (s < 0)
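A quick way to sanity-check entries in the table is simulation. The sketch below (standard library only; the seed, sample size, and parameter values α = 3, β = 2 are arbitrary choices of ours) checks the Gamma mean αβ and variance αβ²:

```python
import random

rng = random.Random(0)  # fixed seed for reproducibility
N = 200_000

# Gamma(alpha, beta) in the table's (shape, scale) parameterisation:
# E[X] = alpha*beta, V[X] = alpha*beta^2
alpha, beta = 3.0, 2.0
xs = [rng.gammavariate(alpha, beta) for _ in range(N)]
mean = sum(xs) / N
var = sum((x - mean) ** 2 for x in xs) / N

print(round(mean, 2), round(var, 2))  # close to 6 and 12
```

Note that Python's `random.gammavariate(alpha, beta)` uses the same (shape, scale) convention as the table, so the tabulated moments apply directly.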
2 Probability Theory

Definitions
• Sample space Ω
• Probability space (Ω, A, P), where P satisfies the axioms:
  1. P[A] ≥ 0 for all A ∈ A
  2. P[Ω] = 1
  3. P[⊔_{i=1}^∞ Aᵢ] = Σ_{i=1}^∞ P[Aᵢ] for disjoint Aᵢ

Law of Total Probability

P[B] = Σ_{i=1}^n P[B | Aᵢ] P[Aᵢ]   where Ω = ⊔_{i=1}^n Aᵢ

3 Random Variables

Random Variable

X: Ω → R

Transformation (Z = φ(X), φ invertible):

f_Z(z) = f_X(φ⁻¹(z)) |d/dz φ⁻¹(z)| = f_X(x) |dx/dz| = f_X(x)/|J|

The Rule of the Lazy Statistician

E[Z] = ∫ φ(x) dF_X(x)

E[I_A(X)] = ∫ I_A(x) dF_X(x) = ∫_A dF_X(x) = P[X ∈ A]

Convolution
• Z := X + Y: f_Z(z) = ∫_{−∞}^∞ f_{X,Y}(x, z − x) dx; if X, Y ≥ 0: f_Z(z) = ∫_0^z f_{X,Y}(x, z − x) dx
• Z := |X − Y|: f_Z(z) = 2 ∫_0^∞ f_{X,Y}(x, z + x) dx
• Z := X/Y: f_Z(z) = ∫_{−∞}^∞ |x| f_{X,Y}(x, xz) dx; if X ⊥⊥ Y: f_Z(z) = ∫_{−∞}^∞ |x| f_X(x) f_Y(xz) dx

4 Expectation

Conditional expectation satisfies:
• E[X] = E[E[X | Y]]
• E[φ(X, Y) | X = x] = ∫_{−∞}^∞ φ(x, y) f_{Y|X}(y | x) dy
• E[φ(Y, Z) | X = x] = ∫∫ φ(y, z) f_{(Y,Z)|X}(y, z | x) dy dz
• E[Y + Z | X] = E[Y | X] + E[Z | X]
• E[φ(X) Y | X] = φ(X) E[Y | X]
• E[Y | X] = c ⟹ Cov[X, Y] = 0

5 Variance

Variance
• V[X] = σ²_X = E[(X − E[X])²] = E[X²] − E[X]²
• V[Σ_{i=1}^n Xᵢ] = Σ_{i=1}^n V[Xᵢ] + 2 Σ_{i≠j} Cov[Xᵢ, Xⱼ]
• V[Σ_{i=1}^n Xᵢ] = Σ_{i=1}^n V[Xᵢ] if the Xᵢ are independent (Xᵢ ⊥⊥ Xⱼ for i ≠ j)
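The tower property and the variance shortcut can be verified exactly with `fractions` on a small discrete example (the die-and-coins setup below is illustrative, not from the source):

```python
from fractions import Fraction
from math import comb

p6 = Fraction(1, 6)
faces = range(1, 7)

# Variance shortcut on a fair die Y: V[Y] = E[Y^2] - E[Y]^2
E_Y = sum(p6 * y for y in faces)              # 7/2
E_Y2 = sum(p6 * y * y for y in faces)         # 91/6
V_Y = E_Y2 - E_Y ** 2                         # 35/12

# Tower property: roll Y, then flip Y fair coins; X = number of heads.
# E[X | Y = y] = y/2, so E[E[X | Y]] = E[Y]/2 = 7/4.
E_X = sum(p6 * Fraction(comb(y, x), 2 ** y) * x
          for y in faces for x in range(y + 1))   # direct E[X] from the joint
E_tower = sum(p6 * Fraction(y, 2) for y in faces) # via the tower property

print(V_Y, E_X, E_tower)  # 35/12 7/4 7/4
```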
Bivariate Normal

f(x, y) = (1/(2πσ_x σ_y √(1 − ρ²))) exp(−z/(2(1 − ρ²)))

where z = ((x − µ_x)/σ_x)² + ((y − µ_y)/σ_y)² − 2ρ ((x − µ_x)/σ_x)((y − µ_y)/σ_y)

2. In probability: X_n →P X

   (∀ε > 0) lim_{n→∞} P[|X_n − X| > ε] = 0

3. Almost surely (strongly): X_n →as X

   P[lim_{n→∞} X_n = X] = P[{ω ∈ Ω : lim_{n→∞} X_n(ω) = X(ω)}] = 1

4. In quadratic mean (L²): X_n →qm X

   E[(X_n − X)²] → 0 as n → ∞
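The bivariate normal density above is easy to evaluate directly; as a sanity check, with ρ = 0 it must factor into the product of the two normal marginals. A minimal sketch (function names and test point are ours):

```python
import math

def bvn_pdf(x, y, mu_x=0.0, mu_y=0.0, sx=1.0, sy=1.0, rho=0.5):
    # bivariate normal density as given in the formula above
    zx = (x - mu_x) / sx
    zy = (y - mu_y) / sy
    z = zx ** 2 + zy ** 2 - 2 * rho * zx * zy
    norm = 2 * math.pi * sx * sy * math.sqrt(1 - rho ** 2)
    return math.exp(-z / (2 * (1 - rho ** 2))) / norm

def n_pdf(x, mu=0.0, s=1.0):
    # univariate normal density
    return math.exp(-((x - mu) / s) ** 2 / 2) / (s * math.sqrt(2 * math.pi))

# With rho = 0 the joint density factorises into the product of the marginals
val = bvn_pdf(0.3, -1.2, rho=0.0)
print(abs(val - n_pdf(0.3) * n_pdf(-1.2)) < 1e-12)  # True
```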
11 Statistical Inference
10.1 Law of Large Numbers (LLN) iid
Let X1 , · · · , Xn ∼ F if not otherwise noted.
Let {X1 , . . . , Xn } be a sequence of iid rv’s, E [X1 ] = µ, and V [X1 ] < ∞.
11.1 Point Estimation
Weak (WLLN)
P • Point estimator θbn of θ is a rv: θbn = g(X1 , . . . , Xn )
X̄n → µ as n → ∞ h i
• bias(θbn ) = E θbn − θ
Strong (SLLN) P
as • Consistency: θbn → θ
X̄n → µ as n → ∞
• Sampling distribution: F (θbn )
r h i
• Standard error: se(θn ) = V θbn
b
10.2 Central Limit Theorem (CLT)
h i h i
Let {X1 , . . . , Xn } be a sequence of iid rv’s, E [X1 ] = µ, and V [X1 ] = σ 2 . • Mean squared error: mse = E (θbn − θ)2 = bias(θbn )2 + V θbn
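The CLT can be watched in action by standardising sums of iid uniforms: about 95% of the standardised sums should land inside ±1.96. A stdlib sketch (seed, n, and replication count are arbitrary choices of ours):

```python
import math
import random

rng = random.Random(42)
n, M = 50, 20_000
mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and sd of Uniform(0, 1)

inside = 0
for _ in range(M):
    s = sum(rng.random() for _ in range(n))
    z = (s - n * mu) / (sigma * math.sqrt(n))  # standardised sum Z_n
    if abs(z) < 1.96:
        inside += 1

coverage = inside / M
print(round(coverage, 3))  # close to 0.95, as the CLT predicts
```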
se(τ̂) = √(∇̂φᵀ Ĵ_n ∇̂φ),  where Ĵ_n = J_n(θ̂) and ∇̂φ = ∇φ|_{θ=θ̂}

p-value      evidence
< 0.01       very strong evidence against H₀
0.01–0.05    strong evidence against H₀
0.05–0.1     weak evidence against H₀
> 0.1        little or no evidence against H₀

Wald Test
• Two-sided test
• Reject H₀ when |W| > z_{α/2}, where W = (θ̂ − θ₀)/ŝe
• P[|W| > z_{α/2}] → α
• p-value = P_{θ₀}[|W| > |w|] ≈ P[|Z| > |w|] = 2Φ(−|w|)
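The Wald test is a one-liner once ŝe is in hand. A stdlib sketch for a Bernoulli proportion with H₀: p = 0.5 (the data, 60 successes in 100 trials, is illustrative):

```python
import math

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, successes, p0 = 100, 60, 0.5
p_hat = successes / n
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
w = (p_hat - p0) / se_hat                    # Wald statistic
p_value = 2 * Phi(-abs(w))                   # p-value = 2 Phi(-|w|)

print(abs(w) > 1.96)  # True: reject H0 at alpha = 0.05; |w| ≈ 2.04, p ≈ 0.04
```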
Likelihood Ratio Test (LRT)

• T(X) = sup_{θ∈Θ} L_n(θ) / sup_{θ∈Θ₀} L_n(θ) = L_n(θ̂_n)/L_n(θ̂_{n,0})

12.4 Parametric Bootstrap

Sample from f(x; θ̂_n) instead of from F̂_n, where θ̂_n could be the mle or method-of-moments estimator.
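As a stdlib sketch of the parametric bootstrap (our illustrative model: Exp(β) with scale β, whose mle is the sample mean; seed, n, and B are arbitrary), the bootstrap standard error of the mean should come out close to the analytic β̂/√n:

```python
import math
import random

rng = random.Random(7)
n, B = 100, 2_000

# "observed" data: Exp(beta) with scale beta = 2; the mle of beta is the sample mean
data = [rng.expovariate(1 / 2.0) for _ in range(n)]
beta_hat = sum(data) / n

# parametric bootstrap: resample from f(x; beta_hat) rather than from the empirical cdf
boot_means = []
for _ in range(B):
    resample = [rng.expovariate(1 / beta_hat) for _ in range(n)]
    boot_means.append(sum(resample) / n)

m = sum(boot_means) / B
se_boot = math.sqrt(sum((b - m) ** 2 for b in boot_means) / (B - 1))

print(round(se_boot, 3))  # close to beta_hat / sqrt(n), the analytic standard error
```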