A Probability and Statistics Cheatsheet

1 Distribution Overview

1.1 Discrete Distributions

For each distribution we list the notation¹, F_X(x), f_X(x), E[X], V[X], and M_X(s).

Uniform, Unif{a, …, b}
  F_X(x): 0 for x < a; (⌊x⌋ − a + 1)/(b − a + 1) for a ≤ x ≤ b; 1 for x > b
  f_X(x): I(a ≤ x ≤ b)/(b − a + 1)
  E[X]: (a + b)/2    V[X]: ((b − a + 1)² − 1)/12
  M_X(s): (e^{as} − e^{(b+1)s}) / ((b − a + 1)(1 − e^s))

Bernoulli, Bern(p)
  F_X(x): (1 − p)^{1−x} for x ∈ {0, 1}
  f_X(x): p^x (1 − p)^{1−x}
  E[X]: p    V[X]: p(1 − p)
  M_X(s): 1 − p + pe^s

Binomial, Bin(n, p)
  F_X(x): I_{1−p}(n − x, x + 1)
  f_X(x): (n choose x) p^x (1 − p)^{n−x}
  E[X]: np    V[X]: np(1 − p)
  M_X(s): (1 − p + pe^s)^n

Multinomial, Mult(n, p)
  f_X(x): (n!/(x₁! ⋯ x_k!)) p₁^{x₁} ⋯ p_k^{x_k} with Σ_{i=1}^k x_i = n
  E[X_i]: np_i    V[X_i]: np_i(1 − p_i)
  M_X(s): (Σ_{i=1}^k p_i e^{s_i})^n

Hypergeometric, Hyp(N, m, n)
  F_X(x): ≈ Φ((x − np)/√(np(1 − p))) with p = m/N
  f_X(x): (m choose x)(N − m choose n − x)/(N choose n)
  E[X]: nm/N    V[X]: nm(N − n)(N − m)/(N²(N − 1))
  M_X(s): N/A

Negative Binomial, NBin(r, p)
  F_X(x): I_p(r, x + 1)
  f_X(x): (x + r − 1 choose r − 1) p^r (1 − p)^x
  E[X]: r(1 − p)/p    V[X]: r(1 − p)/p²
  M_X(s): (p/(1 − (1 − p)e^s))^r

Geometric, Geo(p)
  F_X(x): 1 − (1 − p)^x, x ∈ ℕ⁺
  f_X(x): p(1 − p)^{x−1}, x ∈ ℕ⁺
  E[X]: 1/p    V[X]: (1 − p)/p²
  M_X(s): pe^s/(1 − (1 − p)e^s)

Poisson, Po(λ)
  F_X(x): e^{−λ} Σ_{i=0}^x λ^i/i!
  f_X(x): λ^x e^{−λ}/x!
  E[X]: λ    V[X]: λ
  M_X(s): e^{λ(e^s − 1)}

[Figure: PMFs of the discrete uniform, binomial (n = 40, p = 0.3; n = 30, p = 0.6; n = 25, p = 0.9), geometric (p = 0.2, 0.5, 0.8), and Poisson (λ = 1, 4, 10) distributions.]
¹ We use the notation γ(s, x) and Γ(x) to refer to the Gamma functions (see §22.1), and use B(x, y) and I_x to refer to the Beta functions (see §22.2).
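A minimal sketch cross-checking a few rows of the discrete table against scipy.stats (assuming scipy is available; the parameter values n = 10, p = 0.3, λ = 4 are arbitrary choices for illustration):

```python
import math
from scipy import stats

n, p, lam = 10, 0.3, 4.0

b = stats.binom(n, p)
print(b.mean(), n * p)                 # E[X] = np
print(b.var(), n * p * (1 - p))        # V[X] = np(1 − p)

g = stats.geom(p)                      # support {1, 2, ...}, f(x) = p(1 − p)^(x−1)
print(g.mean(), 1 / p)                 # E[X] = 1/p
print(g.var(), (1 - p) / p**2)         # V[X] = (1 − p)/p²

po = stats.poisson(lam)
print(po.pmf(2), lam**2 * math.exp(-lam) / 2)  # f(x) = λ^x e^{−λ}/x!
```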
1.2 Continuous Distributions

Uniform, Unif(a, b)
  F_X(x): 0 for x < a; (x − a)/(b − a) for a < x < b; 1 for x > b
  f_X(x): I(a < x < b)/(b − a)
  E[X]: (a + b)/2    V[X]: (b − a)²/12
  M_X(s): (e^{sb} − e^{sa})/(s(b − a))

Normal, N(µ, σ²)
  F_X(x): Φ(x) = ∫_{−∞}^x φ(t) dt
  f_X(x): φ(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²))
  E[X]: µ    V[X]: σ²
  M_X(s): exp(µs + σ²s²/2)

Log-Normal, ln N(µ, σ²)
  F_X(x): 1/2 + (1/2) erf((ln x − µ)/(√2 σ))
  f_X(x): (1/(x√(2πσ²))) exp(−(ln x − µ)²/(2σ²))
  E[X]: e^{µ + σ²/2}    V[X]: (e^{σ²} − 1) e^{2µ + σ²}

Multivariate Normal, MVN(µ, Σ)
  f_X(x): (2π)^{−k/2} |Σ|^{−1/2} exp(−(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ))
  E[X]: µ    V[X]: Σ
  M_X(s): exp(µᵀs + (1/2) sᵀΣs)

Student's t, Student(ν)
  F_X(x): I_x(ν/2, ν/2)
  f_X(x): (Γ((ν + 1)/2)/(√(νπ) Γ(ν/2))) (1 + x²/ν)^{−(ν+1)/2}
  E[X]: 0 (ν > 1)    V[X]: ν/(ν − 2) (ν > 2)

Chi-square, χ²_k
  F_X(x): γ(k/2, x/2)/Γ(k/2)
  f_X(x): (1/(2^{k/2} Γ(k/2))) x^{k/2 − 1} e^{−x/2}
  E[X]: k    V[X]: 2k
  M_X(s): (1 − 2s)^{−k/2}, s < 1/2

F, F(d₁, d₂)
  F_X(x): I_{d₁x/(d₁x + d₂)}(d₁/2, d₂/2)
  f_X(x): √((d₁x)^{d₁} d₂^{d₂}/(d₁x + d₂)^{d₁+d₂}) / (x B(d₁/2, d₂/2))
  E[X]: d₂/(d₂ − 2) (d₂ > 2)    V[X]: 2d₂²(d₁ + d₂ − 2)/(d₁(d₂ − 2)²(d₂ − 4)) (d₂ > 4)

Exponential, Exp(β)
  F_X(x): 1 − e^{−x/β}
  f_X(x): (1/β) e^{−x/β}
  E[X]: β    V[X]: β²
  M_X(s): 1/(1 − βs) (s < 1/β)

Gamma, Gamma(α, β)
  F_X(x): γ(α, x/β)/Γ(α)
  f_X(x): (1/(Γ(α) β^α)) x^{α−1} e^{−x/β}
  E[X]: αβ    V[X]: αβ²
  M_X(s): (1/(1 − βs))^α (s < 1/β)

Inverse Gamma, InvGamma(α, β)
  F_X(x): Γ(α, β/x)/Γ(α)
  f_X(x): (β^α/Γ(α)) x^{−α−1} e^{−β/x}
  E[X]: β/(α − 1) (α > 1)    V[X]: β²/((α − 1)²(α − 2)) (α > 2)
  M_X(s): (2(−βs)^{α/2}/Γ(α)) K_α(√(−4βs))

Dirichlet, Dir(α)
  f_X(x): (Γ(Σ_{i=1}^k α_i)/Π_{i=1}^k Γ(α_i)) Π_{i=1}^k x_i^{α_i − 1}
  E[X_i]: α_i/Σ_{i=1}^k α_i    V[X_i]: E[X_i](1 − E[X_i])/(Σ_{i=1}^k α_i + 1)

Beta, Beta(α, β)
  F_X(x): I_x(α, β)
  f_X(x): (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1}
  E[X]: α/(α + β)    V[X]: αβ/((α + β)²(α + β + 1))
  M_X(s): 1 + Σ_{k=1}^∞ (Π_{r=0}^{k−1} (α + r)/(α + β + r)) s^k/k!

Weibull, Weibull(λ, k)
  F_X(x): 1 − e^{−(x/λ)^k}
  f_X(x): (k/λ)(x/λ)^{k−1} e^{−(x/λ)^k}
  E[X]: λΓ(1 + 1/k)    V[X]: λ²Γ(1 + 2/k) − µ² with µ = λΓ(1 + 1/k)
  M_X(s): Σ_{n=0}^∞ (s^n λ^n/n!) Γ(1 + n/k), k ≥ 1

Pareto, Pareto(x_m, α)
  F_X(x): 1 − (x_m/x)^α, x ≥ x_m
  f_X(x): α x_m^α/x^{α+1}, x ≥ x_m
  E[X]: αx_m/(α − 1) (α > 1)    V[X]: x_m² α/((α − 1)²(α − 2)) (α > 2)
  M_X(s): α(−x_m s)^α Γ(−α, −x_m s) (s < 0)
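A minimal sketch checking the Gamma row's moments and MGF numerically (scipy assumed; α = 3, β = 2 are arbitrary, and the scale parameterization matches the table's):

```python
import numpy as np
from scipy import stats

alpha, beta = 3.0, 2.0
g = stats.gamma(a=alpha, scale=beta)   # Gamma(α, β) as parameterized above

print(g.mean(), alpha * beta)          # E[X] = αβ
print(g.var(), alpha * beta**2)        # V[X] = αβ²

# MGF check at s = 0.1 < 1/β: E[e^{sX}] should match (1/(1 − βs))^α
s = 0.1
x = g.rvs(size=1_000_000, random_state=0)
print(np.exp(s * x).mean(), (1 / (1 - beta * s))**alpha)
```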
2 Probability Theory

Definitions
• Sample space Ω
• Outcome (point or element) ω ∈ Ω
• Event A ⊆ Ω
• σ-algebra A:
  1. ∅ ∈ A
  2. A₁, A₂, … ∈ A ⟹ ⋃_{i=1}^∞ A_i ∈ A
  3. A ∈ A ⟹ ¬A ∈ A
• Probability distribution P:
  1. P[A] ≥ 0 for every A
  2. P[Ω] = 1
  3. P[⨆_{i=1}^∞ A_i] = Σ_{i=1}^∞ P[A_i] for disjoint A_i
• Probability space (Ω, A, P)

Properties
• P[∅] = 0
• B = Ω ∩ B = (A ∪ ¬A) ∩ B = (A ∩ B) ∪ (¬A ∩ B)
• P[¬A] = 1 − P[A]
• P[B] = P[A ∩ B] + P[¬A ∩ B]
• P[Ω] = 1 and P[∅] = 0
• ¬(⋃_n A_n) = ⋂_n ¬A_n and ¬(⋂_n A_n) = ⋃_n ¬A_n (De Morgan)
• P[⋃_n A_n] = 1 − P[⋂_n ¬A_n]
• P[A ∪ B] = P[A] + P[B] − P[A ∩ B] ⟹ P[A ∪ B] ≤ P[A] + P[B]
• P[A ∪ B] = P[A ∩ ¬B] + P[¬A ∩ B] + P[A ∩ B]
• P[A ∩ ¬B] = P[A] − P[A ∩ B]

Continuity of Probabilities
• A₁ ⊂ A₂ ⊂ … ⟹ lim_{n→∞} P[A_n] = P[A] where A = ⋃_{i=1}^∞ A_i
• A₁ ⊃ A₂ ⊃ … ⟹ lim_{n→∞} P[A_n] = P[A] where A = ⋂_{i=1}^∞ A_i

Independence
A ⊥⊥ B ⟺ P[A ∩ B] = P[A] P[B]

Conditional Probability
P[A | B] = P[A ∩ B]/P[B] if P[B] > 0

Law of Total Probability
P[B] = Σ_{i=1}^n P[B | A_i] P[A_i] where Ω = ⨆_{i=1}^n A_i

Bayes' Theorem
P[A_i | B] = P[B | A_i] P[A_i] / Σ_{j=1}^n P[B | A_j] P[A_j] where Ω = ⨆_{i=1}^n A_i

Inclusion-Exclusion Principle
|⋃_{i=1}^n A_i| = Σ_{r=1}^n (−1)^{r−1} Σ_{1 ≤ i₁ < ⋯ < i_r ≤ n} |⋂_{j=1}^r A_{i_j}|

3 Random Variables

Random Variable
X : Ω → ℝ

Probability Mass Function (PMF)
f_X(x) = P[X = x] = P[{ω ∈ Ω : X(ω) = x}]

Probability Density Function (PDF)
P[a ≤ X ≤ b] = ∫_a^b f(x) dx

Cumulative Distribution Function (CDF)
F_X : ℝ → [0, 1], F_X(x) = P[X ≤ x]
1. Nondecreasing: x₁ < x₂ ⟹ F(x₁) ≤ F(x₂)
2. Normalized: lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1
3. Right-continuous: lim_{y↓x} F(y) = F(x)

P[a ≤ Y ≤ b | X = x] = ∫_a^b f_{Y|X}(y | x) dy, a ≤ b
f_{Y|X}(y | x) = f(x, y)/f_X(x)

Independence
1. P[X ≤ x, Y ≤ y] = P[X ≤ x] P[Y ≤ y]
2. f_{X,Y}(x, y) = f_X(x) f_Y(y)
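A minimal numeric sketch of the law of total probability and Bayes' theorem from §2 (the partition of three events and all probabilities below are made-up values for illustration):

```python
priors = [0.5, 0.3, 0.2]          # P[A_i], sums to 1 (Ω = ⨆ A_i)
likelihoods = [0.9, 0.5, 0.1]     # P[B | A_i]

# Law of total probability: P[B] = Σ P[B | A_i] P[A_i]
p_b = sum(l * p for l, p in zip(likelihoods, priors))

# Bayes' theorem: P[A_i | B] = P[B | A_i] P[A_i] / P[B]
posteriors = [l * p / p_b for l, p in zip(likelihoods, priors)]

print(p_b)                        # 0.62
print(posteriors, sum(posteriors))  # posteriors sum to 1
```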
Z
3.1 Transformations

Transformation function
Z = ϕ(X)

Discrete
f_Z(z) = P[ϕ(X) = z] = P[{x : ϕ(x) = z}] = P[X ∈ ϕ⁻¹(z)] = Σ_{x ∈ ϕ⁻¹(z)} f(x)

Continuous
F_Z(z) = P[ϕ(X) ≤ z] = ∫_{A_z} f(x) dx with A_z = {x : ϕ(x) ≤ z}

Special case if ϕ strictly monotone
f_Z(z) = f_X(ϕ⁻¹(z)) |d/dz ϕ⁻¹(z)| = f_X(x) |dx/dz| = f_X(x)/|J|

The Rule of the Lazy Statistician
E[Z] = ∫ ϕ(x) dF_X(x)
E[I_A(X)] = ∫ I_A(x) dF_X(x) = ∫_A dF_X(x) = P[X ∈ A]

Convolution
• Z := X + Y: f_Z(z) = ∫_{−∞}^∞ f_{X,Y}(x, z − x) dx = ∫_0^z f_{X,Y}(x, z − x) dx if X, Y ≥ 0
• Z := |X − Y|: f_Z(z) = 2 ∫_0^∞ f_{X,Y}(x, z + x) dx
• Z := X/Y: f_Z(z) = ∫_{−∞}^∞ |x| f_{X,Y}(x, xz) dx = ∫_{−∞}^∞ |x| f_X(x) f_Y(xz) dx if X ⊥⊥ Y

4 Expectation

Expectation
• E[X] = µ_X = ∫ x dF_X(x) = Σ_x x f_X(x) if X is discrete, ∫ x f_X(x) dx if X is continuous
• P[X = c] = 1 ⟹ E[X] = c
• E[cX] = c E[X]
• E[X + Y] = E[X] + E[Y]
• E[XY] = ∫∫ xy f_{X,Y}(x, y) dF_X(x) dF_Y(y)
• E[ϕ(X)] ≠ ϕ(E[X]) in general (cf. Jensen's inequality)
• P[X ≥ Y] = 1 ⟹ E[X] ≥ E[Y], and P[X = Y] = 1 ⟹ E[X] = E[Y]
• E[X] = Σ_{x=1}^∞ P[X ≥ x] for ℕ-valued X

Sample mean
X̄_n = (1/n) Σ_{i=1}^n X_i

Conditional Expectation
• E[Y | X = x] = ∫ y f(y | x) dy
• E[X] = E[E[X | Y]]
• E[ϕ(X, Y) | X = x] = ∫_{−∞}^∞ ϕ(x, y) f_{Y|X}(y | x) dy
• E[ϕ(Y, Z) | X = x] = ∫∫ ϕ(y, z) f_{(Y,Z)|X}(y, z | x) dy dz
• E[Y + Z | X] = E[Y | X] + E[Z | X]
• E[ϕ(X)Y | X] = ϕ(X) E[Y | X]
• E[Y | X] = c ⟹ Cov[X, Y] = 0

5 Variance

Variance
• V[X] = σ²_X = E[(X − E[X])²] = E[X²] − E[X]²
• V[Σ_{i=1}^n X_i] = Σ_{i=1}^n V[X_i] + 2 Σ_{i≠j} Cov[X_i, X_j]
• V[Σ_{i=1}^n X_i] = Σ_{i=1}^n V[X_i] if the X_i are independent

Standard deviation
sd[X] = √V[X] = σ_X

Covariance
• Cov[X, Y] = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X] E[Y]
• Cov[X, a] = 0
• Cov[X, X] = V[X]
• Cov[X, Y] = Cov[Y, X]
• Cov[aX, bY] = ab Cov[X, Y]
• Cov[X + a, Y + b] = Cov[X, Y]
• Cov[Σ_{i=1}^n X_i, Σ_{j=1}^m Y_j] = Σ_{i=1}^n Σ_{j=1}^m Cov[X_i, Y_j]

Correlation
ρ[X, Y] = Cov[X, Y]/√(V[X] V[Y])

Independence
X ⊥⊥ Y ⟹ ρ[X, Y] = 0 ⟺ Cov[X, Y] = 0 ⟺ E[XY] = E[X] E[Y]

Sample variance
S² = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄_n)²

Conditional Variance
• V[Y | X] = E[(Y − E[Y | X])² | X] = E[Y² | X] − E[Y | X]²
• V[Y] = E[V[Y | X]] + V[E[Y | X]]
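A minimal sketch checking the conditional-variance decomposition V[Y] = E[V[Y | X]] + V[E[Y | X]] by simulation, with the arbitrary choices X ~ Exp(1) and Y | X = x ~ N(x, 1), so that V[Y] = 1 + V[X] = 2:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(1.0, size=1_000_000)
y = rng.normal(loc=x, scale=1.0)

# E[V[Y|X]] = 1 (the conditional variance is constant)
# V[E[Y|X]] = V[X] = 1
print(y.var())   # ≈ 2
```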
 
6 Inequalities

Cauchy-Schwarz
E[XY]² ≤ E[X²] E[Y²]

Markov (ϕ ≥ 0, t > 0)
P[ϕ(X) ≥ t] ≤ E[ϕ(X)]/t

Chebyshev
P[|X − E[X]| ≥ t] ≤ V[X]/t²

Chernoff
P[X ≥ (1 + δ)µ] ≤ (e^δ/(1 + δ)^{1+δ})^µ, δ > −1

Jensen
E[ϕ(X)] ≥ ϕ(E[X]) for ϕ convex

7 Distribution Relationships

Binomial
• X_i ∼ Bern(p) ⟹ Σ_{i=1}^n X_i ∼ Bin(n, p)
• X ∼ Bin(n, p), Y ∼ Bin(m, p), X ⊥⊥ Y ⟹ X + Y ∼ Bin(n + m, p)
• lim_{n→∞} Bin(n, p) = Po(np) (n large, p small)
• lim_{n→∞} Bin(n, p) = N(np, np(1 − p)) (n large, p far from 0 and 1)

Negative Binomial
• X ∼ NBin(1, p) = Geo(p)
• X ∼ NBin(r, p) = Σ_{i=1}^r Geo(p)
• X_i ∼ NBin(r_i, p) ⟹ Σ X_i ∼ NBin(Σ r_i, p)
• X ∼ NBin(r, p), Y ∼ Bin(s + r, p) ⟹ P[X ≤ s] = P[Y ≥ r]

Poisson
• X_i ∼ Po(λ_i) ∧ X_i ⊥⊥ X_j ⟹ Σ_{i=1}^n X_i ∼ Po(Σ_{i=1}^n λ_i)
• X_i ∼ Po(λ_i) ∧ X_i ⊥⊥ X_j ⟹ X_i | Σ_{j=1}^n X_j ∼ Bin(Σ_{j=1}^n X_j, λ_i/Σ_{j=1}^n λ_j)

Exponential (see the sketch after this list)
• X_i ∼ Exp(β) ∧ X_i ⊥⊥ X_j ⟹ Σ_{i=1}^n X_i ∼ Gamma(n, β)
• Memoryless property: P[X > x + y | X > y] = P[X > x]

Normal
• X ∼ N(µ, σ²) ⟹ (X − µ)/σ ∼ N(0, 1)
• X ∼ N(µ, σ²) ∧ Z = aX + b ⟹ Z ∼ N(aµ + b, a²σ²)
• X ∼ N(µ₁, σ₁²) ∧ Y ∼ N(µ₂, σ₂²), X ⊥⊥ Y ⟹ X + Y ∼ N(µ₁ + µ₂, σ₁² + σ₂²)
• X_i ∼ N(µ_i, σ_i²) independent ⟹ Σ_i X_i ∼ N(Σ_i µ_i, Σ_i σ_i²)
• P[a < X ≤ b] = Φ((b − µ)/σ) − Φ((a − µ)/σ)
• Φ(−x) = 1 − Φ(x), φ′(x) = −xφ(x), φ″(x) = (x² − 1)φ(x)
• Upper quantile of N(0, 1): z_α = Φ⁻¹(1 − α)

Gamma
• X ∼ Gamma(α, β) ⟺ X/β ∼ Gamma(α, 1)
• Gamma(α, β) ∼ Σ_{i=1}^α Exp(β) (α an integer)
• X_i ∼ Gamma(α_i, β) ∧ X_i ⊥⊥ X_j ⟹ Σ_i X_i ∼ Gamma(Σ_i α_i, β)
• Γ(α)/λ^α = ∫_0^∞ x^{α−1} e^{−λx} dx

Beta
• (1/B(α, β)) x^{α−1} (1 − x)^{β−1} = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1}
• E[X^k] = B(α + k, β)/B(α, β) = ((α + k − 1)/(α + β + k − 1)) E[X^{k−1}]
• Beta(1, 1) ∼ Unif(0, 1)
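A minimal sketch of the exponential relationship above: the sum of n iid Exp(β) variables is Gamma(n, β). We compare simulated sums against the Gamma cdf with a Kolmogorov-Smirnov test (scipy assumed; n = 5 and β = 2 are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, beta = 5, 2.0
sums = rng.exponential(scale=beta, size=(100_000, n)).sum(axis=1)

# The KS statistic should be tiny, and the p-value typically not small,
# since the Gamma(n, β) fit is exact here.
print(stats.kstest(sums, stats.gamma(a=n, scale=beta).cdf))
```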
8 Probability and Moment Generating Functions

• G_X(t) = E[t^X], |t| < 1
• M_X(t) = G_X(e^t) = E[e^{Xt}] = E[Σ_{i=0}^∞ (Xt)^i/i!] = Σ_{i=0}^∞ (E[X^i]/i!) · t^i
• P[X = 0] = G_X(0)
• P[X = 1] = G′_X(0)
• P[X = i] = G_X^{(i)}(0)/i!
• E[X] = G′_X(1⁻)
• E[X^k] = M_X^{(k)}(0)
• E[X!/(X − k)!] = G_X^{(k)}(1⁻)
• V[X] = G″_X(1⁻) + G′_X(1⁻) − (G′_X(1⁻))²
• G_X(t) = G_Y(t) ⟹ X =ᵈ Y

9 Multivariate Distributions

9.1 Standard Bivariate Normal
Let X, Z ∼ N(0, 1) with X ⊥⊥ Z and Y = ρX + √(1 − ρ²) Z.

Joint density
f(x, y) = (1/(2π√(1 − ρ²))) exp(−(x² + y² − 2ρxy)/(2(1 − ρ²)))

Conditionals
(Y | X = x) ∼ N(ρx, 1 − ρ²) and (X | Y = y) ∼ N(ρy, 1 − ρ²)

Independence
X ⊥⊥ Y ⟺ ρ = 0

9.2 Bivariate Normal
Let X ∼ N(µ_x, σ_x²) and Y ∼ N(µ_y, σ_y²).

f(x, y) = (1/(2πσ_x σ_y √(1 − ρ²))) exp(−z/(2(1 − ρ²)))
z = ((x − µ_x)/σ_x)² + ((y − µ_y)/σ_y)² − 2ρ((x − µ_x)/σ_x)((y − µ_y)/σ_y)

Conditional mean and variance
E[X | Y] = E[X] + ρ (σ_X/σ_Y)(Y − E[Y])
V[X | Y] = σ_X²(1 − ρ²)

9.3 Multivariate Normal
Covariance matrix Σ (precision matrix Σ⁻¹):

Σ = ⎡ V[X₁]         ⋯  Cov[X₁, X_k] ⎤
    ⎢   ⋮           ⋱       ⋮       ⎥
    ⎣ Cov[X_k, X₁]  ⋯  V[X_k]       ⎦

If X ∼ N(µ, Σ),

f_X(x) = (2π)^{−k/2} |Σ|^{−1/2} exp(−(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ))

Properties
• Z ∼ N(0, I) ∧ X = µ + Σ^{1/2} Z ⟹ X ∼ N(µ, Σ)
• X ∼ N(µ, Σ) ⟹ Σ^{−1/2}(X − µ) ∼ N(0, I)
• X ∼ N(µ, Σ) ⟹ AX ∼ N(Aµ, AΣAᵀ)
• X ∼ N(µ, Σ) ∧ a a vector of length k ⟹ aᵀX ∼ N(aᵀµ, aᵀΣa)
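A minimal sketch of the MVN properties above: X = µ + Σ^{1/2} Z with Z ∼ N(0, I) is N(µ, Σ), using a Cholesky factor as the square root (µ and Σ below are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

L = np.linalg.cholesky(Sigma)          # L @ L.T == Sigma
Z = rng.standard_normal(size=(1_000_000, 2))
X = mu + Z @ L.T                       # rows are draws from N(µ, Σ)

print(X.mean(axis=0))                  # ≈ µ
print(np.cov(X, rowvar=False))         # ≈ Σ

# Affine property: AX ~ N(Aµ, AΣAᵀ)
A = np.array([[1.0, 1.0]])
print((X @ A.T).var())                 # ≈ AΣAᵀ = 2 + 0.6 + 0.6 + 1 = 4.2
```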
10 Convergence
Let {X₁, X₂, …} be a sequence of rv's and let X be another rv. Let F_n denote the cdf of X_n and let F denote the cdf of X.

Types of Convergence
1. In distribution (weakly, in law): X_n →D X
   lim_{n→∞} F_n(t) = F(t) at all t where F is continuous
2. In probability: X_n →P X
   (∀ε > 0) lim_{n→∞} P[|X_n − X| > ε] = 0
3. Almost surely (strongly): X_n →as X
   P[lim_{n→∞} X_n = X] = P[{ω ∈ Ω : lim_{n→∞} X_n(ω) = X(ω)}] = 1
4. In quadratic mean (L²): X_n →qm X
   lim_{n→∞} E[(X_n − X)²] = 0

Relationships
• X_n →qm X ⟹ X_n →P X ⟹ X_n →D X
• X_n →as X ⟹ X_n →P X
• X_n →D X ∧ (∃c ∈ ℝ) P[X = c] = 1 ⟹ X_n →P X
• X_n →P X ∧ Y_n →P Y ⟹ X_n + Y_n →P X + Y
• X_n →qm X ∧ Y_n →qm Y ⟹ X_n + Y_n →qm X + Y
• X_n →P X ∧ Y_n →P Y ⟹ X_n Y_n →P XY
• X_n →P X ⟹ ϕ(X_n) →P ϕ(X)
• X_n →D X ⟹ ϕ(X_n) →D ϕ(X)
• X_n →qm b ⟺ lim_{n→∞} E[X_n] = b ∧ lim_{n→∞} V[X_n] = 0
• X₁, …, X_n iid ∧ E[X] = µ ∧ V[X] < ∞ ⟺ X̄_n →qm µ

Slutzky's Theorem
• X_n →D X and Y_n →P c ⟹ X_n + Y_n →D X + c
• X_n →D X and Y_n →P c ⟹ X_n Y_n →D cX
• In general: X_n →D X and Y_n →D Y does not imply X_n + Y_n →D X + Y

10.1 Law of Large Numbers (LLN)
Let {X₁, …, X_n} be a sequence of iid rv's with E[X₁] = µ and V[X₁] < ∞.

Weak (WLLN): X̄_n →P µ as n → ∞
Strong (SLLN): X̄_n →as µ as n → ∞

10.2 Central Limit Theorem (CLT)
Let {X₁, …, X_n} be a sequence of iid rv's with E[X₁] = µ and V[X₁] = σ².

Z_n := (X̄_n − µ)/√(V[X̄_n]) = √n (X̄_n − µ)/σ →D Z where Z ∼ N(0, 1)
lim_{n→∞} P[Z_n ≤ z] = Φ(z), z ∈ ℝ

CLT Notations
Z_n ≈ N(0, 1)
X̄_n ≈ N(µ, σ²/n)
X̄_n − µ ≈ N(0, σ²/n)
√n (X̄_n − µ) ≈ N(0, σ²)
√n (X̄_n − µ)/σ ≈ N(0, 1)

Continuity Correction
P[X̄_n ≤ x] ≈ Φ((x + 1/2 − µ)/(σ/√n))
P[X̄_n ≥ x] ≈ 1 − Φ((x − 1/2 − µ)/(σ/√n))

Delta Method
Y_n ≈ N(µ, σ²/n) ⟹ ϕ(Y_n) ≈ N(ϕ(µ), (ϕ′(µ))² σ²/n)

11 Statistical Inference
Let X₁, …, X_n iid ∼ F if not otherwise noted.

11.1 Point Estimation
• Point estimator θ̂_n of θ is a rv: θ̂_n = g(X₁, …, X_n)
• bias(θ̂_n) = E[θ̂_n] − θ
• Consistency: θ̂_n →P θ
• Sampling distribution: F(θ̂_n)
• Standard error: se(θ̂_n) = √V[θ̂_n]
• Mean squared error: mse = E[(θ̂_n − θ)²] = bias(θ̂_n)² + V[θ̂_n]
• lim_{n→∞} bias(θ̂_n) = 0 ∧ lim_{n→∞} se(θ̂_n) = 0 ⟹ θ̂_n is consistent
• Asymptotic normality: (θ̂_n − θ)/se →D N(0, 1)
• Slutzky's theorem often lets us replace se(θ̂_n) by some (weakly) consistent estimator σ̂_n.
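A minimal sketch of the asymptotic-normality bullet above: for iid Exp(1) data (an arbitrary choice), θ̂ = X̄_n estimates θ = 1 and (θ̂ − θ)/ŝe with ŝe = S/√n should be approximately N(0, 1):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x = rng.exponential(1.0, size=(50_000, n))      # 50,000 replications
theta_hat = x.mean(axis=1)
se_hat = x.std(axis=1, ddof=1) / np.sqrt(n)
z = (theta_hat - 1.0) / se_hat

print(z.mean(), z.std())                        # ≈ 0, ≈ 1
print((z <= 1.0).mean(), stats.norm.cdf(1.0))   # both ≈ Φ(1) ≈ 0.841
```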
11.2 Normal-based Confidence Interval
Suppose θ̂_n ≈ N(θ, ŝe²). Let z_{α/2} = Φ⁻¹(1 − α/2), i.e., P[Z > z_{α/2}] = α/2 and P[−z_{α/2} < Z < z_{α/2}] = 1 − α where Z ∼ N(0, 1). Then

C_n = θ̂_n ± z_{α/2} ŝe

11.3 Empirical Distribution Function
Empirical Distribution Function (ECDF)

F̂_n(x) = (1/n) Σ_{i=1}^n I(X_i ≤ x), where I(X_i ≤ x) = 1 if X_i ≤ x and 0 if X_i > x

Properties (for any fixed x)
• E[F̂_n(x)] = F(x)
• V[F̂_n(x)] = F(x)(1 − F(x))/n
• mse = F(x)(1 − F(x))/n → 0
• F̂_n(x) →P F(x)

Dvoretzky-Kiefer-Wolfowitz (DKW) Inequality (X₁, …, X_n ∼ F)
P[sup_x |F(x) − F̂_n(x)| > ε] ≤ 2e^{−2nε²}

Nonparametric 1 − α confidence band for F
L(x) = max{F̂_n(x) − ε_n, 0}
U(x) = min{F̂_n(x) + ε_n, 1}
ε_n = √((1/(2n)) log(2/α))
P[L(x) ≤ F(x) ≤ U(x) ∀x] ≥ 1 − α

11.4 Statistical Functionals
• Statistical functional: T(F)
• Plug-in estimator of θ = T(F): θ̂_n = T(F̂_n)
• Linear functional: T(F) = ∫ ϕ(x) dF_X(x)
• Plug-in estimator for linear functional: T(F̂_n) = ∫ ϕ(x) dF̂_n(x) = (1/n) Σ_{i=1}^n ϕ(X_i)
• Often: T(F̂_n) ≈ N(T(F), ŝe²) ⟹ T(F̂_n) ± z_{α/2} ŝe
• pth quantile: F⁻¹(p) = inf{x : F(x) ≥ p}
• µ̂ = X̄_n
• σ̂² = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄_n)²
• κ̂ = ((1/n) Σ_{i=1}^n (X_i − µ̂)³)/σ̂³
• ρ̂ = Σ_{i=1}^n (X_i − X̄_n)(Y_i − Ȳ_n) / (√(Σ_{i=1}^n (X_i − X̄_n)²) √(Σ_{i=1}^n (Y_i − Ȳ_n)²))

12 Parametric Inference
Let F = {f(x; θ) : θ ∈ Θ} be a parametric model with parameter space Θ ⊂ ℝ^k and parameter θ = (θ₁, …, θ_k).

12.1 Method of Moments
jth moment
α_j(θ) = E[X^j] = ∫ x^j dF_X(x)

jth sample moment
α̂_j = (1/n) Σ_{i=1}^n X_i^j

Method of Moments Estimator (MoM): θ̂_n solves
α₁(θ̂_n) = α̂₁
α₂(θ̂_n) = α̂₂
⋮
α_k(θ̂_n) = α̂_k
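A minimal sketch of the method of moments for Gamma(α, β), matching α₁(θ) = αβ and α₂(θ) = αβ² + α²β² to the first two sample moments (the true values α = 3, β = 2 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_true, beta_true = 3.0, 2.0
x = rng.gamma(shape=alpha_true, scale=beta_true, size=100_000)

a1 = x.mean()                  # first sample moment
a2 = (x**2).mean()             # second sample moment

beta_hat = (a2 - a1**2) / a1   # since α₂ − α₁² = V[X] = αβ² and α₁ = αβ
alpha_hat = a1 / beta_hat
print(alpha_hat, beta_hat)     # ≈ (3, 2)
```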
with J_n(θ) = I_n⁻¹. Further, if θ̂_j is the jth component of θ̂, then

(θ̂_j − θ_j)/ŝe_j →D N(0, 1)

where ŝe_j² = J_n(j, j) and Cov[θ̂_j, θ̂_k] = J_n(j, k).

12.3.1 Multiparameter Delta Method
Let τ = ϕ(θ₁, …, θ_k) be a function and let the gradient of ϕ be

∇ϕ = (∂ϕ/∂θ₁, …, ∂ϕ/∂θ_k)ᵀ

Suppose ∇ϕ|_{θ=θ̂} ≠ 0 and τ̂ = ϕ(θ̂). Then

(τ̂ − τ)/ŝe(τ̂) →D N(0, 1)

where ŝe(τ̂) = √((∇̂ϕ)ᵀ Ĵ_n (∇̂ϕ)), Ĵ_n = J_n(θ̂), and ∇̂ϕ = ∇ϕ|_{θ=θ̂}.

12.4 Parametric Bootstrap
Sample from f(x; θ̂_n) instead of from F̂_n, where θ̂_n could be the mle or method of moments estimator.

13 Hypothesis Testing
H₀ : θ ∈ Θ₀ versus H₁ : θ ∈ Θ₁

Definitions
• Null hypothesis H₀
• Alternative hypothesis H₁
• Simple hypothesis: θ = θ₀
• Composite hypothesis: θ > θ₀ or θ < θ₀
• Two-sided test: H₀ : θ = θ₀ versus H₁ : θ ≠ θ₀
• One-sided test: H₀ : θ ≤ θ₀ versus H₁ : θ > θ₀
• Critical value c
• Test statistic T
• Rejection region R = {x : T(x) > c}
• Power function β(θ) = P[X ∈ R]
• Power of a test: 1 − P[Type II error] = 1 − β = inf_{θ∈Θ₁} β(θ)
• Test size: α = P[Type I error] = sup_{θ∈Θ₀} β(θ)

             Retain H₀              Reject H₀
H₀ true      correct                Type I error (α)
H₁ true      Type II error (β)      correct (power)

p-value
• p-value = sup_{θ∈Θ₀} P_θ[T(X) ≥ T(x)] = inf{α : T(x) ∈ R_α}
• p-value = sup_{θ∈Θ₀} P_θ[T(X*) ≥ T(X)] = 1 − F_θ(T(X)), since T(X*) ∼ F_θ

p-value       evidence
< 0.01        very strong evidence against H₀
0.01 – 0.05   strong evidence against H₀
0.05 – 0.1    weak evidence against H₀
> 0.1         little or no evidence against H₀

Wald Test
• Two-sided test
• Reject H₀ when |W| > z_{α/2} where W = (θ̂ − θ₀)/ŝe
• P[|W| > z_{α/2}] → α
• p-value = P_{θ₀}[|W| > |w|] ≈ P[|Z| > |w|] = 2Φ(−|w|)

Likelihood Ratio Test (LRT)
• T(X) = sup_{θ∈Θ} L_n(θ) / sup_{θ∈Θ₀} L_n(θ) = L_n(θ̂_n)/L_n(θ̂_n,0)
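A minimal sketch of the Wald test for a Bernoulli proportion, with W = (θ̂ − θ₀)/ŝe, θ̂ = X̄_n, and the plug-in standard error ŝe = √(θ̂(1 − θ̂)/n) (the true θ = 0.53 and n = 1000 are assumed for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, theta0 = 1000, 0.5
x = rng.binomial(1, 0.53, size=n)      # data generated under θ = 0.53

theta_hat = x.mean()
se_hat = np.sqrt(theta_hat * (1 - theta_hat) / n)
w = (theta_hat - theta0) / se_hat

p_value = 2 * stats.norm.cdf(-abs(w))  # p-value = 2Φ(−|w|)
print(w, p_value)                      # reject H₀ at size α if |w| > z_{α/2}
```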
