Stat 111 Midterm Notesheet

Compiled by Jamie Liu based on materials from Professor Joe Blitzstein's Spring 2023 Stat 111 Lectures.

Models

A statistical model is a collection of joint distributions for Y, indexed by a parameter θ.
A model is parametric if the dimension of θ is finite, and non-parametric otherwise.
The parameter space is the space of possible values for θ.
An estimate is a crystallized value of an estimator, which is used to estimate an estimand.
A statistic is any function of the data.

Empirical Estimates

Center:
• Sample moment: (1/n) Σ Yj^k
• Sample median: Y(⌈n/2⌉) (order statistic)
Spread:
• Sample variance: (1/(n−1)) Σ (Yj − Ȳ)² (unbiased)
• Sample covariance: (1/(n−1)) Σ (Xj − X̄)(Yj − Ȳ) (unbiased)
Empirical CDF: F̂(y) = (1/n) Σ Ind(Yj ≤ y)
Sample quantile: Q̂(p) = Y(⌈np⌉) (order statistic)
Kernel density estimate: f̂(y) = [F̂(y + h/2) − F̂(y − h/2)] / h

Law of Large Numbers

The sample mean converges in probability to the true mean: Ȳ →p µ.

Central Limit Theorem

The sample mean converges in distribution to a Normal:
√n(Ȳ − µ) →d N(0, σ²)  ⟺  (Ȳ − µ)/(σ/√n) →d N(0, 1)  ⟺  Ȳ approx. ∼ N(µ, σ²/n)

Continuous Mapping Theorem

For a continuous function g, if X → X̃, then g(X) → g(X̃) in the same manner (p or d). The converse is false unless X̃ is a constant.
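A quick way to see the LLN and CLT in action is by simulation. The sketch below is not part of the notesheet; it assumes NumPy is available and uses Expo(1) data (an arbitrary choice), so µ = σ² = 1.

import numpy as np

rng = np.random.default_rng(111)
n, reps = 100, 10_000
# Each row is one sample of size n from Expo(1), so mu = sigma^2 = 1.
samples = rng.exponential(scale=1.0, size=(reps, n))
ybar = samples.mean(axis=1)

print(ybar.mean())                   # LLN: sample means concentrate near mu = 1
z = np.sqrt(n) * (ybar - 1.0) / 1.0  # CLT: standardized sample means
print(z.mean(), z.var())             # roughly 0 and 1
print(np.mean(np.abs(z) < 1.96))     # roughly 0.95 under N(0, 1)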
Likelihood

A likelihood function represents the likelihood of observing the data, given the parameter. For i.i.d. data, the joint likelihood is Π f(yi).
A log-likelihood function is the natural log of a likelihood function, and is often easier to optimize.
A score function is the derivative of a log-likelihood function. Set it to zero to solve for the MLE; the expectation of the score function is zero. Differentiate again to confirm a maximum.
Fisher information informs on the variance within the data; the less information, the more noise.
I_Y(θ) = Var(s(θ; Y)) = n I_1(θ)
Change of variables (τ = g(θ)): I_Y(τ) = I_Y(θ)/(dτ/dθ)² = I_Y(θ)/g′(θ)²
Information equality: E(s′(θ)) = −I_Y(θ) = −E(s(θ)²)

Evaluation

Bias(θ̂) = E(θ̂ − θ) = E(θ̂) − θ
SE(θ̂) = √Var(θ̂)
MSE(θ̂) = E((θ̂ − θ)²) = Var(θ̂) + Bias(θ̂)²
Loss function: any function such that L(θ, θ̂) ≥ 0 and L(θ, θ) = 0, e.g. error tolerance: P(|θ̂ − θ| ≥ ε)

Cramér-Rao Lower Bound (CRLB)

An unbiased estimator θ̂ has Var(θ̂) ≥ I_Y(θ)⁻¹.

Rao-Blackwell Theorem

For an estimator θ̂ and a sufficient statistic T, Rao-Blackwellization yields θ̂_RB = E(θ̂ | T) such that Bias(θ̂_RB) = Bias(θ̂) and Var(θ̂_RB) ≤ Var(θ̂).
Note: If θ̂ is already a function of T, then θ̂_RB = θ̂.

Slutsky's Theorem

If X →d X̃ and Y →d c where c is a constant, then:
• X + Y →d X̃ + c
• X − Y →d X̃ − c
• XY →d X̃c
• X/Y →d X̃/c

Delta Method

For some random variable T (often an estimator for θ) and a function g that is differentiable on the domain of interest, suppose √n(T − θ) →d N(0, σ²). Then:
√n(g(T) − g(θ)) →d N(0, g′(θ)² σ²)
It follows that: g(T) approx. ∼ N(g(θ), g′(θ)² σ²/n)
Note: The sample mean satisfies these conditions by the CLT.
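As a sanity check on the Delta Method (a sketch, not from the notesheet; NumPy assumed), simulate √n(g(Ȳ) − g(µ)) for a smooth g and compare its variance with g′(µ)²σ². Here g(y) = y² and the data are Expo(1), so µ = σ² = 1 and g′(µ)²σ² = 4.

import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 20_000
ybar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)

# Delta Method prediction: sqrt(n)*(Ybar^2 - mu^2) is approximately N(0, g'(mu)^2 * sigma^2) = N(0, 4)
t = np.sqrt(n) * (ybar**2 - 1.0)
print(t.mean(), t.var())   # mean near 0, variance near 4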
Point Estimation

For point estimation, the estimand, estimator, and estimates are all single values.
A point estimate is consistent if its bias and variance are asymptotically zero, i.e. if it converges in probability to the true value. Consistency can be checked by showing MSE → 0, or via the LLN and/or CMT.

Maximum Likelihood Estimate

Compute the MLE by optimizing the likelihood or log-likelihood function, usually by setting the score function equal to zero. Check the second derivative of the log-likelihood.
Invariance property: Let ψ = g(θ) for some function g. Then ψ̂_MLE = g(θ̂_MLE). Invariance holds for the likelihood function as a whole, allowing for reparameterization; the expectation is one function to which reparameterization can be applied.
The MLE is asymptotically distributed √n(θ̂ − θ) →d N(0, I_1(θ)⁻¹), i.e. asymptotically the MLE is unbiased and achieves the CRLB. The MLE is consistent for the true θ*.
It follows that the MLE is approximately distributed θ̂ approx. ∼ N(θ, I_Y(θ)⁻¹).

Method of Moments

1. Write the estimand in terms of theoretical moments.
2. Replace the estimand with the estimator, and the theoretical moments with sample moments.
3. Solve for the estimator.
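The MLE and MoM recipes above can be checked numerically. A minimal sketch (assuming SciPy is available; the Expo(λ) example and the true rate 2.5 are arbitrary choices): minimize the negative log-likelihood and compare with the MoM estimate, which for the exponential is also 1/Ȳ.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
y = rng.exponential(scale=1 / 2.5, size=500)   # data with (arbitrary) true rate lambda = 2.5

# Negative log-likelihood for Expo(lambda): -n*log(lambda) + lambda*sum(y)
nll = lambda lam: -len(y) * np.log(lam) + lam * y.sum()
lam_mle = minimize_scalar(nll, bounds=(1e-6, 100), method="bounded").x

lam_mom = 1 / y.mean()    # MoM: E(Y) = 1/lambda, so lambda_hat = 1/Ybar
print(lam_mle, lam_mom)   # numerically identical, both near 2.5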
Asymptotics

Convergence of the empirical CDF: F̂ →p F
Convergence of sample quantiles (PDF f): √n(Q̂(p) − Q(p)) →d N(0, p(1 − p)/f(Q(p))²)

Interval Estimation

For interval estimation, the estimator is an interval. Intuitively, an interval estimate captures plausible values of the estimand.

Confidence Intervals

The interval [L(Y), U(Y)] is a (1 − α) confidence interval if the coverage probability P(L(Y) ≤ θ ≤ U(Y)) = 1 − α. In practice, α is often 0.05, i.e. a 95% confidence interval.
Intuition: Over replicated sampling of Y, if the interval is computed using the same functions L(Y) and U(Y) of the data, a proportion 1 − α of these intervals will contain the true value of θ.
A pivot is a quantity with a known distribution, usually with an unknown value. Utilize a pivot to find the distribution of a statistic with a known value but unknown distribution.
Student's t distribution is a useful pivot: t_n = Z/√(V_n/n), where Z ∼ N(0, 1) and V_n ∼ χ²_n with Z ⊥ V_n. (See Examples.)

Regression

Let X⃗ be the predictors/covariates/features, and Y be the outcome/label. Let the data be (X⃗_j, Yj) for j ∈ {1, 2, . . . , n}.
Let µ(x) = E(Y | X = x) and U(x) = Y − µ(x) (error).
Theorems:
1. Y = µ(x) + U(x) (signal + noise)
2. E(U(x) | X = x) = 0
3. E(U(x)) = 0

Gaussian Linear Regression

Assume Y | X = x ∼ N(θx, σ²(x)). Homoskedasticity implies that σ² does not change with X, i.e. σ²(x) is a constant, whereas heteroskedasticity implies otherwise.
Error: yj − θxj; Residual: yj − θ̂xj
Sum of squared errors: SSE(θ) = Σ (yj − θxj)²
Least squares estimator: θ̂_LS = argmin_θ SSE(θ)
Note: For Gaussian linear regression, the LSE agrees with the MLE.
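To make the coverage interpretation of confidence intervals concrete, here is a simulation sketch (not from the notesheet; SciPy assumed). It uses the t-interval for a Normal mean derived in the Examples section, with arbitrary µ = 5, σ = 2, n = 25.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
mu, sigma, n, reps = 5.0, 2.0, 25, 10_000
q = stats.t.ppf(0.975, df=n - 1)   # 97.5% quantile of t_{n-1}

covered = 0
for _ in range(reps):
    y = rng.normal(mu, sigma, size=n)
    se = y.std(ddof=1) / np.sqrt(n)      # sigma-hat / sqrt(n)
    covered += (y.mean() - q * se <= mu <= y.mean() + q * se)
print(covered / reps)                    # close to 0.95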
Logistic Regression

For binary Y, which must follow a Bernoulli distribution:
µ(x) = E(Y | X = x) = P(Y = 1 | X = x) = F(θx)
where F is some function that maps to [0, 1], i.e. a CDF. Logistic regression uses the logistic CDF (a.k.a. the sigmoid function), exp(θ0 + θ1x1)/(1 + exp(θ0 + θ1x1)), which is the inverse of the logit function (logit = log(p/(1 − p))); probit regression uses the standard normal CDF.
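A minimal sketch of fitting a logistic regression by maximizing the Bernoulli log-likelihood (not part of the notesheet; NumPy and SciPy assumed, and the simulated true parameters θ0 = −1, θ1 = 2 are arbitrary):

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit            # logistic CDF (sigmoid)

rng = np.random.default_rng(7)
x = rng.normal(size=300)
y = rng.binomial(1, expit(-1 + 2 * x))     # P(Y = 1 | x) = sigmoid(theta0 + theta1*x)

def nll(theta):
    p = np.clip(expit(theta[0] + theta[1] * x), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

print(minimize(nll, x0=[0.0, 0.0]).x)      # roughly (-1, 2)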
Mathematical Tools

Taylor Approximation (at a point a):
f(x) ≈ Σ_{n≥0} [f⁽ⁿ⁾(a)/n!] (x − a)ⁿ

Differentiation Under the Integral Sign (DUThIS):
(∂/∂θ) ∫ h(x, θ) dx = ∫ (∂/∂θ) h(x, θ) dx

Sum of Squared Differences Identity:
Σ_{j=1}^{n} (yj − µ)² = Σ_{j=1}^{n} (yj − ȳ)² + n(ȳ − µ)²

Change of Variables:
Let X and Y be random variables such that Y = g(X), where g is differentiable and strictly increasing. Then, for corresponding x and y:
f_Y(y) = f_X(x) (dx/dy) and F_Y(y) = F_X(x)
Note: When g is strictly decreasing, F_Y(y) = 1 − F_X(x).

Geometric Series:
a + ar + ar² + · · · + ar^(n−1) = Σ_{k=0}^{n−1} ar^k = a(1 − rⁿ)/(1 − r)

Taylor Series for the Exponential Function:
e^x = Σ_{n=0}^{∞} xⁿ/n! = 1 + x + x²/2! + x³/3! + · · · = lim_{n→∞} (1 + x/n)ⁿ

Sum of the First n Terms of the Harmonic Series:
1 + 1/2 + 1/3 + · · · + 1/n ≈ log n + 0.5772
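These facts are easy to spot-check numerically; a quick sketch (NumPy assumed):

import numpy as np

n = 10_000
print(np.sum(1 / np.arange(1, n + 1)), np.log(n) + 0.5772)   # harmonic sum vs. log n + 0.5772
print((1 + 2 / n) ** n, np.exp(2))                           # (1 + x/n)^n -> e^x, here x = 2

y, mu = np.random.default_rng(3).normal(size=50), 1.0
print(np.sum((y - mu) ** 2),
      np.sum((y - y.mean()) ** 2) + 50 * (y.mean() - mu) ** 2)   # sum of squared differences identity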
Gamma and Beta Integrals

∫₀^∞ x^(t−1) e^(−x) dx = Γ(t)
∫₀¹ x^(a−1) (1 − x)^(b−1) dx = Γ(a)Γ(b)/Γ(a + b)
Notes: Γ(a + 1) = aΓ(a), and Γ(n) = (n − 1)! if n is a positive integer.

Useful Stat 110 Concepts

Conditional Probability

Definition of Conditional Probability: P(A | B) = P(A, B)/P(B)
Law of Total Probability: P(A) = P(A | B1)P(B1) + P(A | B2)P(B2) + · · · = Σ_{k=1}^{n} P(A | Bk)P(Bk)
Bayes' Rule: P(A | B) = P(B | A)P(A)/P(B)
With extra conditioning: P(A | B, C) = P(A, B, C)/P(B, C) = P(B | A, C)P(A | C)/P(B | C) = P(B, C | A)P(A)/P(B, C)
– the posterior is proportional to the likelihood times the prior.
Conditional Distribution of RVs: f_{X|Y} = f_{X,Y}/f_Y = f_{Y|X} f_X / f_Y
Marginal Distribution of RVs: f_X = ∫ f_{X,Y} dy

Expectation and Variance

Definition of Expectation: E(X) = ∫ x f_X(x) dx
Linearity: E(aX + bY + c) = aE(X) + bE(Y) + c
Conditional Expectation: E(X | A) = ∫ x f_X(x | A) dx
LOTUS: E(g(X)) = ∫ g(x) f_X(x) dx
Adam's Law (LOTE): E(Y) = E(E(Y | A)) = E(Y | A1)P(A1) + E(Y | A2)P(A2) + · · ·
Eve's Law (LOTV): Var(Y) = E(Var(Y | X)) + Var(E(Y | X))
Fundamental Bridge (Indicator RVs): E(Ind(A)) = P(A)
Variance: Var(X) = E((X − E(X))²) = E(X²) − (E(X))²
Standard Deviation: SD(X) = √Var(X)
Covariance: Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)
Correlation: Corr(X, Y) = Cov(X, Y)/√(Var(X)Var(Y))
Properties:
• Var(aX) = a² Var(X)
• For X ⊥ Y: Var(X + Y) = Var(X − Y) = Var(X) + Var(Y)
• Cov(X, Y) = Cov(Y, X)
• Cov(X + a, Y + b) = Cov(X, Y)
• Cov(aX, bY) = ab Cov(X, Y)
• Cov(W + X, Y + Z) = Cov(W, Y) + Cov(W, Z) + Cov(X, Y) + Cov(X, Z)
• Corr(aX + b, cY + d) = Corr(X, Y)

Inequalities

• Cauchy-Schwarz: |E(XY)| ≤ √(E(X²)E(Y²))
• Markov: P(X ≥ a) ≤ E|X|/a for a > 0
• Chebyshev: P(|X − µ| ≥ a) ≤ σ²/a²
• Jensen: E(g(X)) ≥ g(E(X)) for convex g; the inequality reverses for concave g

Poisson Processes

For a Poisson process of rate λ arrivals per unit of time:
• The number of arrivals in a time interval of length t is Pois(λt)
• Numbers of arrivals in disjoint time intervals are independent
• Inter-arrival times are i.i.d. Expo(λ)

Convolutions of Random Variables

For independent X and Y:
• X ∼ Pois(λ1), Y ∼ Pois(λ2) ⟹ X + Y ∼ Pois(λ1 + λ2), and X | X + Y = n ∼ Bin(n, λ1/(λ1 + λ2))
• X ∼ Bin(n1, p), Y ∼ Bin(n2, p) ⟹ X + Y ∼ Bin(n1 + n2, p)
• X ∼ Gamma(a1, λ), Y ∼ Gamma(a2, λ) ⟹ X + Y ∼ Gamma(a1 + a2, λ)
• X ∼ NBin(r1, p), Y ∼ NBin(r2, p) ⟹ X + Y ∼ NBin(r1 + r2, p)
• X ∼ N(µ1, σ1²), Y ∼ N(µ2, σ2²) ⟹ X + Y ∼ N(µ1 + µ2, σ1² + σ2²)
• Y ∼ Expo(λ) ⟺ λY ∼ Expo(1) ⟺ kY ∼ Expo(λ/k)
• X ∼ Expo(λ1), Y ∼ Expo(λ2) ⟹ P(X < Y) = λ1/(λ1 + λ2)
• The minimum of k independent Expo(λj) is Expo(λ1 + · · · + λk)
• The maximum of k i.i.d. Expo(λ) is distributed as Y1 + Y2 + · · · + Yk, where Yj ∼ Expo(jλ)

Special Cases of Distributions

• Bin(n, p) can be thought of as the sum of n i.i.d. Bern(p)
• Beta(1, 1) is the same distribution as Unif(0, 1)
• The sum of n i.i.d. Expo(λ) is Gamma(n, λ)
• χ²_n is the sum of squares of n i.i.d. N(0, 1), or equivalently Gamma(n/2, 1/2)
• NBin(r, p) can be thought of as the sum of r i.i.d. Geom(p)
• For X ∼ Expo(λ), X^(1/γ) ∼ Weibull(λ, γ)
• For X ∼ N(µ, σ²), e^X ∼ Log-Normal(µ, σ²)
• For X ∼ Gamma(a, λ) and Y ∼ Gamma(b, λ), X/(X + Y) ∼ Beta(a, b), with X + Y ⊥ X/(X + Y)
• Beta-Binomial Conjugacy: For X | p ∼ Bin(n, p) and p ∼ Beta(a, b), the posterior is p | X = x ∼ Beta(a + x, b + n − x)
• Gamma-Poisson Conjugacy: For X | λ ∼ Pois(λt) and λ ∼ Gamma(r, b), the posterior is λ | X = x ∼ Gamma(r + x, b + t)
• Chicken-Egg: For Z ∼ Pois(λ) items each accepted with probability p, the accepted items Z1 ∼ Pois(λp) and the rejected items Z2 ∼ Pois(λ(1 − p)), with Z1 ⊥ Z2
• Poisson-Normal: Pois(λ) is approximately N(λ, λ) when λ is large
• Binomial-Poisson: Bin(n, p) is approximately Pois(np) when n is large and p is small
• Binomial-Normal: Bin(n, p) is approximately N(np, npq) when n is large and p is not near 0 or 1
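The Beta-Binomial conjugacy entry above can be verified by simulation; a sketch (NumPy assumed; the values a = 2, b = 3, n = 10, x = 4 are arbitrary):

import numpy as np

rng = np.random.default_rng(110)
a, b, n, reps = 2, 3, 10, 500_000
p = rng.beta(a, b, size=reps)       # prior draws p ~ Beta(a, b)
x = rng.binomial(n, p)              # X | p ~ Bin(n, p)

keep = p[x == 4]                    # condition on observing X = 4
print(keep.mean(), (a + 4) / (a + b + n))   # matches the Beta(a + x, b + n - x) posterior mean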
Examples

MLE and MoM of Mean/Variance in Normal

Let Y1, Y2, . . . , Yn ∼ i.i.d. N(µ, σ²).
Maximum Likelihood Estimates:
L(µ, σ²) = (1/σⁿ) exp(−Σ(Yj − µ)²/(2σ²))
ℓ(µ, σ²) = −n log(σ) − (1/(2σ²)) [Σ(Yj − Ȳ)² + n(Ȳ − µ)²]
The log-likelihood is maximized when µ = Ȳ, so µ̂_MLE = Ȳ.
Plugging µ̂_MLE into the log-likelihood:
ℓ(σ²) = −n log(σ) − (1/(2σ²)) Σ(Yj − Ȳ)² = −(n/2) log(σ²) − (1/(2σ²)) Σ(Yj − Ȳ)²
s(σ²) = −n/(2σ²) + (1/(2σ⁴)) Σ(Yj − Ȳ)²
σ̂²_MLE = (1/n) Σ(Yj − Ȳ)²
Method of Moments:
1. µ = E(Y); µ̂_MoM = Ȳ
2. σ² = E(Y²) − (E(Y))²; σ̂²_MoM = (1/n) Σ Yj² − Ȳ² = (1/n) Σ(Yj − Ȳ)²

German Tank Problem

Suppose n tanks are captured, with serial numbers Y1, Y2, . . . , Yn. Assume the population serial numbers are 1, 2, . . . , t and that the data is a simple random sample. Estimate the total number of tanks t.
L(t) = 1/C(t, n) if Y1, . . . , Yn ∈ {1, 2, . . . , t} and 0 otherwise = Ind(Y(n) ≤ t)/C(t, n)
The likelihood of t is 0 for t < Y(n), because we would have already observed a tank with a higher serial number. Beyond that, the likelihood function is decreasing, so the maximum likelihood estimate must be t̂_MLE = Y(n). However, this estimator is biased.
The PMF of Y(n) is the number of ways to choose n − 1 tanks with serial numbers less than Y(n), divided by the total number of ways to choose n tanks from t:
P(Y(n) = m) = C(m − 1, n − 1)/C(t, n)
E(Y(n)) = Σ_{m=n}^{t} m C(m − 1, n − 1)/C(t, n) = [n/(n + 1)](t + 1)
So, we can correct our estimator to [(n + 1)/n] Y(n) − 1, which is unbiased.

Variance-Stabilizing of Poisson

Let T ∼ Pois(λ) ≈ N(λ, λ) for large λ. What is the approximate distribution of √T?
T approx. ∼ N(λ, λ)
√T approx. ∼ N(√λ, 1/4), by the Delta Method
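A simulation sketch of the German tank example (not part of the original notesheet; NumPy assumed, with arbitrary t = 500 and n = 10) showing that Y(n) underestimates t while the corrected estimator is roughly unbiased:

import numpy as np

rng = np.random.default_rng(1943)
t, n, reps = 500, 10, 20_000

# Simple random sample of n serial numbers from {1, ..., t}; record the maximum Y_(n).
maxima = np.array([rng.choice(t, size=n, replace=False).max() + 1 for _ in range(reps)])
print(maxima.mean())                        # MLE Y_(n): biased below t = 500
print(((n + 1) / n * maxima - 1).mean())    # corrected estimator: approximately 500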
Sufficient Statistic and MLE in an NEF

The PMF/PDF of an NEF can be written as f_θ(y) = e^(θy − ψ(θ)) h(y), so the joint likelihood and log-likelihood are:
L(θ) ∝ e^(θ Σ Yj − nψ(θ))
ℓ(θ) = θ Σ_{j=1}^{n} Yj − nψ(θ)
s(θ) = Σ_{j=1}^{n} Yj − nψ′(θ) = 0
(1/n) Σ Yj = ψ′(θ) = E(Y)
µ̂_MLE = Ȳ
So, Ȳ is a sufficient statistic.

Sample Mean vs. Sample Median

Let Y1, Y2, . . . , Yn ∼ i.i.d. N(θ, σ²); the estimand is θ.
Sample mean: Ȳ ∼ N(θ, σ²/n)
Sample median: Mn approx. ∼ N(θ, (π/2)(σ²/n)), by the asymptotic distribution of sample quantiles
The sample mean is a more efficient estimator, as it has lower variance, but in cases where the Normal assumption is wrong (e.g. Cauchy data), the sample median may be more robust.

Generic Method of Moments

Let Y1, Y2, . . . , Yn be i.i.d. with mean θ and variance σ².
1. θ = E(Y)
2. θ̂_MoM = Ȳ
Evaluation:
• Bias(θ̂) = 0 by linearity
• Var(θ̂) = σ²/n
• θ̂ approx. ∼ N(θ, σ²/n) by the CLT
• θ̂ →p θ by the LLN

Asymptotics Theorems

Let Y1, Y2, . . . , Yn be i.i.d. with mean µ ≠ 0 and variance σ². Suppose the variable of interest is T = √n(Ȳ − µ)/(Ȳ³ + µ³). Find its asymptotic distribution.
Numerator: √n(Ȳ − µ) →d σZ, where Z ∼ N(0, 1), by the CLT
Denominator: Ȳ →p µ by the LLN, then Ȳ³ →p µ³ by the CMT, so Ȳ³ + µ³ →p 2µ³
Combining the numerator and denominator using Slutsky's Theorem:
T →d σZ/(2µ³) = N(0, σ²/(4µ⁶))

Fisher Information Equality

Let Y1, Y2, . . . , Yn ∼ i.i.d. Geom(p). Find I_Y(p).
L_1(p) = p(1 − p)^(Y1)
ℓ_1(p) = log(p) + Y1 log(1 − p)
s_1(p) = 1/p − Y1/(1 − p)
1. I_Y(p) = n I_1(p) = n Var(1/p − Y1/(1 − p)) = [n/(1 − p)²] Var(Y1) = [n/(1 − p)²] (1 − p)/p² = n/(p²(1 − p))
2. I_Y(p) = n I_1(p) = −n E(s_1′(p)) = −n E(−1/p² − Y1/(1 − p)²) = n/p² + n E(Y1)/(1 − p)² = n/(p²(1 − p))

Confidence Interval for Mean/Variance in Normal

Let Y ∼ N(µ, σ²), with µ and σ² both unknown.
Suppose we use the unbiased sample variance σ̂² = (1/(n − 1)) Σ(Yj − Ȳ)².
(n − 1)σ̂²/σ² ∼ χ²_(n−1)
So, the 95% confidence interval for σ² is:
[ σ̂²(n − 1)/Q1(0.975), σ̂²(n − 1)/Q1(0.025) ]
where Q1(p) is the quantile function of χ²_(n−1), i.e. Gamma((n − 1)/2, 1/2).
Now, suppose we use the sample mean µ̂ = Ȳ.
Ȳ − µ ∼ (σ/√n) N(0, 1) and (n − 1)σ̂²/σ² ∼ χ²_(n−1)
So, we can use the pivot:
(Ȳ − µ)/√(σ̂²/n) = [ (Ȳ − µ)/(σ/√n) ] / √( [(n − 1)σ̂²/σ²] / (n − 1) ) = Z/√( χ²_(n−1)/(n − 1) ) ∼ t_(n−1)
So, the 95% confidence interval for µ is:
[ Ȳ − (σ̂/√n) Q2(0.975), Ȳ − (σ̂/√n) Q2(0.025) ]
where Q2(p) is the quantile function of the Student-t distribution with parameter n − 1.

Censored Data

Suppose there are n = 30 devices. They are observed for 7 months, at which point 21 have failed while 9 still work. Assume each device's lifetime Yj ∼ i.i.d. Expo(λ), and the estimand is µ = 1/λ.
For each observation:
Lj(λ) = f(yj) if the failure is observed, and 1 − F(7) if not observed.
L(λ) = [ Π_{j=1}^{21} λ e^(−λtj) ] (e^(−7λ))⁹ = λ²¹ e^(−21λt̄) e^(−63λ)
ℓ(λ) = 21 log(λ) − 21λt̄ − 63λ
s(λ) = 21/λ − 21t̄ − 63 = 0
λ̂_MLE = 1/(t̄ + 3)
µ̂_MLE = t̄ + 3, by invariance
(Here t̄ is the average of the 21 observed failure times.)
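A simulation sketch of the censored-data MLE (not from the notesheet; NumPy assumed, with an arbitrary true mean of 5 months). With d observed failures and c units censored at 7 months, the same derivation gives µ̂ = (Σ tj + 7c)/d, which reduces to t̄ + 3 when d = 21 and c = 9.

import numpy as np

rng = np.random.default_rng(30)
mu_true, n, cutoff = 5.0, 30, 7.0
y = rng.exponential(mu_true, size=n)    # unobserved true lifetimes

failed = y[y <= cutoff]                 # observed failure times
c = np.sum(y > cutoff)                  # number still working at 7 months
mu_mle = (failed.sum() + cutoff * c) / len(failed)
print(len(failed), c, mu_mle)           # mu_mle should be near mu_true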
Poisson Rao-Blackwellization

Let Y1, Y2, . . . , Yn ∼ i.i.d. Pois(λ). The sufficient statistic is T = Σ Yj. Suppose we use the unbiased estimator λ̂ = Y1.
λ̂_RB = E(λ̂ | T) = E(Y1 | T)
We can think of the data as a Poisson process with a rate of λ and a total of T arrivals. We can split the timeline into n disjoint intervals of length 1, so the number of arrivals in each interval is distributed as Pois(λ). Since we know that there are a total of T arrivals and per-interval arrivals are i.i.d., we can view the distribution of arrivals that fall within the first interval as Y1 | T ∼ Bin(T, 1/n).
λ̂_RB = E(Y1 | T) = T/n = Σ Yj / n = Ȳ

Log-Normal MoM

Let Y1, Y2, . . . , Yn ∼ i.i.d. Log-Normal(µ, σ²). Use two methods to find Method of Moments estimators for (µ, σ²).
Method 1:
1. Let Xj = log(Yj) for all j ∈ {1, 2, . . . , n} ⟹ Xj ∼ N(µ, σ²)
2. µ = E(X), σ² = E(X²) − (E(X))²
3. µ̂ = X̄, σ̂² = (1/n) Σ Xj² − X̄² = (1/n) Σ(Xj − X̄)²
Method 2:
1. E(Y) = exp(µ + σ²/2), E(Y²) = exp(2µ + 2σ²)
2. Let M = (1/n) Σ Yj²; then Ȳ = exp(µ̂ + σ̂²/2) and M = exp(2µ̂ + 2σ̂²)
3. log Ȳ = µ̂ + σ̂²/2 and log M = 2µ̂ + 2σ̂²
4. σ̂² = log M − 2 log Ȳ and µ̂ = 2 log Ȳ − (1/2) log M

MoM for Neil's Commute Problem

Let there be two different routes X and Y, and let X1, . . . , Xn and Y1, . . . , Yn be independent commute times, with the Xj's i.i.d. and the Yj's i.i.d. We want to compare the commute times by looking at the ratio of expected commute times beyond 40 minutes. The estimand is:
θ = E[(Y1 − 40) Ind(Y1 > 40)] / E[(X1 − 40) Ind(X1 > 40)]
Find an MoM estimator for θ and show that it is consistent.
Let ν = E[(Y1 − 40) Ind(Y1 > 40)]. Applying the MoM principle,
ν̂ = (1/n) Σ_{j=1}^{n} (Yj − 40) Ind(Yj > 40)
Similarly, let η = E[(X1 − 40) Ind(X1 > 40)] and
η̂ = (1/n) Σ_{j=1}^{n} (Xj − 40) Ind(Xj > 40)
θ̂ = ν̂/η̂ = Σ(Yj − 40) Ind(Yj > 40) / Σ(Xj − 40) Ind(Xj > 40)
For consistency, we want to show that θ̂ converges in probability to θ:
ν̂ →p ν and η̂ →p η by the LLN
ν̂/η̂ = θ̂ →p θ by Slutsky's Theorem
Note: with A = {Y1 > 40},
E[(Y1 − 40) Ind(Y1 > 40)] = E[(Y1 − 40) Ind(A) | A] P(A) + E[(Y1 − 40) Ind(A) | Aᶜ] P(Aᶜ) = E[Y1 − 40 | A] P(A),
which equals µ_Y e^(−40/µ_Y) when Y1 ∼ Expo with mean µ_Y (by memorylessness).
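A quick simulation sketch (NumPy assumed; λ = 3 and n = 20 are arbitrary) comparing the crude unbiased estimator λ̂ = Y1 with its Rao-Blackwellized version Ȳ from the Poisson example above:

import numpy as np

rng = np.random.default_rng(111)
lam, n, reps = 3.0, 20, 50_000
y = rng.poisson(lam, size=(reps, n))

crude = y[:, 0]                    # lambda-hat = Y1
rb = y.mean(axis=1)                # Rao-Blackwellized: E(Y1 | T) = Ybar
print(crude.mean(), rb.mean())     # both approximately 3 (unbiased)
print(crude.var(), rb.var())       # variance drops from about 3 to about 3/20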
Asymptotics for MoM Estimator

Suppose we have data in the form of triplets (Xj, Yj, Zj) for j = 1, 2, . . . , n. The estimand is β = E(YZ)/E(XZ).
We can use the method of moments estimator β̂ = Σ YjZj / Σ XjZj.
(1) Show that β̂ is asymptotically unbiased.
(1/n) Σ YjZj →p E(YZ) by the LLN
(1/n) Σ XjZj →p E(XZ) by the LLN
Σ YjZj / Σ XjZj →p β by Slutsky's Theorem
(2) Derive the asymptotic distribution of √n(β̂ − β). Letting U = Y − βX, the parameters of the answer can be in terms of any moments of X, Y, Z, U or any moments of their products.
E(UZ) = E((Y − βX)Z) = E(YZ) − βE(XZ) = E(YZ) − E(YZ) = 0
β̂ = Σ YjZj / Σ XjZj = [Σ UjZj + β Σ XjZj] / Σ XjZj = β + Σ UjZj / Σ XjZj
Then,
√n(β̂ − β) = [ (1/√n) Σ UjZj ] / [ (1/n) Σ XjZj ]
Numerator: √n[ (1/n) Σ UjZj − E(UZ) ] = (1/√n) Σ UjZj →d N(0, Var(UZ)) by the CLT, where Var(UZ) = E((UZ)²) − E(UZ)² = E((UZ)²)
Denominator: (1/n) Σ XjZj →p E(XZ) by the LLN
√n(β̂ − β) →d N(0, E((UZ)²)/E(XZ)²) by Slutsky's Theorem

Asymptotics of Sample Median

Let Y1, Y2, . . . , Yn ∼ i.i.d. Log-Normal(µ, σ²). Let θ = E(Y1) = e^(µ+σ²/2) and let ψ be the median of the distribution. Suppose the estimator is the sample median ψ̂.
(1) What are the bias and standard error for large n when the estimand is θ?
By the asymptotic distribution of sample quantiles:
√n(ψ̂ − ψ) →d N(0, (1/4)/f(ψ)²)
Plugging in ψ = e^µ (by symmetry of log(Y)):
√n(ψ̂ − ψ) →d N(0, 2πσ²ψ²/4)
Then, for large n:
ψ̂ approx. ∼ N(ψ, πσ²ψ²/(2n))
Bias(ψ̂) ≈ ψ − θ
SE(ψ̂) = √Var(ψ̂) ≈ σψ √(π/(2n))
(2) What are the bias and standard error for large n when the estimand is ψ?
By the approximate distribution for ψ̂ found previously:
Bias(ψ̂) ≈ ψ − ψ = 0
SE(ψ̂) ≈ σψ √(π/(2n))

MLE and MoM for Gaussian Linear Regression

Let the data be pairs (Xj, Yj) such that Yj | Xj ∼ N(θXj, σ²(Xj)), with σ²(x) known. Note that under homoskedasticity, σ²(x) is a constant σ².
Maximum Likelihood Estimate:
ℓ(θ) = −(1/(2σ²)) Σ_{j=1}^{n} (yj − θxj)²
s(θ) = (1/σ²) Σ_{j=1}^{n} xj(yj − θxj)
θ̂ = Σ XjYj / Σ Xj²
Properties:
• E(θ̂ | X) = [1/Σ xj²] E(Σ xjYj | X) = [1/Σ xj²] Σ xj E(Yj | Xj) = [1/Σ xj²] Σ xj · θxj = θ ⟹ unbiased
• Var(θ̂ | X) = [1/(Σ xj²)²] Var(Σ xjYj | X) = [1/(Σ xj²)²] Σ xj² Var(Yj | Xj) = [1/(Σ xj²)²] Σ xj² σ²(xj)
  ⋆ Simplifies to σ²/Σ xj² (the CRLB) under homoskedasticity
• Robust Variance: Since E(σ²(Xj) − (Yj − θXj)² | Xj) = 0, use (Yj − θXj)² as an unbiased estimator for σ²(Xj). Then, the robust variance is [1/(Σ xj²)²] Σ xj²(yj − θxj)²
Method of Moments 1 (Gauss's Estimator):
E(XY) = E(E(XY | X)) = E(X E(Y | X)) = E(X θX) = θE(X²)
⟹ θ = E(XY)/E(X²) ⟹ θ̂ = Σ XjYj / Σ Xj² = the MLE
Method of Moments 2 (Cauchy's Estimator):
E(sign(X)Y) = E(E(sign(X)Y | X)) = E(sign(X) E(Y | X)) = E(sign(X) θX) = θE(|X|)
⟹ θ̂ = Σ sign(xj)yj / Σ |xj|
Properties:
• E(θ̂ | X) = [1/Σ |xj|] Σ sign(xj) E(Yj | Xj) = [1/Σ |xj|] Σ sign(xj) θxj = θ Σ|xj| / Σ|xj| = θ
• Var(θ̂ | X) = [1/(Σ |xj|)²] Σ Var(sign(xj)Yj | Xj) = [1/(Σ |xj|)²] Σ σ²(xj) ⟹ less efficient than the MLE