Statistical Inference Cheat Sheet
Geometric Series:

a + ar + ar² + · · · + ar^(n−1) = Σ_{k=0}^{n−1} ar^k = a (1 − rⁿ)/(1 − r)

Taylor Series for Exponential Function:

eˣ = Σ_{n=0}^{∞} xⁿ/n! = 1 + x + x²/2! + x³/3! + · · · = lim_{n→∞} (1 + x/n)ⁿ

Sum of First n Terms of Harmonic Series:

1 + 1/2 + 1/3 + · · · + 1/n ≈ log n + 0.5772

Gamma and Beta Integrals:

∫₀^∞ x^(t−1) e^(−x) dx = Γ(t)        ∫₀¹ x^(a−1) (1 − x)^(b−1) dx = Γ(a)Γ(b)/Γ(a + b)

Notes: Γ(a + 1) = aΓ(a), and Γ(n) = (n − 1)! if n is a positive integer.

Useful Stat 110 Concepts

Definition of Conditional Probability:

P(A | B) = P(A, B)/P(B)

Law of Total Probability:

P(A) = P(A | B1)P(B1) + P(A | B2)P(B2) + · · · = Σ_{k=1}^{n} P(A | Bk)P(Bk)

Adam's Law (LOTE):

E(Y) = E(E(Y | A)) = E(Y | A1)P(A1) + E(Y | A2)P(A2) + · · ·

Eve's Law (LOTV):

Var(Y) = E(Var(Y | X)) + Var(E(Y | X))

Fundamental Bridge (Indicator RVs):

E(Ind(A)) = P(A)

Variance:

Var(X) = E(X − E(X))² = E(X²) − (E(X))²

Standard Deviation:

SD(X) = √Var(X)

Covariance:

Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)

Correlation:

Corr(X, Y) = Cov(X, Y)/√(Var(X) Var(Y))

Properties:
• Var(aX) = a² Var(X)
• For X ⊥⊥ Y, Var(X + Y) = Var(X − Y) = Var(X) + Var(Y)
• Cov(X, Y) = Cov(Y, X)
• Cov(X + a, Y + b) = Cov(X, Y)
• Cov(aX, bY) = ab Cov(X, Y)
• Cov(W + X, Y + Z) = Cov(W, Y) + Cov(W, Z) + Cov(X, Y) + Cov(X, Z)
• Corr(aX + b, cY + d) = Corr(X, Y)

Note: for Y = g(X) with g strictly decreasing, FY(y) = 1 − FX(x), where x = g⁻¹(y).

Exponential Order Statistics:
• Minimum of k independent Expo(λj) ∼ Expo(λ1 + · · · + λk)
• Maximum of k i.i.d. Expo(λ) is distributed as Y1 + Y2 + · · · + Yk, where Yj ∼ Expo(jλ)
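As a quick sanity check of Adam's and Eve's laws (added here, not part of the original sheet), a minimal Python sketch; the Gamma-Poisson hierarchy is an assumed example:

    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed hierarchy for illustration: X ~ Gamma(3, rate 2), Y | X ~ Pois(X).
    x = rng.gamma(3.0, 1 / 2.0, size=10**6)   # numpy uses scale = 1/rate
    y = rng.poisson(x)

    # Adam's law: E(Y) = E(E(Y | X)) = E(X).
    print(y.mean(), x.mean())                 # both ≈ 1.5

    # Eve's law: Var(Y) = E(Var(Y | X)) + Var(E(Y | X)) = E(X) + Var(X).
    print(y.var(), x.mean() + x.var())        # both ≈ 1.5 + 0.75 = 2.25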
Special Cases of Distributions
• Bin(n, p) can be thought of as the sum of n i.i.d. Bern(p)
• Beta(1, 1) is the same distribution as Unif(0, 1)
• The sum of n i.i.d. Expo(λ) is Gamma(n, λ)
• χ²ₙ is the sum of squares of n i.i.d. N(0, 1), or Gamma(n/2, 1/2)
• NBin(r, p) can be thought of as the sum of r i.i.d. Geom(p)
• For X ∼ Expo(λ), X^(1/γ) ∼ Weibull(λ, γ)
• For X ∼ N(µ, σ²), e^X ∼ Log-Normal(µ, σ²)
• For X ∼ Gamma(a, λ) and Y ∼ Gamma(b, λ), X/(X + Y) ∼ Beta(a, b), with X + Y ⊥⊥ X/(X + Y)
• Beta-Binomial Conjugacy: for X | p ∼ Bin(n, p) and p ∼ Beta(a, b), the posterior is p | X ∼ Beta(a + x, b + n − x)
• Gamma-Poisson Conjugacy: for X | λ ∼ Pois(λt) and λ ∼ Gamma(r, b), the posterior is λ | X ∼ Gamma(r + x, b + t)
• Chicken-Egg: for Z ∼ Pois(λ) items and acceptance probability p, accepted items Z1 ∼ Pois(λp) and rejected items Z2 ∼ Pois(λ(1 − p)), with Z1 ⊥⊥ Z2
• Poisson-Normal: Pois(λ) is approximately N(λ, λ) when λ is large
• Binomial-Poisson: Bin(n, p) is approximately Pois(np) when p is small
• Binomial-Normal: Bin(n, p) is approximately N(np, npq) when n is large and p is not near 0 or 1
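Two of these facts checked by simulation (added; the parameter values are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    N = 10**6

    # Gamma/Beta: for X ~ Gamma(a, λ) and Y ~ Gamma(b, λ), X/(X+Y) ~ Beta(a, b).
    a, b, lam = 2.0, 5.0, 3.0
    x = rng.gamma(a, 1 / lam, N)              # numpy uses scale = 1/rate
    y = rng.gamma(b, 1 / lam, N)
    w = x / (x + y)
    print(w.mean(), a / (a + b))              # Beta(a, b) mean is a/(a+b)
    print(np.corrcoef(w, x + y)[0, 1])        # ≈ 0, consistent with X+Y ⊥⊥ W

    # Chicken-Egg: thinning Z ~ Pois(λ) with probability p gives Pois(λp).
    z = rng.poisson(4.0, N)
    z1 = rng.binomial(z, 0.3)                 # accepted items, Z1 | Z ~ Bin(Z, p)
    print(z1.mean(), z1.var())                # both ≈ λp = 1.2, as for a Poisson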
Examples

MLE and MoM of Mean/Variance in Normal

Let Y1, Y2, . . . , Yn ~ i.i.d. N(µ, σ²).

Maximum Likelihood Estimates:

L(µ, σ²) = (1/σⁿ) exp(−Σ(Yj − µ)²/(2σ²))

ℓ(µ, σ²) = −n log(σ) − [Σ(Yj − Ȳ)² + n(Ȳ − µ)²]/(2σ²)

The log-likelihood is maximized when µ = Ȳ, so µ̂_MLE = Ȳ.

Plugging µ̂_MLE into the log-likelihood function:

ℓ(σ²) = −n log(σ) − (1/(2σ²)) Σ(Yj − Ȳ)²
      = −(n/2) log(σ²) − (1/(2σ²)) Σ(Yj − Ȳ)²

s(σ²) = −n/(2σ²) + (1/(2σ⁴)) Σ(Yj − Ȳ)² = 0

σ̂²_MLE = (1/n) Σ_{j=1}^{n} (Yj − Ȳ)²

Method of Moments:
1. µ = E(Y), so µ̂_MoM = Ȳ
2. σ² = E(Y²) − (E(Y))², so σ̂²_MoM = (1/n) ΣYj² − Ȳ² = (1/n) Σ(Yj − Ȳ)²
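A numerical companion (added, not from the sheet): the MLE and MoM estimates coincide here; the true (µ, σ) below are arbitrary:

    import numpy as np

    rng = np.random.default_rng(2)
    y = rng.normal(loc=10.0, scale=2.0, size=500)   # assumed truth: µ=10, σ=2

    mu_hat = y.mean()                               # µ̂_MLE = µ̂_MoM = Ȳ
    sigma2_mle = ((y - mu_hat) ** 2).mean()         # (1/n) Σ (Yj − Ȳ)², biased
    sigma2_unb = ((y - mu_hat) ** 2).sum() / (len(y) - 1)  # n−1 gives unbiased

    print(mu_hat, sigma2_mle, sigma2_unb)           # ≈ 10, ≈ 4, ≈ 4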
German Tank Problem

Suppose n tanks are captured, with serial numbers Y1, Y2, . . . , Yn. Assume the population serial numbers are 1, 2, . . . , t and that the data is a simple random sample. Estimate the total number of tanks t.

L(t) = 1/C(t, n) if Y1, Y2, . . . , Yn ∈ {1, 2, . . . , t} and 0 otherwise
     = Ind(Y(n) ≤ t)/C(t, n)

where C(·, ·) denotes the binomial coefficient. The likelihood of t is 0 for t < Y(n) because we would have already observed a tank with a higher serial number. For t ≥ Y(n), the likelihood function is decreasing in t, so the maximum likelihood estimate must be t̂_MLE = Y(n). However, this estimator is biased.

The PMF for Y(n) is the number of ways to choose n − 1 tanks with serial numbers less than Y(n) divided by the total number of ways to choose n tanks from t:

P(Y(n) = m) = C(m − 1, n − 1)/C(t, n)

E(Y(n)) = Σ_{m=n}^{t} m C(m − 1, n − 1)/C(t, n) = n(t + 1)/(n + 1)

So, we can correct our estimator to ((n + 1)/n) Y(n) − 1, which is unbiased.
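A simulation sketch (added), with arbitrary t and n, contrasting the biased MLE with the corrected estimator:

    import numpy as np

    rng = np.random.default_rng(3)
    t, n, reps = 500, 10, 20000

    # Sample n serial numbers without replacement from {1, ..., t}, many times.
    ymax = np.array([rng.choice(t, size=n, replace=False).max() + 1
                     for _ in range(reps)])

    print(ymax.mean())                        # ≈ n(t+1)/(n+1) ≈ 455.5, biased low
    print(((n + 1) / n * ymax - 1).mean())    # ≈ t = 500, bias corrected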
Variance-Stabilizing of Poisson

Let T ∼ Pois(λ), so that T ·∼ N(λ, λ) for large λ. What is the approximate distribution of √T?

T ·∼ N(λ, λ)

√T ·∼ N(√λ, 1/4), by the Delta Method, since for g(t) = √t the approximate variance is λ g′(λ)² = λ (1/(2√λ))² = 1/4, which does not depend on λ.
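A quick numerical check (added), assuming λ = 100:

    import numpy as np

    rng = np.random.default_rng(4)
    lam = 100.0
    t = rng.poisson(lam, size=10**6)

    root_t = np.sqrt(t)
    print(root_t.mean(), np.sqrt(lam))   # ≈ 10, up to a small O(1/√λ) bias
    print(root_t.var())                  # ≈ 1/4, independent of λ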
Sample Mean vs. Sample Median

Let Y1, Y2, . . . , Yn ~ i.i.d. N(θ, σ²); the estimand is θ.

Sample mean: Ȳ ∼ N(θ, σ²/n)

Sample median: Mn ·∼ N(θ, (π/2)(σ²/n)) by the asymptotic distribution of sample quantiles

The sample mean is the more efficient estimator, as it has lower variance; but in cases where the Normality assumption is wrong (e.g. Cauchy data), the sample median may be more robust.
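An illustrative simulation (added): under Normality the variance ratio should approach π/2 ≈ 1.57:

    import numpy as np

    rng = np.random.default_rng(5)
    n, reps = 201, 20000
    y = rng.normal(0.0, 1.0, size=(reps, n))

    means = y.mean(axis=1)
    medians = np.median(y, axis=1)
    print(medians.var() / means.var())   # ≈ π/2 ≈ 1.571 for large n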
Sufficient Statistic and MLE in an NEF

The PMF/PDF of an NEF can be written as fθ(y) = e^(θy − ψ(θ)) h(y), so the joint likelihood satisfies:

L(θ) ∝ e^(θ ΣYj − nψ(θ))

ℓ(θ) = θ Σ_{j=1}^{n} Yj − nψ(θ)

s(θ) = Σ_{j=1}^{n} Yj − nψ′(θ) = 0

(1/n) Σ_{j=1}^{n} Yj = ψ′(θ) = E(Y)

µ̂_MLE = Ȳ

So, Ȳ is a sufficient statistic.

Generic Method of Moments

Let Y1, Y2, . . . , Yn be i.i.d. with mean θ and variance σ².
1. θ = E(Y)
2. θ̂_MoM = Ȳ
Evaluation:
• Bias(θ̂) = 0 by linearity
• Var(θ̂) = σ²/n
• θ̂ ·∼ N(θ, σ²/n) for large n, by the CLT
• θ̂ →p θ by the LLN
Confidence Interval for Mean/Variance in Normal

Let Y1, Y2, . . . , Yn ~ i.i.d. N(µ, σ²), with µ and σ² both unknown.

Suppose we use the unbiased sample variance σ̂² = (1/(n − 1)) Σ(Yj − Ȳ)².

(n − 1)σ̂²/σ² ∼ χ²_{n−1}

So, the 95% confidence interval for σ² is:

[ σ̂²(n − 1)/Q1(0.975), σ̂²(n − 1)/Q1(0.025) ]

where Q1(p) is the quantile function for χ²_{n−1}, or Gamma((n − 1)/2, 1/2).

Now, suppose we use the sample mean µ̂ = Ȳ.

Ȳ − µ ∼ (σ/√n) N(0, 1)        (n − 1)σ̂²/σ² ∼ χ²_{n−1}

So, we can use the pivot:

(Ȳ − µ)/√(σ̂²/n) = [√n(Ȳ − µ)/σ] / √(σ̂²/σ²) = Z/√(χ²_{n−1}/(n − 1)) ∼ t_{n−1}

So, the 95% confidence interval for µ is:

[ Ȳ − (σ̂/√n) Q2(0.975), Ȳ − (σ̂/√n) Q2(0.025) ]

where Q2(p) is the quantile function for the Student-t distribution with parameter n − 1.
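The same intervals computed with scipy (added); the data are simulated with arbitrary true parameters:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    y = rng.normal(5.0, 3.0, size=40)
    n = len(y)

    s2 = y.var(ddof=1)                       # unbiased sample variance
    q_chi2 = stats.chi2.ppf([0.975, 0.025], df=n - 1)
    print((n - 1) * s2 / q_chi2)             # 95% CI for σ²

    q_t = stats.t.ppf([0.975, 0.025], df=n - 1)
    print(y.mean() - np.sqrt(s2 / n) * q_t)  # 95% CI for µ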
Censored Data

Suppose there are n = 30 devices. They are observed for 7 months, at which point 21 have failed while 9 still work. Assume each device's lifetime is Yj ~ i.i.d. Expo(λ) and the estimand is µ = 1/λ.

For each observation:

Lj(λ) = f(yj) if the failure is observed, and 1 − F(7) if not observed

L(λ) = [ Π_{j=1}^{21} λe^(−λtj) ] (e^(−7λ))⁹ = λ²¹ e^(−21λt̄) e^(−63λ)

where t1, . . . , t21 are the observed failure times with mean t̄.

ℓ(λ) = 21 log(λ) − 21λt̄ − 63λ

s(λ) = 21/λ − 21t̄ − 63 = 0

λ̂_MLE = 1/(t̄ + 3)

µ̂_MLE = t̄ + 3, by invariance.
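A numerical sketch (added). The true λ and the general form λ̂ = (# failures)/(total time on test) are assumptions for illustration; with 21 failures and 9 censored devices that form reduces to the closed form above:

    import numpy as np

    rng = np.random.default_rng(7)
    lam_true, n, cutoff = 0.25, 30, 7.0
    lifetimes = rng.exponential(1 / lam_true, size=n)

    observed = lifetimes[lifetimes <= cutoff]     # failures seen by 7 months
    n_obs = len(observed)

    # MLE for censored Expo data: failures over total observed time,
    # counting each censored device as `cutoff` months on test.
    total_time = observed.sum() + (n - n_obs) * cutoff
    lam_hat = n_obs / total_time
    print(lam_hat, 1 / lam_hat)                   # λ̂_MLE and µ̂_MLE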
Asymptotics Theorems

Let Y1, Y2, . . . , Yn be i.i.d. with mean µ ≠ 0 and variance σ². Suppose a variable of interest is T = √n(Ȳ − µ)/(Ȳ³ + µ³). Find its asymptotic distribution.

Numerator: √n(Ȳ − µ) →d σZ where Z ∼ N(0, 1), by the CLT

Denominator: Ȳ →p µ by the LLN, then Ȳ³ →p µ³ by the CMT, so Ȳ³ + µ³ →p 2µ³

Combining the numerator and denominator using Slutsky's Theorem:

T →d σZ/(2µ³) = N(0, σ²/(4µ⁶))
Fisher Information Equality

Let Y1, Y2, . . . , Yn ~ i.i.d. Geom(p). Find IY(p).

L1(p) = p(1 − p)^(Y1)

ℓ1(p) = log(p) + Y1 log(1 − p)

s1(p) = 1/p − Y1/(1 − p)

1. IY(p) = nI1(p) = n Var(1/p − Y1/(1 − p)) = n Var(Y1)/(1 − p)²
         = n [(1 − p)/p²]/(1 − p)² = n/(p²(1 − p))

2. IY(p) = nI1(p) = −nE(s1′(p)) = −nE(−1/p² − Y1/(1 − p)²)
         = n/p² + n E(Y1)/(1 − p)² = n/p² + n/(p(1 − p)) = n/(p²(1 − p))

Both routes give the same answer, as guaranteed by the Fisher information equality.
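A simulation check (added) that Var(s1(p)) matches I1(p) = 1/(p²(1 − p)) for one assumed p:

    import numpy as np

    rng = np.random.default_rng(8)
    p = 0.3
    # numpy's geometric counts trials starting at 1; subtract 1 so that
    # Y1 counts failures, matching the Geom(p) convention used here.
    y1 = rng.geometric(p, size=10**6) - 1

    score = 1 / p - y1 / (1 - p)              # s1(p), single-observation score
    print(score.var(), 1 / (p**2 * (1 - p)))  # both ≈ 15.87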
Poisson Rao-Blackwellization

Let Y1, Y2, . . . , Yn ~ i.i.d. Pois(λ). The sufficient statistic is T = ΣYj. Suppose we use the unbiased estimator λ̂ = Y1.

λ̂_RB = E(λ̂ | T) = E(Y1 | T)

We can think of the data as a Poisson process with a rate of λ and a total of T arrivals. We can split the timeline into disjoint intervals of length 1, so the number of arrivals in each interval is distributed as Pois(λ). Since we know that there are a total of T arrivals and per-interval arrivals are i.i.d., we can view the distribution of arrivals that fall within the first interval as Y1 | T ∼ Bin(T, 1/n).

λ̂_RB = E(Y1 | T) = T/n = ΣYj/n = Ȳ
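A simulation (added) showing the variance reduction from Rao-Blackwellization; λ and n are arbitrary:

    import numpy as np

    rng = np.random.default_rng(9)
    lam, n, reps = 3.0, 25, 20000
    y = rng.poisson(lam, size=(reps, n))

    crude = y[:, 0]            # λ̂ = Y1, unbiased but noisy
    rb = y.mean(axis=1)        # λ̂_RB = Ȳ = E(Y1 | T)

    print(crude.mean(), rb.mean())   # both ≈ λ = 3 (unbiased)
    print(crude.var(), rb.var())     # ≈ λ = 3 vs. ≈ λ/n = 0.12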
Log-Normal MoM

Let Y1, Y2, . . . , Yn ~ i.i.d. Log-Normal(µ, σ²). Use two methods to find Method of Moments estimators for (µ, σ²).

Method 1:
1. Let Xj = log(Yj) for all j ∈ {1, 2, . . . , n} ⟹ Xj ∼ N(µ, σ²)
2. µ = E(X) and σ² = E(X²) − (E(X))²
3. µ̂ = X̄ and σ̂² = (1/n) ΣXj² − X̄² = (1/n) Σ(Xj − X̄)²

Method 2:
1. E(Y) = exp(µ + σ²/2) and E(Y²) = exp(2µ + 2σ²)
2. Let M = (1/n) ΣYj²
3. Ȳ = exp(µ̂ + σ̂²/2) and M = exp(2µ̂ + 2σ̂²), so log Ȳ = µ̂ + σ̂²/2 and log M = 2µ̂ + 2σ̂²
4. σ̂² = log M − 2 log Ȳ and µ̂ = 2 log Ȳ − (1/2) log M

Log-Normal Sample Median

Now suppose the estimator is the sample median ψ̂ of the Yj. Plugging the median ψ = e^µ (which holds by symmetry of log(Y)) into the asymptotic distribution of sample quantiles:

√n(ψ̂ − ψ) →d N(0, 2πσ²ψ²/4)

Then, for large n:

ψ̂ ·∼ N(ψ, πσ²ψ²/(2n))

(1) What are the bias and standard error for large n when the estimand is the mean θ = E(Y1)?

Bias(ψ̂) ≈ ψ − θ

SE(ψ̂) = √Var(ψ̂) ≈ σψ √(π/(2n))

(2) What are the bias and standard error for large n when the estimand is ψ?

By the approximate distribution for ψ̂ found previously:

Bias(ψ̂) ≈ ψ − ψ = 0

SE(ψ̂) ≈ σψ √(π/(2n))
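Both MoM routes in code (added), with assumed true (µ, σ) = (1, 0.5):

    import numpy as np

    rng = np.random.default_rng(10)
    mu, sigma = 1.0, 0.5
    y = rng.lognormal(mean=mu, sigma=sigma, size=10**6)

    # Method 1: work on the log scale.
    x = np.log(y)
    print(x.mean(), x.var())                      # ≈ (µ, σ²) = (1, 0.25)

    # Method 2: match E(Y) and E(Y²) on the original scale.
    m = (y**2).mean()
    sigma2_hat = np.log(m) - 2 * np.log(y.mean())
    mu_hat = 2 * np.log(y.mean()) - 0.5 * np.log(m)
    print(mu_hat, sigma2_hat)                     # also ≈ (1, 0.25)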
Asymptotics for MoM Estimator

Suppose we have data in the form of triplets (Xj, Yj, Zj) for j = 1, 2, . . . , n. The estimand is β = E(YZ)/E(XZ).

We can use the method of moments estimator β̂ = ΣYjZj / ΣXjZj.

(1) Show that β̂ is asymptotically unbiased.

Let Uj = Yj − βXj, so that YjZj = UjZj + βXjZj. Then:

E(UZ) = E(YZ) − βE(XZ) = E(YZ) − [E(YZ)/E(XZ)] E(XZ) = E(YZ) − E(YZ) = 0

β̂ = ΣYjZj/ΣXjZj = (ΣUjZj + β ΣXjZj)/ΣXjZj = β + ΣUjZj/ΣXjZj

Then,

√n(β̂ − β) = [(1/√n) ΣUjZj] / [(1/n) ΣXjZj]

Numerator: (1/√n) ΣUjZj = √n[(1/n) ΣUjZj − E(UZ)] →d N(0, Var(UZ)) by the CLT, where

Var(UZ) = E((UZ)²) − E(UZ)² = E((UZ)²)

Denominator: (1/n) ΣXjZj →p E(XZ) by the LLN

√n(β̂ − β) →d N(0, E((UZ)²)/E(XZ)²) by Slutsky's
MoM for Neil's Commute Problem

Let there be two different routes X and Y, and let X1, . . . , Xn and Y1, . . . , Yn be independent commute times with the Xj's i.i.d. and the Yj's i.i.d. We want to compare the commute times by looking at the ratio of expected commute times beyond 40 minutes. The estimand is:

θ = E[(Y1 − 40)Ind(Y1 > 40)] / E[(X1 − 40)Ind(X1 > 40)]

Note: with A = {Y1 > 40}, the numerator can also be written as E[(Y1 − 40)Ind(Y1 > 40)] = E(Y1 − 40 | A)P(A), by Adam's law.

Find an MoM estimator for θ and show that it is consistent.

Let ν = E[(Y1 − 40)Ind(Y1 > 40)]. Applying the MoM principle,

ν̂ = (1/n) Σ_{j=1}^{n} (Yj − 40)Ind(Yj > 40)

Similarly, let η = E[(X1 − 40)Ind(X1 > 40)], with

η̂ = (1/n) Σ_{j=1}^{n} (Xj − 40)Ind(Xj > 40)

θ̂ = ν̂/η̂ = Σ_{j=1}^{n} (Yj − 40)Ind(Yj > 40) / Σ_{j=1}^{n} (Xj − 40)Ind(Xj > 40)

For consistency, we want to show that θ̂ converges in probability to θ:

ν̂ →p ν and η̂ →p η by the LLN

θ̂ = ν̂/η̂ →p θ by Slutsky's
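A direct computation of θ̂ (added); the Gamma commute-time models are made-up placeholders:

    import numpy as np

    rng = np.random.default_rng(11)
    n = 10**5
    x = rng.gamma(20.0, 2.0, n)       # route X times, mean 40 (assumed model)
    y = rng.gamma(22.0, 2.0, n)       # route Y times, mean 44 (assumed model)

    nu_hat = np.mean((y - 40) * (y > 40))
    eta_hat = np.mean((x - 40) * (x > 40))
    print(nu_hat / eta_hat)           # θ̂, the plug-in ratio estimator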
MLE and MoM for Gaussian Linear Regression

Let the data be pairs (Xj, Yj) such that Yj | Xj ∼ N(θXj, σ²(Xj)), with σ²(x) known. Note that under homoskedasticity, σ²(x) is a constant σ².

Maximum Likelihood Estimate (under homoskedasticity):

ℓ(θ) = −(1/(2σ²)) Σ_{j=1}^{n} (yj − θxj)²

s(θ) = (1/σ²) Σ_{j=1}^{n} xj(yj − θxj) = 0

θ̂ = ΣXjYj / ΣXj²

Properties:
• E(θ̂ | X) = (1/Σxj²) E(Σxj Yj | X) = (1/Σxj²) Σxj E(Yj | Xj) = (1/Σxj²) Σxj θxj = θ ⟹ Unbiased
• Var(θ̂ | X) = (1/(Σxj²)²) Var(Σxj Yj | X) = (1/(Σxj²)²) Σxj² Var(Yj | Xj) = (1/(Σxj²)²) Σxj² σ²(xj)
  ⋆ Simplifies to σ²/Σxj² (the CRLB) under homoskedasticity
• Robust Variance: since E(σ²(Xj) − (Yj − θXj)² | Xj) = 0, use (Yj − θXj)² as an unbiased estimator for σ²(Xj). Then, the robust variance is (1/(Σxj²)²) Σxj²(yj − θxj)²

Method of Moments 1 (Gauss's Estimator):

E(XY) = E(E(XY | X)) = E(X E(Y | X)) = E(XθX) = θE(X²)

So θ = E(XY)/E(X²), and the MoM estimator θ̂ = ΣXjYj/ΣXj² coincides with the MLE above.
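A sketch (added) computing θ̂ and a robust standard error on simulated heteroskedastic data; in practice θ̂ is plugged in for θ in the residuals:

    import numpy as np

    rng = np.random.default_rng(12)
    n, theta = 400, 2.0
    x = rng.uniform(1.0, 3.0, n)
    sigma2_x = 0.5 * x**2                      # assumed heteroskedastic variance
    y = theta * x + rng.normal(0.0, np.sqrt(sigma2_x))

    theta_hat = (x * y).sum() / (x**2).sum()   # MLE = Gauss's MoM estimator

    # Robust variance, using squared residuals in place of σ²(xj).
    resid2 = (y - theta_hat * x) ** 2
    var_robust = (x**2 * resid2).sum() / (x**2).sum() ** 2
    print(theta_hat, np.sqrt(var_robust))      # estimate and its standard error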