4 Estimation
• Bias-variance decomposition/“tradeoff”: MSE = Variance + Bias²
• If two estimators T1 and T2 have the same MSE, then if one estimator (say, T1) has a smaller bias magnitude, it (i.e., T1) must have a larger variance
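A quick numerical check of the decomposition. This is a sketch, not part of the slides: the deliberately biased estimator 0.9·x̄ for the mean of N(θ, 1), and all constants and the seed, are illustrative assumptions.

```python
import numpy as np

# Check MSE = Variance + Bias^2 for the (deliberately biased) estimator
# T = 0.9 * sample mean, applied to N(theta, 1) data; constants are illustrative.
rng = np.random.default_rng(0)
theta, n, trials = 2.0, 10, 200_000
estimates = 0.9 * rng.normal(theta, 1.0, size=(trials, n)).mean(axis=1)

mse = np.mean((estimates - theta) ** 2)
var = np.var(estimates)                 # population variance (ddof=0)
bias = np.mean(estimates) - theta       # true bias is 0.9*theta - theta = -0.2
assert abs(mse - (var + bias**2)) < 1e-9   # the identity holds exactly in-sample
```

The identity is exact for the empirical moments (with the 1/N variance convention), not only in expectation, which is why the tolerance can be tiny.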
Estimator Mean, Variance, Bias
• Let X1, …, XN be a sample on a random variable X with PDF/PMF P(X; θ)
• Let T(X1, …, XN) be an estimator for a parameter whose true value is θ
• Consistent estimator (definition)
• Estimator TN = T(X1, …, XN) is consistent if ∀ε > 0, lim_{N→∞} P(|TN − θ| ≥ ε) = 0
• Thus, TN is said to “converge in probability” to θ
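The definition can be probed empirically. A sketch, assuming T_N is the sample mean of Uniform(0, 1) draws (so θ = 0.5); ε, the trial count, and the seed are illustrative choices.

```python
import numpy as np

# Monte Carlo look at consistency: P(|T_N - theta| >= eps) for T_N = sample mean
# of Uniform(0, 1) data (theta = 0.5); eps and sizes are illustrative.
rng = np.random.default_rng(1)
eps, trials = 0.05, 5_000

def prob_deviation(N):
    """Monte Carlo estimate of P(|T_N - theta| >= eps) at sample size N."""
    means = rng.random((trials, N)).mean(axis=1)
    return np.mean(np.abs(means - 0.5) >= eps)

probs = [prob_deviation(N) for N in (10, 100, 1000)]
# The deviation probability shrinks toward 0 as N grows.
assert probs[0] > probs[1] > probs[2]
```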
• ML estimates
• For location parameter: sample median
• For scale parameter: mean/average absolute deviation (MAD/AAD) from the median
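A minimal sketch of these two estimates, assuming the slide refers to the Laplace (double-exponential) distribution, for which median and mean absolute deviation from the median are indeed the MLEs; the true parameter values and seed are illustrative.

```python
import numpy as np

# Median and MAD-from-median as location/scale estimates on Laplace data;
# the true (loc, scale) = (3, 2) below are assumed for illustration.
rng = np.random.default_rng(2)
loc, scale = 3.0, 2.0
x = rng.laplace(loc, scale, size=100_000)

loc_hat = np.median(x)                       # location estimate
scale_hat = np.mean(np.abs(x - loc_hat))     # mean absolute deviation from median
assert abs(loc_hat - loc) < 0.05 and abs(scale_hat - scale) < 0.05
```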
MLE for Uniform Distribution (Continuous)
• Parameters are: lower limit ‘a’ and upper limit ‘b’ (a < b)
• Support of PDF depends on parameters
• Let data from U(a,b) be {x1, …, xN}, sorted in increasing order, & x1 < xN
• What are ML estimates ?
• First, data must lie within [a,b]
• a ≤ x1 , else likelihood function = 0
• b ≥ xN , else likelihood function = 0
• Likelihood function L(a,b; {x1, …, xN}) := (1/(b–a))^N
• Log-likelihood function log L(a,b; {x1, …, xN}) = –N·log(b–a)
• Partial derivative w.r.t. ‘a’ is N/(b–a) > 0, so the log-likelihood increases in ‘a’
• Partial derivative w.r.t. ‘b’ is –N/(b–a) < 0, so the log-likelihood decreases in ‘b’
• Thus, L(a,b) is maximum at the boundary of the feasible region: a = x1 and b = xN
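A minimal sketch of these estimates on simulated data; the true limits a = 1, b = 4 and the seed are assumed for illustration.

```python
import numpy as np

# Uniform(a, b) MLE: a_hat = min of the data, b_hat = max of the data.
# True limits below are illustrative assumptions.
rng = np.random.default_rng(3)
a, b = 1.0, 4.0
x = rng.uniform(a, b, size=1000)

a_hat, b_hat = x.min(), x.max()
assert a <= a_hat <= b_hat <= b          # estimates always lie inside the support
assert (a_hat - a) < 0.05 and (b - b_hat) < 0.05   # close to the truth for N = 1000
```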
MLE for Uniform Distribution (Continuous)
• Analysis of consistency
• Recall: estimator TN = T(X1, …, XN) is consistent if ∀ε > 0, lim_{N→∞} P(|TN − θ| ≥ ε) = 0
• For estimator of ‘b’: ∀ε > 0 with ε < (b–a), consider P(b – max_{i=1,…,N} xi ≥ ε)
= P(b – x1 ≥ ε) P(b – x2 ≥ ε) ⋯ P(b – xN ≥ ε)
= P(x1 ≤ b – ε) ⋯ P(xN ≤ b – ε) = ((b – ε – a)/(b – a))^N
which → 0 as N → ∞
• For estimator of ‘a’: ∀ε > 0 with ε < (b–a), consider P(min_{i=1,…,N} xi – a ≥ ε)
= P(x1 ≥ a + ε) P(x2 ≥ a + ε) ⋯ P(xN ≥ a + ε)
= (1 – P(x1 ≤ a + ε)) ⋯ (1 – P(xN ≤ a + ε)) = (1 – ε/(b – a))^N = ((b – a – ε)/(b – a))^N
which → 0 as N → ∞
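The closed form P(b − max xi ≥ ε) = ((b − a − ε)/(b − a))^N can be checked by simulation; the interval, ε, and trial count below are illustrative assumptions.

```python
import numpy as np

# Compare the empirical tail probability of (b - max) with the derived
# closed form ((b - a - eps)/(b - a))^N; all constants are illustrative.
rng = np.random.default_rng(4)
a, b, eps, trials = 0.0, 1.0, 0.1, 200_000

for N in (5, 20):
    maxima = rng.uniform(a, b, size=(trials, N)).max(axis=1)
    empirical = np.mean(b - maxima >= eps)
    theoretical = ((b - a - eps) / (b - a)) ** N   # geometric decay in N
    assert abs(empirical - theoretical) < 0.01
```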
MLE for Uniform Distribution (Continuous)
• Analysis of bias: Bias(T) := E[T] – θ
• Without loss of generality, let a ≥ 0 (shifted random variable)
• For a non-negative random variable, apply the tail-sum formula:
E[max_{i=1,…,N} xi] = ∫_{t=0}^{∞} (1 – P(max_{i=1,…,N} xi ≤ t)) dt
= ∫_{t=0}^{a} 1 dt + ∫_{t=a}^{b} (1 – P(max_{i=1,…,N} xi ≤ t)) dt + ∫_{t=b}^{∞} (1 – 1) dt
= a + ∫_{t=a}^{b} (1 – ((t – a)/(b – a))^N) dt
= a + (b – a) – (b – a)/(N + 1) = b – (b – a)/(N + 1) (check that this makes sense for N = 1)
• So Bias(max xi) = –(b – a)/(N + 1): the estimator of ‘b’ is biased low, but the bias → 0 as N → ∞
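The tail-sum result E[max xi] = b − (b − a)/(N + 1) can also be checked by simulation; the interval, N, and trial count are illustrative assumptions.

```python
import numpy as np

# Empirical mean of the sample maximum vs. the derived b - (b - a)/(N + 1);
# constants below are illustrative.
rng = np.random.default_rng(5)
a, b, N, trials = 2.0, 5.0, 9, 200_000

maxima = rng.uniform(a, b, size=(trials, N)).max(axis=1)
expected = b - (b - a) / (N + 1)       # = 5 - 3/10 = 4.7 for these constants
assert abs(maxima.mean() - expected) < 0.01   # max underestimates b, as derived
```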
Linear Regression
• Given: Data {(xi, yi)}, i = 1, …, N
• Linear Model: Yi = α_true + β_true Xi + ηi, where errors ηi (in measuring Yi; not Xi) are zero-mean i.i.d. Gaussian random variables
• Goal: Estimate α_true, β_true
• Log-likelihood function
• L(α, β; {(xi, yi)}) = log ∏_i G(yi; α + β xi, σ²)
• Partial derivative w.r.t. α equal to 0 implies: α = ȳ – β x̄ (bar denotes sample mean)
• Partial derivative w.r.t. β equal to 0 implies: Σ_i (yi – α – β xi) xi = 0
• Substituting the expression for α gives:
β = Σ_i (yi – ȳ) xi / Σ_i (xi – x̄) xi = ((1/N)Σ_i xi yi – x̄ ȳ) / ((1/N)Σ_i xi² – x̄²) = SampleCov(X,Y) / SampleVar(X)
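A sketch of these closed-form estimates, β = SampleCov(X,Y)/SampleVar(X) and α = ȳ − βx̄, on synthetic data; the true parameter values, noise level, and seed are illustrative assumptions.

```python
import numpy as np

# Closed-form linear-regression MLE on synthetic data; the true
# (alpha, beta, sigma) below are assumed for illustration.
rng = np.random.default_rng(6)
alpha_true, beta_true, sigma = 1.0, 2.5, 0.5
x = rng.uniform(0, 10, size=5000)
y = alpha_true + beta_true * x + rng.normal(0, sigma, size=x.size)

beta_hat = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)
alpha_hat = y.mean() - beta_hat * x.mean()
assert abs(beta_hat - beta_true) < 0.02 and abs(alpha_hat - alpha_true) < 0.1
```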
Linear Regression
• Population values: slope m := Cov(X,Y) / Var(X); intercept c := E[Y] – (Cov(X,Y) / Var(X))·E[X]
• Analysis of estimates
• Slope β = SampleCov(X,Y) / SampleVar(X)
• Unbiased (see next slide)
(ratio of sample-covariance and sample-variance is the same with/without Bessel’s correction)
• Can be shown to be consistent (see next slide)
• Intercept α = ȳ – β x̄
• We already know that ȳ and x̄ are unbiased and consistent estimators of E[Y] and E[X]
• Unbiased, if β is unbiased
• Can be shown to be consistent, if β is consistent
Linear Regression
• β = (1/N)Σ_i (xi – x̄)(yi – ȳ) / SampleVar(X)
= [(1/N)Σ_i (xi – x̄) yi – (1/N)Σ_i (xi – x̄) ȳ] / SampleVar(X)
= (1/N)Σ_i (xi – x̄) yi / SampleVar(X) (since Σ_i (xi – x̄) = 0)
• But, as per the model, yi = α_true + β_true xi + ηi. Substituting yi gives:
• β = (1/N)Σ_i (xi – x̄)(α_true + β_true xi + ηi) / SampleVar(X) = (1/N)Σ_i (xi – x̄)(β_true xi + ηi) / SampleVar(X)
• = [β_true (1/N)Σ_i (xi – x̄)(xi – x̄) + β_true (1/N)Σ_i (xi – x̄) x̄ + (1/N)Σ_i (xi – x̄) ηi] / SampleVar(X)
• = β_true + Σ_i (xi – x̄) ηi / (N·SampleVar(X))
• So, E[β] = β_true, because E[ηi] = 0. So, unbiased.
• Var(β) = Σ_i (xi – x̄)² Var(ηi) / (N·SampleVar(X))² = N·SampleVar(X)·σ² / (N²·SampleVar(X)²) = σ² / (N·SampleVar(X))
• So, consistent (using Chebyshev’s inequality), since Var(β) → 0 as N → ∞
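Both results, E[β] = β_true and Var(β) = σ²/(N·SampleVar(X)), can be checked by redrawing only the noise for one fixed design; the constants and seed below are illustrative assumptions.

```python
import numpy as np

# Distribution of the slope estimate over many noise draws, fixed design x;
# beta_true, sigma, N, and the intercept 0.5 are illustrative assumptions.
rng = np.random.default_rng(7)
beta_true, sigma, N, trials = 2.0, 1.0, 50, 50_000
x = rng.uniform(0, 1, size=N)          # one fixed design, reused across trials
xc = x - x.mean()

noise = rng.normal(0, sigma, size=(trials, N))
y = 0.5 + beta_true * x + noise
# Per-trial slope estimate: SampleCov(x, y) / SampleVar(x)
beta_hats = ((y - y.mean(axis=1, keepdims=True)) @ xc) / (xc @ xc)

assert abs(beta_hats.mean() - beta_true) < 0.02                   # unbiased
assert abs(beta_hats.var() - sigma**2 / (N * np.var(x))) < 0.02   # variance formula
```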
Linear Regression
• Interpretation of estimates
• Line passes through (x̄, ȳ)
• If x := x̄, then y = α + β x̄ = (ȳ – β x̄) + β x̄ = ȳ
• “Residuals” ηi sum to 0
• Σ_i ηi = Σ_i (yi – α – β xi) = N ȳ – N (ȳ – β x̄) – β N x̄ = 0
• Slope β = SampleCov(X,Y) / SampleVar(X)
• “Centering” data
• Weighted average of the “slope” (yi – ȳ)/(xi – x̄) for specific points
• Larger weight for datum (xi, yi) if its xi coordinate is farther from the center x̄
• Weights are non-negative and sum to 1 (convex combination)
• Intercept α = ȳ – β x̄
• From the center (x̄, ȳ), the line with estimated slope β intersects the ‘y’ axis at ȳ – β x̄
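The convex-combination view of the slope can be verified numerically: with weights wi = (xi − x̄)²/Σ_j (xj − x̄)², the weighted average of the per-point slopes equals β exactly. The synthetic data and seed are illustrative assumptions.

```python
import numpy as np

# Slope as a convex combination of per-point slopes (y_i - ybar)/(x_i - xbar),
# weighted by (x_i - xbar)^2; synthetic data below is illustrative.
rng = np.random.default_rng(8)
x = rng.uniform(0, 5, size=200)
y = 1.0 + 3.0 * x + rng.normal(0, 1, size=x.size)

xc, yc = x - x.mean(), y - y.mean()
beta_hat = (xc @ yc) / (xc @ xc)

w = xc**2 / np.sum(xc**2)              # non-negative weights summing to 1
slopes = yc / xc                       # per-point slope relative to the center
assert np.all(w >= 0) and abs(w.sum() - 1.0) < 1e-12
assert abs(w @ slopes - beta_hat) < 1e-9   # weighted average recovers beta
```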
Linear Regression
• Effect of outliers: least-squares estimates are sensitive to outliers, because squared errors give large deviations heavy weight
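A small sketch of this effect: corrupting a single point far from the center x̄ (where the convex-combination weight is largest) shifts the least-squares slope noticeably. The data and outlier magnitude are illustrative assumptions.

```python
import numpy as np

# One gross outlier at the far-right x pulls the least-squares slope away
# from the true value 2.0; constants are illustrative.
rng = np.random.default_rng(9)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=x.size)

def slope(x, y):
    """Least-squares slope: SampleCov(x, y) / SampleVar(x)."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)

clean = slope(x, y)
y_out = y.copy()
y_out[np.argmax(x)] -= 100.0           # corrupt the point with the largest x
assert abs(slope(x, y_out) - 2.0) > abs(clean - 2.0)   # estimate degrades
```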
A Poem on MLE
• https://www.math.utep.edu/faculty/lesser/MLE.html
On Preparation for Events (Exams) in Life
• From the Iron Man
• “I don’t really prepare for anything like an event.”
• “The goal is to be at a certain level of fitness.”
• “I should be able to run a full marathon whenever I want.”
• “That is the constant level of fitness that I aspire to.”
• “I keep my fitness level as a goal, not an event as a goal.”
• “There is no such thing as a good shortcut.”
• “If you want to be healthy,
and you want to be fit,
and you want to be happy,
you have to work hard.”
• https://youtu.be/x_96xVfdzu0?t=303