
Parametric Inference

Math & Stat for Data Science

Graduate School of Data Science


Seoul National University
Parametric Inference
• We consider the following model:
X1, …, Xn ~ f(x; θ), θ ∈ Θ

• Θ ⊂ R^k : parameter space
• θ = (θ1, …, θk) : parameter

• Inference now amounts to estimating the parameter θ


• If the parametric assumption is wrong, parametric
inference can be inaccurate
Parametric Inference
• But parametric inference has many strengths
• Computationally tractable
• Can provide analytical solutions

• Can provide more efficient estimators (ex. lower
standard error)

• Parameters can provide a direct interpretation

• Ex. how much disease risk increases with a genetic variant
Parameter of Interest
• Usually we are interested in a subset of the
parameters, or in some function of the parameters
• Parameter of interest: T(θ)

• The other parameters are called nuisance parameters

• Ex. Normal(𝜇,𝜎)
• Usually interested in mean
• 𝜇 is the parameter of interest
• 𝜎 is the nuisance parameter
Estimation
• Suppose we want to estimate (𝜇,𝜎)
• Assume X1, …, Xn ~ N(𝜇,𝜎)

• There can be numerous ways to estimate the
parameters…

• The likelihood-based approach is the most commonly used.


Maximum Likelihood
• Likelihood function: the joint density of the data,
Ln(θ) = ∏ f(Xi; θ), product over i = 1, …, n

• But it is viewed as a function of the parameter, not of the data!
• It represents how likely each parameter value is, given the observed data.
Maximum Likelihood Estimation (MLE)

• Find the parameter values under which the observed
data are most likely
• Very intuitive idea
• Note that maximizing the likelihood is the same as
maximizing the log-likelihood, shown in symbols below

• Multiplicative constants in the likelihood do not affect the MLE
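In symbols (a standard formulation using the notation above; the display is mine, not copied from the slides):

```latex
\hat{\theta}_n = \arg\max_{\theta \in \Theta} \mathcal{L}_n(\theta)
              = \arg\max_{\theta \in \Theta} \ell_n(\theta),
\qquad
\ell_n(\theta) = \log \mathcal{L}_n(\theta) = \sum_{i=1}^{n} \log f(X_i; \theta).
```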


MLE
• Example: X1,…, Xn ~ Bernoulli(𝑝). MLE of p?
[Figure: likelihood (left) and log-likelihood (right) as functions of p, for simulated data with p = 0.3 and n = 40. Both curves peak at the same point, the MLE = 0.285.]
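A minimal sketch of the computation behind such a figure; the seed and grid are my own illustrative choices, not from the slides:

```python
import numpy as np

# Simulate Bernoulli data with true p = 0.3 and n = 40, as in the figure.
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=40)

# Evaluate the log-likelihood (and likelihood) on a grid of p values.
p = np.linspace(0.001, 0.999, 999)
s, n = x.sum(), x.size
loglik = s * np.log(p) + (n - s) * np.log(1 - p)
lik = np.exp(loglik)

# Both curves peak at the same point; analytically the MLE is the sample mean.
print("grid maximizer:", p[np.argmax(loglik)], " sample mean:", x.mean())
```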
MLE
• Example: X1,…, Xn ~ Poisson(λ). MLE of λ?
[Figure: likelihood as a function of λ, for simulated data with λ = 3 and n = 40; the maximum is at the MLE = 2.84.]
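A worked answer (a standard calculation, consistent with the figure):

```latex
\ell_n(\lambda) = \sum_{i=1}^{n} \left( X_i \log\lambda - \lambda - \log X_i! \right),
\qquad
\frac{\partial \ell_n(\lambda)}{\partial \lambda}
  = \frac{\sum_{i=1}^{n} X_i}{\lambda} - n = 0
\;\Rightarrow\;
\hat{\lambda} = \bar{X}.
```

With true λ = 3, the sample mean of the simulated data lands near 3, matching the MLE = 2.84 in the figure.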
MLE
• Example: X1,…, Xn ~ Normal(𝜇,𝜎). MLE of (𝜇,𝜎)?
[Figure: contour plot of the log-likelihood over (mu, sigma), for simulated data with mu = 1, sigma = 2, n = 300; the maximum is at the MLE mu = 1.047, sigma = 1.94.]
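The analytical solution, for reference (a standard result, not from the slides):

```latex
\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 .
```

Note that the MLE of σ² divides by n, not by the unbiased n − 1.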


Properties of Estimation
• Properties of an estimator?
• Bias

• Consistency

• Variance

• Distribution
Properties of MLE
• Consistent
• Equivariant (next slide)
• Asymptotically normal
• Asymptotically optimal (efficient)
Equivariance of MLE

• If τ = g(θ), then the MLE of τ is τ̂ = g(θ̂)
• Convenient for finding the MLE of a transformed parameter

• Ex. MLE of exp(θ)? By equivariance, it is exp(θ̂)
Score and Fisher information
Asymptotic Distribution
Score function and Fisher Information

• Score function: first derivative of log likelihood


function
• Fisher Information: variance of score function
Fisher Information

• Equivalently, the Fisher information (FI) is the negative of the
expected second derivative of the log-likelihood:
I(θ) = −E[ ∂² log f(X; θ) / ∂θ² ]
• Beware the notation: I(θ) represents the FI of a single
observation and In(θ) of n observations, where In(θ) = n I(θ)
Score and Fisher Information
• Example: X1,…, Xn ~ Poisson(λ). Find the score and
Fisher information of λ
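A worked solution (a standard calculation):

```latex
\log f(x;\lambda) = x\log\lambda - \lambda - \log x!,
\qquad
s(x;\lambda) = \frac{x}{\lambda} - 1,
```
```latex
I(\lambda) = \mathrm{Var}\!\left(\frac{X}{\lambda} - 1\right)
           = \frac{\mathrm{Var}(X)}{\lambda^2} = \frac{1}{\lambda},
\qquad
I_n(\lambda) = \frac{n}{\lambda}.
```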
Score and Fisher Information
• Example: X1,…, Xn ~ Normal(μ,σ). Suppose σ is
known. Find the score and Fisher information of μ
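A worked solution (a standard calculation):

```latex
\log f(x;\mu) = -\frac{(x-\mu)^2}{2\sigma^2} + \text{const},
\qquad
s(x;\mu) = \frac{x-\mu}{\sigma^2},
```
```latex
I(\mu) = \mathrm{Var}\!\left(\frac{X-\mu}{\sigma^2}\right) = \frac{1}{\sigma^2},
\qquad
I_n(\mu) = \frac{n}{\sigma^2}.
```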
Why is the second derivative information?
• The second derivative measures the curvature of the log-likelihood
around its peak
• High curvature: a sharply peaked likelihood, so the data pin down θ precisely
• Low curvature: a flat likelihood, so many values of θ explain the data almost equally well
Asymptotic Distribution

1. The MLE asymptotically follows a normal distribution

2. Its variance is the inverse of the Fisher information:
θ̂ ≈ N(θ, 1 / In(θ))
Asymptotic Confidence Interval
• An approximate (1 − α) confidence interval is
θ̂ ± z(α/2) · ŝe, where ŝe = 1 / √In(θ̂)

• P-values can also be calculated using asymptotic
normality.
Example
• Example: X1,…, Xn ~ Poisson(λ). Distribution of the MLE
of λ?
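A worked answer, combining the earlier results (a standard calculation):

```latex
\hat{\lambda} = \bar{X},
\qquad
I_n(\lambda) = \frac{n}{\lambda}
\;\Rightarrow\;
\hat{\lambda} \;\approx\; N\!\left(\lambda,\; \frac{\lambda}{n}\right).
```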
Example
• Example: X1,…, Xn ~ Normal(𝜇,𝜎). Suppose 𝜎 is
known. Distribution of MLE of 𝜇?
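A worked answer (a standard calculation):

```latex
\hat{\mu} = \bar{X},
\qquad
I_n(\mu) = \frac{n}{\sigma^2}
\;\Rightarrow\;
\hat{\mu} \;\approx\; N\!\left(\mu,\; \frac{\sigma^2}{n}\right),
```

which here is not just asymptotic: X̄ is exactly normal when the data are normal.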
Computing MLE
• Usually the MLE is found by solving for the point where the
score function equals 0
• Find θ which makes
∂ℓn(θ) / ∂θ = 0
• This works when the log-likelihood is a concave function

• It is possible that we cannot obtain an analytic
solution
• Need to use a numerical method
• Gradient descent, Newton–Raphson, etc. (see the sketch below)
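A minimal sketch of a numerical MLE, using a Gamma model whose shape parameter has no closed-form MLE; the data, seed, and use of scipy.optimize.minimize are my own illustrative choices, not from the slides:

```python
import numpy as np
from scipy import optimize, stats

# Simulate data from a Gamma(shape=2, scale=1.5) model.
rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=1.5, size=200)

# Negative log-likelihood; parameters are kept on the log scale
# so the optimizer can search over all of R^2.
def neg_loglik(params):
    shape, scale = np.exp(params)
    return -np.sum(stats.gamma.logpdf(x, a=shape, scale=scale))

# Numerically minimize the negative log-likelihood
# (equivalent to setting the score to zero at an interior optimum).
res = optimize.minimize(neg_loglik, x0=np.zeros(2), method="Nelder-Mead")
shape_hat, scale_hat = np.exp(res.x)
print("MLE:", shape_hat, scale_hat)
```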
Optimality and Delta method
Optimality
• Suppose X1,…, Xn ~ Normal(μ,σ)
• Two different estimators of μ:
• MLE (same as the sample mean)
• Median
Which one is optimal?

• The MLE satisfies
√n (X̄ − μ) ⇝ N(0, σ²)
• The median satisfies
√n (median − μ) ⇝ N(0, πσ²/2)
Optimality
• More generally, consider two different estimators,
Tn ≈ N(θ, t²/n) and Un ≈ N(θ, u²/n)

• Asymptotic relative efficiency:
• ARE(Un, Tn) = t² / u²
• In the normal case, ARE(Median, MLE) = 2/π ≈ 0.63
• The median effectively uses 63% of the data compared to the MLE
Optimality

• The MLE is the optimal estimator (under some regularity conditions):
no other estimator has a smaller asymptotic variance

• This is the reason why the MLE dominates parametric inference.
Delta method
• Suppose τ = g(θ), and g(θ) is a smooth function
(ex. exp(θ))
• Equivariance shows that the MLE of τ is τ̂ = g(θ̂)
• Distribution of τ̂ ?
Delta method
• If g is differentiable and g′(θ) ≠ 0, then
τ̂ ≈ N(τ, g′(θ̂)² / In(θ̂))
Delta method
• Example: X1,…, Xn ~ Bernoulli(p). MLE distribution
of log(p / (1 − p))?
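A worked solution (a standard calculation):

```latex
\hat{p} = \bar{X}, \qquad I_n(p) = \frac{n}{p(1-p)}, \qquad
g(p) = \log\frac{p}{1-p}, \quad g'(p) = \frac{1}{p(1-p)},
```
```latex
\hat{\tau} = \log\frac{\hat{p}}{1-\hat{p}}
\;\approx\;
N\!\left(\log\frac{p}{1-p},\; \frac{g'(p)^2}{I_n(p)}\right)
= N\!\left(\log\frac{p}{1-p},\; \frac{1}{n\,p(1-p)}\right).
```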
Multiparameter Models
Parametric Bootstrap
Multiparameter Models
• Now we consider multiple parameters:
• θ = (θ1, …, θk)
• MLE
• θ̂ = (θ̂1, …, θ̂k)
• Fisher Information: now a k × k matrix In(θ), with entries
[In(θ)]jl = −E[ ∂²ℓn(θ) / ∂θj ∂θl ]
Multiparameter Models
• θ̂ = (θ̂1, …, θ̂k) asymptotically follows a multivariate
normal distribution:
θ̂ ≈ N(θ, Jn)
• Note: Jn = In(θ)⁻¹
Multiparameter Models
• Example: Let X1,…, Xn ~ Normal(μ,σ) with unknown
σ. MLE distribution of (μ̂, σ̂)?
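A worked answer (a standard result):

```latex
I(\mu,\sigma) = \begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 2/\sigma^2 \end{pmatrix}
\;\Rightarrow\;
(\hat{\mu}, \hat{\sigma}) \;\approx\;
N\!\left( (\mu,\sigma),\; \mathrm{diag}\!\left(\frac{\sigma^2}{n},\; \frac{\sigma^2}{2n}\right) \right),
```

so μ̂ and σ̂ are asymptotically independent.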
Multiparameter Models
• Let τ = g(θ1, …, θk)
• Gradient: ∇g = (∂g/∂θ1, …, ∂g/∂θk)ᵀ
• Multiparameter delta method: τ̂ = g(θ̂) ≈ N(τ, ∇gᵀ Jn ∇g)
Parametric Bootstrap
• A resampling approach can be effective when
• the sample size is small, so asymptotics do not work well
• the distribution is difficult to calculate analytically

• In the previous (nonparametric) bootstrap, we simulated
X1*, …, Xn* from the empirical CDF
• Does not use any distributional assumption
• Often called the nonparametric bootstrap

• If we know the parametric form of the distribution, it
can be used in the bootstrap
Parametric Bootstrap
• Suppose
X1, …, Xn ~ f(x; θ)

• Estimate θ (using the MLE θ̂)

• Now treat f(x; θ̂) as if it were the true distribution
function

• Generate B bootstrap samples from it:
X1*, …, Xn* ~ f(x; θ̂)
• For each sample, calculate the statistic T(X1*, …, Xn*)
Parametric Bootstrap
• Example: Let x1, …, xn be observed and assumed to
follow Exponential(β = 1). Find the distribution of the MLE of
log(β) (a simulation sketch follows below)
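A minimal simulation sketch of the parametric bootstrap for this example, with β taken as the mean of the exponential so that the MLE is the sample mean; the sample size (n = 100, matching the figure below), seed, and B are my own choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Observed data: n draws from an Exponential with mean beta = 1.
n = 100
x = rng.exponential(scale=1.0, size=n)

# MLE of the mean parameter is the sample mean, so the
# plug-in estimate of log(beta) is log(x.mean()).
beta_hat = x.mean()

# Parametric bootstrap: resample from the *fitted* model f(x; beta_hat).
B = 2000
boot = rng.exponential(scale=beta_hat, size=(B, n))
log_beta_star = np.log(boot.mean(axis=1))

# Bootstrap standard error of log(beta_hat).
print("bootstrap SE of log(beta_hat):", log_beta_star.std(ddof=1))
```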
Parametric Bootstrap, when n=100
Parametric Bootstrap
• The parametric bootstrap can work better than the
asymptotic approach when the sample size is small

• In the previous example, if we reduce the sample
size to 3:
• Note: the true SD is around 0.625
Summary
• Parametric inference
• Estimate the parameter 𝜃
• Estimation methods
• Method of Moments
• Maximum Likelihood Estimator (MLE)
• MLE
• Properties
• Score function and Fisher Information
• Asymptotic distribution
• Delta method, Parametric Bootstrap
