Theoretical Statistics. Lecture 15: M-Estimators. Consistency of M-Estimators. Nonparametric Maximum Likelihood

The document discusses M-estimators and Z-estimators. M-estimators are defined by maximizing a criterion function based on a function $m_\theta$, while Z-estimators are defined as solutions to estimating equations based on a function $\psi_\theta$. Consistency of M-estimators and Z-estimators can be shown under conditions including a uniform law of large numbers and identifiability. Nonparametric maximum likelihood estimation involves estimating an unknown density $p_0$ by maximizing the log-likelihood over a family of densities $\mathcal{P}$. Hellinger consistency of the maximum likelihood estimator can be shown under appropriate conditions on $\mathcal{P}$ and the empirical process.

Theoretical Statistics. Lecture 15.

Peter Bartlett

M-Estimators.
Consistency of M-Estimators.
Nonparametric maximum likelihood.

1
M-estimators

Goal: estimate a parameter $\theta$ of the distribution $P$ of observations
$X_1, \ldots, X_n$.

Define a criterion $\theta \mapsto M_n(\theta)$ in terms of functions $m_\theta : \mathcal{X} \to \mathbb{R}$,

$$M_n(\theta) = P_n m_\theta.$$

The estimator $\hat\theta = \arg\max_\theta M_n(\theta)$ is called an M-estimator (M for
maximum).

Example: maximum likelihood uses

$$m_\theta(x) = \log p_\theta(x).$$
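As an illustration (not part of the original slides), here is a minimal numerical sketch of maximum likelihood as an M-estimator. The exponential model, the sample size, and the grid search are all assumptions made for the example.

```python
# Minimal sketch: maximum likelihood as an M-estimator.
# Model, sample size, and grid are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.0, size=500)   # X_1,...,X_n from p_theta0 with theta0 = 2

def M_n(theta):
    """Empirical criterion M_n(theta) = P_n m_theta, with m_theta(x) = log p_theta(x)
    for the exponential model p_theta(x) = theta * exp(-theta * x)."""
    return np.mean(np.log(theta) - theta * x)

grid = np.linspace(0.1, 10, 2000)
theta_hat = grid[np.argmax([M_n(t) for t in grid])]

print(theta_hat)        # grid maximizer of P_n m_theta
print(1 / x.mean())     # closed-form MLE; should be close to the grid maximizer
```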

2
Z-estimators

Can maximize by setting derivatives to zero:

$$\Psi_n(\theta) = P_n \psi_\theta = 0.$$

These are estimating equations. van der Vaart calls this a Z-estimator (Z
for zero), but it's often called an M-estimator (even if there's no
maximization).

Example: maximum likelihood:

$$\psi_\theta(x) = \nabla_\theta \log p_\theta(x).$$

3
M-estimators and Z-estimators

Of course, sometimes we cannot transform an M-estimator into a
Z-estimator. Example: $p_\theta = $ uniform on $[0, \theta]$ is not differentiable in $\theta$, and
there is no natural Z-estimator. The M-estimator chooses

$$\hat\theta = \arg\max_\theta P_n m_\theta
 = \arg\max_\theta P_n \log \frac{1[X \in [0, \theta]]}{\theta}
 = \max_i X_i.$$
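A quick numerical sanity check of this example (illustrative; the sample and the grid are arbitrary choices): the empirical criterion is maximized at $\max_i X_i$.

```python
# Sanity check: for the uniform[0, theta] model, the grid maximizer of
# P_n log( theta^{-1} 1[x in [0, theta]] ) sits at max_i X_i.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 3.0, size=200)

def M_n(theta):
    # log-likelihood is -infinity unless every observation lies in [0, theta]
    return -np.inf if x.max() > theta else -np.log(theta)

grid = np.linspace(0.01, 6, 5000)
theta_hat = grid[np.argmax([M_n(t) for t in grid])]
print(theta_hat, x.max())   # grid maximizer is the first grid point above max_i X_i
```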

4
M-estimators and Z-estimators: Examples

Mean:

$$m_\theta(x) = -(x - \theta)^2, \qquad \psi_\theta(x) = x - \theta.$$

Median:

$$m_\theta(x) = -|x - \theta|, \qquad \psi_\theta(x) = \mathrm{sign}(x - \theta).$$
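A small sketch of how these criteria can be used in practice (illustrative only; the heavy-tailed sample and the use of scipy's bounded scalar minimizer are assumptions of the example): minimizing $P_n (x - \theta)^2$ recovers the sample mean and minimizing $P_n |x - \theta|$ recovers the sample median.

```python
# The mean and median as M-estimators: maximize P_n m_theta,
# i.e. minimize the corresponding empirical loss.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.standard_t(df=3, size=1000) + 5.0       # heavy-tailed sample, location 5

mean_hat = minimize_scalar(lambda t: np.mean((x - t) ** 2),
                           bounds=(x.min(), x.max()), method="bounded").x
median_hat = minimize_scalar(lambda t: np.mean(np.abs(x - t)),
                             bounds=(x.min(), x.max()), method="bounded").x

print(mean_hat, x.mean())        # should agree
print(median_hat, np.median(x))  # should agree, up to optimizer tolerance
```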

5
M-estimators and Z-estimators: Examples

Huber: [picture: the Huber criterion $r_k$ and the clipped function $[\,\cdot\,]_{-k}^{k}$]

$$m_\theta(x) = -r_k(x - \theta),$$

$$r_k(x) =
\begin{cases}
\tfrac{1}{2}k^2 - k(x + k) & \text{if } x < -k,\\[2pt]
\tfrac{1}{2}x^2 & \text{if } |x| \le k,\\[2pt]
\tfrac{1}{2}k^2 + k(x - k) & \text{if } x > k.
\end{cases}$$

$$\psi_\theta(x) = [x - \theta]_{-k}^{k},$$

$$[x]_{-k}^{k} =
\begin{cases}
-k & \text{if } x < -k,\\
x & \text{if } |x| \le k,\\
k & \text{if } x > k.
\end{cases}$$

These are all location estimators: $m_\theta(x) = m(x - \theta)$, $\psi_\theta(x) = \psi(x - \theta)$.
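An illustrative sketch of the Huber location estimate computed as a Z-estimator: solve $P_n \psi_\theta = 0$ by root finding. The contaminated sample, the tuning constant $k = 1.345$, and the use of scipy's brentq root finder are assumptions of the example, not part of the lecture.

```python
# Huber location estimator as a Z-estimator: find the zero of
# Psi_n(theta) = P_n psi_theta, with psi_theta(x) = clip(x - theta, -k, k).
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0.0, 1.0, 950), rng.normal(50.0, 1.0, 50)])  # 5% outliers

k = 1.345                                        # common choice of tuning constant
psi = lambda t: np.mean(np.clip(x - t, -k, k))   # Psi_n(theta)

# Psi_n is positive at min(x) and negative at max(x), so a root exists in between.
theta_hat = brentq(psi, x.min(), x.max())
print(theta_hat, np.mean(x), np.median(x))  # Huber estimate stays near the bulk, unlike the mean
```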

6
Consistency of M-estimators and Z-estimators

We want to show that $\hat\theta \stackrel{P}{\to} \theta_0$, where $\hat\theta$ approximately maximizes
$M_n(\theta) = P_n m_\theta$ and $\theta_0$ maximizes $M(\theta) = P m_\theta$. We use a ULLN.

Theorem: Suppose that

1. $\sup_\theta |M_n(\theta) - M(\theta)| \stackrel{P}{\to} 0$,
2. For all $\epsilon > 0$, $\sup\{M(\theta) : d(\theta, \theta_0) \ge \epsilon\} < M(\theta_0)$, and
3. $M_n(\hat\theta_n) \ge M_n(\theta_0) - o_P(1)$.

Then $\hat\theta_n \stackrel{P}{\to} \theta_0$.

(2) is an identifiability condition: approximately maximizing $M(\theta)$
unambiguously specifies $\theta_0$. It suffices if there is a unique maximizer, $\Theta$ is
compact, and $M$ is continuous.
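To make condition 1 concrete, here is a small simulation (not from the slides) for the median criterion $m_\theta(x) = -|x - \theta|$ with $P = N(0,1)$, where $M(\theta) = -E|X - \theta|$ has a closed form; the grid and sample sizes are arbitrary choices.

```python
# Empirical illustration of the ULLN: sup over a grid of |M_n(theta) - M(theta)|
# shrinks as n grows, for m_theta(x) = -|x - theta| and P = N(0, 1).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
grid = np.linspace(-3, 3, 601)
# E|X - theta| = 2*phi(theta) + theta*(2*Phi(theta) - 1) for X ~ N(0,1)
M = -(2 * norm.pdf(grid) + grid * (2 * norm.cdf(grid) - 1))

for n in [100, 1000, 10000]:
    x = rng.standard_normal(n)
    M_n = np.array([-np.mean(np.abs(x - t)) for t in grid])
    print(n, np.max(np.abs(M_n - M)))     # sup over the grid, decreasing in n
```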

7
Proof

From (2), for all $\epsilon > 0$ there is a $\delta > 0$ such that

$$\begin{aligned}
\Pr(d(\hat\theta_n, \theta_0) \ge \epsilon)
&\le \Pr(M(\theta_0) - M(\hat\theta_n) \ge \delta)\\
&= \Pr(M(\theta_0) - M_n(\theta_0) + M_n(\theta_0) - M_n(\hat\theta_n) + M_n(\hat\theta_n) - M(\hat\theta_n) \ge \delta)\\
&\le \Pr(M(\theta_0) - M_n(\theta_0) \ge \delta/3) + \Pr(M_n(\theta_0) - M_n(\hat\theta_n) \ge \delta/3)\\
&\qquad + \Pr(M_n(\hat\theta_n) - M(\hat\theta_n) \ge \delta/3).
\end{aligned}$$

Then (1) implies the first and third probabilities go to zero, and (3) implies
the second probability goes to zero.

8
Consistency of M-estimators and Z-estimators

Same thing for Z-estimators: Finding $\hat\theta$ that is an approximate zero of
$\Psi_n(\theta) = P_n \psi_\theta$ leads to $\hat\theta \stackrel{P}{\to} \theta_0$, which is the unique zero of $\Psi(\theta) = P \psi_\theta$.

Theorem: Suppose that

1. $\sup_\theta \|\Psi_n(\theta) - \Psi(\theta)\| \stackrel{P}{\to} 0$,
2. For all $\epsilon > 0$, $\inf\{\|\Psi(\theta)\| : d(\theta, \theta_0) \ge \epsilon\} > 0 = \|\Psi(\theta_0)\|$, and
3. $\Psi_n(\hat\theta_n) = o_P(1)$.

Then $\hat\theta_n \stackrel{P}{\to} \theta_0$.

Proof: Choosing $M_n(\theta) = -\|\Psi_n(\theta)\|$ and $M(\theta) = -\|\Psi(\theta)\|$ in the
previous theorem implies the result.

9
Example: Sample median

Sample median $\hat\theta_n$ is the zero of

$$P_n \psi_\theta(X) = P_n\, \mathrm{sign}(X - \theta).$$

Suppose that $P$ is continuous and positive around the median, and check the
conditions:

1. The class $\{x \mapsto \mathrm{sign}(x - \theta) : \theta \in \mathbb{R}\}$ is Glivenko-Cantelli.

2. The population median is unique, so for all $\epsilon > 0$,
$$P(X < \theta_0 - \epsilon) < \frac{1}{2} < P(X < \theta_0 + \epsilon).$$

3. The sample median always has $|P_n\, \mathrm{sign}(X - \hat\theta_n)| = 0$.
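A short simulation (illustrative; the Cauchy location model and the sample sizes are arbitrary choices) confirming the conclusion: the sample median converges to the population median and approximately zeroes the estimating equation.

```python
# Consistency of the sample median: theta_n -> theta_0 and P_n sign(X - theta_n) ~ 0.
import numpy as np

rng = np.random.default_rng(5)
theta0 = 1.0                               # population median of a Cauchy centred at 1
for n in [100, 1000, 10000, 100000]:
    x = theta0 + rng.standard_cauchy(n)
    theta_n = np.median(x)
    print(n, theta_n, np.mean(np.sign(x - theta_n)))
```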

10
ULLN and M-estimators

Notice the ULLN condition:

$$\sup_\theta |M_n(\theta) - M(\theta)| \stackrel{P}{\to} 0.$$

Typically, this requires the empirical process $\theta \mapsto P_n m_\theta$ to be totally
bounded. This can be problematic if $m_\theta$ is unbounded. For instance:

Mean: $m_\theta(x) = -(x - \theta)^2$,
Median: $m_\theta(x) = -|x - \theta|$.

We can get around the problem by restricting to a compact set where most
of the mass of $P$ lies, and showing that this does not affect the asymptotics.
In that case, we can also restrict $\theta$ to an appropriate compact subset.

11
Non-parametric maximum likelihood

Estimate $P$ on $\mathcal{X}$. Suppose it has a density

$$p_0 = \frac{dP}{d\mu} \in \mathcal{P},$$

where $\mathcal{P}$ is a family of densities. Define the maximum likelihood estimate

$$\hat p_n = \arg\max_{p \in \mathcal{P}} P_n \log p.$$

We'll show conditions for which $\hat p_n$ is Hellinger consistent, that is,
$h(\hat p_n, p_0) \stackrel{as}{\to} 0$, where $h$ is the Hellinger distance:

$$h(p, q) = \left( \frac{1}{2} \int \left( p^{1/2} - q^{1/2} \right)^2 d\mu \right)^{1/2}.$$

[The 1/2 ensures $0 \le h(p, q) \le 1$.]

12
Hellinger distance

We have

$$\begin{aligned}
h(p, q)^2 &= \frac{1}{2} \int \left( p^{1/2} - q^{1/2} \right)^2 d\mu\\
&= \frac{1}{2} \int \left( p + q - 2 p^{1/2} q^{1/2} \right) d\mu\\
&= 1 - \int p^{1/2} q^{1/2}\, d\mu.
\end{aligned}$$

This latter integral is called the Hellinger affinity. Expressing $h$ in this form
can simplify its calculation for product densities. Notice that, by
Cauchy-Schwarz,

$$\int p^{1/2} q^{1/2}\, d\mu \le \left( \int p\, d\mu \int q\, d\mu \right)^{1/2} = 1,$$

so $h(p, q) \in [0, 1]$.
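A numerical check (illustrative; the two normal densities are arbitrary choices) that the defining integral and the affinity form of $h(p, q)^2$ agree, and that $h(p, q)$ lies in $[0, 1]$.

```python
# The two expressions for the squared Hellinger distance agree,
# here computed by quadrature for a pair of normal densities.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

p = lambda x: norm.pdf(x, loc=0.0, scale=1.0)
q = lambda x: norm.pdf(x, loc=1.0, scale=2.0)

h2_direct, _ = quad(lambda x: 0.5 * (np.sqrt(p(x)) - np.sqrt(q(x))) ** 2, -np.inf, np.inf)
affinity, _ = quad(lambda x: np.sqrt(p(x) * q(x)), -np.inf, np.inf)

print(h2_direct, 1 - affinity)     # the two forms of h(p, q)^2 agree
print(np.sqrt(h2_direct))          # h(p, q) is in [0, 1]
```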

13
Non-parametric maximum likelihood

The Kullback-Leibler divergence between $p$ and $q$ is

$$d_{KL}(p, q) = \int \log\frac{q}{p}\, q\, d\mu.$$

Clearly, $d_{KL}(p, p) = 0$. Also, since $-\log(\cdot)$ is convex, Jensen's inequality gives

$$d_{KL}(p, q) = \int -\log\frac{p}{q}\, q\, d\mu \ge -\log \int \frac{p}{q}\, q\, d\mu = -\log \int p\, d\mu = 0.$$
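A numerical check (illustrative; the two normal densities are arbitrary) that $d_{KL}(p, q) \ge 0$ and $d_{KL}(p, p) = 0$, with the quadrature value compared against the known closed form for normal densities.

```python
# d_KL(p, q) = int q log(q/p) d(mu), computed by quadrature for two normals.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

p = lambda x: norm.pdf(x, loc=0.0, scale=1.0)
q = lambda x: norm.pdf(x, loc=0.5, scale=1.5)

d_kl, _ = quad(lambda x: q(x) * np.log(q(x) / p(x)), -np.inf, np.inf)
# closed form for KL(N(0.5, 1.5^2) || N(0, 1))
closed_form = np.log(1.0 / 1.5) + (1.5 ** 2 + 0.5 ** 2) / 2.0 - 0.5
print(d_kl, closed_form)                                       # positive, and they match
print(quad(lambda x: p(x) * np.log(p(x) / p(x)), -20, 20)[0])  # d_KL(p, p) = 0
```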

14
Non-parametric maximum likelihood

Relating KL-divergence to a ULLN:

$$\begin{aligned}
d_{KL}(\hat p_n, p_0) &= \int \log\frac{p_0}{\hat p_n}\, p_0\, d\mu\\
&\le \int \log\frac{p_0}{\hat p_n}\, p_0\, d\mu - P_n \log\frac{p_0}{\hat p_n}\\
&= P \log\frac{p_0}{\hat p_n} - P_n \log\frac{p_0}{\hat p_n}\\
&\le \|P - P_n\|_{\mathcal{G}},
\end{aligned}$$

where the first inequality follows from the fact that $\hat p_n$ maximizes $P_n \log p$

15
over $p \in \mathcal{P}$, and the class $\mathcal{G}$ is defined as

$$\mathcal{G} = \left\{ 1[p_0 > 0] \log\frac{p_0}{p} : p \in \mathcal{P} \right\}.$$

16
Non-parametric maximum likelihood

One problem here is that $\log(p_0/p)$ is unbounded, since $p$ can be zero.
We'll take a different approach: For any $p \in \mathcal{P}$, consider the mixture

$$\bar p = \frac{p + p_0}{2}.$$

If the class $\mathcal{P}$ is convex and $\hat p_n, p_0 \in \mathcal{P}$, this mixture has
$P_n \log \bar p \le P_n \log \hat p_n$. This is behind the following lemma.

Lemma: Define

$$\bar p_n = \frac{\hat p_n + p_0}{2}.$$

If $\mathcal{P}$ is convex,

$$h(\bar p_n, p_0)^2 \le \int \frac{\hat p_n}{\bar p_n}\, d(P_n - P).$$

17
Non-parametric maximum likelihood

Theorem: For a convex class $\mathcal{P}$ of densities, if $P$ has density $p_0 \in \mathcal{P}$ and
$\hat p_n$ maximizes likelihood over $\mathcal{P}$, we have

$$h(\bar p_n, p_0)^2 \le \|P - P_n\|_{\mathcal{G}},$$

where

$$\mathcal{G} = \left\{ \frac{2p}{p + p_0} : p \in \mathcal{P} \right\}.$$

Notice that functions in $\mathcal{G}$ are bounded between 0 and 2.

18
Non-parametric maximum likelihood: Example

Lemma: Suppose $\mathcal{P}$ is a set of densities on a compact subset $\mathcal{X}$ of $\mathbb{R}^d$.
Fix a norm $\|\cdot\|$ on $\mathbb{R}^d$. Suppose that, for all $p \in \mathcal{P}$,

$$\left| \frac{p(x)}{p(y)} - 1 \right| \le L \|x - y\|.$$

Then:

1. For all $p \in \mathrm{conv}\,\mathcal{P}$, $\left| \frac{p(x)}{p(y)} - 1 \right| \le L \|x - y\|$.

2. For all $p, p_0 \in \mathrm{conv}\,\mathcal{P}$, $\frac{2p}{p + p_0}$ is $O(L^2)$-Lipschitz wrt $\|\cdot\|$.

3. $\|P - P_n\|_{\mathcal{G}} \stackrel{as}{\to} 0$, where

$$\mathcal{G} = \left\{ \frac{2p}{p + p_0} : p \in \mathrm{conv}\,\mathcal{P} \right\}.$$

19
Non-parametric maximum likelihood: Example

But notice that the dependence on the dimension d is terrible: the rate is
exponentially slow in d. The Lipschitz property is a very weak restriction.

20
