
Ch01.

Introduction

Ping Yu

HKU Business School


The University of Hong Kong

Ping Yu (HKU) Introduction 1 / 41


Course Information

Instructor: Yu, Ping


Email: [email protected]
Teaching Time: 9:30-10:45am and 10:55am-12:10pm, Thursday
Teaching Location: KKL-301 (except Sep 5, 19, 26 at KKLG104, Nov 7 TBC)
- The lectures will not be recorded or uploaded on Moodle.
Office Hour: 3:00-4:00pm, Thursday, KKL1110
- I will NOT answer questions by email if the answer is long or hard to explain
precisely in words. Please stop by during my office hour.

Tutor: Lily Wang


Email: [email protected]
Teaching Time: TBA
Teaching Location: TBA
Office Hour: TBA
- For any administrative issues (e.g., enrollment, Moodle, time clash, software,
attendance, etc.) and HW issues (e.g., clarification of problems), contact the tutor.



Evaluation

Textbook: My lecture notes posted on Moodle.


- Others: Hayashi (2000), Ruud (2000), Cameron and Trivedi (2005), Wooldridge
(2010), and Hansen (2022).
- References: no need to read them unless necessary.
Evaluation: HWs (10%), Midterm Test (40%), Final Exam (50%).
Exercises: turn in only the empirical exercises, but the analytical exercises are
necessary to pass this course.
- Solve the associated analytical exercises in the lecture notes covered during
the week (excluding the starred exercises and the exercises in the starred sections
on a first reading); I have posted answer keys to all these exercises.
- At the end of Chapters 2, 3 and 5-8, there are one or two empirical exercises
(which are what count as HWs). Do them only after the whole chapter is
finished. Use a matrix programming language such as Matlab (the tutor will teach
you how1) rather than Stata-like software to solve these exercises.
- HWs must be typed (e.g., in LaTeX or Word). Matlab code must also be submitted.
Turn in your HW on Moodle by the due time (the midnight before the next class,
i.e., some Wednesday midnight). Late HWs are not accepted for any reason.
1
HKU students are free to download and install Matlab:
https://www.its.hku.hk/services/personal/software/matlab-simulink
continue...

Tutorial: one tutorial class every other week; the first class starts in week two
(the week starting Sep. 8). You can suggest difficult points in my lectures or
difficult exercises in the lecture notes for the tutor to cover in the tutorial
classes.
Examination: mimics the analytical exercises in the lecture notes; please refer to
the past three years' exams on Moodle for concrete examples (exams from more than
three years ago seem out of date).
- The midterm is an in-class exam on October 24.
- The final will be conducted in mid-December (TBA) and puts more (but not all)
weight on the material not covered by the midterm.
- The exams are open-book and open-note (but only the materials on Moodle are
allowed, so a laptop is not allowed).



Course Policy

In Class: (i) turn off your cell phone and keep quiet; (ii) come to class and return
from the break on time; (iii) you can ask me questions freely in class, but if your
question is far outside the course or would take a long time to answer, I will
answer you after class.
Policy on Plagiarism: if judged as "plagiarism", you are in serious trouble. If a few
students are judged to have copied each other, each gets a zero mark; I will not
judge who copied whom. So DO NOT copy others and DO NOT let others copy you.
- This policy applies to HW, midterm and final.
Guest Account (cannot receive announcements):
- Website: http://hkuportal.hku.hk/moodle/guest
- Guest Username: econ6005_1a_2024_guest
- Password: ECON6005@ping



What is Econometrics? What Will This Course Cover?

Ragnar Frisch (1933): unification of statistics, economic theory, and mathematics.

Linear regression and its extensions.


The objective of econometrics and microeconometrics, and the role of economic
theory in econometrics.
Main econometric approaches.

We will concentrate on linear models, i.e., linear regression and linear GMM, in
this course; nonlinear models are discussed only briefly.
Sections, proofs, exercises, paragraphs or footnotes marked by * are optional and
will not be covered in this course.
I may omit or add material beyond my notes (depending on your backgrounds
and time constraints). Just follow my slides and read the corresponding sections.



Linear Regression and Its Extensions

Linear Regression and Its Extensions



Linear Regression and Its Extensions

Return to Schooling: Our Starting Point

Suppose we observe {y_i, x_i}_{i=1}^n, where y_i is the wage rate, x_i includes
education and experience, and the target is to study the return to schooling, i.e.,
the relationship between y_i and x_i.
The most general model is

y = m(x, u),    (1)

where x = (x_1, x_2)′ with x_1 being education and x_2 being experience, u is a
vector of unobservable errors (e.g., innate ability, skill, quality of education,
work ethic, interpersonal connections, preferences, and family background), which
may be correlated with x (why?), and m(·) can be any (nonlinear) function. To
simplify the discussion, suppose u is one-dimensional and represents the ability of
the individual.
Notation: real numbers (or scalars) are written in lowercase italics; vectors are
column vectors and written in lowercase bold.



Linear Regression and Its Extensions

Nonadditively Separable Nonparametric Model

In (1), the return to schooling is

∂m(x_1, x_2, u)/∂x_1,

which depends on the levels of x_1 and x_2 and also on u.
In other words, for different levels of education, the returns to schooling are
different; furthermore, for different levels of experience (which is observable) and
ability (which is unobservable), the returns to schooling are also different.
This model is called the nonadditively separable nonparametric model (NSNM)
since u is not additively separable.



Linear Regression and Its Extensions

Additively Separable Nonparametric Model

ASNM:

y = m(x) + u.

In this model, the return to schooling is

∂m(x_1, x_2)/∂x_1,

which depends only on observables.
A special case of this model is the additively separable model (ASM), where
m(x) = m_1(x_1) + m_2(x_2).
In this case, the return to schooling is ∂m_1(x_1)/∂x_1, which depends only on x_1.



Linear Regression and Its Extensions

Random Coefficient Model

There is also the case where the return to schooling depends on the unobservable
but not on other covariates.
For example, suppose

y = α(u) + m_1(x_1)β_1(u) + m_2(x_2)β_2(u),

and then the return to schooling is

[∂m_1(x_1)/∂x_1] β_1(u),

which does not depend on x_2 but depends on x_1 and u.
A special case is the RCM, where m_1(x_1) = x_1 and m_2(x_2) = x_2.
In this case, the return to schooling is β_1(u), which depends only on u.



Linear Regression and Its Extensions

Varying Coefficient Model

The return to schooling may depend only on x_2 and u.
For example, if

y = α(x_2, u) + x_1 β_1(x_2, u),

then the return to schooling is β_1(x_2, u), which does not depend on x_1.
A special case is the VCM, where

y = α(x_2) + x_1 β_1(x_2) + u,

and the return to schooling is β_1(x_2), depending only on x_2.



Linear Regression and Its Extensions

Linear Regression Model

When the return to schooling depends on neither (x_1, x_2) nor u, we get the
LRM,

y = α + x_1 β_1 + x_2 β_2 + u ≡ x′β + u,

where x ≡ (1, x_1, x_2)′, β ≡ (α, β_1, β_2)′, and the return to schooling is β_1,
which is constant.
Summary:

         NSNM   ASNM    ?      ?     ASM   VCM   RCM   LRM
x_1       X      X      X            X
x_2       X      X             X           X
u         X             X      X                  X

Table 1: Models Based on What the Return to Schooling Depends On

Other popular models:
- The VCM can be simplified to the partially linear model (PLM), where
y = α(x_2) + x_1 β_1 + u.
- Combining the LRM and the ASNM, we get the single index model (SIM), where
y = m(x′β) + u.
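As a numerical illustration of the LRM, here is a minimal Python sketch (not part of the notes; the coefficients 0.08 and 0.03 and the variable ranges are made-up assumptions) in which the constant return to schooling β_1 is recovered by least squares, β̂ = (X′X)^{-1}X′y:

```python
import numpy as np

# Hedged sketch: simulate the LRM y = alpha + x1*beta1 + x2*beta2 + u with
# made-up coefficients, then recover beta1 (the return to schooling) by OLS.
rng = np.random.default_rng(0)
n = 5000
educ = rng.uniform(8, 20, n)      # x1: years of schooling (hypothetical range)
exper = rng.uniform(0, 30, n)     # x2: years of experience (hypothetical range)
u = rng.normal(0, 1, n)           # exogenous error, uncorrelated with x
y = 1.0 + 0.08 * educ + 0.03 * exper + u

# OLS: beta_hat = (X'X)^{-1} X'y, with a constant in the first column of X
X = np.column_stack([np.ones(n), educ, exper])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
return_to_schooling = beta_hat[1]  # constant by construction in the LRM
```

With exogenous u, the estimate lands near the true 0.08 regardless of the levels of x_1, x_2, or u, which is exactly what "constant return" means here.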
Linear Regression and Its Extensions

Other Dimensions

x and u uncorrelated (or even independent) vs. x and u correlated: in the
former case, x is called exogenous, and in the latter case, x is called endogenous.
Limited dependent variables (LDV): part of the information about y is missing.
Single equation vs. multiple equations.
Different characteristics of the conditional distribution of y given x.
- Conditional mean or conditional expectation function (CEF):

m(x) = E[y|x] = ∫ y f(y|x) dy = ∫ m(x, u) f(u|x) du,

where f(y|x) is the conditional probability density function (pdf) or the conditional
probability mass function (pmf) of y given x.
- Conditional quantile:

Q_τ(x) = inf{y : F(y|x) ≥ τ}, τ ∈ (0, 1),

where F(y|x) is the conditional cumulative distribution function (cdf) of y given x.
In particular, Q_{.5}(x) is the conditional median.
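A hedged numerical sketch of the mean/median distinction: assume (purely for illustration, not from the notes) that y | x ~ x + Exponential(1), a right-skewed conditional density, so E[y|x] = x + 1 while Q_{.5}(x) = x + ln 2 < E[y|x]:

```python
import numpy as np

# Assumed toy model: y | x = x0 is x0 + Exponential(1), a right-skewed f(y|x).
# Then m(x0) = E[y|x0] = x0 + 1 and Q_{.5}(x0) = x0 + ln 2.
rng = np.random.default_rng(1)
x0 = 2.0
draws = x0 + rng.exponential(1.0, 200_000)  # draws from f(y | x = x0)

cond_mean = draws.mean()                    # sample analog of E[y | x0]
cond_median = np.quantile(draws, 0.5)       # sample analog of Q_{.5}(x0)
```

The median falling below the mean under right skewness is the same pattern seen in the wage densities a few slides later.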



Linear Regression and Its Extensions

What Will We Cover?

- Conditional variance:

σ²(x) = Var(y|x) = E[(y − m(x))² | x],

which measures the dispersion of f(y|x).
- Conditional skewness: E[((y − m(x))/σ(x))³ | x], which measures the asymmetry
of f(y|x).
- Conditional kurtosis: E[((y − m(x))/σ(x))⁴ | x], which measures the
heavy-tailedness of f(y|x).
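These conditional moments have direct sample analogs. A sketch under an assumed distribution (Exponential(1), chosen because its variance, skewness, and kurtosis are known to be 1, 2, and 9; this example is not from the notes):

```python
import numpy as np

# Assumed example: draws from one conditional distribution f(y|x), here
# Exponential(1), whose variance, skewness, and kurtosis are 1, 2, and 9.
rng = np.random.default_rng(2)
y = rng.exponential(1.0, 500_000)

m = y.mean()                                   # sample analog of m(x)
var = np.mean((y - m) ** 2)                    # dispersion of f(y|x)
skew = np.mean(((y - m) / np.sqrt(var)) ** 3)  # asymmetry of f(y|x)
kurt = np.mean(((y - m) / np.sqrt(var)) ** 4)  # heavy-tailedness of f(y|x)
```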

LRM   Conditional Mean   Exogeneity    Single Equation
 ⋮            ⋮          Endogeneity   Multiple Equations
LDV   Conditional Mean   Exogeneity    Single Equation
 ⋮            ⋮          Endogeneity   Multiple Equations



Linear Regression and Its Extensions

A Real Example

[Figure: densities f(Wage|Female) and f(Wage|Male), with Mean(Wage|Female),
Median(Wage|Female), Mean(Wage|Male) and Median(Wage|Male) marked; the wage
axis runs from 0 to 30, with marks at 6.7, 7.9, 9.0 and 10.1.]

Figure: Wage Densities for Male and Female from the 1985 CPS



Linear Regression and Its Extensions

What Can We Get From The Figure?

These are conditional densities - the density of hourly wages conditional on
gender.
First, both the mean and the median of the male wage are larger than those of the
female wage.
Second, for both male and female wages, the median is less than the mean, which
indicates that the wage distributions are positively (or right-) skewed. This is
supported by the fact that the skewness of both the male and the female wage is
greater than zero (1.0 and 2.9, respectively).
Third, the variance of the male wage (27.9) is greater than that of the female
wage (22.4).
Fourth, the right tail of the male wage is heavier than that of the female wage.



Econometrics, Microeconometrics and Economic Theory

Econometrics, Microeconometrics
and Economic Theory



Econometrics, Microeconometrics and Economic Theory

Econometric Data Types

In modern econometrics, an economy is viewed as a stochastic process
{W_it : t ∈ (−∞, ∞), i = 1, …, n_t} which summarizes the economic behavior of all
individuals at time t, where W_it can be infinite-dimensional, and n_t is the number
of individuals at time t.
Any economic phenomenon (i.e., a data set) is viewed as a (partial) realization of
this stochastic process.
Typically, three types of data are collected.
- Cross-sectional data. The observations are {w_i : i = 1, …, n} at a fixed time
point t, where w is a subset of W (e.g., wage, consumption, education, etc.) or a
transformation of W (e.g., aggregations such as unemployment rates in different
countries, consumption at the household level and investment of different
corporations), and n ≤ n_t.
- Time series data. The observations are {w_t : t = 1, …, T} for the same target of
interest (e.g., GDP, CPI, stock price, etc.), where the time unit can be year,
quarter, month, day, hour or even second.
- Panel data or longitudinal data. The observations are
{w_it : t = 1, …, T; i = 1, …, n}.
Specializing to the setup of the return-to-schooling example, we can take
w = (y, x′)′.



Econometrics, Microeconometrics and Economic Theory

The Objective of Econometrics

The objective of econometrics is to infer (characteristics of) the probability law of
this economic stochastic process (i.e., the data generating process or DGP) using
observed data, and then use the obtained knowledge to explain what has
happened (i.e., internal validity) and predict what will happen (i.e., external
validity).
Internal validity concerns three problems:
- What is a plausible value for the parameter? (point estimation)
- What is a plausible set of values for the parameter? (set/interval estimation)
- Is some preconceived notion or economic theory about the parameter "consistent"
with the data? (hypothesis testing)
In other words, the objectives of econometrics are estimation, inference (including
hypothesis testing and confidence interval (CI) construction) and prediction.



Econometrics, Microeconometrics and Economic Theory

The Objective of Microeconometrics

This course will concentrate on microeconometrics, i.e., the main data types
analyzed in this course are cross-sectional data and panel data.2
One main objective of microeconometrics is to explore causal relationships
between a response variable y and some covariates x.
- the effect of class sizes on test scores
- police expenditures on crime rates
- climate change on economic activity
- years of schooling on wages
- baby-bearing on the labor force participation of women
- institutional structure on growth
- the effectiveness of rewards on behavior
- the consequences of medical procedures on health outcomes
Caveat: causality is different from correlation.
- Using umbrellas can predict rain, but we cannot claim that umbrellas cause rain.
- A rooster's crow can predict sunrise but cannot cause sunrise.
- Correlation is used to "predict" y from x, while causality can be used to "explain"
y from x.
Noncausal relationships describe only associations, so they are of less economic
interest.
2
Maybe only cross-sectional data will be discussed due to time constraints.
Econometrics, Microeconometrics and Economic Theory

Roles of Economic Theory

An economic theory or model is not a general framework that embeds an
econometric model. Rather, economic theory is often formulated as a
restriction on the probability law of the DGP.
Such a restriction can be used to validate the economic theory, and to improve
forecasts if the restriction is valid or approximately valid.
Usually, economic theory plays the following roles in econometric modeling:
- indication of the nature (e.g., conditional mean, conditional variance, etc.) of the
relationship between y and x: which moments are important and of interest?
- choice of economic variables x (e.g., theoretical considerations may suggest that
certain variables have no direct effect on others because they do not enter into
agents' utility function, nor do they affect the constraints these agents face);
- restriction on the functional form or parameters of the relationship (e.g., for the
Cobb-Douglas production function Y = A L^{β_1} K^{β_2}, constant returns to scale
imply that β_1 + β_2 = 1);
- help in judging causal relationships (e.g., whether women's fertility choice affects
their employment status and hours worked).



Econometric Approaches

Econometric Approaches



Econometric Approaches

There are two econometric traditions: the frequentist approach and the Bayesian
approach.
- The former treats the parameter as fixed (i.e., there is only one true value) and
the samples as random.
- The latter treats the parameter as random and the samples as fixed.
This course will concentrate on the frequentist approach.
The two main methods in the frequentist approach are the likelihood method and
the method of moments (MoM).
We will concentrate on the MoM and only briefly discuss the likelihood method.
We will use the estimation problem to illustrate these two methods.



Econometric Approaches

History of the Bayesian Analysis

Thomas Bayes (1701-1761), English Reverend

Bayes never published what would eventually become his most famous accomplishment; his
notes were edited and published after his death by Richard Price.



Econometric Approaches The Maximum Likelihood Estimator

The Maximum Likelihood Estimator

The MLE was popularized by R.A. Fisher (1890-1962).


The basic idea of the MLE is to guess the value of the truth that would most likely
generate the phenomenon we observed.
Mathematically,

θ_MLE = argmax_{θ∈Θ} E[ln f(X|θ)] = argmax_{θ∈Θ} ∫ f(x) ln f(x|θ) dx
      = argmax_{θ∈Θ} ∫ ln f(x|θ) dF(x),    (2)

where X is a random vector, f(x) is the true pdf or the true pmf, f(x|θ) is the
specified parametrized pdf or pmf, Θ is the parameter space, and F(x) is the true
cdf.



Econometric Approaches The Maximum Likelihood Estimator

History of the MLE

Ronald A. Fisher (1890-1962), UCL

Ronald A. Fisher (1890-1962) is an iconic founder of modern statistical theory. The
name of the F-distribution was coined by G.W. Snedecor in honor of R.A. Fisher.
The p-value is also credited to him.



Econometric Approaches The Maximum Likelihood Estimator

continue...

Another interpretation of the MLE is that it minimizes the Kullback-Leibler
information distance between f(x) and f(x|θ),

KLIC = ∫ f(x) ln[f(x)/f(x|θ)] dx = E[ln f(X) − ln f(X|θ)].

- (**) In information theory, the KL distance is viewed as the extra number of nats
needed on average to code data generated from a source f(x) under the
distribution f(x|θ) as opposed to f(x).
Two Good Properties of the MLE: (i) Invariance: if θ̂_MLE is the MLE of θ, then
τ(θ̂_MLE) is the MLE of τ(θ). (ii) Efficiency: it reaches the Cramér-Rao Lower
Bound (CRLB) asymptotically.



Econometric Approaches The Maximum Likelihood Estimator

History of the Kullback-Leibler Information Criterion

Solomon Kullback (1907-1994), NSA Richard A. Leibler (1914-2003), NSA



Econometric Approaches The Maximum Likelihood Estimator

History of the Cramér-Rao Bound

Harald Cramér (1893-1985), Stockholm C.R. Rao (1920-2023), ISI and PSU3

3
A student of R.A. Fisher, recipient of the International Prize in Statistics - the
Nobel Prize of statistics.
Econometric Approaches The Method of Moments Estimator

The MoM Estimator

The MoM estimator was invented by Karl Pearson (1857-1936).


The original problem is to estimate k unknown parameters, say θ = (θ_1, …, θ_k)′,
in f(x). But we are not fully sure about the functional form of f(x).
Nevertheless, we know the functional form of the moments of X ∈ R as a function
of θ:

E[X]   = g_1(θ),
E[X²]  = g_2(θ),
   ⋮                (3)
E[X^k] = g_k(θ).

There are k equations with k unknowns, so we can solve for θ uniquely in
principle.
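A hedged sketch of system (3) with k = 2 (this distributional example is an assumption, not from the notes): for a Gamma(a, s) model with shape a and scale s, E[X] = as and E[X²] = as²(a + 1), and the two moment equations invert in closed form:

```python
import numpy as np

# Assumed example of (3) with k = 2 unknowns theta = (a, s) for Gamma(a, s):
#   E[X]   = g1(theta) = a*s
#   E[X^2] = g2(theta) = a*s^2*(a + 1)
# Inverting: a = m1^2 / (m2 - m1^2) and s = (m2 - m1^2) / m1.
rng = np.random.default_rng(4)
a_true, s_true = 3.0, 0.5
x = rng.gamma(a_true, s_true, 200_000)

m1, m2 = x.mean(), np.mean(x ** 2)   # sample moments replace E[X], E[X^2]
a_mom = m1 ** 2 / (m2 - m1 ** 2)
s_mom = (m2 - m1 ** 2) / m1
```

Replacing the population moments by sample moments here anticipates the analog method discussed later.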



Econometric Approaches The Method of Moments Estimator

History of the MoM

Karl Pearson (1857-1936), UCL

Karl Pearson (1857-1936) is also the inventor of the correlation coefficient, so the correlation
coefficient is also called the Pearson correlation coefficient. He is also the founder of
Biometrika.



Econometric Approaches The Method of Moments Estimator

Efficiency and Robustness

The MoM estimator uses only the moment information in X , while the MLE uses
"all" information in X , so the MLE is more efficient than the MoM estimator.
However, the MoM estimator is more robust than the MLE since it does not rely on
the correctness of the full distribution but relies only on the correctness of the
moment functions.
Efficiency and robustness are a common trade-off among econometric methods.



Econometric Approaches The Method of Moments Estimator

A Microeconomic Example of the MoM Estimator

Moment conditions often originate from the first order conditions (FOCs) in an
optimization problem.
Suppose firms maximize their profits conditional on the information in hand; then
the problem for firm i is

max_{d_i} E_{ν|z}[π(d_i, z_i, ν_i; θ)] := max_{d_i} ∫ π(d_i, z_i, ν_i; θ) f(ν_i|z_i) dν_i.    (4)

π is the profit function, e.g.,

π(d_i, z_i, ν_i; θ) = p_i f(L_i, ν_i; θ) − w_i L_i,

where z_i = (p_i, w_i)′ is all the information used in the decision and can be
observed by both the firm and the econometrician, p_i is the output price and w_i is
the wage, ν_i is the exogenous random error (e.g., weather, financial crisis, trade
war, COVID-19, etc.) and cannot be observed or controlled by either the firm or the
econometrician, and d_i = L_i is the labor-input decision.
θ is the technology parameter, e.g., if f(L_i, ν_i; θ) = L_i^φ exp(ν_i), then θ = φ;
it is known to the firm but unknown to the econometrician. Our goal is to estimate
θ, which is relevant for measuring the causal effect of labor input on profit.



Econometric Approaches The Method of Moments Estimator

continue...

The first-order conditions (FOCs) of (4) are

E_{ν|z}[∂π(d_i, z_i, ν_i; θ)/∂d_i] ≡ m(d_i, z_i|θ) = 0.

When there is randomness even in z_i,4 the objective function changes to

max_{d_i} E[π(d_i, z_i, ν_i; θ)],

and the FOCs change to

E[m(d_i, z_i|θ)] = 0,    (5)

which is a special set of moment conditions.

4 The difference between z_i and ν_i is that z_i can be observed ex post while ν_i
cannot. That z_i is random means that the decision is made before z_i is revealed,
i.e., the decision is made ex ante.
Econometric Approaches The Method of Moments Estimator

A Macroeconomic Example of the MoM Estimator


max_{{c_t}} ∑_{t=1}^∞ ρ^t E_0[u(c_t)]
s.t. c_{t+1} + k_{t+1} = k_t R_{t+1}, k_0 is known,

where ρ is the discount factor, E_0[u(·)] is the conditional expected utility based
on the information at t = 0, k_t is the capital accumulation at time period t, c_t is
the consumption at t, and R_t is the gross return rate at t.
From dynamic programming, we have the Euler equation

E_0[ρ (u′(c_{t+1})/u′(c_t)) R_{t+1}] = 1.

If u(c) = (c^{1−α} − 1)/(1 − α), α > 0, is the isoelastic utility function with
constant relative risk aversion α,5 then we get

E_0[ρ (c_t/c_{t+1})^α R_{t+1}] = 1.    (6)

Suppose ρ is known while α is unknown; then (6) is a moment condition for α.

5 The coefficient of relative risk aversion (RRA) is defined as
R(c) = cA(c) = −c u″(c)/u′(c), where A(c) = −u″(c)/u′(c) is the coefficient of
absolute risk aversion (ARA).
Econometric Approaches The Analog Method

Population Version vs Sample Version of Moment Conditions

Equations (3), (5) and (6) are the population version of moment conditions.
Although some econometricians treat "population" as a physical population (e.g.,
all individuals in the US census), the term "population" is often treated
abstractly, and is potentially infinitely large.
Since the population distribution is unknown, we cannot solve the population
moment conditions to estimate the parameters.
In practice, we often have a set of finite data points from the population, so we
can substitute the population distribution in the moment conditions by the
empirical distribution of the data, which generates the sample version of the
moment conditions.
This is called the analog method.



Econometric Approaches The Analog Method

History of the Analog Method

Charles F. Manski (1948-), Northwestern Manski (1988)



Econometric Approaches The Analog Method

(The Sample Version of) the MoM Estimator


Suppose the true distribution of X satisfies

E[m(X|θ_0)] = 0  or  ∫ m(x|θ_0) dF(x) = 0,

where m : Θ × R^k → R^k, and F(·) is the true cdf of X.
Notation:

E[m(X|θ_0)] = ∫ m(x|θ_0) f(x) dx         if X is continuous,
            = ∑_{j=1}^J m(x_j|θ_0) p_j   if X is discrete,

where f(x) is the pdf of X, and {p_j = P(X = x_j) : j = 1, …, J} is the pmf of X.
We write E[m(X|θ_0)] as ∫ m(x|θ_0) dF(x) to cover both cases.
The essence of the MoM estimator is to substitute the true distribution F(·) by the
empirical distribution F̂_n(x) = (1/n) ∑_{i=1}^n 1(X_i ≤ x):

∫ m(x|θ) dF̂_n(x) = 0,

which is equivalent to

(1/n) ∑_{i=1}^n m(X_i|θ) = 0.    (7)

The MoM estimator θ̂(X_1, …, X_n) is the solution to (7).
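A sketch of solving (7) numerically for an assumed, illustrative moment function m(x|θ) = x − exp(θ) (not from the notes), whose sample root is θ̂ = ln(X̄); bisection on the sample moment recovers it:

```python
import numpy as np

# Illustrative sample version of (7): solve (1/n) * sum_i m(X_i|theta) = 0
# for the assumed moment function m(x|theta) = x - exp(theta).
rng = np.random.default_rng(5)
x = rng.exponential(np.exp(1.0), 100_000)   # true theta_0 = 1 by construction

def sample_moment(theta):
    return np.mean(x - np.exp(theta))

lo, hi = -5.0, 5.0     # bracket: sample_moment(lo) > 0 > sample_moment(hi)
for _ in range(60):    # bisection on the (decreasing) sample moment
    mid = 0.5 * (lo + hi)
    if sample_moment(mid) > 0:
        lo = mid
    else:
        hi = mid
theta_hat = 0.5 * (lo + hi)                 # equals log(x.mean()) here
```

The same analog-method logic applies with any moment function, including the FOC-based conditions (5) and (6).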
Econometric Approaches The Analog Method

(The Sample Version of) the MLE

Similarly, the MLE can be constructed as the maximizer of the average
log-likelihood function

ℓ_n(θ) = (1/n) ∑_{i=1}^n ln f(X_i|θ),

which is equivalent to the maximizer of the log-likelihood function

L_n(θ) = ∑_{i=1}^n ln f(X_i|θ)

or of the likelihood function

exp{L_n(θ)} = ∏_{i=1}^n f(X_i|θ).

If f(x|θ) is smooth in θ, the FOCs for the MLE are

(1/n) ∑_{i=1}^n s(X_i|θ) = 0,

where s(·|θ) = ∂ ln f(·|θ)/∂θ is called the score function.6 So the MLE is a special
MoM estimator in this case.

6 More often, ∑_{i=1}^n s(X_i|θ) is called the score function.
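A hedged sketch of the sample MLE (the Exponential model is an assumed example, not from the notes): for f(x|λ) = λe^{−λx}, the average log-likelihood is ℓ_n(λ) = ln λ − λX̄, the score is s(x|λ) = 1/λ − x, and the FOC gives the closed form λ̂ = 1/X̄; a grid maximization recovers the same point, and the average score at λ̂ is near zero, matching the MoM view:

```python
import numpy as np

# Assumed Exponential(lam) model: ln f(x|lam) = ln(lam) - lam*x, so
# l_n(lam) = ln(lam) - lam * mean(X) and s(x|lam) = 1/lam - x.
rng = np.random.default_rng(3)
lam_true = 2.0
x = rng.exponential(1.0 / lam_true, 100_000)

grid = np.linspace(0.5, 4.0, 35_001)     # grid step 1e-4 over the parameter space
loglik = np.log(grid) - grid * x.mean()  # average log-likelihood on the grid
lam_mle = grid[np.argmax(loglik)]        # close to the closed form 1/mean(X)

avg_score = np.mean(1.0 / lam_mle - x)   # (1/n) * sum_i s(X_i|lam_mle), about 0
```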
Econometric Approaches The Analog Method

Extensions of the Two Methods

(Parametric) likelihood →
- semi-parametric: empirical likelihood
- semi-nonparametric: semi-nonparametric likelihood
- nonparametric: nonparametric likelihood
MoM → GMM
We will cover only the GMM method in this course.
I will teach Econ6086 in the spring semester, which will concentrate on machine
learning; machine learning focuses on nonparametric methods.

Another principle, which is especially useful in linear models, is projection, which
is the topic of our next chapter.
This principle provides more straightforward interpretations of the
above-mentioned estimators through geometric intuition.

Keep the three principles in mind: likelihood, GMM and projection.
