04 LDV

The document discusses binary dependent variables and limitations of the linear probability model for modeling such variables. It introduces the logit model as a better alternative that uses a cumulative distribution function to ensure estimated probabilities remain between 0 and 1.

Department of Economics Universitas Padjadjaran | Microeconometrics

Microeconometrics:
Binary Dependent Variable

Department of Economics
Universitas Padjadjaran
2019

Additional References
• Dougherty, Introduction to Econometrics, 4th Ed, 2011
  *best for basics*
• Golder, M., Advanced Quantitative Analysis: Maximum Likelihood Estimation,
  https://files.nyu.edu/mrg217/public/homepage.htm

Estimators we (will) know


• Ordinary Least Squares (OLS) estimator
  – If we have a SLR of 𝑦𝑖 = 𝛽0 + 𝛽1𝑥𝑖 + 𝑢𝑖 and 𝑥𝑖 is exogenous, then we have 𝛽̂1(OLS) = 𝑐𝑜𝑣(𝑥𝑖, 𝑦𝑖) / 𝑣𝑎𝑟(𝑥𝑖)

• Instrumental Variable (IV) estimator
  – If we have a SLR of 𝑦𝑖 = 𝛽0 + 𝛽1𝑥𝑖 + 𝑢𝑖 and 𝑥𝑖 is endogenous, then we have 𝛽̂1(IV) = 𝑐𝑜𝑣(𝑧𝑖, 𝑦𝑖) / 𝑐𝑜𝑣(𝑧𝑖, 𝑥𝑖), where 𝑐𝑜𝑣(𝑧𝑖, 𝑥𝑖) ≠ 0
• Maximum Likelihood (ML) estimator

Why use a binary dependent variable?


• Observed vs unobserved variables

• Suppose we want to analyse the socioeconomic factors that lead some people to:
  – Engage in corruption
  – Smoke
  – Borrow money
  – Get a scholarship
  – Have boy/girl-friend(s)
  – etc.

Why use a binary dependent variable?


• Observed vs unobserved variables

• It would be best to know (observe)


– Utility derived from corruption, smoking, or
borrowing money, having a boy/girl-friend(s)…
– The actual (factual) cash flow of families
– A consistent way of measuring poverty

• And we could just apply OLS



Why use a binary dependent variable?


• Observed vs unobserved variables

• It would be best to know (observe)


– Utility derived from corruption, smoking, or
borrowing money, having a boy/girl-friend(s)…
– The actual (factual) cash flow of families
– A consistent way of measuring poverty

• But.. they are not observed



Why use a binary dependent variable?


• Observed vs unobserved variables

• What we observe is that


– Some people engage in corruption
– Some people smoke
– Some people borrow money
– Some people get scholarships
– Some people have boy/girl-friend(s)

The mechanism
Suppose:
𝑈𝑖ˢ = 𝛽0 + 𝛽1𝑥𝑖 + 𝑢𝑖

But 𝑈𝑖ˢ, the utility of smoking, is unobserved.

We, however, observe

𝑦𝑖 = 1 if 𝑈𝑖ˢ > 0 and
𝑦𝑖 = 0 if 𝑈𝑖ˢ ≤ 0

The mechanism
So we estimate
𝑦𝑖 = 𝛽0 + 𝛽1 𝑥𝑖 + 𝑢𝑖

We know the value of 𝑦𝑖 , either 0 or 1


• Because of this, we may think of 𝑦𝑖 as an event whose outcome is 0 or 1
• Therefore, essentially what we want to know is 𝐸(𝑦𝑖)

The Linear Probability Model


Using the formula for expected value:

𝐸(𝑦𝑖) = Σ 𝑦𝑖 × 𝑃𝑟𝑜𝑏(𝑦𝑖)
      = [1 × 𝑃𝑟𝑜𝑏(𝑦𝑖 = 1)] + [0 × 𝑃𝑟𝑜𝑏(𝑦𝑖 = 0)]
      = 𝑃𝑟𝑜𝑏(𝑦𝑖 = 1)

The Linear Probability Model

If we estimate
𝑦𝑖 = 𝛽0 + 𝛽1 𝑥𝑖 + 𝑢𝑖

with 𝑦𝑖 either 0 or 1

using OLS, we have a Linear Probability Model

𝑃𝑟𝑜𝑏(𝑦𝑖 = 1) = 𝛽0 + 𝛽1 𝑥𝑖 + 𝑢𝑖

The Linear Probability Model


We know from previous lectures about OLS that:

• We assume 𝐸(𝑢𝑖|𝑥𝑖) = 0

• so we can model 𝐸(𝑦𝑖|𝑥𝑖)

Therefore we can write our LPM as

𝐸(𝑦𝑖|𝑥𝑖) = 𝑝𝑟𝑜𝑏(𝑦𝑖 = 1|𝑥𝑖) = 𝛽0 + 𝛽1𝑥𝑖
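As a sketch, the LPM can be estimated with the usual OLS closed form from the earlier slide (𝛽̂1 = cov(𝑥, 𝑦)/var(𝑥)). The toy data below are hypothetical, chosen only for illustration; the fitted values also preview a limitation discussed later, namely that fitted "probabilities" can fall outside [0, 1].

```python
# Linear Probability Model via the OLS closed form: b1 = cov(x, y) / var(x).
# Hypothetical toy data: x is a continuous regressor, y is binary.
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 0, 1, 1, 1]
n = len(x)

mx = sum(x) / n
my = sum(y) / n
cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
var_x = sum((xi - mx) ** 2 for xi in x) / n

b1 = cov_xy / var_x    # slope: change in P(y = 1) per unit of x
b0 = my - b1 * mx      # intercept

fitted = [b0 + b1 * xi for xi in x]   # interpreted as P(y = 1 | x)
print(b0, b1, fitted)
# Note: the fitted "probability" at x = 0 is negative and at x = 5 exceeds 1,
# illustrating the boundedness problem of the LPM.
```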



LPM Interpretation
Suppose we have a more complete set of
independent variables:

𝐸(𝑦𝑖|𝑥𝑖) = 𝑝𝑟𝑜𝑏(𝑦𝑖 = 1|𝑥𝑖) = 𝛽0 + Σ 𝛽𝑖𝑥𝑖

• We cannot interpret our 𝛽𝑖's as usual, because 𝑦𝑖 changes ONLY from 0 to 1 (and vice versa)

LPM Interpretation
Suppose we have a more complete set of
independent variables:

𝐸(𝑦𝑖|𝑥) = 𝑝𝑟𝑜𝑏(𝑦𝑖 = 1|𝑥) = 𝛽0 + Σ 𝛽𝑖𝑥𝑖

• If 𝑥𝑖 is continuous:
  – “If 𝑥𝑖 increases/decreases by 1 (unit), the probability that 𝑦 = 1 increases/decreases by 𝛽𝑖 (i.e., 100 × 𝛽𝑖 percentage points)”

LPM Interpretation
Suppose we have a more complete set of
independent variables:

𝐸(𝑦𝑖|𝑥) = 𝑝𝑟𝑜𝑏(𝑦𝑖 = 1|𝑥) = 𝛽0 + Σ 𝛽𝑖𝑥𝑖

• If 𝑥𝑖 is a dummy variable (e.g. 1 = male):
  – “Suppose there are two individuals who are identical in every respect except that one is male and the other female; the probability that 𝑦 = 1 is 𝛽𝑖 (100 × 𝛽𝑖 percentage points) higher or lower for the male than for the female”


Limitations of LPM
• The error term does not follow a normal distribution, so test statistics are not robust

• Suppose
𝑦𝑖 = 𝛽0 + 𝛽1 𝑥1 + 𝑢𝑖
𝑢𝑖 = 𝑦𝑖 − 𝛽0 − 𝛽1 𝑥1

When 𝑦𝑖 = 1, 𝑢𝑖 = 1 − 𝛽0 − 𝛽1 𝑥1
When 𝑦𝑖 = 0, 𝑢𝑖 = −𝛽0 − 𝛽1 𝑥1

Limitations of LPM
• The error term does not follow a normal distribution, so test statistics are not robust

• Suppose:
  – The probability that 𝑢𝑖 = 1 − 𝛽0 − 𝛽1𝑥1 is 𝑃𝑖
  – and that 𝑢𝑖 = −𝛽0 − 𝛽1𝑥1 is (1 − 𝑃𝑖)

• The error term therefore follows a Bernoulli distribution


Limitations of LPM
• Heteroskedasticity

Since the error term follows a Bernoulli distribution, the variance of the error term is

𝜎𝑖² = 𝑃𝑖(1 − 𝑃𝑖) = (𝛽0 + 𝛽1𝑥1)(1 − 𝛽0 − 𝛽1𝑥1)

which depends on 𝑥1, so the errors are heteroskedastic.

Limitations of LPM
• Nonfulfillment of 0 ≤ 𝐸(𝑦𝑖|𝑥𝑖) ≤ 1: does it make sense to assume this probability is a linear function of 𝑥𝑖?

Having said that…


• The LPM is still widely used in empirical research, as long as we make sure its limitations are addressed

What is a “better” model for estimating E(yi)?

• Since the probability of an event has to be between 0 and 1, a good model would be a nonlinear function of x whose result never goes below 0 or above 1!

• A class of functions that we have already seen in statistics and that satisfies this requirement is the Cumulative Distribution Function (CDF)

What is a better model for estimating E(yi)?

[Figure: a probability density function (PDF) and the corresponding cumulative distribution function (CDF)]

What is a better model for E(yi)?


• We denote CDFs using the letter F:

𝐸(𝑦𝑖|𝑥) = 𝑝𝑟𝑜𝑏(𝑦𝑖 = 1|𝑥) = 𝐹(𝛽0 + Σ 𝛽𝑖𝑥𝑖)

where F is a CDF

• Therefore, to model a binary dependent variable we need to choose a CDF and an estimation method appropriate for estimating 𝛽0 and 𝛽1

Solution
• We need a function for 𝑦, or 𝐸(𝑦|𝑥), or 𝑃𝑟𝑜𝑏(𝑦 = 1), that always results in values between 0 and 1

• Whatever the values of the independent variables are (they can range from −∞ to +∞), the value of the dependent variable will be between 0 and 1

• In general:
𝑃(𝑦 = 1|𝑥) = 𝐹(𝛽0 + 𝛽1𝑥1 + 𝛽2𝑥2 + ⋯ + 𝛽𝑘𝑥𝑘)

Solution 1: Logit Model


𝑃(𝑦 = 1|𝑥) = 𝐹(𝛽0 + 𝛽1𝑥1 + 𝛽2𝑥2 + ⋯ + 𝛽𝑘𝑥𝑘) = 𝐹(𝑧𝑖)

F can be in the form of

𝑃𝑖 = 𝑃(𝑦 = 1|𝑥) = 1 / (1 + 𝑒^(−𝑧𝑖))

or, equivalently:

𝑃𝑖 / (1 − 𝑃𝑖) = 𝑒^(𝑧𝑖)

Solution 1: Logit Model


Logit Model:

Λ(𝑧𝑖) = 𝑃(𝑦 = 1|𝑥) = 1 / (1 + 𝑒^(−𝑧𝑖)) = 𝑒^(𝑧𝑖) / (1 + 𝑒^(𝑧𝑖))

Solution 1: Logit Model


Taking the log of both sides:

𝑙𝑜𝑔[𝑃𝑖 / (1 − 𝑃𝑖)] = 𝑧𝑖

Hence

𝐿𝑖 = 𝑙𝑜𝑔[𝑃𝑖 / (1 − 𝑃𝑖)] = 𝑧𝑖
• We call 𝐿𝑖 the logit; hence, the logit model
• We estimate the logit model using the Maximum Likelihood method
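The two identities above can be checked numerically with nothing beyond the standard library: the logistic CDF maps any 𝑧𝑖 into (0, 1), and the log of the odds recovers 𝑧𝑖. A minimal sketch:

```python
import math

def logistic(z):
    """Logit CDF: P_i = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

for z in [-10.0, -1.0, 0.0, 2.5, 10.0]:
    p = logistic(z)
    assert 0.0 < p < 1.0                    # probabilities stay inside (0, 1)
    odds = p / (1.0 - p)
    assert abs(math.log(odds) - z) < 1e-9   # the log-odds L_i equals z_i
print(logistic(0.0))
```

Note that z = 0 gives a probability of exactly 0.5: zero log-odds means even odds.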

Logit Model: Coefficients &


Marginal Effects
• Coefficients are not marginal effects (not directly interpretable)
  – Because of the non-linearity of the model

• Recall

𝐸(𝑦𝑖|𝑥) = 𝑃(𝑦𝑖 = 1|𝑥) = 1 / (1 + 𝑒^(−𝑧𝑖)) = 𝑒^(𝑧𝑖) / (1 + 𝑒^(𝑧𝑖)) = Λ(𝑧𝑖)

Logit Model: Coefficients &


Marginal Effects
𝐸(𝑦𝑖|𝑥) = 𝑃(𝑦𝑖 = 1|𝑥) = 𝑒^(𝑧𝑖) / (1 + 𝑒^(𝑧𝑖))

To get the marginal effect, we need to differentiate:

𝑑𝐸(𝑦𝑖|𝑥)/𝑑𝑥 = 𝑑Λ(𝑧𝑖)/𝑑𝑥 = [𝑒^(𝑧𝑖) / (1 + 𝑒^(𝑧𝑖))²] ∙ 𝛽𝑖
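The derivative above can be verified numerically. This sketch uses made-up coefficient values (b0, b1 and the evaluation point x are illustrative assumptions) and compares the analytic marginal effect e^z/(1 + e^z)² · β with a central finite difference:

```python
import math

b0, b1 = -1.0, 0.8   # hypothetical logit coefficients
x = 2.0              # point at which the marginal effect is evaluated

def prob(xv):
    """Logit probability P(y = 1 | x)."""
    z = b0 + b1 * xv
    return 1.0 / (1.0 + math.exp(-z))

z = b0 + b1 * x
# Analytic marginal effect: dLambda/dx = [e^z / (1 + e^z)^2] * beta
analytic = math.exp(z) / (1.0 + math.exp(z)) ** 2 * b1

h = 1e-6
numeric = (prob(x + h) - prob(x - h)) / (2 * h)   # central difference

assert abs(analytic - numeric) < 1e-6
print(analytic)
```

Because Λ(z)(1 − Λ(z)) peaks at z = 0, the marginal effect is largest where the predicted probability is 0.5 and shrinks in the tails.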

Solution 2: Probit Model


Suppose we have an equation:

𝑦* = 𝛽0 + 𝛽1𝑥1 + 𝛽2𝑥2 + ⋯ + 𝛽𝑘𝑥𝑘 + 𝑢𝑖

But 𝑦* is unobservable.

What we observe is actually 𝑦, which takes the value of 1 if 𝑦* > 0 and 0 otherwise

We assume that 𝑢𝑖 ~ 𝑁(0, 𝜎²)



Solution 2: Probit Model


Hence

𝑃(𝑦 = 1) = 𝑃(𝑦* > 0)
         = 𝑃(𝛽0 + 𝛽1𝑥1 + 𝛽2𝑥2 + ⋯ + 𝛽𝑘𝑥𝑘 + 𝑢𝑖 > 0)
         = 𝑃(𝑢𝑖 > −𝛽0 − 𝛽1𝑥1 − 𝛽2𝑥2 − ⋯ − 𝛽𝑘𝑥𝑘)
         = 𝑃(𝑢𝑖/𝜎 > (−𝛽0 − 𝛽1𝑥1 − 𝛽2𝑥2 − ⋯ − 𝛽𝑘𝑥𝑘)/𝜎)

The distribution of 𝑢𝑖/𝜎 is standard normal

Solution 2: Probit Model


Since the normal distribution is symmetric, we can write

𝑃(𝑦 = 1) = 𝑃(𝑢𝑖/𝜎 > (−𝛽0 − 𝛽1𝑥1 − 𝛽2𝑥2 − ⋯ − 𝛽𝑘𝑥𝑘)/𝜎)
         = 𝑃(𝑢𝑖/𝜎 < (𝛽0 + 𝛽1𝑥1 + 𝛽2𝑥2 + ⋯ + 𝛽𝑘𝑥𝑘)/𝜎)
         = Φ((𝛽0 + 𝛽1𝑥1 + 𝛽2𝑥2 + ⋯ + 𝛽𝑘𝑥𝑘)/𝜎)

and this may be estimated using ML

Probit Model: Coefficients &


Marginal Effects
• Coefficients are not marginal effects (not directly interpretable)
  – Because of the non-linearity of the model

• Recall

𝐸(𝑦𝑖|𝑥) = 𝑃(𝑦𝑖 = 1|𝑥) = Φ((𝛽0 + 𝛽1𝑥1 + 𝛽2𝑥2 + ⋯ + 𝛽𝑘𝑥𝑘)/𝜎)

Probit Model: Coefficients &


Marginal Effects
𝐸(𝑦𝑖|𝑥) = 𝑃(𝑦𝑖 = 1|𝑥) = Φ((𝛽0 + 𝛽1𝑥1 + 𝛽2𝑥2 + ⋯ + 𝛽𝑘𝑥𝑘)/𝜎)

To get the marginal effect, we need to differentiate:

𝑑𝐸(𝑦𝑖|𝑥)/𝑑𝑥𝑖 = 𝜙((𝛽0 + 𝛽1𝑥1 + 𝛽2𝑥2 + ⋯ + 𝛽𝑘𝑥𝑘)/𝜎) ∙ 𝛽𝑖/𝜎

where 𝜙 is the standard normal density (the derivative of the CDF Φ)
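Φ and φ can be evaluated with only the standard library (Φ via the error function). The coefficients and σ below are hypothetical, chosen for illustration; the check confirms the marginal-effect formula against a numerical derivative:

```python
import math

def Phi(t):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

b0, b1, sigma = -0.5, 0.4, 1.0   # hypothetical probit parameters
x = 1.5                          # evaluation point

def prob(xv):
    return Phi((b0 + b1 * xv) / sigma)

z = (b0 + b1 * x) / sigma
analytic = phi(z) * b1 / sigma                      # marginal effect formula
numeric = (prob(x + 1e-6) - prob(x - 1e-6)) / 2e-6  # central difference

assert abs(analytic - numeric) < 1e-6
print(prob(x), analytic)
```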

Logit or Probit?

Gender Inequality and Poverty in Indonesia:


Evidence from Household Data
Kinanti Z. Patria

Jeffrey Aron Natan, 2019



Estimation of Logit and Probit Models


• We do not use OLS; rather, we use the Maximum Likelihood method

• The maximum likelihood estimates (MLE) of the unknown parameters are the values of the parameters that maximize the likelihood function

• Likelihood function: the joint probability of the


observed sample implied by the model
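As a sketch of what "maximizing the likelihood" means for the logit model, the snippet below fits 𝛽0 and 𝛽1 by simple gradient ascent on the log-likelihood of a small made-up sample. The data, learning rate, and iteration count are all illustrative assumptions; statistical packages typically use Newton-type methods instead, but the maximized object is the same.

```python
import math

# Hypothetical, non-separable sample
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]

def p(b0, b1, x):
    """Logit probability for a single observation."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

b0, b1 = 0.0, 0.0
lr = 0.05
for _ in range(50000):
    # Gradient of the log-likelihood: sum_i (y_i - p_i) * (1, x_i)
    g0 = sum(y - p(b0, b1, x) for x, y in zip(xs, ys))
    g1 = sum((y - p(b0, b1, x)) * x for x, y in zip(xs, ys))
    b0 += lr * g0
    b1 += lr * g1

# At the maximum, the gradient is (approximately) zero
g0 = sum(y - p(b0, b1, x) for x, y in zip(xs, ys))
g1 = sum((y - p(b0, b1, x)) * x for x, y in zip(xs, ys))
print(b0, b1, g0, g1)
```

Because the logit log-likelihood is globally concave, this slow-but-simple ascent reaches the same optimum a Newton step would.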

Extension

MAXIMUM LIKELIHOOD
ESTIMATOR

Maximum Likelihood Estimator


• Remember that our data are random variables
  – They follow a certain probability density function (pdf) or probability distribution
• Suppose we have 5 observations of variable Y
  𝑌 = {15, 21, 30, 45, 50}
  – What are the odds that we would obtain these observations from a normal distribution with 𝜇 = 100?

Maximum Likelihood Estimator


• Remember that our data are random variables
  – They follow a certain probability density function (pdf) or probability distribution
• Suppose we have 5 observations of variable Y
  𝑌 = {15, 21, 30, 45, 50}
  – What are the odds that we would obtain these observations from a normal distribution with 𝜇 = 100? With 𝜇 = 35?

Maximum Likelihood Estimator


• “Maximum Likelihood is just a systematic way
of searching for the parameter values of our
chosen distribution that maximize the
probability of observing the data we observe”
(Matt Golder, 2013)
Copyright Christopher Dougherty 2012.

These slideshows may be downloaded by anyone, anywhere for personal use.


Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for
teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section R.2 of C. Dougherty, Introduction to Econometrics,
fourth edition 2011, Oxford University Press.
Additional (free) resources for both students and instructors may be downloaded from the OUP Online
Resource Centre
http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own who feel that they might benefit from participation in a
formal course should consider the London School of Economics summer school course
EC212 Introduction to Econometrics
http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx
or the University of London International Programmes distance learning course
EC2020 Elements of Econometrics
www.londoninternational.ac.uk/lse.

Method of ML
• The method of maximum likelihood is
intuitively appealing, because we attempt to
find the values of the true parameters that
would have most likely produced the data that
we in fact observed.
• For most cases of practical interest, the
performance of maximum likelihood
estimators is optimal for large enough data.
• This sequence introduces the principle of maximum likelihood estimation and illustrates it with some simple examples.

• Suppose that you have a normally-distributed random variable X with unknown population mean μ and standard deviation σ, and that you have a sample of two observations, 4 and 6. For the time being, we will assume that σ is equal to 1.

Normal Distribution

f(x) = (1/(σ√(2π))) e^(−½((x − μ)/σ)²)

This is a bell-shaped curve with different centers and spreads depending on μ and σ. Note the constants: π = 3.14159 and e = 2.71828.

Suppose initially you consider the hypothesis μ = 3.5. Under this hypothesis the probability density at 4 would be 0.3521 and that at 6 would be 0.0175. The joint probability density, the product of these, is 0.0062.

Next consider the hypothesis μ = 4.0. Under this hypothesis the probability densities associated with the two observations are 0.3989 and 0.0540, and the joint probability density is 0.0215.

Next, under the hypothesis μ = 4.5, the probability densities are 0.3521 and 0.1295, and the joint probability density is 0.0456.

Under the hypothesis μ = 5.0, the probability densities are both 0.2420 and the joint probability density is 0.0585.

Under the hypothesis μ = 5.5, the probability densities are 0.1295 and 0.3521, and the joint probability density is 0.0456.

    μ     p(4)    p(6)    L
    3.5   0.3521  0.0175  0.0062
    4.0   0.3989  0.0540  0.0215
    4.5   0.3521  0.1295  0.0456
    5.0   0.2420  0.2420  0.0585
    5.5   0.1295  0.3521  0.0456

The complete joint density function L, plotted against all values of μ, peaks at μ = 5.
Now we will look at the mathematics of the example. If X is normally distributed with mean μ and standard deviation σ, its density function is

f(X) = (1/(σ√(2π))) e^(−½((X − μ)/σ)²)

For the time being, we are assuming σ is equal to 1, so the density function simplifies to

f(X) = (1/√(2π)) e^(−½(X − μ)²)

Hence we obtain the probability densities for the observations where X = 4 and X = 6:

f(4) = (1/√(2π)) e^(−½(4 − μ)²)        f(6) = (1/√(2π)) e^(−½(6 − μ)²)

The joint probability density for the two observations in the sample is just the product of their individual densities:

joint density = [(1/√(2π)) e^(−½(4 − μ)²)] × [(1/√(2π)) e^(−½(6 − μ)²)]

In maximum likelihood estimation we choose as our estimate of μ the value that gives us the greatest joint density for the observations in our sample. This value is associated with the greatest probability, or maximum likelihood, of obtaining the observations in the sample.
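The example can be reproduced numerically with a simple grid search: for each candidate μ (with σ = 1) compute the joint density f(4)·f(6) and pick the μ with the greatest likelihood. The grid bounds and step size are arbitrary choices for illustration.

```python
import math

def density(x, mu):
    """Normal pdf with sigma = 1."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

best_mu, best_L = None, -1.0
for i in range(300, 701):   # mu from 3.00 to 7.00 in steps of 0.01
    mu = i / 100.0
    L = density(4.0, mu) * density(6.0, mu)   # joint density of the sample
    if L > best_L:
        best_mu, best_L = mu, L

print(best_mu, best_L)   # the likelihood peaks at mu = 5, the sample mean
```

This matches the table above: the maximized joint density is about 0.0585, attained at μ = 5.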

MAXIMUM LIKELIHOOD ESTIMATION OF REGRESSION COEFFICIENTS

MLE AND REGRESSION ANALYSIS

[Figure: the conditional distribution of Y around the regression line β1 + β2Xi]

Potential values of Y close to β1 + β2Xi will have relatively large densities, while potential values of Y relatively far from β1 + β2Xi will have small ones.

The mean value of the distribution of Yi is β1 + β2Xi. Its standard deviation is σ, the standard deviation of the disturbance term.

Hence the density function for the ex ante distribution of Yi is

f(Yi) = (1/(σ√(2π))) e^(−½((Yi − β1 − β2Xi)/σ)²)
The joint density function for the observations on Y is the product of their individual densities:

f(Y1) × … × f(Yn) = (1/(σ√(2π))) e^(−½((Y1 − β1 − β2X1)/σ)²) × … × (1/(σ√(2π))) e^(−½((Yn − β1 − β2Xn)/σ)²)

Now, taking β1, β2 and σ as our choice variables, and taking the data on Y and X as given, we can re-interpret this function as the likelihood function for β1, β2, and σ:

L(β1, β2, σ | Y1, …, Yn) = (1/(σ√(2π))) e^(−½((Y1 − β1 − β2X1)/σ)²) × … × (1/(σ√(2π))) e^(−½((Yn − β1 − β2Xn)/σ)²)

REMEMBER THIS
We will choose β1, β2, and σ so as to maximize the likelihood, given the data on Y and X. As usual, it is easier to do this indirectly, maximizing the log-likelihood instead:

log L = log[ (1/(σ√(2π))) e^(−½((Y1 − β1 − β2X1)/σ)²) × … × (1/(σ√(2π))) e^(−½((Yn − β1 − β2Xn)/σ)²) ]

As usual, the first step is to decompose the expression as the sum of the logarithms of the factors:

log L = log[(1/(σ√(2π))) e^(−½((Y1 − β1 − β2X1)/σ)²)] + … + log[(1/(σ√(2π))) e^(−½((Yn − β1 − β2Xn)/σ)²)]

Then we split the logarithm of each factor into two components. The first component is the same in each case, so the log-likelihood simplifies to

log L = n log(1/(σ√(2π))) − (1/(2σ²)) Z

where Z = (Y1 − β1 − β2X1)² + … + (Yn − β1 − β2Xn)²
To maximize the log-likelihood, we need to minimize Z. But choosing estimators of β1 and β2 to minimize Z is exactly what we did when we derived the least squares regression coefficients. Thus, for this regression model, the maximum likelihood estimators of β1 and β2 are identical to the least squares estimators.
log L = n log(1/(σ√(2π))) − (1/(2σ²)) Z

where Z = (Y1 − β1 − β2X1)² + … + (Yn − β1 − β2Xn)² = Σ ei², with ei = Yi − β̂1 − β̂2Xi

As a consequence, Z will be the sum of the squares of the least squares residuals.
 1  s
2
log L  n log  Z
 s 2  2
 
1  1  s 2
 n log   n log  Z
s   2  2
 1  s
2
  n log s  n log  Z
 2  2

To obtain the maximum likelihood estimator of s, it is convenient to rearrange the log-likelihood function
as shown.

19
 1  s
2
log L  n log  Z
 s 2  2
 
1  1  s 2
 n log   n log  Z
s   2  2
 1  s
2
  n log s  n log  Z
 2  2
 log L
   s  3 Z  s  3 Z  ns 2 
n
s s

Differentiating it with respect to s, we obtain the expression shown.

20
 1  s
2
log L  n log  Z
 s 2  2
 
1  1  s 2
 n log   n log  Z
s   2  2
 1  s
2
  n log s  n log  Z
 2  2
 log L
   s  3 Z  s  3 Z  ns 2 
n
s s

ŝ 2    i
2
Z e
n n

The first order condition for a maximum requires this to be equal to zero. Hence the maximum likelihood
estimator of the variance is the sum of the squares of the residuals divided by n.

21
 1  s
2
log L  n log  Z
 s 2  2
 
1  1  s 2
 n log   n log  Z
s   2  2
 1  s
2
  n log s  n log  Z
 2  2
 log L
   s  3 Z  s  3 Z  ns 2 
n
s s

ŝ 2    i
2
Z e
n n

Note that this is biased for finite samples. To obtain an unbiased estimator, we should divide by n–k,
where k is the number of parameters, in this case 2. However, the bias disappears as the sample size
becomes large.
22
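A small numeric check of these conclusions, on made-up data: the (β1, β2) that maximize the log-likelihood are the OLS estimates, and the ML variance estimator divides the residual sum of squares by n (versus n − 2 for the unbiased estimator). All data values are hypothetical.

```python
import math

# Hypothetical data
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 2.9, 4.2, 4.8, 6.1]
n = len(X)

# OLS estimates (which the derivation shows are also the ML estimates)
mx, my = sum(X) / n, sum(Y) / n
b2 = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / sum((x - mx) ** 2 for x in X)
b1 = my - b2 * mx
Z = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))   # residual sum of squares

sigma2_ml = Z / n          # ML estimator of the variance (biased)
sigma2_unb = Z / (n - 2)   # unbiased estimator divides by n - k, with k = 2

def loglik(a, b, s2):
    """Normal-regression log-likelihood at intercept a, slope b, variance s2."""
    return sum(-0.5 * math.log(2 * math.pi * s2) - (y - a - b * x) ** 2 / (2 * s2)
               for x, y in zip(X, Y))

base = loglik(b1, b2, sigma2_ml)
# Perturbing the coefficients away from OLS can only lower the log-likelihood...
assert base >= loglik(b1 + 0.1, b2, sigma2_ml)
assert base >= loglik(b1, b2 - 0.1, sigma2_ml)
# ...and so can perturbing the variance away from Z/n
assert base >= loglik(b1, b2, sigma2_ml * 1.2)
print(b1, b2, sigma2_ml, sigma2_unb)
```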
