04 LDV
Microeconometrics:
Binary Dependent Variable
Department of Economics
Universitas Padjadjaran
2019
Additional References
• Dougherty, C., Introduction to Econometrics, 4th ed., Oxford University Press, 2011 (best for the basics)
• Golder, M., Advanced Quantitative Analysis: Maximum Likelihood Estimation, https://files.nyu.edu/mrg217/public/homepage.htm
The mechanism
Suppose
$$U_{is} = \beta_0 + \beta_1 x_i + u_i$$
so we estimate
$$y_i = \beta_0 + \beta_1 x_i + u_i$$
For a binary $y_i$,
$$E(y_i) = \sum y_i \Pr(y_i) = \left[1 \times \Pr(y_i = 1)\right] + \left[0 \times \Pr(y_i = 0)\right] = \Pr(y_i = 1)$$
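A quick numerical check of this identity (a minimal sketch; the binary variable and its success probability of 0.3 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.3, size=10_000)  # hypothetical binary outcome with Prob(y = 1) = 0.3

# For a 0/1 variable the sample mean estimates E(y) = Prob(y = 1)
print(y.mean())  # approximately 0.3
```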
If we estimate
$$y_i = \beta_0 + \beta_1 x_i + u_i$$
where $y_i$ is either 0 or 1, and we assume $E(u_i \mid x_i) = 0$, then
$$\Pr(y_i = 1 \mid x_i) = \beta_0 + \beta_1 x_i$$
This is the linear probability model (LPM).
LPM Interpretation
Suppose we have a more complete set of independent variables:
$$y_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u_i$$
• If $x_j$ is continuous:
– "If $x_j$ increases/decreases by 1 (unit), the probability that $y = 1$ increases/decreases by $\beta_j$, i.e. by $100 \cdot \beta_j$ percentage points" (see the sketch below)
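To make this concrete, here is a minimal sketch of fitting an LPM by OLS (simulated data; the coefficients 0.5 and 1.0 are arbitrary). Heteroskedasticity-robust standard errors are used, for the reason discussed under the limitations below:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1_000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.5 + 1.0 * x)))  # true P(y=1|x), logistic for illustration
y = rng.binomial(1, p)

X = sm.add_constant(x)
# Heteroskedasticity-robust (HC1) standard errors, since LPM errors are heteroskedastic
lpm = sm.OLS(y, X).fit(cov_type="HC1")
print(lpm.params)  # slope = change in P(y = 1) per one-unit change in x
```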
Limitations of LPM
• The error term does not follow a normal distribution, so the usual test statistics are unreliable.
• Suppose
$$y_i = \beta_0 + \beta_1 x_1 + u_i \quad\Rightarrow\quad u_i = y_i - \beta_0 - \beta_1 x_1$$
When $y_i = 1$: $u_i = 1 - \beta_0 - \beta_1 x_1$
When $y_i = 0$: $u_i = -\beta_0 - \beta_1 x_1$
• The error can therefore take only two values. If the probability that $u_i = 1 - \beta_0 - \beta_1 x_1$ is $P_i$, and the probability that $u_i = -\beta_0 - \beta_1 x_1$ is $(1 - P_i)$, then $u_i$ follows a two-point (Bernoulli-type) distribution, not a normal one.
• Heteroskedasticity: since $u_i$ takes the two values above with probabilities $P_i$ and $(1 - P_i)$, its variance is $P_i(1 - P_i)$, which changes with $x_i$.
• Nonfulfillment of $0 \le E(y_i \mid x_i) \le 1$: does it make sense to assume this probability is a linear function of $x_i$? Fitted values can fall below 0 or above 1, as the sketch below illustrates.
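A minimal sketch (simulated data with a deliberately steep relationship, an assumption for illustration) showing LPM fitted values escaping the unit interval:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=1_000)
p = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))  # deliberately steep relationship
y = rng.binomial(1, p)

lpm = sm.OLS(y, sm.add_constant(x)).fit()
fitted = lpm.fittedvalues
# LPM fitted "probabilities" are a linear function of x, so they escape [0, 1]
print("below 0:", (fitted < 0).sum(), " above 1:", (fitted > 1).sum())
```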
[Figure: a probability density function (PDF) and the corresponding cumulative distribution function (CDF); the CDF is bounded between 0 and 1.]
Solution
• We need a mathematical function for $y$, or $E(y \mid x)$, or $\Pr(y = 1)$, that always yields values between 0 and 1.
• In general:
$$P(y = 1 \mid x) = F(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k)$$
where $F$ is a CDF.
• Therefore, with $z_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$ and $F$ the logistic CDF:
$$E(y_i \mid x) = P(y_i = 1 \mid x) = \frac{1}{1 + e^{-z_i}} = \frac{e^{z_i}}{1 + e^{z_i}} = \Lambda(z_i)$$
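A quick numerical confirmation that the two expressions for $\Lambda(z)$ coincide and stay strictly between 0 and 1:

```python
import numpy as np

z = np.linspace(-5, 5, 11)
lam1 = 1 / (1 + np.exp(-z))         # 1 / (1 + e^{-z})
lam2 = np.exp(z) / (1 + np.exp(z))  # e^{z} / (1 + e^{z})
print(np.allclose(lam1, lam2))      # True: the two expressions are identical
print(lam1.min(), lam1.max())       # always strictly between 0 and 1
```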
• Marginal effect of $x_j$:
$$\frac{\partial E(y_i \mid x)}{\partial x_j} = \frac{d\Lambda(z_i)}{dz_i}\,\beta_j = \frac{e^{z_i}}{(1 + e^{z_i})^2}\,\beta_j = \Lambda(z_i)\left(1 - \Lambda(z_i)\right)\beta_j$$
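A sketch of this marginal effect in practice (simulated data, arbitrary true coefficients; statsmodels' Logit and get_margeff compute the effect at the sample mean of x):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=2_000)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.0 * x))))

X = sm.add_constant(x)
logit = sm.Logit(y, X).fit(disp=0)

# Hand-computed marginal effect at the mean: Lambda(z)(1 - Lambda(z)) * beta
z_bar = logit.params @ X.mean(axis=0)
lam = 1 / (1 + np.exp(-z_bar))
print(lam * (1 - lam) * logit.params[1])

print(logit.get_margeff(at="mean").margeff)  # statsmodels computes the same quantity
```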
• Consider a latent variable
$$y^* = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u_i$$
But $y^*$ is unobservable; we only observe $y = 1$ when $y^* > 0$ (and $y = 0$ otherwise). Then
$$\begin{aligned}
P(y = 1) &= P(y^* > 0) \\
&= P(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u_i > 0) \\
&= P(u_i > -\beta_0 - \beta_1 x_1 - \beta_2 x_2 - \dots - \beta_k x_k) \\
&= P\!\left(\frac{u_i}{\sigma} > \frac{-\beta_0 - \beta_1 x_1 - \beta_2 x_2 - \dots - \beta_k x_k}{\sigma}\right)
\end{aligned}$$
The distribution of $u_i / \sigma$ is standard normal.
By symmetry of the standard normal distribution,
$$P(y = 1) = \Phi\!\left(\frac{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k}{\sigma}\right)$$
and this may be estimated using ML. (Since only $\beta/\sigma$ is identified, $\sigma$ is conventionally normalized to 1.)
• Therefore
$$E(y_i \mid x) = P(y_i = 1 \mid x) = \Phi\!\left(\frac{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k}{\sigma}\right)$$
• Marginal effect of $x_j$ (with $\sigma$ normalized to 1):
$$\frac{\partial E(y_i \mid x)}{\partial x_j} = \phi(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)\,\beta_j$$
where $\phi$ is the standard normal PDF.
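The analogous probit sketch (simulated data from a latent-variable model with σ = 1, coefficients arbitrary); the hand-computed φ(z̄)·β matches statsmodels' marginal effect at the mean:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(4)
x = rng.normal(size=2_000)
y = (0.5 + 1.0 * x + rng.normal(size=2_000) > 0).astype(int)  # latent-variable model, sigma = 1

X = sm.add_constant(x)
probit = sm.Probit(y, X).fit(disp=0)

# Hand-computed marginal effect at the mean: phi(z) * beta
z_bar = probit.params @ X.mean(axis=0)
print(norm.pdf(z_bar) * probit.params[1])

print(probit.get_margeff(at="mean").margeff)  # same value from statsmodels
```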
Logit or Probit?
• In practice the two models give very similar fitted probabilities; the main difference is scale (logit coefficients are roughly 1.6 times probit coefficients, reflecting the different error variances), so the choice is largely a matter of convention.
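A minimal side-by-side comparison on simulated data (all numbers illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=5_000)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.0 * x))))
X = sm.add_constant(x)

logit = sm.Logit(y, X).fit(disp=0)
probit = sm.Probit(y, X).fit(disp=0)

print(logit.params / probit.params)  # coefficient ratios, typically around 1.6-1.8
# ... but the fitted probabilities are nearly indistinguishable:
print(np.max(np.abs(logit.predict(X) - probit.predict(X))))
```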
Extension
MAXIMUM LIKELIHOOD ESTIMATOR
The content of this slideshow comes from Section R.2 of C. Dougherty, Introduction to Econometrics, fourth edition, 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre:
http://www.oup.com/uk/orc/bin/9780199567089/.
Method of ML
• The method of maximum likelihood is intuitively appealing: we attempt to find the values of the parameters that would have been most likely to produce the data that we in fact observed.
• For most cases of practical interest, maximum likelihood estimators have optimal properties in large samples (they are consistent and asymptotically efficient).
• This sequence introduces the principle of maximum likelihood estimation and illustrates it with some simple examples.
• Suppose that you have a normally distributed random variable X with unknown population mean μ and standard deviation σ, and that you have a sample of two observations, 4 and 6. For the time being, we will assume that σ is equal to 1.
[Figure: top panel, the density p of X; bottom panel, the likelihood L as a function of μ.]
Normal Distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$
This is a bell-shaped curve with different centers and spreads depending on μ and σ.
Note the constants: π = 3.14159…, e = 2.71828…
• Suppose initially you consider the hypothesis μ = 3.5. Under this hypothesis the probability density at 4 would be 0.3521 and that at 6 would be 0.0175.

  μ     p(4)     p(6)
  3.5   0.3521   0.0175
• The joint probability density, shown in the bottom chart, is the product of these: 0.0062.

  μ     p(4)     p(6)     L
  3.5   0.3521   0.0175   0.0062
• Next consider the hypothesis μ = 4.0. Under this hypothesis the probability densities associated with the two observations are 0.3989 and 0.0540, and the joint probability density is 0.0215.

  μ     p(4)     p(6)     L
  3.5   0.3521   0.0175   0.0062
  4.0   0.3989   0.0540   0.0215
• Next, under the hypothesis μ = 4.5, the probability densities are 0.3521 and 0.1295, and the joint probability density is 0.0456.

  μ     p(4)     p(6)     L
  3.5   0.3521   0.0175   0.0062
  4.0   0.3989   0.0540   0.0215
  4.5   0.3521   0.1295   0.0456
• Under the hypothesis μ = 5.0, both probability densities are 0.2420 and the joint probability density is 0.0585. Under the hypothesis μ = 5.5, the probability densities are 0.1295 and 0.3521 and the joint probability density is 0.0456.

  μ     p(4)     p(6)     L
  3.5   0.3521   0.0175   0.0062
  4.0   0.3989   0.0540   0.0215
  4.5   0.3521   0.1295   0.0456
  5.0   0.2420   0.2420   0.0585
  5.5   0.1295   0.3521   0.0456
• The complete joint density function for all values of μ has now been plotted in the lower diagram. We see that it peaks at μ = 5.

  μ     p(4)     p(6)     L
  3.5   0.3521   0.0175   0.0062
  4.0   0.3989   0.0540   0.0215
  4.5   0.3521   0.1295   0.0456
  5.0   0.2420   0.2420   0.0585
  5.5   0.1295   0.3521   0.0456
[Figure: the likelihood L(μ) peaks at μ = 5.]
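The whole table can be reproduced in a few lines; this sketch evaluates the joint density of the sample {4, 6} on the same grid of hypothesized values of μ:

```python
import numpy as np
from scipy.stats import norm

sample = np.array([4.0, 6.0])
mus = np.arange(3.5, 5.51, 0.5)  # the hypothesised values of mu from the table

# Joint density of the sample under each hypothesis (sigma = 1)
L = np.array([norm.pdf(sample, loc=mu, scale=1).prod() for mu in mus])
for mu, joint in zip(mus, L):
    print(f"mu = {mu:.1f}   L = {joint:.4f}")
print("likelihood peaks at mu =", mus[L.argmax()])  # 5.0, the sample mean
```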
Now we will look at the mathematics of the example. If X is normally distributed with mean μ and standard deviation σ, its density function is
$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X - \mu}{\sigma}\right)^2}$$
For the time being, we are assuming σ is equal to 1, so the density function simplifies to
$$f(X) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X - \mu)^2}$$
Hence we obtain the probability densities for the observations X = 4 and X = 6:
$$f(4) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4 - \mu)^2} \qquad f(6) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6 - \mu)^2}$$
The joint probability density for the two observations in the sample is just the product of their individual densities:
$$\text{joint density} = \left[\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4 - \mu)^2}\right]\left[\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6 - \mu)^2}\right]$$
In maximum likelihood estimation we choose as our estimate of μ the value that gives us the greatest joint density for the observations in our sample. This value is associated with the greatest probability, or maximum likelihood, of obtaining the observations in the sample.
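To make the peak at μ = 5 explicit, set the derivative of the log-likelihood to zero (a short derivation consistent with the σ = 1 density above):

```latex
\log L(\mu) = \log f(4) + \log f(6)
            = -\log(2\pi) - \tfrac{1}{2}(4-\mu)^{2} - \tfrac{1}{2}(6-\mu)^{2}
\frac{d \log L}{d\mu} = (4-\mu) + (6-\mu) = 10 - 2\mu = 0
\quad\Longrightarrow\quad \hat{\mu} = 5
```

So the maximum likelihood estimate is the sample mean, matching the peak in the chart.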
Now apply the same idea to the regression model.
[Figure: the regression line β1 + β2Xi, with the conditional distribution of Y centered on the line at Xi.]
Potential values of Y close to β1 + β2Xi will have relatively large densities, while potential values of Y relatively far from β1 + β2Xi will have small ones. The mean of the distribution of Yi is β1 + β2Xi; its standard deviation is σ, the standard deviation of the disturbance term. Hence the density of Yi is
$$f(Y_i) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_i - \beta_1 - \beta_2 X_i}{\sigma}\right)^2}$$
The joint density function for the observations on Y is the product of their individual densities:
$$f(Y_1) \cdots f(Y_n) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_1 - \beta_1 - \beta_2 X_1}{\sigma}\right)^2} \times \cdots \times \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_n - \beta_1 - \beta_2 X_n}{\sigma}\right)^2}$$
Now, taking β1, β2, and σ as our choice variables, and taking the data on Y and X as given, we can re-interpret this function as the likelihood function for β1, β2, and σ:
$$L(\beta_1, \beta_2, \sigma \mid Y_1, \dots, Y_n) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_1 - \beta_1 - \beta_2 X_1}{\sigma}\right)^2} \times \cdots \times \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_n - \beta_1 - \beta_2 X_n}{\sigma}\right)^2}$$
REMEMBER THIS
We will choose β1, β2, and σ so as to maximize the likelihood, given the data on Y and X. As usual, it is easier to do this indirectly, maximizing the log-likelihood instead:
$$\log L = \log\left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_1 - \beta_1 - \beta_2 X_1}{\sigma}\right)^2} \times \cdots \times \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_n - \beta_1 - \beta_2 X_n}{\sigma}\right)^2}\right)$$
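This maximization can also be done numerically. A sketch (simulated data; the true values β1 = 1, β2 = 2, σ = 1.5 are arbitrary) confirming that the ML estimates coincide with OLS, as derived below:

```python
import numpy as np
from scipy.optimize import minimize
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=200)

def neg_loglik(theta):
    b1, b2, log_s = theta        # parametrise sigma = exp(log_s) to keep it positive
    s = np.exp(log_s)
    resid = y - b1 - b2 * x
    return -np.sum(-np.log(s) - 0.5 * np.log(2 * np.pi) - 0.5 * (resid / s) ** 2)

res = minimize(neg_loglik, x0=[0.0, 0.0, 0.0])
print(res.x[:2])                                   # ML estimates of beta1, beta2
print(sm.OLS(y, sm.add_constant(x)).fit().params)  # identical to the OLS estimates
```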
As usual, the first step is to decompose the expression as the sum of the logarithms of the factors:
$$\log L = \log\left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_1 - \beta_1 - \beta_2 X_1}{\sigma}\right)^2}\right) + \cdots + \log\left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_n - \beta_1 - \beta_2 X_n}{\sigma}\right)^2}\right)$$
Then we split the logarithm of each factor into two components. The first component is the same in each case:
$$\log L = n \log\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac{1}{2}\left(\frac{Y_1 - \beta_1 - \beta_2 X_1}{\sigma}\right)^2 - \cdots - \frac{1}{2}\left(\frac{Y_n - \beta_1 - \beta_2 X_n}{\sigma}\right)^2 = n \log\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac{1}{2\sigma^2} Z$$
where $Z = (Y_1 - \beta_1 - \beta_2 X_1)^2 + \cdots + (Y_n - \beta_1 - \beta_2 X_n)^2$.
To maximize the log-likelihood, we need to minimize Z. But choosing estimators of β1 and β2 to minimize Z is exactly what we did when we derived the least squares regression coefficients.
Thus, for this regression model, the maximum likelihood estimators of β1 and β2 are identical to the least squares estimators.
$$\log L = n \log\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac{1}{2\sigma^2} Z, \qquad Z = \sum e_i^2 \text{ where } e_i = Y_i - b_1 - b_2 X_i$$
As a consequence, Z will be the sum of the squares of the least squares residuals.
To obtain the maximum likelihood estimator of σ, it is convenient to rearrange the log-likelihood function as shown:
$$\log L = n \log\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac{1}{2\sigma^2} Z = n \log\frac{1}{\sigma} + n \log\frac{1}{\sqrt{2\pi}} - \frac{1}{2\sigma^2} Z = -n \log \sigma - n \log \sqrt{2\pi} - \frac{1}{2\sigma^2} Z$$
Differentiating with respect to σ:
$$\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \sigma^{-3} Z = \sigma^{-3}\left(Z - n\sigma^2\right)$$
The first-order condition for a maximum requires this to be equal to zero. Hence the maximum likelihood estimator of the variance is the sum of the squares of the residuals divided by n:
$$\hat{\sigma}^2 = \frac{Z}{n} = \frac{\sum e_i^2}{n}$$
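A numerical check of this estimator (simulated data, arbitrary true parameters; statsmodels' mse_resid reports the unbiased n − k version):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=100)

fit = sm.OLS(y, sm.add_constant(x)).fit()
Z = (fit.resid ** 2).sum()
print(Z / len(y))        # ML estimate of sigma^2: divides by n (biased)
print(Z / (len(y) - 2))  # unbiased estimate: divides by n - k, k = 2
print(fit.mse_resid)     # statsmodels reports the unbiased version
```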
Note that this estimator is biased in finite samples. To obtain an unbiased estimator, we should divide by n − k, where k is the number of parameters, in this case 2. However, the bias disappears as the sample size becomes large.