MicroEconometrics Lecture10

This document discusses limited dependent variable models used in econometrics. It begins by introducing binary choice models, where the dependent variable can take on only two values (usually 0 and 1). It then discusses the linear probability model (LPM) and some of its limitations, such as predicting probabilities outside the valid 0-1 range. As an alternative, it proposes binary response models using an index function and a cumulative distribution function like the logit or probit models. The document concludes by introducing maximum likelihood estimation as the preferred approach for estimating these types of non-linear models.

Economics 440.618 Microeconometrics
Michael T. Sandfort
Department of Applied Economics
The Johns Hopkins University
November 19, 2013
Limited Dependent Variable Models

The models we have looked at so far are appropriate if y is a continuous quantitative variable, e.g., demand, price, output, wage rate, etc.

But there are many interesting economic decisions which are tough to characterize continuously:

- For a worker: accept or reject a job offer?
- For a firm: enter (if out) or exit (if in) a market?
- For a woman: whether to have a child? If so, how many?
- For a consumer: which brand of peanut butter to buy?

Today, we are going to start discussing the econometrics of this kind of data, starting with a very simple model: the binary choice model.

The binary choice model is a good entry point to talking about more complicated limited dependent variable models, which is where we will end the course.
Binary Response Model

We say that a dependent variable y we want to model is binary if it takes on only two values. We typically recode these two values as 1 (yes, true, success, accepted, survived, etc.) and 0 (no, false, failure, rejected, died, etc.).

Whenever the variable we want to model is binary, it's natural to think in terms of probabilities.

- What is the probability that an individual with characteristics x owns a home?
- If the person's characteristics were x′ rather than x, how would that affect the probability that they own a home?

Data for such dependent variables looks just like a dummy variable that one might use as an explanatory variable in a regression.

But the implications of having it on the left-hand side of our structural equation rather than the right-hand side are significant.
Binary Response Model

Assuming we have a random sample, the sample mean of our binary variable is an unbiased estimate of the unconditional probability of success (y = 1). That is,

  E\left[\frac{1}{n}\sum_i y_i\right] = \frac{1}{n}\, n\, E(y) = E(y)
                                      = 1 \cdot \Pr(y = 1) + 0 \cdot \Pr(y = 0)
                                      = \Pr(y = 1)

Estimating the unconditional probability of success is fine, but it won't allow us to explore many interesting policy questions.

- "What is the overall rate of home ownership?" vs. "What would be the effect on the home ownership rate of rescinding the home mortgage interest tax deduction?"

Can we use OLS to address a problem like this?


OLS Estimation: The Linear Probability Model (LPM)

A naive way to approach this problem is to simply apply the classical regression model, so

  y = x\beta + u

with OLS.1 and OLS.2.

Using the same logic as on the previous slide, we know that

  \Pr(y = 1 \mid x) = E(y \mid x) = x\beta.

Since probabilities must sum to one, it is also true that

  \Pr(y = 0 \mid x) = 1 - \Pr(y = 1 \mid x) = 1 - x\beta.

This is a binary response model where the probability of success is a linear function of x, hence the name.

In the linear probability model, the incremental change in probability due to a discrete change \Delta x_j in x_j is

  \Delta \Pr(y = 1 \mid x) = \beta_j \Delta x_j.
LPM: An Example
Consider again the data on women's wages we've looked at before. Rather than dropping the zero-hours observations, we'll try to directly model the labor force participation decision.
Our dependent variable will be y = inlf, which takes the value 1 if the woman was in the labor force and 0 otherwise. We fit a model of labor force participation:
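The slides don't show how res was produced; a plausible sketch, assuming the Mroz data sit in a data frame named mroz (an assumption, not shown in the original) and that coeftest comes from the lmtest package:
> # Sketch only: the data frame name (mroz) is assumed
> library(lmtest)
> res = lm(inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6,
+          data = mroz)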
> print(coeftest(res))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.58551922 0.15417800 3.7977 0.0001579 ***
nwifeinc -0.00340517 0.00144849 -2.3508 0.0189908 *
educ 0.03799530 0.00737602 5.1512 3.317e-07 ***
exper 0.03949239 0.00567267 6.9619 7.376e-12 ***
expersq -0.00059631 0.00018479 -3.2270 0.0013059 **
age -0.01609081 0.00248468 -6.4760 1.709e-10 ***
kidslt6 -0.26181047 0.03350579 -7.8139 1.889e-14 ***
kidsge6 0.01301223 0.01319596 0.9861 0.3244154
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This looks OK, but consider the following...
LPM: An Example
On the left is what the data and fitted regression line look like with just inlf as a function of exper. On the right is the marginal effect of an increase in experience on the probability of labor force participation:
[Figure: left panel plots inlf against exper with the fitted regression line (vertical axis roughly -1.0 to 2.0); right panel plots the implied change in probability, ΔPr, against exper (vertical axis roughly 0.0 to 1.5).]
Note that the predicted probability Pr(y = 1 | exper) exceeds one for exper > 10, and that ΔPr exceeds one (100%) for Δexper > 30.
Concerns About LPM
Some of the problems are fairly evident, but worth highlighting:

- (LHS figure) Some plausible combinations of independent variables give fitted/predicted probabilities less than zero or greater than one. Since a probability must lie between zero and one, this can be embarrassing.

- (RHS figure) It doesn't really make sense to say that a probability measure is linearly related to a continuously varying independent variable over a large range. Changes in experience Δexper > 30 are within the range of the data, and lead to changes in probability over 100%.

- Heteroskedasticity is baked into the model, in the sense that

    V(u \mid x) = \Pr(y = 1 \mid x)[1 - x\beta]^2 + \Pr(y = 0 \mid x)[0 - x\beta]^2
                = x\beta[1 - x\beta]^2 + (1 - x\beta)[x\beta]^2
                = x\beta[1 - x\beta]

  which is clearly a function of x. Thus, use of a robust variance-covariance matrix is compulsory (a sketch of one way to do this in R follows this list).
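One way to obtain heteroskedasticity-robust standard errors for the LPM fit above, assuming the sandwich and lmtest packages are available (this sketch is not in the original slides):
> # Robust (HC1) variance-covariance matrix for the LPM coefficients
> library(sandwich)
> library(lmtest)
> coeftest(res, vcov = vcovHC(res, type = "HC1"))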
Concerns About LPM
Finally, the residuals from the LPM are clearly non-normal, as seen in the figure. A kernel density plot (a continuous approximation of a histogram) of the data is shown in red. The scaled normal density is shown in green for reference.
[Figure: "Histogram and Kernel Density for LPM Residuals" — histogram of the residuals (horizontal axis roughly -1.0 to 1.0; vertical axis, Density, 0.0 to 1.0) with the kernel density and scaled normal density overlaid.]
An Alternative Model

To address these problems in the LPM, consider an alternative binary response model of the form

  \Pr(y = 1 \mid x) = G(\beta_1 + \beta_2 x_2 + \cdots + \beta_k x_k) = G(x\beta)

where G(w) is a function taking values strictly between zero and one (0 < G(w) < 1) for all w \in \mathbb{R}.

- In this general form, the model is called an index model, since all of the variation in x is being compressed into a single index xβ, to which G(·) is then applied.

- This model immediately solves several of the LPM problems, since G(·) can't take values outside (0, 1) and marginal effects will be similarly bounded.

- G(·) is usually chosen to be a cumulative distribution function, so Pr(y = 1 | x) → 1 as xβ → +∞ and Pr(y = 1 | x) → 0 as xβ → −∞.
The Logit Model
Two CDFs G(w) are used in most applications. The first is the logit CDF, which has the form

  G(x\beta) = \frac{\exp(x\beta)}{1 + \exp(x\beta)} = \Lambda(x\beta)

[Figure: "The Logit CDF Λ(w)" — Λ(w) plotted against w over roughly [-3, 3], rising from 0 to 1.]
The Probit Model
The second is the standard normal CDF, which has the form

  G(x\beta) = \int_{-\infty}^{x\beta} \phi(w)\, dw = \Phi(x\beta)

where

  \phi(w) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{w^2}{2}\right)

[Figure: "The Probit (Normal) CDF Φ(w)" — Φ(w) plotted against w over roughly [-3, 3], rising from 0 to 1.]
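As an aside (not in the original slides), both CDFs are built into R as plogis (the logistic CDF Λ) and pnorm (the standard normal CDF Φ), which is one way figures like the two above could be drawn:
> # Plot the logit and probit CDFs over w in [-3, 3]
> w = seq(-3, 3, by = 0.01)
> plot(w, plogis(w), type = "l", ylab = "G(w)")   # logit CDF Lambda(w)
> lines(w, pnorm(w), lty = 2)                     # probit CDF Phi(w)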
How to estimate?

- Both of these functions increase rapidly near the origin and increase much less rapidly toward the extremes.

- That is, the partial/marginal effect of an increase in x changes, depending on the level of x.

- This is a desirable feature, because constant marginal effects were one of the objections to the LPM (see the short numerical sketch after this list).

- Nonetheless, it raises a new challenge, because this is no longer a model that we can estimate by OLS.

- The approach usually taken to estimation of index function models like the logit and probit is maximum likelihood (ML).

- ML estimation is a technique with very broad applications, so we'll spend the rest of this class talking about the ML estimator and then apply it specifically to the logit and probit.
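To illustrate the point about non-constant marginal effects: in an index model the marginal effect of x_j is G'(xβ)·β_j, so it shrinks as the index moves into the tails. A minimal numerical sketch for the logit case (not in the original slides; the coefficient value 0.5 is made up):
> # Logit marginal effect G'(xb) * beta_j at two index values (beta_j = 0.5 is hypothetical)
> dlogis(0) * 0.5    # near the origin: 0.25 * 0.5 = 0.125
> dlogis(3) * 0.5    # in the tail: roughly 0.045 * 0.5, about 0.023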
Maximum Likelihood (ML) Estimation
The principle underlying ML estimation is fairly simple to articulate. Consider the figure below, where the histogram of a sample y = (y_1, ..., y_n) from a univariate random variable y is shown. The sample looks like it could have come from a normal distribution, so the same figure shows several normal densities N(μ, σ²), with different μ and σ². From which density does the data most likely come?
[Figure: "Histogram of y" — histogram of the sample (horizontal axis roughly -4 to 4; vertical axis, Density, 0.0 to 1.0) with several candidate normal densities overlaid.]
Maximum Likelihood (ML) Estimation

- ML estimation is just a way of formalizing this procedure, so you can use the technique in situations with many more parameters, situations where your ability to visualize may fail you.

- Formally, suppose a population in which the random variable y is distributed according to some distribution with density f(y, θ), where θ is a vector of parameters.

- Also suppose that we have a random sample y = (y_1, ..., y_n) drawn from that population, but that we don't know θ.

- Evidently, the data is more likely (think about the figure again) to have been drawn from a density with one set of parameters (say, θ′) rather than another (say, θ″).

- Our objective is to use the data y to develop our best estimate θ̂ of θ.
Maximum Likelihood (ML) Estimation

- If the draws in y are taken independently from the population with true parameter θ and density f(y, θ), then the probability (or likelihood) of observing the sample is

    L(\theta \mid y_1, y_2, \ldots, y_n) = f(y_1, \theta)\, f(y_2, \theta) \cdots f(y_n, \theta)

  since the probability of two or more independent events is just the product of their probabilities.

- The function L(θ, y) is known as the likelihood function.

- Since the maximum of any function g(z) is the same as the maximum of h(g(z)) if h is a strictly increasing function, we often work with the log-likelihood function

    \ln L(\theta, y) = \sum_{i=1}^{n} \ln f(y_i, \theta)

  rather than the likelihood function itself.

- The value θ̂ which maximizes L(θ, y) (or ln L(θ, y)) is known as the maximum likelihood estimate of θ.
ML Estimation: Example 1 (Exponential)

- As a first example, consider using ML to estimate the single parameter θ from data y known to come from an exponential distribution. The density of the exponential distribution (parameterized so that θ is its mean) is

    f(y, \theta) = \frac{1}{\theta} e^{-y/\theta}

- So the likelihood function L(θ, y) is

    L(\theta, y) = \left(\frac{1}{\theta} e^{-y_1/\theta}\right) \left(\frac{1}{\theta} e^{-y_2/\theta}\right) \cdots \left(\frac{1}{\theta} e^{-y_n/\theta}\right)

- And the log-likelihood function is

    \ln L(\theta, y) = -n \ln(\theta) - \frac{1}{\theta} \sum_{i=1}^{n} y_i
ML Estimation: Example 1 (Exponential)

- Because the log-likelihood function is so simple, we can actually solve it algebraically. Usually, that's not possible. Taking the derivative of the log-likelihood function with respect to θ gives

    \frac{d \ln L(\theta, y)}{d\theta} = -\frac{n}{\theta} + \frac{\sum_i y_i}{\theta^2}

- Setting the derivative equal to zero solves for an interior maximizer θ̂ of the log-likelihood function:

    -\frac{n}{\hat\theta} + \frac{\sum_i y_i}{\hat\theta^2} = 0 \qquad\text{or}\qquad \hat\theta = \frac{1}{n} \sum_i y_i

- In other words, the ML estimate of θ for an exponential distribution is just θ̂ = ȳ, the sample mean.

- This makes sense because (if you look up the exponential distribution), θ is the population mean of an exponential population.
ML Estimation: Example 1 (Exponential)

- Generally, we can't solve algebraically for the ML estimator, so we have to use a computer to calculate θ̂.

- To do this for the exponential example, I first create a function which returns the value of the log-likelihood function ln L(θ, y). This function has two arguments, but in ML the data is taken as given, so the second argument doesn't change.
> LL.exp = function(theta,y) {
+ n = length(y)
+ LL = -n * log(theta) - (1/theta) * sum(y)
+ return(LL)
+ }
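As an aside (not in the original slides), the same log-likelihood could be written with R's built-in exponential density; dexp() uses a rate parameter, which equals 1/θ under the mean parameterization used here:
> # Equivalent sketch using dexp(); rate = 1/theta because theta is the mean
> LL.exp2 = function(theta,y) sum(dexp(y, rate = 1/theta, log = TRUE))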

- If the number of parameters in θ is large, I may not be able to plot the log-likelihood function very easily, but I can solve for the ML estimator even if I can't visualize the log-likelihood function.

- In this example I only have a single parameter (θ), so a plot is a useful tool.
ML Estimation: Example 1 (Exponential)
The sample (n = 1000) was drawn from an exp(.5) distribution, so the sample mean will be very near 1/2. The log-likelihood evidently has a maximum around 0.5, but to find out what it is exactly, we will need to find the max numerically.
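The slides don't show how the sample or the plot below were produced; a plausible sketch (the seed and grid are made up, and rate = 2 reflects the assumption that exp(.5) means an exponential with mean 0.5):
> # Sketch: generate the sample and plot the log-likelihood over a grid of theta values
> set.seed(1)                                    # hypothetical seed
> y.data = rexp(1000, rate = 2)                  # mean 1/2, i.e. theta = 0.5
> theta.grid = seq(0.3, 1.0, by = 0.01)
> plot(theta.grid, sapply(theta.grid, LL.exp, y = y.data),
+      type = "l", xlab = "theta", ylab = "ln L(theta, y)")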
[Figure: ln L(θ, y) plotted against θ over roughly [0.4, 1.0]; values range from about -600 to -300, with the maximum near θ = 0.5.]
ML Estimation: Example 1 (Exponential)
The optim function can be used to perform numerical optimization. Here's an example of using it to solve the last problem:
> sol = optim(1,fn=LL.exp,gr=NULL,y.data,method="BFGS",control=list(fnscale=-1))
> sol$par
[1] 0.4983282
Here's what the arguments mean:
gr A function providing the gradient is often used to speed up
calculation. The argument gr references that function if it exists
(otherwise NULL).
y.data This is a reference to our data set. All unnamed arguments after gr
are passed through to the function to be optimized: LL.exp.
method There are several options. This is a safe one.
control This is a list of control parameters. By setting fnscale to -1, we
are saying we want to maximize rather than minimize (the default).
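One small follow-up, not shown in the original slides: optim also reports whether it converged, which is worth checking before trusting sol$par.
> sol$convergence    # 0 indicates successful convergence (see ?optim)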
ML Estimation: Example 1 (Exponential)
Note that our data doesn't have to be drawn from an exponential to fit it with an exponential density. We could do the same thing with data from a U(.25, 1).
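The uniform sample is not shown in the slides; it could plausibly be generated as below (the sample size of 1000 is an assumption). Since the population mean of a U(.25, 1) is 0.625, and the exponential ML estimate is simply the sample mean, the estimate reported below lands close to that value:
> # Sketch: a U(.25, 1) sample (assumed, not shown in the original)
> y.data = runif(1000, min = 0.25, max = 1)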
[Figure: ln L(θ, y) for the U(.25, 1) sample plotted against θ over roughly [0.4, 1.0]; values range from about -1000 to -600, with the maximum near θ = 0.61.]
> sol = optim(1,fn=LL.exp,gr=NULL,y.data,method="BFGS",control=list(fnscale=-1))
> sol$par
[1] 0.6135531
ML Estimation: Example 2 (Normal)
Similarly, we can estimate the parameters of a normal via ML. We don't need to do much algebra to know that, since the population parameters are θ = (μ, σ²), the ML estimates of the parameters should end up looking like the sample mean and variance. Let's check.

- The likelihood function (writing θ_1 = μ and θ_2 = σ²) is

    L(\theta_1, \theta_2, y) = \prod_{i=1}^{n} \left[ \frac{1}{\sqrt{2\pi\theta_2}} \exp\left(-\frac{(y_i - \theta_1)^2}{2\theta_2}\right) \right]

  or

    L(\theta_1, \theta_2, y) = \left(\frac{1}{\sqrt{2\pi\theta_2}}\right)^{n} \exp\left(-\frac{\sum_i (y_i - \theta_1)^2}{2\theta_2}\right)

- The log-likelihood function is

    \ln L(\theta_1, \theta_2, y) = -\frac{n}{2}\ln(2\pi\theta_2) - \frac{\sum_i (y_i - \theta_1)^2}{2\theta_2}
ML Estimation: Example 2 (Normal)

- The partial derivative of the log-likelihood with respect to θ_1 is

    \frac{\partial \ln L}{\partial \theta_1} = \frac{1}{\theta_2} \sum_{i=1}^{n} (y_i - \theta_1)

- The partial with respect to θ_2 is

    \frac{\partial \ln L}{\partial \theta_2} = -\frac{n}{2\theta_2} + \frac{1}{2\theta_2^2} \sum_{i=1}^{n} (y_i - \theta_1)^2

- It's left to you as an exercise to set these two expressions equal to zero and show that they lead to

    \hat\theta_1 = \bar{y} \qquad\text{and}\qquad \hat\theta_2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2
ML Estimation: Example 2 (Normal)
Now let's solve this as we usually would, numerically. We start with a data set consisting of 1000 draws from a N(0, 1). Below are the log-likelihood function and, on the next page, a plot of contours of the log-likelihood function. The plot clearly shows a maximum θ̂ = (θ̂_1, θ̂_2) somewhere in the vicinity of (0, 1), so far so good.
> LL.norm = function(theta,y) {
+ n = length(y)
+ LL = -(n/2) * log(2*pi) - (n/2)*log(theta[2]) -
+ (1/(2*theta[2])) * sum( (y-theta[1])^2 )
+ return(LL)
+ }
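The slides don't show how y.data or the contour plot were generated; a plausible sketch (the seed, grid ranges, and use of outer/contour are assumptions):
> # Sketch: generate the N(0, 1) sample and draw contours of the log-likelihood
> set.seed(1)                                     # hypothetical seed
> y.data = rnorm(1000)
> t1 = seq(-0.10, 0.10, length.out = 50)          # grid for theta_1
> t2 = seq(0.8, 1.2, length.out = 50)             # grid for theta_2
> z = outer(t1, t2, Vectorize(function(a, b) LL.norm(c(a, b), y.data)))
> contour(t1, t2, z, xlab = "t1", ylab = "t2")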
ML Estimation: Example 2 (Normal)
[Figure: contour plot of ln L(θ_1, θ_2, y) with θ_1 (t1) on the horizontal axis over roughly [-0.10, 0.10] and θ_2 (t2) on the vertical axis over roughly [0.8, 1.2]; contour levels run from about -1456 on the outside to about -1424 near the center, with the maximum close to (0, 1).]
ML Estimation: Example 2 (Normal)
> sol = optim(c(.5,.75),fn=LL.norm,gr=NULL,y.data,
+ method="BFGS",control=list(fnscale=-1))
> sol$par
[1] 0.0295824 1.0092104
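As a quick check (not part of the original slides), the numerical solution should agree with the analytical ML estimates derived earlier, the sample mean and the 1/n sample variance:
> c(mean(y.data), mean((y.data - mean(y.data))^2))   # should be close to sol$par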
