
Microéconométrie

Théophile T. Azomahou
University Clermont Auvergne, CNRS, CERDI
Maastricht University, School of Business and Economics
Email: [email protected]

PTCI PhD program


Campus Dakar, 20-28 February 2020

Théophile T. Azomahou (CERDI), February 20-28, 2020


Binary Outcomes Models

Chapter 1. Binary Outcomes Models


1. Introduction

Discrete outcome models, also called qualitative response or limited dependent variable models, are models for a dependent variable that indicates in which one of m mutually exclusive categories the outcome of interest falls. Usually, there is no natural ordering of the categories.

Examples: the occupation choice of a worker; the decision to purchase a good (a car of some color, etc.); the decision to participate in or contribute to some activity; for a firm, the decision to innovate; and so on.

This chapter covers the simplest case of binary outcomes, where there are two possible outcomes (YES/NO or 1/0).

Estimation is usually done by maximum likelihood because the distribution of the data is necessarily defined by the Bernoulli model: if the probability of one outcome equals p, then the probability of the other outcome must be (1 − p).


The Bernoulli Model


In probability theory and statistics, the Bernoulli distribution, named after the Swiss mathematician Jacob Bernoulli, is the discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability 1 − p; that is, the probability distribution of any single experiment that asks a yes-no question.

If Y is a random variable with a Bernoulli distribution, then (by definition):

P(Y = 1) = p,  P(Y = 0) = 1 − p  (1)

Also (verify as an exercise), the expected value and the variance of Y are:

E(Y) = p,  V(Y) = p(1 − p)  (2)

The two standard binary outcome models, the logit and the probit models, specify different functional forms for this probability as a function of regressors. The difference between these estimators is qualitatively similar to using different functional forms for the conditional mean in least-squares regression.


Why can't OLS regression do the job in this context?


OLS regression (linear regression) is designed for a continuous response variable (taking on real values: y ∈ R).

Assume we want to fit the model:

yi = α + βxi + ui,  i = 1, ..., N

where yi = 1 if individual i benefits from social protection and yi = 0 otherwise, and x denotes individual characteristics (income, occupation, age, education, marital status, household characteristics, etc.).

OLS regression computes E(yi | xi); it ignores the discreteness of the dependent variable and does not constrain predicted probabilities to lie between zero and one.

What we want to fit here is pi = P(yi = 1 | xi) with 0 ≤ pi ≤ 1. Logit and probit models do this job.
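To see the boundedness problem concretely, here is a minimal pure-Python sketch (the toy data and variable names are hypothetical, not from the chapter): OLS fitted values for a binary y can leave [0, 1], while passing a linear index through a CDF cannot. Note the logistic transform here only illustrates that a CDF link bounds predictions; it is not a fitted logit.

```python
import math

# Toy data: x = a single regressor, y = binary outcome (hypothetical)
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0,   0,   0,   1,   1,   1]

# OLS slope and intercept for y on x (closed form for one regressor)
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
       sum((xi - xbar) ** 2 for xi in x)
alpha = ybar - beta * xbar

# Linear "probabilities" can escape [0, 1] ...
p_lin = [alpha + beta * xi for xi in x]
# ... while a CDF link keeps the same index inside (0, 1) by construction
p_logit = [1.0 / (1.0 + math.exp(-(alpha + beta * xi))) for xi in x]

assert min(p_lin) < 0.0 and max(p_lin) > 1.0   # OLS line leaves [0, 1]
assert all(0.0 < p < 1.0 for p in p_logit)     # CDF link never does
```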

Example: Cameron and Trivedi (2005)

Fishing mode choice: fishing from a charter boat vs. from a pier. y = 1 if charter boat and 0 otherwise.

yi = α + β ln(price_charter,i / price_pier,i) + εi

Table: Data summary

Variable              Charter (y = 1)   Pier (y = 0)   Overall
Price charter ($)     75                110            85
Price pier ($)        121               31             95
ln(relative price)    -0.264            1.643          0.275
Sample probability    0.717             0.283          1.000
Observations          452               178            630


Table: Estimation results: Logit, Probit, OLS

Variable              Logit      Probit     OLS
ln(relative price)    -1.823     -1.056     -0.243
                      (-12.61)   (-13.87)   (-28.15)
Intercept             2.053      1.194      0.784
                      (12.15)    (13.34)    (65.58)
ln L                  -206.83    -204.41    —
Pseudo R²             0.449      0.455      0.463
Observations          630        630        630


Figure: Predicted probabilities (propensity score)


Outline of the chapter


Logit vs. probit models - Latent variables - Specification issues - Estimation: ML, two-step, GMM, semiparametric ML - Application

References
1. Amemiya, T. (1985). Advanced Econometrics. Blackwell.
2. Cameron, A.C. and Trivedi, P.K. (2005). Microeconometrics: Methods and Applications. Cambridge University Press.
3. Manski, C.F. and McFadden, D.L., Eds. (1981). Structural Analysis of Discrete Data with Econometric Applications. MIT Press.
4. Greene, W.H. (2018). Econometric Analysis. Pearson.
5. Maddala, G.S. (1983). Limited-Dependent and Qualitative Variables in Econometrics. Cambridge University Press, Cambridge, UK.
6. Stock, J.H. and Watson, M.W. (2003). Introduction to Econometrics. Addison-Wesley, Boston.
7. Wooldridge, J.M. (2010). Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge.

2. Logit and Probit models


2.1 General binary outcome model

Assume:

yi = 1 with probability p, and yi = 0 with probability 1 − p.

A regression model is formed by parameterizing the probability p to depend on a regressor vector x and a K × 1 parameter vector β. The commonly used models are of single-index form with conditional probability given by:

pi ≡ P(yi = 1 | x) = F(x_i'β)  (3)

where F(·) is a specified function. To ensure that 0 ≤ p ≤ 1, one has to make sure that F(·) is a cumulative distribution function (CDF).

Logit model: F(·) is the CDF of the logistic distribution
Probit model: F(·) is the standard normal CDF
Linear probability model: does not use a CDF and instead lets pi = x_i'β
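The single-index construction p = F(x'β) is easy to sketch directly. The logistic and standard normal CDFs below are the usual closed forms (the normal CDF via the error function); the regressor and coefficient values are made up for illustration:

```python
import math

def logit_cdf(z):
    """Logistic CDF: Lambda(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def index_prob(x, beta, F):
    """Single-index probability p = F(x'beta)."""
    return F(sum(xj * bj for xj, bj in zip(x, beta)))

x, beta = [1.0, 0.4], [0.5, -1.0]   # hypothetical regressors / coefficients
for F in (logit_cdf, probit_cdf):
    p = index_prob(x, beta, F)
    assert 0.0 < p < 1.0             # a CDF link always yields a probability
```

The linear probability model corresponds to dropping F entirely, which is exactly why its "probabilities" are unbounded.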


Plot of the cumulative distribution function, CDF. [Figure: F(x'β) increases from 0 to 1 as the index x'β runs from −30 to 30.]

lim_{x'β → +∞} P(Y = 1 | x) = 1,  lim_{x'β → −∞} P(Y = 1 | x) = 0


Table: Binary Outcome Data: Usual Models

Model                Probability p ≡ P[y = 1 | x]            Marginal effect ∂p/∂x_k
Logit                Λ(x'β) = e^{x'β} / (1 + e^{x'β})        Λ(x'β)[1 − Λ(x'β)]β_k
Probit               Φ(x'β) = ∫_{−∞}^{x'β} φ(z)dz            φ(x'β)β_k
Linear probability   x'β                                     β_k

where φ(z) and Φ(z) denote the density and the CDF of the normal distribution, respectively:

φ(z) = (1/(σ√(2π))) exp{−(1/2)((z − z̄)/σ)²},  normal density

f(z) = e^{−z} / (1 + e^{−z})²,  logistic density (of hyperbolic-secant-square, sech², form)


2.2 Maximum Likelihood estimation (ML)

Reminder: for the binomial distribution, P(X = k) = C(n, k) p^k (1 − p)^{n−k}.

Consider a given sample (yi, x_i), i = 1, ..., N, where we assume independence over i. The outcome is Bernoulli distributed, the binomial distribution with just one trial. A very convenient compact notation for the density of yi, or more formally its probability mass function, is

f(yi | x_i) = pi^{yi} (1 − pi)^{1−yi},  yi = 0, 1  (4)

where pi = F(x_i'β). This yields probabilities pi and (1 − pi) since f(1) = p^1 (1 − p)^0 = p and f(0) = p^0 (1 − p)^1 = 1 − p. The density (4) implies that

ln f(yi) = yi ln pi + (1 − yi) ln(1 − pi)  (5)


Given independence over i and model (3) for pi, the log-likelihood function is

L(yi, x_i, β) = Σ_{i=1}^N [yi ln F(x_i'β) + (1 − yi) ln(1 − F(x_i'β))]  (6)

Differentiating with respect to β, we have that the MLE β̂_ML solves

∂L(yi, x_i, β)/∂β = Σ_{i=1}^N [(yi / Fi) Fi' x_i − ((1 − yi)/(1 − Fi)) Fi' x_i] = 0  (7)

Converting to fractions with common denominator Fi(1 − Fi) and simplifying yields the ML first-order conditions:

Σ_{i=1}^N [(yi − F(x_i'β)) / (F(x_i'β)(1 − F(x_i'β)))] F'(x_i'β) x_i = 0  (8)

There is no explicit solution for β̂_ML, so iterative procedures (algorithms) such as Newton-Raphson are usually used; convergence is achieved provided that the log-likelihood function is well specified and globally concave (negative definite Hessian). – Figure –
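A minimal pure-Python Newton-Raphson sketch for the logit case (where F = Λ and the Hessian has a closed form). The toy data are hypothetical and no safeguards such as step-halving or a convergence tolerance are included; at the optimum, the intercept component of the FOC (8) forces the residuals yi − Λ(x_i'β̂) to sum to zero, which the final assertion checks.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_newton(x, y, iters=25):
    """Newton-Raphson for a logit with intercept a and one slope b.
    Solves the FOC: sum_i (y_i - Lambda(a + b x_i)) (1, x_i)' = 0."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0                  # gradient components
        h00 = h01 = h11 = 0.0          # negative Hessian (2x2, symmetric)
        for xi, yi in zip(x, y):
            p = logistic(a + b * xi)
            w = p * (1.0 - p)          # logit weight Lambda(1 - Lambda)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (negative Hessian)^{-1} gradient
        a += ( h11 * g0 - h01 * g1) / det
        b += (-h01 * g0 + h00 * g1) / det
    return a, b

x = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]   # toy sample
y = [0, 0, 1, 0, 1, 1, 1, 1]
a_hat, b_hat = logit_newton(x, y)

# FOC check: residuals sum to (numerically) zero at the optimum
resid = sum(yi - logistic(a_hat + b_hat * xi) for xi, yi in zip(x, y))
assert abs(resid) < 1e-8
```

Because the logit log-likelihood is globally concave, this unsafeguarded iteration converges from the zero starting point on well-behaved data like the above.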

2.3 Consistency and asymptotic distribution of β̂_ML

Consistency. β̂_ML is consistent if the conditional density of y given x is correctly specified (Bernoulli), meaning if pi ≡ F(x_i'β). For binary data, we have E(y) = 1 × p + 0 × (1 − p) = p. This implies that E(yi | x_i) = F(x_i'β), which in turn implies that the left-hand side of (8) has expected value zero, the essential condition for consistency.

Asymptotic distribution. The general results from ML theory apply: β̂_ML ~ N(β, Ω0), where Ω0 is the inverse of the information matrix: – reminder here –

Ω0 = (−E[∂²L/∂β∂β'])^{−1}  (9)

which is estimated by the asymptotic variance matrix

V̂(β̂_ML) = ( Σ_{i=1}^N [1 / (F(x_i'β̂)(1 − F(x_i'β̂)))] F'(x_i'β̂)² x_i x_i' )^{−1}  (10)

Exercise 1: Compute (9) and derive (10)



2.4 Marginal effects and Average Partial Effects (APE)

The marginal effect of a change in a regressor on the conditional probability that y = 1 is of interest. For a general probability model and a change in the kth regressor, the marginal effect is:

For a continuous regressor:

∂P(yi = 1 | x_i)/∂x_ik = F'(x_i'β)β_k  (11)

where F' = ∂F(z)/∂z.

For a discrete regressor, the marginal effect is the difference in predicted probabilities. In particular, for a dichotomous regressor, the marginal effect is computed as:

P[y = 1 | x_{−k}, x_k = 1] − P[y = 1 | x_{−k}, x_k = 0] = F(x_{−k}'β_{−k} + β_k) − F(x_{−k}'β_{−k})  (12)

where x_{−k} denotes the vector of all regressors in x except x_k, and β_{−k} the vector of all coefficients in β except β_k.
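Both formulas (11) and (12) are straightforward to evaluate numerically. A sketch for the probit case, with hypothetical estimates (intercept, one continuous regressor x1, one dummy x2) and a made-up design matrix:

```python
import math

def probit_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Hypothetical probit estimates: intercept, continuous x1, dummy x2
beta = [0.3, 0.8, -0.5]
X = [[1.0, 0.2, 1.0], [1.0, -1.1, 0.0], [1.0, 0.7, 1.0], [1.0, 1.5, 0.0]]

def index(x):
    return sum(xj * bj for xj, bj in zip(x, beta))

# (11) at the sample mean: phi(xbar'beta) * beta_1
xbar = [sum(col) / len(X) for col in zip(*X)]
me_at_mean = probit_pdf(index(xbar)) * beta[1]

# APE: average of phi(x_i'beta) * beta_1 over the observations
ape = sum(probit_pdf(index(x)) * beta[1] for x in X) / len(X)

# (12) for the dummy x2: average difference in predicted probabilities
ape_dummy = sum(
    probit_cdf(index([x[0], x[1], 1.0])) - probit_cdf(index([x[0], x[1], 0.0]))
    for x in X
) / len(X)

assert me_at_mean > 0 and ape > 0    # sign follows the sign of beta_1
assert ape_dummy < 0                 # sign follows the sign of beta_2
```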


Remarks
The marginal effects differ with the point of evaluation x_i, as in any nonlinear model, and differ across choices of F(·). Usually the marginal effect is evaluated at the sample means of the data.
When the marginal effect is evaluated at every observation and the sample average of these individual marginal effects is taken, we get the Average Partial Effects (APE).
The sign of the coefficient gives the sign of the marginal effect, since F'(·) > 0.

Interpretation
i) Marginal effects are additive approximations of effects in non-additive models. For example, a value of 0.07 means that the probability increases by 7 percentage points.
ii) The marginal effect tells by how many units the probability changes if the explanatory variable changes by one unit. A probability is measured in percentages and its units are percentage points.


2.5 Logit specification

The logit model or logistic regression model specifies

p = Λ(x'β) = e^{x'β} / (1 + e^{x'β})  (13)

where Λ(·) is the logistic CDF, with Λ(z) = e^z/(1 + e^z) = 1/(1 + e^{−z}).

The MLE FOC (8) become

Σ_{i=1}^N (yi − Λ(x_i'β)) x_i = 0  (14)

Moreover, when the regressors include an intercept, the average in-sample predicted probability (1/N) Σ_i Λ(x_i'β̂) equals ȳ, the sample frequency of yi.

Exercise 2: Compute the asymptotic variance of β̂_ML for the logit specification.


The marginal effects for the logit model can be obtained fairly easily from the coefficients, since

∂pi/∂x_ik = pi(1 − pi)β_k  (15)

where pi = Λ(x_i'β). The marginal effect gives the increase in the probability that y = 1 as x_ik changes.

Odds ratio or relative risk. A very common interpretation of the coefficients is in terms of marginal effects on the odds ratio rather than on the probability. For the logit model:

p = e^{x'β} / (1 + e^{x'β})  =⇒  p/(1 − p) = e^{x'β}  (16)

ln(p/(1 − p)) = x'β  (17)

p/(1 − p) measures the probability that y = 1 relative to the probability that y = 0 and is called the odds ratio or relative risk.

Example: Odds ratios are multiplicative effects. An odds ratio of 1.07 corresponds to an increase of 7%. A logit slope parameter of 0.1 means that a one-unit increase in the regressor multiplies the odds ratio by e^{0.1} ≈ 1.105, i.e., increases the odds by about 10.5%.
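Because p/(1 − p) = e^{x'β} holds exactly in the logit model, a one-unit change in a regressor multiplies the odds by e^{β_k}, not by β_k. A short numerical check (the intercept and slope values are hypothetical):

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

beta0, beta1 = -0.2, 0.1   # hypothetical logit intercept and slope

def odds(x):
    """Odds p/(1-p) at regressor value x; equals exp(beta0 + beta1*x)."""
    p = logistic(beta0 + beta1 * x)
    return p / (1.0 - p)

# A one-unit increase in x multiplies the odds by exp(beta1)
ratio = odds(3.0) / odds(2.0)
assert abs(ratio - math.exp(beta1)) < 1e-9   # exp(0.1) ≈ 1.105
```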

2.6 Probit specification

The probit model specifies the conditional probability

p = Φ(x'β) = ∫_{−∞}^{x'β} φ(z)dz  (18)

where Φ(·) denotes the standard normal CDF, with derivative

φ(z) = (1/√(2π)) exp(−z²/2)  (19)

the standard normal density function. The MLE FOC (8) are:

Σ_{i=1}^N wi (yi − Φ(x_i'β)) x_i = 0  (20)

where, unlike in the logit case, the weight wi = φ(x_i'β)/[Φ(x_i'β)(1 − Φ(x_i'β))] varies across observations. The probit marginal effects are

∂pi/∂x_ik = φ(x_i'β)β_k  (21)

with no further simplification. – Exercise 3: Compute the asymptotic variance of β̂_ML for the probit specification.

2.7 Remarks on OLS estimation

A simple alternative to either logit or probit is OLS regression of y on x. This has the obvious deficiency that it is possible to obtain predicted probabilities x_i'β̂ that are negative or that exceed one.

The OLS estimator is nonetheless useful as an exploratory tool. In practice it provides a reasonable direct estimate of the sample-average marginal effect on the probability that y = 1 as x changes, even though it provides a poor model for individual probabilities, and it gives a good guide to which variables are statistically significant.

If the OLS estimator is used, then standard errors should correct for heteroskedasticity. Linear regression is justified if the probability is truly pi = x_i'β. Then y | x_i has mean x_i'β and heteroskedastic variance x_i'β(1 − x_i'β) that varies with x_i. However, it is very unlikely that pi = x_i'β holds, since x_i'β is not restricted to lie between 0 and 1.

Although OLS estimation with heteroskedasticity-robust standard errors can be a useful exploratory data analysis tool, it is best to use the logit or probit MLE for final data analysis.
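A sketch of the heteroskedasticity-robust (HC0, White) standard error for the slope in a one-regressor linear probability model, next to the classical formula, on made-up data:

```python
import math

# Toy binary data (hypothetical)
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 1, 0, 1, 1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
u = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # OLS residuals

# HC0 robust variance of the slope: sum((x_i - xbar)^2 u_i^2) / sxx^2
se_robust = math.sqrt(sum(((xi - xbar) * ui) ** 2
                          for xi, ui in zip(x, u)) / sxx ** 2)

# Classical (homoskedastic) variance, for comparison: s^2 / sxx
se_classic = math.sqrt((sum(ui ** 2 for ui in u) / (n - 2)) / sxx)

assert se_robust > 0.0 and se_classic > 0.0
```

The two standard errors differ whenever the residual variance moves with x, which is built into the LPM since V(y | x) = x'β(1 − x'β).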


2.8 Choosing a binary model: logit or probit?

The logit model has a relatively simple form for the first-order conditions and allows interpretation of the coefficients in terms of the log-odds ratio.

The probit model, in contrast, has the attraction of being motivated by a latent normal random variable. For these reasons many economists use the probit model.

The different models do yield quite different estimates β̂ of the regression parameters. However, this is largely an artifact of using different CDFs.

Logit and probit models have similar shapes for central values of F(·) but differ in the tails as F(·) approaches 0 or 1.

It is more meaningful to compare the marginal effects across models, as this measure is scaled similarly across the three models. Evaluated where the index is zero (p = 0.5): for logit, ∂p/∂x_k = 0.25 β̂_k; for probit, ∂p/∂x_k ≈ 0.4 β̂_k; for OLS, ∂p/∂x_k = β̂_k. This suggests comparing coefficients across models using the conversion factors:

β̂_logit ≈ 4 β̂_OLS  (22)
β̂_probit ≈ 2.5 β̂_OLS  (23)
β̂_logit ≈ 1.6 β̂_probit  (24)
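The rule-of-thumb factors come from matching marginal effects at p = 0.5: the logistic weight peaks at Λ(0)[1 − Λ(0)] = 0.25 and the standard normal density at φ(0) = 1/√(2π) ≈ 0.4, and their ratio gives the 1.6 in (24). A two-line check:

```python
import math

logistic_peak = 0.25                       # Lambda(0) * (1 - Lambda(0))
normal_peak = 1.0 / math.sqrt(2.0 * math.pi)   # phi(0) ≈ 0.3989

# Matching marginal effects at p = 0.5 yields the usual conversion factor
assert abs(normal_peak / logistic_peak - 1.6) < 0.01
```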

3. Latent variable models

A latent variable, say y*, is a variable that is incompletely observed (or of which one only observes the sign). Latent variable models provide extensions to multinomial outcomes and censored outcomes.

Assume an unobserved continuous latent random variable y*; all we observe is the binary variable y, which takes value 1 or 0 according to whether or not y* crosses a threshold. Different distributions for y* lead to different binary outcome models.

The regression model for y* is the index function model:

y* = x'β + u  (25)

However, this model cannot be estimated directly since y* is not observed. Instead, we observe

y = 1 if y* > 0, and y = 0 if y* ≤ 0  (26)

where the threshold of zero is a normalization explained in the following.

Equations (25) and (26) can be rewritten more compactly as

y* = x'β + u,  y = 1[y* > 0]  (27)

where 1[·] denotes the indicator function, which is 1 if the condition is fulfilled and 0 otherwise. Given (27),

P(y = 1 | x) = P(y* > 0)
             = P(x'β + u > 0)
             = P(−u < x'β)
             = F(x'β)

where F is the CDF of −u, which equals the CDF of u in the usual case of a density symmetric around 0.
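The mapping from the latent model to P(y = 1 | x) = Φ(x'β) (for u standard normal) can be checked by simulation. A sketch with an assumed true slope, comparing the empirical frequency of y = 1 in a narrow band of x against the implied probit probability:

```python
import math
import random

random.seed(0)
beta = 1.5                      # assumed true slope in y* = beta*x + u
N = 20000
x = [random.gauss(0.0, 1.0) for _ in range(N)]
# Observe only the sign of the latent variable: y = 1[y* > 0]
y = [1 if beta * xi + random.gauss(0.0, 1.0) > 0 else 0 for xi in x]

Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Empirical P(y = 1) for x near 0.5 should track Phi(beta * 0.5)
band = [yi for xi, yi in zip(x, y) if 0.4 < xi < 0.6]
emp = sum(band) / len(band)
assert abs(emp - Phi(beta * 0.5)) < 0.05
```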


Remarks on the latent model

Identification of the single-index model requires a restriction on the variance of u, as the single-index model can only identify β up to scale. All that is observed is whether or not y* > 0, or equivalently whether or not x'β + u > 0. However, this is equivalent to whether or not x'β⁺ + u⁺ > 0, where β⁺ = aβ and u⁺ = au for any a > 0. Placing a restriction on the variance of the error (u or u⁺) secures uniqueness of β. The error variance is set to one in the probit model and to π²/3 in the logit model.

The index function model implies a direct interpretation of β as the change in the latent variable y* when x changes by one unit. Even though y* is unobserved, this interpretation is meaningful if one uses knowledge of the specified variance of u. For example, a slope parameter of 0.5 in the probit model means a one-unit change in the regressor leads to a 0.5 standard deviation change in y*, since in this model the error variance, and hence the conditional variance of y*, equals 1.

Commonly used extensions of the index function approach are to ordered discrete choice models and to models for censored and selected samples.


4. Specification issues in binary response models

4.1 Neglected heterogeneity
The neglected heterogeneity problem concerns the consequences of omitted variables when those variables are independent of the included control variables. Assume the latent model:

y* = xβ + γc + u  (28)

where y = 1[y* > 0]. We assume u | x, c ~ N(0, 1). Suppose c ⊥ x and c ~ N(0, τ²). Under these assumptions, the composite term γc + u is independent of x and is normally distributed: γc + u ~ N(0, γ²τ² + 1). Therefore

P(y = 1 | x) = P(γc + u > −xβ | x) = Φ(xβ/σ)  (29)

where σ = √(γ²τ² + 1). From (29), it follows that a probit estimation consistently estimates β/σ. So, if β̂ is the probit estimator from a probit of y on x, then plim β̂ = β/σ. Because σ > 1 (unless γ = 0 or τ² = 0), |β/σ| < |β|.
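The attenuation result can be illustrated by simulation: the conditional frequency of y = 1 tracks Φ(xβ/σ), not Φ(xβ). A sketch with assumed values β = γ = τ = 1, so σ = √2:

```python
import math
import random

random.seed(1)
beta, gamma, tau = 1.0, 1.0, 1.0
sigma = math.sqrt(gamma ** 2 * tau ** 2 + 1.0)   # sd of (gamma*c + u)

N = 40000
x = [random.gauss(0.0, 1.0) for _ in range(N)]
# Latent model y* = beta*x + gamma*c + u with c ~ N(0, tau^2), u ~ N(0, 1)
y = [1 if beta * xi + gamma * random.gauss(0.0, tau)
     + random.gauss(0.0, 1.0) > 0 else 0 for xi in x]

Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# P(y=1 | x near 1) follows the attenuated index beta/sigma, not beta
band = [yi for xi, yi in zip(x, y) if 0.9 < xi < 1.1]
emp = sum(band) / len(band)
assert abs(emp - Phi(beta * 1.0 / sigma)) < 0.05   # matches Phi(x*beta/sigma)
assert abs(emp - Phi(beta * 1.0)) > 0.04           # misfits Phi(x*beta)
```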


Remarks on neglected heterogeneity

The statement that in probit models neglected heterogeneity is a much more serious problem (leading to inconsistent estimates) than in linear models, even if the omitted heterogeneity is independent of x, should be taken with caution.

In fact, we have just shown that a probit of y on x consistently estimates β/σ rather than β, so the statement is technically correct. However, given that we are more interested in estimating marginal effects (and hence the directions of effects) rather than just the parameters, estimating β/σ is just as good as estimating β.

If c is correlated with x or is otherwise dependent on x (for example, if V(c | x) depends on x), then the omission of c poses a serious problem. But this is already well known from linear regression models with omitted controls that are correlated with x.


4.2 Endogeneity
4.2.1 Continuous endogenous explanatory variable
We assume that at least one continuous explanatory variable is correlated with the error term in the latent equation. One option is to estimate a linear probability model by 2SLS (with instrumental variables, IV). Let us rather consider the case of estimating a latent probit model:

y1* = z1 δ1 + α1 y2 + u1  (30)
y2 = z1 δ21 + z2 δ22 + v2 = z δ2 + v2  (31)
y1 = 1[y1* > 0]  (32)

where the pair of error terms (u1, v2) has zero mean, is bivariate normal, and is independent of z. Equations (30) and (32) are the structural equations; equation (31) is the reduced form for y2, which is endogenous if u1 and v2 are correlated.

If u1 and v2 are independent, there is no endogeneity issue. Because v2 is normally distributed, y2 | z is normal, and thus y2 should have the characteristics of a normal random variable (for example, y2 should not be discrete). The model is suited to the case where y2 is correlated with u1 because of omitted variables or measurement error.

Two-step estimation by Rivers and Vuong (1988)

This is the most useful estimation procedure, as it also leads to a simple test of the endogeneity of y2. Under joint normality of (u1, v2), with the normalization V(u1) = 1, we can write:

u1 = θ1 v2 + e1  (33)

where θ1 = η1/τ2², η1 = cov(u1, v2), τ2² = V(v2), and e1 is independent of z and v2 (and therefore of y2). Because of the joint normality of (u1, v2), e1 is also normally distributed with E(e1) = 0 and

V(e1) = V(u1) − θ1² V(v2) = 1 − η1²/τ2² = 1 − ρ1²

where ρ1 = corr(u1, v2), since V(u1) = 1. As a result, the latent model can be rewritten as:

y1* = z1 δ1 + α1 y2 + θ1 v2 + e1,  e1 | z, y2, v2 ~ N(0, 1 − ρ1²)  (34)

It follows that:

P(y1 = 1 | z, y2, v2) = Φ[(z1 δ1 + α1 y2 + θ1 v2) / √(1 − ρ1²)]  (35)

NB. Observe that if y2 were exogenous, ρ1 = 0.



The probit estimation provides consistent estimates of the scaled parameters δρ1 ≡ δ1/√(1 − ρ1²), αρ1 ≡ α1/√(1 − ρ1²), and θρ1 ≡ θ1/√(1 − ρ1²). Since δ2 is unknown, v2 must first be estimated. This leads to the two-step estimation algorithm:

Step 1: Run the OLS regression of y2 on z and save the residuals v̂2.

Step 2: Run the probit of y1 on z1, y2, and v̂2 to get consistent estimates of the scaled parameters δρ1, αρ1, and θρ1.

Endogeneity test. The usual probit t statistic on v̂2 is a valid test of the null hypothesis that y2 is exogenous:

H0: θ1 = 0 (y2 exogenous)  vs.  H1: θ1 ≠ 0  (36)

Under H0: θ1 = 0, e1 = u1, and so the distribution of v2 plays no role under the null. Therefore, the test of exogeneity is valid without assuming normality or homoskedasticity of v2. However, if y2 and u1 are correlated (endogeneity of y2), normality of v2 is crucial.
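The projection (33) underlying the two-step procedure can be verified numerically. A sketch under the normalization V(u1) = V(v2) = 1, in which case θ1 = ρ1 and V(e1) = 1 − ρ1²:

```python
import math
import random

random.seed(2)
rho = 0.6                       # corr(u1, v2); V(u1) = V(v2) = 1 here
N = 50000
v2 = [random.gauss(0.0, 1.0) for _ in range(N)]
# Build u1 jointly normal with v2: u1 = rho*v2 + sqrt(1 - rho^2)*noise
u1 = [rho * v + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0)
      for v in v2]

theta = rho                     # theta1 = cov(u1, v2)/V(v2) = rho here
e1 = [u - theta * v for u, v in zip(u1, v2)]

var_e1 = sum(e * e for e in e1) / N
cov_ev = sum(e * v for e, v in zip(e1, v2)) / N
assert abs(var_e1 - (1.0 - rho ** 2)) < 0.02   # V(e1) = 1 - rho1^2
assert abs(cov_ev) < 0.02                      # e1 orthogonal to v2
```

The orthogonality of e1 and v2 is exactly what makes including the first-step residual v̂2 in the second-step probit deliver consistent (scaled) estimates.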


Conditional Maximum Likelihood (CML)

Estimate equations (30)-(32) by CML. The joint distribution of (y1, y2) is f(y1, y2 | z) = f(y1 | y2, z) f(y2 | z). Given that y2 | z ~ N(z δ2, τ2²), the density f(y2 | z) is easy to obtain. Moreover, since v2 = y2 − z δ2 and y1 = 1[y1* > 0],

P(y1 = 1 | y2, z) = Φ[(z1 δ1 + α1 y2 + (ρ1/τ2)(y2 − z δ2)) / √(1 − ρ1²)]  (37)

with ρ1/τ2 = θ1. Denoting by ℓ the term inside Φ(·), we obtain the likelihood contribution:

f(y1, y2 | z) = {Φ(ℓ)}^{y1} {1 − Φ(ℓ)}^{1−y1} (1/τ2) φ((y2 − z δ2)/τ2)  (38)

Taking the log of (38) and summing over observations provides the log-likelihood function to maximize.

Endogeneity test. Testing that y2 is exogenous: H0: ρ1 = 0. One can also use the likelihood ratio test – reminder here: the LR test and its distribution –.

Remark: While CML enforces discipline (full use of the distributional information) and leads to correct standard error estimates, it can be cumbersome. Rivers and Vuong (1988), on the other hand, is a limited-information procedure.

Generalized Method of Moments (GMM)

Consider the model

yi* = x_i'β + γωi + εi  (39)
yi = 1[yi* > 0]  (40)
E(εi | ωi) = g(ωi) ≠ 0  (41)

Thus, ωi is endogenous. The ML estimators considered earlier will not consistently estimate (β, γ) without additional assumptions that allow one to formalize P(yi = 1 | x_i, ωi).

Let ζi denote a relevant instrumental variable, so that E(εi | ζi, x_i) = 0 while ζi is correlated with ωi. A natural instrumental variable estimator would be based on the moment condition:

E[(yi* − x_i'β − γωi) (x_i', ζi)'] = 0

However, yi* is not observed; only yi is. The approach used by Bertschek and Lechner (1998) is to assume that ζi is orthogonal to the residual [yi − Φ(x_i'β + γωi)].

This leads to the moment condition:

E[(yi − Φ(x_i'β + γωi)) (x_i', ζi)'] = 0

and the usual two-step GMM estimator applies.

4.2.2 Binary endogenous explanatory variable

We now consider the case where the endogenous regressor is a dummy variable:

y1 = 1[z1 δ1 + α1 y2 + u1 > 0]  (42)
y2 = 1[z δ2 + v2 > 0]  (43)

where (u1, v2) is independent of z and bivariate normally distributed with zero mean; each has unit variance and ρ1 = corr(u1, v2). If ρ1 ≠ 0, then u1 and y2 are correlated, and probit estimation of (42) that ignores this correlation is inconsistent for δ1 and α1.

The likelihood function requires the joint distribution of (y1, y2), obtained as f(y1, y2 | z) = f(y1 | y2, z) f(y2 | z). To get P(y1 = 1 | y2, z), observe first that:

P(y1 = 1 | v2, z) = Φ[(z1 δ1 + α1 y2 + ρ1 v2) / √(1 − ρ1²)]  (44)


One also needs results on truncation. Indeed, if v2 has a standard normal distribution and is independent of z, then the density of v2 given v2 > −z δ2 (meaning y2 = 1) is:

φ(v2) / P(v2 > −z δ2) = φ(v2) / Φ(z δ2)

We can then establish:

P(y1 = 1 | y2 = 1, z) = E[P(y1 = 1 | v2, z) | y2 = 1, z]
  = E[ Φ((z1 δ1 + α1 y2 + ρ1 v2)/√(1 − ρ1²)) | y2 = 1, z ]  (45)
  = (1/Φ(z δ2)) ∫_{−z δ2}^{∞} Φ((z1 δ1 + α1 y2 + ρ1 v2)/√(1 − ρ1²)) φ(v2) dv2

Observe that P(y1 = 0 | y2 = 1, z) = 1 − P(y1 = 1 | y2 = 1, z). Similarly, we have:

P(y1 = 1 | y2 = 0, z) = (1/(1 − Φ(z δ2))) ∫_{−∞}^{−z δ2} Φ((z1 δ1 + α1 y2 + ρ1 v2)/√(1 − ρ1²)) φ(v2) dv2  (46)

Combining the four possible outcomes of (y1, y2) with the probit specification for y2, the log-likelihood function to maximize (a non-trivial task!) is:

ln L(δ1, α1, δ2, ρ1) = Σ_{i=1}^N [ y1i y2i ln P11i + (1 − y1i) y2i ln P01i
                       + y1i (1 − y2i) ln P10i + (1 − y1i)(1 − y2i) ln P00i ]  (47)

where

P00i ≡ P(y1 = 0, y2 = 0 | z) = Φ2(−z1 δ1, −z δ2; ρ1)
P10i ≡ P(y1 = 1, y2 = 0 | z) = Φ(−z δ2) − P00i  (48)
P01i ≡ P(y1 = 0, y2 = 1 | z) = Φ(−z1 δ1 − α1) − Φ2(−z1 δ1 − α1, −z δ2; ρ1)
P11i ≡ P(y1 = 1, y2 = 1 | z) = 1 − P00i − P10i − P01i

and Φ2 denotes the bivariate standard normal CDF.


5. Goodness of fit
Because the hypothesis that all slopes in the model are zero is often of interest, report the log-likelihood computed with only a constant term (ln L0) together with the log-likelihood of the full specification (ln L).

McFadden's (1974) likelihood ratio index (pseudo-R²):

R² = 1 − ln L / ln L0

Predicted outcomes vs. predicted probabilities: the predictive ability of the model is usually summarized in a 2 × 2 table of the hits and misses of a prediction rule such as:

ŷ = 1 if F̂ > F*

with the usual threshold value F* = 0.5.
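Both diagnostics are mechanical to compute from fitted probabilities; a sketch with hypothetical p̂ values and outcomes:

```python
import math

# Hypothetical fitted probabilities and observed outcomes
p_hat = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.35]
y =     [1,   1,   0,   1,   0,   0,   0,   1]

# Log-likelihoods: full model vs intercept-only (p = ybar for everyone)
ybar = sum(y) / len(y)
lnL  = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
           for yi, pi in zip(y, p_hat))
lnL0 = sum(yi * math.log(ybar) + (1 - yi) * math.log(1 - ybar) for yi in y)
pseudo_r2 = 1.0 - lnL / lnL0
assert 0.0 < pseudo_r2 < 1.0

# 2x2 hits-and-misses table with threshold F* = 0.5
y_pred = [1 if pi > 0.5 else 0 for pi in p_hat]
hits = sum(yp == yi for yp, yi in zip(y_pred, y))
assert hits / len(y) == 0.75   # 6 of 8 classified correctly here
```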


6. Semiparametric estimation
Estimation under weaker assumptions.

Semiparametric conditional mean estimation:

E(yi | x_i) = G(x_i)

where G(·) is unknown. Kernel regression can do the job. See Figure (1).

Semiparametric ML estimation. Klein and Spady (1993) proposed a fully efficient semiparametric estimator that maximizes:

L(yi, x_i, β) = Σ_{i=1}^N [ yi ln Ĝ(x_i'β) + (1 − yi) ln(1 − Ĝ(x_i'β)) ]

where Ĝ(x_i'β) is a nonparametric estimate of G(x_i'β).


7. Exercise

For an independent n-sample of pairs of real random variables (Yi, Xi), i = 1, ..., n, consider the dichotomous model

Yi = 1[αXi + σui > 0],

where α and σ are two real parameters and ui is a random variable independent of Xi. Assume moreover that the distribution of ui is continuous and symmetric about zero, the mean of ui. Denote by F and f, respectively, the CDF and the density of the distribution of ui.

1. Discuss the identification of the model. In what follows, write m for the ratio α/σ.

2. How does the answer to the previous question change if the model is Yi = 1[αXi + σui > bi], where bi is a known threshold that varies across individuals? In what follows, consider the initial model, where the threshold bi is zero for all i.
Théophile T. Azomahou (CERDI) Février 20-28, 2020 37 / 42


Binary Outcomes Models

3. Write the likelihood L and show that the score, defined by S(Y | X; m) = ∂/∂m ln L(Y | X; m), is

S(Y | X; m) = Σ_{i=1}^n [fi / (Fi(1 − Fi))] (Yi − Fi) Xi,

where Fi and fi denote the values taken by F and f at mXi.

4. Verify that the expectation of the score is zero and compute I(m), the variance of the score.

5. Derive the maximum likelihood estimator m̂n of m when the variable Xi is constant. In particular, under this assumption, give the expression of m̂n when ui follows the standard normal distribution, and when ui follows a logistic distribution with CDF

G(u) = 1 / (1 + exp(−u)).

6. Discuss the properties of the OLS estimator of the coefficient of Xi in the linear regression of Y on X.


8. Application using STATA

The effect of informality on social protection
Question: the scarring effect of informal sector employment. Do past spells in the informal sector induce discrimination in terms of social protection for workers, even if they have since transited to the formal sector? – Effects of different transitions from the informal sector on the social protection offered by the employer.

Data: South African Labour Force Survey (LFS) from 2001 to 2004. It is a twice-yearly nationwide household panel survey that focuses on labour market issues and also collects information on demographic characteristics and education. We consider individuals aged 15 or more who are employed at the time of the survey.

Main indicators: Social protection and informality are measured as three-point ordinal scales: non-covered, partially covered, and fully covered for the former, and fully informal, partially informal, and formal for the latter. – Individual characteristics: age, education, gender, occupation, marital status, origin, etc.

A. Why does the relationship between social protection and informality matter for public policy?
In developing countries, the informal economy comprises more than half of total employment (La Porta and Shleifer, 2014). The share of informal employment in non-agricultural activities is quite high in many regions: in South and East Asia, it ranges from 42% to 84%; in Sub-Saharan Africa, it ranges from 32% to 82%.

One of the characteristics of the informal economy is the vulnerability of workers. Most workers lack health insurance and pension protection, and they earn lower wages compared to the formal sector.

The lack of social protection for workers in the informal sector may persist even after they join the formal sector. Such a situation is known as labour market state dependence, whereby the probability of exiting a given state is related to one's labour market history.


B. Definition of main variables

Social protection: Employees are asked whether their employer contributes to unemployment insurance or health insurance. We consider employees to be non-covered when they do not benefit from either of these contributions from their employer. If they benefit from only one, they are partially covered; if they benefit from both, they are fully covered.

Informality: Regarding the level of informality, the criteria are based on information about the place in which employees work; in other words, informality is about the sector in which they are employed. Employees are asked whether the place where they work is registered and/or pays value added tax (VAT).

Employees are fully informal if the place in which they work complies with neither of these criteria. They are considered partially informal when it complies with exactly one criterion. When both criteria are met, the employees are considered formal.


C. Implement the study


Read the data set, generate new variables when needed
Produce descriptive statistics
Estimate the model by both logit and probit
Compute marginal effects
Comment on results
Discuss endogeneity issues and address them if any
