
Microéconométrie

Théophile T. Azomahou
University Clermont Auvergne, CNRS, CERDI
Maastricht University, School of Business and Economics
Email: [email protected]

PTCI PhD program


Campus Dakar, 20-28 February 2020

Théophile T. Azomahou (CERDI), February 20-28, 2020


Binary Outcomes Models

Chapter 1. Binary Outcomes Models


1. Introduction

Discrete outcome models, also called qualitative response or limited dependent variable models, are models for a dependent variable that indicates in which one of m mutually exclusive categories the outcome of interest falls. Usually, there is no natural ordering of the categories.

Examples: the occupation choice of a worker; the decision to purchase a good (a car of some color, etc.); the decision to participate in or contribute to some activity; for a firm, the decision to innovate; and so on.

This chapter covers the simplest case of binary outcomes, where there are two possible outcomes (YES/NO or 1/0).

Estimation is usually done by maximum likelihood because the distribution of the data is necessarily defined by the Bernoulli model: if the probability of one outcome equals p, then the probability of the other outcome must be (1 − p).


The Bernoulli Model


In probability theory and statistics, the Bernoulli distribution, named after the Swiss mathematician Jacob Bernoulli, is the discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability 1 − p; that is, the probability distribution of any single experiment that asks a yes-no question.

If Y is a random variable with a Bernoulli distribution, then (by definition):

P(Y = 1) = p,  P(Y = 0) = 1 − p  (1)

Also (verify as an exercise), the expected value and the variance of Y are:

E(Y) = p,  V(Y) = p(1 − p)  (2)

The two standard binary outcome models, the logit and the probit models, specify different functional forms for this probability as a function of regressors. The difference between these estimators is qualitatively similar to using different functional forms for the conditional mean in least-squares regression.


Why can't OLS regression do the job in this context?


OLS regression (linear regression) is designed for a continuous response variable (taking on real values: y ∈ R).

Assume we want to fit the model:

yi = α + βxi + ui,  i = 1, ..., N

where yi = 1 if individual i benefits from social protection and yi = 0 otherwise, and x denotes individual characteristics (income, occupation, age, education, marital status, household characteristics, etc.).

OLS regression computes E(yi | xi); it ignores the discreteness of the dependent variable and does not constrain predicted probabilities to lie between zero and one.

What we want to fit here is pi = P(yi = 1 | xi) with 0 ≤ pi ≤ 1. Logit and probit models do this job.
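To see the boundedness problem concretely, here is a minimal pure-Python sketch (the toy data and variable names are hypothetical, not from the chapter): OLS fitted values for a binary y can leave [0, 1], while passing a linear index through a CDF cannot. Note the logistic transform here only illustrates that a CDF link bounds predictions; it is not a fitted logit.

```python
import math

# Toy data: x = a single regressor, y = binary outcome (hypothetical)
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0,   0,   0,   1,   1,   1]

# OLS slope and intercept for y on x (closed form for one regressor)
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
       sum((xi - xbar) ** 2 for xi in x)
alpha = ybar - beta * xbar

# Linear "probabilities" can escape [0, 1] ...
p_lin = [alpha + beta * xi for xi in x]
# ... while a CDF link keeps the same index inside (0, 1) by construction
p_logit = [1.0 / (1.0 + math.exp(-(alpha + beta * xi))) for xi in x]

assert min(p_lin) < 0.0 and max(p_lin) > 1.0   # OLS line leaves [0, 1]
assert all(0.0 < p < 1.0 for p in p_logit)     # CDF link never does
```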

Example: Cameron and Trivedi (2005)

Fishing mode choice: fishing from a charter boat vs. from a pier. y = 1 if charter boat and 0 otherwise.

yi = α + β ln(price_charter,i / price_pier,i) + εi

Table: Data summary

Variable              Charter (y = 1)   Pier (y = 0)   Overall
Price charter ($)     75                110            85
Price pier ($)        121               31             95
ln(relative price)    -0.264            1.643          0.275
Sample probability    0.717             0.283          1.000
Observations          452               178            630


Table: Estimation results: Logit, Probit, OLS

Variable              Logit      Probit     OLS
ln(relative price)    -1.823     -1.056     -0.243
                      (-12.61)   (-13.87)   (-28.15)
Intercept             2.053      1.194      0.784
                      (12.15)    (13.34)    (65.58)
ln L                  -206.83    -204.41    —
Pseudo R²             0.449      0.455      0.463
Observations          630        630        630


Figure: Predicted probabilities (propensity score)


Outline of the chapter


Logit vs. probit models - Latent variables - Specification issues - Estimation: ML, two-step, GMM, semiparametric ML - Application

References
1. Amemiya, T. (1985). Advanced Econometrics. Blackwell.
2. Cameron, A.C. and Trivedi, P.K. (2005). Microeconometrics: Methods and Applications. Cambridge University Press.
3. Manski, C.F. and McFadden, D.L., Eds. (1981). Structural Analysis of Discrete Data with Econometric Applications. MIT Press.
4. Greene, W.H. (2018). Econometric Analysis. Pearson.
5. Maddala, G.S. (1983). Limited-Dependent and Qualitative Variables in Econometrics. Cambridge University Press, Cambridge, UK.
6. Stock, J.H. and Watson, M.W. (2003). Introduction to Econometrics. Addison-Wesley, Boston.
7. Wooldridge, J.M. (2010). Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge.

2. Logit and Probit models


2.1 General binary outcome model

Assume:

yi = 1 with probability p, and yi = 0 with probability 1 − p.

A regression model is formed by parameterizing the probability p to depend on a regressor vector x and a K × 1 parameter vector β. The commonly used models are of single-index form with conditional probability given by:

pi ≡ P(yi = 1 | x) = F(x_i'β)  (3)

where F(·) is a specified function. To ensure that 0 ≤ p ≤ 1, one has to make sure that F(·) is a cumulative distribution function (CDF).

Logit model: F(·) is the CDF of the logistic distribution
Probit model: F(·) is the standard normal CDF
Linear probability model: does not use a CDF and instead lets pi = x_i'β
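The single-index construction p = F(x'β) is easy to sketch directly. The logistic and standard normal CDFs below are the usual closed forms (the normal CDF via the error function); the regressor and coefficient values are made up for illustration:

```python
import math

def logit_cdf(z):
    """Logistic CDF: Lambda(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def index_prob(x, beta, F):
    """Single-index probability p = F(x'beta)."""
    return F(sum(xj * bj for xj, bj in zip(x, beta)))

x, beta = [1.0, 0.4], [0.5, -1.0]   # hypothetical regressors / coefficients
for F in (logit_cdf, probit_cdf):
    p = index_prob(x, beta, F)
    assert 0.0 < p < 1.0             # a CDF link always yields a probability
```

The linear probability model corresponds to dropping F entirely, which is exactly why its "probabilities" are unbounded.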


Plot of the cumulative distribution function, CDF. [Figure: F(x'β) increases from 0 to 1 as the index x'β runs from −30 to 30.]

lim_{x'β → +∞} P(Y = 1 | x) = 1,  lim_{x'β → −∞} P(Y = 1 | x) = 0


Table: Binary Outcome Data: Usual Models

Model                Probability p ≡ P[y = 1 | x]            Marginal effect ∂p/∂x_k
Logit                Λ(x'β) = e^{x'β} / (1 + e^{x'β})        Λ(x'β)[1 − Λ(x'β)]β_k
Probit               Φ(x'β) = ∫_{−∞}^{x'β} φ(z)dz            φ(x'β)β_k
Linear probability   x'β                                     β_k

where φ(z) and Φ(z) denote the density and the CDF of the normal distribution, respectively:

φ(z) = (1/(σ√(2π))) exp{−(1/2)((z − z̄)/σ)²},  normal density

f(z) = e^{−z} / (1 + e^{−z})²,  logistic density (of hyperbolic-secant-square, sech², form)


2.2 Maximum Likelihood estimation (ML)

Reminder: for the binomial distribution, P(X = k) = C(n, k) p^k (1 − p)^{n−k}.

Consider a given sample (yi, x_i), i = 1, ..., N, where we assume independence over i. The outcome is Bernoulli distributed, the binomial distribution with just one trial. A very convenient compact notation for the density of yi, or more formally its probability mass function, is

f(yi | x_i) = pi^{yi} (1 − pi)^{1−yi},  yi = 0, 1  (4)

where pi = F(x_i'β). This yields probabilities pi and (1 − pi) since f(1) = p^1 (1 − p)^0 = p and f(0) = p^0 (1 − p)^1 = 1 − p. The density (4) implies that

ln f(yi) = yi ln pi + (1 − yi) ln(1 − pi)  (5)


Given independence over i and model (3) for pi, the log-likelihood function is

L(yi, x_i, β) = Σ_{i=1}^N [yi ln F(x_i'β) + (1 − yi) ln(1 − F(x_i'β))]  (6)

Differentiating with respect to β, we have that the MLE β̂_ML solves

∂L(yi, x_i, β)/∂β = Σ_{i=1}^N [(yi / Fi) Fi' x_i − ((1 − yi)/(1 − Fi)) Fi' x_i] = 0  (7)

Converting to fractions with common denominator Fi(1 − Fi) and simplifying yields the ML first-order conditions:

Σ_{i=1}^N [(yi − F(x_i'β)) / (F(x_i'β)(1 − F(x_i'β)))] F'(x_i'β) x_i = 0  (8)

There is no explicit solution for β̂_ML, so iterative procedures (algorithms) such as Newton-Raphson are usually used; convergence is achieved provided that the log-likelihood function is well specified and globally concave (negative definite Hessian). – Figure –
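A minimal pure-Python Newton-Raphson sketch for the logit case (where F = Λ and the Hessian has a closed form). The toy data are hypothetical and no safeguards such as step-halving or a convergence tolerance are included; at the optimum, the intercept component of the FOC (8) forces the residuals yi − Λ(x_i'β̂) to sum to zero, which the final assertion checks.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_newton(x, y, iters=25):
    """Newton-Raphson for a logit with intercept a and one slope b.
    Solves the FOC: sum_i (y_i - Lambda(a + b x_i)) (1, x_i)' = 0."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0                  # gradient components
        h00 = h01 = h11 = 0.0          # negative Hessian (2x2, symmetric)
        for xi, yi in zip(x, y):
            p = logistic(a + b * xi)
            w = p * (1.0 - p)          # logit weight Lambda(1 - Lambda)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (negative Hessian)^{-1} gradient
        a += ( h11 * g0 - h01 * g1) / det
        b += (-h01 * g0 + h00 * g1) / det
    return a, b

x = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]   # toy sample
y = [0, 0, 1, 0, 1, 1, 1, 1]
a_hat, b_hat = logit_newton(x, y)

# FOC check: residuals sum to (numerically) zero at the optimum
resid = sum(yi - logistic(a_hat + b_hat * xi) for xi, yi in zip(x, y))
assert abs(resid) < 1e-8
```

Because the logit log-likelihood is globally concave, this unsafeguarded iteration converges from the zero starting point on well-behaved data like the above.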

2.3 Consistency and asymptotic distribution of β̂_ML

Consistency. β̂_ML is consistent if the conditional density of y given x is correctly specified (Bernoulli), meaning if pi ≡ F(x_i'β). For binary data, we have E(y) = 1 × p + 0 × (1 − p) = p. This implies that E(yi | x_i) = F(x_i'β), which in turn implies that the left-hand side of (8) has expected value zero, the essential condition for consistency.

Asymptotic distribution. The general results from ML theory apply: β̂_ML ~ N(β, Ω0), where Ω0 is the inverse of the information matrix: – reminder here –

Ω0 = (−E[∂²L/∂β∂β'])^{−1}  (9)

which is estimated by the asymptotic variance matrix

V̂(β̂_ML) = ( Σ_{i=1}^N [1 / (F(x_i'β̂)(1 − F(x_i'β̂)))] F'(x_i'β̂)² x_i x_i' )^{−1}  (10)

Exercise 1: Compute (9) and derive (10)



2.4 Marginal effects and Average Partial Effects (APE)

The marginal effect of a change in a regressor on the conditional probability that y = 1 is of interest. For a general probability model and a change in the kth regressor, the marginal effect is:

For a continuous regressor:

∂P(yi = 1 | x_i)/∂x_ik = F'(x_i'β)β_k  (11)

where F' = ∂F(z)/∂z.

For a discrete regressor, the marginal effect is the difference in predicted probabilities. In particular, for a dichotomous regressor, the marginal effect is computed as:

P[y = 1 | x_{−k}, x_k = 1] − P[y = 1 | x_{−k}, x_k = 0] = F(x_{−k}'β_{−k} + β_k) − F(x_{−k}'β_{−k})  (12)

where x_{−k} denotes the vector of all regressors in x except x_k, and β_{−k} the vector of all coefficients in β except β_k.
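Both formulas (11) and (12) are straightforward to evaluate numerically. A sketch for the probit case, with hypothetical estimates (intercept, one continuous regressor x1, one dummy x2) and a made-up design matrix:

```python
import math

def probit_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Hypothetical probit estimates: intercept, continuous x1, dummy x2
beta = [0.3, 0.8, -0.5]
X = [[1.0, 0.2, 1.0], [1.0, -1.1, 0.0], [1.0, 0.7, 1.0], [1.0, 1.5, 0.0]]

def index(x):
    return sum(xj * bj for xj, bj in zip(x, beta))

# (11) at the sample mean: phi(xbar'beta) * beta_1
xbar = [sum(col) / len(X) for col in zip(*X)]
me_at_mean = probit_pdf(index(xbar)) * beta[1]

# APE: average of phi(x_i'beta) * beta_1 over the observations
ape = sum(probit_pdf(index(x)) * beta[1] for x in X) / len(X)

# (12) for the dummy x2: average difference in predicted probabilities
ape_dummy = sum(
    probit_cdf(index([x[0], x[1], 1.0])) - probit_cdf(index([x[0], x[1], 0.0]))
    for x in X
) / len(X)

assert me_at_mean > 0 and ape > 0    # sign follows the sign of beta_1
assert ape_dummy < 0                 # sign follows the sign of beta_2
```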


Remarks
The marginal effects differ with the point of evaluation x_i, as in any nonlinear model, and differ across choices of F(·). Usually the marginal effect is evaluated at the sample means of the data.
When the marginal effect is evaluated at every observation and the sample average of these individual marginal effects is taken, we get the Average Partial Effects (APE).
The sign of the coefficient gives the sign of the marginal effect, since F'(·) > 0.

Interpretation
i) Marginal effects are additive approximations of effects in non-additive models. For example, a value of 0.07 means that the probability increases by 7 percentage points.
ii) The marginal effect tells by how many units the probability changes if the explanatory variable changes by one unit. A probability is measured in percentages and its units are percentage points.


2.5 Logit specification

The logit model or logistic regression model specifies

p = Λ(x'β) = e^{x'β} / (1 + e^{x'β})  (13)

where Λ(·) is the logistic CDF, with Λ(z) = e^z/(1 + e^z) = 1/(1 + e^{−z}).

The MLE FOC (8) become

Σ_{i=1}^N (yi − Λ(x_i'β)) x_i = 0  (14)

Moreover, when the regressors include an intercept, the average in-sample predicted probability (1/N) Σ_i Λ(x_i'β̂) equals ȳ, the sample frequency of yi.

Exercise 2: Compute the asymptotic variance of β̂_ML for the logit specification.


The marginal effects for the logit model can be obtained fairly easily from the coefficients, since

∂pi/∂x_ik = pi(1 − pi)β_k  (15)

where pi = Λ(x_i'β). The marginal effect gives the increase in the probability that y = 1 as x_ik changes.

Odds ratio or relative risk. A very common interpretation of the coefficients is in terms of marginal effects on the odds ratio rather than on the probability. For the logit model:

p = e^{x'β} / (1 + e^{x'β})  =⇒  p/(1 − p) = e^{x'β}  (16)

ln(p/(1 − p)) = x'β  (17)

p/(1 − p) measures the probability that y = 1 relative to the probability that y = 0 and is called the odds ratio or relative risk.

Example: Odds ratios are multiplicative effects. An odds ratio of 1.07 corresponds to an increase of 7%. A logit slope parameter of 0.1 means that a one-unit increase in the regressor multiplies the odds ratio by e^{0.1} ≈ 1.105, i.e., increases the odds by about 10.5%.
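Because p/(1 − p) = e^{x'β} holds exactly in the logit model, a one-unit change in a regressor multiplies the odds by e^{β_k}, not by β_k. A short numerical check (the intercept and slope values are hypothetical):

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

beta0, beta1 = -0.2, 0.1   # hypothetical logit intercept and slope

def odds(x):
    """Odds p/(1-p) at regressor value x; equals exp(beta0 + beta1*x)."""
    p = logistic(beta0 + beta1 * x)
    return p / (1.0 - p)

# A one-unit increase in x multiplies the odds by exp(beta1)
ratio = odds(3.0) / odds(2.0)
assert abs(ratio - math.exp(beta1)) < 1e-9   # exp(0.1) ≈ 1.105
```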

2.6 Probit specification

The probit model specifies the conditional probability

p = Φ(x'β) = ∫_{−∞}^{x'β} φ(z)dz  (18)

where Φ(·) denotes the standard normal CDF, with derivative

φ(z) = (1/√(2π)) exp(−z²/2)  (19)

the standard normal density function. The MLE FOC (8) are:

Σ_{i=1}^N wi (yi − Φ(x_i'β)) x_i = 0  (20)

where, unlike in the logit case, the weight wi = φ(x_i'β)/[Φ(x_i'β)(1 − Φ(x_i'β))] varies across observations. The probit marginal effects are

∂pi/∂x_ik = φ(x_i'β)β_k  (21)

with no further simplification. – Exercise 3: Compute the asymptotic variance of β̂_ML for the probit specification.

2.7 Remarks on OLS estimation

A simple alternative to either logit or probit is OLS regression of y on x. This has the obvious deficiency that it is possible to obtain predicted probabilities x_i'β̂ that are negative or that exceed one.

The OLS estimator is nonetheless useful as an exploratory tool. In practice it provides a reasonable direct estimate of the sample-average marginal effect on the probability that y = 1 as x changes, even though it provides a poor model for individual probabilities, and it gives a good guide to which variables are statistically significant.

If the OLS estimator is used, then standard errors should correct for heteroskedasticity. Linear regression is justified if the probability is truly pi = x_i'β. Then y | x_i has mean x_i'β and heteroskedastic variance x_i'β(1 − x_i'β) that varies with x_i. However, it is very unlikely that pi = x_i'β holds, since x_i'β is not restricted to lie between 0 and 1.

Although OLS estimation with heteroskedasticity-robust standard errors can be a useful exploratory data analysis tool, it is best to use the logit or probit MLE for final data analysis.
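A sketch of the heteroskedasticity-robust (HC0, White) standard error for the slope in a one-regressor linear probability model, next to the classical formula, on made-up data:

```python
import math

# Toy binary data (hypothetical)
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 1, 0, 1, 1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
u = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # OLS residuals

# HC0 robust variance of the slope: sum((x_i - xbar)^2 u_i^2) / sxx^2
se_robust = math.sqrt(sum(((xi - xbar) * ui) ** 2
                          for xi, ui in zip(x, u)) / sxx ** 2)

# Classical (homoskedastic) variance, for comparison: s^2 / sxx
se_classic = math.sqrt((sum(ui ** 2 for ui in u) / (n - 2)) / sxx)

assert se_robust > 0.0 and se_classic > 0.0
```

The two standard errors differ whenever the residual variance moves with x, which is built into the LPM since V(y | x) = x'β(1 − x'β).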


2.8 Choosing a binary model: logit or probit?

The logit model has a relatively simple form for the first-order conditions and allows interpretation of the coefficients in terms of the log-odds ratio.

The probit model, in contrast, has the attraction of being motivated by a latent normal random variable. For these reasons many economists use the probit model.

The different models do yield quite different estimates β̂ of the regression parameters. However, this is largely an artifact of using different CDFs.

Logit and probit models have similar shapes for central values of F(·) but differ in the tails as F(·) approaches 0 or 1.

It is more meaningful to compare the marginal effects across models, as this measure is scaled similarly across the three models. Evaluated where the index is zero (p = 0.5): for logit, ∂p/∂x_k = 0.25 β̂_k; for probit, ∂p/∂x_k ≈ 0.4 β̂_k; for OLS, ∂p/∂x_k = β̂_k. This suggests comparing coefficients across models using the conversion factors:

β̂_logit ≈ 4 β̂_OLS  (22)
β̂_probit ≈ 2.5 β̂_OLS  (23)
β̂_logit ≈ 1.6 β̂_probit  (24)
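The rule-of-thumb factors come from matching marginal effects at p = 0.5: the logistic weight peaks at Λ(0)[1 − Λ(0)] = 0.25 and the standard normal density at φ(0) = 1/√(2π) ≈ 0.4, and their ratio gives the 1.6 in (24). A two-line check:

```python
import math

logistic_peak = 0.25                       # Lambda(0) * (1 - Lambda(0))
normal_peak = 1.0 / math.sqrt(2.0 * math.pi)   # phi(0) ≈ 0.3989

# Matching marginal effects at p = 0.5 yields the usual conversion factor
assert abs(normal_peak / logistic_peak - 1.6) < 0.01
```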

3. Latent variable models

A latent variable, say y*, is a variable that is incompletely observed (or of which one only observes the sign). Latent variable models provide extensions to multinomial outcomes and censored outcomes.

Assume an unobserved continuous latent random variable y*; all we observe is the binary variable y, which takes value 1 or 0 according to whether or not y* crosses a threshold. Different distributions for y* lead to different binary outcome models.

The regression model for y* is the index function model:

y* = x'β + u  (25)

However, this model cannot be estimated directly since y* is not observed. Instead, we observe

y = 1 if y* > 0, and y = 0 if y* ≤ 0  (26)

where the threshold of zero is a normalization explained in the following.

Equations (25) and (26) can be rewritten more compactly as

y* = x'β + u,  y = 1[y* > 0]  (27)

where 1[·] denotes the indicator function, which is 1 if the condition is fulfilled and 0 otherwise. Given (27),

P(y = 1 | x) = P(y* > 0)
             = P(x'β + u > 0)
             = P(−u < x'β)
             = F(x'β)

where F is the CDF of −u, which equals the CDF of u in the usual case of a density symmetric around 0.
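The mapping from the latent model to P(y = 1 | x) = Φ(x'β) (for u standard normal) can be checked by simulation. A sketch with an assumed true slope, comparing the empirical frequency of y = 1 in a narrow band of x against the implied probit probability:

```python
import math
import random

random.seed(0)
beta = 1.5                      # assumed true slope in y* = beta*x + u
N = 20000
x = [random.gauss(0.0, 1.0) for _ in range(N)]
# Observe only the sign of the latent variable: y = 1[y* > 0]
y = [1 if beta * xi + random.gauss(0.0, 1.0) > 0 else 0 for xi in x]

Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Empirical P(y = 1) for x near 0.5 should track Phi(beta * 0.5)
band = [yi for xi, yi in zip(x, y) if 0.4 < xi < 0.6]
emp = sum(band) / len(band)
assert abs(emp - Phi(beta * 0.5)) < 0.05
```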


Remarks on the latent model

Identification of the single-index model requires a restriction on the variance of u, as the single-index model can only identify β up to scale. All that is observed is whether or not y* > 0, or equivalently whether or not x'β + u > 0. However, this is equivalent to whether or not x'β⁺ + u⁺ > 0, where β⁺ = aβ and u⁺ = au for any a > 0. Placing a restriction on the variance of the error (u or u⁺) secures uniqueness of β. The error variance is set to one in the probit model and to π²/3 in the logit model.

The index function model implies a direct interpretation of β as the change in the latent variable y* when x changes by one unit. Even though y* is unobserved, this interpretation is meaningful if one uses knowledge of the specified variance of u. For example, a slope parameter of 0.5 in the probit model means a one-unit change in the regressor leads to a 0.5 standard deviation change in y*, since in this model the error variance, and hence the conditional variance of y*, equals 1.

Commonly used extensions of the index function approach are to ordered discrete choice models and to models for censored and selected samples.


4. Specification issues in binary response models

4.1 Neglected heterogeneity
The neglected heterogeneity problem concerns the consequences of omitted variables when those variables are independent of the included control variables. Assume the latent model:

y* = xβ + γc + u  (28)

where y = 1[y* > 0]. We assume u | x, c ~ N(0, 1). Suppose c ⊥ x and c ~ N(0, τ²). Under these assumptions, the composite term γc + u is independent of x and is normally distributed: γc + u ~ N(0, γ²τ² + 1). Therefore

P(y = 1 | x) = P(γc + u > −xβ | x) = Φ(xβ/σ)  (29)

where σ = √(γ²τ² + 1). From (29), it follows that a probit estimation consistently estimates β/σ. So, if β̂ is the probit estimator from a probit of y on x, then plim β̂ = β/σ. Because σ > 1 (unless γ = 0 or τ² = 0), |β/σ| < |β|.
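The attenuation result can be illustrated by simulation: the conditional frequency of y = 1 tracks Φ(xβ/σ), not Φ(xβ). A sketch with assumed values β = γ = τ = 1, so σ = √2:

```python
import math
import random

random.seed(1)
beta, gamma, tau = 1.0, 1.0, 1.0
sigma = math.sqrt(gamma ** 2 * tau ** 2 + 1.0)   # sd of (gamma*c + u)

N = 40000
x = [random.gauss(0.0, 1.0) for _ in range(N)]
# Latent model y* = beta*x + gamma*c + u with c ~ N(0, tau^2), u ~ N(0, 1)
y = [1 if beta * xi + gamma * random.gauss(0.0, tau)
     + random.gauss(0.0, 1.0) > 0 else 0 for xi in x]

Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# P(y=1 | x near 1) follows the attenuated index beta/sigma, not beta
band = [yi for xi, yi in zip(x, y) if 0.9 < xi < 1.1]
emp = sum(band) / len(band)
assert abs(emp - Phi(beta * 1.0 / sigma)) < 0.05   # matches Phi(x*beta/sigma)
assert abs(emp - Phi(beta * 1.0)) > 0.04           # misfits Phi(x*beta)
```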


Remarks on neglected heterogeneity

The statement that in probit models neglected heterogeneity is a much more serious problem (leading to inconsistent estimates) than in linear models, even if the omitted heterogeneity is independent of x, should be taken with caution.

In fact, we have just shown that a probit of y on x consistently estimates β/σ rather than β, so the statement is technically correct. However, given that we are more interested in estimating marginal effects (and hence the directions of effects) rather than just the parameters, estimating β/σ is just as good as estimating β.

If c is correlated with x or is otherwise dependent on x (for example, if V(c | x) depends on x), then the omission of c poses a serious problem. But this is already well known from linear regression models with omitted controls that are correlated with x.


4.2 Endogeneity
4.2.1 Continuous endogenous explanatory variable
We assume that at least one continuous explanatory variable is correlated with the error term in the latent equation. One option is to estimate a linear probability model by 2SLS (with instrumental variables, IV). Let us rather consider the case of estimating a latent probit model:

y1* = z1 δ1 + α1 y2 + u1  (30)
y2 = z1 δ21 + z2 δ22 + v2 = z δ2 + v2  (31)
y1 = 1[y1* > 0]  (32)

where the pair of error terms (u1, v2) has zero mean, is bivariate normal, and is independent of z. Equations (30) and (32) are the structural equations; equation (31) is the reduced form for y2, which is endogenous if u1 and v2 are correlated.

If u1 and v2 are independent, there is no endogeneity issue. Because v2 is normally distributed, y2 | z is normal, and thus y2 should have the characteristics of a normal random variable (for example, y2 should not be discrete). The model is suited to the case where y2 is correlated with u1 because of omitted variables or measurement error.

Two-step estimation by Rivers and Vuong (1988)

This is the most useful estimation procedure, as it also leads to a simple test of the endogeneity of y2. Under joint normality of (u1, v2), with the normalization V(u1) = 1, we can write:

u1 = θ1 v2 + e1  (33)

where θ1 = η1/τ2², η1 = cov(u1, v2), τ2² = V(v2), and e1 is independent of z and v2 (and therefore of y2). Because of the joint normality of (u1, v2), e1 is also normally distributed with E(e1) = 0 and

V(e1) = V(u1) − θ1² V(v2) = 1 − η1²/τ2² = 1 − ρ1²

where ρ1 = corr(u1, v2), since V(u1) = 1. As a result, the latent model can be rewritten as:

y1* = z1 δ1 + α1 y2 + θ1 v2 + e1,  e1 | z, y2, v2 ~ N(0, 1 − ρ1²)  (34)

It follows that:

P(y1 = 1 | z, y2, v2) = Φ[(z1 δ1 + α1 y2 + θ1 v2) / √(1 − ρ1²)]  (35)

NB. Observe that if y2 were exogenous, ρ1 = 0.



The probit estimation provides consistent estimates of the scaled parameters δρ1 ≡ δ1/√(1 − ρ1²), αρ1 ≡ α1/√(1 − ρ1²), and θρ1 ≡ θ1/√(1 − ρ1²). Since δ2 is unknown, v2 must first be estimated. This leads to the two-step estimation algorithm:

Step 1: Run the OLS regression of y2 on z and save the residuals v̂2.

Step 2: Run the probit of y1 on z1, y2, and v̂2 to get consistent estimates of the scaled parameters δρ1, αρ1, and θρ1.

Endogeneity test. The usual probit t statistic on v̂2 is a valid test of the null hypothesis that y2 is exogenous:

H0: θ1 = 0 (y2 exogenous)  vs.  H1: θ1 ≠ 0  (36)

Under H0: θ1 = 0, e1 = u1, and so the distribution of v2 plays no role under the null. Therefore, the test of exogeneity is valid without assuming normality or homoskedasticity of v2. However, if y2 and u1 are correlated (endogeneity of y2), normality of v2 is crucial.
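The projection (33) underlying the two-step procedure can be verified numerically. A sketch under the normalization V(u1) = V(v2) = 1, in which case θ1 = ρ1 and V(e1) = 1 − ρ1²:

```python
import math
import random

random.seed(2)
rho = 0.6                       # corr(u1, v2); V(u1) = V(v2) = 1 here
N = 50000
v2 = [random.gauss(0.0, 1.0) for _ in range(N)]
# Build u1 jointly normal with v2: u1 = rho*v2 + sqrt(1 - rho^2)*noise
u1 = [rho * v + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0)
      for v in v2]

theta = rho                     # theta1 = cov(u1, v2)/V(v2) = rho here
e1 = [u - theta * v for u, v in zip(u1, v2)]

var_e1 = sum(e * e for e in e1) / N
cov_ev = sum(e * v for e, v in zip(e1, v2)) / N
assert abs(var_e1 - (1.0 - rho ** 2)) < 0.02   # V(e1) = 1 - rho1^2
assert abs(cov_ev) < 0.02                      # e1 orthogonal to v2
```

The orthogonality of e1 and v2 is exactly what makes including the first-step residual v̂2 in the second-step probit deliver consistent (scaled) estimates.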


Conditional Maximum Likelihood (CML)

Estimate equations (30)-(32) by CML. The joint distribution of (y1, y2) is f(y1, y2 | z) = f(y1 | y2, z) f(y2 | z). Given that y2 | z ~ N(z δ2, τ2²), the density f(y2 | z) is easy to obtain. Moreover, since v2 = y2 − z δ2 and y1 = 1[y1* > 0],

P(y1 = 1 | y2, z) = Φ[(z1 δ1 + α1 y2 + (ρ1/τ2)(y2 − z δ2)) / √(1 − ρ1²)]  (37)

with ρ1/τ2 = θ1. Denoting by ℓ the term inside Φ(·), we obtain the likelihood contribution:

f(y1, y2 | z) = {Φ(ℓ)}^{y1} {1 − Φ(ℓ)}^{1−y1} (1/τ2) φ((y2 − z δ2)/τ2)  (38)

Taking the log of (38) and summing over observations provides the log-likelihood function to maximize.

Endogeneity test. Testing that y2 is exogenous: H0: ρ1 = 0. One can also use the likelihood ratio test – reminder here: the LR test and its distribution –.

Remark: While CML enforces discipline (full use of the distributional information) and leads to correct standard error estimates, it can be cumbersome. Rivers and Vuong (1988), on the other hand, is a limited-information procedure.

Generalized Method of Moments (GMM)

Consider the model

yi* = x_i'β + γωi + εi  (39)
yi = 1[yi* > 0]  (40)
E(εi | ωi) = g(ωi) ≠ 0  (41)

Thus, ωi is endogenous. The ML estimators considered earlier will not consistently estimate (β, γ) without additional assumptions that allow one to formalize P(yi = 1 | x_i, ωi).

Let ζi denote a relevant instrumental variable, so that E(εi | ζi, x_i) = 0 while ζi is correlated with ωi. A natural instrumental variable estimator would be based on the moment condition:

E[(yi* − x_i'β − γωi) (x_i', ζi)'] = 0

However, yi* is not observed; only yi is. The approach used by Bertschek and Lechner (1998) is to assume that ζi is orthogonal to the residual [yi − Φ(x_i'β + γωi)].

This leads to the moment condition:

E[(yi − Φ(x_i'β + γωi)) (x_i', ζi)'] = 0

and the usual two-step GMM estimator applies.

4.2.2 Binary endogenous explanatory variable

We now consider the case where the endogenous regressor is a dummy variable:

y1 = 1[z1 δ1 + α1 y2 + u1 > 0]  (42)
y2 = 1[z δ2 + v2 > 0]  (43)

where (u1, v2) is independent of z and bivariate normally distributed with zero mean; each has unit variance and ρ1 = corr(u1, v2). If ρ1 ≠ 0, then u1 and y2 are correlated, and probit estimation of (42) that ignores this correlation is inconsistent for δ1 and α1.

The likelihood function requires the joint distribution of (y1, y2), obtained as f(y1, y2 | z) = f(y1 | y2, z) f(y2 | z). To get P(y1 = 1 | y2, z), observe first that:

P(y1 = 1 | v2, z) = Φ[(z1 δ1 + α1 y2 + ρ1 v2) / √(1 − ρ1²)]  (44)


One also needs results on truncation. Indeed, if v2 has a standard normal distribution and is independent of z, then the density of v2 given v2 > −z δ2 (meaning y2 = 1) is:

φ(v2) / P(v2 > −z δ2) = φ(v2) / Φ(z δ2)

We can then establish:

P(y1 = 1 | y2 = 1, z) = E[P(y1 = 1 | v2, z) | y2 = 1, z]
  = E[ Φ((z1 δ1 + α1 y2 + ρ1 v2)/√(1 − ρ1²)) | y2 = 1, z ]  (45)
  = (1/Φ(z δ2)) ∫_{−z δ2}^{∞} Φ((z1 δ1 + α1 y2 + ρ1 v2)/√(1 − ρ1²)) φ(v2) dv2

Observe that P(y1 = 0 | y2 = 1, z) = 1 − P(y1 = 1 | y2 = 1, z). Similarly, we have:

P(y1 = 1 | y2 = 0, z) = (1/(1 − Φ(z δ2))) ∫_{−∞}^{−z δ2} Φ((z1 δ1 + α1 y2 + ρ1 v2)/√(1 − ρ1²)) φ(v2) dv2  (46)

Combining the four possible outcomes of (y1, y2) with the probit specification for y2, the log-likelihood function to maximize (a non-trivial task!) is:

ln L(δ1, α1, δ2, ρ1) = Σ_{i=1}^N [ y1i y2i ln P11i + (1 − y1i) y2i ln P01i
                       + y1i (1 − y2i) ln P10i + (1 − y1i)(1 − y2i) ln P00i ]  (47)

where

P00i ≡ P(y1 = 0, y2 = 0 | z) = Φ2(−z1 δ1, −z δ2; ρ1)
P10i ≡ P(y1 = 1, y2 = 0 | z) = Φ(−z δ2) − P00i  (48)
P01i ≡ P(y1 = 0, y2 = 1 | z) = Φ(−z1 δ1 − α1) − Φ2(−z1 δ1 − α1, −z δ2; ρ1)
P11i ≡ P(y1 = 1, y2 = 1 | z) = 1 − P00i − P10i − P01i

and Φ2 denotes the bivariate standard normal CDF.


5. Goodness of fit
Because the hypothesis that all slopes in the model are zero is often of interest, report the log-likelihood computed with only a constant term (ln L0) together with the log-likelihood of the full specification (ln L).

McFadden's (1974) likelihood ratio index (pseudo-R²):

R² = 1 − ln L / ln L0

Predicted outcomes vs. predicted probabilities: the predictive ability of the model is usually summarized in a 2 × 2 table of the hits and misses of a prediction rule such as:

ŷ = 1 if F̂ > F*

with the usual threshold value F* = 0.5.
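Both diagnostics are mechanical to compute from fitted probabilities; a sketch with hypothetical p̂ values and outcomes:

```python
import math

# Hypothetical fitted probabilities and observed outcomes
p_hat = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.35]
y =     [1,   1,   0,   1,   0,   0,   0,   1]

# Log-likelihoods: full model vs intercept-only (p = ybar for everyone)
ybar = sum(y) / len(y)
lnL  = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
           for yi, pi in zip(y, p_hat))
lnL0 = sum(yi * math.log(ybar) + (1 - yi) * math.log(1 - ybar) for yi in y)
pseudo_r2 = 1.0 - lnL / lnL0
assert 0.0 < pseudo_r2 < 1.0

# 2x2 hits-and-misses table with threshold F* = 0.5
y_pred = [1 if pi > 0.5 else 0 for pi in p_hat]
hits = sum(yp == yi for yp, yi in zip(y_pred, y))
assert hits / len(y) == 0.75   # 6 of 8 classified correctly here
```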


6. Semiparametric estimation
Estimation under weaker assumptions.

Semiparametric conditional mean estimation:

E(yi | x_i) = G(x_i)

where G(·) is unknown. Kernel regression can do the job. See Figure (1).

Semiparametric ML estimation. Klein and Spady (1993) proposed a fully efficient semiparametric estimator that maximizes:

L(yi, x_i, β) = Σ_{i=1}^N [ yi ln Ĝ(x_i'β) + (1 − yi) ln(1 − Ĝ(x_i'β)) ]

where Ĝ(x_i'β) is a nonparametric estimate of G(x_i'β).


7. Exercise

For an independent n-sample of pairs of real random variables (Yi, Xi), i = 1, ..., n, consider the dichotomous model

Yi = 1[αXi + σui > 0],

where α and σ are two real parameters and ui is a random variable independent of Xi. Assume moreover that the distribution of ui is continuous and symmetric about zero, the mean of ui. Denote by F and f, respectively, the CDF and the density of the distribution of ui.

1. Discuss the identification of the model. In what follows, write m for the ratio α/σ.

2. How does the answer to the previous question change if the model is Yi = 1[αXi + σui > bi], where bi is a known threshold that varies across individuals? In what follows, consider the initial model, where the threshold bi is zero for all i.
Théophile T. Azomahou (CERDI) Février 20-28, 2020 37 / 42


Binary Outcomes Models

3. Write the likelihood L and show that the score, defined by S(Y | X; m) = ∂/∂m ln L(Y | X; m), is

S(Y | X; m) = Σ_{i=1}^n [fi / (Fi(1 − Fi))] (Yi − Fi) Xi,

where Fi and fi denote the values taken by F and f at mXi.

4. Verify that the expectation of the score is zero and compute I(m), the variance of the score.

5. Derive the maximum likelihood estimator m̂n of m when the variable Xi is constant. In particular, under this assumption, give the expression of m̂n when ui follows the standard normal distribution, and when ui follows a logistic distribution with CDF

G(u) = 1 / (1 + exp(−u)).

6. Discuss the properties of the OLS estimator of the coefficient of Xi in the linear regression of Y on X.


8. Application using STATA

The effect of informality on social protection
Question: the scarring effect of informal sector employment. Do past spells in the informal sector induce discrimination in terms of social protection for workers, even if they have since transited to the formal sector? – Effects of different transitions from the informal sector on the social protection offered by the employer.

Data: South African Labour Force Survey (LFS) from 2001 to 2004. It is a twice-yearly nationwide household panel survey that focuses on labour market issues and also collects information on demographic characteristics and education. We consider individuals aged 15 or more who are employed at the time of the survey.

Main indicators: Social protection and informality are measured as three-point ordinal scales: non-covered, partially covered, and fully covered for the former, and fully informal, partially informal, and formal for the latter. – Individual characteristics: age, education, gender, occupation, marital status, origin, etc.

A. Why does the relationship between social protection and informality matter for public policy?
In developing countries, the informal economy comprises more than half of total employment (La Porta and Shleifer, 2014). The share of informal employment in non-agricultural activities is quite high in many regions: in South and East Asia, it ranges from 42% to 84%; in Sub-Saharan Africa, it ranges from 32% to 82%.

One of the characteristics of the informal economy is the vulnerability of workers. Most workers lack health insurance and pension protection, and they earn lower wages compared to the formal sector.

The lack of social protection for workers in the informal sector may persist even after they join the formal sector. Such a situation is known as labour market state dependence, whereby the probability of exiting a given state is related to one's labour market history.


B. Definition of main variables

Social protection: Employees are asked whether their employer contributes to unemployment insurance or health insurance. We consider employees to be non-covered when they do not benefit from either of these contributions from their employer. If they benefit from only one, they are partially covered; if they benefit from both, they are fully covered.

Informality: Regarding the level of informality, the criteria are based on information about the place in which employees work; in other words, informality is about the sector in which they are employed. Employees are asked whether the place where they work is registered and/or pays value added tax (VAT).

Employees are fully informal if the place in which they work complies with neither of these criteria. They are considered partially informal when it complies with exactly one criterion. When both criteria are met, the employees are considered formal.


C. Implement the study


Read the data set, generate new variables when needed
Produce descriptive statistics
Estimate the model by both logit and probit
Compute marginal effects
Comment on results
Discuss endogeneity issues and address them if any
