Binary
ENSAE, 2024/2025
1 / 47
Outline
Introduction
Potential Outcomes
Application
2 / 47
Motivation
3 / 47
General ideas
4 / 47
Outline
Introduction
Potential Outcomes
Application
5 / 47
Modeling
▶ Linear models are not well suited to analyze binary dependent
variables. Indeed, if Y ∈ {0, 1}, then
E (Y |X ) = P(Y = 1|X ) ∈ [0, 1]. (1)
In a linear model (under the exogeneity assumption E (ε|X ) = 0), it
follows that E (Y |X ) = X ′ β0 . But nothing guarantees that
X ′ β0 ∈ [0, 1].
▶ In order to satisfy the restriction (1), the following assumption is
made:
E (Y |X ) = F (X ′ β0 ), (2)
where F (.) is a strictly increasing bijective (and known) function
from R to ]0, 1[, that is to say F satisfies the properties of a
distribution function.
▶ N.B.: equation (2) corresponds to the equation of a so-called
Generalized Linear Model (GLM), i.e., a model of the form:
h(E (Y |X )) = X ′ β0 ,
where h is a known function called the link function.
6 / 47
Non-linear models and latent variables
Y∗ = X′β0 + ε,    (3)
Y = 1{Y∗ ≥ 0}.
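As a quick illustration, here is a minimal simulation sketch of the latent-variable model (3) with Gaussian errors, fitted by a probit regression (all names, such as n and beta0, are illustrative, not from the slides):

```r
# Sketch: simulate the latent-variable model (3) and fit a probit.
set.seed(1)
n     <- 1000
X     <- cbind(1, rnorm(n))              # intercept + one regressor
beta0 <- c(-0.5, 1)
ystar <- drop(X %*% beta0) + rnorm(n)    # latent Y* = X'beta0 + eps, eps ~ N(0,1)
Y     <- as.integer(ystar >= 0)          # observed Y = 1{Y* >= 0}
fit   <- glm(Y ~ X[, 2], family = binomial(link = "probit"))
coef(fit)                                # should be close to beta0
```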
7 / 47
Latent variable interpretation
Example 1: microeconomics.
▶ Y = agent’s choice between two options. Let U1 be the (expected)
utility of this agent when choosing option Y = 1, and U0 the utility
when choosing Y = 0.
▶ Then define the difference between the utilities of the two choices as
Y ∗ = U1 − U0 .
▶ If the agent is rational, the chosen option is the one that generates
the highest utility: Y = 1{U1 ≥ U0} = 1{Y∗ ≥ 0}.
8 / 47
Latent variable interpretation
Example 3: biostatistics.
▶ Y = 1 if an individual is ill, 0 otherwise.
▶ An individual has recovered if the number of bacteria N (for
example) has fallen below a certain threshold S, which may be
individual-specific.
▶ Then Y∗ = N − S.
Example 4: education.
▶ Y = 1 if a student graduates and obtains a diploma, 0 otherwise.
▶ The diploma is obtained if the student’s average grade M exceeds a
fixed threshold s.
▶ Then Y ∗ = M − s.
9 / 47
Two important examples: the probit and logit models
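As a reminder (a sketch, assuming the standard definitions used later in these slides: Λ(u) = exp(u)/(1 + exp(u)) for the logit and Φ, the standard normal cdf, for the probit), the two link functions can be compared in R:

```r
# Sketch: the two standard distribution functions.
# Lambda is the logistic cdf (logit); pnorm is the standard normal cdf (probit).
Lambda <- function(u) exp(u) / (1 + exp(u))
u <- seq(-4, 4, by = 0.01)
plot(u, Lambda(u), type = "l", ylab = "F(u)")   # logit curve
lines(u, pnorm(u), lty = 2)                     # probit curve, steeper near 0
```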
10 / 47
Parameters and marginal effects
In the linear model, the marginal effect of a regressor is constant:
$$\frac{\partial E(Y \mid X_1 = x_1, \ldots, X_{K-1} = x_{K-1})}{\partial x_j} = \beta_{0j}.$$
11 / 47
Parameters and marginal effects
In model (2), by the chain rule,
$$\frac{\partial E(Y \mid X = x)}{\partial x_j} = \left.\frac{\partial F(u)}{\partial u}\right|_{u = x'\beta_0} \times \frac{\partial x'\beta_0}{\partial x_j} = f(x'\beta_0)\,\beta_{0j}, \quad \text{with } f = F'.$$
Hence ratios of coefficients equal ratios of marginal effects:
$$\frac{\beta_{0l}}{\beta_{0j}} = \frac{\partial E(Y \mid X = x)/\partial x_l}{\partial E(Y \mid X = x)/\partial x_j}.$$
12 / 47
Parameters and marginal effects
▶ Besides estimating β0j, it is of interest to estimate the average
marginal effect:
∆j = E [f (X ′ β0 )] β0j .
This is the expected marginal effect of Xj across the whole
population.
▶ It is also possible to focus on marginal effects for
sub-populations, by computing the expected marginal effect for units
satisfying X ∈ A (for example), E [f (X ′ β0 )|X ∈ A] β0j , or the
marginal effect for the average (representative) unit, f (E (X )′ β0 ) β0j .
▶ When an explanatory variable Xj is discrete (dichotomous), it is
more appropriate to replace marginal effects by
$$F(x_{-j}'\beta_{0,-j} + \beta_{0j}) - F(x_{-j}'\beta_{0,-j}),$$
where x−j = (1, x1 , ..., xj−1 , xj+1 , ..., xK−1 )′. The average effect of
globally switching Xj from 0 to 1 is then:
$$E\left[F(X_{-j}'\beta_{0,-j} + \beta_{0j}) - F(X_{-j}'\beta_{0,-j})\right].$$
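A minimal sketch of how these quantities can be estimated by their sample analogues in R (simulated data; all variable names are illustrative):

```r
# Sketch: average marginal effects Delta_j = E[f(X'beta0)] beta0j
# estimated by sample analogues after a logit fit.
set.seed(1)
n  <- 2000
df <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5))
df$y <- rbinom(n, 1, plogis(-0.2 + 0.8 * df$x1 + 0.5 * df$x2))

fit   <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = df)
betah <- coef(fit)
Xmat  <- model.matrix(fit)
dens  <- dlogis(drop(Xmat %*% betah))        # f = F' (logistic density)
mean(dens) * betah["x1"]                     # AME of the continuous x1

# For the binary x2, compare F at x2 = 1 vs x2 = 0 (discrete effect):
X1 <- X0 <- Xmat; X1[, "x2"] <- 1; X0[, "x2"] <- 0
mean(plogis(drop(X1 %*% betah)) - plogis(drop(X0 %*% betah)))
```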
13 / 47
A specificity of the logit model: the odds-ratios.
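Since the odds P(Y = 1|X)/P(Y = 0|X) equal exp(X′β0) in the logit model, exp(β0j) is the odds-ratio associated with a one-unit increase in Xj. A minimal sketch (simulated data, illustrative names):

```r
# Sketch: odds-ratios exp(beta_j) from a logit fit;
# in the logit model, odds P(Y=1|X)/P(Y=0|X) = exp(X'beta).
set.seed(1)
df  <- data.frame(x = rnorm(500))
df$y <- rbinom(500, 1, plogis(0.5 * df$x))
fit <- glm(y ~ x, family = binomial, data = df)
exp(cbind(OR = coef(fit), confint.default(fit)))  # ORs with Wald 95% CIs
```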
14 / 47
Outline
Introduction
Potential Outcomes
Application
15 / 47
Identification
▶ Let us return to the equation:
Y = 1{X ′ β0 + ε ≥ 0}.
▶ Two questions: (i) why is the threshold fixed at 0? (ii) why is the
variance of ε fixed (at 1 for the probit model, at π²/3 for the logit
model)?
▶ Reason: the model parameters are not identified otherwise. Indeed,
we have:
$$Y = 1\{\beta_{01} + X_{-1}'\beta_{0,-1} + \varepsilon \geq s\} \iff Y = 1\{\beta_{01} - s + X_{-1}'\beta_{0,-1} + \varepsilon \geq 0\}.$$
▶ Put differently, it is not possible to identify separately the constant
β01 and the threshold s. The threshold s is therefore (arbitrarily)
fixed at 0.
▶ Similarly, it is not possible to identify separately β0 and the variance
σ0² of the error term ε. Indeed,
Y = 1{X ′ β0 + ε ≥ 0} ⇐⇒ Y = 1{X ′ (β0 /σ0 ) + ε/σ0 ≥ 0}.
▶ The variance σ02 is therefore arbitrarily fixed.
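A small simulation sketch of this scale normalization (illustrative names): with ε ∼ N(0, σ0²), a probit fit recovers β0/σ0, not β0.

```r
# Sketch: scale non-identification. The probit fit estimates beta0/sigma0.
set.seed(1)
n <- 1e5
x <- rnorm(n); beta0 <- 2; sigma0 <- 4
y <- as.integer(beta0 * x + sigma0 * rnorm(n) >= 0)
coef(glm(y ~ x, family = binomial(link = "probit")))["x"]  # ~ beta0/sigma0 = 0.5
```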
16 / 47
Identification
When s and σ0 are fixed and if E (XX ′ ) is an invertible matrix, the model
is identified.
Proof: letting Pβ denote the distribution of the observations when the true
parameter is β, we must show that the map β ↦ Pβ is
injective. In our conditional binary model, identifiability
is equivalent to:
$$P_\beta(Y = 1 \mid X) = P_{\beta'}(Y = 1 \mid X) \text{ a.s.} \implies \beta = \beta'.$$
But
$$\big(E(XX') \text{ is invertible}\big) \iff \big(X'\lambda = 0 \text{ a.s.} \implies \lambda = 0\big).$$
Consequently,
$$P_\beta(Y = 1 \mid X) = P_{\beta'}(Y = 1 \mid X) \iff F(X'\beta) = F(X'\beta') \iff X'\beta = X'\beta' \iff \beta = \beta'. \quad \square$$
17 / 47
Estimation of the model: the maximum likelihood method
18 / 47
Estimation of the model: the maximum likelihood method
▶ Note that this estimator is not necessarily unique, and it might not
even exist.
▶ Also note that Ln (Y|X; β) is the likelihood conditional on X.
Denoting by g(Xi) the density of Xi, the unconditional likelihood can
be written
$$L_n(\mathbf{Y}, \mathbf{X}; \beta) = \prod_{i=1}^{n} F(X_i'\beta)^{Y_i}\,\big(1 - F(X_i'\beta)\big)^{1 - Y_i}\, g(X_i).$$
19 / 47
First order conditions
▶ It is easier to maximize the log-likelihood function:
$$\ell_n(\mathbf{Y} \mid \mathbf{X}; \beta) = \sum_{i=1}^{n} Y_i \ln F(X_i'\beta) + (1 - Y_i)\ln\big(1 - F(X_i'\beta)\big).$$
▶ This example shows that, among observations such that xij = 1, the
variable yi must vary across i in order to estimate β0j . In the
absence of such variation, econometrics/statistics software packages like
Stata indicate that β0j is not identified and automatically
“expel” xj from the list of explanatory variables.
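To make the maximization of ℓn concrete, here is a minimal sketch (simulated data, illustrative names) that maximizes the logit log-likelihood with optim() and checks the result against glm():

```r
# Sketch: direct maximization of the logit log-likelihood vs. glm().
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))
X <- cbind(1, x)

negloglik <- function(beta) {
  p <- plogis(drop(X %*% beta))
  -sum(y * log(p) + (1 - y) * log(1 - p))   # minus ell_n
}
opt <- optim(c(0, 0), negloglik, method = "BFGS")
cbind(optim = opt$par, glm = coef(glm(y ~ x, family = binomial)))
```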
21 / 47
Existence and uniqueness of the solution
22 / 47
Remarks on the optimisation procedure
23 / 47
Asymptotic properties
The Fisher information matrix for one observation is
$$I_1(\beta_0) = E\left[\frac{f^2(X'\beta_0)}{F(X'\beta_0)\big(1 - F(X'\beta_0)\big)}\, XX'\right].$$
24 / 47
Asymptotic properties
Proof of the formula for I1(β0): we have
$$I_1(\beta_0) = V\!\left(\frac{\partial \ell_1}{\partial \beta}(Y \mid X; \beta_0)\right),$$
with
$$\ell_1(Y \mid X; \beta_0) = Y \ln F(X'\beta_0) + (1 - Y)\ln\big(1 - F(X'\beta_0)\big),$$
$$\frac{\partial \ell_1}{\partial \beta}(Y \mid X; \beta_0) = \frac{f(X'\beta_0)}{F(X'\beta_0)\big(1 - F(X'\beta_0)\big)}\,\big[Y - F(X'\beta_0)\big]\, X.$$
25 / 47
Asymptotic properties
Therefore
$$E\!\left(\frac{\partial \ell_1}{\partial \beta}(Y \mid X; \beta_0)\,\Big|\, X\right) = 0$$
because E(Y − F(X′β0)|X) = 0, and
$$\begin{aligned}
V\!\left(\frac{\partial \ell_1}{\partial \beta}(Y \mid X; \beta_0)\,\Big|\, X\right)
&= E\!\left(\frac{\partial \ell_1}{\partial \beta}(Y \mid X; \beta_0)\,\frac{\partial \ell_1}{\partial \beta}(Y \mid X; \beta_0)'\,\Big|\, X\right)\\
&= E\!\left(\frac{f(X'\beta_0)^2}{F(X'\beta_0)^2\big(1 - F(X'\beta_0)\big)^2}\,\big(Y - F(X'\beta_0)\big)^2\, XX'\,\Big|\, X\right)\\
&= \frac{f^2(X'\beta_0)\, XX'}{F(X'\beta_0)\big(1 - F(X'\beta_0)\big)}
\end{aligned}$$
since E((Y − F(X′β0))² | X) = F(X′β0)(1 − F(X′β0)). As the conditional
expectation of the score is zero, the unconditional variance equals the
expectation of this conditional variance, and the result is then obtained.
26 / 47
Hypothesis testing
27 / 47
Hypothesis testing
▶ Concerning the statistic ξnW, Î1(β0) corresponds to the formula on
page 24.
▶ Concerning the statistic ξnS, the formula is the same except that β̂ is
replaced by β̂C.
▶ Note that the three statistics tend to be “small” under the
hypothesis H0.
▶ Under H0, ξnT →d χ²(p) (T = W, S or R).
▶ The critical region of a test of asymptotic level α therefore takes the
form {ξnT > q_{χ²(p)}(1 − α)}, where q_{χ²(p)}(y) is the quantile of order y
of a χ²(p) distribution.
▶ To test H0: β0j = 0 against H1: β0j ≠ 0, the usual t-test is
mostly used. This test produces the same result as the Wald test,
because (you should verify this) ξnW = (β̂j /se(β̂j))² ≡ tj² and
|tj| > q_{N(0,1)}(1 − α/2) ⇔ ξnW > q_{χ²(1)}(1 − α), where
q_{N(0,1)}(1 − α/2) is the quantile of order 1 − α/2 of a N(0, 1).
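A minimal sketch of the Wald/t equivalence for H0: β0j = 0 (simulated data, illustrative names):

```r
# Sketch: Wald statistic for H0: beta_j = 0 and its equivalence
# with the squared z (t) statistic reported by glm.
set.seed(1)
x <- rnorm(1000)
y <- rbinom(1000, 1, plogis(0.3 * x))
fit <- glm(y ~ x, family = binomial)
z <- coef(summary(fit))["x", "z value"]
W <- z^2                                      # xi_n^W = t_j^2
c(W = W, crit = qchisq(0.95, 1), reject = W > qchisq(0.95, 1))
```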
28 / 47
Outline
Introduction
Potential Outcomes
Application
29 / 47
Explanatory power of the model
$$\text{pseudo-}R^2 = 1 - \frac{\ell_n(\mathbf{Y} \mid \mathbf{X}; \hat\beta)}{\ell_n(\mathbf{Y} \mid \mathbf{X}; \hat\beta_C)}$$
The pseudo-R² is close to 1 when the model predicts well, i.e., when for
each observation either Yi = 1 and F(Xi′β̂) ≃ 1, or Yi = 0 and F(Xi′β̂) ≃ 0.
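A minimal sketch of the pseudo-R² (McFadden) computation, taking β̂C as the intercept-only fit (simulated data, illustrative names):

```r
# Sketch: McFadden's pseudo-R^2, comparing the fitted log-likelihood
# with that of the constrained (intercept-only) model.
set.seed(1)
x <- rnorm(1000)
y <- rbinom(1000, 1, plogis(1.5 * x))
fit  <- glm(y ~ x, family = binomial)
fit0 <- glm(y ~ 1, family = binomial)       # constrained model
1 - as.numeric(logLik(fit) / logLik(fit0))  # pseudo-R^2
```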
30 / 47
Choice of variables
▶ Trade off between:
▶ increase of explanatory power of model;
▶ loss of precision due to estimating a large number of parameters.
▶ One can test the null hypothesis that variables have no effect,
possibly through sequential procedures (forward, backward, ...).
▶ Drawback: when n tends to infinity, such procedures entail that the
null is rejected for most of the explanatory variables.
▶ One can also use information criteria such as the AIC (Akaike Information
Criterion, Akaike, 1973) or the BIC (Bayesian Information Criterion,
Schwarz, 1978), as illustrated below.
▶ Such criteria are used to determine which model to select. Suppose
there are J candidate parametric models:
$$(P_{\beta^{(1)}})_{\beta^{(1)} \in B^{(1)}}, \ldots, (P_{\beta^{(J)}})_{\beta^{(J)} \in B^{(J)}}.$$
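A minimal sketch of AIC/BIC-based model choice between two logit specifications (simulated data, illustrative names; smaller values are preferred):

```r
# Sketch: comparing candidate logit models by AIC and BIC.
set.seed(1)
df <- data.frame(x1 = rnorm(800), x2 = rnorm(800))
df$y <- rbinom(800, 1, plogis(0.8 * df$x1))      # x2 is irrelevant here
m1 <- glm(y ~ x1,      family = binomial, data = df)
m2 <- glm(y ~ x1 + x2, family = binomial, data = df)
cbind(AIC = c(m1 = AIC(m1), m2 = AIC(m2)),
      BIC = c(m1 = BIC(m1), m2 = BIC(m2)))
```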
32 / 47
Outline
Introduction
Potential Outcomes
Application
33 / 47
Advantage of a linear model
▶ Sometimes, for simplicity, a linear probability model is
estimated instead of a logit or probit model:
E (Y |X ) = X ′ β0 .
▶ Example: panel data. Suppose that
E (Yit |Xit , αi ) = Xit′ β0 + αi ,
where αi is a unit-specific fixed effect, possibly correlated with
Xit .
▶ This unobserved fixed effect can be eliminated by applying the first
difference or within operator:
E (Yit − Yit−1 |Xit , Xit−1 ) = (Xit − Xit−1 )′ β0 .
▶ In non-linear models, this trick does not work, since
$$E(Y_{it} - Y_{it-1} \mid X_{it}, X_{it-1}, \alpha_i) = F(X_{it}'\beta + \alpha_i) - F(X_{it-1}'\beta + \alpha_i).$$
▶ Furthermore, maximum likelihood estimation of (β, α1 , ..., αn ) does
not produce consistent estimators of the parameters, because of the
so-called incidental parameter problem: the number of parameters
tends to infinity with n.
34 / 47
Modeling and estimation
▶ The linear probability model can be rewritten as Y = X ′ β0 + ε, with
ε := Y − X ′ β0 , so that E (ε|X ) = 0; note that ε is then heteroskedastic,
with V (ε|X ) = X ′ β0 (1 − X ′ β0 ) when the model is correct.
▶ The logit, probit, and linear models differ only in the distribution
function F (.) chosen in E (Y |X ) = F (X ′ β0 ):
F = Λ, Φ, or the identity function, respectively.
▶ There are semi-parametric models in which it is assumed that
P(Y = 1|X ) = F (X ′ β0 ) with both F and β0 unknown. This is equivalent
to considering a latent model Y ∗ = X ′ β0 + ε where ε ⊥⊥ X and the
distribution function of ε is unknown.
▶ Such models are less restrictive but are harder to estimate.
▶ The results obtained using a logit model, a probit model, or a linear
model are often quite similar in terms of their marginal effects, as the
sketch after this list illustrates.
▶ In terms of the coefficients themselves, however, the three models in
general yield different magnitudes, since the implied scale
normalizations of ε differ.
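A simulation sketch of this similarity of average marginal effects across the three models (illustrative names):

```r
# Sketch: average marginal effects under LPM, logit and probit
# estimated on the same simulated data.
set.seed(1)
n  <- 5000
df <- data.frame(x = rnorm(n))
df$y <- rbinom(n, 1, plogis(0.3 + 0.7 * df$x))

lpm <- lm(y ~ x, data = df)
lgt <- glm(y ~ x, family = binomial(link = "logit"),  data = df)
prb <- glm(y ~ x, family = binomial(link = "probit"), data = df)

c(lpm    = coef(lpm)["x"],                              # constant effect
  logit  = mean(dlogis(predict(lgt))) * coef(lgt)["x"], # AME, logit
  probit = mean(dnorm(predict(prb)))  * coef(prb)["x"]) # AME, probit
```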
36 / 47
Outline
Introduction
Potential Outcomes
Application
37 / 47
Potential Outcomes
38 / 47
Logit and Potential Outcomes
$$\frac{1}{\sum_{i=1}^{n} D_i} \sum_{i=1}^{n} D_i \left[ Y_i(1) - \Lambda\big(\hat\beta_0 + \hat\beta_D + X_i'\hat\beta_X\big) \right] = 0$$
Let Zi = (1, Di , Xi′ )′ and assume E (Zi Zi′ ) is non-singular. Then
(b0 , bD , bX ) ↦ E (ℓ1 (Y|X; (b0 , bD , bX′ )′ )) admits a unique maximizer,
denoted (β0 , βD , βX ).
We can show that (β̂0 , β̂D , β̂X ) converges to (β0 , βD , βX ), and the first
order conditions ensure the non-random equations:
39 / 47
Robust estimation of the Average Treatment Effect
Even if E (Y (d)|X ) ≠ Λ(β0 + βD d + X ′ βX ) (misspecification),
independence of D with (Y (0), Y (1), X ) ensures that the average
treatment effect satisfies (!):
$$\begin{aligned}
\delta := E(Y(1) - Y(0)) &= E(Y(1) \mid D = 1) - E(Y(0) \mid D = 0) && (1)\\
&= E\big(\Lambda(\beta_0 + \beta_D + X'\beta_X) \mid D = 1\big) - E\big(\Lambda(\beta_0 + X'\beta_X) \mid D = 0\big) && (2)\\
&= E\big(\Lambda(\beta_0 + \beta_D + X'\beta_X) - \Lambda(\beta_0 + X'\beta_X)\big). && (3)
\end{aligned}$$
40 / 47
Robust estimation of the Average Treatment Effect
In an experimental design, with a binary outcome:
1. The naive estimator of the ATE,
$$\hat\delta = \frac{\sum_{i=1}^{n} Y_i D_i}{\sum_{i=1}^{n} D_i} - \frac{\sum_{i=1}^{n} (1 - D_i) Y_i}{\sum_{i=1}^{n} (1 - D_i)},$$
is obtained by (check this!):
▶ Linear regression of Y on (1, D): δ̂ = β̂D
▶ Logit regression of Y on (1, D): δ̂ = Λ(β̂0 + β̂D ) − Λ(β̂0 )
▶ Probit regression of Y on (1, D): δ̂ = Φ(β̂0 + β̂D ) − Φ(β̂0 )
▶ Any MLE based on P(Y = 1|D) = F (β0 + βD D), for F
continuous with continuous inverse: δ̂ = F (β̂0 + β̂D ) − F (β̂0 )
2. If some X also affect the potential outcomes Y (0), Y (1), more efficient
estimators can be considered (see the sketch after this list):
▶ Linear regression on (1, D, X ): δ̃ = β̂D
▶ Logit regression on (1, D, X ) (robust to misspecification):
$$\breve\delta = \frac{1}{n}\sum_{i=1}^{n} \Lambda(\hat\beta_0 + \hat\beta_D + X_i'\hat\beta_X) - \Lambda(\hat\beta_0 + X_i'\hat\beta_X)$$
▶ But estimators derived from a probit or another model for the binary
outcome are not robust to misspecification!
▶ The choice between δ̃ and δ̆ depends on their respective asymptotic
variances (use a robust variance estimator based on the sandwich formula!).
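A minimal sketch of the naive and logit-adjusted (δ̆) estimators in a simulated experiment (illustrative names; D is randomized independently of X and the potential outcomes):

```r
# Sketch: naive vs. logit-adjusted ATE estimators in an experiment.
set.seed(1)
n <- 5000
X <- rnorm(n)
D <- rbinom(n, 1, 0.5)
Y <- rbinom(n, 1, plogis(-0.5 + D + 0.8 * X))

naive <- mean(Y[D == 1]) - mean(Y[D == 0])   # difference in means

fit <- glm(Y ~ D + X, family = binomial)     # logit on (1, D, X)
b   <- coef(fit)
breve <- mean(plogis(b[1] + b[2] + b[3] * X) - plogis(b[1] + b[3] * X))
c(naive = naive, adjusted = breve)
```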
41 / 47
Conditional Average Treatment Effect in Expe.
$$\hat\delta(x) = \frac{1}{n}\sum_{i=1}^{n}\left[\Lambda(\hat\beta_0 + \hat\beta_D + x'\hat\beta_X) - \Lambda(\hat\beta_0 + x'\hat\beta_X)\right]$$
converges to the conditional average treatment effect,
δ(x ) = E (Y (1) − Y (0) | X = x ).
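A minimal sketch of δ̂(x) at a given covariate value x0, from a logit of Y on (1, D, X) (simulated data, illustrative names):

```r
# Sketch: CATE estimate delta_hat(x0) from a logit fit.
set.seed(1)
n <- 5000; X <- rnorm(n); D <- rbinom(n, 1, 0.5)
Y <- rbinom(n, 1, plogis(-0.5 + D + 0.8 * X))
b <- coef(glm(Y ~ D + X, family = binomial))
x0 <- 1
plogis(b["(Intercept)"] + b["D"] + b["X"] * x0) -
  plogis(b["(Intercept)"] + b["X"] * x0)
```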
42 / 47
Outline
Introduction
Potential Outcomes
Application
43 / 47
Example: labor market participation of women
44 / 47
Code R
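A sketch of the kind of estimation involved, assuming the Mroz (1987) labor-force participation data as packaged in carData (variable names from that package, not necessarily those of the course):

```r
# Sketch: logit and probit fits for female labor-force participation,
# using the Mroz data from the 'carData' package (assumed installed);
# lfp = in labor force (yes/no), k5/k618 = numbers of young/older children.
library(carData)
data(Mroz)
logit_fit  <- glm(lfp ~ k5 + k618 + age + wc + lwg + inc,
                  family = binomial(link = "logit"),  data = Mroz)
probit_fit <- glm(lfp ~ k5 + k618 + age + wc + lwg + inc,
                  family = binomial(link = "probit"), data = Mroz)
summary(logit_fit)$coefficients
summary(probit_fit)$coefficients
```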
45 / 47
Results: logit model coefficients
46 / 47
Results: probit model coefficients
47 / 47