Microeconometrics, Chapter 1: Binary Outcome Models
Théophile T. Azomahou
University Clermont Auvergne, CNRS, CERDI
Maastricht University, School of Business and Economics
Email: [email protected]
Also (verify this), the expected value and the variance of Y are:
$$E(Y) = p, \qquad V(Y) = p(1 - p) \tag{2}$$
The two standard binary outcome models, the logit and the probit, specify
different functional forms for this probability as a function of the
regressors. The difference between the two estimators is qualitatively similar
to the use of different functional forms for the conditional mean in
least-squares regression.
OLS regression estimates $E(y_i \mid x_i)$ as a linear function of the
regressors. It ignores the discreteness of the dependent variable and does not
constrain predicted probabilities to lie between zero and one.
$$y_i = \alpha + \ln(\text{price}_{\text{charter},i} / \text{price}_{\text{pier},i}) + \varepsilon_i$$
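The point about unconstrained predictions can be checked on simulated data. The sketch below is only an illustration: the data-generating process, the sample size and the seed are assumptions, not part of the fishing-mode example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative only): one regressor, logistic true probability.
n = 500
x = rng.uniform(-3.0, 3.0, size=n)
p_true = 1.0 / (1.0 + np.exp(-2.0 * x))
y = (rng.uniform(size=n) < p_true).astype(float)

# OLS of y on a constant and x: the linear probability model.
X = np.column_stack([np.ones(n), x])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_ols

# Share of fitted "probabilities" that fall outside the unit interval.
share_outside = np.mean((fitted < 0.0) | (fitted > 1.0))
print("OLS coefficients:", beta_ols)
print("fitted range:", fitted.min(), fitted.max())
print("share of fitted values outside [0, 1]:", share_outside)
```

Because the fitted line is unbounded, observations with large or small x receive fitted values above one or below zero.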
Assume:
$$y_i = \begin{cases} 1 & \text{with probability } p, \\ 0 & \text{with probability } 1 - p \end{cases}$$
A regression model is formed by parameterizing the probability p to depend on a
regressor vector x and a K × 1 parameter vector β. The commonly used models
are of single-index form with conditional probability given by:
$$p_i \equiv P(y_i = 1 \mid x_i) = F(x_i'\beta) \tag{3}$$
[Figure: F(β'x) plotted against β'x; vertical axis from 0.00 to 1.00, horizontal axis from −30 to 30.]
where φ(z) and Φ(z) denote the normal density and the CDF of the normal
distribution respectively:
$$\phi(z) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{z - \bar{z}}{\sigma} \right)^2 \right\} \quad \text{(normal density)}$$
$$f(z) = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{4}\,\mathrm{sech}^2\!\left(\frac{z}{2}\right) \quad \text{(logistic density, of hyperbolic-secant-squared form)}$$
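A quick numerical check of the two densities may help. The sketch below assumes the standard logistic density $e^{-z}/(1+e^{-z})^2 = \Lambda(z)(1-\Lambda(z))$ and compares it with a finite-difference derivative of the logistic CDF and with $\tfrac{1}{4}\mathrm{sech}^2(z/2)$. The function names are illustrative.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_pdf(z):
    # Standard logistic density: derivative of the logistic CDF.
    return math.exp(-z) / (1.0 + math.exp(-z)) ** 2

def normal_pdf(z, mean=0.0, sd=1.0):
    # Normal density with mean z_bar (here `mean`) and standard deviation sigma (here `sd`).
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

h = 1e-6
for z in (-2.0, 0.0, 1.5):
    slope = (logistic_cdf(z + h) - logistic_cdf(z - h)) / (2.0 * h)  # numerical derivative
    sech2 = 0.25 / math.cosh(z / 2.0) ** 2                           # (1/4) sech^2(z/2)
    print(z, logistic_pdf(z), slope, sech2, normal_pdf(z))
```

The three logistic columns should agree up to finite-difference error, while the normal density is higher near zero and has thinner tails.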
Given independence over i and model (3) for pi , the log-likelihood function is
$$\mathcal{L}(y_i, x_i, \beta) = \sum_{i=1}^{N} \left[ y_i \ln F(x_i'\beta) + (1 - y_i) \ln\left(1 - F(x_i'\beta)\right) \right] \tag{6}$$
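Differentiating with respect to β gives the first-order conditions:
$$\frac{\partial \mathcal{L}(y_i, x_i, \beta)}{\partial \beta} = \sum_{i=1}^{N} \left[ \frac{y_i}{F_i} F_i' x_i - \frac{1 - y_i}{1 - F_i} F_i' x_i \right] = 0 \tag{7}$$
where $F_i = F(x_i'\beta)$ and $F_i' = F'(x_i'\beta)$.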
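The first-order conditions have no closed-form solution, so the ML estimate is found iteratively. The sketch below is a minimal Newton-Raphson routine for the logit case, where $F' = F(1 - F)$ and the score reduces to $\sum_i (y_i - p_i) x_i$. The simulated data, the true coefficients and the function names are assumptions made for illustration.

```python
import numpy as np

def logit_loglik(beta, X, y):
    """Bernoulli log-likelihood with F the logistic CDF."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def logit_score(beta, X, y):
    """Score vector; for the logit F' = F(1 - F), so it reduces to X'(y - p)."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ (y - p)

def fit_logit(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson iterations on the first-order conditions."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = -(X * (p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hessian, logit_score(beta, X, y))
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data (illustrative only).
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

beta_hat = fit_logit(X, y)
print("estimate:", beta_hat)
print("score at estimate:", logit_score(beta_hat, X, y))
print("log-likelihood:", logit_loglik(beta_hat, X, y))
```

At convergence the score is numerically zero, which is the sample counterpart of the first-order conditions.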
Consistency. $\hat{\beta}_{ML}$ is consistent if the conditional density of y given x is
correctly specified (Bernoulli), that is, if $p_i \equiv F(x_i'\beta)$. For binary data, we
have $E(y) = 1 \times p + 0 \times (1 - p) = p$. This implies that
$E(y_i \mid x_i) = F(x_i'\beta)$. This also implies that the left-hand side of (8) has
expected value zero, the essential condition for consistency.
$$\widehat{V}(\hat{\beta}_{ML}) = \left( \frac{1}{N} \sum_{i=1}^{N} \frac{1}{F(x_i'\hat{\beta})\left(1 - F(x_i'\hat{\beta})\right)} F'(x_i'\hat{\beta})^2 \, x_i x_i' \right)^{-1} \tag{10}$$
$$\frac{\partial P(y_i = 1 \mid x_i)}{\partial x_{ik}} = F'(x_i'\beta)\beta_k \tag{11}$$
where $F'(z) = \partial F(z)/\partial z$.
Remarks
The marginal effects differ with the point of evaluation x i as for any
nonlinear model, and differ with different choices of F (·). Usually the
marginal effect is evaluated at the sample means of the data.
When the marginal effect is evaluated at every observation and then the
sample average of individual marginal effects is used, we get the Average
Partial Effects (APE).
The sign of the coefficient gives the sign of the marginal effect, since
$F'(\cdot) > 0$.
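The two evaluation points in these remarks can be compared directly. The sketch below uses a logit with hypothetical coefficients and simulated regressors (both are assumptions) and computes the marginal effect at the sample means and the average partial effect.

```python
import numpy as np

def logistic_pdf(z):
    p = 1.0 / (1.0 + np.exp(-z))
    return p * (1.0 - p)

# Simulated regressors and hypothetical coefficients (first entry is the intercept).
rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.uniform(size=n)])
beta = np.array([0.2, 0.8, -1.5])

# Marginal effect at the sample means: F'(x_bar' beta) * beta_k.
x_bar = X.mean(axis=0)
me_at_means = logistic_pdf(x_bar @ beta) * beta[1:]

# Average partial effect: mean over i of F'(x_i' beta) * beta_k.
ape = logistic_pdf(X @ beta).mean() * beta[1:]

print("marginal effects at means:", me_at_means)
print("average partial effects:  ", ape)
```

Both vectors carry the signs of the coefficients, and they differ only through the scale factor, which is the density evaluated at the mean index in one case and the mean of the density in the other.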
Interpretation
i) Marginal effects are additive approximations of effects in non-additive
models. For example, a marginal effect of 0.07 means that a one-unit
increase in the regressor raises the probability by approximately 7
percentage points.
ii) The marginal effect tells by how many units the probability changes if
the explanatory variable changes by 1 unit. When the probability is
expressed in percent, the marginal effect is measured in percentage points.
Moreover, the average in-sample predicted probability equals the sample
frequency of $y_i$ (provided the model includes an intercept):
$$\frac{1}{N} \sum_i \Lambda(x_i'\hat{\beta}) = \bar{y}$$
The marginal effects for the logit model can be fairly easily obtained from
the coefficients, since:
$$\frac{\partial p_i}{\partial x_{ik}} = p_i(1 - p_i)\beta_k \tag{15}$$
where $p_i = \Lambda(x_i'\beta)$. The marginal effect gives the change in the probability
that y = 1 for a marginal change in $x_{ik}$.
Odds ratio or relative risk. A very common interpretation of the
coefficients is in terms of marginal effects on the odds ratio rather than on
the probability. For the logit model:
$$p = \frac{e^{x'\beta}}{1 + e^{x'\beta}} \implies \frac{p}{1 - p} = e^{x'\beta} \tag{16}$$
$$\ln\frac{p}{1 - p} = x'\beta \tag{17}$$
$p/(1 - p)$ measures the probability that y = 1 relative to the probability that
y = 0 and is called the odds ratio or relative risk.
Example: Effects on the odds ratio are multiplicative. A multiplier of 1.07
corresponds to an increase of 7%. A logit model slope parameter of 0.1
means that a one-unit increase in the regressor multiplies the odds ratio by
$e^{0.1} \approx 1.105$, an increase of roughly 10%.
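The multiplicative reading can be verified with a short calculation. In the sketch below the slope 0.1 comes from the example, while the starting index value 0.3 is a hypothetical choice. The resulting multiplier does not depend on it.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

beta_k = 0.1   # slope parameter from the example
index = 0.3    # hypothetical value of x'beta before the change

odds_before = odds(logistic_cdf(index))
odds_after = odds(logistic_cdf(index + beta_k))   # regressor raised by one unit
multiplier = odds_after / odds_before

print("odds multiplier:", multiplier)
print("exp(beta_k):    ", math.exp(beta_k))
```

Since $p/(1-p) = e^{x'\beta}$ in the logit model, raising $x_k$ by one unit multiplies the odds by $e^{\beta_k}$, whatever the starting point.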
Théophile T. Azomahou (CERDI) February 20-28, 2020 18 / 42
Binary Outcomes Models
where, unlike in the logit, the weight $w_i = \phi(x_i'\beta)/[\Phi(x_i'\beta)(1 - \Phi(x_i'\beta))]$ varies
across observations. The probit model marginal effects are
$$\frac{\partial p_i}{\partial x_{ik}} = \phi(x_i'\beta)\beta_k \tag{21}$$
with no further simplification.
Exercise 3: Compute the asymptotic variance of $\hat{\beta}_{ML}$ for the probit specification.
$$y^* = x'\beta + u \tag{25}$$
$$y^* = x'\beta + u, \qquad y = 1[y^* > 0] \tag{27}$$
where 1[·] denotes the indicator function, which is 1 if the condition is fulfilled and
0 if not. Given (27),
$$P(y = 1 \mid x) = P(x'\beta + u > 0) = P(-u < x'\beta) = F(x'\beta)$$
where F is the CDF of −u, which equals the CDF of u in the usual case of a
density symmetric around 0.
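The latent-variable formulation can be illustrated by simulation. The sketch below assumes standard normal errors (the probit case) and hypothetical values for x and β. It compares the simulated frequency of y = 1 with $\Phi(x'\beta)$.

```python
import math
import numpy as np

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

rng = np.random.default_rng(3)
x = np.array([1.0, 0.5])       # hypothetical regressor vector (constant, one covariate)
beta = np.array([-0.2, 0.9])   # hypothetical coefficients
index = float(x @ beta)

u = rng.normal(size=200_000)   # standard normal errors give the probit model
y_star = index + u             # latent variable
y = (y_star > 0).astype(float) # observed binary outcome

print("simulated P(y = 1 | x):", y.mean())
print("Phi(x'beta):           ", normal_cdf(index))
```

Replacing the normal draws with logistic draws would reproduce the logit probability in the same way.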
4.2 Endogeneity
4.2.1 Continuous endogenous explanatory variable
We assume that at least one continuous explanatory variable is correlated with
the error term in the latent equation. One option is to estimate a linear
probability model by 2SLS (with instrumental variables, IV).
Instead, consider the case of estimating a latent probit model:
$$y_1^* = z_1\delta_1 + \alpha_1 y_2 + u_1 \tag{30}$$
$$y_2 = z_1\delta_{21} + z_2\delta_{22} + v_2 = z\delta_2 + v_2 \tag{31}$$
$$y_1 = 1[y_1^* > 0] \tag{32}$$
where the pair of error terms $(u_1, v_2)$ has zero mean, follows a bivariate normal
distribution and is independent of z. Equations (30) and (32) are the structural
equations. Equation (31) is the reduced form for $y_2$, which is endogenous if $u_1$
and $v_2$ are correlated.
If $u_1$ and $v_2$ are independent, there is no endogeneity issue. Because $v_2$ is
normally distributed, $y_2 \mid z$ is normal and thus $y_2$ has the characteristics of a
normal random variable (for example, $y_2$ should not be discrete). The model is valid
when $y_2$ is correlated with $u_1$ because of omitted variables or measurement errors.
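It follows that:
$$P(y_1 = 1 \mid z, y_2, v_2) = \Phi\left( \frac{z_1\delta_1 + \alpha_1 y_2 + \theta_1 v_2}{\sqrt{1 - \rho_1^2}} \right) \tag{35}$$
The probit estimation provides consistent estimates of $\delta_{\rho 1} \equiv \delta_1/\sqrt{1 - \rho_1^2}$,
$\alpha_{\rho 1} \equiv \alpha_1/\sqrt{1 - \rho_1^2}$ and $\theta_{\rho 1} \equiv \theta_1/\sqrt{1 - \rho_1^2}$. Since $\delta_2$ is unknown, we must
first estimate it. This leads to the two-step estimation algorithm:
i) Regress $y_2$ on z by OLS and save the residuals $\hat{v}_2$.
ii) Estimate a probit of $y_1$ on $z_1$, $y_2$ and $\hat{v}_2$.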
Endogeneity test. The usual probit t statistic on v̂2 is a valid test of the null
hypothesis that y2 is exogenous.
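A two-step procedure of this kind (OLS of $y_2$ on z, then a probit of $y_1$ on $z_1$, $y_2$ and the first-stage residual) can be sketched on simulated data. Everything below is an assumption made for illustration: the data-generating values, the function names, and the use of Fisher scoring for the probit. The second-step standard errors ignore the fact that $\hat{v}_2$ is estimated, which does not affect the test under the null of exogeneity.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def fit_probit(X, y, max_iter=100, tol=1e-9):
    """Probit MLE by Fisher scoring; returns estimates and inverse information."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        xb = X @ beta
        P = np.clip(norm_cdf(xb), 1e-10, 1.0 - 1e-10)
        w = norm_pdf(xb) / (P * (1.0 - P))           # weight phi / [Phi (1 - Phi)]
        info = (X * (w * norm_pdf(xb))[:, None]).T @ X
        step = np.linalg.solve(info, X.T @ (w * (y - P)))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta, np.linalg.inv(info)                 # information from the last iteration

# Simulated data (illustrative only): rho is the correlation between u1 and v2.
rng = np.random.default_rng(4)
n, rho = 5000, 0.5
z1, z2 = rng.normal(size=n), rng.normal(size=n)      # z2 is the excluded instrument
v2 = rng.normal(size=n)
u1 = rho * v2 + math.sqrt(1.0 - rho ** 2) * rng.normal(size=n)
y2 = 0.5 * z1 + 1.0 * z2 + v2                        # reduced form for y2
y1 = (0.3 * z1 + 0.7 * y2 + u1 > 0).astype(float)    # latent equation with alpha_1 = 0.7

# Step 1: OLS of y2 on z, keep the residuals.
Z = np.column_stack([np.ones(n), z1, z2])
delta2_hat, *_ = np.linalg.lstsq(Z, y2, rcond=None)
v2_hat = y2 - Z @ delta2_hat

# Step 2: probit of y1 on (1, z1, y2, v2_hat).
X = np.column_stack([np.ones(n), z1, y2, v2_hat])
coef, cov = fit_probit(X, y1)
t_v2 = coef[3] / math.sqrt(cov[3, 3])                # t statistic on v2_hat

print("scaled coefficient on y2:", coef[2], "target:", 0.7 / math.sqrt(1.0 - rho ** 2))
print("t statistic on v2_hat:", t_v2)
```

The coefficient on $y_2$ estimates the scaled parameter $\alpha_1/\sqrt{1 - \rho_1^2}$ rather than $\alpha_1$ itself, and a large t statistic on the residual signals endogeneity.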
$$y_i^* = x_i'\beta + \gamma\omega_i + \varepsilon_i \tag{39}$$
$$y_i = 1[y_i^* > 0] \tag{40}$$
$$E(\varepsilon_i \mid \omega_i) = g(\omega_i) \neq 0 \tag{41}$$
However, $y_i^*$ is not observed, but $y_i$ is. One approach, used by Bertschek and
Lechner (1998), is to assume that $\zeta_i$ is orthogonal to the residual
$[y_i - \Phi(x_i'\beta + \gamma\omega_i)]$.
One also needs results for truncated distributions. Indeed, if $v_2$ has a standard
normal distribution and is independent of z, then the density of $v_2$ given
$v_2 > -z\delta_2$ (meaning $y_2 = 1$) is:
$$\frac{\phi(v_2)}{P(v_2 > -z\delta_2)} = \frac{\phi(v_2)}{\Phi(z\delta_2)}$$
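We can establish:
$$P_{11} \equiv P(y_1 = 1 \mid y_2 = 1, z) = E\left[ P(y_1 = 1 \mid v_2, z) \mid y_2 = 1, z \right]$$
$$= E\left[ \Phi\left( \frac{z_1\delta_1 + \alpha_1 y_2 + \rho_1 v_2}{\sqrt{1 - \rho_1^2}} \right) \Bigm| y_2 = 1, z \right] \tag{45}$$
$$= \frac{1}{\Phi(z\delta_2)} \int_{-z\delta_2}^{\infty} \Phi\left( \frac{z_1\delta_1 + \alpha_1 y_2 + \rho_1 v_2}{\sqrt{1 - \rho_1^2}} \right) \phi(v_2)\, dv_2$$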
Observe that P(y1 = 0|y2 = 1, z) is 1 − P(y1 = 1|y2 = 1, z).
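Similarly, we have:
$$P_{10} \equiv P(y_1 = 1 \mid y_2 = 0, z) = \frac{1}{1 - \Phi(z\delta_2)} \int_{-\infty}^{-z\delta_2} \Phi\left( \frac{z_1\delta_1 + \alpha_1 y_2 + \rho_1 v_2}{\sqrt{1 - \rho_1^2}} \right) \phi(v_2)\, dv_2 \tag{46}$$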
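These integrals have no closed form but are easy to evaluate numerically. The sketch below uses a trapezoidal rule for both conditional probabilities and compares the results with frequencies from a simulated bivariate normal model. The parameter values, the integration bound and the function names are assumptions made for illustration.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))

def norm_pdf(z):
    return np.exp(-0.5 * np.asarray(z, dtype=float) ** 2) / math.sqrt(2.0 * math.pi)

def p_y1_given_y2(z1d1, alpha1, zd2, rho1, y2, n_grid=20001, bound=8.0):
    """P(y1 = 1 | y2, z) by the trapezoidal rule over the truncated range of v2."""
    if y2 == 1:
        v = np.linspace(-zd2, bound, n_grid)     # v2 > -z*delta2
        norm_const = norm_cdf(zd2)
    else:
        v = np.linspace(-bound, -zd2, n_grid)    # v2 <= -z*delta2
        norm_const = 1.0 - norm_cdf(zd2)
    inner = norm_cdf((z1d1 + alpha1 * y2 + rho1 * v) / math.sqrt(1.0 - rho1 ** 2))
    integrand = inner * norm_pdf(v)
    h = v[1] - v[0]
    integral = h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return float(integral / norm_const)

# Hypothetical index values z1*delta1, z*delta2 and parameters alpha1, rho1.
z1d1, alpha1, zd2, rho1 = 0.2, 0.6, 0.4, 0.5
p11 = p_y1_given_y2(z1d1, alpha1, zd2, rho1, y2=1)
p10 = p_y1_given_y2(z1d1, alpha1, zd2, rho1, y2=0)

# Monte Carlo check with (u1, v2) standard bivariate normal, correlation rho1.
rng = np.random.default_rng(5)
n = 400_000
v2 = rng.normal(size=n)
u1 = rho1 * v2 + math.sqrt(1.0 - rho1 ** 2) * rng.normal(size=n)
y2 = (zd2 + v2 > 0).astype(float)
y1 = (z1d1 + alpha1 * y2 + u1 > 0).astype(float)
p11_sim = y1[y2 == 1].mean()
p10_sim = y1[y2 == 0].mean()

print("P11 quadrature vs simulation:", p11, p11_sim)
print("P10 quadrature vs simulation:", p10, p10_sim)
```

In a full maximum likelihood routine these probabilities would be evaluated for every observation at each trial parameter value, which is part of what makes the maximization demanding.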
Combining the four possible outcomes of $(y_1, y_2)$ in the probit specification for $y_2$,
the log-likelihood function to maximize (a nontrivial task!) is:
$$\ln L(\delta_1, \alpha_1, \delta_2, \rho_1) = \sum_{i=1}^{N} \left[ y_{1i} y_{2i} \ln P_{11i} + (1 - y_{1i}) y_{2i} \ln P_{01i} + y_{1i}(1 - y_{2i}) \ln P_{10i} + (1 - y_{1i})(1 - y_{2i}) \ln P_{00i} \right] \tag{47}$$
where
5. Goodness of fit
Because the hypothesis that all slopes in the model are zero is often
interesting, report the log-likelihood computed with only a constant term
(ln L0 ), and the log-likelihood of the full specification (ln L).
McFadden’s (1974) likelihood ratio index:
$$R^2 = 1 - \frac{\ln L}{\ln L_0}$$
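The index is straightforward to compute once both log-likelihoods are available. In the sketch below the outcomes and fitted probabilities are hypothetical numbers chosen for illustration, not estimates from a fitted model. With only a constant, the fitted probability is the sample mean of y.

```python
import math

def loglik_constant_only(y):
    """ln L0: with only a constant, the fitted probability is the sample mean."""
    n = len(y)
    ybar = sum(y) / n
    return n * (ybar * math.log(ybar) + (1.0 - ybar) * math.log(1.0 - ybar))

def bernoulli_loglik(y, p):
    """ln L for given fitted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi) for yi, pi in zip(y, p))

def mcfadden_r2(lnL, lnL0):
    return 1.0 - lnL / lnL0

y = [1, 0, 1, 1, 0, 0, 1, 0]                       # hypothetical outcomes
p_hat = [0.8, 0.3, 0.7, 0.6, 0.2, 0.4, 0.9, 0.1]   # hypothetical fitted probabilities

lnL0 = loglik_constant_only(y)
lnL = bernoulli_loglik(y, p_hat)
r2 = mcfadden_r2(lnL, lnL0)
print("ln L0:", lnL0, "ln L:", lnL, "likelihood ratio index:", r2)
```

The index is 0 when the regressors add nothing to the constant-only fit and approaches 1 as the fitted probabilities approach the observed outcomes.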
$\hat{y} = 1$ if $\hat{F} > F^*$
6. Semiparametric estimation
Estimation under weaker assumptions
Semiparametric conditional mean estimation
E (yi |x i ) = G (x i )
where G (x i ) is unknown. Kernel regression can do the job. See Figure (1).
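A kernel regression of this kind can be sketched with a Nadaraya-Watson estimator. The Gaussian kernel, the bandwidth of 0.4 and the simulated data below are assumptions made for illustration. In practice the bandwidth would be chosen by cross-validation or a rule of thumb.

```python
import numpy as np

def kernel_regression(x_train, y_train, x_eval, bandwidth):
    """Nadaraya-Watson estimate of E(y | x) with a Gaussian kernel."""
    u = (x_eval[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * u ** 2)
    return (weights @ y_train) / weights.sum(axis=1)

# Simulated binary data (illustrative only).
rng = np.random.default_rng(6)
n = 3000
x = rng.uniform(-3.0, 3.0, size=n)
G_true = 1.0 / (1.0 + np.exp(-1.5 * x))   # true conditional mean, unknown to the estimator
y = (rng.uniform(size=n) < G_true).astype(float)

grid = np.linspace(-2.0, 2.0, 9)
G_hat = kernel_regression(x, y, grid, bandwidth=0.4)
G_grid = 1.0 / (1.0 + np.exp(-1.5 * grid))

for g, est, truth in zip(grid, G_hat, G_grid):
    print(f"x = {g:5.2f}  G_hat = {est:.3f}  G = {truth:.3f}")
```

Because the estimate is a weighted average of zeros and ones, it stays inside the unit interval without any functional-form assumption on G.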
7. Exercise
Data: South African Labour Force Survey (LFS) from 2001 to 2004. It is a
twice-yearly, nationwide household panel survey that focuses on labour
market issues and also collects information on demographic characteristics
and education. We consider individuals aged 15 or more who are employed at
the time of the survey.
The employees are fully informal if the place in which they work does not
comply with any of these criteria. They are considered partially informal
when the place in which they work complies with only one of these
criteria. When both criteria are met, the employees are considered formal.