NOVA IMS Information Management School
1 - Binary Dependent Variable Models
Econometrics II
Bruno Damásio
@bmpdamasio
Carolina Vasconcelos
@vasconceloscm
2022/2023
Nova Information Management School
NOVA University of Lisbon
Instituto Superior de Estatística e Gestão da Informação
Universidade Nova de Lisboa
Context
Example
The dependent variable can indicate:
Linear Probability Model
P(y = 1|x) = β0 + β1 x1 + · · · + βk xk ,
which says that the probability of success, p(x) = P(y = 1|x), is
a linear function of the xj .
Interpreting
library(wooldridge)
summary(lm(inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6, data = mroz))
##
## Call:
## lm(formula = inlf ~ nwifeinc + educ + exper + expersq + age +
## kidslt6 + kidsge6, data = mroz)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.93432 -0.37526 0.08833 0.34404 0.99417
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5855192 0.1541780 3.798 0.000158 ***
## nwifeinc -0.0034052 0.0014485 -2.351 0.018991 *
## educ 0.0379953 0.0073760 5.151 3.32e-07 ***
## exper 0.0394924 0.0056727 6.962 7.38e-12 ***
## expersq -0.0005963 0.0001848 -3.227 0.001306 **
## age -0.0160908 0.0024847 -6.476 1.71e-10 ***
## kidslt6 -0.2618105 0.0335058 -7.814 1.89e-14 ***
## kidsge6 0.0130122 0.0131960 0.986 0.324415
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4271 on 745 degrees of freedom
## Multiple R-squared: 0.2642, Adjusted R-squared: 0.2573
## F-statistic: 38.22 on 7 and 745 DF, p-value: < 2.2e-16
Estimating probabilities
## [1] 0.3090346
## [1] -0.3292861
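The negative value above illustrates that an estimated "probability" from the linear probability model can fall outside the unit interval. The sketch below shows the same behaviour on simulated data; the data-generating process, the variable names (x, y, lpm, p_hat) and the evaluation point x = -2 are assumptions made for illustration and are not taken from the mroz example.

```r
# Simulated data (an assumption for illustration; not the mroz sample)
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = pnorm(2 * x))  # true P(y = 1 | x) is nonlinear in x

# Linear probability model: OLS regression of the binary outcome on x
lpm <- lm(y ~ x)

# Fitted values are the estimated probabilities P(y = 1 | x)
p_hat <- fitted(lpm)
range(p_hat)

# Share of estimated probabilities that fall outside [0, 1]
share_outside <- mean(p_hat < 0 | p_hat > 1)
share_outside

# Estimated probability at a chosen (hypothetical) regressor value
predict(lpm, newdata = data.frame(x = -2))
```

Because the fitted line is unbounded, observations with extreme regressor values receive estimates below zero or above one.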
Main Issues
Because y is binary, the linear probability model violates one of the
Gauss-Markov assumptions: homoskedasticity. When y is a binary variable, its
variance, conditional on x, is Var(y|x) = p(x)[1 − p(x)], which depends on x
unless p(x) is constant.
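The conditional variance of a binary variable can be derived in two lines. The sketch below uses only the fact that y² = y when y takes the values 0 and 1, together with p(x) = P(y = 1|x) = E(y|x) as defined earlier:

```latex
\begin{aligned}
\operatorname{Var}(y \mid x) &= E(y^2 \mid x) - [E(y \mid x)]^2 \\
                             &= p(x) - p(x)^2 \\
                             &= p(x)\,[1 - p(x)] .
\end{aligned}
```

Since p(x) varies with x, so does the variance; the error term is therefore heteroskedastic, and the usual OLS standard errors are not valid without a correction.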
Limited Dependent Variable Models
Logit and Probit Models
Logit Models
G(z) = exp(z)/[1 + exp(z)],
which is between zero and one for all real numbers z. This is the
cumulative distribution function (cdf) of a standard logistic random
variable.
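As a quick numerical check, the logistic cdf can be evaluated on a grid and compared with base R's plogis(); the standard normal cdf pnorm(), which plays the same role in the probit model, is shown alongside. The grid z is an arbitrary choice made for illustration.

```r
# Grid of index values (arbitrary, for illustration)
z <- seq(-4, 4, by = 0.5)

# Logistic cdf written out explicitly
G_logit <- exp(z) / (1 + exp(z))

# Base R provides the same function as plogis()
all.equal(G_logit, plogis(z))

# Standard normal cdf, the link used by the probit model
G_probit <- pnorm(z)

# Both functions are strictly increasing and stay inside (0, 1)
round(cbind(z, logit = G_logit, probit = G_probit), 3)
```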
Probit Models
MLE estimator
• Up until now, we have had little need for MLE, although we did note
that, under the classical linear model assumptions, the OLS
estimator is the maximum likelihood estimator (conditional on the
explanatory variables);
• Because of the nonlinear nature of E[y|x], OLS and WLS are not
applicable;
• For estimating limited dependent variable models, maximum
likelihood methods are indispensable.
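For reference, the standard log-likelihood for a binary response model can be written as follows. This is a sketch that assumes a random sample of size n and uses G(·) for the logistic or standard normal cdf:

```latex
\ell_i(\beta) = y_i \log G(x_i'\beta) + (1 - y_i)\,\log\!\left[1 - G(x_i'\beta)\right],
\qquad
\mathcal{L}(\beta) = \sum_{i=1}^{n} \ell_i(\beta),
\qquad
\hat{\beta} = \arg\max_{\beta}\, \mathcal{L}(\beta).
```

There is no closed-form solution for the maximizer, so it is found numerically.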
MLE estimator - Example
##
## Call: glm(formula = y ~ x1, family = binomial(link = "probit"))
##
## Coefficients:
## (Intercept) x1
## -0.0316 1.4416
##
## Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
## Null Deviance: 1386
## Residual Deviance: 1048 AIC: 1052
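To make the estimation step concrete, the sketch below codes the probit log-likelihood by hand, maximizes it with optim(), and compares the result with glm(). The simulated data, the seed, and the names negloglik, fit_optim and fit_glm are assumptions made for illustration; the sketch does not attempt to reproduce the numbers in the output above.

```r
# Simulated probit data (assumed data-generating process, for illustration)
set.seed(123)
n  <- 1000
x1 <- rnorm(n)
y  <- as.integer(0 + 1.5 * x1 + rnorm(n) > 0)

# Negative probit log-likelihood:
# -sum( y * log(Phi(xb)) + (1 - y) * log(1 - Phi(xb)) )
negloglik <- function(b) {
  xb <- b[1] + b[2] * x1
  -sum(y * pnorm(xb, log.p = TRUE) + (1 - y) * pnorm(-xb, log.p = TRUE))
}

# Numerical maximization of the log-likelihood (minimize its negative)
fit_optim <- optim(par = c(0, 0), fn = negloglik, method = "BFGS")

# Same model estimated with glm()
fit_glm <- glm(y ~ x1, family = binomial(link = "probit"))

# The two sets of estimates should agree up to numerical tolerance
cbind(optim = fit_optim$par, glm = coef(fit_glm))
```

The term pnorm(-xb, log.p = TRUE) is log[1 − Φ(xb)] computed in a numerically stable way.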
Logit and Probit Models
$$\frac{\partial p(x)}{\partial x_j} = g(x_i'\beta)\,\beta_j, \qquad (10)$$
where $g(z) \equiv \frac{\partial G}{\partial z}(z)$.
• Because G is the cdf of a continuous random variable, g is a
probability density function (pdf).
• In the logit and probit cases, G(·) is a strictly increasing cdf, and so
g(z) > 0 for all z.
• Therefore, the partial effect of xj on p(x) depends on x through the
positive quantity g(x′i β), which means that the partial effect always
has the same sign as βj .
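The partial-effect formula in (10) can be checked numerically. The sketch below fits a logit model to simulated data, computes g(x′β)βj at one evaluation point with dlogis(), and compares the result for x1 with a finite-difference derivative of the predicted probability. The data, the evaluation point x_pt and the step size h are assumptions made for illustration.

```r
# Simulated logit data (assumed, for illustration)
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.5 + 1.0 * x1 - 0.8 * x2))

fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"))
b   <- coef(fit)

# Hypothetical evaluation point: (intercept, x1, x2)
x_pt <- c(1, 0.3, -0.2)
xb   <- sum(b * x_pt)

# Analytic partial effects: g(x'b) * b_j, with g the logistic pdf
pe_analytic <- dlogis(xb) * b[-1]

# Finite-difference derivative of p(x) with respect to x1
h    <- 1e-6
x_up <- x_pt
x_up[2] <- x_up[2] + h
pe_numeric_x1 <- (plogis(sum(b * x_up)) - plogis(xb)) / h

pe_analytic
pe_numeric_x1
```

Because dlogis(xb) is strictly positive, each partial effect carries the sign of its coefficient.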
Partial effects
Partial effect at the average (PEA)
library(mfx)
logitmfx(inlf ~ nwifeinc + educ + exper +
I(exper^2) + age + kidslt6 + kidsge6, data = mroz)
## Call:
## logitmfx(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) +
## age + kidslt6 + kidsge6, data = mroz)
##
## Marginal Effects:
## dF/dx Std. Err. z P>|z|
## nwifeinc -0.00519005 0.00204820 -2.5340 0.011278 *
## educ 0.05377731 0.01056074 5.0922 3.539e-07 ***
## exper 0.05005693 0.00782462 6.3974 1.581e-10 ***
## I(exper^2) -0.00076692 0.00024768 -3.0965 0.001959 **
## age -0.02140302 0.00353973 -6.0465 1.480e-09 ***
## kidslt6 -0.35094982 0.04963897 -7.0700 1.549e-12 ***
## kidsge6 0.01461621 0.01818832 0.8036 0.421625
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Call:
## probitmfx(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) +
## age + kidslt6 + kidsge6, data = mroz)
##
## Marginal Effects:
## dF/dx Std. Err. z P>|z|
## nwifeinc -0.00469619 0.00192965 -2.4337 0.014945 *
## educ 0.05112843 0.00992310 5.1525 2.571e-07 ***
## exper 0.04817690 0.00734505 6.5591 5.413e-11 ***
## I(exper^2) -0.00073705 0.00023464 -3.1412 0.001683 **
## age -0.02064309 0.00330485 -6.2463 4.203e-10 ***
## kidslt6 -0.33914996 0.04634765 -7.3175 2.526e-13 ***
## kidsge6 0.01406306 0.01719895 0.8177 0.413546
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
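The partial effect at the average can also be computed by hand: evaluate the density g at the sample means of the regressors and scale each coefficient by it. The sketch below does this for a logit and a probit fit on simulated data; the data and the object names (logit, probit, x_bar) are assumptions made for illustration and do not reproduce the mroz results above.

```r
# Simulated data (assumed, for illustration)
set.seed(7)
n  <- 800
x1 <- rnorm(n, mean = 1)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 - 1.2 * x2))

logit  <- glm(y ~ x1 + x2, family = binomial(link = "logit"))
probit <- glm(y ~ x1 + x2, family = binomial(link = "probit"))

# Sample means of the regressors (the first element is 1 for the intercept)
x_bar <- colMeans(model.matrix(logit))

# PEA_j = g(x_bar' b) * b_j, with g the logistic or standard normal pdf
pea_logit  <- dlogis(sum(coef(logit)  * x_bar)) * coef(logit)[-1]
pea_probit <- dnorm(sum(coef(probit) * x_bar)) * coef(probit)[-1]

rbind(logit = pea_logit, probit = pea_probit)
```

Although logit and probit coefficients are on different scales, the resulting partial effects are directly comparable.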
Average partial effect (APE)
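The average partial effect of a continuous xj is
$$\widehat{APE}_j = \Big[ n^{-1} \sum_{i=1}^{n} g(x_i'\hat{\beta}) \Big] \hat{\beta}_j ,$$
which results from averaging the individual partial effects across the
sample.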
## Call:
## logitmfx(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) +
## age + kidslt6 + kidsge6, data = mroz, atmean = FALSE)
##
## Marginal Effects:
## dF/dx Std. Err. z P>|z|
## nwifeinc -0.00381181 0.00153898 -2.4769 0.013255 *
## educ 0.03949652 0.00846811 4.6641 3.099e-06 ***
## exper 0.03676411 0.00655577 5.6079 2.048e-08 ***
## I(exper^2) -0.00056326 0.00018795 -2.9968 0.002728 **
## age -0.01571936 0.00293269 -5.3600 8.320e-08 ***
## kidslt6 -0.25775366 0.04263493 -6.0456 1.489e-09 ***
## kidsge6 0.01073482 0.01339130 0.8016 0.422769
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Call:
## probitmfx(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) +
## age + kidslt6 + kidsge6, data = mroz, atmean = FALSE)
##
## Marginal Effects:
## dF/dx Std. Err. z P>|z|
## nwifeinc -0.00361618 0.00146972 -2.4604 0.013876 *
## educ 0.03937009 0.00726571 5.4186 6.006e-08 ***
## exper 0.03709734 0.00516823 7.1780 7.076e-13 ***
## I(exper^2) -0.00056755 0.00017708 -3.2050 0.001351 **
## age -0.01589566 0.00235868 -6.7392 1.592e-11 ***
## kidslt6 -0.26115346 0.03190239 -8.1860 2.700e-16 ***
## kidsge6 0.01082889 0.01322413 0.8189 0.412859
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
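The average partial effect can be reproduced by hand by computing the partial effect for every observation and then averaging. The sketch below does this for a logit fit on simulated data; the data and the names ind_pe, ape and ape_short are assumptions made for illustration.

```r
# Simulated data (assumed, for illustration)
set.seed(7)
n  <- 800
x1 <- rnorm(n, mean = 1)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 - 1.2 * x2))

fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"))

# Linear index x_i' b for every observation
xb <- predict(fit, type = "link")

# Individual partial effects: n x 2 matrix with entries g(x_i' b) * b_j
ind_pe <- outer(dlogis(xb), coef(fit)[-1])

# APE: average the individual effects over the sample
ape <- colMeans(ind_pe)

# Equivalent shortcut: average scale factor times the coefficients
ape_short <- mean(dlogis(xb)) * coef(fit)[-1]

rbind(ape, ape_short)
```

The only difference from the PEA is the order of operations: the APE averages g(·) over the observations, whereas the PEA evaluates g(·) once at the average regressor values.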
Effects of discrete variable
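# mod1: fitted logit model; x1, x0: regressor vectors (leading 1 for the
# intercept) assumed defined earlier, not shown here
p1 <- plogis(sum(coef(mod1) * x1))  # P(y = 1 | x1), logistic cdf
p0 <- plogis(sum(coef(mod1) * x0))  # P(y = 1 | x0)
p1 - p0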
## [1] -0.3444345
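# mod2: fitted probit model; x1, x0: regressor vectors assumed defined earlier
p1 <- pnorm(sum(coef(mod2) * x1))  # P(y = 1 | x1), standard normal cdf
p0 <- pnorm(sum(coef(mod2) * x0))  # P(y = 1 | x0)
p1 - p0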
## [1] -0.334577
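For a discrete regressor, the effect is a difference in predicted probabilities rather than a derivative. The self-contained sketch below computes this difference for a binary regressor d, holding the continuous regressor at its mean. The simulated data and the names d, x_at1 and x_at0 are assumptions made for illustration; they are not the objects used in the slides.

```r
# Simulated data with one binary and one continuous regressor (assumed)
set.seed(11)
n <- 1000
d <- rbinom(n, 1, 0.4)
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.2 + 0.7 * x - 1.0 * d))

fit <- glm(y ~ x + d, family = binomial(link = "logit"))

# Regressor vectors (intercept, x, d) that differ only in d
x_at1 <- c(1, mean(x), 1)
x_at0 <- c(1, mean(x), 0)

# Effect of switching d from 0 to 1: G(x1'b) - G(x0'b)
p1 <- plogis(sum(coef(fit) * x_at1))
p0 <- plogis(sum(coef(fit) * x_at0))
effect <- p1 - p0
effect

# The same quantity obtained through predict()
nd     <- data.frame(x = mean(x), d = c(1, 0))
p_pred <- predict(fit, newdata = nd, type = "response")
unname(p_pred[1] - p_pred[2])
```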
Statistical Inference
H0 : c(β) = 0
where:
3 classical test principles
Wald test
LR test
Unrestricted model: y = β0 + β1 x1 + · · · + βk xk + u
Restricted model: y = β0 + β1 x1 + · · · + βk−q xk−q + u
• Formally, we have
$$LR = 2\log\frac{L_{ur}}{L_r} = 2(\mathcal{L}_{ur} - \mathcal{L}_r) \overset{a}{\sim} \chi^2_q$$
where:
• $\mathcal{L}_{ur}$ is the log-likelihood of the unrestricted model, $\mathcal{L}_r$ the
log-likelihood of the restricted model and q the number of
restrictions under the null.
• $L_{ur}$ is the likelihood of the unrestricted model and $L_r$ the likelihood of
the restricted model.
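library(lmtest)
# Unrestricted logit model
full <- glm(inlf ~ nwifeinc + educ + exper +
            I(exper^2) + age + kidslt6 + kidsge6,
            family = binomial(link = 'logit'), data = mroz)
# Restricted model: drops kidslt6 and kidsge6 (q = 2 restrictions)
reduced <- glm(inlf ~ nwifeinc + educ + exper + I(exper^2) + age,
               family = binomial(link = 'logit'), data = mroz)
lrtest(full, reduced)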
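The LR statistic can also be computed by hand from the two maximized log-likelihoods. The sketch below does so on simulated data using base R only; the data and the names sim_full and sim_reduced are assumptions made for illustration and are distinct from the mroz models above.

```r
# Simulated data: x2 and x3 are irrelevant by construction (assumed)
set.seed(2023)
n  <- 600
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.3 + 0.9 * x1))

sim_full    <- glm(y ~ x1 + x2 + x3, family = binomial(link = "logit"))
sim_reduced <- glm(y ~ x1,           family = binomial(link = "logit"))

# LR = 2 * (log-likelihood unrestricted - log-likelihood restricted)
LR <- as.numeric(2 * (logLik(sim_full) - logLik(sim_reduced)))

# Number of restrictions under the null
q <- length(coef(sim_full)) - length(coef(sim_reduced))

# Asymptotic p-value from the chi-squared distribution with q df
p_value <- pchisq(LR, df = q, lower.tail = FALSE)

# The same statistic as a difference in residual deviances
dev_diff <- deviance(sim_reduced) - deviance(sim_full)

c(LR = LR, dev_diff = dev_diff, q = q, p_value = p_value)
```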
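Deviance = 2(LS − LM )
where
• LS denotes the maximized log-likelihood value for the most complex
model possible (a saturated model).
• LM denotes the maximized log-likelihood value for a model M of interest.
• It can easily be shown that the LR statistic equals the difference between
the deviances of the restricted and unrestricted models, because LS cancels:
Deviance(restricted) − Deviance(unrestricted) = 2(Lur − Lr ) = LR.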
full
##
## Call: glm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
## kidslt6 + kidsge6, family = binomial(link = "logit"), data = mroz)
##
## Coefficients:
## (Intercept) nwifeinc educ exper I(exper^2) age
## 0.425452 -0.021345 0.221170 0.205870 -0.003154 -0.088024
## kidslt6 kidsge6
## -1.443354 0.060112
##
## Degrees of Freedom: 752 Total (i.e. Null); 745 Residual
## Null Deviance: 1030
## Residual Deviance: 803.5 AIC: 819.5
reduced
##
## Call: glm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age,
## family = binomial(link = "logit"), data = mroz)
##
## Coefficients:
LM test
Classification Tables
                     Estimated values
Observed values    ŷi = 1    ŷi = 0    Total
yi = 1             n11       n10       n1·
yi = 0             n01       n00       n0·
Total              n·1       n·0       n
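A table of this form can be built with table(). The sketch below classifies an observation as ŷi = 1 when its fitted probability exceeds 0.5, then reports the share of correct predictions (n11 + n00)/n. The 0.5 cutoff is a common convention assumed here, and the simulated data are for illustration only.

```r
# Simulated data (assumed, for illustration)
set.seed(5)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + 1.5 * x))

fit   <- glm(y ~ x, family = binomial(link = "logit"))
p_hat <- fitted(fit)

# Predicted outcome using a 0.5 cutoff (conventional choice, an assumption)
y_hat <- as.integer(p_hat > 0.5)

# Rows: observed values; columns: estimated values; level order 1, 0
tab <- table(observed  = factor(y,     levels = c(1, 0)),
             predicted = factor(y_hat, levels = c(1, 0)))
addmargins(tab)

# Share of correct predictions: (n11 + n00) / n
pcp <- sum(diag(tab)) / n
pcp
```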
Information criteria
Two common measures that are based on the same logic as the adjusted
R-squared for the linear model are:
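The sketch below assumes that the two measures intended are the Akaike (AIC) and Bayesian (BIC) information criteria, which penalize the maximized log-likelihood for the number of estimated parameters k. It computes both by hand and compares them with base R's AIC() and BIC(); the simulated data are for illustration only.

```r
# Simulated data (assumed, for illustration)
set.seed(99)
n <- 400
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.2 + 1.1 * x))

fit <- glm(y ~ x, family = binomial(link = "logit"))

k  <- length(coef(fit))       # number of estimated parameters
ll <- as.numeric(logLik(fit)) # maximized log-likelihood

# AIC = -2 logL + 2k ; BIC = -2 logL + k log(n)
aic_manual <- -2 * ll + 2 * k
bic_manual <- -2 * ll + k * log(n)

rbind(manual = c(AIC = aic_manual, BIC = bic_manual),
      base_R = c(AIC = AIC(fit),   BIC = BIC(fit)))
```

Under both criteria, smaller values indicate a better trade-off between fit and parsimony.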
Pseudo R-squared