Problem Set 7

The document examines models for binary dependent variables using examples of women's labor force participation and smoking behavior. It discusses the linear probability model and introduces the logit model. Several problems with the linear probability model are highlighted and the logit model is presented as an alternative that addresses these issues.

Uploaded by

Luca Vanz

Universidad Carlos III de Madrid

ME-MIEM
Econometrics
Binary Dependent Variable Models
Problem Set 7

1. We want to examine the determinants of women's labour supply. Using the 1998 Labor Force Survey, we select women aged between 18 and 55 and distinguish between those who choose inactivity and those who choose activity (i.e. being employed or searching for a job). Different determinants can explain this choice: age, education, place of residence, children, marital status, ... We propose different methods to examine the effects of these variables on labour supply behavior.
Let $Y_i \in \{0, 1\}$ be the labour supply behavior of woman $i$: $Y_i = 1$ if she is active on the labour market and $Y_i = 0$ if not. The linear probability model treats the binary variable $Y_i$ as if it were continuous. Table 1 presents the results of the OLS regression of $Y_i$ on $X_i$:

(a) What are the assumptions on the first-order moments of the regression residual in this setting? Show that the model cannot be homoskedastic.
(b) Exhibit some situations in which the predicted probability of being active is lower than zero or greater than 1. Is this a problem?
(c) Table 2 presents the main quantiles of the predicted $\hat{Y}_i = X_i'\hat{\beta}$ when the woman is active ($Y_i = 1$) and when she is not ($Y_i = 0$). What do you conclude?
(d) To overcome these problems, we consider that each variable $Y_i$ is drawn from a discrete Bernoulli distribution $B(\pi_i; 1)$, where $\pi_i$ depends on $X_i'\beta$: $\pi_i = H(X_i'\beta)$. $H$ is assumed to be known; it is strictly increasing over $\mathbb{R}$ and lies strictly between 0 and 1. Why does this specification overcome the problems stated above?
(e) How can $\beta$ be estimated?
(f) Consider the following model:
$$Y_i^* = X_i'\beta + u_i,$$
with $u_i$ drawn from a distribution with cdf $F$, and
$$Y_i = 1 \text{ if } Y_i^* \ge 0; \qquad Y_i = 0 \text{ if } Y_i^* < 0.$$
Show that this model is equivalent to the previous one. What is the relationship between $F$ and $H$? How would you interpret the variable $Y_i^*$?
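
For part (a), a small numerical sketch may help. Under the linear probability model, write $p = P(Y_i = 1 \mid X_i) = X_i'\beta$; the error $u_i = Y_i - p$ then takes the value $1 - p$ with probability $p$ and $-p$ with probability $1 - p$. The helper name `lpm_error_moments` and the three probabilities used below are illustrative assumptions, not part of the problem set.

```python
def lpm_error_moments(p):
    """Exact mean and variance of u = Y - p when Y ~ Bernoulli(p)."""
    # u takes two values: (1 - p) with probability p, and (-p) with probability (1 - p)
    outcomes = [(1.0 - p, p), (-p, 1.0 - p)]
    mean = sum(value * prob for value, prob in outcomes)
    variance = sum((value - mean) ** 2 * prob for value, prob in outcomes)
    return mean, variance


# Hypothetical values of X_i'beta for three different women
for p in (0.2, 0.5, 0.9):
    mean, variance = lpm_error_moments(p)
    print(f"p = {p:.1f}: E[u|X] = {mean:+.3f}, Var(u|X) = {variance:.3f}")
```

The conditional mean is zero for every $p$, but the conditional variance equals $p(1 - p)$ and therefore changes with $X_i'\beta$, so the errors cannot be homoskedastic unless all slope coefficients are zero.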

Table 1: Linear Probability Model

| Variable | Estimate | Std. Error | t stat | Pr > \|t\| |
|---|---|---|---|---|
| Intercept | 0.30 | 0.04 | 7.63 | <.0001 |
| No diploma | Ref. | | | |
| BEPC | 0.10 | 0.01 | 14.00 | <.0001 |
| CAP, BEP or similar | 0.13 | 0.01 | 24.73 | <.0001 |
| Baccalaureat | 0.17 | 0.01 | 26.55 | <.0001 |
| Licence | 0.21 | 0.01 | 30.56 | <.0001 |
| Master or more | 0.24 | 0.01 | 28.99 | <.0001 |
| Studying | 0.21 | 0.02 | 9.73 | <.0001 |
| Partner’s occupation | | | | |
| Farmer | 0.07 | 0.01 | 6.03 | <.0001 |
| Craftsman, shopkeeper | 0.02 | 0.01 | 2.22 | 0.0263 |
| Executives | -0.07 | 0.01 | -9.89 | <.0001 |
| Intermediate profession | 0.01 | 0.01 | 1.23 | 0.217 |
| White-collar worker | 0.04 | 0.01 | 7.39 | <.0001 |
| Blue-collar worker | Ref. | | | |
| Pensioner | -0.14 | 0.01 | -11.29 | <.0001 |
| Other inactive | -0.33 | 0.01 | -37.70 | <.0001 |
| Rural city | Ref. | | | |
| Urban area with less than 20,000 | -0.03 | 0.01 | -4.37 | <.0001 |
| Urban area between 20,000 and 200,000 | -0.02 | 0.01 | -4.27 | <.0001 |
| Urban area with more than 200,000 | -0.04 | 0.01 | -7.24 | <.0001 |
| Paris area | 2.68E-03 | 0.01 | 0.43 | 0.6661 |
| Age | 0.03 | 0.00 | 14.10 | <.0001 |
| Age squared | -3.84E-04 | 0.00 | -15.17 | <.0001 |
| Partner’s age | 2.96E-03 | 0.00 | 2.05 | 0.0408 |
| Partner’s age squared | -3.90E-05 | 0.00 | -2.37 | 0.0176 |
| Living with a partner | 0.08 | 0.00 | 16.05 | <.0001 |
| Having a child under 3 | -0.18 | 0.01 | -26.48 | <.0001 |
| Having a child under 18 | -0.08 | 0.00 | -39.81 | <.0001 |
| R² | 0.19 | | | |
| n | 39706 | | | |

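Table 2: Percentiles of $\hat{Y}_i = X_i'\hat{\beta}$

| Percentile | $Y = 0$ | $Y = 1$ |
|---|---|---|
| 100% (Max) | 1.19 | 1.20 |
| 99% | 0.98 | 1.12 |
| 95% | 0.90 | 1.06 |
| 90% | 0.85 | 1.02 |
| 75% (Q3) | 0.77 | 0.94 |
| 50% (Median) | 0.67 | 0.84 |
| 25% (Q1) | 0.54 | 0.74 |
| 10% | 0.41 | 0.64 |
| 5% | 0.33 | 0.56 |
| 1% | 0.14 | 0.40 |
| 0% (Min) | -0.18 | -0.08 |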

2. The variable smokes is a binary variable that takes the value one if a person smokes and zero otherwise. Using the SMOKE database, a linear probability model is estimated for smokes:

$$
\begin{aligned}
\widehat{smokes} = {} & \underset{\substack{(.855) \\ [.856]}}{0.656} - \underset{\substack{(.204) \\ [.207]}}{0.069}\,\log(cigpric) + \underset{\substack{(.026) \\ [.026]}}{0.012}\,\log(income) - \underset{\substack{(.006) \\ [.006]}}{0.029}\,educ \\
& + \underset{\substack{(.006) \\ [.005]}}{0.020}\,age - \underset{\substack{(.00006) \\ [.00006]}}{0.00026}\,age^2 - \underset{\substack{(.039) \\ [.038]}}{0.101}\,restaurn - \underset{\substack{(.052) \\ [.050]}}{0.026}\,white
\end{aligned}
$$

$$n = 807, \qquad R^2 = 0.062$$

The variable white equals one if the sampled individual is white and zero otherwise, and restaurn equals one if the sampled individual lives in a state with restrictions on smoking in restaurants. Both the usual standard errors, in parentheses ( ), and the heteroskedasticity-robust standard errors, in brackets [ ], are reported.

(a) Are there any important differences between the two sets of standard errors?
(b) Holding other factors fixed, if education increases by four years, what happens to the estimated probability of smoking?
(c) From what age onward does an additional year reduce the probability of smoking?
(d) Interpret the coefficient on the binary variable restaurn.
(e) Individual number 206 in the sample has the following characteristics: cigpric = 67.33, income = 6500, educ = 16, age = 77, restaurn = 0, white = 0 and smokes = 0. Calculate the estimated probability of smoking for this person and comment on the result.
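
The calculations behind parts (b), (c) and (e) can be sketched as follows. The coefficient signs used below (negative on log(cigpric), educ, age squared, restaurn and white) are an assumption, chosen because they reproduce the answers listed at the end of the problem set; the dictionary and function names are illustrative.

```python
import math

# LPM coefficients for smokes; the signs are assumed (see the note above)
COEF = {
    "const": 0.656,
    "log_cigpric": -0.069,
    "log_income": 0.012,
    "educ": -0.029,
    "age": 0.020,
    "age_sq": -0.00026,
    "restaurn": -0.101,
    "white": -0.026,
}


def predicted_smoking_probability(cigpric, income, educ, age, restaurn, white):
    """Fitted value of the linear probability model for one individual."""
    return (
        COEF["const"]
        + COEF["log_cigpric"] * math.log(cigpric)
        + COEF["log_income"] * math.log(income)
        + COEF["educ"] * educ
        + COEF["age"] * age
        + COEF["age_sq"] * age ** 2
        + COEF["restaurn"] * restaurn
        + COEF["white"] * white
    )


# (b) effect of four more years of education on the fitted probability
educ_effect = 4 * COEF["educ"]

# (c) age at which the quadratic in age peaks: b_age + 2 * b_age_sq * age = 0
turning_age = -COEF["age"] / (2 * COEF["age_sq"])

# (e) fitted probability for individual 206
p_206 = predicted_smoking_probability(67.33, 6500, 16, 77, 0, 0)

print(f"(b) {educ_effect:.3f}  (c) {turning_age:.2f}  (e) {p_206:.5f}")
```

Under these assumed signs the fitted probability for individual 206 is only slightly above zero, which illustrates how close to the boundary a linear probability model can get for an older, highly educated respondent.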

3. A random sample of 400 candidates for the driving licence were asked whether they had passed the exam ($Pass_i = 1$) or failed ($Pass_i = 0$). Their gender ($Male_i = 1$ if man, $Male_i = 0$ if woman) and their driving experience ($Experience_i$, in years) were also recorded.

Dependent variable: Pass (standard errors in parentheses)

| | (1) | (2) | (3) |
|---|---|---|---|
| Experience | 0.006 (0.002) | | 0.007 (0.031) |
| Male | | -0.071 (0.340) | -0.22 (0.061) |
| Male × Experience | | | -0.003 (0.004) |
| Intercept | 0.774 (0.034) | 0.900 (0.022) | 0.790 (0.020) |

Using column (1):

(a) Does the probability of passing the exam depend on experience? Explain your answer.
(b) Matthew has 10 years of driving experience. What is his probability of passing the exam?
(c) Christopher is a novice driver (zero years of experience). What is the probability of Christopher passing the exam?
(d) Plot the estimated probabilities from this linear probability model against Experience for values between 0 and 60. Do you think the linear probability model is adequate in this case? Why or why not?
(e) The sample includes values of Experience between 0 and 40 years, and only four people in the sample have more than 30 years of driving experience. Jed is 95 years old and has been driving since he was 15. What does the model predict for the probability of Jed passing the exam? Interpret the result.

Using column (2):

(f) Obtain the estimated probabilities of passing the exam for men and women. How do these estimates relate to the sample proportions of men and women passing the exam?

Using column (3):

(g) Akira is a man with 10 years of driving experience. What is the probability of Akira passing the exam?
(h) Jane is a woman with 2 years of driving experience. What is the probability of Jane passing the exam?
(i) Does the effect of Experience on the exam result depend on gender? Explain your answer.
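
A minimal sketch of the predictions asked for in this problem. It assumes that the Male and Male × Experience coefficients are negative (the extracted table has lost its minus signs; negative values reproduce the listed answers to (f), (g) and (h)) and that the columns are (1) Experience only, (2) Male only, (3) both regressors plus their interaction. The function names are illustrative.

```python
def pass_prob_col1(experience):
    """Column (1): linear probability model with Experience only."""
    return 0.774 + 0.006 * experience


def pass_prob_col2(male):
    """Column (2): linear probability model with the Male dummy only."""
    return 0.900 - 0.071 * male


def pass_prob_col3(experience, male):
    """Column (3): Experience, Male and their interaction."""
    return 0.790 + 0.007 * experience - 0.22 * male - 0.003 * male * experience


# (d) fitted line on a grid from 0 to 60 years, flagging values above 1
for years in range(0, 61, 10):
    p = pass_prob_col1(years)
    flag = "  <- above 1" if p > 1 else ""
    print(f"Experience = {years:2d}: {p:.3f}{flag}")

# (e) Jed: 95 - 15 = 80 years of experience, far outside the sample range
print("Jed:", round(pass_prob_col1(95 - 15), 3))

# (f) men and women under column (2)
print("Men:", round(pass_prob_col2(1), 3), "Women:", round(pass_prob_col2(0), 3))

# (g), (h) predictions from column (3)
print("Akira:", round(pass_prob_col3(10, male=1), 3))
print("Jane:", round(pass_prob_col3(2, male=0), 3))
```

The fitted line in column (1) crosses 1 at (1 - 0.774) / 0.006, roughly 37.7 years, so any prediction beyond that point cannot be read as a probability.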

4. Let grad be a dummy variable for whether a student-athlete at a large university graduates in five years. Let hsGPA and SAT be high school grade point average and SAT score. Let study be the number of hours spent per week in organized study hall. Suppose that, using data on 420 student-athletes, the following logit model is obtained:

$$\widehat{\Pr}(grad = 1 \mid hsGPA, SAT, study) = \Lambda(-1.17 + 0.24\,hsGPA + 0.00058\,SAT + 0.073\,study)$$

where $\Lambda(z) = \exp(z)/[1 + \exp(z)] = 1/[1 + \exp(-z)]$ is the logit function.

(a) Holding hsGPA fixed at 3.0 and SAT fixed at 1,200, compute the estimated difference in the graduation probability between someone who spent 10 hours per week in study hall and someone who spent 11 hours per week.
(b) Compute the odds of graduating for each type of student and interpret the odds ratio.
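
Parts (a) and (b) can be sketched numerically as follows, assuming the intercept is -1.17 (the minus sign does not survive in the extracted equation). The helper names `logistic` and `grad_index` are illustrative.

```python
import math


def logistic(z):
    """Logit function: exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))


def grad_index(hs_gpa, sat, study):
    """Linear index inside the logit function (negative intercept assumed)."""
    return -1.17 + 0.24 * hs_gpa + 0.00058 * sat + 0.073 * study


p10 = logistic(grad_index(3.0, 1200, 10))
p11 = logistic(grad_index(3.0, 1200, 11))

# (a) difference in estimated graduation probabilities
diff = p11 - p10

# (b) odds p / (1 - p) for each student, and their ratio
odds10 = p10 / (1 - p10)
odds11 = p11 / (1 - p11)
odds_ratio = odds11 / odds10

print(f"p(10h) = {p10:.4f}, p(11h) = {p11:.4f}, difference = {diff:.4f}")
print(f"odds(10h) = {odds10:.3f}, odds(11h) = {odds11:.3f}, OR = {odds_ratio:.4f}")
```

The odds ratio equals exp(0.073), about 1.076, whatever the values of hsGPA and SAT: one more weekly hour of study hall multiplies the estimated odds of graduating by roughly 1.08, while the change in probability depends on where the student starts.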

5. Use the data in LOANAPP.RAW from Wooldridge for this exercise. The binary variable to be explained is approve, which is equal to one if a mortgage loan to an individual was approved. The key explanatory variable is white, a dummy variable equal to one if the applicant was white. The other applicants in the data set are black and Hispanic.
To test for discrimination in the mortgage loan market, a linear probability model can be used:

$$approve = \beta_0 + \beta_1\,white + \text{other factors}.$$

(a) If there is discrimination against minorities, and the appropriate factors have been controlled for, what is the sign of $\beta_1$?
(b) Regress approve on white and report the results in the usual form. Interpret the coefficient on white. Is it statistically significant? Is it practically large?
(c) As controls, add the variables hrat, obrat, loanprc, unem, male, married, dep, sch, cosign, chist, pubrec, mortlat1, mortlat2, and vr. What happens to the coefficient on white? Is there still evidence of discrimination against nonwhites?
(d) Now allow the effect of race to interact with the variable measuring other obligations as a percentage of income (obrat). Is the interaction term significant?
(e) Using the model from part (d), what is the effect of being white on the probability of approval when obrat = 32, which is roughly the mean value in the sample? Obtain a 95% confidence interval for this effect.
(f) Estimate a logit model of approve on white. Find the estimated probability of loan approval for both whites and nonwhites. How do these compare with the linear probability estimates?
(g) Now add the variables hrat, obrat, loanprc, unem, male, married, dep, sch, cosign, chist, pubrec, mortlat1, mortlat2, and vr to the logit model. Is there statistically significant evidence of discrimination against nonwhites?
(h) Estimate the model in (g) by logit, adding the interaction of white with obrat, and calculate the odds ratio (OR) for being white compared to nonwhite when obrat = 32.
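
For parts (e) and (h), the following worked equations may help. The coefficient names $\beta_w$ (on white) and $\beta_{wo}$ (on white × obrat) are notation introduced here for illustration, not the problem set's own. In the linear probability model with the interaction, the effect of being white at obrat = 32, holding the other regressors fixed, is

$$\Delta\Pr(approve = 1) = \beta_w + 32\,\beta_{wo},$$

and its standard error combines both estimates:

$$se\left(\hat{\beta}_w + 32\,\hat{\beta}_{wo}\right) = \sqrt{\widehat{Var}(\hat{\beta}_w) + 32^2\,\widehat{Var}(\hat{\beta}_{wo}) + 2 \cdot 32\,\widehat{Cov}(\hat{\beta}_w, \hat{\beta}_{wo})}.$$

An approximate 95% confidence interval is the point estimate plus or minus 1.96 standard errors. An equivalent shortcut is to replace the interaction by white × (obrat - 32) and re-estimate: the coefficient on white is then the effect at obrat = 32, and its reported standard error can be used directly. In the logit model with the same interaction, the difference in log-odds between white and nonwhite applicants is $\beta_w + \beta_{wo}\,obrat$, so

$$OR(obrat = 32) = \exp\left(\beta_w + 32\,\beta_{wo}\right).$$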

ANSWERS

1. .
2. (a) No.
(b) It reduces by 0.116.
(c) 38.5.
(d) .
(e) The probability is 0.0053.

3. (a) Yes.
(b) 0.834.
(c) 0.774.
(d) .
(e) 1.254.
(f) Men: 0.829. Women: 0.900. They are the same.
(g) 0.61.
(h) 0.804.
(i) No.
