Module 6A
MYASAR
SP 2025
Let’s use the data from Mroz (1987) to estimate a linear
probability model (LPM) of labor force participation. In the
sample, 428 of the 753 women report being in the labor force
at some point during 1975
Let inlf = 1 if the woman reports working for a wage, and zero
otherwise
Assume that labor force participation depends on the
following:
other sources of income (nwifeinc, in $1000)
years of education (educ)
experience (exper) and its square (expersq)
age (age)
number of children less than six years old (kidslt6)
number of kids between 6 and 18 years of age (kidsge6)
use mroz, clear
reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6, r
BSAD 6318-ECON 5339, SP 2025 7
      Source |       SS           df       MS      Number of obs   =       753
-------------+----------------------------------   F(7, 745)       =     38.22
       Model |  48.8080578         7  6.97257969   Prob > F        =    0.0000
    Residual |  135.919698       745  .182442547   R-squared       =    0.2642
-------------+----------------------------------   Adj R-squared   =    0.2573
       Total |  184.727756       752  .245648611   Root MSE        =    .42713
The estimated slope coefficient (beta_j) is the effect of a one-unit change in the
explanatory variable x_j on the probability that Y = 1, holding the other
variables fixed
The coefficient on educ indicates that an extra year of education increases
the probability of labor force participation by 0.038, that is, by 3.8 percentage
points, ceteris paribus
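Because the LPM is linear, the effect scales with the size of the change. A minimal Stata sketch of that arithmetic, taking the 0.038 coefficient on educ quoted above as given; the four-year change and the scalar names are illustrative assumptions, not part of the Mroz example:

```stata
* Coefficient on educ as reported in the text above
scalar b_educ = 0.038

* Hypothetical change in years of schooling (illustrative choice)
scalar d_educ = 4

* Implied change in P(inlf = 1), in probability units and in percentage points
scalar d_prob = b_educ * d_educ
display "Change in probability:       " %5.3f d_prob
display "Change in percentage points: " %4.1f 100*d_prob
```

The constant per-year effect is a feature of the linear specification: every additional year of education shifts the predicted probability by the same amount, regardless of the starting level.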
gen PI=s46/100
Example 3: Binary Dependent Variables
regress den PI, r
             |               Robust
         den | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
Note that the estimated coefficient on the PI ratio is positive (0.604) and
significant at the .01 level. Thus, applicants with higher payments as a
fraction of income are more likely to have their application denied
For example, if the PI ratio increases by .10, the probability of denial (den = 1)
increases by .604*.10 = .0604, or about 6 percentage points.
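The arithmetic behind the 6 percentage point figure can be checked directly in Stata. This is only a sketch that hard-codes the 0.604 coefficient quoted above; the scalar names are illustrative.

```stata
* Slope on PI from the regression of den on PI (value quoted in the text)
scalar b_PI = 0.604

* Increase in the PI ratio being considered
scalar d_PI = 0.10

* Change in the denial probability, converted to percentage points
scalar pp_change = 100 * b_PI * d_PI
display "Change in P(den = 1): " %4.2f pp_change " percentage points"
```

With the regression results still in memory, the stored coefficient _b[PI] could be used in place of the hard-coded value.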
Now, let's compute the predicted probability of denial (den) as a
function of the PI ratio
If, for instance, the PI ratio is .30, the predicted value from
the estimated equation is
-0.08 + 0.604*.30 = 0.101
An applicant whose projected debt payments are 30% of
his/her income has a predicted probability of 0.101 (about 10.1%)
that his/her application will be denied.
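The same calculation can be repeated for other PI ratios. The sketch below uses the intercept (-0.08) and slope (0.604) quoted above; the second PI value of .40 is an added illustration, and the scalar names are hypothetical.

```stata
* Estimated LPM from the text: den = -0.08 + 0.604*PI
scalar b0 = -0.08
scalar b1 = 0.604

* Predicted denial probability at PI = .30 (the case worked in the text)
scalar p30 = b0 + b1*0.30
display "PI = .30: " %5.3f p30

* Predicted denial probability at PI = .40 (illustrative extra case)
scalar p40 = b0 + b1*0.40
display "PI = .40: " %5.3f p40
```

Moving from .30 to .40 raises the predicted probability by 0.604*.10 = 0.0604, which is the slope times the change in PI.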
             |               Robust
         den | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
The coefficient on race is 0.177, which indicates that a black applicant has a 17.7
percentage point higher probability of having a mortgage application denied
than an applicant in the control group, holding PI constant.
But keep in mind that we do not control for many other variables. Thus, this difference
may change as we add more explanatory variables. This is just a simple example.
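To see what holding PI constant means, the sketch below compares two hypothetical applicants with the same PI ratio who differ only in race. It borrows the intercept and PI slope (-0.091 and .559) from the gen prob command that appears in this module; the PI value of .30 and the scalar names are illustrative choices.

```stata
* Coefficients of the LPM with PI and race
scalar b0     = -0.091
scalar b_PI   = 0.559
scalar b_race = 0.177

* Same PI ratio for both applicants (illustrative)
scalar PI0 = 0.30

* Predicted denial probabilities for race = 0 and race = 1
scalar p_race0 = b0 + b_PI*PI0
scalar p_race1 = b0 + b_PI*PI0 + b_race
scalar gap     = p_race1 - p_race0

display "race = 0: " %6.4f p_race0
display "race = 1: " %6.4f p_race1
display "gap:      " %6.4f gap
```

The gap equals the coefficient on race whatever PI value is chosen, because the model has no interaction between PI and race.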
             |               Robust
         den | Coefficient  std. err.      t    P>|t|     [95% conf. interval]

. margins, dydx(*)

             |            Delta-method
             |      dy/dx   std. err.      t    P>|t|     [95% conf. interval]
[Figure: residuals of the LPM for given values of x (PI)]
This graph illustrates that, for a given value of x (PI), the residual can take only
two possible values, indicating that the error term in the LPM
is heteroskedastic
Limitations of LPM: Heteroskedasticity
The OLS estimators are unbiased if the error term (u) is uncorrelated with the
explanatory variables
However, the errors are heteroskedastic
Because y is binary, Var(y|x) = p(x)[1 - p(x)], where p(x) = P(y = 1|x)
Thus, Var(u|x) takes on different values for different observations
There will be heteroskedasticity in the LPM, except in the case where the
probability does not depend on any of the independent variables
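To make the dependence on x concrete, the sketch below evaluates p(x)[1 - p(x)] at two PI ratios, using the single-regressor estimates quoted earlier (-0.08 and 0.604). The two PI values are illustrative, and the fitted line is treated as if it were the true p(x), which is an assumption.

```stata
* Fitted LPM: p(PI) = -0.08 + 0.604*PI
scalar b0 = -0.08
scalar b1 = 0.604

* Conditional variance p(1 - p) at PI = .30
scalar p_a   = b0 + b1*0.30
scalar var_a = p_a*(1 - p_a)

* Conditional variance p(1 - p) at PI = .40
scalar p_b   = b0 + b1*0.40
scalar var_b = p_b*(1 - p_b)

display "PI = .30: p = " %6.4f p_a "  Var(y|x) = " %6.4f var_a
display "PI = .40: p = " %6.4f p_b "  Var(y|x) = " %6.4f var_b
```

The two variances differ because p(x) differs, which is why robust standard errors (the r option) are requested in the regressions in this module.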
The dependent variable takes on only the values 0 or 1. Thus, for given values of the
independent variables, the error term (u) can also take on only two
values
When den_i = 1: u_i = 1 - b0 - b1*PI_i - b2*black_i (the value u must take for y to equal 1)
When den_i = 0: u_i = 0 - b0 - b1*PI_i - b2*black_i (the value u must take for y to equal 0)
Thus, the distribution of u has only two specific values
Since these two values change with the explanatory variables, the error
term cannot be assumed to be homoskedastic
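The two-point distribution of u can be verified numerically. The sketch below picks a hypothetical applicant (PI = .30, black = 0), uses the coefficients from the gen prob command in this module, and treats the fitted value as the true probability p. Under that assumption, u has mean zero and variance p(1 - p). The scalar names are illustrative.

```stata
* Fitted probability for a hypothetical applicant with PI = .30 and black = 0
scalar p = -0.091 + 0.559*0.30 + 0.177*0

* The only two values the error can take at this x
scalar u_if_1 = 1 - p
scalar u_if_0 = 0 - p

* Mean and variance of u given x:
* u = 1 - p with probability p, and u = -p with probability 1 - p
scalar mean_u = p*u_if_1 + (1 - p)*u_if_0
scalar var_u  = p*u_if_1^2 + (1 - p)*u_if_0^2

display "u when den = 1: " %7.4f u_if_1
display "u when den = 0: " %7.4f u_if_0
display "E(u|x) = " %7.4f mean_u "   Var(u|x) = " %7.4f var_u
```

Changing PI or black changes p, and with it both possible values of u and the variance.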
gen prob=-0.091+.559*PI+0.177*race
tab prob
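One thing worth looking for when inspecting the predicted values is whether they stay inside the unit interval, since the fitted line is not bounded between 0 and 1. The sketch below uses four made-up (PI, race) pairs rather than the HMDA data; only the coefficients come from the gen command above. Note that clear drops whatever data are in memory.

```stata
* Toy data: hypothetical PI ratios and race indicators (not from the HMDA sample)
clear
input PI race
0.05 0
0.30 0
0.50 1
2.00 1
end

* Predicted probabilities from the estimated LPM
gen prob = -0.091 + .559*PI + 0.177*race
list PI race prob

* Count predictions that are not valid probabilities
count if prob < 0 | prob > 1
```

In this toy example, the very low and very high PI ratios produce predictions below 0 and above 1, respectively; on the actual data the same count command could be run after gen prob.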
Wooldridge (2009)
Stock and Watson (2005)