The document discusses the Linear Probability Model (LPM) for binary dependent variables, explaining how to estimate probabilities using linear regression when the outcome is binary (0 or 1). It provides examples of applying LPM to real data, including factors affecting labor force participation and mortgage application denials, while also addressing the limitations of LPM, such as heteroskedasticity and omitted variable bias. The document emphasizes that the coefficients in LPM represent changes in the probability of success as explanatory variables change.

BSAD 6318/ ECON 5339

MYASAR
SP 2025

Module 6A: Linear Probability Model

BSAD 6318-ECON 5339, SP 2025


Binary Dependent Variables

A dependent variable can be binary: it takes on only two values, 0 and 1.
Whether a person used public or private transportation to get to work
Whether a firm exported during the year or not
Whether another firm took over a firm during a given year
In each case, the dependent variable is coded as a binary variable.
For instance, the dependent variable y = 1 if the firm exported during a given year and y = 0 otherwise.



Binary Dependent Variables
Suppose we are interested in estimating the following model when y is a binary variable:
y = β0 + β1x1 + β2x2 + … + βkxk + u   (1)
Since the dependent variable (y) takes on only two values (0 or 1), βj cannot be interpreted as before: it is no longer the change in y given a one-unit increase in xj, ceteris paribus.
y either changes from 0 to 1, changes from 1 to 0, or does not change.
If we assume the zero conditional mean assumption, E(u|x1, …, xk) = 0, then we have
E(y|x) = β0 + β1x1 + … + βkxk   (2)
where x is shorthand for all of the explanatory variables.


Binary Dependent Variables
When the dependent variable y is binary (y = 1 or 0), P(y=1|x) = E(y|x): the probability of success (the probability that y = 1) is the same as the expected value of y.
Thus, we can write our model as follows:
p(x) = P(y=1|x) = β0 + β1x1 + … + βkxk   (3)
which indicates that the probability of success, p(x) = P(y=1|x), is a linear function of x.
This is an example of a binary response model.
When the dependent variable is binary, the multiple regression model is called the Linear Probability Model (LPM) because the response probability P(y=1|x) is linear in the parameters βj.
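To make the point concrete, here is a minimal sketch in Python (a hypothetical toy dataset, not the course data) showing that OLS fitted values on a 0/1 outcome track the share of successes at each value of x:

```python
# Minimal sketch: OLS fit of a binary outcome on one regressor.
# At each x, the fitted value estimates P(y = 1 | x).
# Toy data (hypothetical): no successes at x = 0, half at x = 1, all at x = 2.
x = [0, 0, 1, 1, 2, 2]
y = [0, 0, 0, 1, 1, 1]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Closed-form simple OLS slope and intercept
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

print(b0, b1)       # intercept 0.0, slope 0.5
# The fitted "probability" at x = 1 equals the observed success rate there
print(b0 + b1 * 1)  # 0.5
```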



Binary Dependent Variables
Using the zero conditional mean assumption, one can show that
E(y|x) = β0 + β1x1 + … + βkxk
The probability that Yi = 1 is Pi = Pr(Yi = 1), and the probability that Yi = 0 is 1 − Pi = Pr(Yi = 0).
Since Yi takes on only the values 0 and 1, it has a Bernoulli distribution, a binomial distribution with n = 1. Thus,
E(Yi) = 1 × Pi + 0 × (1 − Pi) = Pi
Var(Yi) = Pi(1 − Pi)
Probability models aim to examine the determinants of Pi, the probability that Yi = 1 rather than Yi = 0.
The predicted probability P̂i can be determined as Ŷi for a given value of xi.
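The Bernoulli mean and variance above can be checked directly; this short Python sketch uses an arbitrary illustrative value of p:

```python
# Verify E(Y) = p and Var(Y) = p(1 - p) for a Bernoulli variable:
# Y takes the value 1 with probability p and 0 with probability 1 - p.
p = 0.3  # illustrative success probability

mean = 1 * p + 0 * (1 - p)                             # E(Y) = p
var = (1 - mean) ** 2 * p + (0 - mean) ** 2 * (1 - p)  # Var(Y) = E[(Y - EY)^2]

print(mean)  # 0.3
print(var)   # 0.21, i.e. p(1 - p) = 0.3 * 0.7
```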
Binary Dependent Variables: Interpretation
In the LPM, βj measures the change in the probability of success when xj changes, ceteris paribus:
ΔP(y=1|x) = βj Δxj
If we write the estimated equation as
ŷ = β̂0 + β̂1x1 + … + β̂kxk
then:
ŷ is the predicted probability of success
β̂0 is the predicted probability of success when every xj is zero
β̂j measures the predicted change in the probability of success when xj increases by one unit
Example: Binary Dependent Variables

Let’s use the data from Mroz (1987) and estimate a linear
probability model, where 428 out of 753 women in the
sample report being in the labor force at some point during
1975
Let inlf =1 if the woman reports working for a wage and zero
otherwise
Assume that the labor force participation depends on the
following:
other sources of income (nwifeinc, in $1000)
years of education (educ)
experience (exper)
Age
number of children less than six years old (kidslt6)
number of kids between 6 and 18 years of age (kidsge6)
use mroz, clear
reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6, r
Source SS df MS Number of obs = 753
F( 7, 745) = 38.22
Model 48.8080578 7 6.97257969 Prob > F = 0.0000
Residual 135.919698 745 .182442547 R-squared = 0.2642
Adj R-squared = 0.2573
Total 184.727756 752 .245648611 Root MSE = .42713

inlf Coef. Std. Err. t P>|t| [95% Conf. Interval]

nwifeinc -.0034052 .0014485 -2.35 0.019 -.0062488 -.0005616


educ .0379953 .007376 5.15 0.000 .023515 .0524756
exper .0394924 .0056727 6.96 0.000 .0283561 .0506287
expersq -.0005963 .0001848 -3.23 0.001 -.0009591 -.0002335
age -.0160908 .0024847 -6.48 0.000 -.0209686 -.011213
kidslt6 -.2618105 .0335058 -7.81 0.000 -.3275875 -.1960335
kidsge6 .0130122 .013196 0.99 0.324 -.0128935 .0389179
_cons .5855192 .154178 3.80 0.000 .2828442 .8881943

The estimated slope coefficient (β̂j) gives the impact of a unit change in that explanatory variable (xj) on the probability that y = 1.
The coefficient on educ indicates that an extra year of education increases the probability of labor force participation by 0.038, or 3.8 percentage points, ceteris paribus.
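To see what these estimates imply, we can plug in covariate values for a hypothetical woman; a Python sketch (the covariate values are illustrative, the coefficients come from the output above):

```python
# Predicted probability of labor force participation from the estimated LPM.
# Coefficients are taken from the regression output above; the covariate
# values describe a hypothetical woman and are purely illustrative.
b = {
    "_cons":     0.5855192,
    "nwifeinc": -0.0034052,
    "educ":      0.0379953,
    "exper":     0.0394924,
    "expersq":  -0.0005963,
    "age":      -0.0160908,
    "kidslt6":  -0.2618105,
    "kidsge6":   0.0130122,
}
x = {"nwifeinc": 20, "educ": 12, "exper": 10, "expersq": 100,
     "age": 30, "kidslt6": 1, "kidsge6": 0}

p_hat = b["_cons"] + sum(b[k] * v for k, v in x.items())
print(round(p_hat, 3))  # about 0.564
```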




Example 2: Binary Dependent Variables

use hmda_sw2.dta, clear

Let's first create a binary variable:
den = 1 if the loan application is denied; den = 0 if the loan is originated or the application is approved but not accepted by the applicant.
See the hmda.doc file for the variable definitions.

gen den=0
replace den=1 if s7==3

*s13 is the applicant's race: 3 = Black; 5 = White
gen race=0
replace race=1 if s13==3

*s46 is the debt-to-income ratio (the bank's calculation of total obligations/income)
gen PI=s46/100
Example 3: Binary Dependent Variables
regress den PI, r
. regress den PI, r

Linear regression Number of obs = 2,380


F(1, 2378) = 37.56
Prob > F = 0.0000
R-squared = 0.0397
Root MSE = .31828

Robust
den Coefficient std. err. t P>|t| [95% conf. interval]

PI .6035349 .0984826 6.13 0.000 .4104144 .7966555


_cons -.0799096 .0319666 -2.50 0.012 -.1425949 -.0172243

Note that the estimated coefficient on the PI ratio is positive (0.604) and significant at the .01 significance level. Thus, those with higher debt payments as a fraction of income are more likely to have their application denied.
For example, an increase of .10 in the PI ratio raises the probability of denial by .604 × .10 ≈ .06, or almost 6 percentage points.
Example 3: Binary Dependent Variables
Now, let's compute the predicted den probabilities as a
function of the PI ratio
If, for instance, the PI ratio is .30, the predicted value from the estimated equation is
-0.08 + 0.604 × .30 ≈ 0.101
An applicant whose projected debt payments are 30% of
his/her income has a probability of 0.101 that his/her
application will be denied.
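The same arithmetic with the unrounded estimates from the output above can be sketched in Python:

```python
# Predicted denial probability at PI = 0.30, using the unrounded
# estimates from the regression output above.
b0 = -0.0799096   # _cons
b1 = 0.6035349    # coefficient on PI

p_hat = b0 + b1 * 0.30
print(round(p_hat, 3))  # 0.101
```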



Example 3: Binary Dependent Variables
Now, let's examine the effect of race on the probability of denial, holding the
PI constant
regress den PI race, vce(r)
. regress den PI race, vce(r)

Linear regression Number of obs = 2,380


F(2, 2377) = 49.39
Prob > F = 0.0000
R-squared = 0.0760
Root MSE = .31228

Robust
den Coefficient std. err. t P>|t| [95% conf. interval]

PI .5591946 .0886663 6.31 0.000 .3853233 .7330658


race .1774282 .0249463 7.11 0.000 .1285096 .2263469
_cons -.0905136 .0285996 -3.16 0.002 -.1465963 -.0344309

The coefficient on race is 0.177, which indicates that a black applicant has a 17.7
percentage points higher probability of having a mortgage application denied
than the control group, holding PI constant.
But keep in mind that we do not control for many variables. Thus, this difference
may change as we add more explanatory variables. This is just a simple example.

The rest of the material is optional. We will discuss it in the second course next
semester



Marginal Effects
Now let's look at the marginal effects for the LPM model
mfx
. mfx

Marginal effects after regress


y = Fitted values (predict)
= .1197479

variable dy/dx Std. err. z P>|z| [ 95% C.I. ] X

PI .5591946 .08867 6.31 0.000 .385412 .732977 .330814


race* .1774282 .02495 7.11 0.000 .128534 .226322 .142437

(*) dy/dx is for discrete change of dummy variable from 0 to 1

The marginal effects are the same as the slope coefficients.
Why? Because the relationships in the LPM are linear and do not vary with the values of the explanatory variables.
We will see later that this is not the case with logit and probit models.
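This contrast can be sketched numerically in Python; the LPM line uses the estimates from the output above, while the logistic coefficients below are arbitrary illustrative values, not estimates:

```python
import math

# LPM: P(den = 1) = b0 + b1 * PI  (estimates from the output above)
b0, b1 = -0.0905136, 0.5591946
lpm = lambda pi: b0 + b1 * pi

# Marginal effect of PI at two different points: identical for the LPM
me_low  = (lpm(0.31) - lpm(0.30)) / 0.01
me_high = (lpm(0.61) - lpm(0.60)) / 0.01
print(round(me_low, 4), round(me_high, 4))  # both equal b1 = 0.5592

# For comparison, a logistic response curve (coefficients are arbitrary,
# NOT estimates): its slope changes with PI.
logit = lambda pi: 1 / (1 + math.exp(-(-4 + 6 * pi)))
print(round((logit(0.31) - logit(0.30)) / 0.01, 4))  # slope near PI = 0.30
print(round((logit(0.61) - logit(0.60)) / 0.01, 4))  # different slope near 0.60
```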



Instead of mfx, you may use margins, dydx(*) to obtain the marginal effects. The margins command is faster.
. regress den PI i.race, vce(r)

Linear regression Number of obs = 2,380


F(2, 2377) = 49.39
Prob > F = 0.0000
R-squared = 0.0760
Root MSE = .31228

Robust
den Coefficient std. err. t P>|t| [95% conf. interval]

PI .5591946 .0886663 6.31 0.000 .3853233 .7330658


1.race .1774282 .0249463 7.11 0.000 .1285096 .2263469
_cons -.0905136 .0285996 -3.16 0.002 -.1465963 -.0344309

. margins, dydx(*)

Average marginal effects Number of obs = 2,380


Model VCE: Robust

Expression: Linear prediction, predict()


dy/dx wrt: PI 1.race

Delta-method
dy/dx std. err. t P>|t| [95% conf. interval]

PI .5591946 .0886663 6.31 0.000 .3853233 .7330658


1.race .1774282 .0249463 7.11 0.000 .1285096 .2263469
Note: dy/dx for factor levels is the discrete change from the base level.
Limitations of LPM
Based on these results, can we conclude that mortgage decisions
have a racial bias?
No
Many other factors can affect this decision, which are omitted
from the above models.
If these other factors are correlated with the independent
variables, then their omission from the model will cause an
omitted variable bias.
We will include some variables in the following models.

The Linear Probability Model also has the following limitations.



Limitations of LPM: Heteroskedasticity

Problem 1 with LPM: Heteroskedasticity

Let's get the residuals and graph them:

regress den PI, r
predict resid, resid
graph7 resid PI, ylab xlab yline(0)



Limitations of LPM: Heteroskedasticity

[Figure: residuals from the LPM regression of den on PI, plotted against the PI ratio]

This graph illustrates that for a given value of x (PI), there are only two possible values of the residual, indicating that the variance of the error term in the LPM is heteroskedastic.
Limitations of LPM: Heteroskedasticity
The OLS estimators are unbiased if the error term is uncorrelated with the explanatory variables.
However, the errors are heteroskedastic, since Var(y|x) = p(x)[1 − p(x)].
Thus, Var(u) can take on different values for different observations.
There will be heteroskedasticity in the LPM, except in the case where the probability does not depend on any of the independent variables.
The dependent variable takes on only 0 or 1 for given values of the independent variables. Thus, the error term (u) also takes on only two values:
When den (yi) = 1: ui = 1 − β0 − β1 PIi − β2 racei
When den (yi) = 0: ui = 0 − β0 − β1 PIi − β2 racei
Thus, the distribution of u has only two specific values.
Since these two values change with the explanatory variables, the error term cannot be assumed to be homoskedastic.
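The changing error variance can be illustrated with the den-on-PI estimates from the earlier output (a Python sketch; the two PI values are arbitrary):

```python
# Var(u | x) = p(x) * (1 - p(x)) in the LPM, so the error variance
# changes with the regressors. Using the simple den-on-PI estimates:
b0, b1 = -0.0799096, 0.6035349

def err_var(pi):
    p = b0 + b1 * pi      # p(x) = P(den = 1 | PI)
    return p * (1 - p)    # Bernoulli error variance at this PI

print(round(err_var(0.30), 4))  # 0.0909
print(round(err_var(0.60), 4))  # 0.2026 -- a different variance: heteroskedasticity
```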



Limitations of LPM: Heteroskedasticity
This does not cause a bias in the OLS estimates of βj (if the x variables are all exogenous), but we know that homoskedasticity is crucial for the test statistics.
The standard errors must be corrected for the heteroskedasticity.
We should use robust standard errors to calculate the test statistics.
Note also that since the distribution of u has only two values, normality does not hold, which can be an issue for the standard errors and test statistics.



Linear versus Non-Linear

Logit and Probit models are nonlinear and provide predicted probabilities between 0 and 1.
For comparison, the LPM predicted probabilities can be computed directly from the (rounded) estimates:

gen prob=-0.091+.559*PI+0.177*race
tab prob



Limitations of LPM: outside of the 0 and 1 range
Problem 2 with LPM: The range of the predicted probabilities can lie
outside of the 0 and 1 range
Now let's compute the predicted value of den
predict den_hat
Now let's graph them:
graph7 den_hat PI, ylab xlab yline(0,1)
As illustrated on the next slide, the predicted probabilities range from -.0799096 to 1.730695, although probabilities should lie between 0 and 1.

As we will explain later, the LPM can be a good alternative when there are few 1s in the dependent variable.
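The out-of-range predictions quoted above can be reproduced from the estimated line in a short Python sketch (PI = 3 is the value that reproduces the largest quoted prediction):

```python
# The LPM fitted line den_hat = b0 + b1 * PI is not bounded to [0, 1].
b0, b1 = -0.0799096, 0.6035349

p_low = b0 + b1 * 0.0   # smallest fitted value quoted above
p_high = b0 + b1 * 3.0  # PI = 3 reproduces the largest quoted value

print(round(p_low, 7))   # -0.0799096 : a negative "probability"
print(round(p_high, 6))  # 1.730695   : a "probability" above 1
```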



Limitations of LPM: outside of the 0 and 1 range
[Figure: predicted probabilities (den_hat) plotted against the PI ratio; the fitted line falls below 0 for small PI and rises above 1 for large PI]


Sources

Wooldridge (2009)
Stock and Watson (2005)
