Chapter 15.1
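• Model: $Y_i = \beta_1 + \beta_2 X_i + u_i$
• where $Y_i = 1$ if a family owns a house and 0 otherwise, and $X_i$ is family income
• It is called a linear probability model (LPM) because Y is a binary variable, so its conditional expectation, which is linear in $X_i$, is a probability
• The conditional expectation of $Y_i$ given $X_i$, $E(Y_i \mid X_i)$, gives the probability that a family whose income is the given amount $X_i$ owns a house, i.e. $P(Y_i = 1 \mid X_i)$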
Justification of LPM
• $Y_i = \beta_1 + \beta_2 X_i + u_i$ (1)
• As with OLS, assume $E(u_i) = 0$ (to obtain unbiased estimators); therefore
• $E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$ (2)
• Also, let
– $P_i$ = probability that $Y_i = 1$ (the event occurs)
– $1 - P_i$ = probability that $Y_i = 0$ (the event does not occur)
• $Y_i$ therefore has the following distribution:

  $Y_i$    Probability
  0        $1 - P_i$
  1        $P_i$
  Total    1
• $Y_i$ follows the Bernoulli probability distribution. By the definition of mathematical expectation, we obtain:
• $E(Y_i) = 0(1 - P_i) + 1(P_i) = P_i$ (3)
• Equating equations (2) and (3) gives:
• $E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i = P_i$ (4)
• Thus the conditional expectation of model (1) can be interpreted as the conditional probability of $Y_i$, i.e. $P(Y_i = 1 \mid X_i)$
• If there are n independent trials, each with probability p of success and probability (1 – p) of failure, and X is the number of successes among these trials (here X is a generic count, not income), then X follows the binomial distribution with:
– mean = np
– variance = np(1 – p)
• A binomial random variable with parameter n = 1 is equivalent to a Bernoulli random variable
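The Bernoulli moments used in equation (3), and the claim that a binomial variable with n = 1 is a Bernoulli variable, can be checked directly from the definition of expectation. This is a minimal sketch; the value p = 3/10 is an arbitrary illustration, and `Fraction` is used only so the arithmetic is exact.

```python
from fractions import Fraction
from math import comb

def bernoulli_moments(p):
    """Mean and variance of a Bernoulli variable Y with P(Y = 1) = p, from the definitions."""
    # E(Y) = 0 * (1 - p) + 1 * p, as in equation (3)
    mean = 0 * (1 - p) + 1 * p
    # var(Y) = E[(Y - E(Y))^2] = (0 - p)^2 (1 - p) + (1 - p)^2 p = p(1 - p)
    var = (0 - mean) ** 2 * (1 - p) + (1 - mean) ** 2 * p
    return mean, var

def binomial_moments(n, p):
    """Mean and variance of a Binomial(n, p) variable, computed from its probability mass function."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    return mean, var

p = Fraction(3, 10)                 # illustrative probability of success
print(bernoulli_moments(p))         # (Fraction(3, 10), Fraction(21, 100))
print(binomial_moments(1, p))       # n = 1: identical to the Bernoulli moments
print(binomial_moments(5, p))       # (np, np(1 - p)) = (Fraction(3, 2), Fraction(21, 20))
```

The variance p(1 – p) computed here is the same expression that makes the LPM disturbances heteroscedastic.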
• $P_i$ must lie between 0 and 1, hence the restriction: $0 \le E(Y_i \mid X_i) \le 1$
• The LPM poses several problems, so OLS cannot simply be extended to regression models with a binary dependent variable:
– Non-normality of the disturbances $u_i$
– Heteroscedastic variances of the disturbances
– Nonfulfillment of $0 \le E(Y_i \mid X_i) \le 1$
– Questionable value of $R^2$ as a measure of goodness of fit
Non-normality of the disturbance
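• Although OLS does not require $u_i$ to be normally distributed, normality is assumed for the purpose of statistical inference
• Normality of $u_i$ is not tenable for LPMs: like $Y_i$, $u_i$ takes only 2 values, so it follows the Bernoulli distribution. Since $u_i = Y_i - \beta_1 - \beta_2 X_i$, when $Y_i = 1$ we have $u_i = 1 - \beta_1 - \beta_2 X_i$, and when $Y_i = 0$ we have $u_i = -\beta_1 - \beta_2 X_i$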
• Nonfulfillment of normality is not critical, because the OLS point estimates remain unbiased. Also, by the central limit theorem, OLS estimators generally tend to be normally distributed as the sample size increases indefinitely. Therefore, in large samples, statistical inference in the LPM can follow the usual OLS procedure under the normality assumption
Heteroscedastic variances of disturbances
• For a Bernoulli distribution, the theoretical mean = p and the theoretical variance = p(1 – p), where p = probability of success (the event occurring). The variance is therefore a function of the mean, and because $P_i = E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$ depends on $X_i$, the error variance $\mathrm{var}(u_i) = P_i(1 - P_i)$ is heteroscedastic
• In the presence of heteroscedasticity, OLS estimators, although unbiased, are not efficient: they don't have minimum variance
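• Solution: weighted least squares (WLS), adapted slightly for the LPM:
1. Run OLS despite the heteroscedasticity problem and obtain $\hat{Y}_i$, the estimate of the true $E(Y_i \mid X_i)$. Then obtain the estimate of $w_i = P_i(1 - P_i) = \mathrm{var}(u_i)$: $\hat{w}_i = \hat{Y}_i(1 - \hat{Y}_i)$
2. Use the estimated $\hat{w}_i$ to transform the data, and estimate the transformed equation by OLS:
$\frac{Y_i}{\sqrt{\hat{w}_i}} = \frac{\beta_1}{\sqrt{\hat{w}_i}} + \beta_2 \frac{X_i}{\sqrt{\hat{w}_i}} + \frac{u_i}{\sqrt{\hat{w}_i}}$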
• Alternatively, use White's heteroscedasticity-corrected standard errors if the sample is reasonably large
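The two WLS steps can be sketched with NumPy. This is an illustration under stated assumptions, not the course software: the data are hypothetical and simulated from an assumed true LPM with $P_i = 0.05 + 0.04X_i$, and the function names `ols` and `lpm_wls` are made up for this sketch. Observations with $\hat{w}_i \le 0$ (fitted values outside 0 to 1) are dropped, since their square root cannot be used as a weight.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients by least squares (X already contains any intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def lpm_wls(x, y):
    """Two-step WLS for the LPM; returns (coefficients, number of dropped observations)."""
    X = np.column_stack([np.ones_like(x), x])
    # Step 1: OLS despite heteroscedasticity, then w_hat_i = Y_hat_i (1 - Y_hat_i)
    y_hat = X @ ols(X, y)
    w_hat = y_hat * (1 - y_hat)
    keep = w_hat > 0                      # only fitted values strictly inside (0, 1) usable
    sqw = np.sqrt(w_hat[keep])
    # Step 2: divide Y, the intercept column and X by sqrt(w_hat) and rerun OLS.
    # No extra intercept is added: the 1/sqw column carries beta_1.
    Xs = X[keep] / sqw[:, None]
    ys = y[keep] / sqw
    return ols(Xs, ys), int((~keep).sum())

# Hypothetical data from an assumed true LPM (probabilities between 0.05 and 0.85)
rng = np.random.default_rng(0)
x = rng.uniform(0, 20, 2000)
p_true = 0.05 + 0.04 * x
y = (rng.uniform(size=x.size) < p_true).astype(float)

beta_wls, n_dropped = lpm_wls(x, y)
print(beta_wls, n_dropped)
```

Dividing every term by $\sqrt{\hat{w}_i}$ makes the transformed disturbance variance approximately $w_i / \hat{w}_i \approx 1$, which is why OLS on the transformed data is (approximately) efficient.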
Nonfulfillment of $0 \le E(Y_i \mid X_i) \le 1$
• There is no guarantee that $\hat{Y}_i$, the estimator of $E(Y_i \mid X_i)$, will fulfil this restriction, and this is the real problem with OLS estimation of the LPM: OLS doesn't take the restriction into account
• Ways of checking or ensuring that $\hat{Y}_i$ lies between 0 and 1:
– Estimate the LPM by the usual OLS and find out whether $\hat{Y}_i$ lies between 0 and 1. Values < 0 are assumed to be 0; values > 1 are assumed to be 1
– Devise an estimating technique that guarantees that the estimated conditional probabilities lie between 0 and 1. Logit and probit models guarantee this
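The first approach (truncating out-of-range fitted values) amounts to a simple clip. A minimal sketch with NumPy; the fitted values below are hypothetical, and the helper name `clip_probabilities` is made up for illustration.

```python
import numpy as np

def clip_probabilities(y_hat):
    """Treat fitted LPM values below 0 as 0 and above 1 as 1.

    Returns the truncated values and how many fitted values had to be changed.
    """
    y_hat = np.asarray(y_hat, dtype=float)
    outside = (y_hat < 0) | (y_hat > 1)
    return np.clip(y_hat, 0.0, 1.0), int(outside.sum())

# Hypothetical fitted values from an OLS-estimated LPM
fitted = [-0.15, 0.28, 0.64, 1.12]
probs, n_changed = clip_probabilities(fitted)
print(probs, n_changed)
```

This is an after-the-fact adjustment only; the estimated coefficients are unchanged, which is why a model that respects the 0 to 1 bounds by construction is the more satisfactory fix.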
Questionable value of $R^2$ as a measure of goodness of fit
• $R^2$ has limited value in binary response models
• Because Y is either 0 or 1 for a given X, all Y values lie either along the X-axis (where Y = 0) or along the horizontal line Y = 1
• Generally, no LPM (constrained or unconstrained) is expected to fit such a scatter well, so $R^2$ is likely to be much lower than 1 for such models (in general it lies between 0.2 and 0.6)
• $R^2$ exceeds 0.8 only if the actual scatter is very closely clustered around points A and B (fig c), so that the predicted Y values are very close to either 0 or 1
• It is better to avoid using $R^2$ in models with qualitative dependent variables
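A small simulation illustrates why $R^2$ is low for a binary Y unless the observations cluster near 0 and 1. Everything here is hypothetical: the data are simulated from an assumed S-shaped (logistic) probability, and the two steepness values are arbitrary. A gentle curve gives a diffuse scatter; a steep curve pushes the outcomes toward a clean 0/1 split and raises $R^2$.

```python
import numpy as np

def lpm_r2(x, y):
    """R^2 from an OLS fit of the LPM y = b1 + b2 x + u."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 5000)

def simulate(x, steepness):
    """Binary outcomes from an assumed logistic probability curve."""
    p = 1 / (1 + np.exp(-steepness * x))
    return (rng.uniform(size=x.size) < p).astype(float)

r2_diffuse = lpm_r2(x, simulate(x, 0.5))     # probabilities mostly well inside (0, 1)
r2_clustered = lpm_r2(x, simulate(x, 8.0))   # outcomes almost perfectly split at x = 0
print(round(r2_diffuse, 3), round(r2_clustered, 3))
```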
Example 15.1
• The intercept is the probability that a family with zero income will own a house. It is negative, but since a probability can't be negative, this value is treated as zero (which makes sense here)
• For a unit change in income, the probability of owning a house increases on average by 0.1021 (approximately 10 percentage points)
• We can also estimate the probability of owning a house at a particular level of income, e.g.:
– The probability that a family with an income of $12000 will own a house is approximately 28%
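• After re-estimating the model by WLS, the standard errors are smaller and the t ratios larger. Keep in mind that 12 observations were dropped: the transformation needs $\hat{w}_i = \hat{Y}_i(1 - \hat{Y}_i) > 0$, so observations with $\hat{Y}_i$ outside the interval (0, 1) cannot be used
• WLS regression specification, where sqw = $\sqrt{\hat{w}_i}$:
  y/sqw c(1)/sqw x/sqw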
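The example's arithmetic can be reproduced in a few lines. The slope 0.1021 is from the text; the intercept value −0.9457 is an assumption here (the text only says the intercept is negative), chosen because it is consistent with the reported 28% at $12000. Income X is likewise assumed to be measured in thousands of dollars, so $12000 corresponds to X = 12.

```python
def lpm_probability(x, intercept, slope):
    """Fitted LPM value intercept + slope * x, truncated to [0, 1] as in the example."""
    return min(max(intercept + slope * x, 0.0), 1.0)

SLOPE = 0.1021        # from the example: change in P per unit of income
INTERCEPT = -0.9457   # assumption: negative intercept consistent with the ~28% figure

print(lpm_probability(0, INTERCEPT, SLOPE))             # 0.0 (negative value treated as zero)
print(round(lpm_probability(12, INTERCEPT, SLOPE), 4))  # 0.2795, i.e. roughly 28%
```

The same function also shows the LPM's weakness: at X = 30 the fitted value exceeds 1 and has to be truncated.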
Alternatives to LPM
• The LPM is plagued by the problems discussed above, but its fundamental problem is that:
– It is logically not a very attractive model, because it assumes that the probability increases linearly with X: the marginal effect of X remains constant throughout, which doesn't always make sense
– E.g. in the homeownership example, as X increases by a unit, the probability of owning a house increases by the same constant amount of about 0.1. In reality, P is expected to be nonlinearly related to X:
– At a very low income, a family will not own a house
– At a sufficiently high level of income, e.g. X*, a family will most likely own a house
– Any increase in income beyond X* will have little effect on the probability of owning a house. At both ends of the income distribution, the probability of owning a house will be virtually unaffected by a small increase in X
• We need a probability model that has 2 features:
1. As $X_i$ increases, $P_i = P(Y_i = 1 \mid X_i)$ increases but never steps outside the 0–1 interval
2. The relationship between $P_i$ and $X_i$ is nonlinear: "one which approaches zero at slower and slower rates as $X_i$ gets small and approaches one at slower and slower rates as $X_i$ gets very large"
• Geometrically, such a model is an S-shaped curve: the probability lies between 0 and 1 and varies nonlinearly with X
• This S-shaped curve resembles the cumulative distribution function (CDF) of a random variable
• Each random variable has a unique CDF
• CDFs commonly chosen to represent the 0-1 response models are:
– Logistic (Logit) Model
– Normal (Probit/Normit) Model
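The contrast between a straight line and the two CDFs can be seen numerically. This sketch treats z as a generic index (for example $\beta_1 + \beta_2 X_i$; that mapping is an assumption here, since the logit and probit models themselves are developed later). The comparison line `lpm_line` is hypothetical, chosen to be tangent to the logistic curve at z = 0. Only the standard library `math` module is used; the normal CDF is written via the error function.

```python
import math

def logistic_cdf(z):
    """Logistic CDF 1 / (1 + e^(-z)), the basis of the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Standard normal CDF, the basis of the probit model, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lpm_line(z):
    """Hypothetical linear 'probability' with constant slope 0.25 (unbounded)."""
    return 0.5 + 0.25 * z

def slope(f, z, h=1e-6):
    """Central-difference slope: change in probability for a small change in z."""
    return (f(z + h) - f(z - h)) / (2 * h)

for z in (-6, -2, 0, 2, 6):
    print(z,
          round(lpm_line(z), 3), round(logistic_cdf(z), 3), round(normal_cdf(z), 3),
          round(slope(logistic_cdf, z), 4))
```

The line leaves the 0 to 1 interval at both ends, while both CDFs stay inside it and their slopes shrink toward zero in the tails, which is the nonlinear behaviour described in features 1 and 2.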
Tutorials this week
• Please go through the practical worksheet, memo and instructional video to
make sure that you can apply the work practically
• The tutors will cover the tutorial during the tutorial sessions this week. Make
sure to attend one of the sessions. The lecturing schedule contains all the
information of contact sessions for the module.