3a - Relaxing the OLS Assumptions
ASSUMPTIONS
Internal and external validity
• A study is internally valid if its statistical inferences about causal effects are
valid for the population and setting being studied
• A study is externally valid if its statistical inferences can be generalized to
other populations and settings
• The population studied is the population of entities—people, companies, school
districts, and so forth—from which the sample was drawn
• The population to which the results are generalized, or the population of interest, is the
population of entities to which the causal inferences from the study are to be applied
• Example: A policy maker might want to generalize the findings on household income
and consumption in Maharashtra (the population studied) to the population of
households in India (the population of interest)
Internal and external validity
Threats to External Validity
• Differences in populations: Differences between the population studied and the
population of interest can pose a threat to external validity
• This could be because the population studied was chosen in a way that makes it different
from the population of interest, due to differences in characteristics, geographical
differences, or because the study is out of date
• Differences in settings: The results based on the population studied may not be
generalizable due to differences in settings arising out of cultural or institutional
differences in the two populations
Internal and external validity
Threats to Internal Validity
• Internal validity has two components:
• The estimator of the causal effect should be unbiased and consistent
• Hypothesis tests should have the desired significance level (the actual rejection
rate of the test under the null hypothesis should equal its desired significance
level), and confidence intervals should have the desired confidence level
• Threats to internal validity originate from the failures of one or more of the
least squares assumptions
Ordinary Least Squares: Assumptions
• Assumption 1: The regression model is linear in parameters, although it may not be
linear in the variables
• Assumption 2: The values taken by the regressor 𝑋 may be considered fixed in
repeated sampling
• In many real-world scenarios, the data are collected such that the independent variables
are random, or stochastic, in nature
• In such cases, we assume that the $X$ variables are independent of the error term:
$\mathrm{Cov}(X_i, u_i) = 0$
• Assumption 2 implies the absence of a linear association between 𝑋𝑖 and 𝑢𝑖
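A minimal simulation sketch of Assumption 2 (the data-generating process and all values are assumed purely for illustration): when the stochastic regressor is drawn independently of the error term, the OLS slope estimate is centred on the true coefficient.

```python
import numpy as np

# Illustrative check of Assumption 2: X stochastic but independent of u.
rng = np.random.default_rng(0)
beta0, beta1, n = 2.0, 0.5, 500

slopes = []
for _ in range(1000):
    X = rng.normal(size=n)            # stochastic regressor
    u = rng.normal(size=n)            # drawn independently of X, so Cov(X_i, u_i) = 0
    Y = beta0 + beta1 * X + u
    x, y = X - X.mean(), Y - Y.mean()
    slopes.append((x @ y) / (x @ x))  # OLS slope for this sample

print(np.mean(slopes))                # averages close to the true beta1 = 0.5
```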
• In the two-regressor model (with $x_{1i}$ and $x_{2i}$ in deviation form), the variance of $\hat{\beta}_1$ can be rewritten as

$\Rightarrow \mathrm{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum x_{1i}^2 - \frac{\left(\sum x_{1i} x_{2i}\right)^2}{\sum x_{2i}^2}}$

$\Rightarrow \mathrm{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum x_{1i}^2 - \sum x_{1i}^2 \frac{\left(\sum x_{1i} x_{2i}\right)^2}{\sum x_{1i}^2 \sum x_{2i}^2}}$

$\Rightarrow \mathrm{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum x_{1i}^2 \left(1 - r_{12}^2\right)}, \quad \text{where } r_{12}^2 = \frac{\left(\sum x_{1i} x_{2i}\right)^2}{\sum x_{1i}^2 \sum x_{2i}^2}$
• $\mathrm{var}(\hat{\beta}_1)$ increases with $r_{12}^2$, the squared correlation coefficient between $X_1$ and $X_2$: the stronger the collinearity, the larger the variance
Consequences of imperfect multicollinearity
• In a general $k$-variable regression model,

$\mathrm{var}(\hat{\beta}_j) = \frac{\sigma^2}{\sum x_j^2} \cdot \frac{1}{1 - R_j^2}$

$\mathrm{var}(\hat{\beta}_j) = \frac{\sigma^2}{\sum x_j^2} \, VIF_j$
• The estimates and the standard errors of the OLS estimators become very sensitive
to even the slightest changes in data
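As a numerical illustration of the formula above (the value $R_j^2 = 0.90$ is chosen purely for illustration):

$VIF_j = \frac{1}{1 - R_j^2} = \frac{1}{1 - 0.90} = 10, \qquad \mathrm{var}(\hat{\beta}_j) = \frac{\sigma^2}{\sum x_j^2} \cdot 10$

so the variance of $\hat{\beta}_j$ is ten times what it would be if $X_j$ were uncorrelated with the other regressors ($R_j^2 = 0$).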
Detection of multicollinearity
• High 𝑅2 but very few significant 𝑡 ratios: A classic symptom of multicollinearity is a
situation where the 𝑅2 is high but the individual slope coefficients are not significant
• Auxiliary regressions: Another way of detecting multicollinearity in a 𝑘-variable
regression is to regress each of the independent variables on the remaining regressors
• If the estimated 𝑅2 from the auxiliary regression exceeds 0.90, it can imply that
multicollinearity is an issue
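A minimal sketch of the auxiliary-regression check in Python with statsmodels; the data and variable names (x1, x2, x3) are simulated purely for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated regressors; x2 is constructed to be nearly collinear with x1.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
X = pd.DataFrame({"x1": x1,
                  "x2": 0.95 * x1 + 0.1 * rng.normal(size=n),
                  "x3": rng.normal(size=n)})

# Auxiliary regressions: regress each regressor on the remaining ones.
for col in X.columns:
    others = sm.add_constant(X.drop(columns=col))
    r2 = sm.OLS(X[col], others).fit().rsquared
    print(f"auxiliary R^2 for {col}: {r2:.3f}")  # values above 0.90 flag possible multicollinearity
```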
Detection of multicollinearity
• The tolerance and the Variance Inflating Factor (VIF) are useful tools for detecting
multicollinearity
• The VIF of a regressor $X_j$ is defined as

$VIF_j = \frac{1}{1 - R_j^2}$

where $R_j^2$ is the $R^2$ of the regression of $X_j$ on the remaining regressors
• As a rule of thumb, if the VIF of a variable exceeds 10, which will happen if 𝑅𝑗2 >
0.90, that variable is said to be highly collinear
• The tolerance of a regressor is defined as the inverse of the VIF:

$TOL_j = \frac{1}{VIF_j} = 1 - R_j^2$

• The lower the tolerance, the greater the degree of multicollinearity
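A short sketch computing VIF and tolerance with statsmodels' variance_inflation_factor; the simulated data mirror the previous sketch and are illustrative only:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated regressors with one nearly collinear pair (illustrative data).
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
X = pd.DataFrame({"x1": x1,
                  "x2": 0.95 * x1 + 0.1 * rng.normal(size=n),
                  "x3": rng.normal(size=n)})

Xc = sm.add_constant(X)   # include an intercept in each auxiliary regression
for i, col in enumerate(Xc.columns):
    if col == "const":
        continue
    vif = variance_inflation_factor(Xc.values, i)
    print(f"{col}: VIF = {vif:.2f}, tolerance = {1 / vif:.3f}")  # VIF > 10 flags high collinearity
```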
Detection of multicollinearity
• The condition number and the condition index are useful tools for detecting
multicollinearity
• The condition number $k$ is defined as

$k = \frac{\text{Maximum eigenvalue}}{\text{Minimum eigenvalue}}$

• As a rule of thumb, if the condition number lies between 100 and 1,000, there is moderate
to strong multicollinearity; if it exceeds 1,000, there is severe multicollinearity
• The condition index is defined as the square root of the condition number:

$\text{Condition Index} = \sqrt{\frac{\text{Maximum eigenvalue}}{\text{Minimum eigenvalue}}} = \sqrt{\text{Condition number}}$
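A minimal sketch of these eigenvalue-based diagnostics in Python; scaling each column to unit length before forming X'X is one common convention, assumed here, and the data are simulated for illustration:

```python
import numpy as np

# Simulated design matrix with one nearly collinear column (illustrative data).
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
X = np.column_stack([x1,
                     0.95 * x1 + 0.1 * rng.normal(size=n),
                     rng.normal(size=n)])

Xs = X / np.linalg.norm(X, axis=0)        # scale columns to unit length
eigvals = np.linalg.eigvalsh(Xs.T @ Xs)   # eigenvalues of the scaled cross-product matrix

k = eigvals.max() / eigvals.min()         # condition number
ci = np.sqrt(k)                           # condition index
print(f"condition number = {k:.1f}, condition index = {ci:.1f}")
```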
Violation of OLS Assumptions: Heteroscedasticity
• Under heteroscedasticity, the variance of the error term varies with the values of
the regressor

$E(u_i^2 \mid X_i) = \sigma_i^2$
Heteroscedasticity: Sources
• Cross-sectional data involve observations on heterogeneous units with varying
range/scale. Such data invariably result in heteroscedasticity
• Example: Data on households display a varying range of expenditures at different levels
of income. Households with low income exhibit a narrow range of expenses, whereas higher-
income households exhibit a larger variance of expenses due to their greater discretionary
incomes
• Example: Data on firms display a varying range of profits at different firm sizes.
Smaller firms exhibit a narrow range of profits, whereas larger firms exhibit a wider variance of
profits due to their higher R&D expenses, which are inherently risky
• Example: The number of typing errors and its variance decrease with the hours of
typing practice, as practice makes perfect
Heteroscedasticity: Sources
• Heteroscedasticity also arises due to the presence of outliers
• Heteroscedasticity may arise due to incorrect model specification
• Omission of a relevant variable results in the effect of this variable being subsumed in the
disturbance term. If this omitted variable is correlated with any other regressor, the
disturbance term will be systematically correlated with the regressor
• Specifying a linear functional form instead of a quadratic form results in the quadratic
term being subsumed in the disturbance leading to correlation between the regressor and
the error term
Heteroscedasticity: Consequences
• The OLS estimate of 𝛽1 under heteroscedasticity is identical to its estimate under
homoscedasticity
• The OLS estimator $\hat{\beta}_1$ under heteroscedasticity is still a linear, unbiased and consistent
estimator of $\beta_1$ (see the simulation sketch after this list)
• Under certain conditions, $\hat{\beta}_1$ is still asymptotically normally distributed under
heteroscedasticity
• The interpretation of the goodness-of-fit measures, $R^2$ and adjusted $R^2$, is also unaffected
by the presence of heteroscedasticity
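A minimal Monte Carlo sketch of the unbiasedness claim above; the data-generating process and all values are assumed purely for illustration:

```python
import numpy as np

# Simulate a model whose error variance grows with X (heteroscedasticity)
# and check that the OLS slope is still centred on the true value.
rng = np.random.default_rng(3)
beta0, beta1, n = 1.0, 2.0, 300

slopes = []
for _ in range(2000):
    X = rng.uniform(1, 10, size=n)
    u = rng.normal(scale=0.5 * X)         # error standard deviation increases with X
    Y = beta0 + beta1 * X + u
    x, y = X - X.mean(), Y - Y.mean()
    slopes.append((x @ y) / (x @ x))      # OLS slope for this sample

print(np.mean(slopes))                    # averages close to the true beta1 = 2.0
```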
Heteroscedasticity: Consequences
• The variance of $\hat{\beta}_1$ is different from its corresponding variance under homoscedasticity

Under homoscedasticity: $\mathrm{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum x_i^2}$

Under heteroscedasticity: $\mathrm{var}(\hat{\beta}_1) = \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2}$

• Under heteroscedasticity, $\hat{\beta}_1$ is no longer the best or minimum-variance estimator of $\beta_1$ in
the class of linear unbiased estimators

$\mathrm{var}(\hat{\beta}_1) = E\left(\sum k_i u_i\right)^2 = E\left(k_1 u_1 + k_2 u_2 + \cdots + k_n u_n\right)^2, \quad \text{where } k_i = \frac{x_i}{\sum x_i^2}$

With no correlation across the error terms, the cross terms vanish, so

$\mathrm{var}(\hat{\beta}_1) = \sum k_i^2 \sigma_i^2 = \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2}$
Heteroscedasticity: Detection
Informal methods of detecting heteroscedasticity
• Nature of the problem: In cross-sectional data involving heterogeneous units with a
large range, heteroscedasticity may be the rule rather than the exception
• Graphical method: We estimate the residuals $\hat{u}_i$ after running an OLS model assuming
homoscedasticity
• Plotting $\hat{u}_i^2$ against the fitted values $\hat{Y}_i$ provides evidence on the presence or absence of heteroscedasticity
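A minimal sketch of this graphical check using simulated data (all names and values are illustrative):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Simulate data whose error spread grows with X, fit OLS assuming
# homoscedasticity, then plot squared residuals against fitted values.
rng = np.random.default_rng(4)
n = 300
X = rng.uniform(1, 10, size=n)
Y = 1.0 + 2.0 * X + rng.normal(scale=0.5 * X)

res = sm.OLS(Y, sm.add_constant(X)).fit()
plt.scatter(res.fittedvalues, res.resid ** 2, s=10)
plt.xlabel("fitted values")
plt.ylabel("squared residuals")   # a fanning-out pattern suggests heteroscedasticity
plt.show()
```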
• A valid estimator of $\mathrm{var}(\hat{\beta}_1)$ under any form of heteroscedasticity: $\frac{\sum x_i^2 \hat{u}_i^2}{\left(\sum x_i^2\right)^2}$
• White’s heteroscedasticity consistent standard errors are valid only for large samples
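A short sketch of obtaining White's heteroscedasticity-consistent standard errors with statsmodels (the HC0 variant is used here; data are simulated for illustration):

```python
import numpy as np
import statsmodels.api as sm

# Simulated heteroscedastic data, then OLS with conventional and robust errors.
rng = np.random.default_rng(5)
n = 300
X = rng.uniform(1, 10, size=n)
Y = 1.0 + 2.0 * X + rng.normal(scale=0.5 * X)

Xc = sm.add_constant(X)
ols = sm.OLS(Y, Xc).fit()                   # conventional (homoscedasticity-based) standard errors
robust = sm.OLS(Y, Xc).fit(cov_type="HC0")  # White's heteroscedasticity-consistent standard errors

print("conventional SE:", ols.bse)
print("White (HC0) SE: ", robust.bse)
```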
Heteroscedasticity: Remedial measures
Log transformation
• The two-variable ($k = 2$) linear regression model: $Y_i = \beta_0 + \beta_1 X_i + u_i$
• A very useful remedy for heteroscedasticity is a log transformation of the model:
$\ln Y_i = \beta_0 + \beta_1 \ln X_i + u_i$
• The log transformation compresses the scale in which the variables are measured, thereby reducing heteroscedasticity
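A minimal sketch of the log-log refit with statsmodels; the data-generating process is assumed for illustration only:

```python
import numpy as np
import statsmodels.api as sm

# Simulate a positive-valued Y whose spread grows with X, then compare
# the linear-in-levels fit with the log-log fit.
rng = np.random.default_rng(6)
n = 300
X = rng.uniform(1, 10, size=n)
Y = np.exp(0.5 + 0.8 * np.log(X) + rng.normal(scale=0.3, size=n))

level_fit = sm.OLS(Y, sm.add_constant(X)).fit()                  # Y on X
log_fit = sm.OLS(np.log(Y), sm.add_constant(np.log(X))).fit()    # ln(Y) on ln(X)

print(level_fit.params)  # linear-in-levels coefficients
print(log_fit.params)    # the slope in the log-log model is interpreted as an elasticity
```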
THANK YOU