3a - Relaxing the OLS Assumptions
ASSUMPTIONS
Internal and external validity
• A study is internally valid if its statistical inferences about causal effects are
valid for the population and setting being studied
• A study is externally valid if its statistical inferences can be generalized to
other populations and settings
• The population studied is the population of entities—people, companies, school
districts, and so forth—from which the sample was drawn
• The population to which the results are generalized, or the population of interest, is the
population of entities to which the causal inferences from the study are to be applied
• Example: A policy maker might want to generalize the findings on household income
and consumption in Maharashtra (the population studied) to the population of
households in India (the population of interest)
Internal and external validity
Threats to External Validity
• Differences in populations: Differences between the population studied and the
population of interest can pose a threat to external validity
• This could be because the population studied was chosen in a way that makes it different
from the population of interest, due to differences in characteristics, geographical
differences, or because the study is out of date
• Differences in settings: The results based on the population studied may not be
generalizable due to differences in settings arising out of cultural or institutional
differences in the two populations
Internal and external validity
Threats to Internal Validity
• Internal validity has two components:
• The estimator of the causal effect should be unbiased and consistent
• Hypothesis tests should have the desired significance level (the actual rejection
rate of the test under the null hypothesis should equal its desired significance
level), and confidence intervals should have the desired confidence level
• Threats to internal validity originate from the failures of one or more of the
least squares assumptions
Ordinary Least Squares: Assumptions
• Assumption 1: The regression model is linear in parameters, although it may not be
linear in the variables
• Assumption 2: The values taken by the regressor 𝑋 may be considered fixed in
repeated sampling
• In many real-world scenarios, the data are collected such that the independent variables
are random, or stochastic, in nature
• In such cases, we assume that the $X$ variables are independent of the error term:
$\mathrm{Cov}(X_i, u_i) = 0$
• Assumption 2 implies the absence of a linear association between 𝑋𝑖 and 𝑢𝑖
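A minimal simulation sketch of Assumption 2 (the data-generating process and all values are assumed purely for illustration): when the stochastic regressor is drawn independently of the error term, the OLS slope estimate is centred on the true coefficient.

```python
import numpy as np

# Illustrative check of Assumption 2: X stochastic but independent of u.
rng = np.random.default_rng(0)
beta0, beta1, n = 2.0, 0.5, 500

slopes = []
for _ in range(1000):
    X = rng.normal(size=n)            # stochastic regressor
    u = rng.normal(size=n)            # drawn independently of X, so Cov(X_i, u_i) = 0
    Y = beta0 + beta1 * X + u
    x, y = X - X.mean(), Y - Y.mean()
    slopes.append((x @ y) / (x @ x))  # OLS slope for this sample

print(np.mean(slopes))                # averages close to the true beta1 = 0.5
```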
• In the two-regressor model (with $x_{1i}$ and $x_{2i}$ in deviation form), the variance of $\hat{\beta}_1$ can be rewritten as

$\Rightarrow \mathrm{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum x_{1i}^2 - \frac{\left(\sum x_{1i} x_{2i}\right)^2}{\sum x_{2i}^2}}$

$\Rightarrow \mathrm{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum x_{1i}^2 - \sum x_{1i}^2 \frac{\left(\sum x_{1i} x_{2i}\right)^2}{\sum x_{1i}^2 \sum x_{2i}^2}}$

$\Rightarrow \mathrm{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum x_{1i}^2 \left(1 - r_{12}^2\right)}, \quad \text{where } r_{12}^2 = \frac{\left(\sum x_{1i} x_{2i}\right)^2}{\sum x_{1i}^2 \sum x_{2i}^2}$
• $\mathrm{var}(\hat{\beta}_1)$ increases with $r_{12}^2$, the squared correlation coefficient between $X_1$ and $X_2$: the stronger the collinearity, the larger the variance
Consequences of imperfect multicollinearity
• In a general $k$-variable regression model,

$\mathrm{var}(\hat{\beta}_j) = \frac{\sigma^2}{\sum x_j^2} \cdot \frac{1}{1 - R_j^2}$

$\mathrm{var}(\hat{\beta}_j) = \frac{\sigma^2}{\sum x_j^2} \, VIF_j$
• The estimates and the standard errors of the OLS estimators become very sensitive
to even the slightest changes in data
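As a numerical illustration of the formula above (the value $R_j^2 = 0.90$ is chosen purely for illustration):

$VIF_j = \frac{1}{1 - R_j^2} = \frac{1}{1 - 0.90} = 10, \qquad \mathrm{var}(\hat{\beta}_j) = \frac{\sigma^2}{\sum x_j^2} \cdot 10$

so the variance of $\hat{\beta}_j$ is ten times what it would be if $X_j$ were uncorrelated with the other regressors ($R_j^2 = 0$).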
Detection of multicollinearity
• High 𝑅2 but very few significant 𝑡 ratios: A classic symptom of multicollinearity is a
situation where the 𝑅2 is high but the individual slope coefficients are not significant
• Auxiliary regressions: Another way of detecting multicollinearity in a 𝑘-variable
regression is to regress each of the independent variables on the remaining regressors
• If the estimated 𝑅2 from the auxiliary regression exceeds 0.90, it can imply that
multicollinearity is an issue
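A minimal sketch of the auxiliary-regression check in Python with statsmodels; the data and variable names (x1, x2, x3) are simulated purely for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated regressors; x2 is constructed to be nearly collinear with x1.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
X = pd.DataFrame({"x1": x1,
                  "x2": 0.95 * x1 + 0.1 * rng.normal(size=n),
                  "x3": rng.normal(size=n)})

# Auxiliary regressions: regress each regressor on the remaining ones.
for col in X.columns:
    others = sm.add_constant(X.drop(columns=col))
    r2 = sm.OLS(X[col], others).fit().rsquared
    print(f"auxiliary R^2 for {col}: {r2:.3f}")  # values above 0.90 flag possible multicollinearity
```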
Detection of multicollinearity
• The tolerance and the Variance Inflating Factor (VIF) are useful tools for detecting
multicollinearity
• The VIF of a regressor $X_j$ is defined as

$VIF_j = \frac{1}{1 - R_j^2}$

where $R_j^2$ is the $R^2$ of the regression of $X_j$ on the remaining regressors
• As a rule of thumb, if the VIF of a variable exceeds 10, which will happen if 𝑅𝑗2 >
0.90, that variable is said to be highly collinear
• The tolerance of a regressor is defined as the inverse of the VIF:

$TOL_j = \frac{1}{VIF_j} = 1 - R_j^2$

• The lower the tolerance, the greater the degree of multicollinearity
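A short sketch computing VIF and tolerance with statsmodels' variance_inflation_factor; the simulated data mirror the previous sketch and are illustrative only:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated regressors with one nearly collinear pair (illustrative data).
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
X = pd.DataFrame({"x1": x1,
                  "x2": 0.95 * x1 + 0.1 * rng.normal(size=n),
                  "x3": rng.normal(size=n)})

Xc = sm.add_constant(X)   # include an intercept in each auxiliary regression
for i, col in enumerate(Xc.columns):
    if col == "const":
        continue
    vif = variance_inflation_factor(Xc.values, i)
    print(f"{col}: VIF = {vif:.2f}, tolerance = {1 / vif:.3f}")  # VIF > 10 flags high collinearity
```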
Detection of multicollinearity
• The condition number and the condition index are useful tools for detecting
multicollinearity
• The condition number $k$ is defined as

$k = \frac{\text{Maximum eigenvalue}}{\text{Minimum eigenvalue}}$

• As a rule of thumb, if the condition number lies between 100 and 1,000, there is moderate
to strong multicollinearity; if it exceeds 1,000, there is severe multicollinearity
• The condition index is defined as the square root of the condition number:

$\text{Condition Index} = \sqrt{\frac{\text{Maximum eigenvalue}}{\text{Minimum eigenvalue}}} = \sqrt{\text{Condition number}}$
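A minimal sketch of these eigenvalue-based diagnostics in Python; scaling each column to unit length before forming X'X is one common convention, assumed here, and the data are simulated for illustration:

```python
import numpy as np

# Simulated design matrix with one nearly collinear column (illustrative data).
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
X = np.column_stack([x1,
                     0.95 * x1 + 0.1 * rng.normal(size=n),
                     rng.normal(size=n)])

Xs = X / np.linalg.norm(X, axis=0)        # scale columns to unit length
eigvals = np.linalg.eigvalsh(Xs.T @ Xs)   # eigenvalues of the scaled cross-product matrix

k = eigvals.max() / eigvals.min()         # condition number
ci = np.sqrt(k)                           # condition index
print(f"condition number = {k:.1f}, condition index = {ci:.1f}")
```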
Violation of OLS Assumptions: Heteroscedasticity
• Under heteroscedasticity, the variance of the error term varies with the values of
the regressor

$E(u_i^2 \mid X_i) = \sigma_i^2$
Heteroscedasticity: Sources
• Cross-sectional data involve observations on heterogeneous units with varying
range/scale. Such data invariably result in heteroscedasticity
• Example: Data on households display a varying range of expenditures at different levels
of income. Households with low income exhibit a narrow range of expenses, whereas higher-
income households exhibit a larger variance of expenses due to their greater discretionary
incomes
• Example: Data on firms display a varying range of profits at different firm sizes.
Smaller firms exhibit a narrow range of profits, whereas larger firms exhibit a wider variance of
profits due to their higher R&D expenses, which are inherently risky
• Example: The number of typing errors and its variance decrease with the hours of
typing practice, as practice makes perfect
Heteroscedasticity: Sources
• Heteroscedasticity also arises due to the presence of outliers
• Heteroscedasticity may arise due to incorrect model specification
• Omission of a relevant variable results in the effect of this variable being subsumed in the
disturbance term. If this omitted variable is correlated with any other regressor, the
disturbance term will be systematically correlated with the regressor
• Specifying a linear functional form instead of a quadratic form results in the quadratic
term being subsumed in the disturbance leading to correlation between the regressor and
the error term
Heteroscedasticity: Consequences
• The OLS estimate of 𝛽1 under heteroscedasticity is identical to its estimate under
homoscedasticity
• The OLS estimator $\hat{\beta}_1$ under heteroscedasticity is still a linear, unbiased and consistent
estimator of $\beta_1$ (see the simulation sketch after this list)
• Under certain conditions, $\hat{\beta}_1$ is still asymptotically normally distributed under
heteroscedasticity
• The interpretation of the goodness-of-fit measures, $R^2$ and adjusted $R^2$, is also unaffected
by the presence of heteroscedasticity
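A minimal Monte Carlo sketch of the unbiasedness claim above; the data-generating process and all values are assumed purely for illustration:

```python
import numpy as np

# Simulate a model whose error variance grows with X (heteroscedasticity)
# and check that the OLS slope is still centred on the true value.
rng = np.random.default_rng(3)
beta0, beta1, n = 1.0, 2.0, 300

slopes = []
for _ in range(2000):
    X = rng.uniform(1, 10, size=n)
    u = rng.normal(scale=0.5 * X)         # error standard deviation increases with X
    Y = beta0 + beta1 * X + u
    x, y = X - X.mean(), Y - Y.mean()
    slopes.append((x @ y) / (x @ x))      # OLS slope for this sample

print(np.mean(slopes))                    # averages close to the true beta1 = 2.0
```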
Heteroscedasticity: Consequences
• The variance of $\hat{\beta}_1$ is different from its corresponding variance under homoscedasticity

Under homoscedasticity: $\mathrm{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum x_i^2}$

Under heteroscedasticity: $\mathrm{var}(\hat{\beta}_1) = \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2}$

• Under heteroscedasticity, $\hat{\beta}_1$ is no longer the best or minimum-variance estimator of $\beta_1$ in
the class of linear unbiased estimators

$\mathrm{var}(\hat{\beta}_1) = E\left(\sum k_i u_i\right)^2 = E\left(k_1 u_1 + k_2 u_2 + \cdots + k_n u_n\right)^2, \quad \text{where } k_i = \frac{x_i}{\sum x_i^2}$

With no correlation across the error terms, the cross terms vanish, so

$\mathrm{var}(\hat{\beta}_1) = \sum k_i^2 \sigma_i^2 = \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2}$
Heteroscedasticity: Detection
Informal methods of detecting heteroscedasticity
• Nature of the problem: In cross-sectional data involving heterogeneous units with a
large range, heteroscedasticity may be the rule rather than the exception
• Graphical method: We estimate the residuals $\hat{u}_i$ after running an OLS model assuming
homoscedasticity
• Plotting $\hat{u}_i^2$ against the fitted values $\hat{Y}_i$ provides evidence on the presence or absence of heteroscedasticity
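A minimal sketch of this graphical check using simulated data (all names and values are illustrative):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Simulate data whose error spread grows with X, fit OLS assuming
# homoscedasticity, then plot squared residuals against fitted values.
rng = np.random.default_rng(4)
n = 300
X = rng.uniform(1, 10, size=n)
Y = 1.0 + 2.0 * X + rng.normal(scale=0.5 * X)

res = sm.OLS(Y, sm.add_constant(X)).fit()
plt.scatter(res.fittedvalues, res.resid ** 2, s=10)
plt.xlabel("fitted values")
plt.ylabel("squared residuals")   # a fanning-out pattern suggests heteroscedasticity
plt.show()
```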
• A valid estimator of $\mathrm{var}(\hat{\beta}_1)$ under any form of heteroscedasticity: $\frac{\sum x_i^2 \hat{u}_i^2}{\left(\sum x_i^2\right)^2}$
• White’s heteroscedasticity consistent standard errors are valid only for large samples
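A short sketch of obtaining White's heteroscedasticity-consistent standard errors with statsmodels (the HC0 variant is used here; data are simulated for illustration):

```python
import numpy as np
import statsmodels.api as sm

# Simulated heteroscedastic data, then OLS with conventional and robust errors.
rng = np.random.default_rng(5)
n = 300
X = rng.uniform(1, 10, size=n)
Y = 1.0 + 2.0 * X + rng.normal(scale=0.5 * X)

Xc = sm.add_constant(X)
ols = sm.OLS(Y, Xc).fit()                   # conventional (homoscedasticity-based) standard errors
robust = sm.OLS(Y, Xc).fit(cov_type="HC0")  # White's heteroscedasticity-consistent standard errors

print("conventional SE:", ols.bse)
print("White (HC0) SE: ", robust.bse)
```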
Heteroscedasticity: Remedial measures
Log transformation
• The two-variable ($k = 2$) linear regression model: $Y_i = \beta_0 + \beta_1 X_i + u_i$
• A very useful remedy for heteroscedasticity is a log transformation of the model:
$\ln Y_i = \beta_0 + \beta_1 \ln X_i + u_i$
• The log transformation compresses the scale in which the variables are measured, thereby reducing heteroscedasticity
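A minimal sketch of the log-log refit with statsmodels; the data-generating process is assumed for illustration only:

```python
import numpy as np
import statsmodels.api as sm

# Simulate a positive-valued Y whose spread grows with X, then compare
# the linear-in-levels fit with the log-log fit.
rng = np.random.default_rng(6)
n = 300
X = rng.uniform(1, 10, size=n)
Y = np.exp(0.5 + 0.8 * np.log(X) + rng.normal(scale=0.3, size=n))

level_fit = sm.OLS(Y, sm.add_constant(X)).fit()                  # Y on X
log_fit = sm.OLS(np.log(Y), sm.add_constant(np.log(X))).fit()    # ln(Y) on ln(X)

print(level_fit.params)  # linear-in-levels coefficients
print(log_fit.params)    # the slope in the log-log model is interpreted as an elasticity
```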
THANK YOU