MFIN 514:
Multiple Regression,
Violations of OLS
Assumptions
Dr. Ryan Ratcliff
SCHOOL OF BUSINESS
Multiple Regression Model
Consider the case of two (or more) regressors:
Yi = β0 + β1X1i + β2X2i + ei, i = 1,…,n
Y is the dependent variable
X1, X2 are the two independent variables (regressors)
(Yi, X1i, X2i) denote the ith observation on Y, X1, and X2.
β0 = intercept
β1 = effect on Y of a change in X1, holding X2 constant
β2 = effect on Y of a change in X2, holding X1 constant
ei = the regression error (omitted factors)
With a few exceptions, most of what you know about simple
regression will generalize to this case with multiple regressors
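As a sketch of how such a model is estimated, the two-regressor equation above can be fit by ordinary least squares in a few lines; the data and coefficient values below are simulated and purely illustrative:

```python
import numpy as np

# Simulate data for Yi = b0 + b1*X1i + b2*X2i + ei
# (all parameter values here are hypothetical, chosen for illustration)
rng = np.random.default_rng(0)
n = 1000
X1 = rng.normal(20, 2, n)
X2 = rng.normal(15, 5, n)
e = rng.normal(0, 1, n)
Y = 700.0 - 2.0 * X1 - 0.5 * X2 + e

# OLS: prepend a column of ones for the intercept, solve by least squares
X = np.column_stack([np.ones(n), X1, X2])
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
b0_hat, b1_hat, b2_hat = beta_hat
```

With enough data, the estimates land close to the simulated true values of 700, −2, and −0.5.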
Omitted Variable Bias
• In our test scores example, we
found that test scores were
negatively correlated with
higher student teacher ratio
(STR).
• Everything that might affect
test scores that’s not STR is in
the error term.
• OLS assumes that the error is
uncorrelated with STR. If
there is something in the error
term that’s correlated with
STR, our estimate of β will be
biased.
Omitted Variable Bias
• Districts with lower % Eng. Learners (PCT_EL) have higher test
scores AND lower STR (smaller classes)
• Do we find a negative correlation between STR and Test Scores
because STR is just a proxy for PCT_EL?
Omitted Variable Bias: Math
TESTSCRi = a + b(STRi)+ d(EL_PCTi)+ei
This is the "correct" regression that accounts for both variables, and the b, d
coefficients have the usual "holding all else constant" interpretation.
EL_PCTi = c + g(STRi)+ui (STR, EL_PCT are correlated)
TESTSCRi = a + b(STRi)+ d[c + g(STRi)+ui ]+ei
TESTSCRi = (a+dc)+ (b+dg)STRi + (dui +ei)
If we regress on STR alone, the estimated coefficient is actually (b+dg): a mix of
the coefficient we're trying to estimate (b) and an indirect effect that arises
because EL_PCT matters for TESTSCR (d) and STR is correlated with this
omitted variable (g). The dg term is called "omitted variable bias."
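The algebra above can be checked by simulation: the sketch below generates data from the "correct" two-variable model, then runs the short regression of TESTSCR on STR alone. All parameter values are hypothetical.

```python
import numpy as np

# "Correct" model: TESTSCR = a + b*STR + d*EL_PCT + e
# Omitted-variable link: EL_PCT = c + g*STR + u
a, b, d, c, g = 700.0, -1.0, -0.65, -10.0, 1.5   # hypothetical values
rng = np.random.default_rng(1)
n = 20_000
STR = rng.normal(20, 2, n)
EL_PCT = c + g * STR + rng.normal(0, 3, n)
TESTSCR = a + b * STR + d * EL_PCT + rng.normal(0, 5, n)

# Short regression of TESTSCR on STR alone
X_short = np.column_stack([np.ones(n), STR])
slope_short = np.linalg.lstsq(X_short, TESTSCR, rcond=None)[0][1]

# The short-regression slope converges to b + d*g, not b
print(slope_short, b + d * g)   # both close to -1.975
```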
Omitted Variable Bias: Intuition
TESTSCRi = a + b(STRi)+ d(EL_PCTi)+ei
EL_PCTi = c + g(STRi)+ui (STR, EL_PCT are correlated)
TESTSCRi = (a+dc)+ (b+dg)STRi + (dui +ei)
Our regression shows a negative coefficient on STR: bigger classes
predict lower test scores. However, if
1) big classes tend to have high shares of non-native speakers (g>0) AND
2) a high EL_PCT also predicts lower test scores (d<0), then
dg<0 will make our estimated coefficient in the STR regression look more
negative than the base effect, b – our estimate is biased. In some sense,
STR “gets credit” for some of EL_PCT’s negative effect on test scores.
Cures for Omitted Variable Bias
Three ways to overcome omitted variable bias
1. Run a controlled experiment in which treatment (STR) is randomly
assigned: then PctEL is still a determinant of TestScore, but PctEL
is uncorrelated with STR (g=0). (This solution to OV bias is often
infeasible.)
2. Adopt the “cross tabulation” approach, with finer gradations of STR
and PctEL – within each group, all classes have the same PctEL,
so we “control for PctEL” (Common in Finance)
3. Use a regression in which the omitted variable (PctEL) is no longer
omitted: include PctEL as an additional regressor in a multiple
regression.
CA Test Scores: Mult. Regression
-------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------+----------------------------------------------------------------
str | -2.279808 .5194892 -4.39 0.000 -3.300945 -1.258671
_cons | 698.933 10.36436 67.44 0.000 678.5602 719.3057
-------------------------------------------------------------------------
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -1.101296 .4328472 -2.54 0.011 -1.95213 -.2504616
pctel | -.6497768 .0310318 -20.94 0.000 -.710775 -.5887786
_cons | 686.0322 8.728224 78.60 0.000 668.8754 703.189
------------------------------------------------------------------------------
• The t-test for the significance of an individual coefficient is the
same as before…
• Compare these two regressions, and relate these results to our
previous discussion of Omitted Variable Bias.
Coef. Tests, Predictions Same as Before
t-statistic: t = (β̂ − β_H0) / SE(β̂); for H0: β = 0, t = β̂ / SE(β̂)
Conf. Interval: {β̂ ± 5% Crit Value × SE(β̂)}
Predicted Values:
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -1.101296 .4328472 -2.54 0.011 -1.95213 -.2504616
pctel | -.6497768 .0310318 -20.94 0.000 -.710775 -.5887786
_cons | 686.0322 8.728224 78.60 0.000 668.8754 703.189
------------------------------------------------------------------------------
TESTSCR Prediction = 686.03 – 1.10*STR – 0.6498 * PCTEL
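The prediction equation above translates directly into a small helper function (coefficients rounded as on the slide):

```python
# Fitted multiple regression from the Stata output above
def predict_testscr(str_val, pctel):
    return 686.03 - 1.10 * str_val - 0.6498 * pctel

# e.g. a hypothetical district with STR = 20 and PCTEL = 30
pred = predict_testscr(20, 30)
```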
N – k : Degrees of Freedom
reg testscr str pctel, robust;
Regression with robust standard errors Number of obs = 420
F( 2, 417) = 223.82
Prob > F = 0.0000
R-squared = 0.4264
Root MSE = 14.464
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -1.101296 .4328472 -2.54 0.011 -1.95213 -.2504616
pctel | -.6497768 .0310318 -20.94 0.000 -.710775 -.5887786
_cons | 686.0322 8.728224 78.60 0.000 668.8754 703.189
------------------------------------------------------------------------------
• A number of tests (esp. ANOVA) will require you to know Sample Size
(N), and # of regressors beyond constant (K)
“Here, N = 420 and K = 2. Several formulae (e.g. the F-stat) will contain a
“degrees of freedom” correction N – K – 1 (the –1 is for the constant).
N – k: SER and RMSE
As in regression with a single regressor, the Std.
Error of the Regression and the Root Mean-Sq.
Error are measures of the spread of the Ys around
the regression line (Std. Dev. of Errors):
SER = √[ (1/(n − k − 1)) Σᵢ₌₁ⁿ ûᵢ² ]   (Degrees of Freedom Correction)

RMSE = √[ (1/n) Σᵢ₌₁ⁿ ûᵢ² ]   (No Correction)
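A minimal numeric sketch of the two formulas, using made-up residuals from a hypothetical k = 2 regression:

```python
import numpy as np

# Illustrative residuals (hypothetical) from a regression with k = 2 regressors
resid = np.array([1.5, -2.0, 0.5, 3.0, -1.0, -2.5, 1.0, -0.5])
n, k = len(resid), 2

ser = np.sqrt(np.sum(resid**2) / (n - k - 1))   # degrees-of-freedom correction
rmse = np.sqrt(np.sum(resid**2) / n)            # no correction

# SER > RMSE because it divides by the smaller n - k - 1
```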
ANOVA: Same, except N - k
Source of Variation      Df          Sum of Squares   Mean Square
Regression (explained)   k           RSS              MSR = RSS/k
Error (unexplained)      n − k − 1   SSE              MSE = SSE/(n − k − 1)
Total                    n − 1       SST

R² = explained variation / total variation = RSS / SST

F = [RSS/k] / [SSE/(n − k − 1)] = MSR / MSE
R2 vs. Adjusted R2
Recall R2 = RSS / SST = 1 – SSE / SST.
This formulation has the annoying feature that R2 always
increases when we add variables to the regression, even if
they are insignificant.
Adj. R2 uses the N – k logic to apply a penalty to including
another variable.
Adj. R² = 1 − [(n − 1)/(n − k − 1)] × (SSE/SST) = 1 − [(n − 1)/(n − k − 1)] × (1 − R²)
If SSE doesn’t go down enough, the benefit of the new
variables does not exceed the cost, so Adj. R2 won’t increase.
CFA 2 Questions on Mult. Regression
Variable              Coefficient   Std. Error of Coefficient   t-statistic   p-value
Intercept              0.043        0.01159                      3.71         < 0.001
Ln(No. of Analysts)   −0.027        0.00466                     −5.80         < 0.001
Ln(Market Value)       0.006        0.00271                      2.21         0.028

             Degrees of Freedom   Sum of Squares   Mean Square
Regression    2                   0.103            0.051
Residual      194                 0.559            0.003
Total         196                 0.662
Dave Turner is a security analyst who is using regression analysis to determine how well two
factors explain returns for common stocks. The independent variables are the natural logarithm
of the number of analysts following the companies, Ln(no. of analysts), and the natural
logarithm of the market value of the companies, Ln(market value). The regression output
generated from a statistical program is given in the following tables. Each p-value corresponds
to a two-tail test.
Turner plans to use the result in the analysis of two investments. WLK Corp. has twelve analysts
following it and a market capitalization of $2.33 billion. NGR Corp. has two analysts following it
and a market capitalization of $47 million.
CFA 2 Questions on Mult. Regression
Variable              Coefficient   Std. Error of Coefficient   t-statistic   p-value
Intercept              0.043        0.01159                      3.71         < 0.001
Ln(No. of Analysts)   −0.027        0.00466                     −5.80         < 0.001
Ln(Market Value)       0.006        0.00271                      2.21         0.028

             Degrees of Freedom   Sum of Squares   Mean Square
Regression    2                   0.103            0.051
Residual      194                 0.559            0.003
Total         196                 0.662
The 95% confidence interval (use a t-stat of 1.96 for this question only) of the
estimated coefficient for the independent variable Ln(Market Value) is closest to:
A) 0.011 to 0.001
B) 0.014 to -0.009
C) -0.018 to -0.036
CFA 2 Questions on Mult. Regression
Variable              Coefficient   Std. Error of Coefficient   t-statistic   p-value
Intercept              0.043        0.01159                      3.71         < 0.001
Ln(No. of Analysts)   −0.027        0.00466                     −5.80         < 0.001
Ln(Market Value)       0.006        0.00271                      2.21         0.028

             Degrees of Freedom   Sum of Squares   Mean Square
Regression    2                   0.103            0.051
Residual      194                 0.559            0.003
Total         196                 0.662
NGR Corp. has two analysts following it and a market capitalization of $47
million. If the number of analysts on NGR Corp. were to double to 4, the change
in the forecast of NGR would be closest to:
A) −0.019.
B) −0.035.
C) −0.055.
CFA 2 Questions on Mult. Regression
Variable              Coefficient   Std. Error of Coefficient   t-statistic   p-value
Intercept              0.043        0.01159                      3.71         < 0.001
Ln(No. of Analysts)   −0.027        0.00466                     −5.80         < 0.001
Ln(Market Value)       0.006        0.00271                      2.21         0.028

             Degrees of Freedom   Sum of Squares   Mean Square
Regression    2                   0.103            0.051
Residual      194                 0.559            0.003
Total         196                 0.662
Based on an R² calculated from the information in Table 2, the analyst should
conclude that the number of analysts and ln(market value) of the firm explain:
A) 84.4% of the variation in returns.
B) 18.4% of the variation in returns.
C) 15.6% of the variation in returns.
Model Interpretation
Big picture, across model comments
Across all variations, STR has a statistically
significant, negative relationship with test
scores.
Which model seems best?
Both STR and PCT_EL change magnitudes
depending on the presence of other
controls. Model 3 has highest Adj. R2 /
lowest SER and seems to best address
omitted variable bias.
Econ. interp. / sig of coeffs
In Model 3, a 1 student increase in average
class size predicts a 1 point drop in test
scores. Since average class sizes vary by
more than 20 students across the sample,
these differences matter on an 800-point test.
Specification Tricks: Dummy Vars.
A dummy variable is a 0 or 1 variable that groups the data
into categories:
• ACTION = 1 if this was an action movie
• SEQUEL = 1 if this movie is a sequel
Specification Tricks: Dummy Vars.
To interpret the dummy, write out the pred. eqtn by category.
Overall, it’s
BOX = $5,672,516 + 236,527* BUDGET – 2,807,283*ACTION + …
For a new comedy (ACTION, SEQUEL, HORROR = 0), our prediction
is BOX = $5,672,516 + 236,527* BUDGET – 2,807,283*(0)
For a new action movie: BOX = $5,672,516 + 236,527* BUDGET –
2,807,283*(1) = $2,865,233 + 236,527* BUDGET
Specification Tricks: Dummy Vars.
For a new action movie: BOX = $5,672,516 + 236,527* BUDGET –
2,807,283*(1) = $2,865,233 + 236,527* BUDGET
When the dummy appears alone, it is an intercept shifter: the
prediction is box office will be lower for an action movie with the
same budget as a non-action movie (?!)
However, significance matters here: the weak t-stat (p-value 53%)
says that we can’t reject the null that box office for action movies is
no different than other movies.
Specification Tricks: Dummy Vars.
What if we had the idea that action movies generate more box office
per dollar of budget – a different slope?
By multiplying the variable by the appropriate dummy, we can model
this difference in slope:
BOX = a + c*ACTION + b*BUDGET + d*(BUDGET*ACTION) +…
Not Action (ACTION=0)
BOX = a + c*0 + b*BUDGET + d*(BUDGET*0) + … = a + b*BUDGET + …
Action Movie (ACTION=1)
BOX = a + c*1 + b*BUDGET + d*(BUDGET*1) = (a+c) + (b+d)*BUDGET …
d is difference in slope for action movies; c is the different intercept.
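A small sketch of the slope-shifter algebra; the a, b, c, d values below are hypothetical, not the movie regression's actual estimates:

```python
# BOX = a + c*ACTION + b*BUDGET + d*(BUDGET*ACTION), hypothetical coefficients
a, b, c, d = 5_000_000.0, 250_000.0, -2_000_000.0, 50_000.0

def predict_box(budget, action):
    return a + c * action + b * budget + d * budget * action

# Slope = change in prediction per one unit of BUDGET, by category
slope_action = predict_box(11, 1) - predict_box(10, 1)   # equals b + d
slope_other = predict_box(11, 0) - predict_box(10, 0)    # equals b
```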
Specification Tricks: Dummy Vars.
How do you interpret these results?
Specification Tricks: Logs
Lots of regression specifications use logs:
I. linear-log Yi = β0 + β1ln(Xi) + ui
II. log-linear ln(Yi) = β0 + β1Xi + ui
III. log-log ln(Yi) = β0 + β1ln(Xi) + ui
There are two main reasons to use logs:
1) It’s an easy cure for skewed/heteroskedastic data
2) Coefficients on logs give percentage changes
(elasticities)
Logs: Skew / Heteroskedasticity
Lots of size type variables in finance are very skewed, which will distort OLS.
Taking the log of data like this gives a more normal distribution.
[Histograms: Current Assets (heavily right-skewed density) vs. ln(Current Assets) (roughly bell-shaped density)]
Specification Tricks: Logs
I. linear-log Yi = β0 + β1ln(Xi) + ui
1% change in X → 0.01×β1 unit change in Y
II. log-linear ln(Yi) = β0 + β1Xi + ui
1 unit change in X → 100×β1 % change in Y
III. log-log ln(Yi) = β0 + β1ln(Xi) + ui
1% change in X → β1 % change in Y
Important: You can’t compare SER, R2, etc.
across a model of Y vs. ln(Y) – different units.
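The log-log (elasticity) interpretation can be verified numerically; b0 and b1 below are hypothetical:

```python
import math

# Log-log model: ln(Y) = b0 + b1*ln(X), so a 1% change in X
# should produce roughly a b1% change in Y
b0, b1 = 2.0, 0.8   # hypothetical coefficients

def y(x):
    return math.exp(b0 + b1 * math.log(x))

pct_change_y = (y(101.0) / y(100.0) - 1) * 100   # X rises by 1%
# pct_change_y comes out close to b1 = 0.8 (percent)
```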
Violations of Regression Assumptions
Regression Assumption Condition if Violated
Error term has constant Heteroskedasticity
variance.
Error terms are not Serial correlation
correlated with each other. (autocorrelation)
No exact linear relationship Multicollinearity
among “X” variables.
Define it, Explain its effect on OLS
Detect it, Correct for it
Heteroskedasticity
Type 1: Unconditional heteroskedasticity – doesn’t matter
Type 2: Conditional heteroskedasticity
• Related to independent variables (next slide)
• This IS a problem
• Impact: t-stats are usually artificially high
• Coefficient estimates: not affected; OLS standard errors: too small
• Standard error too low = t-stat too high; Type I errors
Conditional Heteroskedasticity
[Scatter plot of Y vs. X: residual variance is low at low X and high at high X]
Detection: Scatter diagrams can show when error
variance changes systematically with an X variable.
Conditional Heteroskedasticity
Breusch-Pagan test: Regress squared
residuals on “X” variables.
• Point: Test significance of resulting R2 (do the independent
variables explain a significant part of the variation in
squared residuals?)
• H0: No heteroskedasticity
• Chi-square test: BP = n × R²resid (with k df)
Name Drop: B-P Test detects heteroskedasticity
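A hand-rolled sketch of the Breusch-Pagan recipe on simulated data whose error spread deliberately grows with x (in practice a canned routine would be used):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(1, 10, n)
y = 3.0 + 2.0 * x + x * rng.normal(0, 1, n)   # error variance grows with x

# First-stage OLS of y on x, saving residuals
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# Auxiliary regression: squared residuals on x; BP = n * R^2
u2 = resid**2
fitted = X @ np.linalg.lstsq(X, u2, rcond=None)[0]
r2_aux = 1 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
bp = n * r2_aux   # compare to chi-square with k = 1 df (5% critical value 3.84)
```

Here BP comes out far above 3.84, so the test (correctly) rejects "no heteroskedasticity."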
Correcting for Heteroskedasticity
First Method: Use STATA “robust” standard
errors (Huber-White standard errors).
Result: Relative to OLS, standard errors
higher, t-stats lower, and conclusions more
accurate
Second Method: Use generalized least
squares, modifying the original equation to
eliminate heteroskedasticity (not on CFA 2).
Serial Correlation: Definition
Positive autocorrelation: Each error term is
positively correlated w/ previous error.
• Common in financial time series data; not as common for
cross-sectional data.
Same problems as heteroskedasticity
• OLS t-stats are too high (Type I errors)
• Again, false significance: coefficient estimates are not affected,
but OLS standard errors are too small
Serial Correlation: Detection
Residual Plots – clusters of +/- errors
Durbin-Watson statistic
DW ≅ 2(1 – ρ), where ρ is the correlation between consecutive errors
Three cases: No correlation, positive correlation, and
negative correlation
• No autocorrelation (ρ = 0)
• DW ≅ 2(1 – 0) = 2
• Positive serial correlation (ρ = 1)
• DW ≅ 2(1 – 1) = 0
• Negative serial correlation (ρ = –1)
• DW ≅ 2(1 – (– 1) ) = 4
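The DW statistic is simple to compute from residuals; a sketch with simulated errors illustrates two of the three cases:

```python
import numpy as np

# DW = sum of squared changes in residuals / sum of squared residuals
def durbin_watson(e):
    return np.sum(np.diff(e) ** 2) / np.sum(e**2)

rng = np.random.default_rng(3)
iid = rng.normal(0, 1, 5000)      # independent errors: DW near 2

# Strongly positively autocorrelated errors (AR(1), rho = 0.95): DW near 0
ar = np.zeros(5000)
for t in range(1, 5000):
    ar[t] = 0.95 * ar[t - 1] + rng.normal()
dw_iid = durbin_watson(iid)
dw_ar = durbin_watson(ar)
```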
Serial Correlation: Correction
Preferred method: Use HAC Std. Errors
• Hansen or Newey-West Heteroskedasticity and
Autocorrelation Consistent Std. Errors are bigger than OLS
errors, which offsets OLS's tendency to over-reject H0.
• Some gymnastics required to implement in STATA
Alternative: Quasi-Differencing
• Old school: transform data with an estimate of the
correlation between errors so that the new data is not
serially correlated.
Multicollinearity
Define: Two or more “X” variables are strongly
correlated with each other
Intuition: X1 and X2 strongly correlated: hard
to estimate effect of changing X1 when X2 is
held constant.
Effects:
• Inflates OLS SEs; reduces OLS t-stats; increases chance of
Type II “should reject but don’t” errors
• Point: t-stats artificially small so variables falsely look
unimportant
Multicollinearity: Detection & Correction
Observation 1: Significant F-stat (and high R2),
but all t-stats insignificant
Observation 2: High correlation between “X”
variables (more complicated for k>2)
Correction
• Omit one or more of the correlated “X” variables
Perfect Multicollinearity: Dummy Trap
Suppose your data can be perfectly sorted into 2 (or more)
categories: e.g. USD Alums and others
If you include a USD Alum and Not USD Alum dummy together,
then summing the Alum + Not variables will equal 1 for every
observation, which is identical to the constant – “perfect
multicollinearity”
Correction
• Estimate the constant and leave out one dummy: constant is
the intercept for the omitted dummy, other dummies are
deviations from that intercept.
• No constant and all the dummies: each dummy is an intercept.
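The trap shows up directly in the rank of the design matrix; a sketch with a hypothetical alum dummy:

```python
import numpy as np

# Hypothetical 0/1 dummy: USD alum or not
rng = np.random.default_rng(4)
alum = (rng.random(100) < 0.4).astype(float)
not_alum = 1.0 - alum   # alum + not_alum = 1 for every observation

X_trap = np.column_stack([np.ones(100), alum, not_alum])  # constant + both dummies
X_ok = np.column_stack([np.ones(100), alum])              # drop one dummy

rank_trap = np.linalg.matrix_rank(X_trap)   # 2, not 3: perfect multicollinearity
rank_ok = np.linalg.matrix_rank(X_ok)       # full column rank
```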
Summarizing Problems & Fixes
Violation      Conditional Heteroskedasticity        Serial Correlation            Multicollinearity
What is it?    Residual variance related to          Residuals are correlated      Two or more X's are correlated
               level of X's
Effect?        Type I errors (over-reject)           Type I errors                 Type II errors
Detection?     Breusch-Pagan chi-square test         Durbin-Watson test            Conflicting t and F statistics
Correction?    Use White-corrected standard errors   Hansen / Newey-West           Drop one of the correlated
                                                     standard errors               variables
Key Concepts for CFA 2
• Regression: Output, Hypo Test, Conf. Int.
• ANOVA table
• OLS Problems: Detect, Effects, Cures