MFIN 514:
Multiple Regression,
Violations of OLS
Assumptions
Dr. Ryan Ratcliff
SCHOOL OF BUSINESS
Multiple Regression Model
Consider the case of two (or more) regressors:
Yi = β0 + β1X1i + β2X2i + ei, i = 1,…,n
Y is the dependent variable
X1, X2 are the two independent variables (regressors)
(Yi, X1i, X2i) denote the ith observation on Y, X1, and X2.
β0 = intercept
β1 = effect on Y of a change in X1, holding X2 constant
β2 = effect on Y of a change in X2, holding X1 constant
ei = the regression error (omitted factors)
With a few exceptions, most of what you know about simple
regression will generalize to this case with multiple regressors
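As a sketch of how such a model is estimated, the two-regressor equation above can be fit by ordinary least squares in a few lines; the data and coefficient values below are simulated and purely illustrative:

```python
import numpy as np

# Simulate data for Yi = b0 + b1*X1i + b2*X2i + ei
# (all parameter values here are hypothetical, chosen for illustration)
rng = np.random.default_rng(0)
n = 1000
X1 = rng.normal(20, 2, n)
X2 = rng.normal(15, 5, n)
e = rng.normal(0, 1, n)
Y = 700.0 - 2.0 * X1 - 0.5 * X2 + e

# OLS: prepend a column of ones for the intercept, solve by least squares
X = np.column_stack([np.ones(n), X1, X2])
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
b0_hat, b1_hat, b2_hat = beta_hat
```

With enough data, the estimates land close to the simulated true values of 700, −2, and −0.5.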
Omitted Variable Bias
• In our test scores example, we
found that test scores were
negatively correlated with
higher student teacher ratio
(STR).
• Everything that might affect
test scores that’s not STR is in
the error term.
• OLS assumes that the error is
uncorrelated with STR. If
there is something in the error
term that’s correlated with
STR, our estimate of β will be
biased.
Omitted Variable Bias
• Districts with lower % Eng. Learners (PCT_EL) have higher test
scores AND lower STR (smaller classes)
• Do we find a negative correlation between STR and Test Scores
because STR is just a proxy for PCT_EL?
Omitted Variable Bias: Math
TESTSCRi = a + b(STRi)+ d(EL_PCTi)+ei
This is the "correct" regression that accounts for both variables, and the b, d
coefficients have the usual "holding all else constant" interpretation.
EL_PCTi = c + g(STRi)+ui (STR, EL_PCT are correlated)
TESTSCRi = a + b(STRi)+ d[c + g(STRi)+ui ]+ei
TESTSCRi = (a+dc)+ (b+dg)STRi + (dui +ei)
If we regress on STR alone, the estimated coefficient is actually (b+dg): a mix of
the coefficient we're trying to estimate (b) and an indirect effect that arises
because EL_PCT matters for TESTSCR (d) and STR is correlated with this
omitted variable (g). The dg term is called "omitted variable bias."
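The algebra above can be checked by simulation: the sketch below generates data from the "correct" two-variable model, then runs the short regression of TESTSCR on STR alone. All parameter values are hypothetical.

```python
import numpy as np

# "Correct" model: TESTSCR = a + b*STR + d*EL_PCT + e
# Omitted-variable link: EL_PCT = c + g*STR + u
a, b, d, c, g = 700.0, -1.0, -0.65, -10.0, 1.5   # hypothetical values
rng = np.random.default_rng(1)
n = 20_000
STR = rng.normal(20, 2, n)
EL_PCT = c + g * STR + rng.normal(0, 3, n)
TESTSCR = a + b * STR + d * EL_PCT + rng.normal(0, 5, n)

# Short regression of TESTSCR on STR alone
X_short = np.column_stack([np.ones(n), STR])
slope_short = np.linalg.lstsq(X_short, TESTSCR, rcond=None)[0][1]

# The short-regression slope converges to b + d*g, not b
print(slope_short, b + d * g)   # both close to -1.975
```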
Omitted Variable Bias: Intuition
TESTSCRi = a + b(STRi)+ d(EL_PCTi)+ei
EL_PCTi = c + g(STRi)+ui (STR, EL_PCT are correlated)
TESTSCRi = (a+dc)+ (b+dg)STRi + (dui +ei)
Our regression shows a negative coefficient on STR: bigger classes
predict lower test scores. However, if
1) big classes tend to have high shares of non-native speakers (g>0) AND
2) a high EL_PCT also predicts lower test scores (d<0), then
dg<0 will make our estimated coefficient in the STR regression look more
negative than the base effect, b – our estimate is biased. In some sense,
STR “gets credit” for some of EL_PCT’s negative effect on test scores.
Cures for Omitted Variable Bias
Three ways to overcome omitted variable bias
1. Run a controlled experiment in which treatment (STR) is randomly
assigned: then PctEL is still a determinant of TestScore, but PctEL
is uncorrelated with STR (g=0). (This solution to OV bias is often
infeasible.)
2. Adopt the “cross tabulation” approach, with finer gradations of STR
and PctEL – within each group, all classes have the same PctEL,
so we “control for PctEL” (Common in Finance)
3. Use a regression in which the omitted variable (PctEL) is no longer
omitted: include PctEL as an additional regressor in a multiple
regression.
CA Test Scores: Mult. Regression
-------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------+----------------------------------------------------------------
str | -2.279808 .5194892 -4.39 0.000 -3.300945 -1.258671
_cons | 698.933 10.36436 67.44 0.000 678.5602 719.3057
-------------------------------------------------------------------------
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -1.101296 .4328472 -2.54 0.011 -1.95213 -.2504616
pctel | -.6497768 .0310318 -20.94 0.000 -.710775 -.5887786
_cons | 686.0322 8.728224 78.60 0.000 668.8754 703.189
------------------------------------------------------------------------------
• The t-test for the significance of an individual coefficient is the
same as before…
• Compare these two regressions, and relate these results to our
previous discussion of Omitted Variable Bias.
Coef. Tests, Predictions Same as Before
t-statistic: t = (β̂ − β_H0) / SE(β̂); for H0: β = 0, t = β̂ / SE(β̂)
Conf. Interval: {β̂ ± 5% Crit Value × SE(β̂)}
Predicted Values:
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -1.101296 .4328472 -2.54 0.011 -1.95213 -.2504616
pctel | -.6497768 .0310318 -20.94 0.000 -.710775 -.5887786
_cons | 686.0322 8.728224 78.60 0.000 668.8754 703.189
------------------------------------------------------------------------------
TESTSCR Prediction = 686.03 – 1.10*STR – 0.6498 * PCTEL
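The prediction equation above translates directly into a small helper function (coefficients rounded as on the slide):

```python
# Fitted multiple regression from the Stata output above
def predict_testscr(str_val, pctel):
    return 686.03 - 1.10 * str_val - 0.6498 * pctel

# e.g. a hypothetical district with STR = 20 and PCTEL = 30
pred = predict_testscr(20, 30)
```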
N – k : Degrees of Freedom
reg testscr str pctel, robust;
Regression with robust standard errors Number of obs = 420
F( 2, 417) = 223.82
Prob > F = 0.0000
R-squared = 0.4264
Root MSE = 14.464
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -1.101296 .4328472 -2.54 0.011 -1.95213 -.2504616
pctel | -.6497768 .0310318 -20.94 0.000 -.710775 -.5887786
_cons | 686.0322 8.728224 78.60 0.000 668.8754 703.189
------------------------------------------------------------------------------
• A number of tests (esp. ANOVA) will require you to know Sample Size
(N), and # of regressors beyond constant (K)
“Here, N = 420 and K = 2. Several formulae (e.g. the F-stat) will contain a
“degrees of freedom” correction N – K – 1 (the –1 is for the constant).
N – k: SER and RMSE
As in regression with a single regressor, the Std.
Error of the Regression and the Root Mean-Sq.
Error are measures of the spread of the Ys around
the regression line (Std. Dev. of Errors):
SER = √[ (1/(n − k − 1)) Σᵢ₌₁ⁿ ûᵢ² ]   (Degrees of Freedom Correction)

RMSE = √[ (1/n) Σᵢ₌₁ⁿ ûᵢ² ]   (No Correction)
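A minimal numeric sketch of the two formulas, using made-up residuals from a hypothetical k = 2 regression:

```python
import numpy as np

# Illustrative residuals (hypothetical) from a regression with k = 2 regressors
resid = np.array([1.5, -2.0, 0.5, 3.0, -1.0, -2.5, 1.0, -0.5])
n, k = len(resid), 2

ser = np.sqrt(np.sum(resid**2) / (n - k - 1))   # degrees-of-freedom correction
rmse = np.sqrt(np.sum(resid**2) / n)            # no correction

# SER > RMSE because it divides by the smaller n - k - 1
```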
ANOVA: Same, except N - k
Source of Variation      Df          Sum of Squares   Mean Square
Regression (explained)   k           RSS              MSR = RSS/k
Error (unexplained)      n − k − 1   SSE              MSE = SSE/(n − k − 1)
Total                    n − 1       SST

R² = explained variation / total variation = RSS / SST

F = [RSS/k] / [SSE/(n − k − 1)] = MSR / MSE
R2 vs. Adjusted R2
Recall R2 = RSS / SST = 1 – SSE / SST.
This formulation has the annoying feature that R2 always
increases when we add variables to the regression, even if
they are insignificant.
Adj. R2 uses the N – k logic to apply a penalty to including
another variable.
Adj. R² = 1 − [(n − 1)/(n − k − 1)] × (SSE/SST) = 1 − [(n − 1)/(n − k − 1)] × (1 − R²)
If SSE doesn’t go down enough, the benefit of the new
variables does not exceed the cost, so Adj. R2 won’t increase.
CFA 2 Questions on Mult. Regression
Variable              Coefficient   Std. Error of Coefficient   t-statistic   p-value
Intercept              0.043        0.01159                      3.71         < 0.001
Ln(No. of Analysts)   −0.027        0.00466                     −5.80         < 0.001
Ln(Market Value)       0.006        0.00271                      2.21         0.028

             Degrees of Freedom   Sum of Squares   Mean Square
Regression    2                   0.103            0.051
Residual      194                 0.559            0.003
Total         196                 0.662
Dave Turner is a security analyst who is using regression analysis to determine how well two
factors explain returns for common stocks. The independent variables are the natural logarithm
of the number of analysts following the companies, Ln(no. of analysts), and the natural
logarithm of the market value of the companies, Ln(market value). The regression output
generated from a statistical program is given in the following tables. Each p-value corresponds
to a two-tail test.
Turner plans to use the result in the analysis of two investments. WLK Corp. has twelve analysts
following it and a market capitalization of $2.33 billion. NGR Corp. has two analysts following it
and a market capitalization of $47 million.
CFA 2 Questions on Mult. Regression
Variable              Coefficient   Std. Error of Coefficient   t-statistic   p-value
Intercept              0.043        0.01159                      3.71         < 0.001
Ln(No. of Analysts)   −0.027        0.00466                     −5.80         < 0.001
Ln(Market Value)       0.006        0.00271                      2.21         0.028

             Degrees of Freedom   Sum of Squares   Mean Square
Regression    2                   0.103            0.051
Residual      194                 0.559            0.003
Total         196                 0.662
The 95% confidence interval (use a t-stat of 1.96 for this question only) of the
estimated coefficient for the independent variable Ln(Market Value) is closest to:
A) 0.011 to 0.001
B) 0.014 to -0.009
C) -0.018 to -0.036
CFA 2 Questions on Mult. Regression
Variable              Coefficient   Std. Error of Coefficient   t-statistic   p-value
Intercept              0.043        0.01159                      3.71         < 0.001
Ln(No. of Analysts)   −0.027        0.00466                     −5.80         < 0.001
Ln(Market Value)       0.006        0.00271                      2.21         0.028

             Degrees of Freedom   Sum of Squares   Mean Square
Regression    2                   0.103            0.051
Residual      194                 0.559            0.003
Total         196                 0.662
NGR Corp. has two analysts following it and a market capitalization of $47
million. If the number of analysts on NGR Corp. were to double to 4, the change
in the forecast of NGR would be closest to:
A) −0.019.
B) −0.035.
C) −0.055.
CFA 2 Questions on Mult. Regression
Variable              Coefficient   Std. Error of Coefficient   t-statistic   p-value
Intercept              0.043        0.01159                      3.71         < 0.001
Ln(No. of Analysts)   −0.027        0.00466                     −5.80         < 0.001
Ln(Market Value)       0.006        0.00271                      2.21         0.028

             Degrees of Freedom   Sum of Squares   Mean Square
Regression    2                   0.103            0.051
Residual      194                 0.559            0.003
Total         196                 0.662
Based on an R² calculated from the information in Table 2, the analyst should
conclude that the number of analysts and ln(market value) of the firm explain:
A) 84.4% of the variation in returns.
B) 18.4% of the variation in returns.
C) 15.6% of the variation in returns.
Model Interpretation
Big picture, across model comments
Across all variations, STR has a statistically
significant, negative relationship with test
scores.
Which model seems best?
Both STR and PCT_EL change magnitudes
depending on the presence of other
controls. Model 3 has highest Adj. R2 /
lowest SER and seems to best address
omitted variable bias.
Econ. interp. / sig of coeffs
In Model 3, a 1 student increase in average
class size predicts a 1 point drop in test
scores. Since average class sizes vary by
more than 20 students across the sample,
these differences matter on an 800-point test.
Specification Tricks: Dummy Vars.
A dummy variable is a 0 or 1 variable that groups the data
into categories:
• ACTION = 1 if this was an action movie
• SEQUEL = 1 if this movie is a sequel
Specification Tricks: Dummy Vars.
To interpret the dummy, write out the pred. eqtn by category.
Overall, it’s
BOX = $5,672,516 + 236,527* BUDGET – 2,807,283*ACTION + …
For a new comedy (ACTION, SEQUEL, HORROR = 0), our prediction
is BOX = $5,672,516 + 236,527* BUDGET – 2,807,283*(0)
For a new action movie: BOX = $5,672,516 + 236,527* BUDGET –
2,807,283*(1) = $2,865,233 + 236,527* BUDGET
Specification Tricks: Dummy Vars.
For a new action movie: BOX = $5,672,516 + 236,527* BUDGET –
2,807,283*(1) = $2,865,233 + 236,527* BUDGET
When the dummy appears alone, it is an intercept shifter: the
prediction is box office will be lower for an action movie with the
same budget as a non-action movie (?!)
However, significance matters here: the weak t-stat (p-value 53%)
says that we can’t reject the null that box office for action movies is
no different than other movies.
Specification Tricks: Dummy Vars.
What if we had the idea that action movies generate more box office
per dollar of budget – a different slope?
By multiplying the variable by the appropriate dummy, we can model
this difference in slope:
BOX = a + c*ACTION + b*BUDGET + d*(BUDGET*ACTION) +…
Not Action (ACTION=0)
BOX = a + c*0 + b*BUDGET + d*(BUDGET*0) + … = a + b*BUDGET + …
Action Movie (ACTION=1)
BOX = a + c*1 + b*BUDGET + d*(BUDGET*1) = (a+c) + (b+d)*BUDGET …
d is difference in slope for action movies; c is the different intercept.
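A small sketch of the slope-shifter algebra; the a, b, c, d values below are hypothetical, not the movie regression's actual estimates:

```python
# BOX = a + c*ACTION + b*BUDGET + d*(BUDGET*ACTION), hypothetical coefficients
a, b, c, d = 5_000_000.0, 250_000.0, -2_000_000.0, 50_000.0

def predict_box(budget, action):
    return a + c * action + b * budget + d * budget * action

# Slope = change in prediction per one unit of BUDGET, by category
slope_action = predict_box(11, 1) - predict_box(10, 1)   # equals b + d
slope_other = predict_box(11, 0) - predict_box(10, 0)    # equals b
```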
Specification Tricks: Dummy Vars.
How do you interpret these results?
Specification Tricks: Logs
Lots of regression specifications use logs:
I. linear-log Yi = β0 + β1ln(Xi) + ui
II. log-linear ln(Yi) = β0 + β1Xi + ui
III. log-log ln(Yi) = β0 + β1ln(Xi) + ui
There are two main reasons to use logs:
1) It’s an easy cure for skewed/heteroskedastic data
2) Coefficients on logs give percentage changes
(elasticities)
Logs: Skew / Heteroskedasticity
Lots of size type variables in finance are very skewed, which will distort OLS.
Taking the log of data like this gives a more normal distribution.
[Histograms: Current Assets (heavily right-skewed density) vs. ln(Current Assets) (roughly bell-shaped density)]
Specification Tricks: Logs
I. linear-log Yi = β0 + β1ln(Xi) + ui
1% change in X → 0.01×β1 unit change in Y
II. log-linear ln(Yi) = β0 + β1Xi + ui
1 unit change in X → 100×β1 % change in Y
III. log-log ln(Yi) = β0 + β1ln(Xi) + ui
1% change in X → β1 % change in Y
Important: You can’t compare SER, R2, etc.
across a model of Y vs. ln(Y) – different units.
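The log-log (elasticity) interpretation can be verified numerically; b0 and b1 below are hypothetical:

```python
import math

# Log-log model: ln(Y) = b0 + b1*ln(X), so a 1% change in X
# should produce roughly a b1% change in Y
b0, b1 = 2.0, 0.8   # hypothetical coefficients

def y(x):
    return math.exp(b0 + b1 * math.log(x))

pct_change_y = (y(101.0) / y(100.0) - 1) * 100   # X rises by 1%
# pct_change_y comes out close to b1 = 0.8 (percent)
```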
Violations of Regression Assumptions
Regression Assumption Condition if Violated
Error term has constant Heteroskedasticity
variance.
Error terms are not Serial correlation
correlated with each other. (autocorrelation)
No exact linear relationship Multicollinearity
among “X” variables.
Define it, Explain its effect on OLS
Detect it, Correct for it
Heteroskedasticity
Type 1: Unconditional heteroskedasticity – doesn’t matter
Type 2: Conditional heteroskedasticity
• Related to independent variables (next slide)
• This IS a problem
• Impact: t-stats are usually artificially high
• Coefficient estimates: not affected; OLS standard errors: too small
• Standard error too low = t-stat too high; Type I errors
Conditional Heteroskedasticity
[Scatter plot of Y vs. X: residual variance is low at low X and high at high X]
Detection: Scatter diagrams can show when error
variance changes systematically with an X variable.
Conditional Heteroskedasticity
Breusch-Pagan test: Regress squared
residuals on “X” variables.
• Point: Test significance of resulting R2 (do the independent
variables explain a significant part of the variation in
squared residuals?)
• H0: No heteroskedasticity
• Chi-square test: BP = n × R²resid (with k df)
Name Drop: B-P Test detects heteroskedasticity
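A hand-rolled sketch of the Breusch-Pagan recipe on simulated data whose error spread deliberately grows with x (in practice a canned routine would be used):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(1, 10, n)
y = 3.0 + 2.0 * x + x * rng.normal(0, 1, n)   # error variance grows with x

# First-stage OLS of y on x, saving residuals
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# Auxiliary regression: squared residuals on x; BP = n * R^2
u2 = resid**2
fitted = X @ np.linalg.lstsq(X, u2, rcond=None)[0]
r2_aux = 1 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
bp = n * r2_aux   # compare to chi-square with k = 1 df (5% critical value 3.84)
```

Here BP comes out far above 3.84, so the test (correctly) rejects "no heteroskedasticity."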
Correcting for Heteroskedasticity
First Method: Use STATA “robust” standard
errors (Huber-White standard errors).
Result: Relative to OLS, standard errors
higher, t-stats lower, and conclusions more
accurate
Second Method: Use generalized least
squares, modifying the original equation to
eliminate heteroskedasticity (not on CFA 2).
Serial Correlation: Definition
Positive autocorrelation: Each error term is
positively correlated w/ previous error.
• Common in financial time series data; not as common for
cross-sectional data.
Same problems as heteroskedasticity
• OLS t-stats are too high (Type I errors)
• Again, false significance: coefficient estimates are not affected,
but OLS standard errors are too small
Serial Correlation: Detection
Residual Plots – clusters of +/- errors
Durbin-Watson statistic
DW ≅ 2(1 – ρ), where ρ is the correlation between consecutive errors
Three cases: No correlation, positive correlation, and
negative correlation
• No autocorrelation (ρ = 0)
• DW ≅ 2(1 – 0) = 2
• Positive serial correlation (ρ = 1)
• DW ≅ 2(1 – 1) = 0
• Negative serial correlation (ρ = –1)
• DW ≅ 2(1 – (– 1) ) = 4
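The DW statistic is simple to compute from residuals; a sketch with simulated errors illustrates two of the three cases:

```python
import numpy as np

# DW = sum of squared changes in residuals / sum of squared residuals
def durbin_watson(e):
    return np.sum(np.diff(e) ** 2) / np.sum(e**2)

rng = np.random.default_rng(3)
iid = rng.normal(0, 1, 5000)      # independent errors: DW near 2

# Strongly positively autocorrelated errors (AR(1), rho = 0.95): DW near 0
ar = np.zeros(5000)
for t in range(1, 5000):
    ar[t] = 0.95 * ar[t - 1] + rng.normal()
dw_iid = durbin_watson(iid)
dw_ar = durbin_watson(ar)
```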
Serial Correlation: Correction
Preferred method: Use HAC Std. Errors
• Hansen or Newey-West Heteroskedasticity and
Autocorrelation Consistent Std. Errors are bigger than OLS
errors, which offsets OLS's tendency to over-reject H0.
• Some gymnastics required to implement in STATA
Alternative: Quasi-Differencing
• Old school: transform data with an estimate of the
correlation between errors so that the new data is not
serially correlated.
Multicollinearity
Define: Two or more “X” variables are strongly
correlated with each other
Intuition: X1 and X2 strongly correlated: hard
to estimate effect of changing X1 when X2 is
held constant.
Effects:
• Inflates OLS SEs; reduces OLS t-stats; increases chance of
Type II “should reject but don’t” errors
• Point: t-stats artificially small so variables falsely look
unimportant
Multicollinearity: Detection & Correction
Observation 1: Significant F-stat (and high R2),
but all t-stats insignificant
Observation 2: High correlation between “X”
variables (more complicated for k>2)
Correction
• Omit one or more of the correlated “X” variables
Perfect Multicollinearity: Dummy Trap
Suppose your data can be perfectly sorted into 2 (or more)
categories: e.g. USD Alums and others
If you include a USD Alum and Not USD Alum dummy together,
then summing the Alum + Not variables will equal 1 for every
observation, which is identical to the constant – “perfect
multicollinearity”
Correction
• Estimate the constant and leave out one dummy: constant is
the intercept for the omitted dummy, other dummies are
deviations from that intercept.
• No constant and all the dummies: each dummy is an intercept.
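The trap shows up directly in the rank of the design matrix; a sketch with a hypothetical alum dummy:

```python
import numpy as np

# Hypothetical 0/1 dummy: USD alum or not
rng = np.random.default_rng(4)
alum = (rng.random(100) < 0.4).astype(float)
not_alum = 1.0 - alum   # alum + not_alum = 1 for every observation

X_trap = np.column_stack([np.ones(100), alum, not_alum])  # constant + both dummies
X_ok = np.column_stack([np.ones(100), alum])              # drop one dummy

rank_trap = np.linalg.matrix_rank(X_trap)   # 2, not 3: perfect multicollinearity
rank_ok = np.linalg.matrix_rank(X_ok)       # full column rank
```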
Summarizing Problems & Fixes
Violation      Conditional Heteroskedasticity        Serial Correlation            Multicollinearity
What is it?    Residual variance related to          Residuals are correlated      Two or more X's are correlated
               level of X's
Effect?        Type I errors (over-reject)           Type I errors                 Type II errors
Detection?     Breusch-Pagan chi-square test         Durbin-Watson test            Conflicting t and F statistics
Correction?    Use White-corrected standard errors   Hansen / Newey-West           Drop one of the correlated
                                                     standard errors               variables
Key Concepts for CFA 2
• Regression: Output, Hypo Test, Conf. Int.
• ANOVA table
• OLS Problems: Detect, Effects, Cures