BA Module 5 Summary
© Copyright 2020 President and Fellows of Harvard College. All Rights Reserved.
do this by analyzing the regression’s residual plots and the p-values associated
with each independent variable’s coefficient.
• For multiple regression models, because it is difficult to view the data in a simple
scatter plot, residual plots are an indispensable tool for detecting whether the
linear model is a good fit.
o There is a residual plot for each independent variable included in the
regression model.
o We can graph a residual plot for each independent variable to help detect
patterns such as heteroskedasticity and nonlinearity.
o As with single variable regression models, if the underlying multiple
regression relationship is linear, the residuals follow a normal distribution
with a mean of zero and fixed variance.
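The idea of one residual plot per independent variable can be sketched outside Excel. The following is a minimal illustration in plain Python, not the course's method: the data, the variable names x1 and x2, and the helper functions fit_ols and residuals are all hypothetical. It fits a two-variable regression and lists the (x value, residual) pairs that each residual plot would display.

```python
# Hypothetical data: each row is (x1, x2); y is the dependent variable.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [6.5, 10.5, 4.2, 8.8, 2.1, 6.9]


def fit_ols(rows, y):
    """Return least-squares coefficients [intercept, b1, b2, ...]."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    # Augmented normal equations: (X'X | X'y)
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def residuals(rows, y, coefs):
    """Observed value minus predicted value for every observation."""
    preds = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], r)) for r in rows]
    return [yi - p for yi, p in zip(y, preds)]


coefs = fit_ols(rows, y)
res = residuals(rows, y, coefs)

# One residual "plot" per independent variable: (x value, residual) pairs.
for j, name in enumerate(["x1", "x2"]):
    pairs = [(r[j], round(e, 3)) for r, e in zip(rows, res)]
    print(name, pairs)
```

In each list of pairs, residuals scattered evenly around zero are consistent with a linear fit, while a curve or a widening spread would suggest nonlinearity or heteroskedasticity.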
• We should also analyze the p-values of the independent variables to determine
whether there is a significant relationship between the variables in the model. If
the p-value of an independent variable’s coefficient is less than 0.05, we can be
95% confident that there is a significant linear relationship between that
independent variable and the dependent variable.
• Multiple regression requires us to be aware of the possibility of multicollinearity
among the independent variables.
o Multicollinearity occurs when there is a strong linear relationship among
two or more of the independent variables.
o Indications of multicollinearity include seeing an independent variable’s
p-value increase when one or more other independent variables are
added to a regression model.
o We may be able to reduce multicollinearity by either increasing the sample
size or removing one (or more) of the collinear variables.
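A complementary check, not described in the summary above, is to look at the pairwise correlations among the independent variables before running the regression. The sketch below assumes hypothetical housing data (size, rooms, age) and an illustrative cutoff of 0.8. Neither the data nor the cutoff comes from the course.

```python
from itertools import combinations
from math import sqrt

# Hypothetical independent variables for six houses.
variables = {
    "size": [1200, 1500, 1700, 2000, 2400, 3000],  # square feet
    "rooms": [3, 4, 4, 5, 6, 7],
    "age": [30, 5, 22, 12, 40, 8],                 # years
}


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(ss_a * ss_b)


THRESHOLD = 0.8  # assumed cutoff for a "strong" linear relationship

# Collect every pair of independent variables whose correlation is strong.
flagged = [
    (name_a, name_b)
    for (name_a, a), (name_b, b) in combinations(variables.items(), 2)
    if abs(pearson(a, b)) > THRESHOLD
]
print(flagged)
```

A flagged pair is a candidate for the remedies listed above: collecting more data or dropping one of the two variables.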
• Dummy variables and lagged variables can be useful in regression models.
o Multiple regression models allow us to include multiple dummy variables
for categorical data—day of week, for example.
§ A dummy variable is equal to 1 when the variable of interest fits a
certain criterion. For example, a dummy variable for “Saturday”
would equal 1 for observations relating to Saturdays and 0 for
observations related to all other days.
§ The number of dummy variables we include must always be one
fewer than the number of options in a category.
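The one-fewer rule can be illustrated with day of week: seven options call for six dummy variables, and the omitted day becomes the baseline. This is a minimal sketch. The function name day_dummies and the choice of Monday as the baseline are assumptions made for illustration.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def day_dummies(day, baseline="Monday"):
    """Return 0/1 dummy variables for every day except the baseline."""
    if day not in DAYS:
        raise ValueError(f"unknown day: {day}")
    return {d: int(d == day) for d in DAYS if d != baseline}


saturday = day_dummies("Saturday")
monday = day_dummies("Monday")

print(saturday)  # six dummies, only "Saturday" set to 1
print(monday)    # six dummies, all 0: Monday is the baseline
```

An observation on the baseline day has all six dummies equal to 0, so each dummy's coefficient is read relative to that day.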
• We can also include lagged variables in multiple regression models. Lagged
values are used to capture the ongoing effects of a given variable.
o The lag period is based on managerial insight and data availability.
o Including lagged variables has some drawbacks:
§ Each lagged variable decreases our sample size by one
observation.
§ If the lagged variable does not increase the model’s explanatory
power, the addition of the variable decreases Adjusted R².
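Both drawbacks can be seen in a small sketch. It assumes a one-period lag, hypothetical monthly advertising figures, and the standard formula Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k the number of independent variables. The helper names add_lag and adjusted_r2 and all of the numbers are illustrative.

```python
def add_lag(values, lag=1):
    """Pair each value with the value `lag` periods earlier.

    The first `lag` observations have no earlier value, so they are dropped.
    """
    return [(values[t], values[t - lag]) for t in range(lag, len(values))]


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical monthly advertising spend: six months of data.
advertising = [10, 12, 9, 14, 13, 15]
lagged_rows = add_lag(advertising, lag=1)
print(len(advertising), "observations ->", len(lagged_rows), "usable rows")

# Assumed R-squared of 0.80 that does not improve when the lag is added:
before = adjusted_r2(0.80, n=30, k=2)  # two variables, 30 observations
after = adjusted_r2(0.80, n=29, k=3)   # lag added: one more variable, one fewer row
print(round(before, 3), round(after, 3))
```

Because the lag adds a variable and removes an observation while R² stays the same, the second Adjusted R² comes out lower than the first.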
EXCEL SUMMARY
Recall the Excel functions and analyses covered in this course and make sure to
familiarize yourself with all of the necessary steps, syntax, and arguments. We have
provided some additional information for the more complex functions listed below. As
usual, the arguments shown in square brackets are optional.