Chapter 7 - Linear Regression
• Managerial decisions are often based on the relationship between two or more variables
• Example: After considering the relationship between advertising expenditures and sales,
a marketing manager might attempt to predict sales for a given level of advertising
expenditures
• Sometimes a manager will rely on intuition to judge how two variables are related
• If data can be obtained, a statistical procedure called regression analysis can be used to develop
an equation showing how the variables are related
• Independent variables or predictor variables: Variables being used to predict the value of the
dependent variable
• Simple regression: A regression analysis involving one independent variable and one dependent
variable
• In statistical notation:
y = dependent variable
x = independent variable
• Linear regression: A regression analysis for which a one-unit change in the independent
variable, x, is assumed to result in the same change in the dependent variable, y (i.e., the
relationship is approximated by a straight line)
• Simple linear regression: A linear regression with a single independent variable
• Multiple linear regression: Regression analysis involving two or more independent variables
REGRESSION MODEL
y = β0 + β1x + ε, where β0 and β1 are the parameters of the model and ε is a random variable
called the error term
• The error term accounts for the variability in y that cannot be explained by the
linear relationship between x and y
Estimated simple linear regression equation: ŷ = b0 + b1x
• ŷ = point estimator of E(y|x), the mean value of y for a given value of x
• b0 = estimated y-intercept
• b1 = estimated slope
• The graph of the estimated simple linear regression equation is called the estimated
regression line
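As a minimal sketch of how the estimated equation is used, the snippet below computes a point estimate of the mean of y for a given x; the coefficient values are hypothetical, not taken from the chapter.

```python
# Hypothetical estimated coefficients (illustrative values only)
b0 = 1.27   # estimated y-intercept
b1 = 0.068  # estimated slope

def predict_mean_y(x):
    """Point estimate of E(y|x) from the estimated equation y_hat = b0 + b1*x."""
    return b0 + b1 * x

print(predict_mean_y(75))  # estimated mean of y when x = 75
```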
• Least squares method: A procedure for using sample data to find the estimated regression
equation
• The slope b1 is the estimated change in the mean of the dependent variable y that is
associated with a one-unit increase in the independent variable x
• The y-intercept b0 is the estimated value of the dependent variable y when the
independent variable x is equal to 0
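Below is a minimal sketch of the least squares computations, using the standard estimates b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1x̄; the sample data are made up for illustration, not the textbook's data.

```python
# Hypothetical sample data (illustrative values only)
x = [55, 60, 70, 80, 95, 100]
y = [4.1, 4.5, 5.0, 5.8, 6.6, 7.2]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates of the slope and intercept
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

print(f"estimated regression equation: y_hat = {b0:.3f} + {b1:.3f} x")
```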
• ith residual: The error made using the regression model to estimate the mean value of the
dependent variable for the ith observation
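A minimal sketch of computing a single residual, ei = yi − ŷi, with hypothetical numbers:

```python
# Hypothetical coefficients and one observation (illustrative values only)
b0, b1 = 1.27, 0.068
x_i, y_i = 80, 6.1

y_hat_i = b0 + b1 * x_i   # value estimated by the regression equation
e_i = y_i - y_hat_i       # ith residual: observed minus estimated
print(e_i)
```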
• Experimental region: The range of values of the independent variables in the data used to
estimate the model
• Extrapolation: Prediction of the value of the dependent variable outside the experimental
region
• Extrapolation is risky and should be avoided if possible: we have no empirical evidence that
the relationship we have found holds for values of x outside the range of x values in the data
used to estimate the relationship
• For Butler Trucking, this means a prediction of travel time for a driving distance of less
than 50 miles or greater than 100 miles is not reliable; because x = 0 lies outside the
experimental region, the estimate of β0 is meaningless for this model.
• However, if the experimental region for a regression problem includes zero, the y-
intercept will have a meaningful interpretation.
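One way to guard against extrapolation in practice is sketched below; the 50–100 mile bounds echo the Butler Trucking discussion above, while the coefficients and the warning mechanism are illustrative assumptions.

```python
import warnings

X_MIN, X_MAX = 50, 100  # experimental region: range of x in the data used to fit the model

def predict_travel_time(x, b0=1.27, b1=0.068):
    """Predict travel time; warn when x lies outside the experimental region (extrapolation).

    b0 and b1 are hypothetical fitted coefficients used only for illustration.
    """
    if not (X_MIN <= x <= X_MAX):
        warnings.warn(f"x = {x} is outside the experimental region [{X_MIN}, {X_MAX}]; "
                      "this prediction is an extrapolation.")
    return b0 + b1 * x

predict_travel_time(120)  # triggers the extrapolation warning
```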
Sums of Squares
• Sum of squares due to error (SSE): A measure of the error in using the estimated regression
equation to predict the values of the dependent variable in the sample
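A minimal sketch of computing SSE = Σ(yi − ŷi)² for a fitted equation, using the same made-up data as in the least squares sketch above:

```python
# Hypothetical sample data (illustrative values only)
x = [55, 60, 70, 80, 95, 100]
y = [4.1, 4.5, 5.0, 5.8, 6.6, 7.2]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

# SSE: squared differences between each observed y and the value predicted by the estimated equation
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
print(sse)
```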
Coefficient of Determination
• Coefficient of determination (r²): The ratio SSR/SST, used to evaluate the goodness of fit of
the estimated regression equation
• r² = SSR/SST, where SSR is the sum of squares due to regression and SST is the total sum
of squares
• r² takes values between zero and one
• It is interpreted as the proportion (often expressed as a percentage) of the total sum of
squares that can be explained by using the estimated regression equation
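A minimal sketch of computing r² from the sums of squares, using the standard identity SST = SSR + SSE and the same made-up data as above:

```python
# Hypothetical sample data and fit (illustrative values only)
x = [55, 60, 70, 80, 95, 100]
y = [4.1, 4.5, 5.0, 5.8, 6.6, 7.2]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

sst = sum((yi - y_bar) ** 2 for yi in y)                       # total sum of squares
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # sum of squares due to error
ssr = sst - sse                                                # sum of squares due to regression
r_squared = ssr / sst
print(f"r^2 = {r_squared:.3f}")  # proportion of SST explained by the estimated equation
```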
Multiple Regression Notation
• y = dependent variable
• ε = error term (accounts for the variability in y that cannot be explained by the linear
effect of the q independent variables)
• Model form: y = β0 + β1x1 + β2x2 + ⋯ + βqxq + ε
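For illustration only, the sketch below fits a multiple regression with q = 2 hypothetical independent variables using NumPy's least squares routine; neither the data nor the fitting code comes from the chapter.

```python
import numpy as np

# Hypothetical data: two independent variables and one dependent variable (illustrative values only)
x1 = np.array([55.0, 60.0, 70.0, 80.0, 95.0, 100.0])
x2 = np.array([2.0, 3.0, 3.0, 4.0, 4.0, 5.0])
y = np.array([4.1, 4.7, 5.1, 6.0, 6.6, 7.4])

# Design matrix: a column of ones for the intercept, then one column per independent variable
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares estimates b0, b1, b2 of beta0, beta1, beta2
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)
```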