
Business Analytics

© 2021 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Linear Regression
Chapter 7

Introduction (Slide 1 of 2)
• Managerial decisions are often based on the relationship between
two or more variables:
• Example: After considering the relationship between advertising
expenditures and sales, a marketing manager might attempt to
predict sales for a given level of advertising expenditures.
• Sometimes a manager will rely on intuition to judge how two
variables are related.
• If data can be obtained, a statistical procedure called regression
analysis can be used to develop an equation showing how the
variables are related.

Introduction (Slide 2 of 2)
• Dependent variable or response: Variable being predicted.
• Independent variables or predictor variables: Variables being used
to predict the value of the dependent variable.
• Simple linear regression: A regression analysis for which any one
unit change in the independent variable, x, is assumed to result in
the same change in the dependent variable, y.
• Multiple linear regression: A regression analysis involving two or
more independent variables.

Simple Linear Regression Model
Regression Model
Estimated Regression Equation

Simple Linear Regression Model (Slide 1 of 5)
Regression Model:
• The equation that describes how y is related to x and an error term:
  y = β0 + β1x + ε
• Parameters: The characteristics of the population, β0 and β1.
• Random variable: The error term, ε.
• The error term accounts for the variability in y that cannot be explained by the linear relationship between x and y.

Simple Linear Regression Model (Slide 2 of 5)
Estimated Regression Equation:
The parameter values are usually not known and must be estimated using
sample data.
Sample statistics (denoted b0 and b1) are computed as estimates of the population parameters β0 and β1.
Substituting the values of the sample statistics b0 and b1 for β0 and β1 in the regression equation and dropping the error term, we obtain the estimated regression equation for simple linear regression.

Simple Linear Regression Model (Slide 3 of 5)
In the estimated simple linear regression equation:
ŷ = b0 + b1x
ŷ = estimate of the mean value of y corresponding to a given value of x.
b0 = estimated y-intercept.
b1 = estimated slope.

The graph of the estimated simple linear regression equation is called the estimated regression line.
In general, ŷ is the point estimator of E(y|x), the mean value of y for a given value of x.

Simple Linear Regression Model (Slide 4 of 5)
Figure 7.1: The Estimation Process
in Simple Linear Regression

Simple Linear Regression Model (Slide 5 of 5)
Figure 7.2: Possible Regression Lines in Simple Linear Regression

Least Squares Method
Least Squares Estimates of the Regression Parameters
Using Excel’s Chart Tools to Compute the Estimated Regression Equation

Least Squares Method (Slide 1 of 15)
Least squares method: A procedure for using sample data to find the
estimated regression equation.
Determine the values of b0 and b1.
Interpretation of b0 and b1:
• The slope b1 is the estimated change in the mean of the dependent variable y that is associated with a one-unit increase in the independent variable x.
• The y-intercept b0 is the estimated value of the dependent variable y when the independent variable x is equal to 0.

Least Squares Method (Slide 2 of 15)
Table 7.1: Miles Traveled and Travel Time for 10 Butler Trucking Company Driving Assignments

Driving Assignment i    x = Miles Traveled    y = Travel Time (hours)
 1    100    9.3
 2     50    4.8
 3    100    8.9
 4    100    6.5
 5     50    4.2
 6     80    6.2
 7     75    7.4
 8     65    6.0
 9     90    7.6
10     90    6.1

Least Squares Method (Slide 3 of 15)
Figure 7.3: Scatter Chart
of Miles Traveled and
Travel Time for Sample of
10 Butler Trucking
Company Driving
Assignments

Least Squares Method (Slide 4 of 15)

Least Squares Method (Slide 5 of 15)
ith residual: The error made using the regression model to estimate the mean value of the dependent variable for the ith observation.
Denoted as ei = yi − ŷi.
Hence, the least squares method finds the b0 and b1 that satisfy:
min Σ(yi − ŷi)² = min Σei²,  summing over i = 1 to n.

We are finding the regression that minimizes the sum of squared errors.

Least Squares Method (Slide 6 of 15)
Least Squares Estimates of the Regression Parameters:
• For the Butler Trucking Company data in Table 7.1:
  Estimated slope: b1 = 0.0678.
  Estimated y-intercept: b0 = 1.2739.
• The estimated simple linear regression equation:
  ŷ = 1.2739 + 0.0678x1

Least Squares Method (Slide 7 of 15)
Interpretation of b1:
If the length of a driving assignment were 1 unit (1 mile) longer, the
mean travel time for that driving assignment would be 0.0678 units
(0.0678 hours, or approximately 4 minutes) longer.
Interpretation of b0 :
If the driving distance for a driving assignment was 0 units (0 miles), the
mean travel time would be 1.2739 units (1.2739 hours, or approximately
76 minutes).

Least Squares Method (Slide 8 of 15)
• Experimental region: The range of values of the independent
variables in the data used to estimate the model.
• The regression model is valid only over this region.
• Extrapolation: Prediction of the value of the dependent variable
outside the experimental region.
• It is risky.

Least Squares Method (Slide 9 of 15)
Butler Trucking Company example: Use the estimated model and the
known values for miles traveled for a driving assignment (x) to estimate
mean travel time in hours.
• For example, the first driving assignment in Table 7.1 has a value for miles traveled of x = 100.
• The mean travel time in hours for this driving assignment is estimated to be:
  ŷ1 = 1.2739 + 0.0678(100) = 8.0539
• The resulting residual of the estimate is:
  e1 = y1 − ŷ1 = 9.3 − 8.0539 = 1.2461

Least Squares Method (Slide 10 of 15)
Table 7.2: Predicted Travel Time and Residuals for 10 Butler Trucking
Company Driving Assignments

Least Squares Method (Slide 11 of 15)
Figure 7.4: Scatter Chart
of Miles Traveled and
Travel Time for Butler
Trucking Company
Driving Assignments
with Regression Line
Superimposed

Least Squares Method (Slide 12 of 15)
Figure 7.5: A Geometric
Interpretation of the
Least Squares Method

Least Squares Method (Slide 13 of 15)
Using Excel’s Chart Tools to Compute the Estimated Regression
Equation:
• After constructing a scatter chart with Excel’s chart tools:
1. Right-click on any data point and select Add Trendline.
2. When the Format Trendline task pane appears:
• Select Linear in the Trendline Options area.
• Select Display Equation on chart in the Trendline Options area.

Least Squares Method (Slide 14 of 15)
Figure 7.6: Scatter Chart
and Estimated Regression
Line for Butler Trucking
Company

Least Squares Method (Slide 15 of 15)
Slope Equation:
b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²,  summing over i = 1 to n.
y-Intercept Equation:
b0 = ȳ − b1x̄
where
xi = value of the independent variable for the ith observation.
yi = value of the dependent variable for the ith observation.
x̄ = mean value for the independent variable.
ȳ = mean value for the dependent variable.
n = total number of observations.

Assessing the Fit of the Simple
Linear Regression Model
The Sums of Squares
The Coefficient of Determination
Using Excel’s Chart Tools to Compute the Coefficient of Determination

Assessing the Fit of the Simple Linear
Regression Model (Slide 1 of 10)
The Sums of Squares:
• Sum of squares due to error (SSE): A measure of the error in using the estimated regression equation to predict the values of the dependent variable in the sample:
  SSE = Σei²,  summing over i = 1 to n.
• From Table 7.2, SSE = 8.0288.

Assessing the Fit of the Simple Linear
Regression Model (Slide 2 of 10)
Figure 7.7: The
Sample Mean ȳ
as a Predictor of
Travel Time for
Butler Trucking
Company

Assessing the Fit of the Simple Linear
Regression Model (Slide 3 of 10)
Table 7.3 shows the sum of squared deviations obtained by using the sample mean ȳ = 6.7 to predict the value of travel time in hours for each driving assignment in the sample.

Butler Trucking example: For the ith driving assignment in the sample, the difference yi − ȳ provides a measure of the error involved in using ȳ to predict travel time for the ith driving assignment.

Assessing the Fit of the Simple Linear
Regression Model (Slide 4 of 10)
• The corresponding sum of squares is called the total sum of squares (SST):
  SST = Σ(yi − ȳ)²,  summing over i = 1 to n.

Assessing the Fit of the Simple Linear
Regression Model (Slide 5 of 10)
Table 7.3:
Calculations
for the Sum of
Squares Total
for the Butler
Trucking
Simple Linear
Regression

Assessing the Fit of the Simple Linear
Regression Model (Slide 6 of 10)
Figure 7.8:
Deviations about
the Estimated
Regression Line and
the Line ŷ = ȳ
for the Third Butler
Trucking Company
Driving Assignment

Assessing the Fit of the Simple Linear
Regression Model (Slide 7 of 10)

Sum of squares due to regression (SSR): Measures how much the ŷ values on the estimated regression line deviate from ȳ:
SSR = Σ(ŷi − ȳ)²,  summing over i = 1 to n.
Relation between SST, SSR, and SSE: SST = SSR + SSE
where
• SST = total sum of squares
• SSR = sum of squares due to regression
• SSE = sum of squares due to error.

Assessing the Fit of the Simple Linear
Regression Model (Slide 8 of 10)
The Coefficient of Determination:
• The ratio SSR/SST is used to evaluate the goodness of fit for the estimated regression equation; this ratio is called the coefficient of determination and is denoted by r².
• r² takes values between zero and one.
• It is interpreted as the percentage of the total sum of squares that can be explained by using the estimated regression equation.
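For the Butler Trucking simple regression, r² can be computed from the sums of squares. A Python sketch:

```python
# Coefficient of determination r^2 = SSR/SST for the Butler Trucking data.
miles = [100, 50, 100, 100, 50, 80, 75, 65, 90, 90]
hours = [9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6.0, 7.6, 6.1]
b0, b1 = 1.2739, 0.0678   # estimates reported in the slides

y_bar = sum(hours) / len(hours)
sst = sum((y - y_bar) ** 2 for y in hours)                         # total sum of squares
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(miles, hours))  # error sum of squares
ssr = sst - sse                                                    # regression sum of squares
r2 = ssr / sst
print(round(r2, 4))   # 0.6641
```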

Assessing the Fit of the Simple Linear
Regression Model (Slide 9 of 10)
Using Excel’s Chart Tools to Compute the Coefficient of Determination:
• To compute the coefficient of determination using the scatter chart in
Figure 7.3:
1. Right-click on any data point in the scatter chart and select Add
Trendline…
2. When the Format Trendline task pane appears:
• Select Display R-squared value on chart in the Trendline Options area.

Assessing the Fit of the Simple Linear
Regression Model (Slide 10 of 10)
Figure 7.9: Scatter Chart and Estimated Regression Line with Coefficient of Determination r² for Butler Trucking Company

The Multiple Regression Model
Regression Model
Estimated Multiple Regression Equation
Least Squares Method and Multiple Regression
Butler Trucking Company and Multiple Regression
Using Excel’s Regression Tool to Develop the Estimated Multiple Regression Equation

The Multiple Regression Model (Slide 1 of 11)
Regression Model:
y = β0 + β1x1 + β2x2 + · · · + βqxq + ε
y = dependent variable.
x1, x2, . . ., xq = independent variables.
β0, β1, β2, . . ., βq = parameters.
ε = error term (accounts for the variability in y that cannot be explained by the linear effect of the q independent variables).

The Multiple Regression Model (Slide 2 of 11)
Regression Model (cont.):
Interpretation of coefficient βj:
Represents the change in the mean value of the dependent variable y that corresponds to a one-unit increase in the independent variable xj, holding the values of all other independent variables in the model constant.
The multiple regression equation that describes how the mean value of y is related to x1, x2, . . ., xq:
E(y | x1, x2, . . ., xq) = β0 + β1x1 + β2x2 + · · · + βqxq
The Multiple Regression Model (Slide 3 of 11)
Estimated Multiple Regression Equation:
ŷ = b0 + b1x1 + b2x2 + · · · + bqxq

The Multiple Regression Model (Slide 4 of 11)
Least Squares Method and Multiple Regression:
The least squares method is used to develop the estimated multiple regression equation:
Finding b0, b1, b2, . . ., bq that satisfy min Σ(yi − ŷi)² = min Σei², summing over i = 1 to n.
Uses sample data to provide the values of b0, b1, b2, . . ., bq that minimize the sum of squared residuals.
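One way to obtain b0, b1, and b2 is to solve the normal equations (XᵀX)b = Xᵀy directly. The sketch below does this in pure Python; the deliveries column is hypothetical illustrative data, not the chapter's data set:

```python
# Least squares for multiple regression via the normal equations (X'X)b = X'y.
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

miles      = [100, 50, 100, 100, 50, 80, 75, 65, 90, 90]
deliveries = [4, 3, 4, 2, 2, 2, 3, 4, 3, 2]   # hypothetical values for illustration
hours      = [9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6.0, 7.6, 6.1]

X = [[1.0, m, d] for m, d in zip(miles, deliveries)]          # column of ones for b0
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * y for r, y in zip(X, hours)) for i in range(3)]
b0, b1, b2 = solve(XtX, Xty)
```

Because the model includes an intercept, the fitted residuals sum to (numerically) zero, which is a quick sanity check on the solution.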

The Multiple Regression Model (Slide 5 of 11)
Figure 7.10: The
Estimation Process for
Multiple Regression

The Multiple Regression Model (Slide 6 of 11)
Butler Trucking Company and Multiple Regression:
The estimated simple linear regression equation: ŷi = 1.2739 + 0.0678xi.
The linear effect of the number of miles traveled explains 66.41% (r² = 0.6641) of the variability in travel time in the sample data.
This implies that 33.59% of the variability in sample travel times remains unexplained.
The managers might want to consider adding one or more independent variables, such as number of deliveries, to the model to explain some of the remaining variability in the dependent variable.

The Multiple Regression Model (Slide 7 of 11)
Butler Trucking Company and Multiple Regression (cont.):
Estimated multiple linear regression with two independent variables:
ŷ = b0 + b1x1 + b2x2
ŷ = estimated mean travel time.
x1 = distance traveled.
x2 = number of deliveries.
The SST, SSR, SSE, and R² are computed using the formulas discussed earlier.

The Multiple Regression Model (Slide 8 of 11)
Figure 7.11: Data
Analysis Tools Box
Using Excel’s
Regression Tool to
Develop the
Estimated Multiple
Regression
Equation:

The Multiple Regression Model (Slide 9 of 11)
Figure 7.12: Regression
Dialog Box

The Multiple Regression Model (Slide 10 of 11)
Figure 7.13: Excel
Regression
Output for the
Butler Trucking
Company with
Miles and
Deliveries as
Independent
Variables

The Multiple Regression Model (Slide 11 of 11)
Figure 7.14: Graph of the
Regression Equation for
Multiple Regression
Analysis with Two
Independent Variables

Inference and Regression
Conditions Necessary for Valid Inference in the Least Squares Regression Model
Testing Individual Regression Parameters
Addressing Nonsignificant Independent Variables
Multicollinearity

Inference and Regression (Slide 1 of 17)
• Statistical inference: Process of making estimates and drawing conclusions
about one or more characteristics of a population (the value of one or more
parameters) through the analysis of sample data drawn from the population.
• In regression, inference is commonly used to estimate and draw conclusions about:
  • The regression parameters β0, β1, β2, . . ., βq.
  • The mean value and/or the predicted value of the dependent variable y for specific values of the independent variables x1*, x2*, . . ., xq*.
• Consider both hypothesis testing and interval estimation.

Inference and Regression (Slide 2 of 17)
Conditions Necessary for Valid Inference in the Least Squares
Regression Model:
For any given combination of values of the independent variables x1, x2, . . ., xq, the population of potential error terms ε is normally distributed with a mean of 0 and a constant variance.
The values of ε are statistically independent.

Inference and Regression (Slide 3 of 17)
Figure 7.15: Illustration of the
Conditions for Valid Inference in
Regression

Inference and Regression (Slide 4 of 17)
Figure 7.16: Example of a Random Error Pattern in a Scatter Chart of
Residuals and Predicted Values of the Dependent Variable

Inference and Regression (Slide 5 of 17)
Figure 7.17: Examples
of Diagnostic Scatter
Charts of Residuals
from Four Regressions

Inference and Regression (Slide 6 of 17)
Figure 7.18: Excel Residual Plots for the Butler Trucking Company
Multiple Regression

Inference and Regression (Slide 7 of 17)
Figure 7.19: Table of the First Several Predicted Values ŷ and Residuals e Generated by the Excel Regression Tool

Inference and Regression (Slide 8 of 17)
Figure 7.20: Scatter Chart of Predicted Values ŷ and Residuals e

Inference and Regression (Slide 9 of 17)
Testing Individual Regression Parameters:
To determine whether statistically significant relationships exist between the dependent variable y and each of the independent variables x1, x2, . . ., xq individually:
• If βj = 0, there is no linear relationship between the dependent variable y and the independent variable xj.
• If βj ≠ 0, there is a linear relationship between y and xj.

Inference and Regression (Slide 10 of 17)
Testing Individual Regression Parameters (cont.):
Use a t test to test the hypothesis that a regression parameter βj is zero.
The test statistic for this t test: t = bj / sbj
sbj = estimated standard deviation of bj.
As the magnitude of t increases (as t deviates from zero in either direction), we are more likely to reject the hypothesis that the regression parameter βj is zero.
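For the simple regression case, the t statistic for b1 can be computed with the standard-error formula s_b1 = s/√Σ(xi − x̄)², where s = √(SSE/(n − 2)). This formula is standard but not shown on these slides; a Python sketch for the Butler Trucking data:

```python
import math

# t statistic for the slope of the Butler Trucking simple regression.
miles = [100, 50, 100, 100, 50, 80, 75, 65, 90, 90]
hours = [9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6.0, 7.6, 6.1]
n = len(miles)

x_bar = sum(miles) / n
y_bar = sum(hours) / n
sxx = sum((x - x_bar) ** 2 for x in miles)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(miles, hours)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(miles, hours))
s = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s / math.sqrt(sxx)      # estimated standard deviation of b1
t = b1 / s_b1
print(round(t, 2))             # 3.98
```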

Inference and Regression (Slide 11 of 17)
Testing Individual Regression Parameters (cont.):
• A confidence interval can be used to test whether each of the regression parameters β0, β1, β2, . . ., βq is equal to zero.
• Confidence interval: An estimate of a population parameter that provides
an interval believed to contain the value of the parameter at some level of
confidence.
• Confidence level: Indicates how frequently interval estimates based on
samples of the same size taken from the same population using identical
sampling techniques will contain the true value of the parameter we are
estimating.
Inference and Regression (Slide 12 of 17)
Addressing Nonsignificant Independent Variables:
• If practical experience dictates that the nonsignificant independent
variable has a relationship with the dependent variable, the independent
variable should be left in the model.
• If the model sufficiently explains the dependent variable without the
nonsignificant independent variable, then consider rerunning the
regression without the nonsignificant independent variable.
• The appropriate treatment of the inclusion or exclusion of the y-intercept
when b0 is not statistically significant may require special consideration.

Inference and Regression (Slide 13 of 17)
Multicollinearity:
• Multicollinearity refers to the correlation among the independent
variables in multiple regression analysis.
• In t tests for the significance of individual parameters, the difficulty
caused by multicollinearity is that it is possible to conclude that a
parameter associated with one of the multicollinear independent
variables is not significantly different from zero when the independent
variable actually has a strong relationship with the dependent variable.
• This problem is avoided when there is little correlation among the
independent variables.

Inference and Regression (Slide 14 of 17)
Figure 7.21: Excel
Regression Output
for the Butler
Trucking Company
with Miles and
Gasoline
Consumption as
Independent
Variables

Inference and Regression (Slide 15 of 17)
Figure 7.22: Scatter
Chart of Miles and
Gasoline Consumed
for Butler Trucking
Company

Inference and Regression (Slide 16 of 17)
Multicollinearity (cont.):
• Testing for an overall regression relationship:
  • Use an F test based on the F probability distribution.
  • If the F test leads us to reject the hypothesis that β1, β2, . . ., βq are all zero:
    • Conclude that there is an overall regression relationship.
  • Otherwise, conclude that there is no overall regression relationship.

Inference and Regression (Slide 17 of 17)
Multicollinearity (cont.):
• Testing for an overall regression relationship (cont.):
• The test statistic generated by the sample data for this test is:
  F = (SSR/q) / (SSE/(n − q − 1))
• SSR = sum of squares due to regression.
• SSE = sum of squares due to error.
• q = the number of independent variables in the regression model.
• n = the number of observations in the sample.
• Larger values of F provide stronger evidence of an overall regression relationship.
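For the Butler Trucking simple regression (q = 1, n = 10), the F statistic can be computed directly from SSR and SSE. A Python sketch:

```python
# F statistic for the Butler Trucking simple regression (q = 1, n = 10).
miles = [100, 50, 100, 100, 50, 80, 75, 65, 90, 90]
hours = [9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6.0, 7.6, 6.1]
n, q = len(miles), 1

x_bar = sum(miles) / n
y_bar = sum(hours) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(miles, hours)) / \
     sum((x - x_bar) ** 2 for x in miles)
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(miles, hours))
sst = sum((y - y_bar) ** 2 for y in hours)
ssr = sst - sse
f_stat = (ssr / q) / (sse / (n - q - 1))
print(round(f_stat, 2))   # 15.81
```

With a single independent variable, this F value equals the square of the slope's t statistic.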

Categorical Independent
Variables
Butler Trucking Company and Rush Hour
Interpreting the Parameters
More Complex Categorical Variables

Categorical Independent Variables (Slide 1 of 10)
Butler Trucking Company and Rush Hour:
Dependent variable, y: Travel time.
Independent variables: miles traveled (x1) and number of deliveries (x2).
Categorical variable: rush hour (x3) = 0 if an assignment did not include travel on the congested segment of highway during afternoon rush hour.
Categorical variable: rush hour (x3) = 1 if an assignment included travel on the congested segment of highway during afternoon rush hour.

Categorical Independent Variables (Slide 2 of 10)
Figure 7.23: Histograms of the Residuals for Driving Assignments That
Included Travel on a Congested Segment of a Highway During the Afternoon
Rush Hour and Residuals for Driving Assignments That Did Not

Categorical Independent Variables (Slide 3 of 10)
Figure 7.24: Excel Data and Output for Butler Trucking with Miles Traveled (x1), Number of Deliveries (x2), and the Highway Rush Hour Dummy Variable (x3) as the Independent Variables
Categorical Independent Variables (Slide 4 of 10)
Interpreting the Parameters:
• The model estimates that travel time increases by:
• 0.0672 hours (about 4 minutes) for every increase of 1 mile traveled,
holding constant the number of deliveries and whether the driving
assignment route requires the driver to travel on the congested segment of
a highway during the afternoon rush hour period.
• 0.6735 hours (about 40 minutes) for every delivery, holding constant the
number of miles traveled and whether the driving assignment route
requires the driver to travel on the congested segment of a highway during
the afternoon rush hour period.

Categorical Independent Variables (Slide 5 of 10)
Interpreting the Parameters (cont.):
The model estimates that travel time increases by (cont.):
• 0.9980 hours (about 60 minutes) if the driving assignment route requires the
driver to travel on the congested segment of a highway during the afternoon
rush hour period, holding constant the number of miles traveled and the
number of deliveries.
R² = 0.8838 indicates that the regression model explains approximately
88.4% of the variability in travel time for the driving assignments in the
sample.

Categorical Independent Variables (Slide 6 of 10)
Interpreting the Parameters (cont.):
Compare the regression model for the case when x3 = 0 and when x3 = 1.
When x3 = 0:
ŷ = b0 + b1x1 + b2x2
When x3 = 1:
ŷ = b0 + b1x1 + b2x2 + b3 = (b0 + b3) + b1x1 + b2x2
Categorical Independent Variables (Slide 7 of 10)
More Complex Categorical Variables:
If a categorical variable has k levels, k − 1 dummy variables are required; each dummy variable corresponds to one of the levels of the categorical variable and is coded as 0 or 1.
• Example:
• Suppose a manufacturer of vending machines organized the sales territories
for a particular state into three regions: A, B, and C.
• The managers want to use regression analysis to help predict the number of
vending machines sold per week.
• Suppose the managers believe sales region is one of the important factors
in predicting the number of units sold.

Categorical Independent Variables (Slide 8 of 10)
More Complex Categorical Variables (cont.):
• Example (cont.):
Sales region: categorical variable with three levels (A, B, and C).
Number of dummy variables = 3 − 1 = 2.
Each variable can be coded 0 or 1 as:
x1 = 1 if sales Region B; 0 otherwise.
x2 = 1 if sales Region C; 0 otherwise.
The values of x1 and x2 for the three regions are:
Region A: x1 = 0, x2 = 0; Region B: x1 = 1, x2 = 0; Region C: x1 = 0, x2 = 1.
Categorical Independent Variables (Slide 9 of 10)
More Complex Categorical Variables (cont.):
• Example (cont.):
• The regression equation relating the mean number of units sold to the dummy
variables is written as:
ŷ = b0 + b1x1 + b2x2

• Observations corresponding to Region A are coded x1 = 0, x2 = 0,
so the estimated mean number of units sold in Region A is:
ŷ = b0 + b1(0) + b2(0) = b0

Categorical Independent Variables (Slide 10 of 10)
More Complex Categorical Variables (cont.):
• Example (cont.):
• Observations corresponding to Region B are coded x1 = 1, x2 = 0,
so the estimated mean number of units sold in Region B is:
ŷ = b0 + b1(1) + b2(0) = b0 + b1
• Observations corresponding to Region C are coded x1 = 0, x2 = 1,
so the estimated mean number of units sold in Region C is:
ŷ = b0 + b1(0) + b2(1) = b0 + b2

Modeling Nonlinear Relationships
Quadratic Regression Models
Piecewise Linear Regression Models
Interaction Between Independent Variables

Modeling Nonlinear Relationships (Slide 1 of 16)
Figure 7.25: Scatter Chart for the Reynolds Example

Modeling Nonlinear Relationships (Slide 2 of 16)
Figure 7.26: Excel Regression Output for the Reynolds Example

Modeling Nonlinear Relationships (Slide 3 of 16)
Figure 7.27: Scatter Chart of the Residuals and Predicted Values of the Dependent Variable for the Reynolds Simple Linear Regression

Modeling Nonlinear Relationships (Slide 4 of 16)
Quadratic Regression Models:
• In the Reynolds example, to account for the curvilinear relationship between months employed and scales sold, we could include the square of the number of months the salesperson has been employed as a second independent variable.
• Equation (7.18) corresponds to a quadratic regression model:
ŷ = b0 + b1x1 + b2x1²
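As a quick illustration (synthetic data, not the Reynolds file), adding the squared term as one more column of the design matrix lets ordinary least squares recover an exactly quadratic relationship; the model remains linear in its parameters:

```python
import numpy as np

# Synthetic months-employed values and an exactly quadratic response:
# y = 10 + 4x - 0.03x^2 (no noise, so the fit recovers it exactly).
months = np.array([5.0, 20.0, 40.0, 60.0, 80.0, 100.0])
sales = 10 + 4 * months - 0.03 * months**2

# The squared term is just another independent variable, so the model
# can still be estimated with ordinary least squares.
X = np.column_stack([np.ones_like(months), months, months**2])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
# b is approximately [10, 4, -0.03]
```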

Modeling Nonlinear Relationships (Slide 5 of 16)
Figure 7.28: Relationships That Can Be Fit with a Quadratic Regression Model

Modeling Nonlinear Relationships (Slide 6 of 16)
Figure 7.29: Excel Data for the Reynolds Quadratic Regression Model

Modeling Nonlinear Relationships (Slide 7 of 16)
Figure 7.30: Excel Output for the Reynolds Quadratic Regression Model

Modeling Nonlinear Relationships (Slide 8 of 16)
Figure 7.31: Scatter Chart of the Residuals and Predicted Values of the
Dependent Variable for the Reynolds Quadratic Regression Model

Modeling Nonlinear Relationships (Slide 9 of 16)
Piecewise Linear Regression Models:
• For the Reynolds data, as an alternative to a quadratic regression model:
• Recognize that below some value of Months Employed, the relationship between Months Employed and Sales appears to be positive and linear, whereas for the remaining observations it appears to be negative and linear.
• Piecewise linear regression model: This model will allow us to fit these
relationships as two linear regressions that are joined at the value of
Months at which the relationship between Months Employed and Sales
changes.

Modeling Nonlinear Relationships (Slide 10 of 16)
Piecewise Linear Regression Models (cont.):
• Knot: The value of the independent variable at which the relationship
between dependent variable and independent variable changes; also
called breakpoint.
• For the Reynolds data, the knot is the value of the independent variable Months Employed at which the relationship between Months Employed and Sales changes.

Modeling Nonlinear Relationships (Slide 11 of 16)
Figure 7.32: Possible Position of Knot x(k)

Modeling Nonlinear Relationships (Slide 12 of 16)
Piecewise Linear Regression Models (cont.):
• Define a dummy variable:
x1 = Months.
x(k) = value of the knot (90 months for the Reynolds example).
xk = the knot dummy variable: xk = 0 if x1 ≤ x(k), and xk = 1 if x1 > x(k).
• Then fit the following estimated regression equation:
ŷ = b0 + b1x1 + b2(x1 − x(k))xk
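A minimal sketch of this construction, with the knot placed at 90 months as in the Reynolds example but using made-up data generated from an exactly piecewise-linear rule:

```python
import numpy as np

# Knot at x(k) = 90 months; synthetic sales with slope +2 below the
# knot and slope -1 above it.
xk = 90.0
months = np.array([10.0, 30.0, 50.0, 70.0, 90.0, 100.0, 110.0, 120.0])
sales = np.where(months <= xk,
                 5 + 2 * months,
                 5 + 2 * xk - (months - xk))

# Knot dummy: 0 at or below the knot, 1 above it.
knot_dummy = (months > xk).astype(float)
X = np.column_stack([np.ones_like(months), months,
                     (months - xk) * knot_dummy])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
# b = [5, 2, -3]: the slope is b1 = 2 before the knot and
# b1 + b2 = -1 after it, and the two pieces join at the knot.
```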
Modeling Nonlinear Relationships (Slide 13 of 16)
Figure 7.33: Data and Excel Output for the Reynolds Piecewise Linear Regression Model

Modeling Nonlinear Relationships (Slide 14 of 16)
Interaction Between Independent Variables:
• Interaction: occurs when the relationship between the dependent variable and one independent variable differs across the values of a second independent variable.
• The estimated multiple linear regression equation with an interaction term is:
ŷ = b0 + b1x1 + b2x2 + b3x1x2
Modeling Nonlinear Relationships (Slide 15 of 16)
Figure 7.34: Mean Unit Sales (1,000s) as a Function of Selling Price and Advertising Expenditures

Modeling Nonlinear Relationships (Slide 16 of 16)
Figure 7.35: Excel Output for the Tyler Personal Care Linear Regression Model with Interaction

Model Fitting
Variable Selection Procedures
Overfitting

Model Fitting (Slide 1 of 10)
Variable Selection Procedures:
• Special procedures are sometimes employed to select the independent
variables to include in the regression model.
• Iterative procedures: At each step of the procedure, a single independent
variable is added or removed and the new model is evaluated. Iterative
procedures include:
• Backward elimination.
• Forward selection.
• Stepwise selection.
• Best subsets procedure: Evaluates regression models involving different
subsets of the independent variables.

Model Fitting (Slide 2 of 10)
Variable Selection Procedures (cont.):
• Backward elimination procedure:
• Begins with the regression model that includes all of the independent
variables under consideration.
• At each step, backward elimination considers the removal of an
independent variable according to some criterion.
• Stops when all independent variables in the model are significant at a
specified level of significance.

Model Fitting (Slide 3 of 10)
Variable Selection Procedures (cont.):
• Forward selection procedure:
• Begins with none of the independent variables under consideration
included in the regression model.
• At each step, forward selection considers the addition of an independent
variable according to some criterion.
• Stops when there are no independent variables not currently in the model
that meet the criterion for being added to the regression model.

Model Fitting (Slide 4 of 10)
Variable Selection Procedures (cont.):
• Stepwise selection procedure:
• Begins with none of the independent variables under consideration
included in the regression model.
• The analyst establishes both a criterion for allowing independent variables
to enter the model and a criterion for allowing independent variables to
remain in the model.
• To initiate the procedure, the most significant independent variable is
added to the empty model if its level of significance satisfies the entering
threshold.

Model Fitting (Slide 5 of 10)
Variable Selection Procedures (cont.):
• Stepwise selection procedure (cont.):
• Each subsequent step involves two intermediate steps:
• First, the remaining independent variables not in the current model are
evaluated, and the most significant one is added to the model.
• Then the independent variables in the current model are evaluated, and the
least significant one is removed.
• Stops when no independent variable outside the model meets the criterion for entering the model and no independent variable in the model fails the criterion for remaining.

Model Fitting (Slide 6 of 10)
Variable Selection Procedures (cont.):
• Best subsets procedure:
• Simple linear regressions for each of the independent variables under
consideration are generated, and then the multiple regressions with all
combinations of two independent variables under consideration are
generated, and so on.
• Once a regression model has been generated for every possible subset of
the independent variables under consideration, the entire collection of
regression models can be compared and evaluated.
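The idea can be sketched in a few lines. The example below uses hypothetical data and adjusted R² as the comparison criterion (one of several possible criteria); the noise variable x3 carries no information about y:

```python
import itertools
import numpy as np

# Toy best-subsets search over three candidate predictors; x3 is pure
# noise, unrelated to y.
rng = np.random.default_rng(1)
n = 60
x1, x2, x3 = rng.normal(size=(3, n))
y = 3 + 2 * x1 - x2 + rng.normal(scale=0.5, size=n)
candidates = {"x1": x1, "x2": x2, "x3": x3}

def adjusted_r2(y, X):
    """Adjusted R^2 for a least-squares fit of y on X (with intercept)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = ((y - X @ b) ** 2).sum()
    sst = ((y - y.mean()) ** 2).sum()
    n_obs, n_params = X.shape
    return 1 - (sse / (n_obs - n_params)) / (sst / (n_obs - 1))

# Fit every nonempty subset of predictors and keep the best score.
best_score, best_subset = -np.inf, None
for k in range(1, len(candidates) + 1):
    for subset in itertools.combinations(candidates, k):
        X = np.column_stack([np.ones(n)] + [candidates[v] for v in subset])
        score = adjusted_r2(y, X)
        if score > best_score:
            best_score, best_subset = score, subset
# The informative predictors x1 and x2 end up in the winning subset.
```

With only three candidates this is 7 fits; the number of subsets doubles with each added candidate, which is why best subsets is practical only for modest numbers of independent variables.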

Model Fitting (Slide 7 of 10)
Overfitting:
• Overfitting generally results from creating an overly complex model to
explain idiosyncrasies in the sample data.
• In regression analysis, this often results from the use of complex
functional forms or independent variables that do not have meaningful
relationships with the dependent variable.
• If a model is overfit to the sample data, it will perform better on the
sample data used to fit the model than it will on other data from the
population.
• Thus, an overfit model can be misleading about its predictive capability
and its interpretation.

Model Fitting (Slide 8 of 10)
Overfitting (cont.):
• How does one avoid overfitting a model?
• Use only independent variables that you expect to have real and meaningful
relationships with the dependent variable.
• Use complex models, such as quadratic models and piecewise linear
regression models, only when you have a reasonable expectation that such
complexity provides a more accurate depiction of what you are modeling.
• Do not let software dictate your model; use iterative modeling procedures,
such as the stepwise and best-subsets procedures, only for guidance and
not to generate your final model.

Model Fitting (Slide 9 of 10)
Overfitting (cont.):
• How does one avoid overfitting a model? (cont.):
• If you have access to a sufficient quantity of data, assess your model on data
other than the sample data that were used to generate the model (this is
referred to as cross-validation).
• One possible way to perform cross-validation is the holdout method.

Model Fitting (Slide 10 of 10)
Overfitting (cont.):
• Holdout method: The sample data are randomly divided into
mutually exclusive and collectively exhaustive training and
validation sets.
• Training set: The data set used to build the candidate models that
appear to make practical sense.
• Validation set: The set of data used to compare model
performances and ultimately select a model for predicting values
of the dependent variable.
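A minimal sketch of the holdout method on synthetic data (the 70/30 split here is an arbitrary but common choice):

```python
import numpy as np

# Synthetic sample: y = 4 + 3x plus noise with standard deviation 1.
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0.0, 10.0, n)
y = 4 + 3 * x + rng.normal(scale=1.0, size=n)

# Randomly partition the sample into training and validation sets.
idx = rng.permutation(n)
train, valid = idx[:70], idx[70:]

# Fit on the training set only ...
X_train = np.column_stack([np.ones(train.size), x[train]])
b, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)

# ... and judge predictive accuracy on the held-out validation set.
pred = b[0] + b[1] * x[valid]
rmse = np.sqrt(np.mean((y[valid] - pred) ** 2))
# rmse should be close to the noise standard deviation of 1; a model
# overfit to the training set would score much worse here.
```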
Big Data and Regression
Inference and Very Large Samples
Model Selection

Big Data and Regression (Slide 1 of 6)
Inference and Very Large Samples:
• Virtually all relationships between independent variables and the
dependent variable will be statistically significant if the sample is
sufficiently large.
• That is, if the sample size is very large, there will be little difference in the
bj values generated by different random samples.

Big Data and Regression (Slide 2 of 6)
Figure 7.36: Excel Regression Output for Credit Card Company Example

Big Data and Regression (Slide 3 of 6)
Table 7.4: Regression Parameter Estimates and the Corresponding p
values for 10 Multiple Regression Models, Each Estimated on 50
Observations from the LargeCredit Data

Big Data and Regression (Slide 4 of 6)
Figure 7.37: Excel Regression Output for Credit Card Company Example
after Adding Number of Hours per Week Spent Watching Television

Big Data and Regression (Slide 5 of 6)
Model Selection:
• When dealing with large samples, it is often more difficult to discern the
most appropriate model.
• If developing a regression model for explanatory purposes, the practical
significance of the estimated regression coefficients should be considered
when interpreting the model and considering which variables to keep in
the model.
• If developing a regression model to make future predictions, the selection
of the independent variables to include in the regression model should be
based on the predictive accuracy on observations that have not been used
to train the model.

Big Data and Regression (Slide 6 of 6)
Figure 7.38: Predictive Accuracy on LargeCredit Validation Set

Prediction with Regression

Prediction with Regression (Slide 1 of 2)
• In addition to the point estimate, there are two types of interval estimates associated with the regression equation:
• A confidence interval is an interval estimate of the mean value of y given values of the independent variables.
• A prediction interval is an interval estimate of an individual y value given values of the independent variables.
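For simple linear regression, both intervals can be computed directly from the standard formulas. The sketch below uses synthetic data and a t value read from a table (df = 28; this is not the Butler Trucking data):

```python
import numpy as np

# Synthetic simple linear regression data.
rng = np.random.default_rng(2)
n = 30
x = rng.uniform(0.0, 10.0, n)
y = 2 + 1.5 * x + rng.normal(scale=1.0, size=n)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
s = np.sqrt(((y - X @ b) ** 2).sum() / (n - 2))   # std. error of estimate
t = 2.048                                          # t_{0.025} with 28 df

# Point estimate and interval half-widths at x* = 5.
x_star = 5.0
y_hat = b[0] + b[1] * x_star
sxx = ((x - x.mean()) ** 2).sum()
se_mean = s * np.sqrt(1 / n + (x_star - x.mean()) ** 2 / sxx)
se_pred = s * np.sqrt(1 + 1 / n + (x_star - x.mean()) ** 2 / sxx)

ci = (y_hat - t * se_mean, y_hat + t * se_mean)   # mean response
pi = (y_hat - t * se_pred, y_hat + t * se_pred)   # individual response
# The prediction interval always contains the confidence interval,
# because predicting an individual value adds the variance of a single
# observation to the uncertainty in the estimated mean.
```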

Prediction with Regression (Slide 2 of 2)
Table 7.5: Predicted Values and 95% Confidence Intervals and
Prediction Intervals for 10 New Butler Trucking Routes

End of Chapter 7
