16 Review of Part II
Topics Outline
Inference for Regression
Multiple Regression
Building Regression Models
21. Collinearity
29. Adjusted r²
30. Cp statistic
Example 1
Florida reappraises real estate every year, so the county appraiser's website lists the current
fair market value of each piece of property. Property usually sells for somewhat more than the
appraised market value. Data for the appraised market values and actual selling prices
(in thousands of dollars) of 16 condominium units sold in a beachfront building over a 19-month
period are stored in the file Condominiums.xlsx.
Condominium    Selling Price    Appraised Value
1              850              758.0
2              900              812.7
...            ...              ...
15             1325             1031.8
16             845              586.7
Excel output for a linear regression of selling price on appraised value is shown on the next page.
(a) Write the equation for the model of the population regression line.
μ_y = α + βx
(Equivalently, the data model is y = α + βx + ε.)
(c) What is the equation of the least-squares regression line for predicting selling price
from appraised value?
ŷ = 127.27 + 1.0466x
(d) What is the correlation between appraised value and selling price?
The correlation r is the square root of r².
r = √0.861 ≈ 0.93
(We take the positive square root because the sign of r must be the same as the sign of
the slope, 1.0466.)
Reminder: For simple and multiple regression, r is the correlation between the observed values of y
and the predicted values ŷ. For simple linear regression, r is also the correlation between x and y.
(e) Explain why the pattern you see on the residual plot agrees with the conditions of
linear relationship and constant standard deviation needed for regression inference.
On the residual plot, as usual a horizontal line is added at residual zero, the mean of the residuals.
This line corresponds to the regression line in the plot of selling price against appraised value.
The residuals show a random scatter about the line, with roughly equal vertical spread across
their range. This is what we expect when the conditions for regression inference hold.
(f) Does the histogram of the residuals suggest lack of normality?
The distribution of residuals has a bit of a cluster at the left, but there are no outliers or
other strong deviations from normality that would prevent regression inference.
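If you want to reproduce these diagnostic plots outside Excel, a minimal Python sketch follows. It assumes the workbook Condominiums.xlsx can be read directly and that its columns are named "Appraised Value" and "Selling Price" (the column names are an assumption).

```python
# Minimal sketch of the plots discussed in (e) and (f); column names assumed.
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

condos = pd.read_excel("Condominiums.xlsx")
X = sm.add_constant(condos["Appraised Value"])        # intercept + predictor
fit = sm.OLS(condos["Selling Price"], X).fit()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(condos["Appraised Value"], fit.resid)     # residual plot
ax1.axhline(0)                                        # line at residual = 0
ax1.set_xlabel("Appraised Value")
ax1.set_ylabel("Residual")
ax2.hist(fit.resid, bins=6)                           # histogram of residuals
ax2.set_xlabel("Residual")
ax2.set_ylabel("Frequency")
plt.show()
```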
Regression Statistics
Multiple R           0.9277
R Square             0.8606
Adjusted R Square    0.8506
Standard Error       69.7299
Observations         16

ANOVA
              df    SS             MS             F          Significance F
Regression    1     420072.1418    420072.1418    86.3945    0.0000
Residual      14    68071.6082     4862.2577
Total         15    488143.7500

                   Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept          127.2705        79.4892           1.6011    0.1317     -43.2168     297.7578
Appraised Value    1.0466          0.1126            9.2949    0.0000     0.8051       1.2881
[Figure: Scatterplot of Selling Price versus Appraised Value with the least-squares line]
[Figure: Residual plot of the residuals versus Appraised Value]
[Figure: Histogram of the residuals]
(g) How many degrees of freedom does the t distribution used for statistical inference on these data have?
There are n = 16 data pairs, so df = n − 1 − 1 = 16 − 2 = 14.
Reminder:
df = n − k − 1, where n is the number of observations and k is the number of explanatory variables.
(h) Explain what the slope of the true regression line means in this setting.
β is the average rate of increase in selling price in a population of condominium units when
appraised value increases by $1,000.
(i) Find a 95% confidence interval for the population slope β.
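A quick way to verify the interval is to compute b ± t* SE(b) directly; the sketch below uses the slope and standard error from the Excel output and the t critical value for 14 degrees of freedom.

```python
# Minimal sketch of the interval b ± t* SE(b), using the Excel output values.
from scipy.stats import t

b, se_b, df = 1.0466, 0.1126, 14
t_star = t.ppf(0.975, df)                  # about 2.145
lower, upper = b - t_star * se_b, b + t_star * se_b
print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")
# Agrees with the Lower 95% / Upper 95% columns in the output: (0.8051, 1.2881).
```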
Example 2
Demand and Cost for Electricity
The Public Service Electric Company produces different quantities of electricity each month,
depending on the demand. The file Cost_of_Power.xlsx lists the number of Units of electricity
produced and the total Cost (in dollars) of producing these units for a 36-month period.
Month    Cost     Units
1        45623    601
2        46507    738
...      ...      ...
35       45218    705
36       45357    637
(a) What does the scatterplot of Cost versus Units reveal about the relationship between Cost and Units?
[Figure: Scatterplot of Cost versus Units]
The scatterplot indicates a definite positive relationship and one that is nearly linear.
However, there is also some evidence of curvature in the plot. The points increase slightly
less rapidly as Units increases from left to right. In economic terms, there might be
economies of scale, so that the marginal cost of electricity decreases as more units of
electricity are produced.
(b) The output for a simple linear regression is shown on the next page.
Does the residual plot suggest the need for a nonlinear transformation?
The residuals to the far left and the far right are all negative, whereas the majority of the
residuals in the middle are positive. This negative-positive-negative behavior of residuals
suggests a parabola. Admittedly, the pattern is far from a perfect parabola because there are
several negative residuals in the middle. However, this plot certainly suggests nonlinear
behavior and exploring a quadratic relationship with the square of Units included in the
equation is reasonable.
Summary
Multiple R           0.8579
R-Square             0.7359
Adjusted R-Square    0.7282
StErr of Estimate    2733.7

ANOVA Table
              Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained     1                     708085273.8       708085273.8        94.7481    < 0.0001
Unexplained   34                    254093815.2       7473347.506

Regression Table
           Coefficient    Standard Error    t-Value    p-Value     Lower      Upper
Constant   23651.5        1917.1            12.3369    < 0.0001    19755.4    27547.6
Units      30.533         3.137             9.7339     < 0.0001    24.158     36.908
[Figure: Residual plot of the residuals versus the fitted values]
(c) The regression output for estimating a quadratic relationship between Cost and Units is
shown on the next page. What is the estimated regression equation? Does it provide a better
fit than the linear equation?
The estimated regression equation is
Predicted Cost = 5792.80 + 98.350 Units − 0.0600 (Units)²
The graph of the regression equation superimposed on the scatterplot of Cost versus Units
shows a reasonably good fit, plus an obvious curvature.
The quadratic model provides a better fit, as indicated by the coefficient of determination r²,
which has increased from 73.6% to 82.2%, and the standard error of estimate s_e, which has
decreased from $2,734 to $2,281.
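For reference, a small Python sketch of this comparison, assuming Cost_of_Power.xlsx can be read directly and has columns named Units and Cost (the column names are an assumption):

```python
# Minimal sketch: fit the linear and quadratic models and compare r^2 and s_e.
import pandas as pd
import statsmodels.formula.api as smf

power = pd.read_excel("Cost_of_Power.xlsx")
linear = smf.ols("Cost ~ Units", data=power).fit()
quadratic = smf.ols("Cost ~ Units + I(Units**2)", data=power).fit()

for name, model in [("Linear", linear), ("Quadratic", quadratic)]:
    s_e = model.mse_resid ** 0.5           # standard error of estimate
    print(f"{name}: R-Square = {model.rsquared:.4f}, s_e = {s_e:.1f}")
```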
Summary
Multiple R           0.9064
R-Square             0.8216
Adjusted R-Square    0.8108
StErr of Estimate    2280.800

ANOVA Table
              Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained     2                     790511518.3       395255759.1        75.9808    < 0.0001
Unexplained   33                    171667570.7       5202047.597

Regression Table
            Coefficient    Standard Error    t-Value    p-Value    Lower        Upper
Constant    5792.7983      4763.0585         1.2162     0.2325     -3897.7171   15483.3137
Units       98.3504        17.2369           5.7058     0.0000     63.2817      133.4191
(Units)^2   -0.0600        0.0151            -3.9806    0.0004     -0.0906      -0.0293
[Figure: Quadratic fit superimposed on the scatterplot of Cost versus Units]
H₀: β₂ = 0 (Including the quadratic term does not significantly improve the model.)
Hₐ: β₂ ≠ 0 (Including the quadratic term significantly improves the model.)
To test these hypotheses, we use the test statistic t = −3.98 with df = n − k − 1 = 36 − 2 − 1 = 33 and P-value = 0.0004.
The small P-value indicates that the quadratic effect is significant.
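The P-value can be checked directly from the t statistic and its degrees of freedom; a minimal sketch:

```python
# Two-sided P-value for the quadratic term: t = -3.9806 on 33 df.
from scipy.stats import t

p_value = 2 * t.sf(abs(-3.9806), 33)
print(f"P-value = {p_value:.4f}")          # about 0.0004, as in the output
```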
Notes:
1. The coefficient of (Units)², −0.0600, is negative, and it makes the parabola bend downward.
This produces the decreasing marginal cost behavior, where every extra unit of electricity
incurs a smaller cost. Actually, the curve described by the regression equation eventually
goes downhill for large values of Units, but this part of the curve is irrelevant because the
company evidently never produces such large quantities.
2. You should not be fooled by the small magnitude of this coefficient.
Remember that it multiplies the square of Units, which is a large quantity.
Therefore, the effect of the term −0.0600(Units)² is sizable; at Units = 700, for example, it
contributes about −0.0600(700)² = −29,400 dollars to the predicted cost.
(f) To examine the possibility for a logarithmic fit, a new variable Log(Units), the natural
logarithm of Units has been created. The output from a regression of Cost against
Log(Units) is shown on the next page. Interpret the slope of the regression line.
The estimated regression equation is
Predicted Cost = −63993 + 16654 Log(Units)
Reminder: If b is the coefficient of the log of x, then the expected change in y when x increases
by 1% is approximately 0.01 times b.
In the present case, you can interpret the slope coefficient as follows.
Suppose that Units increases by 1%, for example, from 600 to 606.
Then the regression equation implies that the expected Cost will increase by approximately
(0.01)(16654) = 166.54 dollars.
In words, every 1% increase in Units is accompanied by an expected $166.54 increase in Cost.
Note that for larger values of Units, a 1% increase represents a larger absolute increase
(from 700 to 707 instead of from 600 to 606, say). But each such 1% increase entails the same
increase in Cost. This is another way of describing the decreasing marginal cost property.
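A quick numerical check of this interpretation, using the fitted coefficient of Log(Units):

```python
# Change in predicted Cost when Units rises by 1% is b * ln(1.01) ~= 0.01 * b.
import math

b = 16654
print(b * (math.log(606) - math.log(600)))   # about 165.7
print(0.01 * b)                              # 166.54
```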
Summary
Multiple R           0.8931
R-Square             0.7977
Adjusted R-Square    0.7917
StErr of Estimate    2392.8

ANOVA Table
              Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio     p-Value
Explained     1                     767506900.9       767506900.9        134.0471    < 0.0001
Unexplained   34                    194672188.1       5725652.59

Regression Table
             Coefficient    Standard Error    t-Value    p-Value     Lower       Upper
Constant     -63993.3       9144.3            -6.9981    < 0.0001    -82576.8    -45409.8
Log(Units)   16653.6        1438.4            11.5779    < 0.0001    13730.4     19576.7
[Figure: Logarithmic fit superimposed on the scatterplot of Cost versus Units]
Summary of the quadratic and logarithmic fits:

               r²       r²_adj    s_e
Quadratic      82.2%    81.1%     $2,281
Logarithmic    79.8%    79.2%     $2,393
Example 3
Meddicorp
Meddicorp Company sells medical supplies to hospitals, clinics, and doctors' offices.
The company currently markets in three regions of the United States: the South, the West,
and the Midwest. These regions are each divided into many smaller sales territories.
Meddicorp management is concerned with the effectiveness of a new bonus program.
This program is overseen by regional sales managers and provides bonuses to salespeople based
on performance. Management wants to know if the bonuses paid in 2010 were related to sales.
(Obviously, if there is a relationship here, the managers expect it to be a direct positive one.)
In determining whether this relationship exists, they also want to take into account the effects of
advertising, market share, and competitors' sales. The variables to be used in the study include:
y = Sales: Meddicorp sales (in thousands of dollars) in each territory for 2010
x₁ = Adv: the amount Meddicorp spent on advertising in each territory (in hundreds of dollars) in 2010
x₂ = Bonus: the total amount of bonuses paid in each territory (in hundreds of dollars) in 2010
x₃ = MktShare: the percentage of the market currently held by Meddicorp in each territory
x₄ = Compet: competitors' sales in each territory in 2010
Territory    Sales (y)    Adv       Bonus     MktShare    Compet
1            ...          374.27    230.98    33          202.22
2            ...          408.50    236.28    29          252.77
...          ...          ...       ...       ...         ...
24           1583.75      583.85    289.29    27          313.44
25           1124.75      499.15    272.55    26          374.11
y = α + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + ε
(b) Interpret the equation of the true population surface.
The population regression equation
μ_y = α + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄
shows that the conditional mean of y given x₁, x₂, x₃, and x₄ is a point on the four-dimensional
hyperplane described by α + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄.
(c) Below are the least squares regression results. Conduct the F test for overall fit of the regression.
Regression of y on x₁ (Adv), x₂ (Bonus), x₃ (MktShare), x₄ (Compet)

Regression Statistics
Multiple R           0.9269
R Square             0.8592
Adjusted R Square    0.8310
Standard Error       93.7697
Observations         25

ANOVA
              df    SS              MS             F          Significance F
Regression    4     1073118.5420    268279.6355    30.5114    0.0000
Residual      20    175855.1980     8792.7599
Total         24    1248973.7400

            Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept   -593.5375       259.1959          -2.2899    0.0330     -1134.2105   -52.8644
Adv         2.5131          0.3143            7.9966     0.0000     1.8576       3.1687
Bonus       1.9059          0.7424            2.5673     0.0184     0.3574       3.4545
MktShare    2.6510          4.6357            0.5719     0.5738     -7.0188      12.3208
Compet      -0.1207         0.3718            -0.3247    0.7488     -0.8963      0.6549
H₀: β₁ = β₂ = β₃ = β₄ = 0
Hₐ: At least one βⱼ ≠ 0
Because of the small P-value (≈ 0) for the F statistic (= 30.51), we reject the null hypothesis
and conclude that at least one of the regression slopes (β₁, β₂, β₃, β₄) is not equal to zero.
This means that at least one of the variables ( x1 , x2 , x3 , x4 ) is important in explaining the
variation in y.
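A sketch of the same overall F test in Python, assuming the Meddicorp data are stored in a workbook (the file name Meddicorp.xlsx is hypothetical) with columns Sales, Adv, Bonus, MktShare, and Compet:

```python
# Minimal sketch of the overall F test for the four-variable model.
import pandas as pd
import statsmodels.formula.api as smf

med = pd.read_excel("Meddicorp.xlsx")      # hypothetical file name
full = smf.ols("Sales ~ Adv + Bonus + MktShare + Compet", data=med).fit()
print(f"F = {full.fvalue:.2f}, P-value = {full.f_pvalue:.6f}")   # F about 30.51
```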
(d) At the 0.05 significance level, test the significance of the relationship between y and each of
the explanatory variables.
The P-values for the four t tests are:
0.00 for Adv, 0.02 for Bonus, 0.57 for MktShare, 0.75 for Compet
Thus, the two explanatory variables x₁ (amount spent on advertising) and x₂ (amount of bonuses)
are significantly related to y (sales). The variables x₃ (market share) and x₄ (competitors' sales)
add little to the explanation of the variation in y (sales), given the other variables, and can be
excluded from the model.
(e) Below is the regression output for the model with x1 (amount spent on advertising) and
x 2 (amount of bonuses). Interpret the estimated regression equation and its slope coefficients.
Regression of y on x₁ (Adv), x₂ (Bonus)

Regression Statistics
Multiple R           0.9246
R Square             0.8549
Adjusted R Square    0.8418
Standard Error       90.7485
Observations         25

ANOVA
              df    SS              MS             F          Significance F
Regression    2     1067797.3206    533898.6603    64.8306    0.0000
Residual      22    181176.4194     8235.2918
Total         24    1248973.7400

            Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept   -516.4443       189.8757          -2.7199    0.0125     -910.2224    -122.6662
Adv         2.4732          0.2753            8.9832     0.0000     1.9022       3.0441
Bonus       1.8562          0.7157            2.5934     0.0166     0.3719       3.3405

[Figure: Residual plot of the residuals versus Predicted Sales]
[Figure: Scatterplots of Sales versus Adv and Sales versus Bonus]
After rounding, the least squares regression equation describing the relationship between sales
and the two explanatory variables may be written
Predicted Sales = −516.4 + 2.47 Adv + 1.86 Bonus
This equation can be interpreted as providing an estimate of mean sales for a given level of
advertising and bonus payment.
If bonus payment is held fixed, the equation shows that mean sales tends to rise by $2,470
(2.47 thousands of dollars) for each $100 spent on ads.
If advertising is held fixed, the equation shows that mean sales tends to rise by $1,860
(1.86 thousands of dollars) for each $100 of bonus paid.
(f) The best subsets regression procedure has been performed using all four explanatory variables.
Below is a summary of the results. Which is the best model according to the best subsets
regression technique?
Variables in the Regression        k + 1    Cp        r²       r²_adj    s_e
Adv                                2        5.90      0.811    0.802     101.42
Bonus                              2        75.19     0.323    0.293     191.76
Compet                             2        100.85    0.142    0.105     215.83
MktShare                           2        120.97    0.001    0.000     232.97
Adv, Bonus                         3        1.61      0.855    0.842     90.75
Adv, MktShare                      3        7.66      0.812    0.795     103.23
Adv, Compet                        3        7.74      0.812    0.795     103.38
Bonus, Compet                      3        68.03     0.387    0.332     186.51
Bonus, MktShare                    3        76.46     0.328    0.267     195.33
MktShare, Compet                   3        100.18    0.161    0.085     218.20
Adv, Bonus, MktShare               4        3.11      0.859    0.838     91.75
Adv, Bonus, Compet                 4        3.33      0.857    0.836     92.26
Adv, MktShare, Compet              4        9.59      0.813    0.786     105.52
Bonus, MktShare, Compet            4        66.95     0.409    0.325     187.48
Adv, Bonus, MktShare, Compet       5        5.00      0.859    0.831     93.71
Recall that small values of Cp, and values close to k + 1, are of interest in choosing good sets
of explanatory variables.
There are four competing models with relatively small Cp values:

Variables in the Regression        k + 1    Cp      r²       r²_adj    s_e
Adv, Bonus                         3        1.61    0.855    0.842     90.75
Adv, Bonus, MktShare               4        3.11    0.859    0.838     91.75
Adv, Bonus, Compet                 4        3.33    0.857    0.836     92.26
Adv, Bonus, MktShare, Compet       5        5.00    0.859    0.831     93.71
The smallest Cp value is for the regression with Adv and Bonus as explanatory variables.
It has a Cp value of 1.61 and explains 85.5% of the variation in sales.
Note that only modest increases in r² are achieved by the other three models.
The adjusted r² is highest and the standard error of estimate is smallest for the regression
with Adv and Bonus, again supporting this model as best.
Therefore, the best subsets procedure suggests using this model.
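The Cp values in the table can be reproduced from the sums of squares reported earlier. A minimal sketch for the Adv, Bonus subset, using Cp = SSE_subset/MSE_full − (n − 2(p + 1)):

```python
# Cp for the Adv, Bonus subset, from the regression outputs above.
n = 25
p = 2                          # explanatory variables in the subset
sse_subset = 181176.4194       # SSE for the Adv, Bonus model
mse_full = 8792.7599           # MSE for the four-variable model

cp = sse_subset / mse_full - (n - 2 * (p + 1))
print(f"Cp = {cp:.2f}")        # about 1.61, as in the best subsets table
```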
(g) Below is the StatTools output from forward selection, backward elimination, and stepwise
regression when applied to the Meddicorp data with P-value to enter = 0.05 and
P-value to leave = 0.10. Which model appears to be the best?
Forward selection

Summary
Multiple R           0.9246
R-Square             0.8549
Adjusted R-Square    0.8418
StErr of Estimate    90.7485

ANOVA Table
              Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained     2                     1067797.3206      533898.6603        64.8306    < 0.0001
Unexplained   22                    181176.4194       8235.2918

Regression Table
           Coefficient    Standard Error    t-Value    p-Value    Lower       Upper
Constant   -516.4443      189.8757          -2.7199    0.0125     -910.2224   -122.6662
Adv        2.4732         0.2753            8.9832     0.0000     1.9022      3.0441
Bonus      1.8562         0.7157            2.5934     0.0166     0.3719      3.3405

Step Information
         Multiple R    R-Square    Adjusted R-Square    StErr of Estimate    Entry Number
Adv      0.9003        0.8106      0.8024               101.4173             1
Bonus    0.9246        0.8549      0.8418               90.7485              2
Backward elimination

Summary
Multiple R           0.9246
R-Square             0.8549
Adjusted R-Square    0.8418
StErr of Estimate    90.7485

ANOVA Table
              Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained     2                     1067797.3206      533898.6603        64.8306    < 0.0001
Unexplained   22                    181176.4194       8235.2918

Regression Table
           Coefficient    Standard Error    t-Value    p-Value    Lower       Upper
Constant   -516.4443      189.8757          -2.7199    0.0125     -910.2224   -122.6662
Adv        2.4732         0.2753            8.9832     0.0000     1.9022      3.0441
Bonus      1.8562         0.7157            2.5934     0.0166     0.3719      3.3405

Step Information
                 Multiple R    R-Square    Adjusted R-Square    StErr of Estimate    Exit Number
All Variables    0.9269        0.8592      0.8310               93.7697
Compet           0.9265        0.8585      0.8382               91.7508              1
MktShare         0.9246        0.8549      0.8418               90.7485              2
Stepwise regression

Summary
Multiple R           0.9246
R-Square             0.8549
Adjusted R-Square    0.8418
StErr of Estimate    90.7485

ANOVA Table
              Degrees of Freedom    Sum of Squares    Mean of Squares    F-Ratio    p-Value
Explained     2                     1067797.3206      533898.6603        64.8306    < 0.0001
Unexplained   22                    181176.4194       8235.2918

Regression Table
           Coefficient    Standard Error    t-Value    p-Value    Lower       Upper
Constant   -516.4443      189.8757          -2.7199    0.0125     -910.2224   -122.6662
Adv        2.4732         0.2753            8.9832     0.0000     1.9022      3.0441
Bonus      1.8562         0.7157            2.5934     0.0166     0.3719      3.3405

Step Information
         Multiple R    R-Square    Adjusted R-Square    StErr of Estimate    Enter or Exit
Adv      0.9003        0.8106      0.8024               101.4173             Enter
Bonus    0.9246        0.8549      0.8418               90.7485              Enter
Regardless of the procedure used, the result is the same. The equation chosen is
Predicted Sales = −516.4 + 2.47 Adv + 1.86 Bonus
(h) Meddicorp markets in three regions of the United States: the South, the West, and the Midwest.
Management of Meddicorp believes that, in addition to advertising and bonus, the regions
it markets in may be important in explaining variation in sales.
What is the equation of the regression model that includes the region information?
Since there are three regions, two indicator variables have to be included in the model.
Let Midwest be the base category. Then the two dummy variables are:
x₃ = South = 1 if the territory is in the South, 0 otherwise
x₄ = West = 1 if the territory is in the West, 0 otherwise
The model is
y = α + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + ε
where the indicator values for the three regions are:

Region     x₃    x₄
South      1     0
West       0     1
Midwest    0     0
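If the data are prepared in Python rather than Excel, the two indicators can be built as below; the file name Meddicorp.xlsx, the Region column, and its category labels are assumptions.

```python
# Minimal sketch of building South/West indicators with Midwest as the base.
import pandas as pd

med = pd.read_excel("Meddicorp.xlsx")                 # hypothetical file name
med["South"] = (med["Region"] == "South").astype(int)
med["West"] = (med["Region"] == "West").astype(int)
# Midwest territories get South = 0 and West = 0, so they form the base level.
```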
(i) The regression output for this model follows. Interpret the regression equation from the least squares fit.
Regression of y on x₁ (Adv), x₂ (Bonus), x₃ (South), x₄ (West)

Regression Statistics
Multiple R           0.9730
R Square             0.9468
Adjusted R Square    0.9362
Standard Error       57.6254
Observations         25

ANOVA
              df    SS              MS             F          Significance F
Regression    4     1182559.8959    295639.9740    89.0296    0.0000
Residual      20    66413.8441      3320.6922
Total         24    1248973.7400

            Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept   435.0989        206.2342          2.1097     0.0477     4.9020       865.2958
Adv         1.3678          0.2622            5.2165     0.0000     0.8208       1.9148
Bonus       0.9752          0.4808            2.0281     0.0561     -0.0278      1.9781
South       -257.8916       48.4129           -5.3269    0.0000     -358.8792    -156.9040
West        -209.7457       37.4203           -5.6051    0.0000     -287.8032    -131.6883

[Figure: Residual plot of the residuals versus Predicted Sales]
The estimated regression equation is
Predicted Sales = 435.0989 + 1.3678 Adv + 0.9752 Bonus − 257.8916 South − 209.7457 West
For the three regions this gives:
South (x₃ = 1, x₄ = 0): Predicted Sales = 177.2073 + 1.3678 Adv + 0.9752 Bonus
West (x₃ = 0, x₄ = 1): Predicted Sales = 225.3532 + 1.3678 Adv + 0.9752 Bonus
Midwest (x₃ = 0, x₄ = 0): Predicted Sales = 435.0989 + 1.3678 Adv + 0.9752 Bonus
For given amounts spent by Meddicorp on advertising and bonuses, the estimated mean sales
in a territory that is in the South region will be $257,892 (257.8916 thousands of dollars)
below the sales in a territory that is in the Midwest region.
For given amounts spent by Meddicorp on advertising and bonuses, the estimated mean sales
in a territory that is in the West region will be $209,746 (209.7457 thousands of dollars)
below the sales in a territory that is in the Midwest region.
(j) Predict the average sales in each region when advertising expenditures equal 500 hundreds of
dollars and bonuses are 250 hundreds of dollars.
South:
ŷ = 177.2073 + 1.3678(500) + 0.9752(250) = 1104.9073
West:
ŷ = 225.3532 + 1.3678(500) + 0.9752(250) = 1153.0532
Midwest:
ŷ = 435.0989 + 1.3678(500) + 0.9752(250) = 1362.7989
The mean sales figures ($1,104,907 for South, $1,153,053 for West, and $1,362,799 for Midwest)
when advertising expenditures equal $50,000 and bonus payments equal $25,000 differ according
to the coefficients of the dummy variables: the figure for South is $257,892 smaller than the
figure for Midwest, and the figure for West is $209,746 smaller than the figure for Midwest.
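The same three predictions can be computed from the fitted coefficients; a minimal sketch:

```python
# Predictions in (j) from the fitted coefficients (Adv = 500, Bonus = 250,
# both in hundreds of dollars).
coef = {"const": 435.0989, "Adv": 1.3678, "Bonus": 0.9752,
        "South": -257.8916, "West": -209.7457}
regions = {"South": (1, 0), "West": (0, 1), "Midwest": (0, 0)}

for region, (south, west) in regions.items():
    pred = (coef["const"] + coef["Adv"] * 500 + coef["Bonus"] * 250
            + coef["South"] * south + coef["West"] * west)
    print(f"{region}: predicted sales = {pred:.4f} thousand dollars")
```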
(k) Determine whether there is a significant difference in sales for territories in different regions.
Because the location of territories is measured by the group of two indicator variables x₃ (South)
and x₄ (West), we use the partial F test to compare the full model (Adv, Bonus, South, West) with
the reduced model (Adv, Bonus):
H₀: β₃ = β₄ = 0    Hₐ: at least one of β₃, β₄ is not 0
F = [(SSE(reduced) − SSE(full)) / (k − j)] / MSE(full)
  = [(181176.4194 − 66413.8441) / 2] / 3320.6922 = 17.2799
n = 25, k = 4, j = 2; df₁ = k − j = 4 − 2 = 2, df₂ = n − k − 1 = 25 − 4 − 1 = 20
P-value = FDIST(17.2799,2,20) = 0.000044
The P-value is very small and we reject the null hypothesis.
Thus, at least one of the coefficients of the indicator variables is not zero.
This means that there are statistically significant differences in average sales levels between
the three regions in which Meddicorp does business.
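A sketch of the partial F computation using the SSE values from the two regression outputs (the P-value reproduces the FDIST result above):

```python
# Partial F test comparing the reduced (Adv, Bonus) and full (Adv, Bonus,
# South, West) models, using values from the outputs above.
from scipy.stats import f

sse_reduced = 181176.4194      # SSE of the reduced model (Adv, Bonus)
sse_full = 66413.8441          # SSE of the full model (with South and West)
mse_full = 3320.6922           # MSE of the full model
n, k, j = 25, 4, 2             # k predictors in the full model, j in the reduced

F = ((sse_reduced - sse_full) / (k - j)) / mse_full
p_value = f.sf(F, k - j, n - k - 1)
print(f"F = {F:.4f}, P-value = {p_value:.6f}")   # about 17.28 and 0.000044
```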
(l) How useful is the group of dummy variables x₃ (South) and x₄ (West)?
Do they considerably improve the explanation of the variation in sales?
Model                        r²     r²_adj    s_e (thousands of $)
Adv, Bonus                   85%    84%       91
Adv, Bonus, South, West      95%    94%       58
A comparison of r², r²_adj, and s_e for the reduced and full models shows that the indicator variables
x₃ (South) and x₄ (West) carry a lot of explanatory power. They help to explain about 10% more of
the variation in sales while reducing the standard error of estimate by about $33,000.
Therefore, the dummy variables x₃ (South) and x₄ (West) should be retained in the model.