Self Check Exercises: Exercise 13.2

This document contains 13 multiple regression problems involving predicting an output variable (Y) from two input variables (X1 and X2). For each problem, the learner is asked to (a) calculate the multiple regression equation and (b) use the equation to predict Y for given values of X1 and X2. It also contains introductory text on multiple regression concepts and applications.

Uploaded by

Bhargav D.S.

Exercise 13.2

Self-Check Exercises
SC 13-1 Given the following set of data
(a) Calculate the multiple-regression plane.
(b) Predict Y when X1 = 3.0 and X2 = 2.7.
Y X1 X2
25 3.5 5.0
30 6.7 4.2
11 1.5 8.5
22 0.3 1.4
27 4.6 3.6
19 2.0 1.3
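
For readers who want to check a hand calculation, the normal equations for a two-predictor plane can also be solved numerically. The sketch below is one possible approach in plain Python, not the method the exercise prescribes; the helper names `fit_least_squares` and `predict` are illustrative, and the data are the six observations of SC 13-1.

```python
# Illustrative sketch: fit Y = a + b1*X1 + b2*X2 by solving the normal equations.
# Function names are hypothetical; data are the six rows of SC 13-1.

def fit_least_squares(rows, y):
    """Return [a, b1, ..., bk] minimizing the sum of squared errors."""
    X = [[1.0, *r] for r in rows]  # prepend a column of ones for the intercept
    k = len(X[0])
    # Augmented normal equations: (X'X | X'y)
    M = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for j in range(col, k + 1):
                M[r][j] -= factor * M[col][j]
    # Back substitution
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (M[i][k] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return b


def predict(coefs, row):
    """Evaluate the fitted plane at one (X1, X2, ...) point."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


Y = [25, 30, 11, 22, 27, 19]
X_ROWS = [(3.5, 5.0), (6.7, 4.2), (1.5, 8.5), (0.3, 1.4), (4.6, 3.6), (2.0, 1.3)]

coefs = fit_least_squares(X_ROWS, Y)
print("a, b1, b2 =", [round(c, 4) for c in coefs])
print("prediction at X1 = 3.0, X2 = 2.7:", round(predict(coefs, (3.0, 2.7)), 2))
```

The same two functions accept any number of predictors, so they could be reused for the three- and four-variable exercises later in the chapter.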

SC 13-2 The following information has been gathered from a random sample of apartment renters in a
city. We are trying to predict rent (in dollars per month) based on the size of the apartment
(number of rooms) and the distance from downtown (in miles).
Rent ($)   Number of Rooms   Distance from Downtown (miles)
360 2 1
1,000 6 1
450 3 2
525 4 3
350 2 10
300 1 4

(a) Calculate the least-squares equation that best relates these three variables.
(b) If someone is looking for a two-bedroom apartment 2 miles from downtown, what rent
should he expect to pay?

Basic Concepts
13-7 Given the following set of data
(a) Calculate the multiple-regression plane.
(b) Predict Y when X1 = 10.5 and X2 = 13.6.
Y X1 X2
11.4 4.5 13.2
16.6 8.7 18.7
20.5 12.6 19.8
29.4 19.7 25.4
7.6 2.9 22.8
13.8 6.7 17.8
28.5 17.4 14.6

13-8 For the following set of data:
(a) Calculate the multiple-regression plane.
(b) Predict Y for X1 = 28 and X2 = 10.
Y X1 X2
10  8  4
17 21  9
18 21 11
26 17 20
35 36 13
 8  9 28
13-9 Given the following set of data
(a) Calculate the multiple-regression plane.
(b) Predict Y when X1 = 1 and X2 = 4.
Y X1 X2
6 1 3
10 3 –1
9 2 4
14 –2 7
7 3 2
5 6 –4

Applications
13-10 Sam Spade, owner and general manager of the Campus Stationery Store, is concerned about
the sales behavior of a compact cassette tape recorder sold at the store. He realizes that there
are many factors that might help explain sales, but believes that advertising and price are
major determinants. Sam has collected the following data:
Sales (units sold)   Advertising (number of ads)   Price ($)
33  3 125
61  6 115
70 10 140
82 13 130
17  9 145
24  6 140

(a) Calculate the least-squares equation to predict sales from advertising and price.
(b) If advertising is 7 and price is $132, what sales would you predict?
13-11 A developer of food for pigs would like to determine what relationship exists among the age
of a pig when it starts receiving a newly developed food supplement, the initial weight of the
pig, and the amount of weight it gains in a 1-week period with the food supplement. The
following information is the result of a study of eight piglets:
Piglet Number   X1: Initial Weight (Pounds)   X2: Initial Age (Weeks)   Y: Weight Gain
1 39 8 7
2 52 6 6
3 49 7 8
4 46 12 10
5 61 9 9
6 35 6 5
7 25 7 3
8 55 4 4

(a) Calculate the least-squares equation that best describes these three variables.
(b) How much might we expect a pig to gain in a week with the food supplement if it were
9 weeks old and weighed 48 pounds?
13-12 A graduate student trying to purchase a used Neptune car has researched the prices. She
believes the year of the car and the number of miles the car has been driven both influence the
purchase price. Data are given below for 10 cars with the price (Y) in thousands of dollars,
year (X1) and miles driven (X2) in thousands.
(a) Calculate the least-squares equation that best relates these three variables.
(b) The student would like to purchase a 1991 Neptune with about 40,000 miles on it. How
much do you predict she will pay?
Y: Price ($ thousands)   X1: Year   X2: Miles (thousands)
2.99 1987 55.6
6.02 1992 18.4
8.87 1993 21.3
3.92 1988 46.9
9.55 1994 11.8
9.05 1991 36.4
9.37 1992 28.2
4.2 1988 44.2
4.8 1989 34.9
5.74 1991 26.4

13-13 The Federal Reserve is performing a preliminary study to determine the relationship between
certain economic indicators and annual percentage change in the gross national product
(GNP). Two such indicators being examined are the amount of the federal government’s
deficit (in billions of dollars) and the Dow Jones Industrial Average (the mean value over the
year). Data for 6 years follow:
Y X1 X2
Change in GNP Federal Deficit Dow Jones
2.5 100 2,850
–1.0 400 2,100
4.0 120 3,300
1.0 200 2,400
1.5 180 2,550
3.0 80 2,700

(a) Calculate the least-squares equation that best describes the data.
(b) What percentage change in GNP would be expected in a year in which the federal deficit
was $240 billion and the mean Dow Jones value was 3,000?

Exercise 13.3

Self-Check Exercise
SC 13-3 Pam Schneider owns and operates an accounting firm in Ithaca, New York. Pam feels that it
would be useful to be able to predict in advance the number of rush income-tax returns during
the busy March 1 to April 15 period so that she can better plan her personnel needs during
this time. She has hypothesized that several factors may be useful in her prediction. Data for
these factors and number of rush returns for past years are as follows:
X1: Economic Index   X2: Population within 1 Mile of Office   X3: Average Income in Ithaca   Y: Number of Rush Returns, March 1 to April 15
 99 10,188 21,465 2,306
106 8,566 22,228 1,266
100 10,557 27,665 1,422
129 10,219 25,200 1,721
179 9,662 26,300 2,544

(a) Use the following Minitab output to determine the best-fitting regression equation for
these data:
The regression equation is
Y = -1275 + 17.1 X1 + 0.541 X2 - 0.174 X3
Predictor Coef Stdev t-ratio p
Constant -1275 2699 -0.47 0.719
X1 17.059 6.908 2.47 0.245
X2 0.5406 0.3144 1.72 0.335
X3 -0.1743 0.1005 -1.73 0.333

s = 396.1   R-sq = 87.2%

(b) What percentage of the total variation in the number of rush returns is explained by this
equation?
(c) For this year, the economic index is 169, the population within 1 mile of the office is
10,212, and the average income in Ithaca is $26,925. How many rush returns should Pam
expect to process between March 1 and April 15?
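
Part (c) amounts to substituting this year's values into the fitted equation. A minimal sketch of that substitution follows, using the unrounded values from the Coef column of the Minitab output; the dictionary and function names are illustrative only.

```python
# Illustrative sketch: plug this year's values into the Minitab equation from SC 13-3.
# Coefficients are copied from the "Coef" column; the function name is hypothetical.

COEFS = {"Constant": -1275.0, "X1": 17.059, "X2": 0.5406, "X3": -0.1743}


def predict_rush_returns(economic_index, population, avg_income):
    return (COEFS["Constant"]
            + COEFS["X1"] * economic_index
            + COEFS["X2"] * population
            + COEFS["X3"] * avg_income)


estimate = predict_rush_returns(169, 10212, 26925)
print(f"Predicted rush returns: {estimate:.0f}")
```

Using the rounded coefficients from the printed equation instead would shift the estimate slightly, which is worth keeping in mind when comparing against a worked answer.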

Basic Concepts
13-14 Given the following set of data, use whatever computer package is available to find the best-
fitting regression equation and answer the following:
(a) What is the regression equation?
(b) What is the standard error of estimate?
(c) What is R² for this regression?
(d) What is the predicted value for Y when X1 = 5.8, X2 = 4.2, and X3 = 5.1?
Y X1 X2 X3
64.7 3.5 5.3 8.5
80.9 7.4 1.6 2.6
24.6 2.5 6.3 4.5
43.9 3.7 9.4 8.8
77.7 5.5 1.4 3.6
20.6 8.3 9.2 2.5
66.9 6.7 2.5 2.7
34.3 1.2 2.2 1.3

13-15 Given the following set of data, use whatever computer package is available to find the best-
fitting regression equation and answer the following:
(a) What is the regression equation?
(b) What is the standard error of estimate?
(c) What is R² for this regression?
(d) Give an approximate 95 percent confidence interval for the value of Y when the values of
X1, X2, X3, and X4 are 52.4, 41.6, 35.8, and 3, respectively.
X1 X2 X3 X4 Y
21.4 62.9 21.9 –2 22.8
51.7 40.7 42.9 5 93.7
41.8 81.8 69.8 2 64.9
11.8 41.0 90.9 –4 19.2
71.6 22.6 12.9 8 55.8
91.9 61.5 30.9 1 23.1

Applications
13-16 Police stations across the country are interested in predicting the number of arrests they can
expect to process each month so as to better schedule office employees. Historically, the
average number of arrests (Y) each month is influenced by the number of officers on the
police force (X1), the population of the city in thousands (X2), and the percentage of
unemployed people in the city (X3). Data for these factors in 15 cities are presented below.
(a) Using whatever computer package is available, determine the best-fitting regression
equation for these data.
(b) What percentage of the total variation in the number of arrests (Y) is explained by this
equation?
(c) The ChapelBoro police department is trying to predict the number of monthly arrests.
ChapelBoro has a population of 75,000, a police force of 82, and an unemployment
percentage of 10.5 percent. How many arrests do you predict for each month?
Monthly Average Number of Arrests (Y)   Number of Officers on the Force (X1)   Size of the City in Thousands (X2)   Percentage Unemployed (X3)
390.6 68 81.6 4.3
504.3 94 75.1 3.9
628.4 125 97.3 5.6
745.6 175 123.5 8.7
585.2 113 118.4 11.4
450.3 82 65.4 9.6
327.8 46 61.6 12.4
260.5 32 54.3 18.3
477.5 89 97.4 4.6
389.8 67 82.4 6.7
312.4 47 56.4 8.4
367.5 59 71.3 7.6
374.4 61 67.4 9.8
494.6 87 96.3 11.3
487.5 92 86.4 4.7

13-17 We are trying to predict the annual demand for widgets (DEMAND) using the following
independent variables.
PRICE = price of widgets (in $)
INCOME = consumer income (in $)
SUB = price of a substitute commodity (in $)

(Note: A substitute commodity is one that can be substituted for another commodity. For
example, margarine is a substitute commodity for butter.)
Data have been collected from 1982 to 1996:
Year Demand Price ($) Income ($) Sub ($)
1982 40 9 400 10
1983 45 8 500 14
1984 50 9 600 12
1985 55 8 700 13
1986 60 7 800 11
1987 70 6 900 15
1988 65 6 1,000 16
1989 65 8 1,100 17
1990 75 5 1,200 22
1991 75 5 1,300 19
1992 80 5 1,400 20
1993 100 3 1,500 23
1994 90 4 1,600 18
1995 95 3 1,700 24
1996 85 4 1,800 21

(a) Using whatever computer package is available, determine the best-fitting regression
equation for these data.
(b) Are the signs (+ or -) of the regression coefficients of the independent variables as one
would expect? Explain briefly. (Note: This is not a statistical question; you just need to
think about what the regression coefficients mean.)
(c) State and interpret the coefficient of multiple determination for this problem.
(d) State and interpret the standard error of estimate for this problem.
(e) Using the equation, what would you predict for DEMAND if the price of widgets was $6,
consumer income was $1,200, and the price of the substitute commodity was $17?
13-18 Bill Buxton, a statistics professor in a leading business school, has a keen interest in factors
affecting students’ performance on exams. The midterm exam for the past semester had a
wide distribution of grades, but Bill feels certain that several factors explain the
distribution: He allowed his students to study from as many different books as they liked,
their IQs vary, they are of different ages, and they study varying amounts of time for
exams. To develop a predicting formula for exam grades, Bill asked each student to answer,
at the end of the exam, questions regarding study time and number of books used. Bill’s
teaching records already contained the IQs and ages for the students, so he compiled the
data for the class and ran a multiple regression with Minitab. The output from Bill’s
computer run was as follows:
Predictor Coef Stdev t-ratio p
Constant -49.948 41.55 -1.20 0.268
HOURS 1.06931 0.98163 1.09 0.312
IQ 1.36460 0.37627 3.63 0.008
BOOKS 2.03982 1.50799 1.35 0.218
AGE -1.79890 0.67332 -2.67 0.319

s = 11.657   R-sq = 76.7%

(a) What is the best-fitting regression equation for these data?
(b) What percentage of the variation in grades is explained by this equation?
(c) What grade would you expect for a 21-year-old student with an IQ of 113, who studied
5 hours and used three different books?
13-19 Fourteen Twenty-Two Food Stores, Inc., is planning to expand its convenience store chain. To
aid in selecting locations for the new stores, it has collected weekly sales data from each of its
23 stores. To help explain the variability in weekly sales, it has also collected information
describing four variables that it believes are related to sales. The data that were collected
follow. The variables are defined as follows:
SALES : average weekly sales for each store in thousands of dollars

AUTOS : average weekly auto traffic volume in thousands of cars

ENTRY : ease of entry/exit measured on a scale of 1 to 100

ANNINC : average annual household income for the area in thousands of dollars

DISTANCE : distance in miles from the store to the nearest supermarket

The data were analyzed using Minitab and the output follows:
Predictor Coef Stdev t-ratio p
Constant 175.37 92.62 1.89 0.075
AUTOS -0.028 0.315 -0.09 0.929
ENTRY 3.775 1.272 2.97 0.008
ANNINC 1.990 4.510 0.44 0.664
DISTANCE 212.41 28.090 7.56 0.000
s = 85.587   R-sq = 95.8%

(a) What is the best-fitting regression equation, as given by Minitab?
(b) What is the standard error of estimate for this equation?
(c) What fraction of the variation in sales is explained by this regression?
(d) What sales would you predict for a store located in a neighborhood that had an average
annual household income of $20,000, was 2 miles from the nearest supermarket, was on
a road with weekly traffic volume of 100,000 autos, and had an ease of entry of 50?
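
Part (d) states the inputs in raw units, while the regression variables AUTOS and ANNINC are measured in thousands. The sketch below shows one way to make that conversion explicit before applying the Minitab coefficients; the function and argument names are illustrative.

```python
# Illustrative sketch for 13-19(d): convert raw quantities to the units used in
# the regression before applying the Minitab coefficients. Names are hypothetical.

COEFS = {"Constant": 175.37, "AUTOS": -0.028, "ENTRY": 3.775,
         "ANNINC": 1.990, "DISTANCE": 212.41}


def predict_weekly_sales(autos_per_week, entry_score, annual_income_dollars, miles):
    autos = autos_per_week / 1000          # AUTOS is in thousands of cars
    anninc = annual_income_dollars / 1000  # ANNINC is in thousands of dollars
    return (COEFS["Constant"] + COEFS["AUTOS"] * autos + COEFS["ENTRY"] * entry_score
            + COEFS["ANNINC"] * anninc + COEFS["DISTANCE"] * miles)


sales = predict_weekly_sales(100_000, 50, 20_000, 2)
print(f"Predicted weekly sales: {sales:.2f} thousand dollars")
```

Passing 20,000 or 100,000 directly as ANNINC or AUTOS would inflate the prediction by orders of magnitude, which is the mistake the unit definitions are there to prevent.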
13-20 Rick Blackburn is thinking about selling his house. In order to decide what price to ask, he
has collected data for 12 recent closings. He has recorded sales price (in $1,000s), the number
of square feet in the house (in 100s of sq ft.), the number of stories, the number of bathrooms,
and the age of the house (in years).
Sales Price Square Feet Stories Bathrooms Age
49.65 8.9 1 1.0 2
67.95 9.5 1 1.0 6
81.15 12.6 2 1.5 11
81.60 12.9 2 1.5 8
91.50 19.0 2 1.0 22
95.25 17.6 1 1.0 17
100.35 20.0 2 1.5 12
104.25 20.6 2 1.5 11
112.65 20.5 1 2.0 9
149.70 25.1 2 2.0 8
160.65 22.7 2 2.0 18
232.50 40.8 3 4.0 12

(a) Using whatever computer package is available, determine the best-fitting regression
equation for these data.
(b) What is R² for this equation? What does this number measure?
(c) If Rick’s house has 1,800 square feet (18.0 hundreds of square feet), 1 story, 1.5
bathrooms, and is 6 years old, what sale price can Rick expect?
13-21 Allegheny Steel Corporation has been looking into the factors that influence how many
millions of tons of steel it is able to sell each year. Management suspects that the following
are major factors: the annual national inflation rate, the average price per ton by which
imported steel undercuts Allegheny’s prices (in dollars), and the number of cars (in millions)
that U.S. automakers are planning to produce in that year. Data for 7 years have been
collected:
Year   Y: Millions of Tons Sold   X1: Inflation Rate   X2: Imported Undercut   X3: Number of Cars
1993 4.2  3.1 3.10 6.2
1992 3.1  3.9 5.00 5.1
1991 4.0  7.5 2.20 5.7
1990 4.7 10.7 4.50 7.1
1989 4.3 15.5 4.35 6.5
1988 3.7 13.0 2.60 6.1
1987 3.5 11.0 3.05 5.9

(a) Using whatever computer package is available, determine the best-fitting regression
equation for these data.
(b) What percentage of the total variation in the number of millions of tons of steel sold by
Allegheny each year is explained by this equation?
(c) How many tons of steel should Allegheny expect to sell in a year in which the inflation
rate is 7.1, American automakers are planning to produce 6.0 million cars, and the
average imported price undercut per ton is $3.50?

Exercise 13.4

Self-Check Exercises
SC 13-4 Edith Pratt is a busy executive in a nationwide trucking company. Edith is late for a meeting
because she has been unable to locate the multiple-regression output that an associate
produced for her. If the total regression was significant at the 0.05 level, then she wanted to
use the computer output as evidence to support some of her ideas at the meeting. The
subordinate, however, is sick today and Edith has been unable to locate his work. As a matter
of fact, all the information she possesses concerning the multiple regression is a piece of
scrap paper with the following on it:
Regression for E. Pratt
SSR 872.4, with ___ df
SSE ___, with 17 df
SST 1023.6, with 24 df

Because the scrap paper doesn’t even have a complete set of numbers on it, Edith has
concluded that it must be useless. You, however, should know better. Should Edith go directly
to the meeting or continue looking for the computer output?
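
The scrap paper is less incomplete than it looks, because sums of squares and degrees of freedom both add up: SST = SSR + SSE and total df = regression df + error df. The sketch below, with a hypothetical function name, recovers the missing entries and forms the observed F ratio; the critical value still has to be read from an F table.

```python
# Illustrative sketch for SC 13-4: recover the missing ANOVA entries and form F.
# Uses SST = SSR + SSE and df_total = df_regression + df_error.

def f_from_scrap(ssr, sst, df_error, df_total):
    sse = sst - ssr
    df_regression = df_total - df_error
    msr = ssr / df_regression   # mean square for regression
    mse = sse / df_error        # mean square for error
    return sse, df_regression, msr / mse


sse, df_reg, f_observed = f_from_scrap(ssr=872.4, sst=1023.6, df_error=17, df_total=24)
print(f"SSE = {sse:.1f}, regression df = {df_reg}, observed F = {f_observed:.2f}")
# Compare f_observed with the F-table value for (df_reg, 17) degrees of freedom at 0.05.
```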
SC 13-5 A New England-based commuter airline has taken a survey of its 15 terminals and has
obtained the following data for the month of February, where
SALES = total revenue based on number of tickets sold (in thousands of dollars)
PROMOT = amount spent on promoting the airline in the area (in thousands of dollars)
COMP = number of competing airlines at that terminal
FREE = the percentage of passengers who flew free (for various reasons)

Sales ($) Promot ($) Comp Free
79.3 2.5 10 3
200.1 5.5 8 6
163.2 6.0 12 9
200.1 7.9 7 16
146.0 5.2 8 15
177.7 7.6 12 9
30.9 2.0 12 8
291.9 9.0 5 10
160.0 4.0 8 4
339.4 9.6 5 16
159.6 5.5 11 7
86.3 3.0 12 6
237.5 6.0 6 10
107.2 5.0 10 4
155.0 3.5 10 4

(a) Use the following Minitab output to determine the best-fitting regression equation for the
airline:
The regression equation is
SALES = 172 + 25.9 PROMOT - 13.2 COMP - 3.04 FREE

Predictor Coef Stdev t-ratio p
Constant 172.34 51.38 3.35 0.006
PROMOT 25.950 4.877 5.32 0.000
COMP -13.238 3.686 -3.59 0.004
FREE -3.041 2.342 -1.30 0.221

(b) Do the passengers who fly free cause sales to decrease significantly? State and test
appropriate hypotheses. Use α = 0.05.
(c) Does an increase in promotions by $1,000 change sales by $28,000, or is the change
significantly different from $28,000? State and test appropriate hypotheses. Use α = 0.10.
(d) Give a 90 percent confidence interval for the slope coefficient of COMP.
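
Parts (b) through (d) all rest on the same two quantities: a standardized coefficient, t = (b - B) / s_b, and an interval b ± t·s_b. A small sketch follows; the function names are illustrative, and the critical value 1.796 is an assumed t-table lookup for 11 degrees of freedom (15 terminals minus 3 predictors minus 1), not something given in the exercise.

```python
# Illustrative sketch for SC 13-5(b)-(d). Coefficients and standard errors come
# from the Minitab output; the critical value is an assumed t-table lookup
# (11 degrees of freedom = 15 terminals - 3 predictors - 1).

def t_statistic(coef, std_error, hypothesized=0.0):
    """Standardized distance between the estimate and the hypothesized slope."""
    return (coef - hypothesized) / std_error


def confidence_interval(coef, std_error, t_critical):
    half_width = t_critical * std_error
    return coef - half_width, coef + half_width


T_CRIT_90_DF11 = 1.796  # assumed from a t table: 5% in each tail, 11 df

t_free = t_statistic(-3.041, 2.342)                       # H0: B_FREE = 0
t_promot = t_statistic(25.950, 4.877, hypothesized=28.0)  # H0: B_PROMOT = 28
comp_low, comp_high = confidence_interval(-13.238, 3.686, T_CRIT_90_DF11)

print(f"t for FREE = {t_free:.2f}, t for PROMOT vs 28 = {t_promot:.2f}")
print(f"90% interval for COMP slope: ({comp_low:.2f}, {comp_high:.2f})")
```

Whether each t value is "significant" still depends on the direction of the alternative hypothesis: part (b) is one-tailed, part (c) is two-tailed.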

Applications
13-22 Mark Lowtown publishes the Mosquito Junction Enquirer and is having difficulty predicting
the amount of newsprint needed each day. He has randomly selected 27 days over the past
year and recorded the following information:
POUNDS = pounds of newsprint for that day’s newspaper
CLASFIED = number of classified advertisements
DISPLAY = number of display advertisements
FULLPAGE = number of full-page advertisements

Using Minitab to regress POUNDS on the other three variables, Mark got the output that
follows.
Predictor Coef Stdev t-ratio p
Constant 1072.95 872.43 1.23 0.232
CLASFIED 0.251 0.126 1.99 0.060
DISPLAY 1.250 0.884 1.41 0.172
FULLPAGE 250.66 67.92 3.69 0.001

(a) Mark had always felt that each display advertisement used at least 3 pounds of
newsprint. Does the regression give him significant reason to doubt this belief at the 5
percent level?
(b) Similarly, Mark had always felt that each classified advertisement used roughly half a
pound of newsprint. Does he now have significant reason to doubt this belief at the
5 percent level?
(c) Mark sells full-page advertising space to the local merchants for $30 per page. Should he
consider adjusting his rates if newsprint costs him 9¢ per pound? Assume other costs are
negligible. State explicit hypotheses and an explicit conclusion.  (Hint: Holding all else
constant, each additional full-page ad uses 250.66 pounds of paper × $0.09 per pound =
$22.56 cost. Breakeven is at 333.333 pounds. Why? Thus, if the slope coefficient for
FULLPAGE is significantly above 333.333, Mark is not making a profit and his rates
should be changed.)
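
The arithmetic behind the hint can be laid out step by step. The sketch below uses only figures stated in the exercise and the FULLPAGE row of the Minitab output; the variable names are illustrative.

```python
# Illustrative sketch for the hint in 13-22(c). Prices come from the exercise;
# variable names are hypothetical.

PRICE_PER_PAGE = 30.00            # dollars charged per full-page ad
COST_PER_POUND = 0.09             # dollars per pound of newsprint
SLOPE, STD_ERROR = 250.66, 67.92  # FULLPAGE row of the Minitab output

breakeven_pounds = PRICE_PER_PAGE / COST_PER_POUND   # pounds at which cost equals revenue
estimated_cost = SLOPE * COST_PER_POUND              # estimated newsprint cost per ad
t_observed = (SLOPE - breakeven_pounds) / STD_ERROR  # distance from breakeven in std errors

print(f"Breakeven: {breakeven_pounds:.3f} pounds per full-page ad")
print(f"Estimated cost per ad: ${estimated_cost:.2f}")
print(f"t statistic against breakeven: {t_observed:.2f}")
```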
13-23 Refer to Exercise 13-18. At a significance level of 0.10, which variables are significant
explanatory variables for exam scores? (There were 12 students in the sample.)
13-24 Refer to Exercise 13-18. The following additional output was provided by Minitab when Bill
ran the multiple regression:
Analysis of Variance
SOURCE DF SS MS F p
Regression 4 3134.42 783.60
Error 7 951.25 135.89
Total 11 4085.67

(a) What is the observed value of F?
(b) At a significance level of 0.05, what is the appropriate critical value of F to use in
determining whether the regression as a whole is significant?
(c) Based on your answers to (a) and (b), is the regression significant as a whole?
13-25 Refer to Exercise 13-19. At a significance level of 0.01, is DISTANCE a significant
explanatory variable for SALES?
13-26 Refer to Exercise 13-19. The following additional output was provided by Minitab when the
multiple regression was run:
Analysis of Variance

SOURCE DF SS MS F p
Regression 4 2861495 715374 102.39 0.000
Error 18 125761 6896.7
Total 22 2987256

At the 0.05 level of significance, is the regression significant as a whole?
13-27 Henry Lander is director of production for the Alecos Corporation of Caracas, Venezuela.
Henry has asked you to help him determine a formula for predicting absenteeism in a
meatpacking facility. He hypothesizes that percentage absenteeism can be explained by
average daily temperature. Data are gathered for several months, you run the simple
regression, and you find that temperature explains 66 percent of the variation in
absenteeism. But Henry is not convinced that this is a satisfactory predictor. He suggests
that daily rainfall may also have something to do with absenteeism. So you gather data, run
a regression of absenteeism on rainfall, and get an r² of 0.59. “Eureka!” you cry. “I’ve got
it! With one predictor that explains 66 percent and another that explains 59 percent, all I
have to do is run a multiple regression using both predictors, and I’ll surely have an almost
perfect predictor!” To your dismay, however, the multiple regression has an R² of only 68
percent, which is just slightly better than the temperature variable alone. How can you
account for this apparent discrepancy?
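
One way to explore the question numerically is the standard identity that expresses the R² of a two-predictor regression in terms of the three pairwise correlations. The sketch below assumes both predictors correlate positively with absenteeism and tries a few hypothetical values for the temperature-rainfall correlation, which the exercise does not report.

```python
import math


def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 of Y on X1 and X2 from the pairwise correlations (requires |r_12| < 1)."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


r_temp = math.sqrt(0.66)  # assumed positive correlation of absenteeism with temperature
r_rain = math.sqrt(0.59)  # assumed positive correlation of absenteeism with rainfall

for r_12 in (0.80, 0.85, 0.90):  # hypothetical temperature-rainfall correlations
    value = r_squared_two_predictors(r_temp, r_rain, r_12)
    print(f"r_12 = {r_12:.2f} -> R^2 = {value:.3f}")
```

Under these assumptions, the combined R² stays close to the better single-predictor value whenever the two predictors are strongly correlated with each other.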
13-28 Juan Armenlegg, manager of Rocky’s Diamond and Jewelry Store, is interested in developing
a model to estimate consumer demand for his rather expensive merchandise. Because most
customers buy diamonds and jewelry on credit, Juan is sure that two factors that must
influence consumer demand are the current annual inflation rate and the current prime
lending rate at the leading banks in the country. Explain some of the problems that Juan might
encounter if he were to set up a regression model based on his two predictor variables.
13-29 A new game show, Check That Model, asks contestants to specify the minimum number of
parameters they need to determine whether a multiple regression model is significant as a
whole at α = 0.01. You have won the bidding with 4 parameters. Using the information
below, determine whether the regression is significant.

R² = 0.7452

SSE = 125.4

n = 18

Number of independent variables = 3
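
The overall F ratio can be written directly in terms of R², the sample size n, and the number of predictors k: F = (R²/k) / ((1 - R²)/(n - k - 1)). A minimal sketch with a hypothetical function name follows; the critical value at the 0.01 level still has to come from an F table.

```python
def f_from_r_squared(r_squared, n, k):
    """Overall F: explained variation per predictor over unexplained variation per error df."""
    df_error = n - k - 1
    f_value = (r_squared / k) / ((1 - r_squared) / df_error)
    return f_value, k, df_error


f_observed, df_num, df_den = f_from_r_squared(0.7452, n=18, k=3)
print(f"Observed F = {f_observed:.2f} with ({df_num}, {df_den}) degrees of freedom")
```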

13-30 The Scottish Tourist Agency is interested in the number of tourists who enter the country
weekly during the high season (Y). Data have been collected and are presented below:

Tourists (Y) = Number of tourists who entered Scotland in a week (in thousands)
Rate (X1) = Number of Scottish pounds purchased for $1 U.S.
Price (X2) = Number of Scottish pounds charged for round-trip bus fare from London to Edinburgh
Promot (X3) = Amount spent on promoting the country (in thousands of Scottish pounds)
Temp (X4) = Mean temperature during the week in Edinburgh (in degrees Celsius)

Tourists (Y) Rate (X1) Price (X2) Promot (X3) Temp (X4)
6.9 0.61 40 8.7 15.4
7.1 0.59 40 8.8 15.6
6.8 0.63 40 8.5 15.4
7.9 0.61 35 8.6 15.3
7.6 0.6 35 9.4 15.8
8.2 0.65 35 9.9 16.2
8.0 0.58 35 9.8 16.4
8.4 0.59 35 10.2 16.6
9.7 0.61 30 11.4 17.4
9.8 0.62 30 11.6 17.2
7.2 0.57 40 8.4 17.6
6.7 0.55 40 8.6 16.4

(a) Using whatever computer package is available, determine the best-fitting regression
equation for the tourist agency.
(b) Is the currency exchange rate a significant explanatory variable? State and test the
appropriate hypotheses at a 0.10 significance level.
(c) Does an increase in promotions by one thousand pounds increase the number of tourists
by more than 200? State and test appropriate hypotheses at a 0.05 significance level.
(d) Give a 95 percent confidence interval for the slope coefficient of Temp.

Exercise 13.5

Self-Check Exercises
SC 13-6 Cindy’s, a popular fast-food chain, has recently experienced a marked change in its sales as a
result of a very successful advertising campaign. As a result, management is now looking for
a new regression model for its sales. The following data have been collected in the 12 weeks
since the advertising campaign began.
Time Sales (in thousands) Time Sales (in thousands)
1 4,618 7 19,746
2 3,741 8 34,215
3 5,836 9 50,306
4 4,367 10 65,717
5 5,118 11 86,434
6 8,887 12 105,464

(a) Use the following Minitab output to determine the best-fitting regression of SALES on TIME:
The regression equation is
SALES = -26233 + 9093 TIME
Predictor Coef Stdev t-ratio p
Constant –26233 9551 –2.75 0.021
TIME 9093 1298 7.01 0.000
s = 15518   R-sq = 83.1%

ROW SALES FITS1 RESI1 ROW SALES FITS1 RESI1
1 4618 –17140 21758 7 19746 37417 –17671
2 3741 –8047 11788 8 34215 46510 –12295
3 5836 1046 4790 9 50306 55603 –5297
4 4367 10139 –5772 10 65717 64696 1021
5 5118 19231 –14113 11 86434 73789 12645
6 8887 28324 –19437 12 105464 82881 22583

(b) Are you satisfied with your model as a predictor of SALES? Explain.
(c) The following output uses TIME and TIMESQR (TIME squared) as explanatory
variables. Is this quadratic model a better fit to the data? Explain.
The regression equation is
SALES = 13981 - 8142 TIME + 1326 TIMESQR
Predictor Coef Stdev t-ratio p
Constant 13981 2720 5.14 0.000
TIME –8141.5 961.9 –8.46 0.000
TIMESQR 1325.72 72.03 18.41 0.000
s = 2631    R-sq = 99.6%

ROW SALES FITS2 RESI2 ROW SALES FITS2 RESI2
1 4618 7165 –2547 7 19746 21950 –2204
2 3741 3001 740 8 34215 33695 520
3 5836 1488 4348 9 50306 48090 2216
4 4367 2626 1741 10 65717 65138 579
5 5118 6416 –1298 11 86434 84836 1598
6 8887 12858 –3971 12 105464 107186 –1722
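
The two residual tables can be reproduced, up to rounding, by evaluating each Minitab equation on the 12 weeks of data. The sketch below does that and prints the sign pattern of the residuals alongside the sum of squared errors; the function names are illustrative, and the coefficients are copied from the two outputs in this exercise.

```python
# Illustrative sketch for SC 13-6: evaluate both Minitab equations on the 12 weeks
# of data and compare the residuals. Function names are hypothetical.

SALES = [4618, 3741, 5836, 4367, 5118, 8887, 19746, 34215, 50306, 65717, 86434, 105464]


def linear_fit(t):
    return -26233 + 9093 * t


def quadratic_fit(t):
    return 13981 - 8141.5 * t + 1325.72 * t**2


def residuals(fit):
    return [y - fit(t) for t, y in enumerate(SALES, start=1)]


def sum_squared_errors(fit):
    return sum(e**2 for e in residuals(fit))


for name, fit in (("linear", linear_fit), ("quadratic", quadratic_fit)):
    signs = "".join("+" if e > 0 else "-" for e in residuals(fit))
    print(f"{name:9s} SSE = {sum_squared_errors(fit):.3e}  residual signs: {signs}")
```

Long runs of same-signed residuals are the kind of nonrandom pattern that part (b) asks about; a model that fits well tends to scatter its signs.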

SC 13-7 Below are some data on consumption expenditures, CONSUMP; disposable income,
INCOME; and sex of the head of household, SEX, of 12 randomly chosen families. The
variable GENDER has been coded:
GENDER = 1 if SEX = 'M' (male), 0 if SEX = 'F' (female)

Consump Income ($) Sex Gender
37,070 45,100 M 1
22,700 28,070 M 1
24,260 26,080 F 0
30,420 35,000 M 1
17,360 18,860 F 0
33,520 41,270 M 1
26,960 32,940 M 1
19,360 21,440 F 0
35,680 44,700 M 1
22,360 24,400 F 0
28,640 33,620 F 0
39,720 46,000 M 1

(a) Use the following Minitab output to determine the best-fitting regression to predict
CONSUMP from INCOME and GENDER.
The regression equation is
CONSUMP = 2036 + 0.818 INCOME - 1664 GENDER
Predictor Coef Stdev t-ratio p
Constant 2036 1310 1.55 0.155
INCOME 0.81831 0.04940 16.56 0.000
GENDER –1664.2 916.9 –1.82 0.103
s = 1015    R-sq = 98.4%

(b) If disposable income is held constant, is there a significant difference in consumption
between households headed by a male versus those where the head of household is
female? State explicit hypotheses, test them at the 0.10 level, and state an explicit
conclusion.
(c) Give an approximate 95 percent confidence interval for consumption for a household
with disposable income of $40,000 headed by a male.
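
A dummy variable shifts the intercept of the fitted line for one group while leaving the income slope unchanged. The sketch below encodes SEX as the 0/1 variable GENDER, using the coding implied by the data table, and applies the Minitab coefficients; the names are illustrative.

```python
# Illustrative sketch for SC 13-7: encode SEX as the 0/1 dummy GENDER and apply
# the fitted equation. Coefficients are from the Minitab output; names are hypothetical.

GENDER_CODE = {"M": 1, "F": 0}  # coding implied by the data table


def predict_consumption(income, sex):
    return 2036 + 0.81831 * income - 1664.2 * GENDER_CODE[sex]


male = predict_consumption(40_000, "M")
female = predict_consumption(40_000, "F")
print(f"Male-headed household:   {male:,.2f}")
print(f"Female-headed household: {female:,.2f}")
print(f"Shift attributed to GENDER at equal income: {male - female:,.2f}")
```

The point estimate for the male-headed household is the center of the interval asked for in part (c); the half-width would come from a t-table value multiplied by the standard error of estimate s.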

Basic Concepts
13-31 Describe three situations in everyday life in which dummy variables could be used in
regression models.
13-32 A restaurant owner with restaurants in two cities believes that revenue can be predicted from
traffic flow in front of the restaurant with a quadratic model.
(a) Describe a quadratic model to predict revenue from traffic flow. State the form of the
regression equation.
(b) It has been suggested that the city the restaurant is in has an effect on revenue. Extend
your model from part (a) by using a dummy variable to incorporate the suggestion.
Again, state the form of the regression model.
13-33 Suppose you have a set of data points to which you have fitted a linear regression equation.
Even though the R² for the line is very high, you wonder whether it would be a good idea to
fit a second-degree equation to the data. Describe how you would make your decision based
on
(a) A scattergram of the data.
(b) A table of residuals from the linear regression.
13-34 A statistician collected a set of 20 pairs of data points. He called the independent variable X1
and the dependent variable Y. He ran a linear regression of Y on X1, and he was dissatisfied
with the results. Because of some nonrandom patterns he observed in the residuals, he
decided to square the values of X1; he called these squared values X2. The statistician then ran
a multiple regression of Y on both X1 and X2. The resulting equation was
Ŷ = 200.4 + 2.79X1 - 3.92X2

The value of s_b1 was 3.245 and the value of s_b2 was 1.53. At a 0.05 level of significance,
determine whether
(a) The set of unsquared values of X1 is a significant explanatory variable for Y.
(b) The set of squared values of X1 is a significant explanatory variable for Y.

Applications
13-35 Dr. Linda Frazer runs a medical clinic in Philadelphia. She collected data on age, reaction to
penicillin, and systolic blood pressure for 30 patients. She established systolic blood pressure
as the dependent variable, age as X1 (independent variable) and reaction to penicillin as X2
(independent variable). Letting 0 stand for a positive reaction to penicillin and 1 stand for a
negative reaction, she ran a multiple regression on her desktop personal computer. The
predicting equation was
Ŷ = 6.7 + 3.5X1 + 0.489X2

(a) After the regression had already been run, Dr. Frazer discovered that she had meant to
code a positive reaction as 1 and a negative reaction as 0. Does she have to rerun the
regression? If so, why? If not, give her the equation she would have gotten if the variable
had been coded as she had originally intended.
(b) If s_b2 has a value of 0.09, does this regression provide evidence at a significance level of
0.05 that the reaction to penicillin is a significant explanatory variable for systolic blood
pressure?
13-36 Excelsior Notebook computers is reexamining its inventory control policy. They need to
accurately predict the number of the EXC-11E computers that will be ordered by suppliers in
the weeks to come. The data for the last 15 weeks are presented below:
Time Demand (in 1000’s)
1 6.7
2 10.2
3 13.4
4 15.6
5 18.2
6 22.6
7 30.5
8 31.4
9 38.7
10 41.6
11 48.7
12 51.4
13 55.8
14 61.5
15 68.9

(a) Using any available computer package, fit a linear model with TIME as the independent
variable and DEMAND as the dependent variable.
(b) Fit a quadratic model for the data. Is this model better? Explain.
13-37 Below are some data from a local pizza parlor on gross sales (SALES), promotion dollars
(PROMO), and type of promotion, including radio, newspaper, or flyers. Assume the pizza
parlor used only one type of promotion in any given week. The variables Type1 and Type2
have been coded:
TYPE1 = 1 if radio was used, 0 otherwise
TYPE2 = 1 if flyers were used, 0 otherwise

(when both TYPE1 and TYPE2 are 0, that week’s promotion budget was spent on newspaper
advertisements).
SALES (in 100s) PROMO (in 100s) TYPE1 TYPE2
12.1 3.8 0 1
19.1 6.4 0 1
26.9 7.9 0 0
24.8 8.7 1 0
37.1 12.4 1 0
39.4 15.9 0 1
32.5 11.3 0 0
28.9 9.4 0 0
28.8 8.6 1 0
34.7 12.7 0 1
38.4 14.3 0 0
26.3 6.7 1 0

(a) Using any available computer package, fit a regression model to predict SALES from
PROMO, TYPE1, and TYPE2.
(b) State the fitted regression function.
(c) If PROMO is held constant, is there a significant difference between radio and
newspaper? State appropriate hypotheses and test at a 0.05 level of significance.
(d) If PROMO is held constant, is there a significant difference between flyers and
newspaper? State appropriate hypotheses and test at a 0.05 level of significance.
(e) Compute a 90 percent confidence interval for SALES in a week when $800 is spent using
radio advertisements as the only type of promotion.
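For parts (a), (b), and the centre of the interval in part (e), a minimal plain-Python sketch of the fit is given below. It assumes ordinary least squares with an intercept, solved through the normal equations. The function names (solve, fit, predict) are hypothetical, and the sketch does not compute the standard errors that parts (c) through (e) also need.

```python
# Sketch only: ordinary least squares for SALES on PROMO, TYPE1 and TYPE2.
# Function and variable names are illustrative, not from the text.

# (SALES, PROMO, TYPE1, TYPE2); SALES and PROMO are in hundreds of dollars.
ROWS = [
    (12.1, 3.8, 0, 1), (19.1, 6.4, 0, 1), (26.9, 7.9, 0, 0),
    (24.8, 8.7, 1, 0), (37.1, 12.4, 1, 0), (39.4, 15.9, 0, 1),
    (32.5, 11.3, 0, 0), (28.9, 9.4, 0, 0), (28.8, 8.6, 1, 0),
    (34.7, 12.7, 0, 1), (38.4, 14.3, 0, 0), (26.3, 6.7, 1, 0),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(rows):
    """Return [a, b_promo, b_type1, b_type2] from the normal equations."""
    design = [[1.0, promo, t1, t2] for _, promo, t1, t2 in rows]
    sales = [row[0] for row in rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, sales)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, promo, type1, type2):
    """Evaluate the fitted regression function at one set of inputs."""
    a, b_promo, b_type1, b_type2 = coefs
    return a + b_promo * promo + b_type1 * type1 + b_type2 * type2


coefs = fit(ROWS)
residuals = [sales - predict(coefs, promo, t1, t2)
             for sales, promo, t1, t2 in ROWS]
print("a, b_PROMO, b_TYPE1, b_TYPE2 =", coefs)
# Point estimate for part (e): $800 is 8.0 in hundreds; radio means TYPE1 = 1.
print("predicted SALES (in 100s):", predict(coefs, 8.0, 1, 0))
```

Because newspaper is the baseline category, the TYPE1 coefficient estimates the radio-minus-newspaper difference in SALES at a fixed PROMO, and the TYPE2 coefficient does the same for flyers. The hypothesis tests in (c) and (d) compare each of these coefficients with its standard error, which a regression package reports directly.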

Chapter Concepts Test
Circle the correct answer, or fill in the blank. Answers are in the back of the book.
TF 1. The principal advantage of multiple regression over simple regression is that it
allows us to use more of the information available to estimate the dependent
variable.
TF 2. Suppose, in the multiple-regression equation Ŷ = 24.4 + 5.6X1 + 6.8X2, Ŷ stands for
weight (in pounds) and X2 stands for age (in years). For each additional year of age,
then, it can be expected that weight will increase by 24.4 pounds.
TF 3. Although it is theoretically possible to do multiple-regression calculations by hand,
we seldom do so.
TF 4. Suppose you are attempting to form a confidence interval for a value of Y from a
multiple-regression equation. If there are 20 elements in the sample and 4
independent variables are used in the regression, you should use 16 degrees of
freedom when you get a value from the t table.
TF 5. The standard error of the coefficient b2 in a multiple regression is denoted s2.
TF 6. Suppose we wish to test whether the values of Y in a multiple regression really
depend on the values of X1. The null hypothesis for our test will be B1 = 0.
TF 7. To determine whether a regression is significant as a whole, an observed value of F
is calculated and compared to a value from a table.
TF 8. If one knows the total sum of squares and regression sum of squares for a multiple
regression, the error sum of squares can always be quickly calculated.
TF 9. Certain patterns in the signs of the residuals from a second-degree regression model
indicate that we should instead use a straight-line model.
TF 10. Simple regressions of Y on X1 and Y on X2 show that X1 and X2 are both significant
explanatory variables for Y. But a multiple regression of Y on X1 and X2 says that
neither X1 nor X2 is a significant explanatory variable for Y. Clearly, this is a case of
multicollinearity.
TF 11. Dummy variables are a technique that can be used to incorporate qualitative data
into multiple regressions.
TF 12. When using a dummy variable with values of 0 and 1, it is very important to make
sure that the 0’s and 1’s are used according to standard practice. Reversing the
coding will completely destroy the results of the multiple regression.
TF 13. We can form a second-degree regression model by multiplying observed values of
an independent variable by 2.
TF 14. Adding additional variables to a multiple regression will always reduce the standard
error of estimate.
TF 15. Suppose a multiple regression yielded this equation: Ŷ = 5.6 + 2.8X1 – 3.9X2 + 5.6X3.
If X1, X2, and X3 all had values of zero, then Y could be expected to have a value of
5.6.
TF 16. The analysis of the residuals in a straight-line regression model is done to determine
the correct value for se.
TF 17. Although it is possible to make inferences about the regression as a whole, it is not
possible to make inferences about the estimated regression coefficients.
TF 18. If there is a high level of correlation between explanatory variables, it is usually
possible to disentangle the separate contributions of these variables in a regression.
TF 19. The standard error of the population data points is denoted se.
TF 20. If a regression includes all relevant explanatory factors, the residuals should be
random.
TF 21. A linear relationship between explanatory variables will always produce
multicollinearity in the regression model.
A B C D E 22. Suppose that a multiple regression yielded this equation: Ŷ = 51.21 + 6.88X1 + 7.06X2
– 3.71X3. The value of b2 for this equation is:
(a) 51.21.
(b) 6.88.
(c) 7.06.
(d) – 3.71.
(e) Cannot be determined from information given.
A B C D 23. We have said that the standard error of estimate has n – k – 1 degrees of freedom.
What does the k stand for in this expression?
(a) Number of elements in the sample.
(b) Number of independent variables in the multiple regression.
(c) Mean of the sample values of the dependent variable.
(d) None of these.
A B C D E 24. Suppose that you have run a multiple regression and have found that the value of b1
is 1.66. Historical data, however, indicate that the value of B1 should be 1.34. You
wish to test, at a 0.05 level of significance, the null hypothesis that B1 is still 1.34.
Assuming that you have access to any tables you may need, what other information
is required for you to perform your test?
(a) Degrees of freedom.
(b)
(c) se .
(d) (a) and (b) but not (c).
(e) (a) and (c) but not (b).
A B C D 25. Suppose that a toy manufacturer wishes to determine whether his red toys sell better
than his blue toys. He gathered data regarding sales levels, color, price, and average
age levels for which the toys are intended. He entered these into a computer run.
The resulting multiple-regression equation was Ŷ = 70,663 – 713X1 – 59.6X2 + 66.4X3,
where Ŷ refers to sales levels in units, X1 refers to color (0 = blue, 1 = red), X2 refers
to retail price (in dollars), and X3 refers to average age level (in years). Which of the
following is true if factors of price and age level are held constant?
(a) Red toys should sell 713 more units than blue toys.
(b) Red toys should sell 713 fewer units than blue toys.
(c) Children will always choose a blue toy over a red one.
(d) (b) and (c) but not (a).

Questions 26 through 31 deal with a director of personnel who is trying to
determine a predicting equation for longevity in his plant. He has used Minitab
to regress months employed for several employees on their education levels
(years of schooling), age when hired, score on the company’s psychological
maturity test, and number of dependents (including the employee). Here are his
results:

The regression equation is
LONGEV = 82.2 – 1.55 SCHOOL – 1.69 AGE + 0.11 SCORE + 6.88 DEPENDEN
Predictor Coef Stdev t-ratio p
Constant 82.237 81.738 1.01 0.361
SCHOOL –1.553 4.362 –0.36 0.736
AGE –1.685 1.253 –1.35 0.236
SCORE 0.110 0.291 0.38 0.720
DEPENDEN 6.876 7.658 0.89 0.410
s = 13.4  R-sq = 89.1%

Analysis of Variance
SOURCE DF SS MS F p
Regression 4 7325.33 1831.33 10.19 0.013
Error 5 898.28 179.66
Total 9 8223.60
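The entries in this output are tied together by a few identities: each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the two mean squares, R-sq is the regression share of the total sum of squares, and s is the square root of the error mean square. The short Python sketch below rechecks the printed figures from the SS and DF columns. The variable names are illustrative assumptions, and small differences in the last digit come from rounding in the printout.

```python
# Sketch only: recompute the summary figures from the ANOVA table above.
# Inputs are copied from the printed output; variable names are illustrative.
import math

df_regression, ss_regression = 4, 7325.33
df_error, ss_error = 5, 898.28

ms_regression = ss_regression / df_regression   # mean square for regression
ms_error = ss_error / df_error                  # mean square for error
f_stat = ms_regression / ms_error               # observed F for the regression
ss_total = ss_regression + ss_error             # SST = SSR + SSE
r_sq = ss_regression / ss_total                 # coefficient of multiple determination
s = math.sqrt(ms_error)                         # standard error of estimate

print(f"MS regression = {ms_regression:.2f}")
print(f"MS error      = {ms_error:.2f}")
print(f"F             = {f_stat:.2f}")
print(f"R-sq          = {100 * r_sq:.1f}%")
print(f"s             = {s:.1f}")
```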

A B C D 26. The regression equation for these data is:
(a) Ŷ = 82.24 – 1.55X1 – 1.69X2 + 0.11X3 + 6.88X4.
(b) Ŷ = 13.40 – 1.55X1 – 1.69X2 + 0.11X3 + 6.88X4.
(c) Ŷ = 81.74 + 4.36X1 + 1.25X2 + 0.29X3 + 7.66X4.
(d) Ŷ = 82.24 – 0.36X1 – 1.35X2 + 0.38X3 + 0.90X4.
A B C D 27. How much of the variation in length of employment is explained by the regression?
(a) 94 percent.
(b) 82 percent.
(c) 89 percent.
(d) 13 percent.
A B C D 28. Suppose you wish to test whether years of school is a significant explanatory
variable for longevity. The degrees of freedom you would use would be:
(a) 4.
(b) 10.
(c) 6.
(d) 5.
A B C D 29. What is the value of sb3?
(a) 13.4.
(b) 0.29.
(c) 0.38.
(d) 0.11.
A B C D 30. How many denominator degrees of freedom would there be for an F test to
determine whether this regression was significant as a whole?
(a) 5.
(b) 4.
(c) 9.
(d) 10.
A B C D 31. How many data points did the director enter?
(a) 9.
(b) 10.
(c) 18.
(d) 19.
A B C D 32. In the equation Y = A + B1 X1 + B2X2, Y is independent of X1 if:
(a) B2 = 0.
(b) B2 = –1.
(c) B1 = 1.
(d) None of these.
A B C D E 33. A normal distribution can be used to approximate the t distribution for multiple
regression whenever the degrees of freedom (n minus the number of estimated
regression coefficients) are:
(a) Less than 40.
(b) More than 10.
(c) Equal to 5.
(d) More than 50.
(e) None of these.
A B C D E 34. Because r² = 1 – Σ(Y – Ŷ)²/Σ(Y – Ȳ)², r² is equivalent to:
(a) 1 – SSR/SST.
(b) 1 – SSE/SST.
(c) 1 – SSE/SSR.
(d) 1 – SST/SSR.
(e) 1 – SST/SSE.
A B C D 35. For the multiple regression Ŷ = a + b1X1 + b2X2 used to estimate Y = A + B1X1 + B2X2, the
form of a plausible confidence interval for B1 is:
(a) B1 – tsb1, B1 + tsb1.
(b) B1 – tse, B1 + tse.
(c) b1 – tsb1, b1 + tsb1.
(d) b1 – tse, b1 + tse.
A B C D 36. Signs of the possible presence of multicollinearity in a multiple regression are:
(a) Significant t values for the coefficients.
(b) Low standard errors for the coefficients.
(c) A sharp increase in a t value for the coefficient of an explanatory variable
when another variable is removed from the model.
(d) All of the above.
37. ____________________ are methods for deciding which variables to include in a
regression model and the different ways in which they can be included.
38. Mathematical manipulations for converting a variable into a different form so that
we can fit regression curves are called ____________________.
39. The ____________________ is a statistic used to test the significance of a
regression as a whole.
40. A ____________________ variable takes on the values 0 and 1 to describe
qualitative data.
41. A measure of our uncertainty about the exact value of a multiple-regression
coefficient is the ____________________ of the coefficient.
42. The coefficient of multiple determination in multiple regression measures the
____________________.
43. The significance of a multiple regression can be tested with the null hypothesis
____________________, which indicates that Y does not depend on the Xi’s.
44. The standard error se is also called the ____________________.
45. Alternating strings of consecutive ____________________ with like sign in a linear
regression model indicate that the data might better fit a curve than a straight line.
