Assignment 3
Prof. Arnab
SC 12-1 An instructor is interested in finding out how the number of students absent on a given day is
related to the mean temperature that day. A random sample of 10 days was used for the study.
The following data indicate the number of students absent (ABS) and the mean temperature
(TEMP) for each day.
ABS 8 7 5 4 2 3 5 6 8 9
TEMP 10 20 25 30 40 45 50 55 59 60
(a) State the dependent (Y) variable and the independent (X) variable.
(b) Draw a scatter diagram of these data.
(c) Does the relationship between the variables appear to be linear or curvilinear?
(d) What type of curve could you draw through the data?
(e) What is the logical explanation for the observed relationship?
12-1 What is regression analysis?
12-2 In regression analysis, what is an estimating equation?
12-3 What is the purpose of correlation analysis?
12-4 Define direct and inverse relationships.
12-5 To what does the term causal relationship refer?
12-6 Explain the difference between linear and curvilinear relationships.
12-7 Explain why and how we construct a scatter diagram.
12-8 What is multiple-regression analysis?
12-9 For each of the following scatter diagrams, indicate whether a relationship exists and, if so,
whether it is direct or inverse and linear or curvilinear.
12-10 A professor is trying to show his students the importance of quizzes even though 90 percent of
the final grade is determined by exams. He believes that the higher the quiz grade, the higher
the final grade. A random sample of 15 students in his class was selected with the data given
below:
Quiz Average Final Average
59 65
92 84
72 77
90 80
95 77
87 81
89 80
77 84
76 80
65 69
97 83
42 40
94 78
62 65
91 90
(a) State the dependent (Y) variable and the independent (X) variable.
(b) Draw a scatter diagram of these data.
(c) Does the relationship between the variables appear to be linear or curvilinear?
(d) Does the professor’s belief appear to be justified? Explain your reasoning.
12-11 William Hawkins, VP of personnel for International Motors, is working on the relationship
between a worker’s salary and absentee rate. Hawkins divided the salary range of International
into twelve grades or levels (1 being the lowest grade, 12 the highest) and then randomly
sampled a group of workers. He determined the salary grade for each worker and the number
of days that employee had missed over the last 3 years.
Salary ranking 11 10 8 5 9 9 7 3
Absences 18 17 29 36 11 26 28 35
Salary ranking 11 8 7 2 9 8 6 3
Absences 14 20 32 39 16 26 31 40
Construct a scatter diagram for these data and indicate the type of relationship.
12-12 The National Institute of Environmental Health Sciences (NIEHS) has been studying the
statistical relationships between many different variables and the common cold. One of the
variables being examined is the use of facial tissues (X) and the number of days that cold
symptoms were exhibited (Y) by seven people over a 12-month period. What relationship, if
any, seems to hold between the two variables? Does this indicate any causal effect?
X 2,000 1,500 500 750 600 900 1,000
Y 60 40 10 15 5 25 30
SC 12-2 For the following set of data:
(a) Plot the scatter diagram.
(b) Develop the estimating equation that best describes the data.
(c) Predict Y for X = 10, 15, 20.
X 13 6 14 11 17 9 13 17 18 12
Y 6.2 8.6 7.2 4.5 9.0 3.5 6.5 9.3 9.5 5.7
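As a way to check hand calculations for SC 12-2, the following is a minimal sketch of the usual least-squares formulas (slope = sum of cross-products about the means divided by the sum of squared X deviations; intercept = Y-bar minus slope times X-bar). The function name fit_line and the variable names are illustrative choices, not part of the exercise.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y-hat = a + b*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and of squared deviations about the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Data from SC 12-2.
xs = [13, 6, 14, 11, 17, 9, 13, 17, 18, 12]
ys = [6.2, 8.6, 7.2, 4.5, 9.0, 3.5, 6.5, 9.3, 9.5, 5.7]

a, b = fit_line(xs, ys)
predictions = {x0: a + b * x0 for x0 in (10, 15, 20)}

print(f"Y-hat = {a:.4f} + {b:.4f} X")
for x0, y0 in predictions.items():
    print(f"X = {x0}: predicted Y = {y0:.4f}")
```

The same function can be reused for any of the two-variable exercises by swapping in the relevant X and Y lists.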
SC 12-3 Cost accountants often estimate overhead based on the level of production. At the Standard
Knitting Co., they have collected information on overhead expenses and units produced at
different plants, and want to estimate a regression equation to predict future overhead.
Overhead 191 170 272 155 280 173 234 116 153 178
Units 40 42 53 35 56 39 48 30 37 40
Basic Concepts
12-13 For the following data:
(a) Plot the scatter diagram.
(b) Develop the estimating equation that best describes the data.
(c) Predict Y for X = 6, 13.4, 20.5.
X 2.7 4.8 5.6 18.4 19.6 21.5 18.7 14.3
12-16 Sales of major appliances vary with the new housing market: when new home sales are good,
so are the sales of dishwashers, washing machines, driers, and refrigerators. A trade association
compiled the following historical data (in thousands of units) on major appliance sales and
housing starts:
Housing Starts Appliance Sales
(thousands) (thousands)
2.0 5.0
2.5 5.5
3.2 6.0
3.6 7.0
3.3 7.2
4.0 7.7
4.2 8.4
4.6 9.0
4.8 9.7
5.0 10.0
(a) Develop an equation for the relationship between appliance sales (in thousands) and
housing starts (in thousands).
(b) Interpret the slope of the regression line.
(c) Compute and interpret the standard error of estimate.
(d) Housing starts next year may be beyond the recorded range; estimates as high as 8.0 million
units have been predicted. Compute an approximate 90 percent prediction interval for
appliance sales, based on the previous data and the new prediction of housing starts.
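For parts (c) and (d) of 12-16, a sketch along the following lines may help. It assumes the standard error of estimate is computed as the square root of SSE / (n - 2), and that the "approximate" interval is the simple form Y-hat plus or minus t times that standard error (which ignores the extra uncertainty from extrapolating beyond the data). The function names, the value x0 = 8.0 (taken in the same units as the housing-starts column), and the t value 1.860 (the table value for 8 degrees of freedom and 90 percent) are assumptions of this sketch.

```python
import math


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y-hat = a + b*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def standard_error_of_estimate(xs, ys):
    """Square root of SSE / (n - 2) for the fitted line."""
    a, b = fit_line(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))


def approximate_interval(xs, ys, x0, t_value):
    """Approximate interval Y-hat +/- t * s_e at X = x0 (assumed form)."""
    a, b = fit_line(xs, ys)
    se = standard_error_of_estimate(xs, ys)
    y_hat = a + b * x0
    return y_hat - t_value * se, y_hat + t_value * se


# Data from 12-16 (both columns in thousands of units).
starts = [2.0, 2.5, 3.2, 3.6, 3.3, 4.0, 4.2, 4.6, 4.8, 5.0]
sales = [5.0, 5.5, 6.0, 7.0, 7.2, 7.7, 8.4, 9.0, 9.7, 10.0]

a, b = fit_line(starts, sales)
se = standard_error_of_estimate(starts, sales)
low, high = approximate_interval(starts, sales, x0=8.0, t_value=1.860)

print(f"Y-hat = {a:.4f} + {b:.4f} X, s_e = {se:.4f}")
print(f"Approximate 90% interval at X = 8.0: ({low:.2f}, {high:.2f})")
```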
12-17 During recent tennis matches, Diane has noticed that her lobs have been less than totally effective
because her opponents have been returning more of them. Some of the people she plays are quite
tall, so she was wondering whether the height of her opponent could be used to explain the number
of lobs not returned during a match. The following data were collected from five recent matches.
Opponent’s Height (H) Unreturned Lobs (L)
5.0 9
5.5 6
6.0 3
6.5 0
5.0 7
Passengers per 100 miles 800 780 780 660 640 600 620 620
Degree of arousal 39 38 16 18 41 45 25 38
Trainee Test Score Units Sold
1 2.6 95
2 3.7 140
3 2.4 85
4 4.5 180
5 2.6 100
6 5.0 195
7 2.8 115
8 3.0 136
9 4.0 175
10 3.4 150
(a) Find the least-squares regression line that could be used to predict sales from trainee test
scores.
(b) How much does the expected number of units sold increase for each 1-point increase in a
trainee’s test score?
(c) Use the least-squares regression line to predict the number of units that would be sold by
a trainee who received an average test score.
12-22 The city council of Bowie, Maryland, has gathered data on the number of minor traffic
accidents and the number of youth soccer games that occur in town over a weekend.
X (soccer games) 20 30 10 12 15 25 34
Y (minor accidents) 6 9 4 5 7 8 9
Percentage of Dangerous Pollutants 35.9 31.8 24.7 25.2 36.8 35.8 33.4
SC 12-4 Campus Stores has been selling the Believe It or Not: Wonders of Statistics Study Guide for 12
semesters and would like to estimate the relationship between sales and number of sections of
elementary statistics taught in each semester. The following data have been collected:
Sales (units) 33 38 24 61 52 45
Number of sections 3 7 6 6 10 12
Sales (units) 65 82 29 63 50 79
Number of sections 12 13 12 13 14 15
(a) Develop the estimating equation that best fits the data.
(b) Calculate the sample coefficient of determination and the sample coefficient of correlation.
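One possible way to check SC 12-4 numerically is sketched below. It takes the number of sections as X and sales as Y, computes the coefficient of determination as 1 - SSE/SST, and gives the coefficient of correlation the sign of the slope. The function names are illustrative, not from the exercise.

```python
import math


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y-hat = a + b*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def determination_and_correlation(xs, ys):
    """Return (r2, r): r2 = 1 - SSE/SST, r carries the sign of the slope."""
    a, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                    # total
    r2 = 1 - sse / sst
    r = math.copysign(math.sqrt(r2), b)
    return r2, r


# Data from SC 12-4: all 12 semesters, in the order listed.
sections = [3, 7, 6, 6, 10, 12, 12, 13, 12, 13, 14, 15]
sales = [33, 38, 24, 61, 52, 45, 65, 82, 29, 63, 50, 79]

a, b = fit_line(sections, sales)
r2, r = determination_and_correlation(sections, sales)

print(f"Y-hat = {a:.4f} + {b:.4f} X")
print(f"r^2 = {r2:.4f}, r = {r:.4f}")
```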
SC 12-5 Calculate the sample coefficient of determination and the sample coefficient of correlation for
the data in Exercise SC 12-3.
12-25 What type of correlation (positive, negative, or zero) should we expect from these variables?
(a) Ability of supervisors and output of their subordinates.
(b) Age at first full-time job and number of years of education.
(c) Weight and blood pressure.
(d) College grade-point average and student’s height.
In the following exercises, calculate the sample coefficient of determination and the sample
coefficient of correlation for the problems specified.
12-26 Calculate the sample coefficient of determination and the sample coefficient of correlation for
the data in Exercise 12-17.
12-27 Calculate the sample coefficient of determination and the sample coefficient of correlation
for the data in Exercise 12-18.
12-28 Calculate the sample coefficient of determination and the sample coefficient of correlation for
the data in Exercise 12-19.
12-29 Calculate the sample coefficient of determination and the sample coefficient of correlation
for the data in Exercise 12-20.
12-30 Calculate the sample coefficient of determination and the sample coefficient of correlation for
the data in Exercise 12-21.
12-31 Bank of Lincoln is interested in reducing the amount of time people spend waiting to see a
personal banker. The bank is interested in the relationship between waiting time (Y) in
minutes and number of bankers on duty (X). Customers were randomly selected with the data
given below:
X 2 3 5 4 2 6 1 3 4 3 3 2 4
Y 12.8 11.3 3.2 6.4 11.6 3.2 8.7 10.5 8.2 11.3 9.4 12.8 8.2
(a) Calculate the regression equation that best fits the data.
(b) Calculate the sample coefficient of determination and the sample coefficient of correlation.
12-32 Zippy Cola is studying the effect of its latest advertising campaign. People chosen at random
were called and asked how many cans of Zippy Cola they had bought in the past week and how
many Zippy Cola advertisements they had either read or seen in the past week.
X (number of ads) 3 7 4 2 0 4 1 2
Y (cans purchased) 11 18 9 4 7 6 3 8
(a) Develop the estimating equation that best fits the data.
(b) Calculate the sample coefficient of determination and the sample coefficient of
correlation.
25 3.5 5.0
30 6.7 4.2
11 1.5 8.5
22 0.3 1.4
27 4.6 3.6
19 2.0 1.3
SC 13-2 The following information has been gathered from a random sample of apartment renters in a
city. We are trying to predict rent (in dollars per month) based on the size of the apartment
(number of rooms) and the distance from downtown (in miles).
Rent Number of Distance from
($) Rooms Downtown
360 2 1
1,000 6 1
450 3 2
525 4 3
350 2 10
300 1 4
(a) Calculate the least-squares equation that best relates these three variables.
(b) If someone is looking for a two-bedroom apartment 2 miles from downtown, what rent
should he expect to pay?
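For SC 13-2, the least-squares plane can be checked by solving the normal equations (X'X)b = X'Y directly. The sketch below does this in plain Python with a small Gauss-Jordan solver; the helper names solve_linear_system, fit_plane, and predict are hypothetical, and how a "two-bedroom" apartment maps to a number of rooms is left to the reader.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_plane(rows, ys):
    """Return [a, b1, b2, ...] for Y-hat = a + b1*X1 + b2*X2 + ..."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, row):
    """Evaluate the fitted plane at one (X1, X2, ...) row."""
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], row))


# Data from SC 13-2: (number of rooms, distance from downtown) and rent.
features = [(2, 1), (6, 1), (3, 2), (4, 3), (2, 10), (1, 4)]
rent = [360, 1000, 450, 525, 350, 300]

coefficients = fit_plane(features, rent)
a, b1, b2 = coefficients
print(f"Rent-hat = {a:.4f} + {b1:.4f} Rooms + {b2:.4f} Distance")
```

Once the plane is fitted, predict(coefficients, (rooms, miles)) gives the expected rent for a chosen apartment. The same two functions work unchanged for the other two-predictor exercises in this set.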
13-7 Given the following set of data:
(a) Calculate the multiple-regression plane.
(b) Predict Y when X1 = 10.5 and X2 = 13.6.
Y X1 X2
10 8 4
17 21 9
18 21 11
26 17 20
35 36 13
8 9 28
6 1 3
10 3 –1
9 2 4
14 –2 7
7 3 2
5 6 –4
13-10 Sam Spade, owner and general manager of the Campus Stationery Store, is concerned about
the sales behavior of a compact cassette tape recorder sold at the store. He realizes that there
are many factors that might help explain sales, but believes that advertising and price are major
determinants. Sam has collected the following data:
Sales Advertising Price
(units sold) (number of ads) ($)
33 3 125
61 6 115
70 10 140
82 13 130
17 9 145
24 6 140
(a) Calculate the least-squares equation to predict sales from advertising and price.
(b) If advertising is 7 and price is $132, what sales would you predict?
13-11 A developer of food for pigs would like to determine what relationship exists among the age
of a pig when it starts receiving a newly developed food supplement, the initial weight of the
pig, and the amount of weight it gains in a 1-week period with the food supplement. The
following information is the result of a study of eight piglets:
X1 X2 Y
Piglet Initial Weight Initial Age Weight
Number (Pounds) (Weeks) Gain
1 39 8 7
2 52 6 6
3 49 7 8
4 46 12 10
5 61 9 9
6 35 6 5
7 25 7 3
8 55 4 4
(a) Calculate the least-squares equation that best describes these three variables.
(b) How much might we expect a pig to gain in a week with the food supplement if it were
9 weeks old and weighed 48 pounds?
13-12 A graduate student trying to purchase a used Neptune car has researched the prices. She
believes the year of the car and the number of miles the car has been driven both influence the
purchase price. Data are given below for 10 cars with the price (Y) in thousands of dollars, year
(X1) and miles driven (X2) in thousands.
(a) Calculate the least-squares equation that best relates these three variables.
(b) The student would like to purchase a 1991 Neptune with about 40,000 miles on it. How
much do you predict she will pay?
Y X1 X2
Price ($ thousands) Year Miles (thousands)
2.99 1987 55.6
13-13 The Federal Reserve is performing a preliminary study to determine the relationship between
certain economic indicators and annual percentage change in the gross national product (GNP).
Two such indicators being examined are the amount of the federal government’s deficit (in
billions of dollars) and the Dow Jones Industrial Average (the mean value over the year). Data
for 6 years follow:
Y X1 X2
Change in GNP Federal Deficit Dow Jones
3.0 80 2,700
(a) Calculate the least-squares equation that best describes the data.
(b) What percentage change in GNP would be expected in a year in which the federal deficit
was $240 billion and the mean Dow Jones value was 3,000?
SC 13-3 Pam Schneider owns and operates an accounting firm in Ithaca, New York. Pam feels that it
would be useful to be able to predict in advance the number of rush income-tax returns during
the busy March 1 to April 15 period so that she can better plan her personnel needs during this
time. She has hypothesized that several factors may be useful in her prediction. Data for these
factors and number of rush returns for past years are as follows:
X1 X2 X3 Y
Economic Population within Average Income Number of Rush Returns,
Index 1 Mile of Office in Ithaca March 1 to April 15
(a) Use the following Minitab output to determine the best-fitting regression equation for these
data:
The regression equation is
Y = -1275 + 17.1 X1 + 0.541 X2 - 0.174 X3
Predictor Coef Stdev t-ratio p
Constant -1275 2699 -0.47 0.719
X1 17.059 6.908 2.47 0.245
X2 0.5406 0.3144 1.72 0.335
X3 -0.1743 0.1005 -1.73 0.333
s = 396.1 R-sq = 87.2%
(b) What percentage of the total variation in the number of rush returns is explained by this
equation?
(c) For this year, the economic index is 169, the population within 1 mile of the office is 10,212,
and the average income in Ithaca is $26,925. How many rush returns should Pam expect
to process between March 1 and April 15?
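Part (c) amounts to plugging the given values into the fitted equation. A small sketch, assuming the unrounded values from the Coef column of the Minitab output are used (the rounded coefficients in the printed equation give a slightly different figure); the function and variable names are illustrative:

```python
# Coefficients read from the "Coef" column of the Minitab output.
coefficients = {"Constant": -1275.0, "X1": 17.059, "X2": 0.5406, "X3": -0.1743}


def predict_rush_returns(economic_index, population, average_income):
    """Evaluate Y-hat = Constant + b1*X1 + b2*X2 + b3*X3."""
    return (
        coefficients["Constant"]
        + coefficients["X1"] * economic_index
        + coefficients["X2"] * population
        + coefficients["X3"] * average_income
    )


# Values given in part (c).
expected_returns = predict_rush_returns(169, 10212, 26925)
print(f"Predicted rush returns: {expected_returns:.1f}")
```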
13-14 Given the following set of data, use whatever computer package is available to find the best-
fitting regression equation and answer the following:
(a) What is the regression equation?
(b) What is the standard error of estimate?
(c) What is R² for this regression?
(d) What is the predicted value for Y when X1 = 5.8, X2 = 4.2, and X3 = 5.1?
Y X1 X2 X3
13-15 Given the following set of data, use whatever computer package is available to find the best-
fitting regression equation and answer the following:
(a) What is the regression equation?
(b) What is the standard error of estimate?
(c) What is R² for this regression?
(d) Give an approximate 95 percent confidence interval for the value of Y when the values of
X1, X2, X3, and X4 are 52.4, 41.6, 35.8, and 3, respectively.
X1 X2 X3 X4 Y
21.4 62.9 21.9 –2 22.8
13-16 Police stations across the country are interested in predicting the number of arrests they can
expect to process each month so as to better schedule office employees. Historically, the
average number of arrests (Y) each month is influenced by the number of officers on the police
force (X1), the population of the city in thousands (X2), and the percentage of unemployed
people in the city (X3). Data for these factors in 15 cities are presented below.
(a) Using whatever computer package is available, determine the best-fitting regression
equation for these data.
(b) What percentage of the total variation in the number of arrests (Y) is explained by this
equation?
(c) The ChapelBoro police department is trying to predict the number of monthly arrests.
ChapelBoro has a population of 75,000, a police force of 82, and an unemployment
percentage of 10.5 percent. How many arrests do you predict for each month?
Monthly Average Number of Officers Size of the City (X2) Percentage
Number of Arrests (Y) on the Force (X1) in Thousands Unemployed (X3)
13-17 We are trying to predict the annual demand for widgets (DEMAND) using the following
independent variables.
PRICE = price of widgets (in $)
INCOME = consumer income (in $)
SUB = price of a substitute commodity (in $)
(Note: A substitute commodity is one that can be substituted for another commodity. For
example, margarine is a substitute commodity for butter.)
Data have been collected from 1982 to 1996:
Year Demand Price ($) Income ($) Sub ($)
1982 40 9 400 10
1983 45 8 500 14
1984 50 9 600 12
1985 55 8 700 13
1986 60 7 800 11
1987 70 6 900 15
1988 65 6 1,000 16
1989 65 8 1,100 17
1990 75 5 1,200 22
1991 75 5 1,300 19
1992 80 5 1,400 20
1994 90 4 1,600 18
1995 95 3 1,700 24
1996 85 4 1,800 21
(a) Using whatever computer package is available, determine the best-fitting regression
equation for these data.
(b) Are the signs (+ or -) of the regression coefficients of the independent variables as one
would expect? Explain briefly. (Note: This is not a statistical question; you just need to
think about what the regression coefficients mean.)
(c) State and interpret the coefficient of multiple determination for this problem.
(d) State and interpret the standard error of estimate for this problem.
(e) Using the equation, what would you predict for DEMAND if the price of widgets was $6,
consumer income was $1,200, and the price of the substitute commodity was $17?
13-18 Bill Buxton, a statistics professor in a leading business school, has a keen interest in factors
affecting students’ performance on exams. The midterm exam for the past semester had a
wide distribution of grades, but Bill feels certain that several factors explain the distribution:
He allowed his students to study from as many different books as they liked, their IQs vary,
they are of different ages, and they study varying amounts of time for exams. To develop a
predicting formula for exam grades, Bill asked each student to answer, at the end of the exam,
questions regarding study time and number of books used. Bill’s teaching records already
contained the IQs and ages for the students, so he compiled the data for the class and ran a
multiple regression with Minitab. The output from Bill’s computer run was as follows:
Predictor Coef Stdev t-ratio p
Constant -49.948 41.55 -1.20 0.268
HOURS 1.06931 0.98163 1.09 0.312
IQ 1.36460 0.37627 3.63 0.008
BOOKS 2.03982 1.50799 1.35 0.218
AGE -1.79890 0.67332 -2.67 0.319
s = 11.657 R-sq = 76.7%
(a) Using whatever computer package is available, determine the best-fitting regression
equation for these data.
(b) What percentage of the total variation in the number of millions of tons of steel sold by
Allegheny each year is explained by this equation?
(c) How many tons of steel should Allegheny expect to sell in a year in which the inflation
rate is 7.1, American automakers are planning to produce 6.0 million cars, and the average
imported price undercut per ton is $3.50?