Topic 8 Solutions
Q1
c) r = 1
Q2
c) Neither of the above is necessarily true
Q3
a) X
Q4
b) Z
Q5
c) Neither of the above
Q6
a) & b) & c)
[Figure: scatter plot of consumption of fuel-oil (litres) against daily temperature (in Centigrade), with fitted line \(\hat{Y} = -0.2822X + 5.2711\)]
e) r = -0.992. The linear relationship between daily temperature and daily fuel
consumption is negative and very strong.
Q7
a) \(b_0 = 51.915\): if we apply the regression equation at X = 0, the average assessed value
would be 51.915 hundred thousand dollars. However, a gross area of zero is impossible, so
this value has no practical interpretation.
\(b_1 = 16.633\): for each additional one thousand square feet in gross area, the estimated
assessed value increases by 16.633 hundred thousand dollars.
Q8
a) Slope = \(b_1\) = 0.5719. For each additional thousand units of reported newsstand
sales, the mean audited sales increase by an estimated 0.5719 thousand units.
Q9
b) \(b_0 = 177.1208\): if we apply the regression equation at X = 0, the expected monthly rental
would be $177.1208. However, an apartment with zero square feet is impossible, so
177.1208 has no practical interpretation.
\(b_1 = 1.0651\): for each increase of 1 square foot in space, the expected monthly rental is
estimated to increase by $1.065.
d) An apartment with 500 square feet is outside the relevant range of the independent
variable, so it is not appropriate to use the model to predict the monthly rent for
apartments of 500 square feet.
e) From the Excel output, \(b_1 = 1.065\), \(S_{b_1} = 0.1376\)
Let \(\beta_1\) be the population slope in the population linear regression equation
\(H_0: \beta_1 = 0\)
\(H_1: \beta_1 \ne 0\) (there exists a linear relationship)
\(\alpha = 0.05\), critical value \(t_{\alpha/2,\,n-2} = t_{0.025,\,23} = 2.0687\)
Reject \(H_0\) if \(t > 2.0687\) or \(t < -2.0687\)
\(t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{1.065 - 0}{0.1376} = 7.7398\)
Since t = 7.7398 > 2.0687, we reject \(H_0\) at \(\alpha = 0.05\).
There is sufficient evidence that there exists a linear relationship between the size of the
apartment and the monthly rent.
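The arithmetic in part e) can be checked with a few lines of Python. This is only a sketch built from the figures quoted in the solution (slope 1.065, standard error 0.1376, critical value 2.0687); the variable names are illustrative and the critical value is copied from the solution rather than computed.

```python
# Two-tailed t test for the slope in Q9 e), using values quoted in the solution.
b1 = 1.065          # sample slope from the Excel output
se_b1 = 0.1376      # standard error of the slope, S_b1
beta1_null = 0.0    # slope assumed under H0
t_crit = 2.0687     # t(0.025, 23), copied from the solution, not computed here

t_stat = (b1 - beta1_null) / se_b1

# Two-tailed rule: reject H0 when the statistic falls in either tail.
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```

Comparing the absolute value of t with the positive critical value is equivalent to the two-sided rejection rule stated in the solution.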
Q10
a) From the Excel output,
a = 1.646, b = 0.104
Y = 1.646 + 0.104X (where Y is earnings and X is sales)
a: if we apply the regression equation at X = 0, the average earnings would be 1.646
million dollars. However, X = 0 lies outside the observed data, so this value has no
practical interpretation.
In fact, the regression equation can only be applied for sales between 10.4 and 71.7 million
dollars.
b: for each increase of 1 million dollars in sales, earnings increase by 0.104 million
dollars on average.
b) From the Excel output, the sample coefficient of correlation between sales and
earnings is r = 0.888
Let \(\beta_1\) be the population slope coefficient
\(H_0: \beta_1 \le 0\)
\(H_1: \beta_1 > 0\)
\(\alpha = 0.05\)
Reject \(H_0\) if p-value < 0.05
From the output, t-Stat = 6.091 and p-value = 0.000117/2 = 0.0000585 < 0.05
We reject \(H_0\). There is sufficient evidence that there exists a positive linear
relationship between sales and earnings.
d) X =35, Y = 1.646+0.104(35) = 5.286 million dollars
e) From the Excel output, \(b_1 = 0.1039\), \(S_{b_1} = 0.0171\)
\(H_0: \beta_1 \ge 0.15\)
\(H_1: \beta_1 < 0.15\)
\(\alpha = 0.05\), critical value \(= -t_{\alpha,\,n-2} = -t_{0.05,\,10} = -1.8125\)
Reject \(H_0\) if \(t < -1.8125\)
\(t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.1039 - 0.15}{0.0171} = -2.6959 < -1.8125\)
We reject \(H_0\). There is sufficient evidence that the population slope in the regression
model is less than 0.15.
Q11
a) \(\hat{Y} = 2.0289 + 0.04643X\)
b) Slope = 0.04643
For each one-unit increase in purchasing frequency, the average purchase quantity per
buyer increases by 0.04643 kg on average.
e) \(R^2 = 0.419\)
41.9% of the variation in Y (the average quantity per buyer) can be explained by the
variability in X (purchasing frequency).
As \(R^2\) is not very large, the model is not very good at predicting Y (the
average quantity per buyer) from X (purchasing frequency).
Q12
a) \(\hat{Y} = 15 - 2X\)
\(\hat{Y}(X=2) = 15 - 2(2) = 11\)
\(\hat{Y}(X=4) = 15 - 2(4) = 7\)
\(\hat{Y}(X=6) = 15 - 2(6) = 3\)
b) \(SSE = \sum (Y_i - \hat{Y}_i)^2 = (10-11)^2 + (12-11)^2 + (6-7)^2 + (8-7)^2 + (1-3)^2 + (5-3)^2 = 12\)
c) \(\bar{Y} = (10+12+6+8+1+5)/6 = 7\)
\(SST = \sum (Y_i - \bar{Y})^2 = (10-7)^2 + (12-7)^2 + (6-7)^2 + (8-7)^2 + (1-7)^2 + (5-7)^2 = 76\)
d) \(R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{12}{76} = 0.8421\)
Interpretation: 84.21% of the variation in Y can be explained by the variability in X.
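The sums in parts b) to d) can be reproduced numerically. The sketch below assumes the six X values are 2, 2, 4, 4, 6, 6, which is inferred from the fitted values (11, 7, 3) paired with each observation in part b); the variable names are illustrative.

```python
# Q12: SSE, SST and R^2 for the fitted line Y-hat = 15 - 2X.
# The X values are inferred from the fitted values used in part b).
x = [2, 2, 4, 4, 6, 6]
y = [10, 12, 6, 8, 1, 5]

y_hat = [15 - 2 * xi for xi in x]   # fitted values from the given line
y_bar = sum(y) / len(y)             # sample mean of Y

sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
r_squared = 1 - sse / sst                               # coefficient of determination

print(sse, sst, round(r_squared, 4))
```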
Q13
a) Ĉost = 16593.65 + 2600.68 NUMSTORES
c) Suppose the population linear regression equation relating COST and NUMSTORE is
\(\text{COST} = \beta_0 + \beta_1\,\text{NUMSTORE} + \varepsilon\)
\(H_0: \beta_1 \ge 3000\)
\(H_1: \beta_1 < 3000\)
\(\alpha = 0.05\), n = 14, d.f. = n - 2 = 12, reject \(H_0\) if t < -1.7823
\(t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{2600.68 - 3000}{267.66} = \frac{-399.32}{267.66} = -1.49\)
Since t = -1.49 > -1.7823, do not reject \(H_0\). There is insufficient evidence that each
new store adds less than 3000 USD of cost.
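The decision rule in part c) is a lower-tail test, and the direction of the rejection region is easy to get wrong. The helper below is a hypothetical sketch, not part of the Excel output: `slope_t_test` and its `alternative` labels are names chosen here for illustration, and the critical value is passed in from a t table rather than computed.

```python
def slope_t_test(b1, se_b1, beta1_null, t_crit, alternative):
    """Return (t statistic, reject H0?) for a test on the regression slope.

    t_crit is the positive critical value looked up from a t table.
    alternative is "less", "greater" or "two-sided" (labels chosen here).
    """
    t_stat = (b1 - beta1_null) / se_b1
    if alternative == "less":          # H1: beta1 < beta1_null
        reject = t_stat < -t_crit
    elif alternative == "greater":     # H1: beta1 > beta1_null
        reject = t_stat > t_crit
    elif alternative == "two-sided":   # H1: beta1 != beta1_null
        reject = abs(t_stat) > t_crit
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return t_stat, reject


# Q13 c): H1 is beta1 < 3000, with n - 2 = 12 degrees of freedom.
t_stat, reject = slope_t_test(2600.68, 267.66, 3000, 1.7823, "less")
print(f"t = {t_stat:.2f}, reject H0: {reject}")
```

Passing the positive table value and letting the function apply the sign keeps the three rejection rules in one place.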
d) According to the \(R^2\) of the two regression models, NUMSTORE explains 88.72% of the
variation in COST, while STORESIZE explains 42.09%. Therefore,
NUMSTORE is more useful for explaining the variation in store setup cost.
Q14
a) \(\widehat{\text{SALARY}} = 4.5538 + 2.6708 \times \text{GPA}\)
Intercept: If the regression equation is applied at GPA = 0, the average starting salary is
$4553.8.
However, the regression equation can only be applied for GPA within [2.23, 3.85].
Slope: For each increase of one unit in GPA, the starting salary increases by $2670.8
on average.
b) It is inappropriate to predict the starting salary with this regression equation, since
GPA = 4.0 is not within [2.23, 3.85].
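The relevant-range argument in part b) can be made explicit in code. The sketch below assumes salary is measured in thousands of dollars, as the intercept interpretation implies; `predict_salary` is a hypothetical helper written for this illustration, and it refuses to extrapolate outside the observed GPA range.

```python
# Q14: predict starting salary (in thousands of dollars) only inside the relevant range.
GPA_MIN, GPA_MAX = 2.23, 3.85   # observed GPA range quoted in the solution
B0, B1 = 4.5538, 2.6708         # intercept and slope of the fitted line


def predict_salary(gpa):
    """Return the predicted salary in thousands of dollars, or raise if extrapolating."""
    if not GPA_MIN <= gpa <= GPA_MAX:
        raise ValueError(
            f"GPA {gpa} is outside the relevant range [{GPA_MIN}, {GPA_MAX}]"
        )
    return B0 + B1 * gpa


print(round(predict_salary(3.0), 4))   # inside the range, so a prediction is returned
try:
    predict_salary(4.0)                # outside the range, as in part b)
except ValueError as err:
    print(err)
```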
\(H_0: \beta_1 \le 0\)
\(H_1: \beta_1 > 0\)
b) When X = 55,
\(\hat{Y} = 250 - 2(55) = 140\) defective parts
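d)
\(X_i\)    \(Y_i\)    \(\hat{Y}_i\)    \((Y_i - \hat{Y}_i)^2\)    \((Y_i - \bar{Y})^2\)
20         220        210              100                        1764
20         200        210              100                        484
40         160        170              100                        324
60         130        130              0                          2304
40         180        170              100                        4
\(SSE = 100 + 100 + 100 + 0 + 100 = 400\)
\(\bar{Y} = \frac{220 + 200 + 160 + 130 + 180}{5} = 178\)
\(SST = 1764 + 484 + 324 + 2304 + 4 = 4880\)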
e) \(R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{400}{4880} = 0.9180\)
\(1 - R^2 = 1 - 0.9180 = 0.0820\), so 8.2% of the total variation is unexplained by the estimated
regression equation.
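As a cross-check, the fitted line and the sums in parts d) and e) can be recovered directly from the five (X, Y) pairs in the table. The sketch below uses the usual least-squares formulas (slope = Sxy / Sxx, intercept = mean of Y minus slope times mean of X); the variable names are illustrative.

```python
# Least-squares fit for the five (X, Y) pairs tabulated in part d).
x = [20, 20, 40, 60, 40]
y = [220, 200, 160, 130, 180]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope = Sxy / Sxx, intercept = Y-bar - slope * X-bar.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fitted values, then the unexplained and total sums of squares.
y_hat = [b0 + b1 * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst

print(f"b0 = {b0:.0f}, b1 = {b1:.0f}, SSE = {sse:.0f}, SST = {sst:.0f}, R^2 = {r_squared:.4f}")
```

The recovered coefficients agree with the line used in part b), so the tabulated fitted values are consistent with a least-squares fit to these data.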
c) For each increase of one additional mile in DISTANCE, the estimated average amount
of DAMAGE will increase by 4918.7 dollars.
d) \(H_0: \beta_1 = 0\)
\(H_1: \beta_1 \ne 0\)
Use the t test for the population slope \(\beta_1\)
Reject \(H_0\) if p-value < \(\alpha = 0.05\)
\(t = \frac{b_1 - 0}{S_{b_1}} = \frac{4.9187 - 0}{0.3927} = 12.5254\) (from Excel output)
p-value = \(2P(t > 12.5254)\) = 1.2478E-08 (from Excel output)
Since p-value = 1.2478E-08 < 0.05, we reject \(H_0\).
There is sufficient evidence that there is a linear relationship between DISTANCE and
DAMAGE.
e) \(R^2 = 1 - \frac{SSE}{SST} = 0.9235\)
92.35% of the variation in the amount of DAMAGE can be explained by the variability in
DISTANCE.
f) Yes. Since the coefficient of determination is very high, the regression
line fits the data set very well.
Or: DISTANCE is a good predictor of the amount of DAMAGE.