
Topic 8 Solutions

The document discusses various aspects of simple linear regression, including calculations for slope, intercept, and predictions based on regression equations. It includes examples of regression analysis, hypothesis testing, and confidence intervals for different datasets. The document emphasizes the interpretation of coefficients and the strength of relationships between variables.

Uploaded by

tofu1002022
Copyright
© All Rights Reserved

Topic 8: Simple Linear Regression Solutions

Q1
c) r = 1

Q2
c) Neither of the above is necessarily true

Q3
a) X

Q4
b) Z

Q5
c) Neither of the above

Q6
a) & b) & c)
[Figure: scatter plot of consumption of fuel-oil (litres) against daily temperature (in Centigrade), with fitted line Ŷ = -0.2822X + 5.2711]

d) Predicted consumption level, Ŷ = -0.2822(7.5) + 5.2711 = 3.1546 litres

e) r = -0.992. The linear relationship between daily temperature and daily fuel
consumption is negative and very strong.

Q7
a) b0 = 51.915: if we apply the regression equation at X = 0, the average assessed value
would be 51.915 hundred thousand dollars. However, this is clearly not a meaningful
scenario, so the intercept has no practical interpretation.

b1 = 16.633: for each additional one thousand square feet in gross area, the estimated
assessed value increases by 16.633 hundred thousand dollars.

b) Ŷ (when X = 1.75) = 51.915 + 16.633(1.75) = 81.02275

Thus, the predicted assessed value is 81.02275 hundred thousand dollars.

c) Coefficient of determination, R² = (0.812)² = 0.6593

Interpretation: 65.93% of the total variation in assessed value can be explained by
variation in gross area.
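
The step in part c) can be checked numerically. This is a minimal sketch; the variable names are illustrative and are not part of the original solution.

```python
# Sample correlation between gross area and assessed value, as used in Q7 c).
r = 0.812

# In simple linear regression, the coefficient of determination equals r squared.
r_squared = r ** 2

print(f"R^2 = {r_squared:.4f}")
print(f"Variation explained: {r_squared * 100:.2f}%")
```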

Q8

a) Slope = b1 = 0.5719. For each additional thousand units of reported newsstand
sales, the mean audited sales increase by an estimated 0.5719 thousand units.

b) ŷ = 26.7240 + 0.5719(300) = 198.294 thousand. The predicted audited newsstand
sales for a magazine that reports newsstand sales of 300,000 are 198.294 thousand.

c) Let β1 be the population slope coefficient.

H0: β1 = 0
H1: β1 ≠ 0

α = 0.05, critical value = t(α/2, n-2) = t(0.025, 8) = 2.3060

Reject H0 if t > 2.3060 or t < -2.3060, or reject H0 if p-value < 0.05.

t = (b1 - 0) / Sb1 = (0.5719 - 0) / 0.0668 = 8.5613
p-value = 2 × P(t > 8.5613) = 2.68223E-05

Since t = 8.5613 > 2.3060 (equivalently, p-value = 2.68223E-05 < 0.05),
we reject H0 at α = 0.05.
There is sufficient evidence to conclude that the slope of the population regression line
is not 0.
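
The two-tailed test in part c) can be sketched in a few lines. The function names below are hypothetical, and the critical value 2.3060 is taken from the solution rather than computed.

```python
def slope_t_statistic(b1: float, se_b1: float, beta1_null: float = 0.0) -> float:
    """t statistic for testing H0: beta1 = beta1_null."""
    return (b1 - beta1_null) / se_b1


def reject_two_tailed(t_stat: float, critical_value: float) -> bool:
    """Reject H0 when the statistic falls in either tail."""
    return abs(t_stat) > critical_value


# Values from Q8: b1 = 0.5719, Sb1 = 0.0668, t(0.025, 8) = 2.3060.
t_stat = slope_t_statistic(0.5719, 0.0668)
decision = reject_two_tailed(t_stat, 2.3060)
print(f"t = {t_stat:.3f}, reject H0: {decision}")
```

Because the inputs are rounded to four decimals, the statistic computed here can differ from the Excel figure of 8.5613 in the fourth decimal place.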

d) 95% confidence interval for β1:

b1 ± t(α/2, n-2) Sb1
= 0.5719 ± (2.306)(0.0668)
= [0.4179, 0.7259]
We are 95% confident that the population slope β1 is between 0.4179 and 0.7259.
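
The interval in part d) can be reproduced as follows. This is a sketch with a hypothetical helper name; the critical value is supplied by hand from the solution.

```python
def slope_confidence_interval(b1: float, se_b1: float, t_critical: float):
    """Return (lower, upper) for b1 +/- t_critical * se_b1."""
    margin = t_critical * se_b1
    return b1 - margin, b1 + margin


# Values from Q8 d): b1 = 0.5719, Sb1 = 0.0668, t(0.025, 8) = 2.306.
lower, upper = slope_confidence_interval(0.5719, 0.0668, 2.306)
print(f"[{lower:.4f}, {upper:.4f}]")
```

The interval does not contain 0, which agrees with rejecting H0 in part c).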

Q9

a) From the Excel output,

b0 = 177.1208; b1 = 1.0651
ŷ = 177.1208 + 1.0651x (where ŷ is the predicted monthly rental cost for apartments and x
is the size of the apartment in square feet)

b) b0 = 177.1208: if we apply the regression equation at X = 0, the expected monthly rental
would be $177.1208. However, this is clearly not a meaningful scenario. Therefore,
177.1208 has no practical interpretation.
b1 = 1.0651: for each increase of 1 square foot in space, the expected monthly rental is
estimated to increase by $1.0651.

c) ŷ = 177.1208 + 1.0651x = 177.1208 + (1.0651)(1000) = $1242.2208

d) An apartment with 500 square feet is outside the relevant range for the independent
variable, so it is not appropriate to use the model to predict the monthly rent for
apartments of 500 square feet.
e) From the Excel output, b1 = 1.065, Sb1 = 0.1376
Let β1 be the population slope in the population linear regression equation.
H0: β1 = 0
H1: β1 ≠ 0 (a linear relationship exists)
α = 0.05, critical value = t(α/2, n-2) = t(0.025, 23) = 2.0687
Reject H0 if t > 2.0687 or t < -2.0687
t = (b1 - 0) / Sb1 = (1.065 - 0) / 0.1376 = 7.7398
Since t = 7.7398 > 2.0687, we reject H0 at α = 0.05.
There is sufficient evidence that there exists a linear relationship between the size of the
apartment and the monthly rent.

f) 95% confidence interval for β1:

b1 ± t(α/2, n-2) Sb1 = 1.065 ± (2.0687)(0.1376) = [0.7803, 1.3497]
We are 95% confident that the population slope β1 is between 0.7803 and 1.3497.

Q10
a) From the Excel output,
a = 1.646, b = 0.104

Y = 1.646 + 0.104 X (where Y is earnings and X is sales, both in million dollars)
a: if we apply the regression equation at X = 0, the average earnings would be 1.646
million dollars. However, this is clearly not a meaningful scenario.
In fact, the regression equation can only be applied for sales between 10.4 and 71.7 million
dollars.
b: for each increase of 1 million dollars in sales, earnings increase by 0.104 million
dollars on average.

b) From the Excel output, the sample coefficient of correlation between sales and
earnings is r = 0.888.
Let β1 be the population slope coefficient.
H0: β1 ≤ 0
H1: β1 > 0
α = 0.05
Reject H0 if p-value < 0.05
From the output, t-Stat = 6.091, and p-value = 0.000117/2 = 0.0000585 < 0.05
Therefore, we reject H0. There is sufficient evidence that there exists a positive linear
relationship between sales and earnings.
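
The halving of the Excel p-value in part b) can be checked by integrating the t density directly. This is only a sketch: the function names are hypothetical, Simpson's rule stands in for whatever Excel does internally, and the degrees of freedom (n - 2 = 10) are read off the critical value used in part e).

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t_stat: float, df: int, steps: int = 20000) -> float:
    """P(T > t_stat) for t_stat >= 0, using Simpson's rule on [0, t_stat]."""
    h = t_stat / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


one_tailed = upper_tail(6.091, df=10)   # p-value for H1: beta1 > 0
two_tailed = 2 * one_tailed             # p-value Excel reports by default
print(f"one-tailed = {one_tailed:.7f}, two-tailed = {two_tailed:.6f}")
```

The one-tailed value is half the two-tailed one only because the observed t statistic lies on the side named in H1.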

c) Coefficient of determination = R² = 0.788

78.8% of the total variation in earnings can be explained by variation in sales.


d) X =35, Y = 1.646+0.104(35) = 5.286 million dollars
e) From the Excel output, b1 = 0.1039, Sb1 = 0.0171
H0: β1 ≥ 0.15
H1: β1 < 0.15
α = 0.05, critical value = -t(α, n-2) = -t(0.05, 10) = -1.8125
Reject H0 if t < -1.8125
t = (b1 - β1) / Sb1 = (0.1039 - 0.15) / 0.0171 = -2.6959 < -1.8125
We reject H0. There is sufficient evidence that the population slope in the regression
model is less than 0.15.
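
The lower-tail test in part e) differs from the usual test in two ways: the hypothesised slope is 0.15 rather than 0, and only the left tail is used. A minimal sketch, with a hypothetical function name and the critical value taken from the solution:

```python
def lower_tail_slope_test(b1: float, se_b1: float, beta1_null: float, t_critical: float):
    """Test H0: beta1 >= beta1_null against H1: beta1 < beta1_null.

    t_critical is the positive one-tailed critical value t(alpha, n-2);
    H0 is rejected when the statistic is below -t_critical.
    """
    t_stat = (b1 - beta1_null) / se_b1
    return t_stat, t_stat < -t_critical


# Values from Q10 e): b1 = 0.1039, Sb1 = 0.0171, t(0.05, 10) = 1.8125.
t_stat, reject = lower_tail_slope_test(0.1039, 0.0171, 0.15, 1.8125)
print(f"t = {t_stat:.4f}, reject H0: {reject}")
```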

Q11
a) Ŷ = 2.0289 + 0.04643X

b) Slope = 0.04643
For each increase of one unit in purchasing frequency, the average purchase quantity per
buyer increases by 0.04643 kg on average.

c) When X = 20, Ŷ = 2.0289 + 0.04643(20) = 2.9575

d) Let β1 be the slope of the population regression line.

H0: β1 = 0
H1: β1 ≠ 0
α = 0.05
Reject H0 if p-value < 0.05
t = (0.04643 - 0) / 0.022302 = 2.082
p-value = P(t > 2.082) + P(t < -2.082), which lies in (0.05, 0.1) and is therefore > 0.05
(or directly, from the Excel output, p-value = 0.0825)
Therefore we do not reject H0. There is insufficient evidence to conclude that
the slope of the population regression line is not 0.

e) R² = 0.419
41.9% of the variation in y (the average quantity per buyer) can be explained by the
variability in x (purchasing frequency).
As R² is not very large, the model is not very good at predicting y (the
average quantity per buyer) from x (purchasing frequency).

Q12
a) Ŷ = 15 - 2X
Ŷ(X=2) = 15 - 2(2) = 11
Ŷ(X=4) = 15 - 2(4) = 7
Ŷ(X=6) = 15 - 2(6) = 3

b) SSE = Σ(Yi - Ŷi)² = (10-11)² + (12-11)² + (6-7)² + (8-7)² + (1-3)² + (5-3)² = 12

c) Ȳ = (10+12+6+8+1+5)/6 = 7

SST = Σ(Yi - Ȳ)² = (10-7)² + (12-7)² + (6-7)² + (8-7)² + (1-7)² + (5-7)² = 76

d) R² = 1 - SSE/SST = 1 - 12/76 = 0.8421
Interpretation: 84.21% of the variation in Y can be explained by the variability in X
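
Parts a) to d) can be reproduced from the six data points. In this sketch the pairing of X and Y values is inferred from the terms of the SSE sum in part b), and the variable names are illustrative.

```python
# (X, Y) pairs inferred from the SSE terms in part b).
x = [2, 2, 4, 4, 6, 6]
y = [10, 12, 6, 8, 1, 5]


def fitted(x_value: float) -> float:
    """Fitted line from part a): Y-hat = 15 - 2X."""
    return 15 - 2 * x_value


y_hat = [fitted(v) for v in x]
y_bar = sum(y) / len(y)

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
r_squared = 1 - sse / sst

print(f"SSE = {sse}, SST = {sst}, R^2 = {r_squared:.4f}")
```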

e) Since the slope = -2, the relationship between X and Y is negative.

Coefficient of correlation, r = -√R² = -√0.8421 = -0.9177
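
Because R² loses the sign of the relationship, the sign of r has to be taken from the slope, as in part e). A small sketch with a hypothetical helper name:

```python
import math


def correlation_from_r_squared(r_squared: float, slope: float) -> float:
    """Return r with the magnitude sqrt(R^2) and the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)


# Q12: R^2 = 1 - 12/76 and slope b1 = -2.
r = correlation_from_r_squared(1 - 12 / 76, -2)
print(f"r = {r:.4f}")
```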

f) Let β1 be the population slope coefficient in the regression model.

H0: β1 = 0
H1: β1 ≠ 0
Let α = 0.05
Since p-value = 0.00989 < 0.05, H0 is rejected. There is sufficient evidence that
there is a linear relationship between X and Y, i.e. X is an important factor affecting Y.

Q13
a) Ĉost = 16593.65 + 2600.68 NUMSTORES

b) Point estimate: b1 = 21.76 (the slope on STORESIZE)

95% CI: b1 ± t(0.025, 12) Sb1 = 21.76 ± 2.1788 × 7.37 = 21.76 ± 16.0578 = [5.7022, 37.8178]
We are 95% confident that the true value of β1 is between 5.7022 and 37.8178.

c) Suppose the population linear regression equation relating COST and NUMSTORE is
COST = β0 + β1 NUMSTORE + ε
H0: β1 ≥ 3000
H1: β1 < 3000
α = 0.05, n = 14, d.f. = n - 2 = 12, reject H0 if t < -1.7823
t = (b1 - β1) / Sb1 = (2600.68 - 3000) / 267.66 = -399.32 / 267.66 = -1.49
Since t = -1.49 > -1.7823, do not reject H0. There is insufficient evidence that each
new store adds less than 3000 USD of cost.
d) According to the R² of the two regression models, NUMSTORE explains 88.72% of
the variation in COST, while STORESIZE explains 42.09%. Therefore,
NUMSTORE is more useful for explaining the variation in store setup cost.

e) r = 0.9419: there is a strong positive relationship between the number of stores in an
area and the store setup cost.
OR b1 = 2600.68: for each additional store, the store setup cost is estimated to
increase by 2600.68 USD on average.

f) Ĉost = 10961.14 + 21.76 STORESIZE = 10961.14 + 21.76 × 1000 = 32721.14 USD.

The average total setup cost for a business area that needs 1000 square feet of store area
is estimated to be 32721.14 USD.

Q14
a) ŜALARY = 4.5538 + 2.6708*GPA (SALARY in thousand dollars)

Intercept: If the regression equation is applied at GPA = 0, the average starting salary is
$4553.8.
However, the regression equation can only be applied for GPA within [2.23, 3.85].

Slope: For each increase of one unit in GPA, the starting salary increases by
$2670.8 on average.

b) It is inappropriate to predict the starting salary with this regression equation, since
GPA = 4.0 is not within [2.23, 3.85].
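
The point of part b) is that the fitted line should not be used outside the observed range of X. One way to enforce this is to make the prediction function refuse to extrapolate. The function name and the example input GPA = 3.0 below are hypothetical; the coefficients and the range come from part a).

```python
GPA_RANGE = (2.23, 3.85)  # relevant range of GPA stated in the solution


def predict_salary(gpa: float) -> float:
    """Predicted starting salary in thousand dollars; refuses to extrapolate."""
    low, high = GPA_RANGE
    if not low <= gpa <= high:
        raise ValueError(f"GPA {gpa} is outside the relevant range [{low}, {high}]")
    return 4.5538 + 2.6708 * gpa


print(f"{predict_salary(3.0):.4f}")  # inside the range: a prediction is returned

try:
    predict_salary(4.0)              # outside the range, as in part b)
except ValueError as err:
    print(err)
```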

c) Let β1 be the population slope in the population linear regression equation.

H0: β1 ≤ 0
H1: β1 > 0

α = 0.01, reject H0 if p-value < 0.01

From the Excel output, p-value = P(t > 8.475337535) = 0.000375688/2 ≈ 0.0001878
Since p-value = 0.0001878 < 0.01,
we reject H0 at α = 0.01.
There is sufficient evidence that there is a positive linear relationship between
SALARY and GPA.

d) From the relationship between SALARY and GPA, R² = 0.9349

From the relationship between SALARY and IELTS, R² = 0.92² = 0.8464
GPA explains 93.49% of the variation in SALARY, while IELTS explains 84.64%
=> GPA is more useful for explaining the variation in starting salary.

Q15
a) Ŷ = 250 - 2X
Intercept: If the regression equation is applied at line speed X = 0, the average number
of defective parts is 250. However, the regression equation can only be applied for X
within [20, 60].
Slope: For each increase of one foot in line speed, the number of defective parts
decreases by 2 on average.

b) When X = 55,
Ŷ = 250 - 2(55) = 140 defective parts

c) Let β1 be the population slope in the population linear regression equation.

H0: β1 ≥ 0, H1: β1 < 0
At α = 0.05,
reject H0 if t < -2.3534, or reject H0 if p-value < 0.05
From the Excel output,
t = -5.7966, or p-value = P(t < -5.7966) = 0.0103/2 = 0.00515
Since t < -2.3534 (equivalently, p-value = 0.00515 < 0.05),
we reject H0 at α = 0.05.
There is sufficient evidence that line speed and number of defective parts found are
negatively related.

d)
Xi    Yi    Ŷi    (Yi - Ŷi)²   (Yi - Ȳ)²
20    220   210   100          1764
20    200   210   100          484
40    160   170   100          324
60    130   130   0            2304
40    180   170   100          4

SSE = 100 + 100 + 100 + 0 + 100 = 400
Ȳ = (220 + 200 + 160 + 130 + 180) / 5 = 178
SST = 1764 + 484 + 324 + 2304 + 4 = 4880

e) R² = 1 - SSE/SST = 1 - 400/4880 = 0.9180
1 - R² = 1 - 0.9180 = 0.0820, so 8.2% of the total variation is unexplained by the estimated
regression equation
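
As a cross-check on parts a), d) and e), the least-squares line can be fitted from the five (Xi, Yi) pairs in the table. This sketch uses the textbook formulas b1 = Sxy/Sxx and b0 = Ȳ - b1·X̄; the variable names are illustrative.

```python
# Data from the table in part d).
x = [20, 20, 40, 60, 40]
y = [220, 200, 160, 130, 180]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates of slope and intercept.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Sums of squares and coefficient of determination.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst

print(f"Y-hat = {b0:.0f} + ({b1:.0f})X")
print(f"SSE = {sse:.0f}, SST = {sst:.0f}, R^2 = {r_squared:.4f}")
```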

f) Since the slope is negative, the correlation coefficient is also negative.

r = -√R² = -√0.9180 = -0.9581,
which indicates there is a strong negative linear relationship between Y and X.

Q16
a) There is a strong positive correlation between DAMAGE and DISTANCE.

b) From the Excel output, b0 = 10.2779 and b1 = 12.5254 × 0.3927 = 4.9187
(t statistic × standard error of the slope)

Therefore Ŷ = 10.2779 + 4.9187 X

c) For each increase of one additional mile in DISTANCE, the estimated average amount
of DAMAGE will increase by 4918.7 dollars.
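
Parts b) and c) rely on two small conversions: recovering the slope from its t statistic and standard error (since t = b1 / Sb1 when the null value is 0), and expressing it in dollars. The sketch below assumes DAMAGE is recorded in thousand dollars, which is inferred from 4.9187 being read as 4918.7 dollars in part c).

```python
# Values from the Excel output quoted in Q16.
t_stat = 12.5254   # t statistic for the slope
se_b1 = 0.3927     # standard error of the slope

# t = (b1 - 0) / se_b1, so the slope can be recovered by multiplying back.
b1 = t_stat * se_b1

# Assumption: DAMAGE is in thousand dollars, so multiply by 1000 for dollars.
damage_per_mile_dollars = b1 * 1000

print(f"b1 = {b1:.4f}")
print(f"Increase per additional mile: {damage_per_mile_dollars:.1f} dollars")
```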

d) H0: β1 = 0
H1: β1 ≠ 0
Use the t-test for the population slope β1.
Reject H0 if p-value < α = 0.05
t = (b1 - 0) / Sb1 = (4.9187 - 0) / 0.3927 = 12.5254 (from Excel output)
p-value = 2 × P(t > 12.5254) = 1.2478E-08 (from Excel output)
Since p-value = 1.2478E-08 < α = 0.05, we reject H0.
There is sufficient evidence that there is a linear relationship between DISTANCE and
DAMAGE.

e) R² = 1 - SSE/SST = 0.9235
92.35% of the variation in the amount of DAMAGE can be explained by the variability in
DISTANCE.

f) Yes. The coefficient of determination is very high, which indicates that the regression
line fits the data set very well.
Or: DISTANCE is a good predictor of the amount of DAMAGE.

g) Correlation coefficient r=0.9610

h) The 95% confidence interval for β1 is [4.0708, 5.7678] from the Excel output.
