CS1B CH 11 Exercises V02
11
Linear Regression
Exercises
Data requirements
These exercises require the following data files:
• baby weights.txt
• growth.csv
Exercise 11.01
There is currently no Exercise 11.01.
However, you may wish to revisit the exercises from the data analysis chapter if it has been a
while since you looked at them. The exercises in this chapter will assume you are able to recall
and use the R code from that chapter.
Exercise 11.02
A new computerised ultrasound scanning technique has enabled doctors to monitor the weights
of unborn babies. The table below shows the estimated weights for one particular baby at
fortnightly intervals during the pregnancy.
Gestation period (weeks)   30  32  34  36  38  40
Estimated baby weight (kg) 1.6 1.7 2.5 2.8 3.2 3.5
(i) (a) Load the data in the file ‘baby weights.txt’, and store it in the data frame baby.
(b) Plot a labelled scattergraph of the data and add a red dashed regression line onto
your scatterplot.
(iv) Add blue points to the scatterplot to show the fitted values.
(v) Obtain the expected baby’s weight at 42 weeks (assuming it hasn’t been born by then):
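One possible approach to the plotting and prediction steps in this exercise is sketched below. It is not a model solution. The column names gestation and weight are assumptions, as are the gestation values of 30 to 40 weeks (inferred from the fortnightly spacing and the 32-week second point mentioned in Exercise 11.08). The data are typed in directly so the sketch runs without the file; with the file to hand, a call such as read.table("baby weights.txt", header = TRUE) would normally take the place of data.frame(), depending on the file's layout.

```r
# Assumed layout: gestation in weeks, weight in kg (names are hypothetical)
baby <- data.frame(gestation = seq(30, 40, by = 2),
                   weight    = c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5))

# Labelled scattergraph
plot(baby$gestation, baby$weight,
     main = "Estimated baby weight against gestation period",
     xlab = "Gestation period (weeks)",
     ylab = "Estimated weight (kg)")

# Fit the regression of weight on gestation and add a red dashed line
model1 <- lm(weight ~ gestation, data = baby)
abline(model1, col = "red", lty = 2)

# Add the fitted values as solid blue points
points(baby$gestation, fitted(model1), col = "blue", pch = 16)

# Expected weight at 42 weeks
newdata <- data.frame(gestation = 42)
pred42  <- predict(model1, newdata)
pred42
```

The column name in newdata has to match the explanatory variable used in the lm formula, otherwise predict cannot find the value to use.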
Exercise 11.03
This question uses the ‘baby weights’ linear regression model, model1, of weight on gestation
period, created in an earlier exercise.
(i) Obtain the total sum of squares in the baby weights model together with its split between
the residual sum of squares and the regression sum of squares:
(b) from first principles using the functions sum, mean, fitted and residuals.
(iii) Obtain the correlation coefficient from the extracted coefficient of determination.
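The split of the total sum of squares can be checked with a short sketch such as the following. It rebuilds model1 from typed-in data so that it runs on its own; the column names and the gestation values of 30 to 40 weeks are assumptions rather than the contents of the data file.

```r
# Assumed data (column names and gestation values are hypothetical)
baby <- data.frame(gestation = seq(30, 40, by = 2),
                   weight    = c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5))
model1 <- lm(weight ~ gestation, data = baby)

# The ANOVA table shows the regression and residual sums of squares
anova(model1)

# First principles, using sum, mean, fitted and residuals
ybar   <- mean(baby$weight)
ss_tot <- sum((baby$weight - ybar)^2)      # total sum of squares
ss_res <- sum(residuals(model1)^2)         # residual sum of squares
ss_reg <- sum((fitted(model1) - ybar)^2)   # regression sum of squares
c(total = ss_tot, residual = ss_res, regression = ss_reg)

# Coefficient of determination and the correlation coefficient derived from it.
# The square root loses the sign, so the sign of the slope is reattached.
r2 <- summary(model1)$r.squared
r  <- unname(sign(coef(model1)["gestation"]) * sqrt(r2))
r
```

Comparing r with cor(baby$gestation, baby$weight) is a quick way to confirm the sign has been handled correctly.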
Exercise 11.04
This question uses the ‘baby weights’ linear regression model, model1, of weight on gestation
period, created in an earlier exercise.
(iii) Extract the estimated value of beta, the standard error of beta and the degrees of
freedom and store them in the objects b, se and dof.
(iv) Using the objects created in part (iii), use a first principles approach to:
(b) obtain the statistic and p-value for a test of H0: β = 0.25 vs H1: β < 0.25.
(c) obtain the statistic and p-value for a test of H0: β = 0.18 vs H1: β ≠ 0.18.
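A minimal sketch of the extraction and first-principles tests follows, assuming model1 is the regression of weight on gestation fitted to typed-in data (column names and gestation values are assumptions). The test statistic (b - β0) / se is compared with a t distribution on dof degrees of freedom.

```r
# Assumed data (column names and gestation values are hypothetical)
baby <- data.frame(gestation = seq(30, 40, by = 2),
                   weight    = c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5))
model1 <- lm(weight ~ gestation, data = baby)

# Extract the slope estimate, its standard error and the residual degrees of freedom
coefs <- summary(model1)$coefficients
b   <- coefs["gestation", "Estimate"]
se  <- coefs["gestation", "Std. Error"]
dof <- model1$df.residual

# H0: beta = 0.25 vs H1: beta < 0.25 (lower-tailed test)
t_b <- (b - 0.25) / se
p_b <- pt(t_b, dof)

# H0: beta = 0.18 vs H1: beta != 0.18 (two-sided test)
t_c <- (b - 0.18) / se
p_c <- 2 * pt(abs(t_c), dof, lower.tail = FALSE)

c(t_b = t_b, p_b = p_b, t_c = t_c, p_c = p_c)
```

The only difference between the two tests is which tail area is taken: the lower tail for a one-sided "less than" alternative, and twice the upper tail of the absolute statistic for a two-sided alternative.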
Exercise 11.05
This question uses the ‘baby weights’ linear regression model, model1, of weight on gestation
period, created in an earlier exercise.
(i) Obtain the results of an F-test to test the ‘no linear relationship’ hypothesis using the:
(ii) Calculate the F statistic and p-value from first principles by extracting the mean sum of
squares and degrees of freedom from the ANOVA table.
(iii) Obtain a 95% confidence interval for the error variance, σ², from first principles.
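The first-principles calculations in parts (ii) and (iii) might look something like the sketch below. It again rebuilds model1 from typed-in data (assumed column names and gestation values). The interval for σ² relies on the standard result that the residual sum of squares divided by σ² follows a chi-square distribution with n - 2 degrees of freedom.

```r
# Assumed data (column names and gestation values are hypothetical)
baby <- data.frame(gestation = seq(30, 40, by = 2),
                   weight    = c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5))
model1 <- lm(weight ~ gestation, data = baby)

# Extract mean sums of squares and degrees of freedom from the ANOVA table
aov_tab <- anova(model1)
ms_reg <- aov_tab["gestation", "Mean Sq"]
ms_res <- aov_tab["Residuals", "Mean Sq"]
df_reg <- aov_tab["gestation", "Df"]
df_res <- aov_tab["Residuals", "Df"]

# F statistic and p-value (upper tail of the F distribution)
f_stat <- ms_reg / ms_res
p_val  <- pf(f_stat, df_reg, df_res, lower.tail = FALSE)
c(F = f_stat, p = p_val)

# 95% confidence interval for sigma^2:
# SS_RES / sigma^2 ~ chi-square(df_res), so divide by the upper and lower quantiles
ss_res    <- aov_tab["Residuals", "Sum Sq"]
ci_sigma2 <- ss_res / qchisq(c(0.975, 0.025), df_res)
ci_sigma2
```

Dividing by the 97.5% quantile gives the lower limit and dividing by the 2.5% quantile gives the upper limit, which is why the quantiles appear in that order.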
Exercise 11.06
This question uses the ‘baby weights’ linear regression model, model1, of weight on gestation
period, created in an earlier exercise.
(iv) Obtain a 99% confidence interval for the mean weight of a baby at 0 weeks:
(a) the mean weights of babies at 20, 21, 22, 23, 24 weeks
(b) 95% confidence intervals for the mean weight of a baby at 20, 21, 22, 23, 24
weeks.
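For the mean weights and confidence intervals at several gestation periods, predict accepts a data frame with one row per value. The sketch below assumes the same typed-in data and column names as a stand-in for the data file; other confidence levels can be requested by changing the level argument.

```r
# Assumed data (column names and gestation values are hypothetical)
baby <- data.frame(gestation = seq(30, 40, by = 2),
                   weight    = c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5))
model1 <- lm(weight ~ gestation, data = baby)

# One row per gestation period of interest
newdata <- data.frame(gestation = 20:24)

# Estimated mean weights
means <- predict(model1, newdata)
means

# 95% confidence intervals for the mean weights (columns fit, lwr, upr)
ci95 <- predict(model1, newdata, interval = "confidence", level = 0.95)
ci95
```

These gestation periods lie well outside the observed range, so the fitted line is being extrapolated and the estimates should be read with caution.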
Exercise 11.07
This question uses the ‘baby weights’ linear regression model, model1, of weight on gestation
period, created in an earlier exercise.
(ii) (a) Obtain a plot of the residuals against the fitted values.
(b) Comment on the constancy of the variance and whether a linear model is
appropriate.
(iv) Examine the final two graphs obtained by plot(model1) and comment.
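The residual plots can be produced along the following lines. This is a sketch using typed-in data with assumed column names and gestation values. By default plot applied to an lm object draws four diagnostic graphs; the which argument is used here to pick out the last two of the default set (scale-location and residuals against leverage).

```r
# Assumed data (column names and gestation values are hypothetical)
baby <- data.frame(gestation = seq(30, 40, by = 2),
                   weight    = c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5))
model1 <- lm(weight ~ gestation, data = baby)

# Residuals against fitted values, with a reference line at zero
res <- residuals(model1)
plot(fitted(model1), res,
     main = "Residuals against fitted values",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# The final two of the default diagnostic graphs, side by side
old_par <- par(mfrow = c(1, 2))
plot(model1, which = c(3, 5))
par(old_par)
```

With only six observations, any apparent pattern in these graphs is weak evidence at best, which is worth bearing in mind when commenting.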
Exercise 11.08
Part (i) of this question uses the ‘baby weights’ linear regression model, model1, of weight on
gestation period, created in an earlier exercise.
(i) (a) Obtain a new linear regression model, model2, based on the data without the
second data point (gestation of 32 weeks).
(b) By examining the new value of R², comment on the fit of model2 compared to
that of model1, which had R² = 0.9689.
x 1 2 3 4 5 6 7 8 9 10
y 0.33 0.51 0.75 1.16 1.90 2.59 5.14 7.39 11.3 17.4
(a) Load the csv file and store it in the data frame growth.
(iv) (a) Obtain estimates for the slope and intercept parameters for model3.
(b) Add a red dashed regression line to your scatterplot of ln y vs x from part (ii)(c).
(b) Re-plot the scatterplot of y vs x and this time add blue points to the scatterplot
to show the fitted values of y using model3.
(c) Add a dashed red regression curve that passes through the fitted points.
(vi) Obtain a 95% confidence interval for the mean value of y when x = 8.5.
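The sketch below illustrates one way through the main steps of this exercise. Several things are assumptions: the baby data column names and gestation values, the growth column names x and y (taken from the table above rather than from growth.csv), and the form of model3, which is taken here to be a regression of ln y on x. The final interval is obtained by back-transforming the interval for the mean of ln y with exp, which is one common reading of part (vi) rather than the only one.

```r
# Part (i): refit without the second data point (assumed baby data)
baby <- data.frame(gestation = seq(30, 40, by = 2),
                   weight    = c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5))
model1 <- lm(weight ~ gestation, data = baby)
model2 <- lm(weight ~ gestation, data = baby[-2, ])
r2_1 <- summary(model1)$r.squared
r2_2 <- summary(model2)$r.squared
c(model1 = r2_1, model2 = r2_2)

# Growth data typed in from the table; read.csv("growth.csv") would replace this
growth <- data.frame(x = 1:10,
                     y = c(0.33, 0.51, 0.75, 1.16, 1.90,
                           2.59, 5.14, 7.39, 11.3, 17.4))

# Assumed form of model3: ln y regressed on x
model3 <- lm(log(y) ~ x, data = growth)
coef(model3)   # intercept and slope on the log scale

# Scatterplot of ln y against x with a red dashed regression line
plot(growth$x, log(growth$y), xlab = "x", ylab = "ln y")
abline(model3, col = "red", lty = 2)

# Scatterplot of y against x with fitted values of y as blue points
plot(growth$x, growth$y, xlab = "x", ylab = "y")
points(growth$x, exp(fitted(model3)), col = "blue", pch = 16)

# Dashed red curve through the fitted points, evaluated on a fine grid
xs <- seq(1, 10, by = 0.1)
lines(xs, exp(predict(model3, data.frame(x = xs))), col = "red", lty = 2)

# 95% interval at x = 8.5, back-transformed from the log scale
ci_log <- predict(model3, data.frame(x = 8.5),
                  interval = "confidence", level = 0.95)
ci_y <- exp(ci_log)
ci_y
```

Because model3 is linear in ln y, abline only applies to the log-scale plot; on the original scale the fitted relationship is a curve, so it is drawn with lines over a grid of x values.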