MATH1541-WE01 Statistics I May 2016
Title:
Statistics
Instructions to Candidates: Credit will be given for the best SIX answers.
All questions carry the same marks.
This is an open-book examination: you may keep one folder of
notes at your desk.
Revision:
ED01/2016
University of Durham Copyright
2. A new and cheaper insulating material was tested for electrical resistance. Forty
resistance measurements on the material were made, and the results shown below:
14.0 15.6 16.1 16.8 16.9 17.0 17.1 17.2 17.7 17.7
17.9 18.1 18.4 18.5 18.6 18.9 19.1 19.1 19.1 19.2
19.2 19.4 19.5 19.7 20.0 20.4 20.4 20.7 20.8 21.0
21.1 21.3 21.5 22.0 22.4 23.0 23.6 24.7 26.0 30.3
(a) Construct a stem and leaf plot of the 40 measurements, choosing an appropriate
class width.
(b) State clearly the endpoint convention you have used and describe the shape of
the distribution.
(c) Construct a box-plot of the data, making the usual modification to show potential
outliers. Show your working.
(d) What effect would you expect on each of the following if any potential outliers
were moved to just fall within the range of non-outlying observations:
(i) median, (ii) mean, (iii) inter-quartile range, and (iv) standard deviation?
Explain.
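The working for parts (a) and (c) can be sketched as follows. This is a minimal illustration, not a model answer: it assumes a class width of 1 unit (so the stem 19 holds values from 19.0 up to but not including 20.0) and takes the quartiles as the medians of the lower and upper halves of the sorted data. Other quartile conventions give slightly different fences, and here that can change whether 26.0 is flagged as a potential outlier.

```python
from statistics import median

# The 40 resistance measurements from the question, already in sorted order.
data = [
    14.0, 15.6, 16.1, 16.8, 16.9, 17.0, 17.1, 17.2, 17.7, 17.7,
    17.9, 18.1, 18.4, 18.5, 18.6, 18.9, 19.1, 19.1, 19.1, 19.2,
    19.2, 19.4, 19.5, 19.7, 20.0, 20.4, 20.4, 20.7, 20.8, 21.0,
    21.1, 21.3, 21.5, 22.0, 22.4, 23.0, 23.6, 24.7, 26.0, 30.3,
]

# Stem-and-leaf plot: stem = integer part, leaf = first decimal digit.
# Assumed class width: 1 unit.
stems = {s: [] for s in range(14, 31)}
for x in data:
    stem, leaf = divmod(round(x * 10), 10)
    stems[stem].append(leaf)
for stem, leaves in stems.items():
    print(f"{stem:2d} | {''.join(str(leaf) for leaf in leaves)}")

# Quartiles as medians of the two halves (an assumed convention).
half = len(data) // 2
q1 = median(data[:half])
q2 = median(data)
q3 = median(data[half:])
iqr = q3 - q1

# The usual box-plot modification: fences at 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1:.2f}, median = {q2:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
print(f"fences = ({lower_fence:.3f}, {upper_fence:.3f})")
print("potential outliers:", outliers)
```

The whiskers of the modified box-plot would then extend to the most extreme observations inside the fences, with the flagged points drawn individually.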
3. The following data set is extracted from the book Weisberg, S. (2005). Applied
Linear Regression, 3rd edition. New York: Wiley.
Data were collected in an experiment in which rats were injected with a dose of a
drug approximately proportional to body weight. At the end of the experiment, the
animal’s liver was weighed, and the fraction of the drug recovered in the liver was
recorded. The response variable, y, is the amount of drug recovered from the liver.
The experimenter expected y to be unrelated to the predictors.
A pairs plot of the data is shown on page 4 of this exam paper. The data are shown
on page 5 of this exam paper as part of the R output.
Multiple regression was used to construct an equation for predicting y. The regression
output from R is shown on page 5 of this exam paper, together with standard
deviations for the variables in the data set.
Do not attempt this question until you have seen the pairs plot and R
output on pages 4 and 5 respectively.
[Figure: pairs plot (scatterplot matrix) of the rat data, with panels for BodyWt
(axis ticks 150 to 190), LiverWt (6 to 10), Dose (0.75 to 0.95) and y (0.20 to 0.50).]
> pairs(rat)
> rat
   BodyWt LiverWt Dose    y
1     176     6.5 0.88 0.42
2     176     9.5 0.88 0.25
3     190     9.0 1.00 0.56
4     176     8.9 0.88 0.23
5     200     7.2 1.00 0.23
6     167     8.9 0.83 0.32
7     188     8.0 0.94 0.37
8     195    10.0 0.98 0.41
9     176     8.0 0.88 0.33
10    165     7.9 0.84 0.38
11    158     6.9 0.80 0.27
12    148     7.3 0.74 0.36
13    149     5.2 0.75 0.21
14    163     8.4 0.81 0.28
15    170     7.2 0.85 0.34
16    186     6.8 0.94 0.28
17    146     7.3 0.73 0.30
18    181     9.0 0.90 0.37
19    149     6.4 0.75 0.46
Call:
lm(formula = y ~ BodyWt + LiverWt + Dose, data = rat)
Residuals:
      Min        1Q    Median        3Q       Max
-0.100557 -0.063233  0.007131  0.045971  0.134691
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.265922   0.194585   1.367   0.1919
BodyWt      -0.021246   0.007974  -2.664   0.0177 *
LiverWt      0.014298   0.017217   0.830   0.4193
Dose         4.178111   1.522625   2.744   0.0151 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> sd(rat)
     BodyWt     LiverWt        Dose           y
16.49029486  1.22288127  0.08580203  0.08846647
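As a reading aid for the coefficient table, the short Python sketch below recomputes each t value as the estimate divided by its standard error and compares it with the printed figure. The numbers are copied from the R output above; the variable names are illustrative only. The residual degrees of freedom follow from the 19 observations and the 4 fitted coefficients.

```python
# (estimate, standard error, printed t value) copied from the R coefficient table.
coefficients = {
    "(Intercept)": (0.265922, 0.194585, 1.367),
    "BodyWt": (-0.021246, 0.007974, -2.664),
    "LiverWt": (0.014298, 0.017217, 0.830),
    "Dose": (4.178111, 1.522625, 2.744),
}

# Each t value is the estimate divided by its standard error.
t_values = {}
for name, (estimate, std_error, t_printed) in coefficients.items():
    t_values[name] = estimate / std_error
    print(f"{name:12s} t = {t_values[name]:7.3f} (printed {t_printed:7.3f})")

# Residual degrees of freedom: observations minus fitted coefficients.
n_observations = 19
n_coefficients = len(coefficients)
residual_df = n_observations - n_coefficients
print("residual degrees of freedom:", residual_df)
```

The printed p-values are two-sided tail probabilities for these t values on the residual degrees of freedom.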
4. The effects of two brands of pesticides on the yield of three types of barley were
investigated. For each combination of pesticide and barley variety, two repetitions
were performed and the resulting yields for all 12 experiments are shown below.
          Pesticide 1   Pesticide 2
Barley A  5.0, 7.2      3.3, 6.9
Barley B  3.6, 5.3      1.9, 1.6
Barley C  5.4, 3.2      3.0, 2.0
(a) Decompose the data into a table of group means and a table of residuals.
(b) Apply mean polish to the table of means, clearly labelling the components
of the result.
(c) Use your decomposition of the data to draw an “effects and residuals” plot and
briefly comment on it.
(d) Calculate the analysis of variance table and comment on it.
(e) Investigate whether the assumption of homogeneity seems reasonable and suggest
a possible remedy if not.
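The decomposition in parts (a) and (b) can be sketched as follows. This is an illustration rather than a model answer: it assumes that "mean polish" means splitting the table of cell means into an overall mean, barley (row) effects, pesticide (column) effects and interaction terms. The labels "P1" and "P2" are shorthand introduced here for the two pesticides.

```python
from statistics import mean

# Yields from the question, keyed by (barley variety, pesticide).
yields = {
    ("A", "P1"): [5.0, 7.2], ("A", "P2"): [3.3, 6.9],
    ("B", "P1"): [3.6, 5.3], ("B", "P2"): [1.9, 1.6],
    ("C", "P1"): [5.4, 3.2], ("C", "P2"): [3.0, 2.0],
}
varieties = ["A", "B", "C"]
pesticides = ["P1", "P2"]

# (a) Table of group means and table of residuals.
cell_mean = {key: mean(values) for key, values in yields.items()}
residuals = {key: [y - cell_mean[key] for y in values] for key, values in yields.items()}

# (b) Mean polish of the table of means (assumed decomposition).
overall = mean(list(cell_mean.values()))
row_effect = {
    v: mean([cell_mean[(v, p)] for p in pesticides]) - overall for v in varieties
}
col_effect = {
    p: mean([cell_mean[(v, p)] for v in varieties]) - overall for p in pesticides
}
interaction = {
    (v, p): cell_mean[(v, p)] - overall - row_effect[v] - col_effect[p]
    for v in varieties
    for p in pesticides
}

print(f"overall mean: {overall:.3f}")
for v in varieties:
    print(f"barley {v} effect: {row_effect[v]:+.3f}")
for p in pesticides:
    print(f"pesticide {p} effect: {col_effect[p]:+.3f}")
for key, value in interaction.items():
    print(f"interaction {key}: {value:+.3f}")
```

Each observation is then the sum of the overall mean, its row effect, its column effect, its interaction term and its residual, which is the basis for the sums of squares in part (d).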
5. A new blood test is developed for a rare, but often fatal disease. The test is quicker
and cheaper than existing methods (which are essentially perfect), but it is sometimes
inaccurate. Trials find that the average sensitivity of the test was 96% and
the average specificity was 93%.
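To illustrate how sensitivity and specificity combine, the sketch below applies Bayes' theorem to find the probability of disease given a positive test. The sensitivity and specificity are the figures quoted above. The prevalence values are hypothetical, chosen only for illustration, since no prevalence is stated in the text above. The function name is likewise an illustrative choice.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test) by Bayes' theorem."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


sensitivity = 0.96  # P(positive | disease), from the question
specificity = 0.93  # P(negative | no disease), from the question

# Hypothetical prevalences, NOT taken from the question.
for prevalence in (0.001, 0.01, 0.1):
    ppv = positive_predictive_value(sensitivity, specificity, prevalence)
    print(f"prevalence {prevalence:.3f}: P(disease | positive) = {ppv:.3f}")
```

For a rare disease the false positives from the large healthy group dominate, so the probability of disease given a positive result can be low even with a fairly accurate test.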
Analyse the data and respond to the scientists’ question. Your answer should include
clear justifications of the choices of analysis made.
Test Subject              1     2     3     4     5     6     7     8     9    10    11    12
Day 1 Score (no drug)  51.3  73.8  80.8  75.9  68.5  81.5  84.8  63.5  88.1  49.4  62.3  60.7
Day 2 Score (drug)     51.0  70.1  88.8  81.5  81.2  82.0  88.5  67.4  90.7  56.5  60.2  55.5
(a) Analyse the data and address the scientific question. You must justify any
choices you have made within your analysis.
(b) Suggest any possible ways to improve the experiment.
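One possible analysis for part (a) is sketched below. It assumes the question concerns whether scores differ between the two days, and it treats the data as paired because each subject is measured on both days. It computes the paired t statistic from the within-subject differences. The choice of a paired t-test is an assumption that would itself need justifying, for example by checking that the differences look roughly normal.

```python
from math import sqrt
from statistics import mean, stdev

# Scores from the table, in subject order 1 to 12.
day1_no_drug = [51.3, 73.8, 80.8, 75.9, 68.5, 81.5, 84.8, 63.5, 88.1, 49.4, 62.3, 60.7]
day2_drug = [51.0, 70.1, 88.8, 81.5, 81.2, 82.0, 88.5, 67.4, 90.7, 56.5, 60.2, 55.5]

# Within-subject differences (drug minus no drug).
differences = [d2 - d1 for d1, d2 in zip(day1_no_drug, day2_drug)]

n = len(differences)
mean_diff = mean(differences)
sd_diff = stdev(differences)          # sample standard deviation (n - 1 divisor)
std_error = sd_diff / sqrt(n)
t_statistic = mean_diff / std_error   # compare with a t distribution on n - 1 df
degrees_of_freedom = n - 1

print(f"mean difference = {mean_diff:.3f}")
print(f"sd of differences = {sd_diff:.3f}")
print(f"t = {t_statistic:.3f} on {degrees_of_freedom} degrees of freedom")
```

The statistic would then be compared with the appropriate critical value from t tables, one- or two-sided depending on how the scientific question is framed.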
ED01/2016 END