Tables 4, 5, 7, 8, 9, 10, 13 & 14 (New Cambridge) - Graph Paper
ST104A ZA
Statistics 1
A list of formulae and extracts from statistical tables are provided after the final
question on this paper.
Graph paper is provided at the end of this question paper. If used, it must be detached
and fastened securely inside the answer book.
A handheld calculator may be used when answering questions on this paper and it
must comply in all respects with the specification given with your Admission Notice.
The make and type of machine must be clearly stated on the front cover of the answer
book.
UL19/0000 p. 1 of 21
SECTION A
1. (a) Suppose that $x_1 = 3.2$, $x_2 = 0$, $x_3 = 2$, $x_4 = 5$, and $y_1 = 2.5$, $y_2 = 0.8$,
$y_3 = 6$, $y_4 = 100$. Calculate the following quantities:
i. $\sum_{i=1}^{i=2} (x_i + y_i)$   ii. $\sum_{i=3}^{i=4} x_i^2 y_i^2$   iii. $y_4^3 + \sum_{i=1}^{i=2} \frac{x_i}{y_i^2}$.
(6 marks)
(c) State whether the following are true or false and give a brief explanation. (No
marks will be awarded for a simple true/false answer.)
i. A boxplot is suitable for visualising a categorical variable.
ii. The events A and Ac are mutually exclusive.
iii. The median and the mean are the same value for a normal distribution.
iv. A p-value of 0.8 indicates highly significant evidence against the null
hypothesis.
v. Correlation coefficients are asymmetric.
(10 marks)
(d) Three cards are drawn at random, without replacement, from a standard deck
of 52 cards. What is the probability that they are all of the same suit (that
is, all three cards are hearts, or all are diamonds, or all are spades, or all are
clubs)?
(5 marks)
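One way to check an answer to part (d) is to count combinations directly. The sketch below assumes the usual deck structure (4 suits of 13 cards each) and that every set of three cards is equally likely; the variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

SUITS = 4            # hearts, diamonds, spades, clubs
CARDS_PER_SUIT = 13  # assumed standard deck layout
DECK_SIZE = 52
DRAWN = 3

# Favourable outcomes: choose a suit, then choose 3 of its 13 cards.
favourable = SUITS * comb(CARDS_PER_SUIT, DRAWN)
# All possible unordered draws of 3 cards from the deck.
total = comb(DECK_SIZE, DRAWN)

p_same_suit = Fraction(favourable, total)
print(p_same_suit, float(p_same_suit))
```

The same value follows from the sequential argument 1 × 12/51 × 11/50, since the first card fixes the suit.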
(e) The random variable X takes the values 2, 1 and 3 according to the
following probability distribution:
x         2    1    3
p_X(x)    3k   2k   3k
(g) Seven students in a class received the following examination and project marks
in a subject:
Examination mark 50 80 70 40 30 75 95
Project mark 75 60 55 40 50 80 65
You want to know if students who excel in examinations in the subject also
had relatively high project marks.
i. Calculate the Spearman rank correlation.
ii. Based on your answer to part i. do you think students who score well in
examinations are also likely to have the highest project marks? Briefly
justify your answer.
(7 marks)
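A possible check for part (g) i. is sketched below. It assumes the common formula r_s = 1 − 6Σd²/(n(n² − 1)), which applies here because neither list of marks contains ties; the helper `ranks` is a hypothetical name introduced for illustration.

```python
exam = [50, 80, 70, 40, 30, 75, 95]
project = [75, 60, 55, 40, 50, 80, 65]


def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


n = len(exam)
# Squared differences between the two ranks of each student.
d_squared = [(rx - ry) ** 2 for rx, ry in zip(ranks(exam), ranks(project))]
r_s = 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))

print(sum(d_squared), round(r_s, 4))
```

Ranking from largest to smallest instead would give the same coefficient, provided both variables are ranked in the same direction.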
(h) In a simple linear regression model of the form y = α + βx + ε, where the
dependent variable is income (in dollars) and the independent variable is age
(in years), the value of α was estimated to be 203.56.
i. Interpret this estimate of α.
ii. Explain why such an estimate could occur if you are told that age and
income are highly correlated in the sample data used to run the regression.
(6 marks)
SECTION B
Answer two out of the three questions from this section (25 marks each).
2. (a) i. The following data reflect monthly salaries of a group of people (income),
measured in thousand pounds. Carefully construct a boxplot on the graph
paper provided to display these data.
3 2 4 8 7 19 2 5 3 4 10 12.
(8 marks)
ii. Describe the distribution of the data based on the boxplot you have drawn.
(2 marks)
iii. Name two other types of graphical displays that would be suitable to
represent the data and their distribution. Provide a justification for your
answers.
(3 marks)
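The quantities needed to draw the boxplot in part (a) i. can be sketched as follows. Quartile conventions differ between textbooks and software; this sketch assumes the quartiles are the medians of the lower and upper halves of the ordered data, and that points more than 1.5 × IQR beyond the quartiles are flagged as outliers. Other conventions may give slightly different quartiles.

```python
from statistics import median

salaries = [3, 2, 4, 8, 7, 19, 2, 5, 3, 4, 10, 12]  # thousand pounds

data = sorted(salaries)
half = len(data) // 2

q1 = median(data[:half])    # median of the lower half (assumed convention)
q2 = median(data)           # overall median
q3 = median(data[-half:])   # median of the upper half (assumed convention)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in data if v < lower_fence or v > upper_fence]
inside = [v for v in data if lower_fence <= v <= upper_fence]
whiskers = (min(inside), max(inside))  # whiskers end at the most extreme non-outliers

print(q1, q2, q3, iqr, whiskers, outliers)
```

Under these assumptions the single largest salary lies above the upper fence, which is consistent with a right-skewed distribution.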
Before training 12.5 9.6 10.0 11.3 9.9 11.3 10.5 10.6 12.0
After training 12.3 10.0 9.8 11.0 9.9 11.4 10.8 10.3 12.1
3. (a) A survey asked participants whether they shop online and recorded their age
to explore a potential association between the two variables. The ages were
then grouped into ‘above 30 years old’ and ‘30 years or younger’. The data are
summarised in the following table, together with row percentages.
                            Shop online
Age group             Frequently   Rarely      Never      Total
Above 30 years old    52 (26%)     94 (47%)    54 (27%)   200 (100%)
30 years or younger   47 (39%)     52 (43%)    21 (18%)   120 (100%)
Total                 99 (31%)     146 (46%)   75 (23%)   320 (100%)
i. Based on the data in the table, and without conducting a significance test,
how would you describe the relationship between age group and shopping
online?
(4 marks)
ii. Calculate the χ² statistic and use it to test for independence between the
participants' age group and whether they shop online, using a 5% significance
level. What do you conclude?
(9 marks)
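A minimal sketch of the calculation in part (a) ii. is given below. It assumes the usual Pearson statistic, the sum of (O − E)²/E over all cells with E = row total × column total / grand total, and takes the 5% critical value for 2 degrees of freedom (5.991) from standard chi-squared tables.

```python
observed = [
    [52, 94, 54],  # above 30 years old: frequently, rarely, never
    [47, 52, 21],  # 30 years or younger
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count under independence of age group and shopping habit.
        e = row_totals[i] * col_totals[j] / grand_total
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_5_PERCENT = 5.991  # chi-squared table value for 2 df (assumed from tables)
reject_independence = chi2 > CRITICAL_5_PERCENT

print(round(chi2, 3), df, reject_independence)
```

All expected counts exceed 5 here, so the chi-squared approximation is reasonable for these data.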
4. (a) In order to identify factors that influence house value, a real estate company
collected data from 12 areas of a city in the United States consisting of the
median price of houses (y), measured in tens of thousands of dollars, and the
unemployment rate (x), measured in %, for each of these areas. The data are
shown in the following table.
Area                         A   B   C   D   E   F   G   H   I   J   K   L
Unemployment rate (x)        5   7   5   7   8   3   2   8  11   4   5   8
Median price of houses (y)  20  22  15  17  19  25  32  13  11  23  21  23
The summary statistics for these data are:
Sum of x data: 73 Sum of the squares of x data: 515
Sum of y data: 241 Sum of the squares of y data: 5197
Sum of the products of x and y data: 1345
i. Draw a scatter diagram of these data on the graph paper provided. Label
the diagram carefully.
(4 marks)
ii. Calculate the sample correlation coefficient. Interpret your findings.
(3 marks)
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram.
(4 marks)
iv. Using the equation you found in iii., obtain the predicted median house
price for an area with a 10% unemployment rate. Do you think this value
is realistic? Justify your answer.
(2 marks)
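Parts (a) ii. to iv. can be checked from the summary statistics alone. The sketch below assumes the standard corrected sums S_xx, S_yy and S_xy, with r = S_xy / √(S_xx S_yy), slope b = S_xy / S_xx and intercept a = ȳ − b x̄; the variable names are illustrative.

```python
from math import sqrt

n = 12
sum_x, sum_x2 = 73, 515
sum_y, sum_y2 = 241, 5197
sum_xy = 1345

# Corrected sums of squares and products.
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

r = s_xy / sqrt(s_xx * s_yy)       # sample correlation coefficient
b = s_xy / s_xx                    # least squares slope
a = sum_y / n - b * sum_x / n      # least squares intercept, a = y-bar - b * x-bar

# Predicted median price (tens of thousands of dollars) at 10% unemployment.
predicted = a + b * 10

print(round(r, 3), round(a, 3), round(b, 3), round(predicted, 2))
```

An unemployment rate of 10% lies inside the observed range of x (2 to 11), so the prediction is an interpolation rather than an extrapolation.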
(b) A survey is conducted to compare customer satisfaction in two branches of a
bank. Various customers visiting the two branches were selected randomly,
and asked if they were satisfied with the services provided by the branch. The
results of this survey are shown in the following table.
Sample size Number satisfied
Branch A 153 115
Branch B 188 120
i. You are asked to consider an appropriate hypothesis test to determine
whether there is a difference between the two bank branches regarding
the proportion of satisfied customers. Test at two appropriate significance
levels and comment on your findings. Specify the test statistic you use and
its distribution under the null hypothesis.
(7 marks)
ii. State clearly any assumptions you made in part i.
(2 marks)
iii. Compute a 98% confidence interval for the proportion of satisfied
customers visiting both branches A and B combined, assuming the
respective sample sizes are proportional to population sizes.
(3 marks)
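The calculations for parts (b) i. and iii. might be sketched as follows. This assumes a two-sided z test with a pooled proportion under the null hypothesis of equal proportions, and, for part iii., the usual interval p ± z × √(p(1 − p)/n) applied to the combined sample; `NormalDist` from the Python standard library stands in for the normal tables.

```python
from math import sqrt
from statistics import NormalDist

n_a, x_a = 153, 115  # Branch A: sample size, number satisfied
n_b, x_b = 188, 120  # Branch B: sample size, number satisfied

p_a, p_b = x_a / n_a, x_b / n_b
p_pooled = (x_a + x_b) / (n_a + n_b)

# z statistic for H0: equal proportions, using the pooled standard error.
se_diff = sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / se_diff
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided (assumed alternative)

# 98% confidence interval for the combined proportion.
z_98 = NormalDist().inv_cdf(0.99)
margin = z_98 * sqrt(p_pooled * (1 - p_pooled) / (n_a + n_b))
ci = (p_pooled - margin, p_pooled + margin)

print(round(z, 3), round(p_value, 4), tuple(round(v, 4) for v in ci))
```

With a two-sided alternative, the statistic exceeds the 5% critical value of 1.96 but not the 1% critical value of 2.576, so the conclusion depends on the significance level chosen.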
END OF PAPER
ST104a Statistics 1
Examination Formula Sheet
z test of hypothesis for a single mean (σ known):
\[ Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \]
t test of hypothesis for a single mean (σ unknown):
\[ T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} \]
z test of hypothesis for a single proportion:
\[ Z \cong \frac{P - \pi_0}{\sqrt{\pi_0 (1 - \pi_0) / n}} \]
z test for the difference between two means (variances known):
\[ Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}} \]
t test for the difference between two means (variances unknown):
\[ T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{S_p^2 (1/n_1 + 1/n_2)}} \]
Confidence interval endpoints for the difference between two means:
\[ \bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\, n_1 + n_2 - 2} \times \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \]
Confidence interval endpoints for the difference in means in paired samples:
\[ \bar{x}_d \pm t_{\alpha/2,\, n-1} \times \frac{s_d}{\sqrt{n}} \]
z test for the difference between two proportions:
\[ Z = \frac{P_1 - P_2 - (\pi_1 - \pi_2)}{\sqrt{P (1 - P) (1/n_1 + 1/n_2)}} \]
a = ȳ − bx̄
Dennis V. Lindley, William F. Scott, New Cambridge Statistical Tables, (1995) © Cambridge University Press, reproduced with permission.