Tables 4, 5, 7, 8, 9, 10, 13 & 14 (New Cambridge) - Graph Paper

THIS PAPER IS NOT TO BE REMOVED FROM THE EXAMINATION HALL

ST104A ZA

BSc DEGREES AND GRADUATE DIPLOMAS IN ECONOMICS, MANAGEMENT,
FINANCE AND THE SOCIAL SCIENCES, THE DIPLOMA IN ECONOMICS AND
SOCIAL SCIENCES AND THE CERTIFICATE IN EDUCATION IN SOCIAL
SCIENCES

Statistics 1

Monday 13 May 2019 : 10.00 – 12.00

Time allowed: 2 hours

DO NOT TURN OVER UNTIL TOLD TO BEGIN

Candidates should answer THREE of the following FOUR questions: QUESTION 1
of Section A (50 marks) and TWO questions from Section B (25 marks each).
Candidates are strongly advised to divide their time accordingly.

A list of formulae and extracts from statistical tables are provided after the final
question on this paper.

Graph paper is provided at the end of this question paper. If used, it must be detached
and fastened securely inside the answer book.

A handheld calculator may be used when answering questions on this paper and it
must comply in all respects with the specification given with your Admission Notice.
The make and type of machine must be clearly stated on the front cover of the answer
book.

Tables 4, 5, 7, 8, 9, 10, 13 & 14 (New Cambridge). Graph paper

© University of London 2019

UL19/0000 p. 1 of 21
SECTION A

Answer all parts of question 1 (50 marks in total).

1. (a) Suppose that $x_1 = 3.2$, $x_2 = 0$, $x_3 = 2$, $x_4 = 5$, and $y_1 = 2.5$, $y_2 = 0.8$,
$y_3 = 6$, $y_4 = 100$. Calculate the following quantities:

i. $\sum_{i=1}^{2} (x_i + y_i)$    ii. $\sum_{i=3}^{4} x_i^2 y_i^2$    iii. $y_4^3 + \sum_{i=1}^{2} \frac{x_i}{y_i^2}$.

(6 marks)

(b) Classify each one of the following variables as either measurable
(continuous) or categorical. If a variable is categorical, further classify it as
either nominal or ordinal. Justify your answer. (No marks will be awarded
without a justification.)
i. Clothing sizes of ‘small’, ‘medium’ and ‘large’.
ii. The inflation rate of a country.
iii. A passport’s country of issue.
(6 marks)

(c) State whether the following are true or false and give a brief explanation. (No
marks will be awarded for a simple true/false answer.)
i. A boxplot is suitable for visualising a categorical variable.
ii. The events $A$ and $A^c$ are mutually exclusive.
iii. The median and the mean are the same value for a normal distribution.
iv. A p-value of 0.8 indicates highly significant evidence against the null
hypothesis.
v. Correlation coefficients are asymmetric.
(10 marks)
(d) Three cards are drawn at random, without replacement, from a standard deck
of 52 cards. What is the probability that they are all of the same suit (that
is, all three cards are hearts, or all are diamonds, or all are spades, or all are
clubs)?
(5 marks)

(e) The random variable X takes the values 2, 1 and 3 according to the
following probability distribution:

x           2     1     3
$p_X(x)$    3k    2k    3k

i. Explain why k = 0.125 and write down the probability distribution of X.
ii. Find E(X), the expected value of X.
iii. Find Var(X), the variance of X.
(6 marks)
(f) Briefly explain two advantages of longitudinal surveys.
(4 marks)

(g) Seven students in a class received the following examination and project marks
in a subject:
Examination mark 50 80 70 40 30 75 95
Project mark 75 60 55 40 50 80 65
You want to know if students who excel in examinations in the subject also
had relatively high project marks.
i. Calculate the Spearman rank correlation.
ii. Based on your answer to part i. do you think students who score well in
examinations are also likely to have the highest project marks? Briefly
justify your answer.
(7 marks)

(h) In a simple linear regression model of the form $y = \alpha + \beta x + \varepsilon$, where the
dependent variable is income (in dollars) and the independent variable is age
(in years), the value of $\alpha$ was estimated to be 203.56.
i. Interpret this estimate of $\alpha$.
ii. Explain why such an estimate could occur if you are told that age and
income are highly correlated in the sample data used to run the regression.
(6 marks)

SECTION B

Answer two out of the three questions from this section (25 marks each).

2. (a) i. The following data are the monthly salaries (income) of a group of people,
measured in thousands of pounds. Carefully construct a boxplot on the graph
paper provided to display these data.

3 2 4 8 7 19 2 5 3 4 10 12.

(8 marks)
ii. Describe the distribution of the data based on the boxplot you have drawn.
(2 marks)
iii. Name two other types of graphical displays that would be suitable to
represent the data and their distribution. Provide a justification for your
answers.
(3 marks)

(b) A new training programme is designed to improve the performance of
100-metre runners. A random sample of nine 100-metre runners were trained
according to this programme and, in order to assess its effectiveness, they
participated in a run before and after completing this training programme. The
times (in seconds) for each runner were recorded and are shown below. The
aim is to determine whether this training programme is effective in reducing
the average times of the runners.

Before training 12.5 9.6 10.0 11.3 9.9 11.3 10.5 10.6 12.0
After training 12.3 10.0 9.8 11.0 9.9 11.4 10.8 10.3 12.1

i. Carry out an appropriate hypothesis test at two appropriate significance
levels to determine whether this training programme is effective at reducing
the average times of the runners. State the test hypotheses, and specify
your test statistic and its distribution under the null hypothesis. Comment
on your findings.
(6 marks)
ii. State any assumptions you made in part i.
(2 marks)
iii. Compute an 80% confidence interval for the difference in the means of the
times.
(2 marks)
iv. On the basis of the data alone, would you recommend this training
programme to a runner? Explain why or why not.
(2 marks)

3. (a) A survey asked participants whether they shop online and recorded their age
to explore a potential association between the two variables. The ages were
then grouped into ‘above 30 years old’ and ‘30 years or younger’. The data are
summarised in the following table, together with row percentages.

                       Shop online
Age group              Frequently   Rarely      Never      Total
Above 30 years old     52 (26%)     94 (47%)    54 (27%)   200 (100%)
30 years or younger    47 (39%)     52 (43%)    21 (18%)   120 (100%)
Total                  99 (31%)     146 (46%)   75 (23%)   320 (100%)

i. Based on the data in the table, and without conducting a significance test,
how would you describe the relationship between age group and shopping
online?
(4 marks)
ii. Calculate the $\chi^2$ statistic and use it to test for independence between the
participants' age group and whether they shop online, using a 5% significance
level. What do you conclude?
(9 marks)

(b) i. Define simple random sampling and stratified random sampling.
(4 marks)
ii. Why might a researcher prefer to take a stratified random sample rather
than a simple random sample? Give two reasons.
(3 marks)
iii. You have been asked to design a nationwide survey in your country to find
out about the smoking habits of adults. Give two stratification factors you
might use, and explain why you have chosen them.
(5 marks)

4. (a) In order to identify factors that influence house value, a real estate company
collected data from 12 areas of a city in the United States consisting of the
median price of houses (y), measured in tens of thousands of dollars, and the
unemployment rate (x), measured in %, for each of these areas. The data are
shown in the following table.
Area A B C D E F G H I J K L
Unemployment rate (x) 5 7 5 7 8 3 2 8 11 4 5 8
Median price of houses (y) 20 22 15 17 19 25 32 13 11 23 21 23
The summary statistics for these data are:
Sum of x data: 73 Sum of the squares of x data: 515
Sum of y data: 241 Sum of the squares of y data: 5197
Sum of the products of x and y data: 1345
i. Draw a scatter diagram of these data on the graph paper provided. Label
the diagram carefully.
(4 marks)
ii. Calculate the sample correlation coefficient. Interpret your findings.
(3 marks)
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram.
(4 marks)
iv. Using the equation you found in iii., obtain the predicted median house
price for an area with a 10% unemployment rate. Do you think this value
is realistic? Justify your answer.
(2 marks)
(b) A survey is conducted to compare customer satisfaction in two branches of a
bank. Various customers visiting the two branches were selected randomly,
and asked if they were satisfied with the services provided by the branch. The
results of this survey are shown in the following table.
Sample size Number satisfied
Branch A 153 115
Branch B 188 120
i. You are asked to consider an appropriate hypothesis test to determine
whether there is a difference between the two bank branches regarding
the proportion of satisfied customers. Test at two appropriate significance
levels and comment on your findings. Specify the test statistic you use and
its distribution under the null hypothesis.
(7 marks)
ii. State clearly any assumptions you made in part i.
(2 marks)
iii. Compute a 98% confidence interval for the proportion of satisfied
customers visiting both branches A and B combined, assuming the
respective sample sizes are proportional to population sizes.
(3 marks)

END OF PAPER

ST104a Statistics 1
Examination Formula Sheet

Expected value of a discrete random variable:
$$\mu = E(X) = \sum_{i=1}^{N} p_i x_i$$

Standard deviation of a discrete random variable:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\sum_{i=1}^{N} p_i (x_i - \mu)^2}$$

The transformation formula:
$$Z = \frac{X - \mu}{\sigma}$$

Finding Z for the sampling distribution of the sample mean:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

Finding Z for the sampling distribution of the sample proportion:
$$Z = \frac{P - \pi}{\sqrt{\pi (1 - \pi) / n}}$$

Confidence interval endpoints for a single mean ($\sigma$ known):
$$\bar{x} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$

Confidence interval endpoints for a single mean ($\sigma$ unknown):
$$\bar{x} \pm t_{\alpha/2,\, n-1} \times \frac{s}{\sqrt{n}}$$

Confidence interval endpoints for a single proportion:
$$p \pm z_{\alpha/2} \times \sqrt{\frac{p (1 - p)}{n}}$$

Sample size determination for a mean:
$$n \geq \frac{(z_{\alpha/2})^2 \sigma^2}{e^2}$$

Sample size determination for a proportion:
$$n \geq \frac{(z_{\alpha/2})^2 \, p (1 - p)}{e^2}$$

z test of hypothesis for a single mean ($\sigma$ known):
$$Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$

t test of hypothesis for a single mean ($\sigma$ unknown):
$$T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$$

z test of hypothesis for a single proportion:
$$Z \cong \frac{P - \pi_0}{\sqrt{\pi_0 (1 - \pi_0) / n}}$$

z test for the difference between two means (variances known):
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$$

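t test for the difference between two means (variances unknown):
$$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{S_p^2 (1/n_1 + 1/n_2)}}$$

Confidence interval endpoints for the difference between two means:
$$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\, n_1 + n_2 - 2} \times \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$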

Pooled variance estimator:
$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$$

t test for the difference in means in paired samples:
$$T = \frac{\bar{X}_d - \mu_d}{S_d / \sqrt{n}}$$
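
As a rough illustration of the paired-samples t statistic above, the following minimal Python sketch applies it to the before and after times from Question 2(b). The function name `paired_t`, the choice of differences as before minus after, and the default $\mu_d = 0$ are assumptions made for this sketch, not part of the paper.

```python
import math

# Times in seconds from Question 2(b); one pair of values per runner.
before = [12.5, 9.6, 10.0, 11.3, 9.9, 11.3, 10.5, 10.6, 12.0]
after = [12.3, 10.0, 9.8, 11.0, 9.9, 11.4, 10.8, 10.3, 12.1]


def paired_t(first, second, mu_d=0.0):
    """Return (mean difference, sd of differences, T) for paired samples.

    Differences are taken as first - second (an assumption of this sketch).
    """
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    d_bar = sum(d) / n
    # Sample standard deviation of the differences (divisor n - 1).
    s_d = math.sqrt(sum((di - d_bar) ** 2 for di in d) / (n - 1))
    t_stat = (d_bar - mu_d) / (s_d / math.sqrt(n))
    return d_bar, s_d, t_stat


d_bar, s_d, t_stat = paired_t(before, after)
print(f"mean difference = {d_bar:.4f}, s_d = {s_d:.4f}, T = {t_stat:.3f}")
```

Under the null hypothesis the statistic would be compared with a t distribution on n - 1 = 8 degrees of freedom, using the tables supplied with the paper.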

Confidence interval endpoints for the difference in means in paired samples:
$$\bar{x}_d \pm t_{\alpha/2,\, n-1} \times \frac{s_d}{\sqrt{n}}$$

z test for the difference between two proportions:
$$Z = \frac{P_1 - P_2 - (\pi_1 - \pi_2)}{\sqrt{P (1 - P) (1/n_1 + 1/n_2)}}$$

Pooled proportion estimator:
$$P = \frac{R_1 + R_2}{n_1 + n_2}$$

Confidence interval endpoints for the difference between two proportions:
$$p_1 - p_2 \pm z_{\alpha/2} \times \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}$$
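
The z test for the difference between two proportions, together with the pooled proportion estimator, could be sketched as follows using the counts from Question 4(b). The function name `two_proportion_z` is hypothetical, and the sketch assumes the null hypothesis $\pi_1 - \pi_2 = 0$.

```python
import math


def two_proportion_z(r1, n1, r2, n2):
    """Return (p1, p2, pooled proportion, Z) for H0: pi1 - pi2 = 0."""
    p1 = r1 / n1
    p2 = r2 / n2
    # Pooled proportion estimator P = (R1 + R2) / (n1 + n2).
    pooled = (r1 + r2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    return p1, p2, pooled, z


# Branch A: 115 satisfied out of 153; Branch B: 120 satisfied out of 188.
p1, p2, pooled, z = two_proportion_z(115, 153, 120, 188)
print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, pooled = {pooled:.4f}, Z = {z:.3f}")
```

The resulting value of Z would then be compared with standard normal critical values at the chosen significance levels.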

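$\chi^2$ statistic for test of association:
$$\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Sample correlation coefficient:
$$r = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sqrt{\left( \sum_{i=1}^{n} x_i^2 - n \bar{x}^2 \right) \left( \sum_{i=1}^{n} y_i^2 - n \bar{y}^2 \right)}}$$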
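
A minimal Python sketch of the $\chi^2$ statistic above, applied to the observed frequencies from Question 3(a), is given below. The function name `chi_squared` is an assumption of the sketch; expected frequencies are computed in the usual way as row total times column total divided by the grand total.

```python
# Observed frequencies from Question 3(a):
# rows are age groups, columns are Frequently / Rarely / Never.
observed = [
    [52, 94, 54],  # above 30 years old
    [47, 52, 21],  # 30 years or younger
]


def chi_squared(table):
    """Return (chi-squared statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected frequency under independence.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


stat, df = chi_squared(observed)
print(f"chi-squared = {stat:.3f} on {df} degrees of freedom")
```

The statistic would be compared with the relevant critical value of the $\chi^2$ distribution from the tables supplied with the paper.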

Spearman rank correlation:
$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)}$$

Simple linear regression line estimates:
$$b = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}$$
$$a = \bar{y} - b \bar{x}$$
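
The sample correlation coefficient, the least squares estimates and the Spearman rank correlation can be sketched in Python as below. The sketch uses the summary statistics printed in Question 4(a) and the marks from Question 1(g); the function names `corr_and_line` and `spearman` are hypothetical, and the ranking helper assumes there are no tied values, which holds for these marks.

```python
import math


def corr_and_line(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Return (r, a, b) from summary statistics, following the formula sheet."""
    x_bar = sum_x / n
    y_bar = sum_y / n
    s_xy = sum_xy - n * x_bar * y_bar
    s_xx = sum_xx - n * x_bar ** 2
    s_yy = sum_yy - n * y_bar ** 2
    r = s_xy / math.sqrt(s_xx * s_yy)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return r, a, b


def spearman(xs, ys):
    """Spearman rank correlation; assumes no tied values in xs or ys."""
    def ranks(values):
        ordered = sorted(values)
        return [ordered.index(v) + 1 for v in values]

    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Summary statistics from Question 4(a), n = 12 areas.
r, a, b = corr_and_line(12, 73, 241, 515, 5197, 1345)
print(f"r = {r:.3f}, a = {a:.3f}, b = {b:.3f}")

# Examination and project marks from Question 1(g).
exam = [50, 80, 70, 40, 30, 75, 95]
project = [75, 60, 55, 40, 50, 80, 65]
rs = spearman(exam, project)
print(f"r_s = {rs:.3f}")
```

Working from the sums rather than the raw data mirrors the form of the formulae on this sheet; with tied ranks the simple $d_i^2$ formula would no longer apply as written.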

Dennis V. Lindley, William F. Scott, New Cambridge Statistical Tables, (1995) © Cambridge University Press, reproduced with permission.