Cs1a September-23 Exam Clean-Proof
EXAMINATION
In addition to this paper you should have available the 2002 edition of
the Formulae and Tables and your own electronic calculator.
If you encounter any issues during the examination please contact the Assessment Team on
T. 0044 (0) 1865 268 873.
2 An analyst has modelled the loss (in units of £1,000) due to a fire in a supermarket
using a random variable, X, with density function:
𝑎 𝑥 20 , 0 𝑥 20
𝑓 𝑥
0, otherwise
(ii) Calculate the conditional probability that the fire loss exceeds £16,000 given
that it exceeds £8,000. [3]
[Total 5]
CS1A S2023–2
3 The probability mass function of a random variable, 𝑌, is defined as:
$$P(Y = y) = \binom{k + y - 1}{y} (1 - p)^{y} p^{k}, \qquad y = 0, 1, 2, 3, \ldots$$
(i) State the distribution of the random variable 𝑌. Your answer should include an
explanation of the meaning of the distribution’s parameters. [2]
(ii) Identify which one of the following options gives the natural parameter θ, the
scale parameter φ, and the relevant functions 𝑏(θ), 𝑎(φ) and 𝑐(𝑦, φ) of the
exponential family for this distribution.
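A $\theta = \log(1 - p)$, $\phi = 1$, $b(\theta) = -k \log p$, $a(\phi) = 1$, $c(y, \phi) = \log \binom{k + y - 1}{y}$
B $\theta = \log p$, $\phi = 1$, $b(\theta) = -k \log(1 - p)$, $a(\phi) = 1$, $c(y, \phi) = \log \binom{k + y - 1}{y}$
C $\theta = \log(1 - p)$, $\phi = k$, $b(\theta) = -\log p$, $a(\phi) = k$, $c(y, \phi) = \log \binom{k + y - 1}{y}$
D $\theta = k \log(1 - p)$, $\phi = 1$, $b(\theta) = -k \log p$, $a(\phi) = 1$, $c(y, \phi) = \log \binom{k + y - 1}{k}$.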
[3]
(iii) Derive, using the properties of the exponential family, the mean of 𝑌. [2]
(iv) Identify which one of the following options gives the variance of 𝑌:
A 𝑉𝑌
B 𝑉𝑌
C V [Y] 𝑘
D 𝑉𝑌 𝑘 .
[2]
[Total 9]
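The exponential-family route to the mean and variance in parts (iii) and (iv) can be cross-checked numerically. The sketch below assumes the pmf has the negative binomial form C(k+y-1, y) (1-p)^y p^k, written with θ = log(1-p), a(φ) = 1 and b(θ) = -k log(1 - e^θ). The values k = 3 and p = 0.4 are illustrative only and do not come from the paper. It compares moments obtained by direct summation of the pmf with finite-difference approximations to b'(θ) and b''(θ).

```python
import math

def pmf(y, k, p):
    """Assumed negative binomial pmf: C(k+y-1, y) * (1-p)^y * p^k, integer k."""
    return math.comb(k + y - 1, y) * (1 - p) ** y * p ** k

def b(theta, k):
    """Cumulant function b(theta) = -k*log(p), where p = 1 - exp(theta)."""
    return -k * math.log(1 - math.exp(theta))

def moments_by_summation(k, p, terms=2000):
    """Mean and variance from a truncated sum over y = 0, 1, ..., terms-1."""
    mean = sum(y * pmf(y, k, p) for y in range(terms))
    second = sum(y * y * pmf(y, k, p) for y in range(terms))
    return mean, second - mean ** 2

def moments_from_b(k, p, h=1e-4):
    """Mean b'(theta) and variance a(phi)*b''(theta) with a(phi) = 1,
    both approximated by central differences."""
    theta = math.log(1 - p)
    mean = (b(theta + h, k) - b(theta - h, k)) / (2 * h)
    var = (b(theta + h, k) - 2 * b(theta, k) + b(theta - h, k)) / h ** 2
    return mean, var

k, p = 3, 0.4  # illustrative values, not taken from the paper
print(moments_by_summation(k, p))
print(moments_from_b(k, p))
```

If the two printed pairs agree to several decimal places, the chosen θ and b(θ) are consistent with the pmf.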
4 An insurance company collects and maintains a database of key policyholder data in
relation to the business that they sell.
(i) List the properties that can lead to this data being classified as ‘big data’. [2]
(ii) List two key issues related to the use of data that should be considered by this
insurance company. [1]
[Total 3]
5 X and Y are independent random variables. Let W and Z be the random variables
defined as follows:
𝑍 = min(𝑋, 𝑌),   𝑊 = max(𝑋, 𝑌)
that is, Z gives the smaller and W the larger of the observations of X and Y.
A 𝐹 𝑠 𝐹 𝑠 𝐹 𝑠
B 𝐹 𝑠 𝐹 𝑠 𝐹 𝑠
C 𝐹 𝑠 𝐹 𝑠 𝐹 𝑠
D 𝐹 𝑠 .
[2]
(iii) (a) Identify which one of the following expressions gives the cumulative
distribution function of Z:
A 1 𝑒
B 1 𝑒
C 1 𝑒
D 1 𝑒 .
[3]
(b) State the distribution and mean of Z, using your answer to part (iii)(a).
[1]
[Total 11]
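For independent X and Y, the maximum is at most s exactly when both variables are at most s, and the minimum exceeds s exactly when both variables exceed s. The sketch below turns those two facts into functions of the marginal distribution functions. The exponential marginals with rates 2 and 3 are hypothetical choices made only so that the code runs.

```python
import math

def cdf_max(F_x, F_y, s):
    """CDF of W = max(X, Y) for independent X, Y: both must be <= s."""
    return F_x(s) * F_y(s)

def cdf_min(F_x, F_y, s):
    """CDF of Z = min(X, Y) for independent X, Y:
    one minus the probability that both exceed s."""
    return 1 - (1 - F_x(s)) * (1 - F_y(s))

def exp_cdf(rate):
    """CDF of an exponential distribution with the given rate."""
    return lambda s: 1 - math.exp(-rate * s) if s > 0 else 0.0

F_x, F_y = exp_cdf(2.0), exp_cdf(3.0)  # hypothetical rates
for s in (0.1, 0.5, 1.0):
    print(s, cdf_min(F_x, F_y, s), cdf_max(F_x, F_y, s))
```

With exponential marginals the minimum is again exponential, with rate equal to the sum of the two rates, which gives a quick sanity check on `cdf_min`.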
6 A railway company investigates the impact of train delays on connections that
passengers experience. In a survey, passengers are asked how many trains they used
for their journey and whether they have experienced any delays.
In other words, the table states that 60% of all journeys involved only one train. Of
those journeys, 5% were delayed. 30% of all journeys involved two trains, and 12%
of those journeys were delayed. And finally, 10% of all journeys involved three or
more trains, and of those journeys, 17% were delayed.
(i) Calculate the probability that a randomly chosen journey involved fewer than
three trains. [1]
(ii) Verify that the probability that a randomly chosen journey was delayed is
8.3%. [2]
(iii) Calculate the probability that a randomly chosen delayed journey involved
fewer than three trains. [3]
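The three parts follow from the percentages quoted in the question: part (i) adds two group shares, part (ii) uses the law of total probability, and part (iii) applies Bayes' theorem. A minimal sketch of that arithmetic, with dictionary keys chosen here purely for illustration:

```python
# Share of all journeys by number of trains used
share = {"one": 0.60, "two": 0.30, "three_or_more": 0.10}
# Probability of a delay within each group
delay_rate = {"one": 0.05, "two": 0.12, "three_or_more": 0.17}

# (i) P(fewer than three trains)
p_fewer_than_three = share["one"] + share["two"]

# (ii) P(delayed), by the law of total probability
p_delayed = sum(share[g] * delay_rate[g] for g in share)

# (iii) P(fewer than three trains | delayed), by Bayes' theorem
p_delayed_and_fewer = sum(share[g] * delay_rate[g] for g in ("one", "two"))
p_fewer_given_delayed = p_delayed_and_fewer / p_delayed

print(p_fewer_than_three, p_delayed, p_fewer_given_delayed)
```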
7 Total losses in a particular company are modelled by a random variable 𝑌 with
density function:
$$f(y) = \begin{cases} \dfrac{c}{y^{c+1}}, & y > 1,\ c > 0 \\ 0, & \text{otherwise.} \end{cases}$$
(i) Identify which one of the following expressions gives the maximum
likelihood estimate for parameter 𝑐:
A 𝑐̂ ∑
B 𝑐̂ ∑
C 𝑐̂ ∑
D 𝑐̂ ∑
.
[3]
(ii) Determine the posterior distribution of 𝑐 with all its parameters. [6]
(iii) Comment on the relationship between the prior distribution and the posterior
distribution of 𝑐. [1]
(iv) Determine the Bayesian estimate of parameter 𝑐 under quadratic loss. [2]
[Total 12]
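The maximum likelihood step in part (i) can be illustrated numerically. The sketch below assumes the density has the Pareto-type form f(y) = c / y^(c+1) for y > 1, which is the exponent that makes the function integrate to 1. Under that assumption the log-likelihood is n log c - (c + 1) Σ log y_i, and setting its derivative to zero gives the closed form used in `mle_c`. The sample values are hypothetical.

```python
import math

def log_likelihood(c, ys):
    """Log-likelihood for the assumed density f(y) = c / y**(c + 1), y > 1."""
    return len(ys) * math.log(c) - (c + 1) * sum(math.log(y) for y in ys)

def mle_c(ys):
    """Closed-form maximiser: n divided by the sum of the log-observations."""
    return len(ys) / sum(math.log(y) for y in ys)

losses = [1.2, 1.9, 3.4, 1.1, 2.6]  # hypothetical sample, every value > 1
c_hat = mle_c(losses)
print(c_hat)

# The log-likelihood should be lower on either side of c_hat
print(log_likelihood(0.9 * c_hat, losses),
      log_likelihood(c_hat, losses),
      log_likelihood(1.1 * c_hat, losses))
```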
8 The time spent (in minutes) queuing in a line by customers at a bank is modelled by
the random variable 𝑋 with density 𝑓 𝑥 θ𝑥𝑒 , 𝑥 0. The parameter θ is
assumed to follow a priori a gamma distribution with parameters 𝑎 and 𝑏.
A 𝑓 θ|𝑥 ∝ θ 𝑒
B 𝑓 θ|𝑥 ∝ θ 𝑒
C 𝑓 θ|𝑥 ∝ θ 𝑒
D 𝑓 θ|𝑥 ∝ θ 𝑒 .
[3]
(ii) State the posterior distribution of θ in part (i) and its parameters. [3]
The waiting times of ten customers are recorded in the table below and the prior
knowledge is determined with 𝑎 = 4, 𝑏 = 1.5.
(iii) Calculate a point estimate of the time spent queuing in the line using Bayesian
estimation under all-or-nothing loss. [4]
[Total 10]
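Under all-or-nothing loss the Bayesian estimate is the mode of the posterior distribution, and under quadratic loss it is the posterior mean. The sketch below shows both quantities for a gamma distribution with shape alpha and rate lam. Reading the question's 𝑎 and 𝑏 as shape and rate is an assumption, and the prior values are used here only as illustrative inputs. Posterior parameters would be passed in the same way once they have been determined.

```python
def gamma_mean(alpha, lam):
    """Mean of Gamma(shape alpha, rate lam): the estimate under quadratic loss."""
    return alpha / lam

def gamma_mode(alpha, lam):
    """Mode of Gamma(shape alpha, rate lam): the estimate under
    all-or-nothing loss. The formula needs alpha >= 1."""
    if alpha < 1:
        raise ValueError("mode formula needs alpha >= 1")
    return (alpha - 1) / lam

a, b = 4, 1.5  # prior values from the question, read as shape and rate (assumption)
print(gamma_mean(a, b), gamma_mode(a, b))
```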
9 The following nine pairs of data are given on observations from two random variables
X and Y:
𝑥       5      7      0      6      8      1      4      9      2
𝑦   11.00  16.18   0.68  11.99  17.72   2.91   9.64  18.92   6.31
(i) Perform a suitable statistical test to investigate the hypothesis that Pearson’s
population correlation coefficient for X and Y is positive, at significance
level 0.01. [6]
(iv) Determine a 99% confidence interval for the predicted mean response,
when x = 3. [5]
[Total 16]
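The calculations behind parts (i) and (iv) can be sketched from the nine data pairs. The code assumes a one-sided t-test of zero population correlation against a positive alternative, and a simple linear regression of y on x for the mean response. The two critical values are the upper 1% and 0.5% points of the t distribution with 7 degrees of freedom, taken from standard tables. The variable names are illustrative.

```python
import math

x = [5, 7, 0, 6, 8, 1, 4, 9, 2]
y = [11.00, 16.18, 0.68, 11.99, 17.72, 2.91, 9.64, 18.92, 6.31]
n = len(x)

# Corrected sums of squares and cross-products
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# (i) Test H0: rho = 0 against H1: rho > 0 at the 1% level
r = s_xy / math.sqrt(s_xx * s_yy)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
T_UPPER_1PC_7DF = 2.998  # from statistical tables
reject_h0 = t_stat > T_UPPER_1PC_7DF

# (iv) 99% confidence interval for the mean response at x0 = 3,
# assuming the model y = intercept + slope * x + error
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
sigma2_hat = (s_yy - s_xy ** 2 / s_xx) / (n - 2)
x0 = 3
y_hat = intercept + slope * x0
se_mean = math.sqrt(sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / s_xx))
T_UPPER_HALF_PC_7DF = 3.499  # from statistical tables
ci = (y_hat - T_UPPER_HALF_PC_7DF * se_mean,
      y_hat + T_UPPER_HALF_PC_7DF * se_mean)

print(r, t_stat, reject_h0)
print(y_hat, ci)
```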
10 For a socio-economic analysis, a random sample of 20 regions is considered. An
analyst collects data on average household income (in units of $1,000, denoted by 𝑋)
and crime rate (in percent, denoted by 𝑌) for each region in the sample. The data are
displayed in the following plot, and summary statistics are given below.
(i) Calculate a 95% confidence interval for the mean household income. [5]
For the three regions with the highest household income the following data have been
observed:
(v) Comment on the assumption of equal variances made in part (iv). [2]
[Total 21]
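The interval in part (i) has the usual form for a mean with unknown variance: sample mean plus or minus a t quantile times the standard error. The sketch below uses n = 20 from the question and the upper 2.5% point of the t distribution with 19 degrees of freedom from standard tables. The sample mean and standard deviation are hypothetical stand-ins, not the paper's summary statistics.

```python
import math

def t_interval(mean, sd, n, t_crit):
    """Two-sided confidence interval for a mean when the variance is unknown."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

n = 20
T_UPPER_2_5PC_19DF = 2.093  # from statistical tables
sample_mean, sample_sd = 50.0, 10.0  # hypothetical summary statistics

lower, upper = t_interval(sample_mean, sample_sd, n, T_UPPER_2_5PC_19DF)
print(lower, upper)
```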
END OF PAPER