Edexcel S3 Notes PDF
Revision Notes
June 2016
2 S3 JUNE 2016 SDB
Statistics 3
1 Combinations of random variables
  Expected mean and variance for X ± Y
    Reminder
  Combining independent normal random variables
2 Sampling
  Methods of collecting data
    Taking a census
    Sampling
  Simple random sampling
    Using random number tables
  Systematic sampling
  Stratified sampling
  Sampling with and without replacement
  Quota sampling
  Primary data
  Secondary data
3 Biased & unbiased estimators
  Unbiased estimators of µ and σ²
    Estimating µ and σ² from a sample
7 Appendix
  Combining random variables
    E[X + Y] = E[X] + E[Y]
    Var[X + Y] = Var[X] + Var[Y]
  Unbiased & biased estimators
    Unbiased estimators
    Biased estimators
  Unbiased estimates of population mean and variance
    Unbiased estimate of the mean
    Unbiased estimate of the variance of the population
    Bias
  Probability generating functions
    Expected mean and variance for a p.g.f.
    Mean and variance of a Binomial distribution
    Mean and variance of a Poisson distribution
Index
2 Sampling
Methods of collecting data
Taking a census
A census involves observing every member of a population
and is used if
the size of the population is small
or if extreme accuracy is required.
Advantages
it should give a completely accurate result, a full picture.
Disadvantages
very time consuming and expensive
it cannot be used when the testing process destroys the article being tested
information is difficult to process because there is so much of it.
Sampling
Sampling involves observing or testing a part of the population.
It is cheaper but does not give such a full picture.
The size of the sample depends on the accuracy desired (for a varied population a large sample will be
required to give a reasonable accuracy).
Simple random sampling
Every member of the population must have an equal chance of being selected.
Systematic sampling
First make an ordered list, and divide it into equal groups, each of size k (for example, k = 50).
Second, select every kth member from the list.
To make sure that the first member of the list is not automatically selected, random number tables must be
used to select one member from the first group; then select every kth member after that.
Used when the population is too large for simple random number sampling.
Advantages
simple to use
suitable for large samples
Disadvantages
only random if the ordered list is truly random.
it can introduce bias
Example: How would you take a stratified sample of 50 children from a school of 500 pupils divided
as follows:
Boys Girls
Upper sixth 30 40
Lower sixth 30 30
Fifth form 70 60
Fourth form 60 70
Third form 50 60
Solution: As 50 is 1/10 of the total population, 1/10 of each stratum should be selected in the sample.
Thus the sample would comprise
Boys Girls
Upper sixth 3 4
Lower sixth 3 3
Fifth form 7 6
Fourth form 6 7
Third form 5 6
and simple random number sampling would be used within each stratum.
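The proportional allocation in the example above can be sketched in a few lines of Python. This is only an illustration: the function name stratified_allocation and the dictionary layout are assumptions made here, not part of the course.

```python
from fractions import Fraction

def stratified_allocation(strata, sample_size):
    """Share a sample between strata in proportion to their sizes."""
    total = sum(strata.values())
    sampling_fraction = Fraction(sample_size, total)   # 50/500 = 1/10 here
    return {name: size * sampling_fraction for name, size in strata.items()}

# Stratum sizes taken from the school example above
school = {
    ("Upper sixth", "Boys"): 30, ("Upper sixth", "Girls"): 40,
    ("Lower sixth", "Boys"): 30, ("Lower sixth", "Girls"): 30,
    ("Fifth form", "Boys"): 70, ("Fifth form", "Girls"): 60,
    ("Fourth form", "Boys"): 60, ("Fourth form", "Girls"): 70,
    ("Third form", "Boys"): 50, ("Third form", "Girls"): 60,
}
allocation = stratified_allocation(school, 50)
for stratum, count in allocation.items():
    print(stratum, count)
```

When the stratum sizes do not divide exactly, the allocations come out as fractions and would need rounding so that they still add up to the required sample size.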
Used when
the sample is large
the population divides naturally into mutually exclusive groups.
Advantages
it can give more accurate estimates (or a more representative picture) than simple random number
sampling when there are clear strata present.
It reflects the population structure.
Disadvantages
within the strata the problems are the same as for any simple random sample
if the strata are not clearly defined they may overlap.
Primary data
Primary data is data collected by or on behalf of the person who is going to use the data.
Advantages
collection method is known
accuracy is known
exact data needed are collected
Disadvantages
costly in time and effort
Secondary data
Secondary data is data not collected by or on behalf of the person who is going to use it. The data are
second-hand – e.g. government census statistics.
Advantages
cheap to obtain
large quantity available (e.g. internet)
much has been collected year on year and can be used to plot trends
Disadvantages
collection method may not be known
accuracy may not be known
it can be in a form which is difficult to handle
bias is not always recognised.
Example: A bag contains a large number of coins, of which 2/5 are 2p coins and 3/5 are 5p coins.
(a) X is the value of a single coin drawn from the bag. Find the expected mean of all coins in
the bag, µ = E[X].
Samples of size 3 are now drawn from the bag.
(b) Find the sampling distribution of and the expected value of (i) the median, and (ii) the
mean.
(c) (i) The median, Q2, is used as an estimator of the mean of all the coins, µ. Show that
Q2 is a biased estimator of µ, and find the bias.
(ii) The mean, X̄, is used as an estimator of the mean of all the coins, µ. Show that X̄ is
an unbiased estimator of µ.
(d) kQ2 is now used as an unbiased estimator of the mean of all the coins. Find the value of k.
Solution:
(a) µ = E[X] = ∑ xi pi = 2/5 × 2 + 3/5 × 5 = 3⋅8
Sampling distribution
(d) We now use kQ2 as an unbiased estimator of the mean value of all the coins.
E[kQ2] = k E[Q2] = (493/125) k
But the true mean µ = 475/125
For kQ2 to be unbiased we need E[kQ2] = µ ⇒ (493/125) k = 475/125 ⇒ k = 475/493.
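As a check on this example, the sampling distributions can be enumerated exactly. The sketch below assumes the proportions 2/5 and 3/5 for the 2p and 5p coins and treats the three draws as independent (the bag holds a large number of coins); the helper name expected is made up for this illustration.

```python
from fractions import Fraction
from itertools import product
from statistics import median

# Value of a coin -> probability of drawing it
coins = {2: Fraction(2, 5), 5: Fraction(3, 5)}
mu = sum(value * p for value, p in coins.items())

def expected(statistic, n=3):
    """Exact expected value of a statistic over all ordered samples of size n."""
    total = Fraction(0)
    for sample in product(coins, repeat=n):
        prob = Fraction(1)
        for value in sample:
            prob *= coins[value]
        total += prob * statistic(sample)
    return total

e_median = expected(median)
e_mean = expected(lambda s: Fraction(sum(s), len(s)))
bias = e_median - mu      # non-zero, so the median is biased
k = mu / e_median         # scale factor that makes k * Q2 unbiased

print(mu, e_median, e_mean, bias, k)
```

Using Fraction keeps every probability exact, so the comparison of E[X̄] with µ is not affected by rounding.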
Example: A sample of size 3 is drawn from a binomial distribution B(10, 0⋅25) and the mean, X̄, is
calculated.
The probability of success, p, is estimated by p̂ = (1/10) X̄. Show that p̂ is an unbiased estimator of p.
Solution: E[X] = ½(3 + β)
For a sample {X1, X2, X3, X4}, X̄ = ¼(X1 + X2 + X3 + X4)
⇒ E[X̄] = E[¼(X1 + X2 + X3 + X4)] = ¼(E[X1] + E[X2] + E[X3] + E[X4])
⇒ E[X̄] = ¼ × 4 × E[X] = E[X] since E[Xi] = E[X], for i = 1, 2, 3, 4
⇒ E[X̄] = ½(3 + β)
⇒ E[β̂] = E[2X̄ − 3] = 2E[X̄] − 3 = 2 × ½(3 + β) − 3 = β,
Note: the Edexcel course uses both the letters S2 and sx2 to mean the unbiased estimate of σ 2.
Also, the term Sample Variance is used to denote the unbiased estimate of σ 2, the variance of the
population.
In these notes I shall always think of the variance, (sd)x², as (1/n) ∑ Xi² − X̄² = (1/n) ∑(X − X̄)²
To find S² or sx², the unbiased estimator for σ²:
Calculate (sd)x², and then multiply by n/(n − 1)
Solution:
X 𝑋 − 𝑋� (𝑋 − 𝑋�)2
56 1⋅8 3⋅24
53 -1⋅2 1⋅44
57 2⋅8 7⋅84
51 -3⋅2 10⋅24
54 -0⋅2 0⋅04
271 22⋅8
⇒ X̄ = (1/n) ∑X = 271/5 = 54⋅2
⇒ (sd)x² = (1/n) ∑(X − X̄)² = 22⋅8/5 = 4⋅56
⇒ σ̂² = n/(n − 1) × (sd)x² = 5/4 × 4⋅56 = 5⋅7
Answer Unbiased estimators for the mean and variance of all chocolate bars are 54⋅2 grams
and 5⋅7 grams2.
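The same working can be reproduced with Python's standard statistics module, as a rough check. Here pvariance divides by n, which corresponds to (sd)x² in these notes, and variance divides by n − 1, which gives the unbiased estimate; the variable names are chosen for this sketch only.

```python
from statistics import mean, pvariance, variance

weights = [56, 53, 57, 51, 54]        # the five values in the table above

x_bar = mean(weights)                 # sample mean
sd_squared = pvariance(weights)       # divides by n
s_squared = variance(weights)         # divides by n - 1

n = len(weights)
print(x_bar, sd_squared, s_squared)
print(n / (n - 1) * sd_squared)       # same value as s_squared, up to rounding
```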
Example: The volume of water in each of a sample of 14 litre bottles of water from a day’s
production is taken. The results are shown below, in ml.
1023, 1019, 1004, 1011, 1023, 1014, 1017, 1020, 1020, 1010, 1025, 1007, 1016, 1019
Find unbiased estimates for the mean and variance of all bottles produced on that day.
Solution: First find the sample mean, X̄ = 14228/14 = 1016⋅286…
Finding X − X̄ each time would give unpleasant arithmetic,
so use (sd)x² = (1/n) ∑ X² − X̄²
∑ X² = 14460232
⇒ (sd)x² = 14460232/14 − (14228/14)² = 37⋅06122…
⇒ S² = sx² = n/(n − 1) × (sd)x² = 14/(14 − 1) × 37⋅06122… = 39⋅91209…
Answer Unbiased estimators for the mean and variance of the whole day’s production are
1016⋅3 ml and 39⋅91 ml2.
Example: The lengths of 10 rods are measured, and the sample has mean, 𝑋� = 26⋅7 cm and variance
s2 = 76⋅9 cm2. An eleventh rod has length 30 cm.
Find (a) the mean and (b) the variance of the sample of 11 rods.
Solution: (a) With the sample mean there are no complications.
For n = 10, X̄10 = (1/10) ∑(i=1 to 10) Xi = 26⋅7 ⇒ ∑(i=1 to 10) Xi = 267
For n = 11, ∑(i=1 to 11) Xi = 267 + 30 = 297 ⇒ X̄11 = (1/11) ∑(i=1 to 11) Xi = 297/11 = 27 cm
(b) WARNING: The question refers to the variance of the sample, which means the unbiased
estimate of the variance of the population.
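s10² = 76⋅9 = 10/(10 − 1) × (sd)10² ⇒ (sd)10² = 9/10 × 76⋅9 = 69⋅21
⇒ (sd)10² = (1/10) ∑(i=1 to 10) Xi² − X̄10² = 69⋅21
⇒ ∑(i=1 to 10) Xi² = 692⋅1 + 10 × 26⋅7² = 7821
For n = 11, ∑(i=1 to 11) Xi² = 7821 + 30² = 8721
⇒ (sd)11² = 8721/11 − 27² = 63⋅818…
⇒ s11² = 11/10 × 63⋅818… = 70⋅2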
For 11 rods, sample mean is 27 cm, and sample variance is 70⋅2 cm2.
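The method of this example (recover ∑X and ∑X² from the summary statistics, add the new value, then recompute) can be sketched as a small function. The name add_observation is hypothetical, and s2 is taken to be the unbiased estimate, as in the WARNING above.

```python
def add_observation(n, mean, s2, new_value):
    """Update the mean and the unbiased variance estimate when one value is added."""
    sum_x = n * mean
    # (n - 1) * s2 = sum of X^2 - n * mean^2, rearranged for the sum of squares
    sum_x2 = (n - 1) * s2 + n * mean ** 2
    n_new = n + 1
    sum_x += new_value
    sum_x2 += new_value ** 2
    mean_new = sum_x / n_new
    s2_new = (sum_x2 - n_new * mean_new ** 2) / (n_new - 1)
    return mean_new, s2_new

mean_11, s2_11 = add_observation(10, 26.7, 76.9, 30)
print(round(mean_11, 2), round(s2_11, 2))
```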
X is a random variable drawn from a population with mean µ and standard deviation σ.
If {X1, X2, ... , Xn} is a random sample of size n with mean X̄ = (X1 + X2 + ... + Xn)/n
then E[Xi] = µ, and Var[Xi] = σ², for i = 1, 2, 3, …, n
E[X̄] = E[(X1 + X2 + ... + Xn)/n]
= (1/n)(E[X1] + E[X2] + ... + E[Xn]) = (1/n)(µ + µ + ... + µ)
= µ.
Var[X̄] = Var[(X1 + X2 + ... + Xn)/n], assuming that all the Xi are independent,
= (1/n²)(Var[X1] + Var[X2] + ... + Var[Xn]) = (1/n²)(σ² + σ² + ... + σ²) = nσ²/n² = σ²/n
This means that if very many samples were taken and the mean of each sample calculated then the
mean of these means would be µ and the variance of these means would be σ²/n.
It can also be shown that the sample means form a Normal distribution (provided that n is ‘large
enough’).
We can then say that for samples drawn from a population with mean µ and variance σ²,
the sampling distribution of the mean is N(µ, σ²/n).
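The results E[X̄] = µ and Var[X̄] = σ²/n can be checked exactly for a small case. The sketch below uses a fair six-sided die as the population and samples of size 3; both choices are assumptions made purely for illustration.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)                   # a fair die: each face has probability 1/6
n = 3                                 # sample size

mu = Fraction(sum(faces), 6)
sigma2 = Fraction(sum(x * x for x in faces), 6) - mu ** 2

# Every ordered sample of size n is equally likely
samples = list(product(faces, repeat=n))
means = [Fraction(sum(s), n) for s in samples]
mean_of_means = sum(means) / len(means)
var_of_means = sum(m * m for m in means) / len(means) - mean_of_means ** 2

print(mu, sigma2)                     # 7/2 and 35/12
print(mean_of_means, var_of_means)    # 7/2 and 35/36
```

This only confirms the mean and variance; with n = 3 the distribution of the sample mean is not yet close to Normal.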
Example: A sample of size 50 is taken from a population of eggs with mean 23⋅4 grams and variance
36 grams2.
(i) Find the probability that a single egg weighs more than 25 grams.
(ii) Find the probability that the sample mean is larger than 25.
(iii) What assumptions did you make?
Solution:
(i) The weight of a single egg, X ∼ N(23⋅4, 6²)
⇒ P(X > 25) = 1 – Φ((25 − 23⋅4)/6) = 1 – Φ(0⋅27) = 1 – 0⋅6064 = 0⋅3936
(ii) µ = 23⋅4, σ² = 36
The sample mean X̄ ∼ N(23⋅4, (6/√50)²)
⇒ standard error is σ/√n = 6/√50 = 0⋅848528137
X̄ ∼ N(23⋅4, 0⋅8485…²)
⇒ P(X̄ > 25) = 1 – Φ((25 − 23⋅4)/0⋅8485…) = 1 – Φ(1⋅89) = 0⋅0294
(iii) We have assumed the Central Limit Theorem: in particular that the sample means form a
normal distribution.
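These two probabilities can be checked with NormalDist from Python's standard library, under the same assumptions as the solution (weights Normally distributed, sample means Normal by the Central Limit Theorem). The answers differ slightly from table values because z is not rounded to 2 decimal places.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 23.4, 6.0, 50

single_egg = NormalDist(mu, sigma)               # X ~ N(23.4, 6^2)
sample_mean = NormalDist(mu, sigma / sqrt(n))    # standard error = sigma / sqrt(n)

p_single = 1 - single_egg.cdf(25)                # P(X > 25)
p_mean = 1 - sample_mean.cdf(25)                 # P(sample mean > 25)

print(round(p_single, 4), round(p_mean, 4))
```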
Solution:
First assume that the machine is still producing packets with the same variance, 25.
Suppose that the mean weight of all packets of biscuits is µ grams then the population of all
packets has mean µ and standard deviation 5.
From the central limit theorem we can assume that the sample means form an approximately
normal population with mean µ and standard error (standard deviation) σ/√n = 5/√10 = 1⋅5811
⇒ –1⋅9600 < (253⋅4 − µ)/1⋅5811 < 1⋅9600
⇒ –1⋅9600 < (253⋅4 − µ)/1⋅5811 and (253⋅4 − µ)/1⋅5811 < 1⋅9600
This means that 95% of the samples will give an interval which contains the mean
and we say that [250⋅3 g, 256⋅5 g] is a 95% confidence interval for µ.
This means that there is a 0⋅95 probability that this interval contains the true mean.
It does not mean that there is a probability of 0⋅95 that the true mean lies in this interval - the true
mean is a fixed number, and either does or does not lie in the interval so the probability that the
true mean lies in the interval is either 1 or 0.
In practice we go straight to the last line of the example:
95% confidence limits are X̄ ± 1⋅9600 × σ/√n since P(–1⋅9600 < Z < 1⋅9600) = 0⋅95
(tables give P(Z > 1⋅9600) = 0⋅025)
90% confidence limits are X̄ ± 1⋅6449 × σ/√n since P(–1⋅6449 < Z < 1⋅6449) = 0⋅90
(tables give P(Z > 1⋅6449) = 0⋅05)
Other confidence limits can be found using the Normal Distribution tables.
Example: A sample of 64 packets of cornflakes has a mean weight X̄ = 510 grams and a variance
S² = 36 grams². Find 90% confidence limits for the mean weight of all packets.
(Note that the ‘sample variance’ is taken as the unbiased estimate of σ 2.)
Solution: We assume that the sample variance = the variance of the population of all packets
⇒ S 2 = 36 = σ 2.
Now find the standard deviation (standard error) of the sampling distribution of the mean (population
of sample means): standard error = σ/√n = 6/√64 = 0⋅75
For 90% confidence limits z = ± 1⋅6449 (remember to use the 4 D.P. tables after the Normal Dist. tables),
using the sample mean X̄ = 510 grams
⇒ 90% confidence limits are 510 ± 1⋅6449 × 0⋅75 = 510 ± 1⋅234
⇒ a 90% confidence interval is [508⋅8, 511⋅2] to 4 S.F.
Note that we have assumed that the unbiased estimate, S 2 (=36), is the actual variance, σ 2, of the
population.
This is a reasonable assumption as the number in the sample, 64, is large and the error introduced is
therefore small.
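A short sketch of the confidence-limit calculation, assuming as above that σ is known (or well estimated by S) and that the sample means are Normally distributed. The function name confidence_limits is invented for this illustration; inv_cdf plays the role of the percentage-points table.

```python
from math import sqrt
from statistics import NormalDist

def confidence_limits(sample_mean, sigma, n, level):
    """Limits sample_mean ± z × sigma/√n for a given confidence level (e.g. 0.90)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)    # 1.6449 for 90%, 1.9600 for 95%
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

cornflakes = confidence_limits(510, 6, 64, 0.90)
print([round(limit, 1) for limit in cornflakes])
```

The same function with level 0⋅95, mean 253⋅4, σ = 5 and n = 10 gives limits close to 250⋅3 and 256⋅5.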
5) Conclusion
Do not reject H0 at the 5% level and advise the production manager that there is evidence that
he should not change his setting, or that there is evidence that the machine is working correctly,
etc.
Example: The weights of chocolate bars produced by two machines, A and B, are known to be
normally distributed with variances σA2 = 4 and σB2 = 3 grams2. Samples are taken from each
machine of sizes nA = 25 and nB = 16 which have means X̄A = 123⋅1 and X̄B = 124⋅4
grams. Is there any evidence at the 5% significance level that the bars produced by machine B are
heavier than the bars produced by machine A?
Solution:
Suppose that the mean weights for all bars from the two machines are µA and µB
H0: µA = µB
H1: µB > µA one-tail test at 5% level
Fortunately (!) the formula for testing the difference between sample means,
Z = (X̄ − Ȳ − (µx − µy)) / √(σx²/nx + σy²/ny), is in your formula booklet.
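To illustrate the formula, the sketch below applies it to the chocolate bar data, taking X as machine B and Y as machine A so that a positive z supports H1. The function name two_sample_z is an assumption of this sketch, and the figures are computed here rather than read from tables.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean_x, mean_y, var_x, var_y, n_x, n_y, diff=0.0):
    """z statistic for a difference of two means with known population variances."""
    return (mean_x - mean_y - diff) / sqrt(var_x / n_x + var_y / n_y)

# X = machine B (mean 124.4, variance 3, n = 16); Y = machine A (mean 123.1, variance 4, n = 25)
z = two_sample_z(124.4, 123.1, 3, 4, 16, 25)
p_value = 1 - NormalDist().cdf(z)        # one-tail probability under H0

print(round(z, 3), round(p_value, 4))
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```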
When the variance of the population, σ 2, is not known and when the sample is large, we assume that the
variance of the sample (meaning the unbiased estimate of σ 2), S 2, is the variance of the population, σ2.
As the sample is large, the error introduced is small.
Example: A machine usually produces steel rods with a mean length of 25⋅4 cm. The production
manager wants to test 80 rods to see whether the machine is working correctly. The sample has
mean 25⋅31 cm and variance 0⋅33² cm². Advise the production manager, using a 5% level of
significance.
Important assumption
Solution:
H0: µ = 25⋅4.
H1: µ ≠ 25⋅4 two-tail test, 2⋅5% in each tail
and standard error = σ/√n = 0⋅33/√80 = 0⋅036895121.
The observed sample mean is 25⋅31 and for a two-tail test at 5% we consider
Φ((25⋅31 − 25⋅4)/0⋅036895121) = Φ(−2⋅4393) = 1 − Φ(2⋅44) = 0⋅0073 < 2⋅5%
⇒ reject H0 and conclude that there is evidence that the machine is not producing rods of
mean length 25⋅4 cm.
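The test above can be reproduced numerically as follows. This is a sketch under the same large-sample assumption (the sample standard deviation 0⋅33 is used in place of σ); the variable names are chosen here.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 25.4                     # mean length under H0
x_bar, s, n = 25.31, 0.33, 80   # sample mean, sample s.d., sample size

standard_error = s / sqrt(n)
z = (x_bar - mu_0) / standard_error
lower_tail = NormalDist().cdf(z)          # P(Z < z)

print(round(standard_error, 6), round(z, 4), round(lower_tail, 4))
# Two-tail test at 5%: compare the tail probability with 2.5%
print("reject H0" if lower_tail < 0.025 else "do not reject H0")
```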
= 1 – Φ((29 − 0)/14⋅2016…) = 1 – Φ(2⋅04) = 0⋅0207 < 10%
which is significant at 10%.
5) Conclusion
Reject H0 at the 10% level and conclude that there is evidence that machine B produces
cables with a greater mean strength than machine A.
If the expected frequency for a class is less than 5, then you must group this class with the next
class (or two …).
The number of degrees of freedom, ν, is
the number of cells (after grouping if necessary) minus the number of linear equations connecting
the frequencies.
Class    O     E     (O − E)²/E
1        43    50    0⋅98
2        49    50    0⋅02
3        54    50    0⋅32
4        57    50    0⋅98
5        46    50    0⋅32
6        51    50    0⋅02
Totals   300   300   2⋅64
⇒ χ² = 2⋅64
and ν = number of degrees of freedom = n – 1 = 6 – 1 = 5
since the total is a linear equation connecting the frequencies and is fixed.
From tables we see that χ²₅(2⋅5%) = 12⋅832 > 2⋅64, so our observed result is not significant.
Binomial distribution
For H0: The Binomial distribution is a good fit
we use the mean of the Observed frequencies to calculate the Expected frequencies, and so both Oi and Ei
give the same mean and total: thus there are 2 linear equations connecting the frequencies and ν = n – 2
but for H0: The Binomial distribution, B(30, 0⋅3), is a good fit
the means using Oi and Ei will be different: thus there is only 1 linear equation, the total, connecting the
frequencies and so ν = n – 1.
Poisson distribution
For H0: The Poisson distribution is a good fit
we use the mean of the Observed frequencies to calculate the Expected frequencies, and so both Oi and Ei
give the same mean and total: thus there are 2 linear equations connecting the frequencies and ν = n – 2
but for H0: The Poisson distribution, Po(3), is a good fit
the means using Oi and Ei will be different: thus there is only 1 linear equation, the total, connecting the
frequencies and so ν = n – 1.
Example: A switchboard operator records the number of new calls in 69 consecutive one-minute
periods in the table below.
number of calls 0 1 2 3 4 5 ≥6
frequency 6 9 11 15 13 9 6
a) Say why you think that a Poisson distribution might be suitable.
b) Find the mean and variance of this distribution. Do these figures support the view that they
might form a Poisson distribution?
c) Test the goodness of fit of a Poisson distribution at the 5% level.
Solution:
a) Telephone calls are likely to occur singly, randomly, independently and uniformly which
are the conditions for a Poisson distribution.
b) Treating ≥ 6 as 7 we calculate the mean and variance
x        f     xf     x²f
0        6     0      0
1        9     9      9
2        11    22     44
3        15    45     135
4        13    52     208
5        9     45     225
7        6     42     294
Totals   69    215    915
x        O     p           E           O (grouped)   E (grouped)   (O − E)²/E
0        6     0⋅044337    3⋅059234
1        9     0⋅138151    9⋅532395    15            12⋅59         0⋅461326
2        11    0⋅215235    14⋅8512     11            14⋅85         0⋅998148
3        15    0⋅223553    15⋅42515    15            15⋅43         0⋅011983
4        13    0⋅174145    12⋅01597    13            12⋅02         0⋅079900
5        9     0⋅108525    7⋅488214    9             7⋅49          0⋅304419
≥6       6     0⋅096056    6⋅627836    6             6⋅63          0⋅059864
Totals   69                            69            69.01         1.915641
The expected frequency for x = 0 is 3.06 < 5 so it has been grouped with x = 1.
Thus we have n = 6 classes (after grouping) and ν = n – 2 = 4
and χ²₄(5%) = 9.488.
We have calculated χ2 = 1.92 < 9.488 which is not significant so we do not reject H0
and conclude that the Poisson distribution is a suitable model.
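The calculation in this example can be sketched in Python. The code follows the working above: ≥ 6 is treated as 7 when estimating the mean, the last class takes the remaining probability, and x = 0 is grouped with x = 1. Expected frequencies are not rounded here, so χ² differs slightly from the table in the third decimal place.

```python
from math import exp, factorial

observed = [6, 9, 11, 15, 13, 9, 6]    # calls 0, 1, 2, 3, 4, 5, >=6
values = [0, 1, 2, 3, 4, 5, 7]         # >=6 treated as 7 for the mean

n = sum(observed)
lam = sum(x * f for x, f in zip(values, observed)) / n

probs = [exp(-lam) * lam ** x / factorial(x) for x in range(6)]
probs.append(1 - sum(probs))           # P(X >= 6)
expected = [n * p for p in probs]

# Group x = 0 with x = 1 because the expected frequency for x = 0 is below 5
obs_grouped = [observed[0] + observed[1]] + observed[2:]
exp_grouped = [expected[0] + expected[1]] + expected[2:]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_grouped, exp_grouped))
dof = len(obs_grouped) - 2             # total fixed and mean estimated from the data

print(round(lam, 4), dof, round(chi2, 3))
```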
Example: The sizes of men’s shoes purchased from a shoe shop in one week are recorded below.
size of shoe ≤6 7 8 9 10 11 ≥ 12
number of pairs 14 19 29 45 40 21 7
Is the manager’s assumption that the normal distribution is a suitable model justified at the 5%
level?
The total number of pairs, mean and standard deviation are calculated to be 175, 8.886 and
1.713 (taking ≤ 6 as 5 and ≥ 12 as 12)
Remembering that size 8 means from 7.5 to 8.5 we need to find the area between 7.5 and 8.5 and
multiply by 175 to find the expected frequency for size 8, and similarly for other sizes.
For a 5 × 4 table in which the totals of each row and column are fixed the ‘?’ cells represent the degrees
of freedom since if we know the values of the ?s the frequencies in the other cells can now be calculated
A B C D E totals
W ? ? ? ?
X ? ? ? ?
Y ? ? ? ?
Z
totals
Example: Natives of England, Africa and China were classified according to blood group giving the
following table.
O A B AB
English 235 212 79 83
African 147 106 30 51
Chinese 162 135 52 43
Is there any evidence at the 5% level that there is a connection between blood group and
nationality?
First redraw the table showing totals of each row and column
O A B AB totals
English 235 212 79 83 609
African 147 106 30 51 334
Chinese 162 135 52 43 392
totals 544 453 161 177 1335
          O                          A                          B                         AB                        totals
English   609×544/1335 = 248.2       609×453/1335 = 206.6       609×161/1335 = 73.4       609×177/1335 = 80.7       608.9
Observed frequency   Expected frequency   (O − E)²/E
235 248.2 0.70
212 206.6 0.14
79 73.4 0.43
83 80.7 0.07
147 136.1 0.87
106 113.3 0.47
30 40.3 2.63
51 44.3 1.01
162 159.7 0.03
135 133.0 0.03
52 47.3 0.47
43 52.0 1.56
8.41
We have ν = (4 – 1)(3 – 1) = 6 degrees of freedom and χ²₆(5%) = 12.592.
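The expected frequencies and χ² for this contingency table can be computed directly, as sketched below. Each expected frequency is (row total × column total)/grand total; they are not rounded to 1 decimal place here, so χ² comes out at about 8⋅39 rather than the 8⋅41 obtained from the rounded table.

```python
observed = [
    [235, 212, 79, 83],    # English: O, A, B, AB
    [147, 106, 30, 51],    # African
    [162, 135, 52, 43],    # Chinese
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print(row_totals, col_totals, grand_total)
print(dof, round(chi2, 2))
```

Since χ² is below the critical value 12⋅592, the result is not significant at the 5% level.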
Example: Rank the following numbers: 45, 65, 76, 56, 34, 45, 23, 67, 65, 45, 81, 32.
Solution: First put in order and give ranks as if all were different: then give the average rank for
those which are equal.
Numbers: 81 76 67 65 65 56 45 45 45 34 32 23
Actual rank 1 2 3 4= 4= 6 7= 7= 7= 10 11 12
Rank (if all different) 1 2 3 4 5 6 7 8 9 10 11 12
average for equal ranks: (4 + 5)/2 = 4½ and (7 + 8 + 9)/3 = 8
Modified rank 1 2 3 4½ 4½ 6 8 8 8 10 11 12
You must now calculate the PMCC, not Spearman, using the modified ranks.
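The ranking rule with tied values can be sketched as a small function. The name average_ranks is hypothetical; rank 1 goes to the largest value, as in the example, and the ranks are returned in the original order of the data.

```python
def average_ranks(values):
    """Rank from largest (1) to smallest, giving tied values their average rank."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1              # rank the first copy would get
        count = ordered.count(v)                  # number of tied copies
        rank_of[v] = first + (count - 1) / 2      # average of first .. first + count - 1
    return [rank_of[v] for v in values]

numbers = [45, 65, 76, 56, 34, 45, 23, 67, 65, 45, 81, 32]
ranks = average_ranks(numbers)
print(ranks)
```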
rs = 1 − 6∑d²/(n(n² − 1)) = 1 − (6 × 86)/(10 × 99) = 0.521212 = 0.521 to 3 S.F.
Spearman or PMCC
where Sxx = ∑xi² − (∑xi)²/n, Syy = ∑yi² − (∑yi)²/n, Sxy = ∑xiyi − (∑xi)(∑yi)/n.
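A minimal sketch of how Sxx, Syy and Sxy are used: the product moment correlation coefficient is r = Sxy/√(Sxx Syy). The function name pmcc and the five data pairs below are made up for illustration and are not taken from the notes.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient from the summary sums."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: Sxx = 10, Syy = 6, Sxy = 6
r = pmcc([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))
```

Applying the same function to modified ranks, rather than to the raw data, is the calculation required when ranks are tied.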
Example: The product moment correlation coefficient between 40 pairs of values is +0.52. Is there
any evidence of correlation between the pairs at the 5% level?
From tables for n = 40 which give one-tail figures, we must look at the 2.5% column and the critical
values are ±0.3120
The calculated figure is 0.52 > 0.3120 and so is significant
⇒ we reject H0 and conclude that there is some correlation (positive or negative) between
the pairs.
Solution:
(a) H0 : ρ = 0 ; H1 : ρ > 0
For the PMCC
the 5% Critical Value is 0⋅6215
0⋅572 < 0⋅6215 ⇒ not significant at 5%
⇒ there is not enough evidence of positive correlation.
For Spearman’s rank correlation coefficient
the 5% Critical Value is 0⋅6429
0⋅655 > 0⋅6429 ⇒ significant at 5%
⇒ there is evidence of positive correlation.
(b) From the PMCC there is not enough evidence to conclude that as Statistics marks
increased Geography marks also increased
− i.e. conclude that the points on a scatter diagram do not lie close to a straight line.
From Spearman’s rank correlation coefficient there is evidence that students ranked
highly in Statistics were also ranked highly in Geography, or people with high scores in
Statistics also had high scores in Geography
= ∑i xi ∑j rij + ∑j yj ∑i rij
= ∑(i=1 to n) xi pi + ∑(j=1 to m) yj qj
Biased Estimators
An estimator 𝜆̂ for a parameter λ is said to be biased if E[𝜆̂] ≠ λ.
Example
A naturalist wishes to estimate the number of squirrels in a wood. He first catches 50 squirrels, marks
them and then releases them. Later he catches 30 squirrels and counts the number, i, which have been
marked.
The true number in the population, n, is then estimated as n̂ from the equation
50/n̂ = i/30 ⇒ n̂ = 1500/i.
Now E[n̂] = ∑(i=0 to 30) (1500/i) × pi
Let X be a random variable drawn from a population with mean µ and variance σ 2, then
E[X] = µ , and Var[X] = σ 2.
A random sample, X1, X2, X3, …, Xn, of size n is taken from the population.
The sample mean is X̄ = (1/n)(X1 + X2 + X3 + … + Xn).
Preliminary results
(i) Var[X] = E[X 2] − (E[𝑋])2 = E[X 2] − µ 2
⇒ E[X 2] = Var[X] + µ 2 = σ 2 + µ 2 I
Proof
The variance of X1, X2, X3, … , Xn is defined to be
Variance = (s.d.)² = (1/n) ∑ Xi² − X̄²
⇒ E[(s.d.)²] = E[(1/n) ∑ Xi² − X̄²]
= E[(1/n) ∑ Xi²] − E[X̄²]
= (1/n) E[∑ Xi²] − E[X̄²]
= (1/n) ∑ E[Xi²] − E[X̄²]
= (1/n) ∑(σ² + µ²) − E[X̄²]   since E[Xi²] = σ² + µ² from I
= (1/n) × n(σ² + µ²) − ((1/n)σ² + µ²)   since E[X̄²] = (1/n)σ² + µ² from II
⇒ E[(s.d.)²] = (σ² + µ²) − ((1/n)σ² + µ²) = ((n − 1)/n) σ²
Thus E[(s.d.)²] is not equal to the true value, and so (s.d.)² is a biased estimator of σ²,
but multiplying both sides by n/(n − 1), we can see that
(n/(n − 1)) (s.d.)² is an unbiased estimator of σ².
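The result E[(s.d.)²] = ((n − 1)/n) σ² can be checked exactly for a small case by listing every possible sample. The population used below (values 0 and 1 with probabilities 0⋅6 and 0⋅4) and the sample size 3 are assumptions chosen only to keep the enumeration short.

```python
from fractions import Fraction
from itertools import product

population = {0: Fraction(3, 5), 1: Fraction(2, 5)}   # value -> probability
n = 3

mu = sum(x * p for x, p in population.items())
sigma2 = sum(x * x * p for x, p in population.items()) - mu ** 2

expected_sd2 = Fraction(0)
for sample in product(population, repeat=n):
    prob = Fraction(1)
    for x in sample:
        prob *= population[x]
    mean = Fraction(sum(sample), n)
    sd2 = Fraction(sum(x * x for x in sample), n) - mean ** 2   # divides by n
    expected_sd2 += prob * sd2

print(sigma2, expected_sd2)                       # 6/25 and 4/25
print(Fraction(n, n - 1) * expected_sd2)          # back to 6/25
```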
(d) Use your answers to part (c) to find E[𝑋�], and Var [𝑋�].
(e) Find the sampling distribution for the mode M.
(f ) Use your answers to part (e) to find E[𝑀], and Var [𝑀].
Solution:
(a) µ = ∑ 𝑥𝑖 𝑝𝑖 = 0 × 0⋅6 + 1 × 0⋅4 = 0⋅4
σ 2 = ∑ 𝑥𝑖 2 𝑝𝑖 – µ 2 = (02 × 0⋅6 + 12 × 0⋅4) – 0⋅42 = 0⋅24
(b) Possible samples are
(0, 0, 0) (1, 0, 0) (1, 1, 0) (1, 1, 1)
(0, 1, 0) (1, 0, 1)
(0, 0, 1) (0, 1, 1)
(c) From (b) we can find the sampling distribution of the mean
X̄   0        1/3      2/3      1
p    0⋅216    0⋅432    0⋅288    0⋅064
⇒ Var[X̄] = 0⋅08
(e) From (c) we can find the sampling distribution of the mode
M   0                          1
p   0⋅6³ + 3 × 0⋅6² × 0⋅4      3 × 0⋅6 × 0⋅4² + 0⋅4³
    = 0⋅648                    = 0⋅352
(f ) E[𝑀] = 0 × 0⋅648 + 1 × 0⋅352 = 0⋅352
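The sampling distributions of the mean and the mode can be built by enumerating the eight possible samples, as sketched below. It assumes, as in part (a), P(X = 0) = 0⋅6 and P(X = 1) = 0⋅4, with samples of size 3; the helper name moment is made up here.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from statistics import mode

population = {0: Fraction(3, 5), 1: Fraction(2, 5)}   # value -> probability
n = 3

dist_mean = defaultdict(Fraction)    # sampling distribution of the mean
dist_mode = defaultdict(Fraction)    # sampling distribution of the mode
for sample in product(population, repeat=n):
    prob = Fraction(1)
    for x in sample:
        prob *= population[x]
    dist_mean[Fraction(sum(sample), n)] += prob
    dist_mode[mode(sample)] += prob  # with 3 values of 0 or 1 the mode is unique

def moment(dist, power=1):
    return sum(value ** power * p for value, p in dist.items())

e_mean = moment(dist_mean)
var_mean = moment(dist_mean, 2) - e_mean ** 2
e_mode = moment(dist_mode)
var_mode = moment(dist_mode, 2) - e_mode ** 2

print(dict(dist_mean))
print(dict(dist_mode))
print(e_mean, var_mean, e_mode, var_mode)
```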
⟹ ∑(i=1 to n) i² pi = G″(1) + ∑(i=1 to n) i pi
⟹ Var[X] = ∑(i=1 to n) i² pi − (∑(i=1 to n) i pi)²
⟹ G(t) = ∑(i=0 to ∞) (λ^i e^(−λ)/i!) t^i = e^(−λ) ∑(i=0 to ∞) (λt)^i/i! = e^(−λ) e^(λt)
⇒ σ 2 = Var[X] = λ