Module02A Slides Print
(Module 2A)
Semester 2, 2023
Aims of this module
2 of 66
Outline
Sample Mean
Sample Variance
Order Statistics
Sample Quantiles
3 of 66
Important Point Estimators
We will now look at the sampling distributions for some very
important point estimators:
• Sample mean
• Sample variance and standard deviation
• Sample proportion
• Sample (Empirical) cdf, pdf and pmf
• Order Statistics
• Quantiles
4 of 66
Outline
Sample Mean
Sample Variance
Order Statistics
Sample Quantiles
5 of 66
Sample mean

X̄ = (1/n)(X1 + X2 + · · · + Xn) = (1/n) Σᵢ₌₁ⁿ Xi

We note that:

E(X̄) = (1/n) Σᵢ₌₁ⁿ E(Xi) = nµ/n = µ

var(X̄) = (1/n²) Σᵢ₌₁ⁿ var(Xi) = (1/n²) nσ² = σ²/n
6 of 66
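These two identities are easy to check by simulation. A minimal sketch (illustrative Python, standard library only; the Exp(rate = 1/2) population, seed, and constants are all my own choices, not from the slides):

```python
import random
import statistics

random.seed(1)

n, B = 30, 20000
rate = 0.5   # illustrative population: Exp(rate), so mu = 2 and sigma^2 = 4

# Draw B independent samples of size n and record each sample mean.
xbars = [statistics.fmean(random.expovariate(rate) for _ in range(n))
         for _ in range(B)]

print(statistics.fmean(xbars))     # close to mu = 2
print(statistics.variance(xbars))  # close to sigma^2 / n = 4/30
```

The empirical mean and variance of the simulated x̄ values should sit close to µ and σ²/n respectively.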
Sample mean - unbiased and consistent
So we can say X̄ is unbiased for µ irrespective of the population
being sampled or the sample size.
We note that its variance (and hence standard deviation) both tend
to zero as the sample size n increases. This is sufficient to establish
convergence in probability for X̄ to µ (again left as a tutorial
exercise). So we can write
X̄ →ᵖ µ
and can say that X̄ is consistent for µ.
7 of 66
Sample mean - non-normal populations
When sampling from a non-normal population, the Central Limit
Theorem (CLT) comes into play:
(X̄ − µ)/(σ/√n) →ᵈ N(0, 1)
as n → ∞.
8 of 66
Simulation example
Consider sampling from a population X ∼ Exp(λ = rate = 1/5). We
choose this as the Exponential is quite right skewed (towards positive
values).
Recall that E(X) = 1/λ = 5 and var(X) = 1/λ² = 25, so by the CLT we have

X̄ ≈ N(1/λ, 1/(nλ²)) = N(5, 5²/n)
9 of 66
A simulation exercise
Generate B = 1000 samples of size n. For each sample compute x̄.
The continuous curve is the normal N(5, 5²/n) distribution prescribed by the CLT.

Sample 1: x1^(1), . . . , xn^(1) → x̄^(1)
Sample 2: x1^(2), . . . , xn^(2) → x̄^(2)
    ⋮
Sample B: x1^(B), . . . , xn^(B) → x̄^(B)
10 of 66
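The exercise above can be carried out directly. A minimal sketch (illustrative Python, standard library only; the seed and n = 25 are my own choices, and the histogram/density plotting from the slide is omitted):

```python
import random
import statistics

random.seed(42)
B, n, rate = 1000, 25, 1 / 5   # population Exp(rate = 1/5) as on the slide; mu = 5

# Generate B samples of size n and keep the B sample means.
xbar = [statistics.fmean(random.expovariate(rate) for _ in range(n))
        for _ in range(B)]

# The CLT prescribes X-bar ~ N(5, 5^2/n) approximately.
print(statistics.fmean(xbar))   # near 5
print(statistics.stdev(xbar))   # near 5 / sqrt(n) = 1
```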
A simulation exercise
The distribution of X̄ approaches the theoretical distribution (CLT). As expected from
consistency it becomes more and more concentrated around µ.
[Figure: histograms of the B = 1000 simulated values of x̄ for n = 1, 5, 25, 100, each overlaid with the N(5, 5²/n) density curve prescribed by the CLT.]
11 of 66
Sample mean - simulation
From the simulation we see the central portion of the sampling
distribution gets close to normal even for n around 25, even though
the exponential population is very skewed. Convergence happens even
more quickly for symmetric distributions (for example an early random
number generator for the normal used an appropriately rescaled sum
of 12 Uniforms on (0, 1)).
12 of 66
Sample mean - 'robustly' normal
We say a statistical procedure/result is robust if the probability
calculations required are insensitive to deviations from the
assumptions on which they are based.
The normal approximation can be used even in the presence of strong skewness, provided the sample size is at least about 40. In the majority of real experiments such sample sizes are fairly modest and easy enough to collect.
13 of 66
Gambler's fallacy
If we sample from a Bernoulli distribution (observing successes/failures as 1's or 0's), then X̄ is just another name for P̂, the Sample Proportion of successes.
14 of 66
Gambler’s fallacy
Let S be the Sample Sum, S = Σᵢ₌₁ⁿ Xi. If Xi = 1 for a Head (success), then S is just the number of heads. For a fair coin its mean is nµ = n/2, its variance is nσ² = n/4, and its distribution is approximately normal.
15 of 66
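A quick simulation sketch of this (illustrative Python, not part of the slides; constants are my own choices) checks the mean n/2 and variance n/4 for fair-coin tosses:

```python
import random
import statistics

random.seed(7)
n, B = 100, 10000

# S = number of heads in n fair-coin tosses: E(S) = n/2, var(S) = n/4.
sums = [sum(random.randint(0, 1) for _ in range(n)) for _ in range(B)]

print(statistics.fmean(sums))      # near n/2 = 50
print(statistics.variance(sums))   # near n/4 = 25
```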
Outline
Sample Mean
Sample Variance
Order Statistics
Sample Quantiles
16 of 66
Sample variance
Remember that the population variance σ 2 is:
var(X) = E[(X − E(X))²]

If T ∼ χ²ₖ, a chi-squared random variable with k degrees of freedom, then:

E(T) = k        var(T) = 2k
18 of 66
Chi-squared distribution
• Note that χ²ₖ =ᵈ Γ(k/2, rate = 1/2), so you can check the pdf/mean/variance from your known formulae for the Gamma.
• Also note that you proved in Probability that if Z ∼ N(0, 1) then Z² ∼ Γ(1/2, 1/2), so Z² ∼ χ²₁, a chi-squared with one degree of freedom.
• χ²ₖ arises as the distribution of the sum of the squares of k iid standard normal rvs.
• This can be proven easily using the mgf for the chi-squared distribution.
19 of 66
Sample variance - mean known
20 of 66
Sample variance - mean unknown
For unknown mean, we estimate µ by X̄ and have already defined the
Sample Variance S 2 as:
S² = (1/(n − 1)) Σᵢ₌₁ⁿ (Xi − X̄)²
21 of 66
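Unbiasedness of S² is easy to check numerically (illustrative Python; note that `statistics.variance` uses the n − 1 divisor, so it computes exactly this S²; the N(0, 4) population and constants are my own choices):

```python
import random
import statistics

random.seed(3)
n, B = 10, 20000
sigma = 2.0   # illustrative population: N(0, sigma^2) with sigma^2 = 4

# statistics.variance uses the n - 1 divisor, i.e. it computes exactly this S^2.
s2 = [statistics.variance([random.gauss(0.0, sigma) for _ in range(n)])
      for _ in range(B)]

print(statistics.fmean(s2))   # close to sigma^2 = 4: S^2 is unbiased
```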
Sample variance - mean unknown
22 of 66
Sample variance - mean unknown
23 of 66
Sample variance - sampling from normal
(n − 1)S²/σ² ∼ χ²ₙ₋₁        var(S²) = 2σ⁴/(n − 1)
24 of 66
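The normal-sampling variance result above can be checked by simulation. A sketch (illustrative Python; all constants are my own choices):

```python
import random
import statistics

random.seed(11)
n, B, sigma = 8, 40000, 1.5   # all constants illustrative

# Sample variances from repeated normal samples of size n.
s2 = [statistics.variance([random.gauss(0.0, sigma) for _ in range(n)])
      for _ in range(B)]

theory = 2 * sigma ** 4 / (n - 1)   # var(S^2) = 2 sigma^4 / (n - 1) for normal sampling
print(statistics.variance(s2))      # close to the theoretical value
print(theory)
```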
Sample variance - sampling from non-normal
25 of 66
Outline
Sample Mean
Sample Variance
Order Statistics
Sample Quantiles
26 of 66
Sample proportion
27 of 66
Sample proportion
28 of 66
Sample proportion
For any n:

E(P̂) = p        var(P̂) = p(1 − p)/n
29 of 66
Frequentist view
Analogously, we can think of repeating a random experiment where on each repetition an event A may occur, and does so with probability P(A). Let freq(A) be the number of times A occurs in n repetitions. Then freq(A)/n is the natural estimator of P(A).
30 of 66
Sample or Empirical cdf
F̂(x) = (1/n) freq(X ≤ x)

• Choosing the event A to be {X ≤ x} implies P(A) = F(x), and it follows immediately that F̂(x) is a reasonable estimator of F(x) for all x ∈ R.
• F̂ (x) is unbiased and consistent for F (x), and approximately
normal.
31 of 66
Sample or Empirical pmf
p̂X(x) = (1/n) freq(X = x)

• Choosing the event A to be {X = x} implies P(A) = pX(x), and it follows immediately that p̂X(x) is a reasonable estimator of pX(x) for all possible values of X.
• p̂X (x) is unbiased and consistent for pX (x), and approximately
normal.
• This validates simulations of discrete distributions where we
generate a table of frequencies for each observed value to estimate
pX (x) for each possible x.
32 of 66
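The frequency-table idea is easy to see in code. A sketch (illustrative Python; a fair six-sided die stands in for the discrete population, my own choice):

```python
import random
from collections import Counter

random.seed(5)
n = 100000

# Estimate the pmf of a fair six-sided die from observed frequencies.
draws = [random.randint(1, 6) for _ in range(n)]
phat = {x: c / n for x, c in sorted(Counter(draws).items())}

print(phat)   # each entry near 1/6
```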
Outline
Sample Mean
Sample Variance
Order Statistics
Sample Quantiles
33 of 66
Definition (recap)
• Sample: X1 , . . . , Xn
• Arrange them in increasing order: X(1) ⩽ X(2) ⩽ · · · ⩽ X(n)

[Figure: probability density of the sample maximum, k = 9 (max), for a sample of size n = 9.]
36 of 66
Example (triangular distribution)
37 of 66
• This is a binomial with 5 trials and probability of success given by

Pr(Xi ⩽ 0.5) = ∫₀^0.5 2x dx = [x²]₀^0.5 = 0.5² = 0.25

• So we have,

Pr(X(4) ⩽ 0.5) = C(5,4) (0.25)⁴ (0.75) + (0.25)⁵ = 0.0156
38 of 66
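The arithmetic can be reproduced directly (illustrative Python; `math.comb` supplies the binomial coefficient C(5,4)):

```python
from math import comb

# n = 5 draws from the triangular density f(x) = 2x on (0, 1);
# p = Pr(X_i <= 0.5) = 0.25, and X_(4) <= 0.5 means at least 4 draws are <= 0.5.
p = 0.25
prob = comb(5, 4) * p ** 4 * (1 - p) + p ** 5

print(round(prob, 4))   # 0.0156
```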
• More generally we have,

F(x) = Pr(Xi ⩽ x) = ∫₀ˣ 2t dt = [t²]₀ˣ = x²

G(x) = Pr(X(4) ⩽ x) = C(5,4) (x²)⁴ (1 − x²) + (x²)⁵

• Taking derivatives gives the pdf,

g(x) = G′(x) = C(5,4) · 4 (x²)³ (1 − x²) (2x)
             = 4 C(5,4) F(x)³ (1 − F(x)) f(x)
39 of 66
Triangular distribution, n = 5

[Figure: probability density of the order statistic X(4) (k = 4) for n = 5 draws from the triangular density f(x) = 2x on (0, 1).]
40 of 66
Distribution of X(k)
Gk(x) = Pr(X(k) ⩽ x)
      = Pr(at least k of the Xi ⩽ x)
      = Σᵢ₌ₖⁿ C(n,i) F(x)ⁱ (1 − F(x))ⁿ⁻ⁱ
41 of 66
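This general cdf is a one-line function. A sketch (illustrative Python; the helper name `G` is my own) that reproduces the triangular example:

```python
from math import comb

def G(k: int, n: int, Fx: float) -> float:
    """Pr(X_(k) <= x) = sum_{i=k}^{n} C(n, i) F(x)^i (1 - F(x))^(n - i)."""
    return sum(comb(n, i) * Fx ** i * (1 - Fx) ** (n - i) for i in range(k, n + 1))

# Reproduces the triangular example: n = 5, k = 4, F(0.5) = 0.25.
print(G(4, 5, 0.25))   # 0.015625
```

As a side check, G(1, n, F) collapses to 1 − (1 − F)ⁿ, the cdf of the minimum.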
Thus the pdf of X(k) is,

gk(x) = G′k(x) = Σᵢ₌ₖⁿ i C(n,i) F(x)ⁱ⁻¹ (1 − F(x))ⁿ⁻ⁱ f(x)
              + Σᵢ₌ₖⁿ⁻¹ (n − i) C(n,i) F(x)ⁱ (1 − F(x))ⁿ⁻ⁱ⁻¹ (−f(x))

Note the similarity of the term structure in the two sums: this motivates us to change the variable in the second sum to j = i + 1 (i.e. to replace i by j − 1) so as to make the exponents the same.
42 of 66
With this change of variable:

gk(x) = Σᵢ₌ₖⁿ i C(n,i) F(x)ⁱ⁻¹ (1 − F(x))ⁿ⁻ⁱ f(x)
      − Σⱼ₌ₖ₊₁ⁿ (n − j + 1) C(n, j−1) F(x)ʲ⁻¹ (1 − F(x))ⁿ⁻ʲ f(x)

      = Σᵢ₌ₖⁿ i C(n,i) F(x)ⁱ⁻¹ (1 − F(x))ⁿ⁻ⁱ f(x)
      − Σᵢ₌ₖ₊₁ⁿ (n − i + 1) C(n, i−1) F(x)ⁱ⁻¹ (1 − F(x))ⁿ⁻ⁱ f(x)
43 of 66
• We note in the second sum that:

(n − i + 1) C(n, i−1) = (n − i + 1) n! / ((n − i + 1)! (i − 1)!)
                      = i n! / ((n − i)! i!)
                      = i C(n,i)

So indeed the terms are identical in form. Calling them tᵢ, our so-called telescoping sum collapses, giving:

gk(x) = Σᵢ₌ₖⁿ tᵢ − Σᵢ₌ₖ₊₁ⁿ tᵢ = tₖ
44 of 66
• Hence, the pdf simplifies to,

gk(x) = tₖ = k C(n,k) F(x)ᵏ⁻¹ (1 − F(x))ⁿ⁻ᵏ f(x)

• Special cases: minimum and maximum,

g₁(x) = n (1 − F(x))ⁿ⁻¹ f(x)        gₙ(x) = n F(x)ⁿ⁻¹ f(x)

• These are consistent with the familiar expressions for the tails of these distributions from Probability: Pr(X(1) > x) = (1 − F(x))ⁿ and Pr(X(n) ⩽ x) = F(x)ⁿ.
45 of 66
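As a sanity check that the differentiation and telescoping are right, the closed form can be compared against a numerical derivative of Gk (illustrative Python; helper names `G` and `g` are my own):

```python
from math import comb

def G(k, n, F):
    # cdf of X_(k): sum_{i=k}^{n} C(n, i) F^i (1 - F)^(n - i)
    return sum(comb(n, i) * F ** i * (1 - F) ** (n - i) for i in range(k, n + 1))

def g(k, n, F, f):
    # simplified pdf: k C(n, k) F^(k-1) (1 - F)^(n-k) f
    return k * comb(n, k) * F ** (k - 1) * (1 - F) ** (n - k) * f

# Triangular density on (0, 1): F(x) = x^2, f(x) = 2x.  Compare the closed form
# with a central-difference derivative of G at x = 0.6 for n = 5, k = 4.
x, h = 0.6, 1e-6
numeric = (G(4, 5, (x + h) ** 2) - G(4, 5, (x - h) ** 2)) / (2 * h)
exact = g(4, 5, x ** 2, 2 * x)

print(abs(numeric - exact))   # essentially zero
```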
Alternative derivation of the pdf of X(k)
• Heuristically, classify the n observations relative to a small interval around x:
  ◦ k − 1 are in (−∞, x − ½dy)
  ◦ One is in (x − ½dy, x + ½dy)
  ◦ n − k are in (x + ½dy, ∞)
46 of 66
• Putting these together,
gk(x) dy ≈ [n! / ((k − 1)! 1! (n − k)!)] F(x)ᵏ⁻¹ (1 − F(x))ⁿ⁻ᵏ f(x) dy
• Dividing both sides by dy gives the pdf of X(k)
47 of 66
Example (boundary estimate)
• X1 , . . . , X5 ∼ Unif(0, θ)
• Likelihood is

L(θ) = (1/θ)⁵   if 0 ⩽ xi ⩽ θ, i = 1, . . . , 5
     = 0        otherwise (i.e. if θ < xi for some i)
48 of 66
• Then,

E(X(5)) = ∫₀^θ x (5x⁴/θ⁵) dx = [5x⁶/(6θ⁵)]₀^θ = (5/6)θ

• So the MLE X(5) is biased
• (But (6/5)X(5) is unbiased, as we quoted in Module 2)
49 of 66
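Both claims are easy to verify by simulation (illustrative Python; θ = 10 and the other constants are my own arbitrary choices):

```python
import random
import statistics

random.seed(9)
theta, n, B = 10.0, 5, 50000   # theta = 10 is an arbitrary illustrative choice

# MLE for Unif(0, theta) is the sample maximum X_(5); E(X_(5)) = (5/6) theta.
maxima = [max(random.uniform(0, theta) for _ in range(n)) for _ in range(B)]

mle_mean = statistics.fmean(maxima)
unbiased_mean = statistics.fmean(6 * m / 5 for m in maxima)
print(mle_mean)        # near (5/6) * 10 = 8.33: biased downwards
print(unbiased_mean)   # near theta = 10
```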
Uniform distribution, n = 4
Order statistics
[Figure: probability density of the maximum X(4) (k = 4), with E(X(4)) marked between 0 and θ on the x-axis.]
50 of 66
Reminder: Inverse transformation method
so the density is
51 of 66
Distribution on cdf scale
52 of 66
Distribution on cdf scale
53 of 66
Beta distribution
f(p) = [Γ(α + β) / (Γ(α) Γ(β))] p^(α−1) (1 − p)^(β−1),        0 ⩽ p ⩽ 1
• Γ is the gamma function, a generalisation of the factorial function.
Note that Γ(n) = (n − 1)!
54 of 66
Beta distribution
• Properties:

E(P) = α/(α + β)

mode(P) = (α − 1)/(α + β − 2)        (α, β > 1)

var(P) = αβ / ((α + β)² (α + β + 1))
• The Beta has a large variety of shapes depending on the parameter
values including a uniform distribution when α = 1 and β = 1.
55 of 66
Order Statistics on Uniform(0,1)
Wk = F(X(k)) ∼ Beta(k, n − k + 1)

• We also have:

E(Wk) = k/(n + 1)        mode(Wk) = (k − 1)/(n − 1)
56 of 66
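A short simulation sketch of this result (illustrative Python; for Unif(0, 1) we have F(X(k)) = X(k), so the order statistic can be used directly; n = 9, k = 3 are my own choices):

```python
import random
import statistics

random.seed(13)
n, k, B = 9, 3, 40000

# For Unif(0, 1), W_k = F(X_(k)) = X_(k) itself; theory: W_k ~ Beta(k, n - k + 1),
# so E(W_k) = k / (n + 1) = 0.3 here.
wk = [sorted(random.random() for _ in range(n))[k - 1] for _ in range(B)]

print(statistics.fmean(wk))   # near 0.3
```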
Quantile Types 6 and 7
57 of 66
Uniform distribution, n = 5
Order statistics
[Figure: probability densities of the order statistics of n = 5 draws from Uniform(0, 1): k = 1 (min), k = 2, and k = 3 (median).]
58 of 66
Outline
Sample Mean
Sample Variance
Order Statistics
Sample Quantiles
59 of 66
Sample Quantile definition
60 of 66
Sample Quantile definition
61 of 66
Sample Quantile distribution
We derive rough approximations for the mean and variance of Ĉq for
sampling from a continuous distribution. These approximations are
asymptotically valid and perform reasonably well provided the sample
size n is not too small. We have:
Ĉq ≈ cq − (F̂(cq) − q) / f(cq)
62 of 66
Sample Quantile distribution
E(F̂(cq)) = F(cq) = q        var(F̂(cq)) = q(1 − q)/n

• It follows that:

E(Ĉq) ≈ cq        var(Ĉq) ≈ q(1 − q) / (n f(cq)²)
63 of 66
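The variance approximation can be checked for, say, the median of an Exp(rate = 1) population, where c0.5 = ln 2 and f(c0.5) = 1/2 (illustrative Python sketch; n = 101 and the seed are my own choices):

```python
import random
import statistics

random.seed(21)
n, B = 101, 20000

# Sample median of Exp(rate = 1): c_0.5 = ln 2 and f(c_0.5) = 1/2, so the
# approximation gives var(median) ~ q(1 - q) / (n f(c_q)^2) = 1/n.
meds = [statistics.median(random.expovariate(1.0) for _ in range(n))
        for _ in range(B)]

theory = 0.25 / (n * 0.25)          # = 1/n
print(statistics.variance(meds))    # close to 1/n
print(theory)
```

The agreement is only asymptotic, so the simulated variance matches 1/n approximately rather than exactly at this n.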
Sample Quantile distribution
64 of 66
Example: Sample Median for normal
E(Ĉ₀.₅) ≈ m        var(Ĉ₀.₅) ≈ 1 / (4n f(m)²)
65 of 66
Sample Quantile distribution
66 of 66