
Point estimation - Part 2 - Sampling distributions

(Module 2A)

Statistics (MAST20005) &


Elements of Statistics
(MAST90058)

School of Mathematics and Statistics


University of Melbourne

Semester 2, 2023
Aims of this module

• Explore the Sampling Distributions for some very important point estimators
• Knowledge of the Sampling distributions for such Estimators is the foundation of inference back to the real population
• Sampling distributions usually depend on the population being sampled as well as on the Statistic and sample size
Outline

Sampling distributions - important Point estimators

Sample Mean

Sample Variance

Sample Proportion, Empirical pmf and cdf

Order Statistics

Sample Quantiles

Important Point Estimators
We will now look at the sampling distributions for some very
important point estimators:
• Sample mean
• Sample variance and standard deviation
• Sample proportion
• Sample (Empirical) cdf, pdf and pmf
• Order Statistics
• Quantiles

In each case, we assume a sample of iid rvs, X1, . . . , Xn, with mean µ and variance σ².

Sample mean

\[ \bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i \]

We note that:

\[ E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{n\mu}{n} = \mu \]

\[ \operatorname{var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{var}(X_i) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n} \]

These derivations follow purely from the basic properties of expectation and variance. So they are exactly true irrespective of the form of the underlying distribution, and for all sample sizes.

Sample mean - unbiased and consistent
So we can say X̄ is unbiased for µ irrespective of the population
being sampled or the sample size.

We note that its variance (and hence its standard deviation) tends to zero as the sample size n increases. This is sufficient to establish convergence in probability of X̄ to µ (again left as a tutorial exercise). So we can write

\[ \bar{X} \xrightarrow{P} \mu \]

and can say that X̄ is consistent for µ.

Also, if sampling from a Normal population, the sampling distribution of X̄ (being a rescaled linear combination of normal random variables) is exactly Normal for any sample size: \( \bar{X} \sim N(\mu, \sigma^2/n) \).

Sample mean - non-normal populations
When sampling from a non-normal population, the Central Limit
Theorem (CLT) comes into play:

\[ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty. \]

Of course this is a limiting result, so the practical statistical question is how quickly the convergence occurs for actual populations and for sample sizes encountered in real data/experiments.

Simulation example
Consider sampling from a population X ∼ Exp(λ = rate = 1/5). We
choose this as the Exponential is quite right skewed (towards positive
values).

Recall that E(X) = 1/λ = 5 and var(X) = 1/λ² = 25. By the CLT we have

\[ \bar{X} \approx N\left(\frac{1}{\lambda}, \frac{1}{n\lambda^2}\right) = N\left(5, \frac{5^2}{n}\right) \]

Let's see how large n needs to be for X̄ to become normal...

A simulation exercise
Generate B = 1000 samples of size n. For each sample compute x̄. The continuous curve is the normal N(5, 5²/n) distribution prescribed by the CLT.

Sample 1: \( x_1^{(1)}, \ldots, x_n^{(1)} \to \bar{x}^{(1)} \)
Sample 2: \( x_1^{(2)}, \ldots, x_n^{(2)} \to \bar{x}^{(2)} \)
⋮
Sample B: \( x_1^{(B)}, \ldots, x_n^{(B)} \to \bar{x}^{(B)} \)

Then represent the distribution of \( \{\bar{x}^{(b)},\ b = 1, \ldots, B\} \) by a histogram.
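A minimal R sketch of this exercise (the choice n = 25 and the plotting details are assumptions; B = 1000 and the Exp(rate = 1/5) population are as above):

# Draw B samples of size n from Exp(rate = 1/5), keeping each sample mean
B <- 1000
n <- 25
xbar <- replicate(B, mean(rexp(n, rate = 1/5)))

# Histogram of the B sample means with the CLT normal N(5, 5^2/n) overlaid
hist(xbar, freq = FALSE, main = paste("n =", n), xlab = "sample mean")
curve(dnorm(x, mean = 5, sd = 5 / sqrt(n)), add = TRUE)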

A simulation exercise
The distribution of X̄ approaches the theoretical distribution (CLT). As expected from
consistency it becomes more and more concentrated around µ.

[Figure: histograms of the B = 1000 simulated sample means for n = 1, 5, 25 and 100, each overlaid with the N(5, 5²/n) density curve.]

Sample mean - simulation
From the simulation we see the central portion of the sampling
distribution gets close to normal even for n around 25, even though
the exponential population is very skewed. Convergence happens even
more quickly for symmetric distributions (for example an early random
number generator for the normal used an appropriately rescaled sum
of 12 Uniforms on (0, 1)).

Sample mean - 'robustly' normal
We say a statistical procedure/result is robust if the probability
calculations required are insensitive to deviations from the
assumptions on which they are based.

The sampling distribution of X̄ is robust.

As a practical guide, you can use the normal approximation if the sample size is greater than 15, except in the presence of strong skewness or outliers.

And you can use it even in the presence of strong skewness if the sample size is at least 40. In the majority of real experiments these sample sizes are fairly modest and easy enough to collect.

Gambler's fallacy
If we sample from a Bernoulli distribution (observing successes/failures as 1's or 0's), then X̄ is just another name for P̂ - the Sample Proportion of successes.

Let's think concretely about tossing a fair coin. As the average X̄ collapses to 1/2 (converges in probability to 1/2), gamblers often interpret this as meaning that the numbers of heads and tails must balance out, so, for example, if you have a run of heads then the sequence must correct with more tails to compensate.

Gambler’s fallacy
Let S be the Sample Sum, \( S = \sum_{i=1}^{n} X_i \). If Xi = 1 for a Head (success), then S is just the number of heads. Its mean is nµ = n/2, its variance is nσ² = n/4 and its distribution is approximately normal.

A normal has 32% of its probability outside 1 standard deviation. So in a trillion (10¹²) tosses, 32% of the time the number of heads will differ from exactly half (n/2) by more than \( \sqrt{10^{12}}/2 = 500{,}000 \)!

As n increases the expected absolute difference between the numbers of heads and tails will continue to grow without bound, whilst the proportion still collapses to 1/2. The discrepancies are of the order of √n, so are swamped when we divide by n to calculate the average.
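A small R illustration of this point (the particular values of n are assumptions):

# The absolute deviation of the head count from n/2 grows with n,
# while the proportion of heads still converges to 1/2
set.seed(1)
for (n in c(100, 10000, 1000000)) {
  heads <- sum(rbinom(n, size = 1, prob = 0.5))
  cat(sprintf("n = %7.0f: |heads - n/2| = %6.0f, proportion = %.4f\n",
              n, abs(heads - n / 2), heads / n))
}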

Sample variance
Remember that the population variance σ² is:

\[ \operatorname{var}(X) = E\left[(X - E(X))^2\right] \]

It measures how much X varies around its mean. If we were trying to estimate it and were in the luxurious position of actually knowing the population mean, then a natural estimate would be:

\[ S_1^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2 \]

Using Probability techniques you will be able to derive the distribution of S₁² when sampling from a normal distribution, and to prove it is unbiased and consistent. However, first we name an important distribution - a special case of the Gamma distribution already known to you.
Chi-squared distribution

• Also written as χ2 -distribution


• Single parameter: k > 0, known as the degrees of freedom
• Notation: T ∼ χ2k or T ∼ χ2 (k)
• The pdf is:
\[ f(t) = \frac{t^{k/2 - 1}\, e^{-t/2}}{2^{k/2}\, \Gamma(k/2)}, \quad t \ge 0 \]
• Mean and variance:

E(T ) = k
var(T ) = 2k

• The distribution is bounded below by zero and is right-skewed

Chi-squared distribution

• Note that \( \chi^2_k \overset{d}{=} \Gamma(\tfrac{k}{2}, \text{rate} = \tfrac{1}{2}) \), so you can check the pdf/mean/variance from your known formulae for the Gamma.
• Also note that you proved in Probability that if Z ∼ N(0, 1) then Z² ∼ Γ(1/2, 1/2), so Z² ∼ χ²₁ - a chi-squared with one degree of freedom.
• χ²_k arises as the sum of the squares of iid standard normal rvs:

\[ Z_i \sim N(0, 1) \Rightarrow T = Z_1^2 + \cdots + Z_k^2 \sim \chi^2_k \]

• This can be proven easily using the mgf for the Chi-square distribution.
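These facts are easy to check numerically; a sketch in R (the choice k = 3 and the simulation size are assumptions):

# Sums of k squared N(0,1) draws should follow the chi-squared(k) distribution
set.seed(42)
k <- 3
t_sim <- colSums(matrix(rnorm(10000 * k), nrow = k)^2)

hist(t_sim, freq = FALSE, breaks = 50, main = "Sums of k squared normals")
curve(dchisq(x, df = k), add = TRUE)                                # chi-squared(k) pdf
curve(dgamma(x, shape = k / 2, rate = 1 / 2), add = TRUE, lty = 2)  # identical Gamma pdf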

Sample variance - mean known

• With this background, we leave it to tutorial exercises for you to establish the distribution of S₁² when sampling from a normal distribution, and that it is unbiased, and also consistent.
• And we note that it is rare in practice to know the mean of a population (at least very precisely) and then find yourself trying to estimate the variance, so S₁² is really of more theoretical than practical interest.

Sample variance - mean unknown
For unknown mean, we estimate µ by X̄ and have already defined the Sample Variance S² as:

\[ S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \]

When defined this way we can show:

• E(S²) = σ², so S² is unbiased for σ²;
• \( \operatorname{var}(S^2) = \frac{\nu_4 - 3\sigma^4}{n} + \frac{2\sigma^4}{n-1} \), where \( \nu_4 \equiv E\left[(X - E(X))^4\right] \) is the so-called fourth central moment (result tedious so not proven).

Establishing that S² is unbiased for σ² and consistent for σ² is left to tutorials.

Sample variance - mean unknown

• Whilst the choice of the (n − 1) denominator will be validated by your tutorial work as yielding an unbiased estimate, we offer an intuitive explanation here, as a similar phenomenon occurs in other models later in the course.
• Firstly we note that the averaged quantities (Xi − X̄) are not independent, as they sum to zero.
• This introduces 'one' constraint, which reduces the 'degrees of freedom' within the sum from n, as it is for S₁², to (n − 1)

Sample variance - mean unknown

• The constraint arises due to the estimation of the one parameter µ, so the 'degrees of freedom' have been reduced by one by estimating a single parameter
• The arising variance estimates can be viewed as a 'Sum of Squares' (SS) divided by its degrees of freedom (df).
• We now consider the sampling distribution for S² when sampling from a Normal, which will add further context to this intuitive approach.

Sample variance - sampling from normal

• When sampling from a normal distribution, we have the following important results for the Sample variance S²:

\[ \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}, \qquad \operatorname{var}(S^2) = \frac{2\sigma^4}{n-1} \]

where χ²_k is the chi-squared distribution with k degrees of freedom.
• With lots of assistance along the way you will be able to prove the χ²_{n−1} distributional result in tutorials, with full solutions for confirmation.
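A short R simulation consistent with this result (the sample size and population parameters are assumptions):

# For normal data, (n-1)S^2/sigma^2 should follow chi-squared(n-1)
set.seed(7)
n <- 10
sigma <- 2
stat <- replicate(5000, (n - 1) * var(rnorm(n, mean = 3, sd = sigma)) / sigma^2)

hist(stat, freq = FALSE, breaks = 50, main = "(n-1)S^2 / sigma^2")
curve(dchisq(x, df = n - 1), add = TRUE)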

Sample variance - sampling from non-normal

• The χ²_{n−1} distribution result only applies when sampling from a normal
• Unlike the CLT-supported distribution for X̄, this result is not robust when sampling from non-normal distributions.
• This has implications for later inferences - in particular the reliability of quantiles for confidence intervals and hypothesis tests
• We will gain some insights into this complex area using simulations in the labs

Sample proportion

• The proportion of a population p that share a characteristic is frequently a focus of interest, e.g. proportion undecided in a referendum, proportion voting green, proportion with diabetes, proportion of smokers, ...
• We model this as a random sample from a Bernoulli distribution, where Xi is 1 if the ith person in the sample has the characteristic of interest, and 0 otherwise.

Sample proportion

• Let N be the number in the sample with the characteristic; then N ∼ Bi(n, p), and the Sample proportion P̂ is:

\[ \hat{P} = \frac{N}{n} = \frac{\sum_{i=1}^{n} X_i}{n} = \bar{X} \]

• So P̂ is just another name for the Sample Mean when sampling from a Bernoulli, and all Sample Mean properties immediately follow - unbiasedness, consistency, approximate normality (when sample size is large enough).
• However we still develop separate theory/formulae for the Sample Proportion as an intuitive name for the estimator

Sample proportion
For any n:

\[ E(\hat{P}) = p, \qquad \operatorname{var}(\hat{P}) = \frac{p(1-p)}{n} \]

For large enough n, the sampling distribution is approximately normal:

\[ \hat{P} \approx N\left(p, \frac{p(1-p)}{n}\right) \]

The approximation breaks down if either np or n(1 − p) is too small, in which case the Binomial is very skewed left or right - under these circumstances you already know from Probability that a Poisson approximation is more appropriate.
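A rough R comparison of the exact and approximate distributions (n and p are assumptions):

# Exact Bi(n, p) pmf of the sample proportion vs its normal approximation
n <- 50
p <- 0.3
k <- 0:n
plot(k / n, dbinom(k, n, p), type = "h",
     xlab = "sample proportion", ylab = "probability")
# Normal density rescaled by the spacing 1/n so it matches the pmf heights
curve(dnorm(x, mean = p, sd = sqrt(p * (1 - p) / n)) / n, add = TRUE)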

Frequentist view
Analogously, we can think of repeating a random experiment where on each repetition an event A may occur, and does so with probability P(A). Let freq(A) be the number of times A occurs in n repetitions. Then

\[ \operatorname{freq}(A) \sim \operatorname{Bi}(n, P(A)) \]

and P̂ = freq(A)/n is unbiased and consistent for P(A).

This result underpins the frequentist view that P(A) can be interpreted as the long-term relative frequency of A as the number of repetitions tends to infinity.

Sample or Empirical cdf

• We defined the empirical cdf as

\[ \hat{F}(x) = \frac{1}{n}\operatorname{freq}(X \le x) \]

• Choosing the event A to be {X ≤ x} implies P(A) = F(x), and it follows immediately that F̂(x) is a reasonable estimator of F(x) for all x ∈ ℝ
• F̂(x) is unbiased and consistent for F(x), and approximately normal.
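In R the empirical cdf is available directly as ecdf(); a small sketch (the sample shown is an assumption):

# Empirical cdf of an exponential sample, compared with the true cdf
set.seed(3)
x <- rexp(30, rate = 1/5)
Fhat <- ecdf(x)                         # Fhat(x) = freq(X <= x) / n
plot(Fhat)
curve(pexp(x, rate = 1/5), add = TRUE)  # population cdf F
Fhat(5)                                 # estimate of F(5) = 1 - exp(-1)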

Sample or Empirical pmf

• For discrete X we defined the empirical pmf as

\[ \hat{p}_X(x) = \frac{1}{n}\operatorname{freq}(X = x) \]

• Choosing the event A to be {X = x} implies P(A) = p_X(x), and it follows immediately that p̂_X(x) is a reasonable estimator of p_X(x) for all possible values of X.
• p̂_X(x) is unbiased and consistent for p_X(x), and approximately normal.
• This validates simulations of discrete distributions where we generate a table of frequencies for each observed value to estimate p_X(x) for each possible x.

Definition (recap)
• Sample: X1 , . . . , Xn
• Arrange them in increasing order:

X(1) = smallest of the Xi
X(2) = 2nd smallest of the Xi
⋮
X(n) = largest of the Xi
• These are called the order statistics:

X(1) ⩽ X(2) ⩽ · · · ⩽ X(n)

• X(k) is called the kth order statistic of the sample
• X(1) is the minimum or sample minimum
• X(n) is the maximum or sample maximum
Motivating example

• Take iid samples X ∼ N(0, 1) of size n = 9
• What can we say about the order statistics, X(k)?
• Simulated values (each column is one sorted sample of size 9):
[,1] [,2] [,3] [,4] [,5]
[1,] -0.76 -1.94 -1.32 -0.85 -1.96 <-- Minimum
[2,] -0.32 -0.17 -0.53 -0.30 -0.98
[3,] -0.23 0.06 -0.44 0.14 -0.83
[4,] 0.05 0.18 -0.10 0.25 -0.63
[5,] 0.08 0.76 0.17 0.35 -0.47 <-- Median
[6,] 0.18 0.96 0.26 0.68 0.05
[7,] 0.27 1.07 0.60 0.69 0.34
[8,] 0.73 1.42 0.66 1.13 1.26
[9,] 0.91 1.77 1.93 1.98 1.26 <-- Maximum
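Code along these lines produces such output (the exact call is an assumption; the numbers above are from the lecturer's run):

# Five samples of size 9 from N(0,1); sort each column into order statistics
samples <- matrix(rnorm(9 * 5), nrow = 9, ncol = 5)
ord <- apply(samples, 2, sort)   # row k holds the kth order statistic of each sample
round(ord, 2)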
Standard normal distribution, n = 9
[Figure: probability densities of the order statistics X(k) for samples of size n = 9 from N(0, 1): k = 1 (min), k = 5 (median), k = 9 (max).]
Example (triangular distribution)

• Random sample: X1, . . . , X5 with pdf f(x) = 2x, 0 < x < 1
• Calculate Pr(X(4) ⩽ 0.5)
• Critical observation: the event {X(4) ⩽ 0.5} is the same as the event {at least 4 of the Xi ⩽ 0.5}
• Remember equality of two events A = B is defined by A ⊆ B and B ⊆ A, so you need to check both directions.
• So:

\[ \Pr(X_{(4)} \le 0.5) = \Pr(\text{at least 4 } X_i\text{'s less than } 0.5) = \Pr(\text{exactly 4 } X_i\text{'s less than } 0.5) + \Pr(\text{exactly 5 } X_i\text{'s less than } 0.5) \]

• This is a binomial with 5 trials and probability of success given by

\[ \Pr(X_i \le 0.5) = \int_0^{0.5} 2x \, dx = \left[x^2\right]_0^{0.5} = 0.5^2 = 0.25 \]

• So we have,

\[ \Pr(X_{(4)} \le 0.5) = \binom{5}{4} 0.25^4 \times 0.75 + 0.25^5 = 0.0156 \]
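The arithmetic can be confirmed in R:

# Pr(X(4) <= 0.5): at least 4 of 5 draws below 0.5, each with probability 0.25
p <- 0.25
choose(5, 4) * p^4 * (1 - p) + p^5                  # direct sum: 0.015625
pbinom(3, size = 5, prob = p, lower.tail = FALSE)   # binomial tail, same value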

• More generally we have,

\[ F(x) = \Pr(X_i \le x) = \int_0^x 2t \, dt = \left[t^2\right]_0^x = x^2 \]

\[ G(x) = \Pr(X_{(4)} \le x) = \binom{5}{4} (x^2)^4 (1 - x^2) + (x^2)^5 \]

• Taking derivatives gives the pdf,

\[ g(x) = G'(x) = \binom{5}{4}\, 4 (x^2)^3 (1 - x^2)(2x) = 4 \binom{5}{4} F(x)^3 (1 - F(x))\, f(x) \]

since we know that F(x) = x².

Triangular distribution, n = 5

[Figure: probability density of the order statistic X(4) for samples of size n = 5 from the triangular distribution f(x) = 2x on (0, 1).]
Distribution of X(k)

• Sample from a continuous distribution with cdf F(x) and pdf f(x) = F′(x).
• The cdf of X(k) is,

\[ G_k(x) = \Pr(X_{(k)} \le x) = \Pr(\text{at least } k \text{ of the } X_i \le x) = \sum_{i=k}^{n} \binom{n}{i} F(x)^i (1 - F(x))^{n-i} \]

Thus the pdf of X(k) is,

\[ g_k(x) = G_k'(x) = \sum_{i=k}^{n} \binom{n}{i}\, i\, F(x)^{i-1} (1 - F(x))^{n-i} f(x) + \sum_{i=k}^{n-1} \binom{n}{i} (n - i)\, F(x)^i (1 - F(x))^{n-i-1} (-f(x)) \]

Note the similarity of the term structure in the two sums - this motivates us to change the variable in the second sum to j = i + 1 to make the exponents the same.

With this change of variable:

\[ g_k(x) = \sum_{i=k}^{n} \binom{n}{i}\, i\, F(x)^{i-1} (1 - F(x))^{n-i} f(x) - \sum_{j=k+1}^{n} \binom{n}{j-1} (n - j + 1)\, F(x)^{j-1} (1 - F(x))^{n-j} f(x) \]

Renaming j back to i:

\[ g_k(x) = \sum_{i=k}^{n} \binom{n}{i}\, i\, F(x)^{i-1} (1 - F(x))^{n-i} f(x) - \sum_{i=k+1}^{n} \binom{n}{i-1} (n - i + 1)\, F(x)^{i-1} (1 - F(x))^{n-i} f(x) \]

Now to further align the term structures we consider the leading terms with binomial coefficients.

• We note in the second sum that:

\[ (n - i + 1) \binom{n}{i-1} = (n - i + 1)\, \frac{n!}{(n-i+1)!\,(i-1)!} = i\, \frac{n!}{(n-i)!\, i!} = i \binom{n}{i} \]

So indeed the terms are identical in form - let's call them \( t_i \) - and our so-called telescoping sum collapses, giving:

\[ g_k(x) = \sum_{i=k}^{n} t_i - \sum_{i=k+1}^{n} t_i = t_k \]

• Hence, the pdf simplifies to,

\[ g_k(x) = t_k = k \binom{n}{k} F(x)^{k-1} (1 - F(x))^{n-k} f(x) \]

• Special cases: minimum and maximum,

\[ g_1(x) = n (1 - F(x))^{n-1} f(x), \qquad g_n(x) = n F(x)^{n-1} f(x) \]

• These are consistent with these familiar expressions for the tails of these distributions from Probability:

\[ \Pr(X_{(1)} > x) = (1 - F(x))^n, \qquad \Pr(X_{(n)} \le x) = F(x)^n \]
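A minimal R check of the general formula against simulation, reusing the earlier N(0, 1), n = 9 setting (k = 5 and the plotting details are assumptions):

# Density of the kth order statistic: k * choose(n,k) * F^(k-1) * (1-F)^(n-k) * f
n <- 9
k <- 5
gk <- function(x) k * choose(n, k) * pnorm(x)^(k - 1) * (1 - pnorm(x))^(n - k) * dnorm(x)

sims <- replicate(10000, sort(rnorm(n))[k])  # simulated 5th order statistics
hist(sims, freq = FALSE, breaks = 50, main = "k = 5, n = 9")
curve(gk(x), add = TRUE)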

Alternative derivation of the pdf of X(k)
• Heuristically,

\[ \Pr(X_{(k)} \approx x) = \Pr(x - \tfrac{1}{2}dy < X_{(k)} \le x + \tfrac{1}{2}dy) \approx g_k(x)\, dy \]

• Need to observe Xi such that:
◦ k − 1 are in \( (-\infty, x - \tfrac{1}{2}dy] \)
◦ One is in \( (x - \tfrac{1}{2}dy, x + \tfrac{1}{2}dy] \)
◦ n − k are in \( (x + \tfrac{1}{2}dy, \infty) \)
• Trinomial distribution (3 outcomes), event probabilities:

\[ \Pr(X_i \le x - \tfrac{1}{2}dy) \approx F(x) \]
\[ \Pr(x - \tfrac{1}{2}dy < X_i \le x + \tfrac{1}{2}dy) \approx f(x)\, dy \]
\[ \Pr(X_i > x + \tfrac{1}{2}dy) \approx 1 - F(x) \]

• Putting these together,

\[ g_k(x)\, dy \approx \frac{n!}{(k-1)!\, 1!\, (n-k)!}\, F(x)^{k-1} (1 - F(x))^{n-k} f(x)\, dy \]

• Dividing both sides by dy gives the pdf of X(k)

Example (boundary estimate)

• X1, . . . , X5 ∼ Unif(0, θ)
• Likelihood is

\[ L(\theta) = \begin{cases} \left(\dfrac{1}{\theta}\right)^5 & 0 \le x_i \le \theta,\ i = 1, \ldots, 5 \\ 0 & \text{otherwise (i.e. if } \theta < x_i \text{ for some } i\text{)} \end{cases} \]

• We have seen earlier that the MLE here is θ̂ = max(Xi) = X(5)
• Now,

\[ g_5(x) = 5 \left(\frac{x}{\theta}\right)^4 \left(\frac{1}{\theta}\right) = \frac{5x^4}{\theta^5}, \quad 0 \le x \le \theta \]

• Then,

\[ E(X_{(5)}) = \int_0^{\theta} x\, \frac{5x^4}{\theta^5}\, dx = \left[\frac{5x^6}{6\theta^5}\right]_0^{\theta} = \frac{5}{6}\theta \]

• So the MLE X(5) is biased
• (But \( \frac{6}{5} X_{(5)} \) is unbiased, as we quoted in Module 2)
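A quick R simulation of this bias (θ and the replication count are assumptions):

# The maximum of 5 Unif(0, theta) draws has mean about 5*theta/6, below theta
set.seed(11)
theta <- 10
m <- replicate(10000, max(runif(5, min = 0, max = theta)))
mean(m)          # close to 5/6 * theta = 8.33
mean(6/5 * m)    # bias-corrected estimator, close to theta = 10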

Uniform distribution, n = 4

[Figure: probability density of the maximum X(4) for samples of size n = 4 from Unif(0, θ), with E(X(4)) marked below θ.]
Reminder: Inverse transformation method

• If a continuous RV X has cdf F, then F(X) ∼ Unif(0, 1)
• Proof: for 0 ⩽ w ⩽ 1,

\[ G(w) = \Pr(F(X) \le w) = \Pr(X \le F^{-1}(w)) = F(F^{-1}(w)) = w \]

so the density is

\[ g(w) = G'(w) = 1, \quad 0 \le w \le 1 \]

so F(X) ∼ Unif(0, 1).
• Remember F(X) is a random variable - it is just a transform of X!

Distribution on cdf scale

• Since F is non-decreasing, we can transform the Order Statistics for X onto a cdf scale:

\[ F(X_{(1)}) < F(X_{(2)}) < \cdots < F(X_{(n)}) \]

• As F(X) ∼ Unif(0, 1), the Wi ≡ F(X(i)) are just the Order Statistics from a Unif(0, 1) distribution!

Distribution on cdf scale

• The Wi are order statistics from a distribution with cdf F(w) = w, for 0 < w < 1
• So the pdf of the kth order statistic Wk = F(X(k)) is

\[ g_k(w) = k \binom{n}{k} w^{k-1} (1 - w)^{n-k}, \quad 0 < w < 1 \]

• This is a special distribution which will come up later in the course (e.g. in Bayesian statistics). It is called a Beta distribution.

Beta distribution

• A distribution over the unit interval, p ∈ [0, 1]
• Two parameters: α, β > 0
• Notation: P ∼ Beta(α, β)
• The pdf is:

\[ f(p) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\, \Gamma(\beta)}\, p^{\alpha - 1} (1 - p)^{\beta - 1}, \quad 0 \le p \le 1 \]

• Γ is the gamma function, a generalisation of the factorial function. Note that Γ(n) = (n − 1)!

Beta distribution

• Properties:

\[ E(P) = \frac{\alpha}{\alpha + \beta} \]
\[ \operatorname{mode}(P) = \frac{\alpha - 1}{\alpha + \beta - 2} \quad (\alpha, \beta > 1) \]
\[ \operatorname{var}(P) = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)} \]

• The Beta has a large variety of shapes depending on the parameter values, including a uniform distribution when α = 1 and β = 1.

Order Statistics on Uniform(0,1)

• So for our Wi, the Order Statistics on a Unif(0, 1), we can write:

\[ W_k = F(X_{(k)}) \sim \operatorname{Beta}(k,\ n - k + 1) \]

• We also have:

\[ E(W_k) = \frac{k}{n+1}, \qquad \operatorname{mode}(W_k) = \frac{k-1}{n-1} \]
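A brief R confirmation of this (n, k and the simulation size are assumptions):

# The kth of n Unif(0,1) order statistics should follow Beta(k, n-k+1)
set.seed(5)
n <- 10
k <- 3
w <- replicate(10000, sort(runif(n))[k])

hist(w, freq = FALSE, breaks = 50, main = "k = 3, n = 10")
curve(dbeta(x, shape1 = k, shape2 = n - k + 1), add = TRUE)
mean(w)   # close to E(W_k) = k/(n+1) = 0.273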

Quantile Types 6 and 7

• In fact this throws light on the definitions of Type 6 and Type 7 quantiles
• For Type 6 quantiles, x(k) estimates \( c_{k/(n+1)} \), which reflects E(Wk) ('unbiased for mean')
• For Type 7 quantiles, x(k) estimates \( c_{(k-1)/(n-1)} \), which reflects mode(Wk) ('unbiased for mode')
• We note this in passing and will not develop the theory in any more depth for these or the many other quantile types!
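These two conventions correspond to the type argument of R's quantile() (type = 7 is R's default); the toy sample is an assumption:

# Types 6 and 7 place x(k) at probability k/(n+1) and (k-1)/(n-1) respectively
x <- c(2.1, 3.5, 4.0, 5.2, 7.8)
quantile(x, probs = 0.25, type = 6)   # 2.8
quantile(x, probs = 0.25, type = 7)   # 3.5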

Uniform distribution, n = 5

[Figure: probability densities of the Unif(0, 1) order statistics for samples of size n = 5: k = 1 (min), k = 2, k = 3 (median).]
Sample Quantile definition

• Whilst we have seen several definitions of Sample Quantiles, as the sample size n tends to infinity the discrepancies between them tend to zero, so in this section we will work with the natural estimator using the sample cdf F̂
• F(cq) ≡ q defines the population quantile cq
• F̂(ĉq) = q defines the qth sample quantile (estimate) ĉq
• F̂(Ĉq) = q defines the qth Sample Quantile (Estimator) Ĉq
• The distinction between the two definitions - of the ĉq estimate and the Ĉq Estimator - is quite subtle!

Sample Quantile definition

• In the expression F̂(ĉq) = q it is implicit that F̂ refers to the fixed F̂ observed for our given sample, i.e. determined by the sample x1, x2, . . . , xn.
• In the expression F̂(Ĉq) = q it is implicit that F̂ refers to the random function F̂, reflecting how the sample cdf varies over successive samples X1, X2, . . . , Xn.

Sample Quantile distribution
We derive rough approximations for the mean and variance of Ĉq for sampling from a continuous distribution. These approximations are asymptotically valid and perform reasonably well provided the sample size n is not too small. We have:

\[ F(\hat{C}_q) \approx F(c_q) + (\hat{C}_q - c_q) f(c_q) \quad \text{(using the tangent to } F\text{)} \]
\[ \hat{F}(\hat{C}_q) \approx \hat{F}(c_q) + (\hat{C}_q - c_q) f(c_q) \quad \text{(because } \hat{F} \approx F\text{)} \]

Since F̂(Ĉq) ≈ q, we have

\[ \hat{C}_q \approx c_q - \frac{\hat{F}(c_q) - q}{f(c_q)} \]

Sample Quantile distribution

• Remember Ĉq and F̂ are both random variables in this derivation
• We have already seen that:

\[ E(\hat{F}(c_q)) = F(c_q) = q \quad \text{and} \quad \operatorname{var}(\hat{F}(c_q)) = \frac{q(1-q)}{n} \]

• It follows that:

\[ E(\hat{C}_q) \approx c_q \quad \text{and} \quad \operatorname{var}(\hat{C}_q) \approx \frac{q(1-q)}{n f(c_q)^2} \]

Sample Quantile distribution

• Hence we see that Ĉq is asymptotically unbiased for cq, and as its variance also tends to 0 as n → ∞, it is also consistent for cq.
• It is also true (though we will not give a proof) that Ĉq is asymptotically normal:

\[ \hat{C}_q \approx N\left(c_q, \frac{q(1-q)}{n f(c_q)^2}\right) \quad \text{as } n \to \infty \]

Example: Sample Median for normal

• For q = 0.5 and population median c0.5 = m we have:

\[ E(\hat{C}_{0.5}) \approx m \quad \text{and} \quad \operatorname{var}(\hat{C}_{0.5}) \approx \frac{1}{4 n f(m)^2} \]

• If X ∼ N(µ, σ²), we have c0.5 = m = µ, and \( f(c_{0.5}) = \frac{1}{\sigma\sqrt{2\pi}} \)
• So for a random sample of size n, E(Ĉ0.5) ≈ µ and \( \operatorname{var}(\hat{C}_{0.5}) \approx \frac{\pi\sigma^2}{2n} \).
• So for sampling on a normal distribution:

\[ \operatorname{var}(\hat{C}_{0.5}) \approx 1.57\, \operatorname{var}(\bar{X}) \]
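A simulation sketch of this comparison in R (the sample size and replication count are assumptions):

# Ratio of var(sample median) to var(sample mean) for N(0,1) samples
set.seed(2)
n <- 101
meds <- replicate(10000, median(rnorm(n)))
means <- replicate(10000, mean(rnorm(n)))
var(meds) / var(means)   # close to pi/2 = 1.57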

Sample Quantile distribution

• So when sampling on a normal, the Sample Median Ĉ0.5 is not as efficient as the Sample Mean X̄
• In general the variance of the Sample Median depends on f(c0.5) - the intensity of the population density around the median.
• If f(m) is high enough - which occurs for some distributions - then Ĉ0.5 can be the more efficient choice.
• This will be explored further in the tute/labs.

