Module03 Slides Print
(Module 3)
Semester 2, 2022
Aims of this module
2 of 82
Outline
Standard error
Confidence intervals
Introduction
Definition
Important distributions
Pivots
Common scenarios
3 of 82
Statistics: the big picture
4 of 82
How useful are point estimates?
Example: surveying Melbourne residents as part of a disability study.
The results will be used to set a budget for disability support.
5 of 82
Going beyond point estimates
6 of 82
Outline
Standard error
Confidence intervals
Introduction
Definition
Important distributions
Pivots
Common scenarios
7 of 82
Report sd(Θ̂)?
Previously, we calculated the variance of our estimators.
Reminder: sd(Θ̂) = √var(Θ̂)
This tells us a typical amount by which the estimate will vary from
one sample to another, and thus (for an unbiased estimator) how
close to the true parameter value it is likely to be.
8 of 82
Estimate sd(Θ̂)!
We know how to deal with parameter values. . . we estimate them!
Example:
Consider the sample proportion, p̂ = X/n.
We know that var(p̂) = p(1 − p)/n.
Therefore, an estimate is var̂(p̂) = p̂(1 − p̂)/n.
9 of 82
If we take a sample of size n = 100 and observe x = 30, we get
p̂ = 30/100 = 0.3,

sd̂(p̂) = √(p̂(1 − p̂)/n) = √(0.3 × 0.7/100) = 0.046.

se(p̂) = 0.046
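This calculation is easy to script. A minimal sketch in Python (the module's own examples use R; this is just an illustration):

```python
import math

def se_proportion(x, n):
    """Standard error of the sample proportion p-hat = x/n."""
    p_hat = x / n
    return math.sqrt(p_hat * (1 - p_hat) / n)

# n = 100 trials, x = 30 successes, as on the slide
print(round(se_proportion(30, 100), 3))  # 0.046
```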
10 of 82
Standard error
The standard error of an estimate is the estimated standard deviation
of the estimator.
Notation:
• Parameter: θ
• Estimator: Θ̂
• Estimate: θ̂
• Standard deviation of the estimator: sd(Θ̂)
• Standard error of the estimate: se(θ̂)
12 of 82
Back to the disability example
More info:
• First survey: 5% ± 4%
• Second survey: 2% ± 0.1%
What result should we use for setting the disability support budget?
13 of 82
Outline
Standard error
Confidence intervals
Introduction
Definition
Important distributions
Pivots
Common scenarios
14 of 82
Interval estimates
Let’s go one step further. . .
The form est ± error can be expressed as an interval,
(est − error, est + error).
More general and more useful than just reporting a standard error.
For example, it can cope with skewed (asymmetric) sampling
distributions.
15 of 82
Example
Random sample (iid): X1 , . . . , Xn ∼ N(µ, 1)
or, equivalently,
Pr( µ − 1.96/√n < X̄ < µ + 1.96/√n ) = 0.95
16 of 82
[Figure: standard normal PDF with shaded central area 0.95]
17 of 82
Rearranging gives:
Pr( X̄ − 1.96/√n < µ < X̄ + 1.96/√n ) = 0.95
n n
This says that the interval (X̄ − 1.96/√n, X̄ + 1.96/√n) has
probability 0.95 of containing the parameter µ.
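A quick simulation (a sketch, not part of the slides) illustrates this statement: the random interval covers µ in roughly 95% of repeated samples.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n, reps = 5.0, 25, 10_000
half_width = 1.96 / np.sqrt(n)   # sigma = 1, as in the example

covered = 0
for _ in range(reps):
    xbar = rng.normal(mu, 1 / np.sqrt(n))  # sampling distribution of X-bar
    if xbar - half_width < mu < xbar + half_width:
        covered += 1

print(covered / reps)  # close to 0.95
```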
18 of 82
Sampling distribution of the interval estimator
• Is this an estimator?
• Does it have a sampling distribution?
• What does it look like?
• There are two statistics here, the endpoints of the interval:
19 of 82
Example
For the previous example:
• Realisations of the interval will have a fixed width but a random
location
• The randomness is due to X̄
• Sampling distribution:
L ∼ N( µ − 1.96/√n , 1/n )
U ∼ N( µ + 1.96/√n , 1/n )
U − L = 2 × 1.96/√n
20 of 82
• Can write it more formally as a bivariate normal distribution:

[L, U]ᵀ ∼ N₂( [µ − 1.96/√n, µ + 1.96/√n]ᵀ , (1/n) [1 1; 1 1] )
21 of 82
[Figure: realisations of the interval estimator, varying in location around the true value µ]
22 of 82
Interpretation
23 of 82
Example (more general)
Random sample (iid): X1 , . . . , Xn ∼ N(µ, σ 2 ), and assume that we
know the value of σ 2 .
The sampling distribution of the sample mean is X̄ ∼ N(µ, σ²/n).
Let Φ−1 (1 − α/2) = c, so we can write:
Pr( −c < (X̄ − µ)/(σ/√n) < c ) = 1 − α
or, equivalently,
Pr( µ − cσ/√n < X̄ < µ + cσ/√n ) = 1 − α
24 of 82
Rearranging gives:
Pr( X̄ − cσ/√n < µ < X̄ + cσ/√n ) = 1 − α
n n
25 of 82
Worked example
Suppose X ∼ N(µ, 36²) represents the lifetime of a light bulb, in
hours. Test 27 bulbs, observe x̄ = 1478.
A 95% confidence interval is 1478 ± 1.96 × 36/√27 = (1464.4, 1491.6).
In other words, we have good evidence that the mean lifetime for a
light bulb is approximately 1,460–1,490 hours.
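The same interval can be computed with a short helper. A sketch in Python (the course examples are in R; `scipy.stats.norm.ppf` plays the role of R's `qnorm`):

```python
import math
from scipy.stats import norm

def ci_mean_known_sigma(xbar, sigma, n, conf=0.95):
    """CI for a normal mean when sigma is known."""
    c = norm.ppf(1 - (1 - conf) / 2)   # e.g. 1.96 for 95%
    half = c * sigma / math.sqrt(n)
    return xbar - half, xbar + half

lo, hi = ci_mean_known_sigma(1478, 36, 27)
print(round(lo, 1), round(hi, 1))  # roughly 1464.4 1491.6
```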
26 of 82
Example (CLT approximation)
27 of 82
[Figure: standard normal PDF; shaded probability 90% between −1.645 and 1.645]
28 of 82
Definitions
29 of 82
General technique for deriving a CI
• Start with an estimator, T , whose sampling distribution is known
• Write the central probability interval based on its sampling
distribution,
Pr (π0.025 < T < π0.975 ) = 0.95
• The endpoints will depend on the parameter, θ, so we can write it as,
30 of 82
Challenge problem (exponential distribution)
Take a random sample of size n from an exponential distribution with
rate parameter λ.
1. Derive an exact 95% confidence interval for λ.
2. Suppose your sample is of size 9 and has sample mean 3.93.
2.1 What is your 95% confidence interval for λ?
2.2 What is your 95% confidence interval for the population mean?
3. Repeat the above using the CLT approximation (rather than an
exact interval).
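One possible approach to part 1 (a sketch, not the official solution): if X₁, …, Xₙ are iid Exp(λ), then 2λ∑Xᵢ ∼ χ²₂ₙ, which is a pivot, so the χ²₂ₙ quantiles can be inverted to bound λ.

```python
from scipy.stats import chi2

n, xbar = 9, 3.93
total = n * xbar  # sum of the observations

# 2 * lambda * sum(X) ~ chi-squared with 2n df, so invert the quantiles
a, b = chi2.ppf(0.025, 2 * n), chi2.ppf(0.975, 2 * n)
lam_lo, lam_hi = a / (2 * total), b / (2 * total)
print(round(lam_lo, 3), round(lam_hi, 3))          # CI for lambda
print(round(1 / lam_hi, 2), round(1 / lam_lo, 2))  # CI for the mean 1/lambda
```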
31 of 82
Recap
32 of 82
Graphical presentation of CIs
Draw CIs as ‘error bars’
[Figure: point estimates with confidence intervals drawn as error bars]
33 of 82
Width of CIs
The width of a CI is controlled by various factors:
• inherent variation in the data
• choice of estimator
• confidence level
• sample size
For example, the width for the normal distribution example was 2cσ/√n.
34 of 82
Interpreting CIs
35 of 82
Three important distributions
• χ2 -distribution
• t-distribution
• F -distribution
36 of 82
Chi-squared distribution
If T ∼ χ²ₖ :

E(T) = k
var(T) = 2k
37 of 82
• Arises as the sum of squares of iid standard normal rvs. For a normal random sample,

(n − 1)S²/σ² ∼ χ²ₙ₋₁
38 of 82
Student’s t-distribution
f(t) = Γ((k + 1)/2) / ( √(kπ) Γ(k/2) ) × (1 + t²/k)^(−(k+1)/2),  −∞ < t < ∞

E(T) = 0, if k > 1
var(T) = k/(k − 2), if k > 2
39 of 82
• The t-distribution is similar to a standard normal but with heavier tails
• As k → ∞, tₖ → N(0, 1)
• If Z ∼ N(0, 1) and U ∼ χ2 (r), and they are independent, then
Z
T =p ∼ tr
U/r
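The convergence tₖ → N(0, 1) can be checked numerically. A sketch (not from the slides) comparing upper quantiles:

```python
from scipy.stats import t, norm

z = norm.ppf(0.975)  # about 1.96
for k in (5, 30, 1000):
    # t quantiles shrink towards the normal quantile as k grows
    print(k, round(t.ppf(0.975, k), 3))
print(round(z, 3))
```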
40 of 82
F -distribution
If U ∼ χ²ₘ and V ∼ χ²ₙ are independent, then

F = (U/m) / (V/n) ∼ F(m, n)
41 of 82
Pivots
Recall our general technique that starts with a probability interval
using a statistic with a known sampling distribution:
42 of 82
Remarks about pivots
• The value of the pivot can depend on the parameters, but its
distribution cannot.
• Since pivots are a function of the parameters as well as the data,
they are usually not statistics.
• If a pivot is also a statistic, then it is called an ancillary statistic.
43 of 82
Examples of pivots
45 of 82
Normal, single mean, known σ
Random sample (iid): X1 , . . . , Xn ∼ N(µ, σ 2 ), and assume that we
know the value of σ.
46 of 82
Normal, single mean, unknown σ
Random sample (iid): X1 , . . . , Xn ∼ N(µ, σ 2 ), and σ is unknown.
47 of 82
Given α, let c be the (1 − α/2) quantile of tn−1 .
We then write:
Pr( −c < (X̄ − µ)/(S/√n) < c ) = 1 − α.
Rearranging gives:
Pr( X̄ − cS/√n < µ < X̄ + cS/√n ) = 1 − α
48 of 82
Example (normal, single mean, unknown σ)
X ∼ N(µ, σ 2 ) is the amount of butterfat produced by a cow.
Examining n = 20 cows results in x̄ = 507.5 and s = 89.75. Letting c be
the 0.95 quantile of t₁₉, we have c = 1.729. Therefore, a 90%
confidence interval for µ is,

507.50 ± 1.729 × 89.75/√20 = [472.80, 542.20]
49 of 82
> butterfat
[1] 481 537 513 583 453 510 570 500 457 555 618 327
[13] 350 643 499 421 505 637 599 392
data: butterfat
t = 25.2879, df = 19, p-value = 4.311e-16
alternative hypothesis: true mean is not equal to 0
90 percent confidence interval:
472.7982 542.2018
sample estimates:
mean of x
507.5
> sd(butterfat)
[1] 89.75082
> qqnorm(butterfat, main = "")
> qqline(butterfat, probs = c(0.25, 0.75))
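The interval in the R output can be reproduced outside R. A minimal Python sketch (`scipy.stats.t.ppf` in place of R's `qt`):

```python
import math
import statistics
from scipy.stats import t

butterfat = [481, 537, 513, 583, 453, 510, 570, 500, 457, 555,
             618, 327, 350, 643, 499, 421, 505, 637, 599, 392]

n = len(butterfat)
xbar = statistics.mean(butterfat)
s = statistics.stdev(butterfat)   # sample sd, divisor n - 1
c = t.ppf(0.95, n - 1)            # 0.95 quantile of t_19, for a 90% CI
half = c * s / math.sqrt(n)
print(round(xbar - half, 2), round(xbar + half, 2))  # matches the R interval
```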
51 of 82
[Figure: normal QQ plot of the butterfat data (Sample Quantiles vs Theoretical Quantiles)]
52 of 82
Remarks
• CIs based on a t-distribution (or a normal distribution) are of the
form:
estimate ± c × standard error
for an appropriate quantile, c, which depends on the sample size
(n) and the confidence level (1 − α).
• The t-distribution is appropriate if the sample is from a normally
distributed population.
• Can check using a QQ plot (in this example, looks adequate).
• If not normal but n is large, can construct approximate CIs using
the normal distribution (as we did in a previous example). This is
usually okay if the distribution is continuous, symmetric and
unimodal (i.e. has a single ‘mode’, or maximum value).
• If not normal and n small, distribution-free methods can be used.
We will cover these later in the semester.
53 of 82
Normal, two means, known σ
Suppose we have two populations, with means µX and µY , and want
to know how much they differ.
( X̄ − Ȳ − (µX − µY) ) / √( σ²X/n + σ²Y/m ) ∼ N(0, 1)
54 of 82
Defining c as in previous examples, we then write,

Pr( −c < ( X̄ − Ȳ − (µX − µY) ) / √( σ²X/n + σ²Y/m ) < c ) = 1 − α
55 of 82
Normal, two means, unknown σ, many samples
What if we don't know σ²X and σ²Y?

Firstly, suppose they share a common value, σ². Then

Z = ( X̄ − Ȳ − (µX − µY) ) / √( σ²/n + σ²/m ) ∼ N(0, 1)

Replacing σ by its pooled estimate gives

T = ( X̄ − Ȳ − (µX − µY) ) / ( S_P √(1/n + 1/m) ) ∼ t(n + m − 2)

where

S_P = √( ( (n − 1)S²X + (m − 1)S²Y ) / (n + m − 2) )

and S_P² is the pooled estimate of the common variance.
Note that the unknown σ has disappeared (cancelled out), therefore
making T a pivot (why?).
58 of 82
We can now find the quantile c so that
59 of 82
Example (normal, two means, unknown common
variance)
Two independent groups of students take the same test. Assume the
scores are normally distributed and have a common unknown
population variance.
60 of 82
[Figure: t density with 22 df; shaded probability 95% between −2.074 and 2.074]
61 of 82
Normal, two means, unknown σ, different variances
What if the sample sizes are small and we are pretty sure that σ²X ≠ σ²Y?

W = ( X̄ − Ȳ − (µX − µY) ) / √( S²X/n + S²Y/m )
62 of 82
Example (normal, two means, unknown different
variances)
We measure the force required to pull wires apart for two types of
wire, X and Y . We take 20 measurements for each wire.
1 2 3 4 5 6 7 8 9 10
X 28.8 24.4 30.1 25.6 26.4 23.9 22.1 22.5 27.6 28.1
Y 14.1 12.2 14.0 14.6 8.5 12.6 13.7 14.8 14.1 13.2
11 12 13 14 15 16 17 18 19 20
X 20.8 27.7 24.4 25.1 24.6 26.3 28.2 22.2 26.3 24.4
Y 12.1 11.4 10.1 14.2 13.6 13.1 11.9 14.8 11.1 13.5
63 of 82
[Figure: side-by-side box plots of the X and Y measurements]
64 of 82
Some heavily edited R output. . .
         Welch (unequal variances)    Pooled (equal variances)
t        18.8003                      18.8003
df       33.086                       38
95% CI   11.23214  13.95786          11.23879  13.95121
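The Welch column can be reproduced by hand. A Python sketch of the Welch–Satterthwaite calculation (the course itself uses R's `t.test`):

```python
import math
import statistics
from scipy.stats import t

x = [28.8, 24.4, 30.1, 25.6, 26.4, 23.9, 22.1, 22.5, 27.6, 28.1,
     20.8, 27.7, 24.4, 25.1, 24.6, 26.3, 28.2, 22.2, 26.3, 24.4]
y = [14.1, 12.2, 14.0, 14.6, 8.5, 12.6, 13.7, 14.8, 14.1, 13.2,
     12.1, 11.4, 10.1, 14.2, 13.6, 13.1, 11.9, 14.8, 11.1, 13.5]

n, m = len(x), len(y)
vx, vy = statistics.variance(x), statistics.variance(y)
se = math.sqrt(vx / n + vy / m)

# Welch-Satterthwaite approximate degrees of freedom
df = (vx / n + vy / m) ** 2 / (
    (vx / n) ** 2 / (n - 1) + (vy / m) ** 2 / (m - 1))

diff = statistics.mean(x) - statistics.mean(y)
c = t.ppf(0.975, df)
print(round(df, 3), round(diff - c * se, 4), round(diff + c * se, 4))
```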
65 of 82
Remarks
• From box plots: look like very different population means and
possibly different spreads
• The Welch approximate t-distribution is appropriate so a 95%
confidence interval is 11.23–13.96
• If we assumed equal variances, the confidence interval becomes
slightly narrower, 11.24–13.95
• Not a big difference!
66 of 82
Normal, paired samples
67 of 82
• We can now use our method of inference for a single mean!
• A 100 · (1 − α)% confidence interval for µD is:
68 of 82
Example (normal, paired samples)
The reaction times (in seconds) to a red or green light for 8 people
are given in the following table. Find a 95% CI for the mean
difference in reaction time.

     Red (X)   Green (Y)   D = X − Y
1    0.30      0.24         0.06
2    0.43      0.27         0.16
3    0.23      0.36        −0.13
4    0.32      0.41        −0.09
5    0.41      0.38         0.03
6    0.58      0.38         0.20
7    0.53      0.51         0.02
8    0.46      0.61        −0.15

Summary statistics: n = 8, d̄ = 0.0125, s_d = 0.129

95% CI:

0.0125 ± 2.365 × 0.129/√8 = [−0.095, 0.12]

(2.365 is the 0.975 quantile of t₇)
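Treating the differences as a single sample, the interval follows from the one-mean method. A Python sketch of the calculation (R would use `t.test(red, green, paired = TRUE)`):

```python
import math
import statistics
from scipy.stats import t

red   = [0.30, 0.43, 0.23, 0.32, 0.41, 0.58, 0.53, 0.46]
green = [0.24, 0.27, 0.36, 0.41, 0.38, 0.38, 0.51, 0.61]
d = [r - g for r, g in zip(red, green)]  # paired differences

n = len(d)
dbar = statistics.mean(d)
sd = statistics.stdev(d)
c = t.ppf(0.975, n - 1)        # 0.975 quantile of t_7
half = c * sd / math.sqrt(n)
print(round(dbar - half, 3), round(dbar + half, 3))
```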
69 of 82
Normal, single variance
Random sample (iid): X1 , . . . , Xn ∼ N(µ, σ 2 )
Pr( a < (n − 1)S²/σ² < b ) = 1 − α
70 of 82
Rearranging gives

1 − α = Pr( a/((n − 1)S²) < 1/σ² < b/((n − 1)S²) )
      = Pr( (n − 1)S²/b < σ² < (n − 1)S²/a )

giving the confidence interval

( (n − 1)s²/b , (n − 1)s²/a )
71 of 82
Example (normal, single variance)
Sample n = 13 seeds from a N(µ, σ²) population, giving
(n − 1)s² = 12s² = 128.41 (this value is used again in the next
example), with the 0.05 and 0.95 quantiles from a χ²₁₂ distribution
being 5.226 and 21.03.

A 90% confidence interval for σ² is therefore
(128.41/21.03, 128.41/5.226) = (6.11, 24.57).
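A sketch of the χ² interval in Python, using the value 12s² = 128.41 that appears in the two-variance example on a later slide:

```python
from scipy.stats import chi2

n = 13
ss = 128.41  # (n - 1) * s^2 for the seed data

# 0.05 and 0.95 quantiles of chi-squared with n - 1 = 12 df
a, b = chi2.ppf(0.05, n - 1), chi2.ppf(0.95, n - 1)
print(round(a, 3), round(b, 2))            # the tabulated quantiles
print(round(ss / b, 2), round(ss / a, 2))  # 90% CI for sigma^2
```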
72 of 82
Normal, two variances
Now we wish to compare the variances of two normally distributed
populations. Random samples (iid) from each population:
X₁, …, Xₙ ∼ N(µX, σ²X) and Y₁, …, Yₘ ∼ N(µY, σ²Y)

We will compute a confidence interval for σ²X/σ²Y. Start by defining:

( S²Y/σ²Y ) / ( S²X/σ²X )
  = [ ( (m − 1)S²Y/σ²Y ) / (m − 1) ] / [ ( (n − 1)S²X/σ²X ) / (n − 1) ]
  ∼ F(m − 1, n − 1)
73 of 82
We now need the α/2 and 1 − α/2 quantiles of F(m − 1, n − 1).
Call these c and d. In other words,

1 − α = Pr( c < (S²Y/σ²Y)/(S²X/σ²X) < d )
      = Pr( c S²X/S²Y < σ²X/σ²Y < d S²X/S²Y )

Rearranging gives the 100 · (1 − α)% confidence interval for σ²X/σ²Y as

( c s²x/s²y , d s²x/s²y )
74 of 82
Example (normal, two variances)
Continuing from the previous example, n = 13 and 12s²x = 128.41.
A sample of m = 9 seeds from a second strain gave 8s²y = 36.72.
The 0.01 and 0.99 quantiles of F(8, 12) are 0.176 and 4.50.

With s²x = 128.41/12 = 10.70 and s²y = 36.72/8 = 4.59, a 98%
confidence interval for σ²X/σ²Y is
(0.176 × 10.70/4.59, 4.50 × 10.70/4.59) = (0.41, 10.49).
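A Python sketch of the F-based interval for this example (`scipy.stats.f.ppf` in place of R's `qf`):

```python
from scipy.stats import f

n, m = 13, 9
s2x = 128.41 / (n - 1)   # sample variance, strain 1
s2y = 36.72 / (m - 1)    # sample variance, strain 2

# 0.01 and 0.99 quantiles of F with (m - 1, n - 1) degrees of freedom
c, d = f.ppf(0.01, m - 1, n - 1), f.ppf(0.99, m - 1, n - 1)
ratio = s2x / s2y
print(round(c * ratio, 2), round(d * ratio, 2))  # 98% CI for the variance ratio
```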
75 of 82
Single proportion
X1 , X2 , . . . , Xn ∼ Be(p)
76 of 82
• The central limit theorem shows for large n,

( p̂ − p ) / √( p(1 − p)/n ) ≈ N(0, 1)
77 of 82
Example (single proportion)
78 of 82
Example 2 (single proportion)
79 of 82
Two proportions
( p̂₁ − p̂₂ − (p₁ − p₂) ) / √( p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂ ) ≈ N(0, 1)
80 of 82
Example (two proportions)
Following on from the previous Newspoll example. . .
• At the previous poll, with 1,824 voters sampled, there were 37% of
voters who reported that they would vote for the Government first.
Has the vote dropped? What is a 90% confidence interval for the
difference in proportions in the population on the two occasions?
• The CI is
0.36 − 0.37 ± 1.6449 × √( 0.36 × 0.64/1708 + 0.37 × 0.63/1824 ) = [−0.037, 0.017]
• This interval comfortably surrounds 0, meaning there is no
evidence of a change in public opinion.
• This analysis allows for sampling variability in both polls, so is the
preferred way to infer whether the vote has dropped.
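The interval above is a one-line calculation. A Python sketch (the 1.6449 on the slide is the 0.95 quantile of the standard normal):

```python
import math
from scipy.stats import norm

def two_prop_ci(p1, n1, p2, n2, conf=0.90):
    """Approximate CI for p1 - p2 using the CLT."""
    c = norm.ppf(1 - (1 - conf) / 2)   # 1.6449 for 90%
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - c * se, d + c * se

lo, hi = two_prop_ci(0.36, 1708, 0.37, 1824)
print(round(lo, 3), round(hi, 3))  # the interval from the slide
```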
81 of 82
Example 2 (two proportions)
Two detergents. First successful in 63 out of 91 trials, the second in
42 out of 79.
82 of 82