Mit18 05 s22 Class23-Prep-B

Confidence Intervals for the Mean of Non-normal Data
Class 23, 18.05

Jeremy Orloff and Jonathan Bloom
1 Learning Goals
1. Be able to derive the formula for conservative normal confidence intervals for the
proportion 𝜃 in Bernoulli data.
2. Be able to find rule-of-thumb 95% confidence intervals for the proportion 𝜃 of a
Bernoulli distribution.
3. Be able to find large sample confidence intervals for the mean of a general distribution.
2 Introduction
So far, we have focused on constructing confidence intervals for data drawn from a normal
distribution. We’ll now switch gears and learn about confidence intervals for the mean when
the data is not necessarily normal.
We will first look carefully at estimating the probability 𝜃 of success when the data is drawn
from a Bernoulli(𝜃) distribution – recall that 𝜃 is also the mean of the Bernoulli distribution.
Then we will consider the case of a large sample from an unknown distribution. In this case
we can appeal to the central limit theorem to justify the use of 𝑧-confidence intervals.
3 Bernoulli data and polling
One common use of confidence intervals is for estimating the proportion 𝜃 in a Bernoulli(𝜃)
distribution. For example, suppose we want to use a political poll to estimate the proportion
of the population that supports candidate A, or equivalently the probability 𝜃 that a random
person supports candidate A. In this case we have a simple rule-of-thumb that allows us to
quickly compute a confidence interval.
3.1 Conservative normal confidence intervals
Suppose we have i.i.d. data 𝑥1, 𝑥2, …, 𝑥𝑛 all drawn from a Bernoulli(𝜃) distribution. Then
a conservative normal (1 − 𝛼) confidence interval for 𝜃 is given by
\[ \bar{x} \pm z_{\alpha/2} \cdot \frac{1}{2\sqrt{n}}. \qquad (1) \]
The proof given below uses the central limit theorem and the observation that
𝜎 = √(𝜃(1 − 𝜃)) ≤ 1/2.
You will also see in the derivation below that this formula is conservative, providing an ‘at
least (1 − 𝛼)’ confidence interval.
Example 1. A pollster asks 196 people if they prefer candidate A to candidate B and finds
that 120 prefer 𝐴 and 76 prefer 𝐵. Find the 95% conservative normal confidence interval
for 𝜃, the proportion of the population that prefers 𝐴.
Solution: We have 𝑥̄ = 120/196 = 0.612, 𝛼 = 0.05 and 𝑧0.025 = 1.96. Since √196 = 14, the
formula says a 95% confidence interval is
\[ I \approx 0.612 \pm \frac{1.96}{2 \cdot 14} = 0.612 \pm 0.07. \]
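As a quick check of the arithmetic in Example 1, here is a minimal Python sketch of Formula 1. The function name `conservative_ci` is our own choice for illustration; the critical value comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def conservative_ci(successes, n, alpha=0.05):
    """Conservative normal (1 - alpha) confidence interval for a Bernoulli proportion."""
    x_bar = successes / n                      # point estimate of theta
    z = NormalDist().inv_cdf(1 - alpha / 2)    # critical value z_{alpha/2}, about 1.96 for alpha = 0.05
    margin = z / (2 * sqrt(n))                 # uses the bound sigma <= 1/2
    return x_bar - margin, x_bar + margin


low, high = conservative_ci(120, 196)
print(f"[{low:.3f}, {high:.3f}]")  # prints [0.542, 0.682]
```

The half-width is 1.96/28 ≈ 0.07, so the interval runs from about 0.54 to about 0.68.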
3.2 Proof of Formula 1
The proof of Formula 1 will rely on the following fact.

Fact. The standard deviation of a Bernoulli(𝜃) distribution is at most 0.5.
Proof of fact: Let’s denote this standard deviation by 𝜎𝜃 to emphasize its dependence on
𝜃. The variance is then 𝜎𝜃² = 𝜃(1 − 𝜃). It’s easy to see using calculus or by graphing this
parabola that the maximum occurs when 𝜃 = 1/2. Therefore the maximum variance is 1/4,
which implies that the standard deviation 𝜎𝜃 is at most √(1/4) = 1/2.
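For readers who prefer algebra to a graph, one alternative way to see the maximum is to complete the square. This is a sketch of that step, not part of the original argument:

```latex
\[
\sigma_\theta^2 = \theta(1-\theta)
  = \frac{1}{4} - \left(\theta - \frac{1}{2}\right)^2 \le \frac{1}{4},
\]
```

with equality exactly when 𝜃 = 1/2. Taking square roots gives 𝜎𝜃 ≤ 1/2.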
Proof of formula (1). The proof relies on the central limit theorem which says that (for
large 𝑛) the distribution of 𝑥̄ is approximately normal with mean 𝜃 and standard deviation
𝜎𝜃/√𝑛. For normal data we have the (1 − 𝛼) 𝑧-confidence interval
\[ \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma_\theta}{\sqrt{n}}. \]
The trick now is to replace 𝜎𝜃 by 1/2: since 𝜎𝜃 ≤ 1/2 the resulting interval around 𝑥̄
\[ \bar{x} \pm z_{\alpha/2} \cdot \frac{1}{2\sqrt{n}} \]
is always at least as wide as the interval using 𝜎𝜃/√𝑛. A wider interval is more likely to
contain the true value of 𝜃 so we have a ‘conservative’ (1 − 𝛼) confidence interval for 𝜃.
Again, we call this conservative because 1/(2√𝑛) overestimates the standard deviation of 𝑥̄,
resulting in a wider interval than is necessary to achieve a (1 − 𝛼) confidence level.
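To see how conservative the interval is in practice, we can compute its coverage exactly under the binomial model instead of relying on the normal approximation. The sketch below is our own illustration: the function name `exact_coverage` and the choice 𝑛 = 196 are assumptions made for the example.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_coverage(n, theta, alpha=0.05):
    """Probability that x_bar +/- z/(2 sqrt(n)) contains theta when the
    number of successes is binomial(n, theta)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z / (2 * sqrt(n))
    total = 0.0
    for k in range(n + 1):
        # The interval around k/n contains theta exactly when |k/n - theta| <= margin.
        if abs(k / n - theta) <= margin:
            total += comb(n, k) * theta**k * (1 - theta) ** (n - k)
    return total


for theta in (0.5, 0.3, 0.1):
    print(theta, round(exact_coverage(196, theta), 4))
```

Near 𝜃 = 1/2, where 𝜎𝜃 is close to 1/2, the coverage should be close to the nominal 95% (it can land slightly above or below because the count is discrete and the normal model is only approximate). For 𝜃 far from 1/2 the coverage should be well above 95%, which is the conservative effect described above.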
3.3 How political polls are reported
Political polls are often reported as a value with a margin-of-error. For example you might
hear
52% favor candidate A with a margin-of-error of ±5%.
The actual precise meaning of this is
if 𝜃 is the proportion of the population that supports A then the point
estimate for 𝜃 is 52% and the 95% confidence interval is 52% ± 5%.
Notice that reporters of polls in the news do not mention the 95% confidence. You just
have to know that that’s what pollsters do.
The 95% rule-of-thumb confidence interval.

Recall that the (1 − 𝛼) conservative normal confidence interval is
\[ \bar{x} \pm z_{\alpha/2} \cdot \frac{1}{2\sqrt{n}}. \]
If we use the standard approximation 𝑧0.025 ≈ 2 (instead of 1.96) we get the rule-of-thumb
95% confidence interval for 𝜃:
\[ \bar{x} \pm \frac{1}{\sqrt{n}}. \]
Example 2. Polling. Suppose there will soon be a local election between candidate 𝐴 and
candidate 𝐵. Suppose that the fraction of the voting population that supports 𝐴 is 𝜃.
Two polling organizations ask voters who they prefer.
1. The firm of Fast and First polls 40 random voters and finds 22 support 𝐴.
2. The firm of Quick but Cautious polls 400 random voters and finds 190 support 𝐴.
Find the point estimates and 95% rule-of-thumb confidence intervals for each poll. Explain
how the statistics reflect the intuition that the poll of 400 voters is more accurate.
Solution: For poll 1 we have
Point estimate: 𝑥̄ = 22/40 = 0.55
Confidence interval: 𝑥̄ ± 1/√𝑛 = 0.55 ± 1/√40 = 0.55 ± 0.16 = 55% ± 16%.
For poll 2 we have
Point estimate: 𝑥̄ = 190/400 = 0.475
Confidence interval: 𝑥̄ ± 1/√𝑛 = 0.475 ± 1/√400 = 0.475 ± 0.05 = 47.5% ± 5%.
The greater accuracy of the poll of 400 voters is reflected in the smaller margin of error, i.e.
5% for the poll of 400 voters vs. 16% for the poll of 40 voters.
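The two computations in Example 2 can be reproduced with a few lines of Python. This is only a sketch; the helper name `rule_of_thumb_ci` is ours.

```python
from math import sqrt


def rule_of_thumb_ci(successes, n):
    """Point estimate and rule-of-thumb 95% margin of error, 1/sqrt(n)."""
    x_bar = successes / n
    margin = 1 / sqrt(n)
    return x_bar, margin


polls = [("Fast and First", 22, 40), ("Quick but Cautious", 190, 400)]
for name, successes, n in polls:
    x_bar, margin = rule_of_thumb_ci(successes, n)
    print(f"{name}: {x_bar:.3f} +/- {margin:.3f}")
# Fast and First: 0.550 +/- 0.158
# Quick but Cautious: 0.475 +/- 0.050
```

Because the margin is 1/√𝑛, multiplying the sample size by 10 only shrinks the margin by a factor of √10 ≈ 3.2.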
Other binomial proportion confidence intervals

There are many methods of producing confidence intervals for the proportion 𝑝 of a
binomial(𝑛, 𝑝) distribution. For a number of other common approaches, see:
https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval
4 Large sample confidence intervals
One typical goal in statistics is to estimate the mean of a distribution. When the data follows
a normal distribution we could use confidence intervals based on standardized statistics to
estimate the mean.
But suppose the data 𝑥1 , 𝑥2 , … , 𝑥𝑛 is drawn from a distribution with pmf or pdf 𝑓(𝑥) that
may not be normal or even parametric. If the distribution has finite mean and variance
and if 𝑛 is sufficiently large, then the following version of the central limit theorem shows
we can still use a standardized statistic.
Central Limit Theorem: For large 𝑛, the sampling distribution of the studentized mean
is approximately standard normal:
\[ \frac{\bar{x} - \mu}{s/\sqrt{n}} \approx \mathrm{N}(0, 1). \]
So for large 𝑛 the (1 − 𝛼) confidence interval for 𝜇 is approximately
\[ \left[ \bar{x} - \frac{s}{\sqrt{n}} \cdot z_{\alpha/2}, \;\; \bar{x} + \frac{s}{\sqrt{n}} \cdot z_{\alpha/2} \right] \]
where 𝑧𝛼/2 is the 𝛼/2 critical value for N(0, 1). This is called the large sample confidence
interval.
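Here is a minimal Python sketch of the large sample confidence interval. The function name `large_sample_ci` is our own, and the data in the usage lines is synthetic (200 draws from exp(1) with a fixed seed), generated only to have something to feed the function.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def large_sample_ci(data, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for the mean, valid for large n."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                            # sample standard deviation (divides by n - 1)
    z = NormalDist().inv_cdf(1 - alpha / 2)    # critical value z_{alpha/2}
    half_width = z * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Synthetic example data: 200 draws from exp(1), whose true mean is 1.
rng = random.Random(1)
data = [rng.expovariate(1.0) for _ in range(200)]
print(large_sample_ci(data))
```

Note that nothing in the function depends on the distribution of the data; all the distributional work is done by the central limit theorem, which is why 𝑛 needs to be large.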
Example 3. How large must 𝑛 be?
Recall that a type 1 CI error occurs when the confidence interval does not contain the true
value of the parameter, in this case the mean. Let’s call the value (1 − 𝛼) the nominal
confidence level. We say nominal because unless 𝑛 is large we shouldn’t expect the true
type 1 CI error rate to be 𝛼.
We can run numerical simulations to approximate the true confidence level. We expect
that as 𝑛 gets larger the true confidence level of the large sample confidence interval will
converge to the nominal value.
We ran such simulations for 𝑥 drawn from the exponential distribution exp(1) (which is far
from normal). For several values of 𝑛 and nominal confidence level 𝑐 we ran 100,000 trials.
Each trial consisted of the following steps:
1. draw 𝑛 samples from exp(1).
2. compute the sample mean 𝑥̄ and sample standard deviation 𝑠.
3. construct the large sample 𝑐 confidence interval: 𝑥̄ ± 𝑧𝛼/2 ⋅ 𝑠/√𝑛.
4. check for a type 1 CI error, i.e. see if the true mean 𝜇 = 1 is not in the interval.
With 100,000 trials, the empirical confidence level should closely approximate the true level.
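The four steps above can be sketched in Python as follows. This is not the code used to produce the results reported in these notes: the function name `simulated_confidence`, the seed, and the default of 2,000 trials (chosen so the sketch runs quickly) are all our assumptions, so the output will only roughly match a run with 100,000 trials.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def simulated_confidence(n, conf, trials=2000, seed=0):
    """Fraction of trials in which the large sample interval contains the true mean of exp(1)."""
    rng = random.Random(seed)
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]   # step 1
        x_bar, s = mean(sample), stdev(sample)              # step 2
        half_width = z * s / sqrt(n)                        # step 3
        if abs(x_bar - 1.0) <= half_width:                  # step 4: true mean is 1
            hits += 1
    return hits / trials


print(simulated_confidence(20, 0.95))
```

Increasing `trials` reduces the random error in the estimate at the cost of a longer run.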
For comparison we ran the same tests on data drawn from a standard normal distribution.
Here are the results.
          nominal conf.   simulated conf.   simulated conf.
   𝑛         1 − 𝛼            exp(1)           N(0, 1)
   20         0.95             0.905             0.936
   20         0.90             0.856             0.885
   20         0.80             0.762             0.785
   50         0.95             0.930             0.944
   50         0.90             0.879             0.894
   50         0.80             0.784             0.796
  100         0.95             0.938             0.947
  100         0.90             0.889             0.896
  100         0.80             0.792             0.797
  400         0.95             0.947             0.949
  400         0.90             0.897             0.898
  400         0.80             0.798             0.798
Simulations for exp(1) and for N(0, 1).
For the exp(1) distribution we see that for 𝑛 = 20 the simulated confidence of the large
sample confidence interval is less than the nominal confidence 1 − 𝛼. But for 𝑛 = 100 the
simulated confidence and nominal confidence are quite close. So for exp(1), 𝑛 somewhere
between 50 and 100 is large enough for most purposes.
Think: For 𝑛 = 20 why is the simulated confidence for the N(0, 1) distribution smaller
than the nominal confidence?
This is because we used 𝑧𝛼/2 instead of 𝑡𝛼/2. For large 𝑛 these are quite close, but for 𝑛 = 20
there is a noticeable difference, e.g. 𝑧0.025 = 1.96 and 𝑡0.025 = 2.09.
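To spell this out: for normal data the studentized mean follows a 𝑡 distribution with 𝑛 − 1 degrees of freedom exactly, so the true confidence level of the 𝑧-based interval can be written as follows (a sketch, using 𝑇 for a 𝑡(𝑛 − 1) random variable):

```latex
\[
P\left( \left| \frac{\bar{x} - \mu}{s/\sqrt{n}} \right| \le z_{\alpha/2} \right)
  = P\left( |T| \le z_{\alpha/2} \right)
  < P\left( |T| \le t_{\alpha/2} \right) = 1 - \alpha,
  \qquad T \sim t(n-1).
\]
```

The inequality holds because 𝑧𝛼/2 < 𝑡𝛼/2, so the 𝑧-based interval is narrower than the exact 𝑡-based one.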
MIT OpenCourseWare
https://ocw.mit.edu
18.05 Introduction to Probability and Statistics

Spring 2022
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.
