Determining CI For Mean
Starting point
• A sample of values is drawn from a population (or, from a different point
of view, a sample of values of a random variable X), and the sample mean $\bar{x}$ is
calculated. The sample mean is an unbiased estimate of the population
mean (or the expected value of X) $\mu$. In other words, it can be said to be
"the best guess" for the population mean. Usually $\bar{x}$ and $\mu$ are not equal.
• If another sample is collected, its sample mean will probably differ slightly
from the first one. The sample mean is therefore itself a random variable.
• Let's suppose that many samples (each large enough) are collected. From the
central limit theorem it follows that the sample mean is approximately
normally distributed. The mean of the distribution is $\mu$ and the standard
deviation is $\sigma/\sqrt{n}$, where $\sigma$ is the standard deviation of the whole
population (or of X) and $n$ is the sample size. (The larger $n$ is, the smaller
the variation of the sample mean.)
• The quantity $s = \sigma/\sqrt{n}$ is also called the standard error of the mean.
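The claim that the sample mean varies with standard deviation $\sigma/\sqrt{n}$ can be checked with a small simulation. The following is a minimal sketch, not part of the original material: the exponential population, the values of `MEAN`, `N` and `NUM_SAMPLES`, and the fixed seed are all assumptions chosen for illustration. A skewed population is used on purpose, since the central limit theorem does not require the population itself to be normal.

```python
import math
import random
import statistics

# Illustrative values (assumptions, not taken from the slides).
MEAN = 6.0            # an exponential population with mean 6 also has sigma = 6
SIGMA = 6.0
N = 20                # size of each sample
NUM_SAMPLES = 10_000  # how many samples are collected

rng = random.Random(0)  # fixed seed so the run is repeatable


def sample_mean(n):
    """Draw n values from the skewed population and return their mean."""
    return statistics.fmean(rng.expovariate(1 / MEAN) for _ in range(n))


means = [sample_mean(N) for _ in range(NUM_SAMPLES)]

observed_mean = statistics.fmean(means)   # should be close to MEAN
observed_sd = statistics.stdev(means)     # should be close to sigma / sqrt(n)
standard_error = SIGMA / math.sqrt(N)

print(f"mean of sample means: {observed_mean:.3f} (population mean {MEAN})")
print(f"sd of sample means:   {observed_sd:.3f} (sigma / sqrt(n) = {standard_error:.3f})")
```

Increasing `N` in this sketch should shrink the spread of the sample means in proportion to $1/\sqrt{n}$.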
95% confidence interval of mean
• So the sample mean $\bar{x}$ follows the distribution $N(\mu, s^2)$ (the standard deviation is $s$).
• What is the distance $d$ such that $P(\bar{x} - d \le \mu \le \bar{x} + d) = 95\%$?
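• Let $Z$ be the standardized random variable $Z = \dfrac{\bar{x} - \mu}{s}$. Then $Z \sim N(0,1)$. Now
$$
\begin{aligned}
P(\bar{x} - d \le \mu \le \bar{x} + d)
&= P(\mu - d \le \bar{x} \le \mu + d) \\
&= P\left(\frac{\mu - d - \mu}{s} \le Z \le \frac{\mu + d - \mu}{s}\right) \\
&= P\left(-\frac{d}{s} \le Z \le \frac{d}{s}\right).
\end{aligned}
$$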
• This probability should be equal to 0,95. Because of symmetry, this happens if
$$P\left(Z \le \frac{d}{s}\right) = 0{,}975$$
(the probability of exceeding the limit $d/s$, as well as that of falling below the limit $-d/s$, is 2,5%).
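The symmetry step can be written out explicitly. In this short sketch $\Phi$ denotes the CDF of the standard normal distribution and $c = d/s$; both symbols are introduced here for convenience and do not appear in the slides.

```latex
\begin{aligned}
P(-c \le Z \le c) &= \Phi(c) - \Phi(-c) \\
                  &= \Phi(c) - \bigl(1 - \Phi(c)\bigr) \\
                  &= 2\,\Phi(c) - 1 .
\end{aligned}
```

Setting $2\,\Phi(c) - 1 = 0{,}95$ gives $\Phi(c) = 0{,}975$, which is the condition on $d/s$ stated above.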
Conclusion
• So the equation $P(Z \le d/s) = 0{,}975$ holds if $d/s$ is the 0,975-fractile (*) of the
standard normal distribution. Using a table of values of the CDF of the standard
normal distribution (see e.g.
https://en.wikipedia.org/wiki/Standard_normal_table) or the Excel function
NORM.S.INV, the 0,975-fractile is found to be approximately 1,96.
• Thus $d/s = 1{,}96$, so $d = 1{,}96 \cdot s$, or
$$d = 1{,}96 \cdot \frac{\sigma}{\sqrt{n}}$$
• This is the formula for the margin of error at the 95% confidence level. The
limits of the confidence interval are $\bar{x} - d$ and $\bar{x} + d$.
• The margin of error at confidence level $1 - \alpha$ is obtained when the value
1,96 in the formula is replaced by the $(1 - \alpha/2)$-fractile.
*) By definition, the p-fractile of the distribution of a random variable Z is the value z such that $P(Z \le z) = p$.
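The fractile lookup and the general margin-of-error formula can be sketched in Python. This is an illustration under assumptions, not the method used in the slides: it relies on `statistics.NormalDist.inv_cdf` from the standard library in place of a table or NORM.S.INV, and the function names `normal_fractile` and `margin_of_error` are made up for this example.

```python
import math
from statistics import NormalDist


def normal_fractile(p):
    """Return the p-fractile z of N(0, 1), i.e. the z with P(Z <= z) = p."""
    return NormalDist().inv_cdf(p)


def margin_of_error(sigma, n, confidence=0.95):
    """Margin of error d = z * sigma / sqrt(n) for a known population sigma."""
    alpha = 1 - confidence
    z = normal_fractile(1 - alpha / 2)
    return z * sigma / math.sqrt(n)


# Fractiles for a few common confidence levels.
for confidence in (0.90, 0.95, 0.99):
    z = normal_fractile(1 - (1 - confidence) / 2)
    print(f"confidence {confidence:.0%}: fractile = {z:.3f}")
# confidence 90%: fractile = 1.645
# confidence 95%: fractile = 1.960
# confidence 99%: fractile = 2.576
```

A higher confidence level gives a larger fractile and therefore a wider interval for the same $\sigma$ and $n$.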
An example
• Machine parts of a certain type are produced in a factory, and their average
mass is investigated. In a sample of 20 parts the sample mean is
observed to be 129,0 grams. The population standard deviation is
assumed to be 6 grams. Let's find the 99% confidence interval of the mean.
• Now the fractile for probability $1 - \frac{0{,}01}{2} = 0{,}995$ is needed. It is
approximately 2,576. Thus
$$d = 2{,}576 \cdot \frac{\sigma}{\sqrt{n}} = 2{,}576 \cdot \frac{6}{\sqrt{20}} \approx 3{,}46 \approx 3{,}5.$$
• The confidence interval is [129-3,5 ; 129+3,5] or [125,5 ; 132,5].
• The margin of error $d$ can also be calculated with the Excel function
CONFIDENCE.NORM.
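The same calculation can be reproduced in a few lines of Python. This is a sketch that uses only the numbers given in the example; the variable names are chosen here for illustration, and `statistics.NormalDist` stands in for the table lookup.

```python
import math
from statistics import NormalDist

x_bar = 129.0      # sample mean in grams
sigma = 6.0        # assumed population standard deviation
n = 20             # sample size
confidence = 0.99

alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)   # 0.995-fractile of N(0, 1)
d = z * sigma / math.sqrt(n)              # margin of error
lower, upper = x_bar - d, x_bar + d

print(f"fractile z = {z:.3f}")
print(f"margin of error d = {d:.2f}")
print(f"99% confidence interval: [{lower:.1f}, {upper:.1f}]")
# fractile z = 2.576
# margin of error d = 3.46
# 99% confidence interval: [125.5, 132.5]
```

In Excel the corresponding call would take the arguments in the order alpha, standard deviation, sample size, that is, CONFIDENCE.NORM with 0,01, 6 and 20.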
Unknown standard deviation?
• In practice the standard deviation of the whole distribution (or population) is
usually unknown.
• Then the sample standard deviation can be used instead of the population standard
deviation, and the standardized sample mean can be assumed to follow Student's
t-distribution with n-1 degrees of freedom (this is exact when the population
itself is normally distributed).
• In the formula for the margin of error, the fractile of the normal distribution is
replaced by the corresponding fractile of the t-distribution (the two values are
approximately equal if the sample size is very large).
• The Excel function CONFIDENCE.T can also be used to calculate the margin of error in
this case.
• However, if the sample size is large enough, the difference between the normal
distribution and the t-distribution is small. In this case both functions,
CONFIDENCE.T and CONFIDENCE.NORM, give approximately equal margins of
error.
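How quickly the t-fractile approaches the normal fractile can be illustrated numerically. The sketch below is an assumption-laden illustration rather than a recommended implementation: in practice a statistics library (for example `scipy.stats.t.ppf` in SciPy) would normally be used, but to stay within the Python standard library the t-fractile is computed here by integrating the density with Simpson's rule and solving by bisection. The helper names `t_pdf`, `t_cdf`, `t_fractile` and `margin_of_error_t` are hypothetical, and the sample standard deviation of 6 grams in the usage lines is a made-up value, not data from the example above.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_fractile(p, df, upper=200.0):
    """p-fractile for 0.5 < p < 1, found by bisection on [0, upper]."""
    lo, hi = 0.0, upper
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def margin_of_error_t(sample_sd, n, confidence=0.95):
    """Margin of error when sigma is replaced by the sample standard deviation."""
    alpha = 1 - confidence
    return t_fractile(1 - alpha / 2, n - 1) * sample_sd / math.sqrt(n)


# Compare the 0.975-fractiles for growing sample sizes.
z = NormalDist().inv_cdf(0.975)
for n in (5, 20, 100, 1000):
    t = t_fractile(0.975, n - 1)
    print(f"n = {n:4d}: t-fractile = {t:.3f}, normal fractile = {z:.3f}")

# Hypothetical case: n = 20 and a sample standard deviation of 6 grams.
print(f"99% margin of error with t: {margin_of_error_t(6.0, 20, 0.99):.2f}")
```

For small samples the t-based margin is noticeably wider than the normal-based one, and the gap shrinks as n grows.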