Statistics and Probability Module 5 Moodle - Copy

This document discusses estimating population means using confidence intervals. It contains three key points: 1) It introduces estimating population means using confidence intervals and divides the topic into three cases based on what is known about the population variance. 2) It derives the formula for constructing a confidence interval for the population mean when the population variance is known. 3) It provides two examples of using the formula to calculate 95% and 99% confidence intervals for population means based on sample data.
Statistics and Probability


Module 5: Estimating Population Means

Objectives: At the end of the unit, I can

1. identify the appropriate form of the confidence interval estimator for the
population mean when
a. the population variance is known;
b. the population variance is unknown; and
c. the central limit theorem (CLT) is to be used.
2. illustrate the t-distribution.
3. construct a t-distribution.
4. identify regions under the t-distribution corresponding to different t-values.
   5. compute the confidence interval estimate based on the appropriate form of
       the estimator for the population mean.
6. solve problems involving confidence interval estimation of the population mean.
   7. draw conclusions about the population mean based on its confidence interval
       estimate.

Introduction:

       Estimating the population mean is a common task in business research. The head of a department,
for example, might want to estimate how many times his or her employees were tardy over a given period
of time. Checking every employee's record daily would be tedious. Taking a sufficiently large sample from
the population is enough to obtain a rough estimate of the tardiness incurred by employees in the
department. This saves time, and the sample result can represent the mean tardiness of the population.

As discussed in the previous module, the sample mean, which is a statistic, can be
used to estimate the population mean, which is a parameter. This is the case for point
estimation of the population mean. However, this module will focus on interval
estimation of the population mean. The module is divided into three cases,
namely:
    1. Estimating the population mean given the population variance
    2. Estimating the population mean using the central limit theorem
    3. Estimating the population mean when the population variance is not given

Estimating the Mean of a Normal Population with Known Variance

        In the previous module, you learned how to write confidence intervals using the
standard normal distribution. But as you learned in unit 1, not all normal distributions are
standard. A value x from a normal population is therefore converted using the z-score
transformation formula

$z = \frac{x - \mu}{\sigma}$

where x is an observed value, 𝜇 is the population mean, and 𝜎 is the population standard
deviation. Further, you have learned that, as a result of the central limit theorem, the
following z formula can be used for the sample mean $\bar{x}$ of a sample of size n:

$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

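        From this formula, we will derive the formula for the confidence interval of the population mean (µ).
Hence, $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ is the same as $\frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = z$.

$\frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{z}{1}$        Solve for the confidence interval of the population mean (µ).
$1(\bar{x} - \mu) = z \frac{\sigma}{\sqrt{n}}$        Cross multiply.
$\bar{x} - \mu = z \frac{\sigma}{\sqrt{n}}$        Simplify and solve for µ.
$-\mu = z \frac{\sigma}{\sqrt{n}} - \bar{x}$        Transpose $\bar{x}$ to the right.
$-1\left(-\mu = z \frac{\sigma}{\sqrt{n}} - \bar{x}\right)$        Multiply the equation by -1 to remove the negative sign of (-µ).
$\mu = -z \frac{\sigma}{\sqrt{n}} + \bar{x}$        Rearranging the formula, we have
$\mu = \bar{x} - z \frac{\sigma}{\sqrt{n}}$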

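       Using the symmetry of the normal curve and this equation, you can now generalize the formula for the
confidence interval of the population mean from a given confidence level, sample mean $\bar{x}$, and population
standard deviation $\sigma$.

The endpoints of the interval for the population mean are

$\bar{x} - z \frac{\sigma}{\sqrt{n}}$   and   $\bar{x} + z \frac{\sigma}{\sqrt{n}}$

so that

$\bar{x} - z \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z \frac{\sigma}{\sqrt{n}}$

Writing the critical value as $z_{\alpha/2}$, the formula becomes

$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$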
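
The symmetry step can be written out explicitly. The following is a sketch, assuming $\alpha = 1 - C$ for a confidence level C (the notation this module uses later for the t-distribution) and that $\pm z_{\alpha/2}$ cut off an area of $\alpha/2$ in each tail of the standard normal curve:

```latex
\begin{aligned}
P\left(-z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) &= 1 - \alpha \\
P\left(-z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \bar{x} - \mu < z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) &= 1 - \alpha \\
P\left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) &= 1 - \alpha
\end{aligned}
```

The last line is obtained by subtracting $\bar{x}$ from all three parts and multiplying by -1, which reverses the inequalities. Because the curve is symmetric, the same value $z_{\alpha/2}$ appears on both sides.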

Note that this formula applies when estimating an unknown mean of a normally distributed
population and with known variance by using a sample with any size n.
Study the example below:

Example 1: Consider a random sample of 36 items taken from a normally distributed population with a
sample mean of 211. Compute a 95% confidence interval for 𝜇 if the population standard
deviation is 23.

Solution:
              Given:
                      C = 95% = 0.95                   Change percent to decimal.
                      $\frac{C}{2} = \frac{0.95}{2} = 0.475$                   Divide C by 2.
Thus, $z_{\alpha/2} = z_{0.025}$ corresponds to the table value of 0.475. Using the z-distribution table, $z_{0.025} = \pm 1.96$.
(The area 0.475 is found in the z-table at row 1.9 and column 0.06; 1.9 + 0.06 = 1.96.)

[Figure: standard normal curve with an area of 47.5% = 0.475 on each side of the mean and an area of 0.025 in each tail]

Substituting the known values into the equation:

Given:
                $\bar{x} = 211$ ; $z_{\alpha/2} = \pm 1.96$ ; $\sigma = 23$ ; n = 36

$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$        Substitute the given values.

$211 - 1.96\left(\frac{23}{\sqrt{36}}\right) < \mu < 211 + 1.96\left(\frac{23}{\sqrt{36}}\right)$        Work inside the parentheses first.

$211 - 1.96\left(\frac{23}{6}\right) < \mu < 211 + 1.96\left(\frac{23}{6}\right)$

$211 - 1.96(3.833333) < \mu < 211 + 1.96(3.833333)$        Six decimal places only.
$211 - 7.513333 < \mu < 211 + 7.513333$        Six decimal places only.
Hence, the confidence interval for the population mean is

$203.49 < \mu < 218.51$        Final answer, to the nearest hundredth.
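
The computation in Example 1 can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `z_confidence_interval` is hypothetical, and it uses the inverse normal CDF instead of the rounded table value 1.96, so intermediate digits may differ slightly from the hand computation.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Return (lower, upper) for the population mean when sigma is known."""
    alpha = 1 - confidence
    # Critical value z_{alpha/2}: the point with area alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Example 1: sample mean 211, sigma 23, n = 36, 95% confidence.
lower, upper = z_confidence_interval(211, 23, 36, 0.95)
print(f"{lower:.2f} < mu < {upper:.2f}")
```

Rounded to the nearest hundredth, the endpoints agree with the interval obtained by hand.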

Example 2: Suppose that the systolic blood pressures of a certain population are normally distributed with
𝜎 = 8 and a sample with size 25 is taken from this population. The average of the systolic blood
pressures of these 25 individuals is found to be 122. Find the 99% confidence interval of the
average systolic pressure for all the members of the population.

Solution:
       Given:
                  $\sigma = 8$ ; n = 25 ; $\bar{x} = 122$ ; C = 99% = 0.99
        First, find $\frac{C}{2} = \frac{0.99}{2} = 0.495$.
        0.495 is the area under the curve that you will locate in your z-table. The table does not list
0.495 exactly, but 0.495 lies between 0.4949 and 0.4951, whose corresponding z-values are 2.57 and 2.58,
respectively, as shown below. What is the approximate z-value of 0.495?

        Area:       0.4949        0.495        0.4951
        z-value:    2.57          ?            2.58

        $\frac{2.57 + 2.58}{2} = \frac{5.15}{2} = 2.575$

Hence, 0.495 has a z-value of 2.575.

       $z_{\alpha/2} = z_{0.005} = \pm 2.575$ because the z-value of 2.575 corresponds to the probability 0.495.

Substitute the given values in the formula below:

$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$        Substitute the given values in the formula.

$122 - 2.575\left(\frac{8}{\sqrt{25}}\right) < \mu < 122 + 2.575\left(\frac{8}{\sqrt{25}}\right)$        Work inside the parentheses first.

$122 - 2.575\left(\frac{8}{5}\right) < \mu < 122 + 2.575\left(\frac{8}{5}\right)$

$122 - 2.575(1.6) < \mu < 122 + 2.575(1.6)$

$122 - 4.12 < \mu < 122 + 4.12$
$117.88 < \mu < 126.12$

        This means that, with 99% confidence, the mean systolic blood pressure of the population lies
between 117.88 and 126.12.
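
The averaging step used to read 0.495 from the table is a linear interpolation. As an illustrative sketch (the variable names are hypothetical), the snippet below repeats that interpolation and compares it with the value from the inverse normal CDF in the Python standard library:

```python
from statistics import NormalDist

# Table entries quoted in Example 2: areas from 0 to z and their z-values.
area_low, z_low = 0.4949, 2.57
area_high, z_high = 0.4951, 2.58
target_area = 0.495

# Linear interpolation between the two table entries.
fraction = (target_area - area_low) / (area_high - area_low)
z_interpolated = z_low + fraction * (z_high - z_low)

# An area of 0.495 between 0 and z means a cumulative area of
# 0.5 + 0.495 = 0.995 to the left of z.
z_computed = NormalDist().inv_cdf(0.5 + target_area)

print(round(z_interpolated, 3), round(z_computed, 4))
```

The two values differ only in the fourth decimal place, so the interval endpoints rounded to the nearest hundredth are unaffected.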

Assessment 5.1

Estimating Population Mean Using a Large Sample: Applying the Central Limit Theorem

       Recall from unit I that, according to the central limit theorem, regardless of whether
the population distribution of X is normal or not, as the sample size n gets larger,
the shape of the distribution of the sample means taken from the population approaches a
normal distribution with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$. This implies that you can
still use the formula

$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$

to estimate the unknown mean of a population that is not normally distributed, as long as
the sample size is sufficiently large, i.e., n ≥ 30. Moreover, as the sample size gets larger,
the value of the sample standard deviation s gets closer to the value of $\sigma$. This means that
you can also apply the z-statistic to estimate the value of $\mu$ even when $\sigma$ is unknown,
using s in place of $\sigma$, provided that n ≥ 30.

Study the example below.

Example 3: Kim wants to estimate the average age of teachers in a certain town with a known standard
           deviation of 4. If she surveys a sample of 100 teachers and obtains a sample mean of 35,
compute a 95% confidence interval for the average age of teachers.

Solution:
        Given: Population standard deviation ($\sigma$) = 4; n = 100; Sample mean ($\bar{x}$) = 35; C = 95% = 0.95
        $\frac{C}{2} = \frac{0.95}{2} = 0.475$. The 0.475 is the area on each side of the mean, as shown below. In the
z-table, this area of 0.475 is found at row 1.9 and column 0.06, which gives 1.9 + 0.06 = 1.96. Hence, it has
two values, ±1.96.

[Figure: standard normal curve with an area of 0.475 on each side of the mean and an area of 0.025 in each tail]

        Even if the population is not normally distributed, the population mean can still be
estimated using the central limit theorem. You know that $z_{\alpha/2} = z_{0.025} = 1.96$. Thus, you use the formula

$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$

$35 - 1.96\left(\frac{4}{\sqrt{100}}\right) < \mu < 35 + 1.96\left(\frac{4}{\sqrt{100}}\right)$        Substitute the given values in the formula.

$35 - 1.96\left(\frac{4}{10}\right) < \mu < 35 + 1.96\left(\frac{4}{10}\right)$        Extract the square root.

$35 - 1.96(0.4) < \mu < 35 + 1.96(0.4)$        Simplify inside the parentheses first.

$35 - 0.784 < \mu < 35 + 0.784$

$34.216 < \mu < 35.784$

$34.22 < \mu < 35.78$

       At 95% confidence, the mean age of the teachers in the town is estimated to fall between
34.22 and 35.78.
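
The central limit theorem claim used in this section can be checked by simulation. The sketch below makes its own assumptions for illustration: the population is uniform on [0, 1] (clearly not normal), the sample size is 36, and 2000 samples are drawn with a fixed seed. It compares the standard deviation of the simulated sample means with $\frac{\sigma}{\sqrt{n}}$.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

n = 36             # size of each sample
replicates = 2000  # number of samples drawn

# Assumed population: uniform on [0, 1], which is not normal.
# Its mean is 0.5 and its standard deviation is sqrt(1/12).
pop_mean = 0.5
pop_sd = sqrt(1 / 12)

# Draw many samples and record the mean of each one.
sample_means = [
    mean(random.random() for _ in range(n)) for _ in range(replicates)
]

print("mean of sample means:", round(mean(sample_means), 3))
print("sd of sample means:  ", round(stdev(sample_means), 4))
print("sigma / sqrt(n):     ", round(pop_sd / sqrt(n), 4))
```

The mean of the sample means should be close to the population mean, and their standard deviation should be close to $\frac{\sigma}{\sqrt{n}}$, even though the population itself is not bell-shaped.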

Assessment 5.2

Estimating the Mean of a Population With Unknown Variance Using a Small Sample

Through the central limit theorem, you were able to estimate the population mean
with a sufficiently large sample using the z-test statistic. However, note that using the
standard normal z-value as a test statistic given a small sample size presents two
problems. First, the shape of the sampling distribution depends on the shape of the
population. This suggests that you cannot assume that it tends to approximate a normal
distribution because the central limit theorem ensures normality only when the sample is
sufficiently large. Second, the population standard deviation is almost always unknown.
More often, it is the sample standard deviation that is known since it can easily be
computed using the observation from the selected sample. It is in these cases that
another test statistic must be used. This is called the t-test statistic. But before you
proceed to the estimation of the population mean using this test statistic, it will be of
great help if you first know the t-distribution and its properties.

The t-distribution

        One should remember that when the sample taken from the population is large enough, the z-
distribution is used in estimating the population mean. However, there are times when the sample size
is limited, i.e., less than 30. When a sample of fewer than 30 observations is taken from a normally
distributed population, it is wise to use the so-called t-distribution.
        William S. Gosset was an English statistician who developed the t-distribution, which is used instead
of the z-distribution for doing inferential statistics on the population mean when the population standard
deviation is unknown and the population is normally distributed. The t-statistic has basically the same form
as the z-statistic, the only difference being the replacement of the population standard deviation with the
sample standard deviation. Thus,

$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

where 𝑥̅ = sample mean, 𝜇 = population mean, s = sample standard deviation, and n = sample size.

      Different sample sizes have different distributions, so the t-distribution is really a
family of distributions, each determined by its degrees of freedom. The degrees of
freedom (df) measure how many values can vary freely in a sample statistic. They can
also be described as the number of independent observations in a set of data. For the
t-test, the degrees of freedom are simply 1 less than the sample size, that is,
n – 1. Table 5.1 gives the critical values of the t-statistic according to the degrees
of freedom, along with common choices for 𝛼.

[Table 5.1: critical values of the t-statistic by degrees of freedom (df, rows) and 𝛼 (columns)]

The t-distribution is also symmetric about the mean which is equal to 0. The following
are the other characteristics of the t-distribution, together with its graph, in
comparison to the standard normal distribution.
1. It is a bell-shaped curve symmetrical about the mean.
2. The mean of the distribution is equal to 0 and is located at the center of the
distribution.
   3. The curve is asymptotic to the x-axis.
   4. The variance of the distribution is equal to $\frac{df}{df - 2}$, where df is the degrees of
      freedom and df > 2.
   5. The variance of the distribution is always greater than 1.

[Figure: the t-distribution curve compared with the normal distribution curve, horizontal axis from -4 to 4]

Example 4: Find the value of $t_{\alpha/2}$ for a 95% confidence interval when the sample size is 25.

Solution:
        Given: C = 95% = 0.95; n = 25

Step 1: Find 𝛼 using the formula 𝛼 = 1 – C.

              𝛼 = 1 – C = 1 – 0.95 = 0.05                The total area under the curve is 1. The difference is 𝛼.

              Hence, 𝛼 = 0.05.

Step 2: Find $\frac{\alpha}{2}$, the subscript of $t_{\alpha/2}$.

              $\frac{\alpha}{2} = \frac{0.05}{2} = 0.025$                Substitute 𝛼 = 0.05.

              Hence, $t_{\alpha/2} = t_{0.025}$.

Step 3: Find the degrees of freedom (df) using the formula df = n – 1.

             df = n – 1 = 25 – 1 = 24
             df = 24
Use these results to locate the value in the t-table.
With a 95% confidence interval, 𝛼 = 0.05 and $\frac{\alpha}{2} = 0.025$. The degrees of freedom are 24. Thus, locating the
intersection of 24 degrees of freedom and the column for 0.025, $t_{\alpha/2} = 2.064$.

       df \ 𝛼     0.05      0.025     0.01      0.005     0.0025     0.001      0.0005
       1          6.314     12.706    31.821    63.657    127.321    318.309    636.619
       2          2.920     4.303     6.965     9.925     14.089     22.327     31.599
       ...
       24         1.711     2.064     2.492     2.797     3.091      3.467      3.745

Therefore, the $t_{\alpha/2}$ for a 95% confidence interval when the sample size is 25 is
$t_{\alpha/2} = 2.064$

Example 5: Find the variance of the t-distribution with 20 degrees of freedom.

Solution:
        Given: df = 20; Variance = ?

        Variance of the t-distribution $= \frac{df}{df - 2} = \frac{20}{20 - 2} = \frac{20}{18} \approx 1.11$

        The degrees of freedom are 20. The variance of the t-distribution is $\frac{20}{20 - 2} = \frac{20}{18} \approx 1.11$.
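
The variance formula is easy to tabulate for several degrees of freedom. The following is a small sketch (the function name `t_variance` is hypothetical) showing that the variance stays above 1 and gets closer to 1 as df grows:

```python
def t_variance(df):
    """Variance of the t-distribution using df / (df - 2), valid for df > 2."""
    if df <= 2:
        raise ValueError("the formula df / (df - 2) requires df > 2")
    return df / (df - 2)


# Tabulate the variance for a few degrees of freedom.
for df in (3, 5, 20, 100):
    print(df, round(t_variance(df), 2))
```

The row for df = 20 reproduces the result of Example 5.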

Example 6: Use the t-distribution to find $t_{\alpha/2}$ such that $P(-t_{\alpha/2} < t < t_{\alpha/2}) = 0.95$, where df = 14.

Given: C = 0.95; df = 14

Step 1: Find 𝛼 using the formula 𝛼 = 1 – C.
              𝛼 = 1 – C = 1 – 0.95 = 0.05                The total area under the curve is 1. The difference is 𝛼.

              Hence, 𝛼 = 0.05.

Step 2: Find $\frac{\alpha}{2}$, the subscript of $t_{\alpha/2}$.

              $\frac{\alpha}{2} = \frac{0.05}{2} = 0.025$                Substitute 𝛼 = 0.05.

              Hence, $t_{\alpha/2} = t_{0.025}$.

Step 3: The degrees of freedom are given, so the formula df = n – 1 is not needed.

               df = 14
Therefore, from the t-table, the $t_{\alpha/2}$ for a 95% confidence interval when the degrees of freedom are 14 is
$t_{\alpha/2} = 2.145$
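
When a t-table is not at hand, the critical value can be approximated numerically. The sketch below is one possible approach, not the method used in this module: it integrates the t-distribution density with Simpson's rule and finds the critical value by bisection, using only the Python standard library. The function names and the search bracket of 0 to 100 are assumptions made for illustration, so very extreme cases (such as df = 1 with a tiny tail area) fall outside its range.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return const * (1 + t * t / df) ** (-(df + 1) / 2)


def upper_tail_area(t, df, steps=2000):
    """Area to the right of t (t >= 0), using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the total area lies to the right of 0 by symmetry.
    return 0.5 - total * h / 3


def t_critical(tail_area, df):
    """Find t whose upper-tail area equals tail_area, by bisection."""
    low, high = 0.0, 100.0
    for _ in range(60):
        mid = (low + high) / 2
        if upper_tail_area(mid, df) > tail_area:
            low = mid   # too much area in the tail: move right
        else:
            high = mid
    return (low + high) / 2


print(round(t_critical(0.025, 24), 3))  # sample size 25, 95% confidence
print(round(t_critical(0.025, 14), 3))  # df = 14, 95% confidence
```

The printed values should agree with the table values 2.064 and 2.145 to three decimal places.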

Assessment 5.3:

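Estimating Population Means Using the t-distribution

       By the same derivation used for the z formula, with the sample standard deviation s in place
of $\sigma$ and $t_{\alpha/2}$ in place of $z_{\alpha/2}$, we obtain $\mu = \bar{x} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$. The confidence interval for the
unknown population mean with unknown population variance (or standard deviation), using a small
sample, may then be computed using the formula (recall what we did with the central limit theorem)

$\bar{x} - t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) < \mu < \bar{x} + t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$

where $\bar{x}$ is the sample mean, s is the sample standard deviation, and n is the sample size.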

Example 6: Compute the 99% confidence interval for the mean number of characters printed in a large
           number of copies of manuscripts when a sample with size n = 15 produced a mean of 1.24
           characters and a standard deviation of 0.19.

Solution:
       Given: C = 99% or 0.99; n = 15; s = 0.19; $\bar{x} = 1.24$
       Step 1: Find 𝛼 using the formula 𝛼 = 1 – C.
                      𝛼 = 1 – 0.99 = 0.01
       Step 2: Find $\frac{\alpha}{2}$, the subscript of $t_{\alpha/2}$.
                      $\frac{\alpha}{2} = \frac{0.01}{2} = 0.005$

       With a 99% confidence level, 𝛼 = 0.01 and $\frac{\alpha}{2} = 0.005$. Using n – 1 = 14 degrees of freedom,
       $t_{\alpha/2} = t_{0.005} = 2.977$. Use the formula

$\bar{x} - t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) < \mu < \bar{x} + t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$

$1.24 - 2.977\left(\frac{0.19}{\sqrt{15}}\right) < \mu < 1.24 + 2.977\left(\frac{0.19}{\sqrt{15}}\right)$        Substitute the given values.

$1.24 - 2.977(0.04906) < \mu < 1.24 + 2.977(0.04906)$        Five decimal places only.
$1.24 - 0.14605 < \mu < 1.24 + 0.14605$        Simplify.
$1.09395 < \mu < 1.38605$
$1.09 < \mu < 1.39$        Round to the nearest hundredth.

Example 7: A recent survey conducted by a telecommunications company aims to determine the average
length of call (in minutes) per month. The researcher in charge surveyed 25 Filipinos at
random and found that the average length of call they consume per month is 62 with a
standard deviation of 4. Estimate a 90% confidence interval of the average length of call (in
minutes) per month.

Solution:
       Given: C = 90% or 0.90; n = 25; s = 4; $\bar{x} = 62$
       Step 1: Find 𝛼 using the formula 𝛼 = 1 – C.
                      𝛼 = 1 – 0.90 = 0.10
       Step 2: Find $\frac{\alpha}{2}$, the subscript of $t_{\alpha/2}$.
                      $\frac{\alpha}{2} = \frac{0.10}{2} = 0.05$

       With a 90% confidence level, 𝛼 = 0.10 and $\frac{\alpha}{2} = 0.05$. The value of $t_{\alpha/2} = t_{0.05} = 1.711$ using
       n – 1 = 24 degrees of freedom. By substitution, use the formula below:

$\bar{x} - t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) < \mu < \bar{x} + t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$

$62 - 1.711\left(\frac{4}{\sqrt{25}}\right) < \mu < 62 + 1.711\left(\frac{4}{\sqrt{25}}\right)$        Substitute the given values.

$62 - 1.711\left(\frac{4}{5}\right) < \mu < 62 + 1.711\left(\frac{4}{5}\right)$        Extract the square root.

$62 - 1.711(0.8) < \mu < 62 + 1.711(0.8)$

$62 - 1.3688 < \mu < 62 + 1.3688$

$60.6312 < \mu < 63.3688$

$60.63 < \mu < 63.37$        Round to the nearest hundredth.
       This means that, at 90% confidence, the average length of calls made by consumers of the
telecommunications company is estimated to be between 60.63 and 63.37 minutes per month, based on
the selected sample.
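
The t-based interval follows the same pattern as the z-based one. As a minimal sketch, the hypothetical function `t_confidence_interval` below takes the critical value read from a t-table as an argument and reproduces the two worked examples in this section:

```python
from math import sqrt


def t_confidence_interval(sample_mean, s, n, t_value):
    """Return (lower, upper); t_value is t_{alpha/2} read from a t-table
    at n - 1 degrees of freedom."""
    margin = t_value * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Manuscript example: mean 1.24, s = 0.19, n = 15, t_{0.005} = 2.977 (df = 14).
low_a, high_a = t_confidence_interval(1.24, 0.19, 15, 2.977)
print(f"{low_a:.2f} < mu < {high_a:.2f}")

# Call-length example: mean 62, s = 4, n = 25, t_{0.05} = 1.711 (df = 24).
low_b, high_b = t_confidence_interval(62, 4, 25, 1.711)
print(f"{low_b:.2f} < mu < {high_b:.2f}")
```

Passing the table value in, rather than computing it, keeps the sketch aligned with the table-lookup procedure taught in this module.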

Assessment 5.4:

      In most real-life data, the mean provides meaningful information about the
population. So, with the issues of life, the majority is not always right.

Congratulations! You are now ready to proceed to our next module…
