STATPROB Module 7
My beloved students, I understand that the current health situation that we are in right now
brings about challenges to every one of us. It is my hope that despite the difficulties that we
may be experiencing at the moment, you continue to draw out the creativity and productivity in
you.
Note. You are encouraged to review the lessons that we learned from Modules 5
and 6.
Recall from Modules 5 and 6 that, in reality, we do not have
the whole population to work on. Hence, we should make use of a representative
subset of the population (which we refer to as a random sample). Using this random
sample, we will then generate statistics that we will use to make inferences about the
population and/or its parameters. This process is referred to as inferential
statistics and is illustrated in the following discussions.
Note that the sample taken from the population must be a random sample obtained
using one of the sampling techniques discussed in the previous modules. Likewise,
the inferences that we will make are subject to uncertainty which means that we are
not 100% sure of the inferences or conclusions that we make about the population or its
parameters, based on the statistics generated from the random sample.
In other words, there is a chance or likelihood that we will make a wrong inference,
and we will try to measure this likelihood so that we can minimize it.
1. The government would like to know the per capita rice consumption per day of
Filipinos.
Answer: Inferential statistics is applicable, since we cannot record the daily rice consumption
of every Filipino. Instead, we can obtain a probability sample of households and use it to
estimate the per capita consumption.
Answer: Inferential statistics cannot be used here since it is likely that not all voters have
cell phones, and even if everyone did, it is important that the sample represents the
target population, which it does not here, partly also due to non-responses.
In making inferences about the population, we can either provide a value or values for
the parameter or evaluate a statement about the parameter. The process of providing
a value or values for the parameter is generally referred to as estimation.
In this module, we will discuss two ways of estimating a parameter, namely point estimation
and interval estimation, and differentiate one from the other.
KEY POINTS
• A point estimate is a numerical value and it identifies a location or a position
in the distribution of possible values.
• A confidence interval estimate is a range of values where one has a certain
percentage of confidence that the true value will likely fall in.
Lesson 2: Point Estimation of the Population Mean
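Recall:
Sample Mean is the sum of the observations divided by the sample size, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Sample Median is the middle value of the observations when arranged in order (or the average
of the two middle values when n is even)
Sample Mode is the value(s) with the highest frequency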
• With several estimators, we must choose and use the “best” estimator.
• An estimator could be evaluated based on two statistical properties: accuracy and
precision, which are both measures of closeness.
• Accuracy is a measure of closeness of the estimates to the true value while precision is
a measure of closeness of the estimates to each other.
To illustrate, take the bull’s eye in a dart board as the parameter and the ‘hits’ made on the board
as the estimates. There could be a “hit” that is near the bull’s eye or an estimate that is near the
parameter. On the other hand, there could be a “hit” that is far from the bull’s eye or an estimate
that is far from the parameter. As shown in the following figure, Estimate No. 1 is far from the
parameter value while Estimate No. 3 is near the parameter value.
An accurate estimator will have estimates that, on the average, are near the parameter value.
When the estimates on the average are equal to the true value, the estimator is said to be
unbiased. Thus, an unbiased estimator is one whose average value is equal to the
parameter itself. If the average value of the estimates deviates from the
parameter value, then the estimator is said to be biased. Bias can then be measured as the
difference between the average value of the estimator (i.e., the expected value of the sampling
distribution) and the parameter value, or mathematically, we compute bias as:
$$\text{Bias} = E(\hat{\theta}) - \theta$$
where $\hat{\theta}$ denotes the estimator and $\theta$ denotes the parameter.
If, on the average, the estimates are greater than the parameter value or the bias is positive,
then we say the estimator overestimates the parameter. On the other hand, if on the average,
the estimates are less than the parameter value or the bias is negative, then we say that the
estimator underestimates the parameter. When bias is equal to zero, the estimator is
unbiased.
In terms of precision, an estimator is precise if the estimates are close to each other.
Otherwise, the estimator is not precise. A measure of precision of the estimator is its standard
error, which is the square root of the estimator’s variance. The smaller the standard error,
the more precise the estimator is.
An example of an ideal estimator is the sample mean (computed from a simple random sample of size
n), which is an accurate and precise estimator of the population mean. However, we cannot
find such an ideal estimator in all cases. For practical reasons, one can opt to use a biased
and less precise estimator.
For example, a quality control engineer would use the minimum value of the life span of an electric
bulb rather than the sample mean in assessing the quality of a batch of manufactured electric
bulbs. For efficiency and practical reasons, the engineer does not need to wait for a sample
of n bulbs to burn out in order to compute the sample mean. Instead, the engineer could use the life span
of the first bulb to burn out as an estimate of the parameter, which is the true value of the life span of the
batch of manufactured electric bulbs.
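The ideas of bias and standard error can be checked with a small simulation. The sketch below is only an illustration: it assumes a normal population with mean 50 and standard deviation 10 (hypothetical values, not taken from this module), draws many simple random samples of size 20, and compares the sample mean with the sample minimum as estimators of the population mean. The function name `simulate_estimator` is made up for this example.

```python
import random
import statistics

def simulate_estimator(estimator, mu=50.0, sigma=10.0, n=20, trials=5000, seed=1):
    """Approximate the bias and standard error of an estimator by simulation.

    The normal population (mu, sigma) is an assumption made for illustration.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(estimator(sample))
    bias = statistics.fmean(estimates) - mu    # average estimate minus parameter
    std_error = statistics.stdev(estimates)    # spread of the estimates
    return bias, std_error

mean_bias, mean_se = simulate_estimator(statistics.fmean)
min_bias, min_se = simulate_estimator(min)

print(f"sample mean:    bias = {mean_bias:.3f}, standard error = {mean_se:.3f}")
print(f"sample minimum: bias = {min_bias:.3f}, standard error = {min_se:.3f}")
```

Under these assumptions the sample mean should show a bias near zero, while the sample minimum should show a clearly negative bias, that is, it underestimates the population mean.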
The following figures illustrate the different kinds of estimators based on their accuracy and
precision:
An estimator that is accurate but not precise:
[Figures: each panel marks the parameter µ; the images are not reproduced here.]
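An Example.
Consider the following observed weights (in kilograms) of a simple random sample of 20
learners in a class:
40 45 46 48 48 50 55 55 56 58
58 59 60 60 62 62 64 64 65 66
Using the sample mean as the estimator, the point estimate of the population mean is
$$\bar{x} = \frac{1121}{20} = 56.05$$
Thus, we say the average weight of all learners in the class is estimated to be around
56.05 kg based on a simple random sample of 20 observations.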
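As a quick check, the three candidate estimators named in this lesson (sample mean, sample median, and sample mode) can be computed for the 20 observed weights listed above. This is a minimal sketch using Python's standard `statistics` module; the variable names are chosen only for this example.

```python
import statistics

# Observed weights (kg) of the 20 sampled learners
weights = [40, 45, 46, 48, 48, 50, 55, 55, 56, 58,
           58, 59, 60, 60, 62, 62, 64, 64, 65, 66]

sample_mean = statistics.fmean(weights)        # sum of values divided by n
sample_median = statistics.median(weights)     # average of the two middle values (n is even)
sample_modes = statistics.multimode(weights)   # all values tied for the highest frequency

print("mean  :", sample_mean)
print("median:", sample_median)
print("modes :", sample_modes)
```

Note that several weights are tied for the highest frequency in this sample, so the mode does not give a single point estimate here.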
KEY POINTS
• When estimating a parameter (such as a population mean), there are various possible
estimators to use (including the sample mean, sample median, and sample mode).
• What makes an estimator a good estimator? An estimator should have both
accuracy and precision.
o Accuracy is a measure of closeness of the estimates to the true value
o Precision is a measure of closeness of the estimates to each other
• We also prefer the estimator to be unbiased.
o Bias is the difference between the average value of the estimator (i.e. the
expected value of the sampling distribution) and the value of the population
parameter.
Lesson 3: Confidence Interval Estimation of the Population
Mean (Part 1)
Recall:
• A point estimate gives a single value of the parameter while an interval estimate gives a
range of possible values of the parameter. Also, with an attached confidence
coefficient, the interval estimate is referred to as a confidence interval estimate.
To better understand the concept of confidence interval estimates, consider the graph
regarding estimates of age. Each line segment represents an interval estimate of the true
value of your age.
[Figure: interval estimates of age drawn as line segments over an axis running from 0 to 80, with the TRUE VALUE marked.]
• There are line segments that include the true value but there are others that exclude the
true value. If the line segments represent all possible 95% confidence
interval estimates, then 95% of them will contain the true value and only 5% of
them will not contain the true value. Thus, we could say that if we have 1,000
possible 95% confidence interval estimates, 950 of these estimates will contain the
true value and only 50 of the estimates will not have the true value within the interval
estimate. This is another way of interpreting a confidence interval estimate.
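• In general, a confidence interval estimate of a parameter is constructed as
$$\text{Point Estimate} \pm (\text{Tabular Value}) \times (\text{Standard Error of the Point Estimate})$$
where the Tabular Value depends on the sampling distribution of the point estimator.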
• In particular, for the population mean, the point estimator is the sample mean while the
standard error of the sample mean will be used in the computation. With a known
population variance (σ²) and sample size (n), the standard error of the sample mean is
computed as the ratio of the standard deviation (square root of the variance) and the square
root of the sample size, or $\sigma/\sqrt{n}$.
• Also, since the population variance is known, the standardized sample mean
will follow the standard normal distribution or the Z distribution (assuming a normal
population or a sufficiently large sample). This would mean
that the tabular value would come from the Z-distribution table. Usually, we use the
notation $Z_{\alpha/2}$ for the tabular value in the Z-distribution table whose area to its right is equal
to α/2.
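• Thus, a (1-α)100% confidence interval (CI) of the population mean (µ) when the population
variance (σ²) is known is constructed as
$$\bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
or
$$\left(\bar{x} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$$
where $\bar{x}$ is the sample mean computed from a simple random sample of size n.
The quantity $Z_{\alpha/2}\,\sigma/\sqrt{n}$ added to and subtracted from the sample mean is the
maximum allowable deviation, denoted by D.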
• The maximum allowable deviation is a function of three factors: (1) the population standard
deviation, σ; (2) the sample size, n; and (3) the confidence coefficient (1-α)100% through the tabular
value Zα/2. Take note of the following relationships between each of these three factors
and the confidence interval estimator, holding the other factors constant:
1. The larger the variability of the population from which the simple random sample
was drawn (that is, the larger the value of σ), the larger the maximum allowable deviation
and, consequently, the wider the confidence interval estimate.
2. A bigger sample size leads to a smaller maximum allowable deviation and a narrower
confidence interval estimate.
3. A higher confidence coefficient (1-α)100% means a lower value of α and thus a higher
tabular value Zα/2, which leads to a larger maximum allowable deviation and,
consequently, a wider confidence interval estimate.
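The three relationships above can be checked numerically. The sketch below assumes the usual form of the maximum allowable deviation, D = Zα/2 × σ/√n, and varies one factor at a time. The function name `max_allowable_deviation` and the specific values of σ, n, and the confidence coefficient are chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def max_allowable_deviation(sigma, n, confidence):
    """D = Z_{alpha/2} * sigma / sqrt(n), assuming a known population sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # value with area alpha/2 to its right
    return z * sigma / sqrt(n)

base = max_allowable_deviation(sigma=9, n=20, confidence=0.95)
larger_sigma = max_allowable_deviation(sigma=12, n=20, confidence=0.95)
larger_n = max_allowable_deviation(sigma=9, n=80, confidence=0.95)
higher_conf = max_allowable_deviation(sigma=9, n=20, confidence=0.99)

print(f"baseline          D = {base:.2f}")
print(f"larger sigma      D = {larger_sigma:.2f}")   # wider interval
print(f"larger n          D = {larger_n:.2f}")       # narrower interval
print(f"higher confidence D = {higher_conf:.2f}")    # wider interval
```

Because the interval runs from the sample mean minus D to the sample mean plus D, its width is 2D, so a change in D translates directly into a wider or narrower interval.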
• The (1-α)100% confidence interval (CI) of the population mean (µ) can be interpreted as a
probability statement or a confidence statement. It is a probability statement when the
upper and lower limits are still considered random variables, that is, they are not yet fixed.
Otherwise, it is considered a confidence statement. For example, one could say that the
probability that a 95% CI of the population mean will include the population mean in
the interval is equal to 0.95. Mathematically, this is expressed as:
$$P\left(\bar{x} - Z_{0.025}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{0.025}\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
Once we have computed or fixed the lower and upper limits, say the lower limit is 40
and the upper limit is 60, the 95% CI of the population mean becomes a confidence
statement. Thus, we say that we are 95% confident that the true mean value will be
between 40 and 60, and the probability that the true mean value will be between 40
and 60 is either one or zero. The probability is one if the true mean value is indeed
between 40 and 60; otherwise, the probability is zero. Note also that we could
interpret the 95% CI of the population mean in terms of the number of interval
estimates, out of all possible confidence interval estimates, that will contain or include
the population mean. As noted earlier, out of all possible confidence intervals,
95% of them will contain or include the population mean.
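The "95% of all possible intervals" interpretation can be illustrated by simulation. The sketch below is a rough check, not a proof: it assumes a normal population with µ = 50 and a known σ = 9 (hypothetical values), repeatedly draws samples of size 20, builds a 95% interval of the form sample mean ± Zα/2 × σ/√n from each sample, and counts how often the interval contains µ. The function name `coverage` is made up for this example.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu=50.0, sigma=9.0, n=20, confidence=0.95, trials=2000, seed=7):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)          # same for every sample since sigma is known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

observed = coverage()
print(f"proportion of intervals containing the true mean: {observed:.3f}")
```

The observed proportion should be close to 0.95, although it will not equal it exactly because only a finite number of samples is simulated.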
Illustration.
Consider the numerical example used in point estimation of the population mean
where the following observed weights (in kilograms) of a random sample of 20 learners were
used.
40 45 46 48 48 50 55 55 56 58
58 59 60 60 62 62 64 64 65 66
Assuming that the population standard deviation of the weights of all learners in the
class is 9 kg, the 95% confidence interval estimate of the true average weight of the
learners is
$$56.05 \pm 1.96 \times \frac{9}{\sqrt{20}} = 56.05 \pm 3.94$$
or (52.11, 59.99).
Thus, we are 95% confident that the true average weight of all learners in the class is
between 52 kg and 60 kg (rounded off to the nearest integer).
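The interval in this illustration can be reproduced with a few lines of Python. This is a sketch assuming the interval has the form sample mean ± Zα/2 × σ/√n with the stated σ = 9 kg; the variable names are chosen only for this example.

```python
from math import sqrt
from statistics import NormalDist, fmean

weights = [40, 45, 46, 48, 48, 50, 55, 55, 56, 58,
           58, 59, 60, 60, 62, 62, 64, 64, 65, 66]
sigma = 9            # assumed known population standard deviation (kg)
confidence = 0.95

xbar = fmean(weights)
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96
D = z * sigma / sqrt(len(weights))                   # maximum allowable deviation
lower, upper = xbar - D, xbar + D

print(f"{confidence:.0%} CI: ({lower:.2f}, {upper:.2f})")
```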
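Given a maximum allowable deviation D and a confidence coefficient (1-α)100%, the
required sample size in estimating the population mean under a simple random sampling
scheme is computed as
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{D}\right)^2$$
(rounded up to the next higher integer).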
Illustration.
Suppose we want to estimate the true average weight of learners enrolled in a school
using a sample to be drawn using simple random sampling. How large should the sample
be if we want the estimate to be within 2 kg of the true value with
99% confidence? We assume that the population standard deviation of the
weight is 9 kg.
With $Z_{0.005} \approx 2.576$, σ = 9 and D = 2,
$$n = \left(\frac{2.576 \times 9}{2}\right)^2 \approx 134.4$$
which is rounded up to 135.
Thus, we need 135 learners in estimating the true average weight of learners enrolled in this
school under a simple random sampling scheme with 99% confidence and a maximum allowable
deviation of 2 kg.
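The sample size computation can be sketched as a small helper. It assumes the formula n = (Zα/2 × σ / D)², rounded up to the next integer; the function name `required_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, D, confidence):
    """Smallest n for which Z_{alpha/2} * sigma / sqrt(n) does not exceed D."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / D) ** 2)   # always round up, never down

n_needed = required_sample_size(sigma=9, D=2, confidence=0.99)
print("required sample size:", n_needed)
```

Rounding up rather than to the nearest integer guarantees that the resulting deviation is no larger than the D that was asked for.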
In the general expression for a confidence interval (point estimate ± tabular value × standard
error), the tabular value depends on the sampling distribution of the sample mean.
You learned in the previous lesson that, when the population variance is known, the tabular value
is taken from the standard normal distribution.
When the population variance is unknown, there is a slight change in the construction of the
confidence interval, and the changes involve the tabular value and the standard error of the sample
mean.
With an unknown population variance (σ²), it has to be estimated using a simple random sample
of size n. A point estimator of the population variance is the sample variance, denoted as s²
and computed as
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
The square root of the sample variance is the sample standard deviation, denoted as s. This
point estimate of the population standard deviation is used in the computation of the standard
error of the sample mean, which is computed as the ratio of the sample standard deviation and
the square root of the sample size, or mathematically, $s/\sqrt{n}$.
The tabular value to use would come from the Student’s t-distribution table. Usually, we use the
notation $t_{\alpha/2,\,n-1}$ for the tabular value in the Student’s t-distribution with degrees of freedom equal
to n-1. This tabular value is also a point in the distribution whose area to its right is equal to
α/2.
A Student’s t-distribution table (Please see attached table generated using MS Excel®) provides
the area or probability to the right of a given value (t0). The illustration below shows a part of
the table. The first row of the table provides selected probabilities or areas while the first
column provides the degrees of freedom. The intersection of the area and the degrees of
freedom is the needed tabular value.
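Thus, a (1-α)100% confidence interval (CI) of the population mean (µ) when the population
variance (σ²) is unknown is constructed as
$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$
or
$$\left(\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right)$$
where $\bar{x}$ and s are the sample mean and sample standard deviation, respectively. Both are
computed using a simple random sample of size n. The lower limit of the interval is
$\bar{x} - t_{\alpha/2,\,n-1}\,s/\sqrt{n}$ while the upper limit is $\bar{x} + t_{\alpha/2,\,n-1}\,s/\sqrt{n}$.
For this case, the width of the interval estimate is computed as:
$$\left(\bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right) - \left(\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right)$$
or
$$2\,t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$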
Illustration.
Again, consider the numerical example used in point and interval estimation of the
population mean where the following observed weights (in kilograms) of a random sample of
20 learners were used.
40 45 46 48 48 50 55 55 56 58
58 59 60 60 62 62 64 64 65 66
The point estimate of the population mean is the sample mean, 56.05 kg.
This time, we don’t have an assumed value of the population standard deviation of the
weights of all learners in the class. Because of this, we need to use a
point estimate of the population standard deviation. Using the same sample
observations given above, a point estimate of the population standard deviation is
$$s = \sqrt{\frac{1072.95}{19}} \approx 7.515 \text{ kg}$$
With the sample mean and standard deviation, and the tabular value $t_{0.025,\,19} = 2.093$,
the 95% confidence interval estimate of the true average weight of the learners is
$$56.05 \pm 2.093 \times \frac{7.515}{\sqrt{20}} = 56.05 \pm 3.52$$
or (52.53, 59.57).
Thus, we say that we are 95% confident that the true average weight of all learners in
the class is between 53 kg and 60 kg (rounded off to the nearest integer).
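The computation for the unknown-variance case can be sketched as follows. Python's standard library has no Student's t quantile function, so the tabular value 2.093 (area 0.025 to the right, 19 degrees of freedom) is entered by hand as it would be read from a t table; the variable names are chosen only for this example.

```python
from math import sqrt
from statistics import fmean, stdev

weights = [40, 45, 46, 48, 48, 50, 55, 55, 56, 58,
           58, 59, 60, 60, 62, 62, 64, 64, 65, 66]

n = len(weights)
xbar = fmean(weights)
s = stdev(weights)            # sample standard deviation (divisor n - 1)
t_tab = 2.093                 # t(0.025, 19), read from a t table
std_error = s / sqrt(n)       # estimated standard error of the sample mean

lower = xbar - t_tab * std_error
upper = xbar + t_tab * std_error
width = upper - lower         # equals 2 * t_tab * std_error

print(f"s = {s:.3f}, standard error = {std_error:.3f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f}), width = {width:.2f}")
```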
Example. Determine the point estimate for the following.
2. The mean test score for a simple random sample of n = 100 students was x̄ =
67.30. The population standard deviation of test scores is σ = 15.
Example.
The mean test score for a simple random sample of n = 100 students was x̄ =
67.30. The population standard deviation of test scores is σ = 15. Construct a
98% confidence interval for the population mean test score µ.
Solution.
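One way to carry out the computation is sketched below in Python. Since the population standard deviation is given, the sketch assumes the known-variance form of the interval, sample mean ± Zα/2 × σ/√n, with α = 0.02; the variable names are chosen only for this example.

```python
from math import sqrt
from statistics import NormalDist

n = 100
xbar = 67.30
sigma = 15
confidence = 0.98

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # Z_{0.01}, about 2.326
D = z * sigma / sqrt(n)                              # maximum allowable deviation
lower, upper = xbar - D, xbar + D

print(f"Z = {z:.3f}, D = {D:.2f}")
print(f"98% CI for the mean test score: ({lower:.2f}, {upper:.2f})")
```

Under this assumption the interval comes out to roughly (63.81, 70.79), so we would say we are 98% confident that the population mean test score lies between about 63.8 and 70.8.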