
STATPROB Module 7

Uploaded by

SAKE Tv
Copyright
© © All Rights Reserved

COURSE CODE SP

COURSE TITLE STATISTICS AND PROBABILITY

SEMESTER 2 SCHOOL YEAR 2020-2021

PERIOD 05-10 April 2021 MODULE NO. 7


INTRODUCTION

My beloved students, I understand that the current health situation that we are in right now
brings about challenges to every one of us. It is my hope that despite the difficulties that we
may be experiencing at the moment, you continue to draw out the creativity and productivity in
you.

Know that this, too, shall pass.

Please stay healthy and stay safe!

CHAPTER 4: ON ESTIMATION OF PARAMETERS

Lesson 1: Concepts of Point and Interval Estimation

At the end of the lesson, the learner should be able to:

a. Classify a decision process as one that makes use of inferential statistics, or not.
b. Illustrate point and interval estimation.
c. Differentiate point from interval estimation.

Note. You are encouraged to review the lessons that we learned from Modules 5
and 6.

Recall from Modules 5 and 6 that, in reality, we do not have
the whole population to work on. Hence, we make use of a representative
subset of the population (which we refer to as a random sample). Using this random
sample, we then generate statistics that we use to make inferences about the
population and/or its parameters. This process is referred to as inferential
statistics and is illustrated in the following discussions.

Note that the sample taken from the population must be a random sample obtained
using one of the sampling techniques discussed in the previous modules. Likewise,
the inferences that we will make are subject to uncertainty which means that we are
not 100% sure of the inferences or conclusions that we make about the population or its
parameters, based on the statistics generated from the random sample.

In other words, there is a chance that we will make a wrong inference. We try to
measure this chance so that we can keep it as small as possible.

Example. For each of the following situations, determine whether inferential
statistics is applicable or not.

1. The government would like to know the per capita rice consumption per day of
Filipinos.

Answer: Inferential statistics is applicable, as you cannot take the daily rice consumption
of every Filipino. Instead, the consumption can be estimated by obtaining a probability
sample of households.

2. The effectiveness of a newly developed cure for cancer.

Answer: Inferential statistics is NOT applicable, as medical research makes use of
volunteers and not a random sample of cancer patients to test the effectiveness of a
newly developed cure for the disease.

3. A presidential candidate decides to take a survey through text messaging to
determine the proportion of voters who are likely to vote for him/her.

Answer: Inferential statistics cannot be used here since it is likely that not all voters have
cell phones. Even if every voter had one, the sample must represent the
target population, which it does not here, partly because of non-responses.

The Concept of Estimation

In making inferences about the population, we can either provide a value or values for
the parameter or evaluate a statement about the parameter. The process of providing
a value or values for the parameter is generally referred to as estimation.

In this module, we will discuss two kinds of estimation, namely point and interval
estimation, and differentiate one from the other.

KEY POINTS
• A point estimate is a numerical value and it identifies a location or a position
in the distribution of possible values.
• A confidence interval estimate is a range of values where one has a certain
percentage of confidence that the true value will likely fall in.

Lesson 2: Point Estimation of the Population Mean

At the end of the lesson, the learner should be able to:


• Identify possible point estimators of the population mean
• Discuss characteristics of a “good” estimator
• Appraise why the sample mean is the “best” estimator of the population
mean
• Compute for a point estimate of the population mean

Recall:

• A parameter is a characteristic of the population which is usually
unknown and needs to be estimated.
• A statistic is computed from a random sample and hence, it is
known and is used to estimate the unknown parameter.
• Recall that there are two types of estimation: point and interval
estimation.

In estimating a parameter, the mathematical expression or formula we
use in coming up with the estimate is referred to as the estimator, while the
estimate is the numerical value that you arrive at when you apply the estimator
to the sample data.

• There are several estimators for a parameter.


• For a population mean, usually represented by the Greek letter µ, the following are
possible estimators that make use of sample data obtained using a simple random
sampling scheme.

Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Sample Median: $\tilde{x} = x_{((n+1)/2)}$ when n is odd, and $\tilde{x} = \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right)$ when n is even,

where $x_{(i)}$ is the ith observation in an array, that is, when
the observations are arranged in increasing or
decreasing order.

Sample Mode: the value(s) with the highest frequency

• With several estimators, we must choose and use the “best” estimator.
• An estimator can be evaluated based on two statistical properties: accuracy and
precision, which are both measures of closeness.
• Accuracy is a measure of closeness of the estimates to the true value while precision is
a measure of closeness of the estimates to each other.

To illustrate, take the bull’s eye in a dart board as the parameter and the “hits” made on the board
as the estimates. There could be a “hit” that is near the bull’s eye, or an estimate that is near the
parameter. On the other hand, there could be a “hit” that is far from the bull’s eye, or an estimate
that is far from the parameter. As shown in the following figure, Estimate No. 1 is far from the
parameter value while Estimate No. 3 is near the parameter value.

An accurate estimator will have estimates that, on the average, are near the parameter value.
When the estimates on the average are equal to the true value, the estimator is said to be
unbiased. Thus, an unbiased estimator is one whose average value is equal to the
parameter itself. If the average value of the estimates deviates from the
parameter value, then the estimator is said to be biased. Bias can then be measured as the
difference between the average value of the estimator (i.e. the expected value of the sampling
distribution) and the parameter value or, mathematically, we compute bias as:

Bias (Estimator, Parameter) = Expected value of the Estimator – Parameter Value

If, on the average, the estimates are greater than the parameter value or the bias is positive,
then we say the estimator overestimates the parameter. On the other hand, if on the average,
the estimates are less than the parameter value or the bias is negative, then we say that the
estimator underestimates the parameter. When bias is equal to zero, the estimator is
unbiased.

In terms of precision, an estimator is precise if the estimates are close to each other.
Otherwise, the estimator is not precise. A measure of precision of the estimator is its standard
error, which is the square root of the estimator’s variance. The smaller the standard error,
the more precise the estimator is.

Ideally, we choose an estimator that is both accurate and precise.

An example of this is the sample mean (computed using a simple random sample of size
n), which is an accurate and precise estimator of the population mean. However, such an
ideal estimator cannot be found in all cases. For practical reasons, one can opt to use a biased
and less precise estimator.

For example, a quality control engineer would use the minimum value of the life span of an electric
bulb rather than the sample mean in assessing the quality of a batch of manufactured electric
bulbs. For efficiency and practical reasons, the engineer does not need to wait for a sample
of n bulbs to burn out in order to compute the sample mean. Instead, he could use the life span
of the first bulb to burn out as an estimate of the parameter, which is the true value of the life span of the
batch of manufactured electric bulbs.
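
The ideas of bias and standard error discussed above can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be normal with mean 50 and standard deviation 10, the sample size is 20, and the function name `simulate_estimators` is hypothetical. None of these values come from the module. For each estimator, it reports the average estimate minus the parameter (bias) and the spread of the estimates (standard error).

```python
import random
import statistics

def simulate_estimators(mu=50.0, sigma=10.0, n=20, reps=5000, seed=1):
    """Draw `reps` simple random samples and summarize three estimators of mu.

    The normal population and all default values are assumptions made
    for illustration only.
    """
    rng = random.Random(seed)
    estimates = {"mean": [], "median": [], "minimum": []}
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates["mean"].append(statistics.mean(sample))
        estimates["median"].append(statistics.median(sample))
        estimates["minimum"].append(min(sample))

    summary = {}
    for name, values in estimates.items():
        bias = statistics.mean(values) - mu    # average estimate minus parameter
        std_error = statistics.stdev(values)   # spread of the estimates
        summary[name] = (bias, std_error)
    return summary

summary = simulate_estimators()
for name, (bias, std_error) in summary.items():
    print(f"{name:8s} bias = {bias:8.3f}   standard error = {std_error:6.3f}")
```

Under these assumptions, the sample mean and sample median should both show a bias near zero, with the mean having the smaller standard error, while the sample minimum should show a clearly negative bias (it underestimates the population mean).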

The following figures illustrate the different kinds of estimators based on their accuracy and
precision (figures not reproduced here; each marks the parameter µ):

• An estimator that is both accurate and precise
• An estimator that is accurate but not precise
• An estimator that is precise but not accurate
• An estimator that is neither accurate nor precise

As mentioned before, the sample mean, computed as the arithmetic mean of
observations obtained using a simple random sample, is an accurate and precise estimator
of the population mean. It is a “good” estimator of the population mean. That is why it
is usually referred to as the Best Linear Unbiased Estimator (BLUE) of the population
mean. It is “best” since it is the most precise: it has the smallest variance among all
linear unbiased estimators of the population mean. It is linear since it is a linear function
of the observations, and it is accurate as it is an unbiased estimator.
An Example.

Consider the following observed weights (in kilograms) of a random sample of 20
learners and use them to estimate the true value of the average weight of learners enrolled
in the class.

40 45 46 48 48 50 55 55 56 58
58 59 60 60 62 62 64 64 65 66

The sample mean is computed as:

$\bar{x} = \frac{1}{20}\sum_{i=1}^{20} x_i = \frac{1121}{20} = 56.05$ kg.

Thus, we say the average weight of all learners in the class is estimated to be around
56.05 kg based on a simple random sample of 20 observations.
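
The three point estimators named in this lesson can be computed for the sample above with Python's standard `statistics` module. This is a minimal sketch. The variable names are our own, and `multimode` is used because several values are tied for the highest frequency.

```python
import statistics

# Observed weights (kg) of the 20 sampled learners, as listed in the example.
weights = [40, 45, 46, 48, 48, 50, 55, 55, 56, 58,
           58, 59, 60, 60, 62, 62, 64, 64, 65, 66]

sample_mean = statistics.mean(weights)        # sum of observations divided by n
sample_median = statistics.median(weights)    # middle of the ordered array
sample_modes = statistics.multimode(weights)  # all values with the highest frequency

print(sample_mean)    # 56.05
print(sample_median)  # 58.0
print(sample_modes)   # [48, 55, 58, 60, 62, 64]
```

Note that the three estimators give different estimates from the same data, which is why a criterion for choosing among them is needed.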

KEY POINTS

• When estimating a parameter (such as a population mean), there are various possible
estimators to use (including the sample mean, sample median, and sample mode).
• What makes an estimator a good estimator? An estimator should have both
accuracy and precision.
o Accuracy is a measure of closeness of the estimates to the true value
o Precision is a measure of closeness of the estimates to each other
• We also prefer the estimator to be unbiased.
o Bias is the difference between the average value of the estimator (i.e. the
expected value of the sampling distribution) and the value of the population
parameter.

Lesson 3: Confidence Interval Estimation of the Population
Mean (Part 1)

At the end of the lesson, the learner should be able to:


• Assess accuracy of confidence interval estimates through its width
• Interpret confidence interval estimates
• Construct a (1-α)100% confidence interval estimator of the population mean when
the population variance is known
• Determine the required sample size in estimating the population mean under the
simple random sampling scheme

Recall:

• A point estimate gives a single value for the parameter while an interval estimate gives a
range of possible values of the parameter. Also, with an attached confidence
coefficient, the interval estimate is referred to as a confidence interval estimate.

To better understand the concept of confidence interval estimates, consider the graph
regarding estimates of age. Each line segment represents an interval estimate of the true
value of your age.

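[Figure: interval estimates of age drawn as line segments above a horizontal axis running from 0 to 80, with the TRUE VALUE marked between 40 and 60.]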

• The 95% confidence interval estimates, represented by the line segments,
are of different widths. Some are short and some are long. The width of the
interval estimate represents the accuracy of the estimate. The narrower the interval or
the shorter the segment is, the more accurate the interval estimate is.

• There are line segments that include the true value but there are others that exclude the
true value. If all the line segments represent all possible 95% confidence
interval estimates, then 95% of them will contain the true value and only 5% of
them will not contain the true value. Thus, we could say that if we have 1,000
possible 95% confidence interval estimates, 950 of these estimates will contain the
true value and only 50 of the estimates will not have the true value within the interval
estimate. This is another way of interpreting a confidence interval estimate.
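
This "share of intervals that contain the true value" interpretation can be checked by simulation. The sketch below is illustrative only: it assumes a normal population with mean 50 and standard deviation 10 (values not taken from the module), and it uses the known-variance interval (sample mean plus or minus the tabular Z value times the standard error) that is constructed later in this lesson. The function name `coverage` is hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

def coverage(mu=50.0, sigma=10.0, n=25, level=0.95, reps=2000, seed=7):
    """Fraction of simulated intervals x_bar +/- z*sigma/sqrt(n) that contain mu.

    The population values and sample size are assumptions for illustration.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # tabular value z_(alpha/2)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        x_bar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - margin < mu < x_bar + margin:
            hits += 1
    return hits / reps

rate = coverage()
print(f"share of intervals containing the true mean: {rate:.3f}")
```

With many repetitions, the printed share should be close to 0.95, although it will rarely be exactly 0.95 because only a finite number of intervals is simulated.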

In general, an interval estimator is constructed as follows:

Point Estimator ± (Tabular Value × Standard Error of the Point Estimator)

where the Tabular Value depends on the sampling distribution of the point estimator.

• In particular, for the population mean, the point estimator is the sample mean while the
standard error of the sample mean will be used in the computation. With a known
population variance (σ²) and sample size (n), the standard error of the sample mean is
computed as the ratio of the standard deviation (square root of the variance) and the square
root of the sample size or, mathematically, $\frac{\sigma}{\sqrt{n}}$.

• Also, since the population variance is known, the standardized sample mean,
$(\bar{x} - \mu)/(\sigma/\sqrt{n})$, follows the standard normal distribution or the Z distribution
(exactly when the population is normal, approximately for large samples). This means
that the tabular value comes from the Z-distribution table. Usually, we use the
notation Zα/2 for the tabular value in the Z-distribution table whose area to its right is equal
to α/2.

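• Thus, a (1-α)100% confidence interval (CI) of the population mean (µ) when the population
variance (σ²) is known is constructed as

$\bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$

or

$\left(\bar{x} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$

where $\bar{x}$ is the sample mean computed from a simple random sample of size n.

The lower limit of the interval is $\bar{x} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$ while the upper limit is $\bar{x} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$.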

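• The width of the interval estimate is the difference between the upper limit and
the lower limit of the interval estimate. Expressing it mathematically, we have:

$\left(\bar{x} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) - \left(\bar{x} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$

This leads to $2\,Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$, where $Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$ is usually referred to as the
maximum allowable deviation, denoted by D.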

• The maximum allowable deviation is a function of three factors: (1) the population standard
deviation, σ; (2) the sample size, n; and (3) the confidence coefficient (1-α)100% through the tabular
value Zα/2. Take note of the following relationships between each of these three factors
and the confidence interval estimator, holding the other factors constant:

1. The larger the variability of the population from which the simple random sample
was drawn, that is, the larger the value of σ, the larger the maximum allowable deviation
and, consequently, the wider the confidence interval estimate.
2. A bigger sample size leads to a smaller maximum allowable deviation and a narrower
confidence interval estimate.
3. A higher confidence coefficient (1-α)100% means a lower value of α and thus a higher
tabular value Zα/2, which leads to a larger maximum allowable deviation and,
consequently, a wider confidence interval estimate.
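
The three relationships above can be seen numerically by changing one factor at a time. The following sketch uses a hypothetical helper, `max_allowable_deviation`, and made-up comparison values (σ of 9 versus 12, n of 20 versus 80, 95% versus 99%). The tabular value is obtained from Python's `statistics.NormalDist` instead of a printed Z table.

```python
import math
from statistics import NormalDist

def max_allowable_deviation(sigma, n, level):
    """D = z_(alpha/2) * sigma / sqrt(n) for a confidence coefficient `level` (e.g. 0.95)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sigma / math.sqrt(n)

# Change one factor at a time, holding the other two constant.
base = max_allowable_deviation(sigma=9, n=20, level=0.95)
larger_sigma = max_allowable_deviation(sigma=12, n=20, level=0.95)
larger_n = max_allowable_deviation(sigma=9, n=80, level=0.95)
higher_level = max_allowable_deviation(sigma=9, n=20, level=0.99)

print(f"baseline                 D = {base:.2f}")
print(f"larger sigma (12)        D = {larger_sigma:.2f}")
print(f"larger sample (n = 80)   D = {larger_n:.2f}")
print(f"higher confidence (99%)  D = {higher_level:.2f}")
```

Because D depends on the square root of n, quadrupling the sample size (20 to 80) only halves the maximum allowable deviation.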

• The (1-α)100% confidence interval (CI) of the population mean (µ) can be interpreted as a
probability statement or a confidence statement. It is a probability statement when the
upper and lower limits are still considered random variables, that is, they are not yet fixed.
Otherwise, it is considered a confidence statement. For example, one could say that the
probability that a 95% CI of the population mean will include the population mean in
the interval is equal to 0.95. Mathematically, this is expressed as:

$P\left(\bar{x} - Z_{0.025}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{0.025}\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$

Once we have computed or fixed the lower and upper limits, say the lower limit is 40
and the upper limit is 60, the 95% CI of the population mean becomes a confidence
statement. Thus, we say that we are 95% confident that the true mean value will be
between 40 and 60, and the probability that the true mean value will be between 40
and 60 is either one or zero. The probability is one if the true mean value is indeed
between 40 and 60, and if otherwise, the probability is zero. Note also that we could
interpret the 95% CI of the population mean, in terms of the number of interval
estimates out of all possible confidence interval estimates that will contain or include
the population mean. Like what was said earlier, out of all possible confidence intervals,
95% of them will contain or include the population mean.

Illustration.
Consider the numerical example used in point estimation of the population mean
where the following observed weights (in kilograms) of a random sample of 20 learners were
used.

40 45 46 48 48 50 55 55 56 58
58 59 60 60 62 62 64 64 65 66

The sample mean is computed as:

$\bar{x} = \frac{1121}{20} = 56.05$ kg.

Assuming that the population standard deviation of the weights of all learners in the
class is 9 kg, the 95% confidence interval estimate of the true average weight of the
learners is

$56.05 \pm 1.96\left(\frac{9}{\sqrt{20}}\right) = 56.05 \pm 3.94$, or (52.11, 59.99).

Thus, we are 95% confident that the true average weight of all learners in the class is
between 52 kg and 60 kg (rounded off to the nearest integer).
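
The computation in this illustration can be retraced in a few lines of Python. This is a sketch, not part of the original module: the function name `z_interval` is our own, and the tabular value is taken from `statistics.NormalDist` (about 1.96 for 95% confidence) instead of a printed table.

```python
import math
from statistics import NormalDist, mean

weights = [40, 45, 46, 48, 48, 50, 55, 55, 56, 58,
           58, 59, 60, 60, 62, 62, 64, 64, 65, 66]
sigma = 9  # population standard deviation assumed in the illustration (kg)

def z_interval(data, sigma, level=0.95):
    """Confidence interval for the mean when the population SD is known."""
    x_bar = mean(data)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # tabular value z_(alpha/2)
    margin = z * sigma / math.sqrt(len(data))       # maximum allowable deviation D
    return x_bar - margin, x_bar + margin

lower, upper = z_interval(weights, sigma)
print(f"95% CI: {lower:.2f} kg to {upper:.2f} kg")
```

Rounding the two limits to the nearest integer gives the 52 kg and 60 kg quoted in the text.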

• Using the expression for the maximum allowable deviation, $D = Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$, the
required sample size in estimating the population mean under a simple random sampling
scheme is computed as $n = \left(\frac{Z_{\alpha/2}\,\sigma}{D}\right)^2$ (rounded up to the next higher integer).

Hence, greater variability of the population, a larger confidence coefficient, and a smaller
maximum allowable deviation require a larger sample size.

Illustration.

Suppose we want to estimate the true average weight of learners enrolled in a school
using a sample to be drawn using simple random sampling. How large should the sample
be if we want the estimate to be within 2 kg of the true value and we want to be
99% confident of our estimate? We could assume that the population standard deviation of the
weight is 9 kg.

With $Z_{0.005} = 2.576$, σ = 9 and D = 2:

$n = \left(\frac{2.576 \times 9}{2}\right)^2 = (11.592)^2 \approx 134.37$, which is rounded up to 135.

Thus, we need 135 learners in estimating the true average weight of learners enrolled in this
school under a simple random sampling scheme with 99% confidence and a maximum allowable
deviation of 2 kg.
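
The sample size computation can be written as a short function. This is a minimal sketch with a hypothetical function name, `required_sample_size`. It takes the tabular value from `statistics.NormalDist` and always rounds up, as the text requires.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, max_deviation, level):
    """Smallest n for which z_(alpha/2) * sigma / sqrt(n) <= max_deviation."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil((z * sigma / max_deviation) ** 2)  # round up, never down

n_needed = required_sample_size(sigma=9, max_deviation=2, level=0.99)
print(n_needed)  # 135
```

Rounding up rather than to the nearest integer guarantees that the maximum allowable deviation is not exceeded.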

Lesson 4: Confidence Interval Estimation of the Population
Mean (Part 2)

At the end of the lesson, the learner should be able to:
• Construct a (1-α)100% confidence interval estimator of the population mean when the
population variance is unknown
• Use the Student’s t distribution table in getting a tabular value
• Construct a (1-α)100% confidence interval estimator of the population mean when the
population variance is unknown and sample size is large enough to invoke the Central Limit
Theorem
• Interpret confidence interval estimates

First, recall how to construct an interval estimator.

Point Estimator ± (Tabular Value × Standard Error of the Point Estimator)

In this expression, the tabular value depends on the sampling distribution of the sample mean.
You learned in the previous lecture that the tabular value to use in the mathematical expression
when the population variance is known is to be taken from the standard normal distribution.

When the population variance is unknown, there is a slight change in the construction of the
confidence interval and the changes involve the tabular value and the standard error of the sample
mean.

A. Construction and interpretation of a (1-α)100% confidence interval estimator of the
population mean when the population variance is unknown

With an unknown population variance (σ²), it has to be estimated using a simple random sample
of size n. A point estimator of the population variance is the sample variance, denoted as s²
and computed as

$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$

The square root of the sample variance is the sample standard deviation, denoted as s. This
point estimate of the population standard deviation is used in the computation of the standard
error of the sample mean, which is computed as the ratio of the sample standard deviation and
the square root of the sample size or, mathematically, $\frac{s}{\sqrt{n}}$.

B. Use of the Student’s t distribution table in getting a tabular value

The tabular value to use comes from the Student’s t-distribution table. Usually, we use the
notation t(α/2, n-1) for a tabular value in the Student’s t-distribution with degrees of freedom equal
to n-1. Such a tabular value is also a point in the distribution whose area to its right is equal to
α/2.

A Student’s t-distribution table (Please see attached table generated using MS Excel®) provides
the area or probability to the right of a given value (t0). The illustration below shows a part of
the table. The first row of the table provides selected probabilities or areas while the first
column provides the degrees of freedom. The intersection of the area and the degrees of
freedom is the needed tabular value.

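Thus, a (1-α)100% confidence interval (CI) of the population mean (µ) when the population
variance (σ²) is unknown is constructed as

$\bar{x} \pm t_{(\alpha/2,\,n-1)}\,\frac{s}{\sqrt{n}}$

or

$\left(\bar{x} - t_{(\alpha/2,\,n-1)}\,\frac{s}{\sqrt{n}},\ \bar{x} + t_{(\alpha/2,\,n-1)}\,\frac{s}{\sqrt{n}}\right)$

where $\bar{x}$ and s are the sample mean and sample standard deviation, respectively. Both are
computed using a simple random sample of size n. The lower limit of the interval is
$\bar{x} - t_{(\alpha/2,\,n-1)}\,\frac{s}{\sqrt{n}}$ while the upper limit is $\bar{x} + t_{(\alpha/2,\,n-1)}\,\frac{s}{\sqrt{n}}$.

For this case, the width of the interval estimate is computed as:

$2\,t_{(\alpha/2,\,n-1)}\,\frac{s}{\sqrt{n}}$

and the maximum allowable deviation is

$D = t_{(\alpha/2,\,n-1)}\,\frac{s}{\sqrt{n}}$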

C. Construction and interpretation of a (1-α)100% confidence interval estimator of the
population mean when the population variance is unknown and sample size is large
enough to invoke the Central Limit Theorem

A property of the Student’s t distribution is that it approaches the standard normal
distribution as its degrees of freedom increase. Since the degrees of freedom that we are
concerned with at the moment depend on the sample size n, we can say that as n
increases, the Student’s t distribution approaches the standard normal distribution. This is
also consistent with the Central Limit Theorem, discussed in the previous chapter. With
these concepts, the tabular value to be used in the construction of the confidence interval
for the population mean when the sample size is at least 30 is taken from the
Z-distribution table. Thus, the following expression is used in constructing a (1-α)100%
confidence interval (CI) of the population mean (µ) when the population variance (σ²) is unknown
and the sample size is at least 30:

$\bar{x} \pm Z_{\alpha/2}\,\frac{s}{\sqrt{n}}$

or

$\left(\bar{x} - Z_{\alpha/2}\,\frac{s}{\sqrt{n}},\ \bar{x} + Z_{\alpha/2}\,\frac{s}{\sqrt{n}}\right)$

For this case, the width of the interval estimate is computed as:

$2\,Z_{\alpha/2}\,\frac{s}{\sqrt{n}}$

and the maximum allowable deviation is

$D = Z_{\alpha/2}\,\frac{s}{\sqrt{n}}$
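
The claim that the Student's t distribution approaches the standard normal distribution can be seen by comparing tabular values. In the sketch below, the t values for a right-tail area of 0.025 are typed in from a standard Student's t table (Python's standard library has no t distribution), and the dictionary name `t_table_025` is our own. The Z value comes from `statistics.NormalDist`.

```python
from statistics import NormalDist

# Tabular values t_(0.025, df), copied from a standard Student's t table.
t_table_025 = {5: 2.571, 10: 2.228, 19: 2.093, 29: 2.045, 60: 2.000, 120: 1.980}

z_025 = NormalDist().inv_cdf(0.975)  # tabular value Z_0.025, about 1.960

for df, t_value in t_table_025.items():
    print(f"df = {df:3d}   t = {t_value:.3f}   t - z = {t_value - z_025:.3f}")
```

The difference between the t and Z tabular values shrinks as the degrees of freedom grow. By around 29 degrees of freedom (n = 30) it is already below 0.1, which is the practical basis for switching to the Z table for large samples.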

Illustration.

Again, consider the numerical example used in point and interval estimation of the
population mean where the following observed weights (in kilograms) of a random sample of
20 learners were used.

40 45 46 48 48 50 55 55 56 58
58 59 60 60 62 62 64 64 65 66

The sample mean is computed as:

$\bar{x} = \frac{1121}{20} = 56.05$ kg.

This time, we don’t have an assumed value of the population standard deviation of the
weights of all learners in the class. Because of this, we need to use a
point estimate of the population standard deviation. Using the same sample
observations given above, a point estimate of the population standard deviation is

$s = \sqrt{\frac{\sum_{i=1}^{20}(x_i - 56.05)^2}{19}} = \sqrt{\frac{1072.95}{19}} \approx 7.515$ kg.

With the sample mean and standard deviation, and the tabular value $t_{(0.025,\,19)} = 2.093$,
the 95% confidence interval estimate of the true average weight of the learners is

$56.05 \pm 2.093\left(\frac{7.515}{\sqrt{20}}\right) = 56.05 \pm 3.52$, or (52.53, 59.57).

Thus, we say that we are 95% confident that the true average weight of all learners in
the class is between 53 kg and 60 kg (rounded off to the nearest integer).
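
The steps of this illustration can be retraced in Python as a check on the arithmetic. This is a sketch under one stated assumption: the tabular value 2.093 for 19 degrees of freedom and a right-tail area of 0.025 is typed in from a Student's t table, since the standard library does not provide it. `statistics.stdev` uses the divisor n - 1, matching the sample variance formula in this lesson.

```python
import math
from statistics import mean, stdev

weights = [40, 45, 46, 48, 48, 50, 55, 55, 56, 58,
           58, 59, 60, 60, 62, 62, 64, 64, 65, 66]
t_value = 2.093  # t_(0.025, 19), read from a Student's t table

x_bar = mean(weights)
s = stdev(weights)                       # sample standard deviation (divisor n - 1)
std_error = s / math.sqrt(len(weights))  # standard error of the sample mean
lower = x_bar - t_value * std_error
upper = x_bar + t_value * std_error

print(f"s = {s:.3f} kg")
print(f"95% CI: {lower:.2f} kg to {upper:.2f} kg")
```

Comparing this interval with the one obtained when σ = 9 kg was assumed shows how the limits change once the standard deviation is estimated from the sample and the t table is used.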

Example. Determine the point estimate for the following.

1. An IQ test was given to a simple random sample of 75 students at a certain
college. The sample mean score was 105.2. Scores on this test are known to
have a standard deviation of 𝜎 = 10. It is desired to construct a 90% confidence
interval for the mean IQ score of students at this college.

The point estimate of 𝜇 is 𝑥̅ = 105.2, which is the sample mean score.

2. The mean test score for a simple random sample of n = 100 students was 𝑥̅ =
67.30. The population standard deviation of test scores is 𝜎 = 15.

The point estimate is the sample mean 𝑥̅ = 67.30.

3. The lifetime of a certain type of battery is known to be normally distributed with
standard deviation 𝜎 = 20 hours. A sample of 50 batteries had a mean lifetime
of 120.1 hours. It is desired to construct a 95% confidence interval for the mean
lifetime for this type of battery.

The point estimate is the sample mean 𝑥̅ = 120.1.

Example.

The mean test score for a simple random sample of n=100 students was 𝑥̅ =
67.30. The population standard deviation of test scores is 𝜎 = 15. Construct a
98% confidence interval for the population mean test score 𝜇.

Solution.

First, we check the assumptions.
The sample is a simple random sample, and the sample size is large (n > 30).
The assumptions are met so we may proceed.

1. Find the point estimate.
𝑥̅ = 67.30
2. Find the critical value 𝑧𝛼/2.
For 98% confidence, 𝛼 = 0.02, so 𝑧𝛼/2 = 𝑧0.01 = 2.326
3. Find the standard error and the margin of error.
Standard error = 𝜎/√n = 15/√100 = 1.5
Margin of error = 2.326 (1.5) = 3.489
4. Construct the confidence interval.
67.30 − 3.489 < 𝜇 < 67.30 + 3.489, or 63.81 < 𝜇 < 70.79
5. Interpret the results.
We are 98% confident that the population mean score 𝜇 is between
63.81 and 70.79.
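
The five steps of this solution map directly onto a few lines of Python. The sketch below is illustrative, with variable names of our own choosing. It looks up the critical value with `statistics.NormalDist` rather than a printed table, so intermediate values may differ slightly from the hand computation in the last decimal place.

```python
import math
from statistics import NormalDist

n, x_bar, sigma, level = 100, 67.30, 15, 0.98

# Step 1: the point estimate is the sample mean x_bar.
# Step 2: critical value z_(alpha/2), the point with area alpha/2 to its right.
alpha = 1 - level
z = NormalDist().inv_cdf(1 - alpha / 2)

# Step 3: standard error and margin of error.
std_error = sigma / math.sqrt(n)
margin = z * std_error

# Step 4: confidence interval limits.
lower, upper = x_bar - margin, x_bar + margin

print(f"z = {z:.3f}, standard error = {std_error:.2f}, margin = {margin:.3f}")
print(f"{lower:.2f} < mu < {upper:.2f}")
```

Changing `level` to 0.90 or 0.95 reproduces the critical values needed for the other two problems in this module's point-estimate example.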
