Unit 6a Point and Interval Estimation


UNIVERSITY OF NAMIBIA

Unit 6: INFERENTIAL BIOSTATISTICS – POINT AND INTERVAL ESTIMATION

Unit objectives
By the end of the unit students will be able to:
1. Find point estimates for a single population mean, single population variance and single
population proportion
2. Find confidence intervals of single population mean and proportion
3. Find confidence intervals for the difference of two population means
4. Determine the appropriate sample size to estimate the mean and proportion

Introduction – The Central Limit Theorem


The sample statistic gives an estimate of the population parameter (e.g. $\bar{x}$ estimates µ). Small
samples are less reliable than large ones. As discussed in the last unit, each sample gives a
slightly different value of the sample mean. Variation between individual estimates is due to
sampling error, i.e. it reflects the random scatter inherent in any sample. For a number of small
samples drawn from the same population, some statistics overestimate the parameter and some
underestimate it.

The way in which sample statistics cluster around a population parameter is called the
distribution of the sampling statistic or the sampling distribution. These conform to
mathematical principles.

One of these mathematical principles is the Central Limit Theorem, which states that:
1. The means of a large number of samples drawn randomly from the same population are
approximately normally distributed, and the 'mean of means' is the mean of the population. This
means that no matter what the underlying distribution of the population from which the
samples are drawn (whether symmetrical, skewed or bimodal, discrete or continuous), the
means of a large number of sufficiently large samples are approximately normally distributed
around the population mean µ.
2. Furthermore, the normal distribution of a large number of sample means has its own
standard deviation: the standard deviation of the sample means, also called the standard
error of the mean.
3. All the properties of the normal curve hold true for the normal distribution of sample means;
for example, about 95% of a large number of sample means fall within ±1.96 (roughly 2) SE of the
population mean µ.
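
The behaviour described by the theorem can be checked with a small simulation. The sketch below is illustrative only: the exponential population (mean 1, standard deviation 1), the sample size of 50, the 2000 repetitions and the random seed are all assumptions chosen for the demonstration and do not come from this unit.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 50                   # size of each sample (assumed)
n_samples = 2000         # number of samples drawn (assumed)

# Skewed population: exponential with mean 1 and standard deviation 1.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(n_samples)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)   # empirical standard error
theoretical_se = 1.0 / n ** 0.5                # sigma / sqrt(n)

# Share of sample means lying within 1.96 SE of the population mean.
within = sum(abs(m - 1.0) <= 1.96 * sd_of_means for m in sample_means) / n_samples

print(mean_of_means, sd_of_means, theoretical_se, within)
```

Even though the population is strongly skewed, the mean of the sample means should land close to 1, their standard deviation close to σ/√n, and roughly 95% of them within 1.96 SE of the population mean.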

Point estimation
In point estimation, a single statistic, computed from the sample is used to estimate the
population parameter. It is called a point estimate because only a single numerical value, that is,
a single point on the real line is calculated to estimate the population parameter of interest.
Sample statistics used to estimate population parameters are called estimators.

A good point estimator should be an unbiased estimator of the population parameter.

Definition of terms
Definition 1: An estimator is a sample statistic used to estimate a population parameter.

Definition 2: A statistic is called an unbiased estimator of the population parameter if the
average value of the statistic, computed over repeated samples, equals the parameter.

Definition 3: A point estimate is a single number that is used to estimate an unknown parameter.

Point estimators of the population mean


A point estimator of the population mean, μ, is the sample statistic $\bar{X}$, computed as:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
where $n$ is the sample size and $X_1, X_2, \ldots, X_n$ are the sample observations drawn from a population.

Point estimator for population variance


The unbiased estimator of the population variance $\sigma^2$ is the sample variance $s^2$, whose value is
computed from the formula
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

Point estimator for population proportion (p)


One may want to estimate the proportion of items of a population that have a certain
characteristic of interest. For example, an animal scientist may want to estimate the proportion of
animals infected by a particular disease.

To estimate the population proportion, a sample of size n is drawn from the population and we
determine x, the number of sample items that have the characteristic of interest.

The unbiased estimator of the population proportion $p$ is given by:
$$\hat{p} = \frac{x}{n}$$

Example: In a study to estimate the incidence of disease X in cattle, a researcher draws a
sample of 200 cattle from different farms in the region. Of the 200 cattle, 140 of the animals
have the disease. Find the point estimate of the proportion of cattle that have the disease.

Solution: $n = 200$, $x = 140$.
$$\hat{p} = \frac{x}{n} = \frac{140}{200} = 0.7$$
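
The three point estimators above can be computed directly with Python's standard library. This is a minimal sketch: the eight-value data set is hypothetical and used only to show the calls, while the proportion reuses the cattle example.

```python
import statistics

# Hypothetical sample of eight observations (not taken from this unit).
data = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = statistics.mean(data)      # point estimate of the population mean
s2 = statistics.variance(data)     # sample variance, divisor n - 1

# Cattle example from the text: 140 diseased animals out of 200.
p_hat = 140 / 200                  # point estimate of the population proportion

print(x_bar, round(s2, 3), p_hat)
```

Note that `statistics.variance` uses the divisor n − 1, which matches the unbiased estimator given above; `statistics.pvariance` would divide by n instead.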

The standard error of the mean


If we take a large number of samples from the same population, then by the Central Limit
Theorem the sample means follow a sampling distribution centred on the population mean.

The variation associated with the distribution of the means is known as the standard deviation of
the mean, or the standard error, and is estimated by:
$$SE = s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
where $s$ is the sample standard deviation and $n$ is the sample size.
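
As a quick illustration of how the standard error depends on the sample size, the sketch below evaluates s/√n for a hypothetical sample standard deviation of 4.0 (an assumed value, not from this unit) at three sample sizes.

```python
import math

def standard_error(s, n):
    """Standard error of the mean, s / sqrt(n)."""
    return s / math.sqrt(n)

# Hypothetical sample standard deviation of 4.0 at increasing sample sizes.
for n in (16, 64, 256):
    print(n, standard_error(4.0, n))
```

Each fourfold increase in the sample size halves the standard error, which is why larger samples give more precise estimates of the mean.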

Note:
• The standard deviation is an estimate of the dispersion of the individual observations about
the mean of a sample, while the standard error is an estimate of the standard deviation of the
sample means.
• We are effectively using the present sample to estimate what the likely distribution of the
means would be if we were to take repeated samples from the population.
• The SE can be used to estimate the confidence interval of a population mean from the sample
mean, given the distribution of the data.
• We assume that the means are normally distributed (Central Limit Theorem).

Confidence intervals
From the discussion above, a population parameter can be estimated from a sample by calculating
the corresponding point estimate. However, due to sampling variability, it is almost never the
case that the sample statistic equals the population parameter. Further, the point estimate on its
own does not provide any information about its closeness to the true population parameter, so its
reliability cannot be evaluated. For this reason, interval estimates are more useful and
appropriate for providing estimates of the population values.

Definitions:
An interval estimate is a range of values used to estimate a population parameter.

A confidence interval (CI) describes a range of values within which a population parameter is
likely to lie. A CI is constructed so that there is high confidence that it does contain the true but
unknown population parameter.
A 100(1-α)% CI for a parameter is constructed so that, over repeated sampling, the interval
contains the parameter with probability 1-α. 1-α is the degree of confidence that we have in
stating that the parameter lies within that interval, while α is the probability of error in our
assertion and is called the level of significance. Generally, a 100(1-α)% confidence interval equals:

Point estimate ± reliability coefficient × SE(estimate)

where α is the level of significance, between zero and one; 1-α is a value called the "confidence
coefficient"; 100(1-α)% is the confidence level; point estimate is a value such as the sample
mean $\bar{X}$ or the sample proportion $\hat{p}$; reliability coefficient is a probability point
obtained from an appropriate table, for example $z_{\alpha/2}$ or $t_{\alpha/2}(n-1)$; and
SE(estimate), read "standard error of the estimate", measures the closeness of the point estimate
to the true population parameter, i.e. it measures the precision of the estimate.

1. Confidence Interval for the population mean


When calculating the confidence interval for the population mean, there are three cases to
consider:
1. If $\sigma^2$ is known, whether the sample is large or small
2. If $\sigma^2$ is unknown
a. For large sample sizes, $n > 30$
b. For small sample sizes, $n \le 30$

The overall assumption made is that the sample comes from a normal population.

Case 1: Known population variance $\sigma^2$


Suppose a sample of n values $x_1, x_2, \ldots, x_n$ has been drawn from a population whose random
variable X is normally distributed, and suppose that the variance of the population, $\sigma^2$, is
known. Then the sampling distribution of the sample mean $\bar{X}$ is:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right),$$
whose standardized form is
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1).$$
So the 100(1-α)% confidence interval estimate for the population mean is given by
$$\bar{X} - Z_{\alpha/2} \times SE \le \mu \le \bar{X} + Z_{\alpha/2} \times SE,$$
that is,
$$\bar{X} - Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}},$$
where $Z_{\alpha/2}$ is the value of the standard normal distribution such that the area under the
normal curve to the right of it is $\alpha/2$, and $\sigma/\sqrt{n}$ is the standard error of the
sample mean $\bar{X}$.

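Also, observe that the confidence interval can be written in the form
$$\bar{X} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
or as
$$\left(\bar{X} - Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}};\ \bar{X} + Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}\right)$$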

Example: Bags of maize rice harvested at Ogongo Campus Farm were weighed and the
following data gives their weights in kg:
64.3 64.6 64.8 64.2 64.5 64.3 64.6 64.8 64.2 64.3

Assume that the weights are normally distributed with unit population variance ($\sigma^2 = 1$). For
these data, construct a 95% confidence interval for the population mean.

Solution: Using the data, $n = 10$, $\bar{X} = 64.46$ kg, the level of significance $\alpha = 5\% = 0.05$,
and from the given assumption $\sigma^2 = 1$. The resulting 95% confidence interval (CI) for the
population mean is:
$$\bar{X} - Z_{0.025} \times \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{0.025} \times \frac{\sigma}{\sqrt{n}}$$
$$64.46 - 1.96 \times \frac{1}{\sqrt{10}} \le \mu \le 64.46 + 1.96 \times \frac{1}{\sqrt{10}}$$

Simplifying, we then have the 95% CI for the population mean as
$$63.84\ \text{kg} \le \mu \le 65.08\ \text{kg}$$

Interpretation: This means that we are 95% confident that the interval 63.84 kg to 65.08 kg covers
the true population mean.
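
The calculation in this example can be reproduced with a few lines of Python. The sketch below is one possible way to do it, using `statistics.NormalDist` from the standard library to obtain the reliability coefficient instead of reading it from a table; the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist, mean

weights = [64.3, 64.6, 64.8, 64.2, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
sigma = 1.0     # population standard deviation, assumed known in the example
conf = 0.95     # confidence level

x_bar = mean(weights)
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # Z_{alpha/2}, about 1.96
margin = z * sigma / sqrt(len(weights))

lower, upper = x_bar - margin, x_bar + margin
print(round(lower, 2), round(upper, 2))
```

Changing `conf` to 0.90 or 0.99 shows how the width of the interval responds to the confidence level.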

One-sided Confidence Intervals for the population mean


Using similar assumptions, one-sided confidence intervals for the population mean μ are
obtained by replacing $Z_{\alpha/2}$ by $Z_{\alpha}$. The 100(1-α)% upper confidence limit for the
mean μ is thus given by:
$$\bar{X} + Z_{\alpha} \times \frac{\sigma}{\sqrt{n}}$$
and the 100(1-α)% lower confidence limit for the mean μ is given by
$$\bar{X} - Z_{\alpha} \times \frac{\sigma}{\sqrt{n}}.$$
Exercise 1: For the data in the above example, construct the 90%, 95% and 99% lower and upper
confidence limits.
What observations can you make? What is happening to the width of the confidence interval as
we increase the confidence level?
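
For reference, the one-sided limits described above can be sketched as follows. The function name and the numbers (a sample mean of 50, σ = 4, n = 16) are hypothetical and chosen only so that the standard error equals 1; they are not the exercise data.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_limits(x_bar, sigma, n, conf=0.95):
    """Return (lower limit, upper limit); each is a separate one-sided bound."""
    z_alpha = NormalDist().inv_cdf(conf)   # Z_alpha, not Z_{alpha/2}
    margin = z_alpha * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

lower, upper = one_sided_limits(x_bar=50.0, sigma=4.0, n=16)
print(round(lower, 3), round(upper, 3))
```

Because all of α is placed in one tail, the one-sided coefficient (about 1.645 at 95%) is smaller than the two-sided one (about 1.96).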

Case 2: Unknown population variance


a. Confidence interval for μ for large samples (n > 30)
It was assumed in the foregoing discussion that the population distribution is normal with an
unknown mean μ and a known population standard deviation σ (variance $\sigma^2$).
However, the population standard deviation is often not known. In that case we may estimate
it by the sample standard deviation s, and the larger the sample, the better this estimate and the
resulting confidence interval. Both assumptions (normality and known σ) may be relaxed when
dealing with large samples:

Let the observations $x_1, x_2, \ldots, x_n$ be a random sample from a population with unknown mean μ
and unknown variance $\sigma^2$. If $n$ is large, then approximately
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
whose standardized form is
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1).$$

When $n$ is large, it is permissible to replace the unknown σ by s. This has close to no effect on
the distribution of Z, so for large n the quantity
$$Z = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
approximately follows a standard normal distribution with mean zero and unit standard deviation.

Therefore, the 100(1-α)% confidence interval for μ when $\sigma^2$ is unknown and n is large is
given by:
$$\bar{X} - Z_{\alpha/2} \times \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2} \times \frac{s}{\sqrt{n}}$$

which holds approximately regardless of the shape of the population distribution.

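Note: The confidence interval can also be written as:
$$\bar{X} \pm Z_{\alpha/2} \times \frac{s}{\sqrt{n}}$$
or as
$$\left(\bar{X} - Z_{\alpha/2} \times \frac{s}{\sqrt{n}};\ \bar{X} + Z_{\alpha/2} \times \frac{s}{\sqrt{n}}\right)$$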
Example: A study was carried out in Zimbabwe to investigate pollutant contamination in small
fish. A sample of small fish was selected from 53 rivers across the country and the pollutant
concentration in the muscle tissue was measured (ppm). The pollutant concentration values are
shown below. Construct a 95% confidence interval for the population mean.
1.230 1.330 0.040 0.044 1.200 0.270 0.490 0.190 0.940 0.520 0.830
0.810 0.710 0.500 0.490 1.160 0.050 0.150 0.400 0.190 0.650 0.770
1.080 0.980 0.630 0.560 0.410 0.730 0.430 0.590 0.340 0.340 0.270
0.840 0.500 0.340 0.280 0.340 0.250 0.750 0.870 0.560 0.100 0.170
0.180 0.910 0.040 0.490 0.270 1.100 0.160 0.210 0.860

Solution: From the data, n = 53, the sample mean $\bar{X} = 0.5250$ ppm, and the sample standard
deviation s = 0.3486 ppm.

Since n > 30, the 95% confidence interval for μ is computed using the formula
$$\bar{X} - Z_{\alpha/2} \times \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2} \times \frac{s}{\sqrt{n}}$$
Substituting we get
$$0.5250 - 1.96 \times \frac{0.3486}{\sqrt{53}} \le \mu \le 0.5250 + 1.96 \times \frac{0.3486}{\sqrt{53}}$$

which simplifies to $0.431\ \text{ppm} \le \mu \le 0.619\ \text{ppm}$
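
As a check on the arithmetic, the interval can be recomputed from the summary statistics quoted in the solution (n, sample mean and s). This is only a sketch; it starts from those three reported values rather than from the raw data.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, s = 53, 0.5250, 0.3486   # summary statistics quoted in the solution

z = NormalDist().inv_cdf(0.975)    # Z_{0.025}, about 1.96
margin = z * s / sqrt(n)           # s replaces the unknown sigma for large n

lower, upper = x_bar - margin, x_bar + margin
print(round(lower, 3), round(upper, 3))
```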

Exercise: Construct the 90% and the 99% CI for μ using the above data. Further, using the above
data construct the 90%, 95%, and the 99% lower- and upper- CI for the population mean.

b. Confidence interval for μ when the sample size is small (n ≤ 30)


For small samples, it is assumed that the population of interest is normally distributed with
unknown population variance $\sigma^2$, and it is a reasonable procedure to use $s^2$ to estimate the
variance. Then the random variable Z is replaced with T (from Student's t distribution),
which is given by:
$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t(n-1)$$
which is a random variable that follows Student's t distribution with n − 1 degrees of freedom,
which are associated with the estimated standard deviation. Student's t distribution is similar
to the normal distribution but has heavier tails; its shape depends on the degrees of freedom, and
therefore on the sample size, and it approaches the standard normal distribution as n increases.

The 100(1 − α)% confidence interval for the mean μ, based on a random sample of n
values $x_1, x_2, \ldots, x_n$ when the population standard deviation σ is unknown, is given by:
$$\bar{X} \pm t_{\alpha/2,(n-1)} \times \frac{s}{\sqrt{n}}$$
or equivalently
$$\left(\bar{X} - t_{\alpha/2,(n-1)} \times SE;\ \bar{X} + t_{\alpha/2,(n-1)} \times SE\right)$$
that is,
$$\bar{X} - t_{\alpha/2,(n-1)} \times \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,(n-1)} \times \frac{s}{\sqrt{n}}$$
where $\bar{X}$ is the sample mean, s is the sample standard deviation, n is the sample size,
$t_{\alpha/2,(n-1)}$ is the value of the t distribution with $(n-1)$ degrees of freedom such that the
area to the right of it is $\alpha/2$, and $s/\sqrt{n}$ is the standard error of the mean.

Example: Consider the following data about the weight for piglets (in kg) to be fostered by
cow’s milk:
19.8 10.1 14.9 7.5 15.4 15.4 18.5 7.9 12.7 11.9 15.4
11.4 11.4 14.1 17.6 15.8 15.8 8.8 13.6 11.9 11.4 19.5

Construct the 95% confidence interval for the population mean μ.

Solution: $\bar{X} = 13.71$ kg, s = 3.55 kg, n = 22. Since the sample is small (n = 22), the 95%
confidence interval for the population mean is given by
$$\bar{X} - t_{\alpha/2,(n-1)} \times \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,(n-1)} \times \frac{s}{\sqrt{n}}$$
with
$$t_{\alpha/2,(n-1)} = t_{0.025,21} = 2.080.$$
Substituting we get
$$13.71 - 2.080 \times \frac{3.55}{\sqrt{22}} \le \mu \le 13.71 + 2.080 \times \frac{3.55}{\sqrt{22}}$$

and upon simplification we have the 95% confidence interval for the population mean μ as:
$$12.1\ \text{kg} \le \mu \le 15.3\ \text{kg}$$
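
The same steps can be written as a small helper. In this sketch the function name `t_interval` is our own, and the critical value `t_crit` is passed in by hand from a t table (2.080 for 21 degrees of freedom at 95%), because Python's standard library has no t distribution; a statistics package could supply it instead.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """CI for the mean; t_crit is t_{alpha/2, n-1} read from a t table."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Summary statistics and table value quoted in the solution above.
lower, upper = t_interval(x_bar=13.71, s=3.55, n=22, t_crit=2.080)
print(round(lower, 1), round(upper, 1))
```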

Exercise
a. For the data above, construct the 90% and 99% confidence intervals for the mean and
interpret the two confidence intervals.
b. Construct the 90%, 95% and 99% upper and lower confidence intervals for the mean.

Remark: The Central Limit Theorem also holds for count data. However, count data are often drawn
from populations with skewed distributions (e.g. variance >> mean), and then the t-distribution
is not a sufficient correction. In such cases, we need to calculate the confidence interval from
transformed observations (e.g. log-transformed). We will not deal with such cases here.

Remark: One-sided confidence intervals for the mean of a normal population are constructed by
choosing the appropriate lower or upper confidence limit and then replacing $t_{\alpha/2}(n-1)$
with $t_{\alpha}(n-1)$.

2. Confidence Intervals for the population proportions


a. For large n
Suppose that a random sample of size n, where n is large, has been taken from a large population
and that $x$ (where $x$ is less than $n$) observations in this sample belong to a class of interest. Then
$\hat{p}$, calculated as
$$\hat{p} = \frac{x}{n},$$
is a point estimator of the proportion $p$ of the population that belongs to this class. Note
that $n$ and $p$ are the parameters of a binomial distribution (refer to earlier discussions).

The sampling distribution of $\hat{p}$ is approximately normal with mean $p$ and variance
$\frac{p(1-p)}{n}$ if $p$ is not too close to either 0 or 1 and if n is relatively large. To apply this,
it is required that $np$ and $n(1-p)$ both be greater than or equal to 5. In other words, if n is large, then
$$Z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} \sim N(0,1)$$

is approximately a standard normal random variable.

For large samples, which usually is the case when dealing with proportions, a satisfactory
100(1 − α)% confidence interval on the population proportion $p$ is computed as
$$\hat{p} - Z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + Z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where $\hat{p}$ is the point estimate of $p$, and $Z_{\alpha/2}$ is the upper $\alpha/2$ probability
point of the standard normal distribution.

Example: In a random sample of 85 does selected from farms in Namibia, 10 have a body
condition score that is less than the expected average body condition score for a healthy doe and
hence should be considered for some treatment.

Construct a 95% confidence interval for the population proportion of does whose body condition
score is less than expected.

Solution: $\hat{p} = \frac{10}{85}$, $n = 85$, $Z_{\alpha/2} = Z_{0.025} = 1.96$

A 95% two-sided confidence interval for p is
$$\frac{10}{85} - 1.96 \times \sqrt{\frac{\frac{10}{85}\left(1 - \frac{10}{85}\right)}{85}} \le p \le \frac{10}{85} + 1.96 \times \sqrt{\frac{\frac{10}{85}\left(1 - \frac{10}{85}\right)}{85}}$$

which simplifies to $0.05 \le p \le 0.19$
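
A sketch of the same calculation in Python is given below. The function name `proportion_ci` is our own, and the rule of thumb that np and n(1 − p) should be at least 5 is checked here using the sample proportion in place of the unknown p, which is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, conf=0.95):
    """Large-sample (normal approximation) CI for a population proportion."""
    p_hat = x / n
    # Rule of thumb from the text, applied with p_hat in place of the unknown p.
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("normal approximation may be poor for this sample")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Doe example: 10 of 85 animals below the expected body condition score.
lower, upper = proportion_ci(10, 85)
print(round(lower, 2), round(upper, 2))
```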


Remark: The one-sided upper confidence limit for the population proportion is given by
$$\hat{p} + Z_{\alpha} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

and the one-sided lower confidence limit for the population proportion is
$$\hat{p} - Z_{\alpha} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Exercise: In the example above, construct
a. The 90% and 99% confidence intervals for the population proportion and interpret your
results.
b. The 95% and the 99% lower and upper confidence intervals for the population
proportion.

3. Confidence Intervals for the population variance


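Let $x_1, x_2, \ldots, x_n$ be a random sample from a normal distribution with mean μ and variance $\sigma^2$,
and let $s^2$ be the sample variance. Then the random variable
$$V = \frac{s^2}{\sigma^2/(n-1)} = \frac{(n-1)s^2}{\sigma^2}$$
has a chi-square ($\chi^2$) distribution with $n-1$ degrees of freedom.

Now, if $s^2$ is the sample variance from a random sample of n observations from a normal
distribution with unknown variance $\sigma^2$, then a 100(1-α)% confidence interval on $\sigma^2$ is
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$$
where $\chi^2_{\alpha/2,\,n-1}$ and $\chi^2_{1-\alpha/2,\,n-1}$ are the upper and lower $\alpha/2$
probability points of the chi-square distribution with $n-1$ degrees of freedom.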

4. Confidence Interval for the difference of two populations means


The overall assumption of normality remains in place, and the approach is the same as for a single
mean. We are simply considering two populations and constructing confidence intervals for the
difference between the two population means.

Case 1: Known population variance


Assumption
It is assumed that the two populations are normally distributed with unknown means
$\mu_1$ and $\mu_2$ and known population variances $\sigma_1^2$ and $\sigma_2^2$ respectively. The
100(1-α)% confidence interval for the difference of two means is based on the Z value
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}},$$
which is a standard normal random variable.

For example, you may want to compare the difference in maize yields in kg per hectare for
farmers using conservation farming methods and those using conventional methods. You
estimate the difference between two population means, $\mu_1 - \mu_2$, by taking a sample from each
population (say, sample 1 and sample 2) and using the difference of the two sample means,
$\bar{x}_1 - \bar{x}_2$, plus or minus a margin of error. The result is a confidence interval for the
difference of two population means, $\mu_1 - \mu_2$.

If both of the population standard deviations are known, then the 100(1-α)% confidence interval
for the difference between two population means (averages) is given by:
$$\bar{x}_1 - \bar{x}_2 \pm Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
where $\bar{x}_1$ and $n_1$ are the mean and size of the first sample, $\sigma_1$ is the known standard
deviation of the first population, and $\bar{x}_2$, $n_2$ and $\sigma_2$ are the corresponding
quantities for the second sample and population.
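
The formula can be sketched in code as follows. The function name and all the numbers used (sample means of 30 and 27, variances of 4 and 9, sample sizes of 16 and 36) are hypothetical values chosen for illustration; they are not data from this unit.

```python
from math import sqrt
from statistics import NormalDist

def diff_means_ci(x1, x2, var1, var2, n1, n2, conf=0.95):
    """CI for mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sqrt(var1 / n1 + var2 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin

# Hypothetical inputs, for illustration only.
lower, upper = diff_means_ci(x1=30.0, x2=27.0, var1=4.0, var2=9.0, n1=16, n2=36)
print(round(lower, 3), round(upper, 3))
```

If the resulting interval excludes zero, as it does for these made-up inputs, the data are consistent with a real difference between the two population means at the chosen confidence level.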

Exercise: A researcher is interested in evaluating the effect of adding moringa to doe feed on
the growth of their kids. Two groups of does are selected for the experiment; one group is
fed standard goat feed while the other group has moringa as an additive to the standard
goat feed. The weight gains of their kids are recorded after feeding the mothers for 12 weeks,
that is, before weaning. The variances of the weights of the two groups of kids are assumed to be
equal and known to be 3.5 kg².

The data below shows the weight (kg) for the two groups of the kids for the does fed with feed
with and without moringa additive.

Moringa     18    18.8  17    17.4  16    15.6  12.4  13.2  14.2  15
No moringa  15.2  15.8  20    20.6  14    14.2  11.2  12    14    14.8

Moringa     14.2  15    17    18.4  13.6  15    13.6  14.4  14.2  14
No moringa  14    14.8  11.6  12.2  14.4  15    12.8  14.8  11.8  11.6

Case 2: Unknown population variances but assumed to be equal


This is called the Homogeneous Variance Assumption

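The assumptions are that the variances of both distributions, $\sigma_1^2$ and $\sigma_2^2$, are unknown
but equal. This common variance is estimated by a quantity called the pooled variance, denoted
$s_p^2$ and calculated as
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Then, the 100(1 − α)% confidence interval for the difference of two means is given by:
$$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2} \sqrt{s_p^2 \left[\frac{1}{n_1} + \frac{1}{n_2}\right]}$$
where $t_{\alpha/2}$ is taken from the t distribution with $n_1 + n_2 - 2$ degrees of freedom.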
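
A minimal sketch of the pooled-variance interval is shown below. The function names and the inputs are hypothetical (sample variances of 4 and 6 from samples of 8 and 12, sample means of 20 and 17), and the critical value 2.101 is the tabulated t value for 0.025 in the upper tail with 8 + 12 − 2 = 18 degrees of freedom, supplied by hand.

```python
from math import sqrt

def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Pooled estimate of the common variance."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_t_interval(x1, x2, s1_sq, s2_sq, n1, n2, t_crit):
    """CI for mu1 - mu2; t_crit is t_{alpha/2} with n1 + n2 - 2 df, from a table."""
    sp_sq = pooled_variance(s1_sq, n1, s2_sq, n2)
    margin = t_crit * sqrt(sp_sq * (1 / n1 + 1 / n2))
    diff = x1 - x2
    return diff - margin, diff + margin

# Hypothetical inputs, for illustration only.
sp_sq = pooled_variance(4.0, 8, 6.0, 12)
lower, upper = pooled_t_interval(20.0, 17.0, 4.0, 6.0, 8, 12, t_crit=2.101)
print(round(sp_sq, 4), round(lower, 3), round(upper, 3))
```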
Exercise: The following data are from two populations, A and B. A sample of ten observations from A
had a mean of 90.0 with a sample standard deviation of s1 = 5.0, while a sample of 15 observations
from B had a mean of 87.0 with a sample standard deviation of s2 = 4.0. Assume that the populations
A and B are normally distributed and that both normal populations have the same standard deviation.
Construct a 95% confidence interval on the difference in the two population means.

5. Determining the sample size in estimation


The purpose of statistics is to make inferences about a population based on sample data. It is
important to come up with a representative sample in order for one to use it to infer population
parameters. If too small a sample is used, the confidence intervals become too large. If too large
a sample is used, one might waste resources unnecessarily. In summary, the quality or strength of
inference depends on the size of the selected sample: the larger the sample the better the quality
or strength.

When estimating the population parameter, the precision (as measured by the width of the
confidence interval) is dependent on:
1. Variance
2. Sample size

The power of a statistical test (the probability of rejecting a null hypothesis when it is false) is
dependent upon
1. Variance
2. Sample size
3. Size of the difference that is worth detecting (effect size)

When planning a study or testing a hypothesis it is important to determine sample size so as to
yield adequate precision or power (ability to generalize results).

1. Sample size for estimating the mean


The sample size required for the sample mean $\bar{X}$ to be within a margin of error e of the true
population mean μ with 100(1-α)% probability is given by
$$n = \left[\frac{Z_{\alpha/2} \times \sigma}{e}\right]^2$$

where σ is the population standard deviation.

Note: If σ is not known, we can estimate it by s or, if s is not given, conservatively take it to
be about one fourth of the range.

Example: A researcher would like to estimate the average monthly feed consumption, in kg, of
suckling cows on Namibian farms. Based upon studies conducted by the Ministry of
Agriculture, the standard deviation of the consumption is assumed to be 20 kg. The researcher
would like the estimate to be within ±5 kg of the true average with 99%
confidence. What sample size is needed?

Solution: $\sigma = 20$, $\alpha = 0.01$, $\alpha/2 = 0.005$, $Z_{\alpha/2} = Z_{0.005} = 2.5758$ and $e = 5$.

Then,
$$n = \left[\frac{Z_{\alpha/2} \times \sigma}{e}\right]^2 = \left[\frac{2.5758 \times 20}{5}\right]^2 = 10.3032^2 \approx 106.156$$

Rounding up to the next whole number, the required sample size is 107 cows.
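
The sample size calculation is easy to automate. The sketch below uses a function name of our own choosing and always rounds up, since a fractional animal cannot be sampled and rounding down would give slightly less than the requested precision.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, e, conf):
    """Smallest n so the sample mean is within e of mu at the given confidence."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # Z_{alpha/2}
    return ceil((z * sigma / e) ** 2)

# Feed consumption example: sigma = 20 kg, margin of error 5 kg, 99% confidence.
n_required = sample_size_mean(sigma=20, e=5, conf=0.99)
print(n_required)
```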

Exercise: A food scientist wants to conduct a consumer sensory evaluation for the taste of some
new yoghurt. She will ask the people in her sample to rate the taste of the yoghurt. The taste is
rated on the scale 1 to 9. How many people should she poll in order to estimate the population
mean to be within 5 units at 90 % confidence level?

2. Sample size for estimating the population proportion


Research questions such as:
“What is the prevalence of tick infestation for cattle at Neudamm farm?” or “What is the
sensitivity/ specificity of a particular diagnostic test?” lead to the estimation of a single
proportion.

Let $\hat{p}$ denote the sample proportion. The minimum sample size required for $\hat{p}$ to be within the
margin of error e of the true population proportion p with 100(1-α)% probability is given by:
$$n = \hat{p}(1 - \hat{p}) \left[\frac{Z_{\alpha/2}}{e}\right]^2$$
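
In practice the sample proportion is not available before the study is run, so a planning value has to be supplied, for example from a pilot study or earlier work. The sketch below assumes a planning value of 0.5 (which gives the largest possible n) and a margin of error of 0.05; both numbers and the function name are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(p_guess, e, conf=0.95):
    """Smallest n so the sample proportion is within e of p at the given confidence."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # Z_{alpha/2}
    return ceil(p_guess * (1 - p_guess) * (z / e) ** 2)

# Hypothetical planning values: p_guess = 0.5, margin of error 0.05, 95% confidence.
n_required = sample_size_proportion(p_guess=0.5, e=0.05)
print(n_required)
```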
