Estimation and Sample Size Determination
population parameter and using sample data to assess the
evidence against the null hypothesis.
Types of Estimation
1. Point Estimation: Point estimation involves estimating
an unknown population parameter using a single value
called a point estimate. The point estimate is a best guess
or approximation of the parameter based on the
available sample data. Examples of point estimates
include the sample mean, sample proportion, sample
variance, or sample regression coefficient.
2. Interval Estimation: Interval estimation provides a
range of plausible values within which the population
parameter is likely to lie. It involves calculating a
confidence interval, which is a range of values centered
around the point estimate. The confidence interval
quantifies the uncertainty associated with the estimation.
Commonly used confidence levels include 90%, 95%,
and 99%. For example, a 95% confidence interval for
the population mean provides a range of values within
which we are 95% confident that the true population
mean lies.
Point estimation provides a single numerical value that
represents the best estimate of the unknown population
parameter, while interval estimation provides a range of values
that represent the uncertainty associated with the estimate. Point
estimation provides a specific value, whereas interval
estimation provides a range of values that account for the
sampling variability and uncertainty in the estimation process.
Estimator
Any sample statistic that is used to estimate a population
parameter is called an estimator. For example, the sample mean
X̄ can be used as an estimator of the population mean μ, and the
sample proportion p̂ can be used as an estimator of the
population proportion P.
Estimate
An estimate is a specific observed numerical value of a statistic
used to estimate a parameter.
Table-1:
Desirable properties of a good estimator
The desirable criteria of a good estimator are unbiasedness,
consistency, efficiency, and sufficiency. A brief description of
the criteria is provided below.
(i) Unbiasedness: The value of the sample average depends
on the values in the sample and may differ from sample
to sample. An estimator is said to be unbiased if its
expected value is equal to the parameter. For example,
the sample mean X̄ is an unbiased estimator of the
population mean μ because E(X̄) = μ (since the sampling
distribution is a probability distribution, its average is
referred to as the expected value).
(ii) Consistency: Consistency refers to the effect of sample
size on the accuracy of the estimator. A statistic is said
to be a consistent estimator of the population parameter
if it approaches the parameter as the sample size
increases. Thus, the sample mean X̄ is a consistent
estimator of the population mean μ if X̄ → μ as n → ∞.
(iii)Efficiency: An estimator is considered efficient if its
value remains stable from sample to sample. Hence, the
best estimator is the one that would have the least
variance from sample to sample taken randomly from
the same population. For example, for a normally
distributed population, the sample mean is a better
estimator of the population mean than the median or
mode because the variance of the sample mean is less
than that of the median or mode.
(iv) Sufficiency: An estimator is said to be sufficient if it
uses all the information about the population parameter
contained in the sample. For example, the sample mean
uses all the sample values in its computation, while the
mode and the median do not; hence, the sample mean is
a better estimator.
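The unbiasedness and efficiency criteria can be checked empirically with a small simulation. The sketch below is only an illustration: the population values (mean 50, standard deviation 10), the sample size, the number of trials, and the function name `simulate_estimators` are all assumptions chosen for the example, not values from these notes.

```python
import random
import statistics

def simulate_estimators(mu=50.0, sigma=10.0, n=25, trials=2000, seed=1):
    """Draw repeated samples from a normal population and compare the
    sample mean and the sample median as estimators of mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return {
        "avg_of_means": statistics.fmean(means),         # near mu: unbiased
        "var_of_means": statistics.variance(means),      # roughly sigma^2 / n
        "var_of_medians": statistics.variance(medians),  # larger: less efficient
    }

result = simulate_estimators()
print(result)
```

Under these assumptions the average of the sample means should sit close to 50, and the variance of the sample medians should come out larger than the variance of the sample means.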
However, a point estimate alone is sometimes inadequate because
it may or may not be close to the respective parameter. If the
shopkeeper's point estimate is wrong, it gives no information
about the extent of its error or deviation from the true value,
although this deviation can be reduced by increasing the
sample size. Nor does the point estimate specify the
confidence regarding its closeness to the parameter. Both the
amount of error and the confidence can be quantified by a
second type of estimation, called interval estimation.
Confidence Level or Coefficient
The confidence level, also referred to as the confidence
coefficient, measures the level of confidence or certainty
associated with an interval estimate. It is denoted by (1 - α),
where α is the significance level or the probability of making
a Type I error (rejecting a true null hypothesis). The
confidence level represents the proportion of intervals
constructed from repeated sampling that would contain the true
population parameter.
For example, if we construct a 95% confidence interval for a
population mean, it means that if we were to repeat the sampling
process and construct 100 different intervals, approximately 95
of those intervals would contain the true population mean.
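The repeated-sampling interpretation can be illustrated with a short simulation. This is a sketch under stated assumptions: a normal population with known standard deviation, and hypothetical values (mean 100, standard deviation 15, n = 40) that do not come from these notes. The function name `coverage` is also just illustrative.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=100.0, sigma=15.0, n=40, conf=0.95, trials=2000, seed=7):
    """Share of z-based confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95%
    half_width = z * sigma / n ** 0.5              # margin of error
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

rate = coverage()
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The printed share should be close to 0.95, though it will not equal it exactly because the simulation uses a finite number of samples.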
The choice of confidence level depends on the desired level of
certainty or risk of making a Type I error. Commonly used
confidence levels include 90%, 95%, and 99%. A higher
confidence level provides a wider interval, resulting in greater
certainty but potentially sacrificing precision.
Confidence Interval
A confidence interval is an interval estimate of the parameter
with a specified level of confidence. Constructing a
confidence interval starts with selecting a desired confidence
level, typically expressed as a percentage (e.g., 90%, 95%, or
99%).
The margin of error (ME)
The margin of error is a statistical measure that quantifies the
uncertainty or variability associated with an estimate, typically
used in the context of constructing confidence intervals. It
represents the maximum expected difference between the point
estimate and the true population parameter. The margin of error
provides a range within which the true value is likely to fall,
given a certain level of confidence. It is computed as:
Margin of Error = Critical value * Standard error of the estimate
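As a minimal sketch of this formula, the snippet below multiplies a critical value by a standard error. The inputs (σ = 12, n = 36, 95% confidence) are hypothetical, and `margin_of_error` is an illustrative helper name.

```python
from statistics import NormalDist

def margin_of_error(critical_value, standard_error):
    """Margin of error = critical value * standard error of the estimate."""
    return critical_value * standard_error

z = NormalDist().inv_cdf(0.975)      # two-sided 95% critical value, about 1.96
se = 12 / 36 ** 0.5                  # hypothetical sigma = 12, n = 36, so SE = 2.0
me = margin_of_error(z, se)
print(round(me, 2))                  # 3.92
```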
Confidence interval for mean
The formula for calculating a confidence interval for the
population mean depends on whether the sample size is large (n
≥ 30) or small (n < 30) and whether the population standard
deviation (σ) is known or unknown.
Here are the formulas for constructing confidence intervals for
the population mean:
1. Large Sample Size (n ≥ 30) with Known Population
Standard Deviation (σ): The confidence interval is
calculated as:
Confidence Interval = (x̄ - z * σ / √n, x̄ + z * σ / √n)
where x̄ is the sample mean, z is the critical value from
the standard normal distribution based on the desired
confidence level, σ is the population standard deviation,
and n is the sample size.
2. Large Sample Size (n ≥ 30) with Unknown
Population Standard Deviation: In this case, the
sample standard deviation (s) is used as an estimate for
the population standard deviation. The confidence
interval is calculated as:
Confidence Interval = (x̄ - z * s / √n, x̄ + z * s / √n)
where x̄ is the sample mean, z is the critical value from
the standard normal distribution based on the desired
confidence level, s is the sample standard deviation, and
n is the sample size.
3. Small Sample Size (n < 30) with Known Population
Standard Deviation (σ): If the population is normally
distributed and the population standard deviation is
known, the sample mean is normally distributed for any
sample size, so the confidence interval is still calculated
using the standard normal distribution rather than the
t-distribution. The formula is:
Confidence Interval = (x̄ - z * σ / √n, x̄ + z * σ / √n)
where x̄ is the sample mean, z is the critical value from
the standard normal distribution based on the desired
confidence level, σ is the population standard deviation,
and n is the sample size.
4. Small Sample Size (n < 30) with Unknown
Population Standard Deviation: In this case, both the
sample mean and the sample standard deviation are used
to calculate the confidence interval. The formula is:
Confidence Interval = (x̄ - t * s / √n, x̄ + t * s / √n)
where x̄ is the sample mean, t is the critical value from
the t-distribution based on the desired confidence level
and degrees of freedom (n - 1), s is the sample standard
deviation, and n is the sample size.
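All of the formulas above share the form x̄ ± (critical value) * (standard deviation) / √n, so one small helper can cover them if the critical value is passed in. The sketch below assumes hypothetical data (x̄ = 50, s = 8). The z value is computed from the standard normal distribution, while the t value 2.262 (95% confidence, 9 degrees of freedom) is taken from a standard t-table because the Python standard library has no t-distribution. The names `z_critical` and `mean_ci` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(conf_level):
    """Two-sided critical value from the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - conf_level) / 2)

def mean_ci(xbar, sd, n, critical):
    """Confidence interval xbar +/- critical * sd / sqrt(n).
    sd is sigma when it is known, otherwise the sample value s."""
    half_width = critical * sd / sqrt(n)
    return xbar - half_width, xbar + half_width

# Large sample (hypothetical): n = 64, s = 8, 95% confidence, z critical value
lo_z, hi_z = mean_ci(50.0, 8.0, 64, z_critical(0.95))
print(round(lo_z, 2), round(hi_z, 2))    # 48.04 51.96

# Small sample (hypothetical): n = 10, s = 8, t critical value from a table
lo_t, hi_t = mean_ci(50.0, 8.0, 10, 2.262)
print(round(lo_t, 2), round(hi_t, 2))    # 44.28 55.72
```

Passing the critical value explicitly keeps the choice between z and t visible to the reader instead of hiding it inside the function.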
Width of confidence interval
Sometimes we are interested in the width of a confidence
interval, which is simply defined as the difference between the
upper confidence limit and the lower confidence limit. There are
three factors that affect the width, namely, the value of standard
deviation, the size of sample n, and the level of confidence
required.
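The effect of the three factors can be seen numerically. The following sketch assumes a z-based interval with known standard deviation, and all input values are hypothetical; `ci_width` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sd, n, conf_level):
    """Width = upper limit - lower limit = 2 * z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return 2 * z * sd / sqrt(n)

base = ci_width(sd=10, n=100, conf_level=0.95)
larger_sd = ci_width(sd=20, n=100, conf_level=0.95)     # more spread: wider
larger_n = ci_width(sd=10, n=400, conf_level=0.95)      # bigger sample: narrower
higher_conf = ci_width(sd=10, n=100, conf_level=0.99)   # more confidence: wider

print(round(base, 2))   # 3.92
print(larger_sd > base, larger_n < base, higher_conf > base)   # True True True
```

Quadrupling the sample size halves the width, because the width is proportional to 1/√n.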
Confidence interval for population proportion (for large
sample)
Let p̂ denote the observed proportion of 'successes' in a
particular sample of n observations from a population with a
proportion of successes P. Then, if n is large enough, say,
nP(1 - P) > 9, a 100(1 - α)% confidence interval for the
population proportion is given by
p̂ - z_{α/2} * √(p̂(1 - p̂) / n) < P < p̂ + z_{α/2} * √(p̂(1 - p̂) / n)
Or, p̂ ± ME, where ME, the margin of error, is given by
ME = z_{α/2} * √(p̂(1 - p̂) / n)
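A minimal sketch of the large-sample interval for a proportion follows. The sample (120 successes in 300 observations) is hypothetical, and `proportion_ci` is an illustrative name, not a function from any particular library.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, conf_level=0.95):
    """Large-sample interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    me = z * sqrt(p_hat * (1 - p_hat) / n)      # margin of error
    return p_hat - me, p_hat + me

# Hypothetical sample: 120 successes out of 300 observations, 95% confidence
lower, upper = proportion_ci(120, 300)
print(round(lower, 4), round(upper, 4))
```

For this input the sample proportion is 0.40, so the interval is centred on 0.40 and extends roughly 0.055 on either side.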
Upper limit, P2 = p̂ + z_{α/2} * √(p̂(1 - p̂) / n).
0.152).
Here, x̄ = 5.30, and for a 95% confidence interval, the value of
z is 1.96, so:
(i) A 95% confidence interval for the population mean
breaking strain is given by
x̄ ± 1.96 * σ / √n = 5.30 ± 1.96 × 0.15 = (5.006, 5.594).
P2 = 0.65 + 2.58 * √(0.65(1 - 0.65) / 1200) = 0.686
interval estimate of the true population proportion that favors
this modified plan.
Solution:
Suppose P denotes the true population proportion and p denotes
the sample proportion; then a 100(1 - α)% confidence interval
for the population proportion is given by
p ± z_{α/2} * √(p(1 - p) / n).
P2 = 0.759 + 1.645 * √(0.759(1 - 0.759) / 344) = 0.797
((X̄1 − X̄2) − t * s * √(1/n1 + 1/n2), (X̄1 − X̄2) + t * s * √(1/n1 + 1/n2))
the sample if the population is assumed to be infinite in the
given case?
Solution:
To determine the sample size for estimating the percent
defective within 3 percent of the true value with a 95 percent
probability, we can use the sample size estimation formula for
proportions.
Given information: N = 6000 (size of the population), E = 0.03
(desired margin of error, i.e., 3% of the true value), Z = 1.96
(corresponding to a 95% confidence level from the standard
normal distribution)
First, let's calculate the sample size assuming a finite
population:
n = (Z² * p * (1 - p) * N) / [(Z² * p * (1 - p)) + (E² * (N - 1))]
Since the true proportion p is unknown, we assume p = 0.5 to
get the maximum required sample size.
n = (1.96² * 0.5 * (1 - 0.5) * 6000) / [(1.96² * 0.5 * (1 - 0.5)) +
(0.03² * (6000 - 1))]
n = 5762.4 / 6.3595
n ≈ 906.11
Since the sample size cannot be fractional, we round up to the
nearest whole number. Therefore, a sample size of 907 will be
required to estimate the percent defective within 3% of the true
value with a 95% probability when assuming a finite
population.
Next, let's consider the case where we assume an infinite
population:
n = (Z² * p * (1 - p)) / (E²)
Using this formula, we can calculate the sample size assuming
an infinite population:
n = (1.96² * 0.5 * (1 - 0.5)) / (0.03²)
n = 0.9604 / 0.0009
n ≈ 1067.11
Again, rounding up to the nearest whole number, we would
require a sample size of 1068 if we assume an infinite
population.
Therefore, the sample size would be 907 if the population is
finite (N = 6000), and it would be 1068 if the population is
assumed to be infinite in the given case.
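The two sample size formulas for a proportion can be evaluated directly. The sketch below uses the inputs of this problem (Z = 1.96, E = 0.03, p = 0.5, N = 6000); the function name `sample_size_proportion` and its `population` parameter are illustrative choices, not part of the original notes.

```python
from math import ceil

def sample_size_proportion(z, e, p=0.5, population=None):
    """Sample size for estimating a proportion within margin e.
    population=None applies the infinite-population formula;
    otherwise the finite-population formula with N = population is used."""
    numerator = z ** 2 * p * (1 - p)
    if population is None:
        n = numerator / e ** 2
    else:
        n = numerator * population / (numerator + e ** 2 * (population - 1))
    return ceil(n)    # round up, since a fractional sample size is impossible

n_finite = sample_size_proportion(1.96, 0.03, population=6000)
n_infinite = sample_size_proportion(1.96, 0.03)
print(n_finite, n_infinite)    # 907 1068
```

The finite-population result is smaller because sampling a sizeable fraction of a population of 6000 leaves less uncertainty than sampling from an unlimited one.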
Problem-4: What should be the sample size from a set of 2000
accounts if the standard deviation of default as per past
experience was 2.6, a 95% confidence level is desired, and the
sample mean should not differ by more than half (0.5) from the
population mean?
Solution:
To determine the required sample size for estimating the
population mean with a desired level of confidence and a
specific margin of error, we can use the formula:
n = (Z^2 * σ^2) / E^2
Where:
n = sample size
Z = Z-score corresponding to the desired confidence level (in
this case, a 95% confidence level corresponds to a Z-score of
approximately 1.96)
σ = standard deviation of the population
E = desired margin of error (the maximum allowed difference
between the sample mean and the population mean)
In this case, the standard deviation (σ) is given as 2.6, and the
desired margin of error (E) is 0.5, because the sample mean
should not differ from the population mean by more than half.
To calculate the sample size, we can plug in the values:
n = (1.96^2 * 2.6^2) / (0.5^2)
n = (3.8416 * 6.76) / 0.25
n ≈ 103.8769
Rounding up to the nearest whole number, the required sample
size is approximately 104.
Therefore, a sample size of 104 accounts should be taken from
the set of 2000 accounts to estimate the population mean with a
95% confidence level, ensuring that the sample mean does not
differ by more than half from the population mean.
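The calculation in this solution can be reproduced with a few lines of code. This is a sketch of the infinite-population formula n = (Z^2 * σ^2) / E^2 used above; the helper name `sample_size_mean` is illustrative.

```python
from math import ceil

def sample_size_mean(z, sigma, e):
    """Sample size for estimating a mean within margin e: (z * sigma / e)^2,
    rounded up to the next whole number."""
    return ceil((z * sigma / e) ** 2)

n_required = sample_size_mean(z=1.96, sigma=2.6, e=0.5)
print(n_required)    # 104
```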
Exercise
Ex-1: Suppose a random sample of high school students is
selected to determine whether there is a difference between
how long male and female students sleep at night. The 83
randomly chosen male students yield an average of 6.6 hours
of sleep with a standard error of 1.8, and the 65 randomly
selected female students yield an average of 6.9 hours of sleep
with a standard error of 1.5. Construct a 95% confidence
interval for the difference between the mean sleep hours of
males and females.
Ex-2: You are given the following information relating to the
purchase of bulbs from two manufacturers, A and B:
Manufacturer   No. of bulbs bought   Mean life (hrs)   S.D. (hrs)
A              100                   2950              100
B              100                   2970              90
Construct a 99% confidence interval for the difference between
the mean lives of the two makes of bulbs.
Ex-3: Twenty people were attacked by a disease, and only 18
survived. Construct a confidence interval for the survival rate
at the 1% significance level. Does it favor the hypothesis that
the survival rate, if attacked by this disease, is more than 85%?
Ex-4: The following are the weights (in lbs) of a random sample of
10 employees working in the shipping department of a wholesale
grocery firm: 154, 154, 186, 243, 159, 174, 183, 163, 192, 281.
On the basis of this data, construct a confidence interval at
the 0.05 significance level for the mean weight of the firm's
shipping department employees.
Ex-5: Two types of batteries are tested for their length of life, and the
following data are obtained:
Type A (Hours): 450, 500, 600, 900, 850, 750, 930, 750, 800, 750, 630, 450, 620
Type B (Hours): 360, 530, 420, 620, 720, 420, 560, 850, 650, 450, 610
Construct a 99% confidence interval for the difference between the
two means.
Ex-6: Prices of shares of a company on different days in a month
were found to be 66, 65, 69, 70, 69, 71, 70, 63, 64, and 68. Construct a
confidence interval at the 0.05 significance level for the variance of
the share prices.
Ex-7: A random sample of size 25 is obtained from a normal
population with a mean of 100 and variance of 81,
(i) Find a 99% confidence interval for the population
mean.
(ii) Find the required sample size to obtain a width of the
confidence interval of no more than 3.
Ex-8: A random sample of size 60 is obtained from a normal
population with a mean of 80 and variance of 36,
(i) Find a 90% confidence interval for the population mean
(ii) Find the required sample size to obtain a width of
confidence interval of no more than 2.
Ex-9: A random sample of size 60 is obtained from a normal
population with a mean of 80 and variance of 36,
(i) Find a 90% confidence interval for the population mean
(ii) Find the required sample size to obtain a width of
confidence interval of no more than 2.
Ex-10: A sample of size 9 is taken from a normal distribution
with a variance of 36; the sample mean is 128,
(i) Find a 95% CI for population mean of the distribution
and interpret.
(ii) Find a 99% CI for population mean of the distribution
and interpret.
Ex-11: A sample of size 25 is taken from a normal distribution
with a standard deviation of 4; the sample mean is 85,
(i) Find a 90% CI for population mean of the distribution
and interpret.
(ii) Find a 99% CI for population mean of the distribution
and interpret.
(iii) Also, compute the sample size required in each case
to obtain a width of at most 1.5.
Ex-12: A normal distribution has a standard deviation of 15.
Estimate the sample size required if the following confidence
intervals for the mean should have a width less than 2: (i) 90%,
(ii) 95%, and (iii) 99%.
Ex-13: Suppose that we have a population with proportion P =
0.40, and a random sample of size n = 100 drawn from the
population. Find a 95% confidence interval for the population
proportion and hence compute the width of the confidence
interval.
Ex-14: The lifetime of bulbs produced by a particular company
has a mean of 1200 hours and a standard deviation of 400 hours.
The population distribution is normal. Suppose that you
purchase nine bulbs which can be regarded as a random sample
from the manufacturer's output,
(i) Find a 99% confidence interval for the population mean.
(ii) Suppose the authority is not satisfied with the interval so
obtained; it wishes to have an interval estimation so that
the width is no more than 3 hours. Estimate the sample
size required.
Ex-15: An overnight delivery service claims that, on average, 95
percent of all mail is delivered before noon the following day.
A random sample of 400 deliveries is selected.
(i) Determine a 90% confidence interval for the population
proportion.
(ii) Also, determine the sample size required to obtain a
width of no more than 0.05.
Ex-16: A local bank has 2000 depositors, with 40 percent of
these depositors having current as well as savings accounts. The
rest have only the current accounts. A random sample of 400
such accounts has been selected.
(i) Compute a 99% confidence interval for population
proportion.
(ii) Also, determine the sample size required for a 99%
confidence interval with a width less than 0.03.
Ex-17: A random sample of 16 housewives has an average body
weight of 52 kg and a standard deviation of 3.6 kg. Compute
the standard error of the mean weight and a 99% confidence
interval for the population mean weight.
Ex-18: The lengths of metal rods produced by an industrial
process are normally distributed with a standard deviation of 1.8
mm. Based on a random sample of nine observations from this
population, the 99% confidence interval for the mean has been
found as 194.65 < μ < 197.75. Now suppose that a production
manager believes that the interval is too wide for practical use
and instead requires a 99% confidence interval extending no
further than 0.50 mm on each side of the sample mean. How
large a sample is needed to achieve such an interval?
Ex-19: The sales manager of a large manufacturing company
wants to check the inventory records against the physical
inventories by a sampling study. He indicates that (i) the
maximum sampling error should not be more than 10% above
or below the true proportion of inaccurate records, (ii) the
confidence level is 95%, and (iii) the proportion of
inaccurate records is estimated at 25% according to past
experience.
Ex-20: Suppose the university authority wishes to know if
graduate admission personnel view scores on standardized
exams as very important. In a sample of 142 observations, 78
answered 'very important.' Suppose, instead, it must be ensured
that a 95% confidence interval for the population proportion
extends no further than 0.06 on each side of the sample
proportion. How large a sample is needed for this purpose?
Ex-21: A random sample of 120 electrical bulbs is tested, and
the mean duration is found to be 78.7 hours. The standard
deviation of the duration of bulbs is 11.5 hours. (i) Find a 95%
CI for μ. (ii) The authority thinks that the interval is too wide
to support a decision, so determine the sample size that ensures
the width of this interval does not exceed 2.5 hours.
Ex-22: Suppose that the shopping times for customers at a local
store are normally distributed. A random sample of 16 shoppers
in the local grocery store had a mean time of 25 minutes.
Assume σ = 6 minutes and the shopping time is normally
distributed; find the standard error of the mean, margin of error,
and width for a 95% confidence interval for the population mean
μ.
Ex-23: A process produces bags of refined sugar. The weights
of the contents of these bags are normally distributed with SD
1.2 ounces. The contents of 25 bags had a mean weight of 19.8
ounces. Find the 99% confidence interval for the true mean
weight for all bags of sugar produced by the process.
Ex-24: The director of an electronic company is interested in
estimating the mean expenditure of customers on electrical
appliances. A random sample of 80 customers was questioned,
and the average expenditure was found to be Tk 47 thousand
with a standard deviation of Tk 16.5 thousand. Find a 95%
confidence interval for the true mean expenditure. Suppose the
director is not satisfied with the confidence interval because it
is too wide, so he wishes to know how large a sample would be
required to obtain a 95% confidence interval with a total width
of no more than Tk 4 thousand. Find the smallest sample size
that will satisfy this requirement.