Estimation and Sample Size Determination

ESTIMATION

Dr. Md. Musa Khan


Associate Professor
DEB, IIUC
Statistical Inference
Statistical inference is the process of using limited information,
a sample, to reach conclusions about a large set of data, the
population.
Two major areas of statistical inference are:
a) Estimation
b) Hypothesis testing
a) Estimation
In statistics, estimation refers to the process of using sample
data to make inferences or draw conclusions about an unknown
population parameter. The goal of estimation is to estimate the
value of a population parameter, such as the mean, proportion,
or standard deviation, based on information obtained from a
sample.
b) Hypothesis testing
A hypothesis test, also known as a test of hypothesis or
statistical hypothesis test, is a procedure used to make
inferences or draw conclusions about a population based on
sample data. It involves formulating a hypothesis about a
population parameter and using sample data to assess the
evidence against the null hypothesis.
Types of Estimation
1. Point Estimation: Point estimation involves estimating
an unknown population parameter using a single value
called a point estimate. The point estimate is a best guess
or approximation of the parameter based on the
available sample data. Examples of point estimates
include the sample mean, sample proportion, sample
variance, or sample regression coefficient.
2. Interval Estimation: Interval estimation provides a
range of plausible values within which the population
parameter is likely to lie. It involves calculating a
confidence interval, which is a range of values centered
around the point estimate. The confidence interval
quantifies the uncertainty associated with the estimation.
Commonly used confidence levels include 90%, 95%,
and 99%. For example, a 95% confidence interval for
the population mean provides a range of values within
which we are 95% confident that the true population
mean lies.
Point estimation provides a single numerical value that
represents the best estimate of the unknown population
parameter, whereas interval estimation provides a range of
values that accounts for the sampling variability and
uncertainty in the estimation process.
Estimator
Any sample statistic that is used to estimate a population
parameter is called an estimator. For example, the sample mean
X̄ can be used as an estimator of the population mean µ, and the
sample proportion p can be used as an estimator of the
population proportion π.
Estimate
An estimate is a specific observed numerical value of a statistic
used to estimate a parameter.

Table-1:

Parameter (Population)        Statistic (Sample)    Estimator
Mean µ                        x̄                     x̄ is the estimator of µ
Variance σ²                   s²                    s² is the estimator of σ²
Standard Deviation σ          s                     s is the estimator of σ
Proportion π/P                p                     p is the estimator of π/P
Correlation Coefficient ρ     r                     r is the estimator of ρ
Regression Coefficient β      b                     b is the estimator of β
Desirable properties of a good estimator
The desirable criteria of a good estimator are unbiasedness,
consistency, efficiency, and sufficiency. A brief description of
the criteria is provided below.
(i) Unbiasedness: The value of the sample average depends
on the values in the sample and may differ from sample
to sample. An estimator is said to be unbiased if its
expected value is equal to the parameter. For example,
the sample mean X̄ is an unbiased estimator of the
population mean µ because E(X̄) = µ (since the sampling
distribution is a probability distribution, its average
is referred to as the expected value).
(ii) Consistency: Consistency refers to the effect of sample
size on the accuracy of the estimator. A statistic is said
to be a consistent estimator of the population parameter
if it approaches the parameter as the sample size
increases. Thus, the sample mean X̄ is a consistent
estimator of a population mean µ if X̄ → µ as n → ∞.
(iii) Efficiency: An estimator is considered efficient if its
value remains stable from sample to sample. Hence, the
best estimator is the one that has the least variance
across samples taken randomly from the same
population. For example, for a normally distributed
population the sample mean is a better estimator of the
population mean than the median or mode because the
variance of the sample mean is less than that of the
median or mode.
(iv) Sufficiency: An estimator is said to be sufficient if it
uses all the information about the population parameter
contained in the sample. For example, the sample mean
uses all the sample values in its computation, while the
mode and the median do not; hence, the sample mean is
a better estimator.
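The unbiasedness and efficiency properties can be illustrated with a small simulation. This is only a sketch under assumed conditions: a normal population with hypothetical values µ = 50 and σ = 10, samples of size 25, and a helper name (`simulate_estimators`) chosen for illustration.

```python
import random
import statistics

def simulate_estimators(mu=50.0, sigma=10.0, n=25, reps=5000, seed=1):
    """Draw `reps` samples of size n from N(mu, sigma); record each mean and median."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return means, medians

means, medians = simulate_estimators()

# Unbiasedness: the average of the sample means should sit close to mu = 50.
print("average of sample means:   ", round(statistics.fmean(means), 3))
# Efficiency: the sample mean should vary less across samples than the median.
print("variance of sample means:  ", round(statistics.variance(means), 3))
print("variance of sample medians:", round(statistics.variance(medians), 3))
```

For a normal population the variance of the sample means should come out near σ²/n = 4, with the variance of the medians noticeably larger.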
However, a point estimate alone is often inadequate because it
may or may not be close to the respective parameter. If the
point estimate is wrong, it gives no information about the
extent of its error or deviation from the true value, although
this deviation can be reduced by increasing the sample size.
Nor does the point estimate specify how confident we can be
that it is close to the parameter. Both the amount of error and
the confidence can be expressed by a second type of estimation,
called interval estimation.
Confidence Level or Coefficient
The confidence level, also referred to as the confidence
coefficient, is the percentage of samples for which the
specified interval estimate will contain the actual parameter
(for example, the actual mean). It measures the level of
confidence or certainty associated with an interval estimate.
It is denoted by (1 - α), where α is the significance level or
the probability of making a Type I error (rejecting a true
null hypothesis). The confidence level represents the proportion
of intervals constructed from repeated sampling that would
contain the true population parameter.
For example, if we construct a 95% confidence interval for a
population mean, it means that if we were to repeat the sampling
process and construct 100 different intervals, approximately 95
of those intervals would contain the true population mean.
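The repeated-sampling interpretation can be checked by simulation. The sketch below assumes a normal population with hypothetical values µ = 100 and σ = 15 (σ treated as known), builds a 95% interval from each of 2000 samples, and counts how often the interval contains µ. The function name `coverage` is illustrative.

```python
import random
import statistics

def coverage(mu=100.0, sigma=15.0, n=40, reps=2000, z=1.96, seed=7):
    """Share of z-based 95% intervals (known sigma) that contain the true mean."""
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

rate = coverage()
print(f"share of intervals containing the true mean: {rate:.3f}")
```

The printed share should be close to 0.95, though it will not equal it exactly because the number of repetitions is finite.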
The choice of confidence level depends on the desired level of
certainty or risk of making a Type I error. Commonly used
confidence levels include 90%, 95%, and 99%. A higher
confidence level provides a wider interval, resulting in greater
certainty but potentially sacrificing precision.
Confidence Interval
A confidence interval is an interval estimate of the parameter
with a specific probability or level of confidence. Constructing
a confidence interval involves selecting a desired confidence
level, typically expressed as a percentage (e.g., 90%, 95%, or
99%).
The margin of error (ME)
The margin of error is a statistical measure that quantifies the
uncertainty or variability associated with an estimate, typically
used in the context of constructing confidence intervals. It
represents the maximum expected difference between the point
estimate and the true population parameter at a given level of
confidence; the point estimate plus or minus the margin of error
gives the range within which the true value is likely to fall.
Margin of Error = Critical value * Standard error of the estimate
Confidence interval for mean
The formula for calculating a confidence interval for the
population mean depends on whether the sample size is large (n
≥ 30) or small (n < 30) and whether the population standard
deviation (σ) is known or unknown.
Here are the formulas for constructing confidence intervals for
the population mean:
1. Large Sample Size (n ≥ 30) with Known Population
Standard Deviation (σ): The confidence interval is
calculated as:
Confidence Interval = (x̄ - z * σ / √n, x̄ + z * σ / √n)
where x̄ is the sample mean, z is the critical value from
the standard normal distribution based on the desired
confidence level, σ is the population standard deviation,
and n is the sample size.
2. Large Sample Size (n ≥ 30) with Unknown
Population Standard Deviation: In this case, the
sample standard deviation (s) is used as an estimate for
the population standard deviation. The confidence
interval is calculated as:
Confidence Interval = (x̄ - z * s / √n, x̄ + z * s / √n)
where x̄ is the sample mean, z is the critical value from
the standard normal distribution based on the desired
confidence level, s is the sample standard deviation, and
n is the sample size.
3. Small Sample Size (n < 30) with Known Population
Standard Deviation (σ): If the population is normally
distributed and the population standard deviation is
known, the sample mean is exactly normally distributed
even for a small sample, so the confidence interval is
still calculated using the standard normal distribution.
The formula is:
Confidence Interval = (x̄ - z * σ / √n, x̄ + z * σ / √n)
where x̄ is the sample mean, z is the critical value from
the standard normal distribution based on the desired
confidence level, σ is the population standard
deviation, and n is the sample size.
4. Small Sample Size (n < 30) with Unknown
Population Standard Deviation: In this case, both the
sample mean and the sample standard deviation are used
to calculate the confidence interval. The formula is:
Confidence Interval = (x̄ - t * s / √n, x̄ + t * s / √n)
where x̄ is the sample mean, t is the critical value from
the t-distribution based on the desired confidence level
and degrees of freedom (n - 1), s is the sample standard
deviation, and n is the sample size.
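All of the formulas above share the form x̄ ± (critical value) * (standard deviation) / √n, so one small helper covers them. The following is a minimal sketch, not a definitive implementation: the names `z_critical` and `mean_ci` are illustrative, z values are taken from Python's standard `statistics.NormalDist`, and a t value would have to be supplied from a t-table. The sample figures are hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value from the standard normal distribution."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_ci(xbar, sd, n, crit):
    """Return (lower, upper) = xbar -/+ crit * sd / sqrt(n).

    `sd` is sigma or s, and `crit` is a z or t critical value,
    depending on which of the cases applies.
    """
    margin = crit * sd / n ** 0.5
    return xbar - margin, xbar + margin

# Critical values for the commonly used confidence levels.
for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))

# Hypothetical sample: mean 50, known sigma 12, n = 64, 95% confidence.
low, high = mean_ci(50, 12, 64, z_critical(0.95))
print(round(low, 2), round(high, 2))
```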
Width of confidence interval
Sometimes we are interested in the width of a confidence
interval, which is simply defined as the difference between the
upper confidence limit and the lower confidence limit. There are
three factors that affect the width, namely, the value of standard
deviation, the size of sample n, and the level of confidence
required.
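The effect of each of the three factors on the width can be seen numerically. This sketch assumes a z-based interval with hypothetical baseline values (standard deviation 10, n = 100, 95% confidence); `ci_width` is an illustrative name.

```python
from statistics import NormalDist

def ci_width(sd, n, confidence):
    """Width (upper limit minus lower limit) of a z-based interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sd / n ** 0.5

base = ci_width(sd=10, n=100, confidence=0.95)
wider_sd = ci_width(sd=20, n=100, confidence=0.95)     # doubling sd doubles the width
larger_n = ci_width(sd=10, n=400, confidence=0.95)     # quadrupling n halves the width
higher_conf = ci_width(sd=10, n=100, confidence=0.99)  # higher confidence widens it

print(round(base, 3), round(wider_sd, 3), round(larger_n, 3), round(higher_conf, 3))
```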
Confidence interval for population proportion (for large
sample)
Let p̂ denote the observed proportion of ‘successes’ in a
particular sample of n observations from a population with a
proportion of successes π (also written P). Then, if n is large
enough, say, n π (1 – π) > 9, a 100(1 – α)% confidence interval
for the population proportion is given by
p̂ − zα/2 * √(p̂(1 - p̂) / n) < P < p̂ + zα/2 * √(p̂(1 - p̂) / n)
Or, p̂ ± ME, where ME, the margin of error, is given by
ME = zα/2 * √(p̂(1 - p̂) / n)
Hence, the lower limit is given by
P1 = p̂ − zα/2 * √(p̂(1 - p̂) / n)
and the upper limit by
P2 = p̂ + zα/2 * √(p̂(1 - p̂) / n).
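The large-sample interval for a proportion can be sketched in a few lines. The function name `proportion_ci` and the sample counts are hypothetical; the critical value comes from the standard library's `NormalDist`.

```python
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Large-sample interval: p_hat -/+ z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 120 successes in 400 observations, 95% confidence.
low, high = proportion_ci(120, 400, 0.95)
print(round(low, 4), round(high, 4))
```

The sketch does not check the large-sample condition; with small n or a proportion near 0 or 1 the normal approximation may be poor.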

Example 1: The sponsor of a television program targeted at the
children's market wants to find out the average period of time
children spend watching television. A random sample of 100
children indicated the average time spent by them per week to
be 27.2 hours. From the previous experience, the population
standard deviation of the weekly extent of watching television
is known to be 8 hours. Calculate a 95% confidence interval for
the population mean watching time if it is considered to be
adequate for making decisions about the average period of
watching television.
Solution:
To calculate the 95% confidence interval for the population
mean watching time, we can use the formula for a confidence
interval when the population standard deviation is known:
Confidence Interval = (x̄ - z * (σ / √n), x̄ + z * (σ / √n))
Where:
• Confidence Interval: The range of values within which
the true population mean is likely to fall.
• x̄: The sample mean watching time.
• z: The critical value from the standard normal
distribution corresponding to the desired confidence
level. For a 95% confidence level, z ≈ 1.96.
• σ: The population standard deviation.
• n: The sample size.
Given the information in the problem, we have:
• x̄ (sample mean watching time) = 27.2 hours.
• z (critical value for a 95% confidence level) ≈ 1.96.
• σ (population standard deviation) = 8 hours.
• n (sample size) = 100 children.
Plugging these values into the formula, we get:
Confidence Interval = (27.2 - 1.96 * (8 / √100), 27.2 + 1.96 * (8
/ √100))
Simplifying this equation will give us the lower and upper
bounds of the confidence interval.
Calculating the values:
Confidence Interval = (27.2 - 1.96 * (8 / 10), 27.2 + 1.96 * (8 /
10))
Confidence Interval = (27.2 - 1.96 * 0.8, 27.2 + 1.96 * 0.8)
Confidence Interval = (27.2 - 1.568, 27.2 + 1.568)
Confidence Interval = (25.632, 28.768)
Therefore, the 95% confidence interval for the population mean
watching time is (25.632, 28.768) hours. This means that we can
be 95% confident that the true average time children spend
watching television per week falls within this interval, based
on the given sample.
Example 2: Gasoline prices rose dramatically during the recent
months of this year. A study was conducted with 24 trucks, and
the fuel consumption, in miles per gallon, for these 24 trucks was
15.5 21.0 18.5 19.3 19.7 16.9 20.2 14.5
16.5 19.2 18.7 18.2 18.0 17.5 18.5 20.5
18.6 19.1 19.8 18.0 19.8 18.2 20.3 21.8
i) Estimate the population's mean fuel consumption for the
trucks with 90% confidence.
ii) Later, it has been found that the actual population's mean
consumption is 20 miles per gallon. What conclusion might the
statistician draw about his sample?
Solution:
To estimate the population's mean fuel consumption for the
trucks with 90% confidence, we can use the formula for a
confidence interval when the population standard deviation is
unknown. The formula is:
Confidence Interval = (x̄ - t * (s / √n), x̄ + t * (s / √n))
Where:
• Confidence Interval: The range of values within which
the true population mean is likely to fall.
• x̄: The sample mean fuel consumption.
• t: The critical value from the t-distribution
corresponding to the desired confidence level and
degrees of freedom. For a 90% confidence level with 23
degrees of freedom (24 - 1), t ≈ 1.714.
• s: The sample standard deviation.
• n: The sample size.
Given the fuel consumption data for the 24 trucks, we can
calculate the sample mean (x̄) and sample standard deviation (s).
Sample mean (x̄) = (15.5 + 21.0 + 18.5 + ... + 18.2 + 20.3 +
21.8) / 24 = 448.3 / 24
Sample mean (x̄) ≈ 18.679
Sample standard deviation (s) = √ [((15.5 - x̄)^2 + (21.0 - x̄)^2
+ ... + (21.8 - x̄)^2) / (n - 1)]
Sample standard deviation (s) ≈ 1.695
Using the formula and plugging in the values, we get:
Confidence Interval = (18.679 - 1.714 * (1.695 / √24), 18.679 +
1.714 * (1.695 / √24))
Calculating the values:
Confidence Interval = (18.679 - 1.714 * 0.346, 18.679 + 1.714
* 0.346)
Confidence Interval = (18.679 - 0.593, 18.679 + 0.593)
Confidence Interval = (18.086, 19.272)
i) Therefore, the 90% confidence interval for the population's
mean fuel consumption for the trucks is (18.086, 19.272) miles
per gallon. This means that we can be 90% confident that the
true average fuel consumption for the trucks falls within this
interval based on the given sample.
ii) If the actual population mean consumption is known to be 20
miles per gallon, we can compare it to the confidence interval.
In this case, the population mean of 20 falls outside the
confidence interval (18.086, 19.272). Therefore, the statistician
may conclude either that the sample is not representative of the
population (for example, it was not drawn at random from it), or
that this is one of the roughly 10% of samples whose 90%
confidence interval misses the true mean.
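The calculation for Example 2 can be reproduced directly from the 24 observations. This is a sketch using only the standard library; the t value 1.714 is the table value quoted in the example (23 degrees of freedom, 90% confidence), not something computed here.

```python
import statistics

# Fuel consumption (miles per gallon) for the 24 trucks in Example 2.
mpg = [
    15.5, 21.0, 18.5, 19.3, 19.7, 16.9, 20.2, 14.5,
    16.5, 19.2, 18.7, 18.2, 18.0, 17.5, 18.5, 20.5,
    18.6, 19.1, 19.8, 18.0, 19.8, 18.2, 20.3, 21.8,
]

n = len(mpg)
xbar = statistics.fmean(mpg)   # sample mean
s = statistics.stdev(mpg)      # sample standard deviation (divides by n - 1)
t = 1.714                      # t-table value quoted in the example

margin = t * s / n ** 0.5
low, high = xbar - margin, xbar + margin

print("mean:", round(xbar, 3), "s:", round(s, 3))
print("90% CI:", (round(low, 3), round(high, 3)))
print("is 20 mpg inside the interval?", low <= 20 <= high)
```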
Example 3: Monomoy is a student at a private university who
wants to buy a new model car for his personal use. He has
decided that he will buy a 2010 Toyota. He randomly selected
100 sales advertisements from the local newspapers over a
period of six months and found that the average car price in this
sample was $4500 (expressed in dollars to make the figures
smaller). He also knows that the standard deviation of such new
model car prices is $520. Establish a 95% confidence interval
for the true average price for all new model cars in this category.
Solution:
To establish a 95% confidence interval for the true average price
of all new model cars in the category, we can use the formula
for a confidence interval when the population standard
deviation is known.
The formula is:
Confidence Interval = (x̄ - z * (σ / √n), x̄ + z * (σ / √n))
Where:
• Confidence Interval: The range of values within which
the true population mean is likely to fall.
• x̄: The sample mean price.
• z: The critical value from the standard normal
distribution corresponding to the desired confidence
level. For a 95% confidence level, z ≈ 1.96.
• σ: The population standard deviation.
• n: The sample size.
Given the information in the problem, we have:
• x̄ (sample mean price) = $4500.
• z (critical value for a 95% confidence level) ≈ 1.96.
• σ (population standard deviation) = $520.
• n (sample size) = 100 cars.
Plugging these values into the formula, we get:
Confidence Interval = ($4500 - 1.96 * ($520 / √100), $4500 +
1.96 * ($520 / √100))
Simplifying this equation will give us the lower and upper
bounds of the confidence interval.
Calculating the values:
Confidence Interval = ($4500 - 1.96 * ($520 / 10), $4500 + 1.96
* ($520 / 10))
Confidence Interval = ($4500 - 1.96 * 52, $4500 + 1.96 * 52)
Confidence Interval = ($4500 - 101.92, $4500 + 101.92)
Confidence Interval = ($4398.08, $4601.92)
Therefore, the 95% confidence interval for the true average
price of all new model cars in the category is ($4398.08,
$4601.92).
This means that we can be 95% confident that the true average
price falls within this interval based on the given sample.
Example 4: The breaking strains of reels of string produced at
a factory have a standard deviation of 1.5 kg. A sample of 100
reels from a certain batch was tested, and the mean breaking
strain was found as 5.30 kg.
(i) Find a 95% confidence interval for the mean breaking
strain of the string.
(ii) The manufacturer becomes concerned if the lower 95%
confidence limit falls below 5 kg. A sample of 80 reels
from another batch gave a mean breaking strain of 5.31
kg. Will the manufacturer be concerned?
Solution:
The distribution of breaking strain is not known, but the sample
is quite large; hence, by virtue of the central limit theorem, the
sample mean will be approximately normally distributed with
mean µ and standard error σ / √n = 1.5 / √100 = 0.15, which
means x̄ ~ N(µ, 0.15²).
Here, x̄ = 5.30, and for a 95% confidence interval, the value of
z is 1.96.
(i) A 95% confidence interval for the population mean
breaking strain is given by
x̄ ± 1.96 * σ / √n = 5.30 ± 1.96 × 0.15 = (5.006, 5.594).
(ii) If the mean breaking strain for another sample of size 80 is
found as 5.31, the lower 95% confidence limit would be
x̄ − 1.96 * σ / √n = 5.31 − 1.96 × 1.5 / √80 ≈ 4.98.
Hence the manufacturer would be concerned, because this
lower limit falls below 5 kg.
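The manufacturer's rule in Example 4 amounts to comparing the lower 95% confidence limit with 5 kg. A minimal sketch of that check follows; the function name `lower_limit` is illustrative, and z = 1.96 is the value used in the example.

```python
def lower_limit(xbar, sigma, n, z=1.96):
    """Lower confidence limit: xbar - z * sigma / sqrt(n)."""
    return xbar - z * sigma / n ** 0.5

THRESHOLD_KG = 5.0  # the manufacturer is concerned below this value

first_batch = lower_limit(5.30, 1.5, 100)
second_batch = lower_limit(5.31, 1.5, 80)

for label, value in (("first batch", first_batch), ("second batch", second_batch)):
    status = "concern" if value < THRESHOLD_KG else "no concern"
    print(label, round(value, 3), status)
```

Although the second sample mean is slightly higher, its smaller sample size gives a larger standard error, which pulls the lower limit down.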
Example 5: A survey asked the mobile phone users of a particular
operator whether the service provided by the operator is
sufficient to satisfy a customer. 1200 users were
randomly selected, and 780 of them answered in the affirmative.
Construct a 99% confidence interval for the corresponding true
proportion of affirmative answers among all
users.
Solution:
We know the 99% confidence interval for the population proportion
is given by p ± z0.005 * √(p(1 - p) / n).
Here, the sample size is n = 1200 and the sample proportion of
users who consider the service satisfactory is p = 780/1200 = 0.65,
and zα/2 = z0.005 = 2.58 (from the standard normal table).
Substituting the values of n, p, and z0.005, we have
P1 = 0.65 − 2.58 * √(0.65(1 - 0.65) / 1200) = 0.614 and
P2 = 0.65 + 2.58 * √(0.65(1 - 0.65) / 1200) = 0.686
Hence the 99% confidence interval for P is given by (0.614,
0.686).
Example 6: Management wants to estimate the proportion of
the corporation’s employees who favor a modified bonus plan.
From a random sample of 344 employees, it was found that 261
were in favor of this particular plan. Find a 90% confidence
interval estimate of the true population proportion that favors
this modified plan.
Solution:
Suppose π denotes the true population proportion and p denotes
the sample proportion; then the 100(1 – α)% confidence interval
for the population proportion is given by p ± zα/2 * √(p(1 - p) / n).
Here, n = 344, so p = 261/344 = 0.759, and for 90% confidence
zα/2 = z0.05 = 1.645.
Therefore, the required confidence interval is given by
P1 = 0.759 − 1.645 * √(0.759(1 - 0.759) / 344) = 0.721 and
P2 = 0.759 + 1.645 * √(0.759(1 - 0.759) / 344) = 0.797
Hence the 90% confidence interval is (0.721, 0.797).

Confidence interval for the difference between two population
means
The formula for calculating a confidence interval for the
difference between two population means, assuming large
samples and known population standard deviations, is as
follows:
Confidence Interval = (x̄₁ - x̄₂) ± z * √ ((σ₁² / n₁) + (σ₂² / n₂))
Where:
• Confidence Interval: The range of values within which
the true difference between the population means is
likely to fall.
• x̄₁ and x̄₂: The means of the samples drawn from the two
populations.
• z: The critical value from the standard normal
distribution corresponding to the desired confidence
level.
• σ₁ and σ₂: The population standard deviations of the two
populations.
• n₁ and n₂: The sizes of the samples drawn from the two
populations.
Note: The sample sizes (n₁ and n₂) should be large (typically
considered as n₁ ≥ 30 and n₂ ≥ 30) for the Central Limit Theorem
to apply.
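The large-sample formula for the difference between two means can be sketched as follows. The function name `diff_means_ci` and all numbers in the usage lines are hypothetical, chosen only to show the mechanics.

```python
from statistics import NormalDist

def diff_means_ci(x1, x2, sigma1, sigma2, n1, n2, confidence=0.95):
    """(x1 - x2) -/+ z * sqrt(sigma1^2 / n1 + sigma2^2 / n2)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = (sigma1 ** 2 / n1 + sigma2 ** 2 / n2) ** 0.5
    diff = x1 - x2
    return diff - z * se, diff + z * se

# Hypothetical samples: means 75 and 70, sigmas 8 and 6, sizes 50 and 60.
low, high = diff_means_ci(75, 70, 8, 6, 50, 60, confidence=0.95)
print(round(low, 3), round(high, 3))
```

If the resulting interval excludes zero, the data suggest the two population means differ at the chosen confidence level.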
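Confidence interval for the difference between two population
means (small sample)
Confidence Interval = ((x̄₁ − x̄₂) − t * s * √(1/n₁ + 1/n₂), (x̄₁ − x̄₂) + t * s * √(1/n₁ + 1/n₂))
where s² = ((n₁ − 1) * s₁² + (n₂ − 1) * s₂²) / (n₁ + n₂ − 2) is the
pooled sample variance, and t is the critical value from the
t-distribution with (n₁ + n₂ − 2) degrees of freedom.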
Sample size determination for mean
For large sample
When dealing with a large sample, the sample size estimation
formula for estimating the population mean is simplified
because the sample mean tends to follow a normal distribution
due to the Central Limit Theorem. In this case, you can use the
following formula:
For infinite (or unknown-size) population
n = (Z² * σ²) / (E²)
Where:
• n: The required sample size.
• Z: The critical value from the standard normal
distribution corresponding to the desired confidence
level. It represents the number of standard deviations
from the mean that will include the desired proportion
of the population. For example, for a 95% confidence
level, Z ≈ 1.96.
• σ: The estimated or known standard deviation of the
population.
• E: The desired margin of error, which represents the
maximum allowed difference between the sample mean
and the true population mean.
For finite population
n = (Z² * σ² * N) / [(Z² * σ²) + (E² * (N - 1))]
Where:
• n: The required sample size.
• Z: The critical value from the standard normal
distribution corresponding to the desired confidence
level. It represents the number of standard deviations
from the mean that will include the desired proportion
of the population.
• σ: The estimated or known standard deviation of the
population.
• E: The desired margin of error, which represents the
maximum allowed difference between the sample mean
and the true population mean.
• N: The approximate population size. If the population is
very large or unknown, N can be considered infinite (∞).
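Both formulas can be wrapped in one helper that switches on whether a population size is supplied. This is a sketch under assumptions: the name `sample_size_mean` and the inputs (σ = 12, E = 2, N = 1000) are hypothetical, and the result is rounded up because a sample size must be a whole number.

```python
import math

def sample_size_mean(z, sigma, e, N=None):
    """Required n for estimating a mean within margin e.

    N=None uses the infinite-population formula; otherwise the
    finite-population formula with population size N is used.
    """
    base = z ** 2 * sigma ** 2
    if N is None:
        n = base / e ** 2
    else:
        n = base * N / (base + e ** 2 * (N - 1))
    return math.ceil(n)

n_infinite = sample_size_mean(1.96, 12, 2)
n_finite = sample_size_mean(1.96, 12, 2, N=1000)
print(n_infinite, n_finite)
```

The finite-population result is never larger than the infinite-population one, and the gap shrinks as N grows.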
For small sample size
When dealing with a small sample size and unknown population
standard deviation, the sample size estimation formula for
estimating the population mean can be adjusted using the t-
distribution. In this case, you can use the following formula:
For infinite (or unknown-size) population
n = [(t * σ) / E] ²
For finite population
n = (t² * σ² * N) / [(t² * σ²) + (E² * (N - 1))]
Where:
• n: The required sample size.
• t: The critical value from the t-distribution
corresponding to the desired confidence level, degrees
of freedom, and tail area. It represents the number of
standard deviations from the mean that will include the
desired proportion of the population. The degrees of
freedom depend on the sample size and are typically n -
1.
• σ: The estimated standard deviation of the population
(the true value being unknown).
• E: The desired margin of error, which represents the
maximum allowed difference between the sample mean
and the true population mean.
• N: The approximate population size. If the population
is very large or unknown, N can be considered infinite
(∞).
Sample size determination for proportion
The sample size estimation formula for estimating a proportion
in a population is given by:
For infinite (or unknown-size) population
n = (Z² * p * (1 - p)) / (E²)
For finite population
n = (Z² * p * (1 - p) * N) / [(Z² * p * (1 - p)) + (E² * (N - 1))]
Where:
• n: The required sample size.
• Z: The critical value from the standard normal
distribution corresponding to the desired confidence
level. It represents the number of standard deviations
from the mean that will include the desired proportion
of the population. For example, for a 95% confidence
level, Z ≈ 1.96.
• p: The estimated or expected proportion of the
population.
• E: The desired margin of error, which represents the
maximum allowed difference between the sample
proportion and the true population proportion.
• N: The size of the finite population.
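The two proportion formulas can be sketched the same way. The name `sample_size_proportion` and the inputs are hypothetical. The default p = 0.5 is an assumption that gives the largest possible value of p * (1 - p) and therefore the most conservative sample size when no prior estimate of p is available.

```python
import math

def sample_size_proportion(z, e, p=0.5, N=None):
    """Required n for estimating a proportion within margin e.

    N=None uses the infinite-population formula; otherwise the
    finite-population formula with population size N is used.
    """
    base = z ** 2 * p * (1 - p)
    if N is None:
        n = base / e ** 2
    else:
        n = base * N / (base + e ** 2 * (N - 1))
    return math.ceil(n)

n_infinite = sample_size_proportion(1.96, 0.05)
n_finite = sample_size_proportion(1.96, 0.05, N=2000)
print(n_infinite, n_finite)
```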
Problem 1: A researcher wants to estimate the average weight
of apples in a specific orchard with a 95% confidence level and
a margin of error of 0.1 kg. The researcher knows from prior
knowledge that the standard deviation of apple weights in this
orchard is approximately 0.5 kg. What sample size is needed to
estimate the population mean within the desired margin of
error?
Solution:
We can use the formula for sample size estimation:
n = (Z² * σ²) / (E²)
Given:
Z = 1.96 (corresponding to a 95% confidence level from the
standard normal distribution)
σ = 0.5 kg (standard deviation of apple weights)
E = 0.1 kg (desired margin of error)
Substituting the values into the formula:
n = (1.96² * 0.5²) / (0.1²)
n = (3.8416 * 0.25) / 0.01
n = 0.9604 / 0.01
n = 96.04
Since the sample size cannot be fractional, we round up to the
nearest whole number. Therefore, a sample size of 97 apples is
needed to estimate the average weight of apples within a margin
of error of 0.1 kg at a 95% confidence level.
Problem-2: Determine the size of the sample for estimating the
per capita income for the universe with N=5000 on the basis of
the following information: The standard deviation of per capita
income on the basis of past records = 0.75. The estimate should
be within a 5% error of the true income with a 95% confidence
level. Will there be a change in the size of the sample if we
assume infinite population in the given case? If so, explain how
much.
Solution:
To determine the sample size for estimating the per capita
income for a universe with N = 5000, we can use the following
formula:
n = (Z² * σ² * N) / [(Z² * σ²) + (E² * (N - 1))]
Given information: N = 5000 (size of the universe)
σ = 0.75 (standard deviation of per capita income)
E = 0.05 (desired margin of error)
Z = 1.96 (corresponding to a 95% confidence level)
Now, let's calculate the sample size using the formula:
n = (1.96² * 0.75² * 5000) / [(1.96² * 0.75²) + (0.05² * (5000 -
1))]
n = (2.1609 * 5000) / (2.1609 + 12.4975)
n = 10804.5 / 14.6584
n ≈ 737.09
Since the sample size cannot be fractional, we round up to the
nearest whole number. Therefore, a sample size of 738 will be
required to estimate the per capita income for the universe with
a 95% confidence level and a 5% margin of error.
Now, let's consider the case where we assume an infinite
population. In such a case, we use the sample size estimation
formula for an infinite population, which is:
n = (Z² * σ²) / (E²)
Using this formula, the sample size estimation for an infinite
population would be:
n = (1.96² * 0.75²) / (0.05²)
n = 2.1609 / 0.0025
n ≈ 864.36
Rounding up, a sample size of 865 would be required. Thus, in
the case of an infinite population, the sample size becomes
larger, by 865 - 738 = 127 units.
Problem-3: What would be the size of the sample if a simple
random sample from a population of 6000 items is to be drawn
to estimate the percent defective within 3 percent of the true
value with 95 percent probability? What would be the size of
the sample if the population is assumed to be infinite in the
given case?
Solution:
To determine the sample size for estimating the percent
defective within 3 percent of the true value with a 95 percent
probability, we can use the sample size estimation formula for
proportions.
Given information: N = 6000 (size of the population), E = 0.03
(desired margin of error, i.e., 3% of the true value), Z = 1.96
(corresponding to a 95% confidence level from the standard
normal distribution)
First, let's calculate the sample size assuming a finite
population:
n = (Z² * p * (1 - p) * N) / [(Z² * p * (1 - p)) + (E² * (N - 1))]
Since the true proportion p is unknown, we assume p = 0.5 to
get the maximum required sample size.
n = (1.96² * 0.5 * (1 - 0.5) * 6000) / [(1.96² * 0.5 * (1 - 0.5)) +
(0.03² * (6000 - 1))]
n = (0.9604 * 6000) / (0.9604 + 5.3991)
n = 5762.4 / 6.3595
n ≈ 906.11
Since the sample size cannot be fractional, we round up to the
nearest whole number. Therefore, a sample size of 907 will be
required to estimate the percent defective within 3% of the true
value with a 95% probability when assuming a finite
population.
Next, let's consider the case where we assume an infinite
population:
n = (Z² * p * (1 - p)) / (E²)
Using this formula, we can calculate the sample size assuming
an infinite population:
n = (1.96² * 0.5 * (1 - 0.5)) / (0.03²)
n = 0.9604 / 0.0009
n ≈ 1067.11
Again, rounding up to the nearest whole number, we would
require a sample size of 1068 if we assume an infinite
population.
Therefore, the sample size would be 907 if the population is
finite (N = 6000), and it would be 1068 if the population is
assumed to be infinite in the given case.
Problem-4: What should be the sample size from a set of 2000
accounts if the standard deviation of default as per past
experience was 2.6 when a 95% confidence is desired, and the
sample mean should not differ by more than half from the
population means?
Solution:
To determine the required sample size for estimating the
population mean with a desired level of confidence and a
specific margin of error, we can use the formula:
n = (Z² * σ²) / E²
Where:
n = sample size
Z = Z-score corresponding to the desired confidence level (a
95% confidence level corresponds to a Z-score of approximately
1.96)
σ = standard deviation of the population
E = desired margin of error
In this case, the standard deviation (σ) is given as 2.6, and the
desired margin of error (E) is 0.5, because the sample mean
should not differ from the population mean by more than half a
unit.
To calculate the sample size, we can plug in the values:
n = (1.96² * 2.6²) / (0.5²)
n = (3.8416 * 6.76) / 0.25
n = 25.9692 / 0.25
n ≈ 103.88
Rounding up to the nearest whole number, the required sample
size is 104.
Therefore, a sample size of 104 accounts should be taken from
the set of 2000 accounts to estimate the population mean with a
95% confidence level, ensuring that the sample mean does not
differ by more than half from the population mean.
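The solution above applies the infinite-population formula even though the set contains 2000 accounts. As a side check, not part of the original solution, the sketch below evaluates both formulas from this chapter with the same inputs to show how much the finite-population adjustment would change the answer.

```python
import math

Z, SIGMA, E, N = 1.96, 2.6, 0.5, 2000  # values given in Problem-4

# Infinite-population formula: n = Z^2 * sigma^2 / E^2
n_infinite = (Z ** 2 * SIGMA ** 2) / E ** 2

# Finite-population formula: n = Z^2 sigma^2 N / (Z^2 sigma^2 + E^2 (N - 1))
n_finite = (Z ** 2 * SIGMA ** 2 * N) / (Z ** 2 * SIGMA ** 2 + E ** 2 * (N - 1))

print("infinite-population formula:", round(n_infinite, 2), "->", math.ceil(n_infinite))
print("finite-population formula:  ", round(n_finite, 2), "->", math.ceil(n_finite))
```

Using the larger of the two figures is the more cautious choice, since it never falls short of the precision requirement.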
Exercise
Ex-1: Suppose a random sample of high school
students is selected to determine if there is a difference between
how long male and female students sleep at night. 83 male
students are randomly chosen and yield an average of 6.6 hours
of sleep with a standard error of 1.8, and 65 female students are
randomly selected with an average of 6.9 hours of sleep with a
standard error of 1.5. Construct a 95% confidence interval for
the difference between the mean sleep hours of males and
females.
Ex-2: You are given the following information relating to the
purchase of bulbs from two manufacturers, A and B:
Manufacturer    No. of bulbs bought    Mean life    S.D.
A               100                    2950 hrs.    100 hrs.
B               100                    2970 hrs.    90 hrs.
Construct a 99% confidence interval for the difference between
the mean lives of the two makes of bulbs.
Ex-3: Twenty people were attacked by a disease, and only 18
survived. Construct a 99% confidence interval for the survival
rate, and use it to assess the claim that the survival rate, if
attacked by this disease, is more than 85%.
Ex-4: The following are the weights (in lbs) of a random sample of
10 employees working in the shipping department of a wholesale
grocery firm: 154, 154, 186, 243, 159, 174, 183, 163, 192, 281.
On the basis of this data, construct a 95% confidence interval for
the mean weight of the firm's shipping department employees.

Ex-5: Two types of batteries are tested for their length of life, and the
following data are obtained:
Type A (Hours): 450, 500, 600, 900, 850, 750, 930, 750, 800, 750, 630, 450, 620
Type B (Hours): 360, 530, 420, 620, 720, 420, 560, 850, 650, 450, 610
Construct a 99% confidence interval for the difference between the
two means.
Ex-6: Prices of shares of a company on different days in a month
were found to be 66, 65, 69, 70, 69, 71, 70, 63, 64, and 68. Construct
a 95% confidence interval for the variance of the share prices.
Ex-7: A random sample of size 25 is obtained from a normal
population with a mean of 100 and variance of 81.
(i) Find a 99% confidence interval for the population
mean.
(ii) Find the required sample size to obtain a width of the
confidence interval of no more than 3.
Ex-8: A random sample of size 60 is obtained from a normal
population with a mean of 80 and variance of 36,
(i) Find a 90% confidence interval for the population mean
(ii) Find the required sample size to obtain a width of
confidence interval of no more than 2.
Ex-9: A random sample of size 60 is obtained from a normal
population with a mean of 80 and variance of 36,
(i) Find a 90% confidence interval for the population mean
(ii) Find the required sample size to obtain a width of
confidence interval of no more than 2.
Ex-10: A sample of size 9 is taken from a normal distribution
with a variance of 36; the sample mean is 128.
(i) Find a 95% CI for the population mean μ of the distribution
and interpret.
(ii) Find a 99% CI for the population mean μ of the distribution
and interpret.
Ex-11: A sample of size 25 is taken from a normal distribution
with a standard deviation of 4; the sample mean is 85.
(i) Find a 90% CI for the population mean μ of the distribution
and interpret.
(ii) Find a 99% CI for the population mean μ of the distribution
and interpret.
(iii) Also, compute the sample size required in both cases
to obtain a maximum width of 1.5.
Ex-12: A normal distribution has a standard deviation of 15.
Estimate the sample size required if the confidence interval for
the mean should have a width less than 2 at a confidence level of
i) 90%, ii) 95%, and iii) 99%.
Ex-13: Suppose that we have a population with proportion P =
0.40, and a random sample of size n = 100 drawn from the
population. Find a 95% confidence interval for the population
proportion and hence compute the width of the confidence
interval.
Ex-14: The lifetime of bulbs produced by a particular company
has a mean of 1200 hours and a standard deviation of 400 hours.
The population distribution is normal. Suppose that you
purchase nine bulbs, which can be regarded as a random sample
from the manufacturer's output.
(i) Find a 99% confidence interval for the population mean.
(ii) Suppose the authority is not satisfied with the interval so
obtained; it wishes to have an interval estimate whose
width is no more than 3 hours. Estimate the sample
size required.
Ex-15: An overnight delivery service claims that, on average, 95
percent of all mail is delivered before noon the following day.
A random sample of 400 deliveries is selected.
(i) Determine a 90% confidence interval for the population
proportion.
(ii) Also, determine the sample size required to obtain a
width of no more than 0.05.
Ex-16: A local bank has 2000 depositors, with 40 percent of
these depositors having current as well as savings accounts. The
rest have only current accounts. A random sample of 400
such accounts has been selected.
(i) Compute a 99% confidence interval for the population
proportion.
(ii) Also, determine the sample size required for a 99%
confidence interval with a width less than 0.03.
Ex-17: A random sample of 16 housewives has an average body
weight of 52 kg and a standard deviation of 3.6 kg. Compute
the standard error of the mean weight and a 99% confidence
interval for the population mean weight.
Ex-18: The lengths of metal rods produced by an industrial
process are normally distributed with a standard deviation of 1.8
mm. Based on a random sample of nine observations from this
population, the 99% confidence interval for the mean has been
found as 194.65 < μ < 197.75. Now suppose that a production
manager believes that the interval is too wide for practical use
and instead requires a 99% confidence interval extending no
further than 0.50 mm on each side of the sample mean. How
large a sample is needed to achieve such an interval?
Ex-19: The sales manager of a large manufacturing company
wants to check the inventory records against the physical
inventories by a sampling study. He indicates that i) the
maximum sampling error should not be more than 10% above
or below the true proportion of inaccurate records, ii) the
confidence level is 95%, and iii) the proportion of inaccurate
records is estimated at 25% according to past experience.
Determine the required sample size.
Ex-20: Suppose the university authority wishes to know if
graduate admission personnel view scores on standardized
exams as very important. In a sample of 142 observations, 78
answered 'very important.' Suppose, instead, it must be ensured
that a 95% confidence interval for the population proportion
extends no further than 0.06 on each side of the sample
proportion. How large a sample is needed for this purpose?
Ex-21: A random sample of 120 electrical bulbs is tested, and
the mean duration is found as 78.7 hours. The standard deviation
of the duration of bulbs is 11.5 hours. (i) Find a 95% CI for the
population mean μ. (ii) The authority thinks that the width of the
interval is too large to decide, so determine the sample size
needed to ensure that the width of this interval does not exceed
2.5 hours.
Ex-22: Suppose that the shopping times for customers at a local
store are normally distributed. A random sample of 16 shoppers
in the local grocery store had a mean time of 25 minutes.
Assume σ = 6 minutes and the shopping time is normally
distributed; find the standard error of the mean, margin of error,
and width for a 95% confidence interval for the population mean
μ.
Ex-23: A process produces bags of refined sugar. The weights
of the contents of these bags are normally distributed with SD
1.2 ounces. The contents of 25 bags had a mean weight of 19.8
ounces. Find the 99% confidence interval for the true mean
weight for all bags of sugar produced by the process.
Ex-24: The director of an electronic company is interested in
estimating the mean expenditure of customers on electrical
appliances. A random sample of 80 customers was questioned,
and the average expenditure was found to be Tk 47 thousand with
a standard deviation of Tk 16.5 thousand. Find a 95% confidence
interval for the true mean expenditure. Suppose the director is
not satisfied with the confidence interval because it is too wide,
so he wishes to know how large a sample would be required to
obtain a 95% confidence interval with a total width of no more
than Tk 4 thousand. Find the smallest sample size that will
satisfy this desire.
