SAMPLING & ESTIMATION
Main Issues
Universe/Population
Sampling Unit
Sampling Frame
Sample Size
Budgetary Constraints
Sampling Procedure
Criteria of Design
Cost of collecting & analyzing data: minimise the cost of sampling
Cost of incorrect inferences: systematic bias & sampling error lead to incorrect inferences
Systematic bias – Inherent in the System
Sampling error – Random variation, controllable by sample size
Sampling Methods
A. Non-random/Non-probability-based sampling
– Convenience/judgmental/purposive/quota sampling
B. Random/Probability-based sampling
1. Simple random sampling
Each element/item has an equal chance of being included in the sample.
Randomness.
Sampling with/without replacement
Random number table, pseudo-random number generator.
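A minimal sketch of simple random sampling with a pseudo-random number generator, using Python's standard `random` module. The population of 100 numbered items and the seed are assumptions made only for illustration.

```python
import random

population = list(range(1, 101))  # hypothetical population of 100 numbered items
rng = random.Random(42)           # seeded pseudo-random number generator

# Without replacement: no item can appear twice in the sample.
sample_without = rng.sample(population, k=10)

# With replacement: every draw is independent, so repeats are possible.
sample_with = rng.choices(population, k=10)

print(sample_without)
print(sample_with)
```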
2. Stratified Sampling
Each stratum is a homogeneous group and different from other strata.
Random selection from each stratum.
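A rough sketch of stratified sampling: a simple random sample is drawn inside each stratum. The strata names, their sizes, and the choice of proportional allocation (same fraction from every stratum) are assumptions for illustration; other allocation rules are possible.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the given fraction from each stratum."""
    sample = {}
    for name, members in strata.items():
        k = max(1, round(fraction * len(members)))  # at least one item per stratum
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical strata: 60 urban and 40 rural units, identified by number.
strata = {"urban": list(range(0, 60)), "rural": list(range(60, 100))}
picked = stratified_sample(strata, 0.1, random.Random(1))
print(picked)
```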
3. Systematic sampling
Elements selected at a uniform interval.
Selection evenly spread, less cost & time, more convenient.
Can give a biased sample if the list has a hidden periodicity that coincides with the sampling interval.
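A small sketch of systematic sampling, followed by a case where hidden periodicity hurts. The frame of 100 units and the 10 weeks of daily records are hypothetical; the function assumes the sample size does not exceed the list length.

```python
import random

def systematic_sample(items, n, rng):
    """Pick n items at a uniform interval after a random start."""
    k = len(items) // n        # sampling interval
    start = rng.randrange(k)   # random start inside the first interval
    return [items[start + i * k] for i in range(n)]

rng = random.Random(7)
units = list(range(1, 101))                  # hypothetical frame of 100 units
picked = systematic_sample(units, 10, rng)   # every 10th unit

# Hidden periodicity: 10 weeks of daily records sampled at an interval of 7
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"] * 10
same_day = systematic_sample(days, 10, rng)  # every pick falls on the same weekday
print(picked)
print(same_day)
```

Because the interval (7) matches the weekly cycle, the second sample never sees the other six weekdays.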
4. Cluster sampling
Little or no variation among clusters.
Clusters are selected randomly for further analysis.
Area sampling in geographical clusters.
Multi-stage sampling as a special case.
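A minimal one-stage cluster sampling sketch: whole clusters are chosen at random and every element in a chosen cluster is kept. The village names and household numbers are hypothetical. A multi-stage design would sample again inside each chosen cluster.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random; keep all elements of each selected cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical geographical clusters (area sampling): households per village
clusters = {
    "village_A": [1, 2, 3],
    "village_B": [4, 5],
    "village_C": [6, 7, 8, 9],
    "village_D": [10, 11],
}
selected = cluster_sample(clusters, 2, random.Random(3))
print(selected)
```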
ESTIMATION FROM SAMPLES
• Sampling Distribution: Distribution of a sample
statistic, usually the mean.
• Standard error ($\sigma_{\bar{x}}$): Standard deviation of the
sampling distribution.
• Mean of the sampling distribution of means ($\mu_{\bar{x}}$), taken
over all possible samples, equals the
population mean (µ); for a normal population the
sampling distribution of the mean is itself normal.
• As sample size increases, standard error decreases.
Assuming Normal Population Distribution
Standard error: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
σ = Population standard deviation
n = Sample size
Central Limit Theorem:
Irrespective of the shape of the population distribution, the sampling
distribution of the mean approaches normal as the sample size increases.
The mean of this sampling distribution is the population mean.
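The behaviour described above can be checked with a small simulation. This is only an illustrative sketch: the exponential population (mean 1, standard deviation 1), the seed, and the number of trials are assumptions chosen for the demo.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def sampling_distribution(n, trials=2000):
    """Means of `trials` random samples of size n from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]

results = {}
for n in (5, 30, 100):
    means = sampling_distribution(n)
    # (mean of sample means, observed standard error, theoretical sigma/sqrt(n))
    results[n] = (statistics.fmean(means), statistics.stdev(means), 1.0 / n ** 0.5)
    print(n, [round(v, 3) for v in results[n]])
```

Even though the population is strongly skewed, the mean of the sample means stays near the population mean, and the observed standard error shrinks roughly as σ/√n.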
Sample size vs. cost of sampling: a larger sample size lowers the standard error and raises the precision of estimation, but raises the cost of sampling.
Point Estimate
Interval Estimate.
Confidence Level:
Probability (1 − α) associated with an interval
estimate of a population parameter, where α is the level of significance.
Higher confidence level => Wider confidence
interval
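To see why a higher confidence level widens the interval, the two-sided critical value Z can be computed for several levels. This sketch uses `statistics.NormalDist` from the Python standard library in place of a printed table; the helper name `z_two_sided` is hypothetical.

```python
from statistics import NormalDist

def z_two_sided(confidence):
    """Critical value Z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_two_sided(level), 3))
```

The interval half-width is Z times the standard error, so a larger Z means a wider interval.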
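Estimation of mean from large sample (usually n > 30):
As the sample size is large, the sampling distribution of the
mean is approximately normal.
1. Compute the standard error $\sigma_{\bar{x}}$ from either the known σ or the estimated s: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ or $s/\sqrt{n}$.
2. Get the Z value from the standard normal distribution table
corresponding to the confidence level (1 − α).
3. The confidence interval is $\bar{x} \pm Z_{\alpha/2}\,\sigma_{\bar{x}}$.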
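The three steps above can be sketched as a short function. The function name and the input figures (mean 120, standard deviation 15, n = 100) are hypothetical; `NormalDist().inv_cdf` stands in for the table lookup.

```python
import math
from statistics import NormalDist

def z_interval(mean, sd, n, confidence):
    """Two-sided large-sample confidence interval for the mean."""
    std_error = sd / math.sqrt(n)                        # step 1: standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # step 2: Z for the confidence level
    return mean - z * std_error, mean + z * std_error    # step 3: interval limits

low, high = z_interval(mean=120.0, sd=15.0, n=100, confidence=0.95)
print(round(low, 2), round(high, 2))
```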
Estimation of means from small samples (n ≤ 30):
t-distribution:
Applicable for smaller sample sizes when σ is unknown.
Unimodal and bell-shaped, similar to the normal.
Flatter than normal, with heavier tails.
The larger the sample size, the less flat the distribution and
the closer it is to normal.
The value of t varies with the degrees of freedom (d.f.), i.e. (n − 1), as the distribution shape
changes.
Step 1. Compute the standard error ($s/\sqrt{n}$) as usual.
Step 2. Get the t value from the t-distribution table corresponding to (n − 1)
as d.f. and (1 − confidence level) as the total area in the two tails.
Step 3. $\bar{x} \pm t_{\alpha/2}\, s/\sqrt{n}$ is the confidence interval/limit.
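A sketch of the small-sample steps, with the table lookup represented by a few hard-coded entries from a standard t-table. The dictionary `T_TABLE`, the function name, and the input figures are assumptions for illustration; a real analysis would use a full table or a statistics library.

```python
import math

# Excerpt of a standard t-table: two-sided critical values t_{alpha/2, d.f.},
# keyed by (degrees of freedom, confidence level).
T_TABLE = {
    (9, 0.95): 2.262,
    (19, 0.90): 1.729,
    (19, 0.95): 2.093,
    (29, 0.95): 2.045,
}

def t_interval(mean, s, n, confidence):
    """Two-sided small-sample confidence interval for the mean."""
    std_error = s / math.sqrt(n)        # step 1: standard error
    t = T_TABLE[(n - 1, confidence)]    # step 2: t value for (n - 1) d.f.
    return mean - t * std_error, mean + t * std_error  # step 3: interval limits

# Hypothetical sample: n = 10, mean 50, sample standard deviation 4
low, high = t_interval(50.0, 4.0, 10, 0.95)
print(round(low, 2), round(high, 2))
```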
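Case | Two-sided Confidence Interval (CI)
Population standard deviation σ known | $\bar{x} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$
Population standard deviation σ unknown, sample size n > 30 | $\bar{x} \pm Z_{\alpha/2}\,s/\sqrt{n}$
Population standard deviation σ unknown, sample size n ≤ 30 | $\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$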
Confidence Interval on the Variance of a Normal Distribution
Confidence Intervals on a Population Proportion
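No formulas survive under these two headings. As an assumption about what is intended, the usual textbook forms are given here: a chi-square interval for the variance of a normal population, and the large-sample normal approximation for a proportion. Here $s^2$ is the sample variance, $\hat{p}$ the sample proportion, and $\chi^2_{\alpha/2,\,n-1}$ the upper α/2 percentage point of the chi-square distribution with (n − 1) d.f.

```latex
% Two-sided 100(1 - \alpha)% CI on the variance of a normal distribution
\frac{(n-1)\,s^2}{\chi^2_{\alpha/2,\,n-1}} \;\le\; \sigma^2 \;\le\; \frac{(n-1)\,s^2}{\chi^2_{1-\alpha/2,\,n-1}}

% Approximate two-sided 100(1 - \alpha)% CI on a population proportion (large n)
\hat{p} \;\pm\; Z_{\alpha/2}\,\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
```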
• Example 1: A sample of size 20 was collected,
and the sample mean and standard deviation
were found to be 9.8525 and 0.0965, respectively. Find the 95%
CI for the mean.
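One way to work Example 1, assuming the population is approximately normal so that the t-distribution applies (n = 20, σ unknown). The critical value 2.093 is the standard table entry for 19 d.f. and a two-sided 95% level.

```python
import math

n, x_bar, s = 20, 9.8525, 0.0965
t_crit = 2.093                        # t-table value: 19 d.f., alpha/2 = 0.025
margin = t_crit * s / math.sqrt(n)    # half-width of the interval
ci = (x_bar - margin, x_bar + margin)
print(round(ci[0], 4), round(ci[1], 4))
```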
• Example 2: The life in hours of a light bulb is
known to be approximately normally
distributed with standard deviation σ = 25 hours. A random sample of
40 bulbs has a mean life of 1014 hours.
1. Construct a 95% two-sided CI on the mean life.
2. Construct a 95% one-sided lower CI of the mean
life.
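A possible working of Example 2, taking the stated 25 hours as the known population standard deviation σ. Z values come from `statistics.NormalDist` instead of a table; the one-sided bound puts all of α = 0.05 in one tail, so it uses Z at 0.95 and not 0.975.

```python
import math
from statistics import NormalDist

sigma, n, x_bar = 25.0, 40, 1014.0
std_error = sigma / math.sqrt(n)

# 1. Two-sided 95% CI: alpha/2 = 0.025 in each tail
z_two = NormalDist().inv_cdf(0.975)
two_sided = (x_bar - z_two * std_error, x_bar + z_two * std_error)

# 2. One-sided lower 95% bound: all of alpha = 0.05 in the lower tail
z_one = NormalDist().inv_cdf(0.95)
lower_bound = x_bar - z_one * std_error

print([round(v, 2) for v in two_sided], round(lower_bound, 2))
```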
• Example 3: The following data are the
haemoglobin levels of hockey
players (in g/dl).
15.3 16.0 14.4 16.2 16.2
14.9 15.7 14.6 15.3 17.7
16.0 15.0 15.7 16.2 14.7
14.8 14.6 15.6 14.5 15.2
a) Find the 90% two-sided CI on the mean
haemoglobin level (for the 20 values above, sample mean = 15.43 and sample standard deviation ≈ 0.8125).
b) Also construct a 90% upper confidence bound on the mean
haemoglobin level.
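A sketch of Example 3 computed directly from the 20 listed values, assuming an approximately normal population so that the t-distribution applies. The critical values 1.729 and 1.328 are standard t-table entries for 19 d.f. at tail areas 0.05 and 0.10; computed from all 20 values, the sample mean is 15.43.

```python
import math
import statistics

levels = [
    15.3, 16.0, 14.4, 16.2, 16.2,
    14.9, 15.7, 14.6, 15.3, 17.7,
    16.0, 15.0, 15.7, 16.2, 14.7,
    14.8, 14.6, 15.6, 14.5, 15.2,
]
n = len(levels)
x_bar = statistics.fmean(levels)
s = statistics.stdev(levels)          # sample standard deviation (n - 1 divisor)
std_error = s / math.sqrt(n)

T_05_19 = 1.729   # t-table: 19 d.f., tail area 0.05 (two-sided 90%)
T_10_19 = 1.328   # t-table: 19 d.f., tail area 0.10 (one-sided 90%)

# a) two-sided 90% CI
two_sided = (x_bar - T_05_19 * std_error, x_bar + T_05_19 * std_error)
# b) one-sided 90% upper bound
upper_bound = x_bar + T_10_19 * std_error

print(round(x_bar, 2), round(s, 4), [round(v, 3) for v in two_sided], round(upper_bound, 3))
```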