Lesson 07 - Sampling and Sampling Distributions (Without Video)
Chapter 07
By: Sanjaya Ariyawansa
LEARNING OBJECTIVES
• Selecting a sample is less time-consuming & less costly than selecting every item in the
population (census).
• An analysis of a sample is less cumbersome and more practical than an analysis of the entire
population.
• A sampling process begins with a sampling frame.
• The sampling frame is a listing of the items that make up the population.
• Results can be inaccurate or biased if a frame excludes certain portions of the population.
Samples
• In a nonprobability sample, items included are chosen without regard to their probability of
occurrence.
• In convenience sampling, items are selected based only on the fact that they are easy,
inexpensive, or convenient to sample.
• In a judgment sample, you get the opinions of pre-selected experts in the subject matter.
PROBABILITY SAMPLES
• In a probability sample, items in the sample are chosen on the basis of known probabilities.
• Types of probability samples: simple random sample, systematic sample, stratified sample,
cluster sample.
SIMPLE RANDOM SAMPLE
• Every individual or item from the frame has an equal chance of being selected.
• Selection may be with replacement (selected individual is returned to frame for possible
reselection) or without replacement (selected individual isn’t returned to the frame).
• Samples are obtained from a table of random numbers or a computer random number generator.
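The two selection modes above can be sketched with Python's standard `random` module. The frame of 40 numbered items and the seed are assumptions made only for illustration.

```python
import random

# Hypothetical sampling frame: 40 numbered items (an assumption for illustration).
frame = list(range(1, 41))
rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Without replacement: a selected item is not returned to the frame,
# so it can appear at most once in the sample.
without_replacement = rng.sample(frame, k=4)

# With replacement: a selected item is returned to the frame
# and may be selected again.
with_replacement = rng.choices(frame, k=4)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

In both cases every item in the frame has the same chance of being picked on each draw.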
SYSTEMATIC SAMPLE
[Figure: systematic sample with N = 40, n = 4, k = 10; first group marked.]
STRATIFIED SAMPLE
• Divide population into two or more subgroups (called strata) according to some common
characteristic
• A simple random sample is selected from each subgroup, with sample sizes proportional to
strata sizes
• Samples from subgroups are combined into one
• This is a common technique when sampling a population of voters, stratifying across racial or
socio-economic lines.
[Figure: population divided into 4 strata.]
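A minimal sketch of proportional stratified sampling as described above. The four strata, their sizes, and the helper name `stratified_sample` are hypothetical, chosen so the proportional sample sizes come out as whole numbers.

```python
import random

def stratified_sample(strata, n, seed=0):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    total = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        # Proportional allocation; rounding may not sum exactly to n in general.
        size = round(n * len(items) / total)
        sample[name] = rng.sample(items, size)
    return sample

# Hypothetical frame of 100 items split into 4 strata (an assumption for illustration).
strata = {
    "A": [f"A{i}" for i in range(40)],
    "B": [f"B{i}" for i in range(30)],
    "C": [f"C{i}" for i in range(20)],
    "D": [f"D{i}" for i in range(10)],
}

picked = stratified_sample(strata, n=10)
# The samples from the subgroups are combined into one.
combined = [item for items in picked.values() for item in items]
print({name: len(items) for name, items in picked.items()})
```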
CLUSTER SAMPLE
[Figure: population divided into 16 clusters; randomly selected clusters make up the sample.]
• The main difference between cluster and stratified sampling is that in cluster sampling, the
population is divided into clusters and all individuals within the selected clusters are included
in the sample, while in stratified sampling, the population is divided into strata and a random
sample is selected from each stratum.
• A cluster is a group of individuals or units that are naturally occurring and similar to each
other in some way, such as households in a neighborhood or students in a classroom
• A stratum, on the other hand, is a subgroup of the population that is based on some
characteristic or attribute, such as age, gender, or income level.
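The cluster side of this comparison can be sketched as follows: whole clusters are chosen at random, and every member of a chosen cluster enters the sample. The 16 clusters of 5 members each and the helper name `cluster_sample` are assumptions for illustration.

```python
import random

def cluster_sample(clusters, num_clusters, seed=0):
    """Randomly select whole clusters and include all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample

# Hypothetical population: 16 clusters with 5 members each.
clusters = {
    f"cluster_{c:02d}": [f"c{c:02d}_m{m}" for m in range(5)]
    for c in range(16)
}

chosen, sample = cluster_sample(clusters, num_clusters=3)
print(chosen)
print(len(sample), "individuals in the sample")
```

By contrast, a stratified design would draw a random sample from every stratum rather than taking a few groups in full.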
COMPARING SAMPLING METHODS
• For example, suppose you sample 50 students from your college regarding their mean
GPA. If you obtained many different samples of 50, you would compute a different
mean for each sample. We are interested in the distribution of the mean GPA from all
possible samples of 50 students.
DEVELOPING A SAMPLING DISTRIBUTION
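Population of size N = 4 (items A, B, C, D) with values X: 18, 20, 22, 24
$$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$$
[Figure: Uniform Distribution; P(x) against x = 18, 20, 22, 24 for A, B, C, D.]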
NOW CONSIDER ALL POSSIBLE SAMPLES OF SIZE n = 2
16 possible samples (sampling with replacement)

1st Obs    2nd Observation
           18       20       22       24
18         18,18    18,20    18,22    18,24
20         20,18    20,20    20,22    20,24
22         22,18    22,20    22,22    22,24
24         24,18    24,20    24,22    24,24

16 Sample Means

1st Obs    2nd Observation
           18    20    22    24
18         18    19    20    21
20         19    20    21    22
22         20    21    22    23
24         21    22    23    24
SAMPLING DISTRIBUTION OF ALL SAMPLE MEANS
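Population: $\mu = 21$, $\sigma = 2.236$
Sampling distribution of the mean (n = 2): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$
[Figure: left, population distribution P(X) against X = 18, 20, 22, 24 (A, B, C, D); right, sampling distribution P($\bar{X}$) against $\bar{X}$ = 18, 19, 20, 21, 22, 23, 24.]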
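The figures quoted for this small population can be checked by enumerating all 16 samples of size 2. The sketch below uses only the standard library; the variable names are illustrative.

```python
from itertools import product
from statistics import fmean, pstdev

population = [18, 20, 22, 24]
n = 2

# All ordered samples of size n drawn with replacement (4 * 4 = 16).
samples = list(product(population, repeat=n))
sample_means = [fmean(s) for s in samples]

mu = fmean(population)             # population mean
sigma = pstdev(population)         # population standard deviation (divides by N)
mu_xbar = fmean(sample_means)      # mean of all sample means
sigma_xbar = pstdev(sample_means)  # standard deviation of all sample means

print(len(samples), mu, round(sigma, 3), mu_xbar, round(sigma_xbar, 2))
```

The mean of the sample means equals the population mean, and their standard deviation is smaller than the population standard deviation.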
SAMPLING DISTRIBUTION OF THE MEAN: STANDARD
ERROR OF THE MEAN
• Different samples of the same size from the same population will yield different sample
means
• A measure of the variability in the mean from sample to sample is given by the Standard
Error of the Mean:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
(This assumes that sampling is with replacement, or without replacement from an
infinite population.)
• Note that the standard error of the mean decreases as the sample size increases
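A short sketch of how the standard error shrinks as the sample size grows. The value σ = 15 and the sample sizes are assumptions picked for illustration.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / sqrt(n)

sigma = 15  # hypothetical population standard deviation
sizes = (4, 25, 100)
errors = [standard_error(sigma, n) for n in sizes]

for n, se in zip(sizes, errors):
    print(f"n = {n:3d}  standard error = {se}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.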
SAMPLING DISTRIBUTION OF THE MEAN: IF THE
POPULATION IS NORMAL
• If a population is normal with mean μ and standard deviation σ, the sampling distribution of
$\bar{X}$ is also normally distributed with
$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
Z-VALUE FOR SAMPLING DISTRIBUTION OF THE
MEAN
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
where: $\bar{X}$ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
SAMPLING DISTRIBUTION PROPERTIES
• $\mu_{\bar{X}} = \mu$ (i.e. $\bar{X}$ is unbiased)
[Figure: normal population distribution and normal sampling distribution of $\bar{X}$; the sampling distribution has the same mean.]
SAMPLING DISTRIBUTION PROPERTIES
[Figure: sampling distribution of $\bar{X}$ around μ, with the curve for a smaller sample size labeled.]
DETERMINING AN INTERVAL INCLUDING A FIXED
PROPORTION OF THE SAMPLE MEANS
Find a symmetrically distributed interval around µ that will include 95% of the sample means
when µ = 368, σ = 15, and n = 25.
• Since the interval contains 95% of the sample means, 5% of the sample means will be
outside the interval.
• Since the interval is symmetric, 2.5% will be above the upper limit and 2.5% will be
below the lower limit.
• From the standardized normal table, the Z score with 2.5% (0.0250) below it is -1.96
and the Z score with 2.5% (0.0250) above it is 1.96.
• Calculating the lower limit of the interval:
$$\bar{X}_L = \mu + Z\frac{\sigma}{\sqrt{n}} = 368 + (-1.96)\frac{15}{\sqrt{25}} = 362.12$$
• Calculating the upper limit of the interval:
$$\bar{X}_U = \mu + Z\frac{\sigma}{\sqrt{n}} = 368 + (1.96)\frac{15}{\sqrt{25}} = 373.88$$
• 95% of all sample means of sample size 25 are between 362.12 and 373.88
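The same interval can be reproduced with `statistics.NormalDist` (Python 3.8+) in place of the printed table. This is a sketch of the calculation above; the inverse CDF gives a slightly more precise Z than the tabled 1.96, which does not change the limits at two decimals.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 368, 15, 25
coverage = 0.95

# Z score with (1 - coverage) / 2 of the area above it (about 1.96 for 95%).
z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)

standard_error = sigma / sqrt(n)
lower = mu - z * standard_error
upper = mu + z * standard_error

print(round(z, 2), round(lower, 2), round(upper, 2))
```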
SAMPLING DISTRIBUTION OF THE MEAN: IF THE
POPULATION IS NOT NORMAL
$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
SAMPLE MEAN SAMPLING DISTRIBUTION: IF THE
POPULATION IS NOT NORMAL
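Sampling distribution properties:
• Central Tendency: $\mu_{\bar{X}} = \mu$
• Variation: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
[Figure: Population Distribution (top) and Sampling Distribution of $\bar{X}$ (bottom), which becomes normal as n increases; curves shown for a smaller and a larger sample size.]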
HOW LARGE IS LARGE ENOUGH?
• For most distributions, n ≥ 30 will give a sampling distribution that is nearly normal
• For fairly symmetric distributions, n ≥ 15 is usually sufficient
• For normal population distributions, the sampling distribution of the mean is always normally
distributed
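A small simulation can illustrate these rules of thumb. The sketch below assumes a strongly right-skewed exponential population with mean 1 and standard deviation 1 (not from the slides), draws many samples of size n = 30, and compares the spread of the sample means with σ/√n.

```python
import random
from math import sqrt
from statistics import fmean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30                  # sample size
num_samples = 5000      # number of repeated samples

# Exponential population with rate 1: mean 1, standard deviation 1, right-skewed.
sample_means = [
    fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = fmean(sample_means)
sd_of_means = stdev(sample_means)

print("mean of sample means:", mean_of_means)
print("sd of sample means:  ", sd_of_means)
print("sigma / sqrt(n):     ", 1 / sqrt(n))
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the population itself is far from normal.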
EXAMPLE
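• Suppose a population has mean μ = 8 and standard deviation σ = 3, and a random sample of
size n = 36 is selected.
• What is the probability that the sample mean is between 7.8 and 8.2?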
SOLUTION:
• Even if the population is not normally distributed, the central limit theorem can be used
(n ≥ 30)
• so the sampling distribution of $\bar{X}$ is approximately normal
• with mean $\mu_{\bar{X}} = 8$
• and standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$
SOLUTION (CONTINUED):
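$$P(7.8 \le \bar{X} \le 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{8.2 - 8}{3/\sqrt{36}}\right)$$
$$= P(-0.4 \le Z \le 0.4) = 0.6554 - 0.3446 = 0.3108$$
(Remember that $\mu_{\bar{X}} = \mu$.)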
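The result can be cross-checked without a table by treating $\bar{X}$ as normal with mean 8 and standard error 3/√36. This is a sketch using `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 3, 36

# Approximate sampling distribution of the sample mean (central limit theorem).
xbar_dist = NormalDist(mu, sigma / sqrt(n))

prob = xbar_dist.cdf(8.2) - xbar_dist.cdf(7.8)
print(round(prob, 4))
```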
• The sample proportion p satisfies 0 ≤ p ≤ 1
• p is approximately distributed as a normal distribution when n is large
(assuming sampling with replacement from a finite population or without replacement from an
infinite population)
SAMPLING DISTRIBUTION OF P
• Approximated by a normal distribution if:
$$n\pi \ge 5 \quad \text{and} \quad n(1 - \pi) \ge 5$$
where
$$\mu_p = \pi \quad \text{and} \quad \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$$
(where π = population proportion)
[Figure: Sampling Distribution; P(p_s) against p from 0 to 1.]
Z-VALUE FOR PROPORTIONS
• If the true proportion of voters who support Proposition A is π = 0.4, what is the
probability that a sample of size 200 yields a sample proportion between 0.40 and 0.45?
• i.e. if π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45) ?
Find $\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{200}} = 0.03464$
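Standardize:
$$P(0.40 \le p \le 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} \le Z \le \frac{0.45 - 0.40}{0.03464}\right) = P(0 \le Z \le 1.44) = 0.4251$$
[Figure: Sampling Distribution converted to the Standardized Normal Distribution; shaded area 0.4251.]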
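A sketch of the Proposition A calculation in Python. Rounding Z to two decimals mimics a table lookup and reproduces the 0.4251 shown for this example; the unrounded Z gives a marginally different value. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

pi, n = 0.4, 200  # population proportion and sample size from the example

# Standard error of the sample proportion.
sigma_p = sqrt(pi * (1 - pi) / n)

# Standardize both ends of the interval 0.40 <= p <= 0.45.
z_low = (0.40 - pi) / sigma_p
z_high = (0.45 - pi) / sigma_p

std_normal = NormalDist()
prob_exact = std_normal.cdf(z_high) - std_normal.cdf(z_low)
# Table-style lookup: Z rounded to two decimals first.
prob_table = std_normal.cdf(round(z_high, 2)) - std_normal.cdf(round(z_low, 2))

print(round(sigma_p, 5), round(z_high, 2))
print(prob_table, prob_exact)
```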
• The finite population correction factor (fpc) is used to calculate the standard error of both
the sample mean and the sample proportion
• Needed when the sample size, n, is more than 5% of the population size N (i.e. n / N >
0.05)
$$\text{fpc} = \sqrt{\frac{N - n}{N - 1}}$$
USING THE FPC IN CALCULATING STANDARD
ERRORS
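Standard Error of the Mean for Finite Populations
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$$
Standard Error of the Proportion for Finite Populations
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}\sqrt{\frac{N - n}{N - 1}}$$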
USING THE FPC REDUCES THE STANDARD ERROR
• Suppose a random sample of size 100 is drawn from a population of size 1,000 with a
standard deviation of 40.
• Here n=100, N=1,000 and 100/1,000 = 0.10 > 0.05. So using the fpc for the standard error of
the mean we get:
$$\sigma_{\bar{X}} = \frac{40}{\sqrt{100}}\sqrt{\frac{1000 - 100}{1000 - 1}} = 3.8$$