
Lesson 07 - Sampling and Sampling Distributions (Without Video)

1) Sampling is selecting a subset of items from a population to gather information about the whole population. It is less costly than a census but can introduce errors. 2) There are two main types of sampling: probability and non-probability. Probability sampling gives all items a chance of selection while non-probability is subjective. 3) A sampling distribution describes the distribution of all possible sample statistics from repeated sampling. It is used to understand sampling error and apply inferential statistics.

Uploaded by

ashwini
Copyright
© All Rights Reserved

SAMPLING AND SAMPLING DISTRIBUTIONS

Chapter 07
By: Sanjaya Ariyawansa
LEARNING OBJECTIVES

• Definitions pertaining to samples


• Distinguish between different sampling methods
• Concept of the Sampling distribution
• Central Limit Theorem
• Compute probabilities related to the sample mean and the sample proportion
WHY SAMPLE?

• Selecting a sample is less time-consuming & less costly than selecting every item in the
population (census).
• An analysis of a sample is less cumbersome and more practical than an analysis of the entire
population.
• A Sampling Process Begins With A Sampling Frame
• The sampling frame is a listing of items that make up the population

• Frames are data sources such as population lists, directories, or maps

• Results can be inaccurate or biased if a frame excludes certain portions of the population

• Using different frames to generate data can lead to dissimilar conclusions


TYPES OF SAMPLES

Samples
• Non-probability samples
  • Judgment
  • Convenience
• Probability samples
  • Simple random
  • Systematic
  • Stratified
  • Cluster
NON-PROBABILITY SAMPLES

• In a nonprobability sample, items included are chosen without regard to their probability of
occurrence.

• In convenience sampling, items are selected based only on the fact that they are easy,
inexpensive, or convenient to sample.
• In a judgment sample, you get the opinions of pre-selected experts in the subject matter.
PROBABILITY SAMPLES

• In a probability sample, items in the sample are chosen on the basis of known probabilities.

• The four types of probability samples are the simple random sample, the systematic sample,
the stratified sample, and the cluster sample.
SIMPLE RANDOM SAMPLE

• Every individual or item from the frame has an equal chance of being selected.

• Selection may be with replacement (selected individual is returned to frame for possible
reselection) or without replacement (selected individual isn’t returned to the frame).

• Samples are obtained from a table of random numbers or a computer random number generator.
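The two selection modes can be sketched in Python. This is a minimal illustration, not part of the original slides: the frame of 40 numbered items, the sample size of 5, and the seed are all assumptions chosen for the example.

```python
import random

# Hypothetical frame: 40 numbered items (the IDs are illustrative only).
frame = list(range(1, 41))

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Without replacement: a selected item is not returned to the frame,
# so no item can appear twice.
without_replacement = rng.sample(frame, k=5)

# With replacement: every draw is made from the full frame,
# so the same item may be selected more than once.
with_replacement = rng.choices(frame, k=5)

print(without_replacement)
print(with_replacement)
```

In both cases every item in the frame has the same chance of being picked on each draw.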
SYSTEMATIC SAMPLE

• Decide on sample size: n


• Divide frame of N individuals into groups of k individuals: k=N/n
• Randomly select one individual from the 1st group
• Select every kth individual thereafter

Example: N = 40 and n = 4, so k = 10, and the first group is the first 10 individuals in the frame.
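The steps above can be sketched as follows. The function name `systematic_sample` and the numbered frame are hypothetical, and the sketch assumes N is an exact multiple of n, as in the N = 40, n = 4 example.

```python
import random

def systematic_sample(frame, n, rng):
    """Systematic sample of size n (assumes len(frame) is a multiple of n)."""
    k = len(frame) // n            # group size, k = N / n
    start = rng.randrange(k)       # random position within the first group
    return frame[start::k][:n]     # that item, then every k-th item after it

frame = list(range(1, 41))         # hypothetical frame, N = 40
sample = systematic_sample(frame, n=4, rng=random.Random(1))
print(sample)
```

The selected items are always exactly k = 10 positions apart; only the starting point is random.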
STRATIFIED SAMPLE

• Divide population into two or more subgroups (called strata) according to some common
characteristic
• A simple random sample is selected from each subgroup, with sample sizes proportional to
strata sizes
• Samples from subgroups are combined into one
• This is a common technique when sampling population of voters, stratifying across racial or
socio-economic lines.
(Figure: population divided into 4 strata)
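A proportional stratified sample could look like the sketch below. The three strata, their sizes (50, 30, 20), and the helper name `stratified_sample` are assumptions made for illustration; they were chosen so that the proportional allocation comes out in whole numbers.

```python
import random

def stratified_sample(strata, n, rng):
    """Simple random sample from each stratum, sized in proportion to the stratum.

    Rounding can make the total differ slightly from n when the
    proportions are not whole numbers.
    """
    total = sum(len(items) for items in strata.values())
    sample = []
    for name, items in strata.items():
        n_h = round(n * len(items) / total)    # proportional allocation
        sample.extend(rng.sample(items, n_h))  # without replacement
    return sample

# Hypothetical population of 100 items in three strata.
strata = {
    "A": [("A", i) for i in range(50)],
    "B": [("B", i) for i in range(30)],
    "C": [("C", i) for i in range(20)],
}
sample = stratified_sample(strata, n=10, rng=random.Random(5))
print(sample)
```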
CLUSTER SAMPLE

• Population is divided into several “clusters,” each representative of the population


• A simple random sample of clusters is selected
• All items in the selected clusters can be used, or items can be chosen from a cluster using
another probability sampling technique
• A common application of cluster sampling involves election exit polls, where certain election
districts are selected and sampled.

(Figure: population divided into 16 clusters; randomly selected clusters form the sample)
• The main difference between cluster and stratified sampling is that in cluster sampling, the
population is divided into clusters and only the randomly selected clusters are sampled (using
all of their individuals or a further probability sample within them), while in stratified
sampling, the population is divided into strata and a random sample is selected from every
stratum.

• A cluster is a group of individuals or units that are naturally occurring and similar to each
other in some way, such as households in a neighborhood or students in a classroom

• A stratum, on the other hand, is a subgroup of the population that is based on some
characteristic or attribute, such as age, gender, or income level.
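A one-stage cluster sample can be sketched as below. The 16 clusters of 5 items each and the choice of 4 clusters are illustrative assumptions, not values from the slides beyond the count of 16 clusters.

```python
import random

# Hypothetical population: 16 clusters with 5 items each.
clusters = {c: [f"c{c}-item{i}" for i in range(5)] for c in range(16)}

rng = random.Random(3)

# Simple random sample of clusters (not of individual items).
chosen = rng.sample(sorted(clusters), k=4)

# One-stage cluster sample: every item in each chosen cluster is used.
sample = [item for c in chosen for item in clusters[c]]
print(chosen)
print(len(sample))
```

A two-stage design would instead draw a further probability sample inside each chosen cluster.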
COMPARING SAMPLING METHODS

• Simple random sample and systematic sample
  • Simple to use
  • May not be a good representation of the population’s underlying characteristics
• Stratified sample
  • Ensures representation of individuals across the entire population
• Cluster sample
  • More cost effective
  • Less efficient (need larger sample to acquire the same level of precision)
2. SURVEY
EVALUATING SURVEY WORTHINESS

• What is the purpose of the survey?

• Is the survey based on a probability sample?

• Coverage error – appropriate frame?

• Nonresponse error – follow up

• Measurement error – good questions elicit good responses

• Sampling error – always exists


TYPES OF SURVEY ERRORS

• Coverage error or selection bias
  • Exists if some groups are excluded from the frame and have no chance of being selected
• Nonresponse error or bias
  • People who do not respond may be different from those who do respond
• Sampling error
  • Variation from sample to sample will always exist
• Measurement error
  • Due to weaknesses in question design, respondent error, and interviewer’s effects on
    the respondent (“Hawthorne effect”)
SURVEY

• Types of Survey Errors
  • Measurement error: bad or leading question
  • Nonresponse error: follow up on nonresponses
  • Sampling error: random differences from sample to sample
  • Coverage error: excluded from frame
3. SAMPLING DISTRIBUTION
SAMPLING DISTRIBUTIONS

• A sampling distribution is a distribution of all of the possible values of a sample
statistic for a given size sample selected from a population.

• For example, suppose you sample 50 students from your college regarding their mean
GPA. If you obtained many different samples of 50, you would compute a different
mean for each sample. We are interested in the distribution of the mean GPA from all
possible samples of 50 students.
DEVELOPING A SAMPLING DISTRIBUTION

• Assume there is a population …

• Population size N = 4 (individuals A, B, C, D)

• Random variable, X, is age of individuals

• Values of X: 18, 20, 22, 24 (years)


SUMMARY MEASURES FOR THE POPULATION DISTRIBUTION

$$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$

$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$$

(Figure: P(x) plotted against x = 18, 20, 22, 24 for individuals A, B, C, D; uniform distribution)
NOW CONSIDER ALL POSSIBLE SAMPLES OF SIZE n = 2

16 possible samples (sampling with replacement):

1st Obs    2nd Observation
           18      20      22      24
18         18,18   18,20   18,22   18,24
20         20,18   20,20   20,22   20,24
22         22,18   22,20   22,22   22,24
24         24,18   24,20   24,22   24,24

16 sample means:

1st Obs    2nd Observation
           18      20      22      24
18         18      19      20      21
20         19      20      21      22
22         20      21      22      23
24         21      22      23      24
SAMPLING DISTRIBUTION OF ALL SAMPLE MEANS

Tabulating the 16 sample means gives the sample means distribution (no longer uniform):

Sample mean    18      19      20      21      22      23      24
Frequency      1       2       3       4       3       2       1
Probability    1/16    2/16    3/16    4/16    3/16    2/16    1/16

(Figure: the sample means distribution, which is no longer uniform, shown beside the uniform population distribution P(x) over x = 18, 20, 22, 24 for A, B, C, D)
SUMMARY MEASURES OF THIS SAMPLING DISTRIBUTION:

$$\mu_{\bar{X}} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21$$

$$\sigma_{\bar{X}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$$

Note: Here we divide by 16 because there are 16
different samples of size 2.
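The enumeration can be checked with a short Python sketch. It uses the population values from the slides (18, 20, 22, 24); the variable names are illustrative.

```python
from itertools import product
from statistics import mean, pstdev

population = [18, 20, 22, 24]   # ages of A, B, C, D
n = 2

# All ordered samples of size n drawn with replacement: 4 * 4 = 16 of them.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

mu = mean(population)              # population mean
sigma = pstdev(population)         # population standard deviation (divide by N)
mu_xbar = mean(sample_means)       # mean of the 16 sample means
sigma_xbar = pstdev(sample_means)  # their standard deviation (divide by 16)

print(len(samples), mu, round(sigma, 3), mu_xbar, round(sigma_xbar, 2))
```

The standard deviation of the sample means equals the population standard deviation divided by the square root of n, which is the standard error of the mean.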
COMPARING THE POPULATION DISTRIBUTION TO
THE SAMPLE MEANS DISTRIBUTION

Population: N = 4, μ = 21, σ = 2.236

Sample means distribution: n = 2, $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$

(Figure: population distribution P(X) over X = 18, 20, 22, 24 for A, B, C, D, beside the sample means distribution over 18, 19, ..., 24)
SAMPLING DISTRIBUTION OF THE MEAN: STANDARD
ERROR OF THE MEAN

• Different samples of the same size from the same population will yield different sample
means
• A measure of the variability in the mean from sample to sample is given by the Standard
Error of the Mean:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

(This assumes that sampling is with replacement or sampling is without replacement from an
infinite population)
• Note that the standard error of the mean decreases as the sample size increases
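A small sketch of how the standard error shrinks with sample size. The value σ = 15 and the sample sizes are illustrative assumptions, and `standard_error` is a hypothetical helper name.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / sqrt(n)

sigma = 15  # illustrative population standard deviation
for n in (4, 25, 100):
    print(n, standard_error(sigma, n))
```

Quadrupling the sample size halves the standard error, because n enters through its square root.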
SAMPLING DISTRIBUTION OF THE MEAN: IF THE
POPULATION IS NORMAL

• If a population is normal with mean μ and standard deviation σ, the sampling distribution of
$\bar{X}$ is also normally distributed with

$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
Z-VALUE FOR SAMPLING DISTRIBUTION OF THE
MEAN

• Z-value for the sampling distribution of $\bar{X}$:

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

where:
$\bar{X}$ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
SAMPLING DISTRIBUTION PROPERTIES

• $\mu_{\bar{X}} = \mu$ (i.e. $\bar{X}$ is unbiased)

(Figure: a normal population distribution and the normal sampling distribution of $\bar{X}$, which has the same mean)
SAMPLING DISTRIBUTION PROPERTIES

• As n increases, $\sigma_{\bar{X}}$ decreases

(Figure: sampling distributions of $\bar{X}$ for a smaller and a larger sample size, both centered at μ)
DETERMINING AN INTERVAL INCLUDING A FIXED
PROPORTION OF THE SAMPLE MEANS

Find a symmetrically distributed interval around µ that will include 95% of the sample means
when µ = 368, σ = 15, and n = 25.

• Since the interval contains 95% of the sample means, 5% of the sample means will be
outside the interval.
• Since the interval is symmetric, 2.5% will be above the upper limit and 2.5% will be
below the lower limit.
• From the standardized normal table, the Z score with 2.5% (0.0250) below it is -1.96
and the Z score with 2.5% (0.0250) above it is 1.96.
• Calculating the lower limit of the interval

$$\bar{X}_L = \mu + Z\frac{\sigma}{\sqrt{n}} = 368 + (-1.96)\frac{15}{\sqrt{25}} = 362.12$$

• Calculating the upper limit of the interval

$$\bar{X}_U = \mu + Z\frac{\sigma}{\sqrt{n}} = 368 + (1.96)\frac{15}{\sqrt{25}} = 373.88$$
• 95% of all sample means of sample size 25 are between 362.12 and 373.88
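The same interval can be reproduced with Python's standard library, which stands in here for the standardized normal table. This is a sketch using the values from the example (μ = 368, σ = 15, n = 25); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 368, 15, 25

se = sigma / sqrt(n)             # standard error of the mean
z = NormalDist().inv_cdf(0.975)  # Z with 2.5% above it, about 1.96

lower = mu - z * se
upper = mu + z * se
print(round(z, 2), round(lower, 2), round(upper, 2))
```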
SAMPLING DISTRIBUTION OF THE MEAN: IF THE
POPULATION IS NOT NORMAL

• We can apply the Central Limit Theorem:

• Even if the population is not normal, sample means from the population will be approximately
normal as long as the sample size is large enough.

• Properties of the sampling distribution:

$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
SAMPLE MEAN SAMPLING DISTRIBUTION: IF THE
POPULATION IS NOT NORMAL

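Sampling distribution properties:

• Central tendency: $\mu_{\bar{X}} = \mu$

• Variation: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

(Figure: a non-normal population distribution and its sampling distribution, which becomes normal as n increases; curves shown for a smaller and a larger sample size)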
HOW LARGE IS LARGE ENOUGH?

• For most distributions, n ≥ 30 will give a sampling distribution that is nearly normal
• For fairly symmetric distributions, n ≥ 15
• For normal population distributions, the sampling distribution of the mean is always normally
distributed
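A small simulation can illustrate the Central Limit Theorem. The sketch below is an assumption-laden illustration rather than part of the slides: it picks an exponential population (mean 1, standard deviation 1, strongly right-skewed), n = 30, 5,000 repetitions, and a fixed seed.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 30                   # sample size
reps = 5000              # number of repeated samples

# Each entry is the mean of one sample of size n from the skewed population.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(reps)
]

print("mean of sample means:", round(mean(sample_means), 3))
print("sd of sample means:  ", round(stdev(sample_means), 3))
print("sigma / sqrt(n):     ", round(1 / sqrt(n), 3))
```

The simulated means should cluster around the population mean of 1 with a spread close to σ/√n, even though the population itself is far from normal.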
EXAMPLE

• Suppose a population has mean μ = 8 and standard deviation σ = 3. Suppose a random
sample of size n = 36 is selected.

• What is the probability that the sample mean is between 7.8 and 8.2?
SOLUTION:

• Even if the population is not normally distributed, the central limit theorem can be used (n ≥
30)
• so the sampling distribution of $\bar{X}$ is approximately normal
• with mean $\mu_{\bar{X}} = 8$
• and standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$
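SOLUTION (CONTINUED):

$$P(7.8 \le \bar{X} \le 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{8.2 - 8}{3/\sqrt{36}}\right)$$

$$= P(-0.4 \le Z \le 0.4) = 0.6554 - 0.3446 = 0.3108$$

Remember that $\mu_{\bar{X}} = \mu$.

(Figure: population distribution with μ = 8; after sampling, the sampling distribution with $\mu_{\bar{X}} = 8$ and the interval from 7.8 to 8.2; after standardizing, the standard normal distribution with $\mu_Z = 0$ and the interval from -0.4 to 0.4)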
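The probability can be checked without a table, using Python's standard library. This is a sketch with the example's values (μ = 8, σ = 3, n = 36); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 3, 36

# Sampling distribution of the sample mean: normal with mean mu
# and standard deviation sigma / sqrt(n) = 0.5.
xbar_dist = NormalDist(mu, sigma / sqrt(n))

prob = xbar_dist.cdf(8.2) - xbar_dist.cdf(7.8)
print(round(prob, 4))
```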
POPULATION PROPORTIONS

• π = the proportion of the population having a characteristic of interest
• Sample proportion (p) provides an estimate of π:

$$p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$$

• 0 ≤ p ≤ 1
• p is approximately distributed as a normal distribution when n is large
(assuming sampling with replacement from a finite population or without replacement from an
infinite population)
SAMPLING DISTRIBUTION OF P

• Approximated by a normal distribution if:

$$n\pi \ge 5 \quad \text{and} \quad n(1 - \pi) \ge 5$$

where

$$\mu_p = \pi \quad \text{and} \quad \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$$

(where π = population proportion)

(Figure: sampling distribution P(p) plotted against p from 0 to 1)
Z-VALUE FOR PROPORTIONS

Standardize p to a Z value with the formula:

$$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$$
EXAMPLE

• If the true proportion of voters who support Proposition A is π = 0.4, what is the
probability that a sample of size 200 yields a sample proportion between 0.40 and 0.45?
• i.e. if π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45)?

Find $\sigma_p$:

$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{200}} = 0.03464$$

Convert to standardized normal:

$$P(0.40 \le p \le 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} \le Z \le \frac{0.45 - 0.40}{0.03464}\right) = P(0 \le Z \le 1.44)$$

Utilize the cumulative normal table:
P(0 ≤ Z ≤ 1.44) = 0.9251 – 0.5000 = 0.4251

(Figure: sampling distribution of p with the area between 0.40 and 0.45 shaded; after standardizing, the standardized normal distribution with the area 0.4251 between Z = 0 and Z = 1.44)
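The proportion example can be reproduced in Python. This is a sketch with the example's values (π = 0.4, n = 200); the variable names are illustrative. Because the table method rounds Z to two decimals, the sketch computes the probability both with the rounded Z and with the unrounded Z.

```python
from math import sqrt
from statistics import NormalDist

pop_prop, n = 0.4, 200

# Conditions for the normal approximation.
assert n * pop_prop >= 5 and n * (1 - pop_prop) >= 5

sigma_p = sqrt(pop_prop * (1 - pop_prop) / n)  # standard error of p

z_low = (0.40 - pop_prop) / sigma_p
z_high = (0.45 - pop_prop) / sigma_p

std_normal = NormalDist()

# Table method: Z rounded to two decimals before the lookup.
prob_table = std_normal.cdf(round(z_high, 2)) - std_normal.cdf(round(z_low, 2))

# Same calculation without rounding Z.
prob_unrounded = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(sigma_p, 5), round(z_high, 2))
print(round(prob_table, 4), round(prob_unrounded, 4))
```

The small gap between the two probabilities comes only from rounding Z to 1.44 for the table lookup.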
SECTION SUMMARY

• Discussed probability and nonprobability samples


• Described four common probability samples
• Examined survey worthiness and types of survey errors
• Introduced sampling distributions
• Described the sampling distribution of the mean
• For normal populations
• Using the Central Limit Theorem
• Described the sampling distribution of a proportion
• Calculated probabilities using sampling distributions
4. SAMPLING FROM FINITE POPULATIONS
FINITE POPULATION CORRECTION FACTORS

• Used to calculate the standard error of both the sample mean and the sample proportion

• Needed when the sample size, n, is more than 5% of the population size N (i.e. 𝑛 / 𝑁 >
0.05)

• The finite population correction factor is:

$$\text{fpc} = \sqrt{\frac{N - n}{N - 1}}$$
USING THE FPC IN CALCULATING STANDARD
ERRORS
Standard Error of the Mean for Finite Populations

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$

Standard Error of the Proportion for Finite Populations

$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} \sqrt{\frac{N - n}{N - 1}}$$
USING THE FPC REDUCES THE STANDARD ERROR

• The fpc is always less than 1 (for any sample size n > 1)

• So when it is used it reduces the standard error

• Resulting in more precise estimates of population parameters


USING FPC WITH THE MEAN - EXAMPLE

• Suppose a random sample of size 100 is drawn from a population of size 1,000 with a
standard deviation of 40.
• Here n = 100, N = 1,000 and 100/1,000 = 0.10 > 0.05. So using the fpc for the standard error of
the mean we get:

$$\sigma_{\bar{X}} = \frac{40}{\sqrt{100}} \sqrt{\frac{1000 - 100}{1000 - 1}} = 3.8$$
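The same calculation as a short Python sketch. The helper names `fpc` and `standard_error_mean_finite` are hypothetical; the numbers are those of the example.

```python
from math import sqrt

def fpc(N, n):
    """Finite population correction factor."""
    return sqrt((N - n) / (N - 1))

def standard_error_mean_finite(sigma, n, N):
    """Standard error of the mean, corrected for a finite population."""
    return sigma / sqrt(n) * fpc(N, n)

uncorrected = 40 / sqrt(100)  # standard error without the fpc
se = standard_error_mean_finite(sigma=40, n=100, N=1000)
print(round(fpc(1000, 100), 4), uncorrected, round(se, 1))
```

Without the correction the standard error would be 4.0, so applying the fpc gives a slightly smaller value.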
