
2024-Lecture 05

This document discusses sampling methods and sampling distributions. It covers topics like simple random sampling, stratified random sampling, sampling distribution of means, central limit theorem, and how simulation can help understand sampling distributions. Formulas for standard error and finding probabilities from sampling distributions are also provided.

Uploaded by

Nguyễn Tâm

Probability and Statistics

LECTURE 5
SAMPLING METHODS & SAMPLING DISTRIBUTIONS

Adapted from http://www.prenhall.com/mcclave


OUTLINE
• Sampling methods
• The importance of sampling distribution
• Sampling distribution of sample means
• Using simulation to understand sampling distribution
• Central limit theorem

SAMPLING

• Some terms:
  • Census: conducting a survey to collect data from the entire population
  • Sampling: selecting a sample from the population
• Why is sampling necessary?
  • Cost
  • Practicality
• Statistical inference allows us to draw conclusions about the population based on sample data

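SAMPLING WITH/WITHOUT REPLACEMENT

• Sampling with replacement: each selected item is returned to the population before the next draw, so it can be selected more than once.
• Sampling without replacement: a selected item is not returned to the population, so it can be selected at most once.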

SAMPLING METHODS

• Probability sampling methods
  • Probability sample: a sample selected in such a way that each item or person in the population has a known (nonzero) likelihood of being included in the sample.
  • E.g. simple random sampling, systematic random sampling, and stratified random sampling.

SAMPLING METHODS

• Nonprobability sampling methods
  • Not all items or people have a chance of being included in the sample.
  • E.g. an online survey, or walking around a shopping mall and asking customers to answer a questionnaire.
  • Results may be biased.

SOME IMPORTANT PROBABILITY SAMPLING METHODS

Let’s now discuss two of the most important probability sampling methods: simple random sampling and stratified random sampling.

SIMPLE RANDOM SAMPLING

Selecting a sample in such a way that every possible sample of the same size is equally likely to be chosen.

Example: Suppose our lecture class is the population. We are going to select a simple random sample (SRS) of 20 students from the population.

• First, obtain a list of the population members.
• Number the list from 1 to N (the population size).
• Then use computer software to select a random sample.

TAKE AN SRS IN R

sample(x, size, replace = FALSE)

• x: a vector of one or more elements from which to choose.
• size: the number of items to be chosen.
• replace: specifies whether we want sampling with replacement (TRUE) or without replacement (FALSE).
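A minimal sketch of the call above, assuming a hypothetical class of N = 60 students numbered 1 to N. The class size and the seed are illustrative choices, not values from the lecture.

```r
# Hypothetical population: a class list numbered 1..N (N = 60 is an assumption).
N <- 60
population <- 1:N

set.seed(2024)  # fixed only so that the draw is reproducible

# Simple random sample of 20 students without replacement:
# no student can be selected twice.
srs <- sample(population, size = 20, replace = FALSE)

# Sampling with replacement: the same student may appear more than once.
srs_wr <- sample(population, size = 20, replace = TRUE)

print(sort(srs))
print(sort(srs_wr))
```

Because `replace = FALSE` is the default, `sample(population, 20)` would draw the same kind of sample as the first call.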

STRATIFIED RANDOM SAMPLING

• To select a stratified random sample, first divide the population into mutually exclusive groups of similar individuals called strata. Then choose a separate SRS from each stratum and combine these SRSs to form the full sample.
• Example:
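One way to sketch the procedure in R, assuming a hypothetical class of 60 students split into three strata by year of study. The strata, their sizes, and the 20% sampling fraction are all assumptions made for illustration.

```r
# Hypothetical population: 60 students in three strata (year of study).
students <- data.frame(
  id   = 1:60,
  year = rep(c("first", "second", "third"), times = c(30, 20, 10))
)

set.seed(2024)  # fixed only so that the draw is reproducible

# Step 1: divide the population into mutually exclusive strata.
strata <- split(students, students$year)

# Step 2: take a separate SRS from each stratum (here 20% of each stratum),
# then combine the SRSs into the full sample.
stratified <- do.call(rbind, lapply(strata, function(stratum) {
  n_h <- round(0.2 * nrow(stratum))        # sample size for this stratum
  stratum[sample(nrow(stratum), n_h), ]    # SRS of row indices within the stratum
}))

print(table(stratified$year))
```

Sampling the same fraction from every stratum (proportional allocation) is only one possible design; other allocations change `n_h` but not the two steps.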

OTHER TYPES OF SAMPLING

Please refer to your textbook for other types of sampling methods.

STATISTICAL METHODS

Statistical methods:
• Descriptive statistics
• Inferential statistics

IMPORTANCE OF SAMPLING DISTRIBUTION

• The basis of statistical inference
• Basis for understanding hypothesis testing, estimation, etc.

REPEATED SAMPLING

• Example
  • We wish to estimate the population mean
  • Select a random sample
  • Find the sample mean (e.g. $\bar{x} = 20$) and use it as an estimate
  • Suppose other people select different samples and find markedly different sample means
  • Would we trust our estimate?

REPEATED SAMPLING

• The same problem, but:
  • Everyone else selects different samples, and their results are close to ours
• The sampling distribution of sample means shows how sample means vary between samples
• The sample mean is just one particular sample statistic

EXAMPLE OF SAMPLING DISTRIBUTION

• Given a population of salaries of 5 employees: 2, 5, 7, 8, 10 (in hundred dollars/month)
• Imagine the population mean is unknown; we wish to estimate the population mean salary
• We select a random sample of 3 salaries

EXAMPLE OF SAMPLING DISTRIBUTION

• Denote the mean of a random sample by $\bar{X}$
• Before sample selection: does $\bar{X}$ represent a fixed value or a random variable?
• If $\bar{X}$ represents a variable whose value can change:
  • How many possible values can it take?
  • What is the probability of each value?
• What if we use a sample size of n = 4?
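The questions above can be explored by listing every possible sample. The sketch below assumes sampling without replacement, so that each of the possible samples of 3 salaries is equally likely; the variable names are illustrative.

```r
# Population from the example: salaries in hundred dollars/month.
salaries <- c(2, 5, 7, 8, 10)

# All possible samples of size 3 without replacement, one column per sample.
samples <- combn(salaries, 3)
print(ncol(samples))   # number of possible samples

# Sample mean of each possible sample.
sample_means <- colSums(samples) / 3

# Sampling distribution of the sample mean: every sample is equally likely,
# so the probability of a value is its relative frequency.
sampling_dist <- table(round(sample_means, 3)) / ncol(samples)
print(sampling_dist)

# Compare the mean of all sample means with the population mean.
print(mean(sample_means))
print(mean(salaries))
```

Replacing 3 with 4 in `combn(salaries, 3)` and in the division gives the sampling distribution for n = 4.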

SAMPLING DISTRIBUTION OF SAMPLE MEANS

• Probability distribution of all of the possible values of the sample mean for a given sample size selected from a population
• What if we change the sample size?

QUESTIONS

• Is there a sampling distribution of the median?
• Is there a sampling distribution of the variance?

EXAMPLE OF SAMPLING DISTRIBUTION OF VARIANCE
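As an illustration, the sampling distribution of the sample variance can be listed for the five-salary population used earlier (2, 5, 7, 8, 10) with samples of size 3. Reusing that population here is an assumption; the original example for this slide is not reproduced in the text.

```r
# Population reused from the salary example (hundred dollars/month).
salaries <- c(2, 5, 7, 8, 10)

# Sample variance (divisor n - 1) of every possible sample of size 3.
sample_vars <- combn(salaries, 3, FUN = var)

# Each possible sample is equally likely under simple random sampling.
sampling_dist <- table(round(sample_vars, 2)) / length(sample_vars)
print(sampling_dist)

# Average of all sample variances versus var(salaries),
# which R computes with the divisor N - 1.
print(mean(sample_vars))
print(var(salaries))
```

The same pattern works for the median: pass `FUN = median` to `combn`.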

IN GENERAL

A sampling distribution is the probability distribution of all of the possible values of a sample statistic for a given sample size selected from a population.

ACTIVITY: EXPLORING SAMPLING DISTRIBUTIONS VIA SIMULATION

• Use the applet on the webpage: http://www.rossmanchance.com/applets/OneSample.html
• 1st population: math scores of 15892 high school students
• Let’s observe
  • Histogram of the population
  • Mean of the population
  • SD of the population

ACTIVITY

• Now we will develop the sampling distribution of sample means (for example, by selecting 10000 samples or more) for n =
  • 2
  • 10
  • 30
  • 100

OBSERVATIONS

• Let’s write down our observations:
  • There are many sampling distributions (one for each n)
  • Shape of the sampling distribution
  • Mean of the sampling distribution (compare it with the mean of the population)
  • SD of the sampling distribution (compare it with the SD of the population)
  • The difference between the sampling distribution and the population
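The same activity can be approximated in R. The applet's math score data is not available here, so the sketch below uses a simulated stand-in population; its normal shape, mean of 500, and SD of 100 are assumptions, not the applet's values.

```r
set.seed(2024)  # fixed only so that the results are reproducible

# Hypothetical stand-in for the applet's population of 15892 math scores.
population <- rnorm(15892, mean = 500, sd = 100)
mu    <- mean(population)
sigma <- sqrt(mean((population - mu)^2))   # population SD (divisor N)

# For each sample size, draw 10000 samples and record the sample means.
sample_sizes <- c(2, 10, 30, 100)
results <- t(sapply(sample_sizes, function(n) {
  means <- replicate(10000, mean(sample(population, n)))
  c(n = n,
    mean_of_means     = mean(means),       # compare with mu
    sd_of_means       = sd(means),         # compare with sigma
    sigma_over_sqrt_n = sigma / sqrt(n))   # reference value for the SD
}))

print(round(results, 2))
cat("Population mean:", round(mu, 2), " population SD:", round(sigma, 2), "\n")
```

Calling `hist()` on the `means` vector for each n shows the shape of each sampling distribution, which is the first item on the observation list.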

ACTIVITY

• Now let’s choose a different population (a non-normal population) provided by the website
• Repeat what we have done
• Write down our observations
• When does the sampling distribution become approximately normal?

ACTIVITY

• Now we should
  • Clearly distinguish between the population and the sampling distribution.
• Homework: experiment with other populations on the website to deepen your understanding of sampling distributions.
• Question: Is there a sampling distribution of another statistic?

THEOREM I

• If a random sample is selected from a normal population, the sampling distribution of the sample mean is normal.
• Demonstrated by the applet with the population of math scores.

THEOREM II: CENTRAL LIMIT THEOREM

• If a random sample is selected from a non-normal population, the sampling distribution of the sample mean is approximately normal for large sample sizes.
• Demonstrated by the applet with a skewed population.
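A rough way to see the theorem at work without the applet is to track the skewness of the simulated sample means as n grows. The exponential population below is a hypothetical right-skewed population chosen for illustration, and `skewness` is a small helper defined here, not a base R function.

```r
set.seed(2024)  # fixed only so that the results are reproducible

# Hypothetical right-skewed population (exponential with mean 1).
population <- rexp(20000, rate = 1)

# Helper: sample skewness, close to 0 for a symmetric distribution.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Skewness of the simulated sampling distribution of the mean for each n.
sample_sizes <- c(2, 10, 30, 100)
skew_by_n <- sapply(sample_sizes, function(n) {
  means <- replicate(5000, mean(sample(population, n)))
  skewness(means)
})
names(skew_by_n) <- paste0("n=", sample_sizes)

cat("Population skewness:", round(skewness(population), 2), "\n")
print(round(skew_by_n, 2))
```

Skewness is only one summary of shape; a histogram or a normal Q-Q plot (`qqnorm`) of the sample means gives a fuller picture.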

THEOREM II: CENTRAL LIMIT THEOREM

• Practical guideline:
  • If the population is nearly normal, then a sample of size n = 5 will probably be large enough to ensure that $\bar{X}$ is approximately normal.
  • If the population is symmetric, then a sample of size n = 20 to 25 is enough for the Central Limit Theorem (CLT) to hold.
  • For most moderately skewed distributions, a sample size of around 30 is traditionally thought to be sufficiently large for the CLT to hold. This is a rule of thumb, not a definitive number.
  • For very skewed distributions or distributions with outliers, the sample size required for the CLT to hold may be much larger than 30.

PROPERTIES OF SAMPLING DISTRIBUTION OF MEAN

• The relationship between
  • Mean of the population and
  • Mean of all sample means
• The relationship between
  • SD of the population and
  • SD of all sample means

SAMPLING ERROR

• Difference between a sample statistic and the corresponding population parameter
• Important when making inferences about the population

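STANDARD ERROR OF MEAN

• SD of sample means
• Represents (approx.) the average deviation of sample means from the center
• The center = population mean
• Represents (approx.) the average error when using the sample mean to estimate the population mean
• Called the standard error of the mean:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \quad \text{(if } n/N \le 0.05\text{)}$$

where $\sigma$ is the population SD, n is the sample size, and N is the population size.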

FINITE POPULATION CORRECTION FACTOR

• In cases where n/N > 0.05, the standard error of the mean is:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$
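A small numeric check of the finite population correction, assuming the standard form σ/√n · √((N − n)/(N − 1)) and reusing the five-salary population from the earlier example, where n/N = 3/5 is well above 0.05. The variable names are illustrative.

```r
# Population reused from the salary example (hundred dollars/month).
salaries <- c(2, 5, 7, 8, 10)
N <- length(salaries)
n <- 3

# Population SD (divisor N, because this is the whole population).
sigma <- sqrt(mean((salaries - mean(salaries))^2))

# Standard error with the finite population correction factor.
se_fpc <- sigma / sqrt(n) * sqrt((N - n) / (N - 1))

# Exact SD of the sampling distribution, computed from all possible samples.
sample_means <- colSums(combn(salaries, n)) / n
se_exact <- sqrt(mean((sample_means - mean(sample_means))^2))

print(c(uncorrected = sigma / sqrt(n), corrected = se_fpc, exact = se_exact))
```

For this small population the corrected value agrees with the exact SD of the sample means, while the uncorrected σ/√n overstates it.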

FINDING PROBABILITY OF SAMPLE MEAN

• First, check that the sampling distribution of the sample mean is normal or nearly so
• If so, convert to Z to find the probability:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

EXERCISE 1

You’re an operations analyst for AT&T. Long-distance telephone calls are normally distributed with µ = 8 min. and σ = 2 min. If you select a random sample of 25 calls, what is the probability that the sample mean would be between 7.8 and 8.2 minutes?

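SOLUTION

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{7.8 - 8}{2 / \sqrt{25}} = -0.50$$

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{8.2 - 8}{2 / \sqrt{25}} = 0.50$$

Sampling distribution of $\bar{X}$: mean 8, $\sigma_{\bar{X}} = 0.4$. Standard normal distribution of Z: mean 0, $\sigma = 1$.

$$P(7.8 \le \bar{X} \le 8.2) = P(-0.50 \le Z \le 0.50) = 0.3830$$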
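The calculation for Exercise 1 can be checked in R with `pnorm`, the standard normal cumulative distribution function. This is a sketch of the same steps; the variable names are illustrative.

```r
mu    <- 8    # population mean (minutes)
sigma <- 2    # population SD (minutes)
n     <- 25   # sample size

# Standard error of the mean.
se <- sigma / sqrt(n)

# Convert both bounds to Z scores.
z_low  <- (7.8 - mu) / se
z_high <- (8.2 - mu) / se

# P(7.8 <= sample mean <= 8.2) from the standard normal CDF.
p <- pnorm(z_high) - pnorm(z_low)
print(round(c(se = se, z_low = z_low, z_high = z_high, p = p), 4))
```

Software gives about 0.3829; the slide's 0.3830 comes from a printed Z table, where the rounded area 0.1915 is doubled.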


EXERCISE 2

You’re an operations analyst for company A. The distribution of long-distance telephone calls is symmetric but non-normal with µ = 8 min. and σ = 2 min. If you select a random sample of 30 calls, what is the probability that the sample mean lies between 7.8 and 8.2 minutes?

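SOLUTION

The population is symmetric and n = 30, so by the Central Limit Theorem the sampling distribution of $\bar{X}$ is approximately normal.

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{7.8 - 8}{2 / \sqrt{30}} = -0.55$$

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{8.2 - 8}{2 / \sqrt{30}} = 0.55$$

Sampling distribution of $\bar{X}$: mean 8, $\sigma_{\bar{X}} = 0.365$. Standard normal distribution of Z: mean 0, $\sigma = 1$.

$$P(7.8 \le \bar{X} \le 8.2) = P(-0.55 \le Z \le 0.55) = 0.4176$$
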
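For Exercise 2, a short R sketch shows how much the answer depends on rounding Z to two decimals before the table lookup. It relies on the normal approximation from the Central Limit Theorem; the variable names are illustrative.

```r
mu    <- 8    # population mean (minutes)
sigma <- 2    # population SD (minutes)
n     <- 30   # sample size

se <- sigma / sqrt(n)    # standard error of the mean
z  <- (8.2 - mu) / se    # unrounded Z score of the upper bound

# Probability using the unrounded Z score.
p_unrounded <- pnorm(z) - pnorm(-z)

# Probability using Z rounded to 0.55, as when reading a printed Z table.
p_rounded <- pnorm(0.55) - pnorm(-0.55)

print(round(c(se = se, z = z,
              p_unrounded = p_unrounded, p_rounded = p_rounded), 4))
```

The rounded version matches the slide's table-based answer to about three decimals, while the unrounded Z gives a slightly smaller probability of roughly 0.416.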
CONCLUSION

• Sampling methods
• The importance of sampling distribution
• Sampling distribution of sample mean
• Using simulation to understand sampling distribution
• Central limit theorem
