Sp25 Module 06 Sampling
Sampling Distribution
ISOM2500 BUSINESS STATISTICS
(L1 & L2, Spring 2025)
Jason HO
Contents
Samples and surveys
Sampling distribution of the sample mean
Central limit theorem
Sampling distribution of any statistic
Journey in this course
• Descriptive statistics (Module 2): graphical tools; numerical tools
• Building blocks of the theory of statistics: probability (Module 3); random variables (Modules 4 & 5), discrete or continuous, and jointly distributed
• Inferential statistics
Inferential statistics: a general set-up with 4 components
• Population
• Parameter(s): describe a characteristic of the population in answering some questions in reality
• Sample: drawn from the population
• Statistic(s): found and computed from the sample to estimate the parameter(s)
Samples and surveys
A survey gathers information from a subgroup of entities (i.e., a sample) that belong to a much larger group of entities (i.e., the population), providing the necessary ingredients for parameter estimation: THE DATA
Example 1: use of surveys in daily lives
• When an election is approaching, there are nonstop reports and news about the latest opinion poll
• The foreman of a warehouse will not accept a shipment of electronic components unless virtually all the components in the shipment operate correctly
• A retailer wants to know the market share of a brand before deciding to stock the items on its shelves
• Managers in the human resources department determine the salary for the new employees based on wages paid around the country
Data quality: garbage in, garbage out
Samples that distort the population (e.g., one that systematically
omits a portion of the population) are said to have sampling bias
A sample that presents a “good” snapshot of the population (i.e.,
showing/preserving systematic patterns of the population) is said
to be representative
Sampling variation/sample-to-sample variation
Sampling a small portion from a large population results in sampling variation/sample-to-sample variation
• Every time, only n values are obtained: a simple random sample (SRS) of size n
• This time: n particular values; next time: another n values; and so on (e.g., 100 SRSs)
• The probability of getting each value in the sample depends on the probability of the corresponding value in the population
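Sampling variation can be seen directly by drawing a few SRSs and comparing their means. The sketch below assumes a hypothetical population of 10,000 normally distributed values (mean 100, SD 15, made up for illustration) and draws SRSs of size n = 25 without replacement; the helper `srs_mean` is a name introduced here, not from the slides.

```python
import random
import statistics

# Hypothetical population (assumption): 10,000 values, for illustration only
random.seed(1)
population = [random.gauss(100, 15) for _ in range(10_000)]

def srs_mean(pop, n, rng):
    """Mean of a simple random sample of size n drawn without replacement."""
    return statistics.fmean(rng.sample(pop, n))

rng = random.Random(2025)
n = 25
means = [srs_mean(population, n, rng) for _ in range(5)]  # 5 SRSs, typically 5 different means
for i, m in enumerate(means, 1):
    print(f"SRS {i}: sample mean = {m:.2f}")
```

Each run of `srs_mean` sees only n values, so the five sample means differ from one another even though the population never changes.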
Population: an (underlying) probability model/distribution X or f(x)
Most statistical methods are developed by assuming, often if not always, an underlying probability distribution X or f(x) (or p(x)) for the population of interest:
• The population comprises infinitely many values (as categories or numbers)
• For a population of continuous values, the red smooth curve over the histogram of all these values mimics the probability distribution f(x) of a RV, say, X
(Figure: histogram of the infinitely many population values, with the underlying probability model, i.e., a probability distribution f(x) or a random variable X)
Data: iid samples from X
Assume that the data arise as a representative sample of size n from
the population (with an underlying probability distribution f(x))
• Each data value is an independent realization from the underlying probability model X (i.e., like a random draw with replacement from all the values that constitute the histogram in Slide 17)
• The data are modeled as RVs that are independent and identically distributed (iid) samples/draws from X or f(x), written as $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} f(x)$
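The "random draw with replacement" picture can be sketched in a few lines. Here a small hypothetical list `values` stands in for the values behind the histogram (an assumption for illustration); `random.choices` draws with replacement, so the draws are independent and each has the same distribution.

```python
import random
from collections import Counter

# Hypothetical finite stand-in for the population's values (assumption)
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]

def iid_sample(vals, n, rng):
    """Draw X_1, ..., X_n independently, each uniformly from vals (with replacement)."""
    return rng.choices(vals, k=n)

rng = random.Random(42)
data = iid_sample(values, 10, rng)
print(data)

# Each draw is identically distributed: over many draws, relative frequencies
# approach the population proportions (e.g. P(X = 3) = 3/9)
big = iid_sample(values, 90_000, rng)
freq = Counter(big)
print({v: round(freq[v] / len(big), 3) for v in sorted(freq)})
```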
Ch.4
From now on, let's confine our statistic of interest to the sample mean $\bar{X}$ (i.e., the parameter of interest is the population mean $\mu$), defined by
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
• As all Xi's are RVs, the sample mean $\bar{X}$ is a RV, and its probability distribution is specially called the sampling distribution
Sampling distribution of the sample mean is the distribution of the sample mean computed from a sample of size n. In theory, it can be obtained from ALL possible samples of size n from the population through repeated sampling (illustrated in the next Slide)
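For a tiny population, "ALL possible samples of size n" can actually be listed. This sketch assumes a hypothetical population of the values 1, 2, 3 (equally likely) and iid draws with replacement, so all 3^n ordered samples are equally likely; exact fractions give the exact sampling distribution of $\bar{X}$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical tiny population (assumption): values 1, 2, 3, each equally likely
population = [1, 2, 3]
n = 2

# All ordered iid samples of size n (draws with replacement) are equally likely
samples = list(product(population, repeat=n))
means = Counter(Fraction(sum(s), n) for s in samples)
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(means.items())}
for m, p in sampling_dist.items():
    print(f"x-bar = {m}: probability {p}")
```

Enumeration explodes quickly as the population or n grows, which is why the repeated-sampling approximation is used in practice.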
In theory:
Existing RVs $X_1, X_2, \ldots, X_n$ → sum up/average → new RV $\bar{X}$
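In practice:
Existing RVs → sum up/average → new RV, repeated M times (say, M = 10,000): each repetition draws a sample of size n and computes its sample mean, and the histogram of the M sample means approximates the sampling distribution of $\bar{X}$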
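The repeated-sampling recipe translates directly into a loop. This is a minimal sketch assuming a uniform population on [0, 1] (mean 0.5, variance 1/12), n = 10 and M = 10,000; `approx_sampling_dist` is a hypothetical helper name, and the text histogram is only a rough stand-in for a plotted one.

```python
import random
import statistics

def approx_sampling_dist(draw, n, M, rng):
    """Repeat M times: draw a sample of size n via draw(rng) and record its mean."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(M)]

# Assumed population for illustration: uniform on [0, 1] (mean 0.5, variance 1/12)
rng = random.Random(0)
xbars = approx_sampling_dist(lambda r: r.random(), n=10, M=10_000, rng=rng)

print("mean of x-bars:", round(statistics.fmean(xbars), 4))
print("SD of x-bars:  ", round(statistics.stdev(xbars), 4))

# Crude text histogram of the M sample means
bins = [0] * 10
for x in xbars:
    bins[min(int(x * 10), 9)] += 1
for i, count in enumerate(bins):
    print(f"[{i/10:.1f}, {(i+1)/10:.1f}) {'#' * (count // 100)}")
```

The mean of the x-bars should sit near 0.5 and their SD near $\sqrt{(1/12)/10} \approx 0.091$.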
Example 2: existing RVs → sum up/average → new RV, and its sampling distribution
Rice Virtual Lab in Statistics:
https://onlinestatbook.com/stat_sim/sampling_dist/index.html
Example 2 (cont'd): existing RVs → sum up/average → new RV
1. Population under study = distribution of the RVs being averaged
2. Samples from the population (M set in the simulation) = values of the RVs being averaged
3. Approximate sampling distribution of the chosen statistic (here, the mean)
Sample size = # of averaged RVs/values
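Normality of the sample mean from a normal population
When the population is normal, $X \sim N(\mu, \sigma^2)$, the sample mean is always normally distributed for all sample sizes (n = 1, 2, 3, …): $\bar{X} \sim N(\mu, \sigma^2/n)$
(Figure: sampling distributions of $\bar{X}$ from the same normal population)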
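The mean and variance in $N(\mu, \sigma^2/n)$ follow from a short derivation, assuming $X_1, \ldots, X_n$ are iid with mean $\mu$ and variance $\sigma^2$. Linearity of expectation gives the mean; independence is what lets the variances add. Normality itself comes from the fact that a sum of independent normal RVs is normal.

```latex
E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\cdot n\mu = \mu,
\qquad
\operatorname{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n}
```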
CENTRAL LIMIT THEOREM
Central limit theorem (for other populations)
For a random sample of size n from a population with mean $\mu$ and variance $\sigma^2$ (both finite), the sample mean $\bar{X}$ is approximately normal when n is large (≥30). By CLT,
$\bar{X} \overset{\text{approx}}{\sim} N(\mu, \sigma^2/n)$
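A quick simulation makes the CLT visible for a clearly non-normal population. The sketch assumes an exponential population with rate 1 (so $\mu = \sigma = 1$), n = 40 and M = 10,000 repetitions: single draws are strongly right-skewed, while the sample means are nearly symmetric and roughly 95% of the standardized means fall within ±1.96.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: average cubed standardized deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

# Assumed skewed population: exponential with rate 1, so mu = 1 and sigma = 1
rng = random.Random(7)
mu, sigma, n, M = 1.0, 1.0, 40, 10_000

single_draws = [rng.expovariate(1.0) for _ in range(M)]
xbars = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(M)]

# Standardize with the CLT scale sigma / sqrt(n), then check the normal 95% rule
z = [(xb - mu) / (sigma / n ** 0.5) for xb in xbars]
within = sum(abs(v) <= 1.96 for v in z) / M

print("skewness of single draws:", round(skewness(single_draws), 2))  # far from 0
print("skewness of sample means:", round(skewness(xbars), 2))         # much closer to 0
print("share of |Z| <= 1.96:    ", round(within, 3))                  # close to 0.95
```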
Example 5: CLT
A recent report stated that the day-care cost per week in a region is
$109. Suppose this figure is taken as the mean cost per week and
that the standard deviation is known to be $20
Find the probability that a sample of 50 day-care centers
would show a mean cost of $105 or less per week
Weekly cost in any day-care center in the region is the RV under consideration, but the required probability concerns the costs of NOT only 1 center but 50 of them
Treat the weekly costs in all day-care centers as the population, denoted by X; then the mean $\bar{X}$ of a sample of 50 costs from X is of interest
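Example 5 (cont'd)
Since n = 50 is large (≥30), by CLT $\bar{X} \overset{\text{approx}}{\sim} N(109, 20^2/50)$, so
$P(\bar{X} \le 105) \approx P\left(Z \le \frac{105 - 109}{20/\sqrt{50}}\right) = P(Z \le -1.41) \approx 0.079$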
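The Example 5 arithmetic can be checked numerically. This sketch uses the example's own figures ($\mu = 109$, $\sigma = 20$, n = 50) and computes the standard normal CDF from `math.erf` instead of a printed table, so it carries slightly more decimals than a table lookup at z = −1.41.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 109, 20, 50          # figures from Example 5
se = sigma / sqrt(n)                # SD of the sample mean, sigma / sqrt(n)
z = (105 - mu) / se                 # standardized value of x-bar = 105
prob = normal_cdf(z)                # P(X-bar <= 105) under the CLT approximation
print(f"z = {z:.4f}, P(X-bar <= 105) = {prob:.4f}")
```

Here z works out to exactly $-\sqrt{2} \approx -1.4142$, giving a probability of about 0.0786.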
Example 6: sample proportion
Coke bottles are filled by a machine so that the contents X have a normal distribution with mean 298ml and SD 3ml
What is the proportion of bottles with less than 295ml?
Let X be the content (in ml) of any Coke bottle; then $X \sim N(298, 3^2)$ and
$P(X < 295) = P\left(Z < \frac{295 - 298}{3}\right) = P(Z < -1) = 0.1587$
What if we have a carton of 100 bottles of Coke?
Example 6 (cont'd)
Question: what about the proportion of bottles with less than 295ml in a carton of 100 bottles?
Understand the last question:
1. Among all bottles of Coke produced by the machine, the proportion of bottles with <295ml is 15.87%. Call bottles with <295ml “success” (and those with ≥295ml “failure”), so the population is binary
2. With a carton of 100 bottles of Coke, we have sampled/selected 100 bottles from this binary population. We want to study, among these 100 sampled bottles, the proportion of bottles with <295ml (i.e., “success”)
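Example 6 (cont'd 2): proportion of bottles with less than 295ml in a carton of 100 bottles
By CLT, since np = 100 × 0.1587 = 15.87 > 10 and n(1 − p) = 84.13 > 10, the sample proportion $\hat{p}$ is approximately normal:
$\hat{p} \overset{\text{approx}}{\sim} N\left(p, \frac{p(1-p)}{n}\right) = N\left(0.1587, \frac{0.1587 \times 0.8413}{100}\right)$, i.e., mean 0.1587 and SD ≈ 0.0365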
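The three steps of Example 6 can be replayed in code: compute p = P(X < 295), apply the CLT approximation for $\hat{p}$ with n = 100, then check it by simulating cartons. The number of simulated cartons (M = 10,000) and the random seed are assumptions for illustration.

```python
import random
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Step 1: P(X < 295) for X ~ N(298, 3^2), i.e. the "success" probability p
p = normal_cdf((295 - 298) / 3)          # = P(Z < -1), about 0.1587

# Step 2: CLT approximation for the sample proportion in a carton of n = 100
n = 100
se = sqrt(p * (1 - p) / n)
print(f"p = {p:.4f}, np = {n*p:.2f}, n(1-p) = {n*(1-p):.2f}, SD of p-hat = {se:.4f}")

# Step 3: check by repeated sampling (M cartons, assumed M = 10,000)
rng = random.Random(3)
M = 10_000
phats = [sum(rng.gauss(298, 3) < 295 for _ in range(n)) / n for _ in range(M)]
mean_phat = sum(phats) / M
sd_phat = sqrt(sum((x - mean_phat) ** 2 for x in phats) / (M - 1))
print(f"simulated mean of p-hat = {mean_phat:.4f}, simulated SD = {sd_phat:.4f}")
```

Each simulated carton is 100 independent bottle fills, so $\hat{p}$ is literally a sample mean of 0/1 "success" indicators, which is why the CLT applies.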
Student's t statistic: replacing an unknown σ with s (cf. Slide 28)
When the population SD σ is unknown (the more realistic case), the sample mean is standardized with σ replaced by the sample SD s:
$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
1. when n ≥ 30
─ 3 conditions to be satisfied
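To make the t statistic concrete, here is a small sketch that computes T for a hypothetical sample. It only computes the statistic; which distribution T follows depends on the conditions listed on the slide, and the code does not check them. `statistics.stdev` uses the n − 1 divisor, matching the usual sample SD s.

```python
import statistics
from math import sqrt

def t_statistic(sample, mu):
    """Standardized sample mean with sigma replaced by the sample SD s (n - 1 divisor)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)       # sample SD, divisor n - 1
    return (xbar - mu) / (s / sqrt(n))

data = [2, 4, 4, 4, 5, 5, 7, 9]         # hypothetical sample
print(round(t_statistic(data, mu=4), 4))
```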
SAMPLING DISTRIBUTION OF ANY STATISTIC
Are other statistics approximately normal?
Every statistic has a sampling distribution but, other than the
sample mean (e.g., sample variance, sample median and so on),
the distribution may be very different from being bell-shaped
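The same repeated-sampling loop works for any statistic, not just the mean. As a hypothetical illustration (the slides' statistic H is not reproduced here), this sketch takes the sample maximum of n = 5 draws from a uniform population on [0, 1], repeated M = 10,000 times; its sampling distribution piles up near 1 and is clearly skewed rather than bell-shaped.

```python
import random
import statistics

def approx_sampling_dist(stat, draw, n, M, rng):
    """Approximate the sampling distribution of `stat` from M samples of size n."""
    return [stat([draw(rng) for _ in range(n)]) for _ in range(M)]

rng = random.Random(11)
# Assumed set-up: statistic = sample maximum, population = uniform on [0, 1], n = 5
maxes = approx_sampling_dist(max, lambda r: r.random(), n=5, M=10_000, rng=rng)

print("mean of sample max:", round(statistics.fmean(maxes), 3))   # theory: n/(n+1) = 0.833
bins = [0] * 10
for h in maxes:
    bins[min(int(h * 10), 9)] += 1
for i, count in enumerate(bins):
    print(f"[{i/10:.1f}, {(i+1)/10:.1f}) {'#' * (count // 100)}")
```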
(Figure: approximate sampling distribution of a statistic H, built from the values of H in 1,560 (= M) games; it is far from being normal)
Summarized next:
Summary
Statistic (sample counterpart of a parameter) and its sampling distribution:
• Sample mean $\bar{X}$ (for a binary population, refer to Slides 33-34: the sample proportion is a sample mean)
  ◦ Population X normal and σ known: $\bar{X}$ is normal (Slide 25)
  ◦ Population X normal and σ unknown: Student's t statistic (Slide 38)
  ◦ Population X not normal and sample size n ≥ 30: $\bar{X}$ is approximately normal by CLT (Slide 27 if σ is known, Slide 38 if σ is unknown)
• Statistic other than the sample mean: an approximate sampling distribution constructed via repeated sampling (Slide 40)
Takeaway
Sampling variation; repeated sampling
Any sample statistic is a RV
Use of sample counterpart as parameter estimate
Sampling distribution of the sample mean
Central Limit Theorem
Sampling distribution of the sample proportion
Sampling distribution of other sample statistics