Sampling
Statistical Inference
The purpose of statistical inference is to obtain information
about a population from information contained in a
sample.
» The sample results provide only estimates of the values of
the population characteristics.
» With proper sampling methods, the sample results can
provide “good” estimates of the population
characteristics.
» A parameter is a numerical characteristic of a population.
Sampling
Specifically:
Sampling is the act, process, or technique of
selecting a representative part of a population for the
purpose of determining parameters or characteristics
of the whole population.
Population Frame
» A list, map, directory, or other source used to represent the
population
» Over registration -- the frame contains all members of the target
population and some additional elements
» Example: using the chamber of commerce membership
directory as the frame for a target population of member
businesses owned by women.
» Under registration -- the frame does not contain all members of
the target population.
» Example: using the chamber of commerce membership
directory as the frame for a target population of all businesses.
Nonrandom Sampling
» Convenience Sampling: sample elements are selected for the
convenience of the researcher
» Judgment Sampling: sample elements are selected by the
judgment of the researcher
» Quota Sampling: sample elements are selected until the quota
controls are satisfied
» N = 30 (population size)
» n = 6 (sample size)
Systematic Sampling
» Convenient and relatively easy to administer
» Population elements are an ordered sequence (at least,
conceptually).
» The first sample element is selected randomly from the
first k population elements.
» Thereafter, sample elements are selected at a constant
interval, k, from the ordered sequence frame.
$$k = \frac{N}{n}$$
where:
n = sample size
N = population size
k = size of selection interval
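The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `systematic_sample`, the frame of labels 1 to 30, and the choice to round k down when N is not a multiple of n are all assumptions made for the example.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Draw a systematic sample of n elements from an ordered frame."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the frame size")
    k = N // n                  # selection interval, k = N / n rounded down
    start = rng.randrange(k)    # random start among the first k elements
    return frame[start::k][:n]  # every k-th element from the random start

# Hypothetical ordered frame of N = 30 elements, labeled 1..30.
frame = list(range(1, 31))
sample = systematic_sample(frame, n=6, rng=random.Random(0))
print(sample)
```

With N = 30 and n = 6 the interval is k = 5, so the sample is one of the first five elements followed by every fifth element after it.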
Cluster Sampling
» Population is divided into non-overlapping clusters or areas
» Each cluster is a miniature, or microcosm, of the population.
» A subset of the clusters is selected randomly for the sample.
» If the number of elements in the subset of clusters is larger
than the desired value of n, these clusters may be subdivided
to form a new set of clusters and subjected to a random
selection process.
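As a rough sketch of the selection step described above, the following Python picks whole clusters at random and pools their elements. The cluster names, their contents, and the helper `cluster_sample` are hypothetical and exist only for illustration.

```python
import random

# Hypothetical clusters: each key is a cluster (for example a city),
# each value is the list of elements inside that cluster.
clusters = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2", "c3", "c4"],
    "D": ["d1", "d2", "d3"],
}

def cluster_sample(clusters, m, rng=random):
    """Randomly select m whole clusters; return their names and pooled elements."""
    chosen = rng.sample(sorted(clusters), m)
    elements = [e for name in chosen for e in clusters[name]]
    return chosen, elements

chosen, elements = cluster_sample(clusters, m=2, rng=random.Random(1))
print(chosen, elements)
```

Randomness enters only at the cluster level here; every element of a chosen cluster is included.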
Advantages
• More convenient for geographically dispersed populations
• Reduced travel costs to contact sample elements
• Simplified administration of the survey
• Unavailability of sampling frame prohibits using other random
sampling methods
Disadvantages
• Statistically less efficient when the cluster elements are similar
• Costs and problems of statistical analysis are greater than for
simple random sampling
[Figure: map of U.S. cities serving as clusters: Grand Forks, Fargo, Portland, Buffalo, Pittsfield, Boise, Milwaukee, Cedar Rapids, Denver, Kansas City, Louisville, Cincinnati, San Jose, San Diego, Phoenix, Tucson, Atlanta, Sherman-Denison, Odessa-Midland]
Sampling Error
» When the expected value of a point estimator is equal to the
population parameter, the point estimator is said to be unbiased.
» The absolute value of the difference between an unbiased
point estimate and the corresponding population parameter is
called the sampling error.
» Sampling error is the result of using a subset of the population
(the sample), and not the entire population.
» Statistical methods can be used to make probability statements
about the size of the sampling error.
» For example, when the sample mean $\bar{x}$ is used to estimate the
population mean µ, the sampling error is $|\bar{x} - \mu|$.
Sampling Distributions
A sampling distribution is created by, as the name suggests,
sampling.
[Figure: frequency histogram, N = 8]
Sampling Distribution of $\bar{x}$
The sampling distribution of $\bar{x}$ is the probability distribution
of all possible values of the sample mean $\bar{x}$.
Expected Value of $\bar{x}$
$E(\bar{x}) = \mu$
where:
µ = the population mean
While there are 36 possible samples of size 2, there are only 11 values for $\bar{x}$, and some
(e.g. $\bar{x} = 3.5$) occur more frequently than others (e.g. $\bar{x} = 1$).
$\bar{x}$    $P(\bar{x})$
1.0    1/36
1.5    2/36
2.0    3/36
2.5    4/36
3.0    5/36
3.5    6/36
4.0    5/36
4.5    4/36
5.0    3/36
5.5    2/36
6.0    1/36
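The probabilities in this table can be reproduced by enumerating every possible sample. The sketch below assumes the population is the six equally likely values 1 to 6 (a fair die observed twice), which is consistent with the 36 samples of size 2 but is not stated explicitly here.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)                       # assumed population values 1..6
samples = list(product(faces, repeat=2))  # all ordered samples of size 2

# Count how many samples produce each value of the sample mean.
counts = Counter(Fraction(a + b, 2) for a, b in samples)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m in dist:
    print(float(m), f"{counts[m]}/{len(samples)}")

# Expected value of the sample mean under this distribution.
expected = sum(m * p for m, p in dist.items())
print(float(expected))
```

The expected value computed at the end equals the mean of the assumed population, 3.5, in line with E(x̄) = µ.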
[Figure: bar chart of $P(\bar{x})$ against $\bar{x}$ = 1.0, 1.5, ..., 6.0, rising from 1/36 at the ends to 6/36 at $\bar{x}$ = 3.5]
Compare…
Compare the distribution of X (values 1, 2, ..., 6) with the sampling distribution of $\bar{x}$ (values 1.0, 1.5, ..., 6.0).
“A sampling distribution is a
distribution of all of the possible
values of a statistic for a given size
sample selected from a population.”
Sampling Distribution of $\bar{x}$
Proper analysis and interpretation of a sample statistic requires
knowledge of its distribution.
[Figure: the process of inferential statistics. Select a random sample from the population (parameter µ), then calculate the sample statistic $\bar{x}$ to estimate µ.]
The larger the sample size, the more closely the sampling
distribution of $\bar{X}$ will resemble a normal distribution.
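A small simulation can make this tendency visible. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 1 (a deliberately skewed choice not found in the slides), and skewness is used as a rough measure of distance from the symmetric normal shape.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Means of `reps` random samples of size n from an exponential population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

def skewness(values):
    """Population skewness; 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

rng = random.Random(42)  # fixed seed so the run is repeatable
skew_by_n = {}
for n in (2, 30):
    means = sample_means(n, 5000, rng)
    skew_by_n[n] = skewness(means)
    print(n, round(statistics.fmean(means), 3), round(skew_by_n[n], 3))
```

The skewness of the simulated sample means should be noticeably closer to 0 for n = 30 than for n = 2, while the average of the means stays near the population mean.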
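If the Population is not Normal
Sampling distribution properties:
Central tendency: $\mu_{\bar{x}} = \mu$
Variation: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
The sampling distribution becomes normal as n increases.
[Figure: a non-normal population distribution with mean µ, shown above sampling distributions of $\bar{x}$ centered at $\mu_{\bar{x}}$ for a smaller and a larger sample size; the larger sample size gives the narrower curve.]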
Unbiasedness
» An estimator is said to be unbiased if its expected value is equal to the
population parameter it estimates.
» For example, $E(\bar{X}) = \mu$, so the sample mean is an unbiased estimator of the
population mean. Unbiasedness is an average or long-run property. The
mean of any single sample will probably not equal the population mean, but
the average of the means of repeated independent samples from a
population approaches the population mean as the number of samples grows.
» Any systematic deviation of the estimator from the population parameter of
interest is called a bias.
Bias
An unbiased estimator is on target on average.
A biased estimator is off target on average.
Efficiency
An estimator is efficient if it has a relatively small variance (and
standard deviation).
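Consistency
An estimator is said to be consistent if its probability of being close to
the parameter it estimates increases as the sample size increases.
[Figure: sampling distributions of an estimator for n = 10 and n = 100]
Sufficiency
An estimator is said to be sufficient if it contains all the information in
the data about the parameter it estimates.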
» In general, the sample mean is the best estimator of the population mean.
The sample mean is the most efficient unbiased estimator of the
population mean. It is also a consistent estimator.
Example
» Suppose a population has mean μ = 8 and
standard deviation σ = 3. Suppose a random
sample of size n = 36 is selected.
» What is the probability that the sample mean is
between 7.8 and 8.2?
Solution:
» Even if the population is not normally distributed, the central limit
theorem can be used (n > 30)
» … so the sampling distribution of $\bar{X}$ is approximately normal
» … with mean $\mu_{\bar{x}} = 8$
» … and standard deviation
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$$
Solution:
$$P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 < Z < 0.4) = 0.3108$$
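The same calculation can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are chosen here for illustration, and the normal model for the sample mean rests on the central limit theorem as in the solution above.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 3, 36

se = sigma / sqrt(n)             # standard deviation of the sample mean
xbar_dist = NormalDist(mu, se)   # approximate sampling distribution of the mean

# P(7.8 < sample mean < 8.2) as a difference of two cumulative probabilities.
prob = xbar_dist.cdf(8.2) - xbar_dist.cdf(7.8)
print(se, round(prob, 4))
```

Because the interval is symmetric about the mean, the result is also 2·P(0 < Z < 0.4).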
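Example (a)
The foreman of a bottling plant has observed that the
amount of soda in each “32-ounce” bottle is actually a
normally distributed random variable, with a mean of 32.2
ounces and a standard deviation of 0.3 ounce. If a customer
buys one bottle, what is the probability that the bottle will
contain more than 32 ounces?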
Solution:
We want to find P(X > 32), where X is normally distributed with µ = 32.2
and σ = 0.3.
$$P(X > 32) = P\left(\frac{X - \mu}{\sigma} > \frac{32 - 32.2}{0.3}\right) = P(Z > -0.67) = 1 - 0.2514 = 0.7486$$
“There is about a 75% chance that a single bottle of soda contains
more than 32 oz.”
Example (b)
The foreman of a bottling plant has observed that the amount of
soda in each “32-ounce” bottle is actually a normally distributed
random variable, with a mean of 32.2 ounces and a standard
deviation of 0.3 ounce. If a customer buys a carton of four bottles,
what is the probability that the mean amount of the four bottles
will be greater than 32 ounces?
Solution:
We want to find $P(\bar{X} > 32)$, where X is normally distributed
with µ = 32.2 and σ = 0.3, and $\bar{X}$ is the mean of n = 4 bottles.
Things we know:
1) X is normally distributed, therefore so is $\bar{X}$.
2) $\mu_{\bar{x}} = \mu = 32.2$ oz.
3) $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 0.3 / \sqrt{4} = 0.15$ oz.
$$P(\bar{X} > 32) = P\left(Z > \frac{32 - 32.2}{0.15}\right) = P(Z > -1.33) = 1 - 0.0918 = 0.9082$$
[Figure: two normal curves comparing the probability that one bottle will contain more than 32 ounces with the probability that the mean of four bottles will exceed 32 oz.]
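The single-bottle and four-bottle questions differ only in the standard deviation used, which a short sketch can make explicit. The helper `prob_mean_exceeds` is a hypothetical name introduced for this illustration; it treats a single bottle as a sample of size n = 1.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3   # population mean and standard deviation (ounces)

def prob_mean_exceeds(threshold, n):
    """P(mean of n bottles > threshold) for a normally distributed population."""
    return 1 - NormalDist(mu, sigma / sqrt(n)).cdf(threshold)

p_one = prob_mean_exceeds(32, n=1)    # one bottle
p_four = prob_mean_exceeds(32, n=4)   # mean of a carton of four
print(round(p_one, 4), round(p_four, 4))
```

The computed values can differ from z-table answers in the third decimal place, because a table lookup rounds z to two decimals first. The probability is higher for the carton because the mean of four bottles varies less than a single bottle does.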
Solution:
Because the sample size is greater than 30, the
central limit theorem can be used to state that the
sample mean is approximately normally distributed, and
the problem can proceed using the normal distribution
calculations.
Solution:
Population parameters: µ = 85, σ = 9
Sample size: n = 40
$$P(\bar{X} > 87) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{87 - 85}{9/\sqrt{40}}\right) = P(Z > 1.41) = 0.0793$$
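A quick numerical check of this result, written as a sketch with the Python standard library. It computes the probability twice: once with z rounded to two decimals, as a z-table lookup would, and once without rounding. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 85, 9, 40

# Standardize the cutoff of 87 using the standard error sigma / sqrt(n).
z = (87 - mu) / (sigma / sqrt(n))

std_normal = NormalDist()                     # mean 0, standard deviation 1
p_table = 1 - std_normal.cdf(round(z, 2))     # z rounded to 1.41, table style
p_exact = 1 - std_normal.cdf(z)               # no rounding of z
print(round(z, 2), round(p_table, 4), round(p_exact, 4))
```

The table-style value matches the 0.0793 above; the unrounded value is slightly larger because the unrounded z is a little below 1.41.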
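$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{87 - 85}{9/\sqrt{40}} = \frac{2}{1.42} = 1.41$$
[Figure: the area to the right of 87 under the sampling distribution of $\bar{X}$ (centered at 85) and the area to the right of 1.41 under the standard normal curve (centered at 0) are equal areas of 0.0793; the area between the center and the cutoff is 0.4207 on each curve.]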
Student’s t Distribution
If the population standard deviation, σ, is unknown, replace σ
with the sample standard deviation, s. If the population is
normal, the resulting statistic:
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
has a t distribution with (n - 1) degrees of freedom.
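As a minimal sketch of how this statistic is computed from data, the following uses only the Python standard library. The function name `t_statistic` and the five data values are hypothetical, chosen so the arithmetic is easy to follow.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu):
    """Return (t, degrees of freedom) for a sample and a population mean mu."""
    n = len(sample)
    s = stdev(sample)                        # sample standard deviation (divisor n - 1)
    t = (mean(sample) - mu) / (s / sqrt(n))  # t = (x-bar - mu) / (s / sqrt(n))
    return t, n - 1

sample = [1, 2, 3, 4, 5]   # hypothetical data
t, df = t_statistic(sample, mu=2)
print(round(t, 4), df)
```

Here the sample mean is 3 and s² = 2.5, so s/√n = √0.5 and t = 1/√0.5 = √2, with n - 1 = 4 degrees of freedom.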
The t is a family of bell-shaped and symmetric distributions, one for each
number of degrees of freedom.
The expected value of t is 0.
The variance of t is greater than 1, but approaches 1 as the number of
degrees of freedom increases. The t is flatter and has fatter tails than does
the standard normal.
The t distribution approaches a standard normal as the number of
degrees of freedom increases.
[Figure: the standard normal curve compared with t distributions for df = 20 and df = 10, all centered at µ = 0.]
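The statement about the variance can be made concrete with the standard closed form for the variance of Student's t, df / (df - 2), which holds for df > 2. The formula is not given in the slides; it is brought in here as a known result, and the helper name `t_variance` is illustrative.

```python
def t_variance(df):
    """Variance of Student's t with df degrees of freedom: df / (df - 2)."""
    if df <= 2:
        raise ValueError("the variance is finite only for df > 2")
    return df / (df - 2)

# The variance exceeds 1 and shrinks toward 1 as df grows.
for df in (10, 20, 100, 1000):
    print(df, round(t_variance(df), 4))
```

For df = 10 the variance is 1.25, and for df = 20 it is about 1.11, which matches the ordering of the two t curves relative to the standard normal.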