
Sampling

The document discusses various sampling methods used in statistical inference. It defines key terms like population, sample, parameter, and sampling. It then describes different sampling techniques including simple random sampling, stratified random sampling, systematic sampling, and cluster sampling. For each technique, it provides examples and discusses their advantages and disadvantages. The overall purpose is to explain different approaches for selecting representative samples from populations.

Uploaded by

Raj Saha

Statistical Inference
The purpose of statistical inference is to obtain information
about a population from information contained in a
sample.

» A population is the set of all the elements of interest.

» A sample is a subset of the population.



Statistical Inference
» The sample results provide only estimates of the values of
the population characteristics.
» With proper sampling methods, the sample results can
provide “good” estimates of the population
characteristics.
» A parameter is a numerical characteristic of a population.

Sampling

“Sampling is the act, process, or technique of selecting a suitable sample.”

Specifically:
Sampling is the act, process, or technique of
selecting a representative part of a population for the
purpose of determining parameters or characteristics
of the whole population.

Reasons for Sampling


» Sampling can save money.
» Sampling can save time.
» For given resources, sampling can broaden the scope of the
data set.
» Because the research process is sometimes destructive, the
sample can save product.
» If accessing the population is impossible, sampling is the only
option.

Reasons for Taking a Census


» Eliminate the possibility that a random sample is not
representative of the population.

» The person authorizing the study is uncomfortable with sample
information.

Population Frame
» A list, map, directory, or other source used to represent the
population
» Overregistration -- the frame contains all members of the target
population and some additional elements.
» Example: using the chamber of commerce membership
directory as the frame for a target population of member
businesses owned by women.
» Underregistration -- the frame does not contain all members of
the target population.
» Example: using the chamber of commerce membership
directory as the frame for a target population of all businesses.

Random Versus Nonrandom Sampling
» Random sampling
• Every unit of the population has the same probability of being included
in the sample.
• A chance mechanism is used in the selection process.
• Eliminates bias in the selection process
• Also known as probability sampling
» Nonrandom Sampling
• Not every unit of the population has the same probability of being
included in the sample.
• Open to selection bias
• Not an appropriate data collection method for most statistical methods
• Also known as nonprobability sampling

Random Sampling Techniques


» Simple Random Sample
» Stratified Random Sample
» Systematic Random Sample
» Cluster (or Area) Sampling

Nonrandom Sampling
» Convenience Sampling: sample elements are selected for the
convenience of the researcher
» Judgment Sampling: sample elements are selected by the
judgment of the researcher
» Quota Sampling: sample elements are selected until the quota
controls are satisfied

Simple Random Sampling: Finite Population
» Finite populations are often defined by lists such as:
⋄ Organization membership roster
⋄ Credit card account numbers
⋄ Inventory product numbers
» A simple random sample of size n from a finite population of size
N is a sample selected such that each possible sample of size n
has the same probability of being selected.

Simple Random Sampling: Finite Population
» Replacing each sampled element before selecting subsequent
elements is called sampling with replacement.
» Sampling without replacement is the procedure used most often.
» In large sampling projects, computer-generated random numbers are
often used to automate the sample selection process.
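The two selection schemes above can be sketched with Python's standard `random` module. This is only an illustration and not part of the original slides: the frame of 30 numbered elements, the sample size of 6, and the fixed seed are assumptions chosen for the example.

```python
import random

# Hypothetical numbered frame: elements 1..30 (N = 30), sample size n = 6.
frame = list(range(1, 31))
n = 6

rng = random.Random(42)  # fixed seed only so the sketch is reproducible

# Without replacement: each element can appear at most once, and every
# possible subset of size n is equally likely.
without_replacement = rng.sample(frame, n)

# With replacement: every draw is made from the full frame, so repeats can occur.
with_replacement = [rng.choice(frame) for _ in range(n)]

print(without_replacement)
print(with_replacement)
```

In practice the frame would be the real list (for example a membership roster) rather than a range of integers.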

Simple Random Sampling: Infinite Population
» Infinite populations are often defined by an ongoing process
whereby the elements of the population consist of items
generated as though the process would operate indefinitely.
» A simple random sample from an infinite population is a sample
selected such that the following conditions are satisfied:
⋄ Each element selected comes from the same population.
⋄ Each element is selected independently.

Simple Random Sampling: Infinite Population
» In the case of infinite populations, it is impossible to obtain a list
of all elements in the population.
» The random number selection procedure cannot be used for
infinite populations.

Simple Random Sample: Numbered Population Frame
01 Alaska Airlines 11 DuPont 21 Lucent
02 Alcoa 12 Exxon Mobil 22 Mattel
03 Ashland 13 General Dynamics 23 Mead
04 Bank of America 14 General Electric 24 Microsoft
05 BellSouth 15 General Mills 25 Occidental Petroleum
06 Chevron 16 Halliburton 26 JC Penney
07 Citigroup 17 IBM 27 Procter & Gamble
08 Clorox 18 Kellogg 28 Ryder
09 Delta Air Lines 19 KMart 29 Sears
10 Disney 20 Lowe’s 30 Time Warner

Simple Random Sample: Sample Members
01 Alaska Airlines 11 DuPont 21 Lucent
02 Alcoa 12 Exxon Mobil 22 Mattel
03 Ashland 13 General Dynamics 23 Mead
04 Bank of America 14 General Electric 24 Microsoft
05 BellSouth 15 General Mills 25 Occidental Petroleum
06 Chevron 16 Halliburton 26 JC Penney
07 Citigroup 17 IBM 27 Procter & Gamble
08 Clorox 18 Kellogg 28 Ryder
09 Delta Air Lines 19 KMart 29 Sears
10 Disney 20 Lowe’s 30 Time Warner

» N = 30
» n = 6

Stratified Random Sample


» Population is divided into non-overlapping subpopulations called strata
» A random sample is selected from each stratum
» Potential for reducing sampling error
» Proportionate -- the percentage of the sample taken from each stratum is
proportionate to the percentage that each stratum is within the population
» Disproportionate -- proportions of the strata within the sample are different
than the proportions of the strata within the population
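Proportionate allocation can be sketched as follows. The strata names, stratum sizes, total sample size, and seed below are made-up values for illustration only; they do not come from the slides.

```python
import random

# Hypothetical strata: listener IDs grouped by age band (sizes are made up).
strata = {
    "20-30": list(range(0, 500)),
    "30-40": list(range(500, 800)),
    "40-50": list(range(800, 1000)),
}
n = 100  # total sample size (assumption)
N = sum(len(members) for members in strata.values())

rng = random.Random(0)  # fixed seed only for reproducibility
sample = {}
for name, members in strata.items():
    # Proportionate allocation: the stratum's share of the sample
    # equals its share of the population.
    n_h = round(n * len(members) / N)
    sample[name] = rng.sample(members, n_h)  # simple random sample within the stratum

allocation = {name: len(chosen) for name, chosen in sample.items()}
print(allocation)
```

With these sizes the allocation divides evenly. In general the rounded stratum sizes may not sum exactly to n, so a real design would need a rule for distributing the remainder.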

Stratified Random Sample: Population of FM Radio Listeners
Stratified by Age
» 20 - 30 years old (homogeneous within: alike)
» 30 - 40 years old (homogeneous within: alike)
» 40 - 50 years old (homogeneous within: alike)
» Heterogeneous (different) between strata

Systematic Sampling
» Convenient and relatively easy to administer
» Population elements are an ordered sequence (at least,
conceptually).
» The first sample element is selected randomly from the
first k population elements.
» Thereafter, sample elements are selected at a constant
interval, k, from the ordered sequence frame.

$k = \dfrac{N}{n}$

where:
n = sample size
N = population size
k = size of selection interval

Systematic Sampling: Example
» Purchase orders for the previous fiscal year are serialized 1
to 10,000 (N = 10,000).
» A sample of fifty (n = 50) purchase orders is needed for
an audit.
» k = 10,000/50 = 200
» First sample element randomly selected from the first 200
purchase orders.
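The purchase-order example can be sketched in a few lines. The values of N and n are taken from the example; the seed and the variable names are assumptions for illustration.

```python
import random

N = 10_000   # purchase orders serialized 1..N
n = 50       # required sample size
k = N // n   # selection interval: 200

rng = random.Random(7)     # fixed seed only for reproducibility
start = rng.randint(1, k)  # random start among the first k orders (inclusive)

# Take the starting order and every k-th order after it.
selected = list(range(start, N + 1, k))
print(k, start, len(selected))
```

Because N is an exact multiple of n here, every possible starting point yields exactly 50 orders.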

Cluster Sampling
» Population is divided into non-overlapping clusters or areas
» Each cluster is a miniature, or microcosm, of the population.
» A subset of the clusters is selected randomly for the sample.
» If the number of elements in the subset of clusters is larger
than the desired value of n, these clusters may be subdivided
to form a new set of clusters and subjected to a random
selection process.

Cluster Sampling
Advantages
• More convenient for geographically dispersed populations
• Reduced travel costs to contact sample elements
• Simplified administration of the survey
• Unavailability of sampling frame prohibits using other random
sampling methods
Disadvantages
• Statistically less efficient when the cluster elements are similar
• Costs and problems of statistical analysis are greater than for
simple random sampling

Cluster Sampling
[Figure: map of cluster cities: Grand Forks, Fargo, Portland, Buffalo, Pittsfield, Boise, Milwaukee, Cedar Rapids, Denver, Kansas City, Louisville, Cincinnati, San Jose, San Diego, Phoenix, Tucson, Atlanta, Sherman-Denison, Odessa-Midland]

Sampling Error
» When the expected value of a point estimator is equal to the
population parameter, the point estimator is said to be unbiased.
» The absolute value of the difference between an unbiased
point estimate and the corresponding population parameter is
called the sampling error.
» Sampling error is the result of using a subset of the population
(the sample), and not the entire population.
» Statistical methods can be used to make probability statements
about the size of the sampling error.

Sampling Error
The sampling errors are:

$|\bar{x} - \mu|$ for sample mean

$|s - \sigma|$ for sample standard deviation

$|\bar{p} - p|$ for sample proportion

Sampling Distribution

Sampling Distributions
A sampling distribution is created by, as the name suggests,
sampling.

The method we will employ relies on the rules of probability and
the laws of expected value and variance to derive the
sampling distribution.

For example, consider the roll of one and two dice…

Distribution of a Small Finite Population

N = 8

54, 55, 59, 63, 64, 68, 69, 70

Sample Space for n = 2 with Replacement
No. Sample Mean No. Sample Mean No. Sample Mean No. Sample Mean
1 (54,54) 54.0 17 (59,54) 56.5 33 (64,54) 59.0 49 (69,54) 61.5
2 (54,55) 54.5 18 (59,55) 57.0 34 (64,55) 59.5 50 (69,55) 62.0
3 (54,59) 56.5 19 (59,59) 59.0 35 (64,59) 61.5 51 (69,59) 64.0
4 (54,63) 58.5 20 (59,63) 61.0 36 (64,63) 63.5 52 (69,63) 66.0
5 (54,64) 59.0 21 (59,64) 61.5 37 (64,64) 64.0 53 (69,64) 66.5
6 (54,68) 61.0 22 (59,68) 63.5 38 (64,68) 66.0 54 (69,68) 68.5
7 (54,69) 61.5 23 (59,69) 64.0 39 (64,69) 66.5 55 (69,69) 69.0
8 (54,70) 62.0 24 (59,70) 64.5 40 (64,70) 67.0 56 (69,70) 69.5
9 (55,54) 54.5 25 (63,54) 58.5 41 (68,54) 61.0 57 (70,54) 62.0
10 (55,55) 55.0 26 (63,55) 59.0 42 (68,55) 61.5 58 (70,55) 62.5
11 (55,59) 57.0 27 (63,59) 61.0 43 (68,59) 63.5 59 (70,59) 64.5
12 (55,63) 59.0 28 (63,63) 63.0 44 (68,63) 65.5 60 (70,63) 66.5
13 (55,64) 59.5 29 (63,64) 63.5 45 (68,64) 66.0 61 (70,64) 67.0
14 (55,68) 61.5 30 (63,68) 65.5 46 (68,68) 68.0 62 (70,68) 69.0
15 (55,69) 62.0 31 (63,69) 66.0 47 (68,69) 68.5 63 (70,69) 69.5
16 (55,70) 62.5 32 (63,70) 66.5 48 (68,70) 69.0 64 (70,70) 70.0
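The 64-row sample space above can be regenerated by enumeration. The sketch below uses only the eight population values from the slides; the variable names are illustrative.

```python
from itertools import product
from statistics import mean

population = [54, 55, 59, 63, 64, 68, 69, 70]

# All ordered samples of size 2 drawn with replacement: 8 * 8 = 64.
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

print(len(samples))
print(mean(population))    # population mean
print(mean(sample_means))  # mean of the 64 sample means
```

The last two printed values agree, which illustrates that the average of all possible sample means equals the population mean.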

Distribution of the Sample Means
[Figure: Sampling Distribution Histogram. Vertical axis: Frequency (ticks at 10, 15, 20). Horizontal axis: sample mean, with class boundaries 53.75, 56.25, 58.75, 61.25, 63.75, 66.25, 68.75, 71.25.]

Sampling Distribution of $\bar{x}$
The sampling distribution of $\bar{x}$ is the probability distribution
of all possible values of the sample mean $\bar{x}$.
Expected Value of $\bar{x}$
$E(\bar{x}) = \mu$

where:
µ = the population mean

Sampling Distribution of the Mean…
A fair die is thrown infinitely many times,
with the random variable X = # of spots on any throw.
The probability distribution of X is:
x 1 2 3 4 5 6
P(x) 1/6 1/6 1/6 1/6 1/6 1/6
…and the mean and variance are calculated as well:
$\mu = \sum x P(x) = 3.5$
$\sigma^2 = \sum (x - \mu)^2 P(x) = 35/12 \approx 2.92$

Sampling Distribution of Two Dice
A sampling distribution is created by looking at
all samples of size n = 2 (i.e. two dice) and their means…

While there are 36 possible samples of size 2, there are only 11 values for $\bar{x}$, and some
(e.g. $\bar{x}$ = 3.5) occur more frequently than others (e.g. $\bar{x}$ = 1).

Sampling Distribution of Two Dice…
The sampling distribution of $\bar{x}$ is shown below:

$\bar{x}$ $P(\bar{x})$
1.0 1/36
1.5 2/36
2.0 3/36
2.5 4/36
3.0 5/36
3.5 6/36
4.0 5/36
4.5 4/36
5.0 3/36
5.5 2/36
6.0 1/36
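The table above can be reproduced by enumerating the 36 equally likely outcomes. This sketch uses exact fractions from the standard library; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Count how many of the 36 equally likely ordered pairs give each sample mean.
counts = Counter(Fraction(a + b, 2) for a, b in product(faces, repeat=2))
distribution = {xbar: Fraction(c, 36) for xbar, c in sorted(counts.items())}

for xbar in distribution:
    print(float(xbar), f"{counts[xbar]}/36")

# Expected value and variance of the sample mean, computed exactly.
expected = sum(xbar * p for xbar, p in distribution.items())
variance = sum((xbar - expected) ** 2 * p for xbar, p in distribution.items())
print(expected, variance)
```

The expected value matches the mean of a single die (7/2), and the variance (35/24) is half the single-die variance (35/12), as expected for samples of size 2.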

Sampling Distribution of Two Dice…
The sampling distribution of $\bar{x}$ is shown below:
[Figure: bar chart of $P(\bar{x})$ against $\bar{x}$ = 1.0, 1.5, …, 6.0; vertical axis from 1/36 to 6/36.]

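Compare…
Compare the distribution of X…
[Figure: distribution of X over 1 to 6, beside the sampling distribution of $\bar{x}$ over 1.0 to 6.0 in steps of 0.5.]
…with the sampling distribution of $\bar{x}$.

As well, note that:
$\mu_{\bar{x}} = \mu$ and $\sigma^2_{\bar{x}} = \sigma^2 / 2$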

“A sampling distribution is a
distribution of all of the possible
values of a statistic for a given size
sample selected from a population.”

Sampling Distribution of $\bar{x}$
Proper analysis and interpretation of a sample statistic requires
knowledge of its distribution.

[Figure: Process of Inferential Statistics. Population with mean $\mu$ (parameter) → select a random sample → sample with mean $\bar{x}$ (statistic) → calculate $\bar{x}$ to estimate $\mu$.]

Central Limit Theorem…
The sampling distribution of the mean of a random sample
drawn from any population (with finite variance) is approximately
normal for a sufficiently large sample size.

The larger the sample size, the more closely the sampling
distribution of $\bar{X}$ will resemble a normal distribution.

Central Limit Theorem…
If the population is normal, then $\bar{X}$ is normally distributed for
all values of n.

If the population is non-normal, then $\bar{X}$ is approximately
normal only for larger values of n.

In most practical situations, a sample size of 30 may be
sufficiently large to allow us to use the normal distribution as
an approximation for the sampling distribution of $\bar{X}$.
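A small simulation can illustrate the theorem. The sketch below is an assumption-laden example rather than anything from the slides: it picks an exponential population (strongly skewed, with mean 1 and standard deviation 1), a sample size of 30, 2,000 repeated samples, and a fixed seed.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)  # fixed seed only for reproducibility
n = 30                  # sample size
reps = 2000             # number of repeated samples

# Exponential population with rate 1: mean 1, standard deviation 1, right-skewed.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

# The sample means should center near the population mean (1.0),
# with a spread near sigma / sqrt(n).
print(mean(sample_means), stdev(sample_means), 1 / n ** 0.5)
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the population itself is far from normal.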

If the Population is not Normal
» We can apply the Central Limit Theorem:
⋄ Even if the population is not normal,
⋄ …sample means from the population will be approximately
normal as long as the sample size is large enough.

Properties of the sampling distribution:

$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

Central Limit Theorem
As the sample size gets large enough (n↑), the sampling
distribution becomes almost normal regardless of the shape of
the population.
[Figure: sampling distribution of $\bar{x}$]

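If the Population is not Normal (continued)
Sampling distribution properties:
» Central Tendency: $\mu_{\bar{x}} = \mu$
» Variation: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
[Figure: Population Distribution with mean $\mu$, above the Sampling Distribution centered at $\mu_{\bar{x}}$, which becomes normal as n increases; the curve is wider for a smaller sample size and narrower for a larger sample size.]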

How Large is Large Enough?


» For most distributions, n > 30 will give a sampling distribution
that is nearly normal
» For fairly symmetric distributions, n > 15
» For normal population distributions, the sampling distribution
of the mean is always normally distributed

Estimators and Their Properties
An estimator of a population parameter is a sample statistic used to estimate the
parameter. The most commonly used estimator of the:
Population Parameter Sample Statistic
Mean ($\mu$) is the Mean ($\bar{x}$)
Variance ($\sigma^2$) is the Variance ($s^2$)
Standard Deviation ($\sigma$) is the Standard Deviation ($s$)
Proportion ($p$) is the Proportion ($\bar{p}$)

• Desirable properties of estimators include:
⋄ Unbiasedness
⋄ Efficiency
⋄ Consistency
⋄ Sufficiency

Unbiasedness
» An estimator is said to be unbiased if its expected value is equal to the
population parameter it estimates.
» For example, $E(\bar{X}) = \mu$, so the sample mean is an unbiased estimator of the
population mean. Unbiasedness is an average or long-run property. The
mean of any single sample will probably not equal the population mean, but
the average of the means of repeated independent samples from a
population will equal the population mean.
» Any systematic deviation of the estimator from the population parameter of
interest is called a bias.

Unbiased and Biased Estimators
» An unbiased estimator is on target on average.
» A biased estimator is off target on average.
[Figure: the offset of the biased estimator from the target is labeled Bias.]

Efficiency
An estimator is efficient if it has a relatively small variance (and
standard deviation).

» An efficient estimator is, on average, closer to the
parameter being estimated.
» An inefficient estimator is, on average, farther from the
parameter being estimated.

Consistency and Sufficiency
An estimator is said to be consistent if its probability of being close
to the parameter it estimates increases as the sample size increases.

[Figure: Consistency, comparing n = 10 with n = 100.]

An estimator is said to be sufficient if it contains all the information in
the data about the parameter it estimates.

Properties of the Sample Mean


» For a normal population, both the sample mean and sample median are
unbiased estimators of the population mean, but the sample mean is both
more efficient (because it has a smaller variance), and sufficient. Every
observation in the sample is used in the calculation of the sample mean,
but only the middle value is used to find the sample median.

» In general, the sample mean is the best estimator of the population mean.
The sample mean is the most efficient unbiased estimator of the
population mean. It is also a consistent estimator.

Example
» Suppose a population has mean μ = 8 and
standard deviation σ = 3. Suppose a random
sample of size n = 36 is selected.

» What is the probability that the sample mean is
between 7.8 and 8.2?

Solution:
» Even if the population is not normally distributed, the central limit
theorem can be used (n > 30)
» … so the sampling distribution of $\bar{x}$ is approximately normal
» … with mean $\mu_{\bar{x}} = 8$
» … and standard deviation $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{3}{\sqrt{36}} = 0.5$

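Solution:
$P(7.8 < \bar{X} < 8.2) = P\left(\dfrac{7.8 - 8}{3/\sqrt{36}} < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \dfrac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 < Z < 0.4) = 0.3108$

[Figure: the population distribution (μ = 8) is sampled to give the sampling distribution of $\bar{X}$ (7.8 to 8.2 around $\mu_{\bar{x}}$ = 8), which is standardized to the standard normal distribution (−0.4 to 0.4 around $\mu_z$ = 0); the shaded area is .1554 + .1554.]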
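The probability in this solution can be checked with `statistics.NormalDist` from the Python standard library. The sketch assumes, as the solution does, that the sampling distribution of the sample mean can be treated as normal.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 3, 36
standard_error = sigma / sqrt(n)  # 3 / 6 = 0.5

# Sampling distribution of the sample mean, taken as normal via the central limit theorem.
xbar_dist = NormalDist(mu, standard_error)
prob = xbar_dist.cdf(8.2) - xbar_dist.cdf(7.8)
print(round(prob, 4))
```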

Example
The foreman of a bottling plant has observed that the
amount of soda in each “32-ounce” bottle is actually a
normally distributed random variable, with a mean of 32.2
ounces and a standard deviation of .3 ounce.

If a customer buys one bottle, what is the probability that the
bottle will contain more than 32 ounces?

Solution:
We want to find P(X > 32), where X is normally distributed and µ = 32.2
and σ = .3

$P(X > 32) = P\left(\dfrac{X - \mu}{\sigma} > \dfrac{32 - 32.2}{.3}\right) = P(Z > -.67) = 1 - .2514 = .7486$

“There is about a 75% chance that a single bottle of soda contains
more than 32 oz.”

Example (b)
The foreman of a bottling plant has observed that the amount of
soda in each “32-ounce” bottle is actually a normally distributed
random variable, with a mean of 32.2 ounces and a standard
deviation of .3 ounce.

If a customer buys a carton of four bottles, what is the probability
that the mean amount of the four bottles will be greater than 32
ounces?

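Solution:
We want to find $P(\bar{X} > 32)$, where X is normally distributed
with µ = 32.2 and σ = .3

Things we know:
1) X is normally distributed, therefore so is $\bar{X}$.
2) $\mu_{\bar{x}} = \mu = 32.2$ oz.
3) $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{.3}{\sqrt{4}} = .15$ oz.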

Solution:
If a customer buys a carton of four bottles, what is the
probability that the mean amount of the four bottles will be
greater than 32 ounces?

$P(\bar{X} > 32) = P\left(Z > \dfrac{32 - 32.2}{.3/\sqrt{4}}\right) = P(Z > -1.33) = 1 - .0918 = .9082$

“There is about a 91% chance the mean of the four bottles
will exceed 32 oz.”
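Both bottling-plant probabilities can be checked side by side with `statistics.NormalDist`. This is a sketch using the stated mean and standard deviation; because it does not round z to two decimals as a printed table lookup does, the results can differ from the slide values in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3

# One bottle: X ~ Normal(32.2, 0.3).
p_one = 1 - NormalDist(mu, sigma).cdf(32)

# Mean of four bottles: the standard error is sigma / sqrt(4) = 0.15.
n = 4
p_four = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(32)

print(round(p_one, 2), round(p_four, 2))
```

The mean of four bottles is more likely to exceed 32 ounces than a single bottle because its distribution is more tightly concentrated around 32.2.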

Graphically Speaking…
[Figure: two normal curves, both with mean = 32.2. One asks: what is the probability that one bottle will contain more than 32 ounces? The other asks: what is the probability that the mean of four bottles will exceed 32 oz?]

Tire Store Example


Suppose that the mean expenditure per customer at a
tire store is $85.00, with a standard deviation of
$9.00. If a random sample of 40 customers is taken,
what is the probability that the sample average
expenditure per customer for this sample will be
$87.00 or more?

Solution:
Because the sample size is greater than 30, the
central limit theorem can be used to state that the
sample mean is approximately normally distributed, and the
problem can proceed using the normal distribution
calculations.

Solution:
Population parameters: $\mu = 85$, $\sigma = 9$
Sample size: $n = 40$

$P(\bar{X} \ge 87) = P\left(\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \ge \dfrac{87 - 85}{9/\sqrt{40}}\right) = P(Z \ge 1.41) = 0.0793$
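The tire store calculation can be reproduced as follows. The sketch uses the stated parameters and treats the sample mean as normal, as the solution does; it keeps z unrounded, so the probability may differ slightly from the table-based value.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 85, 9, 40
standard_error = sigma / sqrt(n)   # about 1.42
z = (87 - mu) / standard_error     # about 1.41
p = 1 - NormalDist().cdf(z)        # upper-tail area of the standard normal

print(round(standard_error, 2), round(z, 2), round(p, 3))
```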

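Graphic Solution to Tire Store Example
$\sigma_{\bar{X}} = \dfrac{9}{\sqrt{40}} = 1.42$

$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \dfrac{87 - 85}{9/\sqrt{40}} = \dfrac{2}{1.42} = 1.41$

[Figure: sampling distribution of $\bar{X}$ (centered at 85, cutoff at 87) beside the standard normal distribution (centered at 0, cutoff at 1.41). Each half has area .5000; the area between the center and the cutoff is .4207, leaving equal upper-tail areas of .0793.]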

Student’s t Distribution
If the population standard deviation, σ, is unknown, replace σ
with the sample standard deviation, s. If the population is
normal, the resulting statistic:
$t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$
has a t distribution with (n - 1) degrees of freedom.
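Computing the t statistic from a sample can be sketched as below. The five observations and the population mean of 12 are made-up values for illustration only; they are not from the slides.

```python
from math import sqrt
from statistics import mean, stdev

sample = [10, 12, 14, 16, 18]  # made-up observations
mu = 12                        # assumed population mean (hypothetical)

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)              # sample standard deviation (n - 1 denominator)

t = (x_bar - mu) / (s / sqrt(n))
df = n - 1                     # degrees of freedom
print(t, df)
```

The resulting value would then be compared against a t distribution with `df` degrees of freedom, for example from a printed t table.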

Student’s t Distribution
» The t is a family of bell-shaped and symmetric distributions,
one for each number of degrees of freedom.
» The expected value of t is 0.
» The variance of t is greater than 1, but approaches 1 as the
number of degrees of freedom increases. The t is flatter and has
fatter tails than does the standard normal.
» The t distribution approaches a standard normal as the number of
degrees of freedom increases.
[Figure: standard normal curve compared with t, df = 20 and t, df = 10, all centered at µ = 0.]
