Chapter 4 Sampling Distributions

The sampling distribution of the sample mean is the probability distribution of the means of all possible random samples of a given size drawn from the population. As the sample size increases, the sampling distribution of the mean approximates the normal distribution according to the central limit theorem.

Amira Dridi Business Statistics II

Chapter 7  Chapter 11
2
In this chapter, you learn:
 To distinguish between different sampling
methods
 The concept of the sampling distribution
 To compute probabilities related to the sample
mean and the sample proportion
 The importance of the Central Limit Theorem

3
 A sample is a portion or part of the population of interest

[Figure: a sample drawn from a population whose mean μ is unknown; the sample mean is X̄ = 50.]

4
 For the safety of the consumer.
 Sampling – A means for gathering useful
information about a population
 Data are gathered from samples and
conclusions are drawn about the population as
a part of the inferential statistics process

5
 Sampling has advantages over a census
 Sampling can save money.
 Sampling can save time.
 For given resources, the sample can broaden the scope of
the study.
 Because the research process is sometimes destructive,
the sample can save product.
 If accessing the population is impossible, the sample is
the only option.

6
 Eliminate the possibility that a random sample is
not representative of the population.

 The person authorizing the study is uncomfortable with
sample information.
For example, when the performance of the products is critical
to the consumer,
100% of the products are tested (e.g., airplanes or heart
defibrillators).
7
 Every research study has a target population
that consists of the individuals, institutions,
or entities that are the object of
investigation.
 The sampling frame is a listing of items that
make up the population
 Frames are data sources such as population
lists, maps, directories, or other sources used
to represent the population
8
 Sampling is done from the frame, not from the
target population
 Inaccurate or biased results can occur if a
frame excludes certain portions of the
population.
 Using different frames to generate data can
lead to dissimilar conclusions

9
Samples
 Non-Probability Samples: Judgment, Convenience
 Probability Samples: Simple Random, Systematic, Stratified, Cluster

10
 Nonrandom sampling (nonprobability sampling): Not every
unit of the population has the same probability of
being included in the sample. Members of nonrandom
samples are not selected by chance.

 Random sampling (probability sampling): Every unit of the
population has the same probability of being included in the
sample. Random sampling implies that chance enters into
the process of selection.

11
 The statistical methods presented and
discussed in this course are based on the
assumption that the data come from random
samples.
 Nonrandom sampling methods are not
appropriate techniques for gathering data to
be analyzed by most of the statistical methods
presented in this course.

13
 See pp. 223-225
 Simple Random Sample – basis for other random
sampling techniques
 Each unit is numbered from 1 to N (N is the population size)
 A table of random numbers or a random number generator can
be used to select n (n<<N) items from the population

14
 See Black (2008), pp. 220-221
 Select a sample of six companies

15
 Select a sample of six companies
 Population frame?

16
 Number every member of the population

01 Alaska Airlines    11 DuPont              21 Lucent
02 Alcoa              12 Exxon Mobil         22 Mattel
03 Ashland            13 General Dynamics    23 Mead
04 Bank of America    14 General Electric    24 Microsoft
05 BellSouth          15 General Mills       25 Occidental Petroleum
06 Chevron            16 Halliburton         26 JCPenney
07 Citigroup          17 IBM                 27 Procter & Gamble
08 Clorox             18 Kellogg             28 Ryder
09 Delta Air Lines    19 KMart               29 Sears
10 Disney             20 Lowe’s              30 Time Warner

17
 Use a table of random numbers to select the
6 items

18
 Use a random number generator to select the 6 items (SPSS 20)
 File “ch 04 sampling dist example 1.sav”
 Click Transform > Random # Generators > Active Generator
Initialization > Fixed Value > Enter a numeric seed
▪ Click Data > Select Cases
▪ Under Select, choose Random Sample of Cases. Click on Sample
box.
▪ Choose Exactly n (10) of the first N (31), click Continue
▪ Click OK. Slashes will appear in the case numbers of the cases not
in the sample, and an indicator variable representing Filter will be
attached to end of dataset.
▪ Analyze the Data
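
The same kind of selection can be sketched outside SPSS. The following is a minimal Python illustration, assuming the frame is the 30 numbered companies above and that six of them are wanted; the seed value is an arbitrary, hypothetical choice that only serves to make the draw reproducible.

```python
import random

N = 30  # size of the numbered frame (companies 01 to 30)
n = 6   # number of companies to select

rng = random.Random(12345)  # hypothetical fixed seed for a reproducible draw

# Simple random sample without replacement: every number from 1 to N
# has the same chance of being selected.
selected = sorted(rng.sample(range(1, N + 1), n))
print(selected)
```

Changing the seed gives a different, equally valid simple random sample.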

19
20
 Samples are used to estimate population
characteristics (e.g., the sample mean is used to
estimate the population mean)
 It is unlikely that the sample statistic (e.g., the sample
mean) would be exactly equal to its corresponding
population parameter (e.g., the population mean)
 Sampling error: The difference between a sample
statistic and its corresponding population
parameter
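
As a rough illustration of sampling error, the sketch below draws one random sample of size 5 and compares its mean with the population mean. The population values are invented for this example and do not come from the course data files; the seed is arbitrary.

```python
import random
import statistics

# Hypothetical population values, invented purely for illustration.
population = [2.1, 3.4, 1.8, 4.0, 3.3, 2.9, 3.7, 2.5, 4.2, 3.1]
mu = statistics.mean(population)      # population parameter

rng = random.Random(1)                # arbitrary seed
sample = rng.sample(population, 5)    # simple random sample of size 5
x_bar = statistics.mean(sample)       # sample statistic

# Sampling error: sample statistic minus population parameter.
sampling_error = x_bar - mu
print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}, sampling error = {sampling_error:+.2f}")
```

Repeating the draw with other seeds gives different sample means, and therefore different sampling errors.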

21
 See p. 224
 File « ch 02 sampling dist example 2.sav»
 The population mean is equal to 3.13

 Select random samples of size 5 and compute
the sampling error
22
23
 Definition

Sampling distribution of the sample mean is a


probability distribution of all possible means
of a given sample size.

24
 Assume there is a population …
 Population size N=4 (individuals A, B, C, D)
 Random variable, X,
is age of individuals
 Values of X: 18, 20,
22, 24 (years)

25
Summary Measures for the Population Distribution:

$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$

$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$

[Figure: Uniform Distribution, P(x) plotted against x = 18 (A), 20 (B), 22 (C), 24 (D).]

26
Example: Developing a Sampling
Distribution

16 possible samples of size n = 2 (sampling with replacement):

1st Obs   2nd Observation
          18      20      22      24
18        18,18   18,20   18,22   18,24
20        20,18   20,20   20,22   20,24
22        22,18   22,20   22,22   22,24
24        24,18   24,20   24,22   24,24

16 Sample Means:

1st Obs   2nd Observation
          18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24

27
Example: Developing a Sampling
Distribution

16 Sample Means:

1st Obs   2nd Observation
          18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24

[Figure: Sample Means Distribution, $P(\bar{X})$ plotted against $\bar{X}$ = 18, 19, ..., 24 (no longer uniform).]
28
Example: Developing a Sampling
Distribution

$\mu_{\bar{X}} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21$

$\sigma_{\bar{X}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$

Note: Here we divide by 16 because there are 16
different samples of size 2.
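
The two summary values above can be checked by enumerating the 16 samples directly. This is a small Python sketch using only the standard library; the variable names are illustrative.

```python
from itertools import product
from statistics import mean, pstdev

ages = [18, 20, 22, 24]

# All ordered samples of size 2 drawn with replacement: 4 * 4 = 16 samples.
samples = list(product(ages, repeat=2))
sample_means = [mean(s) for s in samples]

mu_xbar = mean(sample_means)        # mean of the 16 sample means
sigma_xbar = pstdev(sample_means)   # population formula: divides by 16

print(len(samples), mu_xbar, round(sigma_xbar, 2))

# The population standard deviation divided by sqrt(n) gives the same value.
print(round(pstdev(ages) / 2 ** 0.5, 2))
```

Both printed standard deviations round to 1.58, which matches the standard error formula σ/√n with σ = 2.236 and n = 2.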

29
Population (N = 4): $\mu = 21$, $\sigma = 2.236$
Sample Means Distribution (n = 2): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$

[Figure: $P(X)$ for X = 18, 20, 22, 24 (individuals A, B, C, D) beside $P(\bar{X})$ for $\bar{X}$ = 18, 19, ..., 24.]

30
 Different samples of the same size from the same
population will yield different sample means
 A measure of the variability in the mean from sample to
sample is given by the Standard Error of the Mean:
(This assumes that sampling is with replacement or
sampling is without replacement from an infinite population)

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
 Note that the standard error of the mean decreases as the
sample size increases
31
Proper analysis and interpretation of a sample statistic
requires knowledge of its distribution.

[Figure: Process of Inferential Statistics. Start here: select a random sample from the population (parameter $\mu$); calculate the sample statistic $\bar{x}$; use $\bar{x}$ to estimate $\mu$.]
32
 The sample mean is one of the more common
statistics used in the inferential process.
 To compute and assign the probability of occurrence
of a particular value of a sample mean, the researcher
must know the distribution of the sample means.
 One way to examine the distribution is to take a
population with a particular distribution, randomly
select samples of a given size, compute the sample
means, and attempt to determine how the means are
distributed.
33
 Suppose a small finite population consists of N=8
numbers {54, 55, 59, 63, 64, 68, 69, 70}

 We take all possible samples of size n=2 from this
population with replacement
 For each sample, we compute the mean
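
The enumeration described above can be sketched in a few lines of Python. The tally printed at the end is only one way to summarize the means; it is not taken from the slides.

```python
from collections import Counter
from itertools import product
from statistics import mean

population = [54, 55, 59, 63, 64, 68, 69, 70]

# All ordered samples of size 2 with replacement: 8 * 8 = 64 samples.
samples = list(product(population, repeat=2))
means = [mean(s) for s in samples]

# Frequency of each distinct sample mean (the raw material for a histogram).
counts = Counter(means)
for value in sorted(counts):
    print(value, counts[value])

# The mean of all sample means equals the population mean.
print(mean(means), mean(population))
```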

34
 The distribution of these sample means can be
represented using a histogram

 The shape of the histogram for sample means is quite unlike
the shape of the histogram for the population.

35
 Data from a Poisson distribution of values with a population mean of
1.25.
 90 samples of size n = 30 are taken randomly from a Poisson
distribution with λ = 1.25 and the means are computed on each sample.
The resulting distribution of sample means is displayed

36
 Although the samples were drawn from a Poisson
distribution, which is skewed to the right, the
distribution of sample means approaches a symmetrical, nearly
normal-curve-type distribution.
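
A simulation along these lines can be sketched in Python. The standard library has no Poisson generator, so the helper `poisson_draw` below is a hypothetical name that uses Knuth's multiplication method; the seed is arbitrary, and the result will not reproduce the exact samples behind the slide.

```python
import math
import random
from statistics import mean

def poisson_draw(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(2024)             # arbitrary seed
lam, n, num_samples = 1.25, 30, 90    # values stated on the slide

# 90 samples of size 30, one mean per sample.
sample_means = [
    mean([poisson_draw(lam, rng) for _ in range(n)])
    for _ in range(num_samples)
]

# The sample means cluster around lam even though individual draws are skewed.
print(round(mean(sample_means), 3), round(min(sample_means), 3), round(max(sample_means), 3))
```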

37
 Notice that even for small sample sizes, the distributions
of sample means for samples taken from the uniformly
distributed population begin to “pile up” in the middle.

 As sample sizes become much larger, the sample mean
distributions begin to approach a normal distribution and
the variation among the means decreases.

38
 We examined three populations with different
distributions
 The mean of the sample means is exactly equal
to the population mean

$\mu_{\bar{X}} = \mu$

39
 The dispersion of the sampling distribution of sample
means is narrower than the population distribution

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

$\sigma_{\bar{X}}$ is called the standard error of the mean

 The sample means for samples taken from these
populations appear to be approximately normally
distributed (bell-shaped), especially as the sample sizes
become larger.
40
 The central limit theorem allows one to study
populations with differently shaped distributions
 The central limit theorem creates the potential for
applying the normal distribution to many problems
when sample size is sufficiently large (e.g., to create
confidence intervals for the population mean or to
perform tests of hypothesis)

41
 The advantage of the central limit theorem is that
sample data drawn from populations that are not
normally distributed, or from populations of unknown
shape, can also be analyzed, because the sample
means are approximately normally distributed for
sufficiently large sample sizes

42
 If samples of size n are drawn randomly from a
population that has a mean of μ and a standard
deviation of σ, the sample means are approximately
normally distributed for sufficiently large sample
sizes (n≥30) regardless of the shape of the
population distribution
 If the population is normally distributed, the sample
means are normally distributed for any sample size.

43
 The central limit theorem creates the potential for applying the
normal distribution to many problems when sample size is
sufficiently large.
 Sample means that have been computed for random samples
drawn from normally distributed populations are normally
distributed.
 However, the real advantage of the central limit theorem comes
when sample data drawn from populations not normally
distributed or from populations of unknown shape also can be
analyzed by using the normal distribution because the sample
means are normally distributed for sufficiently large sample
sizes.
44
 How large must a sample be for the central limit
theorem to apply?
 The sample size necessary varies according to the shape of
the population. However, in this text (as in many others), a
sample of size 30 or larger will suffice.
 Recall that if the population is normally distributed, the
sample means are normally distributed for sample sizes as
small as n = 2.
Z Formula for Sample Means

 The central limit theorem states that sample means
are normally distributed regardless of the shape of
the population for large samples and for any sample
size with normally distributed populations. Thus,
sample means can be analyzed by using z scores:

$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
47

 Suppose, for example, that the mean expenditure
per customer at a tire store is $85, with a standard
deviation of $9.
 What is the probability that the sample average
expenditure per customer for this sample will be $87 or
more if a random sample of 40 customers
is taken?

49
 Because the sample size is greater than 30, the central
limit theorem can be used, and the sample means are
normally distributed
 With μ= $85.00, σ = $9.00, and the z formula for
sample means, z is computed as shown next
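$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{9}{\sqrt{40}} = 1.42$

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{87 - 85}{9 / \sqrt{40}} = \frac{2}{1.42} = 1.41$

[Figure: sampling distribution of $\bar{X}$ centered at 85 ($\sigma_{\bar{X}} = 1.42$) beside the standard normal distribution ($\sigma = 1$); the area between the mean and 87 (Z = 1.41) is .4207 of the half-area .5000, leaving equal upper-tail areas of .0793.]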
51
Population Parameters: $\mu = 85$, $\sigma = 9$
Sample Size: $n = 40$

$P(\bar{X} \ge 87) = P\left(Z \ge \frac{87 - \mu_{\bar{X}}}{\sigma_{\bar{X}}}\right) = P\left(Z \ge \frac{87 - \mu}{\sigma / \sqrt{n}}\right)$
$= P\left(Z \ge \frac{87 - 85}{9 / \sqrt{40}}\right)$
$= P(Z \ge 1.41)$
$= .5 - P(0 \le Z \le 1.41)$
$= .5 - .4207$
$= .0793$
7.93% of the samples of size 40 will have an average expenditure per customer
of $87 or more
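
The tire store calculation can be reproduced with Python's standard library. This is a sketch, not part of the original slides; it uses the unrounded z value, so the probability comes out close to, but not exactly, the table-based .0793.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 85.0, 9.0, 40        # values from the example

standard_error = sigma / sqrt(n)    # sigma / sqrt(n)
z = (87 - mu) / standard_error      # z score for a sample mean of $87

# Upper-tail probability P(Z >= z) under the standard normal distribution.
p_upper = 1 - NormalDist().cdf(z)
print(round(standard_error, 2), round(z, 2), round(p_upper, 4))
```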
52
 Suppose that during any hour in a large department
store, the average number of shoppers is 448, with
a standard deviation of 21 shoppers.
 What is the probability that a random sample of 49
different shopping hours will yield a sample mean
between 441 and 446 shoppers?

53
24.15% of the samples of
size 49 will have a mean
between 441 and 446
shoppers.

54
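$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{21}{\sqrt{49}} = 3$

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{441 - 448}{21 / \sqrt{49}} = -2.33$

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{446 - 448}{21 / \sqrt{49}} = -0.67$

[Figure: sampling distribution of $\bar{X}$ centered at 448 ($\sigma_{\bar{X}} = 3$) beside the standard normal distribution ($\sigma = 1$); the area between 441 and 448 (Z = -2.33) is .4901, the area between 446 and 448 (Z = -.67) is .2486, and the difference is .2415.]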
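
The department store probability can be checked the same way. This sketch uses unrounded z values, so the result differs slightly from the table-based .2415.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 448.0, 21.0, 49      # values from the example

standard_error = sigma / sqrt(n)    # 21 / 7 = 3
z_low = (441 - mu) / standard_error
z_high = (446 - mu) / standard_error

# P(441 <= sample mean <= 446) as a difference of two cumulative probabilities.
std_normal = NormalDist()
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(round(z_low, 2), round(z_high, 2), round(prob, 4))
```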
55
 The central limit theorem is based on the
assumption that the population is infinite or
extremely large
 In cases of a finite population, a statistical
adjustment can be made to the z formula for
sample means. The adjustment is called the finite
correction factor. It operates on the standard
deviation of the sample mean:

$Z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}}$

where $\sqrt{\frac{N - n}{N - 1}}$ is the finite correction factor.

56
 As the size of the finite population becomes larger
in relation to sample size, the finite correction
factor approaches 1
 In theory, whenever researchers are working with a
finite population, they can use the finite correction
factor
 If the sample size is less than 5% of the finite
population size (n/N<0.05), the finite correction
factor does not significantly modify the solution
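
The behavior of the finite correction factor can be illustrated with a short Python sketch. The function names and the population sizes tried below are hypothetical choices for illustration; the sample size of 40 and the values 87, 85, and 9 are borrowed from the tire store example.

```python
from math import sqrt

def finite_correction_factor(N, n):
    """sqrt((N - n) / (N - 1)) for population size N and sample size n."""
    return sqrt((N - n) / (N - 1))

def z_finite(x_bar, mu, sigma, n, N):
    """z score for a sample mean with the finite correction applied."""
    return (x_bar - mu) / ((sigma / sqrt(n)) * finite_correction_factor(N, n))

# The factor approaches 1 as N grows relative to n = 40.
for N in (100, 1_000, 100_000):
    print(N, round(finite_correction_factor(N, 40), 4))

# Hypothetical finite population of N = 500 customers.
print(round(z_finite(87, 85, 9, 40, 500), 2))
```

When n/N is below 0.05 the factor is close to 1, which is why the correction is usually ignored in that case.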
 The baggage limit for an airplane is set at 100
pounds per passenger. The weight of the baggage
of an individual passenger is a random variable
with a mean of 95 pounds and a standard deviation
of 35 pounds.
If we select a random sample of 50 passengers
on a particular flight and compute the mean weight of their
baggage, what is the probability that the sample
mean will exceed the 100-pound limit? Interpret.
 The manufacturer of cans of salmon that are supposed
to have a net weight of 6 ounces tells you that the net
weight is actually a normal random variable with a
mean of 6.05 ounces and a standard deviation of 0.18
ounces. Suppose that you draw a random sample of 36
cans.
 Find the probability that the mean weight of the sample
is less than 5.97 ounces.
 Suppose your random sample of 36 cans of salmon
produced a mean weight that is less than 5.97 ounces.
Comment on the statement made by the manufacturer.

60
p = the proportion of the population having
a characteristic of interest
 0≤p≤1
 p is usually unknown
 N is usually unknown
 Examples: proportion of students having GPA ≥2.0

$p = \frac{X}{N} = \frac{\text{number of items in the population having the characteristic of interest}}{\text{population size}}$

61
p = the proportion of the population having
a characteristic of interest
 Sample proportion (p̂) provides an estimate
of p:

$\hat{p} = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$

 X is the number of successes. X ~ Bin(n, p)
 0 ≤ p̂ ≤ 1

62
Sampling Distribution of p̂

 The central limit theorem applies to sample
proportions in that the normal distribution
approximates the shape of the distribution
(binomial) of sample proportions p̂ if
 np ≥ 5,
 nq ≥ 5 (q=1-p).

63
Sampling Distribution of p̂
 Approximated by a normal distribution if:
 $np \ge 5$ and $n(1 - p) \ge 5$
where $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
 p is the population proportion

[Figure: Sampling Distribution, $P(\hat{p})$ plotted against $\hat{p}$ from 0 to 1.]

64
Standardize p̂ to a Z value with the formula:

$Z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}}$

65
 If the true proportion of voters who support
Proposition A is p = 0.4, what is the
probability that a sample of size 200 yields a
sample proportion between 0.40 and 0.45?

 i.e.: if p = 0.4 and n = 200, what is
P(0.40 ≤ p̂ ≤ 0.45) ?

66
(continued)

 if p = 0.4 and n = 200, what is
P(0.40 ≤ p̂ ≤ 0.45) ?

Find $\sigma_{\hat{p}}$:
$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{200}} = 0.03464$

Convert to standardized normal:
$P(0.40 \le \hat{p} \le 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} \le Z \le \frac{0.45 - 0.40}{0.03464}\right) = P(0 \le Z \le 1.44)$

67
(continued)

 if p = 0.4 and n = 200, what is
P(0.40 ≤ p̂ ≤ 0.45) ?
Utilize the cumulative normal table:
P(0 ≤ Z ≤ 1.44) = 0.9251 – 0.5000 = 0.4251

[Figure: the sampling distribution of p̂, with area 0.4251 between 0.40 and 0.45, is standardized to the standard normal distribution, with area 0.4251 between Z = 0 and Z = 1.44.]
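
The Proposition A example can be verified numerically. The sketch below also checks the np ≥ 5 and n(1 − p) ≥ 5 condition before applying the normal approximation; because it keeps the unrounded z value, the result differs slightly from the table-based 0.4251.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.40, 200                          # values from the example

# Condition for the normal approximation to the sample proportion.
assert n * p >= 5 and n * (1 - p) >= 5

sigma_p_hat = sqrt(p * (1 - p) / n)       # standard error of the proportion
z_low = (0.40 - p) / sigma_p_hat
z_high = (0.45 - p) / sigma_p_hat

std_normal = NormalDist()
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(round(sigma_p_hat, 5), round(z_high, 2), round(prob, 4))
```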
68
 If 10% of a population of parts is defective,
what is the probability of randomly
selecting 80 parts and finding that 12 or
more parts are defective?

69
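Population Parameters: $P = 0.10$, $Q = 1 - P = 1 - .10 = .90$
Sample: $n = 80$, $X = 12$, $\hat{p} = \frac{X}{n} = \frac{12}{80} = 0.15$

$P(\hat{p} \ge .15) = P\left(Z \ge \frac{.15 - \mu_{\hat{p}}}{\sigma_{\hat{p}}}\right) = P\left(Z \ge \frac{.15 - P}{\sqrt{\frac{PQ}{n}}}\right)$
$= P\left(Z \ge \frac{.15 - .10}{\sqrt{\frac{(.10)(.90)}{80}}}\right)$
$= P\left(Z \ge \frac{0.05}{0.0335}\right)$
$= P(Z \ge 1.49)$
$= .5 - P(0 \le Z \le 1.49)$
$= .5 - .4319$
$= .0681$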

70
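[Figure: sampling distribution of p̂ centered at 0.10 ($\sigma_{\hat{p}} = 0.0335$) beside the standard normal distribution ($\sigma = 1$); the area between the mean and p̂ = 0.15 (Z = 1.49) is .4319 of the half-area .5000.]

$Z = \frac{\hat{p} - P}{\sqrt{\frac{PQ}{n}}} = \frac{0.15 - 0.10}{\sqrt{\frac{(.10)(.90)}{80}}} = \frac{0.05}{0.0335} = 1.49$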
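
The defective parts example follows the same pattern and can be sketched as below. The code uses the normal approximation from the slides with an unrounded z value, so the probability is close to, but not exactly, .0681.

```python
from math import sqrt
from statistics import NormalDist

P, n, x = 0.10, 80, 12              # values from the example
Q = 1 - P

# Condition for the normal approximation: nP and nQ are both at least 5.
assert n * P >= 5 and n * Q >= 5

p_hat = x / n                       # sample proportion, 12 / 80
sigma_p_hat = sqrt(P * Q / n)       # standard error of the proportion
z = (p_hat - P) / sigma_p_hat

prob = 1 - NormalDist().cdf(z)      # P(sample proportion >= 0.15)
print(round(p_hat, 2), round(sigma_p_hat, 4), round(z, 2), round(prob, 4))
```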
71
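[Figure: sampling distribution of p̂ centered at 0.10 with $\sigma_{\hat{p}} = 0.0335$; the area between 0.10 and 0.15 is .4319 of the upper half-area .5000.]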

72
 In the last election, a state representative received
52% of the votes cast. One year after the election,
the representative organized a survey that asked a
random sample of 300 people whether they would
vote for him in the next election. If we assume that
his popularity has not changed,
 what is the probability that more than half of the
sample would vote for him?
 What is the probability that less than 20% of the
sample would vote for him?
73
 Discussed probability and nonprobability samples
 Introduced sampling distributions
 Described the sampling distribution of the mean
 For normal populations
 Using the Central Limit Theorem
 Described the sampling distribution of a proportion
 Calculated probabilities using sampling distributions

74
