4.4 Sampling Distribution Models and The Central Limit Theorem


Transition from Data Analysis and Probability to Statistics
Sampling Distributions
• Population parameter: a numerical descriptive measure of a population (for example, p, a population proportion). The numerical value of a population parameter is usually not known.
  – Examples: the mean height of all NCSU students; p = the proportion of Raleigh residents who favor stricter gun control laws.
• Sample statistic: a numerical descriptive measure calculated from sample data (e.g., x̄, s, p̂, the sample proportion).
Parameters and Statistics
• In real life, population parameters are unknown and unknowable.
  – For example, the mean height of US adult (18+) men is unknown and unknowable.
• Rather than investigating the whole population, we take a sample, calculate a statistic related to the parameter of interest, and use it to make an inference about the parameter.
• The sampling distribution of the statistic is the tool that tells us how close the value of the statistic is likely to be to the unknown value of the parameter.
DEF: Sampling Distribution
• The sampling distribution of a sample statistic calculated from a sample of n measurements is the probability distribution of the values taken by the statistic in all possible samples of size n taken from the same population.
• It is based on all possible samples of size n.
• In some cases the sampling distribution can be determined exactly.
• In other cases it must be approximated by using a computer to draw some of the possible samples of size n and drawing a histogram of the resulting values of the statistic.
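The simulation approach in the last bullet can be sketched in Python. This is a minimal sketch, not a prescribed method: it assumes samples are drawn with replacement from a finite list of values, and the helpers `simulate_sampling_distribution` and `text_histogram` are hypothetical names, with a text histogram standing in for a plotted one.

```python
import random
from collections import Counter


def simulate_sampling_distribution(population, n, num_samples, statistic, seed=0):
    """Approximate the sampling distribution of `statistic` by drawing
    `num_samples` random samples of size n (with replacement) from `population`."""
    rng = random.Random(seed)  # fixed seed so a run is repeatable
    values = []
    for _ in range(num_samples):
        sample = rng.choices(population, k=n)  # one random sample of size n
        values.append(statistic(sample))
    return values


def text_histogram(values, width=50):
    """Print a crude text histogram: one row per distinct value of the statistic."""
    counts = Counter(values)
    biggest = max(counts.values())
    for value in sorted(counts):
        bar = "#" * round(width * counts[value] / biggest)
        print(f"{value:6.2f} {counts[value]:6d} {bar}")


if __name__ == "__main__":
    # Hypothetical illustration: sample means of n = 2 throws of a fair die.
    die = [1, 2, 3, 4, 5, 6]
    means = simulate_sampling_distribution(die, n=2, num_samples=10_000,
                                           statistic=lambda s: sum(s) / len(s))
    text_histogram(means)
```

With enough simulated samples, the histogram settles down to the shape of the exact sampling distribution; the fixed seed only makes a run repeatable.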
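Number of possible samples (ordered samples drawn with replacement):
• Population size 5, n = 2: $5^2 = 25$ possible samples
• Population size 6, n = 8: $6^8 = 1{,}679{,}616$ possible samples
• Population size 500,000, n = 10: $500{,}000^{10}$ possible samples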
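The counts above follow from the multiplication rule: with N population units and n draws, each draw has N possible outcomes. A short sketch, assuming ordered samples drawn with replacement (`count_samples` is a hypothetical helper), checks the small case by brute force and shows how fast the count grows:

```python
from itertools import product


def count_samples(pop_size, n):
    """Number of ordered samples of size n drawn with replacement: pop_size ** n."""
    return pop_size ** n


# Brute-force check of the smallest case: list every ordered pair from 5 units.
pairs = list(product(range(5), repeat=2))
print(len(pairs), count_samples(5, 2))        # 25 25
print(count_samples(6, 8))                    # 1679616
print(f"{count_samples(500_000, 10):.3e}")    # 9.766e+56
```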
Sampling distribution of p̂, the sample proportion: an example
• If a coin is fair, the probability of a head on any toss of the coin is p = 0.5.
• Imagine tossing this fair coin 5 times and calculating the proportion p̂ of the 5 tosses that result in heads (note that p̂ = x/5, where x is the number of heads in 5 tosses).
• Objective: determine the sampling distribution of p̂, the proportion of heads in 5 tosses of a fair coin.
Sampling distribution of p̂ (cont.)
Step 1: The possible values of p̂ are 0/5 = 0, 1/5 = .2, 2/5 = .4, 3/5 = .6, 4/5 = .8, 5/5 = 1.
Binomial probabilities p(x) for n = 5, p = 0.5:

x     p(x)
0     0.03125
1     0.15625
2     0.3125
3     0.3125
4     0.15625
5     0.03125

Because p̂ = x/5, each value of p̂ has the same probability as the corresponding value of x:

p̂      0        .2       .4      .6      .8       1
P(p̂)   .03125   .15625   .3125   .3125   .15625   .03125

The table above is the probability distribution of p̂, the proportion of heads in 5 tosses of a fair coin.
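The same table can be produced exactly from the binomial probability function. A minimal sketch using Python's `fractions` and `math.comb` (the helper name `phat_distribution` is illustrative, not from the slides):

```python
from fractions import Fraction
from math import comb


def phat_distribution(n, p):
    """Exact sampling distribution of p-hat = x/n, where x ~ Binomial(n, p)."""
    p = Fraction(p)
    return {Fraction(x, n): comb(n, x) * p**x * (1 - p)**(n - x)
            for x in range(n + 1)}


dist = phat_distribution(5, Fraction(1, 2))
for value, prob in dist.items():
    print(f"p-hat = {float(value):.1f}   P = {float(prob):.5f}")
```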
Sampling distribution of p̂ (cont.)
p̂      0        .2       .4      .6      .8       1
P(p̂)   .03125   .15625   .3125   .3125   .15625   .03125

• $E(\hat p) = 0(.03125) + 0.2(.15625) + 0.4(.3125) + 0.6(.3125) + 0.8(.15625) + 1(.03125) = 0.5 = p$ (the probability of heads)
• $\mathrm{Var}(\hat p) = (0-.5)^2(.03125) + (.2-.5)^2(.15625) + (.4-.5)^2(.3125) + (.6-.5)^2(.3125) + (.8-.5)^2(.15625) + (1-.5)^2(.03125) = .05$
• So $SD(\hat p) = \sqrt{.05} = .2236$
• NOTE THAT $SD(\hat p) = \sqrt{pq/n} = \sqrt{(.5)(.5)/5} = .5/\sqrt{5} = .2236$
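The arithmetic for E(p̂), Var(p̂), and SD(p̂) can be checked with exact fractions. This sketch assumes the binomial counts 1, 5, 10, 10, 5, 1 out of 32, which give the decimal probabilities in the table:

```python
from fractions import Fraction
from math import sqrt

# Sampling distribution of p-hat for 5 tosses of a fair coin.
values = [Fraction(k, 5) for k in range(6)]
probs = [Fraction(c, 32) for c in (1, 5, 10, 10, 5, 1)]

mean = sum(v * w for v, w in zip(values, probs))
var = sum((v - mean) ** 2 * w for v, w in zip(values, probs))

p, n = Fraction(1, 2), 5
print(mean, var)                          # 1/2 1/20
print(round(sqrt(var), 4))                # 0.2236
print(round(sqrt(p * (1 - p) / n), 4))    # 0.2236 (shortcut formula sqrt(pq/n))
```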
Expected Value and Standard Deviation of the Sampling Distribution of p̂
• $E(\hat p) = p$
• $SD(\hat p) = \sqrt{\dfrac{pq}{n}}$
where p is the "success" probability in the sampled population, q = 1 − p, and n is the sample size.
Shape of Sampling Distribution of p̂
• The sampling distribution of p̂ is approximately normal when the sample size n is large enough. "n large enough" means np ≥ 10 and nq ≥ 10.
Example
• 8% of the American Caucasian male population is color-blind.
• Use a computer to simulate random samples of size n = 1000.
[Figure: Histogram of p̂ values from simulated samples (2,000 independent samples, each of size n = 1,000 men); x-axis: p̂; y-axis: # of samples]
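A simulation like the one behind this histogram can be sketched as follows. It assumes each man in a sample is independently color-blind with probability .08 and uses an arbitrary seed; the function name `simulate_phats` is hypothetical, and the exact numbers will differ from the slide's simulation.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_phats(p, n, num_samples, seed=1):
    """Simulate num_samples sample proportions, each from n independent
    Bernoulli(p) trials (True = color-blind, False = not)."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n
            for _ in range(num_samples)]


phats = simulate_phats(p=0.08, n=1000, num_samples=2000)
print(round(mean(phats), 4), round(stdev(phats), 4))   # simulated center and spread
print(round(sqrt(0.08 * 0.92 / 1000), 4))              # 0.0086, theoretical SD(p-hat)
```

The mean of the simulated p̂ values should land near .08 and their standard deviation near $\sqrt{pq/n}$, in line with the formulas for E(p̂) and SD(p̂).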
The sampling distribution model for a sample proportion p̂
Provided that the sampled values are independent and the sample size n is large enough, the sampling distribution of p̂ is modeled by a normal distribution with $E(\hat p) = p$ and standard deviation $SD(\hat p) = \sqrt{pq/n}$, that is,
$$\hat p \sim N\left(p, \sqrt{\frac{pq}{n}}\right)$$
where q = 1 − p and "n large enough" means np ≥ 10 and nq ≥ 10.
The Central Limit Theorem will be a formal statement of this fact.
Example: binge drinking by college students
• Study by the Harvard School of Public Health: 44% of college students binge drink.
• 244 college students were surveyed; 36% admitted to binge drinking in the past week.
• Assuming the 44% given in the study, compute the probability that in a sample of 244 students, 36% or fewer have engaged in binge drinking.
Example: binge drinking by college students (cont.)
• Let p̂ be the proportion in a sample of 244 that engage in binge drinking.
• We want to compute $P(\hat p \le .36)$.
• $E(\hat p) = p = .44$; $SD(\hat p) = \sqrt{pq/n} = \sqrt{(.44)(.56)/244} \approx .032$
• Since np = 244(.44) = 107.36 and nq = 244(.56) = 136.64 are both greater than 10, we can model the sampling distribution of p̂ with a normal distribution, so …
Example: binge drinking by college students (cont.)
$$\hat p \sim N(.44, .032)$$
So
$$P(\hat p \le .36) = P\left(\frac{\hat p - .44}{.032} \le \frac{.36 - .44}{.032}\right) = P(z \le -2.5) = .0062$$
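The normal-approximation calculation can be reproduced with the standard library, using the identity Φ(z) = ½·erfc(−z/√2). This is a sketch; `normal_cdf` is a hypothetical helper name rather than a library function.

```python
from math import erfc, sqrt


def normal_cdf(z):
    """Standard normal CDF Phi(z), computed via the complementary error function."""
    return 0.5 * erfc(-z / sqrt(2))


p, n, cutoff = 0.44, 244, 0.36
q = 1 - p
assert n * p >= 10 and n * q >= 10    # large-enough condition from the slides

sd = sqrt(p * q / n)                   # unrounded SD(p-hat), about 0.0318
z_slide = (cutoff - p) / 0.032         # -2.5, using the SD rounded as on the slide
z_exact = (cutoff - p) / sd            # about -2.52, using the unrounded SD
print(round(normal_cdf(z_slide), 4))   # 0.0062
print(round(normal_cdf(z_exact), 4))
```

With the rounded SD of .032 the result matches the slide's .0062; carrying the unrounded SD gives a probability of about .0059, so the rounding shifts the answer only slightly.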
Another Population Parameter of Frequent Interest: the Population Mean µ
• To estimate the unknown value of µ, the sample mean x̄ is often used.
• We need to examine the sampling distribution of the sample mean x̄ (the probability distribution of all possible values of x̄ based on a sample of size n).
Example
• Professor Stickler has a large statistics class of over 300 students. He asked them the ages of their cars and obtained the following probability distribution:

x      2     3     4     5     6     7     8
p(x)   1/14  1/14  2/14  2/14  2/14  3/14  3/14

• An SRS of size n = 2 is to be drawn from the population.
• Find the sampling distribution of the sample mean x̄ for samples of size n = 2.
Solution
• There are a total of $7^2 = 49$ possible samples of size 2.
• All 49 possible samples with the corresponding sample means are on p. 35.
Solution (cont.)
• Probability distribution of x̄:

x̄      2      2.5    3      3.5    4       4.5     5       5.5     6       6.5     7       7.5     8
p(x̄)   1/196  2/196  5/196  8/196  12/196  18/196  24/196  26/196  28/196  24/196  21/196  18/196  9/196

• This is the sampling distribution of x̄ because it specifies the probability associated with each possible value of x̄.
• From the sampling distribution above,
$$P(4 \le \bar x \le 6) = p(4) + p(4.5) + p(5) + p(5.5) + p(6) = \frac{12}{196} + \frac{18}{196} + \frac{24}{196} + \frac{26}{196} + \frac{28}{196} = \frac{108}{196}$$
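The table can be generated by brute force: list all 7² = 49 ordered pairs, multiply the probabilities of the two draws (assuming the draws are independent), and add up the probability for each sample mean. A minimal sketch with exact fractions; `xbar_distribution` is a hypothetical helper name:

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Population distribution of car ages from Professor Stickler's class.
pop = {2: Fraction(1, 14), 3: Fraction(1, 14), 4: Fraction(2, 14),
       5: Fraction(2, 14), 6: Fraction(2, 14), 7: Fraction(3, 14),
       8: Fraction(3, 14)}


def xbar_distribution(pop, n):
    """Exact sampling distribution of the sample mean of n independent draws."""
    dist = defaultdict(Fraction)
    for sample in product(pop, repeat=n):   # all len(pop) ** n ordered samples
        prob = Fraction(1)
        for value in sample:
            prob *= pop[value]              # independent draws: multiply
        dist[Fraction(sum(sample), n)] += prob
    return dict(sorted(dist.items()))


dist2 = xbar_distribution(pop, 2)
for xbar, prob in dist2.items():
    print(f"{float(xbar):.1f}: {prob * 196}/196")
p_4_to_6 = sum(prob for xbar, prob in dist2.items() if 4 <= xbar <= 6)
print(f"P(4 <= xbar <= 6) = {p_4_to_6 * 196}/196")   # 108/196
```

The enumeration gives 9/196 for x̄ = 8, since the only sample with that mean is (8, 8), with probability (3/14)² = 9/196; the 13 probabilities then sum to 196/196.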
Expected Value and Standard Deviation of the Sampling Distribution of x̄
Example (cont.)
• Population probability distribution:

x      2     3     4     5     6     7     8
p(x)   1/14  1/14  2/14  2/14  2/14  3/14  3/14

$$\mu = E(X) = 2(1/14) + 3(1/14) + 4(2/14) + \cdots + 8(3/14) = 5.714$$

• Sampling distribution of x̄:

x̄      2      2.5    3      3.5    4       4.5     5       5.5     6       6.5     7       7.5     8
p(x̄)   1/196  2/196  5/196  8/196  12/196  18/196  24/196  26/196  28/196  24/196  21/196  18/196  9/196

$$E(\bar x) = 2(1/196) + 2.5(2/196) + 3(5/196) + 3.5(8/196) + 4(12/196) + 4.5(18/196) + 5(24/196) + 5.5(26/196) + 6(28/196) + 6.5(24/196) + 7(21/196) + 7.5(18/196) + 8(9/196) = 5.714$$

Mean of the sampling distribution of x̄: E(x̄) = 5.714, the same as the population mean μ = 5.714.
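Example (cont.)
Population:
$$\mu = E(X) = 2\left(\tfrac{1}{14}\right) + 3\left(\tfrac{1}{14}\right) + 4\left(\tfrac{2}{14}\right) + \cdots + 8\left(\tfrac{3}{14}\right) = 5.714, \qquad \mathrm{Var}(X) = \sigma^2 = 3.4898$$
Sampling distribution of x̄:
$$E(\bar x) = 2\left(\tfrac{1}{196}\right) + 2.5\left(\tfrac{2}{196}\right) + \cdots + 8\left(\tfrac{9}{196}\right) = 5.714$$
$$\mathrm{Var}(\bar x) = 1.7449 = \frac{3.4898}{2} = \frac{\mathrm{Var}(X)}{2}$$
$$SD(\bar x) = \frac{SD(X)}{\sqrt{2}} = \frac{\sigma}{\sqrt{2}}$$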
IMPORTANT
Note that
$$E(\bar x) = E(X) = 5.714$$
and
$$\mathrm{Var}(\bar x) = \frac{\mathrm{Var}(X)}{n} = \frac{3.4898}{2} = 1.7449 \quad \text{(recall that } n = 2\text{)}$$
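Both facts can be verified exactly for the car-age population. This sketch reuses the enumeration idea (the helpers `mean_var` and `xbar_distribution` are illustrative names) and assumes independent draws:

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Car-age population distribution from the example.
pop = {2: Fraction(1, 14), 3: Fraction(1, 14), 4: Fraction(2, 14),
       5: Fraction(2, 14), 6: Fraction(2, 14), 7: Fraction(3, 14),
       8: Fraction(3, 14)}


def mean_var(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(v * p for v, p in dist.items())
    return mu, sum((v - mu) ** 2 * p for v, p in dist.items())


def xbar_distribution(pop, n):
    """Exact distribution of the mean of n independent draws from pop."""
    dist = defaultdict(Fraction)
    for sample in product(pop, repeat=n):
        prob = Fraction(1)
        for value in sample:
            prob *= pop[value]
        dist[Fraction(sum(sample), n)] += prob
    return dist


mu, sigma2 = mean_var(pop)
mu_xbar, var_xbar = mean_var(xbar_distribution(pop, 2))
print(round(float(mu), 3), round(float(sigma2), 4))          # 5.714 3.4898
print(round(float(mu_xbar), 3), round(float(var_xbar), 4))   # 5.714 1.7449
print(mu_xbar == mu, var_xbar == sigma2 / 2)                  # True True
```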
Sampling Distribution of the Sample Mean x̄: Example
• An example
  – A die is thrown infinitely many times. Let X represent the number of spots showing on any throw.
  – The probability distribution of X is

x      1    2    3    4    5    6
p(x)   1/6  1/6  1/6  1/6  1/6  1/6

$$E(X) = 1(1/6) + 2(1/6) + 3(1/6) + \cdots = 3.5$$
$$V(X) = (1-3.5)^2(1/6) + (2-3.5)^2(1/6) + \cdots = 2.92$$
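The population mean and variance of a single throw can be checked with exact fractions (a small sketch; the rounded 2.92 on the slide is 35/12):

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)                              # fair die: each face has probability 1/6
mu = sum(x * p for x in faces)                  # 7/2
sigma2 = sum((x - mu) ** 2 * p for x in faces)  # 35/12
print(float(mu), round(float(sigma2), 2))       # 3.5 2.92
```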
• Suppose we want to estimate μ from the mean x̄ of a sample of size n = 2.
• What is the sampling distribution of x̄ in this situation?

No.  Sample  Mean    No.  Sample  Mean    No.  Sample  Mean
1    1,1     1.0     13   3,1     2.0     25   5,1     3.0
2    1,2     1.5     14   3,2     2.5     26   5,2     3.5
3    1,3     2.0     15   3,3     3.0     27   5,3     4.0
4    1,4     2.5     16   3,4     3.5     28   5,4     4.5
5    1,5     3.0     17   3,5     4.0     29   5,5     5.0
6    1,6     3.5     18   3,6     4.5     30   5,6     5.5
7    2,1     1.5     19   4,1     2.5     31   6,1     3.5
8    2,2     2.0     20   4,2     3.0     32   6,2     4.0
9    2,3     2.5     21   4,3     3.5     33   6,3     4.5
10   2,4     3.0     22   4,4     4.0     34   6,4     5.0
11   2,5     3.5     23   4,5     4.5     35   6,5     5.5
12   2,6     4.0     24   4,6     5.0     36   6,6     6.0
Note: $E(\bar x) = E(X)$ and $\mathrm{Var}(\bar x) = \mathrm{Var}(X)/n$ (here n = 2).

$$E(\bar x) = 1.0(1/36) + 1.5(2/36) + \cdots = 3.5$$
$$V(\bar x) = (1.0-3.5)^2(1/36) + (1.5-3.5)^2(2/36) + \cdots = 1.46$$
[Figure: probability histogram of x̄ for n = 2; x-axis: x̄ from 1.0 to 6.0 in steps of 0.5; y-axis: probability from 1/36 to 6/36]
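The n = 2 results can be checked by listing all 36 equally likely ordered samples, as in the table of samples above. A minimal sketch with exact fractions:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered samples of two die throws.
samples = list(product(range(1, 7), repeat=2))
counts = Counter(Fraction(a + b, 2) for a, b in samples)   # frequency of each sample mean

mean = sum(x * Fraction(c, 36) for x, c in counts.items())
var = sum((x - mean) ** 2 * Fraction(c, 36) for x, c in counts.items())
for x in sorted(counts):
    print(f"{float(x):.1f}: {counts[x]}/36")
print(float(mean), round(float(var), 2))   # 3.5 1.46
```

The exact variance is 35/24 ≈ 1.458, which is the population variance 35/12 divided by n = 2.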
[Figure: probability histograms of x̄ for n = 5, 10, and 25, each on an axis running from 1 to 6]
• n = 5: $E(\bar x) = 3.5$, $\mathrm{Var}(\bar x) = \sigma^2/5 = .5833$
• n = 10: $E(\bar x) = 3.5$, $\mathrm{Var}(\bar x) = \sigma^2/10 = .2917$
• n = 25: $E(\bar x) = 3.5$, $\mathrm{Var}(\bar x) = \sigma^2/25 = .1167$
Notice that Var(x̄) is smaller than Var(X). The larger the sample size, the smaller Var(x̄) is. Therefore, x̄ tends to fall closer to μ as the sample size increases.
The variance of the sample mean is smaller than the variance of the population.


Population: 1, 2, 3
Let us take samples of two observations. The possible sample means are 1.5, 2, and 2.5.
[Figure: repeated samples of two observations, with sample means of 1.5, 2, and 2.5 plotted beside the population values]
Compare the variability of the population to the variability of the sample mean.

Also,
Expected value of the population = (1 + 2 + 3)/3 = 2
Expected value of the sample mean = (1.5 + 2 + 2.5)/3 = 2
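The comparison can be made concrete with exact fractions. This sketch assumes the three samples are the pairs of distinct observations {1, 2}, {1, 3}, and {2, 3}, which give the sample means 1.5, 2, and 2.5, and it measures variability with the average squared deviation from the mean:

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3]
# Assumption: the samples are the three pairs of distinct observations.
sample_means = [Fraction(a + b, 2) for a, b in combinations(population, 2)]


def mean(xs):
    """Arithmetic mean, kept exact with Fractions."""
    return sum(xs, Fraction(0)) / len(xs)


def variance(xs):
    """Population-style variance: average squared deviation from the mean."""
    m = mean(xs)
    return sum(((x - m) ** 2 for x in xs), Fraction(0)) / len(xs)


print([float(m) for m in sample_means])               # [1.5, 2.0, 2.5]
print(mean(population), mean(sample_means))           # 2 2
print(variance(population), variance(sample_means))   # 2/3 1/6
```

The sample means have the same expected value as the population (2) but a smaller variance (1/6 versus 2/3).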
Properties of the Sampling Distribution of x̄
1. $E(\bar x) = \mu$
(the expected value of the sampling distribution of x̄ = the expected value μ of the sampled population)
2. $SD(\bar x) = \dfrac{SD(x)}{\sqrt{n}} = \dfrac{\sigma}{\sqrt{n}}$
where σ is the standard deviation of the population from which the sample is taken and n is the sample size.
[Figure: target diagrams illustrating Unbiased, Confidence, and Precision; the central tendency is down the center]

BUS 350 - Topic 6.1 Handout 6.1, Page 1 6.1 - 14
[Figure: one unbiased and two biased sampling distributions shown relative to µ; the central tendency is down the center]

Consequences
1. $E(\bar x) = \mu$. This is why we use x̄ to estimate an unknown population mean μ. The sampling distribution of x̄ is "centered" at the parameter we are trying to estimate.
2. $SD(\bar x) = SD(x)/\sqrt{n}$; the standard deviation of x̄ is smaller than SD(x), the standard deviation of the population from which the sample is taken (for n > 1). The values of x̄ will cluster tightly around μ when n is large.
We Know More!
• We know 2 parameters of the sampling distribution of x̄:
$$E(\bar x) = \mu, \qquad SD(\bar x) = \frac{SD(x)}{\sqrt{n}}$$
• The Central Limit Theorem tells us about the shape of the distribution of x̄ when the sample size n is sufficiently large.
