
Unit 05 - Sampling Distributions With Solutions - 1 Per Page

This document provides an overview of sampling distributions and the central limit theorem. It discusses how the binomial distribution can be used to model experiments with dichotomous outcomes and fixed sample size. For large samples, the binomial distribution is well approximated by the normal distribution. The mean and variance of the binomial are defined. The sampling distribution of the sample proportion p̂ is also approximately normal for large n, with mean equal to the population proportion p and variance inversely proportional to n. Examples are provided to illustrate these concepts.

Uploaded by

Kase1

Unit 5: Sampling Distributions

Chapter 5 in IPS
Unit 5 Outline: Sampling Distributions

• The Binomial Probability Distribution
• Normal Approximation to the Binomial
• p̂ = X/n and its sampling distribution
• Law of Large Numbers and the Central Limit Theorem
• Sampling Distribution of X̄

The Binomial Distribution

• When sampling n subjects randomly (independently) from a population with a dichotomous (two level: like yes/no) characteristic,
• X will be total number with characteristic labeled `success’
• p̂ = X/n will be the sample proportion of subjects labeled `success’
• Main example is yes/no answers to polls
• Think Coin Flips

The Binomial Probability Model
• Prototype for many simple experiments and surveys
• Characterized by 4 properties
1) Fixed number (n) of observations, or `trials’.
2) The n trials are all independent of each other.
3) Each trial has two possible outcomes: `success’ or `failure’.
4) The probability (denoted by p) of success at each trial is constant.
• Let X = the total number of successes in the n independent trials. X has
a binomial distribution with parameters n and p.
• The possible values of the binomial random variable X are 0, 1, 2, …,n
• The probability that X = k, where k = 0, 1, 2, …, n, is given by

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \qquad \binom{n}{k} = \frac{n!}{k!\,(n - k)!}, \qquad n! = n(n - 1)(n - 2) \cdots (1)$$

Combinatorics
n n!
• So what does    mean?
 k  k!(n  k )!
• It counts, out of a total of n individuals, the number of
ways to select k individuals to form one group and (n – k)
individuals to form the other group (only two options)

• Or in the context of counting # of heads in n total coin flips,


k flips are heads, and (n – k) flips are tails.

• Simple example: n = 4, k = 2

6 6
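The simple example with n = 4 and k = 2 can be checked by brute force. The sketch below is an illustration rather than part of the original slides: it lists every way to pick which 2 of 4 coin flips are heads and compares the count with the factorial formula. The variable names are hypothetical.

```python
from itertools import combinations
from math import comb, factorial

n, k = 4, 2

# Every way to choose which k of the n flips (numbered 1..n) come up heads.
head_positions = list(combinations(range(1, n + 1), k))

# The same count from the formula n! / (k! (n - k)!).
count_by_formula = factorial(n) // (factorial(k) * factorial(n - k))

print(head_positions)
print(len(head_positions), count_by_formula, comb(n, k))
```

Both the enumeration and the formula should agree with the built-in `math.comb`.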
The Binomial Distribution: Example

• If a couple are both carriers of a certain disease, their child has probability 0.25 of being born with the disease. Suppose that a couple has 4 children:
◦ What is the probability that none of their children have the disease?
◦ What is the probability that at least two children have the disease?
• Let’s use the formula to calculate the answers, and the table to check both calculations

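As a way to check the hand calculation, here is a minimal sketch that applies the binomial formula with n = 4 and p = 0.25. It assumes the four children are independent trials, as the binomial model requires; `binom_pmf` is a helper name introduced here for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p), straight from the binomial formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 4, 0.25

# None of the four children have the disease.
p_none = binom_pmf(0, n, p)

# At least two have it: complement of "zero or one".
p_at_least_two = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)

print(f"P(X = 0)  = {p_none:.4f}")
print(f"P(X >= 2) = {p_at_least_two:.4f}")
```

Using the complement for "at least two" avoids summing three separate terms.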
Mean and Variance of a Binomial R.V.
Let X ~ B(n, p). Then the mean, variance and standard deviation of X are:

$$\mu_X = np, \qquad \sigma_X^2 = np(1 - p), \qquad \sigma_X = \sqrt{np(1 - p)}$$

(Derivations pg 320 in IPS; you will not be asked to reproduce the derivation.)

Example: What are the mean and standard deviation of the number of children who will have the disease (from previous example)? How are the mean and standard deviation interpreted here?

Based on these calculations and the shape of the binomial probability histogram, we can then assert...

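For the four-children example, the formulas can be evaluated directly. This is only a sketch of the arithmetic, using n = 4 and p = 0.25 from the earlier example.

```python
from math import sqrt

n, p = 4, 0.25

mean_x = n * p                # mu_X = np
var_x = n * p * (1 - p)       # sigma_X^2 = np(1 - p)
sd_x = sqrt(var_x)            # sigma_X = sqrt(np(1 - p))

print(f"mean = {mean_x}, variance = {var_x}, sd = {sd_x:.3f}")
```

One reading of the result: over many four-child families of carrier couples, the average number of affected children would be about 1, with a typical deviation of a little under 1 child.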
Normal approximation to the distribution of a
binomial variable
• Draw a Simple Random Sample (SRS) of size n from a
population having proportion p of successes. Let X be the
number of successes.

• The Binomial distribution is awkward to use when n is large, but for large n the binomial distribution looks very much like the normal distribution

• When n is large:

$$X \sim N\left(\mu = np,\ \sigma = \sqrt{np(1 - p)}\right)$$

• The approximation is accurate as long as both np ≥ 10 and n(1 – p) ≥ 10.

[Figure: probability histograms (vertical axis: Density) of four binomial variables: X1 ~ Bin(n = 4, p = 0.5), X2 ~ Bin(n = 50, p = 0.5), X3 ~ Bin(n = 10, p = 0.25), X4 ~ Bin(n = 50, p = 0.25)]

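The rule of thumb np ≥ 10 and n(1 – p) ≥ 10 can be applied to the four binomial variables named in the figure. The sketch below is illustrative; `normal_approx_ok` is a hypothetical helper, and the threshold of 10 is the one stated on the earlier slide.

```python
def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb: both np and n(1 - p) must reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

# (n, p) for X1, X2, X3, X4 from the figure.
cases = [(4, 0.5), (50, 0.5), (10, 0.25), (50, 0.25)]
results = {case: normal_approx_ok(*case) for case in cases}

for (n, p), ok in results.items():
    print(f"Bin(n={n}, p={p}): np={n * p}, n(1-p)={n * (1 - p)}, ok={ok}")
```

Only the two n = 50 cases satisfy both conditions, which matches the histograms that look bell-shaped.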
Example (similar to Example 5.21, IPS p325)

A worker inspects a simple random sample of 100 switches from a large shipment. 10% of the switches in shipments are usually bad switches. What is the probability that the number of bad switches is at least 17?

Approximating the count of bad switches with a Normal Density

Solution

$$
\begin{aligned}
P(X \ge 17) &= P\left(\frac{X - np}{\sqrt{np(1 - p)}} \ge \frac{17 - np}{\sqrt{np(1 - p)}}\right) \\
&\approx P\left(Z \ge \frac{17 - 100(0.10)}{\sqrt{100(0.10)(0.90)}}\right) \\
&= P\left(Z \ge \frac{7}{3}\right) = P(Z \ge 2.33) = 0.0099
\end{aligned}
$$

Using the exact binomial distribution:

$$P(X \ge 17) = P(X = 17) + P(X = 18) + \dots + P(X = 100) = 0.0206$$

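Both numbers in the solution can be reproduced with the Python standard library. This sketch computes the normal tail with `math.erfc` (an implementation choice made here, not something the slides prescribe) and sums the exact binomial probabilities from 17 to 100.

```python
from math import comb, erfc, sqrt

n, p, threshold = 100, 0.10, 17

# Normal approximation: standardize 17 with mu = np, sigma = sqrt(np(1 - p)).
mu = n * p
sigma = sqrt(n * p * (1 - p))
z = (threshold - mu) / sigma
normal_tail = 0.5 * erfc(z / sqrt(2))  # P(Z >= z) for a standard normal

# Exact answer: add up the binomial probabilities for k = 17, ..., 100.
exact_tail = sum(
    comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(threshold, n + 1)
)

print(f"z = {z:.2f}")
print(f"normal approximation: {normal_tail:.4f}")
print(f"exact binomial:       {exact_tail:.4f}")
```

The approximation is roughly half the exact value here. Note that np = 10 sits right at the edge of the rule of thumb, and the event is far out in the tail, where the skew of the binomial matters most.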
Sampling Distribution of a Sample Proportion (p̂)
Setup…
◦ For a two-level characteristic (success/failure) in a population which has a true proportion of success, p
◦ We will take one sample of size n from the population and compute the sample proportion $\hat{p} = X/n$
◦ The sampling distribution of the random variable p̂ is the theoretical distribution of values if we could look at sample proportions in all possible random samples of size n.
• Then…

Sampling Distribution of a Sample Proportion

• Since p̂ is just a linear transformation of X (which is a binomial random variable) we can easily calculate that:

$$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$

• Also, if n is large and p is not `too close’ to 0 or 1, then the sampling distribution of p̂ is…
◦ [Approximately] Normally Distributed

Again, under the conditions: np ≥ 10 and n(1 – p) ≥ 10

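A small simulation can make the two formulas concrete. The sketch below is an illustration under assumed values (n = 50, p = 0.3, 5000 repeated samples, a fixed seed); none of these numbers come from the slides. It draws many samples, records p̂ for each, and compares the simulated mean and standard deviation with p and sqrt(p(1 – p)/n).

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(14)      # fixed seed so the run is repeatable
n, p, reps = 50, 0.3, 5000   # assumed values: np = 15, n(1 - p) = 35

# Each replicate: n independent success/failure trials, then p-hat = X / n.
p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

simulated_mean = mean(p_hats)
simulated_sd = pstdev(p_hats)
theoretical_sd = sqrt(p * (1 - p) / n)

print(f"mean of p-hat: {simulated_mean:.4f} (theory: {p})")
print(f"sd of p-hat:   {simulated_sd:.4f} (theory: {theoretical_sd:.4f})")
```

Rerunning with a larger n should shrink the spread of the p̂ values in line with the 1/sqrt(n) factor.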
Interpretation, and a return to some examples

• The main implications of the result
◦ The sampling distribution of p̂ is centered over the true population proportion, p.
◦ Note the formula of the standard deviation of p̂. What happens as n increases?
◦ The standard deviation also depends on the unknown parameter p.
• Summarized in the graphics from IPS, Section 3.4, 5.1

From IPS

More generally (graphic from IPS, Section 5.2)…

Gallup Poll: Supreme Court Healthcare Ruling
• “Americans are sharply divided over Thursday's Supreme Court
decision on the 2010 healthcare law, with 46% both agree and
disagree with the Healthcare ruling.”

• “Results are based on telephone interviews with 1,012 national adults, aged 18 and older, conducted June 26-28, 2012, as part of Gallup Poll Daily tracking. For results based on the total sample of national adults, one can say with 95% confidence that the maximum margin of sampling error is ±4 percentage points.”

• “In addition to sampling error, question wording and practical difficulties in conducting surveys can introduce error or bias into the findings of public opinion polls.”

Gallup polls
• What supports the claim: Results are based on telephone
interviews with 1,012 national adults, aged 18 and older,
conducted June 26-28, 2012, as part of Gallup Poll Daily
tracking. For results based on the total sample of national
adults, one can say with 95% confidence that the maximum
margin of sampling error is ±4 percentage points.

• With n = 1,012 the standard deviation (called the standard error) of the sample proportion will be:

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{1012}} = \frac{\sqrt{p(1 - p)}}{31.81} = 0.0314\,\sqrt{p(1 - p)}$$

Gallup Poll, continued…

$\sqrt{p(1 - p)}$ is at its largest when p = ½. Why? Look at the graph of $f(x) = x(1 - x)$ over the range $x \in [0, 1]$.

So the largest value of

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{1012}} = 0.0314\,\sqrt{p(1 - p)}$$

will be

$$0.0314\,\sqrt{(1/2)(1/2)} = 0.0157$$

Where do we find 95% of the observations of a normal distribution? We will return to this idea when we study confidence intervals.

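The worst-case standard error can be found numerically as well. The sketch below scans a grid of p values for n = 1,012 and then multiplies the largest standard error by 1.96, the usual normal multiplier for 95%; that multiplier is an assumption brought in here, since the slides defer the details to the confidence-interval unit.

```python
from math import sqrt

n = 1012

def standard_error(p):
    """Standard deviation of p-hat for a sample of size n."""
    return sqrt(p * (1 - p) / n)

# Scan p = 0.00, 0.01, ..., 1.00 and keep the value with the largest SE.
grid = [i / 100 for i in range(101)]
worst_p = max(grid, key=standard_error)
max_se = standard_error(worst_p)

# Assumed 95% multiplier for a normal distribution.
margin_95 = 1.96 * max_se

print(f"largest SE occurs at p = {worst_p}: {max_se:.4f}")
print(f"approximate 95% margin: +/- {100 * margin_95:.1f} percentage points")
```

The result, roughly 3 percentage points, is within the ±4 maximum that Gallup reports.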
Compact (and technical) summary…
Suppose that X ~ B(n, p); then

X is approximately distributed as

$$N\left(\mu_X = np,\ \sigma_X = \sqrt{np(1 - p)}\right)$$

p̂ is approximately distributed as

$$N\left(\mu_{\hat{p}} = p,\ \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}\right)$$

Reminder: the approximation is a good one provided that np ≥ 10 and n(1 – p) ≥ 10.

An old practice problem (with some new parts)
Ultrasound is often used to determine the sex of an unborn baby. However, because the procedure relies on visual detection of anatomic differences between male and female babies, the error rates differ according to whether the baby is a boy or a girl.
◦ Pr(Ultrasound predicts male | baby is male) = 75%
◦ Pr(Ultrasound predicts female | baby is female) = 90%

(a) Consider an individual woman who comes to the clinic for an ultrasound to predict her baby’s sex. What is the probability that the ultrasound gives the wrong result? (Answer = 0.175)

(b) Suppose a clinic performs 10 ultrasounds a day. What is the probability of two or more incorrect sex determinations?

(c) What is the probability that a baby is male, given that the ultrasound predicts a male?

(d) Suppose a clinic performs 100 ultrasounds in a month. What is the probability that 25 or more ultrasounds make incorrect determinations?

Tables of binomial probability distributions

Solutions
Let B = {baby is a boy}, $B^C$ = {baby is a girl}
A = {ultrasound predicts boy}, $A^C$ = {ultrasound predicts girl}

(a)
$$
\begin{aligned}
P(\text{wrong result}) &= P(A \text{ and } B^C) + P(A^C \text{ and } B) \\
&= P(A \mid B^C)\,P(B^C) + P(A^C \mid B)\,P(B) \\
&= 0.10(0.50) + 0.25(0.50) = 0.175
\end{aligned}
$$

(b)
$$
\begin{aligned}
P(\text{at least 2 out of 10 are wrong predictions}) &= 1 - P(\text{0 or 1 wrong predictions}) = 1 - P(\text{10 or 9 correct predictions}) \\
&= 1 - \left[(0.825)^{10} + 10(0.175)^{1}(0.825)^{9}\right] = 1 - 0.146 - 0.310 = 0.544
\end{aligned}
$$

(c)
$$
\begin{aligned}
P(\text{boy} \mid \text{ultrasound predicts boy}) &= P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid B^C)\,P(B^C)} \\
&= \frac{0.75(0.50)}{0.75(0.50) + 0.10(0.50)} = \frac{0.375}{0.425} = 0.882
\end{aligned}
$$

(d) Let X ~ Bin(n = 100, p = 0.175); then approximately $X \sim N\left(\mu = 17.5,\ \sigma = \sqrt{100(0.175)(0.825)} = 3.80\right)$. So,

$$P(\text{at least 25 wrong calls out of 100}) = P(X \ge 25) = P\left(\frac{X - 17.5}{3.80} \ge \frac{25 - 17.5}{3.80}\right) = P(Z \ge 1.97) = 0.0244$$

Law of Large Numbers: Sampling Results for x̄.
• When thinking about the average of a population, the notation often used is
◦ population mean: μ
◦ sample mean: x̄

• The Law of Large Numbers
◦ If one draws independent observations from a population with (finite) mean μ, then as the number of observations increases, the sample mean eventually becomes arbitrarily close to (and stays close to) the population mean.
◦ We can summarize the specific results much like we did for sample proportions…

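The Law of Large Numbers is easy to watch in a simulation. The sketch below is an illustration, not from the slides: it rolls a fair six-sided die (population mean 3.5) with a fixed seed and records the running sample mean at a few sample sizes.

```python
import random

rng = random.Random(2012)   # fixed seed so the run is repeatable
population_mean = 3.5       # mean of a fair six-sided die

checkpoints = (10, 100, 1_000, 10_000, 100_000)
total = 0
running_means = {}
for i in range(1, checkpoints[-1] + 1):
    total += rng.randint(1, 6)       # one more independent observation
    if i in checkpoints:
        running_means[i] = total / i

for size, sample_mean in running_means.items():
    gap = abs(sample_mean - population_mean)
    print(f"n = {size:>6}: sample mean = {sample_mean:.4f}, gap = {gap:.4f}")
```

The gap need not shrink at every single step, but it tends toward zero as n grows, which is what the law describes.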
The Sampling Distribution
for the Sample Mean
Notation:

$$E(\bar{x}) = \mu = E(X)$$

where μ is the population mean of individuals (also written E(X)), and

$$SD(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{SD(X)}{\sqrt{n}}$$

where σ is the population standard deviation (also written SD(X)).

Interpretation: The average of the sampling distribution of the sample mean is the population mean, and the standard deviation of the sample mean is the population standard deviation divided by sq. root of sample size.

It is also known that the distribution of the sample mean will be approximately normally distributed. More on this in a few slides.

Sampling Distribution of the sample mean

$$E(\bar{x}) = \mu = E(X), \qquad SD(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{SD(X)}{\sqrt{n}}$$

Compact (and technical) summary…
Arbitrary Random Variables: Distribution of the Sample mean

$$E(\bar{x}) = \mu = E(X), \qquad SD(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{SD(X)}{\sqrt{n}}$$

Binomial Random Variables

X is approximately distributed as

$$N\left(\mu_X = np,\ \sigma_X = \sqrt{np(1 - p)}\right)$$

p̂ is approximately distributed as

$$N\left(\mu_{\hat{p}} = p,\ \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}\right)$$

March 2,
A typical problem…
It is known that math SAT scores in the entire US population (in 2007) are approximately normal with an average of 515 and a standard deviation of 100.

(a) What is the probability a randomly selected SAT-taker scores above 550 on math?

(b) What is the probability that the average SAT score for 20 people in a random sample is above 550?

[Figure: normal density curve of SAT_math; horizontal axis from 0 to 1000, vertical axis Density from 0 to .004]

Solution

(a) What is the probability a randomly selected SAT-taker scores above 550
on math?

Let X = SAT math score of a random test-taker. Then:

$$P(X > 550) = P\left(\frac{X - \mu_X}{\sigma_X} > \frac{550 - 515}{100}\right) = P(Z > 0.35) = 0.363$$

(b) What is the probability that the average SAT score for 20 people in a random sample is above 550?

Let $\bar{X}$ = mean SAT math score of a random sample of 20 test-takers. Then:

$$P(\bar{X} > 550) = P\left(\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} > \frac{550 - 515}{100/\sqrt{20}}\right) = P(Z > 1.57) = 0.058$$

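The two parts differ only in the standard deviation used to standardize 550. The sketch below makes that explicit with one helper, `normal_upper_tail`, which is a name introduced here; it relies on `math.erfc` rather than a Z table.

```python
from math import erfc, sqrt

def normal_upper_tail(x, mu, sd):
    """P(Y > x) for Y ~ N(mu, sd)."""
    z = (x - mu) / sd
    return 0.5 * erfc(z / sqrt(2))

mu, sigma, n = 515, 100, 20

# (a) One test-taker: standardize with the population SD.
p_individual = normal_upper_tail(550, mu, sigma)

# (b) Mean of n test-takers: standardize with sigma / sqrt(n).
p_sample_mean = normal_upper_tail(550, mu, sigma / sqrt(n))

print(f"(a) P(X > 550)     = {p_individual:.3f}")
print(f"(b) P(X-bar > 550) = {p_sample_mean:.3f}")
```

Averaging 20 scores cuts the standard deviation from 100 to about 22.4, which is why the same cutoff of 550 becomes much less likely to exceed.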
Two central ideas: IPS, p 302-303

Central Limit Theorem
• What this means is, no matter what the underlying distribution of the individual observations of X is (as long as its standard deviation is finite), if you take a large enough sample, the sampling distribution of the sample mean X̄ will be approximately:

Normally Distributed!!!

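A quick simulation can illustrate the claim with a clearly non-normal population. The sketch below is an assumption-laden illustration: it uses an exponential population with mean 1 and SD 1 (strongly right-skewed), samples of size n = 50, and 4000 repetitions with a fixed seed. If the sample means are close to normal, about 95% of them should fall within 1.96 standard errors of the population mean.

```python
import random
from math import sqrt

rng = random.Random(5)     # fixed seed so the run is repeatable
n, reps = 50, 4000         # assumed sample size and number of samples
mu, sigma = 1.0, 1.0       # mean and SD of an Exponential(rate = 1) variable

# Draw many samples of size n and keep each sample mean.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)
]

se = sigma / sqrt(n)       # SD of the sample mean, sigma / sqrt(n)
share_within = sum(abs(m - mu) <= 1.96 * se for m in sample_means) / reps

print(f"share of sample means within 1.96 SE of mu: {share_within:.3f}")
```

With a much smaller n, such as 2 or 3, the sample means would still show the skew of the exponential population, so the match to the normal benchmark would be noticeably worse in the tails.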
Example Problem

It is known that [Personal Per Capita] Income in Massachusetts has a mean of $60,000 and standard deviation of $20,000.

(a) If we select one individual out of the entire Massachusetts population, what is the probability of selecting someone whose income is greater than $70,000?

(b) If we select a random sample of 25 individuals from the Mass population, what is the probability that the average income in your sample will be greater than $70,000?

Example Problem
(a) If we select one individual out of the entire Massachusetts population,
what is the probability of selecting someone whose income is greater
than $70,000?

Great question… we could try to use the Normal Distribution here, but the calculation would not be very accurate. Why not? It's likely a very right-skewed distribution (and the CLT does not take hold here since n = 1).

(b) If we select a random sample of 25 individuals from the Mass population, what is the probability that the average income in your sample will be greater than $70,000?

This we can compute. Let X̄ be the random variable for the mean income of a random sample of 25 people. Then:

$$P(\bar{X} > 70{,}000) = P\left(\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} > \frac{70{,}000 - 60{,}000}{20{,}000/\sqrt{25}}\right) = P(Z > 2.5) = 0.0062$$

Recap of important ideas…
• Random variables are an abstract way to describe the numerical outcome of an experiment, survey, or study
◦ they come in 2 forms: discrete and continuous
• Probability distributions (example: binomial model) are used to describe the distribution of random variables
• Probability distribution models also apply to summary statistics, such as the sample proportion and sample mean
◦ The probability model of a summary statistic is often called its sampling distribution
• The sampling distributions of most summary statistics
◦ Have an expected value (mean) that is identical to the population parameter of interest (population mean, population proportion)
◦ Have a standard deviation that decreases as the sample size increases (by Law of Large Numbers)
• The Normal distribution can be used to approximate the sampling distribution of a sample mean (by the CLT) when the sample is large enough, no matter how the individuals are distributed in the population!!!
