Unit 05 - Sampling Distributions With Solutions - 1 Per Page
Chapter 5 in IPS
Unit 5 Outline: Sampling Distributions
The Binomial Distribution
The Binomial Probability Model
• Prototype for many simple experiments and surveys
• Characterized by 4 properties
1) Fixed number (n) of observations, or ‘trials’.
2) The n trials are all independent of each other.
3) Each trial has two possible outcomes: ‘success’ or ‘failure’.
4) The probability (denoted by p) of success at each trial is constant.
• Let X = the total number of successes in the n independent trials. X has
a binomial distribution with parameters n and p.
• The possible values of the binomial random variable X are 0, 1, 2, …, n.
• The probability that X = k, where k = 0, 1, 2, …, n, is given by
$$P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}, \qquad \text{where } \binom{n}{k} = \frac{n!}{k!\,(n - k)!} \text{ and } n! = n(n - 1)(n - 2)\cdots(1).$$
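A minimal Python sketch of the formula above, assuming Python 3.8+ for the standard-library `math.comb`; the function name `binomial_pmf` is our own choice for illustration, not something from IPS. It evaluates P(X = k) for every k and checks that the probabilities sum to 1.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    if not 0 <= k <= n:
        return 0.0  # values outside 0..n are impossible
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example: X ~ B(4, 0.5)
probs = [binomial_pmf(k, 4, 0.5) for k in range(5)]
print(probs)       # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(probs))  # 1.0
```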
Combinatorics
• So what does $\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$ mean?
• It counts, out of a total of n individuals, the number of
ways to select k individuals to form one group and (n – k)
individuals to form the other group (only two options)
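• Simple example: n = 4, k = 2 gives $\binom{4}{2} = \frac{4!}{2!\,2!} = 6$ ways to split four individuals into two groups of two.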
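To make the counting concrete, this sketch lists every way to split n = 4 individuals into a group of k = 2 and a group of n − k = 2, using the standard-library `itertools.combinations`; the labels A to D are hypothetical.

```python
from itertools import combinations
from math import comb, factorial

individuals = ["A", "B", "C", "D"]  # hypothetical labels for n = 4 people
k = 2

# Each combination is one choice of the k people in group 1;
# everyone else automatically forms group 2.
groups = list(combinations(individuals, k))
for g in groups:
    rest = tuple(x for x in individuals if x not in g)
    print(g, "|", rest)

n = len(individuals)
print(len(groups), factorial(n) // (factorial(k) * factorial(n - k)), comb(n, k))  # 6 6 6
```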
The Binomial Distribution: Example
Mean and Variance of a Binomial R.V.
Let X ~ B(n, p). Then the mean, variance and standard deviation of X are:
$$\mu_X = np, \qquad \sigma_X^2 = np(1 - p), \qquad \sigma_X = \sqrt{np(1 - p)}$$
(Derivations on pg 320 in IPS; you will not be asked to reproduce the derivation.)
• When n is large, X is approximately Normal:
$$X \sim N\left(np,\ \sqrt{np(1 - p)}\right) \quad \text{(approximately)}$$
• The approximation is accurate as long as both np ≥ 10 and n(1 – p) ≥ 10.
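A small sketch of these formulas, assuming the rule of thumb above is the only check wanted: it computes μ_X, σ²_X and σ_X for a binomial and reports whether np ≥ 10 and n(1 − p) ≥ 10. The helper name `binomial_summary` is hypothetical.

```python
from math import sqrt

def binomial_summary(n: int, p: float) -> dict:
    """Mean, variance, SD of X ~ B(n, p) plus the Normal-approximation rule of thumb."""
    mean = n * p
    variance = n * p * (1 - p)
    return {
        "mean": mean,
        "variance": variance,
        "sd": sqrt(variance),
        "normal_ok": n * p >= 10 and n * (1 - p) >= 10,
    }

print(binomial_summary(4, 0.5))   # np = 2, so the Normal approximation is not recommended
print(binomial_summary(50, 0.5))  # np = n(1 - p) = 25, so the approximation is reasonable
```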
[Figure: density histograms of four binomial distributions: X1 ~ Bin(n = 4, p = 0.5) (bin4_half), X2 ~ Bin(n = 50, p = 0.5) (bin50_half), bin10_quarter, and bin50_quarter; y-axis: Density]
Approximating the count of bad switches with a Normal Density
Solution
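$$\begin{aligned}
P(X \ge 17) &= P\left(\frac{X - np}{\sqrt{np(1 - p)}} \ge \frac{17 - np}{\sqrt{np(1 - p)}}\right)\\
&= P\left(Z \ge \frac{17 - 100(0.10)}{\sqrt{100(0.10)(0.90)}}\right) = P\left(Z \ge \frac{7}{3}\right)\\
&= P(Z \ge 2.33) = 0.0099
\end{aligned}$$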
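The same calculation in Python, using the standard-library `statistics.NormalDist` (Python 3.8+) for the standard Normal CDF, with n = 100 and p = 0.10 as in the solution above. As an extra check that is not on the slide, it also sums the exact binomial tail; since np = 10 sits right at the rule-of-thumb boundary, the exact tail comes out noticeably larger than the Normal approximation here.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 100, 0.10, 17

# Normal approximation, as on the slide (no continuity correction)
mu, sigma = n * p, sqrt(n * p * (1 - p))
z = (k - mu) / sigma                 # 7/3, about 2.33
approx = 1 - NormalDist().cdf(z)

# Exact binomial tail P(X >= 17), for comparison
exact = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

print(f"z = {z:.2f}, normal approx = {approx:.4f}, exact binomial = {exact:.4f}")
```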
Sampling Distribution of a Sample Proportion
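The sample proportion $\hat{p} = X/n$ has mean $\mu_{\hat{p}} = p$ and standard deviation
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$
• Also, if n is large and p is not ‘too close’ to 0 or 1, then the sampling distribution of p̂ is approximately Normally distributed.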
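A simulation sketch of the two facts above, under illustrative assumptions (n = 200, p = 0.3, 5000 repetitions, a fixed seed; none of these come from the slides): draw many samples of n independent success/failure trials, compute p̂ for each, and compare the mean and SD of the simulated p̂ values with p and √(p(1 − p)/n). The helper `simulate_phat` is a hypothetical name.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def simulate_phat(n: int, p: float, reps: int, seed: int = 1) -> list:
    """Return `reps` simulated sample proportions, each from n independent 0/1 trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

n, p = 200, 0.3   # illustrative values, not from the slides
phats = simulate_phat(n, p, reps=5000)

print(round(mean(phats), 4), p)                                  # simulated mean vs p
print(round(pstdev(phats), 4), round(sqrt(p * (1 - p) / n), 4))  # simulated SD vs formula
```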
From IPS
More generally (graphic from IPS, Section 5.2)…
Gallup Poll: Supreme Court Healthcare Ruling
• “Americans are sharply divided over Thursday's Supreme Court
decision on the 2010 healthcare law, with 46% both agree and
disagree with the Healthcare ruling.”
With n = 1012:
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{1012}} = \frac{\sqrt{p(1 - p)}}{31.81} = 0.0314\sqrt{p(1 - p)}$$
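A quick arithmetic check of the Gallup numbers, assuming n = 1012 as in the formula above. Plugging in p = 0.46 (the reported percentage) is our own illustrative choice; p = 0.50 is included because it maximizes p(1 − p) and so gives the largest possible σ_p̂.

```python
from math import sqrt

n = 1012                 # sample size in the formula above
root_n = sqrt(n)         # about 31.81
factor = 1 / root_n      # about 0.0314, so sigma_phat = 0.0314 * sqrt(p(1 - p))
print(round(root_n, 2), round(factor, 4))  # 31.81 0.0314

for p in (0.46, 0.50):   # 0.46 is an illustrative plug-in value; 0.50 gives the maximum
    print(p, round(sqrt(p * (1 - p) / n), 4))
```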
Gallup Poll, continued…
Compact (and technical) summary…
Suppose that X ~ B(n, p). Then
• X is approximately distributed as $N\left(\mu_X = np,\ \sigma_X = \sqrt{np(1 - p)}\right)$
• p̂ is approximately distributed as $N\left(\mu_{\hat{p}} = p,\ \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}\right)$
Reminder: the approximation is a good one provided that np ≥ 10 and n(1 – p) ≥ 10.
An old practice problem (with some new parts)
Ultrasound is often used to determine the sex of an unborn baby. However, because the
procedure relies on visual detection of anatomic differences between male and female
babies, the error rates differ according to whether the baby is a boy or a girl.
Pr(Ultrasound predicts male | baby is male) = 75%
Pr(Ultrasound predicts female | baby is female) = 90%
(a) Consider an individual woman who comes to the clinic for an ultrasound to predict her
baby’s sex. What is the probability that the ultrasound gives the wrong result?
(Answer = 0.175)
(b) What is the probability that a baby is male, given that the ultrasound predicts a male?
Tables of binomial probability distributions
Solutions
Let B = {baby is a boy}, $B^C$ = {baby is a girl}
A = {ultrasound predicts boy}, $A^C$ = {ultrasound predicts girl}
(a)
$$\begin{aligned}
P(\text{wrong result}) &= P(A \text{ and } B^C) + P(A^C \text{ and } B)\\
&= P(A \mid B^C)\,P(B^C) + P(A^C \mid B)\,P(B)\\
&= 0.10(0.50) + 0.25(0.50) = 0.175
\end{aligned}$$
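The same calculation in Python, extended with Bayes' rule for part (b). It assumes P(B) = 0.50, as the solution above does; the variable names are hypothetical. For (b), P(A) = 0.75(0.50) + 0.10(0.50) = 0.425, so P(B | A) = 0.375/0.425.

```python
# Conditional accuracies from the problem statement
p_pred_male_given_male = 0.75
p_pred_female_given_female = 0.90
p_male = 0.50  # assumption used in the solution: boys and girls equally likely
p_female = 1 - p_male

# (a) law of total probability over the two ways to be wrong
p_wrong = (1 - p_pred_female_given_female) * p_female + (1 - p_pred_male_given_male) * p_male

# (b) Bayes' rule: P(male | predicts male) = P(predicts male | male) P(male) / P(predicts male)
p_pred_male = p_pred_male_given_male * p_male + (1 - p_pred_female_given_female) * p_female
p_male_given_pred_male = p_pred_male_given_male * p_male / p_pred_male

print(round(p_wrong, 3))                 # 0.175
print(round(p_male_given_pred_male, 3))  # 0.882
```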
The Sampling Distribution for the Sample Mean
Notation:
• $\mu_X$ is the population mean of individuals (also written $E(X)$)
• For the sample mean $\bar{x}$ of a random sample of size n:
$$E(\bar{x}) = E(X), \qquad SD(\bar{x}) = \frac{SD(X)}{\sqrt{n}}$$
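One way to check these two identities exactly, without simulation: take a small illustrative population (a fair six-sided die, not from the slides), enumerate every equally likely sample of size n = 2 drawn with replacement, and compare the mean and variance of x̄ with E(X) and Var(X)/n. The standard-library `fractions.Fraction` keeps the arithmetic exact. Since SD is the square root of the variance, Var(x̄) = Var(X)/n is the same statement as SD(x̄) = SD(X)/√n. The helper `mean_var` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]  # illustrative: one roll of a fair die
n = 2                            # sample size (independent draws, with replacement)

def mean_var(values):
    """Exact mean and (population) variance of a list of equally likely values."""
    m = Fraction(sum(values), len(values))
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v

mu, sigma2 = mean_var(population)  # E(X) = 7/2, Var(X) = 35/12

# Every equally likely sample of size n, and its sample mean x-bar
xbars = [Fraction(sum(s), n) for s in product(population, repeat=n)]
mu_xbar, var_xbar = mean_var(xbars)

print(mu, mu_xbar)           # 7/2 7/2   -> E(x-bar) = E(X)
print(sigma2 / n, var_xbar)  # 35/24 35/24 -> Var(x-bar) = Var(X)/n
```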
Compact (and technical) summary…
Arbitrary random variables: $E(\bar{x}) = E(X)$ and $SD(\bar{x}) = \frac{SD(X)}{\sqrt{n}}$
p̂ is approximately distributed as $N\left(\mu_{\hat{p}} = p,\ \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}\right)$
March 2,
A typical problem…
It is known that math SAT scores in the entire US population (in 2007) are approximately Normal with a mean of 515 and a standard deviation of 100.
(a) What is the probability that a randomly selected SAT-taker scores above 550 on math?
[Figure: Normal density curve of math SAT scores; y-axis: Density]
Solution
(a) What is the probability that a randomly selected SAT-taker scores above 550 on math?
$$P(X > 550) = P\left(\frac{X - \mu_X}{\sigma_X} > \frac{550 - 515}{100}\right) = P(Z > 0.35) = 0.363$$
(b) What is the probability that the average SAT score for 20 people in a random sample is above 550?
$$P(\bar{x} > 550) = P\left(\frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} > \frac{550 - 515}{100/\sqrt{20}}\right) = P(Z > 1.57) = 0.058$$
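Both parts in Python, using the standard-library `statistics.NormalDist` (Python 3.8+) in place of a Normal table; μ = 515 and σ = 100 come from the problem. The only difference between (a) and (b) is the standard deviation used: σ for one person, σ/√n for the mean of n = 20 people.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 515, 100  # math SAT scores (2007), from the problem statement

# (a) one randomly selected test-taker
z_a = (550 - mu) / sigma
p_a = 1 - NormalDist().cdf(z_a)

# (b) mean of a random sample of n = 20: SD(x-bar) = sigma / sqrt(n)
n = 20
z_b = (550 - mu) / (sigma / sqrt(n))
p_b = 1 - NormalDist().cdf(z_b)

print(round(z_a, 2), round(p_a, 3))  # 0.35 0.363
print(round(z_b, 2), round(p_b, 4))
```

With the unrounded z in (b), the tail probability differs from the table value in the third decimal place; that gap is just table rounding of z to 1.57.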
Two central ideas: IPS, p 302-303
Central Limit Theorem
• What this means is that, no matter what the underlying distribution of the individual observations of X is, if you take a large enough sample, the sampling distribution of the sample mean $\bar{x}$ will be approximately Normally distributed!!!
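A simulation sketch of the CLT under illustrative assumptions (not from the slides): individuals follow an Exponential distribution with rate 1, which is strongly right-skewed with mean 1 and SD 1; we draw 4000 samples of size n = 50 with a fixed seed using the standard-library `random.Random.expovariate`. If x̄ is approximately N(μ, σ/√n), about 95% of the sample means should land within 1.96 standard errors of μ, even though the individual values are far from Normal.

```python
import random
from math import sqrt

rng = random.Random(2024)  # fixed seed so the run is reproducible
n, reps = 50, 4000         # illustrative sample size and number of repetitions
mu = sigma = 1.0           # Exponential(rate = 1): mean 1, SD 1, strongly right-skewed

# Draw `reps` samples of size n from the skewed population; keep each sample mean.
xbars = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

# If x-bar is approximately N(mu, sigma / sqrt(n)), about 95% of the sample
# means should fall within 1.96 standard errors of mu.
se = sigma / sqrt(n)
coverage = sum(abs(x - mu) <= 1.96 * se for x in xbars) / reps

print(round(sum(xbars) / reps, 3), round(coverage, 3))
```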
Example Problem
Example Problem
(a) If we select one individual out of the entire Massachusetts population,
what is the probability of selecting someone whose income is greater
than $70,000?
Great question… we could try to use the Normal distribution here, but the calculation would not be very accurate. Why not? Income is likely very right-skewed (and the CLT does not help here since n = 1).
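This we can compute. Let $\bar{x}$ be the random variable for the mean income of a random sample of 25 people. Then:
$$P(\bar{x} > 70{,}000) = P\left(\frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} > \frac{70{,}000 - 60{,}000}{20{,}000/\sqrt{25}}\right) = P(Z > 2.5) = 0.0062$$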
Recap of important ideas…
• Random variables are an abstract way to describe the numerical outcome of an experiment, survey, or study
  – They come in 2 forms: discrete and continuous
• Probability distributions (example: the binomial model) are used to describe the distribution of random variables
• Probability distribution models also apply to summary statistics, such as the sample proportion and sample mean
  – The probability model of a summary statistic is often called its sampling distribution
• The sampling distributions of most summary statistics
  – Have an expected value (mean) that is identical to the population parameter of interest (population mean, population proportion)
  – Have a standard deviation that decreases as the sample size increases (for the sample mean, SD(x̄) = SD(X)/√n; this is the idea behind the Law of Large Numbers)
• A Normal distribution can be used to approximate the sampling distribution of a sample mean (by the CLT) no matter how the individuals are distributed in the population, provided the sample is large enough!!!