

CHAPTER 5 Sampling Distributions

Sections: 5.1 & 5.2


Introduction
In this chapter we focus on the sampling distribution of the sample mean and the sampling distribution of the sample proportion. These sampling distributions give us the tools we need to derive statistical methods for estimating the values of the following population parameters:
1. Population mean ($\mu$)
2. Population proportion ($p$)
The foundation of our work in all of the chapters that follow rests on the following fact:
The information we seek about a particular population is also embedded in any random sample drawn from that population.
The Binomial Distribution
Assumptions:
A random experiment with only two possible outcomes, success or failure, is repeated a fixed number of times, n.
Let p be the probability that the outcome of the experiment is a success (so the probability of a failure is 1 - p), and assume that the outcomes of successive trials are independent of each other.
Let X be the number of successes when the experiment is repeated n times (n trials).
X is a discrete random variable with possible values:
S = {0, 1, 2, 3, 4, ..., n}.
The probability distribution that the random variable X follows is called the Binomial Distribution. The
probability that X will take a value k (one of the numbers 0, 1, 2, 3, 4, ..., n) can be computed by using
the following formula.
$$P(X = k) = \frac{n!}{k!\,(n-k)!}\,p^{k}(1-p)^{n-k}$$

Tables of binomial probabilities are also available for values of n up to 15 or 20 and for the most common values of p. In the textbook for this class, this table is found in the Tables section at the end of the book (Table C).
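A minimal Python sketch of the formula above, assuming Python 3.8+ (for `math.comb`). The function name `binomial_pmf` is our own, and n = 15, p = 0.4 are arbitrary illustrative values; printing each probability to four decimals mimics the layout of a printed table such as Table C.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): n!/(k!(n-k)!) * p^k * (1-p)^(n-k)."""
    if not 0 <= k <= n:
        return 0.0  # X can only take the values 0, 1, ..., n
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Illustrative values (not from the text): n = 15 trials, p = 0.4
n, p = 15, 0.4
dist = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

for k, prob in dist.items():
    print(f"P(X = {k:2d}) = {prob:.4f}")

# The probabilities over S = {0, 1, ..., n} must add up to 1
print(f"sum = {sum(dist.values()):.10f}")
```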
Mean and Standard Deviation for a Binomial Distribution
The mean ($\mu$) and the standard deviation ($\sigma$) of any binomial random variable can be found by using the following formulas:
$$\mu = np$$
$$\sigma = \sqrt{np(1-p)}$$
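As a sanity check on these formulas, the sketch below (Python, standard library only; the helper name is hypothetical) computes the mean and standard deviation directly from their definitions, $\mu = \sum k\,P(X=k)$ and $\sigma^2 = \sum (k-\mu)^2 P(X=k)$, and compares them with $np$ and $\sqrt{np(1-p)}$ for a few arbitrary (n, p) pairs.

```python
from math import comb, isclose, sqrt


def moments_by_summation(n, p):
    """Mean and SD of Binomial(n, p) computed directly from the pmf."""
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
    return mean, sqrt(var)


checks = []
for n, p in [(12, 0.3), (20, 0.5), (100, 0.8)]:
    mean, sd = moments_by_summation(n, p)
    formula_mean, formula_sd = n * p, sqrt(n * p * (1 - p))
    checks.append(isclose(mean, formula_mean) and isclose(sd, formula_sd))
    print(f"n={n:3d} p={p}: summed mu={mean:.4f} sigma={sd:.4f} | "
          f"np={formula_mean:.4f} sqrt(np(1-p))={formula_sd:.4f}")
```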
Example:
A basketball player takes 12 shots at the basket. On each throw he makes a basket with probability 0.7
and he misses the basket with probability 0.3. Let X be the number of times that the basketball player
misses the basket.
1. Draw the probability distribution of X. Is this a symmetric distribution or is it a skewed
distribution?
2. What are the mean and the standard deviation of the number of baskets that the player misses?
3. What is the probability that the basketball player will make more than 8 baskets?
Solution:
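1. By using Table C, we construct the following table, which describes the probability distribution of X, the number of baskets the player misses in 12 attempts. Because p = 0.3 is well below 0.5, the probabilities pile up at small values of X and tail off to the right, so the distribution is skewed to the right rather than symmetric.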


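2. $\mu = np = 12(0.3) = 3.6$, $\quad \sigma = \sqrt{np(1-p)} = \sqrt{12(0.3)(1-0.3)} = \sqrt{2.52} \approx 1.59$

3. P{player makes more than 8 baskets} = P{player misses fewer than 4 baskets}
$P(X < 4) = P(X \le 3) = 0.0138 + 0.0712 + 0.1678 + 0.2397 = 0.4925$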

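The solution can be checked with a short Python sketch (standard library only). Instead of drawing the distribution, it prints a crude text histogram. The skewness formula $(1-2p)/\sqrt{np(1-p)}$ for a binomial variable is a standard result that the text does not use; it is included here only to confirm the right skew.

```python
from math import comb, sqrt

n, p = 12, 0.3  # 12 shots; X = number of misses, P(miss) = 0.3
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# 1. Probability distribution of X as a text histogram (one '#' per 0.01)
for k, pk in enumerate(pmf):
    print(f"{k:2d}  {pk:.4f}  " + "#" * round(pk * 100))

skewness = (1 - 2 * p) / sqrt(n * p * (1 - p))  # > 0 means skewed right
print(f"skewness = {skewness:.3f}")

# 2. Mean and standard deviation of the number of misses
mu = n * p
sigma = sqrt(n * p * (1 - p))
print(f"mu = {mu:.1f}, sigma = {sigma:.4f}")

# 3. More than 8 baskets made <=> fewer than 4 misses <=> X <= 3
p_more_than_8_made = sum(pmf[:4])
print(f"P(X <= 3) = {p_more_than_8_made:.4f}")
```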
Normal Approximation to Binomial Probabilities
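If X has a binomial distribution with parameters n and p, then it can be shown that, for integers $a \le b$,
$$P(a \le X \le b) \approx P\left(\frac{a - \frac{1}{2} - np}{\sqrt{np(1-p)}} \le Z \le \frac{b + \frac{1}{2} - np}{\sqrt{np(1-p)}}\right)$$
where Z is a standard normal random variable.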
This formula gives satisfactory approximations for large values of n and for values of p not near 0 or 1 (say, 0.05 < p < 0.95).
In general, the approximation is good if both np and n(1 - p) are at least 10. The addition and subtraction of 1/2 is called the continuity correction.
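A minimal Python sketch of the approximation with and without the continuity correction, compared against the exact binomial sum. The standard normal CDF is built from `math.erf`; the function names and the illustrative case n = 50, p = 0.4, 15 ≤ X ≤ 25 (np = 20 and n(1 - p) = 30, both at least 10) are our own choices, not from the text.

```python
from math import comb, erf, sqrt


def phi(z):
    """Standard normal CDF, Phi(z) = P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def binomial_exact(a, b, n, p):
    """Exact P(a <= X <= b) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, b + 1))


def binomial_normal_approx(a, b, n, p, continuity=True):
    """Normal approximation to P(a <= X <= b); widens by 1/2 on each side if continuity=True."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    c = 0.5 if continuity else 0.0
    return phi((b + c - mu) / sigma) - phi((a - c - mu) / sigma)


n, p, a, b = 50, 0.4, 15, 25
exact = binomial_exact(a, b, n, p)
with_cc = binomial_normal_approx(a, b, n, p, continuity=True)
without_cc = binomial_normal_approx(a, b, n, p, continuity=False)
print(f"exact = {exact:.4f}, with correction = {with_cc:.4f}, without = {without_cc:.4f}")
```

For this interval the corrected value lands much closer to the exact probability, which is why the correction is worth the small extra effort when n is moderate.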
EXERCISES
1. A new vaccine was tested on 100 persons to determine its effectiveness. The drug company claims that the vaccine is 80% effective. Find the probabilities that:
a. fewer than 74 people will develop immunity
b. between 74 and 85 people, inclusive, will develop immunity.
Answers:
a. 0.0668
b. 0.8276
Note: These answers were derived without using the continuity correction.
2. When a certain seed is planted, the probability that it will sprout is 0.1. If 1000 seeds are planted,
find the approximate probability that:
a. more than 130 seeds will sprout
b. between ninety and ninety-five seeds, inclusive, will sprout.
Answers:
a. 0.0008
b. 0.1512
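The exercise answers can be reproduced with the sketch below (Python, standard library only; helper names are hypothetical). Like the printed answers, it does not use the continuity correction. Printed normal tables are read at z rounded to two decimals, so the `round_z` flag mimics that; with unrounded z, answer 2b comes out at about 0.153 instead of 0.1512, and 1b prints 0.8275 because the handout subtracts two table entries that are themselves rounded to four decimals.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def z_score(x, n, p, round_z):
    """(x - np) / sqrt(np(1-p)), optionally rounded to 2 decimals like a printed table."""
    z = (x - n * p) / sqrt(n * p * (1 - p))
    return round(z, 2) if round_z else z


def approx_below(x, n, p, round_z=True):
    """Normal approximation to P(X < x), no continuity correction."""
    return phi(z_score(x, n, p, round_z))


def approx_above(x, n, p, round_z=True):
    """Normal approximation to P(X > x), no continuity correction."""
    return 1 - phi(z_score(x, n, p, round_z))


def approx_between(a, b, n, p, round_z=True):
    """Normal approximation to P(a <= X <= b), no continuity correction."""
    return phi(z_score(b, n, p, round_z)) - phi(z_score(a, n, p, round_z))


ex1a = approx_below(74, 100, 0.8)          # fewer than 74 immune
ex1b = approx_between(74, 85, 100, 0.8)    # 74 to 85 immune, inclusive
ex2a = approx_above(130, 1000, 0.1)        # more than 130 sprout
ex2b = approx_between(90, 95, 1000, 0.1)   # 90 to 95 sprout, inclusive
ex2b_unrounded = approx_between(90, 95, 1000, 0.1, round_z=False)

for label, value in [("1a", ex1a), ("1b", ex1b), ("2a", ex2a), ("2b", ex2b)]:
    print(f"{label}: {value:.4f}")
print(f"2b with unrounded z: {ex2b_unrounded:.4f}")
```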
Sampling Distributions

Parameter
A parameter is a number that describes a population as a whole.
Statistic
A statistic is a number derived from a sample drawn from a specific population.
In statistical practice, the value of a population parameter is usually not known, so a statistic is used to estimate it.
Example:
A telemarketing firm in L.A. uses a device that dials residential telephone numbers in that city at
random. Of the first 100 numbers dialed, 48% are unlisted. This is not surprising because 52% of all L.A.
residential phones are unlisted. Here 48% is a statistic and 52% is a parameter.
Sampling Distribution (of a statistic)
The distribution of the values taken by a statistic in all possible samples of the same size from the same
population is called the sampling distribution of the statistic.
Unbiased statistics
A statistic that is used to estimate a parameter is said to be unbiased if the mean of its sampling
distribution equals the value of the population parameter being estimated.
Two of the population parameters that we are interested in are:
The population proportion ($p$)
The population mean ($\mu$)
The sample proportion
$$\hat{p} = \frac{X}{n}$$
and the sample mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
are unbiased statistics for the population proportion $p$ and the population mean $\mu$, respectively.

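A small simulation makes the idea of unbiasedness concrete. The sketch below assumes, for illustration only, that each dialed number is unlisted independently with probability 0.52 (the parameter in the L.A. example), draws many samples of 100 numbers, and averages the resulting sample proportions. Individual values such as 0.48 are common, but their average lands very close to 0.52. The seed and the number of repetitions are arbitrary.

```python
import random
from statistics import mean

P_UNLISTED = 0.52   # population parameter p from the example
N_DIALED = 100      # sample size n
REPS = 10_000       # number of simulated samples (arbitrary)

rng = random.Random(1)  # fixed seed for reproducibility


def sample_proportion(rng, p, n):
    """One simulated p-hat: fraction of n independent draws that are 'unlisted'."""
    return sum(rng.random() < p for _ in range(n)) / n


p_hats = [sample_proportion(rng, P_UNLISTED, N_DIALED) for _ in range(REPS)]
share_at_most_048 = sum(ph <= 0.48 for ph in p_hats) / REPS

print(f"average of {REPS} p-hats: {mean(p_hats):.4f}")
print(f"share of samples with p-hat <= 0.48: {share_at_most_048:.3f}")
```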

Sampling Distribution of the Sample Proportion

Example
The probability that a person is left-handed is approximately 0.10. Take an SRS of size n = 200 and let $\hat{p}$ (p-hat) be the proportion of left-handed people in the sample (the sample proportion). The value of p-hat varies in different SRSs of size n = 200.
Because $\hat{p} = X/n$ (for this example p-hat = X/200), where X is the random variable whose value equals the number of left-handed people in the sample (we call X the count), the mean and standard deviation of p-hat can be found using the rules for means and variances.
The count X follows the binomial distribution with parameters n and p, with mean $\mu_X = np$ and standard deviation $\sigma_X = \sqrt{np(1-p)}$.
Using the rules for means and variances from section 4.4 we get:
$$\mu_{\hat{p}} = \mu_{X/n} = \frac{np}{n} = p$$
$$\sigma_{\hat{p}}^{2} = \left(\frac{1}{n}\right)^{2}\sigma_X^{2} = \frac{1}{n^{2}}\,np(1-p) = \frac{p(1-p)}{n}$$
Or
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$

In our case, the mean and standard deviation for the sample proportion of left-handed people in a sample of 200 are:
$$\mu_{\hat{p}} = 0.10 \text{ (or 10\%)}$$
$$\sigma_{\hat{p}} = \sqrt{\frac{(0.10)(0.90)}{200}} = 0.0212 \text{ (or approx. 2\%)}$$

Because the distribution of the count X is approximately $N\left(np, \sqrt{np(1-p)}\right)$ (here np = 20 and n(1 - p) = 180 are both at least 10), the distribution of p-hat is also approximately $N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$.
The approximate distribution of the sample proportion of left-handed people in an SRS of 200 is shown below.
[Figure: approximate normal distribution of $\hat{p}$, with $\mu = 0.10$ and $\sigma = 0.02$]


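The mean and standard deviation of p-hat can also be obtained without the rules for means and variances, by summing directly over the exact binomial distribution of the count X. The sketch below (Python, standard library only) does this for p = 0.10 and n = 200 and compares the result with $p$ and $\sqrt{p(1-p)/n}$.

```python
from math import comb, sqrt

p, n = 0.10, 200  # P(left-handed) and SRS size from the example

# Exact distribution of the count X ~ Binomial(n, p); p-hat = X / n
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

mean_phat = sum((k / n) * pk for k, pk in enumerate(pmf))
sd_phat = sqrt(sum((k / n - mean_phat) ** 2 * pk for k, pk in enumerate(pmf)))

print(f"mean of p-hat: {mean_phat:.4f}  (formula: p = {p})")
print(f"sd of p-hat:   {sd_phat:.4f}  (formula: {sqrt(p * (1 - p) / n):.4f})")
```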
Sampling Distribution of the Sample Mean

Example
The distribution of the sum X that shows up when a pair of fair dice is tossed is shown below.
[Figure: Distribution of Sum X when a pair of dice is tossed ($\mu = 7$, $\sigma = 2.415$); horizontal axis: sum, 2 to 12; vertical axis: probability, 0 to 0.18]

Take an SRS of size n = 9 from this population and compute its sample mean $\bar{x}$. You can do this by tossing a pair of dice 9 times and recording the sum that shows up each time. In Excel you can use Random Number Generation from Data Analysis to simulate the toss of a pair of dice.
The table below shows 10 samples of size n = 9 generated using Excel. The last row is the average of the 9 tosses of the dice for each sample.

Notice that the value of the sample mean varies from sample to sample.
In general, the sample mean $\bar{X}$ of an SRS of size n drawn from a population is a random variable.

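Outside Excel, the same experiment can be simulated in a few lines of Python. This sketch (the seed and helper name are arbitrary choices) generates 10 samples of n = 9 tosses and prints each sample with its mean. Because the tosses are random, the numbers will not match the Excel table, but they show the same sample-to-sample variation.

```python
import random
from statistics import mean

rng = random.Random(2024)  # fixed seed so the run is reproducible


def toss_pair(rng):
    """Sum of the faces showing on one toss of two fair dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)


N_TOSSES, N_SAMPLES = 9, 10
samples = [[toss_pair(rng) for _ in range(N_TOSSES)] for _ in range(N_SAMPLES)]
sample_means = [mean(s) for s in samples]

for i, (s, xbar) in enumerate(zip(samples, sample_means), start=1):
    print(f"sample {i:2d}: {s}  x-bar = {xbar:.3f}")
```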


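THE MEAN $\mu_{\bar{X}}$ AND STANDARD DEVIATION $\sigma_{\bar{X}}$ OF THE RANDOM VARIABLE $\bar{X}$

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{1}{n}\left(X_1 + X_2 + X_3 + \cdots + X_n\right)$$

Here $X_1, X_2, \ldots, X_n$ are independent observations, each with the population mean $\mu_X$ and standard deviation $\sigma_X$. Applying the rules for means and variances from section 4.4 we get:
$$\mu_{\bar{X}} = \frac{1}{n}\left(n\,\mu_X\right) = \mu_X$$
$$\sigma_{\bar{X}}^{2} = \left(\frac{1}{n}\right)^{2}\left(n\,\sigma_X^{2}\right) = \frac{1}{n}\,\sigma_X^{2}$$
Or
$$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$$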

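For the dice example, the population values can be computed exactly. The sketch below enumerates the 36 equally likely outcomes of two fair dice with `fractions.Fraction`, giving $\mu_X = 7$ and $\sigma_X^2 = 35/6$ (so $\sigma_X \approx 2.415$), and then applies $\sigma_{\bar{X}} = \sigma_X/\sqrt{n}$ with n = 9.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# All 36 equally likely (die1, die2) outcomes and their sums
sums = [a + b for a, b in product(range(1, 7), repeat=2)]
prob = Fraction(1, 36)

mu_X = sum(s * prob for s in sums)                  # population mean of the sum
var_X = sum((s - mu_X) ** 2 * prob for s in sums)   # population variance
sigma_X = sqrt(var_X)                               # math.sqrt accepts a Fraction

n = 9
mu_xbar = mu_X                      # mean of x-bar equals mu_X
sigma_xbar = sigma_X / sqrt(n)      # sigma_X / sqrt(n)

print(f"mu_X = {mu_X}, var_X = {var_X}, sigma_X = {sigma_X:.3f}")
print(f"mu_xbar = {mu_xbar}, sigma_xbar = {sigma_xbar:.3f}")
```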

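SHAPE OF THE DISTRIBUTION OF THE SAMPLE MEAN

Based on a well-known result in statistics called the Central Limit Theorem, when the sample size n is large the distribution of $\bar{X}$ is approximately normal, whatever the shape of the population distribution. (If the population itself is normal, $\bar{X}$ is exactly normal for any n.)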

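The Central Limit Theorem can be checked empirically for the dice example. The sketch below simulates many samples of n = 9 tosses (the number of repetitions and the seed are arbitrary), then compares the spread of the simulated $\bar{x}$ values with $\sigma_X/\sqrt{9} \approx 0.805$ and with the normal-curve benchmarks of about 68% within one standard deviation and 95% within two. Because $\bar{x}$ can only take multiples of 1/9, the one-standard-deviation share can come out a little above 68%.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(7)   # fixed seed
n, reps = 9, 20_000      # sample size from the example; repetitions arbitrary
mu, sigma_xbar = 7, 2.415 / sqrt(n)

# Each x-bar is the average of 9 sums of two fair dice
xbars = [sum(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n)) / n
         for _ in range(reps)]

within_1 = sum(abs(x - mu) <= sigma_xbar for x in xbars) / reps
within_2 = sum(abs(x - mu) <= 2 * sigma_xbar for x in xbars) / reps

print(f"mean of x-bars: {mean(xbars):.3f} (theory {mu})")
print(f"sd of x-bars:   {pstdev(xbars):.3f} (theory {sigma_xbar:.3f})")
print(f"within 1 sd: {within_1:.3f}, within 2 sd: {within_2:.3f}")
```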
According to these results, the distribution of the sample mean $\bar{X}$ in 9 tosses of a pair of dice is approximately normal and has mean $\mu_{\bar{X}} = 7$ and standard deviation
$$\sigma_{\bar{X}} = \frac{2.415}{\sqrt{9}} = \frac{2.415}{3} = 0.805$$

EXAMPLES:
1. In the long run, annual real returns on common stocks have varied with mean 9% and standard deviation 28%. You plan to retire in 45 years and you are considering investing in stocks. What is the approximate probability (assuming market conditions do not change dramatically in the next 45 years) that the mean annual return on your investment over the next 45 years will:
(a) exceed 15%
(b) be lower than 5%
Answers:
a. 0.0753
b. 0.1690
2. According to government data, 21% of American children under the age of six live in households with incomes less than the official poverty level. A study of learning in early childhood chooses a random sample of 300 children. Find the approximate probability that at least 80 of the children in the sample selected come from households with incomes less than the official poverty level.
Answer: 0.0080

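Both answers follow from the normal approximation to the sampling distribution. The sketch below (Python, standard library only; names are our own) assumes, as the stock example implicitly does, that the 45 annual returns behave like an SRS from a population with mean 9% and standard deviation 28%. For the second example, "at least 80 children" becomes $\hat{p} \ge 80/300$, and, like the printed answer, no continuity correction is applied.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# Example 1: sample mean of n = 45 annual returns (in percent)
mu, sigma, n = 9.0, 28.0, 45
sd_xbar = sigma / sqrt(n)                     # sigma / sqrt(n), about 4.17
p_exceed_15 = 1 - phi((15 - mu) / sd_xbar)    # P(x-bar > 15)
p_below_5 = phi((5 - mu) / sd_xbar)           # P(x-bar < 5)

# Example 2: sample proportion with p = 0.21 and n = 300
p, n2 = 0.21, 300
sd_phat = sqrt(p * (1 - p) / n2)              # sqrt(p(1-p)/n), about 0.0235
p_at_least_80 = 1 - phi((80 / n2 - p) / sd_phat)  # P(p-hat >= 80/300)

print(f"1(a) {p_exceed_15:.4f}  1(b) {p_below_5:.4f}  2 {p_at_least_80:.4f}")
```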