Binomial Distributions For Sample Counts
Estimation
Estimation: A process whereby we select
a random sample from a population and use
a sample statistic to estimate a population
parameter.
Point and Interval Estimation
Point Estimate: A sample statistic used to
estimate the exact value of a population
parameter.
Confidence interval (interval estimate): A
range of values defined by the confidence level
within which the population parameter is
estimated to fall.
Confidence Level: The likelihood, expressed
as a percentage or a probability, that a specified
interval will contain the population parameter.
Inferential Statistics involves
Three Distributions:
A population distribution: variation in the larger
group that we want to know about.
A distribution of sample observations:
variation in the sample that we can observe.
A sampling distribution: an approximately normal distribution
of a sample statistic (such as the mean) over all possible samples,
whose mean and standard deviation allow one to infer
the population parameters from the sample statistics.
The Central Limit Theorem
Revisited
What does this Theorem tell us:
Even if a population distribution is skewed, the
sampling distribution of the mean is approximately normal
when the sample size is large.
The mean of the sampling distribution equals the population
mean, and individual sample means cluster more tightly around
it as the sample size gets larger.
As the sample size gets larger the standard error of the mean
decreases in size (which means that the variability in the sample
estimates from sample to sample decreases as n increases).
It is important to remember that researchers do not
typically draw repeated samples from the same
population. Instead, they use the knowledge of theoretical
sampling distributions to construct confidence intervals
around estimates.
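The behavior the theorem describes can be checked with a small simulation. The sketch below is only an illustration: the exponential population (mean 1, SD 1), the helper name sample_means, the number of repetitions, and the seed are all assumptions, not part of the original slides.

```python
import random
import statistics

def sample_means(n, reps=2000, seed=0):
    """Draw `reps` samples of size n from a skewed (exponential) population
    and return the list of sample means. Hypothetical helper."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]  # population mean 1, SD 1
        means.append(statistics.fmean(sample))
    return means

for n in (4, 36, 144):
    means = sample_means(n)
    # The mean of the sample means stays near 1; their SD shrinks roughly like 1/sqrt(n).
    print(n, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

Under these assumptions the spread of the sample means should be close to 1 / square root of n, which is the standard error of the mean for this population.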
Confidence interval: a range of reasonable guesses at a population value,
for example, a mean.
Confidence level = chance that the range of guesses
captures the population value.
The most common confidence level is 95%.
General Format of a Confidence Interval
estimate +/- margin of error
Accuracy of a mean
A sample of n = 36 college women has
mean pulse = 75.3.
The SD of these pulse rates = 8.
How well does this sample mean estimate
the population mean?
Standard Error of Mean
SEM = SD of sample / square root of n
SEM = 8 / square root(36) = 8 / 6 = 1.33
Margin of error of mean = 2 x SEM (2 approximates the 95% multiplier 1.96)
Margin of Error = 2 x 1.33 = 2.66, about 2.7
Interpretation
95% confidence that the sample mean is
within 2.7 (pulse beats) of the population
mean.
A 95% confidence interval for the
population mean
sample mean +/- margin of error
75.3 +/- 2.7; 72.6 to 78.0
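The pulse-rate arithmetic can be reproduced in a few lines. This is a minimal sketch; the function name mean_confidence_interval and its parameters are hypothetical, and the default multiplier of 2 follows the slides' rounded rule rather than the exact 1.96.

```python
import math

def mean_confidence_interval(mean, sd, n, multiplier=2.0):
    """Return (low, high) for mean +/- multiplier * SEM. Hypothetical helper."""
    sem = sd / math.sqrt(n)      # standard error of the mean
    margin = multiplier * sem    # margin of error
    return mean - margin, mean + margin

low, high = mean_confidence_interval(75.3, 8, 36)
print(f"{low:.1f} to {high:.1f}")  # prints 72.6 to 78.0
```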
C.I. for mean pulse of men
n = 49
sample mean = 70.3, SD = 8
SEM = 8 / square root(49) = 8 / 7 = 1.1
margin of error = 2 x 1.1 = 2.2
Interval is 70.3 +/- 2.2
68.1 to 72.5
Do men and women differ in
mean pulse?
C.I. for women is 72.6 to 78.0
C.I. for men is 68.1 to 72.5
No overlap between intervals,
so we conclude that the population means differ.
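The comparison of the two intervals can be written as a simple check. This is a sketch only; intervals_overlap is a hypothetical helper, and the interval endpoints are the rounded values from the slides. Non-overlap is an informal, conservative criterion: overlapping intervals would not by themselves show that the means are equal.

```python
def intervals_overlap(a, b):
    """Return True if closed intervals a=(low, high) and b=(low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

women = (72.6, 78.0)  # 95% C.I. for mean pulse of women, from the slides
men = (68.1, 72.5)    # 95% C.I. for mean pulse of men, from the slides

print(intervals_overlap(women, men))  # prints False
```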
Confidence Levels:
Confidence Level The likelihood, expressed as a
percentage or a probability, that a specified interval
will contain the population parameter.
95% confidence level: there is a .95 probability that
a specified interval DOES contain the population
mean. In other words, there are 5 chances out of 100
(or 1 chance out of 20) that the interval DOES NOT
contain the population mean.
99% confidence level: there is 1 chance out of 100
that the interval DOES NOT contain the population
mean.
Constructing a
Confidence Interval (CI)
The sample mean is the point estimate of the
population mean.
The sample standard deviation is the point
estimate of the population standard deviation.
The standard error of the mean makes it
possible to state the probability that an
interval around the point estimate contains
the actual population mean.
Standard error of the mean: the standard
deviation of a sampling distribution
$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}} = \text{Standard Error}$$
The Standard Error
$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}$$
Estimating standard errors
Since the population standard deviation, and therefore the standard error,
is generally not known, we usually work with the estimated standard error:
$$s_{\bar{x}} = \frac{s_x}{\sqrt{n}}$$
Determining a
Confidence Interval (CI)
Given a large enough sample, a confidence interval for the
population mean may be constructed at any desired confidence level:
$$CI = \bar{X} \pm Z(SE_{\bar{x}}) = \bar{X} \pm Z\left(\frac{s_x}{\sqrt{n}}\right)$$
where Z is chosen from a standard normal distribution table to
obtain the desired degree of confidence.
Confidence Level: Increasing our confidence level
from 95% to 99% means we are less willing to draw
the wrong conclusion: we take a 1% risk (rather
than a 5% risk) that the specified interval does not contain
the true population mean.
If we reduce our risk of being wrong, then we need a
wider range of values, so the interval becomes
less precise.
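The trade-off between confidence and precision shows up directly in the multiplier Z. The sketch below uses Python's standard library instead of a printed table; the helper name z_multiplier is hypothetical.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided standard normal multiplier for a confidence level in (0, 1)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.99):
    # About 1.64, 1.96, and 2.58 respectively.
    print(level, round(z_multiplier(level), 2))

# For the same standard error, the 99% interval is wider than the 95% interval by this factor.
print(round(z_multiplier(0.99) / z_multiplier(0.95), 2))
```

Because the margin of error is Z times the standard error, moving from 95% to 99% widens the interval by roughly 31% when the sample is unchanged.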
$$\bar{X} \pm Z\left(\frac{s_x}{\sqrt{n}}\right) = 12.97 \pm 1.96\left(\frac{2.42}{\sqrt{155}}\right) = 12.97 \pm 0.38$$
So the interval is $12.59 \le \mu \le 13.35$
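The same calculation can be checked numerically. This is a minimal sketch using the sample values from the example (mean 12.97, standard deviation 2.42, n = 155); the variable names are illustrative.

```python
import math

x_bar, s, n, z = 12.97, 2.42, 155, 1.96  # sample mean, sample SD, sample size, 95% multiplier

margin = z * s / math.sqrt(n)  # Z times the estimated standard error
low, high = x_bar - margin, x_bar + margin

print(round(margin, 2), round(low, 2), round(high, 2))  # prints 0.38 12.59 13.35
```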
Interpretation
Informal: Based on our analysis of this
particular sample, we are about 95% confident
that the mean education among all voters in
this town lies between 12.59 and 13.35 years.
Formal: If we took a large number of random
samples, each with 155 cases, and calculated
confidence intervals in this manner for each
sample, about 95% of those confidence
intervals should include the true population
mean.
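The formal interpretation can be illustrated by simulating repeated samples. Everything about the population here is an assumption made for the sketch: a normal population with mean 13.0 and standard deviation 2.4, 1000 repetitions, and a fixed seed. The function name coverage is hypothetical.

```python
import math
import random
import statistics

def coverage(true_mean=13.0, true_sd=2.4, n=155, reps=1000, z=1.96, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        margin = z * statistics.stdev(sample) / math.sqrt(n)
        if x_bar - margin <= true_mean <= x_bar + margin:
            hits += 1
    return hits / reps

print(coverage())  # expected to be close to 0.95
```

Any single interval either contains the true mean or does not; the 95% describes how often the procedure succeeds over many samples.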
Estimating the standard error of a proportion: based
on the Central Limit Theorem, a sampling distribution of
sample proportions $\hat{\pi}$ is approximately normal, with a mean,
$\mu_{\hat{\pi}}$, equal to the population proportion, $\pi$, and with a standard
error of proportions equal to:
$$\sigma_{\hat{\pi}} = \sqrt{\frac{\pi(1-\pi)}{n}}$$
Confidence Intervals for Proportions
Determining a Confidence Interval
for a Proportion
Large-sample confidence intervals for proportions
are found as
$$\hat{\pi} \pm Z(SE_{\hat{\pi}}) = \hat{\pi} \pm Z\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$$
where $\hat{\pi}$ is the sample proportion and Z is chosen from a table of the standard normal
distribution to give the desired degree of confidence.
Finding an approximate 95% confidence interval for the
proportion favoring school closings.
Sample statistics:
Proportion favoring school closings = 0.431
Number of cases n = 153
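Confidence interval for the population proportion $\pi$, using the sample proportion $\hat{\pi} = 0.431$:
$$\hat{\pi} \pm Z(SE_{\hat{\pi}}) = \hat{\pi} \pm Z\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} = 0.431 \pm 1.96\sqrt{\frac{(0.431)(1-0.431)}{153}} = 0.431 \pm 0.078$$
So the interval is $0.353 \le \pi \le 0.509$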
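The proportion example can be verified the same way. This is a sketch under the large-sample normal approximation used in the slides; the function name proportion_confidence_interval and its parameter names are hypothetical.

```python
import math

def proportion_confidence_interval(p_hat, n, z=1.96):
    """Return (low, high) for p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of the proportion
    margin = z * se
    return p_hat - margin, p_hat + margin

low, high = proportion_confidence_interval(0.431, 153)
print(f"{low:.3f} to {high:.3f}")  # prints 0.353 to 0.509
```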
Interpretation
Informal: Based on our analysis of this one sample we
are about 95% confident that the proportion in favor
of closing schools, among all voters in this town, lies
between 0.353 and 0.509.
Formal: If we took a large number of random
samples, each with 153 cases, and calculated
confidence intervals in this manner for each sample,
about 95% of those confidence intervals should
include the true population proportion $\pi$.