Estimation
Course Outline
• Introduction to Biostatistics
• Descriptive Biostatistics
• Probability
• Discrete Probability Distributions
• Continuous Probability Distributions
• Sampling Distributions
• Estimation
• Hypothesis Testing
Lecture Outline
Part I
• In this chapter, we do the reverse of probability: given one sample, we ask what random
system generated its statistics.
• This shifts our mode of thinking from deductive reasoning to inductive reasoning.
Population → Sample: probability (deduction)
Sample → Population: statistics (induction)
Populations vs. Samples
• The term sample refers to a subset of the population that is selected for analysis.
– In the example the polling company selected a sample of 1,008 voters
Sampling
• If the values of parameters are unknown, we make inferences about them using
sample information.
Types of Inference
Estimation
•Estimating or predicting the value of the parameter
Hypothesis Testing
•Deciding about the value of a parameter based on some preconceived idea.
Example: A consumer wants to estimate the average price of similar homes in his city
before putting his home on the market.
Example: A manufacturer wants to know if a new type of steel is more resistant to high
temperatures than an old type was.
Hypothesis test: Is the new average resistance, µN greater than the old average
resistance, µO?
Types of Inference
Whether you are estimating parameters or testing hypotheses, statistical methods are
important because they provide:
Estimation involves the use of the data in the sample to estimate the corresponding
parameter in the population from which the sample was drawn.
Estimator
• An estimator is a rule, usually a formula, that tells you how to calculate the estimate
based on the sample.
A good estimator must satisfy three conditions in order to predict well the value in the
population:
1. Unbiased: The expected value of the estimator must be equal to the parameter being
estimated.
2. Consistent: The value of the estimator approaches the value of the parameter as the
sample size increases.
3. Relatively Efficient: The estimator for a parameter has the smallest variance of all
estimators which could be used.
Unbiased Estimator
• An estimator whose expected value equals the parameter being estimated.
1. Point Estimates
2. Interval Estimates.
Point Estimate
• A single number is calculated to estimate the parameter.
Interval Estimate
• Two numbers are calculated to create an interval within which the parameter is
expected to lie. It is constructed so that, with a chosen degree of confidence, the
true unknown parameter will be captured inside the interval.
Confidence Interval
• The point estimate will differ from the population parameter because of sampling
error, and there is no way to know how close it is to the actual parameter. For this
reason, statisticians like to give an interval estimate, which is a range of values
used to estimate the parameter.
• A level of confidence is the probability that the interval estimate will contain the
parameter.
• For large sample sizes, the central limit theorem applies, which allows us to use
normal distributions.
6. Confidence Interval for the Ratio of Variances of Two Normally Distributed Populations
Confidence Level
The percent of the time the true mean will lie in the interval estimate given.
Point Estimator of Population Mean
$\bar{x} = \frac{\sum x_i}{n}$
Example
Point Estimation of Population Proportion
• Since an estimator is calculated from sample values, it varies from sample to sample
according to its sampling distribution.
• An estimator is unbiased if the mean of its sampling distribution equals the parameter
of interest. It does not systematically overestimate or underestimate the target
parameter.
• Both sample mean and sample proportion are unbiased estimators of population mean
and proportion. The following sample variance is an unbiased estimator of population
variance.
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
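The two point estimators above can be checked numerically. The following is a minimal sketch; the data values are hypothetical and invented only for illustration.

```python
from statistics import mean, variance

# Hypothetical sample values, invented for illustration only.
sample = [4.0, 7.0, 6.0, 3.0, 5.0]

n = len(sample)
x_bar = sum(sample) / n                                # point estimate of the population mean
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)   # sample variance with the n - 1 divisor

# The standard library functions use the same definitions.
same_mean = abs(x_bar - mean(sample)) < 1e-12
same_variance = abs(s2 - variance(sample)) < 1e-12

print(x_bar, s2, same_mean, same_variance)
```

Dividing by n - 1 rather than n is what makes the sample variance unbiased for the population variance.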
Properties of Point Estimators
• Of all the unbiased estimators, we prefer the estimator whose sampling distribution
has the smallest spread or variability.
Maximum Error of the Estimate
• The maximum error of the estimate is denoted by E and is one-half the width of the
confidence interval for means and proportions.
• The maximum difference between the point estimate and the actual parameter.
Sampled Population & Target Population
The sampled population is the population from which we actually draw the sample.
The target population is the population about which we wish to make an inference.
The strict validity of statistical procedures depends on the assumption of random samples.
Random Sample
A random sample is a selection of some members of the population such that each
member is independently chosen and has a known nonzero probability of being selected.
Simple Random Sample
A simple random sample is a random sample in which each group member has the
same probability of being selected.
Confidence Intervals
• For a Normal distribution, we know that 95% of values will be within 1.96 standard
deviations of the mean.
• So, given one estimate, we can say that this estimate is within 1.96 standard errors
of the actual population mean μ, with 95% confidence.
• From this we can specify a range of values within which we are 95% confident
that the population mean (μ) lies.
• This is called a confidence interval.
• 95% Confidence Interval for a population mean (from large enough sample):
$\bar{x} \pm 1.96 \times \text{standard error}$
$\bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}}$
• Remarkably, this result holds for samples of size 30 or more. So, a large sample
in this context, is a sample of 30 or more.
The Margin of Error
• From the Central Limit Theorem (CLT), the sampling distributions of x̄ and p̂ will
be approximately normal when the sample size is large.
• For unbiased estimators with normal sampling distributions, 95% of all point
estimates will lie within 1.96 standard deviations of the parameter of interest.
Margin of error = 1.96 × std error of the estimator
The 95% Confidence Interval
Components of an Interval Estimate
• Approximately 95% of the values of the standard normal curve lie within 2 standard
deviations of the mean. The z score in this case is called the reliability coefficient.
The exact value to use is 1.96.
Estimating Means and Proportions
Point estimator of population mean μ: $\bar{x}$
Margin of error ($n \ge 30$): $\pm 1.96 \frac{s}{\sqrt{n}}$
Point estimator of population proportion p: $\hat{p} = x/n$
Margin of error: $\pm 1.96 \sqrt{\frac{\hat{p}\hat{q}}{n}}$
Assumption: $np > 5$ and $nq > 5$; or $0 < p \pm 2\sqrt{\frac{pq}{n}} < 1$
Estimating Means and Proportions
Example: A homeowner randomly samples 64 homes similar to her own and finds
that the average selling price is ₦250,000 with a standard deviation of ₦15,000.
Estimate the average selling price for all similar homes in the city.
Solution
Point estimator of μ: $\bar{x} = 250{,}000$
Margin of error: $\pm 1.96 \frac{s}{\sqrt{n}} = \pm 1.96 \frac{15{,}000}{\sqrt{64}} = \pm 3675$
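The arithmetic in the homeowner example can be reproduced with a few lines of Python. This is only a sketch of the calculation; the variable names are chosen here for illustration.

```python
from math import sqrt

n = 64            # number of homes sampled
x_bar = 250_000   # sample mean selling price (naira)
s = 15_000        # sample standard deviation (naira)

# Large-sample 95% margin of error: 1.96 standard errors.
margin = 1.96 * s / sqrt(n)

print(f"point estimate: {x_bar}, margin of error: +/- {margin:.0f}")
```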
Estimating Means and Proportions
Solution
• Create an interval (a, b) so that you are fairly sure that the parameter lies between
these two values.
• “Fairly sure” means “with high probability”, measured using the confidence
coefficient, $1 - \alpha$.
$100(1 - \alpha)\%$ Confidence Interval:
Point Estimator $\pm\, z_{\alpha/2}$ SE
Interpretation of A Confidence Interval
• A confidence interval is calculated from one given sample. It either covers or misses
the true parameter. Since the true parameter is unknown, you'll never know which
one is true.
• If independent samples are taken repeatedly from the same population, and a
confidence interval calculated for each sample, then a certain percentage (confidence
level) of the intervals will include the unknown population parameter.
• The confidence level associated with a confidence interval is the success rate of the
confidence interval.
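The "success rate" interpretation can be illustrated by simulation. The sketch below assumes a hypothetical normal population (mean 50, standard deviation 10) and a sample size of 40, none of which come from the slides. It draws many samples, builds a 95% interval from each, and counts how often the interval covers the true mean.

```python
import random
from math import sqrt

random.seed(1)                     # fixed seed so the run is repeatable
mu, sigma, n = 50.0, 10.0, 40      # hypothetical population and sample size
trials = 2000

covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    margin = 1.96 * sigma / sqrt(n)            # sigma treated as known
    if x_bar - margin <= mu <= x_bar + margin:
        covered += 1

coverage = covered / trials
print(f"fraction of intervals covering mu: {coverage:.3f}")
```

Each individual interval either covers μ or misses it; the fraction that cover should be close to the confidence level, 0.95.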
General Expression for an Interval Estimate
Table of confidence coefficients
α 1-α Z
.10 .90 1.645
.05 .95 1.960
.01 .99 2.575
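The tabulated values can be recovered from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `z_critical` is made up for this illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z such that the central area under the standard normal curve equals `confidence`."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    print(confidence, round(z_critical(confidence), 3))
```

For 99% confidence the computed value is about 2.576, which printed tables round to 2.575 or 2.58.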
Confidence Intervals for Means and Proportions
Confidence interval for a population mean μ:
$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ (when σ is unknown and $n \ge 30$, use s in its place)
Confidence interval for a population proportion p:
$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$
Confidence Intervals for Means and Proportions
Example: One sample of size 30 from the electronic components yields a sample
mean $\bar{x}$ = 5,873 hours. We know σ = 3,959, so a 95% confidence interval would be:
$\bar{x} \pm 1.96 \times \text{standard error}$
$\bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}} = 5873 \pm 1.96 \frac{3959}{\sqrt{30}} = 5873 \pm 1417$, that is, 4456 to 7290
Interpretation: we would say that the average lifetime of all components (μ) is between
4,456 and 7,290 hours with 95% confidence.
Solution
$\bar{x} \pm 1.96 \frac{s}{\sqrt{n}} = 756 \pm 1.96 \frac{35}{\sqrt{50}} = 756 \pm 9.70$
Example: Find a 99% confidence interval for µ, the population average daily intake
of dairy products for men.
Solution
$\bar{x} \pm 2.58 \frac{s}{\sqrt{n}} = 756 \pm 2.58 \frac{35}{\sqrt{50}} = 756 \pm 12.77$
The interval must be wider to provide for the increased confidence that it does
indeed enclose the true value of μ.
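The effect of the confidence level on interval width can be seen by running both dairy-intake calculations side by side. This is a sketch; the helper `mean_ci` is a name invented here, and the z values are the ones used on the slides.

```python
from math import sqrt

def mean_ci(x_bar, sd, n, z):
    """Large-sample interval for a mean: x_bar +/- z * sd / sqrt(n)."""
    margin = z * sd / sqrt(n)
    return x_bar - margin, x_bar + margin, margin

# Sample of n = 50 men: mean 756, standard deviation 35.
lo95, hi95, m95 = mean_ci(756, 35, 50, 1.96)
lo99, hi99, m99 = mean_ci(756, 35, 50, 2.58)

print(f"95%: 756 +/- {m95:.2f} -> ({lo95:.2f}, {hi95:.2f})")
print(f"99%: 756 +/- {m99:.2f} -> ({lo99:.2f}, {hi99:.2f})")
```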
Confidence Intervals for Means and Proportions
Solution
•The z value corresponding to a confidence coefficient of .99, found in the standard
normal distribution table, is 2.58. This is our reliability coefficient.
•The standard error is;
We say we are 99 percent confident that the population mean is between 76.3 and 92.3
since, in repeated sampling, 99 percent of all intervals that could be constructed in the
manner just described would include the population mean.
Confidence Intervals for Means and Proportions
Example: Punctuality of patients in keeping appointments is of interest to a
research team. In a study of patient flow through the offices of general
practitioners, it was found that a sample of 35 patients were 17.2 minutes late for
appointments, on the average. Previous research had shown the standard
deviation to be about 8 minutes. The population distribution was felt to be
nonnormal. What is the 90 percent confidence interval for µ, the true mean amount of
time late for appointments?
Solution
•Since the sample size is fairly large (greater than 30), and since the population standard
deviation is known, we draw on the central limit theorem and assume the sampling
distribution of x̄ to be approximately normally distributed.
•From the standard normal distribution table we find the reliability coefficient
corresponding to a confidence coefficient of .90 to be about 1.645, if we interpolate.
•The standard error is;
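For the punctuality example, the standard error and the 90 percent interval follow directly from the numbers given in the problem statement. A minimal sketch of the calculation, with variable names chosen for illustration:

```python
from math import sqrt

n = 35          # patients sampled
x_bar = 17.2    # mean minutes late
sigma = 8.0     # population standard deviation from previous research
z = 1.645       # reliability coefficient for 90% confidence

se = sigma / sqrt(n)                 # standard error of the mean
margin = z * se
lower, upper = x_bar - margin, x_bar + margin

print(f"SE = {se:.2f}, interval = ({lower:.1f}, {upper:.1f})")
```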
(4.265, 7.695)
(3.7261, 8.2339)
Example: Of a random sample of n = 150 college students, 104 of the students said
that they had played on a soccer team during their K-12 years. Estimate the
proportion of college students who played soccer in their youth with a 90%
confidence interval.
Solution
$\hat{p} \pm 1.645 \sqrt{\frac{\hat{p}\hat{q}}{n}} = \frac{104}{150} \pm 1.645 \sqrt{\frac{.69(.31)}{150}}$
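The soccer example can be evaluated as follows. This sketch keeps the unrounded sample proportion, so the endpoints may differ in the last digit from a hand calculation that rounds p̂ to .69 first. The large-sample check uses p̂ in place of the unknown p, which is an assumption of this illustration.

```python
from math import sqrt

x, n = 104, 150          # students who played soccer, sample size
p_hat = x / n
q_hat = 1 - p_hat

# Large-sample condition, checked with the sample proportion.
large_enough = n * p_hat > 5 and n * q_hat > 5

margin = 1.645 * sqrt(p_hat * q_hat / n)   # 90% confidence
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.2f}, margin = {margin:.2f}, interval = ({lower:.3f}, {upper:.3f})")
```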
Example: The AB Internet and Country BA Life Project reported in a particular year
that 18 percent of Internet users have used it to search for information regarding
experimental treatments or medicines. The sample consisted of 1220 adult Internet
users, and information was collected from telephone interviews. We wish to
construct a 95 percent confidence interval for the proportion of Internet users in
the sampled population who have searched for information on experimental
treatments or medicines.
Solution
•We shall assume that the 1220 subjects were sampled in random fashion.
•The best point estimate of the population proportion is p̂ = .18.
•The reliability coefficient corresponding to a confidence level of .95 is 1.96.
•Our estimate of the standard error is $\sqrt{\hat{p}\hat{q}/n} = \sqrt{(.18)(.82)/1220} = .0110$.
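The steps above can be combined into the 95 percent interval for the Internet-user proportion. A short sketch, using only the figures stated in the example:

```python
from math import sqrt

n = 1220        # adult Internet users sampled
p_hat = 0.18    # reported sample proportion
z = 1.96        # reliability coefficient for 95% confidence

se = sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
margin = z * se
lower, upper = p_hat - margin, p_hat + margin

print(f"SE = {se:.4f}, interval = ({lower:.3f}, {upper:.3f})")
```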
A random sample of size $n_1$ drawn from population 1 with mean $\mu_1$ and variance $\sigma_1^2$.
A random sample of size $n_2$ drawn from population 2 with mean $\mu_2$ and variance $\sigma_2^2$.
Estimating the Difference between Two Means
                           Mean    Variance    Standard Deviation
Population 1               µ1      σ1²         σ1
Population 2               µ2      σ2²         σ2

                           Sample size    Mean    Variance    Standard Deviation
Sample from Population 1   n1             x̄1      s1²         s1
Sample from Population 2   n2             x̄2      s2²         s2
Estimating the Difference between Two Means
• We compare the two averages by making inferences about µ1-µ2, the difference in
the two population averages.
• The best estimate of µ1-µ2 is the difference in the two sample means,
$\bar{x}_1 - \bar{x}_2$.
1. The mean of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$, the difference in the population means.
2. The standard deviation of $\bar{x}_1 - \bar{x}_2$ is $SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$.
3. If the sample sizes (both $n_1$ and $n_2$) are large, the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normal, and the standard deviation can be estimated as $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$.
Estimating µ1-µ2
For large samples, point estimates and their margin of error as well as confidence
intervals are based on the standard normal (z) distribution.
Point estimate for µ1-µ2: $\bar{x}_1 - \bar{x}_2$
Margin of error: $\pm z_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Assumption: Both $n_1 \ge 30$ and $n_2 \ge 30$
Confidence interval for µ1-µ2: $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Estimating µ1-µ2
Example: Compare the average daily intake of dairy products of men and women
using a 95% confidence interval.
Solution
$(\bar{x}_1 - \bar{x}_2) \pm 1.96 \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (756 - 762) \pm 1.96 \sqrt{\frac{35^2}{50} + \frac{30^2}{50}} = -6 \pm 12.78$
or $-18.78 < \mu_1 - \mu_2 < 6.78$.
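The two-sample calculation for the dairy-intake comparison can be sketched as below. The numbers are those in the example; the variable names are illustrative.

```python
from math import sqrt

# Men (sample 1) and women (sample 2): size, mean, standard deviation.
n1, x1_bar, s1 = 50, 756, 35
n2, x2_bar, s2 = 50, 762, 30

diff = x1_bar - x2_bar
se = sqrt(s1**2 / n1 + s2**2 / n2)   # estimated standard error of the difference
margin = 1.96 * se                   # 95% confidence
lower, upper = diff - margin, diff + margin

contains_zero = lower < 0 < upper
print(f"{diff} +/- {margin:.2f} -> ({lower:.2f}, {upper:.2f}), contains 0: {contains_zero}")
```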
Estimating µ1-µ2
• Could you conclude, based on this confidence interval, that there is a difference in the
average daily intake of dairy products for men and women?
• The confidence interval contains the value µ1-µ2 = 0. Therefore, it is possible that µ1
= µ2. You would not want to conclude that there is a difference in average daily intake
of dairy products for men and women.
Estimating the Difference between Two Proportions
A random sample of size $n_1$ drawn from binomial population 1 with parameter $p_1$.
A random sample of size $n_2$ drawn from binomial population 2 with parameter $p_2$.
Estimating the Difference between Two Proportions
                           Sample size    Sample proportion    Variance       Standard Deviation
Sample from Population 1   n1             p̂1 = x1/n1           p̂1q̂1/n1        √(p̂1q̂1/n1)
Sample from Population 2   n2             p̂2 = x2/n2           p̂2q̂2/n2        √(p̂2q̂2/n2)
Estimating the Difference between Two Proportions
• We compare the two proportions by making inferences about p1-p2, the difference in the
two population proportions.
• The best estimate of p1-p2 is the difference in the two sample proportions,
$\hat{p}_1 - \hat{p}_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2}$
Estimating the Difference between Two Proportions
1. The mean of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$, the difference in the population proportions.
2. The standard deviation of $\hat{p}_1 - \hat{p}_2$ is $SE = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$.
3. If the sample sizes (both $n_1$ and $n_2$) are large, the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal, and the standard deviation can be estimated as $SE = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$.
Estimating the Difference between Two Proportions
For large samples, point estimates and their margin of error as well as confidence intervals
are based on the standard normal (z) distribution.
Point estimate for p1-p2: $\hat{p}_1 - \hat{p}_2$
Margin of error: $\pm 1.96 \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$
Confidence interval for p1-p2: $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$
Assumption: both $n_1$ and $n_2$ are sufficiently large so that $-1 < (\hat{p}_1 - \hat{p}_2) \pm 2\,SE < 1$
Estimating the Difference between Two Proportions
Example: Compare the proportion of male and female college students who said that they had played on a soccer team during their K-12 years using a 99% confidence interval.
Solution
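$(\hat{p}_1 - \hat{p}_2) \pm 2.58 \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} = \left(\frac{65}{80} - \frac{39}{70}\right) \pm 2.58 \sqrt{\frac{.81(.19)}{80} + \frac{.56(.44)}{70}} = .25 \pm .19$
or $.06 < p_1 - p_2 < .44$.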
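The same comparison can be computed in code. This sketch carries the unrounded proportions 65/80 and 39/70 through the calculation, so the endpoints differ slightly from the slide values, which round p̂1 and p̂2 to two decimals first. The conclusion is unchanged.

```python
from math import sqrt

x1, n1 = 65, 80     # males who played soccer, males sampled
x2, n2 = 39, 70     # females who played soccer, females sampled

p1_hat, p2_hat = x1 / n1, x2 / n2
diff = p1_hat - p2_hat
se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
margin = 2.58 * se                      # 99% confidence
lower, upper = diff - margin, diff + margin

print(f"{diff:.3f} +/- {margin:.3f} -> ({lower:.3f}, {upper:.3f})")
```

Because the whole interval lies above 0, the data suggest p1 > p2.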
Estimating the Difference between Two Proportions
• Could you conclude, based on this confidence interval, that there is a difference in the
proportion of male and female college students who said that they had played on a
soccer team during their K-12 years?
• The confidence interval does not contain the value p1-p2 = 0. Therefore, it is not
likely that p1 = p2. You would conclude that there is a difference in the proportions for
male and female college students.
To estimate one of four population parameters when the sample sizes are large, use the
following point estimators with the appropriate margins of error.
Large Sample Interval Estimators (Summary)
To estimate one of four population parameters when the sample sizes are large, use the
following interval estimators.
Large Sample Estimation (Summary)
1. All values in the interval are possible values for the unknown population
parameter.
2. Any values outside the interval are unlikely to be the value of the unknown
parameter.
3. To compare two population means or proportions, look for the value 0 in the
confidence interval. If 0 is in the interval, it is possible that the two population
means or proportions are equal, and you should not declare a difference. If 0 is not
in the interval, it is unlikely that the two means or proportions are equal, and you
can confidently declare a difference.
Part II
• When the sample size is small, the large-sample estimation procedures are
not appropriate.
• Point estimators remain the same, but the CLT no longer applies.
• Student (the pen name of W. S. Gosset) derived the correct sampling distribution for
the mean of samples of size less than 30 and called it the ‘t distribution’.
• The mathematical details are complicated, but, it turns out that we perform exactly the
same calculations as before, with the one change that the t distribution instead of the
normal distribution is used.
Properties of the t distribution
•There are actually many t distributions, one for each degree of freedom
•As the sample size increases, the t distribution approaches the normal distribution.
•The t-scores can be negative or positive, but the probabilities are always positive.
Degrees of Freedom
• A degree of freedom occurs for every data value which is allowed to vary once a
statistic has been fixed.
• For a single mean, there are n-1 degrees of freedom.
• This value will change depending on the statistic being used.
Properties of the t distribution
Deciding between z and t
• Which one to use depends on the size of the sample, whether the population is normally
distributed or not, and whether or not the population variance is known.
• There are various flowcharts and decision keys that can be used to help decide.
Assumptions
• Student’s t result applies only to the mean of a sample drawn from a population that is
normally distributed with some mean μ and finite standard deviation σ.
• This is in contrast to the CLT for large samples that required no such assumption
about normality.
• The t-distribution itself is bell shaped and symmetric – just like the normal distribution
but is ‘flatter’.
• The rule used is: for a sample of size n – use the t distribution with degrees of
freedom = n−1
Example: if the sample size is 15, then use a t distribution with degrees of freedom 15
− 1=14.
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
$\frac{\bar{x} - \mu}{s / \sqrt{n}}$ is not normal
Student’s t Distribution
Fortunately, this statistic does have a sampling distribution that is well known to
statisticians, called the Student’s t distribution, with n-1 degrees of freedom.
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
We can use this distribution to create estimation procedures for the population mean µ.
Using the t-Table
• The table gives the values of t that cut off certain critical areas in the right tail of the t
distribution.
• Use index df and the appropriate tail area a to find $t_a$, the value of t with area a to its
right.
For example, with df = 9: $t_{.025} = 2.262$
Small Sample Confidence Interval for Population Mean
Small-sample $(1 - \alpha)100\%$ confidence interval of the population mean is
$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
where $t_{\alpha/2}$ is the value of t that cuts off area $\alpha/2$ in the right tail of a t-distribution with $df = n - 1$.
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{s^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
with $s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
and $t_{\alpha/2}$ is the critical value of t with degrees of freedom $n_1 + n_2 - 2$.
Remember the three assumptions:
1. Original populations normal
2. Samples random and independent
3. Equal population variances
Estimating the Difference between Two Means
•Let 87 octane fuel be the first group and 90 octane fuel the second group,
•so we have $n_1 = n_2 = 6$ and $\bar{x}_1 = 28.45$, $s_1 = 1.228$, $\bar{x}_2 = 30.73$, $s_2 = 1.392$.
•Degrees of freedom (df) = $n_1 + n_2 - 2 = 10$.
•The critical value of t is 2.228.
$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{5(1.508) + 5(1.938)}{10} = 1.723$
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{s^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
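The pooled-variance interval for the fuel example can be evaluated numerically. This sketch takes the critical value t = 2.228 (df = 10) directly from the slide rather than computing it, since the Python standard library has no t-distribution quantile function.

```python
from math import sqrt

# 87 octane (group 1) and 90 octane (group 2): size, mean, standard deviation.
n1, x1_bar, s1 = 6, 28.45, 1.228
n2, x2_bar, s2 = 6, 30.73, 1.392
t_crit = 2.228                       # t_{.025} with df = n1 + n2 - 2 = 10, from the slide

df = n1 + n2 - 2
s2_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled variance estimate
margin = t_crit * sqrt(s2_pooled * (1 / n1 + 1 / n2))

diff = x1_bar - x2_bar
lower, upper = diff - margin, diff + margin

print(f"pooled s^2 = {s2_pooled:.3f}")
print(f"{diff:.2f} +/- {margin:.2f} -> ({lower:.2f}, {upper:.2f})")
```

The interval lies entirely below 0, which is consistent with a lower mean for the 87 octane group under the three assumptions listed earlier.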
Solution
n1 = 18
s1 = 9.3
x̄1 = 4.7
n2 = 10
s2 = 11.5
x̄2 = 8.8
Estimating the Difference between Two Means
The 95 percent confidence interval for the difference between population means is as follows: