5-6. Sampling Error and Confidence Interval
Haomin Yang
School of Public Health
Fujian Medical University
Content
Sampling error and Sampling distribution
Central limit theorem
Standard error
t distribution
Point estimation
Confidence Interval estimation
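Normal distribution
$X \sim N(\mu, \sigma^2)$
$Z = \dfrac{X - \mu}{\sigma}, \quad Z \sim N(0, 1)$
Critical value
$z_{\alpha/2}$: the value with $P(|Z| > z_{\alpha/2}) = \alpha$ (e.g., $z_{0.025} = 1.96$)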
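As a small illustration (not part of the slides), Python's standard-library `statistics.NormalDist` can standardize a value and return two-sided critical values $z_{\alpha/2}$. The helper names and the example values (μ = 170, σ = 6, x = 182) are hypothetical.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Z = (X - mu) / sigma maps N(mu, sigma^2) onto N(0, 1)."""
    return (x - mu) / sigma

def z_critical(alpha):
    """Two-sided critical value z_{alpha/2}, so that P(|Z| > z) = alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)

# Hypothetical example: the value 182 from N(170, 6^2)
z = standardize(182, mu=170, sigma=6)
print(z)                          # 2.0
print(1 - NormalDist().cdf(z))    # upper-tail area beyond z, about 0.0228

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(z_critical(alpha), 3))   # 1.645, 1.96, 2.576
```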
The relationship between the population and sample
(Figure: sampling draws a sample, a subset of the population, from the population, the complete set; inference goes from the sample back to the population.)
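Sampling error
Standard error of the mean: $\sigma_{\bar X} = \dfrac{\sigma}{\sqrt{n}}$, estimated from a sample by $S_{\bar X} = \dfrac{S}{\sqrt{n}}$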
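A quick simulation sketch of the sampling distribution of the mean, assuming a hypothetical skewed population (exponential with mean 1 and standard deviation 1) and samples of size n = 36. The spread of the simulated sample means should be close to σ/√n = 1/6, and, in line with the central limit theorem, a histogram of them would look roughly normal even though the population is skewed. The helper names are illustrative only.

```python
import random
import statistics

def standard_error(sample):
    """Estimated standard error of the mean, S_xbar = S / sqrt(n)."""
    return statistics.stdev(sample) / len(sample) ** 0.5

def sample_means(draw, n, reps, rng):
    """Means of `reps` independent samples of size n; draw(rng) yields one value."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

rng = random.Random(2024)
# Hypothetical skewed population: exponential with mean 1 and SD 1
means = sample_means(lambda r: r.expovariate(1.0), n=36, reps=5000, rng=rng)

print(statistics.fmean(means))   # close to the population mean, 1
print(statistics.stdev(means))   # close to sigma / sqrt(n) = 1/6, about 0.167

one_sample = [rng.expovariate(1.0) for _ in range(36)]
print(standard_error(one_sample))  # single-sample estimate of the same quantity
```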
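t distribution
$Z = \dfrac{\bar X - \mu}{\sigma_{\bar X}}$ follows $N(0, 1)$, but when σ is unknown and is replaced by the sample standard deviation S, the statistic
$t = \dfrac{\bar X - \mu}{S_{\bar X}}, \quad S_{\bar X} = \dfrac{S}{\sqrt{n}}$
no longer follows the standard normal distribution (Z ╳ t).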
The t distribution was developed by William Sealy Gosset, who published under the pseudonym “Student”.
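t distribution
The statistic t follows a t distribution with ν degrees of freedom:
$t = \dfrac{\bar X - \mu}{S_{\bar X}}, \quad t \sim t(\nu), \quad \nu = n - 1$
(Figure: t distribution curves compared with the standard normal distribution.)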
The properties of t distribution:
Area under t distribution curve
$t_{\alpha,\nu}$ is the critical value that cuts off a one-sided tail area α:
$P(t \le -t_{\alpha,\nu}) = \alpha, \quad P(t \ge t_{\alpha,\nu}) = \alpha$
When ν = 9 and the one-sided probability α = 0.05, what is $t_{\alpha,\nu}$?
Answer: $t_{0.05,9} = 1.833$
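The table value can be reproduced numerically. This stdlib-only sketch integrates the t density with Simpson's rule and inverts the tail area by bisection; `t_upper_tail` and `t_critical` are hypothetical helpers (with SciPy installed, `scipy.stats.t.ppf(1 - alpha, df)` gives the same critical value directly). The loop also shows that the two-sided critical values $t_{0.025,\nu}$ shrink toward $z_{0.025} = 1.96$ as ν grows.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, where T ~ t(df).

    Integrates the t density over [0, t] with Simpson's rule and uses
    symmetry: P(T >= t) = 0.5 - P(0 <= T <= t).
    """
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    c = math.exp(log_c)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps                 # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

def t_critical(alpha, df):
    """One-sided critical value t_{alpha,df}: P(T >= t_{alpha,df}) = alpha, 0 < alpha < 0.5."""
    lo, hi = 0.0, 100.0           # assumes the critical value is below 100
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid              # too much area above mid: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.05, 9), 3))   # 1.833, matching the t table

# Two-sided 95% critical values t_{0.025,v} approach 1.96 as v grows
for df in (1, 5, 10, 30, 100, 1000):
    print(df, round(t_critical(0.025, df), 3))
```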
A t-curve never:
Statistical description: numerical variables, nominal variables
Statistical inference: parameter estimation, hypothesis testing
Parameter
Given a model, the parameters are the numbers that
specify the actual distribution (for example, μ and σ² for a normal distribution).
In the real world you often don’t know the “true” parameters,
but you do get to observe data. Next, we explore how we
can use data to estimate the model parameters.
Statistics as Estimators
Point estimation
A point estimate (for example, the sample mean x̄ for μ) is our single best
“determination” of the parameter.
However, it does not express the uncertainty in
the estimate, because point estimation does not
take sampling error into account.
Finding a best estimator
1. Method of Moments
2. Maximum Likelihood
3. Bayesian
Law of large numbers: let M1, M2, ... be independent random variables with a
common distribution that has mean µ. Then the sample
mean (M1 + ... + Mn)/n converges to µ as the number of
observations n increases.
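A minimal simulation of the law of large numbers, assuming a hypothetical population of fair-die rolls (μ = 3.5). The sample mean, used as a point estimate of μ, tends to get closer to 3.5 as n grows, although in any single run the error does not shrink monotonically.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical population: rolls of a fair six-sided die, mean mu = 3.5
rolls = [rng.randint(1, 6) for _ in range(100_000)]

for n in (10, 100, 1_000, 10_000, 100_000):
    xbar = statistics.fmean(rolls[:n])   # point estimate of mu from the first n rolls
    print(f"n = {n:>6}: sample mean = {xbar:.4f}, error = {xbar - 3.5:+.4f}")
```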
Interval estimation
Confidence interval (CI)
A 1 − α confidence interval is denoted by (A, B).
A CI is an open interval.
In practice, 1 − α is usually 90%, 95%,
or 99%.
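Methods to calculate the confidence interval
Method of normal distribution
$\bar X \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$
$Z = \dfrac{\bar X - \mu}{\sigma_{\bar X}}, \quad Z \sim N(0, 1)$
(Figure: standard normal curve with central area 1 − α between $-z_{\alpha/2}$ and $z_{\alpha/2}$ and area α/2 in each tail.)
$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$
$P\left(-z_{\alpha/2} < \dfrac{\bar X - \mu}{\sigma_{\bar X}} < z_{\alpha/2}\right) = 1 - \alpha$
$P(-z_{\alpha/2}\,\sigma_{\bar X} < \bar X - \mu < z_{\alpha/2}\,\sigma_{\bar X}) = 1 - \alpha$
$P(\bar X - z_{\alpha/2}\,\sigma_{\bar X} < \mu < \bar X + z_{\alpha/2}\,\sigma_{\bar X}) = 1 - \alpha$
CI: $(\bar X - z_{\alpha/2}\,\sigma_{\bar X},\ \bar X + z_{\alpha/2}\,\sigma_{\bar X})$, or, with σ estimated by S when n is large, $(\bar X - z_{\alpha/2}\,S_{\bar X},\ \bar X + z_{\alpha/2}\,S_{\bar X})$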
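The normal-method interval translates directly into a few lines of Python. This is a sketch: `z_confidence_interval` is a hypothetical name, and the inputs (x̄ = 120, σ = 10, n = 100) are made up for illustration.

```python
from statistics import NormalDist

def z_confidence_interval(xbar, sd, n, conf=0.95):
    """(xbar - z_{alpha/2} * sd/sqrt(n), xbar + z_{alpha/2} * sd/sqrt(n)).

    Pass the known sigma, or the sample S when n is large.
    """
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    se = sd / n ** 0.5                        # standard error of the mean
    return xbar - z * se, xbar + z * se

# Hypothetical numbers: xbar = 120, sigma = 10, n = 100
low, high = z_confidence_interval(120, 10, 100)
print(round(low, 2), round(high, 2))   # 118.04 121.96
```

Raising `conf` to 0.99 widens the interval, since $z_{0.005} = 2.576 > 1.96$.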
Example
Answer
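Method of t distribution
$\bar X \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$
$t = \dfrac{\bar X - \mu}{S_{\bar X}}, \quad t \sim t(n - 1)$
(Figure: t curve with central area 1 − α between $-t_{\alpha/2,\nu}$ and $t_{\alpha/2,\nu}$ and area α/2 in each tail.)
$P(-t_{\alpha/2,\nu} < t < t_{\alpha/2,\nu}) = 1 - \alpha$
$P\left(-t_{\alpha/2,\nu} < \dfrac{\bar X - \mu}{S_{\bar X}} < t_{\alpha/2,\nu}\right) = 1 - \alpha$
$P(-t_{\alpha/2,\nu}\,S_{\bar X} < \bar X - \mu < t_{\alpha/2,\nu}\,S_{\bar X}) = 1 - \alpha$
$P(\bar X - t_{\alpha/2,\nu}\,S_{\bar X} < \mu < \bar X + t_{\alpha/2,\nu}\,S_{\bar X}) = 1 - \alpha$
CI: $(\bar X - t_{\alpha/2,\nu}\,S_{\bar X},\ \bar X + t_{\alpha/2,\nu}\,S_{\bar X})$, with ν = n − 1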
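A sketch of the t-method interval, assuming the critical value $t_{\alpha/2,\nu}$ is looked up from a t table as in the slides; the function name and the five data values are hypothetical.

```python
import statistics

def t_confidence_interval(sample, t_crit):
    """(xbar - t * S_xbar, xbar + t * S_xbar) for a list of observations.

    t_crit is t_{alpha/2, n-1}, looked up from a t table for the chosen
    confidence level and nu = n - 1 degrees of freedom.
    """
    n = len(sample)
    xbar = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5   # S_xbar = S / sqrt(n)
    return xbar - t_crit * se, xbar + t_crit * se

# Hypothetical sample of n = 5 values; nu = 4, t_{0.025,4} = 2.776 from a t table
data = [4.1, 3.8, 4.5, 4.0, 4.6]
low, high = t_confidence_interval(data, t_crit=2.776)
print(round(low, 3), round(high, 3))   # 3.779 4.621
```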
Example
Answer
How to interpret the 95% CI
Most statisticians describe confidence intervals
this way: 0.95 is the probability that the limits calculated
from a random sample will include the population
value. Over repeated sampling, about 95% of the calculated
confidence intervals will contain the population
mean, μ.
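The repeated-sampling interpretation can be checked by simulation. This sketch assumes a hypothetical population N(50, 8²) with σ treated as known and samples of size n = 20; it builds a 95% normal-method interval from each sample and reports the fraction of intervals that contain μ. The `coverage` helper is illustrative only.

```python
import random
import statistics
from statistics import NormalDist

def coverage(mu, sigma, n, reps, conf=0.95, seed=1):
    """Fraction of z-based intervals xbar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps

print(coverage(mu=50, sigma=8, n=20, reps=10_000))   # close to 0.95
```

Each individual interval either contains μ or it does not; the 95% describes the long-run proportion.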
Exercise
What is meant by the term “90% confident” when
constructing a confidence interval for a mean?
A. If we took repeated samples, approximately 90% of the
samples would produce the same confidence interval.
B. If we took repeated samples, approximately 90% of the
confidence intervals calculated from those samples would
contain the sample mean.
C. If we took repeated samples, approximately 90% of the
confidence intervals calculated from those samples would
contain the true value of the population mean.
D. If we took repeated samples, the sample mean would equal
the population mean in approximately 90% of the samples.
Exercise
Among various ethnic groups, the standard deviation of heights is
known to be approximately three inches. We wish to construct a
95% confidence interval for the mean height of male Swedes.
Forty-eight male Swedes are surveyed. The sample mean is 71
inches. The sample standard deviation is 2.8 inches.
1. a. x̄ = ________
   b. σ = ________
   c. n = ________
2. In words, define the random variables X and X̄.
3. Which distribution should you use for this problem? Explain your
choice.
4. Construct a 95% confidence interval for the population mean
height of male Swedes.
   a. State the confidence interval.
5. What will happen to the level of confidence obtained if 1,000
male Swedes are surveyed instead of 48? Why?
Answers
1. a. x̄ = 71
   b. σ = 3
   c. n = 48
2. X is the height of a Swedish male, and X̄ is the mean height
from a sample of 48 Swedish males.
3. Normal. We know the population standard deviation,
and the sample size is greater than 30.
4. a. CI: (70.15, 71.85)
5. At the same 95% confidence level, the confidence interval will become
narrower, because the sample size increased. When all other factors remain
unchanged, a larger sample size decreases the variability of the sample mean
(its standard error σ/√n is smaller), so we do not need as wide an interval
to capture the true population mean.
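The interval (70.15, 71.85) and the effect of surveying 1,000 men instead of 48 can be checked numerically. This sketch uses only the numbers given in the exercise and `statistics.NormalDist` for $z_{0.025}$.

```python
from statistics import NormalDist

xbar, sigma, n = 71, 3, 48          # values from the exercise
z = NormalDist().inv_cdf(0.975)     # z_{0.025}, about 1.96
ebm = z * sigma / n ** 0.5          # error bound for the mean
print(round(xbar - ebm, 2), round(xbar + ebm, 2))   # 70.15 71.85

# Same 95% level with n = 1000: the error bound, and so the interval, shrinks
ebm_1000 = z * sigma / 1000 ** 0.5
print(round(ebm, 3), round(ebm_1000, 3))            # 0.849 0.186
```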
Exercise
A pharmaceutical company makes tranquilizers. It is assumed that the
distribution for the length of time they last is approximately normal.
Researchers in a hospital used the drug on a random sample of nine patients.
The effective period of the tranquilizer for each patient (in hours) was as
follows: 2.7; 2.8; 3.0; 2.3; 2.3; 2.2; 2.8; 2.1; and 2.4.
1. a. x̄ = __________
   b. sx = __________
   c. n = __________
   d. n − 1 = __________
2. Define the random variable X in words.
3. Define the random variable X̄ in words.
4. Which distribution should you use for this problem? Explain your choice.
5. Construct a 95% confidence interval for the population mean length of
time.
   a. State the confidence interval.
   b. Sketch the graph.
   c. Calculate the error bound.
6. What does it mean to be “95% confident” in this problem?
Answers
1. a. x̄ = 2.51
   b. sx = 0.318
   c. n = 9
   d. n − 1 = 8
2. X is the effective length of time for a tranquilizer.
3. X̄ is the mean effective length of time of tranquilizers from a sample
of nine patients.
4. We need to use Student’s t distribution, because we do not
know the population standard deviation.
5. a. CI: (2.27, 2.76)
   b. Check student's solution.
   c. $EBM = t_{0.025,8}\,\dfrac{s_x}{\sqrt{n}} = 2.306 \times \dfrac{0.318}{3} \approx 0.24$
6. If we were to sample many groups of nine patients and compute a 95% confidence
interval from each, about 95% of those intervals would contain the true population
mean length of time.
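A check of the tranquilizer answers using the nine observations from the exercise; the critical value $t_{0.025,8} = 2.306$ is taken from a t table rather than computed.

```python
import statistics

hours = [2.7, 2.8, 3.0, 2.3, 2.3, 2.2, 2.8, 2.1, 2.4]   # data from the exercise
n = len(hours)
xbar = statistics.fmean(hours)
s = statistics.stdev(hours)       # sample SD, divisor n - 1
t_crit = 2.306                    # t_{0.025,8} from a t table (nu = n - 1 = 8)
ebm = t_crit * s / n ** 0.5       # error bound for the mean

print(round(xbar, 2), round(s, 3))                  # 2.51 0.318
print(round(ebm, 3))                                # 0.244
print(round(xbar - ebm, 2), round(xbar + ebm, 2))   # 2.27 2.76
```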
Exercise
The average height of young adult males has a
normal distribution with standard deviation of 2.5
inches. You want to estimate the mean height of
students at your college or university to within one
inch with 93% confidence. How many male students
must you measure?
EBM= error bound for the mean
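One way to work the last exercise, assuming the normal method applies because σ is known: from $EBM = z_{\alpha/2}\,\sigma/\sqrt{n}$, solve $n = (z_{\alpha/2}\,\sigma / EBM)^2$ and round up. The helper name `sample_size_for_mean` is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, ebm, conf):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= ebm, i.e. ceil((z*sigma/EBM)^2)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    return math.ceil((z * sigma / ebm) ** 2)

# sigma = 2.5 inches, EBM = 1 inch, 93% confidence (z_{0.035} is about 1.812)
print(sample_size_for_mean(sigma=2.5, ebm=1.0, conf=0.93))   # 21
```

Rounding up rather than to the nearest integer keeps the error bound at or below one inch.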