Statistical Inference and Hypothesis Testing
6.1-1
Mean
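RANDOM SAMPLE, n = 64
The sample mean $\bar{x}$ is called a point estimate of the population mean $\mu$.
Standard error of the sample mean: $\sigma/\sqrt{n} = 6/\sqrt{64} = 0.75$ months.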
6.1-2
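[Figure: sampling distribution of $\bar{X}$, with standard error $\sigma/\sqrt{n} = 0.75$ mos and the interval from $\bar{x} - d$ to $\bar{x} + d$ marked.]
Find the margin of error $d$ such that $P(\mu - d \le \bar{X} \le \mu + d) = 0.95$. We will refer back to this equation later.
Standardizing, $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$. Therefore,
$P\left(\dfrac{-d}{\sigma/\sqrt{n}} \le Z \le \dfrac{+d}{\sigma/\sqrt{n}}\right) = 0.95$.
[Figure: standard normal density of Z, centered at 0, with central area 0.95 and area 0.025 in each tail beyond $\pm z_{.025} = \pm 1.960$.]
Hence,
$\dfrac{+d}{\sigma/\sqrt{n}} = z_{.025} \;\Rightarrow\; d = z_{.025}\,\dfrac{\sigma}{\sqrt{n}} = (1.96)(0.75 \text{ months}) = 1.47$ months.
This is the 95% margin of error.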
6.1-3
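95% Confidence Interval for $\mu$: $\left(\bar{x} - z_{.025}\,\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_{.025}\,\dfrac{\sigma}{\sqrt{n}}\right)$
Example: $(\bar{x} - 1.47,\ \bar{x} + 1.47)$

Sample    Mean $\bar{x}$    95% CI
1         26.0 mos          (24.53, 27.47)
2         27.0 mos          (25.53, 28.47)
etc.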
Interpretation: Based on Sample 1, the true mean $\mu$ of the new treatment population is
between 24.53 and 27.47 months, with 95% confidence. Based on Sample 2, the true
mean $\mu$ is between 25.53 and 28.47 months, with 95% confidence, etc. The ratio
(# CIs that contain $\mu$) / (Total # CIs) $\to$ 0.95, as more and more samples are chosen, i.e., the probability
that a random CI contains the population mean $\mu$ is equal to 0.95. In practice however, the
common (but technically incorrect) interpretation is that the probability that a fixed CI
(such as the ones found above) contains $\mu$ is 95%. In reality, the parameter $\mu$ is
constant; once calculated, a single fixed confidence interval either contains it or not.
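The long-run reading of "95% confidence" can be checked by simulation. The following is a minimal sketch, not part of the original notes: it assumes a hypothetical true mean of 25 months (in practice $\mu$ is unknown), uses the notes' values $\sigma = 6$ and n = 64, and the function name `coverage` is made up for illustration.

```python
import math
import random

def coverage(mu=25.0, sigma=6.0, n=64, z=1.960, trials=10_000, seed=0):
    """Fraction of simulated 95% CIs (sigma known) that contain the true mean.

    mu is an assumed "true" value, chosen only so the simulation can check
    each interval; the other defaults follow the survival-time example.
    """
    rng = random.Random(seed)
    d = z * sigma / math.sqrt(n)          # margin of error, 1.47 months here
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and compute its mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - d <= mu <= xbar + d:    # does this CI contain mu?
            hits += 1
    return hits / trials

rate = coverage()
print(f"fraction of CIs containing mu: {rate:.3f}")
```

The printed fraction should settle near 0.95 as the number of trials grows, while any single interval either contains $\mu$ or does not.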
6.1-4
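For any significance level $\alpha$ (and hence confidence level $1 - \alpha$), we similarly define the
$(1 - \alpha) \times 100\%$ Confidence Interval for $\mu$:
$\left(\bar{x} - z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}\right)$
where $z_{\alpha/2}$ is the critical value that divides the area under the standard normal
distribution N(0, 1) as shown. Recall that for $\alpha$ = 0.10, 0.05, 0.01 (i.e., $1 - \alpha$ = 0.90, 0.95,
0.99), the corresponding critical values are $z_{.05}$ = 1.645, $z_{.025}$ = 1.960, and $z_{.005}$ = 2.576,
respectively. The quantity $z_{\alpha/2}\,\sigma/\sqrt{n}$ is the two-sided margin of error.
[Figure: N(0, 1) density with central area $1 - \alpha$ between $-z_{\alpha/2}$ and $+z_{\alpha/2}$, and area $\alpha/2$ in each tail; the corresponding interval is centered at $\bar{x}$.]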
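As a hedged illustration of the general formula, the sketch below computes the interval for Sample 1 at the 95% level. It relies on `statistics.NormalDist` from the Python standard library for the critical value; the function name `z_confidence_interval` is an assumption made for this example.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """(1 - alpha) x 100% CI for the mean when sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # critical value z_{alpha/2}
    d = z * sigma / math.sqrt(n)                  # margin of error
    return xbar - d, xbar + d

# Sample 1 of the notes: x-bar = 26.0 months, sigma = 6, n = 64.
low, high = z_confidence_interval(26.0, 6.0, 64)
print(f"95% CI for Sample 1: ({low:.2f}, {high:.2f})")
```

Passing a different `confidence` value changes only the critical value, so the interval widens as the confidence level rises.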
Exercise: Why is it not realistic to ask for a 100% confidence interval (i.e., certainty)?
Exercise: Calculate the 90% and 99% confidence intervals for Samples 1 and 2 in the
preceding example, and compare with the 95% confidence intervals.
6.1-5
Objective 2a: Hypothesis Testing ~ How does this new treatment compare with a
control treatment? In particular, how can we use a confidence interval to decide this?
STANDARD POPULATION = Cancer patients on standard drug treatment
Random Variable: X = Survival time (months)
Suppose X is known to have a mean of 25 months.
[Figure: Population Distribution of X, drawn as a bell curve centered at 25 with $\sigma = 6$.]
How does this compare with the mean $\mu$ of the study population?
Technical Notes: Although this is drawn as a bell curve, we don't really care how the
variable X is distributed in this population, as long as it is normally distributed in the study
population of interest, an assumption we will learn how to check later, from the data.
Likewise, we don't really care about the value of the standard deviation $\sigma$ of this
population, only of the study population. However, in the absence of other information, it is
sometimes assumed (not altogether unreasonably) that the two are at least comparable in
value. And if this is indeed a standard treatment, it has presumably been around for a while
and given to many patients, during which time much data has been collected, and thus very
accurate parameter estimates have been calculated. Nevertheless, for the vast majority of
studies, it is still relatively uncommon that this is the case; in practice, very little if any
information is known about any population standard deviation $\sigma$. In lieu of this value
then, $\sigma$ is usually well-estimated by the sample standard deviation s with little change,
if the sample is sufficiently large, but small samples present special problems. These
issues will be dealt with later; for now, we will simply assume that the value of $\sigma$ is known.
6.1-6
Hence, let us consider the situation where, before any sampling is done, it is actually the
experimenter's intention to see if there is a statistically significant difference between the
unknown mean survival time $\mu$ of the new treatment population, and the known mean
survival time of 25 months of the standard treatment population. (See page 1-1!) That is,
the sample data will be used to determine whether or not to reject the formal
Null Hypothesis $H_0$: $\mu = 25$
versus the
Two-sided Alternative $H_A$: $\mu \ne 25$
Either $\mu < 25$ or $\mu > 25$
[Figure: Null Distribution $\bar{X} \sim N(25, 0.75)$, with the 95% confidence intervals from Sample 1, (24.53, 27.47) about $\bar{x} = 26$, and from Sample 2, (25.53, 28.47) about $\bar{x} = 27$. The first interval contains 25; the second does not.]
In general:
Null Hypothesis $H_0$: $\mu = \mu_0$
versus the
Two-sided Alternative $H_A$: $\mu \ne \mu_0$
Either $\mu < \mu_0$ or $\mu > \mu_0$
Decision Rule: If the $(1 - \alpha) \times 100\%$ confidence interval contains the value $\mu_0$, then the
difference is not statistically significant; accept the null hypothesis at the $\alpha$ level of
significance. If it does not contain the value $\mu_0$, then the difference is statistically
significant; reject the null hypothesis in favor of the alternative at the $\alpha$ significance level.
6.1-7
Objective 2b: Calculate which sample mean values $\bar{x}$ will lead to rejecting or not
rejecting (i.e., accepting or retaining) the null hypothesis.
From the equation above, and the calculated margin of error = 1.47, we have
$P(\mu - 1.47 \le \bar{X} \le \mu + 1.47) = 0.95$.
Now, IF the null hypothesis $H_0$: $\mu = 25$ is indeed true, then substituting this value gives
$P(23.53 \le \bar{X} \le 26.47) = 0.95$.
Interpretation: If the mean survival time $\bar{x}$ of a random sample of n = 64 patients is
between 23.53 and 26.47, then the difference from 25 is not statistically significant (at
the $\alpha$ = .05 significance level), and we retain the null hypothesis. However, if $\bar{x}$ is either
less than 23.53, or greater than 26.47, then the difference from 25 will be statistically
significant (at $\alpha$ = .05), and we reject the null hypothesis in favor of the alternative.
More specifically, if the former, then the result is significantly lower than the standard
treatment average (i.e., new treatment is detrimental!); if the latter, then the result is
significantly higher than the standard treatment average (i.e., new treatment is
beneficial).
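[Figure: Null Distribution $\bar{X} \sim N(25, 0.75)$. The Acceptance Region for $H_0$ runs from 23.53 to 26.47 (area 0.95), with a Rejection Region of area 0.025 in each tail. Sample 1 ($\bar{x} = 26$) falls inside the acceptance region; Sample 2 ($\bar{x} = 27$) falls in the upper rejection region.]
In general, the $(1 - \alpha) \times 100\%$ Acceptance Region for $H_0$: $\mu = \mu_0$ is
$\left(\mu_0 - z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}},\ \mu_0 + z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}\right)$.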
Decision Rule: If the $(1 - \alpha) \times 100\%$ acceptance region contains the value $\bar{x}$, then the
difference is not statistically significant; accept the null hypothesis at the $\alpha$ significance
level. If it does not contain the value $\bar{x}$, then the difference is statistically significant; reject
the null hypothesis in favor of the alternative at the $\alpha$ significance level.
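The acceptance-region decision rule is easy to mechanize. The sketch below is only an illustration under the notes' assumptions (normal population, $\sigma$ known); the helper names `acceptance_region` and `retain_null` are hypothetical.

```python
import math
from statistics import NormalDist

def acceptance_region(mu0, sigma, n, alpha=0.05):
    """(1 - alpha) x 100% acceptance region for H0: mu = mu0, sigma known."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    d = z * sigma / math.sqrt(n)
    return mu0 - d, mu0 + d

def retain_null(xbar, mu0, sigma, n, alpha=0.05):
    """True if x-bar falls inside the acceptance region (H0 is retained)."""
    low, high = acceptance_region(mu0, sigma, n, alpha)
    return low <= xbar <= high

region = acceptance_region(25.0, 6.0, 64)
print(f"acceptance region: ({region[0]:.2f}, {region[1]:.2f})")
print("Sample 1 (x-bar = 26): retain H0?", retain_null(26.0, 25.0, 6.0, 64))
print("Sample 2 (x-bar = 27): retain H0?", retain_null(27.0, 25.0, 6.0, 64))
```

This reproduces the region (23.53, 26.47): Sample 1 is retained and Sample 2 is rejected, the same conclusions as the confidence-interval rule.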
6.1-8
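Confidence Level: under $\mu = \mu_0$, $P(\text{Accept } H_0 \mid H_0 \text{ true}) = 1 - \alpha$.
Type I Error: rejecting $H_0$: $\mu = \mu_0$ when it is true; $P(\text{Reject } H_0 \mid H_0 \text{ true}) = \alpha$.
[Figure: Null Distribution $\bar{X} \sim N(\mu_0, \sigma/\sqrt{n})$, with central area $1 - \alpha$ over the Acceptance Region for $H_0$ and area $\alpha/2$ over each Rejection Region.]
Likewise, under $\mu = \mu_1$:
Type II Error: accepting $H_0$: $\mu = \mu_0$ when $H_A$: $\mu = \mu_1$ is true; $P(\text{Accept } H_0 \mid H_0 \text{ false}) = \beta$.
[Figure: Alternative Distribution $\bar{X} \sim N(\mu_1, \sigma/\sqrt{n})$, with area $\beta$ lying over the Acceptance Region for $H_0$.]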
6.1-9
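Objective 2c: How probable is my experimental result, if the null hypothesis is true?
Consider a sample mean value $\bar{x}$. Again assuming that the null hypothesis $H_0$: $\mu = \mu_0$ is
indeed true, calculate the p-value of the sample = the probability that any random sample
mean is this far away or farther, in the direction of the alternative hypothesis. That is, how
significant is the decision about $H_0$, at level $\alpha$?
Test Statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$
[Figure: Null Distribution centered at $\mu = 25$ with acceptance region (23.53, 26.47), central area 0.95 and 0.025 in each tail. For Sample 1, the area beyond $\bar{x} = 26$ is 0.0912 in each direction; for Sample 2, the area beyond $\bar{x} = 27$ is 0.0038 in each direction.]
Sample 1 ($\bar{x} = 26$):
p-value $= 2\,P(\bar{X} \ge 26) = 2\,P\left(Z \ge \dfrac{26 - 25}{0.75}\right) = 2\,P(Z \ge 1.333) = 2 \times 0.0912 = 0.1824 > 0.05 = \alpha$
Sample 2 ($\bar{x} = 27$):
p-value $= 2\,P(\bar{X} \ge 27) = 2\,P\left(Z \ge \dfrac{27 - 25}{0.75}\right) = 2\,P(Z \ge 2.667) = 2 \times 0.0038 = 0.0076 < 0.05 = \alpha$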
Decision Rule: If the p-value of the sample is greater than the significance level $\alpha$, then the
difference is not statistically significant; accept the null hypothesis at this level. If the
p-value is less than $\alpha$, then the difference is statistically significant; reject the null
hypothesis in favor of the alternative at this level.
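The two-sided p-values for Samples 1 and 2 can be reproduced with a few lines of Python. This is a sketch under the notes' assumptions ($\sigma = 6$ known, n = 64, $\mu_0 = 25$); the function name `two_sided_p_value` is chosen here for illustration.

```python
import math
from statistics import NormalDist

def two_sided_p_value(xbar, mu0, sigma, n):
    """Two-sided p-value for H0: mu = mu0 when sigma is known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))      # test statistic
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))  # area in both tails

alpha = 0.05
for label, xbar in [("Sample 1", 26.0), ("Sample 2", 27.0)]:
    p = two_sided_p_value(xbar, 25.0, 6.0, 64)
    decision = "reject H0" if p < alpha else "retain H0"
    print(f"{label}: p-value = {p:.4f} -> {decision}")
```

Because the code does not round the tail area to four decimals before doubling, the Sample 2 value prints as about 0.0077 rather than the table-based 0.0076.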
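Guide to statistical significance of p-values for $\alpha$ = .05:

Decision     p-value                 Strength of significance
Reject H0    $0 \le p \le .001$      extremely strong
Reject H0    $p \approx .005$        strong
Reject H0    $p \approx .01$         moderate
Reject H0    $p \approx .05$         borderline
Accept H0    $.10 \le p \le 1$       not significant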
6.1-10
Summary of findings: Even though the data from both samples suggest a generally
longer mean survival time among the new treatment population over the standard
treatment population, the formal conclusions and interpretations are different. Based on
Sample 1 patients ($\bar{x}$ = 26), the difference between the mean survival time $\mu$ of the study
population, and the mean survival time of 25 months of the standard population, is not
statistically significant, and may in fact simply be due to random chance. Based on
Sample 2 patients ($\bar{x}$ = 27) however, the difference between the mean survival time $\mu$ of the
study population, and the mean survival time of 25 months of the standard population, is indeed
statistically significant, on the longer side. Here, the increased survival times serve as
empirical evidence of a genuine, beneficial treatment effect of the new drug.
Comment: For the sake of argument, suppose that a third sample of patients is selected,
and to the experimenter's surprise, the sample mean survival time is calculated to be only
$\bar{x}$ = 23 months. Note that the p-value of this sample is the same as Sample 2, with $\bar{x}$ = 27
months, namely, 0.0076 < 0.05 = $\alpha$. Therefore, as far as inference is concerned, the
formal conclusion is the same, namely, reject $H_0$: $\mu$ = 25 months. However, the practical
interpretation is very different! While we do have statistical significance as before, these
patients survived for a considerably shorter time than the standard average, i.e., the treatment had an
unexpected effect of decreasing survival times, rather than increasing them. (This kind of
unanticipated result is more common than you might think, especially with investigational
drugs, which is one reason for formal hypothesis testing, before drawing a conclusion.)
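[Figure: scale of significance levels centered at $\alpha$ = .05. If p-value < $\alpha$, then reject $H_0$; significance! A higher $\alpha$ makes $H_0$ easier to reject (less conservative); a lower $\alpha$ makes $H_0$ harder to reject (more conservative).]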
6.1-11
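Modification: Consider now the (unlikely?) situation where the experimenter knows that
the new drug will not result in a mean survival time $\mu$ that is significantly less than 25
months, and would specifically like to determine if there is a statistically significant
increase. That is, he/she formulates the following one-sided null hypothesis to be rejected,
and complementary alternative:
Null Hypothesis $H_0$: $\mu \le 25$
versus the
Right-tailed Alternative $H_A$: $\mu > 25$
In this case, the acceptance region for $H_0$ consists of sample mean values $\bar{x}$ that are less
than the null-value of $\mu_0$ = 25, plus the one-sided margin of error
$= z_{\alpha}\,\dfrac{\sigma}{\sqrt{n}} = z_{.05}\,\dfrac{6}{\sqrt{64}} = (1.645)(0.75) = 1.234$, hence $\bar{x} < 26.234$. Note that $\alpha$ replaces $\alpha/2$ here!
[Figure: Null Distribution centered at $\mu$ = 25, with area 0.95 to the left of 26.234 and area 0.05 to the right. Sample 1 ($\bar{x}$ = 26) lies below 26.234, with area 0.0912 to its right; Sample 2 ($\bar{x}$ = 27) lies above 26.234, with area 0.0038 to its right.]
Sample 1: p-value $= P(\bar{X} \ge 26) = P(Z \ge 1.333) = 0.0912 > 0.05 = \alpha$
Sample 2: p-value $= P(\bar{X} \ge 27) = P(Z \ge 2.667) = 0.0038 < 0.05 = \alpha$ (fairly strong rejection)
Note that these one-sided p-values are exactly half of their corresponding two-sided
p-values found above, potentially making the null hypothesis easier to reject. However,
there are subtleties that arise in one-sided tests that do not arise in two-sided tests.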
6.1-12
Consider again the third sample of patients, whose sample mean is unexpectedly calculated
to be only $\bar{x}$ = 23 months. Unlike the previous two samples, this evidence is in strong
agreement with the null hypothesis $H_0$: $\mu \le 25$ that the mean survival time is 25 months
or less. This is confirmed by the p-value of the sample, whose definition (recall above) is
the probability that any random sample mean is this far away or farther, in the direction
of the alternative hypothesis which, in this case, is the right-sided $H_A$: $\mu > 25$. Hence,
p-value $= P(\bar{X} \ge 23) = P(Z \ge -2.667) = 1 - 0.0038 = 0.9962 \gg 0.05 = \alpha$
which, as just observed informally, indicates a strong acceptance of the null hypothesis.
[Figure: Null Distribution centered at $\mu$ = 25, with area 0.95 to the left of 26.234 and 0.05 to the right; the area to the right of $\bar{x}$ = 23 is 0.9962.]
Exercise: What is the one-sided p-value if the sample mean x = 24 mos? Conclusions?
A word of caution: One-sided tests are less conservative than two-sided tests, and should be
used sparingly, especially when it is a priori unknown if the mean response $\mu$ is likely to be
significantly larger or smaller than the null-value $\mu_0$, e.g., testing the effect of a new drug. They are
more appropriate when it can be clearly assumed from the circumstances that the conclusion
would only be of practical significance if $\mu$ is either higher or lower (but not both) than some
tolerance or threshold level $\mu_0$, e.g., toxicity testing, where only higher levels are of concern.
SUMMARY: To test any null hypothesis for one mean $\mu$, via the p-value of a sample...
Step I: Draw a picture of a bell curve, centered at the null value $\mu_0$.
Step II: Calculate your sample mean $\bar{x}$, and plot it on the horizontal $\bar{X}$ axis.
Step III: From $\bar{x}$, find the area(s) in the direction(s) of $H_A$ (<, >, or both tails), by first
transforming $\bar{x}$ to a z-score, and using the z-table. This is your p-value. SEE NEXT PAGE!
Step IV: Compare p with the significance level $\alpha$. If $p < \alpha$, reject $H_0$. If $p > \alpha$, retain $H_0$.
Step V: Interpret your conclusion in the context of the given situation!
6.1-13
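Test statistic: z-score $= \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, where the denominator $\sigma/\sqrt{n}$ is the standard error.
[Figure: for a one-sided alternative, the p-value is the area under N(0, 1) beyond the z-score, in the left tail or in the right tail (illustrated).]
Example: Test $H_0$: $\mu \le 10$ ppb (safe) versus $H_A$: $\mu > 10$ ppb
(unsafe), at $\alpha$ = .05. Assume $N(\mu, \sigma)$, with $\sigma$ = 1.6 ppb. A sample of n = 64 readings that average to
$\bar{x}$ = 10.1 ppb would have a z-score = 0.1 / 0.2 = 0.5, which corresponds to a p-value = 1 − 0.69146 = 0.30854
> .05, hence not significant; toxicity has not been formally shown. (Unsafe levels are $\bar{x} \ge 10.33$ ppb. Why?)
$H_A$: $\mu \ne \mu_0$ (2-sided)
[Figure: for the two-sided alternative, the p-value is split into area p/2 in each tail of N(0, 1), beyond the z-score and its mirror image.]
STEP 3.
If the p-value is less than $\alpha$ (= .05, usually), then REJECT NULL HYPOTHESIS.
EXPERIMENTAL RESULT IS STATISTICALLY SIGNIFICANT AT THIS LEVEL!
If the p-value is greater than $\alpha$ (= .05, usually), then RETAIN NULL HYPOTHESIS.
EXPERIMENTAL RESULT IS NOT STATISTICALLY SIGNIFICANT AT THIS LEVEL!
STEP 4. IMPORTANT - Interpret results in context. (Note: For many, this is the hardest step of all!)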
6.1-14
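Test Statistic: z-score $= \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, with $\mu_0$ taken from $H_0$.
Here "table entry" is the z-table value for the z-score, i.e., the area to its left.

Alternative                  p-value
$H_A$: $\mu < \mu_0$         table entry
$H_A$: $\mu > \mu_0$         1 − table entry
$H_A$: $\mu \ne \mu_0$       sign of z-score? negative: 2 × (table entry); positive: 2 × (1 − table entry)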
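The z-table lookup rules can be written as a single function. The sketch below is an illustration rather than the notes' own procedure: `NormalDist().cdf` plays the role of the "table entry", and the function name and the labels "less", "greater" and "two-sided" are conventions assumed for this example.

```python
import math
from statistics import NormalDist

def z_test_p_value(xbar, mu0, sigma, n, alternative="two-sided"):
    """p-value of a one-sample z-test, following the lookup rules above."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    entry = NormalDist().cdf(z)            # "table entry": area left of z
    if alternative == "less":              # HA: mu < mu0
        return entry
    if alternative == "greater":           # HA: mu > mu0
        return 1.0 - entry
    if alternative == "two-sided":         # HA: mu != mu0
        return 2.0 * entry if z < 0 else 2.0 * (1.0 - entry)
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")

# Toxicity example: H0: mu <= 10 ppb vs HA: mu > 10 ppb, sigma = 1.6, n = 64.
p_tox = z_test_p_value(10.1, 10.0, 1.6, 64, alternative="greater")
print(f"toxicity example p-value: {p_tox:.5f}")

# Third survival sample (x-bar = 23) against the right-tailed alternative.
p_third = z_test_p_value(23.0, 25.0, 6.0, 64, alternative="greater")
print(f"third sample, right-tailed p-value: {p_third:.4f}")
```

The two printed values should agree with the hand calculations in the notes (0.30854 and 0.9962).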
6.1-15
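[Figure: Null Distribution $\bar{X} \sim N(25, 0.75)$, with the .95 Acceptance Region for $H_0$: $\mu$ = 25 running from 23.53 (= 25 − 1.47) to 26.47 (= 25 + 1.47) and a .025 Rejection Region in each tail, shown alongside the Alternative Distribution $\bar{X} \sim N(28, 0.75)$, whose axis is marked at 28 − 0.75 z, 28, and 28 + 0.75 z. A standard normal Z axis centered at 0 is drawn beneath.]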
6.1-16
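General Formulation:
Procurement of drug samples for testing purposes, or patient recruitment for clinical trials,
can be extremely time-consuming and expensive. How to determine the minimum sample
size n required to reject the null hypothesis $H_0$: $\mu = \mu_0$, in favor of an alternative value
$H_A$: $\mu = \mu_1$, with a desired power $1 - \beta$, at a specified significance level $\alpha$? (And
conversely, how to determine the power $1 - \beta$ for a given sample size n, as above?)

             H0 true                                       H0 false
Reject H0    Type I error, probability = $\alpha$          probability = $1 - \beta$
             (significance level)                          (power)
Accept H0    probability = $1 - \alpha$                    Type II error, probability = $\beta$
             (confidence level)                            (1 − power)

That is, $P(\text{Accept } H_0 \mid H_0 \text{ true}) = 1 - \alpha$ and $P(\text{Reject } H_0 \mid H_0 \text{ false}) = 1 - \beta$.
[Figure: Null Distribution $\bar{X} \sim N(\mu_0, \sigma/\sqrt{n})$ and Alternative Distribution $\bar{X} \sim N(\mu_1, \sigma/\sqrt{n})$. The null curve has area $\alpha/2$ in each tail, beyond $\mu_0 - z_{\alpha/2}\,\sigma/\sqrt{n}$ and $\mu_0 + z_{\alpha/2}\,\sigma/\sqrt{n}$; the alternative curve has area $\beta$ to the left of $\mu_1 - z_{\beta}\,\sigma/\sqrt{n}$.]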
6.1-17
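Setting the two cutoff values equal (for $\mu_1 > \mu_0$),
$\mu_1 - z_{\beta}\,\dfrac{\sigma}{\sqrt{n}} = \mu_0 + z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$,
and solving for n gives
$n = \left(\dfrac{z_{\alpha/2} + z_{\beta}}{\Delta}\right)^2$, where $\Delta = \dfrac{|\mu_1 - \mu_0|}{\sigma}$.
Note: Remember that, as we defined it, $z_{\beta}$ is always $\ge 0$, and has area $\beta$ to its right.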
Comments:
This formula corresponds to a two-sided hypothesis test. For a one-sided test, simply
replace $\alpha/2$ by $\alpha$. Recall that if $\alpha$ = .05, then $z_{.025}$ = 1.960 and $z_{.05}$ = 1.645.
If $\sigma$ is not known, then it can be replaced above by s, the sample standard deviation,
provided the resulting sample size turns out to be $n \ge 30$, to be consistent with CLT.
However, if the result is n < 30, then add 2 to compensate. [Modified from: Lachin,
J. M. (1981), Introduction to sample size determination and power analysis for
clinical trials. Controlled Clinical Trials, 2(2), 93-113.]
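The sample size formula $n = ((z_{\alpha/2} + z_{\beta})/\Delta)^2$ can be evaluated directly. The following sketch assumes $\sigma$ is known (so the "add 2" adjustment for small n is not applied) and rounds up to the next whole patient; the function name `min_sample_size` and its parameters are illustrative choices.

```python
import math
from statistics import NormalDist

def min_sample_size(mu0, mu1, sigma, alpha=0.05, power=0.90, two_sided=True):
    """Smallest n satisfying n >= ((z_crit + z_beta) / Delta)^2."""
    std_normal = NormalDist()
    tail = alpha / 2.0 if two_sided else alpha
    z_crit = std_normal.inv_cdf(1.0 - tail)    # z_{alpha/2} (or z_alpha)
    z_beta = std_normal.inv_cdf(power)         # has area beta to its right
    delta = abs(mu1 - mu0) / sigma             # standardized effect size
    return math.ceil(((z_crit + z_beta) / delta) ** 2)

# Survival-time study: mu0 = 25, sigma = 6 months.
print(min_sample_size(25, 28, 6, power=0.90))  # Delta = 0.5
print(min_sample_size(25, 28, 6, power=0.95))  # Delta = 0.5
print(min_sample_size(25, 27, 6, power=0.95))  # Delta = 0.333
```

Smaller effect sizes and higher power both drive the required n upward, since $\Delta$ sits in the denominator and $z_{\beta}$ in the numerator.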
What affects sample size, and how? With all other values being equal...
[Figure annotation: It is easier to distinguish these two distributions from each other...]
6.1-18
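Example: With $\mu_0$ = 25, $\mu_1$ = 28, and $\sigma$ = 6 months, $\Delta = \dfrac{|28 - 25|}{6} = 0.5$, and at $\alpha$ = .05 (two-sided):
For 90% power ($z_{.10}$ = 1.282): $n = \left(\dfrac{1.960 + 1.282}{0.5}\right)^2 = 42.04$, so $n \ge 43$ patients.
For 95% power ($z_{.05}$ = 1.645): $n = \left(\dfrac{1.960 + 1.645}{0.5}\right)^2 = 51.98$, so $n \ge 52$ patients.
If instead $\mu_1$ = 27, then $\Delta = \dfrac{|27 - 25|}{6} = 0.333$, and for 95% power
$n = \left(\dfrac{1.960 + 1.645}{0.333}\right)^2 = 116.96$, so $n \ge 117$ patients.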
Minimum sample size n (two-sided test, $\alpha$ = .05), by effect size $\Delta$ (rows) and power $1 - \beta$ (columns):

$\Delta$    80%     85%     90%     95%     99%
0.1         785     898     1051    1300    1838
0.125       503     575     673     832     1176
0.15        349     400     467     578     817
0.175       257     294     344     425     600
0.2         197     225     263     325     460
0.25        126     144     169     208     294
0.3         88      100     117     145     205
0.35        65      74      86      107     150
0.4         50      57      66      82      115
0.45        39      45      52      65      91
0.5         32      36      43      52      74
0.6         24      27      30      37      52
0.7         19      21      24      29      38
0.8         15      17      19      23      31
0.9         12      14      15      19      25
1.0         10      11      13      15      21
6.1-19
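[Figure: power curves showing power $1 - \beta$ against effect size $\Delta = |\mu_1 - \mu_0| / \sigma$, from $\Delta$ = 0.0 to $\Delta$ = 1.0, for sample sizes n = 10, 20, 30, and 100, with $\alpha/2$ = .025.]
Question: Why is power not equal to 0 if $\Delta$ = 0?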
6.1-20
Comments:
Due to time and/or budget constraints for example, a study may end before optimal
sample size is reached. Given the current value of n, the corresponding power can then
be determined by the graph above, or computed exactly via the following formula.
Power $= 1 - \beta = P(Z \le -z_{\alpha/2} + \Delta\sqrt{n})$, where the quantity $-z_{\alpha/2} + \Delta\sqrt{n}$ is a z-score.
Example: As in the original study, let $\alpha$ = .05, $\Delta = \dfrac{|28 - 25|}{6} = 0.5$, and n = 64. Then the
z-score $= -1.96 + 0.5\sqrt{64} = 2.04$, so power $= 1 - \beta = P(Z \le 2.04) = 0.9793$, or 98%.
The probability of committing a Type 2 error $= \beta = 0.0207$, or 2%. See page 6.1-15.
[Figure: N(0, 1) density with area 0.9793 to the left of Z = 2.04 and area 0.0207 to the right.]
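The power formula is a one-line computation once the critical value is available. The sketch below is an illustration under the same assumptions as the worked example (two-sided $\alpha$, $\sigma$ known); the function name `power_of_z_test` is hypothetical.

```python
import math
from statistics import NormalDist

def power_of_z_test(delta, n, alpha=0.05):
    """Power = P(Z <= -z_{alpha/2} + Delta * sqrt(n)) for a two-sided z-test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return std_normal.cdf(-z_crit + delta * math.sqrt(n))

# Original study: Delta = |28 - 25| / 6 = 0.5, n = 64.
pw = power_of_z_test(0.5, 64)
print(f"power = {pw:.4f}, Type II error probability = {1 - pw:.4f}")
```

Calling the function with a smaller n shows how quickly power erodes when a study stops early.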
Exercise: How much power exists if the sample size is n = 25? 16? 9? 4? 1?
Generally, a minimum of 80% power is acceptable for reporting purposes.
Note: Larger sample size $\Rightarrow$ longer study time $\Rightarrow$ longer wait for results. In clinical
trials and other medical studies, formal protocols exist for early study termination.
Also, to achieve a target sample size, practical issues must be considered (e.g., parking,
meals, bed space, ...). Moreover, one may have to recruit many more individuals due to
eventual censoring (e.g., move-aways, noncompliance, ...) or death. All of these are cost issues.
Research proposals must have power and sample size calculations in their methods
section, in order to receive institutional approval, support, and eventual journal
publication.
6.1-21
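If $X \sim N(\mu, \sigma)$, then $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$, for any n.
Test statistic
$\sigma$ known: $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$. Recall: s.e. $= \sigma/\sqrt{n}$.
$\sigma$ unknown, $n \ge 30$: $Z = \dfrac{\bar{X} - \mu}{s/\sqrt{n}} \sim N(0, 1)$ approximately, with estimated standard error $\widehat{\text{s.e.}} = s/\sqrt{n}$.
$\sigma$ unknown, $n < 30$: $T = \dfrac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}$. Note: Can use for $n \ge 30$ as well.
Density functions:
N(0, 1): $\varphi(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$
$t_{n-1}$: $f_n(t) = \dfrac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{(n-1)\pi}\;\Gamma\left(\frac{n-1}{2}\right)} \left(1 + \dfrac{t^2}{n-1}\right)^{-n/2}$
[Figure: the $t_{n-1}$ density has heavier tails than N(0, 1), so its critical values lie farther out: $-t_{n-1,\,\alpha/2} < -z_{\alpha/2}$ and $z_{\alpha/2} < t_{n-1,\,\alpha/2}$.]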
6.1-22
Example: Again recall that in our study, the variable X = survival time was assumed to
be normally distributed among cancer patients, with $\sigma$ = 6 months. The null hypothesis
$H_0$: $\mu$ = 25 months was tested with a random sample of n = 64 patients; a sample mean of
$\bar{x}$ = 27.0 months was shown to be statistically significant (p = .0076), i.e., sufficient
evidence to reject the null hypothesis, suggesting a genuine difference, at the $\alpha$ = .05 level.
Now suppose that $\sigma$ is unknown and, like $\mu$, must also be estimated from sample data.
Further suppose that the sample size is small, say n = 25 patients, with which to test the
same null hypothesis $H_0$: $\mu$ = 25, versus the two-sided alternative $H_A$: $\mu \ne 25$, at the $\alpha$ = .05
significance level. Imagine that a sample mean $\bar{x}$ = 27.4 months, and a sample standard
deviation s = 6.25 months, are obtained. The greater mean survival time appears promising.
However, the estimated standard error is
$\widehat{\text{s.e.}} = \dfrac{s}{\sqrt{n}} = \dfrac{6.25 \text{ mos}}{\sqrt{25}} = 1.25$ months,
and the critical value is $t_{24,\,.025} = 2.064$. Therefore,
Margin of Error = (2.064)(1.25 mos) = 2.58 months.
[Figure: $t_{24}$ density with central area 0.95 between −2.064 and 2.064, and area .025 in each tail.]
So
95% Confidence Interval for $\mu$ = (27.4 − 2.58, 27.4 + 2.58) = (24.82, 29.98) months,
which does contain the null value $\mu$ = 25 $\Rightarrow$ Accept $H_0$. No significance shown!
95% Acceptance Region for $H_0$ = (25 − 2.58, 25 + 2.58) = (22.42, 27.58) months,
which does contain the sample mean $\bar{x}$ = 27.4 $\Rightarrow$ Accept $H_0$. No significance shown!
p-value $= 2\,P(\bar{X} \ge 27.4) = 2\,P\left(T_{24} \ge \dfrac{27.4 - 25}{1.25}\right) = 2\,P(T_{24} \ge 1.92) = 2(0.0334) = 0.0668$,
which is greater than $\alpha$ = .05 $\Rightarrow$ Accept $H_0$. No significance shown!
[Figure: null distribution centered at $\mu$ = 25 with acceptance region (22.42, 27.58), area .025 in each tail, and area 0.0334 beyond $\bar{x}$ = 27.4.]
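The small-sample calculation can be checked without a t-table. The sketch below is an illustration only: because the Python standard library has no t distribution, it integrates the $t$ density numerically with Simpson's rule, which is an assumption of this example rather than the notes' method, and all function names are hypothetical.

```python
import math

def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps                          # steps must be even
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3             # half the area lies right of 0

def one_sample_t(xbar, mu0, s, n):
    """Estimated standard error, t-score and two-sided p-value."""
    se = s / math.sqrt(n)
    t_score = (xbar - mu0) / se
    p_two_sided = 2 * t_upper_tail(abs(t_score), n - 1)
    return se, t_score, p_two_sided

se, t_score, p = one_sample_t(27.4, 25.0, 6.25, 25)
print(f"s.e. = {se:.2f}, t-score = {t_score:.2f}, p-value = {p:.4f}")
```

With the example's inputs the p-value comes out close to the table-based 0.0668, above $\alpha$ = .05, so $H_0$ is retained.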
6.1-23
[Color scale for voxel p-values: gray, green, yellow, orange, red, with red assigned to p < .001.]
Solution: X = Cerebral Blood Flow (CBF) is normally distributed, $H_0$: $\mu$ = 0.5 ml/g/min
n = 6, $\bar{x}$ = 0.767 ml/g/min, s = 0.082 ml/g/min
As the population standard deviation $\sigma$ is unknown, and the sample size n is small, the t-test
on df = 6 − 1 = 5 degrees of freedom is appropriate.
Using standard error estimate $\widehat{\text{s.e.}} = \dfrac{s}{\sqrt{n}} = \dfrac{0.082 \text{ ml/g/min}}{\sqrt{6}} = 0.03348$ ml/g/min yields
p-value $= 2\,P(\bar{X} \ge 0.767) = 2\,P\left(T_5 \ge \dfrac{0.767 - 0.5}{0.03348}\right) = 2\,P(T_5 \ge 7.976) = 2(.00025) = .0005$
This is strongly significant at any reasonable level $\alpha$. According to the scale, the voxel
should be assigned the color RED.
6.1-24
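Test Statistic: t-score $= \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, with $\mu_0$ taken from $H_0$.
Here "table entry" is the t-table tail area for |t-score|.

ALTERNATIVE HYPOTHESIS       t-score negative                   t-score positive
$H_A$: $\mu < \mu_0$         table entry for |t-score|          1 − table entry
$H_A$: $\mu \ne \mu_0$       2 × table entry for |t-score|      2 × table entry
$H_A$: $\mu > \mu_0$         1 − table entry for |t-score|      table entry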
6.1-25
Checks for normality ~ Is the ongoing assumption that the sample data come
from a normally-distributed population reasonable?
Quantiles: As we have already seen, 68% within 1 s.d. of mean, 95% within 2
s.d. of mean, 99.7% within 3 s.d. of mean, etc. Other percentiles can also be
checked informally, or more formally via...
Normal Scores Plot: The graph of the quantiles of the n ordered (low-to-high)
observations, versus the n known z-scores that divide the total area under N(0, 1)
equally (representing an ideal sample from the standard normal distribution), should
resemble a straight line. Highly skewed data would generate a curved plot. Also
known as a probability plot or Q-Q plot (for Quantile-Quantile), this is a popular
method.
Example: Suppose n = 24 ages (years). Calculate the .04 quantiles of the sample, and
plot them against the 24 known (i.e., theoretical) .04 quantiles of the standard
normal distribution (below).
[Figure: N(0, 1) density divided into 25 strips.] Each of these 25 areas represents .04 of the total.
{-1.750, -1.405, -1.175, -0.994, -0.842, -0.706, -0.583, -0.468, -0.358, -0.253, -0.151, -0.050,
+0.050, +0.151, +0.253, +0.358, +0.468, +0.583, +0.706, +0.842, +0.994, +1.175, +1.405, +1.750}
6.1-26
Sample 1:
{6, 8, 11, 12, 15, 17, 20, 20, 21, 23, 24, 24, 26, 28, 29, 30, 31, 32, 34, 37, 40, 41, 42, 45}
The Q-Q plot of this sample (see first graph, below) reveals a more or less linear trend
between the quantiles, which indicates that it is not unreasonable to assume that these
data are derived from a population whose ages are indeed normally distributed.
Sample 2:
{6, 6, 8, 8, 9, 10, 10, 10, 11, 11, 13, 16, 20, 21, 23, 28, 31, 32, 36, 38, 40, 44, 47, 50}
The Q-Q plot of this sample (see second graph, below) reveals an obvious deviation
from normality. Moreover, the general concave up nonlinearity seems to suggest that
the data are positively skewed (i.e., skewed to the right), and in fact, this is the case.
Applying statistical tests that rely on the normality assumption to data sets that are not
so distributed could very well yield erroneous results!
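The normal scores and a rough numerical summary of each Q-Q plot can be computed as follows. This is a sketch, not the notes' procedure: it uses the correlation between the ordered data and the theoretical quantiles as an informal measure of linearity (an assumption of this example), and the helper names are hypothetical.

```python
import math
from statistics import NormalDist

def normal_scores(n):
    """The n z-scores that cut the area under N(0, 1) into n + 1 equal parts."""
    return [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

sample1 = [6, 8, 11, 12, 15, 17, 20, 20, 21, 23, 24, 24,
           26, 28, 29, 30, 31, 32, 34, 37, 40, 41, 42, 45]
sample2 = [6, 6, 8, 8, 9, 10, 10, 10, 11, 11, 13, 16,
           20, 21, 23, 28, 31, 32, 36, 38, 40, 44, 47, 50]

scores = normal_scores(24)                   # theoretical .04 quantiles
r1 = correlation(sorted(sample1), scores)    # Q-Q linearity, Sample 1
r2 = correlation(sorted(sample2), scores)    # Q-Q linearity, Sample 2
print(f"Sample 1: r = {r1:.3f}   Sample 2: r = {r2:.3f}")
```

A correlation closer to 1 corresponds to a straighter Q-Q plot; Sample 1 scores higher than the skewed Sample 2, in line with the visual assessment.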
Formal tests of normality include:
Anderson-Darling
Shapiro-Wilk
Lilliefors (a special case of Kolmogorov-Smirnov)
6.1-27
[Figure: frequency histograms illustrating the transformation Y = ln(X).]
Nonparametric Tests: Statistical tests (on the median, rather than the mean) that are
free of any assumptions on the underlying distribution of the population random
variable. Slightly less powerful than the corresponding parametric tests, tedious to
carry out by hand, but their generality makes them very useful, especially for small
samples where normality can be difficult to verify.
Sign Test (crude), Wilcoxon Signed Rank Test (preferred)
6.1-28
GENERAL SUMMARY
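Is random variable approximately normally distributed (or mildly skewed)?
  Yes: Is $\sigma$ known?
    Yes: Use Z-test (with $\sigma$):
         $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$
    No, or don't know: Is $n \ge 30$?
      Yes: Use Z-test with s in place of $\sigma$ (approximately):
           $Z = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim N(0, 1)$
      No: Use t-test (with $\hat{\sigma}$ = s):
           $T = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$
  No: Use a transformation, or a nonparametric test, e.g., Wilcoxon Signed Rank Test
CONTINUE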
6.1-29
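[Figure: p-value as the tail area beyond the Z- or $T_{df}$-score, for a 2-sided alternative $H_A$: $\mu \ne \mu_0$ and for a 1-sided, right-tailed alternative $H_A$: $\mu > \mu_0$.]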
6.1-30
6.1.2 Variance
Given:
Null Hypothesis $H_0$: $\sigma^2 = \sigma_0^2$ (constant value)
versus
Two-sided Alternative $H_A$: $\sigma^2 \ne \sigma_0^2$
Either $\sigma^2 < \sigma_0^2$ or $\sigma^2 > \sigma_0^2$
Calculate the sample variance $s^2$.
Test statistic: $X^2 = \dfrac{(n - 1)\,s^2}{\sigma_0^2} \sim \chi^2_{n-1}$
Sampling Distribution of $X^2$:
Chi-Squared Distribution, with $\nu = n - 1$ degrees of freedom, df = 1, 2, 3, ...
$f_{\nu}(x) = \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\; x^{\nu/2 - 1}\, e^{-x/2}$
[Figure: chi-squared density curves for $\nu$ = 1, 2, 3, 4, 5, 6, 7.]
Note that the chi-squared distribution is not symmetric, but skewed to the right. We will not
pursue the details for finding an acceptance region and confidence intervals for $\sigma^2$ here. But
this distribution will appear again, in the context of hypothesis testing for equal proportions.
6.1-31
6.1.3 Proportion
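POPULATION
Binary random variable
Y = 1, Success with probability $\pi$
Y = 0, Failure with probability $1 - \pi$
SAMPLE
Random Variable: X = # Successes ~ Bin(n, $\pi$)
Recall: Assuming $n \ge 30$, $n\pi \ge 15$, and $n(1 - \pi) \ge 15$,
$X \sim N\left(n\pi,\ \sqrt{n\pi(1 - \pi)}\right)$, approximately. (see §4.2)
Therefore, dividing by n,
$\hat{\pi} = \dfrac{X}{n} \sim N\left(\pi,\ \sqrt{\dfrac{\pi(1 - \pi)}{n}}\right)$, approximately.
Problem! The expression for the standard error involves the very parameter $\pi$ upon which
we are performing statistical inference. (This did not happen with inference on the mean $\mu$,
where the standard error is s.e. $= \sigma/\sqrt{n}$, which does not depend on $\mu$.)
[Figure: Illustration of the bell curves $N\left(\pi,\ \sqrt{\pi(1 - \pi)/n}\right)$ as $\pi$ ranges from 0 to 1, drawn at $\pi$ = 0.1, 0.3, 0.5, 0.7, 0.9. The standard error is largest (.05) at $\pi$ = 0.5 and shrinks (.049, .046, .04, .03) as $\pi$ moves toward 0 or 1.]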
6.1-32
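Example: Refer back to the coin toss example of section 1.1, where a random sample of
n = 100 independent trials is performed in order to acquire information about the
probability P(Heads) = $\pi$. Suppose that X = 64 Heads are obtained. Then the sample-based
point estimate of $\pi$ is calculated as $\hat{\pi}$ = X / n = 64/100 = 0.64. To improve this to
an interval estimate, we can compute the $(1 - \alpha) \times 100\%$ Confidence Interval for $\pi$:
$\left(\hat{\pi} - z_{\alpha/2}\sqrt{\dfrac{\hat{\pi}(1 - \hat{\pi})}{n}},\ \hat{\pi} + z_{\alpha/2}\sqrt{\dfrac{\hat{\pi}(1 - \hat{\pi})}{n}}\right)$
95% CI: $0.64 \pm 1.96\sqrt{\dfrac{(0.64)(0.36)}{100}} = 0.64 \pm 1.96\,(.048) = (0.546,\ 0.734)$
Null Hypothesis $H_0$: $\pi$ = 0.5
vs. Alternative Hypothesis $H_A$: $\pi \ne 0.5$
As the 95% CI does not contain the null-value $\pi$ = 0.5, $H_0$ can be rejected at the
$\alpha$ = .05 level, i.e., the coin is not fair.
$(1 - \alpha) \times 100\%$ Acceptance Region for $H_0$: $\pi = \pi_0$:
$\left(\pi_0 - z_{\alpha/2}\sqrt{\dfrac{\pi_0(1 - \pi_0)}{n}},\ \pi_0 + z_{\alpha/2}\sqrt{\dfrac{\pi_0(1 - \pi_0)}{n}}\right)$
95% Acceptance Region: $0.50 \pm 1.96\sqrt{\dfrac{(0.50)(0.50)}{100}} = 0.50 \pm 1.96\,(.050) = (0.402,\ 0.598)$, where s.e.$_0$ = .050.
As this region does not contain $\hat{\pi}$ = 0.64, $H_0$ is again rejected.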
6.1-33
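Test Statistic: $Z = \dfrac{\hat{\pi} - \pi_0}{\sqrt{\dfrac{\pi_0(1 - \pi_0)}{n}}} \sim N(0, 1)$
p-value $= 2\,P(\hat{\pi} \ge 0.64) = 2\,P\left(Z \ge \dfrac{0.64 - 0.50}{.050}\right) = 2\,P(Z \ge 2.8) = 2(0.0026) = 0.0052$
As p << $\alpha$ = .05, $H_0$ can be strongly rejected at this level, i.e., the coin is not fair.
[Figure: Null Distribution $\hat{\pi} \sim N(0.5, .05)$, with acceptance region (0.402, 0.598) of area 0.95 and area 0.025 in each tail; the area beyond $\hat{\pi}$ = 0.64 is 0.0026 in each direction. The 95% confidence interval (0.546, 0.734) about 0.64 is also marked.]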
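The coin-toss calculations can be reproduced in a few lines. The sketch below is illustrative: it applies the large-sample normal approximation without a continuity correction, and the function name `proportion_z_test` is an assumption of this example.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, pi0, alpha=0.05):
    """Point estimate, CI, z-score and two-sided p-value for one proportion."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    pi_hat = successes / n
    # The confidence interval uses the estimated standard error.
    se_hat = math.sqrt(pi_hat * (1.0 - pi_hat) / n)
    ci = (pi_hat - z_crit * se_hat, pi_hat + z_crit * se_hat)
    # The test statistic uses the standard error under H0: pi = pi0.
    se_null = math.sqrt(pi0 * (1.0 - pi0) / n)
    z = (pi_hat - pi0) / se_null
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return pi_hat, ci, z, p_value

pi_hat, ci, z, p_value = proportion_z_test(64, 100, 0.5)
print(f"estimate = {pi_hat:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```

Note the two different standard errors: the interval is built around the estimate, while the test is carried out under the null value.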
6.1-34
Comments:
A continuity correction factor of $\pm\,\dfrac{0.5}{n}$ may be added to the numerator of the Z test
statistic above, in accordance with the normal approximation to the binomial
distribution; see §4.2 of these Lecture Notes. (The n in the denominator is there
because we are here dealing with proportion of success $\hat{\pi}$ = X / n, rather than just
number of successes X.)
Power and sample size calculations are similar to those of inference for the mean, and
will not be pursued here.
IMPORTANT
See Appendix > Statistical Inference > General Parameters and FORMULA TABLES,
and Appendix > Statistical Inference > Means and Proportions, One and Two Samples.