Stat2602 Chapter5
Hypothesis
Null Hypothesis
Alternative Hypothesis
$H_0: \theta \in \Omega_0$
$H_1: \theta \in \Omega_1$
where $\Omega_0$ and $\Omega_1$ are disjoint subsets of the parameter space $\Omega$. However, for
non-parametric models, the hypotheses may not be specified in terms of the
parameters.
Simple Hypothesis
Composite Hypothesis
Stat2602 Probability and Statistics II Fall 2014-2015
Often, the null hypothesis has the form $H_0: \theta = c$ for some one-dimensional
parameter $\theta$ and some known constant c, and the alternative hypothesis has one
of three forms:
$H_1: \theta > c$, $H_1: \theta < c$, or $H_1: \theta \neq c$.
The first two alternative hypotheses are called one-sided alternatives, and the third
is called the two-sided alternative.
Example 5.1
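2. Statistical model: $X_1, X_2, \ldots, X_m \overset{iid}{\sim} N(\mu_x, \sigma_x^2)$, $Y_1, Y_2, \ldots, Y_m \overset{iid}{\sim} N(\mu_y, \sigma_y^2)$
$H_0: \sigma_x^2 = \sigma_y^2$ (composite hypothesis)
$H_1: \sigma_x^2 \neq \sigma_y^2$ (composite hypothesis, two-sided alternative)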
H 0 : p1 2 p2 composite hypothesis
H 1 : p1 2 p2 composite hypothesis, one-sided alternative
4. Statistical model: $X_1, X_2, \ldots, X_n \overset{iid}{\sim}$ some distribution
Non-parametric model.
Example 5.2
A chemical process has produced on the average 800 tons of chemicals per day.
Some engineers claim that production has declined recently due to the
depreciation of the machines. Should we accept this claim?
We may record a random sample of daily yields and estimate the mean daily yield $\mu$
by the sample mean $\bar{X}$. The basic strategy in hypothesis testing is to measure
how far this statistic is from a hypothesized value of the parameter $\mu$. If the
distance is “large”, we would argue that the hypothesized statement is inconsistent
with the data and we would be inclined to reject the hypothesis. (We could be
wrong, of course; rare events do happen!)
distance $= \bar{X} - 800$
Intuitively, if the sample mean is observed to be less than 800, then the
data seem to support the alternative hypothesis $H_1: \mu < 800$ and to oppose the null
hypothesis $H_0: \mu = 800$. However, before jumping to this conclusion, we must take
into account the variability of the observations, as the distance may simply
result from sampling error rather than from a departure of the truth from the null
hypothesis.
A decision rule such as “Reject $H_0$ if $\bar{X} < 793$” is called a test, and the sample statistic $\bar{X}$ used is called the test statistic.
Definition
Remark
Note that $\bar{X}$ is just a point estimator of $\mu$ and there would be some estimation
error. It may be possible that the process still yields an average of 800 tons per day but
the sample mean $\bar{X}$ is observed to be smaller than 793, thereby leading to an
incorrect conclusion by rejecting $H_0$. Hence we need to determine how
reliable the test procedure is, and this is measured by the probabilities of drawing
mistaken conclusions.
To assess the reliability of the test, there are two possible types of error to be
considered. The following table summarizes the possibilities of drawing correct or
incorrect conclusions.
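                   Accept $H_0$        Reject $H_0$
$H_0$ is true      Correct decision    Type I error
$H_0$ is false     Type II error       Correct decision
The probabilities of type I and type II errors are denoted by $\alpha$ and $\beta$ respectively.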
Example 5.3
A manufacturer of light bulbs has to decide whether the mean lifetime of the light
bulbs has been increased from 1200 hours to 1240 hours after implementing a new
production method. The hypotheses to be tested can be formulated as
$H_0: \mu = 1200$ vs $H_1: \mu = 1240$.
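$\alpha = P(\text{Reject } H_0 \mid H_0 \text{ true})$
$= P(\bar{X} > 1249 \mid \mu = 1200)$
$= 1 - \Phi\left(\frac{1249 - 1200}{300/\sqrt{100}}\right)$
$= 1 - \Phi(1.633) = 0.0513$
$\beta = P(\text{Accept } H_0 \mid H_1 \text{ true})$
$= P(\bar{X} \le 1249 \mid \mu = 1240)$
$= \Phi\left(\frac{1249 - 1240}{300/\sqrt{100}}\right)$
$= \Phi(0.3) = 0.6179$.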
As can be seen, such a test has only about a 5% chance of making a type I
error if $H_0$ is true, but about a 62% chance of making a type II error if $H_0$
is false. To reduce the type II error probability, we may use a smaller cut-off
in the test, so that it becomes harder to accept $H_0$.
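With the cut-off lowered to 1234, the type II error probability becomes
$\beta = \Phi\left(\frac{1234 - 1240}{300/\sqrt{100}}\right) = \Phi(-0.2) = 0.4207$
which is substantially reduced. However, the type I error probability would then
become larger:
$\alpha = 1 - \Phi\left(\frac{1234 - 1200}{300/\sqrt{100}}\right) = 1 - \Phi(1.13) = 0.1292$.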
As can be seen from this example, there is a trade-off between the two types of
error. Making $\alpha$ smaller will result in a larger $\beta$, and vice versa. Therefore in
designing a test we can only control one of them, and the convention is to
guarantee $\alpha$ at a desired low level and then try to reduce $\beta$ as much as we can
(i.e. a type I error is considered more serious than a type II error). That is why the
roles of $H_0$ and $H_1$ are not interchangeable.
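The trade-off in Example 5.3 can be checked numerically. The following is a minimal Python sketch, assuming (as the denominators in the example suggest) a population standard deviation of 300, a sample size of 100, and a rule of the form "reject H0 if the sample mean exceeds a cut-off". The function name `error_probabilities` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)

def error_probabilities(cutoff, mu0=1200.0, mu1=1240.0, sigma=300.0, n=100):
    """Return (alpha, beta) for the rule "reject H0 if sample mean > cutoff".

    sigma and n are assumptions read off the formulas in Example 5.3.
    """
    se = sigma / sqrt(n)  # standard error of the sample mean
    alpha = 1.0 - STD_NORMAL.cdf((cutoff - mu0) / se)  # P(reject H0 | mu = mu0)
    beta = STD_NORMAL.cdf((cutoff - mu1) / se)         # P(accept H0 | mu = mu1)
    return alpha, beta

for cutoff in (1249, 1234):
    alpha, beta = error_probabilities(cutoff)
    print(f"cut-off {cutoff}: alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Lowering the cut-off reduces beta and raises alpha. Small differences from the figures in the notes can arise because the notes round the z-value before using the normal table.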
Choice of H0 and H1
Example 5.4
Suppose we want to know if a man is guilty. There can be two different settings of
the hypotheses:
A: $H_0$: He is guilty. $H_1$: He is not guilty.
B: $H_0$: He is not guilty. $H_1$: He is guilty.
Since wrongly judging an innocent man to be guilty is considered a more serious
error than wrongly judging a guilty man to be innocent, the setting according to B
is more appropriate.
Example 5.5
Suppose we want to show that Brand A products are more popular than Brand B.
Then we may set the hypotheses as
If we find most people love A, then we can reject H 0 and establish our assertion
with great confidence.
Example 5.6
Suppose a standard medicine for a particular disease has cure rate $p_0 = 0.6$. A drug
company has developed a new medicine for the same disease. Before bringing the
new medicine to the market, the in-house statisticians of this company were asked
to show that the new medicine has a higher cure rate than the standard medicine
based on some clinical trial data. The appropriate setting of the hypotheses they
should use is
$H_0: p \le 0.6$ vs $H_1: p > 0.6$, where $p$ is the cure rate of the new medicine, because
(i) the type I error leads to the use of a worse medicine, which is a more serious
error than abandoning the use of a better medicine (type II error); and
(ii) the drug company wants to establish the assertion that their new medicine is
better than the standard medicine.
We may treat $n = 50$ randomly selected patients with the new medicine, observe the data $X$,
which is the number of patients cured by the new medicine, and then use the
sample cure rate $\hat{p} = X/n$ as the test statistic to construct the test:
“Reject $H_0$ if $\hat{p} > 0.65$”
The corresponding rejection region is $\{X \in \{0, 1, 2, \ldots, 50\} : X > 0.65n\}$.
Note: The rejection region, or the decision rule, is decided before we actually
observe our data.
In Example 5.3, both the null and alternative hypotheses are simple hypotheses
which specify the distribution of the population, thereby allowing the calculations
of a single type I error probability and a single type II error probability. In practical
situations, the hypotheses under consideration are often composite and the error
probabilities will become functions of the parameters. In general, they can be
evaluated through a power function.
Definition
$K(\theta) = P(\text{Reject } H_0 \mid \theta)$.
Example 5.7
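“Reject $H_0$ if $\hat{p} > 0.65$”
$\frac{\hat{p} - p}{\sqrt{p(1-p)/50}} \sim N(0,1)$.
$K(p) = P(\text{Reject } H_0 \mid p)$
$= P(\hat{p} > 0.65 \mid p)$
$= P\left(\frac{\sqrt{50}(\hat{p} - p)}{\sqrt{p(1-p)}} > \frac{\sqrt{50}(0.65 - p)}{\sqrt{p(1-p)}} \,\middle|\, p\right)$
$= 1 - \Phi\left(\frac{\sqrt{50}(0.65 - p)}{\sqrt{p(1-p)}}\right)$ for $p \in (0,1)$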
[Figure: plot of the power function $K(p)$ against $p$, for $p$ from 0 to 1.]
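The power function of Example 5.7 can be tabulated directly. This is a minimal Python sketch, assuming the normal approximation to the sample proportion with n = 50 and the rule "reject H0 if the sample proportion exceeds 0.65". The function name `power` and the grid of p values are illustrative choices, not part of the notes.

```python
from math import sqrt
from statistics import NormalDist

def power(p, cutoff=0.65, n=50):
    """Approximate K(p) = P(p_hat > cutoff | p) using the normal approximation."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    z = sqrt(n) * (cutoff - p) / sqrt(p * (1.0 - p))
    return 1.0 - NormalDist().cdf(z)

# Tabulate the power function on a small, arbitrary grid of p values.
for p in (0.2, 0.4, 0.6, 0.65, 0.8):
    print(f"K({p}) = {power(p):.4f}")
```

The values increase with p, which matches the shape of the plotted curve.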
Definition
The size of a test is the maximum of the probability of a type I error, i.e.
size $= \max_{\theta \in \Omega_0} K(\theta)$.
A test is said to have significance level $\alpha$ if its size is less than or equal to $\alpha$. In
many cases, the size and the significance level of a test are equal.
The significance level (or the size) of a test represents the worst scenario of falsely
rejecting the null hypothesis. Thus, if we set the significance level of the test to
0.05, we are guaranteeing that the probability of a type I error is at most 0.05.
Example 5.8
$K(p) = 1 - \Phi\left(\frac{\sqrt{50}(0.65 - p)}{\sqrt{p(1-p)}}\right)$ for $p \in (0,1)$.
Using simple calculus, it can be easily shown that the expression $\frac{\sqrt{50}(0.65 - p)}{\sqrt{p(1-p)}}$ is
a strictly decreasing function of p. Therefore the power function $K(p)$ is strictly
increasing (as can also be seen from the plot of $K(p)$). Under $H_0: p \le 0.6$, the
maximum of $K(p)$ will be attained at $p = 0.6$, i.e. the size of the test is
size $= \max_{p \le 0.6} K(p) = 1 - \Phi\left(\frac{\sqrt{50}(0.65 - 0.6)}{\sqrt{0.6(1 - 0.6)}}\right) = 1 - \Phi(0.72) = 0.2358$
There would be at most a 23.6% chance of making a type I error. To construct a test
with at most 5% type I error probability, we may solve the equation
$0.05 = 1 - \Phi\left(\frac{\sqrt{50}(c - 0.6)}{\sqrt{0.6(1 - 0.6)}}\right)$
and obtain the critical value $c = 0.7140$, so that the corresponding test becomes
“Reject $H_0$ if $\hat{p} > 0.714$.”
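The size calculation and the critical value in Example 5.8 can be reproduced as follows. This is a sketch under the same normal approximation, assuming the size is attained at the boundary value p = 0.6 as argued in the example. The helper names `size` and `critical_value` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def size(cutoff, p0=0.6, n=50):
    """Size of "reject H0 if p_hat > cutoff", evaluated at the boundary p = p0."""
    z = (cutoff - p0) / sqrt(p0 * (1.0 - p0) / n)
    return 1.0 - STD_NORMAL.cdf(z)

def critical_value(p0=0.6, n=50, alpha=0.05):
    """Cut-off c solving alpha = 1 - Phi((c - p0) / sqrt(p0 (1 - p0) / n))."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)  # upper-alpha normal quantile
    return p0 + z_alpha * sqrt(p0 * (1.0 - p0) / n)

c = critical_value()
print(f"size with cut-off 0.65 = {size(0.65):.4f}")
print(f"critical value c       = {c:.4f}")
print(f"size with cut-off c    = {size(c):.4f}")
```

Inverting the normal cdf replaces the table lookup of 1.645 used in the notes.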
Definition
The significance level of a test is defined as an upper bound of the type I error
probability. The power of a test is defined as the probability of not committing a
type II error, i.e. power $= 1 - \beta$. For a composite
alternative $H_1: \theta \in \Omega_1$, the power of the test at a point $\theta_1 \in \Omega_1$ is the value of the
power function at that point, $K(\theta_1)$.
The power of a test is the probability of correctly rejecting the null hypothesis
when the null hypothesis is false. The higher the power, the more sensitive the test
is to detecting the deviation from the null hypothesis if one actually exists. Since
we can always construct a test with a desirable size, the comparison among
different test procedures would be based on the power.
Example 5.9
$H_0: p \le 0.6$ vs $H_1: p > 0.6$
“Reject $H_0$ if $\hat{p} > 0.714$.”
$K(p) = 1 - \Phi\left(\frac{\sqrt{50}(0.714 - p)}{\sqrt{p(1-p)}}\right)$ for $p \in (0,1)$.
Using this power function, we can easily determine the power of the test:
The test is not powerful when the new medicine slightly improves the cure rate,
and will be powerful only when there is a great improvement.
In summary, the size and the power can be illustrated in the following graph of the
power function:
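[Figure: graph of the power function, with the $H_0$ and $H_1$ regions marked on the horizontal axis, size = 0.05 at the boundary, and power = 0.9385 at p = 0.8.]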
4. Choose a test statistic T with known (and nice, e.g. tabled) null distribution
(distribution under H 0 ).
5. Based on the null distribution of T, derive the size of the test. Find the rejection
region by setting the size to be less than or equal to the significance level $\alpha$.
Example 5.10
General Steps
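1. Statistical model: X = number of patients cured out of 50, $X \sim b(50, p)$.
2. $H_0: p \le 0.6$ vs $H_1: p > 0.6$
At $p = 0.6$, the standard error of $\hat{p}$ is $\sqrt{p(1-p)/50} = 0.0693$, so
$0.05 = 1 - \Phi\left(\frac{c - 0.6}{0.0693}\right) \Rightarrow \frac{c - 0.6}{0.0693} = 1.645 \Rightarrow c = 0.7140$
Hence the rejection region is given by $\hat{p} > 0.7140$, i.e. $X > 35.7$. Since X must
be an integer, we will reject $H_0$ if $X \ge 36$. Note that the size of the test would then be
less than 0.05.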
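We may use the sample mean $\bar{X}$, which is the MLE of $\mu$, as the test statistic and
reject $H_0$ if $\bar{X}$ differs too much from $\mu_0$.
By the sampling distribution of $\bar{X}$, we have $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$.
Therefore under $H_0$, $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$.
Solving the equation $P(|Z| > c) = \alpha$ gives the critical value $c = Z_{\alpha/2}$ and hence the rejection rule is:
Reject $H_0$ at significance level $\alpha$ if $\left|\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right| > Z_{\alpha/2}$.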
Using similar derivations, we can obtain the rejection rules for different settings of
the hypotheses and they are summarized below.
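For $H_1: \mu \neq \mu_0$: Reject $H_0$ at significance level $\alpha$ if $\left|\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right| > Z_{\alpha/2}$.
For $H_1: \mu > \mu_0$: Reject $H_0$ at significance level $\alpha$ if $\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > Z_{\alpha}$.
For $H_1: \mu < \mu_0$: Reject $H_0$ at significance level $\alpha$ if $\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} < -Z_{\alpha}$.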
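The three rejection rules for the mean of a normal population with known variance can be collected in one function. This is a minimal sketch: the function name `z_test`, its argument names and the labels "two-sided", "greater" and "less" are assumptions introduced here, and the numbers in the sample call are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative="two-sided", alpha=0.05):
    """One-sample Z test with known sigma; returns (z, reject)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":
        reject = abs(z) > std_normal.inv_cdf(1.0 - alpha / 2.0)
    elif alternative == "greater":
        reject = z > std_normal.inv_cdf(1.0 - alpha)
    elif alternative == "less":
        reject = z < -std_normal.inv_cdf(1.0 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, reject

# Hypothetical numbers, not taken from the notes.
z, reject = z_test(xbar=10.4, mu0=10.0, sigma=1.0, n=25, alternative="greater")
print(f"Z = {z:.3f}, reject H0: {reject}")
```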
Example 5.11
A chemical process has produced on average 800 tons of chemicals per day.
Some engineers claim that production has declined recently due to the
depreciation of the machines. From the daily yields in the past five days ($n = 5$),
$\bar{X} = 795$ is observed. Suppose the amount produced on each day is
normally distributed with known variance $\sigma^2 = 75$. Should we accept their claim?
At 5% significance level, we will reject $H_0$ if $Z = \frac{\bar{X} - 800}{\sqrt{75/5}} < -Z_{0.05} = -1.645$.
From the data, $Z = \frac{795 - 800}{\sqrt{75/5}} = -1.291 > -1.645$.
Hence $H_0$ is not rejected at 5% significance level.
The above test procedure guarantees that a type I error is made with probability
only 5%, which is a small chance. What is the chance of making a type II error if the mean
daily yield has actually decreased to 790?
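$\frac{\bar{X} - 800}{\sqrt{75/5}} < -1.645 \iff \bar{X} < 793.63$.
$K(\mu) = P(\text{Reject } H_0 \mid \mu) = \Phi\left(\frac{793.63 - \mu}{\sqrt{75/5}}\right)$
$K(790) = \Phi\left(\frac{793.63 - 790}{\sqrt{75/5}}\right) = \Phi(0.937) = 0.8256$
Hence the probability of a type II error at $\mu = 790$ is $1 - 0.8256 = 0.1744$.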
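The power calculation for Example 5.11 can be verified numerically. This is a minimal sketch, assuming the lower-tailed Z test with known variance 75, n = 5 and a 5% significance level as in the example. The variable and function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()
mu0, sigma2, n, alpha = 800.0, 75.0, 5, 0.05
se = sqrt(sigma2 / n)                                # standard error of the sample mean
cutoff = mu0 - std_normal.inv_cdf(1.0 - alpha) * se  # reject H0 if sample mean < cutoff

def power(mu):
    """K(mu) = P(sample mean < cutoff | mu) for the lower-tailed Z test."""
    return std_normal.cdf((cutoff - mu) / se)

print(f"cut-off = {cutoff:.2f}")
print(f"K(790) = {power(790.0):.4f}, type II error = {1.0 - power(790.0):.4f}")
print(f"K(800) = {power(800.0):.4f}")
```

Evaluating the power function at 800 returns the significance level, which is a quick consistency check on the cut-off.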
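Note that the above tests require knowing the value of the population variance
$\sigma^2$. In practical situations, the value of $\sigma^2$ is usually unknown. If this is the case, we
may estimate it by the sample variance $S^2$ and use the Student’s t-distribution,
which leads to the following rejection rules:
For $H_1: \mu \neq \mu_0$: Reject $H_0$ at significance level $\alpha$ if $\left|\frac{\bar{X} - \mu_0}{S/\sqrt{n}}\right| > t_{n-1,\,\alpha/2}$.
For $H_1: \mu > \mu_0$: Reject $H_0$ at significance level $\alpha$ if $\frac{\bar{X} - \mu_0}{S/\sqrt{n}} > t_{n-1,\,\alpha}$.
For $H_1: \mu < \mu_0$: Reject $H_0$ at significance level $\alpha$ if $\frac{\bar{X} - \mu_0}{S/\sqrt{n}} < -t_{n-1,\,\alpha}$.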
Example 5.12
At 5% significance level, we will reject $H_0$ if $T = \frac{\bar{X} - 800}{S/\sqrt{5}} < -t_{4,\,0.05} = -2.132$.
From the data, $T = \frac{795 - 800}{\sqrt{69.5/5}} = -1.341 > -2.132$.
We conclude that $H_0$ is not rejected at 5% significance level, i.e. the data do not show
a decrease in the mean daily yield from the chemical process.
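The t statistic of Example 5.12 is easy to recompute. This sketch assumes that 69.5 is the sample variance and takes the critical value 2.132 from the notes rather than computing it, since the Python standard library has no t quantile function. The function name `t_statistic` is illustrative.

```python
from math import sqrt

def t_statistic(xbar, mu0, s2, n):
    """One-sample t statistic T = (xbar - mu0) / sqrt(s2 / n)."""
    return (xbar - mu0) / sqrt(s2 / n)

# Values from Example 5.12; 69.5 is assumed to be the sample variance.
T = t_statistic(xbar=795.0, mu0=800.0, s2=69.5, n=5)
T_CRIT = 2.132           # tabulated t_{4, 0.05} quoted in the notes
reject = T < -T_CRIT     # lower-tailed test
print(f"T = {T:.3f}, reject H0: {reject}")
```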
The above test procedure based on the comparison between the test statistic and
critical value is called the classical approach. Another equivalent approach relies
on the calculation of a quantity called the significance probability or simply the p-
value. The interpretation of “p-value” is the probability of the occurrence of the
particular observed value or more extreme values, under the assumption of H 0 .
The smaller the magnitude of p-value, the stronger is the evidence against H 0 .
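Given a pre-assigned significance level $\alpha$, one can either compare the observed
value of the test statistic with the critical value, or first compute the p-value from the
observed value of the test statistic and then compare it with $\alpha$.
[Diagram: the critical value is computed from $\alpha$ and compared with the test statistic; equivalently, the p-value is computed from the test statistic and compared with $\alpha$.]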
“Reject $H_0$ if the test statistic is greater than the critical value” ……………. [1]
is equivalent to
“Reject $H_0$ if the p-value is less than the significance level $\alpha$” ……………. [2]
To use procedure [1], we need to compute the critical value before drawing the
conclusion. Since different people may choose different significance
levels $\alpha$, a different critical value must be determined for each choice.
On the other hand, procedure [2] only needs the calculation of the p-value.
Moreover, the p-value is more informative than just stating whether or not a
hypothesis is rejected. Reporting the p-value indicates just how unlikely the
observed event is under the null hypothesis and the users can make their own
decision on what to conclude in face of the evidence. Therefore in most statistical
packages, the p-value will be provided after analysing the data.
Example 5.13
The calculation of the p-value depends on the test and the hypotheses. In Example 5.11,
the test statistic will be “more extreme” if we observe $Z < -1.291$. The p-value can
be calculated as
p-value $= P(Z < -1.291 \mid H_0 \text{ is true})$
$= \Phi(-1.291)$
$= 0.0985$
which is larger than $\alpha = 0.05$. This leads to the non-rejection of $H_0$ and is
consistent with the conclusion drawn in Example 5.11.
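The p-value for a Z statistic follows directly from the standard normal cdf. The sketch below covers the three usual alternatives. The function name `z_p_value` and the labels for the alternatives are assumptions introduced here.

```python
from statistics import NormalDist

def z_p_value(z, alternative="two-sided"):
    """p-value of an observed Z statistic under a standard normal null distribution."""
    std_normal = NormalDist()
    if alternative == "less":       # more extreme means smaller Z
        return std_normal.cdf(z)
    if alternative == "greater":    # more extreme means larger Z
        return 1.0 - std_normal.cdf(z)
    if alternative == "two-sided":  # more extreme means larger |Z|
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

p = z_p_value(-1.291, alternative="less")  # observed Z from Example 5.11
print(f"p-value = {p:.4f}, reject at 5%: {p < 0.05}")
```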
The derivations of the test procedures are quite similar to the construction of
confidence intervals in Section 4.2. For brevity, only the two-sample t-test under
the equal variance assumption is illustrated here. The derivations under other situations
are left as exercises.
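$X_1, X_2, \ldots, X_m \overset{iid}{\sim} N(\mu_x, \sigma^2)$, $Y_1, Y_2, \ldots, Y_n \overset{iid}{\sim} N(\mu_y, \sigma^2)$.
Note that the two population variances are assumed to be the same.
Estimate $\sigma^2$ by the pooled sample variance:
$S_{pool}^2 = \frac{(m-1)S_x^2 + (n-1)S_y^2}{m+n-2}$
Recall, from Section 4.2, that $\frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{S_{pool}\sqrt{1/m + 1/n}} \sim t_{m+n-2}$.
Hence, under $H_0: \mu_x = \mu_y$, $T = \frac{\bar{X} - \bar{Y}}{S_{pool}\sqrt{1/m + 1/n}} \sim t_{m+n-2}$.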
Example 5.14
The Rejoy company claims that their shampoo performs better than the shampoo
manufactured by another company, N&S. To support their claim, they want to
compare these two brands of shampoo in their ability to remove dandruff. An
experiment was carried out in which 6 volunteers used Rejoy and 8 volunteers
used N&S to wash their hair regularly for one week. The following table shows
the summary statistics of the remaining dandruff on these volunteers after
one week:
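                  Rejoy                N&S
Sample size       $m = 6$              $n = 8$
Sample mean       $\bar{X} = 90.67$    $\bar{Y} = 129.0$
Sample variance   $S_x^2 = 399.2$      $S_y^2 = 781.76$
Test $H_0: \mu_x = \mu_y$ vs $H_1: \mu_x < \mu_y$ at $\alpha = 0.05$.
At 5% significance level, we reject $H_0$ if $T = \frac{\bar{X} - \bar{Y}}{S_{pool}\sqrt{1/6 + 1/8}} < -t_{12,\,0.05} = -1.782$.
From the data, $T = \frac{90.67 - 129.0}{\sqrt{622.36\,(1/6 + 1/8)}} = -2.845 < -1.782$.
Hence $H_0$ is rejected at 5% significance level.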
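The pooled variance and the two-sample t statistic of Example 5.14 can be recomputed from the summary statistics. This is a minimal sketch. The function name `pooled_t` is illustrative, and the critical value 1.782 is taken from the notes rather than computed.

```python
from math import sqrt

def pooled_t(xbar, ybar, s2x, s2y, m, n):
    """Return (pooled variance, t statistic) for the equal-variance two-sample test."""
    s2_pool = ((m - 1) * s2x + (n - 1) * s2y) / (m + n - 2)
    t = (xbar - ybar) / sqrt(s2_pool * (1.0 / m + 1.0 / n))
    return s2_pool, t

# Summary statistics from Example 5.14 (Rejoy: m = 6, N&S: n = 8).
s2_pool, T = pooled_t(xbar=90.67, ybar=129.0, s2x=399.2, s2y=781.76, m=6, n=8)
T_CRIT = 1.782          # tabulated t_{12, 0.05} quoted in the notes
reject = T < -T_CRIT    # lower-tailed test
print(f"pooled variance = {s2_pool:.2f}, T = {T:.3f}, reject H0: {reject}")
```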
Remarks
1. For large samples, the t critical values will become the Z critical values.
2. The above tests can be easily modified to handle more general hypotheses. For
example, one may want to test whether $\mu_x$ is greater than $\mu_y$ by more than a
specific number c. Then the one-sided hypotheses can be formulated as
$H_0: \mu_x - \mu_y = c$ vs $H_1: \mu_x - \mu_y > c$
and the test statistic becomes
$T = \frac{\bar{X} - \bar{Y} - c}{S_{pool}\sqrt{1/m + 1/n}}$.
The following procedure allows us to test the equal variance hypothesis based on
the sample variances S x2 and S y2 of two independent samples drawn from the two
normal populations.
Test statistic: $F = \frac{S_x^2}{S_y^2}$
Example 5.15
$H_0: \sigma_x^2 = \sigma_y^2$ vs $H_1: \sigma_x^2 \neq \sigma_y^2$
Test statistic: $F = \frac{20.67^2}{26.33^2} = 0.616$
Critical values: $F_{0.025,\,5,\,7} = 5.29$, $\frac{1}{F_{0.025,\,7,\,5}} = \frac{1}{6.85} = 0.146$
The test statistic F 0.616 is between these two critical values. According to the
test procedure, the hypothesis of equal variance is not rejected, i.e. the variances
are not significantly different from each other.
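The comparison in Example 5.15 can be written out as a short check. This sketch assumes 20.67 and 26.33 are the two sample standard deviations and uses the tabulated F values quoted in the example instead of computing quantiles. The function name `variance_ratio_test` is hypothetical.

```python
def variance_ratio_test(s_x, s_y, upper_crit, lower_crit):
    """Two-sided F test: reject equal variances if F falls outside [lower, upper]."""
    f = s_x ** 2 / s_y ** 2
    reject = f > upper_crit or f < lower_crit
    return f, reject

UPPER = 5.29          # F_{0.025, 5, 7} as quoted in the notes
LOWER = 1.0 / 6.85    # 1 / F_{0.025, 7, 5} as quoted in the notes
F, reject = variance_ratio_test(20.67, 26.33, UPPER, LOWER)
print(f"F = {F:.3f}, lower = {LOWER:.3f}, upper = {UPPER}, reject H0: {reject}")
```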
Remarks
1. As with the construction of confidence intervals, all the test procedures described
above rely heavily on the normal population assumption. If the normal
assumption is violated, the above procedures remain valid only for large-sample
location problems. For small-sample problems with non-normal
population(s), we will need to use non-parametric statistical methods. Also,
the equal variance test is not valid even for large samples if the normal
assumption is not satisfied.
2. Independence between the two samples is also a crucial assumption for the
above test procedures on two sample problems.
Sometimes data does not come in the form of independent samples. For example,
the midterm and final examination results of the same group of students are
obviously dependent. The IQ scores of a group of fathers may be dependent on the
IQ scores of their sons. These kinds of dependent samples are called paired
samples.
Data structure:
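$\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i = \bar{X} - \bar{Y}$, $S_D^2 = \frac{1}{n-1}\sum_{i=1}^{n} (D_i - \bar{D})^2$, where $D_i = X_i - Y_i$
$D_1, D_2, \ldots, D_n \overset{iid}{\sim} N(\mu_D, \sigma_D^2)$, with $\mu_D$ and $\sigma_D^2$ unknown
Under $H_0$, $T = \frac{\bar{D}}{S_D/\sqrt{n}} \sim t_{n-1}$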
Example 5.16
Volunteer A B C D E
Rejoy 105 62 78 112 96
N&S 97 70 54 85 93
Di 8 –8 24 27 3
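Test $H_0: \mu_D = 0$ vs $H_1: \mu_D > 0$ at $\alpha = 0.05$.
At 5% significance level, we will reject $H_0$ if $T = \frac{\bar{D}}{S_D/\sqrt{5}} > t_{4,\,0.05} = 2.132$.
From the data, $\bar{D} = 10.8$, $S_D = 14.65$, and $T = \frac{10.8}{14.65/\sqrt{5}} = 1.648 < 2.132$.
Hence $H_0$ is not rejected at 5% significance level.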
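The paired t statistic of Example 5.16 can be recomputed from the raw data in the table. This is a minimal sketch using only the standard library. The critical value 2.132 is the tabulated figure from the notes, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

rejoy = [105, 62, 78, 112, 96]   # volunteers A to E
ns = [97, 70, 54, 85, 93]

diffs = [x - y for x, y in zip(rejoy, ns)]   # paired differences D_i
d_bar = mean(diffs)                          # sample mean of the differences
s_d = stdev(diffs)                           # sample standard deviation (n - 1 divisor)
T = d_bar / (s_d / sqrt(len(diffs)))

T_CRIT = 2.132           # tabulated t_{4, 0.05} quoted in the notes
reject = T > T_CRIT      # upper-tailed test
print(f"D-bar = {d_bar:.1f}, S_D = {s_d:.2f}, T = {T:.3f}, reject H0: {reject}")
```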
$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \overset{H_0}{\sim} N(0, 1)$
where p̂ is the sample proportion. The rejection rules are given below:
Example 5.17
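At 1% significance level, we reject $H_0$ if $Z = \frac{\hat{p} - 0.9}{\sqrt{0.9(0.1)/100}} < -Z_{0.01} = -2.326$.
From the data, $\hat{p} = \frac{78}{100} = 0.78$ and $Z = \frac{0.78 - 0.9}{\sqrt{0.9(0.1)/100}} = -4 < -2.326$.
Hence $H_0$ is rejected at 1% significance level.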
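The one-sample proportion test of Example 5.17 can be checked as follows. This sketch assumes a lower-tailed test of p0 = 0.9 with 78 successes out of 100, as the numbers in the example indicate. The function name `proportion_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(x, n, p0):
    """Z statistic for a one-sample proportion test, using p0 in the standard error."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1.0 - p0) / n)

alpha = 0.01
Z = proportion_z(x=78, n=100, p0=0.9)
z_crit = NormalDist().inv_cdf(1.0 - alpha)   # about 2.326
reject = Z < -z_crit                         # lower-tailed test
print(f"Z = {Z:.3f}, critical value = {-z_crit:.3f}, reject H0: {reject}")
```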
Remark
An alternative test statistic is
$Z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1 - \hat{p})/n}}$
as asymptotically its null distribution is also $N(0,1)$. The use of the sample
proportion $\hat{p}$ instead of the hypothetical value $p_0$ in the denominator may provide
better estimates of the standard error when the null hypothesis is clearly false.
However, using $p_0$ can result in a better approximation to $\alpha$-level significance tests.
Thus there are trade-offs and it is difficult to say one is better than the other.
Fortunately, the numerical answers are usually about the same.
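$\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}} \sim N(0, 1)$
when the sample sizes $n_1$ and $n_2$ are large, where $\hat{p}_1$, $\hat{p}_2$ are the corresponding
sample proportions. If $H_0: p_1 = p_2$ is true, the common proportion can be
estimated by using the pooled sample proportion
$\hat{p}_{pool} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$.
The test statistic is then
$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_{pool}(1 - \hat{p}_{pool})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$.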
Example 5.18
One production process yielded 27 defective pieces in a random sample of size 400
while another yielded 15 defective pieces in a random sample of size 300. Test the
null hypothesis that the two processes yield equal proportions of defectives, against
the alternative hypothesis that the defective rates are different.
Test $H_0: p_1 = p_2$ vs $H_1: p_1 \neq p_2$ at $\alpha = 0.05$.
Sample proportions: $\hat{p}_1 = \frac{27}{400} = 6.75\%$, $\hat{p}_2 = \frac{15}{300} = 5\%$
Pooled sample proportion: $\hat{p}_{pool} = \frac{27 + 15}{400 + 300} = 6\%$
Test statistic: $Z = \frac{0.0675 - 0.05}{\sqrt{0.06 \times 0.94 \left(\frac{1}{400} + \frac{1}{300}\right)}} = 0.965$
Since $|Z| = 0.965 < 1.96 = Z_{0.025}$, we do not reject $H_0$ at $\alpha = 0.05$, i.e. the defective
rates of the two processes are not significantly different.
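The pooled two-sample proportion test of Example 5.18 can be reproduced from the counts. This is a minimal sketch. The function name `pooled_proportion_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def pooled_proportion_z(x1, n1, x2, n2):
    """Return (pooled proportion, Z) for testing p1 = p2 with a pooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1.0 - p_pool) * (1.0 / n1 + 1.0 / n2))
    return p_pool, (p1_hat - p2_hat) / se

alpha = 0.05
p_pool, Z = pooled_proportion_z(x1=27, n1=400, x2=15, n2=300)
z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96
reject = abs(Z) > z_crit                           # two-sided test
print(f"pooled proportion = {p_pool:.2f}, Z = {Z:.3f}, reject H0: {reject}")
```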
Remark
$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$
Example 5.19
A tracking study at MIT found that out of 198 ISP (Integrated Studies Program)
students, 189 graduated within the time limit of six years; out of a random sample
of 210 non-ISP students, 158 graduated within the time limit. The data can be
tabulated as
                ISP    Non-ISP
Graduated       189    158
Not Graduated     9     52
Total           198    210
Sample proportions: $\hat{p}_{ISP} = \frac{189}{198} = 0.9545$, $\hat{p}_{N} = \frac{158}{210} = 0.7524$
Test statistic: $Z = \frac{0.9545 - 0.7524}{\sqrt{\frac{0.9545 \times 0.0455}{198} + \frac{0.7524 \times 0.2476}{210}}} = 6.0757$
Since $Z = 6.0757 > 1.645 = Z_{0.05}$, we reject $H_0$ at $\alpha = 0.05$, i.e. the graduation rate
of ISP students is significantly higher than that of non-ISP students.
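The unpooled statistic used in Example 5.19 can be recomputed from the table. This is a minimal sketch, assuming an upper-tailed test at the 5% level as the comparison with 1.645 suggests. The function name `unpooled_proportion_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def unpooled_proportion_z(x1, n1, x2, n2):
    """Z statistic for p1 - p2 using separate (unpooled) variance estimates."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1.0 - p1_hat) / n1 + p2_hat * (1.0 - p2_hat) / n2)
    return (p1_hat - p2_hat) / se

alpha = 0.05
Z = unpooled_proportion_z(x1=189, n1=198, x2=158, n2=210)
z_crit = NormalDist().inv_cdf(1.0 - alpha)   # about 1.645
reject = Z > z_crit                          # upper-tailed test
print(f"Z = {Z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

The result differs slightly from the figure in the notes because the notes round the sample proportions to four decimals before computing Z.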