
Stat2602 Chapter5

The document discusses hypothesis testing in statistics, defining key concepts such as null and alternative hypotheses, simple and composite hypotheses, and the structure of hypothesis tests. It explains the importance of Type I and Type II errors, providing examples to illustrate the implications of these errors in decision-making. Additionally, it introduces the power function of a test, which assesses the probability of rejecting the null hypothesis based on the parameter vector.

Uploaded by jeffsiu456
Copyright © All Rights Reserved

Stat2602 Probability and Statistics II Fall 2014-2015

Chapter V Hypothesis Testing


§ 5.1 Introduction

Hypothesis

A statistical hypothesis is an assertion or statement about the population, usually formulated in terms of population parameters. It is denoted by $H_0$ or $H_1$.

Null Hypothesis

The null hypothesis, denoted by $H_0$, is usually a statement about something that has been established, something that has an authoritative standing, or something worth protecting.

Alternative Hypothesis

The alternative hypothesis, denoted by $H_1$, is usually a statement about something that challenges the authority, or something that does not need strong protection.

The hypotheses are often expressed in terms of the population parameters:

$$H_0: \theta \in \Theta_0, \qquad H_1: \theta \in \Theta_1,$$

where $\Theta_0$ and $\Theta_1$ are disjoint subsets of the parameter space $\Theta$. However, for non-parametric models, the hypotheses may not be specified in terms of parameters.

Simple Hypothesis

A hypothesis $H_0: \theta = \theta_0$ that completely specifies the distribution of the population is called a simple hypothesis.

Composite Hypothesis

A hypothesis $H_0: \theta \in \Theta_0$ that does not completely specify the distribution of the population is called a composite hypothesis.


Often, the null hypothesis has the form $H_0: \eta(\theta) = c$ for some one-dimensional parameter $\eta(\theta)$ and some known constant $c$, and the alternative hypothesis has one of three forms:

$$H_1: \eta(\theta) > c, \qquad H_1: \eta(\theta) < c, \qquad \text{or} \qquad H_1: \eta(\theta) \neq c.$$

The first two alternative hypotheses are called one-sided alternatives, and the third is called the two-sided alternative.

Example 5.1

1. Statistical model: $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} N(\mu, \sigma_0^2)$, where $\sigma_0$ is a known constant.

   $H_0: \mu = 800$ (simple hypothesis)
   $H_1: \mu < 800$ (composite hypothesis, one-sided alternative)

2. Statistical model: $X_1, X_2, \ldots, X_m \overset{\text{iid}}{\sim} N(\mu_x, \sigma_x^2)$, $Y_1, Y_2, \ldots, Y_m \overset{\text{iid}}{\sim} N(\mu_y, \sigma_y^2)$.

   $H_0: \sigma_x^2 = \sigma_y^2$ (composite hypothesis)
   $H_1: \sigma_x^2 \neq \sigma_y^2$ (composite hypothesis, two-sided alternative)

3. Statistical model: $X \sim b(n_1, p_1)$, $Y \sim b(n_2, p_2)$.

   $H_0: p_1 = 2p_2$ (composite hypothesis)
   $H_1: p_1 > 2p_2$ (composite hypothesis, one-sided alternative)

4. Statistical model: $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim}$ some distribution.

   $H_0$: The population is normally distributed.
   $H_1$: The population is not normally distributed.

   This is a non-parametric model.


Example 5.2

A chemical process has produced on average 800 tons of chemicals per day. Some engineers claim that production has declined recently due to the depreciation of the machines. Should we accept this claim?

Null hypothesis: $H_0: \mu = 800$ (simple hypothesis)
Alternative hypothesis: $H_1: \mu < 800$ (composite, one-sided alternative)

We may record a random sample of daily yields and estimate the mean daily yield $\mu$ by the sample mean $\bar{X}$. The basic strategy in hypothesis testing is to measure how far this statistic is from the hypothesized value of the parameter $\mu$. If the distance is “large”, we argue that the hypothesized statement is inconsistent with the data and we are inclined to reject the hypothesis. (We could be wrong, of course; rare events do happen!)

[Figure: the distance between the sample mean $\bar{X}$ and the hypothesized value 800.]

Intuitively, if the sample mean is observed to be less than 800, the data seem to support the alternative hypothesis $H_1: \mu < 800$ and contradict the null hypothesis $H_0: \mu = 800$. However, before jumping to a conclusion, we must take into account the variability of the observations: the distance may simply result from sampling error rather than from the null hypothesis being false.

A strong conclusion may be drawn if the distance is large enough or, in other words, if $\bar{X}$ is small enough. Based on this reasoning, we make a decision according to the value of the sample mean $\bar{X}$ and conclude that the null hypothesis is false if $\bar{X}$ is smaller than some predetermined constant, say, 793. The decision rule

“Reject $H_0$ and hence accept $H_1$ if $\bar{X} < 793$”

is called a test, and the sample statistic $\bar{X}$ used is called the test statistic.


Definition

A test of a statistical hypothesis is a rule or procedure, based on the observed values of the data $\mathbf{X}$, that leads to the acceptance or rejection of the null hypothesis $H_0$. The set $C$ of $\mathbf{X}$ for which the test rejects $H_0$ is called the rejection region (or critical region). The rejection region is often formulated using a test statistic $T(\mathbf{X})$.

Remark

Note that $\bar{X}$ is just a point estimator of $\mu$ and is subject to estimation error. It is possible that the process still yields an average of 800 tons per day but the sample mean $\bar{X}$ is observed to be smaller than 793, leading to the incorrect conclusion of rejecting $H_0$. Hence we need to determine how reliable the test procedure is, and this is measured by the probabilities of drawing mistaken conclusions.
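To see how such a mistake can arise, the sketch below simulates the rule “reject $H_0$ if $\bar{X} < 793$” when the true mean is exactly 800. Example 5.2 does not give the population variance or the sample size, so $\sigma^2 = 75$ and $n = 5$ are assumptions borrowed from Example 5.11 purely for illustration; only the Python standard library (`random`, `statistics.NormalDist`) is used.

```python
import random
from statistics import NormalDist, fmean

# Assumed settings (not given in Example 5.2): sigma^2 = 75 and n = 5,
# borrowed from Example 5.11 purely for illustration.
MU_TRUE = 800.0      # H0 is actually true: the process still averages 800
SIGMA = 75 ** 0.5
N = 5
CUTOFF = 793.0       # rule: reject H0 if the sample mean is below 793

def simulated_false_rejection_rate(reps: int = 50_000, seed: int = 1) -> float:
    """Fraction of simulated samples in which the rule rejects H0 although mu = 800."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(MU_TRUE, SIGMA) for _ in range(N))
        rejections += xbar < CUTOFF      # True counts as 1
    return rejections / reps

# Exact value from the sampling distribution: xbar ~ N(800, sigma^2 / n)
exact = NormalDist(MU_TRUE, SIGMA / N ** 0.5).cdf(CUTOFF)

print(f"simulated: {simulated_false_rejection_rate():.4f}  exact: {exact:.4f}")
```

Under these assumed settings, roughly 3.5% of samples would wrongly reject $H_0$; the simulated rate should sit close to the exact value $\Phi\big((793 - 800)/\sqrt{75/5}\big)$.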

§ 5.2 Type I Error and Type II Error

To assess the reliability of the test, there are two possible types of error to be
considered. The following table summarizes the possibilities of drawing correct or
incorrect conclusions.

| | Accept $H_0$ | Reject $H_0$ |
|---|---|---|
| $H_0$ true | Correct decision | Type I error |
| $H_1$ true ($H_0$ false) | Type II error | Correct decision |

Type I error probability: $\alpha = P(\text{Reject } H_0 \mid H_0 \text{ true})$

Type II error probability: $\beta = P(\text{Accept } H_0 \mid H_1 \text{ true})$


Example 5.3

A manufacturer of light bulbs has to decide whether the mean lifetime of the light bulbs has increased from 1200 hours to 1240 hours after implementing a new production method. The hypotheses to be tested can be formulated as

$$H_0: \mu = 1200 \quad \text{vs} \quad H_1: \mu = 1240.$$

Suppose that the population standard deviation of the lifetimes is known to be $\sigma = 300$ hours. Based on a sample of $n = 100$ light bulbs, consider the test:

“Reject $H_0$ and hence accept $H_1$ if $\bar{X} > 1249$.”

Since the sample size is large, the sampling distribution of $\bar{X}$ can be approximated by a normal distribution:

$$\bar{X} \sim N\left(\mu, \frac{300^2}{100}\right).$$

The type I error probability can be calculated as

$$\alpha = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(\bar{X} > 1249 \mid \mu = 1200) = 1 - \Phi\left(\frac{1249 - 1200}{300/\sqrt{100}}\right) = 1 - \Phi(1.633) \approx 0.0512$$

and the type II error probability is

$$\beta = P(\text{Accept } H_0 \mid H_1 \text{ true}) = P(\bar{X} \le 1249 \mid \mu = 1240) = \Phi\left(\frac{1249 - 1240}{300/\sqrt{100}}\right) = \Phi(0.3) \approx 0.6179.$$

As can be seen, this test has only about a 5% chance of making a type I error if $H_0$ is true, but about a 62% chance of making a type II error if $H_0$ is false. To reduce the type II error probability, we may use a smaller cutoff in the test, so that it becomes harder to accept $H_0$.


For example, the test

“Reject $H_0$ and hence accept $H_1$ if $\bar{X} > 1234$”

would have a type II error probability of

$$\beta = \Phi\left(\frac{1234 - 1240}{300/\sqrt{100}}\right) = \Phi(-0.2) \approx 0.4207,$$

which is substantially reduced. However, the type I error probability would then become larger:

$$\alpha = 1 - \Phi\left(\frac{1234 - 1200}{300/\sqrt{100}}\right) \approx 1 - \Phi(1.13) \approx 0.1292.$$

As this example shows, there is a trade-off between the two types of error: for a fixed sample size, making $\alpha$ smaller results in a larger $\beta$, and vice versa. Therefore, in designing a test we can control only one of them, and the convention is to keep $\alpha$ at a desired low level and then reduce $\beta$ as much as we can (i.e. a type I error is considered more serious than a type II error). That is why the roles of $H_0$ and $H_1$ are not interchangeable.
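The trade-off can be tabulated by computing $\alpha$ and $\beta$ for several cutoffs in the setting of Example 5.3. A minimal sketch using `statistics.NormalDist` from the Python standard library; the cutoffs other than 1234 and 1249 are arbitrary illustrative choices, and results may differ from hand calculations in the fourth decimal place because those round $z$ to two decimals.

```python
from statistics import NormalDist

# Example 5.3: H0: mu = 1200 vs H1: mu = 1240, sigma = 300, n = 100,
# test "reject H0 if xbar > cutoff"; xbar is approximately N(mu, 300^2 / 100).
MU0, MU1, SIGMA, N = 1200.0, 1240.0, 300.0, 100
SE = SIGMA / N ** 0.5  # standard error of xbar = 30

def error_probs(cutoff: float) -> tuple[float, float]:
    """Return (alpha, beta) for the rule 'reject H0 if xbar > cutoff'."""
    alpha = 1 - NormalDist(MU0, SE).cdf(cutoff)  # P(xbar > cutoff | mu = 1200)
    beta = NormalDist(MU1, SE).cdf(cutoff)       # P(xbar <= cutoff | mu = 1240)
    return alpha, beta

for c in (1234, 1240, 1249, 1260):
    a, b = error_probs(c)
    print(f"cutoff {c}: alpha = {a:.4f}, beta = {b:.4f}")
```

As the cutoff rises, $\alpha$ falls and $\beta$ rises, which is exactly the trade-off described above.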

Choice of H0 and H1

Criterion 1: Since we keep $\alpha = P(\text{Reject } H_0 \mid H_0 \text{ true})$ at a low level, it is more appropriate to choose a statement as $H_0$ when falsely rejecting it is considered a serious error.

Example 5.4

Suppose we want to know whether a man is guilty. There are two possible settings of the hypotheses:

A: $H_0$: He is guilty. vs $H_1$: He is not guilty.
B: $H_0$: He is not guilty. vs $H_1$: He is guilty.

Since wrongly convicting an innocent man is considered a more serious error than wrongly acquitting a guilty man, setting B is more appropriate.


In testing statistical hypotheses, we test $H_0$ against $H_1$, i.e. we observe the data to see whether there is sufficient evidence to reject $H_0$. If we have sufficient evidence to reject $H_0$, we can have great confidence that $H_0$ is false and $H_1$ is true. However, if we observe the data and find that $H_0$ is not rejected, it does not mean that we have great confidence in the truth of $H_0$; it only means that we do not have enough evidence to reject $H_0$. Formally, we should say “do not reject $H_0$” instead of “accept $H_0$”. Therefore, testing $H_0$ against $H_1$ is not equivalent to testing $H_1$ against $H_0$.

Criterion 2: When we aim to establish an assertion with substantive support from the data, the negation of the assertion is taken as the null hypothesis $H_0$ and the assertion itself is taken as the alternative hypothesis $H_1$.

Example 5.5

Suppose we want to show that Brand A products are more popular than Brand B products. Then we may set the hypotheses as

$H_0$: B is more popular than A,
$H_1$: A is more popular than B.

If we find that most people prefer A, then we can reject $H_0$ and establish our assertion with great confidence.

Example 5.6

Suppose a standard medicine for a particular disease has cure rate $p_0 = 0.6$. A drug company has developed a new medicine for the same disease. Before bringing the new medicine to market, the company's in-house statisticians were asked to show, based on clinical trial data, that the new medicine has a higher cure rate than the standard medicine. The appropriate setting of the hypotheses is

$H_0: p \le 0.6$ (new medicine is not better),
$H_1: p > 0.6$ (new medicine is better).


This setting satisfies both criteria, since

(i) a type I error leads to the use of a worse medicine, which is a more serious error than abandoning a better medicine (a type II error); and

(ii) the drug company wants to establish the assertion that its new medicine is better than the standard medicine.

We may treat $n = 50$ randomly selected patients with the new medicine, observe $X$, the number of patients cured, and then use the sample cure rate $\hat{p} = X/n$ as the test statistic to construct the test:

“Reject $H_0$ if $\hat{p} > 0.65$”

The corresponding rejection region is

$$\{X \in \{0, 1, 2, \ldots, 50\} : X > 0.65n\}.$$

Note: The rejection region, or the decision rule, is decided before we actually observe the data.
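The rejection region in Example 5.6 can be listed explicitly. As an optional cross-check that goes beyond the notes (which rely on the normal approximation), the sketch below also sums exact binomial probabilities over that region at the boundary value $p = 0.6$, using only `math.comb`.

```python
from math import comb

N, P0, CUTOFF = 50, 0.6, 0.65  # Example 5.6: reject H0 if p_hat > 0.65

# Rejection region as a set of counts x with x / n > 0.65, i.e. x > 32.5
rejection_region = [x for x in range(N + 1) if x / N > CUTOFF]

def binom_pmf(x: int, n: int, p: float) -> float:
    """Exact binomial probability P(X = x) for X ~ b(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Exact probability of landing in the rejection region when p = 0.6
exact_size = sum(binom_pmf(x, N, P0) for x in rejection_region)
print(rejection_region[0], rejection_region[-1], round(exact_size, 4))
```

The region runs from 33 to 50 cured patients, since $0.65 \times 50 = 32.5$; the exact tail probability should be close to, but not identical with, the normal-approximation size obtained for this test in Example 5.8.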

§ 5.3 Size, Power, and Power Function

In Example 5.3, both the null and alternative hypotheses are simple hypotheses that completely specify the distribution of the population, which allows us to calculate a single type I error probability and a single type II error probability. In practice, the hypotheses under consideration are often composite, and the error probabilities become functions of the parameters. In general, they can be evaluated through a power function.

Definition

The power function of a test of $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$ is the function of the parameter vector $\theta$ that gives the probability of rejecting $H_0$, i.e.

$$K(\theta) = P(\text{Reject } H_0 \mid \theta).$$


Example 5.7

Consider the hypotheses in Example 5.6:

$H_0: p \le 0.6$ (new medicine is not better),
$H_1: p > 0.6$ (new medicine is better),

and the corresponding test

“Reject $H_0$ if $\hat{p} > 0.65$”

Using the normal approximation

$$\frac{\hat{p} - p}{\sqrt{p(1-p)/50}} \overset{\text{approx}}{\sim} N(0, 1),$$

the power function can be evaluated as

$$\begin{aligned}
K(p) &= P(\text{Reject } H_0 \mid p) = P(\hat{p} > 0.65 \mid p) \\
&= P\left(\frac{\sqrt{50}\,(\hat{p} - p)}{\sqrt{p(1-p)}} > \frac{\sqrt{50}\,(0.65 - p)}{\sqrt{p(1-p)}} \,\middle|\, p\right) \\
&\approx 1 - \Phi\left(\frac{\sqrt{50}\,(0.65 - p)}{\sqrt{p(1-p)}}\right), \qquad p \in (0, 1).
\end{aligned}$$

[Figure: plot of the power function $K(p)$ against $p$, $0 \le p \le 1$.]
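The power function of Example 5.7 is easy to evaluate numerically. The sketch below (the function name `power` is illustrative) uses the normal approximation exactly as written above; at $p = 0.6$ it gives about 0.2352, slightly below the 0.2358 obtained from tables by rounding the argument of $\Phi$ to 0.72.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def power(p: float, c: float = 0.65, n: int = 50) -> float:
    """Normal-approximation power K(p) = 1 - Phi(sqrt(n)(c - p) / sqrt(p(1 - p)))
    of the test 'reject H0 if p_hat > c', valid for 0 < p < 1."""
    z = n ** 0.5 * (c - p) / (p * (1 - p)) ** 0.5
    return 1 - PHI(z)

for p in (0.5, 0.6, 0.65, 0.7, 0.8):
    print(f"K({p}) = {power(p):.4f}")
```

The printed values increase with $p$ and equal exactly 0.5 at $p = c = 0.65$, where the argument of $\Phi$ is zero.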


Definition

The size of a test is the maximum probability of a type I error, i.e.

$$\text{size} = \max_{\theta \in \Theta_0} K(\theta).$$

A test is said to have significance level $\alpha$ if its size is less than or equal to $\alpha$. In many cases, the size and the significance level of a test are equal.

The significance level (or the size) of a test represents the worst scenario of falsely
rejecting the null hypothesis. Thus, if we set the significance level of the test to
0.05, we are guaranteeing that the probability of a type I error is at most 0.05.

Example 5.8

Consider the power function determined in Example 5.7,

$$K(p) = 1 - \Phi\left(\frac{\sqrt{50}\,(0.65 - p)}{\sqrt{p(1-p)}}\right), \qquad p \in (0, 1).$$

Using simple calculus, it can easily be shown that $\dfrac{\sqrt{50}\,(0.65 - p)}{\sqrt{p(1-p)}}$ is a strictly decreasing function of $p$. Therefore the power function $K(p)$ is strictly increasing (as can also be seen from the plot of $K(p)$). Under $H_0: p \le 0.6$, the maximum of $K(p)$ is attained at $p = 0.6$, i.e. the size of the test is

$$\text{size} = \max_{p \le 0.6} K(p) = 1 - \Phi\left(\frac{\sqrt{50}\,(0.65 - 0.6)}{\sqrt{0.6(1 - 0.6)}}\right) = 1 - \Phi(0.72) \approx 0.2358.$$

There is at most a 23.6% chance of making a type I error. To construct a test with at most a 5% type I error probability, we may solve the equation

$$0.05 = 1 - \Phi\left(\frac{\sqrt{50}\,(c - 0.6)}{\sqrt{0.6(1 - 0.6)}}\right)$$

and obtain the critical value $c \approx 0.7140$, so that the corresponding test becomes

“Reject $H_0$ if $\hat{p} > 0.714$.”
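Solving for the critical value in Example 5.8 amounts to inverting the standard normal CDF. A minimal sketch with `statistics.NormalDist().inv_cdf`, which gives the upper-$\alpha$ quantile $z_{0.05} \approx 1.645$:

```python
from statistics import NormalDist

N, P0, ALPHA = 50, 0.6, 0.05

# Solve 1 - Phi(sqrt(n)(c - p0) / sqrt(p0(1 - p0))) = alpha for c:
# the standardized cutoff must equal z_alpha, the upper-alpha normal quantile.
z_alpha = NormalDist().inv_cdf(1 - ALPHA)               # about 1.645
critical_value = P0 + z_alpha * (P0 * (1 - P0) / N) ** 0.5
print(round(critical_value, 4))
```

This reproduces $c \approx 0.714$ without reading $z_{0.05}$ from a table.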


Definition

The power of a test is defined as the probability of not committing a type II error, i.e. power $= 1 - \beta$. For a composite alternative $H_1: \theta \in \Theta_1$, the power of the test at a point $\theta_1 \in \Theta_1$ is the value of the power function at that point:

$$\text{power at } \theta_1 = P(\text{Reject } H_0 \mid \theta_1) = K(\theta_1).$$

The power of a test is the probability of correctly rejecting the null hypothesis when the null hypothesis is false. The higher the power, the more sensitive the test is to detecting a deviation from the null hypothesis if one actually exists. Since we can always construct a test with a desired size, different test procedures are compared on the basis of their power.

Example 5.9

Consider the hypotheses in Example 5.6:

$$H_0: p \le 0.6 \quad \text{vs} \quad H_1: p > 0.6$$

and the test determined in Example 5.8 with size 0.05:

“Reject $H_0$ if $\hat{p} > 0.714$.”

The power function of this test is given by

$$K(p) = 1 - \Phi\left(\frac{\sqrt{50}\,(0.714 - p)}{\sqrt{p(1-p)}}\right), \qquad p \in (0, 1).$$

Using this power function, we can easily determine the power of the test:

at $p = 0.65$, power $= K(0.65) \approx 0.1714$;
at $p = 0.7$, power $= K(0.7) \approx 0.4145$;
at $p = 0.8$, power $= K(0.8) \approx 0.9358$; etc.

The test is not powerful when the new medicine slightly improves the cure rate,
and will be powerful only when there is a great improvement.
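The power values listed in Example 5.9 can be reproduced by evaluating the same normal-approximation power function with $c = 0.714$. The sketch also evaluates $K(0.6)$ as a check that the size is about 0.05.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def power(p: float, c: float = 0.714, n: int = 50) -> float:
    """Approximate power of 'reject H0 if p_hat > c' at true cure rate p."""
    return 1 - PHI(n ** 0.5 * (c - p) / (p * (1 - p)) ** 0.5)

print("size at p = 0.6:", round(power(0.6), 4))
for p in (0.65, 0.7, 0.8):
    print(f"power at p = {p}: {power(p):.4f}")
```

The steep rise between $p = 0.65$ and $p = 0.8$ is what makes the test weak against small improvements and strong against large ones.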


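In summary, the size and the power can be illustrated in the following graph of the power function:

[Figure: power function $K(p)$ of the test “Reject $H_0$ if $\hat{p} > 0.714$”, with the $H_0$ region ($p \le 0.6$) and the $H_1$ region ($p > 0.6$) marked; the size 0.05 is attained at $p = 0.6$, and the power 0.9358 is marked at $p = 0.8$.]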

General Steps in Testing Hypotheses

1. Identify a statistical model.

2. Formulate the null hypothesis $H_0$ and the alternative hypothesis $H_1$.

3. Specify the significance level $\alpha$, which is the maximum probability of a type I error that we allow.

4. Choose a test statistic $T$ with a known (and convenient, e.g. tabulated) null distribution (its distribution under $H_0$).

5. Based on the null distribution of $T$, derive the size of the test. Find the rejection region by setting the size to be less than or equal to the significance level $\alpha$.

6. Compute the value of $T$ from the observed data.

7. If $T$ falls in the rejection region, reject $H_0$ at significance level $\alpha$; otherwise, do not reject $H_0$.


Example 5.10

Standard medicine: cure rate $p_0 = 0.6$
New medicine: believed cure rate $p > 0.6$
Data: Out of 50 patients who took the new medicine, 42 were cured.

Is the new medicine better?

General Steps
1. Statistical model: $X$ = number of patients cured out of 50, $X \sim b(50, p)$.

2. $H_0: p \le 0.6$ vs $H_1: p > 0.6$

3. Require $\alpha = 0.05$, i.e. allow no more than a 5% chance of falsely rejecting $H_0$.

4. Choose the sample proportion $\hat{p} = X/n$ as the test statistic. Using the normal approximation,

$$\frac{\hat{p} - p}{\sqrt{p(1-p)/50}} \overset{\text{approx}}{\sim} N(0, 1).$$

5. Intuitively, a large value of $\hat{p}$ would be evidence supporting $H_1$, so the test should take the form “Reject $H_0$ if $\hat{p} > c$”. The size of such a test can be expressed as

$$\begin{aligned}
\text{size} &= \max_{p \le 0.6}\left[1 - \Phi\left(\frac{c - p}{\sqrt{p(1-p)/50}}\right)\right] \\
&= 1 - \Phi\left(\frac{c - 0.6}{\sqrt{0.6 \times 0.4/50}}\right) \qquad \text{(refer to Example 5.8)} \\
&= 1 - \Phi\left(\frac{c - 0.6}{0.0693}\right).
\end{aligned}$$

Setting the size equal to the significance level $\alpha = 0.05$, we have

$$0.05 = 1 - \Phi\left(\frac{c - 0.6}{0.0693}\right) \;\Rightarrow\; \frac{c - 0.6}{0.0693} = 1.645 \;\Rightarrow\; c \approx 0.7140.$$

Hence the rejection region is given by $\hat{p} > 0.7140$, i.e. $X > 35.7$. Since $X$ must be an integer, we reject $H_0$ if $X \ge 36$. Note that the size of this test is then less than 0.05.


6. From the observed data, $X = 42 \ge 36$.

7. Therefore we reject $H_0$ at the 5% significance level, i.e. the clinical trial data strongly suggest that the new medicine performs better than the standard medicine.
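The seven steps of Example 5.10 in code, as a sketch under the same normal approximation. The approximate size of the integer rule $X \ge 36$ is evaluated at the boundary $p = 0.6$ as $1 - \Phi\big((0.72 - 0.6)/0.0693\big)$; this is one simple way to confirm that the size drops below 0.05, not a computation taken from the notes.

```python
from statistics import NormalDist

n, x_obs, p0, alpha = 50, 42, 0.6, 0.05   # Example 5.10 data and settings

# Steps 4-5: under p = 0.6, p_hat is approximately N(0.6, 0.6 * 0.4 / 50)
se0 = (p0 * (1 - p0) / n) ** 0.5                     # about 0.0693
c = p0 + NormalDist().inv_cdf(1 - alpha) * se0       # about 0.7140

# X is an integer, so "p_hat > c" means X >= the smallest k with k / n > c
x_min = next(k for k in range(n + 1) if k / n > c)   # 36

# Approximate size of the integer rule, evaluated at the boundary p = 0.6
actual_size = 1 - NormalDist(p0, se0).cdf(x_min / n)

# Steps 6-7: compare the observed count with the rejection region
reject = x_obs >= x_min
print(f"c = {c:.4f}, reject if X >= {x_min}, approx. size = {actual_size:.4f}, reject H0: {reject}")
```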

§ 5.4 One Sample Test Based on Normal Population

Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution $N(\mu, \sigma_0^2)$, where $\sigma_0$ is a known constant.

Two sided test $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$

We may use the sample mean $\bar{X}$, which is the MLE of $\mu$, as the test statistic and reject $H_0$ if $\bar{X}$ differs too much from $\mu_0$.

By the sampling distribution of $\bar{X}$, we have $Z = \dfrac{\bar{X} - \mu}{\sigma_0/\sqrt{n}} \sim N(0, 1)$. Therefore, under $H_0$,

$$Z = \frac{\bar{X} - \mu_0}{\sigma_0/\sqrt{n}} \sim N(0, 1).$$

We would reject $H_0$ if $Z$ is too large or too small, i.e. if $|Z| > c$. The constant $c$ is determined by the significance level $\alpha$:

$$\alpha = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(|Z| > c \mid H_0 \text{ true}) = 2\,[1 - \Phi(c)].$$

Solving the equation gives the critical value $c = z_{\alpha/2}$, and hence the rejection rule is:

Reject $H_0$ at significance level $\alpha$ if $\left|\dfrac{\bar{X} - \mu_0}{\sigma_0/\sqrt{n}}\right| > z_{\alpha/2}$.

Using similar derivations, we can obtain the rejection rules for other settings of the hypotheses; they are summarized below.


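Two sided test $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$

Reject $H_0$ at significance level $\alpha$ if $\left|\dfrac{\bar{X} - \mu_0}{\sigma_0/\sqrt{n}}\right| > z_{\alpha/2}$.

One sided test $H_0: \mu \le \mu_0$ vs $H_1: \mu > \mu_0$

Reject $H_0$ at significance level $\alpha$ if $\dfrac{\bar{X} - \mu_0}{\sigma_0/\sqrt{n}} > z_{\alpha}$.

One sided test $H_0: \mu \ge \mu_0$ vs $H_1: \mu < \mu_0$

Reject $H_0$ at significance level $\alpha$ if $\dfrac{\bar{X} - \mu_0}{\sigma_0/\sqrt{n}} < -z_{\alpha}$.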

Example 5.11

A chemical process has produced on average 800 tons of chemicals per day. Some engineers claim that production has declined recently due to the depreciation of the machines. The daily yields over the past five days ($n = 5$) give a sample mean of $\bar{X} = 795$. Assume the amount produced each day is normally distributed with known variance $\sigma^2 = 75$. Should we accept their claim?

Test $H_0: \mu = 800$ vs $H_1: \mu < 800$ at $\alpha = 0.05$.

At the 5% significance level, we reject $H_0$ if $Z = \dfrac{\bar{X} - 800}{\sqrt{75/5}} < -z_{0.05} = -1.645$.

From the data, $Z = \dfrac{795 - 800}{\sqrt{75/5}} = -1.291 > -1.645$.

Therefore $H_0$ is not rejected at the 5% significance level, i.e. the data do not show a decrease in the mean daily yield from the chemical process.
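The three rejection rules for the known-variance case can be collected into one small function. This is a hypothetical helper written for illustration (the name `z_test` and its `alternative` strings are not a library API); it uses `statistics.NormalDist().inv_cdf` for the quantiles $z_\alpha$ and $z_{\alpha/2}$ and is checked against the numbers of Example 5.11.

```python
from statistics import NormalDist

def z_test(xbar: float, mu0: float, sigma0: float, n: int,
           alpha: float = 0.05, alternative: str = "two-sided") -> tuple[float, bool]:
    """One-sample z-test with known sigma0.

    alternative: "two-sided" (H1: mu != mu0), "greater" (H1: mu > mu0)
    or "less" (H1: mu < mu0). Returns (Z, reject_H0).
    """
    z = (xbar - mu0) / (sigma0 / n ** 0.5)
    q = NormalDist().inv_cdf  # standard normal quantile function
    if alternative == "two-sided":
        reject = abs(z) > q(1 - alpha / 2)   # |Z| > z_{alpha/2}
    elif alternative == "greater":
        reject = z > q(1 - alpha)            # Z > z_alpha
    elif alternative == "less":
        reject = z < -q(1 - alpha)           # Z < -z_alpha
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, reject

# Example 5.11: xbar = 795, mu0 = 800, sigma^2 = 75, n = 5, H1: mu < 800
z, reject = z_test(795, 800, 75 ** 0.5, 5, alternative="less")
print(round(z, 3), reject)
```

With the Example 5.11 inputs it returns $Z \approx -1.291$ and `False`, matching the non-rejection above.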

The above test procedure guarantees only a small (5%) chance of making a type I error. What is the chance of making a type II error if the mean daily yield actually decreases to 790?


In terms of the sample mean, the rejection rule can be expressed as

$$\frac{\bar{X} - 800}{\sqrt{75/5}} < -1.645 \iff \bar{X} < 793.63.$$

The power function of this test can be expressed as

$$K(\mu) = P(\text{Reject } H_0 \mid \mu) = \Phi\left(\frac{793.63 - \mu}{\sqrt{75/5}}\right).$$

The power of the test at $\mu = 790$ is calculated as

$$K(790) = \Phi\left(\frac{793.63 - 790}{\sqrt{75/5}}\right) = \Phi(0.937) \approx 0.8256,$$

and the corresponding type II error probability is $\beta = 1 - K(790) \approx 0.1744$.
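The cutoff 793.63 and the power $K(790)$ can be reproduced numerically. A minimal sketch, assuming the same setting as Example 5.11 ($\mu_0 = 800$, $\sigma^2 = 75$, $n = 5$, $\alpha = 0.05$):

```python
from statistics import NormalDist

MU0, SIGMA, N, ALPHA = 800.0, 75 ** 0.5, 5, 0.05
SE = SIGMA / N ** 0.5                                  # sqrt(75 / 5)

# Rejection rule on the xbar scale: reject if xbar < MU0 - z_alpha * SE
cutoff = MU0 - NormalDist().inv_cdf(1 - ALPHA) * SE    # about 793.63

def power(mu: float) -> float:
    """K(mu) = P(xbar < cutoff | mu), with xbar ~ N(mu, sigma^2 / n)."""
    return NormalDist(mu, SE).cdf(cutoff)

beta = 1 - power(790)  # type II error probability when mu = 790
print(f"cutoff = {cutoff:.2f}, K(790) = {power(790):.4f}, beta = {beta:.4f}")
```

Evaluating `power(800)` returns the size 0.05, which is a quick sanity check on the cutoff.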

Unknown Population Variance

Note that the above tests require knowing the value of the population variance $\sigma^2$. In practice, the value of $\sigma^2$ is usually unknown. In that case, we may estimate it by the sample variance $S^2$ and use Student's t-distribution, which leads to the following rejection rules:

Two sided test $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$

Reject $H_0$ at significance level $\alpha$ if $\left|\dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}\right| > t_{n-1,\alpha/2}$.

One sided test $H_0: \mu \le \mu_0$ vs $H_1: \mu > \mu_0$

Reject $H_0$ at significance level $\alpha$ if $\dfrac{\bar{X} - \mu_0}{S/\sqrt{n}} > t_{n-1,\alpha}$.

One sided test $H_0: \mu \ge \mu_0$ vs $H_1: \mu < \mu_0$

Reject $H_0$ at significance level $\alpha$ if $\dfrac{\bar{X} - \mu_0}{S/\sqrt{n}} < -t_{n-1,\alpha}$.


Example 5.12

Suppose the variance $\sigma^2$ in the previous example is actually unknown and is estimated by the sample variance $S^2 = 69.5$ from the data.

Test $H_0: \mu = 800$ vs $H_1: \mu < 800$ at the 5% significance level.

At the 5% significance level, we reject $H_0$ if $T = \dfrac{\bar{X} - 800}{S/\sqrt{5}} < -t_{4,0.05} = -2.132$.

From the data, $T = \dfrac{795 - 800}{\sqrt{69.5/5}} = -1.341 > -2.132$.

We conclude that $H_0$ is not rejected at the 5% significance level, i.e. the data do not show a decrease in the mean daily yield from the chemical process.
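The Python standard library has no t-distribution, so this sketch computes the t statistic of Example 5.12 and compares it with the tabulated value $t_{4,0.05} = 2.132$ quoted above. If SciPy is available, `scipy.stats.t.ppf(0.95, 4)` returns the same quantile; that dependency is optional and not assumed here.

```python
from math import sqrt

def t_statistic(xbar: float, mu0: float, s2: float, n: int) -> float:
    """One-sample t statistic (xbar - mu0) / (S / sqrt(n)), with S^2 the sample variance."""
    return (xbar - mu0) / sqrt(s2 / n)

# Example 5.12: xbar = 795, S^2 = 69.5, n = 5, H1: mu < 800
T = t_statistic(795, 800, 69.5, 5)
T_CRIT = 2.132  # t_{4, 0.05}, the t-table value quoted in the notes
reject = T < -T_CRIT
print(round(T, 3), reject)
```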

§ 5.5 Significance Probability (p-value)

The above test procedure, based on comparing the test statistic with a critical value, is called the classical approach. Another, equivalent approach relies on calculating a quantity called the significance probability, or simply the p-value. The p-value is the probability, under the assumption that $H_0$ is true, of observing the particular observed value of the test statistic or a more extreme value.

The smaller the magnitude of p-value, the stronger is the evidence against H 0 .

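Given a pre-assigned significance level $\alpha$, one can either compare the observed value of the test statistic with the critical value, or first compute the p-value from the observed value of the test statistic and then compare it with $\alpha$.

[Diagram: two equivalent routes to a decision: compute the p-value from the data and compare it with $\alpha$; or compute the critical value from $\alpha$ and compare it with the test statistic computed from the data.]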


The rejection rule

“Reject $H_0$ if the test statistic is greater than the critical value” [1]

is equivalent to

“Reject $H_0$ if p-value $< \alpha$”. [2]

To use procedure [1], we need to compute the critical value before drawing the conclusion. Since different people may choose different significance levels $\alpha$, different critical values will need to be determined.

On the other hand, procedure [2] only requires calculating the p-value. Moreover, the p-value is more informative than just stating whether or not a hypothesis is rejected. Reporting the p-value indicates how unlikely the observed event is under the null hypothesis, and users can make their own decision on what to conclude in the face of the evidence. Therefore, most statistical packages report the p-value after analysing the data.

Example 5.13

The calculation of the p-value depends on the test and the hypotheses. In Example 5.11, the test statistic would be “more extreme” if we observed $Z < -1.291$. The p-value can be calculated as

$$\text{p-value} = P(Z < -1.291 \mid H_0 \text{ true}) = \Phi(-1.291) \approx 0.0985,$$

which is larger than $\alpha = 0.05$. This leads to the non-rejection of $H_0$, consistent with the conclusion drawn in Example 5.11.
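The p-value in Example 5.13 follows directly from the standard normal CDF. The helper below is a hypothetical sketch covering the three alternatives used in these notes; its output can differ from a table lookup in the fourth decimal place because the table rounds $z$ to two decimals.

```python
from statistics import NormalDist

def p_value_z(z: float, alternative: str) -> float:
    """p-value of an observed z statistic under H0: Z ~ N(0, 1)."""
    phi = NormalDist().cdf
    if alternative == "less":        # H1: mu < mu0, extreme means small Z
        return phi(z)
    if alternative == "greater":     # H1: mu > mu0, extreme means large Z
        return 1 - phi(z)
    if alternative == "two-sided":   # extreme means large |Z|
        return 2 * (1 - phi(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")

z_obs = (795 - 800) / (75 / 5) ** 0.5   # Example 5.11 statistic, about -1.291
p = p_value_z(z_obs, "less")
print(f"p-value = {p:.4f}, reject at 5%: {p < 0.05}")
```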


§ 5.6 Two Sample Test Based on Normal Populations

The derivations of the test procedures are quite similar to the construction of confidence intervals in Section 4.2. For brevity, only the two-sample t-test under the equal variance assumption is illustrated here. The derivations for other situations are left as exercises.

Suppose we have two independent random samples of sizes $m$ and $n$ respectively:

$$X_1, X_2, \ldots, X_m \overset{\text{iid}}{\sim} N(\mu_x, \sigma^2), \qquad Y_1, Y_2, \ldots, Y_n \overset{\text{iid}}{\sim} N(\mu_y, \sigma^2).$$

Note that the two population variances are assumed to be the same.

Estimate $\sigma^2$ by the pooled sample variance:

$$S_{\text{pool}}^2 = \frac{(m-1)S_x^2 + (n-1)S_y^2}{m + n - 2}.$$

Recall, from Section 4.2, that

$$\frac{\bar{X} - \bar{Y} - (\mu_x - \mu_y)}{S_{\text{pool}}\sqrt{1/m + 1/n}} \sim t_{m+n-2}.$$

Therefore we can use the test statistic

$$T = \frac{\bar{X} - \bar{Y}}{S_{\text{pool}}\sqrt{1/m + 1/n}} \overset{H_0}{\sim} t_{m+n-2}$$

and follow the rejection rules given below:

Two sided test $H_0: \mu_x = \mu_y$ vs $H_1: \mu_x \neq \mu_y$

Reject $H_0$ at significance level $\alpha$ if $|T| > t_{m+n-2,\alpha/2}$.

One sided test $H_0: \mu_x \le \mu_y$ vs $H_1: \mu_x > \mu_y$

Reject $H_0$ at significance level $\alpha$ if $T > t_{m+n-2,\alpha}$.

One sided test $H_0: \mu_x \ge \mu_y$ vs $H_1: \mu_x < \mu_y$

Reject $H_0$ at significance level $\alpha$ if $T < -t_{m+n-2,\alpha}$.

Example 5.14

The Rejoy company claims that its shampoo performs better than the shampoo manufactured by another company, N&S. To support their claim, they want to compare the two brands in their ability to remove dandruff. An experiment was carried out in which 6 volunteers used Rejoy and 8 volunteers used N&S to wash their hair regularly for one week. The following tables show the data and summary statistics of the remaining dandruff on these volunteers after one week:

Number of pieces of remaining dandruff

| Brand | Observations |
|---|---|
| Rejoy (X) | 81, 102, 63, 121, 95, 76 |
| N&S (Y) | 130, 121, 151, 86, 156, 113, 91, 144 |

| | Rejoy | N&S |
|---|---|---|
| Sample size | $m = 6$ | $n = 8$ |
| Sample mean | $\bar{X} = 89.67$ | $\bar{Y} = 124.0$ |
| Sample variance | $S_x^2 = 427.07$ | $S_y^2 = 693.14$ |

Test $H_0: \mu_x = \mu_y$ vs $H_1: \mu_x < \mu_y$ at $\alpha = 0.05$.

Pooled sample variance:

$$S_{\text{pool}}^2 = \frac{(6-1)(427.07) + (8-1)(693.14)}{6 + 8 - 2} = 582.28$$

At the 5% significance level, we reject $H_0$ if $T = \dfrac{\bar{X} - \bar{Y}}{S_{\text{pool}}\sqrt{1/6 + 1/8}} < -t_{12,0.05} = -1.782$.

From the data, $T = \dfrac{89.67 - 124.0}{\sqrt{582.28}\,\sqrt{1/6 + 1/8}} = -2.635 < -1.782$.

Therefore $H_0$ is rejected at the 5% significance level, i.e. Rejoy performs significantly better than N&S.
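A sketch that recomputes the summary statistics and the pooled two-sample t statistic directly from the raw counts in the table of Example 5.14, which is a useful cross-check on hand-computed means and variances. `statistics.variance` uses the $n - 1$ divisor, matching $S_x^2$ and $S_y^2$; the critical value $t_{12,0.05} = 1.782$ is the table value quoted in the example.

```python
from math import sqrt
from statistics import mean, variance

# Raw data from Example 5.14 (remaining dandruff after one week)
rejoy = [81, 102, 63, 121, 95, 76]                 # X, m = 6
ns = [130, 121, 151, 86, 156, 113, 91, 144]        # Y, n = 8

def pooled_t(x: list[float], y: list[float]) -> tuple[float, float, int]:
    """Two-sample t statistic under equal variances.
    Returns (T, pooled variance, degrees of freedom m + n - 2)."""
    m, n = len(x), len(y)
    df = m + n - 2
    s2_pool = ((m - 1) * variance(x) + (n - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(s2_pool * (1 / m + 1 / n))
    return t, s2_pool, df

T, s2_pool, df = pooled_t(rejoy, ns)
T_CRIT = 1.782  # t_{12, 0.05} from the t table, as quoted in the notes
print(f"S2_pool = {s2_pool:.2f}, T = {T:.3f}, df = {df}, reject H0: {T < -T_CRIT}")
```

The statistic falls below $-1.782$, so the one-sided test rejects $H_0$, in line with the conclusion above.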


Remarks

1. For large samples, the t critical values approach the corresponding $Z$ critical values.

2. The above tests can easily be modified to handle more general hypotheses. For example, one may want to test whether $\mu_x$ exceeds $\mu_y$ by more than a specific number $c$. The one-sided hypotheses can then be formulated as

$$H_0: \mu_x - \mu_y \le c \quad \text{vs} \quad H_1: \mu_x - \mu_y > c,$$

and the test statistic becomes

$$T = \frac{\bar{X} - \bar{Y} - c}{S_{\text{pool}}\sqrt{1/m + 1/n}}.$$

3. As with the construction of confidence intervals, if the population variances are not equal, Satterthwaite's approximation may be used.

§ 5.7 Two Samples Variance Test

The following procedure allows us to test the equal variance hypothesis based on the sample variances $S_x^2$ and $S_y^2$ of two independent samples drawn from two normal populations.

Test statistic: $F = \dfrac{S_x^2}{S_y^2}$

Two sided test $H_0: \sigma_x^2 = \sigma_y^2$ vs $H_1: \sigma_x^2 \neq \sigma_y^2$

Reject $H_0$ at significance level $\alpha$ if $F > F_{m-1,n-1,\alpha/2}$ or $F < \dfrac{1}{F_{n-1,m-1,\alpha/2}}$.

One sided test $H_0: \sigma_x^2 \le \sigma_y^2$ vs $H_1: \sigma_x^2 > \sigma_y^2$

Reject $H_0$ at significance level $\alpha$ if $F > F_{m-1,n-1,\alpha}$.

One sided test $H_0: \sigma_x^2 \ge \sigma_y^2$ vs $H_1: \sigma_x^2 < \sigma_y^2$

Reject $H_0$ at significance level $\alpha$ if $F < \dfrac{1}{F_{n-1,m-1,\alpha}}$.


Example 5.15

From the shampoo data in Example 5.14, $m = 6$, $n = 8$, $S_x = 20.67$, $S_y = 26.33$. The following two-sided hypotheses will be tested:

$$H_0: \sigma_x^2 = \sigma_y^2 \quad \text{vs} \quad H_1: \sigma_x^2 \neq \sigma_y^2$$

Test statistic: $F = \dfrac{20.67^2}{26.33^2} = 0.616$

Critical values: $F_{5,7,0.025} = 5.29$, $\dfrac{1}{F_{7,5,0.025}} = \dfrac{1}{6.85} = 0.146$

The test statistic $F = 0.616$ lies between these two critical values. According to the test procedure, the hypothesis of equal variances is not rejected at the 5% significance level, i.e. the variances are not significantly different from each other.
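The F statistic of Example 5.15 can likewise be computed from the raw shampoo data. The two critical values are the table values quoted above rather than computed, since the standard library has no F quantile function (`scipy.stats.f.ppf` would be one option if SciPy is installed).

```python
from statistics import variance

rejoy = [81, 102, 63, 121, 95, 76]            # X, m = 6
ns = [130, 121, 151, 86, 156, 113, 91, 144]   # Y, n = 8

F = variance(rejoy) / variance(ns)            # S_x^2 / S_y^2

# Two-sided critical values at alpha = 0.05, from the F table as quoted above
upper = 5.29          # F_{5, 7, 0.025}
lower = 1 / 6.85      # 1 / F_{7, 5, 0.025}
reject = F > upper or F < lower
print(f"F = {F:.3f}, acceptance interval = ({lower:.3f}, {upper}), reject H0: {reject}")
```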

Remarks

1. As with the construction of confidence intervals, all the test procedures described above rely heavily on the assumption of normal populations. If the normality assumption is violated, the above procedures remain approximately valid only for location problems with large samples. For small-sample problems with non-normal populations, we need to use non-parametric statistical methods. Also, the equal variance test is not valid, even for large samples, if the normality assumption is not satisfied.

2. Independence between the two samples is also a crucial assumption for the above test procedures for two-sample problems.

3. The significance level must be assigned before we analyse the data.


§ 5.8 Paired Data

Sometimes data do not come in the form of independent samples. For example, the midterm and final examination results of the same group of students are obviously dependent. The IQ scores of a group of fathers may be dependent on the IQ scores of their sons. These kinds of dependent samples are called paired samples.

Data structure:

| Pair | Treatment 1 | Treatment 2 | Difference |
|---|---|---|---|
| 1 | $X_1$ | $Y_1$ | $D_1 = X_1 - Y_1$ |
| 2 | $X_2$ | $Y_2$ | $D_2 = X_2 - Y_2$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| $n$ | $X_n$ | $Y_n$ | $D_n = X_n - Y_n$ |

$$\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i = \bar{X} - \bar{Y}, \qquad S_D^2 = \frac{1}{n-1}\sum_{i=1}^{n}(D_i - \bar{D})^2$$

The pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ can be viewed as a random sample, and so can $D_1, D_2, \ldots, D_n$. Therefore the methods for the one-sample location problem can be used:

$$D_1, D_2, \ldots, D_n \overset{\text{iid}}{\sim} N(\delta, \sigma_D^2), \qquad \delta, \sigma_D^2 \text{ unknown}.$$

The hypothesis $H_0: \mu_x = \mu_y$ is equivalent to the hypothesis $H_0: \delta = 0$. Therefore, to compare $\mu_x$ with $\mu_y$, we can use the test statistic

$$T = \frac{\bar{D}}{S_D/\sqrt{n}} \overset{H_0}{\sim} t_{n-1}$$

and follow the rejection rules given below:

Two sided test $H_0: \delta = 0$ vs $H_1: \delta \neq 0$

Reject $H_0$ at significance level $\alpha$ if $|T| > t_{n-1,\alpha/2}$.

One sided test $H_0: \delta \le 0$ vs $H_1: \delta > 0$

Reject $H_0$ at significance level $\alpha$ if $T > t_{n-1,\alpha}$.


One sided test $H_0: \delta \ge 0$ vs $H_1: \delta < 0$

Reject $H_0$ at significance level $\alpha$ if $T < -t_{n-1,\alpha}$.

We can also derive a $100(1-\alpha)\%$ confidence interval for $\delta = \mu_x - \mu_y$, which is given by

$$\bar{D} \pm t_{n-1,\alpha/2}\,\frac{S_D}{\sqrt{n}}.$$

Example 5.16

Consider the comparison of the two brands of shampoo in the previous example. Suppose another experiment was carried out by N&S on five volunteers. Each of them used Rejoy for a week, followed by N&S for another week. The numbers of pieces of dandruff were recorded as follows:

Volunteer     A     B     C     D     E
Rejoy       105    62    78   112    96
N&S          97    70    54    85    93
$D_i$         8    −8    24    27     3

To determine whether N&S is better, we can use the paired t-test.

Test $H_0: \delta = 0$ vs $H_1: \delta > 0$ at $\alpha = 0.05$.

At 5% significance level, we will reject $H_0$ if $T = \dfrac{\bar{D}}{S_D/\sqrt{5}} > t_{4,0.05} = 2.132$.

From the data, $\bar{D} = 10.8$, $S_D = 14.65$, and $T = \dfrac{10.8}{14.65/\sqrt{5}} = 1.648 < 2.132$.

Therefore $H_0$ is not rejected at 5% significance level, i.e. N&S is not significantly better than Rejoy.

A 95% confidence interval for $\delta = \mu_x - \mu_y$ is given by

$$\bar{D} \pm t_{4,0.025}\,\frac{S_D}{\sqrt{n}} = 10.8 \pm 2.776 \times \frac{14.65}{\sqrt{5}} = 10.8 \pm 18.2 = (-7.4,\ 29.0).$$
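The arithmetic of Example 5.16 can be checked with a few lines of Python. This is a sketch: the critical values 2.132 and 2.776 are copied from the $t$-table values quoted in the example rather than computed, since the standard library has no $t$ quantile function, and the interval uses $t_{4,0.025} = 2.776$ as in the text.

```python
import math
from statistics import mean, stdev

rejoy = [105, 62, 78, 112, 96]
ns = [97, 70, 54, 85, 93]
d = [r - s for r, s in zip(rejoy, ns)]   # D_i = Rejoy - N&S

n = len(d)
d_bar, s_d = mean(d), stdev(d)
se = s_d / math.sqrt(n)                  # estimated standard error of D-bar
t_stat = d_bar / se

T_4_005, T_4_0025 = 2.132, 2.776         # t-table values quoted in the example
print(f"D-bar = {d_bar:.1f}, S_D = {s_d:.2f}, T = {t_stat:.3f}")
print("reject H0:", t_stat > T_4_005)     # one-sided test, H1: delta > 0
lo, hi = d_bar - T_4_0025 * se, d_bar + T_4_0025 * se
print(f"interval: ({lo:.1f}, {hi:.1f})")
```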


§ 5.9 Tests for Population Proportions

A hypothesis about a population proportion $p$ is usually formulated as $H_0: p = p_0$. Based on a sufficiently large sample of size $n$, we can use the normal approximation as the null distribution of the following test statistic:

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \overset{H_0}{\sim} N(0, 1) \text{ approximately},$$

where $\hat{p}$ is the sample proportion. The rejection rules are given below:

Two-sided test $H_0: p = p_0$ vs $H_1: p \neq p_0$

Reject $H_0$ at significance level $\alpha$ if $|Z| > Z_{\alpha/2}$.

One-sided test $H_0: p = p_0$ vs $H_1: p > p_0$

Reject $H_0$ at significance level $\alpha$ if $Z > Z_{\alpha}$.

One-sided test $H_0: p = p_0$ vs $H_1: p < p_0$

Reject $H_0$ at significance level $\alpha$ if $Z < -Z_{\alpha}$.
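A minimal Python sketch of this large-sample test (the name `one_proportion_z_test` and its `alternative` labels are hypothetical). The upper-$\alpha$ point $Z_\alpha$ is obtained from the standard library as `NormalDist().inv_cdf(1 - alpha)`.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(x, n, p0, alpha, alternative="two-sided"):
    """Large-sample z-test of H0: p = p0, with p0 in the standard error.

    x is the number of successes in n trials. Returns (Z, reject_H0).
    """
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()              # N(0, 1)
    if alternative == "two-sided":         # H1: p != p0
        return z, abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if alternative == "greater":           # H1: p > p0
        return z, z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":              # H1: p < p0
        return z, z < -std_normal.inv_cdf(1 - alpha)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
```

With the data of Example 5.17 below, `one_proportion_z_test(78, 100, 0.9, 0.01, "less")` gives $Z \approx -4$ and rejects $H_0$.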

Example 5.17

Is the old saying “9 out of 10 bald-headed people are rich” true?

Suppose that in a sample of 100 bald-headed people, we found 78 of them to be rich.

Test $H_0: p = 0.9$ vs $H_1: p < 0.9$ at $\alpha = 0.01$.

At 1% significance level, we reject $H_0$ if $Z = \dfrac{\hat{p} - 0.9}{\sqrt{0.9(0.1)/100}} < -Z_{0.01} = -2.326$.

From the data, $\hat{p} = \dfrac{78}{100} = 0.78$ and $Z = \dfrac{0.78 - 0.9}{\sqrt{0.9(0.1)/100}} = -4 < -2.326$.

Therefore $H_0$ is rejected at 1% significance level, i.e. the data provide sufficient evidence to reject the old saying.


Remark

Some statisticians may use the test statistic

$$Z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}},$$

as its null distribution is also asymptotically $N(0, 1)$. Using the sample proportion $\hat{p}$ instead of the hypothesized value $p_0$ in the denominator may give a better estimate of the standard error when the null hypothesis is clearly false. However, using $p_0$ can give a better approximation to an $\alpha$-level significance test. Thus there are trade-offs, and it is difficult to say that one is better than the other. Fortunately, the numerical answers are usually about the same.
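To see the two choices side by side, here is a small Python sketch using the data of Example 5.17. Because $\hat{p} = 0.78$ is far from $p_0 = 0.9$ there (a case where the null hypothesis is clearly false), the two standard errors differ noticeably: $Z = -4$ with $p_0$ versus roughly $-2.90$ with $\hat{p}$. Both values fall below $-Z_{0.01} = -2.326$, so the decision is the same.

```python
import math

n, x, p0 = 100, 78, 0.9          # data of Example 5.17
p_hat = x / n

z_null_se = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)          # SE uses p0
z_sample_se = (p_hat - p0) / math.sqrt(p_hat * (1 - p_hat) / n)  # SE uses p-hat

print(round(z_null_se, 2), round(z_sample_se, 2))
```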

Comparison of two proportions

Hypotheses concerning the comparison of two population proportions, $p_1$ and $p_2$, are usually formulated as $H_0: p_1 = p_2$. Based on two random samples independently drawn from these two populations, we have

$$\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}} \sim N(0, 1) \text{ approximately}$$

when the sample sizes $n_1$ and $n_2$ are large, where $\hat{p}_1, \hat{p}_2$ are the corresponding sample proportions. If $H_0: p_1 = p_2$ is true, the common proportion can be estimated by the pooled sample proportion

$$\hat{p}_{\text{pool}} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}.$$

The test statistic is therefore given by

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_{\text{pool}}(1-\hat{p}_{\text{pool}})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}.$$


The rejection rules are given below:

Two-sided test $H_0: p_1 = p_2$ vs $H_1: p_1 \neq p_2$

Reject $H_0$ at significance level $\alpha$ if $|Z| > Z_{\alpha/2}$.

One-sided test $H_0: p_1 = p_2$ vs $H_1: p_1 > p_2$

Reject $H_0$ at significance level $\alpha$ if $Z > Z_{\alpha}$.

One-sided test $H_0: p_1 = p_2$ vs $H_1: p_1 < p_2$

Reject $H_0$ at significance level $\alpha$ if $Z < -Z_{\alpha}$.
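The pooled test and its rejection rules can be sketched in Python as follows (a hypothetical helper, `two_proportion_z_test`, not part of the notes). Computing the pooled proportion as $(x_1 + x_2)/(n_1 + n_2)$ is the same as $(n_1\hat{p}_1 + n_2\hat{p}_2)/(n_1 + n_2)$.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2, alpha, alternative="two-sided"):
    """Large-sample z-test of H0: p1 = p2 using the pooled proportion.

    x1, x2 are success counts out of n1, n2 trials. Returns (Z, reject_H0).
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)        # equals (n1*p1_hat + n2*p2_hat)/(n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    std_normal = NormalDist()
    if alternative == "two-sided":        # H1: p1 != p2
        return z, abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if alternative == "greater":          # H1: p1 > p2
        return z, z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":             # H1: p1 < p2
        return z, z < -std_normal.inv_cdf(1 - alpha)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
```

With the data of Example 5.18 below, `two_proportion_z_test(27, 400, 15, 300, 0.05)` gives $Z \approx 0.965$ and does not reject $H_0$.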

Example 5.18

One production process yielded 27 defective pieces in a random sample of size 400, while another yielded 15 defective pieces in a random sample of size 300. Test the null hypothesis that the two processes yield equal proportions of defectives against the alternative hypothesis that the defective rates are different.

Test $H_0: p_1 = p_2$ vs $H_1: p_1 \neq p_2$ at $\alpha = 0.05$.

Sample proportions: $\hat{p}_1 = \dfrac{27}{400} = 6.75\%$, $\hat{p}_2 = \dfrac{15}{300} = 5\%$

Pooled sample proportion: $\hat{p}_{\text{pool}} = \dfrac{27 + 15}{400 + 300} = 6\%$

Test statistic: $Z = \dfrac{0.0675 - 0.05}{\sqrt{0.06 \times 0.94 \times \left(\dfrac{1}{400} + \dfrac{1}{300}\right)}} = 0.965$

Since $|Z| = 0.965 < 1.96 = Z_{0.025}$, we do not reject $H_0$ at $\alpha = 0.05$, i.e. the defective rates of the two processes are not significantly different.


Remark

Some statisticians may use the test statistic

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$$

so as to provide a better approximation to the standard error when the null hypothesis is clearly false. As mentioned before, there are trade-offs, and we usually have no strong preference one way or the other, as the two methods give about the same numerical result.
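A quick comparison of the pooled and unpooled statistics, sketched in Python with two hypothetical helpers `z_pooled` and `z_unpooled`. For the data of Example 5.18 the pooled statistic is about 0.965 and the unpooled one about 0.985, which illustrates how closely the two usually agree.

```python
import math

def z_pooled(x1, n1, x2, n2):
    """Z with the pooled proportion in the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    return (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

def z_unpooled(x1, n1, x2, n2):
    """Z with each sample proportion in its own variance term."""
    p1, p2 = x1 / n1, x2 / n2
    return (p1 - p2) / math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Example 5.18 data: 27 defectives out of 400, 15 out of 300
print(round(z_pooled(27, 400, 15, 300), 3), round(z_unpooled(27, 400, 15, 300), 3))
```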

Example 5.19

A tracking study at MIT found that, out of 198 ISP (Integrated Studies Program) students, 189 graduated within the time limit of six years; out of a random sample of 210 non-ISP students, 158 graduated within the time limit. The data can be tabulated as

                 ISP   Non-ISP
Graduated        189       158
Not Graduated      9        52
Total            198       210

We are interested in the difference between the population proportions of graduated students from the two programs, $p_{\text{ISP}} - p_N$.

Consider the test $H_0: p_{\text{ISP}} = p_N$ vs $H_1: p_{\text{ISP}} > p_N$ at $\alpha = 0.05$.

Sample proportions: $\hat{p}_{\text{ISP}} = \dfrac{189}{198} = 0.9545$, $\hat{p}_N = \dfrac{158}{210} = 0.7524$

Test statistic (using the unpooled standard error of the Remark above): $Z = \dfrac{0.9545 - 0.7524}{\sqrt{0.9545(0.0455)/198 + 0.7524(0.2476)/210}} = 6.0757$

Since $Z = 6.0757 > 1.645 = Z_{0.05}$, we reject $H_0$ at $\alpha = 0.05$, i.e. the graduation rate of ISP students is significantly higher than that of non-ISP students.
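Example 5.19 can be reproduced with a short Python sketch, using the unpooled standard error as the example does. Working with unrounded proportions gives $Z$ of roughly 6.08 rather than the 6.0757 obtained from the four-decimal proportions; the conclusion is unchanged.

```python
import math
from statistics import NormalDist

grad_isp, n_isp = 189, 198
grad_non, n_non = 158, 210
p_isp, p_non = grad_isp / n_isp, grad_non / n_non

# Unpooled standard error, as in the Remark preceding this example
se = math.sqrt(p_isp * (1 - p_isp) / n_isp + p_non * (1 - p_non) / n_non)
z = (p_isp - p_non) / se
z_crit = NormalDist().inv_cdf(0.95)   # Z_{0.05}, about 1.645

print(f"p_ISP = {p_isp:.4f}, p_N = {p_non:.4f}, Z = {z:.4f}")
print("reject H0:", z > z_crit)       # upper-tailed test, H1: p_ISP > p_N
```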
