Testing of Hypothesis - Two Samples


Hypothesis Testing (Significance Test)

TWO INDEPENDENT UNIVARIATE POPULATIONS
Sampling
In order to compare two groups (populations), we have to select
samples from both groups. If the observations in one sample
are independent of those in the other, the samples are
called independent samples.
E.g. Suppose we want to compare two drugs. We select a sample of
patients and randomly allocate them to the two drugs. These two
groups of patients (and also the observations coming from them)
constitute independent samples, since the patients were randomly
allocated to the two groups corresponding to the two drugs.
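
The random allocation described above can be sketched as follows. This is only an illustration: the patient IDs, the group names and the fixed seed are assumptions made for the example, not part of the original slides.

```python
import random

def allocate(patients, seed=0):
    """Randomly split a list of patient IDs into two treatment groups."""
    rng = random.Random(seed)   # fixed seed only so the sketch is reproducible
    shuffled = list(patients)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical patient IDs P01 ... P20
patients = [f"P{i:02d}" for i in range(1, 21)]
drug_a, drug_b = allocate(patients)
print(len(drug_a), len(drug_b))
```

Because every patient lands in exactly one group and the assignment ignores patient characteristics, the two resulting samples can be treated as independent.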
Significance test for difference between population proportions
Notations:
• p1 (p2) : population proportion of success in the first
(second) group.
• n1 (n2) : sizes of random samples drawn from the first
(second) populations.

Assumptions :
• Independent random samples from the two groups.
• Large enough sample sizes so that in each sample there
are at least 5 successes and 5 failures.

Hypotheses:

$$H_0 : p_1 = p_2 \qquad H_0 : p_1 = p_2 \qquad H_0 : p_1 = p_2$$
$$H_1 : p_1 > p_2 \qquad H_1 : p_1 < p_2 \qquad H_1 : p_1 \ne p_2$$
Test Statistic: difference between sample proportions $= \hat{p}_1 - \hat{p}_2$

Let x1 and x2 represent the number of observations that belong to
the class of interest in samples 1 and 2, respectively.

$$\hat{p}_1 = \frac{x_1}{n_1}; \qquad \hat{p}_2 = \frac{x_2}{n_2}$$

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

By CLT (approximately),

$$\hat{p}_1 - \hat{p}_2 \sim N\left( p_1 - p_2,\; \hat{p}(1 - \hat{p})\left( \frac{1}{n_1} + \frac{1}{n_2} \right) \right)$$

$$z_{obs} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \sim N(0,1), \text{ under } H_0$$
Critical Region :

Right tail : $\{ z_{obs} \ge z_{\alpha} \}$
Left tail : $\{ z_{obs} \le -z_{\alpha} \}$
Two tail : $\{ z_{obs} \ge z_{\alpha/2} \text{ or } z_{obs} \le -z_{\alpha/2} \}$
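
The critical values z_α and z_{α/2} come from the standard normal distribution. A minimal sketch of looking them up and applying the three rejection rules, using only Python's standard library; the function names `z_critical` and `reject_h0` are hypothetical and chosen for this example:

```python
from statistics import NormalDist

def z_critical(alpha, tail):
    """Upper critical value of N(0,1): z_alpha (one tail) or z_{alpha/2} (two tail)."""
    if tail not in ("right", "left", "two"):
        raise ValueError("tail must be 'right', 'left' or 'two'")
    upper_area = alpha / 2 if tail == "two" else alpha
    return NormalDist().inv_cdf(1 - upper_area)

def reject_h0(z_obs, alpha, tail):
    """Return True if z_obs falls in the critical region."""
    z = z_critical(alpha, tail)
    if tail == "right":
        return z_obs >= z
    if tail == "left":
        return z_obs <= -z
    return abs(z_obs) >= z      # two tail

print(round(z_critical(0.05, "right"), 3))  # 1.645
print(round(z_critical(0.05, "two"), 3))    # 1.96
```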
Example:
Two population Proportions
Is there a significant difference between the
proportion of men and the proportion of women
who will vote “Yes” on a proposal from the local
administration?

• In a random sample, 36 of 72 men and 31 of 50
women indicated they would vote “Yes”.

• Test at the .05 level of significance.


Example: contd..

H0: p1 – p2 = 0 (the two proportions are equal)
H1: p1 – p2 ≠ 0 (there is a significant difference between proportions)

sample proportion of men: $\hat{p}_1 = 36/72 = 0.5$
sample proportion of women: $\hat{p}_2 = 31/50 = 0.62$

The pooled estimate for the overall proportion is:

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{36 + 31}{72 + 50} = \frac{67}{122} = .549$$
Example: contd.

The test statistic for testing H0

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} = \frac{(.50 - .62) - (0)}{\sqrt{.549\,(1 - .549)\left( \dfrac{1}{72} + \dfrac{1}{50} \right)}} = -1.31$$

Critical Values = ±1.96 for α = .05

Decision: Do not reject H0

Conclusion: There is not significant evidence of a
difference in proportions who will vote yes between men and
women.
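
The arithmetic of this example can be checked with a short script. This is a sketch using only the standard library; the function name `two_proportion_z` is hypothetical, and the two-sided p-value is an addition that the slides do not report.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Return (z_obs, pooled proportion, two-sided p-value) for H0: p1 = p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                        # pooled estimate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # standard error under H0
    z_obs = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))      # two-sided, not in the slides
    return z_obs, p_pool, p_value

# Men: 36 of 72 vote "Yes"; women: 31 of 50 vote "Yes"
z_obs, p_pool, p_value = two_proportion_z(36, 72, 31, 50)
print(round(z_obs, 2), round(p_pool, 3))   # -1.31 0.549
```

Since |z_obs| is below 1.96, the script agrees with the decision not to reject H0 at the .05 level.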
Significance test for difference between population means
(population sd’s are known; both small and large samples)
Notations:
• μ1 (μ2) : population mean in the first (second) group.
• σ1 (σ2) : population sd in the first (second) group.
• n1 (n2) : sizes of random samples drawn from the first
(second) populations.

Assumptions :
• Independent random samples from the two populations
(normal distributions) are drawn.
• n1 and n2 may be large or small.

Hypotheses:

$$H_0 : \mu_1 = \mu_2 \qquad H_0 : \mu_1 = \mu_2 \qquad H_0 : \mu_1 = \mu_2$$
$$H_1 : \mu_1 > \mu_2 \qquad H_1 : \mu_1 < \mu_2 \qquad H_1 : \mu_1 \ne \mu_2$$
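Test Statistic: difference between sample means
$= \hat{\mu}_1 - \hat{\mu}_2 = \bar{x}_1 - \bar{x}_2$ (unbiased estimators)

By CLT,

$$\bar{x}_1 - \bar{x}_2 \;(\sim)\; N\left( \mu_1 - \mu_2,\; \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \right)$$

$$z_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0,1), \text{ under } H_0$$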
Critical Region :

Right tail : $\{ z_{obs} \ge z_{\alpha} \}$
Left tail : $\{ z_{obs} \le -z_{\alpha} \}$
Two tail : $\{ z_{obs} \ge z_{\alpha/2} \text{ or } z_{obs} \le -z_{\alpha/2} \}$
Example
A product developer is interested in reducing the drying time of
primer paint. Two formulations of the paint are tested.
Formulation-1 is the standard chemistry and Formulation-2 has a
new drying ingredient that should reduce the drying time. From
experience it is known that the standard deviation of drying time is
8 minutes and this inherent variability should be unaffected by the
addition of the new ingredient. Ten specimens are painted with
Formulation-1 and another 10 specimens are painted with
Formulation-2; the 20 specimens are painted in random order. The
two sample average drying times are 121 minutes and 112 minutes
respectively. What conclusions can the product developer draw
about the effectiveness of the new ingredient, using α = 0.05?
Example: contd..
X1 : drying time from formulation 1
X2 : drying time from formulation 2
Let
$$X_1 \sim N(\mu_1, \sigma_1^2), \qquad X_2 \sim N(\mu_2, \sigma_2^2)$$
X1 and X2 are independently distributed.

$$H_0 : \mu_1 = \mu_2 \qquad H_1 : \mu_1 > \mu_2$$

$$Z_{obs} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
Example : contd..
Reject H0 if $Z_{obs} \ge 1.645$ at α = 0.05
Computations:
Since $\bar{x}_1 = 121$ minutes, $\bar{x}_2 = 112$ minutes,
$\sigma_1^2 = \sigma_2^2 = 8^2 = 64$ minutes² and n1 = n2 = 10, the value of the test
statistic is

$$Z_{obs} = \frac{121 - 112}{\sqrt{\dfrac{8^2}{10} + \dfrac{8^2}{10}}} = 2.52$$

Conclusion: Since $Z_{obs} = 2.52 > 1.645$, we reject H0: μ1 - μ2 = 0 at the 0.05
level of significance and conclude that adding the new ingredient to the
paint significantly reduces the drying time.
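
The same computation as a short script, as a sketch using only the standard library. The function name `two_sample_z` and its `delta0` parameter (the hypothesized difference, 0 here) are assumptions made for this example.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, delta0=0.0):
    """z statistic for H0: mu1 - mu2 = delta0 when both population sd's are known."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (mean1 - mean2 - delta0) / se

# Drying-time example: means 121 and 112 minutes, sigma = 8 minutes, n = 10 each
z_obs = two_sample_z(121, 112, 8, 8, 10, 10)
z_crit = NormalDist().inv_cdf(0.95)     # right-tail critical value at alpha = 0.05
reject = z_obs >= z_crit
print(round(z_obs, 2), reject)          # 2.52 True
```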
Significance test for difference between population means
(population sd’s are unknown; small sample)
Notations:
• μ1 (μ2) : population mean in the first (second) group.
• σ1 (σ2) : population sd in the first (second) group.
• n1 (n2) : sizes of random samples drawn from the first
(second) populations.

Assumptions :
• Independent random samples are drawn from normal
distributions
• σ1 = σ2 = σ (say)

Hypotheses:

$$H_0 : \mu_1 = \mu_2 \qquad H_0 : \mu_1 = \mu_2 \qquad H_0 : \mu_1 = \mu_2$$
$$H_1 : \mu_1 > \mu_2 \qquad H_1 : \mu_1 < \mu_2 \qquad H_1 : \mu_1 \ne \mu_2$$
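Test Statistic: difference between sample means
$= \hat{\mu}_1 - \hat{\mu}_2 = \bar{x}_1 - \bar{x}_2$ (unbiased estimators)

$$\hat{\sigma}_1 = s_1' = \sqrt{\frac{1}{n_1 - 1} \sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2}$$

$$\hat{\sigma}_2 = s_2' = \sqrt{\frac{1}{n_2 - 1} \sum_{i=1}^{n_2} (x_{2i} - \bar{x}_2)^2}$$

Pooled estimator of $\sigma^2$ is

$$s'^2 = \frac{(n_1 - 1)\, s_1'^2 + (n_2 - 1)\, s_2'^2}{n_1 + n_2 - 2}$$

$$\bar{x}_1 - \bar{x}_2 \sim N\left( \mu_1 - \mu_2,\; \sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right) \right)$$

$$\frac{(n_1 + n_2 - 2)\, s'^2}{\sigma^2} \sim \chi^2_{n_1 + n_2 - 2}$$

$$t_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s' \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}, \text{ under } H_0$$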
Critical Region :

Right tail : $\{ t_{obs} \ge t_{\alpha;\, n_1 + n_2 - 2} \}$
Left tail : $\{ t_{obs} \le -t_{\alpha;\, n_1 + n_2 - 2} \}$
Two tail : $\{ t_{obs} \ge t_{\alpha/2;\, n_1 + n_2 - 2} \text{ or } t_{obs} \le -t_{\alpha/2;\, n_1 + n_2 - 2} \}$
Example
Samples of scores on an examination given in statistics are

Men : 72 69 98 66 85 76 79 80 77
Women : 81 67 90 78 81 80 76

Is the mean score of women the same as that of men?

Hypothesis:
H0 : µf = µm H1: µf ≠ µm
Solution:
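                                Women          Men
Mean                            79             78
Variance                        47.33333333    90
Observations                    7              9
Pooled Variance                 71.71428571
Hypothesized Mean Difference    0
df                              14
t Stat                          0.234318967
P(T<=t) one-tail                0.409064729
t Critical one-tail             1.761310115
P(T<=t) two-tail                0.818129458
t Critical two-tail             2.144786681

Decision: Since |t Stat| = 0.234 < 2.145 (two-tail p-value 0.818), do not reject H0.
There is no significant evidence that the mean scores of women and men differ.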
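
The pooled variance and the t statistic in this output can be reproduced from the raw scores. This is a sketch with the standard library only; the function name `pooled_t` is hypothetical, and the two-tail critical value is copied from the output above rather than computed, because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Return (t_obs, df, pooled variance) for H0: mu1 = mu2, assuming equal sd's."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)   # divisors n1 - 1 and n2 - 1
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df    # pooled estimator of sigma^2
    t_obs = (mean(sample1) - mean(sample2)) / sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t_obs, df, sp_sq

women = [81, 67, 90, 78, 81, 80, 76]
men = [72, 69, 98, 66, 85, 76, 79, 80, 77]

t_obs, df, sp_sq = pooled_t(women, men)
T_CRIT_TWO_TAIL = 2.144786681           # t critical two-tail, df = 14, from the output above
reject = abs(t_obs) >= T_CRIT_TWO_TAIL
print(round(t_obs, 4), df, round(sp_sq, 4), reject)
```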
BIVARIATE POPULATION
Sampling

Two samples are dependent if the observations in one sample are
not independent of the observations in the other sample.
In this case, each subject is measured (or observed) at two times.
The observations recorded at the two time points (for all the
subjects) constitute the two samples. Thus, both samples have
the same subjects, and the population is therefore called bivariate.
Such data are called “PAIRED DATA”.
E.g. We take a group of students and measure their weights before
and after they are subjected to a change of diet.
Significance test for difference between population means
Notations:
• μ1 (μ2) : population mean in the first (second) group.
• σ1 (σ2) : population sd in the first (second) group.
• n : sample size ; “n” pairs of data
• Let (x11, x21), (x12, x22), … , (x1n, x2n) be a set of n paired
observations of a sample drawn from two populations with
means μ1 and μ2 and variances σ1² and σ2² respectively.

Assumptions :
• samples are drawn from normal distributions.
PAIRED t-TEST
• Define the differences between each pair of observations as
Dj = x1j - x2j, j = 1,2, … , n.

• Then the Dj’s are assumed to be normally distributed with mean
μD = μ1 - μ2 and variance σD².

• Testing hypotheses about the difference between μ1 and μ2 is
accomplished by performing a one-sample t-test on μD.

• Hypotheses:
$$H_0 : \mu_D = 0 \qquad H_0 : \mu_D = 0 \qquad H_0 : \mu_D = 0$$
$$H_1 : \mu_D > 0 \qquad H_1 : \mu_D < 0 \qquad H_1 : \mu_D \ne 0$$
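Test Statistic

• Test statistic for testing H0 is given by

$$t_{obs} = \frac{\bar{D} - \mu_D}{\hat{\sigma}_D / \sqrt{n}}$$

which follows a t-distribution with (n-1) degrees of freedom under H0, where

$$\bar{D} = \frac{1}{n} \sum_{j=1}^{n} D_j, \qquad \hat{\sigma}_D^2 = s_D^2 = \frac{1}{n - 1} \sum_{j=1}^{n} (D_j - \bar{D})^2$$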
Critical Region :

Right tail : $\{ t_{obs} \ge t_{\alpha;\, n-1} \}$
Left tail : $\{ t_{obs} \le -t_{\alpha;\, n-1} \}$
Two tail : $\{ t_{obs} \ge t_{\alpha/2;\, n-1} \text{ or } t_{obs} \le -t_{\alpha/2;\, n-1} \}$
Example
Advertisements by a fitness center claim that its course will
result in losing weight. A random sample of eight recent
participants showed the following weights before and after
completing the course. What should you conclude?
Serial no : 1 2 3 4 5 6 7 8
Weight (before) : 155 228 141 162 211 164 184 172
Weight (after) : 154 207 147 157 196 150 170 165
Solution
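t-Test: Paired Two Sample for Means

                                Before         After
Mean                            177.125        168.25
Variance                        857.8392857    485.6429
Observations                    8              8
Pearson Correlation             0.981101026
Hypothesized Mean Difference    0
df                              7
t Stat                          2.861003291
P(T<=t) one-tail                0.012151345
t Critical one-tail             1.894578604
P(T<=t) two-tail                0.024302691
t Critical two-tail             2.364624251

Decision: With D = before - after, the claim corresponds to H1: μD > 0. Since
t Stat = 2.861 > 1.895 (one-tail p-value 0.012), reject H0 at the 0.05 level.
The data support the claim that the course results in weight loss.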
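
The paired t statistic can be recomputed from the before/after weights. This is a sketch using only the standard library; the function name `paired_t` is hypothetical, and the one-tail critical value is copied from the output above because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Return (t_obs, df) for H0: mu_D = 0, where D_j = first_j - second_j."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t_obs = mean(diffs) / (stdev(diffs) / sqrt(n))   # stdev uses divisor n - 1
    return t_obs, n - 1

before = [155, 228, 141, 162, 211, 164, 184, 172]
after = [154, 207, 147, 157, 196, 150, 170, 165]

t_obs, df = paired_t(before, after)
T_CRIT_ONE_TAIL = 1.894578604           # t critical one-tail, df = 7, from the output above
reject = t_obs >= T_CRIT_ONE_TAIL       # right-tail test of H1: mu_D > 0
print(round(t_obs, 3), df, reject)      # 2.861 7 True
```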
