
Statistics07 TwoSamplesHypothesisTest

The document outlines the process of hypothesis testing for two independent samples, focusing on comparing mean values and proportions. It details the six steps involved in hypothesis testing, the construction of test statistics (z and t), and various cases based on known or unknown population variances. Additionally, it discusses methods for comparing proportions and provides guidance on using statistical software like Excel for calculations.

Uploaded by Nguyễn Đại
Copyright © All Rights Reserved

Hypothesis tests for two independent samples

• Compare mean values of two populations
• Compare two proportions

Two independent samples model

Model of two groups of objects that differ in
a) intervention levels, or
b) individual properties

6 steps of hypothesis testing

We follow 6 steps to perform hypothesis testing when comparing two populations:
1. Specify the null and alternative hypotheses
2. Determine the test statistic
3. Specify the significance level
4. Define the decision rule
5. Calculate the value of the test statistic
6. Draw conclusions
Problem 1. Compare two mean values

Let $(X_1, X_2, \ldots, X_n)$ be a sample of n independent observations from a variable X with expectation $\mu_1$ and variance $\sigma_1^2$, and let $(Y_1, Y_2, \ldots, Y_m)$ be a sample of m independent observations from a variable Y with expectation $\mu_2$ and variance $\sigma_2^2$.

Problem: Compare the two expectations $\mu_1$ and $\mu_2$.
⇒ Estimate and compare the two sample means $\bar{X}$ and $\bar{Y}$.
Case 1: Testing a hypothesis about μ1 – μ2 when the population variances are known

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
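A minimal Python sketch of the Case 1 statistic, assuming the population variances are known. The function name `z_test_known_variances` and the summary numbers are hypothetical; only the standard library is used, with `statistics.NormalDist` supplying the two-sided p-value P(|N(0,1)| ≥ |z|).

```python
from math import sqrt
from statistics import NormalDist


def z_test_known_variances(mean1, mean2, var1, var2, n1, n2, delta0=0.0):
    """Two-sample z-test of H0: mu1 - mu2 = delta0 with KNOWN population variances."""
    # Standard error of the difference of the two sample means
    se = sqrt(var1 / n1 + var2 / n2)
    z = ((mean1 - mean2) - delta0) / se
    # Two-sided p-value: P(|N(0,1)| >= |z|)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical summary data; sigma1^2 = 16 and sigma2^2 = 9 are assumed known
z, p = z_test_known_variances(52.0, 50.0, 16.0, 9.0, 40, 30)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

With these made-up numbers z ≈ 2.39 > 1.96, so H0: μ1 = μ2 would be rejected at α = 5%.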
Cases 2 - 3: Testing a hypothesis about μ1 – μ2 when the population variances are unknown
In practice, the z-statistic is rarely used, because the population variances σ1² and σ2² are usually unknown and are estimated by the sample variances s1² and s2².

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Instead of a z-statistic, we construct a t-statistic using the sample variances (s1² and s2²).
Two cases are considered when producing the t-statistic:
Case 2: The two unknown population variances are equal.
Case 3: The two unknown population variances are not equal.
Case 2: Unknown but equal variances
Calculate the pooled variance estimator:

$$S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Construct the equal-variances t-statistic as follows:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \text{d.f.} = n_1 + n_2 - 2$$

Perform an equal-variances t-test of μ1 – μ2:

H0: μ1 - μ2 = 0
HA: μ1 - μ2 ≠ 0

The problem can be solved using the following theorem.

Theorem. Let $(X_1, X_2, \ldots, X_n)$ and $(Y_1, Y_2, \ldots, Y_m)$ be two samples of independent observations drawn, respectively, from a variable X with sample mean $\bar{X}$ and sample variance $S_X^2$ and from a variable Y with sample mean $\bar{Y}$ and sample variance $S_Y^2$ (both variables normally distributed with a common variance). If the hypothesis H is true ($\mu_1 = \mu_2$), then the statistic

$$t = (\bar{X} - \bar{Y})\sqrt{\frac{n\,m}{n + m}}\,\sqrt{\frac{n + m - 2}{(n - 1)S_X^2 + (m - 1)S_Y^2}}$$

has a Student distribution with (n+m-2) degrees of freedom.
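As a check on the theorem, the following short derivation (added here, not part of the original slides) shows that its statistic is exactly the Case 2 equal-variances t-statistic with n1 = n, n2 = m and μ1 − μ2 = 0:

```latex
\begin{aligned}
S_p^2 &= \frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2},
\qquad \frac{1}{n} + \frac{1}{m} = \frac{n+m}{n\,m},\\[4pt]
\frac{\bar{X}-\bar{Y}}{\sqrt{S_p^2\left(\frac{1}{n}+\frac{1}{m}\right)}}
&= (\bar{X}-\bar{Y})\sqrt{\frac{n\,m}{n+m}}\,
   \sqrt{\frac{n+m-2}{(n-1)S_X^2+(m-1)S_Y^2}}.
\end{aligned}
```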
Hypothesis Tests

Hypothesis H: µ1 = µ2
Alternative Hypothesis K: µ1 ≠ µ2
Steps of testing
Step 1. Estimate the sample means Mean(X), Mean(Y) and the sample variances Var(X), Var(Y).

Step 2. Calculate the quantity

$$t = \big(\mathrm{Mean}(X) - \mathrm{Mean}(Y)\big)\sqrt{\frac{n\,m}{n + m}}\,\sqrt{\frac{n + m - 2}{(n - 1)\,\mathrm{Var}(X) + (m - 1)\,\mathrm{Var}(Y)}}$$
Step 3 (p-value approach). Taking a variable T(n+m-2) with the Student distribution with (n + m - 2) degrees of freedom, calculate the p-value (probability)

b = P { | T(n+m-2) | ≥ | t | }

Step 4. Compare the p-value b with a significance level α fixed in advance (5%, 1%, 0.5% or 0.1%):

+ If b ≥ α ⇒ accept Hypothesis H: no significant difference between µ1 and µ2
+ If b < α ⇒ reject Hypothesis H and conclude µ1 ≠ µ2
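Steps 1 to 4 can be sketched in Python as follows, assuming SciPy is installed. `two_sample_t_test` is a hypothetical name and the data are made up; `scipy.stats.t.sf` returns the upper-tail probability 1 − F(x), and SciPy's own `ttest_ind(..., equal_var=True)` is used only as a cross-check.

```python
from math import sqrt
from statistics import mean, variance

from scipy import stats


def two_sample_t_test(x, y, alpha=0.05):
    """Steps 1-4 of the equal-variances test of H: mu1 = mu2 against K: mu1 != mu2."""
    n, m = len(x), len(y)
    # Step 1: sample means and sample variances (divisor n - 1)
    mean_x, mean_y = mean(x), mean(y)
    var_x, var_y = variance(x), variance(y)
    # Step 2: the statistic from the theorem
    t = (mean_x - mean_y) * sqrt(n * m / (n + m)) * sqrt(
        (n + m - 2) / ((n - 1) * var_x + (m - 1) * var_y)
    )
    # Step 3: p-value b = P(|T(n+m-2)| >= |t|), using sf(x) = 1 - cdf(x)
    b = 2 * stats.t.sf(abs(t), df=n + m - 2)
    # Step 4: compare b with the significance level fixed in advance
    decision = "reject H" if b < alpha else "accept H"
    return t, b, decision


x = [1, 2, 3, 4, 5]   # hypothetical sample from X
y = [2, 4, 6, 8, 10]  # hypothetical sample from Y
t, b, decision = two_sample_t_test(x, y)
print(f"t = {t:.4f}, p-value = {b:.4f}, {decision}")

# Cross-check against SciPy's built-in pooled-variance t-test
ref = stats.ttest_ind(x, y, equal_var=True)
print(ref.statistic, ref.pvalue)
```

For these data t ≈ −1.897 and the p-value lies between 0.05 and 0.10, so H is accepted at α = 5%.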
Using Excel to Compute t - Distribution

• Excel has two functions for computing cumulative probabilities and x values for any t-distribution:
• T.DIST is used to compute the cumulative probability given an x value.
• T.INV is used to compute the x value given a cumulative probability.
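Outside Excel, the same quantities are available in SciPy (an assumption about the reader's toolset, not part of the original slides): `scipy.stats.t.cdf(x, df)` plays the role of `T.DIST(x, df, TRUE)` and `scipy.stats.t.ppf(p, df)` that of `T.INV(p, df)`. The two-sided p-value of Step 3 is 2·(1 − cdf(|t|)); in Excel it can also be obtained with `T.DIST.2T(ABS(t), df)`. The numbers below are illustrative.

```python
from scipy import stats

df = 8  # degrees of freedom, e.g. n + m - 2 with n = m = 5

# Cumulative probability P(T <= x), like Excel's T.DIST(2, 8, TRUE)
print(stats.t.cdf(2.0, df))

# x value for a cumulative probability, like Excel's T.INV(0.975, 8)
crit = stats.t.ppf(0.975, df)
print(crit)  # two-sided 5% critical value

# Two-sided p-value for a hypothetical observed statistic
t_obs = -1.897
p_two_sided = 2 * (1 - stats.t.cdf(abs(t_obs), df))
print(p_two_sided)
```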
Version B. Using Student critical value
Calculate the critical value T(n+m-2)(1-α/2) of the Student distribution with n+m-2 degrees of freedom (α is a significance level fixed in advance, e.g. 5%, 1% or 0.5%).
Decide
- Reject Hypothesis H if |t| ≥ T(n+m-2)(1-α/2)
- Accept Hypothesis H if |t| < T(n+m-2)(1-α/2)
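A sketch of the Version B decision rule, assuming SciPy; the helper name `critical_value_decision` and the observed value t = −1.897 are hypothetical.

```python
from scipy import stats


def critical_value_decision(t, df, alpha=0.05):
    """Version B: compare |t| with the Student critical value T(df)(1 - alpha/2)."""
    crit = stats.t.ppf(1 - alpha / 2, df)
    decision = "reject H" if abs(t) >= crit else "accept H"
    return crit, decision


# Hypothetical observed statistic with n + m - 2 = 8 degrees of freedom
for alpha in (0.05, 0.10):
    crit, decision = critical_value_decision(-1.897, 8, alpha)
    print(f"alpha = {alpha}: critical value = {crit:.3f}, {decision}")
```

With 8 degrees of freedom the critical value is about 2.306 for α = 5% (accept H) and about 1.860 for α = 10% (reject H). Apart from the boundary case |t| equal to the critical value, Version B gives the same decision as the p-value approach.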
Version C. Using confidence intervals
When the degrees of freedom (sample sizes) are large, the Student distribution approximates the Normal distribution. Then we can use confidence intervals (with significance level α = 5%) for testing:

$$\left(\mathrm{Mean}(X) - T^{(n-1)}(1-\alpha/2)\,\frac{SD(X)}{\sqrt{n}}\ ;\ \mathrm{Mean}(X) + T^{(n-1)}(1-\alpha/2)\,\frac{SD(X)}{\sqrt{n}}\right),$$

$$\left(\mathrm{Mean}(Y) - T^{(m-1)}(1-\alpha/2)\,\frac{SD(Y)}{\sqrt{m}}\ ;\ \mathrm{Mean}(Y) + T^{(m-1)}(1-\alpha/2)\,\frac{SD(Y)}{\sqrt{m}}\right)$$

Decide
Reject Hypothesis H if the two intervals are disjoint
Accept Hypothesis H if the two intervals have a nonempty intersection
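A sketch of Version C, assuming SciPy and hypothetical helper names `mean_ci` and `ci_overlap_decision`. Note that this rule is more conservative than the t-test: disjoint intervals do indicate a significant difference, but overlapping intervals do not by themselves establish µ1 = µ2.

```python
from math import sqrt
from statistics import mean, stdev

from scipy import stats


def mean_ci(sample, alpha=0.05):
    """Interval Mean -/+ T(n-1)(1 - alpha/2) * SD / sqrt(n) for one sample."""
    n = len(sample)
    half_width = stats.t.ppf(1 - alpha / 2, n - 1) * stdev(sample) / sqrt(n)
    return mean(sample) - half_width, mean(sample) + half_width


def ci_overlap_decision(x, y, alpha=0.05):
    """Version C: reject H only if the two intervals are disjoint."""
    lo_x, hi_x = mean_ci(x, alpha)
    lo_y, hi_y = mean_ci(y, alpha)
    overlap = max(lo_x, lo_y) <= min(hi_x, hi_y)
    return "accept H" if overlap else "reject H"


# Hypothetical samples
print(ci_overlap_decision([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(ci_overlap_decision([1, 2, 3, 4, 5], [11, 12, 13, 14, 15]))
```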
Case 3: Unknown and unequal variances
Construct the unequal-variances t-statistic as follows:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
\quad\text{with}\quad
\text{d.f.} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$

Then the hypothesis testing procedure is the same as in Case 2.
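A sketch of Case 3 (often called Welch's test), assuming SciPy; `welch_t_test` is a hypothetical name and the data are made up. The d.f. given by the formula above is usually not an integer, which `scipy.stats.t` accepts; SciPy's `ttest_ind(..., equal_var=False)` serves as a cross-check.

```python
from math import sqrt
from statistics import mean, variance

from scipy import stats


def welch_t_test(x, y):
    """Case 3: unequal-variances t-statistic, its d.f., and the two-sided p-value."""
    n1, n2 = len(x), len(y)
    v1 = variance(x) / n1  # s1^2 / n1
    v2 = variance(y) / n2  # s2^2 / n2
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p_value = 2 * stats.t.sf(abs(t), df)  # non-integer d.f. is allowed
    return t, df, p_value


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 6, 8, 10]
t, df, p = welch_t_test(x, y)
print(f"t = {t:.4f}, d.f. = {df:.3f}, p-value = {p:.4f}")

# Cross-check against SciPy's unequal-variances t-test
ref = stats.ttest_ind(x, y, equal_var=False)
print(ref.statistic, ref.pvalue)
```

For these data d.f. = 6.25 / 1.0625 ≈ 5.88, smaller than the n1 + n2 − 2 = 8 of the pooled test.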
Problem 2. Compare two proportions – the case of large sample sizes

Let $(X_1, X_2, \ldots, X_{n_1})$ be a sample of a binary variable X taking the value 1 with probability $p_1$ and the value 0 with probability $(1 - p_1)$, and let $(Y_1, Y_2, \ldots, Y_{n_2})$ be a sample of a binary variable Y taking the value 1 with probability $p_2$ and the value 0 with probability $(1 - p_2)$, where $p_1, p_2 \in (0, 1)$.

Consider the Hypothesis H: p1 = p2
and the Alternative Hypothesis K: p1 ≠ p2
Note. Variable X has expectation p1 and variance p1(1 - p1); variable Y has expectation p2 and variance p2(1 - p2).
Therefore we can treat the testing problem as a special case of comparing two mean values (expectations) p1 and p2.
By the de Moivre-Laplace theorem, for large sample sizes,
n1×p1 ≥ 5 and n1×(1-p1) ≥ 5,
n2×p2 ≥ 5 and n2×(1-p2) ≥ 5,
the sample proportions m(p1)/n1 and m(p2)/n2 of occurrences of the value 1 are approximately normally distributed with expectations p1, p2 and variances p1×(1-p1)/n1, p2×(1-p2)/n2, respectively. Denote m1 = m(p1) and m2 = m(p2).
If the Hypothesis H is true, then treat the two samples $(X_1, \ldots, X_{n_1})$ and $(Y_1, \ldots, Y_{n_2})$ as samples collected from one variable and estimate the common variance of X and Y by

$$\frac{m_1 + m_2}{n_1 + n_2}\left(1 - \frac{m_1 + m_2}{n_1 + n_2}\right) = \frac{m_1 + m_2}{n_1 + n_2}\cdot\frac{n_1 + n_2 - m_1 - m_2}{n_1 + n_2},$$

then form the statistic

$$u = \left(\frac{m_1}{n_1} - \frac{m_2}{n_2}\right) \Big/ \sqrt{\frac{m_1 + m_2}{n_1 + n_2}\cdot\frac{n_1 + n_2 - m_1 - m_2}{n_1 + n_2}\cdot\frac{n_1 + n_2}{n_1\,n_2}}$$

for testing, where m1 and m2 are, respectively, the numbers of values 1 that appear in the two samples.
By the Central Limit Theorem, when the sample sizes are large, the difference Mean(X) - Mean(Y) has a distribution very close to the Normal distribution. The testing procedure is then as follows:
Step 1. Calculate the value of the statistic

$$u = \left(\frac{m_1}{n_1} - \frac{m_2}{n_2}\right) \Big/ \sqrt{\frac{m_1 + m_2}{n_1 + n_2}\cdot\frac{n_1 + n_2 - m_1 - m_2}{n_1 + n_2}\cdot\frac{n_1 + n_2}{n_1\,n_2}}$$
Step 2. Using the standard Normal distribution N(0,1), find the probability (p-value)
b = P { | N(0,1) | > | u | }
Step 3. Compare the probability b (p-value) to the significance level α fixed in advance:
* If b ≥ α ⇒ accept Hypothesis H: no significant difference between the two proportions
* If b < α ⇒ reject Hypothesis H and conclude that the two proportions are different
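The three steps can be sketched with the standard library alone; `two_proportion_u_test` and the counts are hypothetical. The denominator is the square root of the pooled variance estimate multiplied by (n1 + n2)/(n1·n2) = 1/n1 + 1/n2.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_u_test(m1, n1, m2, n2, alpha=0.05):
    """Large-sample test of H: p1 = p2 using the pooled statistic u."""
    pooled = (m1 + m2) / (n1 + n2)  # common proportion estimated under H
    # Common variance estimate times (1/n1 + 1/n2) = (n1 + n2) / (n1 * n2)
    se = sqrt(pooled * (1 - pooled) * (n1 + n2) / (n1 * n2))
    u = (m1 / n1 - m2 / n2) / se                 # Step 1
    b = 2 * (1 - NormalDist().cdf(abs(u)))       # Step 2: P(|N(0,1)| > |u|)
    decision = "reject H" if b < alpha else "accept H"  # Step 3
    return u, b, decision


# Hypothetical counts: 45 ones out of 100 versus 30 ones out of 100
u, b, decision = two_proportion_u_test(45, 100, 30, 100)
print(f"u = {u:.3f}, p-value = {b:.4f}, {decision}")
```

For 45/100 versus 30/100, u ≈ 2.19 and the p-value is about 0.028, so H is rejected at α = 5%. Plugging the observed counts into the large-sample conditions, 45, 55, 30 and 70 are all at least 5.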
Version B. Using Normal critical value

Look up in the table of the Normal distribution the critical value uα/2 of the Normal distribution (for α = 5% the critical value equals 1.96).
Decide
- Reject Hypothesis H if |u| ≥ uα/2
- Accept Hypothesis H if |u| < uα/2
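Instead of a printed table, the critical value uα/2 can be computed with the standard library's `statistics.NormalDist` (Python 3.8+); the helper name `normal_critical_value` and the observed value u = 2.19 are hypothetical.

```python
from statistics import NormalDist


def normal_critical_value(alpha=0.05):
    """Two-sided critical value u_{alpha/2}: P(|N(0,1)| > u) = alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: u = {normal_critical_value(alpha):.4f}")

# Decision for a hypothetical observed value of the statistic
u_obs = 2.19
decision = "reject H" if abs(u_obs) >= normal_critical_value(0.05) else "accept H"
print(decision)
```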
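Version C. Using confidence intervals
Use the confidence intervals (with significance level α) of the estimated proportions for testing:

$$\left(\frac{m_1}{n_1} - Z_{1-\alpha/2}\sqrt{\frac{m_1}{n_1}\left(1 - \frac{m_1}{n_1}\right)\Big/ n_1}\ ;\ \frac{m_1}{n_1} + Z_{1-\alpha/2}\sqrt{\frac{m_1}{n_1}\left(1 - \frac{m_1}{n_1}\right)\Big/ n_1}\right)$$

$$\left(\frac{m_2}{n_2} - Z_{1-\alpha/2}\sqrt{\frac{m_2}{n_2}\left(1 - \frac{m_2}{n_2}\right)\Big/ n_2}\ ;\ \frac{m_2}{n_2} + Z_{1-\alpha/2}\sqrt{\frac{m_2}{n_2}\left(1 - \frac{m_2}{n_2}\right)\Big/ n_2}\right)$$

Decide
Reject Hypothesis H if the two intervals are disjoint
Accept Hypothesis H if the two intervals have a nonempty intersection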
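A sketch of Version C for proportions using only the standard library; `proportion_ci` and `proportion_ci_decision` are hypothetical names and the counts are made up.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(m, n, alpha=0.05):
    """Interval m/n -/+ Z_{1-alpha/2} * sqrt((m/n)(1 - m/n)/n)."""
    p_hat = m / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def proportion_ci_decision(m1, n1, m2, n2, alpha=0.05):
    """Version C: reject H only when the two intervals are disjoint."""
    lo1, hi1 = proportion_ci(m1, n1, alpha)
    lo2, hi2 = proportion_ci(m2, n2, alpha)
    return "accept H" if max(lo1, lo2) <= min(hi1, hi2) else "reject H"


print(proportion_ci(45, 100), proportion_ci(30, 100))
print(proportion_ci_decision(45, 100, 30, 100))
```

With 45/100 and 30/100 the intervals (about 0.352 to 0.548 and 0.210 to 0.390) overlap, so this rule accepts H, even though the u-statistic for the same counts is about 2.19 > 1.96. The interval-overlap rule is the more conservative of the two.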
Compare several proportions
Let X be a binary variable taking the two values 0 and 1. Collecting data on that variable under k different conditions, we obtain a sample containing k groups of observations, one group per condition.
Let p1, p2, ..., pk be the probabilities that X takes the value 1 under each of the k conditions.
Hypothesis
H: p1 = p2 = ... = pk
Alternative Hypothesis
K: there is some difference among p1, p2, ..., pk
Data: Form a 2×k table with 2 rows and k columns: one column for each group, the 1st row for the value 1 and the 2nd row for the value 0 of the variable:

Table 1. Observed frequency

$$n_1 = n_{11} + n_{12} + \cdots + n_{1k};\qquad n_0 = n_{01} + n_{02} + \cdots + n_{0k}$$

$$n^{(j)} = n_{1j} + n_{0j},\quad j = 1, 2, \ldots, k;\qquad n = n_0 + n_1$$
• If the hypothesis is correct, the proportion of occurrences of 1, estimated jointly from all columns (conditions), equals n1/n.
• The proportion of occurrences of 0, estimated jointly from all columns, equals n0/n.
Form the table of expected (theoretical) frequencies under the hypothesis:

Table 2. Predicted (expected) frequency
(the expected frequency in row i, column j is $n_i\, n^{(j)}/n$)

Compute the test statistic:

$$\chi^2 = \sum_{j=1}^{k}\sum_{i=0}^{1} \frac{\left(n_{ij} - \dfrac{n_i\, n^{(j)}}{n}\right)^2}{\dfrac{n_i\, n^{(j)}}{n}}$$
LEMMA. Suppose that hypothesis H is true. Then the variable $\chi^2$ has approximately the Chi-squared distribution with $(k - 1)$ degrees of freedom, $\chi^2(k-1)$.
(Figure: Density function of the Chi-squared distribution)
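A sketch of the whole computation for a 2 × k table, assuming SciPy for the Chi-squared tail probability; `chi_square_proportions` and the counts are hypothetical. The expected frequency of each cell is n_i·n^(j)/n, as in the formula for the test statistic.

```python
from scipy import stats


def chi_square_proportions(ones, zeros):
    """Chi-squared test of H: p1 = ... = pk from a 2 x k table.

    ones[j]  = n_{1j}, the count of value 1 in group j
    zeros[j] = n_{0j}, the count of value 0 in group j
    """
    k = len(ones)
    n1, n0 = sum(ones), sum(zeros)  # row totals
    n = n1 + n0                     # grand total
    chi2 = 0.0
    for j in range(k):
        col = ones[j] + zeros[j]    # column total n^(j)
        for observed, row_total in ((ones[j], n1), (zeros[j], n0)):
            expected = row_total * col / n  # n_i * n^(j) / n
            chi2 += (observed - expected) ** 2 / expected
    df = k - 1
    b = stats.chi2.sf(chi2, df)     # p-value P(chi2(k-1) > chi2)
    return chi2, df, b


# Hypothetical 2 x 3 table: 100 observations under each of k = 3 conditions
chi2, df, b = chi_square_proportions([20, 30, 25], [80, 70, 75])
print(f"chi2 = {chi2:.4f}, d.f. = {df}, p-value = {b:.4f}")
```

Each column here has 100 observations, so the expected frequencies are 25 (value 1) and 75 (value 0); χ² = 8/3 ≈ 2.67 with 2 degrees of freedom, giving a p-value of about 0.26. SciPy's `chi2_contingency(table, correction=False)` returns the same statistic.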
Using Excel to Compute Chi-squared Distribution

• Excel has two functions for computing cumulative probabilities and x values for any Chi-squared distribution:
• CHISQ.DIST is used to compute the cumulative probability given an x value; the p-value is the right-tail probability 1 − CHISQ.DIST(x, k−1, TRUE).
• CHISQ.INV is used to compute the x value given a cumulative probability; the critical value for significance level α is CHISQ.INV(1 − α, k−1).
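The SciPy counterparts of these Excel functions, as a sketch assuming SciPy is installed: `scipy.stats.chi2.cdf(x, df)` corresponds to `CHISQ.DIST(x, df, TRUE)`, `scipy.stats.chi2.sf(x, df)` to the right-tail `CHISQ.DIST.RT(x, df)`, and `scipy.stats.chi2.ppf(p, df)` to `CHISQ.INV(p, df)`. The values below are illustrative.

```python
from scipy import stats

df = 2  # k - 1 for k = 3 groups

# Cumulative probability, like Excel's CHISQ.DIST(2.6667, 2, TRUE)
cdf_value = stats.chi2.cdf(2.6667, df)

# Right-tail p-value, like CHISQ.DIST.RT(2.6667, 2)
p_value = stats.chi2.sf(2.6667, df)

# Critical value for alpha = 0.05, like CHISQ.INV(0.95, 2)
critical = stats.chi2.ppf(0.95, df)
print(cdf_value, p_value, critical)
```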
Method A (p-value):

Step 1. Taking a variable χ²(k-1) with the Chi-squared distribution with (k-1) degrees of freedom, calculate the probability (p-value)
b = P {χ²(k-1) > χ²}.
Step 2. Compare the probability b to the significance level α fixed in advance:
* If b ≥ α ⇒ accept hypothesis H: no significant difference among the proportions
* If b < α ⇒ reject hypothesis H and confirm that some of the proportions differ.
Method B. (Critical value)
Look up in the table of the Chi-squared distribution the critical value χ²(k-1)(α), the value exceeded with probability α by the Chi-squared distribution with k-1 degrees of freedom (α is a significance level fixed in advance, e.g. 5%, 1% or 0.5%).
Decide
- Reject Hypothesis H: p1 = ... = pk if χ² ≥ χ²(k-1)(α)
- Accept Hypothesis H: p1 = ... = pk if χ² < χ²(k-1)(α)
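Methods A and B can be applied side by side to show that they agree; this is a sketch assuming SciPy, with a hypothetical helper `chi_square_decisions` and made-up values of the statistic.

```python
from scipy import stats


def chi_square_decisions(chi2_value, k, alpha=0.05):
    """Apply Method A (p-value) and Method B (critical value) to the same statistic."""
    df = k - 1
    b = stats.chi2.sf(chi2_value, df)         # Method A: P(chi2(k-1) > chi2)
    critical = stats.chi2.ppf(1 - alpha, df)  # Method B: upper-alpha point
    method_a = "reject H" if b < alpha else "accept H"
    method_b = "reject H" if chi2_value >= critical else "accept H"
    return method_a, method_b


# Hypothetical statistics for k = 3 groups
print(chi_square_decisions(8 / 3, k=3))
print(chi_square_decisions(7.5, k=3))
```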
Test for two related (paired) samples

• Compare two mean values
C. Model of two dependent (paired) samples

• The two dependent samples model is used in a study when
• A) each object in the first sample is chosen together with a similar (paired) object in the second sample, or
• B) each object in the second sample is the same as one in the first sample, but the measurements in the two samples are taken under different conditions.
Compare mean values of two related samples

For related variables X and Y, comparing their mean values is equivalent to comparing the mean value of the difference variable X – Y with 0 ⇒ the problem reduces to the one-sample model.
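The reduction to a one-sample problem can be checked numerically, assuming SciPy: a paired t-test on (X, Y) gives the same result as a one-sample t-test of the differences D = X − Y against 0. The measurements are hypothetical.

```python
from scipy import stats

# Hypothetical paired measurements on the same 6 objects under two conditions
x = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
y = [11.6, 11.5, 12.2, 12.0, 11.8, 11.9]

# Difference variable D = X - Y, one value per pair
d = [xi - yi for xi, yi in zip(x, y)]

paired = stats.ttest_rel(x, y)        # paired-samples t-test
one_sample = stats.ttest_1samp(d, 0)  # one-sample t-test of mean(D) against 0
print(paired.statistic, one_sample.statistic)
print(paired.pvalue, one_sample.pvalue)
```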
Hypothesis H: μX = μY
Alternative Hypothesis K: μX ≠ μY
where μX and μY are the expectations of X and Y.

With D = X – Y and μD = μX – μY, compare the expectation of D to 0:
Hypothesis H: μD = 0
Alternative Hypothesis K: μD ≠ 0
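With the empirical value of the test statistic

$$t = \frac{\bar{D}}{S_D/\sqrt{n}},$$

where $\bar{D}$ and $S_D$ are the sample mean and sample standard deviation of the n differences:

a) Compare the empirical value of the t-test statistic with the critical value $T^{(n-1)}(1-\alpha/2)$, which is the $100(1-\alpha/2)$-th percentile of the Student distribution with n-1 degrees of freedom:
• If $|t| \ge T^{(n-1)}(1-\alpha/2)$, reject the hypothesis H,
• If $|t| < T^{(n-1)}(1-\alpha/2)$, accept H.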
b) Taking a random variable T with the Student distribution with n-1 degrees of freedom, calculate the probability of significance
b = P { |T| ≥ |t| }

Compare the probability of significance b with the significance level α:
• If b < α, reject H,
• If b ≥ α, accept H.
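c) Determine the 95% confidence interval of the estimate of μD:

$$\left(\bar{D} - T^{(n-1)}(0.975)\,\frac{S_D}{\sqrt{n}}\ ;\ \bar{D} + T^{(n-1)}(0.975)\,\frac{S_D}{\sqrt{n}}\right)$$

Compare 0 to the confidence interval:
• If 0 lies outside the interval, reject H,
• If 0 lies inside the interval, accept H.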
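The three decision rules a), b) and c) can be sketched together, assuming SciPy; `paired_t_test` and the data are hypothetical, and SciPy's `ttest_rel` is computed alongside as a cross-check.

```python
from math import sqrt
from statistics import mean, stdev

from scipy import stats


def paired_t_test(x, y, alpha=0.05):
    """Paired test of H: mu_D = 0 via (a) critical value, (b) p-value, (c) interval."""
    d = [xi - yi for xi, yi in zip(x, y)]     # differences D = X - Y
    n = len(d)
    se = stdev(d) / sqrt(n)                    # S_D / sqrt(n)
    t = mean(d) / se                           # empirical t statistic
    crit = stats.t.ppf(1 - alpha / 2, n - 1)   # T(n-1)(1 - alpha/2)
    b = 2 * stats.t.sf(abs(t), n - 1)          # b = P(|T| >= |t|)
    ci = (mean(d) - crit * se, mean(d) + crit * se)
    return {
        "a": "reject H" if abs(t) >= crit else "accept H",
        "b": "reject H" if b < alpha else "accept H",
        "c": "accept H" if ci[0] <= 0 <= ci[1] else "reject H",
        "t": t,
        "p_value": b,
        "ci": ci,
    }


x = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]  # hypothetical paired data
y = [11.6, 11.5, 12.2, 12.0, 11.8, 11.9]
result = paired_t_test(x, y)
print(result)

ref = stats.ttest_rel(x, y)  # cross-check
print(ref.statistic, ref.pvalue)
```

For these data t ≈ 2.98 with 5 degrees of freedom, and all three rules reject H at α = 5%. The rules are equivalent: 0 lies outside the interval exactly when |t| exceeds the critical value, which happens exactly when b < α (ignoring the boundary case).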
