Statistics07 TwoSamplesHypothesisTest
Independent samples: sample variances $s_1^2$, $s_2^2$ and sample sizes $n_1$, $n_2$.
Instead of a z-statistic, we construct a t-statistic using
the sample variances ($s_1^2$ and $s_2^2$).
Cases 2 - 3: Testing a hypothesis about μ1 –
μ2 when the population variances are
unknown
Two cases are considered when producing the t-
statistic:
Case 2: The two unknown population variances
are equal.
Case 3: The two unknown population variances
are not equal.
Case 2: Unknown but equal
variances
Calculate the pooled variance estimator by:
$$ s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2} $$
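The test statistic is
$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad d.f. = n_1 + n_2 - 2 $$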
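Equivalently, for samples X of size n and Y of size m, the statistic
$$ t = (\bar{X} - \bar{Y})\,\sqrt{\frac{n\,m\,(n + m - 2)}{(n + m)\left[(n - 1)\,S_X^2 + (m - 1)\,S_Y^2\right]}} $$
has Student distribution with (n+m-2) degrees of freedom when µ1 = µ2.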
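The pooled-variance statistic above can be sketched in plain Python. This is a minimal illustration, not part of the original slides: the function name `pooled_t_statistic` and the sample data in the usage note are hypothetical, and the statistic is computed under the hypothesis µ1 = µ2.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(x, y):
    """Return (t, degrees of freedom, pooled variance) for two independent samples.

    Assumes equal (unknown) population variances and tests mu1 - mu2 = 0.
    """
    n, m = len(x), len(y)
    s2_x, s2_y = variance(x), variance(y)  # sample variances, divisor (size - 1)
    # Pooled variance estimator s_p^2.
    sp2 = ((n - 1) * s2_x + (m - 1) * s2_y) / (n + m - 2)
    # t = (mean difference) / sqrt(s_p^2 * (1/n + 1/m))
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2, sp2


def pooled_t_compact(x, y):
    """Same statistic written in the compact (X, Y, n, m) form."""
    n, m = len(x), len(y)
    ss = (n - 1) * variance(x) + (m - 1) * variance(y)
    return (mean(x) - mean(y)) * math.sqrt(n * m * (n + m - 2) / ((n + m) * ss))
```

For example, `pooled_t_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])` gives a pooled variance of 6.25 with 8 degrees of freedom, and both functions return the same value of t.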
Hypothesis Tests
Hypothesis
H: µ1 = µ2
Alternative Hypothesis
K: µ1 ≠ µ2
Steps of testing
Step 1. Compute the sample means
Mean(X), Mean(Y) and the sample variances
Var(X), Var(Y)
Step 2. Compute the value t of the test statistic
Step 3. Compute the p-value
b = P { | T(n+m-2) | ≥ | t | }
Step 4. Compare the p-value b with a preassigned
significance level α (5%, 1%, 0.5% or 0.1%):
reject Hypothesis H if b ≤ α, accept Hypothesis H if b > α
Version B. Using Student critical value
Calculate the critical value T(n+m-2)(1-α/2) of
the Student distribution with n+m-2
degrees of freedom (α is a preassigned
significance level: 5%, 1% or 0.5%)
Decide
- Reject Hypothesis H if
|t| ≥ T(n+m-2)(1-α/2)
- Accept Hypothesis H if
|t| < T(n+m-2)(1-α/2)
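The p-value comparison and the critical-value rule of Version B can be sketched together using only the standard library. The sketch below is an assumption-laden illustration, not the slides' own procedure: it evaluates the Student distribution by Simpson's rule and finds the critical value by bisection, and all function names are hypothetical. It is intended for moderate values of |t|; a statistics library would normally be used instead.

```python
import math


def student_pdf(x, df):
    """Density of the Student distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df):
    """b = P{|T(df)| >= |t|} = 1 - 2 * P{0 <= T(df) <= |t|}, by Simpson's rule."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    steps = 2 * max(500, math.ceil(a * 100))  # even number of subintervals
    h = a / steps
    total = student_pdf(0.0, df) + student_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def critical_value(df, alpha):
    """Solve P{|T(df)| >= c} = alpha for c, i.e. c = T(df)(1 - alpha/2)."""
    lo, hi = 0.0, 1.0
    while two_sided_p_value(hi, df) > alpha:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(50):  # bisection: the p-value decreases as c grows
        mid = (lo + hi) / 2.0
        if two_sided_p_value(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def decide(t, df, alpha=0.05):
    """Apply both decision rules; they should agree."""
    b = two_sided_p_value(t, df)
    c = critical_value(df, alpha)
    return {
        "p_value": b,
        "critical_value": c,
        "reject_by_p_value": b <= alpha,
        "reject_by_critical_value": abs(t) >= c,
    }
```

Calling `decide(t, n + m - 2)` with the t value from the pooled statistic applies both rules at α = 5%; the two flags should always coincide, since |t| ≥ T(n+m-2)(1-α/2) exactly when b ≤ α.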
Version C. Using confidence intervals
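When the number of degrees of freedom (sample size) is large,
the Student distribution is close to the Normal
distribution. Then we can use confidence
intervals (with significance level 5%, i.e. confidence level 95%) for
testing:
$$ \left( Mean(X) - T_{(n-1)}(1-\alpha/2)\,\frac{SD(X)}{\sqrt{n}} \;;\; Mean(X) + T_{(n-1)}(1-\alpha/2)\,\frac{SD(X)}{\sqrt{n}} \right), $$
$$ \left( Mean(Y) - T_{(m-1)}(1-\alpha/2)\,\frac{SD(Y)}{\sqrt{m}} \;;\; Mean(Y) + T_{(m-1)}(1-\alpha/2)\,\frac{SD(Y)}{\sqrt{m}} \right) $$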
Decide
Reject Hypothesis H if the two intervals
are disjoint
Accept Hypothesis H if the two intervals have
a nonempty intersection
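A minimal sketch of the interval-overlap check follows. It is an illustration under a stated assumption: because Version C is meant for large samples, the Normal quantile from the standard library (`statistics.NormalDist`) is used in place of the Student quantile. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def large_sample_interval(sample, alpha=0.05):
    """Confidence interval Mean +/- z * SD / sqrt(n), with z the Normal quantile.

    Assumption: n is large, so the Normal quantile stands in for the
    Student quantile with n - 1 degrees of freedom.
    """
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * stdev(sample) / math.sqrt(n)
    centre = mean(sample)
    return centre - half_width, centre + half_width


def intervals_disjoint(a, b):
    """True when intervals a = (lo, hi) and b = (lo, hi) do not intersect."""
    return a[1] < b[0] or b[1] < a[0]


def reject_by_intervals(x, y, alpha=0.05):
    """Reject H when the two confidence intervals are disjoint."""
    return intervals_disjoint(large_sample_interval(x, alpha),
                              large_sample_interval(y, alpha))
```

Note that this rule is stricter than the t-test: two intervals can overlap slightly even when the t-test rejects H, so a nonempty intersection is weaker evidence for H than a large p-value.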
Case 3: Unknown and unequal
variances
Construct the unequal-variances t-statistic as follows:
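$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$
with
$$ d.f. = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}} $$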
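The unequal-variances statistic and its degrees of freedom can be sketched as follows. This is a minimal illustration under the hypothesis µ1 = µ2; the function name `unequal_variances_t` is hypothetical. The degrees of freedom are generally not an integer.

```python
import math
from statistics import mean, variance


def unequal_variances_t(x, y):
    """Return (t, d.f.) for two independent samples with unequal variances."""
    n1, n2 = len(x), len(y)
    v1 = variance(x) / n1  # s1^2 / n1
    v2 = variance(y) / n2  # s2^2 / n2
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    # Approximate degrees of freedom from the formula above.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

When the two sample variances and sizes are equal, this d.f. reduces to n1 + n2 - 2, the Case 2 value; otherwise it is smaller.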
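For two sample proportions m1/n1 and m2/n2, the test statistic is
$$ u = \frac{\dfrac{m_1}{n_1} - \dfrac{m_2}{n_2}}{\sqrt{\dfrac{m_1 + m_2}{n_1 + n_2} \cdot \dfrac{n_1 + n_2 - m_1 - m_2}{n_1 + n_2} \cdot \dfrac{n_1 + n_2}{n_1 n_2}}} $$
Confidence interval for the second proportion (the interval for the first is formed in the same way from m1 and n1):
$$ \left( \frac{m_2}{n_2} - Z_{1-\alpha/2}\sqrt{\frac{m_2}{n_2}\left(1 - \frac{m_2}{n_2}\right) / n_2} \;;\; \frac{m_2}{n_2} + Z_{1-\alpha/2}\sqrt{\frac{m_2}{n_2}\left(1 - \frac{m_2}{n_2}\right) / n_2} \right) $$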
Decide
Reject Hypothesis H if the two intervals
are disjoint
Accept Hypothesis H if the two intervals
have a nonempty intersection
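The two-proportion comparison above can be sketched in plain Python. This is an illustration only: m1 and m2 are taken to be the counts of value 1 in samples of sizes n1 and n2, the Normal quantile comes from `statistics.NormalDist`, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_u(m1, n1, m2, n2):
    """Statistic u comparing the sample proportions m1/n1 and m2/n2."""
    pooled = (m1 + m2) / (n1 + n2)  # common proportion estimated under H
    se = math.sqrt(pooled * (1 - pooled) * (n1 + n2) / (n1 * n2))
    return (m1 / n1 - m2 / n2) / se


def proportion_interval(m, n, alpha=0.05):
    """Interval m/n +/- Z(1 - alpha/2) * sqrt((m/n) * (1 - m/n) / n)."""
    p = m / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def reject_by_intervals(m1, n1, m2, n2, alpha=0.05):
    """Reject H when the two proportion intervals are disjoint."""
    lo1, hi1 = proportion_interval(m1, n1, alpha)
    lo2, hi2 = proportion_interval(m2, n2, alpha)
    return hi1 < lo2 or hi2 < lo1
```

As a usage example, `two_proportion_u(30, 100, 50, 100)` compares 30% with 50% on two samples of 100 observations; `reject_by_intervals` applies the interval rule to the same data.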
Compare several proportions
Let X be a binary variable taking the two values 0 and 1.
Collecting data on this variable under k different
conditions gives a sample of k groups of
observations, one group per condition.
Let p1, p2, ..., pk
be the probabilities that the variable X takes
the value 1 under each of the k
conditions.
Hypothesis
H: p1 = p2 = ... = pk
Alternative Hypothesis
K: at least two of p1, p2, ..., pk differ
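Data: Form a 2×k table with 2 rows and k columns:
one column for each group, the first row for value 1 and
the second row for value 0 of the variable.
Let n_ij be the number of observations with value i in group j,
n_i the total number of observations with value i, n^(j) the size of
group j, and n the total number of observations. The test statistic is:
$$ \chi^2 = \sum_{j=1}^{k} \sum_{i=0}^{1} \frac{\left( n_{ij} - \dfrac{n_i \, n^{(j)}}{n} \right)^2}{\dfrac{n_i \, n^{(j)}}{n}} $$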
LEMMA. Suppose that hypothesis H is true.
Then the variable $\chi^2$ is approximately distributed
as the Chi-square distribution $\chi^2_{(k-1)}$ with (k-1)
degrees of freedom.
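The chi-square statistic for the 2×k table can be sketched as follows. This is a minimal illustration: the function name and argument layout (`ones[j]` and `zeros[j]` as the counts of value 1 and value 0 in group j) are hypothetical, and the sketch assumes both values occur at least once so that no expected count is zero.

```python
def chi_square_2xk(ones, zeros):
    """Return (chi-square statistic, degrees of freedom) for a 2 x k table.

    ones[j]  : number of observations with value 1 in group j
    zeros[j] : number of observations with value 0 in group j
    """
    if len(ones) != len(zeros):
        raise ValueError("both rows must have k entries")
    k = len(ones)
    rows = [zeros, ones]  # row index i = 0, 1 matches the value of X
    n = sum(ones) + sum(zeros)  # total number of observations
    row_totals = [sum(row) for row in rows]  # n_i
    col_totals = [ones[j] + zeros[j] for j in range(k)]  # n^(j)
    chi2 = 0.0
    for i in (0, 1):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (rows[i][j] - expected) ** 2 / expected
    return chi2, k - 1
```

The resulting value is then compared with the critical value (or p-value) of the Chi-square distribution with k-1 degrees of freedom. For k = 2 the statistic equals the square of the two-proportion statistic u.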
Density function of the Chi-squared distribution
Using Excel to compute the Chi-squared
distribution
Method A (p-value):
Hypothesis
H: µX = µY
Alternative Hypothesis
K: µX ≠ µY
where µX and µY are the
expectations of X and Y
With D = X - Y,
comparing the expectation of D to
0:
Hypothesis
H: µD = 0
Alternative Hypothesis
K: µD ≠ 0
Compare mean values of two related
samples
With the empirical value t of the test statistic
• - If |t| is at least the critical value, reject H,
• - If |t| is below the critical value, accept H .
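A minimal sketch of the test statistic for two related samples follows. The slide's own formula did not survive, so this uses the standard paired t-statistic as an assumption: the differences D = X - Y are compared with 0, and the statistic has n - 1 degrees of freedom. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """Return (t, degrees of freedom) for related samples x and y.

    Assumption: the standard paired t-statistic, t = Mean(D) / (SD(D) / sqrt(n)),
    where D = X - Y is the list of pairwise differences.
    """
    if len(x) != len(y):
        raise ValueError("related samples must have the same length")
    d = [a - b for a, b in zip(x, y)]  # pairwise differences
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))
    return t, n - 1
```

For example, `paired_t_statistic([5, 7, 9, 6], [4, 5, 8, 3])` works on the differences 1, 2, 1, 3 and returns 3 degrees of freedom; the value of t is then compared with the Student critical value for n - 1 degrees of freedom.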