Lecture 3 - Statistical Tests
TESTS
THE T-TEST
What is the main use of the t-test?
How is the distribution of t related to the unit normal?
When would we use a t-test instead of a z-test? Why might
we prefer one to the other?
What are the chief varieties or forms of the t-test?
What is the standard error of the difference between means?
What are the factors that influence its size?
• Identify the appropriate version of t to use for a given design.
• Compute and interpret t-tests appropriately.
Given that

$$z_M = \frac{\bar{X} - \mu}{\text{est.}\,\sigma_M}, \qquad \text{est.}\,\sigma_M = \frac{s_X}{\sqrt{N}}, \qquad s_X = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$

suppose

$$H_0: \mu = 10; \quad H_1: \mu \neq 10; \quad s_X = 5; \quad N = 200$$

Then

$$\text{est.}\,\sigma_M = \frac{s_X}{\sqrt{N}} = \frac{5}{\sqrt{200}} = \frac{5}{14.14} = .35$$

If $\bar{X} = 11$, then

$$z = \frac{(11 - 10)}{.35} = 2.83; \qquad 2.83 > 1.96 \Rightarrow p < .05 \text{ (significant difference)}$$
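As an arithmetic check, here is a minimal SAS data-step sketch of the same z test (the data set name is arbitrary and the hard-coded summary values are taken from the example above):

* Sketch: z test for a single mean from summary statistics;
data ztest;
  mu0  = 10;                            * mean under H0;
  xbar = 11;                            * observed sample mean;
  sx   = 5;                             * sample standard deviation;
  n    = 200;                           * sample size;
  sem  = sx / sqrt(n);                  * est. standard error of the mean (.35);
  z    = (xbar - mu0) / sem;            * z statistic (2.83);
  p    = 2 * (1 - probnorm(abs(z)));    * two-tailed p value;
run;

proc print data=ztest; run;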
THE T DISTRIBUTION
We use t when the population variance is unknown (the usual case)
and sample size is small (N<100, the usual case). If you use a stat
package for testing hypotheses about means, you will use t.
The t distribution is a short, fat relative of the normal. The shape of t depends on its df. As N
becomes infinitely large, t becomes normal.
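One way to see the df dependence is to compare two-tailed .05 critical values of t with the normal value 1.96; a short SAS sketch (the data set name is arbitrary):

* Sketch: .05 two-tailed critical values of t for increasing df;
data tcrit;
  do df = 5, 10, 25, 50, 100, 1000;
    t_crit = tinv(0.975, df);   * upper 97.5th percentile of t with df degrees of freedom;
    output;
  end;
run;

proc print data=tcrit; run;    * values shrink toward z = 1.96 as df grows;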
DEGREES OF FREEDOM
For the t distribution, degrees of freedom are always a
simple function of the sample size, e.g., (n-1).
$$\text{Interval} = \bar{X} \pm t \cdot \text{est.}\,\sigma_M = 11 \pm 2.064(1) = [8.936,\ 13.064]$$

Interval is about 9 to 13 and contains 10, so n.s.

Rejection region is $\bar{X} < 8.936$ or $\bar{X} > 13.064$.
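The same interval can be sketched in SAS; this assumes the critical value 2.064 corresponds to df = 24 (N = 25), which the slide does not state explicitly:

* Sketch: one-sample t confidence interval from summary statistics;
data tinterval;
  xbar  = 11;                   * sample mean;
  sem   = 1;                    * est. standard error of the mean;
  df    = 24;                   * assumed degrees of freedom (N = 25);
  tcrit = tinv(0.975, df);      * two-tailed .05 critical value, about 2.064;
  lower = xbar - tcrit * sem;   * 8.936;
  upper = xbar + tcrit * sem;   * 13.064;
run;

proc print data=tinterval; run;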
REVIEW
How are the distributions of z and t related?
Given:

$$H_0: \mu = 75; \quad H_1: \mu \neq 75; \quad s_y = 14; \quad N = 49; \quad t_{(.05,\,48)} = 2.01$$
Standard Error:

$$\sigma_{\text{diff}} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2}$$
DIFFERENCE BETWEEN MEANS (2)
We can estimate the standard error of the difference
between means.
$$\text{est.}\,\sigma_{\text{diff}} = \sqrt{\text{est.}\,\sigma_{M_1}^2 + \text{est.}\,\sigma_{M_2}^2}$$

For large samples, can use z:

$$z_{\text{diff}} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\text{est.}\,\sigma_{\text{diff}}}$$

$$H_0: \mu_1 - \mu_2 = 0; \quad H_1: \mu_1 - \mu_2 \neq 0$$

$$\bar{X}_1 = 10; \quad N_1 = 100; \quad SD_1 = 2 \qquad\qquad \bar{X}_2 = 12; \quad N_2 = 100; \quad SD_2 = 3$$

$$\text{est.}\,\sigma_{\text{diff}} = \sqrt{\frac{4}{100} + \frac{9}{100}} = \sqrt{\frac{13}{100}} = .36$$

$$z_{\text{diff}} = \frac{(10 - 12) - 0}{.36} = \frac{-2}{.36} = -5.56; \qquad |-5.56| > 1.96 \Rightarrow p < .05$$
INDEPENDENT SAMPLES T (1)
Looks just like z:

$$t_{\text{diff}} = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\text{est.}\,\sigma_{\text{diff}}}, \qquad df = (N_1 - 1) + (N_2 - 1) = N_1 + N_2 - 2$$

If the SDs are equal, then

$$\sigma_{\text{diff}}^2 = \frac{\sigma^2}{N_1} + \frac{\sigma^2}{N_2} = \sigma^2\left(\frac{1}{N_1} + \frac{1}{N_2}\right) = \sigma^2\left(\frac{N_1 + N_2}{N_1 N_2}\right)$$

and the pooled estimate is

$$\text{est.}\,\sigma_{\text{diff}} = \sqrt{\frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}\left(\frac{N_1 + N_2}{N_1 N_2}\right)}$$
INDEPENDENT SAMPLES T (2)
$$\text{est.}\,\sigma_{\text{diff}} = \sqrt{\frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}\left(\frac{N_1 + N_2}{N_1 N_2}\right)}, \qquad t_{\text{diff}} = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\text{est.}\,\sigma_{\text{diff}}}$$

$$H_0: \mu_1 - \mu_2 = 0; \quad H_1: \mu_1 - \mu_2 \neq 0$$

$$\bar{y}_1 = 18; \quad s_1^2 = 7; \quad N_1 = 5 \qquad\qquad \bar{y}_2 = 20; \quad s_2^2 = 5.83; \quad N_2 = 7$$

$$\text{est.}\,\sigma_{\text{diff}} = \sqrt{\frac{4(7) + 6(5.83)}{5 + 7 - 2}\left(\frac{12}{35}\right)} = 1.47$$

$$t_{\text{diff}} = \frac{(18 - 20) - 0}{1.47} = \frac{-2}{1.47} = -1.36; \quad \text{n.s.}$$

$$t_{\text{crit}} = t_{(.05,\,10)} = 2.23$$
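With raw scores, the same kind of comparison runs in one PROC TTEST step; a minimal sketch (the data set name, group labels, and scores are made up for illustration):

* Sketch: independent samples t in SAS with hypothetical raw data;
data twogroups;
  input group $ y @@;
  datalines;
A 18 A 16 A 21 A 19 A 17
B 20 B 22 B 18 B 21 B 19 B 20 B 23
;
run;

proc ttest data=twogroups;
  class group;   * grouping (independent) variable;
  var y;         * dependent variable;
run;
* Output gives the pooled-variance t (the formula above) plus the Satterthwaite-corrected version;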
REVIEW
What is the standard error of the difference between
means? What are the factors that influence its size?
Describe a design (what IV? What DV?) where it
makes sense to use the independent samples t test.
DEPENDENT T (1)
Observations come in pairs. Brother, sister, repeated measure.
$$\sigma_{\text{diff}}^2 = \sigma_{M_1}^2 + \sigma_{M_2}^2 - 2\,\text{cov}(\bar{y}_1, \bar{y}_2)$$

Working with difference scores, D:

$$\bar{D} = \frac{\sum D_i}{N}, \qquad s_D = \sqrt{\frac{\sum (D - \bar{D})^2}{N - 1}}, \qquad \text{est.}\,\sigma_{M_D} = \frac{s_D}{\sqrt{N}}$$

$$t = \frac{\bar{D} - E(\bar{D})}{\text{est.}\,\sigma_{M_D}}, \qquad df = N(\text{pairs}) - 1$$
DEPENDENT T (2)
Brother   Sister   D (diff)   (D - D̄)²
   5        7         2           1
   7        8         1           0
   3        3         0           1

$$\bar{y}_1 = 5; \quad \bar{y}_2 = 6; \quad \bar{D} = 1$$

$$s_D = \sqrt{\frac{\sum (D - \bar{D})^2}{N - 1}} = \sqrt{\frac{2}{2}} = 1, \qquad \text{est.}\,\sigma_{M_D} = \frac{1}{\sqrt{3}} = .58$$

$$t = \frac{\bar{D} - E(\bar{D})}{\text{est.}\,\sigma_{M_D}} = \frac{1 - 0}{.58} = 1.72$$
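The same three pairs, run as a paired t in SAS (a sketch; the data set name is arbitrary):

* Sketch: dependent (paired) t for the brother-sister pairs above;
data pairs;
  input brother sister;
  datalines;
5 7
7 8
3 3
;
run;

proc ttest data=pairs;
  paired sister*brother;   * tests whether the mean (sister - brother) difference is 0;
run;
* Should match the hand computation above: t of about 1.7 with df = 2;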
ASSUMPTIONS
The t-test is based on assumptions of normality and
homogeneity of variance.
You can test for both of these (make sure you learn the SAS methods).
As long as the samples in each group are large and nearly equal in size, the t-test is robust, that is, still good, even though the assumptions are not met.
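A sketch of the SAS checks (the data set and variable names mydata, group, and y are placeholders): PROC UNIVARIATE with the NORMAL option gives normality tests, and PROC TTEST prints a folded-F test of equal variances.

* Sketch: checking normality and homogeneity of variance before a two-group t-test;
proc univariate data=mydata normal;
  class group;    * normality tests (e.g., Shapiro-Wilk) within each group;
  var y;
run;

proc ttest data=mydata;
  class group;
  var y;          * output includes the Equality of Variances (folded F) test;
run;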
REVIEW
Describe a design where it makes sense to use a
single-sample t.
Describe a design where it makes sense to use a
dependent samples t.
STRENGTH OF ASSOCIATION (1)
Scientific purpose is to predict or explain variation.
Our variable Y has some variance that we would like to account for. There are statistical
indexes of how well our IV accounts for variance in the DV. These are measures of how
strongly or closely associated our IVs and DVs are.
Variance accounted for:
$$\eta^2 = \frac{\sigma_Y^2 - \sigma_{Y|X}^2}{\sigma_Y^2} = \frac{(\mu_1 - \mu_2)^2}{4\sigma_Y^2}$$
STRENGTH OF ASSOCIATION (2)

How much of the variance in Y is associated with the IV?

$$\eta^2 = \frac{\sigma_Y^2 - \sigma_{Y|X}^2}{\sigma_Y^2} = \frac{(\mu_1 - \mu_2)^2}{4\sigma_Y^2}$$
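A quick worked illustration (the numbers are made up, not from the lecture): with two equal-size groups whose means differ by 2 points and a total SD of 4,

$$\eta^2 = \frac{(\mu_1 - \mu_2)^2}{4\sigma_Y^2} = \frac{2^2}{4(4^2)} = \frac{4}{64} \approx .06,$$

so group membership accounts for about 6% of the variance in Y.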
Compare the 1st (left-most) curve with the curve in the middle and the one on the right.

[Figure: three normal curves with increasingly separated means.]

In each case, how much of the variance in Y is associated with group membership? More in the second comparison. As the mean difference gets big, so does the strength of association.
$$t = \frac{(\bar{X} - \mu)}{s\sqrt{\frac{1}{N}}} \;\text{(single sample)}, \qquad t = \frac{(\bar{X}_1 - \bar{X}_2)}{\sqrt{s_p^2\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}} \;\text{(independent samples)}, \qquad t = d\sqrt{N}$$
Increasing sample size does not increase effect size (strength of association). It decreases the standard error, so |t| is larger and power is greater.
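To make that concrete with the single-sample relation above: holding the effect size d fixed, quadrupling N doubles t,

$$t = d\sqrt{N} \;\Rightarrow\; d\sqrt{4N} = 2\,d\sqrt{N} = 2t,$$

while d itself does not change.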
ESTIMATING POWER (1)
If the null is false, the statistic is no longer distributed as t, but rather as noncentral t. This
makes power computation difficult.
Howell introduces the noncentrality parameter delta ($\delta$) to use for estimating power. For the one-sample t,

$$d = \frac{\mu_1 - \mu_0}{\sigma} = \frac{105 - 100}{15} = \frac{1}{3} = .33$$

$$\delta = d\sqrt{n} = .33\sqrt{25} = 1.65 \;\Rightarrow\; \text{power} \approx .38$$

Howell presents an appendix where delta is related to power. For power = .8, alpha = .05, delta must be 2.80. To solve for N, we compute:

$$\delta = d\sqrt{n}; \qquad n = \left(\frac{\delta}{d}\right)^2 = \left(\frac{2.8}{.33}\right)^2 = 8.48^2 = 71.91$$
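The same one-sample scenario can also be handed to PROC POWER, which uses the exact noncentral t rather than the table approximation (a sketch; it should give a power close to the .38 above):

* Sketch: exact power for the one-sample example (true mean 105, null mean 100, SD 15, n = 25);
proc power;
  onesamplemeans
    mean     = 105   /* assumed true mean */
    nullmean = 100   /* mean under H0 */
    stddev   = 15
    ntotal   = 25
    power    = .;    /* solve for power */
run;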
ESTIMATING POWER (3)
Dependent t can be cast as a single sample t using difference scores.
For the independent t, Howell's method gives n per group, so double it for the total. Suppose d = .5 (a medium effect) and n = 25 per group.
2 SAMPLE T POWER
Calculate sample size:

proc power;
  twosamplemeans
    meandiff = .5
    stddev   = 1
    power    = 0.8
    ntotal   = .;
run;

Output:

Two-Sample t Test for Mean Difference

Fixed Scenario Elements
  Distribution         Normal
  Method               Exact
  Mean Difference      0.5
  Standard Deviation   1
  Nominal Power        0.8
  Number of Sides      2
  Null Difference      0
  Alpha                0.05
  Group 1 Weight       1
  Group 2 Weight       1

Computed N Total
  Actual Power   N Total
  0.801          128
2 SAMPLE T POWER
proc power;
  twosamplemeans
    meandiff = 5     /* assumed difference */
    stddev   = 10    /* assumed SD */
    sides    = 1     /* 1 tail */
    ntotal   = 50    /* 25 per group */
    power    = .;    /* tell me! */
run;

Output:

The POWER Procedure
Two-Sample t Test for Mean Difference

Fixed Scenario Elements
  Distribution          Normal
  Method                Exact
  Number of Sides       1
  Mean Difference       5
  Standard Deviation    10
  Total Sample Size     50
  Null Difference       0
  Alpha                 0.05
  Group 1 Weight        1
  Group 2 Weight        1

Computed Power
  Power
  0.539
TYPICAL POWER IN PSYCH
Average effect size is about d=.40.
Consider power for effect sizes between .3 and .6. What kind of sample size do we need for
power of .8?
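A sketch of how to answer this with PROC POWER (standardized effect sizes entered as mean differences with stddev = 1; the value list is just the range mentioned above):

* Sketch: total N needed for power = .80 at several effect sizes (two-sample t, alpha = .05, 2-sided);
proc power;
  twosamplemeans
    meandiff = 0.3 0.4 0.5 0.6   /* effect sizes d, since stddev = 1 */
    stddev   = 1
    power    = 0.8
    ntotal   = .;                /* solve for total sample size */
run;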