Test of Significance 14

In this chapter we put together the skills and techniques we have learnt so far into a formal method for testing hypotheses, which is the classical way of doing statistical inference.
Assume that as a new employee at the national tax department you propose a new tax code that you claim is revenue-neutral (i.e. tax revenue will remain the same) but, if adopted, will save on administration costs. To show this you collect a sample of 100 forms in the treasury department and, applying your new tax rule, find that the average drop in tax per form came to 2,190 Baht with a sample standard deviation of 7,200 Baht. The question then is whether this drop in tax revenue of 2,190 Baht really represents a drop in overall tax revenue or can be attributed to chance error.
We shall break the hypothesis testing process into three essential steps,
namely, 1) setting up the hypothesis, 2) calculating the test statistic,
and 3) concluding.
H0: Average = 0
HA: Average > 0
A test statistic is used to measure the difference between the data (from
the sample) and what is expected as stated in the null hypothesis.
TS = (observed − expected) / SE
The standard error of the average is 7200/√100 = 720. Hence,

TS = (2190 − 0) / 720 ≈ 3
The p-value, the area under the normal curve beyond a test statistic of 3, comes to about 1 in 1,000.
So a small p-value, more concretely one less than the stipulated significance level α, leads us to reject the null hypothesis.
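As a rough numerical check, the arithmetic for the tax example can be reproduced with Python's standard library. This is a sketch, assuming (as in the text) that the 2,190 Baht figure is the average drop per form, with `NormalDist` standing in for the normal table:

```python
from statistics import NormalDist

# Sample results from the tax example
n = 100            # sample size (forms)
avg_drop = 2190    # average drop in tax per form (Baht)
sd = 7200          # sample standard deviation (Baht)

se = sd / n ** 0.5            # standard error of the average: 7200/10 = 720
ts = (avg_drop - 0) / se      # (observed - expected) / SE, about 3

# One-sided p-value: area under the normal curve to the right of the TS
p_value = 1 - NormalDist().cdf(ts)
print(round(ts, 2), p_value)  # the p-value comes out to about 1 in 1,000
```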
An alternative method, popularized in many textbooks, is to compare the test statistic itself with a "critical value." As mentioned earlier, a large test statistic suggests a noticeable difference between the sample statistic and the expected value. Here we must take note of whether a 1-sided or 2-sided test is being performed. For a one-sided test, the critical value is found by placing all of α on one side of the distribution, in this case, say, 5%, giving us a z-value or, more precisely, a "critical value."
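A minimal sketch of this comparison, reusing the tax-example statistic; `inv_cdf` plays the role of reading the normal table in reverse, and the 5% level is the one mentioned in the text:

```python
from statistics import NormalDist

alpha = 0.05
ts = 2190 / 720                  # test statistic from the tax example, about 3

# One-sided test: place all of alpha in one tail
critical = NormalDist().inv_cdf(1 - alpha)         # about 1.645

# Two-sided test: split alpha between the two tails
critical_2s = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

# Reject the null when the test statistic exceeds the critical value
print(ts > critical)
```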
Note that to reduce Type-I error for the example of the fire alarm, we
could simply remove the batteries. Then the alarm will never go off
and this increases Type-II error. In effect, there is a trade-off between
Type-I and Type-II error. A researcher can decide in terms of cost which
error is less desirable when designing a test.
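This trade-off can be made concrete with a small numerical sketch for a one-sided z-test; the effect size below (the true mean sitting 3 SEs above the null value) is a hypothetical choice for illustration only:

```python
from statistics import NormalDist

z = NormalDist()
true_effect_in_se = 3.0  # hypothetical: true mean is 3 SEs above the null value

betas = []
for alpha in (0.10, 0.05, 0.01):
    crit = z.inv_cdf(1 - alpha)              # one-sided critical value
    beta = z.cdf(crit - true_effect_in_se)   # chance the TS still falls below crit
    betas.append(beta)
    print(alpha, round(beta, 3), round(1 - beta, 3))  # alpha, beta, power
# As alpha shrinks, beta grows: the two error rates pull in opposite directions.
```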
To summarize, Type-I error is the probability of rejecting the null when it is in fact true, i.e. a false alarm. Since 1 − α measures our confidence that any alarm bells we hear are genuine, α then is the probability of Type-I error. On the other hand, the chance of making a Type-II error, that is when the alternative of "fire" is in fact true but the alarm does not go off, i.e. what we call a "false negative," is usually termed β. That is, β is the probability of not rejecting a hypothesis when it is in fact false. Obviously we would like a test to reject a false null most of the time. In fact this is known as the power of the test, which is 1 − β. Below is a diagram that shows Type-I and Type-II errors.

* If, say, 35 is not shown on the table, use the nearest one.
The Test for the Difference in Means 16

H0: μ1 − μ2 = 0
HA: μ1 − μ2 > 0
To find the standard error of the difference between two independent samples, one must find a way to combine the two spreads. Adding the two SEs, however, will not do. We have to resort to the law of variance, which states that Var(A + B) = Var(A) + Var(B) for independent A and B. So the trick is to square the individual sample SEs, add them to get a kind of accumulated variance for the two samples, then take the square root to revert to an SE; but this time it will be the SE for the difference between the two samples, which is what we want. Mathematically,
* Note that the sample size of each group need not be the same.
† It is important that we are asking a statistical question; of course a score of 25 is not the same as a score of 23. But what we are saying is that the difference of 2 could just be a chance difference and not a difference due to other factors, specifically smartness in this case. It is a statistical question that we are asking.
‡ We could have stated the null as μ1 = μ2, which technically is the same, but the way we state it in the text reflects that we are doing a "difference" in means test.
SE of difference = √(SE1² + SE2²)
TS = ((25 − 23) − 0) / 0.58 ≈ 3.4
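The square-add-square-root recipe can be sketched as follows. The text reports only the combined SE of 0.58, so the two individual SEs below are hypothetical, chosen to reproduce it:

```python
# Hypothetical individual standard errors (not given in the text),
# picked so that their combination matches the 0.58 used in the example.
se1, se2 = 0.40, 0.42

# Law of variance for independent samples: square, add, take the square root
se_diff = (se1 ** 2 + se2 ** 2) ** 0.5   # sqrt(0.16 + 0.1764) = 0.58

ts = ((25 - 23) - 0) / se_diff           # difference in sample means over its SE
print(round(se_diff, 2), round(ts, 1))   # 0.58 and about 3.4
```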
The next question is to ask whether this test statistic follows a normal or t-distribution. The degrees of freedom is (n1 − 1) + (n2 − 1), since each sample loses one degree of freedom. For our example, this is pretty large, so the normal can be used. But for the purist, the t-distribution is appropriate.
To conclude, any of the three methods discussed in the previous chapter can be used, i.e. 1) the p-value method, 2) the "critical value" method, or 3) using "confidence intervals." This is left to the reader as an exercise.
The Chi-square Test 17
The χ²-test, pronounced "ki-square test," was invented by the prominent statistician Karl Pearson in 1900. We have dealt so far with tests that involve, say, drawing from a 1-0 count box. In such cases we have seen that a z-test or t-test was appropriate. We shall now turn to the χ²-test, which is used when we wish to make inferences when more than two categories are considered. More specifically, in this chapter we will examine two uses of the χ²-test, namely, 1) the goodness-of-fit test, and 2) the test for independence.
[Figure: χ²-distributions for df = 5 and df = 10.]
TS = Σ (observed − expected)² / expected    (17.1)
Note that a large χ²-statistic means that the observed and expected frequencies are far apart, suggesting a bad fit. More precisely, at some significance level α, one can then use the χ²-table to find the corresponding p-value, or simply compare the test statistic to the critical value read off the table. If the p-value is less than α, or the test statistic is greater than the critical value, then the null hypothesis is rejected.
For our example on the toss of a die 60 times, we could pick α = 0.05. We calculated the χ²-test statistic to be 14.2. The degrees of freedom is not the sample size less one as with the t-test, but rather the number of categories less one, i.e. 6 − 1 = 5. From the table, we find that the corresponding area to the right of 14.2 is between 5% (the area to the right of 11.07) and 1% (the area to the right of 15.09). In other words, the p-value is between 1 and 5%, which is less than our stipulated α. The conclusion is the same if we compare the test statistic with the critical value at α = 0.05 and 5 degrees of freedom; our test statistic is larger than the critical value, 14.2 > 11.07, leading us to reject the null hypothesis of a fair die.*
* As a rule of thumb, the χ²-test should be used when the expected frequency of each cell in the table is 5 or more.
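The goodness-of-fit arithmetic can be sketched in Python. The text reports only the statistic (14.2) for 60 tosses, not the face-by-face counts, so the counts below are hypothetical ones chosen to be consistent with that value:

```python
# Hypothetical counts for 60 tosses of a die (faces 1..6); the text does not
# list them, but these are consistent with the chi-square statistic of 14.2.
observed = [19, 4, 6, 12, 11, 8]
expected = [60 / 6] * 6                 # a fair die: 10 of each face

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                  # categories less one: 6 - 1 = 5

print(round(chi2, 1), df)               # 14.2 and 5
```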
17.2 Test for independence
In the earlier part of the book, when studying probability, recall that we tested for the independence of two variables using conditional probability. More specifically, two events A and B are said to be independent if P(A|B) = P(A) or P(B|A) = P(B). The χ²-test is the statistical equivalent for testing the independence of two variables given observations on them.
As before, discussion around an example should be useful. From a survey of 2,237 people, their handedness is summarized in the following table.
The test statistic is the same as (17.1). The following table shows the expected frequencies; how we got these will be explained later.

                 Observed           Expected
                 Male    Female     Male    Female
Right-handed     934     1070       956     1048
Left-handed      113     92         98      107
Ambidextrous     20      8          13      15
The bottom row and left-most column show the vertical and horizontal sums respectively, i.e. the sums of the deviations, and these add to zero. This means that we need to know only 2 deviations, and the others can be automatically found; hence the degrees of freedom is 2. In sum, when testing independence in an m × n table with no other constraints on the probabilities, there are (m − 1) × (n − 1) degrees of freedom.
Assuming again α = 0.05 for our example, the p-value is the area to the right of χ² = 12 at 2 degrees of freedom. From the table, the p-value is less than 1%, which is less than the significance level of 5%, hence we reject the null. In the same vein, the critical value at 2 degrees of freedom and α = 0.05 is 5.99. Because TS > critical value, we reject the null in favor of the alternative that the two variables are not independent.
Lastly, we explain how the expected frequencies are found.
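A sketch of that calculation, assuming the usual margins recipe (row total × column total ÷ grand total), applied to the handedness table:

```python
# Observed counts from the handedness survey
# (rows: right-handed, left-handed, ambidextrous; columns: male, female)
observed = [[934, 1070],
            [113, 92],
            [20, 8]]

grand = sum(sum(row) for row in observed)          # 2,237 people in total
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for orow, erow in zip(observed, expected)
           for o, e in zip(orow, erow))
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (3 - 1) * (2 - 1) = 2

print([[round(e) for e in row] for row in expected])  # [[956, 1048], [98, 107], [13, 15]]
# chi2 comes to about 11.8; the text's value of about 12 reflects
# computing from the rounded expected counts shown in the table.
print(round(chi2, 1), df)
```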