Chapter 7 Hypothesis Analysis
Tests of hypotheses
4.1. Introduction
Hypothesis testing is a method of analysis that makes inferences about
population parameters or the relationship between variables from sample data.
The procedure is to make an assumption about the population parameter or
the relationship between the variables under study and then test that
assumption on the basis of sample information. It is one of the major
techniques of statistical inference and is used to solve many practical
problems in business and in other fields.
Objectives
Upon completion of this unit you will be able to
To make the test, suppose we take a random sample of 500 families and decide
on the level of the sample mean, $\bar{X}$, which is referred to as the test statistic.
Therefore, the set of values {$\bar{X}$ > 800} is called the critical or rejection region.
              Accept Ho        Reject Ho
Ho true       No error         Type I error
Ho false      Type II error    No error
Notation
The probability of type I error is denoted by α and the probability of type II
error is denoted by β.
P (Type I error) = P (rejecting Ho | Ho is true) = α
P (Type II error) = P (accepting Ho | Ho is false) = β
Level of significance. The probability of rejecting Ho when it is in fact true is
denoted by α and is known as the level of significance. That is,
Level of significance = α = P (reject Ho | Ho is true)
= P (type I error)
α is usually taken as 0.05 or 0.01.
We determine the critical values by first specifying alpha (α), the probability
of committing a type I error. If the cost of committing a type I error is high,
the decision makers should specify a small alpha. A small alpha results in a
small rejection region. If the rejection region is small, the sample mean is
less likely to fall there, and the chances of rejecting a true null hypothesis
are small. If the acceptance region is large, the probability of committing a
type II error (β) will also tend to be large. A decrease in α will result in an
increase in β. However, the increase in β will not equal the decrease in α.
Decision makers must examine carefully the costs of committing type I and
type II errors and, in light of these costs, attempt to establish acceptable
values of α and β.
The goodness of the choice of the critical value depends on the attitude of
the decision maker towards the two types of errors, Type I and Type II. The
smaller the probabilities of the two types of errors, the better the test.
These probabilities can be computed for any particular test. It can be shown
that when one of the probabilities is decreased the other tends to increase.
Therefore, the general agreement is that we fix the probability of type I
error at α and try to minimize the probability of type II error, β.
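The trade-off between α and β can be illustrated numerically. The following is a minimal Python sketch for an upper-tailed test of a mean with known σ; the function name `type_two_error` and all the numbers (μo = 100, alternative mean 103, σ = 10, n = 25) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def type_two_error(mu0, mu1, sigma, n, alpha):
    """Beta for the test Ho: mu = mu0 vs Ha: mu > mu0 (sigma known),
    evaluated at one specific alternative mean mu1 > mu0."""
    se = sigma / sqrt(n)
    # Critical value on the scale of the sample mean: reject Ho if xbar > c.
    c = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Beta = P(xbar <= c) when the true mean is mu1 (Ho is accepted wrongly).
    return NormalDist(mu1, se).cdf(c)


# Hypothetical numbers, for illustration only.
for alpha in (0.10, 0.05, 0.01):
    beta = type_two_error(mu0=100, mu1=103, sigma=10, n=25, alpha=alpha)
    print(f"alpha = {alpha:.2f}  beta = {beta:.3f}")
```

Running the loop shows β growing as α shrinks, and by a different amount than the change in α.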
Hypotheses in general are of two forms.
Tests of the form
Ho: μ = μo    or    Ho: μ = μo
Ha: μ > μo          Ha: μ < μo,
in which the critical region includes either large or small values of the test
statistic, are called one-sided or one-tailed tests.
A test statistic is a quantity computed from the sample data by a formula and
used in testing a hypothesis. The value of the test statistic is used in
determining whether or not we may reject the null hypothesis.
And tests of the form
Ho: μ = μo
Ha: μ ≠ μo,
in which the critical region includes both large and small values of the test
statistic, are called two-tailed tests.
Steps in testing hypotheses
In general statistical testing of hypothesis follows certain steps that are more
or less taken by most researchers.
1. Formulate the hypothesis. That is, rewrite the problem in a testable
form.
2. Determine alpha (α), which is usually given.
3. Determine the appropriate statistical distribution to be used.
4. Determine the test statistic.
5. Establish the critical region of the test statistic or give the decision
rule. The decision rule of a statistical hypothesis test is a rule that
specifies the conditions under which the null hypothesis may be
rejected.
6. Gather data.
7. Calculate the value of the test statistic and find the tabulated value.
8. Make a decision, that is, accept or reject the null hypothesis, Ho.
4.3. Tests about a single mean
Hypothesis tests about means, just like the confidence interval estimation of
means considered in the previous unit, depend on whether the population is
normal or not, whether the sample is large or small, and whether the
population standard deviation is known or unknown. Let us look at the
different possible combinations of these cases and carry out tests about the
mean.
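[Figure: standard normal curve showing the area 1 − α and the critical value Zα]
$$P\left(\frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} > C\right) = \alpha$$
Critical region for testing the above hypothesis is
$$\frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} > Z_\alpha$$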
If Ho: μ = μo
Ha: μ < μo
By a similar argument as above, the critical region is
$$\frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} < -Z_\alpha$$
To test the two-tailed hypothesis,
Ho: μ = μo
Ha: μ ≠ μo, where Ho states that there is no difference between the believed
population mean and the specified value,
the critical region can be shown to be
$$\left|\frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}\right| > Z_{\alpha/2}$$
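The three critical regions for a Z test can be collected into a single decision rule. The sketch below is one possible way to write it in Python; the function name `reject_h0` and the labels 'greater', 'less' and 'two-sided' are illustrative choices, not notation from the text.

```python
from statistics import NormalDist


def reject_h0(z_cal, alpha, alternative):
    """Return True if z_cal falls in the critical region of a Z test.

    alternative: 'greater'   for Ha: mu > mu0,
                 'less'      for Ha: mu < mu0,
                 'two-sided' for Ha: mu != mu0.
    """
    std_normal = NormalDist()  # standard normal distribution, N(0, 1)
    if alternative == "greater":
        return z_cal > std_normal.inv_cdf(1 - alpha)       # Z > Z_alpha
    if alternative == "less":
        return z_cal < -std_normal.inv_cdf(1 - alpha)      # Z < -Z_alpha
    if alternative == "two-sided":
        return abs(z_cal) > std_normal.inv_cdf(1 - alpha / 2)  # |Z| > Z_{alpha/2}
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
```

For example, `reject_h0(2.0, 0.05, "greater")` checks whether a calculated value of 2.0 exceeds Z0.05.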
Example 1
According to a car manufacturing company, its cars average at least 32
miles per gallon in the city. From past records it is known that mileage is
normally distributed with a standard deviation of 2.5 miles per gallon. Tests
on 16 cars showed that the mean mileage in the city is 31.5 miles per gallon.
Do the data support the claim of the company at the 1% level of significance?
Solution
The appropriate hypothesis is,
Ho: μ ≥ 32
Ha: μ < 32
Here, the population is normal, the sample size, n = 16, is small and the
population standard deviation, σ = 2.5, is known, and hence we use the normal
distribution for the test.
Let α = 0.01
Here, we have to reject Ho if
$$\frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} < -Z_\alpha$$
$$Z_{cal} = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} = \frac{31.5-32}{2.5/\sqrt{16}} = \frac{-0.5}{0.625} = -0.8$$
Ztab = −Zα = −Z0.01 = −2.33
Since Zcal > Ztab, accept Ho.
Therefore, the sample data do not contradict the claim of the company; that
is, the data support the claim at the 1% level of significance.
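The arithmetic of Example 1 can be checked with a few lines of Python, using only the figures stated in the example (sample mean 31.5, hypothesized mean 32, σ = 2.5, n = 16, α = 0.01). This is a small sketch with illustrative variable names.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 31.5, 32.0, 2.5, 16, 0.01

# Test statistic for a mean with known population standard deviation.
z_cal = (x_bar - mu0) / (sigma / sqrt(n))

# Lower-tail critical value, i.e. -Z_alpha, from the standard normal.
z_tab = NormalDist().inv_cdf(alpha)

# Ho is rejected only if z_cal falls below the critical value.
reject = z_cal < z_tab
print(f"Zcal = {z_cal:.2f}, Ztab = {z_tab:.2f}, reject Ho: {reject}")
```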
Activity1
A producer of electric bulbs claims that the average life length of its
product is more than 1800 hrs. A consumer protection agency, on the other
hand, doubts the claim. Therefore, it takes a random sample of 400 bulbs and
finds that the mean life length is 1780 hrs with a standard deviation of
200 hrs. Assuming that the life length of bulbs is normal, should the agency
support the producer's claim at the 5% level of significance?
$$S = \sqrt{\frac{\sum (X_i-\bar{X})^2}{n-1}}$$
Example 2
Workers of a given industry complain that their average monthly salary is at
most Birr 600. The workers union wants to test the claim of the workers and
takes a random sample of 100 workers of the factory, which produces a mean
monthly salary of Birr 610 with a standard deviation of Birr 50. What should
the workers union conclude about the claim of the workers at the 5% level of
significance?
Solution
The hypothesis is
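Ho: μ ≤ 600
Ha: μ > 600
Here, σ is unknown and n is large. Therefore, by the central limit theorem, the
test distribution is the Z-distribution.
Critical region is
$$\frac{\bar{X}-\mu_0}{s/\sqrt{n}} > Z_\alpha$$
$$Z_{cal} = \frac{\bar{X}-\mu_0}{s/\sqrt{n}} = \frac{610-600}{50/\sqrt{100}} = 2$$
For α = 0.05, Ztab = Z0.05 = 1.645
Since Zcal > Ztab, Ho is rejected. Therefore, the workers union should
conclude that the data do not support the workers' claim that their average
monthly salary is at most Birr 600.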
We have seen earlier that in this case $\frac{\bar{X}-\mu}{S/\sqrt{n}}$ has a t-distribution with (n − 1)
degrees of freedom.
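$$\bar{X} = \frac{\sum X_i}{n} = 37.7, \qquad S = \sqrt{\frac{\sum (X_i-\bar{X})^2}{n-1}} = 1.84$$
Critical region is
$$\left|\frac{\bar{X}-\mu_0}{S/\sqrt{n}}\right| > t_{\alpha/2}(n-1)$$
$$t_{cal} = \frac{\bar{X}-\mu_0}{S/\sqrt{n}} = \frac{37.7-37}{1.84/\sqrt{10}} = 1.2$$
At α = 0.05, ttab = tα/2 (n − 1) = t0.025 (9) = 2.26
Since tcal < ttab, accept Ho.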
Therefore, the sample data are consistent with the specified body
temperature.
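The t statistic above can be reproduced from the summary figures given in the solution (sample mean 37.7, S = 1.84, n = 10, hypothesized mean 37). The Python sketch below takes the critical value 2.26 from the t table as quoted in the text rather than computing it; the variable names are illustrative.

```python
from math import sqrt

x_bar, mu0, s, n = 37.7, 37.0, 1.84, 10

# t statistic for a mean when sigma is unknown and the sample is small.
t_cal = (x_bar - mu0) / (s / sqrt(n))

# Two-tailed critical value t_{0.025}(9), copied from the table value in the text.
t_tab = 2.26

# Ho is accepted when |t_cal| does not exceed the critical value.
accept = abs(t_cal) < t_tab
print(f"tcal = {t_cal:.2f}, ttab = {t_tab}, accept Ho: {accept}")
```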
Activity2
In a particular school district the I.Q. is known to have a normal distribution
with a mean of 110. From one school a sample of 25 students was taken and
their average I.Q. was found to be 115 with variance 100. Is the average in
this school different from the district average?
The method here is to compare the sample proportion with the specified
population value, or to analyze qualitative data where we test the
presence or absence of a certain characteristic based on sample values.
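By the central limit theorem, if nP > 5 and nQ > 5, it can be shown that the
sample proportion $\hat{p}$ is approximately normally distributed with mean P and
variance PQ/n, where Q = 1 − P.
Hence
$$Z = \frac{\hat{p}-P}{\sqrt{PQ/n}} \sim N(0, 1)$$
Again, $\frac{\hat{p}-P}{\sqrt{PQ/n}}$ is the test statistic for testing a single proportion.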
The test procedure is the same as that of the mean, shown in the previous
sections. That is, the computed value of Z is compared with the table value of
Z, which is also known as the critical value of Z.
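If the hypothesis is Ho: P = Po
Ha: P > Po, where Po is a specific value,
[Figure: standard normal curve showing the area 1 − α and the critical value Zα]
$$P\left(\frac{\hat{p}-P}{\sqrt{PQ/n}} > Z_\alpha\right) = \alpha$$
where α is the probability of type I error, so the critical region is
$$\frac{\hat{p}-P}{\sqrt{PQ/n}} > Z_\alpha$$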
Similarly, if Ho: P = Po
Ha: P < Po
Critical region is
$$\frac{\hat{p}-P}{\sqrt{PQ/n}} < -Z_\alpha$$
Critical region is
$$\frac{\hat{p}-P}{\sqrt{PQ/n}} > Z_\alpha$$
The calculated value of Z is
$$Z_{cal} = \frac{\hat{p}-P}{\sqrt{PQ/n}} = \frac{0.52-0.5}{\sqrt{(0.5)(0.5)/500}} = 0.89$$
For α = 0.05, the tabulated value is
Ztab = Zα = Z0.05 = 1.645.
Since Zcal < Ztab, accept Ho.
Therefore, the claim of the workers union is correct.
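The single-proportion calculation can be sketched in Python from the figures used in the solution (sample proportion 0.52, hypothesized proportion 0.5, n = 500, α = 0.05). The variable names are illustrative, and the check on nP and nQ mirrors the condition stated earlier for the normal approximation.

```python
from math import sqrt
from statistics import NormalDist

p_hat, p0, n, alpha = 0.52, 0.5, 500, 0.05
q0 = 1 - p0

# Condition for the normal approximation to the sample proportion.
approximation_ok = n * p0 > 5 and n * q0 > 5

# Test statistic for a single proportion, using P = Po under Ho.
z_cal = (p_hat - p0) / sqrt(p0 * q0 / n)

# Upper-tail critical value Z_alpha.
z_tab = NormalDist().inv_cdf(1 - alpha)

reject = z_cal > z_tab
print(f"Zcal = {z_cal:.2f}, Ztab = {z_tab:.3f}, reject Ho: {reject}")
```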
Activity 3
A medical expert claims that at least 60% of diseases in a given community
are related to lack of sanitation. A researcher who wants to test the claim
takes a random sample of 500 patients from the community and finds
that 350 of the cases are related to sanitation. Test at the 1% level of
significance whether the claim of the medical expert is correct or not.
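When the sampling fraction n/N exceeds 0.05, we use the finite population
correction factor,
$$\sqrt{\frac{N-n}{N-1}}$$
in tests of hypotheses too. In this case also the standard error of the
respective estimator must be multiplied by the finite population correction
factor.
For the sample mean, the standard error is
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$
When the population standard deviation is unknown, we substitute
the sample standard deviation, s, for it.
Similarly, the standard error of the sample proportion, $\hat{p}$, is estimated as
$$\sigma_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\sqrt{\frac{N-n}{N-1}}$$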
Example 5
Suppose that a random sample of 200 electric bulbs from a total of 2000
bulbs produces an average life length of 12,000 hrs with a standard deviation
of 1500 hrs. Test at the 5% level of significance that the average life length
of the 2000 electric bulbs is at most 11,800 hrs.
Solution
For N = 2000 and n = 200, the sampling fraction, n/N = 200/2000 = 0.1,
which is more than 0.05. Therefore, the finite population correction factor
must be used in the test.
The hypothesis is
Ho: μ ≤ 11,800
Ha: μ > 11,800
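The test statistic, with the finite population correction applied to the
standard error, is
$$Z = \frac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}}$$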
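As a rough sketch only, the figures stated in Example 5 (N = 2000, n = 200, sample mean 12,000 hrs, s = 1500 hrs, hypothesized mean 11,800 hrs, α = 0.05) can be put through a Z statistic whose standard error is multiplied by the finite population correction factor. The variable names are illustrative, and the resulting value is this sketch's own arithmetic under those stated figures, not a number quoted from the text.

```python
from math import sqrt
from statistics import NormalDist

N, n = 2000, 200
x_bar, s, mu0, alpha = 12000.0, 1500.0, 11800.0, 0.05

# The correction is applied because the sampling fraction exceeds 0.05.
sampling_fraction = n / N
fpc = sqrt((N - n) / (N - 1))

# Standard error of the mean, corrected for the finite population.
std_error = (s / sqrt(n)) * fpc

z_cal = (x_bar - mu0) / std_error
z_tab = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value Z_alpha

reject = z_cal > z_tab
print(f"fpc = {fpc:.4f}, Zcal = {z_cal:.2f}, Ztab = {z_tab:.3f}, reject Ho: {reject}")
```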
$\bar{X}_1$ and $\bar{X}_2$ are the sample means from the first and the second population,
respectively.
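Here, the difference in the sample means is normally distributed with mean
μ1 − μ2 and variance $\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}$.
That is,
$$(\bar{X}_1-\bar{X}_2) \sim N\left(\mu_1-\mu_2,\ \frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}\right)$$
Thus,
$$Z = \frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} \sim N(0, 1)$$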
Suppose we wish to test,
Ho: μ1 = μ2    or    Ho: μ1 − μ2 = 0
Ha: μ1 > μ2          Ha: μ1 − μ2 > 0
Here, under Ho, the test statistic becomes
$$Z = \frac{\bar{X}_1-\bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} \sim N(0, 1)$$
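Critical region is
$$\frac{\bar{X}_1-\bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} > Z_\alpha$$
Similarly, for Ha: μ1 < μ2 the critical region is
$$\frac{\bar{X}_1-\bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} < -Z_\alpha$$
and for Ha: μ1 ≠ μ2 it is
$$\left|\frac{\bar{X}_1-\bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}\right| > Z_{\alpha/2}$$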
Example 6
A sample of 200 students who graduated from college in 1990 produced an
average age of 25 years, and another sample of 250 students who graduated
from college in 2005 produced an average age of 23.5 years. Test at the 5%
level of significance whether the average age at graduation decreased in 2005,
assuming normal populations and that the standard deviations of the ages at
graduation in both years are equal to 5 years.
Solution
Let μ1 = the average age of all students who graduated in 1990
μ2 = the average age of all students who graduated in 2005
The required test is
Ho: μ1 = μ2
Ha: μ1 > μ2
Here, the populations are normal and σ1 = σ2 = 5, so the test distribution is
normal.
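Therefore, critical region is
$$\frac{\bar{X}_1-\bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} > Z_\alpha$$
$$Z_{cal} = \frac{\bar{X}_1-\bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} = \frac{25-23.5}{\sqrt{\frac{25}{200}+\frac{25}{250}}} = 3.16$$
Ztab = Zα = Z0.05 = 1.645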
Since Zcal > Ztab, we reject Ho, which means that the average age at
graduation in 2005 is significantly lower than that of 1990.
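The two-sample Z calculation of Example 6 can be checked with the short Python sketch below, which uses only the figures stated in the example (means 25 and 23.5, common σ = 5, sample sizes 200 and 250, α = 0.05). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x1, x2 = 25.0, 23.5          # sample means for 1990 and 2005
sigma1 = sigma2 = 5.0        # known population standard deviations
n1, n2 = 200, 250
alpha = 0.05

# Standard error of the difference between the two sample means.
std_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Under Ho (mu1 = mu2) the hypothesized difference is zero.
z_cal = (x1 - x2) / std_error
z_tab = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value Z_alpha

reject = z_cal > z_tab
print(f"Zcal = {z_cal:.2f}, Ztab = {z_tab:.3f}, reject Ho: {reject}")
```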
Case 1: When the population variances are equal, that is, σ1² = σ2² = σ².
Because the population variances are equal, σ² is logically estimated by the
pooled sample variance, Sp², given as
$$S_p^2 = \frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2}$$
And $(\bar{X}_1-\bar{X}_2)$ has mean μ1 − μ2 and variance $\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}$.
Then,
$$\frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \sim t(n_1+n_2-2)$$
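Suppose we wish to test
Ho: μ1 = μ2
Ha: μ1 > μ2
Under Ho,
$$\frac{\bar{X}_1-\bar{X}_2}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \sim t(n_1+n_2-2)$$
Critical region is
$$\frac{\bar{X}_1-\bar{X}_2}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} > t_\alpha(n_1+n_2-2)$$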
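If Ho: μ1 = μ2
Ha: μ1 < μ2
Critical region is
$$\frac{\bar{X}_1-\bar{X}_2}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} < -t_\alpha(n_1+n_2-2)$$
For the two-tailed alternative Ha: μ1 ≠ μ2, the critical region is
$$\left|\frac{\bar{X}_1-\bar{X}_2}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\right| > t_{\alpha/2}(n_1+n_2-2)$$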
Example 7
A transport company wants to compare the fuel efficiency of two types of
lorries it operates. It obtains data from samples of the two types of lorries,
A and B, with the following result.
                               Lorry type
                               A        B
Average mileage per liter      10.9     10.0
Standard deviation             2.3      2.0
Sample size                    20       25
$$t_{cal} = \left|\frac{\bar{X}_A-\bar{X}_B}{S_p\sqrt{\frac{1}{n_A}+\frac{1}{n_B}}}\right| = 1.39$$
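The pooled-variance t statistic for Example 7 can be sketched in Python from the tabulated figures (means 10.9 and 10.0, standard deviations 2.3 and 2.0, sample sizes 20 and 25). The sketch assumes the pooled variance is divided by nA + nB − 2, matching the degrees of freedom of the t distribution used in this case. With unrounded intermediate values it gives about 1.40, slightly different from the 1.39 quoted in the text, presumably because of rounding in the original calculation.

```python
from math import sqrt

x_a, s_a, n_a = 10.9, 2.3, 20   # lorry type A
x_b, s_b, n_b = 10.0, 2.0, 25   # lorry type B

# Pooled variance, weighting each sample variance by its degrees of freedom.
df = n_a + n_b - 2
sp2 = ((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / df
sp = sqrt(sp2)

# t statistic for the difference of two means with equal population variances.
t_cal = abs((x_a - x_b) / (sp * sqrt(1 / n_a + 1 / n_b)))
print(f"Sp = {sp:.3f}, df = {df}, tcal = {t_cal:.2f}")
```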
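Here, it can be shown that
$$\frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}} \sim t(\nu) \quad \text{(approx.)}$$
where the degrees of freedom are
$$\nu = \frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1-1}+\frac{\left(s_2^2/n_2\right)^2}{n_2-1}}$$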
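Under Ho,
$$\frac{\bar{X}_1-\bar{X}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}} \sim t(\nu) \quad \text{(approx.)}$$
Critical region is
$$\frac{\bar{X}_1-\bar{X}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}} > t_\alpha(\nu)$$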
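Similarly, the critical regions for the tests with alternative hypotheses
Ha: μ1 < μ2 and Ha: μ1 ≠ μ2, respectively, are
$$\frac{\bar{X}_1-\bar{X}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}} < -t_\alpha(\nu)$$
and
$$\left|\frac{\bar{X}_1-\bar{X}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\right| > t_{\alpha/2}(\nu)$$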
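The unequal-variance statistic and its approximate degrees of freedom can be computed together. The Python sketch below is one way to do it; the function name `unequal_variance_t` is an illustrative choice, and the lorry figures from Example 7 are reused purely as sample input, not because the text treats that example under this case.

```python
from math import sqrt


def unequal_variance_t(x1, s1, n1, x2, s2, n2):
    """Return (t, df) for testing Ho: mu1 = mu2 when the population
    variances are unknown and not assumed equal."""
    v1 = s1**2 / n1   # estimated variance of the first sample mean
    v2 = s2**2 / n2   # estimated variance of the second sample mean
    t = (x1 - x2) / sqrt(v1 + v2)
    # Approximate degrees of freedom, as given by the formula in the text.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Illustrative input only: summary figures borrowed from Example 7.
t_cal, df = unequal_variance_t(10.9, 2.3, 20, 10.0, 2.0, 25)
print(f"tcal = {t_cal:.2f}, df = {df:.1f}")
```

The degrees of freedom are generally not a whole number; when a t table is used, they are usually rounded down to the nearest integer.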