Module in Inferential Statistics With Additional Exercises Edited Until August 2022 by Tats2020
by
Fernando P. Tataro
Introduction
Modules have been used for decades as an aid to teaching. With the spread of
Covid-19, the module has become a necessity, and many authors have already written
manuscripts of this type. As Covid-19 spread worldwide, it forced many institutions
and sectors, private and state alike, into lockdown. Its spread prompted countries in
almost all parts of the world to impose an ECQ, GCQ, Modified GCQ, or a similar
measure under another name. The five countries hardest hit by the pandemic include
the United States (USA), Brazil, Russia, India and the United Kingdom (UK). The
total number of cases in the USA has already exceeded 2.6 million and is still
growing. In the Philippines, although the situation is not as serious as in the
countries mentioned above, the government, as advised by the Department of Health
(DOH), has also issued a lockdown, both to prevent a rapid increase in infections
and for fear that the medical sector may not be able to handle a great number of
infected persons.
One of the sectors greatly affected by the Covid-19 pandemic is the
educational system. The lockdowns issued because of the virus have prevented
educators and students from meeting face-to-face because they have to stay at home.
But all concerned must not succumb to the adversities it has brought. Thanks to the
new technology at hand, we have somehow been able to cope with the current
challenges, especially those caused by the Covid-19 pandemic.
The number of computers, TVs, cellphones, etc., has grown enormously, and so
has the number of application programs and software (YouTube, Zoom, Google
Classroom, Google Forms, etc.) running on various platforms. The role of modules,
however, will remain essential to the educational system.
Module 1
Skewness and kurtosis are measures that indicate whether a distribution is
normal or abnormal. When the skewness is positive, the distribution
is said to be positively skewed or skewed right, meaning that the right tail of the
distribution is longer than the left. When the skewness is negative, the
distribution is negatively skewed or skewed left, meaning that the left tail
of the distribution is longer than the right. If the skewness is zero, the
distribution is perfectly symmetrical.
When the skewness is zero (perfectly symmetrical) and the kurtosis is 0.265
(mesokurtic), the distribution is said to be normal, and the curve representing it
is bell-shaped.
These two measures are very important in the succeeding lessons on
Inferential Statistics because they enable us to determine whether a distribution
is normal or non-normal.
Objectives:
When you have completed the lessons, you will be able to:
• Calculate quantities such as the mean, median and standard deviation for
use in computing the skewness.
• Use skewness to determine whether the distribution of data is positively
skewed or negatively skewed (abnormal) or neither (normal).
• Calculate the quartiles, quartile deviation, percentiles and kurtosis to
distinguish whether the distribution of data is mesokurtic, leptokurtic or
platykurtic, then tell the abnormality or normality of the distribution based on
these criteria.
• Identify the different types of graphs of skewness and kurtosis.
LESSON 1 – The SKEWNESS
The formula for skewness is
\[ Sk = \frac{3(\bar{X} - Md)}{SD} \]
Where:
$\bar{X}$ = Mean
Md = Median
SD = Standard Deviation
𝑋̅=31.60
Then, identify the value of Lme using the Cumulative Frequency (CF) column.
Starting from the bottom, search for the first value that is equal to or greater than
the calculated $\frac{n}{2} = 25$.
This value is 32. The value 32 enables us to locate the Median Class. Thus, the
Median Class is 29.5 – 34.5. So, Lme = 29.5, CF = 21 (the cumulative frequency before
the Median Class), and Fme = 11. The class interval i is 5, the difference between
two successive values, say 45 and 40. So 45 − 40 = 5.
i = 5
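Substituting values in the formula for the Median above:
\[ Md = L_{me} + \left(\frac{\frac{n}{2} - CF}{F_{me}}\right) i \]
\[ Md = 29.5 + \left(\frac{25 - 21}{11}\right)(5) \]
\[ Md = 31.318 \]
The standard deviation for grouped data is
\[ SD = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}} \]
\[ SD = \sqrt{\frac{53120 - \frac{1580^2}{50}}{50 - 1}} \]
\[ SD = 8.071 \]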
d) Calculation of the SK
\[ SK = \frac{3(\bar{X} - Md)}{SD} \]
\[ SK = \frac{3(31.60 - 31.318)}{8.071} \]
\[ SK = 0.105 \]
The SK value of 0.105 is not zero, which indicates that the distribution is abnormal.
Being positive, it shows that the distribution is positively skewed or skewed right.
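The skewness calculation can be checked with a short Python sketch. The frequency table used here is an assumption: it is reconstructed from the class boundaries, frequencies and cumulative frequencies quoted in the worked solution, and it is listed in ascending order, so the cumulative frequencies are built from the first class upward. The variable names are illustrative only.

```python
from math import sqrt

# Frequency table reconstructed (an assumption) from the class boundaries and
# frequencies quoted in the worked solution, listed in ascending order.
lower_bounds = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5]
freqs = [4, 5, 12, 11, 9, 6, 3]
width = 5  # class interval i

n = sum(freqs)
midpoints = [lb + width / 2 for lb in lower_bounds]
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))

mean = sum_fx / n

# Median: locate the class whose cumulative frequency first reaches n/2,
# then interpolate inside that class.
cum = 0  # cumulative frequency before the current class
for lb, f in zip(lower_bounds, freqs):
    if cum + f >= n / 2:
        median = lb + (n / 2 - cum) / f * width
        break
    cum += f

sd = sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))
sk = 3 * (mean - median) / sd

print(round(mean, 2), round(median, 3), round(sd, 3), round(sk, 3))
```

A positive `sk` corresponds to a distribution that is skewed right, and a negative one to a distribution that is skewed left.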
DIFFERENT GRAPHS OF SKEWNESS
Skewness describes the shape of the distribution. The distribution is negatively
skewed when the thinner tail extends to the left side. It is positively skewed when
the thinner tail extends to the right, and it is normal when it is bell-shaped (Broto,
2006).
\[ Ku = \frac{Qd}{P_{90} - P_{10}} \]
Where:
Ku = Kurtosis
Qd = Quartile Deviation
$P_{90}$ = 90th percentile
$P_{10}$ = 10th percentile
Example:
Consider the frequency distribution of example 1 for the problem in Skewness.
Determine whether it is normal or abnormal by solving the value of Kurtosis.
a) Solve for Qd.
\[ Qd = \frac{Q_3 - Q_1}{2} \]
Before Qd can be computed with the formula above, Q1 and Q3 must be calculated first.
For Q1:
\[ Q_1 = L_{Q1} + \left(\frac{\frac{n}{4} - CF}{F_{Q1}}\right) i \]
1) Determine $\frac{n}{4} = \frac{50}{4} = 12.5$.
2) Identify the first quartile class:
Look for the first value under the Cumulative Frequency column of the above
table that is greater than or equal to 12.5. This value is 21 (21 >= 12.5).
This value is located in the third row from the bottom of the table. The 1st
Quartile Class (24.5 – 29.5) belongs to this row. LQ1 therefore is 24.5; CF = 9,
the cumulative frequency before; FQ1 = 12; and i = 5.
\[ Q_1 = 24.5 + \left(\frac{12.5 - 9}{12}\right)(5) \]
\[ Q_1 = 25.958 \]
For Q3:
\[ Q_3 = L_{Q3} + \left(\frac{\frac{3n}{4} - CF}{F_{Q3}}\right) i \]
1) Determine $\frac{3n}{4} = \frac{3(50)}{4} = 37.5$.
2) Identify the third quartile class:
Look for the first value under the Cumulative Frequency column of the above
table that is greater than or equal to 37.5. This value is 41 (41 >= 37.5).
This value is located in the fifth row from the bottom of the table. The 3rd
Quartile Class (34.5 – 39.5) belongs to this row. LQ3 therefore is 34.5; CF = 32, the
cumulative frequency before; FQ3 = 9; and i = 5.
\[ Q_3 = 34.5 + \left(\frac{37.5 - 32}{9}\right)(5) \]
\[ Q_3 = 37.556 \]
\[ Qd = \frac{37.556 - 25.958}{2} \]
\[ Qd = 5.799 \]
For P10:
\[ P_{10} = L_{P10} + \left(\frac{\frac{10n}{100} - CF}{F_{P10}}\right) i \]
1) Determine $\frac{10n}{100} = \frac{10(50)}{100} = 5$.
2) Identify the P10 class:
Look for the first value under the Cumulative Frequency column of the above
table that is greater than or equal to 5. This value is 9 (9 >= 5).
This value is in the 2nd row from the bottom of the table. The P10 Class (19.5 –
24.5) belongs to this row. LP10 is 19.5, CF = 4, and FP10 = 5.
Substituting values,
\[ P_{10} = 19.5 + \left(\frac{5 - 4}{5}\right)(5) \]
\[ P_{10} = 20.5 \]
For P90:
\[ P_{90} = L_{P90} + \left(\frac{\frac{90n}{100} - CF}{F_{P90}}\right) i \]
1) Determine $\frac{90n}{100} = \frac{90(50)}{100} = 45$.
2) Identify the P90 class:
Look for the first value under the Cumulative Frequency column of the above
table that is greater than or equal to 45. This value is 47 (47 >= 45).
This value is in the 2nd row from the top of the table. The P90 Class (39.5 – 44.5)
belongs to this row. LP90 is 39.5, CF = 41, and FP90 = 6.
Substituting values,
\[ P_{90} = 39.5 + \left(\frac{45 - 41}{6}\right)(5) \]
\[ P_{90} = 42.833 \]
Since all the values needed in the formula for Ku have been determined,
\[ Ku = \frac{Qd}{P_{90} - P_{10}} \]
\[ Ku = \frac{5.799}{42.833 - 20.5} \]
\[ Ku = 0.260 \]
The Ku value of 0.260 is less than 0.265. This means that the distribution is
leptokurtic, that is, the data cluster around the peak. The distribution is abnormal.
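The quartile, percentile and kurtosis steps all use the same interpolation, so they can be sketched with one helper function. This is a minimal sketch, not part of the original module: the frequency table is an assumption reconstructed from the class boundaries and frequencies quoted in the solution, it is listed in ascending order, and `grouped_percentile` is a hypothetical helper name.

```python
def grouped_percentile(pct, lower_bounds, freqs, width):
    """Interpolated percentile for grouped data (classes in ascending order)."""
    n = sum(freqs)
    target = pct * n / 100  # e.g. n/4 for Q1, 90n/100 for P90
    cum = 0  # cumulative frequency before the current class
    for lb, f in zip(lower_bounds, freqs):
        if cum + f >= target:
            return lb + (target - cum) / f * width
        cum += f
    raise ValueError("pct must be between 0 and 100")

# Reconstructed frequency table (an assumption), ascending order.
lower_bounds = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5]
freqs = [4, 5, 12, 11, 9, 6, 3]
width = 5

q1 = grouped_percentile(25, lower_bounds, freqs, width)
q3 = grouped_percentile(75, lower_bounds, freqs, width)
p10 = grouped_percentile(10, lower_bounds, freqs, width)
p90 = grouped_percentile(90, lower_bounds, freqs, width)

qd = (q3 - q1) / 2          # quartile deviation
ku = qd / (p90 - p10)       # percentile coefficient of kurtosis

print(round(q1, 3), round(q3, 3), round(qd, 3), round(p10, 3), round(p90, 3), round(ku, 3))
```

The resulting `ku` is then compared with the 0.265 reference value used in this module.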
Note: The data in the Frequency Distribution Table are arranged in
descending order. The cumulative frequencies were obtained by starting from the
frequency at the bottom of the table and adding upward, one after the other. Had the
data been arranged in ascending order, the cumulative frequencies would be obtained
in reverse order (i.e., by starting from the top).
Exercise:
Consider the following Frequency Distribution. Determine if the distribution is normal
or abnormal by solving for skewness and kurtosis.
Scores f
16-20 2
21-25 5
26-30 11
31-35 13
36-40 10
41-45 6
46-50 3
n=50
Types of Test
1. Test of Means
2. Test of Difference of Means
The first type aims to find out if a population characteristic, as indicated by the Mean,
has changed. The second seeks to determine if the same characteristic between two
populations is significantly different.
Types of Error
The Null Hypothesis as stated, can either be correct (that is, the statement
is true) or wrong (that is, the statement is false). Two types of errors can be committed
when a conclusion is made after a hypothesis is tested. These are:
• Type I error (or alpha error): The Null Hypothesis is rejected when, in
fact, it is true.
• Type II error (or beta error): The Null Hypothesis is accepted when, in
fact, it is false.
Since we can never be completely certain about the claim made in the Null
Hypothesis and the corresponding conclusion, there is always the risk of
making either a Type I or a Type II error. Of these two types, the Type I error is
conventionally treated as the more serious, so we would certainly want to avoid
making it, if any error should be made at all. The chance of making a Type I error
is controlled by what is referred to as the Level of Significance.
[Figure: Normal curve with a critical (rejection) region in each tail]
Level of Significance
As shown above, the two critical regions at both ends of the Normal Curve
are also called the rejection regions. The area in the middle is the acceptance region.
The Level of Significance is the total area of the rejection regions, and it fixes the
dividing line (the critical value) between the acceptance region and the rejection
region.
Important Note: Another way of knowing when to reject or accept the Null Hypothesis
is to solve for the p-value. When the p-value is less than the Level of Significance,
then the Null Hypothesis is rejected.
Use of Symbols
We need to make a distinction between the symbols used for a population
and those for a sample. These symbols are used in the formulas applied in testing a
Hypothesis. Let us summarize them below:
                      Sample                  Population
Size                  n                       N
Mean                  $\bar{X}$               $\mu$ (mu)
Standard Deviation    s (small letter s)      $\sigma$ (sigma)
One-Tailed and Two-Tailed Tests
In testing a hypothesis, we have to decide whether to use a One-tailed Test
or a Two-tailed Test. Whether it is one-tailed or two-tailed will depend on the nature of
the problem which, in turn, will determine how the Null Hypothesis (Ho) and Alternative
Hypothesis (Ha) are stated.
Before we cite examples to determine which test to use, let us point out that
in a two-tailed test, the critical regions are on both sides of the Normal Curve, that is:
[Figure: Normal curve with a critical (rejection) region in both tails]
On the other hand, in a one-tailed test, the critical region is only on one side
of the Normal Curve, either on the left or right side of it, that is:
[Figure: Normal curve with the critical (rejection) region in one tail only]
And
[Figure: Normal curve with the critical (rejection) region in the opposite tail]
Let us now cite examples to illustrate when to use each type of test. Without solving
the problem, the following example will illustrate the use of a Two-tailed Test:
This problem will require a Two-tailed Test with critical regions on both sides of the
Normal Curve because, if the machine is not functioning properly, it may produce
parts whose diameters are either wider or narrower than intended. It is not a case of
the malfunctioning machine producing parts with consistently narrower diameters only
(or with consistently wider diameters only).
This situation now determines how the Null and Alternative Hypotheses are
stated:
Null Hypothesis: The average diameter of the part being manufactured has not
changed (that is, Ho: $\mu = 5$ cm).
Alternative Hypothesis: The average diameter has changed (that is, Ha: $\mu \neq 5$ cm).
• A barangay chairman feels that, on the whole, households in his barangay are
better off financially now compared to several years ago. He feels this because he
has observed that mothers seem to be visiting supermarkets more often and that
he has been receiving fewer complaints. Furthermore, he knows that quite a
number of husbands in his barangay have gone abroad to work as OFWs
(Overseas Filipino Workers), where incomes are much higher than in the
Philippines. He knows from a survey conducted several years back that the
average monthly income in his barangay was Php 20,000. He decides to find out
whether the average monthly income has increased by taking a sample of
households and testing his hypothesis.
The Null and Alternative hypotheses for his problem are stated as follows:
Null Hypothesis: The average monthly family income in the barangay has not changed
(that is, Ho: $\mu$ = Php 20,000).
Alternative Hypothesis: The average monthly family income has increased (that is, Ha:
$\mu$ > Php 20,000).
MODULE 2
Objectives:
When you have completed the chapter, you will be able to:
• Use lessons in module 1 to determine the types of distribution of data.
• Distinguish parametric from nonparametric tests and their relationship to normality
of distribution.
• Apply Stepwise Method in solving problems using parametric tests.
Where:
t = the t-test
$\bar{X}_1$ = the mean of group 1
$\bar{X}_2$ = the mean of group 2
$SS_1$ = sum of squares of group 1
$SS_2$ = sum of squares of group 2
$n_1$ = number of observations in group 1
$n_2$ = number of observations in group 2
Example 1. The following are the scores of 10 male and 10 female BSBM students in
mathematics. Test the null hypothesis that there is no significant difference between
the performance of male and female BSBM students in the said test. Use the t-test at
0.05 level of significance.
Male Female
15 13
19 9
17 12
15 5
6 9
12 3
14 8
10 4
11 6
16 13
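      Male                  Female
$x_1$     $x_1^2$     $x_2$     $x_2^2$
15        225         13        169
19        361         9         81
17        289         12        144
15        225         5         25
6         36          9         81
12        144         3         9
14        196         8         64
10        100         4         16
11        121         6         36
16        256         13        169
135       1953        82        794
$\sum x_1 = 135$, $\sum x_1^2 = 1953$, $n_1 = 10$, $\bar{X}_1 = \frac{\sum x_1}{n_1} = 13.5$
$\sum x_2 = 82$, $\sum x_2^2 = 794$, $n_2 = 10$, $\bar{X}_2 = \frac{\sum x_2}{n_2} = 8.2$
\[ SS_1 = \sum x_1^2 - \frac{(\sum x_1)^2}{n_1} = 1953 - \frac{(135)^2}{10} = 130.5 \]
\[ SS_2 = \sum x_2^2 - \frac{(\sum x_2)^2}{n_2} = 794 - \frac{(82)^2}{10} = 121.6 \]
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\frac{SS_1 + SS_2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
\[ t = \frac{13.5 - 8.2}{\sqrt{\left(\frac{130.5 + 121.6}{10 + 10 - 2}\right)\left(\frac{1}{10} + \frac{1}{10}\right)}} \]
\[ t = \frac{5.3}{\sqrt{\frac{252.1}{18}\left(\frac{1}{10} + \frac{1}{10}\right)}} \]
\[ t = \frac{5.3}{\sqrt{14.0056(0.2)}} \]
\[ t = \frac{5.3}{\sqrt{2.80112}} \]
\[ t = \frac{5.3}{1.6737} \]
\[ t = 3.167 \]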
Adopting the Stepwise Method briefly delineated above:
I. Problem: Is there a significant difference between the performance of the
male and female BSBM students in mathematics?
II. Hypothesis:
Ho: There is no significant difference between the performance of
male and female BSBM students in mathematics ($\bar{X}_1 = \bar{X}_2$).
H1: There is a significant difference between the performance of male and
female BSBM students in mathematics ($\bar{X}_1 \neq \bar{X}_2$).
III. Level of Significance:
$\alpha = 0.05$
$df = n_1 + n_2 - 2 = 10 + 10 - 2 = 18$
From the table, $t_{0.05} = 2.101$. This value was obtained by pairing the level
of significance, $\alpha$, and the degrees of freedom, $df$: (0.05, 18).
The critical value $t_{0.05} = 2.101$ can also be obtained using the Excel
built-in function for the critical (tabular) t, T.INV.2T($\alpha$, df). Thus, we have
$t_{0.05}$ = T.INV.2T(0.05, 18) = 2.101
Conclusion: Since the p-value of 0.005 is less than the 0.05 level of significance
with 18 degrees of freedom, the null hypothesis is rejected in favor of the
research hypothesis. This means that there is a significant difference in the
performance of the male and female BSBM students in mathematics. It
implies that the male students performed better than the female students,
considering that the mean score of 13.5 obtained by the male students is
greater than the mean score of 8.2 obtained by the female students.
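The arithmetic of the t-test for two independent samples can be reproduced with a few lines of Python. This is only a sketch using the standard library; the critical value 2.101 is the tabular value quoted in the text, not something the code computes, and the function name `sum_of_squares` is illustrative.

```python
from math import sqrt

male = [15, 19, 17, 15, 6, 12, 14, 10, 11, 16]
female = [13, 9, 12, 5, 9, 3, 8, 4, 6, 13]

def sum_of_squares(xs):
    """SS = sum(x^2) - (sum x)^2 / n, as in the worked solution."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)

n1, n2 = len(male), len(female)
mean1, mean2 = sum(male) / n1, sum(female) / n2
ss1, ss2 = sum_of_squares(male), sum_of_squares(female)

df = n1 + n2 - 2
pooled_variance = (ss1 + ss2) / df
t = (mean1 - mean2) / sqrt(pooled_variance * (1 / n1 + 1 / n2))

t_tabular = 2.101  # tabular value from the text for alpha = 0.05, df = 18
reject_h0 = abs(t) >= t_tabular

print(round(ss1, 1), round(ss2, 1), round(t, 3), reject_h0)
```

Comparing `abs(t)` with the tabular value mirrors the two-tailed decision rule used throughout this module.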
Exercise:
Two groups of experimental rabbits were injected with tranquilizer at 1.0 mg and
1.5 mg dose respectively. The time given in seconds that took them to fall asleep
is hereby given. Use the t-test for two independent samples at 0.01 to test the null
hypothesis that the difference in dosage has no effect on the length of time it took
them to fall asleep.
1.0 mg 1.5 mg
5.3 12.1
3.4 7.8
7.2 15.4
6.7 14.2
5.6 13.6
3.1 9.7
4.8 10.4
7.8 17.2
13.1 20.3
8.2 19.7
6.4
12.3
The t-test for correlated samples is used when comparing the means before
and after the treatment such as pretest and posttest. The formula is,
Example 1. An experimental study was conducted on the effect of programmed
materials in Mathematics on the performance of 20 selected college students. Before
the program was implemented, pretest was administered and after 3 months the same
instrument was used to get the posttest result. The following is the result of the
experiment:
PRETEST POSTTEST
X1 X2 D D2
20 24 -4 16
35 38 -3 9
15 20 -5 25
16 25 -9 81
18 27 -9 81
17 24 -7 49
23 35 -12 144
22 27 -5 25
19 23 -4 16
25 28 -3 9
28 32 -4 16
22 28 -6 36
12 25 -13 169
15 26 -11 121
21 32 -11 121
28 39 -11 121
25 36 -11 121
16 28 -12 144
34 41 -7 49
32 38 -6 36
$\sum D = -153$     $\sum D^2 = 1389$
$\bar{D} = -7.65$     $n = 20$
The formula is
\[ t = \frac{\bar{D}}{\sqrt{\frac{\sum D^2 - \frac{(\sum D)^2}{n}}{n(n - 1)}}} \]
Where:
$\bar{D}$ = the mean difference between the pretest and the posttest
D = the difference between the pretest and the posttest
n = the sample size
Substituting values in the formula,
\[ t = \frac{-7.65}{\sqrt{\frac{1389 - \frac{(-153)^2}{20}}{20(19)}}} \]
\[ t = -10.087 \]
$df = 19$
$t_{0.05} = 2.093$ or
$t_{0.05}$ = T.INV.2T(0.05, 19) = 2.093
If the computed t is negative, take note that we must use its absolute value
(omitting the negative sign) to obtain the p-value with the Excel built-in
function; hence we have
p-value = T.DIST.2T(10.087, 19) = 0.000 (to three decimal places)
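The t-test for correlated samples can be sketched the same way. The code below is an illustration using only the standard library: it rebuilds the D and D² columns from the pretest and posttest scores and applies the formula for t; the tabular value 2.093 is taken from the text rather than computed.

```python
from math import sqrt

pretest = [20, 35, 15, 16, 18, 17, 23, 22, 19, 25,
           28, 22, 12, 15, 21, 28, 25, 16, 34, 32]
posttest = [24, 38, 20, 25, 27, 24, 35, 27, 23, 28,
            32, 28, 25, 26, 32, 39, 36, 28, 41, 38]

# D = pretest - posttest for each student
d = [x1 - x2 for x1, x2 in zip(pretest, posttest)]
n = len(d)
sum_d = sum(d)
sum_d2 = sum(v * v for v in d)
mean_d = sum_d / n

t = mean_d / sqrt((sum_d2 - sum_d ** 2 / n) / (n * (n - 1)))

t_tabular = 2.093  # tabular value from the text for alpha = 0.05, df = 19
reject_h0 = abs(t) >= t_tabular  # compare using the absolute value of t

print(sum_d, sum_d2, mean_d, round(t, 3), reject_h0)
```

The sign of `t` only reflects the order of subtraction; the decision uses its absolute value, as noted above.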
Exercise:
Ten subjects were given an attitude test on a controversial issue. Then,
they were shown a film favorable to the ten subjects and the same attitude
test was administered. Make a directional test at $\alpha = 0.05$.
Pretest Posttest
16 20
18 20
16 24
24 28
20 20
25 30
22 23
18 24
15 19
15 15
THE Z-TEST
The z-test is another test under parametric statistics that requires normality of
the distribution. It utilizes the two population parameters $\mu$ and $\sigma$. It is
used to compare two means: the sample mean and the perceived population mean. It is
also used to compare two sample means taken from the same population. It is used
when the sample size is equal to or greater than 30. The z-test can be applied in two
ways: the One-Sample Mean Test and the Two-Sample Mean Test.
The tabular values of the z-test at the 0.01 and 0.05 levels of significance are shown
below.
Test          Level of Significance
              0.01         0.05
One-tail      ± 2.33       ± 1.645
Two-tails     ± 2.575      ± 1.96
\[ z = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma} \]
Where:
$\bar{x}$ = sample mean
$\mu$ = hypothesized value of the population mean
$\sigma$ = population standard deviation
n = sample size
Example 1. XYZ company claims that the average lifetime of a certain tire is at least
26,500 km. To check the claim, a taxi company puts 40 of these tires on its taxis and
gets a mean lifetime of 24,430 km. With a standard deviation of 1240 km, is the claim
true? Use z-test at 0.05.
Computation:
\[ z = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma} = \frac{(24430 - 26500)\sqrt{40}}{1240} = -10.56 \]
To convert the computed z to a p-value, we can still make use of the Excel
built-in function by supplying a large value for the degrees of freedom, at least 1000.
Hence, we get
p-value = T.DIST.2T(10.56, 1000) = 0.000
V. Decision Rule: If the absolute z-computed value >= z-tabular value, then reject
the null hypothesis, or if p-value <= alpha, reject Ho.
VI. Conclusion: Since the absolute z-computed value of 10.56 is greater than the z-
tabular value of 1.96 at the 0.05 level of significance, the research hypothesis
is accepted. This means that the average lifetime of these tires is no longer
26,500 km. It implies that the average lifetime of these tires is already less
than this value.
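As a cross-check on the one-sample z-test, the sketch below computes z and a two-tailed p-value in Python. Note the swapped-in technique: instead of the T.DIST.2T workaround with 1000 degrees of freedom, it evaluates the standard normal tail directly with `math.erfc`, which is an assumption of this sketch and not the method used in the text.

```python
from math import sqrt, erfc

sample_mean = 24430      # mean lifetime observed by the taxi company (km)
claimed_mean = 26500     # lifetime claimed by the manufacturer (km)
sigma = 1240             # standard deviation (km)
n = 40                   # number of tires tested

z = (sample_mean - claimed_mean) * sqrt(n) / sigma

# Two-tailed p-value from the standard normal distribution:
# P(|Z| >= |z|) = erfc(|z| / sqrt(2))
p_value = erfc(abs(z) / sqrt(2))

z_tabular = 1.96         # tabular value used in the text at alpha = 0.05
reject_h0 = abs(z) >= z_tabular

print(round(z, 2), p_value, reject_h0)
```

The p-value is far below 0.05, which agrees with the 0.000 reported by the spreadsheet function.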
\[ z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
Where:
$\bar{X}_1$ = mean of sample 1
$\bar{X}_2$ = mean of sample 2
$s_1^2$ = variance of sample 1
$\alpha = 0.01$
$z_{0.01} = \pm 2.575$ or
$z_{0.01}$ = T.INV.2T(0.01, 1000) = 2.58
IV. Statistics:
Two sample mean z-test
\[ z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
\[ z = \frac{85 - 75}{\sqrt{\frac{38}{100} + \frac{32}{100}}} \]
\[ z = 11.95 \]
p-value = T.DIST.2T(11.95, 1000) = 0.000
V. Decision Rule: If the z-computed >= z-tabular value, then reject the null
hypothesis or if p-value<=alpha, reject Ho.
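The two-sample computation can be sketched in the same way. The values below are read off the substitution shown in the text; treating 38 and 32 as the two sample variances is an assumption based on where they appear in the formula, since the problem statement itself is not reproduced here.

```python
from math import sqrt, erfc

mean1, mean2 = 85, 75    # sample means from the substitution in the text
var1, var2 = 38, 32      # assumed to be the sample variances s1^2 and s2^2
n1, n2 = 100, 100        # sample sizes

z = (mean1 - mean2) / sqrt(var1 / n1 + var2 / n2)

# Two-tailed p-value from the standard normal distribution
p_value = erfc(abs(z) / sqrt(2))

z_tabular = 2.575        # tabular value from the text at alpha = 0.01
reject_h0 = abs(z) >= z_tabular

print(round(z, 2), p_value, reject_h0)
```

Both decision rules, the comparison with the tabular value and the comparison of the p-value with alpha, lead to the same decision here.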
THE F-TEST
The F-test, otherwise known as the Analysis of Variance (ANOVA), is used in
comparing the means of two or more independent groups. One-way ANOVA is used
when there is only one variable involved. Two-way ANOVA is used when two
variables are involved: the column and row variables. In that case, the researcher is
interested to know if there are significant differences between and among the column
and row variables. It is also used to determine if there is an interaction effect
between the variables being analyzed.
Like the t-test, the F-test is also a parametric test, which requires that the samples
are normally distributed and that the data are expressed as interval or ratio. This test
is more efficient than the other tests of difference.
Brand
A B C D
8 10 2 4
5 12 3 6
4 9 5 7
7 8 6 9
8 7 4 3
5 10 3 4
4 11 3 6
Perform the analysis of variance and test the hypothesis at 0.05 level of
significance that the average sales of the four brands of soap are equal.
Solving by Stepwise Method:
I. Problem: Is there a significant difference in the average sales of the four
brands of soap?
II. Hypotheses:
Ho: There is no significant difference in the average sales of
the four brands of soap.
Ha: There is a significant difference in the average sales of the
four brands of soap.
IV. Statistics
One-Way Analysis of Variance (F-test) computation:
     A                  B                  C                  D
$X_1$   $X_1^2$    $X_2$   $X_2^2$    $X_3$   $X_3^2$    $X_4$   $X_4^2$
8       64         10      100        2       4          4       16
5       25         12      144        3       9          6       36
4       16         9       81         5       25         7       49
7       49         8       64         6       36         9       81
8       64         7       49         4       16         3       9
5       25         10      100        3       9          4       16
4       16         11      121        3       9          6       36
41      259        67      659        26      108        39      243
$\bar{X}_1 = \frac{41}{7} = 5.857$
$\bar{X}_2 = \frac{67}{7} = 9.571$
$\bar{X}_3 = \frac{26}{7} = 3.714$
$\bar{X}_4 = \frac{39}{7} = 5.571$
Note: When Ho is rejected, that is, when there is a significant
difference in the computed means among groups, use Scheffé's Test to identify
which groups have significant differences.
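The one-way ANOVA quantities for the four brands, including the within mean square that Scheffé's Test relies on, can be sketched as follows. This is an illustrative computation using the usual between and within sums of squares; the tabular value 3.01 is the F0.05 value quoted in the comparison table for this example, and the variable names are not from the original.

```python
groups = {
    "A": [8, 5, 4, 7, 8, 5, 4],
    "B": [10, 12, 9, 8, 7, 10, 11],
    "C": [2, 3, 5, 6, 4, 3, 3],
    "D": [4, 6, 7, 9, 3, 4, 6],
}

k = len(groups)                                   # number of groups
N = sum(len(v) for v in groups.values())          # total observations
grand_total = sum(sum(v) for v in groups.values())

cf = grand_total ** 2 / N                         # correction factor
sum_sq = sum(x * x for v in groups.values() for x in v)
group_term = sum(sum(v) ** 2 / len(v) for v in groups.values())

ss_between = group_term - cf
ss_within = sum_sq - group_term

ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
f_value = ms_between / ms_within

f_tabular = 3.01  # F0.05 quoted in the text for this example
reject_h0 = f_value >= f_tabular

print(round(ms_within, 3), round(f_value, 3), reject_h0)
```

Because Ho is rejected here, the pairwise comparison with Scheffé's Test is the natural next step.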
SCHEFFÉ'S TEST
\[ F' = \frac{(\bar{X}_1 - \bar{X}_2)^2}{\frac{S_W^2 (n_1 + n_2)}{n_1 n_2}} \]
Where:
$F'$ = Scheffé's Test
$\bar{X}_1$ = mean of group 1
$\bar{X}_2$ = mean of group 2
$n_1$ = number of samples in group 1
$n_2$ = number of samples in group 2
$S_W^2$ = within mean squares
Brand A vs B:
\[ F' = \frac{(5.857 - 9.571)^2}{\frac{3.071(7 + 7)}{7(7)}} = 15.721 \]
Brand A vs C:
\[ F' = \frac{(5.857 - 3.714)^2}{\frac{3.071(7 + 7)}{7(7)}} = 5.234 \]
Brand A vs D:
\[ F' = \frac{(5.857 - 5.571)^2}{\frac{3.071(7 + 7)}{7(7)}} = 0.093 \]
Brand B vs C:
\[ F' = \frac{(9.571 - 3.714)^2}{\frac{3.071(7 + 7)}{7(7)}} = 39.097 \]
Brand B vs D:
\[ F' = \frac{(9.571 - 5.571)^2}{\frac{3.071(7 + 7)}{7(7)}} = 18.235 \]
Brand C vs D:
\[ F' = \frac{(3.714 - 5.571)^2}{\frac{3.071(7 + 7)}{7(7)}} = 3.930 \]
Comparison of the Average Sales of the Four Brands of Soap
Between Brand    F'         (F0.05)(k-1) = (3.01)(3)    Interpretation
A vs B           15.721     9.03                        Significant
A vs C           5.234      9.03                        Not Significant
A vs D           0.093      9.03                        Not Significant
B vs C           39.097     9.03                        Significant
B vs D           18.235     9.03                        Significant
C vs D           3.930      9.03                        Not Significant
The above table shows the F'-computed values for all pairs of the four brands of soap
under comparison.
Since the F'-computed values of 5.234, 0.093, and 3.930 for A vs C, A vs D, and
C vs D, respectively, are all less than the F-tabular value of 9.03 at the 0.05 level
of significance and 3 degrees of freedom, it can be said that the means for brands
A, C, and D of soap have no significant differences. On the other hand, inasmuch as
the F'-values of 15.721, 39.097 and 18.235 for A vs B, B vs C and B vs D,
respectively, are all greater than the F-tabular value, the means for A vs B, B vs
C, and B vs D have significant differences. It implies that brand B of soap, which
has the highest mean, is more saleable than brands A, C and D.
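The six pairwise comparisons can be generated in a loop rather than one at a time. The sketch below is illustrative: it recomputes the within mean square from the raw data, applies the F' formula to every pair of brands, and compares each result with (F0.05)(k-1) = 9.03, the threshold used in the table.

```python
from itertools import combinations

groups = {
    "A": [8, 5, 4, 7, 8, 5, 4],
    "B": [10, 12, 9, 8, 7, 10, 11],
    "C": [2, 3, 5, 6, 4, 3, 3],
    "D": [4, 6, 7, 9, 3, 4, 6],
}

k = len(groups)
N = sum(len(v) for v in groups.values())

# Within mean square: pooled sum of squares divided by N - k
ss_within = sum(sum(x * x for x in v) - sum(v) ** 2 / len(v) for v in groups.values())
ms_within = ss_within / (N - k)

f_tabular = 3.01                  # F0.05 quoted in the text
threshold = f_tabular * (k - 1)   # (F0.05)(k-1), about 9.03

results = {}
for (name1, g1), (name2, g2) in combinations(groups.items(), 2):
    n1, n2 = len(g1), len(g2)
    mean1, mean2 = sum(g1) / n1, sum(g2) / n2
    f_prime = (mean1 - mean2) ** 2 / (ms_within * (n1 + n2) / (n1 * n2))
    results[(name1, name2)] = (f_prime, f_prime >= threshold)

for pair, (f_prime, significant) in results.items():
    print(pair, round(f_prime, 3), "Significant" if significant else "Not Significant")
```

The values printed can differ from the table in the third decimal place because the code carries unrounded means and mean squares, whereas the hand computation rounds them first.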
Exercise. The following data represent the operating time in hours of the 3 types of
scientific pocket calculators before a recharge is required. Perform the Analysis of
Variance (F-test) at 0.05 level of significance.
Brand
A B C
4.8 6.7 7.6
5.6 6.9 7.2
4.7 7.2 7.6
6.2 5.8 6.8
4.4 5.4 6.3
6.9 6.3
4.8
LESSON 10 - THE F-TEST (TWO-WAY-ANOVA WITH INTERACTION EFFECT)
In statistics, the two-way analysis of variance (ANOVA) is an extension of the one-
way ANOVA that examines the influence of two different categorical independent
variables on one continuous dependent variable.
The Two-Way ANOVA uses the following formulas to ultimately calculate the values
of F for: Between Columns, Rows and Interaction.
• Correction Factor, $CF = \frac{(GT)^2}{N}$
• Total Sum of Squares, $SS_T = \sum x^2 - CF$
• Within Sum of Squares, $SS_W = \sum x^2 - \sum \frac{(\sum X_w)^2}{n_w}$, w = within number
• Column Sum of Squares, $SS_c = \sum \frac{(\sum X_c)^2}{N_c} - CF$, c = column number
• Row Sum of Squares, $SS_r = \sum \frac{(\sum X_r)^2}{N_r} - CF$, r = row number
• Interaction Sum of Squares, $SS_{cr} = SS_T - SS_W - SS_c - SS_r$
Degrees of Freedom:
• Total Degree of Freedom, $df_t = N - 1$
• Within Degree of Freedom, $df_w = k(n_i - 1)$
• Column Degree of Freedom, $df_c = c - 1$
• Row Degree of Freedom, $df_r = r - 1$
• Interaction Degree of Freedom, $df_{cr} = (c - 1)(r - 1)$
• Mean of Square, $MS = \frac{SS}{df}$
• F-test, $F = \frac{MS_{any}}{MS_w}$
Steps: Calculate
1. $CF = \frac{(1290)^2}{30} = 55470$
2. $SS_T = 38^2 + 42^2 + 44^2 + \dots + 46^2 + 42^2 - CF$
= 56080 − 55470 = 610
3. $SS_W = 56080 - \frac{199^2 + 247^2 + 203^2 + 195^2 + 218^2 + 228^2}{5}$ = 56080 − 55870.4 = 209.6
4. $SS_c = \frac{394^2 + 465^2 + 431^2}{10} - CF$ = 55722.2 − 55470 = 252.2
5. $SS_r = \frac{649^2 + 641^2}{15} - CF$ = 55472.13 − 55470 = 2.13
6. $SS_{cr} = SS_T - SS_W - SS_c - SS_r$ = 610 − 209.6 − 252.2 − 2.13 = 146.07
It can be deduced from this table that, since the F-computed value of 14.44 for
the column (instructor) is greater than the F-tabular value of 3.403 at the 0.05 level
of significance with 2 and 24 degrees of freedom, the null hypothesis is
rejected. This means that there is a significant difference in the performance of
students under the three different instructors. The instructor factor affects the
performance of the students. This implies that students under teacher B have
performed better than those under teacher A and teacher C, while students
under teacher A have the poorest performance.
On the other hand, since the F-computed value of 0.24 for the row (methods) is less
than the F-tabular value of 4.260 with 1 and 24 degrees of freedom, the null
hypothesis is accepted. This indicates that, as far as methods of teaching are
concerned, the performance of the students is unaffected. The method of teaching
does not matter.
The F-computed value of 8.36 against the F-tabular value of 3.403 indicates
that there is an interaction effect between instructors and their methods of
teaching. Students under instructor B have better performance under method
of teaching 1, while students under instructor C have better performance under
method of teaching 2.
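The sums of squares and F values discussed above can be reproduced from the cell totals that appear in the calculation steps. The sketch below is an illustration only: the raw scores are not reproduced here, so it starts from the six cell totals, the sum of squared scores (56080) and 5 scores per cell, and it assumes rows are the two methods and columns are instructors A, B and C.

```python
# Cell totals taken from the calculation steps; rows = methods, columns = instructors
# (this row/column layout is an assumption consistent with the quoted totals).
cell_totals = [
    [199, 247, 203],   # method 1: instructors A, B, C
    [195, 218, 228],   # method 2: instructors A, B, C
]
n_per_cell = 5
sum_x_squared = 56080  # sum of all squared scores, as quoted in the steps

rows = len(cell_totals)
cols = len(cell_totals[0])
N = rows * cols * n_per_cell
grand_total = sum(sum(row) for row in cell_totals)

cf = grand_total ** 2 / N
ss_total = sum_x_squared - cf
ss_within = sum_x_squared - sum(t ** 2 for row in cell_totals for t in row) / n_per_cell

col_totals = [sum(row[j] for row in cell_totals) for j in range(cols)]
row_totals = [sum(row) for row in cell_totals]
ss_col = sum(t ** 2 for t in col_totals) / (rows * n_per_cell) - cf
ss_row = sum(t ** 2 for t in row_totals) / (cols * n_per_cell) - cf
ss_inter = ss_total - ss_within - ss_col - ss_row

df_within = rows * cols * (n_per_cell - 1)
df_col, df_row = cols - 1, rows - 1
df_inter = df_col * df_row

ms_within = ss_within / df_within
f_col = (ss_col / df_col) / ms_within
f_row = (ss_row / df_row) / ms_within
f_inter = (ss_inter / df_inter) / ms_within

print(round(f_col, 2), round(f_row, 2), round(f_inter, 2))
```

Each F value is then compared with its own tabular value, since the column, row and interaction effects have different degrees of freedom.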
[Figure: scatter plot with an upward trend, r = +; axes running from Low to High]
If the trend of the line graph is going upward, the value of r is positive. This
indicates that as x increases, the value of y also increases; likewise, as x decreases,
the value of y also decreases. In this case x and y are positively correlated.
[Figure: scatter plot with a downward trend, r = −; axes running from Low to High]
If the trend of the line graph is going downward, the value of r is negative. It
indicates that as x increases, the corresponding value of y decreases. In this case x
and y are negatively correlated.
[Figure: scatter plot with no clear trend; axes running from Low to High]
If the trend of the line graph cannot be established as either upward or downward
(i.e., when the plotted points are widely scattered), then r = 0. This indicates that
there is no correlation between the x and y variables.
The formula for the Pearson Product Moment Coefficient of Correlation, r, is:
\[ r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left(n \sum x^2 - (\sum x)^2\right)\left(n \sum y^2 - (\sum y)^2\right)}} \]
Where:
r = the Pearson Product Moment Coefficient of Correlation
n = sample size
$\sum xy$ = the sum of the products of x and y
$\sum x \sum y$ = the product of the sum of x and the sum of y
$\sum x^2$ = sum of squares of x
$\sum y^2$ = sum of squares of y
Example 1. Below are the midterm (x) and final (y) examination scores.
x 75 70 65 90 85 80 70 65 90 88
y 80 77 65 94 88 85 88 76 72 91
The values of r and t are related by
\[ r = \frac{t}{\sqrt{t^2 + n - 2}} \]
Computation:
No x y x2 y2 xy
1 75 80 5625 6400 6000
2 70 77 4900 5929 5390
3 65 65 4225 4225 4225
4 90 94 8100 8836 8460
5 85 88 7225 7744 7480
6 80 85 6400 7225 6800
7 70 88 4900 7744 6160
8 65 76 4225 5776 4940
9 90 72 8100 5184 6480
10 88 91 7744 8281 8008
Total 778 816 61444 67344 63943
\[ r = \frac{10(63943) - (778)(816)}{\sqrt{[10(61444) - (778)^2][10(67344) - (816)^2]}} \]
\[ r = 0.55 \]
To convert this value to a p-value, we can first convert it to a t-value
using the formula $t = r\sqrt{\frac{n - 2}{1 - r^2}}$, and afterwards make use of the
function p-value = T.DIST.2T(t, df).
\[ t = 0.55\sqrt{\frac{8}{1 - 0.55^2}} = 1.863 \]
VI. Conclusion/Implication:
Since the r-computed value of 0.55 is less than the r-tabular value of 0.632
at the 0.05 level of significance with 8 degrees of freedom, the null
hypothesis is accepted. This means that there is no significant relationship
between the midterm and final examination scores of the students. It implies
that the midterm grade is not a reliable indicator of the final grade.
Coefficient of Determination
CD = $(0.55)^2 \times 100\%$
CD = 30.25 %
This value indicates that only 30.25 % of the variation in the final examination
scores is accounted for by the midterm examination scores; the rest is attributable
to other factors.
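The correlation example can be worked through in Python as a check on the table and formulas. This is a minimal sketch using the standard library; `pearson_r` is a hypothetical helper name, and the tabular value 0.632 is the one quoted in the conclusion.

```python
from math import sqrt

x = [75, 70, 65, 90, 85, 80, 70, 65, 90, 88]   # midterm scores
y = [80, 77, 65, 94, 88, 85, 88, 76, 72, 91]   # final scores

def pearson_r(xs, ys):
    """Pearson r from the raw-score formula used in this lesson."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

n = len(x)
r = pearson_r(x, y)
t = r * sqrt((n - 2) / (1 - r ** 2))   # t-value equivalent of r
cd = r ** 2 * 100                      # coefficient of determination, in percent

r_tabular = 0.632                      # tabular r from the text (alpha = 0.05, df = 8)
significant = abs(r) >= r_tabular

print(round(r, 2), round(t, 2), round(cd, 1), significant)
```

Because the code carries r without rounding it to 0.55 first, its t and CD values can differ slightly from hand computations in the last decimal place.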
Computation:
No X y x2 y2 xy
1 4.8 42.5 23.04 1806.25 204
2 3.2 38.6 10.24 1489.96 123.52
3 3.6 40.2 12.96 1616.04 144.72
4 3.3 38.5 10.89 1482.25 127.05
5 5.2 45.4 27.04 2061.16 236.08
6 5.6 48.8 31.36 2381.44 273.28
7 3.3 40.0 10.89 1600 132
8 4.3 38.4 18.49 1474.56 165.12
9 3.8 42.4 14.44 1797.76 161.12
10 4.6 40.7 21.16 1656.49 187.22
Total 41.7 415.5 180.51 17365.91 1754.11
\[ r = \frac{10(1754.11) - (41.7)(415.5)}{\sqrt{[10(180.51) - (41.7)^2][10(17365.91) - (415.5)^2]}} \]
\[ r = 0.827 \]
V. Decision Rule: If the computed value is greater than the tabular value, reject
null hypothesis.
VI. Conclusion: Since the r-computed value of 0.827 is greater than r-tabular value
of 0.632 at 0.05 level of significance with 8 degrees of freedom, hence, the null
hypothesis is rejected in favor of the research hypothesis. This means that the
advertising cost is related to the sales. The sales are influenced by the
advertisement, implying that the higher the cost of advertisement, the higher
the sales.
And since the variable advertising cost, denoted x, is significantly related to the
sales, denoted y, Simple Linear Regression can be used.
Solving for:
1. $b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} = \frac{10(1754.11) - (41.7)(415.5)}{10(180.51) - (41.7)^2}$, $b = 3.243$
2. $a = \bar{y} - b\bar{x}$, $\bar{x} = 4.17$, $\bar{y} = 41.55$, $a = 41.55 - 3.243(4.17) = 28.027$
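The slope and intercept of the simple linear regression line can be computed directly from the advertising cost (x) and sales (y) columns of the computation table. The following is a sketch under that reading of the table; the variable names are illustrative.

```python
x = [4.8, 3.2, 3.6, 3.3, 5.2, 5.6, 3.3, 4.3, 3.8, 4.6]            # advertising cost
y = [42.5, 38.6, 40.2, 38.5, 45.4, 48.8, 40.0, 38.4, 42.4, 40.7]  # sales

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Least-squares slope and intercept
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * (sum_x / n)

def predict(x_value):
    """Predicted y for a given x from the fitted line y = a + b x."""
    return a + b * x_value

print(round(b, 3), round(a, 3))
```

The intercept printed by the code differs from 28.027 in the third decimal place because the hand computation uses b already rounded to 3.243.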
\[ y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n \]
Where:
y = the dependent variable to be predicted
$x_1, x_2, x_3, \dots, x_n$ = the known independent variables that may
influence y.
$b_0, b_1, b_2, \dots, b_n$ = the arbitrary constants whose values can
be determined from the observed data.
When there are two independent variables $x_1$ and $x_2$ and we want to fit the
data to the regression model, we use the equation
\[ y = b_0 + b_1 x_1 + b_2 x_2 \]
whose coefficients satisfy the normal equations
\[ \sum y = n b_0 + b_1 \sum x_1 + b_2 \sum x_2 \quad (1) \]
\[ \sum x_1 y = b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2 \quad (2) \]
\[ \sum x_2 y = b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2 \quad (3) \]
Example 1. The following are data on the ages and salaries of a random sample of 6
executives working at BBC corporation and their academic achievements while in
college.
Computations:
No $y$ $x_1$ $x_2$ $x_1^2$ $x_2^2$ $x_1 y$ $x_2 y$ $x_1 x_2$
1 82.3 39 1.75 1521 3.0625 3209.7 144.03 68.25
2 75.6 33 2 1089 4 2494.8 151.2 66
3 85.4 42 1.5 1764 2.25 3586.8 128.1 63
4 78.8 39 2 1521 4 3073.2 157.6 78
5 73.2 30 2.25 900 5.0625 2196 164.7 67.5
6 69.3 29 2.5 841 6.25 2009.7 173.25 72.5
Total 464.6 212 12 7636 24.625 16570 918.88 415.25
$b_0 = 83.68$
$b_1 = 0.423$
$b_2 = -10.593$
Take note that 3 linear equations with 3 unknowns can be solved by Elimination,
Substitution, or matrix methods (such as Cramer's Rule), or you can make use of a
scientific calculator with built-in functions dedicated to solving such equations.
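One way to solve the three normal equations is Cramer's Rule, sketched below in Python. The sketch assumes the standard normal equations for two independent variables and uses the column totals exactly as they appear in the Total row of the computation table; `det3` and `replace_column` are hypothetical helper names.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def replace_column(m, col, values):
    """Copy of m with column `col` replaced by `values` (for Cramer's Rule)."""
    return [[values[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]

# Totals as printed in the Total row of the computation table
n = 6
sum_y, sum_x1, sum_x2 = 464.6, 212, 12
sum_x1_sq, sum_x2_sq = 7636, 24.625
sum_x1y, sum_x2y, sum_x1x2 = 16570, 918.88, 415.25

# Normal equations written as A * [b0, b1, b2] = rhs
A = [
    [n, sum_x1, sum_x2],
    [sum_x1, sum_x1_sq, sum_x1x2],
    [sum_x2, sum_x1x2, sum_x2_sq],
]
rhs = [sum_y, sum_x1y, sum_x2y]

d = det3(A)
b0, b1, b2 = (det3(replace_column(A, col, rhs)) / d for col in range(3))

print(round(b0, 2), round(b1, 3), round(b2, 3))
```

The system is sensitive to rounding: using unrounded column totals instead of the printed ones shifts the coefficients slightly, so small differences from a calculator result are to be expected.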
Exercise: The following data were obtained for 8 students in Management: midterm
grades, average grade in quizzes and their final grades. (Antonio S. Broto,
Statistics Made Simple, pp 240, exercise 18).
Midterm Grade    Average Grade in Quizzes    Final Grade
3.0 2.5 2.75
1.5 1.5 1.5
1.25 1.5 1.25
1.25 1.25 1.25
1.75 2.0 1.5
2.75 2.75 2.5
2.5 2.5 2.5
2.0 2.25 2.0
a) Use the method of least squares (multiple regression) to fit an equation of the
form 𝑦 = 𝑏0 + 𝑏1 𝑥1 + 𝑏2 𝑥2 .
b) Predict the final grade of student whose midterm grade is 1.75 and the average
grade in the quizzes is 1.5.
MODULE 3
Nonparametric tests also utilize both nominal and ordinal data. Nominal data
are expressed in categories, whereas ordinal data are expressed in ranks.
The most commonly used nonparametric tests are the Chi-Square test; U-test;
H-test; Spearman Rank Order Coefficient of Correlation, rs; Sign Test (Median
Test); McNemar's Test; Friedman Test, Fr; and Kendall's Coefficient of
Concordance, W.
Objectives:
When you have completed the module, you will be able to:
• Distinguish different types of Chi-Square tests and their uses.
• Identify parametric and nonparametric tests which are counterparts.
• Recognize data appropriate for nonparametric tests.
• Convert data suited for parametric tests into data for nonparametric tests.
• Apply the Stepwise Method in solving statistical problems involving nonparametric tests.
This test is a test of difference between the observed and expected frequencies.
The Chi-Square is considered a unique test because it has 3 functions which are as
follows:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where:
Example. A coin is tossed 100 times. The head lands on top 48 times, while the tail
lands on top 52 times. Using Chi-Square, determine if there is a significant difference
between the observed and expected frequencies. Use 0.05 level of significance.
We can also obtain the critical value of Chi-Square by using the built-in
function CHISQ.INV.RT(alpha, df). Thus, we have
$$\chi^2_{0.05} = \text{CHISQ.INV.RT}(0.05, 1) = 3.841$$
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$$\chi^2 = \frac{(48 - 50)^2}{50} + \frac{(52 - 50)^2}{50}$$
$$\chi^2 = 0.16$$
The p-value can be obtained using the built-in function
VI. Conclusion: Since the computed χ² value of 0.16 is less than the tabular χ²
value of 3.841 at 0.05 level of significance with 1 degree of
freedom, the null hypothesis is accepted. This means that there is no significant
difference between the observed and the expected frequencies. This
implies that the theory of a fifty-fifty chance for each face of the coin landing
on top holds true, inasmuch as the value of χ² does not warrant rejecting
the theory.
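The computation for the coin example can be reproduced with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the p-value shortcut based on `math.erfc` is valid only when df = 1, as it is here.

```python
import math


def chi_square_gof(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2):
    """Right-tail p-value of the chi-square distribution, valid for df = 1 only."""
    return math.erfc(math.sqrt(chi2 / 2))


observed = [48, 52]   # heads, tails
expected = [50, 50]   # fair coin, 100 tosses
chi2 = chi_square_gof(observed, expected)
p = p_value_df1(chi2)
print(round(chi2, 2), round(p, 3))
```

Because the p-value is far greater than 0.05, the decision agrees with the comparison of the computed and tabular values.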
Exercise:
A certain machine is supposed to mix peanuts, hazelnuts, cashews, and pecans in the
ratio of 4:3:2:1. A can containing 500 of these mixed nuts was found to have 275
peanuts, 105 hazelnuts, 76 cashews and 44 pecans. At 0.05 level of significance, test
the hypothesis that the machine is mixing the nuts at this specified ratio.
Where:
𝜒 2 = 𝑐ℎ𝑖 − 𝑠𝑞𝑢𝑎𝑟𝑒 𝑡𝑒𝑠𝑡
𝑁 = 𝑔𝑟𝑎𝑛𝑑 𝑡𝑜𝑡𝑎𝑙
𝑎, 𝑏, 𝑐 𝑎𝑛𝑑 𝑑 = 𝑡ℎ𝑒 𝑓𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑖𝑒𝑠 𝑎𝑟𝑟𝑎𝑛𝑔𝑒𝑑 𝑖𝑛 𝑡ℎ𝑒 𝑡𝑎𝑏𝑙𝑒 𝑎𝑠 𝑏𝑒𝑙𝑜𝑤
𝑘𝑙𝑚𝑛 = 𝑝𝑟𝑜𝑑𝑢𝑐𝑡 𝑜𝑓 𝑡ℎ𝑒 𝑟𝑜𝑤𝑠 𝑎𝑛𝑑 𝑐𝑜𝑙𝑢𝑚𝑛𝑠 𝑡𝑜𝑡𝑎𝑙
Example. Evaluate the attitudes of samples of LDP and Nationalista party members on the
issue of peace and order in Mindanao. To carry out the study, a random sample of
members of each party is drawn from the nationwide population of LDP and
Nationalista members, and each individual in both samples responds to the scale. Scores are
then classified into “Favorable” or “Unfavorable” categories. The following frequencies
were recorded:
Computation:
$$\chi^2 = \frac{N(ad - bc)^2}{klmn}$$
$$\chi^2 = \frac{200\,[(63)(48) - (37)(52)]^2}{(100)(100)(115)(85)}$$
$\chi^2 = 2.476$ or
p-value = CHISQ.DIST.RT(2.476, 1) = 0.116
V. Decision Rule: If the 𝜒 2 − 𝑐𝑜𝑚𝑝𝑢𝑡𝑒𝑑 value is greater than the 𝜒 2 − 𝑡𝑎𝑏𝑢𝑙𝑎𝑟
value, reject null hypothesis.
VI. Conclusion: Since the computed χ² value of 2.476 is less than the tabular χ²
value of 3.841 at 0.05 level of significance with 1 degree of freedom,
the null hypothesis is accepted. This means that there is no
significant difference between the attitudes of the two political parties on the
issue of peace and order in Mindanao.
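The shortcut formula for a 2 x 2 table is easy to verify by program. The sketch below assumes the cell assignment a = 63, b = 37, c = 52, d = 48, which is read off the computation line above; the function name `chi_square_2x2` is illustrative, and the p-value line is valid only for df = 1.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for a 2x2 table: N(ad - bc)^2 / (k l m n)."""
    k, l = a + b, c + d          # row totals
    m, n = a + c, b + d          # column totals
    total = a + b + c + d        # grand total N
    return total * (a * d - b * c) ** 2 / (k * l * m * n)


# Cell frequencies taken from the computation above.
chi2 = chi_square_2x2(a=63, b=37, c=52, d=48)
p = math.erfc(math.sqrt(chi2 / 2))   # right-tail p-value, valid for df = 1 only
print(round(chi2, 3), round(p, 3))
```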
Exercise. Using Chi-Square at 0.05 level of significance, determine the attitude of the
TCU community on the issue of charter change. To carry out such study, 100 samples
from each teaching and non-teaching personnel were taken to respond to the issue
categorized as “YES” or “NO”. The following frequencies were obtained:
YES NO Total
Teaching 34 (a) 66 (b) 100 (k)
Non-Teaching 55 (c) 45 (d) 100 (l)
Total 89 (m) 111 (n) 200 (N)
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where:
i = row number
j= column number
Example. One hundred individuals, male and female, were given an IQ test and their scores
were classified into high and low. Use the χ²-test of independence at 0.05 level of
significance to determine whether IQ is related to sex. The observed frequencies are shown as follows:
IQ
Sex High Low Total
O E O E
Male 23 33 56
Female 32 12 44
Total 55 45 100
IQ
Sex High Low Total
O E O E
Male 23 30.80 33 25.20 56
Female 32 24.20 12 19.80 44
Total 55 55 45 45 100
$$E_{11} = \frac{(56)(55)}{100} = 30.80$$
$$E_{12} = \frac{(56)(45)}{100} = 25.20$$
$$E_{21} = \frac{(44)(55)}{100} = 24.20$$
$$E_{22} = \frac{(44)(45)}{100} = 19.80$$
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$$\chi^2 = \frac{(23 - 30.80)^2}{30.80} + \frac{(33 - 25.20)^2}{25.20} + \frac{(32 - 24.20)^2}{24.20} + \frac{(12 - 19.80)^2}{19.80}$$
$\chi^2 = 9.976$ or
p-value = CHISQ.DIST.RT(9.976, 1) = 0.002
VI. Conclusion: Inasmuch as the computed χ² value of 9.976 is greater than the
tabular χ² value of 3.841 at 0.05 level of significance with 1 degree of
freedom, the null hypothesis is rejected. This means that there is a
significant relationship between sex and IQ. In this sample, a larger proportion
of females (32 of 44) than of males (23 of 56) obtained a high IQ score.
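The expected frequencies and the χ² value of the IQ example can be checked as follows. This is a minimal sketch that works for any r x c table of observed frequencies; the function names are illustrative and not part of the module.

```python
def expected_frequencies(table):
    """Expected count for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Sum of (O - E)^2 / E over every cell of an r x c table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


observed = [[23, 33],   # Male:   High, Low
            [32, 12]]   # Female: High, Low
expected = expected_frequencies(observed)
chi2 = chi_square_independence(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected, round(chi2, 3), df)
```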
Students’ Activity:
Two lots of 30 experimental guinea pigs were used in testing the effectiveness
of a new serum in combating a certain disease. Both were inoculated with the new
organism but only one lot was previously given the preventive serum. Is the serum
effective? Use 0.01 level of significance.
Serum No Serum Total
Recovered 12 3 15
Died 2 13 15
Total 14 16 30
Note: When df is 1 and any expected frequency is small (less than 10), the χ²-test
is applied with Yates’ correction for continuity. The correction is needed because the observed
frequencies are discrete, whereas the Chi-Square distribution used to evaluate the test
statistic is a continuous probability model. The formula used is
$$\chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E}$$
or
$$\chi^2 = \frac{N\left(|ad - bc| - \frac{N}{2}\right)^2}{klmn}$$
Example. Of eighteen selected patients who were seriously infected with Covid 19,
ten were treated with a new serum and eight were not. The number of days each patient
took to recover was then recorded. Using the Wilcoxon rank-sum test at 0.05 level of
significance, test whether the serum is effective. Consider the following data:
With 6.5 5.4 5.3 7.2 4.5 8.4 6.8 7.3 6.7 8.2
Treatment
No Treatment 15.2 18 13.6 12.2 13.1 12.0 17.6 19.4
$$U_1 = 55 - \frac{10(10 + 1)}{2} = 0$$
$$U_2 = 116 - \frac{8(8 + 1)}{2} = 80$$
$$U = \min(0, 80) = 0$$
VI. Conclusion:
Since the smaller of U1 = 0 and U2 = 80, which is 0, is less than the tabular value
of 17 at 0.05 level of significance for sample sizes of 10 and 8, the
null hypothesis is rejected in favor of the research hypothesis. This means that
the serum is effective in the treatment of Covid 19. It implies that patients
treated with the serum recover more rapidly than patients not treated with it,
because they take fewer days to recover.
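The rank sums and the two U values can be reproduced by ranking the 18 observations jointly. The sketch below is one possible way to do this in Python; the helper `average_ranks` (which gives tied values the mean of their ranks) and the function `mann_whitney_u` are illustrative names, not part of the module.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the mean rank."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Positions i+1 .. j are tied; give each the average of those ranks.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]


def mann_whitney_u(sample1, sample2):
    """Return (U1, U2) computed from the rank sums of the pooled data."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = r2 - n2 * (n2 + 1) / 2
    return u1, u2


with_treatment = [6.5, 5.4, 5.3, 7.2, 4.5, 8.4, 6.8, 7.3, 6.7, 8.2]
no_treatment = [15.2, 18, 13.6, 12.2, 13.1, 12.0, 17.6, 19.4]
u1, u2 = mann_whitney_u(with_treatment, no_treatment)
print(u1, u2, min(u1, u2))
```

Every treated patient recovered faster than every untreated patient, so the treated group takes ranks 1 to 10 and U1 is 0, the smallest value possible.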
$$H = \frac{12}{n(n + 1)} \sum \frac{R_i^2}{n_i} - 3(n + 1)$$
Where:
H = Kruskal Wallis test
n = number of observations
Ri = total ranks of i group
ni = no. of observations in i group
i = group location
Example. Consider the examination scores of samples of college students
taught English using three different methods: Method 1 (face-to-face
classroom teaching), Method 2 (on-line teaching), and Method 3 (modular
teaching). Use the H-test at 0.05 level of significance to test the null hypothesis
that the average scores under the three methods are equal. Consider the following data:
IV. Statistics:
Kruskal Wallis, H-test
Computation:
Method 1 R1 Method 2 R2 Method 3 R3
96 18 87 10.5 88 12.5
86 8.5 88 12.5 76 3
91 16 89 14.5 74 2
93 17 85 7 64 1
89 14.5 81 6 79 4
87 10.5 86 8.5
80 5
n1=6 84.5 n2=7 64 n3=5 22.5
$$H = \frac{12}{n(n + 1)} \sum \frac{R_i^2}{n_i} - 3(n + 1)$$
$$H = \frac{12}{18(18 + 1)} \left( \frac{84.5^2}{6} + \frac{64^2}{7} + \frac{22.5^2}{5} \right) - 3(18 + 1)$$
$H = 8.840$ or
p-value = CHISQ.DIST.RT(8.840, 2) = 0.012
V. Decision Rule: if the computed H >= tabular value, reject Ho or if p-
value<=alpha, reject Ho.
VI. Conclusion: Since the computed H of 8.840 is greater than the tabular value of 5.991
at 0.05 level of significance with 2 degrees of freedom, the null hypothesis is
rejected. This means that there are significant differences in the average scores
of the 18 students taught using the three different methods of teaching. The rank sums
suggest that Method 1 (face-to-face teaching), which has the highest average rank,
is more effective than the other two methods (on-line and modular teaching).
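The joint ranking, the rank sums, and the value of H can be reproduced with the sketch below. It follows the formula used in this lesson without a correction for ties; the function names are illustrative, and the p-value line uses a closed form that holds only when df = 2 (three groups).

```python
import math


def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the mean rank."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]


def kruskal_wallis_h(groups):
    """H = 12 / (n(n+1)) * sum(R_i^2 / n_i) - 3(n+1), ranking all groups jointly."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    rank_sums = []
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        rank_sums.append(r)
        total += r ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, rank_sums


method1 = [96, 86, 91, 93, 89, 87]
method2 = [87, 88, 89, 85, 81, 86, 80]
method3 = [88, 76, 74, 64, 79]
h, rank_sums = kruskal_wallis_h([method1, method2, method3])
p = math.exp(-h / 2)   # chi-square right tail, valid for df = 2 only
print(rank_sums, round(h, 3), round(p, 3))
```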
Students’ Exercise:
The data on rape cases committed from January to December under the Duterte
administration in 3 cities in Metro Manila are as follows:
City
Month A B C
January 5 6 6
February 6 7 7
March 4 8 8
April 3 3 9
May 4 5 7
June 5 7 10
July 4 4 9
August 3 3 6
September 7 4 7
October 5 5 9
November 6 4 10
December 4 2 8
Perform the Kruskal-Wallis H-test to test the hypothesis that the average number of rape
cases is the same in the 3 cities. Use 𝛼 = 0.05 level of significance.
$$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$
Where:
𝑟𝑠 = Spearman Rank Order Coefficient of
Correlation
∑ 𝐷 2 = sum of the squares of the difference
between rank x and rank y
𝑛 = 𝑠𝑎𝑚𝑝𝑙𝑒 𝑠𝑖𝑧𝑒
Example. The following are the number of hours spent by 12 students studying
for the final examination and the scores they obtained in Calculus. Calculate
𝑟𝑠 at 0.05 level of significance.
Number of
Hours Spent 6 5 10 11 12 18 15 20 8 9 14 12
x
Final Scores
2.75 3 2 2 1.75 1.5 1.5 1.25 2.25 2.25 1.25 2
y
Ha: There is a significant relationship between the number of hours spent for
studying Calculus and the scores students obtained in the Final Examination.
III. Level of Significance:
𝛼 = 0.05
df = 𝑛 − 2 = 12 − 2 = 10
Note: The degrees of freedom used for this statistic are the same as those used for
Pearson r. It also uses the same table for its critical values. Hence, we can adopt
techniques similar to those we have used for Pearson r.
Number of Hours Spent x   Final Scores y   Rx   Ry   D   D²
6 2.75 11 11 0 0
5 3 12 12 0 0
10 2 8 7 1 1
11 2 7 7 0 0
12 1.75 5.5 5 0.5 0.25
18 1.5 2 3.5 -1.5 2.25
15 1.5 3 3.5 -0.5 0.25
20 1.25 1 1.5 -0.5 0.25
8 2.25 10 9.5 0.5 0.25
9 2.25 9 9.5 -0.5 0.25
14 1.25 4 1.5 2.5 6.25
12 2 5.5 7 -1.5 2.25
$$\sum D^2 = 13$$
$$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$
$$r_s = 1 - \frac{6(13)}{12(12^2 - 1)} = 0.955$$
$$t = 0.955 \sqrt{\frac{10}{1 - 0.955^2}} = 10.182$$
The computed value of rs = 0.955, being greater than the tabular value of
0.576 at 0.05 level of significance with 10 degrees of freedom, leads to the
rejection of the null hypothesis. This indicates that there is a significant
relationship between the number of hours spent studying and the scores the
students obtained in Calculus. Because rank 1 was assigned both to the most hours
and to the best grade (1.25), the positive rs implies that the more time students
spend studying Calculus, the better the scores they obtain.
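The ranks, the sum of D², and rs in this example can be verified with the sketch below. The helper `average_ranks` and its `descending` option are illustrative names; the ranking directions follow the table above, where rank 1 goes to the most hours and to the best grade (1.25).

```python
def average_ranks(values, descending=False):
    """Rank values (1 = smallest, or 1 = largest if descending); ties share the mean rank."""
    ordered = sorted(values, reverse=descending)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]


hours = [6, 5, 10, 11, 12, 18, 15, 20, 8, 9, 14, 12]
grades = [2.75, 3, 2, 2, 1.75, 1.5, 1.5, 1.25, 2.25, 2.25, 1.25, 2]

# As in the table: rank 1 goes to the most hours and to the best grade (1.25).
rx = average_ranks(hours, descending=True)
ry = average_ranks(grades)

n = len(hours)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(rs, 3))
```

If both variables were ranked in the same direction instead, the sign of rs would reverse, so the ranking direction should always be stated when rs is interpreted.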
Students’ Exercise:
The following are the rankings given by two judges to the work of 10 artists. Using
rs at 0.05 level of significance, test whether the two judges differ in
their opinions about these artists.
Judge A 6 7 8 5 9 3 2 1 4 10
Judge B 5 9 10 4 8 5 2 1 6 7
This test is known as the Median Test under nonparametric statistics. It
is used to compare the medians of two independent samples. It is the counterpart of
the t-test for independent samples under the parametric tests, although the t-test
compares the means of the two independent groups or samples. The data consist of two
independent samples of n1 and n2 observations.
$$\chi^2 = \frac{N(ad - bc)^2}{klmn}$$
Where:
𝜒 2 = 𝑐ℎ𝑖 − 𝑠𝑞𝑢𝑎𝑟𝑒 𝑡𝑒𝑠𝑡
𝑁 = 𝑔𝑟𝑎𝑛𝑑 𝑡𝑜𝑡𝑎𝑙
𝑎 𝑎𝑛𝑑 𝑐 = 𝑡ℎ𝑒 𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑 (+)𝑓𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑖𝑒𝑠
𝑏 𝑎𝑛𝑑 𝑑 = 𝑡ℎ𝑒 𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑 (−)𝑓𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑖𝑒𝑠
𝑘 𝑎𝑛𝑑 𝑙 = 𝑡ℎ𝑒 𝑟𝑜𝑤 𝑡𝑜𝑡𝑎𝑙𝑠
𝑚 𝑎𝑛𝑑 𝑛 = 𝑡ℎ𝑒 𝑐𝑜𝑙𝑢𝑚𝑛 𝑡𝑜𝑡𝑎𝑙𝑠
The procedure is to get the median of the data jointly. The data above this
median are assigned a (+) sign, while those at or below this median a (-) sign. Then
the number of + and – signs for each sample is obtained. A Chi-Square test is used to
determine whether the observed frequencies of +and – signs differ significantly.
I. Is there a significant difference between the IQ test scores of the male and
female students?
II. Hypotheses:
Ho: There is no significant difference between the IQ test scores of the male
and female students.
Ha: There is a significant difference between the IQ test scores of the male and
female students.
III. Level of Significance:
𝛼 = 0.05
𝑑𝑓 = (𝑐 − 1)(𝑟 − 1)
𝑑𝑓 = (2 − 1)(2 − 1) = 1
2
𝜒0.05 = 3.841
IV. Statistics: Median Test for Two Independent Samples
Computation:
• Determine the median. Arrange the data jointly from lowest to highest. The median
is the middle item if there is an odd number of observations, or the average of the
two middle items if there is an even number of observations. The median is 95.5.
• Mark or assign a plus (+) sign to those data above the median, while those at or
below will be marked or assigned a minus (-) sign.
• Count each plus and minus sign under the female and male columns respectively.
The observed frequencies are illustrated and summarized as follows:
$$\chi^2 = \frac{N(ad - bc)^2}{klmn}$$
$$\chi^2 = \frac{22\,[(8)(7) - (4)(3)]^2}{(12)(10)(11)(11)} = 2.933$$
Exercise. Consider the test scores of 25 students in spelling. The students are composed of 15
females and 10 males. The following are their scores:
Female 13 14 17 24 16 15 12 16 22 18 25 23 14 19 20
Male 9 16 12 15 8 9 17 11 24 16
LESSON 18 - A SIGN TEST FOR TWO CORRELATED SAMPLES (FISHER SIGN TEST)
This test is under nonparametric statistics. It is the counterpart of the t-test for
correlated samples under the parametric tests. The Fisher Sign Test compares two correlated
samples and is applicable to data composed of N paired observations. The difference between
each pair of observations is obtained. This test is based on the assumption that half of the
differences between the paired observations will be positive and the other half will be
negative. The formula is:
$$Z = \frac{|D| - 1}{\sqrt{N}}$$
Where:
Z = the Fisher Sign Test
D = the difference between the number of + and – signs.
Example. The pretest and the posttest results before and after the implementation of
the program are presented below:
Pretest x 16 20 30 34 12 10 18 16 11 15
Posttest y 18 24 28 32 12 8 21 14 15 17
Computation:
Taking the sign of each difference (posttest minus pretest) in the data above, there are
5 + signs, 4 – signs, and 1 zero. The zero is omitted, so N = 9.
Thus,
$$Z = \frac{|D| - 1}{\sqrt{N}}$$
$$Z = \frac{|5 - 4| - 1}{\sqrt{9}}$$
$Z = 0$ or
p-value = T.DIST.2T(0, 1000) = 1.000
V. Decision Rule: If Z-computed value >= Z-tabular value, reject Ho or if p-
value<=alpha, reject Ho.
VI. Conclusion:
Inasmuch as the Z-computed value of 0 is less than the Z-tabular value of
1.96 at 0.05 level of significance, the null hypothesis is accepted. This
means that there is no significant difference between the pretest and posttest
results of the 10 students.
Exercise: Perform a Fisher Sign Test for the example in t-test for correlated samples
then compare the results.
This test is under the nonparametric tests. This is a forthright extension of the
median test for two independent samples. The Chi-Square test formula is used for this
test.
Example. A sampling of the acidity of rain for 10 randomly selected rainfalls was
recorded at three different locations in the province of Albay: Guinubatan, Legaspi
City, and Polangui. The pH readings for these 24 rainfalls are shown in the table.
(Note: pH readings range from 0 to 14; 0 is acid, 14 is alkaline. Pure water falling
through clean air has a pH reading of 5.7).
Use the Median test at 0.05 level of significance to test the hypothesis that there is no
significant difference among the pH readings of the 3 different municipalities/cities of
Albay.
IV. Statistics: Sign Test for K Independent Samples (Median Test: Multi-Sample
Case)
Computation:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$\chi^2 = 4.36$ or
p-value = 0.113
V. Decision Rule: If the computed value is greater than the tabular value, reject
the null hypothesis.
VI. Conclusion: Since the computed χ² value of 4.36 is less than the tabular χ² value of
5.991 at 0.05 level of significance with 2 degrees of freedom, the null hypothesis
is accepted. This means that there is no significant difference among the pH
readings of the 3 different municipalities/cities of Albay.
Note: Another way to arrive at this decision and conclusion is to use the p-
value, which is compared to the level of significance used (here, 0.05). You can
obtain this value using MS-Excel:
p-value = CHISQ.TEST(ofr, efr)
Where:
ofr = observed frequency range
efr = expected frequency range
CHISQ.TEST = the name of the function in MS-Excel
When using the p-value, the decision rule is: If the p-value is less than
or equal to the level of significance, reject the null hypothesis.
Since the p-value of 0.113 is greater than the 0.05 level of significance, the null
hypothesis is accepted. As you can see, you arrive at the same decision and
interpretation/conclusion.
This test belongs to the nonparametric statistics, which do not require a normal
distribution of the data. It is a Chi-Square test for situations in which the samples are
matched, that is, not independent. It uses a before-and-after design to test
whether there is a significant change between the before and after situations.
The formula is:
$$\chi^2 = \frac{(b - c)^2}{b + c}$$
Where:
𝜒 2 = 𝐶ℎ𝑖 − 𝑆𝑞𝑢𝑎𝑟𝑒
𝑏 = 𝑡ℎ𝑒 𝑓𝑖𝑟𝑠𝑡 𝑐𝑒𝑙𝑙 𝑜𝑓 𝑡ℎ𝑒 2𝑛𝑑 𝑐𝑜𝑙𝑢𝑚𝑛 𝑖𝑛 𝑎 2𝑥2 𝑡𝑎𝑏𝑙𝑒
𝑐 = 𝑡ℎ𝑒 𝑓𝑖𝑟𝑠𝑡 𝑐𝑒𝑙𝑙 𝑜𝑓 𝑡ℎ𝑒 2𝑛𝑑 𝑟𝑜𝑤 𝑖𝑛 𝑎 2𝑥2 𝑡𝑎𝑏𝑙𝑒
Example. Data on seat belt use before and after involvement in car accidents for a
sample of 120 accident victims.
Wore seat belt regularly      Wore seat belt regularly after the accident
before the accident           Yes          No           Total
Yes                           a = 74       b = 12       86
No                            c = 23       d = 11       34
Total                         97           23           120
I. Problem: Is there a significant difference in the use of seat belt before and after
involvement in the car accident?
II. Hypotheses:
Ho: There is no significant difference in the use of seat belt before and after
involvement in the car accident.
Ha: There is a significant difference in the use of seat belt before and after
involvement in the car accident.
III. Level of Significance:
𝛼 = 0.05
𝑑𝑓 = (𝑐 − 1)(𝑟 − 1) = (2 − 1)(2 − 1) = 1
2
𝜒0.05 = 3.841
IV. Statistics: Mc Nemar’s test for correlated proportion
Computation:
$$\chi^2 = \frac{(b - c)^2}{b + c} = \frac{(12 - 23)^2}{12 + 23} = 3.457$$
V. Decision Rule: If the computed 𝜒 2 is greater than the tabular 𝜒 2 , reject Ho.
VI. Conclusion: Since the computed χ² of 3.457 is less than the tabular χ² of 3.841
at 0.05 level of significance and 1 degree of freedom, the null hypothesis is
accepted. This means that there is no significant difference in the use of seat
belts before and after involvement in the car accident. This implies that the
victims' involvement in the car accident did not change their attitudes towards
wearing seat belts.
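McNemar's statistic depends only on the two cells in which the answer changed (b and c). The following sketch reproduces the computation for the seat belt example; the function name is illustrative, and the p-value line is valid only for df = 1.

```python
import math


def mcnemar_chi_square(b, c):
    """McNemar statistic from the two discordant cells: (b - c)^2 / (b + c)."""
    return (b - c) ** 2 / (b + c)


# b = wore a seat belt before but not after; c = did not before but did after.
chi2 = mcnemar_chi_square(b=12, c=23)
p = math.erfc(math.sqrt(chi2 / 2))   # right-tail p-value, valid for df = 1 only
print(round(chi2, 3), round(p, 3))
```

The p-value is slightly above 0.05, which agrees with the computed value falling just short of the tabular value of 3.841.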
$$F_r = \frac{12}{bk(k + 1)} \sum T_i^2 - 3b(k + 1)$$
Where:
𝐹𝑟 = Friedman test
𝑏 = 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑏𝑙𝑜𝑐𝑘𝑠
𝑘 = 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡𝑠
𝑇𝑖 = 𝑟𝑎𝑛𝑘 𝑠𝑢𝑚 𝑓𝑜𝑟 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡 𝑖
𝑖 = 1,2,3, … , 𝑘
Vaccine
Children 1 2 3 4
1 5.6 2.4 6.8 6.5
2 8.8 9.2 6.7 9.6
3 5.4 2.8 3.6 6.7
4 7.6 9.6 5.2 8.9
5 4.8 7.8 2.7 2.8
6 4.1 8.3 3.4 3.4
Computation:
                                   Vaccines
Children   1: Reaction (cm)  Rank   2: Reaction (cm)  Rank   3: Reaction (cm)  Rank   4: Reaction (cm)  Rank
1          5.6               2      2.4               1      6.8               4      6.5               3
2          8.8               2      9.2               3      6.7               1      9.6               4
3          5.4               3      2.8               1      3.6               2      6.7               4
4          7.6               2      9.6               4      5.2               1      8.9               3
5          4.8               3      7.8               4      2.7               1      2.8               2
6          4.1               3      8.3               4      3.4               1.5    3.4               1.5
Rank Sum                     15                       17                       10.5                     17.5
$$F_r = \frac{12}{bk(k + 1)} \sum T_i^2 - 3b(k + 1)$$
$$F_r = \frac{12}{(6)(4)(4 + 1)} \left( 15^2 + 17^2 + 10.5^2 + 17.5^2 \right) - 3(6)(4 + 1)$$
$$F_r = 3.05$$
V. Decision Rule: If the computed Fr is greater than the tabular Fr, reject Ho.
VI. Conclusion: Inasmuch as the computed Fr value of 3.05 is less than the tabular
Fr value of 7.815 at 0.05 level of significance with 3 degrees of freedom,
the null hypothesis of no significant difference in the reactions of the 6 children to
the 4 different vaccines is accepted.
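The within-child ranking and the value of Fr can be reproduced with the sketch below. Each row of `reactions` is one child (block) and each column is one vaccine (treatment); the function names are illustrative and not part of the module.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the mean rank."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]


def friedman_fr(blocks):
    """Fr = 12 / (b k (k+1)) * sum(T_i^2) - 3 b (k+1), ranking within each block."""
    b, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for i, r in enumerate(average_ranks(row)):
            rank_sums[i] += r
    fr = 12 / (b * k * (k + 1)) * sum(t ** 2 for t in rank_sums) - 3 * b * (k + 1)
    return fr, rank_sums


reactions = [
    [5.6, 2.4, 6.8, 6.5],
    [8.8, 9.2, 6.7, 9.6],
    [5.4, 2.8, 3.6, 6.7],
    [7.6, 9.6, 5.2, 8.9],
    [4.8, 7.8, 2.7, 2.8],
    [4.1, 8.3, 3.4, 3.4],
]
fr, rank_sums = friedman_fr(reactions)
print(rank_sums, round(fr, 2))
```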
Antibiotics
Children 1 2 3 4
1 4.3 8.8 5.4 6.5
2 6.8 9.2 6.7 9.4
3 5.4 8.5 4.3 8.4
4 7.2 9.6 5.2 9.3
5 6.1 9.3 3.9 7.6
Use the Friedman Fr test to assess the children’s reaction on the 4 different antibiotics
at 0.01 level of significance.
Judges
Contestants
A B C D
1 1 2 3 2
2 4 3 2 4
3 3 4 5 3
4 5 5 4 5
5 2 1 1 1
6 8 7 6 7
7 7 8 7 6
8 10 10 9 10
9 9 9 10 9
10 6 6 8 8
Computation:
Contestants   A    B    C    D    Sum of Ranks   D = |𝑅̅ - Sum of Ranks|   D²
1             1    2    3    2    8              14                        196
2             4    3    2    4    13             9                         81
3             3    4    5    3    15             7                         49
4             5    5    4    5    19             3                         9
5             2    1    1    1    5              17                        289
6             8    7    6    7    28             6                         36
7             7    8    7    6    28             6                         36
8             10   10   9    10   39             17                        289
9             9    9    10   9    37             15                        225
10            6    6    8    8    28             6                         36
Total                             220                                      1246
$$\bar{R} = \frac{220}{10} = 22$$
$$\mathcal{W} = \frac{12(1246)}{4^2 (10)(10^2 - 1)} = 0.944$$
V. Decision Rule: If the computed 𝒲 is greater than the tabular 𝒲, reject Ho.
VI. Conclusion: The computed 𝒲 of 0.944 being greater than the tabular 𝒲 value
of 0.44 at 0.05 level of significance for 4 judges and 10 contestants, the null
hypothesis is rejected. This means that there is a significant agreement or
concordance in the rankings of the 4 judges to the 10 contestants.
Students
Judges a b c d e
A 2 2 4 3 5
B 1 1 3 4 5
C 2 3 2 5 4
Module 1 - Additional Exercises
1.1 The performance ratings of 50 policemen in the capital town of NCR are shown
below.
Classes Frequency
70-74 5
75-79 11
80-84 13
85-89 15
90-94 6
1.3 In a PE class, 50 students were made to run in a 100-meter dash. The following
results are presented in a frequency distribution.
Seconds Frequency
10-11 2
12-13 8
14-15 14
16-17 16
18-19 8
20-21 2
6,000 7,000
8,000 7,600
9,000 10,500
8,500 9,500
8,800 9,000
9,200 10,400
9,500 10,500
11,300 12,000
8,600 9,500
7,200 8,700
9,400 10,500
10,500 10,000
11,600 12,500
12,000 13,500
15,000 15,800
Use the t-test for correlated samples at .05 level of significance to test whether there
were significant changes in the income of the jeepney drivers after the implementation
of the program.
2.2 The data below represent the number of hours of pain relief provided by two brands
of headache syrups administered to 20 individuals. These individuals were randomly
divided into two groups and each group was rated with a different brand.
Brand X Brand Y
6 6
8 7
9 5
4 3
3 4
7 6
8 5
6 3
5 6
8
6
Use the t-test at 0.05 level of significance to test the null hypothesis that there is no
significant difference between the average number of hours of pain relief provided by
the two brands of headache syrups.
2.3 The following data show the weight losses (in mg) of certain machine parts due to
friction using two different lubricants.
Lubricant 𝐴 Lubricant 𝐵
10 7
13 6
12 6
11 5
13 7
14 10
7 9
10 4
8 11
11 6
12
13
Test at 0.01 level of significance whether the differences between the two-sample
means are significant.
2.3 The table shows the number of errors made on 10 occasions by two compositors
on their technical report. Is there a significant difference in the number of errors made
in general by the two compositors? Use the t-test for independent samples at 0.05
level of significance.
Compositor 1 Compositor 2
11 14
12 11
10 13
12 9
11 11
14 13
13 11
11 12
14 11
9 9
2.4 Ten samples were given an attitude test on a controversial issue. They were shown
a favorable film regarding the issue and the same attitude test was administered
before and after. Make a directional test at 0.05 level of significance.
Pretest Posttest
15 19
19 21
15 24
23 27
21 22
24 28
23 24
17 23
16 19
14 15
$n_1 = 100 \quad \bar{x}_1 = 64.2 \quad SD_1 = 2.48$
$n_2 = 120 \quad \bar{x}_2 = 63.8 \quad SD_2 = 2.53$
2.8 A study was made to check whether the male average income is higher than the
female average income. Use Z-test at 0.05 level of significance.
2.9 The table below shows the number of minutes that patients had to wait for their
appointment with 5 doctors.
Doctor
A B C D E
21 10 19 10 20
19 12 18 12 31
22 16 17 29 25
31 13 16 30 27
29 19 21 16 26
Use the F-test at 0.05 level of significance to test if there are significant differences
among the means of the samples.
2.10 The following are the data on homicide cases committed from January to
December in 3 cities in the National Capital Region:
City
January A B C
February 4 5 6
March 5 7 5
April 3 6 5
May 6 7 7
June 4 3 9
July 7 2 8
August 8 5 10
September 5 3 5
November 3 7 6
December 2 3 8
Perform the one-way analysis of variance to test the null hypothesis that the average
homicide cases in the 3 cities are equal at 𝛼 = 0.05 level of significance.
2.11 A research study was conducted on 4 groups of students. The following number
of correct responses were recorded out of 10 trials.
Apply the analysis of variance to find out if the groups differed significantly in
their performance. Use 𝛼 = 0.05.
Group
Trial A B C D
1 8 7 3 2
2 6 8 4 7
3 9 9 6 9
4 10 10 5 8
5 11 9 6 7
6 10 12 6 8
7 8 14 3 6
8 10 10 2 4
9 6 14 4 6
10 8 9 3 2
2.12 The following are the test scores of 20 students under two teachers and two kinds
of modules used in teaching English. Apply two-way-ANOVA, at 0.05 level of
significance.
TEACHERS
Textbooks A B Total
5 9
6 8
A 3 3
2 8
4 9
Sub Total
6 7
9 8
B 3 8
2 9
1 5
Total
2.13 Two sets of attitudinal scales were administered to two groups of students from
private and public schools. Perform the two-way ANOVA at 0.05 level of significance
to determine whether there is a significant difference between the students coming
from the two types of schools and between the students given Set A and Set B of the
attitudinal scale.
SCHOOLS
22 16
18 17
Set A 9 6
10 8
8 5
8 19
Sub Total
23 24
19 25
Set B 21 19
17 18
19 17
11 11
Total
2.14 The table below shows the percentage of the votes predicted by a poll survey A
for 10 candidates for the senate on different cities and the percentage of the votes
which they actually received B.
42 48
47 50
32 37
38 38
26 25
35 34
39 47
51 53
19 31
25 27
Use the Pearson Product Moment Correlation Coefficient at 0.05 level of significance
to determine if there is a significant relationship between the poll survey and the actual
votes received.
2.15 The following are the scores of 10 students in the final examination in
mathematics of investment (MOI) and accounting.
81 84
64 69
65 66
74 64
75 77
76 81
77 76
80 84
83 96
88 84
Find the value of r and interpret the result at 0.05 level of significance.
2.16 A study was made by the SM Bicutan Hypermarket to determine the relationship
between weekly sales and advertising expenditures. The following data were
recorded. Use 𝑟 at 0.05 level of significance.
Advertising Cost Sales
(in thousand pesos) (in thousand pesos)
5.0 37.5
2.5 40.2
3.0 38.4
2.5 35.1
5.5 45.6
6.0 56.7
2.0 23.5
3.5 42.5
2.5 43.8
3.0 46.3
2.17 The following midterm grade, average grade in quizzes and final grade of 8
college students in Civil Engineering were obtained by a certain professor in TCU.
𝑦 = 𝑏0 + 𝑏1 𝑥1 + 𝑏2 𝑥2
b. Predict the final grade of a student whose midterm grade is 1.5 and whose average
grade in the quizzes is 1.75.
2.18 The table below displays the area of the lot, the number of bedrooms, and the
prices at which 10 one-family cottages sold at DMCI subdivision:
250 4 1.7
200 3 1.5
160 2 0.9
180 2 1.2
240 4 1.8
320 5 2.6
270 3 1.8
300 4 2.2
250 3 1.6
220 3 1.4
Determine a linear equation which will enable us to predict the average sale price of
one - family cottage in terms of the area in square meters and the number of
bedrooms.
3.1 In 120 tosses of a coin, 64 heads and 56 tails are recorded. Is this a balanced
coin? Use 𝜒 2 − 𝑡𝑒𝑠𝑡 at 0.05 level of significance.
3.2 The grades in Mathematics in the Modern World for a particular semester are as
follows:
Grades Observed
1.25 15
1.50 19
1.75 32
2.00 22
2.25 17
Use the 𝜒 2 − 𝑡𝑒𝑠𝑡 at 0.05 and test the hypothesis that the distribution of grades is
uniform.
3.3 A random sample of 270 voters classified according to the political affiliation were
asked if they were in favor of the ongoing peace negotiation in selected cities/towns in
Mindanao.
NP 38 52 90
LP 53 37 90
PDP-LABAN 54 36 90
3.4 A random sample of 60 adults are grouped according to sex and their opinion
regarding the ceasefire between the government and the CPP-NPA at Christmas.
Ceasefire
Male 19 11 30
Female 21 9 30
Total 40 20 60
3.5 A random sample of 400 adults are classified according to their age bracket and
drinking habits.
Nondrinkers 28 75 19
Moderate drinkers 33 74 38
Heavy Drinkers 53 63 17
Test the hypothesis that age bracket is independent of drinking habit. Use the χ²-test at
0.05 level of significance.
3.6 The following data were taken from 200 individuals in a study to determine
whether lung cancer depends on smoking habits.
With Cancer 26 48 46
No Cancer 49 21 10
Total 75 69 56
Test the hypothesis that lung cancer is independent of smoking habits. Use 0.05
level of significance.
3.7 From 18 students of Math class, 9 students are selected at random and given
additional instruction by the teacher. The rest of the students were given no additional
instruction. The results on the final examination were as follows:
With Additional 86 91 85 82 85 86 80 84 86
Instruction
No Additional 76 80 80 84 73 82 84 78 82
Instruction
Use the Wilcoxon rank-sum test at 0.05 level of significance to test whether the
additional instruction affects the average grade.
3.8 Electric cables are being manufactured by two companies. To determine if there
is a difference in the mean breaking strength of the cables, 8 pieces from each
company are selected at random and tested for breaking strength. The results are:
Use the Wilcoxon rank-sum test at 0.05 level of significance to test whether there is a
difference in the mean breaking strength of the cables manufactured by the two companies.
3.9 Random sample of 3 brands of cigarettes were tested for nicotine content. The
following figures show the milligrams of nicotine found in the 15 cigarettes tested.
Brand
A B C
16 18 12
15 17 11
14 19 10
17 20 9
13 21 11
Use the Kruskal-Wallis test, at the 0.05 level of significance, to test whether there is a
significant difference in nicotine content among the 3 brands of cigarettes.
3.10 The following data represent the operating time in hours for 4 Brands of midrange
cellphones before a recharge is required.
Brand
Use the Kruskal-Wallis test, at the 0.01 level of significance to test the hypothesis that
the operating times for all 4 brands of cellphones are equal.
3.11 Two judges of a city fiesta parade in NCR ranked 10 floats in the following order:
Judge A Judge B
8 5
5 7
10 9
9 5
3 2
4 4
6 8
7 6
2 1
1 3
3.12 Two groups of experimental dogs are given two brands of vitamins and the
following weight gains in grams were obtained.
Group 1 250 206 106 280 212 106 112 142 168 185
Group 2 67 75 138 180 193 48 78 82 67 88
Apply the sign test to determine if the two samples come from population with the
same median.
3.13 The following are the scores of 10 male and 10 female students in psychological
laboratory examination.
Male 22 83 51 67 74 79 46 20 37 69
Female 46 94 62 75 56 42 84 92 49 96
Test the significant difference between the scores of the male and female students
using sign test.
3.14 The following are the weights of 15 women enrolled in a 10-week slimming
program. Their weights were taken before and after the program.
Before After
120 120
130 125
140 115
125 126
140 145
115 100
135 120
146 126
141 134
130 129
126 124
114 112
115 106
120 112
136 121
Use the sign test at 0.05 level of significance to test if there is a significant difference
in the weights of women enrolled in the programs before and after.
3.15 The following are the scores obtained by the groups of 6 subjects each given with
3 different methods of teaching in Algebra.
38 40 70
37 38 40
30 50 70
40 46 69
36 32 50
34 48 39
3.16 The following data were obtained before and after a televised debate on charter
change for a sample of 60 registered voters.
After the Debate
Yes No Total
Yes 20 15 35
Before the
Debate No 12 13 25
Total 32 28 60
3.17 The following data were recorded for 6 subjects exposed to 4 different treatments.
Treatments
Subjects T1 T2 T3 T4
1 10 10 4 6
2 9 5 6 3
3 8 7 9 10
4 10 9 8 7
5 5 6 3 4
6 9 9 8 7
3.18 The following data were obtained for 5 subjects repeatedly measured under 3
different conditions.
Condition
Subject 1 2 3
1 11 14 13
2 29 34 41
3 22 30 18
4 3 5 17
5 13 11 16
3.19 Three judges ranked 5 contestants in a beauty contest. The following were
obtained:
Students
Judge A B C D E
X 2 3 1 4 5
Y 2 1 4 3 5
Z 1 3 2 5 4
Works Cited
(n.d.). Retrieved from https://www.sciencedirect.com>topics
Bayanito, M. R. (2015). The Contemporary World. Taguig City: National Book Store.
Bibliography
(n.d.). Retrieved from https://www.sciencedirect.com>topics
Broto, A. S. (2006). Statistics Made Simple. Mandaluyong City: National Book Store.
Garcia, G. A. (2003). Fundamental Concepts and Methods in STATISTICS (Part 1). Manila, Philippines:
University of Santo Tomas Publishing House.