Chapter 4

The document discusses hypothesis testing, including defining key terms like the null and alternative hypotheses, test statistics, critical values, p-values, and types of hypothesis tests. It also provides examples of hypothesis tests on means and variances from sample data.

STA404

STATISTICS FOR BUSINESS AND SOCIAL SCIENCES

CHAPTER 4: HYPOTHESIS TESTING
HYPOTHESIS TESTING: DEFINITIONS

Hypothesis testing: Involves making a decision about the value of a parameter based on some preconceived idea of what its value might be.

Statistical hypothesis: An assumption about a population parameter. This conjecture may or may not be true.

Hypothesis: A statement about the value of a population parameter.

Null hypothesis, H0: A claim (or statement) about a population parameter that is assumed to be true until it is declared false. It is the statistical hypothesis stating that there is no difference between a parameter and a specific value, or that there is no difference between two parameters:
    H0: μ = μ0
where μ is the parameter and μ0 is a single number.

Alternative hypothesis, H1: The hypothesis opposite to H0; it will be accepted if H0 is rejected. It is also known as the research hypothesis. It is the statistical hypothesis stating that there is a difference between a parameter and a specific value, or that there is a difference between two parameters. The alternative hypothesis can take two forms:
    a) Non-directional
       H1: μ ≠ μ0 (two-tailed test)
    b) Directional
       H1: μ > μ0 (right-tailed test)
       H1: μ < μ0 (left-tailed test)

Significance level: The probability of rejecting the null hypothesis when it is true. This level is represented by the symbol α (alpha). The confidence level that corresponds to α is 1 − α. Thus, a confidence level of 95% corresponds to α = 5%.

Test statistic: A single number calculated from the sample data as the basis for deciding whether or not to reject the null hypothesis. The formula for the test statistic depends on the type of test, which can be chosen as follows:

Choosing the test statistic:
Am I testing population means or variances?
- MEANS: 1 or 2 sample groups?
    1 sample: Is the variance known or n ≥ 30? YES / NO
    2 samples: Is the variance known or n ≥ 30? YES / NO
- VARIANCE: 1 or 2 sample groups?
    1 sample / 2 samples
Critical value: The value of the test statistic that separates the non-rejection region (NRR) from the rejection region (RR).
- Rejection region: A region in which the null hypothesis, H0, will be rejected if the value of the test statistic falls in it.
- Non-rejection region: A region in which the null hypothesis, H0, will not be rejected if the value of the test statistic falls in it.

a) Right-tailed test: the rejection region is in the right tail of the distribution.

b) Left-tailed test: the rejection region is in the left tail of the distribution.

c) Two-tailed test: the rejection region is in both tails of the distribution.
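The three rejection regions can be illustrated with a short Python sketch for a z test statistic (standard library only, Python 3.8+ for `statistics.NormalDist`). The function names and the "right"/"left"/"two" labels are hypothetical; for a t test the critical values would instead come from a t table with the appropriate degrees of freedom.

```python
from statistics import NormalDist

def z_critical_values(alpha, tail):
    """Critical value(s) of the standard normal distribution.

    tail is "right", "left" or "two" (labels chosen for this sketch).
    """
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "right":
        return (z.inv_cdf(1 - alpha),)
    if tail == "left":
        return (z.inv_cdf(alpha),)
    if tail == "two":
        c = z.inv_cdf(1 - alpha / 2)
        return (-c, c)
    raise ValueError("tail must be 'right', 'left' or 'two'")

def in_rejection_region(stat, alpha, tail):
    """Return True if the test statistic falls in the rejection region (RR)."""
    crit = z_critical_values(alpha, tail)
    if tail == "right":
        return stat > crit[0]
    if tail == "left":
        return stat < crit[0]
    return stat < crit[0] or stat > crit[1]

print(z_critical_values(0.05, "two"))            # approximately (-1.96, 1.96)
print(in_rejection_region(2.3, 0.05, "right"))   # True, since 2.3 > 1.645
```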
ERROR IN MAKING DECISION

THE USE OF P-VALUE

0.10 – we have some evidence that H0 is not true
HYPOTHESIS TESTING

CRITICAL VALUE APPROACH (find the critical value from a statistical table)
1. State the null and alternative hypotheses.
2. State the level of significance, α.
3. Determine and compute the appropriate test statistic. Find the critical value from the table.
4. Specify the decision rule.
5. Make the decision.
6. State the conclusion.

P-VALUE METHOD (use the p-value from the Minitab output)
1. State the null and alternative hypotheses.
2. State the level of significance, α.
3. State the p-value.
4. Specify the decision rule.
5. Make the decision.
6. State the conclusion.
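As a minimal sketch of how the two approaches reach the same decision, the Python code below (standard library only; function names are hypothetical) computes a z statistic and its p-value for Exercise 1, Question 3, where σ = 6.8 mm is given and n = 100. In practice the p-value would be read from the Minitab or SPSS output rather than computed by hand.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n):
    """z statistic for H0: mu = mu0 (sigma known or n >= 30)."""
    return (xbar - mu0) / (sigma / sqrt(n))

def z_p_value(z, tail):
    """p-value of a z statistic for a "left", "right" or "two" tailed test."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1 - cdf(z)
    return 2 * (1 - cdf(abs(z)))

# Exercise 1, Question 3: H0: mu = 38 vs H1: mu != 38, alpha = 0.05
alpha = 0.05
z = one_sample_z(xbar=39.4, mu0=38, sigma=6.8, n=100)

# Critical value approach: compare |z| with z(alpha/2)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_by_critical_value = abs(z) > z_crit

# p-value approach: reject H0 if p-value <= alpha
p = z_p_value(z, "two")
reject_by_p_value = p <= alpha

print(f"z = {z:.3f}, p-value = {p:.4f}")
print(reject_by_critical_value, reject_by_p_value)   # both True
```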


HYPOTHESIS TESTING FOR A SINGLE MEAN

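Assumptions:
1. The population is normally distributed.
2. The sample observations are independent.

STEP 1: State the null and alternative hypotheses.

STEP 2: Determine the significance level, α (given in the question).

STEP 3: Compute the test statistic.

VARIANCE KNOWN OR n ≥ 30:
$z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ (use s in place of σ when σ is unknown but n ≥ 30)

VARIANCE UNKNOWN AND n < 30:
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, with df = n − 1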

STEP 4: Find critical value from table

STEP 5: Specify the decision rule

STEP 6: Make a decision and conclusion


EXAMPLE 1
The thickness of the smartphones produced by Companies I and II is recorded. Assume that the distribution of the data is normal. The following table summarizes the sample mean (mm), sample standard deviation (mm) and sample size of the data.

                          Company I   Company II
Mean (mm)                 10.2        9.2
Standard deviation (mm)   1.1         0.9
Sample size               16          13

Test at the 1% level of significance whether the mean thickness of smartphones produced by Company II is less than 10 mm.
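Steps 3 to 5 for Example 1 can be sketched in Python as follows, assuming the left-tailed alternative H1: μ < 10 for Company II and a t test (σ unknown, n = 13 < 30). The critical value t(0.01, 12) = 2.681 is taken from a standard t table; the helper name `one_sample_t` is hypothetical.

```python
from math import sqrt

def one_sample_t(xbar, mu0, s, n):
    """t statistic for H0: mu = mu0 when sigma is unknown and n < 30.

    Returns (t, df) with df = n - 1.
    """
    t = (xbar - mu0) / (s / sqrt(n))
    return t, n - 1

# Example 1, Company II: H0: mu = 10 vs H1: mu < 10 (left-tailed), alpha = 0.01
t, df = one_sample_t(xbar=9.2, mu0=10, s=0.9, n=13)

# Critical value t(0.01, 12) = 2.681 from a t table; left tail, so use -2.681
t_crit = -2.681
reject_h0 = t < t_crit
print(f"t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```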
EXAMPLE 2
The starting salary (in RM) of engineering graduates with bachelor's and master's degrees is compared. The following table summarizes the results. Assume that the salary is normally distributed for both populations.

           Sample size   Mean   Variance
Bachelor   23            2300   225
Master     19            2450   248

A report claims that the mean starting salary of engineering graduates with a bachelor's degree is more than RM2250. Test at the 5% level of significance whether the data support the claim.
EXAMPLE 3
The voltage readings of two types of altimeters are recorded for a period of 13 days. The means and variances of the readings are:

                  Type 1   Type 2
Sample mean       40.33    42.54
Sample variance   1.54     2.96

Assume that the voltage readings are normally distributed for both types of altimeters. A researcher claims that the variance of the voltage reading of the Type 2 altimeter is different from 4. Test whether the claim is correct at the 5% significance level.
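For a test on a single variance, the sketch below assumes the standard chi-square statistic χ² = (n − 1)s²/σ0² with df = n − 1, applied to the Type 2 altimeter in Example 3 (H0: σ² = 4 against H1: σ² ≠ 4). The two critical values for α = 0.05 and df = 12 are taken from a chi-square table, and the function name is hypothetical.

```python
def chi_square_variance_stat(s2, sigma0_sq, n):
    """Chi-square statistic for H0: sigma^2 = sigma0^2; returns (chi2, df)."""
    return (n - 1) * s2 / sigma0_sq, n - 1

# Example 3, Type 2 altimeter: n = 13 days, s^2 = 2.96, H0: sigma^2 = 4
chi2, df = chi_square_variance_stat(s2=2.96, sigma0_sq=4, n=13)

# Two-tailed test at alpha = 0.05, df = 12: critical values from a chi-square table
lower_crit, upper_crit = 4.404, 23.337
reject_h0 = chi2 < lower_crit or chi2 > upper_crit
print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject_h0}")
```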
EXAMPLE 4
Environmental testing is one way of testing a component under conditions that closely simulate the environment in which the component will be used. An electrical component is to be used in two different locations in Country A. Before environmental testing can be conducted, it is necessary to determine the soil composition in these localities. The following data were obtained on the percentage of SiO2 by weight of the soil.

                    Soil Type 1   Soil Type 2
Number of samples   13            11
Sample mean         64.94         57.06
Sample variance     9             7.29

Assume the data are normally distributed for both populations. A researcher claims that the percentage of SiO2 in Soil Type 1 is 64. Do the data contradict the researcher’s claim? Test at the 1% significance level.
EXAMPLE 5
A set of facilitation tools to help with data analysis for problem solving is being developed by a group of statisticians at UiTM. To test the effectiveness of these tools, a group of research officers was asked to analyze a set of data on the computer and produce a built-in report. Twelve equally capable research officers were randomly selected, and six were randomly assigned to complete the task using a standard procedure. The other six were asked to do the task using the developed facilitation tools. The response measured was the time to completion (in minutes). The data collected are shown below.

Group 1 (standard procedure)   61  69  68  74  58  63
Group 2 (facilitation tool)    32  42  40  34  38  33

Assume that the population distributions are normal. The statisticians claim that the average time to complete the task using the developed facilitation tools is significantly less than 50 minutes. Do they have sufficient evidence to support this claim at the 5% level of significance?
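When raw data are given, as in Example 5, the sample mean and standard deviation are computed first. A minimal Python sketch for Group 2, assuming a left-tailed one-sample t test of H0: μ = 50 against H1: μ < 50, with the critical value t(0.05, 5) = 2.015 taken from a t table:

```python
from math import sqrt
from statistics import mean, stdev

# Example 5, Group 2 (facilitation tool): completion times in minutes
group2 = [32, 42, 40, 34, 38, 33]
mu0 = 50  # H0: mu = 50 vs H1: mu < 50 (left-tailed)

n = len(group2)
xbar = mean(group2)
s = stdev(group2)  # sample standard deviation (divisor n - 1)
t = (xbar - mu0) / (s / sqrt(n))
df = n - 1

# Critical value t(0.05, 5) = 2.015 from a t table; left tail, so compare with -2.015
reject_h0 = t < -2.015
print(f"mean = {xbar:.2f}, s = {s:.4f}, t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```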
SPSS OUTPUT
EXAMPLE 6
A manufacturer claims that the average capacity of a certain type of battery is 140 ampere-hours. An agency wishes to test the credibility of the manufacturer’s claim and measures the capacity of ten randomly selected batteries from a current production batch. The results, in ampere-hours, are as follows:

One-Sample Statistics
                   N    Mean      Std. Deviation   Std. Error Mean
Battery_capacity   10   139.290   1.6360           .5174

a) Construct a 95% confidence interval for the mean capacity of a battery.
b) Based on the interval obtained, can you conclude that the capacity of the battery is different from 140 ampere-hours?
c) Specify the null and alternative hypotheses for testing whether the capacity of the battery is different from 140 ampere-hours. Perform a test at the 5% significance level.
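Parts (a) and (b) of Example 6 rely on the link between a two-tailed test at level α and a (1 − α) confidence interval: H0: μ = 140 is rejected at α = 0.05 exactly when 140 lies outside the 95% interval. A sketch using the SPSS summary values, with t(0.025, 9) = 2.262 taken from a t table:

```python
# Example 6: values from the One-Sample Statistics table
n = 10
xbar = 139.290
se = 0.5174       # Std. Error Mean = s / sqrt(n) = 1.6360 / sqrt(10)
t_crit = 2.262    # t(0.025, 9) from a t table, for a 95% interval

lower = xbar - t_crit * se
upper = xbar + t_crit * se
print(f"95% CI: ({lower:.3f}, {upper:.3f})")

# Part (b): H0: mu = 140 is not rejected at alpha = 0.05 if 140 lies inside the interval
print("140 inside CI:", lower <= 140 <= upper)
```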
EXAMPLE 7
The quality department of an industry requires that the steel bolts produced have a mean thickness of more than 9 mm. A random sample of 10 steel bolts was measured and the thicknesses are as follows (in mm):

One-Sample Statistics
              N    Mean    Std. Deviation   Std. Error Mean
Thicknesses   10   9.990   .3348            .1059

a) Construct a 95% confidence interval for the mean thickness of steel bolts. Does the interval indicate that the mean thickness of steel bolts is more than 9 mm?
b) Hence, perform a test of hypotheses at the 5% significance level to test whether the industry is able to produce steel bolts with a mean thickness of more than 9 mm.
EXAMPLE 8
A food researcher claimed that the percentage starch content of a certain type of potato is normally distributed with mean μ equal to 20%. To assess the mean value of the starch content, a random sample of twelve potatoes is selected and their starch content is measured. The percentages of starch content obtained were as follows:

One-Sample Statistics
             N    Mean     Std. Deviation   Std. Error Mean
Percentage   12   20.675   1.5130           .4368

One-Sample Test (Test Value = 20)
             t       df   Sig. (2-tailed)   Mean Difference   98% CI of the Difference
Percentage   1.545   11   .151              .6750             (-.512, 1.862)

Based on the p-value in the computer output, is there any evidence that the average percentage starch content of potatoes is significantly different from 20 percent? Use α = 0.01.
EXAMPLE 9
A gas company claimed that the length of time between a gas leak being reported and an engineer arriving to investigate the report has a mean of 113 minutes. An analysis has been conducted and the results are reported in the following tables.

One-Sample Statistics
       N   Mean     Std. Deviation   Std. Error Mean
Time   8   130.25   50.230           17.759

One-Sample Test (Test Value = 113)
       t      df   Sig. (2-tailed)   Mean Difference   95% CI of the Difference
Time   .971   7    .364              17.250            (-24.74, 59.24)

Based on the above tables, answer the following questions.
a) Show that the standard error of the mean is 17.759.
b) Obtain a 95% confidence interval for the mean difference (μ − 113) and show that the interval is between -24.74 and 59.24.
c) State the null and alternative hypotheses for the above study.
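The SPSS values in Example 9 can be checked with a few lines of Python. This sketch uses t(0.025, 7) = 2.365 from a t table; SPSS uses the unrounded critical value (about 2.3646), so the interval limits printed here may differ from the output in the second decimal place.

```python
from math import sqrt

# Example 9: values from the One-Sample Statistics and One-Sample Test tables
n, xbar, s, mu0 = 8, 130.25, 50.230, 113

se = s / sqrt(n)                  # (a) standard error of the mean
mean_diff = xbar - mu0            # mean difference, 17.25
t = mean_diff / se                # test statistic reported by SPSS
t_crit = 2.365                    # t(0.025, 7) from a t table
lower = mean_diff - t_crit * se   # (b) 95% CI for mu - 113
upper = mean_diff + t_crit * se
print(f"SE = {se:.3f}, t = {t:.3f}, CI = ({lower:.2f}, {upper:.2f})")
```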
EXERCISE 1
1. Steel rods from Companies A and B are randomly selected and the length of the rods is measured. A sample of size nine from Company A gives a sample mean of 145 cm and an estimated standard deviation of 9 cm. Meanwhile, a sample of size 13 from Company B gives a sample mean of 152 cm and an estimated standard deviation of 11 cm. The length of steel rods is approximately normally distributed. An officer from Company A claims that the mean length of steel rods produced by Company A is 140 cm. Do the data contradict the officer’s claim? Test at the 1% significance level.
2. A survey made by the Human Resource Ministry states that the average monthly salary of an executive is RM4,100 with a standard deviation of RM680. However, a sample of 25 executives selected recently gives an average monthly salary of RM3,850. Assuming that the monthly salary of an executive is normally distributed, test at the 1% significance level whether the ministry’s figure is too high.
3. The length of a particular type of iron nail produced by a manufacturer has a standard deviation of 6.8 mm. The target length for an iron nail is 38 mm. A supervisor takes length measurements of a random sample of 100 nails and obtains a sample mean length of 39.4 mm. Test whether the mean length is on target. Use the 5% significance level.
4. According to an article published 2 years ago, the mean number of cigarettes smoked per day by adults who were daily smokers was 11.6. To determine whether adults who are daily smokers nowadays smoke less than the general population of daily smokers in the past, a random sample of 100 adults who are currently daily smokers is selected and the number of cigarettes smoked on a randomly selected day is recorded. The data give a sample mean of 10.8 cigarettes and a standard deviation of 3.9 cigarettes. Perform, at the 5% significance level, a test to determine whether current adults who are daily smokers smoke less than the general population of daily smokers two years ago.
5. It is claimed that an automobile is driven on average more than 20,000 km per year. To test this claim, a random sample of 100 automobile owners is asked to keep a record of the kilometers they travel. Would you agree with this claim if the random sample showed an average of 23,500 km and a standard deviation of 3,900 km? (Use α = 0.05.)
6. A research company claims that Malaysian travelers spend an average of RM 4,250 a year on tours. A sample of 200 Malaysians who travel produced mean travel expenses of RM 4,165, with a population standard deviation of RM500. Test at the 1% level of significance whether the mean is less than that claimed by the research company.
7. The policy of a particular bank branch is that its ATMs must be stocked with enough cash to satisfy customers
making withdrawals over an entire weekend. At this branch the population average amount of money withdrawn
from ATM machines per customer transaction over the weekend is RM160 with a standard deviation of RM30.
Suppose that a random sample of 36 customer transactions is examined and it is observed that the sample mean
withdrawal is RM172. At 5% significance level, test whether the ATMs are not stocked with enough cash.
8. The TIV Telephone Company provides long-distance telephone service in an area. According to the company’s records, the average length of all long-distance calls placed through this company in 2015 was 12.44 minutes. The company’s management wanted to check if the mean length of the current long-distance calls is different from 12.44 minutes. A sample of 150 such calls placed through this company produced a mean length of 13.71 minutes with a standard deviation of 2.65 minutes. Using the 5% significance level, can you conclude that the mean length of all current long-distance calls is different from 12.44 minutes?
9. According to a salary survey, the average salary offered to computer science majors who graduated in May 2015 was RM 5,035. Suppose this result is true for all computer science majors who graduated in May 2015. A random sample of 200 computer science majors who graduated this year showed that they were offered a mean salary of RM 5,175 with a standard deviation of RM 5,240. Using the 1% significance level, can you conclude that the mean salary of this year’s computer science graduates is higher than RM 5,035?
10. The director of admissions at a small college advises parents of incoming students that textbooks cost not more than RM 300 during a typical semester. A sample of 20 students enrolled in the college indicates a sample average cost of RM 315.40 with a sample standard deviation of RM 43.20. Using the 0.10 level of significance, is there enough evidence that the population average is above RM 300?
11. The average number of bookings per month received by a travel agency last year was 132. Azlan, who works at the travel agency, believes that the number of bookings for this year has dropped. He collected data on the number of bookings for the last 7 months as follows:

126 129 135 115 142 109 111

Test at the 5% level of significance whether there is any evidence to support Azlan’s belief.
TUTORIAL 1
1. At Canon Food Corporation, it took an average of 50 minutes for new workers to learn a food processing job. Recently the company installed a new food processing machine. The supervisor at the company wants to find out if the mean time taken by new workers to learn the food processing procedure on this new machine is different from 50 minutes. A sample of 40 workers showed that it took them, on average, 47 minutes to learn the food processing procedure on the new machine, with a standard deviation of 7 minutes. At α = 0.01, test whether the mean learning time for the food processing procedure on the new machine is different from 50 minutes.
2. A study claims that all adults spend an average of 14 hours or more on chores during a weekend. A researcher wanted to check if this claim is true. A random sample of 200 adults taken by this researcher showed that these adults spend an average of 13.75 hours on chores during a weekend, with a standard deviation of 3 hours. Test whether all adults spend less than 14 hours, on average, on chores during a weekend at the 5% significance level.
3. The mayor of a large city claims that the average net worth of families living in this city is at least RM3,000. A random sample of 100 families selected from this city produced a mean net worth of RM2,880 with a standard deviation of RM800. Using the 2.5% significance level, can you conclude that the mayor’s claim is false?
4. According to a report, there were 8.1 million unemployed people aged 18 years and over in August 2015. The average duration of unemployment for these people was 16.3 weeks. Suppose that a recent random sample of 400 unemployed people aged 18 years and over gave a mean duration of unemployment of 16.9 weeks with a standard deviation of 4.2 weeks. Test at the 2% significance level whether the current mean duration of unemployment exceeds 16.3 weeks.
5. According to an estimate, the average age of motorcycle owners was 38.1 years in 2014. A recent random sample of 700 motorcycle owners yielded a mean age of 37 years with a standard deviation of 8 years. Testing at the 1% significance level, can you conclude that the current mean age of motorcycle owners is less than 38.1 years?
HYPOTHESIS TESTING FOR A DIFFERENCE BETWEEN TWO MEANS
(INDEPENDENT SAMPLE)

An independent-samples t-test compares the means of two independent groups drawn from normally distributed populations.
a) The null hypothesis is that the two means are the same (μ1 = μ2).
b) A low p-value, indicating a sufficiently large difference between the groups, suggests that you should reject the null hypothesis and conclude that the two groups are significantly different.

STEP 1: State the null and alternative hypotheses, H0: μ1 = μ2 against one of:
Left-tailed: H1: μ1 < μ2
Right-tailed: H1: μ1 > μ2
Two-tailed: H1: μ1 ≠ μ2

STEP 2: Determine the significance level, α (given in the question).

STEP 3: Compute the test statistic.

VARIANCE KNOWN OR n ≥ 30        VARIANCE UNKNOWN AND n < 30

STEP 4: Find critical value from table

STEP 5: Specify the decision rule

STEP 6: Make a decision and conclusion
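For reference, Step 3 can use the following standard two-sample statistics (a sketch; the pooled and unpooled forms are assumed to correspond to the "Equal variances assumed" and "Equal variances not assumed" rows of the SPSS outputs in the examples below). The symbols s_p and ν are introduced here for the pooled standard deviation and the Welch degrees of freedom.

```latex
% Variances known or n1, n2 >= 30 (z test)
z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}

% Variances unknown but assumed equal (pooled t test), df = n1 + n2 - 2
t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}

% Variances unknown and not assumed equal (Welch t test), df = nu
t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```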


EXAMPLE 10
The table below lists the prices (in RM) of set lunch in two different restaurants.

Restaurant A 15 17 15 18 16 18
Restaurant B 12 14 13 15 15 19 13

a) Construct a 90% confidence interval for the ratio between the two variances. Explain whether the variances are
equal.
b) Using the result in part (a), determine if there are differences in the mean prices of set lunch in Restaurant A and
Restaurant B. Test at 10% level of significance.
EXAMPLE 11
The weights in kilograms (kg) of a sample of bricks produced by company A and company B are shown below.

Company A   3.5  3.3  3.7  3.0  3.3  3.1
Company B   3.2  3.3  3.0  2.8  3.1  3.0

a) Construct a 95% confidence interval for the ratio between the two variances. Justify whether the variances are equal.
b) Using the result in (a) test whether there are differences in the mean of the weight of bricks produced by company
A and company B. Use 10% level of significance.
EXAMPLE 12
To evaluate the performance of inspectors in a new company, 13 novice inspectors were chosen to evaluate 200
finished products. The same 200 products were evaluated by 13 experienced inspectors. The table below lists the
number of inspection errors made by each inspector.

Novice inspectors 30 45 35 31 26 33 40 29 36 21 20 48 41
Experienced inspectors 31 19 15 18 25 24 19 10 28 20 17 21 20

a) Construct a 90% confidence interval for the ratio between the two variances. Hence, explain whether the variances
are equal.
b) Based on the result in (a), can we conclude there are differences in the mean of the number of inspection errors for
both groups of inspectors? Use α = 0.10.
EXAMPLE 13
Manufacturing Company X and Y produce standard screws for a furniture company. A random sample of 9 screws is
selected from each company and the length of the screws is measured. The results, in cm, are as follows:

X   2.0  2.1  1.9  2.1  2.0  1.9  2.1  2.0  2.1
Y   2.1  2.1  2.0  1.8  1.9  1.9  2.0  2.1  1.9

Assuming both populations have equal variances, can we conclude that there are differences in the mean length of screws produced by Companies X and Y? Test at the 10% level of significance.
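A minimal Python sketch of the pooled (equal-variance) t test for Example 13, two-tailed at α = 0.10 with the critical value t(0.05, 16) = 1.746 taken from a t table. The helper name `pooled_t` is hypothetical. The resulting t ≈ 0.970 with df = 16 agrees with the "Equal variances assumed" row of the SPSS output in Example 15 (t = .970, df = 16).

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Pooled two-sample t statistic for H0: mu1 = mu2 (equal variances assumed).

    Returns (t, df) with df = n1 + n2 - 2.
    """
    n1, n2 = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, n1 + n2 - 2

# Example 13: screw lengths (cm) from Companies X and Y
x = [2.0, 2.1, 1.9, 2.1, 2.0, 1.9, 2.1, 2.0, 2.1]
y = [2.1, 2.1, 2.0, 1.8, 1.9, 1.9, 2.0, 2.1, 1.9]
t, df = pooled_t(x, y)

# Two-tailed test at alpha = 0.10: critical value t(0.05, 16) = 1.746 from a t table
print(f"t = {t:.3f}, df = {df}, reject H0: {abs(t) > 1.746}")
```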
EXAMPLE 14
Steel rods from Companies A and B are randomly selected and the length of the rods is measured. A sample of size nine from Company A gives a sample mean of 145 cm and an estimated standard deviation of 9 cm. Meanwhile, a sample of size 13 from Company B gives a sample mean of 152 cm and an estimated standard deviation of 11 cm. The length of steel rods is approximately normally distributed.
a) Construct a 95% confidence interval for the ratio between the two variances. Explain why we can assume that the variances are equal.
b) Using the result on the variances in part (a), test at the 5% significance level whether the mean length of steel rods produced by Company A is longer than that of Company B.
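The confidence interval for the ratio of two variances asked for in Examples 10 to 14 can be sketched as below, assuming the standard F-based interval [(s1²/s2²)/F(α/2; n1 − 1, n2 − 1), (s1²/s2²) × F(α/2; n2 − 1, n1 − 1)]. The F values for Example 14 are taken from an F table, and the function name is hypothetical.

```python
def variance_ratio_ci(s1, s2, f_upper_n1_n2, f_upper_n2_n1):
    """Confidence interval for sigma1^2 / sigma2^2.

    f_upper_n1_n2: upper alpha/2 point of F with (n1 - 1, n2 - 1) df
    f_upper_n2_n1: upper alpha/2 point of F with (n2 - 1, n1 - 1) df
    Both are read from an F table.
    """
    ratio = s1**2 / s2**2
    return ratio / f_upper_n1_n2, ratio * f_upper_n2_n1

# Example 14: n1 = 9, s1 = 9 cm; n2 = 13, s2 = 11 cm; 95% CI, so alpha/2 = 0.025
# F(0.025; 8, 12) = 3.51 and F(0.025; 12, 8) = 4.20 from an F table
lower, upper = variance_ratio_ci(9, 11, 3.51, 4.20)
print(f"95% CI for sigma1^2/sigma2^2: ({lower:.4f}, {upper:.4f})")
print("Interval contains 1 (equal variances may be assumed):", lower <= 1 <= upper)
```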
SPSS OUTPUT
EXAMPLE 15
Manufacturing Companies X and Y produce standard screws for a furniture company. A random sample of 9 screws is selected from each company and the length of the screws is measured. The results (in cm) are as follows:

Independent Samples Test (variable: Length)
                              Levene's Test    t-test for Equality of Means
                              F       Sig.     t      df       Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   90% CI of the Difference
Equal variances assumed       1.078   .315     .970   16       .346              .0444        .0458              (-.0355, .1244)
Equal variances not assumed                    .970   14.952   .347              .0444        .0458              (-.0359, .1248)

a) State the hypotheses for the test.
b) Based on the results, what is the assumption about the variances? Use α = 0.10.
c) Using the p-value in the SPSS output, do the data provide sufficient evidence to indicate that the length of the screws from Company X is greater than that from Company Y? Use α = 0.10.
d) State the 90% confidence interval of the difference in mean length of screws from Company X and Company Y.
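SPSS reports only a two-tailed p-value, but part (c) of Example 15 is one-tailed. A minimal sketch of the conversion, assuming a t statistic whose distribution is symmetric about zero (the function name is hypothetical):

```python
def one_tailed_p(two_tailed_p, t, direction):
    """Convert an SPSS 'Sig. (2-tailed)' value into a one-tailed p-value.

    direction: "greater" for H1: mu1 > mu2, "less" for H1: mu1 < mu2.
    Halving is valid only when t points in the direction of H1;
    otherwise the one-tailed p-value is 1 - p/2.
    """
    half = two_tailed_p / 2
    in_h1_direction = (t > 0) if direction == "greater" else (t < 0)
    return half if in_h1_direction else 1 - half

# Example 15 (c): H1: mu_X > mu_Y, equal variances assumed (Levene Sig. = .315 > 0.10)
p = one_tailed_p(0.346, 0.970, "greater")
alpha = 0.10
print(f"one-tailed p = {p:.3f}, reject H0: {p <= alpha}")
```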
EXAMPLE 16
The effective life (in hours) of batteries is compared at two operating temperatures, 20°C and 45°C. Eighteen batteries are randomly selected and randomly allocated to the two temperature levels. The resulting lives of all 18 batteries are shown in the following table.

Independent Samples Test (variable: Effective_life)
                              Levene's Test    t-test for Equality of Means
                              F       Sig.     t       df       Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI of the Difference
Equal variances assumed       3.423   .083     2.737   16       .015              35.333       12.911             (7.963, 62.704)
Equal variances not assumed                    2.737   11.862   .018              35.333       12.911             (7.166, 63.501)

a) State the null and alternative hypotheses for the above test.
b) Based on the result, what is the assumption about the variances of the effective life at the two temperatures? Use α = 0.05.
c) Using the p-value, do the data provide sufficient evidence to indicate that there is a significant difference in effective life between the two temperatures?
d) State the 95% confidence interval of the mean difference in effective life between the two temperatures. Does the interval further support the conclusion in (c)?
EXAMPLE 17
A researcher is interested in comparing the sodium content (in grams) of real cheese and substitute cheese. The data for the two random samples are summarized in the following tables.

Group Statistics
                 Cheese       N   Mean     Std. Deviation   Std. Error Mean
Sodium_content   Real         8   193.12   132.151          46.722
                 Substitute   8   231.25   76.614           27.087

Independent Samples Test (variable: Sodium_content)
                              Levene's Test    t-test for Equality of Means
                              F       Sig.     t       df       Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI of the Difference
Equal variances assumed       2.134   .166     -.706   14       .492              -38.125      54.006             (-153.957, 77.707)
Equal variances not assumed                    -.706   11.228   .495              -38.125      54.006             (-156.699, 80.449)

a) What is the variable in this study?
b) Determine whether there is any difference between the variances of the populations at the 5% significance level.
c) Is there any difference in the mean sodium content between real cheese and substitute cheese at the 5% significance level?
EXAMPLE 18
In order to compare two methods, X and Y, of measuring the hardness of metals, Brinell hardness readings were taken using each method for 20 metal specimens. The resulting Brinell hardness readings are given in the table below:

Independent Samples Test (variable: Hardness)
                              Levene's Test    t-test for Equality of Means
                              F       Sig.     t       df       Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   99% CI of the Difference
Equal variances assumed       .000    .993     -.032   18       .975              -1.400       43.640             (-127.014, 124.214)
Equal variances not assumed                    -.032   18.000   .975              -1.400       43.640             (-127.014, 124.214)

a) Based on the p-value in Levene’s test, what can be concluded about the equality of variances?
b) Show that the test statistic value is -0.032.
c) State the 99% confidence interval of the difference in the mean hardness of metals between Method X and Method Y.
d) Using the p-value, do the data provide sufficient evidence of a significant difference in the mean hardness of metals? Is the confidence interval in (c) consistent with your answer in (d)? Give a reason to support your answer.
EXAMPLE 19
An investigation was conducted into the dust content in the flue gases of two types of solid fuel boilers. Thirteen boilers of Type X and nine boilers of Type Y were used under identical fuelling and extraction conditions. Over a similar period, the following quantities (grams) of dust were deposited in similar traps inserted in each of the 22 flues. Assume that these independent samples came from normal populations. The results are shown in the tables.

Group Statistics
               Type   N    Mean     Std. Deviation   Std. Error Mean
Dust_deposit   X      13   63.831   10.6307          2.9484
               Y      9    52.889   9.0044           3.0015

Independent Samples Test (variable: Dust_deposit)
                              Levene's Test    t-test for Equality of Means
                              F      Sig.      t       df       Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI of the Difference
Equal variances assumed       .519   .480      2.520   20       .020              10.9419      4.3415             (1.8857, 19.9980)
Equal variances not assumed                    2.601   19.058   .018              10.9419      4.2074             (2.1376, 19.7462)

a) Using Levene’s test, determine whether there is any difference between the variances of the populations at the 5% level of significance.
b) Find a 95% confidence interval for the difference between the mean dust deposits of Type X and Type Y.
c) Is there any difference in the mean dust deposit between Type X and Type Y? Test at the 5% significance level.
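The two rows of the SPSS table in Example 19 can be reproduced from the Group Statistics: the first uses the pooled standard error (df = n1 + n2 − 2) and the second the unpooled standard error with the Welch-Satterthwaite degrees of freedom. A sketch in plain Python:

```python
from math import sqrt

# Example 19: Group Statistics from the SPSS output
n1, m1, s1 = 13, 63.831, 10.6307   # Type X
n2, m2, s2 = 9, 52.889, 9.0044     # Type Y
diff = m1 - m2

# "Equal variances assumed" row: pooled standard error
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_pooled = sqrt(sp2 * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled

# "Equal variances not assumed" row: Welch standard error and degrees of freedom
v1, v2 = s1**2 / n1, s2**2 / n2
se_welch = sqrt(v1 + v2)
t_welch = diff / se_welch
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"pooled: SE = {se_pooled:.4f}, t = {t_pooled:.3f}, df = {n1 + n2 - 2}")
print(f"Welch:  SE = {se_welch:.4f}, t = {t_welch:.3f}, df = {df_welch:.3f}")
```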
EXERCISE 2
1. The monthly income of a sample of male and female tourist guides is summarized in the table below.

                     Male guides     Female guides
Sample mean          x̄1 = RM3800     x̄2 = RM3950
Standard deviation   σ1 = RM800      σ2 = RM650
Sample size          n1 = 32         n2 = 30

Is there any evidence at the 10% significance level to indicate that female tourist guides earn more than male guides? Assume that the populations are normally distributed.
2. The following information was obtained from two independent samples selected from two normally distributed populations with unknown but equal standard deviations. Test at the 1% significance level whether μ1 ≠ μ2.

Sample 1   27 39 25 33 21 35 30 26 25 31 35 30 28
Sample 2   24 28 23 25 24 22 29 26 29 28 19 29

3. A consumer agency wanted to estimate the difference in the mean amounts of caffeine in two brands of coffee. The agency took a sample of 15 one-pound jars of Brand I coffee that showed the mean amount of caffeine in these jars to be 80 mg per jar with a standard deviation of 5 mg. Another sample of 12 one-pound jars of Brand II coffee gave a mean amount of caffeine equal to 77 mg per jar with a standard deviation of 6 mg. Test at the 1% significance level whether the mean amounts of caffeine are different for these two brands. Assume that the two populations are normally distributed and that the standard deviations of the two populations are equal.
4. A random sample of 20 cleaning persons produced the mean hourly earnings of RM10.60 with a standard deviation
of RM1.02. A random sample of 25 technicians gave the mean hourly earnings of RM11.57 with a standard deviation
of RM1.34. Assume that the hourly earnings of both groups are normally distributed with different population
standard deviations. Using 5% significance level, can you conclude that the mean hourly earnings of all cleaning
persons are lower than those of all technicians in this state?
5. A study is conducted with 50 subjects to compare the effect of stress, in the form of noise, on the ability to perform a simple task. The first group of 25 subjects acted as a control, while the second group of 25 was the experimental group. Subjects in both groups had to perform the task, but only the experimental group had to perform it with loud music playing. The time to finish the task was recorded for each subject and the following summary was obtained.

                            Control      Experimental
Number of samples           25           25
Sample mean                 15 minutes   23 minutes
Sample standard deviation   4 minutes    10 minutes

Assume that the time to finish the task follows a normal distribution for both populations.
a) Construct a 95% confidence interval for the ratio between the two variances. Explain whether we can assume that the variances are equal.
b) Using the result on the variances in part (a), test at α = 0.05 whether the mean time to finish the task for the experimental group is higher than that for the control group.
6. The average salary offered to college students who graduated in 2014 was RM4,373 in Civil Engineering (EC) and RM4,029 in Computer Science (CS). Assume that these means are based on samples of 900 EC students and 1,200 CS students and that the sample standard deviations for the two samples are RM220 and RM195, respectively. Test at the 2.5% significance level whether the mean salary offered to college students who graduated in 2014 in EC is higher than that for CS. Assume that the populations are approximately normally distributed.
7. A company is considering installing new machines to assemble its product. The company is considering two types
of machines but it will buy only one type. The company selected eight assembly workers and asked them to use
these two types of machines to assemble products. The following table gives the time taken (in minutes) to
assemble one unit of the product on each type of machine for each of these eight workers.
Machine I 23 26 19 24 27 22 20 18
Machine II 21 24 23 25 24 28 24 23
Test at the 5% significance level whether the mean times taken to assemble a unit of the product are different for the two types of machines.
HYPOTHESIS TESTING FOR A DIFFERENCE BETWEEN TWO MEANS
(DEPENDENT SAMPLE)

The dependent t-test, or paired-sample t-test, is used to compare two sets of observations that are related to each other. For example, data from a group of students who are given a pre-test and a post-test would be analyzed using a paired-sample t-test.

Assumptions:
1. The paired differences are normally distributed.
2. The paired differences represent a random sample from the population.

STEP 1: State the null and alternative hypotheses, H0: μD = D0 against one of:
Left-tailed: H1: μD < D0
Right-tailed: H1: μD > D0
Two-tailed: H1: μD ≠ D0

STEP 2: Determine the significance level, α (given in the question).

STEP 3: Compute the test statistic.

STEP 4: Find critical value from table

STEP 5: Specify the decision rule

STEP 6: Make a decision and conclusion


EXAMPLE 20
A company claims that its 12-week special exercise program significantly reduces weight. A random sample of six
persons was selected, and these persons were put on this exercise program for 12 weeks. The following table gives the
weight (pounds) of those six persons before and after the program.

Before 180 195 177 221 208 199


After 183 187 161 204 197 189

Test at 5% level of significance to determine whether the mean weight loss for all persons due to this special exercise
program is greater than zero.
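Steps 3 to 6 for Example 20 can be checked with a short Python sketch (standard library only). It assumes d = before − after, so a positive mean difference means weight was lost and the test is right-tailed. The helper `paired_t` is a hypothetical name of our own, and the critical value t(0.05, 5) = 2.015 is typed in from the t table rather than computed.

```python
import math
import statistics


def paired_t(first, second, d0=0.0):
    """Paired-sample t statistic for the differences d = first - second.

    Returns (mean of d, sample standard deviation of d, t, degrees of freedom).
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    d = [x - y for x, y in zip(first, second)]
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)  # sample SD (divisor n - 1)
    t = (d_bar - d0) / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1


# Example 20: weight (pounds) before and after the 12-week program
before = [180, 195, 177, 221, 208, 199]
after = [183, 187, 161, 204, 197, 189]

# d = before - after is the weight loss; H1: muD > 0 (right-tailed)
d_bar, s_d, t, df = paired_t(before, after)
t_crit = 2.015  # t(0.05, 5) typed in from the t table

print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}, t = {t:.3f}, df = {df}")
print("Reject H0" if t > t_crit else "Do not reject H0")
```

For Example 21 the same helper could be called as `paired_t(after, before)`, so that a positive difference means an increase in speed, with t(0.05, 7) = 1.895 from the table.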
EXAMPLE 21
A private agency claims that the crash course it offers significantly increases the writing speed of secretaries. The
following table gives the scores of eight secretaries before and after they attended this course.

Before 81 75 89 91 65 70 90 64
After 97 72 93 110 78 69 115 72

Using the 5% significance level, can you conclude that attending this course increases the writing speed?
EXAMPLE 22
A company wanted to know if attending a course on “how to be a successful salesperson” can increase the average
sales of its employees. The company sent six of its salespersons to attend this course. The following table gives the
one-week sales of these salespersons before and after they attended this course.

Before 12 18 25 9 14 16
After 18 24 24 14 19 20

Using the 1% significance level, can you conclude that the mean weekly sales for all salespersons increase as a result of
attending this course? Assume that the population of paired differences has a normal distribution.
SPSS OUTPUT
EXAMPLE 23
A company wanted to know if attending a course on “how to be a successful salesperson” can increase the average sales
of its employees. The company sent six of its salespersons to attend this course. The one-week sales of these
salespersons before and after they attended the course were analyzed using SPSS, and the output is shown below.

Paired Samples Statistics
                    Mean    N   Std. Deviation   Std. Error Mean
Pair 1   Before     15.67   6   5.538            2.261
         After      19.83   6   3.817            1.558

Paired Samples Test (Pair 1: Before - After)
Paired Differences:
  Mean = -4.167, Std. Deviation = 2.639, Std. Error Mean = 1.078
  99% Confidence Interval of the Difference: Lower = -8.511, Upper = .178
t = -3.867, df = 5, Sig. (2-tailed) = .012

a) Show that the value of the test statistic is −3.867.


b) Give the null and alternative hypothesis.
c) Test at the 1% significance level whether the mean weekly sales for all salespersons increase after attending the
course.
d) State the 99% confidence interval for the mean difference.
e) Does the result in (d) above support your conclusion in (c)? Explain.
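Parts (a) and (d) of Example 23 can be reproduced from the summary numbers in the output. This minimal sketch assumes the rounded SPSS values (mean difference −4.167, standard deviation 2.639, n = 6) and takes t(0.005, 5) = 4.032 from the t table for the 99% interval, so the last digit may differ slightly from SPSS.

```python
import math

# Summary values copied from the SPSS "Paired Samples Test" output (Before - After)
mean_d = -4.167
sd_d = 2.639
n = 6

se = sd_d / math.sqrt(n)  # standard error of the mean difference
t = mean_d / se           # test statistic for H0: muD = 0

t_half_alpha = 4.032      # t(0.005, 5) from the t table, for a 99% CI
lower = mean_d - t_half_alpha * se
upper = mean_d + t_half_alpha * se

print(f"SE = {se:.3f}, t = {t:.3f}, 99% CI = ({lower:.3f}, {upper:.3f})")
```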
EXAMPLE 24
A company sent seven of its employees to attend a course in building self-confidence. These employees were evaluated
for their self-confidence before and after attending this course. The scores (on a scale of 1 to 15, 1 being the lowest and
15 being the highest score) of these employees before and after they attended the course were analyzed using SPSS.

Paired Samples Statistics
                    Mean    N   Std. Deviation   Std. Error Mean
Pair 1   Before     6.57    7   2.070            .782
         After      8.00    7   2.160            .816

Paired Samples Test (Pair 1: Before - After)
Paired Differences:
  Mean = -1.429, Std. Deviation = 1.988, Std. Error Mean = .751
  95% Confidence Interval of the Difference: Lower = -3.267, Upper = .410
t = -1.901, df = 6, Sig. (2-tailed) = .106

Using the p-value in the SPSS output, test whether attending this course increases the mean score of employees.
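SPSS reports a two-tailed p-value, while the claim "the course increases the mean score" is one-tailed. The sketch below shows the usual conversion; the 5% significance level is an assumption (the output shows a 95% interval, but the question does not state α). With d = Before − After, an increase corresponds to H1: μD < 0.

```python
# Values read from the SPSS output for Example 24 (d = Before - After)
mean_d = -1.429
p_two_tailed = 0.106
alpha = 0.05  # assumed significance level

# If the sample mean difference lies in the direction of H1 (here mean_d < 0),
# the one-tailed p-value is half of the two-tailed one; otherwise it is
# 1 - (two-tailed p) / 2.
if mean_d < 0:
    p_one_tailed = p_two_tailed / 2
else:
    p_one_tailed = 1 - p_two_tailed / 2

decision = "Reject H0" if p_one_tailed < alpha else "Do not reject H0"
print(f"one-tailed p = {p_one_tailed:.3f}: {decision}")
```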
EXAMPLE 25
A researcher wanted to find the effect of a special diet on systolic blood pressure. She selected a sample of seven
adults and put them on this dietary plan for three months. The systolic blood pressures of these seven adults before
and after the completion of this plan were analyzed using SPSS.
Paired Samples Statistics
                    Mean     N   Std. Deviation   Std. Error Mean
Pair 1   Before     208.43   7   18.100           6.841
         After      203.43   7   21.078           7.967

Paired Samples Test (Pair 1: Before - After)
Paired Differences:
  Mean = 5.000, Std. Deviation = 10.786, Std. Error Mean = 4.077
  90% Confidence Interval of the Difference: Lower = -2.922, Upper = 12.922
t = 1.226, df = 6, Sig. (2-tailed) = .266

a) What is the purpose of the study?


b) State your null hypothesis and alternative hypothesis.
c) What statistical test would you use?
d) How many degrees of freedom are there?
e) Using the p-value, what is your conclusion?
PYQ DECEMBER 2019
QUESTION 7

A researcher wants to determine whether there is a significant difference in the Body Mass Index (BMI) between males
and females. A survey was conducted on 80 patients at Tawakal Health Centre, and the collected data were analyzed using SPSS.
The partial output is shown in the following table.
Group Statistics
        Gender   Mean      N    Std. Deviation   Std. Error Mean
BMI     Male     27.0375   40   4.41911          0.69872
        Female   24.0175   40   4.00031          0.63251

a) Show that the standard error of the difference is 0.9425.


(3 marks)
b) State the null and alternative hypotheses for the above study.
(2 marks)
c) Given that the z-statistic is 3.205, do the data provide sufficient evidence to indicate a significant difference in the
Body Mass Index (BMI) between males and females? Use α = 0.05.
(3 marks)
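A quick check of parts (a) and (c): since the standard error of each mean is s/√n, the standard error of the difference between two independent means is √(SE₁² + SE₂²). The sketch assumes this unpooled, large-sample form and uses z(0.025) = 1.96 for a two-tailed test at α = 0.05; its z agrees with the given 3.205 up to rounding.

```python
import math

# Values copied from the SPSS Group Statistics table
mean_male, se_male = 27.0375, 0.69872
mean_female, se_female = 24.0175, 0.63251

# Standard error of the difference between two independent means
se_diff = math.sqrt(se_male ** 2 + se_female ** 2)

# Test statistic for H0: mu_male - mu_female = 0
z = (mean_male - mean_female) / se_diff
z_crit = 1.96  # two-tailed critical value for alpha = 0.05

print(f"SE(diff) = {se_diff:.4f}, z = {z:.3f}")
print("Reject H0" if abs(z) > z_crit else "Do not reject H0")
```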
Analysis of variance (ANOVA) is a comparison of means. ANOVA will allow you to compare more
than two means simultaneously.
OVERVIEW OF ANOVA

The quantity of information contained in a sample is affected by various factors that the
experimenter may or may not be able to control.

The analysis of variance is used to determine how different experimental factors affect the
average response.

All measurements exhibit variability.

The total variation in the response measurements is broken into portions that can be attributed to
various factors.

These portions are used to judge the effect of the various factors on the experimental response.
EXPERIMENTAL DESIGN OF ANOVA

• Completely Randomized Design (CRD): extension of the two independent samples t-test (one-way ANOVA).
• Completely Randomized Block Design (CRBD): extension of the paired t-test (two-way ANOVA).
• Two Factor Factorial Experiment (a x b Factorial Experiment): involves only two factors and their effect on the
response (two-way ANOVA with interaction).
TERM
• EXPERIMENTAL UNIT: object on which a measurement (or measurements) is taken.
• FACTOR: an independent variable, X, whose values are controlled and varied by the experimenter.
• LEVEL: the intensity setting of a factor.
• TREATMENT / FACTOR LEVELS: a specific combination of factor levels.
• RESPONSE, Y: the variable being measured by the experimenter.
• RANDOMIZATION: a random process of assigning treatments to the experimental units.
• BLOCKING: a technique to include other factors in an experiment which contribute to undesirable variation.
• REPLICATION: a repetition of the basic experiment.
COMPARISON AMONG DESIGNS
Two Factor Factorial
Completely Randomized Completely Randomized Experiment
Design (CRD) Block Design (CRBD)
(a x b Factorial Experiment)

• EXPERIMENTAL UNIT • EXPERIMENTAL UNIT • EXPERIMENTAL UNIT


• FACTOR, X • FACTOR, X • FACTOR, X
• LEVEL • LEVEL • LEVEL
• TREATMENT/ FACTOR • TREATMENT/ FACTOR • TREATMENT/ FACTOR
LEVELS LEVELS LEVELS
• RESPONSE, Y • RESPONSE, Y • RESPONSE, Y
• RANDOMIZATION • RANDOMIZATION • RANDOMIZATION
• BLOCKING • BLOCKING
• REPLICATION
ONE-WAY ANOVA (COMPLETELY RANDOMIZED DESIGN)

• One-way ANOVA is used to determine whether there are any significant differences between the means of two or
more independent groups or conditions in an experiment.
• If the means are significantly different, we can say that the variable being manipulated (independent variable) had
an effect on the variable being measured (dependent/response variable).

ASSUMPTIONS
• Normality: the samples are obtained from populations that are normally distributed.
• Constant variance: the variances of the populations are equal.
• Independence: the samples drawn from different populations are random and independent.

EXAMPLE
• We want to understand whether exam performance differed based on test anxiety levels among students, dividing
students into three independent groups (low-, medium- and high-stressed).
• Fifteen fourth-grade students were randomly assigned to three groups to experiment with three different methods
(Method I, Method II, Method III) of teaching arithmetic.

Note that the one-way ANOVA cannot tell us which specific groups were significantly different from each other. It
only indicates that at least two groups were different. Since we may have more than two groups in our study,
determining which of these groups differ from each other is important. This can be done by using a post-hoc test
such as Tukey's test or Fisher's Least Significant Difference (LSD) method.
COMPLETELY RANDOMIZED DESIGN (CRD)
A design in which treatments are randomly assigned to the experimental units, or in which independent random samples
of experimental units are selected for each treatment.

Objective of CRD

To compare three or more treatment means by using the analysis of variance (ANOVA) F-test.

Conditions required for a valid ANOVA F-test:
1. The samples are randomly selected in an independent manner from the k treatment populations.
2. All k sampled populations have distributions that are approximately normal.
3. The k population variances are equal.

Limitation
The F-test can only show whether or not a difference exists among three or more means. It cannot reveal where the
difference lies.
ONE-WAY ANOVA (COMPLETELY RANDOMIZED DESIGN)

Assume we are comparing k different treatments or k normal populations. Let random samples of sizes n1, n2, …, nk be
drawn from k populations with means µ1, µ2, …, µk and common variance σ².

For example, 30 students are randomly assigned to 3 groups (treatments):
Group 1: 10 students (Treatment 1)
Group 2: 10 students (Treatment 2)
Group 3: 10 students (Treatment 3)

Are the k population means the same, or does at least one mean differ from the others?

TYPICAL DATA STRUCTURE FOR ONE-WAY ANOVA

Treatment (level)   Observations                 Total
1                   y11   y12   . . .   y1n      ∑y1
2                   y21   y22   . . .   y2n      ∑y2
.                   .     .             .        .
.                   .     .             .        .
k                   yk1   yk2   . . .   ykn      ∑yk
                                        TOTAL    ∑Y
Even though we are testing the means, the ANOVA procedure analyzes the variation in the data:
• The total variation in the data is measured by the total sum of squares (SST).
• The treatment sum of squares (SSTR) measures the variation due to differences in the treatment means.
• The error sum of squares (SSE) measures the variation due to random errors.
• Sums of squares divided by their respective degrees of freedom (df) are called mean squares.

ONE-WAY ANOVA TABLE

Source of Variation                  df      Sum of Squares (SS)   Mean Square (MS)       F
Treatment, TR (between treatments)   k − 1   SSTR                  MSTR = SSTR/(k − 1)    F = MSTR/MSE
Error, E (within treatments)         N − k   SSE                   MSE = SSE/(N − k)
Total                                N − 1   SST

where N is the total number of observations, $T_i$ is the total of treatment i, and

$SST = \sum y^2 - \dfrac{(\sum y)^2}{N}$, $SSTR = \sum_i \dfrac{T_i^2}{n_i} - \dfrac{(\sum y)^2}{N}$, $SSE = SST - SSTR$
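STEP 1: Hypothesis Statement
H0: μ1 = μ2 = … = μk
H1: At least one mean is different from the others

STEP 2: Significance level, α

STEP 3: Test Statistic: $F = \dfrac{MSTR}{MSE}$

STEP 4: Critical Value: $F_{\alpha,\,(k-1),\,(N-k)}$

STEP 5: Decision Rule: Reject H0 if $F > F_{\alpha,\,(k-1),\,(N-k)}$

STEP 6: Decision & conclusion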


Each of four varieties of corn, A, B, C and D, is planted in three identical plots. The respective yields in bushels
per acre are shown in the following table.

A B C D
86 89 100 88
91 98 98 92
88 97 101 94
Based on the above data:
a) Identify the following item:
i. Experimental unit
ii. Treatment
iii. Factor
iv. Response
b) Determine the type of experimental design used in this study.
c) Construct an ANOVA table.
d) Test at 5% level of significance whether the average yields are equal.
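A minimal sketch of the ANOVA table computations for the corn data, using the computational formulas SST = Σy² − (Σy)²/N and SSTR = Σ(Tᵢ²/nᵢ) − (Σy)²/N. The function name `one_way_anova` is our own, and F(0.05; 3, 8) = 4.07 is read from the F table rather than computed.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table quantities for a list of samples."""
    all_obs = [y for g in groups for y in g]
    n_total = len(all_obs)
    k = len(groups)
    correction = sum(all_obs) ** 2 / n_total  # (sum of y)^2 / N
    sst = sum(y * y for y in all_obs) - correction
    sstr = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    sse = sst - sstr
    df_tr, df_e = k - 1, n_total - k
    mstr, mse = sstr / df_tr, sse / df_e
    return {
        "SSTR": sstr, "SSE": sse, "SST": sst,
        "df": (df_tr, df_e, n_total - 1),
        "MSTR": mstr, "MSE": mse, "F": mstr / mse,
    }


# Yields (bushels per acre) of the four corn varieties
corn = {
    "A": [86, 91, 88],
    "B": [89, 98, 97],
    "C": [100, 98, 101],
    "D": [88, 92, 94],
}
table = one_way_anova(list(corn.values()))
for source, value in table.items():
    print(source, value)

f_crit = 4.07  # F(0.05; 3, 8) from the F table
print("Reject H0" if table["F"] > f_crit else "Do not reject H0")
```

Because SSTR is computed from each group's own total and size, the same function also handles unequal group sizes, such as the teaching-methods data.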
The following are the mileages recorded during a series of road tests on three new models of luxury sedans.

Model
A B C
22 28 29
26 24 32
27 29 28

Based on the above data:


a) Identify the following item:
i. Experimental unit
ii. Treatment
iii. Factor
iv. Response
b) Construct an ANOVA table.
c) Test at the 5% level of significance whether the mean mileages recorded for the three models are equal.
A study is conducted to measure the difference among three teaching methods: Method A (blended
learning), Method B (online learning) and Method C (face-to-face). A professor decided to use the three different
teaching methods to conduct a course. A test was administered at the end of the course, and the marks
obtained by the participants are shown in the table below.

Method A 25 38 42 65 47 52
Method B 15 21 19 25
Method C 44 39 54 58 73

Based on the above data:


a) Construct an ANOVA table.
b) How many observations are involved in this study?
c) Is there a difference in the mean marks among the three methods? Construct a test of hypothesis at 1%
significance level.
A company manufactures brake wheels for a magnet brake. The product quality is measured by the difference
between the specified diameter and the actual diameter of the brake wheel. Data on precision (in 1/100 inch)
were recorded for random samples of nine brake wheels from each of three different workers. A partially
completed ANOVA table is given below.

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
Treatment 1.548
Error
Total 18.750

Based on the above data:


a) Complete the ANOVA table.
b) Test at the 5% significance level whether there is any significant difference in product quality between the three
different workers.
Fifteen fourth-grade students were randomly assigned to three groups to experiment with three different
methods (Method I, Method II, Method III) of teaching arithmetic. At the end of the semester, the same test was
given to all 15 students. The table gives the scores of students in the three groups.
Method I Method II Method III
48 55 84
73 85 68
51 70 95
65 69 74
87 90 67
The data were analyzed using MINITAB software and the output is shown below:
ANOVA
Score
Sum of Squares df Mean Square F Sig.
Between Groups 432.133 2 216.067 1.093 .366
Within Groups 2372.800 12 197.733
Total 2804.933 14
a) State the null and alternative hypothesis.
b) Prove that the total sum of squares is 2804.9.
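Part (b) can be verified with the computational formula SST = Σy² − (Σy)²/N. This short sketch also recomputes SSTR = Σ(Tᵢ²/nᵢ) − (Σy)²/N as a cross-check against the "Between Groups" row of the output.

```python
scores = {
    "Method I": [48, 73, 51, 65, 87],
    "Method II": [55, 85, 70, 69, 90],
    "Method III": [84, 68, 95, 74, 67],
}

all_scores = [y for group in scores.values() for y in group]
n = len(all_scores)
sum_y = sum(all_scores)
sum_y2 = sum(y * y for y in all_scores)
correction = sum_y ** 2 / n  # (sum of y)^2 / N

sst = sum_y2 - correction
sstr = sum(sum(g) ** 2 / len(g) for g in scores.values()) - correction

print(f"N = {n}, sum y = {sum_y}, sum y^2 = {sum_y2}")
print(f"SST = {sst:.3f}, SSTR = {sstr:.3f}")
```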
A consumer agency wants to check if the mean lives of four brands of auto batteries which sell for nearly the
same price are the same. The agency randomly selected a few batteries of each brand and tested them. The
following table gives the lives of these batteries in thousands of hours.
Brand A Brand B Brand C Brand D
74 53 57 56
78 51 71 51
51 47 91 49
56 59 77 43
65 68
The following is the result of analysis done using MINITAB software.
ANOVA
Life
Sum of Squares df Mean Square F Sig.
Between Groups 1563.594 3 521.198 5.556 .010
Within Groups 1313.350 14 93.811
Total 2876.944 17
Answer the following questions:
a) How many observations are involved in this study?
b) Test the hypothesis that the mean lives of different brands are same at α = 0.05.
PYQ DECEMBER 2019
QUESTION 1

A team of researchers is interested in comparing the yield (in kilograms) of four different varieties (A, B, C and D) of
rambutan tree in the Kg Hutan Kampung orchard. The researchers obtain a random sample of four trees of each variety
from the same orchard. The data were analyzed using IBM SPSS Statistics, and the results are given below.
ANOVA
Source Sum of Squares df Mean Square F Sig.
Between Groups R 3 125.729 U 0.293
Within Groups 1083.250 S 90.271
Total 1460.438 T

a) Compute the values of R, S, T and U.


(3 marks)
b) State the null and alternative hypotheses for this study.
(2 marks)
c) Based on the p-value, test at the 5% level of significance whether the mean yields differ among the four
varieties.
(3 marks)
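The missing entries follow from the structure of the ANOVA table: SSTR = SST − SSE, error df = N − k, total df = N − 1 and F = MSTR/MSE. A minimal sketch, assuming N = 4 varieties × 4 trees = 16 as stated in the question:

```python
# Known entries from the partial SPSS ANOVA table
k = 4            # number of varieties (treatments)
n_per_group = 4  # trees sampled per variety
sse = 1083.250
sst = 1460.438
mstr = 125.729
mse = 90.271

n_total = k * n_per_group
df_tr = k - 1

R = sst - sse      # SSTR = SST - SSE
S = n_total - k    # error df, N - k
T = n_total - 1    # total df, N - 1
U = mstr / mse     # F = MSTR / MSE

print(f"R = {R:.3f}, S = {S}, T = {T}, U = {U:.3f}")
# Cross-check: R / (k - 1) should reproduce the reported MSTR up to rounding
print(f"R / (k - 1) = {R / df_tr:.3f}")
```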
1. An engineer wants to compare the average time taken to produce an item from three different machines.
The time taken to produce items from these three machines was recorded. The following Minitab output
summarized the analysis of variance (ANOVA).
ANOVA
Time
Sum of Squares df Mean Square F Sig.
Between Groups 2.17 2 1.08 0.67 0.534
Within Groups 14.50 9 1.61
Total 16.67 11

a) Name the type of experimental design used in this study.


b) At 1% level of significance, investigate if there is a difference in the average time taken to produce an item
among three different machines.
2. An excessive amount of ozone in the air is indicative of air pollution. Five air samples were collected from
each of 3 locations in Penang and their content of ozone was measured. The summary output from the data
analysis software is given below.
ANOVA
Ozone
Sum of Squares df Mean Square F Sig.
Between Groups 0.004173 2 0.002087 2.25 0.148
Within Groups 0.011120 12 0.000927
Total 0.015293 14
a) Name the type of experimental design used in this study.
b) At 5% level of significance, investigate if there is any difference in the mean ozone content among the three
locations. Use p-value.
3. A smart phone manufacturer wants to study the battery lifetime of the smart phone from three different
models. The battery lifetimes of several phones from these three models were recorded. The following
Minitab output summarized the analysis of variance (ANOVA).
ANOVA
Lifetime
Sum of Squares df Mean Square F Sig.
Between Groups 604.333 2 302.167 12.26 0.001
Within Groups 369.667 15 24.644
Total 974.000 17
a) Name the type of experimental design used in this study.
b) At the 5% level of significance, is there a difference in the mean battery lifetime of the smart phones from the three
different models? Use the p-value.
4. A university employment office wants to compare the time taken by graduates with three different majors to
find their first job after graduation. The time (in days) taken to find their first full-time job after graduation was
recorded for a random sample of eight business majors, seven computer science majors and six engineering majors
who graduated in May 2015. The results are summarized in the following output.
ANOVA
Time
Sum of Squares df Mean Square F Sig.
Between Groups 1312 2 656.2 2.02 .161
Within Groups 5841 18 324.5
Total 7153 20
a) State the null and alternative hypothesis.
b) How many observations are involved in this study?
c) At the 5% significance level, can you conclude that the mean time taken to find their first job for all 2015
graduates in these fields is the same?
1. The effective life of four different types of insulating fluids at an accelerated load of 35 kV is being
investigated using a completely randomized experiment. Six observations on the life of each insulating fluid were
recorded. The collected data were then analyzed using SPSS, and a partially
completed ANOVA table is given below:
ANOVA
Life
Sum of Squares df Mean Square F
Between Groups 30.17 3 10.06 Y
Within Groups 65.9 20 X
Total 96.16 23

Complete the ANOVA table by finding the appropriate values of X and Y. Test at the 5% level of significance whether
there is any difference among the four types of insulating fluids.
2. An experiment is carried out to examine the safety of compact cars, midsize cars and full-size cars. A
sample of three cars of each type was randomly selected to perform the test. The mean pressure (Pa) applied to the
driver's head during a crash test was recorded and is shown below.

Pressure (Pa)
Compact Car   Midsize Car   Full-size Car
643 469 484
655 427 456
702 525 402

a) Construct an ANOVA table.


b) Can we conclude that the mean pressure applied to the driver's head during the crash test is different for each
type of car? Conduct a test of hypothesis at the 1% level of significance.
3. An investigator is interested in comparing the cardiovascular fitness of elite runners on three different
training courses, each of which covers 10 miles. The courses differ in terms of terrain: Course A is flat,
Course B has graded inclines and Course C includes steep inclines. Each runner's heart rate is monitored at the
5th mile of the run on each course. Ten runners are involved, and their heart rates (beats per minute) measured
on each course are as shown.
Runner 1 2 3 4 5 6 7 8 9 10
Course A 135 132 143 128 141 139 150 131 150 142
Course B 141 131 138 148 135 156 134 156 165 145
Course C 150 139 141 161 148 138 138 162 151 160

Below is the result of analysis using SPSS software.


ANOVA
Heart_rate
Sum of Squares df Mean Square F Sig.
Between Groups 476.5 2 238.2 2.57 .095
Within Groups 2499.4 27 92.6
Total 2975.9 29
a) Prove that the total sum of squares is 2975.9.
b) At 5% level of significance, test if there is a difference in the mean heart rates of runners on the three
courses.
4. A lot of different factors contribute to air pollution. One particular factor, particulate matter, was measured
for prominent cities of three continents. Particulate matter includes smoke, soot, dust and liquid droplets
from combustion such that the particle is less than 10 microns in diameter and thus capable of reaching
deep into the respiratory system. The measurements are listed below.
Asia Europe Africa
79 35 43
104 34 16
73 30 33
40 43
The following is the result of analysis done using SPSS software.
ANOVA
Particulate_Matter
Sum of Mean
Squares df Square F Sig.
Between Groups 4230 2 2115 6.65 .020
Within Groups 2544 8 318
Total 6774 10
a) State the null and alternative hypotheses.
b) Prove that the total sum of squares is 6774.
c) At the 0.05 level of significance, is there sufficient evidence to conclude there is a difference in the means?
d) What do you understand by the term “One-way ANOVA”?
TEST OF INDEPENDENCE

The purpose of a Chi-square test of independence is to study the relationship / association between two
categorical / qualitative variables.

The data are displayed using a contingency table:
• The contingency table has r rows and c columns, and hence r × c total cells.
• The Chi-square test is used to test the independence between the rows and columns of a contingency table.

                           Qualitative Variable 2
Qualitative Variable 1     1      2      …    c
1                          O11    O12    …    O1c
2                          O21    O22    …    O2c
⋮                          ⋮      ⋮           ⋮
r                          Or1    Or2    …    Orc
GENERAL STEPS:

STEP 1 : Null and Alternative hypotheses:

H0 : The two variables are independent / Not related / Not associated
H1 : The two variables are dependent / Related / Associated

STEP 2 : Test Statistics

For a 2 x 2 contingency table: $\chi^2 = \sum \dfrac{(|O_{ij} - E_{ij}| - 0.5)^2}{E_{ij}}$

Otherwise: $\chi^2 = \sum \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$

where
Oij = observed frequencies / counts
Eij = expected frequencies / counts = (row i total × column j total) / n

STEP 3 : Critical Value: $\chi^2_{\alpha,\,(r-1)(c-1)}$

STEP 4 : Decision Rule: Reject H0 if $\chi^2 > \chi^2_{\alpha,\,(r-1)(c-1)}$

STEP 5 : Decision & Conclusion

If H0 is rejected, we conclude that the two variables are dependent / related;
otherwise, we conclude that the two variables are independent / not related.
A large furniture retailer with stores in Penang, Perak and Pahang had the following results from a Hari Raya
Sale.
Location of Type of furniture
stores Dining Set TV Cabinet Sofa
Penang 20 10 30
Perak 18 8 24
Pahang 8 7 13

At 5% significance level, investigate if there is sufficient evidence to indicate that type of furniture sold and the
location of stores are dependent.
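A minimal sketch of the chi-square test of independence for the furniture data, using Eij = (row total × column total)/n and χ² = Σ(O − E)²/E (no continuity correction, since this is a 3 x 3 table). The helper name `chi_square_independence` is our own, and the critical value χ²(0.05, 4) = 9.488 is read from the table.

```python
def chi_square_independence(observed):
    """Expected counts, chi-square statistic and df for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return expected, chi2, df


# Rows: Penang, Perak, Pahang; columns: Dining Set, TV Cabinet, Sofa
furniture = [
    [20, 10, 30],
    [18, 8, 24],
    [8, 7, 13],
]
expected, chi2, df = chi_square_independence(furniture)
chi2_crit = 9.488  # chi-square(0.05, 4) from the table

print(f"chi2 = {chi2:.3f}, df = {df}")
print("Reject H0" if chi2 > chi2_crit else "Do not reject H0")
```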
A total of 120 computer defects were recorded and the defects were classified into three types (A, B and C). The
production shift in which each computer was manufactured was also identified. The counts are recorded in the
following contingency table. At 2.5% level of significance, is there enough evidence that there is an association
between production shift and type of defects?

Production Type of defects


shift A B C
1 15 20 21
2 22 19 23
A researcher wishes to determine whether there is any relationship between the gender of an individual and the
amount of alcohol consumed. A sample of 68 individuals is selected and the data are recorded in the following
table:

Alcohol Consumption
Gender
Low Moderate High
Male 10 9 8
Female 13 16 12

At 5% significance level, can the researcher conclude that alcohol consumption is related to gender?
A researcher wants to know whether hotel guests' response is dependent on hotel location. A sample of hotel
guests at three locations (urban, semi – urban, rural) was asked to indicate whether they would return to the
hotel or would not return to the hotel. The following tables provide the result of the analysis done using SPSS
software. Based on the output, answer the following questions:
Response * Location Crosstabulation
                                          Location
                                   urban   semiurban   rural   Total
response   yes   Count             46      36          12      94
                 Expected Count    40.2    29.2        24.5    94.0
           no    Count             Y       20          35      86
                 Expected Count    36.8    26.8        X       86.0
Total            Count             77      56          47      180
                 Expected Count    77.0    56.0        47.0    180.0

Chi-Square Tests
Value df Asymp. Sig. (2-sided)
Pearson Chi-Square Z 2 .000
Likelihood Ratio 18.975 2 .000
Linear-by-Linear Association 11.257 1 .001
N of Valid Cases 180

a) Compute the value X, Y and Z from the table above.


b) State the name of the test for above data.
c) Specify the appropriate hypothesis to test the data.
d) Based on the p - value, state your decision and conclusion for the hypothesis in c). Use α = 0.05.
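Part (a) can be checked from the margins: Y follows from the "no" row total, X is an expected count (row total × column total / n), and Z is the Pearson statistic χ² = Σ(O − E)²/E over the 2 x 3 table. A minimal sketch, assuming the counts and totals shown in the SPSS output:

```python
# Margins and counts read from the SPSS crosstabulation
row_no_total = 86
col_rural_total = 47
n = 180

# Y: the "no" count in the urban column, from the "no" row total
Y = row_no_total - 20 - 35
assert 46 + Y == 77, "Y should also match the urban column total"

# X: expected count for (no, rural) = row total * column total / n
X = row_no_total * col_rural_total / n

# Z: Pearson chi-square over all six cells
observed = [[46, 36, 12], [Y, 20, 35]]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
Z = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(3)
)

print(f"X = {X:.1f}, Y = {Y}, Z = {Z:.3f}")
```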
A manager of a tire factory randomly selects tires for inspection. He recorded the shift (Shift 1 or Shift 2) that
produced the tire. Each tire is either classified as perfect, satisfactory or defective. The Crosstabulation of this
inspection is shown in the following table.
Shift * Tire_Quality Crosstabulation
Tire_Quality
Perfect Satisfactory Defective Total
Shift Shift 1 Count 100 121 10 231
Expected Count 99.3 122.7 9.0 231.0
Shift 2 Count 65 83 5 153
Expected Count 65.7 81.3 6.0 153.0
Total Count 165 204 15 384
Expected Count 165.0 204.0 15.0 384.0
Based on the SPSS output, answer the following questions.
a) State the total number of tires selected for inspection.
b) How many tires were rated 'Satisfactory' in quality?
c) How many tires from Shift 1 were rated 'Perfect' in quality? Hence, show that the expected frequency for this cell is
99.3.
d) Determine whether there is an association between shift of the production and tire quality at 5% significance
level.
A sample of students was asked if they prefer classroom learning or online learning. The results of the study are
given in the table below. At 1% level of significance, is there evidence that there is an association between the
learning preference and their age?

Learning * Age Crosstabulation
                                              Age
                                      15-17   18-20   21-23   Total
Learning   Classroom   Count          12      15      8       35
                       Expected Count A       12.2    13.3    35.0
           Online      Count          15      B       30      65
                       Expected Count 17.6    22.8    24.7    65.0
Total                  Count          27      35      38      100
                       Expected Count 27.0    35.0    38.0    100.0

a) State the total number of students involved in this study.


b) How many students prefer classroom learning and are aged 18-20?
c) Calculate the values of A and B.
d) At the 5% significance level, determine whether the learning preferences and the age of the students are
related.
QUESTION 5

By polling a random sample of 316 undergraduate students, a campus press obtains the following frequency
counts regarding students' attitudes towards a proposed change in dormitory regulations.

Gender * Student_attitude Crosstabulation
                                          Student_attitude
                                   Favor   Indifferent   Oppose   Total
Gender   Male     Count            75      62            C        151
                  Expected Count   61.2    68.3          21.5     151.0
         Female   Count            53      81            31       165
                  Expected Count   66.8    D             23.5     165.0
Total             Count            128     143           45       316
                  Expected Count   128.0   143.0         45.0     316.0

a) Compute the value of C and D using an appropriate formula.


(2 marks)
b) Calculate the χ² statistic for this study.
(4 marks)
c) State the null and alternative hypotheses to test whether gender and students' attitude towards the proposed
change in dormitory regulations are related.
(1 mark)
d) Based on the χ² statistic in b), state your decision and conclusion for the above test. Use α = 0.05.
(3 marks)
1. Violence and lack of discipline have become major problems in schools recently. A random sample of 300
adults was asked if they favor giving freedom to schoolteachers to punish students for violence and lack of
discipline. The two-way classification of the responses of these adults is presented in the following table.

In favor Against No opinion


Men 93 70 12
Women 87 32 6

Do the data provide sufficient evidence to conclude that the gender and opinions of adults are dependent at the 1%
significance level?
2. A sample of 800 workers was asked their marital status (single or married) and their working shift (morning,
afternoon, or night). The information obtained was recorded in the following two-way classification table.
Marital Morning Afternoon Night
status shift shift shift
Single 160 60 80
Married 360 100 40

Test at the 5% significance level whether the marital status and the working shift are related.

3. A well-known hotel is being investigated to determine if its recruitment is gender biased. The following table
shows the classifications of applications for management, secretarial and chef positions according to gender
and the results of the interview.
Job category
Gender
Offered Denied
Male 70 30
Female 50 80
Test the null hypothesis that there is no gender bias in any job category. Use α = 0.01.
4. A sample of employees at a large travel agency was asked to indicate a preference for one of three pension
plans. The results are given in the following table:
Pension Plan
Job class     Plan A   Plan B   Plan C
Supervisor    10       13       29
Clerical      19       80       19
Labor         81       57       22
a) Specify the hypotheses.
b) At the 5% significance level, can we conclude that there is an association between the pension plan selected and
job class?
5. A study is conducted to test for the independence between road accident injuries and type of road. These
data were obtained from records of 500 selected accidents. Based on 5% level of significance, is there any
evidence of an association between the variables?
Type of road
Type of
Federal / Highway
injury State
City
None or minor 60 20 15
Major 100 88 102
Fatal 45 22 48
6. In an experiment to study the dependence of hypertension on smoking habits, the following data were taken
from 180 individuals. Test the hypothesis that the presence or absence of hypertension is independent of
smoking habits. Use 5% level of significance.
                  Nonsmokers   Moderate smokers   Heavy smokers
Hypertension      21           36                 30
No hypertension   48           26                 19
