MATH 403 EDA Chapter 8
MATH 403 Engineering Data Analysis
CABACES, DONNALYN C.
MARCAIDA, MARJORIE G.
SOTTO, RODOLFO JR. C.
MATH 403- ENGINEERING DATA ANALYSIS
Chapter 8
TEST ON HYPOTHESIS FOR A SINGLE SAMPLE
Introduction
Previously, the estimation of a population parameter from sample data using a point
estimate or confidence interval was discussed. In many situations there are two competing
claims about the value of a parameter, and it must be determined which claim is correct.
This can be done by statistical inference. Inferential statistics is the branch of
statistics that deals with estimating population values, called parameters, and with making
statements about computed statistics that are acceptable to some degree of confidence.
Statistical inference is the method concerned with making generalizations about a
population from a sample and determining how accurate the generalizations are. This
chapter focuses on the basic concepts of hypothesis testing using a single sample of data.
At the end of this module, it is expected that the students will be able to:
t-test.
Hypothesis testing is a decision-making process for evaluating claims about a
population. The goal of this process is to make a judgment about the difference between
the sample statistic and a hypothesized population parameter. In this process, the
researcher must define the population under study, state the hypothesis to be
investigated, give the significance level, select a sample, collect data, perform the
required test and reach a conclusion. The z-test and t-test are statistical tests for
hypotheses on means, while the chi-square test is used for testing the variance or
standard deviation.
existence of relationship between the variables under study. This statement is tested for
a statement of the expectation derived from the theory under the study.
                 Reject Ho            Do not reject Ho
Ho is true       Type I error         Correct decision
Ho is false      Correct decision     Type II error
A type I error occurs if one rejects the null hypothesis when it is true. The probability
of committing a type I error is referred to as the significance level and is denoted by the
Greek letter alpha (α). The common values of α are 1%, 5% and 10%. A type II error occurs
if one does not reject the null hypothesis when it is false. The probability of committing
a type II error is denoted by the Greek letter beta (β).
The level of significance is the maximum probability of committing a type I error. That
is, P(type I error) = α. Generally, statisticians agree on using three arbitrary significance
levels: 0.10, 0.05 and 0.01. That is, when a true null hypothesis is tested at one of these
levels, the probability of a type I error is 10%, 5% or 1% and the probability of a correct
decision is 90%, 95% or 99%, depending on which level of significance is used. The
probability of a correct decision, 1 − α, is the confidence level, which represents the
chance of not rejecting the null hypothesis when it is true.
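The meaning of α as the probability of a type I error can be checked with a small simulation. The sketch below is only an illustration: it assumes a two-tailed test based on the z statistic (X̄ − µo)/(σ/√n) with known σ, and the values µo = 12, σ = 1, n = 30, the number of trials and the seed are all hypothetical choices, not values from the text. Every sample is drawn with H0 true, so each rejection is a type I error.

```python
import math
import random
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, mu0=12.0, sigma=1.0, n=30, trials=4000, seed=1):
    """Estimate P(reject H0 | H0 true) for a two-tailed z-test (illustrative values)."""
    rng = random.Random(seed)                      # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value z_{alpha/2}
    rejections = 0
    for _ in range(trials):
        # The sample is drawn with the true mean equal to mu0, so H0 is true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        z = (xbar - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:                        # rejecting a true H0 is a type I error
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(f"Estimated P(type I error) = {rate:.3f}")
```

The estimated rate should land close to the chosen α, and the remaining share of the trials corresponds to correct decisions.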
In order to state the hypothesis correctly, the researcher must translate correctly
the claim into mathematical symbols. There are three possible sets of statistical
hypotheses.
In hypothesis testing with a discrete test statistic, the critical region may be arbitrarily
chosen. If α is too large, it can be reduced by making an adjustment in the critical value.
It may be necessary to increase the sample size to offset the decrease that occurs
automatically in the power of the test. In statistical analysis, it has become customary to
choose a significance level of 0.10, 0.05 or 0.01 and to select the critical region
accordingly; the rejection or non-rejection of the null hypothesis H0 then depends on that
region. For example, if the test is two-tailed, α is set at the 0.05 level of significance and
the test statistic follows, say, the standard normal distribution, then a z-value is observed
from the data and the critical region is z > 1.96 or z < −1.96, where the value 1.96 is found
as z0.025 in the table of Areas Under the Normal Curve. A value of z in the critical region
prompts the statement “The value of the test statistic is significant,” which we can then
translate into the user’s language. For example, if the hypothesis is given by H0: μ = 12,
H1: μ ≠ 12, one might say, “The mean differs significantly from the value 12.”
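The critical values read from the normal table can also be computed directly. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is hypothetical and introduced here only for illustration.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=True):
    """Critical z-value for a given significance level (hypothetical helper)."""
    # A two-tailed test puts alpha/2 in each tail; a one-tailed test puts alpha in one tail.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.10, 0.05, 0.01):
    two = z_critical(alpha)
    one = z_critical(alpha, two_tailed=False)
    print(f"alpha = {alpha:.2f}: two-tailed z = {two:.3f}, one-tailed z = {one:.3f}")
```

For α = 0.05 the two-tailed value is about 1.96, which matches the critical region z > 1.96 or z < −1.96 described above.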
The philosophy that the maximum risk of making a type I error should be controlled
is the root of the pre-selection of a significance level. However, this approach does not
account for values of test statistics that are “close” to the critical region. Suppose, for
example, in the illustration with H0: μ = 12 versus H1: μ ≠ 12, a value of z = 1.84 is
observed; strictly speaking, with α = 0.05, the value is not significant. But the risk of
committing a type I error if one rejects H0 in this case could hardly be considered severe.
In fact, in a two-tailed scenario, one can quantify this risk as P = 2P(Z > 1.84 when μ =
12) = 2(0.0329) = 0.0658. This is important information to the user, although the evidence
against H0 is not as strong as that which would result from rejection at an α = 0.05 level.
As a result, the P-value approach has been widely adopted as an alternative, in terms
of a probability, to a mere “reject” or “do not reject” conclusion. The P-value also gives
important information when the z-value falls well into the ordinary critical region. For
example, if z is 2.75, it is informative for the user to observe that P = 2(0.0030) = 0.0060,
and thus the z-value is significant at a level considerably less than 0.05. It is important to
know that under the condition of H0, a value of z = 2.75 is an extremely rare event. That
is, a value at least that large in magnitude would only occur 60 times in 10,000
experiments.
A P-value is the lowest level of significance at which the observed value of the test
statistic is significant. It is the smallest level of α that would lead to rejection of Ho with
the given data.
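As a rough illustration of this definition, the P-values quoted for z = 1.84 and z = 2.75 can be reproduced from the standard normal distribution. The function name `z_p_value` and its `tail` argument are hypothetical names used only in this sketch.

```python
from statistics import NormalDist

def z_p_value(z, tail="two-sided"):
    """P-value of an observed z statistic under the standard normal distribution."""
    std_normal = NormalDist()
    if tail == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))   # twice the area beyond |z|
    if tail == "right":
        return 1 - std_normal.cdf(z)              # area to the right of z
    if tail == "left":
        return std_normal.cdf(z)                  # area to the left of z
    raise ValueError("tail must be 'two-sided', 'right' or 'left'")

p_close = z_p_value(1.84)   # value just outside the 0.05 critical region
p_far = z_p_value(2.75)     # value well inside the critical region
print(f"z = 1.84: P = {p_close:.4f}")
print(f"z = 2.75: P = {p_far:.4f}")
```

H0 is rejected at level α exactly when the P-value is smaller than α, so the P-value reports the smallest α at which the observed statistic would be significant.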
The following are the steps in hypothesis testing using the fixed probability of Type I
error approach.
1. State the null and alternative hypotheses.
2. Determine the level of significance and the direction of the test. The direction of the
test is based on whether the alternative hypothesis is stated as left-tailed, right-tailed
or two-tailed.
3. Choose the appropriate test statistic.
4. Write the decision rule expressing when to reject the null hypothesis.
5. Compute the test statistic and compare it with the critical value.
6. State the decision based on the resulting computed value when compared to the
critical value.
If you will be testing the hypothesis using significance testing (the P-value approach),
4. Compute the P-value based on the computed value of the test statistic.
5. State the decision based on the resulting P-value and knowledge of the scientific
system.
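Following the steps in hypothesis testing for a single mean, the hypotheses are stated
as follows, with one of the three alternatives:
Ho: µ = µo
H1: µ ≠ µo
H1: µ > µo
H1: µ < µo
The decision rule is stated as follows: for a two-tailed test, reject the null hypothesis if
the absolute value of the test statistic exceeds the critical value; for a one-tailed test,
reject the null hypothesis if the test statistic lies beyond the critical value in the direction
of the alternative hypothesis. Otherwise, do not reject the null hypothesis.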
In order to draw inference on a mean in the one-population case, assuming that the entries
are normally distributed and the variance is known, the Z-test is used. It can be used when
the sample size is equal to or greater than 30 (n ≥ 30). The Z-statistic, zc, is the test statistic
used to decide whether to reject the null hypothesis in favor of the alternative
hypothesis. It is computed as:
$$z_c = \frac{\bar{X} - \mu_o}{\sigma/\sqrt{n}}$$
where X̄ is the computed mean of the gathered data, µo is the hypothesized mean, σ is
the population standard deviation, which is known or given, and n is the sample size. The
critical value is obtained using the z-tabular value. For a two-tailed test, the z-value
corresponding to an area of 1 − α/2, written symbolically as zα/2, is considered. Otherwise,
for a one-tailed test, the z-value corresponding to an area of 1 − α, written as zα, is
considered.
Figure 1. The Normal Distribution or Z-Distribution for Testing the Hypothesis Ho: µ = µo
with critical values for (a) H1: µ ≠ µo, (b) H1: µ > µo, (c) H1: µ < µo
Example 1. A random sample of 100 students of Professor X shows that the average grade
in the midterm examination is 85%. Professor X claims that the average grade of the
students in the midterm is at least 80% with a standard deviation of 16%. Is there evidence
to say that the claim is correct at the 5% level of significance?
Solution:
1. H0 : µ = 80%
H1 : µ > 80%
3. $z_c = \frac{\bar{X} - \mu_o}{\sigma/\sqrt{n}}$
5. Computations:
$$z_c = \frac{\bar{X} - \mu_o}{\sigma/\sqrt{n}} = \frac{85 - 80}{16/\sqrt{100}} = 3.125$$
6. Reject H0 since 3.125 is greater than 1.645
Using the P-value approach, the P-value corresponding to z = 3.125 is 0.0009 using the
table for Areas Under the Normal Curve. This is much smaller than α = 0.05, so the
evidence against H0 is stronger than the 0.05 level of significance requires.
Example 2. A manufacturer of solar lamps claims that the mean useful life of their new
product is 8 months with a standard deviation of 0.5 month. To test this claim, a random
sample of 50 solar lamps was tested and found to have a mean life of 7.8 months. Test
the hypothesis that µ = 8 months against the alternative hypothesis that µ ≠ 8 months
at the 0.01 level of significance.
Solution:
1. H0 : µ = 8 months
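H1 : µ ≠ 8 months
3. $z_c = \frac{\bar{X} - \mu_o}{\sigma/\sqrt{n}}$
4. Critical region: z < -2.575 and z > 2.575. Reject H0 if zc < -2.575 or zc > 2.575
5. Computations:
$$z_c = \frac{\bar{X} - \mu_o}{\sigma/\sqrt{n}} = \frac{7.8 - 8}{0.5/\sqrt{50}} = -2.83$$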
7. Therefore, the mean useful life of the new product is not equal to 8 months. In fact
Using the P-value approach and considering that this is a two-tailed test, the P-value is
twice the area to the left of z = -2.83. Using the table for Areas Under the Normal Curve,
P = 2P(Z < -2.83) = 2(0.0023) = 0.0046.
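The two z-test examples can be reproduced with a short script. This is only a sketch: the function `one_sample_z_test` and its argument names are hypothetical, n = 100 for the midterm example is taken from the computation √100, and α = 0.01 for the solar lamp example is an assumption inferred from the critical value 2.575. The decision is made by comparing the P-value with α, which is equivalent to comparing the statistic with the critical value.

```python
import math
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n, alternative, alpha):
    """One-sample z-test with known sigma; returns (z, P-value, reject H0?)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":        # H1: mu > mu0 (right-tailed)
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":         # H1: mu < mu0 (left-tailed)
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":    # H1: mu != mu0 (two-tailed)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value, p_value < alpha

# Midterm grades: H0: mu = 80 against H1: mu > 80 at alpha = 0.05
grades = one_sample_z_test(85, 80, 16, 100, "greater", 0.05)
# Solar lamps: H0: mu = 8 against H1: mu != 8, alpha = 0.01 assumed
lamps = one_sample_z_test(7.8, 8, 0.5, 50, "two-sided", 0.01)

for name, (z, p, reject) in (("grades", grades), ("lamps", lamps)):
    print(f"{name}: z = {z:.3f}, P-value = {p:.4f}, reject H0: {reject}")
```

Both examples lead to rejection of H0, in agreement with the critical-value comparisons in the solutions.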
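In order to draw inference on a mean in the one-population case, assuming that the entries
are normally distributed but the variance is unknown and the sample size is less than 30,
the t-test is used. The test statistic used is the t-statistic, tc, which is computed as follows:
$$t_c = \frac{\bar{X} - \mu_o}{s/\sqrt{n}}$$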
where X̄ is the computed mean of the gathered data, µo is the hypothesized mean, s is
the sample standard deviation and n is the sample size. The critical value is obtained
using the t-tabular value. For a two-sided test, the critical value is obtained at α/2 with
degrees of freedom (d.f.) equal to n − 1, written as tα/2(n−1). Otherwise, for a one-sided
test, the critical value is obtained at α with the same degrees of freedom, written as
tα(n−1).
Figure 2. T-Distribution for Testing the Hypothesis Ho: µ = µo with critical values for
(a) H1: µ ≠ µo, (b) H1: µ > µo, (c) H1: µ < µo
incoming freshmen. Those who got scores equal or higher than the set passing are
accepted in the College. The average score of the incoming freshmen was 80% before
exam was suspended for two years and it is thought that the quality of the first year
students had diminished. However, with the vision, mission, goals and objectives of the
University and the College towards quality education, the Dean wants to determine if the
diminished so a small random sample of 15 freshmen students and administers the same
entrance exam. The average score is found to be 83% with a standard deviation of 5%.
Solution:
1. H0 : µ = 80%
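H1 : µ ≠ 80%
3. $t_c = \frac{\bar{X} - \mu_o}{s/\sqrt{n}}$
4. Critical region: t < -2.977 and t > 2.977. Reject H0 if tc is less than -2.977 or greater
than 2.977. This is obtained from the table for Critical Values of the t-distribution using
α/2 = 0.005 and n − 1 = 14 degrees of freedom.
5. Computations:
$$t_c = \frac{\bar{X} - \mu_o}{s/\sqrt{n}} = \frac{83 - 80}{5/\sqrt{15}} = 2.32$$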
6. Do not reject H0 since 2.32 is less than 2.977 but greater than -2.977
significance.
The two-tailed P-value corresponding to tc = 2.32 with 14 degrees of freedom is about
0.036, or 3.6%. Since this is greater than α = 0.01, H0 is not rejected.
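The t statistic and its tail areas for the entrance exam example can be checked numerically. The sketch below stays within the Python standard library, so it approximates the t-distribution tail area by integrating the density with Simpson's rule instead of calling a statistics package. The helper names `t_pdf` and `t_upper_tail` are hypothetical, and the two-tailed alternative with α = 0.01 is taken from the critical region in the solution.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric, so P(T > t) = 0.5 - P(0 < T < t).
    return 0.5 - total * h / 3

xbar, mu0, s, n = 83, 80, 5, 15
t_c = (xbar - mu0) / (s / math.sqrt(n))
p_two_tailed = 2 * t_upper_tail(abs(t_c), n - 1)
tail_at_critical = t_upper_tail(2.977, n - 1)   # should be close to alpha/2 = 0.005

print(f"t_c = {t_c:.2f}")
print(f"two-tailed P-value = {p_two_tailed:.3f}")
print(f"area beyond 2.977 = {tail_at_critical:.4f}")
```

Because |tc| is below 2.977, or equivalently because the two-tailed P-value is above 0.01, H0 is not rejected.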
The chi-square distribution will be used to test a claim about a single variance or
standard deviation. The formula for the Chi-square test for a single variance is given by:
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
where n is the sample size, s² is the sample variance and σ² is the population variance,
with the degrees of freedom equal to n − 1. There are three assumptions for the chi-square
test: the sample must be randomly selected from the population, the population
must be normally distributed for the variable under study, and the observations must be
independent of one another.
Figure 3. Chi-Squared Distribution for Testing the Hypothesis Ho: σ² = σo² with critical values for
(a) H1: σ² ≠ σo², (b) H1: σ² > σo², (c) H1: σ² < σo²
Example 1. A company claims that the variance of the sugar content of its ice cream is
measured. The variance of the sample is found to be 36. At 10% level of significance, is
Solution:
1. H0 : σ² = 25 mg/oz
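H1 : σ² ≠ 25 mg/oz
3. $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
4. Critical region: χ² < 10.117 and χ² > 30.144. Reject H0 if χ² is less than 10.117
or greater than 30.144. This is obtained from the table for Critical Values of the
chi-squared distribution using n − 1 = 19 degrees of freedom.
5. Computations:
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(19)(36)}{25} = 27.36$$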
7. Therefore, there is not enough evidence to reject the company's claim that the variance
of the sugar content is equal to 25 mg/oz.
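The chi-square computation and the tabulated critical values can be verified with the sketch below. It assumes n = 20, which is inferred from the factor (19) in the computation, and it evaluates the chi-square distribution function through the standard series for the regularized lower incomplete gamma function. The helper name `chi2_cdf` is hypothetical.

```python
import math

def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x), via the incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = 1 / a            # first term of sum_{k>=0} z^k / (a (a+1) ... (a+k))
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    # Multiply the series by z^a e^{-z} / Gamma(a), computed in log space for stability.
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

n, s2, sigma2_0 = 20, 36, 25          # n = 20 is an assumption (n - 1 = 19)
chi2_stat = (n - 1) * s2 / sigma2_0
lower_area = chi2_cdf(10.117, n - 1)       # area below the lower critical value
upper_area = 1 - chi2_cdf(30.144, n - 1)   # area above the upper critical value
reject = chi2_stat < 10.117 or chi2_stat > 30.144

print(f"chi-square statistic = {chi2_stat:.2f}")
print(f"tail areas at the critical values: {lower_area:.3f}, {upper_area:.3f}")
print(f"reject H0: {reject}")
```

Each critical value cuts off an area of about 0.05, which corresponds to a two-tailed test at the 10% level, and the statistic falls between the two critical values.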
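Tests on a proportion concern the hypothesis that the proportion of successes in a
binomial experiment equals some specified value. That is, the null hypothesis Ho that p =
po, where p is the parameter of the binomial distribution, is tested. The alternative
hypothesis may be one of the usual one-sided or two-sided alternatives:
p < po, p > po or p ≠ po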
1. H0: p = po
value
Example 1. A home developer claims that solar panels are installed in 65% of all homes
being constructed today in a certain subdivision. Would you agree with this claim if a
random survey of new homes in this subdivision shows that 8 out of 15 had solar panels
Solution:
1. H0 : p = 0.65
H1 : p ≠ 0.65
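4. Computations: x = 8 and npo = (15)(0.65) = 9.75. Using the binomial table, since
x < npo the P-value is
$$P = 2P(X \le 8 \text{ when } p = 0.65) = 2\sum_{x=0}^{8} b(x; 15, 0.65) = 0.5213$$
Direct evaluation of the binomial sum gives P ≈ 0.4903; the decision is the same with
either value.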
5. Do not reject H0 and conclude that there is not enough evidence to doubt the claim
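The binomial P-value in this example can also be evaluated directly instead of being read from a table. The sketch below doubles the smaller tail of the binomial distribution, which is the two-tailed rule used in the solution; the function names `binomial_cdf` and `exact_two_tailed_p` are hypothetical.

```python
from math import comb

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def exact_two_tailed_p(x, n, p0):
    """Two-tailed P-value: twice the tail on the side of n*p0 where x falls."""
    if x < n * p0:
        tail = binomial_cdf(x, n, p0)           # P(X <= x when p = p0)
    else:
        tail = 1 - binomial_cdf(x - 1, n, p0)   # P(X >= x when p = p0)
    return min(1.0, 2 * tail)

# 8 homes with solar panels out of 15, tested against p0 = 0.65
p_value = exact_two_tailed_p(8, 15, 0.65)
print(f"exact two-tailed P-value = {p_value:.4f}")
```

The exact sum evaluates to roughly 0.49. Values read or interpolated from printed tables can differ somewhat, and either way the P-value is far above any usual significance level, so H0 is not rejected.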
For large n, approximation procedures are required. When the hypothesized value po is
very close to 0 or 1, the Poisson distribution with parameter µ = npo may be used. However,
the normal-curve approximation, with parameters µ = npo and σ² = npoqo, where
qo = 1 − po, is usually preferred for large n and is very accurate as long as po is not
extremely close to 0 or 1. Using the normal approximation, the z-value for testing p = po
is given by
$$z = \frac{x - np_o}{\sqrt{np_o q_o}}$$
which is a value of the standard normal variable Z. Hence, for a two-tailed test at the
α-level of significance, the critical region is z < -zα/2 and z > zα/2. For the one-sided
alternative p < po, the critical region is z < -zα and for the alternative p > po, the critical
region is z > zα.
The company is said to demonstrate capability to the customers if the process produces
defective items not exceeding 5%. To determine this, a random sample of 200
microcontrollers was tested and four items were found to be defective. Will you
agree that the company demonstrates process capability at the 0.05 level of significance?
Use the P-value approach.
Solution:
1. H0 : p = 0.05
$$z = \frac{x - np_o}{\sqrt{np_o q_o}} = \frac{4 - 200(0.05)}{\sqrt{200(0.05)(0.95)}} = -1.95$$
4. From the table for Areas Under the Normal Curve, the P-value is P(Z < -1.95) =
0.0256.
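The normal approximation for the microcontroller example can be reproduced as follows. This is a sketch that assumes the left-tailed alternative H1: p < 0.05, which is suggested by the P-value P(Z < -1.95) in the solution; the function name `proportion_z_test_left` is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_z_test_left(x, n, p0):
    """Large-sample z-test of H0: p = p0 against H1: p < p0 (assumed alternative)."""
    q0 = 1 - p0
    z = (x - n * p0) / math.sqrt(n * p0 * q0)
    p_value = NormalDist().cdf(z)    # area to the left of the observed z
    return z, p_value

z, p_value = proportion_z_test_left(4, 200, 0.05)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

The P-value is below 0.05, so under this assumed alternative H0 is rejected in favor of a defective rate lower than 5%. The computed P-value differs slightly from the table value because the table is entered at the rounded z = -1.95.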
CHAPTER TEST
1. A company producing lubricating oil claims that the average content of the
containers is 20 liters. Test this claim if the contents of a random sample of ten containers
are 20.4, 19.4, 20.2, 20.6, 20.2, 19.6, 19.8, 20.8, 20.6 and 19.6 liters. Assume normal
2. It is claimed that a personal vehicle is driven an average of 25,000 kilometers per year.
Would you agree with this claim if a random sample of 100 vehicle owners were asked to
keep the records of their travel and showed that an average of 28,500 kilometers
3. A marketing expert for mobile operating systems believes that 40% of the users
prefer Android. If 9 out of 20 choose Android over iOS, what can you conclude about
normally distributed with a variance of 0.06 liter. Test this hypothesis against the
alternative that the variance is not equal to 0.06 liter. Use 0.01 level of significance.