
HYPOTHESES TESTING

Compiled by: Orley G. Fadriquel, RMEE

Learning Objectives:
1. Understand the fundamentals of hypothesis testing
2. Learn how hypothesis testing works
3. Be able to differentiate between z-test, t-test, and other statistics concepts

Statistical inference of samples involves using statistical methods to conclude, make predictions,
or generalize information about a population based on data collected from one or more samples. In
essence, it is the process of making educated and quantified guesses about a larger group (the
population) by analyzing a subset of that group (the sample).
Hypothesis testing for a single sample is a statistical method used to make inferences about a
population parameter based on data from a single sample. The goal is to assess whether the sample
data provides enough evidence to support or reject a specific hypothesis about the population
parameter.
Statistical inference for two samples involves using statistical methods to compare two
independent samples from different populations to make inferences or draw conclusions about the
population characteristics or differences between them. This is a common statistical analysis used in
various fields to answer questions such as whether there is a significant difference between two groups
or populations.
Parametric Hypotheses Tests
There are several types of parametric hypothesis tests. These tests are used to make inferences
about population parameters while assuming certain distributional data properties. These tests are
used both for single-sample and two-sample tests.
1. T-Test
2. Z-Test
3. Variance Test (Chi-Squared Test)
4. F-Test
Each parametric test has specific assumptions and conditions that must be met for valid
inference. The appropriate test choice depends on the data's characteristics and the parameter
being tested.

Non - Parametric Tests


A non-parametric test, also known as a distribution-free test, is a statistical hypothesis test that
does not make strong assumptions about the population distribution from which the data is drawn. In
contrast to parametric tests, which assume a specific probability distribution (usually normal
distribution), non-parametric tests are more flexible. They can be used with data that may not meet
the assumptions of parametric tests. Several types of non-parametric hypothesis tests are available.
These tests are valuable when data does not meet the assumptions of a normal distribution or when
dealing with ordinal or categorical data. Some common types of non-parametric hypothesis tests for
a single sample include:

1. Sign Test
2. Wilcoxon Signed-Rank Test
3. Runs Test
4. Kolmogorov-Smirnov Test
5. Kruskal-Wallis Test (H-test)
6. Friedman Test
7. Spearman Rank Difference Correlation Coefficient
8. Phi Correlation Coefficient
9. Point Biserial Correlation Coefficient

Fundamentals of Hypothesis Testing

Example:
In an Engineering Data Analysis class, the mean score was 40 marks out of 100. The
Professor decided that extra classes were necessary in order to improve the performance of the class.
The class scored an average of 45 marks out of 100 after taking the extra classes. Can we be sure whether
the increase in marks is a result of the extra classes, or is it just random?

Hypothesis testing lets us answer that question. It allows a sample statistic to be checked against a
population statistic, or against the statistic of another sample, to study the effect of an intervention.
In the example above, the extra classes are the intervention.

Hypothesis testing is defined in two terms – Null Hypothesis and Alternate Hypothesis.
• The Null Hypothesis (Ho) states that the sample statistic is equal to the population statistic. For
example, the null hypothesis for the example above would be that the average marks after the
extra classes are the same as the average marks before them.
• The Alternate Hypothesis (Ha) for this example would be that the marks after the extra classes
differ significantly from those before.

Hypothesis testing is done at different confidence levels and uses a z-score to calculate the
probability. So, for a 95% confidence level, any test statistic above the z-threshold for 95% leads to
rejection of the null hypothesis.

NOTE:
We CANNOT ACCEPT the Null hypothesis, only REJECT it or FAIL TO REJECT it. Why?
The concept that null hypotheses cannot be "accepted" but can only be "rejected" or "fail to
be rejected" is a fundamental principle in hypothesis testing in statistics. This concept is rooted in the
philosophy of science and the way statistical inference works. Here's why it's the case:
• Burden of Proof: In hypothesis testing, the null hypothesis (often denoted as H0) states that there
is no effect or difference. It represents a default or null position, and it's up to the researcher to
provide evidence against this null hypothesis. The burden of proof is on the researcher to show
that there is a statistically significant effect or difference.
• Uncertainty: In statistical analysis, we deal with uncertainty. We use sample data to make
inferences about the entire population. Since we're dealing with sample data, some degree of
uncertainty is always involved. Even if we collect data and find that it doesn't strongly contradict
the null hypothesis, we can't definitively say that the null hypothesis is true. We can only say we
haven't found enough evidence to reject it.
• Type I and Type II Errors: When conducting hypothesis tests, two types of errors can occur: a Type
I error (rejecting a true null hypothesis) and a Type II error (failing to reject a false null hypothesis).
Researchers control the risk of a Type I error by setting a significance level (alpha), but they
cannot control the Type II error rate directly. Saying we "fail to reject" rather than "accept" the
null hypothesis acknowledges that, while the Type I error rate is fixed at alpha, the probability of
a Type II error is generally unknown.
• Continuous Testing: Scientific research is an ongoing process. New evidence, data, and
research can always emerge. Therefore, we typically don't definitively "accept" or "prove"
hypotheses; we gather evidence to support or refute them. Even if we fail to reject the null
hypothesis in one study, future studies might provide more substantial evidence to reject it.
The language of hypothesis testing is rooted in the philosophy of scientific inquiry and the
recognition of uncertainty in data analysis. Rather than definitively "accepting" the null hypothesis, we
say that we either "reject" it based on the available evidence or "fail to reject" it because we haven't
found enough evidence to do so. This approach helps maintain a cautious and rigorous standard for
scientific conclusions.
ERRORS IN HYPOTHESES TESTS
Let’s take an example to understand the concept of Hypothesis Testing. A person is on trial for
a criminal offense, and the judge needs to provide a verdict on his case. Now, there are four possible
combinations in such a case:
• First Case: The person is innocent, and the judge identifies the person as innocent
• Second Case: The person is innocent, and the judge identifies the person as guilty
• Third Case: The person is guilty, and the judge identifies the person as innocent
• Fourth Case: The person is guilty, and the judge identifies the person as guilty

As you can see, there can be two types of error in the judgment – Type 1 error, when the verdict
is against the person while he was innocent, and Type 2 error, when the verdict is in favor of the person
while he was guilty.
According to the Presumption of Innocence, the person is considered innocent until proven
guilty. That means the judge must find the evidence which convinces him “beyond a reasonable
doubt.” This “Beyond a reasonable doubt” phenomenon can be understood as Probability (Judge
Decided Guilty | Person is Innocent) should be small.

Another Example:
• A male human tested positive for being pregnant. Is it even possible? This indeed looks like a
case of a False Positive. More formally, it is the incorrect rejection of a true Null Hypothesis. The Null
Hypothesis, in this case, would be that a male human is not pregnant. This is a Type I Error.
• A male human is pregnant, and the test supports the Null Hypothesis. This looks like a case of a False
Negative. More formally, it is defined as failing to reject a false Null Hypothesis. This is a Type II Error.

Now, we have defined a basic Hypothesis Testing framework. It is important to look into some of
the mistakes that are committed while performing Hypothesis Testing and try to classify those mistakes
if possible.
Now, look at the Null Hypothesis definition above. At first glance, we notice that it is a statement
made by the tester, like you and me, and not a fact. That means the Null Hypothesis can be true or
false, and we may end up committing mistakes along the same lines.

Figure 1. Types of Error in Hypotheses Test.


Normality Test
To conduct a hypothesis test, an individual should initially verify the prerequisites associated with
the specific test. A frequent prerequisite involves the assumption that the data under examination
follows a particular distribution, often the normal distribution. When the data conforms to a normal
distribution, parametric tests are typically applicable, while non-parametric tests are commonly
employed when the data deviates from normality.
One of the prevalent assumptions when performing statistical tests pertains to the normality of
the data. For instance, if an individual intends to perform a t-test or an ANOVA, they must first assess
whether the data or variables exhibit a normal distribution.
If the data are not normally distributed, parametric tests cannot be used; instead, use the
non-parametric tests. Non-parametric tests do not assume that the data are normally distributed.
The assumption of normal distribution is also important for linear regression analysis; in this
case, however, it is the errors made by the model that must be normally distributed, not the data itself.

How is the normal distribution tested?


Normal distribution can be tested either analytically (statistical tests) or graphically. The most
common analytical tests to check data for normal distribution are the following:
• Kolmogorov-Smirnov Test
• Shapiro-Wilk Test
• Anderson-Darling Test
For graphical verification, either a histogram or, better, the Q-Q plot is used. Q-Q stands for the
quantile-quantile plot, where the actually observed distribution is compared with the theoretically
expected distribution.
In all of these tests, individuals examine the null hypothesis concerning the normality of their
data. The null hypothesis suggests that the data's frequency distribution is normal. To determine
whether to reject or retain the null hypothesis, these tests yield a p-value. The critical consideration is
whether this p-value is less than or greater than 0.05.
If the p-value is less than 0.05, this is interpreted as a significant deviation from the normal
distribution, and it can be assumed that the data are not normally distributed. If the p-value is greater
than 0.05 and you want to be statistically rigorous, you cannot necessarily say that the frequency
distribution is normal; you can only say that you cannot reject the null hypothesis.
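For instance, a Shapiro-Wilk test takes only a few lines in Python with SciPy (a minimal sketch, not part of the original text; the sample data here are made up):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=5, size=40)   # hypothetical sample

stat, p_value = stats.shapiro(data)           # Shapiro-Wilk normality test
if p_value < 0.05:
    print("Significant deviation: assume the data are not normally distributed")
else:
    print("Cannot reject the null hypothesis of normality")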

Disadvantage of The Analytical Tests for Normal Distribution


Unfortunately, the analytical method has a significant drawback, so more attention is paid to
graphical methods.
The problem is that the calculated p-value is affected by the sample size. Therefore, if you have a very
small sample, your p-value may be much larger than 0.05, but if you have a very large sample from
the same population, your p-value may be smaller than 0.05.
Suppose the distribution in the population deviates only slightly from the normal
distribution. In that case, we will get a very large p-value with a very small sample and therefore
assume that the data are normally distributed. However, if you take a larger sample, the p-value gets
smaller and smaller, even though the samples are from the same population with the same distribution.
With a very large sample, you can even get a p-value of less than 0.05, rejecting the null hypothesis of
normal distribution.
To avoid this problem, graphical methods are increasingly being used.

Graphical Test for Normal Distribution


If the normal distribution is tested graphically, one looks either at the histogram or, even better,
the QQ plot. If you want to check the normal distribution using a histogram, plot the normal distribution
on the histogram of your data and check that the distribution curve of the data approximately
matches the normal distribution curve.
A better way to do this is to use a quantile-quantile plot, or Q-Q plot for short. This compares the
theoretical quantiles that the data should have if they were perfectly normal with the quantiles of the
measured values.
If the data were perfectly normally distributed, all points would lie on the line. The further the
data deviates from the line, the less normally distributed the data is.
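A Q-Q plot against the theoretical normal quantiles can be drawn as follows (a sketch assuming matplotlib and SciPy are available; the data are made up):

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(loc=0, scale=1, size=100)   # hypothetical sample

# Points lying close to the reference line indicate approximate normality
stats.probplot(data, dist="norm", plot=plt)
plt.show()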

Parametric and Non-Parametric Tests


Levene Test
Many statistical testing procedures require that there is equal variance in the samples. How
can it now be checked whether the variances are homogeneous, i.e., whether there is equality of
variance? This is where the Levene test helps. The Levene test checks whether several groups have
the same variance in the population.

Levene's test is therefore used to test the null hypothesis that the samples to be compared come
from a population with the same variance. In this case, possible variance differences occur only by
chance, since each sampling has small differences.

If the p-value for the Levene test is greater than .05, then the variances are not significantly
different (i.e., the homogeneity assumption of the variance is met). If the p-value for Levene's test is less
than .05, then there is a significant difference between the variances.
H0: Groups have equal variances
H1: Groups have different variances
It is important to note that the mean values of the individual groups have no influence on the
result; they may differ. A big advantage of Levene's test is that it is very stable against violations of the
normal distribution. Therefore, Levene's test is used in many statistics programs.
Furthermore, the variance equality can also be checked graphically; this is usually done with a
grouped box plot or with a Scatterplot.
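As an illustration (a sketch with made-up group data, not from the original text), Levene's test is available in SciPy:

from scipy import stats

group_a = [4.1, 5.2, 6.3, 5.8, 4.9, 5.5]   # hypothetical samples
group_b = [3.9, 5.1, 7.4, 6.6, 4.2, 5.9]

# Levene's test of H0: the groups have equal variances
stat, p_value = stats.levene(group_a, group_b, center='median')
print(stat, p_value)   # p > 0.05 -> homogeneity of variance can be assumed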

What is the p-value?


The p-value indicates the probability that the observed result or an even more extreme result
will occur if the null hypothesis is true. The p-value is used to decide whether the null hypothesis is
rejected or retained (not rejected). If the p-value is smaller than the defined significance level (often
5%), the null hypothesis is rejected; otherwise not.

When is the p-value used?


The "p" in the p-value stands for "probability." A p-value is a statistical measure representing the
probability of obtaining the observed results (or more extreme results) in a statistical test, assuming that
the null hypothesis is true. In hypothesis testing, researchers use p-values to assess the strength of
evidence against the null hypothesis.
The p-value is used to either reject or retain (not reject) the null hypothesis in a hypothesis test. If
the calculated p-value is smaller than the significance level, which in most cases is 5%, then the null
hypothesis is rejected, otherwise, it is retained.

Significance Level
The significance level is determined before the test. If the calculated p-value is below this value,
the null hypothesis is rejected, otherwise, it is retained. As a rule, a significance level of 5 % is chosen.
• p < 0.01: highly significant result.
• p < 0.05: significant result.
• p > 0.05: not significant result.
The significance level thus indicates the probability of a Type I error. What does this mean?
Suppose the significance level is 5% and the null hypothesis is rejected. Then there is a 5% probability
of rejecting a null hypothesis that is actually true, i.e., a 5% risk of making a Type I error. If the critical
value is reduced to 1%, the probability of this error is accordingly only 1%, but it also becomes more
challenging to confirm the alternative hypothesis.
One-Tailed and Two-Tailed p Values
What Is A One-Tailed Test?
Let’s discuss the meaning of a one-tailed test. If you use a significance level of .05, a one-tailed
test allows all of your alpha to test the statistical significance in the one direction of interest. This means
that .05 is in one tail of the distribution of your test statistic. When using a one-tailed test, you are testing
for the possibility of the relationship in one direction and completely disregarding the possibility of a
relationship in the other direction. The one-tailed test provides more power to detect an effect in one
direction by not testing the effect in the other direction.

In a one-tailed test at the 0.05 significance level, the critical value 1.645 comes from the standard
normal (z) distribution: 5% of the distribution lies above it. If your test statistic (calculated from your
sample data) is greater than 1.645, you reject the null hypothesis at the 0.05 significance level. You
would fail to reject the null hypothesis if your test statistic is less than or equal to 1.645.
What Is A Two-Tailed Test?
Suppose you are using a significance level of 0.05. In that case, a two-tailed test allots half of
your alpha to test the statistical significance in one direction and half of your alpha to testing statistical
significance in the other direction. This means that .025 is in each tail of the distribution of your test
statistic. When using a two-tailed test, regardless of the direction of the relationship you hypothesize,
you are testing for the possibility of the relationship in both directions.
For a two-tailed test at a 95% confidence level, you typically use a significance level of 0.05
(5%). This significance level is split evenly between the two tails of the distribution, with 2.5% in each tail.
The critical value of 1.96 is chosen because it corresponds to the 2.5% cutoff in the tails of a standard
normal distribution (z-distribution). This means that if your test statistic (calculated from your sample
data) falls below -1.96 or above 1.96, you will reject the null hypothesis at the 0.05 significance level,
indicating a significant difference between your sample and the population parameter in either
direction.
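The critical values quoted above (1.645 one-tailed, ±1.96 two-tailed at a 0.05 level) can be recovered from the standard normal distribution; a sketch with SciPy:

from scipy import stats

alpha = 0.05
z_one_tailed = stats.norm.ppf(1 - alpha)       # ~1.645, all of alpha in one tail
z_two_tailed = stats.norm.ppf(1 - alpha / 2)   # ~1.960, alpha split between two tails
print(z_one_tailed, z_two_tailed)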
Single Sample and Two Samples
In the context of hypothesis testing, the terms "single sample" and "two samples" denote the
number of groups or sets of data being compared. These distinctions hold significance as they
determine the type of statistical tests that are appropriate for the analysis.
In a single sample, one is engaged in comparing the characteristics of a particular group to a
known value or a theoretical expectation. The data collected emanates from a singular group or
population. For instance, a single sample test would be employed when investigating whether the
average height of a group of individuals differs significantly from a known average height, such as the
average height of the general population.
On the other hand, when there are two distinct groups or populations and one wishes to compare
the characteristics of these two groups, a two-sample test is employed. The data derived from each
group is independent of one another, signifying that the observations in one group bear no relationship
to the observations in the other group.
Dependent and Independent Samples
In a DEPENDENT SAMPLE, the measures are related. It involves comparing two sets of
measurements derived from the same group or related individuals. These measurements are frequently
taken at distinct times or under varying conditions. This type of test is commonly referred to as a paired
sample or matched pairs test.
For example, if an individual seeks to ascertain whether there exists a noteworthy difference in the
blood pressure of individuals before and after a treatment, a dependent sample test would be
employed.
Example: If you take a sample of people who have had a knee operation and interview them before
and after the operation, this is a dependent sample. This is because the same person was interviewed
at two different times.
In INDEPENDENT SAMPLES, the values come from two or more different groups. Independent
samples refer to sets of data where the observations in one sample are not related or paired with the
observations in the other sample. An independent samples test is used when comparing the means or
other characteristics of two separate and unrelated groups. Another term often used to refer to
independent samples is unpaired samples. In the context of statistical analysis, independent samples
and unpaired samples are often used interchangeably to describe situations where observations in
one sample are not related or paired with observations in another sample.
More Than Two Dependent or Independent Samples
In the case of independent and dependent sampling, there can be more than two samples.
The important thing is that in the case of independent sampling, the individual groups or samples have
nothing to do with each other. In the case of dependent sampling, a respondent appears in all groups.
Why Is It Important to Know the Difference?
Whether the data at hand are from a dependent or an independent sample determines which
hypothesis test is used. For example, an independent samples t-test or an ANOVA without repeated
measures is calculated if the data are independent. If the data are dependent, a t-test for dependent
samples or an ANOVA with repeated measures is calculated.
Hypothesis Testing for Dependent and Independent Samples

T-Test
A t-test is a statistical hypothesis test that assesses sample means to draw conclusions about
population means. Frequently, analysts use a t-test to determine whether the population means for
two groups are different. The t-test is a statistical test procedure that tests whether there is a significant
difference between the means of two groups.

Figure 2. Sample of Groups. The two groups could be, for example, patients who received drug A and patients
who received drug B, and you want to know if there is a difference in blood pressure between these two groups.
There are three types of t-tests. They all evaluate sample means using t-values, t-distributions, and
degrees of freedom to calculate statistical significance. It is a parametric analysis that compares one
or two group means.

Standard t-test
There are three standard t-tests: the one-sample t-test, the independent-samples t-test, and the
paired-samples t-test.

One Sample t-Test


We use the one-sample t-test to compare a sample's mean with a known reference mean.

Example of a one-sample t-test


A manufacturer of chocolate bars claims that its chocolate bars weigh 50 grams on average.
To verify this, a sample of 30 bars is taken and weighed. The mean value of this sample is 48 grams.
To determine if the mean of 48 grams significantly differs from the claimed 50 grams, a one-
sample t-test can be performed.

t-Test for Independent Samples


The t-test for independent samples compares the means of two independent groups or samples.
This is to determine if there is a significant difference between these means.

Example of a t-test for independent samples


We would like to compare the effectiveness of two painkillers, drug A and drug B.

To do this, we randomly divided 60 test subjects into two groups. The first group receives drug A;
the second group receives drug B. With an independent t-test, we can now test whether there is a
significant difference in pain relief between the two drugs.
Paired Samples t-Test
t-test for dependent samples is used to compare the means of two dependent groups.

Example of the t-test for paired samples


We want to know how effective a diet is. To do this, we weighed 30 people before the diet and
exactly the same people after the diet.

Now we can see for each person how big the weight difference is between before and after.
With a dependent t-test, we can now check whether there is a significant difference.

Which T Test Should I Use?


To choose the correct t-test, you must know whether you are assessing one or two group means.
If you're working with two group means, do the groups contain the same or different items/people? Use
the table below to choose the proper analysis.

Number of Group Means   Group Type                      T Test
One                                                     One-sample t-test
Two                     Different items in each group   Two-sample t-test
Two                     The same items in both groups   Paired t-test
Procedure in Carrying out the T-test
To test a null hypothesis about the population mean, the procedure is as follows:
1) State the null hypothesis and alternate hypothesis.
2) Choose an alpha level, α, and establish the degrees of freedom (df = n − 1).
3) Find the critical value of t in a t-table.
4) Calculate the t-test statistic:
t = (x̄ − μ) / (s / √n)
Where:
x̄ – mean of the sample
μ – mean of the population
s – standard deviation of the sample
n – number of samples
For the standard error of the mean:
SE(x̄) = s / √n
5) Use t-distribution tables to compare your value for t to the t(n−1) distribution. This will give the
p-value for the t-test. Interpret the result.

EXERCISES
a) Single Sample t- Test

Brinell Hardness Scores

An engineer measured the Brinell hardness of 25 pieces of ductile iron that were sub-critically
annealed. The resulting data were:

Brinell Hardness of 25 Pieces of Ductile Iron (kgf/mm²)

170 167 174 179 179

156 163 156 187 156

183 179 174 179 170

187 179 183 179 159

167 156 174 170 187

The engineer hypothesized that the mean Brinell hardness of all such ductile iron pieces is
greater than 170. Therefore, he was interested in testing the hypotheses:

H0 : μ = 170
HA: μ > 170
The engineer entered his data into Minitab and requested that the "one-sample t-test" be
conducted for the above hypotheses. He obtained the following output:

Descriptive Statistics

N     Mean     StDev   SE Mean   95% Lower Bound
25    172.52   10.31   2.06      168.99

μ: mean of Brinell hardness

Test

Null hypothesis H₀: μ = 170
Alternative hypothesis HA: μ > 170

T-Value   P-Value
1.22      0.117

The output tells us that the average Brinell hardness of the n = 25 pieces of ductile iron was 172.52
with a standard deviation of 10.31. (The standard error of the mean "SE Mean", calculated by dividing
the standard deviation 10.31 by the square root of n = 25, is 2.06). The test statistic t* is 1.22, and the P-
value is 0.117.

If the engineer set his significance level α at 0.05 and used the critical value approach to
conduct his hypothesis test, he would reject the null hypothesis if his test statistic t* were greater than
1.7109 (determined using statistical software or a t-table):

Since the engineer's test statistic, t* = 1.22, is not greater than 1.7109, the engineer fails to
reject the null hypothesis. The test statistic does not fall in the "critical region." At the α = 0.05 level, there
is insufficient evidence to conclude that the mean Brinell hardness of all such ductile iron pieces is
greater than 170.
If the engineer used the P-value approach to conduct his hypothesis test, he would determine the
area under a tn - 1 = t24 curve and to the right of the test statistic t* = 1.22:

In the output above, Minitab reports that the P-value is 0.117. Since the P-value, 0.117, is greater
than α = 0.05, the engineer fails to reject the null hypothesis. There is insufficient evidence, at the α =
0.05 level, to conclude that the mean Brinell hardness of all such ductile iron pieces is greater than 170.

Note that the engineer obtains the same scientific conclusion regardless of the approach used. This
will always be the case.
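The same analysis can also be reproduced outside Minitab; a sketch with SciPy (the alternative argument requires SciPy 1.6 or later):

from scipy import stats

hardness = [170, 167, 174, 179, 179,
            156, 163, 156, 187, 156,
            183, 179, 174, 179, 170,
            187, 179, 183, 179, 159,
            167, 156, 174, 170, 187]

# One-sample t-test of H0: mu = 170 against HA: mu > 170
t_stat, p_value = stats.ttest_1samp(hardness, popmean=170, alternative='greater')
print(t_stat, p_value)   # t ~ 1.22, p ~ 0.117, matching the Minitab output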

Height of Sunflowers

A biologist was interested in determining whether sunflower seedlings treated with an extract
from Vinca minor roots resulted in a lower average height of sunflower seedlings than the standard
height of 15.7 cm. The biologist treated a random sample of n = 33 seedlings with the extract and
subsequently obtained the following heights:

Heights of 33 Sunflower Seedlings

11.5 11.8 15.7 16.1 14.1 10.5 9.3 15.0 11.1

15.2 19.0 12.8 12.4 19.2 13.5 12.2 13.3

16.5 13.5 14.4 16.7 10.9 13.0 10.3 15.8

15.1 17.1 13.3 12.4 8.5 14.3 12.9 13.5

The biologist's hypotheses are:

H0 : μ = 15.7
HA: μ < 15.7
The biologist entered her data into Minitab and requested that the "one-sample t-test" be conducted
for the above hypotheses. She obtained the following output:

Descriptive Statistics

N Mean StDev SE Mean 95% Upper Bound

33 13.664 2.544 0.443 14.414

μ: mean of Height

Test

Null hypothesis H₀: μ =15.7


Alternative hypothesis HA: μ < 15.7

T-Value P-Value

-4.60 0.000

The output tells us that the average height of the n = 33 sunflower seedlings was 13.664 with a
standard deviation of 2.544. (The standard error of the mean "SE Mean", calculated by dividing the
standard deviation 2.544 by the square root of n = 33, is 0.443). The test statistic t* is -4.60, and the P-
value, 0.000, is reported to three decimal places.

Minitab Note. Minitab will always report P-values to only 3 decimal places. If Minitab reports the P-value
as 0.000, it really means that the P-value is 0.000....something. Throughout this course (and your future
research!), when you see that Minitab reports the P-value as 0.000, you should report the P-value as
being "< 0.001."

If the biologist set her significance level α at 0.05 and used the critical value approach to
conduct her hypothesis test, she would reject the null hypothesis if her test statistic t* were less than -
1.6939 (determined using statistical software or a t-table):
Since the biologist's test statistic, t* = -4.60, is less than -1.6939, the biologist rejects the null
hypothesis. That is, the test statistic falls in the "critical region." There is sufficient evidence, at the α =
0.05 level, to conclude that the mean height of all such sunflower seedlings is less than 15.7 cm.

If the biologist used the P-value approach to conduct her hypothesis test, she would determine
the area under a tn - 1 = t32 curve and to the left of the test statistic t* = -4.60:

In the output above, Minitab reports that the P-value is 0.000, which we take to mean < 0.001.
Since the P-value is less than 0.001, it is clearly less than α = 0.05, and the biologist rejects the null
hypothesis. There is sufficient evidence, at the α = 0.05 level, to conclude that the mean height of all
such sunflower seedlings is less than 15.7 cm.

Note again that the biologist obtains the same scientific conclusion regardless of the approach used.
This will always be the case.

Gum Thickness

A manufacturer claims that the thickness of the spearmint gum it produces is 7.5 one-hundredths of an
inch. A quality control specialist regularly checks this claim. On one production run, he took a random
sample of n = 10 pieces of gum and measured their thickness. He obtained:

Thicknesses of 10 Pieces of Gum

7.65 7.60 7.65 7.70 7.55

7.55 7.40 7.40 7.50 7.50

The quality control specialist's hypotheses are:

H0 : μ = 7.5
HA: μ ≠ 7.5

The quality control specialist entered his data into Minitab and requested that the "one-sample t-test"
be conducted for the above hypotheses. He obtained the following output:
Descriptive Statistics

N Mean StDev SE Mean 95% CI for μ

10 7.550 0.1027 0.0325 (7.4765, 7.6235)

μ: mean of Thickness

Test

Null hypothesis H₀: μ = 7.5


Alternative hypothesis HA: μ ≠ 7.5

T-Value P-Value

1.54 0.158

The output tells us that the average thickness of the n = 10 pieces of gum was 7.55 one-
hundredths of an inch with a standard deviation of 0.1027. (The standard error of the mean "SE Mean",
calculated by dividing the standard deviation 0.1027 by the square root of n = 10, is 0.0325). The test
statistic t* is 1.54, and the P-value is 0.158.

If the quality control specialist set his significance level α at 0.05 and used the critical value
approach to conduct his hypothesis test, he would reject the null hypothesis if his test statistic t* were
less than -2.2616 or greater than 2.2616 (determined using statistical software or a t-table):

Since the quality control specialist's test statistic, t* = 1.54, is not less than -2.2616 nor greater than
2.2616, the quality control specialist fails to reject the null hypothesis. That is, the test statistic does not
fall in the "critical region." There is insufficient evidence, at the α = 0.05 level, to conclude that the mean
thickness of all of the manufacturer's spearmint gum differs from 7.5 one-hundredths of an inch.

If the quality control specialist used the P-value approach to conduct his hypothesis test, he would
determine the area under a tn - 1 = t9 curve, to the right of 1.54 and to the left of -1.54:

In the output above, Minitab reports that the P-value is 0.158. Since the P-value, 0.158, is greater
than α = 0.05, the quality control specialist fails to reject the null hypothesis. There is insufficient
evidence, at the α = 0.05 level, to conclude that the mean thickness of all pieces of spearmint gum
differs from 7.5 one-hundredths of an inch.

Note that the quality control specialist obtains the same scientific conclusion regardless of the
approach used. This will always be the case.
b) Paired t-test
To test the null hypothesis that the true mean difference is zero, the procedure is as follows:
1) State the null hypothesis and alternate hypothesis.
2) Choose an alpha level, α.
3) Find the critical value of t in a t-table.
4) Calculate the t-test statistic:
• Calculate the difference (di = yi − xi) between the two observations on each pair, making
sure you distinguish between positive and negative differences.
• Calculate the mean difference, d̄.
• Calculate the standard deviation of the differences, sd, and use this to calculate the
standard error of the mean difference,
SE(d̄) = sd / √n
• Calculate the t-statistic, which is given by
t = d̄ / SE(d̄)
Under the null hypothesis, this statistic follows a t-distribution with n − 1 degrees of freedom.
5) Use t-distribution tables to compare your value for t to the t(n−1) distribution. This will give the
p-value for the paired t-test. Interpret the result.
Example:
Suppose a sample of n=20 students were given a diagnostic test before studying a particular
module and then again after completing the module. We want to find out if, in general, our teaching
leads to improvements in students’ knowledge/skills (i.e., test scores). We can use the results from our
sample of students to conclude the impact of this module in general.
Let x = test score before the module, y = test score after the module.
Student Pre-Module Score Post Module Score Difference
1 18 22 4
2 21 25 4
3 16 17 1
4 22 24 2
5 19 16 -3
6 24 29 5
7 17 20 3
8 21 23 2
9 23 19 -4
10 18 20 2
11 14 15 1
12 16 15 -1
13 16 18 2
14 19 26 7
15 18 18 0
16 20 24 4
17 12 18 6
18 22 25 3
19 15 19 4
20 17 16 -1
𝑑̅ 2.05
sd 2.837
1) State the null and alternate hypothesis.
Null Hypothesis: The module has no significant impact on students' knowledge/skills, and
there is no improvement in test scores after completing the module.
Ho: μafter − μbefore ≤ 0
Alternative Hypothesis: The module significantly improves students' knowledge/skills and
increases test scores after completing the module.
Ha: μafter − μbefore > 0
2) Choose the alpha value, α = 0.05.
3) Find the critical value from the t-table with α = 0.05 and df = 20 − 1 = 19:
tcritical = 1.729
4) Calculate the t-test statistic. The mean and standard deviation of the differences are
d̄ = 2.05 and sd = 2.837. Therefore,
SE(d̄) = sd / √n = 2.837 / √20 = 0.634
So, we have
t = d̄ / SE(d̄) = 2.05 / 0.634 = 3.23
5) The computed t of 3.23 is greater than the critical t value, so the null hypothesis is
rejected. Therefore, there is strong evidence that, on average, the module does lead to
improvements.
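A sketch of the same paired test in Python with SciPy (the original used hand calculation; SciPy also returns the exact one-tailed p-value):

from scipy import stats

before = [18, 21, 16, 22, 19, 24, 17, 21, 23, 18, 14, 16, 16, 19, 18, 20, 12, 22, 15, 17]
after  = [22, 25, 17, 24, 16, 29, 20, 23, 19, 20, 15, 15, 18, 26, 18, 24, 18, 25, 19, 16]

# Paired t-test of H0: mean difference <= 0 against Ha: mean difference > 0
t_stat, p_value = stats.ttest_rel(after, before, alternative='greater')
print(t_stat, p_value)   # t ~ 3.23, one-tailed p ~ 0.002 < 0.05 -> reject H0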

c) Unpaired t-test
An unpaired t-test is used to compare two population means; the procedure is as follows:
1) State the null hypothesis and alternate hypothesis.
2) Choose an alpha level, α.
3) Find the critical value of t in a t-table.
4) Calculate the t-test statistic:
• Calculate the difference between the two sample means, x̄1 − x̄2.
• Calculate the pooled standard deviation,
sp = √[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ]
and the standard error of the difference,
SE(x̄1 − x̄2) = sp √(1/n1 + 1/n2)
• Calculate the t-statistic, which is given by
t = (x̄1 − x̄2) / SE(x̄1 − x̄2)
Under the null hypothesis, this statistic follows a t-distribution with n1 + n2 − 2 degrees of
freedom.
5) Use t-distribution tables to compare your value for t to the t(n1+n2−2) distribution. This will give
the p-value for the unpaired t-test. Interpret the result.

For the unpaired t-test to be valid, the two samples should be roughly normally distributed and
should have approximately equal variances. If the variances are obviously unequal, we must instead use:

SE(x̄1 − x̄2) = √(s1²/n1 + s2²/n2)
Then,
(x̄1 − x̄2) / SE(x̄1 − x̄2) ~ N(0, 1) if n1 and n2 are reasonably large.
Else,
(x̄1 − x̄2) / SE(x̄1 − x̄2) ~ t(n′), where
n′ = (s1²/n1 + s2²/n2)² / [ (s1²/n1)² / (n1 − 1) + (s2²/n2)² / (n2 − 1) ]
rounded down to the nearest integer.
Example:
A U.S. magazine, Consumer Reports, carried out a survey of the calorie and sodium content of
a number of different brands of hotdog. There were three types of hotdog: beef, ’meat‘ (mainly pork
and beef but can contain up to 15% poultry) and poultry. The results below are the calorie content of
the different brands of beef and poultry hotdogs.

Beef hotdogs:
186, 181, 176, 149, 184, 190, 158, 139, 175, 148, 152, 111, 141, 153, 190, 157, 131, 149, 135, 132
Poultry hotdogs:
129, 132, 102, 106, 94, 102, 87, 99, 170, 113, 135, 142, 86, 143, 152, 146, 144

Before carrying out a t-test you should check whether the two samples are roughly normally
distributed. This can be done by looking at histograms of the data. In this case there are no outliers and
the data look reasonably close to a normal distribution; the t-test is therefore appropriate. So, first we
need to calculate the sample mean and standard deviation in each group:

Group     Sample size   Sample mean    Sample standard deviation
Beef      n1 = 20       x̄1 = 156.85   s1 = 22.64
Poultry   n2 = 17       x̄2 = 122.47   s2 = 25.48

1) State the null hypothesis and alternate hypothesis.

Null Hypothesis: There is no significant difference in the calorie content between beef and poultry
hotdogs.
Ho: μbeef = μpoultry
Alternative Hypothesis: There is a significant difference in the calorie content between beef and
poultry hotdogs.
Ha: μbeef ≠ μpoultry
2) Choose an alpha level, α = 0.05.
3) Find the critical value of t in a t-table, df = 37 − 2 = 35, tcritical = 2.03 (by interpolation).
4) Calculate the t-test statistic:
• Difference between the two sample means: x̄1 − x̄2 = 156.85 − 122.47 = 34.38
• Pooled standard deviation:
sp = √[ ((19)(22.64)² + (16)(25.48)²) / 35 ] = 23.98
SE(x̄1 − x̄2) = 23.98 √(1/20 + 1/17) = 7.91
• t-statistic:
t = (x̄1 − x̄2) / SE(x̄1 − x̄2) = 34.38 / 7.91 = 4.346
The computed t of 4.346 is greater than the critical t value, so the null hypothesis is
rejected. Therefore, there is strong evidence that the calorie content of poultry hotdogs is lower
than the calorie content of beef hotdogs.
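The hotdog comparison can be cross-checked with SciPy's independent-samples t-test (a sketch; equal_var=True matches the pooled-variance calculation above):

from scipy import stats

beef = [186, 181, 176, 149, 184, 190, 158, 139, 175, 148,
        152, 111, 141, 153, 190, 157, 131, 149, 135, 132]
poultry = [129, 132, 102, 106, 94, 102, 87, 99, 170,
           113, 135, 142, 86, 143, 152, 146, 144]

# Pooled-variance (Student's) t-test for two independent samples
t_stat, p_value = stats.ttest_ind(beef, poultry, equal_var=True)
print(t_stat, p_value)   # t ~ 4.35, two-tailed p ~ 0.0001 -> reject H0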

Z Test

Z-test is a statistical method used for inferential analysis, specifically when comparing means of
two large data samples with a known standard deviation. It is applicable to populations that follow a
normal distribution and is commonly employed when the sample sizes are larger than 30. The test can
be utilized in two ways: the 1-sample analysis helps determine if a population mean differs from a
hypothesized value, while the 2-sample version assesses whether two population means differ.
Additionally, Z-tests are effective for comparing group means in statistical analysis. Further, the z-test
definition stresses an important assumption: the sample data follow a normal distribution, and no
external factor influences the sample.

The Formula for Z-Test (Single Population)

Z = (x̄ − μ) / (σ / √n)
Where:
x̄ – mean of the sample
μ – mean of the population
σ – standard deviation of the population
n – number of samples
The Formula for Z-Test (Two Populations, comparing proportions)

Z = (p̂1 − p̂2) / √[ p̂ q̂ (1/n1 + 1/n2) ]
where the pooled proportion is
p̂ = (x1 + x2) / (n1 + n2)
q̂ = 1 − p̂

Where:
p̂1 and p̂2 are the two sample proportions
n1 and n2 are the two sample sizes
x1 and x2 are the numbers of successes in each sample
Note: For a two-tailed test the alpha level is split into α/2 per tail. Dividing α by 2 is standard
practice in two-tailed hypothesis testing to ensure that both extremes of the distribution are
considered, and it helps maintain the desired overall significance level for the test.
Calculating a Z-test requires the following steps:
1) State the null hypothesis and alternate hypothesis.
2) Choose an alpha level.
3) Find the critical value of z in a z table.
4) Calculate the z-test statistic (see below).
5) Compare the test statistic to the critical z value and decide whether to reject or fail to reject the
null hypothesis.
6) Interpret the result

Example:
The Dean claims that students in the College are above average in intelligence; a random
sample of 30 students' IQ scores has a mean of 112.5, while the population mean IQ is 100 with
a standard deviation of 15. Is there sufficient evidence to support the Dean's claim?
Given:
x̄ – mean of the sample = 112.5
μ – mean of the population = 100
σ – standard deviation of the population = 15
n – number of samples = 30

Z = (x̄ − μ) / (σ / √n) = (112.5 − 100) / (15 / √30) = 4.56
Procedure:
1) State the Null and Alternative hypothesis of the statement.
Null Hypothesis (Ho): The mean IQ of students in the College is equal to the mean population IQ.
(Meaning, the null hypothesis assumes that there is no significant difference from the mean
population IQ.)
Ho: μ = 100
Alternative Hypothesis (Ha): The mean IQ of students in the College is above the mean population IQ.
(Meaning, the alternative hypothesis suggests that the mean IQ of the students in the College is
greater than the mean population IQ.)
Ha: μ > 100
2) Choose the alpha level, α = 0.05.
3) Find the critical value of z in a z-table: Zcrit = 1.645.
4) Compute Z:
Z = (x̄ − μ) / (σ / √n) = (112.5 − 100) / (15 / √30) = 4.56
5) The computed z-value (Zcomp = 4.56) is greater than the critical value (Zcrit = 1.645); therefore,
reject the null hypothesis.
6) The IQ of the students in the College is above average.
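SciPy has no dedicated one-sample z-test function, so the statistic can be computed directly (a sketch; the one-tailed p-value comes from the normal survival function):

from math import sqrt
from scipy import stats

x_bar, mu, sigma, n = 112.5, 100, 15, 30

z = (x_bar - mu) / (sigma / sqrt(n))   # ~4.56
p_value = stats.norm.sf(z)             # one-tailed p ~ 2.6e-06 < 0.05 -> reject H0
print(z, p_value)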

Z-test with Two or More Populations


Problem: Five groups of researchers decided to test whether there is a difference between the
proportions of dismissals of cases brought affecting firms in their home districts. Two of the
committees' results were:

Committee         p̂        q̂        SE         Z        Z critical   Decision
Judicial          0.0667    0.9333    0.027226   3.5880   ±1.96        Reject Ho (significant)
Other Committee   0.0667    0.9333    0.010087   1.6860   ±1.96        Retain Ho (not significant)
Conclusions:
1. The study conducted by the Internal Affairs Committee showed that the proportion of
dismissals of cases brought does not affect firms in the home districts.
2. The study conducted by the Education Committee showed that the proportion of
dismissals of cases does not affect firms in the home districts.
3. The study conducted by the ICT Committee showed that the proportion of dismissals of
cases brought affects firms in the home districts.
4. The study conducted by the Judicial Affairs Committee showed that the proportion of
dismissals of cases brought affects firms in the home districts.
5. The study conducted by the other committee showed that the proportion of dismissals
of cases brought does not affect firms in the home districts.

CHI – SQUARE
The Chi-square test is a hypothesis test used to determine whether there is a relationship
between two categorical variables. The chi-square test checks whether the frequencies occurring in
the sample differ significantly from the frequencies one would expect. Thus, the observed frequencies
are compared with the expected frequencies and their deviations are examined. Categorical
variables are, for example, a person's gender, preferred newspaper, frequency of television viewing,
or their highest level of education.

Figure 3. The Chi-square test is used to investigate whether there is a relationship between gender and the
highest level of education.

Applications of the Chi-Square Test


There are various applications of the Chi-square test; it can be used to answer the following
questions:
1) Independence test
Are two categorical variables independent of each other? For example, does gender have an
impact on whether a person has a Netflix subscription or not?
2) Distribution test
Are the observed values of two categorical variables equal to the expected values? One
question could be, is one of the three video streaming services Netflix, Amazon, and Disney
subscribed to above average?
3) Homogeneity test
Are two or more samples from the same population? One question could be whether the
subscription frequencies of the three video streaming services Netflix, Amazon and Disney differ in
different age groups.

Example
A study investigates whether a new drug reduces the incidence of a certain disease. Two groups
of patients are considered: one receiving the new drug and the other receiving a placebo.

Null Hypothesis (H0): There is no difference in disease incidence between the new drug and
the placebo groups.
Alternative Hypothesis (H1): There is a difference in disease incidence between the new drug
and the placebo groups.
Result: If the p-value is less than the significance level (usually 0.05), the null hypothesis is rejected,
indicating a significant difference in disease incidence between the
two groups.
Conclusion: In this hypothetical example, the calculated p-value is 0.026, which is less than 0.05.
Therefore, we reject the null hypothesis and conclude that the new drug significantly reduces the
incidence of the disease compared to the placebo.

Calculate chi-squared
The chi-squared value is calculated with the equation:

𝑛
(𝑂𝑘 − 𝐸𝑘 )2
𝑥2 =∑
𝐸𝑘
𝑘=1
Where:
Ok – Observed frequency
Ek – Expected frequency

Example:

Observed Frequency               Expected Frequency
             Male   Female                    Male   Female
Category A   10     13           Category A   9      11
Category B   13     14           Category B   12     13

Using the equation:

χ² = (10 − 9)²/9 + (13 − 11)²/11 + (13 − 12)²/12 + (14 − 13)²/13 = 0.635

After calculating chi-squared, the number of degrees of freedom df is needed:
df = (Number of rows − 1) × (Number of columns − 1) = (2 − 1) × (2 − 1) = 1

From the table of the chi-squared distribution, for a significance level of 5% and df = 1, the
critical value is 3.841. Since the calculated chi-squared value is smaller, there is no significant
difference. As a prerequisite for this test, please note that all expected frequencies must be greater than 5.
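A sketch of the same calculation in Python, including the 5% critical value from the chi-squared distribution:

from scipy import stats

observed = [10, 13, 13, 14]
expected = [9, 11, 12, 13]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))   # ~0.635
critical = stats.chi2.ppf(0.95, df=1)                                # ~3.841
print(chi_sq, critical, chi_sq < critical)   # True -> no significant difference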

CHI-SQUARE TEST OF INDEPENDENCE


The Chi-Square Test of Independence is used when two categorical variables are to be tested
for independence. The aim is to analyze whether the characteristic values of the first variable are
influenced by the characteristic values of the second variable and vice versa.

Example:
Question: Does gender have an influence on whether a person has a YouTube subscription or not? For
the two variables gender (male, female) and has a YouTube subscription (yes, no), it is tested whether
they are independent. If this is not the case, there is a relationship between the characteristics.

The research question that can be answered with the Chi-square test is: Are the characteristics
of gender and ownership of a YouTube subscription independent of each other?

In order to calculate the chi-square, an observed and an expected frequency must be given.
In the independence test, the expected frequency is the one that results when both variables are
independent. If two variables are independent, the expected frequencies of the individual cells are
obtained with the equation below.

f(i, j) = RowSum(i) × ColumnSum(j) / N

Where i and j are the rows and columns of the table, respectively.

For the fictitious YouTube example, the following tables could be used. On the left is the table with
the frequencies observed in the sample, and on the right is the table that would result if perfect
independence existed.

Observed Frequency                Expected Frequency if Independent
              Male   Female                     Male                    Female
YouTube Yes   10     13           YouTube Yes   (23 × 25)/52 = 11.06    (23 × 27)/52 = 11.94
YouTube No    15     14           YouTube No    (29 × 25)/52 = 13.94    (29 × 27)/52 = 15.06
The chi-square can now be calculated:

χ² = (10 − 11.06)²/11.06 + (13 − 11.94)²/11.94 + (15 − 13.94)²/13.94 + (14 − 15.06)²/15.06 = 0.35

From the Chi-square table read the critical value again and compare it with the result.
The assumptions for the Chi-square independence test are that the observations are from a
random sample and that the expected frequencies per cell are greater than 5.
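SciPy can run the whole independence test directly from the observed table (a sketch; correction=False disables the Yates continuity correction so the result matches the hand calculation):

from scipy import stats

observed = [[10, 13],    # YouTube Yes: male, female
            [15, 14]]    # YouTube No:  male, female

chi_sq, p_value, dof, expected = stats.chi2_contingency(observed, correction=False)
print(chi_sq, p_value, dof)   # chi-squared ~ 0.35, dof = 1, p > 0.05
print(expected)               # matches the expected table computed above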

CHI-SQUARE DISTRIBUTION TEST

If a variable is present with two or more values, the differences in the frequency of the individual
values can be examined.

The Chi-square distribution test, or Goodness-of-fit test, checks whether the frequencies of the
individual characteristic values in the sample correspond to the frequencies of a defined distribution.
In most cases, this defined distribution corresponds to that of the population. In this case, it is tested
whether the sample comes from the respective population.

For market researchers it could be of interest whether there is a difference in the market
penetration of the three video streaming services YouTube, Netflix and NBA between Manila and the
whole of the Philippines. The expected frequency is then the distribution of streaming services
throughout the Philippines, and the observed frequency results from a survey in Manila. The fictitious
results are shown in the following tables:

Observed Frequency in Manila Expected Frequency (all Philippines):


Frequency Frequency

YouTube 25 YouTube 23

Netflix 29 Netflix 26

NBA 13 NBA 16

Others or None 20 Others or None 22

Using the equation:

χ² = (25 − 23)²/23 + (29 − 26)²/26 + (13 − 16)²/16 + (20 − 22)²/22 = 1.264

With df = 4 − 1 = 3, the critical value at the 5% significance level is 7.815; since 1.264 is smaller,
there is no significant difference between the Manila sample and the distribution in the whole of the
Philippines.
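A sketch of the same goodness-of-fit test with SciPy (both frequency lists must have the same total, as they do here):

from scipy import stats

observed = [25, 29, 13, 20]   # Manila survey
expected = [23, 26, 16, 22]   # distribution across the Philippines

chi_sq, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi_sq, p_value)   # chi-squared ~ 1.26, df = 3, p ~ 0.74 -> retain H0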

CHI-SQUARE HOMOGENEITY TEST

The Chi-square homogeneity test can be used to check whether two or more samples come
from the same population. One question could be whether the subscription frequencies of the three
video streaming services YouTube, Netflix and NBA differ in different age groups. As a fictitious
example, a survey is made in three age groups with the following result:
Observed Frequency

Age 15 - 25 26 - 35 36 -45

YouTube 25 23 20

Netflix 29 30 33

NBA 11 13 12

Others or None 16 24 26

As with the Chi-square independence test, this result is compared with the table that would
result if the distributions of Streaming providers were independent of age.

Effect size in the Chi-square test


So far, it has been determined whether the null hypothesis can be rejected, but it is very often
of great interest to know how strong the relationship between the two variables is. This can be
answered with the help of the effect size.
In the Chi-square test, Cramér's V can be used to calculate the effect size. Here a value of 0.1
is small, a value of 0.3 is medium and a value of 0.5 is large.

Effect size Cramér’s V


Small 0.1
Medium 0.3
Large 0.5

ANALYSIS OF VARIANCE, ANOVA


An analysis of variance (ANOVA) tests whether statistically significant differences exist between
more than two samples. For this purpose, the means and variances of the respective groups are
compared with each other. In contrast to the t-test, which tests whether there is a difference between
two samples, the ANOVA tests whether there is a difference between more than two groups.

There are different types of analysis of variance, the most common are the one-way and two-
way analysis of variance, each of which can be calculated either with or without repeated
measurements.
• One-factor (or one-way) ANOVA
• One-factor ANOVA with repeated measurements
• Two-factors (or two-way) ANOVA
• Two-factors ANOVA with repeated measurements
Difference between One-Way and Two-Way ANOVA
The one-way analysis of variance only checks whether an independent variable has an
influence on a metric dependent variable. This is the case, for example, if it is to be examined whether
the place of residence (independent variable) has an influence on the salary (dependent variable).
However, if two factors, i.e. two independent variables, are considered, a two-factor analysis of
variance must be used.
Two-factor analysis of variance tests whether there is a difference between more than two
independent samples that are split between two variables or factors.

One-factor ANOVA Two-factors ANOVA


Does a person's place of residence Does a person's place of residence (1st
(independent variable) influence independent variable) and gender (2nd
his or her salary? independent variable) affect his or her salary?
Analysis of Variance With and Without Repeated Measures
Depending on whether the sample is independent or dependent, either analysis of variance
with or without repeated measures is used. If the same person was interviewed at several points in time,
the sample is a dependent sample and analysis of variance with repeated measurements is used.

One - Factor ANOVA


The one-way analysis of variance is an extension of the t-test for independent groups. With the
t-test only a maximum of two groups can be compared; this is now extended to more than two groups.
For two groups (k = 2), the analysis of variance is therefore equivalent to the t-test. The independent
variable is accordingly a nominally scaled variable with at least two characteristic values. The
dependent variable is on a metric scale. In the case of the analysis of variance, the independent
variable is referred to as the factor.

EXAMPLE:
With the help of the independent variable, e.g., highest educational qualification with the three
characteristic values Group 1, Group 2, and Group 3, as much variance as possible of the
dependent variable salary should be explained. In the graphic below, under A) a lot of variance can
be explained by the three groups, and under B) only very little variance.

Accordingly, in case A) the groups have a very high influence on the salary and in case B) they
do not.

In the case of A), the values in the respective groups deviate only slightly from the group mean,
the variance within the groups is therefore very small. In the case of B), however, the variance within
the groups is large. The variance between the groups is the other way around; it is large in the case of
A) and small in the case of B). In the case of B) the group means are close together, in the case of A)
they are not.
Variance within the groups Variance between group means
Case A) Small Large
Case B) Large Small
ANALYSIS OF VARIANCE HYPOTHESES
The null hypothesis and the alternative hypothesis result from a one-way analysis of variance as
follows:
• Null hypothesis H0: The mean value of all groups is the same.
• Alternative hypothesis H1: There are differences in the mean values of the groups.

The results of the ANOVA can only make a statement about whether there are differences
between at least two groups. However, it cannot be determined which groups are exactly different. A
post-hoc test is needed to determine which groups differ. There are various methods to choose from,
with Duncan, Dunnett C, and Scheffé being among the most common methods.

Assumptions for one-way analysis of variance


• Scale level: The dependent variable should be metric, while the independent
variable should be nominally scaled.
• Homogeneity: The variances in each group should be roughly the same. This can be checked
with the Levene test.
• Normal distribution: The data within the groups should be normally distributed. This means that
the majority of the values are in the average range, while very few values are significantly below
or significantly above. If this condition is not met, the Kruskal-Wallis test can be used.

ANOVA CALCULATION BY HAND


Procedure
Step 1: State the Null and Alternate Hypothesis.
Step 2: Calculate the group means and the overall mean.
Step 3: Calculate SSR.
Calculate the regression sum of squares (SSR) using the following formula:
SSR = Σ nj (x̄j − x̄)²
where:
nj: the sample size of group j
Σ: a Greek symbol that means "sum"
x̄j: the mean of group j
x̄: the overall mean
Step 4: Calculate SSE.
Calculate the error sum of squares (SSE) using the following formula:
SSE = Σ (xij − x̄j)²
where:
Σ: a Greek symbol that means "sum"
xij: the i-th observation in group j
x̄j: the mean of group j
Step 5: Calculate SST.

SST = SSR + SSE

Step 6: Fill in the ANOVA table.


ANOVA Table
Source of Variance   Sum of Squares (SS)   df   Mean Squares (MS)   F
Treatment
Error
Total

Step 7: Determine the value of alpha, α, and the degrees of freedom, df.


Step 8: Determine the Critical Value of F
Here is how various numbers are calculated in the table:
df Treatment k-1
df error n-k
df total: n-1

MS treatment: SSR / df treatment


MS error: SSE / df error
F: MS treatment / MS error
Note: n = total observations, k = number of groups
Step 9: Interpret the results.
To determine if the F is a statistically significant result, compare the computed F to the F critical
value found in the F distribution table with the following values:
α (significance level)
df1 (df of treatment) (in row)
df2 (df of error) (in column)

If the computed F is less than the F critical value in the F distribution table, we fail to reject the
null hypothesis. This means we don’t have sufficient evidence to say that there is a statistically
significant difference between the mean scores of the groups.

TUKEY'S RANGE TEST


Tukey's range test, also known as Tukey's test, Tukey method, Tukey's honest significance test, or
Tukey's HSD test, is a single-step multiple comparison procedure and statistical test. It can be used to
find means that are significantly different from each other.

Example of One-Way ANOVA


ONE WAY ANALYSIS OF VARIANCE FOR ARROWROOT STARCH DRIED AT DIFFERENT DRYING DAYS
(Part of the Master of Engineering Thesis of Engr. Orley G. Fadriquel)
1) The Hypothesis
• The null hypothesis, Ho: There is no significant difference among the means of the samples
dried at 3, 4, and 5 drying days.
• The alternative hypothesis, Ha: There is a significant difference among the means of the samples
dried at 3, 4, and 5 drying days.
Moisture Content Result of Test
Sample A Sample B Sample C
Trial 1 5.82 5.37 5.36
Trial 2 5.46 5.61 5.55
Trial 3 6.04 5.6 5.47
Trial 4 6.09 5.49 5.48
Trial 5 5.95 5.31 5.47
Group Means 5.872 5.476 5.466
Overall Mean 5.6047
Compute the Regression Sum of Squares (SSR):
SSR = Σ nj (x̄j − x̄)²
S1 = 5(5.872 − 5.6047)² = 0.3573
S2 = 5(5.476 − 5.6047)² = 0.0828
S3 = 5(5.466 − 5.6047)² = 0.0961
SSR = 0.5362
Calculate the Error Sum of Squares (SSE):
SSE = Σ (xij − x̄j)²
SSE Sample A = (5.82 − 5.872)² + (5.46 − 5.872)² + (6.04 − 5.872)² + (6.09 − 5.872)² + (5.95 − 5.872)² = 0.2543
SSE Sample B = (5.37 − 5.476)² + (5.61 − 5.476)² + (5.60 − 5.476)² + (5.49 − 5.476)² + (5.31 − 5.476)² = 0.0723
SSE Sample C = (5.36 − 5.466)² + (5.55 − 5.466)² + (5.47 − 5.466)² + (5.48 − 5.466)² + (5.47 − 5.466)² = 0.0185
SSE Total = 0.3451
Calculate SST
SST = SSR + SSE = 0.5362 + 0.3451= 0.8814

Degrees of Freedom, df
Total df: dft = Total number of items − 1 = 15 − 1 = 14
Between-groups df: dfr = k − 1 = 3 − 1 = 2
Within-groups df: dfe = Total df − Between-groups df = 14 − 2 = 12
Compute the Mean Sum of Squares, MSS
Mean Regression Sum of Squares (MSR):
MSR = SSR / dfr = 0.5362 / 2 = 0.26813
Mean Sum of Squares of Error (MSE):
MSE = SSE / dfe = 0.3451 / 12 = 0.02876
Mean Total Sum of Squares (MST):
MST = SST / dft = 0.8814 / 14 = 0.06296
Compute F:
F = MSR / MSE = 0.26813 / 0.02876 = 9.323

ANOVA Table of the Three Samples Subjected to Different Trials

Source of Variance         Sum of Squares   df   MSS = SS/df   F
Samples (Between Groups)   0.53625          2    0.26813       9.323
Error (Within Groups)      0.34512          12   0.02876
Total                      0.88137          14   0.06296

We now compare the computed F value with the tabular F value, which is taken from the table
of critical values of F. This value is at the intersection of the df of the MSR (df1 = 2) and the df of
the MSE (df2 = 12). From the table, at df1 = 2 and df2 = 12, the critical value at the 5% significance
level is 3.88.
Since the F test statistic in the ANOVA table is greater than the F critical value in the F distribution
table, we reject the null hypothesis and retain the alternative hypothesis. This means we have sufficient
evidence to say that there is a statistically significant difference between the mean moisture content
of arrowroot starch dried in 3, 4, and 5 drying days.
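The same one-way ANOVA can be reproduced with SciPy (a sketch using the moisture-content trials above):

from scipy import stats

sample_a = [5.82, 5.46, 6.04, 6.09, 5.95]
sample_b = [5.37, 5.61, 5.60, 5.49, 5.31]
sample_c = [5.36, 5.55, 5.47, 5.48, 5.47]

f_stat, p_value = stats.f_oneway(sample_a, sample_b, sample_c)
print(f_stat, p_value)   # F ~ 9.32, p ~ 0.004 < 0.05 -> reject H0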
Tukey’s Test
In finding which of the means are significantly different from each other, Tukey's test was
conducted using an online calculator.

The means of the following pairs are significantly different: the means of Sample A and Sample
B, and the means of Sample A and Sample C.
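A sketch of the same post-hoc comparison with the Tukey HSD implementation in statsmodels (assuming statsmodels is installed; the original used an online calculator):

import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

values = np.array([5.82, 5.46, 6.04, 6.09, 5.95,    # Sample A
                   5.37, 5.61, 5.60, 5.49, 5.31,    # Sample B
                   5.36, 5.55, 5.47, 5.48, 5.47])   # Sample C
groups = ["A"] * 5 + ["B"] * 5 + ["C"] * 5

result = pairwise_tukeyhsd(values, groups, alpha=0.05)
print(result)   # flags A-B and A-C as significant, B-C as not significant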
References:

DATAtab Team (2023). DATAtab: Online Statistics Calculator. DATAtab e.U., Graz, Austria.
https://datatab.net
Kumari, Kajal (May 18, 2022). Hypothesis Testing for Data Science and Analytics. Retrieved August
2023 from https://www.analyticsvidhya.com/blog/2022/05/hypothesis-testing-for-data-science-and-analytics/
Thakur, Madhuri (July 26, 2023). Z-Test Statistics Formula. EDUCBA. Retrieved November 7, 2023 from
https://www.educba.com/z-test-statistics-formula/
Meena, Subhash (May 17, 2023). Difference Between Z-Test and t-Test. Retrieved August 2023 from
https://www.analyticsvidhya.com/blog/2020/06/statistics-analytics-hypothesis-testing-z-test-t-test/
Muwaya, Monica Seles (June 23, 2022). Hypothesis Testing in Inferential Statistics. Retrieved August
2023 from https://www.analyticsvidhya.com/blog/2022/06/hypothesis-testing-in-inferential-statistics