
6hypothesis Testing

Module 6 covers hypothesis testing, emphasizing its importance in data analysis and research. It explains concepts such as null and alternative hypotheses, confidence levels, significance levels, and the two types of errors (Type I and Type II). The module also discusses various statistical tests and approaches for hypothesis testing, including critical value and p-value methods.

Uploaded by

jamespacilan06
Copyright
© © All Rights Reserved

MODULE 6

HYPOTHESIS TESTING
Prepared by:

Asst. Prof. ISRAEL P. PENERO
Course Facilitator
Another equally important lesson in data analysis is how to test a research problem that can also be stated as a research hypothesis. The hypothesis is the backbone of the research, whether it ends up being accepted or rejected.

Have you heard about research on the efficacy of a COVID vaccine? A group of medical researchers runs a series of trials to test the efficacy of a vaccine made by a particular company. Their research problem would most probably be “How effective is the vaccine for people, based on the series of trials?”, and from this question they will come up with a hypothesis.
Learning Outcomes:
At the end of this module, the student should be able to:

1. State and explain what a hypothesis is, as well as a statistical hypothesis and hypothesis testing.
2. Discuss and compare the two types of hypotheses.
3. Demonstrate the importance of the confidence level, the level of significance, and the approaches to hypothesis testing.
4. Compare the one-tailed test with the two-tailed test in hypothesis testing.
5. Explain the type 1 and type 2 error and know when
WHAT IS HYPOTHESIS?

• A hypothesis is a specific, testable prediction.
• A hypothesis is a premise that is claimed and then tested.
• It is a supposition or explanation (theory) that is provisionally accepted in order to interpret certain events or phenomena, and to provide guidance for further investigation. A hypothesis may be proven correct or wrong, and must be capable of refutation.
• A hypothesis is a tentative statement about the relationship between two or more variables. It is a specific, testable prediction about what you expect to happen in your study.
WHAT IS STATISTICAL HYPOTHESIS?

A statistical hypothesis is an assertion or conjecture concerning one or more populations.
WHAT IS STATISTICAL HYPOTHESIS TESTING?

Hypothesis testing, or significance testing, is a method for testing a claim or hypothesis about a parameter in a population, using data measured in a sample.
In this method, we test some hypotheses by determining the likelihood that a sample statistic could have been selected if the hypothesis regarding the population parameter were true.
WHAT ARE THE TWO TYPES OF HYPOTHESES?

A. Null Hypothesis (Ho)

The null hypothesis (Ho), stated as the null, is a statement about a population parameter, such as the population mean, that is assumed to be true.

The null hypothesis is a starting point. Being the starting point of the testing process, it serves as our working hypothesis. It must always express the idea of no significant difference or no significant relationship. This is commonly called the no difference or no relationship hypothesis.

B. Alternative Hypothesis (Ha)

An alternative hypothesis (Ha) is a statement that directly contradicts a null hypothesis by stating that the actual value of a population parameter is less than, greater than, or not equal to the value stated in the null hypothesis. It is a type of hypothesis that has significance of difference or of relationship. It expresses an existence of difference or of relationship and it is the opposite of the null hypothesis.
Illustration 1: Research Problem

Is there any significant difference between the performance of the male and female Computer Science students in Statistics and Probability?

Now, the null and alternative hypotheses are written below:

Ho: There is no significant difference between the performance of the male and female Computer Science students in Statistics and Probability.

Ha: There is a significant difference between the performance of the male and female Computer Science students in Statistics and Probability.

Note: In this illustration, if you reject the null hypothesis, you accept the alternative hypothesis. If you fail to reject the null hypothesis, you do not accept the alternative hypothesis.
Example 2:
Research Problem
Is the proportion of the performance of freshmen students in College Algebra different from 0.50 in a sample of 100 freshmen students where 60 are males?
Ho: p = 0.50; Ha: p ≠ 0.50
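
The hypotheses in Example 2 can be checked numerically. The sketch below is only an illustration, not the module's own solution: it assumes the statistic of interest is the sample proportion 60/100 and that a two-tailed one-proportion z-test is appropriate. The function name `one_proportion_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-tailed z-test of Ho: p = p0 against Ha: p != p0."""
    p_hat = successes / n               # observed sample proportion
    se = sqrt(p0 * (1 - p0) / n)        # standard error computed under Ho
    z = (p_hat - p0) / se               # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
    return z, p_value

# Assumed reading of Example 2: 60 out of 100, tested against p0 = 0.50
z, p_value = one_proportion_z_test(60, 100, 0.50)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Under these assumptions the computed z can be compared with a critical value, or the p-value with α, using either approach discussed later in this module.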
CONFIDENCE LEVEL (C)

Confidence level is a measure of the reliability of a result. If you want a 95% level of confidence, this means that there is a probability of at least 0.95 that the result is reliable and a probability of only 0.05 of committing an error in making a decision.
LEVEL OF SIGNIFICANCE (Sig.); α

Level of significance, or significance level, denoted by the symbol α, refers to a criterion of judgment upon which a decision is made regarding the value stated in a null hypothesis. The criterion is based on the probability of obtaining a statistic measured in a sample if the value stated in the null hypothesis were true.

The level of significance is set by the researcher at the beginning of the research.

In hypothesis testing, the commonly used values for the level of significance, denoted by α, are 0.05 and 0.01. This means that we are willing to commit an error of 5% or 1%, as the case may be. This implies that we are 95% or 99% confident of making the correct decision.

Usually, the researcher uses either the 0.05 level (sometimes called the 5% level) or the 0.01 level (the 1% level). The lower the significance level, the more the data must diverge from the null hypothesis to be significant. Therefore, the 0.01 level is more conservative than the 0.05 level.
The Hypothesis Testing Approaches
a. The critical value approach
One way of deciding whether or not to reject Ho is to compare the value of the test statistic with the critical value. The critical value is the value that the test statistic (from a z-test or t-test) must reach or exceed for the null hypothesis to be rejected. The decision rule in this process is:
i) Reject the null hypothesis (Ho) if |z-computed| is greater than or equal to |z-critical|; or
ii) Reject the null hypothesis (Ho) if |t-computed| is greater than or equal to |t-critical|.
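
The decision rule for the z case can be sketched in a few lines of Python. This is a minimal illustration, assuming a two-tailed z-test. The computed value 2.31 is a made-up number, and `critical_value_decision` is a hypothetical helper name.

```python
from statistics import NormalDist

def critical_value_decision(z_computed, alpha=0.05):
    """Two-tailed rule: reject Ho if |z-computed| >= |z-critical|."""
    # Critical value that leaves alpha/2 in each tail of the standard normal curve
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    reject = abs(z_computed) >= abs(z_critical)
    return z_critical, reject

# Hypothetical computed test statistic
z_critical, reject = critical_value_decision(z_computed=2.31, alpha=0.05)
print(f"z-critical = {z_critical:.3f}")
print("Reject Ho" if reject else "Fail to reject Ho")
```

The t case follows the same comparison, but the critical value comes from the t distribution with the appropriate degrees of freedom, which the Python standard library does not provide.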
b. The p-value approach

The p-value is now widely used by researchers as a tool in decision-making. It is an alternative and equivalent way of conducting tests of significance.

The p-value is the probability of getting the observed sample statistic, or a more extreme sample statistic in the direction of Ha, when Ho is true. For a z-test, the p-value is the corresponding area under the standard normal distribution curve.

Here, we compare the p-value with the level of significance (α). The decision rule in this process is:

i) “Reject Ho if the p-value is less than or equal to α”; “do not reject Ho if the p-value is greater than α”

or

ii) “If p ≤ α, reject the null hypothesis”; “if p > α, fail to reject the null hypothesis”
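
As a rough sketch of this rule, the code below converts a z statistic into a two-tailed p-value and applies the comparison with α. The z value 2.31 is a made-up example, and the function names are hypothetical.

```python
from statistics import NormalDist

def p_value_two_tailed(z):
    """Area in both tails of the standard normal curve beyond |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Apply the rule: reject Ho if p <= alpha, otherwise fail to reject Ho."""
    return "reject Ho" if p_value <= alpha else "fail to reject Ho"

p = p_value_two_tailed(2.31)   # hypothetical test statistic
print(f"p-value = {p:.4f} -> {decide(p)}")
```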
ONE-TAILED AND TWO-TAILED TEST
A. What is a one-tailed test?
A one-tailed test is a statistical test in which the critical area or region of a distribution lies on one side of the mean. The alternative hypothesis states that the parameter is either greater than or less than a certain value, but not both. In other words, a one-tailed test may be either a right-tailed test or a left-tailed test, depending on the direction of the inequality in the alternative hypothesis.
B. What is a two-tailed test?
A two-tailed test is a statistical test in which the critical area or region of a distribution is two-sided; it tests whether a sample statistic is greater than or less than a certain range of values. If the sample statistic being tested falls into either of the critical areas, the null hypothesis is rejected in favor of the alternative hypothesis.
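
To make the difference concrete, the sketch below computes the critical region for a left-tailed, right-tailed, and two-tailed z-test at the same α. It assumes a standard normal test statistic. The value 1.80 and the function names are illustrative only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def critical_region(alpha, tail):
    """Return (lower, upper) cut-offs; None means no rejection region on that side."""
    if tail == "left":        # Ha: parameter < value
        return STD_NORMAL.inv_cdf(alpha), None
    if tail == "right":       # Ha: parameter > value
        return None, STD_NORMAL.inv_cdf(1 - alpha)
    if tail == "two":         # Ha: parameter != value, alpha split between both tails
        z = STD_NORMAL.inv_cdf(1 - alpha / 2)
        return -z, z
    raise ValueError("tail must be 'left', 'right', or 'two'")

def in_critical_region(z, alpha, tail):
    """True when the test statistic falls in the rejection region."""
    lower, upper = critical_region(alpha, tail)
    return (lower is not None and z <= lower) or (upper is not None and z >= upper)

# The same hypothetical statistic can lead to different decisions per tail choice
for tail in ("left", "right", "two"):
    print(tail, critical_region(0.05, tail), in_critical_region(1.80, 0.05, tail))
```

Because a two-tailed test splits α between both tails, its cut-offs sit farther from the mean than the single cut-off of a one-tailed test at the same α.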
THE TYPE 1 AND TYPE 2 ERRORS

How do we determine whether to reject the null hypothesis? It begins with the level of significance α, which is the probability of a Type I error.

When doing hypothesis testing, two types of mistakes may be made, and we call them Type I error and Type II error.

The table below shows the possible outcomes of a hypothesis test.

| Decision | Ho is true | Ho is false |
|---|---|---|
| Reject Ho | Type I error | Correct decision |
| Fail to reject Ho | Correct decision | Type II error |


Example 1

Scenario: Building Inspection

An inspector has to choose between certifying a building as safe and saying that the building is not safe. There are two hypotheses:

1. Building is not safe.
2. Building is safe.

How will you set up the hypotheses? Remember to set them up so that the Type I error is the more serious one.

Ho: Building is not safe
Ha: Building is safe

With this setup, a Type I error means certifying as safe a building that is actually not safe.
Example 2:
Scenario: Giving the final verdict to an accused person in a trial court.

The judge wants to give a verdict on whether or not the defendant committed the crime. How will you set up the hypotheses? Remember to set them up so that the Type I error is the more serious one.

Ho: The defendant did not commit the crime.
Ha: The defendant committed the crime.

Type I error: Convicting a person who, in reality, did not commit the crime.
Type II error: Acquitting a person who, in reality, committed the crime.
So, in other words:

A Type I error is rejecting the null hypothesis (Ho) when, in reality, it is true.
A Type II error is failing to reject the null hypothesis (Ho) when, in reality, it is false.
TEST STATISTICS

The test statistic is a mathematical formula that allows researchers to determine the likelihood of obtaining sample outcomes if the null hypothesis were true. The value of the test statistic is used to make a decision regarding the null hypothesis. Parametric and non-parametric tests are the two kinds of tests used to answer the questions what, when, why, and how in the analysis of research problems, and every research problem has its appropriate statistical test. These test statistics can be used whether you are testing for a significant difference or for a significant relationship between or among variables.
TEST STATISTICS for PARAMETRIC TEST

a. t-test for Independent Samples
b. t-test for Correlated Samples
c. z-test for Two Sample Means
d. z-test for One Sample Group
e. F-test (Analysis of Variance, ANOVA)
f. Pearson Product Moment Coefficient of Correlation
g. Simple Linear Regression Analysis
TEST STATISTICS for NONPARAMETRIC TEST

1. The Chi-square Test
2. The Wilcoxon Rank-Sum Test
3. The Kruskal-Wallis Test
4. The Spearman Rank Order Coefficient of Correlation
5. The Sign Test
6. The McNemar’s Test
7. The Friedman Test
8. The Kendall’s Coefficient of Concordance

Note: To decide which of these tests to use, you first need to know what type of variables you have, and you need to test whether the distribution of the data set is normal or non-normal.
Here are some test statistics for exploring or testing whether there is a significant relationship between variables, whether parametric or non-parametric.

| VARIABLES | STATISTICAL TESTS/TECHNIQUES |
|---|---|
| Two Continuous Variables | Pearson Correlation |
| Two Categorical Variables | Chi-square; Odds Ratio; Phi-coefficient; Cramer’s V; Contingency Coefficient; Gamma Coefficient |
| Continuous or Ordinal Variables | Spearman Correlation |
| Dependent Variable: Continuous. Independent Variable: Continuous and/or Categorical | Linear Regression; Multivariate Adaptive Regression Spline; Decision Tree Analysis; Random Forests |
| Dependent Variable: Nominal Categorical. Independent Variables: Continuous and/or Categorical | Logistic Regression; Discriminant Function Analysis; Multivariate Adaptive Regression Spline; Decision Tree Analysis; Random Forests |
| Dependent Variable: Ordinal Categorical. Independent Variable: Continuous and/or Categorical | Ordinal Regression; Multivariate Adaptive Regression Spline; Decision Tree Analysis; Random Forests |
| Dependent Variable: Count. Independent Variable: Continuous and/or Categorical | Poisson Regression; Multivariate Adaptive Regression Spline; Decision Tree Analysis; Random Forests |
These are some statistical tests for exploring or testing whether there is a significant difference between groups.

| Purpose | Parametric Techniques | Non-Parametric Techniques |
|---|---|---|
| Testing hypothesis about two independent groups | Two-independent samples t-test | Mann-Whitney U Test; Wilcoxon Rank-Sum Test |
| Testing hypothesis about more than two independent groups | One-way Analysis of Variance (ANOVA) | Kruskal-Wallis Test; Median Test; Jonckheere-Terpstra Test |
| Testing hypothesis about two related groups | Paired t-test | Wilcoxon Signed-Rank Test; Sign Test; McNemar Test |
| Testing hypothesis about three or more related groups | One-way repeated measures ANOVA | Friedman Test |
Properties of Parametric and Non-Parametric Test Statistics

| Property | Parametric Techniques | Non-Parametric Techniques |
|---|---|---|
| Distribution of Population | Assumes that a population has a particular distribution (e.g., normal distribution) | Makes no assumption about the distribution; any distribution will do |
| Variances of the Populations | Homogeneous (the variances are assumed to be equal or not significantly different); standard deviations are quite similar | Equal variances are not an issue; variances may be homogeneous or heterogeneous; standard deviations may be quite different |
| Typical Data | Ratio or Interval | Ordinal or Nominal |
| Data Set Relationship | Independent | Any |
| Usual Central Measure | Mean | Median |
| Outliers | Sensitive | Not sensitive |
| Sample Size | Large | Small |
| Statistical Power* | Powerful | Less powerful* |
| Benefits | Can draw more conclusions | Simplicity; less affected by outliers |

* If there is no violation of the parametric assumptions.
Example of Research Questions, their Essential Features, and the Appropriate Statistical Test (Parametric and Non-Parametric)

Research question: Is there a relationship between age and optimism scores?
Essential features:
• Variables of interest: Age and Optimism Score
• Data can be summarized using a scatter plot
• Researcher wants to test if there is a relationship between Age and Optimism
Parametric test: Pearson-r
Non-parametric test (alternative): Spearman’s Rho

Research question: Is there a relationship between smoking and lung cancer?
Essential features:
• Variables of interest: Smoking and Lung Cancer
• Data can be summarized using a crosstab
Statistical tests:
• Chi-square test
• Phi-coefficient
• Odds ratio
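
For the smoking and lung cancer question, a chi-square test of independence on a crosstab could be sketched as follows. The counts are invented purely for illustration and are not data from this module. The critical value 3.841 is the standard chi-square table value for 1 degree of freedom at α = 0.05.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent (Ho true)
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)   # (r - 1)(c - 1)
    return chi_square, df

# Hypothetical counts: rows = smoker / non-smoker, columns = lung cancer yes / no
crosstab = [[30, 70],
            [10, 90]]
chi_square, df = chi_square_independence(crosstab)

CRITICAL_DF1_ALPHA_05 = 3.841   # table value for df = 1, alpha = 0.05
decision = "reject Ho" if chi_square >= CRITICAL_DF1_ALPHA_05 else "fail to reject Ho"
print(f"chi-square = {chi_square:.2f}, df = {df}, decision: {decision}")
```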
FIVE STEPS IN HYPOTHESIS TESTING

A) Hypothesis Testing (critical value approach)

1. State the null (Ho) and alternative (Ha) hypotheses.
2. Set the criteria for a decision by specifying the level of significance that is to be used. Also identify the number of cases or entries and the degrees of freedom.
3. Choose the appropriate statistical tool and compute the statistical value.
4. Make a decision and interpret the result.
5. Construct a conclusion based on the result.

B) Hypothesis Testing (p-value approach)

1. Based on the research problem, make sure you know what the assumption would be, then state the null (Ho) and alternative (Ha) hypotheses.
2. Decide what level of significance, or value of alpha (α), you are going to use. The most commonly used values of α are 0.01 and 0.05.
3. Determine the appropriate test statistic.
4. Solve for the p-value associated with the test statistic.
5. Make the statistical decision whether to reject or not to reject Ho, and draw the conclusion based on the result.
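
The five steps of the p-value approach can be traced in code. The sketch below assumes a one-sample z-test with a known population standard deviation. The hypothesized mean, the standard deviation, and the sample scores are all made-up values used only to show the flow of the steps.

```python
from math import sqrt
from statistics import NormalDist, mean

# Step 1: state the hypotheses. Ho: mu = 75; Ha: mu != 75 (hypothetical claim)
mu_0 = 75
sigma = 10   # population standard deviation, assumed known for a z-test
scores = [82, 79, 68, 91, 77, 85, 73, 88, 80, 76,
          84, 71, 90, 78, 83, 69, 87, 81, 74, 86]   # hypothetical sample

# Step 2: choose the level of significance
alpha = 0.05

# Step 3: determine and compute the test statistic (one-sample z)
n = len(scores)
z = (mean(scores) - mu_0) / (sigma / sqrt(n))

# Step 4: solve for the two-tailed p-value associated with z
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: make the decision and draw the conclusion
decision = "reject Ho" if p_value <= alpha else "fail to reject Ho"
print(f"z = {z:.3f}, p-value = {p_value:.4f}, decision: {decision}")
```

The critical value approach differs only in steps 4 and 5, where the computed z is compared with the critical z for the chosen α instead of converting it to a p-value.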
MAKING A DECISION BASED ON THE TYPES OF ERROR

In making a decision, we decide whether to retain or reject the null hypothesis. Because we are observing a sample and not an entire population, it is possible that a conclusion may be wrong. There are four decision alternatives regarding the truth and falsity of the decision we make about a null hypothesis:

1. The decision to retain the null hypothesis could be correct.
2. The decision to retain the null hypothesis could be incorrect.
3. The decision to reject the null hypothesis could be correct.
4. The decision to reject the null hypothesis could be incorrect.

Type I error is the probability of rejecting a null hypothesis that is actually true. Researchers directly control for the probability of committing this type of error. Type II error, or beta (β) error, is the probability of retaining a null hypothesis that is actually false.
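
While α is chosen by the researcher, β depends on how far the true parameter is from the value stated in Ho. The sketch below computes β for a right-tailed z-test under assumed values. The means, standard deviation, and sample size are hypothetical, and `type_ii_error` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu_0, mu_true, sigma, n, alpha=0.05):
    """Beta for a right-tailed z-test of Ho: mu = mu_0 against Ha: mu > mu_0."""
    se = sigma / sqrt(n)                                   # standard error of the mean
    cutoff = mu_0 + NormalDist().inv_cdf(1 - alpha) * se   # smallest sample mean that rejects Ho
    # Probability that the sample mean falls below the cutoff when the true mean is mu_true
    return NormalDist(mu_true, se).cdf(cutoff)

# Hypothetical setting: Ho says mu = 100, but the true mean is 105
beta = type_ii_error(mu_0=100, mu_true=105, sigma=15, n=36)
print(f"alpha = 0.05, beta = {beta:.3f}, power = {1 - beta:.3f}")
```

With everything else fixed, a larger sample size shrinks the standard error and therefore lowers β.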
DECISION: RETAIN THE NULL HYPOTHESIS

When we decide to retain the null hypothesis, we can be correct or incorrect. The correct decision is to retain a true null hypothesis. This decision is called a null result or null finding. This is usually an uninteresting decision because the decision is to retain what we already assumed: that the value stated in the null hypothesis is correct.

The incorrect decision is to retain a false null hypothesis. This decision is an example of a Type II error, or β error. With each test we make, there is always some probability that the decision could be a Type II error. In this decision, we decide to retain previous notions of truth that are in fact false.
DECISION: REJECT THE NULL HYPOTHESIS

When we decide to reject the null hypothesis, we can be correct or incorrect. The correct decision is to reject a false null hypothesis. The incorrect decision is to reject a true null hypothesis. This decision is an example of a Type I error. With each test we make, there is always some probability that our decision is a Type I error. A researcher who makes this error decides to reject previous notions of truth that are in fact true. Making this type of error is analogous to finding an innocent person guilty. To minimize this error, we assume a defendant is innocent when beginning a trial. Similarly, to minimize making a Type I error, we assume the null hypothesis is true when beginning a hypothesis test.

Since we assume the null hypothesis is true, we control for Type I error by stating a level of significance. The level we set, called the alpha level (symbolized as α), is the largest probability of committing a Type I error that we will allow and still decide to reject the null hypothesis. This criterion is usually set at .05 (α = .05), and we compare the alpha level to the p value. When the probability of a Type I error is less than 5% (p < .05), we decide to reject the null hypothesis; otherwise, we retain the null hypothesis.
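
A small simulation can illustrate why α is described as the probability of a Type I error. In the sketch below, every sample is drawn from a population where Ho is true, so each rejection is a Type I error. The sample size, number of trials, and seed are arbitrary choices, and a two-tailed z-test with known standard deviation is assumed.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def type_i_error_rate(alpha=0.05, n=30, trials=4000, seed=1):
    """Fraction of samples, drawn with Ho true, in which Ho is wrongly rejected."""
    rng = random.Random(seed)                       # fixed seed for repeatable runs
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Ho: mu = 0 is true here, since data come from a normal(0, 1) population
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = mean(sample) / (1 / sqrt(n))
        if abs(z) >= z_critical:
            rejections += 1                         # a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

The observed rate should land close to the chosen α, and it moves closer as the number of trials grows.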
DEGREE OF FREEDOM VS TEST STATISTICS

What do we mean by degree of freedom in Statistics? Let us recall the formula for variance and standard deviation.
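
For reference, the usual sample formulas, assumed here to be the ones intended, are written below, with x̄ as the sample mean and n as the sample size:

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1},
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```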

The divisor n – 1 in the formula for the variance and the standard deviation is what we call the degree of freedom. The degree of freedom is the number of values which are free to vary, and in many statistical problems we are required to determine the number of degrees of freedom.

| PARAMETRIC | DEGREE OF FREEDOM |
|---|---|
| One Sample t-test | n - 1 |
| Paired sample t-test (paired data is not independent; 2n) | n - 1 |
| t-test for two independent samples | n1 + n2 - 2 |
| t-test for correlated sample | n - 1 |
| One-Way Analysis of Variance | Between Groups: k - 1; Within Groups: (n - 1) - (k - 1) = n - k; Total: n - 1 |
| Pearson-r | n - 2 |

| NON-PARAMETRIC | DEGREE OF FREEDOM |
|---|---|
| Chi-Square for Independence | (r - 1)(c - 1) |
| Chi-Square Goodness of Fit | n - 1 (n is the number of categories) |
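
The degree-of-freedom rules above are simple enough to express as small helper functions. The sketch below is illustrative only; the function names are hypothetical and the example numbers are made up.

```python
def df_one_sample_t(n):
    """One-sample, paired, or correlated-sample t-test: n - 1."""
    return n - 1

def df_independent_t(n1, n2):
    """t-test for two independent samples: n1 + n2 - 2."""
    return n1 + n2 - 2

def df_pearson_r(n):
    """Pearson-r: n - 2."""
    return n - 2

def df_chi_square_independence(rows, cols):
    """Chi-square test for independence: (r - 1)(c - 1)."""
    return (rows - 1) * (cols - 1)

def df_one_way_anova(k, n_total):
    """One-way ANOVA with k groups and n_total observations overall."""
    between = k - 1
    within = n_total - k      # same as (n_total - 1) - (k - 1)
    total = n_total - 1
    return between, within, total

# Hypothetical example: 3 groups with 30 observations in total
between, within, total = df_one_way_anova(k=3, n_total=30)
print(between, within, total)
```

Note that the between-groups and within-groups degrees of freedom always add up to the total.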
Try this!!!
Formulate Ho and Ha for each of the following research problems and tell what appropriate test statistic must be used.
1. Is there any significant relationship between sex (male or female) and preferred political party among registered voters in Batangas City?

2. Is there any significant difference in the efficacy of a vitamin given to students of different year levels at Batangas State University?

3. Is there any significant relationship between sex and enrolled program?

4. Is there any significant difference in the final grades of IT 2202 and IT 2203 students in Data Analysis?
Thank You!
