
Ae9 Final Module

This document outlines the principles of hypothesis testing in inferential statistics, including the formulation of null and alternative hypotheses, the significance level, and the statistical tests to be used. It explains the process of hypothesis testing, types of errors, and methods for assessing normality in data. Additionally, it provides examples and procedures for conducting tests such as the dependent and independent sample t-tests.

REFERENCES: PUP - DEPARTMENT OF MATHEMATICS AND STATISTICS

MODULE 4: INFERENTIAL STATISTICS


OBJECTIVES:
After successful completion of this module, you should be able to:
✦ Differentiate the null and alternative hypotheses.
✦ Formulate the appropriate null and alternative hypotheses.
✦ Explain the logic of hypothesis testing.
✦ Assess and test if the data follow a normal distribution.
✦ Distinguish between independent and dependent sampling.
✦ Identify the appropriate test statistics for normally distributed data.
✦ Conduct tests for two categorical variables.

What is HYPOTHESIS TESTING?
Hypothesis testing is a procedure, based on sample evidence and probability, used to test claims regarding a characteristic of one or more populations.

What is HYPOTHESIS?
• A statement or claim regarding a characteristic of one or more populations.
• A preconceived idea, assumed to be true but has to be tested for its truth or falsity.

Procedures for Testing Hypothesis
1. State the null and alternative hypothesis.
2. Set the level of significance or alpha level (α).
3. Determine the test distribution to use.
4. Calculate test statistic or p-value.
5. Make statistical decision.
6. Draw conclusion.

1. State the Null and Alternative Hypothesis

Two Types of Hypothesis
1. Null Hypothesis
• Denoted by H0.
• The statement being tested.
• Assumed true until evidence indicates otherwise.
• Must contain the condition of equality and must be written with the symbol =, ≤, or ≥.
2. Alternative Hypothesis
• Denoted by Ha.
• Statement that must be true if the null hypothesis is false.
• Sometimes referred to as the research hypothesis.
• Must not contain the condition of equality and must be written with the symbol ≠, <, or >.
Example Hypothesis:
Null Hypothesis:
✦ Students who eat breakfast and students who do not will perform the same on a math exam.
✦ Students who experience test anxiety prior to an English exam and students who do not will get the same scores.
✦ Motorists who talk on the phone while driving and motorists who do not will make the same number of errors on a driving course.
Alternative Hypothesis:
✦ Students who eat breakfast will perform better on a math exam than students who do not eat breakfast.
✦ Students who experience test anxiety prior to an English exam will get higher scores than students who do not experience test anxiety.
✦ Motorists who talk on the phone while driving will be more likely to make errors on a driving course than those who do not talk on the phone.

Reminders:
If you are conducting a research study and you want to use a hypothesis test to support your claim, the claim must be stated in such a way that it becomes the alternative hypothesis, so it cannot contain the condition of equality.

Two Types of Alternative Test
1. One-tailed test
✦ Left tailed
✦ Right tailed
2. Two-tailed test

2. Set the Level of Significance or Alpha Level (α)
• You should establish a predetermined level of significance; you will reject the null hypothesis when the p-value falls at or below it.
• The generally accepted levels are 0.10, 0.05, and 0.01.
• Be as rigorous as possible.

Two Types of Error
• Type I error: rejecting a null hypothesis that is true.
• Type II error: failing to reject a null hypothesis that is false.

Example:
H0: The defendant is innocent.
Ha: The defendant is not innocent.
What happens to the defendant if the jury makes a type I or a type II error?
Answer:
A type I error is like putting an innocent person in jail.
A type II error is like letting a guilty person go free.
Reminders:
It is important to note that we want to set (α) before we start our study because the Type I error is the more "grievous" error to make.
The smaller (α) is, the smaller the region of rejection.

3. Determine the Test Distribution to Use.
Determine the appropriate statistical test to be used.
✦ Dependent Sample t-Test
✦ Independent Sample t-Test
✦ One-Way Analysis of Variance (ANOVA) Test
✦ Pearson r
✦ Chi-Square Test

4. Calculate Test Statistic or p-value.
Perform the statistical analysis using statistical software such as Excel, SPSS, R, Minitab, SAS, etc.

5. Make Statistical Decision
✦ Using confidence interval
✦ Using p-value approach
✦ Using traditional method

Decision Rule:
✦ Using Confidence Interval
Reject the null hypothesis if the test statistic is not within the range specified by the confidence interval.
✦ Using Traditional Approach
Reject H0 if the computed value of the test statistic falls in the region of rejection.
✦ Using P-value Approach
Reject the null hypothesis if the computed p-value is less than or equal to the set significance level; otherwise, do not reject the null hypothesis.

Example: If the level of significance is α = 0.05,
P-value    Decision
0.01       Reject H0
0.05       Reject H0
0.10       Fail to reject H0
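
The p-value decision rule can be written as a one-line check. The sketch below is only an illustration; the function name `decide` is hypothetical and not part of the module. It reproduces the three rows of the example at α = 0.05.

```python
def decide(p_value, alpha=0.05):
    """Apply the p-value rule: reject H0 when p-value <= alpha."""
    return "Reject H0" if p_value <= alpha else "Fail to reject H0"


# The three p-values used in the example above, at alpha = 0.05.
for p in (0.01, 0.05, 0.10):
    print(f"p-value = {p:.2f}: {decide(p)}")
```

Note that a p-value exactly equal to α leads to rejection, because the rule uses "less than or equal to".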
Traditional Approach
The rejection region or critical region is the set of all values of the test statistic which will lead to the rejection of H0.
The acceptance region is the set of all values of the test statistic that leads the researcher to retain H0.

One-tailed and Left tailed, Ha : μ1 < μ2
[Figure: rejection region in the left tail]
One-tailed and Right tailed, Ha : μ1 > μ2
[Figure: rejection region in the right tail]
Two-tailed, Ha : μ1 ≠ μ2
[Figure: rejection regions in both tails]
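
As a rough sketch of the traditional approach, the check below decides whether a test statistic falls in the rejection region. It assumes, purely for illustration, that the test statistic follows the standard normal distribution (a z statistic); the t-tests later in this module would use critical values from a t table instead. The function name `in_rejection_region` and its `tail` argument are hypothetical.

```python
from statistics import NormalDist


def in_rejection_region(z, alpha=0.05, tail="two"):
    """Return True if z lies in the rejection region of a z-test.

    tail: "left" (Ha: mu1 < mu2), "right" (Ha: mu1 > mu2),
          or "two" (Ha: mu1 != mu2).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return z <= std_normal.inv_cdf(alpha)
    if tail == "right":
        return z >= std_normal.inv_cdf(1 - alpha)
    if tail == "two":
        # alpha is split equally between the two tails
        return abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'left', 'right', or 'two'")


print(in_rejection_region(-2.0, alpha=0.05, tail="left"))
print(in_rejection_region(1.5, alpha=0.05, tail="right"))
```

A smaller α pushes the critical value further into the tail, which is why the rejection region shrinks as α decreases.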

In stating your decision you can use:
✦ Fail to reject the null hypothesis / Do not reject the null hypothesis / Retain the null hypothesis.
✦ Reject the null hypothesis.
It is important to recognize that we never accept the null hypothesis. We are merely saying that the sample evidence is not strong enough to warrant rejection of the null hypothesis.

6. Draw Conclusion
Record conclusions and recommendations in a report, and associate interpretations to justify your conclusion or recommendations.

Assessing and Testing Normality of the Data
To determine if the data follow a normal distribution, we can use the graphical or numerical method.
Graphical:
• Normal Q-Q Plot
• Histogram
Numerical:
• Shapiro-Wilk Test
• Kolmogorov-Smirnov Test
How to Check Normality?
A histogram plots the observed values against their frequency, giving a visual estimate of whether the distribution is bell shaped or not.
Q-Q probability plots display the observed values against normally distributed data (represented by the line).

Reminders:
Graphical methods are typically not very useful when the sample size is small.

Hypotheses of Normality Test
The hypotheses used are:
H0: The sample data follows a normal distribution.
Ha: The sample data does not follow a normal distribution.
When we are testing normality:
• If p-value > alpha, do not reject H0: the data are treated as normal.
• If p-value ≤ alpha, reject H0: the data are NOT normal.
How to Calculate Shapiro-Wilk Test in Excel?
[Table: sample data]

STEP 1: Rearrange the data in ascending order.

STEP 2: Calculate SS as follows:
$$SS = \sum_{i=1}^{n} (x_i - \bar{x})^2$$
SS means Sum of Squares. Use the "=DEVSQ( )" function in Excel.

STEP 3: Calculate b as follows:
$$b = \sum_{i=1}^{m} a_i \, (x_{n+1-i} - x_i)$$
where n is the number of observations.
If n is even: $m = \frac{n}{2}$
If n is odd: $m = \frac{n-1}{2}$
Since n is even in this example, m = 8. That's why we used a1 to a8.
The a_i weights are taken from the Shapiro-Wilk table (based on the value of n).
Note that if n is odd, the median data value is not used in the calculation of b.
STEP 4: Calculate the test statistic:
$$W = \frac{b^2}{SS}$$

STEP 5: Find the value in the Shapiro-Wilk table (for a given value of n) that is closest to W, interpolating if necessary. This is the p-value for the test.
We choose this interval in the Shapiro-Wilk table because our n = 16 and our test statistic (W = 0.955) is within this interval. We used interpolation to get the p-value of the Shapiro-Wilk test.

Result
Since the computed p-value is greater than the set level of significance, we fail to reject the null hypothesis. Therefore, the sample data follows a normal distribution.
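
Steps 1 to 4 of the Shapiro-Wilk calculation can be sketched in a few lines. This is a minimal illustration, not the Excel procedure itself: the function name `shapiro_wilk_w` is hypothetical, and the a_i weights must still be supplied by the reader from a Shapiro-Wilk table for the given n. The small demonstration uses n = 3, for which the single tabulated weight is a1 = 0.7071; the three data values are made up.

```python
def shapiro_wilk_w(data, weights):
    """Compute W = b^2 / SS from data and tabulated a_i weights.

    weights: the m table weights a_1..a_m for this sample size,
             where m = n/2 (n even) or (n-1)/2 (n odd).
    """
    x = sorted(data)              # STEP 1: ascending order
    n = len(x)
    m = n // 2                    # integer division covers both cases
    if len(weights) != m:
        raise ValueError(f"expected {m} weights for n = {n}")

    mean = sum(x) / n
    ss = sum((xi - mean) ** 2 for xi in x)   # STEP 2: same as DEVSQ

    # STEP 3: pair the i-th smallest with the i-th largest value.
    b = sum(weights[i] * (x[n - 1 - i] - x[i]) for i in range(m))

    return b ** 2 / ss            # STEP 4


# Made-up sample of size 3; 0.7071 is the table weight a1 for n = 3.
w = shapiro_wilk_w([2.0, 1.0, 3.0], [0.7071])
print(f"W = {w:.4f}")
```

Step 5 (looking up the p-value for W in the table, with interpolation) is not covered by this sketch.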

Inferential Statistics
1. Parametric Tests
✦ Assume underlying statistical distributions in the data. Therefore, several conditions of validity must be met so that the result of a parametric test is reliable.
✦ Apply to data in ratio scale, and some apply to data in interval scale.
2. Non-Parametric Tests
✦ Refer to statistical methods in which the data is not required to fit a normal distribution.
✦ Most non-parametric tests apply to data in an ordinal scale, and some apply to data in nominal scale.

Inference About Two Means
To perform inference on the difference of two population means, we must first determine whether the data come from an independent or dependent sample.

Distinguish between Independent and Dependent Sample
✦ A sampling method is independent when the individuals selected for one sample do not dictate which individuals are to be in a second sample.
✦ A sampling method is dependent when the individuals selected to be in one sample are used to determine the individuals to be in the second sample.
Example:
Determine whether the sample is independent or dependent.
1. An urban economist believes that commute times to work in the South are less than commute times to work in the Midwest. He randomly selects 40 employed individuals in the South and 45 employed individuals in the Midwest and determines their commute times.
Answer: Independent
2. In an experiment conducted in biology class, Prof. Rhea measured the time required for 12 students to catch a falling meter stick using their dominant hand and nondominant hand. The goal of the study was to determine whether the reaction time in an individual's dominant hand is different from the reaction time in the nondominant hand.
Answer: Dependent
3. A researcher wants to know if the mean length of stay in for-profit hospitals is different from the mean length of stay in not-for-profit hospitals. He randomly selected 20 individuals in the for-profit hospital and matched them with 20 individuals in the not-for-profit hospital by diagnosis.
Answer: Dependent

Dependent Sample t-Test
The dependent sample t-test (also called the paired t-test or paired-samples t-test) compares the means of two related groups to determine whether there is a statistically significant difference between these means.
H0 : μ1 ≥ μ2 and Ha : μ1 < μ2
H0 : μ1 ≤ μ2 and Ha : μ1 > μ2
H0 : μ1 = μ2 and Ha : μ1 ≠ μ2

Assumptions
1. Your dependent variable should be measured at the interval or ratio level (i.e., it is continuous).
2. Your independent variable should consist of two categorical, "related groups" or "matched pairs".
3. There should be no significant outliers in the differences between the two related groups.
4. The distribution of the differences in the dependent variable between the two related groups should be approximately normally distributed.
Example:
A teacher is interested to know if the new learning program will help to increase the number of correctly remembered words. 10 subjects learn a list of 50 words. Learning performance is measured using a recall test. After the first test all subjects are instructed how to use the learning program and then learn a second list of 50 words. Learning performance is again measured with the recall test. In the following table the numbers of correctly remembered words are listed for both tests.
[Table: number of correctly remembered words per subject, first and second test]

1. State the Null and Alternative Hypothesis
Null hypothesis: H0 : μ1 ≥ μ2
The new learning program will not help to increase the number of correctly remembered words.
Alternative hypothesis: Ha : μ1 < μ2
The new learning program will help to increase the number of correctly remembered words.

2. Set the Level of Significance or Alpha Level (α)
α = 0.05

3. Determine the Test Distribution to Use.
Dependent Variable:
Number of correctly remembered words
Independent Variable:
Treatment (Before and After)
Since we are comparing the means of two related groups, we will use the dependent sample t-test.

4. Calculate Test Statistic or p-value.
Click "Data", then click "Data Analysis".
5. Make Statistical Decision
Using p-value approach: If p-value ≤ α, reject H0; otherwise, fail to reject H0.
Decision: Reject H0

6. Draw Conclusion
There is sufficient evidence to support that the new learning program helps to increase the number of correctly remembered words.

Proper Presentation of Results

Exercises:
Apply the procedure in testing the hypothesis.
Professor Rhea measured the time (in seconds) required to catch a falling meter stick for 10 randomly selected students' dominant hand and non-dominant hand. Professor Rhea claims that the reaction time in an individual's dominant hand is less than the reaction time in their non-dominant hand. Test the claim at the α level of significance. The data obtained are presented:
Result
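
For readers who want to check the dependent sample t-test by hand rather than through the Excel tool, the sketch below computes the paired t statistic from the differences. The function name `paired_t` is hypothetical, and the `before` and `after` scores are made-up numbers, not the module's data. The critical value 1.833 is the one-tailed t table value for df = 9 at α = 0.05.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for paired samples, using d = first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Standard error of the mean difference
    se = stdev(diffs) / sqrt(n)
    return mean(diffs) / se, n - 1


# Hypothetical recall scores for 10 subjects (illustration only).
before = [24, 17, 32, 14, 16, 22, 26, 19, 19, 22]
after = [26, 24, 31, 17, 17, 25, 25, 24, 22, 23]

t_stat, df = paired_t(before, after)
T_CRIT = 1.833  # t table, df = 9, one-tailed, alpha = 0.05

# Left-tailed test (Ha: mu1 < mu2): reject H0 when t <= -T_CRIT.
decision = "Reject H0" if t_stat <= -T_CRIT else "Fail to reject H0"
print(f"t = {t_stat:.3f}, df = {df}: {decision}")
```

The sign of t depends on the order of subtraction, so the direction of the rejection region has to match how μ1 and μ2 are defined in the hypotheses.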
Independent Sample t - Test
The independent sample t - test allows
researchers to evaluate or to compare the mean
difference between two populations using the data
from two separate samples. It is used to test
whether population means are significantly
different from each other, using the means from
randomly drawn samples.
H0 : μ1 ≥ μ2 and Ha : μ1 < μ2
H0 : μ1 ≤ μ2 and Ha : μ1 > μ2
H0 : μ1 = μ2 and Ha : μ1 ≠ μ2

Assumptions
1. Your dependent variable should be measured on a continuous scale (i.e., it is measured at the interval or ratio level).
2. Your independent variable should consist of two categorical, independent groups.
3. You should have independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves.
4. There should be no significant outliers.
5. Your dependent variable should be approximately normally distributed for each group of the independent variable.
6. There needs to be homogeneity of variances.

Example:
Researchers wanted to know whether there was a difference in comprehension among students learning a computer program based on the style of the text. They randomly divided 18 students into two groups of 9 each. The researchers verified that the 18 students were similar in terms of educational level, age, and so on. Group 1 individuals learned the software using a visual manual (multimodal instruction), while Group 2 individuals learned the software using a textual manual (unimodal instruction). The following data represent scores the students received on an exam given to them after they studied from the manuals.
1. State the Null and Alternative Hypothesis
Null hypothesis: H0 : μ1 = μ2
There is no significant difference between the scores of the students learning the computer program using textual and visual styles.
Alternative hypothesis: Ha : μ1 ≠ μ2
There is a significant difference between the scores of the students learning the computer program using textual and visual styles.

2. Set the Level of Significance or Alpha Level (α)
α = 0.05

3. Determine the Test Distribution to Use.
Dependent Variable:
Scores
Independent Variable:
Style of the Text (Visual and Textual)
Since we are comparing the means of two independent groups, we will use the independent sample t-test.

Determine if the variances are equal or not equal.
Click "Data", then click "Data Analysis".
H0: Equal Variances Assumed
Ha: Equal Variances Not Assumed
Using p-value approach: If p-value ≤ α, reject H0; otherwise, fail to reject H0.
Decision: Fail to reject H0
Since we fail to reject H0, we will proceed to t-test: Two Sample Assuming Equal Variances.

4. Calculate Test Statistic or p-value.
Click "Data", then click "Data Analysis".
Result

5. Make Statistical Decision
Using p-value approach: If p-value ≤ α, reject H0; otherwise, fail to reject H0.
Decision: Fail to reject H0

6. Draw Conclusion
There is not enough evidence to support that there is a difference in comprehension among students learning a computer program based on the style of the text.

Proper Presentation of Results

Polytechnic University of the Philippines
College of Science
Department of Mathematics and Statistics

Exercises:
Apply the procedure in testing the hypothesis.
Twenty participants were given a list of 20 words to process. The 20 participants were randomly assigned to one of two treatment conditions. Half were instructed to count the number of vowels in each word (shallow processing). Half were instructed to judge whether the object described by each word would be useful if one were stranded on a desert island (deep processing). After a brief distractor task, all subjects were given a surprise free recall task. Did the instruction affect the level of recall? The number of words correctly recalled was recorded for each subject. Here are the data:
Result
Since the result of the F-test concludes that the variances of the two groups are equal, we will apply "Assuming Equal Variances".
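
The two calculations behind the independent sample t-test, the variance ratio used to compare the group variances and the pooled t statistic under equal variances, can be sketched as follows. The function names and the two small score lists are hypothetical; placing the larger variance in the numerator is one common convention and is an assumption here, not necessarily the layout of the Excel output.

```python
from math import sqrt
from statistics import mean, variance


def variance_ratio(g1, g2):
    """F ratio of the sample variances, larger over smaller (assumed convention)."""
    v1, v2 = variance(g1), variance(g2)
    return max(v1, v2) / min(v1, v2)


def pooled_t(g1, g2):
    """Return (t, df) for two independent samples assuming equal variances."""
    n1, n2 = len(g1), len(g2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / df
    t = (mean(g1) - mean(g2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


# Made-up scores for two small groups (illustration only).
group_1 = [1, 2, 3]
group_2 = [2, 4, 6]

f_ratio = variance_ratio(group_1, group_2)
t_stat, df = pooled_t(group_1, group_2)
print(f"F = {f_ratio:.3f}, t = {t_stat:.3f}, df = {df}")
```

For a two-tailed test, the computed |t| is compared with the t table value for the resulting df at α/2 in each tail; for the 18-student example, df = 16 and the table value at α = 0.05 is 2.120.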

One-Way Analysis of Variance (ANOVA)
One-way analysis of variance (ANOVA) is a method of testing the equality of three or more population means by analyzing sample variances.
H0 : μ1 = μ2 = . . . = μk
Ha : At least one of the population means is different from the others.

Assumptions
1. Your dependent variable should be measured at the interval or ratio level (i.e., it is continuous).
2. Your independent variable should consist of two or more categorical, independent groups.
3. You should have independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves.
4. There should be no significant outliers.
5. Your dependent variable should be approximately normally distributed for each category of the independent variable.
6. There needs to be homogeneity of variances.
Example:
Researchers wanted to compare math test scores of students at the end of secondary school from various cities. Eight randomly selected students from each of Makati, Manila, and Quezon City were administered the same exam; the results are presented in the following table. Can the researchers conclude that the distribution of exam scores is different for each city at the α = 0.10 level of significance?

1. State the Null and Alternative Hypothesis
Null hypothesis:
There is no significant difference between the mathematics scores of students in the various cities.
Alternative hypothesis:
There is a significant difference between the mathematics scores of students in the various cities.

2. Set the Level of Significance or Alpha Level (α)
α = 0.10

3. Determine the Test Distribution to Use.
Dependent Variable:
Mathematics Scores
Independent Variable:
Cities (Makati, Manila, Quezon City)
Since we are comparing the means across one independent variable that consists of two or more categorical groups, we will use the one-way ANOVA.

Determine if the variances are equal or not equal.
Click "Data", then click "Data Analysis".
H0: Equal Variances Assumed
Ha: Equal Variances Not Assumed
Using p-value approach: If p-value ≤ α, reject H0; otherwise, fail to reject H0.
Decision: Fail to reject H0 (Equal Variances Assumed)

4. Calculate Test Statistic or p-value.
Click "Data", then click "Data Analysis".
Result

5. Make Statistical Decision
Using p-value approach: If p-value ≤ α, reject H0; otherwise, fail to reject H0.
Decision: Reject H0

6. Draw Conclusion
There is enough evidence to support that the mathematics exam scores differ among the cities; at least one city's mean is different from the others.

Proper Presentation of Results

Exercises:
Apply the procedure in testing the hypothesis.
A teacher is concerned about the level of knowledge possessed by PUP students regarding Philippine history. Students completed a high school senior level standardized history exam. Academic major of the students was also recorded. Data in terms of percent correct is recorded below for 24 students. Is there a significant difference between the levels of knowledge possessed by PUP students regarding Philippine history when grouped according to their academic major?
Result
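
The F statistic that one-way ANOVA reports comes from splitting the variation into a between-groups part and a within-groups part. The sketch below shows that calculation; the function name `one_way_anova_f` is hypothetical and the three small groups are made-up numbers, not the city or history exam data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)          # number of groups
    n = len(all_values)      # total number of observations

    # Between-groups sum of squares: group means vs. grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: observations vs. their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Three made-up groups of three scores each (illustration only).
scores = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_anova_f(scores)
print(f"F = {f_stat:.3f} with df = ({df_b}, {df_w})")
```

A large F means the group means are spread out relative to the variation inside the groups, which is the evidence against H0 : μ1 = μ2 = . . . = μk.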

Pearson Product Moment Correlation
The Pearson product moment correlation coefficient (Pearson r) is a measure of the strength of a linear association between two variables and is denoted by r.
H0: There is no significant relationship between two continuous variables.
Ha: There is a significant relationship between two continuous variables.

Features of r
• Unit free
• Ranges between -1 and 1
• The closer to -1, the stronger the negative linear relationship.
• The closer to 1, the stronger the positive linear relationship.
• The closer to 0, the weaker the linear relationship.
If r is positive, the correlation is direct.
If r is negative, the correlation is inverse.

[Figure: Sample of Observations from Various r Values; scatter plots of Y against X for r = -1, r = -.6, r = 0, r = .6, and r = 1]

Reminders:
• Correlation does not imply causation.
• Watch out for hidden (lurking) variables.

Lurking Variable
• A variable that is not included as an explanatory or response variable in the analysis but can affect the interpretation of relationships between variables.
• Can falsely identify a strong relationship between variables or it can hide the true relationship.

Assumptions
1. Your two variables should be measured at the interval or ratio level (i.e., they are continuous).
2. There is a linear relationship between your two variables.
3. There should be no significant outliers.
4. Your variables should be approximately normally distributed.
Significance Testing of Pearson r
Test Statistic:
$$t = r \sqrt{\frac{df}{1 - r^2}}$$
where:
df = degrees of freedom
r = correlation coefficient of Pearson r
Note:
df = n − 2

Example:
A dietetics student wanted to look at the relationship between calcium intake and knowledge about calcium in sports science students. Table shows the data she collected. Is there a relationship between calcium intake and knowledge about calcium in sports science students?

1. State the Null and Alternative Hypothesis
Null hypothesis:
There is no significant relationship between the calcium intake and knowledge about calcium in sports science students.
Alternative hypothesis:
There is a significant relationship between the calcium intake and knowledge about calcium in sports science students.

2. Set the Level of Significance or Alpha Level (α)
α = 0.05
3. Determine the Test Distribution to Use.
Dependent Variable:
Calcium Intake
Independent Variable:
Knowledge about Calcium
Since we are testing the significant relationship of two variables, we will use Pearson r.

4. Calculate Test Statistic or p-value.
Result
$$t = r \sqrt{\frac{df}{1 - r^2}}$$
df = n − 2
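
The correlation coefficient and its test statistic t = r√(df / (1 − r²)) can be computed directly, as sketched below. The function names are hypothetical and the five (x, y) pairs are made-up numbers, not the calcium data. The formula for t is undefined when r is exactly 1 or −1, which this sketch does not guard against.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product moment correlation coefficient of paired data."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_for_r(r, n):
    """Return (t, df) for testing the significance of r, with df = n - 2."""
    df = n - 2
    return r * sqrt(df / (1 - r ** 2)), df


# Made-up paired observations (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
t_stat, df = t_for_r(r, len(x))
print(f"r = {r:.4f}, t = {t_stat:.4f}, df = {df}")
```

The computed t is then compared with the t table value for df = n − 2, or converted to a p-value by software, before applying the decision rule.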
5. Make Statistical Decision
Using p-value approach: If p-value ≤ α, reject H0; otherwise, fail to reject H0.
Decision: Reject H0 (Strong and Direct Correlation)

6. Draw Conclusion
There is sufficient evidence to conclude that there is a significant relationship between the calcium intake and knowledge about calcium in sports science students.

Proper Presentation of Results

Exercises:
Apply the procedure in testing the hypothesis.
A group of twelve children participated in a psychological study designed to assess the relationship, if any, between age (years) and average total sleep time (minutes). To obtain a measure for average total sleep time, recordings were taken on each child on five consecutive nights and then averaged. The results obtained are shown in the table.
Result
Chi-Square Distribution
Definition:
The chi-square distribution is written as χ² distribution.
The symbol χ is the Greek letter "chi", pronounced as "ki".

Chi-Square: Test for Independence
✦ Used to discover if there is association between two categorical variables.
✦ Used when you want to decide whether two variables are independent or dependent.
✦ A contingency table will be constructed.
H0: The two categorical variables are independent.
Ha: The two categorical variables are dependent.

The test statistic for a test of independence is given by
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where:
O is the observed frequency for a category
E is the expected frequency for a category
$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
Observed and Expected Frequencies
The frequencies obtained from the performance of an experiment are called the observed frequencies and are denoted by O.
The expected frequencies, denoted by E, are the frequencies that we expect to obtain if the null hypothesis is true.

Example of Contingency Table:
Observed Values      Low   Medium   High   Row Total
Some College          20     35      20       80
Bachelor's Degree     17     33      25       70
Masters Degree        11     18      21       50
Column Total          48     86      66      200

Assumptions
1. There are 2 variables, and both are measured as categories, usually at the nominal level.
2. The two variables should consist of two or more categorical, independent groups.
3. The data in the cells should be frequencies, or counts of cases rather than percentages or some other transformation of the data.
4. For a 2 by 2 table, all expected frequencies > 5.
5. For a larger table, all expected frequencies > 1 and no more than 20% of all cells may have expected frequencies < 5.

Example:
1. A doctor who knows that hypertension depends on smoking habits can tell his smoking patients what they should do.
2. If the traffic condition (light, moderate, heavy, standstill) is found to be dependent on vehicle plate numbers (odd, even), a traffic officer may decide to revise traffic law enforcement.
3. If poverty status of households is found to be correlated with family size, government ought to adopt a viable poverty management program.

Reminders:
The word contingency refers to dependence, but this is only a statistical dependence and cannot be used to establish a direct cause-and-effect link between the two variables in question.
Example:
Educators are always looking for novel ways in which to teach statistics to undergraduates as part of a non-statistics degree course (e.g., psychology). With current technology, it is possible to present how-to guides for statistical programs online instead of in a book. However, different people learn in different ways. An educator would like to know whether gender (male/female) is associated with the preferred type of learning medium (online vs. books). Use "Data_Example and Exercises file".

1. State the Null and Alternative Hypothesis
Null hypothesis:
Gender is independent of the preferred type of learning medium.
Alternative hypothesis:
Gender is dependent on the preferred type of learning medium.

2. Set the Level of Significance or Alpha Level (α)
α = 0.05

3. Determine the Test Distribution to Use.
Two Categorical Variables
Gender (Male and Female)
Preferred type of learning medium (online vs. books)
Since we are testing the significant relationship of two categorical variables, we will use the Chi-square test.

4. Calculate Test Statistic or p-value.
Click "Insert", then click "Pivot Table".
[Figure: pivot table output showing Row Total, Column Total, and Grand Total]
$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
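
The expected frequencies and the chi-square statistic can be computed from a table of observed counts as sketched below. The function names are hypothetical, and the 2 by 2 counts are made-up numbers rather than the gender and learning medium data. The degrees of freedom, (rows − 1)(columns − 1), are the standard value for a test of independence and are not stated in the module itself.

```python
def expected_frequencies(observed):
    """E = (row total)(column total) / grand total for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


def chi_square(observed):
    """Return (chi-square statistic, df) for a contingency table of counts."""
    expected = expected_frequencies(observed)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# Made-up 2 x 2 table of counts (illustration only):
# rows are two groups, columns are two preference categories.
counts = [[10, 20],
          [20, 10]]

stat, df = chi_square(counts)
CHI2_CRIT = 3.841  # chi-square table, df = 1, alpha = 0.05
decision = "Reject H0" if stat >= CHI2_CRIT else "Fail to reject H0"
print(f"chi-square = {stat:.3f}, df = {df}: {decision}")
```

The input must be raw counts, as the assumptions above require; percentages would give a misleading statistic.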

5. Make Statistical Decision
Using p-value approach: If p-value ≤ α, reject H0; otherwise, fail to reject H0.
Decision: Reject H0

6. Draw Conclusion
There is sufficient evidence to conclude that gender is associated with the preferred type of learning medium.

Proper Presentation of Results
Result
