R Inferential Statistics
INFERENTIAL STATISTICS
TWO COMMON TYPES OF STATISTICAL INFERENCE
1. ESTIMATION
2. HYPOTHESIS TESTING
COMMONLY USED CONFIDENCE LEVELS
Confidence Level
90%
95%
99%
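As a minimal sketch of where these levels enter a calculation, the R lines below compute the two-sided critical z value for each confidence level with qnorm(), then a 95% confidence interval for a mean with t.test(). The built-in PlantGrowth data set is used only as a stand-in (an assumption, it is not part of these notes).

```r
# Two-sided critical z values for the three common confidence levels
conf_level <- c(0.90, 0.95, 0.99)
alpha <- 1 - conf_level
z_crit <- qnorm(1 - alpha / 2)
print(round(z_crit, 3))  # approximately 1.645, 1.960, 2.576

# 95% confidence interval for a mean (stand-in data: built-in PlantGrowth)
ci <- t.test(PlantGrowth$weight, conf.level = 0.95)$conf.int
print(ci)
```

A higher confidence level gives a larger critical value and therefore a wider interval.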
HYPOTHESIS TESTING
Hypothesis testing is a procedure based on sample evidence and probability, used to test claims regarding a characteristic of
one or more populations.
PROCEDURE FOR HYPOTHESIS TESTING
1. State the null and alternative hypotheses.
2. Set the level of significance or alpha level (α).
3. Determine the test distribution to use.
4. Calculate the test statistic or p-value.
5. Make a statistical decision using one of the following:
- Confidence interval
- Traditional method (critical value method)
- p-value method
6. Draw a conclusion.
TWO TYPES OF HYPOTHESIS
1. The null hypothesis, denoted by 𝐻0, is a statement saying that there is no significant
difference between the population parameter and the value that is being claimed. It is the
hypothesis to be tested.
2. The alternative hypothesis, denoted by 𝐻𝑎, is a statement saying that there is a significant
difference between the population parameter and the value that is being claimed. This
is the statement that is supported when the null hypothesis is rejected.
Examples of Null Hypotheses
• 1. Students who eat breakfast and students who do not eat breakfast will perform the same on a math exam.
• 2. Students who experience test anxiety prior to an English exam and students who do not will get
the same scores.
• Approximately normal distribution
• Homogeneity of variances
• Samples must be independent of each other
• No significant outliers
TESTING OF NORMALITY
• Graphical
1. Histogram
2. Normal Q-Q Plot
• Numerical
1. Kolmogorov-Smirnov Test
2. Lilliefors Test
3. Anderson-Darling Test
4. Shapiro-Wilk Test
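A minimal sketch of a numerical normality check, using the Shapiro-Wilk test from base R on the pre-module scores that appear in the worked example of these notes. The Lilliefors and Anderson-Darling tests are not in base R; they are available as lillie.test() and ad.test() if the nortest package is installed, which is an assumption about your setup and is therefore not run here.

```r
# Pre-module scores of 20 students (from the example in these notes)
pre <- c(18, 21, 16, 22, 19, 24, 17, 21, 23, 18,
         14, 16, 16, 19, 18, 20, 12, 22, 15, 17)

# Shapiro-Wilk test: H0 = the data come from a normal distribution
sw <- shapiro.test(pre)
print(sw)

# Decision at alpha = 0.05: a p-value above alpha means normality is not rejected
alpha <- 0.05
normal_ok <- sw$p.value > alpha
print(normal_ok)
```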
HOW TO CREATE A Q-Q PLOT IN R
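One common way to draw a normal Q-Q plot uses the base R functions qqnorm() and qqline(). The sketch below applies them to the pre-module scores from the worked example in these notes; the plot title and line colour are illustrative choices.

```r
# Pre-module scores of 20 students (from the example in these notes)
pre <- c(18, 21, 16, 22, 19, 24, 17, 21, 23, 18,
         14, 16, 16, 19, 18, 20, 12, 22, 15, 17)

# Plot sample quantiles against theoretical normal quantiles
qq <- qqnorm(pre, main = "Normal Q-Q Plot of Pre-module Scores")

# Add a reference line through the first and third quartiles
qqline(pre, col = "red")
```

Points that fall close to the reference line suggest the data are approximately normal.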
TESTING OF HOMOGENEITY OF VARIANCES
1. Bartlett's test
2. Levene's test
BARTLETT'S TEST
If the data are normally distributed, this is the best test to use. It is sensitive to departures
from normality: when the data are non-normal, it is more likely to return a “false positive”,
that is, to flag unequal variances that are not really there.
Command for Bartlett's Test
bartlett.test(x ~ group, data = data_frame)
“x” a numeric vector of data values
“group” a factor giving the group of each data value
“data_frame” the data frame that contains x and group
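As a minimal sketch, the command can be applied to the visual and textual manual scores from the independent-samples example in these notes. The data frame name `scores` and its column names are illustrative assumptions.

```r
# Exam scores for the two manual types (from the example in these notes)
scores <- data.frame(
  score = c(51.08, 57.03, 44.85, 75.21, 56.87, 75.28, 57.07, 80.3, 52.2,
            64.55, 57.6, 68.59, 50.75, 49.63, 43.58, 57.4, 49.48, 49.57),
  manual = factor(rep(c("visual", "textual"), each = 9))
)

# Bartlett's test: H0 = the group variances are equal
bt <- bartlett.test(score ~ manual, data = scores)
print(bt)

# At alpha = 0.05, a p-value above alpha means equal variances are not rejected
equal_var <- bt$p.value > 0.05
print(equal_var)
```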
LEVENE'S TEST
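Base R has no built-in Levene's test. A rough sketch of the idea, assuming the median-centred version of the test, is a one-way ANOVA on the absolute deviations of each value from its group median. If the car package is installed (an assumption about your setup), car::leveneTest() offers this as a ready-made function. The data are the manual scores from the example in these notes; the names `scores` and `abs_dev` are illustrative.

```r
# Exam scores for the two manual types (from the example in these notes)
scores <- data.frame(
  score = c(51.08, 57.03, 44.85, 75.21, 56.87, 75.28, 57.07, 80.3, 52.2,
            64.55, 57.6, 68.59, 50.75, 49.63, 43.58, 57.4, 49.48, 49.57),
  manual = factor(rep(c("visual", "textual"), each = 9))
)

# Absolute deviation of each score from the median of its own group
group_median <- ave(scores$score, scores$manual, FUN = median)
scores$abs_dev <- abs(scores$score - group_median)

# Levene's test = one-way ANOVA on the absolute deviations
levene <- anova(lm(abs_dev ~ manual, data = scores))
print(levene)
```

Because it works on deviations rather than raw variances, this test is less sensitive to non-normal data than Bartlett's test.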
ASSUMPTIONS OF THE DEPENDENT SAMPLE T-TEST
1. Your dependent variable should be measured at the interval or ratio level (i.e., it is
continuous).
2. Your independent variable should consist of two categorical "related groups" or
"matched pairs".
3. There should be no significant outliers in the differences between the two related groups.
4. The distribution of the differences in the dependent variable between the two related
groups should be approximately normally distributed.
COMMAND FOR DEPENDENT SAMPLE T-TEST
A teacher wants to know whether a new learning program helps to
increase the number of correctly remembered words. Ten subjects learn a list of
50 words. Learning performance is measured using a recall test. After the
first test, all subjects are instructed how to use the learning program and then
learn a second list of 50 words. Learning performance is again measured
with the recall test. The following table lists the number of correctly remembered
words for both tests.
Subject Before After
1 ,, 26
2 18 24
3 32 31
4 14 17
5 16 17
6 22 25
7 26 25
8 19 24
9 19 22
10 22 23
STEP 1. STATE THE NULL AND ALTERNATIVE
HYPOTHESIS
• Null hypothesis:
H0 : μ1 ≥ μ2, where μ1 is the mean number of words recalled before and μ2 the mean after using the learning program
The new learning program will not help to increase the number of correctly remembered words.
• Alternative hypothesis:
Ha : μ1 < μ2
The new learning program will help to increase the number of correctly remembered words.
STEP 2. SET THE LEVEL OF SIGNIFICANCE 𝛼
𝛼=0.05
STEP 3. DETERMINE THE TEST DISTRIBUTION TO
USE OR TEST THE ASSUMPTION
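A minimal sketch of the dependent sample t-test for the word recall data, using t.test() with paired = TRUE. Two assumptions are made here: the Before value for subject 1 is unreadable in the table, so it is entered as NA (R then drops that pair), and the test is run one-sided with alternative = "less", meaning the mean before is less than the mean after.

```r
# Number of correctly remembered words; subject 1's Before value is missing in the table
before <- c(NA, 18, 32, 14, 16, 22, 26, 19, 19, 22)
after  <- c(26, 24, 31, 17, 17, 25, 25, 24, 22, 23)

# Check normality of the paired differences (NA pairs are removed first)
d <- na.omit(before - after)
print(shapiro.test(d))

# One-sided paired t-test: Ha is that the mean before is less than the mean after
res <- t.test(before, after, paired = TRUE, alternative = "less")
print(res)

# Reject H0 at alpha = 0.05 when the p-value is below alpha
reject_h0 <- res$p.value < 0.05
print(reject_h0)
```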
Suppose a sample of n students was given a diagnostic test before studying a particular
module and then again after completing the module. We want to find out whether, in general, our
teaching leads to an improvement in students' knowledge/skills (i.e., whether test scores increase). We
can use the results from the sample of students to draw conclusions about the impact of this
module in general.
Let x = test score before the module and y = test score after the module.
Student Pre-module score Post-module score
1 18 22
2 21 25
3 16 17
4 22 24
5 19 16
6 24 29
7 17 20
8 21 23
9 23 19
10 18 20
11 14 15
12 16 15
13 16 18
14 19 26
15 18 18
16 20 24
17 12 18
18 22 25
19 15 19
20 17 16
STEP 1. STATE THE NULL AND ALTERNATIVE
HYPOTHESIS
• Null hypothesis:
H0 : μ1 = μ2
There is no significant difference between the Pre-Module and Post-Module scores of the
students.
• Alternative hypothesis:
Ha : μ1 ≠ μ2
There is a significant difference between the Pre-Module and Post-Module scores of the
students.
STEP 2. SET THE LEVEL OF SIGNIFICANCE 𝛼
𝛼=0.05
STEP 3. DETERMINE THE TEST DISTRIBUTION TO
USE OR TEST THE ASSUMPTION
• Dependent Variable: Number of correctly remembered words (first example); test score (second example)
• Independent Variable: Treatment (Before and After; Pre-module and Post-module)
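A minimal sketch of steps 3 to 5 for the pre-module and post-module scores, using a two-sided dependent sample t-test. The variable names `pre` and `post` are illustrative; the data are taken from the table above.

```r
# Diagnostic test scores of the 20 students
pre  <- c(18, 21, 16, 22, 19, 24, 17, 21, 23, 18,
          14, 16, 16, 19, 18, 20, 12, 22, 15, 17)
post <- c(22, 25, 17, 24, 16, 29, 20, 23, 19, 20,
          15, 15, 18, 26, 18, 24, 18, 25, 19, 16)

# Assumption check: the paired differences should be approximately normal
print(shapiro.test(post - pre))

# Two-sided paired t-test of H0: mu1 = mu2
res <- t.test(pre, post, paired = TRUE)
print(res)

# p-value method: reject H0 at alpha = 0.05 when the p-value is below alpha
reject_h0 <- res$p.value < 0.05
print(reject_h0)
```

The reported estimate is the mean of pre minus post, so a negative value means scores were higher after the module.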
PROCEDURES
Example: Researchers wanted to know whether there was a difference in comprehension among students
learning a computer program based on the style of the text. They randomly divided 18 students into
two groups of 9 each. The researchers verified that the 18 students were similar in terms of educational
level, age, and so on. Group 1 individuals learned the software using a visual manual
(multimodal instruction), while Group 2 individuals learned the software using a textual manual
(unimodal instruction). The following data represent the scores the students received on an exam given to
them after they studied from the manuals.
Visual Textual
51.08 64.55
57.03 57.6
44.85 68.59
75.21 50.75
56.87 49.63
75.28 43.58
57.07 57.4
80.3 49.48
52.2 49.57
STEP 1. STATE THE NULL AND ALTERNATIVE
HYPOTHESIS
• Null hypothesis: 𝝁𝟏 = 𝝁𝟐
There is no significant difference between the scores of the students learning a computer program using
the textual and visual styles.
• Alternative hypothesis: 𝝁𝟏 ≠ 𝝁𝟐
There is a significant difference between the scores of the students learning a computer program using
the textual and visual styles.
STEP 2. SET THE LEVEL OF SIGNIFICANCE 𝛼
𝛼=0.05
STEP 3. DETERMINE THE TEST DISTRIBUTION TO
USE.
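A minimal sketch of the independent sample t-test for the visual and textual manual scores. It first checks the normality and equal-variance assumptions, then lets the result of Bartlett's test decide the var.equal argument of t.test(); that decision rule is one reasonable choice, not the only one.

```r
# Exam scores from the table above
visual  <- c(51.08, 57.03, 44.85, 75.21, 56.87, 75.28, 57.07, 80.3, 52.2)
textual <- c(64.55, 57.6, 68.59, 50.75, 49.63, 43.58, 57.4, 49.48, 49.57)

# Assumption checks: normality within each group, homogeneity of variances
print(shapiro.test(visual))
print(shapiro.test(textual))
bt <- bartlett.test(list(visual, textual))
print(bt)

# Two-sided independent sample t-test; pooled variance only if Bartlett's test allows it
equal_var <- bt$p.value > 0.05
res <- t.test(visual, textual, var.equal = equal_var)
print(res)

# p-value method at alpha = 0.05
reject_h0 <- res$p.value < 0.05
print(reject_h0)
```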
• Apply the procedure in testing the hypothesis. Twenty participants were given a list of 20
words to process. The 20 participants were randomly assigned to one of two treatment
conditions. Half were instructed to count the number of vowels in each word (shallow
processing). Half were instructed to judge whether the object described by each word
would be useful if one were stranded on a desert island (deep processing). After a brief
distractor task, all subjects were given a surprise free recall task. The number of words
correctly recalled was recorded for each subject. Did the instruction affect the level of
recall? Here are the data:
ANALYSIS OF VARIANCE
ANOVA
One-way analysis of variance (ANOVA) is a method of testing the equality of three or more
population means by analyzing sample variances.
• H0 : μ1 = μ2 = . . . = μk
The oneway.test command corrects for non-homogeneity of variances, but it does not give much information. Only F, the p-value,
and the numerator and denominator degrees of freedom are reported; no mean squares, etc.
The command is part of the stats package, which is installed and loaded with R by default, so nothing needs to be downloaded.
oneway.test(x ~ group, data = data_frame, var.equal = FALSE)
The default is equal variances not assumed (var.equal = FALSE); to change this, set the “var.equal” option to TRUE.
“x” a numeric vector of data values.
“group” a factor giving the group of each data value.
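A minimal sketch of the command in use. The built-in PlantGrowth data set (plant weight for three treatment groups) is used as a stand-in, which is an assumption and not part of these notes; here "x" is weight and "group" is group.

```r
# Welch one-way ANOVA: equal variances are not assumed
welch <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = FALSE)
print(welch)

# The result holds only F, the two degrees of freedom, and the p-value
f_value <- welch$statistic
dfs     <- welch$parameter
p_value <- welch$p.value
print(c(f_value, dfs, p_value))
```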
COMMAND FOR ANOVA TEST
If you want the sums of squares and mean squares of the ANOVA,
use this command.
It is also part of the stats package, which is installed and loaded with R by default.
summary(aov(x ~ group, data = data_frame))
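A minimal sketch of the full ANOVA table, again using the built-in PlantGrowth data set as a stand-in (an assumption, not data from these notes). The significance level of 0.10 is chosen only to illustrate the decision step.

```r
# Fit the one-way ANOVA model and extract the ANOVA table
fit <- aov(weight ~ group, data = PlantGrowth)
tab <- summary(fit)[[1]]
print(tab)  # columns: Df, Sum Sq, Mean Sq, F value, Pr(>F)

# p-value method: reject H0 when the p-value is below alpha
alpha <- 0.10
p_value <- tab[["Pr(>F)"]][1]
reject_h0 <- p_value < alpha
print(reject_h0)
```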
Researchers wanted to compare math test scores of students at the end of secondary school from
various cities. Eight randomly selected students from Makati, Manila, and Quezon City each were
administered the same exam; the results are presented in the following table. Can the researchers
conclude that the distribution of exam scores is different for each city at the level of significance 0.1?
STEP 1. STATE THE NULL AND ALTERNATIVE
HYPOTHESIS
• Null hypothesis: There is no significant difference between the
mathematics scores of students from the various cities.
• Alternative hypothesis: There is a significant difference between the mathematics scores of
students from the various cities.
STEP 2. SET THE LEVEL OF SIGNIFICANCE 𝛼
𝛼=0.10
STEP 3. DETERMINE THE TEST DISTRIBUTION TO
USE.
Apply the procedure in testing the hypothesis. A teacher is concerned about the level of
knowledge possessed by PUP students regarding Philippine history. Students completed a
high school senior level standardized history exam. The academic major of each student was
also recorded. Data in terms of percent correct are recorded below for 24 students. Is there
a significant difference between the levels of knowledge possessed by PUP students
regarding Philippine history when grouped according to their academic major?