
R Inferential Statistics

The document discusses procedures for hypothesis testing including stating null and alternative hypotheses, setting significance levels, determining test distributions, calculating test statistics, making statistical decisions, and drawing conclusions. It also provides examples of dependent sample t-tests.
Copyright © All Rights Reserved

PROCEDURES

1. State the null and alternative hypotheses.

2. Set the level of significance.

3. Determine the test distribution to use.
- Determine the appropriate statistical test to be used.
PROCEDURES

4. Calculate the test statistic, then decide using one of these methods:
- Confidence interval method
- Traditional (critical value) method
- p-value method: if the p-value ≤ α, reject Ho; otherwise, fail to reject Ho.

Example (p-value approach): if the level of significance is α = 0.05,

p-value   Decision on Ho
0.01      Reject
0.05      Reject
0.10      Fail to Reject
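A minimal sketch of the p-value decision rule in R, reproducing the table above. The helper name decide is hypothetical (it is not part of any package); it only compares each p-value with α.

```r
# Hypothetical helper: apply the p-value decision rule (reject H0 when p-value <= alpha)
decide <- function(p_value, alpha = 0.05) {
  ifelse(p_value <= alpha, "Reject", "Fail to Reject")
}

p_values <- c(0.01, 0.05, 0.10)
print(data.frame(p_value = p_values, decision = decide(p_values)))
```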
PROCEDURES

5. Make a Statistical Decision

6. Draw a Conclusion
INFERENTIAL STATISTICS
TWO COMMON TYPES OF STATISTICAL INFERENCE

1. ESTIMATION
2. HYPOTHESIS TESTING
COMMONLY USED CONFIDENCE LEVELS

Confidence Level
90%
95%
99%
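Each confidence level corresponds to a significance level α = 1 - (confidence level), so 90%, 95% and 99% pair with α = 0.10, 0.05 and 0.01. As a small base-R sketch, the two-tailed critical z-values for these levels can be obtained with qnorm:

```r
conf_levels <- c(0.90, 0.95, 0.99)
alpha <- 1 - conf_levels

# Two-tailed critical value: the z that leaves alpha/2 in the upper tail
z_crit <- qnorm(1 - alpha / 2)
print(data.frame(confidence = conf_levels, alpha = alpha, z_critical = round(z_crit, 3)))
```

These work out to about 1.645, 1.960 and 2.576.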
HYPOTHESIS TESTING

What are some things you believe or claim that have not been proven true? Suggest how you could prove your claims.
HYPOTHESIS
• A statement or claim regarding a characteristic of one or more populations.
• A preconceived idea that is assumed to be true but has to be tested for its truth or falsity.

HYPOTHESIS TESTING
is a procedure, based on sample evidence and probability, used to test claims regarding a characteristic of one or more populations.
PROCEDURE FOR HYPOTHESIS TESTING
1. State the null and alternative hypotheses.
2. Set the level of significance or alpha level (α).
3. Determine the test distribution to use.
4. Calculate the test statistic or p-value.
5. Make a statistical decision using one of these methods:
- Confidence interval
- Traditional method (critical value method)
- p-value method
6. Draw a conclusion.
TWO TYPES OF HYPOTHESIS

1. The null hypothesis, denoted by 𝐻0, is a statement that there is no significant difference between the population parameter and the value being claimed. It is the hypothesis to be tested.
2. The alternative hypothesis, denoted by 𝐻𝑎, is a statement that there is a significant difference between the population parameter and the value being claimed. It is the statement that is supported once the null hypothesis is rejected.
Example of Null Hypothesis
1. Students who eat breakfast and students who do not will perform the same on a math exam.
2. Students who experience test anxiety prior to an English exam and students who do not will get the same scores.

Example of Alternative Hypothesis
1. Students who eat breakfast will perform better on a math exam than students who do not eat breakfast.
2. Students who experience test anxiety prior to an English exam will get higher scores than students who do not experience test anxiety.
TWO TYPES OF ALTERNATIVE TEST

1. One-tailed test
✦ Left-tailed ✦ Right-tailed
2. Two-tailed test
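In R, the tail of the test is chosen with the alternative argument of t.test: "less" for a left-tailed test, "greater" for a right-tailed test, and "two.sided" (the default) for a two-tailed test. A minimal sketch on a hypothetical sample (the numbers and the test value mu = 5 are made up for illustration):

```r
# Hypothetical sample, for illustration only
x <- c(5.1, 4.8, 5.6, 5.3, 4.9, 5.7, 5.2, 5.4)

p_two   <- t.test(x, mu = 5, alternative = "two.sided")$p.value  # two-tailed
p_left  <- t.test(x, mu = 5, alternative = "less")$p.value       # left-tailed
p_right <- t.test(x, mu = 5, alternative = "greater")$p.value    # right-tailed

print(c(two_tailed = p_two, left_tailed = p_left, right_tailed = p_right))

# For a t-test, the two-tailed p-value is twice the smaller one-tailed p-value
print(all.equal(p_two, 2 * min(p_left, p_right)))
```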
PARAMETRIC TESTS

One Sample t-test
Dependent Sample t-test
Independent Sample t-test
One-Way ANOVA
Pearson Product Moment Correlation
COMMON ASSUMPTIONS

• Approximately normal distribution
• Homogeneity of variances
• Samples must be independent of each other
• No significant outliers
TESTING OF NORMALITY

• Graphical
1. Histogram
2. Normal Q-Q plot
• Numerical
1. Kolmogorov-Smirnov test
2. Lilliefors test
3. Anderson-Darling test
4. Shapiro-Wilk test
HOW TO CREATE A HISTOGRAM AND Q-Q PLOT IN R

Command for histogram:
hist(x)

To construct a normal Q-Q plot, use the commands:
qqnorm(x)
qqline(x)
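A short sketch of both graphical checks on simulated data (the vector x below is generated for illustration and is not from the document). Points on the Q-Q plot lying close to the reference line suggest approximate normality.

```r
# Simulated data for illustration only (not from the document)
set.seed(1)
x <- rnorm(30, mean = 50, sd = 10)

# Histogram: look for a roughly symmetric, bell-shaped pattern
hist(x, main = "Histogram of x", xlab = "x")

# Normal Q-Q plot: qqnorm invisibly returns the plotted coordinates
qq <- qqnorm(x, main = "Normal Q-Q plot of x")
qqline(x, col = "red")  # reference line through the first and third quartiles
```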
TESTING OF NORMALITY

Command for Shapiro-Wilk test:
shapiro.test(x)
Command for Lilliefors test:
To use the Lilliefors test, install and load the package nortest (install.packages("nortest"), then library(nortest)).
lillie.test(x)
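TESTING OF NORMALITY

Command for Anderson-Darling test:
To use the Anderson-Darling test, install and load the package nortest.
ad.test(x)
Command for Kolmogorov-Smirnov test:
ks.test(x, "pnorm")
Without further arguments, this compares x with the standard normal distribution N(0, 1). To compare x with a normal distribution that has the sample's mean and standard deviation, use ks.test(x, "pnorm", mean(x), sd(x)); because these parameters are estimated from the data, the Lilliefors test is the corrected version of this check.

“x” is a numeric vector.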
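A sketch of the numerical tests on simulated data (x is generated for illustration). It also shows why the plain ks.test(x, "pnorm") call needs care: data that are not on the N(0, 1) scale are rejected trivially. The nortest calls run only if that package is installed.

```r
# Simulated data for illustration only (not from the document)
set.seed(2)
x <- rnorm(40, mean = 100, sd = 15)

p_sw     <- shapiro.test(x)$p.value                         # Shapiro-Wilk
p_ks_std <- ks.test(x, "pnorm")$p.value                     # against N(0, 1)
p_ks_est <- ks.test(x, "pnorm", mean(x), sd(x))$p.value     # against N(mean(x), sd(x))
print(c(shapiro = p_sw, ks_standard = p_ks_std, ks_estimated = p_ks_est))

# Lilliefors and Anderson-Darling tests need the nortest package
if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::lillie.test(x))
  print(nortest::ad.test(x))
}
```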


TESTING OF HOMOGENEITY

1. Bartlett's test
2. Levene's test
BARTLETT'S TEST

If the data are normally distributed, this is the best test to use. It is sensitive to departures from normality: it is more likely to return a “false positive” (flagging unequal variances) when the data are non-normal.
Command for Bartlett's test:
bartlett.test(x ~ group, data = dataframe)
“x” a numeric vector of data values
“group” factor of the data
“dataframe” the data frame that contains x and group
LEVENE'S TEST

More robust to departures from normality than Bartlett's test.

Command for Levene's test:
To use Levene's test, install and load the package car (its companion package carData is installed along with it).
leveneTest(x ~ group, data = dataframe)

“x” a numeric vector of data values.
“group” factor of the data.
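Both homogeneity tests can be sketched on simulated data (the data frame below is generated for illustration, with group C given a larger spread). Bartlett's test is in base R; Levene's test is reproduced by hand as a one-way ANOVA on the absolute deviations from each group's median (the Brown-Forsythe form, which is what car::leveneTest computes with its default center = median). The car cross-check runs only if car is installed.

```r
# Simulated data for illustration only: group C has a larger spread
set.seed(3)
dataframe <- data.frame(
  x     = c(rnorm(10, 20, 2), rnorm(10, 22, 2), rnorm(10, 21, 6)),
  group = factor(rep(c("A", "B", "C"), each = 10))
)

# Bartlett's test (stats package, part of base R)
bt <- bartlett.test(x ~ group, data = dataframe)
print(bt)

# Median-centred Levene's test by hand:
# one-way ANOVA on absolute deviations from each group's median
dataframe$abs_dev <- abs(dataframe$x - ave(dataframe$x, dataframe$group, FUN = median))
levene <- anova(lm(abs_dev ~ group, data = dataframe))
print(levene)

# Cross-check against car::leveneTest when the car package is installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::leveneTest(x ~ group, data = dataframe))
}
```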
COMPARISON OF MEANS
• Dependent t-test
• Independent t-test
• ANOVA
INDEPENDENT VS. DEPENDENT SAMPLES

✦ A sampling method is independent when the individuals selected for one sample do not dictate which individuals are to be in a second sample.
✦ A sampling method is dependent when the individuals selected to be in one sample are used to determine the individuals to be in the second sample.
EXAMPLE 1

1. In an experiment conducted in a biology class, Miss Justinne measured the time required for 12 students to catch a falling meter stick using their dominant hand and their non-dominant hand. The goal of the study was to determine whether the reaction time in an individual's dominant hand is different from the reaction time in the non-dominant hand.
EXAMPLE 2

An urban economist believes that commute times to work in the South are shorter than commute times to work in the Midwest. He randomly selects 40 employed individuals in the South and 45 employed individuals in the Midwest and determines their commute times.
DEPENDENT SAMPLE T-TEST
The dependent sample t-test (also called the paired t-test or paired-samples t-test)
compares the means of two related groups to determine whether there is a statistically
significant difference between these means.
ASSUMPTIONS:

1. Your dependent variable should be measured at the interval or ratio level (i.e., they are
continuous).
2. Your independent variable should consist of two categorical, "related groups" or
"matched pairs”.
3. There should be no significant outliers in the differences between the two related groups.
4. The distribution of the differences in the dependent variable between the two related
groups should be approximately normally distributed.
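COMMAND FOR DEPENDENT SAMPLE T-TEST

t.test(a, b, mu = 0, alternative = "less", paired = TRUE, conf.level = 0.95)

“a” a numeric vector of data values
“b” a numeric vector of data values
With alternative = "less", the test checks whether the mean of the differences a - b is less than mu; use "greater" or "two.sided" for the other alternatives.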
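A paired t-test is the same as a one-sample t-test on the differences a - b. The sketch below, on hypothetical numbers made up for illustration, checks that both calls give the same t statistic and p-value:

```r
# Hypothetical paired measurements (illustration only)
a <- c(12.1, 14.3, 11.8, 15.0, 13.2, 12.7)
b <- c(13.0, 14.9, 12.5, 15.8, 13.1, 13.6)

paired   <- t.test(a, b, mu = 0, alternative = "less", paired = TRUE, conf.level = 0.95)
one_samp <- t.test(a - b, mu = 0, alternative = "less", conf.level = 0.95)

print(c(paired_t = unname(paired$statistic), one_sample_t = unname(one_samp$statistic)))
print(c(paired_p = paired$p.value, one_sample_p = one_samp$p.value))
```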
EXAMPLE

The teacher wants to know whether a new learning program will help to increase the number of correctly remembered words. Ten subjects learn a list of 50 words, and learning performance is measured using a recall test. After the first test, all subjects are instructed how to use the learning program and then learn a second list of 50 words. Learning performance is again measured with the recall test. The following table lists the number of correctly remembered words for both tests.
Subject Before After
1 ,, 26
2 18 24
3 32 31
4 14 17
5 16 17
6 22 25
7 26 25
8 19 24
9 19 22
10 22 23
STEP 1. STATE THE NULL AND ALTERNATIVE HYPOTHESIS
(μ1 = mean number of correctly remembered words before the program, μ2 = mean number after the program)
• Null hypothesis:
H0 : μ1 ≥ μ2

The new learning program will not help to increase the number of correctly remembered words.
• Alternative hypothesis:
Ha : μ1 < μ2

The new learning program will help to increase the number of correctly remembered words.
STEP 2. SET THE LEVEL OF SIGNIFICANCE 𝛼

𝛼=0.05
STEP 3. DETERMINE THE TEST DISTRIBUTION TO USE OR TEST THE ASSUMPTIONS

• Dependent variable: number of correctly remembered words
• Independent variable: treatment (Before and After)
STEP 4. CALCULATE THE TEST STATISTIC OR P-VALUE

Run the dependent sample t-test with a = Before scores, b = After scores, mu = 0, alternative = "less" and paired = TRUE, then read the t statistic and p-value from the output.
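A sketch of Steps 3 and 4 for this example in R. Because subject 1's Before score is missing in the table, this sketch uses only subjects 2 to 10, so its output is not the full-sample result. It checks normality of the paired differences and then runs the one-tailed paired test for Ha: μ1 < μ2.

```r
# Subjects 2-10 from the table (subject 1 is left out: its Before score is missing)
before <- c(18, 32, 14, 16, 22, 26, 19, 19, 22)
after  <- c(24, 31, 17, 17, 25, 25, 24, 22, 23)

# Assumption check: normality of the paired differences
sw <- shapiro.test(after - before)
print(sw)

# H0: mu1 >= mu2 vs Ha: mu1 < mu2 (mu1 = Before, mu2 = After)
res <- t.test(before, after, mu = 0, alternative = "less",
              paired = TRUE, conf.level = 0.95)
print(res)
```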
STEP 5. MAKE A STATISTICAL DECISION

If the p-value ≤ 0.05, reject the null hypothesis; otherwise, fail to reject the null hypothesis.
STEP 6. DRAW A CONCLUSION

There is sufficient evidence to support that the new learning program helps to increase the number of correctly remembered words.
EXAMPLE 2

Suppose a sample of n students were given a diagnostic test before studying a particular module and then again after completing the module. We want to find out whether, in general, our teaching leads to improvement in students' knowledge/skills, that is, whether test scores increase. We can use the results from the sample of students to draw conclusions about the impact of this module in general.

Let x = test score before the module, y = test score after the module.
Student Pre-module score Post-module score
1 18 22
2 21 25
3 16 17
4 22 24
5 19 16
6 24 29
7 17 20
8 21 23
9 23 19
10 18 20
11 14 15
12 16 15
13 16 18
14 19 26
15 18 18
16 20 24
17 12 18
18 22 25
19 15 19
20 17 16
STEP 1. STATE THE NULL AND ALTERNATIVE HYPOTHESIS
(μ1 = mean pre-module score, μ2 = mean post-module score)
• Null hypothesis:
H0 : μ1 = μ2
There is no significant difference in students' skills/knowledge between the pre-module and post-module scores.
• Alternative hypothesis:
Ha : μ1 ≠ μ2
There is a significant difference in students' skills/knowledge between the pre-module and post-module scores.
STEP 2. SET THE LEVEL OF SIGNIFICANCE 𝛼

𝛼=0.05
STEP 3. DETERMINE THE TEST DISTRIBUTION TO USE OR TEST THE ASSUMPTIONS
• Dependent variable: test score
• Independent variable: time of testing (pre-module and post-module)
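For a dependent t-test, the normality assumption applies to the paired differences. A sketch of this check for the pre-/post-module data in the table, taking d = post - pre:

```r
# Scores from the table (students 1-20)
pre  <- c(18, 21, 16, 22, 19, 24, 17, 21, 23, 18, 14, 16, 16, 19, 18, 20, 12, 22, 15, 17)
post <- c(22, 25, 17, 24, 16, 29, 20, 23, 19, 20, 15, 15, 18, 26, 18, 24, 18, 25, 19, 16)
d <- post - pre  # paired differences

# Numerical check: Shapiro-Wilk test on the differences
sw <- shapiro.test(d)
print(sw)

# Graphical check: normal Q-Q plot of the differences
qqnorm(d)
qqline(d)
```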
STEP 4. CALCULATE THE TEST STATISTIC OR P-VALUE

Run the dependent sample t-test with a = post-module scores, b = pre-module scores, mu = 0, alternative = "two.sided" and paired = TRUE, then read the t statistic and p-value from the output.
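A sketch of Step 4 for this example, entering the 20 pairs from the table. Putting the post-module scores first makes the mean difference post - pre positive when scores improve; the test is two-tailed to match Ha : μ1 ≠ μ2.

```r
pre  <- c(18, 21, 16, 22, 19, 24, 17, 21, 23, 18, 14, 16, 16, 19, 18, 20, 12, 22, 15, 17)
post <- c(22, 25, 17, 24, 16, 29, 20, 23, 19, 20, 15, 15, 18, 26, 18, 24, 18, 25, 19, 16)

# Two-tailed paired t-test on post - pre
res <- t.test(post, pre, mu = 0, alternative = "two.sided",
              paired = TRUE, conf.level = 0.95)
print(res)
```

With these data the mean difference is 2.05 and the statistic comes out at about t = 3.23 with 19 degrees of freedom (p < 0.01), which is consistent with rejecting the null hypothesis at α = 0.05.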
STEP 5. MAKE A STATISTICAL DECISION

If the p-value ≤ 0.05, reject the null hypothesis; otherwise, fail to reject the null hypothesis.
STEP 6. DRAW A CONCLUSION

The study is used to determine whether teaching helps students improve in terms of skills or knowledge. The result shows that there is strong evidence that, on average, teaching the module does lead to improvements.
INDEPENDENT T-TEST
INDEPENDENT SAMPLE T-TEST
The independent sample t - test allows researchers to evaluate or to compare the mean difference
between two populations using the data from two separate samples. It is used to test whether
population means are significantly different from each other, using the means from randomly drawn
samples.
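COMMAND FOR INDEPENDENT T-TEST

t.test(x ~ group, data = dataframe, mu = 0, alternative = "less", var.equal = TRUE, conf.level = 0.95)

“x” a numeric vector of data values.
“group” factor of the data (with exactly two levels).
“dataframe” the data frame that contains x and group.
With var.equal = TRUE, R runs the pooled-variance (Student) t-test; with var.equal = FALSE (the default), it runs Welch's t-test, which does not assume equal variances. The difference is taken as the first level of group minus the second.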
ASSUMPTIONS
1. Your dependent variable should be measured on a continuous scale (i.e., it is measured at the interval or ratio level).
2. Your independent variable should consist of two categorical, independent groups.
3. You should have independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves.
4. There should be no significant outliers.
5. Your dependent variable should be approximately normally distributed for each group of the independent variable.
6. There needs to be homogeneity of variances.
EXAMPLE

Researchers wanted to know whether there was a difference in comprehension among students learning a computer program based on the style of the text. They randomly divided 18 students into two groups of 9 each. The researchers verified that the 18 students were similar in terms of educational level, age, and so on. Group 1 learned the software using a visual manual (multimodal instruction), while Group 2 learned the software using a textual manual (unimodal instruction). The following data represent the scores the students received on an exam given to them after they studied from the manuals.
Visual Textual
51.08 64.55
57.03 57.6
44.85 68.59
75.21 50.75
56.87 49.63
75.28 43.58
57.07 57.4
80.3 49.48
52.2 49.57
STEP 1. STATE THE NULL AND ALTERNATIVE HYPOTHESIS
(μ1 = mean score of the visual group, μ2 = mean score of the textual group)
• Null hypothesis: H0 : μ1 = μ2
There is no significant difference between the scores of the students learning the computer program using the textual and the visual style.

• Alternative hypothesis: Ha : μ1 ≠ μ2
There is a significant difference between the scores of the students learning the computer program using the textual and the visual style.
STEP 2. SET THE LEVEL OF SIGNIFICANCE 𝛼

𝛼=0.05
STEP 3. DETERMINE THE TEST DISTRIBUTION TO USE.

• Dependent variable: scores
• Independent variable: style of text (Visual and Textual)

Check the homogeneity of variance.
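A sketch of the homogeneity check and the test itself for the visual/textual scores in the table, using Bartlett's test and then the pooled-variance t-test. The choice var.equal = TRUE is appropriate only if Bartlett's test does not reject equal variances. The factor levels are set so that the difference is Visual minus Textual.

```r
visual  <- c(51.08, 57.03, 44.85, 75.21, 56.87, 75.28, 57.07, 80.30, 52.20)
textual <- c(64.55, 57.60, 68.59, 50.75, 49.63, 43.58, 57.40, 49.48, 49.57)

scores <- data.frame(
  score = c(visual, textual),
  style = factor(rep(c("Visual", "Textual"), each = 9), levels = c("Visual", "Textual"))
)

# Homogeneity of variances
print(bartlett.test(score ~ style, data = scores))

# Two-tailed pooled-variance (Student) t-test
res <- t.test(score ~ style, data = scores, mu = 0, alternative = "two.sided",
              var.equal = TRUE, conf.level = 0.95)
print(res)
```

With these scores the t-test p-value is well above 0.05 (t is roughly 1.3 on 16 degrees of freedom), which agrees with failing to reject the null hypothesis.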


STEP 5. MAKING STATISTICAL DECISION
STEP 6. DRAW A CONCLUSION

There is no enough evidence to support that there is a difference


in comprehension among students learning a computer program
based on the style of the text.
EXAMPLE 2

• Apply the procedure for testing a hypothesis. Twenty participants were given a list of 20 words to process. The 20 participants were randomly assigned to one of two treatment conditions. Half were instructed to count the number of vowels in each word (shallow processing). Half were instructed to judge whether the object described by each word would be useful if one were stranded on a desert island (deep processing). After a brief distractor task, all subjects were given a surprise free recall task. Did the instruction affect the level of recall? The number of words correctly recalled was recorded for each subject. Here are the data:
ANALYSIS OF VARIANCE
ANOVA

One-way analysis of variance (ANOVA) is a method of testing the equality of three or more population means by analyzing sample variances.
• Ho : μ1 = μ2 = . . . = μk

• Ha : At least one of the population means differs from the others.
COMMAND FOR ANOVA TEST

This command corrects for non-homogeneity of variances, but it gives limited output: only F, the p-value, and the degrees of freedom for the numerator and denominator are reported, with no sums of squares or mean squares.
The command is in the package stats, which comes with base R and is loaded by default, so nothing needs to be downloaded.
oneway.test(x ~ group, data = dataframe, var.equal = FALSE)

The default is that equal variances are not assumed; to change this, set var.equal = TRUE.
“x” numeric vector of data values.
“group” factor of data.
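A sketch contrasting the two forms of oneway.test on hypothetical scores for three groups (the data frame and values are made up for illustration). With var.equal = TRUE the F statistic should match the one reported by aov.

```r
# Hypothetical scores for three groups (illustration only)
dataframe <- data.frame(
  x     = c(72, 80, 65, 78, 70,  85, 88, 79, 90, 84,  68, 75, 71, 66, 73),
  group = factor(rep(c("G1", "G2", "G3"), each = 5))
)

welch   <- oneway.test(x ~ group, data = dataframe, var.equal = FALSE)  # Welch's ANOVA (default)
classic <- oneway.test(x ~ group, data = dataframe, var.equal = TRUE)   # classic F-test
print(welch)
print(classic)
```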
COMMAND FOR ANOVA TEST

If you want information about the sums of squares and mean squares of the ANOVA, use this command (it assumes equal variances).
The command is in the package stats, which comes with base R and is loaded by default, so nothing needs to be downloaded.
summary(aov(x ~ group, data = dataframe))

“x” numeric vector of data values.
“group” factor of data.
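To show where the sums of squares and mean squares come from, this sketch computes the one-way ANOVA F statistic by hand (F = MS between / MS within) on hypothetical data made up for illustration, then compares it with summary(aov()).

```r
# Hypothetical scores for three groups (illustration only)
dataframe <- data.frame(
  x     = c(72, 80, 65, 78, 70,  85, 88, 79, 90, 84,  68, 75, 71, 66, 73),
  group = factor(rep(c("G1", "G2", "G3"), each = 5))
)

k <- nlevels(dataframe$group)  # number of groups
N <- nrow(dataframe)           # total number of observations

grand_mean  <- mean(dataframe$x)
group_means <- tapply(dataframe$x, dataframe$group, mean)
n_i         <- tapply(dataframe$x, dataframe$group, length)

ss_between <- sum(n_i * (group_means - grand_mean)^2)
ss_within  <- sum((dataframe$x - ave(dataframe$x, dataframe$group))^2)  # ave() gives each row's group mean
ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (N - k)
f_stat     <- ms_between / ms_within
p_value    <- pf(f_stat, k - 1, N - k, lower.tail = FALSE)

tab <- summary(aov(x ~ group, data = dataframe))[[1]]
print(tab)
print(c(F_by_hand = f_stat, p_by_hand = p_value))
```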
ASSUMPTIONS
1. Your dependent variable should be measured at the interval or ratio level (i.e., it is continuous).
2. Your independent variable should consist of two or more categorical, independent groups.
3. You should have independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves.
4. There should be no significant outliers.
5. Your dependent variable should be approximately normally distributed for each category of the independent variable.
6. There needs to be homogeneity of variances.
EXAMPLE 1

Researchers wanted to compare math test scores of students at the end of secondary school from
various cities. Eight randomly selected students from Makati, Manila, and Quezon City each were
administered the same exam; the results are presented in the following table. Can the researchers
conclude that the distribution of exam scores is different for each city at the level of significance 0.1?
STEP 1. STATE THE NULL AND ALTERNATIVE HYPOTHESIS
• Null hypothesis (Ho : μ1 = μ2 = μ3): There is no significant difference between the mathematics scores of students in the various cities.

• Alternative hypothesis (Ha : at least one mean differs): There is a significant difference between the mathematics scores of students in the various cities.
STEP 2. SET THE LEVEL OF SIGNIFICANCE 𝛼

𝛼=0.10
STEP 3. DETERMINE THE TEST DISTRIBUTION TO USE.

• Dependent variable: scores
• Independent variable: cities (Makati, Manila, Quezon City)

Check the homogeneity of variance.


STEP 4. CALCULATE THE P-VALUE
STEP 5. MAKING STATISTICAL DECISION
STEP 6. DRAW A CONCLUSION

There is enough evidence to support that the distribution


of exam scores of students in mathematics is different for
each city.
EXAMPLE 2

Apply the procedure for testing a hypothesis. A teacher is concerned about the level of knowledge of PUP students regarding Philippine history. Students completed a high school senior-level standardized history exam, and the academic major of each student was also recorded. Data in terms of percent correct are recorded below for 24 students. Is there a significant difference between the levels of knowledge of PUP students regarding Philippine history when grouped according to their academic major?
