
INFERENTIAL STATISTICS
HYPOTHESIS TESTING
ANALYSIS of VARIANCE (ANOVA)
It was developed by Ronald A. Fisher and is also known as the F-test.
It is a technique designed to test whether or not more than two samples or groups are significantly different from each other.
It is a logical extension of the t-test when we have data from 3 or more independent groups.
ANALYSIS of VARIANCE
Formulas:
1. TSS = ∑X² – (∑X)²/n
2. SSb = ∑[(∑Xi)²/r] – (∑X)²/n
3. SSw = TSS – SSb
4. MSSb = SSb/dfb
5. MSSw = SSw/dfw
6. Fc = MSSb/MSSw
ANALYSIS of VARIANCE
Formulas:
TSS = Total Sum of Squares
SSb = Between-column Sum of Squares
SSw = Within-column Sum of Squares
MSSb = Between-column Mean Sum of Squares
MSSw = Within-column Mean Sum of Squares
dft = total degrees of freedom
dfb = between-column degrees of freedom
dfw = within-column degrees of freedom
c = categories/groups
r = no. of rows
Fc = computed value of F-test
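For readers who want to check the arithmetic by machine, here is a minimal Python sketch of the six formulas above for the equal-rows case; the function name and data layout are illustrative only and are not part of the original slides.

# Minimal sketch of the ANOVA formulas above (equal number of rows per group).
def one_way_anova(groups):
    """groups: a list of equally sized lists of scores; returns the computed F."""
    c = len(groups)                    # number of categories/groups
    r = len(groups[0])                 # number of rows per group
    n = c * r                          # total number of observations
    all_x = [x for g in groups for x in g]

    sum_x = sum(all_x)
    sum_x2 = sum(x * x for x in all_x)

    tss = sum_x2 - sum_x ** 2 / n                                 # Total Sum of Squares
    ssb = sum(sum(g) ** 2 for g in groups) / r - sum_x ** 2 / n   # Between-column SS
    ssw = tss - ssb                                               # Within-column SS

    dfb, dfw = c - 1, n - c
    mssb, mssw = ssb / dfb, ssw / dfw
    return mssb / mssw                                            # Fc

# Example 1 below (Schools A, B and C) should give Fc of about 0.37.
schools = [
    [86, 82, 78, 90, 88, 79, 77, 87, 95, 83],
    [80, 85, 81, 84, 86, 75, 80, 83, 79, 92],
    [83, 89, 84, 80, 82, 78, 76, 85, 87, 94],
]
print(one_way_anova(schools))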
ANALYSIS of VARIANCE
1. A study was conducted to determine if there is a significant difference in the mean achievement scores of students from three different schools. The results are shown in the table below.

Student No.   School A     A²     School B     B²     School C     C²
     1           86       7396       80       6400       83       6889
     2           82       6724       85       7225       89       7921
     3           78       6084       81       6561       84       7056
     4           90       8100       84       7056       80       6400
     5           88       7744       86       7396       82       6724
     6           79       6241       75       5625       78       6084
     7           77       5929       80       6400       76       5776
     8           87       7569       83       6889       85       7225
     9           95       9025       79       6241       87       7569
    10           83       6889       92       8464       94       8836
   ∑Xi          845     71 701      825     68 257      838     70 480

∑X = 2 508          ∑X² = 210 438
ANALYSIS of VARIANCE
1. Ho: There is no significant difference in the mean achievement scores of students from three different schools. (X1 = X2 = X3)
   Ha: There is a significant difference in the mean achievement scores of students from three different schools. (X1 ≠ X2 ≠ X3)
2. α = 0.05
3. F-test for an equal number of rows
4. Critical/Tabular Value: Ft = 3.35 (dfb = 2, dfw = 27)
   dft = n – 1 = 30 – 1 = 29
   dfb = c – 1 = 3 – 1 = 2 (c = number of categories)
   dfw = dft – dfb = 29 – 2 = 27
ANALYSIS of VARIANCE
5. Solutions:
TSS = ∑X² – (∑X)²/n = 210 438 – (2 508)²/30 = 769.2
SSb = ∑[(∑Xi)²/r] – (∑X)²/n = (845² + 825² + 838²)/10 – (2 508)²/30 = 20.6
SSw = TSS – SSb = 769.2 – 20.6 = 748.6
MSSb = SSb/dfb = 20.6/2 = 10.3
ANALYSIS of VARIANCE
MSSw = SSw/dfw = 748.6/27 = 27.73
Fc = MSSb/MSSw = 10.3/27.73 = 0.37
6. Decision Rule/Conclusion:
Since Fc = 0.37 < Ft = 3.35, accept the null hypothesis. This implies that there is no significant difference in the mean achievement scores of students from three different schools.
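If SciPy is available, the same conclusion can be cross-checked with its built-in one-way ANOVA; this is only a verification sketch and not part of the original solution.

from scipy import stats

school_a = [86, 82, 78, 90, 88, 79, 77, 87, 95, 83]
school_b = [80, 85, 81, 84, 86, 75, 80, 83, 79, 92]
school_c = [83, 89, 84, 80, 82, 78, 76, 85, 87, 94]

f_stat, p_value = stats.f_oneway(school_a, school_b, school_c)
print(f_stat)   # about 0.37, matching the computed Fc above
print(p_value)  # well above 0.05, so the null hypothesis is not rejected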
2. Three sections of the same Mathematics I course are taught by 3 instructors. The final grades were as follows:

        Instructor
    A        B        C
   73       88       68
   89       78       79
   82       91       91
   80       85       71
   73       74       71
            77       87
            78       59
            62
            76
            96
  397      805      526      ∑X = 1 728
31 703   65 659   40 278     ∑X² = 137 640

Is there a significant difference in the average grades given by the 3 instructors using a 1% level of significance?
ANALYSIS of VARIANCE
1. Ho: There is no significant difference in the average grades given by the 3 instructors.
   Ha: There is a significant difference in the average grades given by the 3 instructors.
2. α = 1% or 0.01
3. F-test for an unequal number of rows
4. Critical/Tabular Value: Ft = 5.93
   dft = n – 1 = 22 – 1 = 21
   dfb = c – 1 = 3 – 1 = 2
   dfw = dft – dfb = 21 – 2 = 19
ANALYSIS of VARIANCE
5. Solutions:
TSS = ∑X² – (∑X)²/n = 137 640 – (1 728)²/22 = 1 913.45
SSb = ∑[(∑Xi)²/ri] – (∑X)²/n = (397²/5 + 805²/10 + 526²/7) – (1 728)²/22 = 122.90
SSw = TSS – SSb = 1 913.45 – 122.90 = 1 790.55
MSSb = SSb/dfb = 122.90/2 = 61.45
ANALYSIS of VARIANCE
MSSw = SSw/dfw = 1 790.55/19 = 94.24
Fc = MSSb/MSSw = 61.45/94.24 = 0.65
6. Decision Rule/Conclusion:
Since Fc = 0.65 < Ft = 5.93, accept the null hypothesis. This shows that there is no significant difference in the average grades given by the 3 instructors.
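As with the first example, the result can be cross-checked with SciPy, which handles groups of unequal size directly; again this is only a verification sketch assuming SciPy is installed.

from scipy import stats

instructor_a = [73, 89, 82, 80, 73]
instructor_b = [88, 78, 91, 85, 74, 77, 78, 62, 76, 96]
instructor_c = [68, 79, 91, 71, 71, 87, 59]

f_stat, p_value = stats.f_oneway(instructor_a, instructor_b, instructor_c)
print(f_stat)   # about 0.65, matching the computed Fc above
print(p_value)  # well above 0.01, so the null hypothesis is not rejected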
ANALYSIS of VARIANCE
3. Determine if there is a significant difference among the entrance examination scores obtained by 5 students from 4 different sections.

Section 1   Section 2   Section 3   Section 4
    45          35          65          54
    34          29          55          38
    30          58          34          67
    28          60          32          40
    40          32          25          50
ANALYSIS of VARIANCE
1. Ho: There is no significant difference among the entrance
examination scores obtained by 5 students from 4 different
sections.
2. Ha: There is a significant difference among the entrance examination
scores obtained by 5 students from 4 different sections.
3. 2.  = 5% or 0.05
3. F-test for equal number of rows
4. Critical/Tabular Value: Ft = 4.08

dft = n – 1 dfb = c – 1
dfw = dft – dfb
dft = 20 – 1 = 19 dfb = 4 – 1 = 3 dfw
= 19 – 3 = 16
ANALYSIS of VARIANCE
5. Solutions:

Section 1   Section 2   Section 3   Section 4
    45          35          65          54
    34          29          55          38
    30          58          34          67
    28          60          32          40
    40          32          25          50
   177         214         211         249      ∑X = 851
 6 465      10 054      10 055      12 949      ∑X² = 39 523
ANALYSIS of VARIANCE
Solutions (∑X = 851, ∑X² = 39 523):
TSS = ∑X² – (∑X)²/n = 39 523 – (851)²/20 = 3 312.95
SSb = ∑[(∑Xi)²/r] – (∑X)²/n = (177² + 214² + 211² + 249²)/5 – (851)²/20 = 519.35
SSw = TSS – SSb = 3 312.95 – 519.35 = 2 793.6
MSSb = SSb/dfb = 519.35/3 = 173.12
ANALYSIS of VARIANCE
MSSw = SSw/dfw = 2 793.6/16 = 174.6
Fc = MSSb/MSSw = 173.12/174.6 = 0.99
6. Decision Rule/Conclusion:
Since Fc = 0.99 < Ft = 4.08, accept the null hypothesis. This means that there is no significant difference among the entrance examination scores obtained by 5 students from 4 different sections.
THE CHI–SQUARE TEST (X²)
The CHI–SQUARE TEST (X²) can be used to determine whether a set of observed frequencies on one variable is the same as the expected frequencies on that variable.
APPLICATIONS OF THE CHI–SQUARE: The chi–square test is considered a unique
test due to its applications which are as follows:
1. TEST OF GOODNESS–OF–FIT is performed in order to determine
if a set of observed data corresponds to some theoretical
distribution. It is used for one-way classification.
Example: We have bags of candy with five flavors in each bag. The bags should contain an equal number of pieces of each flavor. The idea we'd like to test is that the proportions of the five flavors in each bag are the same.
APPLICATIONS OF THE CHI–SQUARE:
2. TEST OF HOMOGENEITY is used when there are two or more samples and one criterion variable. It is used to determine if two or more populations or data distributions are similar with respect to a particular criterion variable.
The test of homogeneity also compares the proportions of responses from two or more populations with regard to a dichotomous variable (e.g., male/female, yes/no) or a variable with more than two outcome categories.
APPLICATIONS OF THE CHI–SQUARE:
Consider the table below. It shows that people who live with others are marginally more likely to be on a diet, but are much less likely to watch what they eat and drink and are much more likely to eat and drink whatever they feel like. However, only 32 in the table are classified as living alone, so it is likely that these results reflect a relatively high degree of sampling error.

                                       Living Alone   Living with Others
On a diet                                    6                 8
Watch what I eat and drink                  72                49
Eat and drink whatever I feel like          22                42
Total                                      100               100

A chi-square test of homogeneity tests whether differences in a table like this are consistent with sampling error.
APPLICATIONS OF THE CHI–SQUARE:
3. TEST OF INDEPENDENCE is used when there is one sample and two criterion variables. This test is used to see if measures taken on the two criterion variables are independent of, or associated with, one another in a given population.

The chi-square test of independence checks whether two


variables are likely to be related or not. We have counts
for two categorical or nominal variables. We also have an
idea that the two variables are not related. The test gives
us a way to decide if our idea is plausible or not.
APPLICATIONS OF THE CHI–SQUARE:
Examples:
We have a list of movie genres; this is our first variable. Our
second variable is whether or not the patrons of those genres
bought snacks at the theater. Our idea (or, in statistical terms,
our null hypothesis) is that the type of movie and whether or
not people bought snacks are unrelated. The owner of the
movie theater wants to estimate how many snacks to buy. If
movie type and snack purchases are unrelated, estimating will
be simpler than if the movie types impact snack sales.
A veterinary clinic has a list of dog breeds they see as patients. The second variable is whether owners feed dry food, canned food or a mixture. Our idea is that the dog breed and the type of food are unrelated. If this is true, then knowing a dog's breed does not help us predict the type of food its owner buys.
DETERMINING THE DEGREES OF FREEDOM
• FOR ONE–WAY CLASSIFICATION – it has only one variable described by at least two categories
  df = c – 1, where: c = number of categories
• FOR TWO–WAY CLASSIFICATION – it has two variables described by their corresponding categories (it is always presented in a contingency table)
  df = (c – 1)(r – 1), where: c = number of columns, r = number of rows
FORMULAS:
1. X² = ∑(O – E)²/E (Formula 1)
   For a contingency table (two-way classification): E = (row total)(column total)/(grand total)
   For a one-way classification: E = np, where n = total frequency and p = expected proportion
2. X² = ∑(|O – E| – 0.5)²/E (Formula 2, used only when the degrees of freedom equal 1)
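To make the formulas concrete, here is a minimal Python sketch of Formula 1 for a one-way classification (E = np) and of Formula 2 with the 0.5 correction; the function names are illustrative and not part of the original slides.

# Minimal sketch of the chi-square formulas above; names are illustrative.
def chi_square_one_way(observed, proportions):
    """Formula 1 for a one-way classification, with E = n*p."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_corrected(observed, expected):
    """Formula 2 (with the 0.5 correction), used only when df = 1."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

# Example 1 below (civil status of 50 employees) should give about 8.16.
print(chi_square_one_way([18, 24, 5, 3], [0.5, 0.3, 0.1, 0.1]))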
EXAMPLES:
1. The table below shows the civil status of 50 employees of a certain company.

Civil Status          f
Single               18
Married              24
Widowed               5
Legally Separated     3
Total                50

Based on the data, is the actual observed proportion significantly different from the expected proportion, if the ideal or expected proportion is 30% married, 50% single, 10% legally separated, and 10% widowed, using a 5% level of significance?
THE CHI–SQUARE TEST (X²)
1. Ho: The actual observed proportion is not significantly different from the expected proportion.
   Ha: The actual observed proportion is significantly different from the expected proportion.
2. α = 5% or 0.05
3. Type/Kind of test: Chi-Square Test for one-way classification
4. Critical/Tabular Value: X²t = 7.815
   df = c – 1 = 4 – 1 = 3
5. Solutions:

Civil Status          f
Single               18
Married              24
Widowed               5
Legally Separated     3
Total                50

Given: n = 50, E = np, with the expected proportions:
S = 50%:   E1 = 50(0.5) = 25
M = 30%:   E2 = 50(0.3) = 15
W = 10%:   E3 = 50(0.1) = 5
LS = 10%:  E4 = 50(0.1) = 5
5. Solutions (continued):
E1 = 25   O1 = 18
E2 = 15   O2 = 24      Use the formula X² = ∑(O – E)²/E
E3 = 5    O3 = 5
E4 = 5    O4 = 3

Compute the individual X² terms:
X²1 = (18 – 25)²/25 = 1.96      X²3 = (5 – 5)²/5 = 0
X²2 = (24 – 15)²/15 = 5.4       X²4 = (3 – 5)²/5 = 0.8

X²c = 1.96 + 5.4 + 0 + 0.8 = 8.16
6. DECISION RULE/CONCLUSION:
Since X²c = 8.16 > X²t = 7.815, reject Ho and accept Ha. The actual observed proportion is significantly different from the expected proportion.
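Assuming SciPy is available, the same goodness-of-fit computation can be cross-checked with scipy.stats.chisquare; this verification is not part of the original solution.

from scipy import stats

observed = [18, 24, 5, 3]                          # single, married, widowed, legally separated
expected = [50 * p for p in (0.5, 0.3, 0.1, 0.1)]

chi2, p_value = stats.chisquare(observed, f_exp=expected)
print(chi2)     # about 8.16, matching X²c above
print(p_value)  # below 0.05, so Ho is rejected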
EXAMPLE 2: Determine if sex is related to work performance as indicated by the data below, using a 0.01 level of significance.

Work performance    Low   Average   High
Sex
Male                 35      46       53
Female               30      57       48
1. Ho: Sex is not related to work performance.
   Ha: Sex is related to work performance.
2. α = 0.01
3. Type/Kind of test: Chi-Square Test for two-way classification
4. Critical/Tabular Value: X²t = 9.210
   df = (c – 1)(r – 1) = (3 – 1)(2 – 1) = 2
5. Solutions:

Work performance    Low   Average   High   Row Total
Sex
Male                 35      46       53      134
Female               30      57       48      135
Column Total         65     103      101      269

Note: This is a contingency table (two-way classification), so use Formula 1 to find the expected frequencies (E). Grand Total = 269.
Work performance      Low       Average      High      Row Total
Sex
Male                35 = O1     46 = O3     53 = O5       134
Female              30 = O2     57 = O4     48 = O6       135
Column Total           65         103         101         269

E = (row total)(column total)/(grand total)

E1 = (65)(134)/269 = 32.38       E2 = (65)(135)/269 = 32.62
E3 = (103)(134)/269 = 51.31      E4 = (103)(135)/269 = 51.69
E5 = (101)(134)/269 = 50.31      E6 = (101)(135)/269 = 50.69
O1 = 35      O2 = 30      O3 = 46      O4 = 57      O5 = 53      O6 = 48
E1 = 32.38   E2 = 32.62   E3 = 51.31   E4 = 51.69   E5 = 50.31   E6 = 50.69

Note: The degrees of freedom is greater than 1, so use Formula 1 to compute X².

X²1 = (35 – 32.38)²/32.38 = 0.212      X²4 = (57 – 51.69)²/51.69 = 0.545
X²2 = (30 – 32.62)²/32.62 = 0.210      X²5 = (53 – 50.31)²/50.31 = 0.144
X²3 = (46 – 51.31)²/51.31 = 0.550      X²6 = (48 – 50.69)²/50.69 = 0.143

X²c = 0.212 + 0.210 + 0.550 + 0.545 + 0.144 + 0.143 = 1.804
6. DECISION RULE/CONCLUSION:
Since X²c = 1.804 < X²t = 9.210, accept Ho. Sex is not related to work performance.
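Assuming SciPy is available, the whole table can also be fed to scipy.stats.chi2_contingency, which computes the expected frequencies and the statistic in one call; this is only a cross-check of the hand computation above.

from scipy import stats

table = [[35, 46, 53],   # Male: low, average, high
         [30, 57, 48]]   # Female: low, average, high

chi2, p_value, df, expected = stats.chi2_contingency(table)
print(chi2)      # about 1.80, matching X²c above
print(df)        # 2
print(expected)  # the same E values computed above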

NOTE: Formula 2 for finding X² will only be used if the degrees of freedom is equal to 1.

Example 3: Two treatments were tried on patients afflicted with a certain disease. A random sample of 200 patients was taken and the results are shown below.

              Recovered   Did not recover
Treatment A       86             14
Treatment B       80             20

Test the hypothesis that recovery from the disease is independent of the treatment at the 2.5% significance level.
1. Ho: Recovery from the disease is independent of the type of treatment.
   Ha: Recovery from the disease is dependent on the type of treatment.
2. α = 0.025
3. Type/Kind of test: Chi-Square Test for two-way classification
4. Critical/Tabular Value: X²t = 5.024
   df = (r – 1)(c – 1) = (2 – 1)(2 – 1) = 1

5. Solutions:

               Recovered   Did not recover   Row Total
Treatment A     86 = O1        14 = O3          100
Treatment B     80 = O2        20 = O4          100
Column Total      166             34            200

Use Formula 1 to find E: E = (row total)(column total)/(grand total)
E1 = E2 = (100)(166)/200 = 83
E3 = E4 = (100)(34)/200 = 17

Since df = 1, use Formula 2 to compute X²:
X²1 = (|86 – 83| – 0.5)²/83 = 0.075      X²3 = (|14 – 17| – 0.5)²/17 = 0.368
X²2 = (|80 – 83| – 0.5)²/83 = 0.075      X²4 = (|20 – 17| – 0.5)²/17 = 0.368

X²c = 0.075(2) + 0.368(2) = 0.886

6. Decision Rule/Conclusion:
Since X²c = 0.886 < X²t = 5.024, accept Ho. The recovery of patients afflicted with the disease is independent of the treatment.
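For a 2 × 2 table, scipy.stats.chi2_contingency applies the same 0.5 (Yates) correction by default, so it can serve as a cross-check here; assuming SciPy is installed, this is only a verification sketch.

from scipy import stats

table = [[86, 14],   # Treatment A: recovered, did not recover
         [80, 20]]   # Treatment B: recovered, did not recover

chi2, p_value, df, expected = stats.chi2_contingency(table)  # correction applied when df = 1
print(chi2)     # about 0.886, matching X²c above
print(p_value)  # above 0.025, so Ho is not rejected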
SIMPLE REGRESSION & CORRELATION ANALYSIS
LINEAR REGRESSION ANALYSIS is used when there is a significant relationship between two variables (x and y). It is also used to predict one variable with the knowledge of another variable, that is, to predict the value of y given the value of x.

Regression Line Equation: y = a + bx, which can be rearranged as x = (y – a)/b

where: y = the predicted score (dependent variable)
       a = the y-intercept
       b = the slope of the regression line
       x = the independent (predictor) variable
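The slides obtain a and b from a calculator; the least-squares formulas behind those keystrokes can be written out as a short Python sketch. The function name is illustrative and the formulas are the standard least-squares estimates, which are not shown explicitly in the slides.

# Minimal sketch of the least-squares estimates for a and b in y = a + bx.
def fit_line(xs, ys):
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # y-intercept
    return a, b

Calling fit_line on the midterm/final grades of Example 1 below should reproduce a of about 2.018 and b of about 1.012.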
SIMPLE REGRESSION & CORRELATION ANALYSIS
Example 1. The data below represent the Midterm and the Final grades of 8 students during a particular semester:

Student   Midterm Grade (x)   Final Grade (y)
   1             83                 86
   2             87                 91
   3             79                 80
   4             77                 80
   5             85                 88
   6             78                 82
   7             94                 96
   8             90                 94

a. Find the equation of the regression line.
b. Estimate the final grade of a student whose midterm grade is 89.
SIMPLE REGRESSION & CORRELATION ANALYSIS
STEPS using the calculator:
1. Press MODE 2 (STAT).
2. Press 2 (A + Bx).
3. Enter the values of x and y in the X and Y columns (83 and 86, 87 and 91, and so on).
4. Press AC when all the data have been entered.
5. To find A, press SHIFT 1 5 1 =.   A = 2.018
SIMPLE REGRESSION & CORRELATION ANALYSIS
6. To solve for B, press SHIFT 1 5 2 =.   B = 1.012
7. To find r, press SHIFT 1 5 3 =.        r = 0.99

a. y = a + bx, so the regression line is y = 2.018 + 1.012x

b. x = 89, y = ?
   y = 2.018 + 1.012(89)
   y = 92.09

c. y = 85, x = ?
   85 = 2.018 + 1.012x
   x = (85 – 2.018)/1.012
   x = 82
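The calculator results can be reproduced in a few lines of Python with NumPy (assuming NumPy is installed); this is only a cross-check of the values above.

import numpy as np

midterm = np.array([83, 87, 79, 77, 85, 78, 94, 90])
final = np.array([86, 91, 80, 80, 88, 82, 96, 94])

b, a = np.polyfit(midterm, final, 1)    # slope first, then intercept
r = np.corrcoef(midterm, final)[0, 1]
print(a, b, r)        # roughly a = 2.02, b = 1.01, r = 0.99
print(a + b * 89)     # about 92.06, close to the 92.09 obtained above from the rounded coefficients
print((85 - a) / b)   # about 82, the midterm grade that corresponds to y = 85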
SIMPLE REGRESSION & CORRELATION ANALYSIS
STEPS (for a calculator whose MODE menu shows COMP, SD, REG):
1. Press MODE 3 (REG), then 1 (Lin).
2. Enter the data pairs:
   83, 86 M+   (n = 1)
   87, 91 M+   (n = 2)
   79, 80 M+   (n = 3)
   77, 80 M+   (n = 4)
   85, 88 M+   (n = 5)
   78, 82 M+   (n = 6)
   94, 96 M+   (n = 7)
   90, 94 M+   (n = 8)
3. Press SHIFT S-VAR, then:
   To find A, press SHIFT 2 > > 1 =   (A = 2.02)
   To find B, press SHIFT 2 > > 2 =   (B = 1.01)
   To find r, press SHIFT 2 > > 3 =   (r = 0.99)
SIMPLE REGRESSION & CORRELATION ANALYSIS
2. The raw scores obtained by 13 students in Problem Solving and Statistics are given below.

Student              1   2   3   4   5   6   7   8   9  10  11  12  13
Problem Solving (x) 17  25  23  34  18  36  35  19  27  25  26  29  31
Statistics (y)      20  21  19  25  17  29  27  18  25  23  24  25  29

a. What is the simple linear regression function?
b. Determine the score in Statistics of the student whose score in Problem Solving is 38.
c. If the score of the student in Statistics is 32, find his corresponding score in Problem Solving.

a = 8.21   b = 0.57
SIMPLE REGRESSION & CORRELATION ANALYSIS
a = 8.21   b = 0.57
a. What is the simple linear regression function?
   y = a + bx, so y = 8.21 + 0.57x
b. Determine the score in Statistics of the student whose score in Problem Solving is 38. (x = 38, y = ?)
   y = 8.21 + 0.57(38) = 29.87
c. If the score of the student in Statistics is 32, find his corresponding score in Problem Solving. (y = 32, x = ?)
   32 = 8.21 + 0.57x
   x = (y – a)/b = (32 – 8.21)/0.57 = 41.74
SIMPLE REGRESSION & CORRELATION ANALYSIS
SIMPLE CORRELATION ANALYSIS is a technique used to describe the relationship or association between two variables.

PEARSON PRODUCT–MOMENT CORRELATION (PEARSON r) is an index of relationship between two variables. It may be obtained if we want to know the degree of relationship between two variables which are measured on an interval scale.
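The slides obtain r from the calculator; for completeness, the standard Pearson r computation behind that keystroke can be sketched as follows (the function name is illustrative and the formula itself is not shown in the slides).

import math

# Minimal sketch of the standard Pearson r computation.
def pearson_r(xs, ys):
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

Applied to the midterm and final grades of Example 1 above, this should give roughly r = 0.99, matching the calculator output.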
SIMPLE REGRESSION & CORRELATION ANALYSIS
PROPERTIES OF PEARSON r
1. The correlation coefficient may be positive or negative.
2. The magnitude of r is simply a measure of how closely the points cluster about a certain trend line, which we know as the LEAST SQUARES LINE or the SIMPLE REGRESSION LINE.
3. A positive correlation is present when high values in one variable are associated with high values of another variable, or vice versa.
SIMPLE REGRESSION & CORRELATION ANALYSIS
PROPERTIES OF PEARSON r
4. When high values on one variable are associated with low values of the other variable, or vice versa, a negative correlation is present.
5. A perfect positive correlation is represented by a value of positive 1.
6. A perfect negative correlation is represented by a value of negative 1.
1. A study is conducted on the relationship of the number of absences (X)
and the grades (Y) of 10 students in Mathematics. Using r at 5% level of
significance, is there a significant relationship between the number of
absences and the grades of the 10 students in a Mathematics class?

Number of Absences (x) Grades in Math (y)


1 90
2 87
2 85
3 80
3 83
8 68
6 73
1 95
4 80
5 78
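Assuming SciPy is available, a quick way to explore this problem is scipy.stats.pearsonr, which returns both r and a two-sided p-value that can be compared with the 0.05 level; this sketch is offered as a check, not as the worked solution.

from scipy import stats

absences = [1, 2, 2, 3, 3, 8, 6, 1, 4, 5]
grades   = [90, 87, 85, 80, 83, 68, 73, 95, 80, 78]

r, p_value = stats.pearsonr(absences, grades)
print(r)        # strongly negative: more absences go with lower grades
print(p_value)  # compare with 0.05 to decide whether to reject Ho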
THANK
YOU!!!
