www.byjusexamprep.com
Analysis of Variance (ANOVA)
• Given by Sir Ronald Fisher.
• The principal aim of statistical models is to explain the variation in measurements.
• The statistical model for testing the significance of the difference in mean values of a
variable between two groups is Student's 't' test. If there are more than two groups, the
appropriate statistical model is Analysis of Variance (ANOVA).
Assumptions for ANOVA
1. The sample populations can be reasonably approximated by a normal distribution.
2. All populations have the same standard deviation.
3. Individuals in the population are selected randomly.
4. The samples are independent.
● ANOVA compares variances by means of a simple ratio, called the F-ratio:
F = Variance between groups / Variance within groups
• The resulting F statistic is then compared with a critical value of F (F critical), obtained from
F tables in much the same way as was done with 't'.
• If the calculated value exceeds the critical value for the chosen level of α, the null
hypothesis is rejected.
• An F test is therefore a test of the ratio of variances. F tests can also be used on their own,
independently of the ANOVA technique, to test hypotheses about variances.
• In ANOVA, the F test is used to establish whether a statistically significant difference exists
in the data being tested.
• ANOVA can be
❑ One Way ANOVA
⮚ If the various experimental groups differ in terms of only one factor at a time, a one-way
ANOVA is used.
e.g. A study to assess the effectiveness of four different antibiotics on S. sanguis.
❑ Two Way ANOVA
⮚ If the various groups differ in terms of two or more factors at a time, then a two-way
ANOVA is performed.
e.g. A study to assess the effectiveness of four different antibiotics on S. sanguis in three
different age groups.
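The between-groups/within-groups F-ratio described above can be computed from first principles. The sketch below uses hypothetical zone-of-inhibition readings for four antibiotics; the numbers are illustrative assumptions, not data from the text:

```python
# One-way ANOVA F-ratio computed from first principles (illustrative sketch;
# the four groups below are hypothetical data).
def one_way_anova_f(groups):
    """Return the F-ratio: variance between groups / variance within groups."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group variance (mean square between, df = k - 1)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ms_between = ss_between / (k - 1)

    # Within-group variance (mean square within, df = n - k)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ms_within = ss_within / (n - k)

    return ms_between / ms_within

# Hypothetical readings for four antibiotics
groups = [[25, 27, 26], [30, 32, 31], [24, 23, 25], [35, 36, 34]]
F = one_way_anova_f(groups)
print(round(F, 2))   # → 74.0
```

The resulting F would then be compared against the F-table critical value for (k − 1, n − k) = (3, 8) degrees of freedom at the chosen α.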
Pearson's Correlation Coefficient
Karl Pearson's coefficient is the most popular and widely used measure of correlation; it
quantifies correlation, within specified limitations, through an ideal measure of covariance. The
coefficient of correlation always ranges between +1 and –1. A value of +1 or –1 indicates a
complete (perfect positive or perfect negative) correlation, and zero (0) indicates no correlation
at all. It is popularly called Karl Pearson's coefficient of correlation, or the Pearsonian
correlation. The formula used under this method is:
By the direct method (actual mean):
r = Σxy / (n × σ1 × σ2)
Where: r = Karl Pearson's coefficient of correlation
x and y = deviations of individual items of the series from their means
n = the number of terms in the series
σ1 and σ2 = standard deviations of the first and second series
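The direct-method formula can be checked with a small pure-Python sketch; the two series below are made-up data for illustration:

```python
# Pearson's r by the direct (actual-mean) method: r = Σxy / (n·σ1·σ2),
# where x and y are deviations from the respective means.
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]              # deviations of x from its mean
    dy = [y - my for y in ys]              # deviations of y from its mean
    s1 = sqrt(sum(d * d for d in dx) / n)  # population SD of first series
    s2 = sqrt(sum(d * d for d in dy) / n)  # population SD of second series
    return sum(a * b for a, b in zip(dx, dy)) / (n * s1 * s2)

xs = [10, 20, 30, 40, 50]
ys = [12, 25, 33, 41, 54]
r = pearson_r(xs, ys)
print(round(r, 4))   # ≈ 0.995, a near-perfect positive correlation
```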
The Kruskal-Wallis H Test
• The Kruskal-Wallis H Test is a non-parametric procedure that can be used to compare
more than two populations in a completely randomized design.
• All n = n1 + n2 + ... + nk measurements are jointly ranked (i.e. treated as one large sample).
• We use the sums of the ranks of the k samples to compare the distributions.
✔ Rank the total measurements in all k samples from 1 to n. Tied observations are assigned
the average of the ranks they would have received if not tied.
✔ Calculate
▪ Ti = rank sum for the ith sample, i = 1, 2, ..., k
✔ And the test statistic
H = [12 / (n(n + 1))] × Σ (Ti² / ni) − 3(n + 1)
H0: the k distributions are identical versus
Ha : at least one distribution is different
Test statistic: Kruskal-Wallis H
When H0 is true, the test statistic H has an approximate chi-square distribution with df
= k-1.
Use a right-tailed rejection region or p-value based on the Chi-square distribution.
Example
Four groups of students were randomly assigned to be taught with four different techniques, and
their achievement test scores were recorded. Are the distributions of test scores the same, or do
they differ in location?
1 2 3 4
65 75 59 94
87 69 78 89
73 83 67 80
79 81 62 88
Teaching Methods
1 2 3 4
65 (3) 75 (7) 59 (1) 94 (16)
87 (13) 69 (5) 78 (8) 89 (15)
73 (6) 83 (12) 67 (4) 80 (10)
79 (9) 81 (11) 62 (2) 88 (14)
Ti 31 35 15 55
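The H statistic for this example can be checked directly from the rank sums in the table (a minimal pure-Python sketch):

```python
# Kruskal-Wallis H for the teaching-methods example, computed from the rank
# sums T = 31, 35, 15, 55 with n_i = 4 per group and n = 16 overall.
def kruskal_wallis_h(rank_sums, group_sizes):
    n = sum(group_sizes)
    return (12 / (n * (n + 1))) * sum(
        t * t / ni for t, ni in zip(rank_sums, group_sizes)
    ) - 3 * (n + 1)

H = kruskal_wallis_h([31, 35, 15, 55], [4, 4, 4, 4])
print(round(H, 3))   # → 8.956
```

With k = 4 groups, df = k − 1 = 3; since H ≈ 8.956 exceeds the chi-square critical value of 7.815 at α = 0.05, H0 is rejected: at least one distribution of test scores differs in location.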
Key Concepts
1. Nonparametric Methods
These methods can be used when:
• the data cannot be measured on a quantitative scale;
• the numerical scale of measurement is arbitrarily set by the researcher; or
• the parametric assumptions, such as normality or constant variance, are seriously violated.
Kruskal-Wallis H Test: Completely Randomized Design
1. Jointly rank all the observations in the k samples (treated as one large sample of size n).
Calculate the rank sums, Ti = rank sum of sample i, and the test statistic
H = [12 / (n(n + 1))] × Σ (Ti² / ni) − 3(n + 1)
2. If the null hypothesis of equality of distributions is false, H will be unusually large, resulting in
a one-tailed test
3. For sample sizes of five or greater, the rejection region for H is based on the chi-square
distribution with (k – 1) degrees of freedom.
Mann-Whitney U test:
the nonparametric equivalent of a t test for two independent samples
Use when:
• The data do not support means (ordinal scale).
• The data are not normally distributed.
1) Rank all data.
2) Evaluate if ranks tend to cluster within a group.
Mann-Whitney U test:
U1 = (n1)(n2) + n1(n1 + 1)/2 − R1
U2 = (n1)(n2) + n2(n2 + 1)/2 − R2
Where: n1 = size of sample one
n2 = size of sample two
R1 and R2 = rank sums of samples one and two
Evaluation of Mann Whitney U
1) Choose the smaller of the two U values.
2) Find the critical value (Mann-Whitney table).
3) When the computed value is smaller than the critical value, the outcome is significant.
group 1 group 2
24 28
18 42
45 63
57 57
12 90
30 68
Step One: Rank all data across groups
group 1 group 2
24 28
18 2 42
45 63
57 57
12 1 90
30 68
group 1 group 2
24 3 28 4
18 2 42 6
45 7 63 10
57 8.5 57 8.5
12 1 90 12
30 5 68 11
Step Two: Sum the ranks for each group
group 1 group 2
24 3 28 4
18 2 42 6
45 7 63 10
57 8.5 57 8.5
12 1 90 12
30 5 68 11
R1 = 26.5 R2 = 51.5
Check the rankings: the ranks 1 to n must sum to
ΣR = n(n + 1)/2 = (12)(13)/2 = 156/2 = 78
and indeed 26.5 + 51.5 = 78.
Step Three: Compute U1
U1 = (n1)(n2) + n1(n1 + 1)/2 − R1
U1 = (6)(6) + 6(7)/2 − 26.5
U1 = 36 + 21 − 26.5
U1 = 30.5
Step Four: Compute U2
U2 = (n1)(n2) + n2(n2 + 1)/2 − R2
U2 = (6)(6) + 6(7)/2 − 51.5
U2 = 36 + 21 − 51.5
U2 = 5.5
Step Five: Compare U1 to U2
U1 = 30.5
U2 = 5.5
The smaller of the two is U = 5.5.
Critical value = 5
Since 5.5 is not smaller than the critical value of 5, this is a nonsignificant outcome.
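The whole procedure, including the average-rank treatment of the tied 57s, can be reproduced in a few lines of Python from the example data above:

```python
# Mann-Whitney U for the worked example: rank all data jointly (ties get the
# average rank), sum ranks per group, then apply the U formulas.
def ranks_with_ties(values):
    """Map each value to its average rank across the pooled sample."""
    s = sorted(values)
    # first occurrence index i (0-based), count c -> average rank (2i + c + 1)/2
    return {v: (2 * s.index(v) + s.count(v) + 1) / 2 for v in set(values)}

group1 = [24, 18, 45, 57, 12, 30]
group2 = [28, 42, 63, 57, 90, 68]

rank_of = ranks_with_ties(group1 + group2)
r1 = sum(rank_of[v] for v in group1)   # rank sum of group 1
r2 = sum(rank_of[v] for v in group2)   # rank sum of group 2

n1, n2 = len(group1), len(group2)
u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
u = min(u1, u2)                        # take the smaller U value
print(r1, r2, u)                       # → 26.5 51.5 5.5
```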
Chi-square Test
Chi-square is a test statistic used to test a hypothesis; it provides a set of theoretical
(expected) frequencies with which the observed frequencies are compared.
Chi-square, symbolically written as χ², enables us to test whether more than two
population proportions can be considered equal.
Hence, it is a non-parametric test of statistical significance, which compares observed data with
expected data and tests the null hypothesis that there is no significant difference
between the expected and the observed results.
The chi-square (χ²) statistic is computed using the following formula:
χ² = Σ (O − E)² / E
where O represents an observed frequency and E represents the corresponding expected frequency.
Whether or not a calculated value of χ² is significant can be ascertained by looking at the
tabulated values of χ² for the given degrees of freedom at a certain level of significance
(generally the 5% level is taken). If the calculated value of χ² exceeds the table value, the
difference between the observed and expected frequencies is taken as significant; but if the
table value is more than the calculated value of χ², then the difference is considered
insignificant. An insignificant value is considered to have arisen as a result of chance and as
such can be ignored.
Area of Application of Chi-square Test
The chi-square test technique is used in a number of problems. Some of them are:
As a Test of Goodness of Fit: Karl Pearson developed a test of significance called the chi-square
test of goodness of fit, which is used to test whether or not the observed frequencies support
a particular hypothesis. The test can be used to identify whether the deviations, if any, between
the observed and expected values are due to chance or to some other inadequacy.
As a Test of Homogeneity: the χ² test helps in stating whether different samples come from the
same universe. Through this test, we can also determine whether the results worked out on the
basis of the sample(s) are in conformity with a well-defined hypothesis or fail to support
the given hypothesis.
As a Test of Population Variance: χ² is also used to test the significance of population
variance through confidence intervals, especially in the case of small samples.
Conditions for the Applicability of the χ² Test
The following conditions should be satisfied before the test can be applied:
• Observations are recorded and collected on a random basis.
• All the members in the sample must be independent.
• No group should contain very few items.
• The overall number of items must be reasonably large.
• The constraints must be linear. Constraints which involve linear equations in the cell
frequencies of a contingency table are known as linear constraints.
Steps Involved in Finding the Value of Chi-square
The process of computing the χ² value involves the following steps:
1. Set up the null hypothesis and the alternative hypothesis.
2. List the observed frequencies.
3. Calculate the expected frequencies, assuming the data follow the given theoretical distribution.
4. Obtain the difference between each observed frequency and the corresponding expected frequency.
5. Express the square of each difference as a fraction of the corresponding expected frequency.
6. Add all the fractions obtained.
7. Compare the resulting value with the appropriate χ² value from the tables at the predetermined
level of significance.
8. Accept the null hypothesis if the value thus computed for the given degrees of freedom and
level of significance is less than the tabulated value; otherwise, reject it.
Illustration: The following table depicts the expected sales (E) and actual sales (O) of television
sets for a company. Test whether there is a significant difference between the observed and
expected values, using the chi-square method.
Actual and Expected Sales of Television Sets
Actual Sales (O) Expected Sales (E)
57 59
69 76
51 55
83 75
44 39
48 53
35 30
37 48
Solution
Computation of Test Statistic
O E O–E (O – E)² (O – E)²/E
57 59 –2 4 0.068
69 76 –7 49 0.645
51 55 –4 16 0.291
83 75 8 64 0.853
44 39 5 25 0.641
48 53 –5 25 0.472
35 30 5 25 0.833
37 48 –11 121 2.521
Total 6.324
χ² = Σ (O − E)² / E = 6.324
The critical value of chi-square at (8 – 1) = 7 degrees of freedom and the 0.05 level of
significance is 14.067.
Since the computed value of χ² (6.324) is less than the critical value (14.067), it does not fall
in the critical region, and the null hypothesis is not rejected. That is, there is no significant
difference between the actual and expected values of sales.
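Recomputing χ² from the raw actual (O) and expected (E) values in the illustration confirms the test statistic (a minimal sketch):

```python
# Chi-square goodness of fit for the television-sales illustration:
# chi2 = sum of (O - E)^2 / E over all categories.
O = [57, 69, 51, 83, 44, 48, 35, 37]   # actual sales
E = [59, 76, 55, 75, 39, 53, 30, 48]   # expected sales

chi2 = sum((o - e) ** 2 / e for o, e in zip(O, E))
df = len(O) - 1                        # 8 categories -> 7 degrees of freedom
print(round(chi2, 3), df)              # → 6.324 7
```

The value is then compared with the tabulated χ² for 7 degrees of freedom at the 0.05 level (14.067) to decide whether to reject the null hypothesis.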