Class 20 Chi Square Copy 1 79

Uploaded by SANDEEP Singh

www.byjusexamprep.com

Analysis of Variance (ANOVA)

• Given by Sir Ronald Fisher


• The principal aim of statistical models is to explain the variation in measurements.
• The statistical model for testing the significance of the difference in mean values of a
variable between two groups is Student's 't' test. If there are more than two groups, the
appropriate statistical model is Analysis of Variance (ANOVA).

Assumptions for ANOVA

1. The sampled populations can be reasonably approximated by a normal distribution.

2. All populations have the same Standard Deviation.

3. Individuals in the population are selected randomly.

4. The samples are independent.

● ANOVA compares variances by means of a simple ratio, called the F-ratio:


F = Variance between groups / Variance within groups

• The resulting F statistic is then compared with a critical value of F, obtained from
F tables in much the same way as was done with 't'.
• If the calculated value exceeds the critical value for the appropriate level of α, the null
hypothesis is rejected.
• An F test is therefore a test of the ratio of variances. F tests can also be used on their own,
independently of the ANOVA technique, to test hypotheses about variances.
• In ANOVA, the F test is used to establish whether a statistically significant difference exists
in the data being tested.
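As a sketch of the F-ratio described above, the between-group and within-group variances can be computed directly from raw data. The three groups below are made-up illustrative values, and `f_ratio` is a hypothetical helper name, not something from the text.

```python
# Illustrative one-way ANOVA F-ratio (MS between / MS within).
# The groups below are invented example data, not from the notes.

def f_ratio(groups):
    """Return the one-way ANOVA F statistic: MS_between / MS_within."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares, df = k - 1
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares, df = n - k
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

groups = [[4, 5, 6], [7, 8, 9], [1, 2, 3]]
print(f_ratio(groups))  # 27.0 for this data: large spread between group means
```

The computed F would then be compared with the tabulated critical value for (k − 1, n − k) degrees of freedom, exactly as the bullets describe.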

• ANOVA can be
❑ One Way ANOVA


⮚ If the various experimental groups differ in terms of only one factor at a time, a one-way
ANOVA is used.
e.g. a study to assess the effectiveness of four different antibiotics on S. sanguis

❑ Two Way ANOVA


⮚ If the various groups differ in terms of two or more factors at a time, then a two-way
ANOVA is performed.
e.g. a study to assess the effectiveness of four different antibiotics on S. sanguis in three
different age groups

Pearson's Correlation Coefficient

Karl Pearson's coefficient is the most popular and widely used measure of correlation; it
quantifies correlation, within specified limits, through an ideal measure of covariance. The
coefficient of correlation always ranges between +1 and –1. One (1) indicates complete
correlation and zero (0) indicates no correlation at all. It is popularly called Karl Pearson's
coefficient of correlation or Pearsonian correlation. The formula used under the direct
method (actual mean) is:

r = Σxy / (n σ1 σ2)

Where: r = Karl Pearson's coefficient of correlation


x and y = deviations of individual items of the series from their means
n = the number of terms in a series
σ1 and σ2 = standard deviations of the first and second series
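A minimal sketch of the direct (actual-mean) method follows: deviations of each series from its own mean are multiplied pairwise and divided by n·σ1·σ2. The data pairs are illustrative only, and `pearson_r` is a hypothetical helper name.

```python
# Karl Pearson's r by the direct (actual-mean) method.
# The input series below are invented illustrative data.
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    x = [v - mx for v in xs]              # deviations from mean of series 1
    y = [v - my for v in ys]              # deviations from mean of series 2
    s1 = sqrt(sum(d * d for d in x) / n)  # sigma_1
    s2 = sqrt(sum(d * d for d in y) / n)  # sigma_2
    return sum(a * b for a, b in zip(x, y)) / (n * s1 * s2)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0: complete positive correlation
```

A perfectly linear increasing pair of series gives r = +1, and a perfectly decreasing pair gives r = –1, matching the limits stated above.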

The Kruskal-Wallis H Test

• The Kruskal-Wallis H Test is a non-parametric procedure that can be used to compare


more than two populations in a completely randomized design.
• All n = n1 + n2 + ... + nk measurements are jointly ranked (i.e. treated as one large sample).
• We use the sums of the ranks of the k samples to compare the distributions.


The Kruskal-Wallis H Test

✔ Rank the total measurements in all k samples from 1 to n. Tied observations are assigned
the average of the ranks they would have received if not tied.
✔ Calculate
▪ Ti = rank sum for the i-th sample, i = 1, 2, ..., k
✔ And the test statistic

H = [12 / (n(n + 1))] Σ (Ti² / ni) − 3(n + 1)

The Kruskal-Wallis H Test

H0: the k distributions are identical versus

Ha : at least one distribution is different

Test statistic: Kruskal-Wallis H

When H0 is true, the test statistic H has an approximate chi-square distribution with
df = k – 1.

Use a right-tailed rejection region or p-value based on the Chi-square distribution.

Example

Four groups of students were randomly assigned to be taught with four different techniques, and
their achievement test scores were recorded. Are the distributions of test scores the same, or do
they differ in location?

1    2    3    4
65   75   59   94
87   69   78   89
73   83   67   80
79   81   62   88

Teaching Methods

1         2         3         4

65 (3)    75 (7)    59 (1)    94 (16)

87 (13)   69 (5)    78 (8)    89 (15)

73 (6)    83 (12)   67 (4)    80 (10)

79 (9)    81 (11)   62 (2)    88 (14)

Ti   31        35        15        55
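Plugging the rank sums from the table above (T = 31, 35, 15, 55, with four observations per group) into the H formula can be sketched as follows; `kruskal_h` is a hypothetical helper name.

```python
# H statistic for the teaching-methods example, using the rank sums above.

def kruskal_h(rank_sums, group_sizes):
    n = sum(group_sizes)
    s = sum(t * t / ni for t, ni in zip(rank_sums, group_sizes))
    return 12 / (n * (n + 1)) * s - 3 * (n + 1)

H = kruskal_h([31, 35, 15, 55], [4, 4, 4, 4])
print(round(H, 2))          # 8.96
# Chi-square critical value for df = k - 1 = 3 at the 0.05 level is 7.815;
# H exceeds it, so the null hypothesis of identical distributions is rejected.
print(round(H, 2) > 7.815)  # True
```

Since H ≈ 8.96 exceeds 7.815, at least one teaching method produces a different distribution of scores.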


Key Concepts

1. Nonparametric Methods

These methods can be used when the data cannot be measured on a quantitative scale, or when

• The numerical scale of measurement is arbitrarily set by the researcher, or when


• The parametric assumptions such as normality or constant variance are seriously violated.

Key Concepts

Kruskal-Wallis H Test: Completely Randomized Design


1. Jointly rank all the observations in the k samples (treated as one large sample of size n, say).
Calculate the rank sums, Ti = rank sum of sample i, and the test statistic

H = [12 / (n(n + 1))] Σ (Ti² / ni) − 3(n + 1)

2. If the null hypothesis of equality of distributions is false, H will be unusually large, resulting in
a one-tailed test.

3. For sample sizes of five or greater, the rejection region for H is based on the chi-square
distribution with (k – 1) degrees of freedom.

Mann Whitney U test:

The Mann-Whitney U test is the nonparametric equivalent of a t test for two independent samples.

Use when:

• The data do not support the use of means (ordinal data).

• Data is not normally distributed.

1) Rank all data.

2) Evaluate if ranks tend to cluster within a group.

Mann Whitney U test:

U1 = (n1)(n2) + n1(n1 + 1)/2 − R1

U2 = (n1)(n2) + n2(n2 + 1)/2 − R2

Where: n1 = size of sample one

n2 = size of sample two

R1, R2 = rank sums of samples one and two

Evaluation of Mann Whitney U

1) Choose the smaller of the two U values.


2) Find the critical value (Mann-Whitney table).

3) When the computed value is smaller than the critical value, the outcome is significant.

group 1   group 2
24        28
18        42
45        63
57        57
12        90
30        68

Step One: Rank all data across groups

group 1     group 2
24 (3)      28 (4)
18 (2)      42 (6)
45 (7)      63 (10)
57 (8.5)    57 (8.5)
12 (1)      90 (12)
30 (5)      68 (11)

Step Two: Sum the ranks for each group

group 1     group 2
24 (3)      28 (4)
18 (2)      42 (6)
45 (7)      63 (10)
57 (8.5)    57 (8.5)
12 (1)      90 (12)
30 (5)      68 (11)
R1 = 26.5   R2 = 51.5

Check the rankings:

ΣR = n(n + 1)/2 = (12)(13)/2 = 156/2 = 78

and indeed 26.5 + 51.5 = 78, so the ranking is consistent.

Step Three: Compute U1

U1 = (n1)(n2) + n1(n1 + 1)/2 − R1

U1 = (6)(6) + 6(7)/2 − 26.5

U1 = 36 + 21 − 26.5

U1 = 30.5

Step Four: Compute U2

U2 = (n1)(n2) + n2(n2 + 1)/2 − R2

U2 = (6)(6) + 6(7)/2 − 51.5

U2 = 36 + 21 − 51.5

U2 = 5.5

Step Five: Compare U1 to U2

U1 = 30.5

U2 = 5.5

5.5 < 30.5

U = 5.5

Critical value = 5

Since the computed U (5.5) is not smaller than the critical value (5), this is a nonsignificant outcome.
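The five steps above can be sketched in code; the rank sums 26.5 and 51.5 come from Step Two, and `mann_whitney_u` is a hypothetical helper name.

```python
# Mann-Whitney U for the worked example: n1 = n2 = 6, R1 = 26.5, R2 = 51.5.

def mann_whitney_u(n1, n2, r1, r2):
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2

u1, u2 = mann_whitney_u(6, 6, 26.5, 51.5)
print(u1, u2)             # 30.5 5.5
u = min(u1, u2)           # the test statistic is the smaller of the two
# Sanity check: U1 + U2 always equals n1 * n2
assert u1 + u2 == 6 * 6
# Critical value from the Mann-Whitney table (n1 = n2 = 6, alpha = 0.05) is 5;
# 5.5 > 5, so the outcome is not significant.
print(u > 5)              # True
```

The identity U1 + U2 = n1·n2 is a convenient arithmetic check on the two computed values.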

Chi-square Test

Chi-square is a test statistic used to test a hypothesis: it provides a set of theoretical
frequencies with which observed frequencies are compared.

Chi-square, symbolically written as χ², enables us to test whether more than two
population proportions can be considered equal.

Hence, it is a non-parametric test of statistical significance, which compares observed data with
expected data and tests the null hypothesis that there is no significant difference
between the expected and observed results.

The chi-square (χ²) is computed using the following formula:

9
www.byjusexamprep.com

χ² = Σ (O − E)² / E

where O represents the observed frequency, E represents an expected frequency.

Whether or not a calculated value of χ² is significant can be ascertained by looking at the
tabulated values of χ² for the given degrees of freedom at a certain level of confidence (generally
the 5% level is taken). If the calculated value of χ² exceeds the table value, the difference between the
observed and expected frequencies is taken as significant; but if the table value is more than the
calculated value of χ², then the difference is considered insignificant. An insignificant value is
considered to have arisen as a result of chance and as such can be ignored.

Area of Application of the Chi-square Test

The chi-square test technique is used in a number of problems. Some of them are:

As a Test of Goodness of Fit: Karl Pearson developed a test of significance called the chi-square
test of goodness of fit, which is used to test whether or not the observed frequency results support
a particular hypothesis. The test can be used to identify whether the deviations, if any, between
the observed and estimated values arise by chance or from some other inadequacy.

As a Test of Homogeneity: The χ² test helps in stating whether different samples come from the
same universe. Through this test, we can also determine whether the results worked out on the basis
of the sample(s) are in conformity with a well-defined hypothesis or fail to support
the given hypothesis.

As a Test of Population Variance: χ² is also used to test the significance of population
variance through confidence intervals, especially in the case of small samples.

Conditions for the Applicability of the χ² Test

The following conditions should be satisfied before the test can be applied:

• Observations are recorded and collected on a random basis.

• All the members in the sample must be independent.

• No group should contain very few items.

• The overall number of items must be reasonably large.

• The constraints must be linear. Constraints which involve linear equations in the cell frequencies
of a contingency table are known as linear constraints.

Steps Involved in Finding the Value of Chi-square

The process of computing the χ² value involves the following steps:

1. Set up the null hypothesis and alternative hypothesis.

2. List the observed frequencies.

3. Calculate the expected frequencies, as if the data followed the given theoretical distribution.

4. Obtain the difference between each observed and corresponding expected frequency.

5. Express the square of each difference as a fraction of the corresponding expected frequency.

6. Add all the fractions obtained.

7. Compare the result with the appropriate χ² value from the tables at the predetermined
level of significance.

8. Accept the null hypothesis if the value thus computed, for the given degrees of freedom and
level of significance, is less than the tabulated value; otherwise reject it.

Illustration: The following table depicts the expected sales (E) and actual sales (O) of television
sets for a company. Test whether there is a significant difference between the observed and
expected values, using the chi-square method.

Actual and Expected Sales of Television Sets

Actual Sales (O)   Expected Sales (E)
57                 59
69                 76
51                 55
83                 75
44                 39
48                 53
35                 30
37                 48

Solution

Computation of Test Statistic

O     E     O – E    (O – E)²    (O – E)² / E
57    59    –2       4           0.068
69    76    –7       49          0.645
51    55    –4       16          0.291
83    75    8        64          0.853
44    39    5        25          0.641
48    53    –5       25          0.472
35    30    5        25          0.833
37    48    –11      121         2.521
Total                            6.324

χ² = Σ (O − E)² / E = 6.324

The critical value of chi-square at (8 – 1) = 7 degrees of freedom and the 0.05 level of
significance is 14.067.

Since the calculated value of χ² (6.324) does not exceed the critical value, it does not fall in the
critical region, and the null hypothesis is accepted. That is, there is no significant difference
between the actual and expected values of sales.
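The computation can be verified by summing (O − E)²/E over the eight sales figures directly; the variable names below are illustrative, and the 14.067 critical value is the standard tabulated χ² for df = 7 at the 0.05 level.

```python
# Recomputing the television-sales example: chi-square = sum of (O - E)^2 / E.

O = [57, 69, 51, 83, 44, 48, 35, 37]   # actual sales
E = [59, 76, 55, 75, 39, 53, 30, 48]   # expected sales

chi_sq = sum((o - e) ** 2 / e for o, e in zip(O, E))
print(round(chi_sq, 3))                # 6.324
# Tabulated chi-square for df = 8 - 1 = 7 at the 0.05 level is 14.067;
# the calculated value is smaller, so the null hypothesis is not rejected.
print(round(chi_sq, 3) < 14.067)       # True
```

Because 6.324 falls well short of 14.067, the deviations between actual and expected sales can be attributed to chance.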
