Goodness of Fit
1|O L AT U B I V I C TO R I A
Adjusted R-squared is a modification of R² that adjusts for the number of predictors in the
model. It accounts for the possibility that R² can increase simply by adding more variables, even
if they are not meaningful.
Adjusted R² gives a more accurate picture when comparing models with different numbers of
independent variables.
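As a minimal sketch, the usual textbook formula for adjusted R², 1 - (1 - R²)(n - 1)/(n - p - 1), can be written as a small function. The function name, parameter names, and the numbers in the comparison are hypothetical and chosen only for illustration.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need n_obs > n_predictors + 1")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical comparison: the same R^2 with few versus many predictors.
small_model = adjusted_r_squared(0.80, n_obs=30, n_predictors=2)
large_model = adjusted_r_squared(0.80, n_obs=30, n_predictors=10)
print(round(small_model, 4), round(large_model, 4))
```

With R² held fixed, the model with more predictors receives the lower adjusted value, which is the penalty described above.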
c. Chi-square Test
The chi-square goodness of fit test is used to compare observed data with the data expected under a
specific hypothesis. It checks whether the frequency distribution of a categorical variable matches
the expected distribution.
Formula:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Residual Plots: These help in visualizing whether residuals are randomly distributed, indicating a
good fit.
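A residual plot needs a plotting library, but the quantities it displays can be sketched with the standard library alone. The example below assumes a simple one-predictor least-squares line; the data and function names are hypothetical and invented only for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(x, y):
    """Observed minus fitted values for the least-squares line."""
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical data, invented only for illustration.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
res = residuals(x, y)
for xi, ri in zip(x, res):
    print(f"x={xi}  residual={ri:+.3f}")
```

Plotting these residuals against x (or against the fitted values) and seeing no trend or funnel shape is the visual check the text refers to.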
2. Hypothesis Testing
Null Hypothesis: Often, when testing goodness of fit, the null hypothesis is that the model fits
the data well.
p-value: A low p-value (usually < 0.05) suggests that the model does not fit the data well, leading
to rejection of the null hypothesis.
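To make the p-value rule concrete, here is a small sketch for a chi-square statistic. It relies on a closed form for the upper-tail probability that holds only when the degrees of freedom are even; for other cases a statistical table or library would be needed. The function names and the example statistic are hypothetical.

```python
import math


def chi_square_p_value_even_df(statistic: float, df: int) -> float:
    """Upper-tail probability P(X >= statistic) for a chi-square variable.

    Closed form valid only for even df:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = statistic / 2.0
    total = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * total


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the usual rule: reject H0 when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "retain H0"


# Hypothetical statistic and degrees of freedom, chosen only to show the call.
p = chi_square_p_value_even_df(7.2, df=2)
print(round(p, 4), decide(p))
```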
Goodness of fit is a fundamental concept in statistical modeling that describes how accurately a
model reflects the data it is designed to represent. Techniques such as R-squared, residual
analysis, and chi-square tests offer ways to quantify how well a model fits. However,
understanding the context, assumptions, and limitations of each method is crucial in applying the
concept effectively.
Chi-squared test
This test was developed in 1900 by Karl Pearson (1857–1936), in part to investigate theories of
genetic inheritance.
The chi-square test is used to find out whether the observed values of a given phenomenon differ
significantly from the expected values. In the chi-square goodness of fit test, the term "goodness
of fit" refers to comparing the observed sample distribution with the expected probability
distribution. The test determines how well a theoretical distribution (such as the normal,
binomial, or Poisson) fits the empirical distribution. The sample data are divided into intervals,
and the number of points that fall into each interval is compared with the expected number of
points in that interval.
The procedure begins by setting up both the null and the alternative hypothesis:
a) Null hypothesis: there is no significant difference between the observed and the expected
values.
b) Alternative hypothesis: there is a significant difference between the observed and the expected
values.
Calculate chi-square using the formula and, for the given degrees of freedom, find out whether the
chi-square value is significant at the .05 or .01 level. If it is, reject the null hypothesis. If
it is not, retain the null hypothesis.
2.2.2 Testing Hypothesis of Equal Probability
The chi-square test is a useful method of comparing experimentally obtained results with those to
be expected theoretically on some hypothesis. The formula for calculating χ² is
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O = observed frequency of the phenomenon or event that the experimenter is studying, and
E = expected frequency of the same phenomenon based on the “no difference” or “null” hypothesis.
The use of the above formula can be illustrated by the following example.
Example 1:
An attitude scale designed to measure attitude toward co-education was administered to 240
students. They gave their responses as favourable, neutral, or unfavourable. Of the members of the
group, 70 marked favourable, 50 neutral, and 120 unfavourable. Do these results indicate a
significant difference in attitude?
The observed data (O) are given in the first row of the table below. The second row gives the
distribution of answers to be expected on the basis of the null hypothesis (E), that is, if each
answer is selected equally often.
Table 1: Responses from subjects in regard to the attitudes
Calculations   Favourable   Neutral   Unfavourable   Total
O                  70          50         120         240
E                  80          80          80         240
(O-E)             -10         -30          40
(O-E)²            100         900        1600
(O-E)²/E       100/80      900/80     1600/80
χ²               1.25       11.25       20.00        ∑(O-E)²/E = 32.50
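The arithmetic in Table 1 can be checked with a short sketch. It uses only the observed counts given in the example and the equal-probability expectation; the variable names are illustrative.

```python
observed = {"Favourable": 70, "Neutral": 50, "Unfavourable": 120}

total = sum(observed.values())
expected_each = total / len(observed)  # 240 / 3 = 80 under equal probability

# One (O - E)^2 / E term per response category.
contributions = {
    label: (count - expected_each) ** 2 / expected_each
    for label, count in observed.items()
}
chi_square = sum(contributions.values())

for label, value in contributions.items():
    print(f"{label:<13}{value:>6.2f}")
print(f"{'chi-square':<13}{chi_square:>6.2f}")
```

The per-category terms and their sum should agree with the last row of the table.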
(O-E)      3    -2     4    -3    -2
(O-E)²     9     4    16     9     4
The formula for χ² is
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 0.45 + 0.20 + 0.80 + 0.45 + 0.20 = 2.10$$
d.f. = (r-1)(c-1) = (5-1)(2-1) = 4
Critical value of χ² at the .05 level = 9.488
Critical value of χ² at the .01 level = 13.277
(refer to the statistical table given at the end of the statistics book)
Since the computed value of χ², 2.10, is less than the critical values at both the .05 and .01
significance levels, we conclude that χ² is not significant and we retain the null hypothesis.
We can say that the deviation of observed absenteeism from expectation might be a matter of
chance. On the other hand, if the computed value of χ² were greater than 9.488 (at the .05 level)
or 13.277 (at the .01 level), the null hypothesis would be rejected, and it could be concluded that
there is a significant difference in the absenteeism that occurs on different days of the week.
However, in our example the computed chi-square value is lower than the tabled values, so we
retain the null hypothesis that absenteeism does not vary by day of the week; the observed
differences may be due to chance.
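The decision for the absenteeism example can be reproduced with a brief sketch. The deviations and critical values are those quoted in the text. The expected frequency of 20 per day is an assumption taken from the second example's 100/5 = 20, and the observed counts are inferred from E plus each deviation rather than quoted from the source.

```python
expected = 20                                   # assumed: 100 / 5 for the second example
deviations = [3, -2, 4, -3, -2]                 # the (O - E) row
observed = [expected + d for d in deviations]   # inferred, not quoted from the source

chi_square = sum(d ** 2 / expected for d in deviations)
df = len(deviations) - 1                        # 5 categories -> 4, same as (5-1)(2-1)

critical = {0.05: 9.488, 0.01: 13.277}          # tabled values quoted in the text
for alpha, cutoff in critical.items():
    verdict = "reject H0" if chi_square > cutoff else "retain H0"
    print(f"alpha={alpha}: chi-square={chi_square:.2f}, critical={cutoff}, {verdict}")
```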
Steps for Chi-square Testing
1) State the null hypothesis.
2) Collect the data and find the observed frequencies.
3) Find the expected frequencies. Under the hypothesis of equal probability, divide the sum of all
observed frequencies by the number of categories (in the first example 240/3 = 80, in the second
example 100/5 = 20).
4) Find the difference between the observed and expected frequencies: (O-E).
5) Square each difference between observed and expected frequency: (O-E)².
6) Divide each squared difference by the expected frequency, (O-E)²/E, to get a quotient.
7) Find the sum of these quotients.
8) Determine the degrees of freedom and find the critical value of χ² from the table.
9) Compare the calculated and tabled values of χ² and apply the following decision rule: retain the
null hypothesis if the calculated χ² is less than the critical value given in the table; reject the
null hypothesis if the calculated χ² is greater than the tabled value at the .05 or .01
significance level.
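The steps above can be sketched as one function, assuming the null hypothesis of equal probability across categories. The function name and return keys are hypothetical. The critical value is looked up by the caller; 5.991 is the standard tabled value for 2 degrees of freedom at the .05 level and is supplied here as an assumption, since the text does not quote it.

```python
def chi_square_equal_probability(observed, critical_value):
    """Follow the listed steps for a null hypothesis of equal probability.

    observed: list of observed frequencies, one per category.
    critical_value: tabled chi-square value for df = len(observed) - 1
                    at the chosen significance level (looked up by the caller).
    """
    k = len(observed)
    if k < 2:
        raise ValueError("need at least two categories")
    expected = sum(observed) / k                                      # step 3
    quotients = [(o - expected) ** 2 / expected for o in observed]    # steps 4-6
    statistic = sum(quotients)                                        # step 7
    df = k - 1                                                        # step 8
    decision = "reject H0" if statistic > critical_value else "retain H0"  # step 9
    return {"expected": expected, "statistic": statistic, "df": df, "decision": decision}


# First example from the text; 5.991 is an assumed table lookup (df = 2, .05 level).
result = chi_square_equal_probability([70, 50, 120], critical_value=5.991)
print(result)
```

Passing the critical value in as an argument mirrors the manual procedure, where the table lookup is a separate step from the arithmetic.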