
BOWEN UNIVERSITY IWO

COLLEGE OF AGRICULTURE ENGINEERING AND SCIENCE (COAES)


PURE AND APPLIED BIOLOGY PROGRAMME
BIO 201: GENETICS 1
Module 3
TEST OF GOODNESS OF FIT
Goodness of Fit
Goodness of fit refers to how well a statistical model fits a set of observations. In simpler terms,
it measures how close the predicted values from a model are to the actual observed values. It is a
crucial concept in statistical analysis because it helps assess the accuracy of models in
representing real-world data.
Goodness of fit is applicable in various types of models, including linear regression, logistic
regression, and other types of statistical models used in predictive analytics. Below are the key
components and methods used to evaluate goodness of fit.
Purpose of Goodness of Fit
The purpose of assessing the goodness of fit is to determine:
• How well the model explains the variation in the data.
• If the model assumptions hold.
• Whether the model can be used to make reliable predictions.
Goodness of fit helps to understand whether the relationship between variables that a model
predicts holds true across the sample.
Methods for Measuring Goodness of Fit
There are several statistical methods used to measure the goodness of fit depending on the type
of model and data being used:
a. R-squared (R²)
R-squared is a commonly used statistic to measure the goodness of fit in regression models. It
tells you the proportion of variance in the dependent variable that can be explained by the
independent variables in the model. An R² value of 0 means that the model does not explain any
of the variation in the response variable. An R² value of 1 indicates that the model perfectly
explains all the variation. A higher R² value suggests a better fit, though it does not guarantee that
the model is accurate.
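As a concrete illustration, R² can be computed directly from its definition. The sketch below uses made-up observed and predicted values, not data from these notes.

```python
def r_squared(observed, predicted):
    """R² = 1 - SS_res / SS_tot: the proportion of variance explained."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)              # total sum of squares
    return 1 - ss_res / ss_tot

# Hypothetical observed values and predictions from some fitted model:
observed = [2.0, 4.1, 6.0, 8.2]
predicted = [2.1, 4.0, 6.1, 8.0]
print(round(r_squared(observed, predicted), 4))  # close to 1: a good fit
```

Predictions that track the observations closely leave a small residual sum of squares, pushing R² toward 1.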
b. Adjusted R-squared

1 | OLATUBI VICTORIA
Adjusted R-squared is a modification of R² that adjusts for the number of predictors in the
model. It accounts for the possibility that R² can increase simply by adding more variables, even
if they are not meaningful.
Adjusted R² gives a more accurate picture when comparing models with different numbers of
independent variables.
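The adjustment can be written as a one-line formula: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. A minimal sketch, with illustrative numbers:

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R²: penalizes R² for the number of predictors p, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# An R² of 0.90 from 50 observations looks less impressive once we
# account for having used 5 predictors:
print(round(adjusted_r_squared(0.90, 50, 5), 4))  # lower than the raw 0.90
```

Adding predictors can only raise raw R², so the adjusted value is the fairer yardstick when model sizes differ.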
c. Chi-square Test
The chi-square goodness of fit test is used to compare observed data with data expected under a
specific hypothesis. It checks if the frequency distribution of a categorical variable matches
expected distributions.
Formula:
χ² = ∑ (O − E)² / E

Where O represents observed values and E represents expected values.


A small chi-square statistic means a good fit; a large chi-square statistic indicates a poor fit.
d. Residual Analysis
Residuals are the differences between observed values and model-predicted values. The
distribution of residuals provides insights into how well the model fits the data.
Types of Residuals:
o Standardized Residuals: help detect outliers.
o Studentized Residuals: account for the influence of each data point on the model.
Ideally, residuals should have a random pattern when plotted against predicted values. Patterns
in residuals suggest a poor fit or that assumptions (e.g., linearity) have been violated.
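A simple way to inspect residuals numerically, before plotting, is to standardize them and look for unusually large values. The sketch below uses a simplified standardization (each residual divided by the residual standard deviation); proper studentized residuals also account for each point's leverage.

```python
import statistics

def standardized_residuals(observed, predicted):
    """Residuals scaled by their standard deviation; values beyond about +/-2
    flag candidate outliers. (Simplified: leverage is ignored.)"""
    res = [o - p for o, p in zip(observed, predicted)]
    sd = statistics.pstdev(res)
    return [r / sd for r in res]

# Illustrative data: the last point sits far from its prediction.
obs = [2.0, 4.0, 6.0, 9.0]
pred = [2.0, 4.0, 6.0, 8.0]
print([round(z, 2) for z in standardized_residuals(obs, pred)])
```

The last standardized residual exceeds 2, marking that observation as a potential outlier worth a closer look.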
e. Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC)
Both AIC and BIC are used to compare candidate models and assess goodness of fit, especially for more complex models. AIC balances model fit against model size by penalizing the number of parameters. BIC is similar to AIC but applies a stronger penalty for additional parameters. In both cases, lower values indicate better-fitting models.
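For least-squares models, AIC has a convenient closed form, AIC = n·ln(SS_res/n) + 2k, where k counts the fitted parameters. A sketch with invented numbers to show the trade-off:

```python
import math

def aic_least_squares(n, ss_res, k):
    """AIC for a least-squares fit: n*ln(SS_res/n) + 2k (constant terms dropped)."""
    return n * math.log(ss_res / n) + 2 * k

# Model B fits slightly better (smaller SS_res) but uses twice the parameters:
aic_a = aic_least_squares(100, 50.0, 3)
aic_b = aic_least_squares(100, 48.0, 6)
print(aic_a < aic_b)  # True: the simpler model A wins on AIC
```

The small improvement in fit does not justify the three extra parameters, so AIC prefers the simpler model.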
Evaluating the Model Fit
There are various ways to evaluate how well a model fits the data:
1. Visual Inspection
Scatterplots: In regression, scatterplots of observed vs. predicted values provide a clear picture of
how well the model performs. If points lie close to the line of perfect fit, the model is a good fit.

Residual Plots: These help in visualizing whether residuals are randomly distributed, indicating a
good fit.
2. Hypothesis Testing
Null Hypothesis: Often, when testing goodness of fit, the null hypothesis is that the model fits
the data well.
p-value: A low p-value (usually < 0.05) suggests that the model does not fit the data well, leading
to rejection of the null hypothesis.
Goodness of fit is a fundamental concept in statistical modeling that ensures that models
accurately reflect the data they are designed to represent. Various techniques like R-squared,
residual analysis, and chi-square tests offer ways to quantify how well a model fits. However,
understanding the context, assumptions, and limitations of each method is crucial in applying the
concept effectively.

Chi-squared test
This test was developed in 1900 by Karl Pearson (1857–1936), in part to investigate theories of
genetic inheritance.
The chi-square test is used to find out whether the observed value of a given phenomenon differs significantly from the expected value. In the chi-square goodness of fit test, the term "goodness of fit" refers to comparing the observed sample distribution with the expected probability distribution. The test determines how well a theoretical distribution (such as the normal, binomial, or Poisson) fits the empirical distribution. The sample data are divided into intervals, and the number of points that fall into each interval is compared with the expected number of points in that interval.
The procedure begins by setting up both the null and the alternative hypothesis:
a) Null hypothesis: there is no significant difference between the observed and the expected values.
b) Alternative hypothesis: there is a significant difference between the observed and the expected values.
Calculate chi-square using the formula and, for the given degrees of freedom, determine whether the chi-square value is significant at the .05 or .01 level. If so, reject the null hypothesis; if not, retain it.
Testing Hypothesis of Equal Probability
The chi-square test is a useful method of comparing experimentally obtained results with those expected theoretically under some hypothesis. The formula for calculating χ² is
χ² = ∑ (O − E)² / E
Where; O = observed frequency of the phenomenon or event the experimenter is studying
E = expected frequency of the same phenomenon under the "no difference" (null) hypothesis.

The use of the above formula can be illustrated by the following example.
Example 1:
An attitude scale designed to measure attitude toward co-education was administered to 240
students, who responded favourable, neutral, or unfavourable. Of the group, 70 marked
favourable, 50 neutral, and 120 unfavourable. Do these results indicate a significant
difference in attitude?
The observed data (O) are given in the first row of the table below.
The second row shows the distribution of answers expected under the null hypothesis (E),
if each answer is selected equally often.
Table 1: Responses from subjects in regard to the attitudes

Calculation   Favourable   Neutral   Unfavourable   Total
O             70           50        120            240
E             80           80        80             240
(O−E)         −10          −30       40
(O−E)²        100          900       1600
(O−E)²/E      1.25         11.25     20             ∑(O−E)²/E = 32.50

Applying the formula χ² = ∑ (O − E)²/E:

χ² = 1.25 + 11.25 + 20 = 32.5
d.f. = k − 1 = 3 − 1 = 2 (where k is the number of categories)
Entering the table of χ² (available in any statistics book), we find in the row for d.f. = 2 the
values 5.991 and 9.210 under the .05 and .01 levels of significance respectively. Our obtained
value is 32.5, far above the tabled values, so the result is significant at the .01 level. We
reject the null hypothesis, which stated that there would be no difference in attitude. From
the results it may be stated with confidence that there is a difference in people's attitudes
towards co-education, and that the observed differences are unlikely to be due to chance.
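The arithmetic of Example 1 can be checked with a few lines of plain Python:

```python
# Observed responses from Example 1: favourable, neutral, unfavourable.
observed = [70, 50, 120]
total = sum(observed)                    # 240 students
expected = [total / len(observed)] * 3   # 80 per category under H0
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
print(chi2, df)  # 32.5 2
# 32.5 far exceeds the .01 critical value for df = 2 (9.210), so H0 is rejected.
```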
Example 2: A personnel manager is trying to determine whether absenteeism is greater on one
day of the week than on another. His record for the past year shows the following scores. Test
whether the absence is uniformly distributed over the week.
Table 2.

Calculation   Monday       Tuesday      Wednesday    Thursday     Friday       Total
O             23           18           24           17           18           100
E             20           20           20           20           20           100
(O−E)         3            −2           4            −3           −2
(O−E)²        9            4            16           9            4
(O−E)²/E      9/20=0.45    4/20=0.20    16/20=0.80   9/20=0.45    4/20=0.20

Applying the formula:

χ² = ∑ (O − E)²/E = 0.45 + 0.20 + 0.80 + 0.45 + 0.20 = 2.10
d.f. = k − 1 = 5 − 1 = 4

The critical value of χ² at the .05 level is 9.488, and at the .01 level it is 13.277 (see the
statistical table at the end of any statistics book). Since the computed value of χ², 2.10, is
less than the critical values at both the .05 and .01 significance levels, we conclude that χ²
is not significant and we retain the null hypothesis. The deviation of observed absenteeism
from expectation may therefore be a matter of chance. Had the computed value of χ² exceeded
9.488 (or 13.277), the null hypothesis would have been rejected, and we would conclude that
absenteeism differs significantly across the days of the week.
In our example, however, the chi-square value is lower than the tabled values, so we retain
the null hypothesis: absenteeism does not vary by day of the week, and the observed
deviations are attributable to chance.
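Example 2 can be verified the same way:

```python
# Observed absences from Example 2, Monday through Friday.
observed = [23, 18, 24, 17, 18]
expected = [20] * 5   # 100 absences spread evenly over 5 days under H0
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
print(round(chi2, 2), df)  # 2.1 4
# 2.10 < 9.488 (the .05 critical value for df = 4), so H0 is retained.
```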
Steps for Chi-square Testing
1) Set up the null hypothesis.
2) Collect the data and record the observed frequencies.
3) Find the expected frequencies by dividing the total of the observed frequencies by the
number of categories (in the first example, 240/3 = 80; in the second example, 100/5 = 20).
4) Find the difference between the observed and expected frequencies: (O − E).
5) Square the difference: (O − E)².
6) Divide each squared difference by the expected frequency: (O − E)²/E.
7) Sum these quotients to obtain χ².
8) Determine the degrees of freedom and find the critical value of χ² from the table.
9) Compare the calculated and tabled values of χ² using the following decision rule: retain
the null hypothesis if χ² is less than the critical value given in the table; reject the null
hypothesis if the calculated χ² exceeds the tabled value at the .05 or .01 significance level.
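The nine steps above can be wrapped into a single helper. The caller supplies the critical value from a chi-square table (stdlib Python has no built-in table), and the function assumes equal expected frequencies across categories; the function name is my own.

```python
def chi_square_goodness_of_fit(observed, critical_value):
    """Steps 1-9 above, assuming equal expected frequencies across categories."""
    expected = sum(observed) / len(observed)                          # step 3
    chi2 = sum((o - expected) ** 2 / expected for o in observed)      # steps 4-7
    df = len(observed) - 1                                            # step 8
    decision = "reject H0" if chi2 > critical_value else "retain H0"  # step 9
    return chi2, df, decision

# Example 1: df = 2, critical value 9.210 at the .01 level
print(chi_square_goodness_of_fit([70, 50, 120], 9.210))
# Example 2: df = 4, critical value 9.488 at the .05 level
print(chi_square_goodness_of_fit([23, 18, 24, 17, 18], 9.488))
```

Running the helper on both worked examples reproduces the decisions reached by hand above.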
