
Tutorial: Chi-Squared Analysis



The chi-squared analysis is a method of determining the relationship between a data set and a theoretical
model. This analysis can tell us three things:

1. What is the best-fit line?
2. How good is the fit?
3. Does the best-fit line agree with our theoretical model?

We will examine each of these in this tutorial. As you work through the tutorial, do not hesitate to ask
questions and collaborate with your classmates. Solutions will be posted on the course website.

Part 1: Maximum Likelihood Estimation: Finding the best-fit line

Suppose you have some set of data, like in the sketch shown at the right. We want to know: what is the probability of getting that set of data given some particular model? We will use a line of reasoning known as Maximum Likelihood Estimation (MLE).

[Sketch: measured data points y_data with error bars, together with a candidate model curve y_model.]

Because of the central limit theorem, we can assume that the measurements are Gaussian in distribution. What matters is the distance between each data point and the model we are testing, compared to the standard deviation at every measured point.
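As a reminder, a Gaussian distribution centered on a value μ with standard deviation σ assigns a point y the probability density

$$P(y) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)$$

and the normalization factor in front can be dropped whenever we only compare probabilities, as we do below.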

1. What is the probability that a single point y will fall on
the line given by the model? (You can ignore the
normalization factor for now.)



2. Sketch this distribution and label where y_model is on the plot.












3. Will the probability distribution look the same for each data point? What will be the differences?




4. What is the probability of getting the whole data set given this particular model (i.e., fit line)?











Here we define chi-squared (χ²) to simplify things a bit:

$$\chi^2 = \sum_{i=1}^{N} \frac{\left(y_i - y_{\text{model},i}\right)^2}{\sigma_i^2}$$
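As a concrete illustration, here is a short Matlab sketch of this sum; the data values and variable names below are placeholders for illustration, not the contents of <ChiSquared.m> or your own measurements.

% Sketch: chi-squared for a candidate linear model y = A*x.
xdata = [1; 2; 3; 4; 5];               % placeholder x values
ydata = [1.1; 1.9; 3.2; 3.9; 5.1];     % placeholder measured y values
sigma = 0.2*ones(size(ydata));         % one standard deviation per point
A = 1.0;                               % candidate slope

ymodel = A*xdata;                            % model prediction at each x
chi2   = sum(((ydata - ymodel)./sigma).^2);  % the sum defined above
fprintf('chi-squared = %.3f\n', chi2);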


The probability found above in #4 is known as the likelihood. Our most probable model is the one with
the greatest likelihood of producing the data set we collected.

$$\text{Likelihood} \sim e^{-\chi^2/2}$$

5. When is the likelihood the greatest? What is the relative value of chi-squared at this value?






Part 2: How good is the fit? Testing our family of models

We have shown that the best-fit line is the one that generates the lowest value of chi-squared. But how do we know if this fit is any good? Consider the plots shown below. On the right we have a best-fit line found by minimizing chi-squared. However, the linear model (y_model = Ax) does not agree very well with the data. On the left, we plot the same data, but use a different family of models (y_model = Ax²) to fit the data. Here we will show how you can use the minimum value of chi-squared to test whether or not the family of models represents the data.




[Two plots of the same data: one fit with the family y_model = Ax² and one with y_model = Ax; the horizontal axes are labeled x and x².]
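As an aside, for a one-parameter family y_model = A·f(x), the value of A that minimizes chi-squared can also be written in closed form, which gives a quick way to find the best member of each family before judging the fit; the data below are placeholder values, not your measurements.

% Sketch: best-fit A for the two families y = A*x and y = A*x.^2.
% For y = A*f(x), minimizing chi-squared over A gives
%   A = sum(y.*f./sigma.^2) / sum(f.^2./sigma.^2).
xdata = [1; 2; 3; 4; 5];               % placeholder data
ydata = [0.9; 4.1; 9.2; 15.8; 25.3];
sigma = 0.5*ones(size(ydata));

families = {xdata, xdata.^2};          % f(x) = x and f(x) = x^2
for k = 1:numel(families)
    fx    = families{k};
    Abest = sum(ydata.*fx./sigma.^2) / sum(fx.^2./sigma.^2);
    chi2  = sum(((ydata - Abest*fx)./sigma).^2);
    fprintf('family %d: best A = %.3f, minimum chi-squared = %.2f\n', k, Abest, chi2);
end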

6. In a reasonable fit to the data, we expect most of the data points to be within one sigma of the best-fit line. What would chi-squared be if each data point were one sigma from the best-fit line?









Stop here and use the Matlab script <ChiSquared.m> to analyze your data.
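If you want to see the kind of calculation such a script performs, here is a minimal sketch of a chi-squared scan over a range of slopes; the grid of A values and the data are illustrative assumptions, not the actual contents of <ChiSquared.m>.

% Sketch: scan trial slopes A for the model y = A*x, compute chi-squared
% at each A, and plot chi-squared vs. A.
xdata = [1; 2; 3; 4; 5];               % placeholder data
ydata = [1.1; 1.9; 3.2; 3.9; 5.1];
sigma = 0.2*ones(size(ydata));

Avals = linspace(0.5, 1.5, 201);       % trial slopes (illustrative range)
chi2  = zeros(size(Avals));
for k = 1:length(Avals)
    chi2(k) = sum(((ydata - Avals(k)*xdata)./sigma).^2);
end

plot(Avals, chi2);
xlabel('slope A'); ylabel('\chi^2');
[chi2min, idx] = min(chi2);            % minimum chi-squared and the best A
fprintf('chi2_min = %.2f at A = %.3f\n', chi2min, Avals(idx));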

Part 3: Determining uncertainty in the fit: Testing how
well our model agrees with the best-fit line

In Matlab, we made a plot of chi-squared vs. slope (A) for a
range of A values. Your chi-squared vs. slope plot probably
looks something like the figure at the right.

Our model predicts that the slope of the F/mr vs. ω² graph should be 1. We used Matlab to find the A value at the minimum of this plot, and it is not exactly 1.

The likelihood that our model (y_model,i = A x_i) could generate the set of data we collected is given by:

$$\text{Likelihood} = \exp\left(-\sum_{i=1}^{N}\frac{\left(y_i - A x_i\right)^2}{2\sigma_i^2}\right)$$


If we plot Likelihood vs. slope, we will get something that looks like a Gaussian (see the figure at right). This is the distribution describing the probability that our data fits a given family of models. The most likely model is the mean of the distribution. Note that the standard deviation (σ_A) of this distribution is different from the standard deviation (σ_i) of the data in the sample.
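To make this concrete, here is a rough sketch of how the chi-squared scan above can be turned into a likelihood curve whose mean and width can be read off numerically; the normalization by the area under the curve is an illustrative choice, and Avals and chi2 are assumed to come from the scan sketched earlier.

% Sketch: convert a chi-squared scan (Avals, chi2) into a normalized
% likelihood curve, then compute its mean and standard deviation.
lik = exp(-(chi2 - min(chi2))/2);      % subtract chi2_min to avoid underflow
lik = lik / trapz(Avals, lik);         % normalize so the curve integrates to 1

plot(Avals, lik);
xlabel('slope A'); ylabel('likelihood');

Amean  = trapz(Avals, Avals.*lik);                     % mean of the distribution
sigmaA = sqrt(trapz(Avals, (Avals - Amean).^2.*lik));  % its standard deviation (sigma_A)
fprintf('most likely A = %.3f, sigma_A = %.3f\n', Amean, sigmaA);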

The question now is whether or not our model fits the data. This is tedious to do analytically, so we will answer the question using an analogy. Think back to our original Gaussian for one variable:

$$G(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\left(-\frac{z^2}{2}\right)$$

where we simplify the argument in the exponent in terms of z, where

$$z = \frac{x-\mu}{\sigma}$$



7. We know what a typical Gaussian looks like (see figure at left). Instead of graphing G(x) vs. x, sketch z² vs. x. Line up the x-axis with the graph of the Gaussian.

8. Mark the minimum value of z² on the graph and call it z²min.


[Figure 13: 68% of the area under the Gaussian curve falls within one σ of the mean (left); 95% falls within two σ of the mean (right).]

Of course, we don't have to stop there; 99.7% of the area falls within 3σ of the mean, and so forth. (You could look up all these figures in a table of numerical integrals of the Gaussian distribution function.) But for practical purposes, 68% and 95% will suffice.* The key point is that these figures establish confidence intervals. If a set of measurements obeys a Gaussian distribution with mean 100 and standard deviation 1, then we can be 68% confident that any given measurement from that set lies between 99 and 101, and 95% confident that it lies between 98 and 102. In particular, we can predict that the next measurement made in the same way will also fall into those ranges with the same probabilities.

This is exactly what we wanted: a concise but specific quantitative description of the random error associated with a measurement. It works so well that we'll use it to define what we mean by uncertainty going forward. Since one standard deviation turns out to be such a convenient quantity to work with, we'll use that:

The uncertainty in a reported value is defined as the interval in which we are 68% confident the true value lies.

From this definition, for a single measurement taken from a data set, the uncertainty is equal to the standard deviation of the data set. Rather than specifying the endpoints of the interval, we often express uncertainties as the central value plus or minus a certain amount. So in our example, rather than specifying that the measurement is between 99 and 101 about 68% of the time, we can say that the measurement is 100 with an uncertainty of 1, or write it even more concisely as 100 ± 1.

9. What values of z² correspond to x = μ ± 1σ and x = μ ± 2σ? Mark these points on your sketch and on the Gaussian.
measurement
A bit of notation: if a measured quantity is represented by x, then the uncertainty in x is sometimes written as δx. δ is the Greek lowercase letter delta, and in this context it means "uncertainty of." So if you measure some length to be L = 100 ± 1 cm, then L = 100 cm and δL = 1 cm.

Look back at your graph of chi-squared vs. slope in Matlab. It should look similar to what we just drew, but we are using χ² (which includes many data points) instead of z² (which is for a single data point).

10. What is the minimum value of χ²? Why is it not zero?
* Actually, there is a good reason we don't bother with 4σ and beyond, and rarely with 3σ: treating almost every distribution that comes up as a Gaussian is an approximation. For most distributions, it's a very good approximation near the center of the distribution, and not so good way off on the tails. This agrees with common sense: the Gaussian is infinitely wide in the sense that it never goes to zero, but of course there are physical measurements where you will obviously never get a negative value, for example.
11. What is the corresponding value of A? This is our most likely model (best A value).




12. Now let's think about the uncertainty in our best A. What values of χ² correspond to A ± 1σ_A and A ± 2σ_A? Give your answer in terms of χ²min. (Don't try to do this analytically. Look back at what we did with z² and see if you can figure out the answer.) Mark these points on the graphs on the first page.








13. Does our model A value fall within the A ± 1σ_A range? A ± 2σ_A?






14. How does our actual minimum value of χ² compare to the number of data points?

15. Generally, if each data point is (on average) within one sigma of the model, we say the model is
pretty good. If the data is more than two sigma away from the model, we will reject the family of
models. Can we reject our model based on this analysis?
