P2_Error Analysis and curve fitting

The document discusses data analysis, focusing on errors, uncertainties, and curve fitting in experiments. It emphasizes the importance of distinguishing between correlation and causation, understanding significant digits, and applying error propagation techniques. Additionally, it covers statistical concepts such as mean, standard deviation, and the criteria for acceptable results in experimental measurements.

Data analysis : Errors and curve fitting

Instructor: Dr. Indrani Chakraborty


Semester II 2022-23

e-mail: [email protected]
Reality of an experiment
Reading data

1. Is there a connection between the red and the blue curve?

2. Can I say the blue is increasing because the red is increasing, or the opposite?

CORRELATION DOES NOT IMPLY CAUSATION!!

Maybe the correlation arises because in warmer weather people consume more ice-cream and also go to the seaside more. Or this may just be a coincidental correlation.

Unless we find the mechanism via which more ice-creams cause more shark attacks, we cannot say one is the cause of the other! (Monthly data from the US.)
Random and systematic error

The four target diagrams illustrate:
• Precision: high, accuracy: high
• Precision: high, accuracy: low
• Precision: low, accuracy: maybe high
• Precision: low, accuracy: low

In a real experiment, this translates to the random and systematic errors in our measurements. Is parallax a random or a systematic error?
Significant digits
• Any non-zero digit is significant. Example: 486 has three significant digits and 5.2856 has five
significant digits.

• If zeros are between non-zero digits, they are significant. Example: 1002 has four significant
figures, as the zeros are within non-zero digits.

• For a number less than 1, the zeros to the left of the first non-zero digit are not significant.
Example: 0.0000079 has only two significant figures, as it can be written as 7.9×10⁻⁶.

• For numbers with decimal points, zeros to the right of a non-zero digit are significant. Hence, it is
important to retain the trailing zeros to indicate the number of significant digits.
Example: 42.0400 has six significant figures (not four), 0.0006500 has four significant figures (not
seven, as zeroes before 65 are not significant)

• For numbers without decimal points (i.e. integers), trailing zeros are not significant (unless a decimal point follows them, making the number non-integer). Example: 6100 indicates only two significant digits; 6100 = 6.1×10³ also has two significant digits. However, 6100. has four significant digits, and 6100.0 has five significant digits.
Uncertainties
g = 9.82 ± 0.02385 m/s²

V = 6051.78 ± 30 m/s

What is wrong with the above statements?

• The uncertainty in a measurement cannot be known to 4 significant figures. We normally state uncertainties rounded to 1 significant digit, or at most 2.

• The last significant figure of any stated answer should usually be of the same order of magnitude (in the same decimal position) as the uncertainty.

• To reduce inaccuracies caused by rounding, any number to be used in subsequent calculations should retain at least 1 significant digit more than is finally justified. Don't round numbers in the middle of a calculation, only at the end.

Eg: We measure a value of 92.81 with an uncertainty of 0.3. The correct statement: 92.8 ± 0.3.
For an uncertainty of 3, it is 93 ± 3.
For an uncertainty of 30, it is 90 ± 30.
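The rounding convention above can be automated. Below is a minimal Python sketch (the helper name round_to_uncertainty is mine, not a standard library routine): it rounds the uncertainty to 1 significant digit and the value to the matching decimal position.

```python
import math

def round_to_uncertainty(value, uncertainty, sig=1):
    """Round `uncertainty` to `sig` significant digits and
    round `value` to the same decimal position."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    # decimal exponent of the last kept digit of the uncertainty
    exp = math.floor(math.log10(uncertainty)) - (sig - 1)
    u = round(uncertainty, -exp)
    v = round(value, -exp)
    return v, u

# The examples from the slide:
# round_to_uncertainty(9.82, 0.02385)  ->  (9.82, 0.02)
# round_to_uncertainty(92.81, 3)       ->  (93.0, 3)
# round_to_uncertainty(92.81, 30)      ->  (90.0, 30)
```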
Uncertainties
Speed of sound at standard temperature and pressure:
A's value: 329 ± 5 m/s
B's value: 325 ± 5 m/s
C's value: 345 ± 2 m/s

Which values are acceptable?

On the left are the results of a collision experiment with two cars.

Initial momentum p = 1.49 ± 0.03 kg m/s
Final momentum q = 1.56 ± 0.06 kg m/s

Is the momentum conserved?
Uncertainties
Successive measurements of p − q in the collision experiment.

Does this prove momentum conservation?

If 2 quantities x and y are measured with uncertainties δx and δy, and the measured values x and y are used to calculate the difference q = x − y, then the uncertainty in q is the sum of the uncertainties in x and y:

δq ≈ δx + δy

We will find an 'improved' rule for this as we go into error propagation.
Error bars and linearization
Hooke's law experiment: extension of a spring by a distance x due to a mass m attached to the spring.

• How do we present the uncertainties in a plot?

• How do we verify whether the data obey the expected linear relation, x = (g/k) m?

• In another experiment, let's say the expected relation is quadratic, e.g. y = A x². How do we verify this? The curve is a parabola, so can we say that the parabola 'fits' the points with the error bars nicely?

• Linearization of non-linear data is always done to avoid such difficulties (e.g. plot y against x² and check for a straight line).
How large is a given fractional uncertainty?
If x is measured in the standard form x = x_best ± δx, then the fractional uncertainty in x is δx/|x_best|, often expressed as a percentage.

Consider 2 scenarios:
• I have a 10% error in experiment A.
• I have a 1% error in experiment B.

Which experiment is scientifically more useful?

• Answer: It depends!

• A 1% error or deviation in the genetic code can turn a human into a chimpanzee.

• A 10% error in a student's measurement of g using a simple pendulum can be a 'not so bad' result.

Before saying 'we have an error of 1%, so we are good', always check which experiment and which measurement you are talking about, and what the effect of a given degree of uncertainty is on the result.
Error propagation for products

Measured mass: m = m_best ± δm
Measured velocity: v = v_best ± δv

So the momentum p = mv has (provisional) fractional uncertainty

δp/|p| ≈ δm/|m_best| + δv/|v_best|

We can improve this expression also (by combining in quadrature), as we will see.

Mean, standard deviation and standard error

The mean x̄ = (1/N) Σ xᵢ is the best estimate, and the standard deviation

σ = √( (1/N) Σ (xᵢ − x̄)² )

is the root mean square deviation of the measurements from x̄.

For very small N, we will prefer to use the sample standard deviation

σ_{N−1} = √( (1/(N−1)) Σ (xᵢ − x̄)² ),

which gives a larger estimate of the error. For e.g. N = 1, the population standard deviation = 0, but the sample standard deviation is 0/0 (indeterminate), which correctly reflects our total ignorance of the uncertainty after just 1 measurement.
Mean, standard deviation and standard error

• If we measure the same quantity x many times, always using the same method, and all our sources of uncertainty are small and random, then the results will be distributed about the true value X in accordance with a Gaussian (or normal) curve. This means 68% of our measurements will fall within a distance σ on either side of X.

• Here σ is called the width parameter.

• Now if we make 1 measurement, the uncertainty associated with this measurement is σ, that is, δx = σ. So we can be 68% confident that the measurement is within σ of the correct answer.

• Question: What is the true value of a measured physical quantity?

• Impossible to determine! We can only arrive at the best estimate.
Mean, standard deviation and standard error

• Probability that a value lies within tσ of X:

P(within tσ) = 1/(σ√(2π)) ∫ from X−tσ to X+tσ of exp( −(x − X)²/2σ² ) dx

Putting z = (x − X)/σ, the limits become −t and +t:

P(within tσ) = 1/√(2π) ∫ from −t to +t of exp(−z²/2) dz

where t is any positive number. This integral is called the 'error function' erf(t) or the normal error integral.

• Probability that a measurement will fall within one σ: 68%.
• Probability that a measurement will fall within 2σ: 95.4%.
• Probability that a measurement will fall within 3σ: 99.7%.
• These are called 'Confidence Limits'.
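In terms of the standard error function, P(within tσ) = erf(t/√2), so Python's math.erf can reproduce the confidence limits quoted above. A small sketch (the function name prob_within is mine):

```python
import math

def prob_within(t):
    """Probability that a normally distributed measurement falls
    within t standard deviations of the true value: erf(t/sqrt(2))."""
    return math.erf(t / math.sqrt(2.0))

# prob_within(1) ~ 0.683, prob_within(2) ~ 0.954, prob_within(3) ~ 0.997
```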
Mean, standard deviation and standard error

• It can be shown that the mean x̄ is the best estimate of the true value X, and the standard deviation σ is the best estimate of the true width.

• We can also find the uncertainty in the standard deviation, or the 'standard deviation of the standard deviation'. This is given by:

fractional uncertainty in σ = 1/√(2(N − 1))

This means for N = 3, the fractional uncertainty is 50%!

If σ comes out as, say, 0.988, how many significant digits do we write in the error? For N = 10,000, the fractional uncertainty is 0.7%, so all we can say is that σ lies between roughly 0.981 and 0.995. This means we would need tens of thousands of data points before we can quote 2 significant digits in the error.

So we normally write the error up to 1 significant digit.
Mean, standard deviation and standard error

• Aim: We make measurements of a quantity x which are normally distributed about the true value X with width parameter σ. We now want to know the reliability of the mean, or 'how close the mean values are to each other'.

• We repeat our measurements many times, that is, we perform a whole set of experiments in each of which we make N measurements and compute the mean. If we have M such experiments, each having N measurements, then we have M mean values.

• The 'standard error' or 'standard deviation of the mean' gives the spread of these mean values:

σ_x̄ = σ/√N
Mean, standard deviation and standard error

A histogram of 2500 data points shows a normal distribution with mean = 10 and σ = 1. After averaging every 5 data points, we have 500 points, with mean = 10 and σ = 0.5. After averaging every 10 data points, we have 250 points, with mean = 10 and σ = 0.3. After averaging every 50 data points, we have 50 points, with mean = 10 and σ = 0.14.
Mean, standard deviation and standard error

• Standard deviation: quantifies the variation within a set of measurements.

• Standard error: quantifies the variation of the means obtained from multiple sets of measurements.

CONFUSION:
• The standard error can be estimated even from 1 set of measurements. In that case, we can be 68% confident that our answer lies within a distance σ/√N of the true value X.

• Unless specifically requested, it is better to quote the standard deviation rather than the standard error from a single set of measurements, as that is more reflective of the spread in the data.
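A minimal Python sketch of these three quantities, using the sample (N−1) standard deviation as discussed above (the function name summarize is mine):

```python
import math
import statistics

def summarize(samples):
    """Best estimate, spread, and standard error of repeated measurements:
    mean, sample standard deviation, and sigma/sqrt(N)."""
    n = len(samples)
    mean = statistics.fmean(samples)
    sigma = statistics.stdev(samples)   # sample (N-1) standard deviation
    std_err = sigma / math.sqrt(n)      # standard deviation of the mean
    return mean, sigma, std_err
```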
What is an acceptable result?

• Suppose that in an experiment we measure a quantity x, and the true expected value of x is x_true.

• We make a set of N measurements of x and calculate the best estimate of x, namely the mean x̄, which has a standard deviation σ.

• We calculate the discrepancy = |x̄ − x_true|, and then t = discrepancy/σ.

• Next, using the error function, we can find the probability of obtaining an answer that differs from x_true by t or more standard deviations: P(outside tσ) = 1 − P(within tσ).

• If this probability is large, the discrepancy is perfectly reasonable, and the result is acceptable.


What is an acceptable result?

• A student measures the electron charge e and notes her answer is 2 standard deviations away from the accepted value.

• In this case t = 2, and P(outside 2σ) = 1 − 95.4% = 4.6%.

• This means there is only a 4.6% probability that an answer will fall outside 2σ. If we put an (arbitrary) cutoff at 5% probability, the student's calculated discrepancy of 2σ is significant (that is, her answer is unacceptable).

• If the discrepancy is appreciably less than about 2σ, then by almost any standard the result is deemed acceptable, whereas if it is appreciably more than about 2.5σ, then by almost any standard it is unacceptable. If it falls in between, the result is inconclusive, and the experiment needs to be repeated.
The collision experiment in Lab 1
Elastic collision: One stationary cart, one moving

We have a discrepancy of about 4-6%!

Total momentum before collision: pB = p1 − p2; after collision: pA = −p'1 + p'2. Total kinetic energy before: KB = K1 + K2; after: KA = K'1 + K'2.

S.N. | pB (before) | pA (after) | |pA − pB| | (pA − pB)×100/pB (%) | KB (before) | KA (after) | Difference (%)
1    | 0.250       | 0.240      | 0.010     | −4.198               | 0.081       | 0.074      | 8.929
2    | 0.260       | 0.249      | 0.012     | −4.459               | 0.078       | 0.071      | 9.345
3    | 0.253       | 0.243      | 0.011     | −4.156               | 0.081       | 0.074      | 8.831
4    | 0.266       | 0.256      | 0.010     | −3.928               | 0.080       | 0.073      | 8.320
5    | 0.274       | 0.257      | 0.017     | −6.266               | 0.077       | 0.068      | 12.679

The experiment has not been done in a good way to establish conservation of linear momentum, and we need to repeat it / design it in a better manner!

In this experiment the uncertainty in determining the velocity of one cart through 2 photogates was 1-2%, which was too high for a good experiment. You need better timers, better tracks, and smoother motion of the carts.
Error propagation revisited

• We started by considering the sum of two numbers x and y.

• Measured value of x: x_best ± δx
• Measured value of y: y_best ± δy

• What are the probable values of q = x + y? The highest possible value of q is (x_best + y_best) + (δx + δy) and the lowest possible value is (x_best + y_best) − (δx + δy). So q lies between these 2 values, and the uncertainty in q is δq ≈ δx + δy.

• This is actually an overestimate of δq! If x and y are measured independently, and the errors on them are random in nature, we have a 50% chance that an underestimate of x is accompanied by an overestimate of y, and vice versa. Therefore δx + δy overstates the probable error.

• How do we get a better estimate of δq?


Error propagation revisited
• We go to error propagation in quadrature!

δq = √(δx² + δy²)

• This is always smaller than (or equal to) our previous estimate δx + δy.

• If our errors in x and y are independent and random, then one can use δq = √(δx² + δy²).

• What if the errors are not independent, or not random?

• For example: a student measures q as the sum of two lengths x and y with the same steel tape, which has a certain expansion coefficient and expands by a certain amount as the temperature rises. So if he uses the tape to measure lengths at a temperature different from the calibration temperature, we can have an overestimate of both x and y, or an underestimate of both, as the errors are not random any more.

• It can be shown that even if the errors are NOT independent or random, δq ≤ δx + δy.
Error propagation revisited

• FOR SUMS AND DIFFERENCES:

If x, …, z and u, …, w are measured with uncertainties δx, …, δw, and the measured values are used to compute q = x + … + z − (u + … + w), then:

a) If the uncertainties are independent and random:
δq = √(δx² + … + δz² + δu² + … + δw²)

b) In any case (whether or not the uncertainties are independent and random):
δq ≤ δx + … + δz + δu + … + δw
Error propagation revisited
• FOR PRODUCTS AND QUOTIENTS:

If x, …, z and u, …, w are measured with uncertainties δx, …, δw, and the measured values are used to compute q = (x × … × z)/(u × … × w), then:

a) If the uncertainties are independent and random:
δq/|q| = √((δx/x)² + … + (δz/z)² + (δu/u)² + … + (δw/w)²)

b) In any case:
δq/|q| ≤ δx/|x| + … + δz/|z| + δu/|u| + … + δw/|w|
Error propagation revisited
• MEASURED QUANTITY TIMES EXACT NUMBER:

If B is known exactly and q = Bx, then: δq = |B| δx

• UNCERTAINTY IN A POWER:

If n is an exact number and q = xⁿ, then: δq/|q| = |n| · δx/|x|

• GENERAL FORMULA FOR A FUNCTION:

If q = q(x, …, z) is any function of x, …, z, then:

δq = √( (∂q/∂x · δx)² + … + (∂q/∂z · δz)² )   (provided all errors are independent and random)

δq ≤ |∂q/∂x| δx + … + |∂q/∂z| δz   (always).
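The general formula can be sketched numerically: estimate each partial derivative by a central difference and add the contributions in quadrature. An illustrative sketch (the function name propagate and its step-size choice are mine), assuming independent, random errors:

```python
import math

def propagate(f, values, errors, h=1e-6):
    """Uncertainty of q = f(x, ..., z) from the quadrature formula
    dq = sqrt(sum((df/dx_i * dx_i)**2)), with partial derivatives
    estimated by central differences."""
    total = 0.0
    for i, (v, dv) in enumerate(zip(values, errors)):
        step = h * max(abs(v), 1.0)
        up = list(values); up[i] = v + step
        lo = list(values); lo[i] = v - step
        deriv = (f(*up) - f(*lo)) / (2 * step)
        total += (deriv * dv) ** 2
    return math.sqrt(total)

# Example: momentum p = m*v with m = 0.50 +/- 0.01 kg, v = 2.0 +/- 0.1 m/s
# gives dp = sqrt((v*dm)**2 + (m*dv)**2) ~ 0.054 kg m/s
```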
Problems

1. We have measured an angle θ as θ = 20 ± 3°. Find cos θ with its error.

2. If you measure x = 100 ± 6, state how you will write with uncertainty.

3. A student measures , state the best estimate of with uncertainty if = 3.0 ± 0.1 and = 2.0 ± 0.1.

4. Consider the following data taken in a Snell’s Law experiment (i and r denote the angle of
incidence and the angle of refraction). Find the errors in the refractive index.
Problems

4. A car rolls down an incline of slope θ. The expected acceleration is g sin θ.

We can measure a by timing the cart past the two photocells shown in the figure, each connected to a timer. The cart has length l and takes time t₁ to pass the first photocell and time t₂ to pass the second photocell. The velocity measured by the 1st photogate is v₁ = l/t₁, and by the 2nd photogate, v₂ = l/t₂.

We can show a = (v₂² − v₁²)/(2s), where s is the distance between the photogates.

Given the measured values of l, t₁, t₂ and s with their uncertainties, find the acceleration with its error.


Problems
Rejection of data: Chauvenet’s criterion
"If the expected number of measurements at least as deviant as the suspect measurement is less than one half, then the suspect measurement should be rejected."

Example: A student makes 10 measurements of length x and gets the results (all in mm):

46, 48, 44, 38, 45, 47, 58, 44, 45, 53

Here x̄ = 46.8 mm and σ = 5.4 mm (sample standard deviation).

If 58 seems too large, it is a 'suspect value', and we apply the following test. The suspect value differs from x̄ by 58 − 46.8 = 11.2 mm, i.e. by about 2.1σ. The probability that a measurement will differ from x̄ by 2.1σ or more is about 0.04.

So the number of suspect measurements expected = (number of measurements) × P = 10 × 0.04 = 0.4.

As 0.4 < 0.5, the limit set by Chauvenet's criterion, the student would reject the result 58. If he does that, his new mean and standard deviation will be x̄ = 45.6 mm and σ = 4.0 mm.

One has to be careful while rejecting data even with this criterion, both for small sample sizes and because an anomaly may sometimes be real!
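Chauvenet's test can be sketched in Python (the function name chauvenet_reject is mine; it applies the criterion once, to the single most deviant point):

```python
import math
import statistics

def chauvenet_reject(data):
    """Reject the most deviant point if the expected number of points
    at least that deviant is below one half. A sketch; real analyses
    should apply rejection cautiously and justify it."""
    n = len(data)
    mean = statistics.fmean(data)
    sigma = statistics.stdev(data)
    suspect = max(data, key=lambda x: abs(x - mean))
    t = abs(suspect - mean) / sigma
    # probability of a single measurement falling outside t sigma
    p_outside = 1.0 - math.erf(t / math.sqrt(2.0))
    if n * p_outside < 0.5:
        kept = list(data)
        kept.remove(suspect)
        return suspect, kept
    return None, list(data)

# chauvenet_reject([46, 48, 44, 38, 45, 47, 58, 44, 45, 53]) rejects 58
```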
Weighted averages
Often a physical variable is measured several times, perhaps in several separate labs, and the question
arises how these measurements can be combined to give a single best estimate.

Let's say students A and B measure a quantity x:

Student A: x = x_A ± σ_A
Student B: x = x_B ± σ_B

Do we just take the mean of x_A and x_B? This might be unsuitable if σ_A and σ_B are unequal. Averaging gives equal importance to x_A and x_B, but we should give more importance to the one with the smaller standard deviation.

• We can solve this problem by the 'principle of maximum likelihood'.

• We assume both measurements are governed by the Gaussian distribution and denote the unknown true value of x by X.
Weighted averages
Probability of student A obtaining his value x_A: P(x_A) ∝ (1/σ_A) exp( −(x_A − X)²/2σ_A² )

Probability of student B obtaining his value x_B: P(x_B) ∝ (1/σ_B) exp( −(x_B − X)²/2σ_B² )

Probability that A finds the value x_A and B finds the value x_B:

P(x_A, x_B) = P(x_A) P(x_B) ∝ (1/(σ_A σ_B)) exp(−χ²/2)

where χ² (chi-square) is the quantity:

χ² = ( (x_A − X)/σ_A )² + ( (x_B − X)/σ_B )²

• The principle of maximum likelihood states that our best estimate for X is the value for which P(x_A, x_B) is maximum, or χ² is minimum.

• Since this involves minimizing the 'sum of squares' χ², this is also called the 'method of least squares'.
Weighted averages
Differentiating χ² with respect to X and setting the derivative equal to zero:

(x_A − X)/σ_A² + (x_B − X)/σ_B² = 0

Therefore the best estimate for X is:

X_best = ( x_A/σ_A² + x_B/σ_B² ) / ( 1/σ_A² + 1/σ_B² )

If we define "weights" w_A = 1/σ_A² and w_B = 1/σ_B², we can write the "weighted average":

x_wav = (w_A x_A + w_B x_B)/(w_A + w_B)

In general form:

x_wav = Σ wᵢ xᵢ / Σ wᵢ

where the sum is over all measurements, wᵢ = 1/σᵢ², and the uncertainty in x_wav is σ_wav = 1/√(Σ wᵢ).

Weighted averages
Example:
3 students measure the same resistance several times and their three final answers (all in ohm) are:
Student 1 : R = 11 ± 1
Student 2 : R = 12 ± 1
Student 3 : R = 10 ± 3
What is the best estimate for R?
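Applying the weighted-average formulas to this example in Python (the function name weighted_average is mine):

```python
import math

def weighted_average(values, sigmas):
    """Weighted average x_wav = sum(w_i x_i)/sum(w_i) with
    w_i = 1/sigma_i**2, and its uncertainty 1/sqrt(sum(w_i))."""
    weights = [1.0 / s**2 for s in sigmas]
    x_wav = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    sigma_wav = 1.0 / math.sqrt(sum(weights))
    return x_wav, sigma_wav
```

For the three resistances, weighted_average([11, 12, 10], [1, 1, 3]) gives roughly R = 11.4 ± 0.7 ohm: the imprecise third measurement barely shifts the answer.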
Curve fitting to data

xkcd comics
Least squares fitting
PROBLEM: Find the 'best fit' straight line y = A + Bx to a set of measured points (x₁, y₁), …, (x_N, y_N).

We will assume for simplicity that although our measurements of y suffer appreciable uncertainty, the uncertainty in the x measurements is negligible.

Now if we knew the constants A and B, then for any given value of xᵢ we could calculate the true value of the corresponding yᵢ as A + Bxᵢ.

Just as before, the probability of obtaining the observed value yᵢ is:

P(yᵢ) ∝ (1/σ_y) exp( −(yᵢ − A − Bxᵢ)²/2σ_y² )

The probability of obtaining the complete set of measurements y₁, …, y_N is:

P(y₁, …, y_N) ∝ exp(−χ²/2),  where χ² = Σ (yᵢ − A − Bxᵢ)²/σ_y²


Least squares fitting
Just as before, we will minimize χ².

Setting ∂χ²/∂A = 0 and ∂χ²/∂B = 0 gives two simultaneous equations for A and B:

A N + B Σxᵢ = Σyᵢ
A Σxᵢ + B Σxᵢ² = Σ xᵢyᵢ

These are called the 'normal equations' and can be solved to give:

A = (Σx² Σy − Σx Σxy)/Δ,  B = (N Σxy − Σx Σy)/Δ,  where Δ = N Σx² − (Σx)².
Least squares fitting

THIS IS FOR FITTING A STRAIGHT LINE!

The resulting line drawn using the above equations is called the "least-squares" fit to the data, or the "line of regression" of y on x.

The uncertainties (with Δ = N Σx² − (Σx)²):

σ_y = √( (1/(N−2)) Σ (yᵢ − A − Bxᵢ)² ),  σ_A = σ_y √(Σx²/Δ),  σ_B = σ_y √(N/Δ)
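The normal-equation solution can be sketched directly in Python (the function name fit_line is mine; equal y uncertainties and negligible x uncertainties are assumed, as above):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = A + B*x via the normal equations."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    delta = n * sxx - sx * sx
    a = (sxx * sy - sx * sxy) / delta   # intercept A
    b = (n * sxy - sx * sy) / delta     # slope B
    return a, b

# Points on y = 1 + 2x are recovered exactly:
# fit_line([0, 1, 2, 3], [1, 3, 5, 7])  ->  (1.0, 2.0)
```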
Least squares fitting
What about a polynomial y = A + Bx + Cx²?

We will again form χ²:

χ² = Σ (yᵢ − A − Bxᵢ − Cxᵢ²)²/σ_y²

And again, by minimizing χ², we end up with 3 simultaneous equations:

A N + B Σx + C Σx² = Σy
A Σx + B Σx² + C Σx³ = Σxy
A Σx² + B Σx³ + C Σx⁴ = Σx²y

With A, B and C calculated in this way, the equation y = A + Bx + Cx² is called the least-squares polynomial fit or polynomial regression.
Least squares fitting
What about an exponential y = A e^(Bx)?

Simple! First take the log of both sides and linearize the equation:

ln y = ln A + Bx

Then do the least-squares fitting of a straight line (ln y against x).
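A sketch of the linearized exponential fit (the function name fit_exponential is mine; it assumes all y values are positive so the logarithm exists):

```python
import math

def fit_exponential(xs, ys):
    """Fit y = A*exp(B*x) by fitting ln(y) = ln(A) + B*x
    with the straight-line normal equations."""
    zs = [math.log(y) for y in ys]
    n = len(xs)
    sx = sum(xs)
    sz = sum(zs)
    sxx = sum(x * x for x in xs)
    sxz = sum(x * z for x, z in zip(xs, zs))
    delta = n * sxx - sx * sx
    ln_a = (sxx * sz - sx * sxz) / delta
    b = (n * sxz - sx * sz) / delta
    return math.exp(ln_a), b

# Data generated from y = 2*exp(0.5*x) recovers A ~ 2, B ~ 0.5
```

Note that taking logs changes the effective weighting of the points; a nonlinear fit would weight them differently.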


Coefficient of linear correlation r
While doing a least-squares fit to a set of data points, what if you have no idea about any of the uncertainties in x and y (as for the data points in the left-hand graph)?

The extent to which a set of points supports a linear relation between x and y is measured by the 'linear correlation coefficient', or just the 'correlation coefficient', or Pearson's r:

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √( Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² )

r indicates how well the points fit a straight line. It is a number between −1 and 1, and is a measure of the closeness of the points in a scatter plot to a linear regression line based on those points. If r is close to ±1, the points lie close to some straight line; if r is close to zero, the points have little or no tendency to lie on a straight line.
R2 (COD)
A data set has values yᵢ, each associated with a fitted (or modeled, or predicted) value fᵢ. Define the residuals as eᵢ = yᵢ − fᵢ.

• There is a quantity called 'R-square' (R²), the coefficient of determination (COD).

• This is a statistical measure used to qualify the linear regression: the fraction (often quoted as a percentage) of the response-variable variation that is explained by the fitted regression line.

• R-square is always between 0 and 1. If R-square is 0, the fitted line explains none of the variability of the response data around its mean; if R-square is 1, the fitted line explains all of that variability.

Sum of squares of residuals: SS_res = Σ (yᵢ − fᵢ)²
Total sum of squares: SS_tot = Σ (yᵢ − ȳ)²

R² = 1 − SS_res/SS_tot, and R² ≠ r², except in special cases of linear regression!
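A minimal sketch of the COD from the definitions above (the function name r_squared is mine):

```python
def r_squared(ys, fs):
    """Coefficient of determination R^2 = 1 - SS_res/SS_tot
    for observed values ys and fitted values fs."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fs))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Perfect predictions give R^2 = 1; worse fits give smaller values.
```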
Coefficient of linear correlation r
• R²: How well do the predicted values match (and not just follow) the observed values?

• r (or r²): How close is the data to the line of best fit?
Some peak functions

In one common parameterization (as used in fitting software):

Gaussian: f(x) = A exp( −(x − x_c)² / (2w²) )

Lorentzian: f(x) = A w² / ( (x − x_c)² + w² )

Voigt: the convolution of a Gaussian and a Lorentzian profile (no simple closed form).
Bad graphs
Omitting the baseline
Bad graphs

Manipulating the Y axis Manipulating the X axis


Bad graphs
Cherrypicking
Bad graphs

Using the wrong graph

Incorrect Correct
Bad graphs
Non-conventional representation (often intentionally misleading)
Bad graphs
Non-conventional representation (often intentionally misleading)

What is the problem with this graph?

Hint: Check the X axis.

This is intentional data omission, used to show exactly what one 'wants to show'.
Bad graphs
Non-conventional representation (often intentionally misleading)
Bad graphs
Bad graphs during COVID

Until March 26, the bars' heights correspond to the numbers: the March 24 bar with 495 cases is twice as high as the March 20 bar with 253 cases. However, starting on March 26, the scaling of the bars becomes inconsistent with the numerical differences.

The accurate graph would be: "Number of COVID-19 cases in Russia from March 5 to March 31" (Russia Today).
Is the curve flattening?
Bad graphs
Bad graphs during COVID

Is the data timely?


Bad graphs
Bad graphs during COVID

Correlation does not imply causation


Problems
Practice problems: John R Taylor book, 3.6, 3.12, 3.23, 3.24, 4.11, 5.19, 5.37, 6.3, 7.5, 9.13.
Software

Plotting and analysis software like Origin or MATLAB, or write your own code in Python / C / Fortran etc.
Resources
“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit
theories, instead of theories to suit facts”
- Sherlock Holmes, A Scandal in Bohemia

It is what the data is.

1. An Introduction to Error Analysis, John R. Taylor

2. Measurements and their uncertainties, Ifan G. Hughes and Thomas P. A. Hase.
