P2_Error Analysis and curve fitting
e-mail: [email protected]
Reality of an experiment
Reading data
In a real experiment, this translates to:
• If zeros are between non-zero digits, they are significant. Example: 1002 has four significant figures, as the zeros lie between non-zero digits.
• For a number less than 1, the zeros to the left of the first non-zero digit are not significant. Example: 0.0000079 has only two significant figures, as it can be written as 7.9×10⁻⁶.
• For numbers with decimal points, zeros to the right of a non-zero digit are significant. Hence, it is
important to retain the trailing zeros to indicate the number of significant digits.
Example: 42.0400 has six significant figures (not four), 0.0006500 has four significant figures (not
seven, as zeroes before 65 are not significant)
• For numbers without decimal points (i.e. integers), trailing zeros are not significant (unless a decimal point follows, which signals that the trailing zeros are significant). Example: 6100 indicates only two significant digits; 6100 = 6.1×10³ also has two significant digits. However, 6100. has four significant digits, and 6100.0 has five significant digits. (A short sketch below shows how to display a value with a chosen number of significant figures.)
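To make the rules above concrete, here is a minimal Python sketch (not from the slides; the helper sig_fig_str is our own) that uses scientific-notation formatting so the displayed number of significant figures is explicit:

```python
def sig_fig_str(value, n):
    """Format value in scientific notation with n significant figures."""
    return f"{value:.{n - 1}e}"

print(sig_fig_str(6100, 2))       # 6.1e+03    -> two significant figures
print(sig_fig_str(6100.0, 5))     # 6.1000e+03 -> five significant figures
print(sig_fig_str(0.0006500, 4))  # 6.500e-04  -> four significant figures
```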
Uncertainties
g = 9.82 ± 0.02385 m/s²: the uncertainty is quoted to far too many figures; it should be stated as g = 9.82 ± 0.02 m/s².
V = 6051.78 ± 30 m/s: the value is quoted to more decimal places than the uncertainty justifies; it should be stated as V = 6050 ± 30 m/s.
• The last significant figure in any stated answer should usually be of the same order of magnitude (in the same decimal position) as the uncertainty.
• To reduce inaccuracies caused by rounding, any numbers to be used in subsequent calculations should retain at least one more significant digit than is finally justified. Don't round numbers in the middle of a calculation, only at the end.
Eg: We measure a value of 92.81 with an uncertainty of 0.3. The correct statement: 92.8 ± 0.3.
For an uncertainty of 3, it is 93 ± 3.
For an uncertainty of 30, it is 90 ± 30.
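A small Python sketch of this convention (the helper round_with_uncertainty is our own): round the uncertainty to one significant figure, then round the value to the same decimal position.

```python
import math

def round_with_uncertainty(value, err):
    """Round err (assumed > 0) to one significant figure and value to the same decimal place."""
    exp = math.floor(math.log10(abs(err)))            # decimal position of err's leading digit
    err_rounded = round(err, -exp)
    if math.floor(math.log10(err_rounded)) != exp:    # e.g. 0.096 rounds up to 0.1
        exp += 1
        err_rounded = round(err, -exp)
    return round(value, -exp), err_rounded

print(round_with_uncertainty(92.81, 0.3))   # (92.8, 0.3)
print(round_with_uncertainty(92.81, 3))     # (93.0, 3)
print(round_with_uncertainty(92.81, 30))    # (90.0, 30)
```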
Uncertainties
Speed of sound at standard temperature and pressure:
A’s value : 329 ± 5 m/s
B’s value : 325 ± 5 m/s
C’s value : 345 ± 2 m/s
• A 10% error in the measurement of g by a student using a simple pendulum can be a 'not so bad' result.
Before saying 'we have an error of 1%, so we are good', always check which experiment and which measurement you are talking about, and what the effect of a given degree of uncertainty is on the result.
Error propagation for products
Measured mass: m ± δm
Measured velocity: v ± δv
So the momentum is p = mv, and the fractional uncertainty in p is δp/|p| ≈ δm/|m| + δv/|v| (for a product, the fractional uncertainties add).
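A short sketch of this rule in Python (the numbers are invented for illustration, not taken from the slide); it shows both the simple sum of fractional errors and the quadrature sum used later for independent, random errors.

```python
import math

# hypothetical measurements: m = 0.500 +/- 0.005 kg, v = 1.20 +/- 0.02 m/s
m, dm = 0.500, 0.005
v, dv = 1.20, 0.02

p = m * v
dp_simple = p * (dm / m + dv / v)                    # fractional errors simply added
dp_quad = p * math.sqrt((dm / m)**2 + (dv / v)**2)   # added in quadrature (independent, random errors)
print(f"p = {p:.3f} kg*m/s, delta_p = {dp_simple:.4f} (simple) or {dp_quad:.4f} (quadrature)")
```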
For very small N, we prefer to use the sample standard deviation, which gives a larger estimate of the error. For example, for N = 1 the population standard deviation is 0, but the sample standard deviation is 0/0, i.e. undefined, which correctly reflects our total ignorance of the uncertainty after just 1 measurement.
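In NumPy the two estimates differ only in the ddof argument (a quick illustration with arbitrary numbers):

```python
import numpy as np

x = np.array([46.0, 48.0, 44.0, 38.0, 45.0])
print(np.std(x))            # population standard deviation, divides by N
print(np.std(x, ddof=1))    # sample standard deviation, divides by N - 1 (slightly larger)
```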
Mean, standard deviation and standard error
The probability that a measurement falls within t standard deviations of the true value is P(within tσ) = (1/√(2π)) ∫ e^(−z²/2) dz, integrated from z = −t to z = +t, where z = (x − X)/σ. This integral is called the 'Error function' erf(t) or the normal error integral.
• It can be shown that the 'Mean' x̄ is the best estimate of the true value X, and the standard deviation is the best estimate of the width σ.
A histogram of 2500 data points shows a normal distribution with mean = 10 and σ = 1. After averaging every 5 data points, we have 500 points with mean = 10 and σ = 0.5. After averaging every 10 data points, we have 250 points with mean = 10 and σ = 0.3. After averaging every 50 data points, we have 50 points with mean = 10 and σ = 0.14.
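The behaviour described in the figure is easy to reproduce with simulated Gaussian data (our own sketch, using mean 10 and σ = 1 as in the figure); the spread of the block averages shrinks roughly as 1/√n:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=1.0, size=2500)

for n in (1, 5, 10, 50):
    block_means = data.reshape(-1, n).mean(axis=1)   # average every n points
    print(n, round(block_means.std(ddof=1), 3))      # roughly 1, 0.45, 0.32, 0.14
```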
Mean, standard deviation and standard error
• Standard Error: Quantifies the variation in the means from multiple sets of measurements.
CONFUSION:
• The standard error can be estimated even from 1 set of measurements. In that case, we can be 68% confident that our answer lies within one standard error, σ/√N, of the true value.
• Unless specifically requested, it is better to specify the standard deviation instead of the standard error from a single set of measurements, as that is more reflective of the spread in the data.
What is an acceptable result?
• Let us take that in an experiment we measure a quantity x, and the true expected value of x is x_true.
• We make a set of N measurements of x, and we calculate the best estimate of x, that is the mean x̄, which has a standard deviation σ.
• Next, by calculating the error function, we can find the probability of obtaining an answer that differs from x_true by t or more standard deviations. This is: P(outside tσ) = 1 − P(within tσ).
• A student measures the electron charge e and notes her answer is 2 standard deviations away from the accepted value.
• In this case t = 2, and P(outside 2σ) ≈ 4.6%.
• This means there is only a 4.6% probability that the answer will fall outside 2σ. If we put an (arbitrary) cutoff at 5% probability, the student's calculated discrepancy of 2σ is significant (that is, her answer is unacceptable).
• If the discrepancy is appreciably less than 2σ, then by any standard the result is deemed acceptable, whereas if it is appreciably more than 2.5σ, then by any standard it is unacceptable. If it falls between 2σ and 2.5σ, the result is inconclusive, and the experiment needs to be repeated.
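The probability quoted above can be evaluated with the standard error function in Python's math module; note that in the math-library convention P(within tσ) = erf(t/√2).

```python
import math

def prob_outside(t):
    """Probability that a Gaussian measurement differs from the true value by t sigma or more."""
    return 1.0 - math.erf(t / math.sqrt(2.0))

print(prob_outside(2.0))   # ~0.0455, i.e. about 4.6%
print(prob_outside(2.5))   # ~0.0124
```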
The collision experiment in Lab 1
Elastic collision: One stationary cart, one moving
Sl. No.   pB = p1 – p2   pA = –p'1 + p'2   |pA – pB|   (pA – pB)×100/pB   KB = K1 + K2   KA = K'1 + K'2   % diff. in KE
1         0.250          0.240             0.010       -4.198             0.081          0.074            8.929
2         0.260          0.249             0.012       -4.459             0.078          0.071            9.345
3         0.253          0.243             0.011       -4.156             0.081          0.074            8.831
4         0.266          0.256             0.010       -3.928             0.080          0.073            8.320
5         0.274          0.257             0.017       -6.266             0.077          0.068            12.679
The experiment has not been done in a good way to establish conservation of linear momentum, and we need to repeat it/design it in a better manner!
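As a quick check of how consistent the five trials are (our own sketch, not part of the lab analysis), one can compute the mean, sample standard deviation and standard error of the percentage momentum differences tabulated above:

```python
import numpy as np

pct_dp = np.array([-4.198, -4.459, -4.156, -3.928, -6.266])   # (pA - pB)*100/pB from the table
mean = pct_dp.mean()
stdev = pct_dp.std(ddof=1)                 # sample standard deviation
std_err = stdev / np.sqrt(len(pct_dp))     # standard error of the mean
print(f"momentum change: {mean:.1f} % +/- {std_err:.1f} % (spread {stdev:.1f} %)")
```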
• Measured value of x: x_best ± δx
• Measured value of y: y_best ± δy
• What are the probable values of q = x + y? The highest possible value of q is (x_best + y_best) + (δx + δy) and the lowest possible value is (x_best + y_best) − (δx + δy). So q lies in between the 2 values, and the uncertainty in q is δq ≈ δx + δy.
• This is actually an overestimate of δq! If x and y are measured independently, and the errors on them are random in nature, we have a 50% chance that an underestimate of x is accompanied by an overestimate of y, and vice versa. Therefore δx + δy overstates the probable error.
• If our errors in x and y are independent and random, then one can use δq = √((δx)² + (δy)²).
• For example: a student measures L as the sum of two lengths x and y with the same steel tape, which has a certain expansion coefficient and expands by a certain amount as the temperature rises. So if he uses the tape to measure both lengths at a temperature different from the calibration temperature, we can get an overestimate of both x and y, or an underestimate of both x and y, as the errors are not random anymore.
• It can be shown that even if the errors are NOT independent or random, δq ≤ δx + δy.
Error propagation revisited
• FOR SUMS AND DIFFERENCES:
If x, ..., w are measured with uncertainties δx, ..., δw, and the measured values are used to compute q = x + ... + z − (u + ... + w), then
a) if the uncertainties are independent and random, δq = √((δx)² + ... + (δz)² + (δu)² + ... + (δw)²);
b) in any case (whether or not the uncertainties are independent and random), δq ≤ δx + ... + δz + δu + ... + δw.
Error propagation revisited
• FOR PRODUCTS AND QUOTIENTS:
If x, ..., w are measured with uncertainties δx, ..., δw, and the measured values are used to compute q = (x × ... × z)/(u × ... × w), then
a) if the uncertainties are independent and random, δq/|q| = √((δx/x)² + ... + (δz/z)² + (δu/u)² + ... + (δw/w)²);
b) in any case (whether or not the uncertainties are independent and random), δq/|q| ≤ δx/|x| + ... + δz/|z| + δu/|u| + ... + δw/|w|.
Error propagation revisited
• MEASURED QUANTITY TIMES EXACT NUMBER: if q = Bx, where B is an exact number with no uncertainty, then δq = |B| δx.
• UNCERTAINTY IN A POWER: if q = xⁿ, then δq/|q| = |n| δx/|x|.
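Combining these rules, here is a sketch for g = 4π²L/T² from a simple pendulum (the numbers are invented for illustration): the factor 4π² is exact, L enters to the first power and T to the power −2.

```python
import math

# hypothetical pendulum data: L = 1.000 +/- 0.005 m, T = 2.007 +/- 0.010 s
L, dL = 1.000, 0.005
T, dT = 2.007, 0.010

g = 4 * math.pi**2 * L / T**2
# independent, random errors: fractional uncertainties add in quadrature,
# and the power of 2 on T multiplies its fractional uncertainty by 2
dg = g * math.sqrt((dL / L)**2 + (2 * dT / T)**2)
print(f"g = {g:.2f} +/- {dg:.2f} m/s^2")
```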
Problems
2. If you measure x = 100 ± 6, state how you will write the derived quantity with its uncertainty.
3. A student measures two quantities, 3.0 ± 0.1 and 2.0 ± 0.1; state the best estimate of the combined quantity with its uncertainty.
4. Consider the following data taken in a Snell's Law experiment (i and r denote the angle of incidence and the angle of refraction). Find the errors in the refractive index.
Example: A student makes 10 measurements of length x and gets the results (all in mm)
46, 48, 44, 38, 45, 47, 58, 44, 45, 43
The mean and standard deviation of all ten measurements are x̄ = 45.8 mm and σ = 5.1 mm.
If it seems like 58 is too large, it is a 'suspect value', and we do the following test: its deviation is t_sus = (58 − 45.8)/5.1 ≈ 2.4 standard deviations.
The probability that a measurement will differ from x̄ by 2.4σ or more is about 1.6%, so among 10 measurements we expect 10 × 0.016 = 0.16 results this deviant.
As 0.16 < 0.5, the limit set by Chauvenet's criterion, the student would reject the result 58. If he does that, his new mean and stdev will be x̄ = 44.4 mm and σ = 2.9 mm.
(One has to be careful while rejecting data even with this criterion, both for small sample sizes and because an anomaly may sometimes be real!)
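A sketch of Chauvenet's criterion in Python (the helper chauvenet is our own); applied to the ten measurements above, it flags only the value 58.

```python
import math
import numpy as np

def chauvenet(data):
    """Flag points for which the expected number of equally deviant measurements is below 0.5."""
    x = np.asarray(data, dtype=float)
    t = np.abs(x - x.mean()) / x.std(ddof=1)                       # deviation in units of sigma
    prob_outside = np.array([1.0 - math.erf(ti / math.sqrt(2)) for ti in t])
    expected = len(x) * prob_outside                               # expected count this deviant
    return x[expected >= 0.5], x[expected < 0.5]

data = [46, 48, 44, 38, 45, 47, 58, 44, 45, 43]
kept, rejected = chauvenet(data)
print("rejected:", rejected)                                       # [58.]
print(f"new mean, stdev: {kept.mean():.1f}, {kept.std(ddof=1):.1f}")
```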
Weighted averages
Often a physical variable is measured several times, perhaps in several separate labs, and the question
arises how these measurements can be combined to give a single best estimate.
Student A: x_A ± σ_A
Student B: x_B ± σ_B
Do we just take the mean of x_A and x_B? This might be unsuitable if σ_A and σ_B are unequal. Averaging gives equal importance to both x_A and x_B, but we should give more importance to the one with the smaller standard deviation.
• We assume both measurements are governed by the Gaussian distribution, and denote the unknown true value of x by X.
Weighted averages
Probability of student A obtaining his value x_A: P(x_A) ∝ (1/σ_A) e^(−(x_A − X)²/2σ_A²), and similarly for student B. The probability of obtaining both results is the product P(x_A)P(x_B) ∝ (1/(σ_A σ_B)) e^(−χ²/2), where χ² = (x_A − X)²/σ_A² + (x_B − X)²/σ_B².
• The principle of maximum likelihood states that our best estimate for X is the value for which P(x_A)P(x_B) is maximum, or equivalently for which χ² is minimum.
• Since this involves minimizing the 'sum of squares' χ², this is also called the 'method of least squares'.
Weighted averages
Differentiating χ² with respect to X and setting the derivative equal to zero gives X_best = (w_A x_A + w_B x_B)/(w_A + w_B), with weights w = 1/σ².
In general form, for several measurements x_i ± σ_i: x_wav = Σ w_i x_i / Σ w_i, with w_i = 1/σ_i², and the uncertainty in x_wav is σ_wav = 1/√(Σ w_i).
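A sketch of these formulas in Python; as an illustration it combines the two speed-of-sound results quoted earlier (A: 329 ± 5 m/s, B: 325 ± 5 m/s).

```python
import numpy as np

def weighted_average(values, sigmas):
    """Combine measurements x_i +/- sigma_i with weights w_i = 1/sigma_i^2."""
    x = np.asarray(values, dtype=float)
    w = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    x_wav = np.sum(w * x) / np.sum(w)
    sigma_wav = 1.0 / np.sqrt(np.sum(w))
    return x_wav, sigma_wav

print(weighted_average([329.0, 325.0], [5.0, 5.0]))   # (327.0, ~3.5) m/s
```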
[xkcd comic]
Least squares fitting
PROBLEM: Find the 'best fit' straight line y = A + Bx to a set of measured points (x₁, y₁), ..., (x_N, y_N).
We will assume for simplicity that although our measurements of y suffer appreciable uncertainty, the uncertainty in the x measurements is negligible.
Now if we knew the constants A and B, then for any given value of x_i we could calculate the corresponding true value of y. The best estimates of A and B are those that minimise the sum of squared deviations Σ(y_i − A − Bx_i)²; setting the derivatives with respect to A and B to zero gives two simultaneous equations.
These equations are called the 'normal equations' and can be solved to give:
A = (Σx² Σy − Σx Σxy)/Δ,   B = (N Σxy − Σx Σy)/Δ,   where Δ = N Σx² − (Σx)².
Least squares fitting
The uncertainties: the scatter of the points about the fitted line gives the uncertainty in the measured y values, σ_y = √[ Σ(y_i − A − Bx_i)² / (N − 2) ], and the uncertainties in the fitted constants are σ_A = σ_y √(Σx²/Δ) and σ_B = σ_y √(N/Δ).
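These formulas translate directly into a short Python function (a sketch; the example data at the end are invented, and for real work numpy.polyfit or scipy would normally be used):

```python
import numpy as np

def linear_fit(x, y):
    """Unweighted least-squares fit of y = A + B*x, with uncertainties on A and B."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    N = len(x)
    delta = N * np.sum(x**2) - np.sum(x)**2
    A = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / delta
    B = (N * np.sum(x * y) - np.sum(x) * np.sum(y)) / delta
    sigma_y = np.sqrt(np.sum((y - A - B * x)**2) / (N - 2))   # scatter about the line
    sigma_A = sigma_y * np.sqrt(np.sum(x**2) / delta)
    sigma_B = sigma_y * np.sqrt(N / delta)
    return A, B, sigma_A, sigma_B

print(linear_fit([1, 2, 3, 4, 5], [2.6, 3.1, 3.4, 4.1, 4.4]))   # A ~ 2.14, B ~ 0.46
```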
Least squares fitting
What about a polynomial, say y = A + Bx + Cx²?
The same least-squares argument, minimising Σ(y_i − A − Bx_i − Cx_i²)², gives three normal equations for A, B and C. With A, B and C calculated in this way, the equation will be called the least squares polynomial fit, or polynomial regression.
Least squares fitting
What about an exponential, y = A e^(Bx)?
Simple! First take the log of both sides and linearize the equation: ln y = ln A + Bx, which is a straight line in x, so the linear least squares fit can be applied to the points (x_i, ln y_i).
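A sketch of the linearization with NumPy (synthetic data, invented for illustration): fitting a straight line to (x, ln y) recovers B as the slope and ln A as the intercept.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 4, 20)
y = 3.0 * np.exp(0.8 * x) * rng.normal(1.0, 0.02, x.size)   # y = A e^(Bx) with a little noise

B_fit, lnA_fit = np.polyfit(x, np.log(y), 1)   # straight-line fit to (x, ln y)
print("A ~", np.exp(lnA_fit), " B ~", B_fit)   # close to A = 3.0, B = 0.8
```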
We have no idea about the uncertainties on the data points in the left-hand graph.
The extent to which a set of points supports a linear relation between x and y is measured by the 'linear correlation coefficient', or just the 'correlation coefficient', or Pearson's r.
r indicates how well the points fit a straight line. It is a number between −1 and 1, and is a measure of the closeness of association of the points in a scatter plot to a linear regression line based on those points. If r is close to ±1, the points lie close to some straight line; if r is close to zero, the points have little or no tendency to lie on a straight line.
R² (COD)
A data set has values y_i, each associated with a fitted (or modeled, or predicted) value f_i. Define the residuals as e_i = y_i − f_i.
• There is a quantity called 'R-square' (R²), which is called the coefficient of determination (COD): R² = 1 − Σe_i² / Σ(y_i − ȳ)².
• This is a statistical measure to qualify the linear regression. It is the fraction (percentage) of the variation in the response variable that is explained by the fitted regression line.
• R-square is always between 0 and 1. If R-square is 0, the fitted line explains none of the variability of the response data around its mean; if R-square is 1, the fitted line explains all the variability of the response data around its mean.
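Both quantities are one-liners in Python (a sketch, reusing the illustrative data from the fitting example above): np.corrcoef gives Pearson's r, and for a straight-line fit R² equals r².

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.6, 3.1, 3.4, 4.1, 4.4])

r = np.corrcoef(x, y)[0, 1]                      # Pearson correlation coefficient

# R^2 from the residuals of a straight-line fit
B, A = np.polyfit(x, y, 1)                       # slope, intercept
residuals = y - (A + B * x)
R2 = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean())**2)

print(r, R2, r**2)                               # for a linear fit, R2 equals r**2
```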
Commonly used peak shapes for fitting:
Gaussian: y = A exp[−(x − x_c)²/(2w²)]
Lorentzian: y = A w²/[(x − x_c)² + w²]
Voigt: the convolution of a Gaussian and a Lorentzian (no simple closed form; evaluated numerically)
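In practice these shapes are fitted with a nonlinear least-squares routine; here is a sketch using scipy.optimize.curve_fit on synthetic data (the parameter names and values are our own choices):

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, A, xc, w, y0):
    """Gaussian peak of height A at xc, width w, on a constant background y0."""
    return y0 + A * np.exp(-(x - xc)**2 / (2 * w**2))

rng = np.random.default_rng(1)
x = np.linspace(-5, 5, 200)
y = gaussian(x, 2.0, 0.5, 1.2, 0.1) + rng.normal(0, 0.05, x.size)   # noisy synthetic peak

popt, pcov = curve_fit(gaussian, x, y, p0=[1, 0, 1, 0])
perr = np.sqrt(np.diag(pcov))            # 1-sigma uncertainties on the fitted parameters
for name, val, err in zip(("A", "xc", "w", "y0"), popt, perr):
    print(f"{name} = {val:.3f} +/- {err:.3f}")
```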
Bad graphs
• Omitting the baseline (incorrect vs. correct versions shown)
• Non-conventional representation (often intentionally misleading)