Intro To Error Analysis


Introduction to Error Analysis for the Physical Chemistry Laboratory

Adapted from: Statistical and Error Analysis Lab: Using a Spreadsheet Program by L. G. Anderson, 2000, with modification by H. Lin, 2007. Department of Chemistry and Biochemistry, University of Colorado, Boulder, 2007.

Introduction

In the physical chemistry laboratory you will make a variety of measurements and then manipulate them to arrive at a numerical value for a physical property. However, without an estimate of the error of these numbers, they are largely useless. Some published numbers in physics and chemistry are accurate to 10 significant figures or more, while others are only accurate to an order of magnitude (no significant figures!). The estimated error of a published number is a crucial piece of information that must be calculated. This handout provides a practical introduction to the error analysis required in a typical physical chemistry lab.

Error in an experiment is classified into two types: random error and systematic error. Random error arises from uncertainty in measurement devices and is the subject of this handout. Systematic error includes all other sources of error: simplifications in your model, biased instrumentation, impure reagents, etc. Random error measures the precision of the experiment, or the reproducibility of a given result. It can be expressed with error bars, or a quantity σ associated with each value. Systematic error measures the accuracy of a result, or how close a result is to the true value. The distinction between accuracy and precision is illustrated schematically in Figure 1. In your lab write-ups you will report the precision of your measurements by looking at their errors (uncertainties) and propagating those errors through all of your calculations. You will report the accuracy of your measurements by comparing them to accepted literature values.

Figure 1: Schematic illustration of accuracy and precision. The left-hand target represents a high precision but low accuracy experiment. The right-hand target represents a low precision but high accuracy experiment.
Mean and Probability When making a series of repeated measurements with independent, random errors, the values will often tend to distribute themselves symmetrically about one value the mean. The mean of a distribution is found by summing all the measurements then dividing by the number of measurements:
\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} \qquad (1)

where x is the mean, xi denotes the individual value of a measurement in the series, and N is the total number of measurements. The frequency of a value is the number of times it occurs in a series of measurements. Then the total number of measurements can be defined as:
N = \sum_{i=1}^{n} f_i \qquad (2)

where fi is the frequency of a particular value of the measurement occurring and n is the number of different values of the measurement that occur. (Note: This summation is performed over the different values of the measurements, not over the individual measurement as in Eq. 1.) Now the mean, x , can be redefined using Eq. 2:
\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \qquad (3)

where the summations are performed over the number of different values of the measurements. This leads to a description of the probability of a particular value being measured. The probability of a particular measurement occurring is the number of times that measurement occurs divided by the total number of measurements: P = n_i / n_total. In general, probability is the number of times a particular outcome can occur divided by the total number of possible outcomes. For example, in a bin with 15 red balls and 2 white balls, the probability of drawing a white ball is 2/(15 + 2). For the measurement distribution discussed above, the number of times a particular measurement occurs is its frequency, f_i, and the total number of measurements is N, the sum of all of the frequencies. Thus the probability of measuring a particular value, x_i, is:

P(x_i) = \frac{f_i}{\sum_{i=1}^{n} f_i} \qquad (4)

It is important to note that the probability of getting all values is 1, so the sum of all of the probabilities for the values occurring is:
\sum_{i=1}^{n} P(x_i) = 1 \qquad (5)

Now, once again, we can redefine the mean, this time using the definition of probability:
\bar{x} = \sum_{i=1}^{n} P(x_i)\, x_i \qquad (6)

for n different values of the measurements taken.

The Gaussian Distribution

As the number of measurements taken approaches infinity, the shape of the distribution approaches a curve known as a Gaussian. The functional form of a Gaussian distribution is:
" x2 % f (xi ) = exp $ 2 ' # 2 &

(7)

This particular function has the properties that it is symmetric about the value x = 0 and that the width of the curve is proportional to the standard deviation, σ. Since the sum of the probabilities of finding all values must be 1, the probability distribution function must satisfy the condition that the integral of f(x_i) dx equals 1 when integrated from −∞ to +∞. Thus, there must be a normalization constant for the Gaussian distribution function to satisfy this condition. Furthermore, a correction must also be made to Eq. 7 so that the Gaussian can be symmetric around any value of x. The full, generalized, and normalized equation for the Gaussian distribution is then:
f(x_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad (8)

where the pre-exponential term is the normalization constant, and the center of the distribution is at x = μ, which is defined as the mean of the Gaussian.

Significant Figures and Roundoff

The precision of an experimental result is implied by the number of digits recorded in the result, although generally the uncertainty should be quoted explicitly as well. The number of significant figures in a result is defined as follows:

1. The leftmost nonzero digit is the most significant digit.
2. If there is no decimal point, the rightmost nonzero digit is the least significant digit.
3. If there is a decimal point, the rightmost digit is the least significant digit, even if it is a 0.
4. All digits between the least and most significant digits are counted as significant digits.

When insignificant digits are dropped from a number, the last digit retained should be rounded off for the best accuracy. To round off a number to fewer significant digits than specified originally, truncate the number as desired and treat the excess digits as a decimal fraction. Then:

1. If the fraction is greater than 1/2, increment the new least significant digit.
2. If the fraction is less than 1/2, do not increment.
3. If the fraction equals 1/2, increment the least significant digit only if it is odd.

What is uncertainty?

Most chemists have an intuitive idea of what uncertainty is, but it is instructive to give a more rigorous definition. Suppose you perform an experiment to determine the boiling point of a liquid, and you measure 32 degrees. How confident are you in this number? This question could in principle be answered by repeating the experiment many times and collecting the results. If you took this large collection of results and counted all of the values that lie within specified intervals (e.g. between 30 and 30.1, 30.1 and 30.2, etc.) you could make a histogram plot, as seen in Figure 2. You can see that the results are spread over a range of values. The width of this spread is a measure of the uncertainty in your initial measurement. The scatter can be quantified using the standard deviation (σ) of the distribution, which is defined as

\sigma = \sqrt{\frac{\sum_{i=1}^{N} (f_i - \bar{f})^2}{N - 1}} \qquad (9)

where f_i are the results of your individual experiments, \bar{f} is the average of your results, and N is the number of trials performed. The standard deviation gives limits above and below a measured value within which subsequent experimental results will probably lie (with roughly 70% certainty). The importance of the standard deviation is that it describes the region within a Gaussian distribution which contains a given percentage of all the measured values. For data described by a Gaussian distribution, 68.3% of all the values measured lie within the region between μ − σ and μ + σ. For ±2σ the percentage increases to 95.4%, and for ±3σ it increases to 99.7%. Therefore, you will generally describe your errors for a measurement as ±σ, and it is understood that this is the standard deviation and that this range is expected to contain about 68% of all measured values. The range ±σ describes the 68% confidence interval, i.e. the confidence that the value falls within this range, based on an infinite number of repeated measurements. However, it is not practical to collect an infinite number of measurements. As the number of measurements decreases, the Gaussian curve becomes broader at the wings and lower at the center, and it is referred to as a t-distribution. Depending on the number of measurements, the confidence level of ±σ will decrease. For example, to get a 95% confidence level with 20 measurements, you would need to report ±2.09σ. For 5 measurements, you would need to report ±2.57σ. For 2 measurements, you would need to report ±4.30σ. This illustrates why it is imperative to repeat measurements as many times as possible: you will have much more confidence in your results by taking 5 measurements instead of 2 or 3.

Error propagation

Most often in the physical chemistry lab you will only perform an experiment one or two times. In this case it is not possible to calculate the standard deviation of a large number of trials directly. Instead, it must be estimated by propagating the errors in the individual measurements that lead to your final result. First of all, we need to discuss the uncertainties of individual measurements.
Sometimes an instrument such as a volumetric flask will state its uncertainty in its technical specifications. If not, you can make an educated estimate of the uncertainty. If an instrument gives a digital reading, you can generally take the uncertainty to be half of the last decimal place. For example, if a digital thermometer reads 25.4 degrees, the uncertainty is ±0.05 degrees. For analog instruments, first read the measurement to as many significant figures as there are marks on the gauge, and then estimate one more significant figure. The uncertainty should then be estimated based on how confident you are in the estimated significant figure. For example, if a mercury thermometer has marks at every degree, you would read the number of degrees and estimate the tenths of a degree. The uncertainty might be plus or minus 0.2 degrees (e.g. 25.4 ± 0.2 degrees).
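The mean (Eq. 1) and standard deviation (Eq. 9) of a set of repeated measurements can be computed in a few lines. The following is a minimal sketch using Python's standard library; the measurement values are illustrative, not from the handout's data set:

```python
import statistics

# Hypothetical repeated boiling-point measurements (degrees C);
# these values are invented for illustration only.
trials = [31.8, 32.4, 31.9, 32.1, 32.3]

mean = statistics.mean(trials)    # Eq. 1: sum of the values divided by N
sigma = statistics.stdev(trials)  # Eq. 9: sample standard deviation (N - 1 denominator)

# About 68% of repeated measurements are expected to fall within mean +/- sigma.
print(f"{mean:.2f} +/- {sigma:.2f} degrees C")
```

For a small number of trials you would widen the interval with the appropriate t-distribution multiplier quoted in the text (e.g. ±2.57σ for a 95% confidence level with 5 measurements).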


Figure 2: Distribution of a series of 1000 boiling point experiment results. The uncertainty in a single trial is related to the width of the distribution, and is called the standard deviation of the distribution (σ).

To begin our discussion of error propagation, consider an experiment that measures some quantity x. The result we are looking for is some function f(x). The measurement of x is subject to some uncertainty bounds, and the most general case does not assume symmetric uncertainties above and below x. In this case, the measured value x_0 is within the range

x_0 - \sigma^- < x_0 < x_0 + \sigma^+ \qquad (10)

where x_0 is the measured value of x, \sigma^+ is the uncertainty above x_0, and \sigma^- is the uncertainty below x_0. The desired property f is then within the range

f(x_0 - \sigma^-) < f(x_0) < f(x_0 + \sigma^+) \qquad (11)


If the uncertainty in x is assumed to be small, the uncertainty in f becomes

\sigma_f = \left.\frac{df}{dx}\right|_{x_0} \sigma_x \qquad (12)

where \sigma_x is the uncertainty in x, which we now take to be symmetric. If we have a function of many variables, and if the errors are both small and independent, the uncertainty is

\sigma_f = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2_{x_0} \sigma_x^2 + \left(\frac{\partial f}{\partial y}\right)^2_{y_0} \sigma_y^2 + \ldots} \qquad (13)

We can derive some special cases from equation 13. If a function only contains addition and subtraction operations, the uncertainty is
\sigma_f = \sqrt{\sigma_x^2 + \sigma_y^2 + \ldots} \qquad (14)

If a function only contains multiplication and division operations, the uncertainty is

\sigma_f = f \sqrt{\left(\frac{\sigma_x}{x_0}\right)^2 + \left(\frac{\sigma_y}{y_0}\right)^2 + \ldots} \qquad (15)
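The two special-case rules can be sketched numerically. In this minimal example the values and uncertainties are invented for illustration: absolute uncertainties add in quadrature for a sum (Eq. 14), while relative uncertainties add in quadrature for a product (Eq. 15):

```python
import math

# Illustrative measured values and their uncertainties (not from the handout).
x, sx = 10.0, 0.3
y, sy = 4.0, 0.4

# Addition/subtraction rule (Eq. 14): f = x + y
f_sum = x + y
sf_sum = math.sqrt(sx**2 + sy**2)
print(f"x + y = {f_sum:.1f} +/- {sf_sum:.1f}")

# Multiplication/division rule (Eq. 15): f = x * y
f_prod = x * y
sf_prod = abs(f_prod) * math.sqrt((sx / x)**2 + (sy / y)**2)
print(f"x * y = {f_prod:.1f} +/- {sf_prod:.1f}")
```

Note that the relative uncertainty of the product (about 10%) is dominated by the less precise factor y, which is typical of the multiplication/division rule.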

Now that we are equipped with these formulas, we can proceed to propagate our individual uncertainties. We will illustrate this procedure with an example. Suppose you want to measure the molar heat of solvation of LiCl in water. This involves (1) weighing an amount of LiCl, (2) measuring a volume of water, and (3) measuring the temperature change when the reagent is dissolved. We will first calculate the heat of solvation (ΔH) itself, which is expressed in terms of our three measurements:

\Delta H = -\frac{C\, V\, \Delta T}{m / M} \qquad (16)

Here m is the mass of LiCl, M is the molecular weight, C is the heat capacity of water per unit volume, V is the volume of water, and ΔT is the temperature change. Note that the function ΔH only contains multiplication/division operations, so we can use the error propagation rule for multiplication and division (equation 15). The variables we need to consider are m, V, and ΔT. We do not include M and C in our list of variables, because they are assumed to be known to much higher (relative) precision than m, V, and ΔT. If they weren't, we would have to include them in our error analysis, even if we didn't measure them. Plugging our variables into equation 15, we have

" % " % " % H = H $ m ' +$ V ' +$ T ' #m& #V & #T &

(17)

Problem 1: Calculate the heat of solvation of LiCl and its associated uncertainty as discussed above, if the mass is 2.1 ± 0.05 g, the molecular weight is 42.394 ± 0.0005 g mol^-1, the volume is 0.1 ± 0.02 L, the temperature change is 4 ± 0.5 K, and the heat capacity per volume is exactly 4.184 kJ L^-1 K^-1. (Answer: ΔH = -33.79 kJ mol^-1, σ_ΔH = 8.01 kJ mol^-1, so you would report the heat of solvation as -33.8 ± 8 kJ mol^-1.)
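Problem 1 can be checked with a short script applying Eqs. 16 and 17 directly to the values given in the text:

```python
import math

# Values from Problem 1.
C = 4.184              # kJ L^-1 K^-1 (exact, so it carries no uncertainty)
M = 42.394             # g mol^-1 (treated as exact here; see Problem 2)
m, sm = 2.1, 0.05      # g
V, sV = 0.1, 0.02      # L
dT, sT = 4.0, 0.5      # K

dH = -C * V * dT / (m / M)  # Eq. 16, result in kJ mol^-1
# Eq. 17: multiplication/division rule over m, V, and dT.
s_dH = abs(dH) * math.sqrt((sm / m)**2 + (sV / V)**2 + (sT / dT)**2)
print(f"dH = {dH:.2f} +/- {s_dH:.2f} kJ/mol")  # -33.79 +/- 8.01
```

Notice that the 20% relative error in the volume dominates the final uncertainty, which is why the answer is reported to only one significant figure in the error.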

Problem 2 Perform the same calculation as in the last problem, but include the molecular weight in your list of error propagation variables. (Answer: You should get the same answer as before, to at least 6 significant figures! This is why it is often possible to ignore variables known to high precision in your error propagation.) Sometimes a function may contain both addition/subtraction and multiplication/division, in which case the two rules can be combined. The easiest way to do this is to break the calculation into steps. Going back to the heat of solvation experiment, suppose that two separate masses were weighed, and then both masses were added to the solvent. Now the total mass is m = m1 + m2 and the equation for the heat of solvation becomes

\Delta H = -\frac{C\, V\, \Delta T}{(m_1 + m_2) / M} \qquad (18)

The first step is to calculate the uncertainty in m = m1 + m2 using the error propagation rule for addition/subtraction, i.e.
\sigma_m = \sqrt{\sigma_{m_1}^2 + \sigma_{m_2}^2} \qquad (19)

Then simply use the total mass m and its calculated uncertainty, and proceed as in Problem 1.

Problem 3: Calculate the uncertainty for the previous example if the two weights were 0.5 ± 0.05 g and 1.1 ± 0.05 g, the volume is again 0.1 ± 0.02 L, and the temperature change is 3 ± 0.5 K. (Answer: σ_m = 0.07 g, ΔH = -33.26 kJ mol^-1, and σ_ΔH = 8.78 kJ mol^-1.)

Almost all of the error propagation you will do in the physical chemistry lab will only require the rules for addition/subtraction and multiplication/division. However, occasionally you might come across a more complicated function, in which case we need to use equation 13. For example, suppose you have determined ΔG for a reaction and are interested in calculating the equilibrium constant:
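The two-step procedure for Problem 3 (addition rule for the combined mass, then the multiplication/division rule) can be sketched as:

```python
import math

# Values from Problem 3.
C = 4.184                # kJ L^-1 K^-1 (exact)
M = 42.394               # g mol^-1 (exact)
m1, sm1 = 0.5, 0.05      # g
m2, sm2 = 1.1, 0.05      # g
V, sV = 0.1, 0.02        # L
dT, sT = 3.0, 0.5        # K

# Step 1: combine the masses with the addition/subtraction rule (Eq. 19).
m = m1 + m2
sm = math.sqrt(sm1**2 + sm2**2)

# Step 2: proceed exactly as in Problem 1 (Eqs. 18 and 17).
dH = -C * V * dT / (m / M)
s_dH = abs(dH) * math.sqrt((sm / m)**2 + (sV / V)**2 + (sT / dT)**2)
print(f"sm = {sm:.2f} g, dH = {dH:.2f} +/- {s_dH:.2f} kJ/mol")
```

Breaking the calculation into steps like this is the easiest way to combine the two propagation rules in any mixed expression.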

K = \exp\left(-\frac{\Delta G}{RT}\right) \qquad (20)

Assuming that R and T are known to high precision, we only need to calculate the partial derivative of K with respect to ΔG:

\frac{\partial K}{\partial \Delta G} = -\frac{1}{RT} \exp\left(-\frac{\Delta G}{RT}\right) \qquad (21)

and using equation 13, the uncertainty in K is

\sigma_K = \frac{1}{RT} \exp\left(-\frac{\Delta G}{RT}\right) \sigma_{\Delta G} \qquad (22)
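A quick numerical sketch of Eqs. 20-22, using an invented ΔG and uncertainty (the values below are illustrative, not from the handout):

```python
import math

R = 8.314e-3              # kJ mol^-1 K^-1
T = 298.15                # K (assumed exact, like R)
dG, s_dG = -5.0, 0.5      # kJ mol^-1; hypothetical Gibbs energy and uncertainty

K = math.exp(-dG / (R * T))                           # Eq. 20
s_K = (1 / (R * T)) * math.exp(-dG / (R * T)) * s_dG  # Eq. 22
print(f"K = {K:.2f} +/- {s_K:.2f}")
```

Because K depends exponentially on ΔG, the relative uncertainty in K equals σ_ΔG/RT, so even a modest error in ΔG can produce a large error in the equilibrium constant.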

In many experiments you will be required to calculate the slope or intercept of a linear function. For example, the rate of a unimolecular reaction obeys an exponential rate law

c(t) = A \exp(-kt) \qquad (23)

where c is the concentration of reagent, k is the rate constant, t is time, and A is a constant. If you want to calculate k you will need values for c at different times. Taking the natural logarithm of equation 23 gives

\ln c = -kt + \ln A \qquad (24)

which is a linear equation with the familiar form y = mx + b. In this case, what interests us is the slope as a function of t, which gives our desired rate constant. Using two data points, (x1, y1) and (x2, y2), the slope and intercept can be calculated directly:

m = \frac{y_2 - y_1}{x_2 - x_1} \qquad (25)

b = y_1 - m x_1 \qquad (26)

and the uncertainties can be calculated using error propagation. If the error in x is negligible compared to the error in y, you can calculate the maximum and minimum values for the slope and intercept as illustrated in Figure 3. Drawing a line through the upper bound of y1 and the lower bound of y2 gives a lower limit to the slope and an upper limit for the intercept; drawing a line through the lower bound of y1 and the upper bound of y2 gives the opposite limits. If you have many data points, the average slope and intercept can be calculated using linear regression. In practice this is always calculated using a computer program, so the details of the procedure do not concern us. You simply need to become acquainted with a program capable of performing a linear regression with error analysis. The result of the calculation will give a value and an uncertainty for the slope and intercept.
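The two-point slope and intercept (Eqs. 25 and 26) and the graphical bounds from Figure 3 can be sketched as follows; the data values and y uncertainties here are invented for illustration, and the x error is assumed negligible:

```python
# Illustrative two-point data set with uncertainties in y only.
x1, y1, sy1 = 5.0, 30.0, 2.0
x2, y2, sy2 = 10.0, 50.0, 2.0

m = (y2 - y1) / (x2 - x1)  # Eq. 25: slope
b = y1 - m * x1            # Eq. 26: intercept

# Figure 3 bounds: upper bound of y1 to lower bound of y2 gives the
# lower limit on the slope; the opposite choice gives the upper limit.
m_lo = ((y2 - sy2) - (y1 + sy1)) / (x2 - x1)
m_hi = ((y2 + sy2) - (y1 - sy1)) / (x2 - x1)
print(f"m = {m:.1f} (between {m_lo:.1f} and {m_hi:.1f}), b = {b:.1f}")
```

With many data points you would instead hand the data to a linear-regression routine that reports standard errors on the fitted slope and intercept.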


Figure 3: How to determine the uncertainty in the slope and intercept of two data points. The bars indicate the uncertainty in the y variable, and the dashed lines give upper and lower bounds for the line.
Further reading

1. P. Bevington and D. K. Robinson, Data Reduction and Error Analysis for the Physical Sciences; McGraw-Hill, 2002.
2. E. B. Wilson Jr., An Introduction to Scientific Research; Dover, 1990.
3. J. R. Taylor, An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements; University Science Books, 1997.
4. D. P. Shoemaker, C. W. Garland, and J. W. Nibler, Experiments in Physical Chemistry; McGraw-Hill, 1996.
