
Chapter 3

EVALUATION OF ANALYTICAL DATA
“It is impossible to perform a chemical analysis that is totally free of errors, or uncertainties. All we can hope to do is to minimize these errors and to estimate their size with acceptable accuracy.”
Measures of Central Tendency

Chemists usually carry three to five replicates (portions) of a sample through an analytical procedure. Individual results from a set of measurements are seldom the same, so a central or “best” value is used for the set.

1. Mean, x̄
=> arithmetic mean or average
=> the quantity obtained by dividing the sum of replicate measurements (xi) by the number of measurements (N) in the set.

Mathematically speaking:

x̄ = (x1 + x2 + x3 + x4 + … + xN) / N = Σxi / N
2. Median, M
=> middle value of a sample of results arranged in order of increasing or decreasing magnitude
=> odd number of results → take the middle value
=> even number of results → take the mean of the two middle values

Example: Calculate the mean and the median for the following data:

20.3, 19.4, 19.8, 20.1, 19.6, 19.5
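A minimal Python sketch (not part of the original slides) that works this example with the standard-library statistics module:

```python
import statistics

data = [20.3, 19.4, 19.8, 20.1, 19.6, 19.5]

mean = statistics.mean(data)        # sum of the xi divided by N
median = statistics.median(data)    # even N, so the mean of the two middle values

print(f"mean   = {mean:.2f}")       # 19.78
print(f"median = {median:.2f}")     # 19.70
```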


Ideally, the mean and the median are identical. Frequently they are not, particularly when the number of measurements in the set is small.

The median is used advantageously when a set of data contains an outlier. An outlier can have a significant effect on the mean but a much smaller effect on the median.

Outlier – a result that differs significantly from the others in the set
3. Mode
=> the value that occurs most frequently in a set
of determinations.
Precision
➢ describes the reproducibility of the measurements
➢ tells how close the results are, provided that they are
obtained in exactly the same way.
➢ deals with repeatability (within-runs) and
reproducibility (between-runs).
➢ three terms are widely used to describe the precision
of a set of replicate data:
standard deviation
variance
coefficient of variation
All these terms are functions of the deviation from the mean, which is defined as:

deviation from the mean = di = | xi − x̄ |

where
xi = an individual experimental value
x̄ = the mean of the set
Accuracy
→ indicates the closeness of a measurement to the true or accepted value and is expressed by the error (the proximity to the true value).
- is expressed in terms of:
a. absolute error, E

E = xi − xt or E = x̄ − xt

where xi (or x̄) is the measured value and xt is the true or accepted value of the quantity

• the sign of the absolute error tells whether the value is high or low; if the measurement is low, the sign is (−); if the measurement is high, the sign is (+)

b. relative error, Er

Er = [(xi − xt) / xt] × 100 % (in terms of %)

Er = [(xi − xt) / xt] × 1000 ppt (in terms of ppt)


Precision vs. Accuracy

[Figure: four panels illustrating the combinations: good precision & good accuracy; good accuracy but poor precision; poor accuracy but good precision; poor precision & poor accuracy.]

Note: Accuracy measures the agreement between a result and its true value. Precision describes the agreement among several results that have been measured in the same way.
Example

Given: 20.3, 19.4, 19.8, 20.1, 19.6, 19.5

Determine the relative error (in % and in ppt) and the absolute error for the mean, given that the true value is 20.0.
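A short Python sketch (an illustration, not from the slides) of the absolute and relative error of the mean for this data set:

```python
import statistics

data = [20.3, 19.4, 19.8, 20.1, 19.6, 19.5]
x_true = 20.0

x_bar = statistics.mean(data)

E = x_bar - x_true               # absolute error; the (-) sign means the result is low
Er_pct = E / x_true * 100        # relative error in percent
Er_ppt = E / x_true * 1000       # relative error in parts per thousand

print(f"E        = {E:+.2f}")          # about -0.22
print(f"Er (%)   = {Er_pct:+.1f} %")   # about -1.1 %
print(f"Er (ppt) = {Er_ppt:+.0f} ppt") # about -11 ppt
```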
Experimental error – uncertainty in every
measurement
Types of Errors:
1. Random / Indeterminate Error – arises from the
effects of uncontrolled variables in the
measurement
→ causes data to be scattered more or less symmetrically around a mean value.
→ reflected by the precision.
2. Systematic / Determinate Error – arises from a
flaw in equipment or the design of an
experiment
→ causes the mean of a set of data to differ from
the accepted value.
→ causes the results in a series of replicate measurements to be either all high or all low.
→ error is reproducible
3. Gross Error
→ occur only occasionally, are often large and may
cause a result to be either high or low.
→ leads to outliers, results that obviously differ
significantly from the rest of the data of replicate
measurements.
→ often the product of human error
Sources of Systematic Errors
a. instrumental error
→ caused by nonideal instrument behavior,
imperfection of measuring devices and instabilities
in their power supplies.

• glassware used at temperatures that differ from their calibration temperature
• distortion of container walls
• errors in the original calibration
• contaminants on the inner surface of the containers
b. methodic error
→ arises from non-ideal chemical or physical
behavior of analytical systems.

• due to the slowness of some reactions
• incompleteness of a reaction
• instability of some species
• nonspecificity of the reagents
• possible occurrence of side reactions
c. personal error
→ results from the carelessness, inattention or
personal limitations of the experimenter

• estimating the level of the liquid between two scale divisions
• judging the color of the solution at the end point in a titration
Persons who make measurements must
guard against personal bias to preserve the
integrity of the collected data. Of the three
types of systematic errors encountered in a
chemical analysis, methodic errors are
usually the most difficult to identify and
correct.
Classification of Systematic Errors
a. constant error
→ independent of the magnitude of the measured
quantity.
→ becomes less significant as the magnitude

increases.
Example: constant end-point error of 0.10 mL
sample 1: 10.0 mL titrant:
relative E = (0.10 mL / 10.0 mL) × 100 % = 1.0 %
sample 2: 50.0 mL titrant:
relative E = (0.10 mL / 50.0 mL) × 100 % = 0.20 %
b. proportional error
→ increases or decreases in proportion to the size
of the sample taken for analysis.
→ common cause of this error is the presence of
interfering contaminants in the sample.

Illustration: Determination of copper based upon the


reaction of copper(II) ion with potassium iodide to give
iodine.
• First, the quantity of iodine is measured and is
proportional to the amount of copper(II) in the
sample.
• If iron(III) is present, it will also liberate iodine from
potassium iodide.
• Unless steps are taken to eliminate the iron(III) interference, high results are observed for the percentage of copper, because the iodine produced is a measure of both the copper(II) and the iron(III) in the sample.
Detection of Systematic Error
a. systematic instrument error
• usually found and corrected by calibration.
• periodic calibration of the equipment is
always desirable because the response of
most instruments changes with time as a
result of wear, corrosion and mistreatment.

b. systematic personal error
• can be minimized by care and self-discipline.
• a good habit is to check instrument readings, notebook entries, and calculations systematically.
c. systematic methodic error
• an analytical method has its own biases, which are often difficult to detect.

Steps to recognize and adjust for a systematic error in an analytical method:
• analysis of standard samples
→ the analysis of standard reference materials, SRMs (materials that contain one or more analytes at exactly known concentration levels).
→ standard materials can be purchased or sometimes prepared by synthesis, but synthesis often is impossible, or so difficult and time consuming, that this approach is not practical.
▪ SRM can be purchased from a number
of governmental and industrial sources
(e.g. National Institute of Standards and
Technology, NIST which offers over 900
SRMs)
▪ Concentration of the SRM has been
determined in one of the three ways:
a) through analysis by previously
validated reference method,
b) through analysis by two or more
independent, reliable measurement
methods,
c) through analysis by a network of
cooperating laboratories, technically
competent and thoroughly
knowledgeable with the material
being tested.
• independent analysis
→ a second independent and reliable analytical method is used in parallel with the method being evaluated.
→ it should differ as much as possible from the method being evaluated.
→ this minimizes the possibility that some common factor in the sample has the same effect on both methods.
• blank determination
→ useful for detecting certain types of constant errors.
→ all steps of the analysis are performed in the absence of the sample.
→ the results from the blank are then applied as a correction to the sample measurements.
→ this reveals errors due to interfering contaminants from the reagents and vessels employed in the analysis.
→ this also allows the analyst to correct titration data for the volume of reagent needed to cause an indicator to change color at the end point.
• variation in sample size
→ can detect constant errors (as the size of the measurement increases, the relative effect of a constant error decreases).
Random Errors
→ arise when a system of measurement is extended to its maximum sensitivity. This type of error is caused by many uncontrollable variables that are an inevitable part of every physical or chemical measurement.
→ the accumulated effect of the individual indeterminate uncertainties causes replicate measurements to fluctuate randomly around the mean of the set.
Statistical Treatment of Random Errors
sample – a finite number of experimental observations; a tiny fraction of the infinite number of possible observations

population or universe – the theoretical infinite number of data

population mean, μ – the true mean of the population; in the absence of any systematic error, this is also the true value for the measured quantity:

μ = Σxi / N, where N → ∞

sample mean, x̄ – the mean of a limited sample drawn from the population of the data:

x̄ = Σxi / N, where N is finite
Measures of Precision
population standard deviation, σ – a measure of the precision of a population of data, given mathematically by:

σ = √[ Σ(xi − μ)² / N ]

sample standard deviation, s
– measures how closely the data are clustered about the mean
– a measure of the precision of a sample of data, given by:

s = √[ Σ(xi − x̄)² / (N − 1) ]

N − 1 = degrees of freedom

• the smaller the standard deviation, the more closely the data are clustered about the mean

Standard deviation of the mean, sm

sm = s / √N
Other ways of expressing precision
• Variance, s² – simply the square of the standard deviation
• % relative standard deviation, RSD, or coefficient of variation, CV:

CV = %RSD = (s / x̄) × 100 %

• Spread or range, w
→ the difference between the largest value and the smallest in the set of data:

w = highest value − lowest value

Example: The following results were obtained in the replicate determination of the lead content of a blood sample: 0.752, 0.756, 0.752, 0.751, and 0.760 ppm Pb. Calculate the mean, standard deviation, standard deviation of the mean, variance, spread, and % relative standard deviation.
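One way to work this example in Python (a sketch using only the standard library, not part of the original slides):

```python
import math
import statistics

pb = [0.752, 0.756, 0.752, 0.751, 0.760]   # ppm Pb

mean = statistics.mean(pb)
s = statistics.stdev(pb)             # sample standard deviation (N - 1 in the denominator)
s_mean = s / math.sqrt(len(pb))      # standard deviation of the mean, s / sqrt(N)
variance = s ** 2
spread = max(pb) - min(pb)           # w = highest value - lowest value
rsd_pct = s / mean * 100             # % relative standard deviation (coefficient of variation)

print(f"mean = {mean:.4f} ppm, s = {s:.4f}, s_mean = {s_mean:.4f}")
print(f"s^2 = {variance:.2e}, w = {spread:.3f} ppm, %RSD = {rsd_pct:.2f} %")
```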
Reliability of s as a Measure of Precision

Most of the statistical tests described here are based upon sample standard deviations, and the probability of correctness of the results of these tests improves as the reliability of s becomes greater. Uncertainty in the calculated value of s decreases as N increases. When N is greater than 20, s and σ can be assumed to be identical for all practical purposes.
Pooling Data to Improve the Reliability of s

→ data from a series of similar samples accumulated over time can often be pooled to provide an estimate of s that is superior to the value for any individual subset.
→ mathematical equation for the superior s, or pooled standard deviation:

spooled = √[ ( Σ(xi − x̄1)² + Σ(xj − x̄2)² + … ) / (N1 + N2 + … − NT) ]

where: N1 = number of data in set 1
N2 = number of data in set 2
NT = number of data sets that are pooled
N1 + N2 + … − NT = degrees of freedom
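A minimal Python sketch of the pooled calculation; the two replicate sets here are hypothetical placeholders, not data from the slides:

```python
import math
import statistics

# Hypothetical replicate sets; substitute your own data.
set1 = [0.752, 0.756, 0.752, 0.751, 0.760]
set2 = [0.748, 0.753, 0.755, 0.750]

sets = [set1, set2]

# (N - 1) * sample variance equals the sum of squared deviations for each set.
sum_sq = sum((len(s_i) - 1) * statistics.variance(s_i) for s_i in sets)
dof = sum(len(s_i) for s_i in sets) - len(sets)     # N1 + N2 + ... - NT

s_pooled = math.sqrt(sum_sq / dof)
print(f"pooled s = {s_pooled:.4f} ({dof} degrees of freedom)")
```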
The Confidence Limit

Statistical theory allows us to set limits around an experimentally determined mean within which the population mean lies with a given degree of probability. These limits are called confidence limits, and the interval they define is known as the confidence interval, CI.

Confidence interval – an estimate of uncertainty, stating that the true mean, μ, is likely to lie within a certain distance from the measured mean, x̄.

CI for μ = x̄ ± ts / √N

Example.
90 % confidence interval is defined such that,
if we repeat an experiment an infinite number of
times, there is a 90% chance that the true value
lies in a given interval.

The value of t depends on the desired confidence level and on the number of degrees of freedom (N − 1) in the calculation of s.
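As a sketch (not from the slides), the 95 % confidence interval for the earlier blood-lead data, with the Student's t value for 4 degrees of freedom entered from a standard table:

```python
import math
import statistics

pb = [0.752, 0.756, 0.752, 0.751, 0.760]   # ppm Pb, same data as the earlier example

x_bar = statistics.mean(pb)
s = statistics.stdev(pb)
N = len(pb)

t_95 = 2.776                     # Student's t, 95 % confidence, N - 1 = 4 degrees of freedom
half_width = t_95 * s / math.sqrt(N)

print(f"95 % CI for mu: {x_bar:.4f} +/- {half_width:.4f} ppm Pb")
```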
Detection of Gross Error
When a set of data contains an outlying result that appears to differ excessively from the average, a decision must be made whether to retain or reject it. It is an unfortunate fact that no universal rule can be invoked to settle the question of retention or rejection.

Rejecting Data
When one value in a set of results is much
larger or smaller than the others, decide whether to
retain or reject the questionable value.
The Q-Test
→ a simple, widely used statistical test.
→ Qexp is the absolute value of the difference between the questionable result xq and its nearest neighbor xn (with the results arranged in increasing or decreasing order), divided by the range or spread of the entire set:

Qexp = | questionable value − nearest value | / range
Table. Values of Qt for Rejecting Data

Number of        Confidence Level
Observations     90 %      95 %      99 %
3                0.941     0.970     0.994
4                0.765     0.829     0.926
5                0.642     0.710     0.821
6                0.560     0.625     0.740
7                0.507     0.568     0.680
8                0.468     0.526     0.634
9                0.437     0.493     0.598
10               0.412     0.466     0.568

Xq is rejected if : Qexp ≥ Qt
Xq is accepted if: Qexp < Qt
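A small Python sketch of the Q-test applied to the blood-lead data above, treating the largest result (0.760 ppm) as the questionable value:

```python
data = sorted([0.752, 0.756, 0.752, 0.751, 0.760])

x_q = data[-1]                  # questionable result (largest value)
x_n = data[-2]                  # its nearest neighbour
w = data[-1] - data[0]          # spread (range) of the set

q_exp = abs(x_q - x_n) / w
q_table_90 = 0.642              # Qt for N = 5 at the 90 % confidence level (table above)

print(f"Qexp = {q_exp:.3f}")    # about 0.444
print("reject x_q" if q_exp >= q_table_90 else "retain x_q")   # retained here
```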
Recommendation for Treatment of Outliers:
1. Reexamine carefully all data relating to the
outlying result to see if a gross error could have
affected its value.
2. If possible, estimate the precision that can be
reasonably expected from the procedure to be
sure that the outlying result actually is
questionable.
3. If more data cannot be secured, apply the Q-test to the existing set to decide whether the doubtful result should be retained or rejected on statistical grounds.
4. If the Q-test indicates retention, consider
reporting the median of the set rather than the
mean. The median has the great virtue of allowing
inclusion of all data in a set without undue
influence from an outlying value. In addition, the
median of a normally distributed set containing 3
measurements provides a better estimate of the
correct value than the mean of the set after an
outlying value has been discarded.
Application of Statistics to Data Treatment and Evaluation
Experimentalists use statistical calculations to sharpen their judgments concerning the quality of experimental measurements. The most common applications of statistics to analytical chemistry include:

a. establishing confidence limits for the mean of a set of replicate data.
b. determining the number of replications required
to decrease the confidence limit for a mean for a
given level of confidence.
c. determining at a given probability whether an experimental mean is different from the accepted value for the quantity being measured (t test)
- comparing an experimental mean with a true or accepted value

tcalculated = ( | x̄ − true value | / s ) × √N

If tcalculated > ttable (95 %), the two results are considered to be different, i.e., a significant difference exists.
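A minimal Python sketch of this comparison for the earlier data set (20.3, 19.4, …) against the accepted value 20.0, with the critical t entered from a standard table:

```python
import math
import statistics

data = [20.3, 19.4, 19.8, 20.1, 19.6, 19.5]
true_value = 20.0

x_bar = statistics.mean(data)
s = statistics.stdev(data)
N = len(data)

t_calc = abs(x_bar - true_value) / s * math.sqrt(N)
t_table_95 = 2.571              # Student's t, 95 %, N - 1 = 5 degrees of freedom

print(f"t_calc = {t_calc:.2f}")     # about 1.50
print("significant difference" if t_calc > t_table_95 else "no significant difference")
```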
d. determining at a given probability level whether two experimental means are different (t test for comparing two means)
- comparing two experimental means

tcalculated = ( | x̄1 − x̄2 | / spooled ) × √[ N1N2 / (N1 + N2) ]

If tcalculated > ttable (95 %), the two results are considered to be different, i.e., a significant difference exists.
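A sketch of the two-mean comparison in Python; the second data set here is hypothetical, added only to make the example runnable:

```python
import math
import statistics

set1 = [20.3, 19.4, 19.8, 20.1, 19.6, 19.5]
set2 = [19.9, 20.2, 20.0, 19.8]          # hypothetical second set of replicates

n1, n2 = len(set1), len(set2)
x1, x2 = statistics.mean(set1), statistics.mean(set2)

# Pooled standard deviation of the two sets
ss = (n1 - 1) * statistics.variance(set1) + (n2 - 1) * statistics.variance(set2)
s_pooled = math.sqrt(ss / (n1 + n2 - 2))

t_calc = abs(x1 - x2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
t_table_95 = 2.306                       # Student's t, 95 %, n1 + n2 - 2 = 8 degrees of freedom

print(f"t_calc = {t_calc:.2f}")
print("means differ significantly" if t_calc > t_table_95 else "no significant difference")
```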
e. determining at a given probability level whether the precision of two sets of measurements differs (F test)

Fc = s1² / s2² = V1 / V2, where the larger variance is always the numerator

If Fc > Ft, a significant difference exists in their precision.
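A short Python sketch of the F test, reusing the same two hypothetical sets as in the t-test sketch above; the critical Ft must still be looked up for the corresponding degrees of freedom:

```python
import statistics

set1 = [20.3, 19.4, 19.8, 20.1, 19.6, 19.5]
set2 = [19.9, 20.2, 20.0, 19.8]           # hypothetical second set of replicates

v1, v2 = statistics.variance(set1), statistics.variance(set2)
f_calc = max(v1, v2) / min(v1, v2)        # larger variance always goes in the numerator

print(f"F_calc = {f_calc:.2f}")
# Compare F_calc with the tabulated Ft for the corresponding degrees of freedom;
# if F_calc > Ft, the precisions of the two sets differ significantly.
```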
f. deciding whether an outlier is probably the
result of a gross error and should be discarded
in calculating a mean.
g. defining and estimating detection limits
h. treating calibration data.
i. in quality control of analytical data and of
industrial products.
Method of Least Squares
(A tool for calibration plots)

Regression analyses – techniques used for obtaining the “best” line from a set of data points and for specifying the uncertainties of that line
Assumptions:
1. There is actually a linear relationship between
the measured variable, y, and the analyte
concentration, x.
2. One of the plotted variables is independent of
the other one and is known with a high degree of
accuracy.
Recall: Equation of the Line
y = mx + b
where: b = y-intercept (the value of y when x is zero)
b = ȳ − m x̄
m = slope of the line
Solve Example 11

x̄ = (1.00 + 2.50 + 5.00 + 7.50 + 10.00) / 5 = 5.20
ȳ = (0.116 + 0.281 + 0.567 + 0.880 + 1.074) / 5 = 0.584

xi        yi        |xi − x̄|   (xi − x̄)²   |yi − ȳ|   (yi − ȳ)²   |xi − x̄||yi − ȳ|
1.00      0.116     4.20        17.64        0.468       0.219       1.966
2.50      0.281     2.70         7.29        0.303       0.092       0.818
5.00      0.567     0.20         0.04        0.017       0.000       0.003
7.50      0.880     2.30         5.29        0.296       0.088       0.681
10.00     1.074     4.80        23.04        0.490       0.240       2.352
                    ∑ = 14.20   ∑ = 53.30   ∑ = 1.574   ∑ = 0.639   ∑ = 5.820

slope, m = ∑[(xi − x̄)(yi − ȳ)] / ∑(xi − x̄)² = 5.820 / 53.30 = 0.1092

intercept, b = ȳ − m x̄ = 0.584 − (0.1092)(5.20) = 0.016

equation of the line:
y = 0.1092x + 0.016
What is the concentration in ppm if the signal
given by the unknown is 0.405?
slope, m = ∑[(xi − x̄)(yi − ȳ)] / ∑(xi − x̄)² = 15.071 / 513.98 = 0.02932

intercept, b = ȳ − m x̄ = 0.316 − (0.02932)(10.75) = 0.316 − 0.3152 = 0.001

y = 0.02932x + 0.001
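A minimal Python sketch (not part of the slides) of the least-squares fit for the Example 11 data; the last two lines back-calculate the unknown concentration from the 0.405 signal, assuming that signal is to be read against the first calibration (y = 0.1092x + 0.016):

```python
conc = [1.00, 2.50, 5.00, 7.50, 10.00]           # x: concentration, ppm
signal = [0.116, 0.281, 0.567, 0.880, 1.074]     # y: measured signal

x_bar = sum(conc) / len(conc)
y_bar = sum(signal) / len(signal)

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(conc, signal))
s_xx = sum((x - x_bar) ** 2 for x in conc)

m = s_xy / s_xx            # slope, about 0.1092
b = y_bar - m * x_bar      # intercept, about 0.016

print(f"y = {m:.4f} x + {b:.3f}")

y_unknown = 0.405
print(f"x_unknown = {(y_unknown - b) / m:.2f} ppm")   # x = (y - b) / m, about 3.6 ppm
```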
