Understanding the Structure of Scientific Data
Shaun Burke, RHM Technology Ltd, High Wycombe, Buckinghamshire, UK.
This is the first in a series of articles that aims to promote the better use of statistics by scientists. The series intends to show everyone from bench chemists to laboratory managers that the application of many statistical methods does not require the services of a 'statistician' or a 'mathematician' to convert chemical data into useful information. Each article will be a concise introduction to a small subset of methods. Wherever possible, diagrams will be used and equations kept to a minimum; for those wanting more theory, references to relevant statistical books and standards will be included. By the end of the series, the scientist should have an understanding of the most common statistical methods and be able to perform the tests while avoiding the pitfalls that are inherent in their misapplication.
In this article we look at the initial steps in data analysis (i.e., exploratory data analysis) and how to calculate the basic summary statistics (the mean and sample standard deviation). These two processes, which increase our understanding of the data structure, are vital if the correct selection of more advanced statistical methods, and the interpretation of their results, are to be achieved. From that base we will progress to significance testing (t-tests and the F-test). These statistics allow a comparison between two sets of results in an objective and unbiased way. For example, significance tests are useful when comparing a new analytical method with an old method or when comparing the current day's production with that of the previous day.

Exploratory Data Analysis
Exploratory data analysis is a term used to describe a group of techniques (largely graphical in nature) that shed light on the structure of the data. Without this knowledge the scientist, or anyone else, cannot be sure they are using the correct form of statistical evaluation.

The statistics and graphs referred to in this first section are applicable to a single column of data (i.e., univariate data), such as the number of analyses performed in a laboratory each month. For small amounts of data (<15 points), a blob plot (also known as a dot plot) can be used to explore how the data set is distributed (Figure 1). Blob plots are constructed simply by drawing a line, marking it off with a suitable scale and plotting the data along the axis. A stem-and-leaf plot is yet another method for examining patterns in the data set. These are somewhat complex to describe and perceived as old fashioned, especially with the modern graphical packages available today, but for the sake of completeness they are described in Box 1.

For larger data sets, frequency histograms (Figure 2(a)) and Box and Whisker plots (Figure 2(b)) may be better options for displaying the data distribution. Once the data set is entered or, as is more usual with modern instrumentation, electronically imported, most modern PC statistical packages can construct these graph types with a few clicks of the mouse. All of these plots can give an indication of the presence or absence of outliers (1). The frequency histogram, stem-and-leaf plot and blob plot can also indicate the type of distribution the data belong to. It should be remembered that if the data set comes from a non-normal (2) distribution (Figure 2(a) and possibly Figure 1(a)), what looks like an outlier may in fact be a good piece of information. The outliers are the most extreme points on the right-hand side of Figures 1(a) and 2(a). Note: outliers, outlier tests and robust methods will be the subject of a later article.

[figure 1 Blob plots of the raw data: panels (a) and (b) show the individual values marked along a scale, with the mean indicated.]

Assuming there are no obvious outliers, we still have to do one more plot to make sure we understand the data structure. The individual results should be plotted against a time index (i.e., the order the data were
obtained). If any systematic trends are observed (Figures 3(a)-3(c)) then the reasons for this must be investigated. Normal statistical methods assume a random distribution about the mean with time (Figure 3(d)); if this is not the case, the interpretation of the statistics can be erroneous.

Summary Statistics
Summary statistics are used to make sense of large amounts of data. Typically, the mean, sample standard deviation, range, confidence intervals, quantiles (1), and measures of skewness and spread/peakedness of the distribution (kurtosis) are reported (2). The mean and sample standard deviation are the most widely used and are discussed below, together with how they relate to the confidence intervals for normally distributed data.

The Mean
The average or arithmetic mean (3) is generally the first statistic everyone is taught to calculate. It is easily found using a calculator or spreadsheet and simply involves summing the individual results (x1, x2, x3, ..., xn) and dividing by the number of results (n):

x̄ = (Σ xi) / n,   where Σ xi = x1 + x2 + x3 + ... + xn

Unfortunately, the mean is often reported as an estimate of the 'true value' (µ) of whatever is being measured without considering the underlying distribution. This is a mistake. Before any statistic is calculated it is important that the raw data are carefully scrutinized and plotted as described above. An outlying point can have a big effect on the mean (compare Figure 1(a) with 1(b)).

The Standard Deviation (3)
The standard deviation is a measure of the spread of the data (dispersion) about the mean and can again be calculated using a calculator or spreadsheet. There is, however, a slight added complication: if you look at a typical scientific calculator you will notice there are two types of standard deviation, one for a whole population (σn, with divisor n) and one for a sample drawn from that population (sn-1, with divisor n - 1):

σn = √( Σ (xi - µ)² / n )

sn-1 = √( Σ (xi - x̄)² / (n - 1) )

If, for example, ten results are measurements of contamination for several batches of a chemical, the ten results then represent a sample from the whole population and the correct standard deviation to use is that for a sample (n - 1). If you are using a statistical package you should always check that the correct standard deviation is being calculated for your particular problem.
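As an aside for readers who prefer to script these calculations, the short Python sketch below (using NumPy, which is assumed to be available) computes the mean and both forms of the standard deviation for a small set of invented results; only the ddof argument distinguishes the population and sample estimates.

import numpy as np

# Ten invented replicate results (arbitrary units), for illustration only
x = np.array([22.3, 23.1, 22.8, 23.5, 22.9, 23.0, 22.7, 23.2, 22.6, 23.4])

mean = x.mean()                  # arithmetic mean, x-bar
sd_population = x.std(ddof=0)    # sigma(n): divisor n (whole population)
sd_sample = x.std(ddof=1)        # s(n-1): divisor n - 1 (sample estimate)

print(f"mean = {mean:.3f}")
print(f"population standard deviation (n)  = {sd_population:.3f}")
print(f"sample standard deviation (n - 1)  = {sd_sample:.3f}")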
[figure 2 Frequency histogram and Box and Whisker plot: (a) frequency histogram (number of data points in each bar); (b) Box and Whisker plot showing the median, the lower and upper quartile values, whiskers extending 1.5 times the interquartile range* and an outlier. *The interquartile range is the range which contains the middle 50% of the data when it is sorted into ascending order.]

Box 1: Stem-and-leaf plot
A stem-and-leaf plot is another method of examining patterns in the data set. It shows the range, where the values are concentrated, and the symmetry of the data. This type of plot is constructed by splitting the data into the stem (the leading digits; in the plot below, these run from 0.1 to 0.6) and the leaf (the trailing digit). Thus, 0.216 is represented as 2|1 and 0.350 by 3|5. Note, the decimal places are truncated and not rounded in this type of plot. Reading the plot below, we can see that the data values range from 0.12 to 0.63. The column on the left contains the depth information (i.e., how many leaves lie on the lines closest to that end of the range). Thus, there are 13 points which lie between 0.40 and 0.63. The line containing the middle value is indicated differently with a count (the number of items in the line) enclosed in parentheses.

Stem-and-leaf plot (Units = 0.1, so 1|2 = 0.12; Count = 42)
  5    1|22677
 14    2|112224578
(15)   3|000011122333355
 13    4|0047889
  6    5|56669
  1    6|3

[figure 3 Results plotted against a time index (magnitude versus time): (a) n = 7, mean = 6, standard deviation = 2.16; (b) n = 9, mean = 6, standard deviation = 2.65; (c) n = 9, mean = 6, standard deviation = 2.06; (d) n = 9, mean = 6, standard deviation = 1.80.]
[figure 5 Blob plots illustrating significance: (a) sets (i) and (ii) are probably not different and would 'pass' the t-test (tcrit > tcalculated); (b) probably different and would 'fail' the t-test (tcrit < tcalculated); (c) could be different, but there are not enough data to say for sure (i.e., would 'pass' the t-test, tcrit > tcalculated); (d) practically identical means (µ1 and µ2), but with so many data points there is a small but statistically significant ('real') difference, and so the sets would 'fail' the t-test (tcrit < tcalculated).]
Significance Testing
Suppose, for example, we have the following two sets of results for lead content in water: 17.3, 17.3, 17.4, 17.4 and 18.5, 18.6, 18.5, 18.6. It is fairly clear, simply by looking at the data, that the two sets are different. In reaching this conclusion you have probably considered the amount of data, the average for each set and the spread in the results. In many situations, however, the difference between two sets of data is not so clear. The application of significance tests gives us a more systematic way of assessing the results, with the added advantage of allowing us to express our conclusion with a stated degree of confidence.

What does significance mean?
In statistics the words 'significant' and 'significance' have specific meanings. A significant difference means a difference that is unlikely to have occurred by chance. A significance test shows up differences unlikely to occur because of purely random variation.

As previously mentioned, deciding whether one set of results is significantly different from another depends not only on the magnitude of the difference in the means but also on the amount of data available and its spread. For example, consider the blob plots shown in Figure 5. For the two data sets shown in Figure 5(a), the means for set (i) and set (ii) are numerically different; from the limited amount of information available, however, they are from a statistical point of view the same. For Figure 5(b), the means for set (i) and set (ii) are probably different, but when fewer data points are available, Figure 5(c), we cannot be sure with any degree of confidence that the means are different even if they are a long way apart. With a large number of data points, even a very small difference can be significant (Figure 5(d)). Similarly, when we are interested in comparing the spread of results, for example when we want to know if method (i) gives more consistent results than method (ii), we have to take note of the amount of information available (Figures 5(e)-(g)).

It is fortunate that tables are published that show how large a difference needs to be before it can be considered not to have occurred by chance. These are critical t-values for differences between means, and critical F-values for differences between the spread of results (4).

Note: Significance is a function of sample size. Comparing very large samples will nearly always lead to a significant difference, but a statistically significant result is not necessarily an important result. For example, in Figure 5(d) there is a statistically significant difference, but does it really matter in practice?

What is a t-test?
A t-test is a statistical procedure that can be used to compare mean values. A lot of jargon surrounds these tests (see Table 1 for definitions of the terms used below) but they are relatively simple to apply using the built-in functions of a spreadsheet like Excel or a statistical software package. Using a calculator is also an option, but you have to know the correct formula to apply (see Table 2) and have access to statistical tables to look up the so-called critical values (4). Three worked examples are shown in Box 2 (5) to illustrate how the different t-tests are carried out and how to interpret the results.

What is an F-test?
An F-test compares the spread of results in two data sets to determine if they could reasonably be considered to come from the same parent distribution. The test can, therefore, be used to answer questions such as: are two methods equally precise? The measure of spread used in the F-test is the variance, which is simply the square of the standard deviation. The variances are ratioed (i.e., the variance of one set of data is divided by the variance of the other) to get the test value:

F = s1² / s2²

This F value is then compared with a critical value that tells us how big the ratio needs to be before we can rule out the difference in spread occurring by chance. The Fcrit value is found from tables using (n1 - 1) and (n2 - 1) degrees of freedom, at the appropriate level of confidence. [Note: it is usual to arrange s1 and s2 so that F > 1.] If the standard deviations are to be considered to come from the same population then Fcrit > F. As an example we use the data in Example 2 (see Box 2):

F = 2.750² / 1.471² = 3.49

Fcrit = 9.605 for (5 - 1) and (5 - 1) degrees of freedom at the 97.5% confidence level. As Fcrit > Fcalculated we can conclude that the spreads of results in the two data sets are not significantly different and it is, therefore, reasonable to combine the two standard deviations as we have done in Example 2.

Alternate hypothesis (H1): A statement describing the alternative to the null hypothesis (i.e., there is a difference between the means [see two-tailed] or mean1 > mean2 [see one-tailed]).
Critical value (tcrit or Fcrit): The value obtained from statistical tables or statistical packages at a given confidence level, against which the result of applying a significance test is compared.
Null hypothesis (H0): A statement describing what is being tested (i.e., there is no difference between the two means [mean1 = mean2]).
One-tailed: A one-tailed test is performed if the analyst is only interested in the answer when the result is different in one direction, for example, (1) the new production method results in a higher yield, or (2) the amount of waste product is reduced (i.e., a limit value ≤, >, <, or ≥ is used in the alternate hypothesis). In these cases the calculated t-value is the same as that for the two-tailed t-test but the critical value is different.
Population: A large group of items or measurements under investigation (e.g., 2500 lots from a single batch of a certified reference material).
Sample: A group of items or measurements taken from the population (e.g., 25 lots of a certified reference material taken from a batch containing 2500 lots).
Two-tailed: A two-tailed t-test is performed if the analyst is interested in any change. For example, is method A different from method B (i.e., ≠ is used in the alternate hypothesis)? Under most circumstances two-tailed t-tests should be performed.

table 1 Definitions of statistical terms used in significance testing.
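For readers who would rather script the calculation than use printed tables, the sketch below (Python with SciPy, assumed available) reproduces the F-test on the selenium data of Example 2; the 97.5% critical value is obtained from the F-distribution rather than looked up.

import numpy as np
from scipy import stats

method1 = np.array([4.2, 4.5, 6.8, 7.2, 4.3])   # s = 1.471 (Table 3)
method2 = np.array([9.2, 4.0, 1.9, 5.2, 3.5])   # s = 2.750 (Table 3)

s1 = np.std(method1, ddof=1)
s2 = np.std(method2, ddof=1)
F = max(s1, s2) ** 2 / min(s1, s2) ** 2          # arrange the ratio so that F > 1

df1 = df2 = len(method1) - 1
F_crit = stats.f.ppf(0.975, df1, df2)            # 9.605 for (4, 4) degrees of freedom

print(f"F = {F:.2f}, Fcrit = {F_crit:.3f}")
print("spreads differ" if F > F_crit else "no significant difference in spread")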
table 2 Formulas for calculating the t-value.
Comparison of a sample mean with a stated (population) value:   t = |x̄ - µ| √n / s
Paired samples:   t = |d̄| √n / sd   (for a one-tailed test the sign is important: t = d̄ √n / sd)
Difference between independent sample means with equal variances:   t = (x̄1 - x̄2) / ( sc √(1/n1 + 1/n2) )
Difference between independent sample means with unequal variances†:   t = (x̄1 - x̄2) / √( s1²/n1 + s2²/n2 )

where x̄ is the sample mean, µ is the population mean, s is the standard deviation for the sample, n is the number of items in the sample, |d̄| is the absolute mean difference between pairs, d̄ is the mean difference between pairs, sd is the sample standard deviation for the pairs, x̄1 and x̄2 are two independent sample means, n1 and n2 are the number of items making up each sample, and sc is the combined (pooled) standard deviation found using

sc = √( [ s1²(n1 - 1) + s2²(n2 - 1) ] / (n1 + n2 - 2) )

where s1 and s2 are the sample standard deviations.
†Note: The degrees of freedom (ν) used for looking up the critical t-value for independent sample means with unequal variances are calculated from s1, s2, n1 and n2 (Welch's approximation) rather than from n1 + n2 - 2.

Conclusions
• Always plot your data and understand the patterns in it before calculating any statistic, even the arithmetic mean.
• Make sure the correct standard deviation is calculated for your particular circumstance. This will nearly always be the sample standard deviation (n - 1).
• Significance tests are used to compare, in an unbiased way, the means or spread (variance) of two data sets.
• The tests are easily performed using statistical routines in spreadsheets and statistical packages.
• The p-value is a measure of confidence in the result obtained when applying a significance test.

Acknowledgement
The preparation of this paper was supported under a contract with the UK Department of Trade and Industry as part of the National Measurement System Valid Analytical Measurement Programme (VAM) (6).
Box 2: Worked examples

Example 1
A chemist is asked to validate a new economic method of derivatization before analysing a solution by a standard gas chromatography method. The long-term mean for the check samples using the old method is 22.7 µg/L. For the new method the mean is 23.5 µg/L, based on 10 results with a standard deviation of 0.9 µg/L. Is the new method equivalent to the old? To answer this question we use the t-test to compare the two mean values. We start by stating exactly what we are trying to decide, in the form of two alternative hypotheses: (i) the means could really be the same, or (ii) the means could really be different. In statistical terminology this is written as:
• The null hypothesis (H0): new method mean = long-term check sample mean.
• The alternative hypothesis (H1): new method mean ≠ long-term check sample mean.
To test the null hypothesis we calculate the t-value as below. Note, the calculated t-value is the ratio of the difference between the means to a measure of the spread (standard deviation) and the amount of data available (n):

t = (23.5 - 22.7) / (0.9/√10) = 2.81

In the final step of the significance test we compare the calculated t-value with the critical t-value obtained from tables (4). To look up the critical value we need to know three pieces of information:
(i) Are we interested in the direction of the difference between the two means or only that there is a difference; that is, are we performing a one-sided or two-sided t-test (see Table 1)? In the case above it is the latter, therefore the two-sided critical value is used.
(ii) The degrees of freedom: this is simply the number of data points minus one (n - 1).
(iii) How certain do we want to be about our conclusions? It is normal practice in chemistry to select the 95% confidence level (i.e., about 1 in 20 times we perform the t-test we could arrive at an erroneous conclusion). However, in some situations this is an unacceptable level of error, such as in medical research. In these cases, the 99% or even the 99.9% confidence level can be chosen.

tcrit = 2.26 at the 95% confidence level for 9 degrees of freedom. As tcalculated > tcrit we can reject the null hypothesis and conclude that we are 95% certain that there is a significant difference between the new and old methods. [Note: This does not mean the new derivatization method should be abandoned. A judgement needs to be made on the economics and on whether the results are 'fit for purpose'. The significance test is only one piece of information to be considered.]

Example 2 (5)
Two methods for determining the concentration of selenium are to be compared. The results from each method are shown in Table 3.

table 3 Results from two methods used to determine concentrations of selenium.
            Results                        x̄       s
Method 1    4.2   4.5   6.8   7.2   4.3    5.40    1.471
Method 2    9.2   4.0   1.9   5.2   3.5    4.76    2.750

Using the t-test for independent sample means we define the null hypothesis H0 as x̄1 = x̄2. This means there is no difference between the means of the two methods (the alternative hypothesis is H1: x̄1 ≠ x̄2). If the two methods have sample standard deviations that are not significantly different then we can combine (or pool) the standard deviations (sc) (see What is an F-test?):

sc = √( [1.471²(5 - 1) + 2.750²(5 - 1)] / (5 + 5 - 2) ) = 2.205

If the standard deviations are significantly different then the t-test for unequal variances should be used (Table 2). Evaluating the test statistic:

t = (5.40 - 4.76) / ( 2.205 √(1/5 + 1/5) ) = 0.64/1.395 = 0.459

The 95% critical value is 2.306 for 8 (n1 + n2 - 2) degrees of freedom. This exceeds the calculated value of 0.459, thus the null hypothesis (H0) cannot be rejected and we conclude there is no significant difference between the means of the results given by the two methods.

Example 3 (5)
Two methods are available for determining the concentration of vitamins in foodstuffs. To compare the methods, several different sample matrices are prepared using the same technique. Each sample preparation is then divided into two aliquots and readings are obtained using the two methods, ideally commencing at the same time to lessen the possible effects of sample deterioration. The results are shown in Table 4.

table 4 Comparison of two methods used to determine the concentration of vitamins in foodstuffs.
Matrix           1       2       3       4       5       6       7       8
A (mg/g)        2.52    3.13    4.33    2.25    2.79    3.04    2.19    2.16
B (mg/g)        3.17    5.00    4.03    2.38    3.68    2.94    2.83    2.18
Difference (d)  -0.65   -1.87    0.30   -0.13   -0.89    0.10   -0.64   -0.02

The null hypothesis is H0: d̄ = 0 against the alternative H1: d̄ ≠ 0. The test is a two-tailed test as we are interested in both d̄ < 0 and d̄ > 0. The mean difference is d̄ = -0.475 (|d̄| = 0.475) and the sample standard deviation of the paired differences is sd = 0.700:

t = 0.475 √8 / 0.700 = 1.918

The tabulated value of tcrit (with n - 1 = 7 degrees of freedom, at the 95% confidence limit) is 2.365. Since the calculated value is less than the critical value, H0 cannot be rejected and it follows that there is no difference between the two techniques.
Analysis of Variance
Shaun Burke, RHM Technology Ltd, High Wycombe, Buckinghamshire, UK.
With the advent of built-in spreadsheet functions and affordable dedicated statistical software packages, analysis of variance (ANOVA) has become relatively simple to carry out. This article will therefore concentrate on how to select the correct variant of the ANOVA method, the advantages of ANOVA, how to interpret the results and how to avoid some of the pitfalls. For those wanting more detailed theory than is given in the following section, several texts are available (2-5).

A bit of ANOVA theory
Whenever we make repeated measurements there is always some variation. Sometimes this variation (known as within-group variation) makes it difficult for analysts to see if there have been significant changes between different groups of replicates. For example, in Figure 1 (which shows the results from four replicate analyses by 12 analysts), we can see that the total variation is a combination of the spread of results within groups and the spread between the mean values (between-group variation). The statistic that measures the within- and between-group variations in ANOVA is called the sum of squares, and it often appears in the output tables abbreviated as SS. It can be shown that the different sums of squares calculated in ANOVA are equivalent to variances (1). The central tenet of ANOVA is that the total SS in an experiment can be divided into the components caused by random error, given by the within-group (or sample) SS, and the components resulting from differences between means. It is these latter components that are used to test for statistical significance using a simple F-test (1).

Why not use multiple t-tests instead of ANOVA?
Why should we use ANOVA in preference to carrying out a series of t-tests? I think this is best explained by using an example: suppose we want to compare the results from 12 analysts taking part in a training exercise. If we were to use t-tests, we would need to calculate 66 t-values. Not only is this a lot of work but the chance of reaching a wrong conclusion increases. The correct way to analyse this sort of data is to use one-way ANOVA.

One-way ANOVA
One-way ANOVA will answer the question: is there a significant difference between the mean values (or levels), given that the means are calculated from a number of replicate observations? 'Significant' refers to an observed spread of means that would not normally arise from the chance variation within groups. We have already seen an example of this type of problem in the form of the data contained in Figure 1, which shows the results from 12 different analysts analysing the same material. Using these data and a spreadsheet, the results obtained from carrying out one-way ANOVA are reported in Example 1. In this example, the ANOVA shows there are significant differences between analysts (Fvalue > Fcrit at the 95% confidence level). This result is obvious from a plot of the data (Figure 1), but in many situations a visual inspection of a plot will not give such a clear-cut result. Notice that the output also includes a 'p-value' (see the Interpretation of the result(s) section, which follows).

Note: ANOVA cannot tell us which individual mean or means are different from the consensus value, nor in what direction they deviate. The most effective way to show this is to plot the data (Figure 1) or, alternatively but less effectively, carry out a multiple comparison test such as Scheffe's test (2). It is also important to make sure the right questions are being asked and that the right data are being captured. In Example 1, it is possible that the time difference between the analysts carrying out the determinations is the reason for the difference in the mean values. This example shows how good experimental design procedures could have prevented ambiguity in the conclusions.
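A one-way ANOVA of the kind just described takes only a couple of lines in a scripting environment; the sketch below uses SciPy's f_oneway on invented replicate results for three analysts (the full 12-analyst data set of Figure 1 is not reproduced here), so the numbers are illustrative only.

from scipy import stats

# Invented replicate results for three analysts (the real example uses twelve)
analyst_a = [40.1, 40.4, 39.8, 40.2]
analyst_b = [41.0, 41.3, 40.9, 41.2]
analyst_c = [39.5, 39.9, 39.6, 39.7]

f_value, p_value = stats.f_oneway(analyst_a, analyst_b, analyst_c)
print(f"F = {f_value:.2f}, p = {p_value:.4f}")
# A p-value below 0.05 indicates a significant between-analyst difference,
# but ANOVA does not say which analyst differs: plot the data to see that.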
[figure 1 Results from 12 analysts (A1-A12), each making four replicate determinations on the same material: analyte concentration (ppm) plotted for each analyst, with the overall mean and the total standard deviation indicated.]
If its underlying assumptions are violated, ANOVA can indicate a significant difference when none is present. The best way to avoid this pitfall is, as ever, to plot the data. There also exist a number of tests for heteroscedasticity (e.g., Bartlett's test (5) and Levene's test (2)). It may be possible to overcome this type of problem in the data structure by transforming it, such as by taking logs (7). If the variability within a group is correlated with its mean value then ANOVA may not be appropriate and/or it may indicate the presence of outliers in the data (Figure 4). Cochran's test (5) can be used to test for variance outliers.

[figure 2 Interactive factors: response plotted against factor Y (YLow to YHigh) at two levels of factor Z (ZLow, ZHigh); (a) Y and Z are independent, (b) Y and Z are interacting.]

[figure 3 Comparing mean squares in two-way ANOVA with replication: start by comparing the within-group mean squares with the interaction mean squares; if the difference is significant (F > Fcrit), compare the interaction mean squares with the individual factor mean squares; if not, pool the within-group and interaction sums of squares and compare the pooled mean squares with the individual factor mean squares.]

[figure 4 Group variances plotted against group means; annotations indicate an unreliable high mean (which may contain outliers) and means that are significantly different by ANOVA.]

Conclusions
• ANOVA is a powerful tool for determining if there is a statistically significant difference between two or more sets of data.
• One-way ANOVA should be used when we are comparing several sets of observations.
• Two-way ANOVA is the method used when there are two separate factors that may be influencing a result.
• Except for the smallest of data sets, ANOVA is best carried out using a spreadsheet or statistical software package.
• You should always plot your data to make sure the assumptions ANOVA is based on are not violated.

Acknowledgements
The preparation of this paper was supported under a contract with the UK Department of Trade and Industry as part of the National Measurement System Valid Analytical Measurement Programme (VAM) (8).

References
(1) S. Burke, Scientific Data Management, 1(1), 32-38, September 1997.
(2) G.A. Milliken and D.E. Johnson, Analysis of Messy Data, Volume 1: Designed Experiments, Van Nostrand Reinhold Company, New York, USA (1984).
(3) J.C. Miller and J.N. Miller, Statistics for Analytical Chemistry, Ellis Horwood PTR Prentice Hall, London, UK (ISBN 0 13 030990 7).
(4) C. Chatfield, Statistics for Technology, Chapman & Hall, London, UK (ISBN 0 412 25340 2).
(5) T.J. Farrant, Practical Statistics for the Analytical Scientist, A Bench Guide, Royal Society of Chemistry, London, UK (ISBN 0 85404 442 6) (1997).
(6) K.V. Mardia, J.T. Kent and J.M. Bibby, Multivariate Analysis, Academic Press Inc. (ISBN 0 12 471252 5) (1979).
This article, the fourth and final part of our statistics refresher series, looks
at how to deal with ‘messy’ data that contain transcription errors or extreme
and skewed results.
This is the last article in a series of short papers introducing basic statistical methods of use in analytical science. In the three previous papers (1-3) we have assumed the data have been 'tidy'; that is, normally distributed with no anomalous and/or missing results. In the real world, however, we often need to deal with 'messy' data, for example data sets that contain transcription errors or unexpected extreme results, or that are skewed. How we deal with this type of data is the subject of this article.

Transcription errors
Transcription errors can normally be corrected by implementing good quality control procedures before statistical analysis is carried out. For example, the data can be independently checked or, more rarely, the data can be entered, again independently, into two separate files and the files compared electronically to highlight any discrepancies. There are also a number of outlier tests that can be used to highlight anomalous values before other statistics are calculated. These tests do not remove the need for good quality assurance; rather, they should be seen as an additional quality check.

Missing data
No matter how well our experiments are planned there will always be times when something goes wrong, resulting in gaps in the data. Some statistical procedures will not work as well, or at all, with some data missing. The best recourse is always to repeat the experiment to generate the complete data set. Sometimes, however, this is not feasible, particularly where readings are taken at set times or the cost of retesting is prohibitive, so alternative ways of addressing this problem are needed. Current statistical software packages typically deal with missing data by one of three methods:

Casewise deletion excludes all examples (cases) that have missing data in at least one of the selected variables. For example, if ICP-AAS (inductively coupled plasma-atomic absorption spectroscopy) is calibrated with a number of standard solutions containing several metal ions at different concentrations, and the aluminium value were missing for a particular test portion, all the results for that test portion would be disregarded (see Table 1). This is the usual way of dealing with missing data, but it does not guarantee correct answers. This is particularly so in complex (multivariate) data sets, where it is possible to end up deleting the majority of your data if the missing data are randomly distributed across cases and variables.

Pairwise deletion can be used as an alternative to casewise deletion in situations where parameters (correlation coefficients, for example) are calculated on successive pairs of variables (e.g., in a recovery experiment we may be interested in the correlations between material recovered and extraction time, temperature, particle size, polarity, etc. With pairwise deletion, if one solvent polarity measurement was missing, only this single pair would be deleted from the correlation, and the correlations for recovery versus extraction time and particle size would be unaffected) (see Table 2). Pairwise deletion can, however, lead to serious problems. For example, if there is a 'hidden' systematic distribution of missing points then a bias may result when calculating a correlation matrix (i.e., different correlation coefficients in the matrix can be based on different subsets of cases).

Mean substitution replaces all missing data in a variable by the mean value for that variable.

table 1 Casewise deletion.
             Al     B      Fe     Ni
Solution 1          94.5   578    23.1
Solution 2   567    72.1   673    7.6
Solution 3          34.0   674    44.7
Solution 4   234    97.4   429    82.9
Casewise deletion: statistical analysis is only carried out on the reduced data set.
Solution 2   567    72.1   673    7.6
Solution 4   234    97.4   429    82.9
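Casewise deletion, pairwise deletion and mean substitution are all straightforward to demonstrate with a data-frame package; the sketch below uses pandas (assumed available) on the Table 1 data, with NaN standing in for the missing aluminium values.

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Al": [np.nan, 567, np.nan, 234],
     "B":  [94.5, 72.1, 34.0, 97.4],
     "Fe": [578, 673, 674, 429],
     "Ni": [23.1, 7.6, 44.7, 82.9]},
    index=["Solution 1", "Solution 2", "Solution 3", "Solution 4"])

casewise = df.dropna()            # casewise deletion: drop incomplete rows
pairwise_r = df.corr()            # correlations are computed pairwise,
                                  # using whatever complete pairs are available
mean_sub = df.fillna(df.mean())   # mean substitution (the simplest imputation)

print(casewise)
print(pairwise_r.round(3))
print(mean_sub)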
Though this looks as if the data set is now complete, mean substitution has its own disadvantages. The variability in the data set is artificially decreased in direct proportion to the number of missing data points, leading to underestimates of dispersion (the spread of the data). Mean substitution may also considerably change the values of some other statistics, such as linear regression statistics (3), particularly where correlations are strong (see Table 3).

Examples of these three approaches are illustrated in Figure 1 for the calculation of a correlation matrix, where the correlation coefficient (r) (3) is determined for each paired combination of the five variables, A to E. Note how the r value can increase, diminish or even reverse sign depending on which method is chosen to handle the missing data (i.e., the A,B correlation coefficients).

table 2 Pairwise deletion.
           Recovery   Extraction time   Particle size   Solvent polarity
           %          (mins)            (µm)            (pKa)
Sample 1   93         20                90
Sample 2   105        120               150             1.8
Sample 3   99         180               50              1.0
Sample 4   73         10                500             1.5
Pairwise deletion: statistical analysis is unaffected except when one of a pair of data points is missing.
                            Recovery vs         Recovery vs     Recovery vs
                            extraction time     particle size   solvent polarity
r                           0.728886            -0.87495        0.033942
(number of data points)     (4)                 (4)             (3)

table 3 Mean substitution.
             Al      B      Fe     Ni
Solution 1           94.5   578    23.1
Solution 2   567     72.1   673    7.6
Solution 3           34.0   674    44.7
Solution 4   234     97.4   429    82.9
Mean substitution: statistical analysis is carried out on pseudo-completed data with no allowance made for errors in the estimated values.
Solution 1   400.5   94.5   578    23.1
Solution 2   567     72.1   673    7.6
Solution 3   400.5   34.0   674    44.7
Solution 4   234     97.4   429    82.9

Box 1: Imputation
Imputation (4,5) is yet another method that is increasingly being used to handle missing data. It is, however, not yet widely available in statistical software packages. In its simplest ad hoc form an imputed value is substituted for the missing value (e.g., mean substitution, already discussed above, is a form of imputation). In its more general/systematic form, however, the imputed missing values are predicted from patterns in the real (non-missing) data. A total of m possible imputed values are calculated for each missing value (using a suitable statistical model derived from the patterns in the data) and then the m possible complete data sets are analysed in turn by the selected statistical method. The m intermediate results are then pooled to yield the final result (statistic) and an estimate of its uncertainty. This method works well providing that the missing data are randomly distributed and the model used to predict the imputed values is sensible.

Extreme values, stragglers and outliers
Extreme values are defined as observations in a sample so far separated in value from the remainder as to suggest that they may be from a different population, or the result of an error in measurement (6). Extreme values can be subdivided into stragglers, extreme values detected between the 95% and 99% confidence levels, and outliers, extreme values detected at greater than the 99% confidence level.

It is tempting to remove extreme values automatically from a data set because they can alter the calculated statistics, e.g., increase the estimate of variance (a measure of spread), or possibly introduce a bias into the calculated mean. There is one golden rule, however: no value should be removed from a data set on statistical grounds alone. 'Statistical grounds' include outlier testing.

Outlier tests tell you, on the basis of some simple assumptions, where you are most likely to have a technical error; they do not tell you that the point is 'wrong'. No matter how extreme a value is in a set of data, the suspect value could nonetheless be a correct piece of information (1). Only with experience or the identification of a particular cause can data be declared 'wrong' and removed.

So, given that we understand that the tests only tell us where to look, how do we test for outliers? If we have good grounds for believing our data are normally distributed, then a number of 'outlier tests' (sometimes called Q-tests) are available that identify extreme values in an objective way (7,8). Good grounds for believing the data are normal are
• past experience of similar data
• passing normality tests, for example, the Kolmogorov-Smirnov-Lilliefors test, Shapiro-Wilk's test, skewness test, kurtosis test (7,9), etc.
• plots of the data, e.g., frequency histograms and normal probability plots (1,7).
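A quick normality check costs very little before an outlier test is applied; the minimal sketch below uses SciPy's Shapiro-Wilk routine on invented values (substitute your own column of results).

import numpy as np
from scipy import stats

# Invented replicate results; substitute your own data
x = np.array([48.1, 47.9, 48.3, 48.0, 48.2, 47.8, 48.4, 48.1, 47.9, 48.2])

w_stat, p_value = stats.shapiro(x)
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.3f}")
# p > 0.05 means no evidence against normality (it does not prove normality);
# a frequency histogram or normal probability plot is still worth drawing.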
Note that the tests used to check for normality themselves need a reasonable number of results before non-normality can be detected; where normality cannot be assumed, robust or non-parametric tests can be applied to the data. These three approaches (outlier tests, robust estimates and non-parametric methods) are examined in more detail below.

[figure 1 Correlation matrices for the five variables A-E calculated with the missing data handled by the different methods ('value' = data removed to show the effects of missing data; 'mean' = mean values replacing missing data). One of the matrices, r with the number of data points in parentheses, reads: A-B 0.54 (12), A-C 0.55 (12), A-D 0.27 (12), A-E 0.23 (11), B-C 0.50 (11), B-D 0.47 (11), B-E 0.77 (10), C-D 0.79 (11), C-E 0.70 (10), D-E 0.71 (10). At the 95% confidence level, significant correlations are indicated.]

Grubbs' test values (G1, G2, G3) are calculated for suspect extreme values; G3, for example, is calculated excluding the suspected pair of outlier values, i.e., the pair of values furthest away from the mean. If the test values (G1, G2, G3) are greater than the critical values obtained from tables (see Table 4) then the extreme value(s) are unlikely to have occurred by chance at the stated confidence level (see Box 2).

Pitfalls of outlier tests
Figure 3 shows three situations where outlier tests can misleadingly identify an extreme value. Figure 3(a) shows a situation common in chemical analysis. Because of limited measurement precision (rounding errors) it is possible to end up comparing a result which, no matter how close it is to the other values, is an infinite number of standard deviations away from the mean of the remaining results. This value will therefore always be flagged as an outlier. In Figure 3(b) there is a genuine long tail on the distribution that may cause successive outlying points to be identified. This type of distribution is surprisingly common in some types of chemical analysis, e.g., pesticide residues. If there is very little data (Figure 3(c)) an outlier can be identified by chance. In this situation it is possible that the identified point is closer to the 'true value' and it is the other values that are the outliers. This occurs more often than we would like to admit; how many times do your procedures state 'average the best two out of three determinations'?

Outliers by variance
When the data are from different groups (for example when comparing test methods via interlaboratory comparison) it is not only possible for individual points within a group to be outlying, but also for the group means to have outliers with respect to each other. Another type of 'outlier' that can occur is when the spread of data within one particular group is unusually small or large when compared with the spread of the other groups (see Figure 4).
• The same Grubbs' tests that are used to determine the presence of within-group outlying replicates may also be used to test for suspected outlying means.
• The Cochran's test can be used to test for the third case, that of a suspected outlying variance.
To carry out the Cochran's test, the suspect variance is compared with the sum of all group variances. (The variance is a measure of spread and is simply the square of the standard deviation (1).)

Cn = s²(suspect) / Σ s²(i)   (summed over the g groups), where g is the number of groups and n̄ = (Σ ni)/g

If this calculated ratio, Cn, exceeds the critical value obtained from statistical tables (7) then the suspect group spread is extreme. The n̄ used to look up the critical value is the average number of sample results produced by the groups. The Cochran's test assumes the numbers of replicates within the groups are the same or at least similar (±1). It also assumes that none of the data have been rounded and that there are sufficient numbers of replicates to get a reasonable estimate of the variance. The Cochran's test should not be used iteratively, as this could lead to a large percentage of the data being removed (see Box 3).

Robust statistics
Robust statistics include methods that are largely unaffected by the presence of extreme values. The most commonly used of these statistics are as follows:
Median: The median is a measure of central tendency and can be used instead of the mean. To calculate the median (x̃) the data are arranged in order of magnitude and the median is then the central member of the series (or the mean of the two central members when there is an even number of data, i.e., there are equal numbers of observations smaller and greater than the median). For a symmetrical distribution the mean and median have the same value.

x̃ = xm when n is odd (n = 1, 3, 5, ...);  x̃ = (xm + xm+1)/2 when n is even (n = 2, 4, 6, ...), where m = n/2 rounded up

Median absolute deviation (MAD): The MAD value is an estimate of the spread in the data, similar to the standard deviation. It is the median of the absolute deviations of the individual results (x1 ... xn) from the median of the data:

MAD = median( |xi - x̃| ),  i = 1 ... n
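The median and MAD are one-liners in most environments; the NumPy sketch below uses the 13 interlaboratory-style values quoted in Box 2 for illustration. (The MAD as computed here is the raw median absolute deviation; a scaling factor of about 1.4826 is sometimes applied to make it directly comparable with the standard deviation of normally distributed data.)

import numpy as np

x = np.array([47.876, 47.997, 48.065, 48.118, 48.151, 48.211, 48.251,
              48.559, 48.634, 48.711, 49.005, 49.166, 49.484])

median = np.median(x)
mad = np.median(np.abs(x - median))            # median absolute deviation

print(f"median = {median:.3f}, MAD = {mad:.3f}")
print(f"mean = {x.mean():.3f}, s = {x.std(ddof=1):.3f}")   # for comparison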
Box 2 (Grubbs' test example data):
47.876  47.997  48.065  48.118  48.151  48.211  48.251  48.559  48.634  48.711  49.005  49.166  49.484
n = 13, mean = 48.479, s = 0.498, s²(n-2) = 0.123

Box 3: Cochran's test example
An interlaboratory study was carried out by 13 laboratories to determine the amount of cotton in a cotton/polyester fabric; 85 determinations were carried out in total. The standard deviations of the data obtained by each of the 13 laboratories were as follows:

Std dev.  0.202  0.402  0.332  0.236  0.318  0.452  0.210  0.074  0.525  0.067  0.609  0.246  0.198

Cn = 0.609² / (0.202² + 0.402² + ... + 0.246² + 0.198²) = 0.371/1.474 = 0.252

n̄ = 85/13 = 6.54 ≈ 7

Cochran's critical value for n̄ = 7 and g = 13 is 0.23 at the 95% confidence level (7). As the test value is greater than the critical value, it can be concluded that the laboratory with the highest standard deviation (0.609) has an outlying spread of replicates and this laboratory's results therefore need to be investigated further. It is normal practice in interlaboratory comparisons not to test for low-variance outliers, i.e., laboratories reporting unusually precise results.
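The Cochran calculation in Box 3 is easily scripted, as the sketch below shows; the test value of 0.252 is reproduced from the 13 laboratory standard deviations, but the critical value (0.23 at the 95% level for n = 7 and g = 13) still has to come from published tables.

import numpy as np

# Standard deviations reported by the 13 laboratories (Box 3)
sd = np.array([0.202, 0.402, 0.332, 0.236, 0.318, 0.452, 0.210,
               0.074, 0.525, 0.067, 0.609, 0.246, 0.198])

variances = sd ** 2
C = variances.max() / variances.sum()          # Cochran's test statistic
n_bar = round(85 / len(sd))                    # average replicates per laboratory

print(f"C = {C:.3f} (n_bar = {n_bar}, g = {len(sd)})")
# C = 0.252 > 0.23, so the laboratory with s = 0.609 has an outlying spread.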
[figure: results plotted against Laboratory ID (1-16) on a common scale (approximately 13-15).]

Acknowledgement
The preparation of this paper was supported under a contract with the UK's Department of Trade and Industry as part of the National Measurement System Valid Analytical Measurement Programme (VAM) (14).
Regression and Calibration
Shaun Burke, RHM Technology Ltd, High Wycombe, Buckinghamshire, UK.
Calibration is fundamental to achieving consistency of measurement. Often calibration involves establishing the relationship between an instrument response and one or more reference values. Linear regression is one of the most frequently used statistical methods in calibration. Once the relationship between the input value and the response value (assumed to be represented by a straight line) is established, the calibration model is used in reverse; that is, to predict a value from an instrument response. In general, regression methods are also useful for establishing relationships of all kinds, not just linear relationships. This paper concentrates on the practical applications of linear regression and the interpretation of the regression statistics. For those of you who want to know about the theory of regression there are some excellent references (1-6).

For anyone intending to apply linear least-squares regression to their own data, it is recommended that a statistics/graphics package is used. This will speed up the production of the graphs needed to confirm the validity of the regression statistics. The built-in functions of a spreadsheet can also be used if the routines have been validated for accuracy (e.g., using standard data sets (7)).

What is regression?
In statistics, the term regression is used to describe a group of methods that summarize the degree of association between one variable (or set of variables) and another variable (or set of variables). The most common statistical method used to do this is least-squares regression, which works by finding the 'best curve' through the data, that is, the curve that minimizes the sums of squares of the residuals. The important term here is the 'best curve', not the method by which this is achieved. There are a number of least-squares regression models, for example, linear (the most common type), logarithmic, exponential and power. As already stated, this paper will concentrate on linear least-squares regression. [You should also be aware that there are other regression methods, such as ranked regression, multiple linear regression, non-linear regression, principal-component regression, partial least-squares regression, etc., which are useful for analysing instrument or chemically derived data, but these are beyond the scope of this introductory text.]

What do the linear least-squares regression statistics mean?
Correlation coefficient: Whether you use a calculator's built-in functions, a spreadsheet or a statistics package, the first statistic most chemists look at when performing this analysis is the correlation coefficient (r). The correlation coefficient ranges from -1, a perfect negative relationship, through zero (no relationship), to +1, a perfect positive relationship (Figures 1(a-c)). The correlation coefficient is, therefore, a measure of the degree of linear relationship between two sets of data. However, the r value is open to misinterpretation (8); Figures 1(d) and (e) show instances in which the r values alone would give the wrong impression of the underlying relationship. Indeed, it is possible for several different data sets to yield identical regression statistics (r value, residual sum of squares, slope and intercept) but still not satisfy the linear assumption in all cases (9). It therefore remains essential to plot the data in order to check that linear least-squares statistics are appropriate.

As with the t-tests discussed in the first paper (10) in this series, the statistical significance of the correlation coefficient is dependent on the number of data points. To test if a particular r value indicates a statistically significant relationship we can use the Pearson's correlation coefficient test (Table 1). Thus, if we only have four points (for which the number of degrees of freedom is 2), a linear least-squares correlation coefficient of -0.94 will not be significant at the 95% confidence level. However, if there are more than 60 points, an r value of just 0.26 (r² = 0.0676) would indicate a significant, but not very strong, positive linear relationship. In other words, a relationship can be statistically significant but of no practical value. Note that the test used here simply shows whether two sets are linearly related; it does not 'prove' linearity or adequacy of fit.

It is also important to note that a significant correlation between one variable and another should not be taken as an indication of causality. For example, there is a negative correlation between time (measured in months) and catalyst performance in car exhaust systems. However, time is not the cause of the deterioration; it is the build-up of sulfur and phosphorus compounds that gradually poisons the catalyst. Causality is, in fact, very difficult to prove unless the chemist can vary systematically and independently all critical parameters, while measuring the response for each change.
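Most packages return a p-value alongside r, so the significance of a correlation can be checked directly rather than against Table 1; the sketch below uses SciPy's pearsonr on a small invented calibration set (the concentrations and responses are made up for illustration).

import numpy as np
from scipy import stats

conc     = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
response = np.array([0.02, 0.11, 0.20, 0.33, 0.41, 0.54, 0.60, 0.73])

r, p = stats.pearsonr(conc, response)
print(f"r = {r:.4f}, p = {p:.2e}")
# A small p-value says the two variables are linearly related; it does not
# prove that a straight line is an adequate model, so plot the data as well.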
Residual standard error: The residual standard error (RSE) measures the scatter of the points about the fitted line and is given by

RSE = s(y) √( (1 - r²)(n - 1)/(n - 2) )

where s(y) is the standard deviation of the y values in the calibration, n is the number of data pairs and r is the least-squares regression correlation coefficient.

Slope and intercept
In linear regression the relationship between the X and Y data is assumed to be represented by a straight line, Y = a + bX (see Figure 2), where Y is the estimated response/dependent variable, b is the slope (gradient) of the regression line and a is the intercept (the Y value when X = 0). This straight-line model is only appropriate if the data approximately fit the assumption of linearity. This can be tested for by plotting the data and looking for curvature (e.g., Figure 1(d)) or by plotting the residuals against the predicted Y values or X values (see Figure 3).

Although the relationship may be known to be non-linear (i.e., to follow a different functional form, such as an exponential curve), it can sometimes be made to fit the linear assumption by transforming the data in line with the function, for example by taking logarithms or squaring the Y and/or X data. Note that if such transformations are performed, weighted regression (discussed later) should be used to obtain an accurate model. Weighting is required because of changes in the residual/error structure of the regression model. Using non-linear regression may, however, be a better alternative to transforming the data when this option is available in the statistical packages you are using.

Confidence intervals
As with most statistics, the slope (b) and intercept (a) are estimates based on a finite sample, so there is some uncertainty in the values. (Note: strictly, the uncertainty arises from random variability between sets of data. There may be other uncertainties, such as measurement bias, but these are outside the scope of this article.) This uncertainty is quantified in most statistical routines by displaying the confidence limits and other statistics, such as the standard error and p-values. Examples of these statistics are given in Table 2.

table 1 Critical values of the correlation coefficient (significant correlation when |r| ≥ table value).
Degrees of freedom (n - 2)    95% (α = 0.05)    99% (α = 0.01)
 2                            0.950             0.990
 3                            0.878             0.959
 4                            0.811             0.917
 5                            0.754             0.875
 6                            0.707             0.834
 7                            0.666             0.798
 8                            0.632             0.765
 9                            0.602             0.735
10                            0.576             0.708
11                            0.553             0.684
12                            0.532             0.661
13                            0.514             0.641
14                            0.497             0.623
15                            0.482             0.606
20                            0.423             0.537
30                            0.349             0.449
40                            0.304             0.393
60                            0.250             0.325
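Fitting the straight line Y = a + bX and evaluating the residual standard error from the formula above takes only a few lines; the sketch below uses scipy.stats.linregress on the same invented calibration points as the previous sketch.

import numpy as np
from scipy import stats

conc     = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
response = np.array([0.02, 0.11, 0.20, 0.33, 0.41, 0.54, 0.60, 0.73])

fit = stats.linregress(conc, response)          # slope b, intercept a, r, ...
residuals = response - (fit.intercept + fit.slope * conc)

n = len(conc)
rse = np.std(response, ddof=1) * np.sqrt((1 - fit.rvalue**2) * (n - 1) / (n - 2))

print(f"a = {fit.intercept:.4f}, b = {fit.slope:.4f}, r = {fit.rvalue:.4f}")
print(f"RSE = {rse:.4f}")
# Always inspect the residuals (e.g., plot them against conc) for curvature.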
The p-value is the probability that a value could arise by chance if the true value was zero. By convention, a p-value of less than 0.05 indicates a significant non-zero statistic. Thus, examining the spreadsheet's results (Table 2), we can see that there is no reason to reject the hypothesis that the intercept is zero, but there is a significant non-zero positive gradient/relationship. The confidence interval for the regression line can be plotted for all points along the x-axis and is dumbbell shaped (Figure 2). In practice, this means that the model is more certain in the middle than at the extremes, which in turn has important consequences for extrapolating relationships.

When regression is used to construct a calibration model, the calibration graph is used in reverse (i.e., we predict the X value from the instrument response [Y value]). This prediction has an associated uncertainty, expressed as a confidence interval:

Xpredicted = (Ȳ - a)/b

Confidence interval for the prediction:  Xpredicted ± (t RSE / b) √( 1/m + 1/n + (Ȳ - ȳ)² / (b²(n - 1)s(x)²) )

where a is the intercept and b is the slope obtained from the regression equation, Ȳ is the mean value of the response (e.g., instrument readings) for m replicates (replicates are repeat measurements made at the same level), ȳ is the mean of the y data for the n points in the calibration, t is the critical value obtained from t-tables for n - 2 degrees of freedom, s(x) is the standard deviation for the x data for the n points in the calibration and RSE is the residual standard error for the calibration.

If we want, therefore, to reduce the size of the confidence interval of the prediction, there are several things that can be done.
1. Make sure that the unknown determinations of interest are close to the centre of the calibration (i.e., close to the values x̄, ȳ [the centroid point]). This suggests that if we want a small confidence interval at low values of x then the standards/reference samples used in the calibration should be concentrated around this region. For example, in analytical chemistry a typical pattern of standard concentrations might be 0.05, 0.1, 0.2, 0.4, 0.8, 1.6 (i.e., only one or two standards are used at higher concentrations). While this will lead to a smaller confidence interval at lower concentrations, the calibration model will be prone to leverage errors (see below).
2. Increase the number of points in the calibration (n). There is, however, little improvement to be gained by going above 10 calibration points unless standard preparation and analysis is rapid and cheap.
3. Increase the number of replicate determinations for estimating the unknown (m). Once again there is a law of diminishing returns, so the number of replicates should typically be in the range 2 to 5.
4. The range of the calibration can be extended, providing the calibration is still linear.

[figure 1 Correlation coefficients and goodness of fit: (a) r = -1; (b) r = 0; (c) r = +1; (f) r = 0.9 with a point introducing bias; (g) r = 0.9 with a high-leverage point.]

[figure 2 Calibration graph with fitted line Y = -0.046 + 0.1124X, r = 0.98731, showing the intercept, slope and residuals.]

[figure 3 Residuals plot (residuals against X over the range 0-10), with a possible outlier indicated.]
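The inverse prediction and its confidence interval can be scripted directly from the formulas above; the sketch below does so for a single reading of an unknown (m = 1), again with the invented calibration data used earlier and an invented instrument reading of 0.37.

import numpy as np
from scipy import stats

conc     = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
response = np.array([0.02, 0.11, 0.20, 0.33, 0.41, 0.54, 0.60, 0.73])

fit = stats.linregress(conc, response)
a, b, r = fit.intercept, fit.slope, fit.rvalue
n, m = len(conc), 1                              # m = replicate readings of the unknown
rse = np.std(response, ddof=1) * np.sqrt((1 - r**2) * (n - 1) / (n - 2))

y_unknown = 0.37                                 # invented instrument reading
x_pred = (y_unknown - a) / b

t_crit = stats.t.ppf(0.975, n - 2)               # 95% confidence, n - 2 degrees of freedom
half_width = (t_crit * rse / b) * np.sqrt(
    1/m + 1/n
    + (y_unknown - response.mean())**2 / (b**2 * (n - 1) * np.var(conc, ddof=1)))

print(f"x_pred = {x_pred:.3f} +/- {half_width:.3f} (95% confidence)")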
Bias, leverage and outliers
Points of influence, which may or may not be outliers, can have a significant effect on the regression model and, therefore, on its predictive ability. If a point is in the middle of the model (i.e., close to x̄) but outlying on the Y axis, its effect will be to move the regression line up or down. The point is then said to have influence because it introduces an offset (or bias) in the predicted values (see Figure 1(f)). If the point is towards one of the extreme ends of the plot, its effect will be to tilt the regression line. The point is then said to have high leverage because it acts as a lever and changes the slope of the regression model (see Figure 1(g)). Leverage can be a major problem if one or two data points are a long way from all the other points along the X axis.

A leverage statistic (ranging between 1/n and 1) can be calculated for each value of x:

Leverage(i) = 1/n + (xi - x̄)² / Σj=1..n (xj - x̄)²

where xi is the x value for which the leverage statistic is to be calculated, n is the number of points in the calibration and x̄ is the mean of all the x values in the calibration. There is no set value above which this leverage statistic indicates a point of influence; a value of 0.9 is, however, used by some statistical software packages.

To test if a data point (xi, yi) is an outlier (relative to the regression model) the following outlier test can be applied:

Test value = residual(max) / ( RSE √( 1 - 1/n - (Yi - ȳ)² / ((n - 1) s(y)²) ) )

where RSE is the residual standard error, s(y) is the standard deviation of the Y values, Yi is the y value, n is the number of points, ȳ is the mean of all the y values in the calibration and residual(max) is the largest residual value. For example, the test value for the suspected outlier in Figure 3 is 1.78 and the critical value is 2.37 (Table 3, for 10 data points). Although the point appears extreme, it could reasonably be expected to arise by chance within the data set.

Extrapolation and interpolation
We have already mentioned that the regression line is subject to some uncertainty and that this uncertainty becomes greater at the extremes of the line. If we, therefore, try to extrapolate much beyond the point where we have real data (10%), there may be relatively large errors associated with the predicted value. Conversely, interpolation near the middle of the calibration will minimize the prediction uncertainty. It follows, therefore, that when constructing a calibration graph, the standards should cover a larger range of concentrations than the analyst is interested in. Alternatively, several calibration graphs covering smaller, overlapping, concentration ranges can be constructed.

[figure 4 Plots of typical instrument response versus concentration: (a) response against concentration; (b) residuals against predicted value.]

table 2 Statistics obtained using the Excel 5.0 regression analysis function from the data used to generate the calibration graph in Figure 2 (*note the large number of significant figures reported; in fact none of the values warrants more than 3 significant figures).
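The leverage statistic defined above is easily computed for every calibration point, as in the NumPy sketch below (same invented x values as before); points whose leverage approaches 1 deserve a second look.

import numpy as np

conc = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])

n = len(conc)
leverage = 1/n + (conc - conc.mean())**2 / np.sum((conc - conc.mean())**2)

for x, h in zip(conc, leverage):
    print(f"x = {x:.2f}  leverage = {h:.3f}")
# Leverage ranges from 1/n to 1; some packages flag values above about 0.9.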
Weighted regression
The associated uncertainty for the weighted prediction, expressed as a confidence interval, is then:

Confidence interval for the prediction:  X(w)predicted ± ( t RSE(w) / b(w) ) √( 1/(m Wi) + 1/n + (Ȳ - ȳw)² / (b(w)² Σj=1..n Wj xj²) )

where t is the critical value obtained from t-tables for n - 2 degrees of freedom at a stated significance level (typically α = 0.05), Wi is the weight for the ith point in the calibration, m is the number of replicates and RSE(w) is the weighted residual standard error for the calibration:

RSE(w) = √( ( Σj=1..n Wj yj² - b(w)² Σj=1..n Wj xj² ) / (n - 1) )

Conclusions
• Always plot the data. Don't rely on the regression statistics to indicate a linear relationship. For example, the correlation coefficient is not a reliable measure of goodness-of-fit.
• Always examine the residuals plot. This is a valuable diagnostic tool.
• Remove points of influence (leverage, bias and outlying points) only if a reason can be found for their aberrant behaviour.
• Be aware that a regression line is an estimate of the 'best line' through the data and that there is some uncertainty associated with it. The uncertainty, in the form of a confidence interval, should be reported with the interpolated result obtained from any linear regression calibration.

Acknowledgement
The preparation of this paper was supported under a contract with the Department of Trade and Industry as part of the National Measurement System Valid Analytical Measurement Programme (VAM) (11).

References
(1) G.W. Snedecor and W.G. Cochran, Statistical Methods, The Iowa State University Press, USA, 6th edition (1967).
(2) N. Draper and H. Smith, Applied Regression Analysis, John Wiley & Sons Inc., New York, USA, 2nd edition (1981).
(3) BS ISO 11095: Linear Calibration Using Reference Materials (1996).
(4) J.C. Miller and J.N. Miller, Statistics for Analytical Chemistry, Ellis Horwood PTR Prentice Hall, London, UK.
(5) A.R. Hoshmand, Statistical Methods for Environmental and Agricultural Sciences, 2nd edition, CRC Press (ISBN 0-8493-3152-8) (1998).
(6) T.J. Farrant, Practical Statistics for the Analytical Scientist, A Bench Guide, Royal Society of Chemistry, London, UK (ISBN 0 85404 442 6) (1997).
(7) Statistical Software Qualification: Reference Data Sets, B.P. Butler, M.G. Cox, S.L.R. Ellison and W.A. Hardcastle (Eds), Royal Society of Chemistry, London, UK (ISBN 0-85404-422-1) (1996).
(8) H. Sahai and R.P. Singh, Virginia J. Sci., 40(1), 5-9 (1989).
(9) F.J. Anscombe, Graphs in Statistical Analysis, American Statistician, 27, 17-21, February 1973.
(10) S. Burke, Scientific Data Management, 1(1), 32-38, September 1997.
(11) M. Sargent, VAM Bulletin, Issue 13, 4-5, Laboratory of the Government Chemist (Autumn 1995).

Shaun Burke currently works in the Food Technology Department of RHM Technology Ltd, High Wycombe, Buckinghamshire, UK. However, these articles were produced while he was working at LGC, Teddington, Middlesex, UK (http://www.lgc.co.uk).