
Stats and Data Analysis

Statistics and robust methods are useful for proficiency testing in several ways: 1) Finding the consensus value and its uncertainty by using the robust mean, which is less influenced by outliers than the regular mean. 2) Assessing participant performance using a z-score based on the robust mean and a fitness-for-purpose standard deviation, providing information about how well results meet intended uses. 3) Evaluating test materials for sufficient homogeneity and stability by applying robust statistics, which are better suited for datasets that may contain outliers or stragglers.


Statistics and Data Analysis in Proficiency Testing
Michael Thompson
School of Biological and Chemical Sciences
Birkbeck College (University of London)
Malet Street
London WC1E 7HX
[email protected]
Organisation of a proficiency test

“Harmonised Protocol”. Pure Appl Chem. 2006, 78, 145-196.


Where do we use statistics in proficiency testing?

• Finding a consensus and its uncertainty to use as an assigned value
• Assessing participants’ results
• Assessing the efficacy of the PT scheme
• Testing for sufficient homogeneity and stability of the distributed test material
• Others
Criteria for an ideal scoring method

• Adds value to raw results.
• Easily understandable, based on the properties of the normal distribution.
• Has no arbitrary scaling transformation.
• Is transferable between different concentrations, analytes, matrices, and measurement principles.
How can we construct a score?
• An obvious idea is to utilise the properties of the normal distribution to interpret the results of a proficiency test.
BUT… we do not make any assumptions about the actual data.
Example dataset A
• Determination of protein nitrogen in a meat product.
A weak scoring method

$z = (x - \bar{x}) / s$

$\bar{x} = 2.126$, $s = 0.077$
• On average, slightly more than 95% of laboratories receive a z-score within the range ±2.
Robust mean and standard deviation

$\hat{\mu}_{rob}$, $\hat{\sigma}_{rob}$
• Robust statistics is applicable to datasets that look like normally distributed samples contaminated with outliers and stragglers (i.e., unimodal and roughly symmetric).
• The method downweights the otherwise large influence of outliers and stragglers on the estimates.
• It models the central ‘reliable’ part of the dataset.
Can I use robust estimates?

[Figure: three example distributions (skewed, bimodal, heavy-tailed) plotted against the measurement axis.]
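Huber’s H15

$x^T = (x_1, x_2, \ldots, x_n)$

Set $1 \le k \le 2$, $p = 0$, $\hat{\mu}_0 = \mathrm{median}$, $\hat{\sigma}_0 = 1.5 \times \mathrm{MAD}$

$\tilde{x}_i = x_i$ if $\hat{\mu}_p - k\hat{\sigma}_p \le x_i \le \hat{\mu}_p + k\hat{\sigma}_p$
$\tilde{x}_i = \hat{\mu}_p - k\hat{\sigma}_p$ if $x_i < \hat{\mu}_p - k\hat{\sigma}_p$
$\tilde{x}_i = \hat{\mu}_p + k\hat{\sigma}_p$ if $x_i > \hat{\mu}_p + k\hat{\sigma}_p$

$\hat{\mu}_{p+1} = \mathrm{mean}(\tilde{x}_i)$
$\hat{\sigma}^2_{p+1} = f(k)\,\mathrm{var}(\tilde{x}_i)$

If not converged, $p = p + 1$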
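
The iteration on this slide can be sketched in Python as follows. This is a minimal illustration, not the AMC software: the function name `huber_h15`, the default k = 1.5, the convergence tolerance, the use of the sample variance (divisor n - 1), and the example data are all assumptions. The factor f(k) is computed here as the reciprocal of the variance of a standard normal variable winsorised at ±k, which gives about 1.285 for k = 1.5.

```python
import math
import statistics


def huber_h15(values, k=1.5, tol=1e-9, max_iter=200):
    """Robust mean and SD by iterative winsorisation (sketch, assumed defaults)."""
    # f(k) = 1 / beta, where beta is the variance of a standard normal
    # variable after winsorising at +/- k.
    phi_k = math.exp(-0.5 * k * k) / math.sqrt(2.0 * math.pi)  # normal density at k
    theta = math.erf(k / math.sqrt(2.0))                        # P(|Z| < k)
    beta = theta + k * k * (1.0 - theta) - 2.0 * k * phi_k
    f_k = 1.0 / beta

    # Starting values: median and 1.5 x MAD, as on the slide.
    mu = statistics.median(values)
    mad = statistics.median([abs(x - mu) for x in values])
    sigma = 1.5 * mad

    for _ in range(max_iter):
        lo, hi = mu - k * sigma, mu + k * sigma
        # Pull values outside [lo, hi] back to the nearest limit.
        winsorised = [min(max(x, lo), hi) for x in values]
        new_mu = sum(winsorised) / len(winsorised)
        new_sigma = math.sqrt(f_k * statistics.variance(winsorised))
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


# Hypothetical results with one outlier (not the protein nitrogen data).
data = [2.10, 2.12, 2.13, 2.11, 2.14, 2.12, 2.13, 2.15, 2.09, 2.80]
mu_rob, sigma_rob = huber_h15(data)
print(f"robust mean = {mu_rob:.4f}, robust SD = {sigma_rob:.4f}")
print(f"classical mean = {statistics.mean(data):.4f}, SD = {statistics.stdev(data):.4f}")
```

Comparing the two printed lines shows how far a single outlier moves the classical estimates while the robust ones stay with the bulk of the data.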
References: robust statistics

• Analytical Methods Committee, Analyst, 1989, 114, 1489
• AMC Technical Brief No 6, 2001 (download from www.rsc.org/amc)
• P J Rousseeuw, J. Chemomet, 1991, 5, 1.
Is that enough?

$z = (x - \hat{\mu}_{rob}) / \hat{\sigma}_{rob}$

$\hat{\mu}_{rob} = 2.128$, $\hat{\sigma}_{rob} = 0.048$

• On average, slightly less than 95% of laboratories receive a z-score between ±2.
What more do we need?
• We need a method that evaluates the data
in relation to its intended use, rather than
merely describing it.
• This adds value to the data rather than
simply summarising it.
• The method is based on fitness for
purpose.
Fitness for purpose

• Fitness for purpose occurs when the uncertainty of the result, $u_f$, gives best value for money.
• If the uncertainty is smaller than $u_f$, the analysis may be too expensive.
• If the uncertainty is larger than $u_f$, the cost and the probability of a mistaken decision will rise.
Fitness for purpose
• The value of $u_f$ can sometimes be estimated objectively by decision theoretic methods, but is most often simply agreed between the laboratory and the customer by professional judgement.
• In the proficiency test context, $u_f$ should be determined by the scheme provider.

Reference: T Fearn, S A Fisher, M Thompson, and S L R Ellison, Analyst, 2002, 127, 818-824.
A score that meets all of the criteria
• If we now define a z-score thus:
$z = (x - \hat{\mu}_{rob}) / \sigma_p$ where $\sigma_p = u_f$
we have a z-score that is both robustified against extreme values and tells us something about fitness for purpose.
• In an exactly compliant laboratory, scores of 2<|z|<3 will be encountered occasionally, and scores of |z|>3 rarely. Better performers will receive fewer of these extreme z-scores.
Example data A again
• Suppose that the fitness for purpose criterion set for the analysis is an RSD of 1%. This gives us:
$\sigma_p = 0.01 \times 2.1 = 0.021$
Finding a consensus from participants’ results

• The consensus is not theoretically the best option for the assigned value but is usually the only practicable value.
• The consensus is not necessarily identical with the true value. PT providers have to be alert to this possibility.
What is a ‘consensus’?
• Mean? - easy to calculate, but affected by
outliers and asymmetry.
• Robust mean? - fairly easy to calculate, handles
outliers but affected by asymmetry.
• Median? - easy to calculate, more robust for
asymmetric distributions, but larger standard
error than robust mean.
• Mode? - intuitively good, difficult to define,
difficult to calculate.
The robust mean as consensus

• The robust mean provides a useful consensus in the great majority of instances, where the underlying distribution is roughly symmetric and there are 0-10% outliers.
• The uncertainty of this consensus can be safely taken as

$u(x_a) = \hat{\sigma}_{rob} / \sqrt{n}$
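
A small worked sketch of this standard uncertainty, using the robust standard deviation 0.048 quoted earlier for example dataset A. The number of participants (n = 50) is a hypothetical value chosen for illustration, since the slides do not state it.

```python
import math


def consensus_uncertainty(sigma_rob, n):
    """Standard uncertainty of the robust-mean consensus: sigma_rob / sqrt(n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma_rob / math.sqrt(n)


# sigma_rob from the slides; n = 50 is an assumed number of participants.
u_xa = consensus_uncertainty(0.048, 50)
print(f"u(x_a) = {u_xa:.4f}")
```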
When can I use robust estimates?

[Figure: three example distributions (skewed, bimodal, heavy-tailed) plotted against the measurement axis.]
Skewed distributions
• Skews can arise when the participants’
results come from two or more
inconsistent methods.
• They can also arise as an artefact at low
concentrations of analyte as a result of
data recording practice.
• Rarely, skews can arise when the
distribution is truly lognormal.
Possible use of a trimmed data set?
Can I use the mode?
How many modes? Where are they?
The normal kernel density for identifying a mode

$y = \frac{1}{nh} \sum_{i=1}^{n} \Phi\left(\frac{x - x_i}{h}\right)$

where $\Phi$ is the standard normal density,

$\Phi(a) = \frac{\exp(-a^2/2)}{\sqrt{2\pi}}$

AMC Technical Brief No. 4
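
The kernel density formula can be evaluated on a grid to locate modes, as in the sketch below. The function names, the grid spacing, the bandwidth h = 0.2 and the data are illustrative assumptions; the slides do not say how h should be chosen, so it is left as a parameter.

```python
import math


def kernel_density(x, data, h):
    """Normal kernel density estimate at x with bandwidth h."""
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
    return total / (len(data) * h * math.sqrt(2.0 * math.pi))


def find_modes(data, h, step=None):
    """Return (location, density) for each local maximum found on a regular grid."""
    step = step if step is not None else h / 20.0
    lo = min(data) - 3.0 * h
    hi = max(data) + 3.0 * h
    n_points = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n_points)]
    density = [kernel_density(x, data, h) for x in grid]
    modes = []
    for i in range(1, n_points - 1):
        # Strict rise on the left avoids counting a flat top twice.
        if density[i] > density[i - 1] and density[i] >= density[i + 1]:
            modes.append((grid[i], density[i]))
    return modes


# Hypothetical bimodal results, for illustration only.
data = [1.0, 1.1, 0.9, 1.0, 1.05, 3.0, 3.1, 2.9]
for location, height in find_modes(data, h=0.2, step=0.01):
    print(f"mode near {location:.2f} (density {height:.3f})")
```

A larger h smooths small bumps away and a smaller h creates spurious ones, so the number of modes reported depends strongly on the bandwidth chosen.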


A normal kernel
A kernel density
Another kernel density
Graphical representation of sample data
Kernel density of the aflatoxin data
Uncertainty of the mode
• The uncertainty of the consensus can be
estimated as the standard error of the
mode by applying the bootstrap to the
procedure.
• The bootstrap is a general procedure
based on resampling for estimating
standard errors of complex statistics.
• Reference: Bump-hunting for the proficiency tester – searching for multimodality. P J Lowthian and M Thompson, Analyst, 2002, 127, 1359-1364.
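
A minimal sketch of the bootstrap applied to the mode is given below. It resamples the results with replacement, recomputes the major kernel-density mode each time, and takes the standard deviation of those modes as the standard error. The helper names, the bandwidth, the number of resamples, the fixed random seed and the data are all assumptions made for illustration.

```python
import math
import random


def kde_major_mode(data, h, steps=400):
    """Location of the highest point of a normal kernel density, found on a grid."""
    lo, hi = min(data), max(data)
    best_x, best_y = lo, -1.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        # The constant 1/(n h sqrt(2 pi)) does not affect where the maximum lies.
        y = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
        if y > best_y:
            best_x, best_y = x, y
    return best_x


def bootstrap_standard_error(data, statistic, n_boot=500, seed=0):
    """Standard deviation of a statistic over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in range(n)]
        replicates.append(statistic(resample))
    mean = sum(replicates) / n_boot
    return math.sqrt(sum((r - mean) ** 2 for r in replicates) / (n_boot - 1))


# Hypothetical results with a low tail, for illustration only.
data = [4.8, 5.0, 5.1, 5.1, 5.2, 5.2, 5.3, 5.3, 5.4, 4.1, 4.3, 5.2]
h = 0.15
mode = kde_major_mode(data, h)
se = bootstrap_standard_error(data, lambda d: kde_major_mode(d, h), n_boot=300)
print(f"mode = {mode:.3f}, bootstrap standard error = {se:.3f}")
```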
The normal mixture model

$f(y) = \sum_{j=1}^{m} p_j f_j(y)$, $\sum_{j=1}^{m} p_j = 1$

$f_j(y) = \frac{\exp(-(y - \mu_j)^2 / 2\sigma^2)}{\sigma\sqrt{2\pi}}$

AMC Technical Brief No 23, and AMC Software.
Thompson, Acc Qual Assur, 2006, 10, 501-505.
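Mixture models found by the maximum likelihood method (the EM algorithm)
• The M-step

$\hat{p}_j = \sum_{i=1}^{n} \hat{P}(j \mid y_i) / n$

$\hat{\mu}_j = \sum_{i=1}^{n} y_i \hat{P}(j \mid y_i) \Big/ \sum_{i=1}^{n} \hat{P}(j \mid y_i)$

$\hat{\sigma}^2 = \sum_{j=1}^{m} \sum_{i=1}^{n} (y_i - \hat{\mu}_j)^2 \hat{P}(j \mid y_i) \Big/ \sum_{j=1}^{m} \sum_{i=1}^{n} \hat{P}(j \mid y_i)$

• The E-step

$\hat{P}(j \mid y_i) = \hat{p}_j f_j(y_i) \Big/ \sum_{j=1}^{m} \hat{p}_j f_j(y_i)$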
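
The E-step and M-step can be alternated as in the following sketch, which fits an m-component normal mixture with a single standard deviation shared by all components. The starting values (means at evenly spaced quantiles, equal proportions, overall standard deviation), the fixed number of iterations and the data are assumptions; this is not the AMC software.

```python
import math


def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def fit_mixture(y, m=2, n_iter=200):
    """EM fit of an m-component normal mixture with a common sigma (sketch)."""
    n = len(y)
    ordered = sorted(y)
    # Assumed starting values: means spread over the quantiles of the data.
    mu = [ordered[int((j + 0.5) * n / m)] for j in range(m)]
    p = [1.0 / m] * m
    grand_mean = sum(y) / n
    sigma = math.sqrt(sum((v - grand_mean) ** 2 for v in y) / n)

    for _ in range(n_iter):
        # E-step: posterior probability that result i belongs to component j.
        post = []
        for yi in y:
            weights = [p[j] * normal_pdf(yi, mu[j], sigma) for j in range(m)]
            total = sum(weights)
            post.append([w / total for w in weights])
        # M-step: update proportions, means, then the pooled variance.
        for j in range(m):
            mass = sum(post[i][j] for i in range(n))
            p[j] = mass / n
            mu[j] = sum(y[i] * post[i][j] for i in range(n)) / mass
        pooled = sum(post[i][j] * (y[i] - mu[j]) ** 2
                     for i in range(n) for j in range(m))
        sigma = math.sqrt(pooled / n)  # the posterior weights sum to n
    return p, mu, sigma


# Hypothetical results from two inconsistent methods, for illustration only.
data = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 3.0, 3.1, 2.9, 3.05, 2.95]
proportions, means, sd = fit_mixture(data, m=2)
print("proportions:", [round(v, 3) for v in proportions])
print("means:", [round(v, 3) for v in means])
print("common SD:", round(sd, 3))
```

The sketch has no safeguard against a component collapsing onto a single value or against numerical underflow with widely separated data, so a production version would need both.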
Kernel density and fit of 2-component normal mixture model
Kernel density and variance-inflated mixture model
Useful References
• Mixture models
M Thompson. Accred Qual Assur. 2006, 10, 501-505.
AMC Technical Brief No. 23, 2006. www.rsc.org/amc

• Kernel densities
B W Silverman, Density estimation for statistics and data analysis. Chapman and Hall, London, 1986.
AMC Technical Brief No. 4, 2001. www.rsc.org/amc

• The bootstrap
B Efron and R J Tibshirani, An introduction to the bootstrap. Chapman and Hall, London, 1993.
AMC Technical Brief No. 8, 2001. www.rsc.org/amc
Conclusions: scoring

• Use z-scores based on fitness for purpose.
• Estimate the consensus as the robust mean and its uncertainty as $\hat{\sigma}_{rob} / \sqrt{n}$ if the dataset is roughly symmetric.
• If the dataset is skewed and plausibly composite, use kernel density methods or mixture models.
Homogeneity testing
• Comminute and mix bulk material.
• Split into distribution units.
• Select m>10 distribution units at random.
• Homogenise each one.
• Analyse 2 test portions from each in
random order, with high precision, and
conduct one-way analysis of variance on
results.
Design for homogeneity testing

$s_{an} = \sqrt{MSW}$, $s_{sam} = \sqrt{(MSB - MSW)/2}$
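
For duplicate results on m distribution units, the two mean squares and the resulting standard deviations can be computed as sketched below. The function name, the choice to set a negative sampling variance to zero, and the data are assumptions made for illustration.

```python
import math


def homogeneity_anova(pairs):
    """One-way ANOVA for duplicate results; returns (MSW, MSB, s_an, s_sam)."""
    m = len(pairs)
    unit_means = [(a + b) / 2.0 for a, b in pairs]
    grand_mean = sum(unit_means) / m
    # Within-unit mean square from the duplicate differences (m degrees of freedom).
    msw = sum((a - b) ** 2 for a, b in pairs) / (2.0 * m)
    # Between-unit mean square with two results per unit (m - 1 degrees of freedom).
    msb = 2.0 * sum((u - grand_mean) ** 2 for u in unit_means) / (m - 1)
    s_an = math.sqrt(msw)
    # A negative estimate is treated as zero (an assumption of this sketch).
    s_sam = math.sqrt(max((msb - msw) / 2.0, 0.0))
    return msw, msb, s_an, s_sam


# Hypothetical duplicate results on four units; a real test would use more than 10.
pairs = [(10.0, 10.2), (10.4, 10.2), (9.8, 10.0), (10.1, 10.1)]
msw, msb, s_an, s_sam = homogeneity_anova(pairs)
print(f"MSW = {msw:.4f}, MSB = {msb:.4f}, s_an = {s_an:.4f}, s_sam = {s_sam:.4f}")
```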
Problems with simple ANOVA based on testing $H_0: \sigma_{sam} = 0$
• Analytical precision too low: method cannot detect consequential degree of heterogeneity.
• Analytical precision too high: method finds significant degree of heterogeneity that may not be consequential.

(Everything is heterogeneous!)
“Sufficient homogeneity”: original definition

• Material passes homogeneity test if

$s_{sam} < \sigma_L = 0.3\,\sigma_p$

• Problems are:
– $s_{sam}$ may not be well estimated;
– too big a probability of rejecting satisfactory test material.
Fearn test
• Test $H_0: \sigma^2_{sam} \le \sigma^2_L$ by rejecting when

$s^2_{sam} > \sigma^2_L \, \frac{\chi^2_{m-1}}{m-1} + s^2_{an} \, \frac{F_{m-1,m} - 1}{2}$

Ref: T Fearn and M Thompson, Analyst, 2001, 126, 1414-1417.
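
A sketch of the decision rule, assuming the upper 5% points of the chi-squared and F distributions are used. To stay within the standard library, the two factors F1 = χ²(m-1)/(m-1) and F2 = (F(m-1,m) - 1)/2 are passed in by the caller; the values 1.88 and 1.01 used in the example correspond to m = 10 at the 95% level. The function name, the variances and the value of σ_p are hypothetical.

```python
def fearn_test(s_an_sq, s_sam_sq, sigma_p, f1, f2, fraction=0.3):
    """Return (passes, critical_value) for the sufficient-homogeneity test.

    f1 = chi-squared quantile (m - 1 d.f.) divided by (m - 1)
    f2 = (F quantile with (m - 1, m) d.f. - 1) / 2
    sigma_L is taken as fraction * sigma_p (0.3 * sigma_p by default).
    """
    sigma_l_sq = (fraction * sigma_p) ** 2
    critical_value = f1 * sigma_l_sq + f2 * s_an_sq
    # The material fails only if the sampling variance exceeds the critical value.
    return s_sam_sq <= critical_value, critical_value


# Hypothetical variances from a homogeneity experiment with m = 10 units.
passes, critical = fearn_test(s_an_sq=0.015, s_sam_sq=0.019,
                              sigma_p=0.5, f1=1.88, f2=1.01)
print(f"critical value = {critical:.5f}, sufficiently homogeneous: {passes}")
```

Unlike the original criterion, the critical value grows with the analytical variance, so an imprecise analytical method no longer causes satisfactory material to be rejected so readily.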


Problems with homogeneity data
• Problems with data are common: e.g., no proper randomisation, insufficient precision, biases, trends, steps, insufficient significant figures recorded, outliers.
• Laboratories need detailed instructions.
• Data need careful scrutiny before statistics.
• HP1 is incorrect in saying that all outlying data should be retained.
General references
• The Harmonised Protocol (revised)
M Thompson, S L R Ellison and R Wood
Pure Appl. Chem., 2006, 78, 145-196.
• R E Lawn, M Thompson and R F Walker,
Proficiency testing in analytical chemistry. The
Royal Society of Chemistry, Cambridge, 1997.
• ISO Guide 43. International Standards
Organisation, Geneva, 1997.
