Stats and Data Analysis in Proficiency Testing
Michael Thompson
School of Biological and Chemical Sciences
Birkbeck College (University of London)
Malet Street
London WC1E 7HX
[email protected]
Organisation of a proficiency test
$$ z = \frac{x - \bar{x}}{s} $$
$$ \bar{x} = 2.126, \qquad s = 0.077 $$
• On average, slightly more than 95% of laboratories
receive a z-score within the range ±2.
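As a minimal sketch of the scoring formula above, the following Python function computes a z-score from a participant result, using the mean (2.126) and standard deviation (0.077) quoted on the slide. The function names and the example participant result are illustrative assumptions, not part of the original material.

```python
def z_score(x, assigned_value, sd):
    """Return z = (x - assigned_value) / sd for one participant result."""
    return (x - assigned_value) / sd


def within_two(z):
    """True when the score lies inside the +/-2 range mentioned on the slide."""
    return abs(z) <= 2.0


# Mean and standard deviation quoted on the slide.
X_BAR = 2.126
S = 0.077

# Hypothetical participant result, for illustration only.
example_z = z_score(2.28, X_BAR, S)
```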
Robust mean and standard deviation: $\hat{\mu}_{rob}$, $\hat{\sigma}_{rob}$
• Robust statistics is applicable to datasets that
look like normally distributed samples
contaminated with outliers and stragglers (i.e.,
unimodal and roughly symmetric).
• The method downweights the otherwise large
influence of outliers and stragglers on the
estimates.
• It models the central ‘reliable’ part of the dataset.
Can I use robust estimates?
[Figure: skewed, bimodal and heavy-tailed example distributions plotted along the measurement axis]
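Huber’s H15
$$ x^T = [x_1, x_2, \ldots, x_n] $$
Set $1 \le k \le 2$, $p = 0$, $\hat{\mu}_0 = \mathrm{median}$, $\hat{\sigma}_0 = 1.5 \times \mathrm{MAD}$
$$ \tilde{x}_i = \begin{cases} \hat{\mu}_p - k\hat{\sigma}_p & \text{if } x_i < \hat{\mu}_p - k\hat{\sigma}_p \\ x_i & \text{if } \hat{\mu}_p - k\hat{\sigma}_p \le x_i \le \hat{\mu}_p + k\hat{\sigma}_p \\ \hat{\mu}_p + k\hat{\sigma}_p & \text{if } x_i > \hat{\mu}_p + k\hat{\sigma}_p \end{cases} $$
$$ \hat{\mu}_{p+1} = \mathrm{mean}(\tilde{x}_i) $$
$$ \hat{\sigma}^2_{p+1} = f(k)\,\mathrm{var}(\tilde{x}_i) $$
If not converged, $p = p + 1$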
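The iteration above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the author's implementation: the default k = 1.5, the convergence tolerance and the iteration cap are choices made here, and the correction factor f(k) is assumed to be 1/β, where β is the variance of a standard normal variable winsorised at ±k (about 0.778 for k = 1.5).

```python
import math
import statistics


def huber_h15(x, k=1.5, tol=1e-6, max_iter=200):
    """Iteratively winsorised robust mean and standard deviation."""
    # Starting values from the slide: median and 1.5 x MAD.
    mu = statistics.median(x)
    mad = statistics.median([abs(v - mu) for v in x])
    sigma = 1.5 * mad
    if sigma == 0:
        return mu, 0.0

    # Assumption: f(k) = 1 / beta, with beta the variance of a standard
    # normal variable winsorised at +/-k.
    phi_k = math.exp(-k * k / 2) / math.sqrt(2 * math.pi)
    theta = math.erf(k / math.sqrt(2))  # P(|Z| < k)
    beta = theta + k * k * (1 - theta) - 2 * k * phi_k

    for _ in range(max_iter):
        lo, hi = mu - k * sigma, mu + k * sigma
        # Pull values outside [lo, hi] back to the nearer limit.
        winsorised = [min(max(v, lo), hi) for v in x]
        new_mu = statistics.fmean(winsorised)
        new_sigma = math.sqrt(statistics.variance(winsorised) / beta)
        converged = (abs(new_mu - mu) <= tol * new_sigma
                     and abs(new_sigma - sigma) <= tol * new_sigma)
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma
```

Applied to a dataset with one gross outlier, the robust standard deviation stays close to the spread of the central results, whereas the classical standard deviation is inflated by the outlier.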
References: robust statistics
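$$ z = \frac{x - \hat{\mu}_{rob}}{\hat{\sigma}_{rob}} $$
$$ \hat{\mu}_{rob} = 2.128, \qquad \hat{\sigma}_{rob} = 0.048 $$
$$ u(x_a) = \frac{\hat{\sigma}_{rob}}{\sqrt{n}} $$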
When can I use robust estimates?
[Figure: skewed, bimodal and heavy-tailed example distributions plotted along the measurement axis]
Skewed distributions
• Skews can arise when the participants’
results come from two or more
inconsistent methods.
• They can also arise as an artefact at low
concentrations of analyte as a result of
data recording practice.
• Rarely, skews can arise when the
distribution is truly lognormal.
Possible use of a trimmed data set?
Can I use the mode?
How many modes? Where are they?
The normal kernel density for
identifying a mode
$$ y = \frac{1}{nh} \sum_{i=1}^{n} \Phi\left(\frac{x - x_i}{h}\right) $$
where Φ is the standard normal density,
$$ \Phi(a) = \frac{\exp(-a^2/2)}{\sqrt{2\pi}} $$
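A short Python sketch of the normal kernel density, with a simple grid search for modes, is given here for illustration. The grid search, the grid size and the 3h padding beyond the data range are assumptions made for this sketch; the choice of the smoothing width h is left to the user.

```python
import math


def normal_kernel_density(x, data, h):
    """Kernel density at x: (1 / (n h)) * sum of Phi((x - x_i) / h)."""
    n = len(data)
    total = sum(math.exp(-(((x - xi) / h) ** 2) / 2) for xi in data)
    return total / (n * h * math.sqrt(2 * math.pi))


def find_modes(data, h, grid_size=512):
    """Return grid points where the kernel density has a local maximum."""
    lo, hi = min(data) - 3 * h, max(data) + 3 * h
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    dens = [normal_kernel_density(g, data, h) for g in grid]
    return [grid[i] for i in range(1, grid_size - 1)
            if dens[i - 1] < dens[i] >= dens[i + 1]]
```

The number and position of the modes found depend on h: a larger h smooths neighbouring modes together, a smaller h produces spurious ones.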
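$$ f_j(y) = \frac{\exp\left(-(y - \mu_j)^2 / 2\sigma^2\right)}{\sigma\sqrt{2\pi}} $$
• The M-step
$$ \hat{p}_j = \frac{1}{n} \sum_{i=1}^{n} \hat{P}(j \mid y_i) $$
$$ \hat{\mu}_j = \frac{\sum_{i=1}^{n} y_i \hat{P}(j \mid y_i)}{\sum_{i=1}^{n} \hat{P}(j \mid y_i)} $$
$$ \hat{\sigma}^2 = \frac{1}{n} \sum_{j=1}^{m} \sum_{i=1}^{n} (y_i - \hat{\mu}_j)^2 \hat{P}(j \mid y_i) $$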
• The E-step
$$ \hat{P}(j \mid y_i) = \frac{\hat{p}_j f_j(y_i)}{\sum_{j=1}^{m} \hat{p}_j f_j(y_i)} $$
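The E-step can be combined with the usual maximum-likelihood updates into an EM loop. The Python sketch below assumes m normal components sharing a common standard deviation, equal starting proportions, user-supplied starting means and a fixed number of iterations; all of these are illustrative choices rather than details taken from the slides.

```python
import math


def normal_pdf(y, mu, sigma):
    """Normal density f_j(y) with mean mu and standard deviation sigma."""
    return (math.exp(-((y - mu) ** 2) / (2 * sigma ** 2))
            / (sigma * math.sqrt(2 * math.pi)))


def fit_mixture(y, mu_init, n_iter=200):
    """EM fit of a normal mixture with a common standard deviation."""
    n, m = len(y), len(mu_init)
    mu = list(mu_init)
    p = [1.0 / m] * m  # assumed equal starting proportions
    mean_y = sum(y) / n
    sigma = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)

    for _ in range(n_iter):
        # E-step: posterior probability that y_i belongs to component j.
        post = []
        for v in y:
            w = [p[j] * normal_pdf(v, mu[j], sigma) for j in range(m)]
            total = sum(w)
            post.append([wj / total for wj in w])
        # M-step: update proportions, means and the common variance.
        for j in range(m):
            nj = sum(row[j] for row in post)
            p[j] = nj / n
            mu[j] = sum(row[j] * v for row, v in zip(post, y)) / nj
        sigma = math.sqrt(sum(post[i][j] * (y[i] - mu[j]) ** 2
                              for i in range(n) for j in range(m)) / n)
    return p, mu, sigma
```

The sketch has no safeguards against an empty component or numerical underflow for very distant points; starting means taken from the modes of a kernel density are one reasonable choice.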
Kernel density and fit of 2-component
normal mixture model
Kernel density and variance-inflated
mixture model
Useful References
• Mixture models
M Thompson. Accred Qual Assur. 2006, 10, 501-505.
AMC Technical Brief No. 23, 2006. www.rsc.org/amc
• Kernel densities
B W Silverman, Density estimation for statistics and data
analysis. Chapman and Hall, London, 1986.
AMC Technical Brief No. 4, 2001. www.rsc.org/amc
• The bootstrap
B Efron and R J Tibshirani, An introduction to the
bootstrap. Chapman and Hall, London, 1993.
AMC Technical Brief No. 8, 2001. www.rsc.org/amc
Conclusions—scoring
$$ s_{an} = \sqrt{MSW}, \qquad s_{sam} = \sqrt{\frac{MSB - MSW}{2}} $$
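As a hedged illustration of these estimates, the Python sketch below computes MSB, MSW, the analytical standard deviation and the sampling standard deviation from a one-way ANOVA. It assumes each of the m samples was analysed in duplicate (which is what the divisor 2 implies) and sets a negative variance estimate to zero; the function name and the example data in the test are hypothetical.

```python
import math


def duplicate_anova(duplicates):
    """One-way ANOVA for m samples, each analysed in duplicate.

    duplicates: list of (a, b) result pairs, one pair per sample.
    Returns (MSB, MSW, s_an, s_sam).
    """
    m = len(duplicates)
    means = [(a + b) / 2 for a, b in duplicates]
    grand = sum(means) / m
    # Between-sample mean square (2 results per sample, m - 1 d.f.).
    msb = 2 * sum((mean - grand) ** 2 for mean in means) / (m - 1)
    # Within-sample mean square from the duplicate differences (m d.f.).
    msw = sum((a - b) ** 2 for a, b in duplicates) / (2 * m)
    s_an = math.sqrt(msw)
    # Assumption: a negative estimate of the sampling variance is set to zero.
    s_sam = math.sqrt(max(msb - msw, 0.0) / 2)
    return msb, msw, s_an, s_sam
```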
Problems with simple ANOVA based on testing $H_0: \sigma_{sam} = 0$
• Analytical precision too low: the method
cannot detect a consequential degree of
heterogeneity.
(Everything is heterogeneous!)
“Sufficient homogeneity”: original definition
$$ s_{sam}^2 \le L^2 \, \frac{\chi^2_{m-1}}{m-1} + s_{an}^2 \, \frac{F_{m-1,m} - 1}{2} $$