Peirce Sub
Peirce Sub
(1)
where $\bar{x}$ is the sample mean. R depends on k, the number of outliers proposed for rejection,
and m, the number of unknown quantities. The meaning of the latter may be destined to
remain obscure; however, it appears to be something akin to degrees of freedom, i.e., the
number of independent processes that give rise to outliers in the data. Gould (1855)
acknowledges that cases of m > 2 are of little practical significance.
An illustration of the range of values of R for k = 0 to 100% and m = 0 to 100% of N is shown
in Figure 3, with details in PeirceLimits.R. For m = 1, R drops
below 0 at the point where k > 90%, i.e., it is meaningless to try to reject more than 90%
of a given dataset. Additionally, for low values of k, increasing m reduces R.
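To make the dependence of R on N, k and m concrete, the following is a minimal sketch in R of the fixed-point iteration commonly attributed to Gould (1855). It is not the code in PeirceLimits.R: the function name peirce_ratio, the tolerance and the iteration cap are illustrative assumptions, as is the form taken for Gould's Q^N (flagged in a comment).

```r
# Sketch of Gould's iteration for Peirce's limiting ratio (hypothetical helper).
# N: number of observations, k: proposed rejections, m: unknown quantities.
# Returns the limiting ratio (called R in the text above), or 0 when the
# squared ratio would be negative.
peirce_ratio <- function(N, k, m = 1, tol = 1e-12, max_iter = 1000) {
  stopifnot(N > 1, k >= 1, k < N)
  # Assumed form: Q^N = k^k (N - k)^(N - k) / N^N, kept on the log scale
  log_qn <- k * log(k) + (N - k) * log(N - k) - N * log(N)
  r_new <- 1  # Gould's intermediate quantity, not the ratio returned below
  x2 <- 0     # squared limiting ratio
  for (i in seq_len(max_iter)) {
    r_old <- r_new
    lambda <- exp((log_qn - k * log(r_old)) / (N - k))
    x2 <- 1 + (N - m - k) / k * (1 - lambda^2)
    if (is.na(x2) || x2 < 0) return(0)
    # exp((x^2 - 1) / 2) times twice the standard normal upper tail at x
    r_new <- exp((x2 - 1) / 2) * 2 * pnorm(sqrt(x2), lower.tail = FALSE)
    if (abs(r_new - r_old) < tol) break
  }
  sqrt(x2)
}

peirce_ratio(N = 10, k = 1, m = 1)
```

Under these assumptions, N = 10 with k = 1 and m = 1 gives a ratio of roughly 1.88, and the ratio falls as k increases for fixed N.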
As an aside, I sought to generate the values in Table III in Gould (1855), giving values of
N log Q for N observations and k proposed rejections. I followed his equation (B.), which is:
$$ Q^{k} = \frac{k^{k}\,(N-k)^{N-k}}{N^{k}} \tag{2} $$
whence
$$ N \log_{10} Q = N \log_{10} \sqrt[k]{\frac{k^{k}\,(N-k)^{N-k}}{N^{k}}} \tag{3} $$
However, in faithfully replicating the table, I found that additional adjustments were necessary,
such that
$$ N \log_{10} Q \rightarrow 10 + N \log_{10} Q \tag{4} $$
Figure 4: Values of N log Q for N and k, per Gould (1855).
and
$$ N \log_{10} Q < 0.05 \;\Rightarrow\; N \log_{10} Q \rightarrow 100 + N \log_{10} Q \tag{5} $$
The reason for this is not entirely self-evident. An illustration of the function is shown in
Figure 4, which may be manipulated, if desired, with NlogQLimits.R.
4. Original paper
Peirce himself appears to have been aware of the difficulties in interpreting his original paper:
I perceive that the theory of my criterion has been frequently misunderstood.
I presume this to be due in a great degree to the conciseness of the argument with
which it was published.
Peirce (1877)
However, this did not prevent the methods becoming widely adopted in his own time, largely
due to the clarity of Gould's implementation. I had some difficulty in replicating all of the
results from the original paper (Peirce 1852). However, a number of formulas are of interest.
Figure 5: Probability of an error occurring, varying by mean error of system.
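(These functions, with corresponding plots, are included in Pierce1852.R.) His expression for
the probability of a certain error $\Delta$ in a system with mean error $\varepsilon$ is given by:
$$ \varphi(\Delta) = \frac{1}{\varepsilon\sqrt{2\pi}}\, e^{-\frac{\Delta^{2}}{2\varepsilon^{2}}} \tag{6} $$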
This is illustrated for a number of sample ranges of interest in Figure 5.
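As a quick check on this expression, the sketch below reads equation (6) as a zero-mean normal density in which the mean error plays the role of the standard deviation. That reading, and the names peirce_phi, delta and eps, are my assumptions for illustration and are not taken from Pierce1852.R.

```r
# Assumed reading of equation (6): a zero-mean normal density in the error
# `delta`, with the mean error `eps` acting as the standard deviation.
peirce_phi <- function(delta, eps) {
  exp(-delta^2 / (2 * eps^2)) / (eps * sqrt(2 * pi))
}

# Compare against base R's normal density on a small grid of errors
delta <- seq(-3, 3, by = 0.5)
all.equal(peirce_phi(delta, eps = 1.5), dnorm(delta, mean = 0, sd = 1.5))
```

Under this reading the function is a density, so its value can exceed 1 when eps is small, e.g. peirce_phi(0, eps = 0.1).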
His next formula gives the probability of an error in a system exceeding the required
limit, x:
$$ \psi(x) = \sqrt{\frac{2}{\pi}} \int_{x}^{\infty} e^{-\frac{1}{2}t^{2}}\,dt \tag{7} $$
The reader may recognize $\psi$ as closely related to the complementary error function, erfc,
which is already implemented in R as NORMT3::erfc:
$$ \operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_{x}^{\infty} e^{-t^{2}}\,dt \tag{8} $$
A comparison of both is shown in Figure 6.
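The relationship can be checked numerically. The sketch below assumes Peirce's function is the integral of exp(-t^2 / 2) from x to infinity, scaled by sqrt(2 / pi), in which case it equals erfc(x / sqrt(2)), i.e., twice the standard normal upper tail at x. The helper names are hypothetical, and erfc is written in terms of base R's pnorm so that NORMT3 is not required.

```r
# Peirce's function by direct numerical integration (assumed form, see above)
psi_numeric <- function(x) {
  sqrt(2 / pi) *
    integrate(function(t) exp(-t^2 / 2), lower = x, upper = Inf)$value
}

# Complementary error function expressed through the normal upper tail
erfc_base <- function(x) 2 * pnorm(x * sqrt(2), lower.tail = FALSE)

# Closed form under the same assumption: psi(x) = erfc(x / sqrt(2))
psi_closed <- function(x) erfc_base(x / sqrt(2))

x <- c(0, 0.5, 1, 2, 3)
# Largest absolute difference between the two versions; should be near zero
max(abs(sapply(x, psi_numeric) - psi_closed(x)))
```

The two versions should agree to within the tolerance of integrate(), and psi_closed(0) is 1, as expected for a probability that an error exceeds a limit of zero in absolute value.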
His next equation, for the probability of k observations exceeding the required limit x, I took
to be:
$$ P = \varphi(x)\,\psi(x)^{k} \tag{9} $$
Figure 6: Comparison of Peirce's $\psi$ with erfc.
$\psi$ shows probability varying by the ratio of the limit of acceptable error to the mean error.
whereby he derives:
$$ P = \frac{1}{\varepsilon^{N-k}\,(2\pi)^{\frac{N-k}{2}}}\; e^{\frac{-N+m+kx^{2}}{2}}\; \psi(x)^{k} \tag{10} $$
However, substituting arbitrary values into both expressions leads to values of P > 1. Again, the reason for
this is not entirely self-evident to me.
5. Conclusion
Peirce deserves credit as the first to suggest a method of excluding outliers. Given the
prevalence of normal distributions in routine observations, a revival of his methods may be timely.
I hope these illustrations will clarify the range over which his methods may be applied.
6. Acknowledgements
Thanks to Knud Thomsen for sharing the method in C, and to Kevin Mullin for converting the method to R.
References
Gould BA (1855). On Peirce's Criterion for the Rejection of Doubtful Observations, with
Tables for Facilitating its Application. Astronomical Journal, 4(83), 81–87. doi:10.1086/100480.
URL http://adsabs.harvard.edu/abs/1855AJ......4...81G.
Natrella M (2012). NIST/SEMATECH e-Handbook of Statistical Methods.
URL http://www.itl.nist.gov/div898/handbook/index.htm.
Peirce B (1852). Criterion for the Rejection of Doubtful Observations. The Astronomical
Journal, 2(45), 161–163.
URL http://articles.adsabs.harvard.edu/cgi-bin/nph-iarticle_query?1852AJ......2..161P;data_type=PDF_HIGH
http://adsabs.harvard.edu/full/1852AJ......2..161P.
Peirce B (1877). On Peirce's Criterion. Proceedings of the American Academy of Arts and
Sciences, 13, 348–351. URL http://www.jstor.org/stable/25138498.
Ross S (2003). Peirce's Criterion for the Elimination of Suspect Experimental Data. Journal
of Engineering Technology, 20(2), 1–12.
URL http://classes.engineering.wustl.edu/2009/fall/che473/handouts/OutlierRejection.pdf.
Team RDC (2012). R: A Language and Environment for Statistical Computing. R Foundation
for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0,
URL http://www.R-project.org/.
Affiliation:
Christopher Dardis
Department of Neurology
Barrow Neurological Institute
350 W. Thomas Road
Phoenix, AZ 85013
E-mail: [email protected]
URL: https://christopherdardis.wordpress.com/
Journal of Statistical Software http://www.jstatsoft.org/
published by the American Statistical Association http://www.amstat.org/
Volume VV, Issue II Submitted: yyyy-mm-dd
MMMMMM YYYY Accepted: yyyy-mm-dd