
JSS

Journal of Statistical Software


MMMMMM YYYY, Volume VV, Issue II. http://www.jstatsoft.org/
Peirce's Criterion for the Rejection of Non-Normal
Outliers; Defining the Range of Applicability
Christopher Dardis
Barrow Neurological Institute, Phoenix, Arizona
Abstract
Peirce's criterion for the rejection of non-normal outliers has been with us for over 150
years. Here, I present an implementation of the method in R. A number of examples are
presented and I discuss its range of applicability. Finally, I give illustrations from the
early literature on the method.
Keywords: Peirce, outlier, R.
1. Introduction
Peirce's criterion for the rejection of non-normal outliers has been with us for over 150 years. It
was, in fact, the first criterion developed for the exclusion of outliers. I became interested in
his methods during the course of some lab research, where it became clear that the techniques
we were using were producing occasional grossly erroneous results.
I was persuaded of the merits of the technique by a paper from Ross (2003), which gave a
simple but practical approach to applying the method. However, given the volume of data
our lab was generating, I sought to automate the method as a function in R (R Development Core Team 2012).
I was also interested to see whether the technique generalized to a broader range than that given in
the above paper, which is limited to rejecting up to 9 outliers from 60 observations.
The original paper (Peirce 1852) describes a technique for rejecting doubtful observations in
the case of those arising from observations of planetary motion. It is assumed that unusual departures
from normality (a Gaussian distribution) of observations are the result of the observer
rather than of the planets themselves. I believe similar assumptions are worthwhile in generalizing
the findings of the natural sciences.
In brief, his technique was to generate the probabilities of the errors occurring in the system where all
N observations are retained versus that where k are rejected. He then rejected the k observations if
the new system (i.e., with k rejected) was closer to normal than the old.
2. Practical application
To begin with, here is an illustration of the merit of his technique, with comparisons
against a number of standard existing techniques. While multiple methods are already im-
plemented in the outliers package in R, the following are limited to removing only one value at a time:
chisq.out.test, dixon.test, grubbs.test. There is limited literature as to how legiti-
mate it is to repeat them multiple times on a dataset.
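For orientation, a minimal sketch of how these single-value tests are called; the data vector below is arbitrary illustrative data, not one of the sets analysed in this paper, and the calls assume the outliers package is installed:

## Single-outlier tests from the 'outliers' package; each call tests for
## (at most) one outlying value. The vector below is arbitrary illustrative
## data, not one of the datasets analysed in this paper.
library(outliers)

x <- c(101.2, 90.0, 99.0, 102.0, 103.0, 100.2, 89.0, 98.1, 101.5, 250.0)

grubbs.test(x)      # Grubbs' test
dixon.test(x)       # Dixon's Q test (intended for small samples)
chisq.out.test(x)   # chi-squared test based on the sample variance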
A leading alternative for rejecting non-normal values is Chauvenet's criterion (Chauvenet.R).
There is a lack of consensus as to whether it is legitimate to apply the function repeatedly,
so I have provided the option loop = TRUE to do so. Repeated application tends to further
shrink the set, a disadvantage Peirce's criterion does not suffer. I took four sample
sets: that from Ross (2003), one from the National Institute of Standards and Technology
(Natrella 2012), and two that are already available with R. The latter two are cautionary tales:
TeachingDemos in regard to repeated application of Chauvenet's criterion, and compositions sa.outliers for
the perils of a set with complete separation. These sets are shown in Figure 1 and the results
in Table 1. Full details are in PeirceVsChauvenet.R.
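The Chauvenet.R used for Table 1 is not reproduced here; the following is my own minimal sketch of Chauvenet's criterion with a loop argument of the kind described above (the function name and interface are assumptions for illustration, not the code used in the comparison):

## Minimal sketch of Chauvenet's criterion: reject x[i] when the expected
## number of observations at least as extreme, n * P(|Z| >= z_i), falls
## below 0.5. With loop = TRUE the test is repeated until nothing further
## is removed; with loop = FALSE a single pass is made.
chauvenet_sketch <- function(x, loop = FALSE) {
  repeat {
    if (length(x) < 3) break                 # too few points to test
    z <- abs(x - mean(x)) / sd(x)
    expected <- length(x) * 2 * pnorm(z, lower.tail = FALSE)
    keep <- expected >= 0.5
    x <- x[keep]
    if (all(keep) || !loop) break
  }
  x
}

## Single pass vs. repeated application on arbitrary illustrative data:
y <- c(rnorm(30), 8, -9)
length(chauvenet_sketch(y))               # one pass
length(chauvenet_sketch(y, loop = TRUE))  # repeated until stable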
Another approach to rejecting outliers has been proposed by bbalibor™. Their goal is to
determine an average value for a series of submissions (referred to as Libor, basically a nominal
rate of interest). They suggest that a reasonable approach in the case of 16 submissions is
to eliminate the upper and lower 4, leaving 8, and then to take the mean of these 8 (method
explained here). I compared this to using Peirce's criterion on each set of 16 observations,
then averaging the retained values; see Figure 2. (This is based on part of a complete set available from Google
Docs.) Although there is no reason a priori to assume that the submissions on which Libor is
based should follow a normal distribution, it can clearly be seen that Peirce's criterion gives a
result almost identical to its rival, and excludes far fewer observations per application (the range
is 1-4 in this example, vs. 8 each time). This is akin to saying that excluding larger numbers
of outliers tends to have little effect on the mean value for this type of data. Alternatively,
one may say that the bbalibor™ method is to be preferred owing to its simplicity.
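For concreteness, the bbalibor rule for a 16-member panel amounts to a 25% trimmed mean; a minimal sketch, using an arbitrary illustrative panel rather than actual Libor submissions:

## bbalibor-style average: drop the 4 highest and 4 lowest of 16
## submissions and average the remaining 8. The values are arbitrary.
panel <- c(0.52, 0.53, 0.55, 0.51, 0.54, 0.56, 0.50, 0.58,
           0.53, 0.52, 0.57, 0.54, 0.55, 0.49, 0.60, 0.53)

mean(sort(panel)[5:12])     # mean of the middle 8
mean(panel, trim = 4 / 16)  # equivalently, a 25% trimmed mean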
Dataset                    No. observations   Peirce   Chauvenet   Chauvenet (repeated)
Ross                       10                 2        2           8
NIST                       90                 11       3           13
TeachingDemos              100                7        6           17
compositions sa.outliers   300                71       0           0

Table 1: Performance of outlier detection methods (number of observations removed by each method).
3. Methods
I sought to duplicate the table and results from Ross (2003). I was able to do so by following
the methods in Gould (1855); see PeirceGould.R.

Figure 1: Illustration of datasets used to test outlier methods.

Figure 2: Libor calculated traditionally and using Peirce's criterion.

Figure 3: Values of R for percentage values of k and m; sample size N = 1000.

However, the upper limit for N, the number of observations in the sample, is constrained by R's
representation of large numbers: the term N^N in Gould's calculation (equation 2 below) exceeds
.Machine$double.xmax (approximately 1.8e308) once N > 143, so the limit is N = 143 on my device.
A more efficient technique for achieving the same result already exists in C, and I implemented this as Peirce.R. Both
methods rely on generating R, the ratio of the absolute error of one measurement to the
sample standard deviation \sigma:

R = \frac{|x_i - \bar{x}|}{\sigma}    (1)
where \bar{x} is the sample mean. R depends on k, the number of outliers proposed to be rejected,
and m, the number of unknown quantities. The meaning of this latter may be destined to
remain obscure; however, it appears to be something akin to degrees of freedom, i.e., the
number of independent processes that are giving rise to outliers in the data. Gould (1855)
acknowledges that the cases of m > 2 are of little practical significance.
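To make the application concrete, the following is a minimal sketch of how a value of R is used to reject observations, following the iterative procedure described by Ross (2003). The function peirce_R() is a hypothetical placeholder for the table lookup (in practice it would be supplied by Peirce.R or PeirceGould.R), not a function defined in this paper:

## Minimal sketch of applying Peirce's criterion given a lookup function
## peirce_R(N, k, m) that returns the tabulated ratio R (hypothetical
## placeholder). Following Ross (2003), the mean and standard deviation of
## the full sample are held fixed, and k is increased until no further
## observations are rejected.
peirce_reject <- function(x, peirce_R, m = 1) {
  xbar <- mean(x)
  s    <- sd(x)
  dev  <- abs(x - xbar)
  k <- 1
  n_rejected <- 0
  repeat {
    limit <- peirce_R(length(x), k, m) * s   # maximum allowable deviation
    now_rejected <- sum(dev > limit)
    if (now_rejected <= n_rejected) break    # no additional rejections
    n_rejected <- now_rejected
    k <- n_rejected + 1                      # propose one more rejection
  }
  x[dev <= limit]
}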
An illustration of the range of values of R for k = 0-100% and m = 0-100% of N is shown
in Figure 3, with details in PeirceLimits.R. For values of m = 1 we can see R dropping
below 0 at the point where k > 90%, i.e., it is meaningless to try to reject more than 90%
of a given dataset. Additionally, for low values of k, increasing m does reduce R.
As an aside, I sought to generate the values in Table III of Gould (1855), giving values of
N log Q for N observations and k proposed rejections. I followed his equation (B), which is:

Q^N = \frac{k^k \, (N-k)^{N-k}}{N^N}    (2)

whence

N \log_{10} Q = \log_{10} \frac{k^k \, (N-k)^{N-k}}{N^N} = k \log_{10} k + (N-k) \log_{10}(N-k) - N \log_{10} N    (3)
However, in faithfully replicating the table, I found additional adjustments were necessary,
such that

N \log_{10} Q \;\rightarrow\; 10 + N \log_{10} Q    (4)

and

N \log_{10} Q < 0.05 \;\Rightarrow\; N \log_{10} Q \;\rightarrow\; 100 + N \log_{10} Q    (5)

Figure 4: Values of N log Q for N and k, per the Gould paper.

The reason for this is not entirely self-evident. An illustration of the function is shown in
Figure 4, which may be manipulated, if desired, with NlogQLimits.R.
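For reference, a minimal sketch of computing N log10 Q entirely on the log scale (so that N^N is never formed and N is not limited to 143), with the two adjustments of equations (4) and (5) applied in the order they appear above; that ordering, and the function name, are my own assumptions:

## N * log10(Q) computed on the log scale, so that N^N, k^k and
## (N - k)^(N - k) are never formed explicitly. The two adjustments
## mirror equations (4) and (5), applied in that order (an assumption).
NlogQ <- function(N, k) {
  stopifnot(k > 0, k < N)
  val <- k * log10(k) + (N - k) * log10(N - k) - N * log10(N)
  val <- 10 + val                      # adjustment (4)
  if (val < 0.05) val <- 100 + val     # adjustment (5)
  val
}

NlogQ(60, 9)   # comparable to the range covered by Ross (2003)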
4. Original paper
Peirce himself appears to have been aware of the difficulties in interpreting his original paper.

"I perceive that the theory of my criterion has been frequently misunderstood.
I presume this to be due in a great degree to the conciseness of the argument with
which it was published."

Peirce (1877)
However, this did not prevent the method becoming widely adopted in his own time, largely
due to the clarity of Gould's implementation. I had some difficulty in replicating all of the
results from the original paper (Peirce 1852). However, a number of formulas are of interest.
Figure 5: Probability of an error occurring, varying by mean error of system.
(These functions, with corresponding plots, are included in Pierce1852.R.) His expression for
the probability of a given error \varepsilon occurring in a system with mean error \sigma is given by:

\varphi(\varepsilon) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{\varepsilon^{2}}{2\sigma^{2}}}    (6)
This is illustrated for a number of sample ranges of interest in Figure 5.
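Equation (6), as reconstructed above, is the normal density with standard deviation σ, so a plot of the kind shown in Figure 5 can be sketched with dnorm(); the values of σ below are arbitrary illustrative choices:

## phi(epsilon) from equation (6) for a few arbitrary values of the mean
## error sigma; equivalent to the normal density dnorm(epsilon, 0, sigma).
curve(dnorm(x, sd = 1), from = -5, to = 5,
      xlab = expression(epsilon), ylab = expression(phi(epsilon)))
curve(dnorm(x, sd = 2),   add = TRUE, lty = 2)
curve(dnorm(x, sd = 0.5), add = TRUE, lty = 3)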
His next formula gives the probability of an error in the system which exceeds the required
limit, x:

\psi(x) = \sqrt{\frac{2}{\pi}} \int_x^{\infty} e^{-\frac{1}{2} t^{2}} \, dt    (7)
The reader may recognize \psi as closely related to the complementary error function, erfc,
which is already implemented in R as NORMT3::erfc:

\mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^{2}} \, dt    (8)
A comparison of both is shown in Figure 6.
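Such a comparison can be sketched in base R using the identity ψ(x) = 2 Φ(−x) = erfc(x/√2), so that NORMT3 is not strictly required; this is an illustrative sketch, not the code behind Figure 6:

## Peirce's psi(x) from equation (7) compared with the complementary error
## function; both are expressed via pnorm() so only base R is needed.
psi       <- function(x) 2 * pnorm(x, lower.tail = FALSE)           # = erfc(x / sqrt(2))
erfc_base <- function(x) 2 * pnorm(x * sqrt(2), lower.tail = FALSE) # = erfc(x)

curve(psi, from = 0, to = 4, xlab = "x", ylab = "probability")
curve(erfc_base, add = TRUE, lty = 2)
legend("topright", legend = c("psi(x)", "erfc(x)"), lty = 1:2)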
His next equation, for the probability of k observations exceeding the required limit x, I took
to be:

P = \left[ \frac{\psi(x)}{\varphi(x)} \right]^{k}    (9)
Figure 6: Comparison of Peirce's \psi with erfc; shows probability varying by the ratio of the limit of acceptable error to the mean error.
whereby he derives:

P = \frac{1}{(2\pi)^{\frac{N-k}{2}}} \, \binom{N}{k} \, e^{-\frac{N + m + k x^{2}}{2}} \, \psi(x)^{k}    (10)
However, substituting arbitrary values into both leads to values greater than 1. Again, the reason for
this is not entirely self-evident to me.
5. Conclusion
Peirce deserves credit as the first to suggest a method of excluding outliers. Given the preva-
lence of normal distributions in routine observations, a revival of his methods may be timely.
I hope these illustrations will clarify the range over which his methods may be applied.
6. Acknowledgements
I thank Knud Thomsen for sharing the method in C and Kevin Mullin for converting the method to R.
References
Gould BA (1855). "On Peirce's Criterion for the Rejection of Doubtful Observations, with
Tables for Facilitating its Application." Astronomical Journal, 4(83), 81-87. doi:10.1086/100480.
URL http://adsabs.harvard.edu/abs/1855AJ......4...81G.

Natrella M (2012). NIST/SEMATECH e-Handbook of Statistical Methods. URL
http://www.itl.nist.gov/div898/handbook/index.htm.

Peirce B (1852). "Criterion for the Rejection of Doubtful Observations." The Astronomical
Journal, 2(45), 161-163. URL http://articles.adsabs.harvard.edu/cgi-bin/nph-iarticle_query?1852AJ......2..161P;data_type=PDF_HIGH,
http://adsabs.harvard.edu/full/1852AJ......2..161P.

Peirce B (1877). "On Peirce's Criterion." Proceedings of the American Academy of Arts and
Sciences, 13, 348-351. URL http://www.jstor.org/stable/25138498.

Ross S (2003). "Peirce's Criterion for the Elimination of Suspect Experimental Data." Journal
of Engineering Technology, 20(2), 1-12. URL http://classes.engineering.wustl.edu/2009/fall/che473/handouts/OutlierRejection.pdf.

R Development Core Team (2012). R: A Language and Environment for Statistical Computing. R Foundation
for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org/.
Affiliation:
Christopher Dardis
Department of Neurology
Barrow Neurological Institute
350 W. Thomas Road
Phoenix, AZ 85013
E-mail: [email protected]
URL: https://christopherdardis.wordpress.com/
Journal of Statistical Software http://www.jstatsoft.org/
published by the American Statistical Association http://www.amstat.org/
Volume VV, Issue II Submitted: yyyy-mm-dd
MMMMMM YYYY Accepted: yyyy-mm-dd
