
Outliers

The document describes methods for detecting outliers in data sets, including calculating the interquartile range and using the Tukey, Grubbs, and Dixon's Q tests. For the Tukey test, any values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR would be considered suspected outliers. The Grubbs and Dixon's Q tests calculate a test statistic and compare it to a critical value to determine if a value should be rejected as an outlier. Examples are provided to demonstrate applying these tests.

Uploaded by

cormac

DETECT suspects 1: Visual inspection

2.24 2.43 2.36 2.83 2.30


No.  Titre (ml)
1    11.4
2    11.1
3    11.5
4    11.9
5    11.3
6    11.2
DETECT suspects 2: Calculation: Tukey k-Test

The interquartile range (IQR) is the distance between the first and third
quartiles (the length of the box in the boxplot)
IQR = Q3 - Q1

An outlier is an individual value that falls outside the overall pattern.


How far outside the overall pattern does a value have to fall to be
considered a suspected outlier?

Suspected low outlier: any value < Q1 - 1.5*IQR

Suspected high outlier: any value > Q3 + 1.5*IQR


25 7.9
24 5.6
23 5.3
22 4.9
21 4.7
20 4.5
Q3 = 4.35
19 4.2
18 4.1
17 3.9
16 3.8
15 3.7
14 3.6
13 3.4
12 3.3
11 2.9
10 2.8
9 2.5
8 2.3
7 2.3
Q1 = 2.2
6 2.1
5 1.5
4 1.9
3 1.6
2 1.2
1 0.6
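The fences for this data set can be checked with a short Python sketch. It takes Q1 = 2.2 and Q3 = 4.35 directly from the slide rather than recomputing them, since quartile conventions differ between tools. The function name `tukey_fences` and the parameter `k` are illustrative choices, not part of the original slides.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# The 25 values from the slide, listed from rank 1 to rank 25.
values = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
          3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 7.9]

# Quartiles as given on the slide (assumed, not recomputed here).
lower, upper = tukey_fences(q1=2.2, q3=4.35)

# A value is a suspected outlier if it lies outside either fence.
suspects = [v for v in values if v < lower or v > upper]
print(f"fences: {lower:.3f} to {upper:.3f}; suspects: {suspects}")
```

With IQR = 2.15 the fences are about -1.025 and 7.575, so only 7.9 is flagged as a suspected high outlier.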
DETECT suspects: Calculation: Grubbs Test
ISO test for point outliers
The suspect value is the value furthest from the mean
Assumes a Normal population
Use the entire dataset (including the suspect) to calculate the statistics
G critical depends on n
If G exp > G critical, then REJECT the suspect

G exp = |suspect value - x̄| / s

where x̄ is the mean and s is the standard deviation of the dataset
Example

The following values were obtained for the nitrate concentration (mg/L) in a
sample of river water:

0.403 0.410 0.401 0.380

Ideally, take more measurements when a suspect value occurs, especially if only a few were made.
The extra values may make it clearer whether the suspect should be rejected.
If the suspect is kept, the extra values also reduce its effect.
If 3 further measurements are taken...

0.403 0.410 0.401 0.380 0.400 0.413 0.408
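As a rough sketch, G exp for both versions of the nitrate data can be computed with Python's standard library. The function name `grubbs_statistic` is an illustrative choice. The critical values are not reproduced here and must still be looked up in a Grubbs table for the relevant n.

```python
from statistics import mean, stdev


def grubbs_statistic(values):
    """Return (suspect, G) where the suspect is the value furthest from the mean."""
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    suspect = max(values, key=lambda v: abs(v - x_bar))
    return suspect, abs(suspect - x_bar) / s


first_four = [0.403, 0.410, 0.401, 0.380]
all_seven = first_four + [0.400, 0.413, 0.408]

suspect_4, g_4 = grubbs_statistic(first_four)
suspect_7, g_7 = grubbs_statistic(all_seven)
print(f"n = 4: suspect {suspect_4}, G = {g_4:.3f}")
print(f"n = 7: suspect {suspect_7}, G = {g_7:.3f}")
```

In both cases the suspect is 0.380. G is about 1.43 for the four values and about 2.03 for the seven values, and each is compared with the tabulated G critical for its own n.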


You try

set of mass spectrometer measurements on a uranium isotope:

199.31 199.53 200.19 200.82 201.92 201.95 202.18 206.32


DETECT suspects 2: Calculation: Dixon's Q-Test

Popular
For small samples (n = 3 to 10)
Assumes a Normal population
If Q > critical value, then REJECT the suspect
Dixon's Q-Test
The following values were obtained for the
nitrate concentration (mg/L) in a sample of
river water:

0.403 0.410 0.401 0.380 0.400 0.413 0.408

Q = |suspect value - nearest value| / range

where range = largest value - smallest value
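A minimal Python sketch of the Q calculation for the nitrate data follows. It assumes the suspect is whichever extreme (smallest or largest value) has the bigger gap to its nearest neighbour. The function name `dixon_q` is an illustrative choice, and the critical value must still be taken from a Q table for the relevant n.

```python
def dixon_q(values):
    """Return (suspect, Q) using Q = |suspect - nearest| / range."""
    if len(values) < 3:
        raise ValueError("Dixon's Q needs at least 3 values")
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    if data_range == 0:
        raise ValueError("all values are identical")
    low_gap = ordered[1] - ordered[0]     # gap below the second-smallest value
    high_gap = ordered[-1] - ordered[-2]  # gap above the second-largest value
    # Assumption: test the extreme that sits furthest from its neighbour.
    if low_gap >= high_gap:
        return ordered[0], low_gap / data_range
    return ordered[-1], high_gap / data_range


nitrate = [0.403, 0.410, 0.401, 0.380, 0.400, 0.413, 0.408]
suspect, q = dixon_q(nitrate)
print(f"suspect {suspect}, Q = {q:.3f}")
```

Here the suspect is 0.380, its nearest value is 0.400 and the range is 0.033, so Q is about 0.606.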
You try:
0.189 0.167 0.187 0.183 0.186 0.182

0.181 0.184 0.181 0.177

Q = |suspect value - nearest value| / range
DECIDE
Correct obvious errors for which data exist
Exclude obvious errors for which no data exist
Ignore? Run the analysis with and without the suspect to see if it is influential
    Trimmed mean
Retain?
    Outliers are expected for large sample sizes
    Some methods are robust
Replace
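
The "run with/without" check and the trimmed mean can be sketched in Python as below. It reuses the seven nitrate values from the earlier slides, with 0.380 as the suspect. The function name `trimmed_mean` and the choice to trim one value from each end are illustrative assumptions.

```python
from statistics import mean


def trimmed_mean(values, n_trim=1):
    """Mean after dropping the n_trim smallest and n_trim largest values."""
    ordered = sorted(values)
    if len(ordered) <= 2 * n_trim:
        raise ValueError("not enough values left after trimming")
    return mean(ordered[n_trim:len(ordered) - n_trim])


nitrate = [0.403, 0.410, 0.401, 0.380, 0.400, 0.413, 0.408]
suspect = 0.380

mean_with = mean(nitrate)
mean_without = mean([v for v in nitrate if v != suspect])
mean_trimmed = trimmed_mean(nitrate, n_trim=1)

print(f"with suspect:    {mean_with:.4f}")
print(f"without suspect: {mean_without:.4f}")
print(f"trimmed mean:    {mean_trimmed:.4f}")
```

The mean moves from about 0.4021 with the suspect to about 0.4058 without it, and the trimmed mean is about 0.4044. Whether that shift matters depends on the precision the analysis needs.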

DISCLOSE
