
Finding Outliers 2 Ways: Z-Score and Interquartile Range

The document discusses using Z-scores to identify outliers in data that follows a normal distribution. It explains that a Z-score indicates how many standard deviations an observation is from the mean, and values more than 3 standard deviations out are considered outliers. However, outliers can skew the calculation of Z-scores by influencing the mean and standard deviation. The document then introduces an alternative method that uses the interquartile range to calculate inner and outer fences. Values outside the inner fences are minor outliers, and values outside the outer fences are major outliers.

Uploaded by Ana Chikovani
Copyright © All Rights Reserved

Using Z-scores to Detect Outliers

Z-scores can quantify how unusual an observation is when your data follow the normal distribution. A Z-score is the
number of standard deviations a value falls above or below the mean. For example, a Z-score of 2 indicates that an
observation is two standard deviations above the mean, while a Z-score of -2 signifies it is two standard deviations
below the mean. A Z-score of zero represents a value that equals the mean.
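In symbols, a value x from data with mean μ and standard deviation σ has the Z-score z = (x − μ) / σ. The short Python sketch below applies that formula using the standard library's `statistics` module. It assumes the data are treated as a whole population (`pstdev`). Use `stdev` instead if you want the sample standard deviation. The function name `z_scores` and the data values are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return each value's Z-score: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    return [(x - mu) / sigma for x in values]

# Hypothetical data with mean 5 and population standard deviation 2
print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The two 5s equal the mean, so their Z-scores are 0. The 9 lies two standard deviations above the mean, so its Z-score is 2.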

The further an observation’s Z-score is from zero, the more unusual it is. A standard cut-off for finding outliers is
a Z-score of +/-3 or further from zero. The probability distribution below shows the distribution of Z-scores in a
standard normal distribution. Z-scores beyond +/-3 are so extreme that you can barely see the shading under the
curve.
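The +/-3 rule can be written as a filter on |z|. This is a minimal sketch that assumes a population standard deviation. It uses a hypothetical helper, `z_score_outliers`, with a `cutoff` parameter. The 15 data values are invented for illustration and are not the document's dataset.

```python
from statistics import mean, pstdev

def z_score_outliers(values, cutoff=3.0):
    """Return the values whose Z-score is at least `cutoff` away from zero."""
    mu, sigma = mean(values), pstdev(values)  # population SD (assumption)
    return [x for x in values if abs((x - mu) / sigma) >= cutoff]

# Hypothetical data: 14 typical values plus one extreme value
data = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 11, 9, 10, 10, 30]
print(z_score_outliers(data))  # [30]
```

With these invented values, 30 has a Z-score of roughly 3.7. Every other value stays within about 0.5 of zero.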

In a population that follows the normal distribution, Z-scores more extreme than +/-3 have a probability of 0.0027
(0.00135 in each tail, so 2 * 0.00135), which is about 1 in 370 observations. However, if your data don’t follow the
normal distribution, this approach might not be accurate.
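You can check the 0.0027 figure against the standard normal distribution. The probability of landing more than z standard deviations from the mean in either direction is erfc(z/√2). Below is a sketch using Python's `math.erfc`. The helper name `two_tailed_probability` is hypothetical.

```python
import math

def two_tailed_probability(z):
    """P(|Z| > z) for a standard normal random variable Z."""
    return math.erfc(z / math.sqrt(2))

p = two_tailed_probability(3)
print(f"P(|Z| > 3) = {p:.4f}, about 1 in {1 / p:.0f}")  # P(|Z| > 3) = 0.0027, about 1 in 370
```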

Also, note that the outlier’s presence throws off the Z-scores because it inflates the mean and standard deviation,
as we saw earlier. Notice how all the Z-scores are negative except the outlier’s value. If we calculated Z-scores
without the outlier, they’d be different! Be aware that if your dataset contains outliers, the Z-scores are biased
toward zero, so values appear less extreme than they really are.
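This masking effect is easy to reproduce. In the sketch below, the data are invented: 14 typical values plus one extreme value, 30. The standard deviation is the population version (`pstdev`). The code compares the Z-scores of the 14 typical values with and without the outlier in the data.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Z-score of each value, using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]

# Hypothetical data: 14 typical values, then the same values plus one extreme value
typical = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 11, 9, 10, 10]
with_outlier = typical + [30]

# Z-scores of the typical values, computed with and without the outlier present
z_with = z_scores(with_outlier)[: len(typical)]
z_without = z_scores(typical)

print(f"largest |z| with outlier:    {max(abs(z) for z in z_with):.2f}")
print(f"largest |z| without outlier: {max(abs(z) for z in z_without):.2f}")
```

With 30 included, the largest typical Z-score is about 0.49. With 30 left out, it is about 2.23. The difference arises because 30 inflates the standard deviation from roughly 0.83 to roughly 5.02.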

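Using the Interquartile Range to Create Outlier Fences

The interquartile range (IQR) is the distance between the first quartile (Q1) and the third quartile (Q3), that is,
IQR = Q3 - Q1, so it spans the middle 50% of the data. To calculate the outlier fences, do the following: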

1. Take your IQR and multiply it by 1.5 and 3. We’ll use these values to obtain the inner and outer fences. For our
example, the IQR equals 0.222 (Q3 - Q1 = 1.936 - 1.714). Consequently, 0.222 * 1.5 = 0.333 and 0.222 * 3 = 0.666.
We’ll use 0.333 and 0.666 in the following steps.
2. Calculate the inner and outer lower fences. Take the Q1 value and subtract the two values from step 1. The two
results are the lower inner and outer outlier fences. For our example, Q1 is 1.714. So, the lower inner fence =
1.714 – 0.333 = 1.381 and the lower outer fence = 1.714 – 0.666 = 1.048.
3. Calculate the inner and outer upper fences. Take the Q3 value and add the two values from step 1. The two
results are the upper inner and upper outer fences. For our example, Q3 is 1.936. So, the upper inner fence =
1.936 + 0.333 = 2.269 and the upper outer fence = 1.936 + 0.666 = 2.602.
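The three steps reduce to a few lines of arithmetic. The Python sketch below uses a hypothetical `outlier_fences` helper that takes the quartiles as inputs. Given Q1 = 1.714 and Q3 = 1.936, it reproduces the example's fences. Computing Q1 and Q3 from raw data is left out here because tools differ in their quartile methods.

```python
def outlier_fences(q1, q3):
    """Return (lower outer, lower inner, upper inner, upper outer) fences."""
    iqr = q3 - q1
    return (
        q1 - 3 * iqr,    # lower outer fence
        q1 - 1.5 * iqr,  # lower inner fence
        q3 + 1.5 * iqr,  # upper inner fence
        q3 + 3 * iqr,    # upper outer fence
    )

fences = outlier_fences(q1=1.714, q3=1.936)
print([round(f, 3) for f in fences])  # [1.048, 1.381, 2.269, 2.602]
```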

Using the Outlier Fences with Our Example Dataset

For our example dataset, the fences are 1.048, 1.381, 2.269, and 2.602. Almost all of our data should fall between
the inner fences, which are 1.381 and 2.269. Values between an inner fence and its outer fence are minor outliers,
and values beyond an outer fence are major outliers. At this point, we look at our data values and determine whether
any qualify as minor or major outliers. 14 of the 15 data points fall inside the inner fences, so they are not
outliers. The 15th data point falls above the upper outer fence, which makes it a major (extreme) outlier.
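Classifying a value against the four fences works the same way as the manual check. The sketch below uses a hypothetical `classify` helper and the example's fence values. The test values 1.8, 2.4, and 3.0 are invented. A value lying exactly on a fence is treated as inside it, which is an assumption; conventions vary.

```python
# Fence values from the example: lower outer, lower inner, upper inner, upper outer
FENCES = (1.048, 1.381, 2.269, 2.602)

def classify(value, lower_outer, lower_inner, upper_inner, upper_outer):
    """Label a value using inner and outer fences (values on a fence count as inside it)."""
    if lower_inner <= value <= upper_inner:
        return "not an outlier"
    if lower_outer <= value <= upper_outer:
        return "minor outlier"  # between an inner fence and its outer fence
    return "major outlier"      # beyond an outer fence

# Hypothetical values, not the document's dataset
for value in (1.8, 2.4, 3.0):
    print(value, classify(value, *FENCES))
```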

The IQR method is helpful because it uses percentiles, which do not depend on a specific distribution.
Additionally, percentiles are relatively robust to the presence of outliers compared with the mean and standard
deviation that Z-scores rely on. Values that fall inside the two inner fences are not outliers.
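To see the robustness claim in action, the sketch below compares the mean and standard deviation with Q1 and Q3, before and after one extreme value is added. The data and the `summarize` helper are hypothetical. The quartiles come from Python's `statistics.quantiles`, whose default 'exclusive' method is only one of several quartile definitions, so other tools may report slightly different quartiles.

```python
from statistics import mean, pstdev, quantiles

def summarize(data):
    """Mean, population SD, and quartiles (statistics.quantiles default 'exclusive' method)."""
    q1, _, q3 = quantiles(data, n=4)
    return {"mean": mean(data), "sd": pstdev(data), "q1": q1, "q3": q3}

# Hypothetical data: 14 typical values, then the same values plus one extreme value
typical = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 11, 9, 10, 10]
with_outlier = typical + [30]

before, after = summarize(typical), summarize(with_outlier)
for key in before:
    print(f"{key}: {before[key]:.2f} -> {after[key]:.2f}")
```

On these invented values, the mean moves from 10.14 to 11.47 and the standard deviation from 0.83 to 5.02. Meanwhile, Q1 shifts only from 9.75 to 10.00, and Q3 stays at 11.00.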
