IDS-Boxplots and Outliers

Uploaded by

rkschinna802

BOX PLOTS

Box Plots:
Box plots, also known as box-and-whisker diagrams, are a graphical representation of a
sample that makes its descriptive statistics easy to visualize. Any data that can be
presented in a bar graph can, in most cases, also be presented in a box plot, and the
box plot provides more information about the data than the bar graph does.
Things to know about box plots:

• Your sample is presented as a box.


• The spacings between the different parts of the box help indicate the degree
of dispersion (spread) and skewness in the data, and help identify outliers.
• A box plot shows a 5-number data summary: minimum, first (lower) quartile,
median, third (upper) quartile, maximum.
• The box is divided at the median.
• The length of the box is the interquartile range (IQR).
• The 1st quartile is the bottom edge of the box.
• The 3rd quartile is the top edge of the box.
A box plot uses five specially selected numbers to display
information about numerical scores in graphical form.
The numbers used are the extremes (the highest and
lowest scores), the median (the middle score), and the upper and
lower quartiles. These five numbers make up the five-number
summary.
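As a minimal sketch, the five-number summary and the IQR can be computed with Python's standard library. The scores below are made up for illustration, and the function name `five_number_summary` is hypothetical. The quartiles use the "inclusive" interpolation convention, which is an assumption; other conventions give slightly different values for the first and third quartiles.

```python
import statistics

def five_number_summary(scores):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(scores)
    # Assumption: "inclusive" quartile convention. Other conventions
    # (for example method="exclusive") can shift Q1 and Q3 slightly.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

# Made-up example scores, not taken from the slides.
scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]

minimum, q1, median, q3, maximum = five_number_summary(scores)
iqr = q3 - q1  # the length of the box

print("min:", minimum, "Q1:", q1, "median:", median, "Q3:", q3, "max:", maximum)
print("IQR:", iqr)
```

The box would be drawn from Q1 to Q3 and divided at the median, with lines running out to the minimum and maximum.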
A box plot is used to show the range and middle
half of ranked data. Ranked data is numerical
data that can be placed in order, such as scores.

The middle half of the data is represented by the
box. The highest and lowest scores are joined to
the box by straight lines.

The regions above the upper quartile and below
the lower quartile each contain 25% of the data.
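The 25% / 50% / 25% split can be checked on a small data set. This is only a sketch: the scores are made up, the helper name `region_fractions` is hypothetical, and the "inclusive" quartile convention is an assumption. With small or heavily tied samples the fractions are only approximately 25%.

```python
import statistics

def region_fractions(scores):
    """Return the fraction of scores below Q1, inside the box, and above Q3."""
    # Assumption: "inclusive" quartile convention.
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    n = len(scores)
    below = sum(1 for x in scores if x < q1) / n
    above = sum(1 for x in scores if x > q3) / n
    inside = 1 - below - above  # scores from Q1 to Q3, i.e. the box
    return below, inside, above

# Made-up example scores, not taken from the slides.
scores = [3, 5, 7, 8, 9, 11, 13, 14, 16, 18, 20, 25]

below, inside, above = region_fractions(scores)
print("below Q1:", below, "in box:", inside, "above Q3:", above)
```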
Outliers
An outlier is an observation “that appears to deviate markedly from
other members of the sample in which it occurs” (Grubbs, 1969).

Note: we focus on univariate outliers, those found when looking at a
distribution of values in a single dimension (e.g. income).

An outlier is “an observation that deviates so much from other
observations as to arouse suspicion that it was generated by a different
mechanism” (Hawkins, 1980).
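The definitions above do not give a numerical cut-off. One widely used convention for box plots, assumed here and not stated in these notes, flags a value as a possible univariate outlier when it lies more than 1.5 × IQR below the first quartile or above the third quartile. The sketch below applies that rule to made-up income figures; the function name `find_outliers` and the multiplier default are illustrative choices.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: "inclusive" quartile convention and the common k = 1.5 rule.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]

# Made-up incomes in thousands, a single dimension as in the note above.
incomes = [22, 25, 27, 30, 31, 33, 35, 38, 40, 120]

print("possible outliers:", find_outliers(incomes))
```

A flagged value is only a candidate: whether it really was "generated by a different mechanism" still has to be judged from the context of the data.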
