
Also Called: Box-And-Whisker Plot

The box plot is a graph that summarizes key statistical characteristics of a data set, including the median, spread, and outliers. It displays the median and quartiles to show the central tendency and variability of data through boxes and whiskers. The box plot allows for easy visualization and comparison of multiple data sets. It is constructed by listing data, finding hinges and fences to identify outliers, and drawing boxes and whiskers accordingly.

Uploaded by

amirq4
Copyright
© All Rights Reserved

Box Plot

Also called: box-and-whisker plot


Description
The box plot is a graph that summarizes the most important
statistical characteristics of a frequency distribution for easy
understanding and comparison. Information about where the
data fall and how far they spread can be seen on the plot. The
box plot is a powerful tool because it is simple to construct yet
yields a lot of information.
When to Use
• When analyzing or communicating the most important
characteristics of a batch of data, rather than the detail, and
• When comparing two or more sets of data, or . . .
• When there is not enough data for a histogram, or . . .
• When summarizing the data shown on another graph, such
as a control chart or run chart
Procedure
1. List all the data values in order from smallest to largest. We
will refer to the total number of values, the count, as n. We
will refer to the numbers in order like this: X1 is the smallest
number; X2 is the next smallest number; up to Xn, which is
the largest number.
2. Medians. Cut the data in half. Find the median, the point
where half the values are larger and half are smaller.
• If the total number of values (n) is odd: the median is the
middle one. Count (n + 1)/2 from either end.
• If the total number of values (n) is even: the median is
the average of the two middle ones. Count n/2 and n/2
+ 1 from either end. Average those two numbers:
median = (X_{n/2} + X_{n/2 + 1}) / 2
3. Hinges. Cut the data in quarters. Find the hinges, the
medians of each half.
• If n is even, the median is the average of X_{n/2} and
X_{n/2 + 1}. Take the values from X1 to X_{n/2} and find
their median, just as in step 2. This is the lower hinge.
• If n is odd, the median is X_{(n + 1)/2}. Take the values
from X1 to the median and find their median, just as in
step 2. This is the lower hinge.
In either case, do the same with the values at the upper end
to find the upper hinge.
4. H-spread. Calculate the distance between the hinges, or H-
spread:
H-spread = upper hinge – lower hinge
5. Inner fences. These are values separating data that are
probably a predictable part of the distribution from data
that are outside the distribution. Inner fences are located
beyond each hinge at 1.5 times the H-spread, a distance
called a step.
upper inner fence = upper hinge + 1.5 × H-spread
lower inner fence = lower hinge – 1.5 × H-spread
6. Outer fences. Data beyond these values are far outside the
distribution and deserving of special attention. Outer
fences are located one step beyond the inner fences.
upper outer fence = upper inner fence + 1.5 × H-spread
lower outer fence = lower inner fence – 1.5 × H-spread
7. To draw the box plot, first draw one horizontal axis. Scale it
appropriately for the range of data.
• Draw a box with ends at the hinge values.
• Draw a line across the middle of the box at the median
value.
• Draw a line at each inner fence value.
• Draw a dashed crossbar at the adjacent value, the first
value inside the inner fences.
• Draw whiskers, dashed lines from the ends of the box to
the adjacent values.
• Draw small circles representing any outside data points:
beyond the inner fences but inside the outer fences.
• Draw double circles to represent far out data points:
beyond the outer fences.
8. If you are comparing several data sets, repeat the
procedure for each set of data.
9. Analyze the plot. Look for:
• Location of the median
• Spread of the data: how far the hinges and fences are
from the median
• Symmetry of the distribution
• Existence of outside points
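Steps 2 through 6 can be sketched in a few lines of Python. This is only
an illustration under stated assumptions, not part of the original
procedure: the function names (hinges_and_median, fences, classify) are
hypothetical, the middle value is counted in both halves when n is odd,
as step 3 reads, and a value lying exactly on a fence is treated as
inside it.

```python
from statistics import median


def hinges_and_median(values):
    """Return (lower hinge, median, upper hinge) for a batch of numbers.

    Assumption: when n is odd, the middle value belongs to both halves,
    which is how step 3 reads ("from X1 to the median").
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2                  # n/2 if n is even, (n + 1)/2 if odd
    lower_hinge = median(xs[:half])      # median of the lower half
    upper_hinge = median(xs[n - half:])  # median of the upper half
    return lower_hinge, median(xs), upper_hinge


def fences(lower_hinge, upper_hinge):
    """Return the inner and outer fences, one and two steps past the hinges."""
    step = 1.5 * (upper_hinge - lower_hinge)  # 1.5 x H-spread
    return {
        "lower_outer": lower_hinge - 2 * step,
        "lower_inner": lower_hinge - step,
        "upper_inner": upper_hinge + step,
        "upper_outer": upper_hinge + 2 * step,
    }


def classify(values):
    """Split a batch into adjacent values, outside points, and far out points.

    Assumption: a value exactly on a fence counts as inside that fence.
    """
    lower_hinge, _, upper_hinge = hinges_and_median(values)
    f = fences(lower_hinge, upper_hinge)
    inside = [x for x in values if f["lower_inner"] <= x <= f["upper_inner"]]
    far_out = [x for x in values if x < f["lower_outer"] or x > f["upper_outer"]]
    outside = [x for x in values if x not in inside and x not in far_out]
    return {
        "adjacent": (min(inside), max(inside)),  # where the whiskers end
        "outside": sorted(outside),              # drawn as small circles
        "far_out": sorted(far_out),              # drawn as double circles
    }
```

For a made-up batch such as 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30,
45, the hinges come out at 13.5 and 19.5, so the step is 9. The value 30
is then an outside point and 45 is a far out point.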
Example
Suppose two bowling teams, the Avengers and the Bulldogs,
have the scores shown in Figure 1. Which team is better? We
will draw a box plot of each team’s scores and compare the two
plots.
1. The scores are already in order from smallest to largest.
There are 14 scores for each team, so n = 14.

[Figure 1. Data for box plot example]
2. Median. There is an even number of scores, so the median
is the average of the two middle ones. We must count n/2
and n/2 + 1 from one end.
n/2 = 14/2 = 7 and n/2 + 1 = 8
Count to the seventh and eighth scores in each group and
average them.
3. Hinges. We must find two medians, first of values 1
through 7 and then of values 8 through 14. There are seven
values in each half, an odd number, so we count (7 + 1)/2 =
4 from either end.
lower hinge A = 142 upper hinge A = 160
lower hinge B = 152 upper hinge B = 163
4. H-Spread. The distance between hinges is
H-spread = upper hinge – lower hinge
H-spread A = 160 – 142 = 18
H-spread B = 163 – 152 = 11
5. Inner fences.
upper inner fence = upper hinge + 1.5 × H-spread
upper inner fence A = 160 + 1.5 × 18 = 160 + 27 = 187
upper inner fence B = 163 + 1.5 × 11 = 163 + 16.5 = 179.5
lower inner fence = lower hinge – 1.5 × H-spread
lower inner fence A = 142 – 27 = 115
lower inner fence B = 152 – 16.5 = 135.5

[Figure 2. Box plot example]
6. Outer fences.
upper outer fence = upper inner fence + 1.5 × H-spread
upper outer fence A = 187 + 27 = 214
upper outer fence B = 179.5 + 16.5 = 196
lower outer fence = lower inner fence – 1.5 × H-spread
lower outer fence A = 115 – 27 = 88
lower outer fence B = 135.5 – 16.5 = 119
Figure 2 is the box plot of the two teams’ scores. While the
Avengers have a star and the Bulldogs have a poor player,
overall the Bulldogs tend to score higher than the Avengers.
The Bulldogs’ smaller spread also indicates they score more
consistently.
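The arithmetic in steps 4 through 6 can be checked with a short Python
sketch. It starts from the hinge values given in step 3, since the raw
scores of Figure 1 are not reproduced here. The function name
fences_from_hinges and the dictionary keys are hypothetical and chosen
only for this illustration.

```python
def fences_from_hinges(lower_hinge, upper_hinge):
    """Apply steps 4 to 6 of the procedure to a pair of hinges."""
    h_spread = upper_hinge - lower_hinge
    step = 1.5 * h_spread
    lower_inner = lower_hinge - step
    upper_inner = upper_hinge + step
    return {
        "h_spread": h_spread,
        "lower_inner": lower_inner,
        "upper_inner": upper_inner,
        "lower_outer": lower_inner - step,  # one more step out
        "upper_outer": upper_inner + step,
    }


# Hinges taken from step 3 of the example.
team_hinges = {"Avengers": (142, 160), "Bulldogs": (152, 163)}
team_fences = {
    team: fences_from_hinges(low, high)
    for team, (low, high) in team_hinges.items()
}

for team, values in team_fences.items():
    print(team, values)
```

The printed fences should match the values worked out by hand above, for
example 115 and 187 for the Avengers' inner fences.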
Variations
The box plot was created by John W. Tukey. Many variations
have been proposed for calculating, drawing, and using box
plots. Whenever you use a variation on the basic box plot,
draw solid lines beyond the hinges to indicate that you are not
conforming to Tukey’s rules. Some variations are:
• Simple box plot. Instead of calculating and drawing fences
and outliers, draw lines from the ends of the box (hinge
values) to the highest and lowest data values.
• Modified box plot. Calculate the arithmetic average of all
the data values and show it with a dot on the box plot. The
closer the average is to the median, the more symmetrical
the distribution.
• Modified-width box plot. When using two or more box
plots to compare several data sets, the widths of the boxes
can be drawn proportional to the sample size of the data
sets.
• Parentheses can be drawn on the plot to represent 95%
confidence limits.
• Ghost box plot or box-plot control chart. A box plot can be
drawn with dotted lines directly on a control chart or other
graph of individual data points to show a summary of the
data. This variation is especially useful if several plots are
drawn showing sequential subgroups of the data. For
example, draw one ghost box plot in the middle of a set of
15 data points prior to a process change and another in the
middle of the next set of 15 data points after the change.
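
The extra values needed for the simple and modified box plots are easy
to compute. The sketch below is one possible way to do it, with a
hypothetical function name and made-up key names. It returns the whisker
ends for the simple box plot (lowest and highest values) and the average
for the modified box plot, along with the gap between average and median
as a rough symmetry check.

```python
from statistics import mean, median


def simple_and_modified(values):
    """Values needed for the simple and modified box plot variations."""
    xs = sorted(values)
    avg = mean(xs)
    mid = median(xs)
    return {
        "whisker_low": xs[0],    # simple box plot: whisker runs to the lowest value
        "whisker_high": xs[-1],  # simple box plot: whisker runs to the highest value
        "mean": avg,             # modified box plot: shown as a dot
        "median": mid,
        "mean_minus_median": avg - mid,  # near zero suggests symmetry
    }
```

A gap close to zero suggests a roughly symmetrical batch. A large
positive or negative gap suggests the data are stretched toward the high
or low end.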
