Week 7
(continued)
Using the data, we get
\[
t = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{18.87 - 21}{2.2583/\sqrt{9}} = -2.83
\]
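The arithmetic above can be checked with a minimal Python sketch. The function name `t_statistic` and its argument names are illustrative choices, not part of the notes; the numbers are the ones quoted in the equation.

```python
import math


def t_statistic(sample_mean, mu0, sample_sd, n):
    """One-sample t statistic: (x_bar - mu) / (s / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)  # s / sqrt(n)
    return (sample_mean - mu0) / standard_error


# Values taken from the equation above: x_bar = 18.87, mu = 21, s = 2.2583, n = 9
t = t_statistic(18.87, 21.0, 2.2583, 9)
print(round(t, 2))  # -2.83
```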
In producing such a plot, the data are first arranged in ascending order.
The minimum and maximum values thus obtained give the extremities of the
‘whiskers’ of the plot. Then one has to obtain the median, which is simply
the middle value. If the number of data points is odd, the middle value is
easy to identify. If the number of data points is even, two values appear in
the middle, and one takes the mean of these two. The median, also called the
second quartile or Q2, gives the mid-point of the plot (see Fig. 2).
[Fig. 2: schematic box and whisker plot, labelled Minimum, Q1, Q2 (Median), Q3 and Maximum, with the interquartile range spanning Q1 to Q3.]
Then one has to obtain the median of the data points below Q2. That
gives another value, called the first quartile or Q1. Similarly, one obtains
the median of the data points above Q2, which gives the third quartile, or
Q3. The range between Q1 and Q3 is called the interquartile range (IQR),
which is plotted as a box. The range between the minimum and Q1, and
that between Q3 and the maximum, are each plotted as a ‘whisker’. Therefore,
the representation of a typical data set would look like Fig. 2. One
characteristic feature of such a plot is that about 25% of the data lie in
each of the four ranges shown in the plot.
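The procedure just described (sort, take the median, then take the medians of the lower and upper halves) can be sketched in Python as follows. The function names and the small test data set are illustrative assumptions. The sketch follows the convention in the text, where for an odd number of points the median itself is excluded from both halves; statistical libraries often use interpolation-based conventions and may return slightly different quartiles.

```python
def median(sorted_values):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]  # odd count: single middle value
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2  # even count: mean of the two middle values


def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]  # points below Q2
    # For odd n, skip the median itself; for even n, the upper half starts at index half
    upper = values[half + 1:] if n % 2 == 1 else values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]


# Hypothetical data, chosen only to illustrate the steps
print(five_number_summary([7, 1, 3, 9, 5, 11, 13]))  # (1, 3, 7, 11, 13)
```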
Sometimes one gets some data points that lie far outside the natural
range of the data. These are called ‘outliers’. The box plot also enables
one to identify and present the outliers. The usual rule is that data
points lying more than 1.5 times the interquartile range beyond either edge
of the box are called outliers. Therefore, one can identify the ‘reasonable’
range of the data as that between (Q1 − 1.5 × IQR) and (Q3 + 1.5 × IQR), and
any data point falling outside this range may be suspected to be an ‘outlier’.¹
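As a minimal sketch of the 1.5 × IQR rule, the following Python computes the two limits and lists the points falling outside them. The function names, the multiplier argument `k`, and the sample numbers are illustrative assumptions; the points returned are only suspected outliers.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (Q1 - k*IQR, Q3 + k*IQR); k = 1.5 is the usual choice."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def suspected_outliers(data, q1, q3, k=1.5):
    """List the data points lying outside the two fences."""
    low, high = outlier_fences(q1, q3, k)
    return [x for x in data if x < low or x > high]


# Hypothetical quartiles and data, chosen only to illustrate the rule
print(outlier_fences(4, 8))                       # (-2.0, 14.0)
print(suspected_outliers([1, 5, 6, 20], 4, 8))    # [20]
```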
Such outliers may result from experimental or observational errors but
may also result from some phenomenon not yet discovered. That is why one
cannot simply ignore an outlier or delete it from a data set. Outliers have to
be faithfully presented in a research paper, though you may ignore these in
further analysis of the data.
¹ Before we conclude that such a point is indeed an outlier, some more tests would be
required.
Example 1:
Consider the following data set:
17.2, 15.9, 16.7, 18.3, 15.0, 19.3, 20.2, 16.3, 17.9, 15.3, 10.1, 19.1, 18.2
Obtain the box and whisker plot.
Solution:
Arranging the data in ascending order, we get
10.1, 15.0, 15.3, 15.9, 16.3, 16.7, 17.2, 17.9, 18.2, 18.3, 19.1, 19.3, 20.2
It has 13 data points, which is an odd number. Hence, the 7th data point,
17.2, is the median.
There are 6 data points each below and above the median, which is an even
number. So we get Q1 as the mean of the 3rd and 4th entries, which gives
Q1 = 15.6. Similarly, we get Q3 as the mean of the 10th and 11th entries,
which gives Q3 = 18.7.
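As a quick check, the 1.5 × IQR rule stated earlier can be applied to these quartiles. The working below is a sketch using only the values obtained in this solution:

```latex
\begin{align*}
\mathrm{IQR} &= Q_3 - Q_1 = 18.7 - 15.6 = 3.1 \\
Q_1 - 1.5 \times \mathrm{IQR} &= 15.6 - 4.65 = 10.95 \\
Q_3 + 1.5 \times \mathrm{IQR} &= 18.7 + 4.65 = 23.35
\end{align*}
```

The smallest value, 10.1, lies below 10.95 and may therefore be suspected to be an outlier, while the largest value, 20.2, lies within the upper limit of 23.35.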
Figure 3: The box and whisker plot for the whole data set given in the Example (horizontal axis from 10 to 20).
[Figure: box and whisker plot on an axis from 10 to 20, with the point 10.1 labelled.]
Table 1: The t-table.