Topic 2 - Data Presentation
• "The important point that must be borne in mind at all times is that the pictorial
representation chosen for any situation must depict the true relationship and point out the
proper conclusion. Above all the chart must be honest." (C. W. LOWE)
A simple bar chart represents only one variable. For example, sales, production or population
figures for various years may be shown by simple bar charts. Since the bars are of the same width
and vary only in height (or length), it is very easy for readers to compare the values. Simple bar
charts are very popular in practice. A bar chart can be either vertical or horizontal; vertical
bars are more common.
Illustration: - The following table gives the birth rate per thousand of different countries over a
certain period of time.
Country      Birth rate    Country        Birth rate
India        33            China          40
Germany      15            New Zealand    30
U. K.        20            Sweden         15
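As a rough illustration, the birth-rate table can be drawn as a horizontal bar chart in plain text. The Python sketch below is only one way to do it; the function name `bar_lines` and the choice of one `#` per unit are assumptions made for this example.

```python
# Birth rates per thousand, copied from the table above.
birth_rates = {
    "India": 33, "Germany": 15, "U. K.": 20,
    "China": 40, "New Zealand": 30, "Sweden": 15,
}

def bar_lines(data, units_per_mark=1):
    """Return one text bar per item; all bars use the same mark, so only the length varies."""
    label_width = max(len(name) for name in data)
    lines = []
    for name, value in data.items():
        bar = "#" * round(value / units_per_mark)
        lines.append(f"{name:<{label_width}} | {bar} {value}")
    return lines

for line in bar_lines(birth_rates):
    print(line)
```

Because every bar is built from the same mark, the reader compares lengths only, which is the idea behind a simple bar chart.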
Illustration: - During 2016 - 2019, the numbers of students at XEE University were as follows.
Represent the data by a similar diagram.
[Figure: bar chart of student numbers in Arts, Science, Law and Total for 2016-2017, 2017-2018 and 2018-2019; vertical axis from 0 to 120,000.]
Illustration: - The table below gives data relating to the exports and imports of a certain country
X (in thousands of dollars) during the four years from 2015-2016 to 2018-2019.
Year         Export    Import
2015-2016    319       250
2016-2017    339       263
2017-2018    345       258
2018-2019    308       206
[Figure: bar chart of Export and Import for 2015-2016, 2016-2017, 2017-2018 and 2018-2019; vertical axis from 0 to 350.]
Deviation bars are used to represent net quantities (an excess or a deficit), e.g. net profit, net
loss, net exports or imports, swings in voting etc. Such bars can take both positive and negative
values. Positive values lie above the base line and negative values lie below it.
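To make the idea of a net quantity concrete, the sketch below works out net exports from the export and import figures of country X given earlier. It assumes that net exports means exports minus imports; the names `net_values` and `deviation_bar` and the scale of one mark per 10 units are hypothetical choices for this example.

```python
# Exports and imports (thousands of dollars) from the table of country X.
trade = {
    "2015-2016": (319, 250),
    "2016-2017": (339, 263),
    "2017-2018": (345, 258),
    "2018-2019": (308, 206),
}

def net_values(table):
    """Net quantity per year: exports minus imports (assumed definition)."""
    return {year: exports - imports for year, (exports, imports) in table.items()}

def deviation_bar(value, units_per_mark=10):
    """'+' marks stand for an excess (above the base line), '-' marks for a deficit (below it)."""
    mark = "+" if value >= 0 else "-"
    return mark * round(abs(value) / units_per_mark)

net = net_values(trade)
for year, value in net.items():
    print(f"{year}  {value:>5}  {deviation_bar(value)}")
```

In this data every year shows an excess of exports, so all the bars lie on the positive side; a year with a deficit would give a bar of `-` marks instead.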
Illustration:-
Years Sales Net profits
Present the above data by a suitable diagram showing the sales and net profits of private
industrial companies.
2.3 Pie Chart
i) Geometrically it can be seen that the area of a sector of a circle, taken radially, is proportional
to the angle at its center. It is therefore sufficient to draw angles at the center proportional to the
original figures. This will make the areas of the sectors proportional to the basic figures.
For example, let the total be 1000 and one of the components be 200; then the angle will be
$\frac{200}{1000} \times 360^\circ = 72^\circ$.
ii) When a statistical phenomenon is composed of numerous components (say four or more), bar
charts are not suitable to represent them because they become very complex and their visual
impression suffers. A pie diagram is suitable for such situations. It is a circular diagram: a circle
(pie) divided by radii into sectors (like slices of a cake or pie). The area of a sector is
proportional to the size of the component it represents. Pie charts are useful to compare different
parts of a whole amount. They are often used to present financial information. E.g. a company's
expenditure can be shown to be the sum of its parts, including different expense categories such
as salaries, borrowing interest, taxation and general running costs (e.g. rent, electricity, heating
etc.).
A pie chart is a circular chart in which the circle is divided into sectors. Each sector visually
represents an item in a data set to match the amount of the item as a percentage or fraction of the
total data set.
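Illustration
A family's weekly expenditure on its house mortgage, food and fuel is as follows:
Expense     Ksh ('00)
Mortgage    300
Food        225
Fuel        75
Solution:
Total weekly expenditure = 30000 + 22500 + 7500 = Ksh 60000
We can find what percentage of the total expenditure each item equals: mortgage 300/600 = 50%,
food 225/600 = 37.5% and fuel 75/600 = 12.5%.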
To draw a pie chart, divide the circle into 100 percentage parts. Then allocate the number of
percentage parts required for each item.
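The percentage and the angle of each sector can be worked out in a few lines. The sketch below is a minimal Python example using the family's expenditure figures; the function name `pie_shares` is an assumption made for this illustration.

```python
# Weekly expenditure from the illustration (same units for every item).
expenses = {"Mortgage": 300, "Food": 225, "Fuel": 75}

def pie_shares(data):
    """Return {item: (percentage of total, sector angle in degrees)}."""
    total = sum(data.values())
    return {
        name: (100 * value / total, 360 * value / total)
        for name, value in data.items()
    }

shares = pie_shares(expenses)
for name, (percent, angle) in shares.items():
    print(f"{name:<9} {percent:5.1f}%  {angle:5.1f} degrees")
```

Only the ratios matter here, so the result is the same whether the amounts are entered in Ksh or in hundreds of Ksh.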
• It is simple to read a pie chart. Just look at the required sector representing an item (or
category) and read off the value. For example, the weekly expenditure of the family on
food is 37.5% of the total expenditure measured.
• A pie chart is used to compare the different parts that make up a whole amount.
2.4 Graphs
A graph is a visual representation of data by a continuous curve on squared (graph) paper. Like
diagrams, graphs are attractive and eye-catching, giving a bird's-eye view of the data and
revealing their inner pattern. The main graphs of a frequency distribution are:
1. Histogram
2. Frequency Polygon
3. Frequency Curve
4. Ogive or Cumulative Frequency Curve
2.4.1 Histogram
A histogram is a pictorial representation of a grouped frequency distribution by means of adjacent
rectangles whose areas are proportional to the frequencies.
To construct a histogram, the class intervals are plotted along the x-axis and the corresponding
frequencies along the y-axis. The width of each rectangle is equal to the width of its class. If all
the classes have equal width, all the rectangles stand on equal bases and the height of each
rectangle is proportional to the frequency of the class. If the classes have unequal widths, the
rectangles stand on unequal bases, and the heights must be adjusted (frequency divided by class
width) so that the areas stay proportional to the frequencies. For open-ended classes, the
histogram is constructed after making certain assumptions. As the rectangles are adjacent, leaving
no gaps, class intervals of the inclusive type need an adjustment at their end points only.
For example, at a book sale you want to determine which books were most popular: the high-priced
books or the low-priced books, and which books were most neglected. Let us say you sold a total of
31 books at this book fair at the following prices.
Sh. 2, Sh. 1, Sh. 2, Sh. 2, Sh. 3, Sh. 5, Sh. 6, Sh. 17, Sh. 17, Sh. 7, Sh. 15, Sh. 7, Sh. 7, Sh. 18,
Sh. 8, Sh. 10, Sh. 10, Sh. 9, Sh. 13, Sh. 11, Sh. 12, Sh. 12, Sh. 12, Sh. 14, Sh. 16, Sh. 18, Sh. 20,
Sh. 24, Sh. 21, Sh. 22, Sh. 25.
The prices range from Sh.1 to Sh.25. Divide this range into a number of groups, called class
intervals. Typically there should be no fewer than 5 and no more than 20 class intervals in a
frequency histogram.
The first class interval includes the lowest price in the data and the last class interval includes
the highest price. Also make sure that the intervals do not overlap, so that no price falls into two
class intervals. For example, if you have class intervals 0-5, 5-10, 10-15 and so on, then the price
Sh.10 falls in both 5-10 and 10-15. If instead we use Sh.1 - Sh.5, Sh.6 - Sh.10 and so on, the class
intervals will be mutually exclusive.
Class interval     Frequency (f)
Sh.1 - Sh.5        6
Sh.6 - Sh.10       8
Sh.11 - Sh.15      7
Sh.16 - Sh.20      6
Sh.21 - Sh.25      4
Total              n = Σf = 31
Note that each class-interval is of equal width i.e. Sh.5 inclusive. Now we draw the frequency
Histogram as under.
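The tally behind the histogram can be checked with a short Python sketch. It counts the 31 prices exactly as listed above into the classes Sh.1 - Sh.5, Sh.6 - Sh.10 and so on, and prints a text version of the histogram. The function name `tally` and the `#` marks are assumptions made for this example.

```python
# The 31 book prices (in Sh.) exactly as listed in the notes.
prices = [2, 1, 2, 2, 3, 5, 6, 17, 17, 7, 15, 7, 7, 18,
          8, 10, 10, 9, 13, 11, 12, 12, 12, 14, 16, 18, 20,
          24, 21, 22, 25]

def tally(values, lowest, width, number_of_classes):
    """Count how many values fall in each inclusive class of equal width."""
    counts = [0] * number_of_classes
    for value in values:
        counts[(value - lowest) // width] += 1
    return counts

counts = tally(prices, lowest=1, width=5, number_of_classes=5)
for index, frequency in enumerate(counts):
    lower = 1 + 5 * index
    upper = lower + 4
    print(f"Sh.{lower:>2} - Sh.{upper:<2} | {'#' * frequency} {frequency}")
```

With the prices as listed, the five classes receive 6, 8, 7, 6 and 4 books, 31 in total. Since all classes have the same width, the length of each row of marks is proportional to the frequency, as in the histogram.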
To construct an Ogive:-
1) Add up the frequencies progressively, class by class, to get the cumulative
frequencies.
2) Plot the classes on the horizontal axis (x-axis) and the cumulative frequencies on the vertical
axis (y-axis).
3) Join the points by a smooth curve. Note that an Ogive meets zero on the vertical axis at an
outer class limit (the lower limit of the first class or the upper limit of the last class). In most
cases the curve looks like an 'S'.
Note that cumulative frequencies are plotted against the 'limits' of the classes to which they
refer.
(A) Less than Ogive: - To plot a less than Ogive, the data is arranged in ascending order of
magnitude and the frequencies are cumulated starting from the top. It starts from zero on the y-
axis and the lower limit of the lowest class interval on the x-axis.
(B) Greater than Ogive: - To plot this Ogive, the data are arranged in the ascending order of
magnitude and frequencies are cumulated from the bottom. This curve ends at zero on the y-axis
and the upper limit of the highest class interval on the x-axis.
Illustration: - On a graph paper, draw the two Ogives for the data given below of the I.Q. of
160 students.
I.Q.               110 - 120   120 - 130   130 - 140   140 - 150   150 - 160
No. of students    36          18          10          4           1
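The points to plot for the two Ogives can be computed as in the sketch below. It uses only the five classes that appear in the table above; their frequencies add up to 69 rather than 160, so the table as printed is probably incomplete, and the numbers serve only to show the method. The variable names are assumptions made for this example.

```python
from itertools import accumulate

# Only the five classes shown above (assumption: the printed table is incomplete).
classes = [(110, 120), (120, 130), (130, 140), (140, 150), (150, 160)]
frequencies = [36, 18, 10, 4, 1]

# Less than Ogive: cumulate from the top, plot against upper class limits.
less_than = list(accumulate(frequencies))
less_points = [(classes[0][0], 0)] + [
    (upper, total) for (_, upper), total in zip(classes, less_than)
]

# Greater than Ogive: cumulate from the bottom, plot against lower class limits.
greater_than = list(accumulate(reversed(frequencies)))[::-1]
greater_points = [
    (lower, total) for (lower, _), total in zip(classes, greater_than)
] + [(classes[-1][1], 0)]

print("Less than:   ", less_points)
print("Greater than:", greater_points)
```

The less than Ogive starts at zero at the lower limit of the first class, and the greater than Ogive ends at zero at the upper limit of the last class, as described in (A) and (B).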
Uses: - Certain values like the median, quartiles, deciles, quartile deviation, coefficient of
skewness etc. can be located using Ogives. An Ogive can also be used to find the percentage of
items having values less than (or greater than) a given value.
A stem and leaf diagram provides a visual summary of your data. This diagram provides a
partial sorting of the data and allows you to detect the distributional pattern of the data.
There are three steps for drawing a stem and leaf diagram.
1. Split the data into two pieces, stem and leaf.
2. Arrange the stems from low to high.
3. Attach each leaf to the appropriate stem.
Illustration
154, 143, 148, 139, 143, 147, 153, 162, 136, 147, 144, 143, 139, 142, 143, 156, 151, 164, 157,
149, 146
What we have here is almost a stem and leaf diagram. Note that with the data written in this way
you can see what the modal class is (the one with the most values). You can also see the shape of
the distribution: most of the values are in the 140s, with higher or lower values rarer.
To change this into a stem and leaf diagram, we just simplify it a little. Instead of writing out the
full figures each time (143, 143, 144, 143, ...) we write '14' and call this the 'stem' and then write
3, 3, 4, 3, ... (these being the 'leaves'). We would usually, however, write the leaves in order (with
the smallest first). Finally, we must also include a little key so that people know how to interpret
the diagram.
So we finish up with:
Stem | Leaf
13   | 6 9 9
14   | 2 3 3 3 3 4 6 7 7 8 9
15   | 1 3 4 6 7
16   | 2 4
KEY: 13 | 6 = 136
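A minimal Python sketch of the three steps (split, arrange the stems, attach the leaves) for the 21 values of this illustration is given here. It assumes that the stem is the number of tens and the leaf is the units digit; the function name `stem_and_leaf` is a hypothetical choice.

```python
from collections import defaultdict

# The 21 values from the illustration.
values = [154, 143, 148, 139, 143, 147, 153, 162, 136, 147, 144,
          143, 139, 142, 143, 156, 151, 164, 157, 149, 146]

def stem_and_leaf(data):
    """Group each value by its stem (tens) and list the leaves (units) in order."""
    groups = defaultdict(list)
    for value in sorted(data):          # sorting first puts the leaves in order
        groups[value // 10].append(value % 10)
    return dict(sorted(groups.items()))  # stems from low to high

diagram = stem_and_leaf(values)
for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 13 | 6 = 136")
```

The longest row is the stem 14, which is the modal class mentioned earlier.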
       BOYS | Stem | GIRLS
        3 4 |  40  | 5 4 1 2 8 5
    3 5 5 0 |  50  | 2 3 5 8 9 4
2 2 3 3 4 5 |  60  | 3 5 6 4 5
5 5 2 8 0 2 |  70  | 0 3 3
    3 1 3 4 |  80  | 3 6 4
      4 4 9 |  90  | 3 4
KEY: 40 | 5 = 45
Can you comment on the shape of the distribution of the two sets of data?
A box plot (box-and-whisker plot) goes one step further than a stem and leaf diagram. It displays
a number of statistics such as the median, the lower quartile (Q1), the upper quartile (Q3) and the
inter-quartile range (IQR). It tells us about the symmetry of the distribution and also gives us an
idea of the highest and the lowest values.
Illustration
Statistics CAT scores of 12 students are as follows:-
10, 22, 24, 27, 31, 33, 39, 40, 42, 43, 44, 45
Solution: The scores are arranged in the ascending order. 10, 22, 24, 27, 31, 33, 39, 40, 42, 43,
44, 45
1) The median is the middle score. Its position is
$$\frac{n+1}{2} = \frac{12+1}{2} = 6.5$$
Therefore the median is the average of the 6th and 7th scores, i.e.
$$\text{Median} = \frac{1}{2}(33 + 39) = \frac{1}{2}(72) = 36$$
2) The lower quartile (Q1) is the median of the bottom half, i.e. the 25th percentile.
Thus
$$Q_1 = \frac{n+1}{4} = \frac{12+1}{4} = 3.25 \approx \text{3rd score} = 24$$
3) The upper quartile (Q3) is the median of the top half, i.e. the 75th percentile.
Thus
$$Q_3 = \frac{3(n+1)}{4} = \frac{3(12+1)}{4} = 9.75 \approx \text{10th score} = 43$$
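The position rule of this illustration can be sketched in Python as follows. The rounding convention is an assumption read off the worked solution: a position exactly halfway between two scores takes their average, and any other position is rounded to the nearest score. Other textbooks and software use different quartile conventions, so their results may differ slightly. The function name `value_at_position` is hypothetical.

```python
from math import floor

def value_at_position(sorted_scores, position):
    """Score at a 1-based position, using the rounding rule assumed from the notes."""
    lower = floor(position)
    fraction = position - lower
    if fraction == 0.5:
        # Exactly halfway: average the two neighbouring scores.
        return (sorted_scores[lower - 1] + sorted_scores[lower]) / 2
    nearest = lower + 1 if fraction > 0.5 else lower
    return sorted_scores[nearest - 1]

scores = sorted([10, 22, 24, 27, 31, 33, 39, 40, 42, 43, 44, 45])
n = len(scores)

median = value_at_position(scores, (n + 1) / 2)   # position 6.5
q1 = value_at_position(scores, (n + 1) / 4)       # position 3.25, 3rd score
q3 = value_at_position(scores, 3 * (n + 1) / 4)   # position 9.75, 10th score
iqr = q3 - q1

print("Median:", median, "Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("Lowest:", scores[0], "Highest:", scores[-1])
```

These five values (lowest, Q1, median, Q3, highest) are all that is needed to draw the box and its whiskers.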
Now the box-plot is constructed as follows: -
i) The line inside the box indicates the median.
ii) The left side of the box indicates the lower quartile (Q1).
iii) The right side of the box indicates the upper quartile (Q3).
iv) A straight line is then drawn from the lowest value of the distribution through the box to the
highest value of the distribution. The parts of this horizontal line outside the box are called the
"whiskers".
The box-plot of the above CAT scores will look like this:
[Figure: box plot of the CAT scores on a horizontal scale from 0 to 60.]
2.6 Exercise
1. The bar chart below shows the number of people in a selection of families.
[Figure: bar chart; horizontal axis: number of people in a family (3 to 10); vertical axis: number of families (0 to 10).]
(c) Find, correct to the nearest whole number, the mean number of people in a
family.
(a) Construct a cumulative frequency table for the data in the table.
3. The following table shows the age distribution of teachers who smoke at Fegi High
School.
Ages (x)       Number of smokers
20 ≤ x < 30    5
30 ≤ x < 40    4
40 ≤ x < 50    3
50 ≤ x < 60    2
60 ≤ x < 70    3
180 184 195 177 175 173 169 167 197 173 166 183 161 195 177
192 161 165
5. The following stem and leaf diagram gives the heights in cm of 39 schoolchildren.
Stem | Leaf
13   | 2, 3, 3, 5, 8
14   | 1, 1, 1, 4, 5, 5, 9
15   | 3, 4, 4, 6, 6, 7, 7, 7, 8, 9, 9
16   | 1, 2, 2, 5, 6, 6, 7, 8, 8
17   | 4, 4, 4, 5, 6, 6
18   | 0
Key: 13 | 2 represents 132 cm.
(a) (i) State the lower quartile height,