
Topic 2 - Data Presentation

Uploaded by

agoyamiracle

2.0 ORGANIZATION AND REPRESENTATION OF DATA


2.1 General Principles of Constructing Diagrams
1. The diagrams should be simple.
2. Each diagram must be given a clear, concise and suitable title without damaging clarity.
3. A proper proportion between height and width must be maintained in order to avoid an
unpleasant look.
4. Select a proper scale, preferably in even numbers or in multiples of five or ten,
e.g. 25, 50, 75 or 10, 20, 30, 40, etc. There is, however, no fixed rule.
5. Use footnotes to clarify particular points.
6. An index, explaining different lines, shades and colors should be given.
7. Diagrams should be absolutely neat and clean.

• "The important point that must be borne in mind at all times is that the pictorial
representation chosen for any situation must depict the true relationship and point out the
proper conclusion. Above all the chart must be honest." C. W. LOWE.

2.2 Bar Diagrams

2.2.1 Simple 'Bar diagram'

It represents only one variable. For example sales, production, population figures etc. for various
years may be shown by simple bar charts. Since these are of the same width and vary only in
heights (or lengths), it becomes very easy for readers to study the relationship. Simple bar
diagrams are very popular in practice. A bar chart can be either vertical or horizontal; vertical
bars are more popular.

Illustration: - The following table gives the birth rate per thousand of different countries over a
certain period of time.
Country    Birth rate    Country        Birth rate
India      33            China          40
Germany    15            New Zealand    30
U. K.      20            Sweden         15

Represent the above data by a suitable diagram.
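
As a rough illustration, the table can be turned into a text-only bar chart in Python. This is a minimal sketch and not part of the original notes: the names `birth_rates` and `text_bar_chart` are made up for the example, and one `#` is assumed to stand for one birth per thousand.

```python
# Birth rate per thousand, copied from the table above.
birth_rates = {
    "India": 33,
    "Germany": 15,
    "U. K.": 20,
    "China": 40,
    "New Zealand": 30,
    "Sweden": 15,
}


def text_bar_chart(data):
    """Return one line per category; the bar length equals the value."""
    width = max(len(name) for name in data)
    return [f"{name:<{width}} | {'#' * value} {value}" for name, value in data.items()]


for line in text_bar_chart(birth_rates):
    print(line)

# Longest and shortest bars (ties are all reported).
highest = max(birth_rates, key=birth_rates.get)
lowest_value = min(birth_rates.values())
lowest = [name for name, value in birth_rates.items() if value == lowest_value]
print("Highest:", highest, "| Lowest:", lowest)
```

All bars share the same thickness (one text line), so only their lengths carry information, which is the defining property of a simple bar diagram.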


Comparing the sizes of the bars, you can easily see that China's birth rate is the highest, while
Germany and Sweden share the lowest position.

2.2.2 Sub - divided Bar Diagram

Such diagrams are also known as component bar diagrams: each bar represents a total and is
divided into segments, one for each component. While constructing such a diagram, the various
components in each bar should be kept in the same order. A common and helpful arrangement is to
present each bar in order of magnitude, with the largest component at the bottom and the smallest
at the top. The components are shown with different shades or colors, with a proper index.

Illustration: - During 2016 - 2019, the numbers of students in XEE University were as follows.
Represent the data by a sub-divided bar diagram.

           2016-2017    2017-2018    2018-2019
Arts       20,000       26,000       31,000
Science    10,000       9,000        9,500
Law        5,000        7,000        7,500
Total      35,000       42,000       48,000

[Figure: bar diagram of the student numbers. Horizontal axis: Arts, Science, Law, Total; vertical
axis: number of students, 0 to 140,000; legend: 2016-2017, 2017-2018, 2018-2019]
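
To make the stacking concrete, the sketch below works out where each segment of a sub-divided bar starts and ends. It assumes one bar per academic year with the faculties as components, kept in the same order (largest at the bottom) in every bar; the names `students` and `stack_segments` are hypothetical.

```python
# Student numbers per faculty, copied from the table above.
students = {
    "2016-2017": {"Arts": 20000, "Science": 10000, "Law": 5000},
    "2017-2018": {"Arts": 26000, "Science": 9000, "Law": 7000},
    "2018-2019": {"Arts": 31000, "Science": 9500, "Law": 7500},
}


def stack_segments(components):
    """Return (name, bottom, top) for each segment, stacked in the given order."""
    segments = []
    bottom = 0
    for name, value in components.items():
        segments.append((name, bottom, bottom + value))
        bottom += value
    return segments


for year, components in students.items():
    print(year, stack_segments(components))
```

The top of the last segment in each bar equals that year's total (35,000, 42,000 and 48,000), so the full height of the bar shows the total while the segments show its make-up.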

2.2.3 Multiple Bar Diagram


This method can be used for data made up of two or more components. The components are shown
as separate adjoining bars, and the height of each bar represents the actual value of the
component. The components are distinguished by different shades or colors. Multiple bar charts
are used when only the changes in the actual values of the components are of interest.

Illustration: - The table below gives data relating to the exports and imports of a certain country
X (in thousands of dollars) during the four years from 2015 - 2016 to 2018 - 2019.

Year         Export    Import
2015-2016    319       250
2016-2017    339       263
2017-2018    345       258
2018-2019    308       206

Represent the data by a suitable diagram


[Figure: multiple bar diagram. Horizontal axis: years 2015-2016 to 2018-2019; vertical axis:
value in thousands of dollars, 0 to 400; legend: Export, Import]

2.2.4 Deviation Bar Charts

Deviation bars are used to represent net quantities - excess or deficit i.e. net profit, net loss, net
exports or imports, swings in voting etc. Such bars have both positive and negative values.
Positive values lie above the base line and negative values lie below it.

Illustration:-

Years        Sales    Net profits
1985 - 86    10%      50%
1986 - 87    14%      -20%
1987 - 88    12%      -10%

Present the above data by a suitable diagram showing the sales and net profits of private
industrial companies.

2.3 Pie Chart

i) Geometrically, the area of a sector of a circle, taken radially, is proportional to the angle
at its center. It is therefore sufficient to draw angles at the center proportional to the
original figures. This makes the areas of the sectors proportional to the basic figures.

For example, let the total be 1000 and one of the components be 200; then the angle will be

(200 / 1000) × 360° = 72°

In general, the angle of the sector at the center corresponding to a component is

(component value / total) × 360°

ii) When a statistical phenomenon is composed of numerous components (say four or more), bar
charts are not suitable for representing them because they become very complex and lose their
visual impact. A pie diagram is suitable for such situations. It is a circular diagram in which a
circle (pie) is divided by radii into sectors (like slices of a cake or pie). The area of a sector
is proportional to the size of the component it represents. Pie charts are useful for comparing
the different parts of a whole amount. They are often used to present financial information,
e.g. a company's expenditure can be shown as the sum of its parts, including expense categories
such as salaries, borrowing interest, taxation and general running costs (i.e. rent, electricity,
heating etc.).

A pie chart is a circular chart in which the circle is divided into sectors. Each sector visually
represents an item in a data set to match the amount of the item as a percentage or fraction of the
total data set.

Illustration

A family's weekly expenditure on its house mortgage, food and fuel is as follows:
Expense     Amount (Ksh '00)
Mortgage    300
Food        225
Fuel        75

Draw a pie chart to display the information.

Solution:

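The total weekly expenditure = Ksh 30,000 + Ksh 22,500 + Ksh 7,500

= Ksh 60,000

We can find what percentage of the total expenditure each item represents.

Percentage of weekly expenditure on:

Mortgage = (30,000 / 60,000) × 100% = 50%
Food = (22,500 / 60,000) × 100% = 37.5%
Fuel = (7,500 / 60,000) × 100% = 12.5%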

To draw a pie chart, divide the circle into 100 percentage parts. Then allocate the number of
percentage parts required for each item.

• It is simple to read a pie chart. Just look at the required sector representing an item (or
category) and read off the value. For example, the weekly expenditure of the family on
food is 37.5% of the total expenditure measured.
• A pie chart is used to compare the different parts that make up a whole amount.
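
The percentage and angle calculations for this illustration can be checked with a few lines of Python. This is only a sketch: `expenses` and `pie_shares` are names invented for the example, and the angle uses the (component / total) × 360° rule from the start of this section.

```python
# Weekly expenditure in Ksh, taken from the illustration above.
expenses = {"Mortgage": 30000, "Food": 22500, "Fuel": 7500}


def pie_shares(data):
    """Return {item: (percentage of total, sector angle in degrees)}."""
    total = sum(data.values())
    return {
        item: (value / total * 100, value / total * 360)
        for item, value in data.items()
    }


for item, (percent, angle) in pie_shares(expenses).items():
    print(f"{item}: {percent}% of the total, sector angle {angle} degrees")
```

The percentages should add up to 100 and the angles to 360; checking these sums is a quick way to catch arithmetic slips before drawing the chart.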

2.4 Graphs
A graph is a visual representation of data by a continuous curve on a squared (graph) paper. Like
diagrams, graphs are also attractive, and eye-catching, giving a bird's eye-view of data and
revealing their inner pattern.

Graphs of Frequency Distributions:-

The methods used to represent grouped data are:-

1. Histogram
2. Frequency Polygon
3. Frequency Curve
4. Ogive or Cumulative Frequency Curve

2.4.1 Histogram
It is defined as a pictorial representation of a grouped frequency distribution by means of adjacent
rectangles, whose areas are proportional to the frequencies.

To construct a Histogram, the class intervals are plotted along the x-axis and the corresponding
frequencies along the y-axis. Each rectangle has a width equal to the width of its class and,
when all the classes are equally wide, a height proportional to the frequency of the class. If the
classes have unequal widths, the rectangles stand on unequal bases and the heights must be
scaled (frequency divided by class width) so that the areas stay proportional to the frequencies.
For open-ended classes, the Histogram is constructed after making certain assumptions about the
missing limit. Because the rectangles are adjacent and leave no gaps, class intervals of the
inclusive type must have their end points adjusted to class boundaries.

For example, at a book sale you may want to determine which books were most popular: the high
priced books or the low priced books, and which books were most neglected. Suppose you sold a
total of 31 books at this book fair at the following prices.

Sh. ....2, Sh. 1, Sh. 2, Sh. 2, Sh. 3, Sh. 5, Sh. 6, Sh. 17, Sh. 17, Sh. 7, Sh. 15, Sh. 7, Sh. 7, Sh. 18,
Sh. 8, Sh. 10, Sh. 10, Sh. 9, Sh. 13, Sh. 11, Sh. 12, Sh. 12, Sh. 12, Sh. 14, Sh. 16, Sh. 18, Sh. 20,
Sh. 24, Sh. 21, Sh. 22, Sh. 25.

The prices range from Sh.1 to Sh.25. Divide this range into a number of groups, called class
intervals. Typically, between 5 and 20 class intervals work best for a frequency Histogram.

The first class-interval includes the lowest price in the data and the last class-interval
includes the highest price. Also make sure that the intervals do not overlap, so that no price
falls into two class-intervals. For example, with class intervals 0-5, 5-10, 10-15 and so on,
the price Sh.10 falls in both 5-10 and 10-15. If we instead use Sh.1 - Sh.5, Sh.6 - Sh.10 and so
on, the class-intervals are mutually exclusive.

We therefore have the following distribution of books at the book fair:

Class-interval    Frequency
Sh.1 - Sh.5       6
Sh.6 - Sh.10      8
Sh.11 - Sh.15     10
Sh.16 - Sh.20     3
Sh.21 - Sh.25     4
Total             n = Σfi = 31

Note that each class-interval is of equal width i.e. Sh.5 inclusive. Now we draw the frequency
Histogram as under.
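
The tally behind a frequency table of this kind can be sketched in Python as follows. The code is illustrative only: `prices`, `classes` and `tally` are hypothetical names, and the first entry of the price list is read as Sh. 2, which is an assumption. Run on the prices exactly as printed, it gives 6, 8, 7, 6 and 4, so the third and fourth classes differ from the table (10 and 3); the printed list or the table presumably contains a misprint.

```python
# Prices in Sh. as listed above (the first entry is read as Sh. 2, an assumption).
prices = [
    2, 1, 2, 2, 3, 5, 6, 17, 17, 7, 15, 7, 7, 18,
    8, 10, 10, 9, 13, 11, 12, 12, 12, 14, 16, 18, 20,
    24, 21, 22, 25,
]

# Inclusive, non-overlapping class intervals of width 5.
classes = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25)]


def tally(values, intervals):
    """Count how many values fall in each inclusive (low, high) interval."""
    return [sum(low <= v <= high for v in values) for low, high in intervals]


frequencies = tally(prices, classes)

# A text histogram: one row per class, one '#' per book.
for (low, high), f in zip(classes, frequencies):
    print(f"Sh.{low:>2} - Sh.{high:>2} | {'#' * f} {f}")
print("n =", sum(frequencies))
```

Because the intervals are inclusive and do not overlap, every price is counted exactly once, so the frequencies must add up to n = 31.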

2.4.2 Frequency Distribution (Curve):-


Frequency distribution curves are like frequency polygons. In frequency distribution, instead of
using straight line segments, a smooth curve is used to connect the points. The frequency curve
for the above data is shown as:

2.4.3 Ogives or Cumulative Frequency Curves
When frequencies are added successively, they are called cumulative frequencies. The curve
obtained by plotting cumulative frequencies is called a cumulative frequency curve or an Ogive
(pronounced ojive).

To construct an Ogive:-
1) Add up the progressive totals of frequencies, class by class, to get the cumulative
frequencies.
2) Plot classes on the horizontal (x-axis) and cumulative frequencies on the vertical (y-axis).
3) Join the points by a smooth curve. Note that a less than Ogive (i) starts at zero on the
vertical axis and (ii) ends at the upper class limit of the last class. In most cases it looks
like an 'S'.
Note that cumulative frequencies are plotted against the 'limits' of the classes to which they
refer.
(A) Less than Ogive: - To plot a less than Ogive, the data is arranged in ascending order of
magnitude and the frequencies are cumulated starting from the top. It starts from zero on the y-
axis and the lower limit of the lowest class interval on the x-axis.
(B) Greater than Ogive: - To plot this Ogive, the data are arranged in the ascending order of
magnitude and frequencies are cumulated from the bottom. This curve ends at zero on the y-axis
and the upper limit of the highest class interval on the x-axis.

Illustrations: - On graph paper, draw the two Ogives for the data given below on the I.Q. of
160 students.

Class-interval    No. of students
60 - 70           2
70 - 80           7
80 - 90           12
90 - 100          28
100 - 110         42
110 - 120         36
120 - 130         18
130 - 140         10
140 - 150         4
150 - 160         1

Uses: - Certain values like the median, quartiles, deciles, quartile deviation, coefficient of
skewness etc. can be located using Ogives. An Ogive can also be used to find the percentage of
items having values less than (or greater than) a given value.
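
The points to plot for the two Ogives of the I.Q. illustration can be computed as in the sketch below. It assumes the construction described in (A) and (B): less than totals are plotted against upper class limits and greater than totals against lower class limits. The variable names are made up for the example.

```python
from itertools import accumulate

# Class limits and frequencies from the I.Q. illustration above.
limits = [60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
frequencies = [2, 7, 12, 28, 42, 36, 18, 10, 4, 1]

# Less than Ogive: cumulate from the top, plot against upper class limits.
less_than = list(accumulate(frequencies))
less_than_points = [(limits[0], 0)] + list(zip(limits[1:], less_than))

# Greater than Ogive: cumulate from the bottom, plot against lower class limits.
greater_than = list(accumulate(reversed(frequencies)))[::-1]
greater_than_points = list(zip(limits[:-1], greater_than)) + [(limits[-1], 0)]

print(less_than_points)
print(greater_than_points)
```

The less than curve rises from 0 at I.Q. 60 to 160 at I.Q. 160, and the greater than curve falls from 160 to 0 over the same range. The two curves cross at the median.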

2.4.4 Stem and Leaf Diagram

A stem and leaf diagram provides a visual summary of your data. This diagram provides a
partial sorting of the data and allows you to detect the distributional pattern of the data.

There are three steps for drawing a stem and leaf diagram.
1. Split the data into two pieces, stem and leaf.
2. Arrange the stems from low to high.
3. Attach each leaf to the appropriate stem.

Illustration

Suppose you have the heights of 21 people as follows:

154, 143, 148, 139, 143, 147, 153, 162, 136, 147, 144, 143, 139, 142, 143, 156, 151, 164, 157,
149, 146

Writing these values in rows, one row for each group of ten (130s, 140s, 150s, 160s), gives
almost a stem and leaf diagram. With the data written in this way you can see what the modal
class is (the one with the most values). You can also see the shape of the distribution: most
of the values are in the 140s, with higher or lower values rarer.

To change this into a stem and leaf diagram, we just simplify it a little. Instead of writing out the
full figures each time (143, 143, 144, 143, ...) we write '14' and call this the 'stem' and then write
3, 3, 4, 3, ... (these being the 'leaves'). We would usually, however, write the leaves in order (with
the smallest first). Finally, we must also include a little key so that people know how to interpret
the diagram.

So we finish up with:

13 | 6 9 9
14 | 2 3 3 3 3 4 6 7 7 8 9
15 | 1 3 4 6 7
16 | 2 4

Key: 13 | 6 means 136
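
The three steps (split, order the stems, attach the leaves) can also be sketched in Python. This is an illustrative sketch rather than a prescribed method: it assumes the stem is every digit except the last and the leaf is the last digit, and `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

# Heights as listed in the illustration above.
heights = [
    154, 143, 148, 139, 143, 147, 153, 162, 136, 147, 144,
    143, 139, 142, 143, 156, 151, 164, 157, 149, 146,
]


def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    return dict(sorted(stems.items()))


diagram = stem_and_leaf(heights)
for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 13 | 6 means 136")
```

Sorting the values before splitting them means the leaves on each stem come out in increasing order, as the text recommends.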

2.4.5 Back-to-back stem and leaf diagram


Back-to-back stem plots are used to compare two distributions side-by-side. This type of double
stem plot contains three columns, each separated by a vertical line. The center column contains
the stems. The first and third columns each contain the leaves of a different distribution. The
numbers for the leaves of the distribution in the leftmost column are aligned to the right and are
listed in increasing order from right to left. Here is an example of a back-to-back stem plot
comparing the distribution of marks obtained in an exam by a sample of 25 boys and 25 girls.

        BOYS | Stem | GIRLS
         3 4 |  40  | 5 4 1 2 8 5
     3 5 5 0 |  50  | 2 3 5 8 9 4
 2 2 3 3 4 5 |  60  | 3 5 6 4 5
 5 5 2 8 0 2 |  70  | 0 3 3
     3 1 3 4 |  80  | 3 6 4
       4 4 9 |  90  | 3 4

KEY: 40 | 5 = 45

Can you comment on the shape of the distribution of the two sets of data?

2.5 Box and Whisker Plots

It goes one step further than the stem-and-leaf diagram. It displays a number of statistics such
as the median, lower quartile (Q1), upper quartile (Q3) and inter-quartile range (IQR). It tells
us about the symmetry of the distribution and also shows the highest and the lowest values.

Illustration
Statistics CAT scores of 12 students are as follows:-

10, 22, 24, 27, 31, 33, 39, 40, 42, 43, 44, 45

Draw a box and whisker plot to represent the above scores.

Solution: The scores are arranged in ascending order: 10, 22, 24, 27, 31, 33, 39, 40, 42, 43,
44, 45

1) Since n = 12 (total items), the two middle scores are the 12/2 = 6th and (12 + 2)/2 = 7th
scores, i.e. 33 and 39 respectively.

The median lies at position (n + 1)/2 = (12 + 1)/2 = 6.5, so it is the average of these two
scores:

Median = (33 + 39)/2 = 72/2 = 36

2) The lower quartile (Q1) is the median of the bottom half, i.e. the 25th percentile.
Thus

Position of Q1 = (n + 1)/4 = (12 + 1)/4 = 3.25 ≅ 3rd score, so Q1 = 24

3) The upper quartile (Q3) is the median of the top half, i.e. the 75th percentile.
Thus

Position of Q3 = 3(n + 1)/4 = 3(12 + 1)/4 = 9.75 ≅ 10th score, so Q3 = 43

Now the box-plot is constructed as follows: -
i) the line inside the box indicates the median.
ii) The left side of this box indicates the lower quartile (Q1).
iii) The right side of this box indicates the upper quartile (Q3).
iv) A straight line is then drawn from the lowest value of this distribution through the box to the
highest value of this distribution. This horizontal straight line is called the
"Whiskers".

Then the above CAT scores in a box-plot will look like this:

[Figure: box-and-whisker plot of the CAT scores on a horizontal scale from 0 to 60]
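
The five values needed to draw the box-plot can be computed as in the following sketch. It follows the positional rule used in the solution above, (n + 1)/4 and 3(n + 1)/4 rounded to the nearest whole position, which is only one of several quartile conventions; other textbooks and software may give slightly different quartiles. The helper names `value_at` and `five_number_summary` are hypothetical.

```python
import math

# Statistics CAT scores from the illustration above.
scores = [10, 22, 24, 27, 31, 33, 39, 40, 42, 43, 44, 45]


def value_at(sorted_values, position):
    """Return the score at a 1-based position, rounded to the nearest whole position."""
    nearest = math.floor(position + 0.5)
    return sorted_values[nearest - 1]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using (n + 1)-based positions."""
    data = sorted(values)
    n = len(data)
    middle = (n + 1) / 2
    # Mean of the scores on either side of the middle position
    # (for odd n both indices point at the same score).
    median = (data[math.floor(middle) - 1] + data[math.ceil(middle) - 1]) / 2
    q1 = value_at(data, (n + 1) / 4)
    q3 = value_at(data, 3 * (n + 1) / 4)
    return data[0], q1, median, q3, data[-1]


print(five_number_summary(scores))
```

The box runs from Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum. The inter-quartile range is Q3 minus Q1.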

2.6 Exercise

1. The bar chart below shows the number of people in a selection of families.

[Figure: bar chart. Horizontal axis: number of people in a family (3 to 10); vertical axis:
number of families (0 to 10)]

(a) How many families are represented?

(b) Write down the mode of the distribution.

(c) Find, correct to the nearest whole number, the mean number of people in a
family.

2. A marine biologist records as a frequency distribution the lengths (L), measured to
the nearest centimeter, of 100 mackerel. The results are given in the table below.

Length of mackerel (L cm)    Number of mackerel
27 < L ≤ 29                  2
29 < L ≤ 31                  4
31 < L ≤ 33                  8
33 < L ≤ 35                  21
35 < L ≤ 37                  30
37 < L ≤ 39                  18
39 < L ≤ 41                  12
41 < L ≤ 43                  5
Total                        100

(a) Construct a cumulative frequency table for the data in the table.

(b) Draw a cumulative frequency curve.

Hint: Plot your cumulative frequencies at the top of each interval.

3. The following table shows the age distribution of teachers who smoke at Fegi High
School.

Ages           Number of smokers
20 ≤ x < 30    5
30 ≤ x < 40    4
40 ≤ x < 50    3
50 ≤ x < 60    2
60 ≤ x < 70    3

(a) Calculate an estimate of the mean smoking age.

(b) Construct a histogram to represent this data.

4. The following results give the heights of sunflowers in centimeters.

180 184 195 177 175 173 169 167 197 173 166 183 161 195 177
192 161 165

Represent the data by a stem and leaf diagram.

5. The following stem and leaf diagram gives the heights in cm of 39 schoolchildren.

Key: 13 | 2 represents 132 cm.

Stem | Leaf
13   | 2, 3, 3, 5, 8
14   | 1, 1, 1, 4, 5, 5, 9
15   | 3, 4, 4, 6, 6, 7, 7, 7, 8, 9, 9
16   | 1, 2, 2, 5, 6, 6, 7, 8, 8
17   | 4, 4, 4, 5, 6, 6
18   | 0

(a) (i) State the lower quartile height,

(ii) State the median height

(iii) State the upper quartile height.

(b) Draw a box-and-whisker plot for the above information.
