CH 2
After data have been collected and organized, the next important task is to present the bulk
data effectively. The major objectives of data presentation are:
➢ To present data in a visual and more understandable form.
➢ To make the data more attractive to the reader.
➢ To facilitate quick comparisons using measures of location and dispersion.
➢ To enable the reader to determine the shape and nature of the distribution in order to make
statistical inferences.
➢ To facilitate further statistical analysis.
There are three methods of data presentation: tables (e.g., frequency distributions),
graphs (e.g., histograms), and diagrams (e.g., bar charts). All three are commonly used to
summarize both qualitative and quantitative data.
Arba Minch University Basic Statistics
Frequency Distribution: the organization of raw data in table form, using classes and
frequencies. The frequency (f) is the number of values in a specific class of the
distribution.
There are three basic types of frequency distribution, each with its own construction
procedure: categorical, ungrouped, and grouped frequency distributions.
Example 2.1: Twenty-five army inductees were given a blood test to determine their blood
type. The data set is given as follows:
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
❖ Solution:
The data are nominal, so a categorical frequency distribution is used to present them.
Following the six steps above gives the frequency distribution below.
A       B            C           D
Class   Tally        Frequency   Percent
A       /////        5           20
B       ///// //     7           28
O       ///// ////   9           36
AB      ////         4           16
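The tally in Example 2.1 can be checked with a short Python sketch. It is only an illustration: the function name `categorical_distribution` and the class order passed to it are choices made here, not part of the original notes.

```python
from collections import Counter

# Blood types of the 25 inductees from Example 2.1, row by row.
blood_types = """
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
""".split()


def categorical_distribution(values, classes):
    """Return (class, frequency, percent) rows in the given class order."""
    counts = Counter(values)
    n = len(values)
    return [(c, counts[c], 100 * counts[c] / n) for c in classes]


rows = categorical_distribution(blood_types, ["A", "B", "O", "AB"])
for cls, freq, pct in rows:
    print(f"{cls:<5} {freq:>3} {pct:>6.0f}")
```

The percent column is simply the class frequency divided by the total number of observations, multiplied by 100.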
Example 2.2: A demographer interested in the number of children a family may have took a
sample of 30 families and obtained the following observations.
Number of children in a sample of 30 families
4 2 4 3 2 8
3 4 4 2 2 8
5 3 4 5 4 5
4 3 5 2 7 3
3 6 7 3 8 4
Construct a frequency distribution for this data.
❖ Solution:
• Find the range: Range = Max − Min = 8 − 2 = 6.
• Arrange the individual observations in ascending or descending order of magnitude; the
resulting series is called an array. The array of the number of children in the 30
families is:
  2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 7, 7, 8, 8, 8
• The frequency distribution of the number of children in the 30 families is as follows:
No. of children   No. of families
2                 5
3                 7
4                 8
5                 4
6                 1
7                 2
8                 3
Total             30
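The range, the array and the ungrouped frequency counts for Example 2.2 can be reproduced with the following sketch. The variable names are illustrative and are not taken from the notes.

```python
from collections import Counter

# Number of children in the 30 sampled families (Example 2.2), row by row.
children = [
    4, 2, 4, 3, 2, 8,
    3, 4, 4, 2, 2, 8,
    5, 3, 4, 5, 4, 5,
    4, 3, 5, 2, 7, 3,
    3, 6, 7, 3, 8, 4,
]

data_range = max(children) - min(children)  # Range = Max - Min
array = sorted(children)                    # observations in ascending order
counts = Counter(children)

# One (value, frequency) pair per distinct value, in ascending order.
distribution = [(value, counts[value]) for value in sorted(counts)]

print("Range:", data_range)
for value, freq in distribution:
    print(f"{value:>2} children: {freq:>2} families")
```

In an ungrouped frequency distribution every distinct value forms its own class, so no class width is needed.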
3. Select the number of classes desired. There are two ways to do this:
i. Use Sturges' rule: K = 1 + 3.32 log n, where K is the number of classes, n is the
number of observations, and log is the base-10 logarithm. Round any decimal result
up to the next integer.
ii. Select the number of classes arbitrarily, conventionally between 5 and 20. This
method is appropriate when K cannot be calculated by Sturges' rule.
When choosing the number of classes, keep the following criteria in mind:
i. There should be between 5 and 20 classes.
ii. The classes must be mutually exclusive: no data value can fall into two different
classes.
iii. The classes must be all-inclusive (exhaustive): every data value must be included.
iv. The classes must be continuous: there are no gaps in a frequency distribution. The
only exception occurs when a class with zero frequency is the first or last class. A
class with zero frequency at either end can be omitted without affecting the
distribution.
v. The classes must be equal in width. The exception is the first or last class: it is
possible to have a "below ..." or "... and above" class. This is often used with ages.
4. Find the Class Width (W) by dividing the range by the number of classes:
W = R / K, that is, W = Range / Number of Classes
Note that: Round the value of W up to the next whole number if there is a remainder.
For instance, 4.7 ≈ 5 and 4.12 ≈ 5.
5. Select the starting point as the lower class limit (LCL) of the first class. This is
usually the lowest score (observation). Add the width to that score to get the LCL of the
next class. Keep adding until you reach the desired number of classes (K) calculated in
step 3.
6. Find the upper class limits (UCL): subtract the unit of measurement (U) from the LCL of
the second class to get the UCL of the first class. Then add W to each UCL in turn to get
all the UCLs.
Unit of measurement (U): the smallest possible difference between consecutive values
of the data. For instance, for the data 28, 23, 52 the unit of measurement is one: take
one datum arbitrarily, say 23; the next possible value is 24, so U = 24 − 23 = 1. If
the data set is 24.12, 30, 21.2, give priority to the datum with the most decimal
places. Take 24.12; the next possible value is 24.13, so U = 24.13 − 24.12 = 0.01.
Note that: U = 1 is the maximum value of the unit of measurement and is the value
used when we have no other information about the data.
7. Find the Class Boundaries (CB): Lower Class Boundary = Lower Class Limit − U/2 and
Upper Class Boundary = Upper Class Limit + U/2.
8. Find the Class Mark/Mid Points (CM)
9. Tally the data and write the numerical values for tallies in the frequency column.
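Steps 3 to 8 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names `sturges_classes` and `build_classes` are invented here, the logarithm is taken to base 10, U defaults to 1 (whole-number data), and the inputs (lowest value 6, highest value 39, n = 20) are those of Example 2.3.

```python
import math


def sturges_classes(n):
    """Step 3: K = 1 + 3.32 log10(n), rounded up to the next integer."""
    return math.ceil(1 + 3.32 * math.log10(n))


def build_classes(lowest, highest, n, unit=1):
    """Steps 3-8: class limits, boundaries and marks, starting at the lowest value."""
    k = sturges_classes(n)
    width = math.ceil((highest - lowest) / k)  # step 4: round W up
    classes = []
    for i in range(k):
        lcl = lowest + i * width               # step 5: keep adding W
        ucl = lcl + width - unit               # step 6: next LCL minus U
        classes.append({
            "limits": (lcl, ucl),
            "boundaries": (lcl - unit / 2, ucl + unit / 2),  # step 7
            "mark": (lcl + ucl) / 2,                         # step 8
        })
    return k, width, classes


k, width, classes = build_classes(lowest=6, highest=39, n=20)
print("K =", k, "W =", width)
for c in classes:
    print(c["limits"], c["boundaries"], c["mark"])
```

Because the sketch follows the steps literally, it is worth checking that the last class reaches the highest observation; when R / K is already a whole number the last upper limit can fall one unit short.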
Example 2.3: Consider the following set of data and construct the frequency distribution.
11 29 6 33 14 21 18 17 22 38
31 22 27 19 22 23 26 39 34 27
❖ Solution:
Following the steps for constructing a grouped frequency distribution:
1. Highest value = 39, Lowest value = 6
2. R = 39 − 6 = 33
3. K = 1 + 3.32 log 20 = 5.32 ≈ 6, which is the desired number of classes
4. W = R / K = 33 / 6 = 5.5 ≈ 6
5. Select the starting point. Take the minimum, which is 6, then add the width 6 to it to
get the LCL of the next class.
➢ Lower class limit of the first class: LCL1 = 6
➢ Lower class limit of the second class: LCL2 = LCL1 + W = 6 + 6 = 12
➢ LCL3 = LCL2 + W = 12 + 6 = 18
➢ LCL4 = LCL3 + W = 18 + 6 = 24, and so on.
The lower class limits are therefore 6, 12, 18, 24, 30, 36.
8. Using the class limits (CL) or the class boundaries (CB), the class mark (CM) of each
class is:
➢ CM1 = (LCL1 + UCL1)/2 = (6 + 11)/2 = 8.5 or (LCB1 + UCB1)/2 = (5.5 + 11.5)/2 = 8.5,
and so on.
9. Tally the data and write the numerical values for the tallies in the frequency column.
Class (K)   Class limit (CL)   Frequency (f)
1           6 - 11             2
2           12 - 17            2
3           18 - 23            7
4           24 - 29            4
5           30 - 35            3
6           36 - 41            2
Total frequency                20
11. Find the relative frequency (rf), which is the frequency (f) divided by the total
frequency (Tf), and the relative cumulative frequency (rcf), which is the cumulative
frequency (CF) divided by the total frequency (Tf). Here LCF and MCF denote the less
than and more than cumulative frequencies.
➢ rf1 = f1/Tf = 2/20 = 0.1, rf3 = f3/Tf = 7/20 = 0.35, and so on.
➢ less than type rcf1 = LCF1/Tf = 2/20 = 0.1,
➢ less than type rcf2 = LCF2/Tf = 4/20 = 0.2, and so on.
➢ more than type rcf1 = MCF1/Tf = 20/20 = 1,
➢ more than type rcf2 = MCF2/Tf = 18/20 = 0.9, and so on.
K   f   LCF   MCF   rf     rcf (less than type)   rcf (more than type)
1   2   2     20    0.10   0.10                   1.00
2   2   4     18    0.10   0.20                   0.90
3   7   11    16    0.35   0.55                   0.80
4   4   15    9     0.20   0.75                   0.45
5   3   18    5     0.15   0.90                   0.25
6   2   20    2     0.10   1.00                   0.10
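The cumulative and relative columns can be derived from the class frequencies alone. The sketch below uses the frequencies of Example 2.3; the variable names are illustrative, and values are rounded to two decimals to match the table.

```python
from itertools import accumulate

# Class frequencies from Example 2.3, in class order.
frequencies = [2, 2, 7, 4, 3, 2]
total = sum(frequencies)

# Less than cumulative frequency: running total from the first class.
lcf = list(accumulate(frequencies))
# More than cumulative frequency: running total from the last class.
mcf = list(accumulate(reversed(frequencies)))[::-1]

rf = [round(f / total, 2) for f in frequencies]
rcf_less = [round(c / total, 2) for c in lcf]
rcf_more = [round(c / total, 2) for c in mcf]

for row in zip(frequencies, lcf, mcf, rf, rcf_less, rcf_more):
    print(*row, sep="\t")
```

The last less than cumulative frequency and the first more than cumulative frequency both equal the total frequency, which is a quick check on the arithmetic.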
Therefore, the overall grouped frequency distribution of the given data set is shown below.
K   CL        CB            CM     f   LCF   MCF   rf     rcf (less than type)   rcf (more than type)
1   6 - 11    5.5–11.5      8.5    2   2     20    0.10   0.10                   1.00
2   12 - 17   11.5–17.5     14.5   2   4     18    0.10   0.20                   0.90
3   18 - 23   17.5–23.5     20.5   7   11    16    0.35   0.55                   0.80
4   24 - 29   23.5–29.5     26.5   4   15    9     0.20   0.75                   0.45
5   30 - 35   29.5–35.5     32.5   3   18    5     0.15   0.90                   0.25
6   36 - 41   35.5–41.5     38.5   2   20    2     0.10   1.00                   0.10
Example 2.4: The following data are the percentage coverage of forest in countries in Africa.
Construct a frequency distribution showing CL, CB, CM, and f by using Sturges' rule.
30, 25, 23, 41, 39, 27, 41, 24, 32, 29, 35, 31, 36, 33, 36, 42, 35, 37, 41, and 29
❖ Solution
1. The number of observations is n = 20, so the number of classes is
K = 1 + 3.32 log 20 = 5.32 ≈ 5 (here rounded to the nearest integer), where K is the
number of classes.
2. Class width (W) = (highest value − lowest value) / K = (42 − 23) / 5 = 3.8 ≈ 4
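With K = 5 and W = 4 as computed in steps 1 and 2, the remaining columns (CL, CB, CM and f) can be sketched as below. This is an illustration rather than the original worked solution: it assumes U = 1 because the data are whole numbers, and it starts the first class at the lowest observation.

```python
# Forest coverage data from Example 2.4.
forest = [30, 25, 23, 41, 39, 27, 41, 24, 32, 29,
          35, 31, 36, 33, 36, 42, 35, 37, 41, 29]

K, W = 5, 4  # number of classes and class width from steps 1 and 2
U = 1        # assumed unit of measurement (whole-number data)

table = []
for i in range(K):
    lcl = min(forest) + i * W
    ucl = lcl + W - U
    f = sum(lcl <= x <= ucl for x in forest)  # tally values inside the class limits
    table.append((f"{lcl} - {ucl}",
                  f"{lcl - U / 2} - {ucl + U / 2}",
                  (lcl + ucl) / 2,
                  f))

print("CL\tCB\tCM\tf")
for row in table:
    print(*row, sep="\t")
```

The frequencies add up to 20, so every observation falls in exactly one class.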
2.3.1. Histogram
✓ A histogram is a special type of bar graph in which the horizontal scale represents class
intervals of data values and the vertical scale represents frequencies (f).
✓ The heights of the bars correspond to the frequency values, and the bars are drawn
adjacent to each other (without gaps).
✓ A histogram can be constructed once a frequency distribution table has been completed
for the data set.
Example 2.4: The histogram for the data in example 2.3 is (See Figure 2.1)
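As a rough text-only stand-in for such a histogram, the sketch below prints one bar per class of Example 2.3, with the bar length equal to the class frequency. The function name `text_histogram` and the `#` symbol are illustrative choices; a plotting library would normally be used for a real figure.

```python
# Class boundaries and frequencies from Example 2.3.
boundaries = [5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5]
frequencies = [2, 2, 7, 4, 3, 2]


def text_histogram(boundaries, frequencies, symbol="#"):
    """Return one text bar per class; bar length equals the class frequency."""
    lines = []
    for lower, upper, f in zip(boundaries, boundaries[1:], frequencies):
        lines.append(f"{lower:>5} - {upper:<5} | {symbol * f} ({f})")
    return lines


lines = text_histogram(boundaries, frequencies)
print("\n".join(lines))
```

Consecutive classes share a boundary, which is why the bars of a histogram touch.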
Example 2.5: The frequency polygon for the data in example 2.3 is (See Figure 2.2)
[Figure 2.2: Frequency polygon (vertical axis: frequency)]
✓ Note that the ogive uses class boundaries along the horizontal scale; the graph begins
with the lower boundary of the first class and ends with the upper boundary of the last
class.
✓ There are two types of ogive: the less than ogive and the more than ogive.
✓ A 'less than' ogive rises from left to right, while a 'more than' ogive declines from
left to right.
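The points plotted for the two ogives can be computed from the class boundaries and frequencies. The sketch below uses the values of Example 2.3; the variable names are illustrative.

```python
from itertools import accumulate

# Class boundaries and frequencies from Example 2.3.
boundaries = [5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5]
frequencies = [2, 2, 7, 4, 3, 2]
total = sum(frequencies)

# Less than ogive: number of observations below each boundary,
# starting at 0 at the lower boundary of the first class.
less_cum = [0] + list(accumulate(frequencies))
less_than_points = list(zip(boundaries, less_cum))

# More than ogive: number of observations above each boundary,
# ending at 0 at the upper boundary of the last class.
more_than_points = [(b, total - c) for b, c in less_than_points]

print(less_than_points)
print(more_than_points)
```

At every boundary the two cumulative counts add up to the total frequency, which explains why one curve rises while the other falls.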
In the last lesson we observed that tabulation helps us to put unorganized collected data
into an orderly form so that it is easily understood and the needed information is quickly
located. However, grouped data or a table with too many figures does not always appeal to
the general reader, since too many figures are generally confusing and fail to convey the
definite pattern or trend of the data. A picture is said to be worth 10,000 words; that is,
through pictorial presentation data can be presented in an interesting form.
The most commonly used diagrammatic presentations for discrete as well as qualitative data
are line diagrams, pie charts, bar charts, and pictograms.
[Figure: Sales ($) of products A, B and C in 1957, 1958 and 1959 (vertical axis: Sales ($); horizontal axis: Production)]
➢ Angle of sector (Food) = (600/1500) × 360° = 144°, Percentage of Food = (600/1500) × 100 = 40%
Item of Expenditure   Family Budget   Angle of Sector   %
Food                  $600            144°              40
Clothing              $100            24°               6.67
House Rent            $400            96°               26.67
Fuel and Lighting     $100            24°               6.67
Miscellaneous         $300            72°               20
Total                 $1500           360°              100
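The sector angles and percentages in the table follow one rule: each item's share of the total is scaled to 360° or to 100. A minimal sketch (the dictionary and variable names are illustrative):

```python
# Family budget by item of expenditure, in dollars.
budget = {
    "Food": 600,
    "Clothing": 100,
    "House Rent": 400,
    "Fuel and Lighting": 100,
    "Miscellaneous": 300,
}
total = sum(budget.values())

# Angle of sector = amount / total * 360; percentage = amount / total * 100.
sectors = {
    item: (amount * 360 / total, round(amount * 100 / total, 2))
    for item, amount in budget.items()
}

for item, (angle, percent) in sectors.items():
    print(f"{item:<18} {angle:>6.1f} degrees {percent:>6.2f} %")
```

The angles always sum to 360°, while the rounded percentages may differ from 100 by a small rounding error.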
[Figure: Pie chart of the family budget in percent (Food, Clothing, House Rent, Fuel and Lighting, Miscellaneous)]
Figure 2.5: Simple bar charts of the three products in 1957 and 1958
Figure 2.6: Component bar charts of the three products in 1957, 1958 and 1959
Figure 2.7: Multiple bar charts of the three products in 1957, 1958 and 1959
Figure 2.8: Deviation bar diagram of the three commodities (Soap, Sugar, Coffee; vertical axis: net profit)
2.4.4. Pictogram
In this diagram, we represent data by means of some picture symbols. We decide about a
suitable picture to represent a definite number of units in which the variable is measured.
The stem and leaf plot is a method of organizing data and is a combination of sorting and
graphing. It has the advantage over a grouped frequency distribution of retaining the actual
data while showing them in graphical form. A stem and leaf plot is a data plot that uses part
of the data value as the stem and part of the data value as the leaf to form groups or classes.
Example:
At an internet business center, the number of customers served each day for 20 days is shown
below. Construct a stem and leaf plot for the data.
25 31 20 32 13
14 43 02 57 23
36 32 33 32 44
32 52 44 51 45
Solution:
Step 1: Arrange the data in order:
02, 13, 14, 20, 23, 25, 31, 32, 32, 32, 32, 33, 36, 43, 44, 44, 45, 51, 52, 57
Step 2: Separate the data according to the first digit:
02    13, 14    20, 23, 25    31, 32, 32, 32, 32, 33, 36    43, 44, 44, 45    51, 52, 57
Step 3: A display can be made by using the leading digit as the stem and the trailing digit as
the leaf. For example, for the value 32, the leading digit, 3, is the stem and the trailing digit, 2,
is the leaf. For the value 14, the 1 is the stem and the 4 is the leaf. Now a plot can be
constructed as shown below.
Stem | Leaf
0    | 2
1    | 3 4
2    | 0 3 5
3    | 1 2 2 2 2 3 6
4    | 3 4 4 5
5    | 1 2 7
The plot shows that the distribution peaks in the center and that there are no gaps in the data.
For 7 of the 20 days, the number of customers visiting the internet center was between 31 and
36. The plot also shows that the center served from a minimum of 2 customers to a maximum
of 57 customers in any one day.
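The three steps can be sketched in Python: sort the data, then split each value into a leading digit (stem) and a trailing digit (leaf). The function name `stem_and_leaf` is illustrative, and the value 02 is written as 2 because Python does not allow leading zeros in integer literals.

```python
from collections import defaultdict

# Customers served per day for 20 days.
customers = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23,
             36, 32, 33, 32, 44, 32, 52, 44, 51, 45]


def stem_and_leaf(values):
    """Group the sorted two-digit values by tens digit (stem) and units digit (leaf)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(customers)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Because the leaves keep the actual digits, the original data can be read back from the plot, which a grouped frequency distribution does not allow.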
The Dotplot
A dotplot uses points or dots to represent the data values. If the data values occur more than
once, the corresponding points are plotted above one another. A dotplot is a statistical graph
in which each data value is plotted as a point (dot) above the horizontal axis. Dotplots are
used to show how the data values are distributed and to see if there are any extremely high or
low data values.
Example
The data show the number of named storms each year for the last 40 years. Construct and
analyse a dotplot for the data.
19 15 14 7 6 11 11
9 16 8 8 11 9 8
16 12 13 14 13 12 7
15 15 19 11 4 6 13
10 15 7 12 6 10
28 12 8 7 12 9
Step 1: Find the lowest and highest data values, and decide what scale to use on the
horizontal axis. The lowest data value is 4 and the highest data value is 28, so a scale from 4
to 28 is needed.
Step 2: Draw a horizontal line, and draw the scale on the line.
Step 3: Plot each data value above the line. If the value occurs more than once, plot the other
point above the first point.
The graph shows that in the majority of years there were between 6 and 16 named storms.
There were only 3 years with 19 or more named storms.
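A text version of the dotplot can be built by counting how often each value occurs and stacking one mark per occurrence above a horizontal scale from the lowest to the highest value. The function name `text_dotplot` and the `*` symbol are illustrative choices.

```python
from collections import Counter

# Number of named storms per year for 40 years.
storms = [19, 15, 14, 7, 6, 11, 11,
          9, 16, 8, 8, 11, 9, 8,
          16, 12, 13, 14, 13, 12, 7,
          15, 15, 19, 11, 4, 6, 13,
          10, 15, 7, 12, 6, 10,
          28, 12, 8, 7, 12, 9]

counts = Counter(storms)
lo, hi = min(storms), max(storms)  # Step 1: scale from lowest to highest value


def text_dotplot(counts, lo, hi):
    """Return the rows of a dotplot, top row first, with the scale as the last row."""
    rows = []
    for level in range(max(counts.values()), 0, -1):
        # Step 3: a value gets a dot on this row if it occurs at least `level` times.
        rows.append("".join(" * " if counts[v] >= level else "   "
                            for v in range(lo, hi + 1)))
    # Step 2: the horizontal scale.
    rows.append("".join(f"{v:^3}" for v in range(lo, hi + 1)))
    return rows


rows = text_dotplot(counts, lo, hi)
print("\n".join(rows))
```

Isolated dots far from the rest, such as the single year with 28 storms, stand out immediately as extreme values.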
Exercise 2.1: The following is the distribution of the size of certain farms selected at random
from a district. Draw the histogram, the frequency polygon and the ogive curves (less than
ogive and more than ogive).
Size of farms No. of farms
5-15 8
15-25 12
25-35 17
35-45 29
45-55 31
55-65 5
65-75 3