Unit 03 - 04
INTRODUCTION
In the preceding block, you have learnt about the collection of data. When conducting a
statistical study, you must gather data for the particular variable under study.
In this block, you will learn about classification and presentation of data. The first part deals with
‘classification of data’ and the following unit deals with ‘presentation of data’.
The purpose of this block is to explain how to organize data by constructing ‘frequency
distribution’ and how to present the data by constructing graphs and charts. The graphs and
charts illustrated in this block include histograms, frequency polygons, ogives, pie charts, bar
charts, time series graphs, and pictographs (pictograms).
CONTENTS:
3.0. Aims and Objectives
3.1. Introduction
3.2. Definition of Classification of Data
3.3. Types of Classification
3.4. Frequency Distribution
3.5. Common Terminologies in a Grouped Frequency Distribution
3.6. Rules for Forming a Grouped Frequency Distribution
3.7. Cumulative Frequency Distribution (CFD)
3.8. Relative Frequency Distribution (RFD)
3.9. Summary
3.10. Answers to Check Your Progress (CYP) Questions
3.11. Model Examination Questions
3.12. Glossary
3.13. References
The aim of this unit is to study the collection of data for a statistical study, to discuss the
various types of classification of data, and then to organize these data into a frequency
distribution.
3.1. INTRODUCTION
After collecting relevant information (data) for the purpose of statistical investigation, the next
important task is the classification and presentation of these data. It is difficult to grasp the meaning
of any considerable volume of numerical data unless their mass is somehow reduced to
relatively few convenient classes or categories and presented with the help of some kind of
visual aid.
This section discusses classification of data. Presentation of data using graphs and charts will be
seen in the next unit.
Purposes of Classification:-
To eliminate unnecessary detail.
To bring out clearly points of similarity & dissimilarity
To enable one to form mental pictures of objects or measurements
To enable one to make comparisons and draw inferences
Example
Region Common Language Spoken
1 Tigrigna
2 Afar
3 Amharic
4 Oromifa
2. Chronological Classification:- Data are arranged according to time like year, month.
Example
Year (in EC) Population (in million)
1974 30
1986 52
1991 60
3. Qualitative Classification: - Data are arranged according to attributes like color, religion,
marital-status, sex, educational background, etc.
Educated Uneducated
Example 4.
Mr. x Height (X) in cm
A 160
B 182
C 175
D 178
Note: There are two kinds of variables, which can have values: Discrete Variable and
Continuous Variable.
A. Discrete Variables – are variables that are associated with enumeration or counting
Example
Number of students in a class
Number of children in a family, etc
When the raw data have been collected, they should be put into an ordered array, in ascending
or descending order, so that they can be looked at more objectively. The data must then be
organized into a frequency distribution (FD), which simply lists the values or classes with their
corresponding frequencies in tabular form. Here, frequency refers to the number of times a certain
value occurs in the data.
The tabular representation of values of a variable together with the corresponding frequency is
called a Frequency Distribution (FD).
Definition:
A frequency distribution is the organization of raw data in table form, using classes and
frequencies.
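As a minimal sketch of this definition, the Python snippet below tallies a small set of raw values into a frequency table. The raw values are invented for illustration (they are not taken from this unit), and the function name frequency_distribution is hypothetical.

```python
from collections import Counter


def frequency_distribution(raw_data):
    """Return (value, frequency) pairs sorted by value."""
    counts = Counter(raw_data)      # tally how often each value occurs
    return sorted(counts.items())   # ordered array of the distinct values


# Illustrative raw data (assumed, not from the text)
raw = [5, 7, 4, 7, 8, 5, 7, 10, 4, 8]
table = frequency_distribution(raw)

print("Value  Frequency")
for value, freq in table:
    print(f"{value:>5}  {freq:>9}")
```

The frequencies in such a table always add up to the number of observations, which is a quick check on the tally.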
If the mass of the data is very large, it is necessary to condense the data into an appropriate
number of classes or groups of values of a variable and indicate the number of observed values
which fall into each class. Therefore, a grouped frequency distribution (GFD) is a frequency
distribution where values of a variable are grouped into classes and paired with the number of
observations in each class.
Example *
Values (xi) 1 - 25 26 - 50 51 - 75 76 - 100
Frequency (fi) 3 10 18 6
i. Class:- group of values of a variable between two specified numbers called lower class limit
(LCL) & upper class limit (UCL)
In Example *, the GFD contains four classes: 1 – 25, 26 – 50, 51 – 75, and 76 – 100
LCL1 = 1, UCL1 = 25 LCL3 = 51, UCL3 = 75
LCL2 = 26, UCL2 = 50 LCL4 = 76, UCL4 = 100
ii. Class Frequency (or Simply Frequency): refers to the number of observations
corresponding to a class.
iii. Class Boundaries: are boundaries obtained by subtracting half of the unit of measurement
(u) from the lower limits or by adding ½ (u) on the upper limits of a class.
i.e UCBi = UCLi + ½ (u)
LCBi = LCLi - ½ (u)
Where UCBi = Upper Class Boundaries and
LCBi = Lower Class Boundaries
Remark: The unit of measurement (u) is the gap between any two successive classes, i.e.
u = LCL(i+1) – UCLi
In Example *, consider the 2nd class, 26 – 50. Since u = 26 – 25 = 1,
LCL2 = 26 UCL2 = 50
LCB2 = 26 – ½(1) = 25.5 UCB2 = 50 + ½(1) = 50.5
iv. Class Width (size of a class or class interval): it is the difference between the upper and
lower class limits or the difference between the upper and lower class boundaries of any class.
Remarks:
1. If both the LCL & UCL are included in a class, it is called an inclusive class. For
inclusive classes,
Class width (cw) = UCBi - LCBi
2. If LCL is included and the UCL is not included in a class, it is called an exclusive class.
For exclusive classes
cw = UCLi – LCLi
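The boundary and width formulas above can be sketched in Python as follows. This is only an illustration for inclusive classes using the four classes of Example *; the function name class_boundaries is hypothetical, and the code assumes the same gap u between every pair of successive classes.

```python
def class_boundaries(limits):
    """limits: list of (LCL, UCL) pairs for consecutive inclusive classes."""
    # u is the gap between the upper limit of one class and the lower limit of the next
    u = limits[1][0] - limits[0][1]
    # LCB = LCL - u/2 and UCB = UCL + u/2
    return [(lcl - u / 2, ucl + u / 2) for lcl, ucl in limits]


limits = [(1, 25), (26, 50), (51, 75), (76, 100)]   # classes of Example *
boundaries = class_boundaries(limits)

# For inclusive classes, class width = UCB - LCB
widths = [ucb - lcb for lcb, ucb in boundaries]

print(boundaries[1])   # boundaries of the 2nd class
print(widths)
```

Because each upper boundary coincides with the next lower boundary, the classes join without gaps once boundaries are used in place of limits.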
vi. Range (R): is the difference between the largest (L) and the smallest (S) values in a
data set
R = L – S
2) The number of classes should be neither too large nor too small.
Normally, 5 to 20 classes are recommended
3) All the classes should be of the same width. An approximate suitable class width can be
obtained as:
cw = Range / Number of Classes, i.e. cw = R/n = (L – S)/n
Example 8. Let R/n = 6.8263
If all the observations are whole numbers, cw = 7
If all the observations are to one decimal places, cw = 6.8
If all the observations are to two decimal places, cw = 6.83, etc.
Note that a suitable number of classes can be obtained by using the formula n = 1 + 3.322 log N
(logarithm to base 10), rounded up or down to the nearest whole number, where N is the total
number of observations.
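The two rules of thumb above can be sketched in Python. This is an illustration only: the function names are hypothetical, N = 30 and N = 100 are assumed sample sizes, and the class width is rounded to the precision of the observations as in Example 8.

```python
import math


def suggested_number_of_classes(n_observations):
    """n = 1 + 3.322 log10(N), rounded to the nearest whole number."""
    return round(1 + 3.322 * math.log10(n_observations))


def rounded_class_width(range_over_n, decimals):
    """Round R/n to the number of decimal places used by the observations."""
    return round(range_over_n, decimals)


print(suggested_number_of_classes(30))    # assumed N = 30
print(suggested_number_of_classes(100))   # assumed N = 100

# Example 8: R/n = 6.8263
for decimals in (0, 1, 2):
    print(decimals, rounded_class_width(6.8263, decimals))
```

The result is a suggestion, not a requirement; a nearby convenient value may be chosen as long as the number of classes stays within the recommended 5 to 20.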
Remark: Unequal class intervals create problems in graphing and in computing some statistical
measures.
Example 9. The number of customers for consecutive 30 days in a supermarket was listed as
follows:
20 48 65 25 48 49
35 25 72 42 22 58
53 42 23 57 65 37
18 65 37 16 39 42
49 68 69 63 29 67
a. construct a GFD with a suitable number of classes
b. complete the distribution obtained in (a) with class boundaries & class marks
iii. Class width = Range / n = 56 / 6 = 9.33 = cw
For the sake of convenience, take cw to be 10 (note that it is also possible to
choose the cw to be 9).
iv. Take lower limit of the 1st class (LCL1) to be 16 & u = 1
i.e. LCL1 = 16 and UCL1 = LCL1 + cw – u = 16+10-1 = 25
LCL2 = LCL1 + cw = 16 + 10 = 26 UCL2 = UCL1 + cw = 25 + 10 = 35
LCL3 = LCL2 + cw = 26 + 10 = 36 UCL3 = UCL2 + cw = 35 + 10 = 45
a)
Class (xi)    Frequency (fi)
16 – 25       7
26 – 35       2
36 – 45       6
46 – 55       5
56 – 65       6
66 – 75       4
b)
Class (xi)    Frequency (fi)    CBi            cmi
16 – 25       7                 15.5 – 25.5    20.5
26 – 35       2                 25.5 – 35.5    30.5
36 – 45       6                 35.5 – 45.5    40.5
46 – 55       5                 45.5 – 55.5    50.5
56 – 65       6                 55.5 – 65.5    60.5
66 – 75       4                 65.5 – 75.5    70.5
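The construction in Example 9 can be checked with a short Python sketch. It assumes inclusive classes, LCL1 = 16, cw = 10, u = 1 and six classes, as chosen in the solution; the function name grouped_frequency_distribution is hypothetical.

```python
# Number of customers on 30 consecutive days (Example 9)
data = [20, 48, 65, 25, 48, 49,
        35, 25, 72, 42, 22, 58,
        53, 42, 23, 57, 65, 37,
        18, 65, 37, 16, 39, 42,
        49, 68, 69, 63, 29, 67]


def grouped_frequency_distribution(values, first_lcl, cw, n_classes, u=1):
    """Return rows of (LCL, UCL, frequency, LCB, UCB, class mark)."""
    rows = []
    for i in range(n_classes):
        lcl = first_lcl + i * cw                  # LCL(i+1) = LCLi + cw
        ucl = lcl + cw - u                        # UCLi = LCLi + cw - u
        freq = sum(lcl <= v <= ucl for v in values)
        lcb, ucb = lcl - u / 2, ucl + u / 2       # class boundaries
        rows.append((lcl, ucl, freq, lcb, ucb, (lcb + ucb) / 2))
    return rows


rows = grouped_frequency_distribution(data, first_lcl=16, cw=10, n_classes=6)
for lcl, ucl, freq, lcb, ucb, cm in rows:
    print(f"{lcl} - {ucl}   {freq}   {lcb} - {ucb}   {cm}")
```

Checking that the frequencies sum to 30 confirms that every observation fell into exactly one class.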
CYP 3
Construct a grouped frequency distribution for the following ages of 50 persons with 6 classes.
37 40 69 35 36 70 72 62 36 72
65 64 47 59 55 42 45 50 46 65
54 63 51 50 61 60 58 58 56 58
55 45 49 51 50 56 44 60 70 44
52 43 55 46 42 62 57 48 60 55
A cumulative frequency distribution (CFD) is the collection of values of a variable above or below
specified values in a distribution. A CFD is of two types.
a. ‘Less Than’ Cumulative Frequency Distribution (<CFD): shows the collection of
cases lying below the upper class boundaries of each class.
Remark: The frequency distribution does not tell us directly the number of units above or
below specified values of the classes; this can be determined from a ‘Cumulative Frequency
Distribution’.
Class (xi)    Frequency (fi)    Less than Cumulative Frequency (<cfi)    More than Cumulative Frequency (>cfi)
3 – 6         4                 4                                        30
7 – 10        7                 11                                       26
11 – 14       10                21                                       19
15 – 18       6                 27                                       9
19 – 22       3                 30                                       3
This means that, from the ‘less than’ cumulative frequency distribution, there are 4 observations less
than 6.5, 11 observations below 10.5, etc., and from the ‘more than’ cumulative frequency
distribution, 30 observations are above 2.5, 26 above 6.5, etc.
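The two cumulative columns of the table above can be reproduced with running totals. The sketch below is illustrative; the function name cumulative_frequencies is hypothetical.

```python
from itertools import accumulate


def cumulative_frequencies(freqs):
    """Return ('less than' cf list, 'more than' cf list) for class frequencies."""
    less_than = list(accumulate(freqs))                   # running total from the first class
    more_than = list(accumulate(reversed(freqs)))[::-1]   # running total from the last class
    return less_than, more_than


freqs = [4, 7, 10, 6, 3]   # classes 3-6, 7-10, 11-14, 15-18, 19-22
less_than, more_than = cumulative_frequencies(freqs)
print(less_than)
print(more_than)
```

The last ‘less than’ value and the first ‘more than’ value both equal the total number of observations.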
A relative frequency distribution (RFD) enables the researcher to know the proportion or percentage
of cases in each class. Relative frequencies can be obtained by dividing the frequency of each class
by the total frequency, i.e.
Rfi = fi / n
where n is the total frequency. A relative frequency can be converted into a percentage frequency by
multiplying it by 100%.
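As a small illustration of Rfi = fi / n, the sketch below computes relative and percentage frequencies for the class frequencies 4, 7, 10, 6, 3 used in the cumulative frequency table. The function name relative_frequencies is hypothetical, and rounding the percentages to one decimal place is an assumption made for display.

```python
def relative_frequencies(freqs):
    """Return the list of fi / n, where n is the total frequency."""
    n = sum(freqs)
    return [f / n for f in freqs]


freqs = [4, 7, 10, 6, 3]
rel = relative_frequencies(freqs)

# Percentage frequency = relative frequency x 100, shown to one decimal place
pct = [round(f * 100 / sum(freqs), 1) for f in freqs]

print([round(r, 3) for r in rel])
print(pct)
```

Relative frequencies always sum to 1 (and percentage frequencies to 100, up to rounding), which makes distributions with different totals directly comparable.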
This unit discussed the definitions of classification of data and a frequency distribution. In order
to describe situations, draw conclusions or make inferences about random events, one must
organize the data in some meaningful way. The most convenient method of organizing data is to
construct a frequency distribution.
CYP 1
Value(xi) Frequency(fi)
4 3
5 4
7 7
8 4
10 2
CYP 2
a) 12
b) 3
c) i) L.C.L4 = 20 and U.C.L4 = 24
ii) Since u = 10 – 9 = 1 (or any gap between two consecutive classes)
L.C.B3 = L.C.L3 – ½(u) = 15 – ½(1) = 14.5
U.C.B3 = U.C.L3 + ½(u) = 19 + ½(1) = 19.5
iii) class interval = class width = cw = UCB5 – LCB5 = 29.5 – 24.5 = 5
iv) class mark (cm2) = (UCB2 + LCB2) / 2
= (14.5 + 9.5) / 2
= 24/2
= 12
CYP 3
a) A frequency distribution is the organization of raw data, in table form, that lists
values or classes with their corresponding frequencies.
b) The mid point of a class is found by adding the upper and lower limits and
dividing by 2.
c) If the gap between any two successive classes is one and the limits of a class are 10-19,
then the width of the class is 9.
d) If the limits of a class in a frequency distribution are 26-30, then the boundaries are
25.5-30.5.
e) When data is first collected, it is called raw data.
32 21 28 31 35 46 48 49 49 48
36 37 22 31 28 34 20 45 44 48
38 33 33 23 28 29 33 26 36 30
43 42 32 36 24 27 27 32 45 45
39 39 38 32 33 25 30 28 37 36
42 43 38 40 35 34 20 30 36 32
40 38 38 40 46 36 35 21 31 35
41 42 39 40 46 44 32 37 22 27
41 39 40 38 44 45 48 36 32 23
40 41 40 44 49 49 49 49 37 33
Construct a Grouped Frequency Distribution (GFD) with five classes for the above data.
3.12. GLOSSARY
Frequency: The number of values in a specific class of the distribution or the number of times a
value occurs in the distribution.
Cumulative Frequencies: refer to the total frequency of all values up to and including the upper
boundary of the class interval that is under consideration.
Class: Refers to a group of data considered as one item in a frequency distribution.
Range: Means the difference between the largest and the smallest values in a set of data.
Class Boundaries: Boundaries that are obtained by subtracting half of the unit of measurement
from the lower class limit and adding half of it to the upper class limit.
3.13. REFERENCES
CONTENTS:
4.0. Aims and Objectives
4.1. Introduction
4.2. Histogram
4.3. Frequency Polygon
4.4. Cumulative Frequency Curve (Ogive)
4.5. Line Graph
4.6. Vertical Line Graph
4.7. Bar Chart (Bar Diagram)
4.8. Types of Bar Charts
4.9. Pie Chart
4.10. Pictograph (Pictogram)
4.11. Summary
4.12. Answer to Check Your Progress Questions (CYP)
4.13. Model Examination Questions
4.14. Glossary
4.15. References
The aim of this unit is to study how to construct and present data using different types of graphs,
charts, and diagrams that facilitate comparisons and, in general, give a good overall
picture of the data.
4.1. INTRODUCTION
This unit deals with organizing a set of raw data into a Frequency Distribution (FD)
and describing the distribution graphically with a histogram, a frequency polygon, and a cumulative
frequency curve (ogive). Other types of numerical information will be summarized and
presented in the form of a bar chart, a pie chart, or a pictogram.
Definition:
4.2. HISTOGRAM
After you complete a frequency distribution, your next step will be to construct a “picture” of
these data values using a histogram. A histogram is a graph consisting of a series of adjacent
rectangles whose bases are equal to the class width of the corresponding classes and whose
heights are proportional to the corresponding class frequencies. Here, class boundaries are
marked along the horizontal axis (x – axis) and the class frequencies along the vertical axis ( y –
axis) according to a suitable scale. It describes the shape of the data. You can use it to answer
quickly such questions as: Are the data symmetric? Where do most of the data values lie?
Solution:
[Figure: histogram; y-axis: Class frequency (fi)]
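To make the construction of a histogram concrete, the sketch below computes the rectangle for each class (left edge, right edge, height) from the class limits and frequencies, then prints a crude text version. It uses the grouped frequency distribution of Example 9 in the previous unit; the function name histogram_rectangles and the text rendering are illustrative assumptions, not a plotting method prescribed by this unit.

```python
def histogram_rectangles(limits, freqs, u=1):
    """Return (left edge, right edge, height) for each class.

    Edges are the class boundaries, so adjacent rectangles touch;
    heights are the class frequencies.
    """
    return [(lcl - u / 2, ucl + u / 2, f) for (lcl, ucl), f in zip(limits, freqs)]


limits = [(16, 25), (26, 35), (36, 45), (46, 55), (56, 65), (66, 75)]
freqs = [7, 2, 6, 5, 6, 4]
rects = histogram_rectangles(limits, freqs)

# Crude text rendering: one '#' per observation in the class
for left, right, height in rects:
    print(f"{left:>5} - {right:<5} {'#' * height}")
```

Using class boundaries rather than class limits for the edges is what removes the gaps between neighbouring bars.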
A frequency polygon is a line graph of a frequency distribution. Although a histogram does
demonstrate the shape of the data, perhaps the shape can be more clearly illustrated by using a
frequency polygon. Here, you merely connect the centers of the tops of the histogram bars (located
at the class midpoints) with a series of straight lines. The resulting figure is a frequency polygon.
The class marks are plotted along the x – axis and the class frequencies along the y – axis. Empty
classes are included at each end so that the curve is anchored to the x – axis.
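The points of a frequency polygon, including the empty class added at each end, can be listed with a few lines of Python. The sketch assumes equal class widths and uses the class marks and frequencies of Example 9; the function name frequency_polygon_points is hypothetical.

```python
def frequency_polygon_points(class_marks, freqs):
    """Return (x, y) points: class marks vs frequencies, padded with empty classes."""
    cw = class_marks[1] - class_marks[0]   # equal class widths are assumed
    xs = [class_marks[0] - cw] + list(class_marks) + [class_marks[-1] + cw]
    ys = [0] + list(freqs) + [0]           # empty classes have frequency 0
    return list(zip(xs, ys))


marks = [20.5, 30.5, 40.5, 50.5, 60.5, 70.5]
freqs = [7, 2, 6, 5, 6, 4]
points = frequency_polygon_points(marks, freqs)
for x, y in points:
    print(x, y)
```

Joining consecutive points with straight lines gives the polygon; the two zero-frequency points bring it down to the x-axis.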
Example 2. Construct a frequency polygon for the frequency distribution given in Example 9
Solution:
[Figure: a frequency polygon for the distribution in Example 9; x-axis: Class marks (cmi); y-axis: frequency (fi)]
CYP 2 Construct a frequency polygon for the frequency distribution given under CYP 1.
An ogive is the graphic representation of a cumulative frequency distribution. Ogives are of two
kinds: the ‘less than’ ogive (< ogive) and the ‘more than’ ogive (> ogive).
A) ‘Less than’ ogive: here, upper class boundaries are plotted against the ‘less than’
cumulative frequencies of the respective classes, and the points are joined by line segments.
Example 3. Draw a ‘less than’ ogive for the frequency distribution in Example 11
Solution:
[Figure: ‘less than’ ogive; x-axis: Upper class boundary (UCBi), 6.5 to 22.5; y-axis: Less than cumulative frequency (<cfi), 0 to 35]
B) ‘More than’ ogive: here, lower class boundaries are plotted against the ‘more than’
cumulative frequencies of their respective classes, and the points are joined by line segments.
Example 4. Draw a ‘More than’ ogive for the frequency distribution in Example 11
Solution:
[Figure: ‘more than’ ogive; x-axis: Lower class boundaries (LCBi), 2.5 to 18.5; y-axis: More than cumulative frequency (>cfi), 0 to 40]
A line graph represents the relationship between time (on the x-axis) and the values of a variable
(on the y-axis). The values are recorded with respect to the time of occurrence.
Solution:
[Figure: a line graph showing the above time series; x-axis: Year (1986 to 1991); y-axis: Values (0 to 35)]
A vertical line graph is a graphical representation of discrete data (or characteristics expressed
with whole numbers) with respect to their frequencies. Vertical solid lines are used to indicate
the frequencies.
Family A B C D E
Number of children 3 2 7 6 4
Solution:
[Figure: vertical line graph showing number of children in families A, B, C, D and E; x-axis: Family; y-axis: Number of children (1 to 7)]
Histograms, frequency polygons, and ogives are used for data having an interval or ratio level of
measurement. Other ways of presenting statistical data, each suited to particular
situations, are bar charts, pie charts, and pictographs.
A bar chart is a series of equally spaced bars of uniform width, where the height (length) of a bar
represents the frequency or amount (magnitude) corresponding to a category. Bars may be
drawn horizontally or vertically. Vertical bar graphs are preferred as they make comparison
between bars easier.
It represents a single set of data (variable) classified in different categories. Singular bars are
drawn with the respective frequencies.
Example 18: Revenue (in millions of Birr) of company x from 1980 to 1982 is given below
Year Revenue
1980 50
1981 150
1982 200
Solution:
[Figure: bar chart of Revenue (y-axis, 0 to 250) against year (x-axis: 1980, 1981, 1982)]
Here, two or more bars are grouped with the corresponding frequency to represent two or more
interrelated data in each category. The bars of related variables are kept adjacent to each other
for every set of values. These charts can be used if the overall total is not required and each bar
is shaded or colored separately and a key is given to distinguish them.
Example 19: The following table shows the production of wheat and maize in hundreds of
quintals.
1982 60 100
Solution:
[Figure: bar chart with grouped bars; x-axis: Year (1980, 1981, 1982); y-axis: Number of quintals (0 to 100); key: maize, wheat]
A subdivided bar chart is used to present data by subdividing a single bar with respect to the
proportional frequency. Each portion of the bar is then shaded or colored and a key is given to
distinguish them.
Example 20: The number of quintals of wheat and maize (in millions of quintals) produced by
country x in the indicated years.
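[Figure: subdivided bar chart; x-axis: Year (1980, 1981, 1982); y-axis: Number of quintals (0 to 600); key: Maize, Wheat.
Segment labels: 1980: wheat 150, maize 150; 1981: wheat 300, maize 200; 1982: wheat 350, maize 100]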
A percentage bar chart is a subdivided bar chart where percentages are used in each classification
rather than the actual frequencies.
Example 21: Construct a percentage bar chart for the data in Example 20.
Solution:
Year    % of Wheat Production     % of Maize Production
1980    150/300 × 100 = 50        150/300 × 100 = 50
1981    300/500 × 100 = 60        200/500 × 100 = 40
1982    350/450 × 100 = 78        100/450 × 100 = 22
[Figure: percentage bar chart; x-axis: Year (1980, 1981, 1982); y-axis: Percentage produced (0% to 100%); key: wheat, maize.
Segment labels: 1980: 50 and 50; 1981: 60 and 40; 1982: 78 and 22]
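The percentages in the table of Example 21 can be reproduced with the sketch below. The wheat and maize figures are read from the fractions in that table (for instance 150/300 and 350/450); the function name percentage_components is hypothetical, and rounding to whole percentages follows the table.

```python
def percentage_components(components):
    """components: dict of component name -> list of values, one per bar (year)."""
    totals = [sum(column) for column in zip(*components.values())]   # bar totals
    return {
        name: [round(value * 100 / total) for value, total in zip(values, totals)]
        for name, values in components.items()
    }


production = {
    "wheat": [150, 300, 350],   # 1980, 1981, 1982
    "maize": [150, 200, 100],
}
percentages = percentage_components(production)
print(percentages["wheat"])
print(percentages["maize"])
```

Each bar in the percentage chart then has the same total height of 100%, so only the shares of the components are compared, not the totals.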
A pie chart is a circle divided into various sectors with areas proportional to the values of the
components they represent. It shows the components in terms of percentages, not in absolute
magnitude. The angle formed at the center by each sector has to be proportional to the value
it represents.
[Figure: pie chart with sectors for Food, House rent, Clothing, and Misc.; sector values shown: 300, 350, 100, 250]
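Since each sector angle is proportional to the value it represents, the angle is value / total × 360 degrees. The sketch below applies this to the four values 300, 350, 100 and 250; which expenditure category each value belongs to is not assumed here, and the function name sector_angles is hypothetical.

```python
def sector_angles(values):
    """Return the central angle (in degrees) of each pie chart sector."""
    total = sum(values)
    return [value * 360 / total for value in values]


values = [300, 350, 100, 250]
angles = sector_angles(values)
shares = [value * 100 / sum(values) for value in values]   # percentages of the whole

for value, angle, share in zip(values, angles, shares):
    print(f"value {value}: {angle} degrees, {share}%")
```

The angles add up to 360 degrees and the percentages to 100, which is a useful check before drawing the chart.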
Example 23: In comparing the population of a country from 1990 to 1992, we simply draw
pictures of people, where each picture may represent 1,000,000 people.
[Figure: pictograph with one row of people symbols per year; rows labelled 1991 and 1990]
4.11. SUMMARY
This unit discussed how to present organized data. Once a frequency distribution is
constructed, representing the data with graphs is a simple task. The most commonly
used graphs in research statistics are the histogram, the frequency polygon, and the ogive; other
graphs and diagrams, such as bar charts, pie charts, and pictograms, can also be used. Some of
these graphs are seen frequently in newspapers, magazines, and various statistical reports.
CYP 1
[Figure: histogram; x-axis: Class boundaries (CBi), 5 to 35; y-axis: frequency]
CYP 2
[Figure: graph for CYP 2; x-axis: Class Marks (cmi), 2.5 to 37.5; y-axis: Cumulative Frequency]
4.14. GLOSSARY
Histogram: Refers to a statistical graph which represents, by the height of a rectangular column,
the number of times that each class of result occurs in a sample or experiment.
Frequency Polygon: Refers to the graph obtained when the mid points of the tops of the
rectangles in a histogram having equal class intervals are connected
by line segments.
Frequency Curve: Refers to a smooth frequency polygon for data that can take a continuous set
of values.
Bar Chart: Refers to a graph made up of bars whose lengths are proportional to quantities in a
set of data.
Pie Chart: Refers to a diagram wherein proportions are shown as sectors of a circle.
4.15 REFERENCES