18bge14a U2
RaviSankar Page: 1
I BSc Geography (English Medium) Dept.of Statistics
UNIT-II
Formation of Frequency Distribution
A frequency distribution is a series in which observations with similar or closely related
values are put into separate groups, the groups being arranged in order of magnitude. It is
simply a table in which the data are grouped into classes and the number of cases falling in
each class is recorded. It shows the frequency of occurrence of the different values of a
single phenomenon.
A frequency distribution is constructed for three main reasons:
✓ To facilitate the analysis of data,
✓ To estimate frequencies of the unknown population distribution from the
distribution of sample data, and
✓ To facilitate the computation of various statistical measures.
Raw data:
The statistical data collected are generally raw data or ungrouped data. Let us consider
the daily wages (in Rs) of 30 labourers in a factory.
80 70 55 50 60 65 40 30 80 90
75 45 35 65 70 80 82 55 65 80
60 55 38 65 75 85 90 65 45 75
The above figures are raw or ungrouped data: they are recorded as they occur, without any
prior arrangement. In this form the data convey little useful information and are rather
confusing. A better way is to express the figures in ascending or descending order of
magnitude, which is commonly known as an array. This does not, however, reduce the bulk of
the data. The above data, when formed into an array, take the following form:
30 35 38 40 45 45 50 55 55 55
60 60 65 65 65 65 65 70 70 75
75 75 80 80 80 80 82 85 90 90
The array lets us see at once the maximum and minimum values. It also gives a rough idea of
the distribution of the items over the range. When there is a large number of items, forming
an array is tedious and cumbersome. The data should therefore be condensed for better
understanding, which may be done in two ways, depending on the nature of the data.
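The formation of an array can be sketched in a few lines of Python. This is only an illustration using the wage data above; the variable names are chosen here and are not part of the notes.

```python
# Daily wages (in Rs) of the 30 labourers, as listed in the raw data above.
wages = [
    80, 70, 55, 50, 60, 65, 40, 30, 80, 90,
    75, 45, 35, 65, 70, 80, 82, 55, 65, 80,
    60, 55, 38, 65, 75, 85, 90, 65, 45, 75,
]

# An array is the same set of observations arranged in order of magnitude.
ascending = sorted(wages)
descending = sorted(wages, reverse=True)

# The minimum and maximum can be read off the two ends of the array.
lowest, highest = ascending[0], ascending[-1]

print(ascending)
print("lowest =", lowest, "highest =", highest)
```

Sorting changes only the order of the observations, not their number, which is why an array does not reduce the bulk of the data.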
A. Discrete (or) Ungrouped frequency distribution:
In this form of distribution, the frequency refers to a discrete value. The data are
presented in such a way that the exact measurements of the units are clearly indicated.
There is a definite difference between the values of the variable in different groups of
items. Each class is distinct and separate from the others, and there is no continuity from
one class to the next. Examples of such data are the number of rooms in a house, the number
of companies registered in a country, the number of children in a family, etc.
14BGE14A: Allied: Statistics-I UNIT-II Handled & Prepared by: Dr.S.RaviSankar Page: 2
The process of preparing this type of distribution is very simple. We just count the number
of times a particular value is repeated, which is called the frequency of that class. To
facilitate counting, prepare a column of tallies.
In another column, place all possible values of the variable from the lowest to the highest.
Then, for each observation, put a bar (vertical line) opposite the value to which it relates.
To make counting easier, the bars are arranged in blocks of five with some space left
between blocks. Finally, we count the number of bars to get the frequency.
Example:
In a survey of 40 families in a village, the number of children per family was recorded
and the following data were obtained.
1 0 3 2 1 5 6 2
2 1 0 3 4 2 1 6
3 2 1 5 3 3 2 4
2 2 3 0 2 1 4 5
3 3 4 4 1 2 4 5
Number of Children    Tally Marks       Frequency
        0             |||                   3
        1             ||||| ||              7
        2             ||||| |||||          10
        3             ||||| |||             8
        4             ||||| |               6
        5             ||||                  4
        6             ||                    2
      Total                                40
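The counting procedure described above can be sketched with Python's standard `collections.Counter`. The helper `tally` is a hypothetical name introduced here; it writes the bars in blocks of five, as the text describes.

```python
from collections import Counter

# Number of children in each of the 40 families surveyed.
children = [
    1, 0, 3, 2, 1, 5, 6, 2,
    2, 1, 0, 3, 4, 2, 1, 6,
    3, 2, 1, 5, 3, 3, 2, 4,
    2, 2, 3, 0, 2, 1, 4, 5,
    3, 3, 4, 4, 1, 2, 4, 5,
]

# Count how many times each value is repeated.
frequency = Counter(children)


def tally(n):
    """Return n bars written in blocks of five, separated by spaces."""
    blocks = ["|" * 5] * (n // 5)
    if n % 5:
        blocks.append("|" * (n % 5))
    return " ".join(blocks)


# List every possible value from the lowest to the highest.
for value in range(min(children), max(children) + 1):
    f = frequency[value]
    print(f"{value:>8}  {tally(f):<12}  {f:>3}")
print(f"{'Total':>8}  {'':<12}  {sum(frequency.values()):>3}")
```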
B. Continuous (or) Grouped frequency distribution:
In this form of distribution, the frequency refers to groups of values. This becomes
necessary for variables which can take any fractional value and for which an exact
measurement is not possible. A discrete variable can also be presented in the form of a
continuous frequency distribution.
Nature of class:
The following are some basic technical terms used when a continuous frequency distribution
is formed, that is, when data are classified according to class intervals.
a) Class limits:
The class limits are the lowest and the highest values that can be included in the class.
For example, take the class 30-40. The lowest value of the class is 30 and the highest value
is 40. The two boundaries of a class are known as the lower limit and the upper limit of the
class. The lower limit of a class is the value below which there can be no item in the class.
The upper limit of a class is the value above which there can be no item in that class. For
the class 60-79, 60 is the lower limit and 79 is the upper limit, i.e. in this case there can
be no value which is less than 60 or more than 79. The way in which class limits are stated
depends upon the nature of the data. In statistical calculations, the lower class limit is
denoted by L and the upper class limit by U.
b) Class Interval:
The class interval may be defined as the size of each grouping of data. For example,
50-75, 75-100, 100-125… are class intervals. Each grouping begins with the lower limit of a
class interval and ends at the lower limit of the next succeeding class interval.
c) Width or size of the class interval:
The difference between the upper and lower class limits is called the width or size of the
class interval and is denoted by ‘C’, i.e. C = U − L.
d) Range:
The difference between the largest and the smallest values of the observations is called
the range and is denoted by ‘R’, i.e. R = Largest value − Smallest value.
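As a small illustration of the width and the range, the sketch below reuses the daily-wage data from the start of this unit and the class 30-40 mentioned under class limits. The function name `class_width` is an assumption made for the example.

```python
# Daily wages (in Rs) of the 30 labourers from the raw-data example.
wages = [
    80, 70, 55, 50, 60, 65, 40, 30, 80, 90,
    75, 45, 35, 65, 70, 80, 82, 55, 65, 80,
    60, 55, 38, 65, 75, 85, 90, 65, 45, 75,
]

# Range: largest observation minus smallest observation.
R = max(wages) - min(wages)


def class_width(lower, upper):
    """Width of a class: upper class limit minus lower class limit."""
    return upper - lower


C = class_width(30, 40)

print("Range R =", R)
print("Width C of the class 30-40 =", C)
```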
f) Frequency:
The number of observations falling within a particular class interval is called the
frequency of that class. Let us consider the frequency distribution of the weights of persons
working in a company.
Weight (in Kgs)    Number of persons
    30-40                 25
    40-50                 53
    50-60                 77
    60-70                 95
    70-80                 80
    80-90                 60
    90-100                30
    Total                420
In the above example, the class frequencies are 25, 53, 77, 95, 80, 60 and 30. The total
frequency is equal to 420. The total frequency indicates the total number of observations
considered in a frequency distribution.
g) Number of class intervals:
To see how the values are distributed over the whole data, we choose the lowest and the
highest of the values. The difference between them will enable us to decide the class
intervals.
Thus the number of class intervals can be fixed arbitrarily, keeping in view the nature of
the problem under study, or it can be decided with the help of Sturges' Rule. According to
Sturges, the number of classes can be determined by the formula
$$K = 1 + 3.322 \log_{10} N$$
where N = total number of observations,
log = logarithm of the number (to base 10),
K = number of class intervals.
Thus, if the number of observations is 10, the number of class intervals is
$$K = 1 + 3.322 \log_{10} 10 = 4.322 \approx 4.$$
If 100 observations are being studied, the number of class intervals is
$$K = 1 + 3.322 \log_{10} 100 = 7.644 \approx 8,$$
and so on.
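Sturges' rule is easy to check numerically. The sketch below is a minimal illustration; the function name `sturges_classes` is hypothetical.

```python
import math


def sturges_classes(n):
    """Number of class intervals K suggested by Sturges' rule for n observations."""
    return 1 + 3.322 * math.log10(n)


for n in (10, 100):
    k = sturges_classes(n)
    # K is rarely a whole number, so it is rounded to the nearest integer.
    print(n, "observations: K =", round(k, 3), "rounded to", round(k))
```

Because K grows with the logarithm of N, a tenfold increase in the number of observations adds only about three more classes.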
a) Exclusive method:
The exclusive method ensures continuity of the data, inasmuch as the upper limit of one
class is the lower limit of the next class. In the above example, the first class contains
the families whose expenditure is between Rs.0 and Rs.4999.99. A family whose expenditure is
Rs.5000 would be included in the class interval 5000-10000. This method is widely used in
practice.
b) Inclusive method:
In this method, the overlapping of the class intervals is avoided. Both the lower and upper
limits are included in the class interval. This type of classification may be used for a
grouped frequency distribution of a discrete variable, like the number of members in a family
or the number of workers in a factory, where the variable takes only integral values. It
cannot be used for variables that take fractional values, like age, height, weight, etc.
This method may be illustrated as follows:
Thus, to decide whether to use the inclusive method or the exclusive method, it is
important to determine whether the variable under observation is a continuous or a discrete
one. In the case of continuous variables, the exclusive method must be used. The inclusive
method should be used in the case of discrete variables.
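The difference between the two methods comes down to how the upper limit is treated. The following sketch makes this explicit; the two function names are hypothetical, and the inclusive classes 13-17 and 18-22 are used only as sample integer classes.

```python
def in_exclusive_class(x, lower, upper):
    """Exclusive method: the lower limit is included, the upper limit is not."""
    return lower <= x < upper


def in_inclusive_class(x, lower, upper):
    """Inclusive method: both the lower and the upper limit are included."""
    return lower <= x <= upper


# An expenditure of Rs.5000 with exclusive classes 0-5000 and 5000-10000.
print(in_exclusive_class(5000, 0, 5000))       # not in the first class
print(in_exclusive_class(5000, 5000, 10000))   # falls in the second class

# A count of 17 with inclusive classes 13-17 and 18-22.
print(in_inclusive_class(17, 13, 17))          # the upper limit belongs to the class
print(in_inclusive_class(17, 18, 22))
```

With the exclusive method every real value falls in exactly one class, which is why it suits continuous variables. Inclusive classes leave gaps (for example between 17 and 18) that only a discrete variable can never fall into.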
c) Open end classes:
An open-end class is one in which a class limit is missing, either at the lower end of the
first class interval, or at the upper end of the last class interval, or both. The necessity
of open-end classes arises in a number of practical situations, particularly relating to
economic and medical data, when there are a few very high values or a few very low values
which are far apart from the majority of observations.
An example of open-end classes is as follows:
42 62 46 54 41 37 54 44 32 45
47 50 58 49 51 42 46 37 42 39
54 39 51 58 47 64 43 48 49 48
49 61 41 40 58 49 59 57 57 34
56 38 45 52 46 40 63 41 51 41
Here the size of the class interval as per Sturges' rule is obtained as follows:
$$\text{Size of class interval} = C = \frac{\text{Range}}{1 + 3.322 \log_{10} N} = \frac{64 - 32}{1 + 3.322 \log_{10} 50} = \frac{32}{6.64} \approx 5$$
Thus, the number of class intervals is 7 and the size of each class is 5. The required
frequency distribution is prepared using tally marks as given below:
Class Interval    Tally Marks          Frequency
    30-35         ||                       2
    35-40         |||||                    5
    40-45         ||||| ||||| |           11
    45-50         ||||| ||||| |||         13
    50-55         ||||| |||                8
    55-60         ||||| ||                 7
    60-65         ||||                     4
    Total                                 50
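The whole procedure (range, Sturges' rule, class width, then counting by the exclusive method) can be sketched as below for the 50 observations of this example. The starting point 30 is taken from the first class of the table; the variable names are assumptions made for the illustration.

```python
import math

# The 50 observations of the example.
values = [
    42, 62, 46, 54, 41, 37, 54, 44, 32, 45,
    47, 50, 58, 49, 51, 42, 46, 37, 42, 39,
    54, 39, 51, 58, 47, 64, 43, 48, 49, 48,
    49, 61, 41, 40, 58, 49, 59, 57, 57, 34,
    56, 38, 45, 52, 46, 40, 63, 41, 51, 41,
]

n = len(values)
value_range = max(values) - min(values)   # 64 - 32
k = 1 + 3.322 * math.log10(n)             # Sturges' rule
width = math.ceil(value_range / k)        # class size, rounded up to a whole number

start = 30  # assumed convenient starting point, as in the table's first class
classes = [(lo, lo + width) for lo in range(start, max(values) + 1, width)]

# Exclusive method: a value equal to an upper limit goes to the next class.
frequency = {c: 0 for c in classes}
for v in values:
    for lo, hi in classes:
        if lo <= v < hi:
            frequency[(lo, hi)] += 1
            break

for (lo, hi), f in frequency.items():
    print(f"{lo}-{hi}: {f}")
print("Total:", sum(frequency.values()))
```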
Example:
43 18 25 18 39 44 19 20 20 26
40 45 38 25 13 14 27 41 42 17
34 31 32 27 33 37 25 26 32 25
33 34 35 46 29 34 31 34 35 24
28 30 41 32 29 28 30 31 30 34
31 35 36 29 26 32 36 35 36 37
32 23 22 29 33 37 33 27 24 36
23 42 29 37 29 23 44 41 45 39
21 21 42 22 28 22 15 16 17 28
22 29 35 31 27 40 23 32 40 37
Construct a frequency distribution with inclusive type of class intervals. Also find:
a. How many workers produced more than 38 tools?
b. How many workers produced less than 23 tools?
Solution:
Using Sturges' formula for determining the number of class intervals, we have
$$\text{Number of class intervals} = 1 + 3.322 \log_{10} N = 1 + 3.322 \log_{10} 100 \approx 7.6$$
$$\text{Size of class interval} = \frac{\text{Range}}{\text{Number of class intervals}} = \frac{46 - 13}{7.6} \approx 5 \text{ (rounded up)}$$
Hence, taking the magnitude of the class intervals as 5, we have 7 classes: 13-17, 18-22, …,
43-47 by the inclusive type. Using tally marks, the required frequency distribution is
obtained in the following table.
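The counting for this example can be sketched as follows. The classes are built by the inclusive method with width 5 starting at 13, as decided in the solution; the variable names are chosen for the illustration.

```python
# Number of tools produced by each of the 100 workers.
tools = [
    43, 18, 25, 18, 39, 44, 19, 20, 20, 26,
    40, 45, 38, 25, 13, 14, 27, 41, 42, 17,
    34, 31, 32, 27, 33, 37, 25, 26, 32, 25,
    33, 34, 35, 46, 29, 34, 31, 34, 35, 24,
    28, 30, 41, 32, 29, 28, 30, 31, 30, 34,
    31, 35, 36, 29, 26, 32, 36, 35, 36, 37,
    32, 23, 22, 29, 33, 37, 33, 27, 24, 36,
    23, 42, 29, 37, 29, 23, 44, 41, 45, 39,
    21, 21, 42, 22, 28, 22, 15, 16, 17, 28,
    22, 29, 35, 31, 27, 40, 23, 32, 40, 37,
]

width = 5
# Inclusive classes: 13-17, 18-22, ..., 43-47 (both limits belong to the class).
classes = [(lo, lo + width - 1) for lo in range(min(tools), max(tools) + 1, width)]
frequency = {(lo, hi): sum(lo <= t <= hi for t in tools) for lo, hi in classes}

for (lo, hi), f in frequency.items():
    print(f"{lo}-{hi}: {f}")

# a. workers who produced more than 38 tools; b. workers who produced less than 23 tools
more_than_38 = sum(t > 38 for t in tools)
less_than_23 = sum(t < 23 for t in tools)
print("More than 38 tools:", more_than_38)
print("Less than 23 tools:", less_than_23)
```

Question b can also be read from the table, since 23 is a lower class limit. Question a cannot, because 38 lies inside the class 38-42, so the raw data are needed.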
A diagram is a visual form for the presentation of statistical data, highlighting their
basic facts and relationships. Diagrams drawn on the basis of the data collected are easily
understood and appreciated by all. They are readily intelligible and save a considerable
amount of time and energy.
Types of diagrams:
In practice, a very large variety of diagrams is in use and new ones are constantly being
added. For the sake of convenience and simplicity, they may be divided under the following
heads:
A. One-dimensional diagrams
B. Two-dimensional diagrams
C. Three-dimensional diagrams
D. Pictograms and Cartograms
A. One-dimensional diagrams:
In such diagrams, only a one-dimensional measurement, i.e. height, is used and the width
is not considered. These diagrams are in the form of bar or line charts and can be classified as
(i) Line Diagram
(ii) Simple Bar Diagram
(iii) Multiple Bar Diagram
(iv) Sub-divided Bar Diagram
(v) Percentage Bar Diagram
Line Diagram:
A line diagram is used where there are many items to be shown and there is not much
difference in their values. Such a diagram is prepared by drawing a vertical line for each
item according to the scale. The distance between the lines is kept uniform. A line diagram
makes comparison easy, but it is less attractive.
Example: Show the following data by a line chart:
No. of children 0 1 2 3 4 5
Frequency 10 14 9 6 4 2
[Figure: Line Diagram. Horizontal axis: No. of Children; vertical axis: Frequency.]
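A rough text rendering of such a diagram can be produced without any plotting library. In this sketch the lines are drawn sideways (one row per item) rather than vertically, with one mark per unit of frequency; the function name `text_diagram` is hypothetical.

```python
# Data of the line diagram example.
children = [0, 1, 2, 3, 4, 5]
frequency = [10, 14, 9, 6, 4, 2]


def text_diagram(labels, values, mark="|"):
    """One row per item; the number of marks equals the value (scale: 1 mark = 1 unit)."""
    return [f"{label:>2} {mark * value} {value}" for label, value in zip(labels, values)]


for row in text_diagram(children, frequency):
    print(row)
```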
Simple Bar Diagram:
A simple bar diagram can be drawn on either a horizontal or a vertical base, but bars on a
horizontal base are more common. The bars must be of uniform width and the intervening space
between bars must be equal. While constructing a simple bar diagram, the scale is determined
on the basis of the highest value in the series.
To make the diagram attractive, the bars can be coloured. Bar diagrams are used in
business and economics. However, an important limitation of such diagrams is that they can
present only one classification or one category of data. For example, while presenting the
population for the last five decades, one can depict only the total population in a simple
bar diagram, and not its sex-wise distribution.
Example: Represent the following data by a bar diagram.
Solution :
[Figure: Simple Bar Diagram. Horizontal axis: Year (1991 to 1995); vertical axis: Production (in tonnes).]
Multiple Bar Diagram:
A multiple bar diagram is used for comparing two or more sets of statistical data. Bars are
constructed side by side to represent the sets of values for comparison. In order to
distinguish the bars, they may be either differently coloured or marked with different types
of crossing or dotting, etc. An index is also prepared to identify the meaning of the
different colours or dottings.
Year    Profit before tax         Profit after tax
        (in lakhs of rupees)      (in lakhs of rupees)
1998          195                       80
1999          200                       87
2000          165                       45
2001          140                       32
Solution:
[Figure: Multiple Bar Diagram. Horizontal axis: Year (1998 to 2001); vertical axis: Profit (in Rs); bars: Profit before tax, Profit after tax.]
Percentage Bar Diagram:
This is another form of component bar diagram. Here the components are not the actual
values but percentages of the whole. The main difference between the sub-divided bar diagram
and the percentage bar diagram is that in the former the bars are of different heights, since
their totals may be different, whereas in the latter the bars are of equal height, since each
bar represents 100 percent. For data having sub-divisions, the percentage bar diagram will be
more appealing than the sub-divided bar diagram.
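Converting components to percentages of the whole can be sketched as below. The example reuses the profit figures from the multiple bar diagram and assumes, purely for illustration, that each year's profit before tax splits into two components: profit after tax and tax (the difference between the two figures). The function name is hypothetical.

```python
# Profit figures (in lakhs of rupees) from the multiple bar diagram example.
profit_before_tax = {1998: 195, 1999: 200, 2000: 165, 2001: 140}
profit_after_tax = {1998: 80, 1999: 87, 2000: 45, 2001: 32}


def percentage_components(components):
    """Convert component values to percentages of their total."""
    total = sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}


for year, before in profit_before_tax.items():
    after = profit_after_tax[year]
    # Assumption for this sketch: tax = profit before tax - profit after tax.
    parts = percentage_components({"Profit after tax": after, "Tax": before - after})
    print(year, {name: round(p, 1) for name, p in parts.items()})
```

Each year's percentages add up to 100, so all bars have the same height even though the yearly totals differ.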
B. Two-dimensional Diagrams:
In one-dimensional diagrams, only length is considered. But in two-dimensional diagrams
the area represents the data, so both the length and the breadth have to be taken into
account. Such diagrams are also called area diagrams or surface diagrams. The important types
of area diagrams are:
a) Rectangles b) Squares c) Circles or Pie-diagrams
Rectangles:
Rectangles are used to represent the relative magnitude of two or more values. The areas
of the rectangles are kept in proportion to the values. Rectangles are placed side by side
for comparison. When two sets of figures are to be represented by rectangles, either of two
methods may be adopted.
We may represent the figures as they are given, or we may convert them to percentages
and then sub-divide the length into the various components. The percentage sub-divided
rectangular diagram is more popular than the sub-divided rectangular diagram, since it
enables comparison to be made on a percentage basis.
Example:
[Figure: percentage sub-divided rectangular diagram. Vertical axis: Percentage (0 to 100); rectangles: Family A (0-5000), Family B (0-8000).]
Squares:
The rectangular method of diagrammatic presentation is difficult to use where the values
of the items vary widely. The method of drawing a square diagram is very simple. One has to
take the square roots of the values of the various items that are to be shown in the diagram
and then select a suitable scale to draw the squares.
Example:
[Figure: squares of side 4 cm, 3.5 cm, 3 cm, 2.5 cm and 2 cm.]
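The calculation behind a square diagram can be sketched as follows. The values and the scale below are hypothetical and are not taken from the notes; they are chosen only so that the arithmetic is easy to follow.

```python
import math


def square_sides(values, scale=1.0):
    """Side of each square: square root of the value, divided by the chosen scale."""
    return [math.sqrt(v) / scale for v in values]


# Hypothetical values (not from the notes).
values = [1600, 900, 400]

# Assumed scale: 10 units of the square root are drawn as 1 cm.
sides_cm = square_sides(values, scale=10)
print(sides_cm)
```

Since the area of a square is the square of its side, taking square roots keeps the areas proportional to the original values.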
Circles or Pie-diagrams:
While making comparisons, pie diagrams should be used on a percentage basis and not on an
absolute basis. In constructing a pie diagram, the first step is to prepare the data so that
the various component values can be transposed into corresponding degrees on the circle.
The second step is to draw a circle of appropriate size with a compass. The size of the
radius depends upon the available space and other factors of presentation. The third step is
to measure points on the circle and mark off the size of each sector with the help of a
protractor.
Example: Draw a pie diagram for the following data on the production of sugar (in quintals)
in various countries.
Country      Production of Sugar (in quintals)
Cuba                  62
Australia             47
India                 35
Japan                 16
Egypt                  6
Solution:
Country      Production of Sugar
             In Quintals    In Degrees
Cuba              62            134
Australia         47            102
India             35             76
Japan             16             35
Egypt              6             13
Total            166            360
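The degrees in the solution table follow from giving each country the share of 360° that matches its share of the total production. A minimal sketch, with variable names chosen for the illustration:

```python
# Production of sugar (in quintals) from the example.
production = {"Cuba": 62, "Australia": 47, "India": 35, "Japan": 16, "Egypt": 6}

total = sum(production.values())

# Angle of each sector: (component / total) * 360, rounded to whole degrees.
degrees = {country: round(q / total * 360) for country, q in production.items()}

for country, angle in degrees.items():
    print(f"{country:<10} {production[country]:>3} {angle:>4}")
print(f"{'Total':<10} {total:>3} {sum(degrees.values()):>4}")
```

For these data the rounded angles add up to exactly 360. With other data, rounding can leave the sum a degree or so off, in which case one sector is usually adjusted.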
[Figure: Pie Diagram]
C. Three-dimensional diagrams:
[Figure: three-dimensional diagram with sides 4 cm, 3 cm and 2 cm.]
Solution:
[Figure: Histogram. Horizontal axis: class boundaries from 50 to 250; vertical axis: Number of Workers.]
Solution:
For drawing a histogram, the frequency distribution should be continuous. If it is not continuous,
then first make it continuous as follows.
[Figure: Histogram. Horizontal axis: Marks (class boundaries 20.5 to 80.5); vertical axis: Number of Students.]
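Making an inclusive distribution continuous means replacing the class limits by class boundaries: half the gap between consecutive classes is subtracted from each lower limit and added to each upper limit. In the sketch below the classes 21-30, 31-40, …, 71-80 are an assumption inferred from the boundary values 20.5 to 80.5 on the histogram's axis; the function name is hypothetical.

```python
def to_boundaries(classes):
    """Turn inclusive class limits into continuous class boundaries."""
    # Gap between the upper limit of one class and the lower limit of the next.
    gap = classes[1][0] - classes[0][1]
    half = gap / 2
    return [(lo - half, hi + half) for lo, hi in classes]


# Assumed inclusive classes of marks (inferred from the axis values of the figure).
classes = [(21, 30), (31, 40), (41, 50), (51, 60), (61, 70), (71, 80)]

for (lo, hi), (lb, ub) in zip(classes, to_boundaries(classes)):
    print(f"{lo}-{hi}  becomes  {lb}-{ub}")
```

After the change, the upper boundary of each class equals the lower boundary of the next, so the rectangles of the histogram touch one another.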
Profits (in lakhs)    Number of Companies
     0-10                    4
    10-20                   12
    20-30                   24
    30-50                   32
    50-80                   18
    80-90                    9
    90-100                   3
Solution:
When the class intervals are unequal, a correction for unequal class intervals must be
made. The frequencies are adjusted as follows: the frequency of the class 30-50 is divided by
two, since its class interval is double the usual width. Similarly, the frequency of the
class 50-80 is divided by 3. Then draw the histogram.
Profits (in lakhs)    Number of Companies
     0-10                    4
    10-20                   12
    20-30                   24
    30-40                   16
    40-50                   16
    50-60                    6
    60-70                    6
    70-80                    6
    80-90                    9
    90-100                   3
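The adjustment can be sketched as follows: every class wider than the common width is split into equal classes of that width, and its frequency is shared equally among them. The common width of 10 and the function name are assumptions made for this illustration, and the sketch assumes every class width is a multiple of the common width.

```python
# Original distribution with unequal class intervals: ((lower, upper), frequency).
profits = [
    ((0, 10), 4), ((10, 20), 12), ((20, 30), 24), ((30, 50), 32),
    ((50, 80), 18), ((80, 90), 9), ((90, 100), 3),
]


def adjusted_frequencies(table, base_width):
    """Split each wide class into classes of base_width and share its frequency equally."""
    rows = []
    for (lo, hi), f in table:
        parts = (hi - lo) // base_width          # 1 for ordinary classes, 2 for 30-50, 3 for 50-80
        for i in range(parts):
            start = lo + i * base_width
            rows.append(((start, start + base_width), f / parts))
    return rows


for (lo, hi), f in adjusted_frequencies(profits, base_width=10):
    print(f"{lo}-{hi}: {f:g}")
```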
[Figure: Histogram. Horizontal axis: class boundaries from 10 to 100; vertical axis: No. of Companies.]
Frequency Polygon:
If we mark the midpoints of the top horizontal sides of the rectangles in a histogram and
join them by straight lines, the figure so formed is called a Frequency Polygon. This is done
under the assumption that the frequencies in a class interval are evenly distributed
throughout the class. The area of the polygon is equal to the area of the histogram, because
the area left outside is just equal to the area included in it.
Example: Draw a frequency polygon for the following data.
Weight (in kg)    Number of Students
    30-35                 4
    35-40                 7
    40-45                10
    45-50                18
    50-55                14
    55-60                 8
    60-65                 3
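The points to be joined are the class midpoints paired with the class frequencies. The sketch below computes them for the data above. Closing the polygon on the base line at the midpoints of an empty class on either side is the usual convention and is an assumption here, not something stated in the notes.

```python
# Weight classes (in kg) and number of students: ((lower, upper), frequency).
weights = [
    ((30, 35), 4), ((35, 40), 7), ((40, 45), 10), ((45, 50), 18),
    ((50, 55), 14), ((55, 60), 8), ((60, 65), 3),
]

# Each vertex of the polygon: (midpoint of the class, frequency of the class).
points = [((lo + hi) / 2, f) for (lo, hi), f in weights]

# Assumed convention: close the polygon at zero frequency, one class width
# before the first midpoint and one class width after the last midpoint.
width = weights[0][0][1] - weights[0][0][0]
polygon = [(points[0][0] - width, 0)] + points + [(points[-1][0] + width, 0)]

for x, y in polygon:
    print(x, y)
```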
[Figure: Frequency Polygon. Horizontal axis: class limits from 30 to 65; vertical axis: Number of Students.]
Frequency Curve:
If the middle points of the upper boundaries of the rectangles of a histogram are connected
by a smooth freehand curve, the diagram is called a frequency curve. The curve should begin
and end at the base line.
Solution:
[Figure: Frequency Curve. Horizontal axis: 1000 to 8000; vertical axis: No. of Family.]
Ogive Curves:
In the less than ogive method, we start with the upper limits of the classes and go on
adding the frequencies. When these cumulative frequencies are plotted, we get a rising curve.
In the more than ogive method, we start with the lower limits of the classes and from the
total frequency we subtract the frequency of each class in turn. When these frequencies are
plotted, we get a declining curve.
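The two sets of cumulative frequencies can be sketched as below. For illustration the sketch reuses the weight distribution from the frequency polygon example; the variable names are chosen here.

```python
from itertools import accumulate

# Weight classes (in kg) and number of students: ((lower, upper), frequency).
weights = [
    ((30, 35), 4), ((35, 40), 7), ((40, 45), 10), ((45, 50), 18),
    ((50, 55), 14), ((55, 60), 8), ((60, 65), 3),
]

lower_limits = [lo for (lo, hi), f in weights]
upper_limits = [hi for (lo, hi), f in weights]
freqs = [f for (lo, hi), f in weights]
total = sum(freqs)

# Less than ogive: running total of frequencies, plotted against the upper limits.
less_than = list(accumulate(freqs))

# More than ogive: start from the total at the first lower limit and subtract
# the frequency of each class in turn, plotted against the lower limits.
more_than = [total - below for below in [0] + less_than[:-1]]

for u, c in zip(upper_limits, less_than):
    print("less than", u, ":", c)
for l, c in zip(lower_limits, more_than):
    print("more than or equal to", l, ":", c)
```

The less than values rise to the total frequency, while the more than values fall from the total frequency, which is why the two curves slope in opposite directions.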
Example: