318 Economics Eng Lesson6
Introduction to Statistics
6
COLLECTION AND
CLASSIFICATION OF DATA
In the previous lesson, you learnt about the meaning and scope of statistics
and its need in Economics. In this lesson you will learn about the techniques of
collecting, organising and condensing data. These techniques are necessary for
making statistical data meaningful.
OBJECTIVES
After completing this lesson, you will be able to:
• distinguish between primary and secondary data;
• list the methods of collecting primary data;
• give some examples of sources of secondary data;
• explain the concepts of an array, frequency array and frequency distribution;
• state different methods of constructing a frequency distribution; and
• construct simple and cumulative frequency distributions from given data.
Collection and Classification of Data MODULE - 3
Open Schooling employees by approaching them, then it is primary data for him.
Another way is to adopt data already collected by someone else. The
investigator only adopts the data. Statistical information thus obtained is called
secondary data. The source of such information is called a secondary source. For
example, if the investigator collects the information about the salaries of employees
of the National Institute of Open Schooling from the salary register maintained by its
accounts branch, then it is secondary data for him.
I. Published Sources
There are certain agencies which collect the data and publish them in the form of
either regular journals or reports. These agencies/sources are known as published
sources of data.
In India some of the published sources are:
1. Central Statistical Organisation (CSO) : It publishes data on national
income, savings, capital formation etc., in a publication called National
Accounts Statistics.
2. National Sample Survey Organisation (NSSO) : This organisation, which
is under the Ministry of Statistics and Programme Implementation, provides
data on many aspects of the national economy, such as agriculture, industry,
employment and poverty.
3. Reserve Bank of India (RBI) : It publishes financial statistics. Its publications
are Report on Currency and Finance, Reserve Bank of India Bulletin and
Statistical Tables Relating to Banks in India etc.
4. Labour Bureau : Its publications are Indian Labour Statistics, Indian Labour
Year Book and Indian Labour Journal.
5. Population Census : It is undertaken by the office of the Registrar General,
Census of India, Ministry of Home Affairs. It provides us statistics on
population, per capita income, literacy rate etc.
6. Papers and Magazines : Journals like ‘Capital’, ‘Commerce’, ‘Economic and
Political Weekly’, and newspapers like ‘The Economic Times’ etc. also
publish important statistical data.
6.2 ORGANISING AND CONDENSING DATA
Suppose a statistical investigator wants to analyse the marks obtained by 40
students in a class. He collects data and finds that marks obtained by 40 students
in the class are:
20 25 28 27 34 31 30 32 33 40
43 43 40 43 42 43 42 45 43 47
48 46 47 48 46 49 58 54 56 50
53 51 39 38 36 38 35 35 37 40
Put yourself in the position of the investigator. Which aspects of this data would
interest you? Perhaps you would be interested in knowing the highest marks
obtained by any student. You may also be interested to know the lowest marks
obtained by a student. Another point of interest can be the level of marks around
which most of the students are concentrated.
The above data are unorganized. To refine this data for comparison and analysis
it should be arranged in an orderly sequence or into groups on the basis of some
similarity. This whole process of arranging and grouping the data into some
meaningful arrangement is a first step towards analysis of data. Data can be
arranged in two forms: (a) Arrays and (b) Frequency distributions.
(a) Arrays
A method of presenting an individual series is a simple array of data. An orderly
arrangement of raw data is called ‘Array’. Arrays are of two types: (i) Simple array,
and (ii) Frequency array.
(i) Simple Array : A simple array is an arrangement of data in ascending or
descending order. Let us construct the simple arrays of the data about the
marks of 40 students. The data in table 6.1 is arranged in ascending order and
in table 6.2 in descending order.
Table 6.1: Ascending Array of the Marks obtained by 40 students in class
20 35 42 47
25 36 43 48
27 37 43 48
28 38 43 49
30 38 43 50
31 39 43 51
32 40 45 53
33 40 46 54
34 40 46 56
35 42 47 58
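As a rough illustration of a simple array, the Python sketch below sorts the marks in ascending and in descending order. The list holds the 40 values of Table 6.1 in unsorted order, and the variable names are illustrative assumptions, not part of the lesson.

```python
# Marks of the 40 students (same values as Table 6.1, in unsorted order).
marks = [20, 25, 28, 27, 34, 31, 30, 32, 33, 40,
         43, 43, 40, 43, 42, 43, 42, 45, 43, 47,
         48, 46, 47, 48, 46, 49, 58, 54, 56, 50,
         53, 51, 39, 38, 36, 38, 35, 35, 37, 40]

# A simple array is the same data arranged in order.
ascending = sorted(marks)
descending = sorted(marks, reverse=True)

# Once the data are ordered, the extremes are the first and last items.
lowest = ascending[0]
highest = ascending[-1]
print("Lowest marks:", lowest)
print("Highest marks:", highest)
```

Sorting leaves the original list unchanged, so the raw data remain available for later steps.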
2. Put the items in the first column in ascending order in such a way that each
item is recorded once only.
3. Prepare the tally sheet in the second column, marking one bar for each item.
Make blocks of five tally bars to avoid mistakes in counting. Note that
every fifth bar is drawn across the previous four bars.
4. Count the tally bars and record the total number in the third column. This
column will represent the frequencies of the corresponding items.
Let us now explain the construction of a frequency array of the marks obtained by 40
students. In Table 6.3 the data about the marks are arranged in ascending order in the
first column. This not only helps to find the maximum and minimum values but also
makes it easy to draw the bars.
Now for each mark level make one bar (/) in the second column and cross the item off
in the data.
Table 6.3 Frequency array of marks obtained by 40 students
Marks Tally Sheet (Tallies) Frequency (f)
20 / 1
25 / 1
27 / 1
28 / 1
30 / 1
31 / 1
32 / 1
33 / 1
34 / 1
35 // 2
36 / 1
37 / 1
38 // 2
39 / 1
40 /// 3
42 // 2
43 //// 5
45 / 1
46 // 2
47 // 2
48 // 2
49 / 1
50 / 1
51 / 1
53 / 1
54 / 1
56 / 1
58 / 1
Total Frequency = 40
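The tallying steps can be sketched in Python as below. This is only an illustration: it uses `collections.Counter` from the standard library in place of a hand-drawn tally sheet, prints plain bars without the crossed fifth bar, and takes the 40 values from Table 6.1. The variable names are assumptions.

```python
from collections import Counter

# Marks of the 40 students, as arranged in Table 6.1.
marks = [20, 25, 27, 28, 30, 31, 32, 33, 34, 35,
         35, 36, 37, 38, 38, 39, 40, 40, 40, 42,
         42, 43, 43, 43, 43, 43, 45, 46, 46, 47,
         47, 48, 48, 49, 50, 51, 53, 54, 56, 58]

# Counter records how many times each distinct value occurs.
frequency = Counter(marks)

# One row per distinct mark: value, tally bars, frequency.
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:>3}  {'/' * count:<6} {count}")

total_frequency = sum(frequency.values())
print("Total Frequency =", total_frequency)
```

Each distinct mark appears once in the first column, and the frequencies add up to the number of students.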
The main limitation of a frequency array is that it does not give an idea of the
characteristics of a group. For example, it does not tell us how many students
have obtained marks between 40 and 45. Therefore it is not possible to compare
the characteristics of different groups. This limitation is removed by a frequency
distribution.
1. Class : Class is a group of magnitudes having two ends called class limits. For
example, 20-25, 25-30 etc. or 20-24, 25-29 etc. as the case may be, each
represents a class.
2. Class Limits : Every class has two boundaries or limits called lower limit (L1)
and upper limit (L2). For example in the class (20-30) L1 = 20 and L2 = 30.
3. Class Interval : The difference between the two limits of a class is called class
interval. It is equal to upper limit minus lower limit. It is also called class width.
Class interval = L2 – L1. For example, for the class (20-30) it is 30 – 20 = 10.
4. Class Frequency : The total number of items falling in a class, that is, having
values within L1 and L2, is the class frequency. For example, in Table 6.4 the class
frequency of class (40-45) is 10. Similarly, in class (50-55) the frequency is 4.
5. Mid-Point/Mid-Value (M.V.) : The mid-value of a class, also called the mid-point,
is obtained by dividing the sum of the lower limit and upper limit of the class by 2.
It is the average of the two limits of a class and falls just in the middle of the class:
M.V. = (L1 + L2) / 2
For example, the mid-value of class (20-30) is (20 + 30) / 2 = 25.
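The two formulas above, class interval and mid-value, can be checked with a small Python sketch. The function names `class_interval` and `mid_value` are illustrative assumptions; only the arithmetic comes from the definitions.

```python
def class_interval(lower_limit, upper_limit):
    """Class interval (class width) = L2 - L1."""
    return upper_limit - lower_limit


def mid_value(lower_limit, upper_limit):
    """Mid-value = (L1 + L2) / 2."""
    return (lower_limit + upper_limit) / 2


# The class (20-30) used in the examples above.
print(class_interval(20, 30))  # 10
print(mid_value(20, 30))       # 25.0
```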
An item having the value 25 will be counted in the next class (25-30), as is clear from
the following example. Using the same data as in the frequency array and taking a
class interval of 5, a frequency distribution of the exclusive type will be as under:
Table 6.4: Construction of Frequency Distribution – “Exclusive Type”
Class Tally Sheet (Tallies) Frequency (f)
20-25 / 1
25-30 /// 3
30-35 //// 5
35-40 //// // 7
40-45 //// //// 10
45-50 //// /// 8
50-55 //// 4
55-60 // 2
Total Frequency = 40
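The exclusive rule of Table 6.4 can be sketched in Python as follows. This is a minimal illustration, assuming classes of width 5 starting at 20; the names are illustrative. An item is counted in a class when it is at least the lower limit and strictly below the upper limit.

```python
# Marks of the 40 students, as arranged in Table 6.1.
marks = [20, 25, 27, 28, 30, 31, 32, 33, 34, 35,
         35, 36, 37, 38, 38, 39, 40, 40, 40, 42,
         42, 43, 43, 43, 43, 43, 45, 46, 46, 47,
         47, 48, 48, 49, 50, 51, 53, 54, 56, 58]

width = 5
# Classes 20-25, 25-30, ..., 55-60.
class_limits = [(lower, lower + width) for lower in range(20, 60, width)]

# Exclusive type: the upper limit is excluded (lower <= mark < upper).
exclusive = {
    (lower, upper): sum(1 for m in marks if lower <= m < upper)
    for lower, upper in class_limits
}

for (lower, upper), f in exclusive.items():
    print(f"{lower}-{upper}  {f}")
print("Total Frequency =", sum(exclusive.values()))
```

Because the test is `lower <= m < upper`, a mark of 25 falls in the class 25-30 and not in 20-25.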
(b) Inclusive Series : In this type the lower limit of the next class is one more
than the upper limit of the previous class. Items having values equal to either the
lower or the upper limit of a class are counted in that same class. That
is why such a frequency distribution is called inclusive type. For example, in
the class (20-24) both 20 and 24 will be included in the same class. Similarly,
in the class (40-44) both 40 and 44 will be included. The following table has
been formed on the basis of the same data as taken in the exclusive type.
Table 6.5: Construction of Frequency Distribution – “Inclusive Type”
Class Tally Sheet (Tallies) Frequency (f)
20-24 / 1
25-29 /// 3
30-34 //// 5
35-39 //// // 7
40-44 //// //// 10
45-49 //// /// 8
50-54 //// 4
55-59 // 2
Total Frequency = 40
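For comparison, the inclusive rule of Table 6.5 can be sketched the same way. The only change from the exclusive rule is that both limits are included; the class limits 20-24, 25-29 and so on are taken from the table, and the names are illustrative.

```python
# Marks of the 40 students, as arranged in Table 6.1.
marks = [20, 25, 27, 28, 30, 31, 32, 33, 34, 35,
         35, 36, 37, 38, 38, 39, 40, 40, 40, 42,
         42, 43, 43, 43, 43, 43, 45, 46, 46, 47,
         47, 48, 48, 49, 50, 51, 53, 54, 56, 58]

# Classes 20-24, 25-29, ..., 55-59: each lower limit is one more
# than the upper limit of the previous class.
class_limits = [(lower, lower + 4) for lower in range(20, 60, 5)]

# Inclusive type: both limits are included (lower <= mark <= upper).
inclusive = {
    (lower, upper): sum(1 for m in marks if lower <= m <= upper)
    for lower, upper in class_limits
}

for (lower, upper), f in inclusive.items():
    print(f"{lower}-{upper}  {f}")
print("Total Frequency =", sum(inclusive.values()))
```

With whole-number marks the frequencies match those of the exclusive table, since no mark falls between, say, 24 and 25.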
(c) Open-end Classes : Open-end frequency distribution is one which has at
least one of its ends open. You will observe that either lower limit of first class
or upper limit of last class or both are not given in such series. In table 6.6 the
first class and the last class i.e. below 25 and 55 and above are open-end
classes.
Table 6.6: Open-end Classes Frequency Distribution
Class Tally Sheet Frequency (f)
Below 25 / 1
25-30 /// 3
30-35 //// 5
35-40 //// // 7
40-45 //// //// 10
45-50 //// /// 8
50-55 //// 4
55 and above // 2
Total Frequency = 40
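Open-end classes can be sketched in Python by leaving a limit unspecified. In the illustration below, `None` stands for a missing limit; this representation and the helper names `falls_in` and `label` are assumptions made for the example, not part of the lesson.

```python
# Marks of the 40 students, as arranged in Table 6.1.
marks = [20, 25, 27, 28, 30, 31, 32, 33, 34, 35,
         35, 36, 37, 38, 38, 39, 40, 40, 40, 42,
         42, 43, 43, 43, 43, 43, 45, 46, 46, 47,
         47, 48, 48, 49, 50, 51, 53, 54, 56, 58]

# None marks an open end: no lower limit for the first class,
# no upper limit for the last class.
open_end_classes = [(None, 25), (25, 30), (30, 35), (35, 40),
                    (40, 45), (45, 50), (50, 55), (55, None)]


def falls_in(mark, lower, upper):
    """Exclusive rule, skipping any limit that is open (None)."""
    return (lower is None or mark >= lower) and (upper is None or mark < upper)


def label(lower, upper):
    """Class label in the style of Table 6.6."""
    if lower is None:
        return f"Below {upper}"
    if upper is None:
        return f"{lower} and above"
    return f"{lower}-{upper}"


open_end = {
    label(lower, upper): sum(1 for m in marks if falls_in(m, lower, upper))
    for lower, upper in open_end_classes
}

for name, f in open_end.items():
    print(f"{name}  {f}")
```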
(d) Unequal Classes : In a frequency distribution with unequal classes, the
width of different classes (i.e. L2 – L1) need not be the same. In Table 6.7, the
class (30-40) has width 10 while the class (40-55) has width 15.
Table 6.7: Unequal Classes Frequency Distribution
Class Tally Sheet Frequency (f)
20-25 / 1
25-30 /// 3
30-40 //// //// // 12
40-55 //// //// //// //// // 22
55-60 // 2
Total Frequency = 40
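A possible Python sketch for unequal classes is given below. It assumes the class boundaries of Table 6.7 and uses `bisect_right` from the standard library to find the class of each mark under the exclusive rule; this is one convenient technique, not the method prescribed by the lesson.

```python
from bisect import bisect_right

# Marks of the 40 students, as arranged in Table 6.1.
marks = [20, 25, 27, 28, 30, 31, 32, 33, 34, 35,
         35, 36, 37, 38, 38, 39, 40, 40, 40, 42,
         42, 43, 43, 43, 43, 43, 45, 46, 46, 47,
         47, 48, 48, 49, 50, 51, 53, 54, 56, 58]

# Boundaries of the classes 20-25, 25-30, 30-40, 40-55, 55-60.
boundaries = [20, 25, 30, 40, 55, 60]

frequencies = [0] * (len(boundaries) - 1)
for m in marks:
    # bisect_right places a mark equal to a boundary in the class
    # that starts at that boundary (exclusive rule).
    index = bisect_right(boundaries, m) - 1
    frequencies[index] += 1

# Width of each class = upper limit minus lower limit.
widths = [upper - lower for lower, upper in zip(boundaries, boundaries[1:])]

for lower, upper, f in zip(boundaries, boundaries[1:], frequencies):
    print(f"{lower}-{upper}  {f}")
```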
(e) Cumulative Frequency : A ‘Cumulative Frequency Distribution’ is formed
by taking successive totals of given frequencies. This can be done in two ways:
(i) From above, such as 1,4 (i.e. 1+3), 9(i.e. 4+5), 16 (i.e. 9+7), and so on.
Such a distribution is called ‘Less-than’ cumulative frequency
distribution. It shows the total number of observations (frequencies)
having less than a particular value of the variable (here marks). For
example, there are 4 (i.e. 1+3) students who got marks less than 30, 9
(i.e. 4+5) students who got marks less than 35 and so on. Table 6.8 gives
the less-than cumulative frequency distribution.
Table 6.8: ‘Less-than’ Cumulative Frequency Distribution
Marks Cumulative Frequency (cf)
Less than 25 1
Less than 30 4 (1+3)
Less than 35 9 (4+5)
Less than 40 16 (9+7)
Less than 45 26 (16+10)
Less than 50 34 (26+8)
Less than 55 38 (34+4)
Less than 60 40 (38+2)
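The successive totals of Table 6.8 can be reproduced with a short Python sketch. It assumes the class frequencies of Table 6.4 and uses `itertools.accumulate` for the running total; the variable names are illustrative.

```python
from itertools import accumulate

# Upper limits and frequencies of the classes in Table 6.4.
upper_limits = [25, 30, 35, 40, 45, 50, 55, 60]
frequencies = [1, 3, 5, 7, 10, 8, 4, 2]

# Totalling from above: each entry adds the next class frequency.
less_than = list(accumulate(frequencies))

for limit, cf in zip(upper_limits, less_than):
    print(f"Less than {limit}  {cf}")
```

The last cumulative frequency equals the total number of students, which is a quick check on the arithmetic.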
(ii) From below, such as 2, 6 (i.e. 2 + 4), 14 (i.e. 6 + 8), 24 (i.e. 14 + 10) and
so on. Such a distribution is called ‘More-than’ cumulative frequency
distribution. It shows the total number of observations (frequencies)
having more than a particular value of the variable (here marks). For
example, there are 6 (i.e. 2 + 4) students who got marks more than 50,
14 (i.e. 2 + 4 + 8) students who got marks more than 45 etc. Since each
class includes its lower limit, ‘more than 50’ here counts marks of 50 and
above. See Table 6.9.
Table 6.9: ‘More-than’ Cumulative Frequency Distribution
Marks Cumulative Frequency (cf)
More than 20 40
More than 25 39 (40-1)
More than 30 36 (39-3)
More than 35 31 (36-5)
More than 40 24 (31-7)
More than 45 14 (24-10)
More than 50 6 (14-8)
More than 55 2 (6-4)
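Totalling from below can be sketched in the same way by accumulating the frequencies in reverse order. The sketch assumes the class frequencies of Table 6.4; the variable names are illustrative.

```python
from itertools import accumulate

# Lower limits and frequencies of the classes in Table 6.4.
lower_limits = [20, 25, 30, 35, 40, 45, 50, 55]
frequencies = [1, 3, 5, 7, 10, 8, 4, 2]

# Totalling from below: accumulate from the last class upwards,
# then put the totals back in the order of the classes.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for limit, cf in zip(lower_limits, more_than):
    print(f"More than {limit}  {cf}")
```

The first entry equals the total frequency, because every student scored at least the lowest class limit.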
ACTIVITY
1. Visit children in your neighbourhood and record the age of at least 30 of them
and then construct a frequency distribution of both exclusive as well as
inclusive types.
2. From daily newspapers, record the maximum temperature of your city for 30 days.
Prepare a frequency distribution of both exclusive as well as inclusive types
with a class interval of 1.5 degrees Celsius and with at least 5 classes.
and frequency distribution. Arrays can be (i) simple array or (ii) frequency
array.
• When simple frequencies are successively totalled, we get what is called a
cumulative frequency distribution.
• To get a frequency distribution we have to make use of a tally sheet.
• Formation of a frequency distribution requires important decisions regarding
the number of classes, class limits, class width etc.
• A class is a group of magnitudes having two ends called class limits (L1 and L2),
L1 being the lower limit and L2 the upper limit.
• The total number of cases falling in a particular class is called class frequency.
• We can form the following types of frequency distributions:
(a) Exclusive type, where the upper limit of the class is excluded and put in
the next class.
(b) Inclusive type, where the upper limit of the class is included in the same
class.
(c) Open-end type, with classes like (below 25) and (55 and above).
(d) Unequal classes, where the class width or class interval differs between
classes, like (20-25), (25-30), (30-40) …
(e) Cumulative ‘Less-than’ and ‘More-than’ types, where simple frequencies are
successively totalled from above and from below respectively.
Cumulative: means successive totalling, that is, something increasing in quantity
by one addition after another.
Condensation: putting a large quantity of data into a short, useful form
without losing its utility.
Respondent: a person who responds to or answers the questions raised.
When an investigator approaches a person with a questionnaire, the person who
answers these questions is called the respondent.
Sequence: in ordinary language, a connected line of events or ideas. In
statistics it means a series formed on some such principle, e.g. the sequence of
numbers starting from, say, 2 with a difference of, say, 5, that is 2, 7, 12, 17, 22 etc.
These numbers are in ascending order. If we reverse the sequence
(…, 22, 17, 12, 7, 2), the numbers are in descending order.
Tally Sheet: a statement where each occurrence of a value in a series is recorded
by making one bar (/).
Data: statistical information on population, employment, prices, exports,
imports etc. that has been collected, analysed and published by government
departments, commercial and industrial associations, and other research agencies.
TERMINAL EXERCISE
1. Distinguish between primary and secondary data. Describe the methods for
collecting primary data.
2. What is secondary data? Name some of its sources in India.
3. Distinguish between simple array and frequency array with examples.
4. On the basis of the following data about the wages of 20 workers in a factory,
prepare a frequency array: 450, 580, 600, 480, 540, 620, 400, 475, 500, 480,
620, 480, 570, 600, 650, 410, 550, 600, 650, 450.
5. Explain the concept of ‘frequency distribution’. How is it different from
‘frequency array’?
6. On the basis of data in question 4, prepare a frequency distribution by exclusive
method.
7. Distinguish between ‘exclusive method’ and ‘inclusive method’ of frequency
distribution with examples.
8. Write short notes on:
(a) Open-end frequency distribution.
(b) Frequency distribution with unequal classes.
(c) Cumulative frequency distribution.
6.1
1. (a) Primary (b) Investigator (c) National income
2. (a) False (b) False (c) True
6.2
(a) either ascending or descending order (b) small
(c) frequency (d) does not give
6.3
(a) classifies (b) class interval
(c) next (d) same
(e) cumulative.
Terminal Exercise
1. Read section 6.1(a) and (b)
2. Read section 6.1 (a) and (c)
3. Read section 6.2(a)
4. (i) Arrange the data in ascending order:
400 480 550 600
410 480 570 620
450 480 580 620
450 500 600 650
475 540 600 650
(ii) Prepare a tally sheet.
Income (Rs.) Tallies Frequency (f)
400 / 1
410 / 1
450 // 2
475 / 1
480 /// 3
500 / 1
540 / 1
550 / 1
570 / 1
580 / 1
600 /// 3
620 // 2
650 // 2
Total Frequency = 20
5. Read section 6.2 and 6.3
6. First two steps have already been explained in answer to question 4. The third
step is as follows.
Income groups (Rs.) Frequency (f)
400-450 2
450-500 6
500-550 2
550-600 3
600-650 5
650-700 2
Total Frequency = 20
7. Read section 6.3 (a) and (b)
8. (a) Read section 6.3 (c)
(b) Read section 6.3 (d)
(c) Read section 6.3 (e)
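The answers to questions 4 and 6 can be cross-checked with a short Python sketch. It assumes exclusive classes of width 50 starting at 400, as in the answer to question 6; the variable names are illustrative.

```python
from collections import Counter

# Wages of the 20 workers from Terminal Exercise question 4.
wages = [450, 580, 600, 480, 540, 620, 400, 475, 500, 480,
         620, 480, 570, 600, 650, 410, 550, 600, 650, 450]

# Question 4: frequency array (each distinct wage with its frequency).
frequency_array = dict(sorted(Counter(wages).items()))

# Question 6: exclusive classes 400-450, 450-500, ..., 650-700.
class_limits = [(lower, lower + 50) for lower in range(400, 700, 50)]
distribution = {
    (lower, upper): sum(1 for w in wages if lower <= w < upper)
    for lower, upper in class_limits
}

for (lower, upper), f in distribution.items():
    print(f"{lower}-{upper}  {f}")
print("Total Frequency =", sum(distribution.values()))
```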