Ch.04 Organisation of Data
Ch.04
ORGANISATION OF DATA
When an investigator collects data for an investigation, what is gathered is just raw data. Raw
data, as such, cannot support any meaningful conclusion; they have to be organised before they are
presented for final observations or conclusions. “Organisation of the data refers to the
arrangement of figures in such a form that comparison of the mass of similar data may be
facilitated and further analysis may be possible.” An important method of organising data is to
distribute them into different classes on the basis of their characteristics. This process is called
classification of data.
WHAT IS CLASSIFICATION?
“Classification is the process of arranging things (either actually or notionally) in groups or classes
according to their resemblances and affinities and gives expression to the unity of attributes that may
exist amongst a diversity of individuals.” This definition suggests two important features of
classification:
i. Data are divided into different groups. For example, on the basis of education, persons
may be classified as educated and uneducated.
ii. Data are grouped or classified on the basis of their class similarities. All similar units are
put in one class and as the similarity changes, class also changes.
Objectives of classification
1. Brief and Simple
2. Utility
3. Distinctiveness
4. Comparability
5. Scientific Arrangement
6. Attractive and Effective
Basis of Classification
Simple Manifold
2. Chronological Classification: When data are classified on the basis of time, it is known as
chronological classification. This is illustrated in the following Table 2.
Skilled Unskilled
2. CONCEPT OF VARIABLE
A characteristic or a phenomenon which is capable of being measured and changes its
value over time is called a variable. Thus, a variable refers to a quantity which is subject to
change and which can be measured in some unit. If we measure the weight of students of
Class XI, then the weight of the students will be called a variable.
1. Discrete Variable: Discrete variables are those variables that increase in jumps or in
complete numbers. For example, the number of students in class XI could be
1, 2, 3, 10, 11, 15 or 20, etc., but cannot be 1¼, 1½, 1¾, etc.
2. Continuous Variable: Variables that assume a range of values or increase not in
jumps but continuously or in fractions are called continuous variables. For example,
height of the boys in a school is expressed as 5’1”, 5’2”, 5’3”, and so on.
In short, while the values of discrete variables are in complete numbers (1,2,3, etc.),
values of continuous variables are in fractions (5’4”, 5’2”, etc.) or are in any range
such as 10-15, 15-20, etc.
3. RAW DATA
A mass of data in its crude form is called raw data. It is an unorganised mass of various
items, yet to be organised by the investigator.
30 20 40 20 15 20
25 10 20 15 25 20
15 45 10 30 20 25
30 20 30 20 15 35
25 10 25 15 35 10
Data presented in this table are raw data. They are not homogeneous, nor are they
classified into groups or classes of similar items. No meaningful conclusion is
possible from such data. To draw any conclusion from these data, an investigator has to first
organise them. To do so, the investigator has to classify them in the form of series.
“A series as used statistically may be defined as things or attributes of things arranged
according to some logical order.”
1. Individual Series
Individual series are those series in which the items are listed singly. These series may
be presented in two ways:
i. According to Serial Numbers: One way of presenting an individual series is to
arrange all the items in serial order.
Organisation of data in the form of individual series is a very simple form of presentation
of data. But this method is not of much use when the number of items is very large.
2. Frequency Series
Frequency series or series with frequencies may be of two types:
i. Discrete Series or Frequency Array, and
ii. Frequency Distribution.
Before we discuss these two types of series, let us understand the meaning of the
following terms:
(a). Frequency: Frequency is the number of times an item occurs (or repeats
itself) in the series.
(b). Class Frequency: The number of times items occur within a range of values
(or class interval) is called class frequency. For example, if
there are 4 students securing marks between 10 and 15, then 4 is the frequency
corresponding to the class interval 10-15. Thus, 4 will be called the class
frequency.
(c). Tally Bars: Every time an item occurs, a tally bar (I) is marked against that
item. After every four bars, the fifth occurrence is marked by a cross through
the four, thus making a group of five. This method of marking and
counting is known as the Four and Cross method.
Four and Cross Method of converting Raw Data into Frequency series (data
in Table 5 or 6)
Marks   Tally Bars   Frequency
10      IIII         4
15      IIII         5
20      IIII III     8
25      IIII         5
30      IIII         4
35      II           2
40      I            1
45      I            1
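The counting behind the Four and Cross method can be sketched in a few lines of Python. This is only an illustration, assuming the thirty raw values listed earlier in this chapter are the data being tallied; the function names `frequency_array` and `tally` are hypothetical, and a crossed group of five is written here as `||||/` because plain text cannot show the cross.

```python
from collections import Counter

# Raw data copied from the table of thirty values given under RAW DATA.
raw_data = [
    30, 20, 40, 20, 15, 20,
    25, 10, 20, 15, 25, 20,
    15, 45, 10, 30, 20, 25,
    30, 20, 30, 20, 15, 35,
    25, 10, 25, 15, 35, 10,
]

def frequency_array(values):
    """Return (item, frequency) pairs sorted by item value."""
    counts = Counter(values)
    return sorted(counts.items())

def tally(count):
    """Render a count as tally bars; '||||/' stands for four bars and a cross."""
    groups, rest = divmod(count, 5)
    parts = ["||||/"] * groups
    if rest:
        parts.append("|" * rest)
    return " ".join(parts)

for item, freq in frequency_array(raw_data):
    print(f"{item:>3}  {tally(freq):<12} {freq}")
```

Each distinct mark appears once with its count, which is exactly what a discrete series (frequency array) records.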
Illustration.
Twenty students of class XI have secured the following marks:
11, 12, 14, 16, 11, 17, 16, 17, 14
17, 18, 20, 14, 20, 17, 20, 17, 14, 20.
Marks Frequency
10-15 4
15-20 5
20-25 8
25-30 5
30-35 4
35-40 2
40-45 1
45-50 1
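The class frequencies in the table above agree with the thirty raw values listed under RAW DATA when those values are grouped into classes of width 5 starting at 10. The following is a minimal sketch of that grouping, assuming each class contains its lower limit but not its upper limit; the function name `class_frequencies` and its parameters are hypothetical.

```python
# Raw data copied from the table of thirty values given under RAW DATA.
raw_data = [
    30, 20, 40, 20, 15, 20,
    25, 10, 20, 15, 25, 20,
    15, 45, 10, 30, 20, 25,
    30, 20, 30, 20, 15, 35,
    25, 10, 25, 15, 35, 10,
]

def class_frequencies(values, start, width, n_classes):
    """Count values per class [lower, upper); the upper limit itself is excluded."""
    counts = [0] * n_classes
    for v in values:
        index = (v - start) // width  # which class the value falls into
        if 0 <= index < n_classes:
            counts[index] += 1
    return [
        (start + k * width, start + (k + 1) * width, counts[k])
        for k in range(n_classes)
    ]

for lower, upper, freq in class_frequencies(raw_data, start=10, width=5, n_classes=8):
    print(f"{lower}-{upper}  {freq}")
```

A value equal to a class's upper limit, such as 15, is counted in the next class (15-20), not in 10-15.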
Some important Terms
i. Class: A range of values which incorporates a set of items is called a class. For example, 5-10,
10-15 are the classes.
ii. Class Limits: The extreme values of a class are its limits. Every class interval has two limits, lower
limit and upper limit. Of the class interval 5-10 in the above example, the lower limit is 5 and
the upper limit is 10.
iii. Magnitude of a class interval: Magnitude of a class interval is the difference between the upper
limit and the lower limit of a class. For example, in a class interval 10-15, the magnitude of the
class interval would be 15 - 10 = 5. Thus,
Magnitude of a class interval: $i = l_2 - l_1$, where $l_2$ is the upper limit and $l_1$ is the lower limit.
iv. Mid-value: Mid-value is the average of the upper and lower limits. It is found by adding
the upper limit and the lower limit and dividing the total by 2. Thus,
$m = \frac{l_2 + l_1}{2}$
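As a quick check of the two formulas above, the magnitude and the mid-value of the class interval 10-15 can be computed directly. This is a small illustrative sketch; the function names are hypothetical.

```python
def class_magnitude(lower, upper):
    """Magnitude of a class interval: upper limit minus lower limit."""
    return upper - lower

def mid_value(lower, upper):
    """Mid-value: average of the lower and upper limits."""
    return (lower + upper) / 2

print(class_magnitude(10, 15))  # magnitude of the class 10-15
print(mid_value(10, 15))        # mid-value of the class 10-15
```

For the class 10-15 this gives a magnitude of 5 and a mid-value of 12.5.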
Frequency Distribution
1. Exclusive Series
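Exclusive series is that series in which every class interval excludes items corresponding to
its upper limit. For example, a student securing exactly 15 marks is counted in the class
15-20 and not in the class 10-15.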
Exclusive Series
Marks Frequency
10-15 4
15-20 5
20-25 8
25-30 5
30-35 4
Total = 26
2. Inclusive Series
An inclusive series is that series which includes all items up to its upper limit. In such
series, the upper limit of class interval does not repeat itself as a lower limit of the next
class interval. Thus, there is a gap between the upper limit of a class interval and the lower
limit of the next class interval.
In short, while in the exclusive series there is an overlapping of the class limits (upper class
limit of one class interval being the lower class limit of the next class interval), there is no
such overlapping in the inclusive series.
Inclusive Series
Marks Frequency
10-14 4
15-19 5
20-24 8
25-29 5
30-34 4
Total = 26
Marks Frequency
9.5-14.5 4
14.5-19.5 5
19.5-24.5 8
24.5-29.5 5
29.5-34.5 4
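The last table above appears to restate the inclusive series with each class widened by 0.5 on either side, so that the gaps between classes disappear. A minimal sketch of that adjustment follows, assuming the usual rule of taking half the gap between the upper limit of one class and the lower limit of the next; the function name `to_class_boundaries` is hypothetical.

```python
def to_class_boundaries(inclusive_classes):
    """Convert inclusive (lower, upper) limits into continuous class boundaries.

    Assumes the classes are in ascending order and the gap between
    consecutive classes is the same throughout.
    """
    # Gap between the first class's upper limit and the second class's lower limit.
    gap = inclusive_classes[1][0] - inclusive_classes[0][1]
    correction = gap / 2
    return [(lower - correction, upper + correction)
            for lower, upper in inclusive_classes]

inclusive = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]
for lower, upper in to_class_boundaries(inclusive):
    print(f"{lower}-{upper}")
```

With a gap of 1 between classes, the correction is 0.5, which turns 10-14 into 9.5-14.5 while the frequencies stay unchanged.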
Marks Frequency
Below 5 1
5-10 3
10-15 4
15-20 6
20 and above 1
Marks Frequency
5-10 3
10-15 8
15-20 9
20-25 4
25-30 4
Method I                              Method II
Marks          Number of Students     Marks          Number of Students
Less than 10   0 + 3 = 3              More than 5    28
Less than 15   3 + 8 = 11             More than 10   28 - 3 = 25
Less than 20   11 + 9 = 20            More than 15   25 - 8 = 17
Less than 25   20 + 4 = 24            More than 20   17 - 9 = 8
Less than 30   24 + 4 = 28            More than 25   8 - 4 = 4
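Both columns of cumulative frequencies can be reproduced from the simple frequencies 3, 8, 9, 4, 4 of the preceding table. The sketch below is illustrative only; the function names are hypothetical.

```python
from itertools import accumulate

classes = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30)]
frequencies = [3, 8, 9, 4, 4]

def less_than_cumulative(freqs):
    """Running total of frequencies, read against each class's upper limit."""
    return list(accumulate(freqs))

def more_than_cumulative(freqs):
    """Total minus everything below each class's lower limit."""
    total = sum(freqs)
    below = [0] + list(accumulate(freqs))[:-1]
    return [total - b for b in below]

less = less_than_cumulative(frequencies)
more = more_than_cumulative(frequencies)
for (lower, upper), lt, mt in zip(classes, less, more):
    print(f"Less than {upper}: {lt:<4} More than {lower}: {mt}")
```

The "less than" column builds up by addition from the first class, while the "more than" column starts from the total of 28 and falls by subtraction.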
Illustration.
Convert the following cumulative frequency series into a simple frequency series.
4 students obtained less than 10 marks
20 students obtained less than 20 marks
40 students obtained less than 30 marks
48 students obtained less than 40 marks
50 students obtained less than 50 marks
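One way to work this illustration is to subtract each cumulative frequency from the next. The sketch below assumes that the first class starts at 0, which the statement does not say explicitly; the function name `cumulative_to_simple` is hypothetical.

```python
def cumulative_to_simple(upper_limits, cumulative, start=0):
    """Turn 'less than' cumulative frequencies into (lower, upper, frequency) rows.

    `start` is the assumed lower limit of the first class.
    """
    rows = []
    previous_limit, previous_total = start, 0
    for limit, total in zip(upper_limits, cumulative):
        # Frequency of a class = students below its upper limit
        # minus students below its lower limit.
        rows.append((previous_limit, limit, total - previous_total))
        previous_limit, previous_total = limit, total
    return rows

rows = cumulative_to_simple([10, 20, 30, 40, 50], [4, 20, 40, 48, 50])
for lower, upper, freq in rows:
    print(f"{lower}-{upper}  {freq}")
```

Under that assumption the simple frequencies are 4, 16, 20, 8 and 2 for the classes 0-10, 10-20, 20-30, 30-40 and 40-50, which add up to the 50 students.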
Illustration
Mid-value 5 15 25 35 45
Frequency 6 5 11 9 8
Such series may be converted into simple frequency series using the following method: (i)
First, the mutual difference between mid-values ($i$) is determined; and (ii) Second, the
difference so obtained is reduced to half ($\frac{1}{2}i$), which when deducted from the mid-value
gives the lower limit of the class interval and when added to the mid-value gives the
corresponding upper limit.
Thus, Lower limit: $l_1 = m - \frac{1}{2}i$
Upper limit: $l_2 = m + \frac{1}{2}i$
Where, $m$ = mid-value; $i$ = difference between mid-values; $l_1$ = lower limit and
$l_2$ = upper limit.
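Mid-value   Frequency   Class Interval   Calculation of Limits
5           6           0-10             $l_1 = 5 - \frac{10}{2} = 0$, $l_2 = 5 + \frac{10}{2} = 10$
15          5           10-20            $l_1 = 15 - \frac{10}{2} = 10$, $l_2 = 15 + \frac{10}{2} = 20$
25          11          20-30            $l_1 = 25 - \frac{10}{2} = 20$, $l_2 = 25 + \frac{10}{2} = 30$
35          9           30-40            $l_1 = 35 - \frac{10}{2} = 30$, $l_2 = 35 + \frac{10}{2} = 40$
45          8           40-50            $l_1 = 45 - \frac{10}{2} = 40$, $l_2 = 45 + \frac{10}{2} = 50$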
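The same conversion can be sketched in Python. This is an illustrative example that assumes the mid-values are equally spaced, as they are in this illustration; the function name `classes_from_mid_values` is hypothetical.

```python
def classes_from_mid_values(mid_values):
    """Recover (lower, upper) class limits from equally spaced mid-values."""
    i = mid_values[1] - mid_values[0]  # difference between mid-values
    half = i / 2
    # Lower limit = m - i/2, upper limit = m + i/2.
    return [(m - half, m + half) for m in mid_values]

mid_values = [5, 15, 25, 35, 45]
frequencies = [6, 5, 11, 9, 8]

for (lower, upper), freq in zip(classes_from_mid_values(mid_values), frequencies):
    print(f"{lower:g}-{upper:g}  {freq}")
```

The frequencies carry over unchanged; only the mid-values are replaced by their class intervals.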