12-12-2022 Types of Biological Data 1
Data are the raw material of statistics and usually consist of numbers.
Data can be obtained from measurements, such as blood glucose, height or weight measured
using instruments, or from counts. Data can be any group of measurements of interest.
Each characteristic studied individually is referred to as a datum; collectively they are called data.
Statistics deals with the collection, organization, summarization and analysis of data. It is the use of
data by a decision maker to reach better decisions. It also deals with drawing inferences about a body of
data when only a portion of it is observed.
In biostatistics, data are obtained from plant sciences, animal sciences and medicine, and are used to help
manage situations that involve uncertainty.
Data are broadly classified into qualitative and quantitative data. Qualitative data are based on
non-numerical features, or qualitative observations of elementary units, called attributes, which can be
assessed as present or absent, whereas quantitative data possess numerical features.
Data on a nominal scale are described by name only and can be neither graded nor ranked; at times
they have a dichotomous (binary) classification.
Data based on a metric scale measurement are quantitative data and are reported with the unit of
measurement.
To differentiate between nominal, ordinal and quantitative data, let us consider the example of
the clinical condition diabetes.
Recording the occurrence of the disease gives nominal data. Classifying the condition as
present or absent gives binary data. Grading or ranking the extent of disease as borderline, mild,
moderate or severe gives ordinal data, which can be coded as 0, 1, 2, 3 for convenience while
tabulating the data. Quantitative data can also be expressed as ordinal
data. An exact measurement of blood glucose is on a metric scale. Metric measurements are always
approximations, and they need not be more accurate than the purpose requires.
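The coding of ordinal grades can be sketched in a few lines of Python. This is only an illustration: it assumes one integer code per grade (0 to 3), and the patient list is hypothetical, not data from these notes.

```python
# Ordered severity grades from the diabetes example, coded 0..3 for tabulation.
SEVERITY_GRADES = ["borderline", "mild", "moderate", "severe"]
SEVERITY_CODE = {grade: code for code, grade in enumerate(SEVERITY_GRADES)}


def encode_severity(grade):
    """Return the integer code of an ordinal severity grade."""
    return SEVERITY_CODE[grade.lower()]


# Hypothetical patient records (illustrative values only).
patients = ["mild", "severe", "borderline", "moderate", "mild"]
codes = [encode_severity(g) for g in patients]
print(codes)  # [1, 3, 0, 2, 1]
```

The codes preserve the order of the grades, so they can be ranked, but the distance between two codes has no metric meaning.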
Qualitative data are easy to obtain. Different types of data play different roles in managing an uncertain
condition.
Blood pressure is measured using a sphygmomanometer. For more accuracy it can be measured to the
nearest mm Hg, but it need not be measured in decimals. A quantitative presentation such as 96 mm Hg and
98 mm Hg (diastolic pressure) is required to monitor the prognosis of a patient, but it can also be
presented for convenience as mild, moderate or severe hypertension.
Blood biochemical parameters such as cholesterol, creatinine, phosphorus and hemoglobin, and enzymes
that increase after myocardial infarction, such as creatine phosphokinase, lactate dehydrogenase and
serum glutamate oxaloacetate transaminase, are measured quantitatively with specific
instruments and have to be reported with the appropriate unit; for enzyme activity this is the IU or the SI unit katal.
A characteristic that differs from one biological entity to another is called a variable. The number of
leaves counted in a group of plants and the height measured are examples of such characteristics. A variable
may take different values at different places, times and situations. It varies in
amount and magnitude in a frequency distribution.
A variable can be qualitative or quantitative. When only two possibilities exist, the measurement is binary,
for example gender recorded as male or female, or recovery or non-recovery from disease.
Data are categorized on the basis of the characteristic studied. When the values of a variable are finite in
number and cannot include fractions or decimals, the variable is discrete. A continuous variable, on the other hand,
can take every possible fractional value, e.g. the height and weight of persons. The series of data
obtained for a discrete variable is a discrete frequency distribution, based on counts, and the
series obtained for a continuous variable is a continuous frequency distribution, obtained by
measurement.
The number of observations in each class is its frequency. A frequency distribution is a table in which data
are grouped into classes and the number in each class is recorded. If the number of items in each class is
expressed as a proportion or percentage of the total, the distribution is a relative frequency distribution or
percentage distribution.
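As a small illustration of a frequency distribution and its relative form, the sketch below tabulates a discrete variable with Python's standard library. The leaf counts are hypothetical values chosen for the example.

```python
from collections import Counter

# Hypothetical counts of leaves per plant (a discrete variable).
leaves = [4, 5, 5, 6, 4, 5, 7, 6, 5, 4]

# Frequency: number of observations for each value.
frequency = Counter(leaves)
total = sum(frequency.values())

# Relative frequency: proportion of the total in each class.
relative = {value: count / total for value, count in frequency.items()}

for value in sorted(frequency):
    print(value, frequency[value], f"{relative[value]:.0%}")
```

The relative frequencies always add up to 1 (or 100 percent), which is a quick check on the tabulation.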
The following terms are to be understood when a continuous frequency distribution is formed or when
data classification is done according to class intervals.
Class limits: The lowest and the highest values included in each class constitute the class limits. If the
weight is recorded as 4.0 to 4.2, the lowest value is 4.0 and the highest 4.2. The two boundaries are
known as the lower and the upper limit. The lower limit is the value below which there can be no
observation in the particular class and the upper limit is the value above which there can be no
observation.
Class intervals:
The class interval is the difference between the upper and the lower limit of the class. If the class is 50 to 100,
the class interval is 50. The width of the class interval is important when a continuous frequency distribution is
made. It depends on the following factors.
The starting class would be 3.5 to 3.65, the next 3.65 to 3.8 and so on.
How to fix the number of classes?
The number of classes can be fixed arbitrarily, keeping the nature of the problem in mind, or based on Sturges' rule:
k = 1 + 3.322 log N
where k is the number of classes, N is the number of observations and log is the logarithm to base 10.
The number of classes should be between 4 and 20. It cannot be less than 4 even if the number of observations
is less than 10, and if N is 10 lakh (1,000,000), k will be 1 + 3.322 x 6 = 20.932.
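The rule can be checked numerically with a short sketch. The function name is an illustrative choice, and the logarithm is assumed to be base 10, which is what makes log N = 6 for N = 10 lakh.

```python
import math


def sturges_classes(n):
    """Number of classes by Sturges' rule: k = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)


k_100 = sturges_classes(100)            # 1 + 3.322 * 2
k_million = sturges_classes(1_000_000)  # 1 + 3.322 * 6, i.e. N = 10 lakh
print(round(k_100, 3), round(k_million, 3))
```

Because k is rarely a whole number, it is rounded to a convenient number of classes in practice.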
The size of the class interval (i) is then obtained as
i = Range / (1 + 3.322 log N)
where the range is the difference between the largest and the smallest value. In the above example (log N = 2,
that is N = 100),
i = (5 – 3.5) / (1 + 3.322 x 2) = 1.5 / 7.644 = 0.196, or 0.2. If we take a class interval of 0.2, the number of
classes formed would be 1.5 / 0.2 = 7.5, or 8.
Applying the above formula may give a fractional value, which has to be approximated.
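The same arithmetic can be reproduced as a sketch, assuming a base-10 logarithm and N = 100 observations (the value for which log N = 2). The function and variable names are illustrative only.

```python
import math


def class_width(largest, smallest, n):
    """Class interval: i = range / (1 + 3.322 * log10(n))."""
    return (largest - smallest) / (1 + 3.322 * math.log10(n))


i = class_width(5.0, 3.5, 100)            # 1.5 / 7.644
width = round(i, 1)                       # approximated to one decimal place
classes = math.ceil((5.0 - 3.5) / width)  # 7.5 classes, rounded up
print(round(i, 3), width, classes)
```

Rounding the width to 0.2 and the class count up to 8 mirrors the approximation step described in the text.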
Class frequency
The number of observations corresponding to a particular class is its frequency. In the above example, if the
number of chicks weighing between 3.5 and 3.7 is 15, the frequency of that class is 15.
If all the individual frequencies are added together, the total frequency is obtained. Here the total frequency
is 100.
Class mid-point
It is the value lying halfway between the upper limit and the lower limit.
Mid-point of a class = (upper limit of the class + lower limit of the class) / 2
For further calculations, the mid-point is taken to represent the class in continuous data.
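A brief sketch of the mid-point calculation follows, using class limits from the chick-weight example in these notes. Note that the sum of the two limits is divided by 2 as a whole; the function name is an illustrative choice.

```python
def class_midpoint(lower, upper):
    """Mid-point of a class: (upper limit + lower limit) / 2."""
    return (upper + lower) / 2


# Class limits from the chick-weight example (kg).
classes = [(3.5, 3.7), (3.7, 3.9), (3.9, 4.1)]
midpoints = [round(class_midpoint(lo, hi), 2) for lo, hi in classes]
print(midpoints)  # [3.6, 3.8, 4.0]
```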
There are two methods of classifying the data according to class intervals.
1. Exclusive method
2. Inclusive method
Exclusive method
If the class intervals are fixed so that the upper limit of one class is the lower limit of the next class,
it is the exclusive method of classification.
Weight (kg)
3.5 – 3.7
3.7 – 3.9
3.9 – 4.1
The exclusive method ensures continuity of data because the upper limit of one class is the lower limit of the
next class. Thus in the above example, there are 15 chicks whose weight lies between 3.5 and 3.699 kg. A
chick whose weight is 3.7 kg would be included in the class 3.7 to 3.9 kg. This method is widely followed in
statistics, although it can be confusing to a layman with no statistical background. If the upper
and lower limits of adjacent class intervals repeat, the exclusive method of data entry should be followed. The
upper limit is always exclusive, and an item with that value is not included in that class.
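The exclusive rule (lower limit included, upper limit excluded) can be sketched with the standard `bisect` module. The function name and the choice to return `None` for values outside the table are assumptions made for the illustration.

```python
from bisect import bisect_right

# Class limits of the chick-weight table (kg); each class is [lower, upper).
limits = [3.5, 3.7, 3.9, 4.1]


def exclusive_class(weight):
    """Return the (lower, upper) class containing weight, or None if outside."""
    if not limits[0] <= weight < limits[-1]:
        return None
    k = bisect_right(limits, weight) - 1
    return (limits[k], limits[k + 1])


print(exclusive_class(3.699))  # (3.5, 3.7)
print(exclusive_class(3.7))    # (3.7, 3.9)
```

A weight equal to an upper limit always falls into the next class, which is exactly the behaviour described above.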
Inclusive method
In the inclusive method, the upper limit of a class is included in that class itself. In the above example, if
the class is 3.5 to 3.69, weights of chicks from 3.5 to 3.69 would be included in this class and a
chick with a weight of 3.7 would be placed in the next class.
3.5 – 3.69
3.7 – 3.89
The choice between the exclusive and the inclusive method depends on whether the variable is continuous
or discrete. For a continuous variable the exclusive method is preferred, and for a discrete variable the
inclusive method is applicable.
To ensure continuity of the recorded data, inclusive classes are converted to exclusive classes by using a
correction factor.
Correction factor = (Lower limit of the succeeding class – Upper limit of the preceding class) / 2
The value obtained is subtracted from all the lower limits and added to all the upper limits to obtain
the class intervals in exclusive form. Here the correction factor is (3.7 – 3.69) / 2 = 0.005, which gives:
3.495 – 3.695
3.695 – 3.895
The difference between the limits in the inclusive method is 0.19, but after applying the correction factor
and converting to exclusive form the difference between the limits becomes 0.2.
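The conversion can be sketched as follows. It assumes the correction factor is half the gap between the upper limit of one class and the lower limit of the next, and that the gap is the same between all classes; the function name is illustrative, and results are rounded only to hide floating-point noise.

```python
def to_exclusive(classes):
    """Convert inclusive (lower, upper) classes to exclusive form."""
    # Correction factor: half the gap between two adjacent classes.
    cf = (classes[1][0] - classes[0][1]) / 2
    # Subtract from every lower limit, add to every upper limit.
    return [(round(lo - cf, 6), round(hi + cf, 6)) for lo, hi in classes]


inclusive = [(3.5, 3.69), (3.7, 3.89)]
exclusive = to_exclusive(inclusive)
print(exclusive)
```

After conversion the upper limit of each class coincides with the lower limit of the next, so the classes are continuous.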
Ex.
CF = (10 – 9) / 2 = 0.5
Class          Frequency
0.5 – 9.5      10
9.5 – 19.5     25
19.5 – 29.5    36
29.5 – 39.5    20
The class intervals should be 5, 10 or other multiples of 5 so that the distribution is easy to understand and
stratify.
If the salary of employees is the criterion for grouping the data, the classes at either end can be open ended
to include the few observations with very small and very large values, and class intervals of varying sizes can
be used to cover the range where most values fall and to avoid constructing a distribution with too
many classes.
Salary             Number of employees
< 5000             15
5000 – 15000       35
15000 – 25000      50
25000 – 35000      80
35000 – 45000      65
45000 – 70000      20
70000 – 150000     10
> 150000           5
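Assigning a salary to one of these open-ended, unequal classes can be sketched with a list of cut points. This is an illustration under stated assumptions: the exclusive method is applied to every class, and a salary of exactly 150000, which the table leaves unassigned, is placed in the last class.

```python
from bisect import bisect_right

# Cut points between the classes of the salary table. The first class has no
# lower limit and the last class has no upper limit (open-ended classes).
cuts = [5000, 15000, 25000, 35000, 45000, 70000, 150000]
labels = ["< 5000", "5000 - 15000", "15000 - 25000", "25000 - 35000",
          "35000 - 45000", "45000 - 70000", "70000 - 150000", "> 150000"]
frequencies = [15, 35, 50, 80, 65, 20, 10, 5]  # frequencies from the table


def salary_class(salary):
    """Return the label of the class a salary falls into (upper limit excluded)."""
    return labels[bisect_right(cuts, salary)]


print(salary_class(4000))    # < 5000
print(salary_class(15000))   # 15000 - 25000
print(sum(frequencies))      # 280
```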