Unit 1

Data

Uploaded by

hypertaker04

UNIT 1 TYPES OF DATA

Structure

1.0 Introduction
1.1 Objectives
1.2 Types of Data: Quantitative and Qualitative
1.3 Quantitative Data
1.3.1 Types of Quantitative Data
1.3.2 Tabulation and Organisation of Quantitative Data
A. Frequency Distribution
B. Cumulative Frequency Distribution
1.3.3 Graphical Presentation of Quantitative Data
1.3.4 Analysis of Quantitative Data
A. Measures of Central Tendency
B. Measures of Variability
C. Measures of Relative Position
D. Measures of Relationship
1.4 Qualitative Data
1.4.1 Organisation of Qualitative Data
1.4.2 Analysis of Qualitative Data
A. Content Analysis
B. Inductive Analysis
C. Logical Analysis
1.5 Let Us Sum Up
1.6 Glossary
1.7 Check Your Progress: The Key

1.0 INTRODUCTION
In Block 3, we dealt with the nature of various tools used in the collection of data. These
data are mostly expressed in quantified terms. However, quantitative data may not be
available in certain cases. In such a situation, the researcher has to consider the
phenomenon as a whole without breaking it down into measurable or quantifiable
components. Indeed, he/she should be familiar not only with the two types of data –
quantitative and qualitative, but also with the process of classifying data, graphical
representation and the various methods of data analysis.

The aim of this Unit is to make you understand the nature of quantitative and qualitative
data, the procedures for classifying and tabulating quantitative data, presenting them
graphically, and the various methods used in data analysis.

Data Analysis

1.1 OBJECTIVES

After the completion of this Unit, you should be able to:


• Define quantitative and qualitative data,
• Prepare various types of graphs and tables for presenting data,
• Compute measures of central tendency, variance, standard deviation, measures of
relative position and measures of relationships, and
• Describe various methods used for analysing qualitative data.

1.2 TYPES OF DATA: QUANTITATIVE AND QUALITATIVE

The data collected through the administration of various types of tools on the selected
samples are of two types - qualitative and quantitative. In quantitative data,
numerical values are assigned to the characteristics or properties of objects or events,
according to logically accepted rules. It is a process wherein a number system like figures,
ratings or scores is imposed on empirical data. However, when the researcher takes into
consideration the phenomenon as a whole and does not attempt to analyse it in
measurable or quantifiable terms, the approach becomes ‘qualitative’. Generally, in
educational and behavioural sciences, both types of data, (i) quantitative and
(ii) qualitative, are recognised. We will look at the characteristics of both in the following
sections. Section 1.3 deals with quantitative data, their types, tabulation, frequency
distribution and cumulative frequency distributions, the need to represent data graphically,
the various types of graphs, and the methods of analysing quantitative data, viz., measures
of central tendency, variability, relative positions and relationship. In section 1.4, we shall
briefly discuss qualitative data and their analysis with reference to content analysis, logical
analysis and inductive analysis. The application of various parametric and non-parametric
tests is discussed in more detail in Unit 2 of this Block.

1.3 QUANTITATIVE DATA

Quantitative data describe an empirical event or phenomenon in a numerical system with
the help of different scales of measurement: nominal, ordinal, interval or ratio.

Nominal scales of measurement are used when a set of objects is to be sorted into two or
more categories on the basis of certain clearly known characteristics. For example, we may
assign individuals to categories such as sex (male and female), nationality (French and
Indian), level (school and higher), education (formal and non-formal), etc.

The ordinal scales of measurement correspond to the classification of a set of objects by
ranking them on a continuum. For example, we may rank students according to their height
from the shortest to the tallest.

The interval scale of measurement is based on equal units of measurement. It indicates how
much or how little of a given characteristic or attribute is present. For example, the
difference in the amount of an attribute possessed by individuals with test scores of 60
and 61 is assumed to be equivalent to that between individuals with scores of 50 and 51.

Ratio scale is the highest level of measurement. Since this scale assumes the existence of
an absolute zero, this type of measurement is almost non-existent in educational and
psychological measurement. Thus, quantitative data are expressed in nominal, ordinal,
interval and ratio scales of measurement.

1.3.1 Types of Quantitative Data

Quantitative data may be classified into two categories:


(i) Parametric and
(ii) Non-parametric

Parametric data are obtained by applying interval or ratio scales of measurement. Scores
of the tests of ability, achievement, attitude, interest, values, personality etc. are examples
of interval scales of measurement. In the study of reaction time we use ratio scale. In this
type of experiment, the zero point in the absolute sense is known and it makes sense to
look at the ratio of the time taken to respond in different treatment situations.

Non-parametric data are either counted or ranked. In counted data, we make use of the
nominal scale. Each individual can be a member of only one category and all the members
of that category have the same, defined characteristics. For example, we may categorise
a group as a sample of ‘female students’ of a particular ‘study centre’ of an open
university. The categorisation of teachers at different educational levels—school, college
and university—is another example of nominal data.

In ranked data, we apply the ordinal scale of measurement. The sets or classes of objects
are ordered on a continuum in a series ranging from the lowest to the highest according to
the characteristics we wish to measure. The ranking of students in a class by height,
weight or academic achievement is an example of ordinal data.

Check Your Progress 1


Define quantitative data. Describe the various types of quantitative data along with
examples.
Notes: (a) Space is given below for your answer.
(b) Compare your answer with the one given at the end of the Unit.
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................
.............................................................................................................................


1.3.2 Tabulation and Organisation of Quantitative Data

A. Frequency Distribution

Data collected from a test and by using other gathering/measuring tools are raw and may
have little meaning to the researcher until they are tabulated and organised in a systematic
order. One of the ways of doing so is to prepare a frequency distribution. The method
for tabulating the quantified data in a frequency distribution can be illustrated by
considering the following scores of 40 students of MA (DE) of the Indira Gandhi National
Open University in course MDE-412.

Table 1.1: Tabulation of Scores on a Test in Course MDE-412

57   70   80   82   87
60   72   80   82   88
64   73   80   82   87
67   70   78   80   93
67   76   77   84   95
62   76   78   85   97
61   75   80   85   98
63   70   78   85   90

It is difficult to see from the above table how the scores are distributed. Inspection of
scores, however, shows that many scores occur more than once.

You will observe that there are one 98, one 97, one 95, one 93, one 90, one 88, two 87s,
three 85s, and so on. For convenience, you can arrange the data in columns as shown in
Table 1.2. In one column, you can arrange the marks in class-intervals and in the other,
you can record the number of students who have scored these marks by tallies. Inspection
of the scores in Table 1.1 shows that the highest score is 98 and the lowest is 57. The
range is 41 (i.e. 98-57). Therefore, if the classes are taken to be of 5 points each, the
scores can be conveniently arranged in nine class-intervals. If you take the starting
point at 56, the scores within the range 56 to 60 (here, 57 and 60) will be grouped
together to form the lowest class-interval. All scores from 61 to 65, that is, 61, 62, 63,
64 and 65, will form the next class-interval. Similarly, you group all scores within the
ranges 66 to 70, 71 to 75 and so on. The highest class-interval will be 96-100.


Table 1.2: Frequency Distribution of the Scores of 40 Students: Course MDE-412

Class Interval    Tallies           Frequency (f)

96 - 100          II                      2
91 - 95           II                      2
86 - 90           IIII                    4
81 - 85           IIIII II                7
76 - 80           IIIII IIIII I          11
71 - 75           III                     3
66 - 70           IIIII                   5
61 - 65           IIII                    4
56 - 60           II                      2

Total number of scores N = 40

In Table 1.2, the class-intervals have been arranged serially from the smallest at the
bottom to the largest at the top, each class-interval covering 5 scores. For each score, we
have marked a ‘tally’ against the corresponding class-interval. The first score, 57, is
represented by a tally placed against the class interval 56-60, the second score of 60 by a
‘tally’ marked against the class interval 56-60, and the third score 64 by a tally against the
class interval 61-65. The remaining scores have been tabulated in the same way. When all
the 40 scores are listed, the total number of tallies in each class-interval is counted and
written in the next column f. The total of ‘f’ gives the total number of scores (in the
present case 40) and is denoted by N.

It may be noted that the interval 56-60 takes care of all the scores from 56 up to 60. The
score of 56 ordinarily means the interval 55.5 to 56.5 and the score of 60 means 59.5
to 60.5, so the exact limits of this class-interval are 55.5 and 60.5 and its mid-point
is 58. Hence, the distribution represented in Table 1.2 may also be expressed as:

Table 1.3: Frequency Distribution of the Scores of 40 Students: Course MDE-412

Score Intervals    Exact Units of Class-Intervals    Mid Point (x)    Frequency (f)

96-100             95.5-100.5                        98                2
91-95              90.5-95.5                         93                2
86-90              85.5-90.5                         88                4
81-85              80.5-85.5                         83                7
76-80              75.5-80.5                         78               11
71-75              70.5-75.5                         73                3
66-70              65.5-70.5                         68                5
61-65              60.5-65.5                         63                4
56-60              55.5-60.5                         58                2

N = 40
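The tallying procedure behind Tables 1.2 and 1.3 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `frequency_distribution` and its parameters are assumptions made for this example, and it assumes whole-number scores that are not smaller than the chosen starting point.

```python
from collections import Counter

# The 40 scores of Table 1.1, read row by row.
scores = [
    57, 70, 80, 82, 87,
    60, 72, 80, 82, 88,
    64, 73, 80, 82, 87,
    67, 70, 78, 80, 93,
    67, 76, 77, 84, 95,
    62, 76, 78, 85, 97,
    61, 75, 80, 85, 98,
    63, 70, 78, 85, 90,
]


def frequency_distribution(values, start, width):
    """Return rows (lower, upper, mid_point, frequency), lowest interval first.

    Hypothetical helper: assumes integer scores that are >= start.
    """
    # Index of the class-interval each score falls into (0 = lowest interval).
    counts = Counter((v - start) // width for v in values)
    rows = []
    for k in range(max(counts) + 1):
        lower = start + k * width
        upper = lower + width - 1
        rows.append((lower, upper, (lower + upper) / 2, counts.get(k, 0)))
    return rows


table = frequency_distribution(scores, start=56, width=5)

# Print with the largest interval on top, as in Tables 1.2 and 1.3.
for lower, upper, mid, f in reversed(table):
    print(f"{lower}-{upper}  {lower - 0.5}-{upper + 0.5}  {mid}  {f}")
```

Each printed row gives the score interval, its exact limits, its mid-point and its frequency, so the output can be compared line by line with Table 1.3.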


Check Your Progress 2

Tabulate the following data in a frequency distribution using an interval of 5 units.

185 176 166 177 171


147 176 170 171 180
173 168 181 165 173
175 158 156 162 173
197 151 153 162 188
166 145 191 164 174
178 142 158 167 178
148 187 172 169 184
156 187 172 193 183
181 161 172 179 179

Notes: (a) Space is given below for writing your answer.


(b) See whether you have followed the procedure as shown in Tables 1.2
and 1.3.
(c) The answer is incorporated in Check Your Progress 3.

.........................................................................................................................................

.........................................................................................................................................

.........................................................................................................................................

.........................................................................................................................................

.........................................................................................................................................

.........................................................................................................................................

.........................................................................................................................................

.........................................................................................................................................

.........................................................................................................................................

.........................................................................................................................................

.........................................................................................................................................

.........................................................................................................................................

B. Cumulative Frequency Distribution

In some cases, you may not be concerned with the frequencies within the class-intervals,
but rather with the number or the percentage of values greater than or less than a
specified value. These values, called ‘cumulative frequencies’ or ‘cumulative percentage
frequencies’, are obtained by successively adding the individual frequencies of the
class-intervals, starting from the lowest interval. Each cumulative frequency, expressed
as a percentage of N, gives the corresponding cumulative percentage frequency.

Table 1.4: Cumulative Frequency Distribution of the Test Scores of 40 Students: Course MDE-412

Score        Exact Units of     Frequency    Cumulative        Cumulative Percentage
Intervals    Class-intervals    (f)          Frequency (F)     Frequency

96-100       95.5-100.5          2           38 + 2  = 40      100.00
91-95        90.5-95.5           2           36 + 2  = 38       95.00
86-90        85.5-90.5           4           32 + 4  = 36       90.00
81-85        80.5-85.5           7           25 + 7  = 32       80.00
76-80        75.5-80.5          11           14 + 11 = 25       62.50
71-75        70.5-75.5           3           11 + 3  = 14       35.00
66-70        65.5-70.5           5            6 + 5  = 11       27.50
61-65        60.5-65.5           4            2 + 4  = 6        15.00
56-60        55.5-60.5           2                     2         5.00

N = 40
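The running totals of Table 1.4 can be reproduced with a short sketch. The variable names below are illustrative assumptions; the frequencies are those of Table 1.2, listed from the lowest class-interval upwards.

```python
from itertools import accumulate

# Frequencies of Table 1.2, lowest class-interval (56-60) first.
frequencies = [2, 4, 5, 3, 11, 7, 4, 2, 2]

# Cumulative frequency F: running total from the lowest interval upwards.
cumulative = list(accumulate(frequencies))

# N is the last running total; each F is then expressed as a percentage of N.
n = cumulative[-1]
cumulative_pct = [100 * F / n for F in cumulative]

for f, F, pct in zip(frequencies, cumulative, cumulative_pct):
    print(f, F, f"{pct:.2f}")
```

Read from the last printed row to the first, the output follows the top-to-bottom order of Table 1.4.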

1.3.3 Graphical Presentation of Quantitative Data

Graphical presentation often facilitates understanding of a set of data. With the help of a
well-drawn graph, the data can be read and interpreted very easily. Brief descriptions of
the various types of graph which are useful in visualizing the important properties of a
frequency distribution are given below.

The following three types of graph are commonly used for the above mentioned purposes:
i) Histogram or column diagram
ii) Frequency polygon
iii) Cumulative percentage curve or ogive.

i) Histogram or column diagram

A histogram or column diagram is a graph in which class-intervals are represented
along the horizontal axis and their corresponding frequencies are represented by areas in
the form of rectangular vertical bars drawn on the intervals.


The following steps are followed in preparing a histogram:

Step 1: A horizontal line is drawn at the bottom of a graph paper. Units representing
class-intervals are marked along this line.

Step 2: A vertical line is drawn at the left hand extreme of the horizontal axis. Along
this vertical axis, units representing individual frequencies of the class-intervals
are marked.

Step 3: Taking class units as bases, rectangles are drawn, such that the areas of
rectangles are proportional to the frequencies of the corresponding classes.

Let us consider the following data for drawing a histogram as an illustration of what you
have read above.

Table 1.5: Frequency Distribution of the Scores of 40 Students: Course MDE-412

Class        Exact Units of     Mid Point    Frequency    Cumulative       Cumulative Percentage
Intervals    Class Intervals                 (f)          Frequency (F)    Frequency

35 - 39      34.5 - 39.5        37            4           40               100.00
30 - 34      29.5 - 34.5        32            8           36                90.00
25 - 29      24.5 - 29.5        27           11           28                70.00
20 - 24      19.5 - 24.5        22            8           17                42.50
15 - 19      14.5 - 19.5        17            6            9                22.50
10 - 14       9.5 - 14.5        12            3            3                 7.50

N = 40

The histogram drawn for the above data is shown in figure 1.


Fig. 1: Histogram plotted from the data of Table 1.5
(X-axis: exact class limits from 9.5 to 39.5; Y-axis: frequency)


ii) Frequency Polygon


A frequency polygon is drawn by plotting the mid-point of each class-interval at a height
proportional to its respective frequency and then joining the points by straight lines. The
first two steps are identical to those used in the construction of a histogram. The next
step is as follows:

Step 3: Directly above the mid-point of each class-interval along the horizontal axis, plot
the points at a height proportional to the respective frequencies. Join these
points by straight lines.

The frequency polygon for the distribution of Table 1.5 is shown in figure 2.

Fig. 2: Frequency Polygon plotted from the data of Table 1.5
(X-axis: mid-points 12 to 37 of the class-intervals; Y-axis: frequency)

iii) Cumulative Percentage Curve or Ogive


When the frequencies are expressed as cumulative percentages of N on the vertical
axis, the graphic representation is known as a cumulative percentage curve or ogive.
After finding the cumulative percentage frequencies, the points are plotted on the exact
upper limits of the class-intervals. A curve joining the points thus obtained is called the
cumulative percentage curve or ogive.
The cumulative percentage curve or ogive of the distribution represented in table 1.5 is
illustrated in figure 3.

Fig. 3: Cumulative percentage curve or ogive plotted from the data of Table 1.5.
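As a rough sketch, the points that each of the three graphs needs can be computed from Table 1.5 before anything is drawn. The variable names are assumptions made for this example, and the actual drawing (on graph paper or with any plotting tool) is not shown.

```python
from itertools import accumulate

# Table 1.5: exact lower limits of the class-intervals, lowest first.
lower_limits = [9.5, 14.5, 19.5, 24.5, 29.5, 34.5]
frequencies = [3, 6, 8, 11, 8, 4]
width = 5
n = sum(frequencies)

# Histogram: one rectangle per class-interval as (left edge, right edge, height).
bars = [(lo, lo + width, f) for lo, f in zip(lower_limits, frequencies)]

# Frequency polygon: (mid-point, frequency) pairs joined by straight lines.
mid_points = [lo + width / 2 for lo in lower_limits]
polygon = list(zip(mid_points, frequencies))

# Ogive: cumulative percentage plotted at the exact upper limit of each interval.
upper_limits = [lo + width for lo in lower_limits]
ogive = [(u, 100 * F / n) for u, F in zip(upper_limits, accumulate(frequencies))]

print(bars)
print(polygon)
print(ogive)
```

The three lists use different horizontal positions for the same data: interval edges for the histogram, mid-points for the polygon, and exact upper limits for the ogive.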


1.3.4 Analysis of Quantitative Data

Analysis of quantified data means studying the organised or tabulated data in order to
discover the inherent facts. The data are studied from as many angles as possible to
explore new facts. Two types of statistical methods are used in the analysis of
tabulated data expressed in quantified terms. The first category of methods
pertains to ‘descriptive analysis’ and the second to ‘inferential analysis’ of data.

In this Unit, you will be concerned with ‘descriptive analysis’. Analysis of quantitative
data can also be done by using computer software like SPSS, SAS, Stata and XL Stat. If
you are interested in knowing more about these software packages, you can check their
websites. However, using them requires a thorough understanding of both the program and
the computer.

Descriptive statistical analysis limits generalisation to the particular observed group of
individuals. This analysis describes only a single group. The computed statistical values
provide valuable information about the nature of that particular group only.

The following methods are generally used in descriptive statistical analysis of the tabulated
data:
i) Measures of central tendency
ii) Measures of variability
iii) Measures of relative positions
iv) Measures of relationships.

We shall touch upon each one of them in some detail as follows:

A. Measures of Central Tendency

The three most commonly used measures of central tendency are the Mean, the Median
and the Mode.

I) The Mean (M)


The arithmetic average of a distribution is known as its mean. The mean of a set of
observations or measures is obtained by dividing the sum of all values by the total number
of values.

a) Mean for ungrouped data:

The formula for finding the mean for ungrouped data is:

$$M = \frac{\sum X}{N} \tag{1}$$

in which
M = Mean
∑ = Sum of
X = Observations in a distribution
N = Total number of observations.

To illustrate the use of formula (1) let us consider the data:

16, 14, 12, 18, 21, 22, 13, 15, 16, 18

Using the formula (1):

$$M = \frac{\sum X}{N} = \frac{16 + 14 + 12 + 18 + 21 + 22 + 13 + 15 + 16 + 18}{10} = \frac{165}{10} = 16.50$$
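Formula (1) translates directly into code. The sketch below uses the ten observations of the example; the variable names are illustrative.

```python
# The ten observations used to illustrate formula (1).
observations = [16, 14, 12, 18, 21, 22, 13, 15, 16, 18]

total = sum(observations)        # sum of X
n = len(observations)            # N
mean = total / n                 # M = (sum of X) / N

print(total, n, mean)
```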

b) Mean for grouped data

When the number of observations or measures is large, the data are grouped in a frequency
distribution.

The mean is computed by the formula:

$$M = AM + \frac{\sum fx'}{N} \times i \tag{2}$$

Where

M = Mean
AM = Assumed Mean
x′ = [Mid-point score (x) - AM] / (width of the class-interval)
∑fx′ = Sum of the products of the frequencies and the deviations of the mid-points from
the assumed mean
i = Width of the class-interval
N = Total number of observations

To illustrate the use of formula (2), consider the grouped data given in Table 1.5.

Computations
Step 1: Put the class-intervals in exact limits.
Step 2: Find the mid-point of each class-interval and take the assumed mean at the
mid-point of the interval which has the maximum frequency.
Step 3: Find the difference between each mid-point score and the assumed mean and
divide it by the width of the class-interval to get the deviation x′.
Step 4: Compute fx′ for each class-interval (fx′ is the product of the frequency and the
deviation of the mid-point from the assumed mean).
Step 5: Find the sum of all fx′.
Step 6: Apply formula (2) to compute the mean.

Table 1.6: The Calculation of the Mean from Data Grouped into a Frequency
Distribution (Ref. Table 1.5)

Class Interval    Mid-point    Frequency    Deviation from     fx′
                  (x)          (f)          the AM (x′)

34.5 - 39.5       37            4             2*                8
29.5 - 34.5       32            8             1                 8    (+16)
24.5 - 29.5       27 (AM)      11             0                 0
19.5 - 24.5       22            8            -1                -8
14.5 - 19.5       17            6            -2               -12
 9.5 - 14.5       12            3            -3                -9    (-29)

N = 40                                       ∑fx′ = 16 - 29 = -13

* x′ = (37 - 27) / 5 = 2
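Steps 3 to 6 and the layout of Table 1.6 can be sketched as a small function. The name `grouped_mean` and its parameters are assumptions made for this illustration, not part of the Unit.

```python
def grouped_mean(mid_points, frequencies, width, assumed_mean):
    """Mean of grouped data by the assumed-mean method of formula (2)."""
    n = sum(frequencies)
    # Step 3: deviation x' of each mid-point from AM, in class-interval units.
    deviations = [(x - assumed_mean) / width for x in mid_points]
    # Steps 4 and 5: sum of the products f * x'.
    sum_fx = sum(f * d for f, d in zip(frequencies, deviations))
    # Step 6: M = AM + (sum of fx' / N) * i.
    return assumed_mean + (sum_fx / n) * width


# Mid-points and frequencies of Table 1.5, lowest class-interval first.
mid_points = [12, 17, 22, 27, 32, 37]
frequencies = [3, 6, 8, 11, 8, 4]

mean = grouped_mean(mid_points, frequencies, width=5, assumed_mean=27)
print(round(mean, 3))
```

The choice of assumed mean only affects the size of the intermediate numbers; any other mid-point, for example 22, should lead to the same mean.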
Using the formula (2):

$$M = AM + \frac{\sum fx'}{N} \times i = 27.0 + \frac{-13}{40} \times 5 = 27.0 - 1.625 = 25.375$$

II) The Median

The median is a point in an array, above and below which one half of the
observations fall. It is a measure of position rather than magnitude.

a) Median for Ungrouped data

If the observations are ungrouped and their number is small, the observations are
arranged in order of magnitude and the middle score is located by counting up to half
of N. When the number of observations (N) is odd, the middle observation is the median.
For example, 10 is the median of the scores 7, 8, 9, 10, 11, 12, 13. When the number of
scores (N) is even, the median is the mid-point between the two middle scores. For
example, (10 + 11)/2 = 10.5 is the median of the scores 7, 8, 9, 10, 11, 12, 13, 14.
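The odd and even cases described above can be sketched as follows. The function name `median` is chosen for this illustration; Python's standard `statistics.median` gives the same results for these two examples.

```python
def median(values):
    """Median of ungrouped data: middle score, or mid-point of the two middle scores."""
    ordered = sorted(values)          # arrange in order of magnitude
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # N odd: the middle observation
        return ordered[middle]
    # N even: mid-point between the two middle scores
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_case = median([7, 8, 9, 10, 11, 12, 13])
even_case = median([7, 8, 9, 10, 11, 12, 13, 14])
print(odd_case, even_case)
```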

b) Median for grouped data

In the case of grouped data, a cumulative frequency distribution is prepared and the
median is calculated by the formula:

$$Mdn = l + \frac{N/2 - F}{f} \times i \tag{3}$$

Mdn = Median
l = Exact lower limit of the class-interval in which the median lies.
N/2 = One half of the total number of observations
F = Sum of all frequencies below l.
f = Frequency within the class-interval in which the median lies.
i = Width of the class-interval in which the median lies.

To illustrate the use of formula (3) consider the data of table 1.5 once again.

Table 1.7: The Calculation of the Median from Data Grouped into a Frequency
Distribution

Class-Interval    Frequency (f)    Cumulative Frequency (F)

34.5 – 39.5        4               40
29.5 – 34.5        8               36
24.5 – 29.5       11               28    (Median Class)
19.5 – 24.5        8               17
14.5 – 19.5        6                9
 9.5 – 14.5        3                3

N = 40

Here N/2 = 20, l = 24.5, F = 17, f = 11 and i = 5.

Using formula (3):

$$Mdn = l + \frac{N/2 - F}{f} \times i = 24.5 + \frac{20 - 17}{11} \times 5 = 24.5 + \frac{15}{11} = 25.86$$
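A minimal sketch of formula (3) is given below. The function `grouped_median` is a hypothetical helper written for this example; it walks up the class-intervals, keeping the running total F, until it reaches the interval that contains the N/2-th observation.

```python
def grouped_median(lower_limits, frequencies, width):
    """Median of grouped data by formula (3); intervals listed lowest first."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the distribution has no observations")
    half = n / 2
    below = 0                               # F: sum of frequencies below l
    for lower, f in zip(lower_limits, frequencies):
        if below + f >= half:               # the median lies in this interval
            return lower + (half - below) / f * width
        below += f


# Exact lower limits and frequencies of Table 1.7, lowest class-interval first.
lower_limits = [9.5, 14.5, 19.5, 24.5, 29.5, 34.5]
frequencies = [3, 6, 8, 11, 8, 4]

mdn = grouped_median(lower_limits, frequencies, width=5)
print(round(mdn, 2))
```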
III) The Mode

The mode is defined as the most frequently occurring measure or observation in a
distribution. If only one value occurs a maximum number of times, the distribution is said
to have one mode, i.e. the distribution is unimodal. In some distributions there may be
more than one mode. A distribution with two modes is bimodal, and one with more than two
modes is multimodal.

a) Mode for Ungrouped Data

In a simple ungrouped series of measures, the crude or empirical mode is that single
measure which occurs most frequently. For example, in the series 7, 8, 9, 9, 10, 11 and 12
the most often recurring measure, namely 9, is the crude or empirical mode.

b) Mode for grouped data

When data are grouped into a frequency distribution, the crude or empirical mode is
usually taken to be the mid-point of that interval which contains the largest frequency. In
the example given in Table 1.5, the interval 25-29 contains the largest frequency and hence
27, its mid-point, is the crude mode.

The true mode, that is, the point of greatest concentration in the distribution, or the point at
which more measures fall than at any other point, is calculated by the formula:

$$Mode = l + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times i \tag{4}$$

Where
l = Exact lower limit of the modal class, i.e. the class-interval having the maximum
frequency
fm = Frequency of the modal class.
f1 = Frequency of the class-interval preceding the modal class.
f2 = Frequency of the class-interval following the modal class.
i = Width of the modal class.

To illustrate, let us make use of formula (4) for the data in Table 1.5. Here the maximum
frequency is 11, which lies in the class-interval 24.5 - 29.5.

Therefore, the modal class is (24.5 - 29.5).

Here fm = 11, f1 = 8, f2 = 8, i = 5 and l = 24.5.

Using formula (4):

$$Mode = l + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times i = 24.5 + \frac{11 - 8}{2 \times 11 - 8 - 8} \times 5 = 24.5 + \frac{3}{6} \times 5 = 24.5 + 2.5 = 27.00$$
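Formula (4) can be sketched as below. The function name `grouped_mode` is an assumption made for this example. Two further choices are also assumptions and are not stated in the Unit: when several intervals share the maximum frequency the first one is used, and a missing neighbouring interval is treated as having frequency 0.

```python
def grouped_mode(lower_limits, frequencies, width):
    """Mode of grouped data by formula (4); intervals listed lowest first."""
    m = frequencies.index(max(frequencies))     # position of the modal class
    fm = frequencies[m]
    # Assumption: a missing neighbour counts as an interval with frequency 0.
    f1 = frequencies[m - 1] if m > 0 else 0
    f2 = frequencies[m + 1] if m + 1 < len(frequencies) else 0
    denominator = 2 * fm - f1 - f2
    if denominator == 0:
        raise ValueError("formula (4) is undefined when fm = f1 = f2")
    return lower_limits[m] + (fm - f1) / denominator * width


# Exact lower limits and frequencies of Table 1.5, lowest class-interval first.
lower_limits = [9.5, 14.5, 19.5, 24.5, 29.5, 34.5]
frequencies = [3, 6, 8, 11, 8, 4]

mode = grouped_mode(lower_limits, frequencies, width=5)
print(mode)
```

Because the two neighbouring frequencies are equal here (8 and 8), the computed mode coincides with the mid-point of the modal class.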

B. Measures of Variability

The measures of central tendency are very useful in describing the nature of a distribution
of measures, but they do not give the researcher a complete picture of the data. These
measures will not tell the researcher how the scores tend to be distributed. For this, you
use a different set of measures which are called measures of ‘variability’ or measures of
‘spread’ or ‘dispersion’. The most commonly used measures of variability include the
range, the variance and standard deviation.

I. The Range

The range is defined as the difference between the two extreme measures or values in a
distribution. Suppose the scores of 10 learners in the course MDE -412 are :

50, 40, 39, 35, 29, 28, 24, 27, 19, 18.

The range for this distribution will be (50-18) = 32. Although the range has the advantage
of being easily calculated, it has the following serious limitations:

1) As the value of range is based on only two extreme values in the total distribution, it
does not give any idea of the variation of many other values of the distribution.


2) It is not a stable statistic as its value can differ from sample to sample drawn from
the same population.

II. The Variance and Standard Deviation

The average of the squared deviations of the measures or values from their mean is
known as the variance. The standard deviation is the positive square root of variance.

a) The Variance and Standard Deviation for the Ungrouped Data

The variance for the ungrouped data is found by using the formula:

$$\sigma^2 = \frac{\sum x^2}{N} \tag{5}$$

in which
σ² = Variance of the sample
x = Deviation of the raw measures or values from the mean.
N = Number of values or measures

Let us consider the following data of scores for the application of formula (5):
10, 10, 9, 9, 8, 8, 7, 7, 6, 6.

As the deviation of each score from the mean is required, the first thing to do is to
calculate the mean. Using formula (1):

$$M = \frac{\sum X}{N} = \frac{80}{10} = 8$$

Now, the mean is subtracted from each raw score to get the value of x.

Table 1.8: Distribution of the Test Scores of Ten Learners of Course MDE - 412

Score    Deviation (X-M)    Deviation Squared
(X)      (x)                (x²)

10        2                 4
10        2                 4
 9        1                 1
 9        1                 1
 8        0                 0
 8        0                 0
 7       -1                 1
 7       -1                 1
 6       -2                 4
 6       -2                 4

                            ∑x² = 20

Using formula (5):

$$\sigma^2 = \frac{\sum x^2}{N} = \frac{20}{10} = 2$$

Now to get the standard deviation, you need the positive square root of the variance, σ².

$$\text{Standard Deviation} = \sigma = \sqrt{\frac{\sum x^2}{N}} = \sqrt{2} = 1.41$$
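The deviation method of formula (5) and Table 1.8 can be sketched as follows; the variable names are illustrative. Note that, as in the Unit, the sum of squared deviations is divided by N.

```python
import math

# The ten scores of Table 1.8.
scores = [10, 10, 9, 9, 8, 8, 7, 7, 6, 6]

n = len(scores)
mean = sum(scores) / n                          # formula (1)
deviations = [x - mean for x in scores]         # x = X - M
sum_sq = sum(d * d for d in deviations)         # sum of x squared
variance = sum_sq / n                           # formula (5)
sd = math.sqrt(variance)                        # positive square root

print(mean, sum_sq, variance, round(sd, 2))
```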

The raw scores instead of deviation scores may also be used. The raw score formulae for
variance and standard deviation are given as follows:

$$\text{Variance} = \sigma^2 = \frac{N \sum X^2 - (\sum X)^2}{N^2} \tag{6}$$

$$\text{Standard Deviation} = \sigma = \frac{\sqrt{N \sum X^2 - (\sum X)^2}}{N} \tag{7}$$

In which
X = Raw score
N = The number of scores in the distribution.

Using the same set of data, you can calculate variance and standard deviation with the
help of raw-score formulae:


Table 1.9: The calculation of variance and standard deviation from original (raw)
scores when the assumed mean is taken at zero and the data are ungrouped

Score (X)    X²

10           100
10           100
 9            81
 9            81
 8            64
 8            64
 7            49
 7            49
 6            36
 6            36

∑X = 80      ∑X² = 660

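Using formula (6):

$$\text{Variance} = \frac{N \sum X^2 - (\sum X)^2}{N^2} = \frac{10 \times 660 - (80)^2}{100} = \frac{6600 - 6400}{100} = 2$$

Using formula (7):

$$\text{Standard Deviation} = \sigma = \frac{\sqrt{N \sum X^2 - (\sum X)^2}}{N} = \frac{\sqrt{10 \times 660 - (80)^2}}{10} = \frac{\sqrt{6600 - 6400}}{10} = \frac{14.14}{10} = 1.414$$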
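The raw-score formulae (6) and (7) can be checked with a short sketch. The variable names are illustrative; the last lines compare the result with `statistics.pvariance` and `statistics.pstdev` from Python's standard library, which also divide by N.

```python
import math
import statistics

# The ten scores of Table 1.9.
scores = [10, 10, 9, 9, 8, 8, 7, 7, 6, 6]

n = len(scores)
sum_x = sum(scores)                     # sum of X
sum_x2 = sum(x * x for x in scores)     # sum of X squared

variance = (n * sum_x2 - sum_x ** 2) / n ** 2       # formula (6)
sd = math.sqrt(n * sum_x2 - sum_x ** 2) / n         # formula (7)

print(sum_x, sum_x2, variance, round(sd, 3))

# Cross-check against the standard library's population variance and SD.
print(statistics.pvariance(scores), round(statistics.pstdev(scores), 3))
```

Both routes give the same variance as the deviation method, which is why the raw-score form is convenient when the mean is not a whole number.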


b) Variance and Standard Deviation for Grouped Data

In the case of grouped data in a frequency distribution, the variance and standard
deviation are calculated by using the formulae:

$$\text{Variance} = \sigma^2 = \frac{i^2}{N^2}\left[N \sum fx'^2 - \left(\sum fx'\right)^2\right] \tag{8}$$

$$\text{Standard Deviation} = \sigma = \sqrt{\frac{i^2}{N^2}\left[N \sum fx'^2 - \left(\sum fx'\right)^2\right]} = \frac{i}{N}\sqrt{N \sum fx'^2 - \left(\sum fx'\right)^2} \tag{9}$$

Where
i = Width of the class-interval
N = Total number of measures
f = Frequency of class-interval
x′ = Deviation of the mid-point of the class-interval from the assumed mean, divided by
the width of the class-interval.

To illustrate the use of these formulae let us consider the distribution given in table 1.10.

Table 1.10: The Calculation of Variance and Standard Deviation from Data
Grouped in a Frequency Distribution

Class Interval    x     f     x′    fx′    fx′²

71-75             73     3     3      9     27
66-70             68     4     2      8     16
61-65             63     9     1      9      9
56-60             58    15     0      0      0
51-55             53     8    -1     -8      8
46-50             48     6    -2    -12     24
41-45             43     5    -3    -15     45

N = 50                        ∑fx′ = -9    ∑fx′² = 129

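Using formula (8):

$$\text{Variance} = \sigma^2 = \frac{i^2}{N^2}\left[N \sum fx'^2 - \left(\sum fx'\right)^2\right] = \frac{(5)^2}{(50)^2}\left[50 \times 129 - (-9)^2\right] = \frac{25}{2500} \times 6369 = 63.69$$

Using formula (9):

$$\text{Standard Deviation} = \sigma = \frac{i}{N}\sqrt{N \sum fx'^2 - \left(\sum fx'\right)^2} = \frac{5}{50}\sqrt{50 \times 129 - (-9)^2} = \frac{1}{10}\sqrt{6369} = \frac{1}{10} \times 79.81 = 7.98$$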
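The columns of Table 1.10 and formulae (8) and (9) can be sketched as a function. The name `grouped_variance` and its parameters are assumptions made for this example.

```python
import math


def grouped_variance(mid_points, frequencies, width, assumed_mean):
    """Variance of grouped data by formula (8), using step deviations x'."""
    n = sum(frequencies)
    # x' = (mid-point - AM) / width of the class-interval
    dev = [(x - assumed_mean) / width for x in mid_points]
    sum_fx = sum(f * d for f, d in zip(frequencies, dev))         # sum of fx'
    sum_fx2 = sum(f * d * d for f, d in zip(frequencies, dev))    # sum of fx'^2
    return (width ** 2 / n ** 2) * (n * sum_fx2 - sum_fx ** 2)


# Mid-points and frequencies of Table 1.10, lowest class-interval first.
mid_points = [43, 48, 53, 58, 63, 68, 73]
frequencies = [5, 6, 8, 15, 9, 4, 3]

variance = grouped_variance(mid_points, frequencies, width=5, assumed_mean=58)
sd = math.sqrt(variance)                # formula (9)

print(round(variance, 2), round(sd, 2))
```

As with the grouped mean, a different assumed mean changes only the intermediate sums; the variance should come out the same.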

The standard deviation is a very useful device for comparing characteristics that may be
different or expressed in different units of measurement. It is also used in describing the
status or position of an individual in a group. But before this concept is developed further,
it is essential to understand the nature of the ‘normal probability distribution’.


Check Your Progress 3

Compute (i) Mean (ii) Variance and (iii) Standard Deviation for the following frequency
distribution:

Class Interval F
195-199 1
190-194 2
185-189 4
180-184 5
175-179 8
170-174 10
165-169 6
160-164 4
155-159 4
150-154 2
145-149 3
140-144 1

Notes: (a) Space is given below to write your answer.


(b) Compare your answer with the one given at the end of this unit.

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................


Normal Probability Distribution

The normal probability distribution is based upon the law of probability. It is not an
actual distribution of measures or scores; instead, it is a mathematical model. It is
represented by a curve which is called the Normal Probability Curve. Figure 4 represents
an ideal normal probability curve.

Fig. 4: Ideal Normal Probability Curve

The normal probability curve has the following characteristics:

1. The curve is symmetrical around its vertical axis, called the ordinate. This implies that the
size, shape and slope of the curve on one side of the ordinate are identical to those on the
other side.
2. The values of the mean, mode and median computed for a distribution following this
curve are always the same.
3. The height of the ordinate is maximum at the mean, and in the unit
normal curve it is equal to 0.3989.
4. The curve is asymptotic. It approaches but never meets the horizontal axis, and it extends
from −∞ (minus infinity) to +∞ (plus infinity).
5. The points of inflection of the curve occur at ±1σ, i.e., one standard deviation (σ)
above and below the mean. Thus the curve changes from convex to concave in
relation to the horizontal axis at these points.
6. About 68.26 percent of the total area of the curve falls between the limits M + 1σ and M − 1σ;
95.44 percent falls between the limits M + 2σ and M − 2σ; and 99.73 percent falls
between M + 3σ and M − 3σ.

However, these calculations are rarely necessary, as the Normal Table is available, from
which information about the area can be read directly. For this reason it is
essential that the use of the Normal Table (Table I, Appendix, Unit 2) be clearly understood.
Table I gives the fractional parts of the total area under the normal curve found between the
mean and ordinates (Y's) erected at various distances from the mean. The total area under
the curve is taken arbitrarily to be 10,000, because of the greater convenience with which
fractional parts of the total area may then be calculated. You know that x = (X − M)
measures the deviation of a raw score (X) from the mean (M). If x is divided by σ, then
this deviation is expressed in σ units. These deviation scores are called sigma scores
or z-scores, i.e.

\[ z = \frac{X - M}{\sigma} = \frac{x}{\sigma} \]

The first column of the table, under x/σ, gives the
distance from the mean in tenths of σ, and the distance from the mean in hundredths of σ
is given by the headings of the other columns.

To find the number of cases in the normal distribution between the mean and the ordinate
erected at a distance of 1σ from the mean, you go down the x/σ column until 1.0 is
reached, and in the next column under .00 you take the entry opposite 1.0, namely
34.13. This figure means that 3413 cases in 10,000, or 34.13 percent of the entire area of
the curve, lie between the mean and 1σ. Similarly, if you have to find the
percentage of the distribution between the mean and 1.65σ, you go down the x/σ column to
1.6, then across horizontally to the column headed .05, and take the entry 45.05.

This shows that in a normal distribution, 45.05 percent of the total area lies between the
mean and 1.65σ.
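The table look-ups described above can be cross-checked numerically. The sketch below is not part of the unit's method: it assumes the standard closed form for the normal curve, using Python's `math.erf`, and the function names are illustrative.

```python
import math


def area_mean_to_z(z):
    """Fraction of the total area under the normal curve lying between
    the mean and a point z sigma units away from it."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def unit_normal_ordinate(z):
    """Height of the unit normal curve at z sigma units from the mean."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


# Percentage of the area between the mean and 1 sigma, and 1.65 sigma.
print(round(100 * area_mean_to_z(1.0), 2))
print(round(100 * area_mean_to_z(1.65), 2))

# Maximum ordinate of the unit normal curve (at the mean).
print(round(unit_normal_ordinate(0.0), 4))
```

The two percentages should match the table entries quoted in the text, and the last value should match the maximum ordinate given in characteristic 3.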

Check Your Progress 4


Describe the characteristics of a Normal Probability Distribution
Notes: (a) Space is given below for writing your answer.
(b) Compare your answer with the one given at the end of this unit.
........................................................................................................................................
........................................................................................................................................
........................................................................................................................................
........................................................................................................................................
........................................................................................................................................
........................................................................................................................................
........................................................................................................................................
............................................................................................................................
............................................................................................................................
............................................................................................................................


C. Measures of Relative Positions

A raw score on a test, taken by itself, has no meaning. It gets meaning only by
comparison with some reference group or groups. For example, if a student scores 50 in Maths
and 30 in Science, it does not necessarily mean that the student did better in Maths: 50 may be the
lowest score in the Maths test and 30 may be the highest score in the Science test. The
comparison may be done with the help of the following measures:

1. Sigma Scores

2. Standard Scores

3. Percentiles

4. Percentile Ranks.

What does each one of these mean?

1. Sigma Scores

A sigma score makes a realistic comparison of scores possible and provides a basis for
equal weighting of the scores as the scores on different tests are expressed on a scale
with a mean of zero and standard deviation of 1.

Let us suppose that the mean of a test is 75 and the standard deviation is 5.0. Then if A
earns a score of 85 on this test, his/her deviation from the mean is 85 − 75 = 10. Dividing
this deviation of 10 by the standard deviation σ, i.e., 5.0, we give him/her a σ score of
10/5 = 2. If B's score on this test is 64, his/her deviation from the mean is 64 − 75 = −11 and
the score in σ units is −11/5 = −2.20. Deviations from the mean expressed in σ terms are called
sigma scores.

Because about half of the scores in a roughly symmetrical distribution lie below the mean and
half above it, about half of the sigma scores are negative and half are positive.
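As a small illustration of the sigma-score calculation, the sketch below recomputes the scores of A and B from the example (mean 75, standard deviation 5.0). The function name is an illustrative assumption.

```python
def sigma_score(raw, mean, sd):
    """Deviation of a raw score from the mean, expressed in sigma units."""
    return (raw - mean) / sd


score_a = sigma_score(85, mean=75, sd=5.0)  # A scored 85
score_b = sigma_score(64, mean=75, sd=5.0)  # B scored 64

print(score_a, score_b)
```

A positive result means the score lies above the mean; a negative result means it lies below.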

2. Standard Scores

The sigma scores, which are often small decimal fractions and half of them are negative,
are somewhat inconvenient to deal with. Hence, scores are usually converted into a new
distribution with mean and standard deviation so selected that it makes all scores positive
and relatively easy to handle in computation. Such scores are called ‘standard scores’.

The formula for the conversion of a raw score to a standard score is as follows:

\[ X' = \frac{\sigma'}{\sigma}(X - M) + M' \tag{10} \]

in which

X' = a standard score in the new distribution.
σ' and σ = SDs of standard and raw scores.
X = a score in the original distribution.
M and M' = means of raw and standard scores.

When the mean (M') and standard deviation (σ') are taken to be 50 and 10
respectively, the standard score is called a T-score, i.e.

\[ T = \frac{10}{\sigma}(X - M) + 50 \tag{11} \]

Example: To illustrate, let us consider a distribution with its mean 67 and σ = 12.5. Let
us also suppose that A's score is 76 and B's score is 54.

Express these scores as (i) standard scores in a distribution with a mean of 250 and σ of
50 and (ii) T-scores.

Using formula (10)

\[ X' = \frac{50}{12.5}(X - 67) + 250 \]

Substituting A's score of 76 in the above equation you have:

\[ X' = \frac{50}{12.5}(76 - 67) + 250 = \frac{50 \times 9}{12.5} + 250 = 286 \]

Substituting B's score of 54 in the above equation you have:

\[ X' = \frac{50}{12.5}(54 - 67) + 250 = 198 \]


Using formula (11)

\[ T = \frac{10}{12.5}(X - 67) + 50 \]

Substituting A's score in the above equation you have:

\[ T = \frac{10}{12.5}(76 - 67) + 50 = 0.8 \times 9 + 50 = 57.2 \]

Substituting B's score in the above equation you have:

\[ T = \frac{10}{12.5}(54 - 67) + 50 = 0.8 \times (-13) + 50 = 39.6 \]
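Formulas (10) and (11) can be sketched as two small functions. The names below are illustrative assumptions; the inputs are those of the example (mean 67, σ = 12.5, A = 76, B = 54).

```python
def standard_score(raw, mean, sd, new_mean, new_sd):
    """Formula (10): X' = (sigma' / sigma) * (X - M) + M'."""
    return (new_sd / sd) * (raw - mean) + new_mean


def t_score(raw, mean, sd):
    """Formula (11): a standard score with mean 50 and SD 10."""
    return standard_score(raw, mean, sd, new_mean=50, new_sd=10)


for name, raw in [("A", 76), ("B", 54)]:
    converted = standard_score(raw, mean=67, sd=12.5, new_mean=250, new_sd=50)
    print(name, round(converted, 1), round(t_score(raw, mean=67, sd=12.5), 1))
```

Writing the T-score as a special case of the general conversion mirrors the text, where formula (11) is formula (10) with M' = 50 and σ' = 10.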

3. Percentiles

Percentiles are the points which divide the entire scale of measurement into 100 equal
parts. They are denoted by P0, P1, P2, P3, P4, P5 ……………… P99, and P100.

The first percentile may be defined as that point in a frequency distribution below which
lie 1 percent of the total measures or scores. Similarly, twentieth percentile may be
defined as that point in a frequency distribution below which 20 percent of the total
measures or scores fall. It is evident that the median, expressed as a percentile, is P50. It
should be noted that P0 lies at the beginning of the distribution and P100 at the end of the
distribution.

The formula for calculating percentiles is

\[ P_p = l + \frac{PN - F}{f_p} \times i \tag{12} \]

in which

Pp = percentile of the distribution wanted.
l = exact lower limit of the class-interval upon which Pp lies.
PN = part of N to be counted off in order to reach Pp.
F = sum of all scores upon intervals below l.
fp = number of scores within the interval upon which Pp falls.
i = length of the class-interval.

The use of formula (12) may be illustrated by the following example.


Calculate P25, P45 and P95 from the following distribution:

Table 1.11: The Calculation of Percentiles from Data Grouped in a Frequency Distribution

Scores               Frequency    Cumulative Frequency
(Class-intervals)    (f)          (F)

81.5 - 86.5           1           80
76.5 - 81.5           4           79
71.5 - 76.5           5           75
66.5 - 71.5          10           70
61.5 - 66.5          35           60
56.5 - 61.5          12           25
51.5 - 56.5           9           13
46.5 - 51.5           2            4
41.5 - 46.5           2            2

                     N = 80

For computing P25 you first have to find PN.

Here, 25 percent of 80 is 20, so PN = 20.

Now l = 56.5, F = 13, fp = 12 and i = 5

Using formula (12)

\[ P_{25} = 56.5 + \frac{20 - 13}{12} \times 5 = 56.5 + 2.92 = 59.42 \]


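Similarly, 45 percent of 80 is 36 and 95 percent of 80 is 76, so

\[ P_{45} = 61.5 + \frac{36 - 25}{35} \times 5 = 61.5 + 1.57 = 63.07 \]

\[ P_{95} = 76.5 + \frac{76 - 75}{4} \times 5 = 76.5 + 1.25 = 77.75 \]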
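The procedure of formula (12) can be sketched as a short function: locate the class-interval in which the required cumulative count falls, then interpolate within it. This is only an illustrative sketch with assumed names, using the frequencies of Table 1.11 listed from the lowest interval upwards. Note that 76.5 + 1.25 gives 77.75 for P95.

```python
def grouped_percentile(p, lower_limits, freqs, width):
    """Formula (12): P_p = l + ((PN - F) / f_p) * i.

    lower_limits and freqs must be listed from the lowest
    class-interval to the highest.
    """
    n = sum(freqs)
    target = p * n / 100  # PN: number of cases to count off
    below = 0             # F: cases in the intervals below the current one
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and below + f >= target:
            return lower + (target - below) / f * width
        below += f
    raise ValueError("p must lie between 0 and 100")


# Table 1.11, lowest interval first (exact lower limits and frequencies).
lower_limits = [41.5, 46.5, 51.5, 56.5, 61.5, 66.5, 71.5, 76.5, 81.5]
freqs = [2, 2, 9, 12, 35, 10, 5, 4, 1]

for p in (25, 45, 95):
    print(p, round(grouped_percentile(p, lower_limits, freqs, width=5), 2))
```

Calling the function with p = 50 gives the median, since the median is P50.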

4. Percentile Ranks

The percentile rank of a score is the percentage of scores in the distribution that fall
below it. If a score of 65 has a percentile rank of 80, then 80 percent of the scores fall
below 65. The median has a percentile rank of 50, for 50 percent of the scores fall below
it.

The process of calculating percentile ranks is the reverse of calculating percentile
points. You have to calculate ranks corresponding to particular scores. If R is the rank and
N is the total number of cases, then:

\[ \text{Percentile Rank} = 100 - \frac{100R - 50}{N} \tag{13} \]

Suppose A ranks 13th in a class of 80 learners, so that 12 learners rank above him/her and
67 below. A's percentile rank is:

\[ 100 - \frac{100 \times 13 - 50}{80} = 100 - 15.625 = 84.375 \approx 84 \]
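Formula (13) is simple enough to check directly. The sketch below uses an assumed function name and the example of a learner ranked 13th among 80.

```python
def percentile_rank(rank, n):
    """Formula (13): PR = 100 - (100 * R - 50) / N, where rank 1 is the top position."""
    return 100 - (100 * rank - 50) / n


pr = percentile_rank(rank=13, n=80)
print(pr, round(pr))
```

The unrounded value is 84.375, which the text reports as 84.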

D. Measures of Relationship

Data in which we secure measures of two variables for each individual are called
bivariate data. The essential feature of bivariate data is that one measure can be
compared with another measure for each member of the group. When bivariate data are
studied, you may like to know the degree of relationship between the variables of such
data. This degree of relationship is known as correlation. It can be quantitatively
represented by the coefficient of correlation. Its value ranges from -1.00 to +1.00. A
value of -1.00 describes a perfect negative correlation and +1.00 describes a perfect
positive correlation. A zero value describes a complete lack of correlation between the two
variables. The sign of the coefficient indicates the direction of the relationship and its
numerical value indicates its strength (magnitude).

Methods of correlating variables

There are various methods of correlating variables. The choice among them depends on the
situation and the type of data. Product Moment Correlation and Rank Order Correlation are
the methods most often used for computing the correlation between two variables.

1. Product-moment correlation

In some situations, the data for two variables X and Y are expressed in interval or ratio
level of measurement and the distributions of these variables have a linear relationship.
Moreover, the distributions of variables are uni-modal and their variances are
approximately equal. In such situations, product moment method of correlation is used
generally. It is also called Pearson’s r.

i) Calculation of Pearson’s r from ungrouped data:

When the size of the sample is small, there is no need of grouping the data and Pearson's
r may be calculated with the help of the following formula:

\[ r_{xy} = \frac{N\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}} \tag{14} \]

in which

x = deviations of X measures from the assumed mean.
y = deviations of Y measures from the assumed mean.

To illustrate the use of formula (14), let us compute the product moment ‘r’ from the
following data for the two variables X and Y for 10 learners who are enrolled in a Study
Centre of an Open University.

X : 45 54 52 58 62 46 55 49 50 54
Y : 42 50 55 46 59 41 46 48 45 48


The computations needed for formula (14) are set out in Table 1.12.

Table 1.12: The Calculation of Product Moment Correlation from Ungrouped Data when Deviations are taken from Assumed Mean

X          Y          x      y      x²      y²      xy

45         42         -7     -6     49      36      42
54         50          2      2      4       4       4
52(AM)     55          0      7      0      49       0
58         46          6     -2     36       4     -12
62         59         10     11    100     121     110
46         41         -6     -7     36      49      42
55         46          3     -2      9       4      -6
49         48(AM)     -3      0      9       0       0
50         45         -2     -3      4       9       6
54         48          2      0      4       0       0

∑x = 5     ∑y = 0     ∑x² = 251     ∑y² = 276     ∑xy = 186

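Using formula (14)

\[
r_{xy} = \frac{N\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}
= \frac{10 \times 186 - 5 \times 0}{\sqrt{\left[10 \times 251 - (5)^2\right]\left[10 \times 276 - (0)^2\right]}}
= \frac{1860}{2618.89}
= 0.71
\]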
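The computation above can be reproduced with a few lines of code. The following is a minimal sketch with assumed names; it takes deviations from the assumed means 52 (X) and 48 (Y) used in the worked example and applies formula (14).

```python
import math


def pearson_r(xs, ys, assumed_mean_x=0, assumed_mean_y=0):
    """Formula (14), with deviations taken from assumed means."""
    n = len(xs)
    dx = [x - assumed_mean_x for x in xs]
    dy = [y - assumed_mean_y for y in ys]
    sum_x, sum_y = sum(dx), sum(dy)
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Scores of the 10 learners from the worked example.
X = [45, 54, 52, 58, 62, 46, 55, 49, 50, 54]
Y = [42, 50, 55, 46, 59, 41, 46, 48, 45, 48]

r = pearson_r(X, Y, assumed_mean_x=52, assumed_mean_y=48)
print(round(r, 2))
```

The choice of assumed mean only simplifies the hand arithmetic: calling `pearson_r(X, Y)` with the default of zero gives the same coefficient.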

ii) Calculation of Pearson’s r from grouped data:

When N is large or even moderate in size, and when no calculating machine is available,
the best procedure is to group data in both variables X and Y and to form a scattergram.
The values from the scattergram may be used in the following formula:

\[ r_{xy} = \frac{N\sum fxy - \left(\sum fx\right)\left(\sum fy\right)}{\sqrt{\left[N\sum fx^2 - \left(\sum fx\right)^2\right]\left[N\sum fy^2 - \left(\sum fy\right)^2\right]}} \tag{15} \]

To illustrate the use of the formula (15) consider the data of 50 learners enrolled with
IGNOU in Course X and in Course Y in the following scattergram:

Fig. 5: A Scattergram Showing Paired Scores of 50 Learners on the Tests of Course X and Course Y.

The computation of the values ∑fx², ∑fy², ∑fxy, etc., may be done in the
steps given below.

Step 1
The distribution of Course X scores for the 50 learners is found in the f(y) column on the
right of the scattergram. Assume a mean for the distribution of scores of Course X (the
mid-point of the interval which contains the largest frequency), and draw double lines to
mark off the row in which the assumed mean falls. In the present example, the assumed mean
for Course X has been taken at 46 (mid-point of interval 45-47), and the y's (deviations
from the assumed mean) have been taken from this point.

Fill in the fy and then the fy² columns.

Step 2

The distribution of the Course Y scores of the 50 learners is found in the f(x) row at the bottom of
the scattergram. Assume a mean for this distribution and draw double lines to designate
the column under the assumed mean. The mean for the Course Y scores is taken at 26.5
(mid-point of interval 26-27), and the x's (deviations from the assumed mean) are taken from
this point. Fill in the fx and then the fx² rows.

Step 3

The fxy for a cell is computed by multiplying the frequency given in the particular cell by
the corresponding x and y. For example, there is a frequency of 1 corresponding to Course
Y score 24-25 and Course X score 51-53. The corresponding x for this cell frequency is −1
and the corresponding y is +2. Thus fxy for this cell is (1)(−1)(+2) = −2. Similarly, the value
of fxy is computed for all the cells and their sum is calculated row-wise as well
as column-wise. The two sums should equal each other. In the present example, ∑fxy
comes to 4.

Step 4

Substituting the values for ∑fx, ∑fx², ∑fy, ∑fy² and ∑fxy in formula
(15) we get:

\[
r_{xy} = \frac{50 \times 4 - (-33)(-2)}{\sqrt{\left[50 \times 109 - (-33)^2\right]\left[50 \times 46 - (-2)^2\right]}}
= \frac{134}{3164.31}
= 0.042
\]
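Since the scattergram itself is not reproduced here, the sketch below works only from the summary totals substituted in Step 4 (N = 50, ∑fxy = 4, ∑fx = −33, ∑fx² = 109, ∑fy = −2, ∑fy² = 46). The function name is an illustrative assumption.

```python
import math


def pearson_r_from_sums(n, sum_fxy, sum_fx, sum_fy, sum_fx2, sum_fy2):
    """Formula (15), applied to totals read off a scattergram."""
    numerator = n * sum_fxy - sum_fx * sum_fy
    denominator = math.sqrt(
        (n * sum_fx2 - sum_fx ** 2) * (n * sum_fy2 - sum_fy ** 2)
    )
    return numerator / denominator


# Totals from Step 4 of the worked example.
r_grouped = pearson_r_from_sums(
    n=50, sum_fxy=4, sum_fx=-33, sum_fy=-2, sum_fx2=109, sum_fy2=46
)
print(round(r_grouped, 3))
```

A coefficient this close to zero indicates almost no linear relationship between the two sets of scores.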

2. Rank Order Correlation

It is also known as the Spearman rank order coefficient of correlation and is denoted
by ρ (rho). When the data are available in ordinal (rank) form of measurement rather than
in interval or ratio form, this type of correlation is useful.

To find the Spearman rank order coefficient of correlation, the following formula is used:

\[ \rho = 1 - \frac{6\sum D^2}{N\left(N^2 - 1\right)} \tag{16} \]

in which

D = difference between the paired ranks.
∑D² = sum of the squared differences between ranks.
N = number of paired ranks.

To make use of formula (16) let us consider the following data. Two judges X and Y
ranked 10 distance learners in a declamation contest. The ranks given to them by the
judges are given in table 1.13.

Table 1.13: The Calculation of Rank Difference Correlation

Students    Rank assigned    Rank assigned    D     D²
            by X             by Y

A            2                3               -1     1
B            4                5               -1     1
C            5                4                1     1
D           10                9                1     1
E            8                7                1     1
F            1                2               -1     1
G            3                1                2     4
H            9                8                1     1
I            6               10               -4    16
J            7                6                1     1

                                              ∑D² = 28

Using formula (16)

\[
\rho = 1 - \frac{6\sum D^2}{N\left(N^2 - 1\right)}
= 1 - \frac{6 \times 28}{10(100 - 1)}
= 1 - \frac{168}{990}
= 0.83
\]
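The rank-difference calculation can be sketched as follows, using the ranks given by the two judges in Table 1.13. The function name is an illustrative assumption, and the sketch assumes, as formula (16) does, that there are no tied ranks.

```python
def spearman_rho(ranks_x, ranks_y):
    """Formula (16): rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks assigned to learners A to J by judges X and Y.
judge_x = [2, 4, 5, 10, 8, 1, 3, 9, 6, 7]
judge_y = [3, 5, 4, 9, 7, 2, 1, 8, 10, 6]

rho = spearman_rho(judge_x, judge_y)
print(round(rho, 2))
```

Identical rankings give ρ = +1, and exactly reversed rankings give ρ = −1.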


Check Your Progress 5

Compute product moment correlation for the following data:


X : 45, 55, 56, 58, 60, 65, 68, 70, 75, 80, 85
Y : 56, 50, 48, 60, 62, 64, 65, 70, 74, 82, 90
Notes: (a) Space is given below for writing your answer.
(b) Compare your answer with the one given at the end of the Unit.

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

.......................................................................................................................................

............................................................................................................................
............................................................................................................................

1.4 QUALITATIVE DATA


It has already been explained that quantitative measurement makes use of tools that
provide a standardised framework in order to limit data collection to certain predetermined
responses or categories. The variables that describe a phenomenon are fitted into the
standardised categories to which numerical values are attached. But in some situations it
is difficult to break a phenomenon down into components or variables which can be
measured in quantified terms. In such cases the researcher takes into consideration the
phenomenon as a whole and assumes that there is some quality in the phenomenon in its
entirety. When the researcher attempts to retain the totality of a phenomenon while
verifying propositions regarding it, he/she adopts a qualitative approach. While using this
approach the researcher seeks to capture what people have to say in their own words.
The qualitative approach describes the experiences of people in depth and permits the
researcher to record and understand people in terms of their own perceptions.


Qualitative data consist of 'detailed descriptions' of situations, events, people, interactions,
and observed behaviours. These data are also available in the form of 'direct quotations'
from people about their experiences, attitudes, beliefs, and thoughts. The verbal data
gathered through questionnaires, observations and interviews are mostly qualitative in
nature. The 'excerpts' or 'entire passages' from documents, correspondence records and
case histories are also examples of data of a qualitative nature. It may be noted that
detailed descriptions, direct quotations, and case documentation of a qualitative nature are
raw data from empirical situations.

1.4.1 Organisation of Qualitative Data

The responses to open-ended questions on a questionnaire are often extensive; they are
neither systematic nor standardised. However, they permit the researcher to understand
situations as seen and felt by the respondent. The data gathered through participant
observation or an open-ended/unstructured interview are also descriptive in nature. These
descriptions may be in the form of field notes specifying some basic information pertaining
to the place where the observation has taken place, as well as descriptions of the
people who participated in the activities and their external behaviour in the course of the
activities. However, it is not possible to read people's minds merely by observing their external
behaviour. Through an open-ended/unstructured interview, you can learn more about
events which occurred earlier or could not be observed during participant
observation. Such an interview provides a framework within which the researcher should be able to gather
information from people conveniently and accurately. The information may pertain to a
programme, the reactions of participants to the programme and the type of change the
participants perceive in themselves after their involvement in the programme. The data
are mostly in the form of responses to structured and unstructured questions put to
respondents by the researcher during conversation. The responses are generally direct
quotations from respondents in their own words and provide details about the situations,
events, people, experiences, behaviours, values, customs, etc.

The qualitative data gathered using open-ended questionnaires, participant observations
and in-depth interviews are voluminous. They need to be organised and classified into
specific patterns, categories and descriptive units to avoid chaos. However, before
any such classification is done, it is advisable to make some copies of the data. One copy
may be stored in a safety deposit box so that in case of a loss, this copy can be used by
the researcher. The second copy may be used as the working copy for further treatment of
the data. The third copy may be used to fill in the gaps identified during scrutiny
by the researcher. Additional notes can also be recorded in this copy. Since the
organisation of qualitative data involves a lot of cutting and pasting, a fourth copy may be
used for that purpose.

Actual classification or organisation of the data can begin only after the copies are made.
There are no formal or universal rules for organising the data in various units, patterns, or
categories. It requires a creative approach and a lot of perseverance to give a meaningful
shape to the data. The contents of field notes about interviews or observations may be read
carefully by the researcher and he/she may note down his/her comments in the margins
or attach small pieces of paper with his/her written comments/notes using staples or tags.
The arrangement of data in topics, using abbreviations, is the next step. The abbreviated
topics are written down either in the margins of the relevant data or on slips of paper
which may be attached to the relevant pages. The process of classifying or labelling
various kinds of data helps in the preparation of a data index. Sometimes the volume of
data is large. In such situations, computers are helpful in developing systematic and
comprehensive classification schemes using code numbers for different categories and
sub-categories. A computerised classification system permits the use of organised data
by several groups of people over a long period of time. It permits easy cross-classification
and cross-comparison of descriptive narrations for complex analysis.

1.4.2 Analysis of Qualitative Data

Analysis of qualitative data means studying the organised material available in the form of
detailed descriptions, direct quotations or case documentation in order to discover inherent
facts. These data are studied from as many angles as possible, either to explore new
facts or to reinterpret already known facts.

The following methods are generally used in the analysis of qualitative data:

i. Content Analysis

ii. Inductive Analysis

iii. Logical Analysis

A. Content Analysis

Content analysis is concerned with the classification, organisation and comparison of the
content of a document or communication. The terms content analysis and coding are
sometimes used interchangeably, as both processes involve objective, systematic, and
qualitative description of any symbolic behaviour. Since content analysis involves the
classification, evaluation and comparison of the content of a communication or document, it
is sometimes referred to as 'documentary activity' or 'information analysis'. The
communication may be in the form of responses to an open-ended questionnaire,
conversation as a result of an interview, or description of an observed activity. It may also
be in the form of official records (census, birth, accident, crime, school, institutional and
personal records), judicial decisions, laws, budget and financial records, cumulative
records, courses of study, content of textbooks, reference works, newspapers,
periodicals or journals, prospectuses of various educational institutions or universities, direct
quotations, and notes of an interview.

There are three approaches that a researcher may adopt in content analysis. These
focus on: (i) characteristics of content, (ii) producers or causes of content, and (iii)
audience or effects of content. In the first approach, the researcher is interested primarily
in the characteristics of the content itself. He/she may focus either on the 'substantive
nature' of the content or upon the 'form' of the content. In the second approach, the
researcher attempts to draw valid inferences about the nature of the producers of the
content or the causes of the symbolic material from the characteristics of the material
itself. In the third approach to content analysis, the researcher interprets the content so as
to reveal something about the nature of its 'audience' or its 'effects'. He/she takes the
content material as a basis for drawing inferences about the characteristics of the
'audience' for whom the material (content) is designed or about the effects
which the communication brings about.

The steps involved in the process of content analysis include (i) defining the unit of
analysis, (ii) specifying variables and categories, (iii) determining the frequency, direction and intensity of
units, (iv) contingency analysis, (v) sampling of units, and (vi) constructing the content
analysis outline. Defining the unit of analysis indicates whether the unit (material) is
confined to single words, phrases, complete sentences, paragraphs, or even larger
amounts of material. Once the unit is defined, the researcher conducts the analysis so
as to create reproducible or objective data for scientific treatment and generalisation
beyond the specific set of symbolic material analysed. For converting symbolic material
into objective data, it is necessary to specify explicitly the "variables" in terms of which
descriptions are to be made. Once the unit is defined and the variables along with their
categories specified, the researcher will classify units in the material to be analysed
according to: (i) the number of units (frequency), (ii) the favourableness/unfavourableness of
the content (direction), and (iii) the emotional impact of the units (intensity). The
contingency analysis aims at considering the context within which the unit is found. The
researcher considers the favourableness or unfavourableness of a single unit in the light of
the remainder of the communication so that its real meaning is not lost.

Steps in the Analysis of Qualitative Data

1. The first step is to understand your data. You must read the data carefully, again and
again, to get a sense of their quality.
2. The second step is to identify the purpose of the evaluation. See how the respondents
have answered the questions.
3. Categorise the data into themes/patterns, and then organise them into categories. This is the
most important step in qualitative analysis. You can assign codes, which can be a
few letters, words or symbols.
4. In the next step you have to find the patterns and connections within the
categories or between the categories identified.
5. The last step is interpreting the data. You have to think through and design an outline to
present the findings.
6. You can feed the data into a computer by entering the text in a word processing
program. These days software programmes like Ethnograph, MODIST etc. can be
used to analyse qualitative data. If the data set is not large, you can also analyse it
manually. Other software is also available. You can choose according to your
convenience.
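Steps 3 and 4 can be pictured with a very small script. The sketch below is purely illustrative: the respondents, code labels and assignments are invented for the example, and it assumes the codes have already been assigned by hand after reading the responses. It simply tallies how often each code occurs and which respondents share it.

```python
from collections import Counter

# Hypothetical hand-coded responses (invented for illustration only).
# Each respondent's open-ended answer has been tagged with one or more codes.
coded_responses = {
    "R1": ["TUTOR_SUPPORT", "WORKLOAD"],
    "R2": ["WORKLOAD"],
    "R3": ["TUTOR_SUPPORT", "STUDY_MATERIAL"],
    "R4": ["WORKLOAD", "STUDY_MATERIAL"],
}

# Frequency of each code across all respondents.
code_frequency = Counter(
    code for codes in coded_responses.values() for code in codes
)


def respondents_with(code):
    """List the respondents whose answers carry a given code."""
    return sorted(r for r, codes in coded_responses.items() if code in codes)


for code, count in code_frequency.most_common():
    print(code, count, respondents_with(code))
```

Such a tally only organises the material; the interpretation in step 5 still rests with the researcher.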


B. Inductive Analysis

Inductive analysis means that patterns, themes, and categories of analysis emerge out of
the data. In this analysis, the researcher looks for natural variation in the data. The study of
natural variation involves particular attention to variations in programme processes and
how participants respond to and are affected by programmes. There are two ways of representing
the patterns that emerge from the analysis of data. First, the researcher can use the categories
developed and articulated in the programme studied to organise the presentation of particular
themes. Second, the researcher may also become aware of categories or patterns for
which the people in the programme did not have labels or terms; in that case the analyst develops
terms to describe these inductively generated categories.

C. Logical Analysis

Logical analysis is used for representing patterns as dimensions or categories using either
participant-generated constructions or evaluator-generated constructions. It is sometimes
useful to cross-classify different dimensions to generate new insights about how the data
can be organised and to look for patterns that may not have been recognised in the initial
inductive analysis. Logical analysis aims at creating potential categories by crossing one
typology with another, and then moving back and forth between the logical construction
and the actual data for creating a "new typology" using cross-classification matrices.
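A cross-classification matrix can be sketched in a few lines. Everything in the example below is hypothetical: the two typologies (attendance pattern and stated satisfaction) and the classified units are invented solely to show the mechanics of crossing one typology with another.

```python
from collections import Counter

# Hypothetical units, each classified on two invented typologies.
units = [
    ("regular attender", "satisfied"),
    ("regular attender", "satisfied"),
    ("regular attender", "dissatisfied"),
    ("occasional attender", "dissatisfied"),
]

rows = ["regular attender", "occasional attender"]  # first typology
cols = ["satisfied", "dissatisfied"]                # second typology

# Count how many units fall into each cell of the crossed typologies.
cell_counts = Counter(units)
matrix = [[cell_counts[(r, c)] for c in cols] for r in rows]

# Cells with no cases are logical possibilities to check against the data.
empty_cells = [(r, c) for r in rows for c in cols if cell_counts[(r, c)] == 0]

for r, counts in zip(rows, matrix):
    print(r, counts)
print("Empty cells:", empty_cells)
```

An empty cell is a prompt to return to the data and ask whether the category is genuinely absent or was simply overlooked.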

There are other ways of analysing qualitative data. We have not discussed all of them.
The idea is to give you a feel of qualitative data analysis and show how it differs from
quantitative data analysis.

Check Your Progress 6


Define qualitative data. Give some examples of these data.
Notes: (a) Space is given below for writing your answer.
(b) Compare your answer with the one given at the end of the Unit.

........................................................................................................................................
........................................................................................................................................
........................................................................................................................................
........................................................................................................................................
........................................................................................................................................
........................................................................................................................................
........................................................................................................................................
............................................................................................................................
............................................................................................................................
............................................................................................................................


1.5 LET US SUM UP


In this Unit, we discussed the nature of quantitative and qualitative data, the various
methods of representing the quantified data graphically, and the methods used in the
analysis of quantitative and qualitative data. The main points are as follows:

1. The data collected through the administration of various tools on the selected
samples are either (i) quantitative or (ii) qualitative in nature.
2. Quantitative data are expressed in nominal, ordinal, interval or ratio scales of
measurement. These data are classified into two categories: (i) parametric and (ii)
non-parametric. Parametric data are obtained by applying interval or ratio
scales of measurement, whereas non-parametric data are either enumerated or
ranked. Enumerated data make use of a nominal scale, and ranked data of an
ordinal scale.
3. Quantified data are tabulated in a ‘frequency distribution’ and can be represented
graphically with the help of a histogram, a frequency polygon, and/or an ogive.
4. Measures of (i) central tendency, (ii) variability, (iii) relative positions, and
(iv) relationship are the four types of descriptive statistical measures.
5. Mean, median and mode are the three measures of central tendency.
6. Mean is the arithmetic average of a distribution. It is obtained by dividing the sum of
all values of observation by the total number of values. The formula for finding the
mean for ungrouped data is:

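\[ M = \frac{\sum X}{N} \]

When the number of observations is large, the data is grouped in a frequency distribution.
The formula for computing the mean here is:

\[ M = AM + \frac{\sum fx'}{N} \times i \]

where AM is the assumed mean, x′ is the deviation of a class interval from the assumed
mean in class-interval units, and i is the size of the class interval.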

7. Median is a point in an array, above and below which one half of the values or
measures fall. If the values are ungrouped and their number is small, the values are
arranged in order of magnitude and the middle value is determined by counting up
half the value of N.

When the number of values is odd, the mid-value is the median. When the number of
values is even, the median is the mid-point between the two middle values.

In the case of grouped data, the median is calculated by the formula:

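\[ Mdn = l + \frac{\frac{N}{2} - F}{f} \times i \]

where l is the exact lower limit of the class interval containing the median, F is the
cumulative frequency below that interval, f is the frequency within it, and i is the size of
the class interval.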


8. Mode is the most frequently occurring value in a distribution. If only one value
occurs a maximum number of times, the distribution is said to have one mode (unimodal).
A distribution with two modes is bimodal, and a distribution with more than two modes
is called multimodal.

In a simple ungrouped series of measures or values, the crude mode is that single measure
or value which occurs most frequently.

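For a grouped distribution, the mode is calculated by the formula:

\[ Mode = l + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times i \]

where l is the exact lower limit of the modal class, f_m is its frequency, f_1 and f_2 are
the frequencies of the classes immediately below and above it, and i is the size of the
class interval.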

9. The range, variance and standard deviation are the most commonly used measures
of variability.
10. The range is the difference between the two extreme values or measures in a
distribution.
11. The average of the squared deviations of the measures or values from their mean is
known as variance. Standard deviation is the positive square root of variance.
Variance and standard deviation for the ungrouped data are found by the formulae:

\[ \text{Variance} = \sigma^2 = \frac{N \sum X^2 - (\sum X)^2}{N^2} \]

\[ \text{Standard Deviation} = \sigma = \sqrt{\frac{N \sum X^2 - (\sum X)^2}{N^2}} \]

When the data are grouped in a frequency distribution, the variance and standard
deviation are computed by the formulae:

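\[ \text{Variance} = \sigma^2 = \frac{i^2}{N^2} \left[ N \sum fx'^2 - \left( \sum fx' \right)^2 \right] \]

\[ \text{Standard Deviation} = \sigma = \frac{i}{N} \sqrt{N \sum fx'^2 - \left( \sum fx' \right)^2} \]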

12. The normal probability distribution is represented by a curve which has the
following characteristics:
i) The curve is symmetrical around its vertical axis, called the ordinate.
ii) The mean, mode and median of the distribution have the same value.
iii) The height of the ordinate is maximum at the mean; in the unit normal
curve it is equal to 0.3989.
iv) The curve is asymptotic: it approaches the horizontal axis but never touches it.
v) The points of inflection of the curve occur one standard deviation (±1σ)
above and below the mean.
vi) About 68.26 percent of the total area under the curve falls between the limits
M ± 1σ, 95.44 percent falls between M ± 2σ, and 99.73 percent falls between
M ± 3σ.


13. Sigma scores, standard scores, percentiles and percentile ranks are the measures of
relative positions.
14. A sigma score makes it possible to obtain a realistic comparison of scores and
provides a basis for equal weighting of the scores, because the scores on different
tests are expressed on a common scale with a mean of zero and a standard deviation of 1.
15. When the sigma scores are converted into a new distribution with mean and
standard deviation so selected as to make all scores positive, the scores are called
standard scores.

The formula for the conversion of a raw score to a standard score is:

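\[ X' = \frac{\sigma'}{\sigma}(X - M) + M' \]

where X′ is the standard score, M and σ are the mean and standard deviation of the raw
scores, and M′ and σ′ are the mean and standard deviation chosen for the new distribution.

When the mean (M′) and standard deviation (σ′) are taken to be 50 and 10 respectively,
the standard score is called a T-score. It is expressed by the formula:

\[ T = \frac{10}{\sigma}(X - M) + 50 \]
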
16. Percentiles are the points which divide the entire scale of measurement into 100
equal parts.

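The formula for computing percentiles is:

\[ P_p = l + \frac{pN - F}{f} \times i \]

where p is the proportion of cases below the percentile (for example, 0.25 for the 25th
percentile), l is the exact lower limit of the class interval containing it, F is the
cumulative frequency below that interval, f is the frequency within it, and i is the size of
the class interval.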

17. Percentile rank is the percentage of scores in the distribution that fall below a given
score. If R is the rank of an individual and N is the total number of cases, then:

\[ \text{Percentile Rank} = 100 - \frac{100R - 50}{N} \]

18. Product Moment correlation and rank-difference correlation are the commonly used
measures of relationship between any two variables.

19. When the size of sample is small and the variables are measured in interval scales
of measurement, the product-moment correlation is computed by the formulae:

\[ r_{xy} = \frac{N \sum xy - (\sum x)(\sum y)}{\sqrt{\left[ N \sum x^2 - (\sum x)^2 \right] \left[ N \sum y^2 - (\sum y)^2 \right]}} \]


When the size of the sample is large, the product-moment correlation is found by the
formulae:

\[ r_{xy} = \frac{N \sum fxy - (\sum fx)(\sum fy)}{\sqrt{\left[ N \sum fx^2 - (\sum fx)^2 \right] \left[ N \sum fy^2 - (\sum fy)^2 \right]}} \]

20. When the data are available in ordinal (rank) form of measurement and the size of
the sample is small, the formula for computing the rank-difference correlation is:

\[ \rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} \]

21. In some situations it is difficult to measure a phenomenon or to analyse it into various
components or variables in quantified terms. The researcher then takes into consideration
the phenomenon as a whole, in detail and depth. In other words, the researcher uses
qualitative techniques of analysis. Qualitative data consist of detailed descriptions of
situations, events, people, interactions, and observed behaviours. These data are
also available in the form of direct quotations from people about their experiences,
attitudes, beliefs, and thoughts. The excerpts or entire passages from documents,
correspondence, records and case studies are also examples of qualitative data.
22. Content analysis, inductive analysis and logical analysis are some methods of
qualitative analysis.
23. Content analysis pertains to the classification, quantification and consideration of the
content of documents or other communications. It is also called documentary or
information analysis.
24. Inductive analysis leads to patterns, themes and categories emerging out of the
data. In this type of analysis, the researcher looks for natural variation in the data.
25. Logical analysis is used for representing patterns as dimensions or categories, either
using participant-generated constructions or evaluator-generated constructions.
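
The measures of relative position and the rank-difference correlation summarised in points 14 to 17 and 20 can be tried out with a short Python sketch. The function names and the sample numbers below are illustrative assumptions, not part of the Unit; the formulas are the ones stated above, with rank 1 taken as the highest position in the group.

```python
def sigma_score(x, mean, sd):
    """Sigma score: deviation of a raw score from the mean in standard-deviation units."""
    return (x - mean) / sd

def standard_score(x, mean, sd, new_mean, new_sd):
    """Re-express a raw score on a scale with a chosen mean (M') and SD (sigma')."""
    return new_sd / sd * (x - mean) + new_mean

def t_score(x, mean, sd):
    """T-score: a standard score with mean 50 and standard deviation 10."""
    return standard_score(x, mean, sd, new_mean=50, new_sd=10)

def percentile_rank(rank, n):
    """Percentile rank from rank R (1 = highest) in a group of N cases."""
    return 100 - (100 * rank - 50) / n

def rank_difference_correlation(ranks_a, ranks_b):
    """Rank-difference correlation (rho) from two sets of ranks for the same cases."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical example: raw score 185 in a distribution with mean 170 and SD 10.
print(sigma_score(185, 170, 10))
print(t_score(185, 170, 10))
# Hypothetical example: the top-ranked individual in a group of 10.
print(percentile_rank(1, 10))
# Hypothetical ranks given to five individuals on two tests.
print(rank_difference_correlation([1, 2, 3, 4, 5], [2, 1, 3, 5, 4]))
```

Because the T-score is simply the general standard score with M′ = 50 and σ′ = 10, the sketch defines it in terms of `standard_score` rather than repeating the arithmetic.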

To assess your learning yourself, see whether you are now able to:
• name various types of data
• define quantitative data
• describe quantitative data
• describe various types of quantitative data with examples
• tabulate given quantitative data into a frequency distribution
• illustrate the methods of expressing the class intervals with the help of examples
• compute the cumulative frequencies and cumulative percents for a given
frequency distribution
• name the four methods of representing data graphically
• construct a histogram, a frequency polygon and an ogive for a given distribution
• name the four descriptive statistical measures
• name and define the three measures of central tendency or averages (mean,
median and mode)
• compute mean, median and mode from a given (i) ungrouped data (ii) grouped
data


• name and define the three measures of variability (range, variance and standard
deviation)
• compute range for ungrouped data
• compute variance and standard deviation for a given (i) ungrouped and (ii)
grouped data
• describe the nature and characteristics of Normal Distribution
• use the Normal Table
• name and define the various measures of comparing individuals on the basis of
different types of scores (Sigma Scores, Standard Scores, Percentiles and
Percentile ranks)
• convert a raw score into a sigma score corresponding to the mean and standard
deviation of a distribution
• convert a given raw score into a standard score (Z-score or T-score)
corresponding to the mean and standard deviation of a distribution
• define a percentile
• compute certain percentiles for a given distribution of scores
• define percentile rank
• compute the percentile rank of an individual corresponding to his/her rank in the
group to which he/she belongs
• name the various measures of relationships
• compute the product moment correlation for a given (i) ungrouped and (ii)
grouped data
• compute rank-order correlation for given ungrouped data
• define qualitative data with examples
• name and describe some methods used in the analysis of qualitative data

1.6 GLOSSARY
1. Quantitative Data: Data which are expressed in nominal, ordinal,
interval or ratio scales of measurement.
2. Qualitative Data: Data which are available in the form of detailed
descriptions of situations, events, people,
interactions, and observed behaviour, direct
quotations from people about their experiences,
attitudes, beliefs, and thoughts, and excerpts from
documents, correspondence, records, and case
histories.
3. Parametric Data: These are data obtained by applying interval or
ratio scales of measurement.
4. Nonparametric Data: These are data obtained by applying nominal or
ordinal scales of measurement. These types of data
are either counted or ranked.
5. Central Tendency: A measure of central tendency provides a single
most typical value as representative of a group of
values; the ‘trend’ of a group of measures as
indicated by some type of average, usually the
mean, median or mode.
6. Mean: A kind of average obtained by dividing the sum of a
set of measures by their number.
7. Median: The middle value in a distribution or set of ranked
values; the point that divides the group into two
equal parts.
8. Mode: The value that occurs most frequently in a
distribution.
9. Variability: The spread or dispersion of measures or values.
10. Range: For some specified groups, the difference between
the highest and the lowest obtained measure or
value on a tool. It is a rough measure of variability.
11. Variance: A measure of variability of a distribution. It is the
average of the squared deviations of the measures
or values from the mean.
12. Standard Deviation: The positive square root of variance.
13. Standard Score: A general term referring to any of the variety of
‘transformed’ scores, in terms of which raw scores
may be expressed for reasons of convenience,
comparability, ease of interpretation, etc. Sigma
Scores, T-Scores etc. are examples of standard
scores.
14. Normal Distribution: A distribution of measures that in graphic form has a
distinctive bell-shaped appearance. It is symmetrical
and asymptotic. The mean, mode and median for
this type of distribution have equal values.
15. Percentile Rank: The expression of an obtained test score in terms of
its position within a group of 100 scores.
16. Co-efficient of correlation: A measure of the degree of relationship between
two sets of measures for the same group of
individuals. Its value ranges from 0.00, denoting a
complete absence of relationship, to +1.00 and -1.00,
indicating perfect positive and negative
correspondence respectively.

1.7 CHECK YOUR PROGRESS: THE KEY


1. Quantitative data is the description of an empirical event or phenomenon in a numerical
system presented with the help of different scales of measurement such as nominal,
ordinal, interval and ratio. The two major types of quantitative data are: parametric
(obtained through interval or ratio scales) and non-parametric (counted by a nominal scale
or ranked by an ordinal scale).

2&3
Class Interval   Mid Point     f     x′    fx′    fx′²
195-199             197        1      5      5      25
190-194             192        2      4      8      32
185-189             187        4      3     12      36
180-184             182        5      2     10      20
175-179             177        8      1      8       8
170-174             172       10      0      0       0
165-169             167        6     -1     -6       6
160-164             162        4     -2     -8      16
155-159             157        4     -3    -12      36
150-154             152        2     -4     -8      32
145-149             147        3     -5    -15      75
140-144             142        1     -6     -6      36

N = 50, ∑fx′ = −12, ∑fx′² = 322

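(i) Mean:

\[ M = AM + \frac{\sum fx'}{N} \times i = 172 + \frac{(-12)}{50} \times 5 = 170.80 \]

(ii) Variance:

\[ \sigma^2 = \frac{i^2}{N^2} \left[ N \sum fx'^2 - \left( \sum fx' \right)^2 \right] = \frac{(5)^2}{(50)^2} \left[ 50 \times 322 - (-12)^2 \right] = 159.56 \]

(iii) Standard Deviation:

\[ \sigma = \frac{i}{N} \sqrt{N \sum fx'^2 - \left( \sum fx' \right)^2} = \frac{5}{50} \sqrt{50 \times 322 - (-12)^2} = 12.63 \]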
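
As a cross-check, the same computation can be sketched in Python. The variable names are illustrative assumptions; the midpoints, frequencies, assumed mean (172) and class-interval size (5) are taken from the table above.

```python
from math import sqrt

# (midpoint, frequency) for each class interval, copied from the table above
classes = [
    (197, 1), (192, 2), (187, 4), (182, 5), (177, 8), (172, 10),
    (167, 6), (162, 4), (157, 4), (152, 2), (147, 3), (142, 1),
]
assumed_mean = 172   # AM: midpoint of the interval 170-174
interval_size = 5    # i

n = sum(f for _, f in classes)
# x' is the deviation of each midpoint from AM, in class-interval units
sum_fx = sum(f * ((mid - assumed_mean) // interval_size) for mid, f in classes)
sum_fx2 = sum(f * ((mid - assumed_mean) // interval_size) ** 2 for mid, f in classes)

mean = assumed_mean + sum_fx / n * interval_size
variance = interval_size ** 2 / n ** 2 * (n * sum_fx2 - sum_fx ** 2)
sd = sqrt(variance)

print(n, sum_fx, sum_fx2)
print(round(mean, 2), round(variance, 2), round(sd, 2))
```

Run as written, this gives N = 50, ∑fx′ = −12 and ∑fx′² = 322, a mean of 170.8, a variance of about 159.56 and a standard deviation of about 12.63.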


4. The normal probability curve is symmetrical around its vertical axis, i.e. the ordinate.
The values of the mean, median and mode coincide.
The height of the ordinate is maximum at the mean.
The curve is asymptotic.
The points of inflection of the curve occur one standard deviation (±1σ)
above and below the mean.
About 68.26 percent of the total area falls between the limits M + 1σ and M - 1σ; 95.44
percent of the total area of the curve falls between the limits M + 2σ and M - 2σ; and 99.73
percent of the total area of the curve falls between M + 3σ and M - 3σ.
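
These areas can be checked numerically. The sketch below is a minimal illustration using only Python's standard library; it relies on the fact that the proportion of a normal distribution lying within k standard deviations of the mean equals erf(k/√2). The function name is an assumption made for the example.

```python
from math import erf, pi, sqrt

def area_within(k):
    """Proportion of the normal curve's area within k standard deviations of the mean."""
    return erf(k / sqrt(2))

# Ordinate of the unit normal curve at the mean
peak_height = 1 / sqrt(2 * pi)

for k in (1, 2, 3):
    print(f"M ± {k}σ: {100 * area_within(k):.2f} percent")
print(f"Height at the mean: {peak_height:.4f}")
```

The computed areas are 68.27, 95.45 and 99.73 percent. The first two differ from the 68.26 and 95.44 quoted above only in the last digit, which is what one would expect if the quoted figures come from a normal table with entries rounded to four decimal places.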

5. Here x and y are the deviations of X and Y from their assumed means (AM), both 65.

X          Y            x      y      x²     y²     xy
45         56         -20     -9     400     81    180
55         50         -10    -15     100    225    150
56         48          -9    -17      81    289    153
58         60          -7     -5      49     25     35
60         62          -5     -3      25      9     15
65 (AM)    64           0     -1       0      1      0
68         65 (AM)      3      0       9      0      0
70         70           5      5      25     25     25
75         74          10      9     100     81     90
80         82          15     17     225    289    255
85         90          20     25     400    625    500

∑x = 2, ∑y = 6, ∑x² = 1414, ∑y² = 1650, ∑xy = 1403

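\[ r_{xy} = \frac{N \sum xy - (\sum x)(\sum y)}{\sqrt{\left[ N \sum x^2 - (\sum x)^2 \right] \left[ N \sum y^2 - (\sum y)^2 \right]}} \]

\[ = \frac{11 \times 1403 - 2 \times 6}{\sqrt{\left[ 11 \times 1414 - (2)^2 \right] \left[ 11 \times 1650 - (6)^2 \right]}} \]

\[ = 0.92 \]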
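
The result can be verified with a short Python sketch. The function name is an illustrative assumption; the scores and the assumed means of 65 come from the table above. The sketch also shows that the coefficient is the same whether it is computed from the raw scores or from deviations about the assumed means.

```python
from math import sqrt

X = [45, 55, 56, 58, 60, 65, 68, 70, 75, 80, 85]
Y = [56, 50, 48, 60, 62, 64, 65, 70, 74, 82, 90]

def product_moment_r(xs, ys):
    """Product-moment correlation using the sums-based formula from the text."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Deviations from the assumed means (65 for both X and Y), as in the table
x = [v - 65 for v in X]
y = [v - 65 for v in Y]

r_from_deviations = product_moment_r(x, y)
r_from_raw_scores = product_moment_r(X, Y)

print(sum(x), sum(y), sum(a * b for a, b in zip(x, y)))
print(round(r_from_deviations, 2), round(r_from_raw_scores, 2))
```

Working with deviations from an assumed mean only keeps the arithmetic small when computing by hand; it does not change the value of r.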

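6. Qualitative data describe a phenomenon which cannot be measured or quantified.
The phenomenon is looked at in its totality. Detailed descriptions of situations, events,
people, interactions, and observed behaviours constitute qualitative data. Direct quotations
from people about their experiences, attitudes, beliefs and thoughts, and excerpts from
documents, correspondence, records and case studies, are further examples.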
