
HANDBOOK OF COMMUNICATION RESEARCH

Chapter VII –

DATA ANALYSIS AND INTERPRETATION


LEARNING OBJECTIVES

After going through this chapter you will be able to:

1. Use statistical techniques to analyse data systematically and scientifically while taking care of errors.
2. Explain statistical terms used in communication research.
3. Describe measures of central tendency, dispersion and association.
4. Apply tests of significance in communication research.
5. Utilize software packages for data analysis.

Data analysis is a multifarious process. The aim of data analysis is to arrive at useful information so that final conclusions can be drawn and recommendations made. In communication research, after coding and classifying the raw data, it is ready for analysis and for testing the research hypothesis. Analysis means the categorising, ordering, manipulating and summarising of data. It reduces data to a comprehensible and interpretable form.

Interpretation is based on the analysis. Interpretation takes the results of analysis, makes inferences relevant to the research problem under study and draws conclusions. Data are analysed through statistical procedures to reach specific conclusions, and the same data may be interpreted in different ways. Researchers may use various statistical analysis techniques to reach conclusions. Charts, graphs and other pictorial presentations are forms of depicting data; they present the data in a form that the reader can understand at a glance.

METHODS OF DATA ANALYSIS AND INTERPRETATION


Statistics is the key component of data analysis and interpretation. It is a branch of applied mathematics used on observational data. Since communication research is conducted in natural settings, the items measured are subject to variation; hence every observation differs from the others. Statistics consists of methods for handling and interpreting data collected by measuring such items.

Statistics is the 'Calculus of Observations' – Whittaker and Robinson


Statistical principles have a very wide scope of application. Knowledge of statistics is a must for a researcher. It is of great help in analysing data systematically and scientifically while taking care of significant errors.

AIMS OF STATISTICS

There are three main aims of statistics:

1. To study the population – As discussed earlier in Chapter VI (Sampling), the population is the total universe of the study from which a representative unit is drawn as a sample. Here, population means an aggregate of individuals, where an individual can be a human, a thing, a family or any observation of a variable character like height, weight, age, income, etc.

2. To study variation – It is the variation among individuals which leads to the study of the population. A communication researcher takes variation as an important and essential property of the population. In order to minimize it, (s)he carefully observes the samples and takes the average, which (s)he regards as an approximation to the true value that smooths out the variation.

3. To study the methods of reducing the data – A researcher collects a lot of data which needs to be worked out to get its essence. The objective of analysing data is to extract the relevant information and express it in a summarized and useful form.

STATISTICAL TERMS USED IN COMMUNICATION RESEARCH

i. FREQUENCIES

According to the Oxford dictionary, frequency is the rate at which something occurs over a particular period of time in a given sample or population.

Frequencies are simply the number of times the values occur in the population. From a particular population representative samples are selected, and the data derived are tabulated in the form of a series. The number of times a value occurs in the series is its frequency, and the value that is repeated the maximum number of times has the highest frequency of the series.

HOW TO LOCATE FREQUENCY


Example 1 –


Name of the family No. of children in each family (f)


A 2
B 1
C 2
D 3
E 2
Table 8.1: Frequency table on individual basis

In Table 8.1, each family is taken as an individual unit and the number of children in each family is listed. The occurrence of '2' children is repeated the maximum number of times: families A, C and E each have two children. Thus, '2' children is the value with the highest frequency.

Example 2 -
In Table 8.2, the marks obtained by the students, ranging from 0 to 30, are grouped into class intervals of 10. The highest frequency obtained here is 16, for the class (11-20), because the maximum number of students scored marks from 11 to 20.

Marks obtained (Interval) No. of students (f )


(0 – 10) 6
(11 – 20) 16
(21 – 30) 3
Total 25
Table 8.2: Frequency table on interval basis
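The counting behind Tables 8.1 and 8.2 can be reproduced in a few lines of Python. The sketch below is only illustrative; the dictionaries simply restate the figures given above.

```python
# A minimal sketch of locating the highest frequency, using the data of
# Tables 8.1 and 8.2; all names here are illustrative only.
from collections import Counter

# Table 8.1 - number of children in each family
children_per_family = {"A": 2, "B": 1, "C": 2, "D": 3, "E": 2}
freq = Counter(children_per_family.values())
value, count = freq.most_common(1)[0]
print(value, count)            # 2 3 -> '2 children' occurs most often (families A, C, E)

# Table 8.2 - students per marks interval
interval_freq = {"0-10": 6, "11-20": 16, "21-30": 3}
modal_class = max(interval_freq, key=interval_freq.get)
print(modal_class, interval_freq[modal_class])    # 11-20 16
```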

ii. PERCENTAGES

Percentage is defined as a fraction or ratio with 100 understood as the denominator; it is a proportion or share in relation to a whole. For example, a student obtained 20 marks out of 200. The percentage is 20 / 200 x 100 = 10; hence the student obtained 10% of the marks.

Suppose that in a research study the total number of respondents is 131. Ages are coded as young age (below 35 years), middle age (36 – 45 years) and old age (above 45 years). After classification it was found that 47 respondents were young, i.e. below 35 years old, 33 respondents were of middle age, between 36 and 45 years old, and 51 respondents were of old age, above 45 years old.

So, using percentage system calculation can be made as:


For young age - 47 / 131 x 100 = 35.88% say 36%
For middle age – 33 / 131 x 100 = 25.19% say 25%
For old age – 51/ 131 x 100 = 38.93% say 39%

Age of respondents      Young (Below 35)    Middle (36 - 45)    Old (Above 45)
Respondents                   36%                 25%                 39%


Table 8.3: Percentage of age - groups out of total respondents
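The percentage calculation in Table 8.3 can be sketched as follows; the counts are those stated above, and the rounding mirrors the text.

```python
# A minimal sketch of the percentage calculation behind Table 8.3.
age_counts = {"Young (below 35)": 47, "Middle (36-45)": 33, "Old (above 45)": 51}
total = sum(age_counts.values())                    # 131 respondents

for group, n in age_counts.items():
    pct = n / total * 100
    print(f"{group}: {pct:.2f}% (say {round(pct)}%)")
# Young (below 35): 35.88% (say 36%)
# Middle (36-45): 25.19% (say 25%)
# Old (above 45): 38.93% (say 39%)
```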

iii. MEASURES OF CENTRAL TENDENCY

A measure of central tendency is a measure of the location of the middle (centre) of a distribution of a sample or population. It can be defined as a single number that represents the typical member of a group consisting of several persons. All measures of central tendency are averages, because they attempt to identify different notions of the centre of a distribution.

Measures of central tendency are essential for dealing with data of a mass character. They are also called averages. In communication research we do not deal with the particular characteristics of an individual, but with characteristics like income, family size, age, etc. of the whole group. To do so we need to take averages.

An average is a figure that conveys the attributes or characteristics of a population to the maximum extent. An average is drawn from the data collected from a representative sample of the population. It may not only be a figure but may also be a qualitative expression like diligent, intelligent, etc. It carries, as far as possible, the qualities of the whole group.

ESSENTIALS FOR A GOOD AVERAGE

The following are the main essentials for a good average:

a. Representative of the whole group: The average must be nearest to the largest number of the units of the group in terms of measurement, and in terms of qualitative expression it must possess most of the qualities of the group.
b. Definite and clearly ascertained: As far as possible the average must be expressed in the form of a single number rather than a qualitative type like 'a bureaucrat' or 'a village patriarch'. Averages of that type are not amenable to statistical analysis.
c. Possess stability of value: Stability is possessed only when the average is not influenced by the extreme-value items of the group. Otherwise, the average will be significantly affected if a few items are added to or removed from the group.
d. Subjected to further mathematical analysis: For research purposes there should be scope for further analysis to attain more accuracy of the average drawn.
e. Absolute measurement, not a relative one: The average drawn should be absolute, not relative. If an average is expressed as 10% higher or lower than some particular size, it becomes necessary to know that particular size to ascertain the actual value.


f. Simple to calculate: The average is the simplest form of statistics used in research studies. Thus, a researcher should keep the calculation of the average simple, so that a person of common intelligence can understand and verify it.

CLASSIFICATION OF AVERAGES

The averages are broadly classified in three categories:

1. Averages of location
a. Mode
b. Median

2. Mathematical averages
a. Arithmetic average
b. Geometric mean
c. Harmonic mean

3. Other practical averages


a. Moving average
b. Progressive average

1. Averages of location
The averages in this method are identified by the location in the series. It is generally of two
types –

a. Mode

The mode is the value that occurs most commonly in the data. The value of the item that has the highest frequency is referred to as the mode; that is, the item with the highest frequency in the series is located. It is the measurement or size that is directly applicable to the largest number of cases. Whenever we come across data detailing average ages or the average height of an Indian, it is calculated using the mode method of averaging.

HOW TO CALCULATE MODE

The calculation of the mode depends upon the frequencies; hence it is very easy and quick. But one has to carefully locate the value whose frequency is the maximum in the series. Grouping of the data is done in either a discrete or a continuous series, and the item value with the highest frequency is taken as the mode. The mode is frequently used whenever complete data are not available or the people using the average do not have technical knowledge. The mode is also helpful in finding averages of intangible things known as types, like diligence, intelligence, the labour class, the middle class, etc.

Name of the family No. of media available with family


A 0
B 3
C 5
D 2
E 1
F 2
G 4
H 2
I 1
J 0
K 3
L 2
Table 8.4: Number of media available in twelve families

To calculate the mode from the data in Table 8.4 regarding the number of media available with each family, the number of media is first arranged in the form of a discrete series.

No. of media available No. & Name of family


0 2 (A + J)
1 2 (E + I)
2 4 (D + F + H +L)
3 2 (B + K)
4 1 (G)
5 1 (C)
Total 12
Table 8.5: Discrete series of number of media

The modal number of media available is thus 2, as it has the highest frequency i.e. 4 (D + F +
H +L).
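For a discrete series like Table 8.5, the mode can be located programmatically by counting the frequencies; a minimal sketch (the list simply restates Table 8.4):

```python
# A minimal sketch of locating the mode of a discrete series (Tables 8.4 and 8.5).
from collections import Counter

media_per_family = [0, 3, 5, 2, 1, 2, 4, 2, 1, 0, 3, 2]   # families A to L
freq = Counter(media_per_family)
mode_value, mode_freq = freq.most_common(1)[0]
print(mode_value, mode_freq)   # 2 4 -> the modal number of media is 2 (families D, F, H, L)
```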

The above calculation of the mode was for independent, single-number values. Now let us understand the calculation of the mode for groups having intervals with definite upper and lower limits.

From the following data regarding the per day allowance to TV producers of a company, find
out the average allowance by means of Mode.

Per day allowance to TV producers No. of producers


Up to Rs.100 8
Rs.100 – 150 10
Rs.150 - 200 15
Rs.200 – 250 25
Rs.250 – 300 12
Rs.300 – 350 11
Rs.350 – 400 7
Above Rs.400 2

Table 8.6: Continuous series of per day allowance of producers

Here the highest frequency is 25, so the modal group can easily be located as 200 – 250. But the mode should be a single number, not a group. Therefore, the actual position of the mode within the group is located through the process of interpolation, using the frequencies before and after the modal group. Two formulae are used; let us understand the calculation of the mode using each of them.

1st Formula – Z = l1 + [(f1 – f0) / (2f1 – f0 – f2)] x i

The symbols used in the formula denotes –

Z = Mode, it is generally the symbol used for indicating mode.

l1= Lower limit of the size of the modal group.

f0 = Frequency of the group prior to modal group.

f1= Frequency of the modal group.

f2 = Frequency of group, subsequent to modal group.

i = Class interval which is generally the difference between the upper and the lower limits
of the size of the modal group.

According to the 1st formula the calculation will be done as follows –

Z = l1 + [(f1 – f0) / (2f1 – f0 – f2)] x i
In this example value of:

l1= 200 is the lower limit of the size of the modal group (200 - 250).

f0 = 15 is the frequency of the group prior to modal group.

f1= 25 is the frequency of the modal group.

f2 = 12 is the frequency of the group subsequent to the modal group, i.e. (250 - 300).

i = 50 is the class interval which is the difference between the upper and the lower
limits of the size of the modal group (200 - 250) i.e. 250 – 200 = 50.

Therefore,

Chapter VII – Data Analysis and Interpretation Page 159


HANDBOOK OF COMMUNICATION RESEARCH

Z = 200 + [(25 – 15) / (2 x 25 – 15 – 12)] x 50

= 200 + (10 / 23) x 50

= 200 + 21.74 = 221.74

2nd Formula – Z = l1 + [f2 / (f0 + f2)] x i

The symbols used in the formula denotes –

Z = Mode, it is generally the symbol used for indicating mode.

l1= Lower limit of the size of the modal group.

f0 = Frequency of the group prior to modal group.

f1= Frequency of the modal group.

f2 = Frequency of group, subsequent to modal group.

i = Class interval which is generally the difference between the upper and the lower limits
of the size of the modal group.

According to the 2nd formula the calculation will be done as follows –

Z = l1 + [f2 / (f0 + f2)] x i
In this example value of:

l1= 200 is the lower limit of the size of the modal group (200 - 250).

f0 = 15 is the frequency of the group prior to modal group.

f2 = 12 is the frequency of the group subsequent to the modal group, i.e. (250 - 300).

i = 50 is the class interval which is the difference between the upper and the lower
limits of the size of the modal group (200 - 250) i.e. 250 – 200 = 50.

Therefore,

Z = 200 + [12 / (15 + 12)] x 50

= 200 + 22.22 = 222.22 (approximately 222)

These two formulae are generally used and give the same result in a perfectly symmetrical series. In an asymmetrical series the answers may differ somewhat, but both are standard formulae and give serviceable results.
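Both interpolation formulae can be checked with a short function; a minimal sketch, using the values of Table 8.6 identified above (the function names are illustrative only):

```python
# A minimal sketch of the two interpolation formulae for the grouped mode,
# applied to the allowance data of Table 8.6 (modal group Rs.200-250).
def mode_formula_1(l1, f0, f1, f2, i):
    # Z = l1 + (f1 - f0) / (2*f1 - f0 - f2) * i
    return l1 + (f1 - f0) / (2 * f1 - f0 - f2) * i

def mode_formula_2(l1, f0, f2, i):
    # Z = l1 + f2 / (f0 + f2) * i
    return l1 + f2 / (f0 + f2) * i

print(round(mode_formula_1(l1=200, f0=15, f1=25, f2=12, i=50), 2))   # 221.74
print(round(mode_formula_2(l1=200, f0=15, f2=12, i=50), 2))          # 222.22
```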

MULTI – MODAL GROUP

As we know, the location of the mode depends upon the frequencies of the groups, which themselves depend upon the size of the class interval. If the class interval is too small, there may be two or more highest-frequency groups instead of one. Series having more than one modal group are known as bi-modal, tri-modal or multi-modal series. This can generally be corrected by increasing the size of the class interval.

ADVANTAGES OF MODE
i. It is not necessary to know the size of all the units.
ii. It can also be expressed and located graphically.
iii. It is very easy to locate.
iv. It is directly applicable to the largest number of items.
v. It is more stable in nature, as it is not affected by extraordinary measurements.

DISADVANTAGES OF MODE

i. It changes with the size of the class interval. Hence, it is indefinite and indeterminate, as the same data may show different modal values.
ii. It is limited to non-mathematical purposes only.
iii. If the highest-frequency items are located at the very beginning or the very end of the series, it is difficult for the mode to represent the series even if it is mathematically located.
iv. In the case of a bi-modal or multi-modal group, its determination may be very difficult.
v. It considers the frequencies of one group only and leaves out the other groups. This creates inaccuracy in the result.

b. Median

The measurement of the middle item, when the items are arranged in either ascending or descending order, is known as the median. To measure the average age of the male labourers of a factory, we can arrange the ages of all the male labourers in ascending order; the age in the middle will be the average age of the whole group.


Median is frequently used in measuring social phenomena like skills, productivity etc. where
the effect of extreme items is to be eliminated. It is also beneficial in case of abstract
phenomena, which cannot be calculated mathematically.

HOW TO CALCULATE MEDIAN

EXAMPLE 1-

Student : A B C D E F G H I J
Scores : 9 0 3 4 6 3 6 7 8 2

To calculate the average score by means of median from the data of the scores of 10 students in
a class test the following procedure is applied-

i. This is ungrouped data. Thus, we first have to arrange the scores in order, ignoring the roll numbers and serial numbers given.

ii. Then we have to find the middle item. The formula for finding the middle item is (n + 1) / 2 in the case of ungrouped data such as this, and n / 2 for a continuous series. Here n is the total number of items in the group.

iii. Using the formula we find the position of the middle item, whose size will be the value of the median. If the middle position falls between two whole numbers, the average of the two corresponding items will be the median.

Let us calculate median for the above data using this procedure –

Scores are arranged serially for all the students:

No. :A B C D E F G H I J
Scores : 0 2 3 3 4 6 6 7 8 9

Middle item = (n + 1) / 2

= (10 + 1) / 2 = 5.5

Size of the 5.5th item = (4 + 6) / 2 = 5


As the size of the 5th item is 4 and that of the 6th item is 6, the size of the 5.5th item is the mid-value of the two, i.e. 5.
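The same result can be confirmed with the standard library; a minimal sketch for the ten scores above:

```python
# A minimal sketch of the ungrouped-median procedure for the ten test scores.
import statistics

scores = [9, 0, 3, 4, 6, 3, 6, 7, 8, 2]
ordered = sorted(scores)                    # [0, 2, 3, 3, 4, 6, 6, 7, 8, 9]

n = len(ordered)
middle = (n + 1) / 2                        # 5.5 -> between the 5th and 6th items
median = (ordered[4] + ordered[5]) / 2      # (4 + 6) / 2 = 5.0
print(middle, median, statistics.median(scores))    # 5.5 5.0 5.0
```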

EXAMPLE 2 -
Now let us understand the calculation of median for groups having intervals.

Measurement Frequency
0-5 5
5-10 7
10-15 10
15-20 18
20-25 20
25-30 12
30-35 8
35-40 6
40-45 4
45-50 1
Table 8.7: Continuous series to calculate median
The following procedure is adopted to calculate median of the above data-

i. The frequencies given must be converted into cumulative frequencies by taking the progressive total. For the first group the cumulative frequency will be 5, for the second it will be 5 + 7 = 12, for the third 12 + 10 = 22, for the fourth 22 + 18 = 40, and so on.

ii. Find the middle item by using the formula n / 2.

iii. Find where the middle item falls among the cumulative frequencies, and thus locate the median group.

iv. The following formula is used for locating median by interpolation –

M = l1 + ( i / f ) x (m - c)

The symbols used in the formulae denote –

M = Median
l1 = Lower limit of the size of the median group.
i = Class interval of median group.
f = Frequency of median group.
m = middle number.
c = Cumulative frequency of the previous group.

Let us calculate median for the above data using this procedure –


Measurement Frequency Cumulative Frequency


0-5 5 5
5-10 7 12
10-15 10 22
15-20 18 40
20-25 20 60
25-30 12 72
30-35 8 80
35-40 6 86
40-45 4 90
45-50 1 91
Table 8.8: Cumulative frequency to calculate median

Middle item (m) = 91 / 2 = 45.5

The middle item falls in the group with cumulative frequency 60, as 45.5 is more than the preceding cumulative frequency of 40 and does not exceed 60. The corresponding measurement group is 20 - 25, which is therefore called the median group.

M = l1 + ( i / f) x (m - c)

In this example value of:

l1 = 20 is the lower limit of the size of the median group.


i = 5 is the class interval of median group.
f = 20 is the frequency of median group.
m = 45.5 is the middle number.
c = 40 is the cumulative frequency of the previous group.

= 20 + (5 / 20) x (45.5 - 40)

= 20 + (5 / 20) x 5.5

= 20 + 1.38

= 21.38
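The interpolation M = l1 + (i / f) x (m - c) can be wrapped in a small function; a minimal sketch using the data of Table 8.8 (the tuple layout is an assumption made for illustration):

```python
# A minimal sketch of the grouped-median interpolation M = l1 + (i/f) * (m - c).
def grouped_median(intervals):
    """intervals: list of (lower_limit, class_interval, frequency) in order."""
    n = sum(f for _, _, f in intervals)
    m = n / 2                                 # middle item (45.5 here)
    cumulative = 0
    for lower, width, f in intervals:
        if cumulative + f >= m:               # median group found
            return lower + (width / f) * (m - cumulative)
        cumulative += f

data = [(0, 5, 5), (5, 5, 7), (10, 5, 10), (15, 5, 18), (20, 5, 20),
        (25, 5, 12), (30, 5, 8), (35, 5, 6), (40, 5, 4), (45, 5, 1)]
print(grouped_median(data))                   # 21.375, i.e. 21.38 as above
```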

ADVANTAGES OF MEDIAN

i. Only the value of the middle item and the total number of items are needed to calculate the median. Therefore, it is not necessary to know the values of all the items.

ii. Very easy to locate.


iii. It possesses greater stability as it is not affected by the extreme items.


iv. Can be definitely ascertained.
v. Can be also located by means of cumulative frequency.

DISADVANTAGES OF MEDIAN

i. It changes with a mere increase in the number of items.

ii. It does not consider all the items of the series.
iii. The median from a continuous series differs from the one calculated from an array (ungrouped) series.

2. Mathematical averages
This method of finding average is based on mathematical calculation. It is basically of
three types -
a. Arithmetic average

It is also called the arithmetic mean, which is regarded as the best form of average and is popularly known as the average par excellence. The arithmetic average is used for all kinds of measurements where further analysis is required. It is derived by dividing the total of the measurements by the number of items. It does not depend on the frequency like the median or the mode. It is based on the total value of all the items from first to last, and hence is considered more representative. It is mostly used when all the items are equally important for consideration.

HOW TO CALCULATE ARITHMETIC AVERAGE

EXAMPLE 1 –

To know the average number of news stories filed in a month by the 10 reporters of a newspaper house, we calculate the per capita average news given in a month by the arithmetic average –

News stories given in a month – 50, 76, 44, 48, 57, 59, 63, 45, 48, 30.

The formula for the calculation of arithmetic average from the ungrouped data is as
follows:

a = (x1 + x2 + x3 + ….. + xn) / n

or

a = ∑m / n
The symbols used in the formulae denote –
a = Arithmetic average or arithmetic mean


x1 + x2 + x3 + ….. xn = size of various items.

∑m = Total of measurement. The Greek letter sigma is used to


represent summation.

n = Total number of items

In this example value of:

x1 = 50
x2 = 76
x3 = 44
x4 = 48
x5 = 57
x6 = 59
x7 = 63
x8 = 45
x9 = 48
x10 = 30
n = 10
The per capita average news given in a month will be –

a = (50 + 76 + 44 + 48 + 57 + 59 + 63 + 45 + 48 + 30) / 10

= 520 / 10

= 52
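A minimal sketch of the same per capita calculation (the list simply restates the ten monthly counts):

```python
# A minimal sketch of the arithmetic average of the ten reporters' monthly news counts.
import statistics

news = [50, 76, 44, 48, 57, 59, 63, 45, 48, 30]
print(sum(news) / len(news), statistics.mean(news))   # both give 52
```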

EXAMPLE 2 –

Now let us understand the calculation of arithmetic average for groups having intervals.
From the following data of heights of 100 boy students, calculation of arithmetic average can
be done using the procedure given below:

Height in inches     No. of students

48 – 52                    6
52 – 56                   12
56 – 60                   28
60 – 64                   30
64 – 68                   20
68 – 72                    4
Table 8.9: Calculation of arithmetic average from the continuous series

i. Calculation of the mid-value of the different groups is done first. This is done by adding the two class limits and dividing the sum by 2. Thus, the mid-value of the first group would be (48 + 52) / 2 = 50.

ii. Multiply each measurement (mid-value) by its respective frequency, add the products, and divide the total by n (the total of the frequencies).
Symbolically –

a = ∑mf / n

Size        Mid – Value (m)     Frequency (f)      m x f (mf)
48 – 52 50 6 300
52 – 56 54 12 648
56 – 60 58 28 1624
60 – 64 62 30 1860
64 – 68 66 20 1320
68 - 72 70 4 280
Total 6032
Table 8.10: Calculation of mid – value with the frequency.

a = ∑mf / n

a = Arithmetic average or arithmetic mean

∑mf = 6032 is the total of the products of the mid-values and the frequencies.
The Greek letter sigma is used to represent summation.

n = 100 is the total number of boy students.

= 6032 / 100

= 60.32
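The weighted mid-value calculation of Table 8.10 can be sketched as follows (the tuple layout is an assumption for illustration):

```python
# A minimal sketch of the grouped (weighted mid-value) arithmetic mean for Table 8.10.
intervals = [(48, 52, 6), (52, 56, 12), (56, 60, 28),
             (60, 64, 30), (64, 68, 20), (68, 72, 4)]     # (lower, upper, frequency)

total_mf = sum(((lo + hi) / 2) * f for lo, hi, f in intervals)   # sum of m*f = 6032
n = sum(f for _, _, f in intervals)                              # 100 students
print(total_mf / n)                                              # 60.32
```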

ADVANTAGES OF ARITHMETIC AVERAGE

i. One unique characteristic of the arithmetic average is that the total of the measurements can be known if the average and the number of items are known.
ii. Unlike the mode, it shows no variation; it is definite and ascertained.
iii. It considers all the items, and hence is more representative.
iv. It is easily understood.
v. It is not necessary to know the size of each individual item or group; the arithmetic mean can be calculated as long as the total of all the items and their number are known.
vi. Further mathematical analysis can be made with this method.

DISADVANTAGES OF ARITHMETIC AVERAGE

i. As it is based on calculation, it sometimes gives absurd results, such as 2.8 newspapers subscribed per family.

ii. The average derived is not directly applicable to any particular item of the group.

iii. It is not suitable for open-ended tables. To calculate the arithmetic average, the value of each item separately, or at least the total value of all the items, must be known.

iv. It cannot be located at a glance like the mode, as it requires mathematical calculation.

v. It is affected by extreme items. This can make the average unrepresentative.

b. Geometric mean

This is used in higher types of statistical analysis, as it is the most mathematical and complicated average. The geometric mean is the nth root of the product of all the item values. It is strictly determinate when averaging positive values. The geometric mean is used to get the average of rates of change or of ratios between measures. It is capable of algebraic manipulation; thus it is mostly used in cases where the data have to be put to further mathematical analysis. It is also used where less importance is to be given to large measurements. For averaging percentages, ratios or index numbers the geometric mean is the most suitable method.

HOW TO CALCULATE GEOMETRIC MEAN

g = nth root of (x1 x x2 x x3 x ….. x xn)

Here, g = Geometric mean


x1 , x2 , x3 = Various measurements

EXAMPLE –

The geometric mean of 18 and 8 would be

√(18 x 8) = √144 = 12.

In the case of more than two items, we have to calculate the 3rd, 4th or a higher root, and use is made of logarithm tables. The formula used for this purpose is –

g = Antilog [(log x1 + log x2 + log x3 + ….. + log xn) / n]

The procedure for calculating geometric mean is –


i. Firstly, the logarithm of each item is found using the log table.
ii. Then the total of all the logarithms is taken.

iii. The total obtained is divided by the number of items.


iv. Lastly, the anti log of the quotient is found. It is the required geometric mean.

EXAMPLE 1 -
To calculate the average percentage increase in population over the different censuses of a particular town, the following data are given:

Year % of increase in population


1970 4.5
1980 16.8
1990 104.3
2000 0.9
2010 0.04
Table 8.11: Percentage increase in population in different census of a town

Following the procedure, we need to find the logarithm of each item.

Year      % of increase in population      Logarithm
1970                4.5                      0.6532
1980               16.8                      1.2253
1990              104.3                      2.0170
2000                0.9                      1̄.9542 (= –0.0458)
2010                0.04                     2̄.6021 (= –1.3979)
Total                                        2.4518

The value before the decimal point is called the characteristic, which can be positive or negative; the bar over a characteristic indicates a negative value, as in the 2010 item's logarithm 2̄.6021 (i.e. –2 + 0.6021 = –1.3979). The value after the decimal point is known as the mantissa, which is always positive.

g = Antilog of (2.4518 / 5)

= Antilog of 0.4904

= 3.09 (approximately)
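The same figure can be verified directly or through base-10 logarithms; a minimal sketch using the growth rates above:

```python
# A minimal sketch of the geometric mean of the census growth rates in Table 8.11,
# computed directly and via base-10 logarithms (as with log tables).
import math

growth = [4.5, 16.8, 104.3, 0.9, 0.04]

# Direct definition: nth root of the product of the items.
gm_direct = math.prod(growth) ** (1 / len(growth))

# Logarithm form: antilog of the mean of the logs.
gm_logs = 10 ** (sum(math.log10(x) for x in growth) / len(growth))

print(round(gm_direct, 2), round(gm_logs, 2))    # both about 3.09
```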

ADVANTAGES OF GEOMETRIC MEAN

i. The most appreciable quality of geometric mean is its reversibility. Let us


understand with the following example –

Commodity        2008 Price    2008 Relative    2010 Price    2010 Relative
A                    10             100              5              50
B                     8             100             16             200
Total                               200                            250
Arithmetic mean                     100                            125
Geometric mean                      100                            100

Table 8.13: Relative prices of commodities in different years

In this case the price of one commodity is doubled and that of the other is halved.
As such there is no change in the price level but the arithmetic average shows an
increase of 25% whereas geometric mean shows the real position.

ii. It is more representative, as all the units of the group are taken into account during calculation.
iii. It reduces the unjustified importance of extreme items without requiring them to be left out to remove abnormality.
iv. It is open to further mathematical operations.

DISADVANTAGES OF GEOMETRIC MEAN

i. It is not as easily located as the median or the mode; rigorous calculation has to be performed on the data.
ii. It is necessary to know the size of all the units for the computation.
iii. The computation of the geometric mean is difficult and requires good knowledge of the subject.
iv. Its greatest limitation is that it cannot be used if the size of any item is zero.

c. Harmonic mean

The harmonic mean is used to find the average of the time taken by individuals or things, e.g. to find the average time taken in producing a television programme when workers with different output standards are engaged. The number derived by dividing the number of items by the sum of the reciprocals of the item values of a series is called the harmonic mean. The reciprocal of any number is the value which, when multiplied by that number, yields 1; for example, the reciprocal of 2 is 1/2, of 3 is 1/3, and so on.

The harmonic mean is always lower than the arithmetic average or the geometric mean, because it gives the least possible weight to large item values. Since it is derived by mathematical calculation, it can be subjected to further algebraic analysis. It is very representative, as it considers all the items of the group and does not depend on only a few of them.

HOW TO CALCULATE HARMONIC MEAN

The formula for the computation of Harmonic means is as follows:-


h = n / (1 /a + 1/b + 1/c …+ 1/n)

Where, h = Harmonic mean


n = Total number of items
a, b, c = Various measurements

Another, easier method for calculating the harmonic mean is to use reciprocal tables. The formula then would be –

h = Reciprocal of ∑ Reciprocals / n

The following process is used for calculating the harmonic mean by means of reciprocals –
i. Find the reciprocal of each measurement from the reciprocal tables.
ii. Add up the reciprocal and divide the total by number of items.
iii. Find the reciprocal of the quotient from the reciprocal table. The number thus arrived at is the harmonic mean.

A, B and C are three workers engaged in producing an advertisement. A takes 30 minutes, B takes 20 minutes and C takes 40 minutes to make one single advertisement. Find the average time taken per advertisement.

Using the formula - h = n / (1 /a + 1/b + 1/c …+ 1/n)

= 3 / (1/30 + 1/20 + 1/40)

= 3 / [(4 + 6 + 3) / 120]

= 3/ (13 / 120)

= 3 x 120 / 13

= 27.69 say 27.7

Using the formula based on the means of reciprocal –

h = Reciprocal of ∑ Reciprocals / n

= Reciprocal of [(Rec 30 + Rec 20 + Rec 40) / 3]

The reciprocal values are:

Rec 30 = 1/30 = 0.0333, Rec 20 = 1/20 = 0.0500 & Rec 40 = 1/40 = 0.0250

= Reciprocal of [(0.0333 + 0.0500 + 0.0250) / 3]



= Reciprocal of 0.1083 / 3

= Reciprocal of 0.0361 = 1 / 0.0361 = 27.7
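A minimal sketch of the same harmonic-mean calculation for the three workers:

```python
# A minimal sketch of the harmonic mean of the three workers' production times.
import statistics

times = [30, 20, 40]            # minutes per advertisement for A, B and C
n = len(times)

h = n / sum(1 / t for t in times)
print(round(h, 2), round(statistics.harmonic_mean(times), 2))   # 27.69 27.69
```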

ADVANTAGES OF HARMONIC MEAN

i. It is definitely ascertained, unlike the mode.

ii. It is amenable to further mathematical calculation.
iii. It is more representative than the median or the mode, as it considers all the items of the group.
iv. The effect of extraordinarily large items is reduced in the harmonic mean.

DISADVANTAGES OF HARMONIC MEAN

i. It requires specific knowledge to calculate the harmonic mean, which a lay person generally does not have.

ii. The figures for all the items are essential to derive it.
iii. The use of reciprocal tables is difficult for a common person to understand, as it needs specialised training.
iv. It cannot be located by mere observation.

IMPORTANCE OF AVERAGES IN STATISTICAL ANALYSIS

Any type of mass study is possible using averages, i.e. measures of central tendency. They form the most fundamental basis upon which the entire structure of statistical analysis is built. The following are the advantages of using averages in statistical analysis –

i. The idea of an average classifies the whole mass into comparatively fewer types, and this makes further analysis possible.

ii. In communication research, each unit differs from the other and no two cases are
exactly alike. In such a case any type of study becomes difficult. Thus, average is
most suitable for such complex phenomena.

iii. Individual items of a group change frequently, but the averages taken from them are comparatively more stable. This stability of averages makes statistical study possible. For example, the TRP of a channel changes week by week, but the average TRP does not change so often.

iv. The average is used in all types of mass analysis. It is the basis of statistical analysis, as from averages further statistical analyses like correlation, dispersion, skewness and the like are made.

v. Averages help in making generalisations about persons or phenomena even when full information is not available. This is very helpful in communication research, as we can form ideal types like the daily soap opera, the mythological programme, etc. Thus, even if we do not have full information about a programme but know that it is a mythological programme, we can easily assume its character, as mythological serials have a similar format of presentation.

vi. It gives validity to generalisations. A statistical generalisation is not only drawn from averages but is also applicable to averages. The rate of population growth is not valid for every single family individually; it is valid for the average growth rate of the whole country.

vii. An average makes the whole idea clear in one figure or a few words. The average student can be described either in figures, as 'below 60%', or in words, as 'C grade'.

LIMITATIONS OF AVERAGE

Although averages are widely used in all communication research, they have the following limitations –

i. Averages are useful in understanding group characteristics, but individually they have very little importance, as they make no allowance for the single case.

ii. Averages can be derived in different ways, such as the mode, median, geometric mean and harmonic mean. The different types have different characteristics and give different measurements. Thus, an average computed for one purpose cannot be used for another, and this creates confusion.

iii. It gives a theoretical idea about the measurement or quality of a group. In practical
life they are not of much use as one cannot plan his budget on the basis of average
per capita income or average cost of living. It will depend upon his requirement,
size of the family, consumption pattern etc.

iv. Absurd results are sometimes drawn from averages which are not practically possible, like an average of 2.2 children per family, which cannot occur in reality.

Indeed, averages have many limitations, but they are still widely used in communication research, because communication research is based not on the study of individuals but on mass behaviour, mass character and mass phenomena, which are best represented by averages. Thus, the average has great significance in statistical analysis, and its limitations persist only because of unscientific computation or irrational use.

iv. MEASURES OF DISPERSION

The statistical measure that computes the scatteredness of the items within a group, or their tendency to deviate from the average, is called a measure of dispersion, or a measure of variability in statistical terminology. The average gives a measure of central tendency but is unable to give a complete picture of the whole group. It is also necessary to know the measure of variability, i.e. the degree of deviation from the central tendency or the scattering of items within the group. Let us understand the prominent feature of dispersion from the following example:

Reporter       Mon   Tue   Wed   Thu   Fri   Sat   Total   Average
Reporter 1      5     5     5     5     5     5      30       5
Reporter 2      4     4     5     5     6     6      30       5
Reporter 3      2     3     5     6     6     8      30       5
Table 8.14: Number of news – stories submitted by three reporters in weekdays

We can see that the total number of news-stories submitted over the weekdays is the same for each reporter, and the average news-stories given by all three reporters are also equal. But we cannot say that they are all equal in all respects; there is a difference in their regularity and certainty. Reporter 1 regularly submits 5 stories a day, Reporter 2 is slightly irregular, and Reporter 3 is extremely erratic and uncertain in submitting news-stories. Thus, measures of dispersion tell us the degree of average deviation from the central tendency. For a stable measurement capable of being used in future, we have to find the average of the deviations of all the units. That is why dispersion is also known as an average of the second order.

MEASUREMENT OF DISPERSION

Following methods are generally used for measuring the dispersion:

1. Method of limits –
a. Range
b. Inter – quartile range
c. Semi – inter quartile range

2. Method of deviation –
a. Mean deviation
b. Standard deviation

1. Method of limits
This method of finding the dispersion is based on the limits of a group or a class. It can be
derived in following ways:

a. Range

It is the easiest method of finding a measure of variability. The range signifies the difference between the highest and the lowest measurement. If we want to find the degree of dispersion in the yearly circulation of a newspaper, we find the highest and the lowest limits, and the difference between the two is the range. The range may be absolute or relative; when the ranges of two groups have to be compared, a relative measure known as the coefficient of range is used.


HOW TO CALCULATE RANGE

When two groups or two factors are compared, we first find the absolute range and then its coefficient of range.

The formula for calculating absolute range is –

Absolute Range = m1 – m0

The formula for calculating coefficient range is –

Coeff. of Range = (m1 – m0) / (m1 + m0)

The symbols used in both the formulae denote –

m1 = Highest measurement

m0 = Lowest measurement

EXAMPLE 1-
The following data relate to the number of advertisements aired during prime time (8 pm to 10 pm) on two channels in a week. Let us find out whose rate of advertisement is more variable through the method of range.

Channel Mon Tue Wed Thu Fri Sat Sun


A 22 21 18 20 19 26 25
B 25 15 20 24 22 28 30
Table 8.15: Number of advertisement on-aired at prime time in two channels in a week

We find that the lowest and highest limits of channel A are 18 and 26, and the lowest and highest limits of channel B are 15 and 30, respectively.

Using the formula –

Absolute Range = m1 – m0

Absolute Range of channel A = 26 – 18 = 8.

Absolute Range of channel B = 30 – 15 = 15.

Then the coefficient range or relative measure of range of both the channel is derived by
using the formula –


Coeff. of Range = m1 – m0 / m1 + m0

Coeff. of Range of channel A = (26 – 18) / (26 + 18) = 8 / 44 = 0.18

Coeff. of Range of channel B = (30 – 15) / (30 + 15) = 15 / 45 = 0.33

The coefficient of channel A is 0.18 and that of channel B is 0.33. It means the rate of advertisement is more variable in channel 'B' than in channel 'A'.
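A minimal sketch of both the absolute range and the coefficient of range for the two channels of Table 8.15:

```python
# A minimal sketch of the absolute range and coefficient of range (Table 8.15).
channels = {
    "A": [22, 21, 18, 20, 19, 26, 25],
    "B": [25, 15, 20, 24, 22, 28, 30],
}

for name, counts in channels.items():
    m1, m0 = max(counts), min(counts)
    print(name, m1 - m0, round((m1 - m0) / (m1 + m0), 2))
# A 8 0.18
# B 15 0.33
```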

b. Inter – quartile Range

The range method of finding the dispersion is very simple, but it has the major drawback of being affected by extreme items. If the income of people in a group ranges from Rs.1.5 lakh to Rs.2.5 lakh, the inclusion of even one rich person with a very high income will increase the range abnormally. To avoid such errors the inter-quartile range is used. It is the distance from one quartile to the other: 25% of the items are eliminated from each end and only the range of the middle 50% is considered. This is done on the assumption that when 25% of the items are left out at both ends, the extraordinary items are removed and only the normal items remain.

HOW TO CALCULATE INTER – QUARTILE RANGE

To get normal range of variability inter – quartile range is used.

The formula for calculating inter - quartile range is –

I.Q.R = Q3 – Q1

The formula for calculating coefficient inter - quartile range is –

Coeff. of I. Q. R = (Q3 – Q1) / (Q3 + Q1)

The symbols used in both the formulae denote –

Q1 = Size of first quartile

Q3 = Size of third quartile

EXAMPLE 1-
From the following figures relating to the per-day payments of the 10 stringers of a news agency, find out the inter-quartile range and its coefficient.


150, 350, 120, 100, 270, 400, 180, 420, 240, 210

Let us arrange the per day payment of 10 stringers serially –

No. A B C D E F G H I J
Rs. 100 120 150 180 210 240 270 350 400 420

Q1 = Size of the [(n + 1) / 4]th item.

= Size of the [(10 + 1) / 4]th = 2.75th item.

In this example the values of:

n = the total number of items.

The 2.75th item lies between the 2nd item (120) and the 3rd item (150). Thus, 0.75 of the interval from 120 to 150 will be added to 120.

Size = 120

Interval = 150 – 120

= 120 + 0.75 (150 - 120)

= 120 + 0.75 x 30

Q1 = 120 + 22.5 = 142.5

Q3 = Value of the [3(n + 1) / 4]th item

= Value of the [3(10 + 1) / 4]th = 8.25th item

In this example the values of:

n = the total number of items.

The 8.25th item lies between the 8th item (350) and the 9th item (400). Thus, 0.25 of the interval from 350 to 400 will be added to 350.

Size = 350

Interval = 400 – 350

= 350 + 0.25 (400 - 350)


= 350 + 0.25 x 50

Q3 = 350 + 12.5 = 362.5

Therefore,

I.Q.R = Q3 – Q1

= 362.5 – 142.5 = 220

Then, Coeff. of I. Q. R = (Q3 – Q1) / (Q3 + Q1)

= (362.5 – 142.5) / (362.5 + 142.5)

= 220 / 505 = 0.44
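The quartile positions and the interpolation used above can be checked with a small helper; a minimal sketch (the function name is an assumption made for illustration):

```python
# A minimal sketch of the ungrouped quartiles for the ten stringers' per-day payments.
def value_at_position(ordered, position):
    """Value at a fractional 1-based position, by linear interpolation."""
    lower = int(position)                      # e.g. 2 for position 2.75
    fraction = position - lower
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

payments = sorted([150, 350, 120, 100, 270, 400, 180, 420, 240, 210])
n = len(payments)

q1 = value_at_position(payments, (n + 1) / 4)        # 2.75th item -> 142.5
q3 = value_at_position(payments, 3 * (n + 1) / 4)    # 8.25th item -> 362.5
print(q1, q3, q3 - q1, round((q3 - q1) / (q3 + q1), 2))   # 142.5 362.5 220.0 0.44
```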

EXAMPLE 2 –
From the following data regarding the wages of side-artists in films, calculate the inter-quartile range.

Wages in rupees     No. of side-artists
0 - 100 4
100 - 200 8
200 - 250 20
250 - 300 35
300 - 350 10
350 - 400 3
Total 80
Table 8.16: Frequency of side – artists according to their wages

Measurement (m)     Frequency (f)     Cumulative frequency (cf)
0 - 100 4 4
100 - 200 8 12
200 - 250 20 32
250 - 300 35 67
300 - 350 10 77
350 - 400 3 80
Table 8.17: Cumulative frequency of side – artists according to their wages

First quartile item (q1) = n/4 = 80 / 4 = 20 here, n = 80.

Third quartile item (q3) = 3 (n/4) = 3 (80/4) = 60 here, n = 80.



Size of the first quartile (Q1) = l1 + (i / f) x (q1 - c)

This is the same interpolation formula used to find the median of a continuous series, applied here to the quartile group.

In this example the value of:

l1 = 200 is the lower limit of the group (200 - 250) in which the first quartile item falls.

i = 50 is the class interval of that group.

f = 20 is the frequency of that group.

q1 = 20 is the first quartile item.

c = 12 is the cumulative frequency of the previous group.

Q1 = 200 + (50 / 20) x (20 - 12)

= 200 + (50 x 8) / 20

= 200 + 400 / 20

= 200 + 20
= 220

Size of the third quartile (Q3) = l1 + (i / f) x (q3 - c)

This is the same interpolation formula, applied to the group containing the third quartile item.

In this example the value of:

l1 = 250 is the lower limit of the group (250 - 300) in which the third quartile item falls.

i = 50 is the class interval of that group.

f = 35 is the frequency of that group.

q3 = 60 is the third quartile item.

c = 32 is the cumulative frequency of the previous group.

Q3 = 250 + (50 / 35) x (60 - 32)

= 250 + (50 / 35) x 28

= 250 + 1400 / 35

= 250 + 40
= 290

I.Q.R = Q3 – Q1 Here, Q3 = 290 & Q1 = 220

= 290 – 220 = 70

Coeff. of I. Q. R = (Q3 – Q1) / (Q3 + Q1)

= (290 – 220) / (290 + 220)
= 70 / 510 = 0.14

Interpretation – In example 1 the coefficient of inter-quartile range is 0.44, and in example 2 it is 0.14. It means the rate of variability of the per-day payment of the stringers is greater than that of the daily wages of the side-artists.
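The grouped quartiles of Table 8.17 can be verified with the same interpolation idea; a minimal sketch (the tuple layout and function name are assumptions made for illustration):

```python
# A minimal sketch of the grouped quartiles for the side-artists' wages (Table 8.17),
# using the interpolation Q = l1 + (i/f) * (q - c) described above.
def grouped_quantile(intervals, target):
    """intervals: list of (lower_limit, class_interval, frequency); target: item number."""
    cumulative = 0
    for lower, width, f in intervals:
        if cumulative + f >= target:
            return lower + (width / f) * (target - cumulative)
        cumulative += f

wages = [(0, 100, 4), (100, 100, 8), (200, 50, 20),
         (250, 50, 35), (300, 50, 10), (350, 50, 3)]
n = sum(f for _, _, f in wages)                           # 80 side-artists

q1 = grouped_quantile(wages, n / 4)                       # 220.0
q3 = grouped_quantile(wages, 3 * n / 4)                   # 290.0
print(q1, q3, q3 - q1, round((q3 - q1) / (q3 + q1), 2))   # 220.0 290.0 70.0 0.14
```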

c. Semi – Inter Quartile Range

The inter-quartile range is taken to be more representative than the range, but it too has a weakness. A more valid measurement of the degree of dispersion is thought to be the average distance between the median and the two quartiles, rather than the absolute distance from one quartile to the other. Therefore, another measure is used for this purpose, known as the semi-inter-quartile range, which expresses half of the inter-quartile range. It is also called the quartile deviation.

HOW TO CALCULATE SEMI – INTER QUARTILE RANGE

The semi – inter quartile range or quartile deviation can be expressed as under.

Quartile Deviation = (Q3 – Q1) / 2

Coeff. of Q. D = [(Q3 – Q1) / 2] / [(Q3 + Q1) / 2]

or = (Q3 – Q1) / (Q3 + Q1)

We find that the coefficient of quartile deviation is the same as the coefficient of inter-quartile range. The method of calculating the quartile deviation is the same as that of the inter-quartile range; the only difference is that the inter-quartile range is divided by 2.


2. Method of deviation

Deviation is a measure of the difference between an observed value and the mean. It can be positive or negative; the sign of the deviation reports the direction of the difference, positive when the observation is larger than the mean and negative when it is smaller. The magnitude of the value indicates the size of the difference.

Deviations from the population mean are errors, while deviations from the sample mean are residuals. The sum of the deviations of the entire set of observations from their mean is always zero, and so the average deviation is zero.

Deviation is calculated in two ways:

a. Mean deviation

The arithmetic average of the deviations of each item value from some average, taken without regard to sign, is known as the mean deviation. It is not a method of limits: it takes into consideration every single measurement and not just a few crucial points. Thus, it is a method of average deviation.

Any average – mean, mode or median – can be selected for computing the deviations. Generally the median is considered most suitable, because the mean deviation from the median is the least. The mean deviation is symbolically expressed by the Greek letter delta (δ).

HOW TO CALCULATE MEAN DEVIATION

The formula used for calculating mean deviation is –

Mean deviation (δ) = ∑f d / n

Coefficient of Mean Deviation = δ / Mean, Median or Mode

The symbols used in both the formulae denote –

d = deviation from mean, mode or median.


fd = product of frequency and deviation.
n = total number of items.
δ = mean deviation

For calculating the coefficient of mean deviation, the mean deviation is divided by the average (mean, median or mode), whichever has been used for the purpose of calculating the deviations.
The following procedure is followed:

i. The first step is to calculate the mean, median or mode and the deviation of each item from it. The median should be given preference over the other forms of average.


ii. Secondly, the difference of each item from the average has to be calculated, ignoring the plus or minus sign.
iii. Then, multiply the deviations by their respective frequencies (fd).
iv. The products of all the deviations and their respective frequencies should be added (∑fd).
v. The total ∑fd should be divided by the total number of items: ∑fd / n.
vi. In the case of a continuous series where class intervals are given, the mid-value of each group should be taken as the measurement.

EXAMPLE 1 –
From the following data relating to the number of special correspondents in 5 daily newspaper houses, calculate the mean deviation and its coefficient.
3, 4, 7, 2, 6

S.no.     No. of special correspondents (m)     Deviation from median 4 (d)
1                        2                                   2
2                        3                                   1
3                        4                                   0
4                        6                                   2
5                        7                                   3
Total                                                     ∑d = 8
Table 8.18: Deviation from the median of all the measurements

Median = size of the [(n + 1) / 2]th item

= size of the [(5 + 1) / 2]th = 3rd item    Here, n = 5, as the total number of newspaper houses is 5.

= 4, the 3rd item of the ordered series 2, 3, 4, 6, 7.

Mean deviation (δ) = ∑d / n    Here, ∑d = 8, as the total of the deviations from the median is 8.

= 8 / 5 = 1.60

Coeff. of Mean deviation = δ / Mean, Median or Mode

In this case we have taken the median, not the mode or the mean. The median, as calculated above, is 4.
Therefore,

Coeff. of Mean deviation = 1.60 / 4 = 0.40
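A minimal sketch of the same calculation, confirming the median of 4 and the coefficient of 0.40:

```python
# A minimal sketch of the mean deviation from the median for the five newspaper houses.
import statistics

correspondents = [3, 4, 7, 2, 6]
median = statistics.median(correspondents)                        # 4
mean_dev = sum(abs(x - median) for x in correspondents) / len(correspondents)
print(median, mean_dev, round(mean_dev / median, 2))              # 4 1.6 0.4
```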

EXAMPLE 2 –
From the following series calculate the mean deviation and its coefficient.


Size        Frequency
0 – 5           5
5 – 10          8
10 – 15        10
15 – 20         7
20 – 25         6
25 - 30         4

Let us first arrange the continuous series –

Measurement (m)   Mid – value   Frequency (f)   Cumulative frequency (cf)   Deviation from median 13.5 (d)   Freq. x Dev. (fd)
0–5 2.5 5 5 11 55
5 – 10 7.5 8 13 6 48
10 – 15 12.5 10 23 1 10
15 – 20 17.5 7 30 4 28
20 – 25 22.5 6 36 9 54
25 - 30 27.5 4 40 14 56
Total 251
Table 8.19: Arrangement of data for finding different values

Middle number (m) = n / 2. The total cumulative frequency is 40; thus n = 40.

= 40/2 = 20

Size of the 20th item = l1 + (i / f) x (m - c), the formula used to find the median of a continuous series.
In this example the value of –

l1 = 10 is the lower limit in measurement for the size of 20th item.

i = 5 is the difference in the class interval of median group.

f = 10 is the frequency of median group.

m = 20 is the middle number derived above.

c = 13 is the cumulative frequency of the previous group.

Therefore, Size of the 20th item = 10 + (5 / 10) x (20 - 13)

= 10 + (5 x 7) / 10

= 10 + 35 / 10

= 10 + 3.5 = 13.5

Mean deviation (δ)= ∑f d / n Here, ∑fd = 251 as the total of the product of frequency and
deviation is 251 and n = 40.


Therefore, Mean deviation (δ) = 251/ 40


= 6.275

Coeff. of Mean deviation = δ / Mean, Median or Mode

Here, δ = 6.275 and median = 13.5

= 6.275 / 13.5
= 0.46
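A minimal sketch of the grouped mean deviation of Table 8.19, reproducing the interpolated median of 13.5 and the coefficient of 0.46 (the tuple layout is an assumption for illustration):

```python
# A minimal sketch of the grouped mean deviation from the median (Table 8.19).
intervals = [(0, 5, 5), (5, 5, 8), (10, 5, 10), (15, 5, 7), (20, 5, 6), (25, 5, 4)]
n = sum(f for _, _, f in intervals)                          # 40

# Interpolated median, as in the continuous-series median formula.
median, cumulative = None, 0
for lower, width, f in intervals:
    if cumulative + f >= n / 2:
        median = lower + (width / f) * (n / 2 - cumulative)   # 13.5
        break
    cumulative += f

sum_fd = sum(f * abs(lower + width / 2 - median) for lower, width, f in intervals)
mean_dev = sum_fd / n
print(median, sum_fd, mean_dev, round(mean_dev / median, 2))  # 13.5 251.0 6.275 0.46
```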

b. Standard deviation

The major limitation of the mean deviation is that it ignores the plus and minus signs in the calculation of the deviations, which limits further analysis. Although it is mathematically more accurate than the methods of limits, another measure known as the standard deviation is used in order to remove this limitation. In this method the plus and minus signs are taken into consideration, but the deviations are squared, which eliminates the signs. The total of the squared deviations is then divided by the total number of items, and the square root of the quotient is the standard deviation.

HOW TO CALCULATE STANDARD DEVIATION

σ = √(∑fd2 / n)

Coefficient of S.D. = σ / a Here, a is calculated using the basic formula of Arithmetic average.

a = (x1 + x2 + x3 + ….. + xn) / n

or
a = ∑mf / n

The symbols used in formulae denotes –


σ = Standard deviation, it is expressed by Greek letter Sigma.

d = Deviation from the arithmetic average. Standard deviation is almost always calculated from
arithmetic average.

fd2 = Square of deviation multiplied with frequency.

a = Arithmetic average.

x1 + x2 + x3 + ….. xn = size of various items.


n = Total number of items

∑mf = The product of mid value and the frequency.


The Greek letter sigma is use to represent summation.

The steps followed for computing standard deviation are:–

i. The arithmetic average is calculated first (a).

ii. The deviation of each item value from the arithmetic average is calculated, keeping both plus and minus signs (d).
iii. Square the deviations (d2).
iv. Multiply the squared deviations by their frequencies (fd2).
v. Add up these products (∑fd2).
vi. Divide the total by the total of the frequencies, i.e. the total number of items (∑fd2 / n).
vii. Calculate the square root of the quotient. The number thus arrived at is the required standard deviation.

EXAMPLE 1 –

Calculate the standard deviation and its coefficient of the following series.

Measurement     Frequency
1-3                 4
3-5                 6
5-7                 9
7-9                12
9-11                5
11-13               4

Let us find the following values and arrange them in tabular form –

Measurement (m)   Mid – value   Frq. (f)   Product m x f (mf)   Deviation from a i.e. 7 (d)   Square of d (d2)   Product f x d2
1-3       2     4      8    -5    25    100
3-5       4     6     24    -3     9     54
5-7       6     9     54    -1     1      9
7-9       8    12     96    +1     1     12
9-11     10     5     50    +3     9     45
11-13    12     4     48    +5    25    100
Total          40   ∑mf = 280          70   ∑fd2 = 320
Table 8.20: Arrangement of data for finding different values for computing standard deviation
and its coefficient

a = ∑mf / n

a = 280 / 40 = 7

In this example value of –

∑mf = 280 is the total of the products of the mid-values and the frequencies.
The Greek letter sigma is used to represent summation.

n = 40 is the total number of items.

σ = √(∑fd2 / n)

= √(320 / 40)

= √8

= 2.83

In this example value of –

∑fd2 = 320 is the square of deviation multiplied with frequency.

n = 40 is the total number of items.

Coefficient of S.D. = σ / a

= 2.83 / 7
= 0.404

Interpretation – In example 1 the coefficient of standard deviation is 0.404. It means the rate of variability in the series is 0.404, and on this basis the series can be compared with any other series whose coefficient of standard deviation has been obtained.
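A minimal sketch of the grouped standard deviation and its coefficient for Table 8.20 (the tuple layout is an assumption for illustration):

```python
# A minimal sketch of the grouped standard deviation and its coefficient (Table 8.20).
import math

intervals = [(1, 3, 4), (3, 5, 6), (5, 7, 9), (7, 9, 12), (9, 11, 5), (11, 13, 4)]
n = sum(f for _, _, f in intervals)                                   # 40

mids = [(lo + hi) / 2 for lo, hi, _ in intervals]                     # 2, 4, ..., 12
freqs = [f for _, _, f in intervals]

a = sum(m * f for m, f in zip(mids, freqs)) / n                       # arithmetic average = 7
sum_fd2 = sum(f * (m - a) ** 2 for m, f in zip(mids, freqs))          # 320
sigma = math.sqrt(sum_fd2 / n)                                        # 2.83
print(a, sum_fd2, round(sigma, 2), round(sigma / a, 3))               # 7.0 320.0 2.83 0.404
```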

v. MEASURES OF ASSOCIATION

There may or may not be an association between two or more phenomena, groups, classes or series of data. A good book becomes the bestseller of the year; as the demand for entertainment serials increases, the number produced also increases. On the other hand, an increase in coal production in a state may not be related to the population growth of the state. Thus, an increase in one phenomenon may bring a similar or opposite change in another phenomenon, or may have no association with it at all if the subjects are unrelated. If a relationship exists, its measurement is technically known as a measure of association, popularly called correlation.

It does not matter whether the data in one series change in the same or the reverse direction of the other; there only needs to be a causal connection between the two. In both cases correlation exists, but it is termed positive or negative according to the direction of change. Correlation is said to be positive if both phenomena change in the same direction; this is also termed direct correlation. For example, an increase in light leads to an increase in heat, the height of a boy increases with his age, and the higher the circulation of a newspaper, the more will be its revenue. Likewise, if the two phenomena change in opposite directions, the correlation is said to be negative, inverse or indirect. For example, an increase in medical facilities reduces the death rate, an increase in the number of children decreases the standard of living as the money available per head is reduced, and increasing use of statistical data in a write-up decreases its readership.

If change in one phenomenon brings change in other phenomena in fixed proportion, the
correlation between them is said to be perfect correlation. Numerically it is indicated as +1 or
-1. The +1 correlation is said as Perfect positive correlation, which increases or decreases in
same direction with fixed proportion. The -1 correlation is said as Perfect negative correlation,
which increases or decreases in opposite direction with fixed proportion.

CLASSIFICATION OF CORRELATION OR MEASURES OF ASSOCIATION

The correlation is classified in the following ways –


i. Positive Correlation
ii. Negative Correlation
iii. Linear Correlation
iv. Curvilinear Correlation

Positive (i) and Negative (ii) Correlation have already been discussed above.

iii. Linear Correlation

When the ratio of change in the two variables is the same throughout, the various values when
plotted on a graph paper would form a straight line. The correlation is then said to be Linear.
Such a correlation is generally found in the physical sciences, where the increase in one
phenomenon is directly proportional to the other, for instance a 5% increase in height
accompanied by a 7% increase in weight in a constant ratio. The relationship between the two
is then said to have a linear correlation.

iv. Curvilinear Correlation

When the ratio of change in the two variables varies over parts of the series, the values when
plotted on a graph paper form a curve, and the correlation is said to be Curvilinear. In
communication research the correlation is never so perfect, so the relationship is generally
curvilinear. For example, TV viewership and the quality of TV programmes are related to each
other, but the relationship does not follow a straight line.

HOW TO MEASURE CORRELATION

After establishing that a correlation exists between two variables, it is to be seen to what exact
degree the two are correlated. The correlation may take three forms:
i. The two series may be highly correlated, so that a change in one is accompanied
by an almost equal change in the other.
ii. There may be only a slight correlation between the two series; a change in one series
may be accompanied by little or no corresponding change in the other series.
iii. Sometimes the changes do not move together in the same or even in the opposite
direction, but a relationship may still exist.

Hence, it is required to measure the degree of correlation. Following are the major methods of
ascertaining the degree of correlation:

I. Mathematical methods –

a. Karl Pearson’s coefficient of correlation


This method is named after the mathematician Karl Pearson, who has been credited with
establishing the discipline of mathematical statistics. He introduced a formula for the
calculation of the coefficient of correlation. The coefficient of correlation between any two
series is computed by dividing the sum of the products of the deviations from the two
arithmetic averages by the product of the number of pairs and the two standard deviations.

HOW TO CALCULATE KARL PEARSON’S COEFFICIENT OF CORRELATION

r = ∑dx dy / (n × σx × σy)

The symbols used in the formula denotes-

r = Coefficient of correlation

dx = Deviations of item values of x series from its arithmetic average.

dy = Deviations of item values of y series from its arithmetic average.

dx dy = Product of dx and dy

n = Number of pairs.

σx = Standard deviation of x series.

σy = Standard deviation of y series.


In this formula the deviation has to be calculated from the actual arithmetic average. This
involves a long process. To avoid it, a short cut formula is used in which the deviation is
calculated from an assumed average.

Short cut formula

r = [∑dx dy - n (∑dx / n) (∑dy / n)] / (n × σx × σy)

If standard deviation is not to be calculated separately the same formula would be as under-

r = [∑dx dy - n (∑dx / n) (∑dy / n)] / [n × √(∑dx2 / n – (∑dx / n)2) × √(∑dy2 / n – (∑dy / n)2)]

EXAMPLE -

The following are the numbers of adult and children viewers of ten TV programmes in a month.
Find out whether there is any correlation between the two, i.e. between the viewing of TV
programmes by adults and by children.

Adult viewers:    63 64 65 67 68 69 69 70 71 71
Children viewers: 65 63 63 65 67 67 68 71 68 69

Adult      Deviation from      Square of       Children    Deviation from      Square of       Product of
viewers    assumed average     Deviation       viewers     assumed average     Deviation       (dx) and (dy)
(x)        68 (dx)             (dx2)           (y)         67 (dy)             (dy2)           (dx x dy)
63         -5                  25              65          -2                  4               10
64         -4                  16              63          -4                  16              16
65         -3                  9               63          -4                  16              12
67         -1                  1               65          -2                  4               2
68         0                   0               67          0                   0               0
69         +1                  1               67          0                   0               0
69         +1                  1               68          +1                  1               1
70         +2                  4               71          +4                  16              8
71         +3                  9               68          +1                  1               3
71         +3                  9               69          +2                  4               6
Total      ∑dx = -3            ∑dx2 = 75                   ∑dy = -4            ∑dy2 = 62       ∑dx dy = 58
Table 8.21: Arrangement of data for finding different values for computing Karl Pearson’s
coefficient of correlation


r = [∑dx dy - n (∑dx / n) (∑dy / n)] / [n × √(∑dx2 / n – (∑dx / n)2) × √(∑dy2 / n – (∑dy / n)2)]

In this example value of –

∑dx = -3 is the deviations of item values of x series from its arithmetic average.

∑dy = -4 is the deviations of item values of y series from its arithmetic average.

∑dx dy = 58 is the total of the product of dx and dy

n = 10 is number of pairs.

∑dx 2 = 75 is the total of the square of the deviations of item values of x series from its arithmetic
average.

∑dy 2 = 62 is the total of the square of the deviations of item values of y series from its arithmetic
average.

r = [58 – 10 (-3 / 10) (-4 / 10)] / [10 × √(75 / 10 – (-3 / 10)2) × √(62 / 10 – (-4 / 10)2)]

= [58 – 10 × (-0.3) × (-0.4)] / [10 × √(7.5 – 0.09) × √(6.2 – 0.16)]

= (58 – 1.2) / (10 × √7.41 × √6.04)

= 56.8 / (10 × 2.72 × 2.46)

= 56.8 / 66.9

= 0.85

Hence, the correlation between the adult and children viewers of TV programmes is 0.85. It
means that there is a high degree of positive correlation between adult and children viewers of
T.V. programmes.
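
The calculation above can be checked with a brief Python sketch (not part of the original text;
variable names are illustrative). It computes Karl Pearson's coefficient directly from deviations
about the actual arithmetic averages, which gives the same result as the assumed-average short
cut used above.

# A minimal sketch verifying the Karl Pearson coefficient for the ten pairs above.
adults   = [63, 64, 65, 67, 68, 69, 69, 70, 71, 71]
children = [65, 63, 63, 65, 67, 67, 68, 71, 68, 69]

n = len(adults)
mean_x = sum(adults) / n
mean_y = sum(children) / n

# Sum of products of deviations from the actual arithmetic averages
sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(adults, children))

# Standard deviations of the two series (divisor n, as in the formula above)
sd_x = (sum((x - mean_x) ** 2 for x in adults) / n) ** 0.5
sd_y = (sum((y - mean_y) ** 2 for y in children) / n) ** 0.5

r = sum_dxdy / (n * sd_x * sd_y)   # about 0.85
print(round(r, 2))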

In order to understand the significance of correlation probable error is also calculated. Formula
given below is used for calculating probable errors:

P.E. = 0.6745 × (1 – r2) / √n
The symbols used in the formula denote –
P.E. = Probable error
r2 = Square of the coefficient of correlation

√n = Square root of the number of pairs.

In this example the probable error would be –

P.E. = 0.6745 × (1 – r2) / √n

In the above example the value of –

r2 = (0.85)2 is the square of the coefficient of correlation.

n = 10 is the number of pairs.

P.E. = 0.6745 × (1 – (0.85)2) / √10

= 0.6745 × (1 – 0.7225) / √10

= (0.6745 × 0.2775) / 3.16

= 0.06
It means that the correlation (r) [0.85] is significant, as it is more than 6 times the probable
error [6 × 0.06 = 0.36].
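
A minimal Python sketch (not part of the original text) of the probable error and the "six times"
rule of thumb applied in the interpretation points that follow; the names r and n are illustrative.

import math

r = 0.85                                   # coefficient of correlation from the example
n = 10                                     # number of pairs
pe = 0.6745 * (1 - r ** 2) / math.sqrt(n)  # probable error, about 0.06
print(round(pe, 2), r > 6 * pe)            # True: r exceeds 6 times the probable error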

INTERPRETATION OF KARL PEARSON’S COEFFICIENT OF CORRELATION (r)

The interpretation of correlation (r) should be done on basis of following points –

i. With the use of sign (+) or (-) it should be stated that the correlation is positive
or negative.

ii. Correlation is to be analysed as high, low or moderate. To do so the total
range may be divided into three equal parts. The maximum degree of
correlation, or perfect correlation, is 1. If the correlation is less than 1/3, i.e. up to 0.33,
it is of low degree. If it lies between 0.34 and 0.67, i.e. up to 2/3, it is of moderate
degree, and when it is more than 0.67 the correlation is of high degree.

iii. It is to be stated whether the correlation is significant or insignificant. To


analyse the significance of correlation (r), it is required to calculate the
probable error. If (r) is more than 6 times the probable error it is definitely
significant. If (r) is less than 6 times of probable error it can not be definitely
said to be significant.

NATURE OF KARL PEARSON’S COEFFICIENT OF CORRELATION (r)

Pearson’s coefficient of correlation has following characteristics:

i. (r) is always between +1 and -1. (r) can never be more than +1 in any case. +1 is
said to be perfect positive correlation and -1 is said to be perfect negative
correlation. ‘0’ value comes when there is absolute lack of correlation. Such a
condition is not there in case of communication research study.

ii. (r) measures linear and not curvilinear relationships between the variables.


iii. Karl Pearson’s method takes into consideration not only the direction of change of
different pairs, but also the degree of such variation. Therefore, it is a more exact
measurement than others.

b. Concurrent deviation method

In the concurrent deviation method the degree of change is not considered; only the direction
of change is taken into consideration. In this method two series are said to be concurrent if an
increase in one is accompanied by an increase in the other, and vice versa. Thus, only if
simultaneous increases and decreases occur between two series are they said to be correlated,
even if they change in varying degrees.

HOW TO CALCULATE CONCURRENT DEVIATION

The following formula is used for computing concurrent deviation -

r = ± √( ± (2c – n) / n )

The symbols used in the formula denotes-

c = Pairs of concurrent deviation.

n = Number of pairs of deviations.

This is one less than the actual number of pairs of observations, as one pair is eliminated in
computing the deviations.

The steps followed for computing concurrent deviation are as follows:

i. Assemble the two series in tabular form.

ii. Mark the deviation of every item from its previous item. The first item is thus
eliminated. If the second item is more than the first item, a (+) sign is put against it, and
if it is less, a (-) sign is placed against it. Then the third item is compared with the
second, the fourth with the third, and so on. The deviation signs of the two series are
denoted as dx and dy respectively.

iii. After deriving dx and dy, dx is multiplied with dy using the simple rule of signs, i.e.
(+) x (+) = (+), (-) x (-) = (+) and (+) x (-) = (-).

iv. Then numbers of pairs of concurrent deviation are sorted out. These are the pairs
which have moved in same direction. This is done by counting the (+) signs in dx dy
column. It is nominated as c.


v. The + and – signs have been placed both inside and outside the root sign. This is
because if the value of 2c is less than n, the value of (2c – n) / n will be negative, and the
square root of a negative value cannot be taken out. Thus, by placing a (-) sign before
(2c – n) the minus value is converted into plus, and the square root is taken out. When
the root has been calculated, the (-) sign outside is used again to reconvert the result into
minus. Therefore only one sign is used at a time. If the value of 2c – n is negative the
formula would be –

r = – √( – (2c – n) / n )

and if the value of 2c – n is positive the formula will be –

r = + √( + (2c – n) / n )

EXAMPLE –

The following data relate to the money spent on TV news services and the viewers rate per
thousand for a whole year on a monthly basis. Find by the method of concurrent deviation
whether the two variables are correlated.

Month Amount spent in lakhs Viewers rate per thousand


Jan 1.25 7.02
Feb 1.69 7.15
Mar 1.10 7.18
Apr 0.96 7.04
May 1.02 7.01
Jun 1.14 7.32
July 1.28 7.59
Aug 1.29 8.02
Sep 1.16 8.08
Oct 1.12 7.64
Nov 0.97 7.32
Dec 0.93 7.16

Month   Amount spent   Deviation from        Viewers rate   Deviation from        Product of
        (x)            previous month (dx)   (y)            previous month (dy)   dx and dy
Jan     1.25                                 7.02
Feb     1.69           +                     7.15           +                     +
Mar     1.10           -                     7.18           +                     -
Apr     0.96           -                     7.04           -                     +
May     1.02           +                     7.01           -                     -
Jun     1.14           +                     7.32           +                     +
July    1.28           +                     7.59           +                     +
Aug     1.29           +                     8.02           +                     +
Sep     1.16           -                     8.08           +                     -
Oct     1.12           -                     7.64           -                     +
Nov     0.97           -                     7.32           -                     +
Dec     0.93           -                     7.16           -                     +
                       n = 11                                                     c = 8
Table 8.22: Arrangement of data for finding different values for computing the coefficient of
concurrent deviation.

Coefficient of concurrent deviation = + √( + (2c – n) / n )

= + √( (2 x 8 – 11) / 11 )

In the above example the value of –

c = 8 is the number of pairs of concurrent deviations.

n = 11 is the number of pairs of deviations.

Since the value of 2c – n is positive, the (-) sign is not used, and

= √(5 / 11)

= √0.4545

= 0.67

It means a moderate correlation exists between the money spent on TV news services and the
viewers rate per thousand.
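
The whole procedure can be sketched in a few lines of Python (not part of the original text;
variable names are illustrative, and the sketch assumes no two consecutive values are equal, as
in the data above).

# A minimal sketch of the concurrent deviation method applied to the monthly data above.
spend   = [1.25, 1.69, 1.10, 0.96, 1.02, 1.14, 1.28, 1.29, 1.16, 1.12, 0.97, 0.93]
viewers = [7.02, 7.15, 7.18, 7.04, 7.01, 7.32, 7.59, 8.02, 8.08, 7.64, 7.32, 7.16]

# Sign of the deviation of each item from its previous item (+1 or -1)
dx = [1 if b > a else -1 for a, b in zip(spend, spend[1:])]
dy = [1 if b > a else -1 for a, b in zip(viewers, viewers[1:])]

n = len(dx)                                         # 11 pairs of deviations
c = sum(1 for a, b in zip(dx, dy) if a * b > 0)     # 8 concurrent pairs

inner = (2 * c - n) / n
# Apply the same sign inside and outside the root, as described in the text
r = abs(inner) ** 0.5 if inner >= 0 else -(abs(inner) ** 0.5)
print(n, c, round(r, 2))                            # 11 8 0.67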

LIMITATION OF COEFFICIENT OF CONCURRENT DEVIATION

The following are the weakness of Coefficient of concurrent deviation:


i. The degree of change is not considered in this method; even a slight variation has the
same effect as a large one. Thus, it gives only a rough measure and is not as
mathematically precise a measure as Karl Pearson's method.

ii. It can be used only in the case of time series.

iii. It is applicable only where there exists a causal relationship between the two series, so
that one is the cause and the other the effect.

c. Spearman’s ranking method

This method was introduced by the psychologist Charles Edward Spearman, who is known as a
pioneer of factor analysis and is recognised for establishing the rank correlation coefficient.
Here, instead of individual measurements, the ranks of the items in the whole group are taken.
Spearman's ranking method is also known as the Method of Rank Differences. The assumption
made here is that if the two series are correlated, the rank of each item and that of its pair will
be the same or approximately the same. This method is very suitable for communication
research studies where actual measurements cannot be given but ranks can be fixed. It is also
helpful in providing a measurement for qualitative phenomena and calculating the correlation
between two given series. For example, we can very easily compare the product knowledge of
persons by relating it to the number of hours each spends in front of television; or, if the ages
of husbands and wives are highly correlated and a particular husband stands 10th in order of
age, his wife should also stand at nearly the same position in order of age among all the wives
under consideration.

HOW TO CALCULATE SPEARMAN’S RANKING METHOD

The formula used for calculating the coefficient of correlation by method of rank differences
is –

r = 1 – 6 (∑d2) / [n (n2 – 1)]

or r = 1 – 6 (∑d2) / (n3 – n)

The symbols used in the formula denotes-

d2 = Square of the difference of the rank of individual pairs of two series.

n = Number of items.

The procedure for calculating the coefficient of correlation is –

i. First, the rank of each individual item value is found. Ranking can start either from
the highest or from the lowest value. The items need not be rearranged according to
their rank; only the number of their rank should be written against them.

ii. Then the difference between the ranks of the x and y series is taken. It is denoted as d.


iii. The square of the difference (d2) is made and then the formula of rank differences is
applied.

EXAMPLE –

Ten competitors in the T.V Anchor Hunt contest are ranked by the two judges as follows:-

I - Judge –1 6 5 10 3 2 4 9 7 8.

II – Judge –6 4 9 8 1 2 3 10 5 7.

Find out by the method of rank difference how far the opinions of the two judges are similar.

Ranks by     Ranks by     Difference     Square of
I - Judge    II - Judge   (d)            Difference (d2)
1            6            5              25
6            4            2              4
5            9            4              16
10           8            2              4
3            1            2              4
2            2            0              0
4            3            1              1
9            10           1              1
7            5            2              4
8            7            1              1
                          Total of d2    60
Table 8.23: Arrangement of data for finding the rank differences for computing the coefficient
of rank correlation

r = 1 – 6 (∑d2) / [n (n2 – 1)]

In this example the values of –

∑d2 = 60 is the sum of the squares of the differences between the ranks of the individual pairs
of the two series.

n = 10 is the number of items.

r = 1 – (6 x 60) / [10 (102 – 1)]

= 1 – (6 x 60) / [10 (100 – 1)]

= 1 – 360 / 990

= 1 – 0.36

= 0.64


The opinions of the two judges are fairly similar. Sometimes two or more items have equal
values. They have to be given an equal rank, which is the average of the ranks they would have
received had they been unequal. Thus, if after the 5th rank two items have equal values, their
rank will be 6.5 in each case. Because of these common ranks the coefficient of correlation has
to be corrected as follows:

r = 1 – 6 [ ∑d2 + 1/12 (t3 - t) + …. ] / [n (n2 - 1)]

Here, t stands for the number of times an item value is repeated. Thus, if a measurement x is
repeated 3 times, the value of ‘t’ would be 3. One term of (t3 - t) is added for each repeated
value.
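
For the simple case without tied ranks, the rank difference method can be sketched in Python as
follows (not part of the original text; variable names are illustrative).

# A minimal sketch of Spearman's rank difference method for the two judges above.
judge1 = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
judge2 = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

n = len(judge1)
sum_d2 = sum((a - b) ** 2 for a, b in zip(judge1, judge2))   # sum of d^2 = 60

r = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))   # r = 1 - 6*sum(d^2) / [n(n^2 - 1)]
print(round(r, 2))                          # about 0.64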

II. Graphical methods –

Apart from the mathematical methods of measuring correlation, graphical methods are also
used. They do not give an exact degree of measurement but certainly give an idea of the
correlation by observation. These methods are easy to use and can be followed by a lay person
as well. The following is a type of graphical method commonly used.

a. Simple graph

In simple graph the values of two series are plotted on a graph paper. The various points are
joined by straight lines. The trend of two lines is now studied. If they move in same direction
with simultaneous peaks and troughs, the two series are said to be correlated. If no such
tendency is noticeable they are unrelated or independent of each other. If the peak in one line is
coupled with trough in the other line and vice-versa, the two variables are negatively
correlated.

EXAMPLE –

The following data regarding the marks of 10 students in two subjects, namely Reporting and
Editing, are shown by means of a graph to see whether they are correlated.

Student –            A  B  C  D  E  F  G  H  I  J
Marks in Reporting – 20 25 22 27 20 30 28 32 45 33
Marks in Editing –   15 22 19 28 18 26 28 30 35 32


[Graph: "Correlation in marks of students" – the marks (0 to 50) of students A to J plotted as
two line series, one for Reporting and one for Editing.]

Table 8.24: Graphical method for analysing correlation between two series

vi. BI – VARIATE ANALYSIS

It is one of the simplest forms of variable analysis. As the name suggests, it involves the
analysis of two variables to determine the empirical relationship between them. To
understand the relationship it is necessary to measure how the two variables change together.
It is easy to test simple hypotheses of association and causality, and to the extent such a
relationship holds it becomes convenient to predict the value of the dependent variable when
the value of the independent variable is known.

The distinguishing factor between bi-variate and uni-variate analysis is that in uni-variate
analysis only one variable is analysed, and mainly for descriptive purposes, whereas bi-variate
analysis goes beyond description and also analyses the relationship between the two variables.

CLASSIFICATION OF BI-VARIATE ANALYSIS

The common form of bi-variate analysis involves:


a. Percentage table
b. Scatter Plot Graph
c. Computation of a simple correlation coefficient

Bi-variate analysis is simple in that it examines the relationship between only two variables,
whereas in multi-variate analysis multiple relationships between multiple variables are
examined simultaneously.

vii. MULTI – VARIATE ANALYSIS


As name suggests the analysis of more than two variables of a sample is known as Multi-
Variate Analysis (MVA). It includes a set of techniques for analysing data having multiple
observations on each sample. To make a comparative study between two channels on the
factors influencing their rapid growth and popularity, various factors like picture quality,
format of programs, presentation style, marketing strategy etc shall be compared. Thus, on
two television channels taken as samples multiple variables are studied simultaneously.
Hence, it will be regarded as Multi – Variate Analysis.

The techniques of multivariate analysis are very suitable for analysing data represented by
several different variables. Many of these techniques have been developed only relatively
recently because of their dependence on the computational capabilities of modern computers.
The major advantage of multivariate techniques is that they take into account the correlation
or inter-dependence among the variables, which leads to a more correct interpretation of the
results.

NATURE OF MULTI – VARIATE TECHNIQUES

 Multivariate statistical techniques are helpful in making probability statements on the
basis of sampled multiple measurements.
 Multivariate techniques present their results in an evident, readily interpretable form.
 These techniques are largely empirical, and thus practically usable in day to day
real situations for analysing complex data.
 They represent a mass of observational data in a simplified way so as to extract as much
information as possible from the raw data.

TYPES OF MULTI-VARIATE ANALYSIS

Different types of variables are used in multi-variate analysis. The prominent one are
described as follows:

a. Explanatory variable and criterion variable – The explanatory variable is the one
which is considered the cause of the other variable. Suppose X is a variable considered
the cause of Y; then X is the explanatory (or causal or independent) variable and Y is the
criterion (or resultant or dependent) variable. There can be a set of many such variables,
written as (X1, X2, X3 … Xp) for the set of explanatory variables and (Y1, Y2, Y3 …. Yq) for
the set of criterion variables. The term external criterion is sometimes used for the
explanatory variable and the term internal criterion for the criterion variable.

b. Observable variables and latent variables – If the explanatory variables can be
observed directly in a situation they are called observable variables, whereas variables
which are unobservable but influence the criterion variable are termed latent variables.

c. Discrete variable and continuous variable – A variable which, when measured, may
take only integer values, i.e. whole numbers and not fractions or decimals, is called a
Discrete Variable. A variable which can take any real value, not only whole numbers but
also fractions or decimals, is called a Continuous Variable.

d. Pseudo variable or Dummy variable – If only one of the Xi (i = 1, ….., m) is 1 and all the
rest are zero, then such a variable is called a pseudo (or dummy) variable. It is used in a
technical sense and is useful in the algebraic manipulations applied in multi-variate
techniques.

TYPES OF MULTI-VARIATE TECHNIQUES

There are many multivariate techniques, classified on the basis of a few questions: are the
variables dependent upon others or independent, how many variables are dependent, and is
the data qualitative or quantitative. Depending upon these questions the multivariate
techniques are broadly categorized in two groups:

1. Methods for data containing a dependency relationship (dependence methods):

If the variables involved are dependent upon others, then we have dependence methods, and if
the variables are not dependent upon other variables, then we have interdependence methods.
The dependence methods are as follows:
a. Multiple regression analysis

z’y = β1 z1 + β2 z2 + …..+βk zk

where z’y stands for the predicted value of the standardized Y score zy. The expression β1 z1 +
β2 z2 + ….. + βk zk is a linear combination of the explanatory variables. The constant A is
eliminated in the process of converting the X’s to z’s. In the equation the least squares method
is used to estimate the beta weights in such a way that the sum of the squared prediction
errors, Σ (zy – z’y)2, is minimized.

Researcher may sometime use step-wise regression techniques to have better idea of
independent contribution of each explanatory variable.
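
As an illustration only, the following Python sketch (not part of the original text) estimates
standardized beta weights by ordinary least squares on made-up data, matching the
minimisation of Σ (zy – z’y)2 described above; the data, names and coefficients are purely
hypothetical, and the step-wise procedure mentioned is not implemented here.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                                  # two explanatory variables
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=50)

# Standardize every variable to z-scores
zX = (X - X.mean(axis=0)) / X.std(axis=0)
zy = (y - y.mean()) / y.std()

# Least squares: the betas minimise the sum of squared errors (zy - z'y)^2
betas, *_ = np.linalg.lstsq(zX, zy, rcond=None)
print(betas)      # standardized beta weights; no constant term is needed for z-scores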

b. Multiple discriminant analysis


c. Multivariate analysis of variance
d. Canonical analysis

2. Data containing several variables without dependency relationship:


The following are multi-variate techniques used for data containing several variables
without dependency relationship:
a. Cluster analysis
b. Multidimensional scaling or MDS
c. Latent structure analysis
d. Factor analysis

viii. FACTOR ANALYSIS


As discussed above, for data containing several variables without a dependency relationship
various multi-variate techniques are used, one of which is factor analysis. Factor analysis is
used where quantitative (metric) inputs are given; it is not suitable for qualitative (non-metric)
inputs. It is not a single method of analysis but a set of techniques, and therefore the results of
the several methods will not necessarily be the same.

The methods of factor analysis are:

A. Centroid method
B. Principal components method
C. Maximum likelihood method
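
As a rough illustration of the principal components method, the following Python sketch (not
part of the original text; the data are made up and the names are illustrative) derives factor
loadings from the eigen decomposition of a correlation matrix. It is a sketch of the general idea
under these assumptions, not a definitive implementation of any particular author's procedure.

import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 1))                       # one hidden factor
X = latent @ np.array([[0.8, 0.7, 0.6]]) + rng.normal(scale=0.5, size=(200, 3))

R = np.corrcoef(X, rowvar=False)                         # correlation matrix of the 3 variables
eigenvalues, eigenvectors = np.linalg.eigh(R)            # eigh returns ascending eigenvalues
order = np.argsort(eigenvalues)[::-1]                    # largest eigenvalue (latent root) first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Loadings of each variable on the first factor = eigenvector x square root of its eigenvalue
loadings = eigenvectors[:, 0] * np.sqrt(eigenvalues[0])  # sign of the loadings is arbitrary
communality = loadings ** 2                              # h2 contributed by this single factor
print(np.round(loadings, 2), np.round(communality, 2))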

BASIC TERMS RELATED TO FACTOR ANALYSIS

 FACTOR: An underlying dimension that summarises or accounts for a set of observed
variables. Factors can be one or more, depending upon the nature of the study and the
number of variables involved in it.

 FACTOR – LOADING: Values that explain the closeness of variables relating to each
one of the factors discovered is termed as factor – loading. It is thus also known as
factor – variable correlation. It shows the absolute size of the loading which helps in
understanding the meaning of particular factor and interpreting it.

 COMMUNALITY (h2): This reflects how much of each variable is accounted for by the
factors taken together. The higher the value of h2, the less of the variable is left over after
whatever the factors represent is taken into consideration. The formula of h2 is:
h2 of the ith variable = (ith factor loading of factor A)2 + (ith factor loading of factor B)2
+ …….
 EIGEN VALUE: This term is also known as latent root. It is the sum of squares of the
values of factor loadings relating to a factor. The main function of this value is to
indicate the relative importance of each factor in accounting for a particular set of
variables being analysed.

 TOTAL SUM OF SQUARES: It is the total of the Eigen values of all the factors. When this
total is divided by the number of variables involved in the study, it gives an index that
shows how well the particular solution accounts for what all the variables taken together
represent. If the index is low, the variables are very different from each other. The index
approaches unity in two cases: if the variables fall into one or more highly redundant
groups, or if the extracted factors account for all the groups.

 ROTATION: It reveals structures in the data. Different rotations reveal different
structures, but from the statistical point of view all results are taken as equal. An
orthogonal rotation is obtained if the factors are independent; an oblique rotation is made
if the factors are correlated. The communality of each variable remains the same
regardless of the rotation, but the Eigen values will change as a result of rotation.


 FACTOR SCORES: These scores help in performing other multivariate analyses. A factor
score indicates the degree to which each respondent scores high on the group of items
that load highly on a factor, and so helps in interpreting the meaning of that factor.

TYPES OF FACTOR ANALYSIS

Factor analysis is mainly used for developing psychological tests such as Intelligence Quotient
tests, Personality tests etc. In communication research this method is helpful in understanding
media readership and in analysing the communication quotient of persons. It can be either of
the two types given below:

a. R-Type Factor Analysis


b. Q-Type Factor Analysis

a. R-Type Factor Analysis


In R – type factor analysis high correlation occurs when respondents who score high on
variable 1 also score high on variable 2 and the respondents who score low on variable 1 also
score low on variable 2. Factor emerges when there are high correlations within the groups of
variables.

b. Q-Type Factor Analysis


In Q – type factor analysis, instead of pairs of variables, the correlations are computed between
pairs of respondents. The more similarity there is in the pattern of responses between
respondents, the higher the correlation becomes, and due to high correlation a factor emerges.
This type of analysis is useful when the object is to sort people into groups based on their
simultaneous responses to all the variables.

ADVANTAGES OF FACTOR ANALYSIS

i. Factor analysis helps in bringing out the significant facts hidden in the data. These facts
are within the data but are not clearly understood from mere figures.
ii. If we are asked to rate different news magazines according to preference, factor analysis
is the most suitable method because it reveals unnoticed but important attributes of the
various news magazines that are the main cause of their preference.
iii. It reduces and simplifies multivariate data.
iv. The empirical clustering of media is very much possible with factor analysis. It provides
a basis for classification when data derived from various rating scales need to be
grouped.

LIMITATIONS OF FACTOR ANALYSIS

i. Factor analysis methods are quite expensive because they involve a lot of computation.
ii. The computation process is very laborious on the part of the research worker.
iii. The analysis is required to be done twice for more reliable results, as the results of a
single factor analysis are commonly considered less dependable and less reliable.

iv. It is a very complicated method and requires thorough knowledge and enough experience
for handling the computation.

TEST OF SIGNIFICANCE IN COMMUNICATION RESEARCH:

Test of significance: Meaning

Sometimes it is seen that the sample drawn from the population differs in certain
characteristics from the population parameters. There can be two strong reasons for this
difference – either it is due to fluctuations of sampling, or the parent population is different
from the one assumed under the hypothesis.

The probability of getting a difference between the sample estimate and the population value
equal to or greater than the observed difference has to be found with the use of its standard
error. If this probability is very low, there are very few chances of such a difference occurring
due to errors of sampling alone. Thus, it can be concluded that the observed difference is too
great for the sample to have been drawn from the population considered; it has definitely come
from a different population, and the difference is said to be significant. On the other hand, if
the probability of getting a difference equal to or greater than the observed one is very high, it
means that differences as large as the observed one will arise from errors of sampling alone in
a majority of cases, and therefore the observed difference is not significant. Such differences
are attributable to the errors of random sampling.

In the same way, a comparison may be made of certain characteristics estimated in two
samples in order to know whether the two samples under study have been drawn from the
same population or not. The statistical procedure used for deciding whether the difference
under study is significant or non-significant is called a Test of Significance.

Level of significance

When test of significance is applied, the probability of getting a difference equal to or greater
than the observed one in combination with the standard error of the estimate is generally found.
The values of such a probability which are used to provide rough lines of separation between
the acceptance and rejection of the significance of observed differences are known as levels of
significance. The level of significance is commonly taken as 0.05, 0.01 or 0.001 etc.

Degree of Freedom

Degree of freedom is very important to understand in statistical analysis. Suppose that there are
N observations X1,X2,X3…..XN to which we wish to give values such that their sum is a
constant figure. Then we can give values whatever we please to (N - 1) X’s only the value of
Nth will be determined by the condition that sum of all X’s is equal to a given constant
quantity. Thus, here we can say the degree of freedom is (N - 1). For example, if we take any
five values such that their sum is 40, then we take only 4 values like 3,7,11,8. The fifth one
does not depend upon our own will, but that depends upon the condition that their sum is to be


equal to 40 therefore the fifth number will be 11. Hence, we can see there is no full freedom of
taking the five values according to our own will, but it is that of four only.

Therefore, we can define degrees of freedom as number of items or the number of class
intervals whose values can be determined at will. If the restrictions imposed are two then two
degrees of freedom will be lost. Such restrictions are termed as constraints. If the number of
items or the number of class interval be N and the number of constraints be k, then the degree
of freedom will be given as-

Degree of Freedom = N - k

a. ‘t’ – TEST

One of the most commonly used statistical data analysis procedures for hypothesis testing is the
‘t’ – test. The two major kinds of ‘t’ – test are:
i. Student’s t – test
ii. Fisher’s t – test

i. Student’s t – test

This test is very commonly used. It is also known as ‘two – sample t – test’ or ‘independent
samples t-test’. It tests whether or not two independent populations have different mean
values on some measure.

In Student’s t-test we test the significance of the departure of the sample mean X̄ from a
hypothetical value µ, or the significance of the difference between two sample means X̄ and Ȳ.
So far we have assumed that the standard deviation σ of the normal population, from which
the samples are supposed to have been drawn, is known, or that the sample is large. In actual
practice the standard deviation σ remains unknown. If the sample is large we can get a fairly
close estimate of σ and can use the statistic ‘z’ with the estimate given by

S2 = Σ (X - X̄)2 / (N - 1)

Here, S = estimate of the standard deviation (σ)

X = an individual sample value

X̄ = sample mean

µ = hypothetical value

Also, z = (X̄ - µ) / (S / √N)

If N is large, the error involved in replacing σ with its sample estimate S will be negligible,
but if N is small the error may be appreciable.
In the absence of the parameter σ, when we use its estimate S, the distribution of z is no
longer normal but changes to another distribution, named the ‘t’ – distribution. Using this
‘t’ we have

t = (X̄ - µ) / (S / √N)

EXAMPLE

Ten printed advertisements are taken from a newspaper at random. Their lengths in
centimetres are given below:
52, 55, 57, 61, 64, 65, 67, 68, 70, 71.

Using the t-test, examine the suggestion that the mean length of printed advertisements in the
newspaper is 60 centimetres.

Let us first construct a table for calculating the sample mean and the S.D.-

S. no.   Length in cm (X)   d = X - 60   d2
1 52 -8 64
2 55 -5 25
3 57 -3 9
4 61 1 1
5 64 4 16
6 65 5 25
7 67 7 49
8 68 8 64
9 70 10 100
10 71 11 121
Total 630 30 474
Table 8.25: Tabulation for Student’s t – test.

As given the arbitrary mean = 60


Now,
X̄ = Σ X / N = 630 / 10 = 63


Here, value of Σ X = 630, the sum of lengths of all advertisements.

N = 10, the number of advertisements.

Σ (X – X̄)2 = Σ d2 – (Σ d)2 / N

= 474 – (30)2 / 10

= 474 – 90

= 384

Here, the value of Σ d2 = 474 is the total of the squares of the deviations of the lengths from
the arbitrary mean 60.

Σ d = 30 is the total of the deviations from the arbitrary mean.

Therefore, S = √[ Σ (X – X̄)2 / (N – 1) ]

= √(384 / 9)

Therefore, S.E. of X̄ = S / √N = √(384 / (9 x 10)) = 2.065

And

t = (X̄ - µ) / (S / √N)

= (63 – 60) / 2.065 = 1.45

Hence, the observed value of t is 1.45, which is less than 2.262, the table value of t at the 5%
level of significance for 9 d.f., and is therefore not significant. Hence, we can say that the
mean length of advertisements is 60 cm within the limits of errors of sampling.
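
The worked example can be verified with a short Python sketch (not part of the original text;
variable names are illustrative).

# A minimal sketch of the one-sample t-test on the advertisement lengths above.
import math

lengths = [52, 55, 57, 61, 64, 65, 67, 68, 70, 71]
mu = 60                      # hypothetical mean length in centimetres
n = len(lengths)

mean = sum(lengths) / n
s = math.sqrt(sum((x - mean) ** 2 for x in lengths) / (n - 1))   # sample S.D.
t = (mean - mu) / (s / math.sqrt(n))
print(round(mean, 1), round(t, 2))     # 63.0 1.45
# Compare |t| with the table value of t for n-1 = 9 degrees of freedom at the
# chosen level of significance to decide whether the difference is significant.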

ii. Fisher’s t – test



This is the other type of t-test of statistical significance, used in the analysis of data having
commonality, especially for small sample sizes. It is named after its inventor, R.A. Fisher, and
is one of a class of exact tests. Here the comparison of the means of two samples is conveniently
performed. Suppose we have two samples X1, X2, ….. Xn1 and Y1, Y2, …. Yn2. The following
statistics will be calculated for testing the significance of the difference between their means:

X̄ = (1 / n1) Σ X,        S12 = [1 / (n1 - 1)] Σ (X - X̄)2

Ȳ = (1 / n2) Σ Y,        S22 = [1 / (n2 - 1)] Σ (Y - Ȳ)2

σc2 = [Σ (X - X̄)2 + Σ (Y - Ȳ)2] / [(n1 – 1) + (n2 – 1)] … Equation (1)

Also, σc2 = [(n1 – 1) S12 + (n2 – 1) S22] / [(n1 – 1) + (n2 – 1)] ...... Equation (2)

Here, S12 and S22 are the estimated variances from the two samples and σc2 is an estimate of the
population variance, obtained by pooling the sums of squares Σ(X - X̄)2 and Σ(Y - Ȳ)2 and then
dividing by the total number of degrees of freedom {(n1 – 1) + (n2 – 1)} contributed by the two
samples, as given by equation (1).

Therefore, t = (X̄ – Ȳ) / √[ σc2 (1/n1 + 1/n2) ] ……….. Equation (3)

Degree of freedom (d. f.) = (n1 – 1) + (n2 – 1) …….. Equation (4)

The denominator of the expression given in (3) is the standard error of (X̄ - Ȳ). This gives a
different value from that obtained in the case of large samples.

EXAMPLE

The per month sale of six randomly selected local magazines with poor paper quality but good
content is as follows: 43, 45, 48, 49, 51 and 52, whereas the per month sale of 10 randomly
selected local magazines with fine paper quality and average content is as follows:
47, 49, 49, 51, 53, 54, 55, 55, 56 and 57.

Test whether good paper quality affects the sale of the magazines.

Let, X = Sales of magazines with good paper quality.



Y = Sales of magazine with poor paper quality.

For the magazines with good paper quality, the value 53 is taken as the assumed mean, denoted as A.

For the magazines with poor paper quality, the value 48 is taken as the assumed mean, denoted as B.

S. no.   X     d1 (X - A)   d12        S. no.   Y     d2 (Y - B)   d22
1        47    -6           36         1        43    -5           25
2        49    -4           16         2        45    -3           9
3        49    -4           16         3        48    0            0
4        51    -2           4          4        49    +1           1
5        53    0            0          5        51    +3           9
6        54    +1           1          6        52    +4           16
7        55    +2           4
8        55    +2           4
9        56    +3           9
10       57    +4           16
Total    526   -4           106        Total    288   0            60
Table 8.26: Tabulation for Fisher’s t – test.

Here, X̄ = 526 / 10 = 52.6, since the total of the X series is 526 and the number of magazines
with good paper quality is 10.

Σ(X - X̄)2 = Σd12 – (Σd1)2 / n1, where Σd12 = 106, Σd1 = -4 and n1 = 10

= 106 - (-4)2 / 10

= 104.4

Ȳ = 288 / 6 = 48.0, since the total of the Y series is 288 and the number of magazines with
poor paper quality is 6.

Σ(Y - Ȳ)2 = Σd22 – (Σd2)2 / n2, where Σd22 = 60, Σd2 = 0 and n2 = 6

= 60 - (0)2 / 6

= 60.0

σc2 = [Σ (X - X̄)2 + Σ (Y - Ȳ)2] / [(n1 – 1) + (n2 – 1)], where Σ (X - X̄)2 = 104.4 and Σ (Y - Ȳ)2 = 60.0

= (104.4 + 60.0) / (9 + 5)

= 11.74

Therefore,

t = (X̄ – Ȳ) / √[ σc2 (1/n1 + 1/n2) ], where X̄ = 52.6, Ȳ = 48.0 and σc2 = 11.74

= (52.6 – 48.0) / √[ 11.74 (1/10 + 1/6) ]

= 2.599

Degree of freedom (d. f.) = (n1 – 1) + (n2 – 1), with n1 = 10 and n2 = 6

= 9 + 5

= 14

Hence, the observed value of t is 2.599, which is greater than 2.145, the table value of t at the
5% level of significance for 14 d.f., and is therefore significant. This shows that the two
observed means differ from each other, so it can be said that good paper quality affects the
sale of the magazines.
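
A minimal Python sketch (not part of the original text; variable names are illustrative)
reproducing the pooled-variance t calculation above.

# Two-sample (pooled variance) t-test on the magazine sales data above.
import math

good = [47, 49, 49, 51, 53, 54, 55, 55, 56, 57]   # fine paper quality (X)
poor = [43, 45, 48, 49, 51, 52]                   # poor paper quality (Y)

n1, n2 = len(good), len(poor)
mean_x, mean_y = sum(good) / n1, sum(poor) / n2

ss_x = sum((x - mean_x) ** 2 for x in good)
ss_y = sum((y - mean_y) ** 2 for y in poor)

pooled_var = (ss_x + ss_y) / ((n1 - 1) + (n2 - 1))       # sigma_c^2
t = (mean_x - mean_y) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
df = (n1 - 1) + (n2 - 1)
print(round(t, 3), df)        # about 2.6 with 14 degrees of freedom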

b. ‘F’ – TEST

The ‘F’ – test is a statistical test whose test statistic follows an F-distribution under the null
hypothesis. It is used for comparing statistical models that have been fitted to a data set, to
identify the model that best fits the population from which the data were sampled. The aim of
such experiments is to see whether any real difference exists between the treatments or
whether the differences are only errors of sampling. One starts with the null hypothesis that
all the treatments are equal so far as their effect on a characteristic is concerned, i.e. that the
difference between them is zero. If we write τ for the treatment effect, our null hypothesis
will be
τ1 = τ2 = τ3 = …………. τn


It is to be seen whether this hypothesis is to be rejected or accepted within the limits of
chance error. The testing is done by calculating the ratio between the treatment variance
and the error variance and then testing its significance by comparing it with the expected value
of the variance ratio at the desired probability level. The ratio between the two variances is
expressed by the symbol F and the test is known as the F-test.

In honour of Sir R.A. Fisher, who initially developed the statistic as the variance ratio, the
name F-test was coined by George W. Snedecor.

Let V1 and V2 be two variances based on v1 and v2 degree of freedom. Then,

F = V 1 / V2

EXAMPLE

From the given data calculate the different components of variation.

                           Treatment
Lots       1     2     3     4     5     Totals   Means
1          7     8     8     9     8     40       8
2          11    12    13    10    9     55       11
3          6     7     12    11    4     40       8
Totals     24    27    33    30    21    135
Means      8     9     11    10    7
Table 8.27: Tabulation for F-test.

i. The mean (µ) = Σ X / nk; here Σ X = 135, n = 5 and k = 3

= 135 / 15

= 9

ii. The component of variation due to lots-

nVR = Sum of squares (S.S) / Degree of freedom (D.F)

S.S between lots –

S.S. = (1/n) (R12 + R22 + R32) - C.F.

Here, R1, R2, R3 are the totals of the lots and C.F. = (Total of all the nk variates)2 / nk

D.F. = k – 1

Here, n is the number of treatments and k is the number of lots.

S.S. = (1/5) (402 + 552 + 402) - 1352 / 15

= 1245 – 1215

= 30

D.F. = 3 – 1 = 2

Therefore, nVR = 30 / 2 = 15.

iii. The component of variation due to treatments-

kVT = Sum of squares (S.S) / Degree of freedom (D.F)

The S.S. between treatments, computed in the same way from the treatment totals,
(1/3) (242 + 272 + 332 + 302 + 212) - C.F., also works out to 30.

Here, D.F. = n – 1

D.F. = 5 – 1 = 4

kVT = 30 / 4 = 7.5

iv. The component due to error -

VE = S.S. / D.F.

Here, S.S. due to error = Total S.S. – Lot S.S. – Treatment S.S.

= 88 – 30 – 30

= 28

The value of the total S.S. = (72 + 82 + …. + 42) - 1352 / 15

= 88

D.F. = (n - 1) (k - 1) = (5 - 1) (3 - 1) = 4 x 2 = 8.

Therefore, VE = 28 / 8 = 3.5

Sources of Variation   D.F.              Sum of Squares (S.S.)    Mean Sq. (M.S.)
Lots                   k - 1             n Σ (R - µ)2             nVR
Treatments             n – 1             k Σ (T - µ)2             kVT
Error                  (n - 1) (k - 1)   Σ (X – R – T + µ)2       VE
Total                  nk - 1            Σ (X - µ)2
Table 8.28: Table for analysis of variance.


Source of variation   D.F   S.S   M.S. (variance)
Lots                  2     30    15.0
Treatments            4     30    7.5
Error                 8     28    3.5
Total                 14    88    -
Table 8.29: Tabulation of values as per formulae given in table 8.28.

The observed value of F is compared with the value given in the table for v1 and v2 degrees of
freedom at the desired probability level. The levels in which we are generally interested are
.001, .01 and .05.

Treatment Variance (VT) = 7.5

Error Variance (VE) = 3.5

Therefore, F = VT / VE = 7.5 / 3.5 = 2.143

v1 = 4 and v2 = 8.

The value of F at the 5% level of significance for v1 = 4 and v2 = 8 is 3.84. As the observed value
2.143 is less than F5%, it is not significant. Thus, there is no significant difference between the
treatments. Had the observed value of F proved significant, the interpretation would have been
that the differences between the treatments are real and not due to errors of sampling.
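
The components of variation worked out above can be verified with the following Python sketch
(not part of the original text; variable names are illustrative).

# A minimal sketch of the analysis of variance above (3 lots x 5 treatments,
# one observation per cell).
data = [
    [7, 8, 8, 9, 8],      # lot 1
    [11, 12, 13, 10, 9],  # lot 2
    [6, 7, 12, 11, 4],    # lot 3
]
k = len(data)             # number of lots (3)
n = len(data[0])          # number of treatments (5)

grand_total = sum(sum(row) for row in data)
cf = grand_total ** 2 / (n * k)                           # correction factor

total_ss = sum(x ** 2 for row in data for x in row) - cf  # 88
lot_ss = sum(sum(row) ** 2 for row in data) / n - cf      # 30
treat_totals = [sum(row[j] for row in data) for j in range(n)]
treat_ss = sum(t ** 2 for t in treat_totals) / k - cf     # 30
error_ss = total_ss - lot_ss - treat_ss                   # 28

vt = treat_ss / (n - 1)                                   # 7.5
ve = error_ss / ((n - 1) * (k - 1))                       # 3.5
print(round(vt / ve, 3))                                  # F is about 2.143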

c. CHI – SQUARE TEST or X2 - TEST

Amongst the several tests developed so far, this is one of the most important. Tests such as the
‘t’ – test and the F – test are helpful in studying the differences between two distributions, but
they are unable to express all the features of the distributions. The Chi – square test, which is
represented as X2 and pronounced ‘kai – square’, is a non-parametric test. In a non-parametric
test we do not assume that a particular distribution is applicable, or that a certain value is
attached to a parameter of the population. It can be used to compare a theoretical population
with actual data when categories are used. Thus, the Chi – square test is applicable in a large
number of problems, particularly in cases which are studied on the basis of a hypothesis
derived from some general law of nature or from reasoning. In such cases we need to
determine the theoretical or hypothetical frequencies on the basis of the hypothesis assumed
and then, by comparing them with the observed frequencies, test whether the observed
frequencies are in agreement with the hypothetical ones. The test helps us to know whether –

i. Particular distributions are in agreement with normal distribution.


ii. Two given distributions are in agreement with one another.


iii. Two observed frequencies are in any particular given ratio.


iv. Two sets of classification are independent of each other etc.

These entire problems can be handled with the help of X2 – test or Chi – square test.

The X2 – test of significance gives the probability of getting a value of X2 equal to or greater
than the observed one in random sampling. If this probability is less than 0.05, which is
considered very small, we are justified in suspecting a significant divergence between fact and
theory, and can reject the null hypothesis as disproved. On the other hand, if the probability is
not small, i.e. it is greater than 0.05, we cannot say that the hypothesis is proved to be correct;
it can only be said that by the application of the X2 test we find no grounds to suspect the
hypothesis, that it can be accepted within the limits of experimental error, and that the
observed data are in agreement with the hypothesis.

Very large values of X2, which lead us to suspect the hypothesis or the sampling technique, are
very rare. The same is the case with very small values nearly equal to zero, which would
indicate a very close agreement between fact and theory. In such a situation we start
suspecting our sampling technique and say that such close correspondence between fact and
theory is too good to be true.

HOW TO CALCULATE X2 – TEST

The formula for calculating X2 is –

X2 = Σ (O – E)2 / E

The symbols used in the formula denotes –


O – Observed frequency.
E – Expected frequency.
Σ – Summation.

X2 – test is also expressed as - X2 = ( σs2 / σp2 ) (n-1)

The symbols used in the formula denotes –


σs2 = Variance of sample.
σp2 = Variance of population.
(n-1) = Degree of freedom, n is the number of items in the sample.

EXAMPLE

Given below is the number of words used by 10 scriptwriters in writing a box news story.

Name of scriptwriters   Number of words
A                       38
B                       40
C                       45
D                       53
E                       47
F                       43
G                       55
H                       48
I                       52
J                       49
HANDBOOK OF COMMUNICATION RESEARCH

Let us work out the variance of the sample data –

Name of scriptwriters   Number of words (Xi)   Diff. between Mean (Xi - X̄)   (Xi - X̄)2
A                       38                     -9                             81
B                       40                     -7                             49
C                       45                     -2                             4
D                       53                     +6                             36
E                       47                     0                              0
F                       43                     -4                             16
G                       55                     +8                             64
H                       48                     +1                             1
I                       52                     +5                             25
J                       49                     +2                             4
n = 10                  ΣXi = 470                                             Σ(Xi – X̄)2 = 280

Table 8.30: Tabulation for finding the variance of sample data.

X̄ = ΣXi / n = 470 / 10 = 47 words. Here, X̄ = mean.

Therefore, σs = √[ Σ(Xi – X̄)2 / (n - 1) ] = √[ 280 / (10 - 1) ] = √31.11

or σs2 = 31.11


Suppose we wish to test whether this sample can be regarded as drawn from a population with
a variance of 20 words, i.e. let the null hypothesis be H0 : σp2 = 20. To check this hypothesis we
need to find out X2, using the formula –

X2 = ( σs2 / σp2 ) (n-1)

The symbols used in the formula denote –

σs2 = 31.11, is the variance of the sample.
σp2 = 20, is the hypothesised variance of the population.
(n-1) = 9, is the degree of freedom, 10 being the number of items in the sample.

X2 = [31.11 / 20 ] (10 -1)

= 13.999.

The degree of freedom here is (n - 1) = (10 - 1) = 9. The table value of X2 at the 5% level of
significance for 9 d.f. is 16.92, and at the 1% level of significance it is 21.67. Both these values
are greater than the calculated value of X2, which is 13.999. So, we accept the null hypothesis
and conclude that the variance of the given distribution can be taken as 20 words at the 5% as
well as the 1% level of significance. In other words, the sample can be said to have been taken
from a population with a variance of 20 words.
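
A minimal Python sketch (not part of the original text; variable names are illustrative)
reproducing this chi-square test of the sample variance.

# Chi-square test of the sample variance against a population variance of 20 words.
words = [38, 40, 45, 53, 47, 43, 55, 48, 52, 49]
n = len(words)
mean = sum(words) / n

sample_var = sum((x - mean) ** 2 for x in words) / (n - 1)   # about 31.11
pop_var = 20                                                 # hypothesised population variance

chi_sq = (sample_var / pop_var) * (n - 1)                    # about 14.0
print(round(sample_var, 2), round(chi_sq, 3))
# Compare chi_sq with the table value of X^2 for n-1 = 9 d.f. (16.92 at the 5%
# level); since 14.0 < 16.92, the null hypothesis is accepted.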

CONDITIONS FOR THE APPLICATION OF X2 - TEST

It is to be known that there are certain conditions that must be satisfied before X 2 test is
applied:

i. The number of items in each group should not be less than 10. If it is less than 10,
regrouping is done by combining the frequencies of adjoining groups so that the
new frequencies become greater than 10.
ii. The items in the sample should be independent.
iii. Random sampling must be used for recording and using the observations.
iv. The constraints must be linear. Constraints which involve linear equations in the cell
frequencies of a contingency table, i.e. equations containing no squares or higher
powers of the frequencies, are known as linear constraints.
v. The number of items must be large, at least 50, for better results.

SOFTWARE PACKAGE FOR DATA ANALYSIS: SPSS


Statistical Package for the Social Sciences (SPSS) is a comprehensive system for research, with
which data can be analyzed at a faster speed, conveniently and with fewer human errors. It is a
powerful tool capable of analyzing many types of data in the areas of communication research,
the social sciences, and the natural and physical sciences. Its first version was released in 1968,
having been developed by Norman H. Nie and C. Hadlai Hull. This package of programmes is
available for both personal and mainframe (or multi-user) computers. The SPSS package
consists of a set of software tools for data entry, data management, statistical analysis and
presentation. It integrates complex data and file management, statistical analysis and reporting
functions. It is used by market researchers, health researchers, survey companies, government,
education researchers, marketing organizations, communicators and journalists. The original
SPSS manual (Nie, Bent & Hull, 1970) has been considered one of the most valuable
publications. In addition to statistical analysis, data management and documentation are
important features of the base software. SPSS can take data from almost any type of file and
use them to generate tables, reports, charts, graphs, descriptive statistics and complex
statistical analyses.

Salient features of SPSS

1. Easy to understand, learn and use.


2. Provides in depth statistical techniques and capabilities.
3. Researcher is able to have a full range of data management system.
4. Offers various types of editing tools.
5. Reporting and presentation is made easy.
6. Analyze data swiftly and conveniently.

SPSS Inc. has produced several manuals to describe everything their package of programmes
attempts to accomplish. Between 2009 and 2010 the premier SPSS product was called PASW
(Predictive Analytics Software) Statistics. The company announced on July 28, 2009 that it
was being acquired by IBM. As of January 2010, it became “SPSS: An IBM Company”.
IBM SPSS is now fully integrated into the IBM Corporation, and is one of the brands under
IBM Software Group’s Business Analytics Portfolio, together with IBM Cognos.

Basic Steps in Data Analysis

1. Get your data into SPSS: You can open a previously saved SPSS data file, read a
spreadsheet, database, or text data file, or enter your data directly in the Data Editor.
2. Select procedure: Select a procedure from the menus to calculate statistics or to create a
chart.
3. Select the variables for the analysis: the variables in the data file are displayed in a
dialog box for the procedure.
4. Run the procedure: Results are displayed in the viewer.

Statistical Procedures

After entering the data set in Data Editor or reading an ASCII data file, we are now ready to
analyze it. The Analyze option has the following sub options:
Chapter VII – Data Analysis and Interpretation Page 217
HANDBOOK OF COMMUNICATION RESEARCH

 Reports
 Descriptive Statistics
 Custom Tables
 Compare means
 General Linear Model (GLM)
 Correlate
 Regression
 Loglinear
 Classify
 Data Reduction
 Scale
 Non parametric tests
 Time Series
 Survival
 Multiple response

Descriptive Statistics: This submenu provides techniques for summarizing data with statistics,
charts, and reports. The various sub-sub menus under this are as follows:
 Frequencies provides information about the relative frequency of the occurrence of
each category of a variable. It can be used to obtain summary statistics that
describe the typical value and the spread of the observations. To compute summary
statistics for each of several groups of cases, the Means procedure or the Explore
procedure can be used.
 Descriptive is used to calculate statistics that summarize the values of a variable like
the measures of central tendency, measures of dispersion, skewness, kurtosis etc.
 Explore produces and displays summary statistics for all cases or separately for groups
of cases. Boxplots, stem-and leaf plots, histograms, tests of normality, robust estimates
of location, frequency tables and other descriptive statistics and plots can also be
obtained.
 Crosstabs is used to count the number of cases that have different combinations of
values of two or more variables, and to calculate summary statistics and tests. The
variables you use to form the categories within which the counts are obtained should
have a limited number of distinct values.
 List Cases displays the values of variables for cases in the data file.
 Report Summaries in Rows produces reports in which different summary statistics are
laid out in rows. Case listings are also available from this command, with or without
summary statistics.
 Report Summaries in Columns produces reports in which different summary statistics
are laid out in separate columns.
 Custom Tables sub-menu provides attractive, flexible displays of frequency counts,
percentages and other statistics.


In SPSS release 14.0, the Chart Builder option for creating graphs and charts was introduced.
Chart Builder creates the most professional looking graphs SPSS has ever provided. There are
three complete sets of graphing procedures available in SPSS: Legacy Graphs, Interactive
Graphs and Chart Builder Graphs. SPSS 15.0 operations now feature step by step dialog boxes
and new screens.

Available Additional Modules

1. SPSS Programmability Extension (added in version 14). Allows Python, R and .NET
programming control of SPSS.
2. SPSS Data Preparation (added in version 14). Allows programming of logical checks
and reporting of suspicious values.
3. SPSS Regression - Logistic regression, ordinal regression, multinomial logistic
regression, and mixed models.
4. SPSS Advanced Models – Multivariate GLM and repeated measures ANOVA (removed
from base system in version 14).
5. SPSS Decision Trees. Creates classification and decision trees for identifying groups
and predicting behaviour.
6. SPSS Custom Tables. Allows user-defined control of output for reports.
7. SPSS Exact Tests. Allows statistical testing on small samples.
8. SPSS Categories.
9. SPSS Forecasting.
10. SPSS Conjoint.
11. SPSS Missing Values. Simple regression-based imputation.
12. SPSS Complex Samples (added in version 12). Adjusts for stratification and clustering
and other sample selection biases.
13. AMOS (Analysis of Moment Structures) – add-on which allows modeling of structural
equation and covariance structures, path analysis, and has the more basic capabilities
such as linear regression analysis, ANOVA and ANCOVA.

SPSS has produced several manuals to describe everything that their package of programmes
provides. These volumes run into more than three thousand pages of documentation, and
everything SPSS is able to do is described in them. For the experienced researcher in the areas
of social sciences, communication and journalism, ownership of the full set of manuals may be
worthwhile; otherwise one may use only those manuals which are required for the type of
research being conducted and the statistical tests to be used.

SUMMARY

Data analysis is undertaken to arrive at useful information so that final conclusions can be drawn and recommendations can be made. Data are analysed through the application of statistical procedures to reach specific conclusions. Charts, graphs and other pictorial presentations are used for depicting the data. Statistical principles have a very wide scope of application. There are, in fact, three main aims of statistics, i.e. to study the population, to study variation and to develop methods of reducing data. The researcher frequently uses frequencies, percentages and measures of central tendency, and should be aware of the advantages, disadvantages and limitations of expressing the data in frequencies, percentages and averages. Measures of dispersion are equally important to the researcher, as it is necessary to know the measures of variability, the degree of deviation from the central tendency or the scattering of items within the group. There are a number of methods of measuring dispersion, such as the standard deviation and the mean deviation, which we should be aware of. Measures of association between two or more phenomena have to be found at several places by using various types of correlation methods. Spearman's ranking method, developed by a psychologist, is a well-recognized method for establishing the rank correlation coefficient. Graphical methods have to be used by the researcher to make the presentation more effective and impressive. For data containing several variables without a dependency relationship, various multivariate techniques are used, one of which is factor analysis. Tests of significance are very important in communication research, since a sample drawn from a population may differ from the population parameters in certain characteristics. A number of tests, such as the 't' test, the 'F' test and the chi-square test, have to be used appropriately for this purpose. Software packages for data analysis, such as SPSS, can be used to save time and to make data analysis more convenient.

QUESTIONS

Q-1 Discuss the aims of data analysis and its importance in the interpretation of research findings.
Q-2 What are the measures of central tendency? In your view, what are the essentials of a good
average?
Q-3 Compare mean, median and mode as averages in terms of their advantages and
disadvantages.
Q-4 Explain measures of dispersion. How is standard deviation worked out?
Q-5 Describe measures of association. How is correlation computed?
Q-6 What do you mean by factor analysis? What are its methods?
Q-7 Discuss the functions of test of significance in communication research.
Q-8 Write short notes on the following tests
a) ‘t’ test
b) ‘chi-square’ test
c) ‘F’ test
Q-9 Discuss the utility of SPSS in analyzing data.
Q-10 Describe salient features of SPSS and basic steps in data analysis.
Q-11 What are the important modules available for graphing procedures?

KEY WORDS

Statistics : the calculus of observations.

Frequencies : counts of how often each value or category occurs in the data or population.

Percentage : a fraction or ratio with 100 understood as the denominator; it is a proportion or share in relation to a whole.

Fisher's t test : a form of the 't' test of statistical significance used in the analysis of tables, particularly suited to small sample sizes.

FURTHER READINGS

Arthur Asa Berger, (2000). Media and Communication Research Methods: An Introduction to
Qualitative and Quantitative Approaches. New Delhi: Sage.

Britha Mikkelesen, (2009). Methods for Development Work and Research, sixth printing. New
Delhi: Sage.

C.R. Kothari, (2008). Research Methodology: Methods and Techniques, second revised
edition. New Delhi: New Age International.

Chris Hart, (2010). Doing your Masters Dissertation, fifth printing. New Delhi: Vistaar
Publications, Sage.

Darren George & Paul Mallery, (2008). SPSS for Windows – Step by Step, eighth edition. New Delhi: Dorling Kindersley (India) Pvt. Ltd.

Fred N. Kerlinger, (2007). Foundations of Behavioural Research, tenth reprint. Delhi: Surjeet
Publication.

Gopal K Kanji, (2006). 100 Statistical Tests, third edition, South Asia. New Delhi: Vistaar
Publications, Sage.

John C. Reinard, (2006). Communication Research Statistics. New Delhi: Sage.

Kultar Singh, (2010). Quantitative Social Research Methods, fourth printing. New Delhi: Sage.

Makhanlal Chaturvedi National University of Journalism and Communication, (2010). Media Mimansa: Communication Research, Vol. 4, No. 1, July – September. Bhopal: Makhanlal Chaturvedi National University of Journalism and Communication.

Norusis, Marija, (2006). SPSS 15.0 Advanced Statistical Procedures Companion. Upper Saddle River, NJ: Prentice Hall.

Prof. S.R. Bajpai, (1960). Methods of Social Survey and Research. Kanpur: Kitab Ghar.

SPSS 15.0 Base User’s Guide (2006), Chicago, IL: SPSS Inc.


SPSS 14.0 Command Syntax Reference (2006), Chicago, IL: SPSS Inc.

Shri Ram Singh Chandel, (1964). A Handbook of Agricultural Statistics, second edition.
Kanpur: Achal Prakashan Mandir.

http://en.wikipedia.org/wiki/SPSS

http://www.iasri.res.in/iasriwebsite/DESIGNOFEXPAPPLICATION/Electronic-Book/Module%201/6SPSS-overview.pdf

Zina O’Leary, (2010). The Essential Guide to Doing Your Research Project, New Delhi:
Vistaar Publications, Sage.
