A Handbook of Communication Research
Chapter VII – Data Analysis and Interpretation
1. Use statistical techniques to analyse data systematically and scientifically while taking care of errors.
2. Explain statistical terms used in communication research.
3. Describe measures of central tendency, dispersion and association.
4. Apply test of significance in communication research.
5. Utilize software packages for Data Analysis.
Data analysis is a multifaceted process. The aim of data analysis is to arrive at useful information so that final conclusions can be drawn and recommendations made. In communication research, after the raw data have been coded and classified, they are ready for analysis and for testing the research hypothesis. Analysis means the categorising, ordering, manipulating and summarising of data. It reduces data to a comprehensible and interpretable form.
Interpretation is based on the data analysis. Interpretation takes the results of analysis, makes inferences relevant to the research problem under study and draws conclusions. Data are analysed through the application of statistical procedures to reach specific conclusions. A particular set of data can be interpreted in different ways, and researchers may use various statistical techniques to reach conclusions. Charts, graphs and other pictorial presentations are forms of depicting data; they present data in a form that the reader can grasp at a glance.
Statistical principles have a very wide scope of application. Knowledge of statistics is a must for a researcher; it is of great help in analysing data systematically and scientifically while taking care of significant errors.
AIMS OF STATISTICS
2. To study variation – It is the variation among individuals that leads to the study of the population. A communication researcher treats variation as an important and essential property of the population. In order to minimise its effect, (s)he carefully observes the samples and takes the average, which (s)he regards as an approach to the true value, free from the variation.
3. To study the methods of reducing the data – A researcher collects a lot of data which needs to be worked out to get at its essence. The objective of analysing data is to extract the relevant information and express it in a summarised and useful form.
i. FREQUENCIES
According to the Oxford dictionary, frequency is the rate at which something occurs over a particular period of time or in a given sample or population.
Frequencies are simply the numbers of times values occur in a population. From a particular population, representative samples are selected, and the data derived from them are tabulated in the form of a series. The number of times a value occurs in the series is its frequency, and the value repeated the maximum number of times has the highest frequency of the series.
In table 8.1, the families are taken as the individuals and the number of children in each family is listed. The occurrence of ‘2’ children in a family is repeated the maximum number of times: families A, C and E each have two children. Thus the value ‘2’ has the highest frequency.
Example 2 -
In table 8.2, we see that the marks obtained by the students from 0 to 30 are divided into a
class interval of 10. The highest frequency obtained here is 16 of the class (11-20) due to
maximum number of students scoring marks from 11 to 20.
ii. PERCENTAGES
Suppose that in a research study the total number of respondents is 131. Ages are coded as young (below 35 years), middle (36 – 45 years) and old (above 45 years). After classification it was found that 47 respondents were young, i.e. below 35 years old, 33 respondents were middle-aged (36 to 45 years old) and 51 respondents were old, i.e. above 45 years old.
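To report such a classification, each count is expressed as a percentage of the total. As a minimal sketch (not part of the original example), the figures above can be worked out in Python:

```python
# Percentage distribution of the 131 respondents by age group.
counts = {"Young (below 35)": 47, "Middle (36-45)": 33, "Old (above 45)": 51}
total = sum(counts.values())              # 131 respondents in all

for group, n in counts.items():
    percentage = 100 * n / total          # percentage = (part / whole) x 100
    print(f"{group}: {n} respondents = {percentage:.1f}%")
```

The young, middle and old groups thus form roughly 35.9%, 25.2% and 38.9% of the sample respectively.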
iii. MEASURES OF CENTRAL TENDENCY
To deal with data of a mass character, measures of central tendency are essential. They are also called averages. In communication research we do not deal with the particular characteristics of an individual, but with characteristics such as income, family size and age of the whole group. To do so we need to take averages.
A good average should possess the following characteristics –
a. Representative of the whole group: The average must be nearest to the largest number of units of the group in terms of measurement, and in terms of qualitative expression it must possess most of the qualities of the group.
b. Definite and clearly ascertained: As far as possible the average must be expressed in the form of a single number rather than a quality such as ‘a bureaucrat’ or ‘a village patriarch’. Averages of that kind are not amenable to statistical analysis.
c. Possess stability of value: Stability is possessed only when the average is not unduly influenced by the extreme-value items of the group. Otherwise the average will be significantly affected if a few items are added to or removed from the group.
d. Subjected to further mathematical analysis: For research purpose there
should be scope for further analysis to attain more accuracy of the average
drawn.
e. Absolute measurement, not a relative one: The average drawn should be absolute and complete, not relative. If an average is expressed as 10% higher or lower than some particular size, it becomes necessary to know that particular size before the average can be ascertained.
CLASSIFICATION OF AVERAGES
1. Averages of location
a. Mode
b. Median
2. Mathematical averages
a. Arithmetic average
b. Geometric mean
c. Harmonic mean
1. Averages of location
The averages in this method are identified by the location in the series. It is generally of two
types –
a. Mode
Mode is the most common type of average. The value of the item that has the highest frequency is referred to as the mode; that is, the item with the highest frequency in the series is located. It is the measurement or size that is directly applicable to the largest number of cases. Whenever we come across data detailing, say, the average age or the average height of an Indian, it is calculated using the mode method of averaging.
The calculation of the mode depends upon the frequencies; hence it is very easy and quick, but one has to locate carefully the value which occurs the maximum number of times in the series. Grouping of data is done in either a discrete or a continuous series, and the item value with the highest frequency is taken as the mode. The mode is frequently used whenever complete data are not available or the people using the average do not have technical knowledge. The mode is also helpful in finding averages of intangible things known as types, like diligence, intelligence, labour class, middle class etc.
Family : B C D E F G H I J K L
Media : 3 5 2 1 2 4 2 1 0 3 2
Table 8.4: Number of media available in twelve families
If we have to calculate the mode from the above data regarding the number of media available with each family, the numbers of media are arranged in the form of a discrete series. The modal number of media available is 2, as it has the highest frequency, i.e. 4 (families D, F, H and L).
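As a quick check, the same mode can be found with Python's standard `statistics` module; the list below reproduces the counts for families B – L as printed in table 8.4.

```python
import statistics
from collections import Counter

# Number of media available in families B..L (table 8.4 as printed).
media = [3, 5, 2, 1, 2, 4, 2, 1, 0, 3, 2]

print(statistics.mode(media))   # -> 2, the value with the highest frequency
print(Counter(media))           # frequency of every value; 2 occurs four times
```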
The above calculation of the mode was for single, ungrouped values. Now let us understand the calculation of the mode for groups having intervals with definite upper and lower limits.
From the following data regarding the per day allowance to TV producers of a company, find
out the average allowance by means of Mode.
Here the highest frequency is 25, so the modal group can easily be located as 200 – 250. But the mode should not be a group; it should be a single number. Therefore the actual position of the mode within the group is located by the process of interpolation. According to the frequencies before and after the modal group, two formulae are used. Let us understand the calculation of the mode using each formula.
The first formula is –
Z = l1 + [ (f1 – f0) / (2f1 – f0 – f2) ] × i
where
Z = mode
l1 = lower limit of the modal group
f1 = frequency of the modal group
f0 = frequency of the group preceding the modal group
f2 = frequency of the group following the modal group
i = class interval, which is generally the difference between the upper and the lower limits of the modal group.
In this example:
l1 = 200 is the lower limit of the modal group (200 – 250).
f1 = 25, f0 = 15 and f2 = 12 are the frequencies of the modal group and of the groups preceding and following it.
i = 50 is the class interval, i.e. the difference between the upper and lower limits of the modal group (250 – 200 = 50).
Therefore,
Z = 200 + [ (25 – 15) / (2 × 25 – 15 – 12) ] × 50
  = 200 + (10 / 23) × 50
  = 200 + 21.74
  = 221.74 (approx.)
The second formula is –
Z = l1 + [ f2 / (f0 + f2) ] × i
In this example:
l1 = 200 is the lower limit of the modal group (200 – 250).
f0 = 15 and f2 = 12 are the frequencies of the groups preceding and following the modal group.
i = 50 is the class interval, i.e. the difference between the upper and lower limits of the modal group (250 – 200 = 50).
Therefore,
Z = 200 + [ 12 / (15 + 12) ] × 50
  = 200 + 22
  = 222 (approx.)
These two formulae are generally used and give the same result in a perfectly symmetrical series. In an asymmetrical series the answers may differ somewhat, but both are standard formulae and give acceptably accurate results.
As we know, the location of the mode depends upon the frequencies of the groups, which in turn depend upon the size of the class interval. If the class interval is too small, there may be two or more groups with the highest frequency instead of one. Series having more than one modal group are known as bi-modal, tri-modal or multi-modal series. This can generally be corrected by increasing the size of the class interval.
ADVANTAGES OF MODE
It is:
i. Not necessary to know the size of all the units.
ii. Capable of being expressed and located graphically.
iii. Very easy to locate.
iv. Directly applicable to the largest number of items.
v. More stable in nature, as it is not affected by extraordinary measurements.
DISADVANTAGES OF MODE
b. Median
The measurement of the middle item, when the items are arranged in either ascending or descending order, is known as the median. To measure the average age of the male labourers of a factory, we can arrange the ages of all the male labourers in ascending order; the age in the middle will be the average age of the whole group.
The median is frequently used in measuring social phenomena like skill, productivity etc., where the effect of extreme items is to be eliminated. It is also useful in the case of abstract phenomena which cannot be measured mathematically.
EXAMPLE 1-
Student : A B C D E F G H I J
Scores : 9 0 3 4 6 3 6 7 8 2
To calculate the average score by means of the median from the data of the scores of 10 students in a class test, the following procedure is applied –
i. This is ungrouped data. Thus we first have to arrange the scores in order, ignoring the roll numbers or serial numbers given.
ii. Then we have to find the middle item. The formula for finding the middle item is (n + 1) / 2 in the case of ungrouped data, as given here, and n / 2 for a continuous series. Here n is the total number of items in the group.
iii. Using the formula we find the position of the middle item, whose value will be the median. If the middle position falls between two whole numbers, the average of the two corresponding values will be the median.
Let us calculate median for the above data using this procedure –
Position : 1 2 3 4 5 6 7 8 9 10
Scores : 0 2 3 3 4 6 6 7 8 9
Middle item = (n + 1) / 2
            = (10 + 1) / 2 = 5.5
As the 5th item is 4 and the 6th item is 6, the value of the 5.5th item is the mid-value of the two, i.e. (4 + 6) / 2 = 5. The median score is therefore 5.
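In Python the same result can be checked with the standard `statistics` module (a small sketch using the scores from the example):

```python
import statistics

scores = [9, 0, 3, 4, 6, 3, 6, 7, 8, 2]   # scores of students A..J
print(statistics.median(scores))          # -> 5.0, the mid-value of the 5th and 6th items
```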
EXAMPLE 2 -
Now let us understand the calculation of median for groups having intervals.
Measurement Frequency
0-5 5
5-10 7
10-15 10
15-20 18
20-25 20
25-30 12
30-35 8
35-40 6
40-45 4
45-50 1
Table 8.7: Continuous series to calculate median
The following procedure is adopted to calculate median of the above data-
i. The given frequencies must be converted into cumulative frequencies by taking progressive totals. For the first group the cumulative frequency will be 5, for the second it will be 5 + 7 = 12, for the third 12 + 10 = 22, for the fourth 22 + 18 = 40, and so on.
ii. Find the middle number (n / 2) in the cumulative frequencies, and thus locate the median group.
M = l1 + ( i / f ) x (m - c)
M = Median
l1 = Lower limit of the size of the median group.
i = Class interval of median group.
f = Frequency of median group.
m = middle number.
c = Cumulative frequency of the previous group.
Let us calculate median for the above data using this procedure –
The middle number is n / 2 = 91 / 2 = 45.5. It lies in the group whose cumulative frequency is 60, since 45.5 is more than the preceding cumulative frequency of 40 and not more than 60. The measurement of this group is 20 – 25; it is therefore the median group.
M = l1 + (i / f) × (m – c)
  = 20 + (5 / 20) × (45.5 – 40)
  = 20 + (5 / 20) × 5.5
  = 20 + 1.38
  = 21.38
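The interpolation formula M = l1 + (i / f) × (m – c) can also be written as a small function. The sketch below reuses the class intervals and frequencies of table 8.7; the function name is my own, not a library routine.

```python
def grouped_median(classes, freqs):
    """Median of a continuous series by interpolation: M = l1 + (i/f) * (m - c)."""
    n = sum(freqs)
    m = n / 2                          # middle number
    cum = 0                            # cumulative frequency of the previous group
    for (lower, upper), f in zip(classes, freqs):
        if cum + f >= m:               # median group found
            return lower + (upper - lower) / f * (m - cum)
        cum += f

classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25),
           (25, 30), (30, 35), (35, 40), (40, 45), (45, 50)]
freqs = [5, 7, 10, 18, 20, 12, 8, 6, 4, 1]
print(grouped_median(classes, freqs))   # -> 21.375, i.e. about 21.38
```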
ADVANTAGES OF MEDIAN
i. Only the value of the middle item and the number of items are sufficient to calculate the median. It is therefore not necessary to know the values of all the items.
DISADVANTAGES OF MEDIAN
2. Mathematical averages
This method of finding average is based on mathematical calculation. It is basically of
three types -
a. Arithmetic average
It is also called the arithmetic mean, and is often regarded as the best form of average; it is popularly known as the average par excellence. For all kinds of measurements where further analysis is required, the arithmetic average is used. It is derived by dividing the total of the measurements by the number of items. It does not depend on the frequencies like the median or the mode; it is based on the total value of all the items from first to last and is hence considered more representative. It is mostly used when all the items are equally important for consideration.
EXAMPLE 1 –
To know the average number of news stories filed in a month by 10 reporters of a newspaper house, we calculate the per capita average by means of the arithmetic average –
News given in a month – 50, 76, 44, 48, 57, 59, 63, 45, 48, 30.
The formula for the calculation of the arithmetic average from ungrouped data is as follows:
a = (x1 + x2 + x3 + ….. + xn) / n
or
a = ∑x / n
The symbols used in the formulae denote –
a = Arithmetic average or arithmetic mean
x1 = 50
x2 = 76
x3 = 44
x4 = 48
x5 = 57
x6 = 59
x7 = 63
x8 = 45
x9 = 48
x10 = 30
n = 10
The per capita average news given in a month will be –
a = (50 + 76 + 44 + 48 + 57 + 59 + 63 + 45 + 48 + 30) / 10
  = 520 / 10
  = 52
EXAMPLE 2 –
Now let us understand the calculation of arithmetic average for groups having intervals.
From the following data of heights of 100 boy students, calculation of arithmetic average can
be done using the procedure given below:
i. Find the mid-value (m) of each class interval.
ii. Multiply each mid-value by its respective frequency and divide the total (∑mf) by n (the total number of frequencies).
Symbolically –
a = ∑mf / n
a = ∑mf / n
= 6032 / 100
= 60.32
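Since the original table of heights is not reproduced here, the sketch below only illustrates the ∑mf / n rule; the mid-values and frequencies are hypothetical, not the book's height data.

```python
# Grouped arithmetic mean: a = sum(m * f) / n.
mid_values = [55, 58, 61, 64, 67]     # m: hypothetical mid-values of the class intervals
frequencies = [10, 22, 40, 20, 8]     # f: hypothetical number of boys in each interval

n = sum(frequencies)
a = sum(m * f for m, f in zip(mid_values, frequencies)) / n
print(a)   # -> 60.82 for these hypothetical figures
```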
i. One of the unique characteristic of arithmetic average is that the total of the
measurement can be known if the average and the number of items are known.
ii. Variation due to grouping is not seen in the arithmetic average as it is in the mode; it is definite and ascertainable.
iii. It considers all the items and is hence more representative.
iv. It is easily understood.
v. It is not necessary to know the size of each individual item or group; the arithmetic mean can be calculated if only the total of all the items and their number are known.
vi. Further mathematical analysis can be made in this method.
ii. The average derived is not directly applicable to any item of the group.
iii. It is not suitable for open-ended tables. To calculate the arithmetic average, the value of all the items separately, or at least the total value of all the items, must be known.
b. Geometric mean
g = ⁿ√(x1 × x2 × x3 × ….. × xn)
EXAMPLE –
The geometric mean of 18 and 8 is √(18 × 8) = √144 = 12.
In the case of more than two items we have to calculate the 3rd, 4th or a higher root, for which logarithm tables are used. The formula used for this purpose is –
g = antilog (∑ log x / n)
EXAMPLE 1 -
To calculate the average percentage increase in population over different censuses in a particular town, the following data are given:
The value given before the decimal point of a logarithm is called the characteristic, which can be positive or negative; for example, the logarithm of the 2010 item is 1.6021. A bar over a characteristic indicates a negative value. The value after the decimal point is known as the mantissa, which is always positive.
In this case the price of one commodity is doubled and that of the other is halved.
As such there is no change in the price level but the arithmetic average shows an
increase of 25% whereas geometric mean shows the real position.
ii. It is more representative, since all units of the group are taken into account during calculation.
iii. The unjustified importance of extreme items is eliminated without leaving them out to remove abnormality.
iv. It is open to further mathematical operations.
i. It is not as easily located as the median and the mode; rigorous calculation has to be performed on the data.
ii. It is necessary to know the size of all the units for computation.
iii. The computation of the geometric mean is quite difficult and requires good knowledge of the subject.
iv. The greatest limitation of the geometric mean is that it cannot be used if the size of any item is zero.
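In Python the geometric mean can be computed directly with `statistics.geometric_mean` (available from Python 3.8), which avoids manual work with logarithm tables; the two-item example above is used as a check.

```python
import statistics

print(round(statistics.geometric_mean([18, 8]), 3))       # -> 12.0, as in the example
# For more items the same call takes the n-th root of the product:
print(round(statistics.geometric_mean([4, 9, 16]), 3))    # cube root of 576, about 8.32
```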
c. Harmonic mean
The harmonic mean is used to find the average of times taken by individuals or things, e.g. to find the average time taken in producing a television programme when workers with different output standards are engaged. The number derived by dividing the number of items by the sum of the reciprocals of the item values of a series is called the harmonic mean. The reciprocal of a number is the value which, when multiplied by that number, yields the product 1; for example, the reciprocal of 2 is 1/2, of 3 is 1/3, and so on.
The harmonic mean is always lower than the arithmetic average or the geometric mean because it gives the least possible weight to large item values. Since it is derived by mathematical calculation it can be subjected to further algebraic analysis. It is very representative, as it considers all the items of the group and does not depend on only a few of them.
Another easy method of calculating the harmonic mean is by using reciprocal tables. The formula then would be –
h = Reciprocal of (∑ Reciprocals / n)
The following process is used for calculating the harmonic mean by means of reciprocals –
i. Find the reciprocal of each measurement from the reciprocal tables.
ii. Add up the reciprocals and divide the total by the number of items.
iii. Find the reciprocal of the quotient from the reciprocal table. The number thus arrived at is the harmonic mean.
If A, B and C are three workers engaged in producing an advertisement, and A takes 30 minutes, B takes 20 minutes and C takes 40 minutes to make one single advertisement, find the average time taken by each worker in producing an advertisement.
h = 3 / (1/30 + 1/20 + 1/40)
  = 3 / [ (4 + 6 + 3) / 120 ]
  = 3 / (13 / 120)
  = 3 × 120 / 13
  = 27.69 minutes (approx.)
h = Reciprocal of (∑ Reciprocals / n)
Rec 30 = 1/30 = 0.0333, Rec 20 = 1/20 = 0.0500 and Rec 40 = 1/40 = 0.0250
h = Reciprocal of (0.1083 / 3) = Reciprocal of 0.0361 = 27.7 minutes (approx.)
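The same calculation in Python, using `statistics.harmonic_mean` (a sketch with the times from the example):

```python
import statistics

times = [30, 20, 40]                    # minutes taken by workers A, B and C
h = statistics.harmonic_mean(times)     # 3 / (1/30 + 1/20 + 1/40)
print(round(h, 2))                      # -> 27.69 minutes per advertisement
```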
Any type of mass study is possible using averages, or measures of central tendency. They form the most fundamental basis upon which the entire structure of statistical analysis is built. The following are the advantages of using averages in statistical analysis –
i. The idea of average classifies the whole mass in to comparatively fewer types and
this makes further analysis possible.
ii. In communication research, each unit differs from the other and no two cases are
exactly alike. In such a case any type of study becomes difficult. Thus, average is
most suitable for such complex phenomena.
iii. Individual items of a group change too frequently, but the averages taken from them are comparatively more stable. This stability of averages makes statistical study possible; for example, the TRP of a channel changes week by week, but the average TRP does not change so often.
iv. Average is used in all type of mass analysis. It is the basis of statistical analysis as
from averages further statistical analysis like correlation, dispersion, skewness or
the like are made.
v. With the help of averages we can form ideal types like the daily soap opera, the mythological programme etc. Thus, even if we do not have full information about a programme but know that it is a mythological programme, we can easily assume its qualities, as mythological serials have a similar format of presentation.
vi. It gives validity to generalisations. Statistical generalisations are not only drawn from averages but are also applicable to averages. The rate of population growth is not valid for every single family individually; it is valid for the average growth rate of the whole country.
vii. The average makes the whole idea clear in one figure or a few words only. The expression ‘average student’ can be given either in figures, as below 60%, or in words, as C grade.
LIMITATIONS OF AVERAGE
Although averages are widely used in all communication research, they still have the following limitations –
ii. Averages can be derived in different ways, as the mode, median, geometric mean, harmonic mean etc. Each type has different characteristics and gives a different measurement; thus an average computed for one purpose cannot be used for another, and this creates confusion.
iii. An average gives only a theoretical idea about the measurement or quality of a group. In practical life it is not of much use, as one cannot plan one’s budget on the basis of average per capita income or average cost of living; that will depend upon one’s requirements, family size, consumption pattern etc.
iv. Absurd results are sometimes drawn from averages which are not practically possible, e.g. an average of 2.2 children per family, which no actual family can have.
Indeed, averages have a lot of limitations, but they are still widely used in communication research, since communication research is based not on the study of individuals but on mass behaviour, mass character and mass phenomena, which are best represented by averages. Thus averages have great significance in statistical analysis, and the limitations persist only because of unscientific computation or irrational use.
iv. MEASURES OF DISPERSION
The statistical measures that compute the scatteredness of the items within a group, or their tendency of deviation from the average, are called measures of dispersion, or measures of variability in statistical terminology. The average gives a measure of central tendency but does not show how the individual items are spread around it, as the following example illustrates.
Reporter        Mon  Tue  Wed  Thu  Fri  Sat  Total  Average
Reporter 1 5 5 5 5 5 5 30 5
Reporter 2 4 4 5 5 6 6 30 5
Reporter 3 2 3 5 6 6 8 30 5
Table 8.14: Number of news – stories submitted by three reporters in weekdays
We can see that the total number of news stories submitted by each reporter over the week is the same and that the average number per day is also equal for all three. But we cannot say that they are all equal in all respects; there is a difference in their regularity and certainty. Reporter 1 regularly submits 5 stories a day, Reporter 2 is slightly irregular, and Reporter 3 is extremely erratic and uncertain in submitting news stories. Thus, measures of dispersion tell us the degree of deviation from the central tendency. For a stable measurement capable of being used in future, we have to find the average of the deviations of all the units. That is why dispersion is also known as an average of the second order.
MEASUREMENT OF DISPERSION
1. Method of limits –
a. Range
b. Inter – quartile range
c. Semi – inter quartile range
2. Method of deviation –
a. Mean deviation
b. Standard deviation
1. Method of limits
This method of finding the dispersion is based on the limits of a group or a class. It can be
derived in following ways:
a. Range
It is the easiest method of finding the measures of variability. The range signifies the
difference between the highest and the lowest measurement. If we want to find out the
degree of dispersion in the yearly circulation of a newspaper, we have to find the highest
and the lowest limits and the difference between the two would be the range of dispersion.
The range can be absolute or relative. When we have to compare the ranges of two groups, we find the relative measure of range, also known as the coefficient of range: we first find the absolute range of each group and then its coefficient.
Absolute Range = m1 – m0
Coeff. of Range = (m1 – m0) / (m1 + m0)
m1 = Highest measurement
m0 = Lowest measurement
EXAMPLE 1-
The following data relate to the number of advertisements aired during prime time (8 pm to 10 pm) on two channels in a week. Let us find out whose rate of advertisement is more variable through the method of range.
We find that the lowest and highest limits of channel A are 18 and 26, and the lowest and highest limits of channel B are 15 and 30 respectively.
Absolute Range = m1 – m0
Channel A: 26 – 18 = 8          Channel B: 30 – 15 = 15
The coefficient of range, or relative measure of range, of each channel is then derived using the formula –
Coeff. of Range = (m1 – m0) / (m1 + m0)
Channel A: 8 / 44 = 0.18          Channel B: 15 / 45 = 0.33
Since channel B has the higher coefficient of range, its rate of advertisement is more variable.
The range method of finding dispersion is very simple, but it has the major drawback of being affected by extreme items. If the income of the people in a group ranges from Rs. 1.5 lac to Rs. 2.5 lacs, the inclusion of even one rich person with a very high income will increase the range abnormally. To avoid such errors the inter-quartile range is used. It is the distance from one quartile to the other: 25% of the items are eliminated from each end and only the range of the middle 50% is considered, on the assumption that when 25% are left out at both ends the extraordinary items are removed and only normal items remain.
I.Q.R = Q3 – Q1
EXAMPLE 1-
From the following figures relating to the payment on per day basis to the 10 stringers of a
news agency, find out inter – quartile range and its coefficient.
150, 350, 120, 100, 270, 400, 180, 420, 240, 210
No. A B C D E F G H I J
Rs. 100 120 150 180 210 240 270 350 400 420
Q1 lies at the (n + 1) / 4 = 2.75th item of the series. The 2.75th item lies between the 2nd item (120) and the 3rd item (150), so 0.75 of the difference between 150 and 120 is added to 120.
Q1 = 120 + 0.75 × 30 = 142.5
Q3 lies at the 3(n + 1) / 4 = 8.25th item of the series. The 8.25th item lies between the 8th item (350) and the 9th item (400), so 0.25 of the difference between 400 and 350 is added to 350.
Q3 = 350 + 0.25 × 50 = 362.5
Therefore,
I.Q.R = Q3 – Q1 = 362.5 – 142.5 = 220
Coefficient of I.Q.R = (Q3 – Q1) / (Q3 + Q1) = 220 / 505 = 0.44 (approx.)
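A rough Python sketch of the same quartile positions, using the (n + 1) / 4 convention followed in the example; the helper function is my own, not a library routine.

```python
def quartile(sorted_values, q):
    """q-th quartile at position (n + 1) * q / 4 with linear interpolation."""
    pos = (len(sorted_values) + 1) * q / 4       # e.g. 2.75 for Q1, 8.25 for Q3
    lower = int(pos)                             # whole item just below the position
    fraction = pos - lower
    below = sorted_values[lower - 1]             # the list is 0-indexed
    above = sorted_values[lower]
    return below + fraction * (above - below)

payments = sorted([150, 350, 120, 100, 270, 400, 180, 420, 240, 210])
q1, q3 = quartile(payments, 1), quartile(payments, 3)
print(q1, q3, q3 - q1)   # -> 142.5 362.5 220.0
```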
EXAMPLE 2 –
From the following data regarding the wages of side artist in the films, calculate the inter
quartile range.
For quartile 1, l1 = 200 is the lower limit of the quartile group (taken as the mid-value of the adjoining class marks).
Q1 = 200 + (50 / 20) × 8
   = 200 + 400 / 20
   = 200 + 20
   = 220
For quartile 3, l1 = 250 is the lower limit of the quartile group (taken as the mid-value of the class marks 200 and 300).
Q3 = 250 + (50 / 35) × 28
   = 250 + 1400 / 35
   = 250 + 40
   = 290
I.Q.R = Q3 – Q1 = 290 – 220 = 70
The inter-quartile range is taken to be more representative than the range, but it too has a weakness. A more valid measurement of the degree of dispersion is thought to be the average distance between the median and the two quartiles, rather than the absolute distance from one quartile to the other. Therefore another measure, known as the semi-inter-quartile range, is used, in which half of the inter-quartile range is taken. It is also called the quartile deviation.
The semi-inter-quartile range, or quartile deviation, can be expressed as under:
Quartile Deviation = (Q3 – Q1) / 2
The coefficient of quartile deviation is the same as the coefficient of the inter-quartile range. The method of calculating the quartile deviation is the same as for the inter-quartile range; the only difference is that the result is divided by 2.
2. Method of deviation
Deviation is a measure of the difference between an observed value and the mean. It can be positive or negative: the sign of the deviation reports the direction of the difference, positive when the observed value is larger than the mean and negative when it is smaller, while the magnitude of the deviation indicates the size of the difference.
Deviations from the population mean are called errors, while deviations from the sample mean are called residuals. The sum of the deviations of an entire set of observations from their mean is always zero, and hence the average deviation is zero.
a. Mean deviation
The arithmetic average of the deviations of each item value from an average is known as the mean deviation. It is not a method of limits: it takes into consideration every single measurement and not just some crucial points. Thus it is a method of average deviation.
Any average (mean, mode or median) can be selected for computing the deviations. Generally the median is considered most suitable, because the mean deviation from the median is the least. The mean deviation is symbolically expressed by the Greek letter delta (δ).
For calculating the coefficient of mean deviation, the mean deviation is divided by the average (mean, median or mode), whichever has been used for the purpose of calculating the deviations.
The following procedure is followed:
i. First step is to calculate mean, median, mode or arithmetic average and deviation
from it. Median should be given preference in comparison to other forms of
averages.
ii. Secondly, the difference of each item from the average has to be calculated, ignoring the plus or minus sign.
iii. Then, multiply each deviation by its respective frequency (fd).
iv. The products of the deviations and their respective frequencies (fd) should be added to get ∑fd.
v. The total ∑fd should be divided by the total number of items: δ = ∑fd / n.
vi. In case of continuous series where class intervals are given then, the mid – value of
the group should be taken as measurement.
EXAMPLE 1 –
From the following data relating to the number of special correspondents in 5 daily newspaper houses, calculate the mean deviation and its coefficient.
3, 4, 7, 2, 6
Arranged in order, the series is 2, 3, 4, 6, 7, so the median = 4.
Mean deviation (δ) = (2 + 1 + 0 + 2 + 3) / 5 = 8 / 5 = 1.60
Coefficient of mean deviation = 1.60 / 4 = 0.40
EXAMPLE 2 –
From the following series calculate the mean deviation and its coefficient.
Size        Frequency
0 – 5           5
5 – 10          8
10 – 15        10
15 – 20         7
20 – 25         6
25 – 30         4

Let us first arrange the continuous series –

Measurement   Mid-value (m)   Frequency (f)   Cumulative frequency (cf)   Deviation from median 13.5 (d)   Freq. × Dev. (fd)
0 – 5               2.5             5                 5                            11                           55
5 – 10              7.5             8                13                             6                           48
10 – 15            12.5            10                23                             1                           10
15 – 20            17.5             7                30                             4                           28
20 – 25            22.5             6                36                             9                           54
25 – 30            27.5             4                40                            14                           56
Total                                                                                                          251
Table 8.19: Arrangement of data for finding different values
Middle number (m) = n / 2. The total cumulative frequency is 40, so n = 40 and m = 40 / 2 = 20. The median group is 10 – 15, whose cumulative frequency is 23.
M = 10 + (5 / 10) × (20 – 13)
  = 10 + 35 / 10
  = 10 + 3.5 = 13.5
Mean deviation (δ) = ∑fd / n. Here ∑fd = 251, the total of the products of frequency and deviation, and n = 40.
δ = 251 / 40 = 6.275
Coefficient of mean deviation = 6.275 / 13.5
                              = 0.46
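As a sketch, the grouped mean deviation of table 8.19 can be reproduced in a few lines of Python; the mid-values and frequencies are taken from the table and the median 13.5 from the interpolation above.

```python
mid_values = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5]   # m: mid-values of the classes
freqs = [5, 8, 10, 7, 6, 4]                       # f: frequencies
median = 13.5                                     # from the interpolation above

n = sum(freqs)
sum_fd = sum(f * abs(m - median) for m, f in zip(mid_values, freqs))
mean_dev = sum_fd / n
print(sum_fd, mean_dev, round(mean_dev / median, 2))   # -> 251.0 6.275 0.46
```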
b. Standard deviation
The major limitation of the mean deviation is that it ignores the plus and minus signs in the calculation of the deviations, which limits further analysis. Although it is mathematically more accurate than the methods of limits, another method, known as the standard deviation, is used to remove this limitation. In this method the plus and minus signs are taken into consideration, but the deviations are squared, which eliminates the signs. The total of the squared deviations is then divided by the total number of items, and the square root of the quotient is the standard deviation.
σ = √(∑fd² / n)
Coefficient of S.D. = σ / a. Here a is calculated using the basic formula of the arithmetic average:
a = (x1 + x2 + x3 + ….. + xn) / n
or
a = ∑mf / n
d = Deviation from the arithmetic average. Standard deviation is almost always calculated from
arithmetic average.
a = Arithmetic average.
EXAMPLE 1 –
Calculate the standard deviation and its coefficient of the following series.
Measurement   Frequency
1 – 3              4
3 – 5              6
5 – 7              9
7 – 9             12
9 – 11             5
11 – 13            4

Let us find the following values and arrange them in tabular form –

Measurement   Mid-value (m)   Freq. (f)   Product m × f (mf)   Deviation from a i.e. 7 (d)   Square of d (d²)   Product f × d²
1 – 3               2              4                8                     –5                       25                100
3 – 5               4              6               24                     –3                        9                 54
5 – 7               6              9               54                     –1                        1                  9
7 – 9               8             12               96                     +1                        1                 12
9 – 11             10              5               50                     +3                        9                 45
11 – 13            12              4               48                     +5                       25                100
Total                            40         ∑mf = 280                                              70          ∑fd² = 320
Table 8.20: Arrangement of data for finding different values for computing standard deviation and its coefficient
a = ∑mf / n
a = 280 / 40 = 7
σ = √(∑fd² / n)
  = √(320 / 40)
  = √8
  = 2.83
Coefficient of S.D. = σ / a
= 2.83 / 7
= 0.404
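The same grouped standard deviation can be checked with a short Python sketch, reusing the mid-values and frequencies of table 8.20.

```python
import math

mid_values = [2, 4, 6, 8, 10, 12]
freqs = [4, 6, 9, 12, 5, 4]

n = sum(freqs)
a = sum(m * f for m, f in zip(mid_values, freqs)) / n                       # arithmetic average
sigma = math.sqrt(sum(f * (m - a) ** 2 for m, f in zip(mid_values, freqs)) / n)
print(a, round(sigma, 2), round(sigma / a, 3))   # -> 7.0 2.83 0.404
```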
v. MEASURES OF ASSOCIATION
There may or may not be an association between two or more phenomena, groups, classes or series of data. A good book becomes the bestseller of the year; as the demand for entertainment serials increases, the number produced also increases. An increase in coal production in a state, however, may not be related to the population growth of the state. Thus an increase in one phenomenon may bring a similar or opposite change in another phenomenon, or may have no relationship with it at all.
It does not matter whether the data in one series change in the same or the reverse direction as the other; there need only be a causal connection between the two. In both cases correlation exists, but it is termed positive or negative according to the direction of change. Correlation is said to be positive if both phenomena change in the same direction; this is also termed direct correlation. For example, an increase in light leads to an increase in heat, the height of a boy increases with his age, and the higher the circulation of a newspaper, the more will be its revenue. Likewise, if the two phenomena change in opposite directions, the correlation is said to be negative, inverse or indirect. For example, an increase in medical facilities reduces the death rate, and an increase in the number of children decreases the standard of living as the money available per head is reduced; similarly, increasing use of statistical data in a write-up may decrease its readership.
If a change in one phenomenon brings a change in the other in a fixed proportion, the correlation between them is said to be perfect. Numerically it is indicated as +1 or –1. A correlation of +1 is called perfect positive correlation, where both increase or decrease together in a fixed proportion. A correlation of –1 is called perfect negative correlation, where they change in opposite directions in a fixed proportion.
Positive (i) and Negative (ii) Correlation have already been discussed above.
When the ratio of change in the two variables is constant, the various values, when plotted on graph paper, form a straight line, and the correlation is said to be linear. Such correlation is generally found in the physical sciences, where the increase in one phenomenon is directly proportional to the other, e.g. a 5% increase in height accompanied by a 7% increase in weight. The relationship between the two is then said to be a linear correlation.
When the ratio of change between the two variables varies in some parts of the series, the plotted values form a curve and the correlation is said to be curvilinear. In communication research the correlation is never so perfect, so the relationship is generally curvilinear. TV viewership and TV programme quality, for example, are related to each other, but the relationship does not follow a straight line.
After establishing correlation between two variables, it is to be seen as to what is the exact
degree in which the two are correlated. There can be three ways of correlation:
i. The two series may be highly correlated, so that change in one may be accompanied
by an equal change in other.
ii. There may be only a slight correlation between the two series; a change in one series
may not be accompanied by a change in other series.
iii. Sometimes no change is seen even in opposite direction between the two variables.
But, still relationship exists.
Hence, it is required to measure the degree of correlation. Following are the major methods of
ascertaining the degree of correlation:
I. Mathematical methods –
r = ∑dx dy / (n × σx × σy)
r = coefficient of correlation
dx, dy = deviations of the items of the x and y series from their respective arithmetic averages
dx dy = product of dx and dy
σx, σy = standard deviations of the x and y series
n = number of pairs.
In this formula the deviations have to be calculated from the actual arithmetic averages, which involves a long process. To avoid it, a short-cut formula is used in which the deviations are calculated from an assumed average. If the standard deviations are not to be calculated separately, the formula becomes –
r = [ ∑dx dy – n (∑dx / n)(∑dy / n) ] / [ n × √(∑dx² / n – (∑dx / n)²) × √(∑dy² / n – (∑dy / n)²) ]
EXAMPLE -
The following are the numbers of adult and child viewers of TV programmes in a month. Find out whether there is any correlation between the two, i.e. between the viewing of TV programmes by adults and by children.
Adult viewers: 63 64 65 67 68 69 70 71 71
Children viewers: 65 63 63 65 67 68 71 68 69
∑dx = –3 is the sum of the deviations of the item values of the x series from its assumed average.
∑dy = –4 is the sum of the deviations of the item values of the y series from its assumed average.
∑dx dy = 58 is the total of the products of the corresponding deviations of the two series.
n = 10 is the number of pairs.
∑dx² = 75 is the total of the squares of the deviations of the item values of the x series from its assumed average.
∑dy² = 62 is the total of the squares of the deviations of the item values of the y series from its assumed average.
r = [ 58 – 10 × (–3 / 10) × (–4 / 10) ] / [ 10 × √7.41 × √6.04 ]
  = (58 – 10 × 0.12) / (10 × 2.72 × 2.46)
  = (58 – 1.2) / 66.9
  = 56.8 / 66.9
  = 0.85
Hence, the correlation between the adult and child viewers of TV programmes is 0.85, which means that there is a high degree of positive correlation between adult and child viewers of TV programmes.
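If SciPy is available, Pearson's r can be computed directly. The sketch below uses the nine pairs printed above (the tenth pair of the original example is not reproduced in the text), so the value it prints will differ somewhat from the 0.85 of the worked example.

```python
from scipy.stats import pearsonr

adults = [63, 64, 65, 67, 68, 69, 70, 71, 71]
children = [65, 63, 63, 65, 67, 68, 71, 68, 69]

r, p_value = pearsonr(adults, children)
print(round(r, 2))   # a high positive correlation between the two series
```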
In order to understand the significance of correlation probable error is also calculated. Formula
given below is used for calculating probable errors:
P.E. = 0.6745 × (1 – r²) / √n
The symbols used in the formula denotes-
P.E = Probable error
r2 = Square of coefficient of correlation
P.E. = 0.6745 × (1 – r²) / √n
     = 0.6745 × (1 – 0.7225) / √10
     = (0.6745 × 0.2775) / 3.16
     = 0.06
Since the correlation (r) [0.85] is more than six times the probable error [6 × 0.06 = 0.36], the correlation is significant.
i. With the use of the sign (+) or (–) it is stated whether the correlation is positive or negative.
ii. (r) always lies between +1 and –1; it can never be more than +1 in any case. +1 is said to be perfect positive correlation and –1 perfect negative correlation. A value of 0 occurs when there is a complete absence of correlation, a condition not found in communication research studies.
iii. Karl Pearson’s method takes into consideration not only the direction of change of
different pairs, but also the degree of such variation. Therefore, it is a more exact
measurement than others.
In the concurrent deviation method the degree of change is not considered; only the direction of change is taken into account. In this method two series are said to be concurrent if an increase in one is followed by an increase in the other and vice versa. Thus, only if increases and decreases occur simultaneously in the two series is there said to be correlation, even if they change in varying degrees.
r = ± √( ± (2c – n) / n )
n is the number of pairs of deviations. This is one less than the actual number of pairs of observations, as one pair is eliminated in computing the deviations.
ii. Mark the deviation of every item from its previous item; the first item is thus eliminated. If the second item is more than the first, a (+) sign is put against it, and if it is less, a (–) sign. Then the third item is compared with the second, the fourth with the third, and so on. The deviation signs of the two series are denoted dx and dy respectively.
iii. After deriving dx and dy, dx is multiplied by dy using the simple rule of signs, i.e. (+) × (+) = (+), (–) × (–) = (+) and (+) × (–) = (–).
iv. The number of pairs of concurrent deviations is then counted; these are the pairs which have moved in the same direction. This is done by counting the (+) signs in the dx dy column, and the count is denoted c.
v. The ± signs are placed both inside and outside the root sign because if the value of 2c is less than n, the value of (2c – n) / n will be negative, and the square root of a negative value cannot be taken. By placing a (–) sign before (2c – n) the minus value is converted into plus and the square root is taken; after the root has been calculated, the (–) sign outside is used to reconvert the result into minus. Only one sign is therefore used at a time. If the value of (2c – n) is negative the formula would be –
r = – √( – (2c – n) / n )
EXAMPLE –
The following data relate to the money spent on TV news services and the viewer rates per thousand for a whole year on a monthly basis. Find, by the method of concurrent deviation, whether the two variables are correlated.
r = + √( + (2 × 8 – 11) / 11 )
  = √0.4545
  = 0.67
This means that a moderate positive correlation exists between money spent on TV news services and viewer rates per thousand.
iii. Where there exists a causal relationship between the two series so that one is the
cause and the other, effect.
The formula used for calculating the coefficient of correlation by method of rank differences
is –
r = 1 – [ 6 (∑d²) ] / [ n (n² – 1) ]
or r = 1 – [ 6 (∑d²) ] / (n³ – n)
n = Number of items.
i. First, the rank of each individual item value is found. Items can be ranked either from the highest or from the lowest value. The items need not be rearranged; only the number of the rank is written against them.
ii. Then the difference between ranks of x and y series is drawn. It is nominated as d.
iii. The square of the difference (d2) is made and then the formula of rank differences is
applied.
EXAMPLE –
Ten competitors in the T.V Anchor Hunt contest are ranked by the two judges as follows:-
I - Judge –1 6 5 10 3 2 4 9 7 8.
II – Judge –6 4 9 8 1 2 3 10 5 7.
Find out by the method of rank difference how far the opinions of the two judges are similar.
Here ∑d² = 60, so
r = 1 – (6 × 60) / [ 10 (10² – 1) ]
  = 1 – (6 × 60) / [ 10 (100 – 1) ]
  = 1 – 360 / 990
  = 1 – 0.36
  = 0.64
The opinions of the two judges are fairly similar. Sometimes two or more items have equal values. They have to be given equal ranks, which is the average of the ranks they would have had if they were unequal; thus, if after the 5th rank two items have equal values, their rank will be 6.5 in each case. Because of these common ranks the coefficient of correlation has to be corrected as follows.
Here t stands for the number of times an item value is repeated; thus, if measurement x is repeated 3 times, the value of t would be 3. One correction factor of (t³ – t) / 12 is added to ∑d² for each such repeated value.
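With SciPy, the rank-difference (Spearman) coefficient, including the tie correction, is available as `spearmanr`. A sketch with the two judges' rankings from the example:

```python
from scipy.stats import spearmanr

judge1 = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
judge2 = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

rho, p_value = spearmanr(judge1, judge2)
print(round(rho, 2))   # -> 0.64, agreeing with the hand calculation above
```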
Apart from the mathematical methods of measuring correlation, graphical methods are also used. They do not give an exact degree of measurement but certainly give an idea of the correlation by observation. These methods are easy to use and can be followed by a lay person also. The following is a type of graphical method commonly used.
a. Simple graph
In simple graph the values of two series are plotted on a graph paper. The various points are
joined by straight lines. The trend of two lines is now studied. If they move in same direction
with simultaneous peaks and troughs, the two series are said to be correlated. If no such
tendency is noticeable they are unrelated or independent of each other. If the peak in one line is
coupled with trough in the other line and vice-versa, the two variables are negatively
correlated.
EXAMPLE –
The following data regarding the marks of 10 students in two subjects, Reporting and Editing, are shown by means of a graph to see whether they are correlated.
Student –A B C D E F G H I J
Marks in Reporting – 20 25 22 27 20 30 28 32 45 33
Marks in Editing - 15 22 19 28 18 26 28 30 35 32
[Line graph: the Reporting marks (Series 1) and Editing marks (Series 2) of students A – J plotted on the same axes, with marks (0 – 50) on the vertical axis and students on the horizontal axis.]
Table 8.24: Graphical method for analysing correlation between two series
Bi-variate analysis is one of the simplest forms of variable analysis. As the name suggests, it involves the analysis of two variables to determine the empirical relationship between them. To understand the relationship it is necessary to measure how the two variables change together. It makes it easy to test simple hypotheses of association and causality, and it becomes convenient to predict the value of the dependent variable when the value of the independent variable is known.
The distinguishing factor between bi-variate and uni-variate analysis is that in uni-variate analysis only one variable is analysed, largely for descriptive purposes, whereas bi-variate analysis goes beyond description and also analyses the relationship between the two variables. Bi-variate analysis is simple, i.e. it deals with the relationship between two variables; in multi-variate analysis, multiple relationships between multiple variables are examined simultaneously.
As the name suggests, the analysis of more than two variables of a sample is known as multi-variate analysis (MVA). It includes a set of techniques for analysing data having multiple observations on each sample. To make a comparative study of two channels on the factors influencing their rapid growth and popularity, various factors like picture quality, format of programmes, presentation style, marketing strategy etc. would be compared. Thus multiple variables are studied simultaneously on the two television channels taken as samples; hence it is regarded as multi-variate analysis.
The techniques of multivariate analysis are well suited to analysing data represented by many variables. Many of these techniques have been fully developed only recently because of their dependence on the computational capabilities of modern computers. The major advantage of multivariate techniques is that they take into account the correlation or inter-dependence among the variables, which leads to a more correct interpretation of the results.
Different types of variables are used in multi-variate analysis. The prominent ones are described as follows:
a. Explanatory variable and criterion variable – The explanatory variable is the one which is considered the cause of the other variable. Suppose X is a variable considered the cause of Y; then X is the explanatory, causal or independent variable and Y is the criterion, resultant or dependent variable. There can be sets of many such variables, denoted (X1, X2, X3 … Xp) for the explanatory variables and (Y1, Y2, Y3 …. Yq) for the criterion variables. The term external criterion is used for the explanatory variable and internal criterion for the criterion variable.
c. Discrete variable and continuous variable – A variable which, when measured, can take only integer values, i.e. whole numbers and not fractions or decimals, is called a discrete variable. A variable which can take any real value, not only a whole number but also a fraction or a decimal, is called a continuous variable.
d. Pseudo variable or dummy variable – If only one of the Xi (i = 1, ….., m) is 1 and all the rest are zero, then such a variable is called a pseudo (dummy) variable. It is used in a technical sense and is useful in the algebraic manipulations applied in multi-variate techniques.
There are many multivariate techniques, classified on the basis of a few questions: are the variables dependent on or independent of others, how many variables are dependent, and are the data qualitative or quantitative? Depending upon these questions the multivariate techniques are broadly categorised into two groups:
z’y = β1 z1 + β2 z2 + …..+βk zk
where z’y stands for the predicted value of the standardized Y score, zy. The expression β1 z1 + β2 z2 + ….. + βk zk is a linear combination of the explanatory variables. The constant A is eliminated in the process of converting the X’s to z’s. In this equation the least-squares method is used to estimate the beta weights in such a way that the sum of the squared prediction errors is kept as small as possible, i.e. the expression Σ (zy – z’y)² is minimised. A researcher may sometimes use step-wise regression techniques to get a better idea of the independent contribution of each explanatory variable.
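A minimal sketch of fitting such a standardized regression by least squares with NumPy; the data below are invented purely for illustration, and the beta weights are obtained from `numpy.linalg.lstsq`, which minimises Σ (zy – z'y)².

```python
import numpy as np

# Hypothetical raw data: two explanatory variables (columns of X) and one criterion variable y.
X = np.array([[2.0, 7.0], [4.0, 6.0], [6.0, 9.0], [8.0, 12.0], [10.0, 14.0]])
y = np.array([10.0, 14.0, 19.0, 25.0, 30.0])

# Convert to z-scores; standardising removes the constant term A from the equation.
zX = (X - X.mean(axis=0)) / X.std(axis=0)
zy = (y - y.mean()) / y.std()

betas, *_ = np.linalg.lstsq(zX, zy, rcond=None)   # least-squares beta weights
print(betas)                                      # standardized regression (beta) weights
```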
h. FACTOR ANALYSIS
As discussed above, for data containing several variables without a dependency relationship, various multi-variate techniques are used, one of which is factor analysis. Factor analysis is used where quantitative (metric) inputs are given; it is not suitable for qualitative (non-metric) inputs. It is not a single method of analysis but a set of techniques, and therefore the results of the various methods will not necessarily be the same.
A. Centroid method
B. Principal components method
C. Maximum likelihood method
FACTOR: A factor is an underlying dimension that accounts for several observed variables. There can be one or more factors, depending upon the nature of the study and the number of variables involved in it.
FACTOR LOADING: The values that explain how closely the variables are related to each of the factors discovered are termed factor loadings; they are thus also known as factor–variable correlations. The absolute size of a loading helps in understanding the meaning of a particular factor and in interpreting it.
COMMUNALITY (h²): This reflects how much of each variable is accounted for by the factors taken together. The higher the value of h², the less of the variable is left over after whatever the factors represent has been taken into consideration. The formula for h² is:
h² of the ith variable = (ith factor loading on factor A)² + (ith factor loading on factor B)² + …….
EIGENVALUE: This term is also known as the latent root. It is the sum of the squares of the factor loadings relating to a factor. Its main function is to indicate the relative importance of each factor in accounting for the particular set of variables being analysed.
TOTAL SUM OF SQUARES: It is the total of the eigenvalues of all the factors. When the total sum of squares is divided by the number of variables involved in the study, it gives an index of how fully the particular solution accounts for what all the variables taken together represent. If the index is low, the variables are very different from each other; the index approaches unity if the variables fall into one or more highly redundant groups, or if the extracted factors account for all the groups.
Factor analysis is mainly used for developing psychological tests such as intelligence quotient tests, personality tests etc. In communication research this method is helpful in understanding media readership and in analysing the communication quotient of persons. Its main uses are given below:
i. Factor analysis helps in deriving the significance of data; these facts lie within the data but are not clearly understood from mere figures.
ii. If we are asked to rate different news magazines according to preference, the factor analysis method is most suitable because it reveals unnoticed but important attributes of the various news magazines that are the main causes of the preferences.
iii. It reduces and simplifies multivariate data.
iv. The empirical clustering of media is very much possible with factor analysis. It provides a basis for classification when data derived from various rating scales need to be grouped.
i. Factor analysis methods are quite expensive because they involve a lot of computation.
ii. The computation process is very laborious for the research worker.
iii. The analysis needs to be done twice for more reliable results, as the results of a single factor analysis are commonly considered less dependable and less reliable.
iv. It is a very complicated method and requires thorough knowledge and enough experience for handling the computation.
TEST OF SIGNIFICANCE
Sometimes it is seen that the sample drawn from a population differs in certain characteristics from the population parameter. There can be two strong reasons for this difference: either it is due to fluctuations of sampling, or the parent population is different from the one assumed under the hypothesis.
The probability of getting a difference between the sample estimate and the population value equal to or greater than the observed difference has to be judged with the use of its standard error. If this probability is very low, there are very few chances that such a difference would arise from errors of sampling alone. It can then be concluded that the observed difference is too great for the sample to have been drawn from the population considered; it has definitely come from a different population, and the difference is said to be significant. On the other hand, if the probability of getting a difference equal to or greater than the observed one is very high, then differences of this size would arise from errors of sampling alone in the majority of cases, and the observed difference is not significant; such differences are attributed to the errors of random sampling.
In the same way, a comparison may be made to estimate certain characteristics in two samples and to know whether the two samples under study have been drawn from the same population or not. The statistical procedure used for deciding whether the difference under study is significant or non-significant is called a test of significance.
Level of significance
When a test of significance is applied, the probability of getting a difference equal to or greater than the observed one is found in combination with the standard error of the estimate. The values of this probability which are used to provide rough lines of separation between acceptance and rejection of the significance of observed differences are known as levels of significance. The level of significance is commonly taken as 0.05, 0.01 or 0.001.
Degree of Freedom
Degrees of freedom are very important to understand in statistical analysis. Suppose there are N observations X1, X2, X3 ….. XN to which we wish to give values such that their sum is a constant figure. Then we can give whatever values we please to (N – 1) of the X’s only; the value of the Nth is determined by the condition that the sum of all the X’s equals the given constant. Thus the degrees of freedom here are (N – 1). For example, if we take any five values such that their sum is 40, we can choose only four of them freely, say 3, 7, 11 and 8; the fifth does not depend upon our own will but upon the condition that the sum be 40, so the fifth number must be 11. Hence there is no full freedom in choosing all five values; the freedom is of four values only.
Therefore, we can define degrees of freedom as the number of items, or the number of class intervals, whose values can be fixed at will. If two restrictions are imposed then two degrees of freedom are lost; such restrictions are termed constraints. If the number of items or class intervals is N and the number of constraints is k, then the degrees of freedom are given as –
Degrees of Freedom = N – k
a. ‘t’ – TEST
One of the most commonly used statistical procedures for hypothesis testing is the ‘t’ test. The two major kinds of ‘t’ test are:
1. Student’s t – test
2. Fisher’s t – test
i. Student’s t – test
This test is very commonly used. It is also known as ‘two – sample t – test’ or ‘independent
samples t-test’. It tests whether or not two independent populations have different mean
values on some measure.
Student’s t – test concerns testing the significance of the departure of a sample mean X̄ from a hypothetical value µ, or testing the significance of the difference between two sample means X̄ and Ȳ. It is assumed that the standard deviation σ of the normal population from which the samples are supposed to have been drawn is known, or that the sample is large. In actual practice, however, σ usually remains unknown. If the sample is large we can get a fairly close estimate of σ and can use the statistic ‘z’ with this estimate, given by
S² = Σ (X – X̄)² / (N – 1)
X̄ = sample mean
µ = hypothetical value
Also, z = (X̄ – µ) / (S / √N)
If N is large, the error involved in replacing σ by its sample estimate S will be negligible, but if N is small the error may be appreciable.
In the absence of the parameter σ, when we use its estimate S the distribution of z is no longer normal but follows another distribution, named the ‘t’ distribution. Using this ‘t’ we have
t = (X̄ – µ) / (S / √N)
EXAMPLE
Ten printed advertisements are taken at random from a newspaper; their lengths in centimetres are given below:
52, 55, 57, 61, 64, 65, 67, 68, 70, 71.
Using the t – test, examine the suggestion that the mean length of printed advertisements in the newspaper is 60 centimetres.
Let us first construct a table for calculating the sample mean and the S.D.-
S. no.    Length in cm (X)    d = X – 60    d²
1 52 -8 64
2 55 -5 25
3 57 -3 9
4 61 1 1
5 64 4 16
6 65 5 25
7 67 7 49
8 68 8 64
9 70 10 100
10 71 11 121
Total 630 30 474
Table 8.25: Tabulation for Student’s t – test.
Σ (X – X̄)² = Σ d² – (Σ d)² / N
            = 474 – (30)² / 10
            = 474 – 90
            = 384
Here Σ d² = 474 is the total of the squares of the deviations of the lengths from the assumed value 60, Σ d = 30, and the sample mean X̄ = 630 / 10 = 63.
Therefore, S = √[ Σ (X – X̄)² / (N – 1) ]
             = √(384 / 9)
             = 6.53
and
t = (X̄ – µ) / (S / √N)
= 63 – 60 / 2.065 = 1.45
Hence, the observed value of t is 1.45, which is less than 2.262, the tabulated value of t at the 5% level of significance for 9 d.f.; the result is therefore not significant. We can say that the mean length of advertisements is 60 cm within the limits of errors of sampling.
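To cross-check a calculation of this kind by computer, a short script can reproduce the one-sample t statistic. The sketch below is only an illustration, not part of the handbook's worked example; it assumes Python with numpy and scipy, and the variable names are illustrative.

# One-sample t-test for the advertisement-length example (illustrative sketch).
import numpy as np
from scipy import stats

lengths = np.array([52, 55, 57, 61, 64, 65, 67, 68, 70, 71])  # lengths in cm
mu0 = 60  # hypothesized mean length

# Manual computation, mirroring the steps in the text
mean = lengths.mean()                      # 63.0
s = lengths.std(ddof=1)                    # sample S.D. with N - 1 in the denominator
t_manual = (mean - mu0) / (s / np.sqrt(len(lengths)))

# The same test using scipy
t_scipy, p_value = stats.ttest_1samp(lengths, mu0)

print(round(t_manual, 2), round(t_scipy, 2), round(p_value, 3))
# Both t values are about 1.45; since p > 0.05 the hypothesis of a 60 cm mean is not rejected.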
ii. Fisher's t – test
This is the other type of t-test of statistical significance, used especially for small sample sizes. It is named after its inventor, R.A. Fisher, and is one of a class of exact tests. Here the comparison of two means from two samples is conveniently performed. Suppose we have two samples X1, X2, ….., Xn1 and Y1, Y2, ….., Yn2. The following statistics are calculated for testing the significance of the difference between their means:
X̄ = (1/n1) ΣX,   S1² = [1/(n1 − 1)] Σ(X − X̄)²
Ȳ = (1/n2) ΣY,   S2² = [1/(n2 − 1)] Σ(Y − Ȳ)²
Here, S1² and S2² are the estimated variances from the two samples, and σc² is an estimate of the population variance obtained by pooling the sums of squares Σ(X − X̄)² and Σ(Y − Ȳ)² and dividing by the total number of degrees of freedom, (n1 − 1) + (n2 − 1), contributed by the two samples:
σc² = [Σ(X − X̄)² + Σ(Y − Ȳ)²] / [(n1 − 1) + (n2 − 1)]
Therefore,
t = (X̄ − Ȳ) / √[σc²(1/n1 + 1/n2)]
The denominator of this expression is the standard error of (X̄ − Ȳ). For small samples it gives a value different from that obtained with the corresponding large-sample formula.
EXAMPLE
The monthly sales of six randomly selected local magazines with poor paper quality but good content are as follows: 43, 45, 48, 49, 51 and 52, whereas the monthly sales of ten randomly selected local magazines with fine paper quality and average content are as follows: 47, 49, 49, 51, 53, 54, 55, 55, 56 and 57.
Test whether good paper quality affects the sale of the magazines.
Here, X̄ = 526/10 = 52.6, since the total of the X values is 526 and the number of magazines with good paper quality is n1 = 10.
Taking deviations d1 = X − 53 from an assumed mean of 53, Σd1² = 106 and Σd1 = −4, so
Σ(X − X̄)² = Σd1² − (Σd1)²/n1 = 106 − (−4)²/10 = 106 − 1.6 = 104.4
Similarly, Ȳ = 288/6 = 48.0, since the total of the Y values is 288 and the number of magazines with poor paper quality is n2 = 6. Taking deviations d2 = Y − 48, Σd2² = 60 and Σd2 = 0, so
Σ(Y − Ȳ)² = Σd2² − (Σd2)²/n2 = 60 − (0)²/6 = 60.0
σc² = [Σ(X − X̄)² + Σ(Y − Ȳ)²] / [(n1 − 1) + (n2 − 1)]
= (104.4 + 60.0) / (9 + 5)
= 164.4 / 14
= 11.74
Therefore,
t = (X̄ − Ȳ) / √[σc²(1/n1 + 1/n2)]
= (52.6 − 48.0) / √[11.74 × (1/10 + 1/6)]
= 4.6 / 1.77
= 2.599
Degrees of freedom = (n1 − 1) + (n2 − 1) = 9 + 5 = 14
Hence, the observed value of t is 2.599, which is greater than 2.145, the tabulated value of t at the 5% level of significance for 14 d.f.; the result is therefore significant. This shows that the observed means differ from each other, so it can be said that good paper quality affects the sale of the magazines.
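For readers who wish to verify such a pooled two-sample calculation by computer rather than by hand, the following is a minimal sketch using the magazine data above; the use of Python with numpy and scipy is an assumption of the sketch, not something the handbook prescribes.

# Pooled (equal-variance) two-sample t-test for the magazine-sales example.
import numpy as np
from scipy import stats

good_paper = np.array([47, 49, 49, 51, 53, 54, 55, 55, 56, 57])  # fine paper, average content
poor_paper = np.array([43, 45, 48, 49, 51, 52])                  # poor paper, good content

# Manual pooled-variance computation, following the text
ss_x = ((good_paper - good_paper.mean()) ** 2).sum()    # 104.4
ss_y = ((poor_paper - poor_paper.mean()) ** 2).sum()    # 60.0
df = (len(good_paper) - 1) + (len(poor_paper) - 1)      # 14
pooled_var = (ss_x + ss_y) / df                          # about 11.74
se = np.sqrt(pooled_var * (1 / len(good_paper) + 1 / len(poor_paper)))
t_manual = (good_paper.mean() - poor_paper.mean()) / se

# scipy's equal-variance t-test gives the same statistic
t_scipy, p_value = stats.ttest_ind(good_paper, poor_paper, equal_var=True)

print(round(t_manual, 2), round(t_scipy, 2), round(p_value, 3))
# t is about 2.60 with 14 d.f.; since p < 0.05 the difference in means is significant.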
b. ‘F’ – TEST
The 'F'-test is a statistical test in which the test statistic follows an F-distribution under the null hypothesis. It is used for comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled. The aim of such experiments is to see whether there exists any real difference between the treatments or whether the apparent differences are only errors of sampling. One starts with the null hypothesis that all the treatments are equal so far as their effect on the characteristic under study is concerned, i.e. that the difference between them is zero. If we write τi for the effect of the i-th treatment, the null hypothesis is
τ1 = τ2 = τ3 = ……… = τn
It is to be seen whether this hypothesis is rejected or accepted within the limits of chance error. The testing is done by calculating the ratio of the treatment variance to the error variance and then testing its significance by comparing it with the expected value of the variance ratio at the desired probability level. The ratio of the two variances is denoted by the symbol F, and the test is known as the F-test.
The statistic was initially developed by Sir R.A. Fisher as the variance ratio; in his honour, the name F-test was coined by George W. Snedecor.
F = V1 / V2, where V1 is the treatment variance and V2 the error variance.
EXAMPLE
            Treatment
Lots        1     2     3     4     5     Totals   Means
1           7     8     8     9     8     40       8
2           11    12    13    10    9     55       11
3           6     7     12    11    4     40       8
Totals      24    27    33    30    21    135
Means       8     9     11    10    7
Table 8.27: Tabulation for F-test.
i. The general mean (µ) = ΣX / nk; here ΣX = 135, n = 5 and k = 3, so
µ = 135 / 15 = 9
ii. The correction factor, C.F. = (total of all the nk variates)² / nk = (135)² / 15 = 1215.
iii. Total S.S. = ΣX² − C.F. = 1303 − 1215 = 88.
iv. Lot S.S., where R1, R2, R3 are the totals of the lots,
= (R1² + R2² + R3²)/n − C.F. = (40² + 55² + 40²)/5 − 1215 = 30,
with D.F. = k − 1 = 3 − 1 = 2.
v. Treatment S.S. = (24² + 27² + 33² + 30² + 21²)/3 − C.F. = 1245 − 1215 = 30,
with D.F. = n − 1 = 5 − 1 = 4.
Therefore the treatment variance, VT = Treatment S.S. / D.F. = 30 / 4 = 7.5.
vi. S.S. due to error = Total S.S. − Lot S.S. − Treatment S.S. = 88 − 30 − 30 = 28,
with D.F. = (n − 1)(k − 1) = 4 × 2 = 8.
Therefore the error variance, VE = S.S. / D.F. = 28 / 8 = 3.5.
vii. F = VT / VE = 7.5 / 3.5 = 2.14.
The observed value of F is compared with the value given in the table for v1 and v2 degrees of freedom at the desired probability level. The levels in which we are generally interested are .001, .01 and .05. Here v1 = 4 and v2 = 8. The value of F at the 5% level of significance for v1 = 4 and v2 = 8 is 3.84. As the observed value 2.14 is less than F5%, it is not significant. Thus, there is no significant difference between the treatments. Had the observed value of F been significant, the interpretation would have been that the differences between the treatments are real and not due to errors of sampling.
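The same sums of squares and variance ratio can be checked with a short script. The sketch below is a rough illustration in Python/numpy rather than anything prescribed by the handbook; it follows the layout of Table 8.27, with rows as lots and columns as treatments.

# Two-way (lots x treatments) sums of squares and F ratio for Table 8.27.
import numpy as np

# Rows are lots, columns are the five treatments.
data = np.array([
    [7,  8,  8,  9,  8],
    [11, 12, 13, 10, 9],
    [6,  7,  12, 11, 4],
], dtype=float)

k, n = data.shape                        # k = 3 lots, n = 5 treatments
cf = data.sum() ** 2 / (n * k)           # correction factor = 135^2 / 15 = 1215
total_ss = (data ** 2).sum() - cf        # 88
lot_ss = (data.sum(axis=1) ** 2).sum() / n - cf      # 30, d.f. = k - 1 = 2
treat_ss = (data.sum(axis=0) ** 2).sum() / k - cf    # 30, d.f. = n - 1 = 4
error_ss = total_ss - lot_ss - treat_ss              # 28, d.f. = (n - 1)(k - 1) = 8

v_t = treat_ss / (n - 1)                 # treatment variance, 7.5
v_e = error_ss / ((n - 1) * (k - 1))     # error variance, 3.5
F = v_t / v_e                            # about 2.14, below the 5% table value of 3.84 for (4, 8) d.f.

print(round(total_ss, 1), round(lot_ss, 1), round(treat_ss, 1), round(error_ss, 1), round(F, 2))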
c. CHI – SQUARE (χ²) TEST
Amongst the several tests developed so far, this is one of the most important. Tests such as the t-test and F-test are helpful in studying the difference between two distributions, but they are unable to express all the features of the distributions. The chi-square test, represented as χ² and pronounced 'kai-square', is a non-parametric test. In a non-parametric test we do not assume that a particular distribution is applicable, or that a certain value is attached to a parameter of the population. The test can be used to compare a theoretical population with actual data when categories are used, and is therefore applicable to a large number of problems. Certain cases have to be studied on the basis of a hypothesis derived from some general law of nature or other reasoning. In such cases we determine the theoretical or hypothetical frequencies on the basis of the hypothesis assumed and then, by comparing them with the observed frequencies, test whether the observed frequencies are in agreement with the hypothetical ones. All such problems can be handled with the help of the χ²-test or chi-square test.
The χ²-test of significance gives the probability of obtaining a value of χ² equal to or greater than the observed one in random sampling. If this probability is less than 0.05, which is considered very small, we are justified in suspecting a significant divergence between fact and theory, and the null hypothesis can then be rejected. If, on the other hand, the probability is not small, i.e. it is greater than 0.05, we cannot say that the hypothesis is proved to be correct; it can only be said that the application of the χ²-test gives no grounds for suspecting the hypothesis, that it can be accepted within the limits of experimental error, and that the observed data are in agreement with the hypothesis.
Very large values of χ², which lead us to suspect the hypothesis or the sampling technique, are very rare. The same is true of very small values, nearly equal to zero, which would indicate an extremely close agreement between fact and theory. In such a situation we begin to suspect our sampling technique and say that such close correspondence between fact and theory is too good to be true.
χ² = Σ (O − E)² / E, where O is the observed frequency and E the expected (theoretical) frequency.
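As a quick illustration of how this formula is applied when categories are used, the sketch below computes χ² both directly and with scipy. The observed and expected counts are made-up figures for illustration, not data from the handbook, and the use of Python/scipy is an assumption.

# Chi-square goodness-of-fit: compare observed category counts with expected ones.
import numpy as np
from scipy import stats

# Hypothetical example: 120 respondents expected to prefer four channels equally.
observed = np.array([38, 25, 32, 25])
expected = np.array([30, 30, 30, 30])

chi2_manual = ((observed - expected) ** 2 / expected).sum()   # sum of (O - E)^2 / E
chi2_scipy, p_value = stats.chisquare(observed, expected)

print(round(chi2_manual, 2), round(chi2_scipy, 2), round(p_value, 3))
# If p < 0.05 the observed frequencies diverge significantly from the hypothesis.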
EXAMPLE
Name of scriptwriters Number of words
Given below is number of words used by 10
A 38
scriptwriters in writing a box news story.
B 40
C 45
D 53
E 47
Chapter VII – Data Analysis F
and Interpretation 43 Page 214
G 55
H 48
I 52
J 49
HANDBOOK OF COMMUNICATION RESEARCH
Scriptwriter   Xi          (Xi − X̄)   (Xi − X̄)²
A              38          −9          81
B              40          −7          49
C              45          −2           4
D              53          +6          36
E              47           0           0
F              43          −4          16
G              55          +8          64
H              48          +1           1
I              52          +5          25
J              49          +2           4
n = 10         ΣXi = 470               Σ(Xi − X̄)² = 280
Here the sample mean X̄ = 470/10 = 47, and the sample variance is
σs² = Σ(Xi − X̄)² / (n − 1) = 280 / 9 = 31.11
Let the null hypothesis be H0 : σp² = σs², i.e. that the sample variance is consistent with the hypothesized population variance (here taken as σp² = 20, the value implied by the computation below). To check this hypothesis we find χ² using the formula
χ² = (n − 1)σs² / σp² = 9 × 31.11 / 20 = 13.999 ≈ 14.
The tabulated value of χ² at the 5% level of significance for 9 d.f. is 16.919; since the observed value is smaller, the divergence is not significant and the null hypothesis can be accepted within the limits of sampling error.
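A small script can reproduce this variance test. The sketch below is only an illustration in Python/scipy; the hypothesized population variance of 20 is the assumption discussed above rather than a figure stated explicitly in the handbook.

# Chi-square test for a population variance, using the scriptwriter word counts.
import numpy as np
from scipy import stats

words = np.array([38, 40, 45, 53, 47, 43, 55, 48, 52, 49])
sigma0_sq = 20.0                          # hypothesized population variance (assumed value)

n = len(words)
s_sq = words.var(ddof=1)                  # sample variance, about 31.11
chi2_stat = (n - 1) * s_sq / sigma0_sq    # about 14.0

critical = stats.chi2.ppf(0.95, df=n - 1)   # 5% critical value for 9 d.f., about 16.92
print(round(chi2_stat, 3), round(critical, 3))
# Since 14.0 < 16.92 the observed value is not significant at the 5% level.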
It should be noted that certain conditions must be satisfied before the χ² test is applied:
i. The number of items in each group should not be less than 10. If it is less than 10, regrouping is done by combining the frequencies of adjoining groups so that the new frequencies become greater than 10.
ii. The items in the sample should be independent.
iii. Random sampling must be used for recording the observations.
iv. The constraints must be linear. Constraints involving the class-interval frequencies of a contingency table are said to be linear when the equations contain no squares or higher powers of the frequencies.
v. The number of items must be large, at least 50, for better results.
Statistical Package for the Social Sciences (SPSS) is a comprehensive system for research which allows data to be analyzed faster, more conveniently and with fewer human errors. It is a powerful tool capable of analyzing almost any type of data in communication research, the social sciences, and the natural and physical sciences. Its first version was released in 1968, having been developed by Norman H. Nie and C. Hadlai Hull. The package is available for both personal and mainframe (or multi-user) computers. SPSS consists of a set of software tools for data entry, data management, statistical analysis and presentation. It integrates complex data and file management, statistical analysis and reporting functions. It is used by market researchers, health researchers, survey companies, governments, education researchers, marketing organizations, communicators and journalists. The original SPSS manual (Nie, Bent & Hull, 1970) has been considered one of the most valuable publications in the field.
In addition to statistical analysis, data management and documentation are important features of the base software. SPSS can take data from almost any type of file and use them to generate tables, reports, charts, graphs, descriptive statistics and complex statistical analyses.
SPSS Inc. has produced several manuals to describe everything its package of programmes attempts to accomplish. Between 2009 and 2010 the package was marketed as PASW (Predictive Analytics SoftWare) Statistics. The company announced on July 28, 2009 that it was being acquired by IBM, and as of January 2010 it became "SPSS: An IBM Company". IBM SPSS is now fully integrated into the IBM Corporation, and is one of the brands under IBM Software Group's Business Analytics Portfolio, together with IBM Cognos.
The basic steps in analyzing data with SPSS are:
1. Get your data into SPSS: You can open a previously saved SPSS data file; read a spreadsheet, database, or text data file; or enter your data directly in the Data Editor.
2. Select procedure: Select a procedure from the menus to calculate statistics or to create a
chart.
3. Select the variables for the analysis: The variables in the data file are displayed in a dialog box for the procedure.
4. Run the procedure: Results are displayed in the viewer.
Statistical Procedures
After entering the data set in Data Editor or reading an ASCII data file, we are now ready to
analyze it. The Analyze option has the following sub options:
Reports
Descriptive Statistics
Custom Tables
Compare means
General Linear Model (GLM)
Correlate
Regression
Loglinear
Classify
Data Reduction
Scale
Non parametric tests
Time Series
Survival
Multiple response
Descriptive Statistics: This submenu provides techniques for summarizing data with statistics,
charts, and reports. The various sub-sub menus under this are as follows:
Frequencies provides information about the relative frequency of occurrence of each category of a variable. It can be used to obtain summary statistics that describe the typical value and the spread of the observations. To compute summary statistics for each of several groups of cases, the Means procedure or the Explore procedure can be used.
Descriptive is used to calculate statistics that summarize the values of a variable like
the measures of central tendency, measures of dispersion, skewness, kurtosis etc.
Explore produces and displays summary statistics for all cases or separately for groups
of cases. Boxplots, stem-and-leaf plots, histograms, tests of normality, robust estimates
of location, frequency tables and other descriptive statistics and plots can also be
obtained.
Crosstabs is used to count the number of cases that have different combinations of
values of two or more variables, and to calculate summary statistics and tests. The
variables you use to form the categories within which the counts are obtained should
have a limited number of distinct values.
List Cases displays the values of variables for cases in the data file.
Report Summaries in Rows produces reports in which different summary statistics are
laid out in rows. Case listings are also available from this command, with or without
summary statistics.
Report Summaries in Columns produces reports in which different summary statistics
are laid out in separate columns.
Custom Tables sub-menu provides attractive, flexible displays of frequency counts,
percentages and other statistics.
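The handbook describes these procedures in terms of the SPSS menus. For readers working outside SPSS, the sketch below shows roughly equivalent operations in Python with pandas; the column names and data are hypothetical, and this is an illustrative parallel rather than SPSS itself.

# Rough pandas equivalents of the SPSS Frequencies, Descriptives and Crosstabs procedures.
import pandas as pd

# Hypothetical survey data: gender, age group and daily TV viewing time.
df = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F", "M", "F", "M"],
    "age_group": ["18-25", "26-35", "18-25", "36-45", "26-35", "18-25", "36-45", "26-35"],
    "daily_tv_minutes": [60, 45, 90, 30, 120, 75, 40, 55],
})

# Frequencies: counts (and percentages) for each category of a variable.
print(df["age_group"].value_counts())
print(df["age_group"].value_counts(normalize=True) * 100)

# Descriptives: summary statistics for a scale variable.
print(df["daily_tv_minutes"].describe())

# Crosstabs: counts for combinations of two categorical variables.
print(pd.crosstab(df["gender"], df["age_group"]))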
The Chart Builder option for creating graphs and charts was introduced in SPSS release 14.0. Chart Builder creates the most professional-looking graphs SPSS has ever provided. There are three complete sets of graphing procedures available in SPSS: Legacy Graphs, Interactive Graphs and Chart Builder graphs. SPSS 15.0 operations now reflect step-by-step boxes and new screens. The add-on modules available with SPSS include:
1. SPSS Programmability Extension (added in version 14). Allows Python, R, and .NET
programming control of SPSS.
2. SPSS Data Preparation (added in version 14). Allows programming of logical checks
and reporting of suspicious values.
3. SPSS Regression - Logistic regression, ordinal regression, multinomial logistic
regression, and mixed models.
4. SPSS Advanced Models – Multivariate GLM and repeated measures ANOVA (removed
from the base system in version 14).
5. SPSS Decision Trees. Creates classification and decision trees for identifying groups
and predicting behaviour.
6. SPSS Custom Tables. Allows user-defined control of output for reports.
7. SPSS Exact Tests. Allows statistical testing on small samples.
8. SPSS Categories.
9. SPSS Forecasting.
10. SPSS Conjoint.
11. SPSS Missing Values. Simple regression-based imputation.
12. SPSS Complex Samples (added in version 12). Adjusts for stratification and clustering
and other sample selection biases.
13. AMOS (Analysis of Moment Structures) – add-on which allows modeling of structural
equation and covariance structures, path analysis, and has the more basic capabilities
such as linear regression analysis, ANOVA and ANCOVA.
SPSS has produced several manuals to describe everything that its package of programmes provides; these volumes run to more than three thousand pages of documentation. Everything SPSS is able to do is described in the manuals. For the experienced researcher in the social sciences, communication and journalism, ownership of the manuals is useful; otherwise, he or she may use only those manuals required for the type of research being conducted and the statistical tests to be used.
SUMMARY
Data analysis is undertaken to reach useful information so that final conclusions can be drawn and recommendations can be made. Data are analyzed through the application of statistical procedures to reach specific conclusions. Charts, graphs and other pictorial presentations are used for depicting the data. Statistical principles have a very wide scope of application. There are three main aims of statistics: to study the population, to study variation, and to study methods of reducing data. The researcher frequently uses frequencies, percentages and measures of central tendency, and should be aware of the advantages, disadvantages and limitations of expressing data in these forms. Measures of dispersion are equally important to the researcher, as it is necessary to know the variability, the degree of deviation from the central tendency, or the scattering of items within the group; there are a number of methods of measuring dispersion, such as the standard deviation and the mean deviation, which the researcher should know. Measures of association between two or more phenomena have to be found in many situations by using various types of correlation methods; Spearman's ranking method, developed by a psychologist, is a well-recognized method of establishing the rank correlation coefficient. Graphical methods have to be used by the researcher to make the presentation more effective and impressive. For data containing several variables without a dependency relationship, various multivariate techniques are used, one of which is factor analysis. Tests of significance are very important in communication research, as a sample drawn from the population may differ from the population in certain characteristics; a number of tests, such as the 't' test, 'F' test and chi-square test, have to be used appropriately for this purpose. Software packages for data analysis, such as SPSS, can be used to save time and make the data analysis more convenient.
QUESTIONS
Q-1 Discuss aims of data analysis and its importance in interpretation of research findings.
Q-2 What are the measures of central tendency? In your view what are essentials for a good
average?
Q-3 Compare mean, median and mode as averages in terms of their advantages and
disadvantages.
Q-4 Explain measures of dispersion. How is standard deviation worked out?
Q-5 Describe measures of association. How is correlation computed?
Q-6 What do you mean by factor analysis? What are its methods?
Q-7 Discuss the functions of test of significance in communication research.
Q-8 Write short notes on the following tests
a) 't' test
b) 'chi-square' test
c) ‘F’ test
Q-9 Discuss the utility of SPSS in analyzing data.
Q-10 Describe salient features of SPSS and basic steps in data analysis.
Q-11 What are the important modules available for graphing procedures?
KEY WORDS
FURTHER READINGS
Arthur Asa Berger, (2000). Media and Communication Research Methods: An Introduction to
Qualitative and Quantitative Approaches. New Delhi: Sage.
Britha Mikkelesen, (2009). Methods for Development Work and Research, sixth printing. New
Delhi: Sage.
C.R. Kothari, (2008). Research Methodology: Methods and Techniques, second revised
edition. New Delhi: New Age International.
Chris Hart, (2010). Doing your Masters Dissertation, fifth printing. New Delhi: Vistaar
Publications, Sage.
Darren George & Paul Mallery (2008). SPSS for Windows – Step by Step, eighth edition,
published by Darling Kindersley (India) Pvt. Ltd., New Delhi.
Fred N. Kerlinger, (2007). Foundations of Behavioural Research, tenth reprint. Delhi: Surjeet
Publication.
Gopal K Kanji, (2006). 100 Statistical Tests, third edition, South Asia. New Delhi: Vistaar
Publications, Sage.
Kultar Singh, (2010). Quantitative Social Research Methods, fourth printing. New Delhi: Sage.
Norusis, Marija (2006). SPSS 15.0 Advanced Statistical Procedures Companion. Upper
Saddle River, NJ: Prentice Hall.
Prof. S.R. Bajpai, (1960). Methods of Social Survey and Research. Kanpur: Kitab Ghar.
SPSS 15.0 Base User's Guide (2006), Chicago, IL: SPSS Inc.
SPSS 14.0 Command Syntax Reference (2006), Chicago, IL: SPSS Inc.
Shri Ram Singh Chandel, (1964). A Handbook of Agricultural Statistics, second edition.
Kanpur: Achal Prakashan Mandir.
http://en.wikipedia.org/wiki/SPSS
http://www.iasri.res.in/iasriwebsite/DESIGNOFEXPAPPLICATION/Electronic-Book/Module
%201/6SPSS-overview.pdf
Zina O’Leary, (2010). The Essential Guide to Doing Your Research Project, New Delhi:
Vistaar Publications, Sage.