CHAPTER
8
DATA PROCESSING AND ANALYSIS
Table of Contents
Learning Objectives
8.1 Introduction
8.2 The Concept of Data Processing
8.2.1 Editing
8.2.2 Coding
8.2.3 Classification
8.2.4 Data Entry
8.2.5 Tabulation
Self Assessment Questions
8.3 The Concept of Data Analysis
Self Assessment Questions
8.4 Measures of Central Tendency
8.4.1 Mean
8.4.2 Median
8.4.3 Mode
Self Assessment Questions
8.5 Measures of Dispersion
8.5.1 Range
8.5.2 Mean Deviation
8.5.3 Standard Deviation
Self Assessment Questions
8.6 Measure of Skewness
Self Assessment Questions
8.7 Measures of Relationship
8.7.1 Correlation Analysis
8.7.2 Causal Analysis
Self Assessment Questions
8.8 Different Charts Used in Data Analysis
Self Assessment Questions
8.9 Summary
8.10 Key Words
8.11 Case Study
8.12 Exercise
8.13 Answers for Self Assessment Questions
8.14 Suggested Books and e-References
LEARNING OBJECTIVES
After studying this chapter, you will be able to:
Explain the concept of data processing
Describe the concept of data analysis
Discuss the measures of central tendency
Explain the measures of skewness
Discuss the measures of relationship
Describe the various charts used in data analysis
Research Methodology and Management Decision
8.1 INTRODUCTION
In the previous chapter, you studied questionnaire design. Now, you will
learn the significance and ways of processing and analysing the data collected
through such questionnaires.
Data in its raw form does not convey any useful information. It needs to be
organised properly to extract relevant information and make it fit for research.
This is done with the help of data processing that involves various steps, including
editing, coding, classification, data entry and tabulation.
After processing data, you need to analyse it to find answers to the research
problem. You can use various statistical measures, such as the measures of central
tendency, dispersion, skewness and relationship to analyse data. The selection of
a measure depends upon the type of the research problem. For example, if you
wish to find out the average marks of students of class IX in English, then you
would use the measures of central tendency. However, if you want to know the
relationship between the eating habits of children and problems of obesity, then
you would use the measures of relationship. It is important to note that no single
statistical measure is complete in itself to analyse a data series. Therefore, you
should use an optimum combination of different measures to address the problem
at hand in the most effective manner. Any carelessness in data processing and
data analysis can result in erroneous research findings. Moreover, these data tasks
form a major part of research and consume considerable time and effort of the
researcher. Therefore, it is advisable to remain extra vigilant while processing and
analysing data for making the research as authentic as possible.
The chapter begins by explaining the concept of data processing and data analysis.
Next, it talks about the measures of central tendency, including mean, median,
and mode. Information is also provided about the measures of dispersion and
the measures of skewness. It also explains the measures of relationship, including
correlation analysis, regression analysis and multiple regression. Towards the end,
the chapter discusses other statistical measures used for data analysis.
Let us now discuss each step of data processing in the following section.
Data Processing and Analysis
8.2.1 EDITING
Editing refers to reviewing the collected data to check whether it is valid or not.
Data is examined to detect errors and omissions. Errors are corrected, omitted data
is filled in, and the data is prepared for further processing. The edited data is
then retained for analysis.
The editor is responsible for ensuring that the data is accurate, uniform, as complete
as possible and acceptable for tabulation. Editing helps in filtering ambiguous
information that can create a problem at the time of data analysis. Ambiguous
information can be in the form of biased or incorrect responses in a questionnaire
and such information needs to be deleted.
8.2.2 CODING
Coding is the process of providing some codes to the data in the form of symbols,
characters and numbers. It helps the researcher in interpreting the data and
deriving accurate results. If the data is generated with the help of a questionnaire,
it can be coded either at the time of framing the questionnaire or after collecting
the data.
The data that is already coded is known as precoded data.
The data that is coded at the time of data processing is known as postcoded
data.
Generally, a questionnaire may contain the following types of questions:
Interval-scale questions: An interval scale is any range of values that have
a meaningful mathematical difference but no true zero. Any question where
the respondent must enter a temperature value is an interval-scale question
because degrees Celsius or Fahrenheit are interval measurements. The data collected
through interval-scale and closed-ended questions is an example of precoded data.
Closed-ended questions: These questions are those for which a researcher
provides respondents with options from which to choose a response.
Open-ended questions: These questions are those which require more
thought and more than a simple one-word answer. The data collected
through open-ended questions is an example of postcoded data. Apart from
these, the questionnaire can also include questions based on nominal scale,
ordinal scale and ratio scale.
Precoded data has certain advantages over postcoded data:
It is easier to code.
It reduces the effort in data processing.
It leads to fewer chances of human error during data processing.
Let us understand the concept of coding with the help of an example.
Questions 6 – 14: Give the ratings in the following questions as per your choice. The
rating of 1 means the lowest and 5 means the highest.
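The idea of coding closed-ended responses can be sketched in Python as follows. This is only a minimal illustration: the question names, answer options and numeric codes in the codebook are hypothetical and are not taken from the questionnaire discussed in this chapter.

```python
# Hypothetical codebook: every option of a closed-ended question gets a numeric code.
CODEBOOK = {
    "gender": {"Male": 1, "Female": 2},
    "rating": {"Lowest": 1, "Low": 2, "Average": 3, "High": 4, "Highest": 5},
}


def code_response(response):
    """Replace each answer in one respondent's record with its numeric code.

    Answers that are not in the codebook are coded as None so that they
    can be reviewed during editing instead of being silently dropped.
    """
    return {
        question: CODEBOOK[question].get(answer)
        for question, answer in response.items()
    }


# Hypothetical raw responses from three respondents.
raw_responses = [
    {"gender": "Female", "rating": "High"},
    {"gender": "Male", "rating": "Lowest"},
    {"gender": "Male", "rating": "Excellent"},  # not a valid option
]

coded_responses = [code_response(r) for r in raw_responses]
for record in coded_responses:
    print(record)
```

Because the codes are fixed before the data is collected, this corresponds to precoded data. Open-ended answers would first have to be read and grouped before a similar codebook could be built.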
8.2.3 CLASSIFICATION
Classification refers to categorising the coded questions into different segments as
per their relevance. This is done to simplify data processing and analysis to a great
extent. It is important to note that variables in a segment possess certain similar
characteristics. For example, demographic information is a segment that includes
variables, such as age, education and work experience of the respondents.
Questions in a questionnaire can be classified into qualitative and quantitative
questions:
Qualitative questions: The classification of qualitative questions is called
statistics of attributes. These attributes cannot be measured directly in
numbers. However, qualitative attributes can be quantified. Examples of
attributes are honesty and attitude of the respondents.
Quantitative questions: The classification of quantitative questions is called
statistics of variables. These variables can be expressed in numeric form,
such as demographic factors including age and income.
These variables can be grouped in the form of class intervals. A class interval
contains a lower limit and an upper limit. The difference between the two limits
is called class magnitude. For example, in the class interval 25-35, 25 is the lower
limit and 35 is the upper limit.
Class intervals can be inclusive or exclusive.
Inclusive class intervals: If the value of the upper limit is included in the
class interval, it is an inclusive class interval. For example, the value
35 would be included in the inclusive class 25-35. Thus, the inclusive class
intervals would be 25-35, 36-45, 46-55, and so on.
Exclusive class intervals: If the value of the upper limit is not included in
the class interval, then it is known as an exclusive class interval. For
example, the value 35 would not be included in the class 25-35 but it would
be included in the class 35-45. Thus, the exclusive class intervals would be
25-35, 35-45, 45-55, and so on.
Another important term to remember during classification is frequency.
Frequency is the number of observations that fall in a particular class interval.
Table 1 shows the number of respondents in each age group:
Table 1: Frequency Distribution
Age Group (Class Interval)    Number of Respondents
25-35                         10
35-45                         4
45-55                         7
55-65                         2
In Table 1, 10 respondents are in the age group of 25-35. Thus, 10 is the frequency
of the class interval 25-35. When class intervals and frequencies are represented
in a tabular form, as in Table 1, such a representation is known as frequency
distribution.
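A frequency distribution with exclusive class intervals can be built as in the following Python sketch. The list of ages is hypothetical and is used only to show how a value equal to an upper limit, such as 35, is placed in the next class.

```python
from collections import Counter


def class_interval(age, start=25, width=10):
    """Return the exclusive class interval (lower, upper) that contains age.

    The upper limit is excluded, so an age of 35 falls in 35-45, not in 25-35.
    """
    lower = start + ((age - start) // width) * width
    return (lower, lower + width)


# Hypothetical ages of ten respondents.
ages = [25, 31, 35, 44, 45, 52, 58, 64, 29, 35]

# Frequency: number of observations that fall in each class interval.
frequency = Counter(class_interval(age) for age in ages)

for (lower, upper), count in sorted(frequency.items()):
    print(f"{lower}-{upper}: {count}")
```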
8.2.5 TABULATION
Tabulation refers to presenting data in the form of a table so that it can be
easily analysed. In this stage, the frequencies of the dataset are also computed.
There are three types of frequencies, namely absolute frequency, relative frequency
and cumulative frequency.
Absolute frequency is the actual number of respondents who have given a
particular response.
Relative frequency is calculated in relation to the total frequency of all the
class intervals. It is the percentage of all respondents who have given a
particular response.
Cumulative frequency is the percentage of all respondents who have given
a response equal to or less than a particular value.
There are two types of frequency distributions, which can be put into a tabular
form:
1. Two-way frequency distribution: In this type of frequency distribution,
two variables can be analysed at a time. This frequency distribution is also
known as cross tabulation.
2. One-way frequency distribution: In this type of frequency distribution, a
single variable is analysed.
Table 2 shows an example of the one-way frequency distribution.
Table 2: One-Way Frequency Distribution
Age Group           Number of Persons        Relative         Cumulative
(Class Interval)    (Frequency or            Frequency (%)    Frequency (%)
                    Absolute Frequency)
20-30               10                       17.86            17.86
30-40               14                       25.00            42.86
40-50               20                       35.71            78.57
50 and above        12                       21.43            100.00
Total               56                       100              100
In Table 2, age group is taken as a variable and different types of frequencies are
calculated. As already discussed, absolute frequency is the number of respondents
in each class interval. Relative frequency can be calculated by dividing the
absolute frequency by the total frequency and multiplying the result by 100. For
example, in case of the 20-30 age group, absolute frequency is 10 and the total
frequency is 56; therefore, the relative frequency is 17.86 (10/56×100). Cumulative
frequency can be calculated by adding the relative frequency of the present class
interval (whose cumulative frequency we are calculating) to the cumulative
frequency of the preceding class interval. For example, in case of the 20-30 and
30-40 age groups, the relative frequencies are 17.86 and 25.00, respectively.
Therefore, the cumulative frequency in the case of the 30-40 age group is 42.86
(17.86 + 25.00).
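The relative and cumulative frequencies of Table 2 can be reproduced with a few lines of Python. The sketch below uses only the absolute frequencies given in the table; the variable names are illustrative.

```python
# Absolute frequencies taken from Table 2.
age_groups = ["20-30", "30-40", "40-50", "50 and above"]
absolute = [10, 14, 20, 12]

total = sum(absolute)  # total frequency

# Relative frequency: share of each class in the total, as a percentage.
relative = [f / total * 100 for f in absolute]

# Cumulative frequency: running total of the relative frequencies.
cumulative = []
running = 0.0
for value in relative:
    running += value
    cumulative.append(running)

for group, f, rel, cum in zip(age_groups, absolute, relative, cumulative):
    print(f"{group:<14}{f:>4}{rel:>8.2f}{cum:>8.2f}")
```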
[Figure 2: Types of Data Analysis. Labels recovered from the figure: Univariate Analysis, Multivariate Analysis and Inferential Analysis (Parametric Test, Non-Parametric Test)]
Inferential analysis: In this type of data analysis, significance tests are used
to check the validity of a hypothesis for studying a problem. There are two
types of significance tests:
i. Parametric tests: These tests make assumptions about the parameters of
the population from which a sample is derived. Examples of parametric
tests include z-test and t-test.
ii. Non-parametric tests: These tests do not make any assumptions about
the parameters of the population from which the sample is derived. An
example of a non-parametric test is the Kruskal-Wallis test.
[Figure: Measures of Central Tendency. Labels recovered from the figure: Weighted Mean, Mode]
8.4.1 MEAN
Mean represents the value calculated after dividing the sum of observations by the
total number of observations (n) taken. It is also known as arithmetic mean.
\[ \bar{X} = \frac{\sum X_i}{n} \]
Where, \(X_i\) = ith observation
n = Number of observations
Let us understand the concept of arithmetic mean with the help of an example.
Suppose you want to find the average weight of a group of five friends. Table 3
shows the weight of each person in the group:
Table 3: Weight of Five Friends
People    Weight (kg)
Jenny     35
Robert    40
Ella      34
Andy      39
Eliza     42
\[ \bar{X} = \frac{\sum X_i}{n}, \quad \sum X_i = 190, \quad n = 5 \]
\[ \bar{X} = \frac{35 + 40 + 34 + 39 + 42}{5} = \frac{190}{5} = 38 \text{ kg} \]
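The same calculation can be checked in Python. The sketch below uses the weights from Table 3 and compares the manual formula with the `mean` function of the standard `statistics` module.

```python
from statistics import mean

# Weights (kg) of the five friends from Table 3.
weights = {"Jenny": 35, "Robert": 40, "Ella": 34, "Andy": 39, "Eliza": 42}

values = list(weights.values())
total_weight = sum(values)          # sum of observations
n = len(values)                     # number of observations
average_weight = total_weight / n   # arithmetic mean

print(total_weight, n, average_weight)
print(mean(values))  # library result for comparison
```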
=9 + 10.5 + 38.5
= 58
Geometric mean: Geometric mean represents the nth root of the product of
all the values or observations involved in a research. The formula used to
calculate geometric mean is as follows:
\[ \bar{X}_g = \sqrt[n]{(x_1)(x_2)(x_3)\cdots(x_n)} \]
\[ \bar{X}_g = \sqrt[4]{X_1 \times X_2 \times X_3 \times X_4} \]
\[ \bar{X}_g = \sqrt[4]{10 \times 12 \times 10 \times 11} = \sqrt[4]{13200} = 10.718 \]
Therefore, the geometric mean of four observations is 10.7 years.
\[ \bar{X}_H = \text{Rec.}\,\frac{247/660}{4} = \text{Rec.}\,\frac{247}{660 \times 4} = \frac{660 \times 4}{247} = 10.69 \]
Therefore, the harmonic mean of the four observations is 10.7 years. It is used
for quantities that combine through their reciprocals, such as average speed over
equal distances, capacitance in series or resistance in parallel.
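The geometric and harmonic means of the four observations (10, 12, 10 and 11) can be verified with the following Python sketch. It computes both measures from their definitions and compares them with `geometric_mean` and `harmonic_mean` from the standard `statistics` module (available from Python 3.8 onwards).

```python
from statistics import geometric_mean, harmonic_mean

observations = [10, 12, 10, 11]
n = len(observations)

# Geometric mean: nth root of the product of all observations.
product = 1
for x in observations:
    product *= x
gm_manual = product ** (1 / n)

# Harmonic mean: reciprocal of the mean of the reciprocals.
hm_manual = n / sum(1 / x for x in observations)

print(round(gm_manual, 3), round(geometric_mean(observations), 3))
print(round(hm_manual, 3), round(harmonic_mean(observations), 3))
```

For the same set of positive observations, the harmonic mean is never greater than the geometric mean, which in turn is never greater than the arithmetic mean.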
8.4.2 MEDIAN
Median is defined as a central or mid value of a dataset. Median divides a dataset
into two halves – one half contains the values greater than the mid value (or
median) and the other half contains the values less than the mid value.
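Before calculating median, you need to arrange the dataset in the ascending or
descending order. The formula to calculate median is as follows:
\[ \text{Median} = \text{Value of the } \left(\frac{n + 1}{2}\right)\text{th observation, when } n \text{ is odd} \]
\[ \text{Median} = \text{Average of the values of the } \left(\frac{n}{2}\right)\text{th and } \left(\frac{n}{2} + 1\right)\text{th observations, when } n \text{ is even} \]
Where, n = Number of observations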
2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4
Now you want to calculate the average rating by using median. To do so, arrange
the data in the ascending order, as follows:
1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5
Since the number of observations is odd (n = 17), the median is the value of the
((n + 1)/2)th observation, that is, the 9th observation:
Median = 3
2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4, 1, 2, 3
Now, to calculate the average rating using median, all the 20 observations are
arranged in ascending order as:
1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5
Here, median is the average of middle two values, i.e., values at 10th and 11th
positions. This is calculated as:
Median = (3 + 3)/2 = 3
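Both median calculations above can be reproduced with the following Python sketch. The helper function `median_by_position` is an illustrative name; it applies the positional rule for odd and even numbers of observations, and its result is compared with `statistics.median`.

```python
from statistics import median


def median_by_position(values):
    """Return the median using the positional rule for odd and even n."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1)/2)th observation (index `middle` when counting from 0).
        return ordered[middle]
    # Even n: average of the (n/2)th and (n/2 + 1)th observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


ratings_17 = [2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4]
ratings_20 = ratings_17 + [1, 2, 3]

print(median_by_position(ratings_17), median(ratings_17))
print(median_by_position(ratings_20), median(ratings_20))
```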
8.4.3 MODE
Mode refers to the value that has the highest frequency in a data series.
According to Croxton and Cowden, the mode of a distribution is value at the point
around which the items tend to be most heavily concentrated. It may be regarded as the
most typical of a series of values.
Let us learn to calculate mode with the help of an example. Suppose the marks of
five friends in a science paper are 70, 90, 50, 70, and 30. You want to find the mode
of their marks.
You need to find the highest frequency of the present data to calculate mode. Here,
the number having the highest frequency is 70 as it occurs two times; therefore, the
mode of students’ marks is 70.
Mode is the most important statistic for nominal data, where values are
names rather than numbers. In such cases, there is no concept of centre because
there are no numbers. In addition, when we are dealing with continuous variables,
the probability that all the observations in a data sample are different is 1.
Therefore, mode cannot be used for continuous variables.
Mode is not considered a true measure of central tendency because of three reasons:
i. It is not necessary that one data series has only one mode because many
numbers in the data series can have the highest frequency.
ii. Mode does not consider all the frequencies to arrive at the central value of
the data series. Therefore, the results of mode are not reliable.
iii. It is possible that a series has observations that occur only once. In such cases,
mode does not exist.
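The example and the limitations listed above can be illustrated with Python's standard `statistics` module. The first list holds the marks from the example; the other two lists are hypothetical and are included only to show a series with two modes and a series in which every value occurs once.

```python
from statistics import mode, multimode

# Marks of the five friends from the example above.
marks = [70, 90, 50, 70, 30]
print(mode(marks))            # single most frequent value

# Hypothetical series with two values sharing the highest frequency.
two_modes = [70, 90, 50, 70, 90]
print(multimode(two_modes))   # every value that has the highest frequency

# Hypothetical series in which each value occurs only once:
# multimode returns all values, so no meaningful mode exists.
all_distinct = [70, 90, 50]
print(multimode(all_distinct))
```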
Let us summarise mean, median and mode as:
Mean: Mean represents the average value in a dataset.
Median: Median represents the middle value in a dataset.
Mode: Mode represents the most common value in a dataset.
The measures of central tendency used for different types of variables are shown
in Table 4 as follows:
SELF ASSESSMENT QUESTIONS
7. Mean represents the value that you get after dividing the sum of
observations by the total number of observations taken. (True/False)
8. ______________ mean represents the nth root of the product of all the
values or observations involved in the research.
9. Median can be defined as a central value that divides a dataset into two
halves. (True/False)
[Figure 4: Measures of Dispersion (Range, Mean Deviation, Standard Deviation)]
8.5.1 RANGE
Range represents the difference between the highest value and the lowest value in
a data series. It is considered a rough measure of variability because it depends
only on the two extreme values of the data series. When the highest (H) and/or the
lowest (L) data point in a data series changes, the range also changes.
Note:
\[ \text{Coefficient of Range} = \frac{H - L}{H + L} \]
Let us learn to calculate range with the help of the preceding example in which a
group of 17 people rated a book on a 5-point scale, where 1 is the lowest rating
and 5 is the highest rating. The rating given by the 17 people is as follows:
2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4
Now, you want to calculate the range for the data series.
To do so, you need to find the highest and lowest values of the data series. In the
present case, the highest value (H) is 5 and the lowest value (L) is 1. Therefore,
Range = (5 – 1)
Range = 4
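The range and the coefficient of range for the 17 ratings can be computed as in the following Python sketch. The variable names are illustrative.

```python
ratings = [2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4]

highest = max(ratings)  # H
lowest = min(ratings)   # L

value_range = highest - lowest
coefficient_of_range = (highest - lowest) / (highest + lowest)

print(value_range)
print(round(coefficient_of_range, 3))
```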
According to Clark and Schkade, average deviation is the average amount of scatter
of the items in a distribution from either the mean or the median, ignoring the signs of the
deviations. The average that is taken of the scatter is an arithmetic mean, which accounts
for the fact that this measure is often called the mean deviation.
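Mean Deviation (MD) is used to measure variability across a data series. The
formula to calculate MD is as follows:
\[ MD = \frac{\sum |X_i - \bar{X}|}{n} \]
Where, \(\bar{X}\) = Mean/Median/Mode
n = Number of observations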
With the help of MD, you can also calculate the coefficient of MD. The coefficient of
MD refers to the relative measure of dispersion that can be calculated by dividing
MD by the mean/median/mode.
\[ \text{Coefficient of MD} = \frac{MD}{\bar{X}} \]
Where, \(\bar{X}\) = Mean/Median/Mode
Let us understand the concept of MD and the coefficient of MD with the help of an
earlier example in which you calculated the average weight of five friends.
\[ \bar{X} = \frac{35 + 40 + 34 + 39 + 42}{5} = 38 \]
The formula to calculate MD is shown as follows:
\[ MD = \frac{\sum |X_i - \bar{X}|}{n} = \frac{3 + 2 + 4 + 1 + 4}{5} = \frac{14}{5} = 2.8 \]
\[ \text{Coefficient of MD} = \frac{MD}{\bar{X}} = \frac{2.8}{38} = 0.074 \]
Therefore, the mean deviation of the weight of the five friends from the mean value
is 2.8 kg. In other words, the weight of each friend differs from the average weight
by 2.8 kg on average. The relative measure of dispersion (the coefficient of MD) is 0.074.
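The mean deviation and its coefficient for the five weights can be checked with the following Python sketch, which takes the deviations from the arithmetic mean.

```python
weights = [35, 40, 34, 39, 42]  # weights (kg) of the five friends

n = len(weights)
mean_weight = sum(weights) / n

# Absolute deviations from the mean (signs are ignored).
absolute_deviations = [abs(x - mean_weight) for x in weights]

mean_deviation = sum(absolute_deviations) / n
coefficient_of_md = mean_deviation / mean_weight

print(absolute_deviations)
print(mean_deviation, round(coefficient_of_md, 3))
```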
\[ \text{SD of population: } \sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}} \]
\[ \text{SD of sample: } S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} \]
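If the observations are grouped into a frequency table, then the formulas of SD and
variance change as follows:
\[ \sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2 f_i}{n}} \]
\[ \text{and } \bar{X} = \frac{\sum X_i f_i}{\sum f_i}, \quad n = \sum f_i \]
\[ \text{Therefore, } \sigma^2 = \frac{\sum (X_i - \bar{X})^2 f_i}{n} \]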
The coefficient of SD can be calculated by dividing SD with the mean of the series.
It is a relative measure of dispersion.
Let us understand the concepts of SD, the coefficient of SD, and the coefficient of
variance with the help of an example.
Suppose you want to calculate the standard deviation of the weight of five friends
shown in the preceding example. Table 6 shows the data used to calculate standard
deviation, the coefficient of standard deviation, and the coefficient of variance:
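\[ \sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}} = \sqrt{\frac{46}{5}} = \sqrt{9.2} = 3.033 \]
\[ \text{Coefficient of SD} = \frac{\sigma}{\bar{X}} = \frac{3.033}{38} = 0.0798 \]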
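The standard deviation of the five weights can be verified with the following Python sketch. It applies the population and sample formulas directly and compares the results with `pstdev` and `stdev` from the standard `statistics` module.

```python
from math import sqrt
from statistics import pstdev, stdev

weights = [35, 40, 34, 39, 42]  # weights (kg) of the five friends

n = len(weights)
mean_weight = sum(weights) / n

# Sum of squared deviations from the mean.
sum_squared_deviations = sum((x - mean_weight) ** 2 for x in weights)

sd_population = sqrt(sum_squared_deviations / n)      # divides by n
sd_sample = sqrt(sum_squared_deviations / (n - 1))    # divides by n - 1
coefficient_of_sd = sd_population / mean_weight

print(round(sd_population, 3), round(pstdev(weights), 3))
print(round(sd_sample, 3), round(stdev(weights), 3))
print(round(coefficient_of_sd, 4))
```

The sample formula divides by n - 1 instead of n, so it always gives a slightly larger value than the population formula for the same data.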
SELF ASSESSMENT QUESTIONS
10. _______________ is used to study the scattered value near the mean
value of a data series.
11. Which formula is used for calculating the range of a data series?
a. Highest value of series – Lowest value of series
b. Lowest range – Highest range
c. Lowest value of series – Highest value of series
d. None of these
12. Coefficient of Mean Deviation = _____________
13. The symbol used to represent Standard Deviation is _________________.
It may be possible that two data series, which are widely different in nature and
composition, have the same mean and standard deviation. However, when you
plot the data of such series on graphs, you obtain curves with different shapes.
This shows that the measures of central tendency and dispersion are not sufficient
to study the frequency distribution of a data series because they do not talk about
the shape of the frequency distribution curves. Therefore, you need skewness
to gain understanding of the different shapes of various frequency distribution
curves.
[Figure: Frequency distribution curves showing Positive Skewness, Negative Skewness and Symmetric Distribution, with the positions of the mode, median and mean marked on each curve]
Positive skewness implies that the longer tail of the curve extends towards the
right side, with the values concentrated on the left side, whereas negative
skewness implies that the longer tail extends towards the left side, with the
values concentrated on the right side. Skewness is calculated by taking the
difference of mean and mode:
\[ \text{Skewness} = \bar{X} - Z \]
Where, \(\bar{X}\) = Mean and Z = Mode
In positive skewness, the values of the three measures of central tendency are in
the following order:
Mean > Median > Mode
Note: For moderately asymmetrical curves,
Mode = 3 Median – 2 Mean or
\[ Z = 3M - 2\bar{X} \]
\[ \text{Coefficient of Skewness} = S_k = \frac{\bar{X} - Z}{\sigma} \]
\[ S_k = \frac{3\,\text{Mean} - 3\,\text{Median}}{\sigma} \]
For a moderately skewed distribution, if there is more than one mode or if there is
no mode, then you need to calculate skewness and the coefficient of skewness using
the method of moments.
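The median-based form of the coefficient of skewness, Sk = (3 Mean – 3 Median)/σ, can be sketched in Python as follows. The function name is illustrative, and the sketch reuses the five weights from Table 3 purely as sample data because that series has no single mode.

```python
from statistics import mean, median, pstdev


def coefficient_of_skewness(values):
    """Coefficient of skewness based on the mean and the median.

    Uses Sk = (3 * mean - 3 * median) / sigma, where sigma is the
    population standard deviation of the series.
    """
    return (3 * mean(values) - 3 * median(values)) / pstdev(values)


weights = [35, 40, 34, 39, 42]  # weights (kg) from Table 3

sk = coefficient_of_skewness(weights)
print(mean(weights), median(weights), round(sk, 3))
```

Here the mean (38) is smaller than the median (39), so the coefficient is negative, which indicates negative skewness for this small series.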
Let us now calculate skewness and the coefficient of skewness with the help of
an example. Suppose you want to calculate the skewness and the coefficient of
skewness of the data given in Table 7:
\[ \bar{X} = \frac{89}{5} = 17.8 \]
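[Figure: Leptokurtic, Mesokurtic and Platykurtic curves (vertical axis: Frequency; horizontal axis: X)]
Karl Pearson’s coefficient of kurtosis is given as:
\[ \beta_2 = \frac{\mu_4}{\mu_2^2} \]
Where, \(\mu_4\) = Fourth central moment and \(\mu_2\) = Second central moment (variance)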
If the items concentrate very much at the centre, β2 > 3, i.e., the curve is more
peaked than the mesokurtic curve and is called a leptokurtic curve.
If the items concentrate at the centre comparatively lesser than the mesokurtic
curve, β2 < 3, i.e., the curve is less peaked than the mesokurtic curve and is
called a platykurtic curve.
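The coefficient of kurtosis can be computed from the central moments as in the following Python sketch. The function names are illustrative, the weights from Table 3 are reused only as sample data, and the sketch assumes the usual convention that a value of 3 corresponds to the mesokurtic (normal) curve.

```python
def central_moment(values, order):
    """Return the central moment of the given order (mean of (x - mean)^order)."""
    n = len(values)
    mean_value = sum(values) / n
    return sum((x - mean_value) ** order for x in values) / n


def beta_2(values):
    """Coefficient of kurtosis: fourth central moment / (second central moment)^2."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


def classify_kurtosis(b2, tolerance=1e-9):
    """Label a curve by comparing beta_2 with 3 (the mesokurtic value)."""
    if abs(b2 - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if b2 > 3 else "platykurtic"


weights = [35, 40, 34, 39, 42]  # weights (kg) from Table 3

b2 = beta_2(weights)
print(round(b2, 3), classify_kurtosis(b2))
```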
[Figure: Measures of Relationship (Correlation Analysis, Regression Analysis)]
Different tools are used to study the correlation pattern between variables. These
include rank correlation and simple correlation.
Let us discuss each tool.
Rank correlation: Rank correlation refers to the correlation between two
data series in which the data is ranked. Generally, it is found when the data
is qualitative in nature. It was given by Charles Spearman. Therefore, it is
also known as Spearman’s coefficient of correlation. It calculates the degree
of relationship between two types of variables.
The formula to calculate rank correlation is as follows:
\[ \text{Rank Correlation } \rho = R = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \]
Where, \(d_i\) = Difference between the ranks of the ith pair of observations
n = Number of pairs of observations
Simple correlation: Simple correlation is used to find the degree of linear
relationship between two variables. It is the most commonly used measure
to describe relationship between two linearly related variables. It was given
by Karl Pearson. Therefore, it is also known as Karl Pearson’s coefficient of
correlation.
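The rank correlation formula can be sketched in Python as follows. The ranks given by the two judges are hypothetical, and the function assumes that there are no tied ranks, which is the case in which the formula applies directly.

```python
def spearman_rank_correlation(ranks_x, ranks_y):
    """Rank correlation: 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)).

    Assumes both lists contain ranks (not raw scores) and no tied ranks.
    """
    if len(ranks_x) != len(ranks_y):
        raise ValueError("Both series must have the same number of ranks")
    n = len(ranks_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to the same five candidates.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

rho = spearman_rank_correlation(judge_a, judge_b)
print(rho)
```

Identical rankings give a coefficient of 1, whereas exactly reversed rankings give a coefficient of -1.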
[Figure: Scatter plots showing Positive Correlation, Negative Correlation and No Correlation]
\[ r = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n \sum X_i^2 - \left(\sum X_i\right)^2\right]\left[n \sum Y_i^2 - \left(\sum Y_i\right)^2\right]}} \]
Where, \(X_i\) = ith value of X variable
\(\bar{X}\) = Mean of X variable
\[ r = \frac{25 \times 40846 - 698 \times 1316}{\sqrt{(25 \times 22458 - 698 \times 698)(25 \times 75464 - 1316 \times 1316)}} \]
\[ r = \frac{102582}{\sqrt{74246 \times 154744}} \]
\[ r = 0.96 \]
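The calculation above can be reproduced from the column totals alone, as in the following Python sketch. The function names are illustrative; the totals (n = 25, ∑X = 698, ∑Y = 1316, ∑XY = 40846, ∑X² = 22458, ∑Y² = 75464) are the ones used in the worked example.

```python
from math import sqrt


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Simple correlation coefficient computed from the column totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def pearson_r(xs, ys):
    """Simple correlation coefficient computed from paired observations."""
    return pearson_r_from_sums(
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * y for x, y in zip(xs, ys)),
        sum(x * x for x in xs),
        sum(y * y for y in ys),
    )


# Totals taken from the worked example above.
r = pearson_r_from_sums(25, 698, 1316, 40846, 22458, 75464)
print(round(r, 2))
```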
Y = α + βX
Where,
\[ b = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2} \]
\[ a = \frac{1}{n}\left(\sum Y_i - b \sum X_i\right) \]
\[ b = \frac{10 \times 1272.7 - (62)(195)}{10 \times 440.7 - (62)(62)} = 637 \div 563 = 1.1314 \]
\[ a = \frac{1}{10}\left(195 - 1.1314 \times 62\right) = 12.485 \]
Thus, the regression equation for the above data is given as:
Y = 12.485 + 1.1314X
With this equation, the values of Y (monthly sales) can be computed for any given
value of X (no. of customers) as depicted in Table 10 below:
X (No. of Customers)    Y (Monthly Sales)
7.6                     21.08 (12.485 + 1.1314×7.6)
9.3                     23.01 (12.485 + 1.1314×9.3)
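The regression coefficients and the computed values of Y can be checked with the following Python sketch. It uses only the totals that appear in the calculation above (n = 10, ∑X = 62, ∑Y = 195, ∑XY = 1272.7 and ∑X² = 440.7); the function names are illustrative.

```python
def fit_line_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) of the regression line Y = a + bX from the column totals."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


a, b = fit_line_from_sums(n=10, sum_x=62, sum_y=195, sum_xy=1272.7, sum_x2=440.7)


def predict(x):
    """Estimate Y (monthly sales) for a given X (no. of customers)."""
    return a + b * x


print(round(a, 3), round(b, 4))
for x in (7.6, 9.3):
    print(x, round(predict(x), 2))
```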
If the research report contains many descriptive tables, it can be made more
readable and attractive if the most important tables are presented through graphs
and diagrams. In the graphical presentation, facts and figures are gathered first
and then they are depicted in the form of graphs and charts to present the statistical
information. The most frequently used graphs and charts include the following:
Bar Chart: A bar chart represents categorical data with the help of rectangular
bars, plotted vertically or horizontally. The heights or lengths of rectangular
bars are proportional to the values represented by them. The data can be in
the form of absolute frequencies or relative frequencies.
Figure 8 below shows a bar chart to depict the relative frequency/percentage
of shortages of anti-inflammatory medicines in the rural health organisations:
[Figure 8: Bar chart of the shortage of anti-inflammatory medicines (relative frequency, %): Never 31, Rarely 55, Occasionally 11, Frequently 3; axis scale 0 to 60]
Pie Chart: A pie chart is a circular statistical graphic, segregated into different
segments to illustrate the numerical proportions/relative frequency of a
number of items. The arc length of each segment shows the proportionate
quantity represented by it. Pie charts provide a quick overview of the data
presented to the readers. All segments of the pie chart should add up
to 100%.
Figure 9 below shows a pie chart to depict the relative frequency/percentage
of shortages of anti-inflammatory medicines in the rural health organisations:
[Figure 9: Pie chart of the shortage of anti-inflammatory medicines: Never 31%, Rarely 55%, Occasionally 11%, Frequently 3%]
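Since the arc of each segment is proportional to its share, the percentages of the pie chart can be converted into central angles out of 360 degrees. The following Python sketch does this for the four categories shown above; the variable names are illustrative.

```python
# Relative frequencies (%) of the four response categories in the pie chart.
shortage = {"Never": 31, "Rarely": 55, "Occasionally": 11, "Frequently": 3}

# The segments of a pie chart must add up to 100%.
total_percentage = sum(shortage.values())

# Central angle of each segment: its share of the full circle (360 degrees).
angles = {
    category: percentage / total_percentage * 360
    for category, percentage in shortage.items()
}

for category, angle in angles.items():
    print(f"{category:<13}{shortage[category]:>3}%{angle:>8.1f} degrees")
```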
long as there is no gap in the data) to represent continuous data, whereas the
bars in a bar chart are not connected as they represent different categorical
entities. Figure 10 below shows a histogram to depict the frequency of sales
effected by different sales persons in a month, indicating how many sales
persons fall within a particular sales range:
[Figure 10: Absolute Frequency of Sales Effected by Different Sales Persons in a Month (n=60). Histogram; vertical axis: No. of sales persons; horizontal axis: Sales range, with classes (32, 182), (182, 332), (332, 482), (482, 632), (632, 782) and (782, 932)]
Line graph: A line graph or a line chart is generally used to visualise the
value of a particular variable over time. It is useful for showing the trend
of numerical data over a period of time. Two or more distributions (each
depicted by a separate line) can be shown in one graph as long as the
difference between them is easily distinguishable. Line graphs also make it
possible to compare the distributions of different groups, for example, the age
distribution of males and females. Figure 11 below shows a line graph to depict
the daily number of patients being treated at the rural health
organisations in District Y:
[Figure 11: Line graph; vertical axis: Daily number of patients undergoing treatment; horizontal axis: Day number (0 to 12)]
have vertical lines extending from the boxes (called whiskers) to indicate the
variability outside the upper and lower quartiles. For example, variability
between sales patterns effected in Area X and Area Y is shown through box
plots in Figure 12 below:
[Figure 12: Sales Patterns of Food Grains Effected in Area X and Area Y. Box plots; vertical axis: Sales (0 to 900)]
The measures of central tendency are used to study the distribution pattern
of a dataset.
Mean represents the value received after dividing the sum of observations
by the total number of observations.
Median refers to the central value of the given dataset.
Mode refers to the value that has the highest frequency in a data series.
The measures of dispersion refer to the measures that are used to study the
dispersed value near the mean value.
Standard deviation is used to calculate the scattering of values in a given
dataset.
The measure of skewness is used to study the shape of the curve that can be
drawn by plotting the data of a frequency distribution on a graph.
The measures of relationship study the relationship between two or more
variables in a given data series.
After retrieving sufficient data from the questionnaires, they classified the collected
data. To do so, they combined customers’ responses from different cities and then
subgrouped them according to their cities. Next, they formed a table to analyse
the relationship between customers’ satisfaction and the sales of the company:
The correlation between the customers’ satisfaction and the sales of company is as
follows:
\[ r = \frac{25 \times 1973 - 185 \times 239}{\sqrt{(25 \times 1525 - 185 \times 185)(25 \times 2957 - 239 \times 239)}} \]
\[ r = \frac{5110}{8095.41} \]
\[ r = 0.63 \]
Since the correlation coefficient is positive and close to 1, it indicates that the
relationship between the customers’ satisfaction and the sales is positive and
strong.
Similarly, the consultants studied the relationship between different variables, such
as quality of service and customer satisfaction, quality of service and established
standards, and so on. Finally, they concluded that the satisfaction level of the
restaurant’s customers was positive and strong. However, the restaurant’s service
levels were far behind the established quality standards.
QUESTIONS
1. What are the different steps of data processing used in the case study?
(Hint: The consultants used all the steps of data processing, that is, first
they extracted the relevant data. Then, they classified and organised the
information and studied the relationship between variables.)
2. Which type of measure is used in analysing the table and what type of
analysis is used?
(Hint: The measure of relationship is used to analyse the table.)
8.12 EXERCISE
1. Explain the different steps of data processing.
2. What are the different types of data analysis?
3. What are the measures of central tendency? Why are they used?
4. What are the measures of dispersion? Why are they used?
5. What do you understand by ‘skewness’? What is the measure of skewness?
What does its calculated value indicate?
6. What is the purpose of causal analysis?
18. whiskers
CHAPTER
9
THE CONCEPT OF HYPOTHESIS
Table of Contents
Learning Objectives
9.1 Introduction
9.2 Defining Hypothesis
9.2.1 Characteristics of a Good Hypothesis
9.2.2 Types of Hypotheses
Self Assessment Questions
9.3 Hypothesis Testing
9.3.1 Null Hypothesis and Alternative Hypothesis
9.3.2 Decision Rule
9.3.3 Two-tailed Test
9.3.4 One-tailed Test
Self Assessment Questions
9.4 Procedure of Hypothesis Testing
Self Assessment Questions
9.5 Summary
9.6 Key Words
9.7 Case Study
9.8 Exercise
9.9 Answers for Self Assessment Questions
9.10 Suggested Books and e-References
LEARNING OBJECTIVES
After studying this chapter, you will be able to:
Explain the concept of hypothesis
Describe the various types of hypothesis
Explain the use of null and alternative hypotheses in hypothesis testing
Differentiate between two-tailed and one-tailed tests
Describe the procedure of hypothesis testing
9.1 INTRODUCTION
In the previous chapter, you studied data processing and analysis. Now,
you will study the concept of hypothesis.
A hypothesis refers to an assumption that is made about a population parameter;
a sample statistic is then used to verify it. It is a very useful tool to solve
various research problems and issues. A researcher first forms a hypothesis about
a problem and then tests it to check its validity by using statistical measures. The
procedure to utilise test statistics to check whether a hypothesis is true is known
as hypothesis testing.
Suppose a researcher is asked to check whether an organisation’s new
advertisement has resulted in enhanced sales or not. In this case, the researcher
would first form the hypothesis that the new advertisement has no impact on
the organisation’s sales. This hypothesis is known as null hypothesis. After that,
the researcher would form another hypothesis, known as alternative hypothesis,
which states that the new advertisement has a positive impact on the organisation’s
sales. Then, the researcher would analyse the data to find the relationship between
the new advertisement and the organisation’s sales. If he/she finds a relationship
between the new advertisement and the sales, he/she would reject the null
hypothesis and accept the alternative hypothesis.
In the field of research, the concept of hypothesis and hypothesis testing holds a very
special place. The formation of hypothesis helps the researcher remain focused
on the research problem. In addition, it gives direction to the research project by
clearly defining the scope of research. Hypothesis testing assists the researcher in
deriving realistic results, as it takes into consideration the errors due to sampling.
In this chapter, you will learn about the concept of hypothesis and explore the
characteristics and types of hypothesis. The chapter also provides information
about hypothesis testing, null and alternative hypotheses, decision rules, one-
tailed test and two-tailed test.
[Figure: Types of Hypothesis on the Basis of Formulation: Directional Hypothesis, Non-Directional Hypothesis, Null Hypothesis and Alternative Hypothesis]
On the basis of derivation, there are two types of hypotheses, which are explained
as follows:
1. Inductive hypothesis: In inductive hypothesis, you move from specific
observations to broad generalisations. First, you observe a phenomenon.
Then, you form a pattern from your observations. After that, you form a
hypothesis to study the pattern. Finally, you form a theory on the basis
of your study of the pattern. The inductive hypothesis is used to conduct
qualitative studies of subjective variables. In this type of hypothesis, you
should ask open-ended and process-oriented questions.
2. Deductive hypothesis: In this type of hypothesis, you move from a general
statement to a specific, logical conclusion. You start from a theory and based
on it you make a prediction of its consequences. In other words, you predict
what the observations should be if the theory were correct. Finally, analysis
is done to arrive at a conclusion, whether the theory is rejected or accepted
with respect to the problem. In deductive hypothesis, the research goes from
general theory to specific observation. In this type of hypothesis, you should
ask closed-ended and outcome-oriented questions.
On the basis of formulation, there are four types of hypotheses, which are explained
as follows:
1. Directional hypothesis: This hypothesis checks the direction of relationship
between two variables. In directional hypothesis, you use terms, such as
more than, less than, negative and positive. An example of the directional
hypothesis is: In an organisation, women are more productive than men.
2. Non-directional hypothesis: In this hypothesis, the direction of relationship
between two variables cannot be specified. For example, an organisation
wants to get feedback from its employees about their job satisfaction level. In
this example, the test result can be positive or negative depending on the job
satisfaction of the employees.
3. Null hypothesis: This hypothesis states that there is no relation between the two
variables under study. It is denoted by H0. The null hypothesis is used as
the first statement in a hypothesis test, which you (or the researcher) want to
reject. For example, a null hypothesis is: There is no relation between the
number of years of experience held by an individual and his/her performance.
This null hypothesis would be tested for rejection because it is generally held
that experience and performance are related. Researchers are therefore more
interested in disproving or rejecting the null hypothesis.
4. Alternative hypothesis: This hypothesis states that there is a relationship
between two variables under study. It is denoted by H1. It is used as the
second statement in a hypothesis that you want to accept. For example, an
alternative hypothesis can be: There is a relation between the qualification
of an individual and better job opportunities. Since these two variables are
related, you would want to accept this statement.
After the null and alternative hypotheses have been stated, the researcher sets the
decision criteria, for which he/she needs to state the level of significance of the test.
Note: If the null hypothesis is true, the sample mean will, on average, be equal to the
population mean.
The most commonly used levels of significance in statistics are 1%, 5% and 10%.
For example, 5% is the most commonly used level of significance in behavioural
studies. It implies that 5% of the area under the normal curve is used for testing
the hypothesis, and the value bounding this area is taken from the table of the respective
test statistic. For instance, the z-values for the various levels of significance are
shown in Figure 2:
[Figure 2: Normal curve centred on the mean value, with z-values on the X-axis: ±1.64 bound 90% of the area, ±1.96 bound 95% of the area and ±2.58 bound 99% of the area]
In Figure 2, you can see that the areas expressed in percentage and their values
are given on X-axis. Table 1 provides the levels of significance and their z-values:
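The z-values marked on the X-axis of Figure 2 can be reproduced with Python's standard library. The sketch below is only an illustration; the function name is an assumption, and it covers the two-tailed case, where the level of significance is split equally between both tails.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha):
    """z-value that leaves alpha/2 of the area in each tail of the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    z = two_tailed_critical_z(alpha)
    print(f"level of significance = {alpha:.0%}, central area = {1 - alpha:.0%}, z = ±{z:.2f}")
```

Here `NormalDist().inv_cdf` plays the role of the printed z-table: it returns the z-value below which a given share of the area lies.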
In hypothesis testing, the level of significance is very important, as it helps you
in rejecting or accepting a null hypothesis. You should be careful while
determining the level of significance for a problem/topic. The reason is that you
may reject a true hypothesis on the basis of the level of significance. If the level of
significance is 5%, it implies that the probability of rejecting a true hypothesis
is at most 0.05.
After the level of significance has been set, the researcher then proceeds to compute
the test statistic which basically describes how far a sample mean is from the
population mean. The greater the value of test statistic, the farther is the sample
mean from the population mean described in null hypothesis. Thereafter, on the
basis of value of test statistic, a decision is made.
If, assuming the null hypothesis is true, the probability of obtaining the observed
sample mean is less than 5%, then we reject the null hypothesis. On the contrary, if
this probability is more than 5%, then the null hypothesis is retained.
H0: The average recovery time after operation is less than or equal to 7 weeks.
H1: The average recovery time after operation is greater than 7 weeks.
From the preceding two examples, it is clear that H0 is totally opposite of the
statement the researcher wants to study. The researchers always test H0 for
significance, not H1 because they are usually interested in disproving H0.
H0 and H1 are in the descriptive form. The researcher must convert them into the
quantitative form to test them.
H0: μ ≤ 7
H1: μ > 7
Where,
µ = Population mean
You can also formulate a hypothesis for testing with the help of a benchmark. This
benchmark is a numerical value with which you compare your results to
test the hypothesis. This is one of the best and most widely used methods for framing
null and alternative hypotheses because it represents them
in quantitative form. This makes hypothesis testing easier.
For example, in a school, the average weight of every class is 100 (population
mean). You consider all sections of class 10 as a sample (assume there are 5 sections
of class 10) and calculate their average weight (sample mean). Now, you want to
check whether the sample mean is equal to the population mean or not. In this
case, H0 and H1 would be as follows:
H0: X̄ = 100
H1: X̄ ≠ 100
Where,
X̄ = Sample mean
The researchers assume that the null hypothesis is true and proceed further to find
out various methods/possibilities to solve the problem. They try to reject the null
hypothesis.
A hypothesis can never be right or wrong. Rather, it is judged by what you want to
analyse. If a hypothesis is framed in such a way that it can answer your problem,
then it would be right.
It is important to note that different types of errors may occur while testing a
hypothesis. Therefore, the researcher should take into consideration the possibilities
of these errors while taking decisions.
The decision grid, which helps the researcher in taking decisions, is shown in
Figure 3:
                 Accept H0           Reject H0
H0 is true       Correct decision    Type I error
H0 is false      Type II error       Correct decision
As per the grid shown in Figure 3, if H0 is true and it is accepted, then the decision
is correct. If H0 is false and it is rejected, then also the decision is right. However,
if the decision is wrong, two types of errors can occur, which are explained as
follows:
1. Type I errors: These errors occur when the researcher rejects a null hypothesis
(H0) that is actually true. In this case, the decision taken by the
researcher is wrong. Type I errors are also known as errors of the first kind or
false positives. The probability of a Type I error is represented by α.
2. Type II errors: These errors occur when the researcher accepts a null
hypothesis (H0) that should have been rejected. In this case, the decision taken
by the researcher is wrong. Type II errors are also known as errors of the second
kind or false negatives. The probability of a Type II error is represented by β. The
probability of rejecting the null hypothesis when it is false is 1 – β and is called the
power of the test.
If you minimise Type I errors, Type II errors would increase or vice-versa.
Therefore, you have to be very careful while minimising one type of error. You
must remember that both the types of errors can be limited using an appropriate
sample size.
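The trade-off between the two types of errors, and the effect of sample size, can be illustrated numerically. The sketch below assumes a right-tailed z-test with a known population standard deviation; the function name and all the numbers (effect of 0.5, sigma of 2.0, sample sizes of 25 and 100) are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(alpha, effect, sigma, n):
    """Probability of a Type II error (beta) for a right-tailed z-test.

    `effect` is the assumed amount by which the true mean exceeds the
    mean stated in H0; `sigma` is the known population standard deviation.
    """
    z_critical = NormalDist().inv_cdf(1 - alpha)
    shift = effect * sqrt(n) / sigma  # how far the true mean moves the z-statistic
    return NormalDist().cdf(z_critical - shift)

for alpha in (0.10, 0.05, 0.01):
    for n in (25, 100):
        beta = type_ii_error(alpha, effect=0.5, sigma=2.0, n=n)
        print(f"alpha = {alpha:.2f}, n = {n:3d}, beta = {beta:.3f}, power = {1 - beta:.3f}")
```

For a fixed sample size, a smaller α gives a larger β; for a fixed α, a larger sample gives a smaller β and hence a more powerful test.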
For example, a company produces tennis balls and has laid down that a ball
should weigh 55 grams in order to get good ratings. Samples are drawn on an
hourly basis and checked for the ideal weight. In a given hour, 11 balls are checked
randomly; their mean weight is calculated as 55.006 grams, with an SD of 0.029 grams.
If the mean weight differs significantly from 55 grams at the 1% level of significance,
the production line is shut down. Let us see if the production line should be shut
down in this case.
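Here,
μp = 55 g
H0: μp = 55 g
H1: μp ≠ 55 g
α = 1% = 0.01
p = 1 – (α/2) = 0.995
For n – 1 = 10 degrees of freedom, the tabulated value is tp = 3.169.
Now, calculate tc.
$$t_c = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
$$t_c = \frac{55.006 - 55}{0.029/\sqrt{11}} = \frac{0.006}{0.00874} \approx 0.686$$
Since tc = 0.686 is less than tp = 3.169, H0 is not rejected and the production line
need not be shut down.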
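The tennis-ball calculation can be sketched in a few lines. The function name is an assumption; the sample size of 11 and the tabulated value 3.169 are taken from the example, and the decision simply compares the absolute calculated value with the tabulated one.

```python
from math import sqrt

def t_statistic(sample_mean, mu0, s, n):
    """One-sample t-statistic: distance of the sample mean from mu0 in standard-error units."""
    return (sample_mean - mu0) / (s / sqrt(n))

t_c = t_statistic(sample_mean=55.006, mu0=55.0, s=0.029, n=11)
t_p = 3.169  # tabulated value quoted in the example (two-tailed, alpha = 0.01, 10 d.f.)

shut_down = abs(t_c) > t_p  # reject H0 only if t_c falls in either tail
print(f"t_c = {t_c:.3f}, shut down the line: {shut_down}")
```

With 11 balls in the sample, the calculated value comes out at roughly 0.69, far inside the 'Fail to Reject H0' region, so the line keeps running.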
The two-tailed test can be shown on a normal curve in Figure 4:
[Figure 4: Two-tailed test on a normal curve, with a central 'Fail to Reject H0' region and a 'Reject H0' region in each tail]
For example, assume that the null hypothesis states that the mean weight of people is
60 kg or more. In this case, the alternative hypothesis would be that the mean
weight of people is less than 60 kg. Here, the rejection region comprises the
range of numbers from 0 to 60 located on the left side of the sampling distribution (the set of
numbers that are less than 60).
The one-tailed test can also be shown on a normal curve, as in Figure 5:
[Figure 5: One-tailed test on a normal curve around the mean value. Acceptance region: if the sample mean lies in this area, accept H0. Rejection region, in the left tail beyond z = –1.64: if the sample mean lies in this area, reject H0.]
The level of significance can be represented with the help of α and α/2 in one-
tailed test and two-tailed test, respectively. For example:
In one-tailed test, if the level of significance is 5%, then α is 5%. In this case,
the value of test statistics would be determined at 0.05.
In two-tailed test, if the level of significance is 5%, then the value of test
statistics would be determined at 0.025 (α/2) in each tail.
[Figure: Procedure of hypothesis testing, running from step 1 (State H0 and H1) to step 6 (Take a Decision)]
H0: μp = 20 kg
H1: μp ≠ 20 kg
Where, μp= Population mean
2. State the level of significance: This refers to deciding the level of significance
(α) for the hypothesis test. The most commonly used level of significance is
5%. This is because 5% is considered neither too large nor too small for
accepting or rejecting a hypothesis.
3. Decide on the type of the test of significance: The test of significance is used
to check the hypothesis at a given level of significance. There are various
types of tests of significance, such as t-test, z-test, and F-test. The selection of
a test depends on various factors such as the sample size, variance and type
of population. For example, you use the t-test when the sample size is less
than 30 and the z-test when the sample size is more than 30.
4. State the decision rule: It refers to determining the conditions under
which the null hypothesis is accepted or rejected. If the decision rule is
not determined correctly, then there are chances of committing Type I and
Type II errors. Therefore, you should be careful while making the decision
rule.
5. Calculate the test statistic: It refers to ascertaining the value of the test statistic
used to accept or reject the hypothesis.
6. Take a decision: It refers to either accepting or rejecting H0 on the basis of
the calculated value of the test statistic. If the calculated probability is equal to
or smaller than the α value (in a one-tailed test) or α/2 (in a two-tailed
test), then the null hypothesis is rejected. However, if the calculated probability is
greater than this value, then the null hypothesis is accepted. Rejecting H0 may lead
to a Type I error, whereas accepting H0 may lead to a Type II error.
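The six steps above can be sketched as a single function for the simplest case, a z-test with a known population standard deviation. The function name, the `tail` argument and the sample figures (sample mean 20.5 kg, standard deviation 1.5 kg, 36 observations) are assumptions made for illustration; only the hypothesised mean of 20 kg comes from the text.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample z-test returning (z, tail_probability, reject_h0)."""
    if tail not in ("two", "left", "right"):
        raise ValueError("tail must be 'two', 'left' or 'right'")
    # Step 5: calculate the test statistic.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    # Step 4: decision rule. Compare the tail probability with alpha
    # (one-tailed test) or with alpha/2 (two-tailed test).
    if tail == "two":
        tail_probability = 1 - std_normal.cdf(abs(z))
        threshold = alpha / 2
    elif tail == "right":
        tail_probability = 1 - std_normal.cdf(z)
        threshold = alpha
    else:
        tail_probability = std_normal.cdf(z)
        threshold = alpha
    # Step 6: take a decision.
    return z, tail_probability, tail_probability <= threshold

# Steps 1 to 3: H0: mu = 20 kg, H1: mu != 20 kg, alpha = 5%, two-tailed z-test.
z, prob, reject = z_test(sample_mean=20.5, mu0=20.0, sigma=1.5, n=36)
print(f"z = {z:.2f}, tail probability = {prob:.4f}, reject H0: {reject}")
```

Passing `tail="left"` or `tail="right"` switches the same function to a one-tailed test, in which the whole of α sits in a single tail.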
9.5 SUMMARY
Hypothesis is a proposed explanation given for an observed situation.
Inductive hypothesis is a type of derivation hypothesis, where you move
from specific observations to broad generalisations.
Deductive hypothesis is a type of derivation hypothesis in which you move
from a general statement to a specific conclusion.
Directional hypothesis refers to the formulation hypothesis that checks the
direction of relationship between two variables.
Non-directional hypothesis refers to the formulation hypothesis, where the
direction of the relationship between two variables cannot be specified.
Null hypothesis refers to the hypothesis in which there is no significant
relation between two variables under study. It is denoted by H0. It represents
the first statement of a hypothesis that is assumed to be true.
Alternative hypothesis states that there is a relationship between two
variables under study. It is denoted by H1. It represents the second statement
of a hypothesis that is assumed to be false.
The decision rule states that before accepting or rejecting a null hypothesis,
the researcher should keep in mind all the criteria set for the hypothesis.
Two-tailed test is a part of non-directional hypothesis that talks about
the relationship between two variables under study, but does not explain
anything about the direction of the relationship.
One-tailed test is a part of directional hypothesis that talks not only about
the relationship between two variables under study, but also the direction of
relationship.
Hypothesis testing is a step-by-step process that starts with the formulation
of hypothesis and ends with decision making.
Thereafter, the researcher finds out the value of t-statistic at 95% confidence and df
at 6 using t-table, which comes out to be 1.943.
$$t_c = \frac{0.2}{0.0809} = 2.47$$
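[Figure: t-distribution centred on μ, with 95.0% of the area in the 'Fail to Reject' region and 5.0% in the 'Reject' region beyond tp = 1.943; the calculated tc = 2.47 falls in the 'Reject' region]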
It can be seen that the calculated t-value (tc) lies in the rejection region. Therefore, the
researcher rejects the null hypothesis. Rejecting the null hypothesis means that
the sample was not acceptable and it can be stated that there is some issue in the
production of alternators at AM Pvt. Ltd. which it must find out and resolve.
QUESTIONS
1. What would be the value of tc if the sample had 49 alternators in it?
(Hint:
$$t_c = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{71.3 - 71.1}{0.214/\sqrt{49}} = \frac{0.2}{0.0306} \approx 6.54$$)
2. What would be the value of tc if the standard deviation was changed to 0.851?
(Hint:
$$t_c = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{71.3 - 71.1}{0.851/\sqrt{7}} = \frac{0.2}{0.322} \approx 0.62$$
In this case, the null hypothesis would have been accepted.)
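The two hints are 'what-if' variations of the same formula, so they can be explored side by side. The sketch below assumes the sample mean (71.3), hypothesised mean (71.1), standard deviation (0.214) and original sample size (7) that appear in the hints; the scenario labels are illustrative.

```python
from math import sqrt

def t_statistic(sample_mean, mu0, s, n):
    """One-sample t-statistic for a given standard deviation and sample size."""
    return (sample_mean - mu0) / (s / sqrt(n))

# Each scenario changes one input of the case study: (standard deviation, sample size).
scenarios = {
    "original sample (s = 0.214, n = 7)": (0.214, 7),
    "larger sample (s = 0.214, n = 49)": (0.214, 49),
    "larger SD (s = 0.851, n = 7)": (0.851, 7),
}

for label, (s, n) in scenarios.items():
    t_c = t_statistic(71.3, 71.1, s, n)
    print(f"{label}: t_c = {t_c:.2f}")
```

A larger sample shrinks the standard error and pushes tc up, whereas a larger standard deviation inflates the standard error and pulls tc down, below the tabulated 1.943 in the second hint.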
9.8 EXERCISE
1. Describe the hypothesis and its types in detail.
2. What are the characteristics of a good hypothesis?
3. Explain the hypothesis testing in detail.
4. Explain the following terms:
a. Null Hypothesis.
b. Two-tailed test.
c. Decision rule.
CHAPTER
10
PARAMETRIC TESTS
Table of Contents
Learning Objectives
10.1 Introduction
10.2 Types of Hypothesis Testing
Self Assessment Questions
10.3 Parametric Tests
Self Assessment Questions
10.4 One-Sample Tests - Different Situations in Which One Sample Test is Used
10.4.1 Exploring Case-I
10.4.2 Exploring Case-II
10.4.3 Exploring Case-III
10.4.4 Exploring Case-IV
10.4.5 Exploring Case-V
10.4.6 Exploring Case-VI
Self Assessment Questions
10.5 Two-Sample Tests
10.5.1 Differences between Two Independent Samples
10.5.2 Differences between Two Proportions
10.5.3 Comparing Two Related Samples
10.5.4 Study of Equality of Variances of Two Populations
Self Assessment Questions
10.6 Exploring ANOVA
10.6.1 One-Way ANOVA
10.6.2 Two-Way ANOVA
Self Assessment Questions
10.7 Summary
10.8 Key Words
10.9 Case Study
10.10 Exercise
10.11 Answers for Self Assessment Questions
10.12 Suggested Books and e-References
LEARNING OBJECTIVES
After studying this chapter, you will be able to:
Distinguish between parametric and non-parametric tests for testing hypotheses
Describe the different types of parametric tests
Explain the concepts of one-sample tests and two-sample tests
Describe the concept of ANOVA
10.1 INTRODUCTION
In the previous chapter, you studied how to test a hypothesis to find the solution of
a research problem. To check the validity of a hypothesis, you can use two main
types of tests, parametric tests and non-parametric tests. This chapter describes the
various types of parametric tests.
Parametric tests are statistical measures used in the analysis phase of research to
draw inferences and conclusions to solve a research problem. There are various
types of parametric tests, such as z-test, t-test and F-test. Selection of a particular
test for a research depends upon various factors, such as the type of population,
sample size, Standard Deviation (SD) and variance of population. It is important
for a researcher to identify the appropriate test to maintain the authenticity and
validity of research results.
In this chapter, you will learn about the concept of parametric tests. You will learn
about one-sample and two-sample tests. You will also learn to apply z-test, t-test
and F-test in different conditions and scenarios for one-sample and two-sample
tests.
10.2 TYPES OF HYPOTHESIS TESTING
A hypothesis can be tested by using a large number of tests. Therefore, researchers
have found it more convenient to categorise these tests on the basis of their
similarities and differences. Hypothesis tests are divided into two types, as shown
in Figure 1:
[Figure 1: Types of Hypothesis Tests: Parametric tests and Non-parametric tests]
Parametric tests: In these tests, the researcher makes assumptions about the
parameters of the population from which a sample is derived. An example
of a parametric test is z-test.
Non-parametric tests: These are distribution-free tests of hypotheses. Here,
the researcher does not make assumptions about the parameters of the
population from which a sample is derived. An example of a non-parametric
test is the Kruskal Wallis test.
Self Assessment Questions
1. What do you call the hypothesis tests where the researcher makes
assumptions about the parameters of the population from which a
sample is derived?
a. Non-parametric tests b. Parametric tests
c. Chi-square test d. Distribution-free tests
For example, t-test assumes that the variable under study in population is normally
distributed. Researchers calculate the parameters of population using various test
statistics. Then, they test the hypothesis by comparing the calculated value of
parameters with the benchmark value given in the problem. The dependent
variable in parametric tests is mostly measured on an interval or ratio scale.
[Figure: Parametric Tests: z-test, t-test and F-test]
Note: If the sample size is small but the population is normal and the population
standard deviation is known, then the z-test can be used.
F-test: This test is used to compare the variances of two samples under
study by taking the ratio of the two variances. The
F-distribution is a right-skewed distribution that is most commonly used in
Analysis of Variance (ANOVA). Here, the test statistic has an F-distribution.
The F-value (test statistic) is calculated for the present data and compared
with the F-value at the level of significance decided earlier in the
question/problem. In an F-test, there are two independent degrees of freedom,
for the numerator and the denominator respectively. The degrees of freedom (d.f.) of
the two samples are calculated separately by subtracting one from the number
of observations. After that, the critical F-value is read from the F-distribution
table.
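The calculation of the F-value and its two degrees of freedom can be sketched as follows. The function name and both data sets are hypothetical and serve only to illustrate the steps; the critical F-value still has to be looked up in an F-distribution table.

```python
from statistics import variance

def f_statistic(sample_a, sample_b):
    """Ratio of two sample variances with (numerator d.f., denominator d.f.)."""
    f_value = variance(sample_a) / variance(sample_b)
    df_numerator = len(sample_a) - 1    # one less than the observations in sample A
    df_denominator = len(sample_b) - 1  # one less than the observations in sample B
    return f_value, df_numerator, df_denominator

# Hypothetical measurements from two machines.
machine_a = [20, 22, 19, 24, 25, 21]
machine_b = [21, 21, 22, 20, 23]

f_value, df1, df2 = f_statistic(machine_a, machine_b)
print(f"F = {f_value:.3f} with ({df1}, {df2}) degrees of freedom")
```

The calculated F-value is then compared with the table value for the same pair of degrees of freedom at the chosen level of significance.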
Note: All hypothesis tests are carried out assuming that the null hypothesis is true.
Parametric tests are further divided into two parts – one-sample tests and two-
sample tests. You will learn more about them in the next sections.
ASSUMPTIONS OF F-TEST
Self Assessment Questions
2. Which of the following parametric tests is used to study the mean and
proportion of samples having a sample size more than 30?
a. t-test b. F-test
c. Chi-square test d. z-test
3. The ___________ is used to compare the mean of samples when the
sample size is less than 30 and the population variance is unknown.
4. Which test is used to compare the significant difference between the
variances of two samples under study?
a. z-test b. Chi-square test
c. t-test d. F-test
5. The degree of freedom is calculated by subtracting ___________ from the
___________ for t-test.
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$$
Where,
μ = Population mean
N = Population size
n = Sample size
X̄ = Sample mean
$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$$
Where,
μ = Population mean
n = Sample size
X̄ = Sample mean
Solution: The null hypothesis and the alternative hypothesis are as follows:
H0: The average production of product A is the same as the overall production
of all products combined.
H1: The average production of product A is more than the overall production of
all products combined.
Or,
H0: µ = 8 cm
H1: µ > 8 cm
Since the population is finite, the researcher uses the following formula for z-test
to test the hypothesis for significance:
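$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$$
$$z = \frac{10 - 8}{2.5/\sqrt{35}} \times \sqrt{\frac{50 - 35}{50 - 1}}$$
$$z = \frac{2}{2.5/5.91} \times \sqrt{\frac{15}{49}} \approx 2.61$$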
The z-value for the 5% level of significance for one-tailed test is + 1.64.
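[Figure 3: One-tailed test on a normal curve, with the acceptance region to the left of z = +1.64 and the rejection region to its right; the calculated z = +2.61 lies in the rejection region]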
In Figure 3, it can be observed that the calculated value of z lies in the rejection
region; therefore, H0 is rejected. This implies that the average diameter production
of product A is more than the overall production.
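The z-test with the finite population correction can be sketched as below. The function name is an assumption, and the inputs (sample mean 10, hypothesised mean 8, population standard deviation 2.5, sample size 35, population size 50) are read off the substitution shown in the solution.

```python
from math import sqrt

def z_finite_population(sample_mean, mu0, sigma, n, N):
    """z-statistic multiplied by the finite population correction sqrt((N - n) / (N - 1))."""
    correction = sqrt((N - n) / (N - 1))
    return (sample_mean - mu0) / (sigma / sqrt(n)) * correction

z = z_finite_population(sample_mean=10, mu0=8, sigma=2.5, n=35, N=50)
z_critical = 1.64  # one-tailed value at the 5% level, as quoted in the text

print(f"z = {z:.2f}, reject H0: {z > z_critical}")
```

Without rounding the intermediate steps, the value comes out at about 2.62; the +2.61 shown in Figure 3 reflects rounded intermediate values, and the decision is the same either way.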
10.4.3 EXPLORING CASE-III
In Case-III, the population is normal and infinite, the sample size is large and the
population variance is unknown. In this case, the following test statistic is used:
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
Where,
μ = Population mean
n = Sample size
s = Standard Deviation of sample
X̄ = Sample mean
Let us understand the application of Case-III with the help of an example.
Example 2: The rating given by 36 existing customers of an organisation from the
south part of a city to a newly launched product is as follows (1 being the lowest
and 10 being the highest rating):
5, 6, 10, 9, 8, 7, 2, 3, 8, 9, 7, 9, 10, 4, 3, 2, 10, 8, 9, 6, 2, 6, 5, 8, 9, 7, 7, 7, 7, 2, 4, 5, 5, 5,
10, 10
The marketers have the average rating from the whole city as 7.5. Now, the
organisation wants to know whether the south part also has the same rating. Use
5% as the level of significance.
Solution: The null hypothesis and the alternative hypothesis are as follows:
H0: The average rating of the south part of the city is the same as the average rating
of the city
H1: The average rating of the south part of the city is not the same as the average
rating of the city
Or,
H0: μ = 7.5
H1: μ ≠ 7.5
Where, μ = population mean, that is, the rating given by the customers in the south
part of the city
The data and the calculation part of the previous problem are shown in Table 2:
Sample mean:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{266}{36} \approx 7.39$$
Population mean (μ) = 7.5
Sample size (n) = 36
Since the standard deviation for the population is not given, the researcher needs
to calculate the SD for the sample.
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{106.56}{35}} = \sqrt{3.044} \approx 1.745$$
The population is infinite; therefore, the researcher uses the following formula for
t-test to test the hypothesis for significance:
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{7.39 - 7.5}{1.745/\sqrt{36}} = \frac{-0.11 \times 6}{1.745} \approx -0.38$$
The t-value for the 5% level of significance for two-tailed test is ± 2.03.
After checking the t-value for significance, the researcher applies two-tailed test.
[Figure 4: Two-tailed test on a normal curve, showing the acceptance region]
In Figure 4, it can be observed that the calculated t-value lies in the acceptance
region; therefore, we do not reject H0. This implies that the average rating of the
south part of the city is the same as the average rating of the city.
n = Sample size
x = value to be standardised
Solution: The null hypothesis and the alternative hypothesis are as follows:
H0: The proportion of girl students observed in the survey is the same as in the
college record.
H1: The proportion of girl students observed in the survey is different from their
proportion in the college record.
Or,
H0: p = 0.40
H1: p ≠ 0.40
Where,
p= Probability of success, that is, the actual proportion of girls in the college
p = 0.40
q = 1 – 0.40
q = 0.60
Sample size (n) = 3000
Observed sample proportion, (p̂) = 1450/3000
(p̂) = 0.4833
$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$
$$z = \frac{0.4833 - 0.40}{\sqrt{(0.40 \times 0.60)/3000}} = \frac{0.0833}{0.00894} \approx 9.32$$
The z-value for the 5% level of significance for two-tailed test is ± 1.96. The graphical
representation of the preceding solution is shown in Figure 5:
[Figure 5: Calculated z-value When the Proportion of Population and Sample Means Are Given (normal curve with acceptance and rejection regions)]
In Figure 5, it can be observed that the z-value lies in the rejection region; therefore,
H0 is rejected. This implies that the proportion of girl students observed in the
survey is different from their proportion in the college record. It can be interpreted
from the calculated z-value that the proportion of girls in the college has
increased.
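The proportion test can be sketched directly from the figures in the solution (1450 girls in a sample of 3000, against a recorded proportion of 0.40). The function name is an assumption used for illustration.

```python
from math import sqrt

def one_proportion_z(successes, n, p0):
    """z-statistic for a sample proportion against a hypothesised proportion p0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # q = 1 - p under H0
    return (p_hat - p0) / standard_error

z = one_proportion_z(successes=1450, n=3000, p0=0.40)
z_critical = 1.96  # two-tailed value at the 5% level, as quoted in the text

print(f"z = {z:.2f}, reject H0: {abs(z) > z_critical}")
```

Because the standard error is computed under H0, it uses the recorded proportion 0.40 and not the observed 0.4833.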
μ = Population mean
n = Sample size
X = Sample mean
The average recorded package for the marketing executive post is ₹ 2 lakhs. The
researcher wants to know whether the average recorded package is valid for this
group or not. Use 5% as the level of significance.
Solution: The null hypothesis and the alternative hypothesis are as follows:
H0: The average recorded package and the sample average income of group are
the same.
H1: The average recorded package and the sample average income of group are
different.
Or,
H0: µ = 2,00,000
H1: µ ≠ 2,00,000
Where,
µ = Population mean, that is, the mean income for the marketing executive post
against which the group of 25 executives is compared.
The data and the calculation part for this example are shown in Table 3:
Table 3: Income of People at the Marketing Executive Post
No. of Observations    Income (lakhs)    Xi – X̄     (Xi – X̄)²
1                      2                 0.04       0.0016
2                      1.9               –0.06      0.0036
3                      2                 0.04       0.0016
4                      2                 0.04       0.0016
5                      1.9               –0.06      0.0036
6                      2                 0.04       0.0016
7                      1.9               –0.06      0.0036
8                      2                 0.04       0.0016
9                      1.9               –0.06      0.0036
10                     2                 0.04       0.0016
11                     2                 0.04       0.0016
12                     1.9               –0.06      0.0036
13                     2                 0.04       0.0016
14                     1.9               –0.06      0.0036
15                     2                 0.04       0.0016
16                     2                 0.04       0.0016
17                     1.8               –0.16      0.0256
18                     2                 0.04       0.0016
19                     2                 0.04       0.0016
20                     2                 0.04       0.0016
21                     2                 0.04       0.0016
22                     1.9               –0.06      0.0036
23                     2                 0.04       0.0016
24                     1.9               –0.06      0.0036
25                     2                 0.04       0.0016
Total                  ∑Xi = 49                     ∑(Xi – X̄)² = 0.08
Sample mean:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{49}{25} = 1.96$$
Since the standard deviation for the population is unknown, the researcher needs
to calculate the standard deviation for the sample as follows:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{0.08}{24}} = 0.058$$
The population is infinite; therefore, the researcher uses the following formula for
t-test to test the hypothesis for significance:
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{1.96 - 2}{0.058/\sqrt{25}} = \frac{-0.04}{0.0116} = -3.45$$
Degrees of freedom (d.f.) = n – 1
= 25 – 1
= 24
The t-value for the 5% level of significance for two-tailed test and 24 d.f. is ±2.064.
The graphical representation of the preceding solution is shown in Figure 6:
[Figure 6: Two-tailed test on a normal curve, showing the acceptance and rejection regions]
In Figure 6, it can be observed that the calculated t-value lies in the rejection region;
therefore, H0 is rejected. This implies that the average recorded package and the
sample average of the income of the group are different. It can be interpreted that
the average income for the marketing executive post has decreased in the market.
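The calculation above can be checked with a short Python sketch. It uses only the 25 incomes listed in Table 3 and the tabulated critical value quoted in the text; the function name `one_sample_t` is an illustrative choice, not something defined in the chapter. Because the sketch does not round the standard error to 0.0116, it gives t ≈ –3.46 rather than –3.45, which does not change the decision.

```python
import math

# Incomes (in lacs) of the 25 executives, copied from Table 3.
incomes = [2, 1.9, 2, 2, 1.9, 2, 1.9, 2, 1.9, 2, 2, 1.9, 2, 1.9,
           2, 2, 1.8, 2, 2, 2, 2, 1.9, 2, 1.9, 2]


def one_sample_t(sample, mu):
    """Return (t, degrees of freedom) for H0: population mean == mu."""
    n = len(sample)
    mean = sum(sample) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in sample)
    s = math.sqrt(sum_sq_dev / (n - 1))      # sample standard deviation
    t = (mean - mu) / (s / math.sqrt(n))     # t = (X-bar - mu) / (s / sqrt(n))
    return t, n - 1


t_value, dof = one_sample_t(incomes, mu=2.0)

# Two-tailed critical value at the 5% level for 24 d.f., as quoted in the text.
T_CRITICAL = 2.064
reject_h0 = abs(t_value) > T_CRITICAL

print(f"t = {t_value:.2f}, d.f. = {dof}, reject H0: {reject_h0}")
```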
$$t = \frac{\bar{X} - \mu}{\dfrac{s}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}}$$
Where,
μ = Population mean
n = Sample size
X = Sample mean
Differences between two proportions
Comparing two-related samples
Equality of the variances of two populations
The two-sample tests for these different situations are discussed in detail, with
examples, in the following sections.
Situation I
In this situation, the population is normal, the sample size is large and the population
variance is unknown. The researcher can use either a two-tailed test or a one-tailed
test depending on the alternate hypothesis of the research. If the researcher wants
to compare two samples drawn from two different populations, then he/she
would use the following test statistic:
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{(s_1^2/n_1) + (s_2^2/n_2)}}$$
Where,
Or,
H0: μ1 = μ2
H1: μ1 ≠ μ2
Where,
The data and the calculation part of the preceding problem are shown in Table 5:
Sample mean of Brand A:
$$\bar{X}_1 = \frac{\sum_{i=1}^{n} X_{1i}}{n} = \frac{315}{35} = 9$$
Sample mean of Brand B:
$$\bar{X}_2 = \frac{\sum_{i=1}^{n} X_{2i}}{n} = \frac{328}{35} = 9.37 \approx 9.4$$
$$s_1 = \sqrt{\frac{\sum (X_{1i} - \bar{X}_1)^2}{n_1 - 1}} = \sqrt{\frac{36}{34}} = \sqrt{1.058} = 1.028$$
$$s_2 = \sqrt{\frac{\sum (X_{2i} - \bar{X}_2)^2}{n_2 - 1}} = \sqrt{\frac{8.2}{34}} = \sqrt{0.2411} = 0.491$$
Since the sample size is more than 30 and two samples are under study, the
researcher applies the following z-test:
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{(s_1^2/n_1) + (s_2^2/n_2)}} = \frac{9 - 9.4}{\sqrt{\dfrac{(1.028)^2}{35} + \dfrac{(0.491)^2}{35}}}$$
[Figure 7: plot marking the acceptance region and the rejection region]
In Figure 7, it can be observed that the z-value lies in the rejection region; therefore,
H0 is rejected. The popularity of Brand A is not the same as the popularity of
Brand B.
Situation-II
In this situation, the population is normal, the sample size is large, and the
population variance is known. The researcher can use either two-tailed test or one-
tailed test depending on the alternate hypothesis of the research. If the researcher
wants to compare two samples drawn from the same population, then he/she
would use the following test statistic:
$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Where,
Or,
H0: μ1 = μ2
H1: μ1 ≠ μ2
Since the sample size is more than 30, the population variance is known, and two
samples are under study, the researcher would apply the following z-test:
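$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
$$z = \frac{1000 - 1200}{\sqrt{(14)^2 \left(\dfrac{1}{500} + \dfrac{1}{400}\right)}}$$
$$z = \frac{-200}{\sqrt{196 \times \dfrac{4 + 5}{2000}}}$$
$$z = \frac{-200}{\sqrt{1764/2000}}$$
$$z = \frac{-200}{0.939}$$
$$z = -212.99$$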
The z-value for the 5% level of significance for two-tailed test is ± 1.96. The graphical
representation of the preceding solution is shown in Figure 8:
[Figure 8: plot marking the acceptance region and the rejection region]
In Figure 8, it can be observed that the z-value lies in the rejection region; therefore,
H0 is rejected. This implies that the population means of product P and Q are
different. It can be interpreted that the calculated z-value showing the difference
between means of two samples is statistically significant.
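As a rough check of the arithmetic in this situation, the z-statistic can be sketched in Python. The inputs (means 1000 and 1200, σp = 14, n1 = 500, n2 = 400) are the figures that appear in the worked calculation; the function name `two_sample_z` is hypothetical. Without rounding the denominator to 0.939, the result is about –212.96 instead of –212.99.

```python
import math


def two_sample_z(mean1, mean2, sigma_p, n1, n2):
    """z = (X1-bar - X2-bar) / sqrt(sigma_p^2 * (1/n1 + 1/n2))."""
    standard_error = math.sqrt(sigma_p ** 2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error


z = two_sample_z(mean1=1000, mean2=1200, sigma_p=14, n1=500, n2=400)

# Two-tailed critical value at the 5% level, as quoted in the text.
Z_CRITICAL = 1.96
reject_h0 = abs(z) > Z_CRITICAL

print(f"z = {z:.2f}, reject H0: {reject_h0}")
```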
Situation-III
In this situation, the population is normal, the sample size is small, and the
population variance is unknown. The researcher can use either two-tailed test or
one-tailed test on the basis of research problem and the alternative hypothesis. If the
researcher wants to compare two samples drawn from two different populations,
then he/she would use the following test statistic:
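$$t = \frac{\bar{X}_1 - \bar{X}_2}{SE}$$
Where,
SE = Standard Error
$$SE = S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
$$S_p = \sqrt{\frac{\sigma_1^2 (n_1 - 1) + \sigma_2^2 (n_2 - 1)}{(n_1 - 1) + (n_2 - 1)}}$$
$$\therefore t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2 (n_1 - 1) + \sigma_2^2 (n_2 - 1)}{(n_1 - 1) + (n_2 - 1)}} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$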
Where,
$$t = \frac{100 - 200}{\sqrt{\dfrac{(5.5)^2 (10 - 1) + (6.5)^2 (10 - 1)}{(10 - 1) + (10 - 1)}} \sqrt{\dfrac{1}{10} + \dfrac{1}{10}}}$$
$$t = \frac{-100}{\sqrt{\dfrac{9 (30.25 + 42.25)}{18} \times \dfrac{1}{5}}}$$
$$= \frac{-100}{\sqrt{7.25}}$$
$$= \frac{-100}{2.692} \approx \frac{-100}{2.7}$$
$$= -37.03$$
The t-value for the 5% level of significance for two-tailed test with 18 as degree
of freedom is ± 2.101. The graphical representation of the preceding solution is
shown in Figure 9:
[Figure 9: plot marking the acceptance region and the rejection region]
In Figure 9, it can be observed that the t-value lies in the rejection region; therefore,
H0 is rejected. This implies that the average sales volume of City A is not equal to
the average sales volume of City B. It can be interpreted from the calculated t-value
that the difference between the means of the two samples is statistically significant.
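The pooled standard error used in this situation is easy to get wrong by hand, so a minimal Python sketch may help. It plugs in the figures from the worked calculation (means 100 and 200, standard deviations 5.5 and 6.5, ten observations per city); `pooled_t` is an illustrative name. Keeping the denominator unrounded gives t ≈ –37.14, slightly different from the –37.03 obtained above after rounding 2.692 to 2.7.

```python
import math


def pooled_t(mean1, mean2, sd1, sd2, n1, n2):
    """Two-sample t-statistic with a pooled standard deviation.

    Returns (t, degrees of freedom), where d.f. = (n1 - 1) + (n2 - 1).
    """
    dof = (n1 - 1) + (n2 - 1)
    pooled_variance = (sd1 ** 2 * (n1 - 1) + sd2 ** 2 * (n2 - 1)) / dof
    standard_error = math.sqrt(pooled_variance) * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / standard_error, dof


t_value, dof = pooled_t(mean1=100, mean2=200, sd1=5.5, sd2=6.5, n1=10, n2=10)

# Two-tailed critical value at the 5% level for 18 d.f., as quoted in the text.
T_CRITICAL = 2.101
reject_h0 = abs(t_value) > T_CRITICAL

print(f"t = {t_value:.2f}, d.f. = {dof}, reject H0: {reject_h0}")
```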
10.5.2 DIFFERENCES BETWEEN TWO PROPORTIONS
In this study, a researcher finds the relationship between two samples that are
given in the form of proportions. The researcher tries to find whether the two
proportions are significantly different from each other or not. The samples are
drawn from the same or different populations. This study can also be used to
compare the proportions of a sample and a population:
Where,
Example 8: In a college, there are two streams: science and commerce. The college
management wants to find out whether there is a significant difference between
the proportions of average students (students who are neither toppers nor laggards
with respect to study) of the two streams. Therefore, the management conducts a
survey and finds out that 350 students out of 500 students of the science stream are
under the category of average students. In the case of the commerce stream, 550
students out of 600 students are under the category of average students. Use 5% as
the level of significance.
Solution: The null hypothesis and the alternative hypothesis are as follows:
Or,
H0: p1 = p2
H1: p1 ≠ p2
Where,
p1 = Proportion of success in the science stream
$$z = \frac{0.2}{0.029}$$
$$z = 6.9$$
The z-value for the 5% level of significance for two-tailed test is ± 1.96.
The graphical representation of the preceding solution is shown in Figure 10:
Figure 10: Rejection of the Calculated z-value in the Case of Two-Sample Proportions
(values marked on the axis: –1.96, +1.96 and 6.9)
In Figure 10, it can be observed that the z-value lies in the rejection region; therefore,
H0 is rejected. This implies that there is a significant difference between the average
students of the science and commerce streams in the college. It can be interpreted
from the calculated z-value that the difference between the proportions of the two
samples is statistically significant.
Solution: The null hypothesis and the alternative hypothesis are as follows:
Or,
H0: p1 = p2
H1: p1 ≠ p2
Where,
p1 = 0.71
q1 = 0.29
p2 = 0.625
Proportion of failure in sample two, q2 = 1 – p2 = 1 – 0.625
q2 = 0.375
The two samples are taken from the same population; therefore, you can calculate
the best estimate for proportion, which is the common value of proportion. The
best estimate for proportion (p0) for the two samples of colleges involved in
ragging can be calculated as follows:
$$p_0 = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$
$$p_0 = \frac{(700 \times 0.71) + (800 \times 0.625)}{700 + 800}$$
p0 = 0.66
q0 = 1 – 0.66
q0 = 0.34
$$z = \frac{0.71 - 0.625}{\sqrt{\dfrac{0.66 \times 0.34}{700} + \dfrac{0.66 \times 0.34}{800}}}$$
$$z = \frac{0.085}{0.024}$$
$$z = 3.54$$
The z-value for the 1% level of significance for two-tailed test is ± 2.58. The graphical
representation of the preceding solution is shown in Figure 11:
Figure 11: The z-value Calculated with the Help of Best Estimate of Proportion
(values marked on the axis: –2.58, +2.58 and 3.54)
In Figure 11, it can be observed that the z-value lies in the rejection region;
therefore, H0 is rejected. This implies that there is a significant difference between
the number of engineering colleges involved in littering. It can be interpreted from
the calculated z-value that the difference between the proportions of two samples
is statistically significant.
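The pooled-proportion calculation above can be sketched in Python as follows. The inputs (p1 = 0.71 with n1 = 700, p2 = 0.625 with n2 = 800) come from the worked solution; the function name `two_proportion_z` is an assumption made for illustration. Since the sketch keeps p0 and the standard error unrounded, it yields z ≈ 3.48 rather than 3.54; both values exceed the 1% critical value of 2.58.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """z-test for two proportions using the best (pooled) estimate p0."""
    p0 = (n1 * p1 + n2 * p2) / (n1 + n2)   # common value of proportion
    q0 = 1 - p0
    standard_error = math.sqrt(p0 * q0 / n1 + p0 * q0 / n2)
    return (p1 - p2) / standard_error, p0


z, p0 = two_proportion_z(p1=0.71, n1=700, p2=0.625, n2=800)

# Two-tailed critical value at the 1% level, as quoted in the text.
Z_CRITICAL = 2.58
reject_h0 = abs(z) > Z_CRITICAL

print(f"p0 = {p0:.2f}, z = {z:.2f}, reject H0: {reject_h0}")
```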
10.5.3 COMPARING TWO RELATED SAMPLES
In this study, the researcher takes two related samples. The samples are related to
each other in some way. They are compared to find a relationship between them.
The researcher has to test if there is any statistical difference between the means for
the two groups. This type of study is done to find out the impact of certain policies
on an entity, such as the impact made by introducing new human resource policies
on an organisation. To study the impact of changes, data is collected before and
after the occurrence of events. The difference between both the samples (datasets)
is calculated to test whether the samples show a positive or negative impact of the
changes.
If a researcher wants to compare two related samples, then he/she can use the
following test statistic:
$$t = \frac{\bar{D}}{(S_D / \sqrt{n})}$$
Where,
n = Sample Size
$$S_D = \sqrt{\frac{\sum D^2 - \dfrac{(\sum D)^2}{n}}{n - 1}}$$
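A minimal sketch of the related-samples statistic is given below. The before and after scores are hypothetical values invented purely for illustration (the chapter's own example data is not reproduced here), and the names `paired_t`, `before` and `after` are likewise assumptions. The sketch computes S_D with the shortcut formula shown above and cross-checks it against the standard library's sample standard deviation.

```python
import math
import statistics


def paired_t(first, second):
    """t = D-bar / (S_D / sqrt(n)) for two related samples."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = sum(diffs) / n
    sum_d = sum(diffs)
    sum_d_sq = sum(d ** 2 for d in diffs)
    # S_D = sqrt((sum(D^2) - (sum(D))^2 / n) / (n - 1))
    s_d = math.sqrt((sum_d_sq - sum_d ** 2 / n) / (n - 1))
    t = d_bar / (s_d / math.sqrt(n))
    return t, s_d, diffs


# Hypothetical satisfaction scores before and after a policy change.
before = [60, 65, 70, 58, 62, 67, 71, 64]
after = [63, 66, 72, 61, 62, 70, 74, 66]

t_value, s_d, diffs = paired_t(before, after)
print(f"t = {t_value:.3f} with {len(diffs) - 1} d.f.")
```

The resulting t-value would then be compared with the tabulated t-value for n – 1 degrees of freedom at the chosen level of significance.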
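Variance of the two samples can be calculated using the following formula:
$$s_1^2 = \text{Variance of the first sample} = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{(n_1 - 1)}$$
$$s_2^2 = \text{Variance of the second sample} = \frac{\sum_{i=1}^{n_2} (X_{2i} - \bar{X}_2)^2}{(n_2 - 1)}$$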
X1i = Value of observation of the first sample
X2i = Value of observation of the second sample
X̄1 = Mean of the first sample
X̄2 = Mean of the second sample
n1 = Sample size of the first sample
n2 = Sample size of the second sample
Degree of freedom for the first sample, v1 = n1 – 1
Degree of freedom for the second sample, v2 = n2 – 1
Let us learn to calculate the equality of variances from two different populations
with the help of an example.
Example 10: A researcher studied two samples of a type of wheat produced in
the north region and the south region of a state. He took two samples of wheat –
type A (north region) and type B (south region). The sample size of type A wheat
is 10 cities and the sample size of type B wheat is 13 cities. The variances of the two
samples with respect to gluten content are 5 and 4, respectively. The researcher
wants to find out whether the two populations have the same variance. Test this at
the 5% significance level.
Solution: The null hypothesis and the alternative hypothesis are as follows:
H0: The variance of the two populations is the same.
H1: The variance of the two populations is different.
Or,
H 0 : σ12 =σ 22
H1 : σ12 ≠ σ 22
Where,
Therefore,
The test of significance used is:
$$F = \frac{s_1^2}{s_2^2}$$
F = 5/4
F = 1.25
Degree of Freedom for sample A = n1 – 1
= 10 – 1= 9
Degree of Freedom for sample B = n2 – 1
= 13 – 1
= 12
The value of sample B is greater than the value of sample A; therefore, v1 = 12 and
v2 = 9.
In this case, the F-values for the two-tailed test are calculated as:
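$$F_{\alpha/2} = F_{(0.025,\,12,\,9)} = 3.87$$
$$F_{1-\alpha/2} = F_{(0.975,\,12,\,9)} = \frac{1}{F_{(0.025,\,9,\,12)}} = 0.291$$
[Figure 12: plot showing H0 accepted between F1–α/2 = 0.291 and Fα/2 = 3.87 and rejected in both tails; the calculated value 1.25 is marked on the axis]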
In Figure 12, it can be observed that the calculated F-value lies in the acceptance
region; therefore, H0 is accepted and H1 is rejected. This implies that there is
no difference between the variances in gluten content of the two populations. It
can be interpreted from the calculated F-value that the difference between the
sample variances is statistically insignificant, that is, the variances of the two
populations are equal.
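a. $z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{(s_1^2/n_1) + (s_2^2/n_2)}}$
b. $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{(s_1^2/n_1) + (s_2^2/n_2)}}$
c. $z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sigma \sqrt{(1/n_1) + (1/n_2)}}$
d. $t = \dfrac{\bar{D}}{(S_D/\sqrt{n})}$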
10. In the given formula t = D/(SD/√n), what does D stand for?
a. Mean difference between two samples
b. Standard deviation of sample
c. Sample size
d. Sample density
Within Sample (Error): $SSE = \sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2$, d.f. = (n – k), mean square = SS within/(n – k)
Total: $SST = \sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar{\bar{X}})^2$, d.f. = (n – 1)
$$\bar{X}_i = \frac{1}{n} \sum_{j=1}^{n} X_{ij}$$
2. Calculate the mean of all sample means (grand mean) with the help of the
following formula:
$$\bar{\bar{X}} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n} X_{ij}}{kn}$$
3. Calculate the variation between samples, known as SS between, with the
help of the following formula:
$$SSB = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{\bar{X}})^2$$
SS between is the sum of squared deviations of the sample means from the mean
of the sample means. It measures the variation between samples.
4. Divide SS between by d.f. k – 1 to get the mean square between (MS between).
MS between is the mean of the variation between samples. The following formula
is used to calculate MS between:
MS between = SS between/(k – 1)
5. Calculate variation within samples, known as SS within (SSE), with the help
of the following formula:
SS within = ∑ (X1i – X̄1)² + ∑ (X2i – X̄2)² + ∑ (X3i – X̄3)² + .......
Where, X1i, X2i, X3i,.... = observed values in a sample
X̄1, X̄2, X̄3,.... = means of corresponding samples
SS within is the sum of squared deviations of the observed values from the
corresponding sample means. It measures the variation within samples.
Note: In the given text, one-way ANOVA has been explained for cases where all
the groups have the same sample size. However, the researcher must assess the
impact of using unequal sample sizes, as it can affect the homogeneity of
variance that is assumed initially in ANOVA.
6. Divide SS within with d.f. n – k to get mean of square within (MS within).
MS within is the mean of variations occurred within samples. The following
formula is used to calculate the MS within:
MS within = SS within/(n – k)
Where, n= total of the sample size of all the samples that is n1 + n2 +…..
7. Add the squared deviations to get the total variation in samples. The
following formula is used to calculate the total variation:
$$\text{Total variation} = SST = \sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar{\bar{X}})^2$$
To calculate total SS, the mean of sample means is first subtracted from each
individual observation. The resulting deviations are then squared and summed
to obtain the result. The d.f. used in this case is n – 1.
Example 11: The researcher observed the sale of a product of a particular brand
in six big retail houses in three cities. He/She wants to determine whether the
mean sale is same across cities. Use the data shown in Table 7 to calculate one-way
ANOVA:
Source of Variation   SS      d.f.          MS
Between Sample        2.1     (3 – 1) = 2   2.1/2 = 1.06
Within Sample         54.34
Total                 56.48
You can check the F-table for significance with the help of one-tailed test. The
graphical representation of the preceding solution is shown in Figure 13:
[Figure 13: plot marking the acceptance region and the rejection region; values marked on the axis: 0.29 and 3.68]
Figure 13 shows that the calculated F-value lies in the acceptance region; therefore,
H0 is accepted and H1 is rejected. This implies that the product sale is almost the
same in the three cities. You can also use another method of ANOVA, which is
performed with the help of a correction factor. It is also termed the shortcut
method. It is more convenient in the case of non-integer values. The steps involved in
this method are mentioned below:
1. Calculate the correction factor with the help of the following formula:
$$\text{Correction Factor} = \frac{T^2}{n}$$
Where, T = summation of all the observed values in the samples
n = Total number of observations
Example 12: First, calculate the correction factor and then various components of
ANOVA table.
Where, T = summation of all the observed values in the three cities collectively
Correction factor = 773.6
SS between = 2.1
SS within = (3)² + (8)² + (4)² + (9)² + (6)² + (7)² + (6)² + (9)² + (8)² + (5)² + (7)² + (4)² + (9)² + (8)² + (6)²
+ (7)² + (5)² + (7)² – 775.67
= 54.34
Total SS = 830 – 773.6
= 56.4
The values of total SS, SS between and SS within are same in both the cases used
for the calculation of ANOVA. Therefore, the ANOVA table would also be same.
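To illustrate that the two methods agree, the sketch below computes the sums of squares both directly and through the correction factor. The 18 observations are the values listed in the SS within calculation above; splitting them into three consecutive groups of six (one group per city) is an assumption, although it reproduces the figures 773.6, 775.67, 2.1 and 54.34 quoted in the text. The function names are illustrative.

```python
# Sales values taken from the SS within calculation; the grouping into three
# cities of six retail houses each is assumed from the order of the values.
cities = [
    [3, 8, 4, 9, 6, 7],
    [6, 9, 8, 5, 7, 4],
    [9, 8, 6, 7, 5, 7],
]


def anova_direct(groups):
    """SS between and SS within from deviations about the means."""
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return ss_between, ss_within


def anova_shortcut(groups):
    """Same quantities via the correction factor T^2 / n."""
    n = sum(len(g) for g in groups)
    total = sum(sum(g) for g in groups)
    correction_factor = total ** 2 / n
    ss_total = sum(x ** 2 for g in groups for x in g) - correction_factor
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction_factor
    return ss_between, ss_total - ss_between, ss_total


ssb_direct, ssw_direct = anova_direct(cities)
ssb_short, ssw_short, ss_total = anova_shortcut(cities)

k = len(cities)
n = sum(len(g) for g in cities)
f_ratio = (ssb_short / (k - 1)) / (ssw_short / (n - k))

print(f"SS between = {ssb_short:.2f}, SS within = {ssw_short:.2f}, "
      f"Total SS = {ss_total:.2f}, F = {f_ratio:.2f}")
```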
$$\text{Correction factor} = \frac{T^2}{n}$$
Where, T = summation of all the observed values in the samples
n = total number of observations
2. Compute SS between rows. To do so, first take the sum of observed values
in each row. Thereafter, take the square of the sum of observed values
and divide it by the respective sample size of the row. Then, the
resultant values are added and the difference between the added value and the
correction factor is taken to obtain the variation between rows. The
following formula is used to calculate SS between rows:
$$\text{SS between rows} = \sum \frac{(T_j)^2}{n_j} - \frac{T^2}{n}$$
Note: In two-way ANOVA, there are three possible null hypotheses. They are:
1. There is no difference in the means of the first factor
2. There is no difference in the means of the second factor
3. There is no interaction between first and second factors
For null hypotheses 1 and 2, the alternative hypothesis is: The means of first
factor and second factor are not equal.
3. Divide SS between rows by d.f. r – 1 to get MS between rows, which is the
mean of the variation between row samples. Similarly, MS between
rows for other attributes can also be calculated.
The following formula is used to calculate MS between rows:
$$\text{MS between rows} = \frac{\text{SS between rows}}{(r - 1)}$$
Where, r = number of rows
4. Calculate SS between columns. To do so, first take the sum of observed
values in each column. Thereafter, take the square of the sum of observed values
and divide it by the respective sample size of the column. Then,
the resultant values are added and the difference between the added value and the
correction factor is taken to obtain the variation between columns. Similarly,
SS between columns for other attributes can also be calculated. The following
formula is used to calculate SS between columns:
$$\text{SS between columns} = \sum \frac{(T_j)^2}{n_j} - \frac{T^2}{n}$$
$$\text{MS between columns} = \frac{\text{SS between columns}}{(c - 1)}$$
Where, c = number of columns
6. Calculate total variation by first taking the sum of squares of all individual
values in the samples. After that, subtract the correction factor from the sum
of squares. Similarly, total variation for other attributes can also be calculated.
The following formula is used to calculate total variation:
$$\text{Total SS} = \sum X_{ij}^2 - \frac{T^2}{n}$$
7. Compute residual variation by first adding SS between columns and SS between
rows, and then subtracting this sum from Total SS. Similarly, residual variation
for other attributes can also be calculated. The following formula is used to
calculate residual variation:
Residual variation = Total SS – (SS between columns + SS between rows)
8. Calculate the F-ratio with the help of the following formula:
$$F\text{-ratio} = \frac{\text{MS between}}{\text{MS within}}$$
The calculated value of F-ratio is tested against the tabulated F-value that is
determined at a specified level of significance. If the calculated value of F-ratio
lies within the limits of the acceptance region, the null hypothesis is accepted and the
alternate hypothesis is rejected.
Let us understand the application of two-way ANOVA with the help of an example.
Example 13: Three respondents have rated three small cars of different brands on
a five-point scale (5 being the highest) with respect to their features. The ratings
and features are provided in Table 9:
Table 9: Rating Given by Customers to Different Brands of Car with Respect to their Features
Respondents   Car    Mileage   Durability   Maintenance Cost   Technology   Price
1             Zen    3         2            4                  3            5
1             i10    4         4            4                  5            4
1             Alto   4         3            5                  2            4
2             Zen    2         4            3                  1            4
2             i10    4         5            3                  4            4
2             Alto   3         1            2                  5            3
3             Zen    4         5            3                  2            4
3             i10    3         2            4                  5            3
3             Alto   4         5            4                  5            5
The researcher wants to know the difference between the brands in terms of features.
H0: There is no difference in the means of the five features of the cars
$$\text{Correction factor} = \frac{T^2}{n} = \frac{162 \times 162}{45} = 583.2$$
SS between columns = 585.2 – 583.2
= 2
$$\text{SS between rows (i.e., between cars)} = \frac{56 \times 56}{15} + \frac{48 \times 48}{15} + \frac{58 \times 58}{15} - 583.2$$
= 209.1 + 153.6 + 224.3 – 583.2
= 587 – 583.2
= 3.8
Total SS = (3)² + (4)² + (4)² + (2)² + (4)² + (3)² + (4)² + (4)² + (5)² + (3)² + (5)² + (2)² + (5)² +
(4)² + (4)² + (2)² + (4)² + (3)² + (4)² + (5)² + (1)² + (3)² + (3)² + (2)² + (1)² + (4)² + (5)² + (4)² +
(4)² + (3)² + (4)² + (3)² + (4)² + (5)² + (2)² + (5)² + (3)² + (4)² + (4)² + (2)² + (5)² + (5)² + (4)²
+ (3)² + (5)² – 583.2
= 638 – 583.2
= 54.8
Residual variation = 54.8 – (2 + 3.8)
= 49
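You can check the F-value for significance with the help of a one-tailed test. The
graphical representation of the preceding solution for the F-value at v1 = 4 and v2 = 8 is
shown in Figure 14:
[Figure 14: plot marking the acceptance region and the rejection region; values marked on the axis: 0.08 and 3.84]
[Figure 15: plot marking the rejection region; values marked on the axis: 0.31 and 4.46]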
Figure 14 and Figure 15 show that the calculated F-values lie in the acceptance
region; therefore, we do not reject H0. This implies that the cars have the same
features.
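The sums of squares in this example can be re-derived from Table 9 with a short Python sketch. The variable names are illustrative. One observation from running the numbers: the row totals 56, 48 and 58 used in the calculation match the totals per respondent in Table 9 (the totals per car would be 49, 58 and 55), so the sketch groups rows by respondent in order to reproduce the figures in the text. Unrounded, it gives about 1.91, 3.73 and 49.16 where the text shows 2, 3.8 and 49.

```python
# Ratings copied from Table 9: (respondent, car, [mileage, durability,
# maintenance cost, technology, price]).
ratings = [
    (1, "Zen", [3, 2, 4, 3, 5]),
    (1, "i10", [4, 4, 4, 5, 4]),
    (1, "Alto", [4, 3, 5, 2, 4]),
    (2, "Zen", [2, 4, 3, 1, 4]),
    (2, "i10", [4, 5, 3, 4, 4]),
    (2, "Alto", [3, 1, 2, 5, 3]),
    (3, "Zen", [4, 5, 3, 2, 4]),
    (3, "i10", [3, 2, 4, 5, 3]),
    (3, "Alto", [4, 5, 4, 5, 5]),
]

values = [x for _, _, row in ratings for x in row]
n = len(values)                      # 45 observations
T = sum(values)                      # grand total
correction_factor = T ** 2 / n

# Column totals: one per feature, each based on 9 ratings.
column_totals = [sum(row[j] for _, _, row in ratings) for j in range(5)]
ss_columns = sum(t ** 2 / 9 for t in column_totals) - correction_factor

# Row totals as used in the text: one per respondent, each based on 15 ratings.
respondent_totals = {}
for respondent, _, row in ratings:
    respondent_totals[respondent] = respondent_totals.get(respondent, 0) + sum(row)
ss_rows = sum(t ** 2 / 15 for t in respondent_totals.values()) - correction_factor

total_ss = sum(x ** 2 for x in values) - correction_factor
residual = total_ss - (ss_columns + ss_rows)

print(f"CF = {correction_factor:.1f}, SS columns = {ss_columns:.2f}, "
      f"SS rows = {ss_rows:.2f}, Total SS = {total_ss:.1f}, residual = {residual:.2f}")
```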
10.7 SUMMARY
A hypothesis can be tested by using a large number of tests and these tests
are connected with each other in one way or another.
F-test: A test that is used to compare the significant difference between the
variances of two samples under study.
Non-parametric tests: The tests which do not make assumptions about the
parameters of the population from which a sample is derived.
Parametric tests: The tests which make assumptions about the parameters of
the population from which a sample is derived.
t-distribution: A type of probability distribution that is appropriate for
estimating the mean of a normally distributed population where the sample
size is small and population variance is unknown.
t-test: A test that is used to study the means of samples having a sample size
below 30 and unknown population variance.
z-test: A test that is used to study the means and proportion of samples
whose size is more than 30.
The customers and dealers have provided marks out of 100 for their satisfaction
level. The data collected by the marketing research department for customer
satisfaction and dealer satisfaction is as follows:
After conducting the research, the company comes to the conclusion that warranty
cards do not have much impact on the customers’ and dealers’ satisfaction. A reason
behind this can be that the same type of warranty is given by the competitors of
National Motors Inc.
QUESTIONS
1. Find out the effect of warranty cards on the satisfaction of customers with the
help of data provided in the case study. Use 5% as the level of significance to
test the hypothesis.
(Hint: H0: The customer satisfaction before and after returning the warranty
card is the same.)
2. What should National Motors Inc. do to overcome this problem?
(Hint: The company can conduct a survey regarding the available warranty
cards in the entire motor scooters industry.)
10.10 EXERCISE
1. What are the two types of hypotheses tests?
2. Explain the different types of parametric tests.
3. Explore the following cases of one-sample tests:
a. Normal and infinite population, large sample size, known population
variance and two-tailed or one-tailed test.
b. Normal and infinite population, small sample size, unknown population
variance and two-tailed or one-tailed test.
11.
a.
SD / n
Mean difference between
two samples
ANOVA
12. True
13. Two-way
E-REFERENCES
(2018). Retrieved from http://www.ihmgwalior.net/pdf/research_methodology.pdf
Alzheimer Europe - Research - Understanding dementia research - Types of
research - The four main approaches. (2018). Retrieved from
https://www.alzheimer-europe.org/Research/Understanding-dementia-research/Types-of-research/The-four-main-approaches
Research Methodology - Introduction - Notes for Students. (2018). Retrieved
from https://bbamantra.com/research-methodology/
Research methods and methodology. (2018). Retrieved from
http://www.emeraldgrouppublishing.com/research/guides/methods/
CHAPTER
11
NON-PARAMETRIC TESTS
Table of Contents
Learning Objectives
11.1 Introduction
11.2 Non-Parametric Tests
Self Assessment Questions
11.3 Sign Test
11.3.1 One Sample Sign Test
11.3.2 Two Sample Sign Test
11.3.3 Wilcoxon Matched Pairs Test/Signed Rank Test
Self Assessment Questions
11.4 Rank Correlation
Self Assessment Questions
11.5 Rank Sum Test
11.5.1 Mann-Whitney Test or U Test
11.5.2 Kruskal-Wallis Test
Self Assessment Questions
11.6 Chi-square Test
11.6.1 Chi-square Test for Goodness of Fit
11.6.2 Chi-square Test for Independence
Self Assessment Questions
11.7 Summary
11.8 Key Words
11.9 Case Study
11.10 Exercise
11.11 Answers for Self Assessment Questions
11.12 Suggested Books and e-References
LEARNING OBJECTIVES
After studying this chapter, you will be able to:
Explain the non-parametric tests
Discuss the sign tests
Explain rank correlation
Describe the rank sum tests
Elaborate on the Wilcoxon matched pairs test
Explain the concept of the Chi-square test
11.1 INTRODUCTION
In the previous chapter on Parametric Tests, you have learned about different
types of parametric tests used to check the validity of a hypothesis. You have also
studied that parametric tests can only be applied if you know population type and
population parameters, such as mean and variance. However, if this information
is unavailable, you cannot use parametric tests. In such a situation, you need non-
parametric tests to check the validity of a hypothesis and draw inferences.
Non-parametric tests are used when you do not have adequate information about
population type and parameters. These tests are widely used to study data given in
the form of ranks. Examples of non-parametric tests are sign tests, rank correlation,
rank sum test, Wilcoxon matched pairs and chi-square test. The selection of the
test depends on problem type, sample size and data. For example, rank correlation
is used to establish correlation between two ranked data sets. Researchers should
observe caution while selecting a non-parametric test to ensure accurate and
precise results.
This chapter covers non-parametric tests and their types. It provides information
about one and two sample sign tests. It also elaborates on rank correlation and
rank sum tests, including the Mann-Whitney and Kruskal-Wallis tests. In addition,
it explains the Wilcoxon matched pairs test/signed rank test. Finally, the chapter
also sheds light on chi-square test for goodness of fit and chi-square test for
independence.
[Figure: types of non-parametric tests; labels recovered from the diagram: Sign Test, Rank Correlation, Chi-Square Test]
value of less or greater than median value is equal. This implies that the proportion
of success (p) and failure (q) is equal which means that p = q = 0.50. Therefore, it is
called binomial sign test. In one sample sign test, the researcher provides sample
values with positive (+) and negative (–) signs to test the hypothesis.
Note: Sign test is a hypothesis test for population median and not for population
mean.
In any given sign test, each data value or observation is converted into a plus sign
or minus sign. The allocation of + and – signs is done by assuming a median value
of the sample. Values that are greater than the median value are replaced by a plus
sign and the values that are less than the median value are replaced by a minus
sign. The values that are equal to the given median value are discarded or not
considered. After assigning the signs, the researcher may test the null hypothesis
that the probability of getting a plus sign, and likewise a minus sign, is 0.5.
Note: When n is large and p is sufficiently large (i.e., p > 0.10), the normal
distribution is used as an approximation of the binomial distribution.
The mean and standard deviation of the normal distribution are given as follows:
Mean µ = np
$$SD = \sigma = \sqrt{npq}$$
Let us understand the application of one sample sign test with the help of an
example.
09, 10, 16, 18, 17, 19, 20, 16, 14, 12, 11, 13, 14, 09 and 13
Test the hypothesis that the median score of all the students is equal to 15 against
the hypothesis that the median score of 15 students is greater than 15. Use 5% level
of significance.
OR
H0: p = 0.5
The researcher assigns minus (–) sign to values of less than 15 and plus (+) sign to
values of greater than 15.
Observation   Sign
19            +
17            +
16            +
18            +
17            +
19            +
20            +
16            +
16            +
18            +
11            –
13            –
14            –
09            –
13            –
No. of + signs = 10
No. of – signs = 5
Number of observations = 15
It must be remembered that the test statistic is the larger of the number of + signs and
the number of – signs.
Now, we need to check whether 10 plus signs observed in the given 15 trials
support the null hypothesis that p = 0.5 or p > 0.5.
[Figure 3: plot marking the acceptance region and the rejection region; values marked on the axis: –1.96, +1.295 and +1.645]
Figure 3 shows that the calculated binomial value lies in the acceptance region;
therefore, H0 is accepted. This implies that the median marks scored by 15 students
are 15.
11.3.2 TWO SAMPLE SIGN TEST
In two sample sign test, the researcher tests two related samples. This test is
equivalent to paired-t test. Researchers use sign test when data is given as pairs.
In this test, the researcher provides positive (+) and negative (–) signs to values.
These signs are allocated on the basis of the difference between the values of first
sample and second sample. If the difference is positive, the difference value gets a
plus (+) sign and if the difference is negative, the difference value gets a minus (–)
sign. If the values of two samples are equal, these values are discarded.
Thereafter, the researcher calculates the total plus and minus signs and divides
the number by the sample size. Then, standard error is calculated and limits are
determined. Finally, the hypothesis is tested against the calculated value of limit.
Let us understand the application of the two sample sign test with the help of an
example.
The researcher wants to find out whether the first employee is the better performer.
Solution: Null hypothesis (H0) and alternate hypothesis (H1) are as follows:
H0: p = 1/2
Or
H1: Sale done by the first employee is more than that of the second employee.
The researcher assigns the plus (+) and minus (–) signs to the data shown in
Table 3:
No. of + signs = 6
No. of – signs = 4
$$Z = \frac{6 - 5}{\sqrt{10/4}} = \frac{1}{1.581} = 0.632$$
The value of Z at 0.05 level of significance is +1.645. Since Z = 0.632 and it lies in the
acceptance region; null hypothesis is accepted. This implies that the median sale
done by two employees is equal.
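The normal approximation used in both sign test examples can be sketched in Python as follows. The counts of plus and minus signs (10 and 5 in the one sample example, 6 and 4 in the two sample example) are taken from the text; the function names are illustrative, and the exact binomial tail probability is an extra cross-check that the chapter itself does not compute. For the one sample example the unrounded value is about 1.291, close to the 1.295 shown in Figure 3.

```python
import math


def sign_test_z(n_plus, n_minus, p=0.5):
    """Normal approximation: z = (x - np) / sqrt(npq), x = larger sign count."""
    n = n_plus + n_minus
    x = max(n_plus, n_minus)
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return (x - mean) / sd


def exact_upper_tail(n_plus, n):
    """Exact P(X >= n_plus) under H0: p = 0.5, from the binomial distribution."""
    return sum(math.comb(n, k) for k in range(n_plus, n + 1)) / 2 ** n


z_one_sample = sign_test_z(10, 5)    # one sample sign test example
z_two_sample = sign_test_z(6, 4)     # two sample sign test example
p_exact = exact_upper_tail(10, 15)

# One-tailed critical value at the 5% level, as quoted in the text.
Z_CRITICAL = 1.645
print(f"one sample: z = {z_one_sample:.3f}, accept H0: {z_one_sample < Z_CRITICAL}")
print(f"two sample: z = {z_two_sample:.3f}, accept H0: {z_two_sample < Z_CRITICAL}")
print(f"exact one-tailed p-value for 10 plus signs out of 15: {p_exact:.4f}")
```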
Mean, µT = n (n + 1)/4
$$Z = \frac{T - \mu_T}{\sigma_T}$$
If the calculated z-value lies within the limits of the acceptance region, the null
hypothesis is accepted and the alternate hypothesis is rejected.
Let us understand the application of the Wilcoxon matched pairs test/signed rank
test with the help of an example.
Example 3: Two brands are ranked on a five-point scale (five being the highest).
The researcher wants to determine the difference between the satisfaction levels
of customers for two brands. The data for the Brand A and Brand B and their
difference is provided in Table 4:
Table 4: Rating Given by Customers to Brand A and Brand B
No. of Respondents Brand A Brand B Difference(di)
1 2 2 0
2 3 4 –1
3 4 3 1
4 1 2 –1
5 2 5 –3
6 5 4 1
7 4 2 2
8 3 4 –1
9 4 3 1
10 5 4 1
11 2 4 –2
12 4 3 1
No. of Respondents   Brand A   Brand B   Difference (di)   |di|   Sign   Rank   Signed Rank
1                    2         2         0                 0      0      -      -
2                    3         4         –1                1      –      4.5    – 4.5
3                    4         3         1                 1      +      4.5    + 4.5
4                    1         2         –1                1      –      4.5    – 4.5
5                    2         5         –3                3      –      11     – 11
6                    5         4         1                 1      +      4.5    + 4.5
7                    4         2         2                 2      +      9.5    + 9.5
8                    3         4         –1                1      –      4.5    – 4.5
9                    4         3         1                 1      +      4.5    + 4.5
10                   5         4         1                 1      +      4.5    + 4.5
11                   2         4         –2                2      –      9.5    – 9.5
12                   4         3         1                 1      +      4.5    + 4.5
In this case, the researcher has neglected the first observation, as its difference
is 0, which leaves n = 11. The differences are ranked from the smallest to the largest
absolute value. If there is a tie between the ranks, the mean of the ranks is taken
and assigned to the identical values. The sum of the ranks with positive signs is 32
and the sum of the ranks with negative signs is 34. The T statistic is the smaller of
the two, that is, 32. The critical Z-value at the 5% level of significance for a
two-tailed test is ± 1.96.
Figure 4 shows that the calculated Z-value lies in the acceptance region; therefore,
H0 is accepted. This implies that customer satisfaction for the two brands is the same.
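The steps of Example 3 can be sketched in Python as follows. The helper names are illustrative, and the standard deviation uses the usual large-sample formula σT = √(n(n + 1)(2n + 1)/24) without a tie correction, which is an assumption here because the formula for σT is not reproduced in this example.

```python
import math

brand_a = [2, 3, 4, 1, 2, 5, 4, 3, 4, 5, 2, 4]
brand_b = [2, 4, 3, 2, 5, 4, 2, 4, 3, 4, 4, 3]

def average_ranks(values):
    """Rank from smallest to largest; tied values share the mean of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1           # 1-based position of the first tie
        count = ordered.count(v)
        rank_of[v] = first + (count - 1) / 2   # mean of the tied positions
    return [rank_of[v] for v in values]

def wilcoxon_signed_rank(x, y):
    """Return (sum of + ranks, sum of - ranks, T, Z) for paired samples."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    n = len(diffs)
    t = min(t_plus, t_minus)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # assumed, no tie correction
    return t_plus, t_minus, t, (t - mu) / sigma

t_plus, t_minus, t, z = wilcoxon_signed_rank(brand_a, brand_b)
print(t_plus, t_minus, t, round(z, 3))
```

Under these assumptions, the rank sums are 32 and 34, T = 32 and Z is about –0.09, which lies well inside the limits of ± 1.96.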
$$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
Where, di = difference between ranks
n = sample size
The value of Spearman’s rank correlation coefficient lies between +1 and –1,
where +1 indicates perfect positive correlation and –1 indicates perfect negative
correlation. The values that lie between +1 and –1 show different degrees of
correlation. The researcher can assess the significance of the rank correlation
coefficient by performing a hypothesis test. If the sample size is less than 30, the
researcher needs to use the tabulated values of Spearman’s rank correlation
coefficient to test the coefficient. Suppose the sample size (n) = 15 and ρ = 0.6364,
which shows a reasonably high degree of correlation between two data sets. The
researcher wants to judge whether this correlation is actually present or not.
He/She forms a null hypothesis that there is no correlation between the two data
sets and tests it at the 5% level of significance using a two-tailed test. The
researcher checks the critical value for ρ in the table showing values of Spearman’s
rank correlation coefficient. The critical value of ρ is –0.5179 (lower limit) and
+0.5179 (upper limit). The given value of ρ = 0.6364 is outside the acceptance
region; therefore, the researcher rejects the null hypothesis and concludes that
there is a correlation between the two data sets.
Let us understand the application of Spearman’s rank correlation coefficient with
the help of an example.
Example 4: A researcher wants to test the correlation between the IQ level and hours
spent on reading newspaper per week. The data is provided in Table 6:
Use rank correlation to find out correlation between the IQ level and hours spent
on reading newspaper, with 5% level of significance.
H0: There is no correlation between the IQ level and hours spent on reading
newspaper every week.
H1: There is correlation between the IQ level and hours spent on reading newspaper
every week.
Or
H0: ρ = 0
H1: ρ ≠ 0
S. No.  IQ (X)  Hours in Reading Newspaper (Y)  Rank of X  Rank of Y  di = X – Y  di²
1       105     6                               6          13         –7          49
2       91      7                               13         12         1           1
3       99      24                              9.5        4          5.5         30.25
4       100     56                              8          1          7           49
5       99      29                              9.5        3          6.5         42.25
6       103     30                              7          2          5           25
7       97      20                              11         5          6           36
8       113     12                              1          8          –7          49
9       112     10                              2.5        9          –6.5        42.25
10      110     17                              4.5        6          –1.5        2.25
11      94      16                              12         7          5           25
12      110     8                               4.5        11         –6.5        42.25
13      112     9                               2.5        10         –7.5        56.25
Total                                                                             ∑di² = 449.5
$$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
$$\rho = 1 - \frac{6 \times 449.5}{13(13 \times 13 - 1)}$$
$$\rho = 1 - \frac{2697}{2184}$$
$$\rho = 1 - 1.235 = -0.235$$
The calculated rank correlation value lies in the acceptance region; therefore, H0 is
accepted. This implies that there is no significant correlation between the IQ level
and the number of hours spent on reading newspaper in a week. It can be interpreted
that reading the newspaper does not by itself increase the IQ level unless the news
is analysed.
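The ranking and the calculation of ρ in Example 4 can be reproduced with the following Python sketch. The function names are illustrative. The sketch follows the approach used in the solution: rank 1 goes to the largest value, tied values share the mean rank, and the simplified formula is applied even though ties are present.

```python
iq = [105, 91, 99, 100, 99, 103, 97, 113, 112, 110, 94, 110, 112]
hours = [6, 7, 24, 56, 29, 30, 20, 12, 10, 17, 16, 8, 9]

def descending_ranks(values):
    """Rank 1 goes to the largest value; tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def spearman_rho(x, y):
    """Return (sum of squared rank differences, rank correlation coefficient)."""
    rx, ry = descending_ranks(x), descending_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(x)
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

sum_d2, rho = spearman_rho(iq, hours)
print(sum_d2, round(rho, 3))  # 449.5 -0.235
```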
The mean and SD are determined to calculate the limits of acceptance region. The
mean could be calculated with the help of following formula:
$$\mu_U = \frac{n_1 n_2}{2}$$
Where, n1 = sample size of sample 1
n2 = sample size of sample 2
$$\sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$
If the value of U lies within the limits of the acceptance region, the null
hypothesis is accepted. However, if the calculated U value lies outside the limits of
the acceptance region, the null hypothesis is rejected and the alternate hypothesis
is accepted. Let us take an example to understand the application of the
Mann-Whitney test.
The researcher wants to find out whether the two products are from the same
production house. Use the Mann-Whitney test (or U test) with 10% significance level.
The researcher merges the data of two products and arranges it in the increasing
order. Thereafter, he/she calculates R1 and R2 for Products A and B, respectively,
as shown in Table 9:
Table 9: Calculation for Mann-Whitney Test
S. No.  Rank (Product A)  Rank (Product B)
1       14.5              7.5
2       11.5              9.5
3       1                 11.5
4       13                14.5
5       3                 18.5
6       5.5               2
7       18.5              5.5
8       23.5              7.5
9       16.5              9.5
10      20                16.5
11      21                23.5
12      4                 22
        R1 = 152          R2 = 148
$$U_1 = R_1 - \frac{n_1(n_1 + 1)}{2} = 152 - \frac{12(13)}{2} = 74$$
$$U_2 = R_2 - \frac{n_2(n_2 + 1)}{2} = 148 - \frac{12(13)}{2} = 70$$
Therefore, U = 70
$$\mu_U = \frac{n_1 n_2}{2} = \frac{12 \times 12}{2} = 72$$
$$SD = \sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{12 \times 12\,(12 + 12 + 1)}{12}} = 5\sqrt{12} = 17.3$$
Uα = 42
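Since U = 70 is greater than Uα = 42, the researcher cannot reject H0 (the null
hypothesis is rejected only when U is less than or equal to Uα). This implies that
there is no evidence that Products A and B are from different production houses.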
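The calculation of U, μU and σU from the rank sums can be sketched in Python as follows. The function name is illustrative, and the Z-score in the last step is an additional normal-approximation check that is assumed here rather than taken from the solution above.

```python
import math

def mann_whitney_from_rank_sums(r1, r2, n1, n2):
    """Return (U1, U2, U, mean, SD, Z) from the rank sums of two samples."""
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = r2 - n2 * (n2 + 1) / 2
    u = min(u1, u2)                                   # test statistic
    mu = n1 * n2 / 2                                  # mean of U under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # SD of U under H0
    z = (u - mu) / sigma                              # assumed normal approximation
    return u1, u2, u, mu, sigma, z

u1, u2, u, mu, sigma, z = mann_whitney_from_rank_sums(152, 148, 12, 12)
print(u1, u2, u, mu, round(sigma, 2), round(z, 3))
```

With R1 = 152 and R2 = 148, the sketch gives U1 = 74, U2 = 70, μU = 72 and σU of about 17.3. The resulting Z-score can be compared with ± 1.645, the two-tailed limits at the 10% level of significance.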
Ri = sum of the ranks of each sample taken separately, that is, R1, R2, and …………….;
ni = size of each sample, that is, n1, n2, n3, ……………
The chi-square value is determined at k – 1 d.f. and the specified level of
significance, and the calculated H value is tested against it. If the H value lies
within the limits of the acceptance region, the researcher accepts the null
hypothesis and rejects the alternate hypothesis. However, if the H value lies outside
the limits of the acceptance region, the researcher rejects the null hypothesis and
accepts the alternate hypothesis.
Let us understand the application of the Kruskal-Wallis test with the help of an
example.
put through a series of tasks and rated using a standardised test. A high score
indicates better performance. The scores given by technicians to the four machines
are shown in Table 10:
Perform the Kruskal-Wallis test to establish whether all four machines are equally
good. Use 5% level of significance.
First, the researcher merges performance data for the four machines and arranges
it in increasing order. Thereafter, he/she ranks the data and classifies ranks as
R1, R2, R3 and R4 for machines 1, 2, 3 and 4, respectively. Finally, the researcher
takes out the total of ranks in R1, R2, R3 and R4. The calculation is shown in Table 11:
Table 11: Allocation of Ranks to Scores Provided to Four Machines
After that, ranks are classified as R1, R2, R3 and R4 for machine 1, 2, 3 and 4,
respectively, as shown in Table 12:
Machine 1  R1    Machine 2  R2    Machine 3  R3    Machine 4  R4
23         3     28         8.5   26         5.5   33         15
[missing]  2     34         16    28         8.5   37         19
26         5.5   29         10.5  31         13    36         18
27         7     32         14    25         4     35         17
29         10.5  30         12    21         1     38         20
Total      28               61               32               89
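Where, n = 20
R1 = 28, R2 = 61, R3 = 32, R4 = 89
$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1) = \frac{12}{20(21)} \left( \frac{28^2 + 61^2 + 32^2 + 89^2}{5} \right) - 3(21) \approx 13.85$$
d.f. = k – 1 = 4 – 1 = 3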
Chi-square value at 5% level of significance and 3 d.f. is 7.815. You can check the
value for significance with the help of one-tailed test. The graphical representation
of the preceding solution is given in Figure 6:
[Figure 6: acceptance region up to the critical value 7.815; the calculated value 13.85 lies in the rejection region]
Figure 6 shows that the calculated value lies in the rejection region; therefore,
H0 is rejected and H1 is accepted. This implies that the four machines are not all
equally good. It can be interpreted that at least one machine differs in capability
and that machine number 4 appears to be the best, as its rank sum (89) is the highest.
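The H value in this example can be checked with a short Python sketch. It assumes the standard Kruskal-Wallis statistic H = 12/(n(n + 1)) × Σ(Ri²/ni) – 3(n + 1) without a tie correction; the function name is illustrative.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H from the rank sum and size of each sample (no tie correction)."""
    n = sum(sizes)
    total = sum(r ** 2 / k for r, k in zip(rank_sums, sizes))
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

h = kruskal_wallis_h([28, 61, 32, 89], [5, 5, 5, 5])
critical = 7.815  # chi-square value at 5% level of significance and 3 d.f., from the text
print(round(h, 2), h > critical)
```

The sketch gives H of about 13.857, which the example reports as 13.85. Since this is greater than 7.815, the value falls in the rejection region.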
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
Where, Oi = Observed frequency
Ei = Expected frequency
Expected frequency can be calculated with the help of the following formula:
If the calculated value of chi-square is greater than the critical value of χ2, the
null hypothesis is rejected.
Figure 7 shows two types of chi-square tests that are mainly used to find out the
association between variables:
[Figure 7: Chi-square Tests (test for goodness of fit and test for independence)]
Let us discuss Chi-square test for goodness of fit and chi-square test for
independence in detail.
Thereafter, he/she calculates the chi-square value using the chi-square formula. In
the chi-square test for goodness of fit, the d.f. used is k – 1, where k is the number
of categories. The critical chi-square value is determined at the specified level of
significance and d.f. If the calculated chi-square value lies within the limits of
the acceptance region, the researcher accepts the null hypothesis and rejects the
alternate hypothesis.
Let us understand the application of chi-square test with the help of an example.
Product A 300
Product B 280
Product C 220
Product D 200
Test the hypothesis that the customers have no preference for any particular
product. Use 5% level of significance.
Solution: For H0: Customers have no preference for any particular product
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
$$\chi^2 = \frac{(300 - 250)^2}{250} + \frac{(280 - 250)^2}{250} + \frac{(220 - 250)^2}{250} + \frac{(200 - 250)^2}{250}$$
$$\chi^2 = \frac{6800}{250} = 27.2$$
Critical Value of χ2 at 5% level of significance with 3 degrees of freedom (k-1 = 3) is
7.81. Since our χ2 is greater than the critical value, we reject H0.
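The goodness of fit calculation above can be sketched in Python as follows. The function name is illustrative. The sketch assumes, as in the null hypothesis, that every product is expected to be chosen equally often, so each expected frequency is the total divided by the number of products.

```python
def chi_square_gof(observed):
    """Chi-square statistic when all categories are expected to be equally frequent."""
    expected = sum(observed) / len(observed)   # 1000 / 4 = 250 in this example
    return sum((o - expected) ** 2 / expected for o in observed)

chi2 = chi_square_gof([300, 280, 220, 200])
critical = 7.81  # critical value at 5% level of significance and 3 d.f., from the text
print(round(chi2, 1), chi2 > critical)  # 27.2 True
```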
Let us understand the application of chi-square test with the help of an example.
Example 8: The researcher has the data for the preferences of men and women
regarding the joint and nuclear families, as shown in Table 15:
Table 15: Data for Preferences of Men and Women for Joint and Nuclear Families
Joint Family Nuclear Family Total
Men 96 35 131
Women 170 360 530
Total 266 395 661
The researcher wants to find out whether the opinion of men and women about
the type of family is the same. Use 5% level of significance.
H0: The opinion of men and women about the type of family is not different.
H1: The opinion of men and women about the type of family is different.
The test statistic used for this data is chi-square test for independence. The
following equation is used for calculation:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
Where, Oi = Observed frequency
Ei = Expected frequency
Expected frequency can be calculated with the help of the following equation:
$$E_i = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$
In the current scenario, expected frequency can be calculated using the following
method:
After calculating expected frequency and the square of differences between the
observed and expected frequency, Table 16 is created:
d.f. = (r – 1) (c – 1)
= (2 – 1) (2 – 1) = 1
Chi-square value at 5% level of significance with one-tailed test and 1 d.f. is 3.841.
You can check the chi-square value for significance with the help of one-tailed test.
The graphical representation of the preceding solution is shown in Figure 8:
[Figure 8: Rejecting Chi-square Value (acceptance region up to the critical value 3.841; calculated value 74.16 in the rejection region)]
Figure 8 shows that the calculated chi-square value lies in the rejection region;
therefore, H0 is rejected. This implies that the opinions of men and women about the
type of family differ significantly.
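The expected frequencies and the chi-square value for Table 15 can be reproduced with the following Python sketch. The function name is illustrative. Each expected frequency is computed as row total × column total / grand total, as in the equation given earlier.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, d.f.) for a table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)   # (r - 1)(c - 1)
    return chi2, dof

# Rows: Men, Women; columns: Joint Family, Nuclear Family (Table 15)
chi2, dof = chi_square_independence([[96, 35], [170, 360]])
print(round(chi2, 2), dof)
```

The sketch gives a chi-square value of about 74.17 with 1 d.f. It agrees with the 74.16 reported in the solution up to rounding and is far above the critical value of 3.841.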
11.7 SUMMARY
A researcher can use non-parametric tests without taking into consideration
population distribution and sample type. Non-parametric tests are also
known as distribution-free tests.
Sign test is considered as one of the easiest non-parametric tests because it
takes into account only the plus and minus signs of observations in a sample.
One sample sign test is applied on a single sample taken from a symmetrical
population.
Two sample sign test is used to check whether two samples are related to
each other. It is also known as paired sign test.
Rank correlation, also known as Spearman’s rank correlation coefficient, is
used to establish correlation between two data sets that can be ranked.
Rank sum test is used to analyse ordinal data (in the rank form) and calculate
the value of rank sum statistics. To conduct this test, observations need to be
arranged in ascending order.
The Mann-Whitney test (or U test) is used to determine whether two
independent samples are drawn from the same population. It is applied in
general conditions and does not have any specific requirement.
The Kruskal-Wallis test is similar to one-way ANOVA with only one
difference that the former is based on ranks, while the latter is based on
numerical values.
The Wilcoxon matched pairs test/signed rank test is a combination of sign
and rank tests. It is used to compare two paired samples.
Chi-square test is used to find out dependency between two types of data.
It can also be used to make comparisons between theoretical population
(expected data) and actual data (observed data).
Following table shows the preferences of rural and urban customers for two types
(branded and local) of generators:
Preferences of Rural and Urban People for Local and Branded Generators
Top Competitors Local Marketers Total
Rural Market 100 150 250
Urban Market 120 99 219
Total 220 249 469
The researchers concluded that the rural market is widely different from the urban
market. In addition, the efficiency of generators produced by top competitors is
almost the same as that of generators produced by local companies. Therefore, the
generators produced by top competitors to fulfil demand from urban markets can also
be introduced in the rural market. The two market leaders should market their
products very effectively in the rural market to capture market share. Local
marketers have the first mover advantage in the rural market. The market leaders
can make slight changes in their generators to improve their capacity and promote
their products as specifically designed for the rural market.
QUESTIONS
1. What are the two research topics identified by researchers?
(Hint: Studying the requirements of rural market in terms of technical
feasibility and consumer preferences)
2. What is the conclusion given by researchers in the case study?
(Hint: The researchers concluded that the rural market is widely different
from the urban market)
11.10 EXERCISE
1. Explain the concept of non-parametric test.
2. Describe the rank correlation with the help of example.
3. Explain the types of sign tests.
4. Discuss the concept of U test with the help of a diagram.
5. Write short notes on:
a. Chi-square test
b. Wilcoxon matched pairs/Signed rank test
SUGGESTED BOOKS
Biddle, J., & Emmett, R. Research in the History of Economic Thought and
Methodology
Chandra, S., & Sharma, M. Research Methodology
National Academies Press. (2009). Partnerships for Emerging Research
Institutions. Washington, D.C.
E-REFERENCES
Research Guides: Organising Your Social Sciences Research Paper: 6. The
Methodology (2018); Retrieved from http://libguides.usc.edu/writingguide/methodology
CHAPTER
12
REPORT WRITING
Table of Contents
Learning Objectives
12.1 Introduction
12.2 Research Proposal
Self Assessment Questions
12.3 Research Report
12.3.1 Written Report
12.3.2 Oral Presentations
Self Assessment Questions
12.4 Integral Parts of a Report
Self Assessment Questions
12.5 Summary
12.6 Key Words
12.7 Case Study
12.8 Exercise
12.9 Answers for Self Assessment Questions
12.10 Suggested Books and e-References
L E A R N I N G O B J E C T I V E S
After studying this chapter, you will be able to:
Explain the concept of research proposal
Describe the research report
Outline the importance of written reports
Explain the concept of oral presentation
Discuss the integral parts of a report
12.1 INTRODUCTION
In the previous chapter, you studied model building and decision making.
Now, you will study report writing.
Report writing is a process to document each and every step involved in the
research process. These steps are Introduction, Literature Review, Methodology,
Data Analysis and Interpretation, Conclusion and Recommendations. It helps the
researcher in checking whether the research is progressing in the right direction
or not. A research report serves as reference for findings and recommendations
of a research in future. The research report consists of a written report and an
oral presentation. The written report states objectives, data, research methodology
and findings. The oral presentation helps the target audience in judging whether
research recommendations are feasible to address the research problem or not.
RESEARCH PROPOSAL
Submitted to
Sales Manager: Vikas Kumar
Submitted by
Manali Batra, Senior Researcher
MSD Research Institute
Name: Manali Batra
Designation: Senior Researcher
Location of the work: Max New York Life, Elegance Tower, Jasola Vihar
Working days: Monday to Saturday
Working hours: 9.30 am to 5.00 pm
Contact number: +919XXXXX69
Time Frame for the Project: 2 months
Expected Cost of the Project: ₹ XXX thousand
(This includes the cost of project designing, traveling, administration and
reporting.)
Name of the Reporting Officer: Mr. Vikas Kumar
Designation: Sales Manager
Contact Number: +919XXXXX66
Title of the Project
Comparative analysis of Max New York Life (MNYL) and HDFC Life Insurance:
A detailed study on MNYL
Objectives
To study and compare the sales process of MNYL and HDFC
To study the policies and products of MNYL and HDFC
• Sample population – Customers of MNYL and HDFC
4. Tools
• Excel
• SPSS
Importance of the Research Work
This study will help us in determining the sales process, products and policies of
MNYL and HDFC. It will also shed light on the impact of advertisement on the
sales of insurance companies.
In addition, the study will help us in comparing MNYL and HDFC to know
which one is doing well in the market and satisfying its customers.
Expected Outcomes
This study covers data analysis of MNYL and HDFC for only a limited period
of time from the financial year 2014–15 to 2018–19. Hence, the results are
comparable and representative for this period only.
[Figure: Types of Research Report (Written Report and Oral Presentation)]
The written report is an official document giving the facts and information to the
interested readers in a presentable manner. The facts must be accurate, complete
and interpreted. The oral report, on the other hand, is a piece of face-to-face
communication presenting one’s research work in a seminar, workshop, etc. It
helps the researcher to present his/her views more clearly in front of research
stakeholders. Since the presenter has to interact directly with the audience, any
faltering during the oral presentation can leave a negative impact on the audience.
However, an oral report helps the researcher to gather valuable suggestions and
feedback from the research stakeholders. As compared to an oral report, a written
report is a permanent record that can be used for reference again and again. Let us
discuss the written reports and the oral presentations in detail.
Audience of a Report
As already discussed, there are two parties involved in a research – the first party
wants the research to be conducted and the second party conducts the research. The
first party is called audience. The researcher should tailor the writing of research
report towards the specific requirements of the target audience. The length and
composition of a research report and the details provided in it vary as per the
target audience. This happens because organisations differ from one another in
significant ways.
The researcher should adapt his/her writing to three types of audiences, each of
which requires different techniques:
High-tech peers: The research report should make use of the most
professional/complex resources, along with jargon and technical terms,
keeping in mind the expert-level knowledge of the audience.
Low-tech peers: The research report should provide proper definitions for
all the abbreviations/acronyms/technical terms used throughout the writing.
This enhances understanding where the audience is a mixture of laymen and
professionals.
Lay readers: The research report should use simple terms that are a lot
easier to understand and interpret. There should be no use of abbreviations/
acronyms.
6. Making the final draft: At this stage of report writing, the researcher gives
a final touch to his/her report. The final report is prepared keeping in mind
the objective of the research. It should be simple, concise and convincing. At
this stage, it is checked whether all the portions of the research are covered
or not.
Self Assessment Questions
6. The ____________ contains the name of the sponsor of the research, the
name of the researcher and duration of the research.
7. Bibliography contains the sources of secondary data while appendices
contain the sources of primary data or some extra information about the
research topic. (True/False)
8. ____________ contain the questionnaire or other sources of acquiring
data.
12.5 SUMMARY
A research proposal is an agreement between two parties. The first party
wants the research to be conducted and the second party actually conducts
the research.
The research proposal includes purpose, population, research design,
methods of data collection, tests of significance, time frame and budget.
A research report is a crucial part of a research as it includes solutions and
actionable recommendations of the research problem.
A research report can be of two types, namely written report and oral
presentation.
Broadly, reports are classified into two types – technical report and popular
report.
Technical report lays emphasis on the method employed in conducting
research, assumptions made during the research, details about the research
topic and the research findings and recommendations.
Popular report is non-technical in nature and is less comprehensive as the
audience of this report is interested in knowing the results of the research,
not the entire analysis.
The report writing process involves sequential steps, which are analysing the
subject matter, drawing the outline of the report, preparing the rough draft,
reviewing the rough draft and preparing bibliography.
Oral presentations are given with the help of PowerPoint software, which
facilitates data presentation in the form of graphs and charts.
A research report contains many sections that provide segregated research
information, which are title page, preliminary pages, executive summary,
introduction and objective, body of the report, findings, conclusion,
recommendations and bibliography and appendices.
Rough draft: The stage in which the researcher starts writing a report.
the market and the ways to enter the new market. The researchers prepare the
following research proposal to be submitted to the company:
RESEARCH PROPOSAL
Submitted to
Manager S.R. Dicosta
Submitted by
Veera Malhotra
Senior Researcher
RPS Research Institute
Name: Veera Malhotra
Designation: Senior Researcher
Location of the work: New Delhi
Working Days: Monday to Saturday
Working Hours: 9.30am to 5.00pm
Contact Number: +919XXXXX69
Time Frame for the Project: One month
Name of the Reporting Officer: Mr. S.R. Dicosta
Designation: Sales Manager
Contact Number: +919XXXXX66
Title of the Project
Study of Paint Industry in Delhi
Objectives
The objectives of the study are as follows:
To study the paint industry in Delhi
To determine the prospective customers of PR Paint
To compare the paint industry and the present industry of ABC Company
To give recommendations about the launch of PR Paint
Methodology
The research methodology of this project consists of:
1. Research Design
• Descriptive research design
• Hypothesis testing
2. Data collection
• Primary data: Questionnaire and In-depth interviews
Research Report
Study of Paint Industry in Delhi
Findings of the study
The major findings of the research are as follows:
The paint industry in Delhi is very large.
The paint market is highly competitive because a huge variety of paints
with new colour combinations is available. However, there is scope to
enter the paint market with new and innovative ideas.
Customers always want to use new colours for their offices and houses.
Recommendations of the Study
Some recommendations made to ABC Company are as follows:
Introduction of a new product, PR Paint, in the new market could be a
good decision.
QUESTIONS
1. Which type of report is used in the case study?
(Hint: Popular report is used in the case study.)
12.8 EXERCISE
1. Explain the research proposal.
2. What do you mean by a research report?
3. Explain the concept of written report in detail.
4. Discuss the integral parts of a report.
Integral Parts of a Report 6. title page
7. True
8. Appendices
12.10 SUGGESTED BOOKS AND E-REFERENCES
SUGGESTED BOOKS
Biddle, J., & Emmett, R. Research in the history of economic thought and
methodology.
Chandra, S., & Sharma, M. Research methodology.
National Academies Press. (2009). Partnerships for emerging research institutions.
Washington, D.C.
E-REFERENCES
Research Guides: Organising Your Social Sciences Research Paper: 6. The
Methodology. (2018). Retrieved from http://libguides.usc.edu/writingguide/methodology
Research Methodology. (2018). Retrieved from https://explorable.com/research-methodology
Research Methodology. (2018). Retrieved from https://books.google.com/books/about/Research_Methodology.html?id=x_kp__WmFzoC
Research Methods. (2018). Retrieved from https://research-methodology.net/research-methods/