
Research Methodoly 151 298

The document discusses research methodology, focusing on data processing and analysis, particularly in the context of questionnaire design. It outlines the steps involved in data processing, including editing, coding, classification, data entry, and tabulation, as well as various statistical measures for data analysis such as central tendency and correlation. The importance of careful data handling to ensure accurate research findings is emphasized throughout the chapter.

Uploaded by

Karishma

Research Methodology and Management Decision

CHAPTER 8

DATA PROCESSING AND ANALYSIS

Table of Contents
Learning Objectives
8.1 Introduction
8.2 The Concept of Data Processing
8.2.1 Editing
8.2.2 Coding
8.2.3 Classification
8.2.4 Data Entry
8.2.5 Tabulation
Self Assessment Questions
8.3 The Concept of Data Analysis
Self Assessment Questions
8.4 Measures of Central Tendency
8.4.1 Mean
8.4.2 Median
8.4.3 Mode
Self Assessment Questions
8.5 Measures of Dispersion
8.5.1 Range
8.5.2 Mean Deviation
8.5.3 Standard Deviation
Self Assessment Questions
8.6 Measure of Skewness
Self Assessment Questions
8.7 Measures of Relationship
8.7.1 Correlation Analysis
8.7.2 Causal Analysis
Self Assessment Questions
8.8 Different Charts Used in Data Analysis
Self Assessment Questions
8.9 Summary
8.10 Key Words
8.11 Case Study
8.12 Exercise
8.13 Answers for Self Assessment Questions
8.14 Suggested Books and e-References
LEARNING OBJECTIVES

After studying this chapter, you will be able to:
• Explain the concept of data processing
• Describe the concept of data analysis
• Discuss the measures of central tendency
• Explain the measures of skewness
• Discuss the measures of relationship
• Describe the various charts used in data analysis

8.1 INTRODUCTION
In the previous chapter, you studied questionnaire design. Now, you will learn
the significance and ways of processing and analysing the data collected through
such questionnaires.

Data in its raw form does not convey any useful information. It needs to be
organised properly to extract relevant information and make it fit for research.
This is done with the help of data processing that involves various steps, including
editing, coding, classification, data entry and tabulation.

After processing data, you need to analyse it to find answers to the research
problem. You can use various statistical measures, such as the measures of central
tendency, dispersion, skewness and relationship, to analyse data. The selection of
a measure depends upon the type of research problem. For example, if you
wish to find out the average marks of students of class IX in English, you
would use the measures of central tendency. However, if you want to know the
relationship between the eating habits of children and problems of obesity, you
would use the measures of relationship. It is important to note that no single
statistical measure is sufficient on its own to analyse a data series. Therefore, you
should use a suitable combination of different measures to address the problem
at hand in the most effective manner. Any carelessness in data processing and
data analysis can result in erroneous research findings. Moreover, these tasks
form a major part of research and consume considerable time and effort of the
researcher. Therefore, it is advisable to remain extra vigilant while processing and
analysing data so that the research findings are as reliable as possible.

The chapter begins by explaining the concepts of data processing and data analysis.
Next, it talks about the measures of central tendency, including mean, median
and mode. Information is also provided about the measures of dispersion and
the measures of skewness. It also explains the measures of relationship, including
correlation analysis, regression analysis and multiple regression. Towards the end,
the chapter discusses other statistical measures used for data analysis.

8.2 THE CONCEPT OF DATA PROCESSING


Data processing is the process of converting raw data (quantitative or qualitative)
into a form that is fit for analysis. The process involves the steps shown in
Figure 1:

Editing → Coding → Classification → Data Entry → Tabulation

Figure 1: Steps of Data Processing

Let us now discuss each step of data processing in the following section.


8.2.1 EDITING
Editing refers to reviewing the collected data to check whether it is valid.
Data is examined to detect errors and omissions. Errors are corrected, omitted data
is filled in where possible, and the data is prepared for further processing. The
edited data is then retained for analysis.

The editor is responsible for ensuring that the data is accurate, uniform, as complete
as possible and acceptable for tabulation. Editing helps in filtering ambiguous
information that can create a problem at the time of data analysis. Ambiguous
information can be in the form of biased or incorrect responses in a questionnaire
and such information needs to be deleted.

8.2.2 CODING
Coding is the process of assigning codes to the data in the form of symbols,
characters and numbers. It helps the researcher in interpreting the data and
deriving accurate results. If the data is generated with the help of a questionnaire,
it can be coded either at the time of framing the questionnaire or after collecting
the data:
• The data that is already coded when the questionnaire is framed is known as
precoded data.
• The data that is coded at the time of data processing is known as postcoded
data.
Generally, a questionnaire may contain the following types of questions:
• Interval-scale questions: An interval scale is a range of values in which the
differences between values are meaningful but there is no true zero. A question
that asks the respondent to enter a temperature in degrees Celsius or
Fahrenheit is an interval-scale question because such degrees are interval
measurements.
• Closed-ended questions: These are questions for which a researcher
provides respondents with options from which to choose a response.
• Open-ended questions: These are questions that require more
thought and more than a simple one-word answer.
The data collected through interval-scale and closed-ended questions is an example
of precoded data, whereas the data collected through open-ended questions is an
example of postcoded data. Apart from these, the questionnaire can also include
questions based on nominal scale, ordinal scale and ratio scale.
Precoded data has certain advantages over postcoded data:
• It is easier to code.
• It reduces the effort in data processing.
• It leads to fewer chances of human error during data processing.
Let us understand the concept of coding with the help of an example.

The following questionnaire aims to measure the comfort level of women in a
job after marriage. Questions 1 to 5 are multiple-choice questions (closed-ended
questions), questions 6 to 14 are interval-scale questions and questions 15 and 16
are dichotomous questions.


Questions 1 to 5: Tick all the options that apply to you.

1. Age Group (years)   a. 20-30      b. 30-40        c. 40-50                 d. 50 and above
2. Marital Status      a. Married    b. Unmarried    c. Divorced              d. Please specify… (for example, engaged, widow or whatever)
3. Children            a. None       b. One          c. Two                   d. More than two
4. Working Status      a. Working    b. Non-working  c. Retired from the job  d. Searching for the job   e. On leave
5. Work Type           a. Full-time  b. Part-time    c. Not Applicable

Questions 6 to 14: Give the ratings in the following questions as per your choice. The
rating of 1 means the lowest and 5 means the highest.

6. My work gives me satisfaction more than anything.                    1 2 3 4 5
7. I am able to manage my professional and personal life perfectly.     1 2 3 4 5
8. I go for holidays with my family frequently.                         1 2 3 4 5
9. I am able to reach office on time.                                   1 2 3 4 5
10. I reach my home on time in the evening.                             1 2 3 4 5
11. I complete most of my work projects on time.                        1 2 3 4 5
12. I play with my children daily.                                      1 2 3 4 5
13. I reach home late at nights.                                        1 2 3 4 5
14. I most often extend the deadline for submission of my projects.     1 2 3 4 5

Questions 15 and 16: Answer in Yes or No.

15. I have kept a maid for household work.                  Yes / No
16. I have kept babysitters to look after my children.      Yes / No / Not Applicable
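The chapter does not list the codes assigned to each option, so the following Python sketch only illustrates how such a questionnaire could be coded. The numeric codes, the question keys (such as age_group and q6) and the function name code_response are all assumptions made for illustration, and the sketch assumes one ticked option per question.

```python
# Hypothetical codebook: these numeric codes are illustrative assumptions,
# not codes prescribed by the chapter.
CODEBOOK = {
    "age_group": {"a": 1, "b": 2, "c": 3, "d": 4},       # Question 1
    "marital_status": {"a": 1, "b": 2, "c": 3, "d": 4},  # Question 2
    "maid": {"Yes": 1, "No": 0},                         # Question 15
}


def code_response(raw):
    """Convert one respondent's raw answers into numeric codes."""
    coded = {}
    for question, answer in raw.items():
        if question in CODEBOOK:
            # Closed-ended question: look up the code of the ticked option.
            coded[question] = CODEBOOK[question][answer]
        else:
            # Rating questions (6 to 14) are already numbers on the form.
            coded[question] = int(answer)
    return coded


# One hypothetical respondent.
respondent = {"age_group": "b", "marital_status": "a", "q6": "4", "maid": "Yes"}
coded = code_response(respondent)
print(coded)
```

Because the options and ratings are fixed in advance, a codebook like this can be prepared while the questionnaire is framed, which is what makes such data precoded.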

8.2.3 CLASSIFICATION
Classification refers to categorising the coded questions into different segments as
per their relevance. This is done to simplify data processing and analysis to a great
extent. It is important to note that variables in a segment possess certain similar
characteristics. For example, demographic information is a segment that includes
variables, such as age, education and work experience of the respondents.
Questions in a questionnaire can be classified into qualitative and quantitative
questions:
• Qualitative questions: The classification of qualitative questions is called
statistics of attributes. These attributes cannot be measured directly in
numbers. However, qualitative attributes can be quantified. Examples of
attributes are honesty and attitude of the respondents.
• Quantitative questions: The classification of quantitative questions is called
statistics of variables. These variables can be expressed in numeric form,
such as demographic factors including age and income.

These variables can be grouped in the form of class intervals. A class interval
contains a lower limit and an upper limit. The difference between the two limits
is called class magnitude. For example, in the class interval 25-35, 25 is the lower
limit, 35 is the upper limit and the class magnitude is 10.
Class intervals can be inclusive or exclusive.
• Inclusive class intervals: If the value of the upper limit is included in the
class, it is an inclusive class interval. For example, the value
35 would be included in the inclusive class 25-35. Thus, the inclusive class
intervals would be 25-35, 36-45, 46-55, and so on.
• Exclusive class intervals: If the value of the upper limit is not included in
the class, then it is known as an exclusive class interval. For
example, the value 35 would not be included in the class 25-35 but it would
be included in the class 35-45. Thus, the exclusive class intervals would be
25-35, 35-45, 45-55, and so on.
Another important term to remember during classification is frequency.
In statistics, frequency is the number of observations that fall in a particular class
interval. Table 1 shows the number of respondents in each age group:

Table 1: Frequency Distribution

Age Group (Class Interval)    Number of Respondents
25-35                         10
35-45                         4
45-55                         7
55-65                         2

In Table 1, 10 respondents are in the age group of 25-35. Thus, 10 is the frequency
of the class interval 25-35. When class intervals and frequencies are represented
in a tabular form, as in Table 1, such a representation is known as frequency
distribution.
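As a minimal sketch of how a frequency distribution with exclusive class intervals could be built, the Python snippet below assigns each age to its class and counts the respondents per class. The list of ages is hypothetical (the raw ages behind Table 1 are not given), and the helper name class_interval is an assumption.

```python
from collections import Counter


def class_interval(value, lower=25, width=10):
    """Return the label of the exclusive class interval that contains value.

    The upper limit is excluded, so 35 falls in 35-45 and not in 25-35.
    """
    start = lower + ((value - lower) // width) * width
    return f"{start}-{start + width}"


# Hypothetical ages of ten respondents (not the data behind Table 1).
ages = [25, 31, 34, 35, 42, 45, 47, 54, 55, 61]

# Frequency = number of respondents falling in each class interval.
frequency = Counter(class_interval(age) for age in ages)

for interval in sorted(frequency):
    print(interval, frequency[interval])
```

For inclusive class intervals such as 25-35 and 36-45, the rule for the boundary value would differ, so the helper would need to be adapted.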

8.2.4 DATA ENTRY


After classifying data, the researcher enters the data into a computer. If wrong data is
entered, the results will be inaccurate.
Various statistical and database management software packages are available for
data entry, such as:
• Bio Medical Data Package (BMDP)
• Statistical Programming Language (S-PLUS)
• Statistical Analysis System (SAS)
• Statistical Package for Social Sciences (SPSS)
Of these, SPSS is widely used by researchers for data entry.

8.2.5 TABULATION
Tabulation refers to presenting data in the form of a table so that it can be
easily analysed. In this stage, the frequencies of the dataset are also computed.
There are three types of frequencies, namely absolute frequency, relative frequency
and cumulative frequency.
• Absolute frequency is the actual number of respondents who have given a
particular response.
• Relative frequency is calculated in relation to the total frequency of all
class intervals. It is the percentage of all respondents who have given a
particular response.
• Cumulative frequency is the percentage of all respondents who have given
a response equal to or less than a particular value.
There are two types of frequency distributions, which can be put into a tabular
form:
1. Two-way frequency distribution: In this type of frequency distribution,
two variables can be analysed at a time. This frequency distribution is also
known as cross tabulation.
2. One-way frequency distribution: In this type of frequency distribution, a
single variable is analysed.
Table 2 shows an example of the one-way frequency distribution.
Table 2: One-Way Frequency Distribution

Age Group           Number of Persons                     Relative         Cumulative
(Class Interval)    (Frequency or Absolute Frequency)     Frequency (%)    Frequency (%)
20-30               10                                    17.86            17.86
30-40               14                                    25.00            42.86
40-50               20                                    35.71            78.57
50 and above        12                                    21.43            100.00
Total               56                                    100              100

In Table 2, age group is taken as a variable and different types of frequencies are
calculated. As already discussed, absolute frequency is the actual number of
respondents in a class interval. Relative frequency can be calculated by dividing the
absolute frequency by the total frequency and multiplying the result by 100. For
example, in case of the 20-30 age group, absolute frequency is 10 and the total
frequency is 56; therefore, the relative frequency is 17.86 (10/56 × 100). Cumulative
frequency can be calculated by adding up the relative frequency of the present class
interval (whose cumulative frequency we are calculating) and the relative frequencies
of all the preceding class intervals. For example, in case of the 20-30 and 30-40 age
groups, the relative frequencies are 17.86 and 25.00, respectively. Therefore, the
cumulative frequency in the case of the 30-40 age group is 42.86 (17.86 + 25.00).
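The calculation of relative and cumulative frequencies in Table 2 can be sketched in Python as follows. The absolute frequencies are taken from Table 2; the variable names are illustrative.

```python
from itertools import accumulate

# Class intervals and absolute frequencies from Table 2.
classes = ["20-30", "30-40", "40-50", "50 and above"]
absolute = [10, 14, 20, 12]

total = sum(absolute)

# Relative frequency: share of each class in the total, as a percentage.
relative = [100 * f / total for f in absolute]

# Cumulative frequency: running total of the relative frequencies,
# i.e. the present class plus all the preceding classes.
cumulative = list(accumulate(relative))

for label, f, rel, cum in zip(classes, absolute, relative, cumulative):
    print(f"{label:<14}{f:>4}{rel:>8.2f}{cum:>8.2f}")
```

The running total makes it clear why the last class always reaches 100 percent.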

Self Assessment Questions

1. ________________ helps in filtering ambiguous information that can
create a problem at the time of analysis.
2. Ambiguous information can be in the form of biased or incorrect
responses given by the respondents. (True/False)
3. The data that is coded at the time of data processing is known as
_______________ data.
4. Data entry refers to presenting data in the form of a table so that it can be
easily analysed. (True/False)

8.3 THE CONCEPT OF DATA ANALYSIS


After processing data, a researcher analyses it to retrieve meaningful information.
Data analysis is broadly classified into two types, as shown in Figure 2:

Types of Data Analysis
• Descriptive Analysis: Univariate Analysis, Bivariate Analysis, Multivariate Analysis
• Inferential Analysis: Parametric Test, Non-Parametric Test

Figure 2: Types of Data Analysis

Let us now discuss each type:

• Descriptive analysis: In this type of data analysis, the distribution patterns
and characteristics of different types of variables are analysed. There are
three types of descriptive analysis:
  i. Univariate analysis: This analysis studies a single variable. Examples
  include measures of central tendency, dispersion and skewness. However,
  sometimes these measures can also be used for bivariate and multivariate
  analysis.
  ii. Bivariate analysis: In this analysis, two variables are studied. One
  variable can be classified as independent and the other as dependent.
  Examples are rank correlation, simple correlation and simple regression.
  iii. Multivariate analysis: In this analysis, more than two variables are
  studied. Among the variables being studied, there can be more than two
  independent variables and more than one dependent variable. Examples
  include multiple correlations and regressions.


• Inferential analysis: In this type of data analysis, significance tests are used
to check the validity of a hypothesis for studying a problem. There are two
types of significance tests:
  i. Parametric tests: These tests make assumptions about the parameters of
  the population from which a sample is derived. Examples of parametric
  tests include z-test and t-test.
  ii. Non-parametric tests: These tests do not make any assumptions about
  the parameters of the population from which the sample is derived. An
  example of a non-parametric test is the Kruskal Wallis test.

Note: This chapter describes only descriptive analysis. Inferential analysis will be
covered in later chapters.

Self Assessment Questions

5. Simple regression is which type of data analysis?
   a. Univariate analysis      b. Bivariate analysis
   c. Multivariate analysis    d. Inferential analysis
6. Descriptive analysis uses tests of significance to check the validity of a
hypothesis for studying a problem. (True/False)

8.4 MEASURES OF CENTRAL TENDENCY


The measures of central tendency are used to study the distribution pattern of a
dataset. These measures give a single central value that represents the bulk of the
data analysed. This central value is commonly referred to as the average of the
data collected.

Figure 3 displays the various measures of central tendency:

Measures of Central Tendency
• Mean (including Weighted Mean, Geometric Mean and Harmonic Mean)
• Median
• Mode

Figure 3: Measures of Central Tendency

Let us now discuss each measure.



8.4.1 MEAN
Mean represents the value calculated by dividing the sum of observations by the
total number of observations (n). It is also known as arithmetic mean.

The following formula is used to calculate mean:

Mean (X̄) = ∑Xi/n

Where, X̄ = Symbol for mean
∑Xi = Sum of all observations (Xi: X1, X2, …, Xn)
n = Number of observations

Let us understand the concept of arithmetic mean with the help of an example.

Suppose you want to find the average weight of a group of five friends. Table 3
shows the weight of each person in the group:

Table 3: Weight of Five Friends

People    Weight (kg)
Jenny     35
Robert    40
Ella      34
Andy      39
Eliza     42

The average weight of five friends can be calculated as follows:

X̄ = ∑Xi/n

Where, X̄ = Average weight of five friends
∑Xi = Sum of the weights of five friends = 190
n = 5

X̄ = (35 + 40 + 34 + 39 + 42)/5
X̄ = 190/5
X̄ = 38 kg

Therefore, the average weight of five friends is 38 kg.
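The same calculation can be sketched in Python, using the weights from Table 3. The variable names are illustrative; statistics.mean from the Python standard library is used only as a cross-check.

```python
import statistics

# Weights (kg) of the five friends from Table 3.
weights_kg = {"Jenny": 35, "Robert": 40, "Ella": 34, "Andy": 39, "Eliza": 42}

total_weight = sum(weights_kg.values())   # sum of all observations
n = len(weights_kg)                       # number of observations
mean_weight = total_weight / n            # arithmetic mean

print(total_weight, n, mean_weight)

# Cross-check with the standard library.
library_mean = statistics.mean(weights_kg.values())
```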


You can calculate different types of mean:

• Weighted mean: This mean is calculated after considering the weight
attached to each item. The formula used to calculate weighted mean is as
follows:
Weighted Mean (X̄w) = ∑WiXi/∑Wi
Where, X̄w = Symbol for weighted mean
Xi = Value of the ith item
Wi = Weight assigned to the ith item
∑Wi = Sum of the weights assigned
Example of Weighted Mean
A school grades its students by using weighted mean scores as follows:
15% weightage is assigned for homework, 15% weightage is assigned for
extracurricular activities, and 70% weightage is assigned for the examination.
Aditya scored 60 marks, 70 marks and 55 marks for homework, extracurricular
activities and examination, respectively. Find the weighted score of Aditya
if the total score is 100.
Since the weights add up to 1 (0.15 + 0.15 + 0.70), the weighted mean is
calculated as follows:
Weighted Mean (X̄w) = (0.15 × 60) + (0.15 × 70) + (0.70 × 55)
= 9 + 10.5 + 38.5
= 58
• Geometric mean: Geometric mean represents the nth root of the product of
all the values or observations involved in a research. The formula used to
calculate geometric mean is as follows:
Geometric Mean (Xg) = ⁿ√(X1 × X2 × X3 × … × Xn)
Where X1, X2, …, Xn are the n observations in the data set
n = Number of observations
Example of Geometric Mean
You want to calculate the geometric mean of four observations: 10, 12, 10
and 11.
The calculation of geometric mean is shown as follows:
X1 = 10, X2 = 12, X3 = 10, X4 = 11
n = 4
Xg = ⁴√(X1 × X2 × X3 × X4)
Xg = ⁴√(10 × 12 × 10 × 11)
Xg = ⁴√13200 ≈ 10.72
Therefore, the geometric mean of the four observations is approximately 10.7.


• Harmonic mean: Harmonic mean refers to the reciprocal of the average of the
reciprocals of the values in a data series (or observations). The formula to
calculate harmonic mean is as follows:
Harmonic Mean (XH) = Rec. [(Rec. X1 + Rec. X2 + … + Rec. Xn)/n]
Where Rec. X1, Rec. X2, …, Rec. Xn are the reciprocals of observations 1, 2, …,
n, respectively
n = Number of observations
Example of Harmonic Mean
Calculate the harmonic mean of four observations: 10, 12, 10 and 11.
Harmonic mean is calculated as:
XH = Rec. [(Rec. X1 + Rec. X2 + Rec. X3 + Rec. X4)/n]
Where Rec. X1 = 1/10; Rec. X2 = 1/12; Rec. X3 = 1/10; Rec. X4 = 1/11
n = 4
XH = Rec. [(1/10 + 1/12 + 1/10 + 1/11)/4]
XH = Rec. [(247/660)/4] = Rec. [247/(660 × 4)]
XH = (660 × 4)/247 = 2640/247 ≈ 10.69

Therefore, the harmonic mean of the four observations is approximately 10.7. The
harmonic mean is used for quantities that combine through their reciprocals, such
as average speed over equal distances, capacitance in series or resistance in parallel.
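The three types of mean described above can be sketched in Python as shown below. The function names are illustrative assumptions; the data comes from the worked examples (Aditya's scores and the four observations 10, 12, 10 and 11). The sketch assumes positive values and weights that do not sum to zero, and math.prod requires Python 3.8 or later.

```python
import math


def weighted_mean(values, weights):
    """Sum of weight × value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def geometric_mean(values):
    """nth root of the product of the n observations."""
    return math.prod(values) ** (1 / len(values))


def harmonic_mean(values):
    """Reciprocal of the average of the reciprocals."""
    return len(values) / sum(1 / x for x in values)


# Aditya's marks for homework, extracurricular activities and examination.
aditya = weighted_mean([60, 70, 55], [0.15, 0.15, 0.70])

observations = [10, 12, 10, 11]
gm = geometric_mean(observations)
hm = harmonic_mean(observations)

print(round(aditya, 2), round(gm, 2), round(hm, 2))
```

For the same positive observations, the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean; the values here follow that order.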

8.4.2 MEDIAN
Median is defined as a central or mid value of a dataset. Median divides a dataset
into two halves – one half contains the values greater than the mid value (or
median) and the other half contains the values less than the mid value.

Before calculating median, you need to arrange the dataset in the ascending or
descending order. The formula to calculate median is as follows, where
n = number of observations:

If n is an odd number:

Median = Value of the ((n + 1)/2)th observation

If n is an even number:

Median = [Value of the (n/2)th observation + Value of the ((n/2) + 1)th observation]/2

Let us understand the concept of median with the help of an example.

A group of 17 people gave the following ratings to a book on a 5-point scale
(where 1 is the lowest rating and 5 is the highest rating):

2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4

Now you want to calculate the average rating by using median. To do so, arrange
the data in the ascending order, as follows:

1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5

Since the number of observations is odd, the following formula will be used to
calculate median:

Median = Value of ((n + 1)/2)th observation
Median = ((17 + 1)/2)th observation
Median = 9th observation
Median = 3

Therefore, the median rating for the book is 3.

If n is an even number, median is calculated as the simple average
of the middle two numbers. In other words, median is the simple average of the
(n/2)th and ((n/2) + 1)th terms.

Suppose a group of 20 people gave their ratings to a movie on a 5-point scale
(where 1 is the lowest rating and 5 is the highest rating):

2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4, 1, 2, 3

To calculate the average rating using median, all the 20 observations are
arranged in ascending order as:

1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5

Here, median is the average of the middle two values, i.e., the values at the 10th
and 11th positions. This is calculated as:

Median = (3 + 3)/2 = 3
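The two cases (odd and even number of observations) can be sketched in one Python function, applied here to the book and movie ratings from the examples. The function name median is illustrative, and the sketch assumes a non-empty list.

```python
def median(values):
    """Return the middle value of the sorted data.

    Odd n:  the ((n + 1)/2)th observation.
    Even n: the average of the (n/2)th and ((n/2) + 1)th observations.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2  # 0-based index of the ((n/2) + 1)th observation
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


book_ratings = [2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4]
movie_ratings = [2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4, 1, 2, 3]

book_median = median(book_ratings)    # 17 observations: 9th value
movie_median = median(movie_ratings)  # 20 observations: 10th and 11th values

print(book_median, movie_median)
```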

8.4.3 MODE
Mode refers to the value that has the highest frequency in a data series.

According to Croxton and Cowden, the mode of a distribution is value at the point
around which the items tend to be most heavily concentrated. It may be regarded as the
most typical of a series of values.


Let us learn to calculate mode with the help of an example. Suppose the marks of
five friends in a science paper are 70, 90, 50, 70, and 30. You want to find the mode
of their marks.

You need to find the highest frequency of the present data to calculate mode. Here,
the number having the highest frequency is 70 as it occurs two times; therefore, the
mode of students’ marks is 70.

Mode is the most important statistic for nominal data, where values are
names rather than numbers. In such cases, there is no concept of center because
there are no numbers. In addition, when we are dealing with continuous variables,
the probability that all observations in the data sample are different is 1, so no
value is repeated. Therefore, mode cannot be used for continuous variables.

Mode is not considered a true measure of central tendency because of three reasons:
i. It is not necessary that one data series has only one mode because many
numbers in the data series can have the highest frequency.
ii. Mode does not consider all the frequencies to arrive at the central value of
the data series. Therefore, the results of mode are not reliable.
iii. It is possible that a series has observations that occur only once. In such cases,
mode does not exist.
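A minimal Python sketch of finding the mode is shown below. It returns a list so that the cases noted above (more than one mode, or no mode at all) are visible; the function name modes and the second and third data series are illustrative assumptions.

```python
from collections import Counter


def modes(values):
    """Return every value that has the highest frequency.

    An empty list means that no value is repeated, so no mode exists.
    """
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every observation occurs only once
    return [value for value, count in counts.items() if count == highest]


marks = [70, 90, 50, 70, 30]        # marks of the five friends
marks_mode = modes(marks)

two_modes = modes([1, 1, 2, 2, 3])  # hypothetical series with two modes
no_mode = modes([1, 2, 3])          # hypothetical series with no mode

print(marks_mode, two_modes, no_mode)
```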
Let us summarise mean, median and mode as:
• Mean: Mean represents the average value in a dataset.
• Median: Median represents the middle value in a dataset.
• Mode: Mode represents the most common value in a dataset.
The measures of central tendency used for different types of variables are shown
in Table 4 as follows:

Table 4: Types of Variables and Measures of Central Tendency

Types of Variables             Best Measure of Central Tendency
Nominal                        Mode
Ordinal                        Median and Mode
Interval/Ratio (not skewed)    Mean, Median and Mode
Interval/Ratio (skewed)        Median and Mode*
* For skewed data, median is better than mean

Self Assessment Questions

7. Mean represents the value that you get after dividing the sum of
observations by the total number of observations taken. (True/False)
8. ______________ mean represents the nth root of the product of all the
values or observations involved in the research.
9. Median can be defined as a central value that divides a dataset into two
halves. (True/False)


8.5 MEASURES OF DISPERSION


Using different measures of central tendency, you can find out a central value, but
these measures do not explain how the values are scattered around that central
value in a data series. The measures of dispersion can be used to study how the
values are dispersed around the mean value. Figure 4 shows the measures of dispersion:

Measures of Dispersion
• Range
• Mean Deviation
• Standard Deviation

Figure 4: Measures of Dispersion

Let us now discuss each measure of dispersion.

8.5.1 RANGE
Range represents the difference between the highest value and the lowest value in
a data series. It is considered a rough measure of variability because it is
determined only by the two extreme values of the data series. When the highest (H)
and/or the lowest (L) data point in a data series changes, the range also changes.

The formula used to calculate range is as follows:

Range = (Highest value of data series – Lowest value of data series)

Note: Coefficient of Range = (H − L)/(H + L)

Let us learn to calculate range with the help of the preceding example in which a
group of 17 people rated a book on a 5-point scale, where 1 is the lowest rating
and 5 is the highest rating. The rating given by the 17 people is as follows:

2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4

Now, you want to calculate the range for the data series.

To do so, you need to find the highest and lowest values of the data series. In the
present case,

Highest value of data series = 5
Lowest value of data series = 1

Therefore, the range would be:

Range = (Highest value of data series – Lowest value of data series)
Range = (5 – 1)
Range = 4

Therefore, the range of the ratings given by 17 people to a book is 4.
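The range and the coefficient of range for the same ratings can be sketched in Python as follows. The variable names are illustrative.

```python
# Ratings given by the 17 people to the book.
ratings = [2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4]

highest = max(ratings)  # H
lowest = min(ratings)   # L

data_range = highest - lowest
coefficient_of_range = (highest - lowest) / (highest + lowest)

print(data_range, round(coefficient_of_range, 3))
```

Here the coefficient of range works out to 4/6, or roughly 0.667. Unlike the range itself, it is a relative measure and has no unit.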

8.5.2 MEAN DEVIATION


Mean deviation represents the extent of deviation of values from the mean.

According to Clark and Schkade, average deviation is the average amount of scatter
of the items in a distribution from either the mean or the median, ignoring the signs of the
deviations. The average that is taken of the scatter is an arithmetic mean, which accounts
for the fact that this measure is often called the mean deviation.

Mean Deviation is used to measure variability across a data series.

The formula used to calculate Mean Deviation is as follows:

Mean Deviation (MD) = ∑|Xi – X̄|/n

Where Xi = Individual observation
X̄ = Mean/Median/Mode
n = Number of observations

With the help of MD, you can also calculate the coefficient of MD. The coefficient of
MD refers to the relative measure of dispersion that can be calculated by dividing
MD by the mean/median/mode.

The formula to calculate the coefficient of mean deviation is as follows:

Coefficient of MD = MD/X̄

Where MD = Mean Deviation
X̄ = Mean/Median/Mode

Let us understand the concept of MD and the coefficient of MD with the help of an
earlier example in which you calculated the average weight of five friends.

Table 5 shows the data used for calculating mean deviation:

Table 5: Weight of Five Friends

People    Weight (kg)    |Xi – X̄|
Jenny     35             |35 – 38| = 3
Robert    40             |40 – 38| = 2
Ella      34             |34 – 38| = 4
Andy      39             |39 – 38| = 1
Eliza     42             |42 – 38| = 4
Total                    14

X̄ = (35 + 40 + 34 + 39 + 42)/5 = 38

The formula to calculate MD is shown as follows:

Mean Deviation (MD) = ∑|Xi – X̄|/n
MD = 14/5
MD = 2.8

Coefficient of Mean Deviation = MD/X̄
= 2.8/38
= 0.074

Therefore, the mean deviation of the weight of the five friends is 2.8 kg. In other
words, the weights of the friends are dispersed, on average, by 2.8 kg from the
average weight. The relative measure of dispersion (coefficient of MD) is 0.074.
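The mean deviation of the five weights and its coefficient can be sketched in Python as follows. The deviations are measured from the mean, as in Table 5; the variable names are illustrative.

```python
# Weights (kg) of the five friends.
weights = [35, 40, 34, 39, 42]

n = len(weights)
mean = sum(weights) / n

# Absolute deviations from the mean (signs are ignored).
absolute_deviations = [abs(x - mean) for x in weights]

mean_deviation = sum(absolute_deviations) / n
coefficient_of_md = mean_deviation / mean

print(absolute_deviations, mean_deviation, round(coefficient_of_md, 3))
```

To measure the deviations from the median or the mode instead, only the value assigned to mean would need to change.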

8.5.3 STANDARD DEVIATION


Standard Deviation is used to calculate the scattering of values in a given dataset.
The symbol used to represent standard deviation is sigma (σ). Standard Deviation
(SD) is the square root of variance of a data series. The formula used to calculate
SD is as follows:

For research where the entire population is considered,

SD of population σ = √[∑(Xi – X̄)²/n]

and σ = Parameter of the population

For research where only a sample is considered,

SD of sample S = √[∑(Xi – X̄)²/(n – 1)]

and S = Statistic of sample

Also note that the square of SD is called variance.

Population variance = σ² and Sample variance = S²

Sample statistics are used to estimate population parameters. S² is an unbiased
estimate of σ².

If the observations are grouped into a frequency table, then the formulas of SD and
variance change as follows:

σ = √[∑fi(Xi – X)²/n]

where X = ∑Xi fi/∑fi and n = ∑fi

Therefore, σ² = ∑fi(Xi – X)²/n

The coefficient of SD can be calculated by dividing SD with the mean of the series.
It is a relative measure of dispersion.
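
As a rough illustration of the grouped-data formulas, the sketch below computes the mean, variance, SD and coefficient of SD from a frequency table. The values and frequencies are hypothetical and chosen only for illustration; they do not come from the text.

```python
from math import sqrt

# Hypothetical frequency table: each value Xi occurs fi times
values = [34, 36, 38, 40, 42]
frequencies = [2, 3, 5, 3, 2]

n = sum(frequencies)  # n = sum of fi

# Grouped mean: X = sum(Xi * fi) / sum(fi)
grouped_mean = sum(x * f for x, f in zip(values, frequencies)) / n

# Grouped population variance: sum(fi * (Xi - X)^2) / n
variance = sum(f * (x - grouped_mean) ** 2 for x, f in zip(values, frequencies)) / n

sd = sqrt(variance)
coefficient_of_sd = sd / grouped_mean  # relative measure of dispersion

print(grouped_mean, round(variance, 4), round(sd, 4), round(coefficient_of_sd, 4))
```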

Let us understand the concepts of SD, the coefficient of SD, and the coefficient of
variance with the help of an example.

Suppose you want to calculate the standard deviation of the weight of five friends
shown in the preceding example. Table 6 shows the data used to calculate standard
deviation, the coefficient of standard deviation, and the coefficient of variance:

Table 6: Weight of Five Friends


People    Weight (kg) (Xi)    (Xi – X)    (Xi – X)²
Jenny     35                  −3          9
Robert    40                  2           4
Ella      34                  −4          16
Andy      39                  1           1
Eliza     42                  4           16
Total                                     ∑(Xi – X)² = 46

The calculation of standard deviation is as follows:

X = (35 + 40 + 34 + 39 + 42)/5 = 38

σ = √[∑(Xi – X)²/n]

= √(46/5) = √9.2

= 3.033

The calculation of coefficient of SD is as follows:

Coefficient of Standard Deviation = SD/X

= 3.03/38

= 0.0798
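
The same figures can be checked with Python's standard `statistics` module, as a minimal sketch. `pstdev` divides by n (population formula) and `stdev` divides by n – 1 (sample formula); the variable names are illustrative.

```python
import statistics

# Weights (kg) of the five friends from Table 6
weights = [35, 40, 34, 39, 42]

mean_weight = statistics.mean(weights)

# Population SD: square root of sum((Xi - X)^2) / n
sd_population = statistics.pstdev(weights)

# Sample SD: square root of sum((Xi - X)^2) / (n - 1)
sd_sample = statistics.stdev(weights)

# Coefficient of SD, using the population SD as in the worked example
coefficient_of_sd = sd_population / mean_weight

print(round(sd_population, 3), round(sd_sample, 3), round(coefficient_of_sd, 4))
```

The sample SD is larger than the population SD because the same sum of squares is divided by n – 1 instead of n.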


Self Assessment Questions

10. _______________ is used to study the scattered value near the mean
value of a data series.
11. Which formula is used for calculating the range of a data series?
a. Highest value of series – Lowest value of series
b. Lowest range – Highest range
c. Lowest value of series – Highest value of Series
d. None of these
12. Coefficient of Mean Deviation = _____________
13. The symbol used to represent Standard Deviation is _________________.

8.6 MEASURE OF SKEWNESS


A frequency distribution can be represented by drawing a curve or a graph. The
measure of skewness is used to study the shape of a curve that can be drawn by
plotting the data of a frequency distribution on a graph.

As you have learned in the preceding sections, through a measure of central
tendency, you measure the concentration of values of a data series in the middle
of a frequency distribution. Through a measure of dispersion, you measure the
scattering of values near the middle value of the data series.

It may be possible that two data series, which are widely different in nature and
composition, have the same mean and standard deviation. However, when you
plot the data of such series on graphs, you obtain curves with different shapes.
This shows that the measures of central tendency and dispersion are not sufficient
to study the frequency distribution of a data series because they do not describe
the shape of the frequency distribution curves. Therefore, you need skewness
to gain understanding of the different shapes of various frequency distribution
curves.

The measure of skewness is used when the concentration of values of a data
series is more on a single side, that is, either positive or negative. Skewness can be
classified as positive skewness and negative skewness. This is shown in Figure 5:

[Figure: two asymmetric frequency curves. Positive Skewness panel: mode, median and mean marked from left to right. Negative Skewness panel: mean, median and mode marked from left to right.]

Figure 5: Positive Skewness and Negative Skewness – Asymmetric Distribution


Note: If the curve of a frequency distribution is symmetrical, then skewness = 0 and
mean = median = mode.

[Figure: Symmetric Distribution, with the mean, median and mode at the same point]

Positive skewness implies that the longer tail of the curve lies on the right side, so
the values are concentrated on the left; negative skewness implies that the longer
tail lies on the left side, so the values are concentrated on the right. Skewness is
calculated by taking the difference of mean and mode. In positive skewness, the
values of these three measures of central tendency are in the following order:

Mean (X) > Median (M) > Mode (Z)

However, in the case of negative skewness, the values of these three measures of
central tendency are in the following order:

Mean (X) < Median (M) < Mode (Z)

The formula to calculate skewness is as follows:

Skewness = X – Z
Note: For moderately asymmetrical curves,

Mode = 3 Median – 2 Mean, or Z = 3M – 2X

The coefficient of skewness is the relative measure of skewness that can be
calculated by dividing skewness with standard deviation. The formula used to
calculate the coefficient of skewness is as follows:

Coefficient of Skewness = Sk = (X – Z)/σ

Note: Pearson’s coefficient of skewness

Sk = {Mean – (3 Median – 2 Mean)}/σ

= [Mean – 3 Median + 2 Mean]/σ

Sk = (3 Mean – 3 Median)/σ = 3(Mean – Median)/σ


For a moderately skewed distribution that has more than one mode or no mode,
the formula based on the mode cannot be applied. In that case, calculate skewness
from the mean and median, as 3(Mean – Median), or use the method of moments.

Let us now calculate skewness and the coefficient of skewness with the help of
an example. Suppose you want to calculate the skewness and the coefficient of
skewness of the data given in Table 7:

Table 7: Age of Five Friends

People    Age (Years) (Xi)    (Xi – X)    (Xi – X)²
Jenny     18                  0.2         0.04
Robert    17                  –0.8        0.64
Ella      18                  0.2         0.04
Andy      17                  –0.8        0.64
Eliza     19                  1.2         1.44
Total     ∑Xi = 89                        ∑(Xi – X)² = 2.80


The mean of age is calculated as follows:

Mean of Age, X = ∑Xi/n

X = 89/5

X = 17.8

The median of age is calculated as follows, after arranging the ages in ascending
order (17, 17, 18, 18, 19):

Median, M = Value of ((n + 1)/2)th observation
M = ((5 + 1)/2)th observation
M = 3rd observation = 18
Since the data contains two modes (17 and 18), you do not consider mode in this
case.
The SD of age is calculated as follows:
σ = √[∑(Xi – X)²/n]
= √(2.80/5) ≅ 0.75
Skewness is calculated as follows:
Skewness = 3(Mean – Median) = 3(17.8 – 18) = –0.6
The coefficient of skewness is calculated as follows:
Coefficient of Skewness = –0.6/0.75 = –0.8
The skewness in the age of five friends is –0.6 and the relative measure of skewness
is –0.8. The sign is negative because the mean (17.8) is less than the median (18).
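
The median-based calculation for Table 7 can be sketched in Python as follows. This is an illustrative check using the standard `statistics` module (`multimode` requires Python 3.8 or later); note that 3(17.8 – 18) is a negative quantity, since the mean lies below the median.

```python
import statistics

# Ages (years) of the five friends from Table 7
ages = [18, 17, 18, 17, 19]

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
modes = statistics.multimode(ages)   # every value that shares the highest frequency
sd_age = statistics.pstdev(ages)     # population SD (divides by n)

# Median-based skewness, used because the data have two modes
skewness = 3 * (mean_age - median_age)
coefficient_of_skewness = skewness / sd_age

print(mean_age, median_age, sorted(modes))
print(round(skewness, 2), round(coefficient_of_skewness, 2))
```

The unrounded SD is about 0.748, so the coefficient differs from the hand calculation only in the third decimal place.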


EXHIBIT: SKEWNESS AND KURTOSIS

Just like skewness, kurtosis is another measure of the shape of a frequency
distribution. Skewness is used to measure the lack of symmetry in the
frequency curve of a distribution. On the other hand, kurtosis is used to
measure the relative peakedness of the frequency curve of a distribution. In
other words, the skewness of a frequency distribution tells about the extent to
which the deviations cluster below or above the average/measure of central
tendency, whereas kurtosis studies the concentration of the observation
items at the central part of the series. Under kurtosis, the frequency curves
are divided into three categories based upon the shape of their peak, namely,
mesokurtic curve, platykurtic curve, and leptokurtic curve.

[Figure: frequency curves (X on the horizontal axis, frequency on the vertical axis) showing the leptokurtic, mesokurtic and platykurtic shapes]

Karl Pearson’s coefficient of kurtosis is given as β2 = μ4/μ2²

Where,

μ4 is the fourth central moment of a distribution; and

μ2 is the second central moment of a distribution.

If the value of β2 = 3, the curve is known as a mesokurtic curve.

If the items concentrate very much at the centre, β2 > 3, i.e., the curve is more
peaked than the mesokurtic curve and is called a leptokurtic curve.

If the items concentrate at the centre comparatively lesser than the mesokurtic
curve, β2 < 3, i.e., the curve is less peaked than the mesokurtic curve and is
called a platykurtic curve.
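
As a minimal sketch, β2 can be computed directly from the second and fourth central moments. The data series and the helper names below are hypothetical, chosen only to illustrate the classification; a small tolerance is assumed when comparing β2 with 3.

```python
def central_moment(values, order):
    """Return the central moment of the given order (deviations taken from the mean)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)


def classify_kurtosis(beta2, tolerance=1e-9):
    """Classify a curve by comparing beta2 with 3."""
    if abs(beta2 - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if beta2 > 3 else "platykurtic"


# Hypothetical data series used only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu2 = central_moment(data, 2)   # second central moment (variance)
mu4 = central_moment(data, 4)   # fourth central moment
beta2 = mu4 / mu2 ** 2

print(mu2, mu4, beta2, classify_kurtosis(beta2))
```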

Self Assessment Questions

14. Negative skewness implies that the concentration of values is on the
right side of the curve. (True/False)


8.7 MEASURES OF RELATIONSHIP


The measures of relationship study the relationship between two or more variables
in a given data series. When you study the relationship between two variables in
a population, it is known as bivariate population. When you study more than two
variables in a population, it is known as multivariate population. The relationship
among variables can be of two types – correlation and cause and effect. Based on
these relationships, there are two types of analysis, as shown in Figure 6:

[Figure: Measures of Relationship, divided into Correlation Analysis and Regression Analysis]

Figure 6: Measures of Relationship


Let us now discuss each type of relationship among variables.

8.7.1 CORRELATION ANALYSIS


Correlation analysis is used to study the association between different types of
variables. It measures the extent to which one variable is linearly related to the
other variables.

Different tools are used to study the correlation pattern between variables. These
include rank correlation and simple correlation.

Let us discuss each tool.
• Rank correlation: Rank correlation refers to the correlation between two
data series in which the data is ranked. Generally, it is used when the data
is qualitative in nature. It was given by Charles Spearman. Therefore, it is
also known as Spearman’s coefficient of correlation. It calculates the degree
of relationship between two ranked variables.
The formula to calculate rank correlation is as follows:

Rank Correlation, ρ = 1 − 6∑di²/[n(n² − 1)]

Where, di = Difference between the ranks of the ith pair of observations
n = Number of pairs of observations
• Simple correlation: Simple correlation is used to find the degree of linear
relationship between two variables. It is the most commonly used measure
to describe the relationship between two linearly related variables. It was given
by Karl Pearson. Therefore, it is also known as Karl Pearson’s coefficient of
correlation.


Simple correlation can be of three types, as given in Figure 7:

[Figure: three types of simple correlation: Positive Correlation, Negative Correlation and No Correlation]

Figure 7: Types of Simple Correlation

The strength of association between two variables depends on the calculated
value of the correlation coefficient and the sample size. The value of the
correlation coefficient lies between –1 and +1.
◦ If the value of the correlation coefficient is close to –1 and the sample size
is sufficiently large, then there is a strong negative correlation between
two variables. For example, if the coefficient of correlation is –0.8, then
there is a strong negative association between variables.
◦ If the value of the correlation coefficient is close to +1 and the sample size
is sufficiently large, then there is a strong positive correlation between
two variables. For example, if the coefficient of correlation is 0.8, then
there is a strong positive association between variables.
◦ If the correlation coefficient is not close to –1 or +1 and the sample size is
sufficiently large, then there is weak correlation between two variables.
For example, if the coefficient of correlation is 0.3 or –0.3, then the
association between variables is weak.
The formula used to calculate simple correlation is as follows:

Correlation (r) = ∑(Xi – X)(Yi – Y)/[(n – 1) Sx Sy]

Or

r(x, y) = Cov(X, Y)/[SD(X) SD(Y)]

r = [(1/n)∑XiYi − X Y]/√{[(1/n)∑Xi² − X²] × [(1/n)∑Yi² − Y²]}

r = [n∑XiYi − (∑Xi)(∑Yi)]/√{[n∑Xi² − (∑Xi)²] × [n∑Yi² − (∑Yi)²]}

Where, Xi = ith value of X variable
X = Mean of X variable


Yi = ith value of Y variable


Y = Mean of Y variable
n = Number of pairs of observations
Sx = Standard deviation of X
Sy = Standard deviation of Y
Let us learn to calculate simple correlation between two variables with the
help of an example. Suppose you want to study the correlation between the
age and weight of a group of people to find out the relation between the two.
Table 8 shows the data:

Table 8: Age and Weight of a Group of People


Observation    Age (Xi)    Weight (Yi)    Xi²       Yi²       XiYi
1              18          35             324       1225      630
2              20          38             400       1444      760
3              25          50             625       2500      1250
4              30          65             900       4225      1950
5              35          70             1225      4900      2450
6              24          50             576       2500      1200
7              17          35             289       1225      595
8              16          39             256       1521      624
9              49          76             2401      5776      3724
10             45          72             2025      5184      3240
11             50          85             2500      7225      4250
12             18          32             324       1024      576
13             20          34             400       1156      680
14             25          57             625       3249      1425
15             24          50             576       2500      1200
16             17          35             289       1225      595
17             16          39             256       1521      624
18             23          44             529       1936      1012
19             22          45             484       2025      990
20             34          60             1156      3600      2040
21             36          65             1296      4225      2340
22             31          63             961       3969      1953
23             43          70             1849      4900      3010
24             44          72             1936      5184      3168
25             16          35             256       1225      560
Total          ∑Xi=698     ∑Yi=1316       ∑Xi²=22458    ∑Yi²=75464    ∑XiYi=40846

The calculation of correlation is as follows:

Correlation (r) = (n∑XiYi – ∑Xi∑Yi)/√{[n∑Xi² – (∑Xi)²] × [n∑Yi² – (∑Yi)²]}

r = (25 × 40846 – 698 × 1316)/√[(25 × 22458 – 698 × 698) × (25 × 75464 – 1316 × 1316)]
r = 102582/√(74246 × 154744)
r = 0.96
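
The calculation above can be reproduced with a short Python sketch that applies the sums-based formula to the Table 8 data. The function name `pearson_r` is an illustrative choice, not something defined in the text.

```python
from math import sqrt

# Age (Xi) and weight (Yi) for the 25 observations in Table 8
ages = [18, 20, 25, 30, 35, 24, 17, 16, 49, 45, 50, 18, 20,
        25, 24, 17, 16, 23, 22, 34, 36, 31, 43, 44, 16]
weights = [35, 38, 50, 65, 70, 50, 35, 39, 76, 72, 85, 32, 34,
           57, 50, 35, 39, 44, 45, 60, 65, 63, 70, 72, 35]


def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(ages, weights)
print(round(r, 2))
```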

8.7.2 REGRESSION ANALYSIS


Correlation need not necessarily imply causality. But it can be said that if correlation
between any two variables is very high, then it might be indicative of causality i.e.
a situation where one variable denotes the cause and the other variable denotes its
effect. For example, if X and Y are correlated, the causal relationship inferred from
correlation between them may indicate that, X is a cause of Y, Y is a cause of X, or
both X and Y are caused by some other variable Z, etc.

Correlations are put to use through methods such as regression analysis. In
common parlance, regression analysis (whether simple or multiple) is also termed
causal analysis, and it is used to understand causality between different variables.

Cause and effect relationships are analysed using simple regression or multiple
regression.

Regression is one step ahead of correlation in the identification of the relationship
between two variables. This is because regression allows for prediction of values
within the given data range. In simple language, if we know X, we can predict Y and
if we know Y, we can predict X. This is possible with the help of an equation called
the regression equation.

The variable Y is generally termed the dependent or criterion variable and the
variable X is termed the independent or predictor variable. The regression equation
is generally used to predict the values of Y based on the values of X. However, it
cannot be rightly said that Y is caused by X. Before making such an interpretation,
it is imperative for the researcher to thoroughly understand the variables
under study and the circumstances or context under which they operate.

The regression equation can be written as below:

Y = α + βX

Where,

Y represents scores on Y variable

X represents scores on X variable

α represents regression constant in the sample


β represents regression coefficient in the sample

α and β are calculated with the following formulas:

β = [n∑XiYi − ∑Xi∑Yi]/[n∑Xi² − (∑Xi)²]

α = (1/n)[∑Yi − β∑Xi]

Simple regression analysis is useful in a number of situations. For example, it is
used in analysing the relationship between the number of consumers (independent
variable) and the product sales of a month (dependent variable). The regression
equation is fitted to the data using the least squares method.
Let us take an example with data on the number of customers and monthly sales for
10 observations (N), as shown in Table 9:

Table 9: Customers and Monthly Sales


S. No.    No. of Consumers (X) (in ‘00)    Monthly Sales (Y) (in ‘000)    XY         X²
1         2.0                              12                             24.0       4.0
2         3.4                              6                              20.4       11.6
3         6.2                              7                              43.4       38.4
4         7.6                              11                             83.6       57.8
5         6.5                              13                             84.5       42.3
6         8.2                              33                             270.6      67.2
7         7.6                              31                             235.6      57.8
8         9.3                              22                             204.6      86.5
9         3.1                              36                             111.6      9.6
10        8.1                              24                             194.4      65.6
Total     62.0                             195                            1,272.7    440.7

Now, the regression equation is given by:

Y = α + βX

Using the formula for β

β = [10 × 1272.7 − (62)(195)]/[10 × 440.7 − (62)(62)] = 637 ÷ 563 = 1.1314

Using the formula for α

α = (1/10)[195 − 1.1314 × 62] = 12.485


Thus, the regression equation for the above data is given as:
Y = 12.485 + 1.1314X
With this equation, the values of Y (monthly sales) can be computed for any given
value of X (no. of customers) as depicted in Table 10 below:

Table 10: Monthly Sales for given Number of Customers


S. No.    No. of Consumers (X) (in ‘00)    Predicted Monthly Sales (Y) (in ‘000)    Y = 12.485 + 1.1314X
1         2.0                              14.75                                    (12.485 + 1.1314×2.0)
2         3.4                              16.33                                    (12.485 + 1.1314×3.4)
3         6.2                              19.50                                    (12.485 + 1.1314×6.2)
4         7.6                              21.08                                    (12.485 + 1.1314×7.6)
5         6.5                              19.84                                    (12.485 + 1.1314×6.5)
6         8.2                              21.76                                    (12.485 + 1.1314×8.2)
7         7.6                              21.08                                    (12.485 + 1.1314×7.6)
8         9.3                              23.01                                    (12.485 + 1.1314×9.3)
9         3.1                              15.99                                    (12.485 + 1.1314×3.1)
10        8.1                              21.65                                    (12.485 + 1.1314×8.1)
Total     62.0                             195.00
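
The least squares fit can be sketched in Python as shown below; the function name `fit_line` is illustrative. Because the code squares each X value without rounding (∑X² = 440.72 rather than the 440.7 shown in Table 9), the coefficients it prints differ from the hand calculation in the third or fourth decimal place.

```python
# Number of consumers (X, in hundreds) and monthly sales (Y, in thousands) from Table 9
customers = [2.0, 3.4, 6.2, 7.6, 6.5, 8.2, 7.6, 9.3, 3.1, 8.1]
sales = [12, 6, 7, 11, 13, 33, 31, 22, 36, 24]


def fit_line(x, y):
    """Return (alpha, beta) of the least squares line Y = alpha + beta * X."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    beta = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    alpha = (sum_y - beta * sum_x) / n
    return alpha, beta


alpha, beta = fit_line(customers, sales)

# Predicted monthly sales for each observed number of consumers
predicted = [alpha + beta * x for x in customers]

print(round(alpha, 3), round(beta, 4))
print([round(value, 2) for value in predicted])
```

A useful check on any least squares fit is that the predicted values add up to the same total as the observed Y values, which is why the Total row of Table 10 shows 195.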


Self Assessment Questions

15. __________________ is the study of the association between different
types of variables.
16. Causal analysis is used to study the cause and effect relationship of two
variables. (True/False)

8.8 DIFFERENT CHARTS USED IN DATA ANALYSIS


Graphical illustrations are visually appealing and bring life to a report so as to
give the target audience refreshing breaks from the monotony caused by texts and
tables.

If the research report contains many descriptive tables, it can be made more
readable and attractive if the most important tables are presented through graphs
and diagrams. In the graphical presentation, facts and figures are gathered first
and then they are depicted in the form of graphs and charts to present the statistical
information. The most frequently used graphs and charts include the following:
• Bar Chart: A bar chart represents categorical data with the help of rectangular
bars, plotted vertically or horizontally. The heights or lengths of the rectangular
bars are proportional to the values represented by them. The data can be in
the form of absolute frequencies or relative frequencies.
Figure 8 below shows a bar chart to depict the relative frequency/percentage
of shortages of anti-inflammatory medicines in the rural health organisations:

[Figure: horizontal bar chart. Vertical axis: shortage of anti-inflammatory medicines; horizontal axis: "Percentage of health clinical" (0 to 60). Never: 31, Occasionally: 11, Frequently: 3, Rarely: 55.]

Figure 8: Relative Frequency of Shortages of Anti-inflammatory Medicines
in Rural Health Organisations in Bar Chart

• Pie Chart: A pie chart is a circular statistical graphic, divided into different
segments to illustrate the numerical proportions/relative frequency of a
number of items. The arc length of each segment shows the proportionate
quantity represented by it. Pie charts provide a quick overview of the data
presented to the readers. All segments of the pie chart should add up
to 100%.
Figure 9 below shows a pie chart to depict the relative frequency/percentage
of shortages of anti-inflammatory medicines in the rural health organisations:

[Figure: pie chart titled "Percentage of health clinical". Rarely: 55%, Never: 31%, Occasionally: 11%, Frequently: 3%.]

Figure 9: Relative Frequency of Shortages of Anti-inflammatory
Medicines in Rural Health Organisations in Pie Chart

• Histogram: A histogram represents the frequency distribution of a continuous
data variable grouped into bins. Histograms are very similar to the bar charts
used to show categorical data. The main difference between the two is that the
histogram bars touch each other (so long as there is no gap in the data) to
represent continuous data, whereas the bars in a bar chart are separated because
they represent different categorical entities. Figure 10 below shows a histogram
to depict the frequency of sales effected by different sales persons in a month,
indicating how many sales persons fall within a particular sales range:

[Figure: histogram titled "Sales Patterns of Sales Persons". Horizontal axis: sales range, with bins (32, 182), (182, 332), (332, 482), (482, 632), (632, 782) and (782, 932); vertical axis: number of sales persons.]

Figure 10: Absolute Frequency of Sales Effected by Different Sales Persons in a Month (n=60)

• Line graph: A line graph or a line chart is generally used to visualize the
value of a particular variable over time. Line graphs are useful to show the trend
of numerical data over a period of time. Two or more distributions (each
depicted by a separate line) can be shown in one graph as long as the
difference between them is easily distinguishable. They also make it possible
to compare the distributions of different groups, for example, age distribution
between males and females. Figure 11 below shows a line graph to depict
the frequency of daily number of patients being treated at the rural health
organisations in District Y:
[Figure: line graph. Horizontal axis: day number (0 to 12); vertical axis: daily number of patients undergoing treatment.]

Figure 11: Daily Number of Patients Being Treated at the Rural
Health Organisations in District Y in Line Chart

• Box and Whisker Plot: This is a method of graphically representing different
groups of numerical data through their quartiles. The box plots can also
have vertical lines extending from the boxes (called whiskers) to indicate the
variability outside the upper and lower quartiles. For example, variability
between sales patterns effected in Area X and Area Y is shown through box
plots in Figure 12 below:

[Figure: box plots titled "Representation of sales in different areas", comparing Sales in Area X and Sales in Area Y. Vertical axis: sales (0 to 900).]

Figure 12: Sales Patterns of Food Grains Effected in Area X and Area Y

Self Assessment Questions

17. Bars in a histogram are not connected as they represent different
categorical entities. (True/False)
18. In a boxplot, vertical lines extending from the boxes are called ________.

8.9 SUMMARY
• A researcher collects any type of data, quantitative and qualitative, in raw
form. After that, he/she needs to process the collected data to make it fit for
analysis.
• Editing refers to reviewing the collected data to check whether it is valid or
not. This helps in eliminating the extra information and retaining the relevant
matter for analysis.
• When the data is generated with the help of a questionnaire, it can be coded
either at the time of framing the questionnaire or after collecting the data.
• Classification refers to categorising the coded questions into different
segments as per their relevance.
• Tabulation refers to presenting the data in the form of a table so that it can be
analysed easily.
• Descriptive analysis is used to study the relationship pattern among variables.
• Inferential analysis uses various types of test of significance to check the
validity of a hypothesis for studying a problem.
• The measures of central tendency are used to study the distribution pattern
of a dataset.
• Mean represents the value received after dividing the sum of observations
by the total number of observations.
• Median refers to the central value of the given dataset.
• Mode refers to the value that has the highest frequency in a data series.
• The measures of dispersion refer to the measures that are used to study the
dispersed value near the mean value.
• Standard deviation is used to calculate the scattering of values in a given
dataset.
• The measure of skewness is used to study the shape of the curve that can be
drawn by plotting the data of a frequency distribution on a graph.
• The measures of relationship study the relationship between two or more
variables in a given data series.

8.10 KEY WORDS


• Base period: This refers to the period that acts as a benchmark for measuring
economic and financial data.
• Hypothesis: This is a proposed explanation of a phenomenon, which needs
to be tested.
• Measures of central tendency: These measures are used to find the central
value of a data series.
• Measures of dispersion: These measures are used to find the scattering of
values around the mean value of a data series.
• Measures of relationship: These measures are used to find the relationship
between different variables.
• Univariate analysis: This is the analysis of a single variable.

8.11 CASE STUDY: QUALITY STANDARDS IN A SERVICE SECTOR COMPANY


TPR Inc. was a multi-cuisine restaurant based in India. It had several outlets in
major Indian cities. The restaurant management wanted to find out if its various
outlets were meeting the established standards of quality and customer service. It
hired a consultancy firm for the purpose.
The consultants collected a large amount of data with the help of questionnaires,
interviews, and observations in the restaurant’s outlets. Then, they carefully
followed the data processing steps to analyse it and retrieve relevant and
meaningful information from it.
While processing the responses in the questionnaires, they found that quite a large
number of questions were left unanswered. Instead of ignoring such questions,
they proceeded in a systematic manner. Each questionnaire comprised a series of
interval questions, closed-ended questions and open-ended questions.

In case of interval questions, they gave a mid-value to the unanswered questions.
In case of open-ended questions, they went back to the customers and requested
them to fill in the answers.

After retrieving sufficient data from the questionnaires, they classified the collected
data. To do so, they combined customers’ responses from different cities and then
subgrouped them according to their cities. Next, they formed a table to analyse
the relationship between customers’ satisfaction and the sales of the company:

Calculating the Correlation between Customer Satisfaction and Sales of the Company

Observation    Customer Satisfaction (Xi)    Sales of Company (Yi)    Xi²     Yi²     XiYi
1              4                             5                        16      25      20
2              6                             6                        36      36      36
3              7                             6                        49      36      42
4              8                             4                        64      16      32
5              9                             6                        81      36      54
6              10                            9                        100     81      90
7              8                             10                       64      100     80
8              7                             2                        49      4       14
9              1                             3                        1       9       3
10             2                             4                        4       16      8
11             9                             9                        81      81      81
12             8                             8                        64      64      64
13             7                             9                        49      81      63
14             10                            11                       100     121     110
15             6                             5                        36      25      30
16             9                             12                       81      144     108
17             8                             15                       64      225     120
18             10                            12                       100     144     120
19             9                             16                       81      256     144
20             8                             20                       64      400     160
21             10                            20                       100     400     200
22             4                             6                        16      36      24
23             5                             8                        25      64      40
24             10                            14                       100     196     140
25             10                            19                       100     361     190
Total          185                           239                      1525    2957    1973

The correlation between the customers’ satisfaction and the sales of the company is
as follows:

Correlation (r) = (n∑XiYi – ∑Xi∑Yi)/√{[n∑Xi² – (∑Xi)²] × [n∑Yi² – (∑Yi)²]}

r = (25 × 1973 – 185 × 239)/√[(25 × 1525 – 185 × 185) × (25 × 2957 – 239 × 239)]

r = 5110/8095.41

r = 0.63

Since the correlation coefficient is positive and moderately high, it indicates that the
relationship between the customers’ satisfaction and the sales is positive and
moderately strong.

Similarly, the consultants studied the relationship between different variables, such
as quality of service and customer satisfaction, quality of service and established
standards, and so on. Finally, they concluded that the satisfaction level of the
restaurant’s customers was positive and strong. However, the restaurant’s service
levels were far behind the established quality standards.

QUESTIONS
1. What are the different steps of data processing used in the case study?
(Hint: The consultants used all the steps of data processing, that is, first
they extracted the relevant data. Then, they classified and organised the
information and studied the relationship between variables.)
2. Which type of measure is used in analysing the table and what type of
analysis is used?
(Hint: The measure of relationship is used to analyse the table.)

8.12 EXERCISE
1. Explain the different steps of data processing.
2. What are the different types of data analysis?
3. What are the measures of central tendency? Why are they used?
4. What are the measures of dispersion? Why are they used?
5. What do you understand by ‘skewness’? What is the measure of skewness?
What does its calculated value indicate?
6. What is the purpose of causal analysis?

8.13 ANSWERS FOR SELF ASSESSMENT QUESTIONS


Topic                             Q. No.   Answer
The Concept of Data Processing    1.       Editing
                                  2.       True
                                  3.       Postcoded data
                                  4.       False
The Concept of Data Analysis      5.       b. Bivariate analysis
                                  6.       False
Measures of Central Tendency      7.       True
                                  8.       Geometric
                                  9.       True
Measures of Dispersion            10.      Measures of Dispersion
                                  11.      a. Highest value of series – Lowest value of series
                                  12.      MD/X, where MD = Mean Deviation and X = Mean/Median/Mode
                                  13.      Sigma (σ)
Measure of Skewness               14.      True
Measures of Relationship          15.      Correlation analysis
                                  16.      True
Different Charts Used in Data     17.      False
Analysis                          18.      whiskers

8.14 SUGGESTED BOOKS AND E-REFERENCES

SUGGESTED BOOKS
• Cahoon, M. (1987). Research methodology. Edinburgh: Churchill Livingstone.
• Chandra, S., & Sharma, M. Research methodology.
• Panneerselvam, R. (2014). Research methodology. Delhi: PHI Learning.
• Welman, J., Kruger, F., & Mitchell, B. (2005). Research methodology. Cape
Town: Oxford University Press.

E-REFERENCES
• Research Guides: Organising Your Social Sciences Research Paper: 6. The
Methodology. (2018). Retrieved from http://libguides.usc.edu/writingguide/methodology
• Research Methods. (2018). Retrieved from https://research-methodology.net/research-methods/

CHAPTER 9

THE CONCEPT OF HYPOTHESIS

Table of Contents

Learning Objectives
9.1 Introduction
9.2 Defining Hypothesis
9.2.1 Characteristics of a Good Hypothesis
9.2.2 Types of Hypotheses
Self Assessment Questions
9.3 Hypothesis Testing
9.3.1 Null Hypothesis and Alternative Hypothesis
9.3.2 Decision Rule
9.3.3 Two-tailed Test
9.3.4 One-tailed Test
Self Assessment Questions
9.4 Procedure of Hypothesis Testing
Self Assessment Questions
9.5 Summary
9.6 Key Words
9.7 Case Study
9.8 Exercise
9.9 Answers for Self Assessment Questions
9.10 Suggested Books and e-References

LEARNING OBJECTIVES

After studying this chapter, you will be able to:
• Explain the concept of hypothesis
• Describe the various types of hypothesis
• Explain the use of null and alternative hypotheses in hypothesis testing
• Differentiate between two-tailed and one-tailed tests
• Describe the procedure of hypothesis testing

9.1 INTRODUCTION
In the previous chapter, you studied data processing and analysis. Now,
you will study the concept of hypothesis.
A hypothesis refers to an assumption made about a population parameter, which
is then verified with the help of a sample statistic. It is a very useful tool to solve
various research problems and issues. A researcher first forms a hypothesis about
a problem and then tests it to check its validity by using statistical measures. The
procedure of using test statistics to check whether a hypothesis is true is known
as hypothesis testing.
Suppose a researcher is asked to check whether an organisation’s new
advertisement has resulted in enhanced sales or not. In this case, the researcher
would first form the hypothesis that the new advertisement has no impact on
the organisation’s sales. This hypothesis is known as the null hypothesis. After that,
the researcher would form another hypothesis, known as the alternative hypothesis,
which states that the new advertisement has a positive impact on the organisation’s
sales. Then, the researcher would analyse the data to find the relationship between
the new advertisement and the organisation’s sales. If he/she finds a relationship
between the new advertisement and the sales, he/she would reject the null
hypothesis and accept the alternative hypothesis.
In the field of research, the concept of hypothesis and hypothesis testing holds a very
special place. The formation of a hypothesis helps the researcher remain focused
on the research problem. In addition, it gives direction to the research project by
clearly defining the scope of research. Hypothesis testing assists the researcher in
deriving realistic results, as it takes into consideration the errors due to sampling.
In this chapter, you will learn about the concept of hypothesis and explore the
characteristics and types of hypothesis. The chapter also provides information
about hypothesis testing, null and alternative hypotheses, decision rules, one-
tailed test and two-tailed test.

9.2 DEFINING HYPOTHESIS


Hypothesis is a proposed explanation given for an observed situation. It is a specific
prediction, which can be tested, about what you expect to happen in a study or
research. It represents a tentative relationship between two or more variables,
which is predicted by the researchers. For example, a study designed to look at the
relationship between stress and common cold might have a hypothesis that states,
“This study is designed to assess the hypothesis that people with high-stress levels
will be more likely to catch common cold after being exposed to the virus than are
people who have low-stress levels.”
Some definitions of hypothesis by experts are given below:
According to Mouton, hypothesis is: A statement postulating a possible relationship
between two or more phenomena or variables.
According to Guy, hypothesis is: A statement describing a phenomenon or which
specifies a relationship between two or more phenomena.


Both hypothesis and problem statements arise from a predefined situation.
However, there is a difference between the two. A problem statement cannot be
directly tested, while a hypothesis, which is derived from a problem statement,
can be tested. The hypothesis is formulated after a problem has been stated and
the researcher has done a detailed theoretical study of the problem. It is formulated
to solve the problem by testing it with the help of various tests of significance.

9.2.1 CHARACTERISTICS OF A GOOD HYPOTHESIS


A hypothesis is a tentative (assumed) statement regarding the
relationship that exists between two or more variables. The following are the
characteristics of a good hypothesis:
• Clear topic: A hypothesis should clearly define its topic. The topic should
  also be meaningful.
• Precise: A hypothesis should be clear and specific to facilitate a deep and
  comprehensive study, and enable researchers to draw reliable inferences on
  its basis.
• Testable: A hypothesis should be capable of being tested. It should be
  specific enough that the test results can either support or contradict it.
• Limited in scope: A hypothesis should be limited in scope, as narrower
  hypotheses are generally more testable.
• Consistent: A hypothesis should be based on, and consistent with, previous research.

9.2.2 TYPES OF HYPOTHESES


There are six types of hypotheses, classified on the basis of their
derivation and formulation, as shown in Figure 1:

[Diagram: Types of Hypothesis. On the basis of derivation: Inductive Hypothesis,
Deductive Hypothesis. On the basis of formulation: Directional Hypothesis,
Non-Directional Hypothesis, Null Hypothesis, Alternative Hypothesis.]

Figure 1: Types of Hypotheses

On the basis of derivation, there are two types of hypotheses, which are explained
as follows:
1. Inductive hypothesis: In inductive hypothesis, you move from specific
   observations to broad generalisations. First, you observe a phenomenon.
   Then, you form a pattern from your observations. After that, you form a
   hypothesis to study the pattern. Finally, you form a theory on the basis
   of your study of the pattern. The inductive hypothesis is used to conduct
   qualitative studies of subjective variables. In this type of hypothesis, you
   should ask open-ended and process-oriented questions.
2. Deductive hypothesis: In this type of hypothesis, you move from a general
   statement to a specific, logical conclusion. You start from a theory and, based
   on it, you make a prediction of its consequences. In other words, you predict
   what the observations should be if the theory were correct. Finally, analysis
   is done to arrive at a conclusion on whether the theory is rejected or accepted
   with respect to the problem. In deductive hypothesis, the research goes from
   general theory to specific observation. In this type of hypothesis, you should
   ask closed-ended and outcome-oriented questions.
On the basis of formulation, there are four types of hypotheses, which are explained
as follows:
1. Directional hypothesis: This hypothesis states the direction of relationship
   between two variables. In directional hypothesis, you use terms, such as
   more than, less than, negative and positive. An example of the directional
   hypothesis is: In an organisation, women are more productive than men.
2. Non-directional hypothesis: In this hypothesis, the direction of relationship
   between two variables is not specified. For example, an organisation
   wants to get feedback from its employees about their job satisfaction level. In
   this example, the test result can be positive or negative depending on the job
   satisfaction of the employees.
3. Null hypothesis: This hypothesis states that there is no relation between two
   variables under study. It is denoted by H0. Null hypothesis is used as
   the first statement in a hypothesis, which you (or the researcher) want to
   reject. For example, a null hypothesis is: There is no relation between the
   number of years of experience held by an individual and his/her performance.
   This null hypothesis would be tested for rejection because it is generally
   held that experience and performance are related. In general, researchers
   are more interested in disproving or rejecting the null hypothesis.
4. Alternative hypothesis: This hypothesis states that there is a relationship
   between two variables under study. It is denoted by H1. It is used as the
   second statement in a hypothesis that you want to accept. For example, an
   alternative hypothesis can be: There is a relation between the qualification
   of an individual and better job opportunities. Since these two variables are
   expected to be related, you would want to accept this statement.

Note: Before studying the concept of null hypothesis and alternative hypothesis in
detail, you must understand how the process of hypothesis testing works.
The researchers initially state the null and alternative hypotheses. After this,
they conduct certain specific tests and, at the end of the test, they make statements
regarding the likelihood that a hypothesis is FALSE.

Researchers make probability statements regarding the likelihood of a hypothesis
being false instead of it being true; i.e., researchers are interested in rejecting
the null hypothesis rather than accepting it, because they never know how much
Type II error they might be making.

Self Assessment Questions

1. _______________ is a specific prediction, which can be tested, about what
   you expect to happen in a study or research.
2. Match the following:
   1. Inductive hypothesis      i.   General statement to a specific conclusion
   2. Deductive hypothesis      ii.  Second statement in a hypothesis
   3. Null hypothesis           iii. From specific observations to broad generalisations
   4. Alternative hypothesis    iv.  First statement in a hypothesis
   a. 1 iv, 2 iii, 3 ii, 4 i
   b. 1 iii, 2 iv, 3 i, 4 ii
   c. 1 iii, 2 i, 3 iv, 4 ii
   d. 1 i, 2 ii, 3 iii, 4 iv

9.3 HYPOTHESIS TESTING


Hypothesis testing is a process to make decisions for research problems by using
sample data. It is a logical method of taking and validating decisions.

In hypothesis testing, you take two statements:

i. The first statement states that there is no relationship or no difference
   between two variables under study. You take this statement (also known as
   the null hypothesis) as true until the evidence shows otherwise. A null
   hypothesis statement involves equality (≤, ≥, or =) about a population parameter.
ii. The second statement states that there is a relationship or difference between
   two variables under study. You provisionally take this statement (also known
   as the alternative hypothesis) as false and accept it only if the null hypothesis
   is rejected. The alternative hypothesis contradicts the null hypothesis and
   must not involve equality (<, ≠, >).
After that, you test the null hypothesis to accept or reject it. The null hypothesis
is tested with the help of the levels of significance. A significance level is the
probability of rejecting the null hypothesis in a statistical test when it is true. It is
expressed as a percentage, and the critical value corresponding to it is read from
the table of the relevant test statistic. Examples of such tests are the t-test, z-test
and F-test. You will learn about these tests in detail in the upcoming chapters.


After the null and alternative hypotheses have been stated, the researcher sets the
decision criterion, for which he/she needs to state the level of significance of the test.

Note: If the null hypothesis is true, the sample mean will, on average, be equal to
the population mean.

The most commonly used levels of significance in statistics are 1%, 5% and 10%.
For example, 5% is the most commonly used level of significance in behavioural
studies. It implies that 5% of the area under the normal curve is used as the
rejection region for testing the hypothesis, and the value bounding this area is
taken from the table of the respective test statistic. For instance, the z-values for
the various levels of significance are shown in Figure 2:

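[Diagram: normal curve centred on the mean value, with z-values –2.58, –1.96, –1.64,
+1.64, +1.96 and +2.58 marked on the X-axis. The range ±1.64 covers 90% of the area,
±1.96 covers 95% of the area and ±2.58 covers 99% of the area.]

Figure 2: z-values for the Levels of Significance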

In Figure 2, you can see the areas under the curve, expressed in percentage, and the
corresponding z-values on the X-axis. Table 1 provides the levels of significance
and their z-values:

Table 1: z-values of Levels of Significance

Level of Significance    z-value (two-tailed)    z-value (one-tailed)
1%                       +/– 2.58                +/– 2.33
5%                       +/– 1.96                +/– 1.64
10%                      +/– 1.64                +/– 1.28
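
The z-values in Table 1 can be reproduced from the standard normal distribution. The following is a minimal sketch using Python's standard library; the function name `critical_z` is an illustrative choice and not something defined in this chapter. The exact value for the 5% one-tailed (and 10% two-tailed) case is about 1.645, which printed tables round to 1.64 or 1.65.

```python
from statistics import NormalDist


def critical_z(alpha: float, tails: int) -> float:
    """Return the positive critical z-value for a level of significance."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / tails
    return NormalDist().inv_cdf(1 - tail_area)


for alpha in (0.01, 0.05, 0.10):
    two = critical_z(alpha, tails=2)
    one = critical_z(alpha, tails=1)
    print(f"alpha = {alpha:.2f}: two-tailed +/-{two:.3f}, one-tailed {one:.3f}")
```

For a left-tailed test, the same value is used with a negative sign.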


In hypothesis testing, the level of significance is very important, as it helps you
in rejecting or accepting a null hypothesis. You should be careful while
determining the level of significance for a problem/topic. The reason is that you
may reject a true hypothesis on the basis of a level of significance. If the level of
significance is 5%, it implies that the probability of rejecting a true hypothesis
is at most 0.05.

After the level of significance has been set, the researcher then proceeds to compute
the test statistic, which describes how far a sample mean is from the
population mean stated in the null hypothesis. The greater the absolute value of
the test statistic, the farther the sample mean is from that population mean.
Thereafter, on the basis of the value of the test statistic, a decision is made.

Assuming that the null hypothesis is true, if the probability of obtaining a sample
mean at least as extreme as the one observed is less than 5%, then we reject the
null hypothesis. On the contrary, if this probability is more than 5%, then the null
hypothesis is retained.
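
The probability-based decision described above can be sketched as follows. This is only an illustration for a z statistic: the function name `p_value_from_z` and the observed value 2.10 are assumptions made for the example, not figures from this chapter.

```python
from statistics import NormalDist


def p_value_from_z(z: float, tails: int = 2) -> float:
    """Probability, assuming H0 is true, of a z statistic at least this extreme.

    For tails=1, the observed z is assumed to lie in the direction stated by H1.
    """
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return tails * upper_tail


ALPHA = 0.05        # level of significance
z_observed = 2.10   # hypothetical test statistic, for illustration only

p = p_value_from_z(z_observed, tails=2)
decision = "reject H0" if p < ALPHA else "retain H0"
print(f"p-value = {p:.4f} -> {decision}")
```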

9.3.1 NULL HYPOTHESIS AND ALTERNATIVE HYPOTHESIS


Null hypothesis represents the first statement of a hypothesis that is assumed
to be true. This statement indicates that there is no relationship between two
variables under study and that any relation that does appear is purely due to chance.
Alternative hypothesis represents the second statement of a hypothesis, which is
provisionally assumed to be false. This statement indicates that there is a
relationship between the two variables.

Let us understand these through the following example:


Example 1: Assume that a researcher believes that if a patient takes physiotherapy
sessions two times instead of three times in a week post operation, then his/her
recovery time would be greater. Assume that the average recovery time after
operation is 7 weeks:

H0: The average recovery time after operation is less than or equal to 7 weeks.

H1: The average recovery time after operation is greater than 7 weeks.

From the preceding example, it is clear that H0 is the opposite of the
statement the researcher wants to establish. The researchers always test H0 for
significance, not H1, because they are usually interested in disproving H0.

H0 and H1 are in the descriptive form. The researcher must convert them into the
quantitative form to test them.

In Example 1, the quantitative forms of H0 and H1 are as follows:

H0: μ ≤ 7

H1: μ > 7


Where,

µ = Population mean

You can also formulate a hypothesis for testing with the help of a benchmark. This
benchmark is a numerical value with which you have to compare your results and
test the hypothesis. This is one of the finest and most widely used methods for framing
null and alternate hypotheses because it represents null and alternate hypotheses
in quantitative form. This makes hypothesis testing easier.

For example, in a school, the average weight of every class is 100 (population
mean). You consider all sections of class 10 as a sample (assume there are 5 sections
of class 10) and calculate their average weight (sample mean). Now, you want to
check whether the mean of the group that the sample represents is equal to the
population mean or not. In this case, H0 and H1 would be as follows:

H0: μ = 100

H1: μ ≠ 100

Where,

μ = Mean of the group that the sample represents

μp = Population mean (benchmark) = 100

The sample mean (X) is the statistic used to test these hypotheses.

The researchers assume that the null hypothesis is true and proceed further to find
out various methods/possibilities to solve the problem. They try to reject the null
hypothesis.
A hypothesis can never be right or wrong. Rather, it is judged by what you want to
analyse. If a hypothesis is framed in such a way that it can answer your problem,
then it would be right.

9.3.2 DECISION RULE


Decision rule refers to the process or criteria that a researcher uses to decide
whether to accept or reject the null hypothesis. For example, a researcher forms a
hypothesis that the mean age of a population is equal to 30. The researcher then
collects a sample of observations to test this hypothesis. He/she will then create a
decision criterion. For instance, the researcher may decide to accept the hypothesis
if the sample mean is in the range of 10% on either side of 30; i.e., 30 ± 10% =
between 27 and 33. It means that the researcher would reject the hypothesis if
the mean of the sample is below 27 or above 33.
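
The decision rule in this example can be written as a small function. This is a minimal sketch; the function name `decide` and its parameters are hypothetical, and the ±10% band is the ad hoc criterion from the example rather than one derived from a level of significance.

```python
def decide(sample_mean: float,
           hypothesised_mean: float = 30.0,
           tolerance: float = 0.10) -> str:
    """Accept H0 if the sample mean lies within the tolerance band around the hypothesised mean."""
    margin = hypothesised_mean * tolerance   # 10% of 30 = 3
    lower = hypothesised_mean - margin       # 27
    upper = hypothesised_mean + margin       # 33
    if lower <= sample_mean <= upper:
        return "accept H0"
    return "reject H0"


# Hypothetical sample means, for illustration only.
for mean in (29.0, 26.5, 34.0):
    print(mean, "->", decide(mean))
```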

It is important to note that different types of errors may occur while testing a
hypothesis. Therefore, the researcher should take into consideration the possibilities
of these errors while taking decisions.


The decision grid, which helps the researcher in taking decisions, is shown in
Figure 3:

              Accept H0                  Reject H0
H0 (true)     Correct Decision           Type I Error (α error)
H0 (false)    Type II Error (β error)    Correct Decision

Figure 3: Decision Grid

As per the grid shown in Figure 3, if H0 is true and it is accepted, then the decision
is correct. If H0 is false and it is rejected, then also the decision is right. However,
if the decision is wrong, two types of errors can occur, which are explained as
follows:
1. Type I errors: These errors occur when the researcher rejects a null hypothesis
   (H0) when the null hypothesis is true. In this case, the decision taken by the
   researcher is wrong. Type I errors are also known as the first kind of error or
   false positive. The probability of these errors is represented by α.
2. Type II errors: Type II errors occur when the researcher accepts a null
   hypothesis (H0) that should have been rejected. In this case, the decision taken
   by the researcher is wrong. Type II errors are also known as the second kind
   of error or false negative. The probability of these errors is represented by β.
   The probability of rejecting the null hypothesis when it is false = 1 – β and
   is called the power of the test.
For a given sample size, if you reduce the chance of Type I errors, the chance of
Type II errors would increase, and vice versa. Therefore, you have to be very
careful while minimising one type of error. You must remember that both the types
of errors can be limited by using an appropriate sample size.
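
The trade-off between the two types of errors, and the effect of sample size, can be illustrated numerically. The sketch below assumes a right-tailed z-test with a known population SD; the function name `power_right_tailed` and all the figures passed to it (means, SD, sample sizes) are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_right_tailed(mu0: float, mu1: float, sigma: float,
                       n: int, alpha: float) -> float:
    """Power (1 - beta) of a right-tailed z-test of H0: mu <= mu0 when the true mean is mu1."""
    z = NormalDist()
    z_critical = z.inv_cdf(1 - alpha)
    # Distance between the true mean and mu0, measured in standard errors.
    shift = (mu1 - mu0) * sqrt(n) / sigma
    beta = z.cdf(z_critical - shift)   # probability of a Type II error
    return 1 - beta


# Hypothetical setting: H0 mean 7, true mean 7.5, population SD 1.5.
power_alpha_05 = power_right_tailed(7.0, 7.5, 1.5, n=25, alpha=0.05)
power_alpha_01 = power_right_tailed(7.0, 7.5, 1.5, n=25, alpha=0.01)
power_large_n = power_right_tailed(7.0, 7.5, 1.5, n=100, alpha=0.05)

print(f"alpha = 0.05, n = 25:  power = {power_alpha_05:.3f}")
print(f"alpha = 0.01, n = 25:  power = {power_alpha_01:.3f}")
print(f"alpha = 0.05, n = 100: power = {power_large_n:.3f}")
```

Lowering α from 0.05 to 0.01 at the same sample size reduces the power (that is, β rises), while a larger sample raises the power without changing α.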

9.3.3 TWO-TAILED TEST


A two-tailed test is used with a non-directional hypothesis, which talks about the
relationship between two variables but does not explain anything about the
direction of the relationship.

For example, a company produces tennis balls and it has laid down that the ball
should weigh 55 grams in order to get good ratings. The samples are drawn on
an hourly basis and checked for ideal weight. In a given hour, 11 balls are checked
randomly and their mean is calculated as 55.006 grams with an SD of 0.029 grams.
If the mean weight differs significantly from 55 grams at the 1% level of
significance, the production line is shut down. Let us see if the production line
should be shut down in this case.

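Here,

μp = 55 g;

H0: μp = 55 g

H1: μp ≠ 55 g

α = 1% = 0.01

Therefore, α/2 = 0.005

p = 1 – (α/2) = 0.995

Degrees of freedom of sample = n – 1 = 11 – 1 = 10

Here, tp = 3.169 (the tabulated value of t for 10 degrees of freedom)

Now, calculate tc, using the sample size n = 11 in the standard error:

$$t_c = \frac{\bar{X} - \mu_p}{s/\sqrt{n}}$$

$$t_c = \frac{55.006 - 55}{0.029/\sqrt{11}} = \frac{0.006}{0.00874} \approx 0.686$$

The two-tailed test can be shown on a distribution curve, as in Figure 4:

[Diagram: bell-shaped curve centred on μ. The "Reject H0" regions lie beyond
tp = –3.169 and tp = +3.169, and the "Fail to Reject H0" region lies between them.
The calculated value tc = 0.686 falls in the "Fail to Reject H0" region.]

Figure 4: Two-tailed Test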


In Figure 4, at the 1% level of significance, the t value would be ±3.169. If the
calculated value of test statistics lies in between the range of –3.169 and +3.169,
then H0 would be accepted. However, if the calculated value of test statistics lies
outside this range, it would be rejected. Here, the rejection region is equally divided
between the two tails of the distribution (an area of 0.005 in the lower tail and
0.005 in the upper tail). In this example, the calculated value lies within the
range, so the null hypothesis is accepted and the production line is not shut down.
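
The tennis ball example can be checked with a short calculation. This is a minimal sketch, assuming the standard error is computed as s/√n with the full sample size n = 11, which gives a calculated value of about 0.69. The function name `one_sample_t` is illustrative, and the critical value 3.169 is taken from the t-table as quoted in the example rather than computed.

```python
from math import sqrt


def one_sample_t(sample_mean: float, hypothesised_mean: float,
                 sample_sd: float, n: int) -> float:
    """Calculated t statistic for a one-sample test of the mean."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - hypothesised_mean) / standard_error


T_TABLE = 3.169   # t-table value: two-tailed, alpha = 0.01, df = 10

t_c = one_sample_t(sample_mean=55.006, hypothesised_mean=55.0,
                   sample_sd=0.029, n=11)

# Two-tailed test: compare the absolute value with the table value.
decision = "reject H0" if abs(t_c) > T_TABLE else "fail to reject H0"
print(f"t_c = {t_c:.3f} -> {decision}")
```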

9.3.4 ONE-TAILED TEST


A one-tailed test is used with a directional hypothesis, which talks not only about
the relationship between two variables but also the direction of relationship. It is
considered when you want to test a hypothesis on either the positive or the negative
side of a normal curve. When hypothesis testing involves a rejection region on only
one side of the sampling distribution, it is called a one-tailed hypothesis test.

For example, assume that the null hypothesis states that the mean weight of people is
60 kg or more. In this case, the alternative hypothesis would be that the mean
weight of people is less than 60 kg. Here, the rejection region lies entirely on the
left side of the sampling distribution; i.e., it consists of sample means that are
sufficiently below 60 kg.
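The one-tailed test can also be shown on a normal curve, as in Figure 5:

[Diagram: normal curve centred on the mean value, with –1.64 marked on the left.
The area to the right of –1.64 is the Acceptance Region (if the sample mean lies in
this area, accept H0). The area to the left of –1.64 is the Rejection Region (if the
sample mean lies in this area, reject H0).]

Figure 5: One-tailed Test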

In Figure 5, at the 5% level of significance, the z value would be –1.64. If the
calculated z value for the sample mean is greater than –1.64, then H0 is accepted.
Else, H0 is rejected.

The level of significance can be represented with the help of α and α/2 in the
one-tailed test and two-tailed test, respectively. For example:
• In a one-tailed test, if the level of significance is 5%, then α is 5%. In this case,
  the value of test statistics would be determined at 0.05.
• In a two-tailed test, if the level of significance is 5%, then the value of test
  statistics would be determined at 0.025 (α/2) in each tail.
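
The left-tailed weight example above can be worked through as follows. Only the hypothesised mean of 60 kg and the 5% level come from the text; the sample mean, population SD and sample size are hypothetical figures assumed for this sketch.

```python
from math import sqrt
from statistics import NormalDist

MU_0 = 60.0    # mean weight under H0 (kg), from the example
ALPHA = 0.05   # level of significance; the whole 5% sits in the left tail

# Hypothetical sample figures, assumed only for this illustration.
sample_mean = 58.5
population_sd = 6.0
n = 49

z_calculated = (sample_mean - MU_0) / (population_sd / sqrt(n))
z_critical = NormalDist().inv_cdf(ALPHA)   # left-tail critical value, about -1.645

# One-tailed (left) test: reject H0 only if z falls below the critical value.
decision = "reject H0" if z_calculated < z_critical else "accept H0"
print(f"z = {z_calculated:.2f}, critical z = {z_critical:.3f} -> {decision}")
```

For a two-tailed test at the same level, the critical value would instead be taken at α/2 = 0.025 in each tail.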


Self Assessment Questions

3. ______________ is a process to make decisions for research problems by
   using available data.
4. Null hypothesis is tested with the help of the levels of significance.
   (True/False)
5. ______________ errors are also known as the first kind of error or false
   positive.
6. Which one of the following tests is a part of non-directional hypothesis?
   a. Two-tailed test
   b. One-tailed test
   c. Both a and b
   d. None of these

9.4 PROCEDURE OF HYPOTHESIS TESTING


Hypothesis testing is a step-by-step process that starts with the formulation of
hypothesis and ends with decision making. The steps involved in hypothesis
testing are shown in Figure 6:

1. State H0 and H1
2. State the Level of Significance and the Nature of Tail-test (Two-tail or One-tail Test)
3. Decide on the Type of the Test of Significance
4. State the Decision Rule
5. Calculate the Test Statistics
6. Take a Decision

Figure 6: Process of Hypothesis Testing

Let us now discuss the process of hypothesis testing in detail.

1. State H0 and H1: In this step, the null hypothesis and the alternative hypothesis
   are framed.
   For example, a research organisation wants to perform a significance test to
   determine whether the mean weight of Indian children aged 5 is 20 kg or not
   (as claimed by reports). In this case, H0 and H1 would be as follows:
   H0: μp = 20 kg
   H1: μp ≠ 20 kg
   Where, μp = Population mean
2. State the level of significance: This refers to deciding the level of significance
   (α) for the hypothesis test. The most commonly used level of significance is
   5%, because a 5% level is considered neither too big nor too small to
   accept or reject a hypothesis.
3. Decide on the type of the test of significance: The test of significance is used
   to check the hypothesis at a given level of significance. There are various
   types of tests of significance, such as t-test, z-test, and F-test. The selection of
   a test depends on various factors such as the sample size, variance and type
   of population. For example, you use the t-test when the sample size is less
   than 30 and the z-test when the sample size is more than 30.
4. State the decision rule: It refers to determining the conditions under
   which the null hypothesis is accepted or rejected. If the decision rule is
   not determined correctly, then there are chances of committing Type I and
   Type II errors. Therefore, you should be careful while making the decision
   rule.
5. Calculate the test statistics: It refers to ascertaining the value of test statistics
   to accept or reject the hypothesis.
6. Take a decision: It refers to either accepting or rejecting H0 on the basis of
   the calculated value of test statistics. If the calculated probability (the tail
   probability of the calculated test statistic) is equal to or smaller than α (in a
   one-tailed test) or smaller than α/2 (in a two-tailed test), then the null
   hypothesis is rejected. However, if the calculated probability is greater than
   this value, then the null hypothesis is accepted. Rejecting H0 may lead
   to a Type I error, whereas accepting H0 may lead to a Type II error.
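
The six steps can be traced in code for the children's weight example. This is a sketch under stated assumptions: the hypothesised mean of 20 kg and the 5% level follow the text, while the sample size, sample mean and sample SD are hypothetical figures, and the sample SD is used in place of the unknown population SD because the sample is large.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: state H0 and H1 (H0: mu_p = 20 kg, H1: mu_p != 20 kg).
MU_0 = 20.0

# Step 2: state the level of significance and the nature of the test.
ALPHA = 0.05
TAILS = 2   # H1 uses "not equal to", so the test is two-tailed

# Step 3: decide on the test. The sample size exceeds 30, so a z-test is used.
n = 36               # hypothetical sample size
sample_mean = 19.2   # hypothetical sample mean (kg)
sample_sd = 2.4      # hypothetical sample SD (kg)

# Step 4: state the decision rule: reject H0 if |z| exceeds the critical value.
z_critical = NormalDist().inv_cdf(1 - ALPHA / TAILS)

# Step 5: calculate the test statistic.
z_calculated = (sample_mean - MU_0) / (sample_sd / sqrt(n))

# Step 6: take a decision.
decision = "reject H0" if abs(z_calculated) > z_critical else "accept H0"
print(f"z = {z_calculated:.2f}, critical z = +/-{z_critical:.2f} -> {decision}")
```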

Self Assessment Questions

7. Hypothesis testing is a step-by-step process that starts with the
   formulation of hypothesis and ends with _______________.
8. What does μp stand for in hypothesis testing?
   a. Sample mean                b. Population mean
   c. Level of significance      d. Coefficient of correlation
9. Which one of the following is the commonly used level of significance
   for minimising Type I and Type II errors?
   a. 10%                        b. 12%
   c. 5%                         d. 7%
10. Which of the following are the types of test of significance?
   a. t-test                     b. z-test
   c. F-test                     d. All of the above


9.5 SUMMARY
• Hypothesis is a proposed explanation given for an observed situation.
• Inductive hypothesis is a type of derivation hypothesis, where you move
  from specific observations to broad generalisations.
• Deductive hypothesis is a type of derivation hypothesis in which you move
  from a general statement to a specific conclusion.
• Directional hypothesis refers to the formulation hypothesis that states the
  direction of relationship between two variables.
• Non-directional hypothesis refers to the formulation hypothesis, where the
  direction of the relationship between two variables is not specified.
• Null hypothesis refers to the hypothesis in which there is no significant
  relation between two variables under study. It is denoted by H0. It represents
  the first statement of a hypothesis that is assumed to be true.
• Alternative hypothesis states that there is a relationship between two
  variables under study. It is denoted by H1. It represents the second statement
  of a hypothesis, which is provisionally assumed to be false.
• The decision rule refers to the criteria that the researcher sets, before testing,
  for accepting or rejecting a null hypothesis.
• Two-tailed test is used with a non-directional hypothesis that talks about
  the relationship between two variables under study, but does not explain
  anything about the direction of the relationship.
• One-tailed test is used with a directional hypothesis that talks not only about
  the relationship between two variables under study, but also the direction of
  relationship.
• Hypothesis testing is a step-by-step process that starts with the formulation
  of hypothesis and ends with decision making.

9.6 KEY WORDS


• Alternative hypothesis: The hypothesis that states there is a relationship between
  two variables under study.
• Deductive hypothesis: The type of hypothesis that moves from a general
  statement to a specific conclusion.
• Directional hypothesis: The hypothesis that states the direction of
  relationship between two variables under study.
• Non-directional hypothesis: The hypothesis where the direction of
  relationship between two variables under study is not specified.
• Null hypothesis: The hypothesis that says there is no relationship between
  two variables under study.


9.7 CASE STUDY: ONE-TAILED HYPOTHESIS TESTING FOR TESTING PRODUCTION QUALITY

AM Pvt. Ltd. is a manufacturer of alternators. The Indian
government has a policy that an alternator can be sold in the market if it can run at
less than 71.1° C under a stress test, assuming 95% confidence. To test the quality,
the samples are chosen randomly on a daily basis by the quality department of
the company. On a particular date (say, D), 7 samples were drawn having a mean of
71.3° C and a standard deviation of 0.214°. The quality department of the company
wants to find out if there is any quality issue or not.
For testing the above stated problem, the null hypothesis and alternate hypothesis
are stated as:
H0: µp ≤ 71.1°
H1: µp > 71.1°
Now, the researcher (quality department) finds the p and α values:
p = 95% = 0.95
Level of significance α = 1 – p = 0.05

Degrees of freedom (df) for the sample = 7 – 1 = 6

Thereafter, the researcher finds out the value of t-statistic at 95% confidence and df
at 6 using the t-table, which comes out to be 1.943.

Now, the researcher calculates the value of t-statistic as:

$$t_c = \frac{\bar{X} - \mu_p}{s/\sqrt{n}}$$

$$t_c = \frac{71.3 - 71.1}{0.214/\sqrt{7}} = \frac{0.2}{0.0809} \approx 2.47$$

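The researcher draws a graph to represent the test as:

[Diagram: distribution curve centred on μ. The "Fail to Reject" region covers 95.0% of
the area up to tp = 1.943, and the "Reject" region covers the remaining 5.0% in the
upper tail. The calculated value tc = 2.47 lies in the "Reject" region.]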


It can be seen that the calculated t-value (tc) lies in the rejection region. Therefore, the
researcher rejects the null hypothesis. Rejecting the null hypothesis means that
the sample was not acceptable and it can be stated that there is some issue in the
production of alternators at AM Pvt. Ltd., which it must find out and resolve.
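
The case study calculation can be verified with a few lines of code. This is a minimal sketch: the function name `t_statistic` is illustrative, and the table value 1.943 is taken from the case as given rather than computed.

```python
from math import sqrt


def t_statistic(sample_mean: float, hypothesised_mean: float,
                sample_sd: float, n: int) -> float:
    """Calculated t statistic for a one-sample test of the mean."""
    return (sample_mean - hypothesised_mean) / (sample_sd / sqrt(n))


T_TABLE = 1.943   # one-tailed t-table value at alpha = 0.05, df = 6 (from the case)

t_c = t_statistic(sample_mean=71.3, hypothesised_mean=71.1,
                  sample_sd=0.214, n=7)

# Right-tailed test: reject H0 only if t_c exceeds the table value.
decision = "reject H0" if t_c > T_TABLE else "fail to reject H0"
print(f"t_c = {t_c:.2f} -> {decision}")
```

The same function can be called again with a different sample size or standard deviation to see how the calculated value changes.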

QUESTIONS
1. What would be the value of tc if the sample had 49 alternators in it?
   (Hint:
   $$t_c = \frac{\bar{X} - \mu_p}{s/\sqrt{n}}$$
   $$t_c = \frac{71.3 - 71.1}{0.214/\sqrt{49}} = \frac{0.2}{0.0306} \approx 6.54$$)
2. What would be the value of tc if the standard deviation was changed to 0.851?
   (Hint:
   $$t_c = \frac{\bar{X} - \mu_p}{s/\sqrt{n}}$$
   $$t_c = \frac{71.3 - 71.1}{0.851/\sqrt{7}} = \frac{0.2}{0.3217} \approx 0.622$$
   In this case, the null hypothesis would have been accepted.)

9.8 EXERCISE
1. Describe the hypothesis and its types in detail.
2. What are the characteristics of a good hypothesis?
3. Explain the hypothesis testing in detail.
4. Explain the following terms:
a. Null Hypothesis.
b. Two-tailed test.
c. Decision rule.

9.9 ANSWERS FOR SELF ASSESSMENT QUESTIONS

Topic                              Q. No.   Answer
Defining Hypothesis                1.       Hypothesis
                                   2.       c. 1 iii, 2 i, 3 iv, 4 ii
Hypothesis Testing                 3.       Hypothesis testing
                                   4.       True
                                   5.       Type I
                                   6.       a. Two-tailed test
Procedure of Hypothesis Testing    7.       Decision making
                                   8.       b. Population mean
                                   9.       c. 5%
                                   10.      d. All of the above

9.10 SUGGESTED BOOKS AND E-REFERENCES

SUGGESTED BOOKS
• Cahoon, M. (1987). Research methodology. Edinburgh: Churchill Livingstone.
• Detterman, D. (1985). Research methodology. Norwood, N.J.: Ablex.
• Panneerselvam, R. (2014). Research methodology. Delhi: PHI Learning.

E-REFERENCES
• Different Research Methods - How to Choose an Appropriate Design? (2018).
  Retrieved from https://explorable.com/different-research-methods
• Research Methodology. (2018). Retrieved from
  https://explorable.com/research-methodology
• Research Methodology: Approaches & Techniques - Video & Lesson
  Transcript | Study.com. (2018). Retrieved from
  https://study.com/academy/lesson/research-methodology-approaches-techniques-quiz.html
• Research Methods. (2018). Retrieved from
  http://faculty.webster.edu/woolflm/statmethods.html

CHAPTER 10

PARAMETRIC TESTS

Table of Contents
Learning Objectives
10.1 Introduction
10.2 Types of Hypothesis Testing
Self Assessment Questions
10.3 Parametric Tests
Self Assessment Questions
10.4 One-Sample Tests - Different Situations in Which One Sample Test is Used
10.4.1 Exploring Case-I
10.4.2 Exploring Case-II
10.4.3 Exploring Case-III
10.4.4 Exploring Case-IV
10.4.5 Exploring Case-V
10.4.6 Exploring Case-VI
Self Assessment Questions
10.5 Two-Sample Tests
10.5.1 Differences between Two Independent Samples
10.5.2 Differences between Two Proportions
10.5.3 Comparing Two Related Samples
10.5.4 Study of Equality of Variances of Two Populations
Self Assessment Questions
10.6 Exploring ANOVA
10.6.1 One-Way ANOVA
10.6.2 Two-Way ANOVA
Self Assessment Questions
10.7 Summary
10.8 Key Words
10.9 Case Study
10.10 Exercise
10.11 Answers for Self Assessment Questions
10.12 Suggested Books and e-References
LEARNING OBJECTIVES

After studying this chapter, you will be able to:
• Distinguish between parametric and non-parametric tests for testing hypotheses
• Describe the different types of parametric tests
• Explain the concepts of one-sample tests and two-sample tests
• Describe the concept of ANOVA

10.1 INTRODUCTION
In the previous chapter, you studied how to test a hypothesis to find the solution of
a research problem. To check the validity of a hypothesis, you can use two main
types of tests: parametric tests and non-parametric tests. This chapter describes the
various types of parametric tests.

Parametric tests are statistical measures used in the analysis phase of research to
draw inferences and conclusions to solve a research problem. There are various
types of parametric tests, such as z-test, t-test and F-test. Selection of a particular
test for a research depends upon various factors, such as the type of population,
sample size, Standard Deviation (SD) and variance of population. It is important
for a researcher to identify the appropriate test to maintain the authenticity and
validity of research results.

In this chapter, you will learn about the concept of parametric tests. You will learn
about one-sample and two-sample tests. You will also learn to apply z-test, t-test
and F-test in different conditions and scenarios for one-sample and two-sample
tests.
10.2 TYPES OF HYPOTHESIS TESTING
A hypothesis can be tested by using a large number of tests. Therefore, researchers
have found it more convenient to categorise these tests on the basis of their
similarities and differences. Hypothesis tests are divided into two types, as shown
in Figure 1:

[Diagram: Types of Hypothesis Tests, divided into Parametric Tests and
Non-Parametric Tests.]

Figure 1: Types of Hypothesis Tests

• Parametric tests: In these tests, the researcher makes assumptions about the
  parameters of the population from which a sample is derived. An example
  of a parametric test is the z-test.
• Non-parametric tests: These are distribution-free tests of hypotheses. Here,
  the researcher does not make assumptions about the parameters of the
  population from which a sample is derived. An example of a non-parametric
  test is the Kruskal-Wallis test.

Self Assessment Questions

1. What do you call the hypothesis tests in which the researcher makes
assumptions about the parameters of the population from which a
sample is derived?
a. Non-parametric tests b. Parametric tests
c. Chi-square test d. Distribution-free tests


10.3 PARAMETRIC TESTS


In parametric tests, researchers assume certain properties of the parent population
from which samples are drawn. These assumptions include properties, such as the
sample size, type of population, mean and variance of population and distribution
of the variable.

For example, the t-test assumes that the variable under study is normally
distributed in the population. Researchers calculate a test statistic from the
sample data and test the hypothesis by comparing the calculated value with the
critical (benchmark) value at the chosen level of significance. The dependent
variable in parametric tests is usually measured on an interval or ratio scale.

There are various types of parametric tests, as shown in Figure 2:

Figure 2: Types of Parametric Tests (z-test, t-test and F-test)

Let us now discuss each type of test:
• z-test: This test is used to study the mean and proportion of samples with
  a sample size of more than 30. It can also compare the means of two
  different and unrelated samples drawn from the same population whose
  variance is known. The z-value (test statistic) is calculated from the sample
  data and compared with the critical z-value at the level of significance
  fixed in advance for the problem. After the comparison, the researcher
  decides whether or not to reject the null hypothesis.
  The z-test is used in the following cases:
  - To compare the mean of a sample with the hypothesised mean of a
    population when the sample size is large and the population variance is
    known
  - To test the significance of the difference between the means of two
    independent samples when the samples are large or the population
    variance is known
  - To compare the proportion of a sample with the proportion of the
    population
• t-test: This test is used to study the mean of samples when the sample size
  is less than 30 and/or the population variance is unknown. It is based on
  the t-distribution, a probability distribution that is appropriate for
  estimating the mean of a normally distributed population when the sample
  size is small and the population variance is unknown.
  The t-value (test statistic) is calculated from the sample data and compared
  with the critical t-value at the specified level of significance and the
  relevant degrees of freedom to decide whether or not to reject the null
  hypothesis. For a one-sample test, the degrees of freedom are obtained by
  subtracting one from the number of observations (n – 1). This value is used
  to look up the critical t-value in the t-distribution table.
  The t-test can also be used to compare the means of two related samples
  when the sample size is small and the population variance is unknown. In
  such a situation, it is known as the paired t-test.

Note: If the sample size is small but the population is normal and the population
standard deviation is known, the z-test can be used.

• F-test: This test is used to compare the variances of two samples by
  examining their ratio. The test statistic follows the F-distribution, a
  right-skewed distribution that is most commonly used in Analysis of
  Variance (ANOVA). The F-value (test statistic) is calculated from the sample
  data and compared with the critical F-value at the level of significance
  fixed in advance for the problem. An F-test has two independent degrees of
  freedom, one for the numerator and one for the denominator. The degrees of
  freedom (d.f.) of the two samples are calculated separately by subtracting
  one from the number of observations in each sample. The critical F-value is
  then read from the F-distribution table.

Note: All hypothesis tests are carried out on the assumption that the null hypothesis is true.

Parametric tests are further divided into two parts – one-sample tests and two-
sample tests. You will learn more about them in the next sections.

Exhibit: ASSUMPTIONS OF F-TEST

The F-distribution is asymmetric (right-skewed), with a minimum value of zero
and no upper limit.

Assumptions for using an F-test include:

1. Both samples come from normally distributed populations.
2. The observations in each sample are selected randomly.

The F-statistic can never be negative because it is a ratio of two variances, and
each variance is built from squared deviations.
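
The variance ratio described above can be illustrated with a short Python sketch. The two samples are hypothetical values chosen only for illustration, and the function name `f_statistic` is an assumption of this sketch rather than something defined in the chapter.

```python
from statistics import variance

def f_statistic(sample_a, sample_b):
    """Return (F, numerator d.f., denominator d.f.) for two samples."""
    var_a = variance(sample_a)  # sample variance, divides by n - 1
    var_b = variance(sample_b)
    f_value = var_a / var_b     # ratio of two non-negative variances
    return f_value, len(sample_a) - 1, len(sample_b) - 1

# Hypothetical data, for illustration only
sample_a = [2, 4, 4, 4, 5, 5, 7, 9]
sample_b = [1, 2, 3, 4, 5]

f_value, df_num, df_den = f_statistic(sample_a, sample_b)
print(f"F = {f_value:.3f} with d.f. ({df_num}, {df_den})")
```

The calculated F-value would then be compared with the critical value from the F-distribution table for the two degrees of freedom.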


The degrees of freedom for different tests are calculated as follows:

Test | Degrees of Freedom
One-sample t-test | n – 1, where n = sample size
Paired-data t-test | n – 1, where n = number of pairs of data points
t-test for two independent populations | (n1 – 1) + (n2 – 1), where n1 and n2 are the sizes of the two samples
Chi-square test for independence | (r – 1)(c – 1), where r = number of levels of the first categorical variable and c = number of levels of the second categorical variable
Chi-square test for goodness of fit | n – 1, where n = number of levels of the single categorical variable
One-factor ANOVA (F-test) | Degrees of freedom of numerator (dfn) = k – 1 and degrees of freedom of denominator (dfd) = N – k, where N = total number of data values in the experiment and k = number of groups
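
The rules in the table above can be written as small helper functions. This is only a sketch; the function names are hypothetical and are introduced here for illustration.

```python
def df_one_sample_t(n):
    """One-sample t-test: n - 1, where n is the sample size."""
    return n - 1

def df_paired_t(num_pairs):
    """Paired-data t-test: number of pairs minus one."""
    return num_pairs - 1

def df_two_sample_t(n1, n2):
    """t-test for two independent samples: (n1 - 1) + (n2 - 1)."""
    return (n1 - 1) + (n2 - 1)

def df_chi_square_independence(r, c):
    """Chi-square test for independence: (r - 1)(c - 1)."""
    return (r - 1) * (c - 1)

def df_chi_square_goodness_of_fit(levels):
    """Chi-square goodness of fit: number of levels minus one."""
    return levels - 1

def df_one_factor_anova(total_n, k):
    """One-factor ANOVA: (dfn, dfd) = (k - 1, N - k)."""
    return k - 1, total_n - k

print(df_one_sample_t(25))               # 24
print(df_two_sample_t(10, 10))           # 18
print(df_chi_square_independence(3, 4))  # 6
print(df_one_factor_anova(30, 3))        # (2, 27)
```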

Self Assessment Questions

2. Which of the following parametric tests is used to study the mean and
proportion of samples having a sample size of more than 30?
a. t-test b. F-test
c. Chi-square test d. z-test
3. The ___________ is used to compare the mean of samples when the
sample size is less than 30 and the population variance is unknown.
4. Which test is used to compare the significant difference between the
variances of two samples under study?
a. z-test b. Chi-square test
c. t-test d. F-test
5. The degree of freedom is calculated by subtracting ___________ from the
___________ for t-test.

10.4 ONE-SAMPLE TESTS - DIFFERENT SITUATIONS IN WHICH ONE-SAMPLE TEST IS USED
In a one-sample test, the researcher compares the mean of a sample to a pre-
specified value and tests for deviation from that value. In this test, you can
determine the mean, variance and proportion of the sample and population with
the help of z-test and t-test.


The one-sample test is used in various situations as mentioned in Table 1:

Table 1: Cases of One-Sample Tests

Case | Population | Sample Size | Population Mean | Sample Mean | Population Variance | Sample Variance | Test
Case-I | Normal and finite and n/N < 0.05 | Large | | | Unknown | Known | Two-tailed or one-tailed
Case-II | Normal and finite | Large | | | Known | Known | Two-tailed or one-tailed
Case-III | Normal and infinite | Large | | | Unknown | Known | Two-tailed or one-tailed
Case-IV | Population proportion and sample proportion are known
Case-V | Normal and infinite | Small | | | Unknown | Known | Two-tailed or one-tailed
Case-VI | Normal and finite | Small | | | Unknown | Known | Two-tailed or one-tailed

Let us study these cases in detail.


10.4.1 EXPLORING CASE-I
In this case, the population is normal and finite, the sample size is large and the
population variance is unknown. The researcher uses the following test statistic:

$$t = \frac{\bar{X} - \mu}{\dfrac{s}{\sqrt{n}} \times \sqrt{\dfrac{N - n}{N - 1}}}$$

Where,

μ = Population mean

N = Population size

n = Sample size

s = Standard Deviation of the sample

X̄ = Sample mean

10.4.2 EXPLORING CASE-II


In Case-II, the population is normal and finite, the sample size is large and the
population variance is known. In this case, the researcher uses the following test
statistic:

$$z = \frac{\bar{X} - \mu}{\dfrac{\sigma}{\sqrt{n}} \times \sqrt{\dfrac{N - n}{N - 1}}}$$

Where,

μ = Population mean

N = Population size

n = Sample size

σ = Standard Deviation of the population

X̄ = Sample mean

Let us understand the application of Case-II with the help of an example.

Example 1: The mean diameter of all products produced by an organisation is
presumed to be 8 cm, with an SD of 2.5 cm. The size of the population is 50.
The organisation has taken a random sample of 35 pieces of product A to find out
whether the average diameter of this product is the same as or more than that of
the overall production. The sample mean diameter for product A is 10 cm. Use 5%
as the level of significance. Construct the hypotheses and carry out the test of
significance for this problem.

Solution: The null hypothesis and the alternative hypothesis are as follows:

H0: The average diameter of product A is the same as the average diameter of all
products combined.

H1: The average diameter of product A is more than the average diameter of all
products combined.

Or,

H0: µ = 8 cm

H1: µ > 8 cm

Assumed population mean (µ) = 8 cm

Population size (N) = 50

Sample size (n) = 35

Sample mean (X̄) = 10 cm

Standard Deviation of population (σ) = 2.5 cm

Since the population is finite, the researcher applies the finite population
correction to the standard error and uses the following formula for the z-test
to test the hypothesis for significance:

$$z = \frac{\bar{X} - \mu}{\dfrac{\sigma}{\sqrt{n}} \times \sqrt{\dfrac{N - n}{N - 1}}}$$

$$z = \frac{10 - 8}{\dfrac{2.5}{\sqrt{35}} \times \sqrt{\dfrac{50 - 35}{50 - 1}}} = \frac{2}{\dfrac{2.5}{5.916} \times \sqrt{\dfrac{15}{49}}}$$

$$z = \frac{2}{0.4226 \times 0.5533} = \frac{2}{0.2338} \approx 8.55$$

The critical z-value at the 5% level of significance for a one-tailed test is +1.64.

The graphical representation of the preceding solution is given in Figure 3:

Figure 3: Rejecting Calculated z-value (critical value +1.64; the calculated z-value of +8.55 lies in the rejection region)

In Figure 3, it can be observed that the calculated value of z lies in the rejection
region; therefore, H0 is rejected. This implies that the average diameter
of product A is more than that of the overall production.
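
The one-sample z-test for a finite population can be sketched in Python as follows. The function name `z_finite_population` is hypothetical, and the sketch assumes that the finite population correction multiplies the standard error σ/√n, which is the usual form of the correction.

```python
import math

def z_finite_population(sample_mean, mu0, sigma, n, population_size):
    """One-sample z statistic with the finite population correction."""
    standard_error = sigma / math.sqrt(n)
    # Finite population correction applied to the standard error
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return (sample_mean - mu0) / (standard_error * fpc)

# Values taken from Example 1
z_value = z_finite_population(sample_mean=10, mu0=8, sigma=2.5,
                              n=35, population_size=50)
critical_z = 1.64  # one-tailed test at the 5% level, as used in the text
reject_h0 = z_value > critical_z

print(f"z = {z_value:.2f}, reject H0: {reject_h0}")
```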
10.4.3 EXPLORING CASE-III
In Case-III, the population is normal and infinite, the sample size is large and the
population variance is unknown. In this case, the following test statistic is used:

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$
Where,
μ = Population mean
n = Sample size
s = Standard Deviation of sample
X̄ = Sample mean
Let us understand the application of Case-III with the help of an example.
Example 2: The rating given by 36 existing customers of an organisation from the
south part of a city to a newly launched product is as follows (1 being the lowest
and 10 being the highest rating):
6, 7, 10, 9, 8, 7, 5, 8, 8, 9, 7, 9, 10, 4, 8, 5, 10, 8, 9, 6, 6, 6, 8, 8, 9, 7, 7, 7, 7, 5, 4, 6, 8, 5,
10, 10
The marketers have the average rating from the whole city as 7.5. Now, the
organisation wants to know whether the south part also has the same rating. Use
5% as the level of significance.
Solution: The null hypothesis and the alternative hypothesis are as follows:
H0: The average rating of the south part of the city is the same as the average rating
of the city


H1: The average rating of the south part of the city is not the same as the average
rating of the city

Or,

H0: μ = 7.5

H1: μ ≠ 7.5

Where, μ = population mean, that is, the mean rating given by the customers in
the south part of the city

The data and the calculation part of the previous problem are shown in Table 2:

Table 2: Ratings given by Customers


No. of Observations   Rating Given by Customers (Xi)   Xi – X̄   (Xi – X̄)²
1 6 –1.4 1.96
2 7 –0.4 0.16
3 10 2.6 6.76
4 9 1.6 2.56
5 8 0.6 0.36
6 7 –0.4 0.16
7 5 –2.4 5.76
8 8 0.6 0.36
9 8 0.6 0.36
10 9 1.6 2.56
11 7 –0.4 0.16
12 9 1.6 2.56
13 10 2.6 6.76
14 4 –3.4 11.56
15 8 0.6 0.36
16 5 –2.4 5.76
17 10 2.6 6.76
18 8 0.6 0.36
19 9 1.6 2.56
20 6 –1.4 1.96
21 6 –1.4 1.96
22 6 –1.4 1.96
23 8 0.6 0.36
24 8 0.6 0.36
25 9 1.6 2.56
26 7 –0.4 0.16
27 7 –0.4 0.16
28 7 –0.4 0.16
29 7 –0.4 0.16

30 5 –2.4 5.76
31 4 –3.4 11.56
32 6 –1.4 1.96
33 8 0.6 0.36
34 5 –2.4 5.76
35 10 2.6 6.76
36 10 2.6 6.76
Total ∑Xi = 266 ∑(Xi – X̄)² = 106.56

$$\text{Sample mean } (\bar{X}) = \frac{\sum_{i=1}^{n} X_i}{n}$$

X̄ = 266/36
X̄ = 7.39 (rounded to 7.4 for the deviations in Table 2)

Population mean (μ) = 7.5
Sample size (n) = 36
Since the standard deviation for the population is not given, the researcher needs
to calculate the SD for the sample.

$$\text{Standard Deviation of sample } (s) = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

$$s = \sqrt{\frac{106.56}{35}} = \sqrt{3.044}$$

s = 1.745
The population is infinite; therefore, the researcher uses the following formula for
the t-test to test the hypothesis for significance:

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

$$t = \frac{7.39 - 7.5}{1.745 / \sqrt{36}} = \frac{-0.11}{0.2908} = -0.378$$

The critical t-value at the 5% level of significance for a two-tailed test with
35 degrees of freedom is ±2.03.

Because the alternative hypothesis is non-directional (μ ≠ 7.5), the researcher
applies a two-tailed test.

The graphical representation of the preceding solution is shown in Figure 4:

Figure 4: The Position of Calculated t-value (critical values ±2.03; the calculated t-value of –0.38 lies in the acceptance region)

In Figure 4, it can be observed that the calculated t-value lies in the acceptance
region; therefore, we do not reject H0. This implies that there is no significant
evidence that the average rating of the south part of the city differs from the
average rating of the city.
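
The same one-sample t-test can be reproduced from the raw ratings listed in Table 2. The sketch below uses only Python's standard library; the function name `one_sample_t` is hypothetical, and the critical value is the table value quoted in the text. Because the code works with the unrounded mean, its result may differ slightly in the later decimals from a hand calculation with rounded figures.

```python
import math
from statistics import mean, stdev

# Ratings as listed in Table 2 (36 observations)
ratings = [6, 7, 10, 9, 8, 7, 5, 8, 8, 9, 7, 9, 10, 4, 8, 5, 10, 8,
           9, 6, 6, 6, 8, 8, 9, 7, 7, 7, 7, 5, 4, 6, 8, 5, 10, 10]

def one_sample_t(data, mu0):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    n = len(data)
    sample_mean = mean(data)
    s = stdev(data)  # sample standard deviation, divides by n - 1
    t = (sample_mean - mu0) / (s / math.sqrt(n))
    return t, n - 1

t_value, dof = one_sample_t(ratings, mu0=7.5)
critical_t = 2.03  # two-tailed test at the 5% level with 35 d.f.
reject_h0 = abs(t_value) > critical_t

print(f"t = {t_value:.3f}, d.f. = {dof}, reject H0: {reject_h0}")
```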

10.4.4 EXPLORING CASE-IV


In Case-IV, the assumed population proportion and the observed sample proportion
are known. In such a situation, the researcher uses the following test statistic:

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}, \qquad \hat{p} = \frac{x}{n}$$

Where, p = Proportion of success of population (assumed)

n = Sample size

q = Proportion of failure of population (q = 1 – p)

x = Number of successes observed in the sample

p̂ (pronounced as p-hat) = Observed sample proportion

p̂ is an unbiased estimator of p

Example 3: According to the records of a college, the proportion of girl students
in the college is presumed to be 40%. The college principal conducted a survey of
3000 students to validate the college record. Out of the 3000 students, 1450 are
girls and the rest are boys. Now, the principal wants to carry out a test of
significance on the survey result to check the validity of the record. Use
5% as the level of significance.

Solution: The null hypothesis and the alternative hypothesis are as follows:

H0: The proportion of girl students observed in the survey is the same as in the
college record.


H1: The proportion of girl students observed in the survey is different from their
proportion in the college record.
Or,
H0: p = 0.40
H1: p ≠ 0.40
Where,
p = Probability of success, that is, the actual proportion of girls in the college
p = 0.40
q = 1 – 0.40
q = 0.60
Sample size (n) = 3000
Observed sample proportion (p̂) = 1450/3000
p̂ = 0.4833

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}} = \frac{0.4833 - 0.40}{\sqrt{\dfrac{0.40 \times 0.60}{3000}}}$$

z = 0.0833/0.00894
z = 9.32
The critical z-value at the 5% level of significance for a two-tailed test is ±1.96.
The graphical representation of the preceding solution is shown in Figure 5:

Figure 5: Calculated z-value When the Proportion of Population and Sample Means Are Given (critical values ±1.96; the calculated z-value of 9.32 lies in the rejection region)

In Figure 5, it can be observed that the z-value lies in the rejection region; therefore,
H0 is rejected. This implies that the proportion of girl students observed in the
survey is different from their proportion in the college record. Because the
calculated z-value is positive, the observed proportion of girls is higher than the
proportion stated in the college record.
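
The one-sample test of a proportion can be sketched in a few lines of Python. The function name `one_sample_proportion_z` is hypothetical; the inputs are the figures given in Example 3.

```python
import math

def one_sample_proportion_z(successes, n, p0):
    """z statistic for an observed proportion against an assumed proportion p0."""
    p_hat = successes / n                      # observed sample proportion
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

z_value = one_sample_proportion_z(successes=1450, n=3000, p0=0.40)
critical_z = 1.96  # two-tailed test at the 5% level
reject_h0 = abs(z_value) > critical_z

print(f"z = {z_value:.2f}, reject H0: {reject_h0}")
```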


10.4.5 EXPLORING CASE-V


In Case-V, the population is normal and infinite, the sample size is small, and the
population variance is unknown. In this case, the researcher uses the following
test statistic:
$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

Where,

μ = Population mean

n = Sample size

s = Standard deviation of sample

X̄ = Sample mean

Let us understand the application of case V with the help of an example.

Example 4: A researcher wants to study the average income of a group of 25 people
working as marketing executives in different organisations (especially small and
medium enterprises). The income of the 25 marketing executives included
in the study sample is recorded as:

No. of Observations Income (lacs)


1 2
2 1.9
3 2
4 2
5 1.9
6 2
7 1.9
8 2
9 1.9
10 2
11 2
12 1.9
13 2
14 1.9
15 2
16 2
17 1.8
18 2
19 2
20 2
21 2
22 1.9

23 2
24 1.9
25 2
Total

The average recorded package for the marketing executive post is ₹ 2 lakhs. The
researcher wants to know whether the average recorded package is valid for this
group or not. Use 5% as the level of significance.

Solution: The null hypothesis and the alternative hypothesis are as follows:

H0: The average recorded package and the sample average income of group are
the same.

H1: The average recorded package and the sample average income of group are
different.

Or,
H0: µ = 2,00,000

H1: µ ≠ 2,00,000

Where,

µ = Population mean, that is, the mean income of the population of marketing
executives from which the group of 25 executives is drawn.

The data and the calculation part for this example are shown in Table 3:
Table 3: Income of People at the Marketing Executive Post
No. of Observations   Income (lacs)   Xi – X̄   (Xi – X̄)²
1 2 0.04 0.0016
2 1.9 –0.06 0.0036
3 2 0.04 0.0016
4 2 0.04 0.0016
5 1.9 –0.06 0.0036
6 2 0.04 0.0016
7 1.9 –0.06 0.0036
8 2 0.04 0.0016
9 1.9 –0.06 0.0036
10 2 0.04 0.0016
11 2 0.04 0.0016
12 1.9 –0.06 0.0036
13 2 0.04 0.0016
14 1.9 –0.06 0.0036

15 2 0.04 0.0016
16 2 0.04 0.0016
17 1.8 –0.16 0.0256
18 2 0.04 0.0016
19 2 0.04 0.0016
20 2 0.04 0.0016
21 2 0.04 0.0016
22 1.9 –0.06 0.0036
23 2 0.04 0.0016
24 1.9 –0.06 0.0036
25 2 0.04 0.0016
Total ∑Xi = 49 ∑(Xi–X)2 = 0.08

Population mean (μ) = 2 lakhs (assumed)

Sample size (n) = 25

$$\text{Sample mean } (\bar{X}) = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{49}{25}$$

X̄ = 1.96

Since the standard deviation for the population is unknown, the researcher needs
to calculate the standard deviation for the sample as follows:

$$\text{Standard deviation of sample } (s) = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{0.08}{24}}$$

s = 0.058

The population is infinite; therefore, the researcher uses the following formula for
the t-test to test the hypothesis for significance:

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{1.96 - 2}{0.058 / \sqrt{25}} = \frac{-0.04}{0.0116}$$

t = –3.45


Degree of Freedom (d.f) = n – 1

= 25 – 1

= 24

The t-value for the 5% level of significance for two-tailed test and 24 d.f. is ±2.064.
The graphical representation of the preceding solution is shown in Figure 6:

Figure 6: The Position of Calculated t-value (critical values ±2.064; the calculated t-value of –3.45 lies in the rejection region)

In Figure 6, it can be observed that the calculated t-value lies in the rejection region;
therefore, H0 is rejected. This implies that the average recorded package and the
sample average of the income of the group are different. Because the calculated
t-value is negative, the average income of this group is lower than the recorded
package for the marketing executive post.

10.4.6 EXPLORING CASE-VI


In Case-VI, the population is normal and finite, the sample size is small, and the
population variance is unknown. In this case, the researcher uses the following
test statistic:

$$t = \frac{\bar{X} - \mu}{\dfrac{s}{\sqrt{n}} \times \sqrt{\dfrac{N - n}{N - 1}}}$$

Where,

μ = Population mean

N = Population size

n = Sample size

s = Standard deviation of sample

X̄ = Sample mean

Self Assessment Questions

6. One-sample test helps in determining ___________, ___________ and
___________ of a sample population with the help of z-test, t-test and
chi-square test.

7. In the formula $z = \dfrac{\bar{X} - \mu}{\dfrac{\sigma}{\sqrt{n}} \times \sqrt{\dfrac{N - n}{N - 1}}}$, what does μ stand for?
a. Sample size
b. Population mean
c. Standard deviation of sample
d. Sample mean

10.5 TWO-SAMPLE TESTS


In a two-sample test, a researcher wants to study the relationship between two
samples drawn from two different or same populations. In this section, you will
learn about the application of z-test, t-test, and F-test in different situations. These
situations are as follows:
• Differences between two independent samples
• Differences between two proportions
• Comparing two related samples
• Equality of the variances of two populations
The two-sample test in different situations is discussed in detail in the upcoming
sections.

10.5.1 DIFFERENCES BETWEEN TWO INDEPENDENT SAMPLES


In this study, a researcher finds the relationship between two samples that are taken
from two independent groups, in terms of their means. The samples are compared
to find out whether they are significantly different in terms of their mean value, or
they are drawn from the same population. The formula and method of conducting
the two-sample test are different in different situations.
Table 4 lists the different situations for conducting two-sample tests:

Table 4: Situations to Find Differences between Two Samples (Two-Sample Tests)


Situation | Population | Sample Size | Population Variance | Test
Situation I | Normal | Large | Unknown | One-tailed or two-tailed
Situation II | Normal | Large | Known | One-tailed or two-tailed
Situation III | Normal | Small | Unknown | One-tailed or two-tailed

These different situations with examples are discussed in the following sections.

Situation I
In this situation, the population is normal, the sample size is large and the population
variance is unknown. The researcher can use either a two-tailed test or a one-tailed
test, depending on the alternative hypothesis of the research. If the researcher wants
to compare two samples drawn from two different populations, then he/she
would use the following test statistic:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Where,

X̄1 = Sample mean of the first sample

X̄2 = Sample mean of the second sample

s1 = Standard deviation of the first sample

s2 = Standard deviation of the second sample

n1 = Sample size of the first sample

n2 = Sample size of the second sample

Note: When the population variance is known, the researcher should use the
z-statistic; when it is unknown and has to be estimated from the samples, the
t-statistic is used.

Example 5: A researcher wants to compare the popularity of Brand A and
Brand B. Therefore, he/she takes a sample of 35 people and asks them to rate the
two brands on a 10-point scale (10 being the highest and 1 being the lowest). Use
5% as the level of significance.
Solution: The null hypothesis and the alternative hypothesis are as follows:

H0: The popularity of Brand A and Brand B is the same.

H1: The popularity of Brand A and Brand B is different

Or,

H0: μ1 = μ2

H1: μ1 ≠ μ2

Where,

μ1 = Population mean of Brand A

μ2 = Population mean of Brand B

The data and the calculation part of the preceding problem are shown in Table 5:

Table 5: Calculating the Popularity of Brand A and Brand B


No. of Observations   Brand A (X1i)   Brand B (X2i)   (X1i – X̄1)   (X1i – X̄1)²   (X2i – X̄2)   (X2i – X̄2)²
1 7 9 –2 4 –0.4 0.16
2 8 9 –1 1 –0.4 0.16
3 9 9 0 0 –0.4 0.16
4 10 9 1 1 –0.4 0.16
5 10 9 1 1 –0.4 0.16
6 9 9 0 0 –0.4 0.16
7 10 9 1 1 –0.4 0.16
8 10 9 1 1 –0.4 0.16
9 10 9 1 1 –0.4 0.16
10 6 10 –3 9 0.6 0.36
11 9 10 0 0 0.6 0.36
12 8 10 –1 1 0.6 0.36
13 8 10 –1 1 0.6 0.36
14 9 9 0 0 –0.4 0.16
15 7 10 –2 4 0.6 0.36
16 9 10 0 0 0.6 0.36
17 10 10 1 1 0.6 0.36
18 8 10 –1 1 0.6 0.36
19 9 10 0 0 0.6 0.36
20 10 9 1 1 –0.4 0.16
21 10 9 1 1 –0.4 0.16
22 9 9 0 0 –0.4 0.16
23 9 9 0 0 –0.4 0.16
24 8 9 –1 1 –0.4 0.16
25 9 9 0 0 –0.4 0.16
26 9 9 0 0 –0.4 0.16
27 10 9 1 1 –0.4 0.16
28 10 9 1 1 –0.4 0.16
29 10 9 1 1 –0.4 0.16
30 10 9 1 1 –0.4 0.16
31 10 9 1 1 –0.4 0.16
32 9 10 0 0 0.6 0.36
33 9 10 0 0 0.6 0.36
34 8 10 –1 1 0.6 0.36
35 9 10 0 0 0.6 0.36
Total ∑X1i =315 ∑X2i =328 ∑(X1i–X1)2=36 ∑(X2i–X2)2=8.2

$$\text{Sample mean of Brand A } (\bar{X}_1) = \frac{\sum_{i=1}^{n} X_{1i}}{n} = \frac{315}{35}$$

X̄1 = 9

$$\text{Sample mean of Brand B } (\bar{X}_2) = \frac{\sum_{i=1}^{n} X_{2i}}{n} = \frac{328}{35}$$

X̄2 = 9.37 ≈ 9.4

Standard Deviation of Sample A = s1

$$s_1 = \sqrt{\frac{\sum (X_{1i} - \bar{X}_1)^2}{n_1 - 1}} = \sqrt{\frac{36}{34}} = \sqrt{1.059} = 1.029$$

Standard Deviation of Sample B = s2

$$s_2 = \sqrt{\frac{\sum (X_{2i} - \bar{X}_2)^2}{n_2 - 1}} = \sqrt{\frac{8.2}{34}} = \sqrt{0.2412} = 0.491$$

Since the sample size is more than 30, the population variance is unknown and two
samples are under study, the researcher applies the following test:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

$$t = \frac{9 - 9.4}{\sqrt{\dfrac{(1.029)^2}{35} + \dfrac{(0.491)^2}{35}}} = \frac{-0.4}{\sqrt{\dfrac{1.059 + 0.241}{35}}}$$

$$t = \frac{-0.4}{\sqrt{0.03714}} = \frac{-0.4}{0.1927} = -2.08$$

The critical t-value at the 5% level of significance for a two-tailed test is ±2.032
(degrees of freedom = 35 – 1 = 34).

The graphical representation of the preceding solution is shown in Figure 7:

Figure 7: The Position of Calculated t-value in the Case of Two Samples (critical values ±2.032; the calculated t-value of –2.08 lies in the rejection region)

In Figure 7, it can be observed that the t-value lies in the rejection region; therefore,
H0 is rejected. The popularity of Brand A is not the same as the popularity of
Brand B.
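
The two-sample statistic for Situation I can be sketched in Python from the summary figures of Example 5. The function name `two_sample_t_unpooled` is hypothetical. The sketch takes the square root of the sum s1²/n1 + s2²/n2 before dividing, and it uses the rounded mean of 9.4 for Brand B, as the worked solution does.

```python
import math

def two_sample_t_unpooled(mean1, mean2, s1, s2, n1, n2):
    """t statistic for two independent samples with separate variances."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

# Summary values from Table 5
s1 = math.sqrt(36 / 34)   # standard deviation of Brand A ratings
s2 = math.sqrt(8.2 / 34)  # standard deviation of Brand B ratings

t_value = two_sample_t_unpooled(mean1=9, mean2=9.4, s1=s1, s2=s2, n1=35, n2=35)
critical_t = 2.032  # two-tailed test at the 5% level, as quoted in the text
reject_h0 = abs(t_value) > critical_t

print(f"t = {t_value:.2f}, reject H0: {reject_h0}")
```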

Situation-II
In this situation, the population is normal, the sample size is large, and the
population variance is known. The researcher can use either a two-tailed test or a
one-tailed test, depending on the alternative hypothesis of the research. If the
researcher wants to compare two samples drawn from the same population, then he/she
would use the following test statistic:

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Where,

X̄1 = Sample mean of the first sample

X̄2 = Sample mean of the second sample

σp = Standard deviation of the population

n1 = Sample size of the first sample

n2 = Sample size of the second sample

Example 6: A researcher has collected two samples from the production
houses of an organisation. He has taken a sample of Product P from 500 production
houses and found that the average production of Product P is 1000
pieces/month with a standard deviation of 13 pieces. He has also taken a sample
of Product Q from 400 production houses and found that the average production
of Product Q is 1200 pieces/month with a standard deviation of 15 pieces. The
standard deviation of production across the production houses of the organisation
is 14 pieces. Do the two samples come from the same population, that is, are the
population means of the two products the same? Use 5% as the level of significance.

Solution: The null hypothesis and alternative hypothesis are as follows:

H0: Population means of products P and Q are the same.

H1: Population means of products P and Q are different.

Or,

H0: μ1 = μ2

H1: μ1 ≠ μ2

Where, μ1 = Population mean of product P

μ2 = Population mean of product Q

The given details are as follows:

Sample mean of product P (X̄1) = 1000

Sample mean of product Q (X̄2) = 1200

Standard deviation of sample P (s1) = 13

Standard deviation of sample Q (s2) = 15

Standard deviation of population (σp) = 14

Number of observations of sample P (n1) = 500

Number of observations of sample Q (n2) = 400

Since the sample size is more than 30, the population variance is known, and two
samples are under study, the researcher would apply the following z-test:

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

$$z = \frac{1000 - 1200}{\sqrt{(14)^2 \left(\dfrac{1}{500} + \dfrac{1}{400}\right)}} = \frac{-200}{\sqrt{196 \left(\dfrac{4 + 5}{2000}\right)}}$$

$$z = \frac{-200}{\sqrt{1764 / 2000}} = \frac{-200}{0.939}$$

z = –212.99

The critical z-value at the 5% level of significance for a two-tailed test is ±1.96.
The graphical representation of the preceding solution is shown in Figure 8:

Figure 8: Representation of Calculated z-value in Case of Two Samples (critical values ±1.96; the calculated z-value of –212.99 lies far inside the rejection region)

In Figure 8, it can be observed that the z-value lies in the rejection region; therefore,
H0 is rejected. This implies that the population means of products P and Q are
different, that is, the difference between the means of the two samples is
statistically significant.
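
As a cross-check of Situation II, the z statistic with a known population standard deviation can be computed as in the following sketch. The function name `two_sample_z_known_sigma` is hypothetical; the inputs are the figures of Example 6. The result may differ from the hand calculation in the second decimal place because the code does not round intermediate values.

```python
import math

def two_sample_z_known_sigma(mean1, mean2, sigma, n1, n2):
    """z statistic for two samples from a population with known sigma."""
    standard_error = math.sqrt(sigma ** 2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error

z_value = two_sample_z_known_sigma(mean1=1000, mean2=1200, sigma=14,
                                   n1=500, n2=400)
critical_z = 1.96  # two-tailed test at the 5% level
reject_h0 = abs(z_value) > critical_z

print(f"z = {z_value:.2f}, reject H0: {reject_h0}")
```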

Situation-III
In this situation, the population is normal, the sample size is small, and the
population variance is unknown. The researcher can use either a two-tailed test or a
one-tailed test on the basis of the research problem and the alternative hypothesis.
If the researcher wants to compare two samples drawn from two different populations,
then he/she would use the following test statistic:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{SE}$$

Where,

SE = Standard Error

$$SE = S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Where Sp = pooled standard deviation and

$$S_p = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{(n_1 - 1) + (n_2 - 1)}}$$

$$\therefore t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\dfrac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{(n_1 - 1) + (n_2 - 1)}\right] \left[\dfrac{1}{n_1} + \dfrac{1}{n_2}\right]}}$$

Where,

X̄1 = Sample mean of the first sample

X̄2 = Sample mean of the second sample

s1 = Standard deviation of the first sample

s2 = Standard deviation of the second sample
n1 = Sample size of the first sample
n2 = Sample size of the second sample
Example 7: The average sales volumes of an organisation in two cities, A and B,
based on 10 retail outlets in each city, are 100 and 200, respectively. The standard
deviation for A is 5.5 and for B is 6.5. Test the hypothesis for the difference in
sales of the two cities by using 5% as the level of significance.
Solution: The null hypothesis and the alternative hypothesis are as follows:
H0: Average sale of City A is equal to the average sale of city B.
H1: Average sale of City A is not equal to the average sale of city B.
Or,
H0: μ1 = μ2
H1: μ1 ≠ μ2
Where,
μ1 = Population mean of city A
μ2 = Population mean of city B

Sample mean of city A (X1) = 100

Sample mean of city B (X2) = 200


Standard deviation of sample A (s1) = 5.5
Standard deviation of sample B (s2) = 6.5
Number of observations of sample A and B (n) = n1 = n2 = 10
Since the sample size is less than 30, the population variance is unknown and two
samples are under study, the researcher would apply the following t-test:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\dfrac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{(n_1 - 1) + (n_2 - 1)}\right] \left[\dfrac{1}{n_1} + \dfrac{1}{n_2}\right]}}$$

$$t = \frac{100 - 200}{\sqrt{\left[\dfrac{(5.5)^2 (10 - 1) + (6.5)^2 (10 - 1)}{(10 - 1) + (10 - 1)}\right] \left[\dfrac{1}{10} + \dfrac{1}{10}\right]}}$$

$$t = \frac{-100}{\sqrt{\left[\dfrac{9 (30.25 + 42.25)}{18}\right] \left[\dfrac{1}{5}\right]}} = \frac{-100}{\sqrt{36.25 \times 0.2}} = \frac{-100}{\sqrt{7.25}}$$

$$t = \frac{-100}{2.6926} = -37.14$$

The critical t-value at the 5% level of significance for a two-tailed test with 18
degrees of freedom is ±2.101. The graphical representation of the preceding solution
is shown in Figure 9:

Figure 9: Rejection of Calculated t-value in Case of Two Samples (critical values ±2.101; the calculated t-value of –37.14 lies in the rejection region)

In Figure 9, it can be observed that the t-value lies in the rejection region; therefore,
H0 is rejected. This implies that the average sales volume of City A is not equal to
the average sales volume of City B. It can be interpreted from the calculated t-value
that the difference between the means of the two samples is statistically significant.
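
The pooled t-test of Situation III can be sketched as follows. The function name `pooled_t` is hypothetical; the inputs are the summary figures of Example 7, and the code keeps full precision, so the result may differ slightly from a hand calculation that rounds the standard error.

```python
import math

def pooled_t(mean1, mean2, s1, s2, n1, n2):
    """Return (t, degrees of freedom) using the pooled standard deviation."""
    dof = (n1 - 1) + (n2 - 1)
    pooled_variance = (s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / dof
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, dof

t_value, dof = pooled_t(mean1=100, mean2=200, s1=5.5, s2=6.5, n1=10, n2=10)
critical_t = 2.101  # two-tailed test at the 5% level with 18 d.f.
reject_h0 = abs(t_value) > critical_t

print(f"t = {t_value:.2f}, d.f. = {dof}, reject H0: {reject_h0}")
```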
10.5.2 DIFFERENCES BETWEEN TWO PROPORTIONS
In this study, a researcher examines the relationship between two samples that are
given in the form of proportions. The researcher tries to find out whether the two
proportions are significantly different from each other or not. The samples may be
drawn from the same or different populations. This study can also be used to
compare the proportions of a sample and a population.

The difference between the proportions of two samples belonging to two independent
groups can be tested if the population is normal, the sample size is large, and the
proportions of the samples are known. The researcher can use either a two-tailed test
or a one-tailed test on the basis of the nature of the research question. If the
researcher wants to compare the proportions of two samples drawn from two different
populations, then he/she would use the following test statistic:
$$z = \frac{p_1 - p_2}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}$$

Where,

p1 = Proportion of success of the first sample

p2 = Proportion of success of the second sample

q1 = Proportion of failure of the first sample

q2 = Proportion of failure of the second sample

n1 = Sample size of the first sample

n2 = Sample size of the second sample

Example 8: In a college, there are two streams: science and commerce. The college
management wants to find out whether there is a significant difference between
the proportions of average students (students who are neither toppers nor laggards
with respect to study) of the two streams. Therefore, the management conducts a
survey and finds out that 350 students out of 500 students of the science stream are
under the category of average students. In the case of the commerce stream, 300
students out of 600 students are under the category of average students. Use 5% as
the level of significance.

Solution: The null hypothesis and the alternative hypothesis are as follows:

H0: There is no difference between the proportions of average students of the
science and commerce streams in the college.

H1: There is a significant difference between the proportions of average students of
the science and commerce streams in the college.

Or,

H0: p1 = p2

H1: p1 ≠ p2

Where,

p1 = Proportion of success (average students) in the science stream

p2 = Proportion of success (average students) in the commerce stream

Proportion of success in the science stream, p1 = 350/500 = 0.7
Proportion of failure in the science stream, q1 = 1 – p1 = 1 – 0.7 = 0.3
Proportion of success in the commerce stream, p2 = 300/600 = 0.5
Proportion of failure in the commerce stream, q2 = 1 – p2 = 1 – 0.5 = 0.5
Sample size of the science stream, n1 = 500
Sample size of the commerce stream, n2 = 600

The test of significance used is:

\[
z = \frac{p_1 - p_2}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}
= \frac{0.7 - 0.5}{\sqrt{\dfrac{(0.7)(0.3)}{500} + \dfrac{(0.5)(0.5)}{600}}}
= \frac{0.2}{0.029} = 6.9
\]
The z-value for the 5% level of significance for two-tailed test is ± 1.96.
The graphical representation of the preceding solution is shown in Figure 10:
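[Figure 10: Rejection of the Calculated z-value in the Case of Two-Sample Proportions. The acceptance region lies between –1.96 and +1.96, with rejection regions on either side; the calculated value 6.9 falls in the right rejection region.]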

In Figure 10, it can be observed that the calculated z-value lies in the rejection region; therefore,
H0 is rejected. This implies that there is a significant difference between the proportions
of average students of the science and commerce streams in the college. In other words,
the difference between the proportions of the two samples is statistically significant.
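
The calculation in Example 8 can be cross-checked with a few lines of code. The following is a minimal sketch, assuming the counts used in the solution (350 of 500 and 300 of 600); the function name `two_proportion_z` is hypothetical and is introduced only for illustration.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for two sample proportions, using each sample's own p and q."""
    p1, p2 = x1 / n1, x2 / n2
    q1, q2 = 1 - p1, 1 - p2
    standard_error = sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    return (p1 - p2) / standard_error

# Counts taken from the solution of Example 8.
z = two_proportion_z(350, 500, 300, 600)

# Two-tailed critical value at the 5% level, as quoted in the text.
critical = 1.96
reject_h0 = abs(z) > critical
print(round(z, 2), reject_h0)
```

Because the code does not round the denominator to 0.029, it gives a value slightly above 6.9; the decision to reject H0 is unaffected.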

Example 9: In a sample of 700 engineering colleges from a state, littering by
first-year students was prevalent in 500 colleges. After a ban on littering in the
same state, it was found that 500 colleges out of a sample of 800 colleges were
involved in littering. Is the decrease in the proportion of colleges involved
in littering significant? Test the hypothesis at the 1% level of significance.

Solution: The null hypothesis and the alternative hypothesis are as follows:

H0: There is no difference between the proportions of engineering
colleges involved in littering before and after the ban on littering.

H1: There is a significant difference between the proportions of
engineering colleges involved in littering before and after the ban on littering.

Or,

H0: p1 = p2

H1: p1 ≠ p2

Where,

p1 = Proportion of success in sample one (before the ban)

p2 = Proportion of success in sample two (after the ban)

Proportion of success in sample one, p1 = 500/700 = 0.71

Proportion of failure in sample one, q1 = 1 – p1 = 1 – 0.71 = 0.29

Proportion of success in sample two, p2 = 500/800 = 0.625

Proportion of failure in sample two, q2 = 1 – p2 = 1 – 0.625 = 0.375

Size of sample one, n1 = 700

Size of sample two, n2 = 800

The two samples are taken from the same population; therefore, you can calculate
the best estimate of the proportion, which is the common value of the proportion. The
best estimate of the proportion (p0) for the two samples of colleges involved in
littering can be calculated as follows:

\[
p_0 = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}
= \frac{(700 \times 0.71) + (800 \times 0.625)}{700 + 800} = 0.66
\]

\[
q_0 = 1 - p_0 = 1 - 0.66 = 0.34
\]

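Because both samples are assumed to share the common proportion p0, the standard
error is computed from p0 and q0. The test of significance used is as follows:

\[
z = \frac{p_1 - p_2}{\sqrt{\dfrac{p_0 q_0}{n_1} + \dfrac{p_0 q_0}{n_2}}}
= \frac{0.71 - 0.625}{\sqrt{\dfrac{0.66 \times 0.34}{700} + \dfrac{0.66 \times 0.34}{800}}}
= \frac{0.085}{0.024} = 3.54
\]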

The z-value for the 1% level of significance for two-tailed test is ± 2.58. The graphical
representation of the preceding solution is shown in Figure 11:

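[Figure 11: The z-value Calculated with the Help of Best Estimate of Proportion. The acceptance region lies between –2.58 and +2.58, with rejection regions on either side; the calculated value 3.54 falls in the right rejection region.]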

In Figure 11, it can be observed that the calculated z-value lies in the rejection region;
therefore, H0 is rejected. This implies that there is a significant difference between
the proportions of engineering colleges involved in littering before and after the ban.
In other words, the difference between the proportions of the two samples
is statistically significant.
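
The pooled calculation of Example 9 can be sketched in code as follows. This is an illustration under stated assumptions, not the book's own procedure: the function name `pooled_two_proportion_z` is hypothetical, and the code works with unrounded proportions, so its result differs somewhat from the hand calculation.

```python
from math import sqrt

def pooled_two_proportion_z(x1, n1, x2, n2):
    """Return (p0, z), where p0 is the best (pooled) estimate of the proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p0 = (n1 * p1 + n2 * p2) / (n1 + n2)   # common value of the proportion
    q0 = 1 - p0
    standard_error = sqrt(p0 * q0 / n1 + p0 * q0 / n2)
    return p0, (p1 - p2) / standard_error

# Counts from Example 9: 500 of 700 colleges before the ban, 500 of 800 after.
p0, z = pooled_two_proportion_z(500, 700, 500, 800)

critical = 2.58  # two-tailed critical value at the 1% level, as quoted in the text
reject_h0 = abs(z) > critical
print(round(p0, 2), round(z, 2), reject_h0)
```

With unrounded inputs the statistic comes out at about 3.66 rather than 3.54, because the hand calculation rounds p1, p0 and the denominator along the way. Both values exceed 2.58, so the conclusion is the same.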
10.5.3 COMPARING TWO RELATED SAMPLES
In this test, the researcher takes two related (paired) samples, for example,
observations on the same units before and after an event, and tests whether there is
a statistically significant difference between the means of the two groups. This type
of study is done to find out the impact of certain policies on an entity, such as the
impact of introducing new human resource policies in an organisation. To study the
impact of a change, data is collected before and after the event. The difference
between the paired observations of the two samples (datasets) is calculated to test
whether the change has had a positive or negative impact.

If a researcher wants to compare two related samples, then he/she can use the
following test statistic:
\[
t = \frac{\bar{D}}{\mathrm{SD}/\sqrt{n}}
\]

Where,

D̄ = Mean difference between the two samples

SD = Standard deviation of the differences between the two samples

n = Sample size

SD can be calculated by using the following formula:

\[
\mathrm{SD} = \sqrt{\frac{\sum D^2 - \dfrac{\left(\sum D\right)^2}{n}}{n - 1}}
\]
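
The two formulas above can be combined into a short function. The following is a minimal sketch; the function name `paired_t` and the before/after scores are hypothetical values invented purely for illustration and do not come from the text.

```python
from math import sqrt

def paired_t(before, after):
    """t statistic for two related samples, computed from the paired differences D."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # SD of the differences: sqrt((sum D^2 - (sum D)^2 / n) / (n - 1))
    sd = sqrt((sum(d * d for d in diffs) - sum(diffs) ** 2 / n) / (n - 1))
    return mean_diff / (sd / sqrt(n))

# Hypothetical scores for four units measured before and after a change.
before = [10, 12, 9, 11]
after = [12, 14, 10, 13]
t_value = paired_t(before, after)
print(t_value)
```

The resulting statistic would then be compared with the tabulated t-value for n – 1 degrees of freedom at the chosen level of significance.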

10.5.4 STUDY OF EQUALITY OF VARIANCES OF TWO POPULATIONS


In this test, a researcher takes two samples from two populations and checks
whether the two populations differ significantly by comparing their variances.
The sample variances are known to the researcher. The researcher
uses the F-test to study the equality of variances of the two populations. If the
researcher wants to compare the variances of two different populations, then he/she
would use the following test statistic:
\[
F = \frac{s_1^2}{s_2^2}
\]

Where, \( s_1^2 \) is the larger of the two variances.

The variances of the two samples can be calculated using the following formulas:

\[
s_1^2 = \text{Variance of the first sample} = \frac{\sum_{i=1}^{n_1} \left(X_{1i} - \bar{X}_1\right)^2}{n_1 - 1}
\]

\[
s_2^2 = \text{Variance of the second sample} = \frac{\sum_{i=1}^{n_2} \left(X_{2i} - \bar{X}_2\right)^2}{n_2 - 1}
\]

Where,
X1i = Value of an observation of the first sample
X2i = Value of an observation of the second sample
X̄1 = Mean of the first sample
X̄2 = Mean of the second sample
n1 = Sample size of the first sample
n2 = Sample size of the second sample
Degrees of freedom for the first sample, v1 = n1 – 1
Degrees of freedom for the second sample, v2 = n2 – 1

Note: The F-value is calculated by dividing the larger variance by the smaller variance.

Let us learn to calculate the equality of variances from two different populations
with the help of an example.


Example 10: A researcher studied two samples of a type of wheat produced in
the north region and the south region of a state. He took two samples of wheat:
type A (north region) and type B (south region). The sample size of type A wheat
is 10 cities and the sample size of type B wheat is 13 cities. The variances of the two
samples with respect to gluten content are 5 and 4, respectively. The researcher
wants to find out whether the two populations have the same variance. Test this at
the 5% significance level.
Solution: The null hypothesis and the alternative hypothesis are as follows:
H0: The variances of the two populations are the same.
H1: The variances of the two populations are different.
Or,
\[
H_0 : \sigma_1^2 = \sigma_2^2 \qquad H_1 : \sigma_1^2 \neq \sigma_2^2
\]

Where,

\( \sigma_1^2 \) = Variance of the population from which sample A is drawn

\( \sigma_2^2 \) = Variance of the population from which sample B is drawn

We are given the sample variances \( s_1^2 = 5 \) and \( s_2^2 = 4 \).

Therefore, the test of significance used is:

\[
F = \frac{s_1^2}{s_2^2} = \frac{5}{4} = 1.25
\]
Degrees of freedom for sample A = n1 – 1 = 10 – 1 = 9
Degrees of freedom for sample B = n2 – 1 = 13 – 1 = 12
The variance of sample A is the larger of the two and forms the numerator of F;
therefore, v1 = 9 and v2 = 12.
In this case, the critical F-values for the two-tailed test are:
\[
F_{\alpha/2} = F_{(0.025,\,9,\,12)} = 3.44
\]
\[
F_{1-\alpha/2} = F_{(0.975,\,9,\,12)} = \frac{1}{F_{(0.025,\,12,\,9)}} = \frac{1}{3.87} = 0.258
\]

The graphical representation of the preceding solution is shown in Figure 12:

[Figure 12: The Position of Calculated F-value. H0 is accepted between F1–α/2 = 0.258 and Fα/2 = 3.44 and rejected outside these limits; the calculated value 1.25 lies in the acceptance region.]

In Figure 12, it can be observed that the calculated F-value lies in the acceptance
region; therefore, H0 is not rejected. This implies that there is
no significant difference between the variances in gluten content of the two
populations. In other words, the difference between the sample variances is
statistically insignificant, and the data are consistent with equal population variances.
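
The variance-ratio step of Example 10 can be sketched as follows. The function name `variance_ratio_f` is hypothetical; the sketch assumes the usual convention that the numerator degrees of freedom belong to the sample with the larger variance. Critical values are not computed here and would still be read from an F-table.

```python
def variance_ratio_f(var_a, n_a, var_b, n_b):
    """Return (F, v1, v2) with the larger sample variance in the numerator."""
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1

def in_acceptance_region(f_value, lower, upper):
    """Two-tailed decision: True when lower < F < upper (tabulated limits)."""
    return lower < f_value < upper

# Example 10: sample A has variance 5 (n = 10), sample B has variance 4 (n = 13).
f_value, v1, v2 = variance_ratio_f(5, 10, 4, 13)
print(f_value, v1, v2)
```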

Self Assessment Questions

8. In the study of differences between two samples, researchers try to find
out the relationship between two samples from different populations in
terms of their ___________.
9. A researcher takes two samples from the same population before and
after a change and compares them to find the impact of the change. What
test statistic will he use?
a. \( z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{(s_1^2/n_1) + (s_2^2/n_2)}} \)
b. \( t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{(s_1^2/n_1) + (s_2^2/n_2)}} \)
c. \( z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{(1/n_1) + (1/n_2)}} \)
d. \( t = \dfrac{\bar{D}}{\mathrm{SD}/\sqrt{n}} \)
10. In the given formula t = D̄/(SD/√n), what does D̄ stand for?
a. Mean difference between two samples
b. Standard deviation of sample
c. Sample size
d. Sample density

10.6 EXPLORING ANOVA


ANOVA is used to study and explain the amount of variation in more than two
samples or data sets. In a data set, two main types of variations can occur. One
type of variation occurs due to chance, while the other type of variation occurs due
to specific reasons. These variations are studied separately in ANOVA to identify
the actual cause of variation and help the researcher take effective decisions. There
are two main types of ANOVA. Let us learn about these in detail.

10.6.1 ONE-WAY ANOVA


One-way Analysis of Variance (ANOVA) is used to test whether the means of two
or more independent (unrelated) groups are statistically significantly different.
The assumptions of one-way ANOVA are as follows:
• Populations from which the samples are obtained are normally distributed.
• Samples are independent.
• Variances of the populations are equal.
Here, H0: All the population means are equal
and H1: At least one population mean is different
An ANOVA table, which summarises the sources of variation, is created in this test. This is shown in Table 6:

Table 6: Showing the General Table of ANOVA

| Source of Variation | Sum of Squares (SS) | Degree of Freedom (d.f.) | Mean of Square (MS) | F-ratio |
|---|---|---|---|---|
| Between Sample (Factor/Treatment) | \( SSB = \sum_{i=1}^{k} n_i \left(\bar{X}_i - \bar{X}\right)^2 \) | (k – 1) | SS between/(k – 1) | MS between/MS within |
| Within Sample (Error) | \( SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(X_{ij} - \bar{X}_i\right)^2 \) | (n – k) | SS within/(n – k) | |
| Total | \( SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(X_{ij} - \bar{X}\right)^2 \) | (n – 1) | | |

Where,
Xij = individual observation,
X̄i = sample mean of the ith treatment (or group),
ni = number of observations in the ith treatment (or group),
X̄ = overall sample mean,
k = the number of treatments or independent comparison groups
and
n = total number of observations or total sample size

The process of carrying out one-way ANOVA is as follows:

1. Calculate the mean of each sample using the following formula:
\[
\bar{X}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} X_{ij}
\]
Means of samples are termed as X̄1, X̄2, X̄3, …
2. Calculate the mean of all sample means (grand mean) with the help of the
following formula:
\[
\bar{X} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij}}{n}
\]
Here, n is the total number of observations. When all k samples have the same
size, this value equals the mean of the sample means.
3. Calculate the variation between samples, known as SS between (SSB), with the
help of the following formula:
\[
SSB = \sum_{i=1}^{k} n_i \left(\bar{X}_i - \bar{X}\right)^2
\]
SS between is the sum of the squared deviations of the sample means from the
grand mean, weighted by the sample sizes. It measures the variation between samples.
4. Divide SS between by its d.f., k – 1, to get the mean of square between (MS between).
MS between is the mean variation between samples. The following formula
is used to calculate MS between:
MS between = SS between/(k – 1)
5. Calculate the variation within samples, known as SS within (SSE), with the help
of the following formula:
\[
\text{SS within} = \sum \left(X_{1i} - \bar{X}_1\right)^2 + \sum \left(X_{2i} - \bar{X}_2\right)^2 + \sum \left(X_{3i} - \bar{X}_3\right)^2 + \dots
\]
Where, X1i, X2i, X3i, … = observed values in a sample
X̄1, X̄2, X̄3, … = means of the corresponding samples
SS within is the sum of the squared deviations of the observed values from the
corresponding sample means. It measures the variation within samples.
Note: In the given text, one-way ANOVA has been explained
for cases where all the groups have the same sample size.
However, the researcher must assess the impact of using unequal sample sizes,
as it can affect the homogeneity of variance that is assumed initially in
ANOVA.

6. Divide SS within by its d.f., n – k, to get the mean of square within (MS within).
MS within is the mean variation within samples. The following
formula is used to calculate MS within:
MS within = SS within/(n – k)
Where, n = total of the sample sizes of all the samples, that is, n1 + n2 + …
7. Add the squared deviations of all observations from the grand mean to get the
total variation in the samples. The following formula is used to calculate the
total variation:
\[
\text{Total variation} = SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(X_{ij} - \bar{X}\right)^2
\]
To calculate total SS, the grand mean is first subtracted from each individual
observation. After that, these deviations are squared and summed up to obtain
the result. The d.f. used in this case is n – 1.


8. Calculate the F-ratio with the help of the following formula:
F-ratio = MS between/MS within
The calculated value of the F-ratio is tested against the tabulated value of the F-ratio
(determined at a specified level of significance). If the calculated F-ratio lies
within the acceptance region, that is, it does not exceed the tabulated value,
the null hypothesis is accepted and the alternate hypothesis is rejected.
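
The eight steps above can be sketched as a single function. This is a minimal illustration, not a substitute for a statistics package: the function name `one_way_anova` and the three small samples are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def one_way_anova(groups):
    """Follow steps 1 to 8 and return the sums of squares, mean squares and F-ratio."""
    k = len(groups)                                   # number of samples
    n = sum(len(g) for g in groups)                   # total number of observations
    means = [sum(g) / len(g) for g in groups]         # step 1: sample means
    grand_mean = sum(sum(g) for g in groups) / n      # step 2: grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2   # step 3: SS between
                     for g, m in zip(groups, means))
    ms_between = ss_between / (k - 1)                 # step 4: MS between
    ss_within = sum((x - m) ** 2                      # step 5: SS within
                    for g, m in zip(groups, means) for x in g)
    ms_within = ss_within / (n - k)                   # step 6: MS within
    ss_total = sum((x - grand_mean) ** 2              # step 7: total SS
                   for g in groups for x in g)
    f_ratio = ms_between / ms_within                  # step 8: F-ratio
    return {"ss_between": ss_between, "ss_within": ss_within,
            "ss_total": ss_total, "ms_between": ms_between,
            "ms_within": ms_within, "f_ratio": f_ratio}

# Hypothetical data: three samples of three observations each.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```

For these made-up samples, SS between and SS within add up to the total SS, which is a useful check on any hand calculation.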
Let us understand the application of one-way ANOVA with the help of the
following example.

Example 11: A researcher observed the sale of a product of a particular brand
in six big retail houses in each of three cities. He/She wants to determine whether the
mean sale is the same across the cities. Use the data shown in Table 7 to carry out
one-way ANOVA:

Table 7: Sales Data of the Product in City A, B and C

| Retail Houses | City A (in Lakhs) | City B (in Lakhs) | City C (in Lakhs) |
|---|---|---|---|
| 1 | 3 | 6 | 9 |
| 2 | 8 | 9 | 8 |
| 3 | 4 | 8 | 6 |
| 4 | 9 | 5 | 7 |
| 5 | 6 | 7 | 5 |
| 6 | 7 | 4 | 7 |

Solution: The null hypothesis and the alternate hypothesis are as follows:

H0: The mean sale in the three cities is the same
H1: The mean sale of at least one city is different from that of the other two cities
First, calculate the mean sale of the three cities separately, as follows:

\[
\text{Mean for City A } (\bar{X}_1) = \frac{3 + 8 + 4 + 9 + 6 + 7}{6} = 6.17
\]
\[
\text{Mean for City B } (\bar{X}_2) = \frac{6 + 9 + 8 + 5 + 7 + 4}{6} = 6.5
\]
\[
\text{Mean for City C } (\bar{X}_3) = \frac{9 + 8 + 6 + 7 + 5 + 7}{6} = 7
\]
\[
\text{Mean of the samples } (\bar{X}) = \frac{6.17 + 6.5 + 7}{3} = 6.6
\]
\[
\begin{aligned}
\text{SS between} &= n_1(\bar{X}_1 - \bar{X})^2 + n_2(\bar{X}_2 - \bar{X})^2 + n_3(\bar{X}_3 - \bar{X})^2 \\
&= 6(6.17 - 6.6)^2 + 6(6.5 - 6.6)^2 + 6(7 - 6.6)^2 \\
&= 1.11 + 0.06 + 0.96 \\
&\approx 2.1
\end{aligned}
\]

\[
\begin{aligned}
\text{SS within} &= \sum (X_{1i} - \bar{X}_1)^2 + \sum (X_{2i} - \bar{X}_2)^2 + \sum (X_{3i} - \bar{X}_3)^2 \\
&= (3 - 6.17)^2 + (8 - 6.17)^2 + (4 - 6.17)^2 + (9 - 6.17)^2 + (6 - 6.17)^2 + (7 - 6.17)^2 \\
&\quad + (6 - 6.5)^2 + (9 - 6.5)^2 + (8 - 6.5)^2 + (5 - 6.5)^2 + (7 - 6.5)^2 + (4 - 6.5)^2 \\
&\quad + (9 - 7)^2 + (8 - 7)^2 + (6 - 7)^2 + (7 - 7)^2 + (5 - 7)^2 + (7 - 7)^2 \\
&= 10.05 + 3.35 + 4.71 + 8.01 + 0.03 + 0.7 + 0.25 + 6.25 + 2.25 + 2.25 + 0.25 + 6.25 \\
&\quad + 4 + 1 + 1 + 0 + 4 + 0 \\
&= 54.34
\end{aligned}
\]
\[
\begin{aligned}
\text{Total SS} &= (3 - 6.6)^2 + (8 - 6.6)^2 + (4 - 6.6)^2 + (9 - 6.6)^2 + (6 - 6.6)^2 + (7 - 6.6)^2 \\
&\quad + (6 - 6.6)^2 + (9 - 6.6)^2 + (8 - 6.6)^2 + (5 - 6.6)^2 + (7 - 6.6)^2 + (4 - 6.6)^2 \\
&\quad + (9 - 6.6)^2 + (8 - 6.6)^2 + (6 - 6.6)^2 + (7 - 6.6)^2 + (5 - 6.6)^2 + (7 - 6.6)^2 \\
&= 56.48
\end{aligned}
\]
The ANOVA table created after completing the preceding calculation is shown in Table 8:

Table 8: Showing the Calculation of ANOVA

| Source of Variation | SS | d.f. | MS | F-ratio | 5% F limit |
|---|---|---|---|---|---|
| Between Sample | 2.1 | (3 – 1) = 2 | 2.1/2 = 1.05 | 1.05/3.62 = 0.29 | 3.68 |
| Within Sample | 54.34 | (18 – 3) = 15 | 54.34/15 = 3.62 | | |
| Total | 56.48 | (18 – 1) = 17 | | | |

You can check the F-table for significance with the help of one-tailed test. The
graphical representation of the preceding solution is shown in Figure 13:
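[Figure 13: Graph Showing the Position of the Calculated F-Value. The acceptance region lies below the tabulated value 3.68 and the rejection region above it; the calculated value 0.29 lies in the acceptance region.]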

Figure 13 shows that the calculated F-value lies in the acceptance region; therefore,
H0 is not rejected. This implies that the mean product sale does not differ
significantly across the three cities.

You can also use another method of ANOVA, which is
performed with the help of a correction factor. It is also termed the shortcut
method. It is more convenient in the case of non-integer values. The steps involved in
this method are mentioned below:
1. Calculate the correction factor with the help of the following formula:
\[
\text{Correction factor} = \frac{T^2}{n}
\]
Where, T = summation of all the observed values in the samples
n = total number of observations
2. Compute SS between by first taking the sum of the observed values in each
sample. Thereafter, square each sum and divide it by the size of the respective
sample. Then, add the resultant values and subtract the correction factor from
this total to obtain the variation between samples. The following formula is
used to calculate the variation:
\[
\text{SS between} = \sum_j \frac{T_j^2}{n_j} - \frac{T^2}{n}
\]
Where, Tj = sum of the observed values of a sample = T1, T2, …
nj = sample size of a sample = n1, n2, …
n = sum of the sample sizes of the different samples
3. Divide SS between by its d.f., k – 1, to get MS between. The following formula
is used to calculate MS between:
MS between = SS between/(k – 1)
4. Calculate and add the squares of all individual values in the samples. From this
sum of squares, subtract the total of (Tj)²/nj obtained in step 2. The
value obtained is termed SS within or the variation within the samples. The
following formula is used to calculate SS within:
\[
\text{SS within} = \sum X_{ij}^2 - \sum_j \frac{T_j^2}{n_j}
\]
Where, Xij² = square of an individual value in the samples
5. Divide SS within by its d.f., n – k, to get MS within. The following formula is
used to calculate MS within:
MS within = SS within/(n – k)
Where, n = total of the sample sizes of all the samples, that is, n1 + n2 + …
6. Calculate the total variation by taking the sum of the squares of all individual values
in the samples and then subtracting the correction factor from it. The following
formula is used to calculate the variation:
\[
\text{Total SS} = \sum X_{ij}^2 - \frac{T^2}{n}
\]
7. Calculate the F-ratio with the help of the following formula:
F-ratio = MS between/MS within
The calculated value of the F-ratio is tested against the tabulated F-value that
is determined at a specified level of significance. If the calculated value
of the F-ratio lies within the acceptance region, the null hypothesis is
accepted and the alternate hypothesis is rejected.
Let us learn the application of one-way ANOVA with the help of correction factor
using Example 12.

Example 12: Using the sales data of Table 7, first calculate the correction factor
and then the various components of the ANOVA table.

The correction factor can be calculated as follows:

\[
\text{Correction factor} = \frac{T^2}{n}
\]

Where, T = summation of all the observed values in the three cities collectively

n = sum of the sample sizes of the different samples

\[
\text{Correction factor} = \frac{(118)^2}{18} = 773.6
\]

\[
\begin{aligned}
\text{SS between} &= \sum_j \frac{T_j^2}{n_j} - \frac{T^2}{n} \\
&= \frac{37 \times 37}{6} + \frac{39 \times 39}{6} + \frac{42 \times 42}{6} - 773.6 \\
&= 228.17 + 253.5 + 294 - 773.6 \\
&\approx 2.1
\end{aligned}
\]

\[
\begin{aligned}
\text{SS within} &= \sum X_{ij}^2 - \sum_j \frac{T_j^2}{n_j} \\
&= 3^2 + 8^2 + 4^2 + 9^2 + 6^2 + 7^2 + 6^2 + 9^2 + 8^2 + 5^2 + 7^2 + 4^2 \\
&\quad + 9^2 + 8^2 + 6^2 + 7^2 + 5^2 + 7^2 - 775.67 \\
&= 830 - 775.67 \\
&= 54.33
\end{aligned}
\]

\[
\text{Total SS} = 830 - 773.6 = 56.4
\]

Apart from small rounding differences, the values of total SS, SS between and SS within
are the same with both methods of calculating ANOVA. Therefore, the ANOVA table would
also be the same.
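
The shortcut method can be checked against the Table 7 sales data with a short script. The following is a minimal sketch; the function name `anova_shortcut` is hypothetical, and the code keeps full precision, so its results differ from the hand calculation in the second decimal place.

```python
def anova_shortcut(groups):
    """One-way ANOVA by the correction-factor (shortcut) method."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    correction = sum(values) ** 2 / n                           # T^2 / n
    sum_tj = sum(sum(g) ** 2 / len(g) for g in groups)          # sum of Tj^2 / nj
    sum_squares = sum(x * x for x in values)                    # sum of Xij^2
    ss_between = sum_tj - correction
    ss_within = sum_squares - sum_tj
    ss_total = sum_squares - correction
    f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))
    return correction, ss_between, ss_within, ss_total, f_ratio

# Sales (in lakhs) of the six retail houses in each city, from Table 7.
city_a = [3, 8, 4, 9, 6, 7]
city_b = [6, 9, 8, 5, 7, 4]
city_c = [9, 8, 6, 7, 5, 7]

correction, ss_between, ss_within, ss_total, f_ratio = anova_shortcut(
    [city_a, city_b, city_c])
print(round(correction, 2), round(ss_between, 2), round(ss_within, 2),
      round(ss_total, 2), round(f_ratio, 2))
```

The F-ratio stays well below the tabulated value of 3.68, in line with the conclusion of Example 11.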

10.6.2 TWO-WAY ANOVA


Two-way ANOVA is used when a researcher wants to test the differences between
groups that have been split on the basis of two attributes or independent variables
or factors. The steps involved in performing two-way ANOVA are as follows:
1. Calculate the correction factor with the help of the following formula:
\[
\text{Correction factor} = \frac{T^2}{n}
\]
Where, T = summation of all the observed values in the samples
n = total number of observations
2. Compute SS between rows. To do so, first take the sum of the observed values
in each row. Thereafter, square each sum and divide it by the number of
observations in the respective row. Then, add the resultant values and subtract
the correction factor from this total to obtain the variation between rows. The
following formula is used to calculate SS between rows:
\[
\text{SS between rows} = \sum_j \frac{T_j^2}{n_j} - \frac{T^2}{n}
\]
Where, Tj = sum of the observed values of a row = T1, T2, …
nj = sample size of a row = n1, n2, …
n = sum of the sample sizes of the different samples

Note: In two-way ANOVA, there are three possible null hypotheses. They are:
1. There is no difference in the means of the first factor
2. There is no difference in the means of the second factor
3. There is no interaction between the first and second factors
For null hypotheses 1 and 2, the alternative hypothesis is: The means of the
respective factor are not all equal.
For null hypothesis 3, the alternative hypothesis is: There is an interaction
between the first factor and the second factor.

3. Divide SS between rows by its d.f., r – 1, to get MS between rows, which is the
mean variation between rows. The following formula is used to calculate MS
between rows:
\[
\text{MS between rows} = \frac{\text{SS between rows}}{r - 1}
\]
Where, r = number of rows
4. Calculate SS between columns. To do so, first take the sum of the observed
values in each column. Thereafter, square each sum and divide it by the number
of observations in the respective column. Then, add the resultant values and
subtract the correction factor from this total to obtain the variation between
columns. The following formula is used to calculate SS between columns:
\[
\text{SS between columns} = \sum_j \frac{T_j^2}{n_j} - \frac{T^2}{n}
\]
Where, Tj = sum of the observed values of a column = T1, T2, …
nj = sample size of a column = n1, n2, …
5. Divide SS between columns by its d.f., c – 1, to get MS between columns, which
is the mean variation between columns. The following formula is used to
calculate MS between columns:
\[
\text{MS between columns} = \frac{\text{SS between columns}}{c - 1}
\]
Where, c = number of columns

6. Calculate the total variation by first taking the sum of the squares of all individual
values in the samples. After that, subtract the correction factor from this sum
of squares. The following formula is used to calculate the variation:
\[
\text{Total SS} = \sum X_{ij}^2 - \frac{T^2}{n}
\]
7. Compute the residual variation by first adding SS between columns and SS
between rows, and then subtracting this sum from Total SS. The following
formula is used to calculate the residual variation:
Residual variation = Total SS – (SS between columns + SS between rows)
Dividing the residual variation by its d.f., (c – 1) × (r – 1), gives MS residual.
8. Calculate the F-ratio for the rows and for the columns with the help of the
following formulas:
\[
\text{F-ratio (rows)} = \frac{\text{MS between rows}}{\text{MS residual}}
\qquad
\text{F-ratio (columns)} = \frac{\text{MS between columns}}{\text{MS residual}}
\]

The calculated value of the F-ratio is tested against the tabulated F-value that is
determined at a specified level of significance. If the calculated value of the F-ratio
lies within the acceptance region, the null hypothesis is accepted and the
alternate hypothesis is rejected.

Let us understand the application of two-way ANOVA with the help of an example.

Example 13: Three respondents have rated three small cars of different brands on
a five-point scale (5 being the highest) with respect to their features. The ratings
and features are provided in Table 9:
M
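Table 9: Rating Given by Customers to Different Brands of Car with Respect to their Features

| Respondent | Car | Mileage | Durability | Maintenance Cost | Technology | Price |
|---|---|---|---|---|---|---|
| 1 | Zen | 3 | 2 | 4 | 3 | 5 |
| 1 | i10 | 4 | 4 | 4 | 5 | 4 |
| 1 | Alto | 4 | 3 | 5 | 2 | 4 |
| 2 | Zen | 2 | 4 | 3 | 1 | 4 |
| 2 | i10 | 4 | 5 | 3 | 4 | 4 |
| 2 | Alto | 3 | 1 | 2 | 5 | 3 |
| 3 | Zen | 4 | 5 | 3 | 2 | 4 |
| 3 | i10 | 3 | 2 | 4 | 5 | 3 |
| 3 | Alto | 4 | 5 | 4 | 5 | 5 |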

The researcher wants to know the difference between the brands in terms of features.

Solution: The null hypothesis and the alternate hypothesis are as follows:

H0: There is no difference in the means of the five features of the cars

H1: The means of the five features are not equal.

\[
\text{Correction factor} = \frac{T^2}{n} = \frac{162 \times 162}{45} = 583.2
\]

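SS between columns (i.e., between variables), where each column total is based on 9 ratings:

\[
\begin{aligned}
&= \frac{31 \times 31}{9} + \frac{31 \times 31}{9} + \frac{32 \times 32}{9} + \frac{32 \times 32}{9} + \frac{36 \times 36}{9} - 583.2 \\
&= 106.8 + 106.8 + 113.8 + 113.8 + 144 - 583.2 \\
&= 585.2 - 583.2 \\
&= 2
\end{aligned}
\]

SS between rows, where 56, 48 and 58 are the totals of the 15 ratings in the blocks of
respondents 1, 2 and 3 in Table 9:

\[
\begin{aligned}
&= \frac{56 \times 56}{15} + \frac{48 \times 48}{15} + \frac{58 \times 58}{15} - 583.2 \\
&= 209.1 + 153.6 + 224.3 - 583.2 \\
&= 587 - 583.2 \\
&= 3.8
\end{aligned}
\]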

\[
\begin{aligned}
\text{Total SS} &= 3^2 + 4^2 + 4^2 + 2^2 + 4^2 + 3^2 + 4^2 + 4^2 + 5^2 + 3^2 + 5^2 + 2^2 + 5^2 \\
&\quad + 4^2 + 4^2 + 2^2 + 4^2 + 3^2 + 4^2 + 5^2 + 1^2 + 3^2 + 3^2 + 2^2 + 1^2 + 4^2 + 5^2 + 4^2 \\
&\quad + 4^2 + 3^2 + 4^2 + 3^2 + 4^2 + 5^2 + 2^2 + 5^2 + 3^2 + 4^2 + 4^2 + 2^2 + 5^2 + 5^2 + 4^2 \\
&\quad + 3^2 + 5^2 - 583.2 \\
&= 638 - 583.2 \\
&= 54.8
\end{aligned}
\]

\[
\begin{aligned}
\text{SS residual} &= \text{Total SS} - (\text{SS between columns} + \text{SS between rows}) \\
&= 54.8 - (2 + 3.8) \\
&= 49
\end{aligned}
\]

The ANOVA table created after the preceding calculation is shown in Table 10:

Table 10: Calculation of ANOVA for the Three Brands of Car

| Source of Variation | SS | d.f. | MS | F-ratio | 5% F limit |
|---|---|---|---|---|---|
| Between columns | 2 | (5 – 1) = 4 | 2/4 = 0.5 | 0.5/6.125 = 0.08 | F(4,8) = 3.84 |
| Between rows | 3.8 | (3 – 1) = 2 | 3.8/2 = 1.9 | 1.9/6.125 = 0.31 | F(2,8) = 4.46 |
| Residual | 49 | (5 – 1) × (3 – 1) = 8 | 49/8 = 6.125 | | |
| Total | 54.8 | (45 – 1) = 44 | | | |

You can check the F-value for significance with the help of a one-tailed test. The
graphical representation of the preceding solution for the F-value at v1 = 4 and v2 = 8 is
shown in Figure 14:

[Figure 14: Rejecting the Calculated F-Value. The acceptance region lies below the tabulated value 3.84 and the rejection region above it; the calculated value 0.08 lies in the acceptance region.]

The graphical representation of the preceding solution for the F-value at v1 = 2 and v2 = 8
is shown in Figure 15:

[Figure 15: Accepting the Calculated F-Value. The acceptance region lies below the tabulated value 4.46 and the rejection region above it; the calculated value 0.31 lies in the acceptance region.]

Figure 14 and Figure 15 show that both calculated F-values lie in the acceptance
region; therefore, we do not reject H0. This implies that the mean ratings do not
differ significantly, either across the five features or across the rows of the table.
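
The sums of squares in Example 13 can be recomputed directly from Table 9. The following is a minimal sketch that covers only the correction factor and the sums of squares; the variable names are hypothetical. The row totals 56, 48 and 58 used in the solution correspond to the three respondent blocks of Table 9, so the code groups the ratings by respondent.

```python
# Ratings from Table 9, keyed by (respondent, car). Each list holds the scores for
# mileage, durability, maintenance cost, technology and price, in that order.
ratings = {
    (1, "Zen"): [3, 2, 4, 3, 5],
    (1, "i10"): [4, 4, 4, 5, 4],
    (1, "Alto"): [4, 3, 5, 2, 4],
    (2, "Zen"): [2, 4, 3, 1, 4],
    (2, "i10"): [4, 5, 3, 4, 4],
    (2, "Alto"): [3, 1, 2, 5, 3],
    (3, "Zen"): [4, 5, 3, 2, 4],
    (3, "i10"): [3, 2, 4, 5, 3],
    (3, "Alto"): [4, 5, 4, 5, 5],
}

values = [x for row in ratings.values() for x in row]
n = len(values)
correction = sum(values) ** 2 / n                      # T^2 / n

# Column (feature) totals; each is based on len(ratings) = 9 ratings.
column_totals = [sum(col) for col in zip(*ratings.values())]
ss_columns = sum(t ** 2 / len(ratings) for t in column_totals) - correction

# Block totals per respondent; each is based on 15 ratings.
block_totals = {}
for (respondent, _car), row in ratings.items():
    block_totals[respondent] = block_totals.get(respondent, 0) + sum(row)
ratings_per_block = n // len(block_totals)
ss_rows = sum(t ** 2 / ratings_per_block for t in block_totals.values()) - correction

ss_total = sum(x ** 2 for x in values) - correction
ss_residual = ss_total - (ss_columns + ss_rows)
print(column_totals, block_totals)
print(round(ss_columns, 2), round(ss_rows, 2), round(ss_total, 2), round(ss_residual, 2))
```

At full precision the sums of squares come out at about 1.91, 3.73, 54.8 and 49.16, which the hand calculation rounds to 2, 3.8, 54.8 and 49.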

Self Assessment Questions

11. _____________ is a parametric test that is used to study more than
two samples or data sets.
12. One-way ANOVA determines whether all samples have the same type of
variations. (True/False)
13. _____________ ANOVA is used when you need to determine the relation
between two attributes.

10.7 SUMMARY
• A hypothesis can be tested by using a large number of tests and these tests
are connected with each other in one way or another.
• In parametric tests, researchers make some assumptions about some
properties of the parent population from which samples are drawn. In
non-parametric tests, no such assumptions are made.
• The different types of parametric tests are z-test, t-test, Chi-square test and
F-test.
• In a one-sample test, you study the relationship between a sample and the
population.
• In a two-sample test, you study the relationship between two samples drawn
from two different populations or from the same population.
• ANOVA is used to study and explain more than two samples or data sets. It
helps in explaining the amount of variation in these data sets.

10.8 KEY WORDS


• Distribution pattern: A probability distribution pattern that is similar to
normal distribution and is used for testing hypothesis.
• F-test: A test that is used to compare the significant difference between the
variances of two samples under study.
• Non-parametric tests: The tests which do not make assumptions about the
parameters of the population from which a sample is derived.
• Parametric tests: The tests which make assumptions about the parameters of
the population from which a sample is derived.
• t-distribution: A type of probability distribution that is appropriate for
estimating the mean of a normally distributed population where the sample
size is small and population variance is unknown.
• t-test: A test that is used to study the means of samples having a sample size
below 30 and unknown population variance.
• z-test: A test that is used to study the means and proportion of samples
whose size is more than 30.

10.9 CASE STUDY: NATIONAL MOTORS INC.


National Motors Inc. is a manufacturer of motor scooters. As a part of its operating
policy, the executives want to determine whether the customers’ and dealers’
satisfaction depends on warranty cards or not. To test this, the company has
withdrawn its warranty cards from the market. The marketing research department
of the company develops a questionnaire in a summated scale form to collect data
about customer satisfaction with and without the warranty card. The department
mails the questionnaire to a random sample of customers after they have received
warranty cards. The same questionnaire is then sent to the same set of customers
after their warranty cards have expired. The company also sends the questionnaire
to dealers who have provided their customers with warranty cards.

The customers and dealers have provided marks out of 100 for their satisfaction
level. The data collected by the marketing research department for customer
satisfaction and dealer satisfaction is as follows:

Table: Data Collected for Customer and Dealer Satisfaction

| No. of Observations | Customers’ Satisfaction When They Have Warranty Cards | Customers’ Satisfaction When They Do not Have Warranty Cards | Dealers |
|---|---|---|---|
| 1 | 74 | 43 | 92 |
| 2 | 81 | 23 | 42 |
| 3 | 35 | 88 | 54 |
| 4 | 59 | 55 | 59 |
| 5 | 90 | 67 | 83 |
| 6 | 33 | 53 | 30 |
| 7 | 82 | 85 | 34 |
| 8 | 68 | 70 | 54 |
| 9 | 56 | 30 | 39 |
| 10 | 46 | 75 | 65 |

After conducting the research, the company comes to the conclusion that warranty
cards do not have much impact on the customers’ and dealers’ satisfaction. A reason
behind this can be that the same type of warranty is given by the competitors of
National Motors Inc.

QUESTIONS
1. Find out the effect of warranty cards on the satisfaction of customers with the
help of the data provided in the case study. Use 5% as the level of significance to
test the hypothesis.
(Hint: H0: The customer satisfaction before and after returning the warranty
card is the same.)
2. What should National Motors Inc. do to overcome this problem?
(Hint: The company can conduct a survey regarding the available warranty
cards in the entire motor scooters industry.)

10.10 EXERCISE
1. What are the two types of hypotheses tests?
2. Explain the different types of parametric tests.
3. Explore the following cases of one-sample tests:
a. Normal and infinite population, large sample size, known population
variance and two-tailed or one-tailed test.
b. Normal and infinite population, small sample size, unknown population
variance and two-tailed or one-tailed test.


4. Explain any two-sample tests along with examples.


5. Explain the concept of ANOVA in detail.

10.11 ANSWERS FOR SELF ASSESSMENT QUESTIONS


Topic                                     Q. No.   Answer
Types of Hypothesis Testing               1.       b. Parametric test
Parametric Tests                          2.       d. z-test
                                          3.       t-test
                                          4.       d. F-test
                                          5.       one observation, number of observations
One-Sample Tests - Different Situations   6.       means, variance, proportion
in Which One Sample Test is Used          7.       b. Population mean
Two-Sample Tests                          8.       means
                                          9.       d. t = D̄/(SD/√n)
                                          10.      a. Mean difference between two samples
Exploring ANOVA                           11.      ANOVA
                                          12.      True
                                          13.      Two-way

10.12 SUGGESTED BOOKS AND E-REFERENCES


SUGGESTED BOOKS
• Biddle, J., & Emmett, R. Research in the history of economic thought and
  methodology.
• National Academies Press. (2009). Partnerships for emerging research institutions.
  Washington, D.C.
• Panneerselvam, R. (2014). Research methodology. Delhi: PHI Learning.

E-REFERENCES
• (2018). Retrieved from http://www.ihmgwalior.net/pdf/research_methodology.pdf
• Alzheimer Europe - Research - Understanding dementia research - Types of
  research - The four main approaches. (2018). Retrieved from
  https://www.alzheimer-europe.org/Research/Understanding-dementia-research/Types-of-research/The-four-main-approaches
• Research Methodology - Introduction - Notes for Students. (2018). Retrieved
  from https://bbamantra.com/research-methodology/
• Research methods and methodology. (2018). Retrieved from
  http://www.emeraldgrouppublishing.com/research/guides/methods/
CHAPTER 11
NON-PARAMETRIC TESTS

Table of Contents
Learning Objectives
11.1 Introduction
11.2 Non-Parametric Tests
Self Assessment Questions
11.3 Sign Test
11.3.1 One Sample Sign Test
11.3.2 Two Sample Sign Test
11.3.3 Wilcoxon Matched Pairs Test/Signed Rank Test
Self Assessment Questions
11.4 Rank Correlation
Self Assessment Questions
11.5 Rank Sum Test
11.5.1 Mann-Whitney Test or U Test
11.5.2 Kruskal-Wallis Test
Self Assessment Questions
11.6 Chi-square Test
11.6.1 Chi-square Test for Goodness of Fit
11.6.2 Chi-square Test for Independence
Self Assessment Questions
11.7 Summary
11.8 Key Words
11.9 Case Study
11.10 Exercise
11.11 Answers for Self Assessment Questions
11.12 Suggested Books and e-References
LEARNING OBJECTIVES

After studying this chapter, you will be able to:

• Explain non-parametric tests
• Discuss the sign tests
• Explain rank correlation
• Describe the rank sum tests
• Elaborate on the Wilcoxon matched pairs test
• Explain the concept of the chi-square test

11.1 INTRODUCTION
In the previous chapter on Parametric Tests, you have learned about different
types of parametric tests used to check the validity of a hypothesis. You have also
studied that parametric tests can only be applied if you know population type and
population parameters, such as mean and variance. However, if this information
is unavailable, you cannot use parametric tests. In such a situation, you need non-
parametric tests to check the validity of a hypothesis and draw inferences.

Non-parametric tests are used when you do not have adequate information about
population type and parameters. These tests are widely used to study data given in
the form of ranks. Examples of non-parametric tests are sign tests, rank correlation,
rank sum test, Wilcoxon matched pairs and chi-square test. The selection of the
test depends on problem type, sample size and data. For example, rank correlation
is used to establish correlation between two ranked data sets. Researchers should
observe caution while selecting a non-parametric test to ensure accurate and
precise results.

This chapter covers non-parametric tests and their types. It provides information
about one and two sample sign tests. It also elaborates on rank correlation and
rank sum tests, including the Mann-Whitney and Kruskal-Wallis tests. In addition,
it explains the Wilcoxon matched pairs test/signed rank test. Finally, the chapter
sheds light on the chi-square test for goodness of fit and the chi-square test for
independence.

11.2 NON-PARAMETRIC TESTS


As discussed earlier, non-parametric tests are not based on assumptions
about a population and its parameters. A researcher can use these tests without
taking into consideration population distribution and sample type. Non-parametric
tests are also known as distribution-free tests because they do not assume that
the given data follows a specific distribution. These tests are mainly used when
the test model does not specify any stringent conditions regarding the parameters
of the population from which a sample is drawn.

Let us understand the reason behind choosing a non-parametric test over a
parametric test with the help of a simple example. Suppose a researcher wants
to find out the preference of customers for the different brands of toothpaste
available in the market. He/She would ask customers to rank different brands
according to their preferences. The data collected would be in the rank form, on
which parametric tests cannot be performed. This is because a parametric test
requires numeric measurements from which quantities such as mean and variance
can be calculated. Therefore, in this case, the researcher would use a
non-parametric test.


The different types of non-parametric tests are shown in Figure 1:

Figure 1: Non-Parametric Tests (Sign Test, Rank Correlation, Rank Sum Test,
Matched Pairs Test and Chi-Square Test)

Self Assessment Questions

1. Non-parametric tests are also known as ______________ tests.
2. A researcher can use non-parametric tests without taking into
   consideration population distribution and sample type. (True/False)

11.3 SIGN TEST


The sign test is considered one of the easiest non-parametric tests because it takes
into account only the plus and minus signs of observations in a sample. It does
not consider the magnitude of observations while analysing the data present in
a sample. The sign test can be used in place of some parametric tests, such as the
one-sample t-test and the paired t-test. It uses the binomial distribution to test the
validity of a hypothesis. There are two types of sign tests, which are shown in Figure 2:

Figure 2: Types of Sign Tests (One Sample Sign Test and Two Sample Sign Test)

Let us now discuss the types of sign tests in detail.

11.3.1 ONE SAMPLE SIGN TEST


The one sample sign test is applied to a sample when the researcher does not assume
that the data is normally distributed. Under the null hypothesis, a sample value is
equally likely to be less than or greater than the hypothesised median. This implies
that the proportions of success (p) and failure (q) are equal, that is, p = q = 0.50.
Therefore, the test is also called the binomial sign test. In the one sample sign test,
the researcher assigns positive (+) and negative (–) signs to the sample values to test
the hypothesis.

Here, the researcher usually tests the null hypothesis H0: M = M0 against an
appropriate alternate hypothesis, where M is the population median and M0 is its
hypothesised value.

Note: The sign test is a hypothesis test for the population median and not for the
population mean.

Here, three types of tests are possible as shown in Table 1:

Table 1: Three types of Sign Tests for Population Median


Null Hypothesis   Alternate Hypothesis   Type of Test
H0: M = M0        H1: M > M0             Right-tailed test
H0: M = M0        H1: M < M0             Left-tailed test
H0: M = M0        H1: M ≠ M0             Two-tailed test

In any given sign test, each data value or observation is converted into a plus sign
or minus sign. The allocation of + and – signs is done against the hypothesised
median value. Values that are greater than the median value are replaced by a plus
sign and values that are less than the median value are replaced by a minus
sign. Values that are equal to the median value are discarded. After assigning the
signs, the researcher tests the null hypothesis that the probability of getting a plus
sign and the probability of getting a minus sign are both 0.5.

The sign test can be performed by using two methods as follows:


1. When the sample size is small, the test is carried out by calculating the
   binomial probabilities using the binomial probabilities table.
2. When np ≥ 5 and nq ≥ 5, the normal distribution can be used as an
   approximation of the binomial distribution.

Note: When n is large and p is sufficiently large (i.e., p > 0.10), the normal
distribution is used as an approximation of the binomial distribution.

The mean and standard deviation of the normal distribution are given as follows:

Mean, µ = np

SD, σ = √(npq)

Let us understand the application of one sample sign test with the help of an
example.


Example 1: The scores of 15 students in a class test of 20 marks are as follows:

19, 17, 16, 18, 17, 19, 20, 16, 16, 18, 11, 13, 14, 09 and 13

Test the hypothesis that the median score of all the students is equal to 15 against
the hypothesis that the median score of all the students is greater than 15. Use 5%
level of significance.

Solution: Null and alternate hypotheses are as follows:

H0: Median score of all the students is 15

H1: Median score of all the students is greater than 15

OR

H0: p = 0.5

H1: p > 0.5

The researcher assigns a minus (–) sign to values less than 15 and a plus (+) sign to
values greater than 15.

Observation   19  17  16  18  17  19  20  16  16  18  11  13  14  09  13
Sign          +   +   +   +   +   +   +   +   +   +   –   –   –   –   –

The following result is obtained:

No. of + signs = 10

No. of – signs = 5
Number of observations = 15

It must be remembered that the test statistic is the larger of the number of + signs
and the number of – signs.

Now, we need to check whether the 10 plus signs observed in the given 15 trials
support the null hypothesis that p = 0.5 or the alternate hypothesis that p > 0.5.

Now, we use the binomial probability table to find the probability of 10 or more
successes as follows:

P(X ≥ 10 | n = 15, p = 0.5) = P(X = 10) + P(X = 11) + … + P(X = 15)
                            = 0.092 + 0.042 + 0.014 + 0.003 + 0.000 + 0.000
                            = 0.151

Since the one-tailed p-value (0.151) is greater than α = 0.05, the null hypothesis is accepted.
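
The table lookup can be cross-checked with a short calculation. The following is a minimal Python sketch, not part of the original solution; it assumes Python 3.8 or later (for `math.comb`), uses the 15 observations listed in the sign table of Example 1, and the function name `sign_test_upper_tail` is purely illustrative.

```python
from math import comb

def sign_test_upper_tail(values, median0):
    """Right-tailed one-sample sign test: P(X >= number of plus signs) when p = 0.5."""
    plus = sum(1 for v in values if v > median0)
    minus = sum(1 for v in values if v < median0)
    n = plus + minus  # values equal to the hypothesised median are discarded
    # Exact binomial tail probability with p = q = 0.5
    p_value = sum(comb(n, k) for k in range(plus, n + 1)) / 2 ** n
    return plus, minus, p_value

# Observations as listed in the sign table of Example 1
scores = [19, 17, 16, 18, 17, 19, 20, 16, 16, 18, 11, 13, 14, 9, 13]
plus, minus, p_value = sign_test_upper_tail(scores, 15)
print(plus, minus, round(p_value, 3))  # prints: 10 5 0.151
```

The exact tail probability agrees with the value of 0.151 read from the binomial table.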

Note that here, np = nq = 15 (0.5) = 7.5, which is greater than 5.

Therefore, we can also use the normal approximation to the binomial distribution.

Z-statistic is calculated as:

$$Z = \frac{X - np}{\sqrt{npq}} = \frac{10 - 7.5}{\sqrt{15/4}} = \frac{2.5}{1.93} = 1.295$$

The critical value of Z at 0.05 level of significance (one-tailed) is +1.645. Since
Z = 1.295 is less than 1.645, it lies in the acceptance region and the null hypothesis
is accepted.

Figure 3: Graph Showing the Position of the Calculated Binomial Value (the calculated
value +1.295 lies in the acceptance region, to the left of the critical value +1.645)

Figure 3 shows that the calculated Z-value lies in the acceptance region;
therefore, H0 is accepted. This implies that the sample does not contradict the
hypothesis that the median score of the students is 15.
11.3.2 TWO SAMPLE SIGN TEST
In the two sample sign test, the researcher tests two related samples. This test is
the non-parametric counterpart of the paired t-test. Researchers use it when data is
given as pairs. In this test, the researcher assigns positive (+) and negative (–) signs
to values. These signs are allocated on the basis of the difference between the values
of the first sample and the second sample. If the difference is positive, the difference
value gets a plus (+) sign and if the difference is negative, the difference value gets a
minus (–) sign. If the values of the two samples are equal, these values are discarded.

Thereafter, the researcher calculates the total plus and minus signs and divides
the number by the sample size. Then, standard error is calculated and limits are
determined. Finally, the hypothesis is tested against the calculated value of limit.

Let us understand the application of the two sample sign test with the help of an
example.

Example 2: Sales achieved by two employees in a year are shown in Table 2:

Table 2: Data Showing Sales Achieved by Employees

Month   Employee 1 (in Lakhs)   Employee 2 (in Lakhs)
1       2                       1.5
2       2                       2.5
3       4                       3
4       1                       1
5       1                       1.5
6       2.5                     2.75
7       3                       2.5
8       3.5                     1
9       4                       3
10      1.5                     1.4
11      2                       4
12      3                       3

The researcher wants to find out whether the first employee is the better performer.
Use 5% as the level of significance.

Solution: Null hypothesis (H0) and alternate hypothesis (H1) are as follows:

H0: p = 1/2

H1: p > 1/2

Or

H0: Sales achieved by the two employees are the same.

H1: Sales achieved by the first employee are more than those of the second employee.

The researcher assigns the plus (+) and minus (–) signs to the data as shown in
Table 3:

Table 3: Signs Allocated to the Data


Month   Employee 1 (in Lakhs), X   Employee 2 (in Lakhs), Y   Sign (X – Y)
1 2 1.5 +
2 2 2.5 –
3 4 3 +
4 1 1 0
5 1 1.5 –
6 2.5 2.75 –
7 3 2.5 +
8 3.5 1 +
9 4 3 +
10 1.5 1.4 +
11 2 4 –
12 3 3 0

According to Table 3,

No. of + signs = 6

No. of – signs = 4

Number of observations = 10 (2 of the differences are 0; therefore, the researcher
does not consider them)

Now, we use the binomial probability table to find the probability of 6 or more
successes as follows:

P(X ≥ 6 | n = 10, p = 0.5) = P(X = 6) + P(X = 7) + … + P(X = 10)
                           = 0.205 + 0.117 + 0.044 + 0.010 + 0.001
                           = 0.377

Since the one-tailed p-value (0.377) is greater than α = 0.05, the null hypothesis is
accepted.

Note that here, np = nq = 10 (0.5) = 5.

Therefore, we can also use the normal approximation to the binomial distribution.

Z-statistic is calculated as:

$$Z = \frac{X - np}{\sqrt{npq}} = \frac{6 - 5}{\sqrt{10/4}} = \frac{1}{1.581} = 0.632$$

The critical value of Z at 0.05 level of significance (one-tailed) is +1.645. Since
Z = 0.632 lies in the acceptance region, the null hypothesis is accepted. This implies
that the sales achieved by the two employees do not differ significantly.
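
The sign allocation and both calculations of Example 2 can be reproduced in a few lines. This is a minimal Python sketch under the same assumptions as the example (p = q = 0.5, ties discarded); the variable names are illustrative and not from the text.

```python
from math import comb, sqrt

# Monthly sales (in Lakhs) from Table 2
emp1 = [2, 2, 4, 1, 1, 2.5, 3, 3.5, 4, 1.5, 2, 3]
emp2 = [1.5, 2.5, 3, 1, 1.5, 2.75, 2.5, 1, 3, 1.4, 4, 3]

diffs = [x - y for x, y in zip(emp1, emp2)]
plus = sum(d > 0 for d in diffs)    # months in which Employee 1 sold more
minus = sum(d < 0 for d in diffs)   # months in which Employee 2 sold more
n = plus + minus                    # tied months (difference 0) are discarded

# Exact right-tailed binomial probability P(X >= plus | n, p = 0.5)
p_exact = sum(comb(n, k) for k in range(plus, n + 1)) / 2 ** n

# Normal approximation: Z = (X - np) / sqrt(npq) with p = q = 0.5
z = (plus - n * 0.5) / sqrt(n * 0.5 * 0.5)

print(plus, minus, n, round(p_exact, 3), round(z, 3))  # prints: 6 4 10 0.377 0.632
```

Both the exact probability and the Z-value lead to the same decision at the 5% level.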

11.3.3 WILCOXON MATCHED PAIRS TEST/SIGNED RANK TEST


The Wilcoxon matched pairs test/signed rank test is a combination of sign and
rank tests and is used to compare paired samples. It is used in place of the paired
t-test when the distribution is not normal. The Wilcoxon matched pairs test is used
when the researcher wants to take into account both the direction and the magnitude
of the differences in the matched values. Steps to perform the test are mentioned below:
1. Determine the difference (di) between each pair of observed values.
2. Rank the absolute differences |di| in the ascending order (lowest to highest). If
   the difference between two values is zero, the researcher needs to ignore those
   values.
3. Segregate the ranks according to the positive and negative signs of di values.
4. Add the ranks with negative and positive signs separately.
5. Determine the T-value by comparing the sums of ranks with negative signs
   and positive signs. T is the smaller of the two sums: if the sum of ranks with
   positive sign is more than the sum of ranks with negative sign, the T-value is
   equal to the sum of ranks with negative sign, and vice versa.
The mean is calculated using the following formula:

Mean, µT = n (n + 1)/4

The standard deviation is calculated using the following formula:

Standard deviation, σT = √[n (n + 1) (2n + 1)/24]

Where, n = number of observations – number of ignored observations

The test statistic Z can be calculated as follows:

$$Z = \frac{T - \mu_T}{\sigma_T}$$

If the calculated Z-value lies within the limits of the acceptance region, the null
hypothesis is accepted and the alternate hypothesis is rejected.

Let us understand the application of the Wilcoxon matched pairs test/signed rank
test with the help of an example.

Example 3: Two brands are rated on a five-point scale (five being the highest).
The researcher wants to determine whether the satisfaction levels of customers
differ between the two brands. The ratings for Brand A and Brand B and their
differences are provided in Table 4:

Table 4: Rating Given by Customers to Brand A and Brand B
No. of Respondents Brand A Brand B Difference(di)
1 2 2 0
2 3 4 –1
3 4 3 1
4 1 2 –1
5 2 5 –3
6 5 4 1
7 4 2 2
8 3 4 –1
9 4 3 1
10 5 4 1
11 2 4 –2
12 4 3 1


Use the Wilcoxon matched pairs test with 5% level of significance.

Solution: Null hypothesis and alternate hypothesis are as follows:

H0: Customer satisfaction for the two brands is same.

H1: Customer satisfaction for the two brands is different.

The researcher calculates the T statistic, as shown in Table 5:

Table 5: Calculation of Wilcoxon Matched Pairs Test


No. of        Brand A   Brand B   Difference   |di|   Sign   Rank of   Rank with
Respondents                       (di)                       |di|      Sign
1             2         2         0            0      0      -         -
2             3         4         –1           1      –      4.5       –4.5
3             4         3         1            1      +      4.5       +4.5
4             1         2         –1           1      –      4.5       –4.5
5             2         5         –3           3      –      11        –11
6             5         4         1            1      +      4.5       +4.5
7             4         2         2            2      +      9.5       +9.5
8             3         4         –1           1      –      4.5       –4.5
9             4         3         1            1      +      4.5       +4.5
10            5         4         1            1      +      4.5       +4.5
11            2         4         –2           2      –      9.5       –9.5
12            4         3         1            1      +      4.5       +4.5

Total: Sum of Positive Ranks (W+) = 32
       Sum of Negative Ranks (W–) = 34
       T (smaller of W+ and W–) = 32

In this case, the researcher has neglected the first observation, as its difference is 0,
so n = 11. The ranking of the absolute differences is done from the smallest to the
largest value. If there is a tie, the mean of the tied ranks is assigned to the identical
values. The T statistic is equal to 32, which is the smaller of the sum of ranks with
positive signs and the sum of ranks with negative signs. The critical Z-value, with 5%
level of significance and two-tailed test, is ± 1.96.

Value of z-statistic is calculated as:

$$Z = \frac{T - \mu_T}{\sigma_T} = \frac{32 - \frac{11(11 + 1)}{4}}{\sqrt{\frac{11(11 + 1)[2(11) + 1]}{24}}} = \frac{32 - 33}{\sqrt{\frac{11 \times 12 \times 23}{24}}} = \frac{-1}{11.25} = -0.088$$

The graphical representation of the preceding solution is shown in Figure 4:

Figure 4: Position of the Calculated Value (the calculated value –0.088 lies in the
acceptance region between –1.96 and +1.96; the rejection regions lie beyond these limits)

Figure 4 shows that the calculated Z-value lies in the acceptance region; therefore,
H0 is accepted. This implies that customer satisfaction for the two brands is the same.
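
The ranking with ties is the step most prone to arithmetic slips, so it helps to see it computed. The following is a minimal Python sketch of the steps above, applied to the ratings of Table 4; the helper `average_ranks` and all variable names are illustrative assumptions, and the normal approximation is the one used in this section.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from smallest to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

# Ratings from Table 4
brand_a = [2, 3, 4, 1, 2, 5, 4, 3, 4, 5, 2, 4]
brand_b = [2, 4, 3, 2, 5, 4, 2, 4, 3, 4, 4, 3]

# Steps 1-2: differences (zero differences are ignored) and ranks of |di|
diffs = [a - b for a, b in zip(brand_a, brand_b) if a != b]
ranks = average_ranks([abs(d) for d in diffs])

# Steps 3-5: rank sums by sign, and T as the smaller sum
w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
t_stat = min(w_plus, w_minus)

n = len(diffs)
mu_t = n * (n + 1) / 4
sigma_t = sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (t_stat - mu_t) / sigma_t

print(w_plus, w_minus, t_stat, round(sigma_t, 2))  # prints: 32.0 34.0 32.0 11.25
```

The resulting `z` is a small negative value close to zero, well inside the limits of ± 1.96 used in the example.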

Self Assessment Questions

3. ___________________ is considered as one of the easiest non-parametric
   tests because it takes into account only the plus (+) and minus (–) signs
   of observations in a sample.
4. In one sample sign test, the probability of getting a sample value less
   than or greater than mean is equal. (True/False)
5. Which of the following tests is also known as paired sign test?
   a. Sign test
   b. One sample sign test
   c. Two sample sign test
   d. None of these

11.4 RANK CORRELATION


Rank correlation, also known as Spearman’s rank correlation coefficient, is used to
establish correlation between two data sets that can be ranked. Steps to calculate
rank correlation are mentioned below.
1. Assign ranks to all observations present in the two data sets in the descending
   order. If two or more values in a data set are identical, calculate the mean
   rank and allocate it to all identical values. For example, if third, fourth and
   fifth ranks have the same value, take out their mean (3 + 4 + 5)/3 = 4 and
   allocate it as rank to those values.
2. Calculate the difference between ranks by subtracting the rank in one data
   set from that in the second data set. The difference is denoted as di.
3. Calculate the square of di.
4. Find the sum of squares of di.
5. Calculate Spearman’s rank correlation coefficient by using the following
   formula:

$$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

Where, di = difference between ranks
n = sample size
The value of Spearman’s rank correlation coefficient lies between +1 and –1,
where +1 indicates perfect positive correlation and –1 indicates perfect negative
correlation. The values that lie between +1 and –1 show different degrees of
correlation. The researcher can assess the value of the rank correlation coefficient by
performing a hypothesis test. If the sample size is less than 30, the researcher needs
to use the tabulated value of Spearman’s rank correlation coefficient to test the
value of the coefficient. Suppose the sample size (n) = 15 and ρ = 0.6364, which shows
a reasonably high degree of correlation between two data sets. The researcher
wants to check the value of ρ (rank correlation coefficient) to judge whether the
correlation is actually present or not. He/She forms a null hypothesis that there
is no correlation between the two data sets and tests it at 5% level of significance
using a two-tailed test. The researcher checks the critical value for ρ in the table
showing values of Spearman’s rank correlation coefficient. The critical value of ρ
is – 0.5179 (lower limit) and + 0.5179 (upper limit). The given value of ρ = 0.6364 is
outside the acceptance region; therefore, the researcher rejects the null hypothesis
and concludes that there is a correlation between the two data sets.
Let us understand the application of Spearman’s rank correlation coefficient with
the help of an example.
Example 4: A researcher wants to test correlation between the IQ level and hours
spent on reading newspaper per week. The data is provided in Table 6:

Table 6: Data Showing IQ Level and Hours Spent on Reading Newspaper

No. of Observations   IQ (X)   Hours Spent on Reading Newspaper (Y) Per Week
1 105 6
2 91 7
3 99 24
4 100 56
5 99 29
6 103 30
7 97 20
8 113 12
9 112 10
10 110 17
11 94 16
12 110 8
13 112 9


Use rank correlation to find out correlation between the IQ level and hours spent
on reading newspaper, with 5% level of significance.

Solution: Null hypothesis and alternate hypothesis are as follows:

H0: There is no correlation between the IQ level and hours spent on reading
newspaper every week.

H1: There is correlation between the IQ level and hours spent on reading newspaper
every week.

Or

H0: ρ = 0

H1: ρ ≠ 0

Table 7 shows the calculation of rank correlation test:

Table 7: Calculation of Rank Correlation Test


No. of         IQ (X)   Hours Spent    Rank X   Rank Y   Difference        di²
Observations            on Reading                       (di = Rank X –
                        Newspaper (Y)                    Rank Y)
1              105      6              6        13       –7                49
2              91       7              13       12       1                 1
3              99       24             9.5      4        5.5               30.25
4              100      56             8        1        7                 49
5              99       29             9.5      3        6.5               42.25
6              103      30             7        2        5                 25
7              97       20             11       5        6                 36
8              113      12             1        8        –7                49
9              112      10             2.5      9        –6.5              42.25
10             110      17             4.5      6        –1.5              2.25
11             94       16             12       7        5                 25
12             110      8              4.5      11       –6.5              42.25
13             112      9              2.5      10       –7.5              56.25
Total                                                                      ∑di² = 449.5

The calculation of rank correlation is shown below:

$$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

ρ = 1 – {6 × 449.5/[13(13 × 13 – 1)]}

ρ = 1 – [2697/2184]

ρ = 1 – 1.235 = –0.235

The critical rank correlation value at 5% level of significance with n = 13 and a
two-tailed test is ± 0.484. The researcher can check the calculated rank correlation
value for significance against these limits.

The calculated rank correlation value (–0.235) lies in the acceptance region;
therefore, H0 is accepted. This implies that there is no significant correlation between
the IQ level and the number of hours spent on reading newspaper in a week. It can be
interpreted that, in this sample, the time spent on reading newspaper is not
associated with the IQ level.
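
The ranking and the coefficient of Example 4 can be verified with a short calculation. This is a minimal Python sketch of the five steps listed earlier in this section, using the data of Table 6; the helper `average_ranks_desc` and the variable names are illustrative assumptions. It uses the same formula as the text, which is applied here even though the data contains tied ranks.

```python
def average_ranks_desc(values):
    """Rank values from largest to smallest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

# Data from Table 6
iq = [105, 91, 99, 100, 99, 103, 97, 113, 112, 110, 94, 110, 112]
hours = [6, 7, 24, 56, 29, 30, 20, 12, 10, 17, 16, 8, 9]

rank_x = average_ranks_desc(iq)      # Step 1: ranks in descending order
rank_y = average_ranks_desc(hours)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))  # Steps 2-4

n = len(iq)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))  # Step 5

print(sum_d2, round(rho, 3))  # prints: 449.5 -0.235
```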

Self Assessment Questions

6. _______________, also known as Spearman’s rank correlation coefficient,
   is used to establish correlation between two data sets that can be ranked.

11.5 RANK SUM TEST


Rank sum test is used to analyse ordinal data (data in the rank form) by calculating
the sum of ranks. First, observations of the different samples are pooled and
arranged in the ascending order of value. Thereafter, these observations are ranked
and the sum of ranks is calculated for each sample. Finally, the statistic obtained
from the rank sums is compared with the specified critical value to test the
hypothesis. There are two types of rank sum tests, as shown in Figure 5:

Figure 5: Types of Rank Sum Tests (Mann-Whitney Test and Kruskal-Wallis Test)

Let us learn about these in detail.

11.5.1 MANN-WHITNEY TEST OR U TEST


The Mann-Whitney test (or U test) is used to determine whether two independent
samples are drawn from the same population. The test applies under general
conditions; its only requirement is that the population should be continuous.
However, failure to fulfil this requirement does not have a huge impact on the
result. In the Mann-Whitney test, the two samples are first merged and arranged in
increasing or decreasing order. After that, the data in the merged sample is ranked
from lowest to highest. After rank allocation, the ranks are classified as R1 for
sample 1 and R2 for sample 2, and the total of ranks in R1 and R2 is determined.
Finally, the U statistic is calculated in the following manner:

$$U_1 = R_1 - \frac{n_1(n_1 + 1)}{2}$$

$$U_2 = R_2 - \frac{n_2(n_2 + 1)}{2}$$

Where, U = smaller of U1 and U2

n1 = sample size of sample 1

n2 = sample size of sample 2

R1 = sum of the ranks of sample 1

R2 = sum of the ranks of sample 2

The mean and SD are determined to calculate the limits of the acceptance region. The
mean can be calculated with the help of the following formula:

μU = n1 n2/2

Where, n1 = sample size of sample 1

n2 = sample size of sample 2

The formula for standard deviation is as follows:

$$\sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$
If the value of U test lies under the limits of the acceptance region, the null
hypothesis is accepted. However, if the calculated U value lies outside the limits of
the acceptance region, the null hypothesis is rejected and the alternate hypothesis
is accepted. Let us take an example to understand the application of the Mann-
Whitney test.

Example 5: The production of Product A and Product B in a year is shown in
Table 8:

Table 8: Production of Product A and Product B

S. No.   Product A   Product B
1        40          28
2        35          30
3        20          35
4        36          40
5        22          45
6        26          21
7        45          26
8        50          28
9        44          30
10       47          44
11       48          50
12       25          49

The researcher wants to find out whether the two products are from the same
production house. Use the Mann-Whitney test (or U test) with 10% significance level.

Solution: Null hypothesis and alternate hypothesis are as follows:

H0: MedA = MedB

H1: MedA ≠ MedB

The researcher merges the data of the two products and arranges it in the increasing
order. Thereafter, he/she calculates R1 and R2 for Products A and B, respectively,
as shown in Table 9:

Table 9: Calculation for Mann-Whitney Test

S. No.   Rank of Product A   Rank of Product B
1        14.5                7.5
2        11.5                9.5
3        1                   11.5
4        13                  14.5
5        3                   18.5
6        5.5                 2
7        18.5                5.5
8        23.5                7.5
9        16.5                9.5
10       20                  16.5
11       21                  23.5
12       4                   22
Total    R1 = 152            R2 = 148

The calculation of U statistic is as follows:

$$U_1 = 152 - \frac{12(13)}{2} = 152 - 78 = 74$$

$$U_2 = 148 - \frac{12(13)}{2} = 148 - 78 = 70$$

Therefore, U = 70

$$\mu_U = \frac{n_1 n_2}{2} = \frac{12 \times 12}{2} = 72$$

$$\sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{12 \times 12\,(12 + 12 + 1)}{12}} = 5\sqrt{12} = 17.3$$

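The critical U-value at 10% level of significance and two-tailed test for
n1 = n2 = 12 is 42.

Uα = 42

H0 is rejected only when the calculated U is less than or equal to Uα. Since
U = 70 is greater than Uα = 42, the researcher cannot reject H0. The normal
approximation gives the same result: Z = (70 – 72)/17.3 = –0.12, which lies within
± 1.645. This implies that the data provides no evidence that Products A and B are
from different production houses.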
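
The pooled ranking and the U statistic of Example 5 can be reproduced as follows. This is a minimal Python sketch of the procedure described in this section, using the data of Table 8; the helper `average_ranks` and the variable names are illustrative assumptions, and the sketch stops at the quantities computed in the text (R1, R2, U, μU and σU).

```python
from math import sqrt

def average_ranks(values):
    """Rank values from smallest to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

# Data from Table 8
product_a = [40, 35, 20, 36, 22, 26, 45, 50, 44, 47, 48, 25]
product_b = [28, 30, 35, 40, 45, 21, 26, 28, 30, 44, 50, 49]

n1, n2 = len(product_a), len(product_b)
ranks = average_ranks(product_a + product_b)  # rank the merged sample
r1 = sum(ranks[:n1])                          # rank sum of Product A
r2 = sum(ranks[n1:])                          # rank sum of Product B

u1 = r1 - n1 * (n1 + 1) / 2
u2 = r2 - n2 * (n2 + 1) / 2
u = min(u1, u2)

mu_u = n1 * n2 / 2
sigma_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

print(r1, r2, u1, u2, u, mu_u, round(sigma_u, 1))
# prints: 152.0 148.0 74.0 70.0 70.0 72.0 17.3
```

As a check on the ranking, R1 + R2 must equal (n1 + n2)(n1 + n2 + 1)/2, which is 300 here.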

11.5.2 KRUSKAL-WALLIS TEST


The Kruskal-Wallis test is the non-parametric equivalent of one-way ANOVA (explained
in the previous chapter), with the difference that the former is based on ranks while
the latter is based on numerical values. The test is an extension of the Mann-Whitney
U-test: the Kruskal-Wallis test compares more than two samples, whereas the
Mann-Whitney U-test compares two. The Kruskal-Wallis test is used to determine
whether the samples in a study are taken from the same population. In the test, the
data from the different samples is merged and the values are ranked in a single order
(low to high or high to low). The rank sums are denoted as R1, R2, …, Rk, according to
the samples to which they belong. The test statistic is calculated in the following
manner:

$$H = \frac{12}{n(n + 1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n + 1)$$

Where, n = total number of observations in all the samples

k = number of samples

Ri = sum of the ranks of the ith sample, that is R1, R2, …, Rk

ni = size of the ith sample, that is n1, n2, …, nk

The chi-square value is determined at d.f. k–1, where k is the number of samples, and
the specified level of significance, and the calculated H value is tested against it.
If the H value lies within the limits of the acceptance region, the researcher accepts
the null hypothesis and rejects the alternate hypothesis. However, if the H value lies
outside the limits of the acceptance region, the researcher rejects the null
hypothesis and accepts the alternate hypothesis.

Let us understand the application of the Kruskal-Wallis test with the help of an
example.

Example 6: An organisation wants to purchase hundreds of milling machines of
different types. As these machines cost a lot, the organisation wants to check
whether the machines perform equally well before it purchases them. Initially, it
borrows four machines and randomly assigns them to 20 technicians with similar skill
sets. Each machine is put through a series of tasks and rated using a standardised
test. A high score indicates better performance. The scores given by technicians to
the four machines are shown in Table 10:

Table 10: Scores Given by Technicians to Four Machines

Machine 1 Machine 2 Machine 3 Machine 4


24 28 26 33
23 34 28 37
26 29 31 36
27 32 25 35
29 30 21 38

Perform the Kruskal-Wallis test to establish whether all four machines are equally
good. Use 5% level of significance.

Solution: Null hypothesis and alternate hypothesis are as follows:


H0: All four machines are equally good. (This implies that Median1 = Median2 =
Median3 = Median4)

H1: At least two machines are different.

First, the researcher merges performance data for the four machines and arranges
it in increasing order. Thereafter, he/she ranks the data and classifies ranks as
R1, R2, R3 and R4 for machines 1, 2, 3 and 4, respectively. Finally, the researcher
takes out the total of ranks in R1, R2, R3 and R4. The calculation is shown in Table 11:
Table 11: Allocation of Ranks to Scores Provided to Four Machines

No. of Observations   Machines' Data   Ranks
1                     21               1
2                     23               2
3                     24               3
4                     25               4
5                     26               5.5
6                     26               5.5
7                     27               7
8                     28               8.5
9                     28               8.5
10                    29               10.5
11                    29               10.5
12                    30               12
13                    31               13
14                    32               14
15                    33               15
16                    34               16
17                    35               17
18                    36               18
19                    37               19
20                    38               20

After that, ranks are classified as R1, R2, R3 and R4 for machine 1, 2, 3 and 4,
respectively, as shown in Table 12:

Table 12: Calculation of Kruskal-Wallis Test

Machine 1   R1     Machine 2   R2     Machine 3   R3    Machine 4   R4
24          3      28          8.5    26          5.5   33          15
23          2      34          16     28          8.5   37          19
26          5.5    29          10.5   31          13    36          18
27          7      32          14     25          4     35          17
29          10.5   30          12     21          1     38          20
Total       28                 61                 32                89

The calculation of the Kruskal-Wallis test is as follows:

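$$H = \frac{12}{n(n + 1)} \left( \frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \frac{R_3^2}{n_3} + \frac{R_4^2}{n_4} \right) - 3(n + 1)$$

Where, n = 20

R1 = 28, R2 = 61, R3 = 32, R4 = 89

n1 = n2 = n3 = n4 = 5

$$H = \frac{12}{20(20 + 1)} \left( \frac{28 \times 28}{5} + \frac{61 \times 61}{5} + \frac{32 \times 32}{5} + \frac{89 \times 89}{5} \right) - 3(20 + 1)$$

= (0.02857) (156.8 + 744.2 + 204.8 + 1584.2) – 63

= 13.85

d.f. = k – 1 = 4 – 1 = 3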

The chi-square value at 5% level of significance and 3 d.f. is 7.815. The calculated
H value is checked for significance with the help of a one-tailed (right-tailed) test.
The graphical representation of the preceding solution is given in Figure 6:

Figure 6: Showing the Rejection of the Calculated Chi-square Value (the calculated
value 13.85 lies in the rejection region, to the right of the critical value 7.815)

Figure 6 shows that the calculated H value lies in the rejection region;
therefore, H0 is rejected and H1 is accepted. This implies that all four machines
are not equally good. It can be interpreted that at least one machine differs from
the others in capability; machine number 4 has the highest rank sum (89), which
suggests that it performs best.
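
The rank sums and the H statistic of Example 6 can be checked with the sketch below. It is a minimal Python illustration of the Kruskal-Wallis calculation described in this section, using the scores of Table 10; the helper `average_ranks`, the dictionary layout and the variable names are illustrative assumptions. Because the sketch does not round 12/n(n + 1) to 0.02857, its unrounded result is 13.86 rather than the 13.85 obtained by hand.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

# Scores from Table 10
machines = {
    "Machine 1": [24, 23, 26, 27, 29],
    "Machine 2": [28, 34, 29, 32, 30],
    "Machine 3": [26, 28, 31, 25, 21],
    "Machine 4": [33, 37, 36, 35, 38],
}

# Merge all samples and rank the pooled data
pooled = [score for scores in machines.values() for score in scores]
ranks = average_ranks(pooled)
n = len(pooled)

# Split the ranks back into their samples and total them
rank_sums = {}
start = 0
for name, scores in machines.items():
    rank_sums[name] = sum(ranks[start:start + len(scores)])
    start += len(scores)

h = 12 / (n * (n + 1)) * sum(
    rank_sums[name] ** 2 / len(machines[name]) for name in machines
) - 3 * (n + 1)
df = len(machines) - 1  # k - 1 degrees of freedom

print(rank_sums)
print(round(h, 2), df)  # prints: 13.86 3
```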

Self Assessment Questions

7. ________________ is used to determine whether two independent
   samples are drawn from the same population.
8. ______________ test is used to determine whether the samples in the
   study are taken from the same population.
9. Chi-square value is determined at d.f. k–1 and the specified level of
   significance and the H value is tested against it. (True/False)

11.6 CHI-SQUARE TEST


Chi-square test is used to find out dependency between two attributes. It can also
be used to make comparisons between theoretical population (expected data) and
actual data (observed data). The formula used in chi-square test is as follows:

\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \]

Where, O_i = Observed frequency

E_i = Expected frequency

Expected frequency can be calculated with the help of the following formula:

Ei = Row total*Column total/Grand total (For test of independence)

If the value of chi-square is greater than critical value of χ2, null hypothesis is
rejected.


Figure 7 shows two types of chi-square tests that are mainly used to find out the
association between variables:

[Figure 7: Tree diagram. Chi-square Tests branch into (1) Chi-square Test for Goodness of Fit and (2) Chi-square Test for Independence.]

Figure 7: Types of Chi-square Tests

Let us discuss Chi-square test for goodness of fit and chi-square test for
independence in detail.

11.6.1 CHI-SQUARE TEST FOR GOODNESS OF FIT


The test helps the researcher know whether, and to what extent, the theoretical distribution (distribution of expected frequency) fits the observed data. In the chi-square test, the researcher first finds out the expected frequency on the basis of the theoretical distribution.

Thereafter, he/she calculates the chi-square value with the formula given above. In this test, the d.f. used is k–1, where k is the number of categories. The critical chi-square value is determined at the specified level of significance and d.f. If the calculated chi-square value lies within the acceptance region, the researcher accepts the null hypothesis and rejects the alternate hypothesis.

Let us understand the application of chi-square test with the help of an example.

Example 7: An FMCG company produces various products. Currently, this company wants to launch four more products, namely A, B, C and D, belonging to the same category. However, before launching the products, the company wants to evaluate the customer preferences for each product. For this, the company carries out a survey of 1000 customers and records their responses as shown in Table 13:

Table 13: Customers’ Responses

Product Number of Customers the Product is Preferred by

Product A 300

Product B 280

Product C 220

Product D 200


Test the hypothesis that the customers have no preference for any particular
product. Use 5% level of significance.

Solution: For H0: Customers have no preference for any particular product

The expected frequency and observed frequency of customers’ responses are shown in Table 14 as follows:

Table 14: Expected Frequency and Observed Frequency of Customers’ Responses

Product Expected Frequency (Ei) Observed Frequency (Oi)


Product A 250 300

Product B 250 280

Product C 250 220

Product D 250 200

\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \]

\[ \chi^2 = \frac{(300 - 250)^2}{250} + \frac{(280 - 250)^2}{250} + \frac{(220 - 250)^2}{250} + \frac{(200 - 250)^2}{250} \]

\[ \chi^2 = \frac{50^2 + 30^2 + 30^2 + 50^2}{250} \]

\[ \chi^2 = \frac{6800}{250} = 27.2 \]

The critical value of χ² at 5% level of significance with 3 degrees of freedom (k – 1 = 3) is 7.81. Since our χ² is greater than the critical value, we reject H0.
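
The arithmetic of Example 7 can be checked with a few lines of code. The following is a minimal Python sketch, not part of the original solution: the variable names are illustrative, the equal expected frequency of 250 follows from H0 (no preference among four products), and the critical value is the table value quoted in this chapter rather than something the script computes.

```python
# Observed responses from Table 13
observed = {"Product A": 300, "Product B": 280, "Product C": 220, "Product D": 200}

total = sum(observed.values())    # 1000 customers surveyed
expected = total / len(observed)  # 250 per product if there is no preference

# Chi-square statistic: sum of (O - E)^2 / E over all categories
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
degrees_of_freedom = len(observed) - 1

critical_value = 7.815  # table value at 5% and 3 d.f., taken from the text

print(round(chi_square, 2))  # 27.2
print(degrees_of_freedom)    # 3
print("reject H0" if chi_square > critical_value else "do not reject H0")
```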

11.6.2 CHI-SQUARE TEST FOR INDEPENDENCE


In the chi-square test for independence, two attributes are tested to find out whether they are associated with each other. For example, the researcher wants to know whether the introduction of better/unique services helps increase the sales of an organisation. In this case, the researcher is trying to establish a relation between two attributes − better services and sales. In this test, the expected frequency is calculated first and then the value of chi-square is ascertained. The d.f. used in this case is (r–1)(c–1), where r is the number of levels (rows) of the first attribute and c is the number of levels (columns) of the second attribute. The critical chi-square value is determined at the specified level of significance and d.f. If the calculated chi-square value lies within the acceptance region, the null hypothesis is accepted and the alternate hypothesis is rejected.

Let us understand the application of chi-square test with the help of an example.


Example 8: The researcher has the data for the preferences of men and women
regarding the joint and nuclear families, as shown in Table 15:

Table 15: Data for Preferences of Men and Women for Joint and Nuclear Families
Joint Family Nuclear Family Total
Men 96 35 131
Women 170 360 530
Total 266 395 661

The researcher wants to find out whether the opinion of men and women about the type of family is the same. Use 5% level of significance.

Solution: Null hypothesis and alternate hypothesis are as follows:

H0: The opinion of men and women about the type of family is the same.

H1: The opinion of men and women about the type of family is different.

The test statistic used for this data is chi-square test for independence. The
following equation is used for calculation:
\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \]

Where, O_i = Observed frequency

E_i = Expected frequency

Expected frequency can be calculated with the help of the following equation:
E_i = Row total × Column total/Grand total

In the current scenario, expected frequency can be calculated using the following method:

E_1 (Men, Joint Family) = 131 × 266/661 = 52.72

E_2 (Men, Nuclear Family) = 131 × 395/661 = 78.28

E_3 (Women, Joint Family) = 530 × 266/661 = 213.28

E_4 (Women, Nuclear Family) = 530 × 395/661 = 316.72

After calculating expected frequency and the square of differences between the
observed and expected frequency, Table 16 is created:

Table 16: Calculation of Chi-Square Test for Independence


Category           Observed Frequency (Oi)   Expected Frequency (Ei)   Oi–Ei     (Oi–Ei)²    (Oi–Ei)²/Ei
Men
  Joint Family     96                        52.72                     43.28     1873.158    35.53032
  Nuclear Family   35                        78.28                     –43.28    1873.158    23.92895
Women
  Joint Family     170                       213.28                    –43.28    1873.158    8.782626
  Nuclear Family   360                       316.72                    43.28     1873.158    5.914241
Total                                                                                        74.15614

Calculated value of chi-square = 74.16

d.f. = (r – 1) (c – 1)

= (2 – 1) (2 – 1) = 1

The critical chi-square value at the 5% level of significance and 1 d.f. is 3.841. The calculated chi-square value is compared with this critical value using a one-tailed (right-tailed) test. The graphical representation of the preceding solution is shown in Figure 8:

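[Figure 8: Chi-square distribution curve. The acceptance region lies to the left of the critical value 3.841 and the rejection region to its right; the calculated value 74.16 falls in the rejection region.]

Figure 8: Rejecting Chi-square Value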

Figure 8 shows that the calculated value lies in the rejection region; therefore, H0 is rejected. This implies that there is a significant difference between the opinions of men and women about the type of family.
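
The steps of Example 8 (expected frequencies from row and column totals, then the chi-square sum) can be sketched in code. The following Python sketch is illustrative only and not part of the original solution: the function name is an assumption, and the critical value 3.841 is copied from the text.

```python
def chi_square_independence(table):
    """Return (chi-square, d.f.) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency = row total x column total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, dof


# Table 15: rows are Men and Women; columns are Joint Family and Nuclear Family
family = [[96, 35], [170, 360]]

chi_square, dof = chi_square_independence(family)
critical_value = 3.841  # table value at 5% and 1 d.f., taken from the text

print(round(chi_square, 2), dof)
print("reject H0" if chi_square > critical_value else "do not reject H0")
```

With unrounded expected frequencies the statistic is about 74.17, marginally above the 74.16 in Table 16, which works from expected frequencies rounded to two decimals. The conclusion is the same.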

Self Assessment Questions

10. Expected frequency can be calculated using the formula __________.
11. Chi-square test for independence refers to a test in which two attributes are tested to find out whether they are associated with each other. (True/False)

11.7 SUMMARY
• A researcher can use non-parametric tests without taking into consideration the population distribution and sample type. Non-parametric tests are also known as distribution-free tests.
• Sign test is considered as one of the easiest non-parametric tests because it takes into account only the plus and minus signs of observations in a sample.
• One sample sign test is applied on a single sample taken from a symmetrical population.
• Two sample sign test is used to check whether two samples are related to each other. It is also known as paired sign test.
• Rank correlation, also known as Spearman’s rank correlation coefficient, is used to establish correlation between two data sets that can be ranked.
• Rank sum test is used to analyse ordinal data (in the rank form) and calculate the value of rank sum statistics. To conduct this test, observations need to be arranged in ascending order.
• The Mann-Whitney test (or U test) is used to determine whether two independent samples are drawn from the same population. It is applied in general conditions and does not have any specific requirement.
• The Kruskal-Wallis test is similar to one-way ANOVA, with the one difference that the former is based on ranks, while the latter is based on numerical values.
• The Wilcoxon matched pairs test/signed rank test is a combination of sign and rank tests. It is used to compare two paired samples.
• Chi-square test is used to find out dependency between two attributes. It can also be used to make comparisons between theoretical population (expected data) and actual data (observed data).

11.8 KEY WORDS


• Non-parametric tests: These tests do not require any information about the parameters of a population from which a sample is derived.
• Sign tests: These are based on signs, not on the magnitude of observed values.
• Rank correlation coefficient: This test is used to study correlation among the ranks of different data sets.
• Mann-Whitney test: It is used to study two independent samples to judge whether they are taken from the same population.
• Kruskal-Wallis test: It is used to study more than two independent samples to check whether they are taken from the same population.
• Signed rank test: It is used to study both the direction and magnitude of the differences between paired samples.
• Chi-square test of goodness of fit: It is used to check whether a theoretical (expected) distribution fits the observed frequencies.
• Chi-square test of independence: It is used to find out whether two attributes are associated with each other.
• Correction factor: It refers to the adjustment made in a calculation to control deviations in a sample or a method of measurement.


11.9 CASE STUDY: PROBLEM FACED BY PORTABLE GENERATOR INDUSTRY


History of Portable Generators: The economic liberalisation policy of 1985 increased foreign industrial collaborations in India. There was a spurt in industrial tie-ups and consequently in industrial output. Foreign companies collaborated with several Indian companies. Portable generators were one such industrial segment where foreign companies collaborated with Indian companies to manufacture generators in India. For example, Sri Ram Group entered into a Joint Venture (JV) with Honda of Japan to form Sri Ram Honda. The JV had a capacity to build 500 portable generators a day. In addition, the Birla group partnered with Yamaha of Japan to form Birla Yamaha, which manufactured portable generators. However, some Indian companies independently entered the portable generator industry with their local brands. For example, Greaves Cotton produced portable generators under the brand name Lombardini. Kirloskar Group introduced a 1.5 KVA portable generator and Enfield India launched its generator under the brand name Gee. There were 50−60 local brands with capacity to produce 100 portable generators a day. By 1986, the total output of the portable generator industry was 2.5 lakh units a month due to huge demand from customers. However, this demand was short-lived and by 1987 many units had closed down the production of generators. For example, Kirloskar Group withdrew its 1.5 KVA portable generator from the market. Lombardini also disappeared from the market. Two major competitors, Sri Ram Honda and Birla Yamaha, were engaged in a price war.

Scenario of the Portable Generator Industry


In the portable generator industry, the rural market is emerging and requires generators mainly to run pump sets in farms. This market has been totally ignored by the two market leaders (Sri Ram Honda and Birla Yamaha). The leaders produce expensive, good quality generators. These generators are light and fragile; therefore, they cannot be used in farms. Local brands have so far satisfied the requirements of the rural market. The market leaders have finally realised the importance of the rural market. For example, until now, portable generators were marketed on factors such as low noise, fuel efficiency and reliability. However, market requirements have changed over time. So the leaders are conducting market research to know the changed requirements. Sri Ram Honda and Birla Yamaha hired researchers to study the changing market scenario. The researchers divided the problem into two research topics. The first research topic is to study the requirements of the rural market in terms of technical feasibility and consumer preferences. The second topic is to compare the two types of generators on the basis of their efficiency with respect to the rural market. To study the first research topic, researchers collected two samples: one from a market leader and another from a local marketer. Rural technicians assigned scores to the two types of generators according to efficiency. The collected data is as follows:

Scores given by Rural Technicians to Generators

Technician No.    Top Competitors (A)    Local Marketers (B)
1                 33                     45
2                 35                     30
3                 24                     35
4                 36                     40
5                 22                     45
6                 26                     41
7                 45                     46
8                 50                     52
9                 44                     49
10                47                     44
11                48                     50
12                25                     42

The following table shows the preferences of rural and urban customers for two types (branded and local) of generators:

Preferences of Rural and Urban People for Local and Branded Generators

                 Top Competitors    Local Marketers    Total
Rural Market     100                150                250
Urban Market     120                99                 219
Total            220                249                469

The researchers concluded that the rural market is widely different from the urban market. In addition, the efficiency of generators produced by top competitors is almost the same as that of generators produced by local companies. Therefore, the generators produced by top competitors to fulfil demand from urban markets can also be introduced in the rural market. The two market leaders should market their products very effectively in the rural market to capture market share. Local marketers have the first-mover advantage in the rural market. The market leaders can make slight changes in their generators to improve their capacity and promote their products as specifically designed for the rural market.
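
As a study aid, the preference table in this case can be examined with the chi-square test for independence from Section 11.6.2. The case does not state which test the researchers actually used, so the following Python sketch is only one plausible check. It uses the standard shortcut for a 2 × 2 table, χ² = N(ad − bc)²/(row totals × column totals), which is algebraically equivalent to summing (Oi − Ei)²/Ei over the four cells. The variable names are illustrative, and the critical value 3.841 (5% level, 1 d.f.) is the table value quoted earlier in the chapter.

```python
# Preference counts from the case study table
a, b = 100, 150  # Rural Market: Top Competitors, Local Marketers
c, d = 120, 99   # Urban Market: Top Competitors, Local Marketers

n = a + b + c + d
row_1, row_2 = a + b, c + d
col_1, col_2 = a + c, b + d

# Shortcut formula for the chi-square statistic of a 2 x 2 table
chi_square = n * (a * d - b * c) ** 2 / (row_1 * row_2 * col_1 * col_2)

critical_value = 3.841  # table value at 5% and 1 d.f., taken from the chapter

print(round(chi_square, 2))
print("reject H0" if chi_square > critical_value else "do not reject H0")
```

Under these assumptions the statistic is roughly 10.26, above the critical value. This is consistent with the researchers' statement that rural and urban preferences differ.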

QUESTIONS
1. What are the two research topics identified by researchers?
(Hint: Studying the requirements of rural market in terms of technical
feasibility and consumer preferences)
2. What is the conclusion given by researchers in the case study?
(Hint: The researchers concluded that the rural market is widely different
from the urban market)


11.10 EXERCISE
1. Explain the concept of non-parametric test.
2. Describe the rank correlation with the help of example.
3. Explain the types of sign tests.
4. Discuss the concept of U test with the help of a diagram.
5. Write short notes on:
a. Chi-square test
b. Wilcoxon matched pairs/Signed rank test

11.11 ANSWERS FOR SELF ASSESSMENT QUESTIONS


Topic                  Q. No.    Answer
Non-Parametric Test    1.        distribution-free
                       2.        True
Sign Test              3.        Sign test
                       4.        True
                       5.        c. Two sample sign test
Rank Correlation       6.        Rank correlation
Rank Sum Test          7.        Mann-Whitney test (or U test)
                       8.        Kruskal-Wallis
                       9.        True
Chi-square Test        10.       Ei = Row total × Column total/Grand total
                       11.       True

11.12 SUGGESTED BOOKS AND E-REFERENCES

SUGGESTED BOOKS
• Biddle, J., & Emmett, R. Research in the History of Economic Thought and Methodology
• Chandra, S., & Sharma, M. Research Methodology
• National Academies Press. (2009). Partnerships for Emerging Research Institutions. Washington, D.C.

E-REFERENCES
• Research Guides: Organising Your Social Sciences Research Paper: 6. The Methodology (2018); Retrieved from http://libguides.usc.edu/writingguide/methodology
• Research Methodology (2018); Retrieved from https://explorable.com/research-methodology
• Research Methodology (2018); Retrieved from https://books.google.com/books/about/Research_Methodology.html?id=x_kp__WmFzoC
• Research Methods (2018); Retrieved from https://research-methodology.net/research-methods/

CHAPTER 12
REPORT WRITING

Table of Contents
Learning Objectives
12.1 Introduction
12.2 Research Proposal
Self Assessment Questions
12.3 Research Report
12.3.1 Written Report
12.3.2 Oral Presentations
Self Assessment Questions
12.4 Integral Parts of a Report
Self Assessment Questions
12.5 Summary
12.6 Key Words
12.7 Case Study
12.8 Exercise
12.9 Answers for Self Assessment Questions
12.10 Suggested Books and e-References
LEARNING OBJECTIVES

After studying this chapter, you will be able to:
• Explain the concept of research proposal
• Describe the research report
• Outline the importance of written reports
• Explain the concept of oral presentation
• Discuss the integral parts of a report
Report Writing

12.1 INTRODUCTION
In the previous chapter, you studied non-parametric tests. Now, you will study report writing.

Report writing is a process to document each and every step involved in the
research process. These steps are Introduction, Literature Review, Methodology,
Data Analysis and Interpretation, Conclusion and Recommendations. It helps the
researcher in checking whether the research is progressing in the right direction
or not. A research report serves as reference for findings and recommendations
of a research in future. The research report consists of a written report and an
oral presentation. The written report states objectives, data, research methodology
and findings. The oral presentation helps the target audience in judging whether
research recommendations are feasible to address the research problem or not.

An organisation takes several crucial decisions related to its functioning on the basis of a research report. If the report is not clear and concise, then the organisation may misinterpret the research findings and take wrong decisions, which may prove disastrous for the organisation. Therefore, the researcher should observe utmost care and adopt a predetermined structure while writing a report to prevent ambiguities from creeping into the report.

The chapter begins by explaining the concept of research proposal. Next, it provides in-depth information about the research report. Then, the following topics are explained: written report, audience of a report, types of report and steps in writing a report. The integral parts of a report are also discussed. Towards the end, the concept of oral presentations is discussed.

12.2 RESEARCH PROPOSAL


A research proposal is a clearly outlined plan submitted by one party, for acceptance or rejection by another party. The first party wants the research to be conducted and the second party actually conducts the research. The first party can be an organisation, government body, or any other entity that has a problem, which can be solved only through research. The second party can be a research agency, research institution, or an independent researcher. The research proposal is a detailed description of the research prepared by the second party to explain to the first party how the research would be conducted and what the requirements of the research are.

The research proposal includes the following information:


• Purpose: This is the objective for which the research is to be conducted. It also provides information about the needs and significance of the research.
• Population: It refers to the universe from which the researcher takes samples.
• Research design: It is the layout of the research giving details of procedures required for conducting research. Research design defines the information needed, designs exploratory or descriptive phases of research, specifies measurement and scaling procedures, defines an appropriate data collection method, specifies sampling process and sample size, develops data analysis plan, etc.
• Methods of data collection: These are the techniques of collecting data for the research. Examples of data collection methods are questionnaires, observation and interviews.
• Tests of significance: These tests help in analysing the collected data. The researcher can use z-test, t-test or F-test depending on the sample size, the type of data and the research methodology.
• Time frame: It is the duration within which the research would be completed. The research proposal includes the tentative schedule to start and complete each activity in the research.
• Budget: It is the estimated cost to conduct the research. The research proposal should clearly indicate funds required for the research work.
An example of the research proposal is as follows:

RESEARCH PROPOSAL

Submitted to
Sales Manager: Vikas Kumar

Submitted by
Manali Batra, Senior Researcher
MSD Research Institute
Name: Manali Batra
Designation: Senior Researcher
Location of the work: Max New York Life, Elegance Tower, Jasola Vihar
Working days: Monday to Saturday
Working hours: 9.30 am to 5.00 pm
Contact number: +919XXXXX69
Time Frame for the Project: 2 months
Expected Cost of the Project: ₹ XXX thousand
(This includes the cost of project designing, travelling, administration and reporting.)
Name of the Reporting Officer: Mr. Vikas Kumar
Designation: Sales Manager
Contact Number: +919XXXXX66
Title of the Project
Comparative analysis of Max New York Life (MNYL) and HDFC Life Insurance:
A detailed study on MNYL

Objectives
• To study and compare the sales process of MNYL and HDFC
• To study the policies and products of MNYL and HDFC
• To compare the customer satisfaction of both the companies
• To study the impact of advertisement on the sale of both the companies
Methodology

The research methodology of this project consists of:


1. Research design
   - Descriptive research design
   - Hypothesis testing
2. Data collection
   - Primary data – Questionnaire, in-depth interviews
   - Secondary data – the Internet, articles in different sources (print media), MNYL
3. Sample
   - Sampling – Purposive and convenience sampling
   - Sample size – 200
   - Sample population – Customers of MNYL and HDFC
4. Tools
   - Excel
   - SPSS
Importance of the Research Work

This study will help us in determining the sales process, products and policies of MNYL and HDFC. It will also shed light on the impact of advertisement on the sale of insurance companies.

In addition, the study will help us in comparing MNYL and HDFC to know which one is doing well in the market and satisfying its customers.

Expected Outcomes

The study aims to obtain information about:


• The sales process of MNYL and HDFC
• Products and policies of two companies
• The effect of advertisement on sales
• The trend of sales in both companies
• Comparison of customer satisfaction and expectations from respective companies
Limitations of Study

This study covers data analysis of MNYL and HDFC for only a limited period
of time from the financial year 2014–15 to 2018–19. Hence, the results are
comparable and representative for this period only.


Self Assessment Questions

1. A ________ is an agreement between two parties. The first party wants the research to be conducted and the second party actually conducts the research.

12.3 RESEARCH REPORT


A research report is a crucial part of a research as it includes solutions and actionable
recommendations of the research problem. A research report is prepared by an
analyst or researcher who is a part of the research team. If the report is not made
properly, all efforts of the researcher would become useless. The research report
can be divided into two types, as shown in Figure 1:

[Figure 1: Tree diagram. Types of Research Report branch into (1) Written Report and (2) Oral Presentation.]

Figure 1: Parts of the Research Report

The written report is an official document giving the facts and information to the interested readers in a presentable manner. The facts must be accurate, complete and interpreted. The oral report, on the other hand, is a piece of face-to-face communication presenting one’s research work in a seminar, workshop, etc. It helps the researcher to present his/her views more clearly in front of research stakeholders. Since the presenter has to interact directly with the audience, any faltering during the oral presentation can leave a negative impact on the audience. However, an oral report helps the researcher to gather valuable suggestions and feedback from the research stakeholders. As compared to an oral report, a written report is a permanent record that can be used for reference again and again. Let us discuss the written reports and the oral presentations in detail.

12.3.1 WRITTEN REPORT


A research report refers to the systematic and orderly presentation of a research
activity in a written form. While writing a report, the researcher should take into
consideration various aspects such as specific objectives of the study, description
of the methods or techniques used, review of the data on which the study is
based, assumptions made in the course of study and presentation of the findings
including their limitations and supporting data.

A written report provides information about a subject or topic. In other words, it gives the readers insight into the topic on which the research work is carried out, its time duration, the methodology adopted for research, and so on. In addition, a written report helps in identifying alternative solutions to address a problem by presenting present and past findings and recommendations.


Audience of a Report
As already discussed, there are two parties involved in a research – the first party
wants the research to be conducted and the second party conducts the research. The
first party is called audience. The researcher should tailor the writing of research
report towards the specific requirements of the target audience. The length and
composition of a research report and the details provided in it vary as per the
target audience. This happens because organisations differ from one another in
significant ways.
The researcher should adapt his/her writing to three types of audiences, each of which requires a different technique:
• High-tech peers: The research report should make use of the most professional/complex resources, along with jargon and technical terms, keeping in mind the expert-level knowledge of the audience.
• Low-tech peers: The research report should provide proper definitions for all the abbreviations/acronyms/technical terms used throughout the writing. This would enhance understanding where the audience is a mixture of laymen and professionals.
• Lay readers: The research report should use simple terms that are a lot easier to understand and interpret. There should be no use of abbreviations/acronyms.

Other Types of Reports


As already discussed, different types of audience prefer different types of reports.
Broadly, reports are classified into two types – technical report and popular report.
The two types of reports are described as follows:
• Technical report: It lays emphasis on the method employed in conducting research, assumptions made during the research, details about the research topic, and the research findings and recommendations. Technical reports are full-fledged reports that are generally lengthy. These reports involve a detailed description of the research work. The target audience of technical reports is students, government bodies, special commissions and other organisations that need in-depth analysis of the topic.
• Popular report: This report is non-technical in nature and is less comprehensive as the audience of this report is interested in knowing the results of the research, not the entire analysis. Therefore, the popular report focuses on the findings and recommendations of the research. It lays emphasis on simplicity and attractiveness in information presentation. The content of the popular report should be simple, clear and less technical in nature. Information should be explained with the help of simple charts and graphs instead of mathematical equations. The popular report should be attractive in terms of layout, fonts, figures, print and use of subheadings.

Steps in Writing a Report


The research report should be written in such a format that it is easily comprehensible
by the target audience. The report writing process involves sequential steps that
are described as follows:
1. Analysing the subject matter: It involves determining the kind of development pattern to be adopted for writing the report for a particular research. Two kinds of development patterns are mostly used in research reports: logical development and chronological development. In logical development, the researcher organises the material through reasoning, drawing links between one topic and the next. This reasoning is mostly based on the study that the researcher has done during the research work. In logical development, the subject matter moves from simple to complex. In chronological development, the subject matter is sequentially structured.
2. Drawing the outline of the report: At this stage of report writing, the
researcher makes a structure or outline of the report. It consists of brief
description of the topic to be covered in the report. This helps the researcher
not to miss out any topic to be studied in the report. The outline is also
considered as the framework of the report.
3. Preparing the rough draft: At this stage, the researcher starts writing the
report. The researcher organises his/her thoughts and mentions methods to
be used for data collection, analysis techniques, major findings of the research
and limitations faced by him/her during the study. The recommendations of
the study are also described in the rough draft.
4. Reviewing the rough draft: The researcher checks whether the report conveys the intent of the research work to be carried out. In addition, at this stage, the researcher also checks whether the report is apt for the target audience.
5. Preparing bibliography: Bibliography is a section of the report that contains sources of secondary data collection. It includes names of books, journals, magazines and other sources of print media from where the data is collected. It also contains the Internet links used in the preparation of the research report. There is a proper pattern to write the name of the source from where the data is collected.
Multiple styles of referencing can be used such as APA citation, Harvard referencing and MLA format, each having its unique rules for the structure of references with respect to author name, book title, date, publisher name, etc. Let us understand the pattern of mentioning data sources in bibliography with the help of the following examples:
For books and pamphlets, the order of writing in APA referencing is as follows:
Last name of the author, initials of the first name. (year). Title of the book (edition). Place: Publisher name.
For example,
Sekaran, U., & Bougie, R. (2016). Research Methods for Business (4th ed.). New York: Wiley.
For websites, the order of writing in APA referencing is as follows:
Article title. (year). Retrieved from URL.
For example,
4 Types of Research Methods For Start-Ups. (2019). Retrieved from https://www.bl.uk/business-and-ip-centre/articles/4-basic-research-methods-for-business-start-ups


6. Making the final draft: At this stage of report writing, the researcher gives
a final touch to his/her report. The final report is prepared keeping in mind
the objective of the research. It should be simple, concise and convincing. At
this stage, it is checked whether all the portions of the research are covered
or not.
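The APA patterns for books and websites described under step 5 can be sketched as two small helper functions. This is only an illustrative sketch: the function names `format_apa_book` and `format_apa_website` are hypothetical, the author list is assumed to be non-empty with each name already written as "Last name, Initials", and real APA style has further rules that are not covered here.

```python
def format_apa_book(authors, year, title, edition, place, publisher):
    """Build a book reference: Authors (year). Title (edition). Place: Publisher."""
    if len(authors) == 1:
        author_part = authors[0]
    else:
        # Separate authors with commas and place "&" before the last one.
        author_part = ", ".join(authors[:-1]) + ", & " + authors[-1]
    return f"{author_part} ({year}). {title} ({edition}). {place}: {publisher}."


def format_apa_website(title, year, url):
    """Build a website reference: Title. (year). Retrieved from URL"""
    return f"{title}. ({year}). Retrieved from {url}"


# Hypothetical usage with the book example from the text.
book_reference = format_apa_book(
    ["Sekaran, U.", "Bougie, R."],
    2016,
    "Research Methods for Business",
    "4th ed.",
    "New York",
    "Wiley",
)
print(book_reference)
```

Keeping each pattern in one function means every entry in the bibliography follows the same order of author name, year, title, edition, place and publisher.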

12.3.2 ORAL PRESENTATIONS


Most of the time, oral presentations are given with the help of PowerPoint
software, which facilitates data presentation in the form of graphs and charts. These
presentations are preferred by most organisations, as they are less time-consuming
and more economical. An oral presentation can be given to a large audience
in a single instance, whereas a written report can be read by only one person in a
single instance.
An oral presentation lasts a maximum of 30 minutes. The researcher should
be able to explain his/her entire research work in the given time. He/she should
be persuasive and presentable enough to gain the attention of the target
audience. The researcher should also handle the queries of the audience patiently,
without getting irritated or frustrated, and should be well-prepared for the
presentation to minimise the chances of errors.
Exhibit: WRITING AN ACADEMIC MANUSCRIPT
The major steps involved in the process of writing or developing an academic
manuscript include:
1. Finalise the list of authors: Most research projects are carried out by
a team of researchers and each researcher contributes in a different way.
Some researchers make major contributions and are called lead researchers.
The names of all the researchers who fulfil the criteria for authorship of
the paper are finalised for inclusion in the research paper.
2. Start preparing the research paper documentation before the
experiments are complete: While writing about the experiments that
they are carrying out, the researchers may generate ideas that need to be
executed in addition to the ongoing experiments. So, it is always a good
practice to start writing before the experiments are completed.
3. Decide the time to publish: When the researchers are satisfied that
they are done with their experiments and that the findings of the research
tell a coherent story that adds value to the existing literature, it is time
to publish the paper.
4. Decide a suitable name for the research paper and draft an abstract:
Deciding the title and writing the abstract help in identifying which
experiments and results the research team would publish in a single
research paper. The team may decide to include the advanced stages of
experiments along with their findings in the next research paper.
5. Determine the format of your research paper: The three basic types
of research paper are full-length research papers (wide in scope, using
the Introduction, Methods, Results and Discussion (“IMRAD”) format),
short research papers (usually 3500 words or less) and rapid
communications (narrow in scope).

6. Peer-review: The research paper should be reviewed by the research
team internally.
7. Decide the content to be included in different sections: The research
team should divide the research paper into different sections such as
Introduction, Methods, Results, and Discussion. After this, they should
aggregate and place the relevant content under each heading.
8. Create the tables, graphs and figures: Before beginning to write the
paper, the researchers must prepare their figures and tables in advance.
9. Prepare the first draft: The research team must prepare the first draft
of the research paper. For this, the team may divide the sections
to be drafted among its members. However, before the research paper
is finalised, it must be reviewed by a single editor so that the writing
style of the research paper is consistent.
10. Revise the manuscript: Important activities at this stage
include making alterations, polishing the writing style and grammar,
and formatting the document.
11. List the references and make a bibliography: All the sources of data
that have been referred to or used by the research team must be included
under the references and bibliography sections.

Self Assessment Questions
2. Which of the following are the most common purposes of writing a
report?
a. Providing information b. Generating ideas
c. Finding solutions d. All of the above
3. Which of the following types of audience needs only one- or two-page
report?
a. Mathematicians b. Business firms
c. Students of literature d. Chemists
4. Oral presentations can be given to a large number of audiences in a
single instance, whereas written reports can be read by only one person
in a single instance. (True/False)
5. What is the duration of an oral presentation?
a. 10 mins b. 25 mins
c. 30 mins d. 40 mins

12.4 INTEGRAL PARTS OF A REPORT


A research report contains many sections that provide segregated research
information. Every part of the report is written and described in a different format.
The length, data, objective and style of every part are different.


The following points explain different parts of a report:


€€ Title page: It includes the heading of the report. The report should have
a descriptive title that gives an overview of the research. The title page
contains the name of the sponsor of the study, the name of the researcher
and the duration of the research. Some examples of research titles are as follows:
zz Study of the types of investors in the present scenario
zz Factors affecting consumer preferences during shopping
zz Impact of retail display and store design on customer buying behaviour
€€ Preliminary pages: These pages include the acknowledgement or preface of the
report, which states the topic of the research and the person who authorised
the researcher to conduct it. They also contain the names of the people
who have contributed to the research. The preface talks about the subject matter
of the report.
€€ Executive summary: It contains a brief account of the introduction, body and
conclusion of the research. It gives an idea of every segment in the report.
The summary can come at the start or end of the report. It depends on the
type of report and the way of report writing.
€€ Introduction and objective: It contains the detailed background of the
research topic and the purpose of conducting the research. For example, if
the research is carried out on an organisation’s product, the introduction would
include product features and the background, profile, market and future
plans of the organisation. It can also involve the industry background, which
includes information regarding main players and the level of competition in
the market.
€€ Body of the report: This part contains a detailed description of the research
topic. It also contains the methodology used in the research and the analysis of the
collected data.
€€ Findings, conclusion and recommendations: This part contains the major
findings of the research, the conclusion drawn from them and the
recommendations made.
€€ Bibliography and appendices: This part lists sources from where the research
data is collected. Bibliography contains sources of secondary data while
appendices contain the sources of primary data or some extra information
about the research topic. Appendices also contain the questionnaire or other
sources of acquiring data.
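The parts listed above can also be treated as a simple checklist against which a draft outline is compared. The following is a minimal sketch under stated assumptions: the names `REQUIRED_PARTS` and `missing_parts` are hypothetical, headings are matched by exact wording (ignoring case and surrounding spaces), and the order of sections is deliberately not checked because the executive summary may come at the start or the end of the report.

```python
# Parts of a report as named in this section (assumed to be the full checklist).
REQUIRED_PARTS = [
    "Title page",
    "Preliminary pages",
    "Executive summary",
    "Introduction and objective",
    "Body of the report",
    "Findings, conclusion and recommendations",
    "Bibliography and appendices",
]


def missing_parts(outline):
    """Return the required parts that do not appear in the draft outline."""
    present = {heading.strip().lower() for heading in outline}
    return [part for part in REQUIRED_PARTS if part.lower() not in present]


# Hypothetical draft outline with some parts still to be written.
draft_outline = ["Title page", "Executive summary", "Body of the report"]
for part in missing_parts(draft_outline):
    print("Missing:", part)
```

Such a check only confirms that each part exists; whether the length, data and style of each part suit the audience still has to be reviewed by the researcher.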

Self Assessment Questions
6. The ____________ contains the name of the sponsor of the research, the
name of the researcher and duration of the research.
7. Bibliography contains the sources of secondary data while appendices
contain the sources of primary data or some extra information about the
research topic. (True/False)
8. ____________ contain the questionnaire or other sources of acquiring
data.


12.5 SUMMARY
€€ A research proposal is an agreement between two parties. The first party
wants the research to be conducted and the second party actually conducts
the research.
€€ The research proposal includes purpose, population, research design,
methods of data collection, tests of significance, time frame and budget.
€€ A research report is a crucial part of research as it includes solutions and
actionable recommendations for the research problem.
€€ A research report can be of two types, namely written report and oral
presentation.
€€ Broadly, reports are classified into two types – technical report and popular
report.
€€ Technical report lays emphasis on the method employed in conducting
research, assumptions made during the research, details about the research
topic and the research findings and recommendations.
€€ Popular report is non-technical in nature and is less comprehensive as the
audience of this report is interested in knowing the results of the research,
not the entire analysis.
€€ The report writing process involves sequential steps, which are analysing the
subject matter, drawing the outline of the report, preparing the rough draft,
reviewing the rough draft, preparing the bibliography and making the final draft.
€€ Oral presentations are given with the help of PowerPoint software, which
facilitates data presentation in the form of graphs and charts.
€€ A research report contains many sections that provide segregated research
information, which are title page, preliminary pages, executive summary,
introduction and objective, body of the report, findings, conclusion and
recommendations, and bibliography and appendices.

12.6 KEY WORDS


€€ Final outline: The stage of the report writing in which the researcher makes
a structure or outline of the report.
€€ Population: The universe from which the researcher takes samples for the
research.
€€ Review of the rough draft: The stage in which the researcher reviews
his/her report.

€€ Rough draft: The stage in which the researcher starts writing a report.

12.7 CASE STUDY: NEW PRODUCT


ABC Company wants to launch a new product, PR Paint, in a new market.
Therefore, it hires a research agency to conduct research and present a short
report of 1–2 pages. Through the research, the manager wants to know about
the market and the ways to enter the new market. The researchers prepare the
following research proposal to be submitted to the company:

RESEARCH PROPOSAL
Submitted to
Manager S.R. Dicosta
Submitted by
Veera Malhotra
Senior Researcher
RPS Research Institute
Name: Veera Malhotra
Designation: Senior Researcher
Location of the work: New Delhi
Working Days: Monday to Saturday
Working Hours: 9.30am to 5.00pm
Contact Number: +919XXXXX69
Time Frame for the Project: One month
Name of the Reporting Officer: Mr. S.R. Dicosta
Designation: Sales Manager
Contact Number: +919XXXXX66
Title of the Project
Study of Paint Industry in Delhi
Objectives
The objectives of the study are as follows:
€€ To study the paint industry in Delhi
€€ To determine the prospective customers of PR Paint
€€ To compare the paint industry and the present industry of ABC Company
€€ To give recommendations about the launch of PR Paint
Methodology
The research methodology of this project consists of:
1. Research Design
™ Descriptive research design
™ Hypothesis testing
2. Data collection
™ Primary data: Questionnaire and In-depth interviews

™ Secondary data: The Internet and articles in different sources (print
media)
3. Sample
™ Sampling: Purposive and convenience sampling
™ Sample size: 200
™ Sample population: Customers of similar products in the market
4. Tools
™ Excel
™ SPSS
™ TABLEAU
Importance of the Work
This study will help in knowing the competition level in the new market and
how the company will beat this competition and enter the market.
Expected Outcomes
From this project, we will come to know about the following:
€€ What is the size of the paint industry in Delhi?
€€ What are the products in this industry at present?
€€ What is the level of competition in the market?
€€ How to enter the market?
€€ What product development strategy will meet the customers’ requirements?
After sending the research proposal, the researcher starts the research and writes a
report after its completion. The report is as follows:

Research Report
Study of Paint Industry in Delhi
Findings of the study
The major findings of the research are as follows:
€€ The paint industry in Delhi is very large.
€€ The paint market is highly competitive because a huge variety of paints
with new colour combinations is available. However, there is scope to
enter the paint market with new and innovative ideas.
€€ Customers always want to use new colours for their offices and houses.
Recommendations of the Study
Some recommendations made to ABC Company are as follows:
€€ Introduction of a new product, PR Paint, in the new market could be a
good decision.

€€ Consider the requirements of consumers while introducing a new product
in the paint market. Consumers want a paint that gives a
smooth finish, different shades and innovative colour combinations.
€€ Provide customers with a distinct range of colours, which helps in keeping the
product price higher.
Objectives of the Study
The objectives of the research are as follows:
€€ To study the paint industry in Delhi
€€ To study the market of PR Paint
€€ To compare the paint industry and the present industry of ABC Company
€€ To give recommendations about the launch of PR Paint
Data Collection
The data is collected from the following sources:
€€ Primary data: Includes the data collected by conducting interviews with
shopkeepers and customers. The shopkeepers were asked which company
was providing them better discounts and which products were preferred
by the customers. The customers were asked about how and why they
selected a particular paint. Also, customers’ requirements in terms of
expected product design or features were analysed.
€€ Secondary data: Includes the data collected from the Internet, books and
articles on the paint industry in Delhi. The researcher also used documents
of the company to know about its background.
Results
The researcher explains the results of the research with the help of SWOT
(Strengths, Weaknesses, Opportunities and Threats) analysis, which is presented
in the following table:
Strengths
€€ Strong brand image
€€ Dedicated sales team
€€ Value-added services
Weaknesses
€€ Centralised structure
€€ Rigid department heads
Opportunities
€€ Large untapped market
€€ Distinguishable product (such as PR Paint)
€€ Unsatisfied customer
€€ New area of expansion
Threats
€€ Presence of very strong competitors
€€ Aggressive marketing by competitors
€€ Various paints of good quality

Table: SWOT Analysis

QUESTIONS
1. Which type of report is used in the case study?
(Hint: Popular report is used in the case study.)

2. What is the conclusion given by the researcher in the case study?
(Hint: The researcher concluded that the competition is very high in the
paint industry.)

12.8 EXERCISE
1. Explain the research proposal.
2. What do you mean by a research report?
3. Explain the concept of written report in detail.
4. Discuss the integral parts of a report.

12.9 ANSWERS FOR SELF ASSESSMENT QUESTIONS


Topic                        Q. No.   Answer
Research Proposal            1.       research proposal
Research Report              2.       d. All of the above
                             3.       b. Business firms
                             4.       True
                             5.       c. 30 mins
Integral Parts of a Report   6.       title page
                             7.       True
                             8.       Appendices

12.10 SUGGESTED BOOKS AND E-REFERENCES

SUGGESTED BOOKS
€€ Biddle, J., & Emmett, R. Research in the history of economic thought and
methodology.
€€ Chandra, S., & Sharma, M. Research methodology.
€€ National Academies Press. (2009). Partnerships for emerging research institutions.
Washington, D.C.

E-REFERENCE
€€ Research Guides: Organising Your Social Sciences Research Paper: 6. The
Methodology. (2018). Retrieved from http://libguides.usc.edu/writingguide/methodology
€€ Research Methodology. (2018). Retrieved from https://explorable.com/research-methodology
€€ Research Methodology. (2018). Retrieved from https://books.google.com/books/about/Research_Methodology.html?id=x_kp__WmFzoC
€€ Research Methods. (2018). Retrieved from https://research-methodology.net/research-methods/
