
RESEARCH METHODOLOGY

UNIT 01: INTRODUCTION TO RESEARCH

Structure

 Introduction
 What is Research?
 Types of Research
 Exploratory Research
 Conclusive Research
 Process of Research
 Research Applications in Social and Business Sciences
 Features of a Good Research Study
 Summary

Introduction
You might have watched on TV the panel discussion that takes place before a cricket
match starts. The facilitator asks the panel members questions like:

 Which team will win the match today? Will Sachin Tendulkar score a century?
 What will be the score that the batting side will pile up?

To answer these questions, the panel members quote factors such as the following:

 The outcome of previous instances when the two sides met and the winning streak of the teams at the venue
 The number of centuries Tendulkar has scored on a particular ground and against the opposite side
 Weather conditions, etc.

The panel members are using the existing evidence or data systematically to make match predictions. In other words, we could say that they are using research methodology to answer the questions.
Research methodology refers to the procedures used in making systematic observations or
otherwise obtaining data, evidence, or information as part of a research project or study. It
defines what the activity of research is, how to proceed, how to measure progress and what
constitutes success. We will study more about the various aspects of research methodology in
this unit. But first, let us understand what research is.

Research helps in decision making, especially in business. Effective decisions lead to managerial success, and this requires reducing the element of risk and uncertainty. For
example, let us say, an ice-cream company has come up with a new flavour of ice-cream,
which is a mixture of mango and vanilla. They are thinking of two names –'Aam Masti' or
'Mango Mania'. They would like to sell the ice-cream to children and are not sure which

name would be more appealing. One of the ways in which this can be done is by using the
scientific method of inquiry and following a structured approach to collect and analyse
information and then eventually subject it to the manager's judgement. This is no magic
mantra but a scientific and structured tool available to every manager, namely, research.
Thus, research refers to a wide range of activities involving a search for information, which is
used in various disciplines.
Research activities may range from a simple collection of facts (for example, the number of MBA students who opt for higher studies abroad in a particular institute) to validation of information (for example, is the new diet cola more popular among women?) to an exhaustive theory and model construction (for example, constructing a model of India's weather patterns in 2050 based on climate change projections). In this unit, we will discuss the meaning of research, the types of research available to a researcher and the process of a
research study. We will also discuss the application of research in different areas of
management and describe the features of a good research study.

What is Research?
Different scholars have interpreted the term 'research' differently. Fred Kerlinger
(1986) stated that 'Scientific research is a systematic, controlled and critical investigation of
propositions about various phenomena'. Grinnell (1993) has simplified the debate and stated
'The word research is composed of two syllables, 're' and 'search'.
The dictionary defines the former as a prefix meaning 'again', 'anew' or 'over again'. Search is
defined as a verb meaning 'to examine closely and carefully', 'to test and try', or 'to probe'.
Together, they form a noun describing a careful, systematic, patient study and investigation in
some field of knowledge, undertaken to establish facts or principles. Thus, drawing from the
common threads of the above definitions, we derive that management research is an unbiased,
structured, and sequential method of enquiry, directed towards a clear implicit or explicit
business objective. This enquiry might lead to proving existing theorems and models or
arriving at new theories and models. Let us now understand each part of the definition. The
most important and difficult task of a researcher is to be as objective and neutral as possible.
Even though the researcher might have a lot of knowledge about the topic, he/she must not
try to deliberately get results in the direction of the hypotheses.

The second thing to be remembered is that you follow a structured and sequential method of
enquiry. For example, you may want to look at the options you can choose from if you want to study abroad. You search the Internet and ask your relatives and friends about the options for studying abroad. This is search and not research. For research, there must be a
structured approach that you need to follow, and then only will it be called scientific. Thus,
you may do a background analysis of how many students go abroad to study, and based on
this, form a hypothesis that 80 per cent of young Indians go to universities in the USA for
further study. Then, you conduct a small survey amongst the students who are intending to go
abroad for study. Based on the data collected, you are able to prove or disprove the
hypotheses. So, we can state that you had conducted a research study. We will understand the
process of research later in the unit. The last and most important aspect of our definition that
needs to be carefully considered is the decision-assisting nature of business research. As
Easterby-Smith et al. (2002) state, business research must have some practical consequences,
either immediately, when it is conducted for solving an immediate business problem or when
the theory or model developed demands that managers and researchers work towards a goal
—whether immediate or futuristic, else the research loses its significance in the field of
management.

The advantage of doing research is that one is able to take a decision with more confidence as
one has tested it through research. For example, you can conduct a study of young women
professionals and see that they have a need for a night crèche facility when they need to go
out of town on official duty. Here, you may conduct a small research study to test what facilities they would like in this crèche and how much they would be willing to pay for this facility. In fact, it would not be wrong to say that without the tool of research there would be no new business practices or methods, as no one would want to start something new (for example, launch a new product, enter a new market segment, etc.) without testing it through research.

Types of Research
Though every research conducted is unique, it is possible to categorize the research
approach that you may decide to take.
Sometimes, research may be done for a purely academic reason or a need to know.
For example, studies on employee dissatisfaction and attrition led to the study of impact of
fixed working hours on family life and responsibilities. This study led to the organizations
realizing that they need to have flexible working hours so that employees can better manage
their work-life balance. The context of this kind of study is vast and time period, flexible.
This type of research is termed as fundamental or basic research. On the other hand, you have
studies that are specific to a particular business decision. For example, you want to study the
reason why a particular product is not doing well and you need to identify the reasons for
this, in order to take corrective action. Thus, the study you undertake would be of practical
value to the specific organization. Secondly, it has implications for immediate action. This
action-oriented research is termed as applied research.
However, at this juncture we would like to advise the reader not to look at the two as
opposites of each other. It may happen that the research which started as applied might lead
to some fundamental and basic research, which expands the body of knowledge or vice versa.
The process followed in both basic and applied research is systematic and scientific; the
difference between them could simply be a matter of context and purpose.
Research studies can also be classified on the basis of the nature of inquiry or the objective behind conducting them. Based on the nature of inquiry or the objective, research can
be of the following types:
 Exploratory research
 Conclusive research
Exploratory Research
As the name suggests, exploratory research is used to gain a deeper understanding of
the issue or problem that is troubling the decision maker. The idea is to provide direction to
subsequent and more structured and rigorous research. The following are some examples of
exploratory research:

 Let us say a diet food company wants to find out what kind of snacks customers like
to eat and where they generally buy healthy food from.
 A reality show producer wants to make a show for children. He would like to know
what kind of shows children like to watch.
 There is an investment bank that would like to know from its customers about what
kind of help they want from the bank while making their investments.

As can be seen from the examples above, an informal exploratory study would be needed.
Exploratory research studies are less structured, more flexible in approach and sometimes
could lead to some testable hypotheses. Exploratory studies are also conducted to develop the
research questionnaire.
(These will be discussed in detail in Unit 3.) The nature of the study being loosely structured
means that the researcher's skill in observing and recording all possible information will
increase the accuracy of the findings.

Conclusive Research
Conclusive research is carried out specifically to test and validate the study hypotheses. In contrast to exploratory research, these studies are more structured and definite. The variables and constructs in the research are clearly defined; for example, studying customer satisfaction with the different pizzas on the menu of Pizza Hut amongst heavy consumers of pizza. This requires, first, a clear definition of customer satisfaction and, secondly, a way of identifying heavy users.

The timeframe of the study and respondent selection are more formal and representative. The emphasis on the reliability and validity of the research findings is all the more significant, as the results might need to be implemented.
Based on the nature of investigation required, conclusive research can
further be divided into the following types:

 Descriptive research

 Causal research

Descriptive research
The main goal of descriptive research is to describe the data and characteristics about
what is being studied. The annual census carried out by the Government of India is an
example of descriptive research. The census describes the number of people living in a
particular area. It also gives other related data about them. It is contemporary and time-
bound. Some more examples of descriptive research are as follows:

 A study to distinguish between the characteristics of the customers who buy normal
petrol and those who buy premium petrol

 A study to find out the level of involvement of middle level versus senior level
managers in a company's stock-related decisions

 A study on the organizational climate in different organizations

All the above research studies are conducted to test specific hypotheses and trends. For example, in the second case we might test the hypothesis that the level of involvement of senior-level managers in stock-related decisions is higher than that of middle-level managers. Thus, these studies are more structured and require a formal, specific and systematic approach to sampling, collecting information and testing the data to verify the

research hypotheses.

Causal research
Causal research studies explore the effect of one thing on another and more
specifically, the effect of one variable on another. For example, if a fast-food outlet currently
sells vegetarian fare, what will be the impact on sales if the price of the vegetarian food is
increased by 10 per cent? Causal research studies are highly structured and require a rigid
sequential approach to sampling, data collection and data analysis. This kind of research, like
research in pure sciences, requires experimentation to establish causality. In the majority of situations, it is quantitative in nature and requires statistical testing of the information collected.
Process of Research
Any research starts with the need and desire to know more. This need might be purely
academic (basic or fundamental research) or there might be an immediate business decision
that requires an effective and workable solution (applied research).
While conducting research, information is gathered through a sound and scientific research
process. Each year, organizations spend enormous amounts of money on research and
development in order to maintain their competitive edge. Thus, we propose a broad framework that can easily be followed in most research studies.

In the following paragraphs, we will briefly discuss the steps that in general any research
study might follow.
The Management Dilemma
Any research starts with the need and desire to know more. This is essentially the
management dilemma. It could be the researcher himself or herself, or it could be a business
manager who gets the study done by a researcher. The need might be purely academic (basic
or fundamental research) or there might be an immediate business decision that requires an
effective and workable solution (applied research).
Defining the research problem
This is the first and the most critical step of the research journey. For example, a soft
drink manufacturer who is making and selling aerated drinks now wants to expand his
business. He wants to know whether moving into bottled water would be a better idea or he
should look at fruit juice based drinks. Thus, a comprehensive and detailed survey of the
bottled water as well as the fruit juice market will have to be done. He will also have to
decide whether he wants to know consumer acceptance of a new drink. Thus, there has to be
complete clarity in the mind of the researcher regarding the information he must collect.
Formulating the research hypotheses
In the model, we have drawn broken lines to link the research problem definition stage to the hypotheses formulation stage. The reason is that every research study might not
always begin with a hypothesis; in fact the task of the study might be to collect detailed data
that might lead to, at the end of the study, some indicative hypotheses to be tested in

subsequent research. For example, while studying the lifestyle and eating-out behaviour of
consumers at Pizza Hut, one may find that the young student group consumes more pizzas.
This may lead to a hypothesis that young consumers consume more pizzas than older
consumers.
A hypothesis is, in fact, an assumption about the expected results of the research. For example, in the above example of work-life balance among women professionals, we might start with the hypothesis that the higher the work-family conflict, the higher the intention to leave the job. Conversion of the defined problem into working hypotheses will be discussed in Unit 2.
Developing the research proposal
Once the management dilemma has been converted into a defined problem and a
working hypothesis, the next step is to develop a plan of investigation. This is called the
research proposal. The reason for its placement before the other stages is that before you start
you need to spell out the research problem, the scope and the objectives of the study and the
operational plan for achieving this. The proposal is a flexible contract about the proposed
methodology and once it is made and accepted, the research is ready to begin. The
formulation of a research proposal, its types and purpose will be explained in the next unit.
Research design formulation
Based on the orientation of the research, i.e., exploratory, descriptive or causal, the
researcher has a number of techniques for testing the stated objectives. These are termed in
research as research designs. The main task of the design is to explain how the research
problem will be investigated. There are different kinds of designs available to you while doing research. These will be discussed in detail in Unit 3.
Sampling design
It is not always possible to study the entire population. Thus, one goes about studying
a small and representative sub-group. This sub-group is referred to as the sample of the study.
There are different techniques available for selecting the group based on certain assumptions.
The most important criterion for this selection would be the representativeness of the sample
selected from the population under study.

Two categories of sampling designs available to the researcher are probability and
non-probability. In the probability sampling designs, the population under study is finite and
one can calculate the probability of a person being selected. On the other hand, in non-
probability designs, one cannot calculate the probability of selection. The selection of one or
the other depends on the nature of the research, degree of accuracy required (the probability
sampling techniques reveal more accurate results) and the time and financial resources
available for the research. Another important decision the researcher needs to take is to
determine the best sample size to be selected in order to obtain results that can be considered
as representative of the population under study.
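To make the distinction concrete, the following sketch contrasts probability (simple random) and non-probability (convenience) sampling and works out an indicative sample size. This is a minimal Python illustration, not part of the original text; the population, the 100-unit samples and the use of the standard proportion-based sample-size formula are assumptions made purely for the example.

```python
# A minimal sketch (hypothetical population) contrasting probability and
# non-probability sampling, plus an indicative sample-size calculation.
import random
import math

population = [f"customer_{i}" for i in range(1, 5001)]   # hypothetical finite population

# Probability sampling: simple random sampling -- every member has a known,
# equal chance (here 100/5000 = 2 per cent) of being selected.
random.seed(42)
probability_sample = random.sample(population, k=100)

# Non-probability sampling: convenience sampling -- taking whoever is easiest
# to reach (here simply the first 100 names); selection probabilities are unknown.
convenience_sample = population[:100]

# Indicative sample size for estimating a proportion at a 95 per cent
# confidence level with a 5 per cent margin of error, using the standard
# formula n = z^2 * p * (1 - p) / e^2 (an assumption for this example).
z, p, e = 1.96, 0.5, 0.05        # p = 0.5 gives the most conservative size
n = math.ceil(z ** 2 * p * (1 - p) / e ** 2)
print(n)                         # about 385 respondents
```

With p taken as 0.5, the formula gives its largest value, which is why a figure of roughly 385 respondents is often quoted for a 95 per cent confidence level and a 5 per cent margin of error.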

Planning and collecting the data for research


We have placed planning and collecting data for research as simultaneous to the sampling
plan. The reason for this is that the sampling plan helps in identifying the group to be studied

and the data collection plan helps in obtaining information from the specified population. The
data collection methods may be classified into secondary and primary data methods. Primary data, as the name suggests, is original and collected first hand for the problem under study. There are a number of primary data collection methods, such as personal/telephonic interviews, mail surveys and questionnaires.
Secondary data is information that has been collected and compiled earlier, for example, company records, magazine articles, expert opinion surveys, sales records, customer feedback, government data and previous research done on the topic of interest. This step in
the research process requires careful and rigorous quality checks to ensure the reliability and
validity of the data collected.

Data refining and preparation for analysis


Once the data is collected, it must be refined and processed in order to answer the
research question(s) and test the formulated hypotheses (if any). This stage requires editing of
the data for any omissions and irregularities. Then it is coded and tabulated in a manner in
which it can be subjected to statistical testing. In the case of data that is subjective and qualitative, the information collected has to be post-coded, i.e., coded after the data has been collected.
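As a rough illustration of this editing and coding step, the sketch below uses hypothetical survey columns and codes and assumes the pandas library is available; it is only meant to show the idea of converting edited responses into numerals that can be tabulated and tested.

```python
# A minimal sketch (hypothetical column names and codes) of editing and
# coding survey data before statistical testing.
import pandas as pd

raw = pd.DataFrame({
    "respondent": [1, 2, 3, 4],
    "gender": ["Male", "Female", None, "Female"],            # one omission
    "satisfaction": ["Agree", "Strongly Agree", "Agree", "Disagree"],
})

# Editing: drop records with omissions or irregularities.
edited = raw.dropna()

# Coding: convert categorical answers into numerals.
likert_codes = {"Strongly Disagree": 1, "Disagree": 2, "Neutral": 3,
                "Agree": 4, "Strongly Agree": 5}
edited = edited.assign(
    gender_code=edited["gender"].map({"Male": 1, "Female": 2}),
    satisfaction_code=edited["satisfaction"].map(likert_codes),
)

# Tabulation: a simple frequency table of the coded responses.
print(edited["satisfaction_code"].value_counts())
```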
Data analysis and interpretation of findings
This stage requires selecting the analytical tools for testing the obtained information.
There are a number of statistical techniques available to the researcher—frequency analysis, percentages, arithmetic mean, t-test and chi-square analysis. These will be explained in the later units. Qualitative methods, such as interviews and focus group discussions, are also available to the researcher. Once the data has been analysed and summarized, linking the results with the research objectives and stating clearly the implications of the study is the most important task for the researcher.
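For instance, two of the tests named above, the t-test and the chi-square test, could be run as in the sketch below. The data are made up purely for illustration and the snippet assumes the SciPy library is available; it is a sketch of the idea, not a prescribed procedure.

```python
# A minimal sketch (made-up data) of a t-test and a chi-square test.
import numpy as np
from scipy import stats

# Independent-samples t-test: do two consumer groups differ in mean spend?
group_a = np.array([220, 250, 240, 260, 230, 255])
group_b = np.array([200, 210, 225, 215, 205, 220])
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# Chi-square test of independence: is flavour preference related to gender?
# Rows = gender, columns = preferred flavour (hypothetical counts).
observed = np.array([[30, 20],
                     [25, 35]])
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```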
The research report and implications for the manager's dilemma
The report preparation, from the problem formulation to the interpretation, is the final part of
the research process. As we stated earlier, business research is ultimately always directed
towards answering the question 'so what are the implications for the corporate world?' Thus,
in this step, the researcher's expertise in analyzing, interpreting and recommending is very
important. This report has to give complete details about everything that was done, right from problem formulation to methodology to the conclusions that bring the study to an end. The
nature of the report may be different depending on whether it is meant for a business person
or is an academic report. This will be discussed in detail in Unit 13.
Research Applications in Social and Business Sciences
Research is a crucial element in the area of business. It helps the decision maker to
identify new opportunities for business growth. Research provides information about various
aspects of business, like product life cycle, consumer behaviour, market opportunities and
threats, technological changes, social changes, economic changes, environmental changes,
and so on, which are important for any decision maker to run the business smoothly.
Research is crucial in the following areas of business:

Marketing function:
Research is the lifeline in the field of marketing, where it is carried out on a vast area
of topics and is conducted both in-house by the organization itself and outsourced to external
agencies. This could be related to the 4 Ps—product, price, place and promotions.

Personnel and human resource management:


Human resources (HR) and organizational behaviour is an area which involves basic
or fundamental research as a lot of academic, macro-level research may be adapted and
implemented by organizations into their policies and programmes.

Financial and accounting research:


The area of financial and accounting research is quite vast and includes asset pricing,
corporate finance and capital markets, market-based accounting research, modelling and
forecasting in volatility, risk, etc.

Production and operations management:

The area of production and operations management is one in which research results are implemented, taking on huge cost and process implications. Research in this area relates to operations planning, demand forecasting, process planning, project management, supply chain management, and quality assurance and management.
Research in social science includes an in-depth study and evaluation of human
behaviour by using scientific methods in either quantitative or qualitative manner. As social
science is concerned with the study of society and human behaviour, it is important for a business organization in terms of understanding its customers: their tastes, needs, preferences, lifestyles and behaviour. New products or services are unlikely to succeed without proper consumer studies and surveys.

Features of a Good Research Study


In the above sections, we learnt that research studies can vary from the loosely
structured method based on observations and impressions to the strictly scientific and
quantifiable methods. However, for a research to be of value, it must possess the following
characteristics:
a) It must have a clearly stated purpose. This not only refers to the objective of the study,
but also precise definition of the scope and domain of the study.
b) It must follow a systematic and detailed plan for investigating the research problem. Conducting the research systematically also requires that all the steps in the research process are interlinked and follow a sequence.
c) The selection of techniques for collecting information, sampling plans and data
analysis techniques must be supported by a logical justification about why the methods were
selected.
d) The results of the study must be presented in an unbiased, objective and neutral
manner.

e) The research at every stage and at any cost must maintain the highest ethical
standards.
f) And lastly, the reason for a structured, ethical, justifiable and objective approach is the
fact that the research carried out by you must be replicable. This means that the process
followed by you must be 'reliable', i.e., in case the study is carried out under similar
conditions it should be able to reveal similar results.
Summary
Let us recapitulate the main points discussed in this unit:

 Research is a tool of special significance in all areas of management. It can be


defined as an unbiased, structured and sequential method of inquiry, directed towards a
clear implicit or explicit business objective. This enquiry might lead to proving existing
postulates or arriving at new theories and models.

 Research may be done for a purely academic reason or a need to know (fundamental or basic research), or it could be undertaken because it would be of practical value to an organization, with implications for immediate action (applied research).

 Based on the nature of inquiry or the objective, research can be exploratory or


conclusive research.

 Conclusive research can be of two types—descriptive or causal studies. A research study usually follows a structured sequence of steps:

o Developing and defining the research problem
o Formulating the study hypothesis
o Developing the study plan or proposal
o Identifying the research design
o Designing the sampling approach
o Conceptualizing and developing the data collection plan
o Executing data analysis
o Working out data inference and conclusions
o Compiling and preparing the research report

 Different kinds of studies are carried out in the area of business management, such
as marketing, finance, human resources and operations, each having their own
orientation and approach.

 For a research to be recognized as significant, it must follow some basic criteria –


clearly stated purpose; a systematic and detailed plan; logical justification for the selection of techniques for collecting information, sampling plans and data analysis techniques; unbiased, objective and neutral results; ethical standards; and a sequential and replicable approach.

UNIT 02: RESEARCH PROBLEM & FORMULATION OF RESEARCH HYPOTHESIS

Structure

 Introduction
 Defining the Research Problem
 Management Problem vs. Research Problem
 Problem Identification Process
 Components of the Research Problem
 Formulating the Research Hypotheses
 Types of Research Hypotheses
 Writing a Research Proposal
 Contents of a Research Proposal
 Types of Research Proposal
 Summary
 Keywords

Introduction
In the last unit, you were introduced to the meaning of research as well as its types, process and
features. In this unit, we will focus on the research problem and the formulation of the
research hypothesis. The most important aspect of the business research method is to identify
the 'what', i.e., what is the exact research question to which you are seeking an answer. The
second important thing is that the process of arriving at the question should be logical and
follow a line of reasoning that can lend itself to scientific inquiry. This reasoning approach
needs to be converted into a possible research question. And based on the initial study of the
research topic, one should be able to make certain assumptions which can lend direction to
the study as research hypotheses. Thus, in this unit, we will understand how to identify a
problem that can be subjected to research and help us reduce decision risks.

Defining the Research Problem


The challenge for a business manager is not only to identify and define the decision problem;
the bigger challenge is to convert the decision into a research problem that can lead to a
scientific inquiry. As Powers et al. (1985) have put it, 'Potential research questions may occur
to us on a regular basis, but the process of formulating them in a meaningful way is not at all an easy task'. One needs to narrow down the decision problem and rephrase it into workable research questions. Thus, the first and the most important step of the research process is like
the start of a journey, in this instance the research journey, and the identification of the
problem gives an indication of the expected result. A research problem can be defined as a
gap or uncertainty in the decision makers' existing body of knowledge which inhibits efficient
decision making. Sometimes it may so happen that there might be multiple alternative paths
one can take and we will have to select which of these we would like to consider as the
problem to be studied.
As Kerlinger (1986) states, 'If one wants to solve a problem, one must generally know
what the problem is. It can be said that a large part of the problem lies in knowing what one is
trying to do'. The defined research problem might be classified as simple or complex.
Simple problems are those that are easy to understand and the components and identified
relationships are linear, e.g., the relation between cigarette smoking and lung cancer.
Complex problems on the other hand, talk about the interrelationship between multiple
variables, e.g., the effect of job autonomy and organizational commitment on work
exhaustion, at the same time considering the interacting (combined) effect of autonomy and

commitment. This might be further different for males and females. These kinds of problems
require a model or framework to be developed to define the research approach.

Management Problem vs. Research Problem


The problem recognition process starts with the decision maker and some difficulty or
dilemma that he/she might be facing. Sometimes, this might be related to actual and
immediate difficulties faced by the manager (applied research) or gaps experienced in the
existing body of knowledge (basic research). The broad decision problem has to be narrowed
down to an information-oriented problem which focuses on the data or information required
to arrive at any meaningful conclusion. Given in Table 2.1 is a set of decision problems and
the subsequent research problems that might address them. 

Formulating research problem 

The steps involved in formulating the research problems are to: 

 Develop a suitable title


 Build a conceptual model of the problem
 Define objective of the study
 Set up investigative questions
 Formulate hypothesis
 State the operational definition of concepts
 Determine the scope

We can say that a management problem is a difficulty faced by the decision maker and by itself cannot be tested. In case the decision maker is a business manager, the management problem is looking for answers to the difficulty faced by the manager, as in the above example of how to reduce the turnover rate in a BPO company. This problem has to be reduced to a simpler form of research question. And as said earlier, there can be more than one research problem that can help the manager in taking a decision. It will depend on how the researcher looks at it. For example, he may say that the research problem is:

 What are the management policies in other BPO companies?


 Why do the employees leave the company? What is the problem area?
 Are the shift duties creating a problem of work family conflict which is why they
leave?
 How can the company work on employee engagement so that he stays with the
company?

Thus, as you can see we can have many questions. Finally, the research problem you think is
likely to give the possible solution is the one you decide to take as your research problem.

Problem Identification Process

The process of identifying the research problem involves the following steps:

1. Management decision problem: The entire process explained above begins with
the identification of the difficulty encountered by the business manager/researcher. The manager could do the study himself or give it to a researcher or a research agency. This step requires the researcher/decision maker to carry out a problem appraisal, which
would involve a complete audit of the origin and symptoms of the diagnosed business
problem. 

2. Discussion with subject experts: The next step involves getting the problem in the
right perspective through discussions with industry and subject experts. These
individuals are knowledgeable about the industry as well as the organization. They
could be found both within and outside the company. The information about the current
state and the future projections can be obtained in an interview. Thus, the researcher
must have a predetermined set of questions related to the doubts experienced in
problem formulation. It should be remembered that the purpose of the interview is
simply to gain clarity on the problem area and not to arrive at any kind of conclusion or
solutions to the problem. For example, for the organic food study, which is mentioned
in Table 2.1 as a decision problem, the researcher might decide to go to food experts
like doctors and dieticians to seek their opinion. This data should, in practice, be
supported with secondary data in the form of theory as well as organizational facts. 

3. Review of existing literature: A literature review is a comprehensive collection of


the information obtained from published and unpublished sources of data in the
specific area of interest to the researcher. This may include journals, newspapers,
magazines, reports, government publications, and also computerized databases. The
advantage of such a literature survey is that it provides different perspectives and methodologies to be
used to investigate the problem, as well as identifying possible variables that may be
studied. Second, the survey might also show that our research problem has already
been investigated and this might be useful in solving the decision dilemma. It also
helps in narrowing the scope of the study into a research problem. Once the data has
been collected, the researcher must write it down in his/her own words and clearly
show how this is linked to the research topic under study. The logical and theoretical
framework developed on the basis of past studies should be able to provide the
foundation for the problem statement. The reporting should cite the author and the year
of the study clearly. There are several internationally accepted forms of citing
references and quoting from published sources. The Publication Manual of the
American Psychological Association (2001) and the Chicago Manual of Style (1993)
are academically accepted as referencing styles in management. 

4. Organizational analysis: Another significant source for deriving the research


problem is the industry and organizational data. In case the researcher/investigator is
the manager himself/herself, the data might be easily available. This data needs to
include the organizational demographics—origin and history of the firm; size, assets,
nature of business, location and resources; management philosophy and policies as well

as the detailed organizational structure, with the job descriptions. It is to be
remembered here that the organizational data might not be always essential, for
example in case of basic research, where the nature of study is not company specific
but general. 

5. Qualitative survey: Sometimes, the expert interview, secondary data and


organizational information might not be enough to define the problem. In such a case, a
small exploratory qualitative survey can be done to understand the reason for the same.
For example, soaps like Dove may be very good in terms of price and quality but very
few people in the smaller towns buy it. When we do a secondary data analysis, or talk
to experts, there seems to be no problem. Then we do a quick round of interview with
women who come to a kirana store to find out why Dove is not bought. The women tell us that the same soap is used by the whole family, and that their husbands and sons do not use Dove because they consider it a soap for women; this is why they do not buy Dove. These surveys are thus done on small samples and might make use of focus
group discussions or interviews with the respondent population to help uncover
relevant and current issues which might have a significant bearing on the problem
definition. In the organic food research, focus group discussions with young and old
consumers revealed the level of awareness about organic food and consumer
sentiments related to purchase of more expensive but a healthy food product. 

6. Management research problem: Once the audit process of secondary review and
interviews and survey is over, the researcher is ready to focus and define the issues of
concern that need to be investigated further, in the form of an unambiguous and clearly
defined research problem. Here, it is important to remember that simply using the word
'problem' does not mean that there is something wrong that has to be corrected; it simply indicates gaps in the information or knowledge base available to the researcher.
These might be the reason for his inability to take the correct decision. Second,
identifying all possible dimensions of the problem might be a monumental and
impossible task for the researcher. For example, the lack of sales of a newly launched
product could be due to consumer perceptions about the product, ineffective supply
chain, gaps in the distribution network, competitor offerings or advertising
ineffectiveness. It is the researcher who has to identify and then refine the most
probable cause of the problem and formalize it as the research problem. This would be
achieved through the five preliminary investigative steps indicated above. Once done,
the research problem has to be clearly defined in terms of certain components. This
will be discussed in the next section.
 

7. Theoretical foundation and model building: Having identified and defined the
variables under study, the next step is to try and form a theoretical framework. It can be
best understood as a schema or network of the probable relationship between the
identified variables. Another advantage of the model is that it clearly shows the
expected direction of the relationships between the concepts. There is also an
indication of whether the relationship would be positive or negative.
This step, however, is not mandatory as sometimes the objective of the research is to
explore the probable variables that might explain the observed phenomena and the
outcome of the study helps to finally develop a conceptual model.

8. Statement of research objectives: Next, the research question(s) that were
formulated need to be broken down into tasks or objectives that need to be met in order
to answer the research question. This section makes active use of verbs such as 'to find
out', 'to determine', 'to establish', and 'to measure' so as to spell out the objectives of the
study. In certain cases, the main objectives of the study might need to be broken down
into sub-objectives which clearly state the tasks to be accomplished. 

In the organic food research, the objectives and sub-objectives of the study were as follows:

1. To study the existing organic market:


o To categorize the organic products available in Delhi into grain, snacks,
herbs, pickles, squashes, and fruits and vegetables
o To estimate the demand pattern of various products for each of the above
categories
o To understand the marketing strategies adopted by different players for
promoting and propagating organic products
2. Consumer diagnostic research:
o To study the existing consumer profile, i.e., perception and attitudes
towards organic products and purchase and consumption patterns
o To study the potential customers in terms of consumer segments, level of
awareness, perception and attitude towards health and organic products
3. Opinion survey:
o To assess the awareness and opinions of experts such as doctors, dieticians
and chefs in order to understand organic consumption.

Components of the Research Problem

To address the problems of clarity and focus, we need to understand the components of a well-defined problem. These are:

a. The unit of analysis: The researcher must specify in the problem statement the
individual(s) from whom the research information is to be collected and on whom the
research results are applicable. This could be the entire organization, departments,
groups or individuals.
b. Research variables: The research problem also requires identification of the key
variables under study. A variable is any concept that varies and we can assign to it
numerals or values. A variable may be dichotomous in nature, that is, it can possess
two values, such as male-female or customer-non-customer. Values that can only fit into a prescribed number of categories are discrete variables, for example, Strongly Disagree (1) to Strongly Agree (5). There are still others that possess an indefinite set,
e.g., age, income and production data. These are called continuous variables.

Variables can be further classified into four categories, depending on the role they play in
the problem under consideration. These are:

1. Dependent variables
2. Independent variables
3. Moderating variables
4. Extraneous variables

1. Dependent variable (DV): The most important variable to be studied and analysed in the research study is the effect, i.e., the dependent variable. The entire research process is
involved in either describing this variable or investigating the probable causes of the
observed effect. Thus, this must be a measurable variable. For example, in the organic
food study, the consumer's purchase intentions as well as sales of organic food
products in the domestic market, could serve as the dependent variable.

2. Independent variable (IV): Any variable that can be stated as influencing or


impacting the dependent variable is referred to as an independent variable. Often, the task of the research study is to establish the relationship between the independent and dependent variable(s). In the organic food study, the consumers' attitude towards
healthy lifestyle could impact their organic purchase intention. Thus, attitude becomes
the independent and intention the dependent variable. Another researcher might want to
assess the impact of job autonomy and role stress on the organizational commitment of the employees; here, job autonomy and role stress are the independent variables.

3. Moderating variables (MV): Moderating variables are the ones that have a strong
effect on the relationship between the independent and dependent variables. These
variables must be considered in the expected pattern of relationship as they modify the
direction as well as the magnitude of the independent-dependent association. In the
organic food study, the strength of the relation between attitude and intention might be
modified by the education and the income level of the buyer. Here, education and
income are the moderating variables.

There might be instances when confusion might arise between a moderating variable
and an independent variable. Consider the following situation:
 Proposition 1: Turnover intention (DV) is an inverse function of organizational
commitment (IV), especially for workers who have a higher job satisfaction level
(MV).
While another study might have the following proposition to test:
 Proposition 2: Turnover intention (DV) is an inverse function of job satisfaction (IV), especially for workers who have a higher organizational commitment (MV).
Thus, the two propositions are studying the relation between the same three variables. However, the
decision to classify one as independent and the other as moderating depends on the
research interest of the decision maker. At this stage, we can clearly distinguish
between the different kinds of variables discussed above. An independent variable is
the prime antecedent condition which is qualified as explaining the variance in the
dependent variable; the moderating variable is a contributing variable which might
impact the defined relationship. 

4. Extraneous variables: These variables are outside the domain of the study and are responsible for chance variations, but in some instances, their effect might need to be controlled.
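One way to see how these roles differ in practice is to express them in a regression model, where the moderating variable enters as an interaction term. The sketch below uses synthetic data and hypothetical variable names (turnover intention as the DV, organizational commitment as the IV and job satisfaction as the MV) and assumes the pandas and statsmodels libraries are available; it is illustrative only, not a prescribed analysis.

```python
# A minimal sketch (synthetic data, hypothetical variable names) of a DV,
# an IV and a moderating variable expressed as a regression with interaction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
commitment = rng.normal(0, 1, n)                 # independent variable (IV)
satisfaction = rng.normal(0, 1, n)               # moderating variable (MV)
turnover = (-0.5 * commitment                    # dependent variable (DV)
            - 0.3 * commitment * satisfaction    # built-in moderation effect
            + rng.normal(0, 1, n))

df = pd.DataFrame({"turnover": turnover,
                   "commitment": commitment,
                   "satisfaction": satisfaction})

# 'commitment * satisfaction' expands to both main effects plus their
# interaction; a significant interaction coefficient indicates moderation.
model = smf.ols("turnover ~ commitment * satisfaction", data=df).fit()
print(model.summary())
```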

Formulating the Research Hypotheses


The problem identification process ends in the hypotheses formulation stage. Any
assumption that the researcher makes on the probable direction of the results that might be
obtained on completion of the research process is termed as a hypothesis. Unlike the research
problem that generally takes on a question form, the hypotheses are always in a sentence
form. The statements thus made can then be empirically tested. Kerlinger (1986) defines a
hypothesis as '…a conjectural statement of the relationship between two or more variables'. 
According to Grinnell (1993), 'A hypothesis is written in such a way that it can be proved or
disproved by valid and reliable data— it is in order to obtain these data that we perform our
study'.
While designing any hypothesis, there are a few criteria that the researcher must fulfill. These are:

 A hypothesis must be formulated in a simple, clear and declarative form.
 A broad hypothesis might not be empirically testable. Thus, it is advisable to make the hypothesis uni-dimensional and to test only one relationship between two variables at a time. For example:

 Consumer liking for the electronic advertisement for the new diet drink will have a positive impact on brand awareness of the drink
 High organizational commitment will lead to lower turnover intention
 A hypothesis must be measurable and quantifiable
 A hypothesis is a conjectural statement based on the existing literature and theories
about the topic and not based on the gut feel of the researcher
 The validation of the hypothesis would necessarily involve testing the statistical
significance of the hypothesized relation

Types of Research Hypotheses


Null Hypothesis: This is the conventional approach to making a prediction. It involves a
statement that says there is no relationship between two groups that the researcher compares
on a certain variable. The hypothesis also may state that there is no significant difference
when different groups are compared with respect to a variable. For example, “There is no
difference in the academic performance of high school students who participate in
extracurricular activities and those who do not participate in such activities” is a null
hypothesis. In many cases, the purpose of a null hypothesis is to allow the experimental
results to contradict the hypothesis and prove the point that there is a definite relationship.

Non-directional Hypothesis: Certain hypothesis statements convey a relationship between


the variables that the researcher compares, but do not specify the exact nature of this
relationship. This form of hypothesis is used in studies where there is not enough past research on which to base a prediction. Continuing with the same example, a non-directional
hypothesis would read, “The academic performance of high school students is related to their
participation in extracurricular activities.”

Directional Hypothesis: This type of hypothesis suggests the outcome the investigator

expects at the end of the study. Scientific journal articles generally use this form of
hypothesis. The investigator bases this hypothesis on the trends apparent from previous
research on this topic. Considering the previous example, a researcher may state the
hypothesis as, “High school students who participate in extracurricular activities have a lower
GPA than those who do not participate in such activities.” Such hypotheses provide a definite
direction to the prediction.

Causal Hypothesis: Some studies involve a measurement of the degree of influence of one
variable on another. In such cases, the researcher states the hypothesis in terms of the effect
of variations in a factor on another factor. This causal hypothesis is said to be bivariate
because it specifies two aspects -- the cause and the effect. For the example mentioned, the
causal hypothesis will state, “High school students who participate in extracurricular
activities spend less time studying which leads to a lower GPA.” When verifying such
hypotheses, the researcher needs to use statistical techniques to demonstrate the presence of a
relationship between the cause and effect. Such hypotheses also need the researcher to rule
out the possibility that the effect is a result of a cause other than what the study has examined.
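To see how a non-directional and a directional hypothesis translate into different statistical tests, consider the sketch below. The GPA figures are invented for illustration, and the snippet assumes a reasonably recent version of SciPy (the alternative argument of ttest_ind needs SciPy 1.6 or later); it is only a sketch of the idea.

```python
# A minimal sketch (made-up GPA data) contrasting a non-directional
# (two-sided) and a directional (one-sided) test of the example above.
import numpy as np
from scipy import stats

participants = np.array([3.1, 2.9, 3.0, 2.8, 3.2, 2.7, 3.0, 2.9])      # in activities
non_participants = np.array([3.3, 3.1, 3.4, 3.0, 3.2, 3.5, 3.1, 3.3])  # not in activities

# Non-directional hypothesis: is there *any* difference in mean GPA? (two-sided)
res_two_sided = stats.ttest_ind(participants, non_participants)

# Directional hypothesis: do participants have a *lower* mean GPA? (one-sided)
res_one_sided = stats.ttest_ind(participants, non_participants, alternative="less")

# A small p-value leads us to reject the null hypothesis of no difference.
print(res_two_sided.pvalue, res_one_sided.pvalue)
```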

Writing a Research Proposal


We have learnt that research always begins with a purpose. Either this is
the researcher's own pursuit, or it is carried out to address and answer a specific managerial
question and arrive at a solution. This clear statement of purpose guides the research process
and must be converted into a plan for the study. 
This framework or plan is termed as the research proposal. A research proposal is a
formal document that presents the research objectives, design for achieving these objectives
and the expected outcomes/ deliverables of the study.
This step is essential both for academic and corporate research, as it clearly establishes the
research process to be followed to address the research questions. In a business or corporate
setting, this step is often preceded by a PR (Proposal Request). Here the manager or the
corporate spells out his decision problem and requests the potential suppliers of research to
work out a research plan/proposal to address the stated issues.
Another advantage of a formal proposal is that sometimes the manager may not be able to clearly articulate his problem, or the researcher might not be able to understand and convert the decision into a workable research problem. The researcher lists the objectives of the study
and then together with the manager, can review whether the listed objectives and direction of
the study will be able to deliver output for arriving at a workable solution.
For the researcher, the document provides an opportunity to identify any shortfalls in the
logic or the assumption of the study. It also helps to monitor the methodical work being
carried out to accomplish the project.

Contents of a Research Proposal


There is a broad framework that most proposals follow. In this section, we will briefly
discuss these steps.

Executive summary
This is a broad overview that gives the purpose and objective of the study. In a short
paragraph the author gives a summary about the management problem/academic concern.

Background of the problem

This is the detailed background of the management problem. It requires a sequential and
systematic build-up to the research questions and why the study should be done. The
researcher must be able to demonstrate that there could be a number of ways in which the
management dilemma could be answered.
For example, a pharmaceutical company develops a new hair growing solution and packages
it in two different types of bottles. They want to know which one people will buy. The
product testing could be done internally in the company; or the two sample bottles could be formulated and tested for their acceptability amongst likely consumers or retailers stocking the product; or the two types could be developed, test launched and assessed for their sales potential. The researcher thus must spell out all probabilities and then systematically
and logically argue for the research study. This section must be objective and written in
simple language, avoiding any metaphors or idioms to dramatize the plan. The logical
arguments should speak for themselves and be able to convince the reader of the need for the
study in order to find probable solutions to the management dilemma.

Problem statement and research objectives

The clear definition of the problem broken down into specific objectives is the next step. This
section is crisp and to the point. It begins by stating the main thrust area of the study. For
example, in the above case, the problem statement could be:
To test the acceptability of a spray or capped bottle dispenser for a new hair growing formulation. The basic objectives of this research would be to:

 Determine the comparative preference of the two prototypes amongst customers of


hair growing solutions
 Conduct a sample usage test of both the bottles amongst the identified respondent population
 Prepare a comparative analysis of the advantages and problems associated with each bottle, on the basis of the sample usage test
 Prepare a detailed report on the basis of the findings

If the study is addressed towards testing some assumptions in the form of hypotheses, they must be clearly stated in this section.

Research design
This is the working section of the proposal as it needs to indicate the logical and systematic
approach intended to be followed in order to achieve the listed objectives. This would include
specifying the population to be studied, the sampling process and plan, sample size and
selection. It also details the information areas of the study and the probable sources of data,
i.e., the data collection methods. In case the process must include an instrument design, then
the intended approach needs to be detailed here. A note of caution must be given here: this is
not a simple statement of the sampling and data collection plan; it requires a clear and logical
justification of using the techniques over the methods available for research.

Scheduling the research


The time-bound dissemination of the study with the major phases of the research must be
presented. This can be done using CPM/Gantt/PERT charts. This gives a clear way for
monitoring and managing the research task. It also has the additional benefit of providing the
researcher with a means of spelling out the payment points linked to the delivered
phase outputs.
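A simple phase schedule of this kind can also be sketched programmatically. The snippet below draws a basic Gantt-style chart with matplotlib; the phases, their durations and the choice of library are assumptions made for illustration, since the text does not prescribe any particular tool.

```python
# A minimal sketch (hypothetical phases and durations, in weeks) of a
# Gantt-style schedule for the major research phases, drawn with matplotlib.
import matplotlib.pyplot as plt

phases = [                         # (phase name, start week, duration in weeks)
    ("Problem definition", 0, 2),
    ("Proposal and design", 2, 2),
    ("Data collection", 4, 5),
    ("Data analysis", 9, 3),
    ("Report writing", 12, 2),
]

fig, ax = plt.subplots(figsize=(8, 3))
for i, (name, start, duration) in enumerate(phases):
    ax.broken_barh([(start, duration)], (i - 0.4, 0.8))   # one bar per phase
ax.set_yticks(range(len(phases)))
ax.set_yticklabels([name for name, _, _ in phases])
ax.set_xlabel("Week")
ax.invert_yaxis()                  # first phase at the top
plt.tight_layout()
plt.show()
```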

Results and outcomes of the research
Here the clear terms of contract or expected outcomes of the study must be spelt out. This is
essential even if it is an academic research. The expected deliverables need to clearly
demonstrate how the researcher intends to link the findings of the proposed study design to
the stated research objectives. For example, in the pharmaceutical study, the expected
deliverables are:

 To identify the usage problems with each bottle type


 To recommend, on the basis of the sample study, which bottle should be used for
packaging the liquid.

Costing and budgeting the research


In all instances of business research, both internal and external, an estimated cost of the study is required. In addition to these sections, academic research proposals require a section on
review of related literature; this generally follows the 'problem background' section. If the
proposal is meant to establish the credentials of the research supplier, then detailed
qualifications of the research team, including the research experience in the required or
related area, help in the selection of the research proposal.
Sometimes, the research study requires an understanding of some technical terms or
explanations of the constructs under study; in such cases the researcher needs to attach a
glossary of terms in the appendix of the research proposal. The last section of the proposal is to state the complete details of the references used in the formulation of the research proposal. Thus, the data sources and addresses have to be attached with the formulated document.

Types of Research Proposals


Basically, the proposals formulated could be of three types: 

1. Academic research proposals


2. Internal organizational proposals 
3. External organizational proposals

Academic research proposal


The academic research proposal might be generated by students or academicians pursuing the study for fundamental academic research. These kinds of studies need an extensive search of past studies and data on the topic of study. An example is an academician wanting to explore the
viability of different eco-friendly packaging options available to a manufacturer.

Internal organizational proposal


Internal organizational proposals originate within an organization and are submitted to the management for approval and funding. They are of a highly focused nature and are oriented towards solving immediate problems. For example, a pharmaceutical company which has developed a new hair growing formulation wants to test whether to package the liquid in a spray-type or capped dispenser. The solutions are time-driven and their applicability is only for this product. These studies do not require extensive proposals for the management to assess the nature of the work required.

External organizational proposals

External organizational proposals have their basis or origin within the company, but the scope
and nature of the study require a more structured and objective research approach. For example, if the above stated pharmaceutical company wishes to explore the herbal cosmetic market and wants a market analysis and feasibility study conducted, the PR might be spelt out to solicit proposals to address the research question and to execute an outsourced research study.

Summary
Let us recapitulate the main points discussed in this unit:

 The most important step in research is to identify the decision to be made and how it
can be converted into a research problem
 The problem definition process is a well-integrated, linked and stepwise process. Its
elements include the unit of analysis—the individual or group that is to be
studied. The second element is a clear definition of the variables under study
 By the time the research problem is identified and stated, the researcher should be
able to specify which is the causal or independent variable and which is the effect or
dependent variable under study. It is also best to acknowledge the effect or presence
of any external variables which might have a contingent effect on the cause-and-effect
relationship to be studied. These can be further classified as moderator,
intervening and extraneous variables
 It is advisable for the researcher to construct a model or theoretical framework based
on the process of problem formulation. This is recommended but not necessarily an
essential step, as some studies might be of a nature where the intent is to conduct the study
and then arrive at a theory or a model
 The problem formulation process ultimately ends in a research hypothesis
 An entire stepwise document, in the shape of a formal plan to be followed, is prepared.
This is called the research proposal
 There are three different kinds of research proposals available to the researcher—
academic, internal and external

UNIT 03: RESEARCH DESIGNS


Structure

 Introduction
 Nature and Classification of Research Designs
 Exploratory Research Designs
 Secondary Resource Analysis
 Case Study Method
 Expert Opinion Survey
 Focus Group Discussions
 Descriptive Research Designs
 Cross-Sectional Studies
 Longitudinal Studies
 Experimental Designs
 Errors Affecting Research Design
 Summary
 Keywords

Introduction
In the last unit, we studied the defining of the research problem and the formulation of
the research hypothesis. However, in research, it is not enough to define the problem and
formulate the hypotheses. It has been found by research scholars and managers alike that
most research studies do not result in any significant findings because of a faulty research
design. Most researchers feel that once the problem is defined and hypotheses are made, one
can go ahead and collect the data on a specified group, or sample, and then analyse it using
statistical tests. However, unless the formulated research problem and the study hypotheses
are tested through a well-defined plan, the answers will be based on trial and error rather
than on any sound logic.

Several design approaches are available to a researcher; the choice depends on whether the
study is exploratory or conclusive in nature. The designs range from the very simple and
loosely structured to highly controlled scientific experimentation. In this unit, we will study
the complete choice of designs, along with detailed reasoning on which design should be used
under what conditions. Just as with experiments in science, in business research too there are
chances of error, and these need to be understood and controlled to give more accurate results
to the decision maker.

Nature and Classification of Research Designs


Once you have established the 'what' of the study, i.e., the research problem, the next
step is the 'how' of the study, which specifies the method of achieving the research objectives.
In other words, 'how' is the research design.
Green et al. (2008) define research designs as 'the specification of methods and procedures for
acquiring the information needed. It is the overall operational pattern or framework of the
project that stipulates what information is to be collected from which sources by what
procedures. If it is a good design, it will ensure that the information obtained is relevant to the
research questions and that it was collected by objective and economical procedures.'

Thyer (1993) states that, 'A traditional research design is a blueprint or detailed plan
for how a research study is to be completed—operationalizing variables so they can be
measured, selecting a sample of interest to study, collecting data to be used as a basis for
testing hypotheses, and analysing the results'. Sellitz et al. (1962) state that, 'A research
design is the arrangement of conditions for collection and analysis of data in a manner that
aims to combine relevance to the research purpose with economy in procedure'.

One of the most comprehensive and holistic definitions has been given by Kerlinger
(1995). He refers to a research design as, '….. a plan, structure and strategy of investigation
so conceived as to obtain answers to research questions or problems. The plan is the complete
scheme or programme of the research. It includes an outline of what the investigator will do
from writing the hypotheses and their operational implications to the final analysis of data'.
Thus, the formulated design must ensure three basic principles:

 Convert the research question and the stated assumptions/ hypotheses into variables
that can be measured.
 Specify the process to complete the above task.

 Specify the 'control mechanism(s)' to be followed so that the effect of other variables
that could influence the outcome of the study is controlled.

Classification of Research Designs


At this stage, one needs to understand the difference between research design and
research method. While the design is the specific framework that has been created to seek
answers to the research question, the research method is the technique to collect the
information required to answer the research problem, given the created framework. Thus,
research designs have a critical and directive role to play in the research process. The
execution details of the research question to be investigated are referred to as the research
design.

The researcher has a number of designs available to him for investigating the research
objectives. The classification that is universally followed is the one based upon the objective
or the purpose of the study. A simple classification is based upon the research needs, ranging
from the simple and loosely structured to the specific and more formally structured. The
best way is to view the designs on a continuum. Hence, in case the research objective is
diffused and requires refinement, one uses the exploratory design; this might lead to the
slightly more concrete descriptive design—here one describes all the aspects of the constructs
and concepts under study. This, in turn, leads to the more structured and controlled
experimental research design.

Exploratory Research Designs


Exploratory designs, as stated earlier, are the simplest and most loosely structured
designs. As the name suggests, the basic objective of the study is to explore and obtain clarity
about the problem situation. It is flexible in its approach and mostly involves a qualitative
investigation. The sample is not strictly representative and at times it might only involve
unstructured interviews with a couple of subject experts. The essential purpose of the study is
to:

 Define and understand the research problem to be investigated


 Explore and evaluate the diverse and multiple research opportunities
 Assist in the development and formulation of the research hypotheses
 Define the variables and constructs under study
 Identify the possible nature of relationships that might exist between the variables
under study
 Explore the external factors and variables that might impact the research

For example, a university professor might decide to do an exploratory analysis of the new
channels of distribution that are being used by the marketers to promote and sell products and
services. To do this, a structured and defined methodology might not be essential as the basic
objective is to understand how to teach this to students of marketing. The researcher can
make use of different methods and techniques in exploratory research—like secondary data
sources, unstructured or structured observations, expert interviews and focus group
discussions with the concerned respondent group. Each of these methods is discussed briefly
in the sections that follow.
Secondary Resource Analysis
Secondary sources of data, as the name suggests, consist of the details of previously
collected findings—facts and figures which have been authenticated and published. They are
a fast and inexpensive way of collecting information. The past details can sometimes point
out to the researcher that his proposed research is redundant and has already been carried out
earlier. Alternatively, the researcher might find that a small but significant aspect of the
concept has not been addressed and should be studied. For example, a marketer might have
extensively studied the potential of the different channels of communication for promoting a
'home maintenance service' in Greater Mumbai. However, none of the mixes he has tested
has shown any impact, and this leads him to postulate the need to study the potential of
WOM (word of mouth) in a close-knit and predominantly Parsi colony, where this might be
the most effective culture-dependent technique. Thus, such insights might provide leads for
carrying out an experimental and conclusive research subsequently.

Another valuable secondary resource is the compiled and readily available databases covering
an entire industry, business or construct. These might be available in free and public domains
or through a structured acquisition process and expenditure. They may be both government
and non-government publications. Based on the resources and the level of accuracy required,
the researcher might decide to make use of them.

Case Study Method


Another way of conducting an exploratory research is the case study method. This
requires an in-depth study and is focused on a single unit of analysis. This unit could be an
individual employee or a customer; an organization or even a complete country might also
be the case of interest. Case studies are generally post-hoc and report incidents which
occurred earlier. The scenario is reproduced based upon secondary information and a primary
interview/discussion with those involved in the occurrence. Thus, there might be an element
of bias, as the data in most cases becomes a judgemental analysis rather than a simple
recounting of events.

For example, BCA Corporation wants to implement a performance appraisal system
in the organization and is debating between the merits of a traditional appraisal system and a
360° appraisal system. For a historical understanding of the two techniques, the HR director
makes use of books on the subject. However, for a better understanding, he could do an
in-depth case analysis of Allied Association, which had implemented traditional appraisal
formats, and of Surakhsha International, which uses a 360° system. The two exploratory
studies would be sufficient to arrive at a decision in terms of what would be best for the
organization.

Expert Opinion Survey
At times, there might be a situation when the topic of research is such that no previous
information is available on it. In these cases, it is advisable to seek help from experts
who might be able to provide valuable insights based upon their experience in the field
or with the concept. This approach of collecting particulars from significant and erudite
people is referred to as the expert opinion survey. The methodology might be formal and
structured—useful when it is to be authenticated or supported by secondary or primary
research—or it might be fluid and unstructured, requiring in-depth
interviewing of the expert. For example, the evaluation of the merit of marketing organic
food products in the domestic Indian market cannot be done with the help of secondary data
as no such structured data sources exist. In this case the following can be contacted:

 Doctors and dieticians, who as experts would be able to provide information on whether
consumers would eat organic food products as a healthier alternative
 Chefs who are experimental and would like to look at providing better value to their
clients
 Retailers who like to sell contemporary new products

Inputs from such knowledgeable people could be useful in measuring the viability of the
proposed plan. Discussions with them may reveal some information regarding who might be
considered as potential consumers. Secondly, the question whether a health proposition or a
lifestyle proposition would work better to capture the targeted consumers needs to be
examined. Thus, this method can play a directional role in shaping the research study.

Focus Group Discussions


Another way to conduct an exploratory analysis is to carry out discussions with
individuals associated with the problem under study. This technique, though originally from
sociology, is actively used in business research. In a typical focus group, there is a carefully
selected small set of individuals who are representative of the larger respondent population
under study. It is called a focus group as the selected members hold a focused discussion on
the concerned topic for a duration of ninety minutes to, sometimes, two hours. Usually the
group is made up of six to ten individuals. This number is recommended because fewer than
six members would not throw up enough perspectives and the discussion might become
one-sided.

On the other hand, more than ten might lead to confusion rather than any fruitful discussion,
and such a group would be unwieldy to manage. Generally, these discussions are carried
out in neutral settings by a trained observer, also referred to as the moderator. The moderator,
in most cases, does not participate in the discussion. His prime objective is to manage a
relatively non-structured and informal discussion; he initiates the process and then steers it
towards the desired information needs.

Sometimes, there is more than one observer to record the verbal and non-verbal
content of the discussion. The conduct and recording of the dialogue require considerable
skill and behavioural understanding. In the organic food example discussed earlier, focus
group discussions were carried out with typical consumers/buyers of grocery products. The
objective was to establish the level of awareness about health hazards, environmental
concerns and organic food products. A series of such focus group discussions carried out
across four metros—Delhi, Mumbai, Bengaluru and Hyderabad—revealed that even though
the new-age consumer was concerned about health, the awareness about organic products
varied from extremely low to non-existent.
Descriptive Research Designs
As the name implies, the objective of descriptive research studies is to provide a
comprehensive and detailed explanation of the phenomena under study. The intended
objective might be to give a detailed sketch or profile of the respondent population being
studied. For example, to design an advertising and sales promotion campaign for high-end

watches, a marketer would require a holistic profile of the population that buys such luxury
products. Thus a descriptive study, which generates data on the who, what, when, where, why
and how of luxury accessory brand purchase would be the design necessary to fulfil the
research objectives.

Descriptive research thus leads to conclusive studies. Although such research lacks
the precision and accuracy of experimental designs, it lends itself to a wide range of
situations and is more frequently used in business research. Based on the time period over
which the research information is collected, descriptive research is further subdivided into
two categories: cross-sectional studies and longitudinal studies.

Cross-Sectional Studies


As the name suggests, cross-sectional studies involve a slice of the population. Just as
in scientific experiments one takes a cross-section of a leaf or of cheek cells to study the
cell structure under the microscope, similarly one takes a current subdivision of the
population and studies the nature of the relevant variables being investigated. There are two
essential characteristics of cross-sectional studies:

 The cross-sectional study is carried out at a single moment in time and thus its
applicability is most relevant for a specific period. For example, a cross-sectional study
of the attitude of Americans towards Asian-Americans would have produced vastly
different results pre- and post-9/11, and a study done in 2012 would reveal attitudes and
behaviour that might not be in line with those found earlier.
 Secondly, these studies are carried out on a section of respondents from the
population units under study (e.g., organizational employees, voters, consumers,
industry sectors). This sample is under consideration and under investigation only for
the time coordinates of the study.

There are also situations in which the population being studied is not of a
homogeneous nature and different groups exist within it. It then becomes essential to
study the sub-segments independently. This variation of the design is termed the multiple
cross-sectional study. Usually this multi-sample analysis is carried out at the same moment
in time. However, there might be instances when the data is obtained from different samples
at different time intervals and then compared. Cohort analysis is the name given to
such cross-sectional surveys conducted on different sample groups at different time intervals.
Cohorts are essentially groups of people who share a time zone or have experienced an event
that took place at a particular time period. For example, in the 9/11 case, if we study and
compare the attitudes of middle-aged Americans versus teenaged Americans towards Asian-
Americans post the event, it would be a cohort analysis.
The technique is especially useful in predicting election results. Cohorts based on gender,
religious sect, urban–rural location or region are studied by leading opinion poll experts like
Nielsen, Gallup and others. Thus, cross-sectional studies are extremely useful for studying
current patterns of behaviour or opinion (a small illustrative sketch of a cohort comparison
follows).
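To make the idea concrete, a cohort analysis is essentially a group-wise summary of the same measure taken from different sample groups at different points in time. The minimal sketch below uses pandas with made-up attitude scores for two hypothetical cohorts and two survey waves; the figures are purely illustrative:

import pandas as pd

# Hypothetical survey records: attitude score (1-10) by cohort and survey wave
data = pd.DataFrame({
    "cohort": ["middle-aged", "middle-aged", "teenaged", "teenaged",
               "middle-aged", "teenaged"],
    "wave": [2002, 2012, 2002, 2012, 2012, 2002],
    "attitude_score": [4.0, 5.5, 6.0, 7.5, 5.0, 6.5],
})

# Average attitude per cohort per survey wave -- the core of a cohort comparison
summary = data.groupby(["cohort", "wave"])["attitude_score"].mean()
print(summary)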

Longitudinal Studies
A single sample of the identified population that is studied over a longer period of
time is termed as a longitudinal study design. A panel of consumers specifically chosen to

study their grocery purchase pattern is an example of a longitudinal design. There are certain
distinguishing features of the same:

 The study involves the selection of a representative panel, or a group of individuals
that typically represents the population under study.
 The second feature involves the repeated measurement of the group over fixed
intervals of time. This measurement is specifically made for the variables under study.
 A distinguishing and mandatory feature of the design is that once the sample is
selected, it needs to stay constant over the period of the study. That means the number
of panel members has to be the same. Thus, in case a panel member leaves the panel, it
is critical to replace him/her with another representative member from the population
under study.

Panels in which the same group of members is retained for repeated measurements are
called true panels, while those using a different group every time are called omnibus
panels. The advantages of a true panel are that it has a more committed sample group
that is likely to tolerate extended or long data-collection sessions, and that the profile
information needs to be collected only once rather than every time.

However, the problem lies in getting a committed group of people for the entire study period.
Secondly, there is an element of mortality or attrition, where members of the panel might
leave midway and the new recruits who replace them might be vastly different and could skew
the results in an entirely different direction. A third disadvantage is the highly structured study
situation, which might induce artificially consistent behaviour that does not reflect real or
field conditions.

Experimental Designs
Experimental designs are conducted to infer causality. In an experiment, a researcher
actively manipulates one or more causal variables and measures their effects on the
dependent variables of interest. Since any changes in the dependent variable may be caused
by a number of other variables, the relationship between cause and effect often tends to be
probabilistic in nature. It is virtually impossible to prove causality; one can only infer a
cause-and-effect relationship.

The necessary conditions for making causal inferences are: (i) concomitant variation,
(ii) time order of occurrence of variables and (iii) absence of other possible causal factors.
The first condition implies that cause and effect variables should have a high correlation. The
second condition means that the causal variable must occur prior to or simultaneously with
the effect variable. The third condition means that all other variables except the one whose
influence we are trying to study should be absent or kept constant.
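As a minimal illustration of the first condition (concomitant variation), one can check the correlation between the presumed cause and the presumed effect; a high correlation is necessary, though by itself not sufficient, for inferring causality. The figures below are hypothetical:

import numpy as np

# Hypothetical paired observations: advertising spend (presumed cause) and sales (presumed effect)
ad_spend = np.array([10, 12, 15, 18, 20, 25])     # assumed spend figures
sales = np.array([110, 118, 130, 142, 150, 170])  # assumed sales figures

r = np.corrcoef(ad_spend, sales)[0, 1]
print(f"Correlation between spend and sales: {r:.2f}")
# A high r only satisfies concomitant variation; the time order of the variables
# and the absence of other causal factors must still be established by the design.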

There are two conditions that should be satisfied while conducting an experiment. These are:

 Internal validity: Internal validity examines whether the observed effect on the
dependent variable is actually caused by the treatments (independent variables) in
question. For an experiment to possess internal validity, all the other causal
factors except the one whose influence is being examined should be absent. Control of
extraneous variables is a necessary condition for inferring causality. Without internal
validity, the experiment gets confounded.

 External validity: External validity refers to the generalizability of the results of an
experiment. The concern is whether the results of an experiment can be generalized
beyond the experimental situations and, if so, to what populations, settings, times,
independent variables and dependent variables they can be projected. It is desirable to
have an experiment that is valid both internally and externally. However, in reality, a
researcher might have to trade off one type of validity for another. To remove the
influence of an extraneous variable, a researcher may set up an experiment with
artificial settings, thereby increasing its internal validity; however, in the process the
external validity will be reduced.

There are four types of experimental designs. These are explained below:

1. Pre-experimental designs: These do not make use of any randomization procedures
to control the extraneous variables. Therefore, the internal validity of such designs is
questionable.
2. Quasi-experimental designs: In these designs, the researcher can control when
measurements are taken and on whom they are taken. However, this design lacks
complete control over the scheduling of the treatment and also lacks the ability to
randomize the test units' exposure to treatments. As experimental control is lacking,
the possibility of the results getting confounded is very high, and the effect of
extraneous variables may get incorporated into the findings.
3. True experimental designs: In these designs, researchers can randomly assign test
units and treatments to an experimental group. Here, the researcher is able to eliminate
the effect of extraneous variables from both the experimental and the control group.
Randomization procedure allows the researcher the use of statistical techniques for
analyzing the experimental results.
4. Statistical designs: These designs allow for statistical control and analysis of
external variables. The main advantages of statistical design are the following:

o The effect of more than one level of an independent variable on the dependent
variable can be examined.
o The effect of more than one independent variable can be examined.
o The effect of specific extraneous variables can be controlled.

Statistical design includes the following designs:

1. Completely randomized design: This design is used when a researcher is
investigating the effect of one independent variable on the dependent variable. The
independent variable is required to be measured on a nominal scale, i.e., it should have
a number of categories. Each of the categories of the independent variable is considered
as a treatment. The basic assumption of this design is that there are no differences in
the test units. All the test units are treated alike and randomly assigned to the test
groups. This means that there are no extraneous variables that could influence the
outcome.

Suppose we know that the sales of a product are influenced by its price. In this case,
sales are the dependent variable and price is the independent variable. Let there be
three levels of price, namely low, medium and high. We wish to determine the most
effective price level, i.e., the one at which sales are highest. Here, the test units are the
stores, which are randomly assigned to the three treatment levels. The average sales for each
price level are computed and examined to see whether there is any significant
difference in sales at the various price levels. The statistical technique used to test for such a
difference is called analysis of variance (ANOVA); a brief illustrative sketch appears after
this list.

The main limitation of completely randomized design is that it does not take into
account the effect of extraneous variables on the dependent variable. The possible
extraneous variables in the present example could be the size of the store, the
competitor's price and the price of the substitute product in question. This design
assumes that all the extraneous factors have the same influence on all the test units
which may not be true in reality. This design is very simple and inexpensive to
conduct.

2. Randomized block design: As discussed, the main limitation of the completely
randomized design is that all extraneous variables were assumed to be constant over all
the treatment groups. This may not be true. There may be different extraneous
variables influencing the dependent variable. In the randomized block design, it is
possible to separate the influence of one extraneous variable on a particular dependent
variable, thereby providing a clear picture of the impact of treatment on test units.

In the example considered in the completely randomized design, the price level (low,
medium and high) was considered as an independent variable and all the test units
(stores) were assumed to be more or less equal. However, all stores may not be of the
same size and, therefore, can be classified as small, medium and large size stores. In
this design, the extraneous variables, like the size of the store, could be treated as
different blocks. Now the treatments are randomly assigned to the blocks in such a way
that each treatment appears in each block at least once. The purpose of forming these
blocks is that it is hoped that the scores of the test units within each block would be
more or less homogeneous when the treatment is absent. What is assumed here is that
block (size of the store) is correlated to the dependent variable (sales). It may be noted
that blocking is done prior to the application of the treatment. In this experiment, one
might randomly assign twelve small-sized stores to three price levels in such a way that
there are four stores for each of the three price levels. Similarly, twelve medium-sized
stores and twelve large-sized stores may be randomly assigned to three price levels.
Now the technique of analysis of variance could be employed to analyse the effect of
treatment on the dependent variable and to separate the influence of the extraneous
variable (size of store) from the experiment.

3. Factorial design: A factorial design may be employed to measure the effect of two
or more independent variables at various levels. Factorial designs allow for
interaction between the variables. An interaction is said to take place when the
simultaneous effect of two or more variables is different from the sum of their
individual effects. For example, an individual may have a high preference for mangoes
and may also like ice-cream, yet may not like mango ice-cream; the combined effect
differs from the sum of the individual effects, which is an interaction. The sales of a
product may be influenced by two factors, namely price

level and store size. There may be three levels of price—low (A1), medium (A2) and
high (A3)—and the store size could be categorized as small (B1) and big (B2). This could
be conceptualized as a two-factor design with information reported in the form of a table.
In the table, each level of one factor is presented as a row and each level of the other
factor as a column. This example could therefore be summarized in a table having three
rows and two columns, which would require 3 × 2 = 6 cells. Six different treatment
combinations would thus be produced, each with a specific level of price and store size.
The respondents would be randomly selected and randomly assigned to the six cells.

Respondents in each cell receive a specific treatment combination. For example, respondents
in the upper left-hand corner cell would face the low price level and the small store, while
respondents in the lower right-hand corner cell would be subjected to both the high price
level and the big store.
The main advantages of factorial design are:

 It is possible to measure the main effects and the interaction effect of two or more
independent variables at various levels.
 It allows saving time and effort because all observations are employed to study the
effects of each factor.
 The conclusion reached using factorial design has broader applications as each
factor is studied with different combinations of other factors.

The limitation of this design is that the number of combinations (number of cells) increases
with an increasing number of factors and levels. However, a fractional factorial design could
be used if the interest is in studying only a few of the interactions or main effects. An
illustrative sketch of how such designs can be analysed follows.
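As a brief illustration of the completely randomized design described in point 1, the sketch below runs a one-way ANOVA on hypothetical weekly sales figures for stores randomly assigned to the three price levels; the numbers and the use of scipy are illustrative assumptions, not part of the original example:

from scipy import stats

# Hypothetical weekly sales (units) for stores randomly assigned to each price level
low_price = [220, 235, 228, 241]
medium_price = [205, 198, 210, 202]
high_price = [180, 176, 188, 182]

f_stat, p_value = stats.f_oneway(low_price, medium_price, high_price)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests that mean sales differ significantly across the price levels.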
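For the 3 × 2 factorial design (price level × store size) in point 3, the same kind of data can be analysed with a two-way ANOVA that also estimates the interaction term; dropping the interaction term corresponds to the randomized block analysis of point 2. The data and the use of statsmodels are again illustrative assumptions:

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical sales for each combination of price level and store size (two stores per cell)
df = pd.DataFrame({
    "price": ["low", "low", "medium", "medium", "high", "high"] * 2,
    "store_size": ["small"] * 6 + ["big"] * 6,
    "sales": [230, 225, 205, 210, 185, 180,
              260, 255, 240, 245, 215, 210],
})

# Two-way ANOVA: main effects of price and store size plus their interaction
model = ols("sales ~ C(price) * C(store_size)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))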

Errors Affecting Research Design


We have discussed three types of research designs, namely exploratory, descriptive
and experimental. All of these have some scope for error. There could be various sources of
error in a research design.
Exploratory research is conducted using focus group discussions, secondary data, the analysis
of case studies and expert opinion surveys. It is quite likely that the members of the focus
group have not been selected properly. Secondary data may not be free of errors (in fact, one
needs to evaluate the methodology used in collecting such data). Also, the experts chosen for
the survey may not be experts in the field; as a matter of fact, getting a genuine expert is a
very difficult task. All these factors could lead to errors in the exploratory design.
In the descriptive design, the purpose is to describe a phenomenon. For this, one could use a
structured questionnaire. It could always happen that the respondents do not give correct
responses to some of the questions, thereby resulting in wrong data.
In true experimental and statistical design, the respondents are selected at random which may
not be the case in real life. Many times, in actual business situations, value judgments play a
very important role in selecting the respondents. Further, there can always be errors in
observations.

Summary
Let us recapitulate the main points discussed in this unit:

1. Research design is the blueprint or the framework for carrying out the research
study.
2. The researcher has a number of designs available to him for investigating the
research objectives. Based upon the objective or the purpose of the study, research
design may be exploratory, descriptive or experimental.
3. Exploratory designs are loosely structured and investigative in nature.
4. In case the hypothesis formulated is descriptive in nature, the study design would
also be descriptive. The study involves collecting the who, what, when, where, why
and how about the population under study.
5. Descriptive studies can further be divided into cross-sectional and longitudinal
designs.
In case the study is conducted on a single part of the population, it is called single
cross-sectional, and in case it is done on more than one segment, it is called multiple
cross-sectional.
The second type of descriptive design is the longitudinal design. Here, a selected
sample is studied at fixed intervals of time to measure the variable(s) under study.
6. Experimental designs are conducted to infer causality. There are four types of
experimental designs – pre-experimental designs, quasi- experimental designs, true
experimental designs and statistical designs.

UNIT 04 : PRIMARY & SECONDARY DATA

Structure

 Introduction

 Classification of Data

 Secondary Data

 Uses of Secondary Data

 Advantages and Disadvantages of Secondary Data

 Types and Sources of Secondary Data

 Primary Data Collection: Observation Method

 Primary Data Collection: Focus Group Discussions

 Primary Data Collection: Personal Interview Method

 Summary

 Keywords

Introduction

In the last unit, we discussed research design and its various aspects. Once the
research design is in place, it is time to answer the research problem and hypotheses. But this
cannot be done unless one collects the relevant information necessary for arriving at any
suitable conclusion. The information thus collected is usually termed data. The researcher
has a choice of a wide variety of methods to collect data. It has to be remembered that there
might be a lot of information available on the topic under study; however, you need to pick
up only that information which is of direct relevance to the problem under study.

Classification of Data

To understand the number of choices available to a researcher for collecting
study-specific information, one needs to be fully aware of the resources available for the
study and the level of accuracy required. To appreciate the truth of this statement, one needs
to examine the variety of methods available to the researcher. The data sources could be
either problem-specific and primary, or secondary in nature.

Primary data, as the name suggests, is original, problem- or project-specific, and collected
for the specific needs spelt out by the researcher. Its accuracy and relevance are reasonably
high. However, the time and money required are also quite high, and sometimes a researcher
might not have the resources or the time, or both, to go ahead with this method. In that case,
the researcher can look at alternative sources of data which are economical and reliable
enough to take the study forward. These constitute the second category of data sources—
namely, secondary data.

Secondary data is information which is not topical or research-specific and has been
collected and compiled by some other researcher or investigative body. This type of data is
recorded and published in a structured format and is thus quicker to access and manage.
Secondly, in most instances, unless it is a data product, it is not too expensive to collect. The
information required is readily available as a data product or as audit information which the
researcher or the organization can obtain and use for arriving at quick decisions. In
comparison to original research-centric data, secondary data can be collected economically
and quickly by the decision maker in a short span of time. However, one must remember that
it is a little lower on accuracy, since what is primary and original for one researcher
essentially becomes secondary and historical for someone else. Table 4.1 gives a snapshot of
the major differences between the two methods.
Secondary Data
We have already discussed what secondary data is. Let us now see what its uses, types
and sources are.
Uses of Secondary Data
Secondary data can be used for multiple purposes:
 Problem identification and formulation stage: Existing information on the topic under
study is useful to help develop the research question.
 Hypotheses designing: Previous research studies done in the area could help in
hypothesizing about expected results.

 Sampling considerations: There might be respondent related databases available to
seek respondent statistics and relevant contact details. These would help during
sampling for the study.
 Primary base: The secondary information collected can be used to design the
primary data collection instruments, in order to phrase and design the right
questions.
 Validation board: Earlier records and studies can also be used to support or
validate the information collected through primary sources.
Before we examine the wide range of secondary sources available to the business researcher,
it is essential that one is aware of the advantages and disadvantages of using secondary
sources.
Advantages and Disadvantages of Secondary Data
There are multiple advantages of using secondary data.

 Resource advantage: Any research that is making use of secondary information
will be able to save immensely in terms of both cost and time.
 Accessibility of data: The other major advantage of secondary sources is that it is
very easy to access this data.

 Accuracy and stability of data: Data from recognized sources has the additional
advantage of accuracy and reliability.

 Assessment of data: It can be used to compare and support the primary research
findings of the present study.

However, there is a need for caution as well, because using secondary data might have
some disadvantages:

 Applicability of data: The information might not be directly suitable for our
study. Also since it is old data it might not be applicable today.

 Accuracy of data: All data that is available might not be reliable and accurate.
Types and Sources of Secondary Data

Secondary data can be divided into internal and external sources. Internal, as the name
implies, refers to organization- or environment-specific sources and includes the historical
output and records available with the organization which might form the backdrop of the
study. Data that is independent of the organization and covers the larger industry-scape
would be available in the form of published material, computerized databases or data
compiled by syndicated services. Discussed below are four major sources of data—internal
sources, external sources, computer-stored data and syndicated databases.
1. Internal sources of data
Compilation of various kinds of information and data is mandatory for any organization that
exists.

 Company records: This includes all the data about the inception, the owners, the
mission and vision statements, infrastructure and other details, including the
process and manufacturing (if any) and sales, as well as historical timeline of the
events.
 Employee records: All details regarding the employees (regular and part-time)
would be part of employee records.
 Sales data: This data can take on different forms:
1. Cash register receipt
2. Salespersons' call records: This is a document to be prepared and updated every
day by each individual salesperson.
3. Sales invoices: These record the customers who have placed an order with the
company, along with complete details including the size of the order, location,
price per unit, terms of sale and shipment details (if any).
4. Financial records and sales reports: Besides this, there are other published sources
like warranty records, CRM data and customer grievance data which are
extremely critical for evaluating the health of a product or an organization.

2. External data sources


As stated earlier, a source that collects and compiles data but is not a part of the
organization is referred to as external data source. External sources of data include the
following:
 Published data: There could be two kinds of published data—one that is from the
official and government sources and the other kind is that which has been prepared by
individuals or private agencies or organizations.
 Government sources: The Indian government publishes a lot of documents that are
readily available and are extremely useful for the purpose of providing background
data.
 Other data sources: This source is the most voluminous and most frequently used, in
every research study. The information could be
1. Books and periodicals
2. Guides (including Industry guides)
3. Directories and indices
4. Standard non-governmental statistical data
 Reference databases: These refer users to the articles, research papers, abstracts
and other printed news contained in other sources. They provide online indices
and abstracts and are thus also called bibliographic databases.

 Source databases: These provide numerical data, complete text, or a combination


of both.
 Based on storage and recovery mechanisms: Another useful way of classifying
databases is based on their method of storage and retrieval.
 Online databases: These can be accessed in real-time directly from the producers
of the database or through a vendor. Examples include ABI/Inform, EBSCO and
Emerald.
 CD-ROM databases: Here information is available on a CD-ROM.

3. Computer-stored data

Information today is also available in electronic form. The databases available to
the researcher can be classified based on the type of information or by the method of storage
and recovery, as described here. Figure 4.3 gives a classification of the sources of
computerized data.

4. Syndicated data sources


Syndicated service agencies are organizations that collect organization/ product/category-
specific data from a regular consumer base and create a common pool of data that can be used
by multiple buyers, for their individual purposes.
There are different ways to classify syndicate sources.
 Household/individual data: These could be in the form of surveys or panel data
available through reputed agencies.
 Surveys: Surveys are usually one-time assessments conducted on a large
representative respondent base, for example, opinion polls before elections or a survey
of the best business schools to study in.
 Product purchase panels: These specially selected respondent groups specifically
record certain identified purchases, generally related to household products and
groceries.
 Media-specific panels: Panels are also created for collecting information related to
promotion and advertising. The task of the media panel is to make use of different
kinds of electronic equipment to automatically record consumer viewing behaviour.
These records are used to calculate the television rating points (TRPs) of different
programs.
 Scanner devices and individual source systems: To overcome the problems of panel
data, a new service is provided by research agencies through electronic scanner
devices, e.g., sales volume tracking data.
 Institutional syndicated data: Syndicated data is also available at the institutional
level. Retailer and wholesaler audits are examples of this data. Usually the records are
noted as: beginning stocks + deliveries - ending inventory = sales for the period (see
the sketch below).
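The audit relationship above is simple arithmetic; a minimal sketch with made-up stock figures looks like this:

# Hypothetical retail audit figures for one product over the audit period
beginning_stock = 1200  # units on hand at the start of the period
deliveries = 800        # units delivered during the period
ending_stock = 450      # units counted at the end of the period

sales_for_period = beginning_stock + deliveries - ending_stock
print(f"Sales for the period: {sales_for_period} units")  # 1550 units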

Primary Data Collection: Observation Method


The researcher has available to him/her a wide variety of data collection methods
which are primary or problem specific in nature. However, in this unit, we would be
discussing the major and most often used methods like the observation method, focus group
discussion and interview method. The questionnaire method is the most commonly used
method of primary data collection. We will focus on the questionnaire method in detail in
Unit 6. Let us discuss some of the other widely used methods now.

Observation is a direct method of collecting primary data. It is one of the most appropriate
methods to use in case of descriptive research. The method of observation involves viewing
and recording individuals, groups, organizations or events in a scientific manner in order to
collect valuable data related to the topic under study.

The mode of observation could be standardized or structured observation. Here, the nature of
content to be recorded and the format and the broad areas of recording are predetermined.
Thus, the observer's bias is reduced and the authenticity and reliability of the information
collected is higher. For example, Fisher Price toys carry out an observational study whenever
they come out with a new toy. The observer is supposed to record the appeal of the toy for a
child.
The opposite of this is called unstructured observation. Here, the observer is supposed to
make a note of whatever he understands as relevant for the research study. This kind of
approach is more useful in exploratory studies. Since it lacks structure, the chances of
observer's bias are high. An example of this is the observation of consumers at a bank, a
restaurant or a doctor's clinic.
However, it is critical here to understand that the researcher must have a preconceived plan to
capture the observations made. It is not to be treated as a blank sheet where the observer
reports what he sees. The aspects to be observed must be clearly listed as in an audit form, or
they could be indicative areas on which the observation is to be made.
Another way of distinguishing observations is by the level of respondent awareness of being
observed. The observation might be disguised, i.e., done without the respondent's knowledge,
using devices like a one-way mirror, a hidden camera or a recorder. The disadvantage is that
this is ethically an intrusion into an individual's right to privacy. On the other hand, the
knowledge that the person is under observation can be conveyed to the respondent; this is
undisguised observation. The decision to choose one over the other depends upon the nature
of the study.
The observation method can also be distinguished on the basis of the setting in which the
information is being collected. This could be natural observation, which as the name
suggests, is carried out in actual real life locations, for example the observation of how
employees interact with each other during lunch breaks. On the other hand, it could be an
artificial or simulated environment. This is actively done in the armed forces where stress
tests are carried out to measure an individual's tolerance level.

There is another differentiation—the observation could be done by a human observer or a
mechanical device.

Human observation: As the name suggests, this technique involves observation and recording
done by human observers. The task of the observer is simple and predefined in case of a
structured observation study as the format and the areas to be observed and recorded are
clearly defined. In an unstructured observation, the observer records in a narrative form the
entire event that he has observed.

Mechanical observation: In these methods, man is replaced by machine. Some examples are:
 Store cameras and cameras in banks and other service areas
 Universal product code (UPC) scanned by electric scanners in stores
 Psycho-galvanometer, which measures galvanic skin response (GSR) or changes in
the electrical resistance of the skin. Thus, the respondent could be exposed to different
kinds of packaging, advertisements and product composition, to note his/her reaction
to them
 Eye-tracking equipment such as oculometers, eye cameras or eye-view minuters
record the movements of the eye. The oculometer determines what the individual is
looking at, while the pupilometer measures changes in the diameter of the respondent's
pupils, which indicate the person's interest in the stimulus
Trace analysis: In this, the remains or leftovers of the consumer's consumption basket—such
as credit card spends, the recycle bin on the computer or household garbage (garbology)—are
evaluated to measure current trends and patterns of usage and disposal.

Observational techniques are an extremely useful method of primary data collection and are
always a part of the inputs, whether accompanying other techniques like interviews,
discussions or questionnaire administration, or as the prime method of data collection.
However, the disadvantage they suffer from is that they are behaviourally driven and cannot
be used to investigate the reasons or causes of the observed behaviour. Another problem is
that if one is observing the occurrence of a certain phenomenon, one has to wait for the event
to occur. One alternative is to study recordings—verbal, written or audio-visual—in order to
formulate the study-related inferences.
Primary Data Collection: Focus Group Discussions

Focus group discussion (FGD) is a highly versatile and dynamic method of collecting
primary data from a representative group of respondents. The process generally involves a
moderator who steers the discussion on the topic under study. There is a group of carefully
selected respondents who are invited and gathered at a neutral setting. The moderator initiates
the discussion and then the group carries it forward by holding a focused and interactive
discussion.
Key elements of a focus group
 Size: The ideal recommended size for a group discussion is eight to twelve members.
Fewer than eight would not generate all the possible perspectives on the topic and the
group dynamics required for a meaningful session, and more than twelve would make
it difficult to get any meaningful insight.
 Nature: Individuals who are from a similar background—in terms of demographic and
psychographic traits—must be included; otherwise disagreement might emerge as a
result of other factors rather than the one under study. The other requirement is that
the respondents must be similar in terms of the subject/policy/product knowledge and
experience with the product under study. Moreover, the organizer of the focus group
discussion must ensure that the following criteria are taken care of:
 Acquaintance: It has been found that knowing each other in a group discussion is
disruptive and hampers the free flow of discussion. It is recommended that the group
should consist of strangers rather than subjects who know each other.
 Setting: The space or setting in which the discussion takes place should be as neutral,
informal and comfortable as possible. In case one-way mirrors or cameras are
installed, there is a need to ensure that these gadgets are not directly visible.
 Time period: The discussion should be held in a single setting unless there is a 'before'
and 'after' design, which requires capturing group perceptions before the study variable
is introduced and again later to gauge the group's reactions. The ideal duration of the
discussion should not exceed an hour and a half. This is usually preceded by a short
rapport-formation session between the moderator and the group members.
 The recording: This is most often machine recording even though sometimes this may
be accompanied by human recording as well.
 The moderator: The moderator is the one who manages the discussion. He might be a
participant in the group discussion or he might be a non- participant. He must be a
good listener and unbiased in his conduct of the discussions.
Steps for planning and conducting focus groups
The focus group must be conducted in a stepwise manner:

1. Clearly define and enlist the research objectives of the study that requires the group
discussion.
2. Chart out a comprehensive, structured moderator's outline for conducting the whole
process.
3. After this, the actual focus group discussion is carried out.
4. The findings are summarized under different heads, as indicated in the focus group
objectives, and reported in a narrative form. This may include expressions like
'majority of the participants were of the view' or 'there was considerable disagreement
on this issue'.

Types of focus groups


The researcher has different kinds of group discussion methods available to him or her. These
are:
 Two-way focus group: Here one respondent group sits and listens to the other and
after learning from them or understanding the needs of the group, they carry out a
discussion amongst themselves. For example, in a management school, the faculty
group could listen to the opinions and needs of the student group.
 Dual-moderator group: Here, there are two different moderators; one responsible for
the overt task of managing the group discussion and the other for the second objective
of managing the 'group mind' in order to maximize the group performance.
 Fencing-moderator group: The two moderators take opposite sides on the topic being
discussed and thus, in the short time available, ensure that all possible perspectives
are thoroughly explored.
 Friendship groups: There are situations where the comfort level of the members needs
to be high so that they elicit meaningful responses. This is especially the case when a
supportive peer group encourages admission about the related organizations or
people/issues.
 Mini-groups: These groups might be of a smaller size (usually four to six) and are
usually expert groups/committees that on account of their composition are able to
decisively contribute to the topic under study.
 Creativity group: These are usually longer than one and a half hour duration and
might take the workshop mode. Here, the entire group is instructed, after which they
brainstorm in smaller sub-groups. They then reassemble to present their sub-group's
opinion. This might also stretch across a day or two.

37 | P a g e
 Brand-obsessive group: These are special respondent sub-strata who are passionately
involved with a brand or product category (say, cars). They are selected as they can
provide valuable insights that can be successfully incorporated into the brand's
marketing strategy.
 Online focus group: This is a recent addition to the methodology and is extensively
used today. Here, the respondents log in at the designated time to a web-based chat
room. The discussion between the moderator and the participants takes place in real time.
Primary Data Collection: Personal Interview Method
A personal interview is a one-to-one interaction between the investigator/ interviewer and the
interviewee. The purpose of the dialogue is research specific and ranges from completely
unstructured to highly structured.
Uses of the interview method
The interview has varied applications in business research and can be used effectively at
various stages.
1. Problem definition: The interview method can be used right at the beginning of the
study. Here, the researcher uses the method to get clarity about the topic under study.
2. Exploratory research: Because the structure is loose, this method can be actively
used in exploratory studies.
3. Primary data collection: There are situations when the method is used as the primary
method of data collection. This is generally the case when the area to be investigated
is high on emotional responses.
The interview process
The steps undertaken for organizing a personal interview are somewhat similar to those of a
focus group discussion.
 Interview objective: The information needs that are to be addressed by the instrument
should be clearly spelt out as study objectives. This step includes a clear definition of
the construct/variable(s) to be studied.
 Interview guidelines: A typical interview may take from 20 minutes to close to an
hour. A brief outline to be used by the investigator is formulated depending upon the
contours of the interview.
 Structure: Based on the needs of the study, the actual interview may be
unstructured, semi-structured or structured.
1. Unstructured: This type of interview has no defined guidelines. It usually begins with
a casually worded opening remark like 'So tell us/me something about yourself'. The
direction the interview will take is not known to the researcher either. The probability
of subjectivity is very high.
2. Semi-structured: This has a more defined format and usually only the broad areas to
be investigated are formulated. The questions, sequence and language are left to the
investigator's choice. Probing is of critical importance in obtaining meaningful
responses and uncovering hidden issues. After asking the initial question, the
direction of the interview is determined by the respondent's initial reply, the
interviewer's probes for elaboration and the respondent's answers.
3. Structured: This format has the highest reliability and validity. There is considerable
structure to the questions and the questioning is also done based on a prescribed
sequence. They are sometimes used as the primary data collection instrument also.
 Interviewing skills: The quality of the output and the depth of information collected
depend upon the probing and listening skills of the interviewer. His attitude needs to
be as objective as possible.
 Analysis and interpretation: The information collected is not subjected to any
statistical analysis. Mostly the data is in narrative form; in the case of structured
interviews it might be summarized in prose form. Figure 4.4 presents a classification of
the types of personal interview.
 Personal methods: These are the traditional one-to-one methods that have been used
actively in all branches of social sciences. However, they are distinguished in terms of
the location of interview.
 At-home interviews: This face-to-face interaction takes place at the respondent's
residence. Thus, the interviewer needs to initially contact the respondent to ascertain
the interview time.
 Mall-intercept interviews: As the name suggests, this method involves conducting
interviews with the respondents as they are shopping in malls. Sometimes, product
testing or product reactions can be carried out through structured methods and
followed by 20-30 minute interviews to test the reactions.
 Computer-assisted personal interviewing (CAPI): This technique is carried out with
the help of the computer. In this form of interviewing, the respondent faces an
assigned computer terminal and answers a questionnaire on the computer screen by
using the keyboard or a mouse. A number of pre-designed packages are available to
help the researcher design simple questions that are self-explanatory and instead of
probing, the respondent is guided to a set of questions depending on the answer given.
An interviewer is usually present at the time of the respondent's computer-assisted
interview and is available for help and guidance, if required (a small branching sketch is given after this list).
 Telephone method: The telephone method replaces the face-to-face interaction
between the interviewer and interviewee. This involves calling up the subjects and
asking them a set of questions. The advantage of the method is that geographic
boundaries are not a constraint and the interview can be conducted at the individual
respondent's location. The format and sequencing of the questions remains the same.
 Traditional telephonic interviews: The process can be accomplished using the
traditional telephone for conducting the questioning.
 Computer-assisted telephone interviewing: In this process, the paper questionnaire is
replaced by the computer: the telephonic interview is conducted using a computerized
interview format. The interviewer sits in front of a computer terminal and wears a
mini-headset in order to hear the respondent's answers. However, unlike the
traditional method, where the responses had to be recorded manually, here they are
recorded on the computer as the interview proceeds.
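To illustrate how a computer-assisted interview can route a respondent through different questions depending on earlier answers, here is a minimal Python sketch; the question wording and branching rules are hypothetical and are not taken from any particular CAPI/CATI package:

# Minimal skip-logic sketch for a computer-assisted interview (illustrative data only).
questions = {
    "q1": {"text": "Do you currently invest your savings? (yes/no)",
           "next": {"yes": "q2", "no": "end"}},
    "q2": {"text": "Which option do you use most: stocks, mutual funds or deposits?",
           "next": {"stocks": "q3", "mutual funds": "q3", "deposits": "q3"}},
    "q3": {"text": "How satisfied are you with its returns, on a scale of 1 to 5?",
           "next": {}},  # terminal question
}

def run_interview():
    answers = {}
    current = "q1"
    while current != "end":
        question = questions[current]
        reply = input(question["text"] + " ").strip().lower()
        answers[current] = reply                      # record the response as it is given
        current = question["next"].get(reply, "end")  # branch on the answer; stop if no rule matches
    return answers

# answers = run_interview()  # would prompt on the console when uncommented

The same branching idea underlies the pre-designed packages mentioned above: the respondent is never shown a question that does not apply to him or her.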
Since the interview requires a one-to-one dialogue to be carried out, it is more cumbersome
and costly as compared to a focus group discussion. Also, conducting an interview requires
considerable skill on the part of the interviewer, and thus adequate training in interviewing skills
is needed for capturing comprehensive study-related data.
Summary

Let us recapitulate the main points discussed in this unit:
 The researcher has access to two major sources of data: original (primary) sources and secondary data.
 Secondary information is a useful, fast and cost-effective way of testing and achieving the study objectives.
 Secondary data could be collected and compiled within the organization/industry.
 A source of data located outside the organization is termed an external data source.
 The observational method is the simplest method of primary data collection. It can be differentiated into structured versus unstructured and human versus mechanical observation.
 The focus group discussion is a cost-effective method and can ideally be done on a small group of respondents to obtain meaningful data.
 The interview method involves a dialogue between the interviewee and the interviewer. It can range from unstructured to completely structured. Today the interviewer can make use of the telephone as well as the computer to assist in conducting the interview.

UNIT 05 : ATTITUDE MEASUREMENT & SCALING


Structure
 Introduction
 Types of Measurement Scales
 Attitude
 Classification of Scales
 Single Item vs. Multiple Item Scale
 Comparative vs. Non-Comparative Scales
 Measurement Error
 Criteria for Good Measurement
 Summary
 Keywords

Introduction
In the previous unit, we studied the various types, sources and methods of collecting
data. In this unit, we will focus on the different types of measurement scales and the statistical
techniques that are applicable to each. The various formats of a rating scale and the
construction of an attitude measurement scale, along with the criteria for evaluating a good
measurement scale, are elaborated in this unit.

The term 'measurement' means assigning numbers or some other symbols to the
characteristics of certain objects. When numbers are used, the researcher must have a rule for
assigning a number to an observation in a way that provides an accurate description. We do
not measure the object but some characteristics of it. Therefore, in research,
people/consumers are not measured; only their perceptions, attitude or any other relevant
characteristics are measured. There are two reasons for which numbers are usually assigned:
 Firstly, numbers permit statistical analysis of the resulting data
 Secondly, they facilitate the communication of measurement results.
Scaling is an extension of measurement. Scaling involves creating a continuum on
which measurements on objects are located. Suppose you want to measure the satisfaction
level of the passengers of Kingfisher Airlines and a scale of 1 to 11 is used for the said purpose. This scale
indicates the degree of satisfaction, with 1 = extremely dissatisfied and 11 = extremely
satisfied.
In this Unit, you will also study the sources of measurement errors and the
criteria for evaluating measurements.
Types of Measurement Scales
There are four types of measurement scales—nominal, ordinal, interval and ratio. We
will discuss each one of them in detail. The choice of the measurement scale has implications
for the statistical technique to be used for data analysis.
Nominal scale: This is the lowest level of measurement. Here, numbers are assigned for the
purpose of identification of the objects. Any object which is assigned a higher number is in
no way superior to the one which is assigned a lower number. Each number is assigned to
only one object and each object has only one number assigned to it. It may be noted that the
objects are divided into mutually exclusive and collectively exhaustive categories.
Example:
What is your religion?
a) Hinduism b) Sikhism
c) Christianity d) Islam
e) Any other (please specify)
A Hindu may be assigned number 1, a Sikh may be assigned number
2, and a Christian may be assigned number 3, and so on. Any religion which is assigned a
higher number is in no way superior to the one which is assigned a lower number. The
assignment of numbers is only for the purpose of identification.
Nominal scale measurements are used for identifying the choice of food (vegan, vegetarian or
non-vegetarian), gender (male/female/other), caste, respondents, marital status, brands,
attributes, stores, the players of a hockey team, and so on.

The assigned numbers cannot be added, subtracted, multiplied or divided. The only
arithmetic operations that can be carried out are the count of each category. Therefore, a
frequency distribution table can be prepared for the nominal scale variables and mode of
distribution can be worked out. One can also use chi-square test and compute contingency
coefficient using nominal scale variables.
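As an illustration of the analyses that are permissible with nominal data, the short Python sketch below tabulates a hypothetical cross-classification and computes the chi-square statistic and the contingency coefficient using the scipy library; the frequency counts are invented purely for illustration:

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = religion categories, columns = preferred store format
observed = np.array([[40, 60],
                     [30, 30],
                     [25, 15]])

chi2, p_value, dof, expected = chi2_contingency(observed)

n = observed.sum()
contingency_coefficient = np.sqrt(chi2 / (chi2 + n))  # association measure for nominal data

print("Chi-square:", round(chi2, 2), " p-value:", round(p_value, 4))
print("Contingency coefficient:", round(contingency_coefficient, 3))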
Ordinal scale: This is the next higher level of measurement than the nominal scale
measurement. One of the limitations of the nominal scale measurements is that we cannot say
whether the assigned number to an object is higher or lower than the one assigned to another
option.

The ordinal scale measurement takes care of this limitation. An ordinal scale
measurement tells whether an object has more or fewer characteristics than some other
objects. However, it cannot answer how much more or how much less.
Example:
Rank the following attributes while choosing a restaurant for dinner. The most important
attribute may be ranked as 1, the next important may be assigned a rank of 2, and so on.
In the ordinal scale, the assigned ranks cannot be added, multiplied, subtracted or
divided. One can compute the median, percentiles and quartiles of the distribution. The other
major statistical analyses which can be carried out are the rank order correlation coefficient and the
sign test. All the statistical techniques which are applicable in the case of nominal scale
measurement can also be used for the ordinal scale measurement. However, the reverse is not
true. This is because ordinal scale data can be converted into nominal scale data but not the
other way round.
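For instance, the rank order (Spearman) correlation mentioned above can be computed on ordinal data as in the sketch below; the ranks assigned by the two respondents are hypothetical:

import numpy as np
from scipy.stats import spearmanr

# Ranks given to five restaurant attributes (1 = most important) by two respondents
ranks_respondent_a = [1, 2, 3, 4, 5]
ranks_respondent_b = [2, 1, 3, 5, 4]

rho, p_value = spearmanr(ranks_respondent_a, ranks_respondent_b)
print("Spearman rank correlation:", round(rho, 2))

# The median is a permissible summary for ordinal data
print("Median rank given by respondent B:", np.median(ranks_respondent_b))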

Interval scale: The interval scale measurement is the next higher level of measurement. It
takes care of the limitation of the ordinal scale measurement where the difference between
the score on the ordinal scale does not have any meaningful interpretation. In the interval
scale, the difference in the score on the scale has meaningful interpretations. It is assumed
that the respondent is able to answer the questions on a continuum scale. The mathematical
form of the data on the interval scale may be written as,
Y = a + bX, where a ≠ 0
In the interval scale, the difference in score has a meaningful interpretation while the ratio of
the score on this scale does not have a meaningful interpretation. This can be seen from the
following interval scale question:
How likely are you to buy a new designer carpet in the next six months?

Suppose a respondent ticks the response category 'likely' and another respondent ticks the
category 'unlikely'. If we use any of the scales A, B or C, we note that the difference between
the scores in each case is 2, whereas, when the ratio of the scores is taken, it is 2, 3
and –1 for the scales A, B and C, respectively. Therefore, the ratio of the scores on the scale
does not have a meaningful interpretation. Examples of interval scale data include temperature
measured in degrees Celsius or Fahrenheit and scores obtained on rating scales.
The numbers on this scale can be added, subtracted, multiplied or divided. One can compute
arithmetic mean, standard deviation, correlation coefficient, and conduct a t-test, Z-test,
regression analysis and factor analysis. As the interval scale data can be converted into the
ordinal and the nominal scale data, all the techniques applicable for the ordinal and the
nominal scale data can also be used for interval scale data.
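The point that differences, but not ratios, are meaningful on an interval scale can be verified numerically. In the sketch below, temperature in degrees Celsius (a classic interval scale) is converted to Fahrenheit through a transformation of the form Y = a + bX; the ratio of two differences survives the change of scale, whereas the ratio of two readings does not:

celsius = [10.0, 20.0, 40.0]
fahrenheit = [32 + 1.8 * c for c in celsius]  # Y = a + bX with a = 32 and b = 1.8

# Ratios of differences are preserved under the linear transformation ...
print((celsius[2] - celsius[1]) / (celsius[1] - celsius[0]))              # 2.0
print((fahrenheit[2] - fahrenheit[1]) / (fahrenheit[1] - fahrenheit[0]))  # 2.0

# ... but ratios of the readings themselves are not
print(celsius[1] / celsius[0])        # 2.0
print(fahrenheit[1] / fahrenheit[0])  # 1.36, so "twice as warm" has no meaning here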

Ratio scale: This is the highest level of measurement and takes care of the limitations of the
interval scale measurement, where the ratio of the measurements on the scale does not have a
meaningful interpretation. The ratio scale measurement can be converted into interval,
ordinal and nominal scale. But the other way round is not possible. The mathematical form of
the ratio scale data is given by Y = b X. In this case, there is a natural zero (origin), whereas
in the interval scale, we had an arbitrary zero. Examples of the ratio scale data are weight,
distance travelled, income and sales of a company, to mention a few.
All the mathematical operations can be carried out using the ratio scale data. In addition to
the statistical analysis mentioned in the interval, ordinal and nominal scale data, one can
compute the coefficient of variation, geometric mean, and harmonic mean using the ratio
scale measurement.
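The additional summaries that become meaningful only at the ratio level, such as the coefficient of variation, the geometric mean and the harmonic mean, can be computed as shown in this small sketch; the income figures are hypothetical:

import numpy as np
from scipy.stats import gmean, hmean

incomes = np.array([20000, 25000, 40000, 55000, 60000])  # hypothetical monthly incomes (ratio data)

coefficient_of_variation = incomes.std(ddof=1) / incomes.mean()
print("Coefficient of variation:", round(coefficient_of_variation, 3))
print("Geometric mean:", round(gmean(incomes), 1))
print("Harmonic mean:", round(hmean(incomes), 1))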

Attitude
Attitude is viewed as an enduring disposition to respond consistently in a given
manner to various aspects of the world, including persons, events and objects. A company is
able to sell its products or services when its customers have a favourable attitude towards
them. An attitude has the following three components:
 Cognitive component: This component represents an individual's information and
knowledge about an object. It includes awareness of the existence of the object, beliefs about
the characteristics or attributes of the object and judgement about the relative importance of
each of the attributes. In a survey, if the respondents are asked to name the companies
manufacturing plastic products, some respondents may remember names like Tupperware,
Modicare and Pearl Pet. This is called unaided recall awareness. More names are likely to be
remembered when the investigator makes a mention of them. This is aided recall. The
examples of beliefs or judgements could be that the products of Tupperware are of high
quality, non-toxic and can be used in parties; a mutton dish can be cooked in a pressure
cooker in less than thirty minutes, and so on.
Affective component: The affective component summarizes a person's overall feelings or
emotions towards the object. Examples of this component could be: the food cooked in a
pressure cooker is tasty, the taste of orange juice is good or the taste of bitter gourd is very bad.
Intention or action component: This component of an attitude, also called the behavioural
component, reflects predisposition to an action by reflecting the consumer's buying or
purchase intention. It also reflects a person's expectations of future behaviour towards an
object.
There is a relationship between attitude and behaviour. If a consumer does not have a
favourable attitude towards a product, he/she will certainly not buy the product. However,
having a favourable attitude does not mean that it would be reflected in the purchase
behaviour. This is because the intention to buy a product has to be backed by the purchasing
power of the consumer. Therefore, a favourable attitude is a necessary condition for the
purchase of the product but it is not a sufficient one. This relationship could hold true at the
aggregate level but not at the individual level.

Classification of Scales
One of the ways of classifying scales is based on the number of items in the scale. Based
upon this, the following classification may be proposed:
 
Single Item vs. Multiple Item Scale
Single item scale: In the single item scale, there is only one item to measure a given
construct. For example:
Consider the following question:
How satisfied are you with your current job?
 Very Dissatisfied
 Dissatisfied
 Neutral
 Satisfied
 Very satisfied
The problem with the above question is that there are several aspects to a job, like pay, work
environment, rules and regulations, job security and communication with the seniors. The
respondent may be satisfied with some of the factors but not with others. By asking a
question as stated above, it will be difficult to analyse the problematic areas. To overcome
this problem, a multiple item scale is proposed.

Multiple item scale: In multiple item scale, there are many items that play an important role
in forming the underlying construct that the researcher is trying to measure. This is because
each item forms some part of the construct (satisfaction) which the researcher is trying to
measure. As an example, some of the following questions may be asked in a multiple item
scale:

How satisfied are you with the pay you are getting on your current job?
 Very Dissatisfied
 Dissatisfied
 Neutral
 Satisfied
 Very satisfied

How satisfied are you with the rules and regulations of your organization?
 Very Dissatisfied
 Dissatisfied
 Neutral
 Satisfied
 Very satisfied
Comparative Scales vs. Non-Comparative Scales

Comparative scales
In comparative scales, it is assumed that respondents make use of a standard frame of
reference before answering a question. For example:
A question like 'How do you rate Barista in comparison to Cafe Coffee Day on the basis of
quality of beverages?' is an example of the comparative rating scale. It involves direct
comparison of stimulus objects.
Example:
Please rate Domino's in comparison to Pizza Hut on the basis of your satisfaction level on an
11-point scale, based on the following parameters: 1 = Extremely poor, 6 = Average, 11 =
Extremely good. Circle your response:

a. Variety of menu options            1 2 3 4 5 6 7 8 9 10 11
b. Value for money                    1 2 3 4 5 6 7 8 9 10 11
c. Speed of service (delivery time)   1 2 3 4 5 6 7 8 9 10 11
d. Promotional offers                 1 2 3 4 5 6 7 8 9 10 11
e. Food quality                       1 2 3 4 5 6 7 8 9 10 11

Comparative scale data is generally interpreted in relative terms. Described below are the types
of comparative rating scales:
(i) Paired comparison scales: Here, a respondent is presented with two objects and is
asked to select one according to whatever criterion he or she wants to use. The resulting data
from this scale is ordinal in nature. For example, suppose a parent wants to offer one of the
four items to a child—chocolate, burger, ice cream and pizza. The child is offered to choose
one out of the two from six possible pairs, i.e., chocolate or burger, chocolate or ice cream,
chocolate or pizza, burger or ice cream, burger or pizza and ice cream or pizza. In general, if
there are n items, the number of paired comparison would be (n(n – 1)/2). Paired comparison
technique is useful when the number of items is limited because it requires a direct
comparison and overt choice.
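The n(n – 1)/2 pairs can be generated directly, as in the following small sketch for the four items in the example above:

from itertools import combinations

items = ["chocolate", "burger", "ice cream", "pizza"]
pairs = list(combinations(items, 2))

print(len(pairs))  # 6, i.e. n(n - 1)/2 with n = 4
for pair in pairs:
    print(pair)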

(ii) Rank order scaling: In rank order scaling, respondents are presented with
several objects simultaneously and asked to order or rank them according to some criterion.
Consider, for example, the following question:
Rank the following soft drinks in order of your preference. The most preferred soft drink
should be ranked one, the second most preferred should be ranked two, and so on.
Like paired comparison, this approach is also comparative in nature. The problem with this
scale is that if a respondent does not like any of the above- mentioned soft drinks and is
forced to rank them in the order of his choice, then the soft drink which is ranked one should
be treated as the least disliked soft drink, and similarly, the other rankings can be interpreted.
The rank order scaling results in the ordinal data.

(iii) Constant sum rating scaling: In constant sum rating scale, the respondents are asked to
allocate a total of 100 points between various objects and brands. The respondent distributes
the points to the various objects in the order of his preference. 
Consider the following example:
Allocate a total of 100 points among the various schools into which you would like to admit
your child. The points should be allocated in such a way that the sum total of the points
allocated to various schools adds up to 100.
Suppose Mother's International is awarded 30 points, whereas Laxman Public School is
awarded 15 points. One can make a statement that the respondent rates Mother's International
twice as high as Laxman Public School. This type of data is not only comparative in nature
but could also result in ratio scale measurement.
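Because the allocated points can be treated as ratio data, the relative weights follow directly; the sketch below uses the hypothetical point allocation from the example:

# Points (out of 100) allocated by one respondent to different schools (illustrative figures)
allocation = {"Mother's International": 30, "Laxman Public School": 15, "Other schools": 55}

assert sum(allocation.values()) == 100  # the constant sum constraint

# Mother's International is rated twice as high as Laxman Public School
ratio = allocation["Mother's International"] / allocation["Laxman Public School"]
print(ratio)  # 2.0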
 (iv) Q-sort technique: This technique makes use of the rank order procedure in which
objects are sorted into different piles based on their similarity with respect to certain criterion.
Suppose there are 100 statements and an individual is asked to pile them into five groups in
such a way that the strongly agreed statements could be put in one pile, agreed statements
could be put in another pile, neutral statements form the third pile, disagreed statements come
in the fourth pile and strongly disagreed statements form the fifth pile. The data
generated in this way would be ordinal in nature. The distribution of the number of statements
in each pile should be such that the resulting data approximately follow a normal distribution.
Non-comparative scales

In non-comparative scales, respondents do not make use of any frame of reference before
answering the questions. The resulting data is generally assumed to be interval or ratio scale.

Non-comparative scales are divided into two categories, namely, the graphic rating scales and
the itemized rating scales. A useful and widely used itemized rating scale is the Likert scale.

Graphic rating scale


A graphic rating scale is a continuous scale. In the graphic rating scale, the respondent is
asked to tick his preference on a graph. 

To measure the preference of an individual towards fast food, one has to measure the distance
from the extreme left to the position where a tick mark has been put. Higher the distance,
higher would be the individual's preference for fast food. This scale suffers from two
limitations—one, if a respondent has put a tick mark at a particular position, and after ten
minutes, he or she is given another form to put a tick mark, it will virtually be impossible to
put a tick at the same position as before. Does it mean that the respondent's preference for
fast food has undergone a change in 10 minutes?

The basic assumption of this scale is that the respondents can distinguish the fine shade of
difference between preference/attitude, which need not be the case. Further, the coding,
editing and tabulation of data generated through such a procedure is a very tedious task and
researchers try to avoid using it.

Itemized rating scale


In the itemized rating scale, the respondents are provided with a scale that has a number of
brief descriptions associated with each response category.
 The response categories are ordered in terms of the scale position, and the respondents are
supposed to select the category that best describes the object being rated. There are certain
issues that should be kept in mind while designing the itemized rating
scale. These issues are:

Number of categories to be used: There is no hard and fast rule as to how many categories
should be used in an itemized rating scale. However, it is standard practice to use five or six
categories. Some researchers are of the opinion that more than five categories should be used
in situations where small changes in attitudes are to be measured. There are others that argue
that the respondents would find it difficult to distinguish between more than five categories.

Odd or even number of categories: It has been a matter of debate among the researchers as
to whether odd or even number of categories is to be used. By using even number of
categories, the scale would not have a neutral category and the respondent will be forced to
choose either the positive or the negative side of the attitude. If odd numbers of categories are
used, the respondent has the freedom to be neutral.

Balanced versus unbalanced scales: A balanced scale has an equal number of favourable
and unfavourable categories. The following is an example of a balanced scale:

How important is price to you in buying a new car?
– Very important
– Relatively important
– Neither important nor unimportant
– Relatively unimportant
– Very unimportant
In this question, there are five response categories, two of which emphasize the importance of price
and two others that do not show its importance. The middle category is neutral.

The following is the example of an unbalanced scale:


How important is price to you in buying a new car?
o More important than any other factor
o Extremely important
o Important
o Somewhat important
o Unimportant

In this question, there are four response categories that are skewed towards the importance
given to the price, whereas one category is for the unimportant side. Therefore, this question
is an unbalanced question.
 Nature and degree of verbal description: Many researchers believe that each category
must have a verbal, numerical or pictorial description. Verbal description should be clearly
and precisely worded so that the respondents are able to differentiate between them. Further,
the researcher must decide whether to label every scale category, some scale categories or
only extreme scale categories.
Forced versus non-forced scales: In the forced scale, the respondent is forced to take a stand,
whereas in the non-forced scale, the respondent can be neutral if he/she so desires. The
argument for a forced scale is that those who are reluctant to reveal their attitude are
encouraged to do so with the forced scale. Paired comparison scale, rank order scale and
constant sum rating scales are examples of forced scales.
Physical form: There are many options that are available for the presentation of the scales. It
could be presented vertically or horizontally. The categories could be expressed in boxes,
discrete lines or as units on a continuum. They may or may not have numbers assigned to
them. The numerical values, if used, may be positive, negative or both.
Suppose we want to measure the perception about Jet Airways using a multi-item scale. One
of the questions is about the behaviour of the crew members. Given below is
a set of scale configurations that may be used to measure their behaviour:

The behaviour of the crew members of Jet Airways is:

Below, we will describe Likert scale, which is very commonly used in survey research.

Likert scale: This is a multiple item agree–disagree five-point scale. The respondents are
given a certain number of items (statements) on which they are asked to express their degree
of agreement/disagreement. This is also called a summative scale because the scores on
individual items can be added together to produce a total score for the respondent. An
assumption of the Likert scale is that each of the items (statements) measures some aspect of
a single common factor; otherwise the scores on the items cannot legitimately be summed up.
In a typical research study, there are generally twenty-five to thirty items on a Likert scale.

To construct a Likert scale to measure a particular construct, a large number of
statements pertaining to the construct are listed. These statements could range from eighty to
120. The identification of the statements is done through exploratory research, which is
carried out by conducting a focus group, unstructured interviews with knowledgeable people,
literature survey, analysis of case studies, and so on.
Suppose we want to assess the image of a company. As a first step, an  exploratory research
may be conducted by having an informal interview with the customers and employees of the
company. The general public may also be contacted. A survey of the literature on the subject
may also give a set of information that could be useful for constructing the statements.
Suppose the number of statements to measure the construct is 100. Now a sample of
representative respondents is asked to state their degree of agreement/disagreement with those
statements.

It may be noted that only anchor labels and no numerical values are assigned to the response
categories. Once the scale is administered, numerical values are assigned to the response
categories. The scale contains statements, some of which are favourable to the construct we
are trying to measure and some are unfavourable to it.

For example, out of the ten statements given, statements numbering 1, 2, 4, 6 and 9 in Table
5.1 are favourable statements, whereas the remaining are unfavourable statements. The
reason for having a mixture of favourable and unfavourable statements in a Likert scale is
that the responses by the respondent should not become monotonous while answering the
questions. Generally, in a Likert scale, there is an approximately equal number of favourable
and unfavourable statements. Once the scale is administered, numerical values are assigned to
the responses. The rule is that a 'strongly agree' response for a favourable statement should
get the same numerical value as the 'strongly disagree' response of the unfavourable
statement.

Suppose for a favourable statement, the numbering is done as: Strongly disagree = 1;
Disagree = 2; Neither agree nor disagree = 3; Agree = 4; and Strongly agree = 5.
Accordingly, an unfavourable statement would get the numerical values as: Strongly disagree
= 5; Disagree = 4; Neither agree nor disagree = 3; Agree = 2; and Strongly agree = 1. In order
to measure the image that the respondent has about the company, the scores are added.
For example, if a respondent has ticked () statements numbering from one to ten as shown in
Table 5.1, his total score would be 3 + 5 + 4 + 4 + 5 + 4 + 4 + 5 +
4 + 4 = 42 out of 50. Now if there are 100 respondents and 100 statements, the score on the
image of the company can be worked out for each respondent by adding his/her scores on the
100 statements. The minimum score for each respondent will be 100, whereas the maximum
score would be 500.
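A minimal sketch of this scoring rule is given below. Since Table 5.1 is not reproduced here, the responses are hypothetical and have simply been chosen so that the total works out to the 42 used in the text; the set of favourable statements (1, 2, 4, 6 and 9) is the one mentioned above:

# Likert scoring sketch: reverse-code unfavourable statements, then sum (illustrative responses).
responses = {1: "Neither agree nor disagree", 2: "Strongly agree", 3: "Disagree", 4: "Agree",
             5: "Strongly disagree", 6: "Agree", 7: "Disagree", 8: "Strongly disagree",
             9: "Agree", 10: "Disagree"}
favourable_items = {1, 2, 4, 6, 9}  # the remaining items are unfavourable

scores_for_favourable = {"Strongly disagree": 1, "Disagree": 2, "Neither agree nor disagree": 3,
                         "Agree": 4, "Strongly agree": 5}

total = 0
for item, answer in responses.items():
    score = scores_for_favourable[answer]
    if item not in favourable_items:
        score = 6 - score  # reverse-code unfavourable statements
    total += score

print(total)  # 42 out of a possible 50 for ten statements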
As mentioned earlier, a typical Likert scale comprises about 25–30 statements. In order to
select twenty-five statements from the 100 statements, we need to discard some of them. The
rule behind discarding the statements is that those items that are non-discriminating should be
removed.
As mentioned earlier, the score for each of the respondents on each of the
statements can be used to measure his/her total score about the image of the company. 

Measurement Error
Measurement error occurs when the observed measurement on a construct or concept
deviates from its true value. Some of the reasons for measurement errors are:
 Factors like the mood, fatigue and health of the respondent may influence the observed
response while the instrument is being administered.
 Variations in the environment in which measurements are taken may also result in a
departure from the true value.
 At times, errors may be committed at the time of coding, while entering data from the
questionnaire into a spreadsheet on the computer, and at the tabulation stage.
The observed measurement in any research need not be equal to the true measurement. The
observed measurement can be written as,
O = T + S + R
where O = observed measurement, T = true score, S = systematic error and R = random error.
It may be noted that the errors consist of two components—systematic error and random
error. Systematic error causes a constant bias in the measurement. Suppose there is a
weighing scale that weighs 50 gm less for every one kg of product being weighed. The error
would consistently remain the same irrespective of the kind of product and the time at which
the product is weighed. Random error, on the other hand, involves influences that bias the
measurements but are not systematic. Suppose we use different weighing scales to weigh 1
kg of a product, and if systematic error is assumed to be absent, we may find that recorded
weights may fall within a range around the true value of the weight, thereby causing random
error.
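The decomposition O = T + S + R can be illustrated with a small simulation; the size of the bias and of the random fluctuations below is arbitrary:

import numpy as np

rng = np.random.default_rng(seed=1)

true_weight = 1.000                          # T: true weight of the product in kg
systematic_error = -0.050                    # S: the scale consistently under-weighs by 50 g per kg
random_error = rng.normal(0, 0.02, size=10)  # R: unsystematic fluctuations between weighings

observed = true_weight + systematic_error + random_error  # O = T + S + R

print("Observed weights:", np.round(observed, 3))
print("Mean observed weight:", round(observed.mean(), 3))  # settles near 0.95 kg, not 1.00 kg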

Criteria for Good Measurement


There are three criteria for evaluating measurements: reliability, validity and sensitivity.

1. Reliability
Reliability is concerned with consistency, accuracy and predictability of the scale. It refers to
the extent to which a measurement process is free from random errors. The reliability of a
scale can be measured using the following methods:
Test–retest reliability: In this method, repeated measurements of the same person or group
using the same scale under similar conditions are taken. A very high correlation between the
two scores indicates that the scale is reliable. The researcher has to be careful in deciding the
time difference between the two observations. If the time difference between the two observations is
very small, it is very likely that the respondent would give the same answers, which would result
in a higher correlation. Further, if the difference is too large, the attitude might have undergone
a change during that period, resulting in a weak correlation and hence poor reliability.
Therefore, the researcher has to be very careful in deciding the time difference between the
observations. Generally, a time difference of about five-six months is considered as an ideal
period.
Split-half reliability method: This method is used in the case of multiple item scales. Here,
the number of items is randomly divided into two parts and a correlation coefficient between
the two halves is obtained. A high correlation indicates internal consistency of the construct
and hence greater reliability.
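Both reliability checks described above boil down to a correlation coefficient, as the following sketch shows; all the scores are hypothetical:

import numpy as np

# Test-retest: the same ten respondents measured twice with the same scale
first_administration  = np.array([12, 15, 20, 22, 25, 28, 30, 33, 35, 40])
second_administration = np.array([13, 14, 21, 23, 24, 29, 29, 34, 36, 41])
test_retest_r = np.corrcoef(first_administration, second_administration)[0, 1]
print("Test-retest reliability:", round(test_retest_r, 3))

# Split-half: total scores on two randomly chosen halves of a multiple item scale
half_a = np.array([18, 20, 25, 27, 30])
half_b = np.array([17, 21, 24, 28, 29])
split_half_r = np.corrcoef(half_a, half_b)[0, 1]
print("Split-half reliability:", round(split_half_r, 3))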

2. Validity
The validity of a scale refers to the question whether we are measuring what we want to
measure. Validity of the scale refers to the extent to which the measurement process is free
from both systematic and random errors. The validity of a scale is a more serious issue than
reliability. There are different ways to measure validity.
Content validity: This is also called face validity. It involves subjective judgement by an
expert for assessing the appropriateness of the construct. For example, to measure the
perception of a customer towards Kingfisher Airlines, a multiple item scale is developed. A
set of fifteen items is proposed. These items when combined in an index measure the
perception of Kingfisher Airlines. In order to judge the content validity of these fifteen items,
a set of experts may be requested to examine the representativeness of the fifteen items. The
items covered may be lacking in the content validity if we have omitted the behaviour of the
crew, food quality, food quantity, etc., from the list.
 In fact, conducting the exploratory research to exhaust the list of items measuring perception
of the airline would be of immense help in such a case.

Predictive validity: This involves the ability of a measured phenomenon at one point of time
to predict another phenomenon at some point in the future. If the correlation coefficient
between the two is high, the initial measure is said to have a high predictive ability. As an
example, consider the use of CAT (common admission test) to shortlist candidates for
admission to the MBA (Masters of Business Administration) programme in a business
school. The CAT scores are supposed to predict the candidate's aptitude for studies towards
business education.
3. Sensitivity
Sensitivity refers to an instrument's ability to accurately measure the variability in a concept.
A dichotomous response category, such as agree or disagree, does not allow the recording of
any attitude changes. A more sensitive measure with numerous categories on the scale may
be required. For example, adding 'strongly agree', 'agree', 'neither agree nor disagree',
'disagree' and 'strongly disagree' categories will increase the sensitivity of the scale.
The sensitivity of scale based on a single question or a single item can be increased by adding
questions or items. In other words, because composite measures allow for a greater range of
possible scores, they are more sensitive than a single-item scale.

Summary
Let us recapitulate the main points discussed in the unit:
Measurement means the assignment of numbers or other symbols to the characteristics of
certain objects. Scaling is an extension of measurement.
Scaling involves creating a continuum on which measurements on the objects are located.
There are four types of measurement scales: nominal, ordinal, interval and ratio scale.
Attitude is the predisposition of an individual to evaluate some objects or
symbols. It has three components: cognitive, affective and intention or action component.
Scales can be classified as single-item and multiple-item scales. Another
classification could be whether the scales are comparative or non- comparative in nature.
The observed measurement need not be equal to the true value of the
measurement. Some systematic and random errors may be found in the observed
measurement.
There are three criteria for determining the accuracy of a
measurement—reliability, validity and sensitivity.

UNIT 06 : QUESTIONNAIRE DESIGN


 Structure

 Introduction
 The Questionnaire Method
 Types of Questionnaire
 Process of Questionnaire Designing
 Advantages and Disadvantages of the Questionnaire Method
 Summary
 Keywords

Introduction
In the last unit, we discussed some methods of primary data collection, like observation,
focus group discussion and interviews. However, a discussion on data collection would be
incomplete if one did not talk about the questionnaire method. This is the most cost effective
and widely used method, apart from being extremely user friendly. The questionnaire method
is flexible enough to reveal data that is in the respondents' own words and language. It can be
made extremely scientific by framing questions which enable a very advanced level of
quantitative measurement and analysis. The pattern of questioning is always designed
keeping in mind the respondent's comfort and ease of answering. Today, with the wide use of
technology it is very easy to use the questionnaire method even without being present
physically in front of the respondent.
Even though all of us have filled a questionnaire at some time or the other and know what it
must include, designing a well structured and study specific questionnaire requires a
structured and logical path so that the effort of collecting information using the questionnaire
is meaningful. In this unit, you will learn about the various aspects of the questionnaire
method in detail. The entire process of questionnaire designing will be discussed at length,
with special reference to the different kinds of questionnaires available to the researcher.

The Questionnaire Method


The questionnaire is a research technique that consists of a series of questions asked to
respondents in order to obtain statistically useful information about a given topic. It is one of
the most cost-effective methods of collecting primary data, which can be used with
considerable ease by most individual and business researchers. It has the advantage of
flexibility of approach and can be successfully adapted for most research studies. The
instrument has been designed differently by various researchers. Some take the traditional
view of a written document requiring the subject to record his/her own responses (Kervin,
1999). Others have taken a broader perspective to include structured interview also as a
questionnaire (Bell,1999). It is essentially a data-collection instrument that has a pre-designed
set of questions, following a particular structure (De Vaus, 2002). Since it includes a standard
set of questions, it can be successfully used to collect information from a large sample in a
reasonably short time period. However, the use of questionnaire is not always the best
method in all research studies. For example, at the exploratory stage, rather than
questionnaire, it is advisable to use a more unstructured interview. Secondly, when the
number of respondents is small and one has to collect more subjective data, then a
questionnaire is not advisable.
Criteria for designing a questionnaire

There are certain criteria that must be kept in mind while designing the questionnaire. The
first and foremost requirement is that the spelt-out research objectives must be converted into
clear questions which will extract answers from the respondent. This is not as easy as it
sounds; for example, if one wants to know how many times your teacher praised you in the
week, it is very difficult to give an exact number. The second requirement is that it should be
designed to engage the respondent and encourage a meaningful response. For example, a
questionnaire measuring stress cannot have a voluminous set of questions which fatigue the
subject. The questions, thus, should encourage response and be easy to understand. Lastly,
the questions should be self-explanatory and not confusing, as then the person will answer the
way he understood the question and not in terms of what was asked. This will be discussed
in detail later, when we discuss the wording of the questions.

 Types of Questionnaires

There are many different types of questionnaires available to the researcher. The categorization can
be done on the basis of a variety of parameters. The two criteria that are most frequently used for
designing purposes are the degree of structure and the degree of concealment. Structure refers to
the degree to which the response category has been defined. Concealment refers
to the degree to which the purpose of the study is explained to the respondent. Instead of
considering them as individual types, most research studies use a mixed format. Thus, they will be
discussed here as a two-by-two matrix.

Let us discuss the types of questionnaires. Questionnaires can be categorized on the basis of their
structure or method of administration.

Based on the structure, questionnaires can be divided into the following

categories:

Formalized and unconcealed questionnaire: This is the one that is the most frequently used by all
management researchers. For example, if a new brokerage firm wants to understand the investment
behaviour of people, they would structure the questions and answers as follows:

1. Do you carry out any investment(s)?
Yes / No (If yes, continue; else, terminate.)

2. Out of the following options, where do you invest? (Tick all that apply.)
Precious metals
Real estate
Stocks
Government instruments
Mutual funds
Any other

This kind of structured questionnaire is easy to administer, and has self- explanatory questions and
clearly defined answer categories.

Formalized and concealed questionnaire: These questionnaires have a formal method of questioning; however, the purpose is not clear to the
respondent. The research studies which are trying to find out the latent causes of behaviour and
cannot rely on direct questions use these. For example, young people cannot be asked direct
questions on whether they are likely to be indulging in corruption at work. Thus, the respondent has
to be given a set of questions that can give an indication of what are his basic values, opinions and
beliefs, as these would influence how he would react to issues.

Non-formalized and unconcealed: Some researchers argue that rather than giving the
respondents pre-designed response categories, it is better to give them unstructured questions
where they have the freedom of expressing themselves the way they want. Some examples of
these kinds of questions are given below:
1. Why do you think Maggi noodles are liked by young children?
2. How do you generally decide on where you are going to invest your money?
3. Give THREE reasons why you believe that the show Satyamev Jayate has affected the
common Indian.

The data obtained here is rich in content, but quantification cannot go beyond frequencies and
percentages to represent the findings.
Non-formalized and concealed: If the objective of the research study is to uncover socially
unacceptable desires and subconscious and unconscious motivations, the investigator makes
use of questions of low structure and disguised purpose. However, these require interpretation
that is highly skilled. Cost, time and effort are also much higher than in others.
Another useful way of categorizing questionnaires is on the basis of the method of
administration. The first kind is the questionnaire that necessitates a face-to-face interaction:
the interviewer reads out each question and makes a note of the respondent's answers. This
mode of administration is called a schedule. It might have a mix of the questionnaire types
described in the section above and might have some structured and some unstructured
questions. The other kind is the self-administered questionnaire, where the respondent reads
all the instructions and questions on his own and records his own statements or responses.
Thus, all the questions and instructions need to be explicit and self-explanatory.
The selection of one over the other depends on certain study prerequisites.
Population characteristics: In case the population is illiterate or unable to write the responses,
then one must as a rule use the schedule, as the questionnaire cannot be effectively answered
by the subject himself.
Population spread: In case the sample to be studied is large and widely spread, then one needs
to use the questionnaire. When the resources available for the study are limited, then
schedules become expensive to use and the self-administered questionnaire is better.

Study area: In case one is studying a sensitive topic like harassment at work, a self-
administered questionnaire is suggested. However, in case the study topic needs additional
probing then a schedule is better. There is another categorization that is based upon the mode
of administration; this would be discussed in later sections of the unit.

Process of Questionnaire Designing

Even though the questionnaire method is most used by researchers, designing a well-
structured questionnaire needs considerable skill. Presented below is a standardized process
that a researcher can follow.
1. Convert the research objectives into information areas
This is the first step of the design process. By this time the researcher is clear about the
research questions; research objectives; variables to be studied; research information required
and the characteristics of the population being studied. Once these tasks are done, one can
prepare a tabled framework so that the questions which need to be developed become clear.
2. Method of administration
Once the researcher has identified his information area; he needs to specify how the
information should be collected. The researcher usually has available a variety of methods for
administering the study. The main methods are personal schedule (discussed earlier in the
unit), self-administered questionnaire through mail, fax, e-mail and web-based questionnaire. 

3. Content of the questionnaire


The next step is to determine the matter to be included as questions in the measure. The
researcher needs to do an objective quality check in order to see what research
objective/information need the question would be covering before using any of the framed
questions.
How essential is it to ask the question? You must remember that the time of the respondent is
precious and it should not be wasted. Unless a question is adding to the data needed for
getting an answer to the research problem, it should not be included. For example, if one is
studying the usage of plastic bags, then demographic questions on age group, occupation,
education and gender might make sense but questions related to marital status, family size
and the state to which the respondent belongs are not required as they have no direct relation
with the usage or attitude towards plastic bags.
Sometimes, especially in self-administered questionnaires, one may ask some neutral
questions at the beginning of the questionnaire to establish an involvement and rapport. For
example, for a bio-fertilizer usage study, the following question was asked:
Farming for you is a:
 Noble profession
 Ancestral profession
 A profession like any other
 A profession that is not money-making
 Any other

Do we need to ask several questions instead of a single one? After deciding on the
significance of the question, one needs to ascertain whether a single question will serve the purpose
or should more than one question be asked. For example, in a TV serial study, one may give
ten popular serials to be ranked as 1 to 10 in order of preference. Then the second question
after the ranking question is:
'Why do you like the serial (the one you ranked No. 1/ prefer watching most)?' (Incorrect)
Here, one lady might say, 'Everyone in my family watches it'. While another might say, 'It
deals with the problems of living in a typical Indian joint family system' and yet another
might say, 'My friend recommended it to me’.
Thus, we need to ask her:
‘What do you like about ?’
'Who all in your household watch the serial?' and
'How did you first hear about the serial?' (Correct)

4. Motivating the respondent to answer


The questionnaire should be designed in a manner that it involves the respondent and
motivates him/her to give information. There are different situations which might lead to this.
Each of these is examined separately here:
Does the person have the required information? It may turn out that the person has had no
experience with the issue being studied. Look at the following question:
How do you evaluate the negotiation skills module, viz., the communication and presentation
skill module? (Incorrect)
In this case it might be that the person has not undergone one or even both the modules, so
how can he compare? Thus, certain qualifying or filter questions must be asked. Filter
questions enable the researcher to filter out the respondents who are not adequately informed.
Thus, the correct question would have been:
Have you been through the following training modules?
 Negotiation skills module Yes/no
 Communication and presentation skills Yes/no
In case the answer to both is yes, please answer the following question, or else move to the
next question.
How do you evaluate the negotiation skills module, viz., the communication and presentation
skill module? (Correct)

Does the person remember? Many times, the question addressed might be putting too much
stress on an individual's memory. For example, consider the following questions:
How much did you spend on eating out last month? (Incorrect) Such questions are beyond
any normal individual's memory bank.
Thus, the questions listed above could have been rephrased as follows:

When you go out to eat, on an average, your bill amount is:
Less than Rs 100
Rs 101–250
Rs 251–500
More than Rs 500

How often do you eat out in a week?
1–2 times
3–4 times
5–6 times
Every day (Correct)

Can the respondent articulate? Sometimes the respondent might not know how to put the
answer in clear words. For example, if you ask a respondent to:
Describe a river rafting experience.
Most respondents would not know what phrases to use to give an answer. Thus, in the above
case, one can provide answer categories to the person as
follows:
Describe the river rafting experience. (Correct)

Sensitive information: There might be instances when the question being asked might be
embarrassing to the respondents and thus they would not be comfortable disclosing the data
required.
For example, questions such as the following will not get any answers.

Have you ever used fake receipts to claim your medical allowance?
(Incorrect)

Have you ever spit tobacco on the road (to tobacco consumers)?
 (Incorrect)
However, in case the socially undesirable habit is in the context of a third person, the chances
of getting some correct responses are possible. Thus the questions should be rephrased as
follows:
Do you associate with people who use fake receipts to claim their medical allowance?
(Correct)
Do you think tobacco consumers spit tobacco on the road?
(Correct)
5. Determining the type of questions

Available to the researcher are different kinds of question-response options (Figure 6.2).


Open-ended questions
In open-ended questions, the openness refers to the option of answering in one's own words.
They are also referred to as unstructured questions or free- response or free-answer questions.
Some illustrations of this type are listed below:
What is your age? ____
Which is your favourite TV serial? ____
I like Nescafe because ____
My career goal is to ____

Closed-ended questions
In closed-ended questions, both the question and response formats are structured and defined.
There are three kinds of formats, as we observed earlier: dichotomous questions, multiple-
choice questions and those that have a scaled response.

 Dichotomous questions: These provide the respondents with only two restrictive
alternatives. These could be 'yes' or 'no', like or dislike, similar or different, married or
unmarried, etc. For example:
Are you diabetic? Yes/No
Have you read the new book by Dan Brown? Yes/No
What kind of petrol do you use in your car? Normal/Premium
Dichotomous questions are the easiest type of questions to code and analyse. They are based
on the nominal level of measurement and are categorical or binary in nature.
 Multiple-choice questions: Unlike dichotomous questions, the person is given a
number of response alternatives here. He might be asked to choose the one that is
most applicable. For example, this question was given to a retailer who is currently
not selling organic food products:
Will you consider selling organic food products in your store?
Definitely not in the next one year
Probably not in the next one year
Undecided
Probably in the next one year
Definitely in the next one year
Sometimes, multiple-choice questions do not have verbal but rather numerical options for the
respondent to choose from, for example:
How much do you spend on grocery products (average in one month)?
Less than Rs 2500
Between Rs 2500 and Rs 5000
More than Rs 5000

Most multiple-choice questions are based upon ordinal or interval levels of measurement.
There could also be instances when multiple options are given to the respondent and he can
select all those that apply in the case. These kinds of multiple-choice questions are called
checklists. For example, in the organic food study, the retailer who does not stock organic
products was given multiple reasons as follows:
You do not currently sell organic food products because (more than one reason may be ticked):
You do not know about organic food products.
You are not interested.
Organic products do not have attractive packaging.
Organic food products are not supplied regularly.
Any other
 Scales: Scales refer to the attitudinal scales that were discussed in detail in Unit 5.
Since these questions have been discussed in detail in the earlier unit, we will only
illustrate this with an example. The following is a question which has two sub-
questions designed on the Likert scale. These require simple agreement disagreement
on the part of the respondent. This scale is based on the interval level of measurement.
 
Given below are statements related to your organization. Please indicate your
agreement/disagreement with each:

6. Criteria for question designing


Step six of the questionnaire design process involves translating the objectives identified into
meaningful questions. There are certain designing criteria that a researcher should keep in mind when
writing the research questions.

Clearly specify the issue: By reading the question, the person should be able to clearly
understand the information needed.
Which newspaper do you read? (Incorrect)
This might seem to be a well-defined and structured question. However, the 'you' could be the
person filling the questionnaire or the family. He could be reading different newspapers. He
might be reading different papers at home and, say, in the college library. A better way to word
the question would be:
Which newspaper or newspapers did you personally read at home during the last month? In
case of more than one newspaper, please list all that you read. (Correct)

Use simple terminology: The researcher must take care to ask questions in a language that is
understood by the population under study. Technical words or difficult words that are not
used in everyday communication must be avoided.
Do you think thermal wear provides immunity? (Incorrect)
Do you think that thermal wear provides you protection from the cold? (Correct)
Avoid ambiguity in questioning: The words used in the questionnaire should mean the same
thing to all those answering the questionnaire. A lot of words are subjective and relative in
meaning. Consider the following question:
How often do you visit Pizza Hut?
Never
Occasionally
Sometimes
Often
Regularly (Incorrect)
These are ambiguous measures, as occasionally in the above question might be three to four
times in a week for one person, while for another it could be three times in a month. A much
better wording for this question would be the following:
In a typical month, how often do you visit Pizza Hut?
Less than once
1 or 2 times
3 or 4 times
More than 4 times (Correct)
Avoid leading questions: Any question that provides a clue to the respondents in terms of the
direction in which one wants them to answer is called a leading or biasing question.
For example,
Do you think that working mothers should buy ready-to-eat food even when it might contain
some chemical preservatives?
Yes
No
Don't know (Incorrect)
The question would mostly generate a negative answer, as no working mother would like to
buy something that is convenient but might be harmful.

Thus, it is advisable to construct a neutral question as follows:
Do you think that working mothers should buy ready-to-eat food?
Yes
No
Don't know (Correct)

Avoid loaded questions: Questions that address sensitive issues are termed as loaded
questions and the response to these questions might not always be honest, as the person might
not wish to admit the answer. For example, questions such as the following will rarely get an
affirmative answer:

Will you take dowry when you get married? (Incorrect)


Sensitive questions like this can be rephrased in a variety of ways. For example, the question
could be constructed in the context of a third person as follows:
Do you think most Indian men would take dowry when they get married? (Correct)

Avoid double-barrelled questions: Questions that have two separate options separated by an
'or' or 'and' like the following:
Do you think Nokia and Samsung have a wide variety of touch phones? Yes/no (Incorrect)
The problem is that the respondent may feel that Nokia has better phones, or that Samsung has better phones, or both. These questions are referred to as double-barrelled and the researcher should
always split them into two separate questions. For example,
A wide variety of touch phones is available from:
Nokia
Samsung
Both (Correct)
 7. Determine the questionnaire structure
The questions now have to be put together in a proper sequence.
Instructions: Questionnaires, as well as schedules, always begin with standardized instructions. These greet the respondent, introduce the researcher and then state the purpose of the questionnaire administration. For example, in the study on organic food products, the following instructions
were given at the beginning of the questionnaire:
'Hi. We are carrying out a market research on the purchase behaviour for grocery
products/organic food. We are conducting a survey of consumers, retailers and experts in the
NCR for the same.
As you are involved in the purchase and/or consumption of food products, we seek your
cooperation for providing the following relevant information for our research. Thank you
very much.'

Opening questions: After instructions come the opening questions, which lead the reader
into the study topic. For example, a questionnaire on understanding the consumer's buying
behaviour in malls can ask an opening question that is generic in nature, such as:
What is your opinion about shopping at a mall?
Study questions: After the opening question/s, the bulk of the instrument needs to be
devoted to the main questions that are related to the specific information needs of the study.
Here also, the general rule is that the simpler questions, which do not require a lot of thinking or response time, should be asked first as they build the tempo for answering the more difficult/sensitive questions later on. This method of going in a sequential manner from the
general to the specific is called the funnel approach.
Classification information: This is the information that is related to the basic socio-economic
and demographic traits of the person. These might include name (kept optional in some
cases), address, e-mail address and telephone number.

Acknowledgement: The questionnaire ends by acknowledging the inputs of the respondent


and thanking him for his cooperation and valuable contribution.

8. Physical characteristics of the questionnaire


The researcher must pay special attention to the look of the questionnaire. The first thing is
the quality of the paper on which the questionnaire is printed. Paper should be of good
quality. The font style and spacing used in the entire document should be uniform. One must
ensure that every question and its response options are printed on the same page. Surveys for
different groups could be on different coloured paper.
For example, if Delhi is being studied as five zones, then the questionnaire used in each zone
could be printed on a differently coloured paper. Each question and section must be
numbered properly. In case there is any response instruction for an individual question, it
must be placed before the question.

9. Pilot testing of the questionnaire


Pilot testing refers to testing and administering the designed instrument to a small group of
people from the population under study. This is essentially to uncover any errors that might still remain even after the earlier eight steps. Every aspect of the questionnaire has to be tested and one must record all the experiences of conducting it, including the time taken to administer it. Sometimes, the researcher might also get the questionnaire vetted by
academic or industry experts for their inputs. As far as possible, the pilot should be a small
scale replica of the actual survey that would be subsequently conducted.

10. Administering the questionnaire


Once all the nine steps have been completed, the final instrument is ready, and the questionnaire needs to be administered according to the sampling plan.

Layout of Questionnaire:

Having a good set of questions to ask the respondent doesn't totally guarantee success in
conducting a survey. The overall look of the questionnaire is also necessary to achieve the
goals of the survey.
Most often, the respondents consider the questionnaire layout first before having the motivation to complete the survey. Studies show that respondents may not be able to answer the questions truthfully if they are pre-occupied or bothered by the number of pages to answer, or by the overall look of the questionnaire. Therefore, a good-looking questionnaire layout is an important factor in increasing response rates.
Format

1) The Cover Page

Placing a cover page on your survey questionnaire increases the level of motivation and
willingness to participate. The survey cover can instantly connect the respondents to the
survey and make them feel that they are important to make the survey a success.
The cover should contain the following:
1. The title of the survey or study
2. A one or two-sentence description of the survey, stating its purpose
3. Initial instructions
4. The name of the company conducting the survey
5. Any sponsors
 
The cover, as well as the back cover, should look simple to give an impression that the survey
is conducted in a professional manner. However, studies show that using coloured covers increases response rates by 2% to 4%, so feel free to add some spark to your cover.
2) The Instructions Page
On this page, explain further the purpose of the survey. Provide brief and specific instructions on how the respondent should answer the questions. Also, instruct the respondent about the deadline for completing the survey.
In addition, inform the respondent about confidentiality matters, and offer contact numbers
that the respondent may call if there are any problems or comments regarding the survey
questionnaire.

3) The Questionnaire Proper

In forming the survey layout, the order of questions should be taken into consideration. The questions should be arranged from general to specific. The very first question should be a general one but should pertain to the goals or purpose of the survey, so that the respondent won't get intimidated but rather becomes slowly engaged with the questionnaire. Being “general” means that the first question should be applicable to all respondents and easy to answer in just a few seconds.
The questions should be grouped according to their content. This helps the respondent to
organize his thoughts and reactions, leading to a more accurate response to the questions.
With regards to the appearance, the questions should be consistent in font style, font size, and
even the indentation.

4) The Navigational Path


In a survey, the navigational path simply means the path that should be followed by the respondents when answering the questionnaire. There are three types of navigational paths: verbal, numerical, and symbolic or graphical. Here are examples of each type:
1. Verbal (e.g. Skip to No. 12; Proceed to the Next Page)
2. Numerical (e.g. Page 1, 2, 3…)
3. Symbolic (e.g. arrows and other graphical markers)
Remember that the navigational path you utilize should be consistent in all the pages of the
questionnaire.

5) Survey Length
According to Dillman (2000), the length of the survey varies depending on three factors
relating to the respondent: his sense of commitment, interest and sense of responsibility in
completing the survey. As a rule of thumb, keep the questions as short as possible to keep
these three levels at their peaks.

 Advantages and Disadvantages of the Questionnaire Method


 
The questionnaire has many advantages over the other data collection methods discussed
earlier.

Probably the greatest benefit of the method is its adaptability. There is, actually speaking, no
domain or branch for which a questionnaire cannot be designed. It can be shaped in a manner
that can be easily understood by the population under study. The language, the content and
the manner of questioning can be modified suitably. The instrument is particularly suitable
for studies that are trying to establish the reasons for certain occurrences or behaviour. The second advantage is that it assures anonymity if it is self-administered by the respondent, as
there is no pressure or embarrassment in
revealing sensitive data.

A lot of questionnaires do not even require the person to fill in his/her name. Administering
the questionnaire is much faster and less expensive as compared to other primary and a few
secondary sources as well. There is considerable ease of quantitative coding and analysis of
the obtained information as most response categories are closed-ended and based on the
measurement levels as discussed in Unit 5. The chance of researcher bias is very little here.
Lastly, there is no pressure of immediate response; thus, the subject can fill in the
questionnaire whenever he or she wants.
However, the method does not come without some disadvantages. The major disadvantage is
that the inexpensive standardized instrument has limited applicability, that is, it can be used
only with those who can read and write.

The return ratio, i.e., the number of people who return the duly filled-in questionnaires, is sometimes not even 50 per cent of the number of forms distributed. A skewed sample response could be another problem. This can occur in two cases: one, if the investigator distributes the forms to his friends and acquaintances, and two, because of the self-selection of the subjects. This means that the ones who fill in the questionnaire and return it might not be representative of the population at large. In case the person is not clear about a question, clarification with the researcher might not be possible.

Summary

 Let us recapitulate the main points discussed in this unit:


 The questionnaire is a research technique that consists of a series of questions asked
to respondents, in order to obtain statistically useful information about a given topic.
 It is one of the most cost-effective methods of collecting primary data, which has the
advantage of flexibility of approach and can be successfully adapted for most research
studies.
 There are many different types of questionnaires available to the researcher.
 Based on the structure, questionnaires can be categorized as unconcealed and formalized, concealed and formalized, unconcealed and non-formalized, and concealed and non-formalized.
 Based on the method of administration, the questionnaire could be  in the form of a
schedule or self-administered questionnaire.
UNIT 07 : SAMPLING

Learning Objectives
After going through this unit, you will be able to:
 Explain the basic concepts of sampling
 Distinguish between sample and census
 Differentiate between a sampling and non-sampling error
 Describe sampling design
 Explain different types of probability sampling designs
 Describe various types of non-probability sampling designs
 Estimate the sample size required while estimating the population mean and proportion

Structure
7.1 Introduction
7.2 Sampling Concepts
        7.2.1 Sample vs. Census
        7.2.2 Sampling vs. Non-Sampling Error
7.3 Sampling Design
        7.3.1 Probability Sampling Design
        7.3.2 Non-Probability Sampling Designs
7.4 Determination of Sample Size
        7.4.1 Sample Size for Estimating Population Mean
        7.4.2  Determination of Sample Size for Estimating the Population Proportion
7.5 Summary
7.6 Keywords
7.1 Introduction

In the last unit, we discussed the concept of questionnaire designing. In this unit, we will
discuss an important aspect of research—sampling. Let us understand what is sampling and
what role it plays in research. As we have discussed earlier, research objectives are generally
translated into research questions that enable the researchers to identify the information
needs. Once the information needs are specified, the sources for collecting the information
are sought. Some of the information may be collected through secondary sources (published
material), whereas the rest may be obtained through primary sources. The primary methods
of collecting information include the observation method, personal interview with
questionnaire, telephone surveys  and mail surveys. Surveys are, therefore, useful in
information collection, and their analysis plays a vital role in finding answers to research
questions. Survey respondents should be selected using the appropriate procedures;
otherwise, the researchers may not be able to get the right information to solve the problem
under investigation. This is done through sampling.
In this unit, we will discuss in detail the concept of sampling, including sampling and non-
sampling error, probability and non-probability sampling designs, as well as determination of
sample size.

7.2 Sampling Concepts


The process of selecting the right individuals, objects or events for a study is known as
sampling. Sampling involves the study of a small number of individuals or objects chosen
from a larger group. Before we get into the details of various issues pertaining to sampling, it
would be appropriate to discuss some of the sampling concepts.

Population: Population refers to any group of people or objects that form the subject of study
in a particular survey and are similar in one or more ways. For example, the number of full-
time MBA students in a business school could form one population. If there are 200 such
students, the population size would be 200. We may be interested in understanding their
perceptions about business education. If in an organization there are 1,000 engineers, out of
which 350 are mechanical engineers and we are interested in examining the proportion of
mechanical engineers who intend to leave the organization within six months, all the 350
mechanical engineers would form the population of interest. If the interest is in studying how
the patients in a hospital are looked after, then all the patients of the hospital would fall under
the category of population.
Element: An element comprises a single member of the population. Out of the 350
mechanical engineers mentioned above, each mechanical engineer would form an element of
the population.
Sampling frame: Sampling frame comprises all the elements of a population with proper
identification that is available to us for selection at any stage of sampling. Some examples of
sampling frames are:
 The list of registered voters in a constituency
 The telephone directory
 The number of students registered with a university
 The attendance sheet of a particular class
 The payroll of an organization
When the population size is very large, it becomes virtually impossible to form a sampling
frame. We know that soft drinks have a large number of consumers and, therefore, it becomes
very difficult to form the sampling frame for the same.
Sample: It is a subset of the population. It comprises only some elements of the population.
For instance, if out of 350 mechanical engineers employed in an organization, 30 are
surveyed regarding their intention to leave the organization in the next six months, then these
30 members would constitute the sample.
Sampling unit: A sampling unit is a single member of the sample. If a sample of 50 students
is taken from a population of 200 MBA students in a business school, then each of the 50
students is a sampling unit.
Sampling: It is a process of selecting an adequate number of elements from the population so
that the study of the sample will not only help in understanding the characteristics of the
population but also enable us to generalize the results. We will see later that there are two
types of sampling designs—probability sampling design and non-probability sampling
design.
Census (or complete enumeration): An examination of each and every element of the
population is called census or complete enumeration. Census is an alternative to sampling.
We will discuss the inherent advantages of sampling over a complete enumeration later.

7.2.1 Sample vs. Census

In a research study, we are generally interested in studying the characteristics of a population.
Suppose there are 2 lakh households in a town, and we are interested in estimating the
proportion of households that spend their summer vacations at a hill station. This information
can be obtained by asking every household in that town. If all the households in a population
are asked to provide information, such a survey is called a census. An alternative way of
obtaining the same information is by choosing a subset of all 2 lakh households and asking them for the same information. This subset is called a sample.
Based upon the information obtained from the sample, a generalization about the population
characteristics could be made. However, that sample has to be representative of the
population. For a sample to be representative of the population, the distribution of sampling
units in the sample has to be in the same proportion as the elements in the population. For
example, if in a town there are 50, 35 and 15 per cent households in lower, middle and upper
income groups, respectively, then a sample taken from this population should have the same
proportions for it to be representative. There are several advantages of a sample over a
census, some of which are as follows:
A sample saves time and cost. Many times a decision-maker may not have too much time to wait till all the information is available. Then, a sample could come to his rescue.
There are situations where a sample is the only option. When we want to estimate the average life of fluorescent bulbs, the bulbs tested are burnt out completely in the process. If we go for a complete enumeration, there would not be anything left for use. Another example could be testing the quality of photographic film.
The study of sample instead of complete enumeration may, at times, produce more reliable
results. This is because by studying a sample, fatigue is reduced and fewer errors occur while
collecting the data, especially when a large number of elements are involved.
A census is appropriate when the population size is small, e.g., the number of public sector
banks in a country. Suppose the researcher is interested in  collecting information from the
top management of a bank regarding their views on the monetary policy announced by the
Reserve Bank of India (RBI). In this case, a complete enumeration may be possible as the
population size is not very large.

7.2.2 Sampling vs. Non-Sampling Error

There are two types of errors that may occur when we try to estimate the population
parameters from the sample. These are called sampling and non- sampling errors.
Sampling error: This error arises when a sample is not representative of the population. It is the difference between the sample mean and the population mean. The sampling error reduces with an increase in sample size, as a larger sample may result in increasing the representativeness of the sample.
Non-sampling error: This error arises from sources other than the act of sampling itself, for example, measurement errors, non-response, faulty recording or data entry mistakes. Unlike the sampling error, it does not necessarily reduce as the sample size increases.

7.3 Sampling Design


Sampling design refers to the process of selecting samples from a population. There are two
types of sampling designs—probability sampling design and non-probability sampling
design.

Probability sampling designs are used in conclusive research. In a probability sampling
design, each and every element of the population has a known chance of being selected in the
sample. The known chance does not mean equal chance. Simple random sampling is a special
case of probability sampling design where every element of the population has both known
and equal chances of being selected in the sample.
In case of non-probability sampling design, the elements of the population do not have any
known chance of being selected in the sample. These sampling designs are used in
exploratory research.
   7.3.1 Probability Sampling Design
Under probability sampling design, the following sampling designs would be covered:
 Simple random sampling with replacement (SRSWR)
 Simple random sampling without replacement (SRSWOR)
 Systematic sampling
 Stratified random sampling
a. Simple random sampling with replacement (SRSWR)
Under this scheme, a list is prepared which consists of all the elements of the population from
where the samples are to be drawn. If there are 1,000 elements in the population, we write the
identification number or the name of all the 1,000 elements on 1,000 different slips. These are
put in a box and shuffled properly. If there are 20 elements to be selected from the
population, the simple random sampling procedure involves selecting a slip from the box and
reading the identification number. Once this is done, the chosen slip is put back to the box
and again a slip is picked up, and the identification number is read from that slip. This
process continues till a sample of 20 is selected. Please note that the first element is chosen
with a probability of 1/1,000. The second one is also selected with the same probability and
so are all the subsequent elements of the population.

b. Simple random sampling without replacement (SRSWOR)


In case of simple random sample without replacement, the procedure is identical to what was
explained in the case of simple random sampling with replacement. The only difference here
is that the chosen slip is not placed back in the box. This way, the first unit would be selected
with the probability of 1/1,000, second unit with the probability of 1/999, the third will be
selected with a probability of 1/998, and so on, till we select the required number of elements
(in this case, 20) for our sample.
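To make the two schemes concrete, the following is a minimal Python sketch using the standard random module. The population of 1,000 identification numbers and the sample of 20 follow the example above; the variable names are illustrative, not part of any prescribed procedure.

    import random

    # Population of 1,000 identification numbers, as in the example above
    population = list(range(1, 1001))

    # SRSWR: the same element may be drawn more than once
    srswr_sample = random.choices(population, k=20)

    # SRSWOR: each element can be drawn at most once
    srswor_sample = random.sample(population, 20)

    print(sorted(srswr_sample))
    print(sorted(srswor_sample))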

The simple random sampling (with or without replacement) is not used in consumer research.
This is because in a consumer research, the population size is usually very large, which
creates problems during the preparation of a sampling frame. For example, the number of
consumers of soft drinks, pizza, shampoo, soap, chocolate, etc., is very large. However, these
(SRSWR and SRSWOR) designs could be useful when the population size is very small, for
example, the number of steel/aluminium-producing companies in India and the number of

banks in India. Since the population size is quite small, the preparation of a sampling frame
does not create any problem.

Another problem with these (SRSWR and SRSWOR) designs is that we may not get a
representative sample using such a scheme. Consider an example of a locality having 10,000
households, out of which 5,000 belong to low-income group, 3,500 belong to middle income
group and the remaining 1,500 belong to high-income group. Suppose it is decided to take a
sample of 100 households using simple random sampling. The selected sample may not
contain even a single household belonging to the high- and middle- income group and only
the low-income households may get selected, thus resulting in a non-representative sample.

c. Systematic sampling
Systematic sampling takes care of the limitation of the simple random sampling that the
sample may not be a representative one. In this design, the entire population is arranged in a
particular order. The order could be the calendar dates or the elements of a population
arranged in an ascending or a descending order of the magnitude, which may be assumed as
random. List of subjects arranged in the alphabetical order could also be used and they are
usually assumed to be random in order. Once this is done, the steps followed in the
systematic sampling design are as follows:
First of all, a sampling interval, K = N/n, is calculated, where N = the size of the population and n = the size of the sample.
The sampling interval K should be an integer. If it is not, it is rounded off to make it an integer.
A random number is selected from 1 to K. Let us call it C.
The first element to be selected from the ordered population would be C, the next element would be C + K, the subsequent one would be C + 2K, and so on, till a sample of size n is selected.
This way we can get representation from all the classes in the population and overcome the
limitations of the simple random sampling. To take an example, assume that there are 1,000
grocery shops in a small town. These shops could be arranged in an ascending order of their
sales, with the first shop having the smallest sales and the last shop having the highest sales.
 
If it is decided to take a sample of 50 shops, then our sampling interval
K will be equal to 1000 ÷ 50 = 20. Now, we select a random number from 1 to
20. Suppose the chosen number is 10. This means that the shop number 10 will be selected
first and then shop number 10 + 20 = 30 and the next
one would be 10 + 2 × 20 = 50, and so on, till all the 50 shops are selected. This way, we can
get a representative sample in the sense that it will contain small, medium and large shops.
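A minimal Python sketch of the shop example above (the figures N = 1,000, n = 50 and K = 20 follow the text; the random start is drawn afresh on each run):

    import random

    N = 1000                 # population size: shops ordered by sales
    n = 50                   # desired sample size
    K = N // n               # sampling interval, here 20

    C = random.randint(1, K)                          # random start between 1 and K
    selected_shops = [C + i * K for i in range(n)]    # C, C + K, C + 2K, ...
    print(selected_shops[:5])                         # e.g., [10, 30, 50, 70, 90] when C = 10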

It may be noted that in systematic sampling, the first unit of the sample is selected at random

(probability sampling design), and having chosen this, we have no control over the
subsequent units of sample (non-probability sampling). This design of sampling is called
mixed sampling.
The main advantage of systematic sampling design is its simplicity. When sampling from a
list of population arranged in a particular order, one can easily choose a random start as
described earlier. After having chosen a random start, every Kth   item can be selected instead
of going for a simple random selection. This design is statistically more efficient than a
simple random
sampling, provided the condition of ordering of the population is satisfied.

The use of systematic sampling is quite common as it is easy and cheap to select a systematic
sample. In systematic sampling, one does not have to jump back and forth all over the
sampling frame wherever a random number leads, and neither does one have to check for
duplication of elements as compared to simple random sampling. Another advantage of
systematic sampling over simple random sampling is that one does not require a complete
sampling frame to draw a systematic sample. The investigator may be instructed to interview
every 10th customer entering a mall without a list of all customers.

d. Stratified random sampling


Under this sampling design, the entire population (universe) is divided into strata (groups),
which are mutually exclusive and collectively exhaustive. By mutually exclusive, it is meant
that if an element belongs to one stratum, it cannot belong to any other stratum. Strata are
collectively exhaustive if all the elements of various strata put together completely cover all
the elements of the population. The elements are selected using a simple random sampling
independently from each group.
There are two reasons for using a stratified random sampling rather than simple random
sampling. One is that the researchers are often interested in obtaining data about the
component parts of a universe. For example, a researcher may be interested in knowing the
average monthly sales of cell phones in 'large', 'medium' and 'small' stores. In such a case,
separate sampling from within each stratum would be called for. The second reason for using
a stratified random sampling is that it is more efficient as compared to a simple random
sampling. This is because dividing the population into various strata increases the
representativeness of the sampling as the elements of each stratum are homogeneous to each
other.
There are certain issues that may be of interest while setting up a stratified random sample.
These are:
What criteria should be used for stratifying the universe(population)?
The criteria for stratification should be related to the objectives of the study. The entire
population should be stratified in such a way that the elements are homogeneous within the
strata, whereas there should be heterogeneity between the strata. For example, if the interest
is to estimate the expenditure of households on entertainment, the appropriate criteria for

stratification would be the household income. This is because the expenditure on
entertainment and household income are highly correlated.
Generally, stratification is done on the basis of demographic variables like age, income,
education and gender. Customers are usually stratified on the basis of life stages and income
levels to study their buying patterns. Companies may be stratified according to size and
profits for analysing the stock market reactions.
How many strata should be constructed?
Going by common sense, as many strata as possible should be used so that the elements of
each stratum will be as homogeneous as possible. However, it may not be practical to
increase the number of strata and, therefore, the number may have to be limited. Too many
strata may complicate the survey and make preparation and tabulation difficult. Costs of
adding more strata may be more than the benefits obtained. Further, the researcher may end
up with the practical difficulty of preparing a separate sampling frame as the simple random
samples are to be drawn from each stratum.

What would be the appropriate number of sample size taken from each stratum?
This question pertains to the number of observations to be taken from each stratum. At the
outset, one needs to determine the total sample size for the universe and then allocate it
between each stratum. This may be explained as follows:
Let there be a population of size N. Let this population be divided into three strata based on a certain criterion. Let N1, N2 and N3 denote the sizes of strata 1, 2 and 3, respectively, such that N = N1 + N2 + N3. These strata are mutually exclusive and collectively exhaustive, and each of them could be treated as a separate population. Now, if a total sample of size n is to be taken from the population, the question arises as to how much of the sample should be taken from strata 1, 2 and 3, respectively, so that the sum total of the sample sizes from the strata adds up to n. Let the sizes of the samples from the first, second and third strata be n1, n2 and n3, respectively, such that n = n1 + n2 + n3. There are two schemes that may be used to determine the values of ni (i = 1, 2, 3) for each stratum.
 These are proportionate and disproportionate allocation schemes.
Proportionate allocation scheme: In this scheme, the size of the sample in each stratum is
proportional to the size of the population of the strata. For example, if a bank wants to
conduct a survey to understand the problems that its customers are facing, it may be
appropriate to divide them into three strata based upon the size of their deposits with the
bank. Let us assume that there are 10,000 customers in a bank. Out of this, 1,500 of them are big account holders (having deposits of more than Rs 10 lakh), 3,500 of them are medium account holders (having deposits of more than Rs 2 lakh but less than Rs 10 lakh) and the remaining 5,000 are small account holders (having deposits of less than Rs 2 lakh). Suppose the total budget for sampling is fixed at Rs 20,000 and the cost of sampling a unit (customer) is Rs 20. If a sample of 100 is to be chosen from all the three strata, the sizes of the samples would be 100 × 1,500/10,000 = 15 from the big account holders, 100 × 3,500/10,000 = 35 from the medium account holders and 100 × 5,000/10,000 = 50 from the small account holders.
This way the size of the sample chosen from each stratum is proportional to the size of the
stratum. Once we have determined the sample size from each stratum, one may use the
simple random sampling or systematic sampling or any other sampling design to take out
samples from each stratum.
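A minimal Python sketch of the proportionate allocation for the bank example (the stratum sizes and the total sample of 100 are taken from the text; the dictionary labels are illustrative):

    # Proportionate allocation for the bank example
    strata_sizes = {'big': 1500, 'medium': 3500, 'small': 5000}
    N = sum(strata_sizes.values())     # 10,000 customers in all
    n = 100                            # total sample size

    allocation = {name: round(n * size / N) for name, size in strata_sizes.items()}
    print(allocation)                  # {'big': 15, 'medium': 35, 'small': 50}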
Disproportionate allocation: As per the proportionate allocation explained above, the sizes of
the samples from strata 1, 2 and 3 are 15, 35 and 50, respectively. As it is known that the cost
of sampling a unit is Rs 20, irrespective of the stratum from which the sample is drawn, the bank would naturally be more interested in drawing a large sample from stratum 1, which has the big
customers, as it gets most of its business from strata 1. In other words, the bank may follow a
disproportionate allocation of sample as the importance of each stratum is not the same from
the point of view of the bank. The bank may like to take a sample of 45 from strata 1, and 40
and 15 from strata 2 and 3, respectively. Also, a large sample may be desired from the strata
having more variability.

7.3.2  Non- Probability Sampling Designs


Under the non-probability sampling, the following designs would be considered—
convenience sampling, purposive (judgemental) sampling and snowball sampling.
Convenience sampling
Convenience sampling is used to obtain information quickly and inexpensively. The only
criterion for selecting sampling units in this scheme is the convenience of the researcher or
the investigator.
Mostly, the convenience samples used are neighbours, friends, family members, colleagues
and 'passers-by'. This sampling design is often used in the pre- test phase of a research study
such as the pre-testing of a questionnaire. Some of the examples of convenience sampling
are:
People interviewed in a shopping center for their opinion on a TV programme
Monitoring the price level in a grocery shop with the objective of inferring the trends in
inflation in the economy
Requesting people to volunteer to test products
Using students or employees of an organization for conducting an experiment

In all the above situations, the sampling unit may either be self-selected or selected because
of ease of availability. No effort is made to choose a representative sample. Therefore, in this
design, the difference between the population value (parameters) of interest and the sample
value (statistic) is unknown both in terms of the magnitude and direction. Therefore, it is not
possible to make an estimate of the sampling error and researchers would not be able to make
a conclusive statement about the results from such a sample. Because of this, convenience
sampling should not be used in conclusive research (descriptive and causal research).

Convenience sampling is commonly used in exploratory research. This is because the

purpose of an exploratory research is to gain an insight into the problem and generate a set of
hypotheses which could be tested with the help of conclusive research. When very little is
known about a subject, a small-scale convenience sampling can be of use in the exploratory
work to help understand the range of variability of responses in a subject area.

Judgemental sampling
Under judgemental sampling, experts in a particular field choose what they believe to be the
best sample for the study in question. Judgement sampling calls for special efforts to locate
and gain access to the individuals who have the required information. Here, the judgement of
an expert is used to identify a representative sample. For example, the shoppers at a shopping
centre may serve to represent the residents of a city or some cities may be selected to
represent a country. Judgemental sampling design is used when the required information is
possessed by a limited number/category of people. This approach may not empirically
produce satisfactory results and may, therefore, curtail generalizability of the findings due to
the fact that we are using a sample of experts (respondents) that are usually conveniently
available to us.
 
Further, there is no objective way to evaluate the precision of the results. A company wanting
to launch a new product may use judgemental sampling for selecting 'experts' who have prior
knowledge or experience of similar products. A focus group discussion of such experts may
be conducted to get valuable insights. Opinion leaders who are knowledgeable are included in
the organizational context. Enlightened opinions (views and knowledge) constitute a rich data
source. A very special effort is needed to locate and have access to individuals who possess
the required information.
The most common application of judgemental sampling is in business-to-business (B to B)
marketing. Here, a very small sample of lead users, key accounts or technologically
sophisticated firms or individuals is regularly used to test new product concepts, producing
programmes, etc.

Snowball sampling
Snowball sampling is generally used when it is difficult to identify the members of the
desired population, e.g., deep-sea divers, families with triplets, people using walking sticks,
doctors specializing in a particular ailment, etc. Under this design, each respondent, after
being interviewed, is asked to identify one or more experts in the field. This could result in a
very useful sample. The main problem is in making the initial contact. Once this is done,
these cases identify more members of the population, who then identify further members, and
so on. It may be difficult to get a representative sample. One plausible reason for this is that
initial respondents may identify other potential respondents who are similar to themselves.
The next problem is to identify new cases.
7.4 Determination of Sample Size

The size of a sample depends upon the basic characteristics of the population, the type of
information required from the survey and the cost involved. Therefore, a sample may vary in

size for several reasons. The size of the population does not influence the size of the sample,
as will be shown later on. There are various methods of determining the sample size in
practice:
Researchers may arbitrarily decide the size of the sample without giving any explicit
consideration to the accuracy of the sample results or the cost of sampling. This arbitrary
approach should be avoided.
 For some projects, the total budget for the field survey (usually mentioned in a project proposal) is allocated. If the cost of sampling per sample unit is
known, one can easily obtain the sample size by dividing the total budget allocation
by the cost of sampling per unit. This method concentrates only on the cost aspect of
sampling, rather than the value of information obtained from such a sample.
 There are other researchers who decide on the sample size based on what was done by
the other researchers in similar studies. Again, this approach cannot be a substitute for
the formal scientific approach.
 The most commonly used approach for determining the size of the sample is the confidence interval approach covered under inferential statistics. This approach is discussed below for determining the size of a sample when estimating the population mean and the population proportion. In a confidence interval approach, the following points are taken into account for determining the sample size in estimation problems involving means:

(a) The variability of the population: It would be seen that the higher the variability as
measured by the population standard deviation, the larger will be the size of the sample. If the
standard deviation of the population is unknown, a researcher may use the estimates of the
standard deviation from previous studies. Alternatively, the estimates of the population
standard deviation can be computed from the sample data.
(b) The confidence attached to the estimate: It is a matter of judgement how much
confidence you want to attach to your estimate. Assuming a normal distribution, the higher
the confidence the researcher wants for the estimate, larger will be the sample size. This is
because the value of the standard normal ordinate 'Z' will vary accordingly. For 90 per cent
confidence, the value of 'Z' would be 1.645 and for 95 per cent confidence, the corresponding
'Z' value would be 1.96, and so on (see Appendix 1 at the end of the book). It would be seen
later that a higher confidence would lead to a larger 'Z' value.
 
c) The allowable error or margin of error: How accurate do we want our estimate to be is
again a matter of judgement of the researcher. It will, of course, depend upon the objectives
of the study and the consequences resulting from the higher inaccuracy. If the researcher
seeks greater precision, the resulting sample size would be large.

7.4.1 Sample Size for Estimating Population Mean

The formula for determining the sample size in such a case is given by
n = (Z × σ / E)², i.e., n = Z²σ² / E²
where σ is the (estimated) population standard deviation, E is the allowable error and Z is the standard normal value for the chosen confidence level.
It may be noted from the above that the size of the sample is directly proportional to the variability in the population and the value of 'Z' for a confidence interval. It varies inversely with the size of the error. It may also be noted that the size of a sample does not depend upon the size of the population.

A solved out example for the determination of a sample size is given below:
Example 7.1: An economist is interested in estimating the average monthly household expenditure on food items by the households of a town. Based on past data, it is estimated that the standard deviation of the population on the monthly expenditure on food items is Rs 30. With the allowable error set at Rs 7, estimate the sample size required at 90 per cent confidence.
Solution: Here σ = Rs 30, E = Rs 7 and Z = 1.645 for 90 per cent confidence. Therefore, n = (1.645 × 30 / 7)² ≈ 49.7, which is rounded up to a sample of 50 households.
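A minimal Python sketch of the calculation in Example 7.1 (the Z value of 1.645 for 90 per cent confidence is taken from standard normal tables):

    import math

    sigma = 30       # estimated population standard deviation (Rs)
    E = 7            # allowable error (Rs)
    Z = 1.645        # standard normal value for 90 per cent confidence

    n = (Z * sigma / E) ** 2
    print(math.ceil(n))      # 50 households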
7.4.2 Determination of Sample Size for Estimating the Population Proportion
The formula for determining the sample size in such a case is given by
n = p q (Z / E)²
where p is the population proportion, q = 1 - p, E is the allowable error and Z is the standard normal value for the chosen confidence level.
The above formula will be used if the value of the population proportion p is known. If, however, p is unknown, we substitute the maximum value of pq in the above formula. It can be shown that the maximum value of pq is ¼, which occurs when p = 1/2 and q = 1/2.
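A minimal Python sketch of this formula, assuming (purely for illustration) 95 per cent confidence, an allowable error of 5 percentage points and an unknown p:

    import math

    Z = 1.96         # standard normal value for 95 per cent confidence (illustrative choice)
    E = 0.05         # allowable error of 5 percentage points (illustrative choice)
    p = 0.5          # when p is unknown, p = q = 1/2 maximizes pq
    q = 1 - p

    n = p * q * (Z / E) ** 2
    print(math.ceil(n))      # 385 respondents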
7.5 Summary
Let us recapitulate the main points discussed in this unit:
Surveys are useful for information collection. The survey respondents should be selected using appropriate procedures. The process of selecting the right individuals, objects or events for the study is known as sampling.
 An alternative to a sample is a census, where each and every element of the population (universe) is examined. There are many advantages of sampling over complete enumeration.
 While estimating the population parameter using sample results, the researcher may incur two types of errors:
o Sampling error
o Non-sampling error
The process of selecting samples from the population is referred to as sampling design. There
are two types of sampling designs:

 Probability sampling design


 Non-probability sampling design 

Probability sampling designs are used in a conclusive research whereas non-probability


sampling designs are appropriate for an exploratory research.
There are four probability sampling designs:
 Simple random sampling with replacement
 Simple random sampling without replacement
 Systematic sampling
 Stratified random sampling

Under the non-probability sampling designs, there are convenience sampling, judgmental
sampling and snowball sampling.

UNIT 08 : DATA PROCESSING


Structure

 Introduction
 Data Editing
 Field Editing
 Centralized In-House Editing
 Coding
 Coding Closed-Ended Structured Questions
 Coding Open-Ended Structured Questions
 Classification and Tabulation of Data
 Summary
 Keywords

Introduction
In the last few units, you have learnt about the various aspects of data collection. The critical
job of the researcher begins after the data has been collected. He has to use this information
to assess whether he had been correct or incorrect while making certain assumptions in the
form of the hypotheses at the beginning of the study. The raw data that has been collected
must be refined and structured in such a format that it can lend itself to statistical enquiry.
This process of preparing the data for an analysis is a structured and sequential process. The
process starts by validating the measuring instrument, which could be a questionnaire or any
other primary technique. This is followed by editing, coding, classifying and tabulating the
obtained data. In this unit we will learn these steps of preparing the data through editing,
coding and tabulating, so that it is ready for any kind of statistical analysis, in order to
achieve the research objectives we had formulated earlier.
 
 Data Editing
Data editing is the process that involves detecting and correcting errors (logical
inconsistencies) in data. After collection, the data is subjected to processing. Processing
requires that the researcher go over all the raw data forms and check them for errors. Validation becomes especially important in the following cases:

In case the form had been translated into another language, expert analysis is done to see
whether the meaning of the questions in the two measures is the same or not.
The second case could be that the questionnaire survey has to be done at multiple locations and has been outsourced to an outside research agency.
Another case is when the respondent seems to have used the same response category for all the questions; for example, there is a tendency on a five-point scale to give 3 as the answer to all questions.
The form that is received back is incomplete, in the sense that either the person has not
answered all questions, or in case of a multiple- page questionnaire, one or more pages are
missing.
The forms received are not in the proportion of the sampling plan. For
example, instead of an equal representation from government and private sector employees,
65 per cent of the forms are from the government sector. In such a case the researcher either
would need to discard the extra forms or get an equal number filled-in from private sector
employees.
Once the validation process has been completed, the next step is the editing of the raw data
obtained. While carrying out the editing, the researcher needs to ensure that:

 The data obtained is complete in all respects.


 It is accurate in terms of information recorded and responses sought.
 Questionnaires are legible and are correctly deciphered, especially the open-ended
questions.
 The response format is in the form that was instructed.
 The data is structured in a manner that entering the information will not be a
problem.
 The editing process is carried out at two levels, the first of these is field editing and
the second is central editing.

Field Editing
Usually, the preliminary editing of the information obtained is done by the field investigators
or supervisors who review the filled forms for any inconsistencies, non-responses, illegible
responses or incomplete questionnaires. Thus the errors can be corrected immediately and if
need be the respondent who filled in the form, can be contacted again. The other advantage is
that regular field editing ensures that one can also check that the surveyor is able to handle
the process of instructions and probing correctly or not. Thus, the researcher can advise and
train the investigator on how to administer the questionnaire correctly.

Centralized In-House Editing


The second level of editing takes place at the researcher's end. At this stage there are
two kinds of typical problems that the researcher might encounter.

First, one might detect an incorrect entry. For example, in case of a five- point scale one
might find that someone has used a value more than 5. In another case, one might be asking a
question like, 'how many days do you travel out of the city in a week?' and the person says
'15 days'. Here one can carry out a quick frequency check of the responses; this will
immediately detect an unexpected value.

The second and the major problem that most researchers face is that of 'armchair
interviewing' or a fudged interview. One way to handle this is to first scan the answers to the open-ended questions, as, generally, if the investigator is filling in multiple forms, faking these would be difficult.

The researcher has some standard processes available to him to carry out the editing process.
These are briefly discussed below.

Backtracking: The best and the most efficient way of handling unsatisfactory responses is to
return to the field and go back to the respondents. This technique is best used for industrial surveys but is a little difficult to apply in individual surveys.
Allocating missing values: This is a contingency plan that the researcher
might need to adopt in case going back to the field is not possible. Then the option might be
to assign a missing value to the blanks or the unsatisfactory responses. However, this works
in case:

 The number of blank or wrong answers is small.


 The number of such responses per person is small.
 The important parameters being studied do not have too many blanks; otherwise the
sample size for those variables becomes too small for generalizations.

Plug value: In cases such as the third condition above, when the variable being studied is the key variable, the researcher might sometimes insert a plug value. One can plug an average or a neutral value in such cases, for example a 3 on a five-point scale, or the researcher might have to establish a rule as to what value will be put in if the person has not answered. Sometimes, the respondents' pattern of responses to other questions is used to extrapolate and calculate an appropriate response for the missing answer (see the sketch after this list of processes).
Discarding unsatisfactory responses: If the response sheet has too many blanks/illegible or
multiple responses for a single answer, the form is not worth correcting and editing. Hence, it
is much better to completely discard the whole questionnaire.
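A minimal Python sketch of the missing-value and plug-value options described above (the responses are hypothetical; the codes 9 and 3 follow the conventions discussed in the coding section of this unit):

    # Hypothetical responses to a five-point scale item; None marks a blank answer
    responses = [4, 2, None, 5, 3, None, 1]

    MISSING_CODE = 9      # conventional missing-value code for a single-column variable
    NEUTRAL_PLUG = 3      # neutral value on a five-point scale

    # Option 1: allocate a missing-value code (kept out of later calculations)
    coded = [r if r is not None else MISSING_CODE for r in responses]

    # Option 2: plug a neutral value so the case can still be analysed
    plugged = [r if r is not None else NEUTRAL_PLUG for r in responses]

    print(coded)      # [4, 2, 9, 5, 3, 9, 1]
    print(plugged)    # [4, 2, 3, 5, 3, 3, 1]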

Coding
The process of identifying and assigning a numeral to the responses given by a respondent is called coding. This is essentially done in order to help the researcher in recording the data in a tabular form later. It is advisable to assign a numeric code even for the categorical data (e.g., gender). In fact, even the open-ended questions, whose answers are in statement form, are categorized into numbers. The reason for doing this is that the graphic representation of data in charts and figures becomes easier.

Usually, the codes that have been formulated are organized into fields, records and
files. For example, the gender of a person is one field and the codes used could be 0 for males and 1 for females. All related fields, for example all the demographic variables like age, gender, income, marital status and education, could form one record. The records of the entire
sample under study form a single file. The data that is entered in the spreadsheet, such as on
EXCEL, is in the form of a data matrix, which is simply a rectangular arrangement of the
data in rows and columns. Here, every row represents a single case or record. 

Codebook formulation: In order to manage the data entry process, it is best to prepare a
method for entering the records. This coding scheme for all the variables under study is called
a code book. Generally, while designing the rules, care must be taken to decide on some
categories that are:

 Comprehensive: Should cover all the possible answers to the question that was
asked.
 Mutually exclusive: The categories and codes devised must be exclusive or clearly
different from each other.
 Single variable entry: The response that is being entered and the code for it should
indicate only a single variable. For example, a 'working single mother' might seem an
apparently simple category which one could code as 'occupation'. However, it needs
three columns—occupation, marital status and family life. So, one needs to have three
different codes to enter this information.

Based on the above rules, one creates a code book. This would generally contain information
on the question number, variable name, response descriptors, coding instructions and the
column descriptor. Table 8.2 gives an example from a questionnaire designed to measure the
consumer buying behavior for ready-to-eat food products.
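Since the code book table itself is not reproduced here, the following is a minimal, hypothetical Python sketch of what such a code book might look like and of how it turns one respondent's answers into a coded record (the question numbers, variable names and category codes are illustrative assumptions, not the contents of Table 8.2):

    # Hypothetical code book: question number, variable name, codes and column width
    codebook = {
        'Q1': {'variable': 'gender',
               'codes': {'Male': 0, 'Female': 1},
               'columns': 1},
        'Q2': {'variable': 'eats_ready_to_eat_food',
               'codes': {'No': 0, 'Yes': 1},
               'columns': 1},
        'Q3': {'variable': 'monthly_grocery_spend',
               'codes': {'Less than 2500': 1, '2500-5000': 2, 'More than 5000': 3},
               'columns': 1},
    }

    # Coding one hypothetical respondent into a record (one row of the data matrix)
    respondent = {'Q1': 'Female', 'Q2': 'Yes', 'Q3': '2500-5000'}
    record = [codebook[q]['codes'][answer] for q, answer in respondent.items()]
    print(record)    # [1, 1, 2]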
 
As we have read in Unit 6, a questionnaire can have both closed-ended and open-ended questions. When the questions are structured and the response categories are prescribed, one does what is called pre-coding, i.e., giving numeral codes to the designed responses before administration. However, if the questions are structured and the answers are open-ended, one needs to decide on the codes after the administration of the survey. This is called post-coding.

Coding Closed-Ended Structured Questions

The method of coding for structured questions is easier as the response categories are decided
in advance. The coding method to be followed for different kinds of questions is discussed
below.

Dichotomous questions: For dichotomous questions, which are on a nominal scale, the
responses can be binary, for example: Do you eat ready-to-eat food? Yes = 1; No = 0. This
means if someone eats ready-to-eat food, he/she will be given a score of 1 and if not, then 0.

Ranking questions: For ranking questions where there are multiple objects to be ranked, the
person will have to make multiple columns, with column numbers equaling the number of
objects to be ranked. 

Scaled questions: For questions that are on a scale, usually an interval scale, the
question/statement will have a single column and the coding instruction would indicate what
number needs to be allocated for the response options given in the scale. Consider the
following question:
Please indicate your level of agreement with the following statements. 
SA – Strongly agree; A – Agree; N – Neutral; D – Disagree; SD – Strongly disagree
The code book for this will look as follows:

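The code book table is not reproduced in this text. A common convention (assumed here, since the direction of coding is not stated) is to assign 5 to 'Strongly agree' down to 1 for 'Strongly disagree'. A minimal Python sketch:

    # Assumed coding convention for the five-point Likert item (SA = 5 ... SD = 1)
    likert_codes = {'SA': 5, 'A': 4, 'N': 3, 'D': 2, 'SD': 1}

    answers = ['A', 'SA', 'N', 'D']            # hypothetical responses to one statement
    coded = [likert_codes[a] for a in answers]
    print(coded)                               # [4, 5, 3, 2]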
Missing values: It is advisable to use a standard format for signifying a non-response or a missing value. For example, a code of 9 could be used for a single-column variable, 99 for a double-column variable, 999 for a three-column variable, and so on. The researcher must take care, as far as possible, to use a value that is starkly different from the valid responses. This is one of the reasons why 9 is suggested. However, in case you have a 10-point scale, do not use 9.

Coding Open-Ended Structured Questions


The coding of open-ended questions is quite difficult as the respondents' exact answers are
noted on the questionnaire. Then the researcher (either individually or as a team) looks for
patterns and assigns a category code.
The following example is an open-ended question:
If you think lean management was a success so far, please specify three most significant
reasons that have contributed to its success in your opinion.
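A minimal, hypothetical Python sketch of post-coding such open-ended answers: after reviewing the responses, the researcher defines a small set of categories and maps each answer to a category code (the categories, answers and crude keyword matching below are illustrative assumptions):

    # Hypothetical category codes decided after reviewing the open-ended answers
    categories = {'employee involvement': 1, 'top management support': 2,
                  'training': 3, 'other': 9}

    answers = ['strong support from top management',
               'regular training sessions',
               'shop-floor employees were involved early']

    def post_code(answer):
        # Assign the first category whose leading keyword appears in the answer; else 'other'
        for label, code in categories.items():
            if label.split()[0] in answer:
                return code
        return categories['other']

    codes = [post_code(a) for a in answers]
    print(codes)    # [2, 3, 1]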
 

Classification and Tabulation of Data

Sometimes, the data obtained from the primary instrument is so huge that it becomes
difficult to interpret. In such cases, the researcher might decide to reduce the information into
homogenous categories. This method of arrangement is called classification of data. This can
be done on the basis of class intervals.

Classification by class intervals: Numerical data, like the ratio scale data, can be classified
into class intervals. This is to assist the quantitative analysis of data. For example, the age
data obtained from the sample could be reduced to homogenous grouped data. For example,
all those below twenty-five form one group, 25–35 another group, and so on. Thus, each
group will have class limits—an upper and a lower limit. The difference between the limits is
termed as the class magnitude. One can have class intervals of both equal and unequal
magnitude.
The decision on how many classes and whether equal or unequal depends upon the
judgement of the researcher. Generally, multiples of 2 or 5 are preferred. Some researchers
adopt the following formula for determining the number of class intervals:
I = R/(1 + 3.3 log N)
I = size of class interval,
R = Range (i.e., difference between the values of the largest item and smallest item among
the given items),
N = Number of items to be grouped.
The class intervals that are decided upon could be exclusive, for example:
10–15
15–20
20–25
25–30
In this case, the upper limit of each interval is excluded from the category. Thus, we read the first interval above as 10 and under 15, the next one as 15 and under 20, and so on.
The other kind is inclusive, that is: 10–15
16–20
21–25
26–30
Here, both the lower and the upper limits are included in the interval. The interval reads 10–15 but, for continuous data, it actually covers 10–15.99. It is therefore recommended that, when one has continuous data, the interval be written as 10–15.99 so that all possible responses are exhausted. For discrete data, one can simply use 10–15.
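As an illustration of the formula above, the following short Python sketch (not part of the original text) computes the class-interval size and groups a set of hypothetical ages into exclusive class intervals; the data values and variable names are assumed for demonstration only.

import math
from collections import Counter

ages = [23, 27, 31, 35, 38, 42, 45, 49, 52, 58, 61, 64]   # hypothetical sample data

R = max(ages) - min(ages)                  # range: largest value minus smallest value
N = len(ages)                              # number of items to be grouped
i = R / (1 + 3.3 * math.log10(N))          # I = R/(1 + 3.3 log N)
print(round(i, 2))                         # suggested size of class interval

width = 10                                 # rounded to a convenient multiple of 5
groups = Counter((a // width) * width for a in ages)
print(sorted(groups.items()))              # exclusive intervals: 20 and under 30, 30 and under 40, ...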
Once the categories and codes have been decided upon, the researcher needs to arrange them
according to some logical pattern. This is referred to as tabulation of data. This involves an
orderly arrangement of data into an array that is suitable for a statistical analysis. Usually,
this is an orderly arrangement of the rows and columns. In case there is data to be entered for
one variable, the process is a simple tabulation and, when it is two or more variables, then
one carries out a cross-tabulation of data.

Summary

Let us recapitulate the main points discussed in this unit:

 Data processing refers to the primary data that has been collected specifically for the
study.
 The researcher has to check for omissions or errors. This is the editing stage of the data processing step. This is done first at the field level and then at the central office level.
 At this stage, the research team carries out some data treatment such as allocating the missing values, if possible, backtracking and sometimes plugging the incomplete data.
 Once this is completed, the researcher prepares the code book.
 Classification into attributes or class intervals is carried out and the entered data is now ready for analysis in a tabular form.

UNIT 09 : UNIVARIATE AND BIVARIATE ANALYSIS


Structure

 Introduction
 Descriptive vs. Inferential Analysis
 Descriptive Analysis
 Inferential Analysis
 Descriptive Analysis of Univariate Data
 Analysis of Nominal Scale Data with only One Possible Response
 Analysis of Ordinal Scaled Questions

 Measures of Central Tendency


 Measures of Dispersion

 Descriptive Analysis of Bivariate Data
 Summary
 Keywords
Introduction

In the previous unit, we studied the processing of data collected from both primary and
secondary sources. The next step is to analyse the same so as to draw logical inferences
from them. The data collected in a survey could be voluminous in nature, depending
upon the size of the sample. In a typical research study there may be a large number of
variables that the researcher needs to analyse.

 Univariate analysis – the examination of the distribution of cases on only one variable at a time (e.g., the weight of college students)
 Bivariate analysis – the examination of two variables simultaneously (e.g., the relation between gender and weight of college students)
 Multivariate analysis – the examination of more than two variables simultaneously (e.g., the relationship between gender, race and weight of college students)

In this unit, we will concentrate on the descriptive analysis of univariate and bivariate data.

Descriptive vs. Inferential Analysis

At the data analysis stage, the first step is to describe the sample which is followed by
inferential analysis. In the descriptive analysis, we describe the sample whereas the
inferential analysis deals with generalizing the results as obtained from the sample.

Descriptive Analysis
Descriptive analysis refers to transformation of raw data into a form that will facilitate
easy understanding and interpretation. Descriptive analysis deals with summary measures
relating to the sample data. The common ways of summarizing data are by calculating
average, range, standard deviation, frequency and percentage distribution. Below is a set of
typical questions that are required to be answered under descriptive statistics:

 What is the average income of the sample?


 What is the standard deviation of ages in the sample?
 What percentage of sample respondents are married?
 What is the median age of the sample respondents?

Types of descriptive analysis


The type of descriptive analysis to be carried out depends on the level of measurement of the variables, which takes four forms—nominal, ordinal, interval and ratio.

Inferential Analysis
After descriptive analysis has been carried out, the tools of inferential statistics are
applied. Under inferential statistics, inferences are drawn on population parameters based on
sample results. The researcher tries to generalize the results for the population based on
sample results. The following is an illustrative list of questions that are covered under
inferential statistics.

 Is the average age of the population significantly different from 35?
 Is the job satisfaction of unskilled workers significantly related with their pay
packet?
 Do the users and non-users of a brand vary significantly with respect to age?

Descriptive Analysis of Univariate Data

The first step under univariate analysis is the preparation of frequency distributions of
each variable. The frequency distribution is the counting of responses or observations for
each of the categories or codes assigned to a variable.

Analysis of Nominal Scale Data with only One Possible Response


Consider a nominal scale variable—the gender of respondents in a survey. The analysis of such a variable consists of counting the number of respondents in each category (male/female) and expressing these counts as percentages of the total sample.

Analysis of Ordinal Scaled Questions


There could always be some ordinal-scaled questions in the questionnaire. The question
before the researcher is how to tabulate and interpret the responses to such questions. It could
be done in two ways as shown in the following example. The questions asked of the
respondents in such a case could be:
Rank the following five attributes while choosing a restaurant for dinner. Assign a rank of 1
to the most important, 2 to the next important … and 5 to the least important: Ambience,
Food quality, Menu variety, Service, Location

9.3.3 Measures of Central Tendency


There are three measures of central tendency that are used in research— mean, median and
mode.

1. Mean
The mean represents the arithmetic average of a variable and is appropriate for
interval and ratio scale data. The mean is computed as:

X̄ = (ΣXi)/n
Where,
X̄ = Mean of the variable
Xi = Value of the ith observation of that variable
n = Number of observations in the sample

It is also possible to compute the value of mean when interval or ratio scale data are grouped
into categories or classes. The formula for mean in such a case is given by:

X̄ = (Σ fiXi)/(Σ fi), where the summation runs over the k classes
Where,
fi = Frequency of the ith class
Xi = Midpoint of the ith class
k = Number of classes
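A minimal Python sketch of the two formulas (illustrative only; the figures used below are hypothetical):

def mean(values):
    # Arithmetic mean: sum of the observations divided by n
    return sum(values) / len(values)

def grouped_mean(midpoints, frequencies):
    # Grouped mean: sum(fi * Xi) / sum(fi), with Xi the class midpoints
    return sum(f * x for f, x in zip(frequencies, midpoints)) / sum(frequencies)

print(mean([4, 6, 8, 10]))                  # 7.0
print(grouped_mean([5, 15, 25], [2, 5, 3])) # (10 + 75 + 75)/10 = 16.0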
 

2. Median

The median can be computed for ratio, interval or ordinal scale data. The median is
that value in the distribution such that 50 per cent of the observations are below it and 50 per
cent are above it. The median for the ungrouped data is defined as the middle value when the
data is arranged in ascending or descending order of magnitude.
In case the number of items in the sample is odd, the value of the (n + 1)/2th item gives the median. However, if there is an even number of items in the sample, say of size 2n, the arithmetic mean of the nth and (n + 1)th items gives the median.
It is again emphasized that data needs to be arranged in ascending or descending orders of
magnitude before computing the median.

Given below are a few examples to illustrate the computation of median:

Example 9.2: The marks of 21 students in economics are given 62, 38, 42,
43, 57, 72, 68, 60, 72, 70, 65, 47, 49, 39, 66, 73, 81, 55, 57, 57 and 59. Compute
the median of the distribution.

Solution:
By arranging the data in ascending order of magnitude, we obtain: 38, 39, 42, 43, 47, 49, 55,
57, 57, 57, 59, 60, 62, 65, 66, 68, 70, 72, 72, 73 and 81.
The median will be the value of the 11th observation arranged as above. Therefore, the value of the median equals 59. This means 50 per cent of the students score marks below 59 and 50 per cent score above 59.
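Example 9.2 can also be checked with a short Python sketch (a check only, not part of the original solution); the function follows the (n + 1)/2 rule described above.

marks = [62, 38, 42, 43, 57, 72, 68, 60, 72, 70, 65,
         47, 49, 39, 66, 73, 81, 55, 57, 57, 59]

def median(values):
    x = sorted(values)                      # arrange in ascending order of magnitude
    n = len(x)
    if n % 2 == 1:                          # odd number of items: (n + 1)/2 th value
        return x[n // 2]
    return (x[n // 2 - 1] + x[n // 2]) / 2  # even: mean of the two middle values

print(median(marks))                        # 59, as computed in Example 9.2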
The median could also be computed for the grouped data. In that case, first of all, median
class is located and then median is computed using interpolation by using the assumption that
all items are evenly spread over the entire class interval. The median for the grouped data is
computed using the following formula:

Median = l + ((n/2 – CF)/f) × h
Where,
l = Lower limit of the median class
f = Frequency of the median class
CF = Cumulative frequency of the class immediately below the class containing the median
h = Size of the interval of the median class
n = Total number of observations

Given below is an example to illustrate the computation of median in the case of grouped
data:

Example 9.3: The distribution of dividend declared by seventy-seven companies is given in


the following table. Compute the median of the distribution.
Solution: For the given distribution (n = 77),
l = Lower limit of the median class = 30
f = Frequency of the median class = 18
CF = Cumulative frequency of the class immediately below the median class = 37
h = Size of the interval of the median class = 10
Substituting these values in the formula, Median = 30 + ((77/2 – 37)/18) × 10 = 30 + (1.5/18) × 10 = 30.83
The results show that half of the companies have declared less than 30.83 per cent dividend
and the other half have declared more than 30.83 per cent dividend.

The limitation of median as a measure of central tendency is that it does not use each and
every observation in its computation since it is a positional average.

3. Mode
The mode is that measure of central tendency which is appropriate for nominal or
higher order scales. It is the point of maximum frequency in a distribution around which other items of the set cluster densely. Mode should not be computed for ordinal or interval
data unless these data have been grouped first. The concept is widely used in business, e.g., a
shoe store owner would be naturally interested in knowing the size of the shoe that the
majority of the customers ask for. Similarly, a garment manufacturer is interested in
determining the size of the shirt that fits most people so as to plan its production accordingly.

9.3.4 Measures of Dispersion


The measures of central tendency locate the centre of the distribution. However, they
do not provide enough information to the researcher to fully understand the distribution being
examined. There is a need to study the spread of distribution of a variable and the methods
which provide that are called measures of dispersion.
The study of dispersion could help in taking better decisions. This is because small dispersion
indicates high uniformity of the items, whereas large variability denotes less uniformity. If
returns on a particular investment show lot of variability (dispersion), it indicates a risky
investment as compared to the one where variability is very small. The various measures of
dispersion are discussed below:

(i) Range: This is the simplest measure of dispersion and is defined as the distance between the highest (maximum) value and the lowest (minimum) value in an ordered set of values. The range can be computed for interval scale and ratio scale data.
Range = Xmax – Xmin
Where,
Xmax = Maximum value of the variable
Xmin = Minimum value of the variable

The limitation of range as a measure of dispersion is that it considers only the extreme value
and ignores all other data points. The value of range could vary considerably from sample to
sample. Even with this limitation, range as a measure of dispersion is widely used in
industrial quality control for the preparation of control charts.

Example 9.6: The following are the prices of shares of a company from Monday to Friday:
Calculate the range of the distribution.
Solution:
L = Largest values = 210
S = Smallest value = 100
Therefore, range = L – S = 210 – 100 = 110.
 
In the case of a frequency distribution, range is calculated by taking the difference between
the lower limit of the lowest class and upper limit of the highest class. The limitation of range
is that it is not based on each and every observation of the distribution and, therefore, does
not take into account the form of distribution within the range.

(ii) Variance and standard deviation: Variance is defined as the mean squared
deviation of a variable from its arithmetic mean. The positive square root of the variance is
called standard deviation. The population standard deviation is denoted by σ and computed
using the following formula:
σ = √[Σ(X – μ)²/N]
Where,
σ = Population standard deviation
X = Value of the observations
μ = Population mean of the observations
N = Total number of observations in the population
However, in survey research, we generally take a sample from the population. If the standard deviation is computed from the sample data, the following formula may be used:
s = √[Σ fi(Xi – X̄)²/(n – 1)]
Where,
Xi = Value of the ith observation (or midpoint of the ith class interval)
X̄ = Sample mean
fi = Frequency of the ith class interval
n = Sample size
The standard deviation could be computed in case of interval and ratio scale data.

(iii) Coefficient of variation: This measure is computed for ratio scale measurement. The
standard deviation measures the variability of a variable around the mean. The unit of
measurement of standard deviation is the same as that of arithmetic mean of the variable
itself. The measure of dispersion is considerably affected by the unit of measurement. In such
a case, it is not possible to compare the variability of two distributions using standard
deviation as a measure of variability. To compare the variability of two or more distributions,
a measure of relative dispersion called the coefficient of variation can be used. This measure
is independent of the units of measurement. The formula of the coefficient of variation is:
CV = (s/X̄) × 100 per cent
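The three measures of dispersion can be put together in a short Python sketch (illustrative only; the price figures are hypothetical and the sample standard deviation uses the n – 1 divisor):

import math

prices = [100, 150, 180, 200, 210]          # hypothetical share prices

rng = max(prices) - min(prices)             # Range = Xmax - Xmin
xbar = sum(prices) / len(prices)            # sample mean
s = math.sqrt(sum((x - xbar) ** 2 for x in prices) / (len(prices) - 1))  # sample standard deviation
cv = (s / xbar) * 100                       # coefficient of variation, in per cent

print(rng, round(s, 2), round(cv, 2))       # 110, 44.38, 26.42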
9.4 Descriptive Analysis of Bivariate Data
 
As already mentioned, bivariate analysis examines the relationship between two variables.
There are various methods used for carrying out bivariate analysis. We will discuss two
methods, namely, cross-tabulation and correlation coefficient in this course. The discussion of
correlation coefficient is taken up in Unit 13.

Cross-tabulation
In simple tabulation, the frequency and the percentage for each question is calculated. In
cross-tabulation, responses to two questions are combined and data is tabulated together. A
cross-tabulation counts the number of observations in each cross-category of two variables.
The descriptive result of a cross-tabulation is a frequency count for each cell in the analysis.
For example, in cross-tabulating a two-category measure of income (low- and high-income
households) with a two-category measure of purchase intention of a product (low and high
purchase intentions), the basic result is a cross-classification as shown in Table 9.5.
Table 9.5 Cross-tabulation of Income and Purchase Intention (frequency counts)
Purchase Intention        Low Income        High Income
Low                       120               60
High                      80                190
Total                     200               250
The results of cross-tabulation show the number of sample respondents with low income
having low purchase intention, low income with high purchase intention, high income with
low purchase intention and high income with high purchase intention.

As is the case with simple tabulation, the results of a cross-tabulation are more meaningful if
cell frequencies are computed as percentages. The percentages can be computed in three

88 | P a g e
ways. As is the case in Table 9.5, the percentages can be computed (1) row-wise so that the
percentages in each row add up to 100 per cent; (2) column-wise so that the percentages in
each column add up to 100 per cent or (3) cell percentages, such that percentages added
across all cells equal 100 per cent. The interpretation of percentages is different in each of the
three cases. Therefore, the question arises as to which of these percentages is most useful to
the researcher. What is the general rule for computing percentages?

The basis for calculating category percentage depends upon the nature of relationship
between the variables. One of the variables could be viewed as dependent variable and the
other one as independent variable. In the cross- tabulation presented in Table 9.5, the
purchase intention could be treated as dependent variable, which depends upon income
(independent variable). The rule is to cast percentages in the direction of independent (causal)
variable across the dependent variable.

For Table 9.5, there are 200 respondents with low income, out of which 120 have low
purchase intention for the product. In terms of percentages, 60 per cent of the respondents
with low income have low purchase intention for the product. Now there are 250 people with
high income, out of which 60 have low purchase intention and 190 have high purchase
intention for the product. By calculating percentages column wise, it is seen that 24 per cent
have low purchase intention whereas 76 per cent have high purchase intention for the
product. The results indicate that with increase in income, the purchase intention for the
product increases. Table 9.6 presents the percentages column-wise as given below:
Table 9.6 Column-wise Percentages of Purchase Intention by Income
Purchase Intention        Low Income        High Income
Low                       60%               24%
High                      40%               76%
Total                     100%              100%
From the above example, it is clear that any two variables with each having certain categories
can be cross-tabulated. The interpretation of the cross-tabulation results may show a high association between the two variables. That does not mean that one of them, the independent variable, is the cause of the other variable—the dependent variable. Causality between the two variables is more of an assumption made by the researcher based on his experience or
expectations. Just because there is high association between two variables, it does not imply a
cause-and-effect relationship.
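The column-wise percentages discussed above can be reproduced with a few lines of Python (a sketch only; the cell counts are those quoted in the text for Table 9.5):

# Cell counts: (purchase intention, income) -> number of respondents
counts = {("low", "low"): 120, ("high", "low"): 80,
          ("low", "high"): 60, ("high", "high"): 190}

col_totals = {"low": 200, "high": 250}      # totals for the independent variable (income)

# Cast percentages in the direction of the independent variable, i.e., column-wise
for (intention, income), n in sorted(counts.items()):
    pct = 100 * n / col_totals[income]
    print(income, "income,", intention, "purchase intention:", round(pct), "per cent")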
As mentioned earlier, correlation coefficient would be discussed in Unit 13.
9.5 Summary
Let us recapitulate the main points discussed in this unit:
 Data analysis could be univariate, bivariate or multivariate. Further, it could be descriptive or inferential.
 The type of analysis depends upon the level of measurement, i.e., nominal, ordinal, interval and ratio.
 The bivariate analysis of data is illustrated through cross-tabulation and the correlation coefficient.

UNIT 10: TESTING OF HYPOTHESES


Structure

 Introduction
 Concepts in Testing of Hypothesis
  Steps in Testing of Hypothesis Exercise
 Tests Concerning Means— Case of Single Population
 Tests for Difference between Two Population Means
 Tests Concerning Population Proportion— Case of Single Population
 Tests for Difference between Two Population Proportions
 Summary

 Keywords

Introduction
In the previous unit, we studied the descriptive analysis of univariate and bivariate
data. In this unit, we will study the testing of hypothesis. A hypothesis is an assumption or a
statement that may or may not be true. The hypothesis is tested on the basis of information
obtained from a sample. Instead of asking, for example, what the mean assessed value of an
apartment in a multi-storeyed building is, one may be interested in knowing whether or not the assessed value equals some particular value, say ₹80 lakh. Some other examples would be
whether a new drug is more effective than the existing drug based on the sample data, and
whether the proportion of smokers in a class is different from 0.30. The formulation of
hypothesis has already been discussed in Unit 2.
 
We will now study the concepts and steps in the testing of hypothesis exercise.

Concepts in Testing of Hypothesis


  Below are discussed some concepts on testing of hypotheses to be used in this unit
Null hypothesis: The hypotheses that are proposed with the intent of receiving a rejection for
them are called null hypotheses. This requires that we hypothesize the opposite of what is
desired to be proved. For example, if we want to show that sales and advertisement
expenditure are related, we formulate the null hypothesis that they are not related. If we want
to prove that the average wages of skilled workers in town 1 is greater than that of town 2, we
formulate the null hypotheses that there is no difference in the average wages of the skilled
workers in both the towns. A null hypothesis is denoted by H0.

Alternative hypotheses: Rejection of null hypotheses leads to the acceptance of alternative


hypotheses. The rejection of null hypothesis indicates that the relationship between variables
(e.g., sales and advertisement expenditure) or the difference between means (e.g., wages of
skilled workers in town 1 and town 2) or the difference between proportions have statistical
significance and the acceptance of the null hypotheses indicates that these differences are due
to chance. The alternative hypotheses are denoted by H1.

One-tailed and two-tailed tests: A test is called one-sided (or one-tailed) only if the null
hypothesis gets rejected when a value of the test statistic falls in one specified tail of the
distribution. Further, the test is called two-sided (or two-tailed) if null hypothesis gets
rejected when a value of the test statistic falls in either one or the other of the two tails of its
sampling distribution.
For example, consider a soft drink bottling plant which dispenses soft drinks in bottles
of 300 ml capacity. The bottling is done through an automatic plant. An overfilling of bottle
(liquid content more than 300 ml) means a huge loss to the company given the large volume
of sales. An under-filling means the customers are getting less than 300 ml of the drink when
they are paying for 300 ml. This could create a bad reputation of the company. The company
wants to avoid both overfilling and under-filling. Therefore, it would prefer to test the
hypothesis whether the mean content of the bottles is different from 300ml. This hypothesis
could be written as:

H0: μ = 300 ml
H1: μ ≠ 300 ml
The hypotheses stated above are called two-tailed or two-sided  hypotheses. However, if the
concern is the overfilling of bottles, it could be stated as:

 
H0: μ = 300 ml
H1: μ > 300 ml

Such hypotheses are called one-tailed or one-sided hypotheses and the researcher would be
interested in the upper tail (right hand tail) of the distribution. If however, the concern is loss
of reputation of the company (under-filling of the bottles), the hypothesis may be stated as:
H0 : μ = 300 ml
H1: μ < 300 ml
The hypothesis stated above is also called one-tailed test and the researcher would be
interested in the lower tail (left hand tail) of the distribution.

Type I and type II errors: The acceptance or rejection of a hypothesis is based upon sample
results and there is always a possibility of a sample not being representative of the
population. This could result in errors, as a consequence of which inferences drawn could be
wrong. 

If the null hypothesis H0 is true and is accepted, or H0 when false is rejected, the decision is correct in either case. However, if the hypothesis H0 is rejected when it is actually true, the
researcher is committing what is called a Type I error. The probability of committing a Type
I error is denoted by alpha (α). This is termed as the level of significance. 
Similarly, if the null hypothesis H0 when false is accepted, the researcher is committing an
error called Type II error. The probability of committing a Type II error is denoted by beta
(β). The expression 1 – β is called power of test.

Steps in Testing of Hypothesis Exercise


The following steps are followed in the testing of a hypothesis:

Setting up of a hypothesis: The first step is to establish the hypothesis to be tested. As you
know, these statistical hypotheses are generally assumptions about the value of the population
parameter; the hypothesis specifies a single value or a range of values. Rather than constructing a single hypothesis, two different hypotheses are set up. These are generally referred to as (1) the null hypothesis, denoted by H0, and (2) the alternative hypothesis, denoted by H1.

The null hypothesis is the hypothesis of the population parameter taking a specified value. In
case of two populations, the null hypothesis is of no  difference or the difference taking a
specified value. The hypothesis that is different from the null hypothesis is the alternative hypothesis. If the null hypothesis H0 is rejected based upon the sample information, the alternative hypothesis H1 is accepted. Therefore, the two hypotheses are constructed in such a
way that if one is true, the other one is false and vice versa.

Setting up a suitable significance level: The next step in the testing of hypothesis exercise is
to choose a suitable level of significance. The level of significance denoted by α is chosen
before drawing any sample. The level of significance denotes the probability of rejecting the
null hypothesis when it is true. The value of α varies from problem to problem, but usually it
is taken as either 5 per cent or 1 per cent. A 5 per cent level of significance means that there are 5 chances out of a hundred that a null hypothesis will get rejected when it should be
accepted. When the null hypothesis is rejected at any level of significance, the test result is
said to be significant. Further, if a hypothesis is rejected at 1 per cent level, it must also be
rejected at 5 per cent significance level.

Determination of a test statistic: The next step is to determine a suitable test statistic and its
distribution. As will be seen later, the test statistic could be t, Z, χ² or F, depending upon various assumptions to be discussed in later units.
Determination of critical region: Before a sample is drawn from the population, it is very
important to specify the values of a test statistic that will lead to rejection or acceptance of the
null hypothesis. The one that leads to the rejection of null hypothesis is called the critical
region. Given a level of significance, α, the optimal critical region for a two-tailed test
consists of that α/2 per cent area in the right hand tail of the distribution plus that α/2 per cent
in the left hand tail of the distribution where null hypothesis is rejected.
Computing the value of test statistic: The next step is to compute the value of the test statistic
based upon a random sample of size n. Once the value of test statistic is computed, one needs
to examine whether the sample results fall in the critical region or in the acceptance region.
Making a decision: The hypothesis may be rejected or accepted depending upon whether the
value of the test statistic falls in the rejection or the acceptance region. Management decisions
are based upon the statistical decision of either rejecting or accepting the null hypothesis.
In case a hypothesis is rejected, the difference between the sample statistic and the
hypothesized population parameter is considered to be significant. On the other hand, if the
hypothesis is accepted, the difference between the sample statistic and the hypothesized
population parameter is not regarded as significant and can be attributed to chance.
Test Concerning Means – Case of Single Population
 In this section, a number of illustrations will be taken up to explain the test of hypothesis
concerning mean. Two cases of large samples and small samples will be taken up.
In case of large sample
As mentioned earlier, in case the sample size n is large or small but the value of the
population standard deviation is known, a Z test is appropriate. There can be alternate cases
of two-tailed and one-tailed tests of hypotheses. Corresponding to the null hypothesis H0: μ =
μ0, the following criteria could be used as shown in Table 10.2.

The test statistic is given by:
Z = (X̄ – μH₀)/(σ/√n)
Where,
X̄ = Sample mean
σ = Population standard deviation
μH₀ = The value of μ under the assumption that the null hypothesis is true
n = Size of the sample
Table 10.2 Criteria for accepting or rejecting the null hypothesis under different cases of alternative hypotheses
Alternative hypothesis          Reject H0 when
H1: μ ≠ μ0                      |Z| > Zα/2
H1: μ > μ0                      Z > Zα
H1: μ < μ0                      Z < –Zα

If the population standard deviation σ is unknown, the sample standard deviation is used as
an estimate of σ. It may be noted that Zα and Zα/2 are Z values such that the area to the right
under the standard normal distribution is α and α /2 respectively. Below are solved examples
using the above concepts:

Example 10.1:
A sample of 200 bulbs made by a company gives a lifetime mean of 1540 hours with a
standard deviation of 42 hours. Is it likely that the sample has been drawn from a population
with a mean lifetime of 1500 hours? You may use 5 per cent level of significance.
Solution:
In the above example, the sample size is large (n = 200), the sample mean (X̄) equals 1540 hours and the sample standard deviation (s) is equal to 42 hours. The null and alternative hypotheses can be written as:
H0: μ = 1500 hours
H1: μ ≠ 1500 hours
It is a two-tailed test with the level of significance (α) equal to 0.05. Since n is large (n > 30), even though the population standard deviation σ is unknown, one can use the Z test. The test statistic is given by:

Z = (X̄ – μH₀)/(s/√n) = (1540 – 1500)/(42/√200) = 40/2.97 = 13.47
Where,
μH₀ = Value of μ under the assumption that the null hypothesis is true
s/√n = Estimated standard error of the mean
The value of α = 0.05 and, since it is a two-tailed test, the critical values of Z are given by –Zα/2 = –1.96 and Zα/2 = 1.96, which can be obtained from the standard normal table.
Since the computed value of Z = 13.47 lies in the rejection region, the null hypothesis is
rejected. Therefore, it can be concluded that the average life of the bulb is significantly
different from 1500 hours.
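The Z value in Example 10.1 can be checked with a few lines of Python (a sketch, not part of the original text):

import math

xbar, mu0, s, n = 1540, 1500, 42, 200
z = (xbar - mu0) / (s / math.sqrt(n))       # Z = (X-bar - mu0)/(s/sqrt(n)), s estimating sigma
print(round(z, 2))                          # about 13.47; |Z| > 1.96, so H0 is rejected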
Example 10.2: On a typing test, a random sample of 36 graduates of a secretarial school
averaged 73.6 words with a standard deviation of 8.10 words per minute. Test an employer's
claim that the school's graduates average less than 75.0 words per minute using the 5 per cent
level of significance.
Solution:
H0  : µ = 75
H1  : µ < 75

X̄ = 73.6, s = 8.10, n = 36 and α = 0.05. As the sample size is large (n > 30), even though the population standard deviation σ is unknown, the Z test is appropriate. The test statistic is given by:
Z = (X̄ – μH₀)/(s/√n) = (73.6 – 75)/(8.10/√36) = –1.4/1.35 = –1.04
Since it is a one-tailed test and the interest is in the left hand tail of the distribution, the critical value of Z is given by –Zα = –1.645. The computed value of Z = –1.04 lies in the acceptance region, and the null hypothesis is accepted. There is not enough evidence to support the employer's claim that the graduates average less than 75.0 words per minute.

In case of small sample


In case the sample size is small (n ≤ 30) and the sample is drawn from a normal population with an unknown standard deviation σ, a t test is used to test the hypothesis about the mean. The t distribution is a symmetrical distribution just like the normal one.
However, t distribution is higher at the tail
 
and lower at the peak. The t distribution is flatter than the normal distribution. With an
increase in the sample size (and hence, degrees of freedom), t distribution loses its flatness
and approaches the normal distribution whenever n > 30. 

The procedure for testing the hypothesis of a mean is similar to what is explained in the case
of a large sample. The test statistic used in this case is:
t = (X̄ – μH₀)/(s/√n), with n – 1 degrees of freedom
A few examples pertaining to the t test are worked out for testing the hypothesis of mean in the case
of a small sample.

Example 10.3:

Prices of a share (in ₹) of a company on different days in a month were found to be 66, 65,
69, 70, 69, 71, 70, 63, 64 and 68. Examine whether the mean price of shares in the month is
different from 65. You may use 10 per cent level of significance.

Solution :
Since the sample size is n = 10, which is small, and the population standard deviation is unknown, the appropriate test in this case would be t. First of all, we need to estimate the value of the sample mean (X̄) and the sample standard deviation (s). These are given by the following formulas:
X̄ = (ΣXi)/n = 675/10 = 67.5 and s = √[Σ(Xi – X̄)²/(n – 1)] = √(70.5/9) = 2.80
The test statistic is given by:
t = (X̄ – μH₀)/(s/√n) = (67.5 – 65)/(2.80/√10) = 2.5/0.885 = 2.82
The critical values of t with 9 degrees of freedom for a 10 per cent two-tailed test are given by –1.833 and 1.833. Since the computed value of t = 2.82 lies in the rejection region, the null hypothesis is rejected. Therefore, the average price of the share of the company is different from 65.
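A short Python check of Example 10.3 (a sketch only; statistics.stdev uses the n – 1 divisor, as in the formula above):

import math
import statistics

prices = [66, 65, 69, 70, 69, 71, 70, 63, 64, 68]

xbar = statistics.mean(prices)              # 67.5
s = statistics.stdev(prices)                # sample standard deviation, about 2.80
t = (xbar - 65) / (s / math.sqrt(len(prices)))
print(round(t, 2))                          # about 2.82, outside (-1.833, 1.833), so H0 is rejected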
Tests for Difference between Two Population Means

So far, we have been concerned with the testing of means of a single population. We took up
the cases of both large and small samples. It would be interesting to examine the difference
between the two population means. Again, various cases would be examined as discussed
below:
In case of large sample
In case both the sample sizes are greater than 30, a Z test is used. The hypothesis to be tested
may be written as:

H0: μ1 = μ2
H1: μ1 ≠ μ2
Where,
μ1 = Mean of population 1
μ2 = Mean of population 2
The above is a case of a two-tailed test. The test statistic used is:
Z = (X̄1 – X̄2)/√(σ1²/n1 + σ2²/n2)
Where,
X̄1 = Mean of the sample drawn from population 1
X̄2 = Mean of the sample drawn from population 2
n1 = Size of the sample drawn from population 1
n2 = Size of the sample drawn from population 2
If σ1 and σ2 are unknown, their estimates given by s1 and s2 are used.
 
The Z value for the problem can be computed using the above formula and compared with
the table value to either accept or reject the hypothesis. Let us consider the following
problem:
Example 10.4: A study is carried out to examine whether the mean hourly wages of the
unskilled workers in the two cities—Ambala Cantt. and Lucknow are the same. A random
sample of hourly earnings in both the cities is taken and the results are presented in the Table
10.4.
Table 10.4 Survey Data on Hourly Earnings in the Two Cities

City              Sample Mean Hourly Earnings    Standard Deviation of Sample    Sample Size
Ambala Cantt      ₹8.95 (X̄1)                     0.40 (s1)                       200 (n1)
Lucknow           ₹9.10 (X̄2)                     0.60 (s2)                       175 (n2)
Using a 5 per cent level of significance, test the hypothesis of no difference in the average
wages of unskilled workers in the two cities.
Solution : We use subscripts 1 and 2 for Ambala Cantt and Lucknow respectively.

The hypotheses are H0: μ1 = μ2 and H1: μ1 ≠ μ2. As the problem is of a two-tailed test, the critical values of Z at the 5 per cent level of significance are given by –Zα/2 = –1.96 and Zα/2 = 1.96. The sample value of Z = –2.83 lies in the rejection region. Therefore, the null hypothesis is rejected and it may be concluded that the average hourly wages of unskilled workers in the two cities are significantly different.
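The Z value in Example 10.4 can be reproduced with a short Python sketch (illustrative only; the small difference from the –2.83 reported above is due to rounding):

import math

x1, s1, n1 = 8.95, 0.40, 200                # Ambala Cantt
x2, s2, n2 = 9.10, 0.60, 175                # Lucknow

z = (x1 - x2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
print(round(z, 2))                          # about -2.81; |Z| > 1.96, so H0 is rejected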
 
In case of small sample
If the sizes of both the samples are less than 30 and the population standard deviations are unknown, the procedure described above for testing the equality of two population means is not applicable; instead, a t test is used under the assumption that the two population variances are equal.
If the two population variances are assumed to be equal, they can be estimated by a common estimate ŝ². In such a case, the test statistic becomes:
t = (X̄1 – X̄2)/√[ŝ²(1/n1 + 1/n2)], with n1 + n2 – 2 degrees of freedom
To get the estimate ŝ², a weighted average of s1² and s2² is used, where the weights are the number of degrees of freedom of each sample. The weighted average is called a 'pooled estimate'. This pooled estimate is given by the expression:
ŝ² = [(n1 – 1)s1² + (n2 – 1)s2²]/(n1 + n2 – 2)
 
Once the value of t statistic is computed from the sample data, it is compared with the
tabulated value at the level of significance α to arrive at a decision regarding the acceptance
or rejection of hypothesis. Let us work out a problem
 illustrating the concepts defined above.
Example 10.5: Two drugs meant to provide relief to arthritis patients were produced in two
different laboratories. The first drug was administered to a group of 12 patients and produced
an average of 8.5 hours of relief with a standard deviation of 1.8 hours. The second drug was
tested on a sample of 8 patients and produced an average of 7.9 hours of relief with a
standard deviation of 2.1 hours. Test the hypothesis that the first drug provides a significantly
higher period of relief. You may use 5 per cent level of significance.

Solution: The hypotheses are H0: μ1 = μ2 and H1: μ1 > μ2 (a one-tailed test). The critical value of t with n1 + n2 – 2 = 18 degrees of freedom at the 5 per cent level of significance is given by 1.734. The sample value of t = 0.685 lies in the acceptance region.
Therefore, the null hypothesis is accepted as there is not enough evidence to reject it. So, one
may conclude that the first drug is not significantly more effective than the second drug.
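Example 10.5 can be verified with a short Python sketch of the pooled t test (a check only; minor differences from 0.685 arise from rounding):

import math

x1, s1, n1 = 8.5, 1.8, 12                   # first drug
x2, s2, n2 = 7.9, 2.1, 8                    # second drug

sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)   # pooled variance estimate
t = (x1 - x2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
print(round(t, 2))                          # about 0.68 < 1.734, so H0 is accepted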

Activity 2:
From an IT company, take a random sample of ten male and female software engineers with
two years of work experience. Test the hypotheses that there is no significant difference in
their average salaries at 5 per cent level of significance.
Hint: Refer to Section 10.2.1. While testing the hypotheses, you can follow the below steps:

a. State the hypotheses
b. Formulate an analysis plan
c. Analyse sample data
d. Interpret results
Tests Concerning Population Proportion—Case of Single Population
 
We have already discussed the tests concerning population means. In the tests about
proportion, one is interested in examining whether the respondents possess a particular
attribute or not.
The random variable in such a case is a binary one in the sense that it takes only two values—
yes or no. For example, a student is either a smoker or a non-smoker, a consumer either uses a particular brand of a product or does not, and a skilled worker may be either satisfied or not satisfied with the present job. At this stage, it may be recalled that the binomial distribution is a
theoretically correct distribution to use while dealing with proportions. Further, as the sample
size increases, the binomial distribution approaches the normal distribution in characteristic.
To be specific, whenever both np and nq (where n = number of trials, p = probability of
success and q = probability of failure) are at least 5, one can use the normal distribution as a
substitute for the binomial distribution.

The case of single population proportion


Suppose we want to test the hypotheses H0: p = p0 against H1: p ≠ p0, where p is the population proportion and p0 is its hypothesized value. The test statistic is:
Z = (p̂ – p0)/√(p0(1 – p0)/n), where p̂ is the sample proportion.
For a given level of significance α, the computed value of Z is compared with the corresponding critical values, i.e., Zα/2 or –Zα/2, to accept or
reject the null hypothesis. We will consider a few examples to explain the testing procedure
for a single population proportion.

Example 10.6: An officer of the health department claims that 60 per cent of the male
population of a village smokes. A random sample of 50 males showed that 35 of them were
smokers. Are these sample results consistent with the claim of the health officer? Use a level
of significance of 0.05.

Solution: Here H0: p = 0.60 and H1: p > 0.60, where p is the proportion of male smokers. The sample proportion is p̂ = 35/50 = 0.70, so
Z = (0.70 – 0.60)/√(0.60 × 0.40/50) = 0.10/0.069 = 1.44
It is a one-tailed test. For a given level of significance α = 0.05, the critical value of Z is given by Zα = Z0.05 = 1.645. It is seen that the sample value of Z = 1.44 lies in the acceptance region.
  Therefore, there is not enough evidence to reject the null hypothesis. So it can be concluded
that the proportion of male smokers is not statistically different from 0.60.
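A minimal Python sketch of the single-proportion Z test in Example 10.6 (not part of the original text):

import math

p0, n, x = 0.60, 50, 35
p_hat = x / n                               # sample proportion = 0.70
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
print(round(z, 2))                          # about 1.44 < 1.645, so H0 is not rejected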

Test for Difference between Two Population Proportions

Here, we need to test whether the two population proportions are equal or not. The
hypothesis under investigation is:
H0: p1 = p2
H1: p1 ≠ p2
The alternative hypothesis assumed is two-sided. It could as well have been one-sided. The test statistic is given by:
Z = (p̂1 – p̂2)/√[p̄(1 – p̄)(1/n1 + 1/n2)]
where p̂1 and p̂2 are the sample proportions, n1 and n2 are the sample sizes, and p̄ = (x1 + x2)/(n1 + n2) is the pooled estimate of the common proportion.

Now, for a given level of significance α, the sample Z value is compared with the critical Z
value to accept or reject the null hypothesis. We consider below a few examples to illustrate
the testing procedure described above.

Example 10.7: A company is interested in considering two different television advertisements


for the promotion of a new product. The management believes that advertisement A is more
effective than advertisement B. Two test market areas with virtually identical consumer
characteristics are selected. Advertisement A is used in one area and advertisement B in the
other area. In a random sample of 60 consumers who saw advertisement A, 18 tried the
product. In a random sample of 100 customers who saw advertisement B, 22 tried the
product. Does this indicate that advertisement A is more effective than advertisement B, if a 5
per cent level of significance is used?
Solution: Here H0: pA = pB and H1: pA > pB. The sample proportions are p̂A = 18/60 = 0.30 and p̂B = 22/100 = 0.22, and the pooled proportion is p̄ = 40/160 = 0.25, which gives Z = 1.13.
The critical value of Z at the 5 per cent level of significance for this one-tailed test is 1.645. The sample value Z = 1.13 lies in the acceptance region. Therefore, the null hypothesis is not rejected, and there is not enough evidence to conclude that advertisement A is more effective than advertisement B.
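The Z value in Example 10.7 can be checked with a short Python sketch of the two-proportion test (illustrative only):

import math

x1, n1 = 18, 60                             # advertisement A: 18 of 60 tried the product
x2, n2 = 22, 100                            # advertisement B: 22 of 100 tried the product

p1, p2 = x1 / n1, x2 / n2                   # 0.30 and 0.22
p_pool = (x1 + x2) / (n1 + n2)              # pooled proportion = 0.25
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
print(round(z, 2))                          # about 1.13 < 1.645, so H0 is not rejected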

Summary
Let us recapitulate the main points discussed in this unit:

 A hypothesis is a statement or an assumption regarding a population, which may or may not be


true.
 The sequences of steps that need to be followed for the testing of hypothesis are: setting up of a
hypothesis, setting up of a suitable significance level, determination of a test statistic, determination
of critical region, computing the value of test-statistic and making a decision.
 In the test procedure for a single population mean or for examining the equality of two
population means, for large samples, a Z test is appropriate whereas for the small samples, a t test is
used in the two cases where: (i) population variances are equal and (ii) population variances are not
equal.
 In the testing procedures concerning the proportion of a single population and the difference
between two population proportions, the hypotheses concerning them are carried out using a Z test
under the assumption that the normal distribution could be used as an approximation to the
binomial distribution for a large sample.

UNIT 11 : CHI-SQUARE TESTS
STRUCTURE
 Introduction
 A Chi-Square Test for the Goodness of Fit
 A Chi-Square Test for the Independence of Variables
 A Chi-Square Test for the Equality of More than Two Population Proportions
 Summary

 Keywords

Introduction

In the last unit, we discussed the Z test for the equality of two population proportions.
Now, in case we have more than two populations and want to test the equality of all of them
simultaneously, it is not possible to do it using the Z test. This is because the Z test can
examine the equality of only two proportions at a time. In such a situation, the chi-square test
can come to the rescue and carry out the test in one go.
The chi-square test is widely used in research. For the use of the chi-square test, data is required in the form of frequencies. Data expressed in percentages or proportions can also be used, provided it can be converted into frequencies. The majority of the applications of chi-square (χ²) are with discrete data. The test can also be applied to continuous data, provided it is reduced to certain categories and tabulated in such a way that the chi-square test may be applied.

Some of the important properties of the chi-square distribution are:


 Unlike the normal and t distribution, the chi-square distribution is not symmetric.
 The values of a chi-square are greater than or equal to zero.
 The shape of a chi-square distribution depends upon the degrees of freedom. With the
increase in degrees of freedom, the distribution tends to normal.
 There are many applications of a chi-square test. Those mentioned below will be
discussed in this unit:
 A chi-square test for the goodness of fit
 A chi-square test for the independence of variables
 A chi-square test for the equality of more than two population proportions.

A Chi-Square Test for the Goodness of Fit


As discussed before, the data in chi-square tests is often in terms of counts or
frequencies. The actual survey data may be on a nominal or higher scale of measurement. If it
is on a higher scale of measurement, it can always be converted into categories. The real
world situations in business allow for the collection of count data, e.g., gender, marital status,
job classification, age and income. Therefore, a chi-square becomes a much sought after tool
for analysis. The researcher has to decide what statistical test is implied by the chi-square
statistic in a particular situation. Below are discussed common principles of all chi-square
tests. The principles are summarized in the following steps:
 State the null and alternative hypothesis about a population.
 Specify a level of significance.

 Compute the expected frequencies of the occurrence of certain events under the
assumption that the null hypothesis is true.
 Make a note of the observed counts of the data points falling in different cells
 Compute the chi-square value given by the formula:
χ² = Σ (Oi – Ei)²/Ei, the summation running over i = 1 to k
Where,
Oi = Observed frequency of the ith cell
Ei = Expected frequency of the ith cell
k = Total number of cells
k – 1 = Degrees of freedom

 Compare the sample value of the statistic as obtained in the previous step with the
critical value at a given level of significance and make the decision.
A goodness of fit test is a statistical test of how well the observed data supports the
assumption about the distribution of a population. The test also examines how well an
assumed distribution fits the data. Many times, the researcher assumes that the sample is
drawn from a normal or any other distribution of interest. A test of how normal or any other
distribution fits a given data may be of some interest. Consider, for example, the case of the
multinomial experiment which is the extension of a binomial experiment. In the multinomial
experiment, the number of the categories k is greater than 2. Further, a data point can fall into
one of the k categories and the probability of the data point falling in the ith category is a
constant and is denoted by pi where i = 1, 2, 3, 4, ..., k. In summary, a multinomial
experiment has the following features:
 There are fixed number of trials.
 The trials are statistically independent.
 All the possible outcomes of a trial get classified into one of the several categories.
 The probabilities for the different categories remain constant for each trial.
Consider as an example that a respondent can fall into any one of four non-overlapping income categories. Let the probabilities that the respondent will fall into any of the four groups be denoted by the four parameters p1, p2, p3 and p4. Given these parameters and the number of people in a random sample, the multinomial distribution specifies the probabilities of any combination of the cell counts.
Given such a situation, we may use the multinomial distribution to test how well the data fits the assumption of k probabilities p1, p2, ..., pk of falling into the k cells. The hypothesis to be tested is:
H0: The probabilities of the occurrence of events E1, E2, ..., Ek are given by the specified probabilities p1, p2, ..., pk
H1: The probabilities of the k events are not the pi stated in the null hypothesis.
Such hypotheses can be tested using the chi-square statistic, as illustrated in the following examples.

Example 11.1: The manager of the ABC ice-cream parlour has to take a decision regarding how much of each flavour of ice-cream he should stock so that the demands of the customers are satisfied. The ice-cream supplier claims that among the four most popular flavours, 62 per cent of the customers prefer vanilla, 18 per cent chocolate, 12 per cent strawberry and 8 per cent mango. A random sample of 200 customers produces the results given below. At the α = 0.05 significance level, test the claim that the percentages given by the supplier are correct.

Flavour                  Vanilla    Chocolate    Strawberry    Mango
Number preferring        120        40           18            22

Solution:
Let
Pv: proportion of customers preferring vanilla flavour
Pc  : proportion of customers preferring chocolate flavour
Ps : proportion of customers preferring strawberry flavour
Pm: proportion of customers preferring mango flavour

H0: Pv = 0.62, Pc = 0.18, Ps = 0.12, Pm = 0.08
H1: The proportions are not those specified in the null hypothesis
The expected frequencies corresponding to the various flavours under the assumption that the
null hypothesis is true are:
Vanilla = 200 × 0.62 = 124
Chocolate = 200 × 0.18 = 36
Strawberry = 200 × 0.12 = 24
Mango = 200 × 0.08 = 16

Flavour        O (Observed)    E (Expected)    O – E    (O – E)²    (O – E)²/E
Vanilla        120             124             –4       16          0.129
Chocolate      40              36              4        16          0.444
Strawberry     18              24              –6       36          1.500
Mango          22              16              6        36          2.250
Total                                                               4.323
The critical (table) value of χ² with k – 1 = 3 degrees of freedom at the 5 per cent level of significance is 7.815.
As the sample χ² value of 4.323 is less than the critical value, it lies in the acceptance region and H0 is accepted. Therefore, the customer preference rates are as stated by the supplier.
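The chi-square value of Example 11.1 can be reproduced with a few lines of Python (a sketch, not part of the original text):

observed = [120, 40, 18, 22]                # vanilla, chocolate, strawberry, mango
probs = [0.62, 0.18, 0.12, 0.08]            # proportions claimed by the supplier
n = sum(observed)                           # 200

expected = [n * p for p in probs]           # 124, 36, 24, 16
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 3))                     # about 4.323, below the critical value, so H0 is accepted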
It may be worth pointing out that for the application of a chi-square test, the expected
frequency in each cell should be at least 5.0. In case it is found that one or more cells have the
expected frequency less than 5, one could still carry out the chi-square analysis by combining
them into meaningful cells so that the expected number has a total of at least 5. Another point
worth mentioning is that the degree of freedom, usually denoted by df in such cases, is given
by k – 1, where k denotes the number of cells (categories).
It may be noted that in Example 11.1, the hypothesized probabilities were not equal. There
are situations where the hypothesized probabilities in each category are equal or in other
words, the interest is in investigating the uniformity of the distribution. The following
example illustrates this.
Example  11.2:  An  insurance  company  provides  auto  insurance  and  is analysing the data
obtained from fatal crashes. A sample of the motor vehicle deaths is randomly selected for a
two-year period. The number of fatalities is listed below for the different days of the week. At
the 0.05 significance level, test the claim that accidents occur on different days with equal
frequency.

Day                     Monday    Tuesday    Wednesday    Thursday    Friday    Saturday    Sunday
Number of fatalities    31        20         20           22          22        29          36

Solution: The hypotheses are:
H0: Fatal accidents occur with equal frequency on all days of the week (p1 = p2 = ... = p7 = 1/7)
H1: Fatal accidents do not occur with equal frequency on all days of the week
The total number of fatalities is 180, so under H0 the expected frequency for each day is 180 × (1/7) = 25.714:
Monday (E1) = Tuesday (E2) = Wednesday (E3) = Thursday (E4) = Friday (E5) = Saturday (E6) = Sunday (E7) = 25.714
The computation of sample chi-square value is given in the following table:

Day          Observed Frequencies (O)    Expected Frequencies (E)    O – E      (O – E)²    (O – E)²/E
Monday       31                          25.714                      5.286      27.942      1.087
Tuesday      20                          25.714                      –5.714     32.650      1.270
Wednesday    20                          25.714                      –5.714     32.650      1.270
Thursday     22                          25.714                      –3.714     13.794      0.536
Friday       22                          25.714                      –3.714     13.794      0.536
Saturday     29                          25.714                      3.286      10.798      0.420
Sunday       36                          25.714                      10.286     105.802     4.114
Total                                                                                       9.233
Degrees of freedom = 7 – 1 = 6; critical (table) value of χ² at the 5 per cent level = 12.592
Since the sample chi-square value of 9.233 is less than the tabulated χ², there is not enough evidence to reject the null hypothesis that fatal accidents occur with equal frequency on all days of the week.
A Chi-Square Test for Independence of Variables 
The chi-square test can be used to test the independence of two variables each having
at least two categories. The test makes use of contingency tables, also referred to as cross-
tabs, with the cells corresponding to a cross-classification of attributes or events.

Assuming that there are r rows and c columns, the count in the cell corresponding to the
ith row and the jth column is denoted by Oij, where i = 1, 2, ..., r and j = 1, 2, ..., c. The total for row i is denoted by Ri, whereas that corresponding to column j is denoted by Cj. The total sample size is given by n, which is also the sum of all the r row totals or the sum of all the c column totals.
The hypothesis test for independence is
H0: Row and column variables are independent of each other.
H1: Row and column variables are not independent.
The hypothesis is tested using a chi-square test statistic for independence given by:
χ² = Σ Σ (Oij – Eij)²/Eij, the double summation running over i = 1 to r and j = 1 to c
The degrees of freedom for the chi-square statistic are given by (r – 1) (c – 1).

For a given level of significance α, the sample value of the chi-square is compared with the
critical value for the degree of freedom (r – 1) (c – 1) to make a decision.
The expected frequency in the cell corresponding to the ith row and the jth column is given by:
Eij = (Ri × Cj)/n
Example 11.3: A sample of 870 trainees was subjected to different types of training classified
as intensive, good and average and their performance was noted as above average, average
and poor. The resulting data is presented in the table below. Use a 5 per cent level of
significance to examine whether there is any relationship between the type of training and
performance

Performance        Training
                   Intensive    Good    Average    Total
Above average      100          150     40         290
Average            100          100     100        300
Poor               50           80      150        280
Total              250          330     290        870
Solution:
H0: Attribute performance and the training are independent.
H1: Attribute performance and the training are not independent
The expected frequencies corresponding to the ith row and the jth column in the contingency
table are denoted by Eij , where i = 1, 2, 3 and j = 1, 2, 3.

E11 = (290 × 250)/870 = 83.33
E12 = (290 × 330)/870 = 110.00
E13 = (290 × 290)/870 = 96.67
E21 = (300 × 250)/870 = 86.21
E22 = (300 × 330)/870 = 113.79
E23 = (300 × 290)/870 = 100.00
E31 = (280 × 250)/870 = 80.46
E32 = (280 × 330)/870 = 106.21
E33 = (280 × 290)/870 = 93.33

Row, Column    Oij    Eij       (Oij – Eij)²    (Oij – Eij)²/Eij
1,1            100    83.33     277.89          3.335
1,2            150    110.00    1600.00         14.545
1,3            40     96.67     3211.49         33.221
2,1            100    86.21     190.16          2.21
2,2            100    113.79    190.16          1.671
2,3            100    100.00    0               0.000
3,1            50     80.46     927.81          11.53
3,2            80     106.21    686.96          6.468
3,3            150    93.33     3211.49         34.41
Total                                           107.39

The critical value of the chi-square at the 5 per cent level of significance with (3 – 1)(3 – 1) = 4 degrees of freedom is given by 9.49. The sample value of the chi-square (107.39) is much larger than the critical value and falls in the rejection region.

Therefore, the null hypothesis is rejected and one can conclude that there is an association
between the type of training and performance.
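The chi-square statistic of Example 11.3 can be verified with a short Python sketch (a check only, not part of the original solution):

observed = [[100, 150, 40],                 # above average
            [100, 100, 100],                # average
            [50, 80, 150]]                  # poor

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)                         # 870

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n       # Eij = Ri * Cj / n
        chi_sq += (o - e) ** 2 / e
print(round(chi_sq, 2))                     # about 107.4 > 9.49, so H0 is rejected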

A Chi-Square Test for the Equality of More than Two Population Proportions

  In certain situations, the researchers may be interested to test whether the proportion
of a particular characteristic is the same in several populations. The interest may lie in finding
out whether the proportion of people liking a movie is the same for the three age groups —
twenty-five and under, over twenty-five and under fifty, and fifty and over. To take another example, the interest may be in whether the proportion of satisfied employees in four categories—class I, class II, class III and class IV employees—is the same. In a sense, the question of whether the proportions are equal is a question of whether the populations of the different categories are homogeneous with respect to the characteristic being studied. Therefore, the tests for equality of proportions
across several populations are also called tests of homogeneity.
The analysis is carried out exactly in the same way as was done for the other two cases. The
formula for a chi-square analysis remains the same. However, two important assumptions
here are different.
(i) We identify our populations (e.g., age groups or various classes of employees) and draw the sample directly from these populations.
(ii) As we identify the populations of interest and sample from them directly, the sizes of the samples from the different populations of interest are fixed. This is also called a chi-square analysis with fixed marginal totals. The hypothesis to be tested is as under:
H0: The proportion of people satisfying a particular characteristic is the same in all populations.
H1: The proportion of people satisfying a particular characteristic is not the same in all
populations.
The expected frequency for each cell could also be obtained by using the formula as
explained earlier. There is an alternative way of computing the same, which would give
identical results. This is shown in the following example:
Example 11.5: An accountant wants to test the hypothesis that the proportion of incorrect
transactions in four client accounts is about the same. A random sample of 80 transactions of
one client reveals that 21 are incorrect; for the second client, the number is 25 out of 100; for
the third client, the number is 30 out of 90 sampled and for the fourth, 40 are incorrect out of
a sample of 110. Conduct the test at α = 0.05.
Let
p1 = Proportion of incorrect transaction for 1st client 
p2 = Proportion of incorrect transaction for 2nd client 
p3 = Proportion of incorrect transaction for 3rd client 
p4 = Proportion of incorrect transaction for 4th client

H0: p1 = p2 = p3 = p4
H1: All proportions are not the same
The observed data in the problem can be rewritten as:

Transactions              Client 1    Client 2    Client 3    Client 4    Total
Incorrect transactions        21          25          30          40       116
Correct transactions          59          75          60          70       264
Total                         80         100          90         110       380

The expected frequency in each cell is obtained using the formula Eij = (Ri × Cj)/n, as already
explained. The value of the chi-square statistic can then be calculated; it works out to
approximately 4.23.

The critical value of the chi-square with 3 degrees of freedom at the 5 per cent level of
significance equals 7.815. Since the sample value of χ2 is less than the critical value, there is
not enough evidence to reject the null hypothesis. Therefore, the null hypothesis is accepted and
one can conclude that there is no significant difference in the proportion of incorrect
transactions for the four clients.

Summary

Let us recapitulate the main points discussed in this unit:

 Chi-square test has a variety of applications in research. Chi-square is a non-symmetrical
distribution taking non-negative values.
 It can be used to test the goodness of fit of a distribution, the independence of variables
and the equality of more than two population proportions.
 A necessary condition for the application of the chi-square test is that the expected
frequency in each cell should be at least 5.
 The first and foremost step in the application of chi-square is the computation of the
expected frequencies.
 The data in a chi-square test is in terms of counts or frequencies. In case the actual data
is on a scale higher than nominal or ordinal, it can always be converted into categories.

UNIT 12 : ANALYSIS OF VARIANCE

STRUCTURE

 Introduction
 Completely Randomized Design in a One-Way ANOVA
 Randomized Block Design in Two-Way ANOVA
 Factorial Design
 Summary
 Keywords

INTRODUCTION
In Unit 10, we discussed the test of hypothesis concerning the equality of two
population means using both the Z and t tests. However, if there are more than two
populations, the test for the equality of means could be carried out by considering two
populations at a time. This would be a very cumbersome procedure. One easy way out could
be to use the analysis of variance (ANOVA) technique. The technique helps in performing
this test in one go and, therefore, is considered to be an important technique for analysis for
the researcher. Through this technique it is possible to draw inferences on whether the
samples have been drawn from populations having the same mean.
The technique has found applications in the fields of economics, psychology,
sociology, business and industry. It proves handy in situations where we want to compare the
means of more than two populations. Some examples could be to compare:

 The mean cholesterol content of various diet foods
 The average mileage of, say, five automobiles
 The average telephone bill of households belonging to four different income groups,
and so on

R.A. Fisher developed the theory concerning ANOVA. The basic principle underlying the
technique is that the total variation in the dependent variable is broken into two parts—one
which can be attributed to some specific causes and the other that may be attributed to
chance. The one which is attributed to specific causes is called the variation between samples
and the one which is attributed to chance is termed as the variation within samples.
Therefore, in ANOVA, the total variance may be decomposed into various components
corresponding to the sources of the variation.
In ANOVA, the dependent variable in question is metric (interval or ratio scale), whereas the
independent variables are categorical (nominal scale). If there is one independent variable
(one factor) divided into various categories, we have one-way or one-factor analysis of
variance. In the two- way or two-factor analysis of variance, two factors each divided into the
various categories are involved.
In ANOVA, it is assumed that each of the samples is drawn from a normal population and
each of these populations has an equal variance. Another assumption that is made is that all
the factors except the one being tested are controlled (kept constant). Basically, two estimates
of the population variance are made. One estimate is based upon the variation between the
samples and the other upon the variation within the samples. The two estimates of variance
can then be compared for their equality using the F statistic.

Completely Randomized Design in a One-Way ANOVA


Completely randomized design involves the testing of the equality of means of two or
more groups. In this design, there is one dependent variable and one independent variable.
The dependent variable is metric (interval/ratio scale) whereas the independent variable is
categorical (nominal scale). A sample is drawn at random from each category of the
independent variable. The size of the sample from each category could be equal or different.
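As an illustration of how such a design is analysed in practice, the short Python sketch below runs a one-way ANOVA on hypothetical mileage figures for three automobile models; the data are invented purely to show the mechanics, and the SciPy library is assumed to be available.

from scipy.stats import f_oneway

# Hypothetical mileage (km per litre) recorded for three car models
model_a = [18.2, 19.1, 17.8, 18.9, 18.5]
model_b = [20.1, 19.8, 20.5, 21.0, 20.2]
model_c = [18.8, 19.5, 19.0, 19.9, 19.2]

f_stat, p_value = f_oneway(model_a, model_b, model_c)
print(round(f_stat, 2), round(p_value, 4))
# If p_value is below 0.05, the hypothesis of equal population means is rejected.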

Randomized Block Design in Two-Way ANOVA


Suppose that, in the diet-food example, it could not be shown that there really is a significant
difference in the average cholesterol content of the four diet foods. The results were not
statistically different because there was a considerable difference in the values within each of
the samples, resulting in a large experimental error. However, we have additional information that each value was
randomly measured in three different laboratories in such a way that the first value of each
sample came from laboratory 1, the second value from laboratory 2, and the third value from
laboratory 3 (the random assignment of test units to labs). In such a case, a two-way analysis
of variance is suggested.

We had earlier partitioned the total sum of squares into two components—one which is due
to the differences between the sample (treatment sum of squares) and the other one due to the
differences within the samples (error sum of squares). Now, the error sum of squares includes
the sum of squares due to laboratories (called blocks) as an extraneous factor.
In two-way analysis of variance, we remove the effect of the extraneous factors (laboratories
or blocks) from the error sum of squares. Therefore, the total sum of squares is partitioned
into three components—one due to treatment, second due to block and the third one due to
chance (called the error sum of squares). It may be noted that the total sum of squares (TSS)
and the treatment sum of squares (TrSS) would remain the same as computed earlier in the
one-way analysis.
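In software, a randomized block analysis is usually run as a two-way ANOVA without interaction, with the treatment and the block entered as the two factors. The sketch below shows one way of doing this in Python with the statsmodels library; the data frame and its values are hypothetical and serve only to show the structure of the call.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical cholesterol readings for four diet foods, each measured in three labs
data = pd.DataFrame({
    "cholesterol": [3.6, 3.1, 3.4, 3.5, 2.9, 3.2, 4.0, 3.5, 3.7, 3.9, 3.4, 3.6],
    "diet":        ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"],
    "lab":         ["L1", "L2", "L3"] * 4,
})

model = ols("cholesterol ~ C(diet) + C(lab)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))
# The output partitions the total variation into diet (treatment), lab (block)
# and residual (error) sums of squares, as described above.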

Factorial Design
In factorial design, the dependent variable is the interval or the ratio scale and there
are two or more independent variables which are nominal scale. In the factorial design, it is
possible to examine the interaction between the variables. If there are two independent
variables each having three cells, there would be a total of nine interactions. The details on
this are already explained in Unit 3 (Research Design). Let us consider an example to explain
factorial design.
It is generally observed that there are differences in the pay packages offered to fresh MBA
graduates. The variations could be either due to the type of business school where they have
studied or it could be due to their area of specialization. The variation can also be due to an
interaction between the business school and the area of specialization. For example,
specialization in finance from a particular business school might fetch a better package.
Summary
Let us recapitulate the main points discussed in this unit:

 R.A. Fisher developed the theory of analysis of variance. This technique could be used to
test the equality of more than two population means in one go. The basic principle
underlying the technique is that the total variation in the dependent variable can be broken
into two components—one which can be attributed to specific causes and the other which
may be attributed to chance. In analysis of variance, the dependent variable is metric,
whereas the independent variable is categorical (nominal scale).
 The analysis of variance techniques in this unit are illustrated through completely
randomized design, randomized block design and factorial design.
 In a completely randomized design, there is one dependent and one independent variable.
The dependent variable is metric whereas the independent variable is categorical. Random
samples are drawn from each category of the independent variable. The sample size from
each category could be the same or different.
 In the randomized block design, there is one independent variable and one extraneous
factor (block). Both the independent variable and the extraneous factor (block) are nominal
scale variables. The effect of the extraneous factor is removed from the analysis.
 In factorial design, the dependent variable is metric and there are two or more independent
variables which are non-metric. In this design, it is possible to examine the interaction
between the variables. If there are two independent variables each having three cells, there
would be a total of nine interactions.

Keywords

 Analysis of variance: A technique used to compare means of two or more samples
(using the F distribution). This technique can be used only for numerical data
 Completely randomized design: A design that involves the testing of the equality of
means of two or more groups; there is one dependent variable and one independent
variable in this design
 Factorial design: A design for an experiment that allows the experimenter to find out
the effect of two or more independent variables each having two or more categories
along with their interactions on dependent variable
 One-way ANOVA: A technique that compares the mean of two or more groups
based on one independent variable (or factor)
 Two-way ANOVA: A statistical test used to determine the effect of two nominal
predictor variables on a continuous outcome variable. A two-way ANOVA test
analyses the effect of the independent variables on the expected outcome along with
their relationship to the outcome itself.

UNIT 13 : CORRELATION & REGRESSION ANALYSIS

Structure

 Introduction
 Concept of Correlation
 Quantitative Estimate of a Linear Correlation
 Testing the Significance of the Correlation Coefficient
 Regression Analysis
 Test of Significance of Regression Parameters
 Goodness of Fit of Regression Equation
 Uses of Regression Analysis in Prediction
 Summary
 Keywords

Introduction
Correlation and regression analysis are generally performed together. Correlation
measures the degree of association between two or more set of variables. Regression, on the
other hand, is used to explain the variations in one variable—usually called the dependent
variable—by a set of independent variables. It identifies the nature of the relationship. The
number of independent variables in regression analysis could be one or more. In case of one
independent variable, we classify it as a simple regression, whereas in case of more than one
independent variable, it is called a multiple regression analysis.
In this unit, you will study the importance of correlation and regression analysis in research
methodology, with a focus on the quantitative estimate of a linear correlation, the significance
of the correlation coefficient and the regression parameters, and the goodness of fit of the
regression equation.
Concept of Correlation
Correlation measures the degree of association between two or more variables. When
we are dealing with two variables, we are talking in terms of simple correlation and when
more than two variables are involved, the subject matter of interest is called multiple

correlation. In this unit, we will discuss simple correlation. There are three types of
correlation:

1. Positive correlation: When two variables X and Y move in the same direction, the
correlation between the two is positive. If one variable increases, the other variable also
increases, and if one variable decreases, the other variable also decreases.

The examples of positive correlation are: the quantity of a commodity supplied and the price
of the commodity, sales revenue and advertising expenditure, and consumption expenditure
and disposable income.

2. Negative correlation: When two variables X and Y move in the opposite direction,
the correlation is negative. If one variable increases, the other decreases, and vice
versa. A common example of negative correlation is the quantity demanded and the
price of a commodity. In such a situation, the scatter of the points on the variables X
and Y is clustered around a negatively sloped straight line or curve, i.e., the variables
X and Y move in opposite directions.

3. Zero correlation: The correlation between two variables X and Y is zero when the
variables move in no connection with each other. If the variable X increases, Y may
increase or decrease in different situations. 

Zero correlation does not mean that the variables are not related. We are dealing with a
linear correlation here and there could be a non-linear relation between them.

Quantitative Estimate of a Linear Correlation


A quantitative estimate of a linear correlation between two variables X and Y is given
by the English mathematician Karl Pearson's correlation coefficient.
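The coefficient itself is not reproduced in the text above; a standard form of Pearson's correlation coefficient for n paired observations, written in LaTeX, is:

r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}\;\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}

Here \bar{X} and \bar{Y} denote the sample means of X and Y.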
 
It may be noted that the above-mentioned formulae are for the linear correlation coefficient.
The linear correlation coefficient takes a value between
–1 and +1 (both values inclusive). If the value of the correlation coefficient is equal to 1, the
two variables are perfectly positively correlated and the scatter of the points of the variables
X and Y will lie on a positively sloped straight line. Similarly, if the correlation coefficient
between the two variables X and Y is –1, the scatter of the points of these variables will lie on
a negatively sloped straight line. Such a correlation will be called a perfectly negative

correlation. It may be noted that the closer the scatter of points to the line, higher is the
degree of correlation between the variables.

Testing the Significance of the Correlation Coefficient


The statistical test for the significance of a correlation coefficient is conducted using a t-
statistic. 
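The statistic itself is not shown in the text; the standard form used to test the null hypothesis that the population correlation is zero, based on a sample correlation r computed from n pairs of observations, is:

t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}

which follows a t distribution with n – 2 degrees of freedom when the null hypothesis is true. A computed |t| larger than the tabulated value at the chosen level of significance leads to the rejection of the null hypothesis.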

Regression Analysis
One of the problems with Karl Pearson's formula of correlation coefficient is that it is
applicable only when the relationship between the two variables is linear. There can,
however, be situations when the variables are connected in a non-linear relationship. It may
be noted that zero correlation and the independence of the two variables are not the same
thing. Zero correlation does not mean that the variables are not related. They may be non-
linearly related. However, statistical independence implies that there is a zero correlation
between the variables.

Another problem with the simple correlation coefficient is that it does not indicate
which variable is influencing which one. If, for example, the correlation coefficient between
the variables X and Y is 0.96, it can only be said that the variables X and Y are positively and
highly correlated. We cannot say whether the variable X influences Y, whether Y influences X,
or whether there is a third variable Z influencing both of them, which results in a high
correlation between X and Y. To overcome this limitation of the correlation
analysis, we have another concept called the regression analysis.

Regression analysis could be used for a variety of purposes in research. It could be
used to test whether an overall relationship exists between the dependent variable and a set of
independent variables (concepts to be explained later). It can also be used to measure the
relative importance of various independent variables in explaining the dependent variable.
The other use of regression analysis is for a prediction of the values of dependent variable,
that is, by knowing the values of the independent variables, one can predict the values of the
dependent variable.

For example, food expenditure in households could be predicted by using family income and
family size as independent variables in regression. In another example, the amount spent by a
consumer at a retail store in the last three months can be explained by the store's location,
prices, credit policy, merchandise quality and speed of service by using the regression
analysis. Likewise, another example could be to predict the sales volume of a photocopier by
using a set of independent variables like the size of sales force, amount of the advertising
budget and the consumer attitudes towards the company's product. Similarly, the willingness
to export the product by the small entrepreneurs could be explained by the employee size,
firm revenue and the years of operation in the domestic market.

In regression analysis, it is assumed that there is a variable that is influencing another
variable. For example, we may write,
Y = f (X)
This indicates that the values of Y depend upon the values of X. Further, there is a one-way
causation between X and Y in the sense that it is X which influences the values of Y and not
the other way round. The variable Y is called a dependent variable or an effect variable,
whereas the variable X is called an independent variable, explanatory variable, causal
variable or a regressor. The relationship between Y and X may be assumed to be linear and
we may write the following expression as:
Y=α+βX
The above expression shows that if we have a pair of data on the variables X and Y, the
scatter of all the points between these two variables will lie on a positively or negatively
sloped straight line, depending upon whether the sign of beta (β) is positive or negative. This
means that the correlation coefficient between X and Y will either be +1 or –1. In fact, such a
thing rarely happens. If we plot the data on the variables X and Y on a two-dimensional
plane, all the scatter of points would not lie on either positively or negatively sloped straight
line. This is because the variable Y is not only influenced by the variable X but
 
also by many other variables, which we have ignored for various reasons. The possible
reasons for ignoring those variables could be the non-availability of data or poor knowledge
about the existence of such variables influencing the dependent variable Y or the errors of
measurements in the variables X and Y or the researcher's inability to quantify such variables.
Therefore, to account for those variables which have been omitted for one reason or the other,
a stochastic error term is added to the above equation, which appears as:

Y=α+βX+U

Where,
U = Stochastic error term
α, β = Parameters to be estimated
The above equation is called a simple linear regression equation. This is because there is one
dependent variable and one independent variable. In the case of multiple regression, there are at
least two independent variables. The equation is estimated using the ordinary least squares
(OLS) method of estimation. The OLS method of estimation states that the regression line
should be drawn in such a way as to minimize the error sum of squares.
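Neither the OLS estimators nor the standard error of estimate referred to in the next paragraph are reproduced in the text; their standard forms for the simple linear regression above, written in LaTeX, are:

\hat{\beta} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}, \qquad \hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}

and the standard error of estimate (the expression discussed next) is

SEE = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n - k}}

where \hat{Y}_i is the value of Y predicted by the estimated regression line.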
 
In the above expression, n and k denote the sample size and the number of parameters to be
estimated in a given regression. The standard error of estimates indicates how close the

scatter of the points is to the regression line. However, this measure suffers from the defect
that it depends upon the units of measurement and, therefore, the fit of the two regression
equations with different standard errors of estimates cannot be compared. To overcome this
problem, we will introduce the concept of r2, the coefficient of determination, later in the
unit.
Test of Significance of Regression Parameters
We need to test the significance of the regression coefficients α and β, which is carried
out with the help of the t-statistic. The hypothesis to be tested for the slope coefficient is
mentioned below as:
Ho: β = 0
H1: β ≠ 0

The acceptance of the null hypothesis (H0) would indicate that the variable X does not
influence Y. In the above case, we have used a two- tailed test. The decision whether a
researcher should use a two-tailed or a one-tailed alternative depends upon whether the
direction of the relationship between the dependent and the causal variable is known or not. If
we know the direction of the relationship between the causal variable and the dependent
variable, we
should go for a one-tailed test and if there is no clue about the direction of relationship
between the two variables, it is suggested that a two-tailed alternative should be adopted.
The test statistic to be used to test the significance of the slope coefficient is given by:
(formula)
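The formula is not reproduced above; the usual test statistic for the slope coefficient, in LaTeX form, is:

t = \frac{\hat{\beta}}{SE(\hat{\beta})}

where SE(\hat{\beta}) is the estimated standard error of the slope coefficient.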

Once we compute the t-statistic, it is compared with the tabulated value of t with n – k degrees
of freedom, where n is the number of observations in the sample and k represents the
number of parameters to be estimated in the regression equation (in the present case k = 2).
In case the computed value of |t| is greater than the tabulated value of |t| at a given level of
significance, the null hypothesis is rejected.
 Goodness of Fit of Regression Equation
A researcher would be interested in knowing how good the estimated regression
equation is. To answer this question, there is a measure r2 which, in the case of the simple linear
regression model, is simply the square of the correlation coefficient. This measure is also
called the coefficient of determination of a regression equation and it takes values between 0
and 1 (both values inclusive). It indicates the explanatory power of the regression model. If,
for a particular regression model, r2 is equal to 0.86, it means that 86 per cent of the
variations in the dependent variable Y are explained by the variations in the independent
variable X. The measure r2 may be computed as:
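The expression is not shown in the text; a standard form, in LaTeX, is:

r^2 = 1 - \frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}

i.e., one minus the ratio of the error (residual) sum of squares to the total sum of squares.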

The value of r2 is free from the units of measurement and, therefore, can be used to
compare the goodness of fit of two or more regression equations. The test for the goodness of
fit is carried out by using the F-statistic.
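The F-statistic itself is not reproduced in the text; for a regression with k estimated parameters and n observations, a standard form, in LaTeX, is:

F = \frac{r^2/(k-1)}{(1-r^2)/(n-k)}

which, under the null hypothesis that the regression explains nothing, follows an F distribution with k – 1 and n – k degrees of freedom. In the simple regression case (k = 2), this F equals the square of the t-statistic for the slope.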

For a given level of significance α, the computed value of the F-statistic is compared with the
tabulated value of F with k – 1 degrees of freedom in the numerator and n – k degrees of
freedom in the denominator. If the computed F exceeds the tabulated F, the null hypothesis is
rejected in favour of the alternative hypothesis.
Uses of Regression Analysis in Prediction

The regression analysis can be employed for prediction. The prediction estimates could be
both point and interval. Further, the interval prediction can be approximate as well as exact.
Summary
Let us recapitulate the main points discussed in the unit:
In this unit, the concept of correlation is defined as measuring the association between two
variables. 

 The correlation can be positive, negative or zero. A quantitative measure of correlation
between two variables is explained.
 The correlation coefficient takes any value between minus 1 (-1) and plus 1 (+1).
The significance of correlation coefficient is tested using t-statistic.
 The limitations of correlation coefficient are discussed in this unit and the case for
regression analysis is argued out.
 The parameters of a linear regression equation are estimated using the OLS method.
 The test of significance of regression parameters is conducted using t- statistic.
 The goodness of fit of regression equation is given by r2 whose statistical
significance is examined by F-statistic.
 The uses of regression equation in prediction are explained. Both point and interval
predictions are discussed.

Keywords

 Correlation: Measures the association between two variables


 Simple regression: A type of regression in which there is only one independent and
one dependent variable
 Stochastic error term: Takes into account those variables which have been omitted from
the regression equation for one reason or another
 Standard error of estimate: Given by the standard deviation of the error term
 r2: Measures the goodness of fit of a simple regression

UNIT 14 : Multivariate Analysis of Data


Structure
 Introduction
 Factor Analysis
 Steps in a Factor Analysis Exercise
 Illustration of Factor Analysis Exercise
 Discriminant Analysis
 Discriminant Analysis Model
 Illustration of Discriminant Analysis

 Cluster Analysis
 Uses of Cluster Analysis
 Statistics Associated with Cluster Analysis
 Key Concepts in Cluster Analysis
 Process of Clustering
 Summary
 Glossary
Introduction 

In the unit on univariate and bivariate analysis of data, we made a mention of multivariate
analysis. In the multivariate analysis of data, we analyse more than two variables at a time.
Multivariate analysis has a number of uses in research, which will be shown through specific
techniques. In this unit, we are going to discuss factor analysis, discriminant analysis and
cluster analysis, some very commonly used multivariate techniques.
 Factor Analysis
Factor analysis is a data reduction method. It is a very useful method to reduce a large
number of variables, and the resulting data complexity, to a few manageable factors. These factors
explain most of the variations in the original set of data. Factor analysis helps in
identifying the underlying structure of the data. A factor is a linear combination of variables.
It is a construct that is not directly observable but that needs to be inferred from the input
variables. The factors are statistically independent.
Factor analysis requires some specific conditions that must be ensured before
executing the technique. These are mentioned below:
 The factor analysis exercise requires metric data, i.e., the data should be either interval
or ratio scale in nature.
 The variables for factor analysis are identified through exploratory research.
 Generally, in survey research, a five- or seven-point Likert scale or any other interval
scale may be used.
As the responses to different statements are obtained through different scales, all the
responses need to be standardized. The standardization helps in comparison of different
responses from such scales. The standardization is carried out using the following formulae:

Standardized score of the ith respondent on a statement = (Actual score of the ith respondent
on the statement – Mean of all respondents on the statement) / Standard deviation of all
respondents on the statement
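Written compactly in LaTeX, with X_i denoting the ith respondent's score on a statement, the same standardization is:

Z_i = \frac{X_i - \bar{X}}{s_X}

where \bar{X} and s_X are the mean and standard deviation of all respondents' scores on that statement.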

The size of the sample respondents should be at least four to five times more than the number
of variables (number of statements).The basic principle behind the application of factor
analysis is that the initial set of variables should be highly correlated. If the correlation

coefficients between all the variables are small, factor analysis may not be an appropriate
technique. A correlation matrix of the variables could be computed and tested for its
statistical significance.

The hypothesis to be tested may be written as:


H0: The correlation matrix is insignificant, i.e., the correlation matrix is an identity matrix
whose diagonal elements are one and off-diagonal elements are zero.
H1: The correlation matrix is significant.
 The test is carried out using the Bartlett test of sphericity, which takes the determinant
of the correlation matrix into consideration. The test converts it into a chi-square
statistic with degrees of freedom equal to k(k - 1)/2, where k is the number of variables
on which factor analysis is applied (a common form of the test statistic is shown after
this list). The significance of the correlation matrix ensures that a factor analysis
exercise could be carried out.
 Another condition which needs to be fulfilled before a factor analysis could be carried
out is the value of Kaiser-Meyer-Olkin (KMO) statistics which takes a value between
0 and 1. For the application of factor analysis, the value of KMO statistics should be
greater than 0.5. The KMO statistics compare the magnitude of observed correlation
coefficients with the magnitudes of partial correlation coefficients. A small value of
KMO shows that correlation between variables cannot be explained by other variables.
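The Bartlett statistic referred to in the first point above is not reproduced in the text; one commonly used form, in LaTeX, is:

\chi^2 = -\left[(n-1) - \frac{2k+5}{6}\right]\ln\lvert R\rvert

where n is the number of respondents, k the number of variables and \lvert R\rvert the determinant of the correlation matrix, with k(k - 1)/2 degrees of freedom.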

Steps in a Factor Analysis Exercise


There are basically two steps that are performed in a factor analysis exercise.
1. Extraction of factors: The first and foremost step is to decide how many factors are
to be extracted from the given set of data. This could be accomplished by the principal
component method. As we know, factors are linear combinations of the variables, which are
supposed to be highly correlated.

2. Rotation of factors: The second step in the factor analysis exercise is the rotation of
initial factor solutions. This is because the initial factors are very difficult to interpret.
Therefore, the initial solution is rotated so as to yield a solution that can be interpreted easily.
Most of the computer software would give options for orthogonal rotation, varimax rotation
and oblique rotation. Generally, the varimax rotation method is used, as this results in
independent factors. 
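The two steps can be sketched in Python as shown below. The data matrix is hypothetical, and the varimax rotation option assumes a reasonably recent release of the scikit-learn library; this is an illustrative sketch rather than a prescribed procedure.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))               # 200 respondents, 10 statements (hypothetical)

X_std = StandardScaler().fit_transform(X)    # standardize the responses

# Step 1: extraction - inspect the explained variance to decide the number of factors
pca = PCA().fit(X_std)
print(pca.explained_variance_ratio_)

# Step 2: rotation - fit the chosen number of factors with a varimax rotation
fa = FactorAnalysis(n_components=3, rotation="varimax").fit(X_std)
print(fa.components_)                        # rotated factor loadings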

Illustration of Factor Analysis Exercise

We will explain all that is discussed above with the help of a numerical example. 

Discriminant Analysis

Discriminant analysis is used to predict group membership. This technique is used to
classify individuals/objects into one of the alternative groups on the basis of a set of predictor
variables. The dependent variable in discriminant analysis is categorical and on a nominal

scale, whereas the independent or predictor variables are either interval or ratio scale in
nature. When there are two groups (categories) of dependent variables, we have two-group
discriminant analysis and when there are more than two groups, it is a case of multiple
discriminant analysis. In case of two-group discriminant analysis, there is one discriminant
function, whereas in case of multiple discriminant analysis, the number of functions is one
less than the number of groups.
The objectives of discriminant analysis are the following:
 To find a linear combination of variables that discriminates between the categories of the
dependent variable in the best possible manner
 To find out which independent variables are relatively better in discriminating between
groups
 To determine the statistical significance of the discriminant function and whether any
statistical difference exists among groups in terms of predictor variables
 To develop the procedure for assigning new objects, firms or individuals whose
profile but not the group identity are known to one of the two groups
 To evaluate the accuracy of classification, i.e., the percentage of customers that it is
able to classify correctly
Discriminant analysis can be very useful for answering the following questions:
 What are the demographic variables on which potentially successful salesmen and
potentially unsuccessful salesmen differ?
 What are the variables on which users/non-users of a product can be differentiated?
 What are the economic and psychographic variables on which price-sensitive and
non-price-sensitive customers can be differentiated?
 What are the variables on which the buyers of local/national brands of a product can be
differentiated?
Discriminant Analysis Model

The mathematical form of the discriminant analysis model is:

Y = b0 + b1X1 + b2X2 + b3X3 + ... + bKXK

Where,
Y = Dependent variable
bi = Coefficients of the independent variables (i = 0, 1, 2, ..., K)
Xj = Predictor or independent variables (j = 1, 2, ..., K)

It may be kept in mind that the dependent variable Y should be a categorical variable,
whereas the independent variables Xs should be continuous. As the dependent variable is
categorical, it should be coded as 0, 1, similar to dummy variable coding.
The method of estimating the b coefficients is based on the principle that the ratio of the
'between group sum of squares' to the 'within group sum of squares' should be maximized.
This will make the groups differ as much as possible on the values of the discriminant function.
After having estimated the model, the bi coefficients (also called discriminant coefficients)
are used to calculate Y, the discriminant score, by substituting the
values of Xj in the estimated discriminant model. For any new data point that we want to
values of Xj in the estimated discriminant model. For any new data point that we want to
classify into one of the groups, a decision rule is formulated for this purpose to determine the
cut-off score, which is usually the midpoint of the mean discriminant scores of the two
groups in case of two-group discriminant analysis, provided the size of the samples in the two
groups is the same. The accuracy of classification is determined by using a classification
matrix (also called confusion matrix).
The relative importance of the independent variables could be determined from the
standardized discriminant function coefficient and the structure matrix. The difference
between the standardized and un- standardized discriminant function is that in the un-
standardized discriminant function we have a constant term, whereas in the standardized
discriminant function, there is no constant term.
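As an illustration of how a two-group discriminant model is typically estimated and assessed with software, the Python sketch below uses scikit-learn; the predictor matrix X and group codes y are hypothetical, and the confusion_matrix call produces the classification (confusion) matrix mentioned above.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
# Two hypothetical groups of 50 cases each, measured on three predictors
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 3)),
               rng.normal(1.0, 1.0, size=(50, 3))])
y = np.array([0] * 50 + [1] * 50)            # group membership coded 0/1

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_, lda.intercept_)             # estimated discriminant coefficients
print(confusion_matrix(y, lda.predict(X)))   # classification (confusion) matrix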
Illustration of Discriminant Analysis
We will illustrate the estimation and the use of the discriminant model in the case of two
groups with the help of an example.
Cluster Analysis
Cluster analysis is similar to the techniques above in that it analyses multiple variables
together. However, there are essential differences between the other data reduction
techniques and cluster analysis.
In factor analysis, the objective was to reduce the original correlated variables to a
manageable number of factors, and the data reduction was carried out on the variables. In
cluster analysis, on the other hand, the focus is on the individuals or entities, and the
objective is to group the individuals on the basis of the variables.
The other data classification technique was two-group discriminant analysis. There also, one
might wish to group individuals or objects, but that technique has an established
classification rule and its objective is to validate the information, i.e., to see
whether the groups obtained by the identified function are correctly classified or not. In
cluster analysis, the whole population/sample is undifferentiated; the technique assesses
similarity in the responses to the variables, and the grouping happens after the answers have
been obtained on the questions/variables.
Uses of Cluster Analysis

Cluster analysis has widespread use in the field of management. However, its most
valuable contribution is in the area of marketing. Some applications of the technique are as
follows:
 Market segmentation: As we know, market segmentation is the process of splitting
customers/ potential customers, within a market into different groups/segments. The
advantage with the technique is that one can look at the combination of variables to
predict consumer or potential consumer groups.
 Career planning and training analysis: In the area of human resources (HR) the
technique can be used to group people into clusters on the basis of their educational
qualification, experience, aptitude and aspirations. This grouping can assist the HR
division to effectively manage training and manpower development for the members
of different clusters effectively.
 Segmenting financial sectors/instruments: This is an emerging area where different
factors like raw material cost, financial allocations, seasonality and other factors are
being used to group sectors together to understand the growth and performance of a
group of industries.

Statistics Associated with Cluster Analysis

Cluster analysis is the simplest of these techniques in terms of mathematical derivation. The
easiest way to explain the technique is to understand that it measures the distance between
objects on the basis of multiple variables and treats similarity as a function of distance, i.e.,
the shorter the distance between two objects, the more similar they are. For data collected on
an interval or ratio scale, the statistical assessment of the distance between two objects can be
done by calculating the Euclidean distance between them (the formulas are shown below).
For example, suppose two variables, nutrition and ease of preparation of a breakfast food,
were placed on a 10-point scale of importance (with 1 = very unimportant and 10 = very
important) and rated by persons A, B and C, and suppose the Euclidean distance between A
and C works out to 5.0 and that between B and C to 1.0. Then B and C are the most similar
pair, as the inter-person distance is the least and, as stated earlier, the shorter the distance, the
greater the similarity. If, in addition to having nutrition and ease of preparation, we also had a
variable that measured cost, we would effectively have a three-dimensional solution. In
general, for any two objects i and j:
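The distance formulas themselves are not reproduced in the text; the standard Euclidean forms, for persons A and B measured on two variables and then for any two objects i and j measured on variables indexed by k, written in LaTeX, are:

d_{AB} = \sqrt{(X_{A1} - X_{B1})^2 + (X_{A2} - X_{B2})^2}, \qquad d_{ij} = \sqrt{\sum_{k}(X_{ik} - X_{jk})^2}

with the notation defined below.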

dij= Distance between persons i and j
k = Variable (interval/ratio)
i = Object/person
j = Object/person
Key Concepts in Cluster Analysis
The following statistics and concepts are associated with cluster analysis:
 ANOVA table: The univariate or one-way ANOVA statistics for each clustering
variable. The higher the ANOVA value, the greater the difference between the
clusters on that variable.
 Cluster variate: The variables or parameters used to cluster and calculate the similarity
between objects.
 Cluster centroid: The average values of the objects on all variables in the cluster
variate.

 Cluster seeds: Initial cluster centres in the non-hierarchical clustering that are the
initial points from which one starts. Then the clusters are created around these seeds.
 Cluster membership: The address or the cluster to which a particular person/object
belongs.
 Dendrogram: This is a tree-like diagram that graphically presents the cluster results.
The vertical axis represents the objects and the horizontal axis represents the inter-
respondent distance. The figures are to be read from left to right.
 Distances between final cluster centres: These are the distances between the
individual pairs of clusters. A robust solution that is able to demarcate the groups
distinctly is the one where the inter-cluster distance is large; the larger the distance the
more distinct are the clusters.
 Entropy group: Individuals or small groups that do not seem to fit into any cluster.
 Final cluster centres: The mean value of the cluster on each of the variables that is
part of the cluster variate.
 Hierarchical methods: A step-wise process that starts with the most similar pair and
formulates a tree-like structure composed of separate clusters.
 Non-hierarchical methods: Cluster seeds or centres are the starting points and one
builds individual clusters around it based on some pre-specified distance of the seeds.
 Summary: The number of cases in each cluster is indicated in the non-hierarchical
clustering method.

Process of Clustering

Cluster analysis requires a step-wise execution. This sequence is presented below:


(i) Determine the similarity of each pair of respondents by computing the Euclidean
distance between them.

(ii) Using the single-linkage method, prepare a dendrogram.
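These two steps can be carried out in a few lines of Python with the SciPy and Matplotlib libraries, as sketched below; the ratings are hypothetical scores of five respondents on two 10-point variables.

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram

ratings = np.array([[3, 2], [4, 3], [9, 8], [8, 9], [5, 5]])   # hypothetical data

distances = pdist(ratings, metric="euclidean")   # step (i): pairwise Euclidean distances
tree = linkage(distances, method="single")       # step (ii): single-linkage clustering

dendrogram(tree, labels=["A", "B", "C", "D", "E"])
plt.show()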
Summary

Let us recapitulate the main points discussed in this unit:


 Factor analysis is a data reduction technique. It helps in identifying the underlying
structure of the data. For the application of factor analysis, the number of observations
should be at least 4-5 times the number of variables. Further, the variables should be
highly correlated and the KMO statistics should be greater than 0.5. To decide the
number of factors to be extracted, the principal component method is used. For the
interpretation of factor solution, Varimax rotation is used.
 Discriminant analysis is used to predict group membership. It tries to find a linear
combination of variables that discriminate between categories of dependent variable
in the best possible manner. The principle used in estimating a discriminant model is
that the ratio of 'between group variance' to 'within group variance' should be
maximized. The unit highlights the procedure for classifying the objects into one of
the alternative groups by using the classification rule.
 Cluster analysis tries to group individuals/objects. It has many applications, such as
market segmentation, career planning and training analysis, and segmenting financial
sectors/instruments. The principle involved in the grouping makes use of the Euclidean
distance: similar objects, i.e., those between which the distances are small, are grouped
together.
Keywords

 Factor loading: It gives the correlation coefficient between the factor score and the
variable in question
 Bartlett's test of sphericity: Used to test the significance of the correlation matrix
 Communality: A measure of the percentage of a variable's variation that is explained
by the factors
 Wilks' lambda: Given by the ratio of the within-group variance to the total variance
 Hit ratio: The ratio of the number of correct predictions to the total number of cases
 Cluster membership: The address or the cluster to which a particular person/object
belongs
 Dendrogram: This is a tree-like diagram that graphically presents the cluster results.
The vertical axis represents the objects and the horizontal axis represents the
inter-respondent distance. The figures are to be read from left to right

UNIT 15 : RESEARCH REPORT WRITING

Structure

 Introduction
 Types of Research Reports
 Brief Reports
 Detailed Reports
 Report Writing: Structure of the Research Report

 Preliminary Section
 Main Report
 Interpretations of Results and Suggested Recommendations
 Report Writing: Formulation Rules for Writing the Report
 Guidelines for Presenting Tabular Data
 Guidelines for Visual Representations: Graphs
 Summary
 Keywords
Introduction

In the previous units, we have discussed and learnt about data collection and
processing. On completion of the research study and after obtaining the research results, the
real skill of the researcher lies in analysing and interpreting the findings and linking them
with the propositions formulated in the form of research hypotheses at the beginning of the
study. The statistical or qualitative summary of results would be little more than numbers or
conclusions unless one is able to present the documented version of the research endeavour.
One cannot overemphasize the significance of a well- documented and structured research
report. Just like all the other steps in the research process, this requires careful and sequential
treatment.
In this unit, we will be discussing in detail the documentation of the research study.
The format and the steps might be moderately adjusted and altered based on the reader's
requirement. Thus, the report might serve an academic and theoretical purpose, or it might need
to be clearly spelt out and linked with the business manager's decision dilemma.

Types of Research Reports


The research report is a concrete proof of the study that was undertaken. It is a one-
way communication of the researcher's study and analysis to the reader/manager, and thus
needs to be all-inclusive and yet neutral in its reporting. As the report documents all the steps
followed and the analysis carried out, it also serves to authenticate the quality of the work
carried out and establishes the strength of the findings obtained.
The form and structure of the research report might change according to the purpose
for which it has been designed. Based on the size of the report, it is possible to divide the
report into brief reports and detailed reports.

Brief Reports
These kinds of reports are not formally structured and are generally short,
sometimes not running more than four to five pages. The information provided has
limited scope and is a prelude to the formal structured report that would subsequently
follow. These reports could be designed in several ways.
Working papers or basic reports are written for the purpose of recording the process carried
out in terms of scope and framework of the study, the methodology followed and instrument
designed. The results and findings would also be recorded here. However, the interpretation

of the findings and study background might be missing, as the focus is more on the present
study rather than past literature.
Survey reports might or might not have an academic orientation. The focus here is to present
findings in easy-to-comprehend format that includes figures and tables. The advantage of
these reports is that they are simple and easy to understand and present the findings in a clear
and usable format.
Detailed Reports
These are more formal and could be academic, technical or business reports.
 Technical reports: These are major documents and would include all elements of the
basic report, as well as the interpretations and conclusions, as related to the obtained
results. This would have a complete problem background and any additional past
data/records that are essential for understanding and interpreting the study results. All
sources of data, sampling plan, data collection instrument(s), data analysis outputs
would be formally and sequentially documented.
 Business reports: These reports include conclusions as understood by the business
manager. The tables, figures and numbers of the first report would now be pictorially
shown as bar charts and graphs, and the reporting tone would be more in business
terms. Tabular data might be attached in the appendix.
Activity 1:
Find a technical and business report and examine the contents of the report against what has
been discussed in the unit. What deviations did you find from the stated structure? What do
you think could have been the reason for this?
Hint: You can avail the report from your library from the Internet.

Report Writing: Structure of the Research Report


Whatever the type of report, reporting requires a structured format and by and large,
the process is standardized. As stated above, the major difference amongst the types of
reports is that all the elements that make a research report would be present only in a detailed
technical report. Usage of theoretical and technical jargon would be more in the technical
report and visual presentation of data would be more in the management report.
 
The preliminary section includes the title page, followed by the letter of authorization,
acknowledgements, executive summary and the table of contents. Then come the background
section, which includes the problem statement, introduction, study background, scope and
objectives of the study and the review of literature (depends on the purpose). This is followed
by the methodology section, which, as stated earlier, is again specific to the technical report.
This is followed by the findings section and then come the conclusions. The technical report
would have a detailed bibliography at the end. In a management report, the sequencing of the
elements might be reversed to suit the needs of the decision-maker, as here the reader needs
to review and absorb the findings. Thus, the last section on interpretation of findings would

be presented immediately after the study objectives and a short reporting on methodology
could be presented in the appendix.
 
1. Preliminary Section
   Title Page
   Letter of Authorization
   Executive Summary
   Acknowledgements
   Table of Contents
2. Background Section
   Problem Statement
   Study Introduction and Background
   Scope and Objectives of the Study
   Review of Literature
3. Methodology Section
   Research Design
   Sampling Design
   Data Collection
   Data Analysis
4. Findings Section
   Results
   Interpretation of Results
5. Conclusions Section
   Conclusion and Recommendations
   Limitations of the Study
6. Appendices
   Glossary of Terms
Preliminary Section
This section mainly consists of identification information for the study conducted. It has the
following individual elements:

Title page: The title should be crisp and indicative of the nature of the project, as illustrated
in the following examples.
 Comparative analysis of BPO workers and schoolteachers with reference to their work-life
balance
 Segmentation analysis of luxury apartment buyers in the National Capital Region (NCR)

Letter of transmittal: This is the letter that broadly refers to the purpose behind the study.
The tone in this note can be slightly informal and indicative of the rapport between the client-
reader and the researcher. The letter broadly refers to three issues. It indicates the terms of the
study or objectives; next it goes on to broadly give an indication of the process carried out to
conduct the study and the implications of the findings. The conclusions generally are
indicative of the researcher's learnings from the study. A sample letter of transmittal is as
follows:

Dear Prem,
Please find the enclosed document which covers a summary of the findings of the November-
December 2011 study of the new product offering and its acceptability. I would be sending
three hard copies of the same tomorrow. Once the core group has discussed the direction of

the expected results I would request you to kindly get back with your comments/queries/
suggestions, so that they can be incorporated in the preparation of the final report document.
The major findings of the study were that the response of the non-vegetarians consuming the
new vegan keema bonda pav at Just Bondas was positive. As you can observe, however, the
introduction of the vegan mockmeat bonda has not been well received by the regular customers
who visit the outlets for their regular alloo bonda. These findings, though based on a small
respondent base, are significant as they could be an indication of a defecting loyal customer
base.

Best regards,
Nayan

Letter of authorization: The author of this letter is the business manager who formally gives
the permission for executing the project. The tone of this letter, unlike the above document, is
very precise and formal.

Table of contents: All reports should have a section that clearly indicates the division of the
report based on the formal areas of the study as indicated in the research structure. The major
divisions and subdivisions of the study, along with their starting page numbers, should be
presented. Once the major sections of the report are listed, the list of tables comes next,
followed by the list of figures and graphs, exhibits (if any) and, finally, the list of appendices.

Executive summary: The summary of the entire report, starting from the scope and
objectives of the study to the methodology employed and the results obtained, has to be
presented in a brief and concise manner. The executive summary essentially can be divided
into four or five sections. It begins with the study background, scope and objectives of the
study, followed by the execution, including the sample details and methodology of the study.
Next comes the findings and results obtained. The fourth section covers the conclusions and
finally, the last section includes recommendations and suggestions.
Acknowledgements: A small note acknowledging the contribution of the respondents, the
corporates and the experts who provided inputs for accomplishing the study is included here.

Main Report
This is the most significant and academically robust part of the report.
Problem definition: This section begins with the formal definition of the research problem.

Study background: The study background essentially begins by presenting the decision-makers'
problem and then moves on to a description of the theoretical and contemporary market data
that laid the foundation that guided the research.

In case the study is an academic research, there is a separate section devoted to the review of
related literature, which presents a detailed reporting of work done on the same or related
topic of interest.
 
Study scope and objectives: The logical arguments then conclude in the form of definite
statements related to the purpose of the study. In case the study is causal in nature, the
formulated hypotheses are presented here as well.

Methodology of research: This section would essentially have five to six subsections specifying
the details of how the research was conducted. These would essentially be:

 Research framework or design: The variables and concepts being investigated are
clearly defined, with a clear reference to the relationship being studied. The
justification for using a particular design also has to be presented here.
 Sampling design: The entire sampling plan in terms of the population being studied,
along with the reasons for collecting the study-related information from the selected
group is given here.
 Data collection methods: In this section, the researcher should clearly list the
information needed for the study as drawn from the study objectives stated earlier.
The secondary data sources considered and the primary instrument designed for the
specific study are discussed here. However, the final draft of the measuring instrument
can be included in the appendix.
 Data analysis: The assumptions and constraints of the analysis need to be explained
here in simple, non-technical terms.
 Study results and findings: This is the most critical chapter of the report and requires
special care; it is probably also one of the longest chapters in the document.

Interpretations of Results and Suggested Recommendations


This section comes after the main report and contains interpretations of results and
suggested recommendations. It presents the information in a summarized and numerical
form.
Sometimes, the research results obtained may not be in the direction as found by
earlier researchers. Here, the skill of the researcher in justifying the obtained direction is
based on his/her individual opinion and expertise in the area of study. After the interpretation
of results, sometimes, the study requirement might be to formulate indicative
recommendations to the decision-makers as well. Thus, in case the report includes
recommendations, they should be realistic, workable and topically related to the industry
studied.

Limitations of the study: The last part of this section is a brief discussion of the
problems encountered during the study and the constraints in terms of time, financial or
human resources.
The final section of the report provides all the supportive material for the study.
Some of the common details presented in this section are as follows:

 Appendices: The appendix section follows the main body of the report and essentially
consists of two kinds of information:
1. Secondary information like long articles or in case the study uses/ is based on/refers to
some technical information that needs to be understood by the reader; long tables or
articles or legal or policy documents.
2. Primary data that cannot be conveniently presented in the main body of the report.
This includes the original questionnaire, discussion guides, formulae used for the
study, sample details, original data, and long tables and graphs which can be described
in statement form in the text.
 Bibliography: This is an important part of the final section as it provides the complete details
of the information sources and papers cited, in a standardized format. It is recommended to
follow the publication manual of the American Psychological Association (APA) or the
Harvard method of citation for preparing this section. The bibliography may be reported
as one of the following:
 Selected bibliography: Selective references are cited in terms of relevance and reader
requirement. Thus, books or journals that are technical and not really needed to
understand the study outcomes are not reported.
 Complete bibliography: All the items that have been referred to, even when not cited
in the text, are given here.
 Annotated bibliography: Along with the complete details of the cited work, some
brief information about the nature of information sought from the article is given.
At this juncture, we would like to refer to citation in the form of a footnote. To explain the
difference, let us first consider what a typical footnote is:

 Footnote: A typical footnote, as the name indicates, is part of the main report and
comes at the bottom of a page or at the end of the main text. It may cite a source the
author has drawn upon, or it may explain a particular concept mentioned in the text.
The referencing protocols of a footnote and a bibliography are different.
In a footnote, the author's first name is given first and the surname next; in a bibliography,
this order is reversed, starting with the surname followed by the first name.
In a bibliography, we generally mention the page range of the article or the total pages in
the book, whereas in a footnote the specific page from which the information is cited is
mentioned. A bibliography is generally arranged alphabetically by the author's name, whereas
footnotes are reported in the sequence in which they occur in the text.
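For illustration, a hypothetical source (the author, title and publisher below are invented) might be cited as follows; the exact punctuation will depend on the style manual followed:

Footnote: Anita Sharma, Consumer Behaviour in Emerging Markets (New Delhi: Sample Publishing House, 2012), p. 87.
Bibliography entry: Sharma, A. (2012). Consumer behaviour in emerging markets. New Delhi: Sample Publishing House, 215 pp.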

Glossary of terms: In case specific terms and technical jargon have been used in the report,
the researcher should consider including a glossary in the form of a word list of the terms used in
the study. This is usually the last section of the report.

Activity 2:

Read a research report and assess the following:

a. Has the report followed the structure you studied in this unit?
b. In your opinion, is the report format correct or incorrect?
Hint: You can visit a library or search the Internet to find a report to read.

Report Writing: Formulation Rules for Writing the Report


Listed below are some features of a good research report that should be kept in mind
while documenting and preparing it.
Clear report mandate: While writing the research problem statement and study background,
the writer needs to be absolutely clear in terms of why and how the problem was formulated.
Clearly designed methodology: Any research study has its unique orientation and scope and
thus has a specific and customized research design, sampling plan and data collection plan. If
a study is not completely transparent about its set of procedures, one cannot be
absolutely confident of the findings and resulting conclusions.
Clear representation of findings: Complete honesty and transparency in stating the
treatment of data and editing of missing or contrary data is extremely critical.

Representativeness of study findings: A good research report is also explicit about the
extent and scope of the results obtained and about the applicability of the findings.

Thus, some guidelines should be kept in mind while writing the report.
Command over the medium: A correct and effective language of communication is critical
in putting ideas and objectives in the vernacular of the reader/decision-maker.

Phrasing protocol: There is a debate about whether or not one should use personal
pronouns while reporting. Personal pronouns such as
'I think…' or 'in my opinion…' lend a subjectivity and personalization of judgement to the report.
Thus, the tone of the reporting should be neutral. For example: 'Given the nature of the
forecasted growth and the opinion of the respondents, it is likely that the……'
Whenever the writer is reproducing information verbatim from another document, an expert's
comment or a published source, it must be placed in inverted commas or italics and the
author or source should be duly acknowledged.
For example:
Sarah Churchman, Head of Diversity, PricewaterhouseCoopers, states, 'At PricewaterhouseCoopers,
we firmly believe that promoting work-life balance is a "business-critical" issue and
not simply the "right thing to do".'
The writer should also avoid long sentences and break the information into clear chunks,
so that the reader can process it with ease.

Simplicity of approach: Along with grammatically and structurally correct language, care
must be taken to avoid technical jargon as far as possible. In case it is important to use certain
terminology, the definitions of these terms can be provided in the glossary at the end of the
report.

Report formatting and presentation: In terms of paper quality, page margins, and font style
and size, a professional standard should be maintained. The font style must be uniform
throughout the report. The topics, subtopics, headings and subheadings must be formatted
consistently throughout the report. The researcher can provide visual relief and variation
by adequately supplementing the text with graphs and figures.

Guidelines for Presenting Tabular Data


Most research studies involve some form of numerical data, and even though one can
discuss it in the text, it is best represented in tabular form. The data can be given in simple
summary tables, which contain only limited information and yet are critical to the report text.
The mechanics of creating a summary table are simple and are illustrated with an
example in Table 15.1. The illustration has been labelled with numbers that relate to the
relevant points discussed below.

Table identification details: The table must have a title (1a) and an identification number
(1b). The table title should be short, usually without verbs or articles, and should refer only
to the population or parameter being studied. It should be brief yet clearly descriptive of the
information provided. Tables are usually numbered in a series, generally using Arabic
numerals.
Data arrays: The arrangement of data in a table is usually done in ascending order.

Measurement unit: The unit in which the parameter or information is presented should be
clearly mentioned.

Spaces, leaders and rulings (SLR): For limited data, the table need not be divided using
grid lines or rulings; simple white spaces add to the clarity of the information presented and
processed. In case there are too many parameters, it is advisable to use vertical rulings.
Horizontal lines are drawn to separate the headings from the main data.

Assumptions, details and comments: Any clarification or assumption made, any special
definition required for understanding the data, or any formula used to arrive at a particular
figure (e.g., total market sales or total market size) can be given after the main tabled data in
the form of footnotes.

Data sources: In case the information documented and tabled is secondary in nature,
complete reference of the source must be cited after the footnote, if any.

Special mention: In case some figure or information is significant and the reader should pay
special attention to it, the number or figure can be set in bold or highlighted to increase
focus.
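As a minimal sketch of how such a summary table might be assembled and documented, the snippet below uses Python's pandas library (one possible tool among many); the flavours, sales figures, footnote and source line are all invented for illustration and simply mirror the conventions described above (title and number, measurement unit, footnote and data source).

```python
# Minimal sketch: building a summary table that follows the conventions above
# (title, identification number, measurement unit, footnote and data source).
# All names and figures are hypothetical.
import pandas as pd

# Data arranged in ascending order of sales, as suggested for data arrays
data = pd.DataFrame(
    {"Flavour": ["Strawberry", "Vanilla", "Mango"],
     "Annual sales (Rs lakh)": [120.5, 240.0, 310.8]}   # measurement unit in the header
)

print("Table 1.1: Annual Sales by Flavour")            # title (1a) and number (1b)
print(data.to_string(index=False))                     # white space only, no rulings
print("Note: Sales figures are net of returns.")       # assumption shown as a footnote
print("Source: Hypothetical company records, 2023.")   # data source citation
```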
Guidelines for Visual Representations: Graphs
Just as data can be summarized succinctly in tables, it can also be
presented through visual representations in the form of graphs.

Line and curve graphs: Usually, when the objective is to demonstrate trends and some sort
of pattern in the data, a line chart is the best option available to the researcher. It is also
possible to show patterns of growth of different sectors or industries in the same time period
or to compare the change in the studied variable across different organizations or brands in
the same industry. Certain points to be kept in mind while formulating line charts include (a plotting sketch follows this list):
 The time units or the causal variable being studied should be put on the X-axis (the horizontal axis).
 If the intention is to compare different series on the same chart, the lines should be of different colours or forms.
 Too many lines are not advisable; an ideal number is five or fewer lines on one chart.
 The researcher must also take care to keep the zero baseline in the chart, as otherwise the data may appear misleading.
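As a minimal sketch of these guidelines, the snippet below uses Python's matplotlib library; the years, brand names and sales figures are invented purely for illustration.

```python
# Minimal sketch of a line chart following the guidelines above:
# time on the X-axis, few clearly distinguished lines, and a zero baseline.
# Brand names and figures are hypothetical.
import matplotlib.pyplot as plt

years = [2019, 2020, 2021, 2022, 2023]            # time units on the X-axis
brand_a = [40, 55, 62, 70, 85]                    # sales of hypothetical Brand A
brand_b = [30, 35, 50, 58, 66]                    # sales of hypothetical Brand B

plt.plot(years, brand_a, color="blue", linestyle="-", label="Brand A")
plt.plot(years, brand_b, color="green", linestyle="--", label="Brand B")
plt.ylim(bottom=0)                                # keep the zero baseline visible
plt.xlabel("Year")
plt.ylabel("Sales (Rs lakh)")
plt.title("Sales Trend of Two Brands")
plt.legend()
plt.show()
```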

Area or stratum charts: Area charts are like line charts and are used to demonstrate
changes in a pattern over a period of time. The change in each of the components is shown
individually on the same chart, and the components are stacked one on top of the other. The
areas between the various lines indicate the scale or volume of the relevant
factors/categories (Figure 15.4).
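A similar matplotlib sketch for a stacked area chart, with hypothetical pack types and quarterly figures, might look like this:

```python
# Minimal sketch of an area (stratum) chart: each component is stacked on the
# previous one, so the band widths show the relative volume of each category.
# Pack types and figures are hypothetical.
import matplotlib.pyplot as plt

quarters = [1, 2, 3, 4]
cups     = [20, 25, 30, 40]      # hypothetical sales of ice-cream cups
cones    = [15, 18, 22, 25]      # hypothetical sales of cones
tubs     = [10, 12, 15, 20]      # hypothetical sales of family tubs

plt.stackplot(quarters, cups, cones, tubs, labels=["Cups", "Cones", "Tubs"])
plt.xlabel("Quarter")
plt.ylabel("Sales (thousand units)")
plt.title("Quarterly Sales by Pack Type")
plt.legend(loc="upper left")
plt.show()
```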

Pie charts: Another way of demonstrating the area or stratum or sectional representation is
through pie charts. The critical difference between a line chart and a pie chart is that the pie
chart cannot show changes over time; it simply shows the cross-section of a single time period.
There are certain rules that the researcher should keep in mind while creating pie charts (a plotting sketch follows this list):
 The complete data must be shown as 100 per cent of the area of the subject being graphed.
 It is a good idea to display the percentages within or above the pie rather than in the
legend, as it is then easier to understand the magnitude of each section in comparison to the total.
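A minimal matplotlib sketch of a pie chart that follows these rules, with hypothetical flavours and market shares, could be:

```python
# Minimal sketch of a pie chart following the rules above: the shares sum to
# 100 per cent and the percentages are written on the wedges, not in a legend.
# Flavour names and shares are hypothetical.
import matplotlib.pyplot as plt

flavours = ["Mango", "Vanilla", "Chocolate", "Others"]
share    = [35, 25, 30, 10]                         # adds up to 100 per cent

plt.pie(share, labels=flavours, autopct="%1.1f%%")  # per cent shown on each wedge
plt.title("Market Share by Flavour")
plt.show()
```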
Bar charts and histograms: Bar diagrams are a very useful representation of the quantum or
magnitude of different objects on the same parameter, as the comparative position of the objects
becomes very clear. The usual practice is to formulate vertical bars; however, horizontal bars
can be used as well if none of the variables is time related. Horizontal bars are
especially useful when one is showing both positive and negative patterns on the same graph.
These are called bilateral bar charts and are especially useful for highlighting the objects or
sectors showing a varied pattern on the studied parameter.

Another variation of the bar chart is the histogram. Here the bars are vertical and the height of
each bar reflects the relative or cumulative frequency of that particular variable.
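Minimal matplotlib sketches of a simple vertical bar chart and a histogram are shown below; the brand names, sales figures and ratings sample are invented for illustration.

```python
# Minimal sketch of a vertical bar chart and a histogram.
# Brand names, sales figures and the ratings sample are hypothetical.
import matplotlib.pyplot as plt

# Bar chart: magnitude of different objects on the same parameter
brands = ["Brand A", "Brand B", "Brand C"]
sales  = [310, 240, 180]
plt.bar(brands, sales)
plt.ylabel("Sales (Rs lakh)")
plt.title("Annual Sales by Brand")
plt.show()

# Histogram: the height of each bar reflects the frequency of values in that class
ratings = [3, 4, 4, 5, 2, 3, 4, 5, 5, 4, 3, 2, 4, 5, 3]
plt.hist(ratings, bins=4)
plt.xlabel("Customer rating")
plt.ylabel("Frequency")
plt.title("Distribution of Customer Ratings")
plt.show()
```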
Summary
Let us recapitulate the main points discussed in this unit:

 The most important task ahead of the researcher is to document the entire work done
in the form of a well-structured research report.
 The orientation and structure depend on the kind of report being constructed. Reports
could be brief or detailed, and academic, technical or business reports.
 Reports generally follow a standardized structure. The entire report can be divided
into three main sections: the preliminary section, the main body and the end notes.
 There must be no ambiguity in either the presentation or the representativeness of
the findings.
 Visual relief from the written text can be provided through figures, tables and graphs.

Keywords

 Annotated bibliography: A bibliography that includes brief explanations or notes for
each reference
 Bibliography: A standardized list of the complete details of the information sources and
works cited or consulted in the study
 Executive summary: The summary of the entire report, starting from the scope and
objectives of the study to the methodology employed and the results obtained,
presented in a brief and concise manner
 Letter of transmittal: The letter that broadly refers to the purpose behind the study
 Working paper: Report that is written for the purpose of recording the process
carried out in terms of scope and framework of the study, the methodology followed
and instrument designed
