Quantitative Techniques and Methods Notes
Quantitative Techniques-II
Objectives: To familiarize the students with different statistical techniques useful in conducting business research and then
applying the same in their business strategies.
Questionnaire designing: construction, types, developing a good questionnaire, mailed questionnaire and
schedule
5. Sampling design and techniques, Scaling techniques: meaning and types, sampling distribution, Data
7. Multiple Regression and Correlation Analysis: Least square regression plane, linear Multiple regression
Unit 12: Hypothesis Testing
CONTENTS
Objectives
Introduction
1.1 Quantitative Decision-making and its Overview
1.2 Meaning of Quantitative Techniques
1.3 Statistics and Operations Research
1.3.1 Types of Statistical Data
1.3.2 Classification of Statistical Methods
1.4 Models in Operations Research
1.5 Various Statistical Techniques
1.6 Advantages of Quantitative Approach to Management
1.7 Quantitative Techniques in Business and Management
1.8 Summary
1.9 Keywords
Objectives
Know the various models frequently used in operations research and the basis of their
classification;
Introduction
You may be aware of the fact that prior to the industrial revolution individual business was
small and production was carried out on a very small scale mainly to cater to the local needs.
The management of such business enterprises was very different from the present management
of large scale business. The information needed by the decision-maker (usually the owner) to
make effective decisions was much less extensive than at present. Thus, he used to make decisions
based upon his past experience and intuition only. Some of the reasons for this were:
The marketing of the product was not a problem because customers were, for the most
part, personally known to the owner of the business. There was hardly any competition in
the business.
Test marketing of the product was not needed because the owner used to know the choice
and requirement of the customers just by personal interaction.
The manager (also the owner) used to work with his workers on the shop floor. He
knew all of them personally as their number was small. This reduced the need for keeping
personal data.
The progress of the work was being made daily at the work centre itself. Thus production
records were not needed.
Any facts the owner needed could be learnt directly from observation and most of what he
required was known to him.
Now, in the face of increasing complexity in business and industry, intuition alone has no place
in decision-making because basing a decision on intuition becomes highly questionable when
the decision involves the choice among several courses of action each of which can achieve
several management objectives simultaneously.
Hence there is a need for training people who can manage a system both efficiently and creatively.
personnel. Today, these techniques are also widely used in regional planning, transportation,
Availability of high speed computers to apply quantitative techniques (or models) to real
life problems in all types of organizations such as business, industry, military, health, and
so on. Computers have played an important role in arriving at the optimal solution of
complex managerial problems both in terms of time and cost.
In spite of these reasons, the quantitative approach does not totally eliminate the
scope of the qualitative or judgment ability of the decision-maker. Rather, these techniques
complement the experience and knowledge of the decision-maker.
Self Assessment
2. The quantitative approach does not totally eliminate the scope of …………….ability of the
decision-maker.
Task Explain, with the help of examples, some of the important quantitative techniques
used in modern business and in industrial units.
The quantitative approach in decision-making requires that, problems be defined, analysed and
solved in a conscious, rational, systematic and scientific manner based on data, facts, information,
and logic and not on mere whims and guesses. In other words, quantitative techniques (tools or
methods) provide the decision-maker a scientific method based on quantitative data in identifying
a course of action among the given list of courses of action to achieve the optimal value of the
techniques is that numbers, symbols or mathematical formulae (or expressions) are used to
The word statistics can be used in a number of ways. Commonly it is described in two senses
namely:
1. Plural Sense (Statistical Data): The plural sense of statistics means some sort of statistical
data. When it means statistical data, it refers to numerical description of quantitative
aspects of things. These descriptions may take the form of counts or measurements.
Example: Statistics of students of a college include a count of the number of students, and
separate counts of various kinds, such as males and females, married and unmarried,
or undergraduates and post-graduates. They may also include such measurements as their heights
and weights.
2. Singular Sense (Statistical Methods): The large volume of numerical information (or
data) gives rise to the need for systematic methods which can be used to collect, organise
or classify, present, analyse and interpret the information effectively for the purpose of
making wise decisions. Statistical methods include all those devices of analysis and synthesis
by means of which statistical data are systematically collected and used to explain or
describe a given phenomenon. The above-mentioned five functions of statistical methods
are also called phases of a statistical investigation.
Methods used in analysing the presented data are numerous and range from simple to
sophisticated mathematical techniques.
As an illustration, let us suppose that we are interested in knowing the income level of the
people living in a certain city. For this we may adopt the following procedures:
(i) Data collection: The following data is required for the given purpose:
(a) Population of the city
(b) Number of individuals who are getting income
(c) Daily income of each earning individual
(ii) Organise (or condense) the data: The data so obtained should now be organised in
different income groups. This will reduce the bulk of the data.
(iii) Presentation: The organised data may now be presented by means of various types of
graphs or other visual aids. Data presented in an orderly manner facilitates statistical
analysis.
(iv) Analysis: On the basis of systematic presentation (tabular form or graphical form),
determine the average income of an individual and extent of disparities that exist.
This information will help to get an understanding of the phenomenon (i.e. income
of individuals).
(v) Interpretation: All the above steps may now lead to drawing conclusions which will
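The organising and analysis steps of the illustration above can be sketched in a few lines of Python. This is only a minimal sketch: the income figures, the group width of 200 and the names `organise`, `income_groups` and `disparity` are hypothetical, and the standard deviation is used here as just one possible measure of disparity.

```python
from statistics import mean, pstdev

# Hypothetical daily incomes (in rupees) of earning individuals; not taken from the text.
daily_incomes = [120, 150, 180, 200, 220, 250, 300, 320, 450, 800]
width = 200  # assumed width of each income group


def organise(incomes, group_width):
    """Step (ii): condense raw incomes into income groups of equal width."""
    groups = {}
    for income in incomes:
        lower = (income // group_width) * group_width
        groups[lower] = groups.get(lower, 0) + 1
    return dict(sorted(groups.items()))


income_groups = organise(daily_incomes, width)
average_income = mean(daily_incomes)   # step (iv): average income
disparity = pstdev(daily_incomes)      # step (iv): spread of incomes around the average

# Step (iii): a simple tabular presentation of the organised data.
for lower, count in income_groups.items():
    print(f"{lower}-{lower + width - 1}: {count} earners")
print(f"average = {average_income:.1f}, standard deviation = {disparity:.1f}")
```

The grouped table reduces ten raw figures to four classes, which is the condensation described in step (ii); the two summary numbers are what the interpretation step would draw on.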
Characteristics of Data
It is probably more common to refer to data in quantitative form as statistical data. But not all
numerical data is statistical. In order that numerical description may be called statistics they
must possess the following characteristics:
They must be an aggregate of facts; for example, single unconnected figures cannot be used
to study the characteristics of the phenomenon.
They should be affected to a marked extent by multiplicity of causes, for example, in social
services the observations recorded are affected by a number of factors (controllable and
uncontrollable).
Example: Collected data on price serve no purpose unless one knows whether he wants
to collect data on wholesale or retail prices and what are the relevant commodities in view.
They must be placed in relation to each other. That is, data collected should be comparable;
otherwise these cannot be placed in relation to each other, e.g. statistics on the yield of
crop and quality of soil are related but these yields cannot have any relation with the
statistics on the health of the people.
They must be numerically expressed. That is, any facts to be called statistics must be
numerically or quantitatively expressed. Qualitative characteristics such as beauty,
intelligence, etc. cannot be included in statistics unless they are quantified.
An effective managerial decision concerning a problem on hand depends on the availability and
reliability of statistical data. Statistical data can be broadly grouped into two categories:
The secondary data are those which have already been collected by another organisation and are
available in the published form. You must first check whether any such data is available on the
subject matter of interest and make use of it, since it will save considerable time and money. But
the data must be scrutinized properly since it was originally collected perhaps for another
purpose. The data must also be checked for reliability, relevance and accuracy.
A great deal of data is regularly collected and disseminated by international bodies such as:
World Bank, Asian Development Bank, International Labour Organisation, Secretariat of United
Nations, etc., Government and its many agencies: Reserve Bank of India, Census Commission,
Ministries-Ministry of Economic Affairs, Commerce Ministry; Private Research Organizations,
Trade Associations, etc.
When secondary data is not available or it is not reliable, you would need to collect original data
to suit your objectives. Original data collected specifically for a current research are known as
primary data. Primary data can be collected from customers, retailers, distributors, manufacturers
or other information sources. Primary data may be collected through any of the three methods:
Data are also classified as micro and macro. Micro data relate to a particular unit or region
Operations Research
You would recall that in Operations Research a mathematical model is constructed to represent
the situation under study. This helps in two ways: either to predict the performance of the
system under certain controls, or to determine the action or control needed to optimise
performance.
By now you may have realised that effective decisions have to be based upon realistic data. The
field of statistics provides the methods for collecting, presenting and meaningfully interpreting
the given data. Statistical Methods broadly fall into three categories as shown in the following
chart.
There are statistical methods which are used for re-arranging, grouping and summarising sets
of data to obtain better information about facts and thereby a better description of the
situation.
Example: Changes in the price index, yield of wheat, etc. are frequently illustrated
using different types of charts and graphs.
These devices summarise large quantities of numerical data for easy understanding. Various
types of averages can also reduce a large mass of data to a single descriptive number. The
descriptive statistics include the methods of collection and presentation of data, measures of
central tendency and dispersion, trends, index numbers, etc.
Inductive Statistics
It is concerned with the development of some criteria which can be used to derive information
about the nature of the members of entire groups (also called population or universe) from the
nature of the small portion (also called sample) of the given group. The specific values of the
population members are called ‘parameters’ and those of the sample are called ‘statistics’. Thus,
inductive statistics is concerned with estimating population parameters from the sample statistics
Samples are drawn instead of a complete enumeration for the following reasons:
The population units may be too many in number and/or widely dispersed. Thus complete
enumeration is extremely time consuming and at the end of a full enumeration so much
time is lost that the data becomes obsolete by that time.
Statistical decision theory deals with analysing complex business problems with alternative
courses of action (or strategies) and possible consequences. Basically, it is to provide more
concrete information concerning these consequences, so that best course of action can be identified
from alternative courses of action.
Statistical decision theory relies heavily not only upon the nature of the problem on hand, but
also upon the decision environment. Basically there are four different states of decision
environment as given below:
Since statistical decision theory also uses probabilities (subjective or prior) in analysis,
it is also called a subjectivist approach. It is also known as the Bayesian approach because Bayes'
theorem is used to revise prior probabilities in the light of additional information.
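The revision of prior probabilities mentioned above can be illustrated with a small sketch. The two states of demand, the prior values and the survey likelihoods below are hypothetical figures chosen only to show the arithmetic; `revise_priors` is an illustrative helper name.

```python
def revise_priors(priors, likelihoods):
    """Return posterior probabilities P(state | evidence) using Bayes' theorem."""
    # Joint probability of each state and the observed evidence.
    joint = {state: priors[state] * likelihoods[state] for state in priors}
    # Total probability of the evidence across all states.
    evidence = sum(joint.values())
    return {state: joint[state] / evidence for state in joint}


# Hypothetical prior (subjective) probabilities of two demand states.
priors = {"high demand": 0.6, "low demand": 0.4}
# Hypothetical P(favourable survey result | state).
likelihoods = {"high demand": 0.8, "low demand": 0.3}

posteriors = revise_priors(priors, likelihoods)
for state, probability in posteriors.items():
    print(f"P({state} | favourable survey) = {probability:.2f}")
```

Under these assumed figures, a favourable survey raises the probability of high demand from the prior 0.6 to about 0.8, which is the sense in which additional information revises the decision-maker's beliefs.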
Self Assessment
In this Section we present several classifications of OR models so that you know
more about the role of models in decision-making:
Purpose
A Model is the representation of a system which, in turn, represents a specific part of reality
(an object of interest or subject of inquiry in real life). The means of representing a system may
Through all these means, an attempt is made to abstract the essence of reality, which in turn, is
quite helpful to describe, explain and predict the behaviour of the system. Thus, depending upon
the purpose, the stage at which the model is developed, models can be classified into four
categories.
1. Descriptive model: Such Models are used to describe the behaviour of a system based on
certain information.
Example: A model can be built to describe the behaviour of demand for an inventory
item for a stated period, by keeping the record of various demand levels and their respective
frequencies.
A descriptive model is used to display the problem situation more vividly including the
alternative choices to enable the decision-maker to evaluate results of each alternative
choice. However, such a model does not select the best alternative.
2. Explanatory model: Such models are used to explain the behaviour of a system by
establishing relationships between its various components.
Example: A model can be built to predict stock prices (within an industry group), for
any given level of earnings per share.
4. Prescriptive (or normative) model: A prescriptive model is one which provides the norms
for the comparison of alternative solutions which result in the selection of the best
alternative (the most preferred course of action).
Degree of Abstraction
The following chart shows the classification of models according to the degree of abstraction:
Any three-dimensional model that looks like the real thing but is either reduced in size or scaled
up, is a physical (iconic) model. These models include city planning maps, plant layout charts,
plastic model of airplane, body parts, etc. These models are easy to observe, build and describe,
flow chart (or diagram) depicting the sequence of activities during the complete processing of a
product is an example of schematic model. Another example of schematic model is the computer
programme where main features of the programme are represented by a schematic description
of steps.
Analog models are closely associated with iconic models. However, they are not replicas of
problem situations. Rather they are small physical systems that have similar characteristics and
work like the object or system they represent.
Degree of Certainty
Models can also be classified according to the degree of assumed certainty. Under this classification
models are divided into deterministic versus probabilistic models.
Models in which selection of each course of action (or strategy) results in unique and known
pay-off or consequence are called deterministic models.
Models in which each course of action (or strategy) can result in more than one pay-off or
consequence are called probabilistic models. Since the concept of probability is used in such
models, the pay-off or consequence of a managerial action cannot be predicted with
certainty.
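The contrast between the two kinds of models can be sketched as follows. The pay-offs and probabilities are hypothetical, and summarising a probabilistic course of action by its expected pay-off is only one common convention, assumed here for illustration.

```python
def expected_payoff(outcomes):
    """outcomes: list of (probability, pay-off) pairs for one course of action."""
    total_probability = sum(probability for probability, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities of the outcomes must sum to 1")
    return sum(probability * payoff for probability, payoff in outcomes)


# Deterministic model: the action leads to one known pay-off.
deterministic_action = [(1.0, 5000)]
# Probabilistic model: the same kind of action can lead to several pay-offs.
probabilistic_action = [(0.5, 9000), (0.3, 4000), (0.2, -2000)]

print(expected_payoff(deterministic_action))
print(expected_payoff(probabilistic_action))
```

In the deterministic case the single pay-off is known in advance; in the probabilistic case only a weighted average can be stated, while the pay-off actually realised remains uncertain.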
The following chart describes the classification of models based on specified behaviour
characteristics. Such type of classification helps in understanding the nature and role of models
in representing management and economic status of organisations.
(Source: Loomba, M.P. 1978. Management-A Quantitative Perspective; Macmillan Publishing Co.: New York)
The models that are concerned with a particular set of fixed conditions and do not change in a
short-term period (or planning period) are known as static models. This implies that such
models are independent of time and only one decision is required for a given time period.
Notes The resources required for a product and the technology or manufacturing process
do not change in short-term period.
Linear programming is a particular example of static models. On the other hand, there are
certain types of problems where the time factor plays an important role and which admit the
impact of changes over a period of time. In all such situations the decision-maker has to make a
sequence of optimal decisions at every decision point (i.e. variable time) regardless of what the
prior decision has been. The problem of product development in which the decision-maker has to
make decisions at every decision point such as product design, test-market, full-scale production,
etc. is an example of a dynamic model. Dynamic programming is a particular example of a dynamic model.
Linear Models are those in which each component exhibits a linear behaviour. The word ‘linear’
is used to describe the relationship among two or more variables which are directly proportional.
For example, if our resources increase by some percentage, then the output increases by
the same percentage.
If one or more components of a model exhibit a non-linear behaviour, then such models are
called non-linear models.
A mathematical model of the form $Z = 5x + 3y$ is called a linear model whereas a model of the form
$Z = 5x^2 + 3xy + y^2$ is called a non-linear model.
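The proportionality property can be checked numerically. The sketch below takes the linear model as Z = 5x + 3y, which is an assumed reading of the formula above, and uses hypothetical input values; the function names are illustrative only.

```python
def linear_model(x, y):
    """Assumed linear form: Z = 5x + 3y."""
    return 5 * x + 3 * y


def non_linear_model(x, y):
    """Non-linear form: Z = 5x^2 + 3xy + y^2."""
    return 5 * x**2 + 3 * x * y + y**2


x, y, k = 2, 4, 2  # hypothetical inputs and a scaling factor

# Doubling both inputs doubles the output of the linear model ...
print(linear_model(x, y), linear_model(k * x, k * y))
# ... but multiplies the output of this non-linear model by k squared.
print(non_linear_model(x, y), non_linear_model(k * x, k * y))
```

Scaling the resources by a factor k scales the linear output by exactly k, whereas the quadratic model responds by a factor of k squared, so the output is no longer directly proportional to the inputs.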
The type of procedure used to derive solutions to mathematical models divides them into two
Example: We can test the effect of different number of service counters assuming different
arrival rates of customers on total cost of providing service to customers.
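An experiment of this kind might be sketched as below. Everything in it is an assumption made for illustration: at most one customer arrives per minute with a given probability, every service takes a fixed number of minutes, and the cost figures are hypothetical. It is not a model taken from the text.

```python
import random


def simulate_total_cost(counters, arrival_prob, minutes=480, service_minutes=4,
                        counter_cost=200.0, waiting_cost=1.0, seed=0):
    """Simulate one working day and return counter cost plus customer waiting cost."""
    rng = random.Random(seed)        # fixed seed so that runs are repeatable
    busy_until = [0] * counters      # minute at which each counter becomes free
    total_wait = 0
    for minute in range(minutes):
        if rng.random() < arrival_prob:   # at most one arrival per minute
            # Send the customer to the counter that becomes free earliest.
            idx = min(range(counters), key=lambda i: busy_until[i])
            start = max(minute, busy_until[idx])
            total_wait += start - minute
            busy_until[idx] = start + service_minutes
    return counters * counter_cost + waiting_cost * total_wait


for arrival_prob in (0.2, 0.4):
    for counters in (1, 2, 3):
        cost = simulate_total_cost(counters, arrival_prob)
        print(f"arrival probability {arrival_prob}, {counters} counter(s): total cost {cost:.0f}")
```

Comparing the printed totals for each arrival rate shows the trade-off the example describes: extra counters add a fixed cost but reduce the cost of customers waiting.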
The following table summarises our discussion on classification of models.
You have read about certain standard techniques or prototype models of operations research
which can be helpful to a decision-maker in solving a variety of problems.
A brief comment on certain standard techniques of statistics which can be helpful to a decision-
maker in solving problems is given below. However, each one of these techniques requires
detailed studies and in our context we are merely listing these to arouse your interest.
(i) Measures of Central Tendency: For proper understanding of quantitative data,
they should be classified and converted into a frequency distribution (number of times or
frequency with which a particular value occurs in the given mass of data). This type of
condensation of data reduces their bulk and gives a clear picture of their structure. If you
want to know any specific characteristics of the given data or if the frequency distribution of
one set of data is to be compared with another, then it is necessary that the frequency
distribution itself be summarized and condensed in such a manner that it helps
us to make useful inferences about the data and also provides a yardstick for comparing
different sets of data. Measures of average or central tendency provide one such yardstick.
(a) Mean: The mean is the common arithmetic average. It is computed by dividing the
(b) Median: The median is that item which lies exactly half-way between the lowest and
The measures of the direction and degree of asymmetry are called measures of skewness.
Another characteristic of the frequency distribution is the shape of the peak, when it is
plotted on a graph paper.
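The averages introduced under measures of central tendency can be computed directly with Python's standard `statistics` module. The marks below are hypothetical data, and comparing the mean with the median is used only as a rough indication of the direction of skewness, not as a formal measure.

```python
from statistics import mean, median, mode

# Hypothetical marks obtained by seven students.
marks = [40, 45, 45, 50, 55, 60, 90]

average = mean(marks)      # arithmetic mean: total of the values divided by their number
middle = median(marks)     # middle item when the values are arranged in order
most_common = mode(marks)  # value that occurs most frequently

print(f"mean = {average}, median = {middle}, mode = {most_common}")

# A mean above the median suggests a longer tail on the higher side.
if average > middle:
    print("the data appear positively skewed")
```

Here the single high mark of 90 pulls the mean above the median, which is why the three averages can give different pictures of the same data.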
(iii) Correlation: Correlation coefficient measures the degree to which the change in one
variable (the dependent variable) is associated with change in the other variable
(independent one). For example, as a marketing manager, you would like to know if there
is any relation between the amount of money you spend on advertising and the sales you
achieve. Here, sales are the dependent variable and advertising budget is the independent
variable. Correlation coefficient, in this case, would tell you the extent of relationship
between these two variables, whether the relationship is directly proportional (i.e. increase
or decrease in advertising is associated with increase or decrease in sales) or it is an
inverse relationship (i.e. increasing advertising is associated with decrease in sales and
Caution Correlation coefficient does not indicate a causal relationship. Sales is not a direct
result of advertising alone, and there are many other factors which affect sales.
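For the advertising and sales illustration, the correlation coefficient can be computed as in the following sketch. The figures are hypothetical, and the formula used is the usual product-moment (Pearson) coefficient, which is assumed to be the measure intended here.

```python
from math import sqrt


def correlation(xs, ys):
    """Product-moment correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(variation_x * variation_y)


# Hypothetical advertising budgets and sales for five periods.
advertising = [10, 20, 30, 40, 50]
sales = [25, 45, 50, 70, 85]

r = correlation(advertising, sales)
print(f"r = {r:.3f}")
```

A value close to +1 indicates that the two series move together, and a value close to -1 that they move in opposite directions; in neither case does the coefficient by itself show that advertising causes the change in sales.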
(iv) Regression Analysis: For determining causal relationship between two variables you
may use regression analysis. Using this technique you can predict the dependent variables
on the basis of the independent variables. In 1970, NCAER (National Council of Applied
Economic Research) predicted the annual stock of scooters using a regression model
in which real personal disposable income and relative weighted price index of scooters
were used as independent variables.
The correlation and regression analysis are suitable techniques to find relationship between
two variables only. But in reality you would rarely find a one-to-one causal relationship;
rather you would find that the dependent variables are affected by a number of independent
variables.
Example: Sales are affected by the advertising budget, the media plan, the content of the
advertisements, number of salesmen, price of the product, efficiency of the distribution network
and a host of other variables.
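For the simplest case of one independent variable, a least squares regression line can be fitted as in the sketch below. The data are hypothetical and the helper name `fit_line` is illustrative; with several independent variables, as in the example above, the same idea extends to a regression plane.

```python
def fit_line(xs, ys):
    """Least squares estimates (a, b) for the line y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical advertising budgets (independent) and sales (dependent).
advertising = [10, 20, 30, 40, 50]
sales = [25, 45, 50, 70, 85]

a, b = fit_line(advertising, sales)
predicted_sales = a + b * 60   # prediction for a budget outside the observed data
print(f"sales = {a:.2f} + {b:.2f} * advertising; prediction at 60 = {predicted_sales:.1f}")
```

The fitted line gives a prediction of the dependent variable for any chosen value of the independent variable, though predictions far outside the observed range should be treated with caution.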
(v) Time Series Analysis: A time series consists of a set of data (arranged in some desired
manner) recorded either at successive points in time or over successive periods of time.
The changes in such type of data from time to time are considered as the resultant of the
combined impact of a force that is constantly at work. This force has four components:
(i) Editing time series data, (ii) secular trend, (iii) periodic changes, cyclical changes and
seasonal variations, and (iv) irregular or random variations. With time series analysis,
you can isolate and measure the separate effects of these forces on the variables. Examples
of these changes can be seen, if you start measuring increase in cost of living, increase of
population over a period of time, growth of agricultural food production in India over the
last fifteen years, seasonal requirement of items, and impact of floods, strikes, and wars
and so on.
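One common way of isolating the trend from seasonal ups and downs is a moving average. The text does not prescribe this method, so the sketch below is an assumption made for illustration, using hypothetical quarterly sales with a repeating seasonal pattern.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive observations."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


# Hypothetical quarterly sales over three years.
quarterly_sales = [10, 14, 18, 12, 12, 16, 20, 14, 14, 18, 22, 16]

# Averaging over four quarters smooths out the seasonal variation.
trend = moving_average(quarterly_sales, window=4)
print(trend)
```

Because each average covers a full year, the seasonal pattern cancels out and what remains is a steadily rising series, which corresponds to the secular trend.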
(vii) Index Numbers: An index number is a relative number that is used to represent the net result
of change in a group of related variables that has come about over a period of time. Index
numbers are stated in the form of percentages.
Example: If we say that the index of prices is 105, it means that prices have gone up by 5%
as compared to a point of reference, called the base year. If the prices of the year 1985 are
compared with those of 1975, the year 1985 would be called the “given or current year” and the year
1975 would be termed the “base year”. Index numbers are also used in comparing changes in
production, sales, price, volume, employment, etc. over a period of time, relative to a base.
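The index of 105 in the example can be reproduced with a simple calculation. The prices below are hypothetical, and the simple aggregative formula (total of current prices over total of base prices) is only one of several ways of constructing an index, assumed here for illustration.

```python
def price_index(current_prices, base_prices):
    """Simple aggregative index: total current price as a percentage of total base price."""
    return 100 * sum(current_prices) / sum(base_prices)


# Hypothetical prices of three commodities in the base year and the given year.
base_1975 = [20, 50, 30]
given_1985 = [22, 51, 32]

index = price_index(given_1985, base_1975)
percentage_change = index - 100
print(f"index = {index:.0f}, change over base year = {percentage_change:.0f}%")
```

The base year always has an index of 100, so the amount by which the index exceeds or falls short of 100 is read directly as the percentage change.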
(viii) Sampling and Statistical Inference: In many cases, due to shortage of time, cost or
non-availability of data, only a limited part or section of the universe (or population) is examined
to (i) get information about the universe as clearly and precisely as possible, and
(ii) determine the reliability of the estimates. This small part or section selected from the
universe is called the sample and the process of selecting such a section (or part) is called
sampling.
Scheme of drawing samples from the population can be classified into two broad categories:
(a) Random sampling schemes: In these schemes drawing of elements from the population
is random and selection of an element is made in such a way that every element has
equal chance (probability) of being selected.
(b) Non-random sampling schemes: In these schemes, drawing of elements from the
population is based on the choice or purpose of selector.
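The two schemes can be contrasted in a short sketch. The population of 100 numbered units is hypothetical, and taking the first ten units stands in for any selection made at the choice of the selector.

```python
import random

# Hypothetical universe of 100 numbered units.
population = list(range(1, 101))

# Random scheme: every unit has an equal chance of being selected.
rng = random.Random(42)   # fixed seed only so that the sketch is repeatable
random_sample = rng.sample(population, k=10)

# Non-random scheme: units chosen by the selector (here, simply the first ten).
purposive_sample = population[:10]

population_mean = sum(population) / len(population)       # a parameter
random_mean = sum(random_sample) / len(random_sample)     # a statistic
purposive_mean = sum(purposive_sample) / len(purposive_sample)

print(f"population mean = {population_mean}")
print(f"random sample mean = {random_mean}, purposive sample mean = {purposive_mean}")
```

The purposive sample here is drawn entirely from one end of the population, so its mean of 5.5 is far from the population mean of 50.5; a random sample is not guaranteed to be close either, but it gives every unit the same chance of selection.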
Sampling analysis, through the use of various tests based on the normal (Z) distribution, Student's
t-distribution, the F-distribution and the χ²-distribution, makes it possible to derive inferences about
population parameters with a specified level of significance and given degrees of freedom. You
will read about a number of tests in this block to derive inference about population parameters.
Self Assessment
8. Regression measures the degree to which the change in one variable (the dependent
variable) is associated with change in the other variable (independent one).
9. Index number is a relative number that is used to represent the net result of change in a
group of related variables that has come about over a period of time.
Executives at all levels in business and industry come across the problem of making decisions at
every stage in their day-to-day activities. Quantitative techniques provide the executive with a
scientific basis for decision-making and enhance his ability to make long-range plans and to
solve everyday problems of running a business and industry with greater efficiency and
confidence.
Let us now also look at some of the advantages of the study of statistics:
Example: The statement that “literacy rate as per 1981 census was 36% compared to 29%
for 1971 census” is more convincing than stating simply that “literacy in our country has increased”.
2. Condensation: The raw data is often unwieldy and complex. The purpose of statistical
methods is to simplify large mass of data and to present meaningful information from
them.
Example: It is difficult to form a precise idea about the income position of the people of
India from the data of individual income in the country. The data will be easier to understand and
more precise if it can be expressed in the form of per capita income.
past and present results with a view to ascertaining the reasons for change which have
taken place and the effect of such changes in the future. Thus, if one wants to appreciate the
significance of figures, then he must compare them with others of the same kind.
Example: The statement “per capita income has increased considerably” shall not be
meaningful unless some comparison of figures of past is made. This will help in drawing
conclusions as to whether the standard of living of people of India is improving.
4. Formulation of policies: Statistics provides the basic material for framing policies not
only in business but in other fields also.
Example: Data on birth and mortality rates not only help in assessing future growth in
population but also provide necessary data for framing a scheme of family planning.
5. Formulating and testing hypothesis: Statistical methods are useful in formulating and
testing hypothesis or assumption or statement and to develop new theories.
Example: The hypothesis: “whether a student has benefited from a particular medium of
instruction”, can be tested by using appropriate statistical method.
6. Prediction: For framing suitable policies or plans, and then for implementation it is
necessary to have the knowledge of future trends. Statistical methods are highly useful for
forecasting future events.
Example: For a businessman to decide how many units of an item should be produced in
the current year, it is necessary for him to analyse the sales data of the past years.
Management
(i) Marketing:
(a) Analysis of marketing research information
(b) Statistical records for building and maintaining an extensive market
(c) Sales forecasting
(ii) Production:
(a) Production Planning, control and analysis
(b) Evaluation of machine performance
(c) Quality control requirements
(d) Inventory control measures
(iii) Finance, Accounting and Investment:
Economics
Diagnosing the disease based on data like temperature, pulse rate, blood pressure etc.
Task Think of any major decision you made recently. Recall the steps taken by you to
arrive at the final decision. Prepare a list of those steps.
Self Assessment
10. Statements of facts conveyed numerically are more precise and convincing than those
stated qualitatively.
11. Statistical methods are useful in formulating and testing hypothesis or assumption or
statement and to develop new theories.
12. Statistical methods are not useful for forecasting future events.
1.8 Summary
Quantitative techniques refer to the group of statistical and operations research (or
programming) techniques.
The word statistics can be used in a number of ways. Commonly it is described in two
senses namely: Plural Sense (Statistical Data) and Singular Sense (Statistical Methods).
The field of statistics provides the methods for collecting, presenting and meaningfully
interpreting the given data.
Depending upon the purpose, models can be classified into four categories: Descriptive
model, Explanatory model, Predictive model and Prescriptive (or normative) model.
The main three types of averages commonly used are: Mean, Median, Mode.
Quantitative techniques provide the executive with scientific basis for decision-making
and enhance his ability to make long-range plans and to solve everyday problems of
running a business and industry with greater efficiency and confidence.
1.9 Keywords
Descriptive Models: Models which are used to describe the behaviour of a system based on data.
Descriptive Statistics: It is concerned with the analysis and synthesis of data so that better
description of the situation can be made.
Explanatory Models: Models which are used to explain behaviour of a system by establishing
relationships between its various components.
Inductive Statistics: It is concerned with the developments of scientific criteria which can be
used to derive information about the group of data by examining only a small portion (sample)
of that group.
Predictive Models: Models which are used to predict the status of a system in the near future
based on data.
Quantitative Techniques: It is the name given to the group of statistical and operations research
(or programming) techniques.
Statistical Data: It refers to numerical description of quantitative aspects of things. These
descriptions may take the form of counts or measurement.
Statistical Decision Theory: It is concerned with the establishment of rules and procedures for
choosing the course of action from alternative courses of actions under situation of uncertainty.
Statistical Methods: These methods include all those devices of analysis and synthesis by means
of which statistical data are systematically collected and used to explain or describe a given
phenomenon.
1. Quantitative techniques 2. Qualitative or judgment
Books Gupta, S.P. and M.P. Gupta, 1987. Business Statistics, Sultan Chand & Sons: New
Delhi.
Loomba, M.P., 1978. Management-A Quantitative Perspective, Macmillan
Publishing Company: New York.
Shenoy, G.V., U.K. Srivastava and S.C. Sharma, 1985. Quantitative Techniques
for Managerial Decision Making, Wiley Eastern: New Delhi.
Venkata Rao, K., 1986. Management Science, McGraw-Hill Book Company:
Singapore.
www.scribd.com/doc
www.soas.ac.uk
www.web-source.net
CONTENTS
Objectives
Introduction
2.1 Meaning of Research
2.1.1 Definition of Research
2.2 Features of a Good Research Study
2.3 Scope and Significance of Research
2.4 Goals, Strategy and Tactics of Research
2.5 Internal and External Research Suppliers
2.5.1 External Organizations for Conducting Marketing Research
2.6 Marketing Research – A Definition
2.6.1 Explanation
Objectives
Introduction
First is Variable. What exactly is a variable? A variable is the quantity in which we are interested,
that varies in the course of the research or that has different values for different samples in
our study. In short, we can define a variable as a factor whose change or difference we study.
Now, there are two types of variables. The first one is Dependent variable and the second is
Independent variable. Dependent variable is that quantity or aspect of nature whose change at
different stages the researcher wants to understand or explain. In cause and effect investigation,
Now, what exactly is an independent variable? Independent variable is a variable, whose effect
upon the dependent variable we try to understand. There may be several independent variables.
For instance, we may simultaneously investigate the effect of mother's cigarette smoking, mother's
exercise, parents' weights and other variables upon the weight of a baby. In this case, mother's
cigarette smoking, mother's exercise, parents' weights and other variables are the independent variables
whose effect we want to study, and the weight of the baby is the dependent variable.
Now, there are certain other concepts. One such concept is the Universe. We can define it as the total
population. It is the laboratory for the research. In our research we may take the
entire population of India as the universe. Obviously, no researcher can carry out research on
the entire population of India to find out the truth about his area of research
interest. In some cases, the universe or population may be a particular group. To clarify this point
further, let us assume we want to study some effect on a particular group of people, such as a
religious group (Hindu, Muslim, Christian, Jain, etc.) or a particular age group, say 25 to
35 years. In that case our universe is limited to that particular religion or that particular
age group. Even then, carrying out research on the universe, i.e., on
the entire population of that particular group, may not always be possible because of the time
and money involved. What we usually do in that case is to draw samples
from the entire population, using the available sampling techniques.
Apart from these, we need to clarify certain other concepts.
Empirical means the observations and propositions which are primarily based on some sense
experiments or derived from experience by methods of inductive logic including mathematics
and statistics. This technical definition is difficult to understand. To be clearer we can define
empirical research as that type of research where we try to deduce some logic and principles
based on our survey reports. In other words, when we want to analyze the survey report using
some mathematical and statistical tools and deduce logic to authenticate our findings, we follow
the empirical research method.
Research has been defined by various authors in different ways. It always begins with a question
or a problem. Its purpose is to find answers to questions through the application of systematic
and scientific methods. Thus, research is the systematic approach towards purposeful
investigation. This needs formulating a hypothesis, collection of data on relevant variables,
analyzing and interpreting the results and reaching conclusions either in the form of a solution
or certain generalizations.
Notes Research is an academic activity and a systematized effort to gain new knowledge.
Research in common man's language refers to "search for knowledge".
Research is an art of scientific investigation. It is also the systematic design, collection, analysis
and reporting of findings and solutions for the marketing problems of a company. Research
Example: Should we maintain the same advertising budget as last year? Research will
provide an answer to this question.
To find alternative strategies
Example: Should we follow pull strategy or push strategy to promote the product?
To develop new concepts
Research may be defined as a documented prose work. Documented prose work means organized
analysis of the subject based on borrowed materials with suitable acknowledgement and
consultation in the main body of the paper. Research in management is particularly important
to find out different phenomena.
Caution At the outset we should distinguish between research in different areas.
Management research comes within the purview of social science research and there are other
different types of research which broadly fall into the category of physical science research.
Carrying out research in social science subjects, i.e., Commerce, Management, Economics,
Sociology, etc., is basically different from research in Physical Science because here we need to study
society based on certain trends, and for this the laboratory is society itself.
Self Assessment
Following are the features of a good research study:
i. Objectivity: A good research is objective in the sense that it must answer the research
questions. This necessitates the formulation of a proper hypothesis; otherwise there may
ii. Control: A good research must be able to control all the variables. This requires
randomization at all stages, e.g., in selecting the subjects, the sample size and the
experimental treatments. This shall ensure an adequate control over the independent
variables.
iii. Generalisability: We should be able to have almost the same result by using an identical
methodology so that we can apply the result to similar situations.
iv. Free from Personal Biases: A good research should be free from the researcher's personal
biases and must be based on objectivity and not subjectivity.
v. Systematic: A good research study must have various well planned steps, i.e., all steps
must be interrelated and one step should lead to another step.
vi. Reproducible: A researcher should be able to get approximately the same results by using
an identical methodology by conducting investigation on a population having
characteristics identical to the one in the earlier study.
Hence, the following points must be ensured:
Purpose clearly detailed
Research design thoroughly planned
High ethical standards applied
Limitations frankly revealed
A complete and proper analysis
Findings presented unambiguously
ii. Facilitates large-scale production: The MR helps large scale enterprises in the areas of
production to determine:
(a) What to produce?
availability of consumer credit in that particular place.
MR helps the marketer to identify:
iv. Complex market: In a complex and dynamic environment, the role of MR is very vital. MR
acts as a bridge between the consumer and the producer. This is because MR enables the
management to know the needs of the customer and the demand for the product, and
helps the producer to anticipate changes in the market.
v. Problem-solving: The MR focuses on both short range and long range decisions and helps
in making decisions with respect to the 4p's of marketing, namely, product, price, place
and promotion.
vi. Distribution: The MR helps the manufacturer to decide about the channel, media, logistics
planning so that its customers and distributors are benefited. Based on the study of MR,
suitable distributors, retailers, wholesalers and agents are selected by the company for
distributing their products.
vii. Sales promotion: The MR helps in effective sales promotion. It enlightens the manufacturer
with regard to the method of sales promotion to be undertaken, such as advertising,
personal selling, publicity etc. It also helps in understanding the attitude of the customers
and in designing the advertisement in line with prevailing attitudes.
Self Assessment
Many researchers agree that the goals of scientific research are: description, prediction, and
explanation/understanding. Some individuals add control and application to the list of goals.
The goal of research is to find answers to questions through the application of systematic
and scientific methods.
Though there is a specific purpose behind each research study, the objectives can be
broadly classified as under:
Marketing research can be conducted by having:
(a) Internal marketing departments in the organisation
or
(b) By taking the help of external agencies such as ORG, Marg, AC Nielsen etc.
The type of organisation selected for market research depends on how big an organisation is
and the varied type of products manufactured etc. There are two types of internal departments.
or
(b) A full-fledged market research department where several employees are involved.
Small companies may not be able to afford a full-fledged MR department. They may appoint
one or two persons to conduct MR and report the results to the head of the company. In medium
sized firms, MR reports are collected by head of the marketing department. In larger firms, an
independent marketing research department is established on a permanent basis and an
experienced person is appointed as the head of marketing department.
The marketing department usually has executives, secretaries, assistants and others. The
marketing department can function either on a centralized or decentralized basis. The centralized
marketing department has the advantage of good coordination with various departments. On
the other hand, decentralized marketing departments score in gaining valuable localised knowledge
regarding markets and products in their respective areas.
The advertising agencies conduct marketing research for their clients. Ad agencies undertake
media studies, group research etc. The ad agencies also conduct image opinion research, market
potential research etc.
Trade Associations
Trade associations also conduct MR. For instance, the Confederation of Engineering Industries
conducts MR for various engineering products. In India, many consumer goods manufacturers
such as Godrej, P&G, HLL have their own MR organizations.
Manufacturers
Manufacturers of smaller industries join together and undertake marketing research for their
mutual benefit.
Example: Textile companies have the Textile Manufacturers Association, which conducts
research on potential for garments made in the country.
The retailers have been predominantly concerned with shop location studies, special promotion
studies, pricing, retail stores investigation and sales research and so on. The retailers also conduct
a study on consumer behaviour and attitudes towards a particular product. The wholesalers are
interested in conducting MR on retailers' behaviour. They are also interested in learning about
the attitudes of the retailers towards inventories, service provided by the wholesalers etc. The
latter category of researchers is fewer in number.
Government Agencies
MR is also carried out by a few government agencies. The government departments collect
information on subjects such as agricultural market surplus, consumer goods market surplus,
price indices, imports and exports, etc. This helps them in formulating policies.
Example: The institutes like the IIMs, IIFT engage themselves in doing marketing research
for certain corporate entities.
Self Assessment
2.6 Marketing Research – A Definition
(c) Fact finding method, used for the purpose of important decision-making and to regulate
2.6.1 Explanation
From the above definition, it becomes clear that MR is the collection and interpretation of facts
that help marketing management to deliver products and services more efficiently into the
hands of consumers. It includes various types of research such as:
Scientific research is one which yields the same results when repeated by different individuals.
Scientific method consists of the following steps:
1. Observation: The researcher wants to observe a set of important factors that is related to
his problem.
2. Formulates Hypothesis: The researcher formulates a hypothesis which will explain what
he has observed.
3. Future Prediction: The researcher draws a logical conclusion.
4. Testing the Hypothesis: The researcher will arrive at the conclusion based on data.
Example: A simple example will highlight how a scientific method works.
Let us assume that a researcher is conducting a market research for a client manufacturing
men's apparel.
1. Observation: The researcher observes that some of the competitors are doing brisk
business. The increase in sales of apparel is mainly due to round or turtleneck shirts
and narrow bottom pants.
2. Formulation of Hypothesis: The researcher now presumes that the products of his client
are somewhat similar and that the variation in shirt and pant variety described above is the
main cause of the increase in the sales of his competitors.
3. Future Prediction: It is predicted that if his client introduces similar products, the
sales will increase.
4. Hypothesis Testing: The client now produces round-neck shirts and narrow bottom
pants for test marketing.
2.7.1 Characteristics of Scientific Method
(a) Validity
(b) Reliability.
As long as the questionnaire serves this purpose, we say that the instrument is valid.
In physical sciences, the instruments used, such as a barometer, thermometer or foot ruler,
measure what they are meant to measure. Also, the measurement can be repeated any number of
times by different individuals, but the result will be the same.
In Marketing Research, the instrument used is a questionnaire. There are five main problems
faced by researcher regarding validity and reliability:
1. Different respondents interpret the same question in different manner. So the reply of the
respondents will be different.
Did u know? Reliability implies that we must obtain similar results when the measurement is repeated.
Example: Linear measurements using a foot ruler, and the velocity of light and sound in a given
medium, will be the same when measured repeatedly.
The conclusions should be based on facts. Our mindsets should not influence the decision-
making.
Example: When the Hawthorne studies began, it was thought that "employee satisfaction
has improved productivity". Later research proved otherwise. In fact, subsequent research
showed that productivity and employee satisfaction are not directly related.
Similarly, in marketing research, the researcher should not proceed with pre-conceived notions.
He must keep an open mind and be objective. Sometimes, researchers approach the respondents,
who are easy to reach, and with whom they are comfortable even though they may not represent
the true sample. In this case, the objectivity is sacrificed.
Accuracy
Accuracy is possible through the use of scientific instruments, because the measuring instrument is
valid and reliable. In marketing research, a questionnaire is used to measure aspects such
as attitude, preference etc., but this instrument is crude.
It is difficult to judge whether the respondent is answering correctly. Due to all these factors,
accuracy is often sacrificed.
Science is marked by continuity. This is because every time there is an invention, it is
carried forward and improved further.
Example: Basic telephony vs Latest mobile phones, early steam engines vs electrically
driven engines.
In marketing research, there is less continuity. The present researcher does not start from where
it was left off. Each project is independent. What is learnt in one assignment is not made use of
in subsequent projects.
Due to all the above three reasons, we can conclude that marketing research is not scientific.
Role of investigators
Inaccuracy of measuring instruments
Influence of measurement
Pressures of time-frame
Testing of hypothesis
Role of Investigators
Organisations are the clients of researchers. Sometimes, the investigator tries to fit in results
which are readily acceptable to clients. This is possible when the investigator manipulates the
data or does not conduct an exhaustive study. In either of these circumstances, the study becomes
unscientific.
Accuracy of measurement separates scientific and unscientific methods. Since human beings are
the participants, subjectivity invariably creeps in. Most of the information obtained from the
respondent is qualitative in nature.
Example: Brand preference of a respondent. The respondent might say that "I immensely
like this brand." It is difficult, if not impossible, to quantify this reply. The questionnaire used to
measure attitudes is not precise.
Also, each of the interviewers will administer the questionnaire differently. Added to this, the
respondent's casual answer may be due to pre-occupation, fatigue etc.
Influence of Measurement
In physical sciences, the researcher can repeat an experiment any time to get the same results.
This is not the case with marketing research. When a respondent realises that he is being measured,
his response and behaviour undergo a change, because human reaction changes quickly. The
reliability and validity of research will suffer a great deal.
Marketing research must be conducted and completed within a given time-span. If more time is
consumed in conducting the research, competitors might enter and capture the market. Because of the
pressure exerted by clients on the researcher, the research is often concluded in a hurry, leading to
a lack of credibility.
Testing of Hypothesis
Any hypothesis formed in MR must be tested. Thus, experimentation has to be resorted to. In
marketing research, it is almost impractical to carry out experiments due to the many factors which
come in the way. For instance, while measuring the impact of advertising on sales, it may so
happen that competitors have also advertised, resulting in a lower volume of sales. Also, it is
impossible to reproduce the same experiment. The reliability of research suffers on this account.
The subject becomes very complex because it is human beings who interact with the
researcher. Human reaction varies from time to time. Different individuals react differently to
a given stimulus. Added to this are environmental factors and peer influence, which add to the
complexity. Ad or promotion campaigns held at different times yield different results due to
changed perceptions and reactions of the audience. For all these reasons, we can say that marketing
research is complex.
Market researchers are subjected to pressure. The research is to be completed quickly before
competitors enter the market. Due to this pressure, reliability might suffer and the researcher may face difficulty in
testing the hypothesis. Testing the hypothesis is the core of scientific research. In marketing, it
Self Assessment
Research process is the main content of marketing research. It defines marketing research and
describes the skills required to identify the problem, the decision alternatives, and the client's
needs, which are critical components of research activity. The marketing researcher is expected,
on the one hand, to understand the market information needs of decision makers and, on the other hand,
to follow the proper processes and procedures for obtaining that information. The marketing research
process involves a number of inter-related activities which overlap and do not rigidly follow a
particular sequence. A researcher is often required to think a few steps ahead, because the various
steps in the research process are inter-woven and each step will have some influence
over the other steps. In marketing research, even though our focus is on one particular step,
other inter-related steps are also being looked into simultaneously. As we complete
one activity or operation, our focus naturally shifts from it to the subsequent one, i.e. the focus
is not concentrated exclusively on one single activity or operation at any particular point of
time. The research process provides a systematic, planned approach to the research project and
ensures that all aspects of the research project are consistent with each other.
A research problem refers to some difficulty which an organisation faces and wishes to obtain a
solution for the same.
Defining a research problem is the fuel that drives the scientific process, and is the foundation of
any research method and experimental design, from true experiment to case study. The first and
foremost step in the research process consists of problem or opportunity identification. The
necessity of properly identified research problems cannot be overemphasized. It is rightly said
that a problem properly defined is half solved.
Based upon the objective, the research problem could be in any of the following three areas:
i. Exploratory for gathering preliminary information that may help in defining the problem
and suggest hypothesis. The major emphasis of exploratory research is on the discovery of
ideas. The idea is to clarify concepts and subsequently make more extensive research on
them.
ii. Descriptive, which may describe things such as market potential for a product or the
Example: An ambiguous definition: "Find out by how much sales have declined recently".
Let us suppose that the research problem is defined in a broad and general way as follows:
"Why is the productivity in Korea much higher than that in India"? In this type of question, a
number of ambiguities are there, such as:
What sort of productivity is to be specified: is it that of men, machines or materials?
To which type of industry is the productivity related?
In which time-frame are we analyzing the productivity?
Example: An unambiguous definition: By contrast, an unambiguous problem would be stated as follows:
"What are the factors responsible for increased labour productivity in Korean textile
manufacturing industries during 1996-07 relative to Indian textile industries?"
Research design is one of the important steps in marketing research. It helps in establishing the
manner in which researchers go about achieving the objective of the study. The preparation of a research
design involves a careful consideration of the following questions and making appropriate
decisions about them:
1. What is the study about?
There are nine steps in the research process that can be followed while designing a research
project. They are as follows:
Determine the sample size
Organize the field work
Self Assessment
10. The …………….provides systematic, planned approach to the research project and ensures
that all aspects of the research project are consistent with each other.
11. Marketing research process involves a number of ……………activities which overlap and
do not rigidly follow a particular sequence.
Problem formulation is the key to the research process. For a researcher, problem formulation
means converting the management problem into a research problem. In order to attain clarity, the
MR Manager and the researcher must communicate clearly so that each fully understands the
other.
1. Determine the objective: The objective may be general or specific. General category: The company
would like to know how effective the advertising campaign was.
This looks like a statement with a clear objective. In reality, this is far from the case.
There are two ways of determining the objectives precisely: (1) The researcher should
clarify with the MR manager what "effective" means. Does effective refer to awareness,
to an increase in sales, to improved knowledge among the
audience, or to the audience's perception of the product? In each of these
circumstances, the questions to be asked of the audience vary. (2) Another way to
determine objectives is to find out from the MR Manager, "What action will be taken,
given the specified outcome of the study?"
Example: If research finds that the previous advertisement by the company was indeed
ineffective, what course of action does the company intend to take? (a) Increase the budget for
the next Ad (b) Use different appeal (c) Change the media (d) Go to a new agency.
If the objectives are proper, the research questions will be precise. However, we should
remember that objectives do undergo a change.
2. Consider environmental factors: Environmental factors influence the outcome of the
research and the decision. Therefore, the researcher must help his client to identify the
Example: Assume that the company wants to introduce a new product like iced tea or
3. What is the perception of the people about other products of the company, with respect to
price, image of the company.
4. Size of the market and target audience.
All the above factors could influence the decision. Therefore, the researcher must work
very closely with his client.
3. Nature of the problem: By understanding the nature of the problem, the researcher can
collect relevant data and help suggest a suitable solution. Every problem is related to
either one or more variables. Before beginning the data collection, a preliminary
investigation of the problem is necessary for a better understanding of the same.
Notes Initial investigation could be carried out by using a focus group of consumers or sales
representatives.
If a focus group is carried out with consumers, some of the following questions will help
the researcher to understand the problem better:
i. Did the customer ever include this company's product in his mental map?
ii. If the customer is not buying the company's product, the reasons for his not doing
so.
iii. Why did the customer turn to the competitor's product?
4. Stating the alternatives: The researcher would be better served by generating as many
alternatives as possible during problem formulation.
Example: Whether to introduce a sachet form of packaging with a view to increase sales.
The hypothesis may state that acceptance of the sachet by the customer will increase the sales by
20%. Thereafter, the test marketing will be conducted before deciding whether to introduce the
sachet variant. Therefore, for every alternative, a hypothesis has to be developed.
There are several methods to establish the value of research. Some of them are (1) Bayesian
approach (2) Simple saving method (3) Return on investment (4) Cost benefit approach.
Example: Company 'X' wants to launch a product. The company's intuitive feeling is
that the probability of the product's failure is 35%. However, if research is conducted and
appropriate data is gathered, the chances of failure could be reduced to 30%. The company has
calculated that losses would be to the tune of 3,00,000 if the product fails. The company has
received a quotation from an MR agency. The cost of the intended research is 75,000. The
question is: "Should the company spend this money to conduct the research?"
Calculation:
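Expected loss without research = 3,00,000 × 0.35 = 1,05,000
Expected loss with research = 3,00,000 × 0.30 = 90,000
Value of information = 1,05,000 – 90,000
= 15,000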
Since the value of information, namely 15,000 is lower than the cost of research, i.e., 75,000,
conducting this particular research is not recommended.
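The comparison in this example can be sketched in a few lines of Python. This is only an illustrative sketch using the figures from the example; the function names `value_of_information` and `research_is_worthwhile` are hypothetical and not part of any standard method or library.

```python
def value_of_information(loss_if_failure, fail_pct_without, fail_pct_with):
    """Reduction in expected loss when research lowers the chance of failure.

    Percentages are passed as whole numbers (35 means 35%).
    """
    expected_loss_without = loss_if_failure * fail_pct_without / 100
    expected_loss_with = loss_if_failure * fail_pct_with / 100
    return expected_loss_without - expected_loss_with


def research_is_worthwhile(value, research_cost):
    """Research is recommended only if its value exceeds its cost."""
    return value > research_cost


# Figures taken from the Company 'X' example
voi = value_of_information(300_000, 35, 30)
recommended = research_is_worthwhile(voi, 75_000)
print("Value of information:", voi)
print("Conduct the research?", recommended)
```

With these figures the value of information is smaller than the research cost, so the sketch reaches the same conclusion as the worked example.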
Example: Company 'A' would like to introduce a new product in the market. The research
agencies have given an estimate of 5 lakh and a time period of five months. According to
the past experience of the company, the probability of earning 10 lakh is 0.4, of earning 5 lakh is 0.3
and of losing 7 lakh is 0.3. Should the company undertake the research?
Calculation:
Expected value = (10 × 0.4) + (5 × 0.3) – (7 × 0.3) = 4.0 + 1.5 – 2.1 = 3.4 lakh
Since the expected value of information, i.e., 3.4 lakh, is less than the cost of MR at
5 lakh, there is no need to carry out this research.
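The expected value used in the Company 'A' example is a probability-weighted sum of the possible outcomes. The following is a minimal sketch of that arithmetic, assuming amounts are expressed in lakh and a loss is entered as a negative payoff; the name `expected_value` is illustrative only.

```python
import math


def expected_value(outcomes):
    """Probability-weighted sum of payoffs.

    `outcomes` is a list of (payoff, probability) pairs.
    """
    total_probability = sum(p for _, p in outcomes)
    if not math.isclose(total_probability, 1.0):
        raise ValueError("probabilities must add up to 1")
    return sum(payoff * p for payoff, p in outcomes)


# Figures from the Company 'A' example (amounts in lakh)
outcomes = [(10, 0.4), (5, 0.3), (-7, 0.3)]
ev = expected_value(outcomes)
research_cost = 5

print("Expected value (lakh):", round(ev, 2))
print("Undertake the research?", ev > research_cost)
```

Rounding is applied only when printing, because decimal probabilities are not stored exactly in floating point.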
Assume that company 'X' wants to introduce a new product (tea powder). Before introducing
this product, it has to be test marketed. The company needs to know the extent of competition,
price and quality acceptance from the market. In this context, following is the list of information
required:
(a) Total demand and company sales
Example: What is the overall industry demand? What is the share of competitors? The
above information will help the management estimate the overall share and its own share in the
market.
(b) Distribution coverage
Example:
(1) Availability of products at different outlets.
(2) Effect of shelf display on sales.
(c) Market awareness, attitudes and usage
Example:
Example: "How much did the competitor spend to market a similar product?"
Example: "Causes for the decline in sales of a specific company's product in a specific
territory under a specific salesman".
The researcher may explore possible reasons as to why sales are falling.
Less availability
Inefficient advertising/salesmanship
Less awareness
Conclusive research: Narrow down the options. Only one or two factors are responsible for
the decline in sales. Therefore zero in on them, using judgment and past experience.
(a) Who should be interviewed for collecting data?: Suppose the study is undertaken to determine
whether children influence the brand of ready-to-eat cereal (corn flakes) purchased
by parents. The researcher must decide whether only adults are to be studied or children
are to be included too. The researcher must also decide whether data is to be collected by the observation
method or by interviewing. If an interview is chosen, should it be a personal
interview, a telephonic interview or a questionnaire?
(b) Should a few cases be studied or a large sample be chosen?: The researcher may feel that
there are some cases available which are identical and similar in nature. He may
decide to use these cases for formulating the initial hypothesis. If suitable cases are
not available, then the researcher may decide to choose a larger sample.
Example: In a test of advertising copy, the respondents can first be interviewed to measure
their present awareness, and their attitudes towards certain brands. Then, they can be shown a
pilot version of the proposed advertisement copy. Following this, their attitude too has to be
measured again, to see if the proposed copy had any effect on them.
If it is a questionnaire, then the following questions should be posed: (a) What are the contents
of the questionnaire? (b) What type of questions are to be asked: pointed questions, general
questions etc.? (c) In what sequence should the questions be asked? (d) Should there be a fixed set
of alternatives or should the questions be open-ended? (e) Should the purpose be made clear to
the respondents or should it be disguised? These are to be determined well in advance.
The first task is to carefully select which groups of people or stores are to be sampled.
Example: Collecting the data from a fast food chain. Here, it is necessary to define what
is meant by fast food chain. Also, the precise geographical location should be mentioned.
The next step is to decide whether to choose probability sampling or non-probability sampling.
Probability sampling is one in which each element has a known chance of being selected. A non-
probability sampling can be convenience or judgment sampling.
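As an illustration of probability sampling, the sketch below draws a simple random sample, in which every element of the sampling frame has the same known chance of selection. The frame of 200 outlets, the sample size of 20 and the name `simple_random_sample` are assumptions made for this example only.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements; each element has chance n / len(frame)."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the frame")
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(frame, n)


# Hypothetical sampling frame: 200 outlets of a fast food chain
frame = [f"outlet_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, 20, seed=42)

# Known chance of selection for every outlet
inclusion_probability = 20 / len(frame)
print("Chance of selection:", inclusion_probability)
print("First five sampled outlets:", sample[:5])
```

A convenience or judgment sample, by contrast, would pick outlets that are easy to reach or that the researcher considers typical, so the chance of selection of any one outlet would not be known.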
Caution While selecting the sample, the sample unit has to be clearly specified.
Example: Survey on the attitudes towards the use of shampoo with reference to a specific
brand, where husbands, wives or a combination of them are to be surveyed or a specific segment
is to be surveyed. The sample size depends on the size of the sample frame/universe.
2.9.6 Organize the Fieldwork
This includes selecting, training and evaluating the field sales force to collect the data:
(d) Week, day and time to meet the specific respondents etc., are to be decided.
This involves:
(a) Editing
(b) Tabulating
(c) Codifying.
Editing: The data collected should be scanned to make sure that it is complete and that all the
instructions are followed. This process is called editing. Once these forms have been edited, they
must be coded.
Coding means assigning numbers to each of the answers, so that they can be analysed.
The final step is called data tabulation. It is the orderly arrangement of data in a tabular form. Also,
at the time of analysing the data, the statistical tests to be used must be finalised, such as the t-test,
z-test, Chi-square test, ANOVA etc.
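The three steps can be sketched in Python as follows. The survey answers and the codebook are hypothetical and serve only to show how editing, coding and tabulation follow one another.

```python
from collections import Counter

# Hypothetical raw answers to one survey question.
raw_responses = ["Agree", "Disagree", "", "Agree", "Neutral", "Agree", None]

# Hypothetical codebook: each permitted answer is assigned a number.
codebook = {"Disagree": 1, "Neutral": 2, "Agree": 3}

# Editing: keep only complete answers that follow the instructions.
edited = [r for r in raw_responses if r in codebook]

# Coding: replace each answer with its numeric code.
coded = [codebook[r] for r in edited]

# Tabulation: arrange the coded data as a frequency table.
table = Counter(coded)
print("Code  Frequency")
for code in sorted(table):
    print(f"{code:<5} {table[code]}")
```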
Tasks
B. Given the following research problem, identify the corresponding decision problem
for which the information will be useful:
1. Assess the level of awareness among housewives regarding the benefits of
introducing a new product in the market.
2. Assess attitudes and opinions of customers towards existing five-star hotels.
3. Design a test market to assess the effect of particular discount scheme on the
volume of sales of the product.
Self Assessment
13. ………..means assigning numbers to each of the answers, so that they can be analysed.
2.10 Summary
The goal of research is to find out answers to questions through the application of
a systematic and scientific approach.
Scientific research is one which yields the same results when repeated by different
individuals.
Research process is the main content of marketing research.
A research problem refers to some difficulty which an organisation faces and wishes to
obtain a solution for the same.
Defining a research problem is the fuel that drives the scientific process, and is the foundation
of any research method and experimental design, from true experiment to case study.
Research methodology is a method to solve the research problem systematically.
Problem formulation is the key to research process. For a researcher, the problem
formulation means converting the management problem to a research problem.
2.11 Keywords
Research Problem: A research problem refers to some difficulty which an organisation faces and
wishes to obtain a solution for the same.
Research Process: Research process is the main content of marketing research. It defines marketing
research and describes the skills required to identify the problem, the decision alternatives, and
the client’s needs, which are critical components of research activities.
Research: Research is an art of scientific investigation. It is also the systematic design, collection,
analysis and reporting of findings and solutions for the marketing problems of a company.
Validity: Validity is the ability of a measuring instrument to measure what it is supposed to.
Variable: Variable is the quantity, in which we are interested, that varies in the course of the
research or that has different values for different samples in our study.
1. Research 2. Scientific
3. Objective 4. MR
5. Market research 6. Questions
7. Validity 8. Reliability
9. Accuracy 10. research process
11. inter-related 12. Sample unit
13. Coding
2.12 Review Questions
CONTENTS
Objectives
Introduction
3.1 Construct
3.2 Definitions/Concept
3.2.1 Types of Questions
3.2.2 Time in Research
3.2.3 Types of Relationships
3.3 Variables
3.3.1 Meaning
3.3.2 Attribute
3.3.3 Dependency
3.3.4 Exhaustive
3.5 Summary
3.6 Keywords
Objectives
Introduction
Learning about research is a lot like learning about anything else. To start, we need to learn the
jargon people use, the controversies they fight over, and the different factions that define the
major players.
Without a grasp of this language, you may be surprised at how esoteric the discussion can
get (though not enough to make you give up in total despair). The language of research also
includes some of the major issues in research like the types of questions we can ask in a
project, the role of time in research, and the different types of relationships we can estimate.
Then we have to consider defining some basic terms like variable, hypothesis, data, and unit
of analysis.
Notes Research involves an eclectic blending of an enormous range of skills and activities. To be a
good social researcher, you have to be able to work well with a wide variety of people, understand
the specific methods used to conduct research, understand the subject that you are studying, be
able to convince someone to give you the funds to study it, stay on track and on schedule, speak
and write persuasively, and on and on.
3.1 Construct
2. Nomothetic and Idiographic: The word nomothetic comes perhaps from the writings of
the psychologist Gordon Allport. It refers to laws or rules that pertain to the general case
(nomos in Greek) and is contrasted with the term “idiographic” which refers to laws or
rules that relate to individuals (idios in Greek).
Notes In any event, the point is that most social research is concerned with the nomothetic –
the general case – rather than the individual. We often study individuals, but usually we
are interested in generalizing to more than just the individual.
3. Probabilistic and Realistic: In our post-positivist view of science, we no longer regard
certainty as attainable. Thus the term probabilistic describes much contemporary social
research, which is, more often than not, based on probabilities. The inferences that we make
in social research have probabilities associated with them – they are seldom meant to be
considered covering laws that pertain to all cases. Part of the reason we have seen statistics
become so dominant in social research is that it allows us to estimate probabilities for the
situations we study.
3.2 Definitions/Concept
Every research involves certain mandatory concepts that have to be understood and well applied
by every researcher.
There are three basic types of questions that research projects can address:
1. Descriptive: When a study is designed primarily to describe what is going on or what
exists. Public opinion polls that seek only to describe the proportion of people who hold
various opinions are primarily descriptive in nature.
Example: If we want to know what percent of the population would vote for a Democrat
or a Republican in the next presidential election, we are simply interested in describing something.
2. Relational: When a study is designed to look at the relationships between two or more
variables.
Example: A public opinion poll that compares what proportion of males and females
say they would vote for a Democratic or a Republican candidate in the next presidential election
is essentially studying the relationship between gender and voting preference.
3. Causal: When a study is designed to determine whether one or more variables (e.g., a
program or treatment variable) causes or affects one or more outcome variables.
Example: If we did a public opinion poll to try to determine whether a recent political
advertising campaign changed voter preferences, we would essentially be studying whether the
campaign (cause) changed the proportion of voters who would vote Democratic or Republican
(effect).
The three question types can be viewed as cumulative. That is, a relational study assumes that
you can first describe (by measuring or observing) each of the variables you are trying to relate.
And, a causal study assumes that you can describe both the cause and effect variables and that
you can show that they are related to each other. Causal studies are probably the most demanding
of the three.
Time is an important element of any research design; let us discuss one of the most fundamental
distinctions in research design: cross-sectional versus longitudinal studies.
A cross-sectional study is one that takes place at a single point in time. In effect, we are taking a
‘slice’ or cross-section of whatever it is we’re observing or measuring. A longitudinal study is
one that takes place over time – we have at least two (and often more) waves of measurement in
a longitudinal design.
A further distinction is made between two types of longitudinal designs: Repeated measures
and time series.
There is no universally agreed upon rule for distinguishing these two terms, but in general, if
you have two or a few waves of measurement, you are using a repeated measures design. If you
have many waves of measurement over time, you have a time series. How many is ‘many’?
Usually, we wouldn’t use the term time series unless we had at least twenty waves of measurement,
and often far more. Sometimes the way we distinguish these is with the analysis methods we
would use. Time series analysis requires that you have at least twenty or so observations.
Repeated measures analyses (like repeated measures ANOVA) aren’t often used with as many as
twenty waves of measurement.
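The rule of thumb above can be written as a small Python function. The function name and the exact cut-off of twenty waves are illustrative assumptions taken from the convention described in the text, not a fixed standard.

```python
def classify_design(waves: int) -> str:
    """Label a study design by its number of measurement waves.

    The cut-off of twenty waves follows the rule of thumb in the text;
    it is a convention, not a universally agreed rule.
    """
    if waves < 1:
        raise ValueError("a study needs at least one wave of measurement")
    if waves == 1:
        return "cross-sectional"
    if waves < 20:
        return "longitudinal (repeated measures)"
    return "longitudinal (time series)"


for n in (1, 3, 24):
    print(n, "->", classify_design(n))
```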
A relationship refers to the correspondence between two variables. When we talk about types of
relationships, we can mean that in at least two ways: the nature of the relationship or the pattern
of it.
Figure 3.1
While all relationships tell about the correspondence between two variables, there is a special
type of relationship that holds that the two variables are not only in correspondence, but that
one causes the other. This is the key distinction between a simple correlational relationship and
a causal relationship. A correlational relationship simply says that two things perform in a
synchronized manner.
For instance, we often talk of a correlation between inflation and unemployment. When inflation
is high, unemployment also tends to be high. When inflation is low, unemployment also tends to
be low. The two variables are correlated. But knowing that two variables are correlated does not
tell us whether one causes the other. We know, for instance, that there is a correlation between the
number of roads built in Europe and the number of children born in India. Does that mean that if
we want fewer children in India, we should stop building so many roads in Europe? Or, does it
mean that if we don’t have enough roads in Europe, we should encourage Indian citizens to have
more babies? Of course not. While there is a relationship between the number of roads built and
the number of babies, we don’t believe that the relationship is a causal one. This leads to
consideration of what is often termed the third variable problem. In this example, it may be that
a third variable drives both the building of roads and the birthrate, and so produces the
correlation we observe. For instance, perhaps the general world economy is responsible for
both. When the economy is good more roads are built in Europe and more children are born in
India. The key lesson here is that you have to be careful when you interpret correlations. If you
observe a correlation between the number of hours students use the computer to study and their
grade point averages (with high computer users getting higher grades), you cannot assume that
the relationship is causal: that computer use improves grades. In this case, the third variable might
be socioeconomic status – richer students who have greater resources at their disposal tend to both
use computers and do better in their grades. It’s the resources that drive both use and grades, not
computer use that causes the change in the grade point average.
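The third variable problem can be illustrated with a small simulation in Python. All numbers are simulated, and the variable names (economy, roads, births) are hypothetical stand-ins for the example above: two variables are generated so that each depends on a common third variable but not on the other.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def residuals(ys, zs):
    """Remove the linear effect of zs from ys (simple least squares)."""
    n = len(ys)
    my, mz = sum(ys) / n, sum(zs) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for y, z in zip(ys, zs)]


rng = random.Random(0)
n = 5000
# Hypothetical third variable (for example, the state of the economy).
economy = [rng.gauss(0, 1) for _ in range(n)]
# Both observed variables depend on the economy, not on each other.
roads = [z + rng.gauss(0, 0.5) for z in economy]
births = [z + rng.gauss(0, 0.5) for z in economy]

raw_r = pearson_r(roads, births)
# Correlation that remains once the economy is held constant
# (the partial correlation).
partial_r = pearson_r(residuals(roads, economy), residuals(births, economy))

print(f"correlation of roads and births:         {raw_r:.2f}")
print(f"same, after controlling for the economy: {partial_r:.2f}")
```

The raw correlation is strong, yet it largely disappears once the third variable is accounted for, which is the pattern one would expect when neither observed variable causes the other.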
Patterns of Relationships
We have several terms to describe the major different types of patterns one might find in a
relationship. First, there is the case of no relationship at all. If you know the values on one
variable, you don’t know anything about the values on the other.
Then, we have the positive relationship. In a positive relationship, high values on one variable
are associated with high values on the other and low values on one are associated with low
values on the other. In this example, we assume an idealized positive relationship between
years of education and the salary one might expect to be making.
On the other hand a negative relationship implies that high values on one variable are associated
with low values on the other. This is also sometimes termed an inverse relationship. Here, we
show an idealized negative relationship between a measure of self esteem and a measure of
paranoia in psychiatric patients.
Figure 3.3
These are the simplest types of relationships we might typically estimate in research. But the
pattern of a relationship can be more complex than this. For instance, the figure on the left shows
a relationship that changes over the range of both variables, a curvilinear relationship. In this
example, the horizontal axis represents dosage of a drug for an illness and the vertical axis
represents a severity of illness measure. As dosage rises, severity of illness goes down. But at
some point, the patient begins to experience negative side effects associated with too high a
dosage, and the severity of illness begins to increase again.
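These patterns can be made concrete with a short Python sketch. The data are idealized and hypothetical. The sketch also shows a caveat: Pearson's correlation coefficient measures only the linear part of a relationship, so a perfectly U-shaped (curvilinear) relationship can produce a coefficient of zero.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = list(range(11))                        # 0, 1, ..., 10 (hypothetical units)
positive = [2 * v + 1 for v in x]          # high x goes with high y
negative = [10 - v for v in x]             # high x goes with low y
curvilinear = [(v - 5) ** 2 for v in x]    # y falls, then rises again

r_positive = pearson_r(x, positive)
r_negative = pearson_r(x, negative)
r_curvilinear = pearson_r(x, curvilinear)

print(f"positive relationship:    r = {r_positive:+.2f}")
print(f"negative relationship:    r = {r_negative:+.2f}")
print(f"curvilinear relationship: r = {r_curvilinear:+.2f}")
```

For this reason it is advisable to plot the data before relying on a single coefficient to describe a relationship.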
Self Assessment
You won’t be able to do very much in research unless you know how to talk about variables.
3.3.1 Meaning
A variable is any entity that can take on different values. OK, so what does that mean? Anything
that can vary can be considered a variable.
Example: Age can be considered a variable because age can take different values for
different people or for the same person at different times.
Similarly, country can be considered a variable because a person’s country can be assigned a
value.
!
Caution Variables aren’t always ‘quantitative’ or numerical. The variable ‘gender’ consists
of two text values: ‘male’ and ‘female’. We can, if it is useful, assign quantitative values
in place of the text values, but we don’t have to assign numbers in order for
something to be a variable.
It’s also important to realize that variables aren’t only things that we measure in the traditional
sense. For instance, in much social research and in program evaluation, we consider the treatment
or program to be made up of one or more variables (i.e., the ‘cause’ can be considered a variable).
An educational program can have varying amounts of ‘time on task’, ‘classroom settings’,
‘student-teacher ratios’, and so on. So even the program can be considered a variable (which can be made
up of a number of sub-variables).
3.3.2 Attribute
An attribute is a specific value on a variable. For instance, the variable sex or gender has two
attributes: male and female. Or, the variable agreement might be defined as having five attributes:
1. Strongly disagree
2. Disagree
3. Neutral
4. Agree
5. Strongly agree
In data formats such as netCDF, the two terms are used in a different, technical sense. In
contrast to variables, which are intended for bulk data, attributes are intended for
ancillary data, or information about the data. The total amount of ancillary data associated
with a netCDF object, and stored in its attributes, is typically small enough to be memory-
resident. However, variables are often too large to fit entirely in memory and must be
split into sections for processing.
Another difference between attributes and variables is that variables may be multidimensional.
Attributes are all either scalars (single-valued) or vectors (a single, fixed dimension).
Variables are created with a name, type, and shape before they are assigned data values, so a
variable may exist with no values. The value of an attribute is specified when it is created, unless
it is a zero-length attribute.
A variable may have attributes, but an attribute cannot have attributes. Attributes assigned to
variables may have the same units as the variable (for example, valid_range) or have no units
(for example, scale_factor). If you want to store data that requires units different from those of
the associated variable, it is better to use a variable than an attribute.
3.3.3 Dependency
Another important distinction having to do with the term ‘variable’ is the distinction between
an independent and dependent variable. This distinction is particularly relevant when you are
investigating cause-effect relationships.
The terms dependent and independent variables are used to distinguish between two types of
quantities being considered, separating them into those available at the start of a process and
those being created by it, where the latter (dependent variables) are dependent on the former
(independent variables).
In a research experiment, the dependent variable (DV) is the event studied and expected to
change whenever the independent variable is altered.
In the design of experiments, an independent variable’s values are controlled or selected by the
experimenter to determine its relationship to an observed phenomenon (i.e., the dependent
variable). In such an experiment, an attempt is made to find evidence that the values of the
independent variable determine the values of the dependent variable. The independent variable
(IV) can be changed as required, and its values do not represent a problem requiring explanation.
!
Caution The dependent variable usually cannot be directly controlled.
Controlled variables are also important to identify in experiments. They are the variables that
are kept constant to prevent their influence on the effect of the independent variable on the
dependent. Every experiment has controlled variables, and it is necessary not to change them, or
the results of the experiment won’t be valid.
“Extraneous variables” are those that might affect the relationship between the independent
and dependent variables. Extraneous variables are usually not theoretically interesting. They
are measured in order for the experimenter to compensate for them.
Example: An experimenter who wishes to measure the degree to which caffeine intake
(the independent variable) influences explicit recall for a word list (the dependent variable)
might also measure the participant’s age (extraneous variable). He can then use these age data to
control for the uninteresting effect of age, clarifying the relationship between caffeine and
memory.
In summary:
1. Independent variables answer the question “What do I change?”
2. Dependent variables answer the question “What do I observe?”
3. Controlled variables answer the question “What do I keep the same?”
4. Extraneous variables answer the question “What uninteresting variables might mediate
the effect of the IV on the DV?”
3.3.4 Exhaustive
Finally, there are two traits of variables that should always be achieved. Each variable should be
exhaustive: it should include all possible answerable responses.
Example: If the variable is “religion” and the only options are “Christian”, “Hindu”,
“Jewish”, and “Muslim”, there are quite a few religions that haven’t been included. The list does
not exhaust all possibilities. On the other hand, if you exhaust all the possibilities with some
variables – religion being one of them – you would simply have too many responses.
The way to deal with this is to explicitly list the most common attributes and then use a
general category like “Other” to account for all remaining ones. In addition to being exhaustive,
the attributes of a variable should be mutually exclusive; no respondent should be able to
have two attributes simultaneously. While this might seem obvious, it is often rather tricky in
practice.
Example: You might be tempted to represent the variable “Employment Status” with
the two attributes “employed” and “unemployed.” But these attributes are not necessarily
mutually exclusive – a person who is looking for a second job while employed would be able to
check both attributes! But don’t we often use questions on surveys that ask the respondent to
“check all that apply” and then list a series of categories? Yes, we do, but technically speaking,
each of the categories in a question like that is its own variable and is treated dichotomously as
either “checked” or “unchecked”.
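The two traits, and the treatment of “check all that apply” questions, can be sketched in Python. The attribute lists, category names and function names below are hypothetical and chosen only for illustration.

```python
# Hypothetical attribute list for the variable "religion"; the general
# category "Other" keeps the list exhaustive without naming every option.
RELIGION_ATTRIBUTES = {"Christian", "Hindu", "Jewish", "Muslim", "Other"}


def code_religion(answer: str) -> str:
    """Map any answer to exactly one listed attribute."""
    return answer if answer in RELIGION_ATTRIBUTES else "Other"


def is_valid_single_choice(selected: set, attributes: set) -> bool:
    """True when exactly one listed attribute has been chosen."""
    return len(selected) == 1 and selected <= attributes


print(code_religion("Hindu"))
print(code_religion("Buddhist"))
print(is_valid_single_choice({"Hindu"}, RELIGION_ATTRIBUTES))
print(is_valid_single_choice({"Hindu", "Muslim"}, RELIGION_ATTRIBUTES))

# A "check all that apply" question becomes one yes/no (dichotomous)
# variable per category.
MEDIA = ["Newspaper", "Television", "Radio"]   # hypothetical categories
ticked = {"Television", "Radio"}
dichotomous = {category: category in ticked for category in MEDIA}
print(dichotomous)
```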
Task Give three examples of events in which the variables are mutually exclusive.
Self Assessment
A hypothesis is a proposition – a tentative assumption which a researcher wants to test for its
logical or empirical consequences. Hypotheses are more useful when stated in precise and
clearly defined terms. It may be mentioned that though a hypothesis is useful it is not always
necessary, especially in case of exploratory researches. However, in a problem-oriented research,
it is necessary to formulate a hypothesis or hypotheses. In such researches, hypotheses are
generally concerned with the causes of a certain phenomenon or a relationship between two or
more variables under investigation.
Formulate a Hypothesis: Let us consider the introduction of a new drug. The drug is
tested on a few patients and based on the response from patients, a decision has to be made
whether the drug should be introduced or not. We make certain assumptions about the
parameter to be tested – these assumptions are known as hypotheses.
We start with a ‘null hypothesis’: H0: μ = 100. This is a claim or hypothesis about the
value of a population parameter.
This is tested against the alternate hypothesis, H1: μ ≠ 100.
The null hypothesis is tested with available evidence and a decision is made whether to
accept this hypothesis or reject it. If the null hypothesis is rejected, we accept the alternate
hypothesis.
Setting up a Suitable Significance Level: There are two types of errors that can be committed:
Type I error: An error made in rejecting the null hypothesis, when in fact it is true.
Type II error: An error made in accepting the null hypothesis, when in fact it is
untrue.
The level of significance signifies the probability of committing Type I error and is
generally taken as equal to 5% (α = .05).
This means that even after testing the hypothesis, when a decision is made, there is still a
5% chance of rejecting the null hypothesis when it is actually true. Sometimes,
the value of ‘α’ is taken as .01 but it is the discretion of the investigator, depending upon
the sensitivity of the study.
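The meaning of the significance level can be checked with a small simulation in Python. The population values (mean 100, standard deviation 15), the sample size and the number of trials are hypothetical, and a two-tailed z-test is assumed. Samples are drawn from a population in which the null hypothesis is true, so every rejection is a Type I error.

```python
import random
from math import sqrt
from statistics import NormalDist

alpha = 0.05
z_critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cut-off

# Hypothetical population in which the null hypothesis is TRUE.
mu0, sigma, n = 100, 15, 36
rng = random.Random(1)

trials = 2000
false_rejections = 0
for _ in range(trials):
    sample = [rng.gauss(mu0, sigma) for _ in range(n)]
    z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
    if abs(z) > z_critical:        # rejecting a true H0 is a Type I error
        false_rejections += 1

type_i_rate = false_rejections / trials
print(f"critical z: {z_critical:.2f}")
print(f"share of true null hypotheses rejected: {type_i_rate:.3f}")
```

Over many repetitions the share of wrongly rejected null hypotheses should settle close to the chosen value of α.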
Choose a Test Criterion: This means selection of a suitable test statistic that can be used
along with the available information in carrying out the test. The different test statistics that
are normally used are:
Normal Distribution: z-statistic, this is most often used when the sample size is more
than 30.
t-statistic: ‘t’ test is generally used for small samples (30 or fewer), when the population
standard deviation is not known.
F-statistic
Chi-Square statistic.
Compute the Test Characteristic: This involves the actual collection and computation of
the sample data. For the case under consideration, we have to find the sample mean (x̄) and
then compute the calculated ‘Z’. This calculated value (absolute) is compared with the tabulated
value obtained from the normal distribution table against the decided criterion (value of ‘α’
and one tail or two tails).
Make a Decision: If the calculated value of the test characteristic is greater than the tabulated
value, the null hypothesis is rejected and the alternate hypothesis is accepted. Talking in
terms of critical region, the value of calculated characteristic falls outside the acceptance
region.
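The steps above can be put together in a minimal Python sketch of a two-tailed, one-sample z-test of H0: μ = 100. The patient responses and the population standard deviation (σ = 12, assumed known) are hypothetical, and the function name z_test is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-tailed one-sample z-test of H0: mu = mu0, with sigma known."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))    # calculated value
    z_tab = NormalDist().inv_cdf(1 - alpha / 2)     # tabulated value
    return z, z_tab, abs(z) > z_tab


# Hypothetical responses from 36 patients; sigma = 12 is assumed known.
sample = [104, 98, 110, 107, 95, 112, 101, 109, 99, 106, 111, 103] * 3
z, z_tab, reject = z_test(sample, mu0=100, sigma=12)

print(f"calculated z = {z:.2f}, tabulated z = {z_tab:.2f}")
print("Reject H0" if reject else "Do not reject H0")
```

The decision rule is the one stated above: the null hypothesis is rejected only when the absolute calculated value exceeds the tabulated value for the chosen α.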
Figure: Acceptance zone (1 − α) lying between the rejection regions (α/2) in the left and right tails.
Task Given the following research problems, identify the corresponding decision
problem for which the information will be useful:
1. Assess the level of awareness among housewives regarding the benefits of a new
product to be introduced in the market.
2. Assess attitudes and opinions of customers towards existing five-star hotels.
3. Design a test market to assess the effect of particular discount scheme on the volume
of sales of the product.
Self Assessment
9. …………………means selection of a suitable test statistic that can be used along with the
available information carrying out the test.
The first thing that a good researcher needs to have is the language of research.
If one doesn’t, it is for sure that one is going to have a hard time discussing research.
One has to take care of some of the major issues in research like the types of questions one
can ask in a project, the role of time in research, and the different types of relationships one
can estimate.
Then one has to consider defining some basic terms like variable, hypothesis, data, and
unit of analysis.
A good research proposal gives you an opportunity to think through your project carefully,
and clarify and define what you want to research.
It provides you with an outline to guide you through the research process.
It also lets your supervisor and department or faculty know what you would like to
research and how you plan to go about it.
A hypothesis is a proposition – a tentative assumption which a researcher wants to test for
its logical or empirical consequences. Hypotheses are more useful when stated in precise
and clearly defined terms.
3.6 Keywords
Coding: Coding means assigning numbers to each of the answers, so that they can be analysed.
Data Collection: The search for answers to research questions is called data collection.
Editing: The data collected should be scanned to make sure that it is complete and that all the
instructions are followed.
Problem Formulation: The problem formulation means converting the management problem
to a research problem.
Research Process: Research process defines marketing research and describes the skills required
to identify the problem, the decision alternatives, and the client’s needs, which are critical
components of research activities.
1. Probabilities
2. Longitudinal study
3. Relationship
4. Negative
5. Variable
6. Attribute
7. Extraneous
8. Null hypothesis
CONTENTS
Objectives
Introduction
4.1 Sources for Problem Identification
4.1.1 Self Questioning by Researcher while Defining the Problem
4.2 Selection of Problem
4.2.1 Selection Criteria
4.3 Understanding Problem
4.4 Necessity of Defined Problem
4.5 Pilot Testing
4.5.1 Data Collection
4.5.2 Data Processing
4.5.3 Analysis and Interpretation
4.7 Summary
4.8 Keywords
Objectives
Introduction
There is a famous saying that “a problem well defined is half solved”. This statement is strikingly
true in market research, because if the problem is not stated properly, the objectives will not be
clear. If the objective is not clearly defined, the data collection becomes meaningless.
The first step in research is to formulate the problem. A company manufacturing television sets
might think that it is losing sales to a foreign company. A brief illustration aptly demonstrates
how such a problem can be ill-conceived. The management of a company felt that a drop in sales was
because of the poor quality of the product. Subsequently, research was undertaken with a view to
improving the quality of the product. But despite an improvement in quality, sales did not pick up.
In this case, we may say that the problem is ill-defined. The actual reason was ineffective sales
promotion. The problem thus needs to be carefully identified.
!
Caution Defining the problem correctly is vital; any error in defining it can result in wastage
of time and money.
Problem definition might refer to either a real-life situation or it may also refer to a set of
opportunities. Market research problems or opportunities will arise under the following
circumstances – (1) Unanticipated change (2) Planned change. Many factors in the environment
can create problems or opportunities. Thus, changes in the demographics, technological and
legal changes affect the marketing function. Now the question is how the company responds to
new technology, or product introduced by the competitor or how to cope with the changes in
lifestyles. It may be a problem and at the same time, it can also be viewed as an opportunity. In
order to conduct research, the problem must be defined accurately.
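While formulating the problem, clearly define:
1. Who is the focus?
2. What is the subject-matter of research?
3. To which geographical territory/area does the problem refer?
4. To which period does the study pertain?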
Example: “Why does the upper-middle class of Bangalore shop at Lifestyle during the
Diwali season”?
Here all the above four aspects are covered. We may be interested in a number of variables due
to which shopping is done at a particular place. The characteristic of interest to the researcher
may be (1) Variety offered at Lifestyle (2) Discount offered by way of promotion (3) Ambience
at the Lifestyle and the (4) Personalised service offered. In some cases, the cause of the problem
is obvious whereas in others the cause is not so obvious. The obvious causes are the products
being on the decline. Not so obvious causes could be a bad first experience for the customer.
Research students can adopt the following ways to identify the problems:
Research reports already published may be referred to define a specific problem.
Cultural and technological changes can act as sources for research problem identification.
Seminars/symposiums/focus groups can act as a useful source.
Notes Problem formulation is the key to research process. For a researcher, problem
formulation means converting the management problem to a research problem. In order
to attain clarity, the M.R manager and researcher must articulate clearly so that each
understands the other perfectly. In research process, the first and foremost step
happens to be that of selecting and properly defining a research problem. A researcher
must find the problem and formulate it so that it becomes susceptible to research. Like a
medical doctor, a researcher must examine all the symptoms (presented to him or observed
by him) concerning a problem before he can diagnose correctly. To define a problem
correctly, a researcher must know what a problem is.
3. Can relevant data be gathered through the process of marketing research?
4. Is the research problem significant?
7. What exactly will be the difficulties in conducting the study, and hurdles to be overcome?
Managers often want the results of research in accordance with their expectation. This satisfies
them immensely. If one were to closely look at the questionnaire, it is found that in most cases,
there are stereotyped answers given by the respondents. A researcher must be creative and
should look at problems in a different perspective.
Task Cultural and technological changes can act as a source for research problem
identification. Why/ why not?
Self Assessment
The research problem undertaken for study must be carefully selected. The task is a difficult one,
although it may not appear to be so. Help may be taken from a research guide in this connection.
Nevertheless, every researcher must find out his own salvation, for research problems cannot be
borrowed. A problem must spring from the researcher’s mind like a plant springing from its
own seed. If our eyes need glasses, it is not the optician alone who decides about the number of
the lens we require. We have to see for ourselves and enable him to prescribe for us the right number
by cooperating with him. Thus, a research guide can at the most only help a researcher choose a
subject.
Inevitably, selecting a problem is somewhat arbitrary, idiosyncratic, and personal. Avoid
selecting the first problem that you encounter. Try to select the most interesting and personally
satisfying choice from among two or three possibilities. The problem selection should matter to
you. You should be eager and enthusiastic.
!
Caution A good topic should be small enough for a conclusive investigation and large
enough to yield interesting results. Remember that research must yield a publication for
it to have meaning. You may wish to query likely periodical editors to see if they might
be interested in an article on your research topic.
In some cases, as with a thesis or a dissertation, some sort of preliminary study may be needed
to see if the problem and the study are feasible and to identify snags. Such a Pilot Study can be
quite valuable.
Task Analyse what problems you might encounter while selecting a problem.
be classified into two categories:
1. Difficulty related problems
2. Opportunity related problems, while the first category produces negative results such as,
Problem definition might refer to either a real-life situation or it may also refer to a set of
opportunities. Market research problems or opportunities will arise under the following
circumstances: (1) Unanticipated change (2) Planned change. Many factors in the environment
can create problems or opportunities. Thus, changes in the demographics, technological and
legal changes affect the marketing function. Now the question is how the company responds to
new technology, or product introduced by the competitor or how to cope with the changes in
life-styles. It may be a problem and at the same time, it can also be viewed as an opportunity. In
order to conduct research, the problem must be defined accurately.
While formulating the problem, clearly define:
1. Who is the focus?
2. What is the subject-matter of research?
3. To which geographical territory/area does the problem refer?
4. To which period does the study pertain?
Example: “Why does the upper-middle class of Bangalore shop at Life-style during the
Diwali season”?
Here all the above four aspects are covered. We may be interested in a number of variables due
to which shopping is done at a particular place. The characteristic of interest to the researcher
may be (1) Variety offered at life-style (2) Discount offered by way of promotion (3) Ambience
at the life-style and (4) Personalised service offered. In some cases, the cause of the problem is
obvious whereas in others the cause is not so obvious. An obvious cause is a product being
on the decline. A not so obvious cause could be a bad first experience for the customer.
Defining a research problem properly is a prerequisite for any study and is a step of the highest
importance. A problem well defined is half solved. Defining the problem is often more essential
than its solution because when the problem is formulated, an appropriate technique can be applied
to generate alternative solutions. This statement signifies the need for defining a research problem.
The problem to be investigated must be defined unambiguously for that will help to discriminate
relevant data from the irrelevant ones. A proper definition of research problem will enable the
researcher to be on the track whereas an ill-defined problem may create hurdles. When you define
a research problem you are trying to reduce the outcome of an answer. The question, of course,
when you speak about “marketing research” is how you can target more customers to sell your
product to. You are looking for specific answers such as: “What type of soda do all foreign-born
males between the ages of 25-35 drink?” This is defining the problem. What do you consider
foreign-born males? What constitutes soda? This is important because companies and sales
organizations attempt to “target” their market instead of taking a shotgun approach. The process is
to first make sure any information you obtain is credible and from a reputable organization. Then
break down your problem and pick apart any inconsistencies you may see within your research
project. Problem formulation is the key to the research process. For a researcher, problem formulation
means converting the management problem to a research problem.
In order to attain clarity, the manager and researcher must articulate clearly so that each
understands the other perfectly.
Self Assessment
Fill in the blanks:
6. In order to attain clarity, the manager and researcher must ...................... clearly.
8. When you define a research problem you are trying to ...................... the outcome of an
answer.
9. Changes in the demographics, technological and legal changes affect the .......................
function.
A pilot test is a method used to test the design and/or methods and/or instrument prior to
carrying out the research. Basically, pilot testing means finding out if your survey, key informant
interview guide or observation form will work in the “real world” by trying it out first on a few
people. The pilot group can be small, 3-5 people, since the purpose is not to collect data but to refine your process and/
or instrument.
Pilot testing involves conducting a preliminary test of data collection tools and procedures to
identify and eliminate problems, allowing programs to make corrective changes or adjustments
before actually collecting data from the target population. A pilot survey is very useful when
the actual survey is to be on a big scale as it may provide data which will allow costs to be
trimmed. Also, a pilot survey will give an estimate of the non-response rate and it will also give
a guide as to the adequacy of the sampling frame chosen.
The purpose is to make sure that everyone in your sample not only understands the
questions, but understands them in the same way. This way, too, you can see if any
questions make respondents feel uncomfortable. You’ll also be able to find out how long
it takes to complete the survey in real time.
A pilot test usually involves simulating the actual data collection process on a small scale to get
feedback on whether or not the instruments are likely to work as expected in a “real world”
situation. A typical pilot test involves administering instruments to a small group of individuals
that has similar characteristics to the target population, and in a manner that simulates how data
will be collected when the instruments are administered to the target population.
Pilot testing gives programs an opportunity to make revisions to instruments and data collection
procedures to ensure that appropriate questions are being asked, the right data will be collected,
and the data collection methods will work. Programs that neglect pilot testing run the risk of
collecting useless data.
Pilot testing provides an opportunity to detect and remedy a wide range of potential problems
with an instrument. These problems may include:
Questions that respondents don’t understand
Ambiguous questions
Questions that combine two or more issues in a single question (double-barreled questions)
Questions that make respondents uncomfortable
Pilot testing can also help programs identify ways to improve how an instrument is administered.
For example, if respondents show fatigue while completing the instrument, then the program
should look for ways to shorten the instrument. If respondents are confused about how to return
the completed instrument, then the program needs to clarify instructions and simplify this
process.
Data collection is the systematic recording of information; data analysis involves working to
uncover patterns and trends in data sets; data interpretation involves explaining those patterns
and trends. Scientists interpret data based on their background knowledge and experience, thus
different scientists can interpret the same data in different ways.
Processing data is very important in market research. After collecting the data, the next job of
the researcher is to analyze and interpret the data. The purpose of analysis is to draw conclusions.
There are two parts in processing the data.
1. Data Analysis
2. Interpretation of data
Analysis of the data involves organizing the data in a particular manner. Interpretation of data
is a method for deriving conclusions from the data analyzed. Analysis of data is not complete,
unless it is interpreted.
2. Coding
3. Editing
4. Tabulation of data
Data collection is a significant part of market research. Even more significant is filtering out the
relevant data from the mass of data collected. Data remain in raw form unless they are
processed and analyzed.
Primary data collected by surveys, observations by field investigations are hastily entered into
questionnaires. Due to the pressure of interviewing, the researcher has to write down the
responses immediately. Many times this may not be systematic. The information so collected by
field staff is called raw data.
The information collected may be illegible, incomplete and inaccurate to some extent. Also the
information collected will be scattered in several data collection formats. The data lying in such
a crude form are not ready for analysis. Keeping this in mind the researcher must take some
measures to organize the data, so that it can be analyzed.
The various steps which are required to be taken for this purpose are (a) editing, (b) coding
and (c) tabulating.
Coding
Coding refers to all those activities which help in transforming edited questionnaires into a
form which is ready for analysis. Coding speeds up the tabulation while editing eliminates
errors. Coding involves assigning numbers or other symbols to answers, so that the responses
can be grouped into a limited number of classes or categories.
Editing
The main purpose of editing is to eliminate errors and confusion. Editing involves inspection
and correction of each questionnaire. The main role of editing is to identify omissions,
ambiguities and errors in response.
Therefore editing means the activity of inspecting, correcting and modifying the collected data.
Tabulation of Data
Tabulation refers to counting the number of cases that fall into various categories. The results
are summarized in the form of statistical tables. The raw data is divided into groups and
subgroups. The counting and placing of data in particular group and subgroup are done.
Tabulation involves
1. Sorting and counting
2. Summarizing of data
Tabulation may be of two types (1) simple tabulation (2) cross tabulation. In simple tabulation,
a single variable is counted. Cross tabulation includes 2 or more variables, which are treated
simultaneously. Tabulation can be done entirely by hand or by machine or both hand and
machine.
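As an illustration, simple and cross tabulation can be sketched in a few lines of Python. The responses below are hypothetical (they are not taken from any survey in this unit), and the variable names are assumptions made only for this example.

```python
from collections import Counter

# Hypothetical coded responses: (income_group, preferred_promotion).
responses = [
    ("low", "discount"),
    ("low", "contest"),
    ("high", "discount"),
    ("high", "discount"),
    ("low", "discount"),
]

# Simple tabulation: a single variable is counted.
simple = Counter(promotion for _, promotion in responses)

# Cross tabulation: two variables are counted simultaneously.
cross = Counter(responses)

print(simple)
print(cross)
```

Here `simple` gives one count per promotion method, while `cross` gives one count per combination of income group and promotion method.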
Before taking up summarizing, the data should be classified into (1) Relevant data (2) Irrelevant
data. During the field study, the researcher has collected a lot of data which he may think would
be of use. Summarizing the data includes (1) Classification of data (2) Frequency distribution
(3) Use of appropriate statistical tool.
Classification of Data
(a) Number of Groups: Number of groups should be sufficient to record all possible data.
Classification should not be too narrow. If it is too narrow, there can be an overlap.
Example: Suppose a researcher is conducting a survey on “Why does the current car owner dislike
the car?” The car owner may indicate the following:
1. Difficulty in seeking entry to the back seat
2. Interior space
3. Cramped leg room
4. Mileage
5. Rattling of the engine
6. Dickey space
Now all the above data can be classified into a few categories such as (1) Discomfort
(2) Expense (3) Pride (4) Safety (5) Design of the car.
(b) Width of the Class Interval: Class interval should be uniform and should be of equal
width. This will give consistency in the data distribution.
(c) Exclusive Categories: Classification made should be done in such a way that, the response
can be placed in only one category.
Example: Suppose “problem of leg room” is the answer given by a respondent. This should be placed
either under Discomfort or Design but not both.
(d) Exhaustive Categories: This should be made to include all responses including “Don’t
Know” answers. Sometimes this will influence the ultimate answer to the research problem.
(e) Avoid Extremes: Avoid open ended class interval.
Frequency Distribution: A frequency distribution simply reports the number of responses that
each question received. It organizes data into classes or groups and shows
the number of observations that fall into each class.
4000-6999      100
7000-9999      122
10000-12999    140
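A frequency distribution with equal-width classes can be built mechanically. The sketch below is only an illustration: the function name `frequency_distribution` and the list of incomes are hypothetical, while the class limits (4000-6999, 7000-9999, 10000-12999) follow the classes shown above.

```python
from collections import Counter


def frequency_distribution(values, lower, width, n_classes):
    """Count how many values fall into each equal-width class."""
    counts = Counter()
    for v in values:
        index = (v - lower) // width
        if 0 <= index < n_classes:
            counts[index] += 1
    table = []
    for k in range(n_classes):
        start = lower + k * width
        # Each row: (lower limit, upper limit, frequency).
        table.append((start, start + width - 1, counts[k]))
    return table


# Hypothetical incomes, used only to exercise the function.
incomes = [4500, 6999, 7000, 8200, 9999, 10000, 12500, 12999]
table = frequency_distribution(incomes, lower=4000, width=3000, n_classes=3)
for low, high, freq in table:
    print(f"{low}-{high}: {freq}")
```

Because every class has the same width and the limits do not overlap, each response falls into exactly one class, in line with the guidelines on class intervals and exclusive categories given earlier.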
In marketing research central value or tendency plays a very important role. The researcher
may be interested in knowing the average sales/shop, average consumption per month etc. The
population parameters can be estimated with the help of a simple average. The average of a sample
may be taken as an estimate of the population parameter. E.g. if the average income of the population is to be
computed, the researcher may select a sample, collect data on family income and calculate the
relevant statistics which will be representative of the population.
The total purchasing power of the community can be estimated on sample average. If the sample
is stratified, the purchasing power of each income class may also be estimated. The median
figure will reveal that half the population has more income than the median income, and half
the population has less income than median income. The mode will reveal the most commonly
occurring value. Based on this, sellers can plan their strategy to sell the product.
The three most common ways to measure centrality or central tendency are mode, median and
mean.
Mode
The mode is the value or item that occurs most often. When data is categorized in a
frequency distribution, it is very easy to identify the mode, since the category in which the mode
lies has the greatest number of observations.
$$M_0 = L_{M_0} + \frac{D_1}{D_1 + D_2} \times i$$
where
$L_{M_0}$ = lower limit of the modal class
$D_1$ = Difference between the frequency of modal class and the class immediately
preceding the modal class
$D_2$ = Difference between the frequency of the modal class and the class immediately
succeeding the modal class.
i = size of the modal class interval
$$M_0 = 10{,}000 + \frac{95}{95 + 75} \times 5{,}000 = 10{,}000 + \frac{95}{170} \times 5{,}000 \approx 12{,}794$$
Conclusion: The most common (modal) income is 12794. This is how statistical techniques are used in
MR application.
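The arithmetic of the mode calculation can be checked with a short sketch. The function name `grouped_mode` is hypothetical; the inputs (lower limit 10,000, differences 95 and 75, class width 5,000) are the figures used in the working above.

```python
def grouped_mode(lower_limit, d1, d2, width):
    """Mode of grouped data: L + D1 / (D1 + D2) * i."""
    return lower_limit + d1 / (d1 + d2) * width


mode = grouped_mode(lower_limit=10_000, d1=95, d2=75, width=5_000)
print(round(mode))  # 12794
```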
Median
The median is the middle value of the data arranged in order: half of the observations lie below
it and half above it. It is necessary to arrange the
data into ascending or descending order before selecting the median value. For ungrouped data
with an odd number of observations, the median is the middle value. For an even number
of observations, the median is halfway between the two central values.
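For grouped data, the median is calculated using the formula
$$M_d = L_{M_d} + \frac{\frac{N}{2} - CF}{F_{M_d}} \times i$$
where
$L_{M_d}$ = lower limit of the median class
N = total number of observations
CF = Cumulative frequency for the class just below the median class.
$F_{M_d}$ = Frequency of the median class.
i = size of the median class interval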
Conclusion: Half of the population has income > 21568 and half of the population has income
< 21568.
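The grouped median formula can be sketched as follows. This is a minimal illustration, not the working behind the figure 21568: the function name `grouped_median` and the class limits and frequencies are hypothetical, and all classes are assumed to have the same width.

```python
def grouped_median(lower_limits, frequencies, width):
    """Median of grouped data: L + (N/2 - CF) / F * i."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("there must be at least one observation")
    half = n / 2
    cumulative = 0  # cumulative frequency below the current class
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and cumulative + f >= half:
            # This is the median class.
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with these frequencies.
median = grouped_median([0, 10, 20, 30], [5, 15, 20, 10], width=10)
print(median)  # 22.5
```

With N = 50, the median class is 20-30 (cumulative frequency below it is 20, its own frequency is 20), so the median is 20 + (25 - 20)/20 × 10 = 22.5.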
Mean
For grouped data, the midpoint of each category is multiplied by the number of
observations in that category. The products are summed and the total is divided by the total number of observations.
$$\bar{X} = \frac{\sum f x}{\sum f}$$
Example: Two students, X and Y, attend three class tests and the scores are as follows:
Student   Test 1   Test 2   Test 3   Mean
X         55%      60%      65%      60%
Y         65%      60%      55%      60%
Though the mean is the same, X is better than Y: Y's scores have deteriorated from test to test.
Measures of Dispersion
Dispersion is the spread of the data in a distribution. A measure of dispersion indicates the
degree of scatter of the observations. Let curves A and B represent two frequency
distributions. Observe that A and B have the same mean. But curve A has less variability than B.
If we measure only the mean of these two distributions, we will miss an important difference
between A and B. To increase our understanding of the pattern of the data we must also measure
its dispersion.
Range: It is the difference between the highest and lowest observed values.
i.e. range = H – L, H = Highest, L = Lowest.
Note:
1. Range is the crudest measure of dispersion.
2. $\frac{H - L}{H + L}$ is called the coefficient of range.
Quartile Deviation (Q): Q is given by
$$Q = \frac{Q_3 - Q_1}{2}$$
Note:
1. $\frac{Q_3 - Q_1}{Q_3 + Q_1}$ is called the coefficient of quartile deviation.
2. Quartile deviation is not a true measure of dispersion but only a distance of scale.
Mean Deviation (MD): If A is any average then mean deviation about A is given by
$$MD(A) = \frac{\sum f_i |x_i - A|}{N}$$
Note:
1. Mean deviation about the mean is $MD(\bar{x}) = \frac{\sum f_i |x_i - \bar{x}|}{N}$.
2. Of all the mean deviations taken about different averages, mean deviation about the
median is the least.
3. $\frac{MD(A)}{A}$ is called the coefficient of mean deviation about A.
Variance ($s^2$): A measure of the average squared distance between the mean and each term in the
population.
$$s^2 = \frac{1}{N} \sum f_i (x_i - \bar{x})^2$$
Standard deviation (s) is the positive square root of the variance
$$s = \sqrt{\frac{1}{N} \sum f_i (x_i - \bar{x})^2}$$
Equivalently,
$$s^2 = \frac{1}{N} \sum f_i x_i^2 - (\bar{x})^2$$
Note: Combined variance of two sets of data of $N_1$ and $N_2$ items with means $\bar{x}_1$ and $\bar{x}_2$ and
standard deviations $s_1$ and $s_2$ respectively is obtained by
$$s^2 = \frac{N_1 s_1^2 + N_2 s_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}$$
where $d_1^2 = (\bar{x} - \bar{x}_1)^2$, $d_2^2 = (\bar{x} - \bar{x}_2)^2$
and $\bar{x} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2}$
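The combined variance formula can be checked numerically against the variance of the pooled data. The sketch below is an illustration under stated assumptions: the function name `combined_variance` and the two small data sets are hypothetical, and population variance (divisor N) is used throughout, as in the formula above.

```python
from statistics import fmean, pvariance


def combined_variance(n1, mean1, var1, n2, mean2, var2):
    """Combined (population) variance of two groups from their summaries."""
    mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1_sq = (mean - mean1) ** 2
    d2_sq = (mean - mean2) ** 2
    return (n1 * var1 + n2 * var2 + n1 * d1_sq + n2 * d2_sq) / (n1 + n2)


# Hypothetical data sets.
a = [2.0, 4.0, 6.0]
b = [1.0, 3.0, 5.0, 7.0, 9.0]

combined = combined_variance(len(a), fmean(a), pvariance(a),
                             len(b), fmean(b), pvariance(b))
direct = pvariance(a + b)  # variance computed from the pooled observations
print(combined, direct)
```

Both routes should give the same value, which is the point of the formula: the combined variance can be obtained from the group sizes, means and standard deviations alone, without the raw observations.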
Sample variance ($s^2$): Let $x_1, x_2, x_3, \ldots, x_n$ represent a sample with mean $\bar{x}$. Then
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2}{n - 1} - \frac{n (\bar{x})^2}{n - 1}$$
Coefficient of Variation (C.V.): It is a relative measure of dispersion that enables us to compare two distributions. It relates the
standard deviation and the mean by expressing the standard deviation as a percentage of the
mean.
$$C.V. = \frac{\sigma}{\bar{x}} \times 100$$
Note:
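$$\text{Coefficient of range} = \frac{H - L}{H + L} = \frac{300}{350 + 50} = 0.75$$
Position of $Q_1$: $\frac{n + 1}{4} = \frac{14}{4} = 3.5$
Position of $Q_3$: $\frac{3(n + 1)}{4} = 10.5$
Q1 = 103 + 0.5 (103 – 103) = 103
Q3 = 174 + 0.5 (200 – 174) = 187
$$\text{Coefficient of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{84}{290} = 0.2896$$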
Illustration 2: Calculate coefficient of mean deviation about (i) Median (ii) mean from the
following data
X 14 16 18 20 22 24 26
f 2 4 5 3 2 1 4
Solution:
X      f     cf    fx     |x – x̄|   |x – M|   f|x – x̄|   f|x – M|
14     2     2     28     5.71      4         11.42      8
16     4     6     64     3.71      2         14.84      8
18     5     11    90     1.71      0         8.55       0
20     3     14    60     0.29      2         0.87       6
22     2     16    44     2.29      4         4.58       8
24     1     17    24     4.29      6         4.29       6
26     4     21    104    6.29      8         25.16      32
Total  21          414                        69.71      68
$$\bar{x} = \frac{\sum f_i x_i}{N} = \frac{414}{21} = 19.71$$
$\frac{N + 1}{2} = \frac{22}{2} = 11$, so the median is the 11th item: Median M = 18
Now (i)
$$M.D.(\bar{x}) = \frac{\sum f_i |x_i - \bar{x}|}{N} = \frac{69.71}{21} = 3.32$$
$$\text{Coefficient of } MD(\bar{x}) = \frac{MD(\bar{x})}{\bar{x}} = \frac{3.32}{19.71} = 0.17$$
(ii)
$$M.D.(M) = \frac{\sum f_i |x_i - M|}{N} = \frac{68}{21} = 3.24$$
$$\text{Coefficient of } MD(M) = \frac{MD(M)}{M} = \frac{3.24}{18} = 0.18$$
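The working of Illustration 2 can be reproduced with a short script. The values of X and f are those given in the illustration; the variable and function names are assumptions made for this sketch.

```python
x = [14, 16, 18, 20, 22, 24, 26]
f = [2, 4, 5, 3, 2, 1, 4]

n = sum(f)
mean = sum(fi * xi for fi, xi in zip(f, x)) / n

# Median: the ((N + 1) / 2)-th item, located through cumulative frequencies.
position = (n + 1) / 2
cumulative = 0
median = None
for xi, fi in zip(x, f):
    cumulative += fi
    if cumulative >= position:
        median = xi
        break


def mean_deviation(center):
    """Mean deviation about `center`: sum of f * |x - center| divided by N."""
    return sum(fi * abs(xi - center) for fi, xi in zip(f, x)) / n


md_mean = mean_deviation(mean)
md_median = mean_deviation(median)
coeff_mean = md_mean / mean
coeff_median = md_median / median

print(round(mean, 2), median)                   # 19.71 18
print(round(md_mean, 2), round(md_median, 2))   # 3.32 3.24
```

The mean deviation about the median (3.24) comes out smaller than the mean deviation about the mean (3.32), as expected for this data.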
Illustration 3: A purchasing agent obtained a sample of incandescent lamps from two suppliers.
He had the sample tested in his laboratory for length of life with following results.
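Solution:
Sample A:
$$\bar{u} = \frac{32}{60} = 0.533$$
$$\bar{x}_A = 1000 + 200\,\bar{u} = 1000 + 200 (0.533) = 1106.67$$
$$s_u^2 = \frac{1}{N} \sum f u^2 - (\bar{u})^2 = \frac{68}{60} - (0.533)^2 = 1.133 - 0.2809$$
$$s_u^2 = 0.8524$$
$$s_u = 0.9233$$
$$\sigma_A = 200 \times s_u = 200 \times 0.9233 = 184.66$$
$$\text{C.V. for Sample A} = \frac{\sigma_A}{\bar{x}_A} \times 100 = \frac{184.66}{1106.67} \times 100 = 16.68\%$$
Sample B:
$$\bar{v} = \frac{15}{60} = 0.25$$
$$\bar{x}_B = 1000 + 200\,\bar{v} = 1000 + 58 = 1058$$
$$s_v^2 = \frac{1}{N} \sum f v^2 - (\bar{v})^2 = \frac{27}{60} - (0.25)^2 = 0.45 - 0.0625$$
$$s_v^2 = 0.3875$$
$$s_v = 0.6225$$
$$s_B = 200 \times s_v = 200 \times 0.6225 = 124.5$$
$$\text{C.V. for Sample B} = \frac{s_B}{\bar{x}_B} \times 100 = \frac{124.5}{1058} \times 100 = 11.77\%$$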
Since C.V. for sample B is smaller, sample B lamps are more uniform.
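The final comparison can be sketched in code. The function name `coefficient_of_variation` is hypothetical; the standard deviations and means are simply the figures quoted in the solution above, taken as given rather than recomputed from the lamp data.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100


# Figures quoted in the solution for the two samples.
cv_a = coefficient_of_variation(184.66, 1106.67)
cv_b = coefficient_of_variation(124.5, 1058)

# The sample with the smaller C.V. is the more uniform one.
more_uniform = "B" if cv_b < cv_a else "A"
print(more_uniform)  # B
```

Because C.V. is a relative measure, it allows the two suppliers to be compared even though their mean lamp lives differ.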
Interpretation means bringing out the meaning of data, or we can say that interpretation is to
convert data into information. The essence of any research is to draw conclusions about the study.
This requires a high degree of skill. There are two methods of drawing conclusions (1) induction
(2) deduction.
In induction method, one starts from observed data and then generalization is done, which
On the other hand, deductive reasoning starts from some general law and then applied to a
Example of induction: All products manufactured by Sony are excellent. DVD player model
Example of deduction: All products have to reach decline stage one day and become obsolete.
This radio is in the decline stage. Therefore it will become obsolete.
During inductive phase, we reason from observation. During deductive phase, we reason towards
observation. Both logic and observation are essential for interpretation.
Successful interpretation depends on how well the data is analyzed. If data is not properly
analyzed, the interpretation may go wrong. If the analysis is to be correct, then data collection
must be proper. Similarly, if the data collected is proper but analyzed wrongly, then also the
interpretation or conclusion will be wrong. Sometimes even proper data and proper analysis
can still lead to wrong interpretation. Interpretation depends on the experience of the researcher
and the methods used by him for interpretation.
Example: A detergent manufacturer is trying to decide which of the three sales promotion
methods (discount, contest, buy one get one free) would be most effective in increasing
sales. Each sales promotion method is run at different times in different cities. The sales obtained under
the different sales promotion methods are as follows.
From the results it can be concluded that the second sales promotion method was the most effective in
developing sales. This may be adopted nationally to promote the product. But one cannot say
that the same method of sales promotion will be effective in each and every city under study.
7. Do not miss the significance of some answers, such as “don’t know” or “can’t say”, merely
because they come from very few respondents.
4.6 Reporting the Results
The goal of research is not just to discover something but to communicate that discovery to a
larger audience (other social scientists, government officials, your teachers, the general public),
perhaps several of these audiences. Whatever the study’s particular outcome, if the research
report enables the intended audience to comprehend the results and learn from them, the research
can be judged a success. If the intended audience is not able to learn about the study’s results, the
research should be judged a failure no matter how expensive the research, how sophisticated its
This conclusion may seem obvious, and perhaps a bit unnecessary. After all, you may think that
all researchers write up their results for other people to read. But the fact is that many research
projects fail to produce a research report. Sometimes the problem is that the research is poorly
designed to begin with and cannot be carried out in a satisfactory manner; sometimes
unanticipated difficulties derail a viable project. But too often the researcher just never gets
around to writing a report. And then there are many research reports that are very incomplete
or poorly written or that speak to only one of several interested audiences. The failure may not
be complete, but the project’s full potential is not achieved.
The stage of reporting research results is also the point at which the need for new research is
identified. It is the time when, so to speak, “the rubber hits the road”—when we have to make
our research make sense to others. To whom will our research be addressed? How should we
present our results to them? Should we seek to influence how our research report is used?
The research report will present research findings and interpretations in a way that reflects
some combination of the researcher’s goals, the research sponsor’s goals, the concerns of the
research subjects, and perhaps the concerns of a wider anticipated readership. Understanding
the goals of these different groups will help the researcher begin to shape the final report even
at the start of the research. In designing a proposal and in negotiating access to a setting for the
research, commitments often must be made to produce a particular type of report, or at least
cover certain issues in the final report. As the research progresses, feedback about the research
from its participants, sponsoring agencies, collaborators, or other interested parties may suggest
the importance of focusing on particular issues in the final report. Social researchers traditionally
have tried to distance themselves from the concerns of such interested parties, paying attention
only to what is needed to advance scientific knowledge. But in recent years, some social scientists
have recommended bringing these interested parties into the research and reporting process
itself.
Self Assessment
10. A pilot survey is very useful when the actual survey is to be on a ............ .
11. Coding speeds up the tabulation while editing eliminates ..................... .
12. ..................... refers to counting the number of cases that fall into various categories.
13. ..................... is the middle value of the data arranged in order.
14. Coefficient of variation (C.V.) is a relative measure of dispersion that enables us to compare
..................... .
4.7 Summary
Proper problem formulation is the key to success in research.
It is vital, and any error in defining the problem can result in wastage of time
and money.
The task of defining a research problem, very often, follows a sequential pattern.
The problem is stated in a general way, the ambiguities are resolved, and a thinking and
rethinking process results in a more specific formulation of the problem.
It is done so that it may be a realistic one in terms of the available data and resources and
is also analytically meaningful.
All this results in a well defined research problem that is not only meaningful from an
operational point of view but is equally capable of paving the way for the development of
working hypotheses and for means of solving the problem itself.
Data when collected is raw in nature. When processed, it becomes information.
Without data analysis and interpretation, the researcher cannot draw any conclusion.
Interpretation can use either inductive or deductive logic. While interpreting, certain
precautions are to be taken.
4.8 Keywords
Data collection: Data collection is the systematic recording of information; data analysis involves
working to uncover patterns and trends in data sets; data interpretation involves explaining
those patterns and trends.
Editing: Editing includes inspection and correction of each questionnaire.
Frequency Distribution: A frequency distribution organizes data into classes or groups.
Median: The median is the middle value of the data arranged in order.
Mode: The mode is the value or item that occurs most often when data is categorized in
a frequency distribution.
1. The objective of research problem should be clearly defined; otherwise the data collection
becomes meaningless. Discuss with suitable examples.
2. Cultural and technological changes can act as a source for research problem identification.
Why/why not?
3. Defining a research problem properly is a prerequisite for any study. Why?
4. What precautions should be taken while formulating a problem?
5. If you are appointed to do a research for some problem with the client, what would you
take as the sources for problem identification?
6. It may be a problem and at the same time, it can also be viewed as an opportunity. Why/
why not?
7. In some cases, some sort of preliminary study may be needed. Which cases are being
(a) Mode
(b) Median
(c) Mean
10. How to summarise and classify the collected data?
1. Time, Money
2. Expectation
3. What a problem is
4. Conclusive
5. Carefully
6. Articulate
7. Formulation
8. Reduce
9. Marketing
12. Tabulation
13. Median
14. Two distributions
S. N. Murthy and U. Bhojanna, Business Research Methods, 2nd Edition, Excel Books.
Online links
www.experiment-resources.com
www.scribd.com
CONTENTS
Objectives
Introduction
5.1 Use of Literature Review
5.2 Search for Related Literature
5.3 Reading the Literature
5.4 Guidelines for Information Presentation
5.5 Process of Literature Review
5.6 Summary
5.7 Keywords
5.8 Review Questions
5.9 Further Readings
Objectives
Introduction
A literature review is an evaluative report of information found in the literature related to your
selected area of study. The review should describe, summarise, evaluate and clarify this literature.
It should provide a theoretical base for the research and help you (the author) determine the
nature of your research. Works which are irrelevant should be discarded and those which are
peripheral should be looked at critically.
In writing the literature review, the purpose is to convey to the reader what knowledge and
ideas have been established on a topic, and what their strengths and weaknesses are. As a piece
of writing, the literature review must be defined by a guiding concept.
Notes Besides enlarging the body of knowledge about the topic, writing a literature
review leads the writer to gain and demonstrate skills in the following areas:
1. Searching skills: It improves the ability of the researcher to sift the literature
efficiently, using manual or computerized methods, to identify a set of useful articles
and books.
2. Analysing skills: It is the ability to apply principles of analysis to identify the
unbiased and valid studies.
It helps the researcher to learn about the studies similar to his own study and the research
design and methodology adopted to carry out those studies by earlier researchers.
It provides useful source of data related to the subject being studied.
It helps in introducing important and useful research personalities.
It provides an opportunity to see the study in a historical perspective.
Literature review provides new ideas, methods and approaches to deal with research
problems.
It helps the researcher to compare his own study with other relevant studies.
It helps in anticipating the problems arising during the collection of data. The researcher
Information in literature review should be organised and related directly to the research
problem.
Self Assessment
When beginning a search for related literature, practical research suggests travelling to the
library and looking at the selection of indices, abstracts and available bibliographies. Information
available on microfilm must also be considered.
The World Wide Web (internet), a boon for researchers, helps in identifying useful and relevant data
for the research project.
The Association of Indian Universities periodical University News publishes the thesis of the month
on the last page of the periodical. It includes the research projects completed in that particular
month, and is an important source for literature review.
Research agencies conduct various studies which comprise abundant data that may be helpful
in searching the literature.
5.4 Guidelines for Information Presentation
1. Discuss fairly and clearly: The review of literature should be like a discussion with a
friend concerning the studies, research reports, and writings that bear directly on your
2. Organize a plan: Begin the discussion from a broad perspective and narrow it down to the
3. Do not copy information as it is: Emphasise what the study interprets rather than what
content has been produced. So, the researcher should critically evaluate and present the
information in his own words.
4. Establish the relationship between literature and research project: This can be done by
charting each study in relation to the problem or sub- problem it addresses. Study carefully
before beginning to write. Literature discussed should have a link to the research problem.
5. Summarise: Summarise major contributions of significant studies and articles to the body
of knowledge under review, maintaining the focus established in the introduction.
Caution Do not forget to evaluate the current “state of the art” for the body of knowledge
reviewed, pointing out major methodological flaws or gaps in research, inconsistencies in
theory and findings, and areas or issues pertinent to any future study.
Conclude by providing some insight into the relationship between the central topic of the
literature review and a larger area of study such as a discipline, a scientific endeavour, or
a profession.
Self Assessment
1. Selecting the topics and sub-topics: The researcher needs to select the topics and sub-topics
related to the research question being studied. It helps to direct the literature search in the
right direction. If the topics and sub-topics are not chosen, the researcher may end up with
lot of unrelated information.
2. Identifying the sources of information: The sources of information for specific topics are
to be identified and a list of the sources along with the specific topic should be made.
3. Collecting the information: Collect information systematically one after the other from
the reliable sources. The information about each topic should be recorded separately. This
helps the researcher to organise the information properly.
4. Organise the information: Information about each topic should be recorded and maintained
separately. It has to be categorised based on the topics and sub-topics. The categorised
information may be used appropriately in writing a literature review.
5. Writing the literature review: It consists of three steps, which are explained below:
Did u know? What are the steps involved in literature review writing?
Introduction: Define the topic, issues or areas of research being studied, thus providing an
appropriate context for reviewing the literature.
Body: Critically evaluate the information and make appropriate comparison of the studies
reviewed.
Conclusion: State your own viewpoint about the study, not merely what the study says.
Self Assessment
6. If the ..................... and ..................... are not chosen, the researcher may end up with a lot of
unrelated information.
7. The sources of information for specific topics are to be identified and a list of the sources
along with the ....................should be made.
8. Introduction primarily involves ................... the topic, issues or areas of research being
studied.
5.6 Summary
Literature review helps the researcher to move from management question to research
question.
Literature research enables the researcher to gather data for the current project which were originally
collected for some other purpose.
If found useful for the current study, this will save time and cost.
Sometimes the secondary data through literature review may not be entirely suitable;
5.7 Keywords
Analysing skills: It is the ability to apply principles of analysis to identify the unbiased and
valid studies.
Literature review: A literature review is an evaluative report of information found in the literature
related to your selected area of study.
1. In writing the literature review, the purpose is to convey to the reader what knowledge
and ideas have been established on a topic, and what their strengths and weaknesses are?
2. What specific things would you keep in mind while writing the literature review?
7. Do you think that a researcher must study each sub problem separately? Why/why not?
8. Why would you call the World Wide Web a boon for researchers?
9. Why is it said that a researcher should never copy the information as it is?
10. What might happen if, in the conclusion, the researcher merely states what the study says?
1. Literature review
2. Research problem
3. Summarise
4. Beginning
5. Research question
6. Topics, Sub-topics
7. Specific topic
8. Defining
5.9 Further Readings
Books Abrams, M.A., Social Surveys and Social Action, London: Heinemann, 1951.
Bernal, J.D., The Social Function of Science, London: George Routledge and Sons,
1939.
Chase, Stuart, The Proper Study of Mankind: An inquiry into the Science of Human
Relations, New York, Harper and Row Publishers, 1958.
S.N Murthy and U. Bhojanna, Business Research Methods, 2nd Edition, Excel Books.
CONTENTS
Objectives
Introduction
6.1 Meaning
6.2 Types of Research Designs
6.3 Exploratory Research
6.3.1 Exploratory Research Methods
6.4 Conclusive Research
6.4.1 Descriptive or Diagnostic Research
6.4.2 Survey
6.4.3 Observation Studies
6.5 Causal Research
6.6 Experimentation
Objectives
After studying this unit, you will be able to:
Construct an overview of research design;
Define exploratory research design;
Explain the methods that are adopted during exploratory research;
Describe the descriptive research design;
Discuss the causal research design;
Explain experimentation.
Introduction
Suppose a manufacturer of a quality machine finds sales disappointing and believes that they
may be helped by the development of point of purchase display. The contemplated display is
expensive but the manufacturer would like to try it out first on a limited basis to be sure that it
stimulates more sales and profits than it costs. This can be achieved through a better planning
and formulation of a good strategy. According to Kerlinger, “Research design is the plan,
structure and strategy of investigation conceived so as to obtain answers to research questions
and to control variance”.
Research design is in fact the conceptual structure within which the research is conducted.
Bernard Philips has described the research design as a “blue print for the collection, measurement
and analysis of data”.
6.1 Meaning
Research design is simply a plan for a study. It is used as a guide in collecting and analyzing
the data. It can be called a blue print to carry out the study. It is like the plan made by an architect
to build a house: if research is conducted without a blue print, the result is likely to be
different from what was expected at the start. The blue print specifies (1) the interviews to be conducted,
observations to be made, experiments to be conducted and data analysis to be made; (2) the tools used
to collect the data, such as questionnaires; and (3) the sampling methods used.
Research design can be thought of as the structure of research – it is the “glue” that holds all of
the elements in a research project together. A successful design stems from a collaborative
Research design is mainly of three types, namely exploratory, descriptive and causal research.
Exploratory research is used to seek insights into the general nature of the problem. It identifies the
relevant variables that need to be considered. In this type of research, there is no previous
knowledge; research methods are flexible, qualitative and unstructured.
Notes The researcher in this method does not know “what he will find”.
Descriptive research is very widely used in marketing research. Generally, in a
descriptive study there will be a hypothesis, and with respect to this hypothesis we ask questions
about matters such as size, distribution, etc.
Causal research is concerned with finding cause and effect relationships.
Normally experiments are conducted in this type of research.
Self Assessment
However, a frequently used classification system is to group research designs under three major
categories:
i. Research Designs in case of Exploratory Research Studies.
The major emphasis in exploratory research is on converting broad, vague problem statements
into small, precise sub-problem statements, which is done in order to formulate specific
hypotheses. A hypothesis is a statement that specifies how two or more variables are related.
In the early stages of research, we usually lack sufficient understanding of the problem to
formulate a specific hypothesis. Further, there are often several tentative explanations.
Examples: 1. “Sales are down because our prices are too high”,
2. “our dealers or sales representatives are not doing a good job”,
The following are the circumstances in which exploratory study would be ideally suited:
Exploratory study is also used to increase the analyst’s familiarity with the problem. This
is particularly true, when the analyst is new to the problem area.
Example: A market researcher (a new entrant) working for a company for the first time.
To establish priorities so that further research can be conducted.
Exploratory studies may be used to clarify concepts and help in formulating precise
problems.
Example: The management is considering a change in the contract policy, which it
hopes will result in improved satisfaction for channel members. An exploratory study can be
used to clarify the present state of channel members’ satisfaction and to develop a method by
In general, exploratory research is appropriate to any problem about which very little is
At the exploratory stage:
1. Sometimes, it may not be possible to develop any hypothesis at all, if the situation is
being investigated for the first time. This is because no previous data is available.
3. In other cases, most of the data is available and it may be possible to provide answers to
the problem.
Example: The examples given below illustrate each of the above types:
In example 1, the research question is posed to determine “What benefit do people seek from the
Ad?” Since no previous research has been done on consumer benefit for this product, it is not possible
to form any hypothesis.
In example 2, some information is currently available about packaging for a soft drink. Here it
is possible to formulate a hypothesis which is purely tentative. The hypothesis formulated here
may be only one of the several alternatives available.
In example 3, the root cause of customer dissatisfaction is known, i.e. lack of personalised
service. In this case, it is possible to verify whether this is a cause or not.
The quickest and the cheapest way to formulate a hypothesis in exploratory research is by using
any of the four methods:
1. Literature Search: This means consulting the literature to develop a new hypothesis.
The literature consulted includes trade journals, professional journals, market research findings,
statistical publications, etc.
Example: Suppose a problem is “Why are sales down?” This can quickly be analysed
with the help of published data which should indicate “whether the problem is an “industry
1. The company’s market share has declined but industry’s figures are normal.
2. The industry is declining and hence the company’s market share is also declining.
If we accept the situation that our company’s sales are down despite the market showing
an upward trend, then we need to analyse the marketing mix variables.
Example:
A TV manufacturing company feels that its market share is declining whereas the
overall television industry is doing very well.
Due to a trade embargo imposed by a country, textile exports are down and hence
the sales of a company making garments for export are on the decline.
The above information may be used to pinpoint the reason for declining sales.
2. Experience Survey: In experience surveys, it is desirable to talk to persons who are well
informed in the area being investigated. These people may be company executives or
persons outside the organisation. Here, no questionnaire is required. The approach adopted
in an experience survey should be highly unstructured, so that the respondent can give
divergent views. Since the idea of using experience survey is to undertake problem
formulation, and not conclusion, probability sample need not be used. Those who cannot
speak freely should be excluded from the sample.
Example:
(1) A group of housewives may be approached for their choice for a “ready to cook
product”.
(2) A publisher might want to find out the reason for poor circulation of a newspaper
introduced recently. He might meet (a) Newspaper sellers (b) Public reading room
(c) General public (d) Business community, etc.
These are experienced persons whose knowledge the researcher can use.
3. Focus Group: Another widely used technique in exploratory research is the focus group.
In a focus group, a small number of individuals are brought together to study and talk
about some topic of interest. The discussion is co-ordinated by a moderator. The group
usually is of 8-12 persons. While selecting these persons, care has to be taken to see that
they should have a common background and have similar experiences in buying. This is
required because there should not be a conflict among the group members on the common
issues that are being discussed. During the discussion, future buying attitudes, present
buying opinion etc., are gathered.
Most of the companies conducting the focus groups first screen the candidates to determine
who will compose the particular group. Firms also take care to avoid groups in which
some of the participants have their friends and relatives, because this leads to a biased
discussion. Normally, a number of such groups are constituted and the final conclusions
of various groups are taken for formulating the hypothesis. Therefore a key factor in focus
group is to have similar groups. Normally there are 4-5 groups. Some studies may even
have 6-8 groups. The guiding criterion is to see whether the later groups are generating
additional ideas or repeating the same with respect to the subject under study. When this
shows a diminishing return from the group, the discussions are stopped. The typical focus
group lasts for 1 hour 30 minutes to 2 hours. The moderator under the focus group has a key role.
His job is to guide the group to proceed in the right direction.
Listening: He must have a good listening ability. The moderator must not miss the
Permissive: The moderator must be permissive, yet alert to the signs that the group
is disintegrating.
Memory: He must have a good memory. The moderator must be able to remember
the comments of the participants. Example: A discussion is centered around a new
advertisement by a telecom company. The participant may make a statement early
and make another statement later, which is opposite to what was said earlier.
For example: The participant may say that s(he) never subscribed to the views
expressed in the advertisement by the competitor, but subsequently may say that
the “current advertisement of competitor is excellent”.
Encouragement: The moderator must encourage unresponsive members to participate.
Respondent moderator group: Under this method, the moderator will select one of
the participants to act as a temporary moderator.
Dueling moderator group: In this method, there are two moderators. They purposely
take opposing positions on a given topic. This will help the researcher to obtain the
views of both groups.
Two way focus group: Under this method one group will listen to the other group.
Later, the second group will react to the views of the first group.
Dual moderator group: Here, there are two moderators. One moderator will make
sure that the discussion moves smoothly. The second moderator will ask a specific
question.
4. Case studies: Analysing a selected case sometimes gives an insight into the problem
which is being researched. Case histories of companies which have undergone a similar
situation may be available. These case studies are well suited to carry out exploratory
research. However, the result of investigation of case histories is always considered
suggestive, rather than conclusive. In the case of preference for “ready to eat food”, many case
histories may be available in the form of previous studies made by competitors. We must
carefully examine the already published case studies with regard to other variables such
as price, advertisement, changes in taste, etc.
A Case in Point
A company manufacturing electric shavers, known for its brand, wanted to introduce the product
in Japan. Before the launch, the company made sure that all the 4Ps were acceptable to customers.
When the product was launched, it met with failure. The company wondered what went wrong.
Later investigations revealed that Japanese palms were very small and hence the product was
not convenient for use. Not all possible causes had been listed and examined. This shows the
Self Assessment
4. In experience surveys, it is desirable to talk to persons who are well informed in the area
being.............................
5. Under the method of ....................................... group, the moderator will select one of the
participants to act as a temporary moderator.
6. Most of the companies conducting the ........................... groups first screen the candidates to
determine who will compose the particular group.
7. The moderator must not miss the .............................comment.
8. The moderator must encourage ........................ members to participate.
Meaning: This is research with clearly defined objectives. In this type of research, specific
courses of action are taken to solve the problem.
In conclusive research, there are two types:
(a) Descriptive research
(b) Experimental research or Causal research.
Meaning
(a) The name itself reveals that it is essentially research to describe something.
(b) Descriptive research can inform us, for example, about the proportions of high and low income customers in a
particular territory. What it cannot do is establish a
cause and effect relationship between the characteristics of interest. This is the distinct
disadvantage of descriptive research.
(c) Descriptive study requires a clear specification of “Who, what, when, where, why and how” of
the research.
new outlet. The company wants to determine, “How people come to patronize a new outlet?”
Some of the questions that need to be answered before data collection for this descriptive study
are as follows:
Who? Who is regarded as a shopper responsible for the success of the shop, whose demographic
To determine the association between two variables, such as advertising and sales.
To make a prediction. We might be interested in sales forecasting for the next three years,
so that we can plan for training of new sales representatives.
To estimate the proportion of people in a specific population who behave in a particular
way.
Task For each of the situations mentioned below, state whether the research should be
2. To find out the consumer reaction regarding use of new detergents which are
economical.
Brands        At T1          At T2
Brand X       500 (25%)      600 (30%)
Brand Y       700 (35%)      650 (32.5%)
Brand Z       400 (20%)      300 (15%)
Brand M       200 (10%)      250 (12.5%)
All others    200 (10%)      200 (10%)
Total         2000 (100%)    2000 (100%)
As can be seen, between periods T1 and T2 Brand X and Brand M have shown an improvement
in market share. Brand Y and Brand Z have lost market share, whereas all other
categories remain the same. This shows that Brands X and M have gained market share at
the cost of Y and Z.
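The comparison between the two periods can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the "All others" category is assumed to stay at 200 units in both periods, as the accompanying text states.

```python
# Unit sales per brand at two points in time (illustrative panel data).
# Assumption: "All others" is unchanged at 200 units in both periods.
sales_t1 = {"Brand X": 500, "Brand Y": 700, "Brand Z": 400,
            "Brand M": 200, "All others": 200}
sales_t2 = {"Brand X": 600, "Brand Y": 650, "Brand Z": 300,
            "Brand M": 250, "All others": 200}


def market_shares(sales):
    """Return each brand's share of total unit sales, in percent."""
    total = sum(sales.values())
    return {brand: 100.0 * units / total for brand, units in sales.items()}


def share_changes(before, after):
    """Return the change in market share (percentage points) per brand."""
    shares_before = market_shares(before)
    shares_after = market_shares(after)
    return {brand: shares_after[brand] - shares_before[brand] for brand in before}


changes = share_changes(sales_t1, sales_t2)
for brand, delta in changes.items():
    print(f"{brand}: {delta:+.1f} percentage points")
```

A positive change marks a brand that gained share between T1 and T2, and a negative change marks one that lost share. Because shares always sum to 100 per cent, the gains and losses cancel out.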
Omnibus panel.
True panel: This involves repeat measurement of the same variables.
Each member of the panel is examined at a different time, to arrive at a conclusion on the
above subject.
Omnibus panel: In an omnibus panel too, a sample of elements is selected and
maintained, but the information collected from the members varies. At a certain point of
The sample may not be representative. This is because sometimes, panels may be
selected on account of convenience.
The panel members who provide the data, may not be interested to continue as
panel members. There could be dropouts, migration etc. Members who replace
them may differ vastly from the original member.
Sometimes the panel members may be disinterested and uncommitted.
A lengthy period of membership in the panel may cause respondents to start
imagining themselves to be experts and professionals. They may start responding
like experts and consultants and not like respondents. To avoid this, no one should
be retained as a member for more than 6 months.
(b) Cross-sectional study: Cross-sectional study is one of the most important types of
descriptive research. It can be done in two ways:
Field study
Field survey
Field study: This involves an in-depth study of a
problem, such as the reaction of young men and women towards a product.
Example: Reaction of Indian men towards branded ready-to-wear suits. Field study is
carried out in real world environment settings. Test marketing is an example of field study.
Field survey: Large samples are a feature of the study. The biggest limitations of this
survey are cost and time. Also, if the respondent is cautious, then he might answer the
questions in a different manner. Finally, field survey requires good knowledge like
attribute in determining the consumption of a product, like sales of woollen wear in a particular
location. Suppose that the proposition to be examined is that the urban population is more
likely to use the product than the semi-urban population. This hypothesis can be examined in a
cross-sectional study. Measurements can be taken from a representative sample of the population
in both geographical locations with respect to the occupation and use of the products. In case of
tabulation, the researcher can count the number of cases that fall into each of the following classes:
Urban population which does not use the product - Category III
Semi-urban population which does not use the product - Category IV
Here, we should know that the hypothesis needs to be tested against the sample data,
i.e., the proportion of urbanites using the product should exceed the proportion of the semi-urban population
using the product.
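The tabulation and comparison described above can be sketched as follows. The survey records are hypothetical and purely illustrative. The two-proportion z statistic is one common way to compare two sample proportions; the text does not say which test to use, so it is an assumption here.

```python
from math import sqrt

# Hypothetical survey records: (location, uses_product). Illustrative only.
records = (
    [("urban", True)] * 60 + [("urban", False)] * 40
    + [("semi-urban", True)] * 45 + [("semi-urban", False)] * 55
)


def tabulate(records):
    """Count the cases falling into each location-by-usage category."""
    counts = {("urban", True): 0, ("semi-urban", True): 0,
              ("urban", False): 0, ("semi-urban", False): 0}
    for location, uses_product in records:
        counts[(location, uses_product)] += 1
    return counts


def usage_proportion(counts, location):
    """Proportion of the sampled population in a location that uses the product."""
    users = counts[(location, True)]
    non_users = counts[(location, False)]
    return users / (users + non_users)


def two_proportion_z(counts):
    """z statistic for 'urban usage proportion exceeds semi-urban proportion'."""
    n_urban = counts[("urban", True)] + counts[("urban", False)]
    n_semi = counts[("semi-urban", True)] + counts[("semi-urban", False)]
    p_urban = usage_proportion(counts, "urban")
    p_semi = usage_proportion(counts, "semi-urban")
    # Pooled proportion under the null hypothesis of equal usage rates.
    pooled = (counts[("urban", True)] + counts[("semi-urban", True)]) / (n_urban + n_semi)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_urban + 1 / n_semi))
    return (p_urban - p_semi) / std_error


counts = tabulate(records)
print("Urban users:", usage_proportion(counts, "urban"))
print("Semi-urban users:", usage_proportion(counts, "semi-urban"))
print("z statistic:", round(two_proportion_z(counts), 3))
```

The sample supports the hypothesis only if the urban proportion is higher and the z statistic exceeds the critical value for the chosen significance level (about 1.645 for a one-sided test at 5 per cent).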
6.4.2 Survey
The survey is a research technique in which data are gathered by asking questions of respondents.
Survey research is one of the most important areas of measurement in applied social research.
The broad area of survey research encompasses any measurement procedures that involve
asking questions of respondents. A “survey” can be anything from a short paper-and-pencil
feedback form to an intensive one-on-one in-depth interview.
Types of Surveys
Surveys can be divided into two broad categories: the questionnaire and the interview.
Questionnaires are usually paper-and-pencil instruments that the respondent completes.
Interviews are completed by the interviewer based on what the respondent says. Sometimes, it’s hard
to tell the difference between a questionnaire and an interview. For instance, some people think
that questionnaires always ask short closed-ended questions while interviews always ask broad
open-ended ones. But you will see questionnaires with open-ended questions (although they do
tend to be shorter than in interviews) and there will often be a series of closed-ended questions
asked in an interview.
Survey research has changed dramatically in the last ten years. We have automated telephone
surveys that use random dialing methods. There are computerized kiosks in public places that
ask people for input. A whole new variation of group interview has evolved as focus
group methodology. Increasingly, survey research is tightly integrated with the delivery of
service. Your hotel room has a survey on the desk. Your waiter presents a short customer
satisfaction survey with your check. You get a call for an interview several days after your last
call to a computer company for technical assistance. You’re asked to complete a short survey
when you visit a web site.
Selecting the type of survey you are going to use is one of the most critical decisions in many
social research contexts. You’ll see that there are very few simple rules that will make the
decision for you – you have to use your judgment to balance the advantages and disadvantages
Population Issues
The first set of considerations has to do with the population and its accessibility.
1. Can the population be enumerated?: For some populations, you have a complete listing of
the units that will be sampled. For others, such a list is difficult or impossible to compile.
For instance, there are complete listings of registered voters or persons with active driver’s
licenses. But no one keeps a complete list of homeless people. If you are doing a study that
requires input from homeless persons, you are very likely going to need to go and find
the respondents personally. In such contexts, you can pretty much rule out the idea of mail
surveys or telephone interviews.
2. Is the population literate?: Questionnaires require that your respondents can read. While
this might seem initially like a reasonable assumption for many adult populations, we
know from recent research that the incidence of adult illiteracy is alarmingly high. And,
even if your respondents can read to some degree, your questionnaire may contain difficult
or technical vocabulary. Clearly, there are some populations that you would expect to be
illiterate. Young children would not be good targets for questionnaires.
3. Are there language issues?: We live in a multilingual world. Virtually every society has
members who speak other than the predominant language. Some countries (like Canada)
are officially multilingual. And, our increasingly global economy requires us to do research
that spans countries and language groups. Can you produce multiple versions of your
questionnaire? For mail instruments, can you know in advance the language your
respondent speaks, or do you send multiple translations of your instrument? Can you be
confident that important connotations in your instrument are not culturally specific?
Could some of the important nuances get lost in the process of translating your questions?
4. Will the population cooperate?: People who do research on immigration issues have a
difficult methodological problem. They often need to speak with undocumented
immigrants or people who may be able to identify others who are. Why would we expect
those respondents to cooperate? Although the researcher may mean no harm, the
respondents are at considerable risk legally if information they divulge should get into
the hands of the authorities. The same can be said for any target group that is engaging in
illegal or unpopular activities.
5. What are the geographic restrictions?: Is your population of interest dispersed over too
broad a geographic range for you to study feasibly with a personal interview? It may be
possible for you to send a mail instrument to a nationwide sample. You may be able to
conduct phone interviews with them. But it will almost certainly be less feasible to do
research that requires interviewers to visit directly with respondents if they are widely
dispersed.
Sampling Issues
The sample is the actual group you will have to contact in some way. There are several important
sampling issues you need to consider when doing survey research.
1. What data is available?: What information do you have about your sample? Do you
know their current addresses? Their current phone numbers? Are your contact lists up to
date?
2. Can respondents be found?: Can your respondents be located? Some people are very busy.
Some travel a lot. Some work the night shift. Even if you have an accurate phone or
address, you may not be able to locate or make contact with your sample.
3. Who is the respondent?: Who is the respondent in your study? Let’s say you draw a sample
a specific individual? Do you want to talk only to the “head of household” (and how is that
person defined)? Are you willing to talk to any member of the household? Do you state
that you will speak to the first adult member of the household who opens the door? What
if that person is unwilling to be interviewed but someone else in the house is willing?
How do you deal with multi-family households? Similar problems arise when you sample
groups, agencies, or companies. Can you survey any member of the organization? Or, do
you only want to speak to the Director of Human Resources? What if the person you
would like to interview is unwilling or unable to participate? Do you use another member
of the organization?
4. Can all members of population be sampled?: If you have an incomplete list of the population
(i.e., sampling frame) you may not be able to sample every member of the population.
Lists of various groups are extremely hard to keep up to date. People move or change their
names. Even though they are on your sampling frame listing, you may not be able to get
to them. And, it’s possible they are not even on the list.
5. Are response rates likely to be a problem?: Even if you are able to solve all of the other
population and sampling problems, you still have to deal with the issue of response rates.
Some members of your sample will simply refuse to respond. Others have the best of
intentions, but can’t seem to find the time to send in your questionnaire by the due date.
Still others misplace the instrument or forget about the appointment for an interview.
Low response rates are among the most difficult of problems in survey research. They can
ruin an otherwise well-designed survey effort.
Question Issues
Sometimes the nature of what you want to ask respondents will determine the type of survey
you select.
1. What types of questions can be asked?: Are you going to be asking personal questions?
Are you going to need to get lots of detail in the responses? Can you anticipate the most
frequent or important types of responses and develop reasonable closed-ended questions?
2. How complex will the questions be?: Sometimes you are dealing with a complex subject or
topic. The questions you want to ask are going to have multiple parts. You may need to
branch to sub-questions.
3. Will screening questions be needed?: A screening question may be needed to determine
whether the respondent is qualified to answer your question of interest. For instance, you
wouldn’t want to ask someone their opinions about a specific computer program without
first “screening” them to find out whether they have any experience using the program.
Sometimes you have to screen on several variables (e.g., age, gender, experience). The
more complicated the screening, the less likely it is that you can rely on paper-and-pencil
instruments without confusing the respondent.
4. Can question sequence be controlled?: Is your survey one where you can construct in
advance a reasonable sequence of questions? Or, are you doing an initial exploratory
study where you may need to ask lots of follow-up questions that you can’t easily anticipate?
5. Will lengthy questions be asked?: If your subject matter is complicated, you may need to
give the respondent some detailed background for a question. Can you reasonably expect
your respondent to sit still long enough in a phone interview for you to ask your question?
6. Will long response scales be used?: If you are asking people about the different computer
equipment they use, you may have to have a lengthy response list (CD-ROM drive, floppy
drive, mouse, touch pad, modem, network connection, external speakers, etc.). Clearly, it
Content Issues
The content of your study can also pose challenges for the different survey types you might
utilize.
1. Can the respondents be expected to know about the issue?: If the respondent does not keep
up with the news (e.g., by reading the newspaper, watching television news, or talking
with others), they may not even know about the news issue you want to ask them about.
Or, if you want to do a study of family finances and you are talking to the spouse who
doesn’t pay the bills on a regular basis, they may not have the information to answer your
questions.
2. Will respondent need to consult records?: Even if the respondent understands what you’re
asking about, you may need to allow them to consult their records in order to get an
accurate answer. For instance, if you ask them how much money they spent on food in the
past month, they may need to look up their personal check and credit card records. In this
case, you don’t want to be involved in an interview where they would have to go look
things up while they keep you waiting (they wouldn’t be comfortable with that).
Bias Issues
People come to the research endeavor with their own sets of biases and prejudices. Sometimes,
these biases will be less of a problem with certain types of survey approaches.
1. Can social desirability be avoided?: Respondents generally want to “look good” in the
eyes of others. None of us likes to look like we don’t know an answer. We don’t want to
say anything that would be embarrassing. If you ask people about information that may
put them in this kind of position, they may not tell you the truth, or they may “spin” the
response so that it makes them look better. This may be more of a problem in an interview
situation where they are face-to-face or on the phone with a live interviewer.
3. Can false respondents be avoided?: With mail surveys it may be difficult to know who
actually responded. Did the head of household complete the survey or someone else? Did
the CEO actually give the responses or instead pass the task off to a subordinate? Is the
person you’re speaking with on the phone actually who they say they are? At least with
personal interviews, you have a reasonable chance of knowing who you are speaking
with. In mail surveys or phone interviews, this may not be the case.
Administrative Issues
Last, but certainly not least, you have to consider the feasibility of the survey method for your
study.
1. Costs: Cost is often the major determining factor in selecting survey type. You might
prefer to do personal interviews, but can’t justify the high cost of training and paying for
the interviewers. You may prefer to send out an extensive mailing but can’t afford the
postage to do so.
2. Facilities: Do you have the facilities (or access to them) to process and manage your study?
In phone interviews, do you have well-equipped phone surveying facilities? For focus
groups, do you have a comfortable and accessible room to host the group? Do you have
the equipment needed to record and transcribe responses?
3. Time: Some types of surveys take longer than others. Do you need responses immediately
(as in an overnight public opinion poll)? Have you budgeted enough time for your study
to send out mail surveys and follow-up reminders, and to get the responses back by mail?
Have you allowed for enough time to get enough personal interviews to justify that
approach?
Clearly, there are lots of issues to consider when you are selecting which type of survey you
wish to use in your study. And there is no clear and easy way to make this decision in many
contexts. There may not be one approach which is clearly the best. You may have to make
tradeoffs of advantages and disadvantages. There is judgment involved. Two expert researchers
may, for the very same problem or issue, select entirely different survey methods. But, if you
select a method that isn’t appropriate or doesn’t fit the context, you can doom a study before you
even begin designing the instruments or questions themselves.
An observational study draws inferences about the possible effect of a treatment on subjects,
where the assignment of subjects into a treated group versus a control group is outside the
control of the investigator. This is in contrast with controlled experiments, such as randomized
controlled trials, where each subject is randomly assigned to a treated group or a control group
before the start of the treatment.
Self Assessment
It is concerned with the “Why” aspect of consumer behaviour, i.e., it tries to understand the problem and not measure the result. | It is concerned with the “What”, “When” or “How often” of consumer behaviour.
This research does not require large samples. | This needs large samples of respondents.
population.
Due to imprecise statement, data collection is | Statement is precise. Therefore data collection is
There is no need for a questionnaire for collecting the data. | There should be a properly designed questionnaire for data collection.
Task For each of the scenarios below, recommend the most suitable type of research
and explain the reasons for your choice.
(1) Exploratory
(2) Descriptive
(3) Experimentation
(4) Longitudinal
(5) Cross-sectional
(a) A Tyre manufacturer is expecting recession in the next two years. The firm
would like to know the changes that are to be made in the current marketing
strategy, so as to minimize the adverse effect of the recession on the company’s
performance.
(b) A company manufacturing cell phones is concerned about a new brand being
introduced by a competitor. The company would like to monitor how the
new brand of the competitor will affect its market share in the next one year.
(c) A ready-to-eat food major would like to introduce iced tea. The company
feels that this product is superior to what is already available in the market.
The company wants to develop a unique promotional theme for the new
product so that it may be clearly differentiated by the consumer and should
appeal to a broader section of the population.
(d) A co-operative bank has 4,000 customers who have taken personal loan or
vehicle loan. Of late, the bank feels that there has been an increase in the
number of defaulters. The bank would like to know whether people who are
regular (no default) and defaulters differ in terms of characteristics such as
age, income, occupation, sex, marital status.
Causal research comprises studies that engage in hypothesis testing, usually to explain the nature of
certain relationships, or to establish the differences among groups or the independence of two or
more factors in a situation. It is a research design in which the major emphasis is on determining a
cause-and-effect relationship. The research is used to measure what impact a specific change will
have on existing norms and allows market researchers to predict hypothetical scenarios upon
Example: If a clothing company currently sells blue denim jeans, causal research can
measure the impact of the company changing the product design to the colour white.
Following the research, company bosses will be able to decide whether changing the colour of
the jeans to white would be profitable.
To summarise, causal research is a way of seeing how actions now will affect a business in the
future. Nevertheless, it has to be remembered that not all causal research hypotheses can be
studied. There are many reasons for this, one of them being that true random assignment is not
possible in many cases. Gender, for example, cannot be randomly assigned, so hypotheses about
its causal effect cannot be tested by experiment. The three main reasons why you can’t test everything deal with
2. ethics, because we can’t randomly assign that some people receive a virus to test its effects,
or that some participants have to act as slaves and others as masters to test a hypothesis,
and
3. resources, if a researcher does not have the money or the equipment needed to perform a
study, then it won’t be done.
Synopsis is an abstract form of research which underlines the research procedure followed and
is presented before the guide for evaluating its potentiality. In one sentence it may be described
as a condensation of the final report. The structure of synopsis varies and also depends on the
guides’ choice. However, for our understanding a common structure may be framed as under:
1. Defining the Problem: In defining the problem of the research objective, definition of key
terms, general background information, limitations of the study and order of presentation
should be mentioned in brief.
2. Review of Existing Literature: Under this head, the researcher should study the summary of different
points of view on the subject matter as found in books, periodicals and approach to be
followed at the time of writing.
3. Conceptual Framework and Methodology: Under this head the researcher should first
make a statement of the hypothesis. Discussion on the research methodology used, duly
pointing out the relationship between the hypothesis and objective of the study and
finally discussions about the sources and means of obtaining data should also be made. In
this head the researcher should also point out the limitations of methodology, if any, and
the natural crises from which the research is bound to suffer for such obvious limitations.
4. Analysis of Data: Analysis of the data involves testing of hypothesis from data collected
and key conclusions thus arrived.
5. General Conclusions: In general conclusions, the researcher should make a restatement of
objectives, followed by conclusions with respect to the acceptance or rejection of the hypothesis, conclusions
with respect to the stated objectives, suggested areas of further research and a final discussion
of possible implications of the study for a model, group, theory and discipline.
Finally, the researcher should mention the bibliographies and appendices. The above
However, in our country, keeping in view the object of research, the style and structure of the synopsis
vary, and quite often it is found that the research guide exercises his own discretion in synopsis
preparation rather than following some acceptable international norms. A standard format for
preparation of synopsis commonly used in management and commerce research in India may
be drawn as follows:
1. Introduction: This includes definition of the problem and its review from a historical
perspective.
2. Objective of the Study: It defines the research purpose and its speciality from the existing
available research in the related field.
3. Literature Review: It includes among other things, different sources from which the required
abstract is drawn.
4. Methodology: It is intended to draw out the sequences followed in research and ways and
manners of carrying out the survey and compilation of data.
5. Hypothesis: It is a formal statement relating to the research problem and it needs to be
tested based on the researcher’s findings.
6. Model: It underlies the nature and structure of the model that the researcher is going to
build in the light of survey findings.
Self Assessment
Example: The data collected may show that the number of people who own a car and their
income have both risen over a period of time. Despite this, we cannot say “The increase in the number of cars is due to
the rise in income”. Maybe improved road conditions or an increase in the number of banks offering
car loans have caused the increase in the ownership of cars.
To find the causal relationship between the variables, the researcher has to do an experiment.
Example:
1. Which print advertisement is more effective? Is it front page, middle page or the last
page?
2. Among several promotional measures, such as advertisement and personal selling,
which one is more effective? Can we increase sales of our product by obtaining
additional shelf space?
What is experimentation? It is a research process in which one
or more variables are manipulated, which shows the cause-and-effect relationship.
Experimentation is done to find out the effect of one factor on the other. The different
Test Units: These are units on which the experiment is carried out. The experiment is done with one or more independent
variables, controlled by the experimenter, to find out their effect on a dependent variable.
Explanatory Variables: These are the variables whose effects the researcher wishes to examine. For example: Explanatory
variables may be advertising, pricing, packaging etc.
Dependent Variable: This is a variable which is under study. For example: Sales, Consumer attitudes, Brand loyalty etc.
Extraneous Variables: These are also known as blocking variables. Extraneous variables affect the results of the
experiments.
In the second category, extraneous variables may totally elude the researcher’s control. In this
case, we say that the experiment has been confounded i.e., it is not possible to make any conclusions
Example: A company introduces a product in two different cities. It would like to know
the impact of advertising on sales. Simultaneously, the competitors’ product in one of the cities
is not available during this period due to a strike in the factory. Now, the researcher cannot
conclude that sales of their product in that city have increased due to advertisement. Therefore,
this experiment is confounded. In this case, the strike is the confounding variable.
History: History refers to those events which are external to the experiment, but occur at
the same time as the experiment is being conducted. This may affect the result.
Example: Let us suppose that a manufacturer makes a 20% cut in the price of a product
and monitors sales in the coming weeks. The purpose of research is to learn about the impact of
price on sales. Meanwhile, if the production of the product declines due to a shortage of raw
materials, then the sales will not increase. Therefore, we cannot conclude that the price cut did
not have any influence on sales, because an external event occurred during the
period. We cannot control such an event; it can only be identified.
Maturation: Maturation is similar to history. Maturation specifically refers to the changes
occurring within the test units and not due to the effect of the experiment. Maturation
takes place due to passage of time. It refers to the effect of people growing older. Persons
who use a particular product may discontinue using that product and may switch over to
an alternate product.
Example:
1. Pepsi is consumed when people are young. Due to passage of time, the consumer
might prefer to consume Diet Pepsi or even avoid it altogether.
2. Assume that a training programme is conducted for salesmen and the company wants
to measure the impact of the programme on sales. If the company finds that the sales
have improved, it may not be due to its training programme. It may be because
their salesmen have gained more experience now and know the customer better.
Better understanding between salesmen and customers may be the reason for increased
sales.
Maturation effect is not just limited to test unit, composed of people alone. Organisations
also change, dealers grow, become more successful, diversify, and so on.
Testing: Pre-testing effect occurs when the same respondents are measured more than
once. Responses given during an earlier measurement will have a direct bearing on the responses given
at a later stage.
Pretest suffers from limitations of internal validity. This can be understood through an
example. Assume that a respondent’s opinion is measured before and after exposure to a
TV commercial of Hyundai car with Shahrukh Khan as brand ambassador. When the
respondent replies for the second time, he may remember how he rated Hyundai during the
first measurement. He may give the same rating simply to prove that he is consistent. In
that case, the difference between the two measurements will reveal nothing about the real
impact.
Alternately, some of the respondents might give a different rating during the second
measurement. This may not be due to the fact that the respondent has changed his opinion
about Hyundai and the brand ambassador. He has given a different rating because he does
not want to be identified as a person whose opinion was unchanged by the said commercial.
In both cases above, the internal validity suffers.
Example: A piece of equipment such as a vacuum cleaner is left behind for the customer’s use for
two weeks. After two weeks, respondents are given a questionnaire to answer. The reply may
be quite different from what it was before the trial of the product.
This may be because of two reasons:
(1) Some of the questions have been changed.
(2) The interviewers for pre-testing and post-testing periods are different.
The measurement in experiments will depend upon the instrument used for measurement.
Also, results may vary due to the application of instruments, where there are several
interviewers. Thus, it is very difficult to ensure that all the interviewers will ask the same
questions with the same tone and develop the same rapport. There may be difference in
response, because each interviewer conducts the interview differently.
Bias in Selection: Bias in selection occurs because two groups selected for experiment may
not be identical. If the two groups are asked various questions, they will respond differently.
If multiple groups participate, this error recurs frequently. Suppose there are two promotional
advertisements, A and B, for ‘ready to eat food’. The idea is to gauge the effectiveness of
the two advertisements. Assume that the respondents exposed to ‘A’ are the dominant users of
the product. Now, suppose 50% of those who saw Advertisement A bought the product
and only 10% of those who saw Advertisement B bought the product. From the above, one
should not conclude that advertisement ‘A’ is more effective than advertisement ‘B’. The
main difference may be due to food preference habits between the groups; even in this
case, the internal validity might suffer but to a lesser degree.
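The 50% versus 10% comparison above can be illustrated with a small numeric sketch. The counts below are hypothetical (they are not from the text); they are chosen only so that the overall purchase rates match the example while heavy and light users buy at the same rate under either advertisement. The names `groups`, `overall_rate` and `rate_by_stratum` are illustrative.

```python
from fractions import Fraction

# Hypothetical counts, assumed for illustration only.
# Each entry is (buyers, viewers) for one user type within one advertisement group.
groups = {
    "A": {"heavy": (495, 900), "light": (5, 100)},   # group A is mostly heavy users
    "B": {"heavy": (55, 100), "light": (45, 900)},   # group B is mostly light users
}

def overall_rate(strata):
    """Purchase rate over everyone who saw the advertisement."""
    buyers = sum(b for b, _ in strata.values())
    viewers = sum(n for _, n in strata.values())
    return Fraction(buyers, viewers)

def rate_by_stratum(strata):
    """Purchase rate within each user type."""
    return {name: Fraction(b, n) for name, (b, n) in strata.items()}

for ad, strata in groups.items():
    per_type = {k: float(v) for k, v in rate_by_stratum(strata).items()}
    print(ad, "overall:", float(overall_rate(strata)), "by user type:", per_type)
```

Under these assumed counts the overall rates are 50% and 10%, yet within each user type the two advertisements perform identically, so the whole gap comes from how the groups were composed rather than from the advertisements.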
Experimental Mortality: Some members may leave the original group and some new
members may join the old group. This is because some members might migrate to another
geographical area. This change in composition of the members will alter the composition
of the group itself.
Example: Assume that a vacuum cleaner manufacturer wants to introduce a new version.
He interviews a hundred respondents who are currently using the older version. Let us assume
that these 100 respondents have rated the existing vacuum cleaner on a 10 point scale (1 for
lowest and 10 for highest). Let the mean rating of the respondents be 7.
Now the newer version is demonstrated to the same hundred respondents and the equipment is
left with them for two months. At the end of two months, only 80 participants respond, since the
remaining 20 refused to answer. Now the mean score of 80 respondents is 8 on the same 10 point
scale. From this, can we conclude that the new vacuum cleaner is better?
The answer to the above question depends on the composition of 20 respondents who dropped
out. Suppose the 20 respondents who dropped out displayed negative reaction to the product,
then the mean score would not have been 8. If their ratings were low enough, it could even have been lower than 7. The difference
in mean rating does not give the true picture. It does not indicate that the new product is better
One might wonder, why not drop the 20 respondents from the original group, calculate
the mean rating of the remaining 80 and compare the two? But this method will also not solve
the mortality effect. Mortality effect will occur in an experiment, irrespective of whether human
beings are involved or not.
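The arithmetic behind the vacuum cleaner example can be sketched as follows. The ratings the 20 drop-outs would have given are unknown, so the values tried below are assumptions; the function name `combined_mean` is illustrative.

```python
def combined_mean(responder_mean, n_responders, dropout_mean, n_dropouts):
    """Mean rating over all respondents if the drop-outs had also answered."""
    total = responder_mean * n_responders + dropout_mean * n_dropouts
    return total / (n_responders + n_dropouts)

# 80 respondents rated the new version 8 on average; 20 dropped out.
# Try several assumed mean ratings for the 20 who dropped out.
for assumed_dropout_mean in (1, 3, 5, 8):
    overall = combined_mean(8, 80, assumed_dropout_mean, 20)
    print(assumed_dropout_mean, overall)
```

With these figures the combined mean falls below the original rating of 7 only if the drop-outs would have rated the product below 3 on average, which shows how strongly the conclusion depends on who left the group.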
Task You are the manager of product planning and marketing research for a home
appliance store. Your company is considering a proposal to manufacture and market an
emergency lamp, a segment in which the company currently does not have any product.
You have assigned this project to one of your subordinates.
(b) What data would be useful for deciding whether to develop an emergency lamp or
not?
(c) How will you design a study to obtain the needed data?
Self Assessment
In this design, the dependent variable is measured after exposing the test units to the experimental
variable. This can be understood with the help of following example:
Example: Assume M/s Hindustan Lever Ltd wants to conduct an experiment on the
“Impact of free sample on the sale of toilet soaps”. Small samples of toilet soaps are mailed to
selected customers in a locality. After one month, a coupon of 25 paise off on one cake of soap is
mailed to each customer to whom free samples were sent earlier. An equal number of these
coupons are also mailed to people in another locality in the neighborhood. The coupons are
coded to keep an account of the number of coupons redeemed from each locality. Suppose, 400
coupons were redeemed from the experimental group and 250 coupons were redeemed from
the control group. The difference of 150 is supposed to be the effect of free samples. In this
method, the conclusion can be drawn only after conducting the experiment.
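The calculation in the coupon example is a simple difference between the two groups, sketched below. The function names are illustrative, and the comparison of raw counts assumes, as the example states, that an equal number of coupons was mailed to each locality.

```python
def after_only_effect(experimental_outcome, control_outcome):
    """Estimated treatment effect: outcome with treatment minus outcome without."""
    return experimental_outcome - control_outcome

def relative_lift(experimental_outcome, control_outcome):
    """Effect expressed as a proportion of the control group's outcome."""
    return after_only_effect(experimental_outcome, control_outcome) / control_outcome

redeemed_experimental = 400  # locality that received free samples
redeemed_control = 250       # locality that received coupons only

print(after_only_effect(redeemed_experimental, redeemed_control))
print(relative_lift(redeemed_experimental, redeemed_control))
```

If the two localities had received different numbers of coupons, redemption rates rather than raw counts would have to be compared.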
In this method, measurements are made before as well as after exposure to the experimental variable.
The above example of “Before-after” suffers from validity threat due to the following.
It alerts the respondents to the fact that they are being studied. The respondents may discuss the
topics with friends and relatives and modify their behaviour accordingly.
Instrumentation Effect
This can be due to two different instruments being used – one before and one after. A change in
the interviewers before and after, results in the instrumentation effect.
Factorial design permits the researcher to test two or more variables at the same time. Factorial
design helps to determine the effect of each of the variables and measure the interacting effect of
many variables.
Example: A departmental store wants to study the impact of price reduction for a product.
Given that, there is also promotion (POP) being carried out in the stores (a) near the entrance
(b) at usual place, at the same time. Now assume that there are two price levels namely regular
price A1 and reduced price A2. Let there be three types of POP namely B1, B2 and B3. There are
3 × 2 = 6 combinations possible. The combinations possible are B1A1, B1A2, B2A1, B2A2, B3A1, B3A2.
Which of these combinations is best suited is what the researcher is interested in. Suppose there
are 60 departmental stores of the chain divided into six groups of 10 stores each. Now, randomly
assign one of the above combinations to each group of 10 stores as follows:
Combinations   Sales
B1A1           S1
B1A2           S2
B2A1           S3
B2A2           S4
B3A1           S5
B3A2           S6
S1 to S6 represent the sales resulting from each combination. The data gathered will provide details
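The six treatment combinations and the random assignment of 60 stores to them can be sketched as below. Store identifiers 1 to 60, the fixed random seed and the function name `assign_stores` are assumptions made for illustration.

```python
import itertools
import random

pop_types = ["B1", "B2", "B3"]   # three types of point-of-purchase promotion
price_levels = ["A1", "A2"]      # regular price and reduced price

# Every POP type paired with every price level: 3 x 2 = 6 combinations.
combinations = [pop + price for pop, price in itertools.product(pop_types, price_levels)]

def assign_stores(store_ids, combos, seed=0):
    """Shuffle the stores and split them into equal groups, one group per combination."""
    rng = random.Random(seed)      # fixed seed only so the sketch is repeatable
    shuffled = list(store_ids)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(combos)
    return {
        combo: sorted(shuffled[i * size:(i + 1) * size])
        for i, combo in enumerate(combos)
    }

assignment = assign_stores(range(1, 61), combinations)
for combo, stores in assignment.items():
    print(combo, stores)
```

Random assignment is used so that differences among the stores are spread across all six combinations rather than concentrated in one of them.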
The researcher chooses three shelf arrangements in three stores. He would like to observe the
sales generated in each of these stores at different periods. The researcher must make sure that
one type of shelf arrangement is used in each store only once.
In the Latin Square design, only one variable is tested. As an example of Latin Square design,
assume that a supermarket chain is interested in the effect of in-store promotion on sales.
Suppose there are three promotions considered as follows:
1. No promotion.
2. Free sample with demonstration.
3. Window display.
Which of the three will be most effective? The outcome may be affected by the size of the stores and
the time period. If we choose three stores and three time periods, the total number of combinations
is 3 × 3 = 9. The arrangement is as follows:
Time period   Store 1   Store 2   Store 3
1             B         C         A
2             C         A         B
3             A         B         C
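The defining property of this arrangement is that each treatment appears exactly once in every time period (row) and once in every store (column). A minimal check of that property, with the illustrative function name `is_latin_square`, might look like this:

```python
# Rows are time periods 1 to 3; columns are stores 1 to 3 (taken from the arrangement above).
square = [
    ["B", "C", "A"],
    ["C", "A", "B"],
    ["A", "B", "C"],
]

def is_latin_square(grid):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(grid)
    symbols = set(grid[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in grid)
    cols_ok = all({grid[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok

print(is_latin_square(square))
```

Because every treatment meets every store and every time period once, differences due to store size and time period are balanced across the three treatments.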
6.7.5 Ex-post Facto Design
This is a variation of “after only design”. The groups such as experiment and control are identified
Example: Let us assume that a magazine publisher wants to ascertain the impact of an
advertisement on knitting in ‘Women’s Era’ periodical. The subscribers were asked whether they
had seen this advertisement on ‘knitting’. Both those who had seen it and those who had not were asked about
the price, design etc. of the product. The difference indicates the effectiveness of the advertisement.
In this design, the experimental group is set to receive the treatment rather than exposing it to
the treatment by its choice.
Caselet Different Opinion on Ad
A medium-size manufacturer of calculators was introducing a new scientific model. The
company wants to communicate the same through an advertising programme. There was
a discussion between the Marketing Manager and Vice-President — Marketing regarding
this. The Marketing Manager was of the opinion that emphasis in the advertisement
should be on features, since that would generate more sales. The Vice-President was of the
opinion that the advertisement should emphasise price, discounts etc. Since there was
a difference in opinion, a market research agency was called and told to suggest a research
design which would aid in making a final decision about the advertising programme.
If you were to head the Ad agency, what research designs would you recommend and
why?
Self Assessment
There are primarily three types of research, namely exploratory research, descriptive research
and experimental research.
Exploratory research helps the researcher to become familiar with the problem.
It helps to establish the priorities for further research. It may or may not be possible to
formulate a hypothesis during the exploratory stage.
Literature search, experience surveys, focus groups,
and selected case studies assist in gaining insight into the problem.
The role of moderator or facilitator is extremely important in a focus group. There are
several variations in the formation of focus groups.
Descriptive research is rigid. This type of research is basically dependent on hypothesis.
Descriptive research is used to describe the characteristics of the groups.
It can also be used for forecasting or prediction.
True panel and omnibus panel.
In a true panel, the same measurements are made over a period of time.
In an omnibus panel, different measurements are made over a period of time.
A cross-sectional study involves field study and field survey, the difference being the size
of sample.
Causal research is conducted mainly to prove the fact that one factor “X” the cause was
6.9 Keywords
Conclusive Research: This is a research having clearly defined objectives. In this type of research,
specific courses of action are taken to solve the problem.
Descriptive Research: It is essentially a research to describe something.
Expost Facto Research: Study of the current state and factors causing it.
Extraneous Variable: These variables affect the response of test units. Also known as confounding
variable.
Factorial Design: This is an experimental design in which the effects of two or more variables are
being studied simultaneously.
Field Study: Field study involves an in-depth study of a problem, such as reaction of young men
and women towards a product.
Answers: Self Assessment
7. Participant’s 8. Unresponsive
CONTENTS
Objectives
Introduction
7.1 Primary Data and Secondary Data
7.1.1 Primary Data
7.1.2 Secondary Data
7.2 Data Collection Methods
7.2.1 Observation Method
7.2.2 Qualitative Techniques of Data Collection
7.3 Questionnaire Designing
7.3.1 Importance of Questionnaire in MR
7.3.2 Developing a Good Questionnaire
7.3.3 Types of Questionnaires
7.4 Summary
7.5 Keywords
7.6 Review Questions
7.7 Further Readings
Objectives
Explain the data collection procedure for primary data and secondary data;
Introduction
Once the researcher has decided the ‘Research Design’, the next job is of data collection. For data
to be useful, our observations need to be organized so that we can get some patterns and come
to logical conclusions.
Statistical investigation requires systematic collection of data, so that all relevant groups are
represented in the data.
To determine the potential market for a new product, for example, the researcher might study
500 consumers in a certain geographical area. It must be ascertained that the group contains
people representing variables such as income level, race, education and neighborhood.
Caution The quality of data will greatly affect the conclusions and hence, utmost importance
must be given to this process and every possible precaution should be taken to ensure
accuracy, while gathering and collecting data.
Depending upon the sources utilized, whether the data has come from actual observations or
from records that are kept for normal purposes, statistical data can be classified into two categories:
primary and secondary.
Data is one of the most important and vital aspects of any research study. Research conducted
in different fields of study can differ in methodology, but every research is based on data
which is analyzed and interpreted to get information. Data can be numbers, images, words,
figures, facts or ideas. Data in itself cannot be understood; to get information from the data
one must interpret it. There are various methods of interpreting
data. Data sources are broadly classified into primary and secondary data. Let us discuss both of
them:
The data directly collected by the researcher, with respect to the problem under study, is known
as primary data. Primary data is also the firsthand data collected by the researcher for the
immediate purpose of the study. Primary data is one which is collected by the investigator
himself for the purpose of a specific inquiry or study. Such data is original in character and is
generated by surveys conducted by individuals or research institutions.
The importance of primary data cannot be neglected. A research can be conducted without secondary
data, but a research based only on secondary data is least reliable and may have biases because
secondary data has already been manipulated by human beings. In statistical surveys it is
necessary to get information from primary sources and work on primary data: for example, the
statistical records of female population in a country cannot be based on newspaper, magazine
and other printed sources. Firstly, such sources may be old; secondly, they contain limited information
and can be misleading and biased.
1. Validity: Validity is one of the major concerns in a research. Validity is the quality of a
research that makes it trustworthy and scientific. Validity is the use of scientific methods
in research to make it logical and acceptable. Using primary data in research can improve
the validity of research. First hand information obtained from a sample that is
representative of the target population will yield data that will be valid for the entire
target population.
2. Authenticity: Authenticity is the genuineness of the research. Authenticity can be at stake
if the researcher invests personal biases or uses misleading information in the research.
Primary research tools and data can become more authentic if the methods chosen to
analyze and interpret data are valid and reasonably suitable for the data type. Primary
sources are more authentic because the facts have not been overdone. Primary source can
be less authentic if the source hides information or alters facts due to some personal
reasons. There are methods that can be employed to ensure factual yielding of data from
the source.
3. Reliability: Reliability is the certainty that the research is true enough to be trusted.
For example, if a research study concludes that junk food consumption does not increase
the risk of cancer and heart diseases, this conclusion should be drawn from a
sample whose size, sampling technique and variability are not questionable. Reliability
improves with using primary data. In the research mentioned above, if the researcher
uses experimental method and questionnaires, the results will be highly reliable. On the
other hand, if he relies on the data available in books and on the internet, he may collect
information that does not represent the real facts.
Sources for primary data are limited and at times it becomes difficult to obtain data from
primary source because of either scarcity of population or lack of cooperation. Regardless of any
difficulty one can face in collecting primary data, it is the most authentic and reliable data
source. Following are some of the sources of primary data:
1. Experiments: Experiments require an artificial or natural setting in which to perform
logical study to collect data. Experiments are more suitable for medicine, psychological
studies, nutrition and for other scientific studies. In experiments the experimenter has to
keep control over the influence of any extraneous variable on the results.
2. Survey: Survey is the most commonly used method in social sciences, management, marketing
list of questions either open-ended or close-ended for which the respondents give
answers. Questionnaire can be conducted via telephone, mail, live in a public area,
or in an institute, through electronic mail or through fax and other methods.
Interview: An interview is a face-to-face conversation with the respondent. The main
problem in an interview arises when the respondent deliberately hides information;
otherwise it is an in-depth source of information. The interviewer can not only
record the statements the interviewee makes but can also observe the body language,
expressions and other reactions to the questions. This enables the interviewer to
draw conclusions easily.
Observations: Observation can be done either while letting the observed person know that
he is being observed or without letting him know. Observations can also be made in
natural settings as well as in an artificially created environment.
Secondary data are statistics that already exist. They were gathered for some purpose other
than the study at hand. They may be described as “those data that have been compiled by some
agency other than the user”.
Internal secondary data is a part of the company’s record, for which research is already conducted.
Internal data are those that are found within the organisation.
Example: Sales in units, credit outstanding, call reports of sales persons, daily production
report, monthly collection report, etc.
External secondary data is the data collected by the researcher from outside the company. This
can be divided into four parts:
Census data
Individual project report being published
Miscellaneous data
Census data: Census data is the most important data among the sources of data. The following
Population Census
These techniques involve data collection on a commercial basis, i.e., data collected by this method
is sold to interested clients on payment. Examples of such organisations are Nielsen Retail, ORG
Marg, IMRB, etc. These organisations provide the National Readership Survey (NRS) to
sponsors and advertising agencies. They also provide a business relationship survey (BRS),
which estimates the following:
(a) Rating
There is also a study called FSRP which covers children in the age group of 10-19 years.
Besides their demographics and psychographics, the study covers areas such as:
A syndicated source consists of market research firms offering syndicated services. These market
research organisations collect and update information on a continuous basis. Since the data is
syndicated, its cost is spread over a number of client organisations and is hence lower. The
service can also be tailored. For example, a client firm can give certain specific questions to be
included in the questionnaire which is used routinely to collect syndicated data. The client will
have to pay extra for these. The data generated by these additional questions and the analysis of
such data will be revealed only to the firms submitting the questions. Therefore we can say that
customization of secondary data is possible. Some areas of syndicated services are newspapers,
magazine readership, TV channel popularity, etc. Data from syndicated sources are available on
a weekly or monthly basis.
Includes trade association such as FICCI, CEI, Institution of Engineers, chamber of Commerce,
Libraries such as public library, University Library etc., literature, state and central government
publications, private sources such as All India Management Association (AIMA), Financial Express
and Financial Dailies, world bodies and international organizations such as IMF, ADB etc.
Advantages
It is economical, without the need to hire field staff.
They provide information, which retailers may not be willing to reveal to researcher.
No training is required to collect this data, unlike primary data.
Disadvantages
Because secondary data has been collected for some other project, it may not fit in with the
problem that is being defined. In some cases, the fit is so poor that the data becomes completely
inappropriate. It may be ill-suited because of the following three reasons:
Unit of Measurement: It is common for secondary data to be expressed in different units.
Example: Size of the retail establishments, for instance, can be expressed in terms of
gross sales, profits, square feet area and number of employees. Consumer incomes can be
expressed in terms of the individual, family, household, etc. The secondary data available may
not fit in easily with the units the study needs.
Assume that the class intervals are quite different from those which are needed.
Example: Data available with respect to age group is as follows:
<18 year
18-24 years
25-34 years
35-44 years
Suppose the company needs the classification less than 20, 20-30 and 30-40. The above
classification of secondary data cannot then be used.
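The class-interval problem above can be checked mechanically. The following is a minimal sketch, not a standard procedure: it assumes each classification is described by its cut points (the age at which one class ends and the next begins), and the function name `missing_boundaries` is hypothetical. Published classes can be merged into the required classes only if every required cut point already exists in the published data.

```python
def missing_boundaries(published_cuts, required_cuts):
    """Return the required cut points that the published data does not have.

    An empty list means the published classes can simply be merged to
    form the required classes; otherwise the data cannot be regrouped.
    """
    return sorted(set(required_cuts) - set(published_cuts))


# Cut points read off the example: <18, 18-24, 25-34, 35-44
# (assumption: each class ends just before the next lower limit).
published = [18, 25, 35, 45]

# Cut points the company needs: less than 20, 20-30, 30-40.
required = [20, 30, 40]

print(missing_boundaries(published, required))   # cut points that are absent
print(missing_boundaries(published, [25, 45]))   # a grouping that would work
```

None of the required cut points (20, 30, 40) appears in the published data, which is why the published table cannot be used. A coarser grouping such as "under 25" and "25-44" could be built from it.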
(a) Who has collected the data: The reliability of the source determines the accuracy of
the data. Assume that a publisher of a private periodical conducts a survey of his
readers. The main aim of the survey is to find out the opinion of readers about
advertisements appearing in it. This survey is done by the publisher in the hope that
other firms will buy this data before inserting advertisements.
Assume that a professional M.R agency has conducted a similar survey and has sold
If you are an individual who wants information on a particular periodical, you would buy
the data from the M.R agency rather than from the periodical’s publisher. The reason for this
is trust in the M.R agency. The reasons for trusting the M.R agency are as follows:
2. The data quality of MR agency will be good since they are professionals.
4. What was the time period of data collection? For example, days of the week,
time of the day.
Recency: This pertains to “how old is the information?” If it is five years old, it may be
useless. Therefore, the publication lag is a problem.
The sources of unpublished data are many; they may be found in diaries, letters, unpublished
biographies and autobiographies and may also be available with scholars and research workers,
trade associations, labour bureaus and other public/private individuals and organizations.
Before using secondary data, the researcher must ensure the reliability, suitability and adequacy
of data.
Internal records or published records are often capable of giving remarkably useful information.
Sometimes, the information may be sufficient enough to give the desired result. However, this
preliminary information shall most of the time help in developing the overall research strategy
and hence must be undertaken before any further research is contemplated.
For a manufacturing industry, for example, the internal production and sales records, if designed
and maintained properly, can help in a big way even in formulating the company’s strategies.
External sources of data include statistics and reports issued by governments, trade associations
and other reputable organizations such as advertising agencies and research companies and
trade directories.
In India, some of the major sources of secondary data are:
Indian Council of Agriculture, Central Statistical Organization, Army Statistical
Statistics on Small Scale Industries, RBI Bulletin, Annual survey of industries, Indian Labour
Task List some major secondary sources of information for the following:
2. M.T.R has several product ideas on ready-to-eat products. It wishes to convert ideas
into products and enter the market. Before entering, the company needs to find
necessary information to assess the market potential.
3. An MNC wishes to open a showroom in a Metro. The first step that the company
would like to take is to collect the information about suitability.
4. Number of residential houses less than 10 years old in a given locality.
5. Number of consultancy/recruitment firms in a city.
6. Percentage of families with children less than 15 years in a given locality.
7. Citizens who have electoral I.D cards in a local city.
8. Annual sales figures of a multi-retail outlet.
Self Assessment
3. ………………………is a part of the company’s record, for which research is already
conducted.
Observation and questioning are two broad approaches available for primary data collection.
The major difference between the two approaches is that, in the questioning process, the
respondent plays an active role because of interaction with the researcher.
Example 1: Suppose a safety week is celebrated and the public is made aware of safety
precautions to be observed while walking on the road. After one week, an observer can stand at
a street corner and observe the number of people walking on the footpath and those walking on the road
This will tell him whether the campaign on safety is successful or unsuccessful. Sometimes
Example 2: Behaviour or attitude of children, and also of those who are inarticulate.
There are several methods of observation of which, any one or a combination of some of them,
can be used by the observer. They are:
Direct-indirect observation
Human-mechanical observation
Example 1: A manager of a hotel wants to know “how many of his customers visit the
hotel with family and how many visit as single customers”.
Here observation is structured, since it is clear “what is to be observed”. He may tell the waiters
to record this. This information is required to decide the tables and chairs requirement and also
the layout.
Suppose the manager wants to know how single customers and customers with family behave
and what their mood is. This study is vague and needs non-structured observation.
It is easier to record structured observation than non-structured observation.
Example 2: To distinguish between structured and unstructured observation, consider a
study investigating the amount of search that goes into a “soap purchase”. On the one hand, the
observers could be instructed to stand at one end of a supermarket and record each sample
customer’s search. This may be observed and recorded as follows: “Purchaser first paused after
looking at the HLL brand. He looked at the price of the product, kept the product back on the
shelf, then picked up a soap cake of HLL and glanced at the picture on the pack and its list of
ingredients, and kept it back. He then checked the label and price of a P&G product, kept that
back down again, and after a slight pause, picked up a different flavor of soap from M/S Godrej
and placed it in his trolley and moved down the aisle.” On the other hand, observers
might simply be told to record the “first soap cake examined” by checking the appropriate
boxes in the observation form. The second situation is more structured than the first.
To use a more structured approach, it would be necessary to decide precisely what is to be
observed and the specific categories and units that would be used to record the observations.
2. Disguised-undisguised Observation Methods: In disguised observation, the respondents
do not know that they are being observed. In undisguised observation, the respondents
are well aware that they are being observed. In disguised observation, observers often
pose as shoppers. They are known as “mystery shoppers” and are paid by the research
organisations. The main strength of disguised observation is that it captures
the true reactions of the individuals.
In the undisguised method, observation may be contaminated due to induced error by the objects
phenomenon are observed. Suppose a researcher is interested in knowing about the soft
drink consumption of a student in a hostel room. He may observe the empty soft drink
bottles dropped into the bin. Similarly, the observer may seek the permission of the hotel
owner to visit the kitchen or stores. He may carry out a kitchen/stores audit to find out
the consumption of various brands of spice items being used by the hotel.
Notes It may be noted that, the success of an indirect observation largely depends
on “How best the observer is able to identify physical evidence of the problem
under study”.
4. Human-mechanical Observation: Most of the studies in marketing research are based on
human observation, wherein trained observers are required to observe and record their
observations. In some cases, mechanical devices such as eye cameras are used for
observation. One of the major advantages of electrical/mechanical devices is that their
recordings are free from subjective bias.
1. The observer might be waiting at the point of observation. Still the desired event may not
take place i.e. observation is required over a long period of time and hence delay may
occur.
2. For observation, extensive training of observers is required.
3. This is an expensive method.
4. External observation gives only surface indications. To go beneath the surface it is very
difficult. So only overt behaviour can be observed.
5. Two observers may observe the same event but may draw inference differently.
6. It is very difficult to gather information on (1) Opinions (2) Intentions etc.
Task What observation technique would you use to gather the following information?
1. What kind of influence do children have on the purchase behaviour of their parents?
2. How do discounts influence the purchase behaviour of customers buying colour
TV?
3. A study to find out the potential location for a snack bar in a city.
Qualitative research is used to analyse those data which cannot be quantified. Qualitative research
is used in exploratory research. The number of respondents covered in this type of research is
Depth Interview
An unstructured, direct interview is known as a depth interview. Here the interviewer will continue
to ask probing questions like “What did you mean by that statement?”, “Why did you feel
this way?” and “What other reasons do you have?”, until he is satisfied that he has obtained
the information he wants.
Notes The unstructured interview is free from restrictions imposed by a formal list of
questions.
The interview may be conducted in a casual and informal manner in which the flow of the
conversation determines what questions are to be asked and the order in which they should be
asked.
This is a process where a group of experts in the field gather together. They may have to reach
a consensus on forecasts. Sometimes, the judgment may be made by some group members who
have strong personalities.
Notes In the Delphi approach, the group members are asked to make individual judgments
about a particular subject, say ‘sales forecast’.
These judgments are compiled and returned to the group members, so that they can compare
their previous judgment with those of others. Then they are given an opportunity to revise their
judgments, especially if they differ from the others. They can also explain why their judgment
is accurate, even if it differs from that of the other group members. After 5 to 6 rounds of
interaction, the group members reach a conclusion.
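The round-by-round feedback described above can be illustrated with a toy simulation. This is only a sketch under a strong assumption: each member is modelled as moving a fixed fraction (`pull`, a hypothetical parameter) of the way toward the group median after seeing the compiled judgments. Real panel members may revise differently or defend their original view. The forecast figures are invented for illustration.

```python
from statistics import median


def delphi_round(judgments, pull=0.5):
    """One feedback round: compile the judgments, then let each member
    move part of the way toward the group median (modelling assumption)."""
    centre = median(judgments)
    return [j + pull * (centre - j) for j in judgments]


def run_delphi(judgments, rounds=5, pull=0.5):
    """Return the list of judgments after every round, starting with round 0."""
    history = [list(judgments)]
    for _ in range(rounds):
        judgments = delphi_round(judgments, pull)
        history.append(judgments)
    return history


# Hypothetical individual sales forecasts from four experts.
initial = [100.0, 120.0, 150.0, 200.0]

for round_no, judgments in enumerate(run_delphi(initial, rounds=5)):
    spread = max(judgments) - min(judgments)
    print(round_no, [round(j, 1) for j in judgments], "spread:", round(spread, 3))
```

Under this assumption the spread between the highest and lowest forecast shrinks every round. The simulation illustrates convergence over repeated rounds; it does not model how a real panel behaves.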
They are the best known and most widely used type of indirect interviews. Here, a group of
people jointly participate in an unstructured indirect interview conducted by a moderator. The
group usually consists of six to ten people. In general, the selected persons have similar
backgrounds. The moderator attempts to focus the discussion on the problem areas.
Focus groups are used primarily to provide background information and to generate hypothesis
Projective Techniques
Example: Many a time, people do not want to reveal their true motive for fear of being
branded ‘old fashioned’. Consider a question such as “Do you do all household work yourself?” The
answer may be ‘no’, though the truth is ‘yes’. A ‘yes’ answer may not be given because it may
suggest that the family is not financially sound and cannot afford a maid for help.
3. TAT and
4. Cartoon test
1. Word Association Test: This test consists of presenting a series of stimulus words to the
respondent, who is asked to answer quickly with the first word that comes to mind.
The respondent, by answering quickly, gives the word that he or she associates most
closely with the stimulus word.
Example:
What brand of detergent comes to your mind first, when I mention washing of an
expensive cloth?
(1) Surf
(2) Tide
(3) Key
(4) Ariel
Who drinks the milk most?
(1) Athletes
(2) Young boys
(3) Adults
(4) Children
In a study of cigarettes, the respondent is asked to give the first word that comes to
his mind.
(1) Injurious
(2) Style
(3) Strong
(4) Stimulus
(5) Bad manners
(6) Disease
(7) Pleasure
2. Completion Techniques
Sentence completion
Story completion
Sentence Completion: Here the respondents have to finish a set of incomplete sentences.
Example: Let us make a study dealing with people’s inner feelings towards software
professionals.
(a) Earnings of a software professional …………….
(b) Being a software professional means …………….
(c) Working hours for software professional are …………….
(d) The personal life of a software professional is …………….
(e) The social status of software professional is …………….
Suppose you want to study the attitude towards a periodical:
(a) A person who reads Women’s Era periodical is …………….
(b) Business World periodical appeals to ……………..
(c) Outlook periodical is read by …………….
(d) Investor periodical is mostly liked by …………….
Suppose you want to provide a basis for developing advertising appeal for a brand
of cooking oil, the following sentence may be used:
Example:
Mr. X belongs to the upper-middle class. He received a telephone call, where the
caller said, “I am from Globe Travels. Sir, I want to tell you about our recent
offer, that is, if you travel to the US this summer, you will get two tickets free by the
year end to fly to the Far East.”
What was Mr. X’s reaction? Why?
Two children are quarreling at the breakfast table before going to school. The younger
of the two has spilled coffee on her brother’s shirt, which he was supposed to wear
on the same day for attending the annual sports event.
What will the mother do?
Story completion has numerous applications in solving marketing problems. The
most important of these is to provide data to the seller on the image and
feelings people have about the company’s products and services. This method is used
before finalising an advertisement.
3. Thematic Apperception Test (TAT): TAT is a projective technique. It is used to measure the
attitude and perception of the individual. Some picture cards are shown to respondents.
The respondent is required to tell the story by looking at the picture. When the subjects
start telling the story, the researcher notices the respondents’ expression, pauses and
emotions to draw the inference.
In the TAT, the test subject (the boy shown here) examines a set of cards that portray human
figures in a variety of settings and situations, and is asked to tell a story about each card.
The story includes the event shown in the picture, preceding events, emotions and thoughts
of those portrayed, and the outcome of the event shown. The story content and structure are
thought to reveal the subject’s attitudes, inner conflicts, and views. Customer insights may be
extracted by posing the questions given above to the respondents.
Source: (http://www.minddisorders.com)
4. Cartoon Test or Balloon Test: Here a cartoon is shown. The cartoon characters are placed in
a particular situation. One or more ‘balloons’ contain the conversation of the characters,
and one is left open for the respondent to fill in. In comparing the cartoon technique
with the direct question, let us take the example of “choosing a brand ambassador”.
In the above case, with which person would you agree and why?
Caselet
Given below are some topics. In each case, indicate whether the research is qualitative
or quantitative in nature. Also recommend specific techniques for each.
(a) A company would like to come out with ideas to creatively communicate the benefits
of a new detergent through a TV commercial.
(b) Hospital authorities want to ascertain their patients’ ratings of attributes like medical
treatment, room service, emergency service, etc.
(c) After discussing with several sales people, the sales manager suspects that the morale
of the sales force is low, and wants to confirm this by using an employee morale
questionnaire.
(d) A firm marketing toffee has two alternative wrapper designs for the product and
wonders which one will result in higher sales.
Self Assessment
Questionnaires are an inexpensive way to gather data from a potentially large number of
respondents. Often they are the only feasible way to reach a number of reviewers large enough
to allow statistical analysis of the results. A well-designed questionnaire that is used effectively
can gather information on both the overall performance of the test system as well as information
the participants, they can be used to correlate performance and satisfaction with the test system
of questions and other prompts for the purpose of gathering information from respondents.
Although they are often designed for statistical analysis of the responses, this is not always the
case. The questionnaire was invented by Sir Francis Galton.
!
Caution It is important to remember that a questionnaire should be viewed as a multi-
stage process beginning with definition of the aspects to be examined and ending with
interpretation of the results.
Every step needs to be designed carefully because the final results are only as good as the
weakest link in the questionnaire process. Although questionnaires may be cheap to administer
compared to other data collection methods, they are every bit as expensive in terms of design
time and interpretation.
Questionnaires have advantages over some other types of surveys in that they are cheap, do not
require as much effort from the questioner as verbal or telephone surveys, and often have
standardized answers that make it simple to compile data. However, such standardized answers
may frustrate users. Questionnaires are also sharply limited by the fact that respondents must be
able to read the questions and respond to them. Thus, for some demographic groups conducting
a survey by questionnaire may not be practical.
Notes As a type of survey, questionnaires also have many of the same problems relating
to question construction and wording that exist in other types of opinion polls.
To study:
1. Behaviour, past and present.
2. Demographic characteristics such as age, sex, income, occupation.
3. Attitudes and opinions.
4. Level of knowledge.
5. It must keep the respondent interested throughout.
Cost is less
Lasts longer
Better fragrance
Produces more lather
Example: “Subject’s attitude towards Cyber laws and the need for government legislation
to regulate it”.
I can’t say
Very urgently needed
(a) In the Lok Sabha for approval.
(b) Approved by the Lok Sabha and pending in the Rajya Sabha.
(c) Passed by both the Houses, pending the presidential approval.
Depending on which answer the respondent chooses, his knowledge on the subject is
classified.
In a disguised type, the respondent is not informed of the purpose of the questionnaire.
Example:
1. “Tell me your opinion about Mr. Ben’s healing effect show conducted at Bangalore?”
2. “What do you think about the Babri Masjid demolition?”
3. Non-structured and Disguised Questionnaire: The main objective is to conceal the topic of
enquiry by using a disguised stimulus. Though the stimulus is standardized by the
researcher, the respondent is allowed to answer in an unstructured manner. The assumption
made here is that an individual’s reaction is an indication of the respondent’s basic perception.
Projective techniques are examples of the non-structured disguised technique. The techniques
involve the use of a vague stimulus, which an individual is asked to expand, describe or
build a story around. Three common types under this category are (a) Word association
(b) Sentence completion (c) Story telling.
4. Non-structured and Non-disguised Questionnaire: Here the purpose of the study is clear,
but the responses to the question are open-ended.
Example: “How do you feel about the Cyber law currently in practice and its need for
further modification?” The initial part of the question is consistent. After presenting the initial
question, the interview becomes very unstructured as the interviewer probes more deeply.
Subsequent answers by the respondents determine the direction the interviewer takes next. The
question asked by the interviewer varies from person to person. This method is called “the
depth interview”. The major advantage of this method is the freedom permitted to the interviewer.
By not restricting the respondents to a set of replies, experienced interviewers will be able
to get the information from the respondent fairly and accurately.
!
Caution The main disadvantage of this method of interviewing is that it takes time, and
the respondents may not co-operate.
Another disadvantage is that coding of open-ended questions may pose a challenge.
Example: Suppose a researcher asks the respondent, “Tell me something about your
experience in this hospital”. The answer may be “Well, the nurses are slow to attend and the
doctor is rude.” ‘Slow’ and ‘rude’ are different qualities needing separate coding. This type of
interviewing is extremely helpful in exploratory studies.
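The coding problem in the hospital example can be made concrete with a small sketch. It assumes a hand-built codebook that maps keywords to code labels. The labels `STAFF_SPEED` and `STAFF_COURTESY` and the function `code_response` are hypothetical, and real coding of open-ended answers usually needs human judgment rather than keyword matching.

```python
import re

# Hypothetical codebook: keyword -> code label.
CODEBOOK = {
    "slow": "STAFF_SPEED",
    "rude": "STAFF_COURTESY",
}


def code_response(text, codebook=CODEBOOK):
    """Return the sorted list of distinct codes whose keywords occur in the answer."""
    words = re.findall(r"[a-z]+", text.lower())
    return sorted({codebook[w] for w in words if w in codebook})


answer = "Well, the nurses are slow to attend and the doctor is rude."
print(code_response(answer))
```

A single answer yields two codes here, which is the point of the example: one open-ended reply can carry several distinct qualities, and each needs its own entry when the data is tabulated.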
Types Characteristics
Structured – The same question is posed to each respondent.
Disguised Administering the questionnaire and post-administration work is simple, i.e.
coding, tabulating, etc. is easy.
This type of questionnaire is least used in market research.
Respondents’ bias is minimized.
Unstructured – This type of questionnaire is very commonly used for focus group
Disguised discussions.
The first question to be asked by the market researcher is “what type of information does he
need from the survey?” This is valid because if he omits some information on relevant and vital
aspects, his research is not likely to be successful. On the other hand, if he collects information
which is not relevant, he is wasting his time and money.
!
Caution At this stage, information required, and the scope of research should be clear.
2. Get additional information on the research issue, from secondary data and exploratory
research. The exploratory research will suggest “what are the relevant variables?”
(a) awareness, (b) facts, (c) opinions, (d) attitudes, (e) future plans, (f) reasons.
Facts are usually sought out in marketing research.
Example: Which television programme did you see last Saturday? This requires a
reasonably good memory and the respondent may not remember. This is known as recall loss.
Therefore questioning the distant past should be avoided. Memory of events depends on
(1) Importance of the events and (2) Whether it is necessary for the respondent to remember. In
the above case, neither factor is fulfilled. Therefore, the respondent does not remember.
On the contrary, a birthday or wedding anniversary of individuals is remembered without
effort since the event is important. Therefore, the researcher should be careful while asking
questions about the past. First, he must make sure that the respondent has the answer.
Example: Do you go to the club? He may answer ‘yes’, though it is untrue. This may be
because the respondent wants to impress upon the interviewer that he belongs to a well-to-do
family and can afford to spend money on clubs. To obtain facts, the respondents must be
conditioned (by good support) to part with the correct facts.
The questionnaire can be used to collect information either through personal interview,
mail or telephone. The method chosen depends on the information required and also the
type of respondent. If the information is to be collected from illiterate individuals, a
questionnaire would be the wrong choice.
Type of Questions
Open-ended Questions
These are questions where respondents are free to answer in their own words.
Example: “What factor do you consider while buying a suit”? If multiple choices are
given, it could be colour, price, style, brand etc., but some respondents may mention attributes
which may not occur to the researcher.
Therefore, open-ended questions are useful in exploratory research, where all possible alternatives
are explored. The greatest disadvantage of open-ended questions is that the researcher has to
note down the answer of the respondents verbatim. Therefore, there is a likelihood of the
researcher failing to record some information.
Another problem with open-ended question is that the respondents may not use the same frame
of reference.
The respondent may have meant “basic pay” but interviewer may think that the respondent is
talking about “total pay including dearness allowance and incentive”. Since both of them refer
to pay, it is impossible to separate two different frames.
Dichotomous Question
These questions have only two answers: ‘yes’ or ‘no’, ‘true’ or ‘false’, ‘use’ or ‘don’t use’.
Do you use toothpaste? Yes ……….. No …………
Close-ended Questions
Which of the following words or phrases best describes the kind of person you feel would be
most likely to use this product, based on what you have seen in the commercial?
(II) Based on what you saw in the commercial, how interested do you feel you would be in
buying the product?
Definitely
Probably I would buy
I may or may not buy
Probably I would not buy
Task Which type of questionnaires do you think to be easier to answer? Give reasons to
Wordings of Questions
Wordings of particular questions could have a large impact on how the respondent interprets
them. Even a small shift in the wording could alter the respondent’s answer.
Example:
“Don’t you think that Brazil played poorly in the FIFA cup?” The answer will be ‘yes’.
Many of them, who do not have any idea about the game, will also most likely say ‘yes’.
If the question is worded in a slightly different manner, the response will be different.
“Do you think that Brazil played poorly in the FIFA cup?” This is a straightforward
question. The answer could be ‘yes’, ‘no’ or ‘don’t know’ depending on the knowledge the
respondents have about the game.
“Do you think anything should be done to make it easier for people to pay their phone
bill, electricity bill and water bill under one roof”?
“Don’t you think something might be done to make it easier for people to pay their phone
bill, electricity bill, water bill under one roof”?
A change of just one word as above can generate different responses by respondents.
Guidelines towards the use of correct wording:
Is the vocabulary simple and familiar to the respondents?
Example:
Instead of using the word ‘reasonably’, ‘usually’, ‘occasionally’, ‘generally’, ‘on the whole’.
“How often do you go to a movie?” “Often, may be once a week, once a month, once in
two months or even more.”
These are questions, in which the respondent can agree with one part of the question, but not
agree with the other or cannot answer without making a particular assumption.
Example:
“Do you feel that firms today are employee-oriented and customer-oriented?” There are
two separate issues here – [yes] [no]
“Are you happy with the price and quality of branded shampoo?” [yes] [no]
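The two wording problems illustrated so far, a negative opener such as “Don’t you think …” and a question that joins two issues with “and”, can be screened for with a rough first pass. The sketch below is a crude heuristic and not an established test: the function `wording_flags` and its rules are assumptions made for illustration, and a flagged question still needs a human reviewer.

```python
import re


def wording_flags(question):
    """Return a list of warnings for a draft survey question (heuristic only)."""
    flags = []
    # Normalise case and curly apostrophes so "Don’t" and "Don't" match alike.
    q = question.strip().lower().replace("’", "'")
    # A negative opener tends to invite agreement.
    if re.match(r"(don't|isn't|aren't|doesn't|wouldn't) ", q):
        flags.append("negative opener may lead the respondent")
    # 'and' may signal two issues that need separate questions.
    if re.search(r"\band\b", q):
        flags.append("'and' may join two separate issues")
    return flags


drafts = [
    "Don't you think that Brazil played poorly in the FIFA cup?",
    "Do you think that Brazil played poorly in the FIFA cup?",
    "Are you happy with the price and quality of branded shampoo?",
]
for draft in drafts:
    print(draft, "->", wording_flags(draft))
```

The second rule over-flags harmless uses of “and”, so it is best read as a prompt to check whether the question should be split, for example into one question on price and one on quality.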
Leading Questions
A leading question is one that suggests the answer to the respondent. The question itself will
influence the answer, when respondents get an idea that the data is being collected by a company.
The respondents have a tendency to respond positively.
Example:
“How do you like the programme on ‘Radio Mirchy’?” The answer is likely to be ‘yes’. The
unbiased way of asking is “Which is your favorite F.M. radio station?” The answer could be
any one of the four stations, namely 1. Radio City 2. Mirchy 3. Rainbow 4. Radio-One.
“Do you think that offshore drilling for oil is environmentally unsound?” The most probable
response is ‘yes’. The same question can be modified to eliminate the leading factor.
What is your feeling about the environmental impact of offshore drilling for oil? Give choices
as follows:
3. No opinion.
Loaded Questions
A leading question is also known as a loaded question. In a loaded question, special emphasis is
given to a word or a phrase, which acts as a lead to respondent.
Example:
“Do you own a Kelvinator refrigerator?”
A better question would be “What brand of refrigerator do you own?”
Example: “Do you think that the government publications are distributed effectively?”
This is not the correct way, since the respondent does not know what effective distribution
means. This is confusing. The correct way of asking is “Do you think
that the government publications are readily available when you want to buy them?”
Example: “Do you think the value-price equation is attractive?” Here, respondents may not
know the meaning of value-price equation.
Applicability
“Is the question applicable to all respondents?” Respondents may try to answer a question even
though they don’t qualify to do so or may lack any meaningful opinion.
taken a loan).
Avoid Implicit Assumptions
An implicit alternative is one that is not expressed in the options. Consider following two
questions:
2. Would you prefer to have a job, or do you prefer to do just domestic work?
Even though, we may say that these two questions look similar, they vary widely. The difference
Example: “Why do you use Ayurvedic soap”? One respondent might say “Ayurvedic
soap is better for skin care”. Another may say “Because the dermatologist has recommended”.
A third might say “It is a soap used by my entire family for several years”. The first respondent
gives the reason for using it at present. The second respondent explains how he started using it.
The third respondent cites the family tradition of using it. As can be seen, different reference frames
are used. The question may be balanced and rephrased.
Complex Questions
In which of the following do you like to park your liquid funds?
Debenture
Preferential share
Equity linked M.F
I.P.O
Fixed deposit
If this question is posed to the general public, they may not know the meaning of liquid fund.
Most of the respondents will guess and tick one of them.
As a rule of thumb, it is advisable to keep the number of words in a question to no more
than 20. The question given below is too long for the respondent to comprehend, let
alone answer.
Example: Do you accept that the people whom you know, and associate yourself have
been receiving ESI and P.F benefits from the government accept a reduction in those benefits,
with a view to cut down government expenditure, to provide more resources for infrastructural
development?
Sometimes the respondent may not have the information that is needed by the researcher.
Example:
The husband is asked a question “How much does your family spend on groceries in a
week”? Unless the respondent does the grocery shopping himself, he will not know how
much has been spent. In a situation like this, it will be helpful to ask a ‘filtered question’.
An example of a filtered question can be, “Who buys the groceries in your family”?
“Do you have the information of Mr. Ben’s visit to Bangalore”? Not only should the
individual have the information but also s(he) should remember the same. The inability
(1) Basic information (2) Classification (3) Identification information. Items such as age, sex,
income, education etc. are questioned in the classification section. The identification part involves
body of the questionnaire. Always move from general to specific questions on the topic. This is
known as funnel sequence. Sequencing of questions is illustrated below:
(1) Which TV shows do you watch?
The above three questions follow a funnel sequence. If we reverse the order of questions and first ask
“Which show was watched last week?”, the answer may be biased. This example shows the
importance of sequencing.
Example: Clear instructions, gaps between questions, answers and spaces are part of
layout. Two different layouts are shown below:
_____Less than 1 year _____1 to 2 years _____2 to 4 years _____ more than 4 years.
______1 to 2 years.
______2 to 4 years.
______ More than 4 years.
From the above example, it is clear that layout – 2 is better. This is because likely respondent
Therefore, while preparing a questionnaire start with a general question. This is followed by a
direct and simple question. This is followed by more focused questions. This will elicit maximum
information.
Suppose the questionnaire is not provided with ‘don’t know’ or ‘no option’, then the respondent
is forced to choose one side or the other. A ‘don’t know’ is not a neutral response. This may be
due to genuine lack of knowledge.
In a balanced scale, the number of favourable responses is equal to the number of unfavourable
responses. If the researcher knows that there is a possibility of a favourable response, it is best
to use an unbalanced scale.
Funnel sequencing gets the name from its shape, starting with broad questions and progressively
narrowing down the scope. Move from general to specific examples.
1. How do you think this country is getting along in its relations with other countries?
2. How do you think we are doing in our relations with the US?
5. Some say we are very weak on the nuclear deal with the US, while, some say we are OK.
What do you feel?
The first question introduces the general subject. In the next question, a specific country is
mentioned. The third and fourth questions are asked to seek views. The fifth question is to seek
a specific opinion.
Example: The word used by researcher must convey the same meaning to the respondents.
Are the instructions and skip questions clear?
One of the prime conditions for pre-testing is that the sample chosen for pre-testing should be
similar to the respondents who are ultimately going to participate. Just because a few chosen
respondents fill in all the questions does not mean that the questionnaire is sound.
The questionnaire should not be too long as the response will be poor. There is no rule to decide
this. However, the researcher should consider that if he were the respondent, how he would
react to a lengthy questionnaire. One way of deciding the length of the questionnaire is to
calculate the time taken to complete the questionnaire. He can give the questionnaire to a few
known people to seek their opinion.
Task Give one example for each of the following types of questions:
1. Leading question
2. Double-barreled question
3. Close-ended question
Mail questionnaires can be explained as the questionnaires that are mailed to the respondents
who can complete them at their convenience in their homes and at their own pace. They are
expected to meet with a better response rate when respondents are notified in advance about the
forthcoming survey and a reputed research organization administers them with its own
introductory cover letter.
Example: “Tell me spontaneously, what comes to your mind if I ask you about cigarette
smoking”.
3. In case of a mail questionnaire, it is not possible to verify whether the respondent himself/
herself has filled the questionnaire. If the questionnaire is directed towards the housewife,
say, to know her expenditure on kitchen items, she alone is supposed to answer it. Instead,
if her husband answers the questionnaire, the answer may not be correct.
4. Any clarification required by the respondent regarding questions is not possible.
Example: Prorated discount, product profile, marginal rate, etc., may not be understood
by the respondents.
5. If the answers are not correct, the researcher cannot probe further.
3. If a lengthy questionnaire has to be made, first write a letter requesting the cooperation of
the respondents.
Task A nationalised bank wants to determine the most effective way to increase
responses to its mail questionnaire. Three possibilities are:
1. Issue a gift coupon for 25 so that the respondent can go to a specified store to avail
the gift item.
2. Ask the respondents to note their name and address in the completed questionnaire.
Thereafter they will be mailed a cheque for 50.
3. Along with the questionnaire, send a letter stating that a pen set will be sent as a gift
after the completed questionnaire is received.
Mail the questionnaire to 2,000 respondents chosen from four metros. Set up an experiment
in which the above incentives can be tested and the most appropriate method identified.
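One possible way to set up such an experiment is to assign the 2,000 respondents to the three incentives at random, so that the groups differ only in the incentive offered. The sketch below is illustrative only: the function name, the incentive labels and the fixed seed are assumptions, not part of the task.

```python
import random

def assign_incentives(respondent_ids, incentives, seed=None):
    """Randomly split respondents into one group per incentive.

    Group sizes differ by at most one respondent.
    """
    rng = random.Random(seed)
    ids = list(respondent_ids)
    rng.shuffle(ids)  # random order removes any pattern in the mailing list
    groups = {name: [] for name in incentives}
    for position, rid in enumerate(ids):
        groups[incentives[position % len(incentives)]].append(rid)
    return groups

# Hypothetical labels for the three incentives described in the task.
incentives = ["gift_coupon", "cheque", "pen_set"]
groups = assign_incentives(range(1, 2001), incentives, seed=42)
for name, members in groups.items():
    print(name, len(members))
```

After the questionnaires come back, the response rate of each group (questionnaires returned divided by questionnaires mailed) can be compared to identify the most effective incentive.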
Schedule may be defined as a proforma that contains a set of questions which are asked and
filled by an interviewer in a face to face situation with another. It is a standardized device or tool
of observation to collect the data in an objective manner. In this method of data collection the
interviewer puts certain questions and the respondent furnishes certain answers and the
interviewer records as they are given.
he requires to analyze and tabulate the data. Therefore schedule acts as an “aide memoire”.
Aid to classification and analysis: Another objective of the schedule is to tabulate and
Types of Schedules
Observation Schedule: The schedules which are used for observation are known as
observation schedules. Using this schedule, observer records the activities and responses
of an individual respondent or a group of respondents under specific conditions. The main
purpose of the observation schedule is to verify information.
Rating Schedule: Rating schedules are used to assess the attitudes, opinions, preferences,
inhibitions, perceptions and other similar elements or attributes of respondent. Such
measurement is done using a Rating Scale.
Document Schedule: These schedules are used in exploratory research to obtain data
regarding written evidence and case histories from autobiography, diary, or records of
government etc. It is an important method for collecting preliminary data or for preparing
a source list.
Institution Survey Schedules: This type of schedule is used for studying different problems
of institutions.
Interview Schedule: Using this schedule, an interviewer presents the questions to the
interviewee and records his responses in the given space of the questionnaire.
Saving of time: While filling the schedule, the researcher may use abbreviations or short
forms for answers; he may also generate a template. All these steps help in saving time
in data collection.
Personal contact: In the schedule method there is a personal contact between the
respondent and the field worker. The behaviour, and character of respondent obviously
facilitates the research work.
Human touch: Sometimes reading something does not impress as much as when the same
is heard or spoken by experts as they are able to lay the right emphasis. This greatly
improves the response.
Deeper probe: Through this method it is possible to probe deeper into the personality,
living conditions, values, etc., of the respondents.
Defects in sampling are detected: If there are some defects in the schedule during sampling, they
easily come to notice and can be rectified by the researcher.
Removal of doubts: The presence of the enumerator removes doubts in the minds of respondents
on the one hand and, on the other hand, discourages artificial replies owing to fear of
cross-checking.
Human elements make the study more reliable and dependable: The presence of human
elements makes the situation more attractive and interesting which helps in making
Costly and time-consuming: This method is costly and time consuming due to its basic
requirement of interviewing the respondents. This becomes a serious limitation when
respondents are not found in a particular region but are scattered over a wide area.
Need of trained field workers: The schedule method requires involvement of well trained
and experienced field workers. This involves great cost and sometimes workers are not
easily available, forcing engagement of inexperienced hands, which defeats the purpose
of research.
Adverse effect of personal presence: Sometimes personal presence of enumerator becomes
an inhibiting factor. Many people despite knowing certain facts cannot say them in the
presence of others.
Organizational difficulties: If the field of research is dispersed, it becomes difficult to
organize it. Getting trained manpower, assigning them duties and then administrating
the research is a very difficult task.
Accurate communication: It means that the questions given in the schedule should enable
the respondent to understand the context in which they are asked.
Accurate response: The schedule should be structured in such a manner that the required
information is secured accurately. For this, following steps should be taken.
The size of the schedule should be precise and attractive.
7.3.7 Sample Questionnaires
A Study of Customer Retention as Adopted by Textile Retail Outlets
2. Address:
Gifts on special occasion
14. Which one do you think is most effective? Please rank them?
Date: Place:
Form No: [ ] [ ] [ ] [ ] [ ]
1. Personal Profile
(a) Name: [][][][][][][][][][][][][][][][][]
Vendor’s Reputation [] Technical Expertise []
Client Base []
Standard [] Intermediate []
Latest / Advanced []
8. In your PC, would you prefer?
Conventional Design [] Innovative Design []
If new, why:
New design distracts attention -
New design means increased price -
New design is hard to adapt -
If Innovative, why?
To create own identity
Out of business need -
Space management -
9. Rate the following four factors important for innovative design, starting with the
most preferred:
(a) Size (b) Shape
(c) Colour/ordinary (d) Portability and Sturdiness
1. ————————————— 3. ———————————
2. ————————————— 4. ———————————
10. To what extent would the computer increase your efficiency?
Negligible [] 20-40% []
40-60% [] More []
11. How many hours on an average per week would you use your PC?
0 to 5 hours [] 6 to 12 hours []
13 to 18 hours [] More []
12. While using your PC, most of the time would be for:
Education [] Accounting []
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
Signature of the Respondent ________________________
Self Assessment
10. The main objective of ………………….. is to conceal the topic of enquiry by using a disguised
stimulus.
11. …………………. are questions where respondents are free to answer in their own words.
12. A …………………. question is one that suggests the answer to the respondent.
7.4 Summary
Data sources are broadly classified into primary and secondary data.
Primary data is data collected directly by the investigator for the purpose of a
specific inquiry or study, with respect to the problem under study.
Observation method has a limitation i.e., certain attitudes, knowledge, motivation etc.
cannot be measured by this method.
Secondary data are statistics that already exist. These may not be readily usable because
they were collected for some other purpose.
There are two types of secondary data (1) Internal and (2) External secondary data.
Census is the most important among secondary data.
Syndicated data is an important form of secondary data which may be classified into
(a) Consumer purchase data (b) Retailer and wholesaler data (c) Advertising data.
Questions in a questionnaire may be classified into open-ended questions, closed-ended questions,
dichotomous questions etc.
7.5 Keywords
Closed-ended questions: There are two basic formats in this type: (a) Make one or more choices
among the alternatives and (b) Rate the alternatives.
Dichotomous questions: These questions have only two answers, ‘Yes’ or ‘no’, ‘true’ or ‘false’
‘use’ or ‘don’t use’.
Internal Secondary Data: Is that data which is a part of company’s record, for which research is
already conducted.
Leading question: A leading question is one that suggests the answer to the respondent.
Open-ended questions: These are questions where respondents are free to answer in their own
words.
Primary Data: Data directly collected by the researcher, with respect to problem under study, is
known as primary data.
Recency: This refers to “How old is the information?” If it is five years old, it may be useless.
Structured disguised Questionnaire: This type of Questionnaire is used to find, peoples’ attitude,
3. What are the several methods used to collect data by observation method?
4. What are the advantages and limitations of collecting data by observation method?
1. Primary data
2. Authenticity
3. Internal secondary data
4. Interview
5. Observation, questioning
6. Direct observation
7. Qualitative research
8. Depth
9. Questionnaire
10. Non-Structured and Disguised Questionnaire
12. Leading
CONTENTS
Objectives
Introduction
8.1 Meaning of Sampling
8.1.1 Sample Frame
8.1.2 When is a Census Appropriate?
8.1.3 When is Sample Appropriate?
8.2 Sampling Process
8.3 Types of Sample Design
8.3.1 Probability Sampling Techniques
8.3.2 Non-probability Sampling Techniques
8.4 Distinction between Probability Sample and Non-probability Sample
8.5 Errors in Sampling
Objectives
Define sampling;
Introduction
The most important task in carrying out a survey is to select the sample. A sample is
selected because it is practically impossible to survey the entire population. By applying
rationality in the selection of samples, we can generalise the findings of our research.
In carrying out a survey relating to the research, we should first select the problem and study its
implications in different areas. Selection of the research problem, as has already been stated,
should be in line with the researcher’s interest, chain of thinking and existing research in the
same area and should have some direct utility. What is most important in selecting a research
problem is that the research topic should be within manageable limits.
Sampling is the process of selecting units (e.g., people, organizations) from a population of
interest so that by studying the sample we may fairly generalize our results back to the population
from which they were chosen. Each observation measures one or more properties (weight,
location, etc.) of an observable entity enumerated to distinguish objects or individuals. Survey
weights often need to be applied to the data to adjust for the sample design. Results from
probability theory and statistical theory are employed to guide practice. A sample is a part of a
target population, which is carefully selected to represent the population.
Sampling frame is the list of elements from which the sample is actually drawn. Actually,
sampling frame is nothing but the correct list of population.
Example: A researcher may be interested in contacting firms in iron and steel or petroleum
products industry. These industries are limited in number, so a census will be suitable.
2. Sometimes, the researcher is interested in gathering information from every individual.
7. Selection of sample.
(1) Define the population: Population is defined in terms of:
(1) Elements
(3) Extent
(4) Time.
Example: You want to learn about scooter owners in a city. The RTO will be the frame,
which provides you names, addresses and the types of vehicles possessed.
(3) Specify the sampling unit: Individuals who are to be contacted are the sampling units. If
retailers are to be contacted in a locality, they are the sampling units.
Sampling unit may be husband or wife in a family. The selection of sampling unit is very
important. If interviews are to be held during office timings, when the heads of families
and other employed persons are away, interviewing would under-represent employed
persons, and over-represent elderly persons, housewives and the unemployed.
(4) Selection of sampling method: This refers to whether (a) probability or (b) non-probability
methods are used.
(5) Determine the sample size: This means we need to decide “how many elements of the
target population are to be chosen?” The sample size depends upon the type of study that
is being conducted.
Example: If it is an exploratory research, the sample size will be generally small. For
conclusive research, such as descriptive research, the sample size will be large.
The sample size also depends upon the resources available with the company. It depends
on the accuracy required in the study and the permissible errors allowed.
(6) Specify the sampling plan: A sampling plan should clearly specify the target population.
Improper defining would lead to wrong data collection.
(7) Select the sample: This is the final step in the sampling process.
Self Assessment
3. Sampling ........................ is the list of elements from which the sample is actually drawn.
6. The sample size depends upon the ........................ available with the company.
1. Random sampling.
4. Cluster sampling.
5. Multi-stage sampling.
Simple random sample is a process in which every item of the population has an equal probability
of being chosen.
There are two methods used in the random sampling:
(1) Lottery method: Take a population containing four departmental stores: A, B, C and D.
Suppose we need to pick a sample of two stores from the population using a simple
random procedure. We write down all possible samples of two. Six different combinations,
each containing two stores from the population, are AB, AD, AC, BC, BD, CD. We can now
write down six sample combination on six identical pieces of paper, fold the piece of paper
so that they cannot be distinguished. Put them in a box. Mix them and pull one at random.
This procedure is the lottery method of making a random selection.
(2) Using random number table: A random number table consists of a group of digits that are
arranged in random order, i.e., any row, column, or diagonal in such a table contains
digits that are not in any systematic order. Three commonly used tables of random numbers are
(a) Tippett’s table (b) Fisher and Yates’ table (c) Kendall and Babington Smith’s table.
40743 39672
80833 18496
10743 39431
88103 23016
53946 43761
31230 41212
24323 18054
Example: Taking the earlier example of stores. We first number the stores.
1 A 2 B 3 C 4 D
The stores A, B, C and D have been numbered as 1, 2, 3 and 4.
We proceed as follows, in order to select two shops out of four randomly:
Suppose, we start with the second row in the first column of the table and decide to read
diagonally. The starting digit is 8. There are no departmental stores with the number 8 in the
population. There are only four stores. Move to the next digit on the diagonal, which is 0. Ignore
it, since it does not correspond to any of the stores in the population. The next digit on the
diagonal is 1 which corresponds to store A. Pick A and proceed until we get two samples. In this
case, the two departmental stores are 1 and 4. The sample derived from this consists of departmental
stores A and D.
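The two selection methods described above can be sketched in Python. This is an illustrative sketch, not a standard routine: the function names are hypothetical, and the table is stored with each printed row joined into one string of ten digits, which is an assumption about its layout.

```python
import random
from itertools import combinations

stores = ["A", "B", "C", "D"]

def lottery_sample(population, k, rng=random):
    """Lottery method: write every possible sample of size k on a chit, draw one."""
    chits = list(combinations(population, k))  # AB, AC, AD, BC, BD, CD for k = 2
    return rng.choice(chits)

# Random number table as printed in the text, one string per row.
TABLE = [
    "4074339672",
    "8083318496",
    "1074339431",
    "8810323016",
    "5394643761",
    "3123041212",
    "2432318054",
]

def diagonal_sample(table, labels, k, start_row=1, start_col=0):
    """Read digits along a diagonal, keeping those that match a numbered unit."""
    chosen = []
    row, col = start_row, start_col
    while len(chosen) < k and row < len(table) and col < len(table[row]):
        digit = int(table[row][col])
        # Digits outside 1..len(labels) (such as 8 or 0) are ignored.
        if 1 <= digit <= len(labels) and labels[digit - 1] not in chosen:
            chosen.append(labels[digit - 1])
        row, col = row + 1, col + 1
    return chosen

print(lottery_sample(stores, 2))
print(diagonal_sample(TABLE, stores, 2))  # starts at second row, first column
```

Starting from the second row and reading diagonally, the digits met are 8, 0, 1 and 4, so the function returns stores A and D, in line with the worked example.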
In random sampling, there are two possibilities (1) Equal probability (2) Varying probability.
Equal Probability
This is also called random sampling with replacement.
Example: Put 100 chits in a box numbered 1 to 100. Pick one number at random. The box
now holds only 99 chits, so a second pick would no longer have a probability of 1/100. In order
to provide equal probability, the chit selected is replaced in the box before the next pick.
Varying Probability
This is also called random sampling without replacement. Once a number is picked, it is not
included again. Therefore, the probability of selecting a unit varies from the other.
In our example, it is 1/100, 1/99, 1/98, 1/97 if we select four samples out of 100.
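The difference between the two cases can be checked with a short calculation. The sketch below is illustrative; the function name is hypothetical.

```python
from fractions import Fraction

def draw_probabilities(population_size, draws, replace):
    """Probability of picking any one particular remaining chit at each draw."""
    probabilities = []
    remaining = population_size
    for _ in range(draws):
        probabilities.append(Fraction(1, remaining))
        if not replace:
            remaining -= 1  # the chit drawn is not put back
    return probabilities

print(draw_probabilities(100, 4, replace=True))   # equal probability at every draw
print(draw_probabilities(100, 4, replace=False))  # varying probability
```

With replacement every draw has probability 1/100; without replacement the probabilities are 1/100, 1/99, 1/98 and 1/97, as stated above.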
(2) One unit between the first and Kth unit in the population list is randomly chosen.
Calculate
To select the first unit, we randomly pick one number between 1 and 20, say 17. So our sample
begins with 17, 37, 57, ... Please note that only the first item was randomly selected. The
rest are systematically selected. This is a very popular method because we need only one random
number.
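A minimal sketch of systematic selection follows. The population size of 100 and sample size of 5 are assumed values chosen only so that the interval K works out to 20, as in the example; the function name is hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, start=None, rng=random):
    """Pick a random start in 1..K, then take every K-th unit after it."""
    k = population_size // sample_size  # sampling interval K
    if start is None:
        start = rng.randint(1, k)       # the only random number needed
    return [start + i * k for i in range(sample_size)]

print(systematic_sample(100, 5, start=17))  # [17, 37, 57, 77, 97]
```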
Stratified sampling is a probability sampling procedure in which simple random sub-samples are
drawn from within different strata that are more or less equal on some characteristic. Stratified
sampling is of two types:
1. Proportionate stratified sampling: The number of sampling units drawn from each stratum
is in proportion to the population size of that stratum.
2. Disproportionate stratified sampling: The number of sampling units drawn from each
stratum is based on analytical considerations, and not in proportion to the size of the
population of that stratum.
Sometimes, marketing professionals want information about the component parts of the
population. Assume there are three stores. Each store forms a stratum and a sample
is selected from within each stratum. The result might be used to plan different
promotional activities for each store stratum.
Suppose a researcher wishes to study the retail sales of products, such as tea in a universe
of 1,000 grocery stores (Kirana shops included). The researcher can first divide this universe
into three strata based on the size of the store. This benchmark for size could be only one
of the following (a) floor space (b) volume of sales (c) variety displayed etc.
Suppose we need 12 stores, then choose four from each stratum, at random. If there was no
stratification, simple random sampling from the population would be expected to choose
two large stores (20% of 12), about four medium stores (30% of 12) and about six small
stores (50% of 12).
As can be seen, each store can be studied separately using the stratified sample.
Example:
Size of stores   Sample mean sales per store   No. of stores   Percent of stores
Large            200                           2,000           20
Medium           80                            3,000           30
Small            40                            5,000           50
Total                                          10,000          100
The population mean of monthly sales is calculated by multiplying each stratum’s sample mean by
its relative weight and adding the products:
200 × 0.2 + 80 × 0.3 + 40 × 0.5 = 84
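The weighted mean can be reproduced in a few lines of Python. The sketch uses only the figures from the table; the dictionary layout and function name are illustrative choices.

```python
# stratum: (sample mean sales per store, number of stores in the population)
strata = {
    "Large": (200, 2000),
    "Medium": (80, 3000),
    "Small": (40, 5000),
}

def stratified_mean(strata):
    """Weight each stratum's sample mean by its share of the population."""
    total_stores = sum(count for _, count in strata.values())
    return sum(mean * count / total_stores for mean, count in strata.values())

print(stratified_mean(strata))  # 84.0
```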
\[ p = \frac{n_1}{N_1} = \frac{n_2}{N_2} = \cdots = \frac{n_k}{N_k} = \frac{n}{N} \]
\[ n_1 = \frac{n}{N} \times N_1, \quad n_2 = \frac{n}{N} \times N_2, \text{ and so on} \]
Example: A survey is planned to analyse the perception of people towards their own
religious practices. The population consists of various religions, viz., Hindu, Muslim, Christian,
Sikh, Jain, assuming a total of 10,000. Hindus, Muslims, Christians, Sikhs and Jains number 6,000,
2,000, 1,000, 500 and 500 respectively. Determine the sample size of each stratum by applying
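\[ p = \frac{n_1}{N_1} = \frac{n_2}{N_2} = \frac{n_3}{N_3} = \frac{n_4}{N_4} = \frac{n_5}{N_5} = \frac{n}{N} \]
\[ n_1 = \frac{n}{N} \times N_1 = \frac{200}{10{,}000} \times 6{,}000 = 20 \times 6 = 120 \]
\[ n_2 = \frac{n}{N} \times N_2 = \frac{200}{10{,}000} \times 2{,}000 = 40 \]
\[ n_3 = \frac{n}{N} \times N_3 = \frac{200}{10{,}000} \times 1{,}000 = 20 \]
\[ n_4 = \frac{n}{N} \times N_4 = \frac{200}{10{,}000} \times 500 = 10 \]
\[ n_5 = \frac{n}{N} \times N_5 = 10 \]
\[ n = n_1 + n_2 + n_3 + n_4 + n_5 = 120 + 40 + 20 + 10 + 10 = 200 \]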
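The same proportionate allocation can be computed programmatically. This is a minimal sketch with a hypothetical function name; it assumes the total sample size of 200 used in the calculation above. Rounding is applied to each stratum, so for other figures the rounded allocations may not add up exactly to n.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Allocate the sample to each stratum as n_i = (n / N) * N_i."""
    total = sum(stratum_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in stratum_sizes.items()
    }

population = {"Hindu": 6000, "Muslim": 2000, "Christian": 1000, "Sikh": 500, "Jain": 500}
allocation = proportional_allocation(population, 200)
print(allocation)
print(sum(allocation.values()))  # 200
```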
Sample Disproportion
Let \(\sigma_i^2\) be the variance of stratum i,
where i = 1, 2, 3, ..., k.
The formula to compute the sample size of stratum i is
\[ n_i = \frac{r_i \sigma_i}{\sum_{i=1}^{k} r_i \sigma_i} \times n \]
where
\(r_i = N_i / N\)
\(N_i\) = Population of stratum i
N = Total population.
Example: The Government of India wants to study the performance of women self help
groups (WSHGs) in three regions viz. North, South and West. The total number of WSHGs is
1,500. The number of groups in North, South and West are 600, 500 and 400 respectively. The
Government found more variation between WSHGs in the North, South and West regions. The
variances of performance of WSHGs in these regions are 64, 25 and 16 respectively. If
disproportionate stratified sampling is to be used with a sample size of 100, determine the
number of sampling units for each region.
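Solutions:
Variance of stratum 1, \(\sigma_1^2\) = 64
Columns of the solution table: Stratum number | Size of the stratum \(N_i\) | \(r_i = N_i/N\) | \(\sigma_i\) | \(r_i \sigma_i\) | \(n_i = \dfrac{r_i \sigma_i \times n}{\sum_{i=1}^{3} r_i \sigma_i}\)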
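The allocation for the three regions can be worked out with a short script. The sketch below assumes the allocation rule suggested by the column headings of the solution table, namely that each stratum receives a share proportional to r_i × σ_i, where r_i = N_i / N and σ_i is the standard deviation (square root of the variance); the function name is hypothetical.

```python
from math import sqrt

def disproportionate_allocation(sizes, variances, sample_size):
    """Assumed rule: n_i = n * (r_i * sigma_i) / sum(r_j * sigma_j), r_i = N_i / N."""
    total = sum(sizes.values())
    weights = {
        name: (sizes[name] / total) * sqrt(variances[name])
        for name in sizes
    }
    weight_sum = sum(weights.values())
    return {name: sample_size * w / weight_sum for name, w in weights.items()}

sizes = {"North": 600, "South": 500, "West": 400}
variances = {"North": 64, "South": 25, "West": 16}
result = disproportionate_allocation(sizes, variances, 100)
for region, n_i in result.items():
    print(region, round(n_i))
```

Under this assumption the 100 sampling units split into roughly 54 for North, 28 for South and 18 for West, so the region with the largest variance receives more than its population share of 40.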
Cluster Sampling
1. The population is divided into clusters.
2. A simple random sample of few clusters is selected.
Step 1: The above mentioned cluster sampling is similar to the first step of stratified random
sampling. But the two sampling methods are different. The key to cluster sampling is decided by
A major advantage of simple cluster sampling is the ease of sample selection. Suppose, we have
a population of 20,000 units from which we wish to select 500 units. Choosing a sample of that
size is a very time-consuming process, if we use a random number table. Suppose, the entire
population is divided into 80 clusters of 250 units each, we can choose two sample clusters
(2 × 250 = 500) easily by using cluster sampling. The most difficult job is to form clusters. In
marketing, the researcher forms clusters so that he can deal with each cluster differently.
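The 20,000-unit illustration can be sketched as follows. Forming clusters from consecutive unit numbers is an assumption made only for the example, as is the function name; in practice clusters are usually natural groupings such as localities.

```python
import random

def cluster_sample(units, cluster_size, clusters_to_pick, seed=None):
    """Split the units into equal clusters, pick whole clusters at random."""
    rng = random.Random(seed)
    clusters = [
        units[i:i + cluster_size] for i in range(0, len(units), cluster_size)
    ]
    picked = rng.sample(clusters, clusters_to_pick)  # only two random draws needed
    return [unit for cluster in picked for unit in cluster]

units = list(range(1, 20001))            # population of 20,000 units
sample = cluster_sample(units, 250, 2, seed=1)
print(len(sample))                       # 500
```

Only two random selections are required here, compared with 500 if the units were drawn one by one from a random number table.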
Cross Houses
1 X1 X2 X3 X4
2 X5 X6 X7 X8
We need to select eight houses. We can choose eight houses at random. Alternatively, two
clusters, each containing four houses can be chosen. In this method, every house
would have a known probability of being chosen, i.e. a chance of one in two (two clusters out of four). We
must remember that in the cluster, each house has the same characteristics. With cluster sampling,
it is impossible for certain random samples to be selected. For example, in the cluster sampling
process described above, the following combination of houses could not occur: X1, X2, X5, X6, X9,
X10, X13 and X14. This is because the original universe of 16 houses has been redefined as a
universe of four clusters. So only clusters can be chosen as a sample.
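The restriction that cluster sampling places on the possible samples can be counted directly. The sketch assumes that each cross forms one cluster of four consecutively numbered houses (X1 to X4, X5 to X8 and so on), following the rows of the table shown above.

```python
from itertools import combinations
from math import comb

houses = [f"X{i}" for i in range(1, 17)]
clusters = [houses[i:i + 4] for i in range(0, 16, 4)]  # assumed: one cluster per cross

# Every sample of eight houses that cluster sampling can produce.
cluster_samples = [set(a) | set(b) for a, b in combinations(clusters, 2)]

print(comb(16, 8))           # samples possible under simple random sampling
print(len(cluster_samples))  # samples possible under cluster sampling

impossible = {"X1", "X2", "X5", "X6", "X9", "X10", "X13", "X14"}
print(impossible in cluster_samples)  # False: this combination cannot occur

# Each house appears in half of the possible cluster samples.
share = sum("X1" in s for s in cluster_samples) / len(cluster_samples)
print(share)
```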
Example: Suppose, we want to have 7500 households from all over the country. In such
a case, from the first stage, District, say 30 districts out of 600 are selected from all over the
country.
I Stage - Cities: Suppose 5 cities are selected out of each 30 districts; and
II Stage - Wards/Localities: say 10 wards/localities are selected from each city
III Stage - Households: 50 households are selected from each ward/locality.
In stage I, we can employ stratified sampling
In stage II, we can use cluster sampling
In stage III, we can have simple random sampling.
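The total number of households reached by a multi-stage design is the product of the numbers selected at each stage. The sketch below assumes that each figure in the example is a count per unit of the previous stage (5 cities per district, 10 wards per city, 50 households per ward); the function name is hypothetical.

```python
def multistage_total(*units_per_stage):
    """Multiply the number of units selected at each successive stage."""
    total = 1
    for units in units_per_stage:
        total *= units
    return total

# 30 districts, 5 cities per district, 10 wards per city, 50 households per ward
print(multistage_total(30, 5, 10, 50))
```

Under this reading the design yields 75,000 households rather than 7,500, so at least one of the stage figures would need to be smaller (for instance 5 households per ward) to arrive at the stated total.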
Caution The various methods used at the different stages each contribute towards accuracy,
cost, time, etc. This leads us to conclude that multistage sampling leads to saving of time,
labour and money. Apart from this wherever an appropriate frame is not available, the
use of multistage sampling has universal appeal.
Multi-stage Sampling
The name implies that sampling is done in several stages. This is used with stratified/cluster
designs.
The management of a newly-opened club solicits new membership. During the first round,
all corporates were sent details so that those who are interested may enroll. Having enrolled, the
second round concentrates on how many are interested to enroll for various entertainment
activities that the club offers such as billiards, indoor sports, swimming, gym etc. After obtaining
this information, you might stratify the interested respondents. This will also tell you the
reaction of new members to various activities. This technique is considered to be scientific, since
there is no possibility of ignoring the characteristics of the universe.
Task What are the advantages and disadvantages of multi-stage sampling? Enlist.
Area Sampling
Example: If someone wants to measure the sales of toffee in retail stores, one might
choose certain city localities and then audit toffee sales in retail outlets in those localities.
The main problem in area sampling is the non-availability of lists of shops selling toffee in a
particular area. Therefore, it would be impossible to choose a probability sample from these
outlets directly. Thus, the first job is to choose a geographical area and then list out outlets
selling toffee. Then follows the probability sample for shops among the list prepared.
Example: You may like to choose shops which sell the brand—Cadbury dairy milk. The
disadvantage of the area sampling is that it is expensive and time-consuming.
1. Deliberate sampling
2. Shopping Mall Intercept Sampling
3. Sequential sampling
4. Quota sampling
5. Snowball sampling
6. Panel samples
Deliberate Sampling
This is also known as judgment sampling. The investigator uses his discretion in selecting
sample observations from the universe. As a result, there is an element of bias in the selection.
From the point of view of the investigator, the sample thus chosen may be a true representative
of the universe. However, the units in the universe do not enjoy an equal chance of getting
included in the sample. Therefore, it cannot be considered a probability sampling.
Example: Test market cities are selected on the basis of judgment sampling, because
these cities are viewed as typical cities matching certain demographic characteristics.
Shopping Mall Intercept Sampling
This is a non-probability sampling method. In this method the respondents are recruited for
individual interviews at fixed locations in shopping malls. (For example: Shopper’s Shoppe,
Food World, Sunday to Monday). This type of study would include several malls, each serving
a different socio-economic population.
Example: The researcher may wish to compare the responses to two or more TV
commercials for two or more products. Mall samples can be informative for this kind of study.
Mall samples should not be used if the difference in effectiveness of two commercials varies
with the frequency of mall shopping, with the demographic characteristics of mall shoppers, or
with any other characteristic. The success of this method depends on how well the sample is
chosen.
Merits
Demerits
Sequential Sampling
This is a method in which the sample is formed on the basis of a series of successive decisions.
It aims at answering the research question on the basis of accumulated evidence. Sometimes,
a researcher may want to take a modest sample and look at the results. Thereafter, s(he) will
decide if more information is required, for which larger samples are considered. If the evidence
is not conclusive after a small sample, more samples are required. If the position is still
inconclusive, still larger samples are taken. At each stage, a decision is made about whether
more information should be collected or the evidence is now sufficient to permit a conclusion.
Quota Sampling
Quota sampling is quite frequently used in marketing research. It involves the fixation of
certain quotas, which are to be fulfilled by the interviewers.
Suppose 2,00,000 students are appearing for a competitive examination. We need to select 1% of
them, i.e. 2,000 students, according to the following quotas:
Category          Quota
General merit     1,000
Sport               600
NRI                 100
SC/ST               300
TOTAL             2,000
1. The population is divided into segments on the basis of certain characteristics. Here, the
segments are termed as cells.
2. A quota of units is selected from each cell.
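The quota table above can be checked with a few lines of Python. This is only a sketch: the dictionary layout and the helper `remaining` are illustrative names, not part of the text. It confirms that the four quotas add up to 1% of the 2,00,000 candidates and shows how an interviewer's running tally could be compared with the quota for each cell.

```python
population = 200_000          # 2,00,000 students
sampling_percent = 1          # select 1% of them

# Quota fixed for each cell, taken from the table above.
quotas = {"General merit": 1000, "Sport": 600, "NRI": 100, "SC/ST": 300}

# Integer arithmetic avoids any floating-point rounding.
sample_size = population * sampling_percent // 100
print(sample_size, sum(quotas.values()))  # 2000 2000


def remaining(quotas, filled):
    """Units still to be interviewed in each cell (hypothetical helper)."""
    return {cell: quota - filled.get(cell, 0) for cell, quota in quotas.items()}


# Hypothetical tally part-way through fieldwork.
print(remaining(quotas, {"Sport": 600, "NRI": 40}))
```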
Advantages
1. Quota sampling does not require prior knowledge about the cell to which each population
unit belongs. Therefore, this sampling has a distinct advantage over stratified random
sampling, where every population unit must be placed in the appropriate stratum before
the actual sample selection.
2. It is simple to administer. Sampling can be done very quickly.
3. The necessity of the researcher going to various geographical locations is avoided and
thus cost is reduced.
Limitations
1. It may not be possible to get a “representative” sample within the quota as the selection
depends entirely on the mood and convenience of the interviewer.
2. Since too much liberty is being allowed to the interviewer, the quality of work suffers if
they are not competent.
Snowball Sampling
This is a non-probability sampling method. In this method, the initial group of respondents is
selected randomly. Subsequent respondents are selected on the basis of the opinions or referrals
provided by the initial respondents. Further referrals lead to more referrals, thus producing a
snowball effect. The referrals will have demographic and psychographic characteristics that are
relatively similar to those of the person referring them.
Example: College students bring in more students on the consumption of Pepsi. The
major advantage of snowball sampling is that it monitors the desired characteristics in the
population.
Panel Samples
Panel samples are frequently used in marketing research. To give an example, suppose that one
households are drawn. These households are contacted to gather information on the pattern of
consumption. Subsequently, say after a period of six months, the same households are approached
Self Assessment
Probability Sample
1. Here, each member of a universe has a known chance of being selected and included in the
sample.
2. Any personal bias is avoided. The researcher cannot exercise his discretion in the selection
of sample items.
Non-probability Sample
In this case, the likelihood of choosing a particular universe element is unknown. The sample
chosen in this method is based on aspects like convenience, quota, etc.
Task Identify the appropriate target population and sampling frame for various situations
listed below:
1. The regional marketing manager of a beverage company wants to test market three
new flavours to gauge their acceptance.
of dealers towards a new promotion policy announced.
4. A TV channel wants to determine the viewing habits of housewives and their
programme preferences.
5. A departmental chain such as Food World wants to determine the shopping behaviour
The way to keep sampling error to a minimum is to choose an appropriate sample size. As the
sample size increases, the sampling error decreases. Sampling error is the gap between the
sample mean and the population mean.
Example: If a study is done amongst Maruti car-owners in a city to find the average
monthly expenditure on the maintenance of car, it can be done by including all Maruti car-
owners. It can also be done by choosing a sample without covering the entire population. There
will be a difference between the two methods with regard to monthly expenditure.
One way of distinguishing between the sampling and the non-sampling error is that, while
sampling error relates to random variations which can be found out in the form of standard
error, non-sampling error occurs in some systematic way which is difficult to estimate.
A sampling frame is a specific list of population units, from which the sample for a study is
chosen.
Example:
An MNC bank wants to pick up a sample among the credit card holders. They can
readily get a complete list of credit card holders, which forms their data bank. From
this frame, the desired individuals can be chosen. In this example, the sampling frame is
identical to the ideal population, namely all credit card holders. There is no sampling
frame error in this case.
Assume that a bank wants to contact the people belonging to a particular profession
(doctors, lawyers) over phone to market a home loan product. The sampling frame
in this case is the telephone directory. This sampling frame may pose several
problems: (1) People might have migrated. (2) Numbers might have changed. (3) Many
numbers might not yet be listed. The question is “Are the residents who are included in
the directory likely to differ from those who are not included?” The answer is yes.
Thus, in this case, there will be a sampling frame error.
Non-response error occurs because the planned sample and the final sample vary significantly.
Example: Marketers want to know about the television viewing habits across the country.
They choose 500 households and mail the questionnaire. Assume that only 200 respondents
reply. This alone does not establish a non-response error, which depends upon the discrepancy
between those who replied and those who did not. If the 200 who replied did not differ from the
chosen 500, there is no non-response error.
Consider an alternative. The people who responded are those who had plenty of leisure time.
Therefore, it is implied that non-respondents do not have adequate leisure time. In this case, the
final sample and the planned sample differ. It was assumed that all the 500 chosen have leisure
time, but in the final analysis only 200 have leisure time and the others do not. Therefore, the
sample, with respect to leisure time, leads to non-response error.
Data error occurs during the data collection, analysis of data or interpretation. Respondents
sometimes give distorted answers unintentionally to questions which are difficult, or when the
question is exceptionally long and the respondent may not have an answer. Data errors can also
occur depending on the physical and social characteristics of the interviewer and the respondent.
Things such as the tone and voice can affect the responses. Therefore, we can say that the
characteristics of the interviewer can also result in data error. Also, cheating on the part of the
interviewer leads to data error. Data errors can also occur when answers to open-ended questions
are improperly recorded.
The respondent must be briefed before the interview begins on what is expected and to what
extent he should answer. Also, the interviewer must make sure that the respondent is familiar
with the subject. If these are not made clear by the interviewer, errors will occur.
Editing mistakes made by the editors in transferring the data from questionnaire to computers
are other causes for errors.
The respondent could terminate his/her participation in data gathering, because it may be felt
that the questionnaire is too long and tedious.
Notes
1. For non-response – provide incentives such as a gift or cash. This enhances the
possibility as well as incidence of response.
2. Data error: Do not ask questions which respondents cannot answer. Also, do not ask
sensitive questions.
3. Train the interviewer to establish a good rapport with the respondents.
Self Assessment
11. A …………. is a specific list of population units, from which the sample for a study is
chosen.
12. ………….. occurs during the data collection, analysis of data or interpretation.
1. The first factor that must be considered in estimating sample size, is the error permissible.
3. The higher the confidence level in the estimate, the larger the sample must be. There is a
trade-off between the degree of confidence and the degree of precision with a sample of
fixed size.
4. The greater the number of sub-groups of interest within the sample, the greater its size
must be.
Example: We may like to obtain the family income level from a mail survey, but the
researcher may not receive a response from everyone. If the researcher expects a response rate
of 40%, then he needs to despatch correspondingly more questionnaires than the number of
responses required. A low percentage of response can cause serious problems to the researcher.
This is known as the non-response error.
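The effect of the expected response rate on the number of questionnaires to be despatched can be sketched as follows. The function name and the figure of 200 required responses are assumptions made for illustration (the figure echoes the earlier television example, where 200 of 500 households replied); only the 40% response rate comes from the example above.

```python
def questionnaires_to_mail(required_responses, response_rate_percent):
    """Smallest number of mailings expected to yield the required responses.

    Uses integer ceiling division so that no floating-point rounding occurs.
    """
    if not 0 < response_rate_percent <= 100:
        raise ValueError("response rate must be between 0 and 100 percent")
    return (required_responses * 100 + response_rate_percent - 1) // response_rate_percent


# Hypothetical target of 200 completed questionnaires at a 40% response rate.
print(questionnaires_to_mail(200, 40))  # 500
```

At a 40% response rate the mail-out has to be two and a half times the number of responses finally needed.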
Non-response error may be due to 1) failure to locate, 2) flat refusal
The failure to locate: People move to new destinations. However, if the sample frames used are
of recent origin, this problem can be overcome.
Flat refusal: We do not know if those who did not respond hold different views or opinions
from those who responded.
This implies that those who don’t respond should be motivated. It can be done in any one of the
following ways:
1. An advance letter informing the respondents that they will receive a questionnaire and
requesting their cooperation. This will generally increase the rate of response.
2. Monetary incentive or gift given to respondents will yield a larger response rate.
3. Proper follow-up is necessary after the potential respondent has received the questionnaire.
Example: Determine the sample size if the standard deviation of the population is 3.9, the
population mean is 36, the sample mean is 33 and the desired degree of precision is 99%.
Solution: Given $\sigma = 3.9$, $\mu = 36$, $\bar{x} = 33$ and a 1% level of significance (99%
precision implies 1% level of significance), for which $z = 2.576$:
\[ n = \left( \frac{z\sigma}{d} \right)^{2}, \quad \text{where } d = \mu - \bar{x} \]
\[ n = \left( \frac{2.576 \times 3.9}{36 - 33} \right)^{2} = 11.21 \approx 11 \]
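The same calculation can be reproduced in Python as a quick check. This is a minimal sketch: the function name `sample_size` is illustrative, and the z value is obtained from the standard normal distribution for a two-sided 99% confidence level, which is the assumption behind z = 2.576 in the solution above.

```python
from statistics import NormalDist


def sample_size(sigma, d, confidence):
    """n = (z * sigma / d) ** 2 for a two-sided confidence level."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 2.576 for 99%
    return (z * sigma / d) ** 2


n = sample_size(sigma=3.9, d=36 - 33, confidence=0.99)
print(round(n, 2))  # 11.21
```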
Task Prepare a sample plan including the sample size for Santoor soap, keeping in mind
both the male and female customers. Use three economic strata, the educational level and
the age group influencing the buyer behaviour. Prepare a sampling design for the
following:
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
where $\sigma$ is the standard deviation of the population distribution of that quantity and n is
the size (number of items) in the sample.
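A short sketch shows how the standard error shrinks as the sample grows. The value of sigma is borrowed from the sample-size example (3.9) and the three sample sizes are arbitrary choices for illustration.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


sigma = 3.9  # population standard deviation, borrowed for illustration
for n in (25, 100, 400):
    print(n, round(standard_error(sigma, n), 3))
# 25 0.78
# 100 0.39
# 400 0.195
```

Each fourfold increase in sample size only halves the standard error, which is why precision becomes progressively more expensive to buy.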
Caselet Case of Food Manufacturer
a. 100 of the 200 respondents had been to Ahar between 8 a.m. and 9 p.m.
b. Of these 100 respondents, 85 had seen the company’s full-page advertisement.
c. Of the remaining 100 respondents, who had not been to Ahar, only 50 had seen the
company’s ad.
On the basis of the above findings of the survey, the company claimed that the promotion
was a “big success”. Do you agree? If so, explain why you think so.
Self Assessment
8.8 Summary
In stratified sampling, random samples are drawn from several strata, each of which has more or
less the same characteristics.
8.9 Keywords
Census: It refers to complete inclusion of all elements in the population. A sample is a sub-group
of the population.
Deliberate Sampling: The investigator uses his discretion in selecting sample observations from
the universe. As a result, there is an element of bias in the selection.
Multistage Sampling: The name implies that sampling is done in several stages.
Quota Sampling: Quota sampling is quite frequently used in marketing research. It involves the
fixation of certain quotas, which are to be fulfilled by the interviewers.
Random Sampling: Simple random sampling is a process in which every item of the population
has an equal probability of being chosen.
Sample Frame: Sampling frame is the list of elements from which the sample is actually drawn.
Stratified Random Sampling: A probability sampling procedure in which simple random
sub-samples are drawn from within different strata that are more or less equal on some
characteristics.
2. Which method of sampling would you use in studies, where the level of accuracy can vary
from the prescribed norms and why?
6. One mobile phone user is asked to recruit another mobile phone user. What sampling
method is this known as and why?
7. Sampling is a part of the population. True/False? Why/why not?
8. Determine the sample size if the standard deviation of population is 20 and the standard
error is 4.1.
9. What do you see as the reason behind purposive sampling being known as judgement sampling?
10. Suppose the population consists of 45,000 households, divided into five (5) strata on the
basis of monthly income. This can be illustrated as below:
0 - 1000
1001 - 5000
5001 - 7500
7501 - 10,000
Above 10,000
Then
(a) Find out the number of units from each stratum if the sample constitutes 1% of the
population.
(b) If selection is for 150 items, selecting equally from each stratum, find out the number of
sample units from each stratum.
1. Sampling 2. Target
3. Frame 4. Small
5. Target 6. Resources
13. Failure to locate 14. Random Sample
CONTENTS
Objectives
Introduction
9.1 Components of Attitude
9.2 Scaling Technique
9.2.1 Types of Scaling Techniques
9.2.2 Comparative and Non-comparative Scales
9.3 Criteria for the Good Test
9.4 Data Processing Operations
9.4.1 Steps in Processing of Data
9.5 Summary
9.6 Keywords
9.7 Review Questions
Objectives
Introduction
Attitude is a degree of positive or negative affect associated with some psychological object.
Attitudes are subjective and personal. Attitude influences the behaviour. Purchase decisions are
based upon the attitudes. The attitudes can change over time.
Affective: This refers to the respondent’s liking or preferences for an object. This is also
known as the feeling component. (a) I like the product ‘A’ (b) Advertisement ‘X’ is poor.
This component reveals the buyers’ positive or negative attitude towards the product.
Behavioural: This refers to the respondent’s intention to buy. This is a situation prior to
the purchase. In marketing, the usage and buying pattern depends on this component.
This is also known as action component.
Did u know? What are the Determinants of Attitude? (What Alters the Attitude?)
Attitudes are not static, but change continuously. Attitudes undergo change due to five
factors:
Changes in the group membership
Individual personality.
to value or magnitude.
Nominal Scale
In this scale, numbers are used to identify the objects. For example, University Registration
numbers assigned to students, numbers on their jerseys.
‘Yes’, it won’t affect the answers given by respondents. The numbers used in nominal scales
serve only the purpose of counting.
The telephone numbers are an example of nominal scale, where one number is assigned to one
subscriber. The idea of using nominal scale is to make sure that no two persons or objects receive
the same number. Similarly, bus route numbers are the example of nominal scale.
Arranging the books in the library, subject wise, author wise – we use nominal scale.
Caution It should be kept in mind that nominal scale has certain limitations, viz.
(b) No mathematical operation is possible.
(c) Statistical implication – Calculation of the standard deviation and the mean is not
The Ordinal scale is used for ranking in most market research studies. Ordinal scales are used to
ascertain the consumer perceptions, preferences, etc. For example, the respondents may be
given a list of brands which may be suitable and asked to rank them on the basis of ordinal scale.
Lux
Liril
Cinthol
Lifebuoy
Hamam
Rank the following attributes on a 1–5 scale according to their importance in a microwave oven:
Did u know? What is the difference between nominal and ordinal scales?
In nominal scale numbers can be interchanged, because it serves only for the purpose of
counting. Numbers in Ordinal scale have meaning and it won’t allow interchangeability.
Interval Scale
Interval scale is more powerful than the nominal and ordinal scales. The distance given on the
scale represents equal distance on the property being measured. Interval scale may tell us “How
far apart are the objects with respect to an attribute?” This means that the difference can be
compared. The difference between “1” and “2” is equal to the difference between “2” and “3”.
Example:
1. Suppose we want to measure the rating of a refrigerator using interval scale. It will
appear as follows:
Ratio Scale
Ratio scale is a special kind of interval scale that has a meaningful zero point. With this scale,
length, weight or distance can be measured. In this scale, it is possible to say how many times
greater or smaller one object is compared to the other.
Example: Sales this year for product A are twice the sales of the same product last year.
Statistical implications: All statistical operations can be performed on this scale.
To overcome this problem, in comparative scale, a reference point is fixed to facilitate comparison.
Illustration of comparative and non-comparative scale is shown below.
Comparative Scale: In each of the following, which store do you think is better (please tick one
store from the following)
(e) Smart…………. (f) More……….
Non-comparative Scale: The most important reason for shopping at Big Bazar is ….
(a) Ambiance (b) Price (c) Variety (d) Parking space (e) Discount (f) Home delivery
Comparative Scales
Paired Comparison
Example: Here a respondent is asked to show his preferences from among five brands of
coffee – A, B, C, D and E with respect to flavors. He is required to indicate his preference in pairs.
The number of pairs is calculated as follows. The brands to be rated are presented two at a time,
so each brand in the category is compared once to every other brand. In each pair, the respondents
are asked to divide 100 points on the basis of how much they like one compared to the other.
The score is totalled for each brand.
\[ \text{No. of pairs} = \frac{N(N-1)}{2} \]
In this case, it is
\[ \frac{5(5-1)}{2} = 10 \]
A&B B&D
A&C B&E
A&D C&D
A&E C&E
B&C D&E
If there are 15 brands to be evaluated, then we have 105 paired comparison(s) and that is the
limitation of this method.
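The pair counts quoted above can be verified by listing the pairs directly. This is a small sketch using Python's standard library; the function name is illustrative.

```python
from itertools import combinations


def paired_comparisons(brands):
    """Every unordered pair of brands, each brand meeting every other once."""
    return list(combinations(brands, 2))


pairs = paired_comparisons("ABCDE")
print(len(pairs))                           # 10
print(pairs[:4])                            # [('A', 'B'), ('A', 'C'), ('A', 'D'), ('A', 'E')]
print(len(paired_comparisons(range(15))))   # 105
```

The count grows roughly with the square of the number of brands, which is why the method becomes tiring for respondents once more than a handful of brands are involved.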
Rank Order Scale
In this method, respondents are required to rank more than two objects or alternatives based on
some criteria.
It is simpler than paired comparison scale, as its procedure can be easily understood by the
respondent.
Rank order scales are more difficult than rating scales because they involve comparison and
hence require more attention and mental effort. The main disadvantage is that the respondent
may not like to make a choice among the given alternatives and is hence compelled to choose
one of them.
Another shortcoming of rank order scaling is that respondents cannot meaningfully rank
more than 5 to 6 objects. The problem will not arise while ranking the first and last objects but
with those in the undifferentiated middle. When there are several objects, one solution is to
divide the ranking into 2 stages. For example, with 9 objects, the first stage would be to rank the
objects into classes: top three, middle three and last three. The next stage would be to rank the
3 objects within each class.
Constant sum scale is one of the methods of comparative scaling. In this method, the respondent
is instructed to allocate some constant sum (points) to various features given, based on the
importance of attribute to the respondent. For example, number of features in selection of a
bank by the respondent may be done as follows. 100 points are assigned, which will be allocated
among the features. The features may be as follows:
Feature No. of points (sum)
1. Location —————
2. Banking hours —————
3. Interest rate —————
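A constant sum response can be checked and summarised as sketched below. The feature list is limited to the three features that appear above, and the two respondents' allocations are invented purely for illustration; a real study would use its full feature list and actual responses.

```python
FEATURES = ("Location", "Banking hours", "Interest rate")
TOTAL_POINTS = 100


def validate(allocation):
    """Reject an allocation that does not distribute exactly 100 points."""
    if set(allocation) != set(FEATURES):
        raise ValueError("every feature must receive a score")
    if any(points < 0 for points in allocation.values()):
        raise ValueError("points cannot be negative")
    if sum(allocation.values()) != TOTAL_POINTS:
        raise ValueError("points must add up to 100")


# Hypothetical allocations from two respondents.
responses = [
    {"Location": 50, "Banking hours": 20, "Interest rate": 30},
    {"Location": 30, "Banking hours": 30, "Interest rate": 40},
]
for response in responses:
    validate(response)

# Average points per feature indicate relative importance.
mean_points = {
    feature: sum(r[feature] for r in responses) / len(responses)
    for feature in FEATURES
}
print(mean_points)  # {'Location': 40.0, 'Banking hours': 25.0, 'Interest rate': 35.0}
```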
Multi-dimensional Scaling
This is used to study consumer attitudes, particularly with respect to perceptions and preferences.
These techniques help identify the product attributes that are important to the customers and to
measure their relative importance. Multi-Dimensional Scaling is useful in studying the following:
1. (a) What are the major attributes considered while choosing a product (soft drinks, modes
2. Which is the ideal combination of attributes according to the customer? (i.e., which two or
This scaling is used to describe similarity and preference of brands. The respondents are asked
to indicate their perception, or the similarity between various objects (products, brands, etc.)
and preference among objects. This scaling is also known as perceptual mapping.
There are two ways of collecting the input data to plot perceptual mapping:
1. Non-attribute method.
2. Attribute method.
1. Non-attribute method: Here, the researcher asks the respondent to make a judgment
about the objects directly. In this method, the criteria for comparing the objects are decided
by the respondent himself.
2. Attribute method: In this method, instead of respondents selecting the criteria, they are
asked to compare the objects based on the criteria specified by the researcher.
For example, to determine the perception of a consumer: Assume there are five insurance
companies to be evaluated on two attributes namely (1) convenient locality (2) courteous personal
service. Customers’ perception regarding the five insurance companies are as follows:
Software packages such as SPSS, SAS and Excel are used in MDS. Brand positioning
research is one of SPSS’s important features. SAS is business intelligence software. Excel
is also used to a certain extent.
Caselet Case: New Baby Care Product (Perceptual Mapping)
This method is particular about the steps adopted by researchers to assist a company in
the newborn baby care market. The example cited is that of Marico. This map helps
Marico to identify the position of its competitors. Marico introduced a new brand of baby oil
named “Sparsh.” This is an unorthodox entry. Marico was the first to rope in an actress as
brand ambassador in a market worth Rs. 300 crore. A second brand ambassador to speak in
favour was Sonali Bendre, for both baby oil and a bathing bar. The reason for choosing a female
ambassador was to lay emphasis on the concept of motherhood.
Marico is the leader with the world’s largest coconut oil brand, namely Parachute. They are
now switching over to health care products from hair oil and edible oil. Though adult
health care products constitute a Rs. 1,500 crore market, the baby care segment still continues
to be a niche market. The following are some of the obstacles in developing loyalty towards
baby care products.
1. The family may use the same adult hair care product for children as well.
2. Customers repeating the product are hard to find due to the fact that women have
fewer babies in present times.
3. Women stick to the product that their mothers recommend.
4. Herbal versions are still popular in urban, semi-urban and rural areas.
5. There are big players in the field of baby care products.
A few examples are Johnson and Johnson, Dabur, Wipro, etc. The market also has the
Himalaya Drug Company which has established its own herbal baby care division.
The uniqueness of Sparsh lies in the fact that it meets consumer needs by using traditional
ingredients in modern packing. Marico’s main effort is to create brand differentiation.
Two parameters used by Sparsh of Marico are:
1. Price.
2. Value perception.
In a segment where perceived quality governs decision-making, and value as a parameter
is the choice, value proposition is central to Sparsh’s marketing.
In a segment, where price is the consideration, the company has priced it on par with
leaders.
In promotion, the company has used a two-pronged approach:
1. Build brand value.
2. Cut through the clutter.
3. To understand the products which are viewed as substitutes and those that are differentiated.
4. For segmenting the market.
Limitations of MDS:
1. Conceptual problem: The criteria on which the similarities are gauged may vary during an
interview with respondents. They vary depending on what the respondent thinks. A
customer may buy something for himself or he may gift a product to others. In both cases,
the criteria used for selection are different.
2. Preference: Keeps changing from time to time.
3. Complicated computational problem.
When there is a very large number of characteristics to be rated, it becomes very difficult for the
respondent to rank order them. To deal with this, the Q-Sort scaling procedure is used. In this
technique, respondents are asked to sort the various characteristics into convenient groups. A
large number of groups is used in this method, which increases the reliability of the results.
Suppose respondents are given, say, 100 statements. They are asked to place them in eleven piles,
ranging from “most highly agree” to “least highly agree”.
Caution The ideal number of statements for this type of scaling is between sixty and ninety.
The number of statements/objects placed in each pile is pre-specified so that a roughly normal
distribution of objects is obtained.
Non-comparative Scales
Likert Scale
attitude object. Each statement has ‘5 points’, Agree and Disagree on the scale. They are also
called summated scales, because scores of individual items are summated to produce a total
score for the respondent. The Likert Scale consists of two parts – item part and evaluation part.
Item part is usually a statement about a certain product, event or attitude. Evaluation part is a list
of responses like “strongly agree” to “strongly disagree”. The five-point scale is used here. The
numbers like +2, +1, 0, –1, –2 are used. The Likert Scale must contain an equal number of
favourable and unfavourable statements. Now, let us see with an example how the attitude of a
customer is measured with respect to a shopping mall.
Sl. No.  Likert scale items                                    Strongly   Disagree   Neither agree   Agree   Strongly
                                                               disagree              nor disagree            agree
1        Salesmen at the shopping mall are courteous              -          -            -            -        -
2        Shopping mall does not have enough parking space         -          -            -            -        -
3        Prices of items are reasonable.                          -          -            -            -        -
4        Mall has wide range of products to choose                -          -            -            -        -
5        Mall operating hours are inconvenient                    -          -            -            -        -
6        The arrangement of items in the mall is confusing        -          -            -            -        -
The respondent’s overall attitude is measured by summing up his (her) numerical ratings on the
statements making up the scale. Since some statements are favourable and others unfavourable,
one important task must be done before summing up the ratings: the “strongly agree” category
attached to a favourable statement and the “strongly disagree” category attached to an
unfavourable statement must always be assigned the same number, such as +2; likewise, the
opposite ends must both be assigned –2. The success of the Likert Scale depends on how well the
statements are generated. The higher the respondent’s score, the more favourable is the attitude.
For example, if there are two shopping malls, ABC and XYZ, and if the scores using the Likert
Scale are 30 and 60 respectively, we can conclude that the customers’ attitude towards XYZ is
more favourable than towards ABC.
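The scoring rule can be sketched in a few lines. The sketch assumes the +2 to –2 coding mentioned above and treats items 2, 5 and 6 of the shopping mall table as the unfavourable statements, since they are worded negatively; the respondent's answers are invented for illustration.

```python
# Numeric codes for the five response categories.
SCORES = {
    "strongly disagree": -2,
    "disagree": -1,
    "neither agree nor disagree": 0,
    "agree": 1,
    "strongly agree": 2,
}

# Items of the shopping mall table that are worded unfavourably.
UNFAVOURABLE_ITEMS = {2, 5, 6}


def likert_total(answers):
    """Sum the item scores, reversing the sign for unfavourable statements."""
    total = 0
    for item, answer in answers.items():
        score = SCORES[answer]
        if item in UNFAVOURABLE_ITEMS:
            score = -score   # disagreeing with a negative statement is favourable
        total += score
    return total


# Hypothetical answers from one respondent, keyed by item number.
answers = {
    1: "agree",
    2: "strongly disagree",
    3: "neither agree nor disagree",
    4: "strongly agree",
    5: "disagree",
    6: "agree",
}
print(likert_total(answers))  # 5
```

Without the reversal, agreement with "operating hours are inconvenient" would wrongly raise the total, so the sum would no longer measure how favourable the attitude is.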
Semantic Differential Scale
This is very similar to the Likert Scale. It also consists of a number of items to be rated by the
respondents. The essential differences between the Likert and Semantic Differential Scales are
as follows:
It uses “Bipolar” adjectives and phrases. There are no statements in the Semantic Differential
Scale.
Each pair of adjectives is separated by a seven-point scale.
Notes Some pairs have favourable descriptions on the right side, while some have them
on the left side. The reason for the reversal is to have a combination of both favourable
and unfavourable statements.
Task Please rate the five real estate developers mentioned below on the given scales for
(1) Ansal (2) Raheja (3) Purvankara (4) Mantri (5) Salpuria
–3 –2 –1 0 +1 +2 +3
1. Not reliable _ _ _ _ _ _ _ Reliable
2. Expensive _ _ _ _ _ _ _ Not expensive
3. Trustworthy _ _ _ _ _ _ _ Not trustworthy
4. Untimely delivery _ _ _ _ _ _ _ Timely delivery
5. Strong Brand Image _ _ _ _ _ _ _ Poor brand image
The respondents are asked to tick one of the seven categories which describes their
views on attitude. Computation is done in exactly the same way as in the Likert Scale.
Suppose, we are trying to evaluate the packaging of a particular product. The seven point
scale will be as follows:
“ I feel …………..
1. Delighted
2. Pleased
3. Mostly satisfied
4. Equally satisfied and dissatisfied
5. Mostly dissatisfied
6. Unhappy
7. Terrible.
Stapel Scale
This scale is a modified version of the semantic differential scale. It uses only one pole. It is a
ten-point scale with a range of +5 to –5. This scale measures both the direction and intensity of
attitude simultaneously. Unlike the semantic differential scale, which uses bipolar adjectives,
here a single word is used to describe the characteristic of interest. There is no absolute zero
point. This is an interval scale. Respondents are asked to rate the object by selecting a numerical
response category. The main advantage of this scale is that it is simple to administer and also to
construct.
An illustration of the Stapel scale is as follows. You have been associated with M/s XYZ company,
conducting marketing research.
Circle the number you think is most appropriate. If you think the data provided by the research
company is extremely accurate, circle +5 and vice-versa.
Stapel Scale is used in developing profile analysis. Despite the simplicity of its construction and
usage, the semantic differential scale has an edge over it.
Thurstone Scale
This is also known as an equal appearing interval scale. The following are the steps to construct
a Thurstone Scale:
Step 1: Generate a large number of statements relating to the attitude to be measured.
Step 2: These statements (75 to 100) are given to a group of judges, say 20 to 30, who are asked
to classify them according to the degree of favourableness and unfavourableness.
Step 3: The judges sort the statements into 11 piles. The piles vary from “most unfavourable” in
pile 1 to neutral in pile 6 and “most favourable” in pile 11.
Step 4: Study the frequency distribution of ratings for each statement and eliminate those
statements to which different judges have given widely scattered ratings.
Step 5: Select one or two statements from each of the 11 piles for the final scale. List the selected
statements in random order to form the scale.
Step 6: The respondents whose attitudes are to be scaled are given the list of statements and
asked to indicate their agreement or disagreement with each statement. Some may agree with
one statement while some may agree with more than one statement.
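Step 4 can be sketched in Python as follows. The text does not say how "widely scattered" is to be measured, so the use of the interquartile range, the threshold of 2 piles and all names below are assumptions made only for illustration.

```python
from statistics import median, quantiles


def screen_statements(ratings, max_spread=2.0):
    """Keep statements whose judges' pile numbers (1 to 11) are not widely scattered.

    ratings: dict mapping a statement to the list of pile numbers given by the judges.
    Returns a dict mapping each retained statement to its scale value (the median pile).
    """
    kept = {}
    for statement, piles in ratings.items():
        q1, _, q3 = quantiles(piles, n=4)  # quartiles of the judges' ratings
        if q3 - q1 <= max_spread:          # assumed scatter measure: interquartile range
            kept[statement] = median(piles)
    return kept


# Hypothetical ratings by six judges for two statements.
ratings = {
    "A": [10, 11, 10, 11, 10, 10],  # judges largely agree, so the statement is kept
    "B": [1, 6, 11, 3, 9, 5],       # widely scattered, so the statement is dropped
}
print(screen_statements(ratings))
```

The retained scale values can then be used in Step 5 to pick one or two statements from each pile.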
Example:
1. Crime and violence in movies:
   1. All movies with crime and violence should be prohibited by law.
   5. Watching a movie with crime and violence does not interfere with my routine life.
   6. I have no opinion one way or the other, about watching movies with crime and violence.
   8. Most movies with crime and violence are interesting and absorbing.
   10. People learn “how to be safe and protect oneself” by seeing a movie on crime.
   11. Watching crime in a movie does not harm our lifestyle.
Conclusion: A respondent might agree with statements 8, 9 and 10. Such agreement
represents a favourable attitude towards crime and violence. On the contrary, if
items 1, 3, 4 are chosen by respondents, it shows that respondents are unfavourably
disposed towards crime in movies. If the respondent chooses 1, 5 and 11, it could be
interpreted to indicate that he or she is not consistent in his or her attitude about the
subject.
2. Suppose, we are interested in the attitude of certain socio-economic class of
respondents towards savings and investments. The final list of statements would be
as follows:
1. One should live for the present and not the future. So, savings are absolutely
not required.
2. There are many attractions to spend the money saved.
9. Some amount of savings and investments are a must for every individual.
10. One should try to save more so that most of it could be invested.
2. Used to find attitude towards issues like war, religion, language, culture, place of worship,
etc.
Limitations:
1. Limited use in marketing research, since it is time consuming.
4. It is an expensive method.
In 1966, Professor F. Thomas Juster argued that since verbal intentions are simply disguised
probability statements, then why not directly capture the probabilities themselves as measured
by the respondents.
Juster’s 11 point probability scale can be used to produce estimates of the average probability
that a population will do something by a future time. Since what is being measured is a probability,
the mean response estimates the proportion of the population that will perform the behaviour
at issue.
An example is given by the question, “On a scale of 0-10 where 0 indicates no chance and
10 indicates certainty, what is the chance that you will change your primary bank in the next
12 months?” If the average response is then 3.2, this translates to an estimate that 32% of the
population will switch banks.
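The arithmetic of the bank example can be sketched as below. The function name and the ten sample responses are hypothetical; only the rule that the mean response divided by 10 estimates the proportion comes from the text.

```python
def juster_estimate(responses):
    """Estimate the proportion expected to perform the behaviour from 0-10 Juster responses."""
    if not responses:
        raise ValueError("at least one response is required")
    if any(not 0 <= r <= 10 for r in responses):
        raise ValueError("Juster responses must lie between 0 and 10")
    mean_response = sum(responses) / len(responses)
    return mean_response / 10  # each scale point stands for 1 chance in 10


# Hypothetical answers of ten respondents; their mean is 3.2.
responses = [0, 1, 2, 3, 3, 4, 5, 5, 4, 5]
print(f"{juster_estimate(responses):.0%}")  # 32%
```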
The Juster scale in its many applications has been found to be a better predictor
of future purchase behaviour than other intention scales. The distribution of responses, however,
has been found to affect the predictive accuracy of the scale. Not surprisingly, the greater the
variation in responses, the less accurate the predictions.
Studies have shown that purchase probabilities can be over or under estimated by the Juster
scale, but on average, it is the most consistent in accurately predicting actual purchase rates.
There are important issues to be considered in the administration of the Juster Scale that have
been found to contribute to variation in its effectiveness. These include unfamiliarity of the
respondent with new products, training of the administrator and differences in age and education
level of respondents.
The Juster scale has also been used successfully to predict respondent behaviour outside the
typical consumption behaviour realm. Its applications include telephone surveys, fast
moving consumer goods, self-completion questionnaires, services, brands and customer loyalty.
One example of such an extension involved predicting the percentage of a given population of
adults currently at home looking after children, who will take up paid employment in the next
year. At an aggregate level in this example, the Juster Scale mean was 1.9 indicating that a
predicted 19% of respondents would find paid work in the next year. When actual behaviour
was measured in the following year, it was found that indeed, 19% of these respondents were in
paid employment.
Table 9.2: Juster’s 11-point Probability Scale
4 Fair possibility (4 chances in 10)
5 Fairly good possibility (5 chances in 10)
Task A manufacturer of packed bakery items wants to evaluate customer attitudes toward
his product brand. 300 customers who buy this brand filled the questionnaire that was sent
to them. The answers of this questionnaire were converted to scale and the results are as
follows:
1. The average score from the above sample on a 10-item Likert Scale was 65.
2. Average score for a sample on 10-item Semantic Differential Scale was 60.
You are required to indicate whether these customers had a favourable or unfavourable
attitude towards the product.
Self Assessment
2. ............................ scale is used to assess attitude of the respondents group regarding any
issue of public interest.
3. ............................... Scaling is used to study consumer attitudes, particularly with respect to
perceptions and preferences.
4. Thurstone Scale is also known as an ...................................... scale.
There are two criteria to decide whether the scale selected is good or not. They are shown in the
diagram given below:
Figure 9.2: Criteria for the Good Test
Reliability
Test-retest Method
There is an approach called test-retest to check the reliability. In this approach, respondents are
given identical sets of scales at two different points of time under almost identical conditions.
The time interval is between 2 and 5 weeks. The similarity between the two measurements is
determined by calculating the correlation coefficient. The higher the value of the correlation
coefficient, the greater the reliability.
The first disadvantage of this method is that the longer the interval between the first and second
test, the less reliable the scale will appear.
The second disadvantage is that it is difficult to convince the original respondents to take the test
a second time.
The third disadvantage is that the second answer may be influenced by the first.
Assume that an opinion about a hospital's service is asked. If the same question is asked two
weeks later, the respondent's reply will be influenced by what was said the first time.
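The correlation check used in the test-retest method can be sketched as follows. The text only says "correlation coefficient"; the use of Pearson's coefficient, the function name and the scores are assumptions for illustration.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between scores from the first and the second administration."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("both administrations need the same respondents, at least two")
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    # Note: undefined (division by zero) if either set of scores has no variation.
    return cov / sqrt(var_a * var_b)


# Hypothetical total scores of five respondents, tested three weeks apart.
test_scores = [32, 41, 27, 38, 45]
retest_scores = [30, 43, 29, 36, 44]
print(round(pearson_r(test_scores, retest_scores), 3))
```

A value close to +1 would indicate that the scale gives consistent results over time.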
In this method, two “equivalent” scales are used to obtain consistent results. The researcher
administers one scale to the respondent and, two weeks later, administers another scale,
equivalent to the first, to the same respondent.
The greatest problem of this method is to construct two scales that appear to be different but have
a similar effect. This alternative forms test is similar to the test-retest method, except that the
test-retest method uses the same measuring instrument both times.
Internal Consistency
In this method, two or more measurements of the same concept are taken at the same time and then
compared to see if they agree with each other. Suppose we use a Likert scale, offering choices from
strongly agree to strongly disagree, to determine consumer attitude towards the service rendered
by Big Bazar. Suppose the researcher prepares a four-statement scale to measure this:
ii. Controls used for conducting the experiment must be good. For example, (a) the measuring
device must be accurate; (b) the researcher who administers the scale must be trained to avoid
biasing respondents.
iii. Items to be measured must be stated clearly.
Validity
Validity centres on the question “Are we measuring what we think we are measuring?” The
success of a scale lies in measuring what is intended to be measured. Of the
two attributes of scaling, validity is the more important.
There are several methods to check the validity of the scale used for measurement.
1. Construct Validity: A sales manager believes that there is a clear relation between a person's
job satisfaction, the degree to which the person is an extrovert, and the work
performance of his sales force. Therefore, those who enjoy high job satisfaction and have
extrovert personalities should exhibit high performance. If they do not, then we can
question the construct validity of the measure.
2. Content Validity: A researcher should define the problem clearly, identify the item to be
measured and evolve a suitable scale for this purpose. Despite these steps, the scale may be
criticised for lacking content validity. Content validity is also known as face validity. An
example is the introduction of a new packaged food that represents a major change in taste.
Thousands of consumers may be asked to taste the new packaged food. Overwhelmingly, people
may say that they liked the new flavour. Despite such a favourable reaction, the product may
still fail when introduced on a commercial scale. So, what went wrong? Perhaps a crucial question
was omitted. The people were asked if they liked the new packaged food, to which the
majority answered “yes”, but the same respondents were not asked, “Are you willing to
give up the product which you are consuming currently?” In this case, the problem was
not clearly identified and the item to be ‘measured’ was left out.
3. Predictive Validity: This pertains to “How well can a researcher predict future
performance from the knowledge of the attitude score?”
Example: An opinion questionnaire which is the basis for forecasting the demand for a
product has predictive validity. The procedure for predictive validity is to first measure the
attitude and then predict the future behaviour. This is followed by the measurement of the
actual behaviour at an appropriate time in the future. Compare the two results (predicted and
actual). If the two scores are closely associated, then the scale is said to have predictive validity.
4. Criterion Validity:
variables selected as meaningful criteria i.e., predicted and actual behavior should
be similar
(b) Addresses the question of what construct or characteristic the scale is actually
measuring
5. Convergent Validity: Extent to which scale correlates positively with other measures of
the same construct.
6. Discriminant Validity: Extent to which a measure does not correlate with other constructs
from which it is supposed to differ.
7. Nomological Validity: Extent to which scale correlates in theoretically predicted ways
with measures of different but related constructs.
Task Suppose a cosmetic manufacturing company wants to ascertain the perception of its
customers towards a product. Take a 7-item scale to measure the perception of
the product using Likert and Semantic Differential scales. The following are some of the
likely adjectives which are used in a Semantic Differential scale:
Unfavourable X Favourable
Soft X Hard
Organised X Disorganised
Quick X Slow
Formal X Informal
Pleasure X Displeasure
Complex X Simple
Cheap X Costly
Pleasant X Unpleasant
Fragrant X Less fragrant
Dominating X Submissive
Rational X Emotional
Self Assessment
8. Those who enjoy high job satisfaction, and have extrovert personalities should exhibit
....................performance.
12. In …………method two or more measurements of the same concept are taken at the same
time and then compared to see if they agree with each other.
Processing data is very important in market research. After collecting the data, the next task of
the researcher is to analyse and interpret the data. The purpose of analysis is to draw conclusions.
There are two parts in processing the data:
(1) Data analysis
(2) Interpretation of data
Analysis of the data involves organising the data in a particular manner. Interpretation of data
is a method for deriving conclusions from the data analysed. Analysis of data is not complete
unless it is interpreted.
2. Editing
3. Coding
4. Tabulation
5. Summarising the data
Data collection is a significant part of market research. Even more significant is to filter out the
relevant data from the mass of data collected. Data remain in raw form until they are
processed and analysed.
Primary data collected by surveys and observations by field investigations are hastily entered
into questionnaires. Due to the pressure of interviewing, the researcher has to write down the
responses immediately. Many times this may not be systematic. The information so collected by
field staff is called raw data.
The information collected may be illegible, incomplete and inaccurate to a considerable extent.
Also the information collected will be scattered in several data collection formats. The data
lying in such a crude form are not ready for analysis. Keeping this in mind, the researcher must
take some measures to organise the data so that it can be analysed.
The steps required for this purpose are (a) editing, (b) coding and (c) tabulating.
Editing
The main purpose of editing is to eliminate errors and confusion. Editing involves inspection
and correction of each questionnaire. The main role of editing is to identify omissions,
ambiguities and errors in response.
Editing thus means the activity of inspecting, correcting and modifying the collected data.
This can be done in two stages (a) Field editing (b) Office editing.
(a) Field editing: The objective of field editing is to make sure that the proper procedure is
followed in selecting the respondents, interviewing them and recording their responses. In field
editing, speed is the main criterion, since editing should be done while the study is still
in progress. The main problems faced in field editing are:
(1) Inappropriate respondents
(2) Incomplete interviews
(3) Improper understanding
(4) Lack of consistency
(5) Legibility
Examples:
2. Incomplete interview: All questions are to be answered. There should not be any
‘blanks’. Blanks can have different meanings, like (a) No answer (b) Refusal to
answer (c) Question not applicable (d) Interviewer by oversight did not record. The
reason for no answer could be that the respondent really does not know the answer.
Sometimes, the respondent does not answer, perhaps because of the sensitive or
emotional aspect of the question.
3. Lack of understanding: The interviewer, in a hurry, may have recorded some
abbreviated answer. Later, at the end of the day, he or she cannot figure out what it
meant.
4. Consistency: The earlier part of the questionnaire indicates that there are no children
and in the later part the age of children is mentioned.
5. Legibility: If what is said is not clear, the interviewer must clarify the same on the
spot.
6. Fictitious interview: This amounts to cheating by the interviewer. Here, the
questionnaires are filled without conducting interviews. A surprise check by
superiors is one way to minimise this.
(b) Office editing: Office editing is more thorough than field editing. The job of an office
editor is more difficult than that of the field editor. In case of a mail questionnaire there
are no other methods of cross-verification, except to conduct office audit. Examples as
below illustrate the kind of problems faced by office editors. Problems of consistency,
rapport with respondents, etc., are some of the issues which get highlighted during office
editing.
Examples:
1. A respondent indicated that he doesn’t drink coffee, but when questioned about his
favourite brand, he replied “Bru”.
2. A rating scale given to a respondent states Semantic Differential Scale with 10 items.
The respondent has ticked “strongly agree” to the 10 items.
3. “What is the most expensive purchase you have made in the last one year?” is the
question. Two respondents answer (1) LCD TV and (2) Trip to USA.
In example-1 above, there is inconsistency. There are two possibilities which an editor
needs to consider. (1) Was the respondent lying? (2) Did the interviewer record wrongly?
The editor has to look into the answers to other questions on beverages, and interpret the
right answer.
In example-2 above, it is to be remembered that the Semantic Differential scale consists of
items which have alternately positive and negative connotations. If a respondent has marked
both positive and negative items as ‘agreed’, the only conclusion the editor can draw is that the
respondent filled in the questionnaire without understanding it. The editor will have to discard
this questionnaire, since there are no alternatives.
In example-3 above, both the respondents have answered correctly. The frame of reference
is different. The main problem is that one of them is a product, whereas the other is a service.
While coding the data, the two answers should be put under two different categories.
Coding
Coding refers to those activities which help in transforming edited questionnaires into a form
that is ready for analysis. Coding speeds up the tabulation while editing eliminates errors.
Coding involves assigning numbers or other symbols to answers so that the responses can be
grouped into a limited number of classes or categories.
2. Mutual exclusivity.
3. Single Dimension.
1. Establishment of appropriate category:
Example: Suppose the researcher is analysing the “inconvenience” that a car owner is
facing with his present model. Therefore, the factor chosen for coding may be “inconvenience”.
Under this there could be 4 types (1) Inconvenience in entering the backseat (2) Inconvenience
due to insufficient legroom (3) Inconvenience with respect to the interior (4) Inconvenience in
door locking, and opening the dickey. Now the researcher may classify these four answers
based on internal inconvenience and other inconveniences referring to the exterior. Each is
assigned a different number for the purpose of codification.
2. Mutually exclusive: This is important because the answer given by the respondent should
be placed under one category. Example: Occupation of an individual may be responded to
as (1) Professional (2) Sales (3) Executive (4) Manager etc.
Sometimes, respondents might think that they belong to more than one category. For instance,
a salesperson may be doing a sales job and should therefore be placed under the sales
category. He may also supervise the work of other sales executive(s), in which case he is
performing a managerial function. Viewed in this context, he should be placed under the
managerial category, which has a different code. Since he can be put under only one category,
it has to be decided which one. One way of deciding this could be to analyse “in which of the
two functions does he spend most time?”
Yet another scenario involves a salesman who is currently unemployed. Under the
column of ‘occupation’, he will tick sales, while under the current employment column, he
will mark unemployed. How does one codify this? Under which category should he be placed?
One of the solutions is to have a classification, such as employed salesman and unemployed
salesman, to represent the two separate categories.
Business B
Retired R
Technical T
Consultant C
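Coding with the occupation codes listed above can be sketched in Python. Only the four codes that appear in the text are used; the function name and the sample answers are hypothetical. The sketch rejects any answer that has no code, which is one simple way of checking that the categories are exhaustive.

```python
# Codebook taken from the four occupation codes shown in the text.
OCCUPATION_CODES = {"Business": "B", "Retired": "R", "Technical": "T", "Consultant": "C"}


def code_responses(answers, codebook):
    """Replace each edited answer with its code so that it is ready for tabulation."""
    coded = []
    for answer in answers:
        if answer not in codebook:
            # An uncoded answer means the categories are not exhaustive.
            raise KeyError(f"no code defined for answer: {answer!r}")
        coded.append(codebook[answer])
    return coded


# Hypothetical edited answers from five questionnaires.
answers = ["Business", "Retired", "Business", "Technical", "Consultant"]
print(code_responses(answers, OCCUPATION_CODES))  # ['B', 'R', 'B', 'T', 'C']
```

Because each answer maps to exactly one code, the categories in this sketch are also mutually exclusive.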
Tabulation
Tabulation refers to counting the number of cases that fall into various categories. The results
are summarized in the form of statistical tables. The raw data is divided into groups and
sub-group(s). The counting and placing of data in a particular group and sub-group are done. The
tabulation involves:
(1) Sorting and counting
(2) Summarising of data
Tabulation may be of two types (1) simple tabulation (2) cross tabulation. In simple tabulation,
a single variable is counted. Cross-tabulation includes two or more variables, which are treated
simultaneously. Tabulation can be done entirely by hand or by machine, or by both hand and
machine.
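The difference between simple tabulation and cross tabulation can be sketched as follows. The records, the variable names and the function names are hypothetical; the sketch only counts cases, as described above.

```python
from collections import Counter


def simple_tabulation(records, variable):
    """Count the cases for each value of a single variable."""
    return Counter(record[variable] for record in records)


def cross_tabulation(records, row_variable, column_variable):
    """Count the cases for each combination of two variables treated simultaneously."""
    return Counter((record[row_variable], record[column_variable]) for record in records)


# Hypothetical coded responses from five questionnaires.
records = [
    {"occupation": "B", "owns_car": "Yes"},
    {"occupation": "B", "owns_car": "No"},
    {"occupation": "R", "owns_car": "Yes"},
    {"occupation": "T", "owns_car": "Yes"},
    {"occupation": "B", "owns_car": "Yes"},
]
print(simple_tabulation(records, "occupation"))
print(cross_tabulation(records, "occupation", "owns_car"))
```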
The form in which tabulation is to be done is decided by taking into account (1) the purpose of
study and (2) the use of statistical tools e.g. mean, mode, standard deviation etc. Improper
tabulation may create difficulties in the use of these tools.
TABLE No.
Title – Number of children per family
Caption
Sub Heading Total
Body
Foot note
The table must have a clear and brief title. The head note, usually the measurement unit, is
placed at the top of the table in the right hand corner in a bracket.
Stub indicates the row title or the row headings and is placed in the left-hand column. Caption
indicates the column headings.
Sub-entries are the sub-groups of the stub. The body of the table gives full information of the
frequency.
Before taking up summarising, the data should be classified into (1) relevant data and
(2) irrelevant data. During the field study, the researcher collects a lot of data which he may think
would be of use. Summarising the data includes:
Classification of Data
(a) Number of groups: The number of groups should be sufficient to record all possible data.
The classification should not be too narrow. If it is too narrow, there can be an overlap.
Example: Suppose a researcher is conducting a survey on “Why does the current owner dislike
the car?” The car owner may indicate the following:
(4) Mileage
Now the above data can be classified into categories such as (1) Discomfort
(2) Expense (3) Pride (4) Safety (5) Design of the car.
(b) Width of the class interval: Class intervals should be uniform, that is, of equal
width. This will provide consistency in the data distribution.
(c) Exclusive categories: The classification should be done in such a way that the response can
be placed in only one category.
Example: Problem of leg room is the answer by respondent. This should be placed either
under discomfort or design, but not both.
(d) Exhaustive categories: This should be made to include all responses including “Don’t
Know” answers. Sometimes this will influence the ultimate answer to the research problem.
Frequency Distribution: A frequency distribution simply reports the number of responses that
each question receives. It organises the data into classes or groups and
shows the number of observations that fall into each class.
Example:
4000-6999 100
7000-9999 122
10000-12999 140
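Building such a frequency distribution from raw values can be sketched as below. The three class limits are those shown in the example; the function name and the seven income values are hypothetical.

```python
def frequency_distribution(values, classes):
    """Count how many values fall in each (lower, upper) class, both limits inclusive."""
    counts = {cls: 0 for cls in classes}
    for value in values:
        for lower, upper in classes:
            if lower <= value <= upper:
                counts[(lower, upper)] += 1
                break  # classes are mutually exclusive, so stop at the first match
    return counts


classes = [(4000, 6999), (7000, 9999), (10000, 12999)]
incomes = [4500, 6999, 7000, 8200, 12000, 10000, 5000]  # hypothetical raw data
for (lower, upper), count in frequency_distribution(incomes, classes).items():
    print(f"{lower}-{upper} {count}")
```

Values outside every class are simply not counted here; an exhaustive classification would add further classes for them.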
In marketing research, central value or tendency plays a very important role. The researcher
may be interested in the average sales per shop, average consumption per month, etc. Population
parameters can be estimated with the help of a simple average: the average of a sample may be
taken as an estimate of the population parameter. For example, if the average income of the
population is to be computed, the researcher may select a sample, collect data on family income
and calculate the relevant statistics, which will be representative of the population.
The total purchasing power of the community can be estimated from the sample average. If the
sample is stratified, the purchasing power of each income class may also be estimated. The median
figure will reveal that half the population has more income than the median income, and the
other half has less income than the median income. The mode will reveal the most common
value. Based on this, sellers can devise their strategy to sell the product.
The three most common ways to measure centrality or central tendency are the mode, median
and mean.
Mode
The mode is the value or item that occurs most often. When data is categorized in a
frequency distribution, it is very easy to identify the mode, since the category in which the mode
lies has the greatest number of observations.
Example: Data regarding household income of 300 people as tabulated by the researcher.
$$M_0 = L_{M_0} + \frac{D_1}{D_1 + D_2} \times i$$
D1 = Difference between the frequency of modal class and the class immediately
preceding the modal class.
D2 = Difference between the frequency of the modal class and the class immediately
succeeding the modal class.
$$M_0 = 10{,}000 + \frac{95}{95 + 75} \times 5{,}000 = 10{,}000 + \frac{95}{170} \times 5{,}000$$
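The computation above can be checked with a short Python sketch. The function and argument names are assumptions; the lower limit of the modal class (10,000), the class width (5,000) and the differences 95 and 75 are the figures that appear in the worked example.

```python
def grouped_mode(lower_limit, d1, d2, width):
    """Mode of grouped data: lower limit of modal class + D1 / (D1 + D2) * class width."""
    if d1 + d2 == 0:
        raise ValueError("D1 + D2 must not be zero")
    return lower_limit + d1 / (d1 + d2) * width


print(round(grouped_mode(10_000, 95, 75, 5_000), 2))  # 12794.12
```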
Median
The median is the middle value of the data: half of the observations lie below it and half above.
It is necessary to arrange the data in ascending or descending order before selecting the median
value. For ungrouped data with an odd number of observations, the median is the middle value.
For an even number of observations, the median is halfway between the two central values.
$$M_d = L_{M_d} + \frac{\frac{N}{2} - CF}{f_{M_d}} \times i$$
CF = Cumulative frequency for the class just below the median class.
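A minimal sketch of the grouped median formula follows. The argument names and the figures in the example call are hypothetical, since the text gives no worked example for the median; the symbols are read as lower limit of the median class, total frequency N, cumulative frequency CF before the median class, frequency of the median class and class width i.

```python
def grouped_median(lower_limit, n, cum_freq_before, class_freq, width):
    """Median of grouped data: L + (N/2 - CF) / f * i."""
    if class_freq <= 0:
        raise ValueError("the median class must contain at least one observation")
    return lower_limit + (n / 2 - cum_freq_before) / class_freq * width


# Hypothetical figures: N = 300, median class starts at 7000 with frequency 122,
# 100 observations lie below it and the class width is 3000.
print(round(grouped_median(7000, 300, 100, 122, 3000), 2))  # 8229.51
```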
Mean
In grouped data, the midpoint of each category is multiplied by the number of
observations in that category. These products are summed and the total is divided by the total
number of observations.
$$\bar{X} = \frac{\sum f x}{\sum f}$$
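The grouped mean can be sketched in the same way. The function name and the three classes in the example are hypothetical.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum of (frequency x midpoint) divided by the sum of frequencies."""
    if len(midpoints) != len(frequencies):
        raise ValueError("each class needs exactly one midpoint and one frequency")
    total = sum(frequencies)
    if total == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for f, x in zip(frequencies, midpoints)) / total


# Hypothetical classes with midpoints 10, 20, 30 and frequencies 1, 2, 1.
print(grouped_mean([10, 20, 30], [1, 2, 1]))  # (10 + 40 + 30) / 4 = 20.0
```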
Example: Two students X, Y attend 3 class tests and the scores are as follows:
Y - has deteriorated
Measures of Dispersion
Dispersion indicates the degree of scatter of the observations. Let curves A and B represent two
frequency distributions. Observe that A and B have the same mean, but curve A has less variability
than B.
If we measure only the mean of these two distributions, we will miss an important difference
between A and B. To increase our understanding of the pattern of the data, we must also measure
its dispersion.
Range: It is the difference between the highest and lowest observed values,
i.e., range = H – L, where H = Highest, L = Lowest.
Note:
1. Range is the crudest measure of dispersion.
2. $\frac{H - L}{H + L}$ is called the coefficient of range.
Semi-Inter Quartile Range (Quartile deviation): The semi-interquartile range Q is given by
$$Q = \frac{Q_3 - Q_1}{2}$$
Note:
1. $\frac{Q_3 - Q_1}{Q_3 + Q_1}$ is called the coefficient of quartile deviation.
2. Quartile deviation is not a true measure of dispersion but only a distance of scale.
Mean Deviation (MD): If A is any average, then the mean deviation about A is given by:
$$MD(A) = \frac{\sum f_i |x_i - A|}{N}$$
Note:
3. $\frac{MD(A)}{A}$ is called the coefficient of mean deviation.
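The range, quartile deviation and mean deviation can be sketched for ungrouped data as follows. The function names and the data are hypothetical. Quartiles are taken from Python's `statistics.quantiles` with its default method, which is an assumption: textbooks use slightly different quartile conventions and may give slightly different values.

```python
from statistics import mean, quantiles


def range_and_coefficient(values):
    """Return (H - L) and the coefficient of range (H - L) / (H + L)."""
    highest, lowest = max(values), min(values)
    return highest - lowest, (highest - lowest) / (highest + lowest)


def quartile_deviation(values):
    """Semi-interquartile range Q = (Q3 - Q1) / 2."""
    q1, _, q3 = quantiles(values, n=4)
    return (q3 - q1) / 2


def mean_deviation(values, about=None):
    """Mean of absolute deviations about A (the arithmetic mean when A is not given)."""
    a = mean(values) if about is None else about
    return sum(abs(x - a) for x in values) / len(values)


data = [2, 4, 6, 8, 10]  # hypothetical observations
print(range_and_coefficient(data))  # range is 8, coefficient is 8 / 12
print(quartile_deviation(data))     # 3.0 with the default quartile method
print(mean_deviation(data))         # (4 + 2 + 0 + 2 + 4) / 5 = 2.4
```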
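Variance ($\sigma^2$): A measure of the average squared distance between the mean and each term in
the population.
$$\sigma^2 = \frac{1}{N} \sum f_i (x_i - \bar{x})^2$$
Standard deviation ($\sigma$) is the positive square root of the variance:
$$\sigma = \sqrt{\frac{1}{N} \sum f_i (x_i - \bar{x})^2}$$
$$\sigma^2 = \frac{1}{N} \sum f_i x_i^2 - (\bar{x})^2$$
Note: Combined variance of two sets of data of $N_1$ and $N_2$ items with means $\bar{x}_1$ and $\bar{x}_2$ and
standard deviations $\sigma_1$ and $\sigma_2$ respectively is obtained by:
$$\sigma^2 = \frac{N_1 \sigma_1^2 + N_2 \sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}$$
where $d_1 = \bar{x}_1 - \bar{x}$, $d_2 = \bar{x}_2 - \bar{x}$
$$\text{and } \bar{x} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2}$$
Sample variance ($s^2$): Let $x_1, x_2, x_3, \ldots, x_n$ represent a sample with mean $\bar{x}$.
Then sample variance $s^2$ is given by:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2}{n - 1} - \frac{n(\bar{x})^2}{n - 1}$$
Note: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2}{n - 1} - \frac{n(\bar{x})^2}{n - 1}}$ is called the sample standard deviation.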
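The variance formulas can be checked numerically with a short sketch. The function names and the eight observations are hypothetical; each observation is taken to have frequency 1. The last lines verify that the shortcut form (mean of squares minus square of the mean) gives the same population variance.

```python
from math import sqrt


def population_variance(values):
    """Average squared distance from the mean, dividing by N."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared distances from the sample mean, dividing by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample variance needs at least two observations")
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5
print(population_variance(data))        # 32 / 8 = 4.0
print(sqrt(population_variance(data)))  # standard deviation = 2.0
print(round(sample_variance(data), 3))  # 32 / 7 = 4.571

# Shortcut form: mean of squares minus square of the mean.
shortcut = sum(x * x for x in data) / len(data) - (sum(data) / len(data)) ** 2
print(shortcut)                         # 232 / 8 - 25 = 4.0
```

Dividing by n - 1 instead of N makes the sample variance slightly larger than the population formula applied to the same data.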
Task You have collected data on employees of a large organisation in a metro. You
analyse the data by the type of work, education level, and whether the employee belongs to an
urban or rural area. The results are as below. How would you interpret them?
Caselet
Self Assessment
13. ……………of data is a method for deriving conclusions from the data analysed.
14. ……………involves inspection and correction of each questionnaire.
9.5 Summary
These scales show the extent of likes/dislikes, agreement/disagreement or belief towards
an object.
There are four types of scales used in market research namely paired comparison, Likert,
semantic differential and Thurstone scale.
Likert is a five point scale whereas semantic differential scale is a seven point scale.
Bipolar adjectives are used in semantic differential scale.
Thurstone scale is used to assess attitude of the respondents group regarding any issue of
public interest.
Validity and reliability of the scale are verified before the scale is used for measurement.
If repeated measurement gives the same result, then the scale is said to be reliable.
Validity refers to “Does the scale measure what it intends to measure?”
There are several methods to check validity. Which type of validity is required depends
on “What is being measured”.
Processing data is very important in market research. After collecting the data, the next
task of the researcher is to analyse and interpret the data.
9.6 Keywords
Constant Sum Scale: Constant sum scale is one of the methods of comparative scaling. In this
method, the respondent is instructed to allocate some constant sum (points) to various features
given, based on the importance of attribute to the respondent.
Interval Scale: Interval scale is more powerful than the nominal and ordinal scales. The distance
given on the scale represents equal distance on the property being measured.
Likert scale: This consists of a series of statements concerning an attitude object. Each statement
has ‘5 points’, Agree and Disagree on the scale.
Multi-Dimensional Scaling: This is used to study consumer attitudes, particularly with respect
to perceptions and preferences.
Nominal Scale: In this scale, numbers are used to identify the objects.
Ordinal Scale (Ranking scale): The Ordinal scale is used for ranking in most market research
studies. Ordinal scales are used to ascertain the consumer perceptions, preferences, etc.
Rank Order Scale: In this method, respondents are required to rank more than two objects or
alternatives based on some criteria.
Ratio Scale: Ratio scale is a special kind of interval scale that has a meaningful zero point. With
this scale, length, weight or distance can be measured.
Scaling: The generation of a continuum upon which measured objects are located.
9.7 Review Questions
12. What are the different types, sources and characteristics of hypothesis?
13. The highway patrol police on NH4 want to find out how fast cars and trucks travel on
this highway stretch. To obtain this information, a speed recording device was installed at an
appropriate place on the highway. The speed was recorded for about three hours and the
following data was recorded.
55 61 60 68
52 50 69 60
65 66 59 62
2001 - 16
2002 - 17
2003 - 18
2004 - 20
attitudes towards a product. For this purpose, the company wants the customer to complete
a questionnaire which indicates several product attributes. It was decided by the company
that only five attributes that affect the sale of the product would be considered for analysis.
The attributes were appearance, quality of the picture, sound, after-sales service and price.
The following scales are used to assess the product:
                                Strongly   Disagree   Neither agree   Agree   Strongly
                                disagree              nor disagree            agree
1. Appearance is good             ___        ___          ___          ___       X
2. Price is reasonable            ___        ___          ___           X       ___
3. After-sales service is good    ___        ___          ___           X       ___
4. Sound quality is excellent     ___         X           ___          ___      ___
5. Picture is sharp               ___        ___          ___          ___       X
                                   1          2            3            4        5
Suppose the customer has inspected the product and the response is as shown in the table
above:
1. Interval 2. Thurstone
7. Opinion 8. High
17. Tabulation
Hague & Morgan, Marketing Research in Practice, Kogan Page.
Paneerselvam, R., Research Methods, PHI.
www.scribd.com/doc
www.soas.ac.uk
www.web-source.net
CONTENTS
Objectives
Introduction
10.1 Correlation
10.1.1 Scatter Diagram
10.2 Types of Correlation
10.2.1 Positive Correlation
10.2.2 Negative Correlation
10.2.3 No Correlation
10.3 Partial Correlation
10.4 Multiple Correlations
10.5 Summary
10.6 Keywords
Objectives
Introduction
Marketing research data analysis is a blend of statistics, psychology, information technology
and art. The professional marketing researcher is not expected to have a complete understanding
of all the techniques of data analysis, but is expected to manage the blending of these disciplines
in order to develop and organize a complete analysis of the data that satisfies the information
requirements of the project. Managers of today often need to understand and make decisions
depending upon the numerical data on two or more variables simultaneously. For example,
(i) Cost of production and volume of production,
(ii) Expenditure on Advertising and Sales of a Product,
(iii) Number of Vehicles on Road and Number of Accidents,
(iv) Number of Colleges offering MBA Programme and number of MBA Graduates,
(v) Number of Counters at an e - Seva Kendra and the waiting time of customers
(vi) Number of Telephone calls and Rate per Call and so on.
In other words, one of the basic functions of a manager is to understand the relationship between
these variables and to make appropriate decisions with the future in mind, which is known as
‘Forecasting or Prediction’. The part that deals with understanding the behaviour of variables is
Correlation and the part that deals with forecasting is Regression.
10.1 Correlation
Correlation is a statistical tool for studying the relationship between two or more variables and
correlation analysis involves various methods and techniques used for studying and measuring
the extent of relationship between the variables. Two variables are said to be correlated if a
change in one variable is accompanied by a corresponding change in the other. Various experts have
defined correlation in their own words and their definitions, broadly speaking, imply that
correlation is the degree of association between two or more variables. Some important
definitions of correlation are given below:
1. “If two or more quantities vary in sympathy so that movements in one tend to be
accompanied by corresponding movements in other(s) then they are said to be correlated.”
- L.R. Connor
2. “Correlation is an analysis of covariation between two or more variables.”
- A.M. Tuttle
3. “When the relationship is of a quantitative nature, the appropriate statistical tool for
discovering and measuring the relationship and expressing it in a brief formula is known
as correlation.”
- Ya Lun Chou
Correlation Coefficient: It is a numerical measure of the degree of association between two or
more variables.
2. The two variables may act upon each other: Cause and effect relation exists in this
case also but it may be very difficult to find out which of the two variables is
independent. For example, if we have data on price of wheat and its cost of production,
the correlation between them may be very high because higher price of wheat may
attract farmers to produce more wheat and more production of wheat may mean
higher cost of production, assuming that it is an increasing cost industry. Further,
the higher cost of production may in turn raise the price of wheat. For the purpose
of determining a relationship between the two variables in such situations, we can
take any one of them as independent variable.
3. The two variables may be acted upon by the outside influences: In this case we might
get a high value of correlation between the two variables, however, apparently no
cause and effect type relation seems to exist between them. For example, the demands
of the two commodities, say X and Y, may be positively correlated because the
incomes of the consumers are rising. Coefficient of correlation obtained in such a
situation is called a spurious or nonsense correlation.
4. A high value of the correlation coefficient may be obtained due to sheer coincidence
(or pure chance): This is another situation of spurious correlation. Given the data on
any two variables, one may obtain a high value of correlation coefficient when in
fact they do not have any relationship. For example, a high value of correlation
coefficient may be obtained between the size of shoe and the income of persons of a
locality.
10.1.1 Scatter Diagram
Let the bivariate data be denoted by (Xi, Yi), where i = 1, 2 ...... n. In order to have some idea about
the extent of association between variables X and Y, each pair (Xi, Yi), i = 1, 2......n, is plotted on
a graph. The diagram, thus obtained, is called a Scatter Diagram.
Each pair of values (Xi, Yi) is denoted by a point on the graph. The set of such points (also known
as dots of the diagram) may cluster around a straight line or a curve or may not show any
tendency of association. Various possible situations are shown with the help of following
diagrams:
Figure 10.1: Scatter Diagram
If all the points or dots lie exactly on a straight line or a curve, the association between the
variables is said to be perfect. This is shown below:
Figure 10.2
A scatter diagram of the data helps in having a visual idea about the nature of association
between two variables. If the points cluster along a straight line, the association between the
variables is linear. Further, if the points cluster along a curve, the corresponding association is
non-linear or curvilinear. Finally, if the points neither cluster along a straight line nor along a
curve, there is absence of any association between the variables.
It is also obvious from the above figure that when low (high) values of X are associated with low
(high) values of Y, the association between them is said to be positive. Contrary to this, when low
(high) values of X are associated with high (low) values of Y, the association between them is
said to be negative.
10.2 Types of Correlation
Broadly speaking, there are four types of Correlation, namely, (a) Positive correlation,
(b) Negative correlation, (c) Linear correlation and (d) Non-Linear Correlation.
10.2.1 Positive Correlation
If the values of two variables deviate in the same direction, i.e., if an increase in the values of one
variable results, on average, in a corresponding increase in the values of the other variable or
if a decrease in the values of one variable results, on average, in a corresponding decrease in
the values of the other variable, the corresponding correlation is said to be positive or direct.
Examples:
(i) Sales revenue of a product and expenditure on Advertising.
(ii) Amount of rain fall and yield of a crop (up to a point).
(iii) Price of a commodity and quantity of supply of a commodity.
(iv) Height of the Parent and the height of the Child.
(v) Number of patients admitted into a Hospital and Revenue of the Hospital.
(vi) Number of workers and output of a factory.
If the variables X and Y are perfectly positively related to each other then, we get a graph as
shown in Figure 10.3.
Figure 10.3: Perfect Positive Correlation (R = 1)
If the variables X and Y are related to each other with a very high degree of positive relationship
then we can notice a graph as in Figure 10.4.
If the variables X and Y are related to each other with a very low degree of positive relationship
then we can notice a graph as in Figure 10.5.
10.2.2 Negative Correlation
Correlation is said to be negative or inverse if the variables deviate in opposite directions, i.e.,
if an increase (decrease) in the values of one variable results, on average, in a corresponding
decrease (increase) in the values of the other variable.
Examples:
1. Price and demand of a commodity
If the variables X and Y are perfectly negatively related to each other then, we get a graph as
shown in Figure 10.6.
Figure 10.6: Perfect Negative Correlation (R = - 1)
If the variables X and Y are related to each other with a very high degree of negative relationship
then we can notice a graph as in Figure 10.7.
If the variables X and Y are related to each other with a very low degree of negative relationship
then we can notice a graph as in Figure 10.8.
Figure 10.8: Very Low Negative Correlation (R = Near to 0 but Negative)
10.2.3 No Correlation
If the scatter diagram shows points that are widely scattered and exhibit no trend or
pattern, we can say that there is no correlation between the variables. Refer to Figure 10.9.
Figure 10.9: No Correlation (r = 0)
Linear Correlation
Two variables are said to be linearly related if corresponding to a unit change in one variable
there is a constant change in the other variable over the entire range of the values.
If two variables are related linearly, then we can express the relationship as
Y = a+bX
where ‘a’ is called as the “intercept” (If X= 0, then Y= a) and ‘b’ is called as the “rate of change” or
slope.
If we plot the values of X and the corresponding values of Y on a graph, then the graph would be
a straight line as shown in Figure 10.10.
Example:
X    1    2    3    4    5
Y    6    8   10   12   14
For a unit change in the value of X, a constant change of 2 units in the value of Y can be noticed.
The above can be expressed as: Y = 4 + 2X
Figure 10.10
If corresponding to a unit change in one variable, the other variable does not change at a
constant rate, but changes at varying rates, then the relationship between the two variables is said
to be non-linear or curvilinear as shown in Figure 10.11. In this case, if the data are plotted on the
graph, we do not get a straight line. Mathematically, the correlation is non-linear if the
slope of the plotted curve is not constant. Data relating to Economics, Social Science and Business
... only.
Example:
X -6 -4 -2 0 2 4 6
Y 36 16 4 0 4 16 36
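To measure the degree of association between two variables X and Y, Karl Pearson defined the
Coefficient of Correlation ‘r’ as below. In this method, the coefficient of correlation is calculated
as the ratio of the covariance of the two variables to the product of their standard deviations,
i.e. to the square root of the product of their variances.
$$ r = \frac{\mathrm{Cov}(X_i, Y_i)}{\sqrt{V(X_i)\, V(Y_i)}} $$
where
$$ \mathrm{Cov}(X_i, Y_i) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n}, \quad V(X_i) = \frac{\sum (X_i - \bar{X})^2}{n} \quad \text{and} \quad V(Y_i) = \frac{\sum (Y_i - \bar{Y})^2}{n} $$
so that
$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} $$
where $x = X_i - \bar{X}$ and $y = Y_i - \bar{Y}$.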
Pearson’s Method – Direct Method
Example 1: For the following data on two variables Xi and Yi, we find the deviations
$x = X_i - \bar{X}$ and $y = Y_i - \bar{Y}$, their squares and their products.
Xi     Yi     x     y     x²    y²    xy
2      7     -2    -4     4     16    8
3      9     -1    -2     1     4     2
4      10     0    -1     0     1     0
5      14     1     3     1     9     3
6      15     2     4     4     16    8
Total: $\sum X_i = 20$, $\sum Y_i = 55$, $\sum x^2 = 10$, $\sum y^2 = 46$, $\sum xy = 21$
$$ \bar{X} = \frac{\sum X_i}{n} = \frac{20}{5} = 4, \quad \bar{Y} = \frac{\sum Y_i}{n} = \frac{55}{5} = 11 $$
$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{21}{\sqrt{10 \times 46}} = \frac{21}{3.16 \times 6.78} = 0.98 $$
The value of r = 0.98 shows that the two series X and Y have almost perfect positive correlation.
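The direct method can also be checked with a few lines of code. The following is a minimal Python sketch, not part of the original text; the function name pearson_r is an assumption made for illustration. It takes deviations from the means, as in Example 1, and divides the sum of products by the square root of the product of the sums of squares.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r by the direct method (deviations taken from the means)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]          # x = Xi - mean of X
    dy = [y - y_bar for y in ys]          # y = Yi - mean of Y
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / sqrt(sum_x2 * sum_y2)

# Data of Example 1
X = [2, 3, 4, 5, 6]
Y = [7, 9, 10, 14, 15]
r = pearson_r(X, Y)
print(round(r, 2))  # 0.98
```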
When the arithmetic means of both sets of numerical items are not whole numbers and involve
decimals, calculating the coefficient of correlation by direct method becomes tedious. To
overcome this difficulty the following modified short-cut method formula is used:
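$$ \mathrm{Cov}(X_i, Y_i) = \frac{\sum X_i Y_i}{n} - \bar{X}\bar{Y} $$
$$ V(X_i) = \frac{\sum X_i^2}{n} - \bar{X}^2; \quad V(Y_i) = \frac{\sum Y_i^2}{n} - \bar{Y}^2 $$
$$ r = \frac{\mathrm{Cov}(X_i, Y_i)}{\sqrt{V(X_i)\, V(Y_i)}} = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n \sum X_i^2 - (\sum X_i)^2\right]\left[n \sum Y_i^2 - (\sum Y_i)^2\right]}} $$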
Example 2: Calculate Karl Pearson’s coefficient of correlation for the following data.
Let sales represent the Xi variable and advertising expenditure represent the Yi variable. The
coefficient of correlation is calculated using
$$ r = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n \sum X_i^2 - (\sum X_i)^2\right]\left[n \sum Y_i^2 - (\sum Y_i)^2\right]}} $$
Xi     Yi     Xi²    Yi²    XiYi
1      3      1      9      3
2      15     4      225    30
3      6      9      36     18
4      20     16     400    80
5      9      25     81     45
6      25     36     625    150
Total: $\sum X_i = 21$, $\sum Y_i = 78$, $\sum X_i^2 = 91$, $\sum Y_i^2 = 1376$, $\sum X_i Y_i = 326$
$$ r = \frac{6 \times 326 - 21 \times 78}{\sqrt{(6 \times 91 - 21^2)(6 \times 1376 - 78^2)}} = \frac{318}{10.247 \times 46.605} = 0.666 $$
This suggests a moderately high degree of positive correlation between the X and Y series, i.e.
between sales and advertising expenditure.
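The sum-based formula lends itself to direct computation because it needs only five running totals. The sketch below is an illustrative Python version under that reading; the function name pearson_r_sums and the variable names are assumptions, and the data are those of Example 2.

```python
from math import sqrt

def pearson_r_sums(xs, ys):
    """Pearson's r from raw sums: no means or deviations are needed."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy
    denominator = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

# Data of Example 2
sales = [1, 2, 3, 4, 5, 6]
advertising = [3, 15, 6, 20, 9, 25]
r = pearson_r_sums(sales, advertising)
print(round(r, 3))  # 0.666
```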
In case the magnitude of the data is large, the two methods explained above become
inconvenient for calculating the correlation coefficient by Karl Pearson’s method. So we
take deviations from some convenient numbers to reduce the magnitude of the data. The value of
the correlation coefficient does not change when such deviations are taken. We define
ui = Xi – A and vi = Yi – B, where A and B can be any arbitrary assumed values. The formulae are
given below:
$$ V(u_i) = \frac{\sum u_i^2}{n} - \bar{u}^2; \quad V(v_i) = \frac{\sum v_i^2}{n} - \bar{v}^2; \quad \mathrm{Cov}(u_i, v_i) = \frac{\sum u_i v_i}{n} - \bar{u}\bar{v} $$
$$ r = \frac{\mathrm{Cov}(u_i, v_i)}{\sqrt{V(u_i)\, V(v_i)}} = \frac{n \sum u_i v_i - \sum u_i \sum v_i}{\sqrt{\left[n \sum u_i^2 - (\sum u_i)^2\right]\left[n \sum v_i^2 - (\sum v_i)^2\right]}} $$
Example 3: Using the short-cut method, we calculate ‘r’ for the following data of Xi =
Advertising expenditure (Rupees in thousands) and Yi = sales (Rupees in lakhs). Let us define
ui = Xi – 60 and vi = Yi – 70.
Xi     Yi     ui     vi     ui²    vi²    uivi
65     53     5     -17     25     289    -85
62     58     2     -12     4      144    -24
$$ \bar{u} = \frac{\sum u_i}{n} = \frac{50}{10} = 5; \quad \bar{v} = \frac{\sum v_i}{n} = \frac{40}{10} = 4 $$
$$ r = \frac{n \sum u_i v_i - \sum u_i \sum v_i}{\sqrt{\left[n \sum u_i^2 - (\sum u_i)^2\right]\left[n \sum v_i^2 - (\sum v_i)^2\right]}} = \frac{27040}{\sqrt{53980 \times 22240}} = \frac{27040}{34648.45} = 0.78 $$
Hence the correlation between X and Y series is fairly high as the coefficient of correlation is 0.78.
(i) The value of the correlation coefficient lies in the interval [–1, +1]. This indicates that the
magnitude of r does not exceed unity.
(ii) The sign of r depends on the sign of the covariance.
(iii) If r = –1, the variables are perfectly negatively correlated.
(iv) If r = +1, the variables are perfectly positively correlated.
(v) If r = 0, the variables are not correlated in a linear fashion. There may be a nonlinear
relationship between the variables.
(vi) The correlation coefficient is independent of change of scale and shifting of origin. In other
words, shifting the origin and changing the scale do not have any effect on the value of the
correlation coefficient.
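Property (vi) can be verified numerically. The following Python sketch is an illustration only: it reuses the five data pairs of Example 1, and the origins (4 and 10) and scale factors (2 and 5) are arbitrary assumed values. Note that the invariance holds when both scale factors have the same sign; scale factors of opposite sign reverse the sign of the coefficient.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

X = [2, 3, 4, 5, 6]
Y = [7, 9, 10, 14, 15]

# Shift the origin and change the scale (assumed values, both scales positive)
U = [(x - 4) / 2 for x in X]
V = [(y - 10) / 5 for y in Y]

r_xy = pearson_r(X, Y)
r_uv = pearson_r(U, V)
print(abs(r_xy - r_uv) < 1e-9)  # True: r is unchanged
```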
Let us see the following example to understand the statement, ‘if r = 0, the variables are not
correlated in a linear fashion. There may be a nonlinear relationship between the variables’.
Example 4: If Xi and Yi are given as below, we calculate the correlation coefficient.
Xi     Yi     Xi²    Yi²    XiYi
-3     9      9      81     -27
-2     4      4      16     -8
-1     1      1      1      -1
0      0      0      0      0
1      1      1      1      1
2      4      4      16     8
3      9      9      81     27
Total: $\sum X_i = 0$, $\sum Y_i = 28$, $\sum X_i^2 = 28$, $\sum Y_i^2 = 196$, $\sum X_i Y_i = 0$
$$ r = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n \sum X_i^2 - (\sum X_i)^2\right]\left[n \sum Y_i^2 - (\sum Y_i)^2\right]}} = \frac{7 \times 0 - 0 \times 28}{\sqrt{(7 \times 28 - 0^2)(7 \times 196 - 28^2)}} = \frac{0}{\sqrt{196 \times 588}} = 0 $$
The result r = 0 does not mean that the variables Xi and Yi are unrelated. It can only be said that
the variables are linearly uncorrelated. In fact, if we look closely at the data of Xi and Yi, it can be
observed that Yi = Xi² is the relationship existing between Xi and Yi. This is a nonlinear relationship
between the variables. Karl Pearson’s coefficient of correlation cannot measure a nonlinear
relationship between the variables.
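The point of Example 4 can be reproduced in a few lines. The Python sketch below is illustrative (the function name pearson_r is an assumption); it builds Y exactly as the square of X and shows that the coefficient is nevertheless zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

X = [-3, -2, -1, 0, 1, 2, 3]
Y = [x * x for x in X]      # exact nonlinear relation Y = X squared

r = pearson_r(X, Y)
print(r)  # 0.0, although Y is completely determined by X
```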
When the number of observations is large, the data are often classified into a two-way frequency
distribution, i.e. a table in which the values of one variable (X) are represented in the rows and
those of the other variable (Y) in the columns. These values can be either discrete or continuous.
The frequencies in each class are shown in cells in the body of the table.
Steps for calculating correlation coefficient for grouped data:
(i) Record the mid-points (mp) of the class intervals for both X and Y variables.
(ii) Choose an assumed mean in the X series and calculate the deviations (dx) from it. Use the
same procedure for the Y series to calculate the deviations (dy).
(iii) To simplify the calculations, step deviations can be taken by dividing deviations by a
common factor.
(iv) Calculate f.dx and f.dx² for the X series, f.dy and f.dy² for the Y series, and f.dx.dy for
each cell of the table.
(v) Substitute all the values obtained in the following formula, where n = Σf is the total frequency:
$$ r = \frac{n \sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{\sqrt{\left[n \sum f\,dx^2 - (\sum f\,dx)^2\right]\left[n \sum f\,dy^2 - (\sum f\,dy)^2\right]}} $$
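The steps for grouped data can be sketched in code as follows. This is a minimal Python illustration under stated assumptions: the frequency table, the class mid-points, the assumed means (15 and 30) and the step widths (10 and 20) are all hypothetical values invented for the sketch and do not come from the text; rows of the table correspond to X classes and columns to Y classes.

```python
from math import sqrt

def grouped_r(x_mid, y_mid, freq, a_x, a_y, h_x=1, h_y=1):
    """Correlation coefficient from a two-way frequency table.

    freq[i][j] is the frequency of the cell with X mid-point x_mid[i]
    and Y mid-point y_mid[j]; a_x, a_y are assumed means and h_x, h_y
    are the common factors used for step deviations.
    """
    dx = [(x - a_x) / h_x for x in x_mid]   # step deviations of X
    dy = [(y - a_y) / h_y for y in y_mid]   # step deviations of Y
    n = s_fdx = s_fdy = s_fdx2 = s_fdy2 = s_fdxdy = 0
    for i, row in enumerate(freq):
        for j, f in enumerate(row):
            n += f
            s_fdx += f * dx[i]
            s_fdy += f * dy[j]
            s_fdx2 += f * dx[i] ** 2
            s_fdy2 += f * dy[j] ** 2
            s_fdxdy += f * dx[i] * dy[j]
    numerator = n * s_fdxdy - s_fdx * s_fdy
    denominator = sqrt((n * s_fdx2 - s_fdx ** 2) * (n * s_fdy2 - s_fdy ** 2))
    return numerator / denominator

# Hypothetical two-way table (illustration only)
x_mid = [5, 15, 25]
y_mid = [10, 30, 50]
freq = [[4, 2, 0],
        [1, 5, 2],
        [0, 3, 6]]

r = grouped_r(x_mid, y_mid, freq, a_x=15, a_y=30, h_x=10, h_y=20)
print(round(r, 3))
```

Because the coefficient is independent of origin and scale, the same value is obtained for any choice of assumed means and positive step widths.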
Example 5: Calculate the Karl Pearson’s coefficient of correlation for the following
grouped data:
f.dx²      16    0    14    Σf.dx² = 84
f.dx.dy    -5    0    3     Σf.dx.dy = 0
$$ r = \frac{-9.27}{\sqrt{89.76 \times 67.82}} = \frac{-9.27}{9.47 \times 8.24} = -0.119 $$
This shows very low degree of negative correlation between advertising expenditure (X) and
It is not possible to express attributes such as character, conduct, honesty, beauty, morality,
intellectual integrity etc. in numerical terms. Such attributes can, however, be ranked. For
example, it is easy for a class teacher to arrange the students in his class in an ascending or
descending order of intelligence. This means that he can rank them according to their
intelligence. Hence in problems that involve attributes of the type mentioned above, the
coefficient of correlation is based entirely on the rank differences between corresponding items.
(i) In the first case, when actual ranks are given, the differences of the two ranks (R1 – R2) are
taken and these are denoted by ‘d’.
(ii) The differences are squared and their total ($\sum d^2$) is obtained.
(iii) Then the following formula is applied to calculate the rank correlation coefficient:
$$ r_s = 1 - \frac{6 \sum d^2}{N(N^2 - 1)} $$
(iv) In the second case, when the ranks are not given but the actual data are given, we have
to assign ranks. We may do so by taking the highest value as 1 or the lowest value as 1. When
two observations are equal, the normal practice is to assign the average rank to both
observations.
Student 1 2 3 4 5 6 7 8 9 10
Ranks in Subject A 4 6 1 3 9 7 10 2 8 5
Ranks in Subject B 5 8 3 1 7 6 9 2 10 4
Solution:
In order to calculate rank correlation, we have to calculate $\sum d^2$ and use the following formula:
$$ r_s = 1 - \frac{6 \sum d^2}{N(N^2 - 1)} $$
Student    Rank in A (R1)    Rank in B (R2)    d = R1 – R2    d²
1          4                 5                 -1             1
2          6                 8                 -2             4
3          1                 3                 -2             4
4          3                 1                 2              4
5          9                 7                 2              4
6          7                 6                 1              1
7          10                9                 1              1
8          2                 2                 0              0
9          8                 10                -2             4
10         5                 4                 1              1
$\sum d^2 = 24$
$$ r_s = 1 - \frac{6 \times 24}{10(10^2 - 1)} = 1 - \frac{144}{990} = 0.855 $$
The rank correlation coefficient (0.855) shows that there is a very high degree of correlation
between ranks obtained in subject A and Subject B of the ten students.
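When the ranks are already given, the rank correlation formula reduces to a one-line computation. The Python sketch below is illustrative (the function name spearman_from_ranks is an assumption) and uses the ranks of the ten students in subjects A and B.

```python
def spearman_from_ranks(r1, r2):
    """Rank correlation r_s = 1 - 6*sum(d^2) / (N(N^2 - 1)) for untied ranks."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

ranks_a = [4, 6, 1, 3, 9, 7, 10, 2, 8, 5]
ranks_b = [5, 8, 3, 1, 7, 6, 9, 2, 10, 4]

rs = spearman_from_ranks(ranks_a, ranks_b)
print(round(rs, 3))  # 0.855
```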
Student No 1 2 3 4 5 6 7 8 9 10
Marks by Judge X 43 56 29 81 96 34 73 62 48 76
Marks by Judge Y 15 26 34 86 19 29 83 67 51 58
Student No    Marks by Judge X    Rank (R1)    Marks by Judge Y    Rank (R2)    d = R1 – R2    d²
1             43                  8            15                  10           -2             4
2             56                  6            26                  8            -2             4
3             29                  10           34                  6            4              16
4             81                  2            86                  1            1              1
5             96                  1            19                  9            -8             64
6             34                  9            29                  7            2              4
7             73                  4            83                  2            2              4
8             62                  5            67                  3            2              4
9             48                  7            51                  5            2              4
10            76                  3            58                  4            -1             1
$\sum d^2 = 106$
$$ r_s = 1 - \frac{6 \sum d^2}{N(N^2 - 1)} = 1 - \frac{6 \times 106}{10(10^2 - 1)} = 1 - \frac{636}{990} = 0.36 $$
The rank correlation coefficient (0.36) shows that there is a low degree of correlation between
marks assigned by Judge X and Judge Y to the ten students.
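When only the actual marks are given, the ranks have to be assigned first. The following Python sketch is an illustration under the assumption that there are no tied values and that rank 1 goes to the highest mark; the function names are hypothetical.

```python
def rank_descending(values):
    """Assign rank 1 to the highest value; assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Rank correlation for data without ties."""
    n = len(xs)
    rx, ry = rank_descending(xs), rank_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

judge_x = [43, 56, 29, 81, 96, 34, 73, 62, 48, 76]
judge_y = [15, 26, 34, 86, 19, 29, 83, 67, 51, 58]

print(rank_descending(judge_x))  # [8, 6, 10, 2, 1, 9, 4, 5, 7, 3]
rs = spearman(judge_x, judge_y)
print(round(rs, 2))              # 0.36
```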
Example 8: Obtain the rank correlation between the variables X (price of commodity A)
and Y (price of commodity B) from the following pairs of observed values.
X    24     29     23     38     46     52     41     36     68     56
Y    110    126    145    131    163    158    131    129    154    140
X      Rank of X (R1)    Y      Rank of Y (R2)    Difference d = R1 – R2    Squared difference d²
24     9                 110    10                -1                        1
29     8                 126    9                 -1                        1
23     10                145    4                 6                         36
38     6                 131    6.5               -0.5                      0.25
46     4                 163    1                 3                         9
52     3                 158    2                 1                         1
41     5                 131    6.5               -1.5                      2.25
36     7                 129    8                 -1                        1
68     1                 154    3                 -2                        4
56     2                 140    5                 -3                        9
$\sum d^2 = 64.5$
In the data, there are two equal values in the Y series (131), which tie for ranks 6 and 7.
The average of these two ranks (6.5) is therefore assigned to both observations.
Since common ranks occur in the second series (Y), the formula for the
coefficient of correlation through the rank differences method has to be modified as given
below:
$$ r_s = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \ldots\right]}{N(N^2 - 1)} $$
where m1, m2, m3, ... stand for the number of items in the respective groups with common ranks.
In this problem there is only one such group, containing two items, hence m1 = 2.
$$ r_s = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1)\right]}{N(N^2 - 1)} = 1 - \frac{6\left[64.5 + \frac{1}{12}(2^3 - 2)\right]}{10(10^2 - 1)} = 1 - \frac{6(64.5 + 0.5)}{990} = 0.61 $$
The rank correlation coefficient (0.61) shows that there is a moderate correlation between X
and Y.
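The treatment of tied values can be sketched as follows. This Python illustration uses hypothetical function names; it assigns average ranks to tied observations, adds the correction (m³ – m)/12 for every group of m tied values in either series, and is applied to the data of Example 8.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = highest value; tied values share the average of their ranks."""
    order = sorted(values, reverse=True)
    ranks = {}
    for v in set(values):
        first = order.index(v) + 1           # first rank occupied by v
        count = order.count(v)               # number of tied items
        ranks[v] = first + (count - 1) / 2   # average of the occupied ranks
    return [ranks[v] for v in values]

def tie_correction(values):
    """Sum of (m^3 - m)/12 over all groups of m > 1 tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

X = [24, 29, 23, 38, 46, 52, 41, 36, 68, 56]
Y = [110, 126, 145, 131, 163, 158, 131, 129, 154, 140]

rs = spearman_with_ties(X, Y)
print(round(rs, 2))  # 0.61
```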
Self Assessment
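10.3 Partial Correlation
In case of three variables xi, xj and xk, the partial correlation between xi and xj is defined as the
simple correlation between them after eliminating the effect of xk. This is denoted as $r_{ij \cdot k}$.
We note that $x_{i \cdot k} = x_i - b_{ik} x_k$ is that part of xi which is left after the removal of the linear
effect of xk on it. Similarly, $x_{j \cdot k} = x_j - b_{jk} x_k$ is that part of xj which is left after the removal of
the linear effect of xk on it. Equivalently, $r_{ij \cdot k}$ can also be regarded as the correlation between
$x_{i \cdot k}$ and $x_{j \cdot k}$. Thus, we can write
$$ r_{ij \cdot k} = \frac{\sum x_{i \cdot k}\, x_{j \cdot k}}{\sqrt{\sum x_{i \cdot k}^2 \sum x_{j \cdot k}^2}} $$
Now
$$ \sum x_{i \cdot k}^2 = nS_i^2 - r_{ik}\frac{S_i}{S_k} \cdot nS_i S_k r_{ik} = nS_i^2\left(1 - r_{ik}^2\right) $$
Similarly, $\sum x_{j \cdot k}^2 = nS_j^2\left(1 - r_{jk}^2\right)$. Hence
$$ r_{ij \cdot k} = \frac{nS_i S_j\left(r_{ij} - r_{ik} r_{jk}\right)}{\sqrt{nS_i^2\left(1 - r_{ik}^2\right) \cdot nS_j^2\left(1 - r_{jk}^2\right)}} = \frac{r_{ij} - r_{ik} r_{jk}}{\sqrt{\left(1 - r_{ik}^2\right)\left(1 - r_{jk}^2\right)}} $$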
Did u know? What is Zero order, First order, and Second order Partial Correlation?
ht
Simple correlation between two variables is called the zero order co-efficient since in simple
correlation, no factor is held constant. The partial correlation studied between two variables by
keeping the third variable constant is called a first order co-efficient, as one variable is kept
constant. Similarly, we can define a second order co-efficient and so on. The partial correlation
co-efficient varies between –1 and +1. Its calculation is based on the simple correlation
co-efficient.
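Since a first order partial correlation co-efficient is calculated from the simple correlation co-efficients, it can be computed with the standard formula r12.3 = (r12 – r13 r23) / √[(1 – r13²)(1 – r23²)]. The Python sketch below illustrates this; the function name partial_r and the three input values are assumptions chosen only for illustration.

```python
from math import sqrt

def partial_r(r_ij, r_ik, r_jk):
    """First order partial correlation of i and j, holding k constant."""
    return (r_ij - r_ik * r_jk) / sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

# Hypothetical zero order (simple) correlation co-efficients
r12, r13, r23 = 0.8, 0.6, 0.5

r12_3 = partial_r(r12, r13, r23)
print(round(r12_3, 3))  # 0.722
```

If the third variable is uncorrelated with both of the others (r13 = r23 = 0), the partial co-efficient equals the simple co-efficient r12.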
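10.4 Multiple Correlations
The coefficient of multiple correlation in case of regression of xi on xj and xk, denoted by $R_{i \cdot jk}$,
is defined as a simple coefficient of correlation between xi and xic.
Thus
$$ R_{i \cdot jk} = \frac{\mathrm{Cov}(x_i, x_{ic})}{\sqrt{\mathrm{Var}(x_i)\, \mathrm{Var}(x_{ic})}} = \frac{\sum x_i x_{ic}}{\sqrt{\sum x_i^2 \sum x_{ic}^2}} = \frac{\sum x_i \left(x_i - x_{i \cdot jk}\right)}{\sqrt{\sum x_i^2 \sum \left(x_i - x_{i \cdot jk}\right)^2}} $$
$$ = \frac{\sum x_i^2 - \sum x_i x_{i \cdot jk}}{\sqrt{\sum x_i^2 \sum \left(x_i - x_{i \cdot jk}\right)^2}} = \frac{\sum x_i^2 - \sum x_{i \cdot jk}^2}{\sqrt{\sum x_i^2 \left[\sum x_i^2 - \sum x_{i \cdot jk}^2\right]}} \quad \text{(Using property III)} $$
$$ = \frac{nS_i^2 - nS_{i \cdot jk}^2}{\sqrt{nS_i^2 \left(nS_i^2 - nS_{i \cdot jk}^2\right)}} = \frac{1}{S_i}\sqrt{S_i^2 - S_{i \cdot jk}^2} = \sqrt{1 - \frac{S_{i \cdot jk}^2}{S_i^2}} $$
It may be noted here that $\frac{S_{i \cdot jk}^2}{S_i^2}$ is the proportion of unexplained variation. Thus, we can also write
$$ R_{i \cdot jk}^2 = 1 - \frac{\sum x_{i \cdot jk}^2}{\sum x_i^2} $$
Further, we can write $R_{i \cdot jk}^2$ in terms of the simple correlation coefficients.
$$ R_{i \cdot jk}^2 = 1 - \frac{S_i^2\left(1 - r_{ij}^2 - r_{ik}^2 - r_{jk}^2 + 2 r_{ij} r_{ik} r_{jk}\right)}{S_i^2\left(1 - r_{jk}^2\right)} = \frac{r_{ij}^2 + r_{ik}^2 - 2 r_{ij} r_{ik} r_{jk}}{1 - r_{jk}^2} $$
If there are m variables,
$$ R_{1 \cdot 23 \ldots m}^2 = 1 - \frac{S_{1 \cdot 23 \ldots m}^2}{S_1^2} = 1 - \frac{\sum x_{1 \cdot 23 \ldots m}^2}{\sum x_1^2} $$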
The multiple correlation coefficient generalizes the standard coefficient of correlation. It is used
in multiple regression analysis to assess the quality of the prediction of the dependent variable.
It corresponds to the squared correlation between the predicted and the actual values of the
dependent variable. It can also be interpreted as the proportion of the variance of the dependent
variable explained by the independent variables. When the independent variables (used for
predicting the dependent variable) are pairwise orthogonal, the multiple correlation coefficient
is equal to the sum of the squared coefficients of correlation between each independent variable
and the dependent variable. This relation does not hold when the independent variables are not
orthogonal. The significance of a multiple coefficient of correlation can be assessed with an F
ratio. The magnitude of the multiple coefficient of correlation tends to overestimate the
magnitude of the population correlation, but it is possible to correct for this overestimation.
Caution Strictly speaking we should refer to this coefficient as the squared multiple
correlation coefficient, but current usage seems to ignore the adjective “squared,” probably
because mostly its squared value is considered.
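For one dependent and two independent variables, the squared multiple correlation coefficient can be computed from the three simple correlation coefficients with the standard formula R²1.23 = (r12² + r13² – 2 r12 r13 r23) / (1 – r23²). The Python sketch below illustrates this; the function name and all numerical inputs are hypothetical values chosen for illustration.

```python
def multiple_r_squared(r_ij, r_ik, r_jk):
    """Squared multiple correlation of i on j and k from simple correlations."""
    return (r_ij ** 2 + r_ik ** 2 - 2 * r_ij * r_ik * r_jk) / (1 - r_jk ** 2)

# Hypothetical simple correlation coefficients
r12, r13, r23 = 0.8, 0.6, 0.5
print(round(multiple_r_squared(r12, r13, r23), 3))  # 0.693

# Orthogonal predictors (r23 = 0): the value is the sum of the squared
# simple correlations, here 0.6**2 + 0.5**2
r_sq_orthogonal = multiple_r_squared(0.6, 0.5, 0.0)
print(round(r_sq_orthogonal, 2))  # 0.61
```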
Self Assessment
6. A ....................... rank correlation implies that a high (low) rank of an individual according
to one characteristic is accompanied by its high (low) rank according to the other.
7. The regression equations are useful for predicting the value of ....................... variable for
given value of the ....................... variable.
8. When two or more individuals have the same rank, each individual is assigned a rank
equal to the ....................... of the ranks that would have been assigned to them in the event
of there being slight differences in their values.
10.5 Summary
Researchers sometimes put all the data together, as if they were one sample.
There are two simple ways to approach these types of data.
We can use the technique of correlation to test the statistical significance of the association.
In other cases we use regression analysis to describe the relationship precisely by means
of an equation that has predictive value.
Straight-line (linear) relationships are particularly important because a straight line is a
simple pattern that is quite common.
The correlation measures the direction and strength of the linear relationship.
10.6 Keywords
1. Obtain the two lines of regression from the following data and estimate the blood pressure
when age is 50 years. Can we also estimate the blood pressure of a person aged 20 years on
the basis of this regression equation? Discuss.
Blood Pressure (Y) 127 112 140 118 129 116 130 125 115 120 135 133
2. Show that the coefficient of correlation, r, is independent of change of origin and scale.
3. Prove that the coefficient of correlation lies between - 1 and + 1.
4. “If two variables are independent the correlation between them is zero, but the converse
is not always true”. Explain the meaning of this statement.
5. What is Spearman’s rank correlation? What are the advantages of the coefficient of rank
correlation over Karl Pearson’s coefficient of correlation?
Books Abrams, M.A., Social Surveys and Social Action, London: Heinemann, 1951.
Arthur, Maurice, Philosophy of Scientific Investigation, Baltimore: Johns Hopkins
University Press, 1943.
www.scribd.com/doc
www.soas.ac.uk
www.web-source.net
CONTENTS
Objectives
Introduction
11.1 Regression Analysis
11.1.1 Simple Regression
11.2 Meaning of Multiple Regressions
11.3 Coefficient of Determination (r²)
11.3.1 Linear Multiple Regression Analysis
11.3.2 Logistic Regression Analysis
11.4 Coefficient of Multiple Determinations
11.5 Summary
11.6 Keywords
11.7 Review Questions
Objectives
Introduction
As you develop Cause & Effect diagrams based on data, you may wish to examine the degree of
correlation between variables. A statistical measurement of correlation can be calculated using
the least squares method to quantify the strength of the relationship between two variables. The
output of that calculation is the Correlation Coefficient, or (r), which ranges between –1 and 1.
A value of 1 indicates perfect positive correlation – as one variable increases, the second increases
in a linear fashion. Likewise, a value of –1 indicates perfect negative correlation – as one variable
increases, the second decreases. A value of zero indicates zero correlation.
Before calculating the Correlation Coefficient, the first step is to construct a scatter diagram.
Most spreadsheets, including Excel, can handle this task. In this case, the process improvement
team is analyzing door closing efforts to understand what the causes could be. The Y-axis
represents the width of the gap between the sealing flange of a car door and the sealing flange
on the body – a measure of how tight the door is set to the body. The fishbone diagram indicated
that variability in the seal gap could be a cause of variability in door closing efforts.
Notes It is important to note that Correlation is not Causation - two variables can be very
strongly correlated, but both can be caused by a third variable.
Example: Consider two variables: (1) how much my grass grows per week, and (2) the
average depth of the local reservoir. Both variables could be highly correlated because both are
dependent upon a third variable – how much it rains.
If the coefficient of correlation calculated for bivariate data (Xi, Yi), i = 1, 2, ...... n, is reasonably
high and a cause and effect type of relation is also believed to exist between them, the next
logical step is to obtain a functional relation between these variables. This functional relation is
known as a regression equation in statistics. Since the coefficient of correlation is a measure of the
degree of linear association of the variables, we shall discuss only linear regression equations.
This does not, however, imply the non-existence of non-linear regression equations.
The regression equations are useful for predicting the value of the dependent variable for a given
value of the independent variable. As pointed out earlier, the nature of a regression equation is
The term regression was first introduced by Sir Francis Galton in 1877. In his study of the
relationship between heights of fathers and sons, he found that tall fathers were likely to have
tall sons and vice-versa. However, the mean height of sons of tall fathers was lower than the
mean height of their fathers and the mean height of sons of short fathers was higher than the
mean height of their fathers. In this way, a tendency of the human race to regress or to return to
a normal height was observed. Sir Francis Galton referred to this tendency of returning to the
mean height of all men as regression in his research paper, “Regression towards mediocrity in
hereditary stature”. The term ‘Regression’, which originated in this particular context, is now used
in various fields of study, even though there may be no regressive tendency involved.
For a bivariate data (Xi, Yi), i = 1, 2, ...... n, we can have either X or Y as independent variable. If
X is independent variable then we can estimate the average values of Y for a given value of X.
The relation used for such estimation is called regression of Y on X. If on the other hand Y is used
for estimating the average values of X, the relation will be called regression of X on Y. For a
bivariate data, there will always be two lines of regression. It will be shown later that these two
lines are different, i.e., one cannot be derived from the other by mere transfer of terms, because
the derivation of each line is dependent on a different set of assumptions.
Line of Regression of Y on X
The general form of the line of regression of Y on X is YCi = a + bXi, where YCi denotes the average
or predicted or calculated value of Y for a given value of X = Xi. This line has two constants, a and
b. The constant a is defined as the average value of Y when X = 0. Geometrically, it is the intercept
of the line on the Y-axis. Further, the constant b, which gives the average rate of change of Y per
unit change in X, is known as the regression coefficient.
The above line is known if the values of a and b are known. These values are estimated from the
observed data (Xi, Yi), i = 1, 2, ...... n.
Notes It is important to distinguish between YCi and Yi. Whereas Yi is the observed value,
YCi is a value calculated from the regression equation.
Using the regression YCi = a + bXi, we can obtain YC1, YC2, ...... YCn corresponding to the X values
X1, X2, ...... Xn respectively. The difference between the observed and calculated value for a
particular value of X, say Xi, is called the error in estimation of the i th observation on the
assumption of a particular line of regression. There will be similar errors for all the n observations.
We denote by ei = Yi – YCi (i = 1, 2,.....n), the error in estimation of the i th observation. As is
obvious from Figure 11.1, ei will be positive if the observed point lies above the line and will be
negative if the observed point lies below the line. Since positive and negative errors would cancel
each other in a simple sum, the errors ei are squared and added in order to obtain a figure of total
error. Let S denote the sum of squares of these errors,
$$ \text{i.e., } S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - Y_{Ci}\right)^2 . $$
Figure 11.1
The regression line can, alternatively, be written as a deviation of Yi from YCi i.e. Yi – YCi = ei or
Yi = YCi + ei or Yi = a + bXi + ei. The component a + bXi is known as the deterministic component
and ei is the random component.
The value of S will be different for different lines of regression. A different line of regression
means a different pair of constants a and b. Thus, S is a function of a and b. We want to find such
values of a and b so that S is minimum. This method of finding the values of a and b is known as
the Method of Least Squares.
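Rewrite the above equation as $S = \sum_{i=1}^{n} (Y_i - a - bX_i)^2$ (since $Y_{Ci} = a + bX_i$).
The necessary conditions for S to be a minimum are
(i) $\frac{\partial S}{\partial a} = 0$ and (ii) $\frac{\partial S}{\partial b} = 0$, where $\frac{\partial S}{\partial a}$ and $\frac{\partial S}{\partial b}$ are the partial derivatives of S w.r.t. a and b
respectively.
Now
$\frac{\partial S}{\partial a} = -2\sum_{i=1}^{n} (Y_i - a - bX_i) = 0$
or $\sum_{i=1}^{n} (Y_i - a - bX_i) = \sum_{i=1}^{n} Y_i - na - b\sum_{i=1}^{n} X_i = 0$
or $\sum_{i=1}^{n} Y_i = na + b\sum_{i=1}^{n} X_i$ .... (1)
Also, $\frac{\partial S}{\partial b} = -2\sum_{i=1}^{n} (Y_i - a - bX_i)X_i = 0$
or $\sum_{i=1}^{n} (X_iY_i - aX_i - bX_i^2) = 0$
or $\sum_{i=1}^{n} X_iY_i - a\sum_{i=1}^{n} X_i - b\sum_{i=1}^{n} X_i^2 = 0$
or $\sum_{i=1}^{n} X_iY_i = a\sum_{i=1}^{n} X_i + b\sum_{i=1}^{n} X_i^2$ .... (2)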
Equations (1) and (2) are a system of two simultaneous equations in two unknowns a and b,
which can be solved for the values of these unknowns. These equations are also known as
normal equations for the estimation of a and b. Substituting these values of a and b in the
regression equation YCi = a + bXi, we get the estimated line of regression of Y on X.
Dividing both sides of equation (1) by n, we have
$\frac{\sum Y_i}{n} = \frac{na}{n} + \frac{b\sum X_i}{n}$ or $\bar{Y} = a + b\bar{X}$ .... (3)
This shows that the line of regression YCi = a + bXi passes through the point $(\bar{X}, \bar{Y})$.
From equation (3), we have $a = \bar{Y} - b\bar{X}$ .... (4)
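Substituting this value of a in equation (2), we have
$\sum X_iY_i = (\bar{Y} - b\bar{X})\sum X_i + b\sum X_i^2 = n\bar{X}\bar{Y} - b \cdot n\bar{X}^2 + b\sum X_i^2$
or $\sum X_iY_i - n\bar{X}\bar{Y} = b\left(\sum X_i^2 - n\bar{X}^2\right)$
or $b = \frac{\sum X_iY_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$ .... (5)
Also, $\sum X_iY_i - n\bar{X}\bar{Y} = \sum (X_i - \bar{X})(Y_i - \bar{Y})$
and $\sum X_i^2 - n\bar{X}^2 = \sum (X_i - \bar{X})^2$
Therefore, $b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$ .... (6)
or $b = \frac{\sum x_iy_i}{\sum x_i^2}$ .... (7)
where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ are deviations from the respective means.
Dividing the numerator and the denominator of equation (6) by n, we have
$b = \frac{\frac{1}{n}\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\frac{1}{n}\sum (X_i - \bar{X})^2} = \frac{Cov(X, Y)}{\sigma_X^2}$ .... (8)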
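The expression for b that is convenient for use in computational work can be written from
equation (5) as given below:
$b = \frac{\sum X_iY_i - n \cdot \frac{\sum X_i}{n} \cdot \frac{\sum Y_i}{n}}{\sum X_i^2 - n\left(\frac{\sum X_i}{n}\right)^2} = \frac{\sum X_iY_i - \frac{(\sum X_i)(\sum Y_i)}{n}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}$
Multiplying the numerator and the denominator by n, we have
$b = \frac{n\sum X_iY_i - (\sum X_i)(\sum Y_i)}{n\sum X_i^2 - (\sum X_i)^2}$ .... (9)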
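The least squares estimates can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text: the function names and the five data points are hypothetical, b is taken from equation (9) and a from equation (4).

```python
def fit_y_on_x(xs, ys):
    """Return (a, b) for the line Y_C = a + b*X using equations (9) and (4)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # equation (9)
    a = sum_y / n - b * sum_x / n                                 # equation (4)
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """S = sum of squared errors e_i = Y_i - (a + b*X_i)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical observations, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_y_on_x(xs, ys)
s_min = sum_squared_errors(xs, ys, a, b)
print(f"a = {a:.2f}, b = {b:.2f}, S = {s_min:.2f}")

# Any other line gives a larger S, which is what "least squares" means.
print(sum_squared_errors(xs, ys, a + 0.1, b) > s_min)
print(sum_squared_errors(xs, ys, a, b - 0.1) > s_min)
```

Shifting either constant away from the fitted value increases S, which is the property the normal equations guarantee.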
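To write the shortcut formula for b, we shall show that it is independent of change of origin but
not of change of scale.
As in case of coefficient of correlation we define
$u_i = \frac{X_i - A}{h}$ and $v_i = \frac{Y_i - B}{k}$
Therefore, $\bar{X} = A + h\bar{u}$ and $\bar{Y} = B + k\bar{v}$
also $X_i - \bar{X} = h(u_i - \bar{u})$ and $Y_i - \bar{Y} = k(v_i - \bar{v})$
Therefore, $b = \frac{hk\sum (u_i - \bar{u})(v_i - \bar{v})}{h^2\sum (u_i - \bar{u})^2} = \frac{k\sum (u_i - \bar{u})(v_i - \bar{v})}{h\sum (u_i - \bar{u})^2}$
$= \frac{k}{h} \cdot \frac{n\sum u_iv_i - (\sum u_i)(\sum v_i)}{n\sum u_i^2 - (\sum u_i)^2}$ .... (10)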
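Consider equation (8), $b = \frac{Cov(X, Y)}{\sigma_X^2}$.
Writing $Cov(X, Y) = r\sigma_X\sigma_Y$, we have $b = \frac{r\sigma_X\sigma_Y}{\sigma_X^2} = r\frac{\sigma_Y}{\sigma_X}$.
Substituting the value of a from equation (4) in the regression equation YCi = a + bXi, the line of
regression of Y on X can be written as
$Y_{Ci} - \bar{Y} = b(X_i - \bar{X})$
or $Y_{Ci} - \bar{Y} = r\frac{\sigma_Y}{\sigma_X}(X_i - \bar{X})$ .... (12)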
Line of Regression of X on Y
The general form of the line of regression of X on Y is XCi = c + dYi, where XCi denotes the
predicted or calculated or estimated value of X for a given value of Y = Yi and c and d are
constants. d is known as the regression coefficient of regression of X on Y.
In this case, we have to calculate the value of c and d so that $\sum (X_i - X_{Ci})^2$ is minimised.
As in the previous section, the normal equations for the estimation of c and d are
$\sum X_i = nc + d\sum Y_i$ .... (13)
and $\sum X_iY_i = c\sum Y_i + d\sum Y_i^2$ .... (14)
Figure 11.2
Dividing both sides of equation (13) by n, we have $\bar{X} = c + d\bar{Y}$, or $c = \bar{X} - d\bar{Y}$ .... (15)
This shows that the line of regression also passes through the point $(\bar{X}, \bar{Y})$. Since both the lines
of regression pass through the point $(\bar{X}, \bar{Y})$, it is their point of intersection as
shown in Figure 11.3.
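As before, the various expressions for d can be directly written, as given below.
$d = \frac{\sum X_iY_i - n\bar{X}\bar{Y}}{\sum Y_i^2 - n\bar{Y}^2}$ .... (16)
Figure 11.3
or $d = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (Y_i - \bar{Y})^2}$ .... (17)
or $d = \frac{\sum x_iy_i}{\sum y_i^2}$ .... (18)
$= \frac{\frac{1}{n}\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\frac{1}{n}\sum (Y_i - \bar{Y})^2} = \frac{Cov(X, Y)}{\sigma_Y^2}$ .... (19)
Also, $d = \frac{n\sum X_iY_i - (\sum X_i)(\sum Y_i)}{n\sum Y_i^2 - (\sum Y_i)^2}$ .... (20)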
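This expression is useful for calculating the value of d. Another short-cut formula for the
calculation of d is given by
$d = \frac{h}{k} \cdot \frac{n\sum u_iv_i - (\sum u_i)(\sum v_i)}{n\sum v_i^2 - (\sum v_i)^2}$ .... (21)
where $u_i = \frac{X_i - A}{h}$ and $v_i = \frac{Y_i - B}{k}$.
Consider equation (19): $d = \frac{Cov(X, Y)}{\sigma_Y^2} = \frac{r\sigma_X\sigma_Y}{\sigma_Y^2} = r\frac{\sigma_X}{\sigma_Y}$ .... (22)
Substituting the value of c from equation (15) into line of regression of X on Y we have
$X_{Ci} = \bar{X} - d\bar{Y} + dY_i$ or $X_{Ci} - \bar{X} = d(Y_i - \bar{Y})$ .... (23)
or $X_{Ci} - \bar{X} = r\frac{\sigma_X}{\sigma_Y}(Y_i - \bar{Y})$ .... (24)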
Remarks: It should be noted here that the two lines of regression are different because they
have been obtained in two entirely different ways. In case of regression of Y on X, it is assumed
that the values of X are given and the values of Y are estimated by minimising $\sum (Y_i - Y_{Ci})^2$, while
in case of regression of X on Y, the values of Y are assumed to be given and the values of X are
estimated by minimising $\sum (X_i - X_{Ci})^2$. Since these two lines have been estimated on the basis of
different assumptions, they are not reversible, i.e., it is not possible to obtain one line from the
other by mere transfer of terms. There is, however, one situation when these two lines will
coincide. From the study of correlation we may recall that when r = ± 1, there is perfect correlation
between the variables and all the points lie on a straight line. Therefore, both the lines of
regression coincide and hence they are also reversible in this case. By substituting r = ± 1 in
equation (12) or (24) it can be shown that the lines of regression in both the cases become
$\frac{Y_i - \bar{Y}}{\sigma_Y} = \pm \frac{X_i - \bar{X}}{\sigma_X}$
Further when r = 0, equation (12) becomes $Y_{Ci} = \bar{Y}$ and equation (24) becomes $X_{Ci} = \bar{X}$. These are
the equations of lines parallel to X-axis and Y-axis respectively. These lines also intersect at the
point $(\bar{X}, \bar{Y})$ and are mutually perpendicular at this point.
Correlation Coefficient and the Two Regression Coefficients
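Since $b = r\frac{\sigma_Y}{\sigma_X}$ and $d = r\frac{\sigma_X}{\sigma_Y}$, we have
$bd = r\frac{\sigma_Y}{\sigma_X} \cdot r\frac{\sigma_X}{\sigma_Y} = r^2$ or $r = \pm\sqrt{bd}$
Remarks: The following points should be kept in mind about the coefficient of correlation and
the regression coefficients:
1. Since $b = \frac{Cov(X, Y)}{\sigma_X^2}$, $d = \frac{Cov(X, Y)}{\sigma_Y^2}$ and $r = \frac{Cov(X, Y)}{\sigma_X\sigma_Y}$, the signs of b, d and r will
always be same and this will depend upon the sign of Cov(X, Y).
2. Since $bd = r^2$ and $0 \le r^2 \le 1$, therefore either both b and d are numerically less than unity or if
one of them is greater than unity, the other must be less than unity such that $0 \le b \times d \le 1$ is always true.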
Example: Obtain the two regression equations and find correlation coefficient between
X and Y from the following data:
X    10    9    7    8   11
Y     6    3    2    4    5
Solution:
Calculation Table
          X     Y    XY    X2    Y2
         10     6    60   100    36
          9     3    27    81     9
          7     2    14    49     4
          8     4    32    64    16
         11     5    55   121    25
Total    45    20   188   415    90
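(a) Regression of Y on X
$b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = \frac{5 \times 188 - 45 \times 20}{5 \times 415 - (45)^2} = \frac{40}{50} = 0.8$
Also, $\bar{X} = \frac{45}{5} = 9$ and $\bar{Y} = \frac{20}{5} = 4$
Now $a = \bar{Y} - b\bar{X} = 4 - 0.8 \times 9 = -3.2$
Regression of Y on X is YC = –3.2 + 0.8X
(b) Regression of X on Y
$d = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum Y^2 - (\sum Y)^2} = \frac{5 \times 188 - 45 \times 20}{5 \times 90 - (20)^2} = \frac{40}{50} = 0.8$
Also, $c = \bar{X} - d\bar{Y} = 9 - 0.8 \times 4 = 5.8$
Regression of X on Y is XC = 5.8 + 0.8Y
(c) Coefficient of correlation
$r = \sqrt{b \times d} = \sqrt{0.8 \times 0.8} = 0.8$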
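The arithmetic of this example can be checked with a short Python sketch. It applies the computational formulas for the two regression coefficients to the five (X, Y) pairs given in the example; the helper name `regression_coefficient` is an assumption made for illustration.

```python
from math import copysign, sqrt

# Data from the example.
X = [10, 9, 7, 8, 11]
Y = [6, 3, 2, 4, 5]


def regression_coefficient(dependent, independent):
    """Slope of the regression of `dependent` on `independent`."""
    n = len(dependent)
    num = n * sum(p * q for p, q in zip(dependent, independent)) \
        - sum(dependent) * sum(independent)
    den = n * sum(q * q for q in independent) - sum(independent) ** 2
    return num / den


x_bar = sum(X) / len(X)
y_bar = sum(Y) / len(Y)

b = regression_coefficient(Y, X)   # regression of Y on X
d = regression_coefficient(X, Y)   # regression of X on Y
a = y_bar - b * x_bar
c = x_bar - d * y_bar
r = copysign(sqrt(b * d), b)       # r takes the common sign of b and d

print(f"Y_C = {a:.1f} + {b:.1f} X")
print(f"X_C = {c:.1f} + {d:.1f} Y")
print(f"r = {r:.1f}")
```

Both lines pass through the point of means (9, 4), as the theory requires.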
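Example: From the data given below, find:
1. The two regression equations.
2. The coefficient of correlation between marks in economics and statistics.
3. The most likely marks in statistics when marks in economics are 30.
Marks in Eco.  25 28 35 32 31 36 29 38 34 32
Marks in Stat. 43 46 49 41 36 32 31 30 33 39
Solution:
Let X denote marks in economics and Y denote marks in statistics, and let u = X – 31 and v = Y – 41.
From the calculation table, $\sum u = 10$, $\sum v = -30$, $\sum uv = -123$, $\sum u^2 = 150$ and $\sum v^2 = 488$.
$\bar{X} = 31 + \frac{10}{10} = 32$ and $\bar{Y} = 41 - \frac{30}{10} = 38$
1. (a) Regression of Y on X
$b = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum u^2 - (\sum u)^2} = \frac{-1230 + 300}{1500 - 100} = \frac{-930}{1400} = -0.66$
$a = \bar{Y} - b\bar{X}$ = 38 + 0.66 × 32 = 59.26 (using the unrounded value b = –0.6643)
Regression equation is
YC = 59.26 – 0.66X
(b) Regression of X on Y
$d = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum v^2 - (\sum v)^2} = \frac{-930}{4880 - 900} = \frac{-930}{3980} = -0.23$
$c = \bar{X} - d\bar{Y}$ = 32 + 0.23 × 38 = 40.88 (using the unrounded value d = –0.2337)
Regression equation is
XC = 40.88 – 0.23Y
2. Coefficient of correlation
$r = -\sqrt{b \times d} = -\sqrt{0.66 \times 0.23} = -0.39$, the sign being negative because b and d are negative.
3. The most likely marks in statistics when marks in economics are 30
YC = 59.26 – 0.66 × 30 = 39.46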
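The shortcut method based on change of origin can be sketched in Python for the marks data above. The origins A = 31 and B = 41 are the ones implied by the expressions for the two means; the variable names are illustrative assumptions. Because the code keeps the coefficients unrounded, its prediction differs slightly from one made with the rounded equation YC = 59.26 – 0.66X.

```python
from math import sqrt

eco = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]    # X: marks in economics
stat = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]   # Y: marks in statistics

A, B = 31, 41                      # assumed origins for X and Y
u = [x - A for x in eco]
v = [y - B for y in stat]
n = len(eco)

sum_u, sum_v = sum(u), sum(v)
sum_uv = sum(p * q for p, q in zip(u, v))
sum_u2 = sum(p * p for p in u)
sum_v2 = sum(q * q for q in v)

numerator = n * sum_uv - sum_u * sum_v
b = numerator / (n * sum_u2 - sum_u ** 2)   # regression coefficient of Y on X
d = numerator / (n * sum_v2 - sum_v ** 2)   # regression coefficient of X on Y

x_bar = A + sum_u / n
y_bar = B + sum_v / n
a = y_bar - b * x_bar
c = x_bar - d * y_bar

# b and d share a sign, and r takes that sign.
r = -sqrt(b * d) if b < 0 else sqrt(b * d)

marks_in_stat = a + b * 30         # prediction for 30 marks in economics

print(f"Y_C = {a:.2f} + ({b:.2f}) X")
print(f"X_C = {c:.2f} + ({d:.2f}) Y")
print(f"r = {r:.2f}, predicted marks = {marks_in_stat:.2f}")
```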
Self Assessment
1. The regression line can, alternatively, be written as a deviation of Yi from YCi i.e. Yi – YCi =
……………………
Multiple regression is a statistical technique that allows us to predict someone’s score on one
variable on the basis of their scores on several other variables. An example might help. Suppose
we were interested in predicting how much an individual enjoys their job. Variables such as
salary, extent of academic qualifications, age, sex, number of years in full-time employment and
socioeconomic status might all contribute towards job satisfaction. If we collected data on all of
these variables, perhaps by surveying a few hundred members of the public, we would be able
to see how many and which of these variables gave rise to the most accurate prediction of job
satisfaction. We might find that job satisfaction is most accurately predicted by type of occupation,
salary and years in full-time employment, with the other variables not helping us to predict job
satisfaction.
When using multiple regression in psychology, many researchers use the term “independent
variables” to identify those variables that they think will influence some other “dependent
variable”. We prefer to use the term “predictor variables” for those variables that may be useful
in predicting the scores on another variable that we call the “criterion variable”. Thus, in our
example above, type of occupation, salary and years in full-time employment would emerge as
significant predictor variables, which allow us to estimate the criterion variable – how satisfied
someone is likely to be with their job. As we have pointed out before, human behaviour is
inherently noisy and therefore it is not possible to produce totally accurate predictions, but
multiple regression allows us to identify a set of predictor variables which together provide a
useful estimate of the criterion variable.
In the case of simple linear regression, one variable, say, X1 is affected by a linear combination
of another variable X2 (we shall use X1 and X2 instead of Y and X used earlier). However, if X1 is
affected by a linear combination of more than one variable, the regression is termed as a
multiple linear regression.
Let there be k variables X1, X2, ...... Xk. The regression of Xj on the remaining (k – 1) variables can be written as
$X_{jc} = a_{j \cdot 1,2,\ldots,j-1,j+1,\ldots,k} + b_{j1 \cdot 2,3,\ldots,j-1,j+1,\ldots,k}X_1 + b_{j2 \cdot 1,3,\ldots,j-1,j+1,\ldots,k}X_2 + \ldots$ (j = 1, 2, .... k).
Here $a_{j \cdot 1,2,\ldots}$, $b_{j1 \cdot 2,3,\ldots}$ etc. are constants. The constant $a_{j \cdot 1,2,\ldots}$ is interpreted as the value of Xj
when X1, X2, ..... Xj–1, Xj+1 ..... Xk are all equal to zero. Further, $b_{j1 \cdot 2,3,\ldots,j-1,j+1,\ldots,k}$, $b_{j2 \cdot 1,3,\ldots,j-1,j+1,\ldots,k}$ etc.,
are (k – 1) partial regression coefficients of regression of Xj on X1, X2 ...... Xj–1, Xj+1 ...... Xk.
For simplicity, we shall consider three variables X1, X2 and X3. The three possible regression
equations can be written as
X 1c = a1.23 + b12.3X2 + b13.2X3 .... (1)
X 2c = a2.13 + b21.3X1 + b23.1X3 .... (2)
X 3c = a3.12 + b31.2X1 + b32.1X2 .... (3)
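Given n observations on X1, X2 and X3, we want to find such values of the constants of the
regression equation so that $\sum_{i=1}^{n} (X_{ij} - X_{ijc})^2$, j = 1, 2, 3, is minimised.
Caution For convenience, we shall use regression equations expressed in terms of
deviations of variables from their respective means.
Summing equation (1) over all the observations and dividing by n, we have
$\frac{\sum X_{1c}}{n} = a_{1.23} + b_{12.3}\frac{\sum X_2}{n} + b_{13.2}\frac{\sum X_3}{n}$
or $\bar{X}_1 = a_{1.23} + b_{12.3}\bar{X}_2 + b_{13.2}\bar{X}_3$ .... (4)
Subtracting equation (4) from equation (1), we have
$X_{1c} - \bar{X}_1 = b_{12.3}(X_2 - \bar{X}_2) + b_{13.2}(X_3 - \bar{X}_3)$ or $x_{1c} = b_{12.3}x_2 + b_{13.2}x_3$ .... (5)
Similarly, equations (2) and (3) can be written as
$x_{2c} = b_{21.3}x_1 + b_{23.1}x_3$ .... (6)
and
$x_{3c} = b_{31.2}x_1 + b_{32.1}x_2$, respectively. .... (7)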
Notes The subscripts of the coefficients preceding the dot are termed as primary subscripts
while those appearing after it are termed as secondary subscripts. The number of secondary
subscripts gives the order of the regression coefficient, e.g., b12.3 is a regression coefficient of
order one.
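Let us first estimate the coefficients of regression equation (5). Given n observations on each of
the three variables X1, X2 and X3, we have to find the values of the constants b12.3 and b13.2 so
that $\sum (x_1 - x_{1c})^2 = \sum (x_1 - b_{12.3}x_2 - b_{13.2}x_3)^2$ is minimised. Using method of least squares, the
normal equations can be written as
$\sum x_1x_2 = b_{12.3}\sum x_2^2 + b_{13.2}\sum x_2x_3$ .... (8)
$\sum x_1x_3 = b_{12.3}\sum x_2x_3 + b_{13.2}\sum x_3^2$ .... (9)
Solving these equations simultaneously, we get
$b_{12.3} = \frac{(\sum x_1x_2)(\sum x_3^2) - (\sum x_1x_3)(\sum x_2x_3)}{(\sum x_2^2)(\sum x_3^2) - (\sum x_2x_3)^2}$ .... (10)
$b_{13.2} = \frac{(\sum x_1x_3)(\sum x_2^2) - (\sum x_1x_2)(\sum x_2x_3)}{(\sum x_2^2)(\sum x_3^2) - (\sum x_2x_3)^2}$ .... (11)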
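Note:
1. Various sums of squares and sums of products of deviations, used above, can be computed
from the original observations using $\sum x_px_q = \sum X_pX_q - \frac{(\sum X_p)(\sum X_q)}{n}$.
2. The fact that a regression coefficient is independent of change of origin can also be utilised
to further simplify the computational work.
3. The regression coefficients of equations (2) and (3) can be written by symmetry as given
below:
$b_{21.3} = \frac{(\sum x_2x_1)(\sum x_3^2) - (\sum x_2x_3)(\sum x_1x_3)}{(\sum x_1^2)(\sum x_3^2) - (\sum x_1x_3)^2}$
$b_{23.1} = \frac{(\sum x_2x_3)(\sum x_1^2) - (\sum x_2x_1)(\sum x_1x_3)}{(\sum x_1^2)(\sum x_3^2) - (\sum x_1x_3)^2}$
Further, $b_{31.2} = \frac{(\sum x_3x_1)(\sum x_2^2) - (\sum x_3x_2)(\sum x_1x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1x_2)^2}$ and $b_{32.1} = \frac{(\sum x_3x_2)(\sum x_1^2) - (\sum x_3x_1)(\sum x_1x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1x_2)^2}$.
Note that b31.2 and b13.2 have the same numerator but different denominators, so they are not equal in general.
The expressions for the constant terms are
$a_{1.23} = \bar{X}_1 - b_{12.3}\bar{X}_2 - b_{13.2}\bar{X}_3$, $a_{2.13} = \bar{X}_2 - b_{21.3}\bar{X}_1 - b_{23.1}\bar{X}_3$ and $a_{3.12} = \bar{X}_3 - b_{31.2}\bar{X}_1 - b_{32.1}\bar{X}_2$.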
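The partial regression coefficients of X1 on X2 and X3 can be computed directly from sums of squares and products of deviations. The sketch below assumes the formulas of equations (10) and (11) and takes the constant term from equation (4); the function names and the data are hypothetical, with X1 constructed as exactly 1 + 2·X2 + 3·X3 so that the expected answer is known in advance.

```python
def deviations(values):
    """Deviations of the observations from their mean."""
    mean = sum(values) / len(values)
    return [value - mean for value in values]


def sum_products(p, q):
    """Sum of products of two deviation series."""
    return sum(s * t for s, t in zip(p, q))


def regress_x1_on_x2_x3(X1, X2, X3):
    """Return (a_1.23, b_12.3, b_13.2) for X1c = a + b12.3*X2 + b13.2*X3."""
    x1, x2, x3 = deviations(X1), deviations(X2), deviations(X3)
    den = sum_products(x2, x2) * sum_products(x3, x3) - sum_products(x2, x3) ** 2
    b12_3 = (sum_products(x1, x2) * sum_products(x3, x3)
             - sum_products(x1, x3) * sum_products(x2, x3)) / den   # equation (10)
    b13_2 = (sum_products(x1, x3) * sum_products(x2, x2)
             - sum_products(x1, x2) * sum_products(x2, x3)) / den   # equation (11)
    n = len(X1)
    a1_23 = sum(X1) / n - b12_3 * sum(X2) / n - b13_2 * sum(X3) / n  # equation (4)
    return a1_23, b12_3, b13_2


# Hypothetical data: X1 = 1 + 2*X2 + 3*X3 holds exactly.
X2 = [1, 2, 3, 4, 5]
X3 = [2, 1, 4, 3, 6]
X1 = [1 + 2 * p + 3 * q for p, q in zip(X2, X3)]

a1_23, b12_3, b13_2 = regress_x1_on_x2_x3(X1, X2, X3)
print(f"X1c = {a1_23:.2f} + {b12_3:.2f} X2 + {b13_2:.2f} X3")
```

With noisy real data the fitted constants would only approximate the underlying relationship; the exact data here simply make the result easy to verify.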
When r = 1, or –1, or 0, the interpretation of r does not pose any problem. When r = 1 or –1, all the
points lie on a straight line in a graph, showing a perfect positive or negative correlation. When
the points are extremely scattered on a graph, it becomes evident that there is almost no
relationship between the two variables. However, when it comes to other values of r, we have to
be careful in its interpretation. Suppose we get a correlation of r = 0.9; we may be tempted to say that r = 0.9 is
‘twice as good’ or ‘twice as strong’ as a correlation of r = 0.45. It may be noted that this comparison
is wrong. The strength of r is judged by the coefficient of determination, r2; for r = 0.9, r2 = 0.81. We multiply
it by 100, thus getting 81 per cent. This suggests that when r = 0.9, 81 per cent
of the total variation in the Y series can be attributed to the relationship with X.
Multiple regression is the most commonly utilized multivariate technique. It examines the
relationship between a single metric dependent variable and two or more metric independent
variables. The technique relies upon determining the linear relationship with the lowest sum of
squared residuals; therefore, assumptions of normality, linearity, and equal variance are carefully
observed. The beta coefficients (weights) are the marginal impacts of each variable, and the size
of the weight can be interpreted directly. Multiple regression is often used as a forecasting tool.
Logistic regression, a variation of multiple regression, allows non-metric (typically binary)
dependent variables, as the objective is to arrive at a probabilistic assessment of a binary choice.
The independent variables can be either discrete or continuous. A contingency table is produced,
which shows the classification of observations as to whether the observed and predicted events
match. The sum of events that were predicted to occur which actually did occur and the events
that were predicted not to occur which actually did not occur, divided by the total number of
events, is a measure of the effectiveness of the model. This tool helps predict the choices consumers
might make when presented with alternatives.
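The effectiveness measure described above (correctly predicted events and non-events divided by the total) can be sketched as follows. The observed and predicted outcomes are hypothetical and coded 1 for "event occurred" and 0 for "did not occur"; the predictions are simply given here rather than produced by a fitted logistic model.

```python
def contingency_table(observed, predicted):
    """Count observations by (observed, predicted) outcome."""
    table = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for o, p in zip(observed, predicted):
        table[(o, p)] += 1
    return table


def hit_ratio(observed, predicted):
    """Share of cases where observed and predicted outcomes match."""
    table = contingency_table(observed, predicted)
    return (table[(1, 1)] + table[(0, 0)]) / len(observed)


# Hypothetical outcomes for ten consumers (1 = bought, 0 = did not buy).
observed = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

table = contingency_table(observed, predicted)
ratio = hit_ratio(observed, predicted)
print(table)
print(f"hit ratio = {ratio:.2f}")
```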
In statistics, the coefficient of determination R2 is used in the context of statistical models whose
main purpose is the prediction of future outcomes on the basis of other related information. It is
the proportion of variability in a data set that is accounted for by the statistical model. It
provides a measure of how well future outcomes are likely to be predicted by the model.
There are several different definitions of R2 which are only sometimes equivalent. One class of
such cases includes that of linear regression. In this case, if an intercept is included then R2 is
simply the square of the sample correlation coefficient between the outcomes and their predicted
values, or in the case of simple linear regression, between the outcomes and the values of the
single regressor being used for prediction. In such cases, the coefficient of determination ranges
from 0 to 1. Important cases where the computational definition of R2 can yield negative values,
depending on the definition used, arise where the predictions which are being compared to the
corresponding outcomes have not been derived from a model-fitting procedure using those
data, and where linear regression is conducted without including an intercept. Additionally,
negative values of R2 may occur when fitting non-linear trends to data. In these instances, the
mean of the data provides a fit to the data that is superior to that of the trend under this goodness
of fit analysis.
In multiple regression analysis, the coefficient of determination is the proportion of the variation
in Y explained by the regression, which can be calculated as SSexplained/SStotal. In other words, this is
the proportion of variation in the criterion variable that is accounted for by the co-variations in the
predictor (independent) variables. The coefficient of determination of a multiple linear regression
model is the quotient of the variances of the fitted values and observed values of the dependent
variable. If we denote $y_i$ as the observed values of the dependent variable, $\bar{y}$ as its mean, and $\hat{y}_i$ as
the fitted value, then the coefficient of determination is:
$R^2 = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}$
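A small Python sketch of this ratio is given below for a simple least squares line with an intercept. The data are hypothetical and the function name is an assumption; for such a fit the result also equals the square of the correlation coefficient between x and y.

```python
def r_squared(xs, ys):
    """R^2 = explained sum of squares / total sum of squares for a fitted line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]
    ss_explained = sum((f - y_bar) ** 2 for f in fitted)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return ss_explained / ss_total


# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r2 = r_squared(xs, ys)
print(f"R^2 = {r2:.2f}")  # share of the variation in y explained by the line
```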
Self Assessment
11.5 Summary
If the coefficient of correlation calculated for bivariate data (Xi, Yi), i = 1, 2, ...... n, is
reasonably high and a cause and effect type of relation is also believed to be existing
between them, the next logical step is to obtain a functional relation between these variables.
The general form of the line of regression of Y on X is YCi = a + bXi, where YCi denotes the
average or predicted or calculated value of Y for a given value of X = Xi.
Multiple regression is a statistical technique that allows us to predict someone’s score
on one variable on the basis of their scores on several other variables.
There are several different definitions of R2 which are only sometimes equivalent. One
class of such cases includes that of linear regression.
The least-squares regression line is the line that makes the sum of the squares of the
vertical distances of the data points from the line as small as possible.
Non-parametric regression analysis traces the dependence of a response variable on one
or several predictors without specifying in advance the function that relates the response
to the predictors.
11.6 Keywords
Coefficient of determination: In statistics, the coefficient of determination R2 is used in the context
of statistical models whose main purpose is the prediction of future outcomes on the basis of
other related information.
Regression Equation: If the coefficient of correlation calculated for bivariate data (Xi, Yi), i = 1, 2,
...... n, is reasonably high and a cause and effect type of relation is also believed to be existing
between them, the next logical step is to obtain a functional relation between these variables.
1. Distinguish between correlation and regression. Discuss least square method of fitting
regression.
2. What do you understand by linear regression? Why are there two lines of regression?
Under what condition(s) can there be only one line?
3. What do you think as the reason behind the two lines of regression being different?
1. ei
2. Line of regression
3. Logistic regression
4. Coefficient of determination
CONTENTS
Objectives
Introduction
12.2.6 Compute
12.2.7 Make Decisions
12.5 P-values
12.6 Summary
12.7 Keywords
Objectives
After studying this unit, you will be able to:
Identify the steps involved in hypothesis testing;
Explain the statistical testing procedure;
Discuss the errors in hypothesis testing;
Explain the types of tests.
Introduction
A statistical hypothesis test is a method of making statistical decisions using experimental data.
In statistics, a result is called statistically significant if it is unlikely to have occurred by chance.
The phrase “test of significance” was coined by Ronald Fisher: “Critical tests of this kind may be
called tests of significance, and when such tests are available we may discover whether a second
sample is or is not significantly different from the first.”
Hypothesis testing is sometimes called confirmatory data analysis, in contrast to exploratory
data analysis. In frequency probability, these decisions are almost always made using
null-hypothesis tests; that is, ones that answer the question: assuming that the null hypothesis is
true, what is the probability of observing a value for the test statistic that is at least as extreme
as the value that was actually observed? One use of hypothesis testing is deciding whether
experimental results contain enough information to cast doubt on conventional wisdom.
If the researcher wants to infer something about the total population from which the sample was
taken, statistical methods are used to make inference. We may say that, while a hypothesis is
useful, it is not always necessary. Many a time, the researcher is interested in collecting and
analysing the data indicating the main characteristics without a hypothesis. Also, a hypothesis
may be rejected but can never be accepted except tentatively. Further evidence may prove it
wrong. It is wrong to conclude that since hypothesis was not rejected it can be accepted as valid.
What is a Null Hypothesis?
A null hypothesis is a statement about the population, whose credibility or validity the researcher
wants to assess on the basis of the sample.
A null hypothesis is formulated specifically to test for possible rejection or nullification. Hence
the name ‘null hypothesis’. Null hypothesis always states “no difference”. It is this null hypothesis
that is tested by the researcher.
1. Formulate the null hypothesis, with H0 and HA, the alternate hypothesis.
According to the given problem, H0 represents the value of some parameter of population.
The normal approach is to set two hypotheses instead of one, in such a way, that if one hypothesis
is true, the other is false. Alternatively, if one hypothesis is false or rejected, then the other is true
or accepted. These two hypotheses are:
(1) Null hypothesis
(2) Alternate hypothesis
Let us assume that the mean of the population is µ0 and the mean of the sample is $\bar{x}$. Since we
have assumed that the population has a mean of µ0, this is our null hypothesis. We write this as
H0: µ = µ0, where H0 is the null hypothesis. The alternate hypothesis is HA: µ ≠ µ0. The rejection of null
hypothesis will show that the mean of the population is not µ0. This implies that alternate
hypothesis is accepted.
Having formulated the hypothesis, the next step is to test its validity at a certain level of significance.
The confidence with which a null hypothesis is rejected or not rejected depends upon the significance
level. A significance level of say 5% means that the risk of rejecting a null hypothesis that is
actually true is 5%, i.e., the researcher is likely to make this error on 5 out of 100 occasions.
A significance level of say 1% means that the researcher runs this risk on only one out of every
100 occasions. Therefore, a 1% significance level provides greater confidence in a decision to
reject than a 5% significance level.
A hypothesis test may be one-tailed or two-tailed. In a one-tailed test, the test statistic for rejection
of the null hypothesis falls in only one tail of the sampling distribution curve.
Example:
In a right side test, the critical region lies entirely in the right tail of the sampling
distribution. Whether the test is one-sided or two-sided depends on the alternate
hypothesis.
A tyre company claims that mean life of its new tyre is 15,000 km. Now the researcher
formulates the hypothesis that tyre life is = 15,000 km.
A two-tailed test is one in which the test statistics leading to rejection of null hypothesis falls on
both tails of the sampling distribution curve as shown.
When we should apply a hypothesis test that is one-tailed or two-tailed depends on the nature
of the problem. One-tailed test is used when the researcher’s interest is primarily on one side of
the issue.
Example:
“Is the current advertisement less effective than the proposed new advertisement”?
A two-tailed test is appropriate, when the researcher has no reason to focus on one
side of the issue.
Example: “Are the two markets – Mumbai and Delhi different to test market a product?”
A product is manufactured by a semi-automatic machine. Now, assume that the
same product is manufactured by the fully automatic machine. This will be two-
sided test, because the null hypothesis is that “the two methods used for
manufacturing the product do not differ significantly”.
H0: µ1 = µ2
Sign in the alternate hypothesis and the corresponding type of test:
HA: µ1 < µ2    One-sided to left
HA: µ1 > µ2    One-sided to right
Degree of freedom tells the researcher the number of elements that can be chosen freely.
Example: (a + b)/2 = 5. If we fix a = 3, b has to be 7. Therefore, the degree of freedom is 1.
If the hypothesis pertains to a larger sample (30 or more), the Z-test is used. When the sample is
small (less than 30), the T-test is used.
12.2.6 Compute
Accepting or rejecting of the null hypothesis depends on whether the computed value falls in the
region of rejection at a given level of significance.
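The compute-and-decide step can be sketched in Python for a large-sample, two-tailed Z-test on the tyre-life claim of 15,000 km. The sample size, sample mean and standard deviation below are hypothetical, as is the function name; `statistics.NormalDist` from the standard library supplies the critical value.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(sample_mean, mu0, sigma, n, alpha):
    """Return (z, critical value, reject?) for H0: mu = mu0 vs HA: mu != mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, critical, abs(z) > critical


# Hypothetical sample: 64 tyres, mean life 14,700 km, standard deviation 1,200 km.
z, critical_5, reject_5 = two_tailed_z_test(14700, 15000, 1200, 64, alpha=0.05)
_, critical_1, reject_1 = two_tailed_z_test(14700, 15000, 1200, 64, alpha=0.01)

print(f"z = {z:.2f}")
print(f"5% level: critical = {critical_5:.2f}, reject H0: {reject_5}")
print(f"1% level: critical = {critical_1:.2f}, reject H0: {reject_1}")
```

The same computed value falls inside the rejection region at the 5% level but outside it at the 1% level, which shows how the decision depends on the chosen level of significance.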
Task Discuss when you would prefer a two-tailed test to a one-tailed test.
Self Assessment
3. A ………………test is one in which the test statistics leading to rejection of null hypothesis
falls on both tails of the sampling distribution curve.
4. ………………tells the researcher the number of elements that can be chosen freely.
In hypothesis testing, two types of error are possible: (1) rejecting a null hypothesis which is true,
and (2) accepting a null hypothesis which is false.
(1) is called Type 1 error (α), (2) is called Type 2 error (β). When α = 0.10 it means that a true
hypothesis will be accepted in 90 out of 100 occasions. Thus, there is a risk of rejecting a true
hypothesis in 10 out of every 100 occasions. To reduce the risk, use α = 0.01 which implies that we
are prepared to take a 1% risk i.e., the probability of rejecting a true hypothesis is 1%. It is
also possible that in hypothesis testing, we may commit Type 2 error (β) i.e., accepting a null
hypothesis which is false.
Notes The only way to reduce both Type 1 and Type 2 errors at the same time is by increasing the sample size.
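The trade-off between the two errors can be illustrated numerically. The sketch below assumes a one-sided Z-test of H0: µ = µ0 against a specific alternative µ = µ0 + effect with known σ; the function name and all numbers are hypothetical. Under these assumptions, β = Φ(z_α − effect·√n/σ), where Φ is the standard normal distribution function.

```python
from math import sqrt
from statistics import NormalDist


def type2_error(alpha, n, effect, sigma):
    """Probability of accepting H0 when the true mean is mu0 + effect."""
    standard_normal = NormalDist()
    critical = standard_normal.inv_cdf(1 - alpha)   # one-sided critical value
    return standard_normal.cdf(critical - effect * sqrt(n) / sigma)


beta_a = type2_error(alpha=0.05, n=25, effect=0.5, sigma=1.0)
beta_b = type2_error(alpha=0.01, n=25, effect=0.5, sigma=1.0)   # smaller alpha
beta_c = type2_error(alpha=0.01, n=100, effect=0.5, sigma=1.0)  # larger sample

print(f"alpha = 0.05, n = 25:  beta = {beta_a:.3f}")
print(f"alpha = 0.01, n = 25:  beta = {beta_b:.3f}")
print(f"alpha = 0.01, n = 100: beta = {beta_c:.3f}")
```

With the sample size fixed, lowering α raises β; only the larger sample brings both risks down together.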
Type 1 and Type 2 errors can be illustrated as follows. Suppose a marketing company has 2 distributors
(retailers) with varying capabilities. On the basis of capabilities, the company has grouped them
into two categories: (1) Competent retailer (2) Incompetent retailer. Thus R1 is a competent
retailer and R2 is an incompetent retailer. The firm wishes to award a performance bonus (as a
part of trade promotion) to encourage good retailership. Assume that two actions A1 and A2
represent the bonus or trade incentive being given and not given, respectively. This is shown as
follows:
Action                   R1 (competent)      R2 (incompetent)
A1 (bonus given)         Correct decision    Type 1 error
A2 (bonus not given)     Type 2 error        Correct decision
When the firm has failed to reward a competent retailer, it has committed a Type 2 error. On the
other hand, when it has rewarded an incompetent retailer, it has committed a Type 1 error.
(1) Parametric tests are more powerful. The data in this test is derived from interval and ratio
measurement.
(2) In parametric tests, it is assumed that the data follows normal distributions. Examples of
parametric tests are (a) Z-Test, (b) T-Test and (c) F-Test.
(3) Observations must be independent i.e., selection of any one item should not affect the
chances of any other item being included in the sample.
Univariate
If we wish to analyse one variable at a time, this is called univariate analysis. For example:
Effect of price on sales. Here, price is an independent variable and sales is a dependent
variable. Change the price and measure the sales.
Bivariate
12.4.2 Non-parametric Test
Non-parametric tests are used to test the hypothesis with nominal and ordinal data.
(3) The hypothesis of non-parametric test is concerned with something other than the value
of a population parameter.
(4) Easy to compute. There are certain situations, particularly in marketing research, where
the assumptions of parametric tests are not valid. For example: In a parametric test, we
assume that data collected follows a normal distribution. When this assumption does not
hold, non-parametric tests are used. Examples of non-parametric tests are (a) Binomial test
(b) Chi-Square test (c) Mann-Whitney U test (d) Sign test. A binomial test is used when the
population has only two classes such as male, female; buyers, non-buyers; success, failure etc.
All observations made about the population must fall into one of the two classes. The binomial
test is used when the sample size is small.
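An exact binomial test for a small sample can be sketched with the standard library alone. The scenario is hypothetical: 9 buyers are observed among 10 customers, and the null hypothesis is that buyers and non-buyers are equally likely (p = 0.5). The function names are assumptions, and the test shown is one-sided (upper tail).

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def upper_tail_p_value(k, n, p):
    """P(X >= k) under H0: success probability equals p."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


# Hypothetical sample: 9 buyers out of 10 customers, H0: p = 0.5.
p_value = upper_tail_p_value(9, 10, 0.5)
print(f"one-sided p-value = {p_value:.4f}")
print("reject H0 at the 5% level:", p_value < 0.05)
```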
Advantages
Disadvantages
Non-parametric test involves the greater risk of accepting a false hypothesis and thus committing
a Type 2 error.
12.5 P-values
A p-value is reported by almost every statistical procedure you run in SPSS. It is used in two ways:
(1) as a criterion level that you, the researcher, have decided in advance to use as the cutoff for
rejecting the null hypothesis, in which case you would ordinarily say something like “setting p at
p < .05 for one-tailed or two-tailed tests of significance allows some confidence that 95% of the
time, rejecting the null hypothesis will not be in error”; and more commonly, (2) as an expression
of inference uncertainty after you have run some test statistic regarding the strength of some
association or relationship between your independent and dependent variables, in which case, you
would say something like “the evidence suggests there is a statistically significant effect, however, p < .05
also suggests that 5% of the time, we should be uncertain about the significance of drawing any
statistical inferences.”
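For a test statistic that follows the standard normal distribution, the p-value can be computed as sketched below. The function name and the z value are hypothetical; the two-tailed p-value is taken as 2(1 − Φ(|z|)), where Φ is the standard normal distribution function.

```python
from statistics import NormalDist


def two_tailed_p_value(z):
    """Probability of a value at least as extreme as z, in either tail, under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


p = two_tailed_p_value(2.0)     # hypothetical computed test statistic
print(f"p-value = {p:.4f}")
print("significant at 5%:", p < 0.05)
print("significant at 1%:", p < 0.01)
```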
Task A study was conducted to measure the motivation level of each of the category of
managers. Formulate a hypothesis, suggesting testing procedures to show that there is no
relation between the category of managers and the level of motivation.
Self Assessment
Fill in the blanks:
5. To reduce the risk, use α = 0.01 which implies that we are prepared to take a …………..risk
7. ……………………test involves the greater risk of accepting a false hypothesis and thus
committing a Type 2 error.
12.6 Summary
Hypothesis testing is the use of statistics to determine the probability that a given
hypothesis is true.
The usual process of hypothesis testing consists of four steps.
Formulate the null hypothesis and the alternative hypothesis.
Identify a test statistic that can be used to assess the truth of the null hypothesis.
Compute the P-value, which is the probability that a test statistic at least as significant as
the one observed would be obtained assuming that the null hypothesis were true.
The smaller the P-value, the stronger the evidence against the null hypothesis.
Compare the P-value to an acceptable significance value α.
If P ≤ α, the observed effect is statistically significant, the null hypothesis is ruled out, and
the alternative hypothesis is valid.
12.7 Keywords
Alternate Hypothesis: An alternative hypothesis is one that specifies that the null hypothesis is
not true. The alternative hypothesis is false when the null hypothesis is true, and true when the
null hypothesis is false.
Null Hypothesis: The null hypothesis is a hypothesis which the researcher tries to disprove,
reject or nullify.
1. What hypothesis, test and procedure would you use when an automobile company has
manufacturing facility at two different geographical locations? Each location manufactures
two-wheelers of a different model. The customer wants to know if the mileage given by
both the models is the same or not. Samples of 45 numbers may be taken for this purpose.
2. What hypothesis, test and procedure would you use when a company has 22 sales
executives? They underwent a training programme. The test must evaluate whether the
sales performance is unchanged or improved after the training programme.
3. What hypothesis, test and procedure would you use when a company has three categories of
managers:
(a) With professional qualifications but without work experience.
(b) With professional qualifications accompanied by work experience.
(c) Without professional qualifications but with work experience.
1. Hypothesis
2. Null hypothesis
3. Two-tailed
4. Degree of freedom
5. 1%
6. Normal distributions
7. Non-parametric
8. P-value
CONTENTS
Objectives
Introduction
13.1 Small Sample Tests
13.1.1 T-test
13.1.2 Snedecor’s F-distribution
13.2 Large Sample Test
13.2.1 Z-test (Parametric Test)
13.2.2 Chi-square Test
13.2.3 ANOVA
13.3 Summary
13.4 Keywords
13.5 Review Questions
Objectives
Introduction
Tests for statistical significance are used to estimate the probability that a relationship observed
in the data occurred only by chance; the probability that the variables are really unrelated in the
population. They can be used to filter out unpromising hypotheses. In research reports, tests of
statistical significance are reported in three ways. First, the results of the test may be reported in
the textual discussion of the results. Include:
1. Hypothesis
Notes strength (and sometimes the direction) of the relationship. Each has its use, and they are best
when used together.
13.1.1 T-test
T-test is used in the following circumstances: when the sample size n < 30.
In this test, the sample size is less than 30 and the population standard deviations are not known. Using this test, we
can find out whether there is any significant difference between the two means.
Let $X_1, X_2, \ldots, X_n$ be n independent random variables from a normal population with mean $\mu$ and
standard deviation $\sigma$.
When $\sigma$ is not known, it is estimated by s, the sample standard deviation:
$$s = \sqrt{\frac{1}{n-1}\sum \left(X_i - \bar{X}\right)^2}$$
In such a case we would like to know the exact distribution of the statistic $\frac{\bar{X}-\mu}{s/\sqrt{n}}$, and the answer
to this is provided by the t-distribution.
W.S. Gosset defined the t statistic as $t = \frac{\bar{X}-\mu}{s/\sqrt{n}}$, which follows the t-distribution with (n − 1) degrees of
freedom.
Features of t-distribution
1. Like the $\chi^2$-distribution, the t-distribution also has one parameter $\nu = n - 1$, where n denotes
sample size. Hence, this distribution is known if n is known.
2. Mean of the random variable t is zero and standard deviation is $\sqrt{\frac{\nu}{\nu-2}}$, for $\nu > 2$.
5. The random variate t is defined as the ratio of a standard normal variate to the square root
of a $\chi^2$-variate divided by its degrees of freedom.
$$t = \frac{\bar{X}-\mu}{s/\sqrt{n}} = \frac{(\bar{X}-\mu)/(\sigma/\sqrt{n})}{\sqrt{s^2/\sigma^2}} = \frac{(\bar{X}-\mu)/(\sigma/\sqrt{n})}{\sqrt{\frac{1}{n-1}\sum\left(\frac{X_i-\bar{X}}{\sigma}\right)^2}} = \frac{\text{Standard Normal Variate}}{\sqrt{\chi^2\text{-variate}/(n-1)}}$$
Illustration: There are two nourishment programmes ‘A’ and ‘B’. Two groups of children are
subjected to these. Their weight is measured after six months. The first group of children, subjected
to programme ‘A’, weighed 44, 37, 48, 60, 41 kg at the end of the programme. The second group of
children were subjected to nourishment programme ‘B’ and their weights were 42, 42, 58, 64, 64, 67,
62 kg at the end of the programme. From the above, can we conclude that nourishment
programme ‘B’ increased the weight of the children significantly, at the 5% level of significance?
Null Hypothesis: There is no significant difference between nourishment programmes ‘A’ and
‘B’.
Solution:
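     x    x − x̄ (= x − 46)   (x − x̄)²      y    y − ȳ (= y − 57)   (y − ȳ)²
    44          −2               4         42         −15             225
    37          −9              81         42         −15             225
    48           2               4         58           1               1
    60          14             196         64           7              49
    41          −5              25         64           7              49
                                           67          10             100
                                           62           5              25
   230           0             310        399           0             674
Here $n_1 = 5$, $n_2 = 7$, $\sum x = 230$, $\sum y = 399$, $\sum (x-\bar{x})^2 = 310$ and $\sum (y-\bar{y})^2 = 674$.
$$\bar{x} = \frac{\sum x}{n_1} = \frac{230}{5} = 46, \qquad \bar{y} = \frac{\sum y}{n_2} = \frac{399}{7} = 57$$
$$s^2 = \frac{1}{n_1+n_2-2}\left[\sum (x-\bar{x})^2 + \sum (y-\bar{y})^2\right] = \frac{1}{10}\,(310 + 674) = 98.4$$
$$t = \frac{\bar{x}-\bar{y}}{\sqrt{s^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{46-57}{\sqrt{98.4\left(\frac{1}{5}+\frac{1}{7}\right)}} = \frac{-11}{\sqrt{98.4\times\frac{12}{35}}} = \frac{-11}{\sqrt{33.73}} = \frac{-11}{5.8} = -1.89$$
The table value of t at 10 d.f. at the 5% level is 1.81.
Since the calculated |t| = 1.89 is greater than 1.81, it is significant. Hence HA is accepted. Therefore the two
nutrition programmes differ significantly with respect to weight increase.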
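The arithmetic of this illustration can be checked with a few lines of Python. This is only a minimal sketch of the pooled two-sample t statistic, using the weights given in the illustration; the function name `pooled_t` is an illustrative choice, not something defined in these notes.

```python
import math

# Weights (kg) from the illustration: programme A and programme B
group_a = [44, 37, 48, 60, 41]
group_b = [42, 42, 58, 64, 64, 67, 62]

def pooled_t(x, y):
    """Return (t, degrees of freedom) for the pooled two-sample t-test."""
    n1, n2 = len(x), len(y)
    mean_x, mean_y = sum(x) / n1, sum(y) / n2
    ss_x = sum((v - mean_x) ** 2 for v in x)   # sum of squared deviations, sample 1
    ss_y = sum((v - mean_y) ** 2 for v in y)   # sum of squared deviations, sample 2
    df = n1 + n2 - 2
    s2 = (ss_x + ss_y) / df                    # pooled variance
    t = (mean_x - mean_y) / math.sqrt(s2 * (1 / n1 + 1 / n2))
    return t, df

t_stat, dof = pooled_t(group_a, group_b)
print(f"t = {t_stat:.2f} with {dof} d.f.")     # t = -1.89 with 10 d.f.
```

The sign is negative only because programme ‘A’ is entered first; it is the magnitude 1.89 that is compared with the table value.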
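Let there be two independent random samples of sizes $n_1$ and $n_2$ from two normal
populations with variances $\sigma_1^2$ and $\sigma_2^2$ respectively. Further, let
$$s_1^2 = \frac{1}{n_1-1}\sum\left(X_{1i}-\bar{X}_1\right)^2 \quad\text{and}\quad s_2^2 = \frac{1}{n_2-1}\sum\left(X_{2i}-\bar{X}_2\right)^2$$
be the variances of the first sample and the second sample respectively.
Then the F-statistic is defined as the ratio of two $\chi^2$-variates, each divided by its degrees of freedom. Thus, we can write
$$F = \frac{\chi^2_{n_1-1}/(n_1-1)}{\chi^2_{n_2-1}/(n_2-1)} = \frac{\dfrac{(n_1-1)\,s_1^2}{\sigma_1^2}\Big/(n_1-1)}{\dfrac{(n_2-1)\,s_2^2}{\sigma_2^2}\Big/(n_2-1)} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$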
Features of F-distribution
1. This distribution has two parameters $\nu_1$ (= $n_1 - 1$) and $\nu_2$ (= $n_2 - 1$).
2. The mean of the F-variate with $\nu_1$ and $\nu_2$ degrees of freedom is $\frac{\nu_2}{\nu_2-2}$ and the standard error is
$$\frac{\nu_2}{\nu_2-2}\sqrt{\frac{2\,(\nu_1+\nu_2-2)}{\nu_1\,(\nu_2-4)}}.$$
We note that the mean will exist if $\nu_2 > 2$ and the standard error will exist if $\nu_2 > 4$.
Further, the mean > 1.
3. The random variate F can take only positive values from 0 to $\infty$.
4. For large values of $\nu_1$ and $\nu_2$, the distribution approaches normal distribution.
5. If a random variate follows t-distribution with $\nu$ degrees of freedom, then its square
follows F-distribution with 1 and $\nu$ d.f., i.e. $t_\nu^2 = F_{1,\nu}$.
6. F and $\chi^2$ are also related as $F_{\nu_1,\nu_2} = \frac{\chi^2_{\nu_1}}{\nu_1}$ as $\nu_2 \to \infty$.
P1 = Proportion in sample 1
P2 = Proportion in sample 2
Example: You are working as a purchase manager for a company. The following
information has been supplied by two scooter tyres manufacturers.
                      Company A    Company B
Mean life (in km)     13000        12000
S.D. (in km)          340          388
Sample size           100          100
In the above, the sample size is 100, hence a Z-test may be used.
(b) Testing the hypothesis about difference between two means: This can be used when two
Example: In a city during the year 2000, 20% of households indicated that they read
‘Femina’ magazine. Three years later, the publisher had reasons to believe that circulation has
gone up. A survey was conducted to confirm this. A sample of 1,000 respondents were contacted
and it was found 210 respondents confirmed that they subscribe to the periodical ‘Femina’. From
the above, can we conclude that there is a significant increase in the circulation of ‘Femina’?
Solution:
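We will set up the null hypothesis and alternate hypothesis as follows:
Null Hypothesis H0: P = 20%
Alternate Hypothesis HA: P > 20%
This is a one-tailed (right) test.
With p the sample proportion, P the hypothesised proportion and n the sample size,
$$Z = \frac{p - P}{\sqrt{\dfrac{P(1-P)}{n}}}$$
$$Z = \frac{\dfrac{210}{1000} - 0.20}{\sqrt{\dfrac{0.20\,(1-0.20)}{1000}}} = \frac{0.21 - 0.20}{\sqrt{\dfrac{0.2 \times 0.8}{1000}}} = \frac{0.01}{\sqrt{\dfrac{0.16}{1000}}} = \frac{0.01}{0.4/31.62} = \frac{0.01}{0.01265} = 0.79$$
As the value of Z at 0.05 = 1.64 and the calculated value of Z = 0.79 is smaller, it does not fall in the rejection region. We
therefore do not reject the null hypothesis, and we cannot conclude that the circulation of ‘Femina’ has increased significantly.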
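As a check on the arithmetic, the one-proportion Z statistic can be recomputed directly from the figures in the example (210 readers out of 1,000, hypothesised proportion 0.20). The following is a minimal sketch; the function name `one_proportion_z` is an illustrative choice, and 1.64 is the one-tailed 5% value quoted in the text.

```python
import math

def one_proportion_z(successes, n, p0):
    """Z statistic for testing a sample proportion against a hypothesised value p0."""
    p_hat = successes / n                     # sample proportion
    std_error = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    return (p_hat - p0) / std_error

z = one_proportion_z(successes=210, n=1000, p0=0.20)
critical = 1.64                               # one-tailed 5% value quoted in the text
reject_h0 = z > critical

print(f"Z = {z:.2f}, reject H0: {reject_h0}")  # Z = 0.79, reject H0: False
```

With a numerator of 0.21 − 0.20 = 0.01, the statistic is about 0.79, which is below 1.64, so the sample does not show a significant increase at the 5% level.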
With the help of this test, we will come to know whether two or more attributes are associated
or not. How strongly the two attributes are related cannot be determined by the Chi-Square test. Suppose we have
a certain number of observations classified according to two attributes. We may like to know
whether a newly introduced medicine is effective in the treatment of a certain disease or not.
Caution One case where the distribution of the test statistic is an exact chi-square
distribution is the test that the variance of a normally-distributed population has a given
value based on a sample variance. Such a test is uncommon in practice because values of
The numbers of automobile accidents per month in a certain city were as follows:
Month              Jan  Feb  March  April  May  June  July  Aug  Sep  Oct
No. of accidents    12    8     20      2   14    10    15    6    9    4
Does the above data indicate that accident conditions were uniform during the 10-month period?
Expected frequency = (12 + 8 + 20 + 2 + 14 + 10 + 15 + 6 + 9 + 4) / 10 = 100 / 10 = 10
Computation
Null hypothesis: The accident occurrence is uniform over the 10-month period.
Month   Observed No.    Expected No.    O − E   (O − E)²   (O − E)² / E
        of accidents    of accidents
  1          12              10            2        4          0.4
  2           8              10           −2        4          0.4
  3          20              10           10      100         10.0
  4           2              10           −8       64          6.4
  5          14              10            4       16          1.6
  6          10              10            0        0          0.0
  7          15              10            5       25          2.5
  8           6              10           −4       16          1.6
  9           9              10           −1        1          0.1
 10           4              10           −6       36          3.6
Total       100             100            0                  26.6
$$\chi^2 = \sum \frac{(O-E)^2}{E} = 26.6$$
D.F. = 10 − 1 = 9
Table value at 5% for 9 degrees of freedom = 16.92
Since the calculated value of 26.6 is greater than the table value of 16.92, the null hypothesis is rejected at the 5%
level of significance.
Conclusion: The occurrence of accidents is not uniform over the 10-month period.
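The goodness-of-fit computation above can be reproduced in a few lines. This is a minimal sketch using the observed accident counts from the example; the critical value is taken from a standard chi-square table for 9 degrees of freedom at the 5% level rather than computed.

```python
# Observed accident counts for the 10 periods in the example
observed = [12, 8, 20, 2, 14, 10, 15, 6, 9, 4]

# Under the null hypothesis of uniformity, every period has the same expected count
expected = sum(observed) / len(observed)

# Chi-square statistic: sum of (O - E)^2 / E over all periods
chi_sq = sum((o - expected) ** 2 / expected for o in observed)
dof = len(observed) - 1

table_value = 16.92  # 5% critical value for 9 d.f., from a chi-square table
reject_h0 = chi_sq > table_value

print(f"chi-square = {chi_sq:.1f}, d.f. = {dof}, reject H0: {reject_h0}")
```

Because the expected count is the same for every period, only the observed list is needed as input.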
Task What hypothesis, test and procedure would you use in the following situations?
1. An automobile company has manufacturing facilities at two different geographical
locations. Each location manufactures two-wheelers of a different model. The customer
wants to know if the mileage given by both the models is the same or not. Samples
of 45 numbers may be taken for this purpose.
2. A company has 22 sales executives. They underwent a training programme. The test
must evaluate whether the sales performance is unchanged or improved after the
training programme.
Self Assessment
2. With the help of ……………. test, we will come to know whether two or more attributes
are associated or not.
13.2.3 ANOVA
(a) ANOVA: It is a statistical technique used to test the equality of three or more sample
means. Based on the means, an inference is drawn as to whether the samples belong to the same
population or not.
(b) Conditions for using ANOVA:
(1) Data should be quantitative in nature.
(2) Data should be normally distributed.
(3) Samples drawn from a population should follow random variation.
(c) ANOVA can be discussed in two parts:
(1) One-way classification
(2) Two and three-way classification.
(1) To compare the mileage achieved by different brands of automotive fuel.
(2) Compare the first year earnings of graduates of half a dozen top business schools.
Consider the following pricing experiment. Three prices are considered for a new toffee box
introduced by Nutrine company. Price of three varieties of toffee boxes are 39, 44 and 49.
The idea is to determine the influence of price levels on sales. Five supermarkets are selected to
What the manufacturer wants to know is: (1) Whether the difference among the means is significant?
If the difference is not significant, then the sale must be due to chance. (2) Do the means differ?
(3) Can we conclude that the three samples are drawn from the same population or not?
Two-way ANOVA
The procedure to be followed to calculate variance is the same as it is for the one-way classification.
The example of two-way classification of ANOVA is as follows:
Example: A firm has four types of machines – A , B, C and D. It has put four of its workers
on each machines for a specified period, say one week. At the end of one week, the average
output of each worker on each type of machine was calculated. These data are given below:
Illustration: Company ‘X’ wants its employees to undergo three different types of training
programme with a view to obtain improved productivity from them. After the completion of
the training programme, 16 new employees are assigned at random to three training methods
and the production performance was recorded.
The training manager’s problem is to find out if there are any differences in the effectiveness of
the training methods. The data recorded is as under:
Method 1    15  18  19  22  11
Method 2    22  27  18  21  17
Method 3    18  24  19  16  22  15
Sample variance $s_i^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$, where n is the number of observations under each method.
7. Calculate the number of degrees of freedom in the numerator of the F ratio using the equation d.f. =
(No. of samples − 1).
8. Calculate the number of degrees of freedom in the denominator of the F ratio using the equation
d.f. = $\sum n_i - k$.
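Solution:
1. Sample means
$$\bar{x}_1 = \frac{85}{5} = 17, \qquad \bar{x}_2 = \frac{105}{5} = 21, \qquad \bar{x}_3 = \frac{114}{6} = 19$$
2. Grand mean
$$\bar{\bar{x}} = \frac{15+18+19+22+11+22+27+18+21+17+24+19+16+22+15+18}{16} = \frac{304}{16} = 19$$
3. Between column variance
  n    x̄    grand mean   x̄ − grand mean   (x̄ − grand mean)²   n (x̄ − grand mean)²
  5    17       19            −2                 4              5 × 4 = 20
  5    21       19             2                 4              5 × 4 = 20
  6    19       19             0                 0              6 × 0 = 0
                                                                Total = 40
$$\text{Between column variance} = \frac{\sum n_i\left(\bar{x}_i - \bar{\bar{x}}\right)^2}{k-1} = \frac{40}{3-1} = 20$$
4. Sample variances
$$\sum (x-\bar{x}_1)^2 = 70, \qquad \sum (x-\bar{x}_2)^2 = 62, \qquad \sum (x-\bar{x}_3)^2 = 60$$
$$s_1^2 = \frac{70}{5-1} = 17.5, \qquad s_2^2 = \frac{62}{5-1} = 15.5, \qquad s_3^2 = \frac{60}{6-1} = 12$$
5. Within column variance
$$\sum \frac{n_i - 1}{\sum n_i - k}\, s_i^2 = \frac{4}{13}\times 17.5 + \frac{4}{13}\times 15.5 + \frac{5}{13}\times 12 = \frac{192}{13} = 14.77$$
6. $F = \dfrac{\text{Between column variance}}{\text{Within column variance}} = \dfrac{20}{14.77} = 1.354$
7. d.f. of Numerator = (3 − 1) = 2.
8. d.f. of Denominator = $\sum n_i - k$ = (16 − 3) = 13.
10. The table value of F at the 5% level for (2, 13) d.f. is 3.81. This is the upper limit of the acceptance region. Since the calculated value 1.354
lies within it, we can accept H0, the null hypothesis.
Conclusion: There is no significant difference in the effect of the three training methods.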
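The one-way ANOVA steps in this illustration can be sketched in Python as follows. The data are the three training-method samples given above; the function name `one_way_anova` and its return layout are illustrative assumptions, and the F table value is still read from a table rather than computed.

```python
# Production performance under the three training methods (from the illustration)
method_1 = [15, 18, 19, 22, 11]
method_2 = [22, 27, 18, 21, 17]
method_3 = [18, 24, 19, 16, 22, 15]

def one_way_anova(groups):
    """Return (between variance, within variance, F, (df numerator, df denominator))."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-column variance: weighted squared distance of group means from grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-column variance: squared deviations of observations from their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_num, df_den = k - 1, n_total - k
    between = ss_between / df_num
    within = ss_within / df_den
    return between, within, between / within, (df_num, df_den)

between, within, f_ratio, dfs = one_way_anova([method_1, method_2, method_3])
print(f"between = {between:.2f}, within = {within:.2f}, F = {f_ratio:.3f}, d.f. = {dfs}")
```

Pooling the squared deviations and dividing by 13 gives the same within-column variance as weighting the three sample variances by (n − 1)/13, which is how the worked solution presents it.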
Self Assessment
13.3 Summary
Testing the hypothesis about difference between two means: This can be used when two
population means are given and the null hypothesis is H0: μ1 = μ2.
ANOVA is a statistical technique used to test the equality of three or more sample
means. Based on the means, an inference is drawn as to whether the samples belong to the same
population or not.
13.4 Keywords
ANOVA: It is a statistical technique used to test the equality of three or more sample means.
Based on the means, an inference is drawn as to whether the samples belong to the same population or not.
Significance Level: Significance level is the criterion used for rejecting the null hypothesis.
Tests for statistical significance: Tests for statistical significance are used to estimate the
probability that a relationship observed in the data occurred only by chance; the probability
that the variables are really unrelated in the population.
1. Each person in a random sample of 50 was asked to state his/her sex and preferred colour.
The resulting frequencies are shown below.
A chi-square test is used to test the null hypothesis that sex and preferred colour are
independent. Will you reject the null hypothesis at the 0.005 level? Why/why not?
2. Are all employees equally prone to having accidents? To investigate this hypothesis,
Parry (1985) looked at a light manufacturing plant and classified the accidents by type and
by age of the employee.
                   Accident Type
Age            Sprain   Burn   Cut
Under 25          9      17      5
25 or over       61      13     12
A chi-square test gave a test-statistic of 20.78. If we test at α = 0.05, does the proportion of
sprains, cuts and burns seem to be similar for both age classes? Why/why not?
3. In hypothesis testing, β is the probability of committing an error of Type II. Is the power of
the test, 1 − β, then the probability of rejecting H0 when HA is true or not? Why?
4. In a statistical test of hypothesis, what would happen to the rejection region if α, the level of
significance, is reduced?
5. During the pre-flight check, Pilot Mohan discovers a minor problem - a warning light
indicates that the fuel gauge may be broken. If Mohan decides to check the fuel level by
hand, it will delay the flight by 45 minutes. If he decides to ignore the warning, the aircraft
may run out of fuel before it gets to Mumbai. In this situation, what would be:
(a) the appropriate null hypothesis? and;
(b) a type I error?
2. Chi-square
3. Quantitative
4. 2
Books Abrams, M.A, Social Surveys and Social Action, London: Heinemann, 1951.
Arthur, Maurice, Philosophy of Scientific Investigation , Baltimore: John Hopkins
University Press, 1943.
www.web-source.net
CONTENTS
Objectives
Introduction
14.6 Summary
14.7 Keywords
Objectives
Introduction
As the name indicates, multivariate analysis comprises a set of techniques dedicated to the
analysis of data sets with more than one variable. Several of these techniques were developed
recently in part because they require the computational capabilities of modern computers.
Multivariate analysis (MVA) is based on the statistical principle of multivariate statistics, which
involves observation and analysis of more than one statistical variable at a time. In design and
analysis, the technique is used to perform trade studies across multiple dimensions while taking
into account the effects of all variables on the responses of interest. Sometimes, marketers
will come across complex situations involving two or more variables. Bi-variate
analysis deals with situations involving two variables. Chi-Square is an example of bi-variate
analysis. In multi-variate analysis, the number of variables to be tackled is larger.
Example: The demand for television sets may depend not only on price, but also on the
income of households, advertising expenditure incurred by TV manufacturer and other similar
factors. To solve this type of problem, multivariate analysis is required.
In this analysis, two or more groups are compared. In the final analysis, we need to find out
whether the groups differ one from another.
1. Those who buy our brand and those who buy competitors’ brand.
2. Good salesman, poor salesman, medium salesman.
3. Those who go to Food World to buy and those who buy in a Kirana shop.
Suppose there is a comparison between the groups mentioned above along with demographic
and socio-economic factors, then discriminant analysis can be used. One way of doing this is to
proceed and calculate the income, age, educational level, so that the profile of each group could
be determined. Comparing the two groups based on one variable alone would be informative
but it would not indicate the relative importance of each variable in distinguishing the groups.
This is because several variables within the group will have some correlation which means that
one variable is not independent of the other.
If we are interested in segmenting the market using income and education, we would be
interested in the total effect of two variables in combinations, and not their effects separately.
Further, we would be interested in determining which of the variables are more important or
had a greater impact. To summarize, we can say, that Discriminant Analysis can be used when
we want to consider the variables simultaneously to take into account their interrelationship.
Like regression, the value of the dependent variable is calculated by using the data of the independent
variables.
$$Z = b_1 x_1 + b_2 x_2 + b_3 x_3 + \ldots$$
Z = Discriminant score
$b_i$ = Discriminant weight for variable i
$x_i$ = Independent variable i
As can be seen in the above, each independent variable is multiplied by its corresponding
weight.
This results in a single composite discriminant score for each individual. By taking the average
of discriminant score of the individuals within a certain group, we create a group mean. This is
known as centroid. If the analysis involves two groups, there are two centroids. This is very
similar to multiple regression, except that different types of variables are involved.
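The score-and-centroid idea described above can be sketched as follows. All numbers here are hypothetical and chosen only for illustration: the weights, the two groups and the three variables (imagined as standardised income, age and education) do not come from these notes, and in practice the weights would be estimated by a statistical package.

```python
def discriminant_score(x, b):
    """Z = b1*x1 + b2*x2 + ... for one individual."""
    return sum(bi * xi for bi, xi in zip(b, x))

def centroid(group, b):
    """Group mean of the discriminant scores of all individuals in the group."""
    scores = [discriminant_score(x, b) for x in group]
    return sum(scores) / len(scores)

# Hypothetical discriminant weights for three independent variables
weights = [0.5, 0.25, 0.25]

# Hypothetical individuals: each row holds values of (x1, x2, x3)
buyers = [[2.0, 1.0, 1.0], [4.0, 2.0, 2.0]]
non_buyers = [[0.0, 1.0, 1.0], [2.0, 1.0, 3.0]]

print("Centroid of buyers:    ", centroid(buyers, weights))      # 2.25
print("Centroid of non-buyers:", centroid(non_buyers, weights))  # 1.25
```

With two groups there are two centroids, and the further apart they lie, the better the chosen variables separate the groups.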
Application: A company manufacturing FMCG products introduces a sales contest among its
marketing executives to find out “How many distributors can be roped in to handle the company’s
product”. Assume that this contest runs for three months. Each marketing executive is given
target regarding number of new distributors and sales they can generate during the period. This
target is fixed and based on the past sales achieved by them about which, the data is available in
the company. It is also announced that marketing executives who add 15 or more distributors
will be given a Maruti omni-van as prize. Those who generate between 5 and 10 distributors
will be given a two-wheeler as the prize. Those who generate less than 5 distributors will get
nothing. Now assume that 5 marketing executives won a Maruti van and 4 won a two-wheeler.
The company now wants to find out, “Which activities of the marketing executive made the
difference in terms of winning a prize and not winning the prize”. One can proceed in a number
of ways. The company could compare those who won the Maruti van against the others.
Alternatively, the company might compare those who won, one of the two prizes against those
who won nothing. It might compare each group against each of the other two.
Discriminant analysis will highlight the difference in activities performed by the members of each group
to get the prize. The activities might include:
1. More number of calls made to the distributors.
2. More personal visits to the distributors with advance appointments.
3. Use of better convincing skills.
Discriminant Analysis
1. What variable discriminates various groups as above; the number of groups could be two
or more. Dealing with more than two groups is called Multiple Discriminant Analysis
(M.D.A).
2. Can discriminating variables be chosen to forecast the group to which the brand/person/
Self Assessment
Example: Common factor – Inconvenience inside a car. The components may be:
1. Leg room.
2. Seat arrangement.
The questionnaire may be administered to 5,000 respondents. The opinion of the customer is
gathered. Let us allot points 1 to 10 for the variables A to F, where 1 is the lowest and 10 is the
highest. Let us assume that application of factor analysis has led to grouping the variables as
follows:
A, B, D, E into Factor 1
F into Factor 2
C into Factor 3
Where can Cluster Analysis be applied?
segment sizes. Industries, where this technique is useful include automobiles, retail stores,
insurance, B-to-B, durables and packaged goods. Some of the well-known frameworks in consumer
An FMCG company wants to map the profile of its target audience in terms of lifestyle,
attitude and perceptions.
A consumer durable company wants to know the features and services a consumer takes
into account, when purchasing through catalogues.
A housing finance corporation wants to identify and cluster the basic characteristics,
lifestyles and mindset of persons who would be availing housing loans. Clustering can be
done based on parameters such as interest rates, documentation, processing fee, number
of installments etc.
Process
There are two ways in which Cluster Analysis can be carried out:
1. First, objects/respondents are segmented into a pre-decided number of clusters. In this
case, the non-hierarchical method can be used, which partitions data into the
specified number of clusters.
2. The second method is called the hierarchical method.
The above two are basic approaches used in cluster analysis. This can be used to segment
customer groups for a brand or product category, or to segment retail stores into similar groups
based on selected variables.
Interpretation of Results
Ideally, the variables should be measured on an interval or ratio scale. This is because the
clustering techniques use the distance measure to find the closest objects to group into a cluster.
An example of its use can be clustering of towns similar to each other which will help decide
3. Computing the similarities among the entities.
4. Arrange the cluster in a hierarchy.
The example below shows Cluster Analysis based on three dimensions: age, income and family
size. Cluster Analysis is used to segment the car-buying population in a Metro. For example, Cluster “A”
might represent potential buyers of low end cars, e.g. Maruti 800 (for the common man).
These are people who are graduating from the two-wheeler market segment. Cluster “B” may
represent the mid-population segment buying Zen, Santro, Alto etc. Cluster “C” represents car
buyers who belong to the upper strata of society, i.e. buyers of Lancer, Honda City etc. Cluster “D”
represents the super-rich cluster, i.e. buyers of Benz, BMW etc.
Figure 14.1: Matching Measure
Example: Suppose there are seven attributes, 1 to 7, on which we are judging two objects
A and B. The existence of an attribute may be indicated by 1 and its absence by 0. In this way, two
objects are viewed as similar if they share common attributes.
Table
Attribute    1  2  3  4  5  6  7
Brand – A    1  0  0  1  0  0  1
Brand – B    0  0  1  1  1  0  0
$$S = \frac{a + d}{a + b + c + d}$$
Where a = No. of attributes possessed by both brands A and B
b = No. of attributes possessed by brand A but not by brand B
c = No. of attributes possessed by brand B but not by brand A
d = No. of attributes possessed by neither brand.
Substituting, we get
$$S = \frac{1 + 2}{1 + 2 + 2 + 2} = \frac{3}{7} = 0.43$$
The association between A and B is to the extent of 43%.
It is now clear that object A possesses attributes 1, 4 and 7 while object B possesses attributes 3,
4 and 5. A glance at the above table will indicate that objects A and B are similar in respect of attributes 2
(0 & 0), 6 (0 & 0) and 4 (1 & 1). In respect of the other attributes, there is no similarity between A and
B. Now we can arrive at a simple matching measure by (a) counting up the total number of
matches, either (0, 0) or (1, 1), and (b) dividing this number by the total number of attributes.
Symbolically, $S_{AB} = M / N$
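The matching measure for the two brands in the table can be computed as in the sketch below. The attribute vectors are taken from the table in this example; the function name `simple_matching` is an illustrative choice.

```python
# Presence (1) / absence (0) of each attribute, from the table in the example
brand_a = [1, 0, 0, 1, 0, 0, 1]
brand_b = [0, 0, 1, 1, 1, 0, 0]

def simple_matching(u, v):
    """Return (S, (a, b, c, d)) for two 0/1 attribute vectors of equal length."""
    a = sum(1 for p, q in zip(u, v) if p == 1 and q == 1)  # possessed by both
    b = sum(1 for p, q in zip(u, v) if p == 1 and q == 0)  # only the first object
    c = sum(1 for p, q in zip(u, v) if p == 0 and q == 1)  # only the second object
    d = sum(1 for p, q in zip(u, v) if p == 0 and q == 0)  # possessed by neither
    return (a + d) / (a + b + c + d), (a, b, c, d)

s, counts = simple_matching(brand_a, brand_b)
print(f"a, b, c, d = {counts}; S = {s:.2f}")   # a, b, c, d = (1, 2, 2, 2); S = 0.43
```

Here M = a + d is the number of matches and N = a + b + c + d is the total number of attributes, so the two formulas in the text give the same value.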
Self Assessment
Conjoint analysis is concerned with the measurement of the joint effect of two or more attributes
that are important from the customers’ point of view. In a situation where the company would
like to know the most desirable attributes or their combination for a new product or service, the
use of conjoint analysis is most appropriate.
Example: An airline would like to know, which is the most desirable combination of
attributes to a frequent traveller: (a) Punctuality (b) Air fare (c) Quality of food served on the
flight and (d) Hospitality and empathy shown.
Conjoint Analysis is a multivariate technique that captures the exact levels of utility that an
individual customer places on various attributes of the product offering. Conjoint Analysis
enables a direct comparison.
Example: A comparison between the utility of a price level of 400 versus 500, a
delivery period of 1 week versus 2 weeks, or an after-sales response of 24 hours versus 48 hours.
Once we know the utility levels for each attribute (and at individual levels as well), we can
combine these to find the best combination of attributes that gives the customer the highest
utility, the second best combination that gives the second highest utility, and so on. This
information is then used to design a product or service offering.
Application
Conjoint Analysis is extremely versatile and the range of applications includes virtually any
industry. New product or service design, including the concepts in the pre-prototyping stage
Some examples of other areas where this technique can be used are:
Process
Design attributes for a product are first identified. For a shirt manufacturer, these could be
design, such as designer shirts versus plain shirts, and price, such as 400 versus 800. The outlets can have
exclusive distribution or mass distribution. All possible combinations of these attribute levels
are then listed out. Each design combination will be ranked by customers and used as input data
for Conjoint Analysis. Then the utility of the products relative to price can be measured.
The output is a part-worth or utility for each level of each attribute. For example, the designer shirt may
get a utility level of 5 and the plain shirt, 7.5. Similarly, exclusive distribution may have a part utility
of 2, and mass distribution, 5.8. We then put together the part utilities and come up with a total
utility for any product combination we want to offer, and compare that with the maximum
utility combination for this customer segment.
This process clarifies for the marketer which attributes of the product or service
they should focus on in the design.
If a retail store finds that the height of a shelf is an important attribute for selling at a particular
level, a well-designed shelf may result from this knowledge. Similarly, a designer of clocks will
benefit from knowing the utility attached by customers to the dial size, background colours, and
price range of the clocks.
Approach
From a discussion with the client, identify the design attributes to be studied and the levels at
which they can be offered. Then build a list of product concepts on offer. These product concepts
are then ranked by customers. Once this data is available, use Conjoint Analysis to derive the
part utilities of each attribute level. This is then used to predict the best product design for the
given customer segment. Use the SPSS Conjoint procedure to analyse the data.
1 = Most preferred, 8 = Least preferred
Combination Rank
One combination (3 kg, 4 hours, Dell) clearly dominates and (5 kg, 2 hours, Lenovo) is least
preferred.
Let us now take the average rank for the 3 kg option = (4 + 3 + 2 + 1) / 4 = 2.5
For the 5 kg option, the average rank is (5 + 8 + 7 + 6) / 4 = 6.5
For the 4 hour option, (5 + 3 + 7 + 1) / 4 = 4
For the 2 hour option, (4 + 8 + 2 + 6) / 4 = 5
For Dell, (5 + 6 + 1 + 2) / 4 = 3.5
For Lenovo, 5.5
Looking at the difference in average ranks, the most important characteristic to this
respondent is weight = 4, followed by brand name = 2 and battery life = 1.
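The average-rank comparison can be sketched in Python. The rank of each laptop combination below is an assumption: the rank table itself is not reproduced in this text, so the assignment has been inferred from the sums quoted above (it is the only assignment consistent with all of them). The dictionary layout and function names are illustrative.

```python
# Rank of each (weight, battery life, brand) combination; 1 = most preferred.
# Assumed: inferred from the sums quoted in the text, not copied from a table.
ranks = {
    ("3 kg", "4 hours", "Dell"): 1,
    ("3 kg", "2 hours", "Dell"): 2,
    ("3 kg", "4 hours", "Lenovo"): 3,
    ("3 kg", "2 hours", "Lenovo"): 4,
    ("5 kg", "4 hours", "Dell"): 5,
    ("5 kg", "2 hours", "Dell"): 6,
    ("5 kg", "4 hours", "Lenovo"): 7,
    ("5 kg", "2 hours", "Lenovo"): 8,
}

def average_rank(position, level):
    """Average rank of all combinations having `level` at attribute `position`."""
    selected = [r for combo, r in ranks.items() if combo[position] == level]
    return sum(selected) / len(selected)

def importance(position):
    """Spread between the best and worst average rank across an attribute's levels."""
    levels = {combo[position] for combo in ranks}
    averages = [average_rank(position, lv) for lv in levels]
    return max(averages) - min(averages)

for name, position in [("weight", 0), ("battery life", 1), ("brand name", 2)]:
    print(f"{name}: importance = {importance(position)}")
```

A larger spread means that changing the level of that attribute moves the respondent's ranking more, which is why weight comes out as the most important characteristic.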
Self Assessment
In addition to fulfilling the goals of detecting underlying structure and data reduction that it
shares with other methods, multidimensional scaling (MDS) provides the researcher with a
spatial representation of data that can facilitate interpretation and reveal relationships. Therefore,
we can define MDS as “a set of multivariate statistical methods for estimating the parameters in
and assessing the fit of various spatial distance models for proximity data.”
The spatial display of data provided by MDS is why it is also sometimes referred to as perceptual
mapping. MDS has much more flexibility about the types of data that can be used to generate the
solution. Almost any measures of similarity and dissimilarity can be used, depending on what
your statistical computer software will accept.
2. Non-metric
Metric MDS makes the assumption that the input data is either ratio or interval data, while the
non-metric model requires simply that the data be in the form of ranks. Therefore, the
non-metric model has fewer restrictions than the metric model, but also less rigor. One technique
to use if you are unsure whether your data is ordinal or can be considered interval is to try both
metric and non-metric models. If the results are very close, the metric model may be used.
An advantage of the non-metric models is that they permit the researcher to categorize and
examine preference data, such as the kind obtained in marketing studies or other areas where
We have already seen that MDS can accept more different measures of similarity and dissimilarity
than factor analysis techniques can. In addition, there are some differences in terminology.
These differences reflect the origin of MDS in the field of psychology. The measures corresponding
to factors are called either dimensions or stimulus coordinates.
The output of MDS looks very similar to that of factor analysis and the determination of the
optimal number of dimensions is handled in much the same way.
MDS solution. When the results are mapped in two dimensions, the solution will reproduce a
conventional map, except that the MDS plot might need to be rotated so that the north-south and
east-west dimensions conform to expectations. However, once the rotation is completed, the
configuration of the cities will be spatially correct.
Task Which technique would you use to measure the joint effect of various attributes
while designing an automobile loan and why?
Self Assessment
14.6 Summary
Some of the multi variate analysis are discriminant analysis, Factor analysis, Cluster
In discriminant analysis, it is verified whether the 2 groups differ from one another.
Factor analysis is used to reduce large no of various factors into fewer variables cluster
14.7 Keywords
Cluster analysis: Cluster Analysis is a technique used for classifying objects into groups.
Discriminant analysis: In this analysis, two or more groups are compared. In the final analysis,
we need to find out whether the groups differ one from another.
Multivariate analysis: In multivariate analysis, the number of variables to be tackled is
large.
7. Which multivariate analysis would you apply to identify specific customer segment for a
company’s brand and why?
1. Multivariate
2. Categorize, examine
3. Discriminant
4. Increases
5. Descriptive/exploratory
6. Standardized
7. Cluster
8. Marketing
9. Conjoint
10. Attributes
12. Output