Research Methodology 4

This document provides an overview of data analysis techniques for research methodology. It discusses univariate analysis which involves a single variable, bivariate analysis which involves two variables, and multivariate analysis which involves three or more variables. Specific techniques are described for each type of analysis including frequency distributions, measures of central tendency, hypothesis testing, ANOVA, correlation, regression, and various graphical and statistical methods. The objectives are to understand concepts like data analysis, hypothesis testing, and specific multivariate techniques like discriminant analysis, factor analysis, and cluster analysis.

Research Methodology 65

Unit 8 - Data analysis and hypothesis testing


Structure
8.1 Introduction
8.2 Analysis
8.3 ANOVA and Design of Experiments
8.4 Correlation and Regression: Explaining Association and Causation
8.5 Discriminant Analysis for Classification and Prediction
8.6 Factor Analysis for Data Reduction
8.7 Cluster Analysis for market segmentation
8.8 Conjoint Analysis for Product Design
8.9 Hypothesis Testing
8.10 Summary

Objectives
* To understand the concept of data analysis
* To define simple tabulation and cross tabulation
* To discuss ANOVA and the design of experiments
* To illustrate correlation and regression: explaining association and causation
* To understand the concept of discriminant analysis for classification and prediction
* To discuss factor analysis for data reduction
* To discuss cluster analysis for market segmentation
* To introduce conjoint analysis for product design
* To understand the concept of hypothesis testing

8.1 Introduction
Once the data have been collected in the form of filled-in questionnaires, the next step
is to process them. This can be done either manually or with the help of a computer. Some
of the packages that can be used for analysis are SAS, SPSS, STATISTICA and SYSTAT.
This chapter covers the complete analysis of the collected data. The approach to
analysis differs according to the variables involved.

8.2 Analysis
Analysis of data is the process by which data is converted into useful information. Raw
data as collected from questionnaires cannot be used unless it is processed in some way to
make it amenable to drawing conclusions. Various techniques of data analysis are available.

Types of Analysis
1. Univariate, involving a single variable at a time.
2. Bivariate, involving two variables at a time.
3. Multivariate, involving three or more variables at a time.
The choice of which of the above types of data analysis to use depends on at least three
factors, viz.
Amity Directorate of Distance & Online Education
a. Scale of data
b. The research design
c. Assumptions about the test statistic being used

Univariate and Bivariate Analysis:


Univariate analysis is a single-variable analysis. In a questionnaire-based marketing
research project, each question usually represents a variable under study. Bivariate analysis
involves two variables at a time. Two different questions may represent two variables. If
these two variables are analyzed together, it is an example of bivariate analysis.

Univariate analysis is the simplest form of quantitative analysis in market research.
The analysis is carried out by describing a single variable and its attributes. For
example, if the variable age were the subject of the analysis, the researcher would look
at how many subjects fall into each age category.

Univariate analysis contrasts with bivariate analysis (the analysis of two variables
simultaneously) and multivariate analysis (the analysis of multiple variables simultaneously).
Univariate analysis is used primarily for descriptive purposes, while bivariate and
multivariate analysis are geared more towards explanatory purposes. Univariate analysis is
commonly used in the first stages of research, in analyzing the data at hand, before being
supplemented by more advanced, inferential bivariate or multivariate analysis.

A basic way of presenting univariate data is to create a frequency distribution, which
involves presenting the number of cases observed in the sample for each attribute
(category) of the variable studied. This can be done in a table format, with a bar chart
or a similar form of graphical representation. There are several tools used in univariate
analysis; their applicability depends on whether we are dealing with a continuous variable
(such as age) or a discrete variable (such as gender).
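
As an illustration of the frequency distribution described above, the following is a minimal Python sketch. The ages and the 10-year grouping are hypothetical values chosen for the example; they do not come from the text.

```python
from collections import Counter

# Hypothetical sample of respondent ages (illustrative data only).
ages = [23, 35, 41, 29, 52, 38, 27, 45, 33, 61]


def age_group(age):
    """Map a continuous age to an assumed 10-year category label."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


# Count how many cases fall into each age category.
frequency = Counter(age_group(a) for a in ages)

# Print the frequency table in category order.
for group in sorted(frequency):
    print(group, frequency[group])
```

A continuous variable such as age has to be grouped into categories before it can be tabulated this way; a discrete variable can be counted directly.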

In addition to frequency distribution, univariate analysis commonly involves reporting
measures of central tendency (location). This involves describing the way in which
quantitative data tend to cluster around some value. In the univariate analysis, the measure
of central tendency is an average of a set of measurements, the word average being
variously construed as (arithmetic) mean, median, mode or other measure of location,
depending on the context.

Another set of measures used in univariate analysis, complementing the study of
central tendency, involves studying the statistical dispersion. These measurements
look at how the values are distributed around the values of central tendency. The dispersion
measures most often studied are the range, the interquartile range, and the standard
deviation.
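
The measures of central tendency and dispersion named above can be computed with Python's standard `statistics` module. This is only a sketch: the `scores` list is hypothetical, and the quartiles depend on the interpolation method the module uses by default.

```python
import statistics

# Hypothetical set of measurements (illustrative data only).
scores = [2, 4, 4, 5, 7, 9, 10, 12]

# Measures of central tendency.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion.
value_range = max(scores) - min(scores)
q1, q2, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
iqr = q3 - q1                                   # interquartile range
stdev = statistics.stdev(scores)                # sample standard deviation

print(mean, median, mode, value_range, iqr, stdev)
```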

Classification of univariate data analysis techniques:

Univariate Data
  Nonmetric Data
    One sample: Frequency, Chi-square, K-S, Runs, Binomial
    Two or more samples
      Independent: Chi-square, Mann-Whitney, Median, K-S, K-W ANOVA
      Related: Sign, Wilcoxon, McNemar, Chi-square
  Metric Data
    One sample: t test, z test
    Two or more samples
      Independent: Two-group t test, z test, One-way ANOVA
      Related: Paired t test

Multivariate Analysis

What is multivariate analysis?

Multivariate analysis is the analysis of the simultaneous relationships among three or
more phenomena. In a univariate analysis the focus is on the level (average) and
distribution (variance) of the phenomenon, while in a bivariate analysis the focus shifts
to the degree of relationship (correlation or covariance) between two phenomena.
In a multivariate analysis, the focus shifts from paired relationships to the more complex
simultaneous relationships among phenomena.

Multivariate analysis (MVA) is based on the statistical principle of multivariate
statistics, which involves observation and analysis of more than one statistical variable
at a time. In design and analysis, the technique is used to perform trade studies
across multiple dimensions while taking into account the effects of all variables on
the responses of interest.

Uses for multivariate analysis include:

* Design for capability (also known as capability-based design)
* Inverse design, where any variable can be treated as an independent variable
* Analysis of alternatives, the selection of concepts to fulfill a customer need
* Analysis of concepts with respect to changing scenarios
* Identification of critical design drivers and correlations across hierarchical levels

Multivariate analysis can be complicated by the desire to include physics-based
analysis to calculate the effects of variables for a hierarchical “system-of-systems.” Often,
studies that wish to use multivariate analysis are stalled by the dimensionality of the
problem. These concerns are often eased through the use of surrogate models, highly
accurate approximations of the physics-based code. Since surrogate models take the form
of an equation, they can be evaluated very quickly. This becomes an enabler for large-
scale MVA studies: while a Monte Carlo simulation across the design space is difficult with
physics-based codes, it becomes trivial when evaluating surrogate models, which often
take the form of response surface equations.

Decision Analyst
In order to understand multivariate analysis, it is important to understand some of the
terminology. A variate is a weighted combination of variables. The purpose of the analysis
is to find the best combination of weights. Nonmetric data refers to data that are either
qualitative or categorical in nature. Metric data refers to data that are quantitative, and
interval or ratio in nature.

Initial step-data quality


Before launching into an analysis technique, it is important to have a clear
understanding of the form and quality of the data. The form of the data refers to
whether the data are nonmetric or metric. The quality of the data refers to how
normally distributed the data are. The first few techniques discussed are sensitive
to the linearity, normality, and equal variance assumptions of the data. Examinations
of distribution, skewness, and kurtosis are helpful in examining distribution. Also,
it is important to understand the magnitude of missing values in observations and
to determine whether to ignore them or impute values to the missing observations.
Another data quality measure is outliers, and it is important to determine whether
the outliers should be removed. If they are kept, they may cause a distortion to the
data; if they are eliminated, they may help with the assumptions of normality. The
key is to attempt to understand what the outliers represent.

Multiple Regression Analysis


Multiple regression is the most commonly utilized multivariate technique. It examines
the relationship between a single metric dependent variable and two or more metric
independent variables. The technique relies upon determining the linear relationship with
the lowest sum of squared residuals; therefore, assumptions of normality, linearity, and
equal variance are carefully observed. The beta coefficients (weights) are the marginal
impacts of each variable, and the size of the weight can be interpreted directly. Multiple
regression is often used as a forecasting tool.

Logistic Regression Analysis


Sometimes referred to as “choice models,” this technique is a variation of multiple
regression that allows for the prediction of an event. It is allowable to utilize nonmetric
(typically binary) dependent variables, as the objective is to arrive at a probabilistic
assessment of a binary choice. The independent variables can be either discrete or
continuous. A contingency table is produced, which shows the classification of observations
as to whether the observed and predicted events match. The sum of events that were
predicted to occur which actually did occur and the events that were predicted not to occur
which actually did not occur, divided by the total number of events, is a measure of the
effectiveness of the model. This tool helps predict the choices consumers might make
when presented with alternatives.
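
The effectiveness measure described above (correct predictions divided by total events) can be sketched as follows. The classification counts and the labels `buy` and `no_buy` are hypothetical, chosen only to show the arithmetic.

```python
# Hypothetical classification table: (observed, predicted) -> number of cases.
table = {
    ("buy", "buy"): 40,        # predicted to occur and did occur
    ("buy", "no_buy"): 10,     # occurred but was not predicted
    ("no_buy", "buy"): 15,     # predicted but did not occur
    ("no_buy", "no_buy"): 35,  # predicted not to occur and did not occur
}

# Cases where the observed and predicted events match.
correct = sum(n for (observed, predicted), n in table.items()
              if observed == predicted)
total = sum(table.values())

# Share of events classified correctly.
hit_ratio = correct / total
print(hit_ratio)  # 0.75
```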

Discriminant Analysis
The purpose of discriminant analysis is to correctly classify observations or people
into homogeneous groups. The independent variables must be metric and must have a
high degree of normality. Discriminant analysis builds a linear discriminant function, which
can then be used to classify the observations. The overall fit is assessed by looking at the
degree to which the group means differ (Wilks' lambda or D²) and how well the model
classifies. To determine which variables have the most impact on the discriminant function,
it is possible to look at partial F values. The higher the partial F, the more impact that
variable has on the discriminant function. This tool helps categorize people, like buyers
and nonbuyers.

Multivariate Analysis of Variance (MANOVA)


This technique examines the relationship between several categorical independent
variables and two or more metric dependent variables. Whereas analysis of variance
(ANOVA) assesses the differences between groups (by using t tests for 2 means and
F tests between 3 or more means), MANOVA examines the dependence relationship
between a set of dependent measures across a set of groups. Typically this analysis
is used in experimental design, and usually a hypothesized relationship between
dependent measures is used. This technique is slightly different in that the independent
variables are categorical and the dependent variables are metric. Sample size is an
issue, with 15-20 observations needed per cell. However, with too many observations per
cell (over 30), the technique loses its practical significance. Cell sizes should be
roughly equal, with the largest cell having less than 1.5 times the observations of the
smallest cell. That is because, in this technique, normality of the dependent variables
is important. The model fit is determined by examining mean vector equivalents across
groups. If there is a significant difference in the means, the null hypothesis can be
rejected and treatment differences can be determined.

Factor Analysis
When there are many variables in a research design, it is often helpful to reduce the
variables to a smaller set of factors. This is an interdependence technique, in which there is
no dependent variable. Rather, the researcher is looking for the underlying structure of the
data matrix. Ideally, the independent variables are normal and continuous, with at least
3 to 5 variables loading onto a factor. The sample size should be over 50 observations,
with over 5 observations per variable. Multicollinearity is generally preferred between the
variables, as the correlations are key to data reduction. Kaiser's Measure of Sampling
Adequacy (MSA) is a measure of the degree to which every variable can be predicted
by all other variables. An overall MSA of .80 or higher is very good, with a measure of
under .50 deemed poor.

There are two main factor analysis methods: common factor analysis, which extracts
factors based on the variance shared by the variables, and principal component analysis,
which extracts factors based on the total variance of the variables. Common factor analysis
is used to look for the latent (underlying) factors, whereas principal components analysis
is used to find the fewest number of variables that explain the most variance. The first
factor extracted explains the most variance. Typically, factors are extracted as long as the
eigenvalues are greater than 1.0 or the scree test visually indicates how many factors
to extract. The factor loadings are the correlations between the factor and the variables.
Typically a factor loading of .4 or higher is required to attribute a specific variable to a
factor. An orthogonal rotation assumes no correlation between the factors, whereas an
oblique rotation is used when some relationship is believed to exist.

Cluster Analysis
The purpose of cluster analysis is to reduce a large data set to meaningful subgroups
of individuals or objects. The division is accomplished on the basis of similarity of the
objects across a set of specified characteristics. Outliers are a problem with this technique,
often caused by too many irrelevant variables. The sample should be representative of
the population, and it is desirable to have uncorrelated factors. There are three main
clustering methods: hierarchical, which is a treelike process appropriate for smaller data
sets; nonhierarchical, which requires specification of the number of clusters a priori, and
a combination of both.

There are four main rules for developing clusters:

1. The clusters should be different.
2. They should be reachable.
3. They should be measurable.
4. The clusters should be profitable (big enough to matter).

This is a great tool for market segmentation.

Multidimensional Scaling (MDS)
The purpose of MDS is to transform consumer judgments of similarity into distances
represented in multidimensional space. This is a decompositional approach that uses
perceptual mapping to present the dimensions. As an exploratory technique, it is useful
in examining unrecognized dimensions about products and in uncovering comparative
evaluations of products when the basis for comparison is unknown. Typically there must be
at least 4 times as many objects being evaluated as dimensions. It is possible to evaluate
the objects with nonmetric preference rankings or metric similarities (paired comparison)
ratings. Kruskal’s Stress measure is a “badness of fit” measure; a stress percentage of
0 indicates a perfect fit, and over 20% is a poor fit. The dimensions can be interpreted
either subjectively by letting the respondents identify the dimensions or objectively by the
researcher.

Correspondence Analysis
This technique provides for dimensional reduction of object ratings on a set of attributes,
resulting in a perceptual map of the ratings. However, unlike MDS, both independent
variables and dependent variables are examined at the same time. This technique is more
similar in nature to factor analysis. It is a compositional technique, and is useful when
there are many attributes and many companies. It is most often used in assessing the
effectiveness of advertising campaigns. It is also used when the attributes are too similar
for factor analysis to be meaningful. The main structural approach is the development
of a contingency (crosstab) table. This means that the form of the variables should be
nonmetric. The model can be assessed by examining the chi-square value for the model.
Correspondence analysis is difficult to interpret, as the dimensions are a combination of
independent and dependent variables.

Conjoint Analysis
Conjoint analysis is often referred to as “trade-off analysis,” in that it allows for the
evaluation of objects and the various levels of the attributes to be examined. It is both a
compositional technique and a dependence technique, in that a level of preference for a
combination of attributes and levels is developed. A part-worth, or utility, is calculated for
each level of each attribute, and combinations of attributes at specific levels are summed
to develop the overall preference for the attribute at each level. Models can be built which
identify the ideal levels and combinations of attributes for products and services.
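
The summing of part-worths described above can be sketched in a few lines of Python. The attributes, levels and part-worth values below are hypothetical; in practice they would be estimated from respondents' preference data by a conjoint procedure, which is not shown here.

```python
from itertools import product

# Hypothetical part-worths (utilities) for each level of each attribute.
part_worths = {
    "price": {"low": 0.8, "medium": 0.3, "high": -1.1},
    "brand": {"A": 0.5, "B": -0.5},
    "pack_size": {"small": -0.2, "large": 0.2},
}


def total_utility(profile):
    """Sum the part-worths of the chosen level of every attribute."""
    return sum(part_worths[attr][level] for attr, level in profile.items())


# Enumerate every combination of attribute levels (a full profile set).
attributes = list(part_worths)
profiles = [dict(zip(attributes, levels))
            for levels in product(*(part_worths[a] for a in attributes))]

# The profile with the highest summed utility is the "ideal" combination.
best = max(profiles, key=total_utility)
print(best, round(total_utility(best), 2))
```

Because this simple additive model has no interaction terms, the best profile is the one that takes the highest-valued level of each attribute.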

Canonical Correlation
The most flexible of the multivariate techniques, canonical correlation simultaneously
correlates several independent variables and several dependent variables. Unlike MANOVA,
this powerful technique utilizes metric independent variables, such as sales, satisfaction
levels, and usage levels. It can also utilize nonmetric categorical variables. This technique
has the fewest restrictions of any of the multivariate techniques, so the results should be
interpreted with caution due to the relaxed assumptions. Often, the dependent variables
are related, and the independent variables are related, so finding a relationship is difficult
without a technique like canonical correlation.

Structural Equation Modeling


Unlike the other multivariate techniques discussed, structural equation modeling (SEM)
examines multiple relationships between sets of variables simultaneously. This represents
a family of techniques, including LISREL, latent variable analysis, and confirmatory factor
analysis. SEM can incorporate into the analysis latent variables, which either are not or
cannot be measured directly. For example, intelligence levels can only be inferred from
direct measurement of variables like test scores, level of education, grade point average, and
other related measures. These tools are often used to evaluate many scaled attributes or
build summated scales.

Each of the multivariate techniques described above has a specific type of research
question for which it is best suited. Each technique also has certain strengths and
weaknesses that should be clearly understood by the analyst before attempting to interpret
the results of the technique. Current statistical packages (SAS, SPSS, S-Plus, and
others) make it increasingly easy to run a procedure, but the results can be disastrously
misinterpreted without adequate care.

Dependent and Independent variables:


If two or more variables are analyzed together, it may be necessary to spell out the
relationship between the variables. The concept of dependent and independent variables is
useful in spelling out the relationship. Two variables are called independent variables if a
change in one does not influence or cause a change in the other. But if a change in one
variable causes a change in the other variable, the first one is called an independent variable,
and the second one is called the dependent variable.

First Stage Analysis:


1. Simple tabulation:

In a questionnaire-based survey, the first stage of analysis is called simple tabulation.
This consists of every question being treated separately. For every question, the number
of responses in each category of answer is counted.

2. Computer tabulation:

If codes were used to input the data into the computer for tabulation, the numbers 1,
2 and 3 could also have been the numerical codes for three categories of responses to the
above question.

3. Percentage:
In addition to the number of respondents who fall into each category, we usually compute
the percentage of respondents as well.

Simple tabulation for ranking-type questions: If we had ordinal-scaled questions in our
questionnaire, we may have a more complex answer to tabulate. For example:

Ranks of five brands of refrigerator on a scale of 1 to 5:

BRAND        RANK
Whirlpool    1
Kelvinator   4
Godrej       5
Samsung      2
Videocon     3

4. Tabulating rating:

Commonly used rating scales are of the following type:

Q. Rate the following attributes of LIRIL soap on a scale of 1 to 5

1 = Very unsatisfactory, 2 = Unsatisfactory, 3 = Neither satisfactory nor unsatisfactory,
4 = Satisfactory, 5 = Very satisfactory.

Lather : 1 2 3 4 5

Second Stage Analysis:


1. Cross tabulation

After the simple frequency and percentage tabulation for every question on the
questionnaire comes the second stage, cross tabulation. A cross tabulation can be
done by combining any two of the questions and tabulating the data together. This is a
two-variable cross tabulation.
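
A two-variable cross tabulation can be sketched in plain Python as below. The two questions (gender and a yes/no purchase question) and the responses are hypothetical and serve only to show how pairs of answers are counted together.

```python
from collections import Counter

# Hypothetical answers to two questions, one pair per respondent:
# (gender, "Have you bought the product?")
responses = [
    ("male", "yes"), ("female", "no"), ("female", "yes"),
    ("male", "no"), ("male", "yes"), ("female", "yes"),
]

# Count each combination of the two answers.
crosstab = Counter(responses)

# Lay the counts out as a table: rows = gender, columns = purchase answer.
rows = sorted({gender for gender, _ in responses})
cols = sorted({answer for _, answer in responses})
print("        ", cols)
for r in rows:
    print(f"{r:8}", [crosstab[(r, c)] for c in cols])
```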

Cross tabulation of more than two variables:

It is possible to have a cross tabulation of three or more variables in a table. But most
people find it difficult to assimilate the information contained in a three-variable cross
tabulation. For the purpose of drawing conclusions from bivariate analysis, a two-variable
cross tabulation is quite adequate. A series of two-variable cross tabulations can be
performed on the important variables in the questionnaire. It is for the researcher to decide
which variables need to be cross tabulated.

The Chi-squared test for cross tabulation:

In the case of a cross tabulation featuring two variables, a test of significance called the
chi-squared test can be used to test whether the two variables are significantly associated
with each other. The user, who is analyzing the data on the computer and using
a statistical package, can request a chi-squared test along with any cross tabulation.
Commands such as CROSSTABS or CROSSTABULATION in most statistical packages
have the option of doing a chi-squared test.
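
To show what a package computes when a chi-squared test is requested, here is a minimal hand-rolled sketch of the Pearson chi-squared statistic for a cross tabulation. The observed counts are hypothetical, and the function name `chi_squared` is an illustrative choice, not a command from any statistical package.

```python
def chi_squared(observed):
    """Return (chi-squared statistic, degrees of freedom) for a
    cross tabulation given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical 2 x 2 cross tabulation of observed counts.
observed = [[10, 20],
            [20, 10]]
statistic, dof = chi_squared(observed)
print(round(statistic, 3), dof)  # 6.667 1
```

With 1 degree of freedom the 5% critical value of the chi-squared distribution is 3.841, so a statistic of about 6.67 would indicate a significant association in this made-up table.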

To sum up, the multivariate analysis techniques are depicted in the following
schematic.

Multivariate Techniques
  Dependence Techniques
    One dependent variable
      1) Cross tabulation (more than two variables)
      2) Analysis of variance and covariance
      3) Multiple regression
      4) Two-group discriminant analysis
      5) Conjoint analysis
    More than one dependent variable
      1) Multivariate analysis of variance and covariance
      2) Canonical correlation
      3) Multiple discriminant analysis
  Interdependence Techniques
    Variable interdependence
      1) Factor analysis
    Interobject similarity
      1) Cluster analysis
      2) Multidimensional scaling

8.3 ANOVA and Design of Experiments


In most marketing research applications, a survey of some sort is the method used,
whether it is conducted through mail, a personal interview, over the phone, or more
recently, on the internet. There are however, other classes of study available, one of which
is observation. The other widely used class of study is known as experimentation.

Application:

The application areas for experiments in marketing research are wide. Whenever a
marketing mix variable such as price, a specific promotion, or type of distribution, or even
a specific element such as shelf space or color of packaging, is changed, we would
want to know its effect. Under proper conditions, an experiment can tell us the effect of
a specific variation in one or more elements of the marketing mix.

Methods:

An experiment with one independent variable is analyzed using one-way ANOVA. ANOVA
stands for analysis of variance, the generic name given to a set of techniques for studying
the cause and effect of one or more factors on a single dependent variable. When there is
more than one dependent variable, MANOVA is used.

Variables:
The analysis of variance technique is used when the independent variables are
nominally scaled and the dependent variable is metric, or at least interval scaled.

Experimental Design:

The design of the experiment is most critical when performing any experiment to be
analyzed through the technique of ANOVA. There are four major types of design:

1. Completely Randomized Design in a One Way ANOVA
2. Randomized Block Design
3. Latin Square Design
4. Factorial Design with 2 or more factors.

1. Completely Randomized Design in a One Way ANOVA:

This particular design is used when there is only one categorical independent
variable and one dependent variable. Each category of the independent variable is called
a level. The independent variable may be different levels of price, different pack sizes,
or different product colors, and the effect could be the sales of the product.
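
For the one-way case just described, the F ratio compares the variation between levels with the variation within levels. The sketch below computes it by hand; the sales figures for three price levels are hypothetical, and the function name `one_way_anova_f` is an illustrative choice.

```python
def one_way_anova_f(groups):
    """Return the F ratio for a completely randomized one-way design.
    `groups` holds one list of observations per level of the factor."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the level means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own level mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical unit sales observed at three price levels.
sales_by_price_level = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(sales_by_price_level)
print(round(f_stat, 2))  # 13.0
```

The F ratio would then be compared with the critical value of the F distribution for (2, 6) degrees of freedom to judge whether the price levels differ.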

2. Randomized Block Design:

This design is more efficient because it isolates the variance due to the blocking
variable. It should be used when we suspect that a blocking variable is affecting the
relationship between the independent and dependent variables.

3. Latin Square Design:

The Latin Square Design is an extension of the randomized Block design. It consists
of one independent variable and two blocks, instead of one which we saw in randomized
Block design. It has no special significance in marketing research, so we will move on to
the more general case of a factorial design where any number of factors can be tested
simultaneously for their effects on the dependent variable.

4. Factorial Design with Two or More Factors:

This type of design is employed when we have two or more independent variables or
factors. The major advantage of this design is that multiple factors can be simultaneously
tested. There are two kinds of effect: the main effect of each factor and the interaction
effect between factors.
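
The difference between main and interaction effects can be seen in a small 2 x 2 example. The cell means below (average sales for two price levels with and without a promotion) are hypothetical, and the interaction is expressed here as a simple difference of differences; some texts scale it differently.

```python
# Hypothetical mean sales for each combination of two factors:
# price (low/high) and promotion (no/yes).
cell_means = {
    ("low", "no"): 50, ("low", "yes"): 70,
    ("high", "no"): 30, ("high", "yes"): 40,
}

# Main effect of price: average at high price minus average at low price.
price_effect = ((cell_means[("high", "no")] + cell_means[("high", "yes")]) / 2
                - (cell_means[("low", "no")] + cell_means[("low", "yes")]) / 2)

# Main effect of promotion: average with promotion minus average without.
promo_effect = ((cell_means[("low", "yes")] + cell_means[("high", "yes")]) / 2
                - (cell_means[("low", "no")] + cell_means[("high", "no")]) / 2)

# Interaction: does the promotion effect change with the price level?
promo_at_low = cell_means[("low", "yes")] - cell_means[("low", "no")]
promo_at_high = cell_means[("high", "yes")] - cell_means[("high", "no")]
interaction = promo_at_high - promo_at_low

print(price_effect, promo_effect, interaction)  # -25.0 15.0 -10
```

A nonzero interaction means the effect of one factor depends on the level of the other, which cannot be detected by testing the factors one at a time.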

8.4 Correlation and Regression

Correlation and regression are generally performed together. Correlation analysis
measures the degree of association between two sets of quantitative
data. The main objective of regression analysis is to explain the variation in one variable
based on the variation in one or more other variables. Application areas include explaining
variations in the sales of a product based on advertising expenses, the number of sales
people, the number of sales offices, or all of these variables.

If there is only one dependent variable and one independent variable used to explain
the variation in it, then the model is known as a simple regression. If multiple independent
variables are used to explain the variation in a dependent variable, it is called multiple
regression.
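
As a worked illustration of correlation and simple regression, the sketch below computes the Pearson correlation coefficient and the least-squares line by hand. The advertising and sales figures are hypothetical, and the variable names are chosen for the example only.

```python
from math import sqrt

# Hypothetical data: advertising expense (x) and sales (y) for five periods.
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

n = len(advertising)
mean_x = sum(advertising) / n
mean_y = sum(sales) / n

# Sums of squares and cross-products around the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(advertising, sales))
sxx = sum((x - mean_x) ** 2 for x in advertising)
syy = sum((y - mean_y) ** 2 for y in sales)

# Correlation: degree of association between the two variables.
r = sxy / sqrt(sxx * syy)

# Simple regression y = a + b*x fitted by least squares.
b = sxy / sxx            # slope
a = mean_y - b * mean_x  # intercept

print(round(r, 4), round(a, 2), round(b, 2))  # 0.7746 2.2 0.6
```

In a simple regression the square of `r` equals the share of the variation in sales explained by advertising, which is the quantity usually reported as R².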

Methods:
There are basically two approaches to regression:
1. A hit and trial approach: In the hit and trial approach, we collect data on a large
number of independent variables and then try to fit a regression model using stepwise
regression, entering one variable into the regression equation at a time.

The general regression model of this type is:

$$Y = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$

2. A pre-conceived approach: The pre-conceived approach assumes the researcher
knows reasonably well which variables explain $Y$, and the model is pre-conceived, say with
3 independent variables $x_1$, $x_2$, $x_3$. Therefore not too much experimentation is done.
The main objective is to find out whether the pre-conceived model is good or not.

Recommended usage:

The hit and trial approach may be used for exploratory research. But for serious
decision-making, there has to be appropriate knowledge of the variables which are likely
to affect Y, and only such variables should be used in the regression analysis.

It is also recommended that the R² value not be interpreted unless the model itself is
significant at the desired confidence level.

8.5 Discriminant Analysis for Classification and Prediction

The application area for this technique is where we want to be able to distinguish
between two or three sets of objects or people, based on the knowledge of some of their
characteristics.

Method:

Discriminant analysis is very similar to the multiple regression technique. The form of
the equation in a two-variable discriminant analysis is:

$$Y = a + k_1 x_1 + k_2 x_2$$

This is similar to a regression equation. It is called the discriminant function. Also,
as in a regression analysis, $Y$ is the dependent variable and $x_1$ and $x_2$ are independent
variables. $k_1$ and $k_2$ are the coefficients of the independent variables, and $a$ is a constant.
Please note that $Y$ in this case is a categorical variable. $k_1$ and $k_2$ are determined by
appropriate algorithms in the computer package used, but the underlying objective is that
the two coefficients should maximize the separation between the two groups of the $Y$ variable.

Variables and Data:

As mentioned above, Y is a classification into 2 or more groups and is therefore a
grouping variable in the terminology of discriminant analysis. That is, groups are formed on
the basis of existing data and coded as 1 or 2, similar to dummy variable coding. The
independent variables are continuous scale variables, and are used as predictors of the group
to which the objects will belong.
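
One way to see how coefficients that "maximize the separation" can be obtained is Fisher's two-group linear discriminant, sketched below for two predictors. This is an illustration under stated assumptions, not necessarily the algorithm any particular package uses: the data are hypothetical, the cutoff is placed midway between the group means, and the scale of the coefficients is arbitrary.

```python
def mean_vector(points):
    """Mean of each of the two predictors for one group."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, m):
    """Within-group sums of squares and cross-products (s11, s12, s22)."""
    s11 = sum((p[0] - m[0]) ** 2 for p in points)
    s12 = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    s22 = sum((p[1] - m[1]) ** 2 for p in points)
    return s11, s12, s22


# Hypothetical observations (x1, x2) for two known groups.
group1 = [(1, 2), (2, 1), (2, 3), (3, 2)]
group2 = [(5, 6), (6, 5), (6, 7), (7, 6)]

m1, m2 = mean_vector(group1), mean_vector(group2)

# Pool the within-group scatter of the two groups.
s11, s12, s22 = (u + v for u, v in zip(scatter(group1, m1),
                                       scatter(group2, m2)))

# Coefficients k = S^-1 (m1 - m2), using the 2 x 2 inverse written out.
det = s11 * s22 - s12 ** 2
d1, d2 = m1[0] - m2[0], m1[1] - m2[1]
k1 = (s22 * d1 - s12 * d2) / det
k2 = (-s12 * d1 + s11 * d2) / det

# Constant chosen so that Y = 0 midway between the two group means.
a = -(k1 * (m1[0] + m2[0]) / 2 + k2 * (m1[1] + m2[1]) / 2)


def classify(x1, x2):
    """Assign group 1 if the discriminant score Y is positive, else group 2."""
    y = a + k1 * x1 + k2 * x2
    return 1 if y > 0 else 2


print(a, k1, k2)                        # 8.0 -1.0 -1.0
print(classify(2, 2), classify(6, 5))   # 1 2
```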

8.6 Factor Analysis for Data Reduction


Factor analysis is a very useful method of reducing data complexity by reducing the
number of variables being studied. It is a good way of resolving the confusion caused by
a large array of seemingly important variables and identifying the latent or underlying
factors among them.

Factor analysis is a set of techniques which, by analyzing correlations between
variables, reduces their number into fewer factors that explain much of the original data
more economically.

Methods:

There are two stages in factor analysis:

Stage 1: This can be called the factor extraction process, where our objective is to identify
how many factors will be extracted from the data. The most popular method for this is called
principal component analysis. There is also a rule of thumb, based on the computation of
eigenvalues, to determine how many factors to extract.

Stage 2: This is called rotation of principal components. It is actually optional, but
highly recommended. After the number of extracted factors is decided upon in stage 1, the
next task of the researcher is to interpret and name the factors. This is done by
identifying which factors are associated with which of the original variables.

Recommended usage:

Factor analysis is used to reduce data variables into a smaller set of factors. The analysis
can be started by checking, through a correlation matrix, whether correlations exist between
at least some of the original variables.

8.7 Cluster Analysis for Market Segmentation


Cluster analysis is a multivariate procedure ideally suited to segmentation applications in
marketing research. A cluster is a group of similar objects, and segmentation involves
identifying groups of target customers who are similar in buying habits, demographic
characteristics, or psychographics.

Methods:

The basic methods of clustering used in computer packages are of two types:

1. Hierarchical clustering, or linkage methods.

2. Non-hierarchical clustering, or nodal methods.

The first type includes methods such as single linkage, complete linkage, and average
linkage. A range of solutions is provided by the computer, from a 1-cluster solution to an
n-cluster solution, where n is the number of objects being studied.

The second type includes the K-means approach, where you specify in advance how
many clusters are required from the data.

Data / Scale of Variables:

Generally, interval-scaled variables are ideally suited for cluster analysis. Continuous
or ratio-scaled variables can also be used, but the instances of such use are rarer.

Recommended usage:

1. Find the number of clusters in the data by running a hierarchical clustering programme
on the variables.

2. Once the number of clusters has been identified, a k-means clustering option can
be run on the data.

8.8 Conjoint Analysis for Product Design
Application

Conjoint analysis is a multivariate technique that captures the exact level of utility
that an individual customer puts on various attributes of the product offering. Once we
know the utility levels for every attribute, we can combine these to find the best combination
of attributes that gives the customer the highest utility, the second best combination that
gives the second highest utility, and so on. This information can guide competitive strategy.

Methods:

Essentially, conjoint analysis is an attempt to convert ordinal scale rankings given by
respondents into an interval scale value or utility scale. The method is quite straightforward
in the data collection stage. The researcher determines a set of attributes and their levels,
say 3 attributes, each at 2 levels, which he feels are critical decision-making variables for
the consumer. These attributes are then combined into different sets of combinations.

Recommended usage:

Conjoint analysis can be used at three levels:

1. Individual consumer
2. Segment level
3. Across segments

8.9 Hypothesis Testing


This is a brief introduction to the concept of hypothesis testing, in the context of the “t” test.
Suppose, as marketers of a brand of shirts, we wanted to find out whether a set of customers
in Delhi and a set of customers in Mumbai thought of our brand in the same way or not.
Suppose we conducted a small survey in both cities and got ratings on an interval scale
from our customers. We now want to do a statistical test to find out if the two sets of ratings
are “significantly different” from each other or not. We now have to set a level of “statistical
significance” and select a suitable test. We also need to specify a null hypothesis.

The “null hypothesis” is the statement that the statistical test is set up to reject or
fail to reject. In the above example, the null hypothesis for the
“t” test would be “There is no significant difference in the ratings given by customers in
Mumbai and Delhi”.

Now, we have to set a level of significance for the test. This represents the chance
that we may be making a mistake of a certain type. It can also be expressed through the
confidence level. For example, if we desire that the confidence level for the test should
be 95%, then (100-95)/100, or 0.05, becomes the significance level. We can think of it as a
0.05 probability that we are making a certain type of error in our decision-making process.
A Type I error is the error of rejecting the null hypothesis when it is true. Commonly used
significance levels in marketing research are 0.05 and 0.10. But there is no hard and fast
rule, and the significance level can be set at a different level if necessary. Let us assume
we take the conventional value of 0.05 for our test. We will either reject the null
hypothesis or fail to reject it.

1. The independent sample “t” test

Let us proceed with the same example and set up an independent sample “t” test as
discussed above, at a significance level of 0.05. Table 1 presents the input data for the
test. This assumes that 15 customers of our brand each in Mumbai and Delhi were asked
to rate our brand on a 7-point scale. The responses of all 30 customers are in the column
labeled ‘Ratings’ in the table. The column labeled ‘City’ indicates the city from which the
rating came, with a code of 1 for Mumbai and 2 for Delhi.

Table 2 presents the output from the independent sample ‘t’ test performed on the
above data. The decision rule for the test at the 0.05 significance level is this:

If the ‘p’ value is less than the significance level set up by us for the test, we reject
the null hypothesis. Otherwise, we fail to reject the null hypothesis. In this case, we find
that the ‘p’ value for the ‘t’ test is 0.011, assuming unequal variances in the two
populations. This value of 0.011 being less than our significance level of 0.05, we reject
the null hypothesis and conclude that the ratings of Mumbai and Delhi are different. If the
‘p’ value had been larger than 0.05, we would have failed to reject the null hypothesis that
there was no difference between the two ratings.

Manual Versus Computer-based Hypothesis Testing: Please note that conventional
hypothesis testing would have involved a manual computation of the t value from the data, a
comparison with a value from the ‘t’ tables, and the same kind of conclusion that
we arrived at. The advantage of using the computer is that the test is performed by the
package automatically, and we get the ‘p’ value for the test in the computer output. We are
going to use this approach throughout this book for all the tests and analytical procedures.
This removes the need for tedious manual calculations, and leaves the student free to do
managerial jobs like interpreting computer outputs rather than waste time in manual
computation.

Table 1 Input data for independent sample ‘t’ test

Serial No.   Ratings   City
1            2         1
2            3         1
3            3         1
4            4         1
5            5         1
6            4         1
7            4         1
8            5         1
9            3         1
10           4         1
11           5         1
12           4         1
13           3         1
14           3         1
15           4         1
16           3         2
17           4         2
18           5         2
19           6         2
20           5         2
21           5         2
22           5         2
23           4         2
24           3         2
25           3         2
26           5         2
27           6         2
28           6         2
29           6         2
30           5         2

Table 2 ‘t’ test for independent samples of CITY

City      No. of cases   Mean     Std. Deviation
Mumbai    15             3.7333   0.884
Delhi     15             4.7333   1.100

Mean difference = -1.0000

Levene’s test for equality of variance: F = .727, p = .401

‘t’ test for equality of means

Variance   t-value   df      2-tail significance
Equal      -2.75     28      0.010
Unequal    -2.75     26.76   0.011
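
The t value and the unequal-variance degrees of freedom reported above can be cross-checked with a short script. What follows is a minimal sketch in Python using only the standard library, not the output of the package used in the text. The function name welch_t is an illustrative choice, and the listing assumes that serial number 16 in Table 1 belongs to Delhi, as the 15-customers-per-city design implies. It computes the test statistic only, not the ‘p’ value.

```python
import math
import statistics

# Ratings from Table 1. Assumption: serial no. 16 is a Delhi rating,
# so that each city has 15 respondents as stated in the text.
mumbai = [2, 3, 3, 4, 5, 4, 4, 5, 3, 4, 5, 4, 3, 3, 4]
delhi = [3, 4, 5, 6, 5, 5, 5, 4, 3, 3, 5, 6, 6, 6, 5]


def welch_t(a, b):
    """Return (t, df) for an independent sample t test with unequal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances
    se2_a, se2_b = var_a / len(a), var_b / len(b)  # squared standard errors
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


t_value, df = welch_t(mumbai, delhi)
print(round(t_value, 2), round(df, 2))
```

The two printed numbers correspond to the ‘Unequal’ row of Table 2; the ‘p’ value still has to come from a statistical package or a ‘t’ table.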

2. Paired Sample ‘t’ test

In some cases, we may not have independent samples, but the same sample could
be used to do a research study involving two measurements. For instance, we may
measure people’s attitude towards a brand before it is advertised and again afterwards, to
find out if their attitude has changed due to the ad campaign. In such cases, a paired
sample ‘t’ test is the appropriate statistical test.

We will illustrate using the example mentioned above. Assume we have a sample of
18 respondents whom we asked to rate, on a 10-point interval scale, their attitude towards,
say, the Tamarind brand of garments, before and after an ad campaign was released for this
brand. A rating of 1 represents “Brand is Highly Disliked” and a rating of 10 represents
“Brand is Highly Liked”, with other ratings having appropriate meanings.

The assumed data are in table 3. The first column contains ratings given by respondents
before they saw the ad campaign, and the second column represents their ratings after
they saw the ad campaign.

Table 4 contains the resultant computer output for a paired sample ‘t’ test. Assume that
we had set the significance level at 0.05, and that the null hypothesis is that “there is no
difference in the ratings given by respondents before and after they saw the ad campaign.”

The output table shows that the 2-tailed significance of the test is 0.000, from the last
column titled “2-tail significance”. This is the ‘p’ value and it is less than the level of
0.05 we had set. Therefore, as per our decision rule specified in the earlier example, we
have to reject the null hypothesis at a significance level of 0.05, and conclude that there
is a significant difference in the ratings given by respondents before and after their
exposure to the ad campaign. The mean rating after the ad campaign is 5.7778 and before the
campaign it is 3.2778, and the difference of 2.5 is statistically significant.

If we have a sample size larger than 30 for the independent sample ‘t’ test, we can use
the ‘z’ test instead of the ‘t’ test. The statement of the null hypothesis will remain the
same in the case of the ‘z’ test also.

Table 3 Input data for paired sample’t’ test

Serial No. Before After


1 3 5
2 4 6
3 2 6
4 5 7
3 8
5 4 4
6 5 6
7 5 7
8 3 5
9 4 4
10 2 4
11 2 6
12 4 7
13 1 4
14 3 6
15 6 8
16 3 4
17 2 5
18 3 6
Table 4. ‘t’ test for paired sample

Mean Std. Deviation

After Rating after ad campaign 5.7778 1.309

Before Rating before ad campaign 3.2778 1.274

Paired Differences

Mean difference Std. difference t value Df 2-tail significance

2.5000 1.295 8.19 17 0.0000

Examples

A few practical examples are illustrated below to help the student apply the variety
of analysis tools used in marketing research.

Question 1: As per the survey reports of a state, the average annual
expenditure on food grains by households is Rs 1,596. A random sample of 34 people in
one of the cities of the state had a sample mean expenditure of Rs 1,425 with a sample
standard deviation of Rs 425. Test whether the mean expenditure on food grains for people
in the city is different from the average of the state. Use a 1% significance level.

Solution:

We have to test the hypothesis that:

Ho: µ = 1596 Vs H1: µ ≠ 1596 (two-tailed test)

We will use the t test.

Test statistic

t = (x bar - µ) / (s/√n), which follows Student’s ‘t’ distribution with d.f. = n-1 = 33.

Where

x bar = 1425, µ = 1596, s = 425, n = 34

t = (1425 - 1596) / (425/√34) = -171/72.89 = -2.35

We use the t distribution as σ is unknown.

P Value

P-value = 2 × P(t < -2.35) = 0.0251

Since the P-value of 0.0251 > 0.01, we do not reject Ho. The result is not statistically
significant.

Conclusion

At the 1% level of significance, the data do not provide enough evidence to reject
the null hypothesis. Thus we cannot conclude that the mean expenditure on food grains for
the city is different from the state average.
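
The one-sample t statistic used in this solution is a one-line calculation. The sketch below, in Python with only the standard library, shows it for the Question 1 figures; the function name one_sample_t is a hypothetical helper introduced here for illustration, and the script stops at the test statistic because the ‘p’ value needs a ‘t’ table or a statistical package.

```python
import math


def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """t = (x bar - mu) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


# Question 1: state average Rs 1,596; city sample of 34 with mean 1,425 and s = 425
t_q1 = one_sample_t(1425, 1596, 425, 34)
df_q1 = 34 - 1
print(round(t_q1, 2), df_q1)
```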

Question 2: A group of 9 subjects was drawn from a population with a mean satisfaction level
of 100 and a standard deviation of 15. The company gave them a free sample of its product
for trial for a month and then again collected data on the satisfaction level with the
product. The sample mean of the satisfaction level was found to be 113 and the sample
standard deviation 10. Did the new product result in a significant increase in satisfaction
level?

Solution: Since this problem involves comparing a single group’s mean with the
population mean and the standard deviation for the population is known, the proper
statistical test to use is the Z-test.

The value of µ = 100 and σ = 15

The null hypothesis and alternative hypothesis will be:

Ho: µ = 100 Vs H1: µ > 100, and the significance level is taken as 0.05

z = (x bar - µ) / σx, where σx = √(σ²/n) = σ/√n

So σx = σ/√n = 15/√9 = 15/3 = 5

Z = (113-100)/5 = 13/5 = 2.6. The table value of Z at the 0.05 significance level
(one-tailed) is 1.64. The calculated value of Z is greater than the table value, hence the
null hypothesis Ho: µ = 100 is rejected in favour of the alternative hypothesis H1: µ > 100,
which means the new product has increased the satisfaction level of the consumers in a
significant manner.

Question 3: A retail outlet has recently launched a new promotion campaign in its stores
across the city. It took a random sample of 25 stores and found the average sales to
be 15 lacs per month, with a standard deviation of 9. Can we infer that the new promotion
campaign is a success if the average sales of all the stores are 12 lacs per month?

Solution: - Since this problem involves comparing a single group’s mean with the
population mean and the standard deviation for the population is not known, the proper
statistical test to use is the one-sample t-test.

The null hypothesis and alternative hypothesis will be

Ho: µ = 12 Vs H1: µ > 12, and the significance level is taken as 0.05. The degrees of
freedom will be n-1, which is equal to 25-1 = 24. We use the t distribution as σ is unknown.

t = (x bar - µ) / (s/√n)

Here

x bar = 15, µ = 12, s = 9, n = 25

t = (15-12) / (9/√25) = 3 / (9/5) = 15/9 = 1.67

The value of t from the table for a one-tailed test at significance level 0.05 and 24
degrees of freedom is 1.711.

The calculated value of t is 1.67, which is less than the table value of 1.711, hence the
null hypothesis Ho: µ = 12 is not rejected. This means that there is no significant change
in the sales of the retail outlet, so we cannot infer that the new sales promotion campaign
is a success.

Question 4: A company has recently launched a new version of a soap with changes
in packaging and look. Sales across the different territories have a mean of 50,000
units and a standard deviation of 4,000 units. The company has now taken sales data from 81
territories and found that average sales are 52,000 units. Can we conclude that the
new packaging significantly improved the sales?

Solution-
The null hypothesis and alternative hypothesis will be

Ho: µ = 50,000 Vs H1: µ > 50,000

The significance level is taken as 0.05 and we have to conduct the Z test

z = (x bar - µ) / σx, where σx = σ/√n

The standard error of the mean can be calculated by the following formula:

σx = σ/√n = 4000/√81 = 4000/9 = 444.4

Z = (52000-50000)/444.4 = 4.5

Now we look up the value of z from the table at the 0.05 level of significance, which is
1.64. The calculated value of z is greater than the table value, hence our null hypothesis
is rejected in favour of the alternative hypothesis H1: µ > 50,000. We can conclude that
the new packaging significantly improved the sales.

Question 5: The following information was collected in a survey of car owners in two
cities, Delhi and Mumbai. The sample size taken was 100.

                        Delhi (X)   Mumbai (Y)   Total
Cars owned by women     10          20           30
Cars owned by men       30          40           70
Total                   40          60           100

Can we infer from the above data that the cars owned by women are relatively more
in Mumbai than in Delhi?

Solution

In this case we have to conduct the χ2 test

χ2 = Σ (O - E)² / E

Where O is the Observed Frequency in each category
E is the Expected Frequency in the corresponding category
df is the “degree of freedom”, (r-1)(c-1)
χ2 is Chi Square

First we have to calculate the expected frequencies

The expected frequency for women owning a car in Delhi can be calculated as follows

Expected frequency of women owning a car in Delhi =
(Cars owned by women × Cars in Delhi) / Total number of car owners

So the expected frequency of women owning a car in Delhi = (30 × 40)/100 = 12

Similarly we can calculate the other frequencies and make the following table

                        Cars owned in Delhi   Cars owned in Mumbai   Total
Cars owned by women     12                    18                     30
Cars owned by men       28                    42                     70
Total                   40                    60                     100

Now we will calculate the values for chi square

Groups                             Observed    Expected    (Fo - Fe)   (Fo - Fe)²/Fe
                                   Frequency   Frequency
Cars owned by women in Delhi       10          12          -2          4/12 = 0.33
Cars owned by women in Mumbai      20          18          2           4/18 = 0.22
Cars owned by men in Delhi         30          28          2           4/28 = 0.14
Cars owned by men in Mumbai        40          42          -2          4/42 = 0.10

χ2 = (0.33+0.22+0.14+0.10) = 0.79

The degrees of freedom will be (c-1)(r-1), where c is the number of columns and r the
number of rows

Degrees of freedom = (2-1)(2-1) = 1

Now we have to look up the table value of χ2 for 1 degree of freedom at the 5% significance
level. The level of significance can be changed, but normally we take it as 5%. The table
value is 3.841.

The calculated value of χ2 is 0.79 and the table value is 3.841. The calculated value is
lower than the table value, hence we cannot reject the null hypothesis that car ownership by
women is independent of the city. We therefore cannot conclude that the cars owned by women
are relatively more in Mumbai than in Delhi.
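
The expected frequencies and the χ2 statistic for a table of counts follow the same steps every time, so they are easy to script. The following is a minimal sketch in Python with only the standard library; the function name chi_square is an illustrative choice. It works at full precision, so its result can differ in the second decimal place from a hand calculation that rounds each cell.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts.

    observed is a list of rows, each row a list of observed frequencies.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency = row total x column total / grand total
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Question 5: rows are women and men, columns are Delhi and Mumbai
cars = [[10, 20], [30, 40]]
chi2_cars, df_cars = chi_square(cars)
print(round(chi2_cars, 2), df_cars)
```

The same function accepts larger tables, for example a 2 × 3 table of treatment by outcome; the statistic is then compared with the table value of χ2 for the returned degrees of freedom.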

Question 6: A drug manufacturing company is testing its new drug for curing baldness.
In an experiment on 500 persons, half were given the new drug and the rest were
given the placebo. The patients’ reactions to treatment were recorded in the following table

Results of the new drug on the baldness

Treatment Cured significantly Allergic reaction No effect Total

Drug 150 30 70 250

Placebo 130 40 80 250

Total 280 70 150 500

Can we conclude the new drug is significantly different than placebo in curing the
baldness?

Solution: - we assume that the new drug is not significantly different from the placebo
in treatment of baldness

In this case we have to conduct the χ2 test

χ2 = Σ (O - E)² / E

Where O is the Observed Frequency in each category

E is the Expected Frequency in the corresponding category

df is the “degree of freedom”, (r-1)(c-1)

χ2 is Chi Square

First we have to calculate the expected frequencies. The expected frequency of patients
getting cured by the drug can be calculated as:

E11 = (250×280)/500 = 140. Similarly we can calculate the other expected frequencies and
make the following table.

Expected values for the result of the new drug on the baldness
Treatment Cured significantly Allergic reaction No effect Total

Drug 140 35 75 250

Placebo 140 35 75 250

Total 280 70 150 500

Now we will make the following table for further analysis

Observed        Expected        (O-E)   (O-E)²   (O-E)²/E
frequency (O)   frequency (E)
150             140             10      100      0.714
130             140             -10     100      0.714
30              35              -5      25       0.714
40              35              5       25       0.714
70              75              -5      25       0.333
80              75              5       25       0.333
Total                                            3.522

We have found that the calculated value of χ2 = 3.522. We will now look up the
table value of χ2 at significance level 0.05 and degrees of freedom (2-1)(3-1) = 1×2 = 2.

The table value of χ2 is 5.99, which is greater than the calculated value, so the null
hypothesis is not rejected. This means there is no significant difference in the results of
the new drug and the placebo in curing baldness.

Question 7: The following data were collected from a survey about the monthly income of
households in a locality and their expenditure in retail outlets. Is there any relationship
between the two variables?

Income (in thousands)                          18   14   25   12   30   22   36   10
Expenditure in retail outlets (in thousands)   8    7    10   5    14   10   16   4

Solution:-

We have to calculate the coefficient of correlation ‘R’

R = [nΣXY - (ΣX)(ΣY)] / √{[nΣX² - (ΣX)²][nΣY² - (ΣY)²]}

Now we have to make the following table for calculating the values required

X      Y      XY      X²      Y²
18     8      144     324     64
14     7      98      196     49
25     10     250     625     100
12     5      60      144     25
30     14     420     900     196
22     10     220     484     100
36     16     576     1296    256
10     4      40      100     16

ΣX     ΣY     ΣXY     ΣX²     ΣY²
167    74     1808    4069    806

R = [nΣXY - (ΣX)(ΣY)] / √{[nΣX² - (ΣX)²][nΣY² - (ΣY)²]}

Now putting in the values, with n = 8 pairs of observations:

R = [8×1808 - (167×74)] / √{[(8×4069) - (167)²][(8×806) - (74)²]}

= (14464 - 12358) / √[(32552 - 27889) × (6448 - 5476)]

= 2106 / √(4663 × 972)

= 2106 / √4532436 = 2106/2128.95 = 0.99

As the value of the coefficient of correlation is 0.99, which is positive and close to 1, we
can conclude that the two variables have a strong positive relationship, which means that
as X increases, Y also increases.
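
The correlation formula above translates directly into code. The following minimal sketch, in Python with only the standard library, applies it to the eight income and expenditure pairs of Question 7; the function name pearson_r is an illustrative choice, and n is taken as the number of pairs supplied.

```python
import math


def pearson_r(xs, ys):
    """Coefficient of correlation from the raw-score formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


income = [18, 14, 25, 12, 30, 22, 36, 10]      # in thousands
expenditure = [8, 7, 10, 5, 14, 10, 16, 4]     # in thousands
r_income = pearson_r(income, expenditure)
print(round(r_income, 2))
```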

Question 8: Rama Sales Corporation has conducted a survey pertaining to its
expenditure on sales promotion activity carried out in nine different areas and the
resulting sales. The following data were collected:

Area                                            1     2     3     4     5     6     7     8     9
Expenditure on sales promotion (in thousands)   25    35    30    60    45    30    40    60    45
Sales (in boxes)                                100   90    70    110   95    75    100   100   80

Can we predict the sales volume on the basis of given sales promotion expenditure
for an area?

Solution: We assume that the two variables, expenditure on sales promotion and
sales, are linearly related to each other, so to predict the sales on the basis of
expenditure on sales promotion we have to find the regression equation.

Y = a + bX, where Y is sales volume, X is the expenditure on sales promotion activity, and
a and b are the intercept and the coefficient of X.

To find the values of a and b we can use the following normal equations:

ΣY = na + bΣX and ΣXY = aΣX + bΣX², where n is the number of observations; here it
is nine.

Now we will construct following table for analysis

Area   Sales Y   Expenditure on sales promotion X   X²      XY
1      100       25                                 625     2500
2      90        35                                 1225    3150
3      70        30                                 900     2100
4      110       60                                 3600    6600
5      95        45                                 2025    4275
6      75        30                                 900     2250
7      100       40                                 1600    4000
8      100       60                                 3600    6000
9      80        45                                 2025    3600

       ΣY        ΣX                                 ΣX²     ΣXY
       820       370                                16500   34475

ΣY = na + bΣX and ΣXY = aΣX + bΣX²

Substituting the totals into ΣY = na + bΣX:

(1) 820 = 9a + 370b

Substituting the totals into ΣXY = aΣX + bΣX²:

(2) 34475 = 370a + 16500b

Multiplying equation (1) by 370 and equation (2) by 9 so that the coefficients of a match:

(3) 303400 = 3330a + 136900b

(4) 310275 = 3330a + 148500b

Subtracting equation (3) from equation (4):

6875 = 11600b

b = 6875/11600 = 0.593

Similarly, from equation (1):

9a = 820 - 370b = 820 - 219.29 = 600.71

So a = 600.71/9 = 66.75

So our equation becomes

Y = 66.75 + 0.593X

So we can predict the sales of an area by putting in the value of X for that area.
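
Solving the two normal equations by hand is error-prone, so it helps to check the intercept and slope with a few lines of code. The sketch below, in Python with only the standard library, uses the closed-form solution of the normal equations ΣY = na + bΣX and ΣXY = aΣX + bΣX². The function names fit_line and predict are illustrative, and the expenditure of 50 thousand used in the last line is a made-up input, not a figure from the survey. For the Question 8 data it gives a of about 66.75 and b of about 0.593.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Closed-form solution of the two normal equations
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


def predict(a, b, x):
    """Predicted sales for a given promotion expenditure x."""
    return a + b * x


promotion = [25, 35, 30, 60, 45, 30, 40, 60, 45]   # X, in thousands
sales = [100, 90, 70, 110, 95, 75, 100, 100, 80]   # Y, in boxes
a, b = fit_line(promotion, sales)
print(round(a, 2), round(b, 3))
print(round(predict(a, b, 50), 1))  # hypothetical expenditure of 50 thousand
```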

Question 9 : A telecom company has introduced two new plans T 199 and T 299 for
their pre paid customers, the two plans were launched and after a month the company
conducted a survey to find out the satisfaction level of the customers from the two plans,
it was found that out of sample of 60 customers using T 199 plan 18 were very much
satisfied and out of sample of 100 customers using T 299 plan 22 were very satisfied.
Find out which talk plan is more effective.

Solution:

We will assume that both plans have given the same level of satisfaction, hence:

Ho: p1 = p2 and H1: p1 > p2, where p1 and p2 are the proportions of customers who are
very satisfied with the plans T 199 and T 299 respectively

n1 = 60, p1 = 18/60 = 0.30

n2 = 100, p2 = 22/100 = 0.22

Z = [(p1 - p2) - (P1 - P2)] / Sp1-p2

where (P1 - P2) is the hypothesized difference between the population proportions, which
is zero under Ho

Sp1-p2 = √[p(1-p)(1/n1 + 1/n2)], where p = (n1p1 + n2p2) / (n1 + n2)

So p = {60(18/60) + 100(22/100)} / (60+100) = (18+22)/160 = 0.25

Similarly we can calculate the value of Sp1-p2

= √[0.25(0.75)(1/60 + 1/100)] = √[0.1875 × (160/6000)] = √0.005 = 0.0707

Now we can calculate the value of z, which is equal to (0.30-0.22)/0.0707 =
0.08/0.0707 = 1.131

The table value of Z at the 0.05 significance level is 1.645, which is greater than the
calculated value of 1.131, hence the null hypothesis Ho: p1 = p2 is not rejected. This
means there is no significant difference in the satisfaction level generated by the T 199
plan in comparison to the T 299 plan.
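
The pooled two-proportion Z statistic used in this solution can be sketched in a few lines. The listing below is a minimal Python illustration using only the standard library; the function name two_proportion_z is hypothetical, and the inputs are the counts of very satisfied customers and the sample sizes from Question 9.

```python
import math


def two_proportion_z(successes_1, n1, successes_2, n2):
    """Z statistic for Ho: p1 = p2 using the pooled proportion."""
    p1, p2 = successes_1 / n1, successes_2 / n2
    pooled = (successes_1 + successes_2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# T 199: 18 of 60 very satisfied; T 299: 22 of 100 very satisfied
z_plans = two_proportion_z(18, 60, 22, 100)
print(round(z_plans, 3))
```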

Question 10: A company recently launched a new product in the market and
investigated the brand preference for its product among its distribution channel partners.
The company selected three states from the north zone and collected data from 6
distributors in each state. The scores were calculated on the basis of a questionnaire;
a higher score represents a stronger preference given by the distributor for the company’s
product. Using a 0.05 significance level, test whether there is any difference in the brand
preference shown by the distributors of the three states.

Punjab Uttar Pradesh Haryana

6 5 6
5 5 7
4 4 6
5 4 5
6 5 6
4 4 6

Solution

Assuming that all the distributors of the three states show equal brand preference

        Punjab          Uttar Pradesh   Haryana
        X1      X1²     X2      X2²     X3      X3²
        6       36      5       25      6       36
        5       25      5       25      7       49
        4       16      4       16      6       36
        5       25      4       16      5       25
        6       36      5       25      6       36
        4       16      4       16      6       36
Total   30      154     27      123     36      218

T = sum of all the observations

= ΣX1 + ΣX2 + ΣX3 = 30+27+36 = 93

CF, correction factor = T²/n, where n = total number of observations, which is equal to 18
here.

(93)²/18 = 8649/18 = 480.5

SST = Total sum of squares

= {ΣX1² + ΣX2² + ΣX3²} - CF

= {154+123+218} - 480.5 = 14.5

SSTR = sum of squares between the samples

= {(ΣX1)²/n1 + (ΣX2)²/n2 + (ΣX3)²/n3} - CF

Here n1, n2 and n3 represent the sample sizes for the three states, Punjab, Uttar Pradesh
and Haryana.

= {(30)²/6 + (27)²/6 + (36)²/6} - 480.50

= {(900/6) + (729/6) + (1296/6)} - 480.50 = (150+121.5+216) - 480.50

= 7

SSE = SST - SSTR = 14.50 - 7 = 7.50

Degrees of freedom: df1 = r-1 = 3-1 = 2 and df2 = n-r = 18-3 = 15

MSTR = SSTR/df1 = 7/2 = 3.5

MSE = SSE/df2 = 7.5/15 = 0.5

Create the ANOVA table

Source of variation   Sum of squares   Degrees of freedom   Mean squares   Test statistic
Between samples       7                2                    3.5            F = 3.5/0.5 = 7
Within samples        7.5              15                   0.5
Total                 14.5             17

The table value of F for df1 = 2, df2 = 15 at significance level 0.05 is 3.68, and the
calculated value of F is 7, which is greater. Therefore the null hypothesis that brand
preference in the three states is equal is rejected, and we can say there is a difference
in brand preference among the distributors of the three states of the northern region.
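
The correction-factor method used above for one-way ANOVA can be written as a short routine. The following is a minimal sketch in Python with only the standard library; the function name one_way_anova is an illustrative choice, and the routine returns the F ratio with its two degrees of freedom, leaving the comparison with the table value of F to the reader.

```python
def one_way_anova(groups):
    """Return (F, df1, df2) using the correction-factor method.

    groups is a list of samples, one list of scores per group.
    """
    n = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    cf = grand_total ** 2 / n                                # correction factor
    sst = sum(x * x for g in groups for x in g) - cf         # total sum of squares
    sstr = sum(sum(g) ** 2 / len(g) for g in groups) - cf    # between samples
    sse = sst - sstr                                         # within samples
    df1, df2 = len(groups) - 1, n - len(groups)
    f_ratio = (sstr / df1) / (sse / df2)
    return f_ratio, df1, df2


punjab = [6, 5, 4, 5, 6, 4]
uttar_pradesh = [5, 5, 4, 4, 5, 4]
haryana = [6, 7, 6, 5, 6, 6]
f_states, df1, df2 = one_way_anova([punjab, uttar_pradesh, haryana])
print(f_states, df1, df2)
```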

Question11: A mobile company claims that the average life of their M800 mobile
phone is more than 5000 hrs, a random sample of 25 was tested and a mean and standard
deviation of 5200 and 250 were computed. Is company’s claim of 5000hrs valid?

Solution:-

Assuming the population is normally distributed and the claim is valid,
the hypotheses become

Ho: µ ≥ 5000 Vs H1: µ < 5000

t = (x bar - µ) / (s/√n)

Here

x bar = 5200, µ = 5000, s = 250, n = 25

t = (5200-5000) / (250/√25) = 200/(250/5) = 200/50 = 4

The table value of t for a one-tailed test at the 0.05 significance level with
n-1 = 25-1 = 24 degrees of freedom is 1.711, so Ho is rejected only if the calculated value
of t is less than -1.711.

The calculated value of t is 4, which is not less than -1.711, so the null hypothesis
Ho: µ ≥ 5000 is not rejected. The sample provides no evidence against the claim of the
company: the average life of the M800 mobile phone cannot be said to be less than 5000 hrs,
and the sample mean of 5200 hrs is in fact above the claimed figure.

Question 12: A tea marketing company has launched a new brand of tea in 12
states of India. It conducted a sales analysis for the first month and again after
3 months. The following data were collected. Can we say that there
is significant growth in the sales of the new brand of tea within the period of three months?

State                            1    2    3    4    5    6    7    8    9    10   11   12
1st month sales (in thousands)   50   42   51   26   35   42   60   41   70   55   62   38
4th month sales (in thousands)   62   40   61   35   30   52   68   51   84   63   72   50

Solution

We will assume that there is no growth in the sales of the new brand of tea within three
months, hence our null hypothesis becomes

Ho: µd = 0 and the alternative hypothesis H1: µd > 0

Since the question asks about growth, we have to conduct a one-tailed paired t test

t = (d bar - µd) / (Sd/√n)

d bar = (Σd)/n

Sd = √[ {(Σd²)/(n-1)} - {(Σd)²/n(n-1)} ]

Now to calculate the values required we will construct the following table

State   Sales for the   Sales for the   Difference in   d²
        first month     fourth month    sales = d
1       50              62              12              144
2       42              40              -2              4
3       51              61              10              100
4       26              35              9               81
5       35              30              -5              25
6       42              52              10              100
7       60              68              8               64
8       41              51              10              100
9       70              84              14              196
10      55              63              8               64
11      62              72              10              100
12      38              50              12              144

(Σd) = 96   (Σd²) = 1122

d bar = (Σd)/n = 96/12 = 8

Sd = √[ {(Σd²)/(n-1)} - {(Σd)²/n(n-1)} ]

Sd = √[ {1122/(12-1)} - {(96)²/12(12-1)} ]

Sd = √[ {1122/11} - {9216/132} ]

Sd = √(102 - 69.82) = √32.18 = 5.673

t = (d bar - µd) / (Sd/√n)

t = 8/(5.673/√12) = 8/(5.673/3.464) = 8/1.64 = 4.88

Now we look up the value of t from the table for a one-tailed test at the 0.05 significance
level with n-1 = 12-1 = 11 degrees of freedom.

The table value is 1.796. The calculated value of t is greater than the table value, hence
we reject the null hypothesis Ho: µd = 0 and conclude that there is significant growth
in the sales of the new brand of tea.
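
The paired-difference calculation above can be verified with a short script. This is a minimal sketch in Python using only the standard library; the function name paired_t is illustrative. It works from the differences d = fourth month minus first month, exactly as in the table, and returns the mean difference, its standard deviation and the t value.

```python
import math
import statistics


def paired_t(before, after):
    """Return (d bar, Sd, t) for a paired sample t test with mu_d = 0 under Ho."""
    diffs = [a - b for b, a in zip(before, after)]
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)          # sample standard deviation, n - 1 divisor
    t = d_bar / (s_d / math.sqrt(len(diffs)))
    return d_bar, s_d, t


first_month = [50, 42, 51, 26, 35, 42, 60, 41, 70, 55, 62, 38]
fourth_month = [62, 40, 61, 35, 30, 52, 68, 51, 84, 63, 72, 50]
d_bar, s_d, t_tea = paired_t(first_month, fourth_month)
print(d_bar, round(s_d, 3), round(t_tea, 2))
```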

Question 13 : Following table gives the data about the sale target achieved by 4
salesmen in three months Jan ,Feb and Mar of 2010.

Month Salesman

A B C D
JAN 50 40 48 39
FEB 46 48 50 45
MAR 39 44 40 39

Is there a significant difference in the sale made by the four salesmen? Is there a
significant difference in the sales made during the different months?

Solution

We assume that there is no significant difference between the sales targets achieved
by the four salesmen, or between the different months. Coding the above data by subtracting
40 from each observation, we construct the working table for two-way ANOVA as follows

Month   A(X1)   X1²    B(X2)   X2²   C(X3)   X3²    D(X4)   X4²   Row sum
JAN     10      100    0       0     8       64     -1      1     17
FEB     6       36     8       64    10      100    5       25    29
MAR     -1      1      4       16    0       0      -1      1     2
Sum     15      137    12      80    18      164    3       27    48

T = sum of all observations = 48

Correction factor CF = T²/n = (48)²/12 = 2304/12 = 192

SSTR = sum of squares between salesmen

= {(15)²/3 + (12)²/3 + (18)²/3 + (3)²/3} - CF

= {(225)/3 + (144)/3 + (324)/3 + (9)/3} - 192

= {75+48+108+3} - 192 = 42 = SSTR

SSR = sum of squares between months

= {(17)²/4 + (29)²/4 + (2)²/4} - CF

= {(289)/4 + (841)/4 + (4)/4} - 192

= {72.25+210.25+1} - 192 = 91.5 = SSR

SST = Total sum of squares

= {ΣX1² + ΣX2² + ΣX3² + ΣX4²} - CF

= {137+80+164+27} - 192 = 216 = SST

SSE = SST - (SSTR+SSR) = 216 - (42+91.5) = 82.5

Total degrees of freedom are df = n-1 = 12-1 = 11. Similarly,

dfc = c-1 = 4-1 = 3, dfr = r-1 = 3-1 = 2, and the error df = (c-1)(r-1) = 3×2 = 6

MSTR = SSTR/(c-1) = 42/3 = 14

MSR = SSR/(r-1) = 91.5/2 = 45.75

MSE = SSE/(c-1)(r-1) = 82.5/6 = 13.75

We will make two way ANNOVA table

Sources of variation Sum of Squares Degree of Mean Squares Variance Ratio


freedom

Between salesmen 42.0 3 14.00 Ftreatment =

14/13.75=1.018

Between months 91.5 2 45.75 Fblock = 45.75/


13.75 = 3.327

Residual Error 82.5 6 13.75

Total 216 11

The table value of F for df1 = 3, df2 = 6 at a significance level of 0.05 is 4.76. Since the
calculated value of F (1.018) is less than the table value, the null hypothesis is not rejected,
so we can say that the sales made by the four salesmen do not differ significantly.

Similarly, the table value of F for df1 = 2, df2 = 6 at a significance level of 0.05 is 5.14. Since
the calculated value of F (3.327) is less than the table value, the null hypothesis is not rejected,
so we can conclude that the sales made during the different months do not differ significantly.
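
The arithmetic above can be cross-checked with a short script. The following is only a sketch in plain Python, not part of the original solution; the function name two_way_anova and the dictionary keys are illustrative choices. It assumes one observation per cell, with months as rows (blocks) and salesmen as columns (treatments).

```python
# Two-way ANOVA with one observation per cell, following the hand calculation.
# Rows are months (blocks); columns are salesmen A-D (treatments).
sales = [
    [50, 40, 48, 39],  # JAN
    [46, 48, 50, 45],  # FEB
    [39, 44, 40, 39],  # MAR
]


def two_way_anova(table, code=0):
    """Return sums of squares, mean squares and F ratios for an r x c table.

    `code` is subtracted from every observation first; this coding step
    simplifies hand arithmetic and does not change the sums of squares.
    """
    data = [[x - code for x in row] for row in table]
    r, c = len(data), len(data[0])
    n = r * c

    total = sum(sum(row) for row in data)
    cf = total ** 2 / n  # correction factor T^2 / n

    col_sums = [sum(row[j] for row in data) for j in range(c)]
    row_sums = [sum(row) for row in data]

    sstr = sum(s ** 2 for s in col_sums) / r - cf        # between salesmen
    ssr = sum(s ** 2 for s in row_sums) / c - cf         # between months
    sst = sum(x ** 2 for row in data for x in row) - cf  # total
    sse = sst - (sstr + ssr)                             # residual error

    mstr = sstr / (c - 1)
    msr = ssr / (r - 1)
    mse = sse / ((c - 1) * (r - 1))

    return {
        "SSTR": sstr, "SSR": ssr, "SST": sst, "SSE": sse,
        "MSTR": mstr, "MSR": msr, "MSE": mse,
        "F_treatment": mstr / mse, "F_block": msr / mse,
    }


result = two_way_anova(sales, code=40)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The computed F ratios still have to be compared with the tabulated critical values, as in the text; the sketch does not look those up.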

Question 14: The following data relate to the age of insured persons and the number of
mediclaims submitted by them in 3 years.

Insured person      1    2    3    4    5    6    7    8    9   10
Age                30   32   35   40   48   50   52   55   57   61
Claims submitted    1    0    2    5    2    4    6    5    7    8

Find the correlation coefficient and interpret the result.


Solution

The coefficient of correlation can be calculated as

r = [nΣXY - (ΣX)(ΣY)] / √{[nΣX^2 - (ΣX)^2][nΣY^2 - (ΣY)^2]}

Here X = x - x bar and Y = y - y bar are deviations from the means, so ΣX = ΣY = 0 and the
formula reduces to r = ΣXY / √(ΣX^2 x ΣY^2).

Now we will make the following table for analysis.

Age x   X = x - 46   X^2    Claims y   Y = y - 4   Y^2    XY
 30        -16        256       1         -3         9     48
 32        -14        196       0         -4        16     56
 35        -11        121       2         -2         4     22
 40         -6         36       5          1         1     -6
 48          2          4       2         -2         4     -4
 50          4         16       4          0         0      0
 52          6         36       6          2         4     12
 55          9         81       5          1         1      9
 57         11        121       7          3         9     33
 61         15        225       8          4        16     60

Total 460    0       1092      40          0        64    230

x bar = Σx / n = 460 / 10 = 46; similarly,

y bar = Σy / n = 40 / 10 = 4

Substituting the values in the formula (ΣX and ΣY are both zero):

r = (10 x 230) / √[(10 x 1092)(10 x 64)]

= 2300 / 2643.63 = 230 / 264.363 = 0.870

The value of r is positive and close to one, which means that the age of the insured
persons and the number of mediclaims they submit are highly positively correlated.
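
As a check on the table above, the deviation method can be sketched in a few lines of plain Python. This is an illustration rather than part of the original solution; the function name pearson_r_deviation is an assumed, hypothetical name.

```python
from math import sqrt

ages = [30, 32, 35, 40, 48, 50, 52, 55, 57, 61]
claims = [1, 0, 2, 5, 2, 4, 6, 5, 7, 8]


def pearson_r_deviation(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]  # X = x - x bar
    dy = [y - y_bar for y in ys]  # Y = y - y bar
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / sqrt(sum_x2 * sum_y2)


r_claims = pearson_r_deviation(ages, claims)
print(f"r = {r_claims:.3f}")
```

Because the deviations sum to zero, the n terms of the general formula cancel, which is why the function needs only ΣXY, ΣX^2 and ΣY^2.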

Question 15: The following data were collected in a survey of the monthly income of
households in a locality and their expenditure in retail outlets. Is there any relationship
between the two variables?

Income (in thousands)                          18   14   25   12   30   22   36   10
Expenditure in retail outlets (in thousands)    8    7   10    5   14   10   16    4

Solution

We have to calculate the coefficient of correlation r:

r = [nΣXY - (ΣX)(ΣY)] / √{[nΣX^2 - (ΣX)^2][nΣY^2 - (ΣY)^2]}

Now we make the following table to calculate the required values.

X      Y      XY      X^2     Y^2
18     8      144     324      64
14     7       98     196      49
25    10      250     625     100
12     5       60     144      25
30    14      420     900     196
22    10      220     484     100
36    16      576    1296     256
10     4       40     100      16

ΣX = 167   ΣY = 74   ΣXY = 1808   ΣX^2 = 4069   ΣY^2 = 806

Now putting in the values, with n = 8 pairs of observations:

r = [8 x 1808 - (167 x 74)] / √{[(8 x 4069) - (167)^2][(8 x 806) - (74)^2]}

= (14464 - 12358) / √[(32552 - 27889) x (6448 - 5476)]

= 2106 / √(4663 x 972)

= 2106 / √4532436 = 2106 / 2128.95 = 0.99

The value of the coefficient of correlation is 0.99, which is positive and close to 1. We
can conclude that the two variables have a strong positive relationship, which means that
as X (income) increases, Y (expenditure in retail outlets) also increases.
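
The raw-score formula can also be evaluated directly from the data. The sketch below is in plain Python and is not part of the original solution; pearson_r_raw is an illustrative name. It takes n as the number of income and expenditure pairs in the table, which is 8, and the result rounds to the 0.99 reported above.

```python
from math import sqrt

income = [18, 14, 25, 12, 30, 22, 36, 10]   # in thousands
retail_spend = [8, 7, 10, 5, 14, 10, 16, 4]  # in thousands


def pearson_r_raw(xs, ys):
    """Correlation coefficient from raw sums, without computing deviations."""
    n = len(xs)  # number of (X, Y) pairs
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r_income = pearson_r_raw(income, retail_spend)
print(f"r = {r_income:.2f}")
```

Taking n from len(xs) rather than typing it by hand keeps the count tied to the data actually supplied.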

Summary
Analysis of survey data starts with simply tabulating the collected data. Before we do
this, the data are assumed to be coded if they are nominal scaled. If we are using SPSS,
value labels for nominal variables must also be entered and saved.

ANOVA stands for analysis of variance, the generic name given to a set of techniques for
studying the cause and effect relationship of one or more factors on a single dependent
variable. The analysis of variance technique is used when the independent variables are
nominal scaled and the dependent variable is metric.

Correlation and regression are best applied together to test whether metric variables are
associated with each other, and whether the dependent variable can be explained by some
independent variables, or predicted from them. In marketing, the dependent variable of
interest is usually sales. The independent variables can be any marketing mix variables
which affect sales, such as advertising expenditure, number of sales people, promotional
expenditure and so on.

Discriminant analysis is somewhat similar to regression analysis. There is a dependent
variable and there are some independent variables used to predict it, but the dependent
variable is categorical, not metric. It is used to classify people or objects into two or
more groups based on some knowledge of their characteristics.

Factor analysis provides a way of reducing the number of variables in a research problem
to a smaller and more manageable number by combining related ones into factors. This
relieves the researcher of the confusion arising from overlapping measures of the same
underlying variables.

There are many cluster analysis methods. The basic idea of cluster analysis is to group
similar objects together, using some measure of similarity. The two basic types of
clustering methods are hierarchical and non-hierarchical, and both try to identify the
number of clusters in the data.

Conjoint analysis is ideally suited to product design problems, because the technique is
able to put numerical values on the consumer's preferences. It tries to map the consumer's
decision-making process and the trade-offs made while choosing a particular product
offering. The result of conjoint analysis is a set of utility values for every product
variation and attribute level on offer.

Check your progress

1. Which of these is a type of data analysis that is used in analyzing raw data?
a) Bivariate analysis
b) Regression analysis
c) Conjoint analysis
d) None of the above

2. Univariate analysis is a …………variable analysis.


a) Multi
b) Double
c) Single
d) None of the above

3. First stage analysis can be done through--


a) Cross tabulation
b) Chi square
c) Computer tabulation
d) None of these

4. Chi square test is called the test of ……………….


a) Consistency
b) Accuracy
c) Significance
d) Legibility

5. ANOVA stands for --


a) Analysis of Variable
b) Analysis of Variance

c) Analysis of Variety
d) None of the above

6. The design of the experiment is most critical when performing any experiment to be
analyzed through the technique of ANOVA. Which of the following is one of the major
types of design?
a) Latin design
b) Block factorial design
c) Factorial Design with 2 or more factors.
d) None of the above

7. The main objective of regression analysis is to explain the variation in one variable,
based on the variation in ………… other variable(s).
a) One variable
b) Multi variable
c) Two variable
d) One or more variable

8. Which of the following is a regression approach?


a) Trail approach
b) Hit and trial approach
c) Conceived approach
d) Perceived approach

9. The form of the equation in a two variable discriminant analysis is--


a) Y = x + l1x1 + k2x2
b) Y = a + k1x1 + k2x2
c) Y = a + l2x2 + l2x2
d) Y = x + k1x1 + k2x2

10. What are the two stages of factor analysis?


a) Interpretation and rotation of principal variables
b) Extraction process and rotation of principal variables.
c) Factor extraction process and rotation of principal components
d) Factor extraction process and Interpretation process.

Questions & Exercises
1. Define experimental design in ANOVA.
2. Discuss the method and usage of cluster analysis.
3. What is a null hypothesis?
4. Define the paired sample t test.
5. Describe the concept of data analysis.
6. How do simple tabulation and cross tabulation differ?
7. Describe the design of experiments.
8. Explain association and causation in relation to correlation and regression.
9. Describe factor analysis for data reduction.
10. Explain hypothesis testing.
11. Describe univariate and bivariate techniques.

For Further Reading:

• Paul E. Green, Donald S. Tull and Gerald Albaum: Research for Marketing
Decisions, Fifth Edition, Prentice Hall of India
• Harper W. Boyd, Ralph Westfall and Stanley F. Stasch: Marketing Research: Text
and Cases, Latest Edition, Richard D. Irwin, Inc.
• Naresh K. Malhotra: Marketing Research: An Applied Orientation, Third Edition,
Pearson Education Asia (Indian edition)
• Rajendra Nargundkar: Marketing Research: Text & Cases, Tata McGraw-Hill
Publishing Company Limited
• G. C. Beri: Marketing Research, Tata McGraw-Hill Publishing Company Limited
