Research Methodology 4
Objectives
To understand the concept of data analysis
Define Simple tabulation and cross tabulation
Discuss ANOVAs and design of Experiments
Illustrate Correlation and Regression: Explaining Association and Causation
To understand the concept of Discriminant Analysis for Classification and prediction
Discuss Factor analysis for data reduction
Discuss Cluster Analysis for Market Segmentation
Introduce Conjoint analysis for product design
Concept of Hypothesis Testing
8.1 Introduction
Once the data has been collected in the form of filled-up questionnaires, the next step
is to process it. This can be done either manually or with the help of a computer. Some
of the packages that can be used for analysis are SAS, SPSS, STATISTICA
and SYSTAT. This chapter focuses on the complete analysis of the collected data. The
approach to analysis differs according to the variables involved.
8.2 Analysis
Analysis of data is the process by which data is converted into useful information. Raw
data as collected from questionnaires cannot be used unless it is processed in some way to
make it amenable to drawing conclusions. Various techniques of data analysis are available.
Types of Analysis
1. Univariate, involving a single variable at a time.
2. Bivariate, involving two variables at a time.
3. Multivariate, involving three or more variables at a time.
The choice of which of the above types of data analysis to use depends on at least three
factors, viz.
Amity Directorate of Distance & Online Education
a. Scale of Data
Univariate analysis contrasts with bivariate analysis - the analysis of two variables
simultaneously - or multivariate analysis - the analysis of multiple variables simultaneously.
Univariate analysis is used primarily for descriptive purposes, while bivariate and
multivariate analysis is geared more towards explanatory purposes. Univariate analysis is
commonly used in the first stages of research, in analyzing the data at hand, before being
supplemented by more advanced, inferential bivariate or multivariate analysis.
Another set of measures used in univariate analysis, complementing the study of
central tendency, involves statistical dispersion. These measurements
look at how the values are distributed around the values of central tendency. The dispersion
measures most often studied are the range, the interquartile range, and the standard
deviation.
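As a minimal sketch of these dispersion measures, the snippet below computes the range, the interquartile range, and the sample standard deviation with Python's standard library. The `ratings` list is hypothetical and was made up purely for illustration; it does not come from any survey in this chapter.

```python
import statistics

# Hypothetical ratings from ten respondents (illustrative data only).
ratings = [2, 3, 3, 4, 5, 5, 6, 7, 8, 9]


def dispersion_summary(values):
    """Return the range, interquartile range and sample standard deviation."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "std_dev": statistics.stdev(values),  # divides by n - 1
    }


summary = dispersion_summary(ratings)
print(summary)
```

Different packages use slightly different quartile conventions, so the interquartile range reported by SPSS or SAS for the same data may differ a little from this figure.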
[Figure: Univariate data: frequency, chi-square, t-test, z-test, K-S, runs, binomial; independent and related samples]
Multivariate Analysis
Decision Analyst
In order to understand multivariate analysis, it is important to understand some of the
terminology. A variate is a weighted combination of variables. The purpose of the analysis
is to find the best combination of weights. Nonmetric data refers to data that are either
qualitative or categorical in nature. Metric data refers to data that are quantitative, and
interval or ratio in nature.
Discriminant Analysis
The purpose of discriminant analysis is to correctly classify observations or people
into homogeneous groups. The independent variables must be metric and must have a
high degree of normality. Discriminant analysis builds a linear discriminant function, which
can then be used to classify the observations. The overall fit is assessed by looking at the
degree to which the group means differ (Wilks' Lambda or D²) and how well the model
classifies. To determine which variables have the most impact on the discriminant function,
it is possible to look at partial F values. The higher the partial F, the more impact that
variable has on the discriminant function. This tool helps categorize people, like buyers
and nonbuyers.
Factor Analysis
When there are many variables in a research design, it is often helpful to reduce the
variables to a smaller set of factors. This is an interdependence technique, in which there is
no dependent variable. Rather, the researcher is looking for the underlying structure of the
data matrix. Ideally, the independent variables are normal and continuous, with at least
3 to 5 variables loading onto a factor. The sample size should be over 50 observations,
with over 5 observations per variable. Multicollinearity is generally preferred between the
variables, as the correlations are key to data reduction. Kaiser's Measure of Sampling
Adequacy (MSA) is a measure of the degree to which every variable can be predicted
by all other variables. An overall MSA of .80 or higher is very good, with a measure of
under .50 deemed poor.
There are two main factor analysis methods: common factor analysis, which extracts
factors based on the variance shared by the variables, and principal component analysis,
which extracts factors based on the total variance of the variables. Common factor analysis
is used to look for the latent (underlying) factors, whereas principal component analysis
is used to find the fewest number of variables that explain the most variance. The first
factor extracted explains the most variance. Typically, factors are extracted as long as the
eigenvalues are greater than 1.0 or the scree test visually indicates how many factors
to extract. The factor loadings are the correlations between the factor and the variables.
Cluster Analysis
The purpose of cluster analysis is to reduce a large data set to meaningful subgroups
of individuals or objects. The division is accomplished on the basis of similarity of the
objects across a set of specified characteristics. Outliers are a problem with this technique,
often caused by too many irrelevant variables. The sample should be representative of
the population, and it is desirable to have uncorrelated factors. There are three main
clustering methods: hierarchical, which is a treelike process appropriate for smaller data
sets; nonhierarchical, which requires specification of the number of clusters a priori, and
a combination of both.
Multidimensional Scaling (MDS)
The purpose of MDS is to transform consumer judgments of similarity into distances
represented in multidimensional space. This is a decompositional approach that uses
perceptual mapping to present the dimensions. As an exploratory technique, it is useful
in examining unrecognized dimensions about products and in uncovering comparative
evaluations of products when the basis for comparison is unknown. Typically there must be
at least 4 times as many objects being evaluated as dimensions. It is possible to evaluate
the objects with nonmetric preference rankings or metric similarities (paired comparison)
ratings. Kruskal’s Stress measure is a “badness of fit” measure; a stress percentage of
0 indicates a perfect fit, and over 20% is a poor fit. The dimensions can be interpreted
either subjectively by letting the respondents identify the dimensions or objectively by the
researcher.
Correspondence Analysis
This technique provides for dimensional reduction of object ratings on a set of attributes,
resulting in a perceptual map of the ratings. However, unlike MDS, both independent
variables and dependent variables are examined at the same time. This technique is more
similar in nature to factor analysis. It is a compositional technique, and is useful when
there are many attributes and many companies. It is most often used in assessing the
effectiveness of advertising campaigns. It is also used when the attributes are too similar
for factor analysis to be meaningful. The main structural approach is the development
of a contingency (crosstab) table. This means that the form of the variables should be
nonmetric. The model can be assessed by examining the chi-square value for the model.
Correspondence analysis is difficult to interpret, as the dimensions are a combination of
independent and dependent variables.
Conjoint Analysis
Conjoint analysis is often referred to as “trade-off analysis,” in that it allows for the
evaluation of objects and the various levels of the attributes to be examined. It is both a
compositional technique and a dependence technique, in that a level of preference for a
Canonical Correlation
The most flexible of the multivariate techniques, canonical correlation simultaneously
correlates several independent variables and several dependent variables. This powerful
technique utilizes metric independent variables, unlike MANOVA, such as sales, satisfaction
levels, and usage levels. It can also utilize nonmetric categorical variables. This technique
has the fewest restrictions of any of the multivariate techniques, so the results should be
interpreted with caution due to the relaxed assumptions. Often, the dependent variables
are related, and the independent variables are related, so finding a relationship is difficult
without a technique like canonical correlation.
Each of the multivariate techniques described above has a specific type of research
question for which it is best suited. Each technique also has certain strengths and
weaknesses that should be clearly understood by the analyst before attempting to interpret
the results of the technique. Current statistical packages (SAS, SPSS, S-Plus, and
others) make it increasingly easy to run a procedure, but the results can be disastrously
misinterpreted without adequate care.
2. Computer tabulation:
If codes were used to input the data into the computer for tabulation, the number 1,
2, 3 could have also been the numerical codes for three categories of responses to the
above question.
3. Percentage:
In addition to the number of respondents who fall into each category, we usually compute
the percentage of respondents as well.
Simple tabulation for ranking-type questions: If we had ordinal-scaled questions in our
questionnaire, we may have a more complex answer to tabulate. For example:
4. Tabulating rating:
Lather : 1 2 3 4 5
After the simple frequency and percentage tabulation for every question on the
questionnaire comes the second stage: cross tabulation. A cross tabulation can be
done by combining any two of the questions and tabulating the data together. This is a
two-variable cross tabulation.
In the case of a cross tabulation featuring two variables, a test of significance called the
chi-squared test can be used to test whether the two variables are significantly associated
with each other. The user who is analyzing the data on a computer with
a statistical package can request a chi-squared test along with any cross tabulation.
Commands such as CROSSTABS or CROSSTABULATION in most statistical packages
have the option of doing a chi-squared test.
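The two stages described above, cross tabulation followed by a chi-squared test, can be sketched in plain Python as follows. The coded responses are hypothetical (two made-up questions, gender and brand purchase), and the helper names `cross_tabulate` and `chi_square` are illustrative rather than commands from any statistical package.

```python
from collections import Counter

# Hypothetical coded responses: one (gender, buys_brand) pair per respondent.
responses = (
    [("male", "yes")] * 30 + [("male", "no")] * 20
    + [("female", "yes")] * 20 + [("female", "no")] * 30
)


def cross_tabulate(pairs):
    """Count how often each (row category, column category) pair occurs."""
    return Counter(pairs)


def chi_square(table):
    """Chi-squared statistic and degrees of freedom for a two-way table."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    total = sum(table.values())
    row_tot = {r: sum(table[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(table[(r, c)] for r in rows) for c in cols}
    stat = 0.0
    for r in rows:
        for c in cols:
            # Expected count under independence of the two variables.
            expected = row_tot[r] * col_tot[c] / total
            stat += (table[(r, c)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df


table = cross_tabulate(responses)
stat, df = chi_square(table)
print(dict(table), round(stat, 2), df)
```

For this made-up table every expected count is 25, so the statistic works out to 4.0 with 1 degree of freedom, which exceeds the 5% table value of 3.841 for 1 degree of freedom.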
[Figure: Multivariate techniques grouped into dependence techniques and interdependence techniques]
Application:
The application areas for experiments in marketing research are wide. Whenever a
marketing mix variable such as price, a specific promotion, or type of distribution, or even a
specific element like shelf space or color of packaging, is changed, we would
want to know its effect. Under proper conditions, an experiment can tell us the effect of a
specific variation in one or more elements of the marketing mix.
Methods:
An experiment with one independent variable is analyzed with one-way ANOVA. ANOVA stands for
analysis of variance, the generic name given to a set of techniques for studying the cause
and effect of one or more factors on a single dependent variable. In the case of more than one
dependent variable, MANOVA is used.
Variables:
The analysis of variance technique is used when the independent variables are
nominal scaled and the dependent variable is metric, or at least interval scaled.
Experimental Design:
This particular design is used when there is only one categorical independent
variable and one dependent variable. Each category of an independent variable is called
a level. The independent variable may be different levels of price, different pack sizes,
or different product colors, and the effect could be the sales of the product.
It has been more efficient in isolating the variance due to the block variables. It should
be used when we suspect that a blocking variable is affecting the relationship between
the independent and dependent variables.
The Latin Square Design is an extension of the randomized Block design. It consists
of one independent variable and two blocks, instead of one which we saw in randomized
Block design. It has no special significance in marketing research, so we will move on to
the more general case of a factorial design where any number of factors can be tested
simultaneously for their effects on the dependent variable.
This type of design is employed when we have two or more independent variables or
factors. The major advantage of this design is that multiple factors can be tested
simultaneously. There are two kinds of effect: main effects and interaction effects.
If there is only one dependent variable and one independent variable used to explain
the variation in it, then the model is known as simple regression. If multiple independent
variables are used to explain the variation in a dependent variable, it is called multiple
regression.
Y = a + b1x1 + b2x2 + … + bnxn
Recommended usage:
The hit and trial approach may be used for exploratory research. But for serious
decision-making, there has to be appropriate knowledge of the variables which are likely
to affect y, and only such variables should be used in the regression analysis.
It is also recommended that the R² value not be interpreted unless the model itself is
significant at the desired confidence level.
Method:
Discriminant analysis is very similar to the multiple regression technique. The form of
the equation in a two-variable discriminant analysis is:
Y = a + k1x1 + k2x2
Methods:
Stage 1: This can be called the factor extraction process, where our objective is to identify
how many factors will be extracted from the data. The most popular method for this is
principal component analysis. There is also a rule of thumb, based on the computation of
eigenvalues, to determine how many factors to extract.
Recommended usage:
It is used to reduce data variables into a smaller set of factors. The analysis could be
started by observing through a correlation matrix, if correlations exist between at least
some of the original variables.
Methods:
The basic methods of clustering used in computer packages are of two types:
The second type includes the K-means approach, where you specify in advance how
many clusters are required from the data.
Generally, interval-scaled variables are ideally suited for cluster analysis. Continuous
or ratio-scaled variables can also be used, but instances of such use are rarer.
Recommended usage:
1. Find the number of clusters in the data by running a hierarchical clustering programme
on the variables.
2. Once the number of clusters has been identified, a k-means clustering option can
be run on the data.
Conjoint analysis is a multivariate technique that captures the exact level of utility
that an individual customer places on various attributes of the product offering. Once we
know the utility levels for every attribute, we can combine these to find the combination
of attributes that gives the customer the highest utility, the combination that gives the
second highest utility, and so on. This information can guide competitive strategy.
Methods:
Recommended usage:
1. Individual consumer
2. Segment level
3. Across segment
Now, we have to set a level of significance for the test. This represents the chance
that we may be making a mistake of a certain type. It can be derived from the desired
confidence level. For example, if we desire that the confidence level for the test should be
95%, then (100-95)/100, or 0.05, becomes the significance level. We can think of it as a 0.05
probability that we are making a certain type of error in our decision-making process. A Type I
error is the error of rejecting the null hypothesis when it is true. Commonly used significance
levels in marketing research are 0.05 and 0.10. But there is no hard and fast rule, and the
significance level can be set at a different level if necessary. Let us assume we take the
conventional value of 0.05 for our test. We will either reject the null hypothesis or fail to reject it.
Let us proceed with the same example and set up an independent sample 't' test as
discussed above, at a significance level of 0.05. Table 1 presents the input data for the
test. This assumes that 15 customers of our brand in each of Mumbai and Delhi were asked
to rate our brand on a 7-point scale. The responses of all 30 customers are in the column
labeled 'rating' in the table. The column labeled 'city' indicates the city from which the rating
came, with a code of 1 for Mumbai and 2 for Delhi.
Table 2 presents the output from the independent sample 't' test performed on the
above data. The decision rule for the test at the 0.05 significance level is this:
If the 'p' value is less than the significance level we set for the test, we reject
the null hypothesis. Otherwise, we fail to reject the null hypothesis. In this case, we find that
the 'p' value for the 't' test is 0.011, assuming unequal variances in the two populations. This
value of 0.011 being less than our significance level of 0.05, we reject the null hypothesis and
conclude that the ratings of Mumbai and Delhi are different. If the 'p' value had been
larger than 0.05, we would have failed to reject the null hypothesis that there was no difference
between the two ratings.
In some cases, we may not have independent samples, but the same sample could
be used to do a research study involving two measurements. For instance, we may
measure respondents' attitudes towards a brand before and after it is advertised, to find out
whether their attitude has changed due to the ad campaign. In such cases, a paired sample
't' test is the appropriate statistical test.
We will illustrate using the example mentioned above. Assume we have a sample of
18 respondents whom we asked to rate, on a 10-point interval scale, their attitude towards,
say, the Tamarind brand of garments, before and after an ad campaign was released for this
brand. A rating of 1 represents “Brand is Highly Disliked” and a rating of 10 represents
“Brand is Highly Liked”, with other ratings having appropriate meanings.
The assumed data are in Table 3. The first column contains ratings given by respondents
before they saw the ad campaign, and the second column represents their ratings after
they saw the ad campaign.
Table 4 contains the resultant computer output for a paired sample 't' test. Assume that
we had set the significance level at 0.05, and that the null hypothesis is that “there is no
difference in the ratings given by respondents before and after they saw the ad campaign.”
The output table shows that the 2-tailed significance of the test is 0.000, from the last
column titled “2-tail significance”. This is the 'p' value and it is less than the level of 0.05
we had set. Therefore, as per our decision rule specified in the earlier example, we have to
reject the null hypothesis at the significance level of 0.05, and conclude that there is significant
difference in the ratings given by respondents before and after their exposure to the ad
campaign. The mean rating after the ad campaign is 5.7778 and before the campaign it
is 3.2778, and the difference of 2.5 is statistically significant.
If we have a sample size larger than 30 for the independent sample 't' test, we can use
the 'z' test instead of the 't' test. The statement of the null hypothesis remains the same in
the case of the 'z' test.
Paired Differences
Examples
A few practical examples are illustrated below to help the student apply the variety
of analysis tools used in marketing research.
Question 1: as per survey reports of a state it was found that the average annual
expenditure for food grains by households is Rs 1,596. A random sample of 34 people in
Solution:
Test Statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
follows Student's 't' distribution with d.f. = n-1 = 33.
t = -2.35
Where
P Value
Since the P-value of 0.0251 > 0.01, we do not reject Ho. The result is not statistically significant.
Conclusion
At the 1% level of significance, the data does not provide enough evidence to reject
the null hypothesis. Thus we conclude that the mean expenditure for food grains for the
city is not different from the state average.
Solution: - Since this problem involves comparing a single group’s mean with the
population mean and the standard deviation for the population is known, the proper
statistical test to use is the Z-test
So σx = σ/√n = 15/√9
= 15/3 = 5
Z = (113-100)/5 = 13/5 = 2.6. The table value of Z at the 0.05 significance level is 1.64. The
calculated value of Z is greater than the table value, hence the null hypothesis
Ho: µ = 100 is rejected, which means the alternative hypothesis H1: µ > 100 is supported.
The new product has therefore increased the satisfaction level of the consumers in a
significant manner.
Question 3: A retail outlet has recently launched a new promotion campaign in its stores
across the city. It took a random sample of 25 stores and found the average sales to
be 15 lacs per month, with a standard deviation of 9. Can we infer that the new promotion
campaign is a success if the average sales of all the stores are 12 lacs per month?
Solution: - Since this problem involves comparing a single group’s mean with the
population mean and the standard deviation for the population is not known, the proper
statistical test to use is the one-sample t-test.
Ho: µ = 12 vs H1: µ > 12, and the significance level is taken as 0.05. The degrees of freedom
are n-1 = 25-1 = 24. We use the t distribution as σ is unknown.
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
Here
$$t = \frac{15 - 12}{9/\sqrt{25}} = \frac{3}{9/5} = \frac{15}{9} \approx 1.67$$
The one-tailed value of t from the table at significance level 0.05 and 24 degrees of freedom
is 1.711 (the two-tailed value is 2.064).
The calculated value of t, 15/9 ≈ 1.67, is less than the table value, hence the
null hypothesis Ho: µ = 12 cannot be rejected. This means that there is no significant change
in the sales of the retail outlet, so we cannot infer that the new sales promotion campaign is
a success.
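As a cross-check on the arithmetic in Question 3, the short sketch below evaluates the one-sample t statistic from the figures given in the question (sample mean 15, hypothesized mean 12, standard deviation 9, n = 25). The function name `one_sample_t` is illustrative only.

```python
import math


def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """t statistic for a one-sample test when the population sigma is unknown."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


# Figures from Question 3.
t_stat = one_sample_t(sample_mean=15, hypothesized_mean=12, sample_sd=9, n=25)
df = 25 - 1
print(round(t_stat, 3), df)
```

The statistic evaluates to about 1.67. That is below both the one-tailed (1.711) and the two-tailed (2.064) table values for 24 degrees of freedom, so the decision not to reject Ho does not depend on which of the two is used.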
Question 4: A company has recently launched a new version of its soap with changes
in packaging and look. Sales across the different territories have a mean of 50,000
units and a standard deviation of 4,000 units. The company has now taken sales data from 81
territories and found that average sales are 52,000 units. Can we conclude that the
new packaging significantly improved the sales?
Solution-
The null hypothesis and alternative hypothesis will be
Ho: µ = 50,000 vs H1: µ > 50,000
$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}, \quad \text{where } \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
The standard error of the mean can be calculated by the following formula:
σx = σ/√n = 4000/√81
= 4000/9 = 444.4
So z = (52000-50000)/444.4 = 2000/444.4 = 4.5
Now we look up the value of z from the table at the 0.05 level of significance, which is
1.64. The calculated value of z is greater than the table value, hence our null hypothesis is
rejected and the alternative hypothesis H1: µ > 50,000 is supported.
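The z-test in Question 4 can be reproduced with a few lines of Python. This is only a sketch using the figures stated in the question; `one_sample_z` is an illustrative helper name, and the critical value is taken from the standard normal distribution in the standard library rather than from a printed table.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, population_mean, population_sd, n):
    """z statistic and upper-tail p-value when the population sigma is known."""
    standard_error = population_sd / math.sqrt(n)
    z = (sample_mean - population_mean) / standard_error
    p_upper = 1 - NormalDist().cdf(z)  # one-tailed test, H1: mu > mu0
    return z, p_upper


# Figures from Question 4: mean 50,000, sigma 4,000, n = 81, sample mean 52,000.
z_stat, p_value = one_sample_z(52000, 50000, 4000, 81)
critical = NormalDist().inv_cdf(0.95)  # one-tailed 5% critical value
print(round(z_stat, 2), round(critical, 3), z_stat > critical)
```

The statistic comes to 4.5, well above the one-tailed 5% critical value of roughly 1.645, which agrees with the rejection of the null hypothesis.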
Question 5: The following information was collected in a survey of car owners in two cities,
Delhi and Mumbai. The sample size taken was 100.
Delhi (X) Mumbai (Y) Total
Can we infer from the above data that the cars owned by women are relatively more
in Mumbai than in Delhi?
Solution
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where O is the Observed Frequency in each category
E is the Expected Frequency in the corresponding category
df is the “degree of freedom” (n-1)
χ2 is Chi Square
Expected frequency for women owning a car in Delhi can be calculated as follows
Similarly we can calculate the other frequencies and make the following table
Category                         O     E     O - E   (O - E)²/E
Cars owned by women in Delhi     10    12    -2      4/12 = 0.33
Cars owned by women in Mumbai    20    18    2       4/18 = 0.22
χ2 = (0.33 + 0.22 + 0.14 + 0.09) = 0.78
The degrees of freedom will be (c-1)(r-1), where c is the number of columns and r is the
number of rows.
Now we have to look up the value of χ2 at 1 degree of freedom and a significance level of
5%. The level of significance can be changed, but normally we take it as 5%.
The table value is 3.841.
The calculated value of χ2 is 0.78 and the table value is 3.841. The calculated value is
lower than the table value, hence we cannot reject the null hypothesis, and we cannot conclude
that the cars owned by women are relatively more in Mumbai than in Delhi.
Question 6: A drug manufacturing company is testing its new drug for curing baldness.
In an experiment on 500 persons, half were given the new drug and the rest were
given a placebo. The patients' reactions to treatment were recorded in the following table.
Can we conclude that the new drug is significantly different from the placebo in curing
baldness?
Solution: - we assume that the new drug is not significantly different from the placebo
in treatment of baldness
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where O is the Observed Frequency in each category
χ2 is Chi Square
First we have to calculate the expected frequencies. For expected frequency of patients
getting cured by drug can be calculated as:
E11 = (250x280)/ 500 = 140 similarly we can calculate the expected frequencies and
make the following table.
Expected values for the result of the new drug on the baldness
Treatment Cured significantly Allergic reaction No effect Total
O     E     O - E   (O - E)²   (O - E)²/E
30    35    -5      25         0.714
40    35    5       25         0.714
70    75    -5      25         0.333
80    75    5       25         0.333
Total                          3.522
Now we have found that the calculated value of χ2 = 3.522. We look up the
table value of χ2 at significance level 0.05 and degrees of freedom (2-1)(3-1) = 1x2 = 2.
The table value of χ2 is 5.99, which is greater than the calculated value, so the null hypothesis
cannot be rejected. This means there is no significant difference between the results of the
new drug and the placebo in curing baldness.
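The expected-frequency rule and the per-cell chi-square contributions used in Question 6 can be sketched as follows. Only the four cells whose observed and expected counts are printed above are used; the cells for the "cured" category are not listed in the text, so the partial sum below is smaller than the full total of 3.522. The helper names are illustrative.

```python
def expected_frequency(row_total, column_total, grand_total):
    """Expected cell count under independence: row total x column total / N."""
    return row_total * column_total / grand_total


def chi_square_terms(cells):
    """Per-cell contributions (O - E)^2 / E for (observed, expected) pairs."""
    return [(o - e) ** 2 / e for o, e in cells]


# E11 from the text: 250 patients on the drug, 280 cured overall, 500 in total.
e11 = expected_frequency(250, 280, 500)

# (observed, expected) pairs for the four cells listed in the table above.
listed_cells = [(30, 35), (40, 35), (70, 75), (80, 75)]
terms = chi_square_terms(listed_cells)
partial_sum = sum(terms)

print(e11, [round(t, 3) for t in terms], round(partial_sum, 3))
```

The contributions of 0.714 and 0.333 match the last column of the table, and `e11` reproduces the expected frequency of 140.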
Question 7: The following data were collected from a survey about the monthly income of
households in a locality and their expenditure in retail outlets. Is there any relationship
between the two variables?
Income (in thousands)                          18 14 25 12 30 22 36 10
Expenditure in retail outlets (in thousands)   8  7  10 5  14 10 16 4
Solution:-
$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$$
Now we have to make the following table for calculating the values required
X Y XY X2 Y2
18 8 144 324 64
14 7 98 196 49
25 10 250 625 100
12 5 60 144 25
30 14 420 900 196
22 10 220 484 100
36 16 576 1296 256
10 4 40 100 16
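ΣX = 167   ΣY = 74   ΣXY = 1808   ΣX² = 4069   ΣY² = 806
With n = 8 pairs of observations:
$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}} = \frac{(8 \times 1808) - (167)(74)}{\sqrt{[(8 \times 4069) - (167)^2][(8 \times 806) - (74)^2]}}$$
$$= \frac{14464 - 12358}{\sqrt{(32552 - 27889)(6448 - 5476)}} = \frac{2106}{\sqrt{4663 \times 972}}$$
$$= \frac{2106}{\sqrt{4532436}} = \frac{2106}{2128.95} \approx 0.99$$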
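The correlation in Question 7 can be checked directly from the eight income and expenditure pairs listed in the question. The sketch below applies the same sums-based formula for the correlation coefficient; `pearson_r` is an illustrative helper, not a function from a statistical package.

```python
import math

# The eight (income, expenditure) observations from Question 7, in thousands.
income = [18, 14, 25, 12, 30, 22, 36, 10]
expenditure = [8, 7, 10, 5, 14, 10, 16, 4]


def pearson_r(x, y):
    """Pearson correlation computed from raw sums, as in the worked solution."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r(income, expenditure)
print(round(r, 3))
```

Note that `n` is taken from the length of the data, so the function uses n = 8 here. The result is close to 0.99, indicating a strong positive relationship between income and retail expenditure.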
Area                                            1  2  3  4  5  6  7  8  9
Expenditure on sales promotion (in thousands)   25 35 30 60 45 30 40 60 45
Can we predict the sales volume on the basis of given sales promotion expenditure
for an area?
Solution: We assume that the two variables, expenditure on sales promotion and
sales, are linearly related to each other. To predict sales on the basis of expenditure
on sales promotion, we have to find the regression equation.
Y = a + bX, where Y is the sales volume, X is the expenditure on sales promotion activity, and
a and b are the intercept and the coefficient of X.
To find the values of a and b we can use the following normal equations:
ΣY = na + bΣX and ΣXY = aΣX + bΣX², where n is the number of observations; here it
is nine.
Area   Y      X     X²     XY
2      90     35    1225   3150
3      70     30    900    2100
4      110    60    3600   6600
5      95     45    2025   4275
6      75     30    900    2250
9      80     45    2025   3600
ΣY = 820   ΣX = 370   ΣX² = 16500   ΣXY = 34475
ΣY = na + bΣX
ΣXY = aΣX + bΣX²
(1) 820 = 9a + 370b
(2) 34475 = 370a + 16500b
Multiplying equation (1) by 16500/370 = 44.6 gives
(3) 36572 = 401.4a + 16500b
Subtracting (2) from (3):
36572 - 34475 = (401.4 - 370)a
2097 = 31.4a
a = 2097/31.4 = 66.8
Similarly, from (1):
370b = 820 - (66.8)9 = 820 - 601.2 = 218.8
So b = 218.8/370 = 0.59
Y = 66.8 + 0.59X
So we can predict the sale of area by putting the value of X for that area.
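Hand-rounded elimination steps can drift, so it is useful to cross-check the regression coefficients by solving the two normal equations directly. The sketch below uses only the sums that appear in the working (n = 9, ΣX = 370, ΣY = 820, ΣX² = 16500, ΣXY = 34475); the function names are illustrative.

```python
def fit_line_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Solve sum_y = n*a + b*sum_x and sum_xy = a*sum_x + b*sum_x2 for a, b."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Sums taken from the worked solution of the sales promotion example.
a, b = fit_line_from_sums(n=9, sum_x=370, sum_y=820, sum_x2=16500, sum_xy=34475)


def predict_sales(promotion_spend):
    """Predicted sales for a given sales promotion expenditure (in thousands)."""
    return a + b * promotion_spend


print(round(a, 2), round(b, 3), round(predict_sales(50), 1))
```

Solved this way, the intercept is about 66.7 and the slope about 0.59, and both normal equations are satisfied exactly by these values.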
Question 9 : A telecom company has introduced two new plans T 199 and T 299 for
their pre paid customers, the two plans were launched and after a month the company
conducted a survey to find out the satisfaction level of the customers from the two plans,
it was found that out of sample of 60 customers using T 199 plan 18 were very much
satisfied and out of sample of 100 customers using T 299 plan 22 were very satisfied.
Find out which talk plan is more effective.
Solution:
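We will assume that both plans have given the same level of satisfaction, hence
Ho: p1 = p2 and H1: p1 > p2, where p1 and p2 are the proportions of customers who are
very satisfied with the plans T 199 and T 299.
Here p1 = 18/60 = 0.30 and p2 = 22/100 = 0.22. The pooled proportion is
$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{18 + 22}{60 + 100} = 0.25$$
and the test statistic is
$$Z = \frac{p_1 - p_2}{\sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.08}{\sqrt{0.25 \times 0.75 \times (1/60 + 1/100)}} = \frac{0.08}{0.0707} = 1.131$$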
The table value of Z at the 0.05 significance level is 1.645, which is greater than the
calculated value of Z, 1.131. Hence we cannot reject the null hypothesis Ho: p1 = p2, which
means there is no significant difference between the satisfaction levels generated by the
T 199 plan and the T 299 plan.
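The two-proportion z-test in Question 9 can be sketched as below, assuming the usual pooled-proportion standard error. The counts are those given in the question (18 of 60 and 22 of 100 very satisfied customers); `two_proportion_z` is an illustrative name.

```python
import math


def two_proportion_z(successes1, n1, successes2, n2):
    """z statistic for Ho: p1 = p2 using the pooled sample proportion."""
    p1, p2 = successes1 / n1, successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# T 199: 18 of 60 very satisfied; T 299: 22 of 100 very satisfied.
z_stat = two_proportion_z(18, 60, 22, 100)
print(round(z_stat, 3), z_stat > 1.645)
```

The statistic is about 1.13, which falls short of the 1.645 table value, consistent with the conclusion reached above.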
Question 10: A company recently launched a new product in the market and
investigated the brand preference for its product among its distribution channel partners. The
company selected three states from the north zone and collected data from 6
distributors in each state. The scores were calculated on the basis of a questionnaire;
a higher score represents a stronger preference by the distributor for the company's product.
Using a 0.05 significance level, test whether there is any difference in the brand
preference shown by the distributors of the three states.
6 5 6
5 5 7
4 4 6
5 4 5
6 5 6
4 4 6
Solution
Assuming that all the distributors of the three states show equal brand preference
X1 X1² X2 X2² X3 X3²
6 36 5 25 6 36
5 25 5 25 7 49
4 16 4 16 6 36
5 25 4 16 5 25
6 36 5 25 6 36
4 16 4 16 6 36
Total 30 154 27 123 36 218
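CF, correction factor = T²/n, where T is the grand total (30 + 27 + 36 = 93) and n is the
total number of observations, which is equal to 18 here.
CF = (93)²/18 = 8649/18 = 480.5
SST = {154+123+218} - 480.5 = 14.5
SSB = (ΣX1)²/n1 + (ΣX2)²/n2 + (ΣX3)²/n3 - CF
Here n1, n2 and n3 represent the sample sizes of the three areas, Punjab, Uttar Pradesh and
Haryana.
SSB = (30)²/6 + (27)²/6 + (36)²/6 - 480.5 = 487.5 - 480.5 = 7
SSE = SST - SSB = 14.5 - 7 = 7.5
MSB = SSB/df1 = 7/2 = 3.5
MSE = SSE/df2 = 7.5/15 = 0.5
F = MSB/MSE = 3.5/0.5 = 7
Source of variation   Sum of squares   df   Mean square   F
Between states        7                2    3.5           7
Within states         7.5              15   0.5
Total                 14.5             17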
The table value of F for df1 = 2, df2 = 15 and significance level 0.05 is 3.68, and the calculated
value of F is 7, which is greater. Therefore the null hypothesis that brand preference in the
three states is equal is rejected, and we can say there is a difference in brand preference among
distributors of the three states of the northern region.
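The one-way ANOVA in Question 10 can be reproduced with the correction-factor method used in the solution. The sketch below takes the three columns of scores from the question; the keys `state_1` to `state_3` are placeholder labels, since the data table does not say which column belongs to which state.

```python
# Scores of six distributors in each of the three states (columns of the table).
groups = {
    "state_1": [6, 5, 4, 5, 6, 4],
    "state_2": [5, 5, 4, 4, 5, 4],
    "state_3": [6, 7, 6, 5, 6, 6],
}


def one_way_anova(samples):
    """Return SS between, SS within, SS total and the F ratio."""
    all_values = [v for s in samples for v in s]
    n_total = len(all_values)
    correction = sum(all_values) ** 2 / n_total  # CF = T^2 / n
    ss_total = sum(v * v for v in all_values) - correction
    ss_between = sum(sum(s) ** 2 / len(s) for s in samples) - correction
    ss_within = ss_total - ss_between
    df_between = len(samples) - 1
    df_within = n_total - len(samples)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, ss_total, f_ratio


ss_between, ss_within, ss_total, f_ratio = one_way_anova(list(groups.values()))
print(ss_between, ss_within, ss_total, f_ratio)
```

The sums of squares come to 7, 7.5 and 14.5, and the F ratio to 7, in line with the figures in the solution.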
Question11: A mobile company claims that the average life of their M800 mobile
phone is more than 5000 hrs, a random sample of 25 was tested and a mean and standard
deviation of 5200 and 250 were computed. Is company’s claim of 5000hrs valid?
Solution:-
Assuming the population follows a normal distribution and the claim is valid,
the hypotheses become
Ho: µ ≥ 5000 vs H1: µ < 5000
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
Here
$$t = \frac{5200 - 5000}{250/\sqrt{25}} = \frac{200}{50} = 4$$
The one-tailed table value of t at the 0.05 significance level with n-1 = 25-1 = 24 degrees of
freedom is 1.711.
Ho is rejected only if the calculated t is less than -1.711. The calculated value of t is 4,
which is positive, so the null hypothesis Ho: µ ≥ 5000 cannot be rejected. The sample
therefore gives no evidence against the company's claim that the average life of the M800
mobile phone is more than 5000 hrs.
State                              1  2  3  4  5  6  7  8  9  10 11 12
1st month sales (in thousands)     50 42 51 26 35 42 60 41 70 55 62 38
4th month sales (in thousands)     62 40 61 35 30 52 68 51 84 63 72 50
Solution
We will assume that there is no growth in the sales of the new brand of tea within three
months, hence our null hypothesis becomes
Ho: µd = 0 vs H1: µd > 0
$$t = \frac{\bar{d} - \mu_d}{S_d/\sqrt{n}}, \qquad \bar{d} = \frac{\sum d}{n}$$
Now to calculate the values required we will construct the following table
State   1st month   4th month   d     d²
1       50          62          12    144
2       42          40          -2    4
3       51          61          10    100
4       26          35          9     81
5       35          30          -5    25
6       42          52          10    100
7       60          68          8     64
8       41          51          10    100
9       70          84          14    196
10      55          63          8     64
11      62          72          10    100
12      38          50          12    144
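From the table, Σd = 96 and Σd2 = 1122 with n = 12, so
$$\bar{d} = \frac{96}{12} = 8, \qquad S_d = \sqrt{\frac{1122 - 12(8)^2}{11}} = \sqrt{32.18} \approx 5.67$$
$$t = \frac{\bar{d} - \mu_d}{S_d/\sqrt{n}} = \frac{8 - 0}{5.67/\sqrt{12}} \approx 4.89$$
Now we will see the value of t from the table at 0.05 significance level and degree of
freedom of n-1 = 12-1 = 11.
The table value is 1.796. The calculated value of t is greater than the table value, hence
we reject the null hypothesis Ho: µd = 0, so we can conclude that there is significant growth
in the sales of the new brand of tea.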
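The paired-sample calculation can be reproduced step by step from the two rows of sales figures. The following is a minimal sketch using only the standard library; the function name `paired_t` and the variable names are illustrative assumptions, and the sample standard deviation is computed with the usual n - 1 divisor.

```python
import math

first_month = [50, 42, 51, 26, 35, 42, 60, 41, 70, 55, 62, 38]
fourth_month = [62, 40, 61, 35, 30, 52, 68, 51, 84, 63, 72, 50]

def paired_t(before, after, mu_d=0.0):
    """Return (d_bar, s_d, t) for paired observations."""
    d = [a - b for b, a in zip(before, after)]            # per-state differences
    n = len(d)
    d_bar = sum(d) / n                                     # mean difference
    s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
    t = (d_bar - mu_d) / (s_d / math.sqrt(n))              # paired t statistic
    return d_bar, s_d, t

d_bar, s_d, t_stat = paired_t(first_month, fourth_month)
print(f"d bar = {d_bar:.2f}, Sd = {s_d:.3f}, t = {t_stat:.3f}")
```

The resulting t is then compared with the table value for n - 1 = 11 degrees of freedom, as in the solution above.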
Question 13: The following table gives data about the sales targets achieved by 4
salesmen in three months, Jan, Feb and Mar of 2010.
Month        Salesman
          A     B     C     D
JAN      50    40    48    39
FEB      46    48    50    45
MAR      39    44    40    39
Is there a significant difference in the sales made by the four salesmen? Is there a
significant difference in the sales made during the different months?
Solution
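We assume that there is no significant difference between the sales targets achieved
by the four salesmen, and none between the different months. Coding the above data by
subtracting 40 from each observation, we construct the two way ANOVA table as follows
Month            A     B     C     D    Row total
JAN             10     0     8    -1       17
FEB              6     8    10     5       29
MAR             -1     4     0    -1        2
Column total    15    12    18     3       48
Correction factor = (48)2 / 12 = 192
Sum of squares between months (rows) = (17)2/4 + (29)2/4 + (2)2/4 - 192
= { 72.25+210.25+1}-192 = 91.5 = SSR
Sum of squares between salesmen (columns) = {(15)2 + (12)2 + (18)2 + (3)2}/3 - 192
= 234 - 192 = 42 = SSC
Total sum of squares = 408 - 192 = 216 = SST
Residual sum of squares = 216 - 91.5 - 42 = 82.5 = SSE
Source of variation     Sum of squares    df    Mean square    F
Between salesmen             42            3      14           14/13.75 = 1.018
Between months               91.5          2      45.75        45.75/13.75 = 3.327
Residual                     82.5          6      13.75
Total                       216           11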
The table value of F for df1 = 3, df2 = 6 at a significance level of 0.05 is 4.76. Since the
calculated value of F, 1.018, is less than the table value, the null hypothesis is not rejected,
so we can say that the sales targets achieved by the salesmen do not differ significantly.
Similarly, the table value of F for df1 = 2, df2 = 6 at a significance level of 0.05 is 5.14. Since
the calculated value of F, 3.327, is less than the table value, the null hypothesis is not rejected,
so we can conclude that the sales made during different months do not differ significantly.
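The sums of squares and F ratios of the two way ANOVA can be verified with a short script. This is a minimal sketch for a two way layout with one observation per cell; the function name `two_way_anova` and the dictionary keys are illustrative assumptions. It works on the uncoded data, because subtracting a constant such as 40 from every observation does not change any sum of squares.

```python
def two_way_anova(table):
    """Two way ANOVA without replication; `table` is a list of equal-length rows."""
    n_rows, n_cols = len(table), len(table[0])
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / (n_rows * n_cols)       # correction factor

    ss_total = sum(x * x for row in table for x in row) - correction
    ss_rows = sum(sum(row) ** 2 for row in table) / n_cols - correction
    ss_cols = sum(sum(col) ** 2 for col in zip(*table)) / n_rows - correction
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    df_rows, df_cols = n_rows - 1, n_cols - 1
    ms_error = ss_error / (df_rows * df_cols)
    return {
        "ss_rows": ss_rows, "ss_cols": ss_cols,
        "ss_error": ss_error, "ss_total": ss_total,
        "f_rows": (ss_rows / df_rows) / ms_error,
        "f_cols": (ss_cols / df_cols) / ms_error,
    }

sales = [
    [50, 40, 48, 39],  # JAN: salesmen A, B, C, D
    [46, 48, 50, 45],  # FEB
    [39, 44, 40, 39],  # MAR
]
result = two_way_anova(sales)
print(f"F (salesmen) = {result['f_cols']:.3f}, F (months) = {result['f_rows']:.3f}")
```

Here the rows are months and the columns are salesmen, so `f_cols` is compared with the table value for (3, 6) degrees of freedom and `f_rows` with the table value for (2, 6).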
Question 14: The following data relate to the age of insured persons and the
mediclaims submitted by them in 3 years.
Insured person      1    2    3    4    5    6    7    8    9   10
Age                30   32   35   40   48   50   52   55   57   61
Claims submitted    1    0    2    5    2    4    6    5    7    8
Age (x)   X = x - 46    X2    Claims (y)   Y = y - 4    Y2    XY
  30        -16         256       1           -3         9    48
  32        -14         196       0           -4        16    56
  35        -11         121       2           -2         4    22
  40         -6          36       5            1         1    -6
  48          2           4       2           -2         4    -4
  50          4          16       4            0         0     0
  52          6          36       6            2         4    12
  55          9          81       5            1         1     9
  57         11         121       7            3         9    33
  61         15         225       8            4        16    60
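$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$$
Since X and Y are deviations from their means, ΣX = ΣY = 0, and with ΣXY = 230,
ΣX2 = 1092 and ΣY2 = 64 we get
$$r = \frac{10 \times 230}{\sqrt{(10 \times 1092)(10 \times 64)}} = \frac{230}{264.363} = 0.870$$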
The value of r is positive and close to one, which means that the age of the insured
persons and the mediclaims submitted by them are positively correlated to a high degree.
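The deviation method used in the table can be written out directly. The following is a minimal sketch, with the function name `pearson_r_deviation` chosen for illustration; it subtracts the mean from each series and then applies r = ΣXY / √(ΣX2 · ΣY2).

```python
import math

ages = [30, 32, 35, 40, 48, 50, 52, 55, 57, 61]
claims = [1, 0, 2, 5, 2, 4, 6, 5, 7, 8]

def pearson_r_deviation(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dev_x = [x - mean_x for x in xs]                 # X = x - mean of x
    dev_y = [y - mean_y for y in ys]                 # Y = y - mean of y
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_xx = sum(a * a for a in dev_x)
    sum_yy = sum(b * b for b in dev_y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

r_claims = pearson_r_deviation(ages, claims)
print(f"r = {r_claims:.3f}")
```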
Question 15: The following data were collected in a survey about the monthly income of
households in a locality and their expenditure in retail outlets. Is there any relationship
between the two variables?
Income (in thousands)                           18   14   25   12   30   22   36   10
Expenditure in retail outlets (in thousands)     8    7   10    5   14   10   16    4
Solution
$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\{n\sum X^2 - (\sum X)^2\}\{n\sum Y^2 - (\sum Y)^2\}}}$$
Now we have to make the following table for calculating the values required
 X     Y     XY     X2     Y2
18     8    144    324     64
14     7     98    196     49
25    10    250    625    100
12     5     60    144     25
30    14    420    900    196
22    10    220    484    100
36    16    576   1296    256
10     4     40    100     16
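Here n = 8, ΣX = 167, ΣY = 74, ΣXY = 1808, ΣX2 = 4069 and ΣY2 = 806, so
$$r = \frac{(8 \times 1808) - (167)(74)}{\sqrt{\{(8 \times 4069) - (167)^2\}\{(8 \times 806) - (74)^2\}}}$$
$$= \frac{14464 - 12358}{\sqrt{(32552 - 27889)(6448 - 5476)}} = \frac{2106}{\sqrt{(4663)(972)}}$$
$$= \frac{2106}{\sqrt{4532436}} = \frac{2106}{2128.95} = 0.99$$
The value of r is positive and very close to one, so there is a strong positive relationship
between the monthly income of households and their expenditure in retail outlets.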
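The raw-sums formula for r can be checked against the table with a few lines of code. This is a minimal sketch; the function name `pearson_r_raw_sums` is an illustrative assumption. Note that the table lists 8 households, so n is taken from the data as 8.

```python
import math

income = [18, 14, 25, 12, 30, 22, 36, 10]        # X, in thousands
expenditure = [8, 7, 10, 5, 14, 10, 16, 4]       # Y, in thousands

def pearson_r_raw_sums(xs, ys):
    """Pearson correlation from raw sums: n, sum X, sum Y, sum XY, sum X^2, sum Y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

n_households = len(income)
r_income = pearson_r_raw_sums(income, expenditure)
print(f"n = {n_households}, r = {r_income:.3f}")
```

This form avoids computing deviations from the mean, which is convenient when the means are not whole numbers, as is the case here.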
Summary
Analysis of survey data starts with simply tabulating the collected data.
Before we do this, the data is assumed to be coded if it is nominal scaled. If we are using
SPSS, value labels for nominal variables must also be entered and saved. ANOVA
stands for analysis of variance, the generic name given to a set of techniques for studying
the cause and effect relationship of one or more factors on a single dependent variable. The
analysis of variance technique is used when the independent variables are nominal scaled and
the dependent variable is metric. Correlation and regression are best applied together to
test whether metric variables are associated with each other, and whether the dependent
variable can be explained by some independent variables, or predicted from them. In
marketing, the dependent variable of interest is usually sales. The independent variables
can be any marketing mix variables which affect sales, such as advertising expenditure,
number of sales people, promotional expenditure and so on. Discriminant analysis is
somewhat similar to regression analysis. There is a dependent variable and there are
some independent variables used to predict it, but the dependent variable is categorical, not
metric. It is used to classify people or objects into two or more groups based on some
knowledge of their characteristics. Factor analysis provides a way of
reducing the number of variables in a research problem to a smaller and more manageable
number by combining related ones into factors. This relieves the researcher of the
confusion arising from overlapping measures of the same underlying variables. There are
many clustering methods. The basic idea of cluster analysis is to group similar objects together.
Some measure of similarity is used to do this. The two basic types of clustering methods are
hierarchical and non-hierarchical, and both try to identify the number of clusters in the
data. Conjoint analysis is ideally suited to product design problems. This is because the
technique is able to put numerical values on the mysteries of the consumer's mind. It tries
to map his decision making process and the tradeoffs he makes while choosing a particular
product offering. The result of conjoint analysis is a set of utility values for every product
variation and attribute level on offer.
1. Which of these is a type of data analysis that is used in analyzing raw data?
a) Bivariate analysis
b) Regression analysis
c) Conjoint analysis
d) None of the above
c) Analysis of Variety
d) None of the above
7. The main objective of regression analysis is to explain the variation in one variable,
based on the variation in one or more other variables.
a) One variable
b) Multi variable
c) Two variable
d) One or more variable
Questions & Exercises
1. Define experimental design in ANOVA.
2. Discuss the method and usage of cluster analysis.
3. What is a null hypothesis?
4. Define the paired sample t test.
5. Describe the concept of data analysis.
6. How do you differentiate between simple tabulation and cross tabulation?
7. Describe the design of experiments.