BBA BRM Module 10final
Content:
• Tabular and Pictorial/Graphical Representation of Data using Excel
• Descriptive Statistics
• Correlation and Regression
• Basic statistical tests (t, chi-square)
Data Processing
Processing data is very important in market research. After collecting the data, the next task
of the researcher is to analyze and interpret it. The purpose of analysis is to draw
conclusions. Processing the data involves the following:
1. Data Analysis: It involves organizing the data in a particular manner. It is the process of
systematically applying statistical and/or logical techniques to describe, illustrate
and evaluate data, with the goal of discovering useful information and supporting
decision-making.
2. Interpretation of data: It is a method for deriving conclusions from the data analyzed.
Analysis of data is not complete unless it is interpreted.
3. Classification of data: It is the process in which the collected data are arranged in separate classes,
groups or sub-groups according to their common characteristics. Raw data cannot be easily
understood and are not fit for further analysis and interpretation. Classification of data helps
users in comparison and analysis. For example, the population of a town can be grouped
according to gender, age, marital status, etc.
Main features of classification of data:
• To simplify the complex data: Reduce the bulk of information (data) under
investigation into a simplified and meaningful form.
• To economize space: Space is saved without sacrificing the quality and quantity of data.
• To depict trend
• To facilitate comparison: Data presented in a tabular form, having rows and columns,
facilitate quick comparison among its observations.
• To facilitate statistical comparison
• To help reference: It can be used as reference for future needs.
Tabulation may be of two types:
i. Simple or One-way Tabulation: A single variable is counted, so this is also called
univariate tabulation. Multiple-choice questions which allow only one answer may use
one-way tabulation. The questions are pre-determined, and tabulation consists of
counting the number of responses falling into each category and calculating the
percentage. There are two types of univariate tabulation:
a. Question with only one response: If the question has only one answer, the
tabulation may be of the following type:
Table No. 1
Study of number of children in a family
No. of children    No. of families    Percentage (%)
0                  10                 5
1                  30                 15
2                  70                 35
3                  60                 30
4                  20                 10
More than 4        10                 5
Total              200                100
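The percentage column of Table No. 1 can be reproduced with a few lines of code. The following is a minimal sketch in Python (the notes themselves use Excel); the variable names are illustrative assumptions, and the counts are copied from the table.

```python
# One-way (univariate) tabulation: count per category and percentage of total.
# Counts are taken from Table No. 1; the names used here are illustrative.
families = {
    "0": 10,
    "1": 30,
    "2": 70,
    "3": 60,
    "4": 20,
    "More than 4": 10,
}

total = sum(families.values())

# Percentage of families falling into each category
percentages = {category: 100 * count / total for category, count in families.items()}

print(f"{'No. of children':<18}{'Families':>10}{'Percent':>10}")
for category, count in families.items():
    print(f"{category:<18}{count:>10}{percentages[category]:>10.1f}")
print(f"{'Total':<18}{total:>10}{sum(percentages.values()):>10.1f}")
```

In a spreadsheet the same result is obtained by dividing each count by the column total and multiplying by 100.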
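b. Question with multiple responses: If the question allows more than one answer, the
tabulation may be of the following type:
Table No. 2
Choice of an automobile
What do you dislike about the car which you own at present?
Parameter                No. of respondents
Engine                   10
Body                     15
Mileage                  15
Interior                 6
Color                    18
Maintenance frequency    16
Inconvenience            20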
There is duplication because a respondent may be dissatisfied with the mileage given
by the vehicle and may also dislike the interior of the car. Here, an owner may dislike
the car on more than one parameter. Suppose we are tabulating the causes of inconvenience felt
by the car owner; they can be classified as follows:
• Cramped legroom
• Rear seat problem
• Difficulty in raising the window
• Difficulty in locking the door.
Now, the tabulation of each of the specific factors would help to identify the real
reason for dislike.
ii. Cross Tabulation or Two-way Tabulation: It includes two or more variables, which are
treated simultaneously. When two variables are involved, this is known as Bivariate
Tabulation. Tabulation can be done entirely by hand, entirely by machine, or by
both hand and machine. Cross tabulation is very commonly used in market research.
Example: Popularity of health drink among families having different incomes. Suppose
500 families are contacted and data collected is as follows:
Table No. 3
Use of Health Drink
                    No. of children per family
Income per month    0     1     2     3     4     5      More than 5    No. of families
<1000               5     0     8     9     11    15     25             73
1001-2000           10    5     8     10    13    18     27             91
2001-3000           20    10    12    14    20    22     32             130
3001-4000           12    3     6     7     13    20     30             91
4001-5000           6     2     6     5     10    15     20             64
>5000               6     1     4     5     7     10     18             51
Total               59    21    44    50    74    100    152            500
Note: The above table shows that consumption of a health drink depends not only on
income but also on the number of children per family. Health drinks are also
popular among families with no children: 59 out of 500 families consume health drinks
even though they have no children, which shows that even adults consume this
drink. The table also shows that families in the income
group of 2001 to 3000 consume health drinks the most (130 families).
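The row and column totals of Table No. 3 can be checked with a short script. This is a minimal sketch in Python; the list names are illustrative assumptions, and the cell counts are copied from the table.

```python
# Cross tabulation (two-way table): income group x number of children.
# Cell counts are copied from Table No. 3; the names used here are illustrative.
income_groups = ["<1000", "1001-2000", "2001-3000", "3001-4000", "4001-5000", ">5000"]
children = ["0", "1", "2", "3", "4", "5", "More than 5"]

counts = [
    [5, 0, 8, 9, 11, 15, 25],
    [10, 5, 8, 10, 13, 18, 27],
    [20, 10, 12, 14, 20, 22, 32],
    [12, 3, 6, 7, 13, 20, 30],
    [6, 2, 6, 5, 10, 15, 20],
    [6, 1, 4, 5, 7, 10, 18],
]

# Marginal totals: families per income group (rows) and per family size (columns)
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(row_totals)

# Income group with the largest number of families using the health drink
top_income_group = income_groups[row_totals.index(max(row_totals))]

print("Row totals:   ", dict(zip(income_groups, row_totals)))
print("Column totals:", dict(zip(children, col_totals)))
print("Grand total:  ", grand_total)
print("Largest income group:", top_income_group)
```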
The form in which tabulation is to be done is decided by taking into account:
a. The purpose of study, and
b. The use of statistical tools e.g., mean, mode, standard deviation etc. Improper
tabulation may create difficulties in the use of these tools.
• If the researcher wants to infer something about the total population from which the
sample was taken, statistical methods are used to make the inference. While a
hypothesis is useful, it is not always necessary. Many a time, a researcher is
interested in collecting and analysing data that indicate the main characteristics
of the population without a hypothesis.
• Also, a hypothesis may be rejected but can never be accepted except tentatively, because
further evidence may prove it wrong. It is wrong to conclude that, since a hypothesis
was not rejected, it can be accepted as valid.
• Null Hypothesis: It is a statement about the population, whose credibility or validity
the researcher wants to assess based on the sample. A Null hypothesis is formulated
specifically to test for possible rejection or nullification. Hence the name ‘Null
Hypothesis’. Null hypothesis always states “no difference”. It is this null hypothesis that
is tested by the researcher.
There are several bases on which hypotheses are classified:
a. Descriptive Hypothesis: As the name implies, it describes some characteristic of an
object, a situation, an individual or even an organization. Examples: Why do youngsters
prefer “X” soft drinks? Decentralization of decision making is more effective. All these
tell us the characteristics of some entity.
b. Relational Hypothesis: In this case we describe the relationship between two variables.
Examples: Why do rich people shop at Lifestyle? The rate of attrition is high in those jobs
where there is night-shift working.
Steps involved in Hypothesis Testing
1. Formulate the null hypothesis H0 and the alternative hypothesis Ha. According
to the given problem, H0 represents the value of some parameter of the population.
2. Select an appropriate test, assuming H0 to be true.
3. Calculate the value of the test statistic.
4. Select the level of significance, either 1% or 5%.
5. Find the critical region.
6. If the calculated value lies within the critical region, then reject H0; otherwise, do not reject H0.
7. State the conclusion in writing.
Types of Tests:
1. Parametric test:
i. Parametric tests are more powerful. The data in this test is derived from
interval and ratio measurement.
ii. In parametric tests, it is assumed that the data follows normal distributions.
Examples of parametric tests are (a) Z-test, (b) t-test, and (c) F-test.
iii. Observations must be independent, i.e., the selection of any one item should not
affect the chances of any other item being included in the sample.
2. Non-parametric test: They are used to test the hypothesis with nominal and ordinal
data.
i. We do not make assumptions about the shape of population distribution.
ii. These are distribution-free tests.
iii. The hypothesis of non-parametric test is concerned with something other than
the value of a population parameter.
iv. Easy to compute: There are certain situations, particularly in marketing
research, where the assumptions of parametric tests are not valid. For example, a
parametric test assumes that the data collected follow a normal distribution; when
this assumption does not hold, a non-parametric test can be used instead.
Examples of non-parametric tests are (a) Binomial test, (b) Chi-square test, (c)
Mann-Whitney U test and (d) Sign test. A Binomial test is used when the population
has only two classes, such as male and female, buyers and non-buyers, or success and
failure. All observations made about the population must fall into one of the two
classes. The Binomial test is used when the sample size is small.
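To illustrate the Binomial test described above, here is a minimal sketch in Python of an exact two-sided test. The data are hypothetical and not from these notes: a small sample of 10 respondents of whom 8 are buyers, with the null hypothesis that buyers and non-buyers are equally likely (p = 0.5). The function name is also an assumption made for illustration.

```python
from math import comb

def binomial_two_sided_p(successes: int, n: int, p0: float) -> float:
    """Exact two-sided binomial p-value: total probability, under H0, of all
    outcomes that are no more likely than the observed one."""
    def pmf(k: int) -> float:
        return comb(n, k) * p0**k * (1 - p0) ** (n - k)

    observed = pmf(successes)
    # Small tolerance so that equally likely outcomes are counted despite rounding
    return sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-7))

# Hypothetical data (assumption): 8 buyers in a sample of 10, H0: p = 0.5
n, buyers, p0, alpha = 10, 8, 0.5, 0.05
p_value = binomial_two_sided_p(buyers, n, p0)
reject_h0 = p_value < alpha

print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

With these hypothetical numbers the p-value is about 0.109, so H0 would not be rejected at the 5% level.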
Example 1: A CFL manufacturing company supplies its products to various retailers across the
country. The company claims that the average life of its CFL is 24 months. The company has
received complaints from retailers that the average life of its CFL is not 24 months. For
verifying the complaints, the company took a random sample of 60 CFLs and found that the
average life of the CFL is 23 months. Assume that the population standard deviation is 5
months. Use α = 0.05 to test whether the average life of a CFL in the population is 24
months.
Solution:
One-Sample Z
N     Mean      SE Mean    95% CI              Z        p
60    23.000    0.645      (21.735, 24.265)    -1.55    0.121
• Since the p-value (0.121) is greater than α = 0.05, the null hypothesis that there is
no change in the average life of the CFL is not rejected.
• The difference between the sample mean and the claimed mean may be due to sampling
fluctuations. The company should ask retailers to re-test the average life of its CFL.
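The One-Sample Z output for Example 1 can be reproduced by following the hypothesis-testing steps by hand. The sketch below uses Python's standard library in place of a statistics package; the variable names are illustrative assumptions, and a two-sided test is assumed because the complaint is only that the average life is "not 24 months".

```python
from math import sqrt
from statistics import NormalDist

# Figures from Example 1
mu0 = 24.0          # claimed population mean under H0 (months)
sample_mean = 23.0  # observed sample mean (months)
sigma = 5.0         # known population standard deviation (months)
n = 60              # sample size
alpha = 0.05        # level of significance

std_normal = NormalDist()  # standard normal distribution, mean 0 and sd 1

se = sigma / sqrt(n)                        # standard error of the mean
z = (sample_mean - mu0) / se                # test statistic
p_value = 2 * std_normal.cdf(-abs(z))       # two-sided p-value
z_crit = std_normal.inv_cdf(1 - alpha / 2)  # critical value, about 1.96

# 95% confidence interval for the population mean
ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)

# Reject H0 only if the statistic falls in the critical region
reject_h0 = abs(z) > z_crit

print(f"SE = {se:.3f}, Z = {z:.2f}, p = {p_value:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), reject H0: {reject_h0}")
```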
Example 2: During the economic boom, the average monthly income of software professionals
touched Rs. 75,000. A researcher is conducting a study on the impact of economic recession
in 2008. The researcher believes that the economic recession may have an adverse impact on
the average monthly salary of software professionals. For verifying his belief, the researcher
has taken a random sample of 20 software professionals and computed their average income
during the recession period. The average income of these 20 professionals is computed as Rs.
60,000. The sample standard deviation is computed as Rs. 3,000. Use α = 0.10 to test
whether the average income of software professionals is Rs. 75,000 or has gone down as
indicated by the sample mean.
Solution:
Here, the sample size is 20 (less than 30) and only the sample standard deviation is
known, therefore the t-test can be used for testing the hypothesis. The null hypothesis
is H0: average income = Rs. 75,000, and the alternative hypothesis is Ha: average income < Rs. 75,000.
One-Sample t
N     Mean     Std. Dev    SE Mean    90% Upper bound    t         p
20    60000    3000        671        60891              -22.36    0.000
The p-value (0.000) is less than α = 0.10, so the null hypothesis is rejected in favour
of the alternative hypothesis. The researcher’s belief about the decrease in the average
monthly income of software professionals holds good. The researcher is 90% confident that
the average monthly income of software professionals has gone down owing to the economic
recession in 2008.
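The figures in the One-Sample t output for Example 2 can be checked by hand. The sketch below is in Python with illustrative variable names. Python's standard library has no t distribution, so the one-tailed critical value for 19 degrees of freedom at α = 0.10 (1.328) is taken from a standard t-table; this is an assumption of the sketch, not part of the original output.

```python
from math import sqrt

# Figures from Example 2
mu0 = 75000.0          # mean monthly income under H0 (Rs.)
sample_mean = 60000.0  # observed sample mean (Rs.)
s = 3000.0             # sample standard deviation (Rs.)
n = 20                 # sample size
df = n - 1             # degrees of freedom

# One-tailed critical value for alpha = 0.10 and df = 19, read from a t-table
t_crit = 1.328

se = s / sqrt(n)                          # standard error of the mean
t_stat = (sample_mean - mu0) / se         # test statistic
upper_bound = sample_mean + t_crit * se   # 90% upper confidence bound

# Left-tailed test: reject H0 if t falls below -t_crit
reject_h0 = t_stat < -t_crit

print(f"SE = {se:.0f}, t = {t_stat:.2f}, 90% upper bound = {upper_bound:.0f}")
print("Reject H0:", reject_h0)
```

The upper bound of about Rs. 60,891 is far below Rs. 75,000, which agrees with the rejection of H0.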
Chi-Square Test
With the help of this test, we can find out whether two or more attributes are associated
or not. How strongly the two attributes are related cannot be determined by the Chi-square
test. Suppose we have a certain number of observations classified according to two attributes.
Example 3: The number of automobile accidents per month in a certain city was as follows:
Month               Jan    Feb    Mar    Apr    May    June    July    Aug    Sep    Oct
No. of accidents    12     8      20     2      14     10      15      6      9      4
Does the given data indicate that accident conditions were uniform during the 10-month
period?
Solution:
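One way to work Example 3 is as a chi-square goodness-of-fit test, where H0 states that accidents are spread uniformly over the 10 months. The sketch below is in Python with illustrative names. The example does not state a significance level, so 5% is assumed here, and the critical value for 9 degrees of freedom (16.919) is taken from a standard chi-square table.

```python
# Observed accidents per month, Jan to Oct (from Example 3)
observed = [12, 8, 20, 2, 14, 10, 15, 6, 9, 4]

# Under H0 (uniform conditions) every month has the same expected count
total = sum(observed)
expected = total / len(observed)

# Chi-square statistic: sum of (O - E)^2 / E over all months
chi_square = sum((o - expected) ** 2 / expected for o in observed)

df = len(observed) - 1  # degrees of freedom = 10 - 1 = 9

# Assumed 5% level of significance; critical value from a chi-square table, df = 9
critical_value = 16.919
reject_h0 = chi_square > critical_value

print(f"Expected per month = {expected:.0f}")
print(f"Chi-square = {chi_square:.1f}, df = {df}, critical value = {critical_value}")
print("Reject H0 (conditions uniform):", reject_h0)
```

Under these assumptions the calculated value (26.6) exceeds the critical value (16.919), so the hypothesis that accident conditions were uniform over the period would be rejected.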
Correlation
Correlation refers to the statistical relationship between two variables. It measures the
extent to which two variables are linearly related. For example, the height and weight of a
person are related: taller people tend to be heavier than shorter people.
You can apply correlation to a variety of data sets. In some cases, you may be able to predict
how things will relate, while in others, the relation will come as a complete surprise. It's
important to remember that just because something is correlated doesn't mean it's causal.
There are three types of correlation:
• Positive Correlation: A positive correlation means that this linear relationship is
positive, and the two variables increase or decrease in the same direction.
• Negative Correlation: A negative correlation is just the opposite. The relationship line
has a negative slope, and the variables change in opposite directions, i.e., one variable
decreases while the other increases.
• No Correlation: No correlation simply means that the variables behave very differently
and thus, have no linear relationship.
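The strength and direction of a linear relationship are usually summarised by Pearson's correlation coefficient r, which lies between -1 and +1. The following is a minimal sketch in Python; the function name and the height and weight figures are hypothetical and chosen only to illustrate the idea.

```python
from math import sqrt

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data (assumption): heights in cm and weights in kg of five people
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 88]

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")  # close to +1: strong positive correlation

# Variables moving in exactly opposite directions give r = -1
print(pearson_r([1, 2, 3], [3, 2, 1]))
```

In Excel, the CORREL function returns the same coefficient for two ranges of data.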
Example 4: Tom has started a new catering business, and he first analyses the cost of
making sandwiches and the price at which he should sell them. After talking to various cooks
currently selling sandwiches, he has gathered the information below. Tom is convinced that there
is a positive linear relationship between the number of sandwiches and the total cost of
making them. Analyse whether this statement is true.
Regression Analysis: