BBA BRM Module 10final

The document discusses data analysis and interpretation. It covers topics like tabular and graphical representation of data, descriptive statistics, correlation and regression, and statistical tests. The main steps covered in processing data are editing, coding, classification, tabulation and graphical presentation. Data processing involves organizing raw data by editing it for errors and completeness, coding responses numerically, classifying the coded data, tabulating the results, and presenting graphs. Proper data processing is important for analyzing and interpreting data to draw meaningful conclusions.

Uploaded by Rishika Madhukar

Module 10

Data Analysis & Interpretation

Content:
• Tabular and Pictorial/Graphical Representation of Data using Excel
• Descriptive Statistics
• Correlation and Regression
• Basic statistical tests (t, chi-square)

Data Processing
Processing data is very important in market research. After collecting the data, the
researcher's next task is to analyze and interpret it. The purpose of analysis is to draw
conclusions. Processing the data has two parts:
1. Data Analysis: organizing the data and systematically applying statistical and/or
logical techniques to describe, illustrate, and evaluate it, with the goal of discovering
useful information and supporting decision-making.
2. Interpretation of Data: deriving conclusions from the analyzed data. Analysis of data
is not complete unless it is interpreted.

Steps in Processing of Data:


Data processing is concerned with editing, coding, classifying, tabulating, and charting or
diagramming research data:
1. Editing: The process of examining the data collected in questionnaires/schedules
to detect errors and omissions and to see that they are corrected and schedules
are ready for tabulation.
2. Coding: It is the process by which data/responses are organized into
classes/categories and numerals or other symbols are assigned to each item according
to the class in which it falls. The symbols are known as codes.
3. Classification
4. Tabulation
5. Graphical Presentation
Preparing Raw Data: The information collected may be illegible, incomplete, and inaccurate
to a considerable extent. It will also be scattered across several data collection formats. Data
lying in such a crude form are not ready for analysis. Keeping this in mind, the researcher
must take some measures to organize the data so that they can be analyzed.
1. Editing: The first step in data processing is the editing of completed schedules/questionnaires.
Irrespective of the method of data collection, the information collected is called raw data.
The first step in processing our data is to ensure that the data are "clean", that is, free from
inconsistencies and incompleteness. This process of cleaning is called "editing". Editing is
a process of checking to detect and/or correct errors and omissions. It consists of
scrutinizing the completed research instruments to identify and minimize errors,
incompleteness, and gaps in the information obtained from the respondents.
a. Field Editing: In field editing, the researcher/field staff may go through the
questionnaire as soon as it is filled by respondent to find out whether or not there is a
need for completing partial or correcting vague answers. The main problems faced in
field editing are:
i. Inappropriate respondents: Suppose the survey is intended to cover house
owners; interviewing a tenant would therefore be an error.
ii. Incomplete Interviews: All questions are to be answered; there should not be
any 'blanks.' A blank can mean several things: (a) no answer, (b) refusal
to answer, (c) question not applicable, or (d) the interviewer by oversight did not
record the response. The reason for 'no answer' could be that the respondent really does
not know the answer. Sometimes the respondent does not answer because
of the sensitive or emotional nature of the question. Sometimes a
"don't know" answer is edited as "no response"; this is wrong. "Don't know"
means that the respondent is not sure and is in two minds about his reaction,
or considers the question personal and does not want to answer it. "No
response" means that the respondent is not familiar with the
situation/object/event/individual about which he is asked.
iii. Improper Understanding: The interviewer, in a hurry, may have recorded an
abbreviated answer and, at the end of the day, be unable to figure out
what it meant.
iv. Lack of consistency: The earlier part of the questionnaire indicates that there
are no children, yet in a later part the ages of children are mentioned.
v. Legibility: If what is said is not clear, the interviewer must clarify it on
the spot.
vi. Fictitious Interviews: This amounts to cheating by the interviewer: the
questionnaires are filled in without conducting the interviews. A surprise check by
superiors is one way to minimize this.
b. Office Editing or Central Editing: Office editing is more thorough than field editing,
and the office editor's job is more difficult than the field editor's. In office editing, all
the questionnaires are brought to the office and scrutinized one by one for
deficiencies. In the case of a mail questionnaire, there is no other method of cross-
verification except an office audit.
Problems faced by office editors: Problems of consistency, rapport with respondents, etc.,
are some of the issues which get highlighted during office editing. Example:
i. A respondent indicated that he doesn’t drink coffee, but when questioned about
his favorite brand, he replied “Bru”. Here, there is inconsistency. There are two
possibilities which an editor needs to consider. (a) Was the respondent lying? (b)
Did the interviewer record wrongly? The editor has to look into the answers to
other questions on beverages, and interpret the right answer.
ii. A respondent is given a Semantic Differential Scale with 10 items and has ticked
"strongly agree" for all 10 items. A Semantic Differential Scale consists of items
with alternately positive and negative connotations; if a respondent has marked
both positive and negative items as 'agreed', the only conclusion the editor can
draw is that the respondent filled in the questionnaire without understanding it.
The editor will have to discard this questionnaire, since there is no alternative.
iii. “What is the most expensive purchase you have made in the last one year?” is the
question. Two respondents answer (a) LCD TV and (b) Trip to USA. Here, both the
respondents have answered correctly. The frame of reference is different. The main
problem is, one of them is a product, whereas the other is a service. While coding
the data, the two answers should be put under two different categories.
Answers to open-ended questions pose great difficulty in editing.
2. Coding: Coding refers to the activities that transform edited questionnaires into
a form ready for analysis. Editing eliminates errors, while coding speeds up
tabulation. Coding involves assigning numbers or other symbols to answers so that
the responses can be grouped into a limited number of classes or categories. Coding can be
done in any number of ways, i.e., by assigning a letter, number, color, etc. Example: 1 is used
for male and 2 for female.
The coding scheme, i.e., the assigned symbols together with specific coding instructions, may be
recorded in a codebook. Some guidelines to be followed in coding are as follows:
i. Establishment of appropriate categories: Suppose the researcher is analyzing the
"inconvenience" that a car owner faces with his present model. The factor chosen
is "inconvenience", under which there could be four types: (a) inconvenience
in entering the back seat, (b) inconvenience due to insufficient legroom, (c)
inconvenience with respect to the interior, and (d) inconvenience in locking the door and
opening the dicky. The researcher may classify these four answers into internal
inconveniences and inconveniences referring to the exterior, assigning each a different
number for the purpose of codification.
ii. Mutual Exclusivity: This is important because each answer given by a respondent
should be placed under exactly one category. Example: occupation may be
recorded as (a) Professional, (b) Sales, (c) Executive, (d) Manager, etc. Sometimes
respondents may feel that they belong to more than one category: a salesperson
does a sales job and so belongs under the sales category, but he may also
supervise the work of other sales executives, which is a managerial function,
placing him under the managerial category, which has a different code. Since he
can only be put under one category, a decision rule is needed; one way of
deciding could be to ask in which of the two functions he spends most of his time.

Question                       Answer       Code
1. Do you own a vehicle?       Yes          1
                               No           2
2. What is your occupation?    Salaried     S
                               Business     B
                               Retired      R
                               Technical    T
                               Consultant   C
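The coding table above can be sketched as a small codebook in Python; the raw responses below are hypothetical illustrations, not data from the module.

```python
# Codebook: maps each raw response to its assigned code,
# mirroring the question/answer/code table above.
codebook = {
    "vehicle_owned": {"Yes": 1, "No": 2},
    "occupation": {"Salaried": "S", "Business": "B", "Retired": "R",
                   "Technical": "T", "Consultant": "C"},
}

def code_response(question, answer):
    """Translate a raw answer into its code using the codebook."""
    return codebook[question][answer]

# Hypothetical raw responses from one respondent
raw = [("vehicle_owned", "Yes"), ("occupation", "Salaried")]
coded = [code_response(q, a) for q, a in raw]
print(coded)  # [1, 'S']
```

Keeping the scheme in one dictionary (the "codebook") means every coder applies the same symbols, which is exactly what the recorded coding instructions are for.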

3. Classification of data: The process in which the collected data are arranged in separate classes,
groups, or sub-groups according to their common characteristics. Raw data cannot be easily
understood and are not fit for further analysis and interpretation. Classification of data helps
users in comparison and analysis. For example, the population of a town can be grouped
according to gender, age, marital status, etc.
Main features of classification of data:

• Classification of data into groups (literate or illiterate, employed or unemployed)
• Homogeneity as the basis of classification (units with similar characteristics are placed
in one class or group)
• Unity in diversity
Objective/Advantage/Functions of classification of data:

• To condense and simplify the collected data.
• To bring out points of similarity and dissimilarity in the data.
• To highlight the features of the data.
• To make facts comparable.
• To make logical arrangements.
• To provide information about the mutual relationships among elements of a data set.
• To prepare the ground for tabulation.
Method/Types of Classification:
a. Qualitative Classification or Classification according to Qualitative attributes:
• Simple Classification: Also called Dichotomy or two-fold classification. Data are
classified into two classes (one class consisting of items possessing the given
attribute and the other class consisting of items which do not possess the given
attribute) and only one attribute is studied such as: male and female; blind and
not-blind; educated and uneducated etc.
• Manifold Classification: Class is sub-divided into more than two sub-classes
which may be sub-divided further. Example: Population->Urban and Rural-
>Male and Female.
b. Quantitative Classification:
• Chronological Classification: When data are classified on the basis of time e.g.,
hour, day, week, month, year etc.
• Geographical Classification: When data are classified on the basis of
geographical or locational differences, such as: cities, districts, or villages.
• Variable Classification: When data are classified on the basis of some
characteristics which can be measured such as height, weight, income,
expenditure, production, or sales. The term variable refers to any quantity or
attributes whose value varies from one investigation to another. It can be
divided into the following two types:
(i) Continuous variable: one that can take any value within a given range,
like height, weight, etc.
(ii) Discrete (also called discontinuous) variable: one whose values change
by steps or jumps and cannot assume a fractional value, e.g., the number of
children in a family, the number of workers (or employees), or the number of
students in a class. In such cases the data are obtained by counting.
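A variable classification of this kind can be sketched in Python; the ages and the class intervals below are purely illustrative assumptions.

```python
from collections import Counter

# Hypothetical ages of 12 respondents (a continuous variable)
ages = [12, 25, 37, 45, 52, 61, 23, 34, 48, 59, 18, 41]

def age_class(age):
    """Classify the continuous variable 'age' into class intervals."""
    if age < 20:
        return "0-19"
    if age < 40:
        return "20-39"
    if age < 60:
        return "40-59"
    return "60+"

# Count how many units fall into each (homogeneous) class
classified = Counter(age_class(a) for a in ages)
print(dict(classified))  # {'0-19': 2, '20-39': 4, '40-59': 5, '60+': 1}
```

Each unit lands in exactly one class (mutual exclusivity), and units with similar characteristics share a class (homogeneity), as described above.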
4. Tabulation: Tabulation refers to counting the number of cases that fall into various
categories and summarizing the results in the form of statistical tables. The raw data are divided
into groups and sub-groups, and the data are summarized and presented systematically
in rows and columns.
Tabulation is the logical listing of related quantitative data in vertical columns and horizontal
rows, with sufficient explanatory and qualifying words, phrases, and statements in the form
of titles, headings, and explanatory notes.
Objective or Advantages of Tabulation:

• To simplify complex data: reduce the bulk of information (data) under
investigation to a simplified and meaningful form.
• To economize space: space is saved without sacrificing the quality and quantity of the data.
• To depict trends.
• To facilitate comparison: data presented in tabular form, in rows and columns,
allow quick comparison among observations.
• To facilitate statistical comparison.
• To help reference: a table can be used as a reference for future needs.
Tabulation may be of two types:
i. Simple or One-way Tabulation: A single variable is counted. Multiple-choice questions
that allow only one answer may use one-way (univariate) tabulation. The categories are pre-
determined, and tabulation consists of counting the number of responses falling into each
category and calculating the percentages. There are two types of univariate tabulation:
a. Question with only one response: If the question has only one answer, the
tabulation may be of the following type:

Table No. 1
Study of number of children in a family
No. of children    No. of families    Percentage
0                  10                 5
1                  30                 15
2                  70                 35
3                  60                 30
4                  20                 10
More than 4        10                 5
Total              200                100
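One-way tabulation amounts to counting responses per category and converting to percentages. A short Python sketch using the same counts as Table 1:

```python
# Category counts from Table 1 (200 families in total)
counts = {"0": 10, "1": 30, "2": 70, "3": 60, "4": 20, "More than 4": 10}
total = sum(counts.values())   # 200

# One-way table: each category with its frequency and percentage
for category, n in counts.items():
    pct = 100 * n / total
    print(f"{category:>12}: {n:4d}  {pct:5.1f}%")
```

With real data one would start from the raw list of responses and build `counts` with `collections.Counter` before computing the percentages.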

b. Multiple responses to a question: Sometimes respondents may give more than
one answer to a given question. In this case there will be overlap, and the
responses, when tabulated, need not add up to 100%.

Table No. 2
Choice of an automobile
What do you dislike about the car which you own at present?
Parameter No. of respondents
Engine 10
Body 15
Mileage 15
Interior 06
Color 18
Maintenance Frequency 16
Inconvenience 20
There is duplication because a respondent may, for example, be dissatisfied with the mileage
given by the vehicle and also dislike the interior of the car; that is, an owner may dislike the
car on more than one parameter. Suppose we are tabulating the causes of the inconvenience
felt by car owners; they can be classified as follows:

• Cramped legroom
• Rear seat problem
• Difficulty in raising the window
• Difficulty in locking the door.
Now, the tabulation of each of the specific factors would help to identify the real
reason for dislike.
ii. Cross Tabulation or Two-way Tabulation: Two or more variables are treated
simultaneously; this is also known as bivariate tabulation. The tabulation can be done
entirely by hand, by machine, or by both. Cross tabulation is very commonly used in
market research. Example: the popularity of a health drink among families with different
incomes. Suppose 500 families are contacted and the data collected are as follows:

Table No. 3
Use of Health Drink
Income per month (Rs.)    No. of children per family                    Total families
                          0    1    2    3    4    5    More than 5
<1000                     5    0    8    9   11   15    25            73
1001-2000                10    5    8   10   13   18    27            91
2001-3000                20   10   12   14   20   22    32           130
3001-4000                12    3    6    7   13   20    30            91
4001-5000                 6    2    6    5   10   15    20            64
>5000                     6    1    4    5    7   10    18            51
Total                    59   21   44   50   74  100   152           500
Note: The above table shows that consumption of the health drink depends not only on
income but also on the number of children per family. The drink is also popular among
families with no children: 59 out of 500 families consume it even though they have no
children, which shows that adults consume it too. The table also shows that families in the
income group of Rs. 2001-3000 consume the health drink the most.
The form in which tabulation is to be done is decided by taking into account:
a. The purpose of study, and
b. The use of statistical tools e.g., mean, mode, standard deviation etc. Improper
tabulation may create difficulties in the use of these tools.
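A cross tabulation can be sketched in plain Python by counting (row, column) pairs; the handful of (income group, children) records below are hypothetical, since the full 500-family data set is not reproduced here.

```python
from collections import Counter

# Hypothetical (income group, number of children) records for six families
families = [("<1000", 2), ("<1000", 0), ("1001-2000", 3),
            ("2001-3000", 2), ("2001-3000", 5), ("1001-2000", 3)]

# Cell counts of the two-way table: one count per (income, children) pair
cross = Counter(families)

# Row totals, as in the last column of Table 3
row_totals = Counter(income for income, _ in families)

print(cross[("1001-2000", 3)])  # 2 families in this cell
print(row_totals["<1000"])      # 2 families in this income group
```

The same pattern scales to any two variables: each cell of the table is simply the count of records sharing that pair of category values.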

5. Graphic Presentation: A visual display of data. It is an attractive and easily
understandable way of presenting data and represents complex data in a simple form,
so comparative analysis can be done easily and quickly. There are different types of
graphical presentation, depending on the nature of the data and the type of statistical result:
• Pie Chart
• Bar Graphs
• Line Graphs
• Histogram

Summarising the Data

Summarizing the data includes:
1. Classification of data
2. Frequency distribution
3. Use of appropriate statistical tools

Usage of Statistical Tools
Measures of Central Tendency: The three most common measures of centrality or
central tendency are the mode, the median, and the mean.
Measures of Dispersion are numbers used to represent the scattering of the data;
they show how the data are spread across various parameters.

Types of Measures of Dispersion


Measures of dispersion can be classified into two categories shown below:
• Absolute Measures of Dispersion
• Relative Measures of Dispersion
These measures of dispersion can be further divided into various categories. Absolute
measures are expressed in the same unit as the data (e.g., range, standard deviation),
while relative measures are unit-free ratios (e.g., coefficient of variation).
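As a quick illustration, Python's standard statistics module computes the common measures of central tendency and dispersion; the data values below are hypothetical.

```python
import statistics as st

data = [4, 8, 6, 5, 3, 8, 9, 7, 8, 2]   # hypothetical sample of 10 values

# Measures of central tendency
mean = st.mean(data)      # 6
median = st.median(data)  # 6.5
mode = st.mode(data)      # 8 (occurs three times)

# Absolute measures of dispersion (same unit as the data)
rng = max(data) - min(data)   # range = 9 - 2 = 7
stdev = st.pstdev(data)       # population standard deviation

# Relative measure of dispersion (unit-free): coefficient of variation
cv = stdev / mean * 100
print(mean, median, mode, rng, round(cv, 1))
```

Note the absolute measures (range, standard deviation) carry the data's unit, while the coefficient of variation is a percentage, which is what makes it comparable across data sets with different units.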
Hypothesis: A hypothesis is a tentative proposition relating to a certain phenomenon,
which the researcher wants to verify.

• If the researcher wants to infer something about the total population from which the
sample was taken, statistical methods are used to make the inference. While a
hypothesis is useful, it is not always necessary: many a time a researcher is
interested in collecting and analysing data to indicate the main characteristics
without a hypothesis.
• Also, a hypothesis may be rejected, but it can never be accepted except tentatively:
further evidence may prove it wrong. It is wrong to conclude that, because a
hypothesis was not rejected, it can be accepted as valid.
• Null Hypothesis: It is a statement about the population, whose credibility or validity
the researcher wants to assess based on the sample. A Null hypothesis is formulated
specifically to test for possible rejection or nullification. Hence the name ‘Null
Hypothesis’. Null hypothesis always states “no difference”. It is this null hypothesis that
is tested by the researcher.
There are several bases on which hypothesis are classified:
a. Descriptive Hypothesis: As the name implies, these describe some characteristic of an
object, a situation, an individual, or even an organization. Examples: "Youngsters
prefer 'X' soft drink"; "Decentralization of decision-making is more effective." All of
these state the characteristics of some entity.
b. Relational Hypothesis: These describe a relationship between two variables.
Examples: "Rich people shop at Lifestyle"; "The rate of attrition is high in jobs that
involve night-shift working."
Steps involved in Hypothesis Testing
1. Formulate the null hypothesis H0 and the alternative hypothesis Ha. In the given
problem, H0 states the value of some parameter of the population.
2. Select an appropriate test, assuming H0 to be true.
3. Calculate the value of the test statistic.
4. Select the level of significance, usually 1% or 5%.
5. Find the critical region.
6. If the calculated value lies within the critical region, reject H0.
7. State the conclusion in writing.
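The seven steps can be sketched for a two-tailed one-sample z-test; all the numbers below are hypothetical.

```python
import math

# Step 1: H0: mu = 50, Ha: mu != 50 (hypothetical values)
mu0, xbar, sigma, n = 50, 48, 6, 36

# Step 2: population sigma known -> one-sample z-test
# Step 3: calculate the test statistic
z = (xbar - mu0) / (sigma / math.sqrt(n))   # -2.0

# Step 4: level of significance alpha = 0.05 (two-tailed)
# Step 5: critical region is |z| > 1.96
z_crit = 1.96

# Step 6: reject H0 if the statistic falls in the critical region
reject = abs(z) > z_crit

# Step 7: state the conclusion
print(f"z = {z:.2f}, reject H0: {reject}")
```

Here |z| = 2.0 exceeds 1.96, so H0 would be rejected at the 5% level; with the same data at the 1% level (critical value 2.576) it would not be.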
Types of Tests:
1. Parametric tests:
i. Parametric tests are more powerful. The data in these tests are derived from
interval or ratio measurements.
ii. Parametric tests assume that the data follow a normal distribution.
Examples of parametric tests are (a) the Z-test, (b) the t-test, and (c) the F-test.
iii. Observations must be independent, i.e., the selection of any one item should not
affect the chances of any other item being included in the sample.
2. Non-parametric tests: These are used to test hypotheses with nominal and ordinal
data.
i. We make no assumptions about the shape of the population distribution.
ii. These are distribution-free tests.
iii. The hypothesis of a non-parametric test is concerned with something other than
the value of a population parameter.
iv. Easy to compute: There are certain situations, particularly in marketing
research, where the assumptions of parametric tests are not valid — for example,
the assumption that the data follow a normal distribution.
Examples of non-parametric tests are (a) the Binomial test, (b) the Chi-square test,
(c) the Mann-Whitney U test, and (d) the Sign test. A Binomial test is used when the
population has only two classes, such as male/female, buyers/non-buyers, or
success/failure; every observation made about the population must fall into one of
the two classes. The Binomial test is used when the sample size is small.
Example 1: A CFL manufacturing company supplies its products to various retailers across the
country. The company claims that the average life of its CFL is 24 months. The company has
received complaints from retailers that the average life of its CFL is not 24 months. For
verifying the complaints, the company took a random sample of 60 CFLs and found that the
average life of the CFL is 23 months. Assume that the population standard deviation is 5
months. Use Alpha α= 0.05 to test whether the average life of a CFL in the population is 24
months.
Solution:
One-Sample Z
N Mean SE Mean 95% CI Z p
60 23.000 0.645 (21.735, 24.265) -1.55 0.121

• The null hypothesis, that there is no change in the average life of the CFL, is accepted:
the calculated |z| = 1.55 is less than the critical value 1.96, and p = 0.121 > 0.05.
• The sample mean result may be due to sampling fluctuations. The company should ask
retailers to re-test the average life of its CFLs.
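The one-sample z output above can be re-derived in plain Python from the figures given in Example 1:

```python
import math

# Example 1: H0: mu = 24 months, two-tailed test at alpha = 0.05
mu0, xbar, sigma, n = 24, 23, 5, 60

se = sigma / math.sqrt(n)       # standard error of the mean ~ 0.645
z = (xbar - mu0) / se           # test statistic ~ -1.55

# Two-tailed p-value from the standard normal CDF (via math.erf)
phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
p = 2 * phi(-abs(z))            # ~ 0.121

print(round(se, 3), round(z, 2), round(p, 3))  # 0.645 -1.55 0.121
```

Since |z| = 1.55 < 1.96 (equivalently p = 0.121 > 0.05), H0 is not rejected, matching the solution.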
Example 2: During the economic boom, the average monthly income of software professionals
touched Rs. 75,000. A researcher conducting a study on the impact of the 2008 economic
recession believes that the recession may have had an adverse impact on the average monthly
salary of software professionals. To verify this belief, the researcher has taken a random
sample of 20 software professionals and computed their average income during the recession
period as Rs. 60,000, with a sample standard deviation of Rs. 3,000. Use alpha = 0.10 to test
whether the average income of software professionals is Rs. 75,000 or whether it has gone
down, as indicated by the sample mean.
Solution:
Here, the sample size is 20 (less than 30) and the population standard deviation is unknown;
therefore, the t test is used for testing the hypothesis.
The researcher's belief about the decrease in the average monthly income of software
professionals holds good. The researcher is 90% confident that the average monthly income
of software professionals has gone down owing to the 2008 economic recession. The p-value
in the output indicates rejection of the null hypothesis in favour of the alternative.

One-Sample t
N Mean Std. Dev SE Mean 90% Upper bound T p
20 60000 3000 671 60891 -22.36 0.000
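The t statistic in Example 2's output can be re-derived in plain Python:

```python
import math

# Example 2: H0: mu = 75000, Ha: mu < 75000 (left-tailed, alpha = 0.10)
mu0, xbar, s, n = 75000, 60000, 3000, 20

se = s / math.sqrt(n)        # standard error ~ 671
t = (xbar - mu0) / se        # test statistic ~ -22.36

print(round(se), round(t, 2))  # 671 -22.36
```

With df = n - 1 = 19, a t value of -22.36 is far beyond any usual critical value (the 10% left-tail critical value is about -1.328), so the p-value is effectively 0 and H0 is rejected, matching the solution.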

Chi-Square Test
With the help of this test, we can find out whether two or more attributes are associated
or not. How strongly the attributes are related cannot be measured by the Chi-square test.
Suppose we have a certain number of observations classified according to two attributes.
Example 3: The number of automobile accidents per month in a certain city was as follows:

Months Jan Feb Mar Apr May June July Aug Sep Oct
No. of accidents 12 8 20 2 14 10 15 6 9 4
Do the given data indicate that accident conditions were uniform during the 10-month
period?
Solution:
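The chi-square goodness-of-fit computation can be sketched as follows; the critical value 16.919 is the standard chi-square table value for alpha = 0.05 with 9 degrees of freedom.

```python
# Observed monthly accident counts from the table above
observed = [12, 8, 20, 2, 14, 10, 15, 6, 9, 4]

# H0: accidents are uniform, so each month expects total/10 = 10 accidents
expected = sum(observed) / len(observed)   # 100 / 10 = 10

# Chi-square statistic: sum of (O - E)^2 / E over all categories
chi_sq = sum((o - expected) ** 2 / expected for o in observed)
print(round(chi_sq, 2))   # 26.6

# df = 10 - 1 = 9; critical value at alpha = 0.05 is 16.919.
# Since 26.6 > 16.919, H0 is rejected: accident conditions
# were not uniform over the 10-month period.
```

Note the test only says the attributes of month and accident frequency are associated with non-uniformity; as stated above, chi-square does not measure how strong the departure is.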

Correlation

Correlation refers to the statistical relationship between two entities. It measures the
extent to which two variables are linearly related. For example, the height and weight of a
person are related: taller people tend to be heavier than shorter people.
You can apply correlation to a variety of data sets. In some cases you may be able to predict
how things will relate, while in others the relation will come as a complete surprise. It is
important to remember that correlation does not imply causation.
There are three types of correlation:
• Positive Correlation: A positive correlation means that this linear relationship is
positive, and the two variables increase or decrease in the same direction.
• Negative Correlation: A negative correlation is just the opposite. The relationship line
has a negative slope, and the variables change in opposite directions, i.e., one variable
decreases while the other increases.
• No Correlation: No correlation simply means that the variables behave very differently
and thus, have no linear relationship.

Example 4: Tom has started a new catering business, where he first analyses the cost of
making a sandwich to decide what price he should sell it at. After talking to various cooks
currently selling sandwiches, he has gathered the information below. Tom is convinced that
there is a positive linear relationship between the number of sandwiches and the total cost of
making them. Analyse whether this statement is true.

No. of Sandwiches    Cost of Bread    Cost of Vegetables    Total Cost
10                   100              30                    130
20                   200              60                    260
30                   300              90                    390
40                   400              120                   520
Solution:
Plotting the number of sandwiches prepared against the cost of making them shows a
positive relationship, and the table confirms that the relationship is linear: if one runs the
correlation, it comes to +1. Hence, as Tom makes more sandwiches, the cost increases, which
is plausible — more sandwiches require more vegetables and more bread. Based on the data,
this is a perfect positive linear relationship.
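The correlation coefficient for Example 4 can be verified with the Pearson formula in plain Python:

```python
import math

# Data from Example 4's table
sandwiches = [10, 20, 30, 40]
total_cost = [130, 260, 390, 520]

n = len(sandwiches)
mx = sum(sandwiches) / n
my = sum(total_cost) / n

# Pearson r = covariance term / (product of deviation norms)
cov = sum((x - mx) * (y - my) for x, y in zip(sandwiches, total_cost))
sx = math.sqrt(sum((x - mx) ** 2 for x in sandwiches))
sy = math.sqrt(sum((y - my) ** 2 for y in total_cost))

r = cov / (sx * sy)
print(round(r, 4))   # 1.0 — a perfect positive linear relationship
```

Here every total cost is exactly 13 times the number of sandwiches, so the points lie on a straight line and r = +1, confirming Tom's claim.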

Regression Analysis:

• We use regression analysis to estimate the relationships between two or more
variables. There are two basic terms that you need to be familiar with:
• The Dependent Variable is the factor you are trying to predict.
• The Independent Variable is the factor that might influence the dependent variable.
• Consider data on the number of COVID cases and the number of masks sold
in a particular month.
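A simple least-squares regression of masks sold (dependent variable) on COVID cases (independent variable) can be sketched as below; the (cases, masks) numbers are hypothetical, since the module's own table is not reproduced in this excerpt.

```python
# Hypothetical monthly data
cases = [100, 200, 300, 400, 500]        # independent variable (X)
masks = [1200, 2100, 3300, 4000, 5100]   # dependent variable (Y)

n = len(cases)
mx = sum(cases) / n
my = sum(masks) / n

# Least-squares slope: b = sum((x - mx)(y - my)) / sum((x - mx)^2)
b = (sum((x - mx) * (y - my) for x, y in zip(cases, masks))
     / sum((x - mx) ** 2 for x in cases))
# Intercept: a = my - b * mx
a = my - b * mx

print(round(a, 1), round(b, 3))   # fitted line: masks = a + b * cases

# Using the fitted line to predict masks sold for 600 cases
predicted = a + b * 600
```

The fitted line then predicts the dependent variable for any new value of the independent variable, which is exactly the "factor you are trying to predict" / "factor that might influence it" distinction above.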
