Basic Data Analysis

This document discusses various topics related to data analysis basics, including: 1) Reading data tables to draw conclusions about sales patterns of instant noodle brands across 4 stores with different catchment profiles. 2) Analyzing car ownership data by population strata to understand higher ownership in large cities versus rural areas. 3) The concepts of correlation between variables, how a high correlation does not necessarily imply causation, and using correlation to study relationships in retail sales data over time. 4) Different options for collecting brand image data through ratings scales or simple association measures on various product attributes.

Uploaded by

chaitanya

Data Analysis – Basics

Topics of today’s discussion

• Reading Data Tables to make Conclusions
• Reading Central Tendency and Dispersion
• Correlation and Causality
• Row-Column Normalization of data tables
• Variable types and Statistical Analysis tools to build relationships
Reading Data Tables – Situation 1

• Let us assume a city has 4 modern-format stores (named Store 1, Store 2, etc.) belonging to a single retailer
• They are more or less the same size and have similar total monthly sales
• However, sales differ by category: for example, one store might have higher sales of FMCG and another higher sales of staples
• In such a scenario, let us look at the buyers of “instant noodles” in these 4 stores in 2014
Reading Data Tables – Situation 1

Brands Purchased in each Store – Instant Noodles (2014)
Among buyers of Instant Noodles in each Store

                                               Store 1   Store 2   Store 3   Store 4
% Buying Maggi variants only                       70%       75%       55%       85%
% Buying Yippee or Top Ramen or others only        20%       20%       35%       10%
% Buying both                                      10%        5%       10%        5%
Reading Data Tables – Situation 1

% Contribution of Buyers of Instant Noodles Brands from each Store (2014)

                                               Store 1   Store 2   Store 3   Store 4
% Buying Maggi variants only                       18%       27%       42%       13%
% Buying Yippee or Top Ramen or others only        13%       18%       66%        4%
% Buying both                                      20%       14%       60%        6%
Reading Data Tables – Situation 1 – Assignment

• Reading the 2 tables, what will you conclude about Instant Noodles sales from the 4 stores?
• Can you make some guesses about the differences in the catchment profiles of these stores?
Reading Data Tables – Situation 1

SOME SIMPLE CONCLUSIONS

• Store 3 has by far the largest number of buyers of Instant Noodles as a category; Store 4 has the fewest
• Among their respective buyers, Stores 1, 2 and 4 have a high share (70%+) of “solus Maggi” buyers, especially Store 4 (85%)
• Store 3 has a lower share (55%) of “solus Maggi” buyers. But, being the largest seller of instant noodles, it contributes the most to Maggi sales, as well as to the other brands’ sales
Reading Data Tables – Situation 1

SOME SIMPLE CONCLUSIONS

• Same-sized Stores, yet Store 3 has
  - higher Instant Noodles sales and
  - a higher % of new-brand (Yippee, Smoodles, etc.) sales
• So, its catchment profile might be
  - younger, with more double-income households, bachelors, etc.
  - psychographically more open to trying new brands
  - more exposed to media, hence more aware of new brands
• Similarly, the Store 4 catchment profile might be just the opposite
Reading Data Tables – Situation 1 – Hint
(Column %s)

Brands Purchased in each Store – Instant Noodles

                                               Store 1   Store 2   Store 3   Store 4
Base: Buyers of Instant Noodles in each Store      500       700      1500       300
% Buying Maggi variants only                       70%       75%       55%       85%
% Buying Yippee or Top Ramen or others only        20%       20%       35%       10%
% Buying both                                      10%        5%       10%        5%
Reading Data Tables – Situation 1 – Hint
(Row %s)

% Contribution of Buyers of Instant Noodles Brands from each Store

                                               Store 1   Store 2   Store 3   Store 4
% of ALL INSTANT NOODLES BUYERS                    17%       23%       50%       10%
% Buying Maggi variants only                       18%       27%       42%       13%
% Buying Yippee or Top Ramen or others only        13%       18%       66%        4%
% Buying both                                      20%       14%       60%        6%
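
The two hint tables are linked: multiplying each column % by the store's base gives the number of buyers in each cell, and dividing each cell by its row total gives the row %. A minimal Python sketch of that arithmetic, using only the figures from the hint tables (the variable and function names are illustrative choices, not from the deck):

```python
# Base (number of instant-noodle buyers) and column % per store, from the hint table.
stores = ["Store 1", "Store 2", "Store 3", "Store 4"]
base = [500, 700, 1500, 300]
column_pct = {
    "Maggi variants only": [70, 75, 55, 85],
    "Yippee or Top Ramen or others only": [20, 20, 35, 10],
    "Both": [10, 5, 10, 5],
}

def row_percentages(counts):
    """Share of each store in a row total, rounded to whole percents."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]

# Step 1: column % x base gives the number of buyers in each cell.
counts = {
    segment: [b * p / 100 for b, p in zip(base, pcts)]
    for segment, pcts in column_pct.items()
}

# Step 2: dividing each cell by its row total gives the row %.
row_pct = {segment: row_percentages(c) for segment, c in counts.items()}
row_pct["All instant noodles buyers"] = row_percentages(base)

for segment, pcts in row_pct.items():
    print(f"{segment:36s}", pcts)
```

Because Store 3 has half of all buyers, it leads every row of the row % table even though its Maggi-only column % is the lowest of the four stores.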
Reading Data Tables – Situation 2

Car Ownership (Among higher Social Class) by Pop Strata

                                           All     Large     Urban     Urban     Rural     Rural
                                         India    Metros   1 – 50L       <1L      10K+      <10K
Target household Population (‘000)       20000      2000      6000      5000      3000      4000
Sample Size                               5000      1000      1000      1000      1000      1000
Column %
% Owning Cars                              20%       51%       28%       19%        7%        2%
% Not Owning Cars                          80%       49%       72%       81%       93%       98%
Reading Data Tables – Situation 2

Car Ownership (Among higher Social Class) by Pop Strata

                                           All     Large     Urban     Urban     Rural     Rural
                                         India    Metros   1 – 50L       <1L      10K+      <10K
Target household Population (‘000)       20000      2000      6000      5000      3000      4000
Row %
% Owning Cars                             100%       26%       43%       24%        5%        2%

From the given two tables (i.e. Column % and Row %), what do you conclude about car ownership in India?
Reading Data Tables – Situation 2

BROADLY, THERE ARE 3 CONCLUSIONS ON CAR OWNERSHIP:
1. Overall, 20% of households in the higher Social Class in India own cars
2. Ownership is highest in large metros (51%) and falls step by step down the pop strata: lower in non-metro urban and lowest in rural
3. However, because of the large size of the Urban 1 – 50 lakh stratum, the largest share of car-owning households (43%) comes from this stratum
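
These conclusions can be checked from the two tables by weighting each stratum's ownership rate by its household population. The sketch below uses only the table figures; the variable names are illustrative. Note that every stratum has the same sample size (1000) but a very different population, so a plain average of the five rates would not give the All India figure.

```python
strata = ["Large Metros", "Urban 1-50L", "Urban <1L", "Rural 10K+", "Rural <10K"]
population_000 = [2000, 6000, 5000, 3000, 4000]   # target households ('000)
pct_owning = [51, 28, 19, 7, 2]                   # column %: ownership within each stratum

# Car-owning households ('000) in each stratum.
owners_000 = [pop * pct / 100 for pop, pct in zip(population_000, pct_owning)]
total_owners = sum(owners_000)

# Conclusion 1: population-weighted ownership across all strata.
all_india_pct = 100 * total_owners / sum(population_000)

# Conclusion 3: each stratum's share of all car-owning households (row %).
share_of_owners = [round(100 * o / total_owners) for o in owners_000]

# Unweighted mean of the five rates, shown only for contrast.
simple_average = sum(pct_owning) / len(pct_owning)

print(f"All India ownership (weighted): {all_india_pct:.1f}%")
print(f"Unweighted average of strata  : {simple_average:.1f}%")
for name, share in zip(strata, share_of_owners):
    print(f"{name:13s} contributes {share}% of car-owning households")
```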
Assignment 2

In the catchment area of a store, 1000 people were asked some questions in a survey. Target group (TG): 21 – 45 yrs, housewives or single earning members who themselves shop for day-to-day household items.

They were asked to agree or disagree with the statement “I prefer buying day-to-day items from modern format outlets rather than going to traditional Kirana stores” on a five-point scale (Likert Scale):

Of the 1000 people responding to this question, the mean score obtained was 3.1 out of 5. What can you conclude from this?
Assignment 2

If you are now given the following distribution:

What would you conclude?


Can you make some hypotheses on the sub-groups of people giving this
opinion?
Would you want the data to be analyzed in some other sub-groups?
Assignment 2

We may need to look at an output like:

… to check: is the polarization of findings due to different attitudes among different age groups and different stages of life?
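
To see why a mean of 3.1 says little on its own, here is a small Python sketch with two made-up distributions of 1000 respondents. The counts are hypothetical and are NOT the survey's data, and the coding 1 = strongly disagree to 5 = strongly agree is an assumption. Both give a mean of 3.1, but one is polarized and the other is clustered around the middle; only the dispersion tells them apart.

```python
from math import sqrt

SCALE = [1, 2, 3, 4, 5]  # 1 = strongly disagree ... 5 = strongly agree (assumed coding)

def mean_and_sd(counts):
    """Mean and standard deviation of Likert scores, given counts per scale point."""
    n = sum(counts)
    mean = sum(s * c for s, c in zip(SCALE, counts)) / n
    variance = sum(c * (s - mean) ** 2 for s, c in zip(SCALE, counts)) / n
    return mean, sqrt(variance)

# Two hypothetical samples of 1000 respondents each (NOT the survey's data).
polarized = [250, 200, 50, 200, 300]   # most people at the two ends
centered = [50, 150, 500, 250, 50]     # most people in the middle

for name, counts in [("polarized", polarized), ("centered", centered)]:
    mean, sd = mean_and_sd(counts)
    print(f"{name:10s} mean = {mean:.2f}  sd = {sd:.2f}")
```

A large standard deviation on a 5-point scale is a signal to split the data by sub-groups (age, life stage, etc.) before drawing conclusions from the mean.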
Correlation between two variables

When we look at data for two or more variables, we sometimes see that the data for two variables move in the same direction.
For example, if we record the heights and weights of a large number of people, we would observe that many taller people also weigh more and, similarly, many shorter people weigh less.

Also, there can be two variables that move mostly in opposite directions.
For example, the Power of the engine and the Mileage of the car: mostly, cars with higher Power have lower Mileage.

We use the term ‘Correlation’ for the strength of linear association between two variables, and a coefficient ‘r’ is used to measure this strength.

The value of ‘r’ can range between –1 and +1


Correlation between two variables

r value closer to +1 → a strong positive association
r value closer to –1 → a strong negative association
r value closer to 0 → weak linear association between the two variables
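
As an illustration, the coefficient r (Pearson's correlation coefficient) can be computed directly from its definition: the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The sketch below uses made-up figures for the height/weight and power/mileage examples; none of the numbers come from real data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical figures, for illustration only.
height_cm = [150, 158, 163, 170, 176, 182]
weight_kg = [52, 60, 58, 68, 74, 80]
engine_bhp = [60, 75, 90, 110, 140, 180]
mileage_kmpl = [23, 21, 19, 17, 14, 11]

print("height vs weight :", round(pearson_r(height_cm, weight_kg), 2))
print("power vs mileage :", round(pearson_r(engine_bhp, mileage_kmpl), 2))
```

The first pair gives an r close to +1 and the second an r close to –1, matching the two examples in the text.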

In Retail Sales data too, it is interesting to observe the Correlation between Sales of certain categories over a period of time. Looking at long-term Sales data, we can ask:

- Is there a high positive Correlation between Sales of Shampoos and Conditioners?
- Is there a negative Correlation between Sales of Shower Gels and Soaps?
Correlation does not imply Causality

However, one must note: “CORRELATION DOES NOT IMPLY CAUSALITY”

Meaning, a high positive Correlation between A and B does not by itself mean that A causes B or that A leads to B

e.g. Brand Imagery vs Brand Usage
→ generally a high positive correlation… but does it mean an increase in Brand Imagery would lead to an increase in Brand Usage?

NO! The correlation alone does not establish that.
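
One common way a high correlation can arise without causation is a third factor driving both variables. The simulation below is a hypothetical sketch, not a model from the deck: `brand_size` is an assumed third variable that drives both imagery and usage scores, and imagery has no direct effect on usage anywhere in the code. The raw correlation is still high, and it largely disappears once the shared driver is removed.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical model: brand size drives BOTH scores, each with independent noise.
brand_size = [rng.uniform(0, 100) for _ in range(500)]
imagery = [s + rng.gauss(0, 5) for s in brand_size]
usage = [s + rng.gauss(0, 5) for s in brand_size]

r_raw = pearson_r(imagery, usage)

# Remove the shared driver and correlate what is left over.
imagery_resid = [i - s for i, s in zip(imagery, brand_size)]
usage_resid = [u - s for u, s in zip(usage, brand_size)]
r_after = pearson_r(imagery_resid, usage_resid)

print(f"r(imagery, usage)                      = {r_raw:.2f}")
print(f"r after removing the brand-size effect = {r_after:.2f}")
```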
Let us look at a Brand Image Questionnaire situation

• You want to understand the imagery of 5 brands of shampoo on several attributes
• The attributes can be “Makes hair shiny”, “Cleans hair well”, …, “Has good packaging”, etc.
• You may collect this information as “Ratings” (OPTION 1) or as simple “Association” (OPTION 2)
OPTION 1

Rate the following brands in terms of your preference on each attribute on a 1 – 5 scale, 5 = Excellent, …, 1 = Poor

                     Dove   Tresemme   Head & S   Clear   Sunsilk
Gives shiny hair
Cleans hair well
Removes dandruff

Good vfm
Good packaging
OPTION 2

Which brands would you associate with each attribute as per your preference? (TICK/CIRCLE CODE AS APPLICABLE)

                     Dove   Tresemme   Head & S   Clear   Sunsilk
Gives shiny hair      A        B          C         D        E
Cleans hair well      A        B          C         D        E
Removes dandruff      A        B          C         D        E
…                     A        B          C         D        E
Good vfm              A        B          C         D        E
Good packaging        A        B          C         D        E
Obviously…

• OPTION 1 will be more detailed and more robust to analyze statistically
• However, OPTION 1 will be very time-consuming to administer
• So, a lot of the time, we go ahead with OPTION 2 when we have a large number of attributes and/or brands to work with
A typical output of such Image Association

                     Dove   Tresemme   Head & S   Clear   Sunsilk
Gives shiny hair      72%      32%        54%      42%      73%
Cleans hair well      68%      22%        60%      45%      75%
Removes dandruff      45%      12%        88%      82%      70%
Good vfm              55%      18%        60%      62%      82%
Good packaging        65%      24%        57%      51%      76%

• PROBLEM HERE IS… Large brands tend to have high associations across all attributes, and small brands tend to have low associations across all attributes

SO, HOW DO WE GET THE RELATIVE STRENGTHS AND WEAKNESSES OF SMALL BRANDS?
ROW – COLUMN NORMALIZATION

• It brings all Brands and all Attributes onto the same platform
• Hence a fair comparison can be made in terms of relative Strengths and Weaknesses

(Refer to Excel File for computations)

Image Association

(Refer to Excel File for Row – Column normalization)
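
The Excel computation itself is not shown here, so the following is only a sketch of one common way to do a row-column normalization, and it may differ from the method in the Excel file. For each cell, compute the score that would be expected from the brand's overall size and the attribute's overall popularity alone (row total × column total ÷ grand total), then look at the deviation of the actual score from that expectation. Positive deviations are relative strengths, negative deviations are relative weaknesses. The data are the image association percentages from the table above; the variable names are illustrative.

```python
brands = ["Dove", "Tresemme", "Head & S", "Clear", "Sunsilk"]
association = {   # % associating each brand with each attribute
    "Gives shiny hair": [72, 32, 54, 42, 73],
    "Cleans hair well": [68, 22, 60, 45, 75],
    "Removes dandruff": [45, 12, 88, 82, 70],
    "Good vfm":         [55, 18, 60, 62, 82],
    "Good packaging":   [65, 24, 57, 51, 76],
}

row_totals = {attr: sum(vals) for attr, vals in association.items()}
col_totals = [sum(vals[j] for vals in association.values()) for j in range(len(brands))]
grand_total = sum(col_totals)

# Expected score if a cell only reflected brand size and attribute popularity:
# row total x column total / grand total. The deviation is what remains.
deviation = {
    attr: [vals[j] - row_totals[attr] * col_totals[j] / grand_total
           for j in range(len(brands))]
    for attr, vals in association.items()
}

print(f"{'':18s}" + "".join(f"{b:>10s}" for b in brands))
for attr, devs in deviation.items():
    print(f"{attr:18s}" + "".join(f"{d:>+10.1f}" for d in devs))
```

With this variant, the small brand Tresemme shows a positive deviation on “Gives shiny hair” and a negative one on “Removes dandruff”, even though its raw scores are the lowest in every row. Each row and each column of deviations sums to zero, which is what puts all brands and attributes on the same footing.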


After Normalization…

Tin plate Plastic Glass Tetrapack

Makes the product reasonably priced -- ++ -
Increases longevity of product
Convenient for stocking in godown ++ + --
Looks attractive on shelves for a long time as colours do not fade away +
Destroyable and the material useable without damaging the environment -- -- + -
Popular among consumers ++ +
Convenient for transportation as it does not break / get tampered easily ++ --
Not tampered easily by insects / rats ++ -- ++ --
Gives protection against foreign odours -- + +
Convenient for stocking on shelves - ++ -
Convenient for transportation by requiring less space --
Re-useable by the customer for some other purpose -- ++
Variable Types and Statistical tools

[Figure: grid of statistical tools, with DEPENDENT VARIABLE type on one axis and INDEPENDENT VARIABLE type on the other]
Variable Types and Statistical tools

Types of tools Applicable:

- Chi-Square test
- ANOVA
- Multiple Regression
- Discriminant Analysis OR Logistic Regression
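
As a rough guide, the choice among these four tools usually depends on whether the dependent and independent variables are categorical or continuous. The lookup below encodes the usual textbook pairing; it is an assumption for illustration and not a reproduction of the slide's grid. The type labels and the function name are hypothetical.

```python
# Usual textbook pairing of variable types with the tools listed above.
# Keys are (dependent variable type, independent variable type); this mapping
# is an assumption for illustration, not a reproduction of the slide's grid.
TOOL_BY_VARIABLE_TYPES = {
    ("categorical", "categorical"): "Chi-Square test",
    ("continuous", "categorical"): "ANOVA",
    ("continuous", "continuous"): "Multiple Regression",
    ("categorical", "continuous"): "Discriminant Analysis OR Logistic Regression",
}

def suggest_tool(dependent, independent):
    """Return the tool usually applied for this pair of variable types."""
    return TOOL_BY_VARIABLE_TYPES[(dependent, independent)]

# Example: a Purchase / Non-Purchase outcome explained by continuous ad scores.
print(suggest_tool("categorical", "continuous"))
```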
Examples of Applications:

- Chi-Square test (Purchaser vs Non-Purchaser… Are there differences by demographic groups?)
- ANOVA (Preference levels among drinkers of different brands of vodka)
- Multiple Regression (“Ad likeability” by Uniqueness, Relevance, etc.)
- Discriminant Analysis OR Logistic Regression (Purchase / Non-Purchase by Ad Uniqueness, Relevance, etc.)
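
To make the first application concrete, here is a minimal sketch of the Pearson chi-square statistic for a Purchaser vs Non-Purchaser table split by a demographic group. The counts and age bands are hypothetical, not taken from the deck, and the statistic is computed by hand without a continuity correction.

```python
# Hypothetical survey counts (not from the deck): rows are age groups,
# columns are purchasers and non-purchasers.
observed = {
    "21-30 yrs": [60, 40],
    "31-45 yrs": [30, 70],
}

row_totals = {group: sum(counts) for group, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
n = sum(col_totals)

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected,
# where expected = row total x column total / n (no continuity correction).
chi_square = 0.0
for group, counts in observed.items():
    for j, obs in enumerate(counts):
        expected = row_totals[group] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(col_totals) - 1)
CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value for df = 1 at the 5% level

print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}")
if chi_square > CRITICAL_5PCT_DF1:
    print("Purchase rates differ by age group at the 5% level")
else:
    print("No evidence of a difference by age group at the 5% level")
```

In these made-up counts, 60% of the younger group are purchasers against 30% of the older group, and the statistic is well above the critical value, so the difference would be treated as statistically significant.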
THANK YOU!
