Introduction

Uploaded by

rithike395
Introduction

Task 1: Data Exploration and Customer Analytics.

In this blog post, we analyze transaction and customer behavior
datasets to understand customer purchase behavior. We use the R
programming language and several of its packages. Our goal is to
identify customer segments and their spending patterns, preferred
brands, and pack sizes.

Data Exploration
We start by importing the necessary R packages and loading the
transaction and customer behavior datasets. Using functions
like head() and str(), we explore the structure and contents of the
datasets to understand their dimensions, data types, and variables.

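# Exploring the transactions data
library(dplyr)   # %>% and summarize_all()
library(tidyr)   # gather()

# Structure: dimensions, column names and column types
str(trns_1)

# One row per column, showing the class of that column
trns_1 %>%
  summarize_all(class) %>%
  gather(variable, class)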

Data Transformation
Next, we perform data transformation tasks such as formatting the
date column, handling outliers, cleaning and preprocessing text data
in the product name column, and checking for missing dates. These
transformations ensure that the data is in a suitable format for
analysis.
• The DATE column was stored as an integer rather than as a
  Date, so it had to be converted.
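
A minimal sketch of that conversion follows. The post does not say how the integers are encoded, so this assumes they are Excel-style serial day counts (origin 1899-12-30); the vector date_int is hypothetical sample data, not taken from the dataset.

```r
# Hypothetical integer dates; ASSUMPTION: Excel serial day numbers,
# i.e. days counted from 1899-12-30.
date_int <- c(43282L, 43283L, 43646L)

# Convert the integers to proper Date values
date_parsed <- as.Date(date_int, origin = "1899-12-30")

class(date_parsed)   # check the new type
range(date_parsed)   # first and last date covered
```

If the integers were encoded differently (for example as yyyymmdd), the origin-based conversion would give wrong dates, so it is worth checking the resulting range against the period the data is supposed to cover.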

Exploratory Data Analysis (EDA)


During the EDA phase, we dig deeper into the datasets. We review
summary statistics, identify outliers, explore product names and
pack sizes, check the unique values in categorical columns,
visualize transaction trends over time, and check for duplicates.
Box-and-whisker plots showed outliers in the Quantity and Sales
columns, which I later removed.
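
One way to flag such outliers is the rule that box-and-whisker plots use by default (values more than 1.5 times the interquartile range beyond the hinges). The sketch below uses base R's boxplot.stats(); the qty vector is made-up data, and the exact threshold used in the original analysis is not stated, so treat this as an illustration only.

```r
# Hypothetical purchase quantities with one extreme value
qty <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 200)

# Values beyond the whiskers of a default box plot (coef = 1.5)
outliers <- boxplot.stats(qty)$out

# Keep only the observations that are not flagged
qty_clean <- qty[!qty %in% outliers]
```

Before dropping rows it is worth looking at the flagged transactions themselves, since an unusually large order can be a genuine bulk purchase rather than a data error.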

Next, I removed all non-chip products by filtering on the product
name. While checking the date column, I found that one date was
missing; further analysis showed it was 25th December. The plot
shows the sudden increase in sales and the drop in the number of
transactions around that date.
One duplicate row was found and removed from the transaction
dataset.
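
The missing-date and duplicate checks can be sketched as follows. The trns data frame, its column names, and the year are hypothetical; only the idea (compare the observed dates with a complete daily sequence, then drop exact duplicate rows) comes from the text above.

```r
# Hypothetical transactions: 25 Dec is absent and one row is repeated
trns <- data.frame(
  DATE = as.Date(c("2018-12-23", "2018-12-24", "2018-12-24",
                   "2018-12-26", "2018-12-27")),
  TOT_SALES = c(6.0, 9.2, 9.2, 5.1, 4.8)
)

# Every calendar day between the first and last transaction
all_days <- seq(min(trns$DATE), max(trns$DATE), by = "day")

# Days in the full sequence that never appear in the data
missing_days <- all_days[!all_days %in% trns$DATE]

# Drop rows that are exact copies of an earlier row
trns_unique <- trns[!duplicated(trns), ]
```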

Distribution of Pack Size


The pack size frequencies look consistent: no pack size differs
significantly from the other observations.
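
Pack sizes are usually embedded in the product name, so they have to be extracted before their distribution can be plotted. A minimal sketch, assuming the size appears as a number followed by "g" or "G"; the product names below are invented for illustration.

```r
# Hypothetical product names (not from the dataset)
prod_name <- c("Example Crinkle Chips Salt 175g",
               "Sample Kettle Sweet Chilli 150G",
               "Demo Corn Chips 270 g")

# Capture the first run of digits that is followed by g/G
pack_size <- as.numeric(
  sub(".*?([0-9]+)\\s*[gG].*", "\\1", prod_name, perl = TRUE)
)

# Frequency of each pack size, ready for a bar plot or histogram
table(pack_size)
```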

Data Analysis

In the data analysis section, we focus on calculating metrics related
to customer segments, such as total sales, chips bought per
customer, number of customers in each segment, and average sales
by customer segment. We use visualizations like bar plots and
histograms to compare sales across customer segments and to
analyze the chips bought per customer.

I created a column for brand names; there are 26 unique brands in
total. I also printed the unique values of the categorical columns
in the customer behavior data. After cleaning and exploring both
datasets, I merged the customer behavior and transaction tables to
make the analysis easier.
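
The brand column and the merge can be sketched as below. This assumes the brand is the first word of the product name and that both tables share a customer key; the key name card_id, the sample rows, and the full product names are hypothetical.

```r
# Hypothetical transaction and customer tables
trns <- data.frame(
  card_id = c(1001, 1002, 1002),
  PROD_NAME = c("Tyrrells Crisps Lightly Salted 165g",
                "Twisties Cheese 270g",
                "Kettle Sea Salt 175g"),
  stringsAsFactors = FALSE
)
customers <- data.frame(
  card_id = c(1001, 1002),
  LIFESTAGE = c("YOUNG SINGLES/COUPLES", "OLDER FAMILIES"),
  PREMIUM_CUSTOMER = c("Mainstream", "Budget"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: the brand is the first word of the product name
trns$BRAND <- toupper(sub(" .*", "", trns$PROD_NAME))

# Left join: keep every transaction, attach customer attributes
merged <- merge(trns, customers, by = "card_id", all.x = TRUE)

# Transactions whose customer was not found in the customer table
unmatched <- sum(is.na(merged$LIFESTAGE))
```

Checking unmatched after the merge is a cheap way to confirm that no transaction lost its customer attributes.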

For the analysis, I created four metrics:

1. Customer segments that spend the most.

Within premium_customer, Mainstream customers spend the most;
within LIFESTAGE, OLDER SINGLES/COUPLES spend the most.

By combined segment, most sales come from Budget-Midage
singles/couples, followed by Mainstream-Young singles/couples.

2. Chips bought per customer by segment.

The Mainstream-Older families segment buys the most chips per
customer, followed by Mainstream-Young families.

3. Number of customers in each segment.

The Mainstream-Young singles/couples segment has the highest
number of customers, which explains its higher sales. This is not
the case for the Budget-Midage segment.

4. Average sales by customer segment.

Mainstream-Young singles/couples and Midage singles/couples
tend to spend more per unit and contribute the most to sales.
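
The four metrics above can be computed per segment roughly as follows. This is a base R sketch on invented data; the column names card_id, PROD_QTY and TOT_SALES are assumptions, and the original analysis may have used dplyr instead.

```r
# Hypothetical merged data: one row per transaction
merged <- data.frame(
  card_id = c(1, 1, 2, 3, 3, 4),
  LIFESTAGE = c(rep("YOUNG SINGLES/COUPLES", 3), rep("OLDER FAMILIES", 3)),
  PREMIUM_CUSTOMER = c(rep("Mainstream", 3), rep("Budget", 3)),
  PROD_QTY = c(2, 1, 2, 2, 2, 1),
  TOT_SALES = c(8, 4, 9, 6, 6, 3),
  stringsAsFactors = FALSE
)

# One label per segment, e.g. "Mainstream - YOUNG SINGLES/COUPLES"
seg <- interaction(merged$PREMIUM_CUSTOMER, merged$LIFESTAGE,
                   sep = " - ", drop = TRUE)

total_sales <- tapply(merged$TOT_SALES, seg, sum)
total_units <- tapply(merged$PROD_QTY, seg, sum)
n_customers <- tapply(merged$card_id, seg, function(x) length(unique(x)))

metrics <- data.frame(
  segment = names(total_sales),
  total_sales = as.numeric(total_sales),                      # metric 1
  units_per_customer = as.numeric(total_units / n_customers), # metric 2
  customers = as.numeric(n_customers),                        # metric 3
  avg_price_per_unit = as.numeric(total_sales / total_units), # metric 4
  stringsAsFactors = FALSE
)
```

Keeping all four metrics in one table makes it easier to see when high sales come from many customers and when they come from a higher price per unit.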

Further analysis looked at the brands and pack sizes these customers
prefer. Below are the findings:

The Mainstream-Young singles/couples segment buys TYRRELLS chips
the most and BURGER the least. Its most purchased pack size is 270g
and its least purchased is 220g. Twisties Cheese is the brand that
sells the 270g pack.
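
A simple way to rank brands within one segment is to total the units bought per brand and sort. The sketch below uses invented rows for a single segment; the quantities are hypothetical and chosen only so that the ordering is easy to follow.

```r
# Hypothetical transactions for one target segment
seg <- data.frame(
  BRAND = c("TYRRELLS", "TYRRELLS", "KETTLE", "BURGER", "TYRRELLS", "KETTLE"),
  PROD_QTY = c(2, 1, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Total units per brand, sorted from most to least bought
units_by_brand <- aggregate(PROD_QTY ~ BRAND, data = seg, FUN = sum)
units_by_brand <- units_by_brand[order(-units_by_brand$PROD_QTY), ]

top_brand <- as.character(units_by_brand$BRAND[1])
bottom_brand <- as.character(units_by_brand$BRAND[nrow(units_by_brand)])
```

The same pattern works for pack size by replacing BRAND with a pack size column.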

Insights and Recommendations


• The analysis uncovers key insights such as the segments
  with the highest sales, the preferred brands and pack
  sizes of specific customer segments, and trends in
  spending per pack. Sales increased significantly just
  before Christmas. These insights are valuable input for
  business decision-making.

• Because Mainstream-Young singles/couples tend to buy
  TYRRELLS chips, the Category Manager can increase the
  visibility of this product to attract more customers
  from this segment.

• Maintain sufficient stock to cover the sales increase
  just before Christmas.

Stay tuned for the next part of our data analysis journey in the
upcoming blog post!

Check out the detailed analysis on GitHub. Any thoughts or
suggestions are welcome in the comments, or you can message and
connect with me directly on LinkedIn.
