CUSTOMER ANALYSIS - Report
Problem Statement:
We want to understand how customers shop at a supermarket by looking at their
transaction data. This means finding out what products are popular, which ones aren’t, and
how buying patterns change over time. By creating new metrics and visualizing the data, we
aim to gather useful insights to improve sales strategies and the overall shopping experience.
Removing outliers will help ensure that our findings are accurate and practical for making
smart business decisions.
Dataset used:
Link: https://github.com/Aegon127bc/Supermarket-CustomerAnalysis.git
Loading the dataset: the data is loaded into a Pandas DataFrame, and the BasketDate
attribute is converted from 'str' to 'datetime'.
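A minimal sketch of this loading step; the file name supermarket.csv and the read options are assumptions for illustration, not taken from the repository:

import pandas as pd

# Load the transactions into a Pandas DataFrame (file name is assumed for illustration)
df = pd.read_csv("supermarket.csv")

# Convert BasketDate from 'str' to 'datetime'; unparsable values become NaT
df["BasketDate"] = pd.to_datetime(df["BasketDate"], errors="coerce")

print(df.dtypes)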
Types of attributes, null values, and description of the dataset.
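One way to inspect the attribute types, null counts, and summary statistics in Pandas (a sketch, assuming the DataFrame df from the loading step above):

# Attribute types and non-null counts
df.info()

# Number of null values per attribute
print(df.isnull().sum())

# Summary statistics of the numeric attributes
print(df.describe())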
All the ProdDescr missing values are already included in the CustomerID ones: if we count
the rows where both attributes are null, the total matches the number of rows with a missing
ProdDescr.
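A quick check of this claim (a sketch; it only assumes the column names ProdDescr and CustomerID used in the report):

# Rows where the product description is missing
missing_descr = df["ProdDescr"].isnull()

# Rows where the customer identifier is missing
missing_cust = df["CustomerID"].isnull()

# If every missing ProdDescr also has a missing CustomerID,
# the two counts below are equal.
print(missing_descr.sum())                    # rows with null ProdDescr
print((missing_descr & missing_cust).sum())   # rows with both null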
Analysis of single attributes.
Sales are roughly constant throughout the year and increase in the run-up to Christmas. In
fact, the day with the most sales was 2011-12-05 and the day with the fewest sales was
2010-12-22. A simple explanation is that the store started its activity around that first
Christmas and steadily grew its clientele over time.
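A sketch of how this daily pattern can be inspected; here "sales per day" is taken to mean the number of transaction rows per day, which is an assumption about the report's definition:

import matplotlib.pyplot as plt

# Count transaction rows per calendar day
sales_per_day = df.groupby(df["BasketDate"].dt.date).size()

# Day with the most and the fewest sales
print("Busiest day:", sales_per_day.idxmax())
print("Slowest day:", sales_per_day.idxmin())

# Plot the trend over the whole period
sales_per_day.plot(title="Sales per day")
plt.show()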
Correlation:
To calculate pairwise correlation, we converted some attributes into categorical codes. For
implementation reasons, ProdDescr had to be treated differently from the other attributes: we
introduced a dictionary in which each distinct string (key) is assigned an incremental
identifier (value).
We then replaced every description with its associated identifier and, in this way,
proceeded to calculate the pairwise correlation, represented as a heatmap.
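A sketch of this encoding and heatmap step; the exact set of encoded columns and the seaborn styling are assumptions, since the report does not spell them out:

import seaborn as sns
import matplotlib.pyplot as plt

# Map each distinct description to an incremental identifier (the dictionary described above)
descr_to_id = {descr: i for i, descr in enumerate(df["ProdDescr"].dropna().unique())}

df_corr = df.copy()
df_corr["ProdDescr"] = df_corr["ProdDescr"].map(descr_to_id)

# Encode the remaining non-numeric attributes as categorical codes (assumed approach)
for col in ["BasketID", "ProdID", "CustomerID"]:
    df_corr[col] = df_corr[col].astype("category").cat.codes

# Turn BasketDate into a numeric rank so it can enter the correlation matrix
df_corr["BasketDate"] = df_corr["BasketDate"].rank(method="dense")

# Pairwise correlation, visualised as a heatmap
corr = df_corr.corr(numeric_only=True)
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.show()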
Correlation in the original dataset is not high for most of the considered pairs.
Exceptions are:
• BasketIDs and BasketDates: all the transactions that belong to the same basket are
made on the same date.
• ProdID and ProdDescr: the same item (usually) has the same description.
So, given the high correlation score (0.98) between descriptions and items, we
can safely assume that ProdDescr is a superfluous attribute and can be
dropped in future studies.
On the other hand, our new attribute Amount naturally has a very high correlation
with the Qta attribute, since the two are directly proportional.
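A brief sketch of these two points; the definition Amount = Qta × unit price is an assumption about how the new attribute was derived, and the unit-price column name Sale is hypothetical:

# Drop the redundant description column for future analyses
df = df.drop(columns=["ProdDescr"])

# Assumed derivation of the new Amount attribute: quantity times unit price.
# The column name "Sale" is hypothetical.
df["Amount"] = df["Qta"] * df["Sale"]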
Outliers:
The last step of the data quality assessment was outlier detection, which was
performed only on the new dataset (the one with all the new attributes).
For outlier analysis and removal, we decided to use the Z-score, a measurement
that tells how many standard deviations a value lies above or below the mean of
the dataset.
Z-score normalization refers to the process of rescaling every value in a dataset so that
the mean of the values is 0 and the standard deviation is 1.
We use the following formula to perform a z-score normalization on every value in a dataset:
New value = (x – μ) / σ
where:
x: Original value
μ: Mean of data
σ: Standard deviation of data
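A minimal sketch of the Z-score outlier removal; the screened columns (Qta and Amount) and the common |z| > 3 cutoff are assumptions, as the report does not state them:

# Columns to screen for outliers; the exact set is an assumption
cols = ["Qta", "Amount"]

# Compute Z-scores column by column: z = (x - mean) / std
z = (df[cols] - df[cols].mean()) / df[cols].std()

# Keep only the rows whose Z-scores stay within the chosen threshold on every column
threshold = 3
df_clean = df[(z.abs() <= threshold).all(axis=1)]

print(f"Removed {len(df) - len(df_clean)} outlier rows")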