
Wrangle Report

Uploaded by

pop000black
WRANGLE REPORT

3 MONTHS IN REVIEW

[email protected]
SUPERMARKET SALES
INTRODUCTION
In real-world scenarios, data often requires extensive preprocessing. Using Python
and its libraries, I will first gather the data. The next step is to thoroughly assess
the data's quality and structure, followed by a careful cleaning process. Together,
these steps are known as data wrangling. The work will be documented step by step in a
Jupyter Notebook within the project folder. Finally, I will present the wrangled data
through analyses and visualizations using Python (and its libraries) and Power BI.

General idea:
The dataset I will be wrangling, analyzing, and visualizing comprises sales data from a
supermarket, collected over a span of three months. This dataset includes detailed
records of transactions, product categories, sales figures, and customer information.
The objective is to clean and preprocess this data to ensure accuracy and consistency,
followed by a thorough analysis to uncover sales trends, customer behavior patterns,
and product performance. This analysis will provide valuable insights into the
supermarket's operations and support data-driven decision-making.

PROJECT INDEX
Data gathering:
- Load the file into the notebook

Data assessing:
- Explore the data
- Identify data quality issues and tidiness issues

Data cleaning:

- Fixing each issue that has been identified

Data saving:

- Saving a clean version of the data

Data visualization:

- General analysis
- Customer analysis
- Product analysis
- Payment analysis
- Time series analysis
Data gathering:
- Load the file into the notebook
The dataset, 'supermarket sales.csv', contains detailed records of transactions. Please download this
file manually using the following link: GitHub
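
As a minimal sketch of the loading step, assuming pandas is used and the file has been downloaded next to the notebook: the helper name load_sales and the sample rows are hypothetical and only illustrate the call. In the notebook, the path 'supermarket sales.csv' would be passed instead of the in-memory sample.

```python
import io

import pandas as pd


def load_sales(source):
    """Read the supermarket sales CSV into a DataFrame.

    `source` can be a path such as 'supermarket sales.csv'
    or any file-like object.
    """
    return pd.read_csv(source)


# Hypothetical in-memory stand-in for the real file, so the sketch runs anywhere.
sample_csv = io.StringIO(
    "Invoice ID,Unit Price,Quantity\n"
    "A-001,74.69 USD,7\n"
    "A-002,15.28,-5\n"
)

df = load_sales(sample_csv)
print(df.shape)
```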

Data assessing:
- Explore the data

After gathering the data, we need to assess it visually and
programmatically for quality and tidiness issues.
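
A programmatic assessment could look roughly like the sketch below. The sample rows and the helper name assess are hypothetical; the column names follow the issues listed in this report. Visual assessment is simply a matter of looking at df.head() or scrolling through the file.

```python
import pandas as pd

# Hypothetical sample with the kinds of problems described in this report.
df = pd.DataFrame({
    "Unit Price": ["74.69 USD", "15.28", "46.33"],
    "Quantity": [7, -5, 7],
    "Tax": [26.14, None, 16.22],
    "Rating": [9.1, 97.0, 7.4],
})


def assess(frame):
    """Collect a few basic quality indicators for a DataFrame."""
    return {
        "dtypes": frame.dtypes.astype(str).to_dict(),    # column types
        "missing": frame.isna().sum().to_dict(),         # missing values per column
        "duplicates": int(frame.duplicated().sum()),     # fully duplicated rows
        "negative_quantity": int((frame["Quantity"] < 0).sum()),
    }


report = assess(df)
print(df.head())      # visual check of the first rows
print(df.describe())  # summary statistics help to spot odd ranges
print(report)
```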

- Identify data quality issues and tidiness issues

Data quality issues:

• Missing values: the Tax and Total columns contain missing values.
• Incorrect data type: the Unit Price column is stored as object instead of numeric.
• Inconsistent values: the Customer Type column has inconsistent values, and the Quantity
column has negative values.
• Mixed units: the Unit Price column contains the "USD" unit.
• Inconsistent format: the Time column mixes 12-hour and 24-hour formats.
• Outliers: there is one outlier in the Rating column ('97', which probably meant 9.7). There are
also some outliers in the Total and Tax columns, but they are not far from the threshold, so I
will not drop them.

Data tidiness issues:
• The city variable is separated into 3 columns. However, each city has only one branch, so
this is not a big issue.
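
The report does not say which threshold was used to flag outliers. The sketch below assumes the common 1.5 × IQR rule, and the function name iqr_outliers and the sample ratings are hypothetical.

```python
import pandas as pd


def iqr_outliers(series, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]


# Hypothetical ratings; 97.0 stands in for the mistyped 9.7.
ratings = pd.Series([9.1, 7.4, 8.4, 5.3, 97.0, 6.0, 4.1])
flagged = iqr_outliers(ratings)
print(flagged)
```

Values that fall only slightly outside the fences, as described for the Total and Tax columns, can then be inspected and kept rather than dropped.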

Data cleaning:

• Resolving each of the data issues identified above
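
A rough sketch of how several of these fixes might be written with pandas is shown below. The sample rows and the helper names to_24h and clean are hypothetical. Treating negative quantities as sign errors and dividing ratings above 10 by 10 are assumptions, not confirmed steps from the notebook. Missing Tax and Total values are left untouched because this report does not state how they were filled.

```python
from datetime import datetime

import pandas as pd


def to_24h(text):
    """Convert a 12-hour or 24-hour time string to 'HH:MM' (24-hour)."""
    text = text.strip()
    for fmt in ("%I:%M %p", "%I:%M:%S %p", "%H:%M", "%H:%M:%S"):
        try:
            return datetime.strptime(text, fmt).strftime("%H:%M")
        except ValueError:
            continue
    return None  # unrecognized format


def clean(frame):
    out = frame.copy()
    # Mixed units / wrong dtype: strip "USD" and convert to a numeric column.
    out["Unit Price"] = pd.to_numeric(
        out["Unit Price"].astype(str).str.replace("USD", "", regex=False).str.strip()
    )
    # Assumption: negative quantities are sign errors, so take the absolute value.
    out["Quantity"] = out["Quantity"].abs()
    # Inconsistent values: trim whitespace and unify capitalization.
    out["Customer Type"] = out["Customer Type"].str.strip().str.title()
    # Inconsistent format: bring every time to 24-hour 'HH:MM'.
    out["Time"] = out["Time"].map(to_24h)
    # Assumption: a rating above 10 is missing its decimal point (97 -> 9.7).
    out["Rating"] = out["Rating"].where(out["Rating"] <= 10, out["Rating"] / 10)
    return out


# Hypothetical rows showing each issue once.
raw = pd.DataFrame({
    "Unit Price": ["74.69 USD", "15.28", "46.33 USD"],
    "Quantity": [7, -5, 7],
    "Customer Type": ["Member", "member ", "Normal"],
    "Time": ["1:08 PM", "10:29", "13:23"],
    "Rating": [9.1, 97.0, 7.4],
})

cleaned = clean(raw)
print(cleaned)
```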

Data Visualization:

We now have a clean, tidy and stored data set, so we can use visuals
to extract insights from the data. I have grouped the insights into the five analyses
below. Let’s have a look at them.

1. General analysis
Some general insights about the data

2. Customer analysis
Some insights about customer gender, customer type and other aspects

3. Product analysis
Some insights about product line, unit price and quantity and other aspects

4. Payment analysis
Some insights about payment method, rating and total and other aspects

5. Time series analysis

Here I analyze several aspects of the data with respect to time
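
As one possible illustration of a time-based view, the sketch below sums sales per month. The column names Date and Total and the sample figures are assumptions made for the example, not values from the dataset.

```python
import pandas as pd

# Hypothetical transactions spread over three months.
sales = pd.DataFrame({
    "Date": ["2019-01-05", "2019-01-20", "2019-02-03", "2019-03-09", "2019-03-10"],
    "Total": [100.0, 50.0, 80.0, 20.0, 30.0],
})

# Parse the dates, then group by calendar month and sum the totals.
sales["Date"] = pd.to_datetime(sales["Date"])
monthly = sales.groupby(sales["Date"].dt.to_period("M"))["Total"].sum()
print(monthly)
```

The same grouping can be done per day or per hour to compare busy and quiet periods.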
