
Data Visualization For Python - Sales Retail - r1

This document analyzes and visualizes sales data for a retail company to identify trends and patterns. It uses the CRISP-DM methodology and Python for the analysis. The data is imported into Python and cleaned by removing duplicates and handling missing values. Visualization techniques like histograms, box plots, line graphs and pie charts are used to analyze trends in monthly sales, quantity ordered, price each, sales by product type and country. Correlation between variables like sales and quantity ordered is also checked.

Uploaded by

Mazhar Mahadzir
Copyright: © All Rights Reserved

Data Visualization using Python:
Analyzing and Visualizing Sales Data for Retail Trend Analysis

Introduction

Group members
Nurazureen Binti Yahya
Mohammad Nizwan bin Mohd Nasir
Chuay Zi Yang
Project Goals

Objective
✓ To analyze and visualize sales data for a retail company to identify trends and patterns

Constraint
✓ Poor data quality can make it difficult to visualize data accurately or highlight trends and patterns
✓ Limited numerical data set
CRISP-DM Methodology

Technical Stacks

Application and Software Packages used
• Python application
• Microsoft Excel
• Microsoft PowerPoint


Project Architecture and Data Preparation

Understand the business problem

Data preparation
• The dataset contains information about the sales of a retail company, including the date of sale, the product sold, the quantity, the price, and the total revenue

Data cleaning
• Clean and process the data by removing duplicates, handling missing values, and transforming the data if necessary

Exploratory data analysis

Deployment
Model Building

Data sources

Python libraries

Syntax:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Import data sources into Python

Syntax:
data = pd.read_csv("C:/Users/Desktop/Desktop/Data Science/Innodatatics Python Project/sales_data_sample.csv", encoding='latin1')

o Import the relevant data sources into Python using appropriate libraries and functions
o Add encoding='latin1' so that pandas is able to load the Latin characters present in the dataset
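The effect of the encoding argument can be sketched without the original file. The example below is only an illustration: it assumes a tiny made-up CSV held in memory (the CITY column and all values are hypothetical, not taken from sales_data_sample.csv) and shows that Latin-1 bytes fail under the default UTF-8 decoding but load once encoding='latin1' is given.

```python
import io

import pandas as pd

# Hypothetical two-row CSV, encoded as Latin-1. The accented letters are
# stored as single bytes that do not form valid UTF-8 sequences.
raw_bytes = "COUNTRY,CITY\nSpain,Málaga\nSweden,Luleå\n".encode("latin1")

# With the default decoding (UTF-8), reading these bytes raises an error.
try:
    pd.read_csv(io.BytesIO(raw_bytes))
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding the same bytes as Latin-1 succeeds.
data = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin1")

print("Loaded with default encoding:", utf8_ok)
print(data)
```

If a file loads without the argument, it was most likely saved as UTF-8 and the argument is not needed.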
Data cleaning

Quick observation of the first rows to see the right type of data

Syntax:
print(data.head())

Quick observation of the last rows to see the right type of data

Syntax:
print(data.tail())

o By applying these syntaxes, we can easily obtain the top 5 and bottom 5 results from the data.
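A minimal sketch of this quick check, assuming a small made-up data frame in place of the real sales file (all values are illustrative). It also prints the column data types, which is one way to confirm "the right type of data".

```python
import pandas as pd

# Made-up stand-in for the sales data: 12 rows, values are illustrative only.
data = pd.DataFrame({
    "QUANTITYORDERED": list(range(20, 32)),
    "PRICEEACH": [50.0 + i for i in range(12)],
})

first_rows = data.head()  # head() must be called; it returns the first 5 rows
last_rows = data.tail()   # tail() returns the last 5 rows

print(first_rows)
print(last_rows)
print(data.dtypes)        # data type of each column
```

Both methods accept a row count, for example data.head(10), when five rows are not enough.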
Display column names

Syntax:
data.columns

o The column syntax retrieves all the header titles from the data source, so users can accurately copy the header names without having to constantly refer back to the Excel file.

Create new data frame for important data only for processing

Syntax:
data1 = data[['QUANTITYORDERED','PRICEEACH',
'SALES','ORDERDATE', 'STATUS', 'MONTH_ID', 'YEAR_ID',
'PRODUCTLINE', 'COUNTRY', 'DEALSIZE']]

o The new data frame syntax keeps only the listed columns for processing.
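The column selection can be sketched as follows. This is an illustration on a made-up data frame: the EXTRA_COLUMN name and all values are hypothetical, and the check for missing names is an added safeguard rather than part of the original slides.

```python
import pandas as pd

# Made-up stand-in for the imported data; EXTRA_COLUMN is a hypothetical
# column that is not needed for the analysis.
data = pd.DataFrame({
    "EXTRA_COLUMN": ["a", "b", "c"],
    "QUANTITYORDERED": [30, 34, 41],
    "PRICEEACH": [100.0, 80.0, 90.0],
    "SALES": [3000.0, 2720.0, 3690.0],
})

# List the header names so they can be copied exactly.
print(list(data.columns))

# Confirm every wanted name exists before selecting, to catch typos early.
wanted = ["QUANTITYORDERED", "PRICEEACH", "SALES"]
missing = [name for name in wanted if name not in data.columns]

# .copy() makes data1 independent of the original data frame.
data1 = data[wanted].copy()
print(data1)
```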
Check duplicated values (Method 1)

Syntax:
data1.duplicated()

o Method 1 displays the top 5 and bottom 5 results from the data; a result of "False" means that the row does not duplicate an earlier row.

Check duplicated values (Method 2)

Syntax:
data_check_dupl = data1.copy()
data_check_dupl['Duplicated'] = data1.duplicated()

o By utilizing this syntax, a new column is appended to a new data frame, and the user can verify each line from line 1 to line 2823 individually.
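Methods 1 and 2 can be tried on a small made-up data frame in which one row is deliberately repeated (the values are illustrative, not from the sales file). The any() call at the end is an addition to the slides: it summarises the whole column instead of relying on the rows that happen to be displayed.

```python
import pandas as pd

# Made-up data: the third row repeats the first row exactly.
data1 = pd.DataFrame({
    "QUANTITYORDERED": [30, 34, 30, 41],
    "PRICEEACH": [100.0, 80.0, 100.0, 90.0],
})

# Method 1: Boolean Series, True where a row repeats an earlier row.
flags = data1.duplicated()

# Method 2: keep the flags beside the data in a separate copy,
# so the original data frame is left unchanged.
data_check_dupl = data1.copy()
data_check_dupl["Duplicated"] = flags

print(data_check_dupl)
print("Any duplicates:", flags.any())
```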
Check duplicated values (Method 3)

Syntax:
data_check_dupl['Duplicated'].value_counts()

o Method 3 counts the values in the new column; the output displays 2822 lines with no 'True' value, indicating that there are no duplicates present in the data.

Check missing value

Syntax:
data1.isnull().sum()

o The output shows that all key headers have a value of "0," which indicates that there are no missing values in any of the columns.
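A short sketch of both counts, assuming a made-up data frame that contains one repeated row and one missing value so that the outputs are not all zero (the values are illustrative only).

```python
import numpy as np
import pandas as pd

# Made-up data: row 2 repeats row 0, and one SALES value is missing.
data1 = pd.DataFrame({
    "QUANTITYORDERED": [30, 34, 30, 41],
    "SALES": [3000.0, np.nan, 3000.0, 3690.0],
})

# Method 3: how many rows are flagged False (unique) and True (duplicate).
dup_counts = data1.duplicated().value_counts()

# The same information as a single number of duplicated rows.
n_duplicates = int(data1.duplicated().sum())

# Number of missing values in each column.
missing_per_column = data1.isnull().sum()

print(dup_counts)
print("Duplicated rows:", n_duplicates)
print(missing_per_column)
```

On clean data, value_counts() lists only False and every entry of isnull().sum() is 0.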
Visualization techniques

Check outliers

Syntax:
data1.describe()

o The output is presented in a tabular format for numerical data, displaying the mean, standard deviation, minimum, maximum, and other relevant statistics.

Check outliers (Box plot for sales data)

Syntax:
plt.boxplot(data1.SALES)

o The box plot graph reveals the presence of outliers in the data source.
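The points that a box plot draws as outliers can also be listed numerically. The sketch below assumes matplotlib's default whisker rule (1.5 times the interquartile range beyond the quartiles) and uses a made-up SALES series; the values are illustrative, not the project's data.

```python
import pandas as pd

# Made-up sales values with one unusually large entry.
sales = pd.Series([1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 14000.0], name="SALES")

# Summary statistics: count, mean, std, min, quartiles, max.
summary = sales.describe()

# Quartiles and interquartile range.
q1 = sales.quantile(0.25)
q3 = sales.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are treated as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = sales[(sales < lower) | (sales > upper)]

print(summary)
print("Bounds:", lower, upper)
print(outliers)
```

Whether such values should be removed or kept is a separate decision; a large order can be a genuine sale rather than an error.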
Histogram graph for month sales
o Histogram (Month Orders) – Peaked at year end in tandem with holidays and festive seasons.

Histogram graph for quantity ordered
o Histogram (Quantity Ordered) – Most clients ordered quantities from 20 to 50.
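The slides show the histograms but not the code behind them. One possible way to draw them with matplotlib is sketched below; the data is made up, and the bin choices (one bin per month, ten bins for quantity) are assumptions rather than the settings used in the project.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration only.
data1 = pd.DataFrame({
    "MONTH_ID": [1, 2, 5, 10, 11, 11, 11, 12, 12, 11],
    "QUANTITYORDERED": [22, 35, 41, 28, 33, 47, 30, 25, 38, 55],
})

fig, (ax_month, ax_qty) = plt.subplots(1, 2, figsize=(10, 4))

# One bin per calendar month: edges at 0.5, 1.5, ..., 12.5.
month_edges = [m + 0.5 for m in range(0, 13)]
month_counts, _, _ = ax_month.hist(data1["MONTH_ID"], bins=month_edges)
ax_month.set_title("Histogram (Month Orders)")
ax_month.set_xlabel("MONTH_ID")
ax_month.set_ylabel("Number of orders")

# Ten equal-width bins across the observed quantity range.
ax_qty.hist(data1["QUANTITYORDERED"], bins=10)
ax_qty.set_title("Histogram (Quantity Ordered)")
ax_qty.set_xlabel("QUANTITYORDERED")

fig.tight_layout()
# In an interactive session, plt.show() would display the figure.
```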
Histogram graph for sales data
o Histogram (Sales Data) – Most of the sales made had values between 1000 and 6000.

Histogram graph for price each
o Histogram (Price Each) – Most of the products sold had prices between 90 and 100.
Line graph for sales data
o Line Graph (Sales Data) – Sales increased from 2003 to 2004 but decreased significantly from 2004 to 2005.
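The code for the line graph is not shown in the slides. A minimal sketch, assuming the graph plots total SALES per YEAR_ID, is given below; the data is made up, so the totals do not reproduce the project's figures.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration only.
data1 = pd.DataFrame({
    "YEAR_ID": [2003, 2003, 2004, 2004, 2004, 2005],
    "SALES": [3000.0, 2500.0, 4000.0, 3500.0, 3000.0, 2000.0],
})

# Total sales per year, indexed by YEAR_ID in ascending order.
yearly_sales = data1.groupby("YEAR_ID")["SALES"].sum()

fig, ax = plt.subplots()
ax.plot(yearly_sales.index, yearly_sales.values, marker="o")
ax.set_xticks(list(yearly_sales.index))  # show whole years on the x-axis
ax.set_xlabel("YEAR_ID")
ax.set_ylabel("Total SALES")
ax.set_title("Line Graph (Sales Data)")

print(yearly_sales)
```

When comparing years this way, it helps to check that each year covers the same number of months, since a partial year will show a lower total.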
Pie chart for total sales by product type
o The classic cars product type has emerged as the top-selling product among all the types.

Pie chart for total sales by country
o In terms of overall sales, the USA has outperformed other regions, emerging as the top-selling market.
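A pie chart of total sales per category can be sketched as below. The grouping by PRODUCTLINE is an assumption about how the chart was built, and the category names and values are made up for illustration. The same pattern applies to COUNTRY.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration only.
data1 = pd.DataFrame({
    "PRODUCTLINE": ["Classic Cars", "Classic Cars", "Motorcycles", "Trucks and Buses"],
    "SALES": [4000.0, 3000.0, 2000.0, 1000.0],
})

# Total sales per product line, largest first.
sales_by_line = (
    data1.groupby("PRODUCTLINE")["SALES"].sum().sort_values(ascending=False)
)

fig, ax = plt.subplots()
ax.pie(sales_by_line, labels=sales_by_line.index, autopct="%1.1f%%")
ax.set_title("Total sales by product type")

# Name of the category with the highest total sales.
top_line = sales_by_line.idxmax()
print(sales_by_line)
print("Top-selling:", top_line)
```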
Check data correlation

o To check the correlation between two variables
o Sales and quantity ordered show a weak positive correlation
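The slides do not show the syntax used for this check. One common way, sketched below, is the pandas corr() method, which computes the Pearson coefficient by default. The data is made up, so the resulting value does not reproduce the correlation reported above.

```python
import pandas as pd

# Made-up data for illustration only; it does not reproduce the
# correlation found in the project.
data1 = pd.DataFrame({
    "QUANTITYORDERED": [20, 30, 40, 50],
    "SALES": [2000.0, 2500.0, 4500.0, 5000.0],
})

# Pearson correlation between two columns (a value between -1 and 1).
r = data1["SALES"].corr(data1["QUANTITYORDERED"])

# Full correlation matrix for the selected numeric columns.
corr_matrix = data1[["SALES", "QUANTITYORDERED"]].corr()

print("Correlation:", r)
print(corr_matrix)
```

A positive value means the two variables tend to rise together; values near 0 indicate a weak linear relationship and values near 1 a strong one.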
