Data Visualization For Python - Sales Retail - r1
Python:
Analyzing and Visualizing
Sales Data for Retail
Trend Analysis
Introduction
Group members
Nurazureen Binti Yahya
Mohammad Nizwan bin Mohd Nasir
Chuay Zi Yang
Project Goals
Objective
✓ To analyze and visualize sales data for a retail company to identify trends and patterns
Constraint
✓ Poor data quality can make it difficult to visualize data accurately or highlight trends and patterns
✓ Limited numerical data set
CRISP-DM Methodology
Technical Stacks
Python application
Microsoft Excel
Microsoft PowerPoint
Deployment
Data preparation
Exploratory data analysis
The dataset contains information about the sales of a retail company, including the date of sale, the product sold, the quantity, the price, and the total revenue.
Model Building
Data sources
Python libraries
o Import the relevant data sources into Python using appropriate libraries and functions
Syntax:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
o Add encoding='latin1' so that pandas can load the Latin characters present in the dataset
Import data sources into Python
Syntax:
data = pd.read_csv(
    "C:/Users/Desktop/Desktop/Data Science/Innodatatics Python Project/sales_data_sample.csv",
    encoding='latin1')
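The effect of the encoding argument can be seen on a small file. The sketch below is only an illustration under assumptions: the file name, the column names and the single row are made up, and the file is written as Latin-1 on purpose so that the default UTF-8 decoding fails.

```python
import os
import tempfile

import pandas as pd

# Toy CSV content (made up for illustration) with non-ASCII characters.
csv_text = "CUSTOMER,SALES\nCafé Modèle,2871.0\n"

# Write the file as Latin-1 bytes, the encoding the slides assume for the dataset.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "toy_sales.csv")  # hypothetical file name
with open(csv_path, "w", encoding="latin1", newline="") as f:
    f.write(csv_text)

# The default UTF-8 decoder cannot decode the Latin-1 byte for "é".
try:
    pd.read_csv(csv_path)
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# With encoding='latin1' the same bytes are decoded correctly.
toy = pd.read_csv(csv_path, encoding="latin1")
print(utf8_failed)
print(toy)
```

If a file loads without the argument, it is probably already UTF-8 or plain ASCII and the argument is not needed.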
Data cleaning
Syntax:
print(data.head())
Syntax:
print(data.tail())
Syntax:
data.columns
o With the column list printed, users can copy the header names accurately into the new data frame syntax without having to constantly refer back to the Excel file.
Create a new data frame with only the important columns for processing
Syntax:
data1 = data[['QUANTITYORDERED','PRICEEACH',
'SALES','ORDERDATE', 'STATUS', 'MONTH_ID', 'YEAR_ID',
'PRODUCTLINE', 'COUNTRY', 'DEALSIZE']]
Syntax:
data_check_dupl = data1.copy()
data_check_dupl['Duplicated'] = data1.duplicated()
Syntax:
data_check_dupl['Duplicated'].value_counts()
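The duplicate check above can be tried on a few rows first. This is a minimal sketch with made-up values; only the column names come from the slides, and the drop_duplicates step is an optional follow-up that the slides do not show.

```python
import pandas as pd

# Toy rows (made up); the second and third rows are identical.
toy = pd.DataFrame({
    "QUANTITYORDERED": [30, 34, 34, 41],
    "PRICEEACH": [95.70, 81.35, 81.35, 94.74],
})

# Same pattern as in the slides: flag rows that repeat an earlier row.
check = toy.copy()
check["Duplicated"] = toy.duplicated()
print(check["Duplicated"].value_counts())

# Optional follow-up: keep only the first occurrence of each repeated row.
deduped = toy.drop_duplicates()
print(len(toy), len(deduped))
```

duplicated() compares whole rows by default, so two orders that differ in any selected column are not flagged.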
Syntax:
data1.isnull().sum()
o The output shows that all key headers have a value of "0", which indicates that there are no missing values in any of the rows
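To show what a non-zero result would look like, here is a small sketch of the same missing-value count. The values are made up and two of them are left empty on purpose; this is not output from the project dataset.

```python
import numpy as np
import pandas as pd

# Toy frame (made up) with one missing SALES value and one missing STATUS value.
toy = pd.DataFrame({
    "SALES": [2871.0, np.nan, 3884.34],
    "STATUS": ["Shipped", "Shipped", None],
    "YEAR_ID": [2003, 2003, 2004],
})

# Count missing entries per column, as in the slides.
missing = toy.isnull().sum()
print(missing)

# List only the columns that actually contain missing values.
print(missing[missing > 0])
```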
Syntax:
data1.describe()
Syntax:
plt.boxplot(data1.SALES)
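With Matplotlib's default settings, a box plot draws points as outliers when they lie more than 1.5 times the interquartile range beyond the quartiles. The sketch below reproduces that rule with pandas on made-up sales values, so the points a box plot would flag can also be listed as numbers.

```python
import pandas as pd

# Toy sales values (made up); the last one is deliberately extreme.
sales = pd.Series(
    [1200.0, 2500.0, 3100.0, 3600.0, 4200.0, 5100.0, 14000.0], name="SALES"
)

# Quartiles and interquartile range.
q1 = sales.quantile(0.25)
q3 = sales.quantile(0.75)
iqr = q3 - q1

# Limits beyond which a default box plot marks points as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = sales[(sales < lower) | (sales > upper)]
print(lower, upper)
print(outliers)
```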
o Histogram (Month Orders) – Orders peaked at year end, in tandem with holidays and festive seasons.
o Histogram (Quantity Ordered) – Most clients ordered quantities between 20 and 50.
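The slides show the histograms but not the plotting calls. One possible way to draw them is sketched below; the values, titles and bin choices are assumptions for illustration, not the settings used in the project.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy data (made up); only the column names come from the slides.
toy = pd.DataFrame({
    "MONTH_ID": [1, 2, 5, 10, 11, 11, 11, 12],
    "QUANTITYORDERED": [22, 30, 34, 41, 45, 49, 27, 36],
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# One bin per calendar month: bin edges at 1, 2, ..., 13.
month_counts, _, _ = axes[0].hist(toy["MONTH_ID"], bins=list(range(1, 14)))
axes[0].set_title("Month Orders")
axes[0].set_xlabel("MONTH_ID")

# Ten equal-width bins for the ordered quantities (an arbitrary choice).
axes[1].hist(toy["QUANTITYORDERED"], bins=10)
axes[1].set_title("Quantity Ordered")
axes[1].set_xlabel("QUANTITYORDERED")

fig.tight_layout()
print(month_counts)
```

In an interactive session, plt.show() displays the figure. The same pattern would apply to the SALES and PRICEEACH columns.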
Histogram graph for sales data
o Histogram (Sales Data) – Most of the sales made had values between 1000 and 6000.
Histogram graph for price each
o Histogram (Price Each) – Most of the products sold had prices between 90 and 100.
Line graph for sales data
o Line Graph (Sales Data) – Sales increased from 2003 to 2004 but decreased significantly from 2004 to 2005.
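A yearly line graph of this kind could be produced by summing SALES per YEAR_ID, as sketched below with made-up values. Yearly totals also drop when a year covers fewer months, so the sketch counts the months recorded per year; the slides do not say whether that applies to this dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy data (made up); only the column names come from the slides.
toy = pd.DataFrame({
    "YEAR_ID": [2003, 2003, 2004, 2004, 2005],
    "MONTH_ID": [3, 11, 4, 11, 2],
    "SALES": [1000.0, 1500.0, 2000.0, 2200.0, 900.0],
})

# Total sales per year, and how many distinct months each year covers.
yearly_sales = toy.groupby("YEAR_ID")["SALES"].sum()
months_covered = toy.groupby("YEAR_ID")["MONTH_ID"].nunique()

fig, ax = plt.subplots()
ax.plot(yearly_sales.index, yearly_sales.values, marker="o")
ax.set_xticks(list(yearly_sales.index))
ax.set_xlabel("YEAR_ID")
ax.set_ylabel("Total SALES")
ax.set_title("Sales by year")

print(yearly_sales)
print(months_covered)
```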
Pie chart for total sales by product type
o The classic cars product type has emerged as the top-selling product among all the types
Pie chart for total sales by country
o In terms of overall sales, the USA has outperformed other regions, emerging as the top-selling market
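A pie chart of total sales per category could be built from a grouped sum, as in the sketch below. The rows are made up, and the product line labels other than Classic Cars are placeholders. Grouping by COUNTRY instead of PRODUCTLINE would give the by-country chart.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy data (made up); only the column names come from the slides.
toy = pd.DataFrame({
    "PRODUCTLINE": ["Classic Cars", "Classic Cars", "Motorcycles", "Trains"],
    "SALES": [3000.0, 2000.0, 1500.0, 500.0],
})

# Total sales per product line, largest first.
by_line = toy.groupby("PRODUCTLINE")["SALES"].sum().sort_values(ascending=False)

fig, ax = plt.subplots()
ax.pie(by_line.values, labels=list(by_line.index), autopct="%1.1f%%")
ax.set_title("Total sales by product type")

print(by_line)
```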
Check data correlation
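The slides do not show the call used for this step. One common approach, sketched here on made-up values, is to compute pairwise Pearson correlations over the numeric columns only.

```python
import pandas as pd

# Toy data (made up); only the column names come from the slides.
toy = pd.DataFrame({
    "QUANTITYORDERED": [20, 30, 40, 50],
    "PRICEEACH": [100.0, 90.0, 95.0, 85.0],
    "SALES": [2000.0, 2700.0, 3800.0, 4250.0],
    "STATUS": ["Shipped", "Shipped", "Cancelled", "Shipped"],
})

# Keep numeric columns only, then compute the correlation matrix.
corr = toy.select_dtypes(include="number").corr()
print(corr.round(2))
```

The resulting matrix can be passed to sns.heatmap(corr, annot=True) for a visual check.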