Data Visualization
Types of Data Visualization:
Charts and Graphs
Line Charts
Geospatial Visualizations
Interactive Visualizations
Infographics
Data
Clean data: Data should be free from errors, inconsistencies, and missing
values.
Structured data: Data should be organized in a way that makes it easy to
analyze and visualize.
Collect and clean data: Gather and prepare the data for analysis.
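The preprocessing steps in this report only count missing values and duplicates; they do not remove them. Below is a minimal cleaning sketch. It assumes that dropping the affected rows is acceptable for this dataset, which is not always true (imputing values may be the better choice). The helper name `basic_clean` is hypothetical.

```python
import pandas as pd


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df (hypothetical helper, one simple policy)."""
    cleaned = df.copy()
    # Strip stray whitespace from column names, a common CSV export issue
    cleaned.columns = cleaned.columns.str.strip()
    # Drop exact duplicate rows, keeping the first occurrence
    cleaned = cleaned.drop_duplicates()
    # Drop any row that still contains a missing value
    cleaned = cleaned.dropna()
    # Renumber rows 0..n-1 after the drops
    return cleaned.reset_index(drop=True)
```

Usage, once the CSV has been loaded: `df = basic_clean(df)`.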
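from google.colab import files  # only available inside Google Colab
uploaded = files.upload()  # opens a file picker to upload customer_analytics.csv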
Import Data
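import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # used for all figures below
import seaborn as sns  # statistical plots (histplot, countplot, ...)
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
df = pd.read_csv('customer_analytics.csv')  # load the dataset into a DataFrame
df.head()  # show the first five rows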
Data Preprocessing
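df.shape  # (number of rows, number of columns)
df.dtypes  # data type of each column
df.isnull().sum()  # number of missing values per column
df.duplicated().sum()  # number of fully duplicated rows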
Descriptive Statistics
df.describe()
df.head()
Exploratory Data Analysis
plt.title('Gender Distribution')
The dataset has a nearly equal number of male and female customers, at 49.6%
and 50.4% respectively.
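Percentages like these can be computed directly rather than read off a chart. The small sketch below assumes the gender column is named `Gender`, which this report does not confirm. `category_shares` is a hypothetical helper.

```python
import pandas as pd


def category_shares(series: pd.Series, decimals: int = 1) -> pd.Series:
    """Percentage of rows in each category, largest share first."""
    # normalize=True returns fractions that sum to 1; missing values are excluded
    return (series.value_counts(normalize=True) * 100).round(decimals)
```

Usage: `category_shares(df['Gender'])` returns one percentage per gender, which should match the pie-chart labels.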
Product Properties
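fig, ax = plt.subplots(1, 3, figsize=(15, 5))
sns.histplot(df['Weight_in_gms'], ax=ax[0], kde=True).set_title('Weight Distribution')
# Assumed column names for the other two panels; adjust to match the CSV header
sns.countplot(x='Product_importance', data=df, ax=ax[1]).set_title('Product Importance')
sns.histplot(df['Cost_of_the_Product'], ax=ax[2], kde=True).set_title('Cost Distribution')
plt.show()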
These three graphs show the distribution of three product properties in the
dataset: weight, importance and cost. The weight distribution shows that
products weighing 1000-2000 grams and 4000-6000 grams are the most common,
which suggests that the company ships more products in these weight ranges.
The second graph shows product importance: the majority of products have low
or medium importance. The third graph shows the cost distribution, which peaks
between 150-200 dollars and 225-275 dollars. From this, I conclude that the
majority of products weigh less than 6000 grams, have low or medium importance
and cost between 150 and 275 dollars.
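The range-based claims above can be checked numerically by binning the values with `pd.cut`. The sketch below uses a hypothetical helper name, and the bin edges are illustrative choices matched to the ranges discussed. The cost column name `Cost_of_the_Product` is an assumption.

```python
from typing import Sequence

import pandas as pd


def count_in_ranges(values: pd.Series, edges: Sequence[float]) -> pd.Series:
    """Count values in each [edge_i, edge_i+1) range, listed in ascending order."""
    # right=False makes bins left-closed: [1000, 2000) includes 1000 but not 2000
    binned = pd.cut(values, bins=list(edges), right=False)
    # Values outside all bins become NaN and are not counted
    return binned.value_counts().sort_index()
```

Usage: `count_in_ranges(df['Weight_in_gms'], [0, 1000, 2000, 4000, 6000, 8000])`. Widen the last edge if some products are heavier, because values beyond it are silently dropped.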
Logistics
fig, ax = plt.subplots(1,3,figsize=(15,5))
The graphs above visualize the logistics and delivery of the products. The
first graph shows that warehouse F handles the most products (about 3500),
while the remaining warehouses handle nearly equal numbers. The second graph
shows the mode of shipment: the majority of products are shipped by ship, while
flight and road each account for nearly 2000 products. The third graph shows
on-time delivery: more products were delivered on time than were not.
Based on these graphs, I hypothesize that warehouse F is close to a seaport,
because it handles the most products and ship is the most common mode of
shipment.
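Overall counts cannot show whether warehouse F itself relies on ships. If F were near a seaport, one would expect its share of ship deliveries to be clearly higher than that of the other warehouses. A row-normalized crosstab shows this directly. The sketch below uses a hypothetical helper name, and the column names `Warehouse_block` and `Mode_of_Shipment` in the usage line are assumptions.

```python
import pandas as pd


def share_by_group(df: pd.DataFrame, group_col: str, category_col: str) -> pd.DataFrame:
    """Within each group, the percentage of rows falling in each category."""
    # normalize="index" makes every row (one group) sum to 1
    table = pd.crosstab(df[group_col], df[category_col], normalize="index")
    return (table * 100).round(1)
```

Usage: `share_by_group(df, 'Warehouse_block', 'Mode_of_Shipment')`. If the Ship column is roughly the same for every warehouse, the seaport hypothesis gets no support from this data.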
Customer Experience
fig, ax = plt.subplots(2,2,figsize=(15,10))
The third graph shows the number of prior purchases made by customers: the
majority of customers have made 2-3 prior purchases, which suggests that
returning customers are satisfied with the service and continue to buy more
products. The fourth graph shows the discount offered on the products: the
majority of products have a 0-10% discount, which means that the company does
not offer large discounts on most products.
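"Majority" claims like these can be turned into a single percentage. The sketch below uses a hypothetical helper name. The column names `Discount_offered` and `Prior_purchases` in the usage lines are assumptions.

```python
import pandas as pd


def share_within(values: pd.Series, low: float, high: float) -> float:
    """Percentage of non-missing values lying in the closed range [low, high]."""
    valid = values.dropna()
    # Series.between is inclusive on both ends by default
    return round(100 * valid.between(low, high).mean(), 1)
```

Usage: `share_within(df['Discount_offered'], 0, 10)` for the 0-10% discount claim, and `share_within(df['Prior_purchases'], 2, 3)` for the 2-3 prior purchases claim.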
The number of products delivered on time is about the same for both genders,
which suggests that there is no relationship between customer gender and
on-time delivery.
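Equal counts only imply "no relationship" when both groups are the same size, so comparing on-time *rates* is a more robust check. The sketch below uses a hypothetical helper name. The status column name `Reached.on.Time_Y.N` and the value meaning "on time" are assumptions: confirm both against your copy of the data before reading the result.

```python
import pandas as pd


def on_time_rate_by(df: pd.DataFrame, group_col: str, status_col: str, on_time_value) -> pd.Series:
    """Percentage of rows in each group whose status equals on_time_value."""
    # Rates rather than raw counts correct for groups of different sizes
    is_on_time = df[status_col] == on_time_value
    return (is_on_time.groupby(df[group_col]).mean() * 100).round(1)
```

Usage: `on_time_rate_by(df, 'Gender', 'Reached.on.Time_Y.N', on_time_value=...)`, with the value filled in from the dataset's documentation.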
fig, ax = plt.subplots(2,2,figsize=(15,10))
The third graph relates customers' prior purchases to delivery. Customers with
more prior purchases have a higher count of products delivered on time, which
may be one reason they keep purchasing from the company. The fourth graph
relates the discount offered to delivery: products with a 0-10% discount have a
higher count of late deliveries, whereas products with a discount above 10%
have a higher count of on-time deliveries.
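The discount comparison can be stated as an on-time rate per discount band instead of raw counts. In the sketch below, the helper name and the 10% cutoff parameter are illustrative. The column names in the usage line, and which status value means "on time", are assumptions.

```python
import pandas as pd


def on_time_rate_by_discount_band(df, discount_col, status_col, on_time_value, cutoff=10):
    """On-time percentage for discounts <= cutoff versus discounts above it."""
    # Rows with no recorded discount cannot be assigned to a band
    df = df.dropna(subset=[discount_col])
    band = df[discount_col].le(cutoff).map({True: f"0-{cutoff}%", False: f">{cutoff}%"})
    is_on_time = df[status_col].eq(on_time_value)
    return (is_on_time.groupby(band).mean() * 100).round(1)
```

Usage: `on_time_rate_by_discount_band(df, 'Discount_offered', 'Reached.on.Time_Y.N', on_time_value=...)`.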
plt.figure(figsize=(10,10))
The plot suggests that customers are more concerned about delivery when the
product cost is high, which may be why they call customer care to check the
status of their order. It is therefore important to ensure on-time delivery for
high-cost products.
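One way to put a number on this claim is to compute the average product cost for each number of customer care calls, together with a Pearson correlation. The sketch below uses a hypothetical helper name. The column names `Customer_care_calls` and `Cost_of_the_Product` in the usage line are assumptions.

```python
import pandas as pd


def cost_vs_calls(df: pd.DataFrame, calls_col: str, cost_col: str):
    """Mean cost and row count per number of calls, plus their Pearson correlation."""
    summary = df.groupby(calls_col)[cost_col].agg(["mean", "count"]).sort_index()
    # Pearson r ranges from -1 to 1; values near 0 indicate little linear relation
    r = df[calls_col].corr(df[cost_col])
    return summary, r
```

Usage: `summary, r = cost_vs_calls(df, 'Customer_care_calls', 'Cost_of_the_Product')`. A mean cost that rises with the number of calls, and a positive `r`, would support the statement above. Neither shows that cost *causes* the calls.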
Conclusion
From the exploratory data analysis (EDA), product weight and cost appear to
affect delivery time. Specifically, products weighing between 2500 and 3500
grams and costing less than 250 dollars were more likely to be delivered on
time. Additionally, warehouse F handled the most products and most products
were shipped by ship, suggesting that this warehouse might be located near a
seaport, which could contribute to more efficient deliveries.
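The segment claim in the conclusion can be checked by comparing the on-time rate inside the segment with the rate outside it. The sketch below uses a hypothetical helper name. The cost and status column names in the usage lines, and the value meaning "on time", are assumptions. The comparison describes the data only; it is not a significance test.

```python
import pandas as pd


def on_time_rate_in_segment(df: pd.DataFrame, mask: pd.Series, status_col: str, on_time_value):
    """On-time percentage for rows where mask is True versus all other rows."""
    is_on_time = df[status_col].eq(on_time_value)
    inside = is_on_time[mask].mean() * 100
    outside = is_on_time[~mask].mean() * 100
    return round(inside, 1), round(outside, 1)
```

Usage:
`mask = df['Weight_in_gms'].between(2500, 3500) & (df['Cost_of_the_Product'] < 250)`
`on_time_rate_in_segment(df, mask, 'Reached.on.Time_Y.N', on_time_value=...)`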
Submitted by:
HARSHAPRADHA K(24CESG010)