
Data Visualization: Types of Data Visualization: Charts and Graphs Line Charts

Uploaded by 211cs011
Copyright © All Rights Reserved

DATA VISUALIZATION

Data visualization is the process of creating graphical representations of data to better understand and communicate information. It involves using visual elements like charts, graphs, plots, and other visualizations to help people understand and analyze data more effectively.

TYPES OF DATA VISUALIZATION:

Charts and Graphs

• Line Charts: Used to show trends over time or relationships between continuous data.
• Bar Charts: Used to compare categorical data across different groups.
• Scatter Plots: Used to show relationships between two continuous variables.
• Pie Charts: Used to show how different categories contribute to a whole.
• Histograms: Used to show the distribution of continuous data.
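As a minimal sketch of these five basic chart types, the matplotlib code below draws one of each on synthetic data. The data values, the function name basic_charts, and the use of the headless Agg backend are assumptions for illustration and are not part of the original analysis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def basic_charts(seed=0):
    """Draw one example of each basic chart type on synthetic data."""
    rng = np.random.default_rng(seed)
    fig, ax = plt.subplots(1, 5, figsize=(20, 4))

    # Line chart: a trend over an ordered, continuous axis (12 synthetic months)
    months = np.arange(1, 13)
    ax[0].plot(months, 100 + 5 * months + rng.normal(0, 5, size=12))
    ax[0].set_title("Line chart")

    # Bar chart: one value per category, compared side by side
    ax[1].bar(["A", "B", "C"], [30, 45, 25])
    ax[1].set_title("Bar chart")

    # Scatter plot: two continuous variables with a noisy linear relationship
    x = rng.normal(size=200)
    ax[2].scatter(x, 2 * x + rng.normal(size=200), s=8)
    ax[2].set_title("Scatter plot")

    # Pie chart: parts of a whole, labelled with their percentages
    ax[3].pie([50, 30, 20], labels=["X", "Y", "Z"], autopct="%1.1f%%")
    ax[3].set_title("Pie chart")

    # Histogram: distribution of a single continuous variable
    ax[4].hist(rng.normal(size=1000), bins=30)
    ax[4].set_title("Histogram")

    fig.tight_layout()
    return fig


fig = basic_charts()
```

Calling fig.savefig("charts.png") afterwards would write the figure to disk.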

Geospatial Visualizations

• Maps: Used to show geographic data, such as population density, climate patterns, or election results.
• Heat Maps: Used to show the density or intensity of data points on a map.
• Geospatial Scatter Plots: Used to plot individual data points at their geographic locations to show spatial relationships between them.

Interactive Visualizations

• Dashboards: Used to provide an overview of multiple data sets and let users explore the data in real time.
• Interactive Scatter Plots: Used to let users explore relationships between data points in real time.
• Filterable Visualizations: Used to let users filter data based on specific criteria.

Infographics

• Static Infographics: Used to communicate a message or tell a story using a combination of data, images, and text.
• Interactive Infographics: Used to let users explore data and interact with the visualization in real time.

Other Types of Data Visualization

• 3D Visualizations: Used to show complex relationships between multiple variables.
• Network Visualizations: Used to show entities as nodes and the connections between them as edges, such as social networks or supply chains.
• Radar Charts: Used to compare multiple categories across multiple dimensions.
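Radar charts are the least common chart in this list, so here is a rough matplotlib sketch of one: polar axes with one spoke per dimension, and one closed polygon per category. The function name radar_chart and the vendor scores are hypothetical, chosen only to illustrate the layout.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def radar_chart(dimensions, series):
    """series maps a category name to one value per dimension."""
    n = len(dimensions)
    # One evenly spaced angle per dimension around the circle
    angles = np.linspace(0, 2 * np.pi, n, endpoint=False).tolist()
    angles += angles[:1]  # repeat the first angle so each polygon closes

    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    for name, values in series.items():
        closed = list(values) + [values[0]]  # close the polygon
        ax.plot(angles, closed, label=name)
        ax.fill(angles, closed, alpha=0.2)  # shaded area under each outline
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(dimensions)
    ax.legend(loc="upper right")
    return fig, ax


fig, ax = radar_chart(
    ["Speed", "Cost", "Quality", "Support"],
    {"Vendor A": [4, 3, 5, 2], "Vendor B": [3, 4, 3, 4]},
)
```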

To create effective data visualizations, several key elements are required. Here are some of the most important ones:

Data

• Quality data: Accurate, complete, and relevant data is essential for creating meaningful visualizations.
• Clean data: Data should be free from errors, inconsistencies, and missing values.
• Structured data: Data should be organized in a way that makes it easy to analyze and visualize.
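These data requirements can be checked quickly in pandas. The helper below is a hypothetical sketch (its name and output columns are assumptions, not from the original report) that summarises data types, missing values, distinct values, and fully duplicated rows for any DataFrame, such as the df loaded later in this report.

```python
import numpy as np
import pandas as pd


def data_quality_report(df):
    """Per-column dtype, missing-value count/percentage and distinct values,
    plus the number of fully duplicated rows."""
    report = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isnull().sum(),
        "missing_pct": (df.isnull().mean() * 100).round(1),
        "unique": df.nunique(),  # NaN is not counted as a distinct value
    })
    n_duplicates = int(df.duplicated().sum())  # rows identical to an earlier row
    return report, n_duplicates


# Made-up example: one missing value and one repeated row
sample = pd.DataFrame({"a": [1, 2, 2, np.nan], "b": ["x", "y", "y", "z"]})
report, n_duplicates = data_quality_report(sample)
print(report)
print("duplicate rows:", n_duplicates)
```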

Data Visualization Process

• Define the problem: Identify the business problem or question to be answered.
• Collect and clean data: Gather and prepare the data for analysis.
• Analyze data: Apply statistical and analytical techniques to extract insights.
• Design visualization: Create a visualization that effectively communicates the insights.
• Refine and iterate: Improve the visualization based on feedback, repeating earlier steps as needed.

SELECTION OF DATA: Optimizing Delivery Times in E-commerce

from google.colab import files

uploaded = files.upload()

Import Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

Loading the dataset

df = pd.read_csv('customer_analytics.csv')

df.head()

Data Preprocessing

df.shape

o/p: (10999, 12)

df.dtypes

Dropping column ID because it is an index column

df.drop(['ID'], axis=1, inplace=True)


# Checking for null/missing values
df.isnull().sum()

# Checking for duplicate rows
df.duplicated().sum()

Descriptive Statistics

df.describe()

df.head()
Exploratory Data Analysis

In the exploratory data analysis, I will look at the relationship between the target variable, Reached.on.Time_Y.N, and the other variables. I will also look at the distribution of each variable across the dataset in order to understand the data better.

Customer Gender Distribution

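gender_counts = df['Gender'].value_counts()
# value_counts() sorts by frequency, so use its index as the labels
# to keep each label attached to the correct slice
plt.pie(gender_counts, labels=gender_counts.index, autopct='%1.1f%%', startangle=90)
plt.title('Gender Distribution')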

The dataset has a nearly equal number of male and female customers, at 49.6% and 50.4% respectively.
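Because value_counts(normalize=True) returns proportions directly, percentages like these can be checked without reading them off the pie chart. The toy Series below is made up to mirror a 50.4/49.6 split, and the helper name category_shares is hypothetical; with the real data, pass df['Gender'] instead.

```python
import pandas as pd


def category_shares(series):
    """Percentage of rows in each category, rounded to one decimal place."""
    # normalize=True gives fractions that sum to 1; multiply by 100 for percent
    return series.value_counts(normalize=True).mul(100).round(1)


# Hypothetical stand-in for df['Gender'] with a 63/62 split
gender = pd.Series(["F"] * 63 + ["M"] * 62)
shares = category_shares(gender)
print(shares)
```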

Product Properties

df.replace([np.inf, -np.inf], np.nan, inplace=True)

fig, ax = plt.subplots(1, 3, figsize=(15, 5))
sns.histplot(df['Weight_in_gms'], ax=ax[0], kde=True).set_title('Weight Distribution')
sns.countplot(x='Product_importance', data=df, ax=ax[1]).set_title('Product Importance')
sns.histplot(df['Cost_of_the_Product'], ax=ax[2], kde=True).set_title('Cost of the Product')

These three graphs show the distribution of the product properties (weight, importance, and cost) in the dataset. First, the weight distribution shows that products weighing 1000-2000 grams and 4000-6000 grams are the most numerous, which means the company sells more products in these weight ranges. The second graph shows product importance, where most products have low or medium importance. The third graph shows the cost distribution of the products, with concentrations between 150-200 and 225-275 dollars. From this, I conclude that most products are lighter than 6000 grams, have low or medium importance, and cost between 150 and 275 dollars.
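The weight ranges above are read off a histogram; a rough way to confirm them numerically is to count values in fixed-width bins with pd.cut. The helper name binned_counts, the 1000-gram bin width, and the toy weights are assumptions for illustration. With the real data, something like binned_counts(df['Weight_in_gms'], 1000, 8000) would apply, with upper set above the largest weight, since a value equal to upper falls outside the last bin.

```python
import numpy as np
import pandas as pd


def binned_counts(series, width, upper):
    """Count values in equal-width bins [0, width), [width, 2*width), ... up to upper."""
    edges = np.arange(0, upper + width, width)
    # right=False makes each bin include its lower edge and exclude its upper edge
    return pd.cut(series, bins=edges, right=False).value_counts().sort_index()


# Made-up weights in grams
weights = pd.Series([1200, 1500, 1800, 2500, 4100, 4500, 5200, 5900, 3000])
counts = binned_counts(weights, width=1000, upper=7000)
print(counts)
```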
Logistics

fig, ax = plt.subplots(1, 3, figsize=(15, 5))
sns.countplot(x='Warehouse_block', data=df, ax=ax[0]).set_title('Warehouse Block')
sns.countplot(x='Mode_of_Shipment', data=df, ax=ax[1]).set_title('Mode of Shipment')
sns.countplot(x='Reached.on.Time_Y.N', data=df, ax=ax[2]).set_title('Reached on Time')

The above graphs visualize the logistics and delivery of the products. The first graph shows that Warehouse F handles the most products (about 3500), while the remaining warehouses handle nearly equal numbers of products. The second graph shows the mode of shipment: most products are shipped by ship, while nearly 2000 products are shipped by flight and by road. The third graph shows timely delivery, where the number of products delivered on time is greater than the number of products not delivered on time.
From these graphs, I hypothesize that Warehouse F is close to a seaport, because it handles the most products and most products are shipped by ship.
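The seaport hypothesis combines two separate counts. A cross-tabulation normalised by row would show whether Warehouse F actually ships a larger share of its own products by sea than the other warehouses do; if its share is similar, its high volume alone would explain why it ships the most by sea. This is a sketch on a made-up toy frame, and the helper name shipment_mix is hypothetical; with the real data, call shipment_mix(df).

```python
import pandas as pd


def shipment_mix(df, row="Warehouse_block", col="Mode_of_Shipment"):
    """Share of each shipment mode within each warehouse; every row sums to 1."""
    # normalize="index" divides each row of the count table by its row total
    return pd.crosstab(df[row], df[col], normalize="index")


# Made-up toy data
toy = pd.DataFrame({
    "Warehouse_block": ["A", "A", "F", "F", "F", "F"],
    "Mode_of_Shipment": ["Ship", "Road", "Ship", "Ship", "Ship", "Flight"],
})
print(shipment_mix(toy).round(2))
```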

Customer Experience

fig, ax = plt.subplots(2, 2, figsize=(15, 10))
sns.countplot(x='Customer_care_calls', data=df, ax=ax[0, 0]).set_title('Customer Care Calls')
sns.countplot(x='Customer_rating', data=df, ax=ax[0, 1]).set_title('Customer Rating')
sns.countplot(x='Prior_purchases', data=df, ax=ax[1, 0]).set_title('Prior Purchases')
sns.histplot(x='Discount_offered', data=df, ax=ax[1, 1], kde=True).set_title('Discount Offered')



The above graphs visualize the customer experience in terms of customer care calls, ratings, prior purchases, and discounts offered. The first graph shows the number of customer care calls made by customers: most customers made 3-4 calls, which could indicate that customers are facing problems with product delivery. In the second graph, the counts are roughly equal across all ratings, with slightly more ratings of 1, which suggests that some customers are not satisfied with the service.

The third graph is about customers' prior purchases: most customers have made 2-3 prior purchases, which suggests that customers with prior purchases are satisfied with the service and keep buying more products. The fourth graph is about the discount offered on the products: most products have a 0-10% discount, which means that the company is not offering large discounts on its products.

Customer Gender and Product Delivery

sns.countplot(x='Gender', data=df, hue='Reached.on.Time_Y.N').set_title('Gender vs Reached on Time')

The number of products delivered on time is about the same for both genders, which suggests that there is no relationship between customer gender and product delivery.
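Raw counts can hide differences when groups differ in size, so comparing the share of each target value within each group is a more direct check. The helper below is a hypothetical sketch that reports the mean of the 0/1 target (that is, the share of rows coded 1) and the group size. Which code means "on time" is an assumption to confirm from the dataset's documentation before interpreting the rates.

```python
import pandas as pd


def target_rate_by(df, group_col, target="Reached.on.Time_Y.N"):
    """Per-group share of rows where the 0/1 target equals 1, with group sizes."""
    grouped = df.groupby(group_col)[target]
    # The mean of a 0/1 column is the fraction of rows coded 1
    return pd.DataFrame({"rate_coded_1": grouped.mean(), "n": grouped.size()})


# Made-up toy data; with the real data call target_rate_by(df, "Gender")
toy = pd.DataFrame({
    "Gender": ["F", "F", "F", "F", "M", "M"],
    "Reached.on.Time_Y.N": [1, 0, 1, 0, 1, 0],
})
print(target_rate_by(toy, "Gender"))
```

Here both toy groups have the same rate (0.5) even though group F has twice as many rows, which is exactly the case where raw counts would look different.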

Customer Experience and Product Delivery

fig, ax = plt.subplots(2, 2, figsize=(15, 10))
sns.countplot(x='Customer_care_calls', data=df, ax=ax[0, 0], hue='Reached.on.Time_Y.N').set_title('Customer Care Calls')
sns.countplot(x='Customer_rating', data=df, ax=ax[0, 1], hue='Reached.on.Time_Y.N').set_title('Customer Rating')
sns.countplot(x='Prior_purchases', data=df, ax=ax[1, 0], hue='Reached.on.Time_Y.N').set_title('Prior Purchases')
sns.violinplot(x='Reached.on.Time_Y.N', y='Discount_offered', data=df, ax=ax[1, 1]).set_title('Discount Offered')

It is important to understand the customer experience and how customers respond to the services provided by the e-commerce company. The above graphs explain the relationship between customer experience and product delivery. The first graph is about customer care calls and product delivery, where we see that the difference between timely and late deliveries decreases as the number of customer calls increases. This suggests that when a delivery is delayed, the customer gets anxious about the product and calls customer care. The second graph is about the customer rating and product delivery, where we can see that customers who rating have higher count of products delivered on time.

The third graph is about customers' prior purchases, which also shows that customers who have made more prior purchases have a higher count of products delivered on time; this may be why they keep purchasing from the company. The fourth graph is about the discount offered on the product and product delivery, where we can see that products with a 0-10% discount have a higher count of late deliveries, whereas products with a discount of more than 10% have a higher count of on-time deliveries.
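The discount observation can be quantified by cutting Discount_offered into bands and computing, within each band, the share of rows coded 1 in the 0/1 target. The helper name rate_by_band, the band edges, and the toy data are assumptions; which target code means "on time" should be confirmed from the dataset's documentation. With the real data, something like rate_by_band(df, 'Discount_offered', [0, 10, df['Discount_offered'].max()]) would compare the 0-10% band against the rest.

```python
import pandas as pd


def rate_by_band(df, value_col, edges, target="Reached.on.Time_Y.N"):
    """Share of rows coded 1 in a 0/1 target, and row count, per band of a numeric column."""
    # include_lowest=True puts a value equal to the first edge (e.g. a 0% discount) in the first band
    bands = pd.cut(df[value_col], bins=edges, include_lowest=True)
    grouped = df.groupby(bands, observed=False)[target]
    return pd.DataFrame({"rate_coded_1": grouped.mean(), "n": grouped.size()})


# Made-up toy data
toy = pd.DataFrame({
    "Discount_offered": [0, 5, 10, 11, 30, 60],
    "Reached.on.Time_Y.N": [1, 1, 0, 0, 0, 1],
})
print(rate_by_band(toy, "Discount_offered", [0, 10, 65]))
```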

Correlation Matrix Heatmap

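plt.figure(figsize=(10, 10))
# numeric_only=True skips the text columns (e.g. Gender, Warehouse_block),
# which pandas 2.x otherwise refuses to correlate
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')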


In the correlation matrix heatmap, we can see that there is positive correlation
between cost of product and number of customer care calls.
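df.corr() computes the Pearson coefficient by default, which measures linear association. Since Customer_care_calls takes only a few integer values, a rank-based Spearman coefficient is a reasonable cross-check. The sketch below computes Spearman as the Pearson correlation of the ranks (pandas' rank() gives tied values their average rank), keeping the example dependent on pandas alone. The helper name correlations and the toy data are hypothetical; with the real data, call correlations(df, 'Cost_of_the_Product', 'Customer_care_calls').

```python
import pandas as pd


def correlations(df, a, b):
    """Pearson (linear) and Spearman (rank-based) correlation between two columns."""
    pearson = df[a].corr(df[b])  # Series.corr defaults to Pearson
    # Spearman's rho is the Pearson correlation of the ranks;
    # rank() assigns tied values their average rank by default
    spearman = df[a].rank().corr(df[b].rank())
    return {"pearson": pearson, "spearman": spearman}


# Made-up data with a monotonic but non-linear relationship
toy = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [1, 4, 9, 16, 100]})
print(correlations(toy, "a", "b"))
```

On this toy data Spearman is 1 (the order is perfectly preserved) while Pearson is noticeably lower, which is the kind of gap worth checking for before reading too much into a single coefficient.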

sns.violinplot(x = 'Customer_care_calls', y = 'Cost_of_the_Product', data = df)

This suggests that customers are more concerned about delivery when the product is expensive, which may be why they call customer care to check the status of their order. It is therefore important to make sure that high-cost products are delivered on time.
Conclusion

From the exploratory data analysis (EDA), product weight and cost appear to affect delivery time. Specifically, products weighing between 2500 and 3500 grams and costing less than 250 dollars had a higher likelihood of being delivered on time. Additionally, Warehouse F handled the largest share of products and most products were shipped via ship, suggesting that this warehouse might be located near a seaport, contributing to more efficient deliveries.

Customer behavior also plays a crucial role in predicting delivery timeliness. The analysis revealed that the more frequently customers call, the higher the
chances of delayed delivery. Interestingly, customers with more prior
purchases tended to experience more timely deliveries, possibly indicating a
higher level of trust in the company, which encourages repeat purchases.
Another observation is that products with a 0-10% discount had a higher rate
of late deliveries, while those with discounts of more than 10% were more often
delivered on time.

Submitted by: HARSHAPRADHA K (24CESG010)
