Data Visualization Using Plotly, Matplotlib, Seaborn and Squarify - Data Science

This document discusses data visualization techniques using Python libraries like Plotly, Matplotlib, Seaborn and Squarify. It provides an overview of various chart types like line charts, bar charts, histograms, boxplots, pie charts and scatter plots that can be created using these libraries. It also includes a case study on analyzing employee attrition rate from an HR dataset using these visualization techniques.

Data Visualization using plotly, matplotlib, seaborn and squarify | Data Science

Data visualization is one of the important activities we perform when doing Exploratory
Data Analysis. It helps with important tasks such as preparing business reports, building
visual dashboards, and storytelling. In this post I explain how to ask questions of the data
and get self-explanatory graphs in return. You will learn to use various Python libraries,
such as plotly, matplotlib, seaborn, and squarify, to plot those graphs.

Key takeaways from this post are:

• Asking questions of the data set
• Univariate Analysis
• Bivariate Analysis
• Analysis of more than 3 variables
• 3D Visualization
• Case Study on employee Attrition Rate using HR Data Set
plotly
• Visualization library for the data Era
Line Chart in plotly
• 2 numeric variables with a one-to-one mapping, i.e., situations where each x value has
exactly 1 corresponding y value

You can export images to html file only with offline mode

• https://plot.ly/python/static-image-export/

• https://plot.ly/python/privacy/

Note that this is a bare chart with no information; later in the activity we will add a title,
x labels and y labels.
Basic Bar chart in plotly
• 1 Categorical variable

Histogram in plotly
• 1 numeric variable
Boxplot in plotly
• 1 Numeric variable
Pie chart in plotly

• 1 Categorical variable
Note: We do not recommend pie charts. First, the total is not always obvious; second,
having many levels makes the chart cluttered.
Scatter plot in plotly

• 2 numeric variables
• One x might have multiple corresponding y values
Tree map
https://plot.ly/python/treemaps/
Case Study
Now let us use our newfound skills to extract insights from a dataset.
hr_data Description
• Education: 1 ‘Below College’, 2 ‘College’, 3 ‘Bachelor’, 4 ‘Master’, 5 ‘Doctor’
• EnvironmentSatisfaction: 1 ‘Low’, 2 ‘Medium’, 3 ‘High’, 4 ‘Very High’
• JobInvolvement: 1 ‘Low’, 2 ‘Medium’, 3 ‘High’, 4 ‘Very High’
• JobSatisfaction: 1 ‘Low’, 2 ‘Medium’, 3 ‘High’, 4 ‘Very High’
• PerformanceRating: 1 ‘Low’, 2 ‘Good’, 3 ‘Excellent’, 4 ‘Outstanding’
• RelationshipSatisfaction: 1 ‘Low’, 2 ‘Medium’, 3 ‘High’, 4 ‘Very High’
• WorkLifeBalance: 1 ‘Bad’, 2 ‘Good’, 3 ‘Better’, 4 ‘Best’

Checking the datatypes
Checking the number of unique values in each column
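Both checks are one-liners in pandas. The sketch below runs them on a tiny stand-in for hr_data; the rows and most column names are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the real hr_data set
hr_data = pd.DataFrame({
    "Age": [41, 49, 37, 33],
    "Attrition": ["Yes", "No", "Yes", "No"],
    "Education": [2, 1, 2, 4],
    "MonthlyIncome": [5993, 5130, 2090, 2909],
})

# Data type of every column
print(hr_data.dtypes)

# Number of unique values in every column
unique_counts = hr_data.nunique()
print(unique_counts)
```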
Observations:

Most columns have fewer than 4 unique levels

NumCompaniesWorked and PercentSalaryHike have fewer than 15 unique values, so we can
convert them into categorical values for analysis purposes.
This is fairly subjective; you can also continue with these as integer values.

Replacing the integer codes with the labels given in the description above
• hr_data.Education = hr_data.Education.replace(to_replace=[1,2,3,4,5], value=['Below College', 'College', 'Bachelor', 'Master', 'Doctor'])
• hr_data.EnvironmentSatisfaction = hr_data.EnvironmentSatisfaction.replace(to_replace=[1,2,3,4], value=['Low', 'Medium', 'High', 'Very High'])
• hr_data.JobInvolvement = hr_data.JobInvolvement.replace(to_replace=[1,2,3,4], value=['Low', 'Medium', 'High', 'Very High'])
• hr_data.JobSatisfaction = hr_data.JobSatisfaction.replace(to_replace=[1,2,3,4], value=['Low', 'Medium', 'High', 'Very High'])
• hr_data.PerformanceRating = hr_data.PerformanceRating.replace(to_replace=[1,2,3,4], value=['Low', 'Good', 'Excellent', 'Outstanding'])
• hr_data.RelationshipSatisfaction = hr_data.RelationshipSatisfaction.replace(to_replace=[1,2,3,4], value=['Low', 'Medium', 'High', 'Very High'])
• hr_data.WorkLifeBalance = hr_data.WorkLifeBalance.replace(to_replace=[1,2,3,4], value=['Bad', 'Good', 'Better', 'Best'])
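The same replacement can also be driven by a dictionary of mappings, which avoids repeating the call for every column. This is an alternative sketch, not the post's own code; the two-column data frame is a made-up stand-in for hr_data.

```python
import pandas as pd

# Hypothetical stand-in for hr_data with two of the coded columns
hr_data = pd.DataFrame({
    "Education": [2, 1, 4],
    "WorkLifeBalance": [1, 3, 4],
})

# Integer code -> label, taken from the data description
label_maps = {
    "Education": {1: "Below College", 2: "College", 3: "Bachelor",
                  4: "Master", 5: "Doctor"},
    "WorkLifeBalance": {1: "Bad", 2: "Good", 3: "Better", 4: "Best"},
}

# Replace the codes column by column
for col, mapping in label_maps.items():
    hr_data[col] = hr_data[col].replace(mapping)

print(hr_data)
```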
Extract categorical columns
For the purpose of this analysis, we treat all columns with 15 or fewer levels as
categorical columns. The following few lines of code extract all the columns that satisfy
this condition.
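A minimal sketch of that extraction, run on an invented 20-row stand-in for hr_data (the threshold of 15 comes from the text; everything else is illustrative):

```python
import pandas as pd

# Hypothetical stand-in for hr_data with 20 rows
hr_data = pd.DataFrame({
    "EmployeeNumber": list(range(1, 21)),                  # 20 unique values
    "Gender": ["Male", "Female"] * 10,                     # 2 unique values
    "JobSatisfaction": [1, 2, 3, 4] * 5,                   # 4 unique values
    "MonthlyIncome": [2000 + 137 * i for i in range(20)],  # 20 unique values
})

# Keep every column with 15 or fewer unique levels
cat_cols = [col for col in hr_data.columns if hr_data[col].nunique() <= 15]

# Print the categorical column names
print(cat_cols)
```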
Print the categorical column names

Check if the above columns are categorical in the data set

Type Conversion
• n dimensional type conversion to ‘category’ is not implemented yet
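Converting one column at a time sidesteps that limitation. The sketch below assumes `cat_cols` holds the names of the columns chosen as categorical; the data is made up.

```python
import pandas as pd

# Hypothetical stand-in for hr_data
hr_data = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "JobSatisfaction": ["Low", "High", "Medium"],
    "MonthlyIncome": [5993, 5130, 2090],
})
cat_cols = ["Gender", "JobSatisfaction"]  # assumed list of categorical columns

# Check the types before conversion
print(hr_data[cat_cols].dtypes)

# Convert each column separately to the 'category' type
for col in cat_cols:
    hr_data[col] = hr_data[col].astype("category")

# Check the types after conversion
print(hr_data[cat_cols].dtypes)
```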
Categorical attributes summary

Extracting Numeric Columns
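Both steps can be sketched with standard pandas calls: `describe` restricted to category columns for the summary, and `select_dtypes` for the numeric columns. The data frame is again a small invented stand-in.

```python
import pandas as pd

# Hypothetical stand-in for hr_data after type conversion
hr_data = pd.DataFrame({
    "Age": [41, 49, 37],
    "Gender": pd.Categorical(["Male", "Female", "Male"]),
    "MonthlyIncome": [5993, 5130, 2090],
})

# Summary of categorical attributes: count, unique, top, freq
cat_summary = hr_data.describe(include=["category"])
print(cat_summary)

# Names of the numeric columns
num_cols = hr_data.select_dtypes(include="number").columns.tolist()
print(num_cols)
```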

Exploratory Data Analysis
Univariate Analysis
1. What is the attrition rate in the company?

Attrition in numbers (pandas)

This is one way to tell matplotlib to plot the graphs in the notebook
Attrition rate in percentage (pandas)
Attrition rate in percentage (plotly)
2. What is the Gender Distribution in the company?
Steps to create a bar chart with counts for a categorical variable in plotly

• Steps to create a bar chart with counts for a categorical variable

o create an object and store the counts (optional)
o create a bar object
▪ pass the x values
▪ pass the y values
▪ optional :
▪ text to be displayed
▪ text position
▪ color of the bar
▪ name of the bar (trace in plotly terminology)
o create a layout object
▪ title – font and size of title
▪ x axis – font and size of xaxis text
▪ y axis – font and size of yaxis text
o create a figure object:
▪ add data
▪ add layout
o plot the figure object
Observations:
Irrespective of the distance bin, there is a global pattern, i.e., every bin
has more male employees.
One way to check whether you have chosen the correct number of clusters
is to see whether you can give every cluster a name that makes sense in business terms.

This is all for now. I have also created a report on Employee Attrition Rate
Analysis. You may like to check it as well. Please read it using the link below.

Report on Employee Attrition Rate Analysis

Thank you for reading. Your comments and thoughts on this post are most
welcome.
