Data Visualization Using Plotly, Matplotlib, Seaborn and Squarify - Data Science
In offline mode, plotly can export charts only to HTML files. For static image export and privacy settings, see:
• https://plot.ly/python/static-image-export/
• https://plot.ly/python/privacy/
Note that this is a bare chart with no labelling. Later in the activity we will add a title,
x labels and y labels.
Basic bar chart in plotly
• 1 categorical variable
Histogram in plotly
• 1 numeric variable
Boxplot in plotly
• 1 numeric variable
Pie chart in plotly
• 1 categorical variable
Note: We do not recommend pie charts. First, the total is not always obvious from the
chart; second, a variable with many levels makes the chart cluttered.
Scatter plot in plotly
• 2 numeric variables
• A single x value may have several corresponding y values
Tree map
https://plot.ly/python/treemaps/
Case Study
Now let us use our newfound skills to extract insights from a dataset.
hr_data Description
• Education: 1 ‘Below College’, 2 ‘College’, 3 ‘Bachelor’, 4 ‘Master’, 5 ‘Doctor’
• EnvironmentSatisfaction: 1 ‘Low’, 2 ‘Medium’, 3 ‘High’, 4 ‘Very High’
NumCompaniesWorked and PercentSalaryHike have fewer than 15 distinct values, so we can
convert them into categorical variables for analysis purposes. This choice is fairly
subjective; you can also keep them as integer values.
Replace the integer codes with the corresponding labels from the description:
• hr_data.Education = hr_data.Education.replace(to_replace=[1, 2, 3, 4, 5],
  value=['Below College', 'College', 'Bachelor', 'Master', 'Doctor'])
• hr_data.EnvironmentSatisfaction = hr_data.EnvironmentSatisfaction.replace(
  to_replace=[1, 2, 3, 4], value=['Low', 'Medium', 'High', 'Very High'])
• hr_data.JobInvolvement = hr_data.JobInvolvement.replace(
  to_replace=[1, 2, 3, 4], value=['Low', 'Medium', 'High', 'Very High'])
• hr_data.JobSatisfaction = hr_data.JobSatisfaction.replace(
  to_replace=[1, 2, 3, 4], value=['Low', 'Medium', 'High', 'Very High'])
• hr_data.PerformanceRating = hr_data.PerformanceRating.replace(
  to_replace=[1, 2, 3, 4], value=['Low', 'Good', 'Excellent', 'Outstanding'])
• hr_data.RelationshipSatisfaction = hr_data.RelationshipSatisfaction.replace(
  to_replace=[1, 2, 3, 4], value=['Low', 'Medium', 'High', 'Very High'])
• hr_data.WorkLifeBalance = hr_data.WorkLifeBalance.replace(
  to_replace=[1, 2, 3, 4], value=['Bad', 'Good', 'Better', 'Best'])
Extract categorical columns
For the purpose of this analysis, we treat every column with 15 or fewer levels as
categorical. The following few lines of code extract all the columns that satisfy this
condition.
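A minimal sketch of this extraction, assuming a small stand-in DataFrame named `hr_sample` in place of `hr_data`. The `MonthlyIncome` column and all values are invented so that one column exceeds the threshold.

```python
import pandas as pd

# Hypothetical stand-in for hr_data, made up for illustration.
hr_sample = pd.DataFrame({
    "Education": [1, 2, 3, 4, 5] * 4,                # 5 distinct values
    "Gender": ["Male", "Female"] * 10,               # 2 distinct values
    "MonthlyIncome": list(range(2000, 4000, 100)),   # 20 distinct values
})

# Threshold taken from the text: 15 or fewer levels counts as categorical.
max_levels = 15

# Keep the columns whose number of distinct values is within the threshold.
cat_cols = [
    col for col in hr_sample.columns
    if hr_sample[col].nunique() <= max_levels
]
```

Here `cat_cols` keeps `Education` and `Gender` and leaves out `MonthlyIncome`.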
Print the categorical column names
Type Conversion
• n-dimensional type conversion to ‘category’ (several columns at once) is not implemented yet
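One way around the limitation noted above is to convert the columns one at a time. The sketch below assumes a small stand-in DataFrame `hr_sample` and a hypothetical list `cat_cols`; the values are invented.

```python
import pandas as pd

# Hypothetical stand-in for hr_data, made up for illustration.
hr_sample = pd.DataFrame({
    "Education": ["Bachelor", "Master", "Bachelor", "College"],
    "Gender": ["Male", "Female", "Male", "Male"],
    "Age": [31, 45, 28, 39],
})

# Columns assumed to have been identified as categorical.
cat_cols = ["Education", "Gender"]

# Convert each column separately instead of the whole selection at once.
for col in cat_cols:
    hr_sample[col] = hr_sample[col].astype("category")
```

Columns not listed in `cat_cols`, such as `Age` here, keep their original type.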
Categorical attributes summary
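A summary of categorical columns can be sketched with pandas `describe()`, which reports the count, the number of distinct levels, the most frequent level and its frequency. The DataFrame `hr_sample` below is an invented stand-in for the real data.

```python
import pandas as pd

# Hypothetical stand-in for the categorical part of hr_data.
hr_sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male"],
    "Education": ["Bachelor", "Master", "Bachelor", "College"],
})
for col in hr_sample.columns:
    hr_sample[col] = hr_sample[col].astype("category")

# For categorical columns, describe() returns the rows
# count, unique, top and freq.
summary = hr_sample.describe()
```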
This is one way to tell matplotlib to plot the graphs in the notebook
Attrition rate in percentage (pandas)
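In pandas, the rate can be sketched with `value_counts(normalize=True)`, which returns shares that are then scaled to percentages. The `Attrition` values below ('Yes'/'No') are assumed sample data.

```python
import pandas as pd

# Hypothetical Attrition column: 8 employees stayed, 2 left.
attrition = pd.Series(["No"] * 8 + ["Yes"] * 2, name="Attrition")

# normalize=True gives proportions; multiply by 100 for percentages.
attrition_pct = attrition.value_counts(normalize=True) * 100
```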
Attrition rate in percentage (plotly)
2. What is the Gender Distribution in the company?
Steps to create a bar chart with counts for a categorical variable in plotly
That is all for now. I have also created a report on Employee Attrition Rate
Analysis, which you may like to check out as well. Please read it using the link below.
Thank you for reading. Your comments and thoughts on this post are most
welcome.