
edgetier

COMPLEX DECISIONS SIMPLIFIED

Data Visualisation in Python


Quick and easy routes to plotting magic

Shane Lynn Ph.D.


@shane_a_lynn

www.edgetier.com | @TeamEdgeTier


Outline

• Data Visualisation Basics
• Basic Python Setup & Core Libraries
• Code examples and comparisons
• What to avoid
EdgeTier

EdgeTier specialise in data and artificial intelligence products for customer contact centres.

• Commercially focused SaaS to increase revenue and reduce costs
• Focus on data science, machine learning, and automation
• AI system works alongside customer service agents to increase efficiency by 100%
Data Visualisation

Data visualisation is a general term that describes any effort to help people understand the significance of data by placing it in a visual context.
Data Visualisation

The choice of data visualisation tool is important. Look for:

• Iteration speed
• Unobtrusiveness
• Flexibility
• Aesthetically pleasing output
Chart Choice

[Figure: chart-selection guide. Source: www.extremepresentation.com]
Chart Choice – Fearsome Foursome

• BARPLOT: Represents the value of entities using bars of various length.
• HISTOGRAM: An accurate graphical representation of the distribution of numeric data.
• SCATTER PLOT: Shows the relationship between 2 numeric variables.
• LINE CHART: Shows the evolution of numeric variables.

Icons: www.data-to-viz.com
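To make the foursome concrete, here is a minimal Matplotlib sketch drawing one of each on a 2x2 grid. The data is randomly generated and purely illustrative; it is not the chat dataset used later.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up data, only to illustrate each chart type
rng = np.random.default_rng(0)
categories = ["A", "B", "C"]
counts = [12, 30, 21]
samples = rng.normal(loc=10, scale=2, size=500)
x = rng.uniform(0, 10, size=50)
y = 2 * x + rng.normal(size=50)
days = np.arange(30)
trend = np.cumsum(rng.normal(size=30))

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

axes[0, 0].bar(categories, counts)      # bar plot: one value per entity
axes[0, 0].set_title("Bar plot")

axes[0, 1].hist(samples, bins=20)       # histogram: distribution of one numeric variable
axes[0, 1].set_title("Histogram")

axes[1, 0].scatter(x, y)                # scatter: relationship between 2 numeric variables
axes[1, 0].set_title("Scatter plot")

axes[1, 1].plot(days, trend)            # line chart: evolution along an ordered axis
axes[1, 1].set_title("Line chart")

fig.tight_layout()
```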
Chart Choice – Fearsome Foursome

Special Mentions

• BOXPLOT: Summarizes the distribution of numeric variables.
• SANKEY DIAGRAM: Shows flows with smooth links.
• CHOROPLETH MAP: Displays an aggregated value for each region of a map.

Icons: www.data-to-viz.com
Data Visualisation in Python

[Diagram: Interactive environment, Data Manipulation, Python Visualisation Library]

- Lots of choice of libraries
- Many tools, with varied APIs & outputs
- Best to master one or two and become familiar with them
Matplotlib

Granddaddy of Python plotting

Low-level plotting library with a MATLAB-like API

+ Very flexible, complete control
- Verbose plots, aesthetically lacking, sometimes difficult with Pandas

...you still need to know enough of it to debug…
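A short sketch of what "MATLAB-like" means in practice: `pyplot` can track a current figure and axes for you, while the object-oriented style works on explicit `Figure` and `Axes` objects. The data here is arbitrary.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 15, 25]

# MATLAB-like, state-based style: pyplot keeps a "current" figure and axes
plt.figure()
plt.plot(x, y)
plt.title("State-based (MATLAB-like) interface")
state_fig = plt.gcf()

# Object-oriented style: explicit Figure and Axes, full control over each element
fig, ax = plt.subplots()
ax.plot(x, y, color="tab:red", linewidth=2)
ax.set_title("Object-oriented interface")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

Pandas and Seaborn plotting functions hand back Matplotlib `Axes` objects, so the object-oriented calls are the ones you end up using when debugging or tweaking their output.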


Pandas / Seaborn / Altair

Higher-level plotting

Pandas – Visualisation API built into DataFrame & Series objects; an interface to Matplotlib.
Seaborn – Extends Matplotlib and provides a high-level API on top of it, with improved styling.
Altair – Built on the “Vega-Lite” visualisation grammar. Allows some interactive plots in Jupyter Notebooks.
Basic Notebook Setup

Imports go at the top of the notebook, along with the Matplotlib display mode: inline vs notebook style.

The theme can also be chosen here.
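A possible top-of-notebook cell, as a sketch. The `%matplotlib` lines are IPython magics, so they are shown as comments; the `"ggplot"` theme is just one of Matplotlib's built-in styles, chosen here as an example.

```python
# In Jupyter, choose how figures are displayed (IPython magics, not plain Python):
#   %matplotlib inline     -> static images embedded in the notebook
#   %matplotlib notebook   -> interactive figures (classic Jupyter Notebook)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Pick a theme once here; it applies to every plot that follows
plt.style.use("ggplot")
```

`plt.style.available` lists the other built-in themes, and `plt.style.use("default")` switches back to Matplotlib's defaults.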
Sample Data

An EdgeTier-relevant sample dataset on chat system performance: agents answering customer chats from different websites and languages – 5477 chats over 100 agents.
The Bar Plot
The Bar Plot - Matplotlib

Bar plot of chats per user

Python visualisation libraries often require the data to be pre-formatted for plotting.

For Pandas and Matplotlib, the visualisation library often only presents the values; it does not do the calculations.
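A minimal sketch of that pre-formatting step, assuming one row per chat. The column names `agent` and `website` and the tiny table are hypothetical stand-ins for the sample dataset.

```python
import pandas as pd

# Hypothetical raw data: one row per chat (column names are assumptions)
chats = pd.DataFrame({
    "agent": ["ann", "bob", "ann", "cat", "bob", "ann"],
    "website": ["site1", "site2", "site1", "site1", "site2", "site2"],
})

# Matplotlib/Pandas plot the values you give them, they don't count rows,
# so aggregate first: chats per agent, sorted from most to fewest
chats_per_agent = chats["agent"].value_counts()
```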
The Bar Plot - Matplotlib

The .bar() function does the work, but the ‘x’ labels and positions must be set manually.
Most of the code here is formatting and display.
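A sketch of the Matplotlib version, assuming the counts have already been aggregated (the small Series here is made up). Note how the bar positions and tick labels are handled by hand.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical, already-aggregated counts (e.g. the output of value_counts())
chats_per_agent = pd.Series({"ann": 3, "bob": 2, "cat": 1})

fig, ax = plt.subplots(figsize=(6, 4))

# We choose the x positions ourselves...
positions = np.arange(len(chats_per_agent))
ax.bar(positions, chats_per_agent.values, width=0.6)

# ...and attach the labels to those positions manually
ax.set_xticks(positions)
ax.set_xticklabels(chats_per_agent.index, rotation=45, ha="right")

# The rest is formatting and display
ax.set_xlabel("Agent")
ax.set_ylabel("Number of chats")
ax.set_title("Chats per agent")
fig.tight_layout()
```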
The Bar Plot - Pandas

Plot output is Matplotlib, so it can be manipulated in the same way.
Slightly simpler API / data access.
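The same chart through the Pandas plotting API, as a sketch on hypothetical data. Positions and labels come from the Series index, and the returned object is an ordinary Matplotlib `Axes`.

```python
import pandas as pd

# Hypothetical raw data: one row per chat
chats = pd.DataFrame({"agent": ["ann", "bob", "ann", "cat", "bob", "ann"]})

# Aggregate, then plot straight from the Series; the index becomes the x labels
ax = chats["agent"].value_counts().plot.bar(figsize=(6, 4), rot=45)

# Same Matplotlib formatting calls as before
ax.set_ylabel("Number of chats")
ax.set_title("Chats per agent")
```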
The Bar Plot - Seaborn

Simpler data access again.
Same Matplotlib formatting functions.
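A Seaborn sketch on hypothetical data: `sns.countplot` counts the raw rows itself, so no `value_counts()` is needed for the bar heights. Sorting still has to be requested explicitly through `order`.

```python
import pandas as pd
import seaborn as sns

# Hypothetical raw data: one row per chat
chats = pd.DataFrame({"agent": ["ann", "bob", "ann", "cat", "bob", "ann"]})

# Bar order from most to fewest chats (Seaborn keeps appearance order otherwise)
order = list(chats["agent"].value_counts().index)

# Seaborn counts rows per agent while plotting
ax = sns.countplot(data=chats, x="agent", order=order)

# The result is a Matplotlib Axes: same formatting functions
ax.set_ylabel("Number of chats")
ax.set_title("Chats per agent")
```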
The Bar Plot - Altair

Not Matplotlib-based: very different syntax and formatting.
Ordering was difficult here.
Only one command for everything, with a JSON format behind it.
The Bar Plot
Prettier Pandas Plots

Once set, Seaborn styles are applied to all Matplotlib plots, so you can cheat your way to nicer-looking Pandas plots!
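A sketch of the trick with made-up counts. In recent Seaborn versions the styles are switched on with `sns.set_theme()` (older code used `sns.set()`); after that, a plain Pandas plot picks up the Seaborn look.

```python
import pandas as pd
import seaborn as sns

# Apply Seaborn's default theme globally to Matplotlib
sns.set_theme()

# Hypothetical aggregated counts, plotted with an ordinary Pandas call
chats_per_agent = pd.Series({"ann": 3, "bob": 2, "cat": 1})
ax = chats_per_agent.plot.bar(rot=0)
ax.set_ylabel("Number of chats")
```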
More Challenging Bar Plot

For the top 20 agents, what was the split across the top websites?
We want a ‘stacked bar’ for this visualisation.
Stacked Bar - Matplotlib
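One way to build a stacked bar in plain Matplotlib, sketched on a small hypothetical agent-by-website count table: each layer is a separate `ax.bar` call whose `bottom` is the running total of the layers beneath it.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical counts: rows = agents, columns = websites (already aggregated)
counts = pd.DataFrame(
    {"site1": [5, 3, 4], "site2": [2, 4, 1], "site3": [1, 1, 3]},
    index=["ann", "bob", "cat"],
)

fig, ax = plt.subplots(figsize=(6, 4))
positions = np.arange(len(counts))
bottom = np.zeros(len(counts))

for website in counts.columns:
    # Each layer starts where the previous layers ended
    ax.bar(positions, counts[website], bottom=bottom, label=website)
    bottom = bottom + counts[website].to_numpy()

ax.set_xticks(positions)
ax.set_xticklabels(counts.index)
ax.set_ylabel("Number of chats")
ax.legend(title="Website")
```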
Stacked Bar - Pandas

The plotting code is simple, but data manipulation is required.
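A sketch of that manipulation, assuming one row per chat with hypothetical `agent` and `website` columns. The slides ask for the top 20 agents; this toy table only has three, so it keeps the top 2.

```python
import pandas as pd

# Hypothetical raw data: one row per chat
chats = pd.DataFrame({
    "agent": ["ann", "bob", "ann", "cat", "bob", "ann", "ann"],
    "website": ["site1", "site2", "site1", "site1", "site2", "site2", "site3"],
})

# Data manipulation: keep the busiest agents, then count chats per agent x website
top_agents = chats["agent"].value_counts().head(2).index   # head(20) for the top 20
subset = chats[chats["agent"].isin(top_agents)]
table = pd.crosstab(subset["agent"], subset["website"]).loc[top_agents]

# Plotting: a single call once the table has the right shape
ax = table.plot.bar(stacked=True, rot=0)
ax.set_ylabel("Number of chats")
```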
Stacked Bar - Seaborn

Elegant API, simple code structure,
but …
…embarrassingly…
no stacked-bar chart support!
Stacked Bar - Altair

Simple output, short code.
Some issues around data storage and JSON formats, and sorting is difficult.
Seaborn - Estimators

Calculations are done as part of plotting, with no prior data manipulation.

Separation of data and visualisation code.
Separation of data and visualisation code.


Seaborn - Estimators

It is very simple to change the estimator function to calculate different statistics.

Similar functionality is available in Altair.
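Swapping the statistic is one keyword argument. This sketch reuses the same hypothetical chat rows and passes NumPy functions as the `estimator` to get the median and the total per agent.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical raw rows: one per chat, with duration in minutes
chats = pd.DataFrame({
    "agent": ["ann", "ann", "bob", "bob", "bob", "cat"],
    "duration": [4.0, 6.0, 3.0, 4.0, 11.0, 10.0],
})
order = ["ann", "bob", "cat"]

fig, (ax_median, ax_total) = plt.subplots(1, 2, figsize=(10, 4))

# Same call, different estimator: median duration per agent...
sns.barplot(data=chats, x="agent", y="duration", order=order,
            estimator=np.median, ax=ax_median)
ax_median.set_title("Median duration")

# ...or total minutes per agent
sns.barplot(data=chats, x="agent", y="duration", order=order,
            estimator=np.sum, ax=ax_total)
ax_total.set_title("Total duration")
```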


Histograms
Histograms

All libraries are good at univariate distribution visualisations.
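For a single numeric variable, Matplotlib and Pandas are each one call. The sketch below uses randomly generated, hypothetical chat durations.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical chat durations in minutes (randomly generated)
rng = np.random.default_rng(1)
chats = pd.DataFrame({"duration": rng.gamma(shape=2.0, scale=3.0, size=500)})

fig, (ax_mpl, ax_pd) = plt.subplots(1, 2, figsize=(10, 4))

ax_mpl.hist(chats["duration"], bins=30)           # Matplotlib
ax_mpl.set_title("Matplotlib: ax.hist")

chats["duration"].plot.hist(bins=30, ax=ax_pd)    # Pandas
ax_pd.set_title("Pandas: Series.plot.hist")
```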
Histograms

Layering / comparison is, unfortunately, achieved by building up the histograms in place.
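Building the layers in place looks roughly like this: one `ax.hist` call per group on the same axes, with shared bin edges and transparency so the layers can be compared. The two language groups and their durations are made up.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical durations for two languages (randomly generated)
rng = np.random.default_rng(1)
chats = pd.DataFrame({
    "language": ["en"] * 300 + ["de"] * 200,
    "duration": np.concatenate([rng.gamma(2.0, 3.0, 300), rng.gamma(3.0, 3.0, 200)]),
})

fig, ax = plt.subplots()
bins = np.linspace(0, chats["duration"].max(), 31)   # shared edges so layers line up

# One histogram per group, drawn on top of each other
for language, group in chats.groupby("language"):
    ax.hist(group["duration"], bins=bins, alpha=0.5, label=language)

ax.set_xlabel("Chat duration (min)")
ax.legend(title="Language")
```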
Histograms - Seaborn

Some really nice options for adding impressive and informative hints to Seaborn graphs.
Scatter Plots - Pandas

Pandas: Good for quick single-coloured scatter visualisations.
Messy with multiple categories.
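A sketch showing both sides with hypothetical columns `wait_time`, `duration` and `language`: one call for a single-coloured scatter, but a manual loop, shared axes and hand-picked colours once categories are involved.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical per-chat data (columns and values are made up)
rng = np.random.default_rng(2)
n = 60
chats = pd.DataFrame({
    "wait_time": rng.uniform(0, 5, n),
    "duration": rng.uniform(1, 20, n),
    "language": ["en", "de", "fr"] * (n // 3),
})

# Quick single-coloured scatter: one call
ax = chats.plot.scatter(x="wait_time", y="duration")

# Multiple categories: loop over groups, reuse the Axes, choose colours yourself
fig, ax_multi = plt.subplots()
colours = {"de": "tab:blue", "en": "tab:orange", "fr": "tab:green"}
for language, group in chats.groupby("language"):
    group.plot.scatter(x="wait_time", y="duration", ax=ax_multi,
                       color=colours[language], label=language)
ax_multi.legend(title="Language")
```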
Scatter Plots - Seaborn

Seaborn / Altair: Better higher-level representation, and better for multi-category scatters.
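The Seaborn equivalent of the multi-category scatter, sketched on the same hypothetical columns: the category becomes a `hue` (and here also a marker `style`), and the colours and legend are handled for you.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical per-chat data (columns and values are made up)
rng = np.random.default_rng(2)
n = 60
chats = pd.DataFrame({
    "wait_time": rng.uniform(0, 5, n),
    "duration": rng.uniform(1, 20, n),
    "language": ["en", "de", "fr"] * (n // 3),
})

# One call: colour and marker per language, legend included
ax = sns.scatterplot(data=chats, x="wait_time", y="duration",
                     hue="language", style="language")
```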
Scatter Plots - Altair
Line Plots
Line Plots

Plot chats per language over time.
Pandas: Needs data manipulation, simple thereafter.
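A sketch of the Pandas route, assuming a hypothetical chat log with a `start` timestamp and a `language` column: count chats per day and language, pivot the languages into columns, then a single `.plot()` draws one line per language.

```python
import pandas as pd

# Hypothetical chat log: one row per chat
chats = pd.DataFrame({
    "start": pd.to_datetime([
        "2019-01-01 09:00", "2019-01-01 10:30", "2019-01-02 11:00",
        "2019-01-02 12:00", "2019-01-02 13:00", "2019-01-03 09:15",
    ]),
    "language": ["en", "de", "en", "en", "de", "en"],
})

# Manipulation: chats per day and language, languages as columns, gaps as 0
daily = (
    chats.groupby([pd.Grouper(key="start", freq="D"), "language"])
    .size()
    .unstack(fill_value=0)
)

# Simple thereafter: one line per column
ax = daily.plot(marker="o")
ax.set_ylabel("Chats per day")
```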
Line Plots

Seaborn/Altair: Operate directly on raw data.
More Options!

Geospatial Viz
Folium: Generate interactive maps using Leaflet.js
Matplotlib: Basemap plugin (since deprecated in favour of Cartopy)

Interactive Plots
Bokeh: Makes visualisations for web browser interaction.
Plotly: Interactive web visualisations. Older versions ran in the cloud by default; since version 4, rendering is offline by default.
What to Avoid – Angles?

Pie Charts: Radial angle for comparison. Humans are very bad at accurate radial comparisons; we’ve evolved for speedy length / distance comparisons.
https://blog.funnel.io/why-we-dont-use-pie-charts-and-some-tips-on-better-data-visualizations
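A sketch of the usual alternative: the same hypothetical shares drawn as a pie and as a sorted horizontal bar chart. With close values, the ranking is much easier to read from lengths on a common baseline.

```python
import matplotlib.pyplot as plt

# Hypothetical shares of chats per website (%), deliberately close together
shares = {"site1": 23, "site2": 21, "site3": 20, "site4": 19, "site5": 17}

fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Angles: hard to rank similar slices
ax_pie.pie(list(shares.values()), labels=list(shares.keys()))
ax_pie.set_title("Angles: hard to compare")

# Lengths on a common baseline, sorted so the ranking is obvious
ordered = sorted(shares.items(), key=lambda item: item[1])
ax_bar.barh([name for name, _ in ordered], [value for _, value in ordered])
ax_bar.set_title("Lengths: easy to compare")
ax_bar.set_xlabel("Share of chats (%)")
```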
What to Avoid – Area?

Area: We’re bad at area. Try ranking these bubbles by area and comparing them relative to each other.

https://www.data-to-viz.com/caveat/area_hard.html
What to Avoid – 3D?

3D: In general, 3D is “fake fancy”: impractical but gee-whizz, so avoid it!

Caveat: interactive scatters?
Conclusions

Wide variety of tools available in Python.

Get familiar with Pandas syntax for quick & simple exploration, and use it with Seaborn themes.
Learn one more high-level library in detail (Seaborn or Altair) for publishing output and for more flexibility.

“Simplicity is the ultimate sophistication”
Leonardo da Vinci

More?

Resources

Tour of Python’s Data Landscape
https://dsaber.com/2016/10/02/a-dramatic-tour-through-pythons-data-visualization-landscape-including-ggplot-and-altair/

Python Graph Gallery
https://python-graph-gallery.com/

From Data to Viz
https://www.data-to-viz.com/
