Data Visualization Using Python
Dr. Muhammad Hanif
Department of Computer Science, Electrical and Space Engineering
Lulea University of Technology, Sweden
Data Scientist in 2D….
https://www.slideshare.net/joshwills/production-machine-learninginfrastructure
Data Scientist in 3D….
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Data Scientist in 5D….
https://speakerdeck.com/chdoig/the-state-of-python-for-data-science-pyss
Data Scientists' Responsibilities
http://berkeleysciencereview.com/article/first-rule-data-science/
Why Python?
q General purpose
q IPython
q Popular and mature (in both its APIs and its community support)
q Glue language (high level APIs, low level C/Fortran bindings)
q Science ecosystem (growing!)
Python’s Popularity: Widespread Knowledge and Many Tools
https://becominghuman.ai/top-20-most-popular-programming-languages-for-2021-and-beyond-735ee8370c61
Python’s Popularity: Widespread Knowledge and Many Tools
https://yalantis.com/blog/top-10-programming-languages/
Avoid the Two-Language Problem
Python’s Usage: Spread Over Whole Data Science Workflow
https://speakerdeck.com/chdoig/the-state-of-python-for-data-science-pyss-2015
One day in Facebook's Data Science team: a member could…
author a multistage a) processing pipeline in Python,
design a hypothesis test, perform a b) regression analysis
over a data sample with R, design and implement an
c) algorithm for some data-intensive service in Hadoop,
or d) communicate the results of our analysis.
Jeff Hammerbacher
http://berkeleysciencereview.com/scientific-collaborations-uc-berkeley-data-driven-cover/
Python Fits All!
Python: Tools
q Interactivity / Collaboration
o IPython
o Jupyter
q Data Wrangling / Analysis
o NumPy
o Pandas
q Data Visualization
o Matplotlib
o Seaborn etc.
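As a rough illustration of how the tools listed above fit together, the sketch below wrangles a tiny table with pandas, summarizes it with NumPy, and hands the result to Matplotlib through pandas' plotting helper. The column names and values are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import numpy as np
import pandas as pd

# Made-up readings from two hypothetical sensors.
df = pd.DataFrame({
    "sensor": ["A", "A", "B", "B"],
    "value": [1.0, 3.0, 4.0, 8.0],
})

# Data wrangling / analysis: per-sensor mean (pandas) and overall mean (NumPy).
mean_by_sensor = df.groupby("sensor")["value"].mean()
overall_mean = np.mean(df["value"].to_numpy())

# Data visualization: pandas delegates the drawing to Matplotlib.
ax = mean_by_sensor.plot(kind="bar", title="Mean value per sensor")
ax.axhline(overall_mean, linestyle="--", color="gray")
ax.figure.savefig("mean_by_sensor.png", dpi=150)
```

In a Jupyter notebook the `matplotlib.use("Agg")` line and the `savefig` call can be dropped, since the figure is displayed inline.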
Why Visualize?
Visualize to Analyze
q Patterns
q Correlation
q Trends
Make decisions based on a massive dataset IN ONE LOOK
Visualize to Discover
Interactive Visualization: Lets You Discover Information
https://ocean.sagepub.com/blog/tools-and-tech/turning-covid-19-into-a-data-visualization-exercise-for-your-students
Visualize to Support a Story
Visualize to Tell a Story by Itself
Distribution of Global Wealth
http://news.bbc.co.uk/2/shared/spl/hi/guides/457000/457022/html/nn5page1.stm
Visualize to Teach
Our brain processes visuals 60,000 times faster than text
https://twitter.com/omnivex/status/1126879918804094976
Python Libraries for Data Visualization
Data Science Is Becoming Important to the Python Community
6 out of 25 most popular libraries are for Data Science
https://www.python.org/dev/peps/pep-0465/#but-isn-t-matrix-multiplication-a-pretty-niche-requirement
Science Stack is Getting Better Each Day
https://speakerdeck.com/chdoig/the-state-of-python-for-data-science-pyss-2015
Matplotlib
q Python 2D plotting library which produces
publication quality figures in a variety of
hardcopy formats and interactive environments
across platforms.
q The forerunner of Python's data visualization libraries.
q “is extremely powerful but with that power
comes complexity.”
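A minimal sketch of the points above, assuming Matplotlib's object-oriented interface: one figure is built and then written to two hardcopy formats. The file names and the plotted functions are chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# One figure, one axes, two labelled line plots.
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), linestyle="--", label="cos(x)")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.set_title("Sine and cosine")
ax.legend()

# The same figure can be saved as a raster image or a vector document.
fig.savefig("sine_cosine.png", dpi=150)
fig.savefig("sine_cosine.pdf")
```

Every element (labels, title, legend, line style) is set explicitly, which hints at both the power and the verbosity the quote refers to.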
Matplotlib
https://realpython.com/python-matplotlib-guide/
Seaborn
q harnesses the power of matplotlib to create
beautiful charts in a few lines of code.
q The key difference is that Seaborn’s default styles
and color palettes are designed to be more
aesthetically pleasing and modern.
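The "few lines of code" claim can be sketched as follows, assuming a reasonably recent Seaborn release (one that provides `set_theme`). The data frame is synthetic and its column names are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: score grows roughly linearly with hours, plus noise.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "hours": rng.uniform(0, 10, 60),
    "group": np.repeat(["A", "B", "C"], 20),
})
df["score"] = 50 + 4 * df["hours"] + rng.normal(0, 5, 60)

sns.set_theme(style="whitegrid")  # apply Seaborn's default palette and styling
ax = sns.scatterplot(data=df, x="hours", y="score", hue="group")
ax.set_title("Score vs. hours (synthetic data)")
ax.figure.savefig("seaborn_scatter.png", dpi=150)
```

The returned object is an ordinary Matplotlib axes, so any Matplotlib call can still be used to fine-tune the result.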
Seaborn
https://blog.insightdatascience.com/data-visualization-in-python-advanced-functionality-in-seaborn-20d217f1a9a6
ggplot
q plotting system for Python based on R's
ggplot2 and the Grammar of Graphics.
q layer components to create a complete plot.
Bokeh
q is also based on The Grammar of Graphics,
but unlike ggplot, it’s native to Python, not
ported over from R.
q supports streaming and real-time data.
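A rough sketch of the streaming support, assuming Bokeh's `ColumnDataSource.stream` method: new rows are appended to the data source that backs a line glyph. Here the update is applied once in a plain script; pushing updates live to a browser would additionally need a Bokeh server or notebook session, which is not shown.

```python
from bokeh.embed import file_html
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.resources import CDN

# The glyph reads its coordinates from a shared data source.
source = ColumnDataSource(data={"x": [0, 1, 2], "y": [0.0, 1.0, 0.5]})

p = figure(title="Streaming sketch", x_axis_label="step", y_axis_label="value")
p.line("x", "y", source=source, line_width=2)

# Append new points; rollover caps how many rows the source keeps.
source.stream({"x": [3, 4], "y": [0.8, 1.2]}, rollover=100)

# Render the plot as a standalone HTML page (as a string).
html = file_html(p, CDN, "Streaming sketch")
```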
Bokeh
https://www.kaggle.com/pavansanagapati/pandas-bokeh-visualization-tutorial
pygal
q offers interactive plots that can be embedded in
the web browser.
o Its prime differentiator is the ability to output
charts as SVGs.
q Each chart type is packaged into a method and
the built-in styles are pretty,
o so it’s easy to create a nice-looking chart in a few
lines of code.
pygal
https://www.pluralsight.com/guides/charts-in-pygal
plotly
q Best known for making interactive plots, but it also offers some
charts you won’t find in most libraries, like
contour plots, dendrograms, and 3D
charts.
plotly
https://pypi.org/project/plotly/
geoplotlib
q toolbox for creating maps and plotting
geographical data.
q You can use it to create a variety of map-types,
like choropleths, heatmaps, and dot density
maps.
geoplotlib
https://www.pluralsight.com/guides/building-geoplots-with-geoplotlib