Data visualization in Python
Martijn Tennekes, Ali Hürriyetoglu
THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION
Eurostat
Outline
• Overview of data visualization in Python
• ggplot
• Folium
• Conclusion
Which packages/functions
• Standard charts (e.g. line chart, bar chart, scatter plot):
• Matplotlib, Pandas, Seaborn, ggplot, Altair, ...
• Thematic maps
• Folium, Basemap, Cartopy, Iris, …
• Other visualisations
• Bokeh (interactive plots), plotly, …
ggplot
• Based on one of the most popular R packages (ggplot2)
• Based on the Grammar of Graphics (Wilkinson, 2005)
• Charts are built up according to this grammar:
• data
• mapping / aesthetics
• geoms
• stats
• scales
• coord
• facets
• Pandas DataFrames are used natively in ggplot.
ggplot and qplot
Stacking of layers and transformations with +
ggplot(mpg, aes(x='displ', y='cty')) + \
geom_point()
Data: DataFrame
Aesthetics: x, y, color, fill, shape
Geometry: points
Shortcut function: qplot (quick plot):
qplot(diamonds.carat, diamonds.price)
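How stacking with + can work is easiest to see in a toy model. The sketch below is plain Python and does not use ggplot; the classes Plot and Layer are hypothetical and only mimic the idea that adding a layer returns a new plot object.

```python
# Hypothetical mini-classes; they only mimic the "plot + layer" idea
# and are not part of the ggplot package.
class Layer:
    def __init__(self, name):
        self.name = name


class Plot:
    def __init__(self, data, mapping, layers=()):
        self.data = data              # e.g. a list of records or a DataFrame
        self.mapping = mapping        # aesthetic -> column name
        self.layers = tuple(layers)   # layers stacked so far

    def __add__(self, layer):
        # Return a new plot with the layer appended; the original is unchanged.
        return Plot(self.data, self.mapping, self.layers + (layer,))


base = Plot(data=[{"displ": 1.8, "cty": 18}],
            mapping={"x": "displ", "y": "cty"})
plot = base + Layer("point") + Layer("line")
layer_names = [layer.name for layer in plot.layers]
print(layer_names)
```

Because each + returns a new object, a base plot can be reused with different layers.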
Aesthetics
Mapping of data to
visual attributes of
geometric objects:
– Position: x, y
– Color: color
– Shape: shape
ggplot(aes(x='carat', y='price', color='clarity'), diamonds) + \
geom_point()
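What mapping a categorical variable to color amounts to can be sketched without any plotting library. The helper map_levels_to_colors, the palette and the sample values below are illustrative assumptions, not ggplot internals.

```python
from itertools import cycle


def map_levels_to_colors(values, palette):
    """Assign each distinct level a colour, in order of first appearance."""
    mapping = {}
    colors = cycle(palette)  # reuse colours if there are more levels than colours
    for value in values:
        if value not in mapping:
            mapping[value] = next(colors)
    return mapping


# Illustrative clarity levels and colours (not the library's defaults).
clarity = ["SI2", "SI1", "VS1", "SI2", "VS2", "SI1"]
palette = ["red", "green", "blue", "orange"]

color_of = map_levels_to_colors(clarity, palette)
point_colors = [color_of[level] for level in clarity]
print(point_colors)
```

Every data point ends up with one color, determined only by the level of the mapped variable.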
Aesthetics
Mapping of data to
visual attributes of
geometric objects:
– Position: x,y
– Color: color
– Shape: shape
ggplot(aes(x='carat', y='price', shape="cut"), diamonds) + \
geom_point()
Geom
• Geometric objects:
• Points, lines, polygons, …
• Functions start with “geom_”
• Also error margins:
• geom_errorbar(), geom_pointrange(), geom_linerange().
• Note: they require the aesthetics ymin and ymax.
ggplot(mpg, aes(x='displ', y='cty')) + \
geom_point() + geom_line()
Stat
• stat_smooth() and stat_density() enable statistical transformation
• Most geoms have a default stat (and the other way round)
• geom and stat form a layer
• One or more layers form a plot
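A statistical transformation turns raw data into the values that a geom draws. As a minimal sketch, the binning behind a histogram can be written in plain Python; the function bin_counts and the sample prices are hypothetical and not part of ggplot.

```python
def bin_counts(values, bins, lower, upper):
    """Count how many values fall into each of `bins` equal-width bins."""
    width = (upper - lower) / bins
    counts = [0] * bins
    for v in values:
        if lower <= v <= upper:
            # Clamp so that v == upper lands in the last bin.
            index = min(int((v - lower) / width), bins - 1)
            counts[index] += 1
    return counts


# Illustrative values, not taken from the diamonds dataset.
prices = [5, 12, 18, 25, 31, 39, 40]
counts = bin_counts(prices, bins=4, lower=0, upper=40)
print(counts)
```

The geom then only has to draw one bar per count; the stat and the geom together form the layer.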
stat_smooth
ggplot(aes(x='date', y='beef'), data=meat) + geom_point() + \
stat_smooth(method='loess')
stat_density
ggplot(aes(x='price', color='clarity'), data=diamonds) + stat_density()
Scales (and axes)
• A scale defines how the values of a variable are mapped to an
aesthetic
• Therefore:
• A scale belongs to one aesthetic (x, y, color, fill, etc.)
• The axis is an essential part of a scale
• With scale_XXX, the scales and axes can be adjusted (XXX stands
for a combination of aesthetic and type of scale, e.g.
scale_fill_gradient)
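The idea of a scale as a mapping from data values to an aesthetic can be sketched in plain Python. The functions linear_scale and log_scale and the 400-unit axis length are assumptions made for illustration; they are not ggplot functions.

```python
import math


def linear_scale(domain, out_range):
    """Return a function mapping the data interval `domain` onto `out_range`."""
    (d0, d1), (r0, r1) = domain, out_range
    return lambda v: r0 + (v - d0) / (d1 - d0) * (r1 - r0)


def log_scale(domain, out_range, base=10):
    """Like linear_scale, but positions are linear in log(value)."""
    d0, d1 = (math.log(d, base) for d in domain)
    inner = linear_scale((d0, d1), out_range)
    return lambda v: inner(math.log(v, base))


# Map prices onto a hypothetical x axis that is 400 units long.
x_linear = linear_scale((0, 10000), (0, 400))
x_log = log_scale((1, 10000), (0, 400), base=100)

print(x_linear(5000), x_log(100))
```

On the linear scale the middle of the axis corresponds to a price of 5000; on the base-100 log scale it corresponds to a price of 100.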
scale_x_log
ggplot(diamonds, aes(x='price')) + geom_histogram() + scale_x_log(base=100)
Coord
• A chart is drawn in a coordinate
system. This can be transformed.
• A pie chart has a polar coordinate
system.
import numpy as np
import pandas as pd
from ggplot import *
df = pd.DataFrame({"x": np.arange(100)})
df['y'] = df.x * 10
p = ggplot(df, aes(x='x', y='y')) + geom_point() + coord_polar()  # polar coords
print(p)
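What a polar transformation does to such data can be sketched in plain Python. The sketch assumes x is read as an angle measured counter-clockwise from the horizontal axis and y as a radius; plotting libraries may choose a different start angle or direction, and to_polar_xy is a hypothetical helper.

```python
import math


def to_polar_xy(x, y, x_max):
    """Interpret x as a fraction of a full turn and y as a radius."""
    theta = 2 * math.pi * x / x_max
    return y * math.cos(theta), y * math.sin(theta)


# Same data as in the slide: x = 0..99, y = 10 * x.
points = [to_polar_xy(x, 10 * x, x_max=100) for x in range(100)]

# A quarter turn (x = 25) ends up straight "up" at radius 250.
quarter_x, quarter_y = points[25]
print(round(quarter_x, 6), round(quarter_y, 6))
```

A straight line in Cartesian coordinates becomes a spiral, because the radius grows steadily with the angle.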
Facets
• With facets, small
multiples are created.
• Each facet shows a subset
of the data.
ggplot(diamonds, aes(x='price')) + \
geom_histogram() + \
facet_grid("cut")
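Faceting boils down to splitting the data by the levels of a variable and drawing the same chart for each subset. A minimal sketch in plain Python; facet_split is a hypothetical helper and the rows are illustrative, not taken from the diamonds dataset.

```python
from collections import defaultdict


def facet_split(rows, by):
    """Group rows into one subset (panel) per distinct value of `by`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    return dict(groups)


# Illustrative rows only.
rows = [
    {"cut": "Ideal", "price": 326},
    {"cut": "Premium", "price": 334},
    {"cut": "Ideal", "price": 340},
]

panels = facet_split(rows, by="cut")
panel_sizes = {cut: len(subset) for cut, subset in panels.items()}
print(panel_sizes)
```

Each key of panels corresponds to one small multiple; the histogram would be computed separately for each subset.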
Facets example
ggplot(chopsticks, aes(x='chopstick_length',
y='food_pinching_effeciency')) + \
geom_point() + \
geom_line() + \
scale_x_continuous(breaks=[150, 250, 350]) + \
facet_wrap("individual")
Facets example 2
ggplot(diamonds, aes(x="carat", y="price", color="color",
shape="cut")) + geom_point() + facet_wrap("clarity")
ggplot tips
• You can annotate plots
ggplot(mtcars, aes(x='mpg')) + geom_histogram() + \
xlab("Miles per Gallon") + ylab("# of Cars")
• Assign a plot to a variable, for instance g:
g = ggplot(mpg, aes(x='displ', y='cty')) + \
geom_point()
• The save method writes the plot to a file in the desired format:
g.save("myimage.png")
Folium: Thematic maps
• A thematic map is a visualization where statistical
information with a spatial component is shown.
• Other libraries are: Basemap, Cartopy, Iris
• Folium builds on the data wrangling strengths of
the Python ecosystem and the mapping strengths
of the Leaflet.js library.
• Manipulate your data in Python, then visualize it
on a Leaflet map via Folium.
Folium features
• Built-in tilesets from OpenStreetMap, MapQuest
Open, MapQuest Open Aerial, Mapbox, and
Stamen
• Supports custom tilesets with Mapbox or Cloudmade API
keys.
• Supports GeoJSON and TopoJSON overlays, as well as the binding
of data to those overlays to create choropleth maps with
color-brewer color schemes.
Basic Maps
folium.Map(location=[50.89, 5.99], zoom_start=14)
Basic maps
folium.Map(location=[50.89, 5.99], zoom_start=14, tiles='Stamen Toner')
GeoJSON/TopoJSON Overlays
ice_map = folium.Map(location=[-59, -11], tiles='Mapbox Bright', zoom_start=2)
ice_map.geo_json(geo_path=geo_path)
ice_map.geo_json(geo_path=topo_path, topojson='objects.antarctic_ice_shelf')
ice_map.create_map(path='ice_map.html')
Choropleth maps
map = folium.Map(location=[48, -102], zoom_start=3)
map.choropleth(geo_path=state_geo, data=state_data,
columns=['State', 'Unemployment'], key_on='feature.id',
fill_color='YlGn', fill_opacity=0.7, line_opacity=0.2,
legend_name='Unemployment Rate (%)')
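What binding data to a GeoJSON overlay means can be sketched without Folium. The example below builds a tiny GeoJSON FeatureCollection, joins values to features by their id (the role of key_on='feature.id'), and picks a fill color per feature. The region ids, box geometries, values and the helpers box_feature and color_for are hypothetical; the hex codes are intended to follow the 5-class ColorBrewer YlGn scheme, and the equal-width classing is an assumption rather than Folium's own method.

```python
import json

# Hex codes intended to follow the 5-class ColorBrewer "YlGn" scheme.
YLGN = ["#ffffcc", "#c2e699", "#78c679", "#31a354", "#006837"]


def box_feature(feature_id, x0):
    """Build a unit-square GeoJSON feature (a placeholder, not a real region)."""
    ring = [[x0, 0], [x0 + 1, 0], [x0 + 1, 1], [x0, 1], [x0, 0]]
    return {
        "type": "Feature",
        "id": feature_id,
        "properties": {},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


def color_for(value, vmin, vmax, palette):
    """Pick a palette colour using equal-width classes between vmin and vmax."""
    if vmax == vmin:
        return palette[0]
    fraction = (value - vmin) / (vmax - vmin)
    index = min(int(fraction * len(palette)), len(palette) - 1)
    return palette[index]


geo = {
    "type": "FeatureCollection",
    "features": [box_feature("AA", 0), box_feature("BB", 1), box_feature("CC", 2)],
}

# Made-up unemployment rates keyed by feature id.
unemployment = {"AA": 3.2, "BB": 9.1, "CC": 6.0}
vmin, vmax = min(unemployment.values()), max(unemployment.values())

# The data binding: look up each feature's value by its id and store a colour.
for feature in geo["features"]:
    value = unemployment[feature["id"]]
    feature["properties"]["fill"] = color_for(value, vmin, vmax, YLGN)

fills = {f["id"]: f["properties"]["fill"] for f in geo["features"]}
geojson_text = json.dumps(geo)
print(fills)
```

The lowest value gets the lightest color and the highest value the darkest, which is the basic principle behind a choropleth map.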
Summary
• Python has many options for data visualization
• Each visualisation library has a particular audience
• JavaScript backends are mostly used to extend the power of the
visualisation
• Python’s extensive data processing tools integrate well
with visualisation requirements
References
• http://yhat.github.io/ggplot/
• https://folium.readthedocs.io/en/latest/