21AD71 Module 4 Textbook
MODULE 4
Introduction
geoplotlib is an open-source Python library for geospatial data visualizations. It offers
a wide range of geographical visualizations and supports hardware acceleration.
It also provides high-performance rendering for large datasets with millions of data
points. As discussed in earlier chapters, Matplotlib provides various ways to visualize
geographical data.
However, Matplotlib is not designed for this task because its interfaces are
complicated and inconvenient to use. Matplotlib also restricts how geographical
data can be displayed. The Basemap and Cartopy libraries allow you to plot on
a world map, but these packages do not support drawing on map tiles. Map tiles
are the small rectangular, square, or hexagonal images that are stitched together into
a seamless map of the world. They are lightweight and requested individually, so only
the tiles that are currently in view have to be loaded.
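To make the idea of individually requested tiles more concrete, the following sketch
computes which tile covers a given coordinate. It assumes the Web Mercator "slippy map"
tiling scheme that OpenStreetMap-style tile servers commonly use, where zoom level z
splits the world into 2^z by 2^z square tiles. The deg2tile helper is hypothetical and
is not part of geoplotlib.

```python
import math


def deg2tile(lat, lon, zoom):
    """Return the (xtile, ytile) index of the tile that contains lat/lon.

    Assumes the Web Mercator "slippy map" scheme; deg2tile is an
    illustrative helper, not a geoplotlib function.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat)
    xtile = int((lon + 180.0) / 360.0 * n)
    ytile = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return xtile, ytile


# A viewer only has to request the few tiles around this index
# instead of one huge image of the whole world.
print(deg2tile(52.52, 13.405, 10))  # tile index for a point in Berlin at zoom 10
```

The tile index, together with the zoom level, is what ends up in the URL of a tile
request, which is why only the visible part of the map has to be downloaded.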
geoplotlib, on the other hand, was designed precisely for this purpose; it not only
provides map tiles but also allows for interactivity and simple animations. It provides
a simple interface that allows access to compelling geospatial visualizations such
as histograms, point-based plots, tessellations such as Voronoi or Delaunay, and
choropleth plots.
In the exercises and activities in this chapter, we will use geoplotlib in combination
with different real-world datasets to do the following:
• Discover dense areas within cities in Europe that have a high population
• Create a custom animated layer that displays the time series data of aircraft
geoplotlib uses the concept of layers that can be placed on top of one another,
providing a powerful interface for even complex visualizations. It comes with several
common visualization layers that are easy to set up and use.
258 | Plotting Geospatial Data
From the preceding diagram, we can see that geoplotlib is built on top of NumPy/
SciPy and Pyglet/OpenGL. These libraries take care of numerical operations and
rendering. Both components are based on Python, therefore enabling the use of the
full Python ecosystem.
Note
All the datasets used in this chapter can be found at
https://packt.live/3bzApYN. All the files of exercises and
activities can be found here: https://packt.live/2UJRbyt.
geoplotlib fully integrates into the Python ecosystem. This even enables us to
plot geographical data inline inside our Jupyter Notebooks. This possibility allows
us to design our visualizations quickly and iteratively.
• Simplicity: Looking at the example provided here, we can quickly see that
geoplotlib abstracts away the complexity of plotting map tiles and already-
provided layers such as dot density and histogram. It has a simple API that
provides common visualizations. These visualizations can be created using
custom data with only a few lines of code.
The core attributes of our datasets are lat and lon values. Latitude and
longitude values enable us to index every single location on Earth. In geoplotlib,
we need them to tell the library where on the map our elements need to be
rendered. If our dataset comes with lat and lon columns, we can display each
of those data points, for example, dots on a map with five lines of code.
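Before handing a dataset to a plotting library, it can help to check that the lat and
lon columns are present and hold plausible values. The following is a minimal sketch
that uses only the standard library; the load_points helper and the sample rows are
made up for illustration and are not part of geoplotlib.

```python
import csv
import io


def load_points(csv_text):
    """Parse CSV text and return a list of (lat, lon) float tuples.

    Raises ValueError if the lat/lon columns are missing or out of range.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or not {'lat', 'lon'} <= set(reader.fieldnames):
        raise ValueError("dataset needs both 'lat' and 'lon' columns")
    points = []
    for row in reader:
        lat, lon = float(row['lat']), float(row['lon'])
        # latitude runs from -90 to 90, longitude from -180 to 180
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError('coordinate out of range: %r' % ((lat, lon),))
        points.append((lat, lon))
    return points


# illustrative sample rows, not taken from the chapter's dataset
sample = "City,lat,lon\nbrasilia,-15.79,-47.88\nrecife,-8.05,-34.9\n"
print(load_points(sample))
```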
In addition, we can use the f_tooltip argument to provide a popup for each
point as an element of the column we provide as a source as follows:
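import geoplotlib
from geoplotlib.utils import DataAccessObject

# wrap the filtered DataFrame so that f_tooltip can read its columns
dataset_obj = DataAccessObject(dataset_filtered)
geoplotlib.dot(dataset_obj,
               f_tooltip=lambda d: d['City'].title())
geoplotlib.show()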
Executing this code will result in the following dot density plot:
Figure 5.2: Dot density layer of cities in Brazil and an overlay of the city on hovering
Next, we will create geographical visualizations without much effort and discover the
advantages of using geoplotlib in combination with pandas. We will implement an
exercise that plots the cities of the world and experience the performance of
the library when plotting thousands of dots on our map.
Geospatial Visualizations
Voronoi tessellation, Delaunay triangulation, and choropleth plots are a few of
the geospatial visualizations that will be used in this chapter. An explanation for each
of them is provided here.
Voronoi Tessellation
In a Voronoi tessellation, each pair of neighboring data points is separated by a line
that is the same distance from both data points. The separation creates cells that
mark, for every location on the map, which data point is closest. The closer the data
points are to each other, the smaller the cells.
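The defining rule of a Voronoi cell, namely that every location belongs to its closest
data point, can be sketched in a few lines of plain Python. This is only an
illustration of the concept under simplifying assumptions: it uses straight-line
distances in a flat plane, the sites are made up, and nearest_site is a hypothetical
helper rather than anything geoplotlib provides.

```python
import math


def nearest_site(point, sites):
    """Return the index of the site closest to point (Euclidean distance)."""
    return min(range(len(sites)),
               key=lambda i: math.hypot(point[0] - sites[i][0],
                                        point[1] - sites[i][1]))


sites = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]  # made-up data points

# Every location is assigned to the cell of its closest site.
print(nearest_site((1.0, 1.0), sites))  # lies in the cell of site 0
print(nearest_site((3.5, 0.5), sites))  # lies in the cell of site 1
```

Locations that are equally far from two sites lie exactly on the line that separates
the two cells.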
The following example shows how you can simply use the voronoi method to
create this visualization:
geoplotlib.show()
After importing the dependencies we need, we read the dataset using the read_csv
method of pandas (or geoplotlib). We then use it as data for our voronoi method,
which handles all the complex logic of plotting the data on the map.
In addition to the data itself, we can set several parameters, such as general
smoothing using the set_smoothing method. The smoothing of the lines
uses anti-aliasing.
Delaunay Triangulation
A Delaunay triangulation is closely related to a Voronoi tessellation. When we connect
each data point to every other data point whose Voronoi cell shares an edge with its
own, we end up with a plot that is triangulated. The closer the data points are to each
other, the smaller the triangles will be. This gives us a visual clue about the density
of points in specific areas. When combined with color gradients, we get insights about
points of interest, which can be compared with a heatmap:
geoplotlib.show()
This example uses the same dataset as before, that is, population density in Brazil.
The structure of the code is the same as in the voronoi example.
After importing the dependencies that we need, we read the dataset using the
read_csv method and then use it as data for our delaunay method, which handles all of
the complex logic of plotting data on the map.
In addition to the data itself, we can again use the set_smoothing method to
smooth the lines using anti-aliasing.
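A common way to characterize a Delaunay triangulation is the empty circumcircle
property: no data point lies inside the circle that passes through the three corners of
any triangle. The following sketch checks this property for a single triangle and a
single point. It is a plain-Python illustration with a hypothetical helper name and
made-up coordinates, not the code that geoplotlib runs internally.

```python
def in_circumcircle(a, b, c, d):
    """Return True if point d lies strictly inside the circle through a, b, c."""
    # shift the triangle so that d becomes the origin
    (ax, ay), (bx, by), (cx, cy) = [(p[0] - d[0], p[1] - d[1]) for p in (a, b, c)]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign of det depends on the orientation of a, b, c, so normalize it.
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return det * orient > 0


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # made-up triangle
print(in_circumcircle(*triangle, (0.5, 0.5)))  # this point violates the property
print(in_circumcircle(*triangle, (2.0, 2.0)))  # this point does not
```

A triangle whose circumcircle contains another data point would not be part of a
Delaunay triangulation of those points.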
Choropleth Plot
This kind of geographical plot displays areas such as the states of a country in
a shaded or colored manner. The shade or color of the plot is determined by a
single data point or a set of data points. It gives an abstract view of a geographical
area to visualize the relationships and differences between the different areas. In
the following code and visual example, we can see that the unemployment rate
determines the shade of each state of the US. The darker the shade, the higher
the rate:
"""
plot the outlines of the states and color them using the unemployment
rate
"""
cmap = ColorMap('Reds', alpha=255, levels=10)
# filled states, colored by get_color, with the state name as a tooltip
geoplotlib.geojson('../../Datasets/us_states_shapes.json',
                   fill=True, color=get_color,
                   f_tooltip=lambda properties: properties['NAME'])
# semi-transparent white outlines on top of the filled states
geoplotlib.geojson('../../Datasets/us_states_shapes.json',
                   fill=False, color=[255, 255, 255, 64])
geoplotlib.set_bbox(BoundingBox.USA)
geoplotlib.show()
We will cover what each line does in more detail later. However, to give you a better
understanding of what is happening here, we will quickly cover the sections of the
preceding code.
The first few lines import all the necessary dependencies, including geoplotlib and
json, which will be used to load our dataset, which is provided in this format.
After the import statements, we see a get_color method. This method returns
a color that has been determined by the unemployment rate of the given data point.
This method defines how dark the red value will be. In the last section of the script,
we read our dataset and use it with the geojson method.
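The idea behind such a color function, turning a number into one of a fixed set of
shades, can be sketched without any plotting library. The following example is an
assumption-laden illustration: to_red_shade is a hypothetical helper, the rates are
made up, and the arithmetic is not how geoplotlib's ColorMap computes its colors.

```python
def to_red_shade(value, maxvalue, levels=10, alpha=255):
    """Map value in [0, maxvalue] to one of `levels` RGBA shades of red.

    Illustrative only: this mimics the idea of discrete color levels and is
    not geoplotlib's ColorMap implementation. Requires levels >= 2.
    """
    ratio = min(max(value / maxvalue, 0.0), 1.0)   # clamp to [0, 1]
    step = min(int(ratio * levels), levels - 1)    # discrete level index
    fade = 255 - round(255 * step / (levels - 1))  # less green/blue = stronger red
    return [255, fade, fade, alpha]


# made-up unemployment rates in percent, with 10 as the assumed upper limit
for rate in (0.0, 5.0, 10.0):
    print(rate, to_red_shade(rate, maxvalue=10.0))
```

Values above the upper limit are clamped, so they receive the same shade as the
maximum.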
The choropleth plot is one of the few visualizations that does not have a method
that is used solely for this kind of plot. We use the geojson() method to
create more complex shapes than simple dots. By using the f_tooltip argument,
we can also display the name of the state we are hovering over.
The BoundingBox object is an object to define the "corners" of the viewport. We can
set an initial focus when running our visualization, which helps the user see what the
visualization is about without panning around and zooming first.
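To illustrate what the "corners" of a viewport mean, the following sketch derives the
smallest box around a set of coordinates and tests whether a location falls inside it.
The Box type and both helpers are hypothetical stand-ins written for this example; they
are not geoplotlib's BoundingBox class, and the coordinates are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for the idea behind a bounding box; this is not
# geoplotlib's BoundingBox class.
Box = namedtuple('Box', ['north', 'west', 'south', 'east'])


def box_from_points(points):
    """Smallest box that contains every (lat, lon) point."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return Box(north=max(lats), west=min(lons), south=min(lats), east=max(lons))


def contains(box, lat, lon):
    """True if the coordinate lies inside the box (ignores the antimeridian)."""
    return box.south <= lat <= box.north and box.west <= lon <= box.east


points = [(48.1, 11.6), (52.5, 13.4), (50.9, 6.9)]  # made-up sample coordinates
box = box_from_points(points)
print(box)
print(contains(box, 50.0, 10.0))
```

Focusing the initial view on such a box is what spares the user from panning and
zooming before the data becomes visible.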
Executing this code with the right example dataset provides the
following visualization:
Figure 5.5: Choropleth plot of unemployment rates in the US; the darker the color, the
higher the value
Exercise 5.01: Plotting Poaching Density Using Dot Density and Histograms
In this exercise, we'll be looking at the primary use of geoplotlib's plot methods for
dot density, histograms, and Voronoi diagrams. For this, we will make use of data
on various poaching incidents that have taken place all over the world.
The dataset that we will be using here contains data from poaching incidents in
Tanzania. The dataset consists of 268 rows and 6 columns (id_report,
date_report, description, created_date, lat, and lon).
Note that geoplotlib requires your dataset to have both lat and lon columns. These
columns are the geographical data for latitude and longitude, which are used to
determine how to plot the data. The following are the steps to perform:
dataset = read_csv('../../Datasets/poaching_points_cleaned.csv')
4. Print out the dataset and look at its type. What difference do you see compared
to a pandas DataFrame? Let's take a look:
6. Plot each row of our dataset as a single point on the map using a dot density
layer by calling the dot method. Then, call the show method to render the map
with a given layer:
Only looking at the lat and lon values in the dataset won't give us a very good
idea of where on the map our elements are located or how far apart they are.
We're not able to draw conclusions and get insights into our dataset without
visualizing our data points on a map. When looking at the rendered map, we
can instantly see that some areas have more incidents than others. This insight
couldn't have been easily identified by simply looking at the numbers in the
dataset itself.
7. Visualize the density using the hist method, which will create a Histogram
Layer on top of our map tiles. Then, define a binsize of 20. This will allow us
to set the size of the hist bins in our visualization:
8. Create a Voronoi plot using the same dataset. Use a color map cmap of
'Blues_r' and define the max_area parameter as 1e5:
# plotting a voronoi map
geoplotlib.voronoi(dataset, cmap='Blues_r',
                   max_area=1e5, alpha=255)
geoplotlib.show()
Note
To access the source code for this specific section, please refer to
https://packt.live/2UIwGkT.
This section does not currently have an online interactive example, and will
need to be run locally.
Voronoi plots are good for visualizing the density of data points, too. Voronoi
introduces a little bit more complexity with several parameters, such as cmap,
max_area, and alpha. Here, cmap denotes the color map to use, alpha denotes the
opacity (the alpha channel) of the colors, and max_area denotes a constant that
determines the color of the Voronoi areas.
If we compare this Voronoi visualization with the histogram plot, we can see that
one area draws a lot of attention. The center-right edge of the plot shows quite a
large dark blue area with an even darker center: something that could've easily been
overlooked with the histogram plot.
We have now covered the basics of geoplotlib. It has many more methods, but they
all have a similar API that makes using the other methods simple. Since we have
looked at some very basic visualizations, it's now up to you to solve the first activity.
3. List all the datatypes that are present in it and verify that they are correct.
Then, map the Latitude and Longitude columns to lat and lon.
5. Use the agg method of pandas to get the average number of cities per country.
6. Obtain the number of cities per country (the first 20 entries) and extract the
countries that have a population of greater than zero.
8. Again, filter your remaining data for cities with a population of greater
than 100,000.
9. To get a better understanding of the density of our data points on the map, use
a Voronoi tessellation layer.
10. Filter down the data even further to only cities in countries such as Germany and
Great Britain.
11. Finally, use a Delaunay triangulation layer to find the most densely
populated areas.
Figure 5.13: A Delaunay triangle visualization of cities in Germany and Great Britain
Note
The solution for this activity can be found on page 436.
You have now completed your first activity using geoplotlib. Note how we made use
of different plots to get the information we required. Next, we will look at some more
custom features of geoplotlib that will allow us to change the map tiles provider and
create custom plotting layers.
{
    "type": "Feature",
    "properties": {
        "name": "Dinagat Islands"
    },
    "geometry": {
        "type": "Point",
        "coordinates": [125.6, 10.1]
    }
}
2. Since the geojson method of geoplotlib only needs a path to the us_states.
json dataset instead of a DataFrame or object, we don't need to load it.
However, since we still want to see what kind of data we are handling, we must
open the GeoJSON file and load it as a json object. We can then access its
members using simple indexing:
Our dataset contains a few properties. Only the state name, NAME, and the
number of census areas, CENSUSAREA, are important for us in this exercise.
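Loading a GeoJSON document and indexing into it can be sketched with the standard json
module alone. To keep the example runnable without any file, it inlines the small
Feature shown earlier and wraps it in a FeatureCollection, which is an assumption about
the layout of a typical file; the property key "name" belongs to that sample and differs
from the NAME key of the states dataset.

```python
import json

# The sample Feature from above, wrapped in a FeatureCollection and inlined
# as a string so that the sketch does not depend on a file on disk.
geojson_text = '''
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "Dinagat Islands"},
      "geometry": {"type": "Point", "coordinates": [125.6, 10.1]}
    }
  ]
}
'''

data = json.loads(geojson_text)
first = data['features'][0]
names = [feature['properties']['name'] for feature in data['features']]
lon, lat = first['geometry']['coordinates']  # GeoJSON stores longitude first

print(first['properties'])
print(names, lat, lon)
```

Note that GeoJSON positions list the longitude before the latitude, which is the
opposite of the lat, lon order used in most of this chapter.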
Note
Geospatial applications prefer GeoJSON files for persisting and exchanging
geographical data.
3. Extract the names of all the states of the USA from the dataset. Next, print the
number of states in the dataset and then print all the states as a list:
4. If your GeoJSON file is valid, that is, if it has the expected structure, then use the
geojson method of geoplotlib. Create a GeoJSON plot using the geojson()
method of geoplotlib:
After calling the show method, the map will show up with a focus on North
America. In the following diagram, we can already see the borders of each state:
5. Rather than assigning a single value to each state, we want the darkness to
represent the number of census areas. To do this, we have to provide a method
for the color property. Map the CENSUSAREA attribute to a ColorMap class
object with 10 levels to allow a good distribution of color. Provide a maxvalue
of 300000 to the to_color method to define the upper limit of our dataset:
As you can see in the code example, we can provide three arguments to our
ColorMap. The first one, 'Reds', in our case, defines the basic coloring
scheme. The alpha argument defines how opaque we want the color to be,
255 being 100% opaque, and 0 completely invisible. Those 8-bit values for the
Red, Green, Blue, and Alpha (RGBA) values are commonly used in styling: they
all range from 0 to 255. With the levels argument, we can define how many
"steps," that is, levels of red values, we can map to.
6. Use the us_states.json file in the Datasets folder to visualize the different
states. First, provide the color mapping to our color parameter and set the
fill parameter to True. Then, draw a black outline for each state. Use the
color argument and provide the RGBA value for black. Lastly, use the USA
constant of the BoundingBox class to set the bounding box:
"""
plotting the shaded states and adding another layer which plots the
state outlines in black
our BoundingBox should focus the USA
"""
geoplotlib.geojson('../../Datasets/us_states.json',
                   fill=True, color=get_color)
geoplotlib.geojson('../../Datasets/us_states.json',
                   fill=False, color=[0, 0, 0, 255])
geoplotlib.set_bbox(BoundingBox.USA)
geoplotlib.show()
A new window will open, displaying the country, USA, with the areas of its
states filled with different shades of red. The darker areas represent higher
census areas.
7. To give the user some more information about this plot, use the f_tooltip
argument to provide a tooltip displaying the name and census area value of the
state currently hovered over:
geoplotlib.geojson('../../Datasets/us_states.json',
                   fill=False, color=[0, 0, 0, 255])
geoplotlib.set_bbox(BoundingBox.USA)
geoplotlib.show()
Upon hovering, we will get a tooltip for each of the plotted areas displaying the
name of the state and the census area value.
Note
To access the source code for this specific section, please refer to
https://packt.live/30PX9Rh.
This section does not currently have an online interactive example, and will
need to be run locally.
You've already built different plots and visualizations using geoplotlib. In this exercise,
we looked at displaying data from a GeoJSON file and creating a choropleth plot.
In the following topics, we will cover more advanced customizations that will give you
the tools to create more powerful visualizations.
Tile Providers
geoplotlib supports the use of different tile providers. This means that any
OpenStreetMap tile server can be used as a backdrop for our visualization. Some of
the popular free tile providers include Stamen Watercolor, Stamen Toner, Stamen
Toner Lite, and DarkMatter. Changing the tile provider can be done in two ways:
geoplotlib contains a few built-in tile providers with shortcuts. The following code
shows you how to use one of them:
geoplotlib.tiles_provider('darkmatter')
Alternatively, we can pass a custom object to the tiles_provider method that defines
the tile URL, the directory used for caching tiles, and the attribution:
geoplotlib.tiles_provider({
    'url': lambda zoom, xtile, ytile:
        'http://a.tile.stamen.com/watercolor/%d/%d/%d.png'
        % (zoom, xtile, ytile),
    'tiles_dir': 'tiles_dir',
    'attribution': 'Python Data Visualization | Packt'
})
Caching the tiles in tiles_dir is mandatory: each time the map is scrolled or
zoomed into, new map tiles are requested if they have not been downloaded yet.
Without a cache, the tile provider may refuse your requests because too many of
them occur in a short period of time.
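The effect of such a cache can be sketched as follows: a tile is downloaded only when
it is not yet stored on disk. This is a simplified illustration and not geoplotlib's
actual caching code; the get_tile and tile_url helpers, the file naming scheme, and the
fake download function are all assumptions made so that the example runs offline.

```python
import os
import tempfile


def tile_url(zoom, xtile, ytile):
    """Build the URL of one tile from the zoom level and the tile index."""
    return 'http://a.tile.stamen.com/watercolor/%d/%d/%d.png' % (zoom, xtile, ytile)


def get_tile(tiles_dir, zoom, xtile, ytile, download):
    """Return (tile bytes, came_from_cache), downloading only on a cache miss."""
    # hypothetical naming scheme for cached files
    path = os.path.join(tiles_dir, '%d_%d_%d.png' % (zoom, xtile, ytile))
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return f.read(), True
    data = download(tile_url(zoom, xtile, ytile))
    os.makedirs(tiles_dir, exist_ok=True)
    with open(path, 'wb') as f:
        f.write(data)
    return data, False


requests_made = []


def fake_download(url):
    # Stand-in for a real HTTP request so that the sketch runs offline.
    requests_made.append(url)
    return b'fake-png-bytes'


with tempfile.TemporaryDirectory() as tiles_dir:
    first = get_tile(tiles_dir, 3, 4, 2, fake_download)
    second = get_tile(tiles_dir, 3, 4, 2, fake_download)

print(first[1], second[1], len(requests_made))
```

The second request for the same tile is served from disk, so the provider sees only one
request no matter how often that part of the map is revisited.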
In the following exercise, we'll take a quick look at how to switch the map tile
provider. It might not seem convincing at first, but it can take your visualizations to
the next level if leveraged correctly.
import geoplotlib
We won't use a dataset in this exercise since we want to focus on the map tiles
and tile providers.
geoplotlib.show()
This will display an empty world map since we haven't specified a tile provider.
By default, it will use the CartoDB Positron map tiles.
In this example, we used the darkmatter map tiles. As you can see, they are
very dark and will make your visualizations pop out.
Note
We can also use different map tiles such as watercolor, toner,
toner-lite, and positron in a similar way.
geoplotlib.tiles_provider({
    'url': lambda zoom, xtile, ytile:
        'http://a.tile.openstreetmap.fr/hot/%d/%d/%d.png'
        % (zoom, xtile, ytile),
    'tiles_dir': 'custom_tiles',
    'attribution': 'Custom Tiles Provider – Humanitarian map style'
})
geoplotlib.show()
Figure 5.20: Humanitarian map tiles from the custom tile providers object
Some map tile providers have strict request limits, so you may see warning
messages if you're zooming in too fast.
Note
To access the source code for this specific section, please refer to
https://packt.live/3e6WjTT.
This section does not currently have an online interactive example, and will
need to be run locally.
You now know how to change the tile provider to give your visualization one more
layer of customizability. This also introduces us to another layer of complexity. It
all depends on the concept of our final product and whether we want to use the
"default" map tiles or some artistic map tiles.
The next section will cover how to create custom layers that can go far beyond
the ones we have described in this book. We'll look at the basic structure of the
BaseLayer class and what it takes to create a custom layer.
Custom Layers
Now that we have covered the basics of visualizing geospatial data with built-in
layers and methods to change the tile provider, we will now focus on defining our
custom layers. Custom layers allow you to create more complex data visualizations.
They also help with adding more interactivity and animation to them. Creating a
custom layer starts by defining a new class that extends the BaseLayer class that's
provided by geoplotlib. Besides the __init__ method, which initializes the class
level variables, we also have to, at the very least, extend the draw method of the
BaseLayer class already provided.
Depending on the nature of your visualization, you might also want to implement
the invalidate method, which takes care of map projection changes such as
zooming into your visualization. Both the draw and invalidate methods receive
a Projection object that takes care of the latitude and longitude mapping on our
two-dimensional viewport. These mapped points can be handed to an instance of a
BatchPainter object that provides primitives such as points, lines, and shapes to
draw those coordinates onto your map.
class CountrySelectLayer(BaseLayer):
            self.country_num = (self.country_num + 1) % len(countries)
            return True
        elif key == pyglet.window.key.LEFT:
            self.country_num = (self.country_num - 1) % len(countries)
            return True
        return False
europe_bbox = BoundingBox(north=68.574309,
                          west=-25.298424,
                          south=34.266013,
                          east=47.387123)
geoplotlib.add_layer(CountrySelectLayer(dataset, europe_bbox))
geoplotlib.show()
As we've seen several times before, we first import all the necessary dependencies for
this plot, including geoplotlib. BaseLayer and BatchPainter are dependencies
we haven't seen before, since they are only needed when writing custom layers.
The BatchPainter class is another helper for our implementation that lets us
trigger the drawing of elements onto the map.
When creating the custom layer, we simply provide the BaseLayer class in the
parentheses to tell Python to extend the given class.
The class then needs to implement at least two of the provided methods,
__init__ and draw.
__init__ defines what happens when a new custom layer is instantiated. This is
used to set the state of our layer; here, we define values such as our data to be used
and create a new BatchPainter object.
The draw method is called every frame and draws the defined elements using the
BatchPainter class.
In this method, we can do all sorts of calculations such as, in this case, filtering our
dataset to only contain the values of the current active timestamp. In addition to that,
we make the viewport follow our current lat and lon values by fitting the projection
to a new BoundingBox.
Since we don't want to draw everything from scratch with every frame, we use the
invalidate method, which only updates the points on the viewport when the projection
changes, for example, on zooming.
When using interaction elements, such as switching through our countries using
the arrow keys, we return True from the on_key_pressed method to trigger the
redrawing of all the points, and False if nothing has changed.
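The arrow-key handling in the layer relies on the modulo operator to wrap around at
both ends of the country list. The following sketch isolates that detail; step_index
and the list of country codes are made up for illustration.

```python
def step_index(current, direction, count):
    """Move the selection by direction (+1 or -1), wrapping around at both ends."""
    return (current + direction) % count


countries = ['be', 'de', 'es', 'fr']  # made-up list of country codes

index = 0
index = step_index(index, -1, len(countries))  # left arrow on the first entry
print(countries[index])                        # wraps around to the last entry
index = step_index(index, +1, len(countries))  # right arrow on the last entry
print(countries[index])                        # wraps around to the first entry
```

This works because Python's % operator returns a non-negative result for a positive
divisor, so stepping left from index 0 yields the last index instead of -1.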
Once our class is defined, we can call the add_layer method of geoplotlib to add
the newly defined layer to our visualization and finally call show() to show the map.
When executing the preceding example code, we get a visualization that, upon
switching the selected country with the arrow keys, draws the cities for the selected
country using dots on the map:
The following figure shows the cities in Spain after changing the selected country
using the arrow keys:
Figure 5.22: The selection of cities in Spain after changing the country using the arrow keys
In the following exercise, we will create our animated visualization by using what
we've learned about custom layers in the preceding example.
Note
Since geoplotlib operates on OpenGL, this process is highly performant and
can even draw complex visualizations quickly.
Let's create a custom layer that will allow us to display geospatial data and animate
the data points over time:
dataset = pd.read_csv('../../Datasets/flight_tracking.csv')
3. Use the head method to list the first five rows of the dataset and to understand
the columns:
4. Rename the latitude and longitude columns to lat and lon by using the
rename method provided by pandas:
# renaming columns latitude to lat and longitude to lon
dataset = dataset.rename(index=str, \
columns={"latitude": "lat", "longitude": "lon"})
Take another look at the first five elements of the dataset, and observe that the
names of the columns have changed to lat and lon:
Figure 5.24: The dataset with the lat and lon columns
5. Since we want to get a visualization over time in this activity, we need to work
with date and time. If we take a closer look at our dataset, it shows us that
date and time are separated into two columns. Combine date and time into
a timestamp, using the to_epoch method already provided:
6. Use to_epoch and the apply method provided by the pandas DataFrame to
create a new column called timestamp that holds the Unix timestamp:
"""
create a new column called timestamp with the to_epoch method applied
"""
dataset['timestamp'] = dataset.apply(
    lambda x: to_epoch(x['date'], x['time']), axis=1)
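The to_epoch helper is described as already provided, and its body is not shown here.
As a rough idea of what such a helper might look like, the following sketch combines a
date string and a time string into a Unix timestamp. The string formats and the choice
of UTC are assumptions made for this example; the helper that ships with the exercise
may parse different formats.

```python
from datetime import datetime, timezone


def to_epoch(date, time):
    """Combine a date and a time string into a Unix timestamp (whole seconds).

    Assumes 'YYYY/MM/DD' dates and 'HH:MM:SS.fff' times interpreted as UTC;
    the real dataset and the provided helper may use a different format.
    """
    parsed = datetime.strptime('%s %s' % (date, time), '%Y/%m/%d %H:%M:%S.%f')
    return round(parsed.replace(tzinfo=timezone.utc).timestamp())


print(to_epoch('1970/01/02', '00:00:00.000'))  # one day after the epoch, in seconds
```

Pinning the time zone keeps the result independent of the machine that runs the code,
which matters when timestamps are later compared with each other.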
7. Take another look at our dataset. We now have a new column that holds the
Unix timestamps:
Since our dataset is now ready to be used with all the necessary columns
in place, we can start writing our custom layer. This layer will display each
point once it reaches the timestamp that's provided in the dataset. It will be
displayed for a few seconds before it disappears. We'll need to keep track of the
current timestamp in our custom layer. Consolidating what we learned in the
theoretical section of this topic, we have an __init__ method that constructs
our custom TrackLayer.
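The display logic described here, showing only the points inside a moving time window
and starting over once the data is exhausted, can be sketched in plain Python before
moving on to the layer itself. The helper names and the sample rows are made up; the
window of 180 seconds matches the value used in the layer code of this exercise.

```python
def visible_rows(rows, now, window=180):
    """Rows whose timestamp lies in the range (now, now + window]."""
    return [r for r in rows if now < r['timestamp'] <= now + window]


def advance(rows, now, step=1):
    """Advance the clock; restart at the earliest timestamp when no data is left."""
    timestamps = [r['timestamp'] for r in rows]
    now += step
    if now > max(timestamps):
        now = min(timestamps)
    return now


# made-up rows standing in for the flight tracking dataset
rows = [{'hex_ident': 'a', 'timestamp': 100},
        {'hex_ident': 'b', 'timestamp': 250},
        {'hex_ident': 'a', 'timestamp': 400}]

print(len(visible_rows(rows, 99)))  # number of rows inside the first window
print(advance(rows, 400))           # the clock wraps around to the earliest timestamp
```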
8. In the draw method, filter the dataset for all the elements that are in the
mentioned time range and use each element of the filtered list to display it on
the map with color that's provided by the colorbrewer method.
Since our dataset only contains data from a specific time range and we're always
incrementing the time, we want to check whether there are still any elements
with timestamps after the current timestamp. If not, we want to set our
current timestamp to the earliest timestamp that's available in the dataset. The
following code shows how we can create a custom layer:
class TrackLayer(BaseLayer):
    def __init__(self, dataset, bbox=BoundingBox.WORLD):
        self.data = dataset
        self.cmap = colorbrewer(self.data['hex_ident'],
                                alpha=200)
        self.time = self.data['timestamp'].min()
        self.painter = BatchPainter()
        self.view = bbox
    def draw(self, proj, mouse_x, mouse_y, ui_manager):
        self.painter = BatchPainter()
        df = self.data.where((self.data['timestamp']
                              > self.time)
                             & (self.data['timestamp']
                                <= self.time + 180))
9. Define a custom BoundingBox that focuses our view on this area, since the
dataset only contains data from the area around Leeds in the UK:
Figure 5.26: Final animated tracking map that displays the routes of the aircraft
Note
To access the source code for this specific section, please refer to
https://packt.live/3htmztU.
This section does not currently have an online interactive example, and will
need to be run locally.
You have now completed the custom layer activity using geoplotlib. We've applied
several preprocessing steps to shape the dataset as we want to have it. We've also
written a custom layer to display spatial data in the temporal space. Our custom layer
even has a level of animation. This is something we'll look into more in the following
chapter about Bokeh. We will now implement an activity that will help us get more
acquainted with custom layers in geoplotlib.
Activity 5.02: Visualizing City Density by the First Letter Using an Interactive
Custom Layer
In this last activity for geoplotlib, you'll combine all the methodologies learned in the
previous exercises and the activity to create an interactive visualization that displays
the cities that start with a given letter, by merely pressing the left and right arrow keys
on your keyboard.
Since we use the same setup to create custom layers as the library does, you will be
able to understand the library implementations of most of the layers provided by
geoplotlib after this activity.
5. Filter the dataset to only contain European cities by using the given
europe_country_codes list.
6. Compare the length of all data with the filtered data of Europe by printing the
length of both.
7. Filter down the European dataset to get a dataset that only contains cities that
start with the letter Z.
8. Print its length and the first five rows using the head method.
9. Create a dot density plot with a tooltip that shows the country code and the
name of the city separated by a -. Use the DataAccessObject to create a
copy of our dataset, which allows the use of f_tooltip. The following is the
expected output of the dot density plot:
10. Create a Voronoi plot with the same dataset that only contains cities that start
with Z. Use the 'Reds_r' color map and set the alpha value to 50 to make
sure you still see the map tiles. The following is the expected output of the
Voronoi plot:
11. Create a custom layer that plots all the cities in the Europe dataset that start with
the provided letter. Make it interactive so that by using the left and right arrow
keys, we can switch between the letters. To do that, first, filter the self.data
dataset in the invalidate method using the current letter acquired from the
start_letters array using self.start_letter indexing.
12. Create a new BatchPainter() object and project the lon and lat values
to x and y values. Use the BatchPainter object to paint the points on the
map with a size of 2.
13. Call the batch_draw() method in the draw method and use the
ui_manager to add an info dialog to the screen telling the user which starting
letter is currently being used.
15. Add the custom layer using the add_layer method and provide the given
europe_bbox as a BoundingBox class.
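The filtering and tooltip steps of this activity can be tried out on plain Python data
before they are moved into a layer. The following is a minimal sketch under stated
assumptions: the column names Country and City, the sample rows, and both helper
functions are made up for illustration, and the real dataset may be structured
differently.

```python
def cities_starting_with(cities, letter, country_codes):
    """Keep cities from the given countries whose name starts with letter."""
    return [city for city in cities
            if city['Country'] in country_codes
            and city['City'].lower().startswith(letter.lower())]


def tooltip(city):
    """Country code and city name separated by a dash."""
    return '%s - %s' % (city['Country'].upper(), city['City'].title())


# made-up rows; the column names are assumptions about the dataset
cities = [{'Country': 'de', 'City': 'zwickau'},
          {'Country': 'pl', 'City': 'zakopane'},
          {'Country': 'us', 'City': 'zanesville'},
          {'Country': 'de', 'City': 'aachen'}]
europe_country_codes = ['de', 'pl']

for city in cities_starting_with(cities, 'Z', europe_country_codes):
    print(tooltip(city))
```

In the custom layer, the same filter would run whenever the selected letter changes,
so that only the matching cities are projected and painted.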
The following is the expected output of the custom filter layer:
Figure 5.29: A custom filter layer displaying European cities starting with A
If we press the right arrow twice, we will see the cities starting with C instead:
Figure 5.30: A custom filter layer displaying European cities starting with C
Note
The solution for this activity can be found on page 447.
This last activity has a custom layer that uses all the properties described by
geoplotlib. All of the already provided layers by geoplotlib are created using the same
structure. This means that you're now able to dig into the source code and create
your own advanced layers.
306 | Making Things Interactive with Bokeh
Introduction
Bokeh is an interactive visualization library focused on modern browsers and the
web. Unlike Matplotlib or geoplotlib, the plots and visualizations we are going to
create in this chapter will be based on JavaScript widgets. Bokeh allows us to create
visually appealing plots and graphs nearly out of the box without much styling. In
addition to that, it helps us construct performant interactive dashboards based on
large static datasets or even streaming data.
Bokeh has been around since 2013, with version 1.4.0 being released in November
2019. It targets modern web browsers to present interactive visualizations to users
rather than static images. The following are some of the features of Bokeh:
• Supports multiple languages: Unlike Matplotlib and geoplotlib, Bokeh has
libraries for both Python and JavaScript, in addition to several other
popular languages.
• Beautiful chart styling: The tech stack is based on Tornado in the backend
and on BokehJS, Bokeh's own JavaScript library, in the frontend. Its visual style
is inspired by D3, a JavaScript library for creating outstanding visualizations,
which allows us to create beautiful plots without much custom styling.
Since we are using Jupyter Notebook throughout this book, it's worth mentioning that
Bokeh, including its interactivity, is natively supported in Notebook.
Concepts of Bokeh
The basic concept of Bokeh is, in some ways, comparable to that of Matplotlib. In
Bokeh, we have a figure as our root element, which has sub-elements such as a title,
an axis, and glyphs. Glyphs have to be added to a figure, which can take on different
shapes, such as circles, bars, and triangles. The following hierarchy shows the
different concepts of Bokeh:
Interfaces in Bokeh
The interface-based approach provides different levels of complexity for users who
either want to create some basic plots with very few customizable parameters or
want full control over their visualizations to customize every single element of their
plots.
Note
The models interface is the basic building block for all plots.
The following are the two levels of the layered approach to interfaces:
• bokeh.plotting
The vital thing to note here is that even though its setup is done automatically,
we can configure the sub-elements. When using this interface, the creation of
the scene graph used by BokehJS is handled automatically too.
• bokeh.models
This low-level interface is composed of two libraries: the JavaScript library called
BokehJS, which gets used for displaying the charts in the browser, and the core
plot creation Python code, which provides the developer interface. Internally, the
definition created in Python creates JSON objects that hold the declaration for
the JavaScript representation in the browser.
The models interface provides complete control over how Bokeh plots and
widgets (elements that enable users to interact with the data displayed) are
assembled and configured. This means that it is up to the developer to ensure
the correctness of the scene graph (a collection of objects describing
the visualization).
Output
Outputting Bokeh charts is straightforward. There are three ways this can be done:
• The .show() method: The primary option is to display the plot in an HTML page
using this method.
• The inline .show() method: When using inline plotting with a Jupyter
Notebook, the .show() method will allow you to display the chart inside
your Notebook.
The most powerful way of providing your visualization is through the use of the
Bokeh server.
Bokeh Server
Bokeh creates scene graph JSON objects that will be interpreted by the BokehJS
library to create the visualization output. This process gives you a unified format for
other languages to create the same Bokeh plots and visualizations, independently of
the language used.
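To make the scene graph idea more concrete, the following minimal sketch serializes a small plot into the JSON structure that BokehJS interprets. It assumes a Bokeh version that ships bokeh.embed.json_item; the data and the target id 'demo' are made up for illustration.

```python
import json

from bokeh.embed import json_item
from bokeh.plotting import figure

# A tiny plot with made-up data
plot = figure(title='Scene graph demo')
plot.line([1, 2, 3], [4, 6, 5], line_width=2)

# json_item returns a plain dict that describes the document for BokehJS;
# 'demo' is a hypothetical id of the HTML element that would host the plot
item = json_item(plot, 'demo')
serialized = json.dumps(item)

print(sorted(item.keys()))
print(len(serialized), 'characters of JSON')
```

Any client that can hand this JSON to BokehJS can render the plot, which is what makes the format independent of the language that produced it.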
To create more complex visualizations and leverage the tooling provided by Python,
we need a way to keep our visualizations in sync with one another. This way, we can
not only filter data but also do calculations and operations on the server-side, which
updates the visualizations in real-time.
In addition to that, since we will have an entry point for data, we can create
visualizations that get fed by streams instead of static datasets. This design provides a
way to develop more complex systems with even greater capabilities.
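As a rough sketch of what feeding a visualization from a stream can look like, the snippet below appends new points to a ColumnDataSource with its stream method. The data and the helper name push_new_points are illustrative assumptions; in a real Bokeh server application, the update would typically run inside a callback rather than at module level.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Data source that a server application would keep updating (made-up values)
source = ColumnDataSource(data=dict(x=[0, 1], y=[10, 12]))

plot = figure(title='Streaming demo')
plot.line('x', 'y', source=source, line_width=2)


def push_new_points(xs, ys):
    """Append new observations; every column of the source must be supplied."""
    source.stream(dict(x=xs, y=ys))


# Hypothetical update, standing in for data arriving from a stream
push_new_points([2, 3], [11, 15])
print(len(source.data['x']))
```

Because the glyph refers to the source by column name, it picks up the appended values without being recreated.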
Looking at the scheme of this architecture, we can see that the documents are
provided on the server-side, then moved over to the browser, which then inserts
them into the BokehJS library. This insertion will trigger the interpretation by BokehJS,
which will then create the visualization. The following diagram describes how the
Bokeh server works:
Presentation
In Bokeh, presentations help make the visualization more interactive by using
different features, such as interactions, styling, tools, and layouts.
Interactions
Probably the most exciting feature of Bokeh is its interactions. There are two types of
interactions: passive and active.
Passive interactions are actions that the users can take that don't change the
dataset. In Bokeh, this is called the inspector. As we mentioned before, the inspector
contains attributes such as zooming, panning, and hovering over data. This tooling
allows the user to inspect the data in more detail and might provide better insights
by allowing the user to observe a zoomed-in subset of the visualized data points. The
elements highlighted with a box in the following figure show the essential passive
interaction elements provided by Bokeh. They include zooming, panning, and
clipping data.
Active interactions are actions that directly change the displayed data. This includes
actions such as selecting subsets of data or filtering the dataset based on parameters.
Widgets are the most prominent of active interactions since they allow users to
manipulate the displayed data with handlers. Examples of available widgets are
buttons, sliders, and checkboxes.
Referring back to the subsection about the output styles, these widgets can be
used in both the so-called standalone applications in the browser and the Bokeh
server. This will help us consolidate the recently learned theoretical concepts and
make things more transparent. Some of the interactions in Bokeh are tab panes,
dropdowns, multi-selects, radio groups, text inputs, check button groups, data tables,
and sliders. The elements highlighted with a red box in the following figure show a
custom active interaction widget for the same plot we looked at in the example of
passive interaction.
Integrating
Embedding Bokeh visualizations can take two forms:
Bokeh is a little bit more complicated than Matplotlib with Seaborn and has its
drawbacks like every other library. Once you have the basic workflow down, however,
you're able to quickly extend basic visualizations with interactivity features to give
power to the user.
Note
One interesting feature is the to_bokeh method, which allows you to
plot Matplotlib figures with Bokeh without configuration overhead. Further
information about this method is available at
https://bokeh.pydata.org/en/0.12.3/docs/user_guide/compat.html.
In the following exercises and activities, we'll consolidate the theoretical knowledge
and build several simple visualizations to explain Bokeh and its two interfaces.
After we've covered the basic usage, we will compare the plotting and models
interfaces and work with widgets that add interactivity to the visualizations.
Basic Plotting
As mentioned before, the plotting interface of Bokeh gives us a higher-level
abstraction, which allows us to quickly visualize data points on a grid.
from bokeh.io import output_notebook
output_notebook()
Before we can create a plot, we need to import the dataset. In the examples in this
chapter, we will work with a computer hardware dataset. It can be imported by using
pandas' read_csv method.
The basic flow when using the plotting interface is comparable to that of
Matplotlib. We first create a figure. This figure is then used as a container to define
elements and call methods on:
show(plot)
Once we have created a new figure instance using the imported figure() method,
we can use it to draw lines, circles, or any glyph objects that Bokeh offers. Note that
the first two arguments of the plot.line method are the x and y datasets, which must
contain an equal number of elements.
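The flow described above can be sketched as follows. The small DataFrame is a made-up stand-in for the computer hardware dataset, and the column name cach is borrowed from later code in this chapter; the titles and labels are illustrative.

```python
import pandas as pd
from bokeh.plotting import figure

# Made-up stand-in for the computer hardware dataset
dataset = pd.DataFrame({'cach': [256, 32, 64, 128, 16]})
x_values = list(dataset.index)
y_values = list(dataset['cach'])

# The figure is the container; glyphs are added by calling methods on it
plot = figure(title='Cache per Hardware',
              x_axis_label='Hardware index',
              y_axis_label='Cache Memory')

# x and y must contain the same number of elements
plot.line(x_values, y_values, line_width=5)

# show(plot)  # import show from bokeh.plotting and uncomment to render
```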
To display the plot, we then call the show() method we imported from the bokeh.
plotting interface earlier on. The following figure shows the output of the
preceding code:
Figure 6.5: Line plot showing the cache memory of different hardware
Since the interface of different plotting types is unified, scatter plots can be created in
the same way as line plots:
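A minimal sketch of such a scatter plot, again with made-up values in place of the hardware dataset. It uses the generic scatter method with a circle marker, which is one of several ways to draw point glyphs in Bokeh.

```python
from bokeh.plotting import figure

index = [0, 1, 2, 3, 4]
cache = [256, 32, 64, 128, 16]  # made-up values

plot = figure(title='Cache per Hardware',
              x_axis_label='Hardware index',
              y_axis_label='Cache Memory')

# Same call shape as plot.line, but one marker is drawn per data point
plot.scatter(index, cache, size=6, marker='circle')

# show(plot)  # import show from bokeh.plotting and uncomment to render
```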
Figure 6.6: Scatter plot showing the cache memory of different hardware
show(plot)
Figure 6.7: Line plots displaying the cache memory and cycle time per
hardware with the legend
When looking at the preceding example, we can see that once we have several lines,
the visualization can get cluttered.
We can give the user the ability to mute, meaning defocus, the clicked element in
the legend.
plot.legend.click_policy="mute"
show(plot)
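For context, a self-contained sketch of a plot with a mutable legend might look like the following. The values are made up, and muted_alpha is an optional glyph argument that controls how faint a muted line becomes.

```python
from bokeh.plotting import figure

index = [0, 1, 2, 3, 4]
cache = [256, 32, 64, 128, 16]       # made-up values
cycle_time = [125, 29, 29, 26, 23]   # made-up values

plot = figure(title='Attributes per Hardware',
              x_axis_label='Hardware index',
              y_axis_label='Attribute Value')

# Each glyph gets a legend entry; muted_alpha sets its opacity when muted
plot.line(index, cache, line_width=3, legend_label='Cache Memory',
          muted_alpha=0.2)
plot.line(index, cycle_time, line_width=3, color='red',
          legend_label='Cycle time', muted_alpha=0.2)

# Clicking a legend entry now toggles the muted state of its glyph
plot.legend.click_policy = 'mute'

# show(plot)  # import show from bokeh.plotting and uncomment to render
```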
Figure 6.8: Line plots displaying the cache memory and cycle time per hardware with a
mutable legend; cycle time is also muted
Note
All the exercises and activities in this chapter are developed using
Jupyter Notebook. The files can be downloaded from the following link:
https://packt.live/39txwH5. All the datasets used in this chapter can be found
at https://packt.live/3bzApYN.
import pandas as pd
from bokeh.plotting import figure, show
3. Import and call the output_notebook method from the io interface of Bokeh
to display the plots inside a Jupyter Notebook:
dataset = pd.read_csv('../../Datasets/world_population.csv', \
index_col=0)
5. Verify that our data has been successfully loaded by calling head on
our DataFrame:
dataset.head()
Figure 6.9: Loading the top five rows of the world_population dataset
using the head method
6. Populate our x-axis and y-axis with some data extraction. The x-axis will hold all
the years that are present in our columns. The y-axis will hold the population
density values of the countries. Start with Germany:
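One possible way to do this extraction is sketched below. The tiny DataFrame only mimics the assumed shape of world_population.csv (one row per country, one column per year, plus a descriptive column), and its values are made up; the names years and de_vals match those used in the plotting code of the next step.

```python
import pandas as pd

# Made-up excerpt shaped like the dataset: countries as index,
# one descriptive column, then one column per year
dataset = pd.DataFrame(
    {'Country Code': ['DEU', 'CHE'],
     '1961': [210.2, 137.5],
     '1962': [212.0, 140.2],
     '1963': [214.0, 143.1]},
    index=['Germany', 'Switzerland'])

# Year columns are the ones whose name starts with a digit
years = [col for col in dataset.columns if col[0].isdigit()]

# One density value per year for a single country
de_vals = [dataset.loc['Germany', year] for year in years]

print(years)
print(de_vals)
```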
7. After extracting the necessary data, create a new plot by calling the Bokeh
figure method. Provide parameters such as title, x_axis_label, and
y_axis_label to define the descriptions displayed on our plot. Once our
plot is created, we can add glyphs to it. Here, we will use a simple line. Set the
legend_label parameter next to the x and y values to get an informative
legend in our visualization:
"""
plotting the population density change in Germany in the given years
"""
plot = figure(title='Population Density of Germany', \
x_axis_label='Year', \
y_axis_label='Population Density')
plot.line(years, de_vals, line_width=2, legend_label='Germany')
show(plot)
Figure 6.10: Creating a line plot from the population density data of Germany
8. Now add another country—in this case, Switzerland. Use the same technique
that we used with Germany to extract the data for Switzerland:
9. We can add several layers of glyphs to our figure plot. We can also
stack different glyphs on top of one another to enrich the visualization.
Add an orange line to the plot that displays the data from
Switzerland. Also, plot orange circles for each data point of the ch_vals list
and assign them the same legend_label to combine both representations, the
line and the circles:
"""
plotting the data for Germany and Switzerland in one visualization,
adding circles for each data point for Switzerland
"""
plot = \
figure(title='Population Density of Germany and Switzerland', \
x_axis_label='Year', y_axis_label='Population Density')
plot.line(years, de_vals, line_width=2, legend_label='Germany')
plot.line(years, ch_vals, line_width=2, color='orange', \
legend_label='Switzerland')
plot.circle(years, ch_vals, size=4, line_color='orange', \
fill_color='white', legend_label='Switzerland')
show(plot)
10. When looking at a larger amount of data for different countries, it makes sense
to have a plot for each of them separately. Use gridplot layout:
"""
plotting the Germany and Switzerland plot in two different
visualizations that are interconnected in terms of view port
"""
from bokeh.layouts import gridplot
plot_de = figure(title='Population Density of Germany', \
x_axis_label='Year', \
y_axis_label='Population Density', \
plot_height=300)
Figure 6.12: Using a gridplot to display the country plots next to each other
Figure 6.13: Using the gridplot method to arrange the visualizations vertically
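The two arrangements can be sketched as follows. The data is made up, and sharing the ranges of the first plot is one assumed way of interconnecting the viewports; the nested list passed to gridplot describes the rows of the grid.

```python
from bokeh.layouts import gridplot
from bokeh.plotting import figure

years = [1961, 1962, 1963]        # made-up data
de_vals = [210.2, 212.0, 214.0]
ch_vals = [137.5, 140.2, 143.1]

plot_de = figure(title='Population Density of Germany',
                 x_axis_label='Year', y_axis_label='Population Density')

# Reusing the ranges of the first plot links panning and zooming
plot_ch = figure(title='Population Density of Switzerland',
                 x_axis_label='Year', y_axis_label='Population Density',
                 x_range=plot_de.x_range, y_range=plot_de.y_range)

plot_de.line(years, de_vals, line_width=2)
plot_ch.line(years, ch_vals, line_width=2)

# One inner list per row: a single row places the plots next to each other
side_by_side = gridplot([[plot_de, plot_ch]])

# One plot per row stacks them vertically
stacked = gridplot([[plot_de], [plot_ch]])

# show(side_by_side)  # import show from bokeh.plotting to render
```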
Note
To access the source code for this specific section, please refer to
https://packt.live/2Beg0KY.
We have now covered the very basics of Bokeh. Using the plotting interface makes
it easy to get some quick visualizations in place. This helps you understand the data
you're working with.
This simplicity is achieved by abstracting away complexity, and we lose much control
by using the plotting interface. In the next exercise, we'll compare the plotting
and models interfaces to show you how much abstraction is added to plotting.
import numpy as np
import pandas as pd
from bokeh.io import output_notebook
output_notebook()
dataset = pd.read_csv('../../Datasets/world_population.csv', \
index_col=0)
4. Call head on our DataFrame to verify that our data has been
successfully loaded:
dataset.head()
Figure 6.14: Loading the top five rows of the world_population dataset
using the head method
6. Create three lists that have years present in the dataset, the mean population
density for the whole dataset for each year, and the mean population density
per year for Japan:
7. Use the plot element and apply our glyphs elements to it. Plot the global mean
with a line and the mean of Japan with crosses. Set the legend location to the
bottom-right corner:
plot = \
figure(title='Global Mean Population Density compared to Japan', \
x_axis_label='Year', y_axis_label='Population Density')
plot.legend.location = 'bottom_right'
show(plot)
Figure 6.15: Line plots comparing the global mean population density with that of Japan
The models interface is of a much lower level than other interfaces. We can
already see this when looking at the list of imports we need for a
comparable plot.
9. Before we build our plot, we have to find the min and max values for the y-axis
since we don't want to have too large or too small a range of values. Get all the
mean values for global and Japan without any invalid values. Get their smallest
and largest values and pass them to the constructor of Range1d. For the x-axis,
our list of years is pre-defined:
# skip the first and last entries when collecting the values for Japan
extracted_jp_vals = \
    [jp_val['Japan'] for i, jp_val in enumerate(jp_vals) \
     if i not in [0, len(jp_vals) - 1]]
min_pop_density = min(extracted_mean_pop_vals)
min_jp_density = min(extracted_jp_vals)
min_y = int(min(min_pop_density, min_jp_density))
max_pop_density = max(extracted_mean_pop_vals)
max_jp_density = max(extracted_jp_vals)
max_y = int(max(max_jp_density, max_pop_density))
xdr = Range1d(int(years[0]), int(years[-1]))
ydr = Range1d(min_y, max_y)
10. Next, create two Axis objects, which will be used to display the axis lines and
the label for the axis. Since we also want ticks between the different values, pass
in a Ticker object that creates this setup:
11. Create the title by passing a Title object to the title attribute of the
Plot object:
# creating the plot object
title = \
Title(align = 'left', \
text = 'Global Mean Population Density compared to Japan')
plot = Plot(x_range=xdr, y_range=ydr, plot_width=650, \
plot_height=600, title=title)
12. Try to display our plot now by using the show method. Since we have no
renderers defined at the moment, we will get an error. We need to add elements
to our plot:
"""
error will be thrown because we are missing renderers that are
created when adding elements
"""
show(plot)
13. Insert the data into a DataSource object. This can then be used to map the
data source to the glyph object that will be displayed in the plot:
14. Use the right add method to add objects to the plot. For layout elements such as
the Axis objects, use the add_layout method. Glyphs, which display our data,
have to be added with the add_glyph method:
plot.add_layout(x_axis, 'below')
plot.add_layout(y_axis, 'left')
line_renderer = plot.add_glyph(line_source, line_glyph)
cross_renderer = plot.add_glyph(cross_source, cross_glyph)
15. Show our plot again to see our lines are in place:
show(plot)
Figure 6.17: A models interface-based plot displaying the lines and axes
16. Use a Legend object to add a legend to the plot. Each LegendItem object will be
displayed in one line in the legend:
17. Create the grid by instantiating two Grid objects, one for each axis. Provide the
tickers of the previously created x and y axes:
18. Finally, use the add_layout method to add the grid and the legend to our plot.
After this, display our complete plot, which will look like the one that took only
four lines of code to create in the first task:
plot.add_layout(legend)
plot.add_layout(x_grid)
plot.add_layout(y_grid)
show(plot)
Figure 6.18: Full recreation of the visualization done with the plotting interface
As you can see, the models interface should not be used for simple plots. It's
meant to provide the full power of Bokeh to experienced users that have specific
requirements that need more than the plotting interface.
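To see all the pieces of the models interface in one place, here is a condensed sketch of such a plot. It is not the exercise's exact code: the yearly values are made up, and the crosses are drawn with the generic Scatter glyph and a cross marker, which is an assumption chosen because it is available across recent Bokeh versions.

```python
from bokeh.models import (BasicTicker, ColumnDataSource, Grid, Legend,
                          LegendItem, Line, LinearAxis, Plot, Range1d,
                          Scatter, Title)

# Made-up yearly values standing in for the global mean and for Japan
years = [1961, 1962, 1963, 1964]
mean_vals = [110.0, 112.5, 115.0, 118.0]
jp_vals = [250.0, 253.0, 256.0, 259.0]

# Explicit ranges, computed from the smallest and largest values
xdr = Range1d(years[0], years[-1])
ydr = Range1d(min(mean_vals + jp_vals), max(mean_vals + jp_vals))

title = Title(text='Global Mean Population Density compared to Japan',
              align='left')
plot = Plot(x_range=xdr, y_range=ydr, title=title)

# Axes carry their own ticker; the grids reuse it further down
x_axis = LinearAxis(axis_label='Year', ticker=BasicTicker())
y_axis = LinearAxis(axis_label='Population Density', ticker=BasicTicker())
plot.add_layout(x_axis, 'below')
plot.add_layout(y_axis, 'left')

# Data sources map column names to the fields referenced by the glyphs
line_source = ColumnDataSource(dict(x=years, y=mean_vals))
cross_source = ColumnDataSource(dict(x=years, y=jp_vals))
line_glyph = Line(x='x', y='y', line_width=2)
cross_glyph = Scatter(x='x', y='y', marker='cross', size=8,
                      line_color='red')
line_renderer = plot.add_glyph(line_source, line_glyph)
cross_renderer = plot.add_glyph(cross_source, cross_glyph)

# One LegendItem per line in the legend
legend = Legend(
    items=[LegendItem(label='Global Mean', renderers=[line_renderer]),
           LegendItem(label='Japan', renderers=[cross_renderer])],
    location='bottom_right')

x_grid = Grid(dimension=0, ticker=x_axis.ticker)
y_grid = Grid(dimension=1, ticker=y_axis.ticker)

plot.add_layout(legend)
plot.add_layout(x_grid)
plot.add_layout(y_grid)

# show(plot)  # import show from bokeh.io or bokeh.plotting to render
```

Every object that the plotting interface would create implicitly (ranges, axes, tickers, grids, legend) has to be built and attached by hand here, which is the trade-off for full control.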
Note
To access the source code for this specific section, please refer to
https://packt.live/3fq8pIf.
We have now looked at the difference between the high-level plotting and low-level
models interfaces. This will help us understand the internal workings and
potential future errors better. In the following activity, we'll use what we've already
learned to create a basic visualization that plots the mean car price of each
manufacturer from our dataset.
Next, we will color each data point with a color based on a given value. In Bokeh, like
in geoplotlib, this can be done using ColorMapper.
ColorMapper can map specific values to a given color in the selected spectrum. By
providing the minimum and maximum value for a variable, we define the range in
which colors are returned:
color_mapper = LinearColorMapper(palette='Magma256', \
low=min(dataset['cach']), \
high=max(dataset['cach']))
show(plot)
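A self-contained sketch of this idea is shown below. The DataFrame is a made-up stand-in for the hardware dataset, and passing a dictionary with field and transform keys as the color is one way to connect a data column to the mapper.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, LinearColorMapper
from bokeh.plotting import figure

# Made-up stand-in for the hardware dataset
dataset = pd.DataFrame({'cach': [256, 32, 64, 128, 16]})

# Values between low and high are spread across the palette
color_mapper = LinearColorMapper(palette='Magma256',
                                 low=float(dataset['cach'].min()),
                                 high=float(dataset['cach'].max()))

source = ColumnDataSource(data=dict(x=list(dataset.index),
                                    cach=list(dataset['cach'])))

plot = figure(title='Cache per Hardware',
              x_axis_label='Hardware index',
              y_axis_label='Cache Memory')

# The color of each marker is looked up from its 'cach' value
plot.scatter(x='x', y='cach', size=10, source=source,
             color={'field': 'cach', 'transform': color_mapper})

# show(plot)  # import show from bokeh.plotting and uncomment to render
```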
Next, we will implement all the concepts related to Bokeh we have learned so far.
Note that we will use only the make and price columns in our activity.
In the process, we will first plot all cars with their prices and then slowly develop
a more sophisticated visualization that also uses color to visually highlight the
manufacturers with the highest mean prices.
3. Load the automobiles.csv dataset from the Datasets folder using pandas.
Make sure that the dataset is loaded by displaying the first five elements of
the dataset.
5. Add a new column index to our dataset by assigning it to the values from our
dataset.index.
6. Create a new figure and plot each car using a scatter plot with the index and
price column. Give the visualization a title of Car prices and name the x-axis
Car Index. The y-axis should be named Price.
Grouping cars from manufacturers together
7. Group the dataset using groupby and the column make. Then use the mean
method to get the mean value for each column. We don't want the make
column to be used as an index, so provide the as_index=False
argument to groupby.
Adding color
12. Plot each manufacturer and provide a size argument with a size of 15.
13. Provide the color argument to the scatter method and use the field and
transform attributes to provide the column (y) and the color_mapper.
14. Set the label orientation to vertical.
Figure 6.20: Final visualization displaying the mean car price for each manufacturer
Note
The solution for this activity can be found on page 456.
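As a hint for the grouping step, the following sketch shows how groupby behaves with as_index=False. The rows are made up, and selecting the price column before calling mean is a choice made here so that the example only averages numeric data.

```python
import pandas as pd

# Made-up excerpt with the two columns used in the activity
dataset = pd.DataFrame({
    'make': ['audi', 'audi', 'bmw', 'bmw', 'honda'],
    'price': [13950, 17450, 16430, 24565, 6479]})

# as_index=False keeps 'make' as a regular column instead of the index
grouped_average = dataset.groupby('make', as_index=False)['price'].mean()

print(grouped_average)
```

Keeping make as a column means it can be passed directly as the categorical x values when plotting one point per manufacturer.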
In the next section, we will create interactive visualizations that allow the user to
modify the data that is displayed.
Adding Widgets
One of the most powerful features of Bokeh is the ability to use widgets to
interactively change the data that's displayed in a visualization. To understand the
importance of interactivity in your visualizations, imagine seeing a static visualization
about stock prices that only shows data for the last year.
If you're interested in seeing the current year or even visually comparing it to the
recent and coming years, static plots won't be suitable. You would need to create one
plot for every year or even overlay different years on one visualization, which would
make it much harder to read.
Comparing this to a simple plot that lets the user select the date range they want, we
can already see the advantages. You can guide the user by restricting values and only
displaying what you want them to see. Developing a story behind your visualization is
very important, and doing this is much easier if the user has ways of interacting with
the data.
Bokeh widgets work best when used in combination with the Bokeh server. However,
using the Bokeh server approach is beyond the scope of this book, since we would
need to work with plain Python files. Instead, we will use a hybrid approach that
only works with the Jupyter Notebook.
We will look at the different widgets and how to use them before going in and
building a basic plot with one of them. There are a few different options regarding
how to trigger updates, which are also explained in this section. The widgets that will
be covered in the following exercise are explained in the following table:
The general way to create a new widget visible in a Jupyter Notebook is to define
a new method and wrap it into an interact widget. We'll be using the "syntactic
sugar" way of adding a decorator to a method—that is, by using annotations. This will
give us an interactive element that will be displayed after the executable cell, like in
the following example:
In the preceding example, we first import the interact element from the
ipywidgets library. This then allows us to define a new method and annotate it
with the @interact decorator.
The Value attribute tells the interact element which widget to use based on the
data type of the argument. In our example, we provide a string, which will give us a
TextBox widget. We can refer to the preceding table to determine which Value
data type will return which widget.
The print statement in the preceding code prints whatever has been entered in the
textbox below the widget.
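A minimal sketch consistent with this description, assuming ipywidgets is installed and the code runs in a Jupyter Notebook cell. The function name text_input and the default string are illustrative choices.

```python
from ipywidgets import interact


# The type of the value selects the widget: a string gives a text box
@interact(Value='Input Text')
def text_input(Value):
    # Called again whenever the content of the text box changes
    print(Value)
```

Passing a boolean, a number, or a list instead of the string would produce a different widget in the same way.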
Note
The methods that we can use with interact always have the same structure.
We will look at several examples in the following exercise.
3. In this first task, we will add interactive widgets to the interactive element of
IPython. Import the necessary interact and interact_manual elements
from ipywidgets:
4. Create a checkbox widget and print out the result of the interactive element:
@interact(Value=False)
def checkbox(Value=False):
    print(Value)
Figure 6.23: Interactive checkbox that will switch from False to True if checked
Note
@interact() is called a decorator. It wraps the annotated method into
the interact component. This allows us to display and react to the change of
the drop-down menu. The method will be executed every time the value of
the dropdown changes.
@interact(Value=options)
def dropdown(Value=options[0]):
    print(Value)
7. Create two widgets, a dropdown and a checkbox with the same value, as in the
last two tasks:
@interact(Select=options, Display=False)
def uif(Select, Display):
    print(Select, Display)
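9. Create an int slider with 0 and 100 as its min and max values and pass it to the
@interact decorator. Set continuous_update to False to only trigger an update on
mouse release:
# assumes widgets was imported, for example: from ipywidgets import widgets
slider = widgets.IntSlider(min=0, max=100, continuous_update=False)
@interact(Value=slider)
def slider(Value=0.0):
    print(Value)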
Figure 6.28: Interactive int slider that only triggers upon mouse release
Note
Although the outputs of Figure 6.27 and Figure 6.28 look the same, in Figure
6.28, the slider triggers only upon mouse release.
Note
Compared to the previous cells, this one contains the interact_manual
decorator instead of interact. This will add an execution button that
will trigger the update of the value instead of triggering with every change.
This can be really useful when working with larger datasets, where the
recalculation time would be large. Because of this, you don't want to trigger
the execution for every small step, but only once you have selected the
correct value.
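A small sketch of this pattern, assuming a Jupyter Notebook with ipywidgets installed. The function name manual_slider is illustrative, and the (0, 100) tuple is the ipywidgets shorthand for an integer slider with those bounds.

```python
from ipywidgets import interact_manual


# (0, 100) is shorthand for an integer slider from 0 to 100
@interact_manual(Value=(0, 100))
def manual_slider(Value=0):
    # Runs only when the generated button is clicked
    print(Value)
```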
Note
To access the source code for this specific section, please refer to
https://packt.live/3e8G60B.
After looking at several example widgets and how to create and use them in the
previous exercise, we will now use a real-world stock_price dataset to create a
basic plot and add simple interactive widgets.
The dataset of this exercise is a stock_prices dataset. This means that we will be
looking at data over a range of time. As this is a large and variable dataset, it will be
easier to show and explain widgets such as slider and dropdown on it. The dataset
is available in the Datasets folder of the GitHub repository; here is the link to it:
https://packt.live/3bzApYN. Follow these steps:
import pandas as pd
4. After downloading the dataset and moving it into the Datasets folder of this
chapter, import our stock_prices.csv data:
dataset = pd.read_csv('../../Datasets/stock_prices.csv')
5. Test whether the data has been loaded successfully by executing the head
method on the dataset:
dataset.head()
Figure 6.30: Loading the top five rows of the stock_prices dataset using the head method
Since the date column has no information about the hour, minute, and second,
we want to avoid displaying them in the visualization later on and display the
year, month, and day.
6. Create a new column that holds the formatted short version of the date value.
Print out the first five rows of the dataset to see the new column, short_date:
dataset.head()
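One way to derive such a column is sketched below. The rows are made up, and the timestamp format of the date column in stock_prices.csv is an assumption; the column names date, symbol, high, and short_date follow the ones used in this exercise.

```python
import pandas as pd

# Made-up rows; the real date format in the CSV file is an assumption
dataset = pd.DataFrame({
    'date': ['2016-01-05 00:00:00', '2016-01-06 00:00:00'],
    'symbol': ['AAPL', 'AAPL'],
    'high': [105.85, 102.37]})

# Parse the timestamps once, then keep only year, month, and day
parsed = pd.to_datetime(dataset['date'])
dataset['short_date'] = parsed.dt.strftime('%Y-%m-%d')

print(dataset.head())
```

Using the vectorized dt accessor avoids a Python-level loop over every row, which matters for a dataset of this size.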
Note
The execution of the cell will take a moment since it's a fairly large dataset.
Please be patient.
In this task, we will create a basic visualization with the stock price dataset.
This will be your first interactive visualization in which you can dynamically
change the stock that is displayed in the graph. We will get used to one of the
aforementioned interactive widgets: the drop-down menu. It will be the main
point of interaction for our visualization.
7. Import the already-familiar figure and show methods from the plotting
interface. Since we also want to have a panel with two tabs displaying different
plot styles, also import the Panel and Tabs classes from the models interface:
8. Create two tabs. The first tab will contain a line plot of the given data, while the
second will contain a circle-based representation of the same data. Create a
legend that will display the name of the currently viewed stock:
# assumed signature: stock is the DataFrame slice of a single symbol
def get_plot(stock):
    # name of the stock, used as the legend label
    stock_name = stock['symbol'].unique()[0]
    line_plot = figure(title='Stock prices', \
                       x_axis_label='Date', \
                       x_range=stock['short_date'], \
                       y_axis_label='Price in $USD')
    line_plot.line(stock['short_date'], stock['high'], \
                   legend_label=stock_name)
    line_plot.xaxis.major_label_orientation = 1
    circle_plot = figure(title='Stock prices', \
                         x_axis_label='Date', \
                         x_range=stock['short_date'], \
                         y_axis_label='Price in $USD')
    circle_plot.circle(stock['short_date'], stock['high'], \
                       legend_label=stock_name)
    circle_plot.xaxis.major_label_orientation = 1
    line_tab = Panel(child=line_plot, title='Line')
    circle_tab = Panel(child=circle_plot, title='Circles')
    tabs = Tabs(tabs=[line_tab, circle_tab])
    return tabs
9. Get a list of all the stock names in our dataset by using the unique method for
our symbol column:
Once we have done this, use this list as an input for the interact element.
10. Add the drop-down widget in the decorator and call the method that returns our
visualization in the show method with the selected stock. Only provide the first
25 entries of each stock. By default, the stock of Apple should be displayed; its
symbol in the dataset is AAPL. This will give us a visualization that is displayed
in a pane with two tabs. The first tab will display an interpolated line, and the
second tab will display the values as circles:
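The whole flow can be sketched in a self-contained form as follows. The rows are made up, the helper names base_figure and get_stock_for are illustrative, and the import fallback is an assumption that accounts for newer Bokeh releases naming the tab container TabPanel instead of Panel. The widget wiring is shown as comments because it only makes sense inside a notebook.

```python
import pandas as pd
from bokeh.models import Tabs
from bokeh.plotting import figure

try:  # newer Bokeh releases call the tab container TabPanel
    from bokeh.models import TabPanel
except ImportError:  # older releases call it Panel
    from bokeh.models import Panel as TabPanel

# Made-up rows shaped like the stock price data
dataset = pd.DataFrame({
    'symbol': ['AAPL', 'AAPL', 'AAPL', 'MSFT', 'MSFT'],
    'short_date': ['2016-01-05', '2016-01-06', '2016-01-07',
                   '2016-01-05', '2016-01-06'],
    'high': [105.85, 102.37, 100.13, 55.39, 54.40]})


def base_figure(dates):
    """Create an empty figure with a categorical date axis."""
    plot = figure(title='Stock prices', x_axis_label='Date',
                  x_range=dates, y_axis_label='Price in $USD')
    plot.xaxis.major_label_orientation = 1
    return plot


def get_plot(stock):
    """Return a two-tab layout (line and markers) for one stock."""
    stock_name = stock['symbol'].unique()[0]
    dates = list(stock['short_date'])
    highs = list(stock['high'])

    line_plot = base_figure(dates)
    line_plot.line(dates, highs, legend_label=stock_name)

    circle_plot = base_figure(dates)
    circle_plot.scatter(dates, highs, marker='circle',
                        legend_label=stock_name)

    return Tabs(tabs=[TabPanel(child=line_plot, title='Line'),
                      TabPanel(child=circle_plot, title='Circles')])


stock_names = list(dataset['symbol'].unique())
layout = get_plot(dataset[dataset['symbol'] == 'AAPL'][:25])

# In a notebook, the dropdown could be wired up roughly like this:
# from ipywidgets import interact
# from bokeh.plotting import show
# @interact(Stock=stock_names)
# def get_stock_for(Stock='AAPL'):
#     show(get_plot(dataset[dataset['symbol'] == Stock][:25]))
```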
The following screenshot shows the output of the code in step 11:
Note
We can already see that each date is displayed on the x-axis. If we want to
display a bigger time range, we have to customize the ticks on our x-axis.
This can be done using ticker objects.
Note
To access the source code for this specific section, please refer to
https://packt.live/3fnfPvI.
We have now covered the very basics of widgets and how to use them in a
Jupyter Notebook.
Note
If you want to learn more about using widgets and which widgets can be
used in Jupyter, visit
https://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html and
https://ipywidgets.readthedocs.io/en/stable/examples/Widget%20List.html.
In the following activity, we will make use of the Bokeh DataSource to add a
tooltip overlay to our plot that is displayed upon hovering over the data points.
DataSource can be helpful in several cases, for example, displaying a tooltip when
hovering over the data points. In most cases, we can use pandas DataFrames to feed data
into our plot, but for certain features, such as tooltips, we have to use DataSource:
data_source = \
ColumnDataSource(data=dict(vendor_name=dataset['vendor_name'], \
model=dataset['model'], \
cach=dataset['cach'], \
x=dataset['index'], \
y=dataset['cach']))
show(plot)
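A complete, self-contained sketch of a tooltip overlay could look like this. The rows are made up, and attaching a HoverTool with add_tools is one of several ways to configure tooltips; each '@name' entry refers to a column of the data source.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Made-up rows standing in for the hardware dataset
dataset = pd.DataFrame({
    'vendor_name': ['adviser', 'amdahl', 'amdahl'],
    'model': ['32/60', '470v/7', '470v/7a'],
    'cach': [256, 32, 32]})

data_source = ColumnDataSource(data=dict(
    vendor_name=list(dataset['vendor_name']),
    model=list(dataset['model']),
    cach=list(dataset['cach']),
    x=list(dataset.index),
    y=list(dataset['cach'])))

plot = figure(title='Cache per Hardware',
              x_axis_label='Hardware index',
              y_axis_label='Cache Memory')
plot.scatter('x', 'y', size=10, source=data_source)

# Each tuple is (label shown in the tooltip, '@' + column name)
hover = HoverTool(tooltips=[('Vendor', '@vendor_name'),
                            ('Model', '@model'),
                            ('Cache', '@cach')])
plot.add_tools(hover)

# show(plot)  # import show from bokeh.plotting and uncomment to render
```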
Figure 6.34: Cache memory plotted as dots with tooltip overlay displaying the vendor,
model, and amount of memory
We want to use the nationality, gold, silver, and bronze columns to create
a custom visualization that lets us dig through the Olympians.
Our visualization will display each country that participated in a coordinate system
where the x-axis represents the number of medals won and the y-axis represents the
number of athletes. Using interactive widgets, we will be able to filter the displayed
countries by both the maximum number of medals won and the maximum number
of athletes.
Figure 6.35: Final interactive visualization that displays the scatter plot
There are many options when it comes to choosing which interactivity to use. We will
focus on only two widgets to make it easier for you to understand the concepts. In
the end, we will have a visualization that allows us to filter countries by the number
of medals and athletes they placed in the Olympics and, upon hovering over a single
data point, receive more information about each country:
3. Import figure and show from Bokeh and interact and widgets from
ipywidgets to get started.
4. Load our olympia2016_athletes.csv dataset from the Datasets folder
and set up the interaction elements. Scroll down until you reach the cell that says
getting the max number of medals and athletes of all countries. Extract the
two numbers from the dataset.
6. Set up the @interact method, which will display the complete visualization.
The only code we will write here is to show the return value of the get_plot
method that gets all the interaction element values as parameters.
7. Implement the decorator method, move up in the Notebook, and work on the
get_plot method.
8. First, filter our countries dataset, which contains all the countries that placed
athletes in the Olympic games. Keep only those whose number of medals and number
of athletes are less than or equal to the max values passed as arguments.
9. Create our DataSource and use it for the tooltips and the drawing of the
circle glyphs.
10. After that, create a new plot using the figure method that has the following
attributes: title set to Rio Olympics 2016 - Medal comparison,
x_axis_label set to Number of Medals, and y_axis_label set to Num
of Athletes.
11. Execute every cell starting from the get_plot cell to the bottom—again,
making sure that all implementations are captured.
12. When executing the cell that contains the @interact decorator, you will
see a scatter plot that displays a circle for every country. Hovering over a circle
displays additional information, such as the shortcode of the country, the number
of athletes, and the number of gold, silver, and bronze medals.
Note
The solution for this activity can be found on page 465.
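Steps 8 to 10 can be sketched roughly as follows. This is not the book's solution, only one possible shape for get_plot under stated assumptions: the `countries` table, its `medals` and `athletes` columns, the placeholder country codes and numbers, and the tooltip labels are all hypothetical, and the glyphs are drawn with Bokeh's `scatter` method, whose default marker is a circle.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical per-country table (placeholder codes and numbers, not real results).
countries = pd.DataFrame({
    "nationality": ["AAA", "BBB", "CCC"],
    "gold": [2, 10, 40],
    "silver": [3, 15, 40],
    "bronze": [5, 25, 40],
    "medals": [10, 50, 120],
    "athletes": [30, 200, 500],
})


def get_plot(max_athletes, max_medals):
    """Return a scatter plot of the countries within both limits."""
    # Step 8: keep countries at or below both maximum values.
    within_limits = (countries["medals"] <= max_medals) & (
        countries["athletes"] <= max_athletes
    )
    # Step 9: one data source feeds both the glyphs and the tooltips.
    source = ColumnDataSource(countries[within_limits])

    # Step 10: figure with the title and axis labels named in the activity.
    plot = figure(
        title="Rio Olympics 2016 - Medal comparison",
        x_axis_label="Number of Medals",
        y_axis_label="Num of Athletes",
        tooltips=[
            ("Country", "@nationality"),
            ("Athletes", "@athletes"),
            ("Gold", "@gold"),
            ("Silver", "@silver"),
            ("Bronze", "@bronze"),
        ],
    )
    plot.scatter(x="medals", y="athletes", source=source, size=10)
    return plot


# In a notebook, the sliders could be wired up like this (left commented out
# so the sketch runs without ipywidgets; slider ranges are assumptions):
# from bokeh.io import output_notebook, show
# from ipywidgets import interact, widgets
# output_notebook()
# @interact(max_athletes=widgets.IntSlider(value=500, min=0, max=500),
#           max_medals=widgets.IntSlider(value=120, min=0, max=120))
# def update(max_athletes, max_medals):
#     show(get_plot(max_athletes, max_medals))
```

Keeping the filtering and plotting inside get_plot means the decorated function only has to show the returned plot, which matches the division of work described in steps 6 and 7.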
As we mentioned before, when working with interactive features and Bokeh, you
might want to read up about the Bokeh server a little bit more. It will give you more
options to express your creativity by creating animated plots and visualizations that
can be explored by several people at the same time.
Summary
In this chapter, we have looked at another option for creating visualizations with a
whole new focus: web-based Bokeh plots. We also discovered ways in which we can
make our visualizations more interactive and give the user the chance to explore data
in a different way.
As we mentioned in the first part of this chapter, Bokeh is a comparatively new tool
that empowers developers to use their favorite language to create easily portable
visualizations for the web. After working with Matplotlib, Seaborn, geoplotlib, and
Bokeh, we can see some standard interfaces and similar ways to work with those
libraries. After studying the tools that are covered in this book, it will be simple to
understand new plotting tools.
In the next and final chapter, we will introduce a new real-life dataset to create
visualizations. This last chapter will allow you to consolidate the concepts and tools
that you have learned about in this book and further enhance your skills.