
21AD71 Module 4 Textbook

The document introduces geoplotlib, an open-source Python library designed for geospatial data visualizations, emphasizing its advantages over Matplotlib and other libraries. It outlines the library's capabilities, including support for map tiles, interactivity, and various visualization types such as dot density plots, Voronoi tessellations, and choropleth plots. Additionally, it provides a detailed exercise on visualizing poaching incidents in Tanzania using geoplotlib, highlighting the integration with pandas for data manipulation.

Uploaded by

raji87bdvt
Copyright
© All Rights Reserved

256 | Plotting Geospatial Data

MODULE 4
Introduction
geoplotlib is an open-source Python library for geospatial data visualizations. It offers
a wide range of geographical visualizations and supports hardware acceleration.
It also provides high-performance rendering for large datasets with millions of data
points. As discussed in earlier chapters, Matplotlib provides various ways to visualize
geographical data.

However, Matplotlib is not designed for this task, and its interfaces for it are
complicated and inconvenient to use. Matplotlib also restricts how geographical
data can be displayed. The Basemap and Cartopy libraries allow you to plot on
a world map, but these packages offer little convenient support for drawing on map
tiles. Map tiles are small rectangular, square, or hexagonal images that are stitched
together into a seamless map of the world; only the lightweight tiles that are
currently in view are requested, each one individually.
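To make the idea of tiles more concrete, here is a minimal sketch of the widely used OpenStreetMap "slippy map" numbering scheme (Web Mercator), which turns a longitude/latitude pair and a zoom level into the x/y index of the tile that contains it. The lonlat_to_tile helper is our own, hypothetical name and is not part of geoplotlib; it only illustrates how a map client can work out which tiles are in view.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (xtile, ytile) index of the Web Mercator tile containing the
    point at the given zoom level (OpenStreetMap slippy-map numbering).
    Hypothetical helper for illustration; not part of geoplotlib."""
    n = 2 ** zoom  # the world is n x n tiles at this zoom level
    lat_rad = math.radians(lat)
    x = (lon + 180.0) / 360.0 * n
    y = (1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n
    # clamp to the valid index range [0, n - 1] (handles lon = 180 and the poles)
    xtile = min(max(math.floor(x), 0), n - 1)
    ytile = min(max(math.floor(y), 0), n - 1)
    return xtile, ytile


# At zoom 0, a single tile covers the whole world
print(lonlat_to_tile(-46.63, -23.55, 0))  # (0, 0)
# At zoom 1, the world is split into 2 x 2 tiles; this point is west and south
print(lonlat_to_tile(-46.63, -23.55, 1))  # (0, 1)
```

At zoom level z the world is split into 2^z by 2^z tiles, so each extra zoom level quadruples the number of tiles; this is why requesting only the visible ones matters. The same (zoom, xtile, ytile) triple is what the custom tile URL functions later in this chapter receive.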

geoplotlib, on the other hand, was designed precisely for this purpose; it not only
provides map tiles but also allows for interactivity and simple animations. It provides
a simple interface that allows access to compelling geospatial visualizations such
as histograms, point-based plots, tessellations such as Voronoi or Delaunay, and
choropleth plots.

In the exercises and activities in this chapter, we will use geoplotlib in combination
with different real-world datasets to do the following:

• Highlight popular poaching spots in one area of Tanzania

• Discover dense areas within cities in Europe that have a high population

• Visualize values for the distinct states of the US

• Create a custom animated layer that displays the time series data of aircraft

To understand the concepts, design, and implementation of geoplotlib, take a brief
look at its conceptual architecture. The two inputs that are fed to geoplotlib are your
data sources and map tiles. The map tiles, as we'll see later, can be replaced by
different providers. On the output side, geoplotlib can not only render images
inside Jupyter Notebooks but also open an interactive window that allows
zooming and panning of the maps. The schema of the components of geoplotlib
looks as follows:

Figure 5.1: Conceptual architecture of geoplotlib

geoplotlib uses the concept of layers that can be placed on top of one another,
providing a powerful interface for even complex visualizations. It comes with several
common visualization layers that are easy to set up and use.

From the preceding diagram, we can see that geoplotlib is built on top of
NumPy/SciPy and Pyglet/OpenGL. These libraries take care of numerical operations
and rendering, respectively. Both components are accessible from Python, which
enables geoplotlib to make use of the full Python ecosystem.

Note
All the datasets used in this chapter can be found at
https://packt.live/3bzApYN. All the files of exercises and
activities can be found here: https://packt.live/2UJRbyt.

All of the following examples are created with the world_cities_pop.csv
dataset, which we will use for the exercises and activities later in this chapter. Before
we can use it, we have to extract the .zip file that is included in the
Datasets folder.
To use the world_cities_pop dataset with geoplotlib, we need to add lat and lon
columns. For the examples, we also want to filter our dataset down to contain only
cities in Brazil. This will give us dataset_filtered, which we will use
in the following examples:

import pandas as pd

# loading the dataset with pandas
# (np.str was removed in NumPy 1.24; the built-in str does the same job)
dataset = pd.read_csv('../../Datasets/world_cities_pop.csv', \
                      dtype={'Region': str})

# adding the lat and lon columns needed by geoplotlib
dataset['lat'] = dataset['Latitude']
dataset['lon'] = dataset['Longitude']

# filtering for cities in Brazil (country code 'br')
dataset_filtered = dataset[dataset['Country'] == 'br']

To run these examples yourself, please refer to Examples.ipynb in the Examples
folder of the chapter.

The Design Principles of geoplotlib


Taking a closer look at the internal design of geoplotlib, we can see that it is built
around three design principles:

• Integration: geoplotlib visualizations are purely Python-based. This means
that generic Python code can be executed, and other libraries such as pandas
can be used for data wrangling purposes. We can manipulate and enrich our
datasets using pandas DataFrames and later convert them into a geoplotlib
DataAccessObject, which geoplotlib needs for optimal compatibility, as follows:
import pandas as pd
from geoplotlib.utils import DataAccessObject

# data wrangling with pandas DataFrames here

dataset_obj = DataAccessObject(dataset_filtered)

geoplotlib fully integrates into the Python ecosystem. This even enables us to
plot geographical data inline inside our Jupyter Notebooks. This possibility allows
us to design our visualizations quickly and iteratively.

• Simplicity: Looking at the example provided here, we can quickly see that
geoplotlib abstracts away the complexity of plotting map tiles and already-
provided layers such as dot density and histogram. It has a simple API that
provides common visualizations. These visualizations can be created using
custom data with only a few lines of code.

The core attributes of our datasets are lat and lon values. Latitude and
longitude values enable us to index every single location on Earth. In geoplotlib,
we need them to tell the library where on the map our elements need to be
rendered. If our dataset comes with lat and lon columns, we can display each
of those data points, for example, as dots on a map, with five lines of code.

In addition, we can use the f_tooltip argument to show a popup for each
point; the function we pass receives the point's data and returns the text to
display, here the value of the City column:

# plotting our dataset as a dot density plot
import geoplotlib
from geoplotlib.utils import DataAccessObject

dataset_obj = DataAccessObject(dataset_filtered)
geoplotlib.dot(dataset_obj, \
               f_tooltip=lambda d:d['City'].title())

geoplotlib.show()

Executing this code will result in the following dot density plot:

Figure 5.2: Dot density layer of cities in Brazil and an overlay of the city on hovering

In addition, anyone who has used Matplotlib before will have no problems
understanding geoplotlib, since its syntax is heavily inspired by Matplotlib.

• Performance: As we mentioned before, geoplotlib can handle large amounts of
data due to the use of NumPy for accelerated numerical operations and OpenGL
for accelerated graphical rendering.

Next, we will create geographical visualizations without much effort and discover the
advantages of using geoplotlib in combination with pandas. We will implement an
exercise that plots the cities of the world, which will give us a feel for the library's
performance when plotting thousands of dots on our map.

Geospatial Visualizations
Voronoi tessellation, Delaunay triangulation, and choropleth plots are a few of
the geospatial visualizations that will be used in this chapter. An explanation for each
of them is provided here.

Voronoi Tessellation
In a Voronoi tessellation, the map is divided into cells, one per data point: each
cell contains all the locations that are closer to its data point than to any other.
The border between two neighboring cells therefore lies on the line that is
equidistant from both data points. The closer the data points, the smaller the cells.

The following example shows how you can simply use the voronoi method to
create this visualization:

# plotting our dataset as a Voronoi plot
geoplotlib.voronoi(dataset_filtered, line_color='b')
geoplotlib.set_smoothing(True)

geoplotlib.show()

As we can see, the code to create this visualization is relatively short.

After importing the dependencies we need, we read the dataset using the read_csv
method of pandas (or geoplotlib). We then use it as data for our voronoi method,
which handles all the complex logic of plotting the data on the map.

In addition to the data itself, we can set several parameters, such as the
line_color of the cell borders, and enable general smoothing with the
set_smoothing method, which smooths the lines using anti-aliasing:

Figure 5.3: Voronoi plot of cities in Brazil to visualize population density
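The defining rule of a Voronoi tessellation, that every location belongs to the cell of its nearest data point, can be checked with a few lines of plain Python. The following brute-force sketch labels each location of a small grid with its nearest site; it assumes planar (x, y) coordinates and Euclidean distance, and the function names and sample points are hypothetical. geoplotlib computes exact cell polygons on the map instead, so treat this only as an illustration of the rule.

```python
import math


def nearest_site(point, sites):
    """Index of the site closest to `point` (Euclidean distance; ties go to the lower index)."""
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))


def discrete_voronoi(sites, width, height):
    """Label each integer grid location (x, y) with the index of its nearest site.
    Locations sharing a label approximate that site's Voronoi cell.
    The result is indexed as grid[y][x]."""
    return [[nearest_site((x, y), sites) for x in range(width)]
            for y in range(height)]


# three hypothetical data points on a 10 x 10 grid
sites = [(1, 1), (8, 2), (4, 7)]
grid = discrete_voronoi(sites, 10, 10)
for row in grid:
    print("".join(str(label) for label in row))

# number of grid locations that fall into each cell
cell_sizes = [sum(row.count(i) for row in grid) for i in range(len(sites))]
print(cell_sizes)
```

Adding another site close to an existing one would shrink both of their cells, which is why small cells point to dense areas.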

Delaunay Triangulation
A Delaunay triangulation is related to Voronoi tessellation: if we connect every
pair of data points whose Voronoi cells share an edge, we end up with a plot that
is triangulated. The closer the data points are to each other, the smaller the triangles
will be. This gives us a visual clue about the density of points in specific areas. When
combined with color gradients, we get insights about points of interest, similar to
what a heatmap provides:

# plotting our dataset as a Delaunay triangulation
geoplotlib.delaunay(dataset_filtered, cmap='hot_r')
geoplotlib.set_smoothing(True)

geoplotlib.show()

This example uses the same dataset as before, that is, the cities in Brazil.
The structure of the code is the same as in the voronoi example.

After importing the dependencies that we need, we read the dataset using the
read_csv method and then use it as data for our delaunay method, which handles all of
the complex logic of plotting data on the map.

In addition to the data itself, we can again use the set_smoothing method to
smooth the lines using anti-aliasing.

The resulting visualization looks as follows:

Figure 5.4: Delaunay triangulation of cities in Brazil to visualize population density
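A Delaunay triangulation can also be characterized directly: no data point lies strictly inside the circumcircle of any of its triangles. The following brute-force sketch applies that property to every triple of points. It is O(n^4) and therefore only suitable for a handful of points, and it assumes the points are in general position (no three collinear, no four on a common circle); the function names are hypothetical, and practical libraries use much faster algorithms.

```python
from itertools import combinations


def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle abc.
    Classic determinant test; b and c are swapped if needed so abc is counter-clockwise."""
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    if orient < 0:
        b, c = c, b
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    return det > 0


def delaunay_bruteforce(points):
    """Index triples of all triangles whose circumcircle contains no other point."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        area2 = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        if area2 == 0:
            continue  # skip degenerate (collinear) triples
        others = (points[m] for m in range(len(points)) if m not in (i, j, k))
        if not any(in_circumcircle(a, b, c, d) for d in others):
            triangles.append((i, j, k))
    return triangles


# four hypothetical points forming a convex quadrilateral
points = [(0, 0), (10, 0), (0, 10), (9, 9)]
print(delaunay_bruteforce(points))  # [(0, 1, 3), (0, 2, 3)]
```

Of the two possible diagonals of the quadrilateral, only (0, 0)-(9, 9) yields triangles with empty circumcircles, so the sketch keeps exactly those two triangles.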



Choropleth Plot
This kind of geographical plot displays areas such as the states of a country in
a shaded or colored manner. The shade or color of the plot is determined by a
single data point or a set of data points. It gives an abstract view of a geographical
area to visualize the relationships and differences between the different areas. In
the following code and visual example, we can see that the unemployment rate
determines the shade of each state of the US. The darker the shade, the higher
the rate:

import json

import geoplotlib
from geoplotlib.colors import ColorMap
from geoplotlib.utils import BoundingBox

"""
find the unemployment rate for the selected county, and convert it to
color
"""
def get_color(properties):
    key = str(int(properties['STATE'])) \
          + properties['COUNTY']
    if key in unemployment_rates:
        # .15 is the maxvalue of the color scale, 'lin' a linear scale
        return cmap.to_color(unemployment_rates.get(key), \
                             .15, 'lin')
    else:
        # fully transparent if there is no rate for this area
        return [0, 0, 0, 0]

# get unemployment data
with open('../../Datasets/unemployment.json') as fin:
    unemployment_rates = json.load(fin)

"""
plot the outlines of the states and color them using the unemployment
rate
"""
cmap = ColorMap('Reds', alpha=255, levels=10)
geoplotlib.geojson('../../Datasets/us_states_shapes.json', \
                   fill=True, color=get_color, \
                   f_tooltip=lambda properties: properties['NAME'])
geoplotlib.geojson('../../Datasets/us_states_shapes.json', \
                   fill=False, color=[255, 255, 255, 64])
geoplotlib.set_bbox(BoundingBox.USA)

geoplotlib.show()

We will cover what each line does in more detail later. However, to give you a better
understanding of what is happening here, we will quickly go through the sections of the
preceding code.

The first few lines import all the necessary dependencies, including geoplotlib and
json; the latter is used to load our unemployment dataset, which is provided in JSON
format. After the import statements, we see a get_color function. It builds a key from
the STATE and COUNTY properties of a shape, looks up the matching unemployment
rate, and converts it to a color, so the rate defines how dark the red will be; shapes
without a rate are drawn fully transparent. We then load the unemployment data, and in
the last section of the script, we pass the shapes file to the geojson method.

Unlike most other visualizations, the choropleth plot has no dedicated method of its
own. Instead, we use the geojson() method, which can draw more complex shapes
than simple dots. By using the f_tooltip argument, we can also display the name of
the area we are hovering over.

The BoundingBox object defines the "corners" of the viewport. With it, we can
set an initial focus when running our visualization, which helps the user see what the
visualization is about without panning around and zooming first.

Executing this code with the right example dataset provides the
following visualization:

Figure 5.5: Choropleth plot of unemployment rates in the US; the darker the color, the
higher the value
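Because a bounding box only consists of northern, western, southern, and eastern limits, it can also be derived from the data itself instead of using a predefined constant such as BoundingBox.USA. The following minimal sketch computes such limits, with a relative margin, from a list of (lon, lat) pairs; the bbox_from_points helper and its margin parameter are hypothetical, and it ignores regions that cross the antimeridian. Consult geoplotlib's documentation for how to pass such limits to its own BoundingBox class.

```python
def bbox_from_points(points, margin=0.1):
    """Return (north, west, south, east) limits enclosing all (lon, lat) points,
    padded on each side by `margin` times the extent of the data.
    Hypothetical helper; does not handle boxes crossing the antimeridian."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    pad_lon = (max(lons) - min(lons)) * margin
    pad_lat = (max(lats) - min(lats)) * margin
    # clamp to valid latitude and longitude ranges
    north = min(max(lats) + pad_lat, 90.0)
    south = max(min(lats) - pad_lat, -90.0)
    west = max(min(lons) - pad_lon, -180.0)
    east = min(max(lons) + pad_lon, 180.0)
    return north, west, south, east


print(bbox_from_points([(-10, 0), (10, 20)], margin=0.5))  # (30.0, -20.0, -10.0, 20.0)
```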

Next, we will implement an exercise to plot dot density and histograms.

Exercise 5.01: Plotting Poaching Density Using Dot Density and Histograms
In this exercise, we'll be looking at the primary use of geoplotlib's plot methods for
dot density, histograms, and Voronoi diagrams. For this, we will make use of data
on poaching incidents.

The dataset that we will be using here contains data from poaching incidents in
Tanzania. The dataset consists of 268 rows and 6 columns (id_report,
date_report, description, created_date, lat, and lon).

Each row is uniquely identified by id_report. The date_report column states
the date on which the poaching incident took place, whereas the created_date
column states the date on which the report was created. The description
column provides basic information about the incident. The lat and lon columns
state the geographical location where the poaching took place.

Note that geoplotlib requires your dataset to have both lat and lon columns. These
columns are the geographical data for latitude and longitude, which are used to
determine how to plot the data. The following are the steps to perform:

1. Create an Exercise5.01.ipynb Jupyter Notebook within the
Chapter05/Exercise5.01 folder to implement this exercise.
2. Import the dependencies that you will need. We will use the read_csv
method provided by geoplotlib, which reads a CSV file into a
DataAccessObject:
import geoplotlib
from geoplotlib.utils import read_csv

3. Load the poaching_points_cleaned.csv dataset from the Datasets
folder using geoplotlib's read_csv method:

dataset = read_csv('../../Datasets/poaching_points_cleaned.csv')

4. Print out the dataset and look at its type. What difference do you see compared
to a pandas DataFrame? Let's take a look:

# looking at the dataset structure
dataset

The following figure shows the output of the preceding code:

Figure 5.6: Dataset structure



The dataset is stored in a DataAccessObject class that's provided by
geoplotlib. It does not have the same capabilities as a pandas DataFrame;
instead, it's meant for the simple and quick loading of data so that you can
create a visualization. Printing this object makes the difference visible: it only
gives us a basic overview of which columns are present and how many
rows the dataset has.

5. Load the same dataset into a pandas DataFrame as well, so that we can
preprocess the data:

# csv import with pandas
import pandas as pd
pd_dataset = \
    pd.read_csv('../../Datasets/poaching_points_cleaned.csv')
pd_dataset.head()

The following figure shows the output:

Figure 5.7: The first five entries of the dataset

6. Plot each row of our dataset as a single point on the map using a dot density
layer by calling the dot method. Then, call the show method to render the map
with a given layer:

# plotting our dataset with points
geoplotlib.dot(dataset)
geoplotlib.show()

The following figure shows the output:



Figure 5.8: Dot density visualization of poaching points

Only looking at the lat and lon values in the dataset won't give us a very good
idea of where on the map our elements are located or how far apart they are.
We're not able to draw conclusions and get insights into our dataset without
visualizing our data points on a map. When looking at the rendered map, we
can instantly see that some areas have more incidents than others. This insight
couldn't have been easily identified by simply looking at the numbers in the
dataset itself.

7. Visualize the density using the hist method, which will create a Histogram
Layer on top of our map tiles. Pass a binsize of 20, which sets the size of
each histogram bin in our visualization:

# plotting our dataset as a histogram
geoplotlib.hist(dataset, binsize=20)
geoplotlib.show()

The following figure shows the output of the preceding code:

Figure 5.9: Histogram visualization of poaching points

Histogram plots give us a better understanding of the density distribution of
our dataset. Looking at the final plot, we can see that there are some hotspots
for poaching. It also highlights the areas without any poaching incidents.

8. Create a Voronoi plot using the same dataset. Use a color map cmap of
'Blues_r', define the max_area parameter as 1e5, and set alpha to 255:
# plotting a voronoi map
geoplotlib.voronoi(dataset, cmap='Blues_r', \
                   max_area=1e5, alpha=255)
geoplotlib.show()

The following figure shows the output of the preceding code:

Figure 5.10: Voronoi visualization of poaching points

Note
To access the source code for this specific section, please refer to
https://packt.live/2UIwGkT.

This section does not currently have an online interactive example, and will
need to be run locally.
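Conceptually, the histogram layer from step 7 counts how many points fall into each square bin of a regular grid and colors every bin by its count. The following minimal sketch bins raw (lon, lat) pairs into square cells of binsize degrees. Working in degrees is an assumption made to keep the example self-contained: as we understand it, geoplotlib applies its binsize to projected screen coordinates instead. The bin_points helper and the sample points are hypothetical.

```python
import math
from collections import Counter


def bin_points(points, binsize):
    """Count (lon, lat) points per square bin whose side is `binsize` degrees.
    Returns a Counter keyed by integer (column, row) bin indices;
    empty bins simply do not appear."""
    return Counter(
        (math.floor(lon / binsize), math.floor(lat / binsize))
        for lon, lat in points
    )


# made-up sample points (not taken from the poaching dataset)
points = [(34.1, -3.2), (34.3, -3.4), (34.4, -3.1), (36.9, -7.8)]
counts = bin_points(points, binsize=1.0)
print(counts.most_common(1))  # [((34, -4), 3)]
```

The bin with the highest count is the "hotspot"; a larger binsize merges neighboring bins and smooths the picture, while a smaller one reveals finer structure.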

Voronoi plots are good for visualizing the density of data points, too. Voronoi
introduces a little more complexity with several parameters, such as cmap,
max_area, and alpha. Here, cmap is the color map used to fill the Voronoi cells,
alpha is the opacity of that fill (from 0, invisible, to 255, fully opaque), and
max_area is a scaling constant against which the area of each cell is compared to
determine its color.
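To see how an area can drive a color, the following sketch computes the area of a polygon, such as a Voronoi cell, with the shoelace formula and scales it by a max_area constant, clamping the result to [0, 1] so it can be fed into a color map. It assumes planar coordinates and is our own simplified illustration of the role described for max_area above, not geoplotlib's implementation; the function names are hypothetical.

```python
def polygon_area(vertices):
    """Area of a simple polygon given its (x, y) vertices in order (shoelace formula).
    Works for clockwise and counter-clockwise vertex order."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to the first vertex
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def normalized_area(vertices, max_area):
    """Cell area divided by max_area and clamped to [0, 1], ready for a color map."""
    return min(polygon_area(vertices) / max_area, 1.0)


small_cell = [(0, 0), (10, 0), (10, 10), (0, 10)]
large_cell = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]
print(normalized_area(small_cell, 1e5), normalized_area(large_cell, 1e5))
```

With a reversed color map such as 'Blues_r', small normalized values map to the dark end of the palette, so small (dense) cells stand out; every cell at or above max_area shares the same color.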

If we compare this Voronoi visualization with the histogram plot, we can see that
one area draws a lot of attention. The center-right edge of the plot shows quite a
large dark blue area with an even darker center: something that could've easily been
overlooked with the histogram plot.

We have now covered the basics of geoplotlib. It has many more methods, but they
all have a similar API that makes using the other methods simple. Since we have
looked at some very basic visualizations, it's now up to you to solve the first activity.

Activity 5.01: Plotting Geospatial Data on a Map


In this activity, we will take our previously learned skills of plotting data with
geoplotlib and apply them to our new world_cities_pop.csv dataset. We will
find the dense areas of cities in Europe that have a population of more than
100,000 people:

1. Create an Activity5.01.ipynb Jupyter Notebook within the
Chapter05/Activity5.01 folder to implement this activity.
2. Import the dependencies and load the world_cities_pop.csv dataset from
the Datasets folder using pandas.

3. List all the datatypes that are present in it and verify that they are correct.
Then, map the Latitude and Longitude columns to lat and lon.

4. Now, plot the data points on a dot density plot.

5. Use the agg method of pandas to get the average number of cities per country.

6. Obtain the number of cities per country (the first 20 entries) and extract the
countries that have a population of greater than zero.

7. Plot the remaining data on a dot plot.

8. Again, filter your remaining data for cities with a population of greater
than 100,000.

9. To get a better understanding of the density of our data points on the map, use
a Voronoi tessellation layer.

10. Filter down the data even further to only cities in Germany and
Great Britain.

11. Finally, use a Delaunay triangulation layer to find the most densely
populated areas.

Observe the expected output of the dot plot:

Figure 5.11: A dot density visualization of the reduced dataset

The following is the expected output of the Voronoi plot:

Figure 5.12: A Voronoi visualization of densely populated cities



The following is the expected output of the Delaunay triangulation:

Figure 5.13: A Delaunay triangle visualization of cities in Germany and Great Britain

Note
The solution for this activity can be found on page 436.

You have now completed your first activity using geoplotlib. Note how we made use
of different plots to get the information we required. Next, we will look at some more
custom features of geoplotlib that will allow us to change the map tiles provider and
create custom plotting layers.

The GeoJSON Format


The GeoJSON format is a JSON-based format used to encode a variety of geographic
data structures, such as points, lines, and polygons, together with their non-spatial
attributes. The format has a defined structure that each valid file has to follow:

{
  "type": "Feature",
  "properties": {
    "name": "Dinagat Islands"
  },
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  }
}

Each object with additional properties, for example, an ID or name attribute, is a
Feature. The properties attribute simply allows additional information to be
added to the feature. The geometry attribute holds information about the type
of feature we are working with, for example, a Point, and its specific coordinates.
The coordinates define the positions of the "waypoints" of the given type, and
therefore the shape of the element to be displayed by the plotting library. Note that
GeoJSON lists each position as [longitude, latitude], so the point above lies at
longitude 125.6 and latitude 10.1.
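Since GeoJSON is plain JSON, Python's standard json module is enough to read a feature and pull out its parts. This minimal sketch parses the Dinagat Islands feature shown above and unpacks its coordinates, keeping in mind that GeoJSON stores positions in [longitude, latitude] order.

```python
import json

# the example feature from above, as a JSON string
feature_text = """
{
  "type": "Feature",
  "properties": {"name": "Dinagat Islands"},
  "geometry": {"type": "Point", "coordinates": [125.6, 10.1]}
}
"""

feature = json.loads(feature_text)

name = feature["properties"]["name"]
geometry_type = feature["geometry"]["type"]
lon, lat = feature["geometry"]["coordinates"]  # GeoJSON order is [lon, lat]

print(f"{name}: {geometry_type} at lat={lat}, lon={lon}")
```

Swapping the two values is a common mistake; with geoplotlib, they end up in the lat and lon fields it expects.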

Exercise 5.02: Creating a Choropleth Plot with GeoJSON Data


In this exercise, we will work with GeoJSON data and also create a choropleth
visualization. GeoJSON is especially useful for displaying statistical variables in
shaded areas. In our case, the areas will be the outlines of the states of the USA.

Let's create a choropleth visualization with the given GeoJSON data:

1. Create an Exercise5.02.ipynb Jupyter Notebook within the
Chapter05/Exercise5.02 folder to implement this exercise.
Then, load the dependencies for this exercise:

# importing the necessary dependencies
import json
import geoplotlib
from geoplotlib.colors import ColorMap
from geoplotlib.utils import BoundingBox

2. Since the geojson method of geoplotlib only needs a path to the
us_states.json dataset instead of a DataFrame or object, we don't need to load it.
However, since we still want to see what kind of data we are handling, we must
open the GeoJSON file and load it as a json object. We can then access its
members using simple indexing:

# displaying the fourth entry of the states dataset
with open('../../Datasets/us_states.json') as data:
    dataset = json.load(data)
   
    fourth_state = dataset.get('features')[3]
   
    # only showing one coordinate instead of all points
    fourth_state['geometry']['coordinates'] = \
        fourth_state['geometry']['coordinates'][0][0]
    print(json.dumps(fourth_state, indent=4))

Our dataset contains a few properties. Only the state name, NAME, and the
census area, CENSUSAREA, are important for us in this exercise.

Note
Geospatial applications prefer GeoJSON files for persisting and exchanging
geographical data.

3. Extract the names of all the states of the USA from the dataset. Next, print the
number of states in the dataset and then print all the states as a list:

# listing the states in the dataset
with open('../../Datasets/us_states.json') as data:
    dataset = json.load(data)
   
    states = [feature['properties']['NAME'] for feature in \
             dataset.get('features')]
    print('Number of states:', len(states))
    print(states)

The following figure shows the output of the preceding code:

Figure 5.14: List of all states in the US



4. If your GeoJSON file is valid, that is, if it has the expected structure, you can
plot it directly using the geojson() method of geoplotlib:

# plotting the information from the geojson file
geoplotlib.geojson('../../Datasets/us_states.json')
geoplotlib.show()

After calling the show method, the map will show up with a focus on North
America. In the following diagram, we can already see the borders of each state:

Figure 5.15: Map with outlines of the states plotted



5. Rather than assigning a single color to every state, we want the darkness of
each state to represent its CENSUSAREA value. To do this, we have to provide a
function for the color property. Map the CENSUSAREA attribute to a ColorMap class
object with 10 levels to allow a good distribution of color. Provide a maxvalue
of 300000 to the to_color method to define the upper limit of our dataset:

cmap = ColorMap('Reds', alpha=255, levels=10)

# map each state's CENSUSAREA value to a shade of red
def get_color(properties):
    return cmap.to_color(properties['CENSUSAREA'], \
                         maxvalue=300000, scale='lin')

As you can see in the code example, we can provide three arguments to our
ColorMap. The first one, 'Reds', in our case, defines the basic coloring
scheme. The alpha argument defines how opaque we want the color to be,
255 being 100% opaque, and 0 completely invisible. Those 8-bit values for the
Red, Green, Blue, and Alpha (RGBA) values are commonly used in styling: they
all range from 0 to 255. With the levels argument, we can define how many
"steps," that is, levels of red values, we can map to.

6. Use the us_states.json file in the Datasets folder to visualize the different
states. First, provide the color mapping to our color parameter and set the
fill parameter to True. Then, draw a black outline for each state. Use the
color argument and provide the RGBA value for black. Lastly, use the USA
constant of the BoundingBox class to set the bounding box:

"""
plotting the shaded states and adding another layer which plots the
state outlines in black
our BoundingBox should focus the USA
"""
geoplotlib.geojson('../../Datasets/us_states.json', \
                   fill=True, color=get_color)
geoplotlib.geojson('../../Datasets/us_states.json', \
                   fill=False, color=[0, 0, 0, 255])

geoplotlib.set_bbox(BoundingBox.USA)
geoplotlib.show()

After executing the preceding steps, the expected output is as follows:

Figure 5.16: Choropleth visualization showing census areas in different states

A new window will open, displaying the USA with the areas of its
states filled with different shades of red. The darker areas represent higher
CENSUSAREA values.

7. To give the user some more information about this plot, use the f_tooltip
argument to provide a tooltip displaying the name and census area value of the
state currently hovered over:

# adding an f_tooltip that shows the state name and its CENSUSAREA value
geoplotlib.geojson('../../Datasets/us_states.json', \
                   fill=True, color=get_color, \
                   f_tooltip=lambda properties: \
                             properties['NAME'] \
                             + ' - Census Areas: ' \
                             + str(properties['CENSUSAREA']))

geoplotlib.geojson('../../Datasets/us_states.json', \
                   fill=False, color=[0, 0, 0, 255])

geoplotlib.set_bbox(BoundingBox.USA)
geoplotlib.show()

The following is the output of the preceding code:

Figure 5.17: A choropleth visualization showing the census
area value of the state hovered over

Upon hovering, we will get a tooltip for each of the plotted areas displaying the
name of the state and the census area value.

Note
To access the source code for this specific section, please refer to
https://packt.live/30PX9Rh.

This section does not currently have an online interactive example, and will
need to be run locally.
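The levels argument of the ColorMap used in this exercise can be illustrated in plain Python: a value is scaled linearly against a maximum, clamped to [0, 1], and snapped to one of a fixed number of steps before being turned into an 8-bit RGBA color. The two functions below are a hypothetical, simplified stand-in for ColorMap.to_color(..., scale='lin'); the light-to-dark red ramp is our own and does not reproduce geoplotlib's exact 'Reds' palette.

```python
def to_level(value, maxvalue, levels=10):
    """Linearly map value in [0, maxvalue] to an integer level in [0, levels - 1].
    Values above maxvalue saturate at the top level."""
    fraction = min(max(value / maxvalue, 0.0), 1.0)  # clamp to [0, 1]
    return min(int(fraction * levels), levels - 1)


def level_to_rgba(level, levels=10, alpha=255):
    """Hypothetical light-to-dark red ramp returned as 8-bit [R, G, B, A]."""
    t = level / (levels - 1)            # 0.0 = lightest, 1.0 = darkest
    red = round(255 - t * 127)          # 255 -> 128
    green_blue = round(230 * (1 - t))   # 230 -> 0
    return [red, green_blue, green_blue, alpha]


for value in (0, 75000, 150000, 300000, 450000):
    level = to_level(value, maxvalue=300000)
    print(value, level, level_to_rgba(level))
```

With maxvalue=300000, every value at or above that limit ends up in the darkest level, so the choice of maxvalue directly controls how much of the color range the data uses.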

You've already built different plots and visualizations using geoplotlib. In this exercise,
we looked at displaying data from a GeoJSON file and creating a choropleth plot.

In the following topics, we will cover more advanced customizations that will give you
the tools to create more powerful visualizations.

Tile Providers
geoplotlib supports the use of different tile providers. This means that any
OpenStreetMap tile server can be used as a backdrop for our visualization. Some of
the popular free tile providers include Stamen Watercolor, Stamen Toner, Stamen
Toner Lite, and DarkMatter. Changing the tile provider can be done in two ways:

• Make use of built-in tile providers:

geoplotlib contains a few built-in tile providers with shortcuts. The following code
shows you how to use it:

geoplotlib.tiles_provider('darkmatter')

• Provide a custom object to the tiles_provider method:

By providing a custom object to geoplotlib's tiles_provider() method, you
can set the url from which the map tiles are loaded as well as an attribution
text that is displayed in the lower-right corner of the visualization. We are also
able to set a distinct caching directory for the downloaded tiles. The following code
demonstrates how to provide a custom object:

geoplotlib.tiles_provider({
    'url': lambda zoom, xtile, ytile:
           'http://a.tile.stamen.com/watercolor/%d/%d/%d.png'
           % (zoom, xtile, ytile),
    'tiles_dir': 'tiles_dir',
    'attribution': 'Python Data Visualization | Packt'
})

The caching in tiles_dir is mandatory since, each time the map is scrolled or
zoomed into, we query new map tiles if they are not already downloaded. This
can lead to the tile provider refusing your request due to too many requests
occurring in a short period of time.
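The benefit of tiles_dir can be sketched as a cache-first lookup: a tile is downloaded only if no copy is already on disk. Everything in the following sketch is hypothetical: the get_tile helper, the zoom/x/y.png file layout, and the fake_download function that stands in for a real HTTP request are our own assumptions, not geoplotlib internals. It simply counts how many requests would reach the tile server while the user pans back and forth.

```python
import os
import tempfile


def get_tile(zoom, xtile, ytile, tiles_dir, download):
    """Return tile bytes, downloading only when no cached copy exists.
    `download(zoom, xtile, ytile)` stands in for the real HTTP request."""
    path = os.path.join(tiles_dir, str(zoom), str(xtile), f"{ytile}.png")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()  # cache hit: no request sent
    data = download(zoom, xtile, ytile)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)  # cache miss: store the tile for next time
    return data


requests_sent = []


def fake_download(zoom, xtile, ytile):
    """Record the request instead of contacting a tile server."""
    requests_sent.append((zoom, xtile, ytile))
    return b"fake-png-bytes"


with tempfile.TemporaryDirectory() as cache:
    # panning back and forth over the same two tiles
    for tile in [(3, 4, 2), (3, 5, 2), (3, 4, 2), (3, 5, 2)]:
        get_tile(*tile, tiles_dir=cache, download=fake_download)

print(len(requests_sent))  # 2: each tile was fetched only once
```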

In the following exercise, we'll take a quick look at how to switch the map tile
provider. It might not seem convincing at first, but it can take your visualizations to
the next level if leveraged correctly.

Exercise 5.03: Visually Comparing Different Tile Providers


In this exercise, we will switch the map tile provider for our visualizations.
geoplotlib provides mappings for some of the most popular available map tiles.
However, we can also provide a custom object that contains the url of some tile
providers.

The following are the steps to perform the exercise:

1. Create an Exercise5.03.ipynb Jupyter Notebook within the Chapter05/Exercise5.03 folder to implement this exercise. Import the necessary dependencies:

import geoplotlib

We won't use a dataset in this exercise since we want to focus on the map tiles
and tile providers.

2. Display the map with the default tile provider:

geoplotlib.show()

The following figure shows the output of the preceding code:

Figure 5.18: World map with the default tile provider

This displays a world map without any data on it, since we haven't added any layers. Because we haven't specified a tile provider either, geoplotlib uses its default, the CartoDB Positron map tiles.

3. Use the tiles_provider method and provide the 'darkmatter' tiles:

# using map tiles from the dark matter tile provider
geoplotlib.tiles_provider('darkmatter')
geoplotlib.show()

geoplotlib provides several shorthand accessors to common map tile providers. The following figure shows the output:

Figure 5.19: World map with darkmatter map tiles

In this example, we used the darkmatter map tiles. As you can see, they are
very dark and will make your visualizations pop out.

Note
We can also use different map tiles such as watercolor, toner,
toner-lite, and positron in a similar way.

4. Use the attribution element of the tiles_provider argument object (the entity passed to the method) to provide a custom attribution:

geoplotlib.tiles_provider({
    'url': lambda zoom, xtile, ytile:
        'http://a.tile.openstreetmap.fr/hot/%d/%d/%d.png'
        % (zoom, xtile, ytile),
    'tiles_dir': 'custom_tiles',
    'attribution': 'Custom Tiles Provider – Humanitarian map style'
})
geoplotlib.show()

The following figure shows the output of the preceding code:

Figure 5.20: Humanitarian map tiles from the custom tile providers object

Some map tile providers have strict request limits, so you may see warning
messages if you're zooming in too fast.

Note
To access the source code for this specific section, please refer to
https://packt.live/3e6WjTT.

This section does not currently have an online interactive example, and will
need to be run locally.
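The zoom, xtile, and ytile arguments of the url lambdas in this exercise follow the common "slippy map" tile numbering used by OpenStreetMap-style tile servers: at zoom level z, the Web Mercator world map is split into 2^z by 2^z tiles. The following sketch is not part of geoplotlib; it computes which tile contains a given longitude/latitude and fills in the same URL template as step 4. The function names lonlat_to_tile and tile_url are illustrative only.

```python
import math
from typing import Tuple


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (xtile, ytile) slippy-map tile that contains lon/lat."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = (lon + 180.0) / 360.0 * n
    lat_rad = math.radians(lat)
    y = (1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n
    # clamp to the valid range 0 .. n - 1 (e.g. lon = 180 or polar latitudes)
    xtile = min(max(math.floor(x), 0), n - 1)
    ytile = min(max(math.floor(y), 0), n - 1)
    return xtile, ytile


def tile_url(zoom: int, xtile: int, ytile: int) -> str:
    # same URL template as the humanitarian tiles in step 4
    return 'http://a.tile.openstreetmap.fr/hot/%d/%d/%d.png' % (zoom, xtile, ytile)


# the tile under the point where the equator meets the prime meridian
url = tile_url(1, *lonlat_to_tile(0.0, 0.0, 1))
```

Each zoom step doubles the number of tiles per axis, which is why zooming in quickly triggers many new tile requests.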

You now know how to change the tile provider to give your visualization one more layer of customizability. This also adds another design decision: whether to use the "default" map tiles or more artistic ones depends on the concept of the final product.

The next section will cover how to create custom layers that can go far beyond
the ones we have described in this book. We'll look at the basic structure of the
BaseLayer class and what it takes to create a custom layer.

Custom Layers
Now that we have covered the basics of visualizing geospatial data with built-in
layers and methods to change the tile provider, we will now focus on defining our
custom layers. Custom layers allow you to create more complex data visualizations.
They also help with adding more interactivity and animation to them. Creating a custom layer starts by defining a new class that extends the BaseLayer class provided by geoplotlib. Besides the __init__ method, which initializes the instance variables of the layer, we also have to, at the very least, override the draw method of the BaseLayer class.
Depending on the nature of your visualization, you might also want to implement
the invalidate method, which takes care of map projection changes such as
zooming into your visualization. Both the draw and invalidate methods receive
a Projection object that takes care of the latitude and longitude mapping on our
two-dimensional viewport. These mapped points can be handed to an instance of a
BatchPainter object that provides primitives such as points, lines, and shapes to
draw those coordinates onto your map.
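To make the role of the Projection object more concrete, here is a rough, stand-alone approximation of what a lonlat_to_screen-style mapping does. It assumes a Web Mercator projection with 256-pixel tiles (the scheme used by common web map tile servers) and uses the image convention where y grows downward; an OpenGL-based renderer such as geoplotlib may use a different convention. The functions and the origin_px viewport parameter are hypothetical and do not reproduce geoplotlib's internals; they only show why projected points must be recomputed when the zoom changes.

```python
import math
from typing import List, Sequence, Tuple

TILE_SIZE = 256  # assumed pixel size of one map tile


def lonlat_to_world_px(lon: float, lat: float, zoom: int) -> Tuple[float, float]:
    """Web Mercator: map lon/lat to global pixel coordinates at a zoom level."""
    scale = TILE_SIZE * 2 ** zoom
    x = (lon + 180.0) / 360.0 * scale
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * scale
    return x, y


def lonlat_to_screen(lons: Sequence[float], lats: Sequence[float], zoom: int,
                     origin_px: Tuple[float, float]) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: project points, then shift by the viewport origin."""
    xs, ys = [], []
    for lon, lat in zip(lons, lats):
        wx, wy = lonlat_to_world_px(lon, lat, zoom)
        xs.append(wx - origin_px[0])
        ys.append(wy - origin_px[1])
    return xs, ys


# the same point lands on different pixels after zooming in one level
xs1, ys1 = lonlat_to_screen([0.0], [0.0], zoom=1, origin_px=(0.0, 0.0))
xs2, ys2 = lonlat_to_screen([0.0], [0.0], zoom=2, origin_px=(0.0, 0.0))
```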

An example of a custom layer, comparable to what we will create, is the following program, which plots the cities of a selected country as dots on the map. We have a given list of possible countries and can switch through them using the arrow keys:

# importing the necessary dependencies
import pyglet
import geoplotlib
from geoplotlib.layers import BaseLayer
from geoplotlib.core import BatchPainter
from geoplotlib.utils import BoundingBox

countries = ['be', 'ch', 'de', 'es', 'fr', 'it', 'nl', 'pt']

class CountrySelectLayer(BaseLayer):

    def __init__(self, data, bbox=BoundingBox.WORLD):
        self.data = data
        self.view = bbox
        self.painter = BatchPainter()
        # start with Germany ('de')
        self.country_num = countries.index('de')

    def invalidate(self, proj):
        # keep only the cities of the currently selected country
        country_data = \
                       self.data[self.data['Country'] \
                       == countries[self.country_num]]
        self.painter = BatchPainter()
        # project lon/lat onto the screen once per change
        x, y = proj.lonlat_to_screen(country_data['lon'], \
               country_data['lat'])
        self.painter.points(x, y, 2)

    def draw(self, proj, mouse_x, mouse_y, ui_manager):
        # called every frame: reuse the points prepared in invalidate
        self.painter.batch_draw()
        ui_manager.info('Displaying cities in {}'.format\
                       (countries[self.country_num]))

    def on_key_release(self, key, modifiers):
        # returning True tells geoplotlib to invalidate this layer
        if key == pyglet.window.key.RIGHT:
            self.country_num = (self.country_num + 1) \
                               % len(countries)
            return True
        elif key == pyglet.window.key.LEFT:
            self.country_num = (self.country_num - 1) \
                               % len(countries)
            return True

        return False

    # bounding box that gets used when layer is created
    def bbox(self):
        return self.view

europe_bbox = BoundingBox(north=68.574309, \
                          west=-25.298424, \
                          south=34.266013, \
                          east=47.387123)
# dataset is assumed to hold city data with Country, lat, and lon columns
geoplotlib.add_layer(CountrySelectLayer(dataset, europe_bbox))
geoplotlib.show()

As we've seen several times before, we first import all the necessary dependencies for
this plot, including geoplotlib. BaseLayer and BatchPainter are dependencies
we haven't seen before, since they are only needed when writing custom layers.

BaseLayer is a class provided by geoplotlib that is extended by our custom layer class. This concept is called inheritance. This means that our custom class has access to all the properties and methods defined in the BaseLayer class. This is necessary since geoplotlib requires a predefined structure for layers to make them plottable.

The BatchPainter class is another helper for our implementation that lets us
trigger the drawing of elements onto the map.

When creating the custom layer, we simply provide the BaseLayer class in the
parentheses to tell Python to extend the given class.

The class then needs to implement at least two of the provided methods, __init__ and draw.
__init__ defines what happens when a new custom layer is instantiated. It is used to set the state of our layer; here, we store the data to be used and the bounding box, select the initial country, and create a new BatchPainter instance.

The draw method is called every frame and draws the prepared elements using the BatchPainter instance; here, it also uses the ui_manager to display the currently selected country.
In this method, we can do all sorts of calculations, such as filtering our dataset to only contain the values of the currently active timestamp, as we will do in the next exercise. We could also make the viewport follow moving lat and lon values by fitting the projection to a new BoundingBox.

Since we don't want to recompute everything with every frame, we use the invalidate method, which is called when the projection changes, for example, when zooming, and re-projects the points onto the viewport.

When using interaction elements, such as switching through our countries using the arrow keys, we return True from the on_key_release method to tell geoplotlib that the layer has changed, which triggers the redrawing of all the points; returning False means that nothing needs to be updated.
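The division of work between invalidate, draw, and on_key_release can be illustrated without geoplotlib or OpenGL. In the following sketch, ToyProjection, ToyLayer, and the run_frames driver are hypothetical stand-ins, not geoplotlib classes: the expensive projection happens only in invalidate, draw merely reuses the cached points every frame, and a True return value from on_key_release makes the driver call invalidate again, mirroring the behaviour described in the preceding paragraphs.

```python
from typing import Dict, List, Tuple

countries = ['be', 'ch', 'de', 'es']


class ToyProjection:
    """Hypothetical stand-in for geoplotlib's Projection object."""
    def __init__(self, scale: float) -> None:
        self.scale = scale
        self.calls = 0

    def lonlat_to_screen(self, lons, lats):
        self.calls += 1  # count how often the expensive mapping runs
        return ([lon * self.scale for lon in lons],
                [lat * self.scale for lat in lats])


class ToyLayer:
    def __init__(self, data: Dict[str, List[Tuple[float, float]]]) -> None:
        self.data = data
        self.country_num = 0
        self.points: Tuple[List[float], List[float]] = ([], [])
        self.frames_drawn = 0

    def invalidate(self, proj: ToyProjection) -> None:
        # recompute screen coordinates only when something changed
        cities = self.data[countries[self.country_num]]
        self.points = proj.lonlat_to_screen([c[0] for c in cities],
                                            [c[1] for c in cities])

    def draw(self, proj: ToyProjection) -> None:
        # every frame: reuse the cached points, no re-projection
        self.frames_drawn += 1

    def on_key_release(self, key: str) -> bool:
        if key == 'RIGHT':
            self.country_num = (self.country_num + 1) % len(countries)
            return True
        if key == 'LEFT':
            self.country_num = (self.country_num - 1) % len(countries)
            return True
        return False  # unrelated key: nothing to recompute


def run_frames(layer: ToyLayer, proj: ToyProjection, keys: List[str]) -> None:
    """Hypothetical event loop: one draw per frame, invalidate on demand."""
    layer.invalidate(proj)
    for key in keys:
        if layer.on_key_release(key):
            layer.invalidate(proj)
        layer.draw(proj)


data = {'be': [(4.0, 51.0)], 'ch': [(7.0, 47.0)],
        'de': [(13.0, 52.0)], 'es': [(-4.0, 40.0)]}
proj = ToyProjection(scale=10.0)
layer = ToyLayer(data)
run_frames(layer, proj, ['LEFT', 'SPACE', 'SPACE', 'RIGHT'])
```

After four frames, the points were projected only three times (once at start-up and once per arrow key), while the two frames without a relevant key press simply reused the cached coordinates.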

Once our class is defined, we can call the add_layer method of geoplotlib to add
the newly defined layer to our visualization and finally call show() to show the map.

When executing the preceding example code, we get a visualization that, upon
switching the selected country with the arrow keys, draws the cities for the selected
country using dots on the map:

Figure 5.21: The selection of cities in Germany



The following figure shows the cities in Spain after changing the selected country
using the arrow keys:

Figure 5.22: The selection of cities in Spain after changing the country using the arrow keys

In the following exercise, we will create an animated visualization by using what we've learned about custom layers in the preceding example.

Note
Since geoplotlib operates on OpenGL, this process is highly performant and
can even draw complex visualizations quickly.

Exercise 5.04: Plotting the Movement of an Aircraft with a Custom Layer


In this exercise, we will create a custom layer to display geospatial data and also animate the data points over time. We'll get a deeper understanding of how geoplotlib works and how layers are created and drawn. Our dataset contains both spatial and temporal information, which enables us to plot the flight's movement over time on our map.

Let's create a custom layer that will allow us to display geospatial data and animate
the data points over time:

1. Import pandas for the data import:

# importing the necessary dependencies
import pandas as pd

2. Use the read_csv method of pandas to load the flight_tracking.csv dataset from the Datasets folder:

dataset = pd.read_csv('../../Datasets/flight_tracking.csv')

3. Use the head method to list the first five rows of the dataset and to understand
the columns:

# displaying the first 5 rows of the dataset
dataset.head()

Figure 5.23: The first five elements of the dataset

4. Rename the latitude and longitude columns to lat and lon by using the
rename method provided by pandas:
# renaming columns latitude to lat and longitude to lon
dataset = dataset.rename(index=str, \
          columns={"latitude": "lat", "longitude": "lon"})

Take another look at the first five elements of the dataset, and observe that the
names of the columns have changed to lat and lon:

# displaying the first 5 rows of the dataset
dataset.head()

Figure 5.24: The dataset with the lat and lon columns

5. Since we want to get a visualization over time in this exercise, we need to work with date and time. If we take a closer look at our dataset, it shows us that date and time are separated into two columns. Combine date and time into a timestamp, using the to_epoch method already provided:

# method to convert date and time to a Unix timestamp
from datetime import datetime

def to_epoch(date, time):
    try:
        timestamp = round(datetime.strptime('{} {}'.\
                    format(date, time), \
                    '%Y/%m/%d %H:%M:%S.%f').timestamp())
        return timestamp
    except ValueError:
        # rows that cannot be parsed get a fixed fallback timestamp
        return round(datetime.strptime('2017/09/11 17:02:06.418', \
                     '%Y/%m/%d %H:%M:%S.%f').timestamp())

6. Use to_epoch and the apply method provided by the pandas DataFrame to
create a new column called timestamp that holds the Unix timestamp:

"""
create a new column called timestamp with the to_epoch method applied
"""
dataset['timestamp'] = dataset.apply(lambda x: to_epoch\
                                    (x['date'], x['time']), \
                                    axis=1)

7. Take another look at our dataset. We now have a new column that holds the
Unix timestamps:

# displaying the first 5 rows of the dataset
dataset.head()

Figure 5.25: The dataset with a timestamp column added

Since our dataset is now ready to be used with all the necessary columns
in place, we can start writing our custom layer. This layer will display each
point once it reaches the timestamp that's provided in the dataset. It will be
displayed for a few seconds before it disappears. We'll need to keep track of the
current timestamp in our custom layer. Consolidating what we learned in the
theoretical section of this topic, we have an __init__ method that constructs
our custom TrackLayer.

8. In the draw method, filter the dataset for all the elements that are in the
mentioned time range and use each element of the filtered list to display it on
the map with color that's provided by the colorbrewer method.

Since our dataset only contains data from a specific time range and we're always
incrementing the time, we want to check whether there are still any elements
with timestamps after the current timestamp. If not, we want to set our
current timestamp to the earliest timestamp that's available in the dataset. The
following code shows how we can create a custom layer:

# custom layer creation
import geoplotlib
from geoplotlib.layers import BaseLayer
from geoplotlib.core import BatchPainter
from geoplotlib.colors import colorbrewer
from geoplotlib.utils import epoch_to_str, BoundingBox

class TrackLayer(BaseLayer):
    def __init__(self, dataset, bbox=BoundingBox.WORLD):
        self.data = dataset
        # one color per aircraft (hex_ident)
        self.cmap = colorbrewer(self.data['hex_ident'], \
                                alpha=200)
        self.time = self.data['timestamp'].min()
        self.painter = BatchPainter()
        self.view = bbox

    def draw(self, proj, mouse_x, mouse_y, ui_manager):
        self.painter = BatchPainter()
        # keep only the points in the 180-second window after self.time
        df = self.data.where((self.data['timestamp'] \
                              > self.time) \
                              & (self.data['timestamp'] \
                              <= self.time + 180))

        # draw the points of each aircraft in its own color
        for element in set(df['hex_ident']):
            grp = df.where(df['hex_ident'] == element)
            self.painter.set_color(self.cmap[element])
            x, y = proj.lonlat_to_screen(grp['lon'], grp['lat'])
            self.painter.points(x, y, 15, rounded=True)

        # advance the time and loop back to the start after the last point
        self.time += 1
        if self.time > self.data['timestamp'].max():
            self.time = self.data['timestamp'].min()

        self.painter.batch_draw()
        ui_manager.info('Current timestamp: {}'.\
                        format(epoch_to_str(self.time)))

    # bounding box that gets used when the layer is created
    def bbox(self):
        return self.view

9. Define a custom BoundingBox that focuses our view on this area, since the
dataset only contains data from the area around Leeds in the UK:

# bounding box for our view on Leeds
from geoplotlib.utils import BoundingBox
leeds_bbox = BoundingBox(north=53.8074, \
                         west=-3, \
                         south=53.7074, \
                         east=0)

10. geoplotlib sometimes requires you to provide a DataAccessObject instance instead of a pandas DataFrame. Use geoplotlib's DataAccessObject class to convert the pandas DataFrame, then add the custom layer and display the map:

# displaying our custom layer using add_layer
from geoplotlib.utils import DataAccessObject
data = DataAccessObject(dataset)
geoplotlib.add_layer(TrackLayer(data, bbox=leeds_bbox))
geoplotlib.show()

The following is the output of the preceding code:

Figure 5.26: Final animated tracking map that displays the routes of the aircraft
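The animation logic inside TrackLayer.draw can be checked in isolation. The following plain-Python sketch uses a small, made-up list of (timestamp, hex_ident) records instead of the flight dataset: each frame keeps the records whose timestamp lies in the 180-second window after the current time, advances the time by one, and wraps back to the earliest timestamp once the last one has been passed. The visible and next_time helpers are hypothetical names chosen for this illustration.

```python
from typing import List, Set, Tuple

WINDOW = 180  # seconds of data shown per frame, as in TrackLayer.draw

# made-up sample records: (Unix timestamp, aircraft hex_ident)
records: List[Tuple[int, str]] = [(100, 'a1'), (150, 'b2'), (300, 'a1'), (400, 'c3')]
t_min = min(ts for ts, _ in records)
t_max = max(ts for ts, _ in records)


def visible(current: int) -> Set[str]:
    """Aircraft with at least one point in (current, current + WINDOW]."""
    return {ident for ts, ident in records if current < ts <= current + WINDOW}


def next_time(current: int) -> int:
    """Advance one step per frame and wrap around after the last timestamp."""
    current += 1
    return t_min if current > t_max else current
```

Because the lower bound is exclusive, a point disappears as soon as the current time reaches its timestamp, and it first appears 180 steps earlier.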

Note
To access the source code for this specific section, please refer to
https://packt.live/3htmztU.

This section does not currently have an online interactive example, and will
need to be run locally.
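A note on the to_epoch helper from step 5: datetime.timestamp() interprets a naive datetime in the machine's local time zone, so the resulting Unix timestamps can differ between computers, and every row that fails to parse is mapped to one fixed fallback timestamp. The following variant is only a sketch that assumes the recorded times are in UTC and that some rows may lack fractional seconds; the name to_epoch_utc is hypothetical and is not part of the exercise code.

```python
from datetime import datetime, timezone
from typing import Optional

# try the precise format first, then the one without fractional seconds
FORMATS = ('%Y/%m/%d %H:%M:%S.%f', '%Y/%m/%d %H:%M:%S')


def to_epoch_utc(date: str, time: str) -> Optional[int]:
    """Parse date/time strings as UTC; return None if no format matches."""
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(f'{date} {time}', fmt)
        except ValueError:
            continue  # try the next, less precise format
        # attach UTC explicitly so the result does not depend on local time
        return round(parsed.replace(tzinfo=timezone.utc).timestamp())
    return None
```

Returning None instead of a fixed timestamp keeps unparsable rows visible: after applying the function with pandas, such rows end up as missing values and can be removed with dropna() before building the layer.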

You have now completed the custom layer exercise using geoplotlib. We've applied several preprocessing steps to shape the dataset as we want to have it. We've also written a custom layer to display spatial data in the temporal space. Our custom layer even has a level of animation. This is something we'll look into more in the following chapter about Bokeh. We will now implement an activity that will help us get more acquainted with custom layers in geoplotlib.

Activity 5.02: Visualizing City Density by the First Letter Using an Interactive Custom Layer
In this last activity for geoplotlib, you'll combine all the methodologies learned in the
previous exercises and the activity to create an interactive visualization that displays
the cities that start with a given letter, by merely pressing the left and right arrow keys
on your keyboard.

Since we use the same setup to create custom layers as the library does, you will be
able to understand the library implementations of most of the layers provided by
geoplotlib after this activity.

1. Create an Activity5.02.ipynb Jupyter Notebook within the Chapter05/Activity5.02 folder to implement this activity.
2. Import the dependencies.

3. Load the world_cities_pop.csv dataset from the Datasets folder using pandas and look at the first five rows to understand its structure.

4. Map the Latitude and Longitude columns to lat and lon.

5. Filter the dataset to only contain European cities by using the given europe_country_codes list.

6. Compare the length of all data with the filtered data of Europe by printing the
length of both.

7. Filter down the European dataset to get a dataset that only contains cities that
start with the letter Z.

8. Print its length and the first five rows using the head method.

9. Create a dot density plot with a tooltip that shows the country code and the
name of the city separated by a -. Use the DataAccessObject to create a
copy of our dataset, which allows the use of f_tooltip. The following is the
expected output of the dot density plot:

Figure 5.27: Cities starting with a Z in Europe as dots



10. Create a Voronoi plot with the same dataset that only contains cities that start
with Z. Use the 'Reds_r' color map and set the alpha value to 50 to make
sure you still see the map tiles. The following is the expected output of the
Voronoi plot:

Figure 5.28: Voronoi visualization of cities starting with a Z in Europe

11. Create a custom layer that plots all the cities in the European dataset that start with the provided letter. Make it interactive so that by using the left and right arrow keys, we can switch between the letters. To do that, first, filter the self.data dataset in the invalidate method using the current letter acquired from the start_letters array using self.start_letter indexing.
12. Create a new BatchPainter() instance and project the lon and lat values to x and y values. Use the BatchPainter instance to paint the points on the map with a size of 2.

13. Call the batch_draw() method in the draw method and use the ui_manager to add an info dialog to the screen telling the user which starting letter is currently being used.

14. Check which key is pressed using pyglet, for example, pyglet.window.key.RIGHT. If the right or left key is pressed, increment or decrement the start_letter value of the FilterLayer class accordingly. Use modulo so that the letters wrap around from Z to A and from A to Z. Make sure that you return True in the on_key_release method if you changed the start_letter to trigger a redrawing of the points.

15. Add the custom layer using the add_layer method and provide the given
europe_bbox as a BoundingBox class.
The following is the expected output of the custom filter layer:

Figure 5.29: A custom filter layer displaying European cities starting with A

If we press the right arrow twice, we will see the cities starting with C instead:

Figure 5.30: A custom filter layer displaying European cities starting with C

Note
The solution for this activity can be found on page 447.

This last activity has a custom layer that uses all the properties described by geoplotlib. All of the layers already provided by geoplotlib are created using the same structure. This means that you're now able to dig into the source code and create your own advanced layers.
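Steps 7, 11, and 14 of the activity boil down to two small operations: selecting the cities whose name starts with the current letter, and rotating through the alphabet with modulo arithmetic. A plain-Python sketch of both follows; it uses a made-up list of city names rather than the world_cities_pop.csv dataset, the start_letters name mirrors the activity's wording, and the helper functions are hypothetical.

```python
import string
from typing import List

start_letters = string.ascii_uppercase  # 'A' ... 'Z'


def cities_starting_with(names: List[str], letter: str) -> List[str]:
    """Case-insensitive filter on the first letter of each city name."""
    return [n for n in names if n[:1].upper() == letter.upper()]


def rotate(index: int, step: int) -> int:
    """Move one letter forward (+1) or back (-1), wrapping Z->A and A->Z."""
    return (index + step) % len(start_letters)


cities = ['Zurich', 'zaragoza', 'Amsterdam', 'Cologne', 'Zagreb']
```

Starting at A, two presses of the right arrow correspond to two calls of rotate(index, 1) and land on C, matching the behaviour shown in Figure 5.30.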
Making Things Interactive with Bokeh

Introduction
Bokeh is an interactive visualization library focused on modern browsers and the web. Unlike Matplotlib or geoplotlib, the plots and visualizations we are going to create in this chapter will be based on JavaScript widgets. Bokeh allows us to create visually appealing plots and graphs nearly out of the box without much styling. In addition to that, it helps us construct performant interactive dashboards based on large static datasets or even streaming data.

Bokeh has been around since 2013, with version 1.4.0 being released in November
2019. It targets modern web browsers to present interactive visualizations to users
rather than static images. The following are some of the features of Bokeh:

• Simple visualizations: Through its different interfaces, it targets users of many skill levels, providing an API for quick and straightforward visualizations as well as more complex and extremely customizable ones.

• Excellent animated visualizations: It provides high performance and can, therefore, work on large or even streaming datasets, which makes it a strong choice for animated visualizations and data analysis.

• Inter-visualization interactivity: Because Bokeh takes a web-based approach, it's easy to combine several plots and create unique and impactful dashboards with visualizations that can be interconnected to create inter-visualization interactivity.

• Supports multiple languages: Unlike Matplotlib and geoplotlib, Bokeh has libraries for both Python and JavaScript, in addition to several other popular languages.

• Multiple ways to perform a task: Adding interactivity to Bokeh visualizations can be done in several ways. The simplest built-in way is the ability to zoom and pan in and out of your visualization. This gives the users better control of what they want to see. Other interactions allow users to filter and transform the data.

• Beautiful chart styling: The tech stack is based on Tornado in the backend, while in the frontend the plots are rendered in the browser by BokehJS, Bokeh's own JavaScript library. Its default styling allows us to create beautiful plots without much custom styling.

Since we are using Jupyter Notebook throughout this book, it's worth mentioning that
Bokeh, including its interactivity, is natively supported in Notebook.

Concepts of Bokeh
The basic concept of Bokeh is, in some ways, comparable to that of Matplotlib. In Bokeh, we have a figure as our root element, which has sub-elements such as a title, an axis, and glyphs. Glyphs, which can take on different shapes such as circles, bars, and triangles, have to be added to a figure. The following hierarchy shows the different concepts of Bokeh:

Figure 6.1: Concepts of Bokeh



Interfaces in Bokeh
The interface-based approach provides different levels of complexity for users that
either want to create some basic plots with very few customizable parameters or
want full control over their visualizations to customize every single element of their
plots. This layered approach is divided into two levels:

• Plotting: This mid-level layer is customizable.

• Models interface: This low-level layer is complex and provides an open approach to designing charts.

Note
The models interface is the basic building block for all plots.

The following are the two levels of the layered approach to interfaces:

• bokeh.plotting

This mid-level interface has a somewhat comparable API to Matplotlib. The workflow is to create a figure and then enrich this figure with different glyphs that render data points in the figure. As in Matplotlib, the composition of sub-elements such as axes, grids, and the inspector (which provides basic ways of exploring your data through zooming, panning, and hovering) is done without additional configuration.

The vital thing to note here is that even though its setup is done automatically,
we can configure the sub-elements. When using this interface, the creation of
the scene graph used by BokehJS is handled automatically too.

• bokeh.models

This low-level interface is composed of two libraries: the JavaScript library called
BokehJS, which gets used for displaying the charts in the browser, and the core
plot creation Python code, which provides the developer interface. Internally, the
definition created in Python creates JSON objects that hold the declaration for
the JavaScript representation in the browser.

The models interface provides complete control over how Bokeh plots and
widgets (elements that enable users to interact with the data displayed) are
assembled and configured. This means that it is up to the developer to ensure
the correctness of the scene graph (a collection of objects describing
the visualization).
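The idea of a scene graph that is defined in Python and serialized to JSON for a JavaScript runtime can be sketched with the standard library. The Model class and the output layout below are hypothetical simplifications invented for illustration; they do not follow Bokeh's real document format. They only show how a tree of objects (a plot referencing a glyph and an axis) can be flattened into JSON with references by id, which another runtime could then rebuild.

```python
import itertools
import json
from typing import Any, Dict, List

_ids = itertools.count(1)  # simple global id generator


class Model:
    """Hypothetical minimal model: a type name, properties, and an id."""
    def __init__(self, type_name: str, **props: Any) -> None:
        self.id = str(next(_ids))
        self.type_name = type_name
        self.props = props


def collect(model: Model, seen: Dict[str, Model]) -> None:
    """Walk the object graph and gather every model exactly once."""
    if model.id in seen:
        return
    seen[model.id] = model
    for value in model.props.values():
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, Model):
                collect(child, seen)


def _encode(value: Any) -> Any:
    # replace nested models with {"id": ...} references
    if isinstance(value, Model):
        return {'id': value.id}
    if isinstance(value, list):
        return [_encode(v) for v in value]
    return value


def to_json(root: Model) -> str:
    seen: Dict[str, Model] = {}
    collect(root, seen)
    objects: List[Dict[str, Any]] = [
        {'id': m.id, 'type': m.type_name,
         'attributes': {k: _encode(v) for k, v in m.props.items()}}
        for m in seen.values()]
    return json.dumps({'root_id': root.id, 'objects': objects})


glyph = Model('Line', x=[1, 2, 3], y=[4, 5, 6], line_width=5)
axis = Model('LinearAxis', axis_label='Hardware index')
plot = Model('Plot', title='Cache per Hardware', renderers=[glyph], below=[axis])
doc = json.loads(to_json(plot))
```

Referencing objects by id rather than nesting them keeps the JSON flat and lets several parents share the same child object.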

Output
Outputting Bokeh charts is straightforward. There are three ways this can be done:

• The .show() method: The primary option is to display the plot in an HTML page using this method.

• The inline .show() method: When using inline plotting with a Jupyter Notebook (enabled by calling output_notebook()), the .show() method will display the chart inside your Notebook.

• The .output_file() method: You're also able to save the visualization directly to an HTML file without any overhead. Calling .output_file() with a path and name configures Bokeh to write the plot to that file when show() (or save()) is called.

The most powerful way of providing your visualization is through the use of the
Bokeh server.

Bokeh Server
Bokeh creates scene graph JSON objects that will be interpreted by the BokehJS
library to create the visualization output. This process gives you a unified format for
other languages to create the same Bokeh plots and visualizations, independently of
the language used.

To create more complex visualizations and leverage the tooling provided by Python,
we need a way to keep our visualizations in sync with one another. This way, we can
not only filter data but also do calculations and operations on the server-side, which
updates the visualizations in real-time.

In addition to that, since we will have an entry point for data, we can create
visualizations that get fed by streams instead of static datasets. This design provides a
way to develop more complex systems with even greater capabilities.
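Feeding a visualization from a stream usually means appending new rows to the data source and dropping the oldest ones so that memory stays bounded. Bokeh's ColumnDataSource offers a stream method with a rollover parameter for this purpose; the StreamingColumns class below is only a hypothetical plain-Python imitation of that behaviour, written for illustration, and is not Bokeh's implementation.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class StreamingColumns:
    """Hypothetical column store that keeps at most `rollover` rows."""
    def __init__(self, columns: List[str], rollover: Optional[int] = None) -> None:
        self.data: Dict[str, Deque[float]] = {
            name: deque(maxlen=rollover) for name in columns}

    def stream(self, new_data: Dict[str, List[float]]) -> None:
        lengths = {len(values) for values in new_data.values()}
        if set(new_data) != set(self.data) or len(lengths) != 1:
            raise ValueError('every column needs the same number of new values')
        for name, values in new_data.items():
            # deque(maxlen=...) silently drops the oldest entries
            self.data[name].extend(values)

    def snapshot(self) -> Dict[str, List[float]]:
        return {name: list(values) for name, values in self.data.items()}


source = StreamingColumns(['time', 'value'], rollover=3)
source.stream({'time': [1, 2], 'value': [10.0, 20.0]})
source.stream({'time': [3, 4], 'value': [30.0, 40.0]})
```

After the second update only the three most recent rows remain, so a plot fed from this source would show a sliding window over the stream.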

Looking at the scheme of this architecture, we can see that the documents are provided on the server-side and then moved over to the browser, which passes them to the BokehJS library. BokehJS then interprets them and creates the visualization. The following diagram describes how the Bokeh server works:

Figure 6.2: The Bokeh server

Presentation
In Bokeh, presentations help make the visualization more interactive by using
different features, such as interactions, styling, tools, and layouts.

Interactions

Probably the most exciting feature of Bokeh is its interactions. There are two types of
interactions: passive and active.

Passive interactions are actions that the users can take that don't change the dataset. In Bokeh, this is called the inspector. As we mentioned before, the inspector
contains attributes such as zooming, panning, and hovering over data. This tooling
allows the user to inspect the data in more detail and might provide better insights
by allowing the user to observe a zoomed-in subset of the visualized data points. The
elements highlighted with a box in the following figure show the essential passive
interaction elements provided by Bokeh. They include zooming, panning, and
clipping data.

Figure 6.3: Example of passive interaction zooming

Active interactions are actions that directly change the displayed data. This includes
actions such as selecting subsets of data or filtering the dataset based on parameters.
Widgets are the most prominent of active interactions since they allow users to
manipulate the displayed data with handlers. Examples of available widgets are
buttons, sliders, and checkboxes.
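On the Bokeh server, widget handlers are commonly registered with on_change and receive the name of the changed attribute together with its old and new values. The Slider class below is a hypothetical plain-Python stand-in, not Bokeh's widget, so that the handler pattern behind active interactions (a widget value changes, a handler filters the displayed data) can be run without a browser or server. The cache_sizes values are made up.

```python
from typing import Any, Callable, Dict, List

Handler = Callable[[str, Any, Any], None]


class Slider:
    """Hypothetical widget that notifies handlers when its value changes."""
    def __init__(self, value: float) -> None:
        self._value = value
        self._handlers: Dict[str, List[Handler]] = {}

    def on_change(self, attr: str, handler: Handler) -> None:
        self._handlers.setdefault(attr, []).append(handler)

    @property
    def value(self) -> float:
        return self._value

    @value.setter
    def value(self, new: float) -> None:
        old, self._value = self._value, new
        if old != new:  # only notify on an actual change
            for handler in self._handlers.get('value', []):
                handler('value', old, new)


cache_sizes = [0, 8, 32, 64, 128, 256]  # made-up data
displayed: List[int] = list(cache_sizes)


def filter_by_min_cache(attr: str, old: Any, new: Any) -> None:
    # active interaction: the widget change replaces the displayed data
    displayed[:] = [c for c in cache_sizes if c >= new]


slider = Slider(value=0)
slider.on_change('value', filter_by_min_cache)
slider.value = 64
```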

Referring back to the subsection about the output styles, these widgets can be
used in both the so-called standalone applications in the browser and the Bokeh
server. This will help us consolidate the recently learned theoretical concepts and
make things more transparent. Some of the interactions in Bokeh are tab panes,
dropdowns, multi-selects, radio groups, text inputs, check button groups, data tables,
and sliders. The elements highlighted with a red box in the following figure show a
custom active interaction widget for the same plot we looked at in the example of
passive interaction.

Figure 6.4: Example of custom active interaction widgets



Integrating
Embedding Bokeh visualizations can take two forms:

• HTML document: These are standalone HTML documents that don't need a running server: the plot data is embedded in the generated HTML document, and BokehJS is loaded from a CDN by default (or can be inlined into the file). This format is simple to generate and can be sent to clients or quickly displayed on a web page.

• Bokeh applications: Backed by a Bokeh server, these provide the possibility to connect to, for example, Python tooling for more advanced visualizations.

Bokeh is a little bit more complicated than Matplotlib with Seaborn and has its
drawbacks like every other library. Once you have the basic workflow down, however,
you're able to quickly extend basic visualizations with interactivity features to give
power to the user.

Note
Older versions of Bokeh offered a to_bokeh method, which allowed you to plot Matplotlib figures with Bokeh without configuration overhead. This compatibility feature has since been deprecated, so it may not be available in the version you have installed. Further information about this method is available at https://bokeh.pydata.org/en/0.12.3/docs/user_guide/compat.html.

In the following exercises and activities, we'll consolidate the theoretical knowledge
and build several simple visualizations to explain Bokeh and its two interfaces.
After we've covered the basic usage, we will compare the plotting and models
interfaces and work with widgets that add interactivity to the visualizations.

Basic Plotting
As mentioned before, the plotting interface of Bokeh gives us a higher-level
abstraction, which allows us to quickly visualize data points on a grid.

To create a new plot, we have to define our imports to load the necessary dependencies:

# importing the necessary dependencies
import pandas as pd
from bokeh.plotting import figure, show
from bokeh.io import output_notebook

output_notebook()

Before we can create a plot, we need to import the dataset. In the examples in this
chapter, we will work with a computer hardware dataset. It can be imported by using
pandas' read_csv method.

# loading the Dataset with pandas
dataset = pd.read_csv('../../Datasets/computer_hardware.csv')

The basic flow when using the plotting interface is comparable to that of
Matplotlib. We first create a figure. This figure is then used as a container to define
elements and call methods on:

# adding an index column to use it for the x-axis
dataset['index'] = dataset.index

# plotting the cache memory levels as line
plot = figure(title='Cache per Hardware', \
              x_axis_label='Hardware index', \
              y_axis_label='Cache Memory')
plot.line(dataset['index'], dataset['cach'], line_width=5)

show(plot)

Once we have created a new figure instance using the imported figure() method,
we can use it to draw lines, circles, or any other glyph that Bokeh offers. Note that
the first two arguments of the plot.line method are the x and y values, which must
be sequences containing the same number of elements.
Basic Plotting | 315

To display the plot, we then call the show() method we imported from the
bokeh.plotting interface earlier on. The following figure shows the output of the
preceding code:

Figure 6.5: Line plot showing the cache memory of different hardware
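Behind the scenes, plot.line() wraps the two positional sequences in a ColumnDataSource whose columns are named x and y, which is why both must have the same length. The following minimal sketch uses a few hypothetical cache values in place of the dataset and inspects the renderer returned by plot.line() without calling show():

```python
from bokeh.plotting import figure

# hypothetical stand-ins for dataset['index'] and dataset['cach']
hardware_index = [0, 1, 2, 3]
cache_values = [256, 32, 32, 0]

plot = figure(title='Cache per Hardware',
              x_axis_label='Hardware index',
              y_axis_label='Cache Memory')

# plot.line() returns a GlyphRenderer that ties a Line glyph to a data source
renderer = plot.line(hardware_index, cache_values, line_width=5)

# the positional sequences were stored as the columns 'x' and 'y'
source_data = renderer.data_source.data
print(sorted(source_data.keys()))  # ['x', 'y']
print(list(source_data['x']))      # [0, 1, 2, 3]
```

The same column names are what a field reference such as {'field': 'y'} points to when a glyph is colored from its own data.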

Since the interface of different plotting types is unified, scatter plots can be created in
the same way as line plots:

# plotting the hardware cache as dots
plot = figure(title='Cache per Hardware', \
              x_axis_label='Hardware', \
              y_axis_label='Cache Memory')
plot.scatter(dataset['index'], dataset['cach'], size=5, color='red')
show(plot)

The following figure shows the output of the preceding code:

Figure 6.6: Scatter plot showing the cache memory of different hardware
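Because the plotting interface is unified, every glyph method follows the same call pattern and returns a GlyphRenderer; only the underlying glyph model differs. A short sketch with hypothetical values that draws the same data as a line and as scatter markers, then reports which low-level glyph model each call created:

```python
from bokeh.plotting import figure

xs = [0, 1, 2, 3]
ys = [256, 32, 32, 0]  # hypothetical cache values

plot = figure(title='Cache per Hardware')
line_renderer = plot.line(xs, ys, line_width=2)
scatter_renderer = plot.scatter(xs, ys, size=5, color='red')

# both calls return renderers; the glyph model class is what differs
print(type(line_renderer.glyph).__name__)     # Line
print(type(scatter_renderer.glyph).__name__)  # Scatter
print(len(plot.renderers))                    # 2
```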

In many cases, a visualization will plot several attributes of a dataset. A
legend helps users understand which attributes they are looking at. Legends
display a mapping between the elements of the plot, such as lines, and the
information they represent, such as the hardware cache memory.

By adding a legend_label argument to plot calls like plot.line(), we get a
small box containing this information in the top-right corner (by default):

# plotting cache memory and cycle time with legend
plot = figure(title='Attributes per Hardware', \
              x_axis_label='Hardware index', \
              y_axis_label='Attribute Value')
plot.line(dataset['index'], dataset['cach'], \
          line_width=5, legend_label='Cache Memory')
plot.line(dataset['index'], dataset['myct'], line_width=5, \
          color='red', legend_label='Cycle time in ns')

show(plot)

The following figure shows the output of the preceding code:

Figure 6.7: Line plots displaying the cache memory and cycle time per
hardware with the legend
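Each distinct legend_label creates one LegendItem inside a single Legend model, and that legend is placed in the top-right corner unless told otherwise. A sketch with hypothetical values that rebuilds the two-line plot and inspects the legend:

```python
from bokeh.plotting import figure

xs = [0, 1, 2, 3]
cache = [256, 32, 32, 0]        # hypothetical 'cach' values
cycle_time = [125, 29, 29, 26]  # hypothetical 'myct' values

plot = figure(title='Attributes per Hardware')
plot.line(xs, cache, line_width=5, legend_label='Cache Memory')
plot.line(xs, cycle_time, line_width=5, color='red',
          legend_label='Cycle time in ns')

# plot.legend is a list-like collection of the plot's Legend models
legend = plot.legend[0]
print(len(legend.items))  # 2: one item per distinct legend_label
print(legend.location)    # top_right
```

Setting plot.legend.location, as Exercise 6.02 does later, simply overrides this default.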

When looking at the preceding example, we can see that once we have several lines,
the visualization can get cluttered.

We can give the user the ability to mute, meaning defocus, an element by
clicking it in the legend.

Only two steps are needed: add a muted_alpha argument to each line call and
set the click_policy of our legend to mute:

# adding mutability to the legend
plot = figure(title='Attributes per Hardware', \
              x_axis_label='Hardware index', \
              y_axis_label='Attribute Value')
plot.line(dataset['index'], dataset['cach'], line_width=5, \
          legend_label='Cache Memory', muted_alpha=0.2)
plot.line(dataset['index'], dataset['myct'], line_width=5, \
          color='red', legend_label='Cycle time in ns', \
          muted_alpha=0.2)

plot.legend.click_policy="mute"

show(plot)

The following figure shows the output of the preceding code:

Figure 6.8: Line plots displaying the cache memory and cycle time per hardware with a
mutable legend; cycle time is also muted
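Clicking a legend entry under the mute policy toggles the muted flag of the matching renderer, and Bokeh then draws the renderer's muted_glyph, whose alpha comes from muted_alpha. The following sketch uses hypothetical values and sets the flag from Python to simulate a click on "Cycle time in ns":

```python
from bokeh.plotting import figure

xs = [0, 1, 2, 3]
cache = [256, 32, 32, 0]        # hypothetical values
cycle_time = [125, 29, 29, 26]  # hypothetical values

plot = figure(title='Attributes per Hardware')
cache_line = plot.line(xs, cache, line_width=5,
                       legend_label='Cache Memory', muted_alpha=0.2)
cycle_line = plot.line(xs, cycle_time, line_width=5, color='red',
                       legend_label='Cycle time in ns', muted_alpha=0.2)
plot.legend.click_policy = 'mute'

# simulate the click: the renderer is now drawn with its muted glyph
cycle_line.muted = True
print(cycle_line.muted_glyph.line_alpha)  # 0.2
print(plot.legend[0].click_policy)        # mute
```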

Next, we will do an exercise to plot the graph using Bokeh.

Note
All the exercises and activities in this chapter are developed using
Jupyter Notebook. The files can be downloaded from the following link:
https://packt.live/39txwH5. All the datasets used in this chapter can be found
at https://packt.live/3bzApYN.

Exercise 6.01: Plotting with Bokeh

In this exercise, we want to use the bokeh.plotting interface, which is focused on
providing a simple interface for quick visualization creation. We will use the
world_population dataset. This dataset shows the population density of different
countries over the years. Follow these steps:

1. Create an Exercise6.01.ipynb Jupyter notebook within the
Chapter06/Exercise6.01 folder.
2. Import pandas, as well as the figure method (which will initialize a plot) and the
show method (which displays the plot) from the plotting interface of our library:

import pandas as pd
from bokeh.plotting import figure, show

3. Import and call the output_notebook method from the io interface of Bokeh
to display the plots inside a Jupyter Notebook:

from bokeh.io import output_notebook

output_notebook()

4. Use pandas to load the world_population dataset:

dataset = pd.read_csv('../../Datasets/world_population.csv', \
                      index_col=0)

5. Verify that our data has been successfully loaded by calling head on
our DataFrame:

dataset.head()

The following figure shows the output:

Figure 6.9: Loading the top five rows of the world_population dataset
using the head method

6. Populate our x-axis and y-axis with some data extraction. The x-axis will hold all
the years that are present in our columns. The y-axis will hold the population
density values of the countries. Start with Germany:

# preparing our data for Germany
# (the year columns are the ones whose name does not start with a letter)
years = [year for year in dataset.columns if not year[0].isalpha()]
# .loc[row, column] returns a single value for each year
de_vals = [dataset.loc['Germany', year] for year in years]

7. After extracting the necessary data, create a new plot by calling the Bokeh
figure method. Provide parameters such as title, x_axis_label, and
y_axis_label to define the descriptions displayed on our plot. Once our
plot is created, we can add glyphs to it. Here, we will use a simple line. Set the
legend_label parameter next to the x and y values to get an informative
legend in our visualization:

"""
plotting the population density change in Germany in the given years
"""
plot = figure(title='Population Density of Germany', \
              x_axis_label='Year', \
              y_axis_label='Population Density')
plot.line(years, de_vals, line_width=2, legend_label='Germany')
show(plot)

The following figure shows the output of the preceding code:

Figure 6.10: Creating a line plot from the population density data of Germany

8. Now add another country—in this case, Switzerland. Use the same technique
that we used with Germany to extract the data for Switzerland:

# preparing the data for the second country
ch_vals = [dataset.loc['Switzerland', year] for year in years]

9. We can add several layers of glyphs on to our figure and stack different
glyphs on top of one another to emphasize specific aspects of the data. Add an
orange line to the plot that displays the data from Switzerland. Also, plot
orange circles for each data point of the ch_vals list and assign them the same
legend_label to combine both representations, the line and the circles:

"""
plotting the data for Germany and Switzerland in one visualization,
adding circles for each data point for Switzerland
"""
plot = \
figure(title='Population Density of Germany and Switzerland', \
       x_axis_label='Year', y_axis_label='Population Density')
plot.line(years, de_vals, line_width=2, legend_label='Germany')
plot.line(years, ch_vals, line_width=2, color='orange', \
          legend_label='Switzerland')
plot.circle(years, ch_vals, size=4, line_color='orange', \
            fill_color='white', legend_label='Switzerland')
show(plot)

The following figure shows the output of the preceding code:

Figure 6.11: Adding Switzerland to the plot



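10. When looking at a larger amount of data for different countries, it makes sense
to have a separate plot for each of them. Use the gridplot layout, and pass the
x_range and y_range of the first figure to the second one so that both plots share
the same viewport: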

"""
plotting the Germany and Switzerland plot in two different
visualizations that are interconnected in terms of view port
"""
from bokeh.layouts import gridplot
plot_de = figure(title='Population Density of Germany', \
                 x_axis_label='Year', \
                 y_axis_label='Population Density', \
                 plot_height=300)

plot_ch = figure(title='Population Density of Switzerland', \


                 x_axis_label='Year', \
                 y_axis_label='Population Density', \
                 plot_height=300, x_range=plot_de.x_range, \
                 y_range=plot_de.y_range)

plot_de.line(years, de_vals, line_width=2)


plot_ch.line(years, ch_vals, line_width=2)
plot = gridplot([[plot_de, plot_ch]])
show(plot)

The following figure shows the output of the preceding code:

Figure 6.12: Using a gridplot to display the country plots next to each other

11. Realign the plots vertically by passing a two-dimensional list with one plot per
row to the gridplot method:

# plotting the preceding declared figures in a vertical manner
plot_v = gridplot([[plot_de], [plot_ch]])
show(plot_v)

The following screenshot shows the output of the preceding code:

Figure 6.13: Using the gridplot method to arrange the visualizations vertically
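Passing plot_de.x_range and plot_de.y_range to the second figure makes both figures reference the same range objects, which is what links their viewports. The sketch below uses hypothetical years and densities instead of the dataset, omits the plot_height argument, builds the vertical layout, and checks that the ranges are shared rather than copied:

```python
from bokeh.layouts import gridplot
from bokeh.plotting import figure

years = [1960, 1961, 1962]       # hypothetical years
de_vals = [210.0, 212.5, 215.1]  # hypothetical densities
ch_vals = [135.0, 138.0, 140.2]

plot_de = figure(title='Population Density of Germany',
                 x_axis_label='Year', y_axis_label='Population Density')
# reuse the first figure's range objects so pan and zoom stay in sync
plot_ch = figure(title='Population Density of Switzerland',
                 x_axis_label='Year', y_axis_label='Population Density',
                 x_range=plot_de.x_range, y_range=plot_de.y_range)

plot_de.line(years, de_vals, line_width=2)
plot_ch.line(years, ch_vals, line_width=2)

# one inner list per row: two rows with one plot each -> vertical layout
layout = gridplot([[plot_de], [plot_ch]])

print(plot_ch.x_range is plot_de.x_range)  # True
print(plot_ch.y_range is plot_de.y_range)  # True
```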

Note
To access the source code for this specific section, please refer to
https://packt.live/2Beg0KY.

You can also run this example online at https://packt.live/3e1Hbr0.
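The data extraction in Exercise 6.01 depends on the layout of the world_population table: one row per country, plus a few descriptive columns whose names start with a letter next to one column per year. The sketch below uses a tiny hypothetical frame with made-up values and a made-up descriptive column (not the real dataset). It shows how the list comprehension picks the year columns, and that dataset.loc['Germany', year] returns a single value while dataset.loc[['Germany']][year] returns a one-element Series:

```python
import pandas as pd

# hypothetical miniature of world_population: countries as the index,
# one descriptive column, and one column per year (made-up values)
dataset = pd.DataFrame(
    {'Country Code': ['DEU', 'CHE'],
     '1960': [210.0, 135.0],
     '1961': [212.5, 138.0]},
    index=['Germany', 'Switzerland'])

# year columns are the ones whose name does not start with a letter
years = [year for year in dataset.columns if not year[0].isalpha()]

# .loc[row, column] gives one scalar per year
de_vals = [float(dataset.loc['Germany', year]) for year in years]

# .loc[[row]][column] keeps the row label and gives a one-element Series
as_series = dataset.loc[['Germany']]['1960']

print(years)                           # ['1960', '1961']
print(de_vals)                         # [210.0, 212.5]
print(isinstance(as_series, pd.Series))  # True
```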

We have now covered the very basics of Bokeh. Using the plotting interface makes
it easy to get some quick visualizations in place. This helps you understand the data
you're working with.

This simplicity is achieved by abstracting away complexity, which means we give up
much of the control we would otherwise have. In the next exercise, we'll compare the
plotting and models interfaces to show you how much the plotting interface abstracts away.

Let's implement an exercise to compare the plotting and models interfaces.

Exercise 6.02: Comparing the Plotting and Models Interfaces


In this exercise, we want to compare the plotting and models interfaces. We will
compare them by creating a basic plot with the high-level plotting interface and
then recreate this plot by using the lower-level models interface. This will show us
the differences between these two interfaces and set us up for the next exercises, in
which we will need to understand how to use the models interface. Follow
these steps:

1. Create an Exercise6.02.ipynb Jupyter Notebook within the
Chapter06/Exercise6.02 folder to implement this exercise.
2. Import NumPy and pandas. Also, import and call the output_notebook method from
the io interface of Bokeh to plot inline:

import numpy as np
import pandas as pd
from bokeh.io import output_notebook

output_notebook()

3. Use pandas to load our world_population dataset:

dataset = pd.read_csv('../../Datasets/world_population.csv', \
                      index_col=0)

4. Call head on our DataFrame to verify that our data has been
successfully loaded:

dataset.head()

The following screenshot shows the output of the preceding code:

Figure 6.14: Loading the top five rows of the world_population dataset
using the head method

5. Import figure and show to display our plot:

from bokeh.plotting import figure, show

6. Create three lists that hold the years present in the dataset, the mean population
density of the whole dataset for each year, and the population density of Japan
for each year:

years = [year for year in dataset.columns if not year[0].isalpha()]
mean_pop_vals = [np.mean(dataset[year]) for year in years]
jp_vals = [dataset.loc[['Japan']][year] for year in years]

7. Create a figure and add glyphs to it. Plot the global mean with a line and the
values for Japan with crosses. Set the legend location to the bottom-right corner:

plot = \
figure(title='Global Mean Population Density compared to Japan', \
       x_axis_label='Year', y_axis_label='Population Density')

plot.line(years, mean_pop_vals, line_width=2, \
          legend_label='Global Mean')
plot.cross(years, jp_vals, legend_label='Japan', line_color='red')

plot.legend.location = 'bottom_right'

show(plot)

The following screenshot shows the output of the preceding code:

Figure 6.15: Line plots comparing the global mean population density with that of Japan

As we can see in the preceding diagram, many elements are already in place:
the x-axis labels are correct, the y-axis has a matching range, and the legend
sits in the bottom-right corner, all with very little configuration.

Using the models Interface

The models interface is of a much lower level than the plotting interface. We can
already see this when looking at the list of imports we need for a
comparable plot.

8. Import Grid, Plot, LinearAxis, Range1d, Line, Cross,
ColumnDataSource, SingleIntervalTicker, YearsTicker, the
GlyphRenderer, Title, Legend, and LegendItem from the submodules of
the models interface:

# importing the models dependencies
from bokeh.io import show
from bokeh.models.grids import Grid
from bokeh.models.plots import Plot
from bokeh.models.axes import LinearAxis
from bokeh.models.ranges import Range1d
from bokeh.models.glyphs import Line, Cross
from bokeh.models.sources import ColumnDataSource
from bokeh.models.tickers import SingleIntervalTicker, YearsTicker
from bokeh.models.renderers import GlyphRenderer
from bokeh.models.annotations import Title, Legend, LegendItem

9. Before we build our plot, we have to find the min and max values for the y-axis
since we don't want too large or too small a range of values. Take the global
mean values and the values for Japan, leaving out the first and last entries of
each list to exclude invalid values. Get their smallest and largest values and pass
them to the constructor of Range1d. For the x-axis, our list of years is predefined:

# defining the range for the x and y axis
extracted_mean_pop_vals = \
[val for i, val in enumerate(mean_pop_vals) \
if i not in [0, len(mean_pop_vals) - 1]]

extracted_jp_vals = \
[jp_val['Japan'] for i, jp_val in enumerate(jp_vals) \
if i not in [0, len(jp_vals) - 1]]

min_pop_density = min(extracted_mean_pop_vals)
min_jp_density = min(extracted_jp_vals)
min_y = int(min(min_pop_density, min_jp_density))
max_pop_density = max(extracted_mean_pop_vals)
max_jp_density = max(extracted_jp_vals)
max_y = int(max(max_jp_density, max_pop_density))
xdr = Range1d(int(years[0]), int(years[-1]))
ydr = Range1d(min_y, max_y)

10. Next, create two Axis objects, which will be used to display the axis lines and
the label for the axis. Since we also want ticks between the different values, pass
in a Ticker object that creates this setup:

axis_def = dict(axis_line_color='#222222', axis_line_width=1, \
                major_tick_line_color='#222222', \
                major_label_text_color='#222222', \
                major_tick_line_width=1)
x_axis = LinearAxis(ticker = SingleIntervalTicker(interval=10), \
                    axis_label = 'Year', **axis_def)
y_axis = LinearAxis(ticker = SingleIntervalTicker(interval=50), \
                    axis_label = 'Population Density', **axis_def)

11. Create the title by passing a Title object to the title attribute of the
Plot object:

# creating the plot object
title = \
Title(align = 'left', \
      text = 'Global Mean Population Density compared to Japan')
plot = Plot(x_range=xdr, y_range=ydr, plot_width=650, \
            plot_height=600, title=title)

12. Try to display our plot now by using the show method. Since we have no
renderers defined at the moment, Bokeh only displays an empty plot with the title
and logs a warning about the missing renderers. We need to add elements
to our plot:

"""
only the title is displayed because we are missing the renderers
that are created when adding elements
"""
show(plot)

The following screenshot shows the output of the preceding code:

Figure 6.16: Empty plot with title

13. Insert the data into ColumnDataSource objects. These are then used to map the
data sources to the glyph objects that will be displayed in the plot:

# creating the data display
line_source = ColumnDataSource(dict(x=years, y=mean_pop_vals))
line_glyph = Line(x='x', y='y', line_color='#2678b2', \
                  line_width=2)
cross_source = ColumnDataSource(dict(x=years, y=jp_vals))
cross_glyph = Cross(x='x', y='y', line_color='#fc1d26')

14. Use the appropriate add method to add objects to the plot. Layout elements such
as the Axis objects are added with the add_layout method. Glyphs, which display
our data, have to be added with the add_glyph method:

plot.add_layout(x_axis, 'below')
plot.add_layout(y_axis, 'left')
line_renderer = plot.add_glyph(line_source, line_glyph)
cross_renderer = plot.add_glyph(cross_source, cross_glyph)

15. Show our plot again to see our lines are in place:

show(plot)

The following screenshot shows the output of the preceding code:

Figure 6.17: A models interface-based plot displaying the lines and axes

16. Use Legend and LegendItem objects to add a legend to the plot. Each LegendItem
object will be displayed on its own line in the legend:

legend_items = [LegendItem(label='Global Mean', \
                           renderers=[line_renderer]), \
                LegendItem(label='Japan', \
                           renderers=[cross_renderer])]
legend = Legend(items=legend_items, location='bottom_right')

17. Create the grid by instantiating two Grid objects, one for each axis. Provide the
tickers of the previously created x and y axes:

# creating the grid
x_grid = Grid(dimension=0, ticker=x_axis.ticker)
y_grid = Grid(dimension=1, ticker=y_axis.ticker)

18. Finally, use the add_layout method to add the grid and the legend to our plot.
After this, display our complete plot, which looks like the one we created in
step 7 with only a few lines of code:

plot.add_layout(legend)
plot.add_layout(x_grid)
plot.add_layout(y_grid)
show(plot)

The following screenshot shows the output of the preceding code:

Figure 6.18: Full recreation of the visualization done with the plotting interface

As you can see, the models interface is not well suited for simple plots. It's
meant to give experienced users the full power of Bokeh when they have specific
requirements that the plotting interface cannot meet.
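To see how much work the plotting interface does for us, a short sketch can inspect an object created with figure(): it is itself a Plot from the models interface, and it has already created the LinearAxis and Grid models that steps 10 and 17 had to build by hand. The data values are hypothetical:

```python
from bokeh.models import Grid, LinearAxis, Plot
from bokeh.plotting import figure

plot = figure(title='Global Mean Population Density compared to Japan',
              x_axis_label='Year', y_axis_label='Population Density')
plot.line([1960, 1961, 1962], [20.1, 20.6, 21.0], line_width=2)  # hypothetical data

# figure() returns a Plot subclass whose axes and grids already exist
print(isinstance(plot, Plot))                             # True
print(isinstance(plot.below[0], LinearAxis))              # True: x-axis
print(isinstance(plot.left[0], LinearAxis))               # True: y-axis
print(sum(isinstance(obj, Grid) for obj in plot.center))  # 2
print(plot.below[0].axis_label)                           # Year
```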

Note
To access the source code for this specific section, please refer to
https://packt.live/3fq8pIf.

You can also run this example online at https://packt.live/2YHFOaD.



We have now looked at the difference between the high-level plotting interface and
the low-level models interface. This will help us better understand Bokeh's internal
workings and potential errors in the future. In the following activity, we'll use what
we've learned to create a basic visualization that plots the mean car price of each
manufacturer from our dataset.

First, we will color each data point based on a given value. In Bokeh, like
in geoplotlib, this can be done using ColorMapper.

ColorMapper can map specific values to a given color in the selected spectrum. By
providing the minimum and maximum value for a variable, we define the range in
which colors are returned:

# coloring the elements based on their cache memory
from bokeh.models import LinearColorMapper

color_mapper = LinearColorMapper(palette='Magma256', \
                                 low=min(dataset['cach']), \
                                 high=max(dataset['cach']))

plot = figure(title='Cache per Hardware', \
              x_axis_label='Hardware', \
              y_axis_label='Cache Memory')
# 'y' is the column that scatter() creates from its second argument
plot.scatter(dataset['index'], dataset['cach'], \
             color={'field': 'y', 'transform': color_mapper}, \
             size=10)

show(plot)

The following screenshot shows the output of the preceding code:

Figure 6.19: Cache memory colored using the amount of cache
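Conceptually, a linear color mapper splits the interval from low to high into as many equal-width bins as the palette has colors and returns the color of the bin a value falls into. The following pure-Python sketch is a simplified illustration of that idea, not Bokeh's actual implementation: the function name and the four-color palette are hypothetical, and details such as NaN handling or separate colors for out-of-range values are not modeled:

```python
import math


def linear_color(value, low, high, palette):
    """Pick a palette entry by splitting [low, high] into equal-width bins."""
    n = len(palette)
    if value <= low:
        return palette[0]
    if value >= high:
        return palette[-1]
    index = math.floor((value - low) / (high - low) * n)
    return palette[min(index, n - 1)]


palette = ['black', 'purple', 'orange', 'yellow']  # hypothetical 4-color palette
low, high = 0, 256                                  # e.g. min and max cache values

for cache in [0, 32, 64, 128, 255, 256]:
    print(cache, linear_color(cache, low, high, palette))
```

With 256 colors, as in Magma256, each bin becomes much narrower, which produces the smooth gradient seen in the figure.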

Next, we will implement all the concepts related to Bokeh we have learned so far.

Activity 6.01: Plotting Mean Car Prices of Manufacturers


This activity will combine everything that you have learned about Bokeh so far. We
will use this knowledge to create a visualization that displays the mean price of each
car manufacturer of our dataset.

Our automobile dataset contains the following columns:

• make: Manufacturer of the car

• fuel-type: Diesel or gas

• num-of-doors: Number of doors

• body-style: Body style of the car, for example, convertible

• engine-location: Front or rear

• length: Continuous from 141.1 to 208.1

• width: Continuous from 60.3 to 72.3

• height: Continuous from 47.8 to 59.8

• num-of-cylinders: Number of cylinders, for example, eight

• horsepower: Amount of horsepower

• peak-rpm: Maximum RPM

• city-mpg: Fuel economy in the city, in miles per gallon

• highway-mpg: Fuel economy on the highway, in miles per gallon

• price: Price of the car

Note that we will use only the make and price columns in our activity.

In the process, we will first plot all cars with their prices and then slowly develop
a more sophisticated visualization that also uses color to draw attention to the
manufacturers with the highest mean prices.

1. Create an Activity6.01.ipynb Jupyter Notebook within the
Chapter06/Activity6.01 folder.
2. Import pandas with an alias and make sure to enable Notebook output using
the bokeh.io interface.

3. Load the automobiles.csv dataset from the Datasets folder using pandas.
Make sure that the dataset is loaded by displaying the first five elements of
the dataset.

4. Import figure and show from Bokeh's plotting interface.

5. Add a new column named index to our dataset by assigning it the values of
dataset.index.
6. Create a new figure and plot each car using a scatter plot with the index and
price column. Give the visualization a title of Car prices and name the x-axis
Car Index. The y-axis should be named Price.
Grouping cars from manufacturers together

7. Group the dataset using groupby and the column make. Then use the mean
method to get the mean value for each column. We don't want the make
column to be used as an index, so provide the as_index=False
argument to groupby.

8. Create a new figure with a title of Car Manufacturer Mean Prices, an
x-axis label of Car Manufacturer, and a y-axis label of Mean Price. In addition to
that, handle the categorical data by providing the x_range argument to the
figure with the make column.

9. Assign the value of vertical to the xaxis.major_label_orientation
attribute of our grouped_plot. Call the show method again to display
the visualization.

Adding color

10. Import and set up a new LinearColorMapper object with a palette of
Magma256, and the min and max prices for the low and high arguments.
11. Create a new figure with the same name, labels, and x_range as before.

12. Plot each manufacturer and provide a size argument with a size of 15.

13. Provide the color argument to the scatter method and use the field and
transform attributes to provide the column (y) and the color_mapper.
14. Set the label orientation to vertical.

The final output will look like this:

Figure 6.20: Final visualization displaying the mean car price for each manufacturer

Note
The solution for this activity can be found on page 456.
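The grouping in step 7 comes down to a single pandas call. Here is a sketch of that step on a hypothetical four-row frame; the makes and prices are made up and not taken from automobiles.csv:

```python
import pandas as pd

# hypothetical rows with only the two columns the activity uses
cars = pd.DataFrame({'make': ['audi', 'bmw', 'audi', 'bmw'],
                     'price': [13950, 16430, 17450, 20970]})

# as_index=False keeps 'make' as a regular column, ready for x_range
grouped = cars.groupby('make', as_index=False)['price'].mean()
print(grouped)
```

Selecting the price column before calling mean() limits the aggregation to the one numeric column the activity needs. In recent pandas versions, calling mean() on groups that still contain text columns raises an error unless numeric_only=True is passed.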

In the next section, we will create interactive visualizations that allow the user to
modify the data that is displayed.

Adding Widgets
One of the most powerful features of Bokeh is the ability to use widgets to
interactively change the data that's displayed in a visualization. To understand the
importance of interactivity in your visualizations, imagine seeing a static visualization
about stock prices that only shows data for the last year.

If you're interested in seeing the current year or visually comparing it to
previous years, static plots won't be suitable. You would need to create one
plot for every year or overlay different years on one visualization, which would
make it much harder to read.

Comparing this to a simple plot that lets the user select the date range they want, we
can already see the advantages. You can guide the user by restricting values and only
displaying what you want them to see. Developing a story behind your visualization is
very important, and doing this is much easier if the user has ways of interacting with
the data.

Bokeh widgets work best when used in combination with the Bokeh server. However,
the Bokeh server approach is beyond the scope of this book, since it requires
working with standalone Python files. Instead, we will use a hybrid approach that
only works with the Jupyter Notebook.

We will look at the different widgets and how to use them before going in and
building a basic plot with one of them. There are a few different options regarding
how to trigger updates, which are also explained in this section. The widgets that will
be covered in the following exercise are explained in the following table:
Adding Widgets | 345

Figure 6.21: Some of the basic widgets with examples

The general way to create a new widget visible in a Jupyter Notebook is to define
a new method and wrap it into an interact widget. We'll use the "syntactic
sugar" of a decorator, that is, an @interact line placed directly above the method
definition. This gives us an interactive element that is displayed below the
executed cell, like in the following example:

# importing the widgets
from ipywidgets import interact, interact_manual

# creating an input text
@interact(Value='Input Text')
def text_input(Value):
    print(Value)

The following screenshot shows the output of the preceding code:

Figure 6.22: Interactive text input

In the preceding example, we first import the interact element from the
ipywidgets library. This then allows us to define a new method and decorate it
with @interact.

The keyword argument Value, whose name matches the method's parameter, tells
interact which widget to use based on the data type of its value. In our example,
we provide a string, which gives us a text box (the Text widget). We can refer to the
preceding table to determine which data type returns which widget.

The print statement in the preceding code prints whatever has been entered in the
textbox below the widget.
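The type-to-widget rules that interact applies can be sketched in plain Python. The function below is a hypothetical simplification of the abbreviation rules described in the ipywidgets documentation and is not part of the ipywidgets API. Note that bool must be checked before int, because bool is a subclass of int in Python:

```python
def widget_for(abbrev):
    """Return the ipywidgets class name that interact would roughly pick."""
    if isinstance(abbrev, bool):          # must come before the int check
        return 'Checkbox'
    if isinstance(abbrev, str):
        return 'Text'
    if isinstance(abbrev, (list, dict)):
        return 'Dropdown'
    if isinstance(abbrev, tuple):
        # (min, max) or (min, max, step): any float switches to a float slider
        if any(isinstance(v, float) for v in abbrev):
            return 'FloatSlider'
        return 'IntSlider'
    if isinstance(abbrev, int):
        return 'IntSlider'
    if isinstance(abbrev, float):
        return 'FloatSlider'
    raise TypeError(f'no widget abbreviation for {abbrev!r}')


for value in [False, 'Input Text', ['Option1', 'Option2'],
              (0, 100), (0.0, 100.0, 0.5)]:
    print(repr(value), '->', widget_for(value))
```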

Note
The methods that we use with interact always have the same structure.
We will look at several examples in the following exercise.

Exercise 6.03: Building a Simple Plot Using Basic Interactivity Widgets


This first exercise of the Adding Widgets topic will give you a gentle introduction to the
different widgets and the general concept of how to use them. We will quickly go over
the most common widgets (sliders, checkboxes, and dropdowns) to understand
their structure.

1. Create an Exercise6.03.ipynb Jupyter Notebook within the
Chapter06/Exercise6.03 folder to implement this exercise.

2. Import and call the output_notebook method from Bokeh's io interface to
display the plots inside Jupyter Notebook:

# make bokeh display figures inside the notebook
from bokeh.io import output_notebook
output_notebook()

Looking at Basic Widgets

3. In this first task, we will create interactive widgets with the interact helpers
of ipywidgets. Import the necessary interact and interact_manual elements
from ipywidgets:

# importing the widgets
from ipywidgets import interact, interact_manual

4. Create a checkbox widget and print out the result of the interactive element:

@interact(Value=False)
def checkbox(Value=False):
    print(Value)

The following screenshot shows the output of the preceding code:

Figure 6.23: Interactive checkbox that will switch from False to True if checked

Note
@interact() is called a decorator. It wraps the annotated method into
the interact component. This allows us to display the widget and react to changes
of its value. The method will be executed every time the value of
the widget changes.

5. Create a dropdown using a list of options, ['Option1', 'Option2',
'Option3', 'Option4'], as the @interact decorator value:

# creating a dropdown
options=['Option1', 'Option2', 'Option3', 'Option4']

@interact(Value=options)
def dropdown(Value=options[0]):
    print(Value)

The following screenshot shows the output of the preceding code:

Figure 6.24: Interactive dropdown

6. Create a text input using a value of 'Input Text' as the @interact
decorator value:

# creating an input text
@interact(Value='Input Text')
def text_input(Value):
    print(Value)

The following screenshot shows the output of the preceding code:

Figure 6.25: Interactive text input



7. Create two widgets, a dropdown and a checkbox, using the same values as in
steps 5 and 4:

# multiple widgets with default layout
options=['Option1', 'Option2', 'Option3', 'Option4']

@interact(Select=options, Display=False)
def uif(Select, Display):
    print(Select, Display)

The following screenshot shows the output of the preceding code:

Figure 6.26: Two widgets are displayed vertically by default

8. Create an int slider using a range value of (0,100) as the @interact
decorator value:

# creating an int slider with dynamic updates
@interact(Value=(0, 100))
def slider(Value=0):
    print(Value)

The following screenshot shows the output of the preceding code:

Figure 6.27: Interactive int slider



9. Create an IntSlider widget with min and max values of 0 and 100 and pass it to
the @interact decorator. Set continuous_update to False to only trigger an update
on mouse release:

# creating an int slider that only triggers on mouse release
from ipywidgets import IntSlider
int_slider = IntSlider(min=0, max=100, continuous_update=False)

@interact(Value=int_slider)
def slider(Value=0):
    print(Value)

The following screenshot shows the output of the preceding code:

Figure 6.28: Interactive int slider that only triggers upon mouse release

Note
Although the outputs of Figure 6.27 and Figure 6.28 look the same, in Figure
6.28, the slider triggers only upon mouse release.

10. Use the @interact_manual decorator, which adds an execution button to
the output that triggers a manual update of the plot. Create a float slider using a
range value of (0.0,100.0,0.5) as the decorator value to set a step size
of 0.5:

# creating a float slider 0.5 steps with manual update trigger
@interact_manual(Value=(0.0, 100.0, 0.5))
def slider(Value=0.0):
    print(Value)

The following screenshot shows the output of the preceding code:

Figure 6.29: Interactive int slider with a manual update trigger

Note
Compared to the previous cells, this one contains the interact_manual
decorator instead of interact. This adds an execution button that
triggers the update of the value instead of updating with every change.
This can be really useful when working with larger datasets, where the
recalculation time would be long. In that case, you don't want to trigger
the execution for every small step, but only once you have selected the
correct value.
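Under the hood, interact and interact_manual rely on ordinary widget objects that notify callbacks through their observe method. The following minimal sketch assumes ipywidgets is installed. It creates the same kind of IntSlider as in step 9, registers a handler (the name on_value_change is illustrative) for the value trait, and assigns values from Python; each assignment fires the handler. continuous_update only controls how often the browser sends new values while the slider is dragged, so it has no visible effect in this kernel-side sketch:

```python
from ipywidgets import IntSlider

received = []


def on_value_change(change):
    # change is a dict-like object with 'old' and 'new' entries
    received.append(change['new'])


slider = IntSlider(min=0, max=100, continuous_update=False)
slider.observe(on_value_change, names='value')

# assigning from Python fires the handler, like a front-end update would
slider.value = 42
slider.value = 7
print(received)  # [42, 7]
```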

Note
To access the source code for this specific section, please refer to
https://packt.live/3e8G60B.

You can also run this example online at https://packt.live/37ANwXT.

After looking at several example widgets and how to create and use them in the
previous exercise, we will now use a real-world stock_price dataset to create a
basic plot and add simple interactive widgets.
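The stock_prices dataset stores its dates as strings, and some of them include a time component that the upcoming exercise strips off row by row. When every value starts with a YYYY-MM-DD prefix, which the '%Y-%m-%d %H:%M:%S' format used in that exercise implies, a vectorized pandas alternative is to keep only the first ten characters. The column name date and the sample rows below are assumptions for illustration:

```python
import pandas as pd

# hypothetical rows mixing date-only and date-time strings
prices = pd.DataFrame({'date': ['2016-01-05 00:00:00',
                                '2016-01-06',
                                '2016-01-07 00:00:00']})

# keep the year-month-day prefix of every string in one vectorized step
prices['short_date'] = prices['date'].str.slice(0, 10)
print(prices['short_date'].tolist())  # ['2016-01-05', '2016-01-06', '2016-01-07']
```

Unlike the row-wise strptime approach, this does not validate the dates. It only relies on the ISO-style layout of the strings.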

Exercise 6.04: Plotting Stock Price Data in Tabs


In this exercise, we will revisit the essential widgets and build a simple plot that
displays the first 25 data points for the selected stock. The displayed stock can be
changed with a drop-down menu.

The dataset of this exercise is a stock_prices dataset. This means that we will be
looking at data over a range of time. As this is a large and variable dataset, it will be
easier to show and explain widgets such as slider and dropdown on it. The dataset
is available in the Datasets folder of the GitHub repository; here is the link to it:
https://packt.live/3bzApYN. Follow these steps:

1. Create an Exercise6.04.ipynb Jupyter Notebook in the
Chapter06/Exercise6.04 folder to implement this exercise.
2. Import the pandas library:

import pandas as pd

3. Import and call the output_notebook method from Bokeh's io interface to
display the plots inside Jupyter Notebook:

from bokeh.io import output_notebook
output_notebook()

4. After downloading the dataset and moving it into the Datasets folder of this
chapter, import our stock_prices.csv data:

dataset = pd.read_csv('../../Datasets/stock_prices.csv')

5. Test whether the data has been loaded successfully by executing the head
method on the dataset:

dataset.head()

The following screenshot shows the output of the preceding code:

Figure 6.30: Loading the top five rows of the stock_prices dataset using the head method

Since the date column carries no meaningful information about the hour, minute,
and second, we want to avoid displaying them in the visualization later on and
display only the year, month, and day.

6. Create a new column that holds the formatted short version of the date value.
Print out the first five rows of the dataset to see the new column, short_date:

# mapping the date of each row to only the year-month-day format
from datetime import datetime

def shorten_time_stamp(row):
    shortened = row['date']
    # values longer than 'YYYY-MM-DD' also contain a time part
    if len(shortened) > 10:
        parsed_date = datetime.strptime(shortened, \
                                        '%Y-%m-%d %H:%M:%S')
        shortened = parsed_date.strftime('%Y-%m-%d')
    return shortened

dataset['short_date'] = \
dataset.apply(shorten_time_stamp, axis=1)

dataset.head()
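Because shorten_time_stamp only trims strings, a vectorized pandas string operation can do the same job without calling a Python function for every row. That helps with a dataset of this size. The following is a minimal sketch. It assumes that every value in the date column is an ISO-formatted string that starts with YYYY-MM-DD, with or without a time part. The three sample rows are made up for illustration.

```python
import pandas as pd

# Made-up stand-in for the stock_prices data, mixing dates with and without a time part
sample = pd.DataFrame({
    'date': ['2016-01-05 00:00:00', '2016-01-06 00:00:00', '2010-01-04'],
    'symbol': ['AAPL', 'AAPL', 'MSFT'],
})

# Keep only the first 10 characters ('YYYY-MM-DD') of every date string at once
sample['short_date'] = sample['date'].str[:10]
print(sample['short_date'].tolist())
# → ['2016-01-05', '2016-01-06', '2010-01-04']
```

On the real data, the equivalent line would be dataset['short_date'] = dataset['date'].str[:10]. Unlike strptime, however, slicing does not validate the format.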

The following screenshot shows the output of the preceding code:

Figure 6.31: Dataset with the added short_date column

Note
The execution of the cell will take a moment since it's a fairly large dataset.
Please be patient.

Creating a Basic Plot and Adding a Widget

In this task, we will create a basic visualization with the stock price dataset.
This will be your first interactive visualization, in which you can dynamically
change the stock that is displayed in the graph. We will work with one of the
interactive widgets introduced earlier, the drop-down menu, which will be the
main point of interaction for our visualization.

7. Import the interact function from ipywidgets and the already-familiar figure and
show methods from the plotting interface. Since we also want a panel with two
tabs displaying different plot styles, also import the Panel and Tabs classes from
the models interface:

from ipywidgets import interact
from bokeh.models.widgets import Panel, Tabs
from bokeh.plotting import figure, show

To better structure our notebook, write an adaptable method that takes a
subsection of stock data as an argument and builds a two-tab Tabs object that
lets us switch between the two views in our visualization.

8. Create two tabs. The first tab will contain a line plot of the given data, while the
second will contain a circle-based representation of the same data. Create a
legend that will display the name of the currently viewed stock:

# method to build the tab-based plot
def get_plot(stock):
    # all rows belong to one stock, so its symbol is used in the legend
    stock_name=stock['symbol'].unique()[0]

    # first tab: line plot of the daily high, one categorical x value per date
    line_plot=figure(title='Stock prices', \
                     x_axis_label='Date', \
                     x_range=stock['short_date'], \
                     y_axis_label='Price in $USD')
    line_plot.line(stock['short_date'], stock['high'], \
                   legend_label=stock_name)
    # rotate the date labels (angle in radians) so they don't overlap
    line_plot.xaxis.major_label_orientation = 1

    # second tab: the same data drawn as circles
    circle_plot=figure(title='Stock prices', \
                       x_axis_label='Date', \
                       x_range=stock['short_date'], \
                       y_axis_label='Price in $USD')
    circle_plot.circle(stock['short_date'], stock['high'], \
                       legend_label=stock_name)
    circle_plot.xaxis.major_label_orientation = 1

    # wrap each plot in a Panel and combine both into one Tabs layout
    line_tab=Panel(child=line_plot, title='Line')
    circle_tab=Panel(child=circle_plot, title='Circles')
    tabs = Tabs(tabs=[ line_tab, circle_tab ])

    return tabs
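The Panel import above matches the Bokeh version the exercise was written for. In Bokeh 3.0 and later, Panel was renamed to TabPanel, so importing Panel from bokeh.models.widgets fails there. The following hedged sketch shows a version-tolerant import and a two-tab layout built from a few made-up dates and values. Note that scatter draws circle markers by default.

```python
from bokeh.plotting import figure

try:
    from bokeh.models import TabPanel, Tabs  # Bokeh 3.x name
except ImportError:
    from bokeh.models import Panel as TabPanel, Tabs  # Bokeh 2.x name

# Made-up sample data standing in for one stock's short_date and high columns
dates = ['2016-01-05', '2016-01-06', '2016-01-07']
highs = [1.0, 2.0, 1.5]

line_plot = figure(title='Stock prices', x_range=dates)
line_plot.line(dates, highs)

circle_plot = figure(title='Stock prices', x_range=dates)
circle_plot.scatter(dates, highs, size=8)

# One TabPanel per view, combined into a single Tabs layout
tabs = Tabs(tabs=[TabPanel(child=line_plot, title='Line'),
                  TabPanel(child=circle_plot, title='Circles')])
```

With this import in place, get_plot would only need TabPanel(...) in place of Panel(...).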

9. Get a list of all the stock names in our dataset by using the unique method for
our symbol column:

# extracting all the stock names
stock_names=dataset['symbol'].unique()

Once we have done this, use this list as an input for the interact element.

10. Add the drop-down widget in the decorator. Inside the decorated function, pass
the selected stock to get_plot and display the result with the show method. Only
provide the first 25 entries of each stock. By default, the Apple stock should be
displayed; its symbol in the dataset is AAPL. This gives us a visualization that is
displayed in a pane with two tabs. The first tab displays an interpolated line, and
the second tab displays the values as circles:

# creating the dropdown interaction and building the plot
@interact(Stock=stock_names)
def get_stock_for(Stock='AAPL'):
    # rows of the selected stock, limited to the first 25 entries
    stock = dataset[dataset['symbol'] == Stock][:25]
    show(get_plot(stock))
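Slicing with [:25] keeps the first 25 rows in the order they appear in the file. Those rows are the first 25 trading days only if each stock's rows are sorted by date. If you are not sure that holds for your copy of the data, a small helper can sort first. This is a sketch: first_entries_for is a hypothetical name, and it relies on short_date strings in YYYY-MM-DD form, which sort chronologically as plain text.

```python
import pandas as pd


def first_entries_for(dataset, symbol, n=25):
    """Hypothetical helper: the first n rows of one stock, in date order."""
    stock = dataset[dataset['symbol'] == symbol]
    # YYYY-MM-DD strings sort chronologically, so a plain string sort works
    return stock.sort_values('short_date').head(n)


# Made-up rows to show the behavior
sample = pd.DataFrame({
    'symbol': ['AAPL', 'MSFT', 'AAPL', 'AAPL'],
    'short_date': ['2016-01-07', '2016-01-05', '2016-01-05', '2016-01-06'],
    'high': [3.0, 9.0, 1.0, 2.0],
})
print(first_entries_for(sample, 'AAPL', n=2)['short_date'].tolist())
# → ['2016-01-05', '2016-01-06']
```

Inside get_stock_for, the slicing line would then become stock = first_entries_for(dataset, Stock).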

The following screenshot shows the output of the preceding code:

Figure 6.32: Line tab with the data of AAPL displayed



The following screenshot shows the Circles tab of the same output:

Figure 6.33: Circle tab with the data of AAPL displayed



Note
We can already see that each date is displayed on the x-axis. If we want to
display a bigger time range, we have to customize the ticks on our x-axis.
This can be done using ticker objects.
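One way to do this is to switch from the categorical short_date axis to a datetime axis. The sketch below makes its own assumptions. With a datetime axis, Bokeh positions ticks along a time scale using a DatetimeTicker, so only a handful of dates get labels, no matter how many rows are plotted. The 60 days of dummy prices are made up, and desired_num_ticks is a target rather than a guarantee.

```python
import pandas as pd
from bokeh.models import DatetimeTicker
from bokeh.plotting import figure

# Made-up sample: 60 consecutive days with dummy prices
dates = pd.Series(pd.date_range('2016-01-01', periods=60, freq='D'))
highs = list(range(60))

# A datetime axis places ticks by time instead of one label per data point
plot = figure(title='Stock prices', x_axis_type='datetime',
              x_axis_label='Date', y_axis_label='Price in $USD')
plot.line(dates, highs)

# Replace the default ticker with one that aims for about four major ticks
plot.xaxis.ticker = DatetimeTicker(desired_num_ticks=4)
```

This requires parsing the date strings into real datetimes, for example with pd.to_datetime, instead of plotting the short_date strings.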

Note
To access the source code for this specific section, please refer to
https://packt.live/3fnfPvI.

You can also run this example online at https://packt.live/3d7RqsH.

We have now covered the very basics of widgets and how to use them in a
Jupyter Notebook.

Note
If you want to learn more about using widgets and which widgets can be
used in Jupyter, visit
https://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html
and https://ipywidgets.readthedocs.io/en/stable/examples/Widget%20List.html.

In the following activity, we will use a Bokeh DataSource to add a tooltip overlay
to our plot that is displayed when hovering over the data points. In most cases, we
can feed data into our plots directly from pandas DataFrames, but for certain
features, such as tooltips, we have to use a DataSource. The following snippet uses
a computer hardware dataset with vendor_name, model, and cach columns. It plots
the cache memory of each hardware model and shows the vendor, model, and
cache size in a tooltip:

# using a ColumnDataSource to display a tooltip on hovering
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show

# each key becomes a named column that glyphs and tooltips can reference
data_source = \
ColumnDataSource(data=dict(vendor_name=dataset['vendor_name'], \
                           model=dataset['model'], \
                           cach=dataset['cach'], \
                           x=dataset['index'], \
                           y=dataset['cach']))

# @column_name placeholders are filled from the hovered data point
TOOLTIPS=[('Vendor', '@vendor_name'), ('Model', '@model'), \
          ('Cache', '@cach')]

plot = figure(title='Cache per Hardware', \
              x_axis_label='Hardware', \
              y_axis_label='Cache Memory', tooltips=TOOLTIPS)
plot.scatter('x', 'y', size=10, color='teal', source=data_source)

show(plot)

The following screenshot shows the output of the preceding code:

Figure 6.34: Cache memory plotted as dots with tooltip overlay displaying the vendor,
model, and amount of memory
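Instead of listing every column in a dict, you can also create a ColumnDataSource directly from a pandas DataFrame. Each DataFrame column becomes a source column, and the DataFrame's index is added as a column named index. The following minimal sketch uses a made-up three-row stand-in for the hardware dataset.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up stand-in for the hardware dataset used above
hardware = pd.DataFrame({
    'vendor_name': ['vendor_a', 'vendor_b', 'vendor_c'],
    'model': ['m1', 'm2', 'm3'],
    'cach': [32, 8, 64],
})

# One source column per DataFrame column, plus 'index' from the DataFrame index
source = ColumnDataSource(hardware)

TOOLTIPS = [('Vendor', '@vendor_name'), ('Model', '@model'), ('Cache', '@cach')]
plot = figure(title='Cache per Hardware', x_axis_label='Hardware',
              y_axis_label='Cache Memory', tooltips=TOOLTIPS)
# Column names are passed as strings and looked up in the source
plot.scatter('index', 'cach', size=10, color='teal', source=source)
```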

In the next activity, we will learn to extend plots using widgets.



Activity 6.02: Extending Plots with Widgets


In this activity, you will combine what you have already learned about Bokeh. You
will also need the skills you have acquired while working with pandas for additional
DataFrame handling. We will create an interactive visualization that lets us explore
the results of the 2016 Rio Olympics.

Our dataset contains the following columns:

• id: Unique ID of the athlete

• name: Name of the athlete

• nationality: Nationality of the athlete

• sex: Male or female

• dob: Date of birth of the athlete

• height: Height of the athlete

• weight: Weight of the athlete

• sport: Category the athlete is competing in

• gold: Number of gold medals the athlete won

• silver: Number of silver medals the athlete won

• bronze: Number of bronze medals the athlete won

We want to use the nationality, gold, silver, and bronze columns to create
a custom visualization that lets us dig through the Olympians.

Our visualization will display each participating country as a point in a coordinate
system where the x-axis represents the number of medals won and the y-axis
represents the number of athletes. Using interactive widgets, we will be able to filter
the displayed countries by both the maximum number of medals won and the
maximum number of athletes.

Figure 6.35: Final interactive visualization that displays the scatter plot

There are many options when it comes to choosing which interactivity to use. We will
focus on only two widgets to make it easier for you to understand the concepts. In
the end, we will have a visualization that lets us filter countries by the number of
medals they won and the number of athletes they sent to the Olympics. Hovering
over a single data point will show more information about that country:

1. Create an Activity6.02.ipynb Jupyter Notebook within the
Chapter06/Activity6.02 folder.
2. Enable notebook output using the bokeh.io interface. Import pandas, load the
dataset, and make sure that it is loaded by displaying its first five rows.

3. Import figure and show from Bokeh and interact and widgets from
ipywidgets to get started.
4. Load our olympia2016_athletes.csv dataset from the Datasets folder
and set up the interaction elements. Scroll down until you reach the cell that says
getting the max number of medals and athletes of all countries. Extract the
two numbers from the dataset.

5. Create two IntSlider widgets: one for the maximum number of athletes
(vertical orientation) and one for the maximum number of medals
(horizontal orientation).

6. Set up the @interact method, which will display the complete visualization.
The only code we will write here is to show the return value of the get_plot
method that gets all the interaction element values as parameters.

7. Implement the decorator method, move up in the Notebook, and work on the
get_plot method.
8. First, filter the countries dataset, which contains all the countries that sent
athletes to the Olympic Games. Keep only the countries whose number of medals
and number of athletes are less than or equal to the max values passed as arguments.

9. Create our DataSource and use it for the tooltips and for drawing the
circle glyphs.

10. After that, create a new plot using the figure method that has the following
attributes: title set to Rio Olympics 2016 - Medal comparison,
x_axis_label set to Number of Medals, and y_axis_label set to
Num of Athletes.

11. Execute every cell from the get_plot cell to the bottom, again making sure
that all implementations are captured.

12. When executing the cell that contains the @interact decorator, you will
see a scatter plot that displays a circle for every country displaying additional
information, such as the shortcode of the country, the number of athletes,
and the number of gold, silver, and bronze medals.
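Steps 4 and 8 come down to two pandas operations. First, aggregate the athletes into one row per country; then filter those rows against the slider values. The following is one possible sketch, not the official solution. It assumes that every row of the dataset is one athlete. The country codes and medal counts are made up, and filter_countries is a hypothetical helper name.

```python
import pandas as pd

# Made-up rows using the nationality and medal columns of the dataset
athletes = pd.DataFrame({
    'nationality': ['AAA', 'AAA', 'BBB', 'CCC'],
    'gold': [2, 0, 1, 1],
    'silver': [0, 1, 0, 0],
    'bronze': [1, 0, 0, 0],
})

# One row per country: medal sums plus the number of athletes (rows)
grouped = athletes.groupby('nationality')
countries = grouped[['gold', 'silver', 'bronze']].sum()
countries['athletes'] = grouped.size()
countries['medals'] = countries[['gold', 'silver', 'bronze']].sum(axis=1)

# Upper bounds for the two IntSliders (step 4)
max_athletes = int(countries['athletes'].max())
max_medals = int(countries['medals'].max())


def filter_countries(countries, max_athletes, max_medals):
    """Hypothetical helper for step 8: keep countries within both limits."""
    mask = ((countries['athletes'] <= max_athletes)
            & (countries['medals'] <= max_medals))
    return countries[mask]


print(filter_countries(countries, max_athletes=1, max_medals=4).index.tolist())
# → ['BBB', 'CCC']
```

The filtered DataFrame could then be passed to ColumnDataSource for the tooltips and circle glyphs in step 9.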

Note
The solution for this activity can be found on page 465.

As we mentioned before, when working with interactive features and Bokeh, you
might want to read up about the Bokeh server a little bit more. It will give you more
options to express your creativity by creating animated plots and visualizations that
can be explored by several people at the same time.

Summary
In this chapter, we have looked at another option for creating visualizations with a
whole new focus: web-based Bokeh plots. We also discovered ways in which we can
make our visualizations more interactive and give the user the chance to explore data
in a different way.

As we mentioned in the first part of this chapter, Bokeh is a comparatively new tool
that empowers developers to use their favorite language to create easily portable
visualizations for the web. After working with Matplotlib, Seaborn, geoplotlib, and
Bokeh, we can see some common interfaces and similar ways of working across
those libraries. After studying the tools covered in this book, it will be easier to
understand new plotting tools.

In the next and final chapter, we will introduce a new real-life dataset to create
visualizations. This last chapter will allow you to consolidate the concepts and tools
that you have learned about in this book and further enhance your skills.
