Python Support
Another commonly used plot type is the simple scatter plot, a close cousin of the line
plot. Instead of points being joined by line segments, here the points are represented
individually with a dot, circle, or other shape. We’ll start by setting up the notebook
for plotting and importing the functions we will use:
In[1]: %matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
import numpy as np
The third argument in the function call is a character that represents the type of symbol
used for the plotting. Just as you can specify options such as '-' and '--' to control
the line style, the marker style has its own set of short string codes. The full list of
available symbols can be seen in the documentation of plt.plot, or in Matplotlib’s
online documentation. Most of the possibilities are fairly intuitive, and we’ll show a
number of the more common ones here (Figure 4-21):
In[3]: rng = np.random.RandomState(0)
for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
    plt.plot(rng.rand(5), rng.rand(5), marker,
             label="marker='{0}'".format(marker))
For even more possibilities, these character codes can be used together with line and
color codes to plot points along with a line connecting them (Figure 4-22):
In[4]: plt.plot(x, y, '-ok'); # line (-), circle marker (o), black (k)
This type of flexibility in the plt.plot function allows for a wide variety of possible
visualization options. For a full description of the options available, refer to the
plt.plot documentation.
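As a minimal sketch of that flexibility, the format string can be combined with keyword arguments that style the line and the markers separately. The data here (a sampled sine curve) and the particular styling values are assumptions chosen only for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data (an assumption, not taken from the text)
x = np.linspace(0, 10, 20)
y = np.sin(x)

fig, ax = plt.subplots()
# '-p' means a solid line with pentagon markers; the keyword
# arguments then refine the line and marker appearance.
(line,) = ax.plot(x, y, '-p', color='gray',
                  markersize=15, linewidth=4,
                  markerfacecolor='white',
                  markeredgecolor='gray',
                  markeredgewidth=2)
ax.set_ylim(-1.2, 1.2)
```

Keeping the short format string for the basic shape and using keywords for the details tends to read more clearly than packing everything into one code.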
The primary difference between plt.scatter and plt.plot is that plt.scatter can create
scatter plots in which the properties of each point (size, face color, edge color,
etc.) are individually controlled or mapped to data.
Let’s show this by creating a random scatter plot with points of many colors and sizes.
In order to better see the overlapping results, we’ll also use the alpha keyword to
adjust the transparency level (Figure 4-25):
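A minimal sketch of such a plot follows. The random data, the size scaling factor, and the choice of the 'viridis' colormap are assumptions made for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)        # one scalar per point, mapped through the colormap
sizes = 1000 * rng.rand(100)  # one marker size per point

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap='viridis')
fig.colorbar(points, ax=ax)   # show the color scale
```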
Notice that the color argument is automatically mapped to a color scale (shown here
by the colorbar() command), and the size argument is given in points squared (the
marker area), not pixels. In this way, the color and size of points can be used to convey
information in the visualization, in order to illustrate multidimensional data.
For example, we might use the Iris data from Scikit-Learn, where each sample is one
of three types of flowers that has had the size of its petals and sepals carefully
measured (Figure 4-26):
In[8]: from sklearn.datasets import load_iris
iris = load_iris()
features = iris.data.T
We can see that this scatter plot has given us the ability to simultaneously explore
four different dimensions of the data: the (x, y) location of each point corresponds to
the sepal length and width, the size of the point is related to the petal width, and the
color is related to the particular species of flower. Multicolor and multifeature scatter
plots like this can be useful for both exploration and presentation of data.
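One way to draw a plot with this mapping is sketched below. The factor of 100 applied to the petal width, the alpha value, and the colormap are assumptions chosen for readability, and the example requires scikit-learn to be installed:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
# Rows after transposing: sepal length, sepal width, petal length, petal width
features = iris.data.T

fig, ax = plt.subplots()
points = ax.scatter(features[0], features[1], alpha=0.2,
                    s=100 * features[3],  # marker size scaled by petal width
                    c=iris.target,        # color by species label (0, 1, 2)
                    cmap='viridis')
ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel(iris.feature_names[1])
```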
Visualizing Errors
For any scientific measurement, accurate accounting for errors is nearly as important as,
if not more important than, accurate reporting of the number itself. For example,
imagine that I am using some astrophysical observations to estimate the Hubble Constant,
the local measurement of the expansion rate of the universe. I know that the
current literature suggests a value of around 71 (km/s)/Mpc, and I measure a value of
74 (km/s)/Mpc with my method. Are the values consistent? The only correct answer,
given this information, is this: there is no way to know.
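To see why the uncertainties decide the question, here is a small sketch that expresses the difference between the two values in units of their combined standard error. The uncertainties below are hypothetical, and the helper tension_in_sigma is an illustrative name that assumes independent, roughly Gaussian errors:

```python
import math

def tension_in_sigma(value_a, err_a, value_b, err_b):
    """Difference between two independent measurements, in units of
    their combined standard error (errors added in quadrature)."""
    combined_err = math.sqrt(err_a ** 2 + err_b ** 2)
    return abs(value_a - value_b) / combined_err

# Hypothetical uncertainties, chosen only for illustration
loose = tension_in_sigma(71, 2.5, 74, 5.0)
tight = tension_in_sigma(71, 0.1, 74, 0.2)
print(f"{loose:.2f} sigma")
print(f"{tight:.2f} sigma")
```

With the larger hypothetical errors the two values differ by well under one combined standard error, so they would be consistent; with the smaller ones the same two numbers disagree by more than ten.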
Basic Errorbars
A basic errorbar can be created with a single Matplotlib function call (Figure 4-27):
In[1]: %matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
import numpy as np
In[2]: x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * np.random.randn(50)
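The errorbar call for data like this could look like the following sketch. The seeded random generator and the '.k' format code are assumptions made so the example is reproducible and self-contained:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)  # seeded only to make the sketch reproducible
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * rng.randn(50)

fig, ax = plt.subplots()
# fmt='.k' draws black point markers with no connecting line;
# yerr=dy applies the same symmetric vertical error to every point.
container = ax.errorbar(x, y, yerr=dy, fmt='.k')
```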
Here the fmt is a format code controlling the appearance of lines and points, and has
the same syntax as the shorthand used in plt.plot, outlined in “Simple Line Plots”
on page 224 and “Simple Scatter Plots” on page 233.
In addition to these basic options, the errorbar function has many options to fine-tune
the outputs. Using these additional options you can easily customize the aesthetics
of your errorbar plot. I often find it helpful, especially in crowded plots, to make
the errorbars lighter than the points themselves (Figure 4-28):
In[3]: plt.errorbar(x, y, yerr=dy, fmt='o', color='black',
ecolor='lightgray', elinewidth=3, capsize=0);
In addition to these options, you can also specify horizontal errorbars (xerr),
one-sided errorbars, and many other variants. For more information on the options
available, refer to the docstring of plt.errorbar.
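As a sketch of two of these variants, the example below adds horizontal errors with xerr and builds one-sided vertical bars by passing separate lower and upper errors. The data and error sizes are invented for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data and errors (assumptions, not taken from the text)
x = np.arange(1, 6, dtype=float)
y = x ** 2
dx = 0.2                     # same horizontal error for every point
dy_lower = np.zeros_like(y)  # no downward bar
dy_upper = 0.1 * y           # upward bar only

fig, ax = plt.subplots()
# A pair of length-N sequences gives separate lower and upper errors;
# zeros on one side produce one-sided bars.
container = ax.errorbar(x, y, xerr=dx, yerr=[dy_lower, dy_upper],
                        fmt='o', color='black',
                        ecolor='lightgray', capsize=3)
```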
Continuous Errors
In some situations it is desirable to show errorbars on continuous quantities. Though
Matplotlib does not have a built-in convenience routine for this type of application,
it’s relatively easy to combine primitives like plt.plot and plt.fill_between for a
useful result.
Here we’ll perform a simple Gaussian process regression (GPR), using the Scikit-Learn
API (see “Introducing Scikit-Learn” on page 343 for details). This is a method of
fitting a very flexible nonparametric function to data with a continuous measure of the
uncertainty. We won’t delve into the details of Gaussian process regression at this
point, but will focus instead on how you might visualize such a continuous error
measurement:
In[4]: from sklearn.gaussian_process import GaussianProcessRegressor  # GaussianProcess was removed in scikit-learn 0.20
Note what we’ve done here with the fill_between function: we pass an x value, then
the lower y-bound, then the upper y-bound, and the result is that the area between
these bounds is filled.
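A sketch of the whole procedure, using the current GaussianProcessRegressor class with its default kernel, might look like this. The model function, the five data points, and the two-sigma band are assumptions chosen for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def model(x):
    """Illustrative underlying function (an assumption)."""
    return x * np.sin(x)

xdata = np.array([1, 3, 5, 6, 8], dtype=float)
ydata = model(xdata)

# Fit with the default kernel; scikit-learn expects inputs of shape (n, 1)
gp = GaussianProcessRegressor(random_state=0)
gp.fit(xdata[:, np.newaxis], ydata)

xfit = np.linspace(0, 10, 1000)
yfit, sigma = gp.predict(xfit[:, np.newaxis], return_std=True)
dyfit = 2 * sigma  # roughly a 95% band for Gaussian errors

fig, ax = plt.subplots()
ax.plot(xdata, ydata, 'or')
ax.plot(xfit, yfit, '-', color='gray')
# x values, then the lower bound, then the upper bound
band = ax.fill_between(xfit, yfit - dyfit, yfit + dyfit,
                       color='gray', alpha=0.2)
ax.set_xlim(0, 10)
```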
The resulting figure gives a very intuitive view into what the Gaussian process regression
algorithm is doing: in regions near a measured data point, the model is strongly
constrained and this is reflected in the small model errors. In regions far from a
measured data point, the model is not strongly constrained, and the model errors
increase.
For more information on the options available in plt.fill_between() (and the
closely related plt.fill() function), see the function docstring or the Matplotlib
documentation.
Finally, if this seems a bit too low level for your taste, refer to “Visualization with
Seaborn” on page 311, where we discuss the Seaborn package, which has a more
streamlined API for visualizing this type of continuous errorbar.