
Unit V Notes

DATA SCIENCE CLASS NOTES

Uploaded by

maryjoseph
Copyright
© All Rights Reserved

UNIT V

VISUALIZING USING MATPLOTLIB


Importing Matplotlib – Line plots – Scatter plots – visualizing errors –
density and contour plots – Histograms – legends – colors – subplots –
text and annotation – customization – three dimensional plotting -
Geographic Data with Basemap – Visualization with Seaborn.

Features of Matplotlib
 a well-tested, cross-platform graphics engine
 Has the ability to work with many operating systems and graphics
backends.
 supports dozens of backends and output types, regardless of
operating systems or output formats
 cross-platform, everything-to-everyone approach
 easy to set new global plotting styles
 Seaborn, ggplot, HoloViews, Altair, and even Pandas packages can
be used as wrappers around Matplotlib’s API
o With these wrappers, matplotlib’s syntax can be used to
adjust the final plot output

1. Importing matplotlib
 Just as np is the standard shorthand for NumPy and pd for Pandas,
Matplotlib imports have standard shorthands:


import matplotlib as mpl
import matplotlib.pyplot as plt

Setting Styles

 plt.style – a directive to choose appropriate aesthetic styles for the
figures.
Example:
To use the classic Matplotlib style for the plots created:

plt.style.use('classic')
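
As a small sketch of how styles can be inspected and applied only temporarily: plt.style.available lists the style names known to the installation, and plt.style.context() applies a style inside a with block. The Agg backend line and the variable names are assumptions made here so that the snippet runs on its own without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Names of all styles known to this Matplotlib installation
style_names = plt.style.available

x = np.linspace(0, 10, 100)

# Apply the 'classic' style only inside this block;
# the previous settings are restored when the block exits
with plt.style.context('classic'):
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x))

n_lines = len(ax.lines)
```

Using the context manager avoids changing the style for every later figure in the session, which plt.style.use() would do.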
How to Display Your Plots? show() or No show()?
 The three applicable contexts of using Matplotlib are
o script
o IPython terminal, or
o IPython notebook

Plotting from a script


 To use Matplotlib from within a script - use the function plt.show()
 plt.show()
o starts an event loop,
o looks for all currently active figure objects, and
o opens one or more interactive windows that display the
figure or figures

# ------- file: myplot.py ------


import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
plt.show()

 Running this script from the command-line prompt opens a window
with the figure displayed.
 The plt.show() command
o interacts with the system’s interactive graphical backend.
o should be used only once per Python session.
o is most often seen at the very end of the script.
 Multiple show() commands can lead to unpredictable backend-
dependent behavior, and should mostly be avoided.

Plotting from an IPython shell


 It can be very convenient to use Matplotlib interactively within an
IPython shell.
 IPython is built to work well with Matplotlib by specifying Matplotlib
mode.
 To enable this mode, the %matplotlib magic command can be used
after starting ipython:
In [1]: %matplotlib
Using matplotlib backend: TkAgg
In [2]: import matplotlib.pyplot as plt

 At this point, any plt.plot command causes a figure window to open,
and further commands can be run to update the plot.
 Some changes (such as modifying properties of lines that are
already drawn) will not draw automatically.
 To force an update, plt.draw() can be used.
 Using plt.show() in Matplotlib mode is not required.

Plotting from an IPython notebook


 The IPython notebook is a browser-based interactive data analysis
tool that can combine narrative, code, graphics, HTML elements,
and much more into a single executable document
 Two possible options for embedding graphics directly in the
notebook:
o %matplotlib notebook will lead to interactive plots embedded
within the notebook
o %matplotlib inline will lead to static images of your plot
embedded in the notebook
 After this command is run, any cell within the notebook that
creates a plot will embed a PNG image of the resulting
graphic.
In[3]: %matplotlib inline
In[4]: import numpy as np
x = np.linspace(0, 10, 100)
fig = plt.figure()
plt.plot(x, np.sin(x), '-')
plt.plot(x, np.cos(x), '--');
Saving Figures to File
 Matplotlib has the ability to save figures in a wide variety of
formats.
 savefig() command - used to save a figure.
 IPython Image object - used to display the contents of this file.

In[5]: fig.savefig('my_figure.png')
from IPython.display import Image
Image('my_figure.png')

 To get a list of supported file types for your system - Use the
following method of the figure canvas object:

In[8]: fig.canvas.get_supported_filetypes()
Out[8]: {'eps': 'Encapsulated Postscript',
'jpeg': 'Joint Photographic Experts Group',
'jpg': 'Joint Photographic Experts Group',
'pdf': 'Portable Document Format',
'pgf': 'PGF code for LaTeX',
'png': 'Portable Network Graphics',
'ps': 'Postscript',
'raw': 'Raw RGBA bitmap',
'rgba': 'Raw RGBA bitmap',
'svg': 'Scalable Vector Graphics',
'svgz': 'Scalable Vector Graphics',
'tif': 'Tagged Image File Format',
'tiff': 'Tagged Image File Format'}
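
The saving steps above can be sketched end to end as follows. This is only an illustration: the in-memory buffer, the dpi value and the variable names are choices made here, and a file name such as 'my_figure.png' can be passed to savefig() in the same way.

```python
import io
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# Check which formats this installation can write before saving
supported = fig.canvas.get_supported_filetypes()

# Save into an in-memory buffer instead of a file on disk;
# format is given explicitly because there is no file extension to infer it from
buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=150, bbox_inches='tight')
png_bytes = buffer.getvalue()
```

When a file name is given instead of a buffer, savefig() infers the format from the file extension.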

Two Interfaces
 A potentially confusing feature of Matplotlib is its dual interfaces:
o a convenient MATLAB-style state-based interface, and
o a more powerful object-oriented interface.
i. MATLAB-style interface
 Matplotlib was originally written as a Python alternative for MATLAB
users, and much of its syntax reflects that fact.
 The MATLAB-style tools are contained in the pyplot (plt) interface.
 This interface is stateful: it keeps track of the “current” figure and
axes, where all plt commands are applied.
o To get current figure – plt.gcf()
o To get current axes - plt.gca()
 This interface is fast and convenient for simple plots.

o Example: Creating Subplots using the MATLAB-style interface


In[9]: plt.figure() # create a plot figure
# create the first of two panels and set current axis
plt.subplot(2, 1, 1) # (rows, columns, panel number)
plt.plot(x, np.sin(x))
# create the second panel and set current axis
plt.subplot(2, 1, 2)
plt.plot(x, np.cos(x));

ii. Object-oriented interface


 This interface is available for more complicated situations, and
when more control over the figure is required.
 Rather than depending on some notion of an “active” figure or
axes, in the object-oriented interface the plotting functions are
methods of explicit Figure and Axes objects.
 To re-create the previous plot using this style of plotting,

In[10]: # First create a grid of plots


# ax will be an array of two Axes objects
fig, ax = plt.subplots(2)
# Call plot() method on the appropriate object
ax[0].plot(x, np.sin(x))
ax[1].plot(x, np.cos(x));
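
The difference between the two interfaces can be seen by mixing them in one figure. The sketch below assumes a headless backend and uses plt.sca() to choose the "current" axes explicitly; the variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots(2)

# Stateful interface: make the first panel the "current" axes,
# then plt.plot() draws on whatever is current
plt.sca(ax[0])
plt.plot(x, np.sin(x))

# Object-oriented interface: the target axes is named explicitly,
# and the "current" axes is left unchanged
ax[1].plot(x, np.cos(x))

same_figure = plt.gcf() is fig
same_axes = plt.gca() is ax[0]
```

With the stateful interface, going back to an earlier panel requires changing the current axes first, whereas the object-oriented calls can be issued in any order.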
2. Simple Line Plots
 Simplest of all plots is the visualization of a single function y = f (x).
 Creating a simple plot
In[1]: %matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid') # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
import numpy as np

 For all Matplotlib plots, we start by creating


o a figure
o an axes
 Creation of a figure and axes
In[2]: fig = plt.figure()
ax = plt.axes()
 The figure is
o an instance of the class plt.Figure.
o a single container that contains all the objects representing
axes, graphics, text, and labels.
o referred to as fig.
 The axes are
o an instance of the class plt.Axes.
o a bounding box with ticks and labels, which will eventually
contain the plot elements that make up the visualization.
o referred to as ax.
 After creating an axes, the ax.plot function can be used to plot some
data.
A simple sinusoid
In[3]: fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
ax.plot(x, np.sin(x));
 Alternatively, using the pylab interface, the figure and axes can be
created in the background.

Simple sinusoid via the pylab (pyplot) interface


In[4]: plt.plot(x, np.sin(x));

Over-plotting multiple lines


 To create a single figure with multiple lines, the plot function can
be used multiple times.
In[5]: plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x));

i. Adjusting the Plot: Line Colors and Styles


o The first adjustment to be made to a plot is to control the line
colors and styles.
o The plt.plot() function takes additional arguments to specify line
colors and styles.

1. Adjusting the line color


o the color keyword is used
o accepts a string argument representing virtually any
imaginable color.
o If no color is specified, Matplotlib will automatically cycle
through a set of default colors for multiple lines.

Controlling the color of plot elements


In[6]: plt.plot(x, np.sin(x - 0), color='blue') # specify color by name
plt.plot(x, np.sin(x - 1), color='g') # short color code (rgbcmyk)
plt.plot(x, np.sin(x - 2), color='0.75') # grayscale between 0 and 1
plt.plot(x, np.sin(x - 3), color='#FFDD44') # hex code (RRGGBB from 00 to FF)
plt.plot(x, np.sin(x - 4), color=(1.0,0.2,0.3)) # RGB tuple, values 0 to 1
plt.plot(x, np.sin(x - 5), color='chartreuse'); # all HTML color names supported

2. Adjusting the line style


o The line style is adjusted using the linestyle keyword.
Example of various line styles
In[7]: plt.plot(x, x + 0, linestyle='solid')
plt.plot(x, x + 1, linestyle='dashed')
plt.plot(x, x + 2, linestyle='dashdot')
plt.plot(x, x + 3, linestyle='dotted');
# For short, the following codes can be used:
plt.plot(x, x + 4, linestyle='-') # solid
plt.plot(x, x + 5, linestyle='--') # dashed
plt.plot(x, x + 6, linestyle='-.') # dashdot
plt.plot(x, x + 7, linestyle=':'); # dotted
Controlling colors and styles with the shorthand syntax
o The linestyle and color codes can be combined into a single
non-keyword argument to the plt.plot() function:

In[8]: plt.plot(x, x + 0, '-g') # solid green


plt.plot(x, x + 1, '--c') # dashed cyan
plt.plot(x, x + 2, '-.k') # dashdot black
plt.plot(x, x + 3, ':r'); # dotted red

Single-character color codes


 The single-character color codes reflect the standard abbreviations in the
o RGB (Red/Green/Blue) and
o CMYK (Cyan/Magenta/Yellow/blacK) color systems,
which are commonly used for digital color graphics.

There are many other keyword arguments that can be used to fine-tune
the appearance of the plot.
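
As a brief illustration of combining the shorthand format string with further keyword arguments, the sketch below sets the line width and transparency and then reads the properties back from the returned Line2D object. The specific values are arbitrary choices for this example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()

# '--g' is the shorthand for a dashed green line;
# linewidth and alpha are ordinary keyword arguments
(line,) = ax.plot(x, np.sin(x), '--g', linewidth=3, alpha=0.6)

# The returned Line2D object remembers how it was styled
style = line.get_linestyle()
color = line.get_color()
width = line.get_linewidth()
```

The same Line2D object also has matching set_ methods (for example line.set_linewidth), so a line can be restyled after it has been drawn.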

ii. Adjusting the Plot: Axes Limits


 Matplotlib chooses default axes limits for plots
 But sometimes it is required to have finer control.

a. Setting the axis limits with plt.xlim() and plt.ylim()


 The most basic way to adjust axis limits is to use the
o plt.xlim() method
o plt.ylim() method

Example of setting axis limits


In[9]: plt.plot(x, np.sin(x))
plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5);
Example of reversing the axes (limits given in reverse order)
In[10]: plt.plot(x, np.sin(x))
plt.xlim(10, 0)
plt.ylim(1.2, -1.2);

b. Setting the axis limits with plt.axis


 The plt.axis() method sets the x and y limits with a single call, by
passing a list that specifies [xmin, xmax, ymin, ymax].
Example for setting axis limits using plt.axis
In[11]: plt.plot(x, np.sin(x))
plt.axis([-1, 11, -1.5, 1.5]);

Example of a “tight” layout


 Passing 'tight' to plt.axis() automatically tightens the bounds
around the current plot.
In[12]: plt.plot(x, np.sin(x))
plt.axis('tight');
Example of an “equal” layout, with units matched to the output resolution
 plt.axis() also allows even higher-level specifications, such as
ensuring an equal aspect ratio so that on your screen, one unit in x
is equal to one unit in y.
In[13]: plt.plot(x, np.sin(x))
plt.axis('equal');
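
For reference, the same limit settings can be written with the object-oriented interface. The following is a minimal sketch, assuming a headless backend; it also shows that calling axis() with no arguments returns the current limits.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

ax.set_xlim(-1, 11)      # counterpart of plt.xlim(-1, 11)
ax.set_ylim(1.5, -1.5)   # limits in reverse order flip the y-axis

# With no arguments, axis() returns (xmin, xmax, ymin, ymax) as currently set
limits = ax.axis()
flipped = ax.yaxis_inverted()
```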

iii. Labeling Plots


 Plots can be labeled with titles, axis labels, and simple legends.
 Titles and axis labels are the simplest such labels; there are
functions that can be used to quickly set them.

Examples of axis labels and title


In[14]: plt.plot(x, np.sin(x))
plt.title("A Sine Curve")
plt.xlabel("x")
plt.ylabel("sin(x)");
plt.legend() function
 When multiple lines are being shown within a single axes, it can be
useful to create a plot legend that labels each line type.
 Matplotlib has a built-in way of quickly creating such a legend: the
plt.legend() function.
 The label of each line is specified using the label keyword of the
plot function.
 plt.legend() keeps track of the line style and color, and matches
these with the correct label.
 The position, size, and style of the legend can be adjusted using
optional arguments to the function.
Plot legend example
In[15]: plt.plot(x, np.sin(x), '-g', label='sin(x)')
plt.plot(x, np.cos(x), ':b', label='cos(x)')
plt.axis('equal')
plt.legend();
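
The labeling functions above also have object-oriented counterparts. The sketch below, assuming a headless backend, uses ax.set() to set several properties in one call; the title text and limits are illustrative values.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-g', label='sin(x)')
ax.plot(x, np.cos(x), ':b', label='cos(x)')

# ax.set() bundles what plt.title, plt.xlabel, plt.ylabel,
# plt.xlim and plt.ylim do one at a time
ax.set(title="Sine and Cosine Curves", xlabel="x", ylabel="value",
       xlim=(0, 10), ylim=(-2, 2))

# The legend picks up the label given to each line
legend = ax.legend(loc='upper right')
legend_labels = [text.get_text() for text in legend.get_texts()]
```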

3. Simple Scatter Plots


 The scatter plot is another commonly used plot type.
 Two functions can generate scatter plots: plt.plot and plt.scatter
 Instead of points being joined by line segments, the points are
represented individually with a dot, circle, or other shape

Scatter plot example


In[1]: %matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid') # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
import numpy as np

In[2]: x = np.linspace(0, 10, 30)


y = np.sin(x)
plt.plot(x, y, 'o', color='black');
 The third argument in the function call is a character that
represents the type of symbol used for the plotting
 The marker style has its own set of short string codes.

Demonstration of point markers


In[3]: rng = np.random.RandomState(0)
for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
    plt.plot(rng.rand(5), rng.rand(5), marker,
             label="marker='{0}'".format(marker))
plt.legend(numpoints=1)
plt.xlim(0, 1.8);

Combining line and point markers


In[4]: plt.plot(x, y, '-ok'); # line (-), circle marker (o), black (k)
Customizing line and point markers
In[5]: plt.plot(x, y, '-p', color='gray', markersize=15, linewidth=4,
markerfacecolor='white', markeredgecolor='gray', markeredgewidth=2)
plt.ylim(-1.2, 1.2);

Scatter Plots with plt.scatter


o A second, more powerful method of creating scatter plots is the
plt.scatter function, which can be used very similarly to the
plt.plot function.
o The alpha keyword is used to adjust the transparency level.

A Simple Scatter Plot


In[6]: plt.scatter(x, y, marker='o');

Changing size, color, and transparency in scatter points


In[7]: rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)
sizes = 1000 * rng.rand(100)
plt.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap='viridis')
plt.colorbar(); # show color scale
Using point properties to encode features of the Iris data
In[8]: from sklearn.datasets import load_iris
iris = load_iris()
features = iris.data.T
plt.scatter(features[0], features[1], alpha=0.2,
s=100*features[3], c=iris.target, cmap='viridis')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1]);

 This scatter plot makes it possible to explore four different
dimensions of the data simultaneously:
o the (x, y) location of each point corresponds to the sepal length
and width
o the size of the point is related to the petal width
o the color is related to the particular species of flower
 Multicolor and multi-feature scatter plots can be useful for both
exploration and presentation of data.

plot Versus scatter: A Note on Efficiency


 For large datasets, plt.plot can be noticeably more efficient than
plt.scatter.
 The plt.scatter function has the capability to render a different size
and/or color for each point, so the renderer must do the extra work
of constructing each point individually.
 In plt.plot, the points are always essentially clones of each other,
so the work of determining the appearance of the points is done
only once for the entire set of data.
 For large datasets, the difference between these two can lead to
vastly different performance, and for this reason, plt.plot should be
preferred over plt.scatter for large datasets.
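
The structural difference described above can be inspected directly. In this sketch (headless backend and random data are assumptions of the example), plt.plot produces one Line2D whose marker style is shared by all points, while plt.scatter produces a collection that stores one size and one color value per point.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
n = 1000
x = rng.randn(n)
y = rng.randn(n)

fig, ax = plt.subplots()

# plot: a single Line2D artist; every marker has the same size and color
(line,) = ax.plot(x, y, 'o', markersize=3)

# scatter: a collection carrying per-point sizes and per-point color values
points = ax.scatter(x, y, s=50 * rng.rand(n), c=rng.rand(n), cmap='viridis')

n_sizes = len(points.get_sizes())
n_color_values = len(points.get_array())
```

No timing is measured here; the sketch only shows why scatter has more per-point work to do.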

4. Visualizing Errors
 An error is the difference between the calculated value and actual
value.
 For any scientific measurement, accurate accounting for errors is
nearly as important as, if not more important than, accurate
reporting of the number itself.
 When data are represented graphically, some of the values carry
uncertainty.
 These uncertainties (errors) can be visualized effectively using
error bars, which make the plots convey much more complete
information.

Basic Errorbars
A basic errorbar can be created with a single Matplotlib function call:
An errorbar example
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid') # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
import numpy as np
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * np.random.randn(50)
plt.errorbar(x, y, yerr=dy, fmt='.k');
Customizing errorbars
 The errorbar function has many options (parameters) to fine-tune
the outputs.
o x: specifies horizontal coordinates of the data points.
o y: specifies vertical coordinates of the data points.
o xerr: defines the horizontal error bar sizes. A float or
array-like.
o yerr: defines the vertical error bar sizes. A float or
array-like.
o fmt: format code (a string) controlling the appearance of
lines and points. By default, the data are plotted together
with the error bars. Use 'none' to plot the error bars alone,
without markers.
o ecolor: specifies the color of the error bars.
o elinewidth: specifies linewidth of the error bars.
o capsize: specifies the length of the error bar caps in points
(a float).
 Example:
plt.errorbar(x, y, yerr=dy, fmt='o', color='black', ecolor='lightgray',
elinewidth=3, capsize=0);

 Using these additional options, the errorbar plot can be easily
customized.
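
As a further sketch of the options listed above, error bars do not have to be symmetric: yerr also accepts a pair of arrays (lower, upper), and xerr can be given at the same time. The error sizes below are made-up values used only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 20)
y = np.sin(x)

# Hypothetical asymmetric errors: the upper bar is longer than the lower one
lower = 0.1 + 0.1 * np.abs(y)
upper = 0.3 + 0.2 * np.abs(y)

fig, ax = plt.subplots()
container = ax.errorbar(x, y, yerr=[lower, upper], xerr=0.2,
                        fmt='o', color='black', ecolor='lightgray',
                        elinewidth=2, capsize=3)

# The returned container records which kinds of error bars were drawn
has_x = container.has_xerr
has_y = container.has_yerr
```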

Continuous Errors
 In some situations it is desirable to show errorbars on continuous
quantities.
 Matplotlib does not have a built-in convenience routine for this type
of application.
 So, primitives like plt.plot and plt.fill_between are combined for a
useful result.
 Example: performing a simple Gaussian process regression (GPR),
using the Scikit-Learn API. This is a method of fitting a very flexible
nonparametric function to data with a continuous measure of the
uncertainty.

from sklearn.gaussian_process import GaussianProcessRegressor


# define the model and draw some data
model = lambda x: x * np.sin(x)
xdata = np.array([1, 3, 5, 6, 8])
ydata = model(xdata)
# Compute the Gaussian process fit
# (GaussianProcessRegressor with its default kernel is used here because the
# older GaussianProcess class was removed in scikit-learn 0.20)
gp = GaussianProcessRegressor()
gp.fit(xdata[:, np.newaxis], ydata)
xfit = np.linspace(0, 10, 1000)
yfit, std = gp.predict(xfit[:, np.newaxis], return_std=True)
dyfit = 2 * std # 2*sigma ~ 95% confidence region

 xfit, yfit, and dyfit - sample the continuous fit to the data.
 To avoid plotting 1,000 points with 1,000 errorbars, the
plt.fill_between function can be used with a light color to visualize
this continuous error.

Representing continuous uncertainty with filled regions


plt.plot(xdata, ydata, 'or')
plt.plot(xfit, yfit, '-', color='gray')
plt.fill_between(xfit, yfit - dyfit, yfit + dyfit, color='gray', alpha=0.2)
plt.xlim(0, 10);

 The fill_between function is passed an x value, the lower y-bound,
and the upper y-bound, and the result is that the area between
these bounds is filled.
 The resulting figure gives an intuitive view of how the Gaussian
process regression algorithm works:
o In regions near a measured data point, the model is strongly
constrained and this is reflected in the small model errors.
o In regions far from a measured data point, the model is not
strongly constrained, and the model errors increase.
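
The fill_between technique does not depend on Gaussian process regression. As a minimal sketch that needs only NumPy, the band below is built from the mean and spread of repeated noisy measurements of the same curve; the number of repeats and the noise level are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
x = np.linspace(0, 10, 200)

# 30 hypothetical noisy repeats of the curve x * sin(x), one per row
trials = x * np.sin(x) + rng.randn(30, x.size)

mean = trials.mean(axis=0)
spread = 2 * trials.std(axis=0)   # 2*sigma spread of the individual repeats

fig, ax = plt.subplots()
ax.plot(x, mean, '-', color='gray')
# Fill the area between the lower and upper bounds with a light color
band = ax.fill_between(x, mean - spread, mean + spread,
                       color='gray', alpha=0.2)
ax.set_xlim(0, 10)
```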

5. Density and Contour Plots


 It is useful to display three-dimensional data in two dimensions
using contours or color-coded regions.
 The three Matplotlib functions that can be used for this task:
o plt.contour for contour plots
o plt.contourf for filled contour plots
o plt.imshow for showing images

Visualizing a Three-Dimensional Function


plt.contour
 A contour plot can be created with the plt.contour function.
 It takes three arguments: a grid of x values, a grid of y values, and
a grid of z values.
 The x and y values represent positions on the plot, and the z values
will be represented by the contour levels.
 The np.meshgrid function is used to build two-dimensional grids
from one-dimensional arrays.
 Examples:
plt.contour(X, Y, Z, colors='black');
plt.contour(X, Y, Z, 20, cmap='RdGy');
 Parameters of plt.contour function:
o X, Y: 2-D numpy arrays with same shape as Z, or
1-D arrays such that len(X)==M and len(Y)==N (where M
and N are rows and columns of Z)
o Z: The height values over which the contour is drawn.
o levels: Determines the number and positions of the contour
lines / regions.
o colors: The colors of the levels, i.e. the lines for contour and the
areas for contourf.
o cmap: color code the lines by specifying a colormap.
Example: cmap='RdGy'
RdGy (Red-Gray) colormap - good choice for centered data.
 plt.contourf() function – produces a filled contour plot, in which
the spaces between the lines are filled.
 The plt.colorbar() command is usually added when using
plt.contourf()
o it automatically creates an additional axis with labeled color
information for the plot.
o with the RdGy colormap, black regions are “peaks,” while the
red regions are “valleys.”
Drawback and remedies of generating contour plots
 Potential issue with this plot: splotchy (marked or covered with
large, irregular spots)
 the color steps are discrete rather than continuous, which is
not always what is desired
 Two solutions to solve this issue:
1. Setting the number of contours to a very high number, but
this results in a rather inefficient plot.
2. Better way to handle this is to use the plt.imshow() function,
which interprets a two-dimensional grid of data as an image.
plt.imshow()
 Does not accept an x and y grid, so the extent [xmin, xmax,
ymin, ymax] of the image on the plot should be manually
specified.
 By default, it follows the standard image array definition
where the origin is in the upper left, not in the lower left as
in most contour plots. This must be changed when showing
gridded data.
 Automatically adjusts the axis aspect ratio to match the input
data; this can be changed by calling, for example, plt.axis('image')
to make x and y units match.
 Example:
plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower', cmap='RdGy',
alpha=0.5)

 Parameters of plt.imshow function:


o Z: indicates data of the image.
o extent: indicates bounding box in data coordinates.
o origin: used to place the [0, 0] index of the array in the
upper left or lower left corner of the axes.
o cmap: colormap instance or registered colormap name.
o alpha: sets the transparency of the image, from 0 (transparent)
to 1 (opaque).
 plt.clabel() – function used to label the contour lines directly
on the plot.
Generation of contour plots using a function z = f (x, y):
In[1]: %matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-white') # named 'seaborn-v0_8-white' in Matplotlib 3.6 and later
import numpy as np
In[2]: def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
In[3]: x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
# Creating 2-D grid of features
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
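
To make the role of np.meshgrid concrete, the following NumPy-only sketch looks at the shapes it produces and checks that broadcasting gives the same grid of z values. The variable names Z_grid and Z_broadcast are introduced here for illustration.

```python
import numpy as np

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)

# meshgrid turns the two 1-D arrays into two 2-D arrays of the same shape
X, Y = np.meshgrid(x, y)
shape = X.shape                 # (len(y), len(x)): rows follow y, columns follow x

# Every row of X is a copy of x; every column of Y is a copy of y
rows_are_x = np.array_equal(X[0], x)
cols_are_y = np.array_equal(Y[:, 0], y)

# The same z values can be obtained by broadcasting a column vector against a row
Z_grid = np.sin(X) ** 10 + np.cos(10 + Y * X) * np.cos(X)
Z_broadcast = np.sin(x) ** 10 + np.cos(10 + y[:, np.newaxis] * x) * np.cos(x)
```

The row and column ordering shown here is why Z has 40 rows and 50 columns when it is passed to plt.contour or plt.imshow.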

Visualizing three-dimensional data with contours


# plots contour lines
In[4]: plt.contour(X, Y, Z, colors='black');

Visualizing three-dimensional data with colored contours


In[5]: plt.contour(X, Y, Z, 20, cmap='RdGy');

Visualizing three-dimensional data with filled contours


In[6]: plt.contourf(X, Y, Z, 20, cmap='RdGy')
plt.colorbar();
Representing three-dimensional data as an image
In[7]: plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower', cmap='RdGy')
plt.colorbar()
plt.axis('image');

Combining contour plots and image plots


Labeled contours on top of an image
In[8]: contours = plt.contour(X, Y, Z, 3, colors='black')
plt.clabel(contours, inline=True, fontsize=8)
plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower', cmap='RdGy',
alpha=0.5)
plt.colorbar();

6. Histograms, Binnings, and Density


 The hist() function in pyplot module of matplotlib library is used to
plot a histogram.
In[1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-white') # named 'seaborn-v0_8-white' in Matplotlib 3.6 and later
data = np.random.randn(1000)

A simple histogram
In[2]: plt.hist(data);

A customized histogram
In[3]: plt.hist(data, bins=30, density=True, alpha=0.5, histtype='stepfilled',
color='steelblue', edgecolor='none'); # density replaces the removed normed keyword

Over-plotting multiple histograms


In[4]: x1 = np.random.normal(0, 0.8, 1000)
x2 = np.random.normal(-2, 1, 1000)
x3 = np.random.normal(3, 2, 1000)
kwargs = dict(histtype='stepfilled', alpha=0.3, density=True,
bins=40)
plt.hist(x1, **kwargs)
plt.hist(x2, **kwargs)
plt.hist(x3, **kwargs);
 np.histogram() function - counts the number of points in a given bin

In[5]: counts, bin_edges = np.histogram(data, bins=5)


print(counts)
Out[5]: [ 12 190 468 301 29]

Two-Dimensional Histograms and Binnings


 Defining some data—an x and y array drawn from a multivariate
Gaussian distribution
In[6]: mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 10000).T

A two-dimensional histogram with plt.hist2d


 plt.hist2d is used to plot a two-dimensional histogram
 Syntax:
plt.hist2d(x, y, bins=30, cmap='Blues')
 Parameters of plt.hist2d function:
o x, y : denotes sequence of input data.
o bins : optional parameter that contains the integer or sequence
or string.
o cmap: refers to colormap instance or registered colormap
name used to map scalar data to colors.
In[7]: plt.hist2d(x, y, bins=30, cmap='Blues')
cb = plt.colorbar()
cb.set_label('counts in bin')
In[8]: counts, xedges, yedges = np.histogram2d(x, y, bins=30)
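
The counts from np.histogram2d can also be drawn by hand, which shows how the returned arrays are laid out. This is a sketch with seeded random data and a headless backend assumed; pcolormesh is used here as one possible way to display the grid.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
x, y = rng.multivariate_normal([0, 0], [[1, 1], [1, 2]], 10000).T

# counts has one entry per (x-bin, y-bin); each edge array has bins + 1 entries
counts, xedges, yedges = np.histogram2d(x, y, bins=30)

fig, ax = plt.subplots()
# counts is indexed [x-bin, y-bin], while pcolormesh expects rows to follow y,
# so the array is transposed before plotting
mesh = ax.pcolormesh(xedges, yedges, counts.T, cmap='Blues')
fig.colorbar(mesh, ax=ax, label='counts in bin')
```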

A two-dimensional histogram with plt.hexbin


 The hexbin() function is used to make a 2D hexagonal binning plot
of points x, y.
 gridsize parameter: the number of hexagons in the x-direction (an
integer), or in both directions (a pair of integers).

In[9]: plt.hexbin(x, y, gridsize=30, cmap='Blues')


cb = plt.colorbar(label='count in bin')

 np.histogramdd function – used for histogram binning in
dimensions higher than two.
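
As a small illustration of np.histogramdd, the sketch below bins random three-dimensional points. The sample size and the number of bins per dimension are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.RandomState(0)
sample = rng.randn(500, 3)        # 500 points, each with 3 coordinates

# One bin count per dimension; edges is a list with one edge array per dimension
counts, edges = np.histogramdd(sample, bins=(4, 5, 6))

edge_lengths = [len(e) for e in edges]
```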

Kernel density estimation


 Another common method of evaluating densities in multiple
dimensions is Kernel Density Estimation (KDE).
 KDE can be thought of as a way to “smear out” the points in space
and add up the result to obtain a smooth function.
 One extremely quick and simple KDE implementation exists in the
scipy.stats package.
 gaussian_kde finds a nearly optimal smoothing length for the input
data.
Kernel density representation of a distribution
In[10]: from scipy.stats import gaussian_kde
# fit an array of size [Ndim, Nsamples]
data = np.vstack([x, y])
kde = gaussian_kde(data)
# evaluate on a regular grid
xgrid = np.linspace(-3.5, 3.5, 40)
ygrid = np.linspace(-6, 6, 40)
Xgrid, Ygrid = np.meshgrid(xgrid, ygrid)
Z = kde.evaluate(np.vstack([Xgrid.ravel(), Ygrid.ravel()]))
# Plot the result as an image
plt.imshow(Z.reshape(Xgrid.shape), origin='lower', aspect='auto',
extent=[-3.5, 3.5, -6, 6], cmap='Blues')
cb = plt.colorbar()
cb.set_label("density")
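
The "smear out and add up" idea can be written out by hand in one dimension. This is only a sketch of the principle, not of how gaussian_kde is implemented: the bandwidth is a fixed, hand-picked value (an assumption), whereas gaussian_kde chooses its smoothing length from the data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
samples = rng.randn(200)

bandwidth = 0.3                      # hypothetical smoothing length, chosen by hand
grid = np.linspace(-6, 6, 1201)
dx = grid[1] - grid[0]

# One normalized Gaussian bump per sample, evaluated on the grid: shape (200, 1201)
bumps = np.exp(-0.5 * ((grid - samples[:, np.newaxis]) / bandwidth) ** 2)
bumps /= bandwidth * np.sqrt(2 * np.pi)

# Averaging the bumps gives a smooth density estimate
density = bumps.mean(axis=0)
total = density.sum() * dx           # numerical integral over the grid

fig, ax = plt.subplots()
ax.plot(grid, density)
ax.set(xlabel='x', ylabel='density')
```

A smaller bandwidth gives a spikier estimate and a larger one a smoother estimate, which is the trade-off that automatic bandwidth selection tries to balance.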

7. Customizing Plot Legends


 Plot legends give meaning to a visualization, assigning labels to the
various plot elements.
 plt.legend() command - automatically creates a legend for any
labeled plot elements
 Parameters of plt.legend function
o loc - specifies the location of the legend. The string values
‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’ place the
legend at the corresponding corner of the axes/figure.
o frameon – a boolean (True or False) specifying whether the
legend should be drawn on a patch (frame).
o ncol - specifies the number of columns in the legend
o fancybox - Whether round edges should be enabled around
the FancyBboxPatch which makes up the legend's background.
o shadow: Whether to draw a shadow behind the legend.
o framealpha: change the transparency (alpha value) of the frame

A default plot legend


In[1]: import matplotlib.pyplot as plt
plt.style.use('classic')
In[2]: %matplotlib inline
import numpy as np
In[3]: x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')
ax.axis('equal')
leg = ax.legend();

A customized plot legend


In[4]: ax.legend(loc='upper left', frameon=False)
fig

A two-column plot legend


 ncol parameter – used to specify the number of columns in the
legend
In[5]: ax.legend(frameon=False, loc='lower center', ncol=2)
fig

A fancybox plot legend


In[6]: ax.legend(fancybox=True, framealpha=1, shadow=True,
borderpad=1)
fig
Choosing Elements for the Legend
 The legend includes all labeled elements by default.
 If this is not what is desired, we can fine-tune which elements and
labels appear in the legend by using the objects returned by plot
commands.
 The plt.plot() command is able to create multiple lines at once, and
returns a list of created line instances.
 Passing any of these to plt.legend() will tell it which to identify,
along with the labels we’d like to specify
Customization of legend elements
In[7]: y = np.sin(x[:, np.newaxis] + np.pi * np.arange(0, 2, 0.5))
lines = plt.plot(x, y)
# lines is a list of plt.Line2D instances
plt.legend(lines[:2], ['first', 'second']);

Alternative method of customizing legend elements


In[8]: plt.plot(x, y[:, 0], label='first')
plt.plot(x, y[:, 1], label='second')
plt.plot(x, y[:, 2:])
plt.legend(framealpha=1, frameon=True);
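
Another way to choose legend entries, sketched below under the assumption of a headless backend, is to ask the axes for the labeled handles with ax.get_legend_handles_labels() and then pass a selection, or a reordering, to ax.legend().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
y = np.sin(x[:, np.newaxis] + np.pi * np.arange(0, 2, 0.5))

fig, ax = plt.subplots()
ax.plot(x, y[:, 0], label='first')
ax.plot(x, y[:, 1], label='second')
ax.plot(x, y[:, 2:])                 # these lines have no label

# Only elements that were given a label are returned here
handles, labels = ax.get_legend_handles_labels()

# Pass the entries back in reverse order to change how the legend is arranged
legend = ax.legend(handles[::-1], labels[::-1])
shown = [text.get_text() for text in legend.get_texts()]
```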
Legend for Size of Points
 The legend references some objects in the plot.
 Sometimes the legend defaults are not sufficient for the given
visualization.
 Example:
Using the size of points to indicate the areas of California cities,
with color indicating population. The following code creates a
legend that specifies the scale of the point sizes by plotting some
labeled data with no entries.
Location, geographic size, and population of California cities
In[9]: import pandas as pd
cities = pd.read_csv('data/california_cities.csv')
# Extract the data
lat, lon = cities['latd'], cities['longd']
population, area = cities['population_total'], cities['area_total_km2']
# Scatter the points, using size and color but no label
plt.scatter(lon, lat, label=None, c=np.log10(population),
cmap='viridis', s=area, linewidth=0, alpha=0.5)
plt.axis('equal')
plt.xlabel('longitude')
plt.ylabel('latitude')
plt.colorbar(label='log$_{10}$(population)')
plt.clim(3, 7)
# Create a legend: plot empty lists with the desired size and label
for area in [100, 300, 500]:
    plt.scatter([], [], c='k', alpha=0.3, s=area,
                label=str(area) + 'km$^2$')
plt.legend(scatterpoints=1, frameon=False, labelspacing=1,
           title='City Area')
plt.title('California Cities: Area and Population');
Multiple Legends
 Sometimes when designing a plot, it is desirable to add multiple
legends to the same axes.
 Matplotlib does not make this easy: via the standard legend
interface, it is only possible to create a single legend for the
entire plot.
 Trying to create a second legend using plt.legend() or ax.legend()
will simply override the first one.
 To work around this, a new legend artist is created from scratch,
and then the lower-level ax.add_artist() method is used to manually
add the second artist to the plot.
 ax.legend() creates a suitable Legend artist, which is then
saved in the legend_ attribute and added to the figure when the plot
is drawn.
A split plot legend
In[10]: fig, ax = plt.subplots()
lines = []
styles = ['-', '--', '-.', ':']
x = np.linspace(0, 10, 1000)
for i in range(4):
    lines += ax.plot(x, np.sin(x - i * np.pi / 2), styles[i], color='black')
ax.axis('equal')
# specify the lines and labels of the first legend
ax.legend(lines[:2], ['line A', 'line B'], loc='upper right',
frameon=False)
# Create the second legend and add the artist manually.
from matplotlib.legend import Legend
leg = Legend(ax, lines[2:], ['line C', 'line D'], loc='lower right',
frameon=False)
ax.add_artist(leg);
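
A variation on the same idea, sketched here as an alternative rather than as the method used in these notes, is to create the first legend with ax.legend(), register it with ax.add_artist() so it is kept as an ordinary artist, and then call ax.legend() again for the second one.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
styles = ['-', '--', '-.', ':']
x = np.linspace(0, 10, 1000)

lines = []
for i in range(4):
    lines += ax.plot(x, np.sin(x - i * np.pi / 2), styles[i], color='black')

# First legend: create it, then keep it on the axes as a plain artist
first = ax.legend(lines[:2], ['line A', 'line B'],
                  loc='upper right', frameon=False)
ax.add_artist(first)

# Second legend: this call replaces ax.legend_, but the first legend
# stays on the axes because it was added as an artist above
second = ax.legend(lines[2:], ['line C', 'line D'],
                   loc='lower right', frameon=False)
```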
8. Customizing Colorbars
 Plot legends identify discrete labels of discrete points.
 For continuous labels based on the color of points, lines, or regions,
a labeled colorbar can be a great tool.
 In Matplotlib, a colorbar is a separate axes that can provide a key
for the meaning of colors in a plot.
 The simplest colorbar can be created with the plt.colorbar function.

A simple colorbar legend


In[1]: import matplotlib.pyplot as plt
plt.style.use('classic')
In[2]: %matplotlib inline
import numpy as np
In[3]: x = np.linspace(0, 10, 1000)
I = np.sin(x) * np.cos(x[:, np.newaxis])
plt.imshow(I)
plt.colorbar();

Customizing Colorbars
 The colormap can be specified using the cmap argument to the
plotting function that is creating the visualization.
A grayscale colormap
In[4]: plt.imshow(I, cmap='gray');
Choosing the colormap
 Three different categories of colormaps:
1. Sequential colormaps
These consist of one continuous sequence of colors (e.g., binary or
viridis).
2. Divergent colormaps
These usually contain two distinct colors, which show positive and
negative deviations from a mean (e.g., RdBu or PuOr).
3. Qualitative colormaps
These mix colors with no particular sequence (e.g., rainbow or jet).

i. The jet colormap and its uneven luminance scale


 The jet colormap, which was the default in Matplotlib prior to version
2.0, is an example of a qualitative colormap.
 Drawbacks of Qualitative maps:
o Qualitative maps are a poor choice for representing quantitative
data.
o Qualitative maps usually do not display any uniform progression
in brightness as the scale increases.

Conversion of the jet colorbar into black and white:


In[5]: from matplotlib.colors import LinearSegmentedColormap
def grayscale_cmap(cmap):
    """Return a grayscale version of the given colormap"""
    cmap = plt.get_cmap(cmap)  # plt.cm.get_cmap in older Matplotlib
    colors = cmap(np.arange(cmap.N))
    # convert RGBA to perceived grayscale luminance
    # cf. http://alienryderflex.com/hsp.html
    RGB_weight = [0.299, 0.587, 0.114]
    luminance = np.sqrt(np.dot(colors[:, :3] ** 2, RGB_weight))
    colors[:, :3] = luminance[:, np.newaxis]
    return LinearSegmentedColormap.from_list(cmap.name + "_gray",
                                             colors, cmap.N)
def view_colormap(cmap):
    """Plot a colormap with its grayscale equivalent"""
    cmap = plt.get_cmap(cmap)
    colors = cmap(np.arange(cmap.N))
    cmap = grayscale_cmap(cmap)
    grayscale = cmap(np.arange(cmap.N))
    # two stacked strips: full color on top, grayscale below
    fig, ax = plt.subplots(2, figsize=(6, 2),
                           subplot_kw=dict(xticks=[], yticks=[]))
    ax[0].imshow([colors], extent=[0, 10, 0, 1])
    ax[1].imshow([grayscale], extent=[0, 10, 0, 1])

In[6]: view_colormap('jet')

ii. The viridis colormap and its even luminance scale


 In the above example, consider the bright stripes in the grayscale
image.
 Even in full color, this uneven brightness means that the eye will
be drawn to certain portions of the color range, which will
potentially emphasize unimportant parts of the dataset.
 It’s better to use a colormap such as viridis (the default as of
Matplotlib 2.0), which is specifically constructed to have an even
brightness variation across the range.
 Thus, it is useful for color perception and also grayscale printing.

In[7]: view_colormap('viridis')
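
The difference between the two colormaps can also be examined numerically rather than by eye. The sketch below is an illustration added to these notes, under the assumption that the same luminance weights as in grayscale_cmap are an acceptable brightness measure; the helper names perceived_luminance and is_monotonic are hypothetical. It samples each colormap and reports whether its brightness moves in a single direction along the scale.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for portability
import matplotlib.pyplot as plt

# Same weights as the grayscale conversion used for the colormap strips.
RGB_WEIGHT = np.array([0.299, 0.587, 0.114])

def perceived_luminance(name, n=256):
    """Luminance of n evenly spaced samples of the named colormap."""
    rgb = plt.get_cmap(name)(np.linspace(0, 1, n))[:, :3]  # drop alpha
    return np.sqrt((rgb ** 2) @ RGB_WEIGHT)

def is_monotonic(values):
    """True if the values never change direction (all rising or all falling)."""
    steps = np.diff(values)
    return bool(np.all(steps >= 0) or np.all(steps <= 0))

for name in ['jet', 'viridis']:
    lum = perceived_luminance(name)
    print(name, 'monotonic luminance:', is_monotonic(lum))
```

A colormap whose luminance rises and then falls, as jet does, maps two different data values to the same gray level when printed in black and white.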

iii. The cubehelix colormap and its luminance


 Another good option for continuous data is the cubehelix colormap.
In[8]: view_colormap('cubehelix')
iv. The RdBu (Red-Blue) colormap and its luminance
 For other situations, such as showing positive and negative
deviations from some mean, dual-color colorbars such as RdBu
(short for Red-Blue) can be useful.

 The positive-negative information will be lost upon translation to
grayscale.

Color limits and extensions


 Matplotlib allows for a large range of colorbar customization.
 The colorbar itself is quite flexible: for example, the color limits
can be customized, and out-of-bounds values can be indicated with a
triangular arrow at the top and bottom by setting the extend property.
 For noisy data, the result is a much more useful visualization.
 Example: Displaying an image that is subject to noise
Specifying colormap extensions
In[10]: # make noise in 1% of the image pixels
speckles = (np.random.random(I.shape) < 0.01)
I[speckles] = np.random.normal(0, 3, np.count_nonzero(speckles))
plt.figure(figsize=(10, 3.5))
plt.subplot(1, 2, 1)
plt.imshow(I, cmap='RdBu')
plt.colorbar()
plt.subplot(1, 2, 2)
plt.imshow(I, cmap='RdBu')
plt.colorbar(extend='both')
plt.clim(-1, 1);
Discrete colorbars
 Colormaps are by default continuous.
 To represent discrete values, use the plt.get_cmap() function
(plt.cm.get_cmap() in older Matplotlib versions), and pass the name
of a suitable colormap along with the number of desired bins.
 The discrete version of a colormap can be used just like any other
colormap.

A discretized colormap
In[11]: plt.imshow(I, cmap=plt.get_cmap('Blues', 6))
plt.colorbar()
plt.clim(-1, 1);
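
As a quick check of what "discrete" means here, the following sketch (an addition to the notes, with illustrative variable names) builds a six-bin version of Blues and confirms that, however finely the scale is sampled, only six distinct colors come back. The tick placement on the colorbar is an assumption chosen to mark the bin edges.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for portability
import matplotlib.pyplot as plt

# Resample the continuous 'Blues' colormap into 6 discrete bins.
discrete_blues = plt.get_cmap('Blues', 6)

# Sampling the scale at 256 points still yields only 6 distinct RGBA colors.
samples = discrete_blues(np.linspace(0, 1, 256))
n_unique = len(np.unique(samples, axis=0))
print(discrete_blues.N, n_unique)

# Use it like any other colormap; ticks are placed at the 7 bin edges.
x = np.linspace(0, 10, 200)
image = np.sin(x) * np.cos(x[:, np.newaxis])
fig, ax = plt.subplots()
im = ax.imshow(image, cmap=discrete_blues, vmin=-1, vmax=1)
cbar = fig.colorbar(im, ax=ax, ticks=np.linspace(-1, 1, 7))
```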
9. Multiple Subplots
 The subplots in Matplotlib are groups of smaller axes that can exist
together within a single figure.
 These subplots might be insets, grids of plots, or other more
complicated layouts.
 There are four routines for creating subplots in Matplotlib.
 They include:
o plt.axes
o plt.subplot
o plt.subplots
o plt.GridSpec
i. plt.axes: Subplots by Hand
o The most basic method of creating axes is to use the plt.axes
function.
o By default this function creates a standard axes object that fills the
entire figure.
o plt.axes also takes an optional argument that is a list of four
numbers in the figure coordinate system.
o These numbers represent [left, bottom, width, height] in the figure
coordinate system, which ranges from 0 at the bottom left of the
figure to 1 at the top right of the figure.

Example of an inset axes


 An inset axes can be created at the top-right corner of another
axes by setting the x and y position to 0.65 (that is, starting at 65%
of the width and 65% of the height of the figure) and the x and y
extents to 0.2 (that is, the size of the axes is 20% of the width and
20% of the height of the figure).
In[1]: %matplotlib inline
import matplotlib.pyplot as plt
# this style was named 'seaborn-white' before Matplotlib 3.6
plt.style.use('seaborn-v0_8-white')
import numpy as np
In[2]: ax1 = plt.axes() # standard axes
ax2 = plt.axes([0.65, 0.65, 0.2, 0.2])
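
The order of the four numbers is easy to mix up, so it can help to read the position back from the axes. This is a small sketch added for illustration; it deliberately uses four different values (an assumption, not the values from the example above) so that each one can be identified in the result.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for portability
import matplotlib.pyplot as plt

fig = plt.figure()
main_ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])

# Four distinct numbers: left edge, bottom edge, width, height,
# all expressed as fractions of the figure size.
inset_ax = fig.add_axes([0.6, 0.2, 0.3, 0.5])

# get_position() returns the bounding box in figure coordinates.
box = inset_ax.get_position()
left, bottom, width, height = box.x0, box.y0, box.width, box.height
print(left, bottom, width, height)
```

The first two values locate the lower-left corner of the axes (x first, then y); the last two give its size.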
Example: Creation of two vertically stacked axes
 The equivalent of plt.axes command within the object-oriented
interface is fig.add_axes().
In[3]: fig = plt.figure()
ax1 = fig.add_axes([0.1, 0.5, 0.8, 0.4],
xticklabels=[], ylim=(-1.2, 1.2))
ax2 = fig.add_axes([0.1, 0.1, 0.8, 0.4],
ylim=(-1.2, 1.2))
x = np.linspace(0, 10)
ax1.plot(np.sin(x))
ax2.plot(np.cos(x));

 There are two axes (the top with no tick labels) that are just
touching: the bottom of the upper panel (at position 0.5) matches
the top of the lower panel (at position 0.1 + 0.4).

ii. plt.subplot: Simple Grids of Subplots


 Aligned columns or rows of subplots are a common enough need
that Matplotlib has several convenience routines that make them
easy to create.
 plt.subplot() creates a single subplot within a grid.
 This command takes three integer arguments:
o the number of rows
o the number of columns
o the index of the plot to be created, which starts at 1 and runs
from the upper left to the bottom right

A plt.subplot() example
In[4]: for i in range(1, 7):
    plt.subplot(2, 3, i)
    plt.text(0.5, 0.5, str((2, 3, i)), fontsize=18, ha='center')

plt.subplot() with adjusted margins


o The command plt.subplots_adjust can be used to adjust the spacing
between these plots.
o The following code uses the equivalent object-oriented command,
fig.add_subplot():

In[5]: fig = plt.figure()
fig.subplots_adjust(hspace=0.4, wspace=0.4)
for i in range(1, 7):
    ax = fig.add_subplot(2, 3, i)
    ax.text(0.5, 0.5, str((2, 3, i)), fontsize=18, ha='center')

 The hspace and wspace arguments of plt.subplots_adjust specify
the spacing along the height and width of the figure, in units of the
subplot size (in this case, the space is 40% of the subplot width and
height).
iii. plt.subplots: The Whole Grid in One Go
 The approach that uses plt.subplot() becomes quite tedious when
creating a large grid of subplots, especially when hiding the x- and
y-axis labels on the inner plots.
 This problem is solved by using plt.subplots().
 Rather than creating a single subplot, this function creates a full
grid of subplots in a single line, returning them in a NumPy array.
 Compared to plt.subplot(), plt.subplots() is more consistent with
Python’s conventional 0-based indexing.
 The arguments are the number of rows and number of columns,
along with optional keywords sharex and sharey, which specify
the relationships between different axes.

Shared x and y axis in plt.subplots()


 Example: A 2×3 grid of subplots can be created, where
o all axes in the same row share their y-axis scale, and
o all axes in the same column share their x-axis scale
In[6]: fig, ax = plt.subplots(2, 3, sharex='col', sharey='row')

Identifying plots in a subplot grid


 The inner labels on the grid are automatically removed to make
the plot cleaner by specifying sharex and sharey.
 The resulting grid of axes instances is returned within a NumPy
array, allowing for the convenient specification of the desired axes
using standard array indexing notation.
In[7]: # axes are in a two-dimensional array, indexed by [row, col]
for i in range(2):
    for j in range(3):
        ax[i, j].text(0.5, 0.5, str((i, j)), fontsize=18, ha='center')
fig
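
One practical detail of plt.subplots() is that the shape of the returned object depends on the grid. The sketch below is an added illustration (variable names are arbitrary) showing the three common cases and the squeeze=False option, which always returns a two-dimensional array.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for portability
import matplotlib.pyplot as plt
import numpy as np

# A 2x3 grid: a 2-D array indexed as axes[row, col].
fig_grid, axes = plt.subplots(2, 3, sharex='col', sharey='row')
print(axes.shape)

# A single row: the array is squeezed to one dimension.
fig_row, row_axes = plt.subplots(1, 3)
print(row_axes.shape)

# No grid arguments: a single Axes object, not an array.
fig_one, single_ax = plt.subplots()
print(isinstance(single_ax, np.ndarray))

# squeeze=False keeps the result two-dimensional in every case,
# which is convenient when the grid size is computed at run time.
fig_2d, always_2d = plt.subplots(1, 3, squeeze=False)
print(always_2d.shape)
```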
iv. plt.GridSpec: More Complicated Arrangements
 To go beyond a regular grid to subplots that span multiple rows and
columns, plt.GridSpec() is the best tool.
 The plt.GridSpec() object does not create a plot by itself; it is simply
a convenient interface that is recognized by the plt.subplot()
command.
 For example, a gridspec for a grid of two rows and three columns
with some specified width and height space can be created:

In[8]: grid = plt.GridSpec(2, 3, wspace=0.4, hspace=0.3)

Irregular subplots with plt.GridSpec


 The subplot locations and extents are specified using the familiar
Python slicing syntax.
In[9]: plt.subplot(grid[0, 0])
plt.subplot(grid[0, 1:])
plt.subplot(grid[1, :2])
plt.subplot(grid[1, 2]);
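
To see which grid cells each slice claims, the subplot specification of every axes can be inspected. This is a hedged sketch added to the notes: the dictionary keys are made-up labels, and the rowspan and colspan attributes are assumed to be available (they exist in reasonably recent Matplotlib releases).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for portability
import matplotlib.pyplot as plt

fig = plt.figure()
grid = plt.GridSpec(2, 3, wspace=0.4, hspace=0.3)

# Same four slices as the irregular layout, with illustrative labels.
panels = {
    'top_left': fig.add_subplot(grid[0, 0]),
    'top_right': fig.add_subplot(grid[0, 1:]),
    'bottom_left': fig.add_subplot(grid[1, :2]),
    'bottom_right': fig.add_subplot(grid[1, 2]),
}

# rowspan and colspan are range objects listing the occupied grid cells.
for name, ax in panels.items():
    spec = ax.get_subplotspec()
    print(name, 'rows', list(spec.rowspan), 'cols', list(spec.colspan))
```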

Visualizing multidimensional distributions with plt.GridSpec


 This type of flexible grid alignment can be used when creating multi-
axes histogram plots.
In[10]: # Create some normally distributed data
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 3000).T
# Set up the axes with gridspec
fig = plt.figure(figsize=(6, 6))
grid = plt.GridSpec(4, 4, hspace=0.2, wspace=0.2)
main_ax = fig.add_subplot(grid[:-1, 1:])
y_hist = fig.add_subplot(grid[:-1, 0], xticklabels=[], sharey=main_ax)
x_hist = fig.add_subplot(grid[-1, 1:], yticklabels=[], sharex=main_ax)
# scatter points on the main axes
main_ax.plot(x, y, 'ok', markersize=3, alpha=0.2)
# histogram on the attached axes
x_hist.hist(x, 40, histtype='stepfilled', orientation='vertical',
color='gray')
x_hist.invert_yaxis()
y_hist.hist(y, 40, histtype='stepfilled',orientation='horizontal',
color='gray')
y_hist.invert_xaxis()
10. Visualization with Seaborn
 Although Matplotlib has proven to be an incredibly useful and popular
visualization tool, it is awkward to use in certain situations.
 Some drawbacks of using Matplotlib:
o Prior to version 2.0, Matplotlib’s defaults were not exactly the
best choices.
o Matplotlib’s API is relatively low level.
o Doing sophisticated statistical visualization is possible, but
often requires a lot of boilerplate code.
o Matplotlib is not designed for use with Pandas DataFrames.
o In order to visualize data from a Pandas DataFrame, each
Series must be extracted, and the Series often have to be
concatenated together into the right format.
 To solve these problems, Seaborn is used.
 Seaborn is a plotting library that can intelligently use the DataFrame
labels in a plot.
o It provides an API on top of Matplotlib that offers sane choices
for plot style and color defaults
o defines simple high-level functions for common statistical plot
types
o integrates with the functionality provided by Pandas
DataFrames

Seaborn Versus Matplotlib


Example: A simple random-walk plot in Matplotlib, using its classic plot
formatting and colors.

Data in Matplotlib’s default style


In[1]: import matplotlib.pyplot as plt
plt.style.use('classic')
%matplotlib inline
import numpy as np
import pandas as pd

In[2]: # Create some data
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 500)
y = np.cumsum(rng.randn(500, 6), 0)

In[3]: # Plot the data with Matplotlib defaults
plt.plot(x, y)
plt.legend('ABCDEF', ncol=2, loc='upper left');

Data in Seaborn’s default style


 Seaborn has many of its own high-level plotting routines.
 It can also overwrite Matplotlib’s default parameters, so that
even simple Matplotlib scripts produce much better-looking output.
 The style can be set by calling Seaborn’s set() method.
In[4]: import seaborn as sns
sns.set()
In[5]: # same plotting code as above!
plt.plot(x, y)
plt.legend('ABCDEF', ncol=2, loc='upper left');

Exploring Seaborn Plots


 The main idea of Seaborn is that it provides high-level commands
to create a variety of plot types useful for statistical data
exploration, and even some statistical model fitting.
 All of the following could be done using raw Matplotlib commands.
But the Seaborn API is much more convenient.
Plot types available in Seaborn
i. Histograms, KDE, and densities
ii. Pair Plots
iii. Faceted histograms
iv. Factor plots
v. Joint distribution plots
vi. Bar plots

i. Histograms, KDE, and densities


 In statistical data visualization, a common first step is to plot
histograms and joint distributions of the variables.

Histograms for visualizing distributions


In[6]: data = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]],
                                     size=2000)
data = pd.DataFrame(data, columns=['x', 'y'])
for col in 'xy':
    # density=True replaces the normed argument of older Matplotlib
    plt.hist(data[col], density=True, alpha=0.5)

Kernel density estimates for visualizing distributions


 Rather than a histogram, a smooth estimate of the distribution can
be created using a kernel density estimation, which Seaborn does
with sns.kdeplot.
In[7]: for col in 'xy':
    # fill=True replaces the older shade=True (Seaborn 0.11 and later)
    sns.kdeplot(data[col], fill=True)
Kernel density and histograms plotted together
 Histograms and KDE can be combined using distplot.
In[8]: sns.distplot(data['x'])
sns.distplot(data['y']);

A two-dimensional kernel density plot


 By passing both columns of the two-dimensional dataset to kdeplot, a
two-dimensional visualization of the data can be created.
In[9]: # older Seaborn versions accepted sns.kdeplot(data) here
sns.kdeplot(data=data, x='x', y='y');
A joint distribution plot with a two-dimensional kernel density estimate
 The joint distribution and the marginal distributions can be created
together using sns.jointplot.
In[10]: with sns.axes_style('white'):
    sns.jointplot(x="x", y="y", data=data, kind='kde');

A joint distribution plot with a hexagonal bin representation


 There are other parameters that can be passed to jointplot—for
example, a hexagonally based histogram can be used instead.
In[11]: with sns.axes_style('white'):
    sns.jointplot(x="x", y="y", data=data, kind='hex')

ii. Pair plots


 Pair plots can be used to generalize joint plots to datasets of larger
dimensions.
 This is very useful for exploring correlations between
multidimensional data, to plot all pairs of values against each other.
 Consider the well-known Iris dataset, which lists measurements of
petals and sepals of three iris species.
In[12]: iris = sns.load_dataset("iris")
iris.head()
Out[12]: sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa

A pair plot showing the relationships between four variables


Visualizing the multidimensional relationships among the samples is as
easy as calling sns.pairplot.
In[13]: # the height argument was called size before Seaborn 0.9
sns.pairplot(iris, hue='species', height=2.5);

iii. Faceted histograms


 Sometimes the best way to view data is via histograms of subsets.
Seaborn’s FacetGrid makes this extremely simple.
An example of a faceted histogram
Consider the data that shows the amount that restaurant staff receives
in tips based on various indicator data.
In[14]: tips = sns.load_dataset('tips')
tips.head()
Out[14]: total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4

In[15]: tips['tip_pct'] = 100 * tips['tip'] / tips['total_bill']
grid = sns.FacetGrid(tips, row="sex", col="time",
                     margin_titles=True)
grid.map(plt.hist, "tip_pct", bins=np.linspace(0, 40, 15));

iv. Factor plots


An example of a factor plot, comparing distributions given various
discrete factors
Factor plots are useful for viewing the distribution of a parameter
within bins defined by any other parameter.
Since Seaborn 0.9, the function sns.factorplot has been renamed
sns.catplot; the code here uses the current name.
In[16]: with sns.axes_style(style='ticks'):
    g = sns.catplot(x="day", y="total_bill", hue="sex", data=tips,
                    kind="box")
    g.set_axis_labels("Day", "Total Bill");
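
For readers without network access to the example datasets, the same kind of plot can be tried on generated data. This is a sketch under stated assumptions: the DataFrame below is synthetic (random values, hypothetical column contents), and the call uses sns.catplot, the current name of the factor plot function.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for portability
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in for the tips data; the numbers carry no real meaning.
rng = np.random.default_rng(2)
n = 240
fake_tips = pd.DataFrame({
    'day': rng.choice(['Thur', 'Fri', 'Sat', 'Sun'], size=n),
    'sex': rng.choice(['Female', 'Male'], size=n),
    'total_bill': rng.gamma(shape=4.0, scale=5.0, size=n),
})

# One box per (day, sex) pair: 'day' sets the x position, 'sex' the color.
g = sns.catplot(data=fake_tips, x='day', y='total_bill', hue='sex',
                kind='box')
g.set_axis_labels('Day', 'Total Bill')

print(type(g).__name__)
```

The returned object is a FacetGrid, so the same methods used for faceted histograms (for example set_axis_labels) apply here as well.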
v. Joint distributions
 Similar to the pair plot, sns.jointplot can be used to show the joint
distribution between different datasets, along with the associated
marginal distributions.

A joint distribution plot


In[17]: with sns.axes_style('white'):
    sns.jointplot(x="total_bill", y="tip", data=tips, kind='hex')

vi. Bar plots


 Time series can be plotted as bar charts with sns.factorplot
(renamed sns.catplot since Seaborn 0.9).

A histogram as a special case of a factor plot


Example:
In[19]: planets = sns.load_dataset('planets')
planets.head()
Out[19]: method number orbital_period mass distance year
0 Radial Velocity 1 269.300 7.10 77.40 2006
1 Radial Velocity 1 874.774 2.21 56.95 2008
2 Radial Velocity 1 763.000 2.60 19.84 2011
3 Radial Velocity 1 326.030 19.40 110.62 2007
4 Radial Velocity 1 516.220 10.50 119.47 2009
In[20]: with sns.axes_style('white'):
    # sns.catplot is the current name of sns.factorplot
    g = sns.catplot(x="year", data=planets, aspect=2, kind="count",
                    color='steelblue')
    g.set_xticklabels(step=5)

Number of planets discovered by year and type


In[21]: with sns.axes_style('white'):
    # sns.catplot is the current name of sns.factorplot
    g = sns.catplot(x="year", data=planets, aspect=4.0, kind='count',
                    hue='method', order=range(2001, 2015))
    g.set_ylabels('Number of Planets Discovered')
