Module 4: Visualization Using Matplotlib
Prepared By,
Dr. Anitha DB
Associate Professor & Head
Department of CSE-Data Science
ATME College of Engineering, Mysuru
Module 4: Data Visualization with Matplotlib
Contents
• General Matplotlib Tips,
• Simple Line Plots,
• Simple Scatter Plots,
• Visualization with Seaborn
• Importing matplotlib
• Setting Styles
• show() or No show()? How to Display Your Plots
• Saving Figures to File
• Two Interfaces for the Price of One
The plt.style directive can be used to choose an appropriate aesthetic style for figures.
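As a minimal sketch of the directive above, the snippet below lists some available styles and applies one globally and one temporarily. The style names "classic" and "ggplot" are illustrative choices, not taken from the slides.

```python
import matplotlib.pyplot as plt

# List a few of the style names that ship with this Matplotlib installation.
print(plt.style.available[:5])

# Apply a style for the rest of the session ("classic" is an illustrative choice).
plt.style.use("classic")

# Apply a different style temporarily, only inside the with-block.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
```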
4.1 General Matplotlib Tips
4.1.3 show() or No show()? How to Display Your Plots
Plotting from a script: If you are using Matplotlib from within a script, the function plt.show() is your
friend. plt.show() starts an event loop, looks for all currently active figure objects, and opens one or more
interactive windows that display your figure or figures. For example, consider a file called myplot.py
containing the following:
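The original listing is not preserved in these notes, so the following is only a sketch of what such a script might contain. The data and labels are illustrative assumptions.

```python
# ------- file: myplot.py -------
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig = plt.figure()
plt.plot(x, np.sin(x), label="sin(x)")
plt.plot(x, np.cos(x), label="cos(x)")
plt.legend()

# Call plt.show() once, at the end of the script, to open the figure window(s).
plt.show()
```

Running `python myplot.py` from a terminal would then open a window showing both curves. Calling plt.show() several times in one script is best avoided, because its behavior can depend on the backend.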
Plotting from an IPython shell: Once Matplotlib mode has been enabled with the %matplotlib magic command, any
plt plot command will cause a figure window to open, and further commands can be run to update the plot. Some
changes (such as modifying properties of lines that are already drawn) will not draw automatically; to force an
update, use plt.draw().
Plotting interactively within an IPython notebook can be done with the %matplotlib command, and works in a
similar way to the IPython shell.
In the IPython notebook, you also have the option of embedding graphics directly in the notebook, with two
possible options:
• %matplotlib notebook will lead to interactive plots embedded within the notebook
• %matplotlib inline will lead to static images of your plot embedded in the notebook
4.1.5 Two Interfaces for the Price of One
A potentially confusing feature of Matplotlib is its dual interfaces: a convenient MATLAB-style state-based
interface, and a more powerful object-oriented interface.
Differences between the two
MATLAB-style interface
Matplotlib was originally written as a Python alternative for MATLAB users, and much of its syntax reflects that
fact. The MATLAB-style tools are contained in the pyplot (plt) interface. For example, the following code will
probably look quite familiar to MATLAB users (Figure 4-3):
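The figure's code is not preserved here. A minimal sketch of the MATLAB-style approach, assuming a two-panel plot of sine and cosine, could look like this:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

plt.figure()  # create a plot figure

# Create the first of two panels and make it the current axes.
plt.subplot(2, 1, 1)  # (rows, columns, panel number)
plt.plot(x, np.sin(x))

# Create the second panel and make it the current axes.
plt.subplot(2, 1, 2)
plt.plot(x, np.cos(x))
```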
It’s important to note that this interface is stateful: it keeps track of the “current” figure and axes, which are
where all plt commands are applied. You can get a reference to these using the plt.gcf() (get current figure) and
plt.gca() (get current axes) routines. While this stateful interface is fast and convenient for simple plots, it is easy
to run into problems.
For example, once the second panel is created, how can we go back and add something to the first? This is
possible within the MATLAB-style interface, but a bit clunky. Fortunately, there is a better way.
Object-oriented interface
The object-oriented interface is available for these more complicated situations, and for when you want more
control over your figure. Rather than depending on some notion of an “active” figure or axes, in the object-
oriented interface the plotting functions are methods of explicit Figure and Axes objects. To re-create the
previous plot using this style of plotting, you might do the following (Figure 4-4):
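A sketch of the object-oriented version of the same two-panel plot, again assuming sine and cosine data:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Create a grid of plots; ax is an array of two Axes objects.
fig, ax = plt.subplots(2)

# Call the plot() method on the appropriate Axes object.
ax[0].plot(x, np.sin(x))
ax[1].plot(x, np.cos(x))
```

Because each panel is an explicit object, returning to the first panel later is just another call on ax[0].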
For simpler plots, the choice of which style to use is largely a matter of preference, but the object-oriented
approach can become a necessity as plots become more complicated.
4.2 Simple Line Plot
4.2.1 Simple Line Plot
The simplest of all plots is the visualization of a single function y = f (x). Here we will take a first look at
creating a simple plot of this type. As with all the following sections, we’ll start by setting up the notebook for
plotting and importing the functions we will use:
• In Matplotlib, the figure (an instance of the class plt.Figure) can be thought of as a single container that
contains all the objects representing axes, graphics, text, and labels.
• The axes (an instance of the class plt.Axes) is what we see above: a bounding box with ticks and labels,
which will eventually contain the plot elements that make up our visualization.
• By convention, the variable name fig refers to a figure instance, and ax refers to an axes instance or group
of axes instances.
Once we have created an axes, we can use the ax.plot function to plot some data. Let’s start with a simple
sinusoid (Figure 4-6):
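A minimal sketch of creating a figure and axes and then plotting a sinusoid. The number of sample points is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()  # the container for everything that is drawn
ax = plt.axes()     # a bounding box with ticks and labels

x = np.linspace(0, 10, 1000)
ax.plot(x, np.sin(x))
```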
Alternatively, we can use the pylab interface and let the figure and axes be created for us in the background
(Figure 4-7):
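The same sinusoid can be sketched with the function-style interface, assuming the usual plt alias for matplotlib.pyplot:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)

# No explicit figure or axes here: pyplot creates them on first use.
lines = plt.plot(x, np.sin(x))
```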
If we want to create a single figure with multiple lines, we can simply call the plot function multiple times
(Figure 4-8):
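A short sketch of drawing two curves on one figure by calling plot twice (the curves are illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)

fig = plt.figure()
plt.plot(x, np.sin(x))  # first line
plt.plot(x, np.cos(x))  # second line, drawn on the same axes
```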
4.2.2 Adjusting the Plot: Line Colors and Styles
The first adjustment you might wish to make to a plot is to control the line colors and styles. The plt.plot()
function takes additional arguments that can be used to specify these. To adjust the color, you can use the color
keyword, which accepts a string argument representing virtually any imaginable color. The color can be
specified in a variety of ways (Figure 4-9):
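The sketch below shows several ways a color can be given. The specific colors and phase shifts are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig = plt.figure()

plt.plot(x, np.sin(x - 0), color="blue")           # color by name
plt.plot(x, np.sin(x - 1), color="g")              # short color code (rgbcmyk)
plt.plot(x, np.sin(x - 2), color="0.75")           # grayscale between 0 and 1
plt.plot(x, np.sin(x - 3), color="#FFDD44")        # hex code (RRGGBB, 00 to FF)
plt.plot(x, np.sin(x - 4), color=(1.0, 0.2, 0.3))  # RGB tuple, values 0 to 1
plt.plot(x, np.sin(x - 5), color="chartreuse")     # HTML color name
```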
If no color is specified, Matplotlib will automatically cycle through a set of default colors for multiple
lines.
Similarly, you can adjust the line style using the linestyle keyword (Figure 4-10):
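A sketch of the linestyle keyword using both the long names and their short codes. The offsets only separate the lines visually.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig = plt.figure()

plt.plot(x, x + 0, linestyle="solid")
plt.plot(x, x + 1, linestyle="dashed")
plt.plot(x, x + 2, linestyle="dashdot")
plt.plot(x, x + 3, linestyle="dotted")

# The same four styles, written as short codes.
plt.plot(x, x + 4, linestyle="-")   # solid
plt.plot(x, x + 5, linestyle="--")  # dashed
plt.plot(x, x + 6, linestyle="-.")  # dashdot
plt.plot(x, x + 7, linestyle=":")   # dotted
```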
If you would like to be extremely terse, these linestyle and color codes can be combined into a single
nonkeyword argument to the plt.plot() function (Figure 4-11):
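A sketch of the combined format string, where one short string carries both the line style and the color:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig = plt.figure()

plt.plot(x, x + 0, "-g")   # solid green
plt.plot(x, x + 1, "--c")  # dashed cyan
plt.plot(x, x + 2, "-.k")  # dashdot black
plt.plot(x, x + 3, ":r")   # dotted red
```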
4.2.3 Adjusting the Plot: Axes Limits
Matplotlib does a decent job of choosing default axes limits for your plot, but sometimes it’s nice to have
finer control. The most basic way to adjust axis limits is to use the plt.xlim() and plt.ylim() functions
(Figure 4-12):
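A minimal sketch of setting axis limits with plt.xlim() and plt.ylim(). The limit values are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig = plt.figure()
plt.plot(x, np.sin(x))

plt.xlim(-1, 11)      # x axis runs from -1 to 11
plt.ylim(-1.5, 1.5)   # y axis runs from -1.5 to 1.5
```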
If for some reason you’d like either axis to be displayed in reverse, you can simply reverse the order of the
arguments (Figure 4-13):
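A sketch of reversing both axes by passing the larger limit first:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig = plt.figure()
plt.plot(x, np.sin(x))

# Passing the larger value first reverses the direction of that axis.
plt.xlim(10, 0)
plt.ylim(1.2, -1.2)
```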
A useful related method is plt.axis() (note here the potential confusion between axes with an e, and axis with an i).
The plt.axis() method allows you to set the x and y limits with a single call, by passing a list that specifies [xmin,
xmax, ymin, ymax] (Figure 4-14)
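A sketch of setting all four limits with a single plt.axis() call (the values are illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig = plt.figure()
plt.plot(x, np.sin(x))

# Order of the list: [xmin, xmax, ymin, ymax]
plt.axis([-1, 11, -1.5, 1.5])
```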
The plt.axis() method goes even beyond this, allowing you to do things like automatically tighten the bounds
around the current plot (Figure 4-15):
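A sketch of the "tight" option. How much margin remains around the data can depend on the Matplotlib version and its margin settings, so no exact limits are assumed here.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig = plt.figure()
plt.plot(x, np.sin(x))

# Fit the axis limits closely to the plotted data.
plt.axis("tight")
```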
It allows even higher-level specifications, such as ensuring an equal aspect ratio so that on your screen, one unit
in x is equal to one unit in y (Figure 4-16):
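A sketch of requesting an equal aspect ratio through plt.axis():

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig = plt.figure()
plt.plot(x, np.sin(x))

# One unit in x is drawn with the same length as one unit in y.
plt.axis("equal")
```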
4.2.4 Labeling Plots
Titles and axis labels are the simplest labels; there are functions that can be used to quickly set them
(Figure 4-17):
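A minimal sketch of adding a title and axis labels. The label text is illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig = plt.figure()
plt.plot(x, np.sin(x))

plt.title("A Sine Curve")
plt.xlabel("x")
plt.ylabel("sin(x)")
```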
The plt.legend() function keeps track of the line style and color, and matches these with the correct label.
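A sketch of labeling each line with the label keyword and then building the legend from those labels:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig = plt.figure()

# The label keyword attaches a legend entry to each line.
plt.plot(x, np.sin(x), "-g", label="sin(x)")
plt.plot(x, np.cos(x), ":b", label="cos(x)")
plt.axis("equal")

legend = plt.legend()
```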
While most plt functions translate directly to ax methods (such as plt.plot() → ax.plot(), plt.legend() → ax.legend(),
etc.), this is not the case for all commands. In particular, functions to set limits, labels, and titles are slightly
modified. For transitioning between MATLAB-style functions and object-oriented methods, make the following
changes:
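The list of changes is not preserved in these notes. The sketch below shows the standard Matplotlib equivalents as comments, followed by the ax.set() shortcut that sets several properties at once.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig = plt.figure()
ax = plt.axes()
ax.plot(x, np.sin(x))

# plt.xlabel() -> ax.set_xlabel()
# plt.ylabel() -> ax.set_ylabel()
# plt.xlim()   -> ax.set_xlim()
# plt.ylim()   -> ax.set_ylim()
# plt.title()  -> ax.set_title()
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("A Simple Plot")

# ax.set() accepts the same properties as keyword arguments in one call.
ax.set(xlim=(0, 10), ylim=(-2, 2))
```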
4.3 Simple Scatter Plot
4.3.1 Simple Scatter Plot
Another commonly used plot type is the simple scatter plot, a close cousin of the line plot. Instead of points being
joined by line segments, here the points are represented individually with a dot, circle, or other shape. We’ll
start by setting up the notebook for plotting and importing the functions we will use:
4.3.2 Scatter Plots with plt.plot
The plt.plot function can also be used to produce scatter plots (Figure 4-20):
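A minimal sketch of a scatter plot drawn with plt.plot, assuming 30 sample points of a sine curve:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 30)
y = np.sin(x)

fig = plt.figure()
# The format string "o" draws a circle at each point and no connecting line.
line, = plt.plot(x, y, "o", color="black")
```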
The third argument in the function call is a character (the marker style) that represents the type of symbol
used for the plotting. A number of the more common markers are shown here (Figure 4-21):
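A sketch that draws a few random points for each of several common marker codes. The random data and the axis limit are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
markers = ["o", ".", ",", "x", "+", "v", "^", "<", ">", "s", "d"]

fig = plt.figure()
for marker in markers:
    # Five random points per marker, labeled with the marker code.
    plt.plot(rng.rand(5), rng.rand(5), marker,
             label="marker='{0}'".format(marker))

plt.legend(numpoints=1)
plt.xlim(0, 1.8)  # leave room on the right for the legend
```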
For even more possibilities, these character codes can be used together with line and color codes to plot points
along with a line connecting them (Figure 4-22):
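A sketch of combining line, marker, and color codes in one format string:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 30)
y = np.sin(x)

fig = plt.figure()
# "-ok" means: solid line (-), circle marker (o), black (k).
line, = plt.plot(x, y, "-ok")
```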
Additional keyword arguments to plt.plot specify a wide range of properties of the lines and markers (Figure 4-23):
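A sketch of the extra keyword arguments, with illustrative values for sizes, widths, and colors:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 30)
y = np.sin(x)

fig = plt.figure()
line, = plt.plot(x, y, "-p", color="gray",   # solid line, pentagon markers
                 markersize=15, linewidth=4,
                 markerfacecolor="white",
                 markeredgecolor="gray",
                 markeredgewidth=2)
plt.ylim(-1.2, 1.2)
```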
4.3.3 Scatter Plots with plt.scatter
The primary difference of plt.scatter from plt.plot is that it can be used to create scatter plots where the
properties of each individual point (size, face color, edge color, etc.) can be individually controlled or mapped
to data. Let’s show this by creating a random scatter plot with points of many colors and sizes. In order to
better see the overlapping results, we’ll also use the alpha keyword to adjust the transparency level
(Figure 4-25):
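A sketch of a random scatter plot in which every point has its own color and size. The random data, the colormap, and the size scale are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)         # one value per point, mapped to the colormap
sizes = 1000 * rng.rand(100)   # one marker size per point

fig = plt.figure()
points = plt.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap="viridis")
plt.colorbar()  # show the color scale
```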
Notice that the color argument is automatically mapped to a color scale (shown here by the colorbar()
command), and the size argument is given in points squared (that is, it sets the marker area). In this way, the
color and size of points can be used to convey information in the visualization, in order to illustrate
multidimensional data.
For example, we might use the Iris data from Scikit-Learn, where each sample is one of three types of
flowers that has had the size of its petals and sepals carefully measured (Figure 4-26):
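A sketch of the Iris scatter plot, assuming scikit-learn is installed. The size scale factor and the colormap are illustrative choices.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
features = iris.data.T  # rows: sepal length, sepal width, petal length, petal width

fig = plt.figure()
plt.scatter(features[0], features[1], alpha=0.2,
            s=100 * features[3],  # marker size follows petal width
            c=iris.target,        # color follows species
            cmap="viridis")
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
```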
We can see that this scatter plot has given us the ability to simultaneously explore four different dimensions of
the data: the (x, y) location of each point corresponds to the sepal length and width, the size of the point is
related to the petal width, and the color is related to the particular species of flower. Multicolor and
multifeature scatter plots like this can be useful for both exploration and presentation of data.
4.3.4 plot Versus scatter
A Note on Efficiency
As datasets get larger than a few thousand points, plt.plot can be noticeably more efficient than plt.scatter.
The reason is that plt.scatter has the capability to render a different size and/or color for each point, so the
renderer must do the extra work of constructing each point individually.
In plt.plot, on the other hand, the points are always essentially clones of each other, so the work of determining
the appearance of the points is done only once for the entire set of data.
For large datasets, the difference between these two can lead to vastly different performance, and for this reason,
plt.plot should be preferred over plt.scatter for large datasets.
4.4 Visualization with Seaborn
4.4.1 Seaborn Versus Matplotlib
Here is an example of a simple random-walk plot in Matplotlib, using its classic plot formatting and colors.
We start with the typical imports:
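A sketch of the random-walk plot using Matplotlib's classic style. The seed, the number of steps, and the six series labels are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np

plt.style.use("classic")

# Six random walks: cumulative sums of normally distributed steps.
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 500)
y = np.cumsum(rng.randn(500, 6), axis=0)

fig = plt.figure()
plt.plot(x, y)  # each column of y becomes one line
plt.legend(list("ABCDEF"), ncol=2, loc="upper left")
```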
4.4.2 Exploring Seaborn Plots
Histograms, KDE, and densities
Rather than a histogram, we can get a smooth estimate of the distribution using a kernel density estimation, which
Seaborn does with sns.kdeplot (Figure 4-114):
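Histograms and KDE can be combined using distplot (Figure 4-115). Note that distplot is deprecated as of
Seaborn 0.11; histplot with kde=True is the current replacement: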
If we pass the full two-dimensional dataset to kdeplot, we will get a two-dimensional visualization of the data
(Figure 4-116):
We can see the joint distribution and the marginal distributions together using sns.jointplot. For this plot,
we’ll set the style to a white background (Figure 4-117):
There are other parameters that can be passed to jointplot—for example, we can use a hexagonally based
histogram instead (Figure 4-118):
Pair Plot
When you generalize joint plots to datasets of larger dimensions, you end up with pair plots. This is very useful
for exploring correlations between multidimensional data, when you’d like to plot all pairs of values against each
other.
We’ll demo this with the well-known Iris dataset, which lists measurements of petals and sepals of three iris
species:
Faceted histograms
Sometimes the best way to view data is via histograms of subsets. Seaborn’s FacetGrid makes this extremely
simple. We’ll take a look at some data that shows the amount that restaurant staff receive in tips based on
various indicator data (Figure 4-120):
Factor Plot
Factor plots can be useful for this kind of visualization as well. They allow you to view the distribution of a
parameter within bins defined by any other parameter (Figure 4-121). In Seaborn 0.9 and later, the factorplot
function is named catplot.
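A sketch of a box-style factor plot using sns.catplot, the current name for factorplot. The helper name plot_bill_by_day and the stand-in rows are hypothetical; the real data can be fetched with sns.load_dataset("tips").

```python
import pandas as pd
import seaborn as sns


def plot_bill_by_day(tips):
    """Box plot of total bill for each day, split by sex."""
    with sns.axes_style("ticks"):
        grid = sns.catplot(x="day", y="total_bill", hue="sex",
                           data=tips, kind="box")
        grid.set_axis_labels("Day", "Total Bill")
    return grid


# Hypothetical stand-in rows with the same column names as the "tips" dataset.
tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Thur", "Fri", "Fri", "Fri", "Fri"],
    "sex": ["Male", "Male", "Female", "Female"] * 2,
    "total_bill": [12.0, 18.5, 10.0, 15.0, 22.0, 30.0, 17.5, 25.0],
})
grid = plot_bill_by_day(tips)
```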
Joint distributions
Similar to the pair plot, we can use sns.jointplot to show the joint distribution between different datasets,
along with the associated marginal distributions (Figure 4-122):
The joint plot can even do some automatic kernel density estimation and regression (Figure 4-123):
Bar plots
Time series can be plotted with sns.factorplot (renamed sns.catplot in Seaborn 0.9). In the following example
(visualized in Figure 4-124), the Planets data is used:
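A sketch of a count-style bar plot per year using sns.catplot. The helper name plot_discoveries_per_year and the stand-in rows are hypothetical; the real data can be fetched with sns.load_dataset("planets").

```python
import pandas as pd
import seaborn as sns


def plot_discoveries_per_year(planets, hue=None):
    """Bar plot counting rows per year; pass hue="method" to split the bars."""
    with sns.axes_style("white"):
        grid = sns.catplot(x="year", data=planets, kind="count",
                           hue=hue, aspect=2)
        grid.set_ylabels("Number of Planets Discovered")
    return grid


# Hypothetical stand-in rows with the same column names as the "planets" dataset.
planets = pd.DataFrame({
    "year": [2009, 2009, 2010, 2011, 2011, 2011],
    "method": ["Transit", "Radial Velocity", "Transit",
               "Transit", "Transit", "Radial Velocity"],
})
grid = plot_discoveries_per_year(planets)
by_method = plot_discoveries_per_year(planets, hue="method")
```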
We can learn more by looking at the method of discovery of each of these planets, as illustrated in
Figure 4-125:
THANK YOU