DSV Mod 5 PPT

21CS44 DATA SCIENCE VISUALIZATION MODULE 5 NOTES VTU

Uploaded by

jayanthi.vm2003
||Jai Sri Gurudev||

SJC Institute of Technology


Department of Computer Science and Engineering

Module-5: A Deep Dive into Matplotlib

Prepared
By

Prof. Ajay N
Assistant Professor,
Dept. of CSE,
SJCIT, Chickballapur.
Introduction
• In the previous chapter, we focused on various visualizations and identified which
visualization is best suited to show certain information for a given dataset.
• We learned about the features, uses, and best practices for various plots, such as
comparison plots, relation plots, composition plots, distribution plots, and geoplots.
• Matplotlib is probably the most popular plotting library for Python. It is used for data
science and machine learning visualizations all around the world. John Hunter, an
American neurobiologist, began developing Matplotlib in 2003.
Overview of Plots in Matplotlib
• Plots in Matplotlib have a hierarchical structure that nests Python objects to create a
tree-like structure. Each plot is encapsulated in a Figure object. This Figure is the top-level
container of the visualization. It can have multiple axes, which are basically individual
plots inside this top-level container.
The two main components of a plot are as follows:
• Figure
The Figure is the outermost container and allows you to draw
multiple plots within it. It not only holds the Axes object but
also has the ability to configure the Title.
• Axes
The axes is an actual plot, or subplot, depending on whether you
want to plot single or multiple visualizations. Its sub-objects
include the x-axis, y-axis, spines, and legends.
Figure 5.1: A Figure contains at least one axes object
Anatomy of a Matplotlib Figure
• Spines: Lines connecting the axis tick marks
• Title: Text label of the whole Figure object
• Legend: Describes the content of the plot
• Grid: Vertical and horizontal lines used as an extension
of the tick marks
• X/Y axis label: Text labels for the X and Y axes below
the spines
• Minor tick: Small value indicators between the major
tick marks
• Minor tick label: Text label that will be displayed at
the minor ticks
• Major tick: Major value indicators on the spines
• Major tick label: Text label that will be displayed at
the major ticks
• Line: Plotting type that connects data points with a line
• Markers: Plotting type that plots every data point with
a defined marker
Figure 5.2: Anatomy of a Matplotlib Figure
Pyplot Basics
pyplot contains a simpler interface for creating visualizations that allows users to plot data without explicitly
configuring the Figure and Axes themselves. They are automatically configured to achieve the desired output. It is handy to
use the alias plt to reference the imported submodule, as follows:
import matplotlib.pyplot as plt

Creating Figures

Use plt.figure() to create a new Figure.

By default, the Figure has a width of 6.4 inches and a height of 4.8 inches with a dpi (dots per inch) of 100. To change
the default values of the Figure, we can use the parameters figsize and dpi.

plt.figure(figsize=(10, 5)) #To change the width and the height


plt.figure(dpi=300) #To change the dpi
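The two parameters can also be combined in one call. The following is a minimal sketch, assuming a headless run (the Agg backend line is an assumption and can be dropped for interactive use); the variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt

# Figure created with the default size and dpi
default_fig = plt.figure()

# Figure that is 10 inches wide and 5 inches high, rendered at 300 dpi
wide_fig = plt.figure(figsize=(10, 5), dpi=300)

# Read the settings back from the Figure objects
default_size = tuple(default_fig.get_size_inches())
wide_size = tuple(wide_fig.get_size_inches())
print(default_size, default_fig.dpi)
print(wide_size, wide_fig.dpi)
```

Reading the values back with get_size_inches() and the dpi attribute is a quick way to confirm which settings a Figure actually uses.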
Closing Figures

The plt.close() command closes the current Figure.

To close a specific Figure, you can either provide a reference to a Figure instance or provide the Figure number. To
find the number of a Figure object, we can make use of the number attribute, as follows:

plt.gcf().number

The plt.close('all') command is used to close all active Figures. The following example shows how a Figure can be
created and closed:

plt.figure(num=10) #Create Figure with Figure number 10


plt.close(10) #Close Figure with Figure number 10
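The different ways of closing Figures can be tried together. This is a small sketch, assuming a headless run (the Agg backend line is an assumption); plt.fignum_exists() and plt.get_fignums() are used here only to inspect which Figures are still open.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt

# Create a Figure with an explicit number and look the number up again
plt.figure(num=10)
current_number = plt.gcf().number

# Close by Figure number
plt.close(10)
still_open = plt.fignum_exists(10)

# Close by reference to a Figure instance
other_fig = plt.figure()
plt.close(other_fig)

# Open two more Figures, then close every active Figure at once
plt.figure()
plt.figure()
plt.close('all')
remaining = plt.get_fignums()
print(current_number, still_open, remaining)
```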
Format strings specify colors, marker types, and line styles.
A format string is specified as [color][marker][line], where each item is optional. If color is the only
item in the format string, you can use any matplotlib.colors specification.
Matplotlib recognizes the following formats, among others:
• RGB or RGBA float tuples (for example, (0.2, 0.4, 0.3) or (0.2, 0.4, 0.3, 0.5))
• RGB or RGBA hex strings (for example, '#0F0F0F' or '#0F0F0F0F')
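As an illustration, the sketch below draws one line with a format string and two lines with explicit color specifications. The data values are made up, and the Agg backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

# Format string 'ro--': red color, circle markers, dashed line
(line_fmt,) = plt.plot(x, [1, 4, 9, 16], 'ro--')

# Color given as an RGBA float tuple
(line_rgba,) = plt.plot(x, [2, 3, 4, 5], color=(0.2, 0.4, 0.3, 0.5))

# Color given as an RGB hex string
(line_hex,) = plt.plot(x, [4, 3, 2, 1], color='#0F0F0F')

print(line_fmt.get_color(), line_fmt.get_marker(), line_fmt.get_linestyle())
```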
Demonstration
Ticks
Tick locations and labels can be set manually if Matplotlib's default isn't sufficient. Considering the previous plot, it
might be preferable to only have ticks at integer values on the x-axis. One way to accomplish this is to use
plt.xticks() and plt.yticks() to either get or set the ticks manually.
plt.xticks(ticks, [labels], [**kwargs]) sets the current tick locations and labels of the x-axis.
Parameters:
• ticks: List of tick locations; if an empty list is passed, ticks will be disabled.
• labels (optional): You can optionally pass a list of labels for the specified locations.
• **kwargs (optional): matplotlib.text.Text() properties can be used to customize the appearance of the tick
labels. A quite useful property is rotation; this allows you to rotate the tick labels to use space more efficiently.
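The parameters above can be sketched as follows. The curve and the label names are invented for illustration, and the Agg backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 4, 50)
plt.plot(x, x ** 2)

# Place ticks at integer positions, name them, and rotate the labels
tick_positions = [0, 1, 2, 3, 4]
tick_names = ['zero', 'one', 'two', 'three', 'four']  # illustrative labels
plt.xticks(tick_positions, tick_names, rotation=45)

# Read the result back from the current Axes
ax = plt.gca()
current_ticks = list(ax.get_xticks())
current_labels = [label.get_text() for label in ax.get_xticklabels()]
print(current_ticks, current_labels)
```

Passing an empty list, as in plt.xticks([]), would disable the ticks instead.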
Displaying Figures

plt.show() is used to display a Figure or multiple Figures. To display Figures within a Jupyter Notebook, simply set
the %matplotlib inline command at the beginning of the code.
If you forget to use plt.show(), the plot won't show up.

Saving Figures

The plt.savefig(fname) function saves the current Figure. There are some useful optional parameters you can specify,
such as dpi, format, or transparent.
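Saving can be sketched as shown below. The output location is a temporary directory and the file name is hypothetical; the Agg backend line is an assumption for headless runs, where plt.show() would have nothing to display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [2, 4, 1])

# Hypothetical output file inside a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), 'example_plot.png')

# Save the current Figure with a few of the optional parameters
plt.savefig(out_path, dpi=150, format='png', transparent=True)

saved_ok = os.path.exists(out_path)
print(out_path, saved_ok)
```

With an interactive backend, calling plt.show() after plt.savefig() displays the same Figure on screen.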
Labels
Matplotlib provides a few label functions that we can use for setting labels to the x- and y-axes. The plt.xlabel() and
plt.ylabel() functions are used to set the label for the current axes. The set_xlabel() and set_ylabel() functions are used
to set the label for specified axes.
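Both styles can be compared side by side. This is a minimal sketch with made-up label texts; the Agg backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt

# pyplot style: the labels go to the current Axes
plt.figure()
plt.plot([1, 2, 3], [1, 4, 9])
plt.xlabel('x value')
plt.ylabel('x squared')
pyplot_axes = plt.gca()

# Object style: the labels go to the Axes that is named explicitly
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 8, 27])
ax.set_xlabel('x value')
ax.set_ylabel('x cubed')

print(pyplot_axes.get_ylabel(), ax.get_ylabel())
```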

Titles
A title describes a particular chart/graph. The titles are placed above the axes in the center, left edge, or right edge.
There are two options for titles: you can either set the Figure title or the title of an Axes. The plt.suptitle() function
sets the title of the current Figure, and Figure.suptitle() sets it for a specified Figure. The plt.title() function sets
the title of the current axes, and Axes.set_title() sets it for specified axes.
plt.title('Title', fontsize=16)
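The difference between the two kinds of title can be sketched as follows; the title texts are placeholders and the Agg backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [3, 1, 2])

# Title of the whole Figure
figure_title = fig.suptitle('Figure title', fontsize=16)

# Title of a single Axes, aligned with its left edge
ax.set_title('Axes title', loc='left')

print(figure_title.get_text(), '|', ax.get_title(loc='left'))
```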

Annotations
Annotations are used to annotate some features of the plot. In annotations, there are two locations to consider: the
annotated location, xy, and the location of the annotation text, xytext. It is useful to specify the parameter arrowprops,
which results in an arrow pointing to the annotated location.
Example:
ax.annotate('Example of Annotate', xy=(4,2), xytext=(8,4),arrowprops=dict(facecolor='green', shrink=0.05))
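The example line above needs an Axes object named ax. A self-contained sketch, with invented data and the Agg backend assumed for headless runs, could look like this:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 2, 4, 6, 8, 10], [0, 1, 2, 3, 4, 5])

# xy is the annotated point, xytext is where the text is placed;
# arrowprops draws an arrow from the text towards the point
note = ax.annotate('Example of Annotate', xy=(4, 2), xytext=(8, 4),
                   arrowprops=dict(facecolor='green', shrink=0.05))

print(note.get_text(), note.xy)
```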
Legends
Legend describes the content of the plot. To add a legend to your Axes, we have to specify the label parameter at the
time of plot creation. Calling plt.legend() for the current Axes or Axes.legend() for a specific Axes will add the legend.
The loc parameter specifies the location of the legend.

Example:
plt.plot([1, 2, 3], label='Label 1')
plt.plot([2, 4, 3], label='Label 2')
plt.legend()

Figure 5.14: Legend example
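The same legend can be created on a specific Axes and positioned with loc. This is a small sketch; the Agg backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# The label parameter provides the text of each legend entry
ax.plot([1, 2, 3], label='Label 1')
ax.plot([2, 4, 3], label='Label 2')

# Add the legend to this Axes and place it in the upper-left corner
legend = ax.legend(loc='upper left')

legend_texts = [text.get_text() for text in legend.get_texts()]
print(legend_texts)
```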


Bar Chart
The plt.bar(x, height, [width]) function creates a vertical bar plot. For horizontal bars, use the plt.barh() function.

Important parameters:
• x: Specifies the x coordinates of the bars
• height: Specifies the height of the bars
• width (optional): Specifies the width of all bars; the default is 0.8
Example:
plt.bar(['A', 'B', 'C', 'D'], [20, 25, 40, 10])

Figure 5.16: A simple bar chart
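Vertical and horizontal bars can be compared in one Figure. The sketch below reuses the values from the example; the two-panel layout and the width of 0.5 are illustrative choices, and the Agg backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']
values = [20, 25, 40, 10]

fig, (ax_vertical, ax_horizontal) = plt.subplots(1, 2)

# Vertical bars with a narrower width than the default of 0.8
vertical = ax_vertical.bar(categories, values, width=0.5)

# Horizontal bars for the same data
horizontal = ax_horizontal.barh(categories, values)

bar_heights = [rect.get_height() for rect in vertical]
bar_lengths = [rect.get_width() for rect in horizontal]
print(bar_heights, bar_lengths)
```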


Pie Chart
The plt.pie(x, [explode], [labels], [autopct]) function creates a pie chart.
Important parameters:
• x: Specifies the slice sizes.
• explode (optional): Specifies the fraction of the radius offset for each slice. The explode-array must have the same
length as the x-array.
• labels (optional): Specifies the labels for each slice.
• autopct (optional): Shows percentages inside the slices according to the specified format string. Example: '%1.1f%%'.
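The parameters above can be combined as in the following sketch. The slice sizes and labels are made up, and the Agg backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt

sizes = [40, 30, 20, 10]    # illustrative slice sizes
names = ['A', 'B', 'C', 'D']
explode = [0.1, 0, 0, 0]    # offset only the first slice; same length as sizes

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(sizes, explode=explode, labels=names,
                                  autopct='%1.1f%%')
ax.axis('equal')  # keep the pie circular

percent_labels = [text.get_text() for text in autotexts]
print(percent_labels)
```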

Stacked Bar Chart


A stacked bar chart uses the same plt.bar function as bar charts. For each stacked bar, the plt.bar function must be
called, and the bottom parameter must be specified, starting with the second stacked bar. This will become clear with the
following example:
plt.bar(x, bars1)
plt.bar(x, bars2, bottom=bars1)
plt.bar(x, bars3, bottom=np.add(bars1, bars2))

Figure 5.21: A stacked bar chart
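The three calls above leave x, bars1, bars2, and bars3 undefined. A runnable sketch with invented values, and the Agg backend assumed for headless runs, might look like this:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

x = ['A', 'B', 'C']             # illustrative categories
bars1 = np.array([3, 5, 2])
bars2 = np.array([2, 1, 4])
bars3 = np.array([1, 2, 2])

# Each layer starts where the layers below it end
plt.bar(x, bars1)
plt.bar(x, bars2, bottom=bars1)
top_layer = plt.bar(x, bars3, bottom=np.add(bars1, bars2))

top_layer_bottoms = [rect.get_y() for rect in top_layer]
print(top_layer_bottoms)
```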


Demonstration
Stacked Area Chart
plt.stackplot(x, y) creates a stacked area plot.
Important parameters:
• x: Specifies the x-values of the data series.
• y: Specifies the y-values of the data series. Multiple series can be passed either as a 2D array or as any number of
1D arrays, for example: plt.stackplot(x, y1, y2, y3, …).
• labels (optional): Specifies the labels as a list or tuple for each data series.
Example:
plt.stackplot([1, 2, 3, 4], [2, 4, 5, 8], [1, 5, 4, 2])

Figure 5.23: Stacked area chart
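The labels parameter pairs naturally with a legend. The sketch below extends the example with invented series names; the Agg backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
series_1 = [2, 4, 5, 8]
series_2 = [1, 5, 4, 2]

# One filled area per series, stacked on top of each other
areas = plt.stackplot(x, series_1, series_2, labels=['Series 1', 'Series 2'])
plt.legend(loc='upper left')

print(len(areas))
```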


Demonstration
Histogram
A histogram visualizes the distribution of a single numerical variable. Each bar represents the frequency for a certain
interval. The plt.hist(x) function creates a histogram.
Important parameters:
• x: Specifies the input values.
• bins (optional): Specifies the number of bins as an integer or specifies the bin edges as a list.
• range (optional): Specifies the lower and upper range of the bins as a tuple.
• density (optional): If true, the histogram represents a probability density.
Example:
plt.hist(x, bins=30, density=True)

Figure 5.25: Histogram

plt.hist2d(x, y) creates a 2D histogram. 2D histograms can be used to visualize the frequency of two-dimensional data. The
data is plotted on the xy-plane and the frequency is indicated by
the color.

Figure 5.26: 2D histogram with color bar
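Both histogram types can be sketched with random data. The samples below are generated only for illustration, and the Agg backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
x = rng.normal(size=1000)  # illustrative samples
y = rng.normal(size=1000)

fig, (ax_1d, ax_2d) = plt.subplots(1, 2, figsize=(10, 4))

# 1D histogram as a probability density: the bar areas sum to 1
counts, edges, _ = ax_1d.hist(x, bins=30, density=True)
total_area = float(np.sum(counts * np.diff(edges)))

# 2D histogram: the color encodes how many points fall into each cell
cell_counts, x_edges, y_edges, image = ax_2d.hist2d(x, y, bins=20)
fig.colorbar(image, ax=ax_2d)

print(total_area, cell_counts.sum())
```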


Box Plot
The box plot shows multiple statistical measurements. The box extends from the lower to the upper quartile values of the
data, thereby allowing us to visualize the interquartile range. For more details regarding the plot, refer to the previous
chapter. The plt.boxplot(x) function creates a box plot.
Important parameters:
• x: Specifies the input data. It specifies either a 1D array for a single box, or a sequence of arrays for multiple boxes.
• notch (optional): If true, notches will be added to the plot to indicate the confidence interval around the median.
• labels (optional): Specifies the labels as a sequence.
• showfliers (optional): By default, it is true, and outliers are plotted beyond the caps.
• showmeans: (optional) If true, arithmetic means are shown.
Example:
plt.boxplot([x1, x2], labels=['A', 'B'])

Figure 5.27: Box plot
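A runnable sketch with two random samples follows. The tick labels are set on the Axes after plotting, because newer Matplotlib releases appear to prefer a tick_labels parameter over labels; treat that detail as an assumption. The Agg backend line is also an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
x1 = rng.normal(loc=0, scale=1, size=200)  # illustrative samples
x2 = rng.normal(loc=1, scale=2, size=200)

fig, ax = plt.subplots()

# One box per array, with notches around the median and a marker for the mean
result = ax.boxplot([x1, x2], notch=True, showmeans=True)

# Name the two boxes (set here instead of through the labels parameter)
ax.set_xticklabels(['A', 'B'])

median_a = result['medians'][0].get_ydata()[0]
print(median_a, np.median(x1))
```

The dictionary returned by boxplot gives access to the drawn elements, such as the boxes, medians, means, and fliers.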


Demonstration
Scatter Plot
Scatter plots show data points for two numerical variables, with one variable on each axis. plt.scatter(x, y) creates
a scatter plot of y versus x, with optionally varying marker size and/or color.
Important parameters:
• x, y: Specifies the data positions.
• s: (optional) Specifies the marker size in points squared.
• c: (optional) Specifies the marker color. If a sequence of numbers
is specified, the numbers will be mapped to the colors of the color map.
Example:
plt.scatter(x, y)

Figure 5.31: Scatter plot

Bubble Plot
The plt.scatter function is used to create a bubble plot. To
visualize a third or fourth variable, the parameters s (size) and
c (color) can be used.
Example:
plt.scatter(x, y, s=z*500, c=c, alpha=0.5)
plt.colorbar()
The colorbar function adds a colorbar to the plot, which
indicates the value of the color.
Figure 5.33: Bubble plot with color bar
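The bubble plot example uses x, y, z, and c without defining them. A sketch with random values, and the Agg backend assumed for headless runs, could be:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.random(30)  # illustrative data
y = rng.random(30)
z = rng.random(30)  # third variable, shown as marker size
c = rng.random(30)  # fourth variable, shown as marker color

fig, ax = plt.subplots()
bubbles = ax.scatter(x, y, s=z * 500, c=c, alpha=0.5)

# The colorbar explains which value each color stands for
colorbar = fig.colorbar(bubbles, ax=ax)

print(bubbles.get_offsets().shape)
```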
Layouts
There are multiple ways to define a visualization layout in Matplotlib. By layout, we mean the arrangement of multiple
Axes within a Figure. We will start with subplots and how to use the tight layout to create visually appealing plots and
then cover GridSpec, which offers a more flexible way to create multi-plots.
Subplots
It is often useful to display several plots next to one another. Matplotlib offers the concept of
subplots, which are multiple Axes within a Figure. These plots can be grids of plots, nested
plots, and so on.
Explore the following options to create subplots:
• The plt.subplots(nrows, ncols) function creates a Figure and a set of subplots. nrows, ncols define
the number of rows and columns of the subplots, respectively.
• The plt.subplot(nrows, ncols, index) function or, equivalently, plt.subplot(pos) adds a
subplot to the current Figure. The index starts at 1. The plt.subplot(2, 2, 1) function is
equivalent to plt.subplot(221).
• The Figure.subplots(nrows, ncols) function adds a set of subplots to the specified Figure.
• The Figure.add_subplot(nrows, ncols, index) function or, equivalently,
Figure.add_subplot(pos), adds a subplot to the specified Figure.
To share the x-axis or y-axis, the parameters sharex and sharey must be set, respectively. The
shared axis will have the same limits, ticks, and scale.
Figure 5.34: Subplots
plt.subplot and Figure.add_subplot have the option to set a projection. For a polar projection,
either set the projection='polar' parameter or the polar=True parameter.
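The options above can be sketched as follows: a 2x2 grid with shared axes, followed by a single polar subplot. The plotted curves are arbitrary, and the Agg backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# 2x2 grid of subplots that share both axes
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for i, ax in enumerate(axes.flat):
    ax.plot(x, np.sin((i + 1) * x))
fig.tight_layout()  # reduce overlap between the subplots

# A separate Figure with one polar subplot
polar_fig = plt.figure()
polar_ax = polar_fig.add_subplot(1, 1, 1, projection='polar')
polar_ax.plot(x, np.abs(np.sin(x)))

print(axes.shape, polar_ax.name)
```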
Radar Charts
Radar charts, also known as spider or web charts, visualize multiple variables, with each variable plotted on its own
axis, resulting in a polygon. All axes are arranged radially, starting at the center with equal distance between each other,
and have the same scale.

Figure 5.38: Radar charts
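A radar chart can be built on a polar subplot. The sketch below is one possible approach, not necessarily the one used in the original demonstration; the variable names and scores are hypothetical, and the Agg backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

variables = ['Speed', 'Power', 'Range', 'Cost', 'Comfort']  # hypothetical variables
values = [4, 3, 5, 2, 4]                                    # hypothetical scores

# One angle per variable, spread evenly around the circle
angles = np.linspace(0, 2 * np.pi, len(variables), endpoint=False).tolist()

# Repeat the first point at the end so that the polygon is closed
angles_closed = angles + angles[:1]
values_closed = values + values[:1]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, polar=True)
ax.plot(angles_closed, values_closed)
ax.fill(angles_closed, values_closed, alpha=0.25)

# Label each radial axis and use the same scale for all variables
ax.set_xticks(angles)
ax.set_xticklabels(variables)
ax.set_ylim(0, 5)
```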


GridSpec
The matplotlib.gridspec.GridSpec(nrows, ncols) class specifies the geometry of the grid in which a subplot will be
placed. For example, you can specify a grid with three rows and four columns. As a next step, you have to define which
elements of the gridspec are used by a subplot; elements of a gridspec are accessed in the same way as NumPy arrays.
You could, for example, only use a single element of a gridspec for a subplot and therefore end up with 12 subplots in
total. Another possibility, as shown in the following example, is to create a bigger subplot using 3x3 elements of the
gridspec and another three subplots with a single element each.
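The layout described above, one large subplot covering 3x3 elements plus three single-element subplots, can be sketched as follows. The Figure size is an arbitrary choice, and the Agg backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(8, 6))

# Grid with three rows and four columns
gs = GridSpec(3, 4, figure=fig)

# Big subplot: all three rows, first three columns (NumPy-style slicing)
big_ax = fig.add_subplot(gs[0:3, 0:3])

# Three small subplots: one element each in the last column
small_axes = [fig.add_subplot(gs[row, 3]) for row in range(3)]

big_spec = big_ax.get_subplotspec()
print(list(big_spec.rowspan), list(big_spec.colspan), len(fig.axes))
```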
Images
If you want to include images in your visualizations or work with image data, Matplotlib offers several functions for you.
In this section, we will show you how to load, save, and plot images with Matplotlib.
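Saving, loading, and plotting can be sketched with plt.imsave, plt.imread, and plt.imshow. The image here is a synthetic gradient written to a temporary, hypothetically named file, and the Agg backend line is an assumption for headless runs.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Synthetic 32x32 grayscale gradient used as stand-in image data
gradient = np.linspace(0, 1, 32 * 32).reshape(32, 32)

# Save the array as a PNG file (hypothetical file name)
image_path = os.path.join(tempfile.mkdtemp(), 'gradient.png')
plt.imsave(image_path, gradient, cmap='gray')

# Load the file back as a NumPy array
loaded = plt.imread(image_path)

# Plot the loaded image without axis ticks
fig, ax = plt.subplots()
ax.imshow(loaded)
ax.axis('off')

print(loaded.shape)
```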
Writing Mathematical Expressions
If you need to write mathematical expressions in a plot, Matplotlib supports TeX markup; TeX is one of the most
popular typesetting systems, especially for mathematical formulas. You can use it in any text by
placing your mathematical expression between a pair of dollar signs. There is no need to have TeX installed since
Matplotlib comes with its own parser.
An example of this is given in the following code:
plt.xlabel('$x$')
plt.ylabel(r'$\cos(x)$')

Figure 5.49: Diagram demonstrating mathematical expressions
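A complete sketch of the idea follows. Raw strings (the r prefix) keep Python from interpreting the backslashes; the title formula is an arbitrary illustration, and the Agg backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.cos(x))

# Text between dollar signs is rendered by Matplotlib's built-in math parser
ax.set_xlabel('$x$')
ax.set_ylabel(r'$\cos(x)$')
ax.set_title(r'$\alpha^2 + \frac{1}{\beta}$')  # arbitrary expression for illustration

# Drawing the canvas forces the expressions to be parsed and rendered
fig.canvas.draw()
print(ax.get_ylabel())
```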
