
Datascience

Data science

Uploaded by

thanya.deashna
Copyright
© © All Rights Reserved

CS3352-FOUNDATION OF DATA SCIENCE

Unit-5
DATA VISUALIZATION
Importing Matplotlib - Line plots - Scatter plots -
visualizing errors - density and contour plots -
Histograms - legends - colours - subplots - text
and annotation - customization - three
dimensional plotting - Geographic Data with
Basemap - Visualization with Seaborn

INTRODUCTION OF MATPLOTLIB
Matplotlib is a widely used Python visualization library for
creating 2D plots from arrays.

Installation :
Matplotlib and most of its dependencies are available as wheel packages
for Windows, Linux and macOS. Run the following command to
install Matplotlib:

python -m pip install -U matplotlib

Importing matplotlib :

from matplotlib import pyplot as plt


or
import matplotlib.pyplot as plt
Basic plots in Matplotlib :

Matplotlib comes with a wide variety of plots. Plots help us understand
trends and patterns and identify correlations.

Line plot :

# importing matplotlib module

from matplotlib import pyplot as plt

# x-axis values

x = [5, 2, 9, 4, 7]

# Y-axis values

y = [10, 5, 8, 4, 2]

# Function to plot

plt.plot(x,y)

# function to show the plot

plt.show()

Output :
Bar plot :
# importing matplotlib module

from matplotlib import pyplot as plt

# x-axis values

x = [5, 2, 9, 4, 7]

# Y-axis values

y = [10, 5, 8, 4, 2]

# Function to plot the bar

plt.bar(x,y)

# function to show the plot

plt.show()

Output :
Histogram :
# importing matplotlib module
from matplotlib import pyplot as plt

# Y-axis values
y = [10, 5, 8, 4, 2]

# Function to plot histogram


plt.hist(y)

# Function to show the plot


plt.show()

Output :
Types of Line Plots
There are three main types of line plots in common use:

• Simple Line Graph

A simple line graph is plotted using only a single line. One of the two
variables is almost always independent, while the other is dependent.

Ex: The line plot here is a single line plot that represents data on
students' heights.

• Multiple Line Graph

A multiple line graph is a line graph with two or more lines plotted on it. It is
used to display the changes in two or more variables over the same time
span. The independent variable is normally plotted on the horizontal axis,
while the two or more dependent variables are plotted on the vertical axis.

Ex: Here the multiple line plot shows the number of Class 9 and
Class 10 students choosing different subjects.

• Compound Line Graph

This sort of chart is used when information can be separated into various
categories. It is an extension of the basic line graph that shows the overall
total as well as the layers that make it up. To create a compound line graph,
we first draw several line graphs, then shade each region to show each
component's share of the total. The top line represents the total, and each
line below it represents a portion of the total. The distance between any two
consecutive lines represents the size of one component, with the bottom
component bounded below by the horizontal axis.

Ex: Here the compound line graph shows the number of Class 8,
Class 9, and Class 10 students choosing different subjects.
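
The difference between a multiple and a compound line graph can be sketched in Matplotlib as follows. The subject names and student counts are hypothetical values made up for illustration, and ax.stackplot is used here as one convenient way to draw the stacked layers. The Agg backend line only makes the sketch runnable without a display and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: number of students choosing each subject, per class
subjects = ["Maths", "Science", "English", "History"]
class8 = np.array([12, 18, 9, 6])
class9 = np.array([15, 14, 11, 8])
class10 = np.array([10, 20, 13, 5])
positions = np.arange(len(subjects))

fig, (ax_multi, ax_compound) = plt.subplots(1, 2, figsize=(10, 4))

# Multiple line graph: each class is drawn as its own independent line
ax_multi.plot(positions, class9, marker="o", label="Class 9")
ax_multi.plot(positions, class10, marker="s", label="Class 10")
ax_multi.set_title("Multiple line graph")

# Compound line graph: the layers are stacked, so the top edge is the total
layers = ax_compound.stackplot(positions, class8, class9, class10,
                               labels=["Class 8", "Class 9", "Class 10"])
ax_compound.set_title("Compound line graph")

for ax in (ax_multi, ax_compound):
    ax.set_xticks(positions)
    ax.set_xticklabels(subjects)
    ax.set_ylabel("Number of students")
    ax.legend()

total = class8 + class9 + class10  # height of the top line at each subject
```

In the compound panel, the thickness of each shaded band at a subject is that class's count, and the upper boundary follows total.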
SIMPLE SCATTER PLOT
A scatter plot (aka scatter chart, scatter graph) uses dots to
represent values for two different numeric variables. The position
of each dot on the horizontal and vertical axis indicates values for
an individual data point. Scatter plots are used to observe
relationships between variables.
The example scatter plot above shows the diameters and heights
for a sample of fictional trees. Each dot represents a single tree;
each point’s horizontal position indicates that tree’s diameter (in
centimeters) and the vertical position indicates that tree’s height (in
meters). From the plot, we can see a generally tight positive
correlation between a tree’s diameter and its height. We can also
observe an outlier point, a tree that has a much larger diameter
than the others. This tree appears fairly short for its girth, which
might warrant further investigation.

Scatter plots’ primary uses are to observe and show relationships
between two numeric variables. The dots in a scatter plot not only
report the values of individual data points, but also patterns when
the data are taken as a whole.

A scatter plot can also be useful for identifying other patterns in
data. We can divide data points into groups based on how closely
sets of points cluster together. Scatter plots can also show if there
are any unexpected gaps in the data and if there are any outlier
points. This can be useful if we want to segment the data into
different parts, like in the development of user personas.
Example of data structure

DIAMETER (cm)    HEIGHT (m)
4.20             3.14
5.55             3.87
3.33             2.84
6.91             4.34
…                …
In order to create a scatter plot, we need to select two columns from
a data table, one for each dimension of the plot. Each row of the
table will become a single dot in the plot with position according to
the column values.
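
As a minimal sketch of this mapping from table rows to dots, the four rows shown in the table can be plotted with ax.scatter. The variable names and axis titles are illustrative choices, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# The two columns of the example table, one list per plot dimension
diameter_cm = [4.20, 5.55, 3.33, 6.91]
height_m = [3.14, 3.87, 2.84, 4.34]

fig, ax = plt.subplots()
points = ax.scatter(diameter_cm, height_m)  # one dot per table row
ax.set_xlabel("Diameter (cm)")
ax.set_ylabel("Height (m)")
ax.set_title("Tree height against diameter")
```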

Visualizing a Three-Dimensional Function


We'll start by demonstrating a contour plot using a function z = f(x, y), with the
following particular choice for f (we've seen this before in Computation on Arrays:
Broadcasting, when we used it as a motivating example for array broadcasting):
In [2]:
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
A contour plot can be created with the plt.contour function. It takes three arguments: a
grid of x values, a grid of y values, and a grid of z values. The x and y values represent
positions on the plot, and the z values will be represented by the contour levels. Perhaps
the most straightforward way to prepare such data is to use the np.meshgrid function,
which builds two-dimensional grids from one-dimensional arrays:

In [3]:
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)

X, Y = np.meshgrid(x, y)
Z = f(X, Y)
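
To see what np.meshgrid actually returns, the short check below inspects the grids built from the same two one-dimensional arrays. It is only an illustration of the shapes involved, not part of the plotting code.

```python
import numpy as np

x = np.linspace(0, 5, 50)   # 50 positions along x
y = np.linspace(0, 5, 40)   # 40 positions along y
X, Y = np.meshgrid(x, y)

# Both grids have one row per y value and one column per x value
print(X.shape, Y.shape)     # (40, 50) (40, 50)

# Each row of X repeats x; each column of Y repeats y
print(np.array_equal(X[0], x), np.array_equal(Y[:, 0], y))  # True True
```

Because X and Y have the same shape, f(X, Y) is evaluated elementwise and Z has that shape too.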
Now let's look at this with a standard line-only contour plot:

In [4]:
plt.contour(X, Y, Z, colors='black');

Notice that by default when a single color is used, negative values are represented by
dashed lines, and positive values by solid lines. Alternatively, the lines can be color-coded
by specifying a colormap with the cmap argument. Here, we'll also specify that we want
more lines to be drawn—20 equally spaced intervals within the data range:

In [5]:
plt.contour(X, Y, Z, 20, cmap='RdGy');

Here we chose the RdGy (short for Red-Gray) colormap, which is a good choice for centered
data. Matplotlib has a wide range of colormaps available, which you can easily browse in
IPython by doing a tab completion on the plt.cm module:

plt.cm.<TAB>

Our plot is looking nicer, but the spaces between the lines may be a bit distracting. We can
change this by switching to a filled contour plot using the plt.contourf() function (notice
the f at the end), which uses largely the same syntax as plt.contour().

Additionally, we'll add a plt.colorbar() command, which automatically creates an
additional axis with labeled color information for the plot:

In [6]:
plt.contourf(X, Y, Z, 20, cmap='RdGy')
plt.colorbar();
The colorbar makes it clear that the black regions are "peaks," while the red regions are
"valleys."

One potential issue with this plot is that it is a bit "splotchy." That is, the color steps are
discrete rather than continuous, which is not always what is desired. This could be
remedied by setting the number of contours to a very high number, but this results in a
rather inefficient plot: Matplotlib must render a new polygon for each step in the level. A
better way to handle this is to use the plt.imshow() function, which interprets a two-
dimensional grid of data as an image.

The following code shows this:

In [7]:
plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower',
           cmap='RdGy')
plt.colorbar()
plt.axis('image');
There are a few potential gotchas with imshow(), however:

• plt.imshow() doesn't accept an x and y grid, so you must manually specify
the extent [xmin, xmax, ymin, ymax] of the image on the plot.
• plt.imshow() by default follows the standard image array definition where the origin is in
the upper left, not in the lower left as in most contour plots. This must be changed when
showing gridded data.
• plt.imshow() will automatically adjust the axis aspect ratio to match the input data; this
can be changed by setting, for example, plt.axis('image') to make x and y units
match.
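
The three points above can be sketched with the object-oriented interface, which makes each setting explicit. This is a minimal illustration that reuses the same f, x and y as before. ax.set_aspect('equal') is used here as one assumed way to make the x and y units match, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

fig, ax = plt.subplots()
im = ax.imshow(Z,
               extent=[0, 5, 0, 5],  # data range covered by the image
               origin='lower',       # row 0 of Z at the bottom, as in a contour plot
               cmap='RdGy')
fig.colorbar(im, ax=ax)
ax.set_aspect('equal')               # one x unit is as long as one y unit
```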

Finally, it can sometimes be useful to combine contour plots and image plots. For
example, here we'll use a partially transparent background image (with transparency set
via the alpha parameter) and overplot contours with labels on the contours themselves
(using the plt.clabel() function):

In [8]:
contours = plt.contour(X, Y, Z, 3, colors='black')
plt.clabel(contours, inline=True, fontsize=8)

plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower',
           cmap='RdGy', alpha=0.5)
plt.colorbar();
The combination of these three functions—plt.contour, plt.contourf, and plt.imshow—
gives nearly limitless possibilities for displaying this sort of three-dimensional data within
a two-dimensional plot. For more information on the options available in these functions,
refer to their docstrings. If you are interested in three-dimensional visualizations of this
type of data, see Three-dimensional Plotting in Matplotlib.

What are Contour Plots?


Contour plots (sometimes called Level Plots) are a way to show a three-dimensional surface on
a two-dimensional plane. A contour plot graphs two predictor variables, X and Y, on the x- and
y-axes and a response variable Z as contours. These contours are sometimes called z-slices or
iso-response values.

This type of graph is widely used in cartography, where contour lines on a topographic map
indicate elevations that are the same. Many other disciplines use contour graphs, including
astronomy, meteorology, and physics. Contour lines commonly show altitude (like the height of
geographical features), but they can also be used to show density, brightness, or electric
potential.

A contour plot is appropriate if you want to see how some value Z changes as
a function of two inputs, X and Y:
z = f(x, y).

The most common form is the rectangular contour plot, which is (as the name suggests)
shaped like a rectangle.
Polar contour plots are circular.

Polar plot for the function r sin(θ).
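
A polar contour plot of this function can be sketched by drawing filled contours on polar axes. The ranges chosen for r and θ, the number of levels and the colormap are assumptions for illustration only, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Assumed ranges: radius from 0 to 2, angle over one full turn
r = np.linspace(0, 2, 50)
theta = np.linspace(0, 2 * np.pi, 100)
Theta, R = np.meshgrid(theta, r)
Z = R * np.sin(Theta)                 # the function r sin(theta)

# On polar axes the first coordinate is the angle and the second the radius
fig, ax = plt.subplots(subplot_kw=dict(projection='polar'))
filled = ax.contourf(Theta, R, Z, 20, cmap='RdGy')
fig.colorbar(filled, ax=ax)
```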

Histograms, Binnings, and Density


A simple histogram can be a great first step in understanding a dataset. Earlier, we saw a
preview of Matplotlib's histogram function (see Comparisons, Masks, and Boolean Logic),
which creates a basic histogram in one line, once the normal boiler-plate imports are
done:

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-white')  # named 'seaborn-white' before Matplotlib 3.6

data = np.random.randn(1000)
In [2]:
plt.hist(data);

The hist() function has many options to tune both the calculation and the display; here's
an example of a more customized histogram:

In [3]:
# density=True replaces the normed=True argument of older Matplotlib versions
plt.hist(data, bins=30, density=True, alpha=0.5,
         histtype='stepfilled', color='steelblue',
         edgecolor='none');

The plt.hist docstring has more information on other customization options available. I
find this combination of histtype='stepfilled' along with some transparency alpha to be
very useful when comparing histograms of several distributions:
In [4]:
x1 = np.random.normal(0, 0.8, 1000)
x2 = np.random.normal(-2, 1, 1000)
x3 = np.random.normal(3, 2, 1000)

kwargs = dict(histtype='stepfilled', alpha=0.3, density=True, bins=40)

plt.hist(x1, **kwargs)
plt.hist(x2, **kwargs)
plt.hist(x3, **kwargs);

If you would like to simply compute the histogram (that is, count the number of points in a
given bin) and not display it, the np.histogram() function is available:

In [5]:
counts, bin_edges = np.histogram(data, bins=5)
print(counts)
[ 12 190 468 301 29]
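
The relationship between bins, counts and a density-normalized histogram can be checked numerically. The sketch below uses a seeded generator only so that it is repeatable; the seed and the generator call are choices made for this illustration, so the individual counts will differ from the output above.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded only to make the sketch repeatable
data = rng.standard_normal(1000)

counts, bin_edges = np.histogram(data, bins=5)
print(len(bin_edges))                   # 6 edges delimit 5 bins
print(counts.sum())                     # 1000: every point falls in exactly one bin

# With density=True the bar heights are scaled so the total bar area is 1
density, edges = np.histogram(data, bins=30, density=True)
area = np.sum(density * np.diff(edges))
print(np.isclose(area, 1.0))            # True
```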

Two-Dimensional Histograms and Binnings


Just as we create histograms in one dimension by dividing the number-line into bins, we
can also create histograms in two-dimensions by dividing points among two-dimensional
bins. We'll take a brief look at several ways to do this here. We'll start by defining some
data—an x and y array drawn from a multivariate Gaussian distribution:

In [6]:
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 10000).T

plt.hist2d: Two-dimensional histogram
One straightforward way to plot a two-dimensional histogram is to use
Matplotlib's plt.hist2d function:

In [12]:
plt.hist2d(x, y, bins=30, cmap='Blues')
cb = plt.colorbar()
cb.set_label('counts in bin')

Just as with plt.hist, plt.hist2d has a number of extra options to fine-tune the plot and
the binning, which are nicely outlined in the function docstring. Further, just
as plt.hist has a counterpart in np.histogram, plt.hist2d has a counterpart
in np.histogram2d, which can be used as follows:

In [8]:
counts, xedges, yedges = np.histogram2d(x, y, bins=30)
For the generalization of this histogram binning in dimensions higher than two, see
the np.histogramdd function.
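
As a rough sketch of how the output of np.histogram2d can be used, the counts can be inspected and then drawn by hand. The seeded generator and the use of pcolormesh for drawing are assumptions of this illustration, and the Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)          # seeded only to make the sketch repeatable
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = rng.multivariate_normal(mean, cov, 10000).T

counts, xedges, yedges = np.histogram2d(x, y, bins=30)
print(counts.shape, len(xedges), len(yedges))   # (30, 30) 31 31

# The first index of counts runs along x, so transpose it before drawing
# to get rows that run along y, as image-like plotting functions expect
fig, ax = plt.subplots()
mesh = ax.pcolormesh(xedges, yedges, counts.T, cmap='Blues')
fig.colorbar(mesh, ax=ax, label='counts in bin')
```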
plt.hexbin: Hexagonal binnings
The two-dimensional histogram creates a tessellation of squares across the axes. Another
natural shape for such a tessellation is the regular hexagon. For this purpose, Matplotlib
provides the plt.hexbin routine, which represents a two-dimensional dataset binned
within a grid of hexagons:

In [9]:
plt.hexbin(x, y, gridsize=30, cmap='Blues')
cb = plt.colorbar(label='count in bin')

plt.hexbin has a number of interesting options, including the ability to specify weights for
each point, and to change the output in each bin to any NumPy aggregate (mean of
weights, standard deviation of weights, etc.).
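
The weighting option can be sketched with the C and reduce_C_function arguments of hexbin. The weights array below is a hypothetical per-point value invented for illustration, and the seeded generator and Agg backend are there only so the sketch is repeatable and runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)          # seeded only to make the sketch repeatable
x, y = rng.multivariate_normal([0, 0], [[1, 1], [1, 2]], 10000).T
weights = x ** 2 + y ** 2               # hypothetical value attached to each point

fig, ax = plt.subplots()
# C supplies one value per point; reduce_C_function aggregates them per hexagon
hb = ax.hexbin(x, y, C=weights, reduce_C_function=np.mean,
               gridsize=30, cmap='Blues')
fig.colorbar(hb, ax=ax, label='mean weight in bin')
```

Passing np.std instead of np.mean would colour each hexagon by the spread of the weights rather than their average.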

Customizing Plot Legends


Plot legends give meaning to a visualization, assigning meaning to the various plot
elements. We previously saw how to create a simple legend; here we'll take a look at
customizing the placement and aesthetics of the legend in Matplotlib.

The simplest legend can be created with the plt.legend() command, which automatically
creates a legend for any labeled plot elements:
In [1]:
import matplotlib.pyplot as plt
plt.style.use('classic')
In [2]:
%matplotlib inline
import numpy as np
In [3]:
x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')
ax.axis('equal')
leg = ax.legend();
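
Placement and appearance can be adjusted through keyword arguments of ax.legend. The sketch below shows three common ones (loc, frameon and ncol) on the same sine and cosine plot. The extra unlabeled line is added only to show that unlabeled elements are skipped, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')
ax.plot(x, np.zeros_like(x), ':k')      # no label, so it is left out of the legend

leg = ax.legend(loc='upper left',       # placement inside the axes
                frameon=False,          # no box around the legend
                ncol=2)                 # entries side by side
labels = [t.get_text() for t in leg.get_texts()]
print(labels)                           # ['Sine', 'Cosine']
```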

CUSTOMIZED COLORBARS
A colorbar needs a "mappable" (matplotlib.cm.ScalarMappable) object (typically, an
image) which indicates the colormap and the norm to be used. In order to create a
colorbar without an attached image, one can instead use a ScalarMappable with no
associated data.

Basic continuous colorbar


Here we create a basic continuous colorbar with ticks and labels.

The arguments to the colorbar call are the ScalarMappable (constructed using
the norm and cmap arguments), the axes where the colorbar should be drawn, and the
colorbar's orientation.

For more information see the colorbar API.

import matplotlib.pyplot as plt


import matplotlib as mpl
fig, ax = plt.subplots(figsize=(6, 1))
fig.subplots_adjust(bottom=0.5)

cmap = mpl.cm.cool
norm = mpl.colors.Normalize(vmin=5, vmax=10)

fig.colorbar(mpl.cm.ScalarMappable(norm=norm, cmap=cmap),
cax=ax, orientation='horizontal', label='Some Units')
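
To make the roles of the norm and the colormap concrete, the sketch below applies them by hand to a few values. It assumes the same 'cool' colormap and the same vmin and vmax as above, and looks the colormap up by name with plt.get_cmap.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib as mpl
import matplotlib.pyplot as plt

cmap = plt.get_cmap('cool')
norm = mpl.colors.Normalize(vmin=5, vmax=10)

# The norm rescales data values linearly from [vmin, vmax] to [0, 1]
print(float(norm(5)), float(norm(7.5)), float(norm(10)))  # 0.0 0.5 1.0

# The colormap turns a number in [0, 1] into (red, green, blue, alpha)
rgba = cmap(float(norm(7.5)))
print(len(rgba))                                          # 4
```

A ScalarMappable simply bundles these two steps, which is why a colorbar can be drawn from it without any image.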

Creating multiple subplots using plt.subplots
pyplot.subplots creates a figure and a grid of subplots with a single call, while providing
reasonable control over how the individual plots are created. For more advanced use
cases you can use GridSpec for a more general subplot layout or Figure.add_subplot for
adding subplots at arbitrary locations within the figure.

import matplotlib.pyplot as plt


import numpy as np

# Some example data to display


x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x ** 2)
A figure with just one subplot
subplots() without arguments returns a Figure and a single Axes.

This is actually the simplest and recommended way of creating a single Figure and Axes.

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title('A single plot')
Stacking subplots in one direction

The first two optional arguments of pyplot.subplots define the number of rows and columns
of the subplot grid.

When stacking in one direction only, the returned axs is a 1D numpy array containing the list
of created Axes.

fig, axs = plt.subplots(2)


fig.suptitle('Vertically stacked subplots')
axs[0].plot(x, y)
axs[1].plot(x, -y)
If you are creating just a few Axes, it's handy to unpack them immediately to dedicated
variables for each Axes. That way, we can use ax1 instead of the more verbose axs[0].

fig, (ax1, ax2) = plt.subplots(2)


fig.suptitle('Vertically stacked subplots')
ax1.plot(x, y)
ax2.plot(x, -y)
To obtain side-by-side subplots, pass parameters 1, 2 for one row and two columns.

fig, (ax1, ax2) = plt.subplots(1, 2)


fig.suptitle('Horizontally stacked subplots')
ax1.plot(x, y)
ax2.plot(x, -y)
Stacking subplots in two directions
When stacking in two directions, the returned axs is a 2D NumPy array.

If you have to set parameters for each subplot it's handy to iterate over all subplots in a 2D
grid using for ax in axs.flat:.

fig, axs = plt.subplots(2, 2)


axs[0, 0].plot(x, y)
axs[0, 0].set_title('Axis [0, 0]')
axs[0, 1].plot(x, y, 'tab:orange')
axs[0, 1].set_title('Axis [0, 1]')
axs[1, 0].plot(x, -y, 'tab:green')
axs[1, 0].set_title('Axis [1, 0]')
axs[1, 1].plot(x, -y, 'tab:red')
axs[1, 1].set_title('Axis [1, 1]')

for ax in axs.flat:
    ax.set(xlabel='x-label', ylabel='y-label')

# Hide x labels and tick labels for top plots and y ticks for right plots.
for ax in axs.flat:
    ax.label_outer()
You can use tuple-unpacking also in 2D to assign all subplots to dedicated variables:

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2)


fig.suptitle('Sharing x per column, y per row')
ax1.plot(x, y)
ax2.plot(x, y**2, 'tab:orange')
ax3.plot(x, -y, 'tab:green')
ax4.plot(x, -y**2, 'tab:red')

for ax in fig.get_axes():
    ax.label_outer()
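
The different return shapes described in this section can be checked directly. The sketch below only inspects what plt.subplots returns and plots nothing. The squeeze=False call is an extra shown for illustration, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

fig1, ax = plt.subplots()                     # a single Axes object
fig2, axs_1d = plt.subplots(2)                # 1D array of 2 Axes
fig3, axs_2d = plt.subplots(2, 3)             # 2D array, 2 rows by 3 columns
fig4, axs_sq = plt.subplots(squeeze=False)    # always a 2D array, here 1 by 1

print(axs_1d.shape, axs_2d.shape, axs_sq.shape)   # (2,) (2, 3) (1, 1)
print(len(list(axs_2d.flat)))                     # 6
```

Passing squeeze=False can be handy in code that loops over axs regardless of how many rows and columns were requested.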
Sharing axes
By default, each Axes is scaled individually. Thus, if the ranges are different the tick values
of the subplots do not align.

fig, (ax1, ax2) = plt.subplots(2)


fig.suptitle('Axes values are scaled individually by default')
ax1.plot(x, y)
ax2.plot(x + 1, -y)
You can use sharex or sharey to align the horizontal or vertical axis.

fig, (ax1, ax2) = plt.subplots(2, sharex=True)


fig.suptitle('Aligning x-axis using sharex')
ax1.plot(x, y)
ax2.plot(x + 1, -y)
Setting sharex or sharey to True enables global sharing across the whole grid, i.e. also the
y-axes of vertically stacked subplots have the same scale when using sharey=True.

fig, axs = plt.subplots(3, sharex=True, sharey=True)


fig.suptitle('Sharing both axes')
axs[0].plot(x, y ** 2)
axs[1].plot(x, 0.3 * y, 'o')
axs[2].plot(x, y, '+')
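
One way to confirm that sharing really links the axes is to change the limits on one subplot and read them back from the other. This is a small sketch with arbitrarily chosen limits, and the Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x ** 2)

fig, (ax1, ax2) = plt.subplots(2, sharex=True)
ax1.plot(x, y)
ax2.plot(x + 1, -y)

ax1.set_xlim(0, 3)                                # change the limits on one Axes only
shared_limits = tuple(map(float, ax2.get_xlim())) # the other Axes follows
print(shared_limits)                              # (0.0, 3.0)
```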
For subplots that are sharing axes one set of tick labels is enough. Tick labels of inner Axes
are automatically removed by sharex and sharey. Still there remains an unused empty space
between the subplots.

To precisely control the positioning of the subplots, one can explicitly create
a GridSpec with Figure.add_gridspec, and then call its subplots method. For example, we
can reduce the height between vertical subplots using add_gridspec(hspace=0).

label_outer is a handy method to remove labels and ticks from subplots that are not at the
edge of the grid.

fig = plt.figure()
gs = fig.add_gridspec(3, hspace=0)
axs = gs.subplots(sharex=True, sharey=True)
fig.suptitle('Sharing both axes')
axs[0].plot(x, y ** 2)
axs[1].plot(x, 0.3 * y, 'o')
axs[2].plot(x, y, '+')

# Hide x labels and tick labels for all but bottom plot.
for ax in axs:
    ax.label_outer()
Apart from True and False, both sharex and sharey accept the values 'row' and 'col' to share
the values only per row or column.

fig = plt.figure()
gs = fig.add_gridspec(2, 2, hspace=0, wspace=0)
(ax1, ax2), (ax3, ax4) = gs.subplots(sharex='col', sharey='row')
fig.suptitle('Sharing x per column, y per row')
ax1.plot(x, y)
ax2.plot(x, y**2, 'tab:orange')
ax3.plot(x + 1, -y, 'tab:green')
ax4.plot(x + 2, -y**2, 'tab:red')

for ax in fig.get_axes():
    ax.label_outer()
If you want a more complex sharing structure, you can first create the grid of axes with no
sharing, and then call axes.Axes.sharex or axes.Axes.sharey to add sharing info a posteriori.

fig, axs = plt.subplots(2, 2)


axs[0, 0].plot(x, y)
axs[0, 0].set_title("main")
axs[1, 0].plot(x, y**2)
axs[1, 0].set_title("shares x with main")
axs[1, 0].sharex(axs[0, 0])
axs[0, 1].plot(x + 1, y + 1)
axs[0, 1].set_title("unrelated")
axs[1, 1].plot(x + 2, y + 2)
axs[1, 1].set_title("also unrelated")
fig.tight_layout()
Polar axes
The parameter subplot_kw of pyplot.subplots controls the subplot properties (see
also Figure.add_subplot). In particular, this can be used to create a grid of polar Axes.

fig, (ax1, ax2) = plt.subplots(1, 2, subplot_kw=dict(projection='polar'))


ax1.plot(x, y)
ax2.plot(x, y ** 2)

plt.show()
Text and Annotation

Creating a good visualization involves guiding the reader so that the figure tells a story. In
some cases, this story can be told in an entirely visual manner, without the need for added
text, but in others, small textual cues and labels are necessary. Perhaps the most basic
types of annotations you will use are axes labels and titles, but the options go beyond this.
Let's take a look at some data and how we might visualize and annotate it to help convey
interesting information. We'll start by setting up the notebook for plotting and importing
the functions we will use:

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib as mpl
plt.style.use('seaborn-v0_8-whitegrid')  # named 'seaborn-whitegrid' before Matplotlib 3.6
import numpy as np
import pandas as pd

Example: Effect of Holidays on US Births


Let's return to some data we worked with earlier, in "Example: Birthrate Data", where we
generated a plot of average births over the course of the calendar year; as already
mentioned, this data can be downloaded
at https://raw.githubusercontent.com/jakevdp/data-CDCbirths/master/births.csv.

We'll start with the same cleaning procedure we used there, and plot the results:

In [2]:
from datetime import datetime

births = pd.read_csv('data/births.csv')

# Remove outliers with a robust 5-sigma cut (0.74 * IQR estimates sigma)
quartiles = np.percentile(births['births'], [25, 50, 75])
mu, sig = quartiles[1], 0.74 * (quartiles[2] - quartiles[0])
births = births.query('(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)')

births['day'] = births['day'].astype(int)

births.index = pd.to_datetime(10000 * births.year +
                              100 * births.month +
                              births.day, format='%Y%m%d')
births_by_date = births.pivot_table('births',
                                    [births.index.month, births.index.day])
# pd.datetime was removed in pandas 2.0; use the standard library datetime
births_by_date.index = [datetime(2012, month, day)
                        for (month, day) in births_by_date.index]
In [3]:
fig, ax = plt.subplots(figsize=(12, 4))
births_by_date.plot(ax=ax);

When we're communicating data like this, it is often useful to annotate certain features of
the plot to draw the reader's attention. This can be done manually with
the plt.text/ax.text command, which will place text at a particular x/y value:

In [4]:
fig, ax = plt.subplots(figsize=(12, 4))
births_by_date.plot(ax=ax)

# Add labels to the plot
style = dict(size=10, color='gray')

ax.text('2012-1-1', 3950, "New Year's Day", **style)
ax.text('2012-7-4', 4250, "Independence Day", ha='center', **style)
ax.text('2012-9-4', 4850, "Labor Day", ha='center', **style)
ax.text('2012-10-31', 4600, "Halloween", ha='right', **style)
ax.text('2012-11-25', 4450, "Thanksgiving", ha='center', **style)
ax.text('2012-12-25', 3850, "Christmas ", ha='right', **style)

# Label the axes
ax.set(title='USA births by day of year (1969-1988)',
       ylabel='average daily births')

# Format the x axis with centered month labels
ax.xaxis.set_major_locator(mpl.dates.MonthLocator())
ax.xaxis.set_minor_locator(mpl.dates.MonthLocator(bymonthday=15))
ax.xaxis.set_major_formatter(plt.NullFormatter())
# '%b' is the portable code for the abbreviated month name ('%h' is platform-specific)
ax.xaxis.set_minor_formatter(mpl.dates.DateFormatter('%b'));
The ax.text method takes an x position, a y position, a string, and then optional keywords
specifying the color, size, style, alignment, and other properties of the text. Here we
used ha='right' and ha='center', where ha is short for horizontal alignment. See the
docstring of plt.text() and of mpl.text.Text() for more information on available options.
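
The keyword handling can be tried on a small example that does not need the births data. The coordinates and label text below are arbitrary values chosen for illustration, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.axis([0, 10, 0, 10])

# Shared keyword arguments, unpacked into each ax.text call
style = dict(size=10, color='gray')
label = ax.text(5, 8, "centered label", ha='center', va='bottom', **style)

# ax.text returns a Text object whose properties can be read back or changed
print(label.get_text(), label.get_ha(), label.get_va())
label.set_color('black')
```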

Transforms and Text Position


In the previous example, we have anchored our text annotations to data locations.
Sometimes it's preferable to anchor the text to a position on the axes or figure,
independent of the data. In Matplotlib, this is done by modifying the transform.

Any graphics display framework needs some scheme for translating between coordinate
systems. For example, a data point at (x, y) = (1, 1) needs to somehow be
represented at a certain location on the figure, which in turn needs to be represented in
pixels on the screen. Mathematically, such coordinate transformations are relatively
straightforward, and Matplotlib has a well-developed set of tools that it uses internally to
perform them (these tools can be explored in the matplotlib.transforms submodule).

The average user rarely needs to worry about the details of these transforms, but it is
helpful knowledge to have when considering the placement of text on a figure. There are
three pre-defined transforms that can be useful in this situation:

• ax.transData: Transform associated with data coordinates
• ax.transAxes: Transform associated with the axes (in units of axes dimensions)
• fig.transFigure: Transform associated with the figure (in units of figure dimensions)

Here let's look at an example of drawing text at various locations using these transforms:

In [5]:
fig, ax = plt.subplots(facecolor='lightgray')
ax.axis([0, 10, 0, 10])

# transform=ax.transData is the default, but we'll specify it anyway


ax.text(1, 5, ". Data: (1, 5)", transform=ax.transData)
ax.text(0.5, 0.1, ". Axes: (0.5, 0.1)", transform=ax.transAxes)
ax.text(0.2, 0.2, ". Figure: (0.2, 0.2)", transform=fig.transFigure);

Note that by default the text is left-aligned and sits on its baseline at the specified
coordinates, so it extends up and to the right of them: here the "." at the beginning of
each string will approximately mark the given coordinate location.

The transData coordinates give the usual data coordinates associated with the x- and
y-axis labels. The transAxes coordinates give the location from the bottom-left corner of the
axes (here the white box), as a fraction of the axes size. The transFigure coordinates are
similar, but specify the position from the bottom-left of the figure (here the gray box), as a
fraction of the figure size.

Notice now that if we change the axes limits, it is only the transData coordinates that will
be affected, while the others remain stationary:

In [6]:
ax.set_xlim(0, 2)
ax.set_ylim(-6, 6)
fig
Out[6]:
This behavior can be seen more clearly by changing the axes limits interactively: if you are
executing this code in a notebook, you can make that happen by changing %matplotlib
inline to %matplotlib notebook and using each plot's menu to interact with the plot.
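
The same effect can be verified without looking at the figure by asking each transform where a point lands in pixel coordinates before and after the limits change. This is a small numerical sketch using the transform method of the transform objects, with the same coordinates as in the example above. The Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.axis([0, 10, 0, 10])

data_before = ax.transData.transform((1, 5))      # data coordinates -> pixels
axes_before = ax.transAxes.transform((0.5, 0.1))  # axes fraction -> pixels

ax.set_xlim(0, 2)
ax.set_ylim(-6, 6)

data_after = ax.transData.transform((1, 5))
axes_after = ax.transAxes.transform((0.5, 0.1))

print(np.allclose(axes_before, axes_after))   # True: unaffected by the limits
print(np.allclose(data_before, data_after))   # False: the data point moved
```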

Arrows and Annotation


Along with tick marks and text, another useful annotation mark is the simple arrow.

Drawing arrows in Matplotlib is often much harder than you'd bargain for. While there is
a plt.arrow() function available, I wouldn't suggest using it: the arrows it creates are SVG
objects that will be subject to the varying aspect ratio of your plots, and the result is rarely
what the user intended. Instead, I'd suggest using the plt.annotate() function. This
function creates some text and an arrow, and the arrows can be very flexibly specified.

Here we'll use annotate with several of its options:

In [7]:
%matplotlib inline

fig, ax = plt.subplots()

x = np.linspace(0, 20, 1000)
ax.plot(x, np.cos(x))
ax.axis('equal')

ax.annotate('local maximum', xy=(6.28, 1), xytext=(10, 4),
            arrowprops=dict(facecolor='black', shrink=0.05))
ax.annotate('local minimum', xy=(5 * np.pi, -1), xytext=(2, -6),
            arrowprops=dict(arrowstyle="->",
                            connectionstyle="angle3,angleA=0,angleB=-90"));

The arrow style is controlled through the arrowprops dictionary, which has numerous
options available. These options are fairly well-documented in Matplotlib's online
documentation, so rather than repeating them here it is probably more useful to quickly
show some of the possibilities. Let's demonstrate several of the possible options using the
birthrate plot from before:

In [8]:
fig, ax = plt.subplots(figsize=(12, 4))
births_by_date.plot(ax=ax)

# Add labels to the plot
ax.annotate("New Year's Day", xy=('2012-1-1', 4100), xycoords='data',
            xytext=(50, -30), textcoords='offset points',
            arrowprops=dict(arrowstyle="->",
                            connectionstyle="arc3,rad=-0.2"))

ax.annotate("Independence Day", xy=('2012-7-4', 4250), xycoords='data',
            bbox=dict(boxstyle="round", fc="none", ec="gray"),
            xytext=(10, -40), textcoords='offset points', ha='center',
            arrowprops=dict(arrowstyle="->"))

ax.annotate('Labor Day', xy=('2012-9-4', 4850), xycoords='data', ha='center',
            xytext=(0, -20), textcoords='offset points')
ax.annotate('', xy=('2012-9-1', 4850), xytext=('2012-9-7', 4850),
            xycoords='data', textcoords='data',
            arrowprops={'arrowstyle': '|-|,widthA=0.2,widthB=0.2', })

ax.annotate('Halloween', xy=('2012-10-31', 4600), xycoords='data',
            xytext=(-80, -40), textcoords='offset points',
            arrowprops=dict(arrowstyle="fancy",
                            fc="0.6", ec="none",
                            connectionstyle="angle3,angleA=0,angleB=-90"))

ax.annotate('Thanksgiving', xy=('2012-11-25', 4500), xycoords='data',
            xytext=(-120, -60), textcoords='offset points',
            bbox=dict(boxstyle="round4,pad=.5", fc="0.9"),
            arrowprops=dict(arrowstyle="->",
                            connectionstyle="angle,angleA=0,angleB=80,rad=20"))

ax.annotate('Christmas', xy=('2012-12-25', 3850), xycoords='data',
            xytext=(-30, 0), textcoords='offset points',
            size=13, ha='right', va="center",
            bbox=dict(boxstyle="round", alpha=0.1),
            arrowprops=dict(arrowstyle="wedge,tail_width=0.5", alpha=0.1));

# Label the axes
ax.set(title='USA births by day of year (1969-1988)',
       ylabel='average daily births')

# Format the x axis with centered month labels
ax.xaxis.set_major_locator(mpl.dates.MonthLocator())
ax.xaxis.set_minor_locator(mpl.dates.MonthLocator(bymonthday=15))
ax.xaxis.set_major_formatter(plt.NullFormatter())
# '%b' is the portable code for the abbreviated month name ('%h' is platform-specific)
ax.xaxis.set_minor_formatter(mpl.dates.DateFormatter('%b'));

ax.set_ylim(3600, 5400);

You'll notice that the specifications of the arrows and text boxes are very detailed: this
gives you the power to create nearly any arrow style you wish. Unfortunately, it also
means that these sorts of features often must be manually tweaked, a process that can be
very time consuming when producing publication-quality graphics! Finally, I'll note that
the preceding mix of styles is by no means best practice for presenting data, but rather
included as a demonstration of some of the available options.

More discussion and examples of available arrow and annotation styles can be found in
the Matplotlib gallery, in particular the Annotation Demo.
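
For reference, a pared-down annotate call that does not depend on the births data might look like the sketch below. The offsets and arrow style are arbitrary choices for illustration, and the Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 20, 1000)
ax.plot(x, np.cos(x))

ann = ax.annotate('local maximum',
                  xy=(2 * np.pi, 1), xycoords='data',           # point the arrow targets
                  xytext=(40, 30), textcoords='offset points',  # text offset from xy
                  arrowprops=dict(arrowstyle='->',
                                  connectionstyle='arc3,rad=-0.2'))
```

Giving xytext in 'offset points' keeps the label at a fixed distance from the annotated point even when the axes limits change.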
Customizing Ticks

Matplotlib's default tick locators and formatters are designed to be generally sufficient in
many common situations, but are in no way optimal for every plot. This section will give
several examples of adjusting the tick locations and formatting for the particular plot type
you're interested in.

Before we go into examples, it will be best for us to understand further the object
hierarchy of Matplotlib plots. Matplotlib aims to have a Python object representing
everything that appears on the plot: for example, recall that the figure is the bounding
box within which plot elements appear. Each Matplotlib object can also act as a container
of sub-objects: for example, each figure can contain one or more axes objects, each of
which in turn contain other objects representing plot contents.

The tick marks are no exception. Each axes has attributes xaxis and yaxis, which in turn
have attributes that contain all the properties of the lines, ticks, and labels that make up
the axes.
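
The container relationship described above can be inspected directly. This is a small sketch with arbitrary example data; it only prints objects that Matplotlib creates for any plot.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])  # arbitrary example data

# The figure is a container of axes objects
print(fig.axes)

# Each axes holds an XAxis and a YAxis object
print(type(ax.xaxis).__name__)
print(type(ax.yaxis).__name__)

# Each axis in turn holds its tick objects (tick lines and labels)
major_ticks = ax.xaxis.get_major_ticks()
print(len(major_ticks), "major ticks on the x axis")
```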

Major and Minor Ticks


Within each axis, there is the concept of a major tick mark, and a minor tick mark. As the
names would imply, major ticks are usually bigger or more pronounced, while minor ticks
are usually smaller. By default, Matplotlib rarely makes use of minor ticks, but one place
you can see them is within logarithmic plots:

In [1]:
import matplotlib.pyplot as plt
plt.style.use('classic')
%matplotlib inline
import numpy as np
In [2]:
ax = plt.axes(xscale='log', yscale='log')
ax.grid();
We see here that each major tick shows a large tickmark and a label, while each minor tick
shows a smaller tickmark with no label.

These tick properties (that is, their locations and labels) can be customized by setting
the formatter and locator objects of each axis. Let's examine these for the x axis of the
plot just shown:

In [3]:
print(ax.xaxis.get_major_locator())
print(ax.xaxis.get_minor_locator())
<matplotlib.ticker.LogLocator object at 0x10dbaf630>
<matplotlib.ticker.LogLocator object at 0x10dba6e80>
In [4]:
print(ax.xaxis.get_major_formatter())
print(ax.xaxis.get_minor_formatter())
<matplotlib.ticker.LogFormatterMathtext object at 0x10db8dbe0>
<matplotlib.ticker.NullFormatter object at 0x10db9af60>
We see that both major and minor tick labels have their locations specified by
a LogLocator (which makes sense for a logarithmic plot). Minor ticks, though, have their
labels formatted by a NullFormatter: this says that no labels will be shown.

We'll now show a few examples of setting these locators and formatters for various plots.
Hiding Ticks or Labels
Perhaps the most common tick/label formatting operation is the act of hiding ticks or
labels. This can be done using plt.NullLocator() and plt.NullFormatter(), as shown
here:

In [5]:
ax = plt.axes()
ax.plot(np.random.rand(50))

ax.yaxis.set_major_locator(plt.NullLocator())
ax.xaxis.set_major_formatter(plt.NullFormatter())
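
The same two setter methods accept other locators and formatters as well. The sketch below is an illustration rather than part of the original notes: it uses MultipleLocator from matplotlib.ticker to place ticks at fixed multiples, and the spacings 10 and 5 are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import ticker

fig, ax = plt.subplots()
ax.plot(np.random.rand(50))

# Major ticks at every multiple of 10 on x, minor ticks at every multiple of 5
ax.xaxis.set_major_locator(ticker.MultipleLocator(10))
ax.xaxis.set_minor_locator(ticker.MultipleLocator(5))

# Keep the y tick marks but hide their labels
ax.yaxis.set_major_formatter(ticker.NullFormatter())

# Inspect what is now attached to each axis
print(ax.xaxis.get_major_locator())
print(ax.yaxis.get_major_formatter())
```

A NullLocator removes the ticks (and therefore the labels) entirely, while a NullFormatter keeps the tick marks and grid positions but leaves them unlabeled.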

Three-dimensional Plotting in Python using Matplotlib

Matplotlib was originally designed with only two-dimensional plotting in mind.
Around the time of the 1.0 release, 3D plotting utilities were built on top of
the 2D display, so 3D data visualization is available today. 3D plots are
enabled by importing the mplot3d toolkit. In this section, we will deal with
3D plots using Matplotlib.
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = plt.axes(projection ='3d')

Output:

Plotting 3-D Lines and Points


Graphs with lines and points are the simplest three-dimensional
graphs. ax.plot3D and ax.scatter3D are the functions that
plot line and point graphs, respectively.

# importing mplot3d toolkits, numpy and matplotlib
from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
# syntax for 3-D projection
ax = plt.axes(projection ='3d')
# defining all 3 axes
z = np.linspace(0, 1, 100)
x = z * np.sin(25 * z)
y = z * np.cos(25 * z)
# plotting
ax.plot3D(x, y, z, 'green')
ax.set_title('3D line plot ')
plt.show()
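
The text names a point-plotting function alongside the line function, but only the line plot is shown. As a sketch of the point version, the following scatters noisy samples around the same spiral; the noise level, the seed, and the colormap are illustrative assumptions.

```python
from mpl_toolkits import mplot3d  # kept from the example above; older Matplotlib needs it for '3d'
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.axes(projection='3d')

# Same spiral as the line plot, with a little random scatter added
rng = np.random.default_rng(0)
z = np.linspace(0, 1, 100)
x = z * np.sin(25 * z) + 0.05 * rng.standard_normal(100)
y = z * np.cos(25 * z) + 0.05 * rng.standard_normal(100)

# One marker per (x, y, z) triplet, colored by height
points = ax.scatter3D(x, y, z, c=z, cmap='viridis')
ax.set_title('3D scatter plot')
# plt.show()  # uncomment to display when running as a script
```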

Output:
Points, Lines:
A point in three-dimensional geometry is a location in 3D space that is
uniquely defined by an ordered triplet (x, y, z), where x, y, and z are the
signed distances of the point from the YZ-plane, XZ-plane, and XY-plane
respectively.

A line in three-dimensional geometry is a set of points in 3D that extends
infinitely in both directions. It is represented by

L : (x - x1) / l = (y - y1) / m = (z - z1) / n

where (x, y, z) are the coordinates of any variable point lying on the line,
(x1, y1, z1) are the coordinates of a fixed point P lying on the line, and
l, m, and n are the direction ratios (DRs). In 3D, a line is also formed by
the intersection of two non-parallel planes.
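
Setting the common value of the three ratios in the line equation to a parameter t gives x = x1 + l*t, y = y1 + m*t, z = z1 + n*t. The sketch below generates points this way and checks that they satisfy the equation; the point P and the direction ratios are made-up example values.

```python
import numpy as np

# Hypothetical fixed point P = (x1, y1, z1) and direction ratios (l, m, n)
p = np.array([1.0, 2.0, 3.0])
d = np.array([2.0, -1.0, 4.0])

# Parametric form: each value of t gives one point on the line
t = np.linspace(-1, 1, 5)
points = p + t[:, None] * d    # shape (5, 3), one row per point

# Check the symmetric form: (x - x1)/l, (y - y1)/m, (z - z1)/n all equal t
ratios = (points - p) / d
print(points)
print(np.allclose(ratios, t[:, None]))   # True
```

The columns of points can be passed directly to ax.plot3D(points[:, 0], points[:, 1], points[:, 2]) to draw the segment.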
Mapping Geographical Data with Basemap Python Package
Basemap is a Matplotlib extension used to create and visualize
geographical maps in Python. The main purpose of this section is to
provide basic information on how to plot and visualize geographical
data with the help of the Basemap package.

Data Visualization with Python Seaborn


Data visualization is the presentation of data in pictorial format.
It is extremely important for data analysis, and Python is well suited to it,
primarily because of its ecosystem of data-centric packages. Visualization
helps convey the significance of data, however complex it is, by summarizing
and presenting a large amount of data in a simple and easy-to-understand
format, and it helps communicate information clearly and effectively.

import pandas as pd

# Initialise the data as a dictionary of lists.
data = {'Name': ['Mohe', 'Karnal', 'Yrik', 'jack'],
        'Age': [30, 21, 29, 28]}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output.
df

Output:
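
The cell above only builds the DataFrame. As a sketch of how Seaborn could then draw it, the following uses sns.barplot on the same data; the choice of a bar plot, the 'whitegrid' style, and the title are assumptions made for illustration.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Same data as in the DataFrame example
data = {'Name': ['Mohe', 'Karnal', 'Yrik', 'jack'],
        'Age': [30, 21, 29, 28]}
df = pd.DataFrame(data)

# Seaborn reads column names straight from the DataFrame
sns.set_style('whitegrid')
ax = sns.barplot(data=df, x='Name', y='Age')
ax.set_title('Age by name')
# plt.show()  # uncomment to display when running as a script
```

Seaborn returns the Matplotlib axes it drew on, so the Matplotlib customizations covered earlier (titles, ticks, annotations) still apply.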
