Unit 4 DSF
UNIT IV
DATA VISUALIZATION
Importing Matplotlib – Line plots – Scatter plots – visualizing errors – density and contour plots – Histograms –
legends – colors – subplots – text and annotation – customization – three dimensional plotting - Geographic Data
with Basemap - Visualization with Seaborn.
60 | P a g e
Downloaded by Kalai ilaiya ([email protected])
lOMoARcPSD|28284242
plt.xlim(10, 0)
plt.ylim(1.2, -1.2);
The plt.axis() method allows you to set the x and y limits with a single call, by passing a list that
specifies [xmin, xmax, ymin, ymax]:
plt.axis([-1, 11, -1.5, 1.5]);
Setting the aspect ratio to 'equal' makes one unit in x the same length as one unit in y.
plt.axis('equal')
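The limit-setting calls above can be tried on a simple curve. The following is a minimal sketch, assuming a sine curve as the plotted data (the curve and the variable names are illustrative, not taken from these notes). It sets all four limits in one call, reads them back, and then reverses the x-axis by passing the limits in descending order.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)   # illustrative data, not from the notes

fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# Set [xmin, xmax, ymin, ymax] in a single call.
plt.axis([-1, 11, -1.5, 1.5])
limits = plt.axis()            # with no arguments, returns the current limits

# Passing the limits in reverse order flips the axis direction.
plt.xlim(10, 0)
```

Calling plt.axis() with no arguments is a convenient way to check which limits are currently in effect.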
Labeling Plots
The labeling of plots includes titles, axis labels, and simple legends.
Title - plt.title()
Label - plt.xlabel()
plt.ylabel()
Legend - plt.legend()
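The four labeling functions listed above can be combined as follows. This is a minimal sketch, assuming sine and cosine curves as example data; the title and label strings are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)   # illustrative data

fig, ax = plt.subplots()
# The label keyword supplies the text that plt.legend() will display.
ax.plot(x, np.sin(x), label='sin(x)')
ax.plot(x, np.cos(x), label='cos(x)')

plt.title('A Sine and Cosine Curve')
plt.xlabel('x')
plt.ylabel('value')
plt.legend()
```

The legend is built from the label given to each plt.plot call, so lines without a label are left out of it.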
Line style:
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
plt.plot(x, x + 0, linestyle='solid')
plt.plot(x, x + 1, linestyle='dashed')
plt.plot(x, x + 2, linestyle='dashdot')
plt.plot(x, x + 3, linestyle='dotted');
# For short, you can use the following codes:
plt.plot(x, x + 4, linestyle='-') # solid
plt.plot(x, x + 5, linestyle='--') # dashed
plt.plot(x, x + 6, linestyle='-.')# dashdot
plt.plot(x, x + 7, linestyle=':'); # dotted
Example
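Various symbols can be used to specify the marker: ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']
A shorthand format string that combines the line style, marker symbol, and color is also allowed. In the
call below, '-' gives a solid line, 'o' a circle marker, and 'k' the color black.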
plt.plot(x, y, '-ok');
Additional arguments in plt.plot()
We can pass additional keyword arguments to plt.plot() to control the appearance of the lines and markers.
They include color, marker size, line width, marker face color, marker edge color, marker edge width, etc.
Example
plt.plot(x, y, '-p', color='gray',
         markersize=15, linewidth=4,
         markerfacecolor='white',
         markeredgecolor='gray',
         markeredgewidth=2)
plt.ylim(-1.2, 1.2);
3. VISUALIZING ERRORS
For any scientific measurement, accurate accounting for errors is nearly as important, if not more important,
than accurate reporting of the number itself. For example, imagine that I am using some astrophysical
observations to estimate the Hubble Constant, the local measurement of the expansion rate of the Universe.
In visualization of data and results, showing these errors effectively can make a plot convey much more
complete information.
Types of errors
Basic Errorbars
Continuous Errors
Basic Errorbars
A basic errorbar can be created with a single Matplotlib function call.
import matplotlib.pyplot as plt
# On Matplotlib 3.6 and later this style is named 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
import numpy as np
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * np.random.randn(50)
plt.errorbar(x, y, yerr=dy, fmt='.k');
Here the fmt is a format code controlling the appearance of lines and points, and has the same syntax
as the shorthand used in plt.plot()
In addition to these basic options, the errorbar function has many options to fine tune the outputs.
Using these additional options you can easily customize the aesthetics of your errorbar plot.
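As an illustration of these options, the sketch below lightens the error bars so that they do not dominate the data points. It assumes the same kind of noisy sine data as the basic example; the seeded random generator and the particular colors are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)   # seeded so the figure is reproducible
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * rng.standard_normal(50)

fig, ax = plt.subplots()
# ecolor and elinewidth style the bars; capsize=0 removes the end caps.
container = ax.errorbar(x, y, yerr=dy, fmt='o', color='black',
                        ecolor='lightgray', elinewidth=3, capsize=0)
```

Making the error bars lighter than the points themselves tends to help in crowded plots.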
Continuous Errors
In some situations it is desirable to show errorbars on continuous quantities. Though Matplotlib does
not have a built-in convenience routine for this type of application, it’s relatively easy to combine
primitives like plt.plot and plt.fill_between for a useful result.
Here we’ll perform a simple Gaussian process regression (GPR), using the Scikit-Learn API. This is a
method of fitting a very flexible nonparametric function to data with a continuous measure of the
uncertainty.
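The plotting step can be sketched without fitting a model. The example below does not perform Gaussian process regression; it assumes a hand-made prediction curve and a hand-made uncertainty (both are stand-ins introduced only for illustration) and shows how plt.plot and plt.fill_between combine into a continuous error band.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
y_fit = np.sin(x)            # stand-in for a model prediction (assumption)
sigma = 0.1 + 0.05 * x       # stand-in for the model uncertainty (assumption)

fig, ax = plt.subplots()
ax.plot(x, y_fit, '-', color='gray')
# Shade the region within two standard deviations of the prediction.
band = ax.fill_between(x, y_fit - 2 * sigma, y_fit + 2 * sigma,
                       color='gray', alpha=0.2)
ax.set_xlim(0, 10)
```

With a real regression model, y_fit and sigma would be replaced by the predicted mean and standard deviation returned by the model.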
4. DENSITY AND CONTOUR PLOTS
It is sometimes useful to display three-dimensional data in two dimensions using contours or color-coded
regions. There are three Matplotlib functions that can be helpful for this task:
plt.contour for contour plots,
plt.contourf for filled contour plots, and
plt.imshow for showing images.
Example
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z, colors='black');
Notice that by default when a single color is used, negative values are represented by dashed lines, and
positive values by solid lines.
Alternatively, you can color-code the lines by specifying a colormap with the cmap argument.
We’ll also specify that we want more lines to be drawn—20 equally spaced intervals within the data range.
plt.contour(X, Y, Z, 20, cmap='RdGy');
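The filled variant listed earlier, plt.contourf, takes the same arguments. The following is a minimal sketch that reuses the function f and the grid defined above and adds a colorbar; it is an illustration rather than part of the original notes.

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

fig, ax = plt.subplots()
# Fill the regions between contour levels instead of drawing only the lines.
cs = ax.contourf(X, Y, Z, 20, cmap='RdGy')
fig.colorbar(cs)   # the colorbar maps each color to its range of Z values
```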
One potential issue with such a plot, especially a filled one drawn with plt.contourf, is that it is a bit
“splotchy.” That is, the color steps are discrete rather than continuous, which is not always what is desired.
You could remedy this by setting the number of contours to a very high number, but this results in a
rather inefficient plot: Matplotlib must render a new polygon for each step in the level.
A better way to handle this is to use the plt.imshow() function, which interprets a two-dimensional grid
of data as an image.
Example Program
import numpy as np
import matplotlib.pyplot as plt
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
# extent must match the x and y ranges of the grid, here [0, 5] on both axes
plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower', cmap='RdGy')
plt.colorbar()
5. HISTOGRAMS
A histogram is a simple plot for summarizing a large data set. It is a graph of a frequency distribution:
it shows the number of observations that fall within each given interval.
5.1 Parameters
plt.hist() is used to plot a histogram. The hist() function takes an array of numbers, passed as an
argument, and creates a histogram from it.
bins - A histogram displays numerical data by grouping data into "bins" of equal width. Each bin is
plotted as a bar whose height corresponds to how many data points are in that bin. Bins are also
sometimes called "intervals", "classes", or "buckets".
density (called normed in older Matplotlib versions; normed was removed in Matplotlib 3.1) - If True,
the histogram is normalized so that the area under it integrates to 1, giving a probability density
instead of raw counts.
x - (n,) array or sequence of (n,) arrays. Input values; this takes either a single array or a sequence of
arrays, which are not required to be of the same length.
histtype - {'bar', 'barstacked', 'step', 'stepfilled'}, optional The type of histogram to draw.
'bar' is a traditional bar-type histogram. If multiple data are given the bars are arranged side
by side.
'barstacked' is a bar-type histogram where multiple data are stacked on top of each other.
'step' generates a lineplot that is by default unfilled.
'stepfilled' generates a lineplot that is by default filled. Default is 'bar'
align - {'left', 'mid', 'right'}, optional. Controls the horizontal alignment of the histogram bars
relative to the bin edges. Default is 'mid'
label - str or None, optional. Default is None
Example
import numpy as np
The hist() function has many options to tune both the calculation and the display; here’s an example of
a more customized histogram.
The plt.hist docstring has more information on other customization options available. I find this combination
of histtype='stepfilled' along with some transparency alpha to be very useful when comparing histograms of
several distributions
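The comparison described above can be sketched as follows. The three normal distributions, their parameters, and the seeded random generator are assumptions made for illustration; density=True is used so that the three histograms share a common vertical scale.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)     # seeded so the figure is reproducible
x1 = rng.normal(0, 0.8, 1000)      # illustrative samples
x2 = rng.normal(-2, 1, 1000)
x3 = rng.normal(3, 2, 1000)

# Shared options: filled steps, partial transparency, unit area, 40 bins.
kwargs = dict(histtype='stepfilled', alpha=0.3, density=True, bins=40)

fig, ax = plt.subplots()
counts1, edges1, _ = ax.hist(x1, **kwargs)
ax.hist(x2, **kwargs)
ax.hist(x3, **kwargs)
```

hist() returns the bin heights and the bin edges, so the normalization can be checked by summing height times bin width.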
Example: a two-dimensional histogram with plt.hist2d
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 1000).T
plt.hist2d(x, y, bins=30, cmap='Blues')
cb = plt.colorbar()
cb.set_label('counts in bin')
6. LEGENDS
Plot legends give meaning to a visualization, assigning labels to the various plot elements. We previously saw
how to create a simple legend; here we’ll take a look at customizing the placement and aesthetics of the legend
in Matplotlib.
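# y is assumed to be a 2D array with one column per line, so plt.plot returns a list of lines.
lines = plt.plot(x, y)
# Label only the first two lines by passing them to the legend explicitly.
plt.legend(lines[:2], ['first', 'second']);
# Applying label individually.
plt.plot(x, y[:, 0], label='first')
plt.plot(x, y[:, 1], label='second')
plt.plot(x, y[:, 2:])
plt.legend(framealpha=1, frameon=True);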
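Example
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('classic')
x = np.linspace(0, 10, 1000)
# The figure, axes, and plotted curves are assumed here so that the legend call can run.
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')
ax.legend(loc='lower center', frameon=True, shadow=True, borderpad=1, fancybox=True)
fig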
7. COLOR BARS
In Matplotlib, a color bar is a separate axes that can provide a key for the meaning of colors in a plot.
For continuous labels based on the color of points, lines, or regions, a labeled color bar can be a great tool.
We can specify the colormap using the cmap argument to the plotting function that is creating the
visualization. Broadly, there are three categories of colormaps:
Sequential colormaps - These consist of one continuous sequence of colors (e.g., binary or viridis).
Divergent colormaps - These usually contain two distinct colors, which show positive and
negative deviations from a mean (e.g., RdBu or PuOr).
Qualitative colormaps - These mix colors with no particular sequence (e.g., rainbow or jet).
Color limits and extensions
Matplotlib allows for a large range of colorbar customization. The colorbar itself is simply an instance
of plt.Axes, so all of the axes and tick formatting tricks we’ve learned are applicable.
We can narrow the color limits and indicate the out-of-bounds values with a triangular arrow at the top
and bottom by setting the extend property.
plt.subplot(1, 2, 2)
plt.imshow(I, cmap='RdBu')
plt.colorbar(extend='both')
plt.clim(-1, 1);
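The fragment above relies on an image array I that is not defined in these notes. The following self-contained sketch assumes a smooth image with a small fraction of noisy pixels (the image and the noise level are illustrative), so that some values lie outside the color limits and the extended colorbar ends have something to indicate.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
I = np.sin(x) * np.cos(x[:, np.newaxis])   # illustrative 200 x 200 image

# Replace about 1% of the pixels with wide noise (assumed for illustration).
rng = np.random.default_rng(0)
speckles = rng.random(I.shape) < 0.01
I[speckles] = rng.normal(0, 3, np.count_nonzero(speckles))

fig, ax = plt.subplots()
im = ax.imshow(I, cmap='RdBu')
cbar = fig.colorbar(im, extend='both')   # arrows mark out-of-range values
im.set_clim(-1, 1)                       # narrow the color limits
```

Without the narrowed limits, the few noisy pixels would stretch the color scale and wash out the underlying pattern.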
Discrete colorbars
Colormaps are by default continuous, but sometimes you’d like to represent discrete values. The easiest way to
do this is to use the plt.get_cmap() function, and pass the name of a suitable colormap along with the number
of desired bins. Older code uses plt.cm.get_cmap() for the same purpose, but that function was removed in
Matplotlib 3.9.
plt.imshow(I, cmap=plt.get_cmap('Blues', 6))
plt.colorbar()
plt.clim(-1, 1);
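A self-contained version of the discrete colorbar can be sketched as follows. The image array is an assumption made for illustration, and plt.get_cmap is used here because it is available in both older and recent Matplotlib releases.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
I = np.sin(x) * np.cos(x[:, np.newaxis])   # illustrative image

# Request the 'Blues' colormap resampled to 6 discrete colors.
cmap = plt.get_cmap('Blues', 6)

fig, ax = plt.subplots()
im = ax.imshow(I, cmap=cmap)
fig.colorbar(im)
im.set_clim(-1, 1)
```

The colorbar then shows six flat color bands rather than a smooth gradient.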
8. SUBPLOTS
Matplotlib has the concept of subplots: groups of smaller axes that can exist together within a single
figure.
These subplots might be insets, grids of plots, or other more complicated layouts.
We’ll explore four routines for creating subplots in Matplotlib.
plt.axes: Subplots by Hand
plt.subplot: Simple Grids of Subplots
plt.subplots: The Whole Grid in One Go
plt.GridSpec: More Complicated Arrangements
The most basic method of creating an axes is to use the plt.axes function. As we’ve seen previously,
by default this creates a standard axes object that fills the entire figure.
plt.axes also takes an optional argument that is a list of four numbers in the figure coordinate system.
These numbers represent [left, bottom, width, height] in the figure coordinate system, which ranges
from 0 at the bottom left of the figure to 1 at the top right of the figure.
For example, we might create an inset axes at the top-right corner of another axes by setting the x and y
position to 0.65 (that is, starting at 65% of the width and 65% of the height of the figure) and the x and y
extents to 0.2 (that is, the size of the axes is 20% of the width and 20% of the height of the figure).
import matplotlib.pyplot as plt
import numpy as np
ax1 = plt.axes() # standard axes
ax2 = plt.axes([0.65, 0.65, 0.2, 0.2])
We now have two axes (the top with no tick labels) that are just touching: the bottom of the upper
panel (at position 0.5) matches the top of the lower panel (at position 0.1+ 0.4).
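The listing that produces these two stacked panels is not included in these notes. A minimal sketch, assuming the object-oriented fig.add_axes call with the positions quoted above and sine and cosine curves as example data, might look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
# Upper panel: bottom edge at 0.5, tick labels hidden on the x-axis.
ax1 = fig.add_axes([0.1, 0.5, 0.8, 0.4], xticklabels=[], ylim=(-1.2, 1.2))
# Lower panel: bottom edge at 0.1, so its top edge sits at 0.1 + 0.4 = 0.5.
ax2 = fig.add_axes([0.1, 0.1, 0.8, 0.4], ylim=(-1.2, 1.2))

x = np.linspace(0, 10)   # illustrative data
ax1.plot(np.sin(x))
ax2.plot(np.cos(x))
```

Each list is [left, bottom, width, height] in figure coordinates, which is why the two panels meet at the height 0.5.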
If the bottom coordinate of the second axes is changed, the two panels no longer touch and are separated
by a gap, for example:
ax2 = fig.add_axes([0.1, 0.01, 0.8, 0.4])
The approach just described can become quite tedious when you’re creating a large grid of subplots,
especially if you’d like to hide the x- and y-axis labels on the inner plots.
For this purpose, plt.subplots() is the easier tool to use (note the s at the end of subplots).
Rather than creating a single subplot, this function creates a full grid of subplots in a single line,
returning them in a NumPy array.
The arguments are the number of rows and number of columns, along with optional keywords sharex
and sharey, which allow you to specify the relationships between different axes.
Here we’ll create a 2×3 grid of subplots, where all axes in the same row share their y- axis scale, and
all axes in the same column share their x-axis scale
Note that by specifying sharex and sharey, we’ve automatically removed inner labels on the grid to
make the plot cleaner.
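The 2×3 grid described above can be sketched in a few lines. The text labels drawn in each panel are an illustrative addition that makes the indexing of the returned array visible.

```python
import matplotlib.pyplot as plt

# Share the x-axis within each column and the y-axis within each row.
fig, ax = plt.subplots(2, 3, sharex='col', sharey='row')

# ax is a two-dimensional NumPy array indexed as [row, column].
for i in range(2):
    for j in range(3):
        ax[i, j].text(0.5, 0.5, str((i, j)), fontsize=18, ha='center')
```

Note that the array returned by plt.subplots() is indexed from zero, unlike plt.subplot(), which numbers panels starting from one.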
To go beyond a regular grid to subplots that span multiple rows and columns, plt.GridSpec() is the best tool.
The plt.GridSpec() object does not create a plot by itself; it is simply a convenient interface that is recognized
by the plt.subplot() command.
For example, a gridspec for a grid of two rows and three columns with some specified width and height
space looks like this:
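The listing itself is missing from these notes. A minimal sketch, assuming spacing values of 0.4 and 0.3 (illustrative choices) and four panels of differing widths, might look like this:

```python
import matplotlib.pyplot as plt

# Two rows and three columns, with horizontal and vertical spacing.
grid = plt.GridSpec(2, 3, wspace=0.4, hspace=0.3)

fig = plt.figure()
# Slices of the grid select which cells each subplot occupies.
ax_a = plt.subplot(grid[0, 0])    # top-left cell
ax_b = plt.subplot(grid[0, 1:])   # top row, columns 1 and 2
ax_c = plt.subplot(grid[1, :2])   # bottom row, columns 0 and 1
ax_d = plt.subplot(grid[1, 2])    # bottom-right cell
```

The grid uses ordinary Python slicing syntax, so a subplot can span any rectangular block of cells.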