SASWATI SARANGI
PLOTTING WITH PYPLOT
➢ Why Data Visualization?
➢ What is Data Visualization?
➢ Using PyPlot of Matplotlib Library
➢ Creating Charts-Line, Bar, Histograms
Why Data Visualization?
[Figure: sample charts, including a line graph and a box plot]
Line Chart: A line chart (line plot) is a graph that
displays data as a series of points, called markers,
connected by straight line segments. Line charts are
generally used to display trends over time. A line
chart can be created using the plot() function
available in the pyplot module.
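A minimal sketch of a line chart drawn with plot(). The weekly sales figures are made-up sample data, and plt is simply the alias assumed here for matplotlib.pyplot:

```python
import matplotlib.pyplot as plt

# Made-up sample data: sales recorded over five weeks
weeks = [1, 2, 3, 4, 5]
sales = [20, 35, 30, 45, 50]

# plot() joins the (week, sales) points with straight line segments
lines = plt.plot(weeks, sales)
plt.xlabel("Week")
plt.ylabel("Sales")
plt.title("Weekly sales")
plt.show()
```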
Bar Chart: A graph drawn using rectangular bars to
show how large each value is. The bars can be
horizontal or vertical. A bar graph makes it easy to
compare data between different groups at a glance.
It represents categories on one axis and a discrete
value on the other. With PyPlot, a vertical bar graph
is created using bar() and a horizontal one using barh().
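A short sketch of both bar functions on the same data. The house names and student counts are made-up sample values:

```python
import matplotlib.pyplot as plt

# Made-up sample data: number of students in each house
houses = ["Red", "Blue", "Green", "Yellow"]
students = [30, 25, 35, 28]

# bar() draws vertical bars: categories on the X-axis, values on the Y-axis
plt.figure()
vertical = plt.bar(houses, students)
plt.title("Vertical bar chart")

# barh() draws horizontal bars: categories on the Y-axis, values on the X-axis
plt.figure()
horizontal = plt.barh(houses, students)
plt.title("Horizontal bar chart")

plt.show()
```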
Scatter Plot: plots the data points as individual
markers, without joining them, to show the trend in
the data. With PyPlot, a scatter chart is created
using the scatter() function.
Pie Chart: a circular chart divided into slices, where
each slice represents a value as a share (percentage)
of the whole. With PyPlot, a pie chart is created
using the pie() function.
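A small sketch of a pie chart. The budget figures are made-up sample data; the autopct argument is used here only to print each slice's percentage:

```python
import matplotlib.pyplot as plt

# Made-up sample data: share of a monthly budget
labels = ["Rent", "Food", "Travel", "Savings"]
amounts = [40, 30, 10, 20]

# autopct prints the percentage of each slice on the chart
wedges, texts, autotexts = plt.pie(amounts, labels=labels, autopct="%1.0f%%")
plt.title("Monthly budget")
plt.show()
```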
Histogram Plot: is a type of chart that represents
the number of data points that lie within a range of
values. With PyPlot, hist() function is used to plot a
histogram.
Difference between Histogram and Bar Chart:
Histograms are used to show distributions of variables
while bar charts are used to compare variables.
Histograms plot quantitative data with ranges of the data
grouped into bins or intervals while bar charts
plot categorical data.
BoxPlot Chart: is a method for graphically depicting
groups of numerical data through their quartiles.
With PyPlot, a boxplot is created using boxplot()
function.
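A sketch of a box plot for two groups of numbers. The marks are made-up sample data, and the tick labels are set with xticks() purely for illustration:

```python
import matplotlib.pyplot as plt

# Made-up sample data: marks scored by two sections
section_a = [35, 42, 48, 50, 55, 61, 70]
section_b = [28, 40, 45, 52, 58, 66, 90]

# Each box spans the first to the third quartile; the line inside is the median
result = plt.boxplot([section_a, section_b])
plt.xticks([1, 2], ["Section A", "Section B"])
plt.ylabel("Marks")
plt.show()
```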
Installing and Importing matplotlib
• Install matplotlib using the pip command, typed at
the command prompt:
pip install matplotlib
• Import the pyplot module of matplotlib in a program
using:
import matplotlib.pyplot as pl
Then we can invoke PyPlot's functions as:
pl.plot(X,Y)
The plot() function of the pyplot module draws a line
chart on the current figure, creating a figure first if
none exists. A figure is the overall window
where the outputs of pyplot functions are plotted.
Specifying Plot Size and Grid: To set the size of
the graph/plot, set the figure size as:
<matplotlib.pyplot>.figure(figsize=(<width>,<height>))
e.g.,
import matplotlib.pyplot as plt
plt.figure(figsize=(5,7))   # 5 inches wide (X-axis), 7 inches tall (Y-axis)
figure() creates a new figure window using specific
property values. figure() is mostly useful if you want
to create a new figure window with specific
properties set, or you want to immediately save the
figure handle to start manipulating its properties, or
you want to have multiple figure windows active at
once.
It can be redundant in code sometimes, because
other functions like plot() and subplot() will
automatically create a figure window if there isn't
one already available to use.
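When figure() is used to change the size of the
graph, it should be called before any plotting function
such as plot(); otherwise the plot is drawn on a
default-sized figure and figure() opens a second,
empty one.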
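A minimal sketch of the ordering: figure() is called first and plot() then draws on that figure. The data values are made up, and figsize is measured in inches:

```python
import matplotlib.pyplot as plt

# Set the figure size first: 5 inches wide and 7 inches tall
fig = plt.figure(figsize=(5, 7))

# plot() draws on the figure created above instead of making a new one
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.show()
```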
To use grid on the plot, we can write:
plt.grid(True)
In script mode (.py), it is necessary to call
plt.show() in order to display the graph. In a
Jupyter notebook, the graph is displayed after
plt.plot() even if plt.show() is not written.
Changing Line Color and Style:
Syntax:
<matplotlib.pyplot>.plot(<data1>, [<data2>,] <color code>)
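A short sketch of the color code argument, using Matplotlib's single-letter codes 'r' and 'g' and, as an alternative, the color keyword with a color name. The data values are made up:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

# The third argument is a color code: 'r' is red, 'g' is green
red_line, = plt.plot(x, [1, 2, 3, 4], "r")
green_line, = plt.plot(x, [2, 4, 6, 8], "g")
# A color can also be passed by name through the color argument
blue_line, = plt.plot(x, [3, 6, 9, 12], color="blue")
plt.show()
```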
To change Line Width and Style:
Additional arguments are passed to plot() to change
the line width and style.
For line width, the argument in plot() is
linewidth=<width>, where the value is given in points,
e.g. linewidth=2 or 3 or 0.5 or 0.75.
For line style, the argument in plot() is written as
linestyle or ls = ['solid' | 'dashed' | 'dashdot' | 'dotted']
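A sketch that draws three lines with different widths and styles. The data values are made up:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]

# linewidth is given in points; linestyle (or its short form ls) sets the pattern
thick, = plt.plot(x, [1, 2, 3, 4, 5], linewidth=3, linestyle="solid")
dashed, = plt.plot(x, [2, 3, 4, 5, 6], linewidth=2, linestyle="dashed")
dotted, = plt.plot(x, [3, 4, 5, 6, 7], linewidth=0.75, ls="dotted")
plt.show()
```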
Changing Marker Type, Size and Color:
To change the marker type, its size and color,
additional arguments are given in the plot() function:
marker=<valid marker type>, markersize=<in points>,
markeredgecolor=<valid color>
There are various marker types ('.', 'x', 'o', '+', 'd',
'D' etc.). If the marker type is not specified, the data
points are not marked on the line chart.
The marker size is specified in points, and
markeredgecolor takes a valid color.
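A minimal sketch of the three marker arguments used together. The data values are made up:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]

# 'o' marks every data point with a circle of size 10 points, outlined in red
line, = plt.plot(x, y, marker="o", markersize=10, markeredgecolor="red")
plt.show()
```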
Note: Set the limits keeping in mind the data set
being plotted. Only the data that falls within the
limits of the X and Y axes is plotted; the rest of the
data does not show in the plot.
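A sketch of this effect, assuming the limits are set with pyplot's xlim() and ylim() functions. The data values are made up:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 4, 9, 16, 25, 36, 49, 64]

line, = plt.plot(x, y, marker="o")
# Only the data inside these limits is visible:
# points with x > 5 or y > 30 fall outside the plot area
plt.xlim(0, 5)
plt.ylim(0, 30)
plt.show()
```

Here only the first five points appear; the last three are cut off by the limits.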
Adding Legends
For plotting multiple data ranges on a single plot, we
require legends. A legend is a color or mark linked to a
specific data range plotted. To plot a legend, first give
each data range a label using the label argument of
plot(), bar() etc., and then call legend().
Syntax:
<matplotlib.pyplot>.legend(loc=<position number or string>)
The loc argument can take the values 1, 2, 3, 4, signifying
the position strings 'upper right', 'upper left', 'lower left',
'lower right' respectively. In current versions of
Matplotlib the default is 'best' (0), which places the
legend where it overlaps the plotted data least.
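A short sketch of a legend for two data ranges. The section names and values are made-up sample data:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

# The label argument names each data range; legend() then displays the names
plt.plot(x, [10, 20, 30, 40], label="Section A")
plt.plot(x, [15, 18, 35, 32], label="Section B")
legend = plt.legend(loc="upper left")  # the same position as loc=2
plt.show()
```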
Saving a Figure
To save a plot created using pyplot functions for
later use, savefig() is to be used. Figures can be
saved in different formats like .pdf, .png, .eps etc.
Syntax:
<matplotlib.pyplot>.savefig(<string with filename
and path>)
While specifying a Windows path, use double
backslashes (\\) so that a single backslash is not
treated as the start of an escape sequence.
Example:
plt.savefig("F:\\data\\bar1.pdf")
plt.savefig("F:\\data\\bar1.png")
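A runnable sketch of savefig(). To avoid depending on a particular drive, it saves to the current folder; the file name bar1.png and the data are illustrative only:

```python
import matplotlib.pyplot as plt

plt.bar(["A", "B", "C"], [5, 8, 3])

# Save before show(): the extension of the file name (.png, .pdf, ...) sets the format.
# A raw string is another way to write a Windows path, e.g. r"F:\data\bar1.png";
# here the file simply goes to the current folder.
filename = "bar1.png"
plt.savefig(filename)
plt.show()
```

Calling savefig() before show() matters in script mode, because the figure may no longer be available once its window is closed.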
A histogram is an accurate graphical representation
of the distribution of numerical data. It is an
estimate of the probability distribution of a
continuous variable (quantitative variable) and was
first introduced by Karl Pearson. To construct a
histogram, the first step is to “bin” the range of
values — that is, divide the entire range of values
into a series of intervals — and then count how
many values fall into each interval. The bins are
usually specified as consecutive, non-overlapping
intervals of a variable. The bins (intervals) must be
adjacent and are often (but are not required to be)
of equal size.
Histogram using hist() function
The hist() of PyPlot module lets you create and plot
histogram from a given sequence(s) of numbers. The
syntax for hist() function is:
matplotlib.pyplot.hist(x, bins = None, cumulative =
False, histtype = 'bar', align = 'mid', orientation =
'vertical', rwidth = None)
Parameters:
x : (n,) array or sequence of (n,) arrays
Input values, this takes either a single array or a
sequence of arrays which are not required to be of the
same length.
bins : int or sequence or str, optional
If an integer is given, bins + 1 bin edges are calculated
and returned, consistent with numpy.histogram.
If bins is a sequence, gives bin edges, including left edge
of first bin and right edge of last bin. If bins is:
[1,2,3,4], then the first bin is [1, 2) (including 1, but
excluding 2) and the second [2, 3). The last bin,
however, is [3, 4], which includes 4.
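A small sketch of bins given as a sequence, using the same edges [1, 2, 3, 4]. The values are made-up sample data:

```python
import matplotlib.pyplot as plt

# Made-up sample data
values = [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4]

# bins as a sequence gives the bin edges: [1, 2), [2, 3) and [3, 4]
counts, edges, patches = plt.hist(values, bins=[1, 2, 3, 4])
print(counts)  # number of values that fall in each bin
print(edges)   # the bin edges that were used
plt.show()
```

The three bins hold 2, 3 and 7 values: the last bin counts the four 3s and also the three 4s, because its right edge is included.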
cumulative : bool, optional
If True, then a histogram is computed where each bin gives
the counts in that bin plus all bins for smaller values. The last
bin gives the total number of datapoints.
histtype : {'bar', 'barstacked', 'step', 'stepfilled'}, optional
The type of histogram to draw.
'bar' is a traditional bar-type histogram. If multiple data are
given the bars are arranged side by side.
'barstacked' is a bar-type histogram where multiple data are
stacked on top of each other.
'step' generates a lineplot that is by default unfilled.
'stepfilled' generates a lineplot that is by default filled.
Default is 'bar'.
align : {'left', 'mid', 'right'}, optional
Controls how the histogram is plotted.
'left': bars are centered on the left bin edges.
'mid': bars are centered between the bin edges.
'right': bars are centered on the right bin edges.
Default is 'mid'
orientation : {'horizontal', 'vertical'}, optional
If 'horizontal', barh will be used for bar-type histograms.
rwidth : scalar or None, optional
The relative width of the bars as a fraction of the bin
width. If None, automatically compute the width.
Ignored if histtype is 'step' or 'stepfilled'.
Default is None
You can add color and label when you have multiple
datasets to be represented.
[Figure: histogram output; values (counts) on the Y-axis, bins of equal intervals created by default]
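A sketch that combines several of the parameters above: two datasets drawn side by side with a color and a label each, followed by a cumulative histogram. The marks and the bin edges are made-up sample values:

```python
import matplotlib.pyplot as plt

# Made-up sample data: marks of two sections
section_a = [35, 42, 48, 50, 55, 61, 70, 72, 80, 85]
section_b = [28, 40, 45, 52, 58, 66, 68, 75, 90, 95]
edges = [20, 40, 60, 80, 100]

# Two datasets side by side, with a color and a label for each
plt.figure()
plt.hist([section_a, section_b], bins=edges, histtype="bar", rwidth=0.8,
         color=["green", "orange"], label=["Section A", "Section B"])
plt.legend()

# cumulative=True: each bin adds its own count to the counts of all lower bins
plt.figure()
counts, used_edges, patches = plt.hist(section_a, bins=edges,
                                       cumulative=True, histtype="step")
plt.show()
```

For section_a the plain counts per bin are 1, 4, 3 and 2, so the cumulative counts are 1, 5, 8 and 10; the last one equals the number of data points.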
A scatter plot drawn with scatter() looks similar to
the output of plot() when plot() is given a line color
and marker style string (e.g., 'r+', 'bo') without a
linestyle argument. The primary difference between
scatter() and plot() is that in scatter plots, the
properties of each individual point (size, face color,
edge color etc.) can be individually controlled or
mapped to data.
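A sketch of the difference: plot() with a marker-only format string draws identical markers, while scatter() accepts a separate size and color for every point through its s and c arguments. All values are made up:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]

# plot() with a marker-only format string: every point looks the same
plt.figure()
plt.plot(x, y, "r+")

# scatter(): s and c give each point its own size and color
plt.figure()
sizes = [20, 60, 100, 140, 180]
colors = ["red", "green", "blue", "orange", "purple"]
points = plt.scatter(x, y, s=sizes, c=colors, edgecolors="black")
plt.show()
```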
Plotting Data from DataFrame
To plot a DataFrame's data, pass the required column
(e.g., df['column name']) to PyPlot's graph functions:
plot(), bar(), barh(), scatter(), boxplot(), hist().
Each column is treated as a Series and plotted.
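A minimal sketch, assuming pandas is installed. The DataFrame df and its year and sales columns are made-up sample data:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample DataFrame
df = pd.DataFrame({
    "year": [2019, 2020, 2021, 2022],
    "sales": [120, 95, 140, 160],
})

# A column such as df["sales"] is a Series; pyplot plots it like any sequence
line, = plt.plot(df["year"], df["sales"], marker="o")
plt.xlabel("year")
plt.ylabel("sales")
plt.show()
```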