
Data Visualisation PyPlot


SASWATI SARANGI

PLOTTING WITH PYPLOT

➢ Why Data Visualization?

➢ What is Data Visualization?

➢ Using PyPlot of Matplotlib Library


➢ Creating Charts-Line, Bar, Histograms


Why Data Visualization?



[Figure: sample charts, including a line graph and a box plot]

Line Chart: A line plot/chart is a graph that shows how
values change along an axis. It is drawn as a series of
data points, called markers, connected by straight line
segments. Line plots are generally used to display trends
over time. A line plot or line graph can be created using
the plot() function available in the pyplot module.

Bar Chart: A graph drawn using rectangular bars to
show how large each value is. The bars can be
horizontal or vertical. A bar graph makes it easy to
compare data between different groups at a glance.
A bar graph represents categories on one axis and a
discrete value on the other. With PyPlot, a bar graph
is created using the bar() function (vertical bars) or
the barh() function (horizontal bars).
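
The two functions can be tried with a minimal sketch like the one below. The subject names, marks and variable names are made-up illustrations, not taken from the slides.

```python
import matplotlib.pyplot as plt

# Hypothetical data: marks scored in four subjects
subjects = ["English", "Maths", "Science", "Hindi"]
marks = [72, 88, 65, 79]

# Vertical bars: categories on the X-axis, values on the Y-axis
plt.figure()
vbars = plt.bar(subjects, marks)
plt.ylabel("Marks")

# Horizontal bars: the same data with the axes swapped
plt.figure()
hbars = plt.barh(subjects, marks)
plt.xlabel("Marks")

# In a script, call plt.show() here to display both figures.
```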

Scatter Plot: plots the individual data points, without
connecting them, to show the trend in the data. With
PyPlot, a scatter chart is created using the scatter()
function.

Pie Chart: a circular chart divided into slices, where
each slice represents a value as a share (percentage) of
the whole. With PyPlot, a pie chart is created using the
pie() function.

Histogram Plot: a type of chart that represents the
number of data points that lie within each range of
values. With PyPlot, the hist() function is used to plot
a histogram.
Difference between Histogram and Bar Chart:
Histograms are used to show the distribution of a variable,
while bar charts are used to compare variables.
Histograms plot quantitative data, with ranges of the data
grouped into bins or intervals, while bar charts
plot categorical data.

BoxPlot Chart: a method for graphically depicting
groups of numerical data through their quartiles.
With PyPlot, a boxplot is created using the boxplot()
function.
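
As a rough illustration of boxplot(), the sketch below draws one box per dataset. The two lists of test scores are assumed sample data.

```python
import matplotlib.pyplot as plt

# Hypothetical test scores for two sections of a class
section_a = [45, 52, 58, 60, 63, 67, 71, 75, 80]
section_b = [38, 50, 55, 61, 64, 66, 70, 82, 95]

plt.figure()
# One box per dataset; the line inside each box marks the median
box = plt.boxplot([section_a, section_b])
plt.xticks([1, 2], ["Section A", "Section B"])  # boxes sit at x = 1 and x = 2
plt.ylabel("Score")

# In a script, call plt.show() here to display the chart.
```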

Installing and Importing matplotlib


• Install matplotlib with the pip command, run at the
command prompt:
pip install matplotlib

• Import the pyplot module of the matplotlib library in
your program using:
import matplotlib.pyplot as pl
Then we can invoke PyPlot's functions as:
pl.plot(X,Y)
The plot() function of the pyplot module draws a line
chart on the current figure, creating a figure first if
none exists. A figure is the overall window where the
outputs of pyplot functions are plotted.
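
A minimal sketch of a first line chart, using the more common alias plt (the alias is only a name, so pl works equally well). The data values are illustrative.

```python
import matplotlib.pyplot as plt

# Hypothetical data: squares of the first five natural numbers
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

# plot() creates a figure automatically if none exists and
# returns a list of the line objects it has drawn
lines = plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("x squared")

# In a script, call plt.show() here to display the chart.
```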

Specifying Plot Size and Grid: To set up the size of
the graph/plot, we need to set the figure size as:
<matplotlib.pyplot>.figure(figsize=(<width>,<height>))
e.g.,
import matplotlib.pyplot as plt
plt.figure(figsize=(5,7)) # 5 is the width (along the X-axis) and 7 is the height (along the Y-axis), both in inches
figure() creates a new figure window using specific
property values. figure() is mostly useful if you want
to create a new figure window with specific
properties set, or you want to immediately save the
figure handle to start manipulating its properties, or
you want to have multiple figure windows active at
once.
It can be redundant in code sometimes, because
other functions like plot() and subplot() will
automatically create a figure window if there isn't
one already available to use.
When figure() is used to change the size of the
graph, it should be called before any plotting
function. Otherwise the plot is drawn on an
automatically created figure of default size, and
figure() opens a second, empty window.


To use grid on the plot, we can write:


plt.grid(True)
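
Put together, setting the size and turning on the grid might look like the sketch below. The plotted values are placeholders.

```python
import matplotlib.pyplot as plt

# Create the figure first: 5 inches wide and 7 inches high
fig = plt.figure(figsize=(5, 7))

# Later plotting calls draw on this figure
plt.plot([1, 2, 3, 4], [10, 20, 15, 30])
plt.grid(True)  # turn on grid lines

# In a script, call plt.show() here to display the chart.
```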

In script mode (.py), it is necessary to call
plt.show() in order to display the graph. In a
Jupyter notebook, the graph is displayed after
plt.plot() even if you do not call plt.show().
Changing Line Color and Style:
Syntax:
<matplotlib.pyplot>.plot(<data1>, [<data2>,] <color code>)
where <color code> is a string such as 'r' (red),
'g' (green) or 'b' (blue).
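
A short sketch of the color argument, assuming two small made-up data ranges. It shows the color code passed both positionally and through the color keyword.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

plt.figure()
# Color code passed as a positional format string
red_line, = plt.plot(x, [1, 2, 3, 4], "r")
# The same idea using the color keyword and a full color name
green_line, = plt.plot(x, [4, 3, 2, 1], color="green")

# In a script, call plt.show() here to display the chart.
```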

To change Line Width and Style:
An additional argument is passed to plot() to change
the line width and style.
For line width, the argument is linewidth=<width>,
where the width is given in points, e.g. linewidth=2,
3, 0.5 or 0.75.
For line style, the argument is linestyle=<style> (or
ls=<style>), where <style> is one of 'solid', 'dashed',
'dashdot' or 'dotted'.
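
For instance, the two arguments could be combined as in this sketch. The data and variable names are illustrative.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

plt.figure()
# A thick dashed line
thick_dashed, = plt.plot(x, [1, 4, 9, 16], linewidth=3, linestyle="dashed")
# A thin dotted line, using the short keyword ls
thin_dotted, = plt.plot(x, [2, 5, 10, 17], linewidth=0.75, ls="dotted")

# In a script, call plt.show() here to display the chart.
```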
Changing Marker Type, Size and Color:
To change the marker type, its size and color, additional
arguments are given to the plot() function:
marker=<valid marker type>, markersize=<in
points>, markeredgecolor=<valid color>
There are various types of marker ('.', 'x', 'o', '+', 'd',
'D' etc.). If the marker type is not specified, the data
points will not be marked on the line chart.
The marker size is specified in points, and
markeredgecolor takes any valid color.
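
A minimal sketch of the three marker arguments used together, on assumed sample data:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [3, 7, 5, 9, 6]

plt.figure()
# Diamond markers, 8 points in size, with a red outline
line, = plt.plot(x, y, marker="D", markersize=8, markeredgecolor="red")

# In a script, call plt.show() here to display the chart.
```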

Note: Make sure to set the limits keeping in mind
the data set being plotted. Only the data that falls
within the limits of the X and Y axes will be plotted.
The rest of the data will not show in the plot.
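
The slide's own example did not survive extraction. As a stand-in, the sketch below uses pyplot's xlim() and ylim() functions on made-up data to show points being cut off by the limits.

```python
import matplotlib.pyplot as plt

# Hypothetical data: x from 0 to 9 and its squares
x = list(range(10))
y = [v * v for v in x]

plt.figure()
plt.plot(x, y, marker="o")

# Only points with 0 <= x <= 5 and 0 <= y <= 30 remain visible;
# the points for x = 6 to 9 fall outside the limits and are not shown
plt.xlim(0, 5)
plt.ylim(0, 30)

# In a script, call plt.show() here to display the chart.
```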
Adding Legends

When plotting multiple data ranges on a single plot, we
need a legend. A legend is a color or mark linked to a
specific data range plotted. To show a legend, first give
each data range a label using the label argument of
plot(), bar() etc., then call legend().
Syntax:
<matplotlib.pyplot>.legend(loc=<position number or
string>)
The loc argument can take the values 1, 2, 3, 4, signifying
the position strings 'upper right', 'upper left', 'lower left',
'lower right' respectively. If loc is omitted, recent
Matplotlib versions (2.0 onwards) pick the 'best' position
automatically; older versions defaulted to 'upper right' or 1.
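
A short sketch of labels and a legend, assuming two invented sales series:

```python
import matplotlib.pyplot as plt

years = [2019, 2020, 2021, 2022]

plt.figure()
# label is a keyword argument of plot(); the legend picks it up
plt.plot(years, [10, 14, 9, 18], label="Product A")
plt.plot(years, [7, 11, 15, 13], label="Product B")

legend = plt.legend(loc="upper left")  # same position as loc=2

# In a script, call plt.show() here to display the chart.
```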

Saving a Figure
To save a plot created using pyplot functions for
later use, savefig() is used. Figures can be
saved in different formats like .pdf, .png, .eps etc.
Syntax:
<matplotlib.pyplot>.savefig(<string with filename
and path>)
While specifying a Windows path, use double backslashes
(\\) to suppress the special meaning of the single
backslash character in a Python string.
Example:
plt.savefig("F:\\data\\bar1.pdf")
plt.savefig("F:\\data\\bar1.png")
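
A runnable variant of the same idea is sketched below. It writes to a temporary folder instead of F:\data (an assumption made so the code works on any machine) and builds the path with os.path.join, which avoids typing backslashes by hand.

```python
import os
import tempfile

import matplotlib.pyplot as plt

plt.figure()
plt.bar(["A", "B", "C"], [5, 9, 3])  # hypothetical data

# Assumed output location: a fresh temporary folder
out_dir = tempfile.mkdtemp()
pdf_path = os.path.join(out_dir, "bar1.pdf")
png_path = os.path.join(out_dir, "bar1.png")

# The file extension selects the output format
plt.savefig(pdf_path)
plt.savefig(png_path)
```

In a script, call savefig() before plt.show(), because the figure may be discarded once its window is closed.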


A histogram is a graphical representation of the
distribution of numerical data. It is an estimate of
the probability distribution of a continuous variable
(quantitative variable) and was first introduced by
Karl Pearson. To construct a histogram, the first step
is to "bin" the range of values, that is, divide the
entire range of values into a series of intervals, and
then count how many values fall into each interval.
The bins are usually specified as consecutive,
non-overlapping intervals of a variable. The bins
(intervals) must be adjacent and are often (but are
not required to be) of equal size.

Histogram using hist() function
The hist() function of the PyPlot module lets you create
and plot a histogram from a given sequence (or sequences)
of numbers. The syntax for the hist() function, showing
its most commonly used parameters, is:
matplotlib.pyplot.hist(x, bins = None, cumulative =
False, histtype = 'bar', align = 'mid', orientation =
'vertical')
Parameters:
x : (n,) array or sequence of (n,) arrays
Input values. This takes either a single array or a
sequence of arrays, which are not required to be of the
same length.
bins : int or sequence or str, optional
If an integer is given, bins + 1 bin edges are calculated
and returned, consistent with numpy.histogram.
If bins is a sequence, it gives the bin edges, including
the left edge of the first bin and the right edge of the
last bin. If bins is [1,2,3,4], then the first bin is
[1, 2) (including 1, but excluding 2) and the second is
[2, 3). The last bin, however, is [3, 4], which includes 4.
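
The bin-edge rule can be checked with a small sketch. The eight data values are made up so that the counts are easy to verify by hand.

```python
import matplotlib.pyplot as plt

# Hypothetical data
data = [1, 1, 2, 2, 2, 3, 3, 4]

plt.figure()
# bins given as a sequence of edges: [1, 2), [2, 3) and [3, 4]
counts, edges, patches = plt.hist(data, bins=[1, 2, 3, 4])
# counts holds 2, 3, 3: the value 4 is counted in the last bin

plt.figure()
# bins given as an integer: 3 bins, so 3 + 1 = 4 edges are computed
counts_auto, edges_auto, _ = plt.hist(data, bins=3)

# In a script, call plt.show() here to display both histograms.
```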

cumulative : bool, optional
If True, then a histogram is computed where each bin gives
the counts in that bin plus all bins for smaller values. The last
bin gives the total number of datapoints.
histtype : {'bar', 'barstacked', 'step', 'stepfilled'}, optional
The type of histogram to draw.
'bar' is a traditional bar-type histogram. If multiple data are
given the bars are arranged side by side.
'barstacked' is a bar-type histogram where multiple data are
stacked on top of each other.
'step' generates a lineplot that is by default unfilled.
'stepfilled' generates a lineplot that is by default filled.
Default is 'bar'.
align : {'left', 'mid', 'right'}, optional
Controls how the histogram is plotted.
'left': bars are centered on the left bin edges.
'mid': bars are centered between the bin edges.
'right': bars are centered on the right bin edges.
Default is 'mid'.
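
A minimal sketch of cumulative and histtype together, on assumed sample data:

```python
import matplotlib.pyplot as plt

# Hypothetical data
data = [1, 1, 2, 2, 2, 3, 3, 4]

plt.figure()
# Each bin holds its own count plus the counts of all earlier bins,
# drawn as an unfilled step outline
counts, edges, _ = plt.hist(
    data, bins=[1, 2, 3, 4], cumulative=True, histtype="step"
)
# counts holds 2, 5, 8: the last bin equals the number of data points

# In a script, call plt.show() here to display the histogram.
```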

orientation : {'horizontal', 'vertical'}, optional
If 'horizontal', barh will be used for bar-type histograms.
Default is 'vertical'.
rwidth : scalar or None, optional
The relative width of the bars as a fraction of the bin
width. If None, the width is computed automatically.
Ignored if histtype is 'step' or 'stepfilled'.
Default is None.
You can add the color and label arguments when you have
multiple datasets to be represented.
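
The sketch below combines orientation, rwidth, color and label for two datasets. The heights (in cm) and the group names are invented for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical heights in cm for two groups
boys = [150, 152, 155, 158, 160, 162, 165, 170]
girls = [145, 148, 150, 153, 155, 157, 160, 163]

plt.figure()
counts, edges, _ = plt.hist(
    [boys, girls],                  # two datasets, drawn side by side
    bins=[140, 150, 160, 170],
    color=["steelblue", "orange"],  # one color per dataset
    label=["Boys", "Girls"],        # one label per dataset
    rwidth=0.8,                     # bars use 80% of the bin width
    orientation="horizontal",       # bars drawn with barh
)
plt.legend()

# In a script, call plt.show() here to display the histogram.
```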
[Figure: example histogram, annotated "Values on Y-axis" and "Bins of equal intervals created by default"]

A scatter plot can also be produced with the plot()
function, if we specify a line color and marker style
string (e.g., 'r+', 'bo' etc.) without a linestyle
argument. The primary difference between scatter() and
plot() is that in scatter plots, the properties of each
individual point (size, face color, edge color etc.) can
be individually controlled or mapped to data.
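
The difference can be seen in a small sketch, assuming five made-up points. plot() gives every point the same look, while scatter() takes one size and one color per point.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]

plt.figure()
# plot() with a format string: red '+' markers, no connecting line,
# and every point drawn identically
markers_only, = plt.plot(x, y, "r+")

plt.figure()
# scatter(): s sets each point's size (in points squared), c its color
sizes = [20, 60, 120, 200, 300]
colors = ["red", "green", "blue", "orange", "purple"]
points = plt.scatter(x, y, s=sizes, c=colors)

# In a script, call plt.show() here to display both figures.
```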

Plotting Data from DataFrame

To plot a DataFrame's data, pass the required column,
e.g. <DataFrame>[<column name>], to PyPlot's graph
functions: plot(), bar(), barh(), scatter(), boxplot(),
hist(). The column is treated as a series and plotted.
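
The slide's own example is not part of the extracted text. The sketch below is a stand-in that assumes pandas is installed and uses an invented DataFrame with year and sales columns.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical DataFrame
df = pd.DataFrame({
    "year": [2019, 2020, 2021, 2022],
    "sales": [120, 95, 140, 160],
})

plt.figure()
# Each column is selected with df[<column name>] and passed like a list
lines = plt.plot(df["year"], df["sales"], marker="o")

plt.figure()
bars = plt.bar(df["year"], df["sales"])

# In a script, call plt.show() here to display both figures.
```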