Chapter 3-Plotting With PyPlot

The document discusses various data visualization techniques in Python using Matplotlib library. It explains concepts like line chart, scatter plot, bar chart, pie chart and how to create them using functions like plot(), scatter(), bar(), barh() and pie(). It also discusses formatting and styling of charts.

Uploaded by

Anjana
Copyright
© © All Rights Reserved

3.1 What is Data Visualization?


Data visualization refers to the graphical or visual
representation of information and data using visual elements
such as charts, graphs and maps.


Data visualization is immensely useful in decision-making.


Data visualization unveils (uncovers) patterns, trends, outliers,
correlations etc. in the data, and thereby helps decision-makers
understand the meaning of the data to drive business decisions.
3.2 Using Pyplot of Matplotlib Library

For data visualization in Python, the Matplotlib library's Pyplot
interface is used.


Matplotlib is a high-quality plotting library for Python. It
provides a quick way to visualize data from Python and to produce
publication-quality figures in many formats.

Matplotlib provides many interfaces and functionalities for
2D graphics.

Pyplot is one such interface; it has a collection of methods
that allow the user to construct 2D plots easily.
3.2.1 Installing and Importing matplotlib

If you have installed Python using Anaconda, then the matplotlib
library is already installed.

If you have installed Python using the standard official distribution,
you may need to install matplotlib by the following steps:
1. Download the required package from the link
https://pypi.org/project/matplotlib/#files
2. Install it by running the following commands from the command prompt:
python -m pip install -U pip
python -m pip install -U matplotlib
You can now import the Pyplot interface by using the following command:

import matplotlib.pyplot
or
import matplotlib.pyplot as pl
3.2.2 Working with PyPlot Methods

The PyPlot interface provides many methods for 2D plotting of
data.

Have a look at the example below, where we plot a simple
chart using an ndarray.
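A minimal sketch of such a chart, assuming NumPy is installed; the sample values and the alias plt are chosen only for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample data (assumed for illustration): 0.0, 0.5, ..., 4.5
x = np.arange(0, 5, 0.5)
y = x ** 2                      # element-wise square of the ndarray

lines = plt.plot(x, y)          # plot() returns a list of Line2D objects
plt.show()
```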
3.2.2A Basics of Simple Plotting

Line Chart - plot( ) function

Bar Chart - bar( ) and barh( )

Scatter Plot - scatter( )

Pie Chart - pie( )

Histogram - hist( )

BoxPlot Chart - boxplot( )


3.3 Creating Line Chart and Scatter Charts

Before you create any chart or graph, make sure to import the
matplotlib.pyplot library with one of the following commands:
import matplotlib.pyplot
or
import matplotlib.pyplot as pl

For simple charts, the line chart and the scatter chart are
similar. The only difference is the presence or absence of the
lines connecting the points.

Using the plot( ) function of pyplot, you can create both of
these basic chart types.

The scatter chart, however, can also be created using the
scatter( ) function, which is covered in a later topic.
3.3.1 Line Chart using plot( ) Function
“A line chart or line graph is a type of chart which displays
information as a series of data points called 'markers'
connected by straight line segments.”

Functions used for creating a line chart:

plot( ) - for creating the graph
xlabel( ) - to set the x-axis label
ylabel( ) - to set the y-axis label
show( ) - to show the plot
Note : Data points are called markers.
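The four functions listed above can be combined as in the following sketch. The data and label texts are made up for illustration:

```python
import matplotlib.pyplot as plt

# Hypothetical data: runs scored in each of five overs
overs = [1, 2, 3, 4, 5]
runs = [4, 9, 7, 12, 6]

plt.plot(overs, runs)           # draw the line chart
plt.xlabel("Overs")             # label for the x-axis
plt.ylabel("Runs")              # label for the y-axis
ax = plt.gca()                  # current axes, kept only so the labels can be inspected
plt.show()
```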
3.3.2 Specifying Plot Size and Grid

To set the size of your generated graph or plot, you may set
the figure size as -
<matplotlib.pyplot>.figure(figsize = (<width>, <length>))
e.g.
import matplotlib.pyplot as plt
plt.figure(figsize = (15,7)) # 15 inches wide & 7 inches long

If you want to show a grid on the plot, you can write:
import matplotlib.pyplot as plt
plt.grid(True)
3.3.3 Applying Various Settings in plot( ) Function
The plot( ) function allows you to specify multiple settings for
your chart/graph, such as:

Color (line color/marker color)

Marker type

Marker size

And so forth

Changing Line Color
<matplotlib.pyplot>.plot(<data1>, <data2>, <color code>)

where <color code> is a code such as 'b' (blue), 'r' (red) or 'g' (green)
e.g. plt.plot([1,2,3,4], [40,80,100,50], 'b')

To change the line width, use the linewidth = <width> argument:
plt.plot(x, a, linewidth = 2)
To change the line style, use linestyle or ls = ['solid' | 'dashed'
| 'dashdot' | 'dotted']:
plt.plot(x, b, linewidth = 4, linestyle = 'dashed')

Changing Marker Type, Size and Color
marker = <valid marker type>, markersize = <in points>, markeredgecolor = <valid color>
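The settings above can be combined in one call. A minimal sketch follows; the data values are assumed, and 'd' (diamond) is just one of the valid marker types:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
a = [40, 80, 100, 50]           # sample values, assumed for illustration
b = [30, 60, 90, 70]

# Red solid line, 2 points wide, with diamond markers edged in black
line_a, = plt.plot(x, a, color='r', linewidth=2, linestyle='solid',
                   marker='d', markersize=8, markeredgecolor='k')
# Blue dashed line, 4 points wide, using the short keyword ls
line_b, = plt.plot(x, b, 'b', linewidth=4, ls='dashed')
plt.show()
```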
3.3.4 Creating Scatter Charts
“The scatter chart is a graph of plotted points on two axes that
shows the relationship between two sets of data.”

Scatter charts can be created through two functions of the
PyPlot library:

(i) plot( ) function

(ii) scatter( ) function
3.3.4A Scatter Charts using plot( ) Function

In the plot( ) function, whenever you specify a marker type in the
format string, with or without a color (e.g. 'ro' or 'o'), and do
not give a line style, plot( ) will create a scatter chart.
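As a small illustration with made-up data, a marker-only format string draws the points without joining them:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]             # sample values

# 'ro' = red circles; no line style in the format string, so no connecting line
points, = plt.plot(x, y, 'ro')
plt.show()
```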
3.3.4B Scatter Charts using scatter( ) Function
Another method of creating scatter charts is the scatter( )
function of the pyplot library. It is a more powerful method of
creating scatter plots than the plot( ) function.
matplotlib.pyplot.scatter( <array1> , <array2> )
or <pyplotaliasname>.scatter( <array1> , <array2> )
Arguments of scatter( ) function
We can use different arguments with the scatter( ) function, such as
marker, s (size of the marker) and c (color of the marker).
Specifying Colors and Sizes for Data Points

The scatter( ) function allows you to specify a different size and
color for each individual data point.

For this purpose, you need to specify an array of colors and an
array of sizes.

The first color and size apply to the first data point, the second
color/size to the second data point, and so on.

The length of the color/size array should be the same as the number
of data points.

NOTE
The primary difference of scatter( ) from plot( ) is that it can be
used to create scatter plots where the properties of each individual
point (size, face color, edge color, etc.) can be individually controlled
or mapped to data.
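A short sketch of per-point sizes and colors; the data, sizes and color names are assumptions chosen for illustration:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 15, 25]
sizes = [20, 60, 120, 200]                   # one marker size per data point
colors = ['red', 'green', 'blue', 'orange']  # one color per data point

pts = plt.scatter(x, y, s=sizes, c=colors, marker='o')
plt.show()
```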
3.4 Creating Bar Charts and Pie Charts

A bar graph or bar chart is a graphical display of data using bars of
different heights.

A bar chart can be drawn vertically or horizontally, using rectangles or
bars of different heights/widths.

PyPlot offers the bar( ) function to create a bar chart, where you can
specify the sequence for the x-axis and the corresponding sequence to be
plotted on the y-axis.

e.g. a, b, c = [1,2,3,4], [2,4,6,8], [1,4,9,16]

Consider another example
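A minimal sketch using the sequences a and c from the line above; the axis labels are assumed:

```python
import matplotlib.pyplot as plt

a, b, c = [1, 2, 3, 4], [2, 4, 6, 8], [1, 4, 9, 16]

bars = plt.bar(a, c)            # a gives the x positions, c the bar heights
plt.xlabel("Values")
plt.ylabel("Squares")
plt.show()
```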
3.4.1 Changing Widths of the Bars in a Bar Chart

By default, a bar chart draws bars of equal width, with a
default width of 0.8 units.

But we can specify a different width for all the bars of a bar
chart.
plt.bar( <x-sequence>, <y-sequence>, width = <float value> )

We can also specify different widths for different bars of a bar
chart.
plt.bar( <x-sequence>, <y-sequence>, width = <width values sequence> )
NOTE : The width values sequence in bar( ) must have widths for all the
bars, i.e. its length must match the length of the data sequence being
plotted, otherwise Python will report an error.
3.4.2 Changing Colors of the Bars in a Bar Chart

By default, a bar chart draws all bars with the same default color. But
you can always change the color of the bars.

To specify a different color for all the bars
plt.bar( <x-sequence>, <y-sequence>, color = <color code/name> )

To specify different colors for different bars of a bar chart
plt.bar( <x-sequence>, <y-sequence>, color = <color code/name sequence> )
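The width and color arguments can be tried out as follows. This is only a sketch with assumed data; the second chart is drawn on a new figure so the two do not overlap:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [5, 8, 6, 9]                # sample values

# Same width and color for every bar
same = plt.bar(x, y, width=0.5, color='c')

plt.figure()                    # start a new figure for the second chart
# One width and one color per bar; both sequences match the length of the data
varied = plt.bar(x, y, width=[0.5, 0.6, 0.7, 0.8], color=['r', 'g', 'b', 'k'])
plt.show()
```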
3.4.3 Creating Multiple Bars Chart
Often in real life, you may need to plot multiple data ranges on
the same bar chart, creating multiple bars. PyPlot does not
provide a specific function for this, but you can always create
one by exploiting the width and color arguments of bar( ) that you
learnt above.
1. Decide the number of X points.
2. Decide the thickness of each bar and accordingly adjust the X points
on the X-axis.
3. Give a different color to each data range.
4. The width argument remains the same for all ranges being
plotted.
5. Plot using bar( ) for each range separately.
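The five steps above can be sketched as follows, assuming NumPy for the X points and two made-up data ranges:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: values of two ranges at four X points
range1 = [5, 25, 45, 20]
range2 = [9, 15, 30, 25]

x = np.arange(4)                # step 1: four X points -> 0, 1, 2, 3
w = 0.35                        # step 2: thickness of each bar

# Steps 3-5: same width, different colors, one bar() call per range.
# The second range is shifted right by one bar width so the bars sit side by side.
bars1 = plt.bar(x, range1, width=w, color='b')
bars2 = plt.bar(x + w, range2, width=w, color='r')
plt.show()
```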
3.4.4 Creating a Horizontal Bar Chart

To create a horizontal bar chart, you need to use the barh( )
function (bar horizontal) in place of bar( ). Also, you need to
give the x and y axis labels carefully: the label that you gave to the
x-axis in bar( ) will become the y-axis label in barh( ), and
vice-versa.
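A short sketch with hypothetical category names and values, showing how the axis labels swap roles:

```python
import matplotlib.pyplot as plt

# Hypothetical data
cities = ['Delhi', 'Mumbai', 'Chennai']
sales = [120, 95, 140]

bars = plt.barh(cities, sales)  # categories on the y-axis, bar lengths along x
plt.xlabel("Sales")             # with bar() this would have been the y-axis label
plt.ylabel("City")              # ... and this the x-axis label
ax = plt.gca()
plt.show()
```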
3.4.5 Creating Pie Chart
“The pie chart is a type of graph in which a circle is divided
into sectors that each represent a proportion of the whole.”

Typically, a pie chart is used to show parts of the whole, often as
a % share.

The PyPlot interface offers the pie( ) function for creating a pie chart.
i) The pie( ) function plots a single data range only.
ii) In older versions of Matplotlib the default shape of a pie chart is
oval, but you can always change it to a circle by using axis( ) of pyplot,
sending "equal" as the argument to it. Recent versions draw a circle by
default.
e.g. contri = [17, 8.8, 12.75, 14]
plt.pie(contri)
Labels of Slices of Pie

We need to create a sequence containing the labels and then
specify this sequence as the value of the labels argument of the pie( )
function. The first label is given to the first value, the second label
to the second value, and so on.
contri = [17, 8.8, 12.75, 14]
houses = ['Vidya', 'Kshama', 'Namrta', 'Karuna']
plt.pie(contri, labels = houses)

Adding Formatted Slice Percentages to Pie
To view the percentage share in a pie chart, you need to add the
argument autopct with a format string, such as "%1.1f%%".

plt.pie(contri, labels = houses, autopct = "%1.1f%%")

The general syntax for a format placeholder is
"%[flags][width][.precision]type %%"

% - special character that starts the format placeholder
[flags] - with flag 0, a value with fewer digits than the width is
preceded by 0s
[width] - total number of characters to be displayed
[.precision] - number of digits after the decimal point
type - type of value; d or i means integer, f or F means float
%% - to print the % sign
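The placeholder parts can be tried outside a chart with Python's % string formatting, which follows the same rules. The value 33.456 is just an example:

```python
# The same kind of format string that autopct takes, applied to a plain number
share = 33.456

print("%1.1f%%" % share)    # one digit after the decimal point -> 33.5%
print("%5.2f%%" % share)    # width 5, two digits after the point -> 33.46%
print("%06.2f%%" % share)   # flag 0: padded with zeros up to width 6 -> 033.46%
print("%d%%" % share)       # integer type -> 33%
```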
Changing Colors of the Slices

To change the colors of a pie chart, we need to create a sequence
containing the color codes or names for each slice.

Specify this sequence as the value of the colors argument of the pie( ) function.

The first color is given to the first value, the second to the second value,
and so on.
colr = ['red', 'cyan', 'pink', 'yellow', 'silver']
plt.pie(contri, labels = houses, colors = colr, autopct = "%2.2f%%")
Exploding a Slice

Sometimes you want to emphasize one or more slices and
show them pulled out a little. This feature is called explode in
pie charts.

We can provide explode values for the slices in the form of a
sequence, with one value per slice.
e.g. [0.2, 0, 0, 0, 0]
This sequence will explode the first slice, out of five slices being
plotted, by a distance of 0.2 units.
e.g. [0, 0, 0.15, 0, 0.3]
This will explode the 3rd and 5th slices, out of five slices, by a
distance of 0.15 and 0.3 units respectively.
For the four values in contri:
expl = [0, 0.2, 0, 0]
plt.pie(contri, labels = houses, explode = expl, colors = colr,
autopct = "%2.2f%%")
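Putting labels, colors, percentages and explode together gives the sketch below. It reuses the chapter's contri and houses values; the four colors and the choice of the second slice to explode are assumptions:

```python
import matplotlib.pyplot as plt

contri = [17, 8.8, 12.75, 14]
houses = ['Vidya', 'Kshama', 'Namrta', 'Karuna']
colr = ['red', 'cyan', 'pink', 'yellow']
expl = [0, 0.2, 0, 0]           # pull out the second slice only

# With autopct given, pie() returns the wedges, the label texts and the percentage texts
wedges, texts, autotexts = plt.pie(contri, labels=houses, explode=expl,
                                   colors=colr, autopct="%2.2f%%")
plt.axis("equal")               # keep the pie circular
plt.show()
```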
3.5 Customizing the Plot

The graph or plot should have a proper title, defined X and Y
limits, labels, legends etc. All these make understanding the
plot and taking decisions easier. There are some methods
to customize your plot, using which we can show more details
related to the chart.
3.5.1 Anatomy of a Chart
3.5.2 Adding a Title
To add a title to your plot, you need to call the function title( )
before you show your plot.
3.5.3 Setting X and Y Labels, Limits and Ticks

The functions xlabel( ) and ylabel( ) can be used to set labels for the X
and Y-axis respectively.

We can use the xlim( ) and ylim( ) functions to set limits for the
X-axis and Y-axis respectively.

If you have set X-axis or Y-axis limits which are not compatible
with the data values being plotted, you may either get an incomplete
plot or the data being plotted may not be visible at all.
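A minimal sketch of a title, axis labels and limits on one plot; the texts and limit values are assumed and chosen so that all of the data stays visible:

```python
import matplotlib.pyplot as plt

X = [0, 1, 2, 3]
Y = [5., 25., 45., 20.]

plt.plot(X, Y)
plt.title("A simple line chart")    # the title is set before show()
plt.xlabel("X values")
plt.ylabel("Y values")
plt.xlim(-0.5, 3.5)                 # limits chosen to cover all of the data
plt.ylim(0, 50)
ax = plt.gca()
plt.show()
```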
Interesting
You can use decreasing axes by flipping the normal order of the
axis limits, i.e. if you give the limits (min, max) as (max, min), then
the plot gets flipped. e.g., compare the two plots below.

Normal X-axis:
X = [0, 1, 2, 3]
Y = [5., 25., 45., 20.]
plt.plot(X, Y)
plt.show( )

Flipped X-axis:
X = [0, 1, 2, 3]
Y = [5., 25., 45., 20.]
plt.xlim(3.5, -0.5)
plt.plot(X, Y)
plt.show( )
Setting Ticks for Axes
By default, PyPlot will automatically decide which data points will
have ticks on the axes, but you can also decide which data points will
have tick marks on the X- and Y-axes.
For the X-axis you can use the xticks( ) function:
xticks( <sequence containing tick data points>, [ <optional sequence
containing tick labels> ] )
For the Y-axis you can use the yticks( ) function:
yticks( <sequence containing tick data points>, [ <optional sequence
containing tick labels> ] )
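For instance, with made-up tick labels for the X-axis and tick positions only for the Y-axis:

```python
import matplotlib.pyplot as plt

X = [0, 1, 2, 3]
Y = [5., 25., 45., 20.]

plt.plot(X, Y)
plt.xticks([0, 1, 2, 3], ['Q1', 'Q2', 'Q3', 'Q4'])  # tick positions and their labels
plt.yticks([0, 25, 50])                             # tick positions only
ax = plt.gca()
plt.show()
```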
3.5.4 Adding Legends
When we plot multiple ranges on a single plot, it becomes
necessary to specify legends. A legend is a color mark
linked to a specific data range plotted. To plot a legend you
need to do two things :
i) In the plotting function like plot( ), bar( ) etc., give a specific
label to the data range using the argument label.
ii) Add the legend to the plot using legend( ) as per the format :
plt.legend(loc = <position number or string> )

The loc argument can take the values 1, 2, 3, 4, signifying the
position strings 'upper right', 'upper left', 'lower left', 'lower right'
respectively. Older versions of Matplotlib default to 'upper right' (1);
recent versions default to 'best', which picks the position that overlaps
the plotted data the least.
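The two steps can be sketched as follows, with assumed data and label texts:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
plt.plot(x, [2, 4, 6, 8], label="Doubles")      # step (i): label each data range
plt.plot(x, [1, 4, 9, 16], label="Squares")
leg = plt.legend(loc='upper left')              # step (ii): add the legend
plt.show()
```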
3.5.5 Saving a Figure
If you want to save a plot created using pyplot functions for
later use or for keeping records, you can use savefig( ) to save
the plot.
You can use pyplot's savefig( ) as per the format :
<matplotlib.pyplot>.savefig( <string with filename and path> )

You can save figures in popular formats like .pdf, .png, .eps etc.

While specifying a Windows path, use double backslashes to suppress
the special meaning of the single backslash character.
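A minimal sketch of saving a figure. The file name is a placeholder, and the temporary folder is used only so that the example runs on any system:

```python
import os
import tempfile
import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4], [1, 4, 9, 16])

# os.path.join() builds the path without any hand-written slashes
out_file = os.path.join(tempfile.gettempdir(), "squares.png")
plt.savefig(out_file)           # the extension selects the format (.png here)

# On Windows a literal path is written with doubled backslashes or as a
# raw string, e.g. "C:\\data\\squares.png" or r"C:\data\squares.png".
```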
3.6 Creating Histogram with PyPlot
A histogram is a summarisation tool for discrete or continuous
data. A histogram provides a visual interpretation of numerical
data by showing the number of data points that fall within
specified ranges of values (called bins). It is similar to a vertical
bar graph; however, a histogram, unlike a vertical bar graph, shows no
gaps between the bars.
The hist( ) function of the PyPlot module is used to create a
histogram. The syntax is:
plt.hist( x, bins = None, cumulative = False, histtype = 'bar', align = 'mid',
orientation = 'vertical' )
Parameters

x            Array or sequence to be plotted on the histogram
bins         Int (optional); number of bins
cumulative   Bool (optional); default is False
histtype     'bar', 'barstacked', 'step', 'stepfilled'
orientation  'horizontal', 'vertical'
1. Plot a histogram from an ndarray x with 20 bins
pl.hist( x, bins = 20 )

2. Plot a histogram from an ndarray y with 50 bins
pl.hist( y, bins = 50 )

3. Plot a cumulative histogram of ndarray x with 30 bins
pl.hist( x, bins = 30, cumulative = True )

4. Plot ndarray x's histogram as a 'step' type histogram with 20 bins
pl.hist( x, bins = 20, histtype = 'step' )

5. Plot both ndarrays x and y in the same histogram
pl.hist( [x, y] )

6. Plot a stacked bar type histogram from both ndarrays x and y
(a) regular histogram
pl.hist( [x, y], histtype = 'barstacked' )
(b) cumulative histogram
pl.hist( [x, y], histtype = 'barstacked', cumulative = True )

7. Plot a horizontal histogram from ndarray y with 50 bins
pl.hist( y, bins = 50, orientation = 'horizontal' )
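To run calls like the ones above, an ndarray of values is needed first. The sketch below assumes NumPy and generates random sample data purely for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)                  # seeded so the sketch is repeatable
x = rng.normal(loc=50, scale=10, size=1000)     # hypothetical sample data

# hist() returns the bin counts, the bin edges and the drawn patches
counts, edges, patches = plt.hist(x, bins=20)

plt.figure()                                    # second chart on a new figure
cum_counts, _, _ = plt.hist(x, bins=20, cumulative=True, histtype='step')
plt.show()
```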


3.7 Creating Frequency Polygons

A frequency polygon is a type of frequency distribution graph.

In a frequency polygon, the number of observations is marked with a single
point at the midpoint of an interval. Straight lines then connect
adjacent points.

Frequency polygons make it easy to compare two or more distributions on the
same set of axes.

Python's pyplot module of matplotlib provides no separate function for
creating a frequency polygon. Therefore, to create a frequency polygon, what
we can do is :
i) Plot a histogram from the data.
ii) Mark a single point at the midpoint of each interval/bin.
iii) Draw straight lines to connect the adjacent points.
iv) Connect the first data point to the midpoint of the previous interval on the x-axis.
v) Connect the last data point to the midpoint of the following interval on the x-axis.
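One possible way to carry out steps (i) to (v) is sketched below. It assumes NumPy, random sample data and equal-width bins; the variable names are illustrative only:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
data = rng.normal(loc=50, scale=10, size=500)   # hypothetical sample data

# (i) histogram of the data; a light color keeps the polygon visible
counts, edges, _ = plt.hist(data, bins=10, color='lightgrey')

# (ii) midpoint of every bin
width = edges[1] - edges[0]
mids = (edges[:-1] + edges[1:]) / 2

# (iv), (v) extend to the midpoints of the intervals before and after, at frequency 0
poly_x = np.concatenate(([mids[0] - width], mids, [mids[-1] + width]))
poly_y = np.concatenate(([0], counts, [0]))

# (iii) straight lines joining adjacent points, with a marker on each
plt.plot(poly_x, poly_y, 'b-o')
plt.show()
```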
3.8 Creating Box Plots
The box plot has become the standard technique for presenting
the 5-number summary, which consists of :

The minimum range value

The maximum range value

The upper quartile

The lower quartile, and

The median
A box plot is used to show the range and the middle half of ranked
data. Ranked data is numerical data, such as numbers, that can be
put in order. The middle half of the data is represented by the box.
The highest and lowest scores are joined to the box by straight lines.
Creating boxplot with boxplot( ) of Pyplot
Pyplot module's boxplot( ) allows you to create box plots.
The syntax is :
plt.boxplot( x, notch = None, vert = None, meanline = None,
showmeans = None, showbox = None )
Parameters
x             Array or a sequence of vectors. The input data.
notch         Bool, optional (False); if True, produces a notched boxplot, otherwise a
              rectangular boxplot.
vert          Bool, optional (True); if True, makes the boxes vertical, otherwise horizontal.
meanline      Bool, optional (False); if True, renders the mean as a line spanning
              the full width of the box.
showbox       Bool, optional (True); show the central box.
showmeans     Bool, optional (False); show the arithmetic means.
patch_artist  Bool, optional (None); fill the box if True.
labels        List; sequence with labels of the multiple sequences being plotted.

Have a look at the following examples :
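A minimal sketch with made-up marks; only a few of the parameters listed above are used:

```python
import matplotlib.pyplot as plt

# Hypothetical marks of eleven students
marks = [35, 42, 47, 50, 53, 55, 58, 61, 66, 72, 90]

# Filled box with the mean drawn as a line; the call returns a dict of the drawn artists
parts = plt.boxplot(marks, showmeans=True, meanline=True, patch_artist=True)
plt.show()
```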
3.9 Plotting Data From a DataFrame
Till now we have plotted data from either linear lists or 1D
arrays. You can also plot data from a DataFrame, using its
columns selectively.

We can do it in two ways:

i) Using PyPlot's graph functions
ii) Using DataFrame's plot( ) function. It is available from
version 0.17.0 onwards.
3.9.1 Plotting a DataFrame's Data using PyPlot's Graph Function


To plot a DataFrame's data, just pass its column to
Pyplot's graph functions ( plot( ), bar( ), barh( ), scatter( ),
boxplot( ), hist( ) ).

It will treat the passed column's data as a Series and plot it.

For example, we can plot a bar chart using a DataFrame df2's data as :
plt.bar(df2.index, df2.Projects)

The plot( ) function can also take a DataFrame's name and will plot all
its columns:
import pandas as pd
import matplotlib.pyplot as plt
:      # df2 created or loaded
plt.plot(df2)
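A runnable sketch of both calls. The DataFrame below is a made-up stand-in for df2; the Budget column and the index values are assumptions:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical DataFrame standing in for df2
df2 = pd.DataFrame({'Projects': [12, 7, 15, 9],
                    'Budget':   [48, 30, 61, 37]},
                   index=[2019, 2020, 2021, 2022])

bars = plt.bar(df2.index, df2.Projects)     # one column as a bar chart

plt.figure()                                # second chart on a new figure
lines = plt.plot(df2)                       # every column as a separate line
plt.show()
```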
3.9.2 Plotting a DataFrame's Data using DataFrame's plot()
Pandas provides a function plot( ) which you can use with
DataFrames as:
<DF>.plot( )
The DataFrame's plot( ) is a versatile function, which can plot
all types of charts by just specifying the kind argument.
Advantages

It plots only the numeric columns, unlike plot( ) of PyPlot
when used with a DataFrame.

It automatically adds legends for the plotted data.