
Chapter 4 Plotting Data using Matplotlib

Chapter 4 discusses data visualization using the Matplotlib library in Python, which allows for creating various types of plots, including line and bar charts. It provides examples of how to plot data, customize plots with titles, labels, colors, and markers, and explains the difference between bar charts and histograms. Additionally, it covers how to utilize Pandas for plotting and accessing open data for analysis.
Chapter 4

Plotting Data using Matplotlib


Data visualisation means the graphical or pictorial representation of data
using graphs, charts, etc. The purpose of plotting data is to visualise
variation or to show relationships between variables.
Visualisation also helps to communicate information effectively to the
intended users.
Matplotlib
The Matplotlib library is used for creating static, animated, and interactive
2D plots or figures in Python. It can be installed using the following pip
command from the command prompt:
pip install matplotlib
For plotting using Matplotlib, we need to import its Pyplot module using the
following command:
import matplotlib.pyplot as plt
The pyplot module of matplotlib contains a collection of functions that can
be used to work on a plot.
The plot() function of the pyplot module is used to create a figure. The
plot() function by default plots a line chart.
A figure is the overall window where the outputs of pyplot functions are
plotted. A figure contains a plotting area, legend, axis labels, ticks, title,
etc.
The show() function is used to display the figure created using the plot()
function.
Program 4-1 Plotting Temperature against Date (LINE CHART)
import matplotlib.pyplot as plt
#list storing date in string format
date=["25/12","26/12","27/12"]
#list storing temperature values
temp=[8.5,10.5,6.8]
#create a figure plotting temp versus date
plt.plot(date, temp)
#show the figure
plt.show()
Note: Click the save button in the output window to save the plot as an
image.
A figure can also be saved from code by using the savefig() function.
For example: plt.savefig('x.png').
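The following is a minimal sketch of saving a figure from code, using the data of Program 4-1. The output file name temperature.png, the temporary folder and the dpi value are assumptions chosen for illustration. The sketch also selects the non-interactive "Agg" backend so that it runs without a display; that line can be removed on a desktop. With interactive backends it is usually safest to call savefig() before show(), because the figure may be discarded once its window is closed.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: run without a display; remove to open a window
import matplotlib.pyplot as plt

date = ["25/12", "26/12", "27/12"]
temp = [8.5, 10.5, 6.8]

# create the line chart
plt.plot(date, temp)

# hypothetical output location: a PNG file in the system's temporary folder
out_path = os.path.join(tempfile.gettempdir(), "temperature.png")

# save the current figure; dpi controls the resolution of the image
plt.savefig(out_path, dpi=150)
plt.close()

print("Saved to", out_path)
```

The file format is chosen from the extension, so a name ending in .pdf or .svg would produce that format instead.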
List of Pyplot functions to plot different charts

List of Pyplot functions to customise plots

Program 4-2 Plotting a line chart of date versus temperature by adding
labels on the X and Y axes, and adding a title and grid to the chart.

import matplotlib.pyplot as plt


date=["25/12","26/12","27/12"]
temp=[8.5,10.5,6.8]
plt.plot(date, temp)
plt.xlabel("Date")
plt.ylabel("Temperature")
plt.title("Date wise Temperature")
plt.grid(True)
plt.yticks(temp)
plt.show()

Marker
A marker is any symbol that represents a data value in a line chart or a
scatter plot. It is also possible to specify each point in the line through a
marker.
Some of the Matplotlib Markers
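As an illustration, the sketch below draws the temperature data of Program 4-1 four times, each with a different marker code. The four codes shown ('o' circle, '*' star, 's' square, '^' triangle) are only a small selection, and the vertical shift applied to each line is an assumption made so that the lines do not overlap.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run without a display; remove to open a window
import matplotlib.pyplot as plt

date = ["25/12", "26/12", "27/12"]
temp = [8.5, 10.5, 6.8]

# a few marker codes: circle, star, square, upward triangle
markers = ["o", "*", "s", "^"]

lines = []
for shift, m in enumerate(markers):
    # shift each copy of the data upwards so the lines stay apart
    shifted = [t + shift for t in temp]
    line, = plt.plot(date, shifted, marker=m, markersize=10, label="marker " + m)
    lines.append(line)

plt.legend()
plt.savefig("markers.png")  # hypothetical file name; use plt.show() for a window
```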

Colour
It is also possible to format the plot further by changing the colour of the
plotted data. We can either use character codes or the color names as
values to the parameter color in the plot().

Colour abbreviations for plotting
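The sketch below shows both ways of giving a colour, again with the data of Program 4-1. The single-character codes are 'b' (blue), 'g' (green), 'r' (red), 'c' (cyan), 'm' (magenta), 'y' (yellow), 'k' (black) and 'w' (white). The shifted copies of the data are an assumption made only to keep the lines apart.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run without a display; remove to open a window
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

date = ["25/12", "26/12", "27/12"]
temp = [8.5, 10.5, 6.8]

# colour given as a single-character code
line_code, = plt.plot(date, temp, color="r")
# colour given as a full colour name
line_name, = plt.plot(date, [t + 1 for t in temp], color="red")
# names without a one-letter code, such as brown, must be written in full
line_other, = plt.plot(date, [t + 2 for t in temp], color="brown")

# 'r' and 'red' resolve to the same red, green, blue, alpha values
same_colour = to_rgba(line_code.get_color()) == to_rgba(line_name.get_color())
print(same_colour)
```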


Linewidth and Line Style
The linewidth and linestyle properties can be used to change the width and
the style of the line chart.
➢ Linewidth is specified in points (1 point = 1/72 inch), not in pixels.
➢ The default line width is 1.5 points in Matplotlib 2.0 and later.
➢ The linestyle parameter can take a string such as "solid" (-),
"dotted" (:), "dashed" (--) or "dashdot" (-.).

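To compare the four named line styles, the sketch below draws the same data once per style with a line width of 2. The data is taken from Program 4-1; the vertical shift between the lines is an assumption made for readability.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run without a display; remove to open a window
import matplotlib.pyplot as plt

date = ["25/12", "26/12", "27/12"]
temp = [8.5, 10.5, 6.8]

styles = ["solid", "dotted", "dashed", "dashdot"]

lines = []
for shift, style in enumerate(styles):
    # draw a shifted copy of the data in each line style
    shifted = [t + shift for t in temp]
    line, = plt.plot(date, shifted, linestyle=style, linewidth=2, label=style)
    lines.append(line)

plt.legend()

# Matplotlib stores each style using its short symbol
for line in lines:
    print(line.get_label(), "->", line.get_linestyle())
```

The short symbols ('-', ':', '--', '-.') can be passed to linestyle directly in place of the names.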
Program 4-3 Consider the average heights and weights of persons aged 8
to 16 stored in the following two lists:

height = [121.9,124.5,129.5,134.6,139.7,147.3,152.4, 157.5,162.6]


weight= [19.7,21.3,23.5,25.9,28.5,32.1,35.7,39.6,43.2]

Let us plot a line chart where:


i. x axis will represent weight
ii. y axis will represent height
iii. x axis label should be “Weight in kg”
iv. y axis label should be “Height in cm”
v. colour of the line should be green
vi. use * as marker
vii. Marker size should be 10
viii. The title of the chart should be “Average weight with respect to average
height”.
ix. Line style should be dashed
x. Linewidth should be 2.

import matplotlib.pyplot as plt


import pandas as pd
height=[121.9,124.5,129.5,134.6,139.7,147.3,152.4,157.5,162.6]
weight=[19.7,21.3,23.5,25.9,28.5,32.1,35.7,39.6,43.2]
df=pd.DataFrame({"height":height,"weight":weight})
#Set xlabel for the plot
plt.xlabel('Weight in kg')
#Set ylabel for the plot
plt.ylabel('Height in cm')
#Set chart title:
plt.title('Average weight with respect to average height')
#plot using marker '*', a dashed line style and line colour as green
plt.plot(df.weight,df.height,marker='*',markersize=10,color='green',linewidth=2,linestyle='dashed')
plt.show()

The Pandas Plot Function (Pandas Visualization)


Pandas objects Series and DataFrame come equipped with their own plot()
methods. This plot() method is just a simple wrapper around the plot()
function of pyplot.

If we have a Series 's' or DataFrame type object 'df', we can call the plot
method by writing:
s.plot() or df.plot()

The plot() method of Pandas accepts a considerable number of arguments
that can be used to plot a variety of graphs. It allows customising different
plot types by supplying the kind keyword argument. The general syntax is:
df.plot(kind='...'), where kind accepts a string indicating the type of plot,
as listed below. In addition, we can use the matplotlib.pyplot methods and
functions along with the plot() method of Pandas objects.

Arguments accepted by kind for different plots
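As a short sketch of how the kind argument is used, the code below plots the height and weight lists of Program 4-3 in four different ways. The strings accepted by kind in Pandas include 'line', 'bar', 'barh', 'hist', 'box', 'kde' (or 'density'), 'area', 'pie', 'scatter' and 'hexbin'. The variable names for the returned axes are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run without a display; remove to open a window
import matplotlib.pyplot as plt
import pandas as pd

height = [121.9, 124.5, 129.5, 134.6, 139.7, 147.3, 152.4, 157.5, 162.6]
weight = [19.7, 21.3, 23.5, 25.9, 28.5, 32.1, 35.7, 39.6, 43.2]
df = pd.DataFrame({"height": height, "weight": weight})

# each call draws on a new figure and returns the Matplotlib axes it used
ax_line = df.plot(kind="line")   # one line per numeric column
ax_bar = df.plot(kind="bar")     # one group of bars per row
ax_hist = df.plot(kind="hist")   # distribution of each column
# scatter needs the x and y columns to be named explicitly
ax_scatter = df.plot(kind="scatter", x="weight", y="height")

# pyplot functions act on the most recent figure, here the scatter plot
plt.title("Height against weight")
```

Calling plt.show() at the end would display all four figures in an interactive session.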

Plotting a Line chart


A line chart joins successive data points with straight line segments.
It is used to show a continuous dataset and to visualise growth or decline
in data over a time interval. The following program plots a line chart for
data stored in a DataFrame.

Program 4-4 Smile NGO has participated in a three-week cultural mela.
Using Pandas, they have stored the day-wise sales (in Rs) for every week
in a CSV file named “MelaSales.csv”, as shown in the table below.

Depict the sales for the three weeks using a Line chart. It should have the
following:
i. Chart title as “Mela Sales Report”.
ii. x axis label as “Days”.
iii. y axis label as “Sales in Rs”.
iv. Line colours are red for week 1, blue for week 2 and brown for week 3.

import pandas as pd
import matplotlib.pyplot as plt
# reads "MelaSales.csv" to df by giving path to the file
df=pd.read_csv("MelaSales.csv")
#create a line plot of different color for each week
df.plot(kind='line', color=['red','blue','brown'])
# Set title to "Mela Sales Report"
plt.title('Mela Sales Report')
# Label x axis as "Days"
plt.xlabel('Days')
# Label y axis as "Sales in Rs"
plt.ylabel('Sales in Rs')
#Display the figure
plt.show()

Customising Line Plot

Program 4-5 Assuming the same CSV file, i.e., MelaSales.csv, plot the
line chart with the following customisations:
Marker ="*"
Marker size=10
linestyle="--"
Linewidth =3

import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("MelaSales.csv")
#creates plot of different color for each week
df.plot(kind='line',color=['red','blue','brown'],marker="*",markersize=10,linewidth=3,linestyle="--")
plt.title('Mela Sales Report')
plt.xlabel('Days')
plt.ylabel('Sales in Rs')
#store converted index of DataFrame to a list
ticks = df.index.tolist()
#displays corresponding day on x axis
#(this needs the Day column, which is added to the CSV file in the next section)
plt.xticks(ticks,df.Day)
plt.show()

Plotting Bar Chart


In order to show comparisons, we prefer bar charts. Bar charts are well
suited to categorical (string) values on the x axis, such as day names. To
plot a bar chart, we specify kind='bar'. We can also specify the DataFrame
columns to be used as the x and y axes.

Let us now add a column “Day” consisting of day names to “MelaSales.csv”
as shown below.
Day-wise sales data along with Day’s names

Program 4-6 The following Python script displays a bar plot for the
“MelaSales.csv” file with the column Day on the x axis, as shown in the
figure above. If we do not specify the column name for the x parameter in
plot(), the bar plot will plot all the columns of the DataFrame with the index
(row label) of the DataFrame on the x axis, which is numeric, starting from 0.

import pandas as pd
import matplotlib.pyplot as plt
df= pd.read_csv('MelaSales.csv')
# plots a bar chart with the column "Day" as x axis and sets the title
df.plot(kind='bar',x='Day',title='Mela Sales Report')
#set ylabel
plt.ylabel('Sales in Rs')
plt.show()

Customising Bar Chart


We can also customise the bar chart by adding certain parameters to the
plot function. We can control the edgecolor of the bar, linestyle and
linewidth. We can also control the color of the lines. The following example
shows various customisations on the bar chart of Figure 4.7

Program 4-7 Let us write a Python script to display a bar plot for the
“MelaSales.csv” file with the column Day on the x axis, and having the
following customisations:
● Changing the colour of the bars for the three weeks to red, yellow and purple.
● Edgecolor to green
● Linewidth as 2
● Line style as "--"
import pandas as pd
import matplotlib.pyplot as plt
df= pd.read_csv('MelaSales.csv')
# plots a bar chart with the column "Day" as x axis and sets the title
df.plot(kind='bar',x='Day',title='Mela Sales Report',color=['red','yellow','purple'],edgecolor='Green',linewidth=2,linestyle='--')
#set ylabel
plt.ylabel('Sales in Rs')
plt.show()

Program to plot a bar graph (Matplotlib library)


import matplotlib.pyplot as plt
x=['6th','7th','8th','9th','10th','11th','12th']
y=[130,120,135,130,150,80,75]
plt.bar(x,y,width=0.3,color=['b','r','y','g','m','orange','gold'])
# plt.barh(x,y,height=0.3,color=['b','r','y','g','m','orange','gold'])
plt.xlabel('Class')
plt.ylabel('Number of students')
plt.title('Data of secondary standard students')
plt.show()

Histogram
Histograms are column-charts, where each column represents a range of
values, and the height of a column corresponds to how many values are in
that range.

To make a histogram, the data is sorted into "bins" and the number of data
points in each bin is counted. The height of each column in the histogram is
then proportional to the number of data points its bin contains.
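The binning step can be sketched in plain Python, without any plotting. The ages and bin edges below are the same values used in the histogram program later in this chapter. The rule that every bin includes its lower edge and excludes its upper edge, except the last bin, which includes both, follows the convention used by Matplotlib's hist() function.

```python
ages = [22, 32, 35, 45, 55, 14, 26, 19, 56, 44, 48, 33, 38, 28]
edges = [0, 10, 20, 30, 40, 50, 60]

# one counter per bin; there is one bin fewer than there are edges
counts = [0] * (len(edges) - 1)

for age in ages:
    for i in range(len(counts)):
        is_last_bin = i == len(counts) - 1
        in_bin = edges[i] <= age < edges[i + 1]
        # the last bin also accepts a value equal to its upper edge
        if in_bin or (is_last_bin and age == edges[-1]):
            counts[i] += 1
            break

for i, count in enumerate(counts):
    print(edges[i], "to", edges[i + 1], ":", count)

# counts is [0, 2, 3, 4, 3, 2]
```

These counts are the heights of the six columns that a histogram of this data would show.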

Difference between a bar chart/graph and a histogram


● A bar chart represents categorical data (data that has some
labels associated with it).
● A histogram is used to describe the distribution of numerical data.
● A bar graph is drawn using rectangular bars with lengths
proportional to the values that they represent.
● A histogram is drawn as bars showing what portion of the
dataset falls in each bin (interval).

Program to plot a histogram


import matplotlib.pyplot as plt
age=[22,32,35,45,55,14,26,19,56,44,48,33,38,28]
years=[0,10,20,30,40,50,60]
plt.hist(age,bins=years,color='magenta',rwidth=1,edgecolor='black',histtype='bar',label='age')
plt.xlabel('Emp age')
plt.ylabel('No of emp')
plt.title('Nims')
plt.legend()
plt.show()
Program 4-8
import pandas as pd
import matplotlib.pyplot as plt
data = {'Name':['Arnav', 'Sheela', 'Azhar', 'Bincy', 'Yash','Nazar'],
'Height' : [160,171,183,165,181,180],
'Weight' : [47,89,52,58,50,47]}
df=pd.DataFrame(data)
df.plot(kind='hist')
plt.show()

It is also possible to set a value for the bins parameter, for example,
df.plot(kind='hist',bins=20)
df.plot(kind='hist',bins=[18,19,20,21,22])
df.plot(kind='hist',bins=range(18,25))

Customising Histogram
Taking the same data as above, let us now see how the histogram can be
customised. Let us change
• the edgecolor, which is the border of each bar, to green.
• the line style to ":" and line width to 2.
• Let us try another property called fill, which takes boolean values. The
default True means each bar will be filled with colour and False means
each bar will be empty.
• Another property called hatch can be used to fill each bar with a
pattern ( '-', '+', 'x', '\\', '*', 'o', 'O', '.'). Use the hatch value "o".

Program 4-9
import pandas as pd
import matplotlib.pyplot as plt
data = {'Name':['Arnav', 'Sheela', 'Azhar','Bincy','Yash','Nazar'],
'Height' : [60,61,63,65,61,60],
'Weight' : [47,89,52,58,50,47]}
df=pd.DataFrame(data)
df.plot(kind='hist',edgecolor='Green',linewidth=2,linestyle=':',fill=False,hatch='o')
plt.show()
Using Open Data
There are many websites that provide data freely for anyone to download
and do analysis, primarily for educational purposes. These are called Open
Data as the data source is open to the public.

“Open Government Data (OGD) Platform India” (data.gov.in) is a platform
for supporting the Open Data initiative of the Government of India.
Let us consider a dataset called “Seasonal and Annual Min/Max Temp Series
- India from 1901 to 2017” from the URL
https://data.gov.in/resources/seasonal-andannual-minmax-temp-series-india-1901-2017.

Our aim is to plot the minimum and maximum temperature and observe the
number of times (frequency) a particular temperature has occurred. We
only need to extract the 'ANNUAL - MIN' and 'ANNUAL - MAX' columns from
the file. Also, let us aim to display two Histogram plots:
i) Only for 'ANNUAL - MIN'
ii) For both 'ANNUAL - MIN' and 'ANNUAL - MAX'

Program 4-10
import pandas as pd
import matplotlib.pyplot as plt
#read the CSV file with specified columns
#usecols parameter to extract only two required columns
data=pd.read_csv("Min_Max_Seasonal_IMD_2017.csv",usecols=['ANNUAL - MIN','ANNUAL - MAX'])
df=pd.DataFrame(data)
#plot histogram for 'ANNUAL - MIN'
df.plot(kind='hist',y='ANNUAL - MIN',title='Annual Minimum Temperature (1901-2017)')
plt.xlabel('Temperature')
plt.ylabel('Number of times')
#plot histogram for both 'ANNUAL - MIN' and 'ANNUAL - MAX'
df.plot(kind='hist',title='Annual Min and Max Temperature (1901-2017)',color=['blue','red'])
plt.xlabel('Temperature')
plt.ylabel('Number of times')
plt.show()
