Chapter 4 Plotting Data using Matplotlib
Marker
A marker is a symbol that represents a data value in a line chart or a
scatter plot. In a line chart, each data point on the line can also be
highlighted with a marker.
Some of the Matplotlib Markers
Colour
It is also possible to format the plot further by changing the colour of the
plotted data. We can pass either a character code or a colour name as the
value of the color parameter of plot().
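To illustrate the marker and color parameters described above, here is a minimal sketch, not one of the chapter's numbered programs. The height and weight values are made-up sample figures, the output file name is a placeholder, and the Agg backend is chosen only so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to use plt.show()
import matplotlib.pyplot as plt

# Hypothetical sample values, not the lists of Program 4-3
height = [120, 125, 130, 135, 140]
weight = [20, 22, 24, 27, 30]

# Colour given as a character code ('g'); every point is drawn as a star
line_code, = plt.plot(height, weight, marker='*', color='g')

# Colour given as a colour name; every point is drawn as a circle
shifted = [w + 5 for w in weight]
line_name, = plt.plot(height, shifted, marker='o', color='green')

plt.xlabel('Height')
plt.ylabel('Weight')
plt.savefig('marker_colour_demo.png')  # placeholder file name
```

In an interactive session, plt.show() can be called instead of plt.savefig() to display the chart in a window.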
Program 4-3 Consider the average heights and weights of persons aged 8
to 16 stored in the following two lists:
If we have a Series 's' or DataFrame type object 'df', we can call the plot
method by writing:
s.plot() or df.plot()
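As a rough sketch of these two calls, the following example builds a small Series and a small DataFrame and plots each one. The names s and df follow the text. The marks, the column names and the output file names are hypothetical, and the Agg backend is selected only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data used only for illustration
s = pd.Series([10, 20, 15, 30], name='marks')
df = pd.DataFrame({'maths': [55, 60, 72, 80],
                   'science': [50, 65, 70, 85]})

# A Series is drawn as a single line (the default kind is 'line')
ax_s = s.plot()
plt.savefig('series_plot.png')  # placeholder file name

# A DataFrame is drawn with one line per numeric column
ax_df = df.plot()
plt.savefig('dataframe_plot.png')  # placeholder file name
```

Both calls return the Matplotlib Axes object they draw on, which is why the results are kept in ax_s and ax_df here.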
Program 4-4 Smile NGO has participated in a three week cultural mela.
Using Pandas, they have stored the sales (in Rs) made day wise for every
week in a CSV file named “MelaSales.csv”, as shown in the table below.
Depict the sales for the three weeks using a Line chart. It should have the
following:
i. Chart title as “Mela Sales Report”.
ii. x axis label as “Days”.
iii. y axis label as “Sales in Rs”.
iv. Line colours are red for week 1, blue for week 2 and brown for week 3.
import pandas as pd
import matplotlib.pyplot as plt
# reads "MelaSales.csv" to df by giving path to the file
df=pd.read_csv("MelaSales.csv")
#create a line plot of different color for each week
df.plot(kind='line', color=['red','blue','brown'])
# Set title to "Mela Sales Report"
plt.title('Mela Sales Report')
# Label x axis as "Days"
plt.xlabel('Days')
# Label y axis as "Sales in Rs"
plt.ylabel('Sales in Rs')
#Display the figure
plt.show()
Program 4-5 Assuming the same CSV file, i.e., MelaSales.csv, plot the
line chart with the following customisations:
Marker = "*"
Marker size = 10
Line style = "--"
Line width = 3
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("MelaSales.csv")
#creates plot of different color for each week
df.plot(kind='line', color=['red','blue','brown'], marker="*",
        markersize=10, linewidth=3, linestyle="--")
plt.title('Mela Sales Report')
plt.xlabel('Days')
plt.ylabel('Sales in Rs')
#store converted index of DataFrame to a list
ticks = df.index.tolist()
#displays corresponding day on x axis
plt.xticks(ticks,df.Day)
plt.show()
Program 4-6 This program shows the Python script that displays a bar plot
for the “MelaSales.csv” file with the column Day on the x axis, as shown in
the figure above. If we do not specify a column name for the x parameter of
plot(), the bar plot shows all the numeric columns of the DataFrame, with
the index (row label) of the DataFrame on the x axis. By default this index
is numeric and starts from 0.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('MelaSales.csv')
# plots a bar chart with the column "Day" as x axis and sets the title
df.plot(kind='bar', x='Day', title='Mela Sales Report')
# set the y axis label
plt.ylabel('Sales in Rs')
plt.show()
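The effect of the x parameter can be checked with a small sketch such as the one below. It does not read MelaSales.csv. Instead it uses a three-row DataFrame whose sales figures are invented for illustration, the output file names are placeholders, and the Agg backend is assumed only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sales figures, not the contents of MelaSales.csv
df = pd.DataFrame({'Week 1': [5000, 5900, 6500],
                   'Week 2': [4000, 3000, 5000],
                   'Day': ['Monday', 'Tuesday', 'Wednesday']})

# Without x, the row index (0, 1, 2) labels the x axis
ax_index = df.plot(kind='bar')
labels_index = [t.get_text() for t in ax_index.get_xticklabels()]
plt.savefig('bar_default_index.png')  # placeholder file name

# With x='Day', the values of the Day column label the x axis
ax_day = df.plot(kind='bar', x='Day')
labels_day = [t.get_text() for t in ax_day.get_xticklabels()]
plt.savefig('bar_day_axis.png')  # placeholder file name

print(labels_index)
print(labels_day)
```

The two printed lists are the tick labels of each chart, so they show which values ended up on the x axis in each case.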
Program 4-7 Let us write a Python script to display Bar plot for the
“MelaSales.csv” file with column Day on x axis, and having the following
customisation:
● Changing the color of each bar to red, yellow and purple.
● Edgecolor to green
● Linewidth as 2
● Line style as "--"
import pandas as pd
import matplotlib.pyplot as plt
df= pd.read_csv('MelaSales.csv')
# plots a bar chart with the column "Day" as x axis
df.plot(kind='bar', x='Day', title='Mela Sales Report',
        color=['red','yellow','purple'], edgecolor='Green',
        linewidth=2, linestyle='--')
# set the y axis label
plt.ylabel('Sales in Rs')
plt.show()
Histogram
Histograms are column-charts, where each column represents a range of
values, and the height of a column corresponds to how many values are in
that range.
To make a histogram, the data is sorted into "bins" and the number of data
points in each bin is counted. The height of each column in the histogram is
then proportional to the number of data points its bin contains.
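The binning step can be sketched in plain Python, without any plotting, as follows. The heights are the Height values used in Program 4-9. The bin edges are an assumption made for this illustration and are not necessarily the edges that the plotting library would choose.

```python
# Height values taken from Program 4-9
heights = [60, 61, 63, 65, 61, 60]

# Assumed bin edges: [60, 62), [62, 64) and [64, 66]
bin_edges = [60, 62, 64, 66]

# One counter per bin
counts = [0] * (len(bin_edges) - 1)

for value in heights:
    for i in range(len(counts)):
        lower, upper = bin_edges[i], bin_edges[i + 1]
        is_last = (i == len(counts) - 1)
        # Each bin includes its lower edge; only the last bin also
        # includes its upper edge
        if lower <= value < upper or (is_last and value == upper):
            counts[i] += 1
            break

print(counts)  # [4, 1, 1]
```

With these edges, four heights fall in the first bin and one in each of the other two, so the first column of the histogram would be four times as tall as the others.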
Customising Histogram
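Taking the same data as above, now let us see how the histogram can be
customised. Let us change the edge colour to green, the line width to 2 and
the line style to ":", and also set fill to False and hatch to "o".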
Program 4-9
import pandas as pd
import matplotlib.pyplot as plt
data = {'Name':['Arnav', 'Sheela', 'Azhar','Bincy','Yash','Nazar'],
'Height' : [60,61,63,65,61,60],
'Weight' : [47,89,52,58,50,47]}
df=pd.DataFrame(data)
df.plot(kind='hist', edgecolor='Green', linewidth=2, linestyle=':',
        fill=False, hatch='o')
plt.show()
Using Open Data
There are many websites that provide data freely for anyone to download
and do analysis, primarily for educational purposes. These are called Open
Data as the data source is open to the public.
Our aim is to plot the minimum and maximum temperature and observe the
number of times (frequency) a particular temperature has occurred. We
only need to extract the 'ANNUAL - MIN' and 'ANNUAL - MAX' columns from
the file. Also, let us aim to display two Histogram plots:
i) Only for 'ANNUAL - MIN'
ii) For both 'ANNUAL - MIN' and 'ANNUAL - MAX'
Program 4-10
import pandas as pd
import matplotlib.pyplot as plt
#read the CSV file with specified columns
#usecols parameter to extract only two required columns
data=pd.read_csv("Min_Max_Seasonal_IMD_2017.csv",
                 usecols=['ANNUAL - MIN','ANNUAL - MAX'])
df=pd.DataFrame(data)
#plot histogram for 'ANNUAL - MIN'
df.plot(kind='hist', y='ANNUAL - MIN',
        title='Annual Minimum Temperature (1901-2017)')
plt.xlabel('Temperature')
plt.ylabel('Number of times')
#plot histogram for both 'ANNUAL - MIN' and 'ANNUAL - MAX'
df.plot(kind='hist', title='Annual Min and Max Temperature (1901-2017)',
        color=['blue','red'])
plt.xlabel('Temperature')
plt.ylabel('Number of times')
plt.show()