Matplotlib Notes
For plotting using Matplotlib, we need to import its pyplot module using the
following command:
import matplotlib.pyplot as plt
What is visualization?
Data visualization means the graphical or pictorial representation of data using
graphs, charts, etc. The purpose of plotting data is to visualize variation or show
relationships between variables.
A figure contains a plotting area, legend, axis labels, ticks, title, etc. The plot()
function by default plots a line chart.
We can click on the save button in the output window to save the plot as an
image. A figure can also be saved using the savefig() function.
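The following is a minimal sketch of saving a figure with savefig(). The data, the
labels and the file name temperature.png are made up for illustration, and the Agg
backend is selected only so that the sketch runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt

# Hypothetical data: temperature readings over five days
days = [1, 2, 3, 4, 5]
temperature = [30, 32, 31, 35, 33]

fig, ax = plt.subplots()
ax.plot(days, temperature)        # plot() draws a line chart by default
ax.set_xlabel("Day")
ax.set_ylabel("Temperature")
ax.set_title("Daily temperature")

# Save the figure; the file extension (.png here) selects the image format
out_path = os.path.join(tempfile.gettempdir(), "temperature.png")
fig.savefig(out_path)
plt.close(fig)
```

With an interactive backend, calling plt.show() before closing the figure would
open the output window with the save button mentioned above.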
Marker : - A marker is any symbol that represents a data value in a line chart or a
scatter plot.
Colour : - It is also possible to format the plot further by changing the colour of
the plotted data. We can use either character codes (such as 'r') or colour names
(such as 'red') as values of the color parameter of plot().
Linewidth and Line Style: - The linewidth and linestyle properties can be used to
change the width and the style of the line chart. Linewidth is specified in points,
and Matplotlib's default line width is 1.5 points.
We can also set the line style of a line chart using the linestyle parameter. It can
take a string such as "solid", "dotted", "dashed" or "dashdot".
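The formatting options above (marker, color, linewidth and linestyle) can be
combined in a single plot() call. This is a small sketch with made-up height and
age values; the Agg backend is chosen only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt

# Hypothetical data: height (cm) of a child at different ages
ages = [8, 9, 10, 11, 12]
heights = [120, 125, 131, 138, 144]

fig, ax = plt.subplots()
# marker: symbol drawn at each data value
# color: a colour name (a character code such as "g" also works)
# linewidth: width of the line
# linestyle: "solid", "dotted", "dashed" or "dashdot"
(line,) = ax.plot(ages, heights, marker="*", color="green",
                  linewidth=2, linestyle="dashdot")
ax.set_xlabel("Age (years)")
ax.set_ylabel("Height (cm)")
```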
The plot() Method
The plot() method of a Pandas DataFrame accepts a considerable number of arguments
that can be used to plot a variety of graphs. It allows customizing different plot
types by supplying the kind keyword argument, where kind accepts a string indicating
the type of plot. The syntax is: df.plot(kind='...')
example: df.plot(kind='line')
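As a rough illustration of how the kind argument selects the chart type, the sketch
below draws the same DataFrame once as a line chart and once as a bar chart. The
column names and sales figures are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sales data for three days
df = pd.DataFrame({
    "Day": ["Monday", "Tuesday", "Wednesday"],
    "Toys": [500, 650, 400],
    "Food": [900, 1100, 1000],
})

# Same data, two chart types: only the kind string changes
ax_line = df.plot(kind="line", x="Day", title="Sales as a line chart")
ax_bar = df.plot(kind="bar", x="Day", title="Sales as a bar chart")
```

Each call returns a Matplotlib Axes object, so the usual Matplotlib methods such as
set_ylabel() can still be applied to the result.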
Plotting a Line chart: A line chart shows data points connected by straight line
segments. It is used to show a continuous dataset, typically to visualise growth or
decline in data over a time interval.
Bar graph: - The bar() function takes arguments that describe the layout of the
bars. A bar graph presents categorical data with rectangular bars whose heights or
lengths are proportional to the values that they represent.
We can also customise the bar chart by adding certain parameters to the plot
function. We can control the edgecolor of the bars and the linestyle and linewidth
of their edges. We can also control the fill colour of the bars with color.
df.plot(kind='bar', x='Day', title='Mela Sales Report',
        color=['red','yellow','purple'],
        edgecolor='Green', linewidth=2, linestyle='--')
Boxplot : - A Box Plot is the visual representation of the statistical five number
summary of a given data set. The summary includes the Minimum value, Quartile 1,
Quartile 2 (the Median), Quartile 3 and the Maximum value.
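The whiskers are the two lines outside the box. In Matplotlib's default box plot
they extend to the lowest and highest values lying within 1.5 times the
interquartile range from the box, and values beyond them are drawn individually as
outliers. An outlier is an observation that is numerically distant from the rest of
the data.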
The distance between the box and the lower or upper whisker is larger in some
boxplots and smaller in others. A shorter distance indicates small variation in the
data, and a longer distance indicates that the data is more spread out.
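To connect the box plot with the five number summary, the sketch below computes the
summary with Pandas and then draws the box plot of the same data. The marks are
made-up values, and the outlier check assumes the common 1.5 times interquartile
range rule, which is also Matplotlib's default for whiskers.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical marks of ten students
df = pd.DataFrame({"Marks": [35, 42, 48, 50, 55, 58, 60, 64, 70, 95]})
marks = df["Marks"]

# Five number summary: minimum, Q1, Q2 (median), Q3, maximum
minimum = marks.min()
q1 = marks.quantile(0.25)
median = marks.quantile(0.50)
q3 = marks.quantile(0.75)
maximum = marks.max()

# Values farther than 1.5 * IQR from the box are treated as outliers
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = marks[(marks < lower_limit) | (marks > upper_limit)].tolist()

ax = df.plot(kind="box", title="Marks of ten students")
```

Here the value 95 lies above the upper limit, so it is drawn as a separate point
beyond the upper whisker.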
Histogram: - A histogram can be customized through properties such as edgecolor,
linewidth, linestyle and fill. Another property called hatch can be used to fill
each bar of the histogram with a pattern ( '-', '+', 'x', '\\', '*', 'o', 'O', '.').
df.plot(kind='hist',bins=20)
df.plot(kind='hist',bins=[18,19,20,21,22])
df.plot(kind='hist',bins=range(18,25))
df.plot(kind='hist',edgecolor='Green', linewidth=2,
linestyle=':', fill=False,hatch='o')
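The histogram calls above assume an existing DataFrame df. A self-contained sketch
with an invented Age column might look like this; it uses explicit bin edges so that
each bar covers one year of age.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: ages of twelve students
df = pd.DataFrame({"Age": [18, 18, 19, 19, 19, 20, 20, 21, 22, 22, 23, 24]})

# Bin edges 18, 19, ..., 24 give six bins of width 1
bin_edges = list(range(18, 25))

# Unfilled bars with a dotted green outline and an 'o' hatch pattern
ax = df.plot(kind="hist", bins=bin_edges, edgecolor="green", linewidth=2,
             linestyle=":", fill=False, hatch="o")

# Height of each bar = number of students falling in that bin
bar_heights = [patch.get_height() for patch in ax.patches]
```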
Pie / Pie chart: - Pie is a type of graph in which a circle is divided into different
sectors and each sector represents a part of the whole. A pie plot is used to
represent numerical data proportionally.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'GeoArea': [83743,78438,22327,22429,21081,16579,10486],
                   'ForestCover': [67353,27692,17280,17321,19240,13464,8073]},
                  index=['Arunachal Pradesh','Assam','Manipur','Meghalaya',
                         'Mizoram','Nagaland','Tripura'])
df.plot(kind='pie', y='ForestCover',
        title='Forest cover of North Eastern states', legend=False)
plt.show()
Scatter plot: - It is similar to a line chart. The major difference is that while a
line graph connects the data points with a line, a scatter chart simply plots the
data points to show the trend in the data.
Scatter plots are used when you want to show the relationship between two
variables. Scatter plots are sometimes called correlation plots because they show
how two variables are correlated. The size of the bubble can also be used to reflect
a value.
plt.scatter(x=discount, y=saleInRs, s=size, color='red', linewidth=3,
            marker='*', edgecolor='blue')
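The scatter() call above relies on three sequences named discount, saleInRs and
size. The sketch below fills them with invented values to show one way they could be
defined; here the marker size is simply made proportional to the discount.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt

# Hypothetical data: discount offered (%) and the resulting sales (Rs)
discount = [10, 20, 30, 40, 50]
saleInRs = [40000, 45000, 48000, 50000, 100000]

# Marker area for each point, so bigger discounts get bigger markers
size = [d * 10 for d in discount]

fig, ax = plt.subplots()
points = ax.scatter(x=discount, y=saleInRs, s=size, color="red",
                    linewidth=3, marker="*", edgecolor="blue")
ax.set_xlabel("Discount (%)")
ax.set_ylabel("Sales (Rs)")
ax.set_title("Sales against discount")
```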
Using Open Data:- There are many websites that provide data freely for anyone to
download and do analysis, primarily for educational purposes. These are called
Open Data as the data source is open to the public. Availability of data for access
and use promotes further analysis and innovation.