Matplotlib
What is Matplotlib?
Matplotlib is a graph-plotting library for Python that serves as a data visualization utility. It was created by
John D. Hunter. Matplotlib is open source and free to use.
Data Visualization:
The human brain processes information more easily when it is presented in pictorial or graphical form. Data visualization
allows us to interpret data quickly and to adjust different variables to see their effect.
Types of Plots:
In [1]: from matplotlib import pyplot as plt
Line Plot
A line plot displays data as points connected by straight line segments,
showing how one value changes as another increases.
In [3]: plt.plot([2,4,6,8],[10,3,20,4])
plt.show()
In [4]: x=[2,4,6,8]
y=[10,3,20,4]
plt.figure(figsize=(5,3))
plt.plot(x, y, linestyle='dotted',
         linewidth=2,
         marker='o',
         color='r')
plt.title('Line Plot')
plt.ylabel("Y Axis")
plt.show()
In [5]: x1=[5,8,10]
y1=[12,16,6]
x2=[6,9,11]
y2=[6,15,7]
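plt.figure(figsize=(5,3))
# The plot calls were missing from this cell; the labels below are placeholders.
plt.plot(x1, y1, label='Line 1')
plt.plot(x2, y2, label='Line 2')
plt.legend()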
plt.grid(True, color='k', linestyle='--', linewidth=0.5)
plt.show()
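import numpy as np
# x must be a NumPy array so that x**2 and x**3 work element-wise; the range is assumed.
x = np.linspace(0, 2, 100)
plt.figure(figsize=(5, 3))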
plt.plot(x, x, label='linear')
plt.plot(x, x**2, label='quadratic')
plt.plot(x, x**3, label='cubic')
plt.xlabel('X label')
plt.ylabel('Y label')
plt.title("Simple Plot")
plt.legend()
plt.show()
Bar Graph
Bar graphs represent data in the form of rectangular
bars. The height of each bar represents the value of a data point, and the position of each bar
along the axis identifies its category, e.g., region-wise sales of a company.
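#Vertical Bars (reconstructed from the commented-out horizontal version below)
plt.bar([1,3,5,7,9],[5,2,7,8,2], label='Example One', color='g')
plt.bar([2,4,6,8,10],[8,6,2,5,6], label='Example Two', color='c')
#Horizontal Bars
#plt.barh([1,3,5,7,9],[5,2,7,8,2], label='Example One', color='g')
#plt.barh([2,4,6,8,10],[8,6,2,5,6], label='Example Two', color='c')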
plt.title('Bar Graph')
plt.ylabel('Bar Height')
plt.xlabel('Bar Number')
plt.legend()
plt.show()
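The region-wise sales example mentioned above could look like the sketch below. The region names and sales figures are invented purely for illustration; only the plotting calls come from Matplotlib.

```python
from matplotlib import pyplot as plt

# Hypothetical data: region names and sales figures are made up for this sketch.
regions = ['North', 'South', 'East', 'West']
sales = [120, 95, 140, 80]

plt.figure(figsize=(5, 3))
bars = plt.bar(regions, sales, color='g', label='Sales')  # one bar per category
plt.title('Region-wise Sales')
plt.xlabel('Region')
plt.ylabel('Sales')
plt.legend()
plt.show()
```

Passing strings as the first argument places one bar per category, so no numeric bar positions are needed.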
Histograms
A histogram is a plot that shows the underlying frequency distribution
(shape) of a set of continuous data. The values are grouped into bins, and the height of each bar
gives the number of values in that bin, e.g., the distribution of ages in a population.
In [8]: pop=[22,55,62,45,21,22,34,42,42,4,99,101,110,120,121,122,130,111,115,112,80,75,65,54,44,
bins=[0,10,20,30,40,50,60,70,80,90,100,110,120,130]
plt.figure(figsize=(5,3))
plt.hist(pop,bins, histtype='bar', rwidth=0.8)
plt.title('Histogram')
plt.ylabel('Y')
plt.xlabel('X')
plt.show()
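To see what the bars of a histogram mean, the sketch below captures the bin counts that plt.hist returns. It uses only the first ten values of pop and a coarser set of bin edges (an assumption made here so the counts are easy to check by hand).

```python
from matplotlib import pyplot as plt

# First ten values of `pop` from the cell above; the bin edges are an assumption.
ages = [22, 55, 62, 45, 21, 22, 34, 42, 42, 4]
bins = [0, 20, 40, 60, 80]

plt.figure(figsize=(5, 3))
# plt.hist returns the count per bin, the bin edges, and the drawn bars.
counts, edges, patches = plt.hist(ages, bins, histtype='bar', rwidth=0.8)
plt.title('Histogram')
plt.show()

# Bins [0,20), [20,40), [40,60), [60,80] hold 1, 4, 4 and 1 values respectively.
print(counts)
```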
Scatter Plot
A scatter plot uses dots to represent values for two different numeric variables. The position of
each dot on the horizontal and vertical axes gives the two values of an individual data point.
Scatter plots are used to observe relationships between variables, e.g., comparing the sales of
different companies.
In [9]: x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
#c = ["red","green","blue","yellow","pink","black","orange","purple","beige","brown","gr
#s = [20,50,40,100,75,200,60,90,10,150,250,200,75]
plt.figure(figsize=(5,3))
plt.scatter(x,y) #color=c, sizes=s
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter Plot')
plt.show()
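The commented-out options in the cell above hint at per-point styling. A minimal sketch of that idea follows: it reuses the sizes from the commented line and, because the color list there is cut off, colors each point by its y value instead (that substitution and the colormap choice are assumptions of this sketch).

```python
from matplotlib import pyplot as plt

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
s = [20, 50, 40, 100, 75, 200, 60, 90, 10, 150, 250, 200, 75]  # marker sizes

plt.figure(figsize=(5, 3))
# c=y maps each point's y value to a color; s sets each marker's size.
points = plt.scatter(x, y, c=y, s=s, cmap='viridis')
plt.colorbar(points, label='Y value')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter Plot')
plt.show()
```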
Area Plot / Stack Plot
An area plot, also known as a mountain chart, is a line chart with the region beneath each line filled in; a stack plot
layers several such areas on top of one another. It is commonly used to show how numerical values change with a second variable,
usually a time period, e.g., the growth of company sales in different regions.
In [10]: days=[1,2,3,4,5]
sleeping=[7,8,6,11,7]
eating=[2,3,4,3,2]
working=[7,8,7,2,2]
playing=[8,5,7,8,13]
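# The stackplot call was missing from this cell; the labels are taken from the variable names.
plt.stackplot(days, sleeping, eating, working, playing,
              labels=['Sleeping','Eating','Working','Playing'])
plt.xlabel('X')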
plt.ylabel('Y')
plt.title('Area Plot')
plt.legend()
plt.show()
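In a stack plot, the upper edge of each band is the running total of the series beneath it. The sketch below computes those edges for the data above with NumPy; it illustrates the idea and is not a description of Matplotlib's internals.

```python
import numpy as np

sleeping = [7, 8, 6, 11, 7]
eating = [2, 3, 4, 3, 2]
working = [7, 8, 7, 2, 2]
playing = [8, 5, 7, 8, 13]

# One row per activity, one column per day.
series = np.array([sleeping, eating, working, playing])

# Cumulative sum down the rows gives the top edge of each stacked band.
upper_edges = np.cumsum(series, axis=0)
print(upper_edges)
# The last row is the total per day: every day adds up to 24 hours.
```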
Pie Chart
A pie chart displays data as a circle divided into slices, each sized in proportion to its share of the whole,
e.g., the income, expenses and profit of a company.
In [11]: slices=[7,2,2,13]
activities=['Sleeping', 'Eating', 'Working', 'Playing']
cols=['c','m','r','b']
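# The pie call was missing from this cell; reconstructed from the variables defined above.
plt.pie(slices, labels=activities, colors=cols)
plt.title('Pie Plot')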
plt.legend()
plt.show()
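Each slice of a pie chart corresponds to a percentage of the total. The sketch below computes those shares by hand and also asks Matplotlib to print them on the chart through the autopct format string; the startangle value is an arbitrary choice made for this example.

```python
from matplotlib import pyplot as plt

slices = [7, 2, 2, 13]
activities = ['Sleeping', 'Eating', 'Working', 'Playing']

# Share of each slice as a percentage of the total, rounded to one decimal.
total = sum(slices)
shares = [round(100 * value / total, 1) for value in slices]
print(shares)

plt.figure(figsize=(5, 3))
# autopct writes each slice's percentage on the chart.
plt.pie(slices, labels=activities, autopct='%1.1f%%', startangle=90)
plt.title('Pie Plot')
plt.show()
```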
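Hexagonal Bin Plot
A hexagonal bin plot divides the plane into hexagonal cells and colors each cell by the number of
points that fall inside it. It is useful when a scatter plot would be too dense to read.
In [12]: import numpy as np
n = 10000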
x=np.random.randn(n)
y=np.random.randn(n)
plt.figure(figsize=(5,4))
plt.hexbin(x, y, gridsize = 30, cmap ='Greens')
plt.title('Hexagonal Bin Plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
Box Plot
Box plots give a good graphical picture of how the data are concentrated: the box spans the first to
third quartiles, and the line inside it marks the median. They also show how
far the extreme values are from most of the data.
plt.figure(figsize=(5,4))
plt.boxplot(x)
plt.title('Box Plot')
plt.show()
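The quantities a box plot draws can be computed directly. The sketch below does so for seeded random data (the seed and sample size are arbitrary choices) and uses the common 1.5 x IQR rule, which is also Matplotlib's default whisker setting, to count the points drawn as outliers.

```python
import numpy as np
from matplotlib import pyplot as plt

rng = np.random.default_rng(0)       # fixed seed so the run is repeatable
data = rng.standard_normal(1000)

# Quartiles and the interquartile range (the height of the box).
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers.
low_limit = q1 - 1.5 * iqr
high_limit = q3 + 1.5 * iqr
outliers = data[(data < low_limit) | (data > high_limit)]

plt.figure(figsize=(5, 4))
box = plt.boxplot(data)
plt.title('Box Plot')
plt.show()

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}, outliers={len(outliers)}")
```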
Sub Plotting
With the subplot() function you can draw multiple plots in one figure.
In [14]: plt.figure(figsize=(5,4))
# Plot 1:
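a1 = [0, 1, 2, 3]
b1 = [3, 8, 1, 10]
plt.subplot(1, 2, 1)  # assumed layout: 1 row, 2 columns, first panel
plt.plot(a1, b1)
# Plot 2:
a2 = [0, 1, 2, 3]
b2 = [10, 20, 30, 40]
plt.subplot(1, 2, 2)  # second panel
plt.plot(a2, b2)
plt.show()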
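An alternative to calling subplot() once per panel is plt.subplots(), which creates the figure and all of its axes in one call. The sketch below uses it with the same data; the side-by-side layout and the panel titles are assumptions of this example.

```python
from matplotlib import pyplot as plt

a1 = [0, 1, 2, 3]
b1 = [3, 8, 1, 10]
a2 = [0, 1, 2, 3]
b2 = [10, 20, 30, 40]

# One row, two columns: axes is an array holding the two panels.
fig, axes = plt.subplots(1, 2, figsize=(5, 4))
axes[0].plot(a1, b1)
axes[0].set_title('Plot 1')
axes[1].plot(a2, b2)
axes[1].set_title('Plot 2')
fig.tight_layout()  # keep titles and tick labels from overlapping
plt.show()
```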
In [15]: import pandas as pd
df=pd.read_csv('data.csv')
df.plot(figsize=(5,3))
df.plot(kind='scatter', x='Duration', y='Calories',figsize=(5,3))
df.plot(kind='hist', x='Duration', y='Calories',figsize=(5,3))
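Since data.csv is not included here, the sketch below builds a small DataFrame inline to show the same pandas plotting call. The column names come from the cell above; the values are invented for illustration only.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Made-up stand-in for data.csv: same column names, invented values.
df = pd.DataFrame({
    'Duration': [30, 45, 60, 60, 90],
    'Calories': [200, 280, 410, 390, 600],
})

# DataFrame.plot returns the Matplotlib Axes it drew on.
ax = df.plot(kind='scatter', x='Duration', y='Calories', figsize=(5, 3))
ax.set_title('Calories vs. Duration')
plt.show()
```

Because the call returns an Axes object, the usual Matplotlib methods such as set_title can be applied to the result.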