
Data Visualization - 1 by Matplot Lib

matplotlib in python
Copyright © All Rights Reserved

Introduction to MatPlotLib

Most people understand information better when they see it in graphical rather than textual
form.
Graphics help people see relationships and make comparisons with greater ease.
Fortunately, Python makes converting textual data into graphics relatively easy
through libraries; one of the most commonly used libraries for this is Matplotlib.
Matplotlib is a comprehensive library for creating static, animated, and interactive
visualizations in Python.
Graph
A graph or chart is simply a visual representation of numeric data.
Matplotlib can produce a large number of graph and chart types.
We can choose any of the common ones, such as line charts, histograms, scatter
plots, etc.

(Figure: examples of chart types: Line Chart, Histogram, Scatter Plot, 3D Plot, Images, Bar Chart, Pie Chart, etc.)
Plot
To define a plot, we need some values, the matplotlib.pyplot module and an
idea of what we want to display.
plotDemo1.py
import matplotlib.pyplot as plt
# %matplotlib inline   (Jupyter magic; uncomment only when running in a notebook)
values = [5, 8, 9, 4, 1, 6, 7, 2, 3, 8]
plt.plot(range(1, 11), values)   # x: 1..10, y: the values list
plt.show()

In this case, plt.plot() draws a line whose x-values are 1 to 10 (range(1,11) stops
before 11) and whose y-values come from the values list.
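When only one list is passed, plt.plot() treats it as the y-values and uses the positions 0, 1, 2, ... as the x-values. The minimal sketch below contrasts the two calls on the same values list; the names explicit_line and implicit_line are illustrative only.

```python
import matplotlib.pyplot as plt

values = [5, 8, 9, 4, 1, 6, 7, 2, 3, 8]

# Explicit x-values: range(1, 11) yields 1, 2, ..., 10 (11 is excluded)
explicit_line, = plt.plot(range(1, 11), values)

# Only y-values given: matplotlib uses the indices 0..9 as x-values
implicit_line, = plt.plot(values)

# plt.show()  # uncomment to open the figure window when running as a script
```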
Plot – Drawing multiple lines
We can draw multiple lines in a plot by making multiple plt.plot() calls.
plotDemo1.py
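import matplotlib.pyplot as plt
# %matplotlib inline   (Jupyter magic; uncomment only when running in a notebook)
values1 = [5, 8, 9, 4, 1, 6, 7, 2, 3, 8]
values2 = [8, 3, 2, 7, 6, 1, 4, 9, 8, 5]
plt.plot(range(1, 11), values1)   # first line
plt.plot(range(1, 11), values2)   # second line, drawn on the same axes
plt.show()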
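plt.plot() also accepts several x, y pairs in a single call and returns one line object per pair, each taking the next color from the default color cycle. A minimal sketch with the same data (the variable name lines is illustrative):

```python
import matplotlib.pyplot as plt

x = range(1, 11)
values1 = [5, 8, 9, 4, 1, 6, 7, 2, 3, 8]
values2 = [8, 3, 2, 7, 6, 1, 4, 9, 8, 5]

# One call, two (x, y) pairs: plt.plot returns a list with one Line2D per pair
lines = plt.plot(x, values1, x, values2)

# plt.show()  # uncomment to open the figure window when running as a script
```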
Plot – Export graphs/plots
We can save our plots to disk using the savefig() function.
plotDemo1.py
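import matplotlib.pyplot as plt
# %matplotlib inline   (Jupyter magic; uncomment only when running in a notebook)
values1 = [5, 8, 9, 4, 1, 6, 7, 2, 3, 8]
values2 = [8, 3, 2, 7, 6, 1, 4, 9, 8, 5]
plt.plot(range(1, 11), values1)
plt.plot(range(1, 11), values2)
# plt.show() is left out: in a script it can close the figure,
# so a later savefig() would write an empty image
plt.savefig('SaveToPath.png', format='png')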
(Output file: SaveToPath.png)
Possible values for the format parameter include:
○ png
○ svg
○ pdf
○ Etc...
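When format= is omitted, savefig() infers the format from the file extension. The sketch below writes the same figure as PNG, SVG and PDF into a throwaway folder created with the standard tempfile module (an assumption made only so the demo does not clutter the working directory).

```python
import os
import tempfile

import matplotlib.pyplot as plt

values1 = [5, 8, 9, 4, 1, 6, 7, 2, 3, 8]
plt.plot(range(1, 11), values1)

out_dir = tempfile.mkdtemp()  # temporary folder for this demo
saved = []
for fmt in ["png", "svg", "pdf"]:
    path = os.path.join(out_dir, "SaveToPath." + fmt)
    plt.savefig(path)  # format taken from the ".png" / ".svg" / ".pdf" extension
    saved.append(path)

plt.close()  # free the figure once it has been written
```

Saving happens before any plt.show() call for the same reason as in the listing above.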
Plot – Axis, Ticks and Grid
We can access and format the axis limits, ticks and grid of a plot through the Axes
object returned by plt.axes(), using its set_xlim(), set_ylim(), set_xticks(),
set_yticks() and grid() methods.
plotDemo1.py
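import matplotlib.pyplot as plt
# %matplotlib notebook   (Jupyter magic; uncomment only when running in a notebook)
values = [5, 8, 9, 4, 1, 6, 7, 2, 3, 8]
ax = plt.axes()                      # create the plotting area and keep a handle to it
ax.set_xlim([0, 50])                 # x-axis range
ax.set_ylim([-10, 10])               # y-axis range
ax.set_xticks([0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50])   # x tick positions
ax.set_yticks([-10, -8, -6, -4, -2, 0, 2, 4, 6, 8, 10])     # y tick positions
ax.grid()                            # draw grid lines at the ticks
plt.plot(range(1, 11), values)
plt.show()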
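The same limits, ticks and grid can also be set with pyplot-level functions, which act on the current Axes. A sketch of that alternative; plt.figure() only starts a fresh figure and plt.gca() is used only to inspect the result.

```python
import matplotlib.pyplot as plt

values = [5, 8, 9, 4, 1, 6, 7, 2, 3, 8]

plt.figure()                           # start a fresh figure
plt.plot(range(1, 11), values)
plt.xlim(0, 50)                        # same as ax.set_xlim([0, 50])
plt.ylim(-10, 10)                      # same as ax.set_ylim([-10, 10])
plt.xticks(list(range(0, 51, 5)))      # ticks at 0, 5, ..., 50
plt.yticks(list(range(-10, 11, 2)))    # ticks at -10, -8, ..., 10
plt.grid(True)                         # grid lines at the major ticks

ax = plt.gca()  # the Axes that the calls above configured
# plt.show()  # uncomment to open the figure window when running as a script
```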
Plot – Line Appearance
When a plot contains multiple lines, we need different line styles to tell them apart.
We can achieve this using several parameters, some of which are listed below.
○ Line style (linestyle or ls)
○ Line width (linewidth or lw)
○ Line color (color or c)
○ Markers (marker)
plotDemo1.py
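import matplotlib.pyplot as plt
# %matplotlib inline   (Jupyter magic; uncomment only when running in a notebook)
values1 = [5, 8, 9, 4, 1, 6, 7, 2, 3, 8]
values2 = [8, 3, 2, 7, 6, 1, 4, 9, 8, 5]
# red, thin, dashed line with right-pointing triangle markers
plt.plot(range(1, 11), values1, c='r', lw=1, ls='--', marker='>')
# blue, thicker, dotted line with circle markers
plt.plot(range(1, 11), values2, c='b', lw=2, ls=':', marker='o')
plt.show()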
Plot – Line Appearance (Cont.)
Possible values for each parameter are:
○ Line style: '-' solid line, '--' dashed line, '-.' dash-dot line, ':' dotted line
○ Color: 'b' blue, 'g' green, 'r' red, 'c' cyan, 'm' magenta, 'y' yellow, 'k' black, 'w' white
○ Marker: '.' point, ',' pixel, 'o' circle, 'v' triangle down, '^' triangle up, '>' triangle right, '<' triangle left, '*' star, '+' plus, 'x' x
Etc.
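The color, marker and line style codes can also be combined into a single format string passed right after the y-values, such as 'r>--'. A minimal sketch showing that the keyword form and the format-string form give the same style; line width has no shorthand and still needs its keyword.

```python
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt

x = range(1, 11)
values1 = [5, 8, 9, 4, 1, 6, 7, 2, 3, 8]

# Keyword form, as in the example above
kw_line, = plt.plot(x, values1, c='r', ls='--', marker='>', lw=1)

# Format-string form: 'r' = red, '>' = triangle right, '--' = dashed
fmt_line, = plt.plot(x, values1, 'r>--', lw=1)

# plt.show()  # uncomment to open the figure window when running as a script
```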
Plot – Labels, Annotation and Legends
To fully document our graph, we have to
resort to labels, annotations and legends.
Each of these elements has a different
purpose, as follows:
○ Label: identifies a particular data element or grouping, making it easy for the viewer
to know the name or kind of data illustrated.
○ Annotation: augments the information the viewer can immediately see about the data with
notes, sources or other useful information.
○ Legend: presents a listing of the data groups within the graph and often provides cues (such
as line type or color) to identify the line with the data.
(Figure: sample plot marking the X Label, Y Label, Annotation and Legend.)
Plot – Labels, Annotation and Legends (Example)
plotDemo1.py
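import matplotlib.pyplot as plt
# %matplotlib inline   (Jupyter magic; uncomment only when running in a notebook)
values1 = [5, 8, 9, 4, 1, 6, 7, 2, 3, 8]
values2 = [8, 3, 2, 7, 6, 1, 4, 9, 8, 5]
plt.plot(range(1, 11), values1)
plt.plot(range(1, 11), values2)
plt.xlabel('Roll No')
plt.ylabel('CPI')
# The note text is passed first; the s= keyword of older Matplotlib versions was renamed
plt.annotate('Lowest CPI', xy=(5, 1))
plt.legend(['CX', 'CY'], loc=4)   # loc=4 means 'lower right'
plt.show()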
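Instead of passing a list of names to plt.legend(), each line can receive its legend text through the label parameter, and annotate() can draw an arrow from the note to the data point using xytext and arrowprops. A sketch under those assumptions; the text position (6.5, 2.5) is an arbitrary choice and plt.figure() only starts a fresh figure.

```python
import matplotlib.pyplot as plt

x = range(1, 11)
values1 = [5, 8, 9, 4, 1, 6, 7, 2, 3, 8]
values2 = [8, 3, 2, 7, 6, 1, 4, 9, 8, 5]

plt.figure()  # start a fresh figure
# Give each line its legend text when it is drawn
plt.plot(x, values1, label='CX')
plt.plot(x, values2, label='CY')
plt.xlabel('Roll No')
plt.ylabel('CPI')

# Text sits at xytext; the arrow points at the data point xy
note = plt.annotate('Lowest CPI', xy=(5, 1), xytext=(6.5, 2.5),
                    arrowprops=dict(arrowstyle='->'))
legend = plt.legend(loc='lower right')  # same position as loc=4

# plt.show()  # uncomment to open the figure window when running as a script
```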
Choosing the Right Graph
The kind of graph we choose determines how people view the associated data, so
choosing the right graph from the outset is important.
For example,
○ If we want to show how various data elements contribute towards a whole, we should use a pie chart.
○ If we want to compare data elements, we should use a bar chart.
○ If we want to show the distribution of elements, we should use a histogram.
○ If we want to depict groups of numbers through their quartiles, we should use a boxplot.
○ If we want to find patterns in data, we should use a scatter plot.
○ If we want to display trends over time, we should use a line chart.
○ If we want to display geographical data, we can use the basemap toolkit.
○ If we want to display a network, we can use the networkx library.
All the above graphs are in our syllabus, and we are going to cover all of them in
this unit.
We are also going to cover some libraries and chart types that are not in the syllabus,
such as seaborn, plotly, cufflinks and choropleth maps.
Pie Chart
A pie chart focuses on showing parts of a whole: the entire pie represents 100 percent,
and the question is what percentage each value occupies.
pieChartDemo.py
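import matplotlib.pyplot as plt
# %matplotlib notebook   (Jupyter magic; uncomment only when running in a notebook)
values = [305, 201, 805, 35, 436]
l = ['Food', 'Travel', 'Accommodation', 'Misc', 'Shopping']   # wedge labels
c = ['b', 'g', 'r', 'c', 'm']                                 # wedge colors
e = [0, 0.2, 0, 0, 0]   # explode: offset the 2nd wedge by 0.2 of the radius
plt.pie(values, colors=c, labels=l, explode=e)
plt.show()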
Pie Chart (Cont.)
There are many other options available for pie charts; in this slide we cover two
important parameters: shadow and autopct.
pieChartDemo.py
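import matplotlib.pyplot as plt
# %matplotlib notebook   (Jupyter magic; uncomment only when running in a notebook)
values = [305, 201, 805, 35, 436]
l = ['Food', 'Travel', 'Accommodation', 'Misc', 'Shopping']
c = ['b', 'g', 'r', 'c', 'm']
# shadow draws a shadow under the pie; autopct prints each wedge's percentage
plt.pie(values, colors=c, labels=l, shadow=True,
        autopct='%1.1f%%')
plt.show()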
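The percentages printed by autopct are each value's share of the total, formatted with the given string ('%1.1f%%' means one decimal place followed by a literal %). The sketch below computes the shares by hand and reads back the percentage texts that plt.pie() returns as a third item when autopct is set; the name pct_labels is illustrative.

```python
import matplotlib.pyplot as plt

values = [305, 201, 805, 35, 436]
labels = ['Food', 'Travel', 'Accommodation', 'Misc', 'Shopping']

# Each wedge's share of the whole, in percent (this is what autopct formats)
total = sum(values)
shares = [100 * v / total for v in values]

# With autopct set, plt.pie returns wedges, label texts and percentage texts
wedges, texts, autotexts = plt.pie(values, labels=labels, autopct='%1.1f%%')
pct_labels = [t.get_text() for t in autotexts]
print(pct_labels)  # ['17.1%', '11.3%', '45.2%', '2.0%', '24.5%']
```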
Bar charts
Bar charts make comparing values easy: wide bars and segregated measurements
emphasize the differences between values, rather than the flow from one value to
another as a line graph does.
barChartDemo.py
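import matplotlib.pyplot as plt
# %matplotlib notebook   (Jupyter magic; uncomment only when running in a notebook)
x = [1, 2, 3, 4, 5]
y = [5.9, 6.2, 3.2, 8.9, 9.7]
l = ['1st', '2nd', '3rd', '4th', '5th']
c = ['b', 'g', 'r', 'c', 'm']
w = [0.5, 0.6, 0.3, 0.8, 0.9]     # a separate width for each bar
plt.title('Sem wise spi')
# tick_label writes l under the bars (label= is meant for the legend entry)
plt.bar(x, y, color=c, tick_label=l, width=w)
plt.show()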
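Exact bar heights can be hard to read, so a common addition is to print each value above its bar. A sketch that loops over the bars returned by plt.bar() and places text with plt.text(); recent Matplotlib versions also offer ax.bar_label() for the same job. The variable name sems and the 0.1 vertical offset are illustrative choices.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [5.9, 6.2, 3.2, 8.9, 9.7]
sems = ['1st', '2nd', '3rd', '4th', '5th']

# tick_label names each bar on the x-axis
bars = plt.bar(x, y, tick_label=sems, color='b')

# Write each bar's height just above it, centered horizontally on the bar
for rect in bars:
    height = rect.get_height()
    plt.text(rect.get_x() + rect.get_width() / 2, height + 0.1,
             f'{height:.1f}', ha='center', va='bottom')
plt.title('Sem wise spi')

# plt.show()  # uncomment to open the figure window when running as a script
```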
Histograms
Histograms categorize data by breaking it into bins, where each bin contains a
subset of the data range.
A histogram then displays the number of items in each bin so that you can see the
distribution of the data and how it progresses from bin to bin.
histDemo.py
import matplotlib.pyplot as plt
import numpy as np
# %matplotlib notebook   (Jupyter magic; uncomment only when running in a notebook)
cpis = np.random.randint(0, 10, 100)   # 100 random integers from 0 to 9
plt.hist(cpis, bins=10, histtype='stepfilled', align='mid', label='CPI Hist')
plt.legend()
plt.show()
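plt.hist() also returns the count in each bin and the bin edges, which are the same numbers NumPy's np.histogram() computes without drawing anything. A sketch that uses a seeded generator (np.random.default_rng, swapped in for np.random.randint only so runs are repeatable):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)          # seeded so the run is repeatable
cpis = rng.integers(0, 10, 100)         # 100 random integers in 0..9

# plt.hist returns the count per bin, the bin edges and the drawn patches
counts, edges, patches = plt.hist(cpis, bins=10, histtype='stepfilled')

# The same counts and edges, computed without drawing
np_counts, np_edges = np.histogram(cpis, bins=10)
```

With bins=10 there are 11 edges, and the counts always add up to the number of data points.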
Boxplots
Boxplots provide a means of depicting groups of numbers through their quartiles.
Quartiles are the three points that divide a group into four equal parts.
In a boxplot, the data is divided into four parts by these three points (the 25th
percentile, the median and the 75th percentile).
(Figure: anatomy of a boxplot on an axis from -5 to 5. The box spans Q1 (25th percentile) to Q3 (75th percentile), the interquartile range (IQR), with the median Q2 (50th percentile) inside it. Whiskers extend towards the "Minimum" (Q1 - 1.5 * IQR) and "Maximum" (Q3 + 1.5 * IQR); points beyond them are outliers.)
Boxplot (Cont.)
Boxplots are mainly used to detect outliers in data. Let's see an example where we
need a boxplot.
We have a dataset of the time each faculty member took to check papers, and we want to
find the faculty members who take either much more or much less time than the others.
boxDemo.py
import pandas as pd
import matplotlib.pyplot as plt
# %matplotlib inline   (Jupyter magic; uncomment only when running in a notebook)
timetaken = pd.Series([50, 45, 52, 63, 70, 21, 56, 68, 54, 57, 35, 62, 65, 92, 32])
plt.boxplot(timetaken)
plt.show()

We can specify other parameters, such as
○ widths, which specifies the width of the box
○ notch, which draws a notched box around the median when True (default is False)
○ vert, set it to False (or 0) if you want a horizontal boxplot
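The quartiles, IQR and outlier limits from the figure can be computed directly with np.percentile() for the timetaken data above. Note that the whiskers actually stop at the most extreme data points lying inside the limits Q1 - 1.5 * IQR and Q3 + 1.5 * IQR. A sketch, assuming Matplotlib's default whisker setting (whis=1.5):

```python
import numpy as np
import matplotlib.pyplot as plt

timetaken = np.array([50, 45, 52, 63, 70, 21, 56, 68, 54, 57, 35, 62, 65, 92, 32])

q1, median, q3 = np.percentile(timetaken, [25, 50, 75])
iqr = q3 - q1
low_limit = q1 - 1.5 * iqr
high_limit = q3 + 1.5 * iqr

# Points outside the limits are the outliers the boxplot draws individually
outliers = timetaken[(timetaken < low_limit) | (timetaken > high_limit)]
print(q1, median, q3, iqr)    # 47.5 56.0 64.0 16.5
print(low_limit, high_limit)  # 22.75 88.75
print(outliers)               # [21 92]

# plt.boxplot reports the same points as "fliers"
result = plt.boxplot(timetaken)
fliers = result['fliers'][0].get_ydata()
```

So the faculty members who took 21 and 92 units of time stand out from the rest.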
Scatter Plot
A scatter plot is a type of plot that shows the data as a collection of points.
The position of a point depends on its two-dimensional value, where each value is a
position on either the horizontal or vertical dimension.
It is useful for studying the relationship or pattern between two variables.
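scatterDemo.py
import matplotlib.pyplot as plt
import pandas as pd
# %matplotlib inline   (Jupyter magic; uncomment only when running in a notebook)
df = pd.read_csv('insurance.csv')       # expects 'bmi' and 'charges' columns
plt.scatter(df['bmi'], df['charges'])   # one point per row
plt.show()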
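Since insurance.csv is not included here, the sketch below uses randomly generated placeholder numbers (not real insurance data) to show the main styling parameters of plt.scatter(): c for color, s for marker area, marker for shape and alpha for transparency. The value ranges are arbitrary assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
bmi = rng.uniform(18, 40, 50)            # placeholder values, not real data
charges = rng.uniform(1000, 40000, 50)   # placeholder values, not real data

# c: color, s: marker area in points^2, marker: shape,
# alpha: transparency so overlapping points stay visible
points = plt.scatter(bmi, charges, c='m', s=30, marker='^', alpha=0.6)
plt.xlabel('bmi')
plt.ylabel('charges')

# plt.show()  # uncomment to open the figure window when running as a script
```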
Scatter Plot (Cont.)
To find specific patterns in the data, we can further divide the data and draw a scatter
plot for each part.
We can do this with the groupby method of DataFrame, using tuple unpacking while
looping over the groups.
scatterDemo.py
import matplotlib.pyplot as plt
import pandas as pd
# %matplotlib inline   (Jupyter magic; uncomment only when running in a notebook)
df = pd.read_csv('insurance.csv')
# Group by the column name itself (not ['smoker']) so each key is a plain string
grouped = df.groupby('smoker')
for key, group in grouped:
    plt.scatter(group['bmi'], group['charges'], label='Smoke = ' + key)
plt.legend()
plt.show()
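A self-contained version of the grouping loop, using a few hypothetical rows that only mimic the bmi, charges and smoker columns assumed to exist in insurance.csv; plt.figure() only starts a fresh figure.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical rows with the same column names as insurance.csv
df = pd.DataFrame({
    'bmi':     [22.5, 30.1, 27.8, 35.2, 24.0, 31.7],
    'charges': [3200, 42000, 4800, 39000, 2900, 36500],
    'smoker':  ['no', 'yes', 'no', 'yes', 'no', 'yes'],
})

plt.figure()  # start a fresh figure
# groupby sorts the keys, so 'no' is plotted before 'yes'
for key, group in df.groupby('smoker'):
    plt.scatter(group['bmi'], group['charges'], label='Smoke = ' + key)

legend = plt.legend()
# plt.show()  # uncomment to open the figure window when running as a script
```

Each group gets its own color from the default color cycle, which is what makes the smoker pattern visible.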

Note : we can specify marker, color, and size of the marker with the help
