Data Visualisation
Representation of Data
Data representation can be defined as a technique for presenting large volumes of data in a
manner that enables the user to interpret the important data with minimum effort and time.
Data representation techniques fall broadly into two categories:
Non-Graphical Technique:
Tabular form and case form: This is the older format of data representation and is not suitable
for large datasets. Non-graphical techniques are also poorly suited to situations where the
objective is to make decisions after analysing a set of data.
Graphical Technique:
The visual display of statistical data in the form of points, lines, dots and other geometrical
forms is the most common approach. For a complex and large quantity of data, the human brain is
more comfortable when the data is presented in a visual format. The graphical or pictorial
representation of data using graphs, charts, etc. is known as data visualization.
For data visualization in Python, the Pyplot interface of the Matplotlib library is used.
Matplotlib is a Python library that provides many interfaces and functions for 2D graphics.
You can install it by giving the following command on the command prompt:
pip install matplotlib
Pyplot: Pyplot is a collection of functions within Matplotlib that allows the user to construct
2D plots easily and interactively.
Importing PyPlot
To import pyplot in our code we have to use the following statement:
import matplotlib.pyplot as pl
Here, pl is an alias of matplotlib.pyplot.
Some common functions of the Matplotlib library, with their descriptions, are given below:
Function Name               Description
title()                     Adds a title to the chart/graph
xlabel()                    Sets the label for the X-axis
ylabel()                    Sets the label for the Y-axis
xlim()                      Sets the value limits for the X-axis
ylim()                      Sets the value limits for the Y-axis
xticks()                    Sets the tick marks on the X-axis
yticks()                    Sets the tick marks on the Y-axis
show()                      Displays the graph on the screen
savefig("address")          Saves the graph at the address specified as argument
figure(figsize=<tuple>)     Determines the size of the figure in which the graph is drawn. The
                            value is supplied in tuple format to the figsize argument.
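The functions listed above can be combined in a single short program. The following is only a minimal sketch: the data values, the title text and the variable names are made up for illustration, and the chart is saved to an in-memory buffer (an assumption made here to avoid creating a file) instead of a file path.

```python
import io
import matplotlib.pyplot as pl

# Hypothetical sample data: marks scored by five roll numbers
roll_numbers = [1, 2, 3, 4, 5]
marks = [10, 25, 16, 70, 45]

fig = pl.figure(figsize=(8, 5))        # figure size as a (width, height) tuple, in inches
pl.plot(roll_numbers, marks)
pl.title("Marks by Roll No.")
pl.xlabel("Roll No.")
pl.ylabel("Marks")
pl.xlim(0, 6)                          # X-axis runs from 0 to 6
pl.ylim(0, 100)                        # Y-axis runs from 0 to 100
pl.xticks(roll_numbers)                # one tick mark per roll number
pl.yticks([0, 25, 50, 75, 100])

# savefig() usually receives a file path such as "marks.png";
# an in-memory buffer is used here so that no file is left behind
buffer = io.BytesIO()
pl.savefig(buffer, format="png")
png_bytes = buffer.getvalue()

# pl.show() would now display the chart on the screen
```

Calling xlim() or ylim() with no arguments returns the limits currently in use, which is a handy way to confirm that the settings were applied.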
Different Types of Graphs:
Line Chart:
A line chart or line graph is a type of chart that displays information as a series of data
points, called 'markers', connected by straight line segments. With Pyplot, a line chart is
created using the plot() function.
[Figure: Line Chart]
Bar Chart:
A bar chart or bar graph represents categorical data with rectangular bars whose heights or
lengths are proportional to the values that they represent. The bars can be plotted vertically
or horizontally. With Pyplot, a bar chart is created using the bar() and barh() functions.
[Figure: Bar Chart]
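As a rough illustration of the difference between bar() and barh(), the sketch below draws the same data once with vertical bars and once with horizontal bars. The subject names and marks are hypothetical values chosen only for this example.

```python
import matplotlib.pyplot as pl

# Hypothetical categorical data
subjects = ["Maths", "Physics", "Chemistry"]
marks = [80, 65, 72]

pl.figure()
vertical_bars = pl.bar(subjects, marks)      # bar heights follow the marks

pl.figure()
horizontal_bars = pl.barh(subjects, marks)   # bar lengths (widths) follow the marks

# pl.show() would display both charts
```

In the vertical chart the value is read from the height of each bar, while in the horizontal chart it is read from the length of each bar.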
Scatter Plot:
A scatter plot is similar to a line chart. The major difference is that while a line graph
connects the data points with a line, a scatter chart simply plots the data points to show the
trend in the data. With Pyplot, a scatter chart is created using the scatter() function.
[Figure: Scatter Chart]
Pie Chart:
A pie chart is a circular statistical graphic, which is divided into slices to illustrate
numerical proportion. With Pyplot, a pie chart is created using the pie() function. Unlike
other chart types, a pie chart can plot only one data sequence.
[Figure: Pie Chart]
Histogram Plot:
A histogram is a type of graph that provides a visual interpretation of numerical data by
indicating the number of data points that lie within each range of values. With Pyplot, a
histogram is created using the hist() function.
We can change the x-axis and y-axis labels using the xlabel() and ylabel() functions
respectively.
Syntax:
1) matplotlib.pyplot.xlabel(<str>)
2) matplotlib.pyplot.ylabel(<str>)
Example1:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]            # roll numbers (x-axis values)
b=[10,25,16,70,45]       # marks (y-axis values)
pl.xlabel("Roll No.")
pl.ylabel("Marks")
pl.plot(a,b)
pl.show()
Q.1 Write a program to plot a line chart depicting the change in the price of a share
in the share market over four weeks.
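One possible approach to Q.1 is sketched below. The question does not give any prices, so the values in share_price are invented purely for illustration, as are the title and label texts.

```python
import matplotlib.pyplot as pl

weeks = [1, 2, 3, 4]
share_price = [310, 325, 298, 340]     # hypothetical closing price for each week

(price_line,) = pl.plot(weeks, share_price)
pl.title("Weekly Share Price")
pl.xlabel("Week")
pl.ylabel("Price")
pl.xticks(weeks)                       # show only whole week numbers on the X-axis

# pl.show() would display the chart
```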
Specifying Plot Size and Grid
We can specify the size of a graph with the help of the following statement:
<matplotlib.pyplot>.figure(figsize=(<width>,<height>))
Example:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.xlabel("Roll No.")
pl.ylabel("Marks")
print("Before:")
pl.plot(a,b)
pl.show()
print("After:")
pl.figure(figsize=(10,7))   # width and height in inches
pl.plot(a,b)
pl.show()
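To confirm that figsize has taken effect, the size of a figure can be read back. The sketch below assumes the Figure method get_size_inches(), which is not used elsewhere in these notes, and compares a default figure with one created at 10 by 7 inches.

```python
import matplotlib.pyplot as pl

a = [1, 2, 3, 4, 5]
b = [10, 25, 16, 70, 45]

default_figure = pl.figure()                       # size chosen by Matplotlib
default_size = tuple(default_figure.get_size_inches())

large_figure = pl.figure(figsize=(10, 7))          # size requested by us
large_size = tuple(large_figure.get_size_inches())
pl.plot(a, b)

print("Default size (inches):", default_size)
print("Requested size (inches):", large_size)
```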
To see the grid lines we can use the grid() function.
Example3:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.grid(True)
pl.plot(a,b)
Example2:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.xlabel("Roll No.")
pl.ylabel("Marks")
pl.plot(a,b,"r")   # "r" draws the line in red
Example1:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.plot(a,b,"c",linewidth=4, linestyle='dotted')   # cyan dotted line, 4 points wide
Example2:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.plot(a,b,"c",linewidth=4, linestyle='dashdot')   # cyan dash-dot line, 4 points wide
Example2:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.plot(a,b,"c",marker='s',markeredgecolor='red')   # cyan line, square markers with red edges
Example3:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.plot(a,b,'cs',linestyle='solid',markeredgecolor='red')   # 'cs' = cyan colour + square marker
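The string 'cs' used above is a short form that combines a colour code and a marker code. The sketch below, offered only as an illustration, draws the same line twice: once with the short form and once with the equivalent keyword arguments color and marker. The variable names short_form and long_form are chosen for this example.

```python
import matplotlib.pyplot as pl

a = [1, 2, 3, 4, 5]
b = [10, 25, 16, 70, 45]

# Short form: 'c' selects cyan, 's' selects the square marker
(short_form,) = pl.plot(a, b, 'cs', linestyle='solid', markeredgecolor='red')

# Long form: the same settings written as keyword arguments
(long_form,) = pl.plot(a, b, color='c', marker='s',
                       linestyle='solid', markeredgecolor='red')

# pl.show() would display the chart (both lines overlap exactly)
```

The short form is quicker to type, while the keyword form is easier to read when many properties are being set.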
Marker Types
Marker symbol    Description
"." Point
"," Pixel
"o" Circle
"v" triangle_down
"^" triangle_up
"<" triangle_left
">" triangle_right
"1" tri_down
"2" tri_up
"3" tri_left
"4" tri_right
"8" Octagon
"s" Square
"p" pentagon
"P" plus (filled)
"*" Star
"h" hexagon1
"H" hexagon2
"+" Plus
"x" X
"X" x (filled)
"D" diamond
"d" thin_diamond
"|" vline
"_" hline
Program
Write a program to draw a line chart using the plot() function.
import matplotlib.pyplot as pl
Tests=[1,2,3,4,5]
Marks=[25,34,49,40,48]
pl.title("Analysis of Test Marks")
pl.xlabel("Test-No")
pl.ylabel("Marks")
pl.plot(Tests,Marks,'g',marker='D',markersize=10,
        markeredgecolor='blue',linestyle='solid')
pl.show()
Eg:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.xlabel("Roll No.")
pl.ylabel("Marks")
pl.plot(a,b, 'ro',markersize=8)   # 'ro' = red circle markers, with no connecting line
Ex.:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.xlabel("Roll No.")
pl.ylabel("Marks")
pl.scatter(a,b,marker="d")   # "d" = thin diamond marker
Setting the size of the marker:
We use the 's' argument of scatter() to set the size of the marker (the marker area, in points
squared).
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.scatter(a,b,s=200)
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.scatter(a,b,c='r',s=200)   # c sets the marker colour, s sets the marker size
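The s and c arguments of scatter() also accept one value per data point. The sketch below is an illustration only: the rule of making each marker five times its mark is an arbitrary choice for this example, as are the colours.

```python
import matplotlib.pyplot as pl

a = [1, 2, 3, 4, 5]
b = [10, 25, 16, 70, 45]

sizes = [mark * 5 for mark in b]          # hypothetical rule: bigger mark, bigger marker
colours = ['r', 'g', 'b', 'y', 'c']       # one colour per point

points = pl.scatter(a, b, s=sizes, c=colours)

# pl.show() would display the chart
```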
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.bar(a,b)   # a gives the bar positions, b gives the bar heights
Example2:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.bar(a,b, color=['r','g','b','y','c'])
By default, Pyplot automatically tries to find the best-fitting range for the X and Y axes,
depending upon the values being plotted. Sometimes we need to set the limits of the
X-axis and Y-axis ourselves. For this we use the xlim() and ylim() functions.
The syntax for setting the x-limit and y-limit is as follows:
<matplotlib.pyplot>.xlim(<xmin>,<xmax>)
<matplotlib.pyplot>.ylim(<ymin>,<ymax>)
Example:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
print("=====Before=========")
pl.bar(a,b, color=['r','g','b','y','c'])
pl.show()
print("=====After=========")
pl.xlim(0,10)
pl.ylim(0,100)
pl.bar(a,b, color=['r','g','b','y','c'])
pl.show()
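The effect of xlim() and ylim() can also be checked without looking at the chart, because calling either function with no arguments returns the limits currently in use. The following is a small sketch along those lines; the variable names are chosen for this example.

```python
import matplotlib.pyplot as pl

a = [1, 2, 3, 4, 5]
b = [10, 25, 16, 70, 45]

pl.bar(a, b)
automatic_limits = pl.xlim()     # limits chosen automatically by Pyplot

pl.xlim(0, 10)                   # set our own limits
pl.ylim(0, 100)
x_limits = pl.xlim()             # read the limits back
y_limits = pl.ylim()

print("Automatic X limits:", automatic_limits)
print("Our X limits:", x_limits)
print("Our Y limits:", y_limits)
```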
import matplotlib.pyplot as pl
a=[1,3,5,7,9]
b=[10,25,16,70,45]
print("=====Before=========")
pl.bar(a,b, color=['r','g','b','y','c'])
pl.show()
print("=====After=========")
pl.xlim(0,10)
pl.ylim(0,100)
pl.xticks(a)   # place the X-axis tick marks at the values in a
pl.yticks(b)   # place the Y-axis tick marks at the values in b
pl.bar(a,b, color=['r','g','b','y','c'])
pl.show()
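xticks() can also take a second list that supplies the text shown at each tick position. The sketch below assumes hypothetical student names for the labels; everything else follows the example above.

```python
import matplotlib.pyplot as pl

positions = [1, 3, 5, 7, 9]
marks = [10, 25, 16, 70, 45]
names = ["Asha", "Bala", "Chitra", "Dev", "Esha"]   # hypothetical labels

pl.bar(positions, marks)
pl.xticks(positions, names)      # first list: tick positions, second list: tick labels

# With no arguments, xticks() returns the current positions and labels
tick_positions, tick_labels = pl.xticks()

# pl.show() would display the chart
```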
Creating Histogram with Pyplot
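Syntax:
matplotlib.pyplot.hist(x, bins=None, cumulative=False, histtype='bar', align='mid',
orientation='vertical')
Here,
x is the data to be plotted (a sequence, or several sequences)
bins is the number of bins, or a sequence giving the bin edges
cumulative, if True, makes each bin also count the data points of all the bins before it
histtype is the type of histogram: 'bar', 'barstacked', 'step' or 'stepfilled'
align positions the bars on the bins: 'left', 'mid' or 'right'
orientation is 'vertical' or 'horizontal'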
Example1:
import matplotlib.pyplot as pl
a=[2,3,7,19,6,8,11,14,15,22,33,24,22,8,2,4]
pl.hist(a)
pl.show()
import matplotlib.pyplot as pl
a=[2,3,7,6,8,12,4]
pl.hist(a)
pl.show()
import matplotlib.pyplot as pl
a=[2,3,7,6,8,12,4]
pl.hist(a,bins=[1,10,20])   # bin edges: one bin from 1 to 10 and one from 10 to 20
pl.show()
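hist() returns the number of values that fell into each bin, together with the bin edges. As a small sketch, the counts for the example with bins=[1,10,20] can be printed as follows (the names counts, edges and bars are chosen for this illustration).

```python
import matplotlib.pyplot as pl

a = [2, 3, 7, 6, 8, 12, 4]

# hist() returns the bin counts, the bin edges and the drawn bars
counts, edges, bars = pl.hist(a, bins=[1, 10, 20])

print("Values in each bin:", list(counts))
print("Bin edges:", list(edges))

# pl.show() would display the histogram
```

Six of the values lie between 1 and 10 and only one (12) lies between 10 and 20, so the first bar is six units high and the second is one unit high.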
Creating Multiple Histograms:
import matplotlib.pyplot as pl
a=[2,3,7,6,8,12,4]
b=[5,6,2,3,10,26]
pl.hist([a,b])
pl.show()
import matplotlib.pyplot as pl
a=[2,3,7,6,8,12,4]
b=[5,6,2,8,8,26]
pl.hist([a,b],bins=4)
pl.show()
Output?
import matplotlib.pyplot as pl
a=[2,3,7,6,8,12,4]
b=[5,6,2,7,8,26]
pl.hist([a,b],bins=4)
pl.show()
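One way to check the answer to the question above is to look at the values that hist() returns. The sketch below is only an illustration with variable names chosen here; it prints the bin edges and the count of values from each list in every bin.

```python
import matplotlib.pyplot as pl

a = [2, 3, 7, 6, 8, 12, 4]
b = [5, 6, 2, 7, 8, 26]

# With two lists, hist() returns one row of counts per list
counts, edges, bars = pl.hist([a, b], bins=4)

counts_a = list(counts[0])       # counts for list a in each of the 4 bins
counts_b = list(counts[1])       # counts for list b in each of the 4 bins

print("Bin edges:", list(edges))
print("Counts for a:", counts_a)
print("Counts for b:", counts_b)
```

The four bins are of equal width and together span the smallest value (2) to the largest value (26) found across both lists.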
import matplotlib.pyplot as pl
a=[2,3,7,6,5.5,8,12,4,15]
pl.hist(a,bins=4)
pl.show()
Pie Chart
A pie chart is a circular graph divided into segments or sections, each representing
a relative proportion or percentage of the total. Each segment resembles a slice of
pie, hence the name. Pie charts are commonly used to visualize data from a small
table, but it is recommended to limit the number of categories to seven to maintain
clarity. However, zero values cannot be depicted in pie charts.
Program:
Write a program to draw a pie chart to visualize the comparative rainfall data for 12
months in Tamil Nadu using the CSV file "rainfall.csv".
Data in Rainfall.csv (the sample shown here lists the rainfall for the days of one week):
Day          Rainfall
Monday       1
Tuesday      2
Wednesday    1
Thursday     3
Friday       2
Saturday     1
Sunday       1
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv(r"d:\Rainfall.csv")   # raw string, so the backslash is kept as part of the path
x=df['Day']
y=df['Rainfall']
wp={'linewidth':1,'edgecolor':"black"}   # black outline for each wedge
plt.pie(y,labels=x,startangle=90,wedgeprops=wp)
plt.legend(loc='upper right')
plt.title("Rain Fall Data",fontname='calibri',color='m',fontsize=16)
plt.show()
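The program above needs the CSV file to be present on drive D. As a sketch that runs without the file, the same Day and Rainfall values can be typed in as lists. The autopct argument used here, which prints the percentage on each slice, is an addition for illustration and is not part of the original program.

```python
import matplotlib.pyplot as plt

# Values copied from the Day/Rainfall table of the program above
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
rainfall = [1, 2, 1, 3, 2, 1, 1]

# Share of each day in the total, worked out by hand for comparison
total = sum(rainfall)
shares = [100 * value / total for value in rainfall]

wedge_style = {'linewidth': 1, 'edgecolor': "black"}
wedges, labels, percentages = plt.pie(rainfall, labels=days, startangle=90,
                                      autopct="%1.1f%%", wedgeprops=wedge_style)
plt.title("Rain Fall Data")

# plt.show() would display the chart
```

Each slice covers the same share of the circle as its value has in the total, so Thursday (3 out of 11) gets the largest slice.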