Data Visualisation

Data visualization is the representation of data in graphical formats to facilitate easier interpretation and decision-making. Techniques include non-graphical methods like tables and graphical methods such as line charts, bar charts, scatter plots, and pie charts, with Python's Matplotlib library being a popular tool for creating these visualizations. The document also provides examples of how to use various functions in Matplotlib to customize and display different types of charts.

Uploaded by

useroppo01234
Data Visualization

Representation of Data
Data representation can be defined as a technique for presenting large volumes of data in a
manner that enables the user to interpret the important data with minimum effort and time.
Data representation techniques are broadly classified in two ways:
Non-Graphical Technique:
Tabular form and case form: These are older formats of data representation and are not suitable
for large datasets. Non-graphical techniques are also poorly suited to situations where our
objective is to make decisions after analysing a set of data.

Graphical Technique:
The visual display of statistical data in the form of points, lines, dots and other geometrical
forms is the most common graphical technique. For a complex and large quantity of data, the
human brain is more comfortable when the data is represented in a visual format. Graphical or
pictorial representation of data using graphs, charts, etc. is known as data visualization.

For data visualization in Python, the Pyplot interface of the Matplotlib library is used.
Matplotlib is a Python library that provides many interfaces and functions for 2D graphics.
You can install it by giving the following commands on the command prompt:

python -m pip install -U pip


python -m pip install -U matplotlib

Pyplot: Pyplot is a collection of functions within Matplotlib which allows the user to construct
2D plots easily and interactively.

Importing PyPlot
To import pyplot in our code we have to use the following statement:
import matplotlib.pyplot as pl
Here, pl is an alias of matplotlib.pyplot.
Some common functions of the Matplotlib library, with their descriptions, are given below:
Function Name                      Description
title()                            Adds a title to the chart/graph
xlabel()                           Sets the label for the X-axis
ylabel()                           Sets the label for the Y-axis
xlim()                             Sets the value limits for the X-axis
ylim()                             Sets the value limits for the Y-axis
xticks()                           Sets the tick marks on the X-axis
yticks()                           Sets the tick marks on the Y-axis
show()                             Displays the graph on the screen
savefig("address")                 Saves the graph at the address (file path) given as argument
figure(figsize=(width, height))    Determines the size of the figure in which the graph is drawn.
                                   The values are supplied as a tuple to the figsize argument.
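
The functions in the table can be combined in one short program. The following is only a minimal sketch: the title text, the limits, the tick values and the output file name marks.png are illustrative assumptions, and the Agg backend is selected only so that the code runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as pl

a = [1, 2, 3, 4, 5]
b = [10, 25, 16, 70, 45]

fig = pl.figure(figsize=(8, 5))      # figure size as a (width, height) tuple
pl.plot(a, b)
pl.title("Marks by Roll No.")        # illustrative title
pl.xlabel("Roll No.")
pl.ylabel("Marks")
pl.xlim(0, 6)                        # value limits for the X-axis
pl.ylim(0, 100)                      # value limits for the Y-axis
pl.xticks(a)                         # one tick per roll number
pl.yticks([0, 25, 50, 75, 100])
ax = pl.gca()

# hypothetical output file inside the system's temporary folder
out_path = os.path.join(tempfile.gettempdir(), "marks.png")
pl.savefig(out_path)
```

With an interactive backend, calling pl.show() after savefig() would also display the graph on the screen.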
Different Types of Graphs:
Line Chart:
A line chart or line graph is a type of chart which displays information as a series of data
points called ‘markers’ connected by straight line segments. With Pyplot, a line chart is created
using the plot() function.

[Figure: Line Chart]
Bar Chart:
A bar chart or bar graph represents categorical data with rectangular bars whose heights or
lengths are proportional to the values that they represent. The bars can be plotted vertically or
horizontally. With Pyplot, a bar chart is created using the bar() and barh() functions.

[Figure: Bar Chart]
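
The notes show bar() in later examples but not barh(). As a rough sketch, a horizontal bar chart of the same sample data could look like this; the bar thickness of 0.5 is an arbitrary choice, and the Agg backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as pl

a = [1, 2, 3, 4, 5]
b = [10, 25, 16, 70, 45]

pl.figure()
# barh() takes the bar positions first and the bar lengths second;
# the thickness of each bar is given by height (not width)
bars = pl.barh(a, b, height=0.5, color='g')
pl.xlabel("Marks")      # the values now run along the X-axis
pl.ylabel("Roll No.")
```

Note that the axis labels swap compared with bar(), because the values are drawn along the horizontal axis.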

Scatter Plot:
A scatter plot is similar to a line chart. The major difference is that while a line graph
connects the data points with a line, a scatter chart simply plots the data points to show the
trend in the data. With Pyplot, a scatter chart is created using the scatter() function.

[Figure: Scatter Chart]

Pie Chart:
A pie chart is a circular statistical graphic, which is divided into slices to illustrate numerical
proportion. With Pyplot, a pie chart is created using the pie() function. Unlike other chart
types, a pie chart can plot only one data sequence.

[Figure: Pie Chart]

Histogram Plot:
A histogram is a type of graph that provides a visual interpretation of numerical data by
indicating the number of data points that lie within each range of values. With Pyplot, a
histogram is created using the hist() function.

Creating Line Charts


A line chart or line graph is a type of chart which displays information as a series of data
points called ‘markers’ connected by straight line segments. We can use the plot() function for
creating a line graph. To create a line graph we must import the matplotlib.pyplot interface in
our code.
Example:
import matplotlib.pyplot as pl
a=[10,25,16,70,45]
b=[1,2,3,4,5]
pl.plot(a,b)
Defining Labels for the Axis

All graphs / charts have two axes:
1) X-axis (Horizontal Axis)
2) Y-axis (Vertical Axis)

We can change the x-axis and y-axis labels using the xlabel() and ylabel() functions
respectively.
Syntax:
1) matplotlib.pyplot.xlabel(<str>)
2) matplotlib.pyplot.ylabel(<str>)
Example1:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.xlabel("Roll No.")
pl.ylabel("Marks")
pl.plot(a,b)

Q.1 Write a program to plot a line chart to depict the changing price of a share
in the share market over four weeks.
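
One possible way to approach Q.1 is sketched below. The four weekly prices are made-up sample values, not real market data, and the title and labels are only suggestions. The Agg backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as pl

weeks = [1, 2, 3, 4]
prices = [120, 135, 128, 142]   # hypothetical share prices, one per week

pl.figure()
line, = pl.plot(weeks, prices, 'b', marker='o')
pl.title("Share Price over Four Weeks")
pl.xlabel("Week")
pl.ylabel("Price")
pl.xticks(weeks)                # show only the four week numbers on the X-axis
```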
Specifying Plot Size and Grid
We can specify the size of a graph with the help of the following statement:
<matplotlib.pyplot>.figure(figsize=(<width>,<height>))
Here the width and height are given in inches.
Example:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
# Before: graph drawn with the default figure size
print("Before:")
pl.xlabel("Roll No.")
pl.ylabel("Marks")
pl.plot(a,b)
pl.show()
# After: a new figure of 10 x 7 inches is created before plotting
print("After:")
pl.figure(figsize=(10,7))
pl.plot(a,b)
pl.show()
To see the grid lines we can use grid() function.

Example3:

import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.grid(True)
pl.plot(a,b)

We can apply the following settings in the plot() function:


• Color (Line color / Marker Color)
• Marker Type
• Marker size
Changing Line color and style
We can specify the line color as per the following syntax:
<matplotlib.pyplot>.plot(<data1>,<data2>,<color code>)
Example1:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.xlabel("Roll No.")
pl.ylabel("Marks")
pl.plot(a,b,"g")

Example2:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.xlabel("Roll No.")
pl.ylabel("Marks")
pl.plot(a,b,"r")

Colors and their codes:
Code   Color
'b'    blue
'g'    green
'r'    red
'c'    cyan
'm'    magenta
'y'    yellow
'k'    black
'w'    white
To Change Line Width
We can pass linewidth=<width> argument with plot() to set the width of line. Line
width or thickness is measured in points.
Example1:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.plot(a,b,"c",linewidth=4)

To change the line style


We can use linestyle=<type> to change the line style in our graph.
Types: solid, dashed, dashdot, dotted

Example1:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.plot(a,b,"c",linewidth=4, linestyle='dotted')

Example2:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.plot(a,b,"c",linewidth=4, linestyle='dashdot')
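
To compare the four line styles side by side, they can be drawn in one graph. This is only a sketch: each line is shifted upward by an arbitrary amount so that the lines do not overlap, and the Agg backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as pl

a = [1, 2, 3, 4, 5]
b = [10, 25, 16, 70, 45]
styles = ['solid', 'dashed', 'dashdot', 'dotted']

pl.figure()
for offset, style in enumerate(styles):
    # shift every line upward by 20 per style so that the styles stay visible
    shifted = [value + 20 * offset for value in b]
    pl.plot(a, shifted, 'c', linewidth=2, linestyle=style, label=style)
pl.legend()                      # the legend shows which style is which
lines = pl.gca().get_lines()
```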

Changing Marker Type, size and color


The datapoints being plotted on a graph/ chart are called markers. We can
specify the following arguments in plot() function to change the marker, its size
and color:
Syntax:
<matplotlib.pyplot>.plot(<data1>,<data2>, marker=<marker type>,
markersize=<in points>, markeredgecolor=<valid color>)
Example1:

Example2:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.plot(a,b,"c",marker='s',markeredgecolor='red')
Example3:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.plot(a,b,'cs',linestyle='solid',markeredgecolor='red')

Here 'cs' combines the color code 'c' (cyan) with the marker code 's' (square).
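
A combined string of this kind may also carry a line style code. As a small sketch, 'g--D' below asks for a green ('g') dashed ('--') line with diamond ('D') markers; the marker size and edge color are arbitrary choices, and the Agg backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as pl

a = [1, 2, 3, 4, 5]
b = [10, 25, 16, 70, 45]

pl.figure()
# 'g--D' = color green, line style dashed, marker diamond
line, = pl.plot(a, b, 'g--D', markersize=8, markeredgecolor='blue')
```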

MarkerTypes
marker symbol description
"." Point
"," Pixel
"o" Circle
"v" triangle_down
"^" triangle_up
"<" triangle_left
">" triangle_right
"1" tri_down
"2" tri_up
"3" tri_left
"4" tri_right
"8" Octagon
"s" Square
"p" pentagon
"P" plus (filled)
"*" Star
"h" hexagon1
"H" hexagon2
"+" Plus
"x" X
"X" x (filled)
"D" diamond
"d" thin_diamond
"|" vline
"_" hline
Program
Write a program to draw a line chart using the plot() function.
import matplotlib.pyplot as pl
Tests=[1,2,3,4,5]
Marks=[25,34,49,40,48]
pl.title("Analysis of Test Marks")
pl.xlabel("Test-No")
pl.ylabel("Marks")
pl.plot(Tests,Marks,'g',marker='D',markersize=10,
markeredgecolor='blue',linestyle='solid')
pl.show()

Creating Scatter Charts


Scatter charts can be created through two functions of the Pyplot library:
(i) plot() function (ii) scatter() function
Scatter Chart using plot() function
In the plot() function, when we specify a line-color-and-marker-style string (e.g. "r+" or
"ro") without the linestyle argument, the plot created resembles a scatter
chart, as only the data points are plotted.

Eg:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.xlabel("Roll No.")
pl.ylabel("Marks")
pl.plot(a,b, 'ro',markersize=8)

Scatter Chart using scatter() function


A scatter plot is similar to a line chart. The major difference is that while a line
graph connects the data points with a line, a scatter chart simply plots the data
points to show the trend in the data. With Pyplot, a scatter chart is created
using the scatter() function.
Eg:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.xlabel("Roll No.")
pl.ylabel("Marks")
pl.scatter(a,b)
Setting Marker Type:
With the use of marker argument we can specify the marker type.

Ex.:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.xlabel("Roll No.")
pl.ylabel("Marks")
pl.scatter(a,b,marker="d")
Setting Size of the marker:
We use the 's' argument to set the size of the marker. The value is the marker area
in points squared.

>>> pl.scatter(data1, data2, s=10)

import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.scatter(a,b,s=200)

Setting the color of the marker:


We use the 'c' argument to set the color of the marker.

import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.scatter(a,b,c='r',s=200)

Setting different color and size of markers:


We can provide any sequence / array with the 'c' and 's' arguments to specify a different
color and size for each marker.
Example:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
clr=['r','g','b','m','c']
sz=[200,250,300,350,400]
pl.scatter(a,b,c=clr,s=sz)
Creating Bar Graph
We can create a bar graph with the help of the bar() function:
Example:

import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.bar(a,b)

Example2:

import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.bar(a,b, color=['r','g','b','y','c'])

Setting width of the bar graph:


import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.bar(a,b, width=0.2)
pl.show()
pl.bar(a,b, width=0.5)

Changing color of the bars of the bar chart


• We can specify a single color for all the bars of a bar chart.
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.bar(a,b, color='g')
• We can specify a different color for each bar of a bar chart.
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
pl.bar(a,b, color=['r','g','b','y','c'])
Setting Xlimits and YLimits

By default, Pyplot automatically tries to find the best-fitting range for the X and Y axes
depending upon the values being plotted. Sometimes we need to set our own limits for
the X-axis and Y-axis. For this we use the xlim() and ylim() functions.
The syntax for setting xlimit and ylimit is as follows:
<matplotlib.pyplot>.xlim(<xmin>,<xmax>)
<matplotlib.pyplot>.ylim(<ymin>,<ymax>)

Example:
import matplotlib.pyplot as pl
a=[1,2,3,4,5]
b=[10,25,16,70,45]
print("=====Before=========")
pl.bar(a,b, color=['r','g','b','y','c'])
pl.show()
print("=====After=========")
pl.xlim(0,10)
pl.ylim(0,100)
pl.bar(a,b, color=['r','g','b','y','c'])
pl.show()

Using xticks and yticks


By default, Pyplot automatically decides which data points will have ticks on the
axes, but you can also decide which data points will have tick marks on the x and y
axes.
The syntax is as follows:
<matplotlib.pyplot>.xticks(<sequence containing tick data points>)
<matplotlib.pyplot>.yticks(<sequence containing tick data points>)
Example:
import matplotlib.pyplot as pl
a=[1,3,5,7,9]
b=[10,25,16,70,45]
print("=====Before=========")
pl.bar(a,b, color=['r','g','b','y','c'])
pl.show()
print("=====After=========")
pl.xlim(0,10)
pl.ylim(0,100)
pl.xticks(a)
pl.bar(a,b, color=['r','g','b','y','c'])
pl.show()

import matplotlib.pyplot as pl
a=[1,3,5,7,9]
b=[10,25,16,70,45]
print("=====Before=========")
pl.bar(a,b, color=['r','g','b','y','c'])
pl.show()
print("=====After=========")
pl.xlim(0,10)
pl.ylim(0,100)
pl.xticks(a)
pl.yticks(b)
pl.bar(a,b, color=['r','g','b','y','c'])
pl.show()
Creating Histogram with Pyplot

A histogram is a graphical display of data using bars of different heights. In a
histogram, each bar groups numbers into ranges (called bins). Taller bars show
that more data falls in that range of values. We use the hist() function for
generating a histogram in Pyplot.
Difference between bar chart and histogram:
In a bar chart each bar represents a single value, whereas in a histogram each bar
represents a range of values. A histogram is similar to a bar graph, but it doesn't
show gaps between the bars.

Syntax:
matplotlib.pyplot.hist(x, bins=None, cumulative=False, histtype='bar', align='mid',
orientation='vertical')
Here,
x            Array or sequence to be plotted on the histogram.
bins         Optional. If an integer is given, the range of the data is divided into that
             many equal ranges (e.g. if bins=4, the whole range is divided into 4 ranges).
             We can also give a sequence of bin edges.
cumulative   Optional. It takes a Boolean value, i.e. True or False. If it is True, a
             histogram is computed where each bin gives the count in that bin plus the
             counts in all bins for smaller values. The last bin gives the total number of
             data points.
histtype     Optional. It has the following types: {bar, barstacked, step, stepfilled}.
             By default the value is bar.
orientation  Optional. It has two values, i.e. horizontal and vertical.
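
The effect of the bins and cumulative arguments can be checked from the values that hist() returns (the counts, the bin edges and the drawn bars). The following is a small sketch using sample data; the variable names are illustrative, and the Agg backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as pl

a = [2, 3, 7, 6, 8, 12, 4]
edges = [1, 5, 10, 15]          # three bins: 1 to 5, 5 to 10, 10 to 15

# ordinary histogram: each count belongs to one bin only
pl.figure()
counts, bin_edges, bars = pl.hist(a, bins=edges)

# cumulative histogram: each bin also includes all smaller bins
pl.figure()
cum_counts, _, _ = pl.hist(a, bins=edges, cumulative=True)

print(list(counts))       # counts per bin
print(list(cum_counts))   # running totals; the last one equals len(a)
```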

Example1:

import matplotlib.pyplot as pl
a=[2,3,7,19,6,8,11,14,15,22,33,24,22,8,2,4]
pl.hist(a)
pl.show()

import matplotlib.pyplot as pl
a=[2,3,7,6,8,12,4]
pl.hist(a)
pl.show()

Defining Range using bins:


import matplotlib.pyplot as pl
a=[2,3,7,6,8,12,4]
pl.hist(a,bins=[1,5,10,15])
pl.show()

import matplotlib.pyplot as pl
a=[2,3,7,6,8,12,4]
pl.hist(a,bins=[1,10,20])
pl.show()
Creating Multiple Histogram:

import matplotlib.pyplot as pl
a=[2,3,7,6,8,12,4]
b=[5,6,2,3,10,26]
pl.hist([a,b])
pl.show()

We can specify number of Bins

import matplotlib.pyplot as pl
a=[2,3,7,6,8,12,4]
b=[5,6,2,8,8,26]
pl.hist([a,b],bins=4)
pl.show()

Output?
import matplotlib.pyplot as pl
a=[2,3,7,6,8,12,4]
b=[5,6,2,7,8,26]
pl.hist([a,b],bins=4)
pl.show()

import matplotlib.pyplot as pl
a=[2,3,7,6,5.5,8,12,4,15]
pl.hist(a,bins=4)
pl.show()

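Calculation of the ranges for bins=4:
Range of the data = 15 - 2 = 13
Width of each bin = 13 / 4 = 3.25
Bins: 2 to below 5.25, 5.25 to below 8.5, 8.5 to below 11.75, 11.75 to 15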
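
The same arithmetic can be carried out in plain Python, without Matplotlib, to see which value falls in which bin. This is a sketch of equal-width binning in which the last bin also includes its right edge; the variable names are illustrative.

```python
a = [2, 3, 7, 6, 5.5, 8, 12, 4, 15]
bins = 4

low, high = min(a), max(a)
width = (high - low) / bins                       # (15 - 2) / 4 = 3.25
edges = [low + i * width for i in range(bins + 1)]

counts = [0] * bins
for value in a:
    # each bin covers left <= value < right, except the last bin,
    # which also includes the largest value
    index = min(int((value - low) / width), bins - 1)
    counts[index] += 1

print(edges)    # [2.0, 5.25, 8.5, 11.75, 15.0]
print(counts)   # [3, 4, 0, 2]
```

The third bin is empty because no value lies between 8.5 and 11.75.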
Creating bar chart using csv file.
Contents of MahaSales.csv:
Week1   Week2   Week3   Day
5000    4000    4000    Monday
5900    3000    5800    Tuesday
6500    5000    3500    Wednesday
3500    5500    2500    Thursday
4000    3000    3000    Friday
5300    4300    5300    Saturday
7900    5900    6000    Sunday

import matplotlib.pyplot as plt
import pandas as pd
# raw string (r"...") so that the backslash in the path is not treated as an escape
df=pd.read_csv(r"d:\MahaSales.csv")
df.plot(kind='bar',x='Day',title="MahaSale")
plt.ylabel('sales in Rs')
plt.show()

Creating histogram using csv file.


import matplotlib.pyplot as plt
import pandas as pd
# raw string (r"...") so that the backslash in the path is not treated as an escape
df=pd.read_csv(r"d:\MahaSales.csv")
df.plot(kind='hist',x='Day',title="MahaSale")
plt.ylabel('sales in Rs')
plt.show()

Creating line chart using csv file.


import matplotlib.pyplot as plt
import pandas as pd
# raw string (r"...") so that the backslash in the path is not treated as an escape
df=pd.read_csv(r"d:\MahaSales.csv")
df.plot(kind='line',x='Day',title="MahaSale")
plt.ylabel('sales in Rs')
plt.show()
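
The CSV programs above need the file d:\MahaSales.csv to exist. To try the idea without that file, the same sales table can be read from a string. This is only a sketch: it assumes the file is comma-separated with a header row, and it uses the Agg backend so that the code runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# assumed layout of MahaSales.csv: comma-separated values with a header row
csv_text = """Week1,Week2,Week3,Day
5000,4000,4000,Monday
5900,3000,5800,Tuesday
6500,5000,3500,Wednesday
3500,5500,2500,Thursday
4000,3000,3000,Friday
5300,4300,5300,Saturday
7900,5900,6000,Sunday
"""

df = pd.read_csv(io.StringIO(csv_text))   # read the text as if it were a file
ax = df.plot(kind='bar', x='Day', title="MahaSale")
ax.set_ylabel('sales in Rs')

# total sales of each week, added up over the seven days
weekly_totals = df[['Week1', 'Week2', 'Week3']].sum()
print(weekly_totals)
```

Changing kind='bar' to kind='line' or kind='hist' gives the other two charts from the same DataFrame.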

Pie Chart
A pie chart is a circular graph divided into segments or sections, each representing
a relative proportion or percentage of the total. Each segment resembles a slice of
pie, hence the name. Pie charts are commonly used to visualize data from a small
table, but it is recommended to limit the number of categories to seven to maintain
clarity. However, zero values cannot be depicted in pie charts.
Program:
Write a program to draw a pie chart to visualize the comparative rainfall data for 12
months in Tamil Nadu using the CSV file "rainfall.csv".
Contents of Rainfall.csv:
Day         Rainfall
Monday      1
Tuesday     2
Wednesday   1
Thursday    3
Friday      2
Saturday    1
Sunday      1

import pandas as pd
import matplotlib.pyplot as plt
# raw string (r"...") so that the backslash in the path is not treated as an escape
df=pd.read_csv(r"d:\Rainfall.csv")
x=df['Day']
y=df['Rainfall']
wp={'linewidth':1,'edgecolor':"black"}
plt.pie(y,labels=x,startangle=90,wedgeprops=wp)
plt.legend(loc='upper right')
plt.title("Rain Fall Data",fontname='calibri',color='m',fontsize=16)
plt.show()
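
Since each slice of a pie chart stands for a percentage of the total, the percentages can also be printed on the slices with the autopct argument of pie(). The sketch below types the rainfall values in directly instead of reading the CSV file; the format '%1.1f%%' (one decimal place) is an arbitrary choice, and the Agg backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
        'Friday', 'Saturday', 'Sunday']
rainfall = [1, 2, 1, 3, 2, 1, 1]          # same values as the Rainfall table
wp = {'linewidth': 1, 'edgecolor': "black"}

plt.figure()
# with autopct, pie() also returns the percentage texts drawn on the slices
wedges, labels, percents = plt.pie(rainfall, labels=days, startangle=90,
                                   autopct='%1.1f%%', wedgeprops=wp)
plt.title("Rain Fall Data")

for day, text in zip(days, percents):
    print(day, text.get_text())           # e.g. Thursday 27.3%
```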
