
CSK W Data Visualization

The document discusses various methods for data visualization in Python using the Matplotlib library. It explains how to create line graphs, bar graphs, stacked and double bar graphs, and histograms with Matplotlib functions such as plot(), bar() and hist(), and how to customize plots with parameters such as color, marker, linestyle and title.

Uploaded by heizaibrahim
Copyright © All Rights Reserved

DATA VISUALIZATION IN PYTHON

Data Visualization refers to the graphical representation of information and data in the form of
charts, graphs, maps, etc. It helps to reveal trends, patterns, correlations, etc. in the data and thereby
helps decision makers understand the meaning of the data and drive business decisions.

For data visualization in Python, the Pyplot interface of the Matplotlib library is commonly used.

In order to use pyplot in our programme we need to import it as shown below:


import matplotlib.pyplot

or

import matplotlib.pyplot as plt

The examples below use the second form, so pyplot is referred to as plt.

Using the pyplot module we can plot data as a line chart, bar chart, histogram, etc.

Creating a Line Graph - Line graphs are generally drawn to track changes over a period of time.

1) Using the plot( ) function of the pyplot module:

The plot() function is used to draw points (markers) in a diagram.

By default, the plot() function draws a line from point to point.

The function takes parameters for specifying points in the diagram.

Parameter 1 is an array/list containing the points on the x-axis.

Parameter 2 is an array/list containing the points on the y-axis.

Example 1

To draw a line from point P(5,5) to Q(10,7)

import matplotlib.pyplot as plt

xpoints=[5,10]

ypoints=[5,7]

plt.plot(xpoints,ypoints) # the plot function uses a list

plt.show()

or

import matplotlib.pyplot as plt

import numpy as np

xaxis=np.array([5,10])

yaxis=np.array([5,7])

plt.plot(xaxis,yaxis) # the plot function uses a numpy array

plt.show()
Output:

The various parameters that can be used with the plot function:

marker – A marker is any symbol that represents a data value in a line chart. Some of the markers:
"." – Point
"*" – Star
"s" – Square
"+" – Plus sign
"D" – Diamond
"o" – Circle
"x" – X sign
"X" – Filled X
Example:
xaxis=np.array([5,8,11,15])
yaxis=np.array([3,8,1,10])
plt.plot(xaxis,yaxis,marker="*")
plt.show()

markersize – Specifies the size of the marker.
Example:
xaxis=np.array([5,8,11,15])
yaxis=np.array([3,8,1,10])
plt.plot(xaxis,yaxis,marker="*",markersize=10)
plt.show()
color – Specifies the colour of the line. The single-letter colour codes you can use:
"b" – Blue
"g" – Green
"r" – Red
"k" – Black
"w" – White
"y" – Yellow
"m" – Magenta
"c" – Cyan
Example:
xaxis=np.array([5,8,11,15])
yaxis=np.array([3,8,1,10])
plt.plot(xaxis,yaxis,marker="*",color="r")
plt.show()

linewidth / linestyle – These properties change the width and the style of the line. The linewidth is specified in points; the default width is 1.5 points, which gives a thin line. The linestyle parameter can be "solid" (the default), "dotted", "dashed" or "dashdot".
Example:
xaxis=np.array([5,8,11,15])
yaxis=np.array([3,8,1,10])
plt.plot(xaxis,yaxis,marker="*",linewidth=2,linestyle="dashdot")
plt.show()
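A sketch for comparing the four linestyle values in one figure: the same data is drawn four times, each copy shifted up by 3 units so the lines do not overlap. The offset, the legend and the in-memory PNG (used instead of plt.show() so the example also runs without a display) are choices made for this illustration only.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() to see a window
import matplotlib.pyplot as plt
import numpy as np

xaxis = np.array([5, 8, 11, 15])
yaxis = np.array([3, 8, 1, 10])
styles = ["solid", "dotted", "dashed", "dashdot"]

plt.figure()  # start a new, empty figure
for offset, style in enumerate(styles):
    # shift each copy up by 3 units so the four lines do not overlap
    plt.plot(xaxis, yaxis + 3 * offset, linestyle=style, linewidth=2, label=style)
plt.legend(loc="upper left")

# render to an in-memory PNG; in an interactive session use plt.show() instead
buf = io.BytesIO()
plt.savefig(buf, format="png")
```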

Pyplot functions to customize a plot:


xlabel( ) / ylabel( ) – Set the labels for the x-axis and the y-axis.
Example:
time=[2.5,5,7,10.25]
distance=[5,10,15,20]
plt.plot(time,distance,marker="X",markersize=10,linestyle="dotted",linewidth=2,color="m")
plt.xlabel("Time")
plt.ylabel("Distance")
plt.show( )

title( ) – Sets a title for the graph. It has a parameter named loc, which can be "left", "right" or "center"; the default is "center".
Example:
time=[2.5,5,7,10.25]
distance=[5,10,15,20]
plt.xlabel("Time")
plt.ylabel("Distance")
plt.title("Distance-Time Graph")
plt.plot(time,distance,marker="X",markersize=10,linestyle="dotted",linewidth=2,color="m")
plt.show( )

grid( ) – Adds grid lines to the plot. Its axis parameter specifies which grid lines are displayed: axis="x" draws only the vertical lines at the x-axis ticks, axis="y" draws only the horizontal lines at the y-axis ticks, and the default, axis="both", draws both.
Example:
time=[2.5,5,7,10.25]
distance=[5,10,15,20]
plt.plot(time,distance,marker="+",markersize=10,linestyle="dashed",linewidth=2,color="m")
plt.xlabel("Time")
plt.ylabel("Distance")
plt.title("Distance-Time Graph")
plt.grid( )
plt.show( )

legend( ) – When we plot more than one line graph, we add a legend to identify each line. The loc parameter sets its position and can take values such as "upper left", "upper right", "upper center", "lower left", "lower right" and "lower center". The default is "best", which places the legend where it overlaps the data least.
Example:
time=[2.5,5,7,10.25]
distanceA=[5,10,15,20]
distanceB=[4,7,9,12]
plt.plot(time,distanceA,marker="o",linewidth=2,color="g")
plt.plot(time,distanceB,marker="o",linewidth=2,color="y")
plt.xlabel("Time")
plt.ylabel("Distance")
plt.legend(["Car A","Car B"],loc="upper left")
plt.show( )

xticks( ) / yticks( ) – Specify the tick positions (divisions) shown on the x-axis and the y-axis.
Example:
time=[2.5,5,7,10.25]
distance=[5,10,15,20]
plt.plot(time,distance,marker="o",linewidth=2,color="g")
plt.xlabel("Time")
plt.ylabel("Distance")
plt.xticks([3,6,9,12])
plt.yticks([4,8,12,16,20])
plt.show( )
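Instead of passing a list of names to legend( ), which must match the order in which the lines were plotted, each plot( ) call can carry its own label= and legend( ) can then be called without a list. A sketch using the Car A / Car B data from the legend example; it writes out loc="best" (the default) and sets the x ticks as in the last example. Saving to an in-memory PNG rather than calling plt.show() is only so the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() to see a window
import matplotlib.pyplot as plt

time = [2.5, 5, 7, 10.25]
distanceA = [5, 10, 15, 20]
distanceB = [4, 7, 9, 12]

plt.figure()  # start a new, empty figure
# each line carries its own legend text, so names and lines cannot get out of order
plt.plot(time, distanceA, marker="o", linewidth=2, color="g", label="Car A")
plt.plot(time, distanceB, marker="o", linewidth=2, color="y", label="Car B")
plt.xlabel("Time")
plt.ylabel("Distance")
plt.xticks([3, 6, 9, 12])  # tick positions only; the numbers themselves are the labels
plt.legend(loc="best")  # "best" is the default placement

buf = io.BytesIO()
plt.savefig(buf, format="png")  # use plt.show() in an interactive session
```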
To display multiple line graphs on one plot:

x1=[2,4,6,8]

y1=[3,7,10,12]

x2=[1,3,5,7]

y2=[2,5,8,10]

plt.plot(x1,y1,marker="s")

plt.plot(x2,y2,marker="s")

plt.show()

or

plt.plot(x1,y1,x2,y2,marker="s")

plt.show()

Output:

Creating a Bar Graph – A bar graph is generally used to compare quantities across different groups.

1) Using the bar( ) function


cls=[9,10,11,12]

students=[170,160,107,120]

plt.bar(cls,students,width=0.3,color="c")

plt.xlabel("Class")

plt.ylabel("No. Of Students")

plt.xticks(cls)

plt.show()

Output :

Creating Stacked Bar Graph

cls=[9,10,11,12]

boys=[70,95,97,50]

girls=[100,65,85,70]

plt.xlabel("Class")

plt.ylabel("No. Of Students")

plt.bar(cls,boys)

plt.bar(cls,girls)

plt.xticks(cls)

plt.show()
Output :

Note : Since the boys' bars are drawn first (in blue) and the girls' bars afterwards, each girls' bar is drawn
over the boys' bar at the same position. A boys' bar shows above a girls' bar only where the number of boys
is greater than the number of girls, so the bars overlap rather than truly stack.
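To stack the girls' bars on top of the boys' bars instead of overlapping them, bar( ) accepts a bottom parameter giving the height at which each bar starts. A sketch using the same data; the legend and the in-memory PNG (instead of plt.show()) are additions for this illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() to see a window
import matplotlib.pyplot as plt

cls = [9, 10, 11, 12]
boys = [70, 95, 97, 50]
girls = [100, 65, 85, 70]

plt.figure()  # start a new, empty figure
plt.bar(cls, boys, label="Boys")
# bottom=boys makes each girls' bar start where the boys' bar ends
plt.bar(cls, girls, bottom=boys, label="Girls")
plt.xlabel("Class")
plt.ylabel("No. Of Students")
plt.xticks(cls)
plt.legend()

buf = io.BytesIO()
plt.savefig(buf, format="png")  # use plt.show() in an interactive session
```

Each stacked bar now reaches the total number of students in that class (boys + girls).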

Creating Double Bar Graph

cls=[9,10,11,12]

boys=[70,65,77,50]

girls=[100,95,85,70]

plt.xlabel("Class")

plt.ylabel("No. Of Students")

N=np.arange(4)

plt.bar(N,boys,width=0.3)

plt.bar(N+0.3,girls,width=0.3)

plt.xticks(N,cls)

plt.legend(["Boys","Girls"])

plt.show()

Output :
Note : In this method we shift the x-axis positions (N for the boys and N+0.3 for the girls) so that the two sets of bars
sit side by side, and then replace the x ticks with the required class labels.
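The same offsetting works for any number of groups. A sketch, assuming all bars have the same width w: the i-th series is drawn at N + i*w, and placing the ticks at N + w*(k-1)/2, where k is the number of series, centres each class label under its group instead of under the first bar. Keeping the series in a dictionary is a choice for this example; the values are the ones above.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() to see a window
import matplotlib.pyplot as plt
import numpy as np

cls = [9, 10, 11, 12]
groups = {"Boys": [70, 65, 77, 50], "Girls": [100, 95, 85, 70]}
width = 0.3

N = np.arange(len(cls))  # one slot per class: 0, 1, 2, 3
plt.figure()  # start a new, empty figure
for i, (name, values) in enumerate(groups.items()):
    # the i-th series is shifted right by i bar widths
    plt.bar(N + i * width, values, width=width, label=name)

# centre each tick under its group of bars instead of under the first bar
plt.xticks(N + width * (len(groups) - 1) / 2, cls)
plt.xlabel("Class")
plt.ylabel("No. Of Students")
plt.legend()

buf = io.BytesIO()
plt.savefig(buf, format="png")  # use plt.show() in an interactive session
```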

Creating a Histogram – A histogram is a bar graph that shows a frequency distribution.

#Histogram

marks=[50,25,35,25,76,50,35,25,30,45,40,60,50,75,45,48,30,33,78]

plt.hist(marks)

plt.xlabel("Marks of Term 1 Exam")

plt.ylabel("No. of Students")

plt.show()

Output :

By default the bins parameter is 10, i.e. the range of the data, from the smallest to the largest value, is divided into 10 equal-width bins.
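plt.hist( ) also returns what it drew: the number of values in each bin, the bin edges and the bars. A sketch that prints them for the marks above, showing that the 10 default bins span the range from the smallest mark (25) to the largest (78), each 5.3 marks wide:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

marks = [50, 25, 35, 25, 76, 50, 35, 25, 30, 45, 40, 60, 50, 75, 45, 48, 30, 33, 78]

plt.figure()  # start a new, empty figure
counts, edges, bars = plt.hist(marks)  # bins defaults to 10

print(len(edges))           # 11 (11 edges bound 10 bins)
print(edges[0], edges[-1])  # 25.0 78.0 (smallest and largest mark)
print(counts.sum())         # 19.0 (every mark is counted in exactly one bin)
```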

Example 2

marks=[50,25,35,25,76,50,35,25,30,45,40,60,50,75,45,48,30,33,78]

plt.xlabel("Marks of Term 1 Exam")

plt.ylabel("No. of Students")

plt.hist(marks,bins=5)
plt.show()

Output:

Note : Here we specified that we need 5 equal-width bins.

Example 3:

marks=[50,25,35,25,76,50,35,25,30,45,40,60,50,75,45,48,30,33,78]

plt.xlabel("Marks of Term 1 Exam")

plt.ylabel("No. of Students")

plt.hist(marks,bins=[25,35,45,55,65,75,85])

plt.show()

Output :
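Note : When a list of values is passed to the bins parameter, the values are used as the bin edges, so
[25,35,45,55,65,75,85] gives the bins 25-35, 35-45, …, 75-85. Each bin includes its left edge but not its
right edge, except the last bin, which includes both.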
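The counts for Example 3 can be checked in code. In this sketch the mark 75 is counted in the 75-85 bin, not in 65-75, because a bin excludes its right edge (only the last bin includes both edges):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

marks = [50, 25, 35, 25, 76, 50, 35, 25, 30, 45, 40, 60, 50, 75, 45, 48, 30, 33, 78]
edges = [25, 35, 45, 55, 65, 75, 85]

plt.figure()  # start a new, empty figure
counts, _, _ = plt.hist(marks, bins=edges)

# pair each bin's left and right edge with its count
for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left}-{right}: {int(n)}")
# 25-35: 6
# 35-45: 3
# 45-55: 6
# 55-65: 1
# 65-75: 0
# 75-85: 3
```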

Example 4:

marks=[50,25,35,25,76,50,35,25,30,45,40,60,50,75,45,48,30,33,78]

plt.xlabel("No. of Students") # with horizontal bars the counts run along the x-axis

plt.ylabel("Marks of Term 1 Exam")

plt.hist(marks,orientation="horizontal",histtype="step")

plt.show()

Output :

Note : By default the bars of a histogram are vertical (orientation="vertical") and histtype="bar";
histtype can also be "barstacked", "step" or "stepfilled".
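A sketch drawing the four histtype values side by side with bins=5. It uses plt.subplots( ) and the Axes methods ax.hist( ) and ax.set_title( ), the object-oriented counterparts of plt.hist( ) and plt.title( ), which the examples above do not use; the 2 x 2 layout and the figure size are arbitrary choices.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() to see a window
import matplotlib.pyplot as plt

marks = [50, 25, 35, 25, 76, 50, 35, 25, 30, 45, 40, 60, 50, 75, 45, 48, 30, 33, 78]
histtypes = ["bar", "barstacked", "step", "stepfilled"]

# a 2 x 2 grid of subplots, one per histtype
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, kind in zip(axes.flat, histtypes):
    ax.hist(marks, bins=5, histtype=kind)
    ax.set_title(kind)
fig.tight_layout()  # keep the subplot titles from overlapping

buf = io.BytesIO()
fig.savefig(buf, format="png")  # use plt.show() in an interactive session
```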

Note : You can plot a line graph by passing only one argument. Its values are plotted on the y-axis, and the
x-axis values automatically become 0, 1, 2, 3, … as shown in the example below:

y=np.array([3,8,1,10])

plt.plot(y,marker="*",markersize=10)

plt.show()

Output:
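plot( ) returns a list of the lines it drew (Line2D objects), so the automatically generated x values can be inspected. A sketch:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

y = np.array([3, 8, 1, 10])

plt.figure()  # start a new, empty figure
(line,) = plt.plot(y, marker="*", markersize=10)  # one data argument, so one line

print(line.get_xdata())  # x values generated automatically: 0, 1, 2, 3
print(line.get_ydata())  # the values that were passed: 3, 8, 1, 10
```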
Plotting a line graph using values from a Series
Example 1 :
#plotting a line graph using Series with default index
import pandas as pd
Sales1=pd.Series([1000,3300,1800,4500])
print(Sales1)

Output:

plt.plot(Sales1)
plt.show()

Output:

The index of the Series is used for the x-axis; since no index was specified, the default index 0, 1, 2, 3
is used.

Example 2:
#plotting a line graph using Series with labelled index
import pandas as pd
Sales1=pd.Series([1000,3300,1800,4500],index=["Week1","Week2","Week3","Week4"])
print(Sales1)

Output :
plt.plot(Sales1)
plt.show()

Output:

Plotting a bar graph using columns of a data frame

Example 1 :
#plotting a bar graph using columns of a dataframe
import pandas as pd
D={"Name":["Anjali","Pooja","Rohan","Tara","Varun"],"Test1":[55,68,79,35,88],
   "Class":["XII-A","XII-C","XII-A","XII-B","XII-C"]}
df=pd.DataFrame(D)
print(df)

Output :

plt.bar(df.Name,df.Test1)
plt.xlabel("Students")
plt.ylabel("Marks of Test 1")
plt.show()

Output:
Example 2 :
#plotting a bar graph using columns of a dataframe
import pandas as pd
D={"Name":["Anjali","Pooja","Rohan","Tara","Varun"],"Test1":[55,68,79,35,88],"Test2":
[63,58,70,40,90]}
df=pd.DataFrame(D)
print(df)

Output :

N=np.arange(len(D["Name"]))
plt.bar(N,df.Test1,width=0.4,label="Test 1")
plt.bar(N+0.4,df.Test2,width=0.4,label="Test 2")
plt.xticks(N+0.2,D["Name"])
plt.legend(loc="upper left")
plt.show()

Output :

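To save a particular plot, call savefig( ) with a file name before plt.show( ):

plt.savefig("Filename")

If the file name has no extension, the figure is saved in the default PNG format. Calling savefig( ) after plt.show( ) usually saves a blank image, because the figure is closed once its window is closed.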
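A sketch of the usual order of calls when saving a histogram of the Term 1 marks to a file. The file name marks.png, the dpi value and the temporary folder are assumptions made for the example; the image format is taken from the file extension, and bbox_inches="tight" trims surplus white space around the figure.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

marks = [50, 25, 35, 25, 76, 50, 35, 25, 30, 45, 40, 60, 50, 75, 45, 48, 30, 33, 78]

plt.figure()  # start a new, empty figure
plt.hist(marks, bins=5)
plt.xlabel("Marks of Term 1 Exam")
plt.ylabel("No. of Students")

out_dir = tempfile.mkdtemp()  # temporary folder, so nothing is left in the working directory
path = os.path.join(out_dir, "marks.png")  # hypothetical file name; ".png" selects the format
# save first: once plt.show() returns, the figure has been closed
plt.savefig(path, dpi=150, bbox_inches="tight")
# plt.show() would come here in an interactive session
```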
Using the plot( ) Function of the Pandas Library
We can also plot graphs using the plot( ) function of the pandas library, as shown in the examples below:
Syntax :
dataframename.plot(kind="line" / "bar" / "hist", other parameters)
By default kind="line", i.e. a line graph is drawn.

Consider the dataframe shown below :


import pandas as pd
data={"Term1":[90,88,75,68],"Term2":[95,78,65,70],"Term3":[89,80,60,70],"Term4":[90,85,67,78]}
df=pd.DataFrame(data,index=["Rohan","Tara","Pooja","Amit"])
print(df)

Example 1: To plot a line graph to show the marks obtained by students in a year.
df.plot(xlabel="Names of Students",ylabel="Marks Obtained",title="Marks obtained by students during the year",linestyle="dashed")

Output:

Note : The row index automatically becomes the x-axis. If the row index is the default positional
index and the dataframe has a column named Name, we can put the names on the x-axis by writing
df.plot(x="Name",xlabel="Names of Students",ylabel="Marks Obtained",title="Marks obtained by students during the year",linestyle="dashed")
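A sketch of the case the note describes, assuming the same marks are stored with the default positional index and the names in a column called Name. With x="Name", pandas uses that column for the x-axis and draws one line for each remaining numeric column. plot( ) returns the matplotlib Axes, used here only to save the figure to an in-memory PNG instead of calling plt.show().

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

data = {
    "Name": ["Rohan", "Tara", "Pooja", "Amit"],
    "Term1": [90, 88, 75, 68],
    "Term2": [95, 78, 65, 70],
    "Term3": [89, 80, 60, 70],
    "Term4": [90, 85, 67, 78],
}
df = pd.DataFrame(data)  # default positional index 0, 1, 2, 3

# x="Name" takes the x-axis values from that column instead of the index
ax = df.plot(x="Name", xlabel="Names of Students", ylabel="Marks Obtained",
             title="Marks obtained by students during the year", linestyle="dashed")

buf = io.BytesIO()
ax.figure.savefig(buf, format="png")
```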

Example 2 :To plot a bar graph to show the marks obtained by students in a year.
df.plot(kind="bar",xlabel="Names of Students",ylabel="Marks Obtained",title="Marks obtained by students during the year",edgecolor="green",linestyle="dashdot",linewidth=2)

Output :

Example 3 :To plot a bar graph to show the marks obtained by students in term 1 exam.
df["Term1"].plot(kind="bar",xlabel="Names of Students",ylabel="Marks Obtained",title="Marks obtained by students in Term 1")

Output :

Example 4 : To plot a histogram to show the heights of various students of a class


import pandas as pd
import matplotlib.pyplot as plt
Series=pd.Series([162,160,165,175,150,165,150,163,152,155])
Series.plot(kind="hist", bins=5,title="Height of Students in Class")

Output :
By default the y-axis shows the label Frequency.
To give names to the x-axis and y-axis, we can add the following lines:

plt.xlabel("Height")
plt.ylabel("Number of Students")
plt.show()

Output :
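Because Series.plot( ) returns the matplotlib Axes it drew on, the labels can also be set through that object with set_xlabel( ) and set_ylabel( ). A sketch with the same heights (the variable is called heights here instead of Series, which is only a naming choice) that saves to an in-memory PNG instead of calling plt.show():

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

heights = pd.Series([162, 160, 165, 175, 150, 165, 150, 163, 152, 155])

# plot() returns the Axes; its set_xlabel / set_ylabel do the job of plt.xlabel / plt.ylabel
ax = heights.plot(kind="hist", bins=5, title="Height of Students in Class")
ax.set_xlabel("Height")
ax.set_ylabel("Number of Students")  # replaces the default label "Frequency"

buf = io.BytesIO()
ax.figure.savefig(buf, format="png")
```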
