CSK W Data Visualization
Data Visualization refers to the graphical representation of information and data in the form of
charts, graphs, maps, etc. It helps to reveal trends, patterns, correlations, etc. in the data and thereby
helps decision makers understand the meaning of the data and use it to drive business decisions.
For data visualization in Python, the Pyplot interface of the Matplotlib library is used.
Using the pyplot module we can plot data as a line chart, bar chart, histogram, etc.
Creating Line Graph - Generally line graphs are drawn to track the changes over a period of time.
Example 1
import matplotlib.pyplot as plt
xpoints=[5,10]
ypoints=[5,7]
plt.plot(xpoints,ypoints)
plt.show()
or
import numpy as np
xaxis=np.array([5,10])
yaxis=np.array([5,7])
plt.plot(xaxis,yaxis)
plt.show()
Output:
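plt.plot( ) returns a list containing one Line2D object for every line it draws, and each Line2D keeps the points it joins. A minimal sketch that reads those points back to confirm what was plotted (the variable names lines, line and points are only illustrative):

```python
import matplotlib.pyplot as plt

xpoints = [5, 10]
ypoints = [5, 7]

plt.figure()                          # start a fresh figure
lines = plt.plot(xpoints, ypoints)    # list holding one Line2D object

# The Line2D object stores the x and y values of the points it joins
line = lines[0]
points = list(zip(line.get_xdata(), line.get_ydata()))
print("Lines drawn:", len(lines))
print("Points joined:", points)
plt.show()
```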
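The various parameters that can be used with the plot function:
marker / markersize – marker sets the symbol drawn at each data point (for example "*", "o", "X", "+", "s") and markersize sets the size of that symbol.
xaxis=np.array([5,8,11,15])
yaxis=np.array([3,8,1,10])
plt.plot(xaxis,yaxis,marker="*",markersize=10)
plt.show()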
color – Specifies the colour of the line. The colours you can use:
"b" – Blue
"g" – Green
"r" – Red
"k" – Black
"w" – White
"y" – Yellow
"m" – Magenta
"c" – Cyan
xaxis=np.array([5,8,11,15])
yaxis=np.array([3,8,1,10])
plt.plot(xaxis,yaxis,marker="*",color="r")
plt.show()
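The single-letter colour codes can also be combined with a marker symbol and a line style in one short format string passed right after the y values; for example "r*--" means red, star markers, dashed line. A minimal sketch reusing the xaxis/yaxis values from the example above (the names line and colour are illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_hex

xaxis = np.array([5, 8, 11, 15])
yaxis = np.array([3, 8, 1, 10])

plt.figure()
# "r*--" = colour "r" (red) + marker "*" + linestyle "--" (dashed)
line, = plt.plot(xaxis, yaxis, "r*--")

colour = to_hex(line.get_color())     # "r" is pure red, i.e. "#ff0000"
print(colour, line.get_marker(), line.get_linestyle())
plt.show()
```

The long form plt.plot(xaxis,yaxis,color="r",marker="*",linestyle="dashed") draws the same line; the format string is just shorter.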
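xlabel( ) / ylabel( ) – Set the labels for the x-axis and y-axis. (The example also uses linestyle, which sets the pattern of the line such as "solid", "dashed", "dotted" or "dashdot", and linewidth, which sets the thickness of the line.)
time=[2.5,5,7,10.25]
distance=[5,10,15,20]
plt.plot(time,distance,marker="X",markersize=10,linestyle="dotted",linewidth=2,color="m")
plt.xlabel("Time")
plt.ylabel("Distance")
plt.show( )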
title( ) – We can specify a title for the graph. It has a parameter named loc which can be "left", "right" or "center". By default it is "center".
time=[2.5,5,7,10.25]
distance=[5,10,15,20]
plt.xlabel("Time")
plt.ylabel("Distance")
plt.title("Distance-Time Graph")
plt.plot(time,distance,marker="X",markersize=10,linestyle="dotted",linewidth=2,color="m")
plt.show( )
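A minimal sketch of the loc parameter of title( ): the title is placed over the left edge, and the current Axes (plt.gca( )) can be asked which title sits in each position. The names left_title and centre_title are illustrative:

```python
import matplotlib.pyplot as plt

time = [2.5, 5, 7, 10.25]
distance = [5, 10, 15, 20]

plt.figure()
plt.plot(time, distance, marker="X")
# loc="left" places the title above the left edge of the plot
plt.title("Distance-Time Graph", loc="left")

ax = plt.gca()                          # the current Axes
left_title = ax.get_title(loc="left")
centre_title = ax.get_title()           # default position is "center"
print(repr(left_title), repr(centre_title))
plt.show()
```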
grid( ) – Adds grid lines to the plot. It has a parameter named axis with which you can specify which grid lines are to be displayed: axis="x" draws vertical grid lines at the x-axis ticks and axis="y" draws horizontal grid lines at the y-axis ticks. By default (axis="both") grid lines are shown for both axes.
time=[2.5,5,7,10.25]
distance=[5,10,15,20]
plt.plot(time,distance,marker="+",markersize=10,linestyle="dashed",linewidth=2,color="m")
plt.xlabel("Time")
plt.ylabel("Distance")
plt.title("Distance-Time Graph")
plt.grid( )
plt.show( )
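A minimal sketch of grid(axis="y"), which keeps only the horizontal grid lines; it then checks the visibility of the grid lines stored on each axis (the names y_grid_on and x_grid_on are illustrative):

```python
import matplotlib.pyplot as plt

time = [2.5, 5, 7, 10.25]
distance = [5, 10, 15, 20]

plt.figure()
plt.plot(time, distance, marker="+", linestyle="dashed")
# Only horizontal grid lines, drawn at the y-axis tick positions
plt.grid(axis="y")

ax = plt.gca()
y_grid_on = all(g.get_visible() for g in ax.yaxis.get_gridlines())
x_grid_on = any(g.get_visible() for g in ax.xaxis.get_gridlines())
print("y grid:", y_grid_on, "x grid:", x_grid_on)
plt.show()
```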
legend( ) – When we plot more than one line graph, we specify the legend. The loc parameter can have values such as "upper left", "upper right", "upper center", "lower left", "lower right", "lower center" and "best". By default loc is "best", which places the legend where it overlaps the plotted data the least.
time=[2.5,5,7,10.25]
distanceA=[5,10,15,20]
distanceB=[4,7,9,12]
plt.plot(time,distanceA,marker="o",linewidth=2,color="g")
plt.plot(time,distanceB,marker="o",linewidth=2,color="y")
plt.xlabel("Time")
plt.ylabel("Distance")
plt.legend(["Car A","Car B"],loc="upper left")
plt.show( )
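Instead of passing a list of names to legend( ), each line can carry its own name through the label parameter of plot( ), which keeps every name next to the line it describes. A minimal sketch with the Car A/Car B data (legend_text is an illustrative name):

```python
import matplotlib.pyplot as plt

time = [2.5, 5, 7, 10.25]
distanceA = [5, 10, 15, 20]
distanceB = [4, 7, 9, 12]

plt.figure()
# Each line carries its own label, so legend() needs no list of names
plt.plot(time, distanceA, marker="o", color="g", label="Car A")
plt.plot(time, distanceB, marker="o", color="y", label="Car B")
plt.legend(loc="lower right")

legend_text = [t.get_text() for t in plt.gca().get_legend().get_texts()]
print(legend_text)   # ['Car A', 'Car B']
plt.show()
```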
xticks( ) / yticks( ) – Specify the divisions (tick positions) to be shown on the x-axis and y-axis.
time=[2.5,5,7,10.25]
distance=[5,10,15,20]
plt.plot(time,distance,marker="o",linewidth=2,color="g")
plt.xlabel("Time")
plt.ylabel("Distance")
plt.xticks([3,6,9,12])
plt.yticks([4,8,12,16,20])
plt.show( )
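xticks( ) also accepts a second list with the text to show at each position, and called with no arguments it returns the current positions and labels. A minimal sketch (the "3 h", "6 h", ... labels are illustrative):

```python
import matplotlib.pyplot as plt

time = [2.5, 5, 7, 10.25]
distance = [5, 10, 15, 20]

plt.figure()
plt.plot(time, distance, marker="o", color="g")
# First list: tick positions; second list (optional): text shown at them
plt.xticks([3, 6, 9, 12], ["3 h", "6 h", "9 h", "12 h"])
plt.yticks([4, 8, 12, 16, 20])

# With no arguments, xticks()/yticks() return (positions, label objects)
xlocs, xlabels = plt.xticks()
ylocs, _ = plt.yticks()
xtick_text = [lab.get_text() for lab in xlabels]
print(list(xlocs), xtick_text)
plt.show()
```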
To display multiple line graphs on one plot:
x1=[2,4,6,8]
y1=[3,7,10,12]
x2=[1,3,5,7]
y2=[2,5,8,10]
plt.plot(x1,y1,marker="s")
plt.plot(x2,y2,marker="s")
plt.show()
or
plt.plot(x1,y1,x2,y2,marker="s")
plt.show()
Output:
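In the single-call form, plot( ) still creates a separate line for every (x, y) pair, the marker keyword applies to all of them, and each line takes the next colour from the default colour cycle. A quick check of this:

```python
import matplotlib.pyplot as plt

x1 = [2, 4, 6, 8]
y1 = [3, 7, 10, 12]
x2 = [1, 3, 5, 7]
y2 = [2, 5, 8, 10]

plt.figure()
# One call, two (x, y) pairs: plot() returns one Line2D per pair
lines = plt.plot(x1, y1, x2, y2, marker="s")
print("Lines drawn:", len(lines))
print("Markers:", [ln.get_marker() for ln in lines])
print("Colours:", [ln.get_color() for ln in lines])
plt.show()
```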
Creating Bar Graph – A bar graph is generally used to compare quantities between different groups or categories.
cls=[9,10,11,12]
students=[170,160,107,120]
plt.bar(cls,students,width=0.3,color="c")
plt.xlabel("Class")
plt.ylabel("No. Of Students")
plt.xticks(cls)
plt.show()
Output :
cls=[9,10,11,12]
boys=[70,95,97,50]
girls=[100,65,85,70]
plt.xlabel("Class")
plt.ylabel("No. Of Students")
plt.bar(cls,boys)
plt.bar(cls,girls)
plt.xticks(cls)
plt.show()
Output :
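Note : Since the boys bars are drawn first (in blue, the default first colour) and the girls bars are drawn
afterwards at the same x positions, the girls bars are drawn over the boys bars. A boys bar shows above a
girls bar only where the number of boys is more than the number of girls, so the graph looks stacked even though it is not.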
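If a real stacked bar graph is wanted, the bottom parameter of plt.bar( ) can start each girls bar where the boys bar ends, so no bar hides another. A minimal sketch using the same data (the names girl_bars and tops are illustrative):

```python
import matplotlib.pyplot as plt

cls = [9, 10, 11, 12]
boys = [70, 95, 97, 50]
girls = [100, 65, 85, 70]

plt.figure()
plt.bar(cls, boys, label="Boys")
# bottom=boys starts each girls bar on top of the boys bar,
# so the full height of each stack is boys + girls
girl_bars = plt.bar(cls, girls, bottom=boys, label="Girls")
plt.xlabel("Class")
plt.ylabel("No. Of Students")
plt.xticks(cls)
plt.legend()

# Top edge of every girls bar = total number of students in that class
tops = [bar.get_y() + bar.get_height() for bar in girl_bars]
print(tops)
plt.show()
```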
cls=[9,10,11,12]
boys=[70,65,77,50]
girls=[100,95,85,70]
plt.xlabel("Class")
plt.ylabel("No. Of Students")
N=np.arange(4)
plt.bar(N,boys,width=0.3)
plt.bar(N+0.3,girls,width=0.3)
plt.xticks(N,cls)
plt.legend(["Boys","Girls"])
plt.show()
Output :
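Note : In this method we shift the x positions of the girls bars by one bar width (N+0.3), so that the two sets of bars
stand side by side as a double bar graph, and then use plt.xticks(N,cls) to replace the positions 0, 1, 2, 3 with the class labels.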
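With plt.xticks(N,cls) each class label sits under the boys bar. Bars are centred on their x values by default (align="center"), so a label placed at N + width/2 lands midway between the two bars of its pair. A minimal sketch of that arithmetic (girls_x and tick_x are illustrative names):

```python
import numpy as np

cls = [9, 10, 11, 12]
width = 0.3
N = np.arange(len(cls))       # 0, 1, 2, 3: one slot per class

boys_x = N                    # boys bars are centred at 0, 1, 2, 3
girls_x = N + width           # girls bars sit one bar width to the right
tick_x = N + width / 2        # midpoint of each pair: where the label goes

for label, b, g, t in zip(cls, boys_x, girls_x, tick_x):
    print(f"Class {label}: boys bar at {b}, girls bar at {g:.2f}, label at {t:.2f}")
```

Writing plt.xticks(N+0.15,cls) in the double bar graph would therefore centre the labels; the DataFrame example with width 0.4 uses the same idea with N+0.2.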
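Creating Histogram – A histogram shows how numeric data is distributed by dividing the values into ranges (bins) and drawing a bar for the number of values in each bin.
Example 1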
marks=[50,25,35,25,76,50,35,25,30,45,40,60,50,75,45,48,30,33,78]
plt.hist(marks)
plt.ylabel("No. of Students")
plt.show()
Output :
By default the bins parameter is 10, i.e. the range of the data (from the smallest to the largest value) is divided into 10 intervals of equal width.
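plt.hist( ) computes its bins and counts with NumPy's np.histogram( ), so calling np.histogram( ) directly shows exactly how the marks are grouped when bins=10. A minimal sketch:

```python
import numpy as np

marks = [50,25,35,25,76,50,35,25,30,45,40,60,50,75,45,48,30,33,78]

# bins=10: the range 25..78 is split into 10 intervals of equal width
counts, edges = np.histogram(marks, bins=10)

print("Bin width:", round(edges[1] - edges[0], 2))   # (78 - 25) / 10 = 5.3
for left, right, c in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} - {right:5.1f}: {c}")
print(counts.tolist())   # [5, 3, 1, 2, 4, 0, 1, 0, 0, 3]
```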
Example 2
marks=[50,25,35,25,76,50,35,25,30,45,40,60,50,75,45,48,30,33,78]
plt.ylabel("No. of Students")
plt.hist(marks,bins=5)
plt.show()
Output:
Example 3:
marks=[50,25,35,25,76,50,35,25,30,45,40,60,50,75,45,48,30,33,78]
plt.ylabel("No. of Students")
plt.hist(marks,bins=[25,35,45,55,65,75,85])
plt.show()
Output :
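Note : When we specify a list of values for the bins parameter, those values are used as the bin edges, so here the bins are 25-35, 35-45, 45-55, 55-65, 65-75 and 75-85. Each bin includes its left edge but not its right edge, except the last bin, which includes both.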
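A minimal sketch that counts the marks in the bins of Example 3 with np.histogram( ), the function plt.hist( ) uses internally; note that 35 falls in 35-45 and 45 in 45-55 because only the left edge is included:

```python
import numpy as np

marks = [50,25,35,25,76,50,35,25,30,45,40,60,50,75,45,48,30,33,78]
edges = [25, 35, 45, 55, 65, 75, 85]

# Each bin includes its left edge but not its right edge,
# except the last bin (75-85), which includes both
counts, _ = np.histogram(marks, bins=edges)
for left, right, c in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {c}")
print(counts.tolist())   # [6, 3, 6, 1, 0, 3]
```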
Example 4:
marks=[50,25,35,25,76,50,35,25,30,45,40,60,50,75,45,48,30,33,78]
plt.ylabel("No. of Students")
plt.hist(marks,orientation="horizontal",histtype="step")
plt.show()
Output :
Note : By default the bars of the histogram are vertical (orientation="vertical") and histtype="bar", but histtype
can also be "barstacked", "step" or "stepfilled".
Note : You can plot a line graph by passing only one sequence of values. Those values are used for the y-axis,
and the x-axis values automatically become 0, 1, 2, 3, … as shown in the example below:
y=np.array([3,8,1,10])
plt.plot(y,marker="*",markersize=10)
plt.show()
Output:
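A minimal sketch confirming which x values pyplot generates when only the y values are given:

```python
import matplotlib.pyplot as plt
import numpy as np

y = np.array([3, 8, 1, 10])

plt.figure()
line, = plt.plot(y, marker="*", markersize=10)
# With only y given, the x values are generated as 0, 1, ..., len(y)-1
print(line.get_xdata().tolist())   # [0.0, 1.0, 2.0, 3.0]
plt.show()
```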
Plotting a line graph using values from a Series
Example 1 :
#plotting a line graph using Series with default index
import pandas as pd
Sales1=pd.Series([1000,3300,1800,4500])
print(Sales1)
Output:
plt.plot(Sales1)
plt.show()
Output:
By default the index of the Series is taken as the x-axis; since no index was specified, the default
index 0, 1, 2, 3 is used.
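A minimal sketch showing that plotting a Series is the same as plotting its index against its values, i.e. plt.plot(Sales1.index, Sales1.values):

```python
import matplotlib.pyplot as plt
import pandas as pd

Sales1 = pd.Series([1000, 3300, 1800, 4500])

plt.figure()
# matplotlib takes Sales1.index for x and the Series values for y
line, = plt.plot(Sales1)
print(line.get_xdata().tolist(), line.get_ydata().tolist())
plt.show()
```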
Example 2:
#plotting a line graph using Series with labelled index
import pandas as pd
Sales1=pd.Series([1000,3300,1800,4500],index=["Week1","Week2","Week3","Week4"])
print(Sales1)
Output :
plt.plot(Sales1)
plt.show()
Output:
Example 1 :
#plotting a bar graph using columns of a dataframe
import pandas as pd
D={"Name":["Anjali","Pooja","Rohan","Tara","Varun"],"Test1":[55,68,79,35,88],"Class":["XII-A","XII-C","XII-A","XII-B","XII-C"]}
df=pd.DataFrame(D)
print(df)
Output :
plt.bar(df.Name,df.Test1)
plt.xlabel("Students")
plt.ylabel("Marks of Test 1")
plt.show()
Output:
Example 2 :
#plotting a bar graph using columns of a dataframe
import pandas as pd
D={"Name":["Anjali","Pooja","Rohan","Tara","Varun"],"Test1":[55,68,79,35,88],"Test2":[63,58,70,40,90]}
df=pd.DataFrame(D)
print(df)
Output :
N=np.arange(len(D["Name"]))
plt.bar(N,df.Test1,width=0.4,label="Test 1")
plt.bar(N+0.4,df.Test2,width=0.4,label="Test 2")
plt.xticks(N+0.2,D["Name"])
plt.legend(loc="upper left")
plt.show()
Output :
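pandas can also draw this double bar graph directly: when the names are made the row index, DataFrame.plot(kind="bar") places one bar per numeric column side by side for each student and adds the legend itself. A minimal sketch with the same data (this relies on pandas' default Matplotlib plotting backend):

```python
import matplotlib.pyplot as plt
import pandas as pd

D = {"Name": ["Anjali", "Pooja", "Rohan", "Tara", "Varun"],
     "Test1": [55, 68, 79, 35, 88],
     "Test2": [63, 58, 70, 40, 90]}
df = pd.DataFrame(D)

# Name becomes the row index, so each student gets one group of bars
# and each numeric column (Test1, Test2) one bar inside the group
ax = df.set_index("Name").plot(kind="bar", rot=0)
ax.set_ylabel("Marks")
legend_text = [t.get_text() for t in ax.get_legend().get_texts()]
print(len(ax.patches), legend_text)
plt.show()
```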
Example 1: To plot a line graph to show the marks obtained by students in a year.
df.plot(xlabel="Names of Students",ylabel="Marks Obtained",title="Marks obtained by students during the year",linestyle="dashed")
Output:
Note : The row index automatically becomes the x-axis. If the row index holds the default positional values
and the DataFrame has a column named Name, then to get the names on the x-axis we can write:
df.plot(x="Name",xlabel="Names of Students",ylabel="Marks Obtained",title="Marks obtained by students during the year",linestyle="dashed")
Example 2 :To plot a bar graph to show the marks obtained by students in a year.
df.plot(kind="bar",xlabel="Names of Students",ylabel="Marks Obtained",title="Marks obtained by students during the year",edgecolor="green",linestyle="dashdot",linewidth=2)
Output :
Example 3 :To plot a bar graph to show the marks obtained by students in term 1 exam.
df["Term1"].plot(kind="bar",xlabel="Names of Students",ylabel="Marks Obtained",title="Marks obtained by students in Term 1")
Output :
Output :
By default the y-axis of a histogram shows the label "Frequency".
To give names to the x-axis and y-axis we can add the following lines:
plt.xlabel("Height")
plt.ylabel("Number of Students")
Output :
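A minimal sketch of this behaviour, assuming a pandas Series of heights (the values are made up for illustration): pandas labels the y-axis of a histogram "Frequency", and plt.xlabel( )/plt.ylabel( ) called afterwards replace the labels on the same axes.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical heights in cm, used only for this sketch
heights = pd.Series([150, 152, 155, 158, 160, 160, 163, 165, 168, 172])

ax = heights.plot(kind="hist", bins=5, edgecolor="black")
default_ylabel = ax.get_ylabel()      # label set by pandas for histograms
plt.xlabel("Height")
plt.ylabel("Number of Students")
print(default_ylabel, "->", ax.get_ylabel())
plt.show()
```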