Pyth

This Python program performs data visualization using the matplotlib library. It creates line plots showing the salaries of data scientists and software engineers at different experience levels. It demonstrates various plotting options like adding titles, labels, legends, changing line styles, colors and widths, adding markers and grids. It also shows stacking multiple plots and using different plotting styles.

Uploaded by

Minal Joshi

DEPARTMENT OF COMPUTER SCIENCE AND

ENGINEERING (DATA SCIENCE)


COURSE CODE: DJS22DSL305 DATE: 1/12/2023
COURSE NAME: Python Laboratory CLASS: SYBTECH
Name: Minal Joshi SAP ID: 60009220180

EXPERIMENT NO. 8

CO/LO:
CO5: Apply various advanced modules of Python for data analysis.
AIM / OBJECTIVE: Write a Python program to perform visualization using matplotlib.
DESCRIPTION OF EXPERIMENT:

Importing Libraries
In [1]:
from matplotlib import pyplot as plt

In [2]:
import seaborn as sns

In [3]:
from matplotlib import font_manager as fm

In [4]:
import pandas as pd

In [5]:
import numpy as np

In [6]:
from datetime import datetime, timedelta  # used for the time series examples

Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.

Matplotlib makes easy things easy and hard things possible.

Official Page of Matplotlib: https://matplotlib.org/stable/index.html


Pyplot
Pyplot is a collection of functions that make matplotlib work like MATLAB. Each pyplot function makes
some change to a figure, for example creating a plotting area or drawing lines in it.

In [7]:

x = [0,2,4,5,6,7,8,9,10]
y = [60,13,45,29,48,77,102,95,58]

In [8]:

plt.plot(x, y)
plt.show()

Line Plot
A line plot displays data as a series of points connected by straight line segments, showing how one
variable changes with respect to another.

We will plot data scientists' salaries with respect to their years of experience.

In [9]:

experience = [1,3,4,5,7,8,10,12]

salary = [6500, 9280, 12050, 13200, 16672, 21000, 23965, 29793]

In [10]:

plt.plot(experience,salary)
plt.show()

Adding a Title

In [11]:

plt.plot(experience,salary)
plt.title("Salary of Data Scientists by their experiences")
plt.show()

Adding Labels to x and y

In [12]:

plt.plot(experience,salary)
plt.title("Salary of Data Scientists by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.show()

Plotting Multiple Graphs in One Graph

We will also add software engineer's salary to our graph.


In [13]:

experience = [1,3,4,5,7,8,10,12]
data_scientists_salary = [6500, 9280, 12050, 13200, 16672, 21000, 23965,
29793]
software_engineers_salary = [9020, 12873, 15725, 18000, 19790, 20196,
25769,32000 ]

In [14]:

plt.plot(experience,data_scientists_salary)
plt.plot(experience,software_engineers_salary)
plt.title("Salary of Data Scientists and Software Engineers by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.show()

We cannot tell which line represents which group, so we need to add a legend. We can pass the labels to
legend() as a list, or we can set a label on each line when we plot it.

In [15]:

plt.plot(experience,data_scientists_salary)
plt.plot(experience,software_engineers_salary)
plt.title("Salary of Data Scientists and Software Engineers by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend(["Data Scientists","Software Engineers"])
plt.show()

In [16]:
plt.plot(experience,data_scientists_salary, label= "Data Scientists")
plt.plot(experience,software_engineers_salary, label= "Software Engineers" )
plt.title("Salary of Data Scientists and Software Engineers by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend()
plt.show()

We can also change the location of legend with loc argument.

In [17]:

plt.plot(experience,data_scientists_salary, label= "Data Scientists")


plt.plot(experience,software_engineers_salary, label= "Software Engineers" )
plt.title("Salary of Data Scientists and Software Engineers by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend(loc="lower right")
plt.show()

A format string consists of a part for color, marker and line:


In [18]: fmt = '[marker][line][color]'

Each of them is optional. If not provided, the value from the style cycle is used. Exception: If line is given,
but no marker, the data will be a line without markers.
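
As a minimal sketch of how such format strings are read, the block below plots two lines using only format strings and then inspects the properties Matplotlib derived from them. The data values and the two format strings are illustrative assumptions, not part of the experiment.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; omit in a notebook
from matplotlib import pyplot as plt

experience = [1, 3, 4, 5]              # illustrative values
salary = [6500, 9280, 12050, 13200]    # illustrative values

# "o--r": circle marker, dashed line, red colour
(dashed_red,) = plt.plot(experience, salary, "o--r")

# "g:": dotted line, green colour, no marker given -> a line without markers
(dotted_green,) = plt.plot(experience, [s + 1000 for s in salary], "g:")

# Inspect what Matplotlib parsed out of each format string
print(dashed_red.get_marker(), dashed_red.get_linestyle(), dashed_red.get_color())
print(dotted_green.get_linestyle(), dotted_green.get_color())

plt.close()
```

The same settings can be given as the keyword arguments marker, linestyle and color, which is usually easier to read once more than one or two properties are involved.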

We can also specify the arguments:

In [19]:

plt.plot(experience,data_scientists_salary,color="r", label= "Data Scientists")
plt.plot(experience,software_engineers_salary, color="g", label= "Software Engineers" )
plt.title("Salary of Data Scientists and Software Engineers by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend()
plt.show()

In [20]:

#We can also give each line a different line style
plt.plot(experience,data_scientists_salary,color="r", linestyle="--", label= "Data Scientists")
plt.plot(experience,software_engineers_salary, color="g",linestyle=':', label= "Software Engineers" )
plt.title("Salary of Data Scientists and Software Engineers by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend()
plt.show()

In [21]:

#We can also add markers


plt.plot(experience,data_scientists_salary,color="r", linestyle="--",marker="o", label= "Data Scientists")
plt.plot(experience,software_engineers_salary, color="g",linestyle=':',marker=".", label= "Software Engineers" )
plt.title("Salary of Data Scientists and Software Engineers by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend()
plt.show()

In [22]:

#We can also adjust line width by using linewidth argument.

plt.plot(experience,data_scientists_salary,color="r", linestyle="--",linewidth=6,marker="o", label= "Data Scientists")
plt.plot(experience,software_engineers_salary, color="g",linestyle=':',marker=".",linewidth=6, label= "Software Engineers" )
plt.title("Salary of Data Scientists and Software Engineers by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend()
plt.show()

tight_layout automatically adjusts subplot parameters so that the subplot(s) fit into the figure area. This is an
experimental feature and may not work in some cases. It only checks the extents of tick labels, axis labels,
and titles.
For more examples and details:
https://matplotlib.org/stable/tutorials/intermediate/tight_layout_guide.html

In [23]:

#tight_layout() adjusts the padding around the plot

plt.plot(experience,data_scientists_salary,color="r", linestyle="--",linewidth=6,marker="o", label= "Data Scientists")
plt.plot(experience,software_engineers_salary, color="g",linestyle=':',marker=".",linewidth=6, label= "Software Engineers" )
plt.title("Salary of Data Scientists and Software Engineers by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend()
plt.tight_layout()
plt.show()

In [24]:

#We can also add a grid by calling plt.grid(True)

plt.plot(experience,data_scientists_salary,color="r", linestyle="--",linewidth=6,marker="o", label= "Data Scientists")
plt.plot(experience,software_engineers_salary, color="g",linestyle=':',marker=".",linewidth=6, label= "Software Engineers" )
plt.title("Salary of Data Scientists and Software Engineers by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend()
plt.tight_layout()
plt.grid(True)
plt.show()

We can fill the area below a line by using stackplot.

In [25]:
plt.stackplot(experience,data_scientists_salary, colors=["g"])  # colors takes a list, one entry per layer
plt.title("Salary of Data Scientists by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")

plt.show()

We can change the style of the plots. In order to see all available styles:

In [26]:
plt.style.available
Out[26]:

['Solarize_Light2',
'_classic_test_patch',
'bmh',
'classic',
'dark_background',
'fast',
'fivethirtyeight',
'ggplot',
'grayscale',
'seaborn',
'seaborn-bright',
'seaborn-colorblind',
'seaborn-dark',
'seaborn-dark-palette',
'seaborn-darkgrid',
'seaborn-deep',
'seaborn-muted',
'seaborn-notebook',
'seaborn-paper',
'seaborn-pastel',
'seaborn-poster',
'seaborn-talk',
'seaborn-ticks',
'seaborn-white',
'seaborn-whitegrid',
'tableau-colorblind10']

In [27]:
plt.style.use('dark_background')
plt.plot(experience,data_scientists_salary,color="r", linestyle="--",linewidth=6,marker="o", label= "Data Scientists")
plt.plot(experience,software_engineers_salary, color="g",linestyle=':',marker=".",linewidth=6, label= "Software Engineers" )
plt.title("Salary of Data Scientists and Software Engineers by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend()
plt.tight_layout()
plt.grid(True)
plt.show()
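
Note that plt.style.use() changes the style for every plot drawn afterwards. As a hedged sketch, assuming we only want the style for a single figure, the plt.style.context() context manager applies a style temporarily and restores the previous settings on exit. The data here is illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; omit in a notebook
from matplotlib import pyplot as plt

# Remember the background colour that is active before the style is applied
default_face = plt.rcParams["axes.facecolor"]

with plt.style.context("dark_background"):
    # Inside the block the style is active
    inside_face = plt.rcParams["axes.facecolor"]
    plt.plot([1, 2, 3], [2, 4, 8], label="demo")  # illustrative data
    plt.legend()
    plt.close()

# After the block the previous settings are back
restored_face = plt.rcParams["axes.facecolor"]
print(default_face, inside_face, restored_face)
```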

We can save figures by using the savefig() function. It must be called before show().

In [28]:
# In Matplotlib 3.6 and later the seaborn-* styles are named seaborn-v0_8-*, e.g. 'seaborn-v0_8-dark'
plt.style.use('seaborn-dark')

plt.plot(experience,data_scientists_salary,color="r", linestyle="--",linewidth=6,marker="o", label= "Data Scientists")
plt.plot(experience,software_engineers_salary, color="g",linestyle=':',marker=".",linewidth=6, label= "Software Engineers" )
plt.title("Salary of Data Scientists and Software Engineers by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend()
plt.tight_layout()
plt.grid(True)
plt.savefig("plot1.png")
plt.show()

Bar Plot
A barplot (or barchart) is one of the most common types of graphic. It shows the relationship between a
numeric and a categoric variable. Each entity of the categoric variable is represented as a bar. The size of the
bar represents its numeric value.

In [29]:
x = ["A", "B", "C", "D"]
y = [3, 8, 1, 10]

In [30]:
plt.bar(x,y)
plt.show()

In [31]:

experience = [1,2,3,4,5,6,7,8]
data_scientists_salary = [6500, 9280, 12050, 13200, 16672, 21000, 23965,
29793]

In [32]:
plt.style.use('seaborn-paper')
plt.bar(experience,data_scientists_salary,color="b")
plt.title("Salary of Data Scientists")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.tight_layout()
plt.grid(False)

plt.show()
We can combine bar and line plot.

In [33]: experience = [1,2,3,4,5,6,7,8]


data_scientists_salary = [6500, 9280, 12050, 13200, 16672, 21000, 23965,
29793]
software_engineers_salary = [9020, 12873, 15725, 18000, 19790, 20196,
25769,32000 ]

In [34]:
plt.style.use('tableau-colorblind10')
plt.bar(experience,data_scientists_salary,color="r", label= "Data Scientists")
plt.plot(experience,software_engineers_salary, color="g",label= "Software Engineers" )
plt.title("Salary of Data Scientists and Software Engineers by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend()
plt.grid(False)
plt.show()

We can specify the width with width argument.

In [35]:
width = 0.2
plt.style.use('tableau-colorblind10')
plt.bar(experience,data_scientists_salary,color="m",width=width, label= "Data Scientists")
plt.title("Salary of Data Scientists by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend()
plt.grid(False)
plt.show()

We can also plot multiple bar plots.

In [36]: plt.style.use("fivethirtyeight")

plt.bar(experience,software_engineers_salary, color="g",linewidth=3,label= "Software Engineers" )
plt.bar(experience,data_scientists_salary,color="r",linewidth=3, label= "Data Scientists")
plt.title("Salary of Data Scientists and Software Engineers by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend()
plt.grid(False)
plt.show()

Because both sets of bars are drawn at the same x positions, they overlap and the bars in front hide part of
those behind. We can shift the bars so that they sit side by side.

In [37]:
experience_indexes = np.arange(len(experience))

In [38]:

experience_indexes

Out[38]:
array([0, 1, 2, 3, 4, 5, 6, 7])

In [39]:
plt.style.use("fivethirtyeight")
width = 0.4

plt.bar(experience_indexes - width,software_engineers_salary, color="g",width=width,linewidth=3,label= "Software Engineers" )
plt.bar(experience_indexes+width,data_scientists_salary,color="r",linewidth=3, width=width, label= "Data Scientists")
plt.title("Salary of Data Scientists and Software Engineers by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.legend()
plt.grid(False)
plt.show()

As you can see, the x axis now shows the bar indexes rather than the true experience values. We can fix this
with the xticks() function.

In [40]:
plt.style.use("fivethirtyeight")
width = 0.25
plt.bar(experience_indexes - width,software_engineers_salary, color="g",width=width,linewidth=3,label= "Software Engineers")
plt.bar(experience_indexes+width,data_scientists_salary,color="r",linewidth=3, width=width, label= "Data Scientists")
plt.title("Salary of Data Scientists and Software Engineers by their experiences")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.xticks(ticks=experience_indexes, labels=experience)

plt.legend()
plt.grid(True)

plt.show()
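
The offsets decide how the two bars of a group sit next to each other. As one possible variation (an assumption, not the method used in the cells above), shifting each bar by half its width instead of its full width places the two bars of a group edge to edge with no gap or overlap, because plt.bar centres each bar on its x value. The data below is an illustrative subset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; omit in a notebook
import numpy as np
from matplotlib import pyplot as plt

experience = [1, 2, 3, 4]                   # illustrative subset
software = [9020, 12873, 15725, 18000]
data_sci = [6500, 9280, 12050, 13200]

positions = np.arange(len(experience))
width = 0.4

# Each bar is centred on its x value, so +/- width/2 makes the pair touch exactly
left = plt.bar(positions - width / 2, software, width=width, color="g", label="Software Engineers")
right = plt.bar(positions + width / 2, data_sci, width=width, color="r", label="Data Scientists")

# Show the real experience values under each group
plt.xticks(ticks=positions, labels=experience)
plt.legend()

# Distance between the right edge of the left bar and the left edge of the right bar
gaps = [r.get_x() - (l.get_x() + l.get_width()) for l, r in zip(left, right)]
print(gaps)

plt.close()
```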

Pie Chart
A pie chart (or a circle chart) is a circular statistical graphic, which is divided into slices to illustrate
numerical proportion. In a pie chart, the arc length of each slice (and consequently its central angle and
area), is proportional to the quantity it represents.

In [41]:

experience = [1,2,3,4,5,6,7,8]
data_scientists_salary = [6500, 9280, 12050, 13200, 16672, 21000, 23965,
29793]
software_engineers_salary = [9020, 12873, 15725, 18000, 19790, 20196,
25769,32000 ]

In [42]:
plt.title("Pie Chart Example")
slices = [60,40]
plt.pie(slices)
plt.tight_layout()
plt.show()

The values of the wedges do not have to sum to 100. The size of each wedge is determined by comparing
its value with all the other values, using this formula:

The value divided by the sum of all values: x/sum(x)
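
The formula can be checked by hand. The short sketch below computes the fraction and the central angle of each wedge for the list used in the next cell; the variable names are chosen here for illustration.

```python
slices = [40, 56, 72, 38, 4]

total = sum(slices)

# Fraction of the whole pie taken by each wedge: x / sum(x)
fractions = [value / total for value in slices]

# Central angle of each wedge in degrees
angles = [360 * fraction for fraction in fractions]

for value, fraction, angle in zip(slices, fractions, angles):
    print(f"value={value:3d}  fraction={fraction:.3f}  angle={angle:6.1f} degrees")
```

The fractions always add up to 1 and the angles to 360 degrees, whatever the sum of the input values is.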


In [43]:
list_1 = [40,56,72,38,4]
plt.pie(list_1)
plt.show()

We can add labels.

In [44]:

incomes = [40,56,72,38,4]
persons = ["Josh","Berkay","Maria","Michael","Anastacia"]
plt.pie(incomes,labels=persons)

plt.show()

By default, the first wedge starts at the positive x-axis and the wedges are drawn counterclockwise.
You can change the start angle with the startangle parameter, which is given in degrees. The default
angle is 0.

In [45]:

incomes = [40,56,72,38,4]
persons = ["Josh","Berkay","Maria","Michael","Anastacia"]
plt.pie(incomes,labels=persons,startangle=180)
plt.show()

If we want one of the wedges to stand out, we can use the explode parameter. It takes an array with one
value for each wedge, giving how far that wedge is pushed out from the centre.

In [46]:

incomes = [40,56,72,38,4]
persons = ["Josh","Berkay","Maria","Michael","Anastacia"]
myexplode = [0,0.2, 0, 0, 0]
plt.pie(incomes,labels=persons,startangle=180,explode = myexplode)
plt.show()

We can also change the size of the chart with the figsize argument of plt.figure().

In [47]:

plt.figure(figsize=(10,10))

plt.rcParams['font.size'] = 20
incomes = [40,56,72,38,4]
persons = ["Josh","Berkay","Maria","Michael","Anastacia"]
myexplode = [0,0.2, 0, 0, 0]
plt.pie(incomes,labels=persons,startangle=180,explode = myexplode)
plt.show()

We can also add shadows by making shadow argument True.

In [48]:

plt.figure(figsize=(10,10))

plt.rcParams['font.size'] = 20

incomes = [40,56,72,38,4]

persons = ["Josh","Berkay","Maria","Michael","Anastacia"]
myexplode = [0,0.2, 0, 0, 0]
plt.pie(incomes,labels=persons,startangle=180,explode = myexplode,shadow=True)
plt.show()

We can also set the color of each wedge with the colors parameter. Some of the possible color options are:

Shorthand   Colour
"r"         Red
"g"         Green
"b"         Blue
"c"         Cyan
"m"         Magenta
"y"         Yellow
"k"         Black
"w"         White

For more color options: https://www.w3schools.com/colors/colors_names.asp

In [49]:

plt.figure(figsize=(10,10))

plt.rcParams['font.size'] = 20

incomes = [40,56,72,38,4]
persons = ["Josh","Berkay","Maria","Michael","Anastacia"]
myexplode = [0,0.2, 0, 0, 0]
colors = ["black","g","y","hotpink","#4CAF70"]
plt.pie(incomes,labels=persons,startangle=180,explode =
myexplode,shadow=True,colors=colors)

plt.show()

In order to add a list of explanation for each wedge, we can use the legend() function.

In [50]:

plt.figure(figsize=(7,7))

incomes = [40,56,72,38,4]
persons = ["Josh","Berkay","Maria","Michael","Anastacia"]
colors = ["black","g","y","hotpink","#4CAF70"]
plt.pie(incomes,labels=persons,colors=colors)
plt.legend()
plt.show()

We can also add a title to legends by using title parameter.

In [51]:
plt.style.use("fivethirtyeight")
plt.figure(figsize=(7,7))

incomes = [40,56,72,38,4]
persons = ["Josh","Berkay","Maria","Michael","Anastacia"]
colors = ["black","g","y","hotpink","#4CAF70"]
plt.pie(incomes,labels=persons,colors=colors)
plt.legend(title="Persons")
plt.show()

We can add percentages of slices by using autopct argument.

In [52]:
plt.style.use("fivethirtyeight")
plt.figure(figsize=(7,7))

incomes = [40,56,72,38,4]
persons = ["Josh","Berkay","Maria","Michael","Anastacia"]
colors = ["black","g","y","hotpink","#4CAF70"]
plt.pie(incomes,labels=persons,colors=colors, autopct="%1.1f%%")
plt.legend(title="Persons")
plt.show()

Finally, a complete example that also sets the font properties of the labels:

In [53]:

fig = plt.figure(1, figsize=(6,6))
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
plt.title('Raining Hogs and Dogs')
labels = 'Frogs', 'Hogs', 'Dogs', 'Logs'
fracs = [15,30,45, 10]

patches, texts, autotexts = ax.pie(fracs, labels=labels, autopct='%1.1f%%')

proptease = fm.FontProperties()
proptease.set_size('xx-small')
plt.setp(autotexts, fontproperties=proptease)
plt.setp(texts, fontproperties=proptease)

plt.show()

Stack Plot
The idea of stack plots is to show “parts to a whole” over time; basically, it’s like a pie-chart, only over
time.

We can use stackplot() built-in function.

In [54]:
days = [1,2,3,4,5,6]
sleep = [6,7,5,8,6,7]
drinking_water = [2,2,1,2,1,1]
work = [5,7,10,8,6,9]
exercise = [3,3,0,1,3,2]

In [55]:

plt.style.use("fivethirtyeight")
plt.plot([],[],color='green', label='sleep', linewidth=3)
plt.plot([],[],color='blue', label='drinking_water', linewidth=3)
plt.plot([],[],color='red', label='work', linewidth=3)
plt.plot([],[],color='orange', label='exercise', linewidth=3)
plt.stackplot(days, sleep, drinking_water, work, exercise,
colors=['green','blue','red','orange'])

plt.xlabel('days')
plt.ylabel('activities')
plt.title('6 DAY ROUTINE STACK PLOT EXAMPLE')
plt.legend(loc="lower right")
plt.tight_layout()
plt.show()
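
With the default baseline, stackplot draws each layer on top of the previous ones, so the upper edge of each layer is the running total of the series up to that layer. The sketch below reproduces those running totals with NumPy for the routine data above; the array names are chosen here for illustration.

```python
import numpy as np

sleep = [6, 7, 5, 8, 6, 7]
drinking_water = [2, 2, 1, 2, 1, 1]
work = [5, 7, 10, 8, 6, 9]
exercise = [3, 3, 0, 1, 3, 2]

# One row per activity, one column per day
layers = np.vstack([sleep, drinking_water, work, exercise])

# Running total down the rows: row k is the upper edge of layer k in the stack plot
tops = np.cumsum(layers, axis=0)

print(tops)
print("total hours per day:", tops[-1])
```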

We can also visualize data that adds up to a specific total.


In [56]:

stock1 = [5,3,3,6,1,8,2,7,9]
stock2 = [2,4,1,3,5,0,3,1,0]
stock3 = [2,2,5,1,3,1,4,1,0]

days =[1,2,3,4,5,6,7,8,9]

In [57]:
stocks = ["stock1","stock2","stock3"]
colors = ["#F9CDAD", "#FC9D9A", "#83AF9B"]
plt.title("Stack Plot of Stock Rates")
plt.stackplot(days,stock1,stock2,stock3,labels=stocks,colors=colors)

plt.legend()
plt.tight_layout()
plt.show()

Histograms
A histogram is a graph showing frequency distributions.
It is a graph showing the number of observations within each given interval.

We use hist() function in order to create histograms.

In [58]: notes = [30,74,94,14,55,47,63,28,88,44,53,18,66,74,81]

In [59]:
plt.style.use("fivethirtyeight")
plt.hist(notes)
plt.show()

In [60]:
plt.style.use("fivethirtyeight")
plt.hist(notes,color="r")
plt.title("Notes")
plt.xlabel("Notes")
plt.ylabel("Person")
plt.tight_layout()
plt.grid(False)
plt.show()

We can add edge colors so that the individual bars are easier to tell apart.

In [61]: plt.style.use("fivethirtyeight")
plt.hist(notes,color="r",edgecolor="black")
plt.title("Notes")
plt.xlabel("Notes")
plt.ylabel("Person")
plt.tight_layout()
plt.grid(False)
plt.show()

We can specify the number of bins.

In [62]: plt.style.use("fivethirtyeight")
plt.hist(notes,bins=5,color="g",edgecolor="black")
plt.title("Notes")
plt.xlabel("Notes")
plt.ylabel("Person")
plt.tight_layout()
plt.grid(False)
plt.show()

We can also give the bin edges explicitly.


In [63]:

plt.style.use("fivethirtyeight")
bins = [10,45,65,80,100]
plt.hist(notes,bins=bins,color="g",edgecolor="black")
plt.title("Notes")
plt.xlabel("Notes")
plt.ylabel("Person")
plt.tight_layout()
plt.grid(False)
plt.show()
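
The bar heights in a histogram are just counts of values per bin. As a small sketch, np.histogram computes those counts for the same notes and bin edges without drawing anything, which is a handy way to check a plot.

```python
import numpy as np

notes = [30, 74, 94, 14, 55, 47, 63, 28, 88, 44, 53, 18, 66, 74, 81]
bins = [10, 45, 65, 80, 100]

# counts[i] is the number of notes in [edges[i], edges[i + 1]);
# only the last bin also includes its right edge
counts, edges = np.histogram(notes, bins=bins)

for low, high, count in zip(edges[:-1], edges[1:], counts):
    print(f"{low:3d} to {high:3d}: {count}")
```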

Let's plot a normal distribution (bell shape).

In [64]:
x = np.random.normal(170, 10, 250)
plt.hist(x,color="gray",edgecolor="black")
plt.title("Normal Distribution")
plt.xlabel("Numbers")
plt.ylabel("Count")
plt.tight_layout()
plt.grid(False)
plt.show()

For a real world example, we will work with the Human Resources Data Set.
The dataset can be downloaded from here: https://www.kaggle.com/rhuebner/human-resources-data-set
We will read it with pandas.

In [65]:
df = pd.read_csv("../input/human-resources-data-set/HRDataset_v14.csv")

In [66]:
df.head()
Out[66]:

[Output: first five rows of the HR data set, 5 rows × 36 columns]

We will work with Salary column.


In [67]:
plt.style.use("fivethirtyeight")
bins = [40000,55000,70000,85000,100000,120000]
plt.hist(df.Salary,bins=bins,color="blue",edgecolor="black")
plt.title("Salaries of Workers")
plt.xlabel("Salary")
plt.ylabel("Count")
plt.tight_layout()
plt.grid(False)
plt.show()

If some counts are much higher than others, we can use a logarithmic scale for the plot.

In [68]:
plt.style.use("fivethirtyeight")
bins = [40000,55000,70000,85000,100000,120000]
plt.hist(df.Salary,bins=bins,color="blue",edgecolor="black",log=True)
plt.title("Salaries of Workers")
plt.xlabel("Salary")
plt.ylabel("Count")
plt.tight_layout()
plt.grid(False)
plt.show()

We can also add the median and mean of the salaries.

In [69]:
plt.style.use("fivethirtyeight")
salary_median = df.Salary.median()
salary_mean = df.Salary.mean()
bins = [40000,55000,70000,85000,100000,120000]
plt.hist(df.Salary,bins=bins,color="blue",edgecolor="black")
plt.axvline(salary_median, color="gray", label="Salary Median", linewidth=3)
plt.axvline(salary_mean, color="green", label="Salary Mean", linewidth=3)
plt.legend()
plt.title("Salaries of Workers")
plt.xlabel("Salary")
plt.ylabel("Count")
plt.tight_layout()
plt.grid(False)
plt.show()

Scatter Plots
Scatter plots are used to plot data points on horizontal and vertical axis in the attempt to show how much
one variable is affected by another.

In [70]:

first_exam_grades = [89, 90, 70, 89, 100, 80, 90, 100, 80, 34]
second_exam_grades = [30, 29, 49, 48, 100, 48, 38, 45, 20, 30]

In [71]:
plt.title("Exam Grades Scatter plot")
plt.scatter(first_exam_grades,second_exam_grades)
plt.tight_layout()
plt.xlabel("First Exam Grades")
plt.ylabel("Second Exam Grades")
plt.grid(True)
plt.show()

We can change the dot size and color.

In [72]: plt.title("Exam Grades Scatter plot")


plt.scatter(first_exam_grades,second_exam_grades,s=100,color="r")
plt.tight_layout()
plt.xlabel("First Exam Grades")
plt.ylabel("Second Exam Grades")
plt.grid(True)
plt.show()
You can change dot size by values.

In [73]:

plt.title("Exam Grades Scatter plot")


sizes = np.array([20,50,100,200,500,1000,60,90,10,300])
plt.scatter(first_exam_grades,second_exam_grades,s=sizes,color="r")
plt.tight_layout()
plt.xlabel("First Exam Grades")
plt.ylabel("Second Exam Grades")
plt.grid(True)
plt.show()

We can also change the marker.

In [74]:
plt.title("Exam Grades Scatter plot")

plt.scatter(first_exam_grades,second_exam_grades,s=100,color="green",marker="x")
plt.tight_layout()
plt.xlabel("First Exam Grades")
plt.ylabel("Second Exam Grades")
plt.grid(True)
plt.show()

We can also plot two different plots.

In [75]:

first_exam_grades = [89, 90, 70, 89, 100, 80, 90, 100, 80, 34]
first_study_hours = [6,8,3,9,9,1,4,2,2,5]

In [76]:
second_exam_grades = [30, 29, 49, 48, 100, 48, 38, 45, 20, 30]
second_study_hours = [2,7,1,5,3,3,2,6,3,2]

In [77]:
plt.title("Exam Grades Scatter plot")

plt.scatter(first_exam_grades,first_study_hours,s=100,color="green",marker="x")
plt.scatter(second_exam_grades,second_study_hours,s=100,color="red")
plt.tight_layout()
plt.xlabel("Exam Grades")
plt.ylabel("Study Hours")
plt.grid(True)
plt.show()

You can also add features and colormaps.

In [78]:

first_exam_grades = [89, 90, 70, 89, 100, 80, 90, 100, 80, 34]
second_exam_grades = [30, 29, 49, 48, 100, 48, 38, 45, 20, 30]
colors = [7, 5, 9, 7, 5, 7, 2, 5, 3, 7]
sizes = [209, 486, 381, 255, 191, 315, 185, 228, 174,538]

In [79]:

plt.title("Exam Grades Scatter plot")

plt.scatter(first_exam_grades,second_exam_grades,s=sizes,c=colors,cmap="Blues"
,edgecolor="black")
cbar = plt.colorbar()
cbar.set_label("Exam Grades")
plt.tight_layout()
plt.xlabel("First Exam Grades")
plt.ylabel("Second Exam Grades")
plt.grid(True)
plt.show()

Contour(Level) Plots
Contour plots (sometimes called level plots) are a way to show a three-dimensional surface on a
two-dimensional plane. A contour plot graphs two predictor variables, X and Y, on the x- and y-axes and a
response variable Z as contours. These contours are sometimes called the z-slices or the iso-response values.

In [80]:

x = [0,3,6,9,13,15,19,23,26,29,33,35,39,41,47,56]
y = [5,8,13,16,17,20,25,26,30,33,37,39,41,44,48,59]

In order to create contour plot, first we will use numpy's meshgrid function, and then use contour
function.
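
To see what meshgrid produces, the sketch below uses two very short coordinate lists (illustrative values, not the lists from the cell above) and prints the resulting 2-D arrays. Each pair (X[i, j], Y[i, j]) is one grid point, and Z holds the value of the surface at that point.

```python
import numpy as np

x = [0, 3, 6]   # illustrative x coordinates
y = [5, 8]      # illustrative y coordinates

# X repeats x along the rows, Y repeats y down the columns;
# both have shape (len(y), len(x))
X, Y = np.meshgrid(x, y)

# Same surface as in the experiment: distance from the origin
Z = np.sqrt(X**2 + Y**2)

print(X)
print(Y)
print(Z)
```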

In [81]:

# Creating 2-D grid of features


[X, Y] = np.meshgrid(x, y)
fig, ax = plt.subplots(1, 1)
Z = np.sqrt(X**2+Y**2)

# plots contour lines
ax.contour(X, Y, Z)
ax.set_title('Contour Plot')
ax.set_xlabel('X values')
ax.set_ylabel('Y values')
plt.show()

We can also fill inside of plot by using contourf() function.

In [82]:

# Creating 2-D grid of features


[X, Y] = np.meshgrid(x, y)
fig, ax = plt.subplots(1, 1)
Z = np.sqrt(X**2+Y**2)

# plots filled contours
ax.contourf(X, Y, Z)
ax.set_title('Contour Plot')
ax.set_xlabel('X values')
ax.set_ylabel('Y values')
plt.show()

Violin Plots
Violin plots are similar to box plots, except that they also show the probability density of the data at
different values. These plots include a marker for the median of the data and a box indicating the
interquartile range, as in the standard box plots.

In [83]:

x = [0,3,6,9,13,15,19,23,26,29,33,35,39,41,47,56]
y = [5,8,13,16,17,20,25,26,30,33,37,39,41,44,48,59]

In [84]:
data=[x,y] #First we will combine the collections
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
bp = ax.violinplot(data)
plt.grid(False)
plt.title("Violin Plot")
plt.show()

Plotting Time Series


A time series is a sequence of numerical data points in successive order. In investing, a time series tracks
the movement of the chosen data points, such as a security's price, over a specified period of time with
data points recorded at regular intervals.

In [85]:

dates = [
    datetime(2021, 3, 10),
    datetime(2021, 3, 13),
    datetime(2021, 3, 14),
    datetime(2021, 3, 15),
    datetime(2021, 3, 16),
    datetime(2021, 3, 17),
    datetime(2021, 3, 18),
    datetime(2021, 3, 19),
]

values = [0,3,4,7,5,3,5,6]

In [86]:
plt.title("Time Series")
plt.plot_date(dates, values)
plt.xticks(rotation='vertical')
plt.show()

We can add a line to plot.

In [87]: plt.title("Time Series")


plt.plot_date(dates, values,linestyle="solid",marker = 'o',ms = 20, mfc =
'r',c="b" )
plt.xticks(rotation='vertical')
plt.xlabel("Dates")
plt.ylabel("Values")
plt.grid(False)

plt.show()
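
When the data points are recorded at regular intervals, the dates do not need to be typed out one by one. The sketch below builds them with timedelta and plots them with the ordinary plot() function, which also accepts datetime values. The start date and the daily spacing are assumptions for illustration; the dates in the cells above are not evenly spaced.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; omit in a notebook
from datetime import datetime, timedelta
from matplotlib import pyplot as plt

values = [0, 3, 4, 7, 5, 3, 5, 6]

# One date per value, one day apart, starting on an assumed start date
start = datetime(2021, 3, 10)
dates = [start + timedelta(days=i) for i in range(len(values))]

fig, ax = plt.subplots()
ax.plot(dates, values, marker="o", linestyle="solid")
ax.set_title("Time Series")
ax.set_xlabel("Dates")
ax.set_ylabel("Values")
fig.autofmt_xdate()  # rotate the date labels so they do not overlap

plt.close(fig)
```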
Box Plot
In descriptive statistics, a box plot or boxplot is a method for graphically depicting groups of numerical
data through their quartiles. Box plots may also have lines extending from the boxes (whiskers) indicating
variability outside the upper and lower quartiles. We will use plt.boxplot() function for that.

In [88]:

Salaries = [6900,7500,4700,11997,22000,16550,9655,8670,15090,29000,7600,14980,1250]

In [89]:
plt.boxplot(Salaries)
plt.title("Box Plot of Salaries")
plt.ylabel("Salaries")
plt.show()
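
The numbers behind the box can be computed directly. The following sketch, assuming the default whisker setting of 1.5 times the interquartile range, calculates the quartiles and lists the values that fall outside the whisker limits and are therefore drawn as outlier points.

```python
import numpy as np

salaries = [6900, 7500, 4700, 11997, 22000, 16550, 9655, 8670,
            15090, 29000, 7600, 14980, 1250]

# Lower quartile, median and upper quartile: the bottom, middle line and top of the box
q1, median, q3 = np.percentile(salaries, [25, 50, 75])
iqr = q3 - q1

# Values beyond these limits are treated as outliers (assumed factor: 1.5)
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [s for s in salaries if s < lower_limit or s > upper_limit]

print("Q1:", q1, "median:", median, "Q3:", q3)
print("limits:", lower_limit, upper_limit)
print("outliers:", outliers)
```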

By making notch argument True, we can create notched boxes.

In [90]:
plt.boxplot(Salaries, notch=True)
plt.title("Box Plot of Salaries")
plt.ylabel("Salaries")
plt.show()

We can change the color and shape of the outlier markers with the flierprops argument.

In [91]:
green_diamond = dict(markerfacecolor='g', marker='D')
plt.boxplot(Salaries, notch=True, flierprops=green_diamond)
plt.title("Box Plot of Salaries")
plt.ylabel("Salaries")
plt.show()

We can make showfliers argument False in order to hide Outlier Points.

In [92]:
plt.boxplot(Salaries, notch=True, showfliers=False)
plt.title("Box Plot of Salaries")
plt.ylabel("Salaries")
plt.show()

We can plot it horizontally by setting the vert argument to False.

In [93]:
plt.boxplot(Salaries, notch=True, showfliers=False, vert=False)
plt.title("Box Plot of Salaries")
plt.xlabel("Salaries")
plt.show()

Heatmap
It is often desirable to show data which depends on two independent variables as a color coded image
plot. This is often referred to as a heatmap. If the data is categorical, this would be called a categorical
heatmap.

A heat map is a data visualization technique that shows magnitude of a phenomenon as color in two
dimensions. The variation in color may be by hue or intensity, giving obvious visual cues to the reader
about how the phenomenon is clustered or varies over space. It's generally used to understand
correlations between variables.
Matplotlib's imshow() function and seaborn's heatmap() function make the production of such plots
particularly easy. matplotlib.pyplot.pcolormesh() is an alternative function.

In [94]:

data = np.random.random((6, 6))
data

Out[94]:

array([[0.80507909, 0.33227263, 0.11982315, 0.43404586, 0.00439907,
0.36893734],
[0.32868725, 0.49988084, 0.27837337, 0.34234132, 0.04036771,
0.139306 ],
[0.32850119, 0.5059023 , 0.22190802, 0.74768769, 0.25377415,
0.06561068],
[0.28074898, 0.93220202, 0.28393501, 0.0436093 , 0.7720071 ,
0.33618152],
[0.69157098, 0.51986173, 0.35829982, 0.35398844, 0.67410653,
0.15522212],
[0.10798746, 0.73882111, 0.50958755, 0.62619836, 0.28166925,
0.22563274]])

In [95]:
plt.imshow( data , cmap = 'autumn' )
plt.title( "2-D Heat Map" )
plt.show()
In [96]:

sns.heatmap( data , linewidth = 0.5 , cmap = 'coolwarm' )

plt.title( "2-D Heat Map" )


plt.show()

In [97]:

plt.pcolormesh( data , cmap = 'summer' )

plt.title( '2-D Heat Map' )
plt.show()

For a real world example, we will use flights dataset of Seaborn.


In [98]:
flights = sns.load_dataset("flights")

In [99]:
flights.head()

Out[99]:

year month passengers


0 1949 Jan 112
1 1949 Feb 118
2 1949 Mar 132
3 1949 Apr 129
4 1949 May 121
Let's make a pivot table in order to make this dataset ready to plot heatmap. Otherwise heatmap will not
work.

In [100]:
flights = flights.pivot(index="month", columns="year", values="passengers")  # keyword arguments are required in pandas 2.0 and later
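
To make the reshaping concrete, the sketch below applies the same pivot to a four-row table built by hand. The small DataFrame is an illustrative assumption that reuses a few values from the flights data. Each row of the long table becomes one cell of the wide table, which is the 2-D layout that heatmap() expects.

```python
import pandas as pd

# Long form: one row per (month, year) observation
long_form = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [112, 118, 115, 126],
})

# Wide form: months as rows, years as columns, passenger counts as cell values
wide_form = long_form.pivot(index="month", columns="year", values="passengers")

print(wide_form)
```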

In [101]: flights
Out[101]:

year 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960
month
Jan 112 115 145 171 196 204 242 284 315 340 360 417
Feb 118 126 150 180 196 188 233 277 301 318 342 391
Mar 132 141 178 193 236 235 267 317 356 362 406 419
Apr 129 135 163 181 235 227 269 313 348 348 396 461
May 121 125 172 183 229 234 270 318 355 363 420 472
Jun 135 149 178 218 243 264 315 374 422 435 472 535
Jul 148 170 199 230 264 302 364 413 465 491 548 622
Aug 148 170 199 242 272 293 347 405 467 505 559 606
Sep 136 158 184 209 237 259 312 355 404 404 463 508
Oct 119 133 162 191 211 229 274 306 347 359 407 461
Nov 104 114 146 172 180 203 237 271 305 310 362 390
Dec 118 140 166 194 201 229 278 306 336 337 405 432
In [102]:

sns.heatmap( flights , linewidth = 0.5 , cmap = 'YlGn' )


plt.title( "Flights Heat Map" )
plt.show()

OBSERVATIONS / DISCUSSION OF RESULT:


1. Plot
2. Pie Plot
3. Violin Plot

4. Box Plot
5. Bar Plot
6. Scatter Plot
7. Heatmap

8. Stack Plot
CONCLUSION:
We have learnt the matplotlib and seaborn modules in Python and how to use them to create bar graphs,
pie charts, violin plots, scatter plots, box plots and other charts for data visualization.
