Visualization Using Python
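Q1: 25th percentile
Q2: median, 50th percentile
Q3: 75th percentile
IQR = Q3 - Q1
Upper bound: highest value
[Figure: box plot marking the lower bound, Q1, Q2 (median), Q3 and the upper bound, with outliers drawn as x beyond the bounds]
The whiskers at the ends show the lower bound and the upper bound.
Use: to find outliers visually.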
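The quartiles and bounds can also be computed numerically. The sketch below is only an illustration: the sample values are made up, and it assumes the common 1.5 x IQR rule for the bounds, which may differ from the convention used in class.

```python
import numpy as np

# Made-up sample values; 40 is an obvious outlier.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])

# Q1, Q2 (median) and Q3 are the 25th, 50th and 75th percentiles.
q1, q2, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Assumption: bounds follow the usual 1.5 * IQR rule.
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Anything outside the bounds is flagged as an outlier.
outliers = values[(values < lower_bound) | (values > upper_bound)]
print(q1, q2, q3, iqr, outliers)
```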
Visualization in Python: matplotlib and Seaborn
matplotlib: a visualization utility
  Install matplotlib
  import matplotlib
  plot
  It is the more difficult way
Seaborn: Seaborn is a library that uses matplotlib underneath to plot graphs
  Install Seaborn (matplotlib is used underneath)
  It is the easier way
data.boxplot()
[Figure: box plot of the dataset columns (output: AxesSubplot)]
numerical vs numerical
We use plt.plot(x, y) from matplotlib to draw the line plot.
By default, plt.plot() draws a line plot.
plt.xlabel(<string>)
plt.ylabel(<string>)
plt.title(<string>)
Ex:
[Figure: example line plot; x-axis: sepal length (cm)]
Additional parameters
linestyle: defines the style of the line
color: 'k' = black, 'r' = red
markersize: 5
line size (linewidth): 7
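A minimal sketch of a labelled and styled line plot follows. The measurements are made up for illustration, the label strings are assumptions, and the Agg backend line is only there so the snippet runs without a display (drop it in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove in a notebook
import matplotlib.pyplot as plt

# Made-up measurements, for illustration only.
sepal_length = [4.6, 4.9, 5.0, 5.1, 5.4]
sepal_width = [3.1, 3.0, 3.6, 3.5, 3.9]

plt.figure()
# plt.plot draws a line plot by default; keyword arguments style it.
lines = plt.plot(sepal_length, sepal_width,
                 linestyle="--", color="k",
                 marker="o", markersize=5, linewidth=7)
plt.xlabel("sepal length (cm)")
plt.ylabel("sepal width (cm)")
plt.title("Line plot")
ax = plt.gca()  # keep a handle on the axes for inspection
```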
Ex: we can split the dataset into 3 datasets depending on species and plot each one with its own label ('Setosa', 'Versicolor', 'Virginica'), e.g.
plt.plot(versicolor['sepal length (cm)'], versicolor['sepal width (cm)'], linestyle=<style>, marker='o', color='r', markersize=7, label='Versicolor')
plt.legend()
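The per-species plotting can be sketched as a loop. The tiny DataFrame, its column names, the species strings and the colors are all assumptions standing in for the real dataset; linestyle="" is chosen here only so that markers are drawn without connecting lines.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Tiny stand-in for the Iris data (assumed column names and labels).
data = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal width (cm)": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

colors = {"Iris-setosa": "k", "Iris-versicolor": "r", "Iris-virginica": "b"}

plt.figure()
for name, color in colors.items():
    # One sub-dataset per species, each plotted with its own label.
    subset = data.loc[data["species"] == name]
    plt.plot(subset["sepal length (cm)"], subset["sepal width (cm)"],
             linestyle="", marker="o", color=color, markersize=7, label=name)

legend = plt.legend()  # builds the legend from the label= arguments
```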
[Figure: resulting plot with one series per species and a legend; x-axis: sepal length (cm)]
Assignment: Do scatter plots using all the combinations and find the 2 important variables that can help in separating the 3 different species.
Variables: SL, SW, PL, PW
No. of combinations, i.e. 4C2 = 6:
SL SW
SL PL
SL PW
SW PL
SW PW
PL PW
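The six pairs can be enumerated programmatically, which is handy when looping over them to draw one scatter plot per pair. This is a small sketch using only the standard library; the short names SL, SW, PL, PW are the abbreviations from the notes.

```python
from itertools import combinations
from math import comb

features = ["SL", "SW", "PL", "PW"]

# All unordered pairs of distinct features: 4C2 of them.
pairs = list(combinations(features, 2))
n_pairs = comb(len(features), 2)

for x_name, y_name in pairs:
    print(x_name, y_name)
```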
[Figure: scatter plot; x-axis: petal length (cm)]
We have another matplotlib function, plt.scatter(), that plots a scatter plot directly.
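A minimal sketch of plt.scatter follows. The values are made up for illustration, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove in a notebook
import matplotlib.pyplot as plt

# Made-up measurements, for illustration only.
petal_length = [1.4, 1.3, 4.7, 4.5, 6.0, 5.1]
petal_width = [0.2, 0.2, 1.4, 1.5, 2.5, 1.9]

plt.figure()
# Unlike plt.plot, plt.scatter draws unconnected points directly.
points = plt.scatter(petal_length, petal_width, color="r", marker="o")
plt.xlabel("petal length (cm)")
plt.ylabel("petal width (cm)")
```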
[Figure: scatter plot drawn with plt.scatter()]
Using Seaborn for bivariate data analysis
With Seaborn we can do a scatter plot.
As parameters, we mention the data we are going to use:
sns.scatterplot(data=data, x='sepal length (cm)', y='sepal width (cm)')
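A sketch of the Seaborn call, with hue added to color the points by species. The small DataFrame, its column names and the species strings are assumptions standing in for the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove in a notebook
import pandas as pd
import seaborn as sns

# Tiny stand-in for the Iris data (assumed column names and labels).
data = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal width (cm)": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# Column names are passed as strings; hue colors the points per species.
ax = sns.scatterplot(data=data, x="sepal length (cm)",
                     y="sepal width (cm)", hue="species")
```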
[Figure: scatter plot of sepal length (cm) vs sepal width (cm)]
[Figure: the same scatter plot colored by species; legend: setosa, versicolor, virginica]
When we are using sepal length (cm) and sepal width (cm), there are a few mistakes.
So here sepal length (cm) and petal width (cm) are more able to classify.
[Figure: density curve; x-axis: petal length (cm)]
If we smooth a histogram, we will get the PDF.
So first we will divide the data and do the analysis with the PDF:
setosa = data.loc[data['species'] == 'Iris-setosa']
versicolor = data.loc[data['species'] == 'Iris-versicolor']
virginica = data.loc[data['species'] == 'Iris-virginica']
sns.distplot(setosa['petal length (cm)'])
sns.distplot(versicolor['petal length (cm)'])
sns.distplot(virginica['petal length (cm)'])
[Figure: density (PDF) curves of the three species; x-axis: petal length (cm)]
There is a lot of overlapping.
Conclusion:
most important: petal length (cm)
next: petal width (cm)
next: sepal length (cm)
worst: sepal width (cm)
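[Figure: density curve; x-axis: petal length (cm)]
The default plot of displot is a histogram:
sns.displot(data=data, x='petal length (cm)', hue='species')
[Figure: histogram of petal length (cm) colored by species; y-axis: count]
sns.displot(data=data, x='petal length (cm)', hue='species', kind='kde')
[Figure: KDE curves of petal length (cm) colored by species; y-axis: density]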
As we analyze each feature variable, we see that there is a lot of overlapping.
Conclusion
Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: species, dtype: int64
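Output of this shape is what pandas prints for value_counts() on the species column. The sketch below rebuilds a species column with 50 rows per class (the counts shown in the notes) and checks the balance; the Series itself is a stand-in, not the real dataset.

```python
import pandas as pd

# Stand-in species column with 50 rows per class, as in the notes.
species = pd.Series(
    ["Iris-setosa"] * 50 + ["Iris-versicolor"] * 50 + ["Iris-virginica"] * 50,
    name="species",
)

# Number of rows per class.
counts = species.value_counts()

# The dataset is balanced when every class has the same count.
is_balanced = counts.nunique() == 1
print(counts)
print("balanced:", is_balanced)
```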
Bar graph
matplotlib command:
plt.bar(data['species'].unique(), [50, 50, 50])
[Figure: bar graph with three equal bars (Setosa, Versicolor, Virginica), count = 50 each: balanced]
If the bars are uneven, then the dataset is unbalanced.
[Figure: bar graph with uneven bars: unbalanced]
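A runnable sketch of the matplotlib bar graph follows. The species column is a stand-in with 50 rows per class, and the heights are computed with value_counts() rather than typed by hand, which is a small departure from the command in the notes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in species column with 50 rows per class.
species = pd.Series(
    ["Iris-setosa"] * 50 + ["Iris-versicolor"] * 50 + ["Iris-virginica"] * 50,
    name="species",
)
counts = species.value_counts()

plt.figure()
# x: category names, height: count per category.
bars = plt.bar(counts.index, counts.values)
plt.ylabel("count")
heights = [bar.get_height() for bar in bars]
```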
Seaborn command to plot a bar graph
This is more beautiful, with colors differentiating the species.
[Figure: Seaborn bar graph with one color per species (Setosa, Versicolor, Virginica); y-axis: count, each bar at 50]
The face color (paper color) can be changed:
plt.figure(figsize=(5, 5), facecolor='k')
[Figure: bar graph on a black figure background; bars: Setosa, Versicolor, Virginica; y-axis: count, each bar at 50]
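A short sketch of changing the figure size and face color. The bar labels, heights and bar colors are illustrative assumptions. Note that facecolor here sets the figure background; the axes area keeps its own face color unless it is changed separately.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove in a notebook
import matplotlib.pyplot as plt

# 5 x 5 inch figure with a black ('k') background.
fig = plt.figure(figsize=(5, 5), facecolor="k")

# Illustrative bars drawn on that figure.
plt.bar(["Setosa", "Versicolor", "Virginica"], [50, 50, 50],
        color=["r", "g", "b"])
plt.ylabel("count")
```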
Histogram: A histogram is a graph showing frequency distributions. It is a graph showing the number of observations within each given interval.
[Figure: bar graph vs histogram. The histogram (numerical data) is annotated with bins, bin height (frequency), bin edges and bin width]
If we are given a dataset:
1. Sort the values in ascending order.
2. Decide how many bins we want.
3. Find the max value and the min value.
bin width = (max - min) / no. of bins
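Ex: 6, 5, 4, 3, 1, 2, 7, 8, 9
1. Sorted: 1, 2, 3, 4, 5, 6, 7, 8, 9
2. No. of bins = 4
3. max = 9, min = 1
bin width = (9 - 1) / 4 = 2
Bin edges: 1, 1+2, 1+2+2, 1+2+2+2, 1+2+2+2+2
           = 1, 3, 5, 7, 9
Counts per bin:
1 <= values < 3: 2
3 <= values < 5: 2
5 <= values < 7: 2
7 <= values <= 9: 3 (the last bin includes its right edge)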
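The worked example can be checked with NumPy. This is a sketch using np.histogram, which follows the same recipe: equal-width bins between min and max, with the last bin closed on the right.

```python
import numpy as np

values = [6, 5, 4, 3, 1, 2, 7, 8, 9]
n_bins = 4

# Bin width from the formula (max - min) / no. of bins.
bin_width = (max(values) - min(values)) / n_bins

# np.histogram returns the bin heights and the bin edges.
heights, edges = np.histogram(values, bins=n_bins)
print(bin_width, heights, edges)
```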
If we choose a smaller number of bins, then the bin width will be larger.
import pandas as pd
[Figure: histogram; y-axis ticks 10 to 25]
Default no. of bins: 10
The output will be two arrays and the histogram plot:
  bin heights
  bin edges
[Figure: histogram of sepal length (cm)]
[Figure: histograms per species (e.g. Iris-versicolor); y-axis ticks 10 to 35]
plt.legend()
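A sketch of drawing a histogram with matplotlib and capturing its output. It assumes plt.hist is the command meant in the notes; the values and the label are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove in a notebook
import matplotlib.pyplot as plt

# Made-up sepal lengths, for illustration only.
values = [4.6, 4.9, 5.0, 5.1, 5.4, 5.8, 6.3, 6.4, 6.7, 7.0, 7.1, 7.7]

plt.figure()
# plt.hist returns the bin heights, the bin edges and the drawn bars.
# With no bins= argument, matplotlib uses 10 bins by default.
heights, edges, bars = plt.hist(values, label="sepal length (cm)")
plt.xlabel("sepal length (cm)")
legend = plt.legend()  # shows the label= text
```

With 10 bins there are 11 bin edges, and the heights sum to the number of values.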
We can also plot a histogram using displot, as the default plot of displot is a histogram (with hue='species').
[Figure: histogram colored by species; legend: Iris-setosa, Iris-versicolor, Iris-virginica]
Violin plot: A violin plot is a statistical representation of numerical data. It is similar to a box plot.
[Figure: violin plot anatomy: a PDF outline around a box plot showing Q1, the median, Q3 and the lower bound lb = Q1 - 1.5 IQR]
[Figure: violin plots of petal length (cm) for Iris-setosa, Iris-versicolor and Iris-virginica]
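A violin plot per species can be sketched with Seaborn's violinplot. The small DataFrame, its column names and the species strings are assumptions standing in for the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove in a notebook
import pandas as pd
import seaborn as sns

# Tiny stand-in for the Iris data (assumed column names and labels).
data = pd.DataFrame({
    "petal length (cm)": [1.4, 1.3, 1.5, 1.7, 4.7, 4.5, 4.9, 4.0,
                          6.0, 5.1, 5.9, 5.6],
    "species": ["Iris-setosa"] * 4 + ["Iris-versicolor"] * 4
               + ["Iris-virginica"] * 4,
})

# One violin per species: the outline is the smoothed PDF,
# the inner marks summarize the quartiles like a box plot.
ax = sns.violinplot(data=data, x="species", y="petal length (cm)")
```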
Count plot: this is similar to a bar graph.
Takes categorical data.
Univariate analysis.
Unlike the bar graph, we do not need to supply the counts ourselves.
<AxesSubplot: xlabel='species', ylabel='count'>
[Figure: count plot with three bars of height 50: Iris-setosa, Iris-versicolor, Iris-virginica]
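A sketch of a count plot with Seaborn's countplot, which counts the rows per category itself. The DataFrame is a stand-in with 50 rows per species.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove in a notebook
import pandas as pd
import seaborn as sns

# Stand-in data: only the categorical column is needed.
data = pd.DataFrame({
    "species": ["Iris-setosa"] * 50 + ["Iris-versicolor"] * 50
               + ["Iris-virginica"] * 50,
})

# countplot tallies each category; no heights are passed in.
ax = sns.countplot(data=data, x="species")
tallest = max(bar.get_height() for bar in ax.patches)
```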
[Figure: pie chart with slices hin, math, bio, eng]
To remove all the info (text output) we get after running this command, we use plt.show().
plt.show()
[Figure: the same pie chart after plt.show(), without the text output]
[Figure: pie chart with percentages: math 33.3%, hin 11.1%, bio 5.6%, eng 50.0%]
[Figure: the same pie chart with the title "Marks"]
i.e.
plt.pie([60, 20, 10, 90], labels=['math', 'hin', 'bio', 'eng'], autopct='%0.1f%%', explode=[0.1, 0, 0, 0], radius=3)
plt.show()
[Figure: pie chart "Marks" with the math slice pulled out: math 33.3%, hin 11.1%, bio 5.6%, eng 50.0%]
categorical data
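A runnable sketch of the pie chart follows, using the marks and labels from the notes. The radius is left at its default here, and the title string is taken from the figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove in a notebook
import matplotlib.pyplot as plt

marks = [60, 20, 10, 90]
labels = ["math", "hin", "bio", "eng"]

plt.figure()
# autopct formats each slice's share; explode pulls the first slice out.
wedges, texts, autotexts = plt.pie(
    marks, labels=labels, autopct="%0.1f%%", explode=[0.1, 0, 0, 0]
)
plt.title("Marks")
percentages = [t.get_text() for t in autotexts]
print(percentages)
```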
Regression plot: A regression plot, as the name suggests, is used to show relationships.
This is bivariate analysis.
This is numerical data vs numerical data.
[Figure: regression plot (scatter points with a fitted line); x-axis: sepal length (cm)]
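A regression plot can be sketched with Seaborn's regplot, which draws the scatter points plus a fitted straight line. The small DataFrame and its column names are assumptions standing in for the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove in a notebook
import pandas as pd
import seaborn as sns

# Tiny stand-in for the Iris data (assumed column names).
data = pd.DataFrame({
    "sepal length (cm)": [4.6, 4.9, 5.0, 5.1, 5.4, 5.8, 6.3, 6.4, 7.0],
    "sepal width (cm)": [3.1, 3.0, 3.6, 3.5, 3.9, 2.7, 3.3, 3.2, 3.2],
})

# Numerical vs numerical: scatter points with a linear fit on top.
ax = sns.regplot(data=data, x="sepal length (cm)", y="sepal width (cm)")
```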
Summarization of plots learned so far