
Cs3353 Foundations of Data Science Unit V 01.12.2022


UNIT V DATA VISUALIZATION

Importing Matplotlib – Line plots – Scatter plots – visualizing errors – density and contour plots – Histograms – legends – colors – subplots – text and annotation – customization – three dimensional plotting – Geographic Data with Basemap – Visualization with Seaborn.

1. Data Visualization
 Data visualization is the practice of translating information into a visual context, such as a
map or graph, to make data easier for the human brain to understand and pull insights
from. The main goal of data visualization is to make it easier to identify patterns, trends and
outliers in large data sets.
o The process of finding trends and correlations in our data by representing it
pictorially is called Data Visualization.

Why is data visualization important?


People tend to remember a picture better than words. Visuals are also often said to be processed far faster than text (a commonly quoted, but unverified, figure is 60,000 times faster).

The raw data undergoes different stages within a pipeline, which are:
 Fetching the Data
 Cleaning the Data
 Data Visualization
 Modeling the Data
 Interpreting the Data
 Revision

Data visualization is the graphical representation of information and data in a pictorial or graphical format (Example: charts, graphs, and maps).

Data visualization is an easy and quick way to convey concepts to others. It also has further uses, such as:
 Data visualization can identify areas that need improvement or modifications.
 Data visualization can clarify which factors influence customer behaviour.
 Data visualization helps you to understand which products to place where.
 Data visualization can help predict sales volumes.

Merits of using Data Visualization


 To make data easier to understand and remember.
 To discover unknown facts, outliers, and trends.
 To visualize relationships and patterns quickly.
 To make better decisions.
 To support competitive analysis.
 To improve insights.

General Types of Visualizations


 Chart: Information presented in a tabular, graphical form with data displayed along two
axes. Can be in the form of a graph, diagram, or map.
 Table: A set of figures displayed in rows and columns.
 Graph: A diagram of points, lines, segments, curves, or areas that represents certain
variables in comparison to each other, usually along two axes at a right angle.
Unit V CS3352 Foundations of Data Science 1
 Geospatial: A visualization that shows data in map form using different shapes and colors to
show the relationship between pieces of data and specific locations.
 Infographic: A combination of visuals and words that represent data. Usually uses charts or
diagrams.
 Dashboards: A collection of visualizations and data displayed in one place to help with
analyzing and presenting data.

1.1 Python in Data visualization


Python provides various libraries for visualizing data. Each library comes with different features and supports various types of graphs.
 Matplotlib
 Seaborn
 Bokeh
 Plotly

1.2 Matplotlib
Matplotlib is a widely used visualization library in Python for 2D plots of arrays. It is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. It was conceived by John Hunter in 2002, with the first release following in 2003. One of the greatest benefits of visualization is that it gives us visual access to huge amounts of data in an easily digestible form. Matplotlib supports several plot types such as line, bar, scatter, histogram etc.

Installation :
Run the following command to install the matplotlib package :
python -m pip install -U matplotlib

import matplotlib
Once Matplotlib is installed, import it in your applications by adding the import module statement:
from matplotlib import pyplot as plt
or
import matplotlib.pyplot as plt

matplotlib Version
The version string is stored under the __version__ attribute.
import matplotlib
print(matplotlib.__version__)

Output :
3.4.3

Matplotlib Pyplot
Most of the Matplotlib utilities lie under the pyplot submodule, which is usually imported under the alias plt:
import matplotlib.pyplot as plt
Now the Pyplot package can be referred to as plt.

1.3 Pyplot Simple


The plot() function draws a line from point to point. In its simplest form it takes two parameters – plot(x,y). Parameter 1 is an array containing the points on the x-axis & Parameter 2 is an array containing the points on the y-axis. If we need to plot a line from (1, 3) to (8, 10), we have to pass two arrays [1, 8] and [3, 10] to the plot function.
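As a minimal sketch of the case just described, the snippet draws the line from (1, 3) to (8, 10). The names xpoints and ypoints are illustrative choices, not part of Matplotlib; the returned line object is kept only so that its end points can be read back.

```python
import matplotlib.pyplot as plt

# Illustrative names: x-coordinates and y-coordinates of the two end points
xpoints = [1, 8]
ypoints = [3, 10]

# plt.plot returns a list of Line2D objects; here it holds exactly one line
(line,) = plt.plot(xpoints, ypoints)

# Read the end points back from the line object
start = (float(line.get_xdata()[0]), float(line.get_ydata()[0]))
end = (float(line.get_xdata()[-1]), float(line.get_ydata()[-1]))
print(start, end)
```

Calling plt.show() afterwards displays the figure.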
/* Python program to plot line using matplotlib */ Output :

import matplotlib.pyplot as plt


x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]
plt.plot(x,y)
plt.show()

Note:
Points plotted are {[5,10], [2,5], [9,8], [4,4], [7,2]}

/* Python program to plot line using numpy arrays */ Output :

import matplotlib.pyplot as plt


import numpy as np

x = np.array([0, 6])
y = np.array([0, 25])
plt.plot(x, y)
plt.show()

Markers
You can use the keyword argument marker to emphasize each point with a specified marker. The markersize argument sets the size of the marker (15 in this example).
/* Python program to show marker */ Output :

import matplotlib.pyplot as plt


x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]
plt.plot(x,y,marker ='o', markersize=15)
plt.show()

Linestyle
You can use the keyword argument linestyle, or shorter ls, to change the style of the plotted line

Output :
/* Python program to show linestyle */

import matplotlib.pyplot as plt


x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]
plt.plot(x,y,linestyle='dotted',marker='*')
plt.show()

Note:
linestyle = 'dashed'
plt.plot(x,y,ls ='dashed',marker='*')

Create Labels for a Plot


Use the xlabel() and ylabel() functions to set a label for the x- and y-axis.
/* Python program to show xlabel,ylabel,title */ Output :



import matplotlib.pyplot as plt
x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]
plt.plot(x,y,ls ='dashed',marker='*')
plt.title('Adhiparasathi Engineering College')
plt.xlabel('This is CSE class')
plt.ylabel('Foundations of Data Science')

plt.show()

Add Grid Lines to a Plot


With Pyplot, you can use the grid() function to add grid lines to the plot.
/* Python program to add Grid Lines */ Output :

import matplotlib.pyplot as plt


x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]
plt.plot(x,y,marker = 'o')
plt.grid()
plt.show()

Display Multiple Plots


With the subplot() function you can draw multiple plots in one figure. The subplot() function takes
three arguments. First and second arguments are rows and columns and the third argument
represents the index of the current plot.

/* Python program to show multiple plots */ Output :

import matplotlib.pyplot as plt

#plot 1:
x = [0, 1, 2, 3]
y = [3, 8, 1, 10]
plt.subplot(2, 1, 1)
plt.plot(x,y)

#plot 2:
x = [0, 1, 2, 3]
y = [10, 20, 30, 40]
plt.subplot(2, 1, 2)
plt.plot(x,y)
plt.show()

Note:
plt.subplot(2, 1, 1)
It means 2 rows, 1 column, and this plot is the first plot.

plt.subplot(2, 1, 2)
It means 2 rows, 1 column, and this plot is the second plot.
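The same three-argument call extends to larger grids. The sketch below, with made-up data, fills a 2 x 2 grid; positions are counted row by row, so index 3 is the first plot of the second row.

```python
import matplotlib.pyplot as plt

fig = plt.figure()
x = [0, 1, 2, 3]

# 2 rows x 2 columns; positions are numbered 1..4, row by row
for index in range(1, 5):
    ax = plt.subplot(2, 2, index)
    ax.plot(x, [index * value for value in x])   # made-up data: a steeper line each time
    ax.set_title("plot " + str(index))

fig.tight_layout()  # keep titles and tick labels from overlapping
titles = [a.get_title() for a in fig.axes]
print(titles)
```

Calling plt.show() afterwards displays all four plots in one figure.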

1.4 Matplotlib Scatter


With Pyplot, we can use the scatter() function to draw a scatter plot. The scatter() function plots one dot for each observation. It needs two arrays of the same length, one for the values of the x-axis, and one for values on the y-axis. The scatter() method takes in the following parameters:
 x_axis_data - An array containing x-axis data
 y_axis_data - An array containing y-axis data
 s - marker size (can be a scalar or an array of the same size as x or y)
 c - color or sequence of colors for markers
 marker - marker style
 cmap - colormap name
 linewidths - width of marker border
 edgecolor - marker border color
 alpha - blending value, between 0 (transparent) and 1 (opaque)

/* Python program to create scatter plots*/ Output :

import matplotlib.pyplot as plt

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y=[99,86,87,88,111,86,103,87,94,78,77,85,86]
plt.scatter(x, y)
plt.show()

/* Python program to create scatter plots*/ Output :

import matplotlib.pyplot as plt

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y=[99,86,87,88,111,86,103,87,94,78,77,85,86]
plt.scatter(x, y, marker='*', c='red', s=200,
edgecolor='black' )

plt.show()

/* Python program to create scatter plots & color each dot */ Output :

import matplotlib.pyplot as plt


x1 = [26, 29, 48, 64, 6]
y1 = [26, 34, 90, 33, 38]
colors=["red","green","blue","yellow","violet"]
plt.scatter(x1, y1, c = colors, s = 200)
plt.show()

/* Python program to create scatter plots*/ Output :

import matplotlib.pyplot as plt


# dataset-1
x1 = [89, 43, 36, 36, 95, 10]
y1 = [21, 46, 3, 35, 67, 95]
plt.scatter(x1, y1, c ="pink", marker
="s", edgecolor ="green", s =50)

# dataset2
x2 = [26, 29, 48, 64, 6]
y2 = [26, 34, 90, 33, 38]
plt.scatter(x2, y2, c ="yellow", marker
="^", edgecolor ="red", s =200)
plt.show()

Add a legend to a scatter plot in Matplotlib


/* Python program to add legends*/ Output :

import matplotlib.pyplot as plt

x1 = [89, 43, 36, 36, 95, 10]


y1 = [21, 46, 3, 35, 67, 95]
plt.scatter(x1, y1, c ="pink", marker
="s", edgecolor ="green", s =50)

x2 = [26, 29, 48, 64, 6]


y2 = [26, 34, 90, 33, 38]
plt.scatter(x2, y2, c ="yellow", marker
="^", edgecolor ="red", s =200)

# apply legend()
plt.legend(["supply" , "sales"])
plt.show()

Note:
plt.legend(["supply" , "sales"], ncol = 2 , loc = "lower right")
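An alternative worth knowing is to attach the legend text to each data set with the label keyword, so the legend does not depend on the order of a separate list. The sketch below reuses the supply and sales data from this section and the ncol and loc options from the note.

```python
import matplotlib.pyplot as plt

x1 = [89, 43, 36, 36, 95, 10]
y1 = [21, 46, 3, 35, 67, 95]
x2 = [26, 29, 48, 64, 6]
y2 = [26, 34, 90, 33, 38]

fig, ax = plt.subplots()
# label= attaches the legend text to each data set directly
ax.scatter(x1, y1, c="pink", marker="s", edgecolor="green", s=50, label="supply")
ax.scatter(x2, y2, c="yellow", marker="^", edgecolor="red", s=200, label="sales")

# Two legend columns, placed in the lower right corner of the axes
legend = ax.legend(ncol=2, loc="lower right")
legend_texts = [t.get_text() for t in legend.get_texts()]
print(legend_texts)
```

Calling plt.show() afterwards displays the figure with its legend.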
/* Python program to add legends*/ Output :

import matplotlib.pyplot as plt


x1 = [26, 29, 48, 64, 6]
y1 = [26, 34, 90, 33, 38]
plt.scatter(x1, y1, c ="yellow", marker
="^", edgecolor ="red", s =200)

x2 = [89, 43, 36, 36, 95, 10]


y2 = [21, 46, 3, 35, 67, 95]
plt.scatter(x2, y2, c ="pink", marker
="s", edgecolor ="green", s =50)
plt.legend(["supply" , "sales"])

plt.title("Scatter Plot Demo ",


fontsize=22)
plt.xlabel('FODS',fontsize=20)
plt.ylabel('II-CSE',fontsize=20)
plt.show()

ColorMap
The Matplotlib module has a number of available colormaps. A colormap is like a list of colors,
where each color has a value that ranges from 0 to 100. This colormap is called 'viridis' and as you
can see it ranges from 0, which is a purple color, up to 100, which is a yellow color.
Unit V CS3352 Foundations of Data Science 6
How to Use the ColorMap?
Specify the colormap with the keyword argument cmap with the value of the colormap, in this
case 'viridis' which is one of the built-in colormaps available in Matplotlib. In addition
create an array with values (from 0 to 100), one value for each point in the scatter plot. Some of the
available ColorMaps are Accent, Blues, BuPu, BuGn, CMRmap, Greens, Greys, Dark2 etc.
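To see what "a value for each color" means in practice, the sketch below looks up three values in 'viridis' by hand. It assumes the values run from 0 to 100 as in this section; Normalize rescales them to the 0.0 to 1.0 range that a colormap object expects. By default scatter() performs a similar rescaling itself, using the smallest and largest value in the c array.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

cmap = plt.get_cmap("viridis")
norm = Normalize(vmin=0, vmax=100)   # maps 0..100 onto 0.0..1.0

# Each lookup returns an (red, green, blue, alpha) tuple with parts in 0..1
for value in (0, 50, 100):
    red, green, blue, alpha = cmap(float(norm(value)))
    print(value, round(red, 2), round(green, 2), round(blue, 2))

low = cmap(float(norm(0)))      # purple end: more blue than red
high = cmap(float(norm(100)))   # yellow end: more red than blue
```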

/* Python program to add color maps*/

import matplotlib.pyplot as plt


import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0,10,20,30,40,45,50,55,60,70,80,90,100])
plt.scatter(x, y, c=colors, cmap='viridis')
plt.show()

Output :

/* Python program to add color maps & color bar using colorbar() */

import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0,10,20,30,40,45,50,55,60,70,80,90,100])
plt.scatter(x, y, c=colors, cmap='viridis')
plt.colorbar()
plt.show()

Output :

Size
We can change the size of the dots with the s argument. Just like colors, make sure the array for sizes has the same length as the arrays for the x- and y-axis.

/* Python program to Set your own size for the markers*/ Output :



import matplotlib.pyplot as plt
import numpy as np

x = np.array([5,6,7,8,9,10])
y = np.array([10,20,30,40,50,60])
colors=["red","green","blue","yellow","violet","purple"]
sizes = np.array([100,200,300,400,500,600])
plt.scatter(x, y, c = colors, s=sizes )
plt.show()
Alpha
Adjust the transparency of the dots with the alpha argument, a value between 0 (transparent) and 1 (opaque).
/* Python program to Set alpha*/
Output :
import matplotlib.pyplot as plt
import numpy as np

x = np.array([5,6,7,8,9,10])
y = np.array([10,20,30,40,50,60])
colors=["red","green","blue","yellow","violet","purple"
]
sizes = np.array([100,200,300,400,500,600])
plt.scatter(x,y,c=colors,s=sizes,alpha=0.5)
plt.show()

Create random arrays with 100 values for x-points, y-points, colors and sizes
/* Python program to create random arrays , random colors, Output :
random sizes*/
import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(100,size=(100))
y = np.random.randint(100,size=(100))
colors = np.random.randint(100,size=(100))
sizes = 10 * np.random.randint(100,size=(100))

plt.scatter(x, y, c = colors, s=sizes,


cmap='nipy_spectral',alpha = 0.5 )
plt.colorbar()
plt.show()



1.5 Visualizing errors in Python using Matplotlib

Error bars are a graphical enhancement that visualizes the variability of the plotted data on a Cartesian graph. Error bars can be applied to graphs to provide an additional layer of detail on the presented data.

[Figure: error bars shown on a scatter plot, a dot plot, a bar chart and a line plot]

Error bars indicate estimated error or uncertainty. They are drawn as markers over the original graph and its data points: lines extend from the centre of each plotted data point to reveal the uncertainty of that point.
A short error bar shows that values are concentrated around the plotted value, signalling that it is reliable, while a long error bar indicates that the values are more spread out and less reliable. The two error bars of a pair are usually of equal length on both sides; however, if the data is skewed then the lengths on each side would be unbalanced.

Error bars always run parallel to a quantitative scale axis, so they can be displayed either vertically or horizontally depending on whether the quantitative scale is on the y-axis or the x-axis. If there are two quantitative scales, two pairs of error bars can be used, one for each axis.
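The unbalanced case mentioned above can be sketched by passing two sequences to yerr, one with the lower lengths and one with the upper lengths. The data and uncertainty values here are hypothetical, chosen only for illustration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 12, 9, 14]

# Hypothetical uncertainties: each point may lie further above than below
lower = [0.5, 0.5, 1.0, 0.5]
upper = [1.5, 2.0, 1.0, 3.0]

fig, ax = plt.subplots()
# A 2 x N sequence gives the (lower, upper) bar lengths for every point
container = ax.errorbar(x, y, yerr=[lower, upper], fmt="o", capsize=4)

# The returned container records which directions have error bars
print(container.has_yerr, container.has_xerr)
```

Calling plt.show() afterwards displays the figure; the third point has a symmetric bar because its lower and upper lengths are equal.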



/* Python program to create random simple Output :
graph */

# importing matplotlib
import matplotlib.pyplot as plt

# making a simple plot


x =[1, 2, 3, 4, 5, 6, 7]
y =[1, 2, 1, 2, 1, 2, 1]

# plotting graph
plt.plot(x, y)
/* Python program to add some error in y Output :
value in the simple graph */

import matplotlib.pyplot as plt


x =[1, 2, 3, 4, 5, 6, 7]
y =[1, 2, 1, 2, 1, 2, 1]

# creating error
y_error = 0.2

plt.plot(x, y)
plt.errorbar(x, y,
             yerr = y_error,
             fmt ='o')

Note:
fmt is a format code controlling the appearance of lines and points
/* Python program to add some error in x Output :
value in the simple graph */

import matplotlib.pyplot as plt


x =[1, 2, 3, 4, 5, 6, 7]
y =[1, 2, 1, 2, 1, 2, 1]
x_error = 0.5
plt.plot(x, y)
plt.errorbar(x, y,
xerr = x_error,
fmt ='o')
/* Python program to add various Output :
parameters & some error in x value */

import matplotlib.pyplot as plt


x =[1, 2, 3, 4, 5, 6, 7]
y =[1, 2, 1, 2, 1, 2, 1]
x_error = 0.5
plt.plot(x, y, color = "red")
plt.errorbar(x, y, xerr=x_error,
fmt='o', color='black',
ecolor='green', elinewidth=3,
capsize=10);
/* Python program to add some error in x & Output :
y value in the simple graph */

import matplotlib.pyplot as plt


x =[1, 2, 3, 4, 5, 6, 7]
y =[1, 2, 1, 2, 1, 2, 1]
x_error = 0.5
y_error = 0.3
plt.plot(x, y, color = "red")
plt.errorbar(x, y,yerr = y_error,
xerr = x_error,
fmt='o',ecolor="green")



/* Python program to add some error in Output :
scatter plot */

import matplotlib.pyplot as plt


x = [1, 3, 5, 7]
y = [11, -2, 4, 19]
plt.scatter(x, y, marker='*' )
c = [1, 3, 2, 1]
plt.errorbar(x, y, yerr=c, fmt="o",
ecolor= "black")
plt.show()

Bar Plot in Matplotlib


A bar plot or bar chart is a graph that represents categories of data with rectangular bars whose lengths or heights are proportional to the values they represent. The bar plots can be plotted horizontally or vertically. A bar chart describes the comparisons between the discrete categories. One of the axes of the plot represents the specific categories being compared, while the other axis represents the measured values corresponding to those categories.
ax.bar(x, height, width, bottom, align)
The function returns a Matplotlib container object with all bars.
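The returned container can be inspected directly. The following sketch uses the course enrolment figures from this section, spells out the width, bottom and align arguments with their usual default values, and then reads the bar heights back from the container.

```python
import matplotlib.pyplot as plt

langs = ['C', 'C++', 'Java', 'Python', 'PHP']
students = [23, 17, 35, 29, 12]

fig, ax = plt.subplots()
# width, bottom and align are written out here with their default values
bars = ax.bar(langs, students, width=0.8, bottom=0, align='center')

# The container can be iterated; each item is one rectangle
heights = [float(bar.get_height()) for bar in bars]
tallest = langs[heights.index(max(heights))]
print(len(bars), heights, tallest)
```

Calling plt.show() afterwards displays the chart.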

/* Python program to implement Bar Chart */ Output :


import matplotlib.pyplot as plt
import numpy as np

x = np.array(["A", "B", "C", "D"])


y = np.array([3, 8, 1, 10])

plt.bar(x,y, color="red")
plt.show()

Following is a simple example of the Matplotlib bar plot. It shows the number of students enrolled for
various courses offered at an institute.
/* Python program to implement Bar Chart */ Output :

import matplotlib.pyplot as plt


langs = ['C', 'C++', 'Java', 'Python', 'PHP']
students = [23,17,35,29,12]
plt.bar(langs,students, color= "violet")
plt.show()

Bar Width
The bar() takes the keyword argument width to set the width of the bars. Default width value is 0.8

/* Python program to implement Bar Chart */ Output :

import matplotlib.pyplot as plt


langs = ['C', 'C++', 'Java', 'Python', 'PHP']
students = [23,17,35,29,12]
plt.bar(langs,students, color= "violet", width = 0.1)
plt.show()



Bar Height
The barh() function takes the keyword argument height to set the height of the bars.
Note: For horizontal bars, use height instead of width.

/* Python program to implement Bar Chart */ Output :

import matplotlib.pyplot as plt


langs = ['C', 'C++', 'Java', 'Python', 'PHP']
students = [23,17,35,29,12]
plt.barh(langs,students, color= "violet")
plt.show()

/* Python program to implement Bar Chart */ Output :

import matplotlib.pyplot as plt


langs = ['C', 'C++', 'Java', 'Python', 'PHP']
students = [23,17,35,29,12]
plt.barh(langs,students, color= "violet", height = 0.1
)
plt.show()

Plotting multiple bar charts using Matplotlib in Python


A multiple bar chart is also called a Grouped Bar chart. A Bar plot or a Bar Chart has many
customizations such as Multiple bar plots, stacked bar plots, horizontal bar charts. Multiple bar
charts are generally used for comparing different entities.
Example 1: Simple multiple bar chart
In this example we will see how to plot multiple bar charts using matplotlib, here we are plotting
multiple bar charts to visualize the number of boys and girls in each Group.

/* Python program to implement Bar Chart */ Output :

import numpy as np
import matplotlib.pyplot as plt

X = ['Group A','Group B','Group C','Group D']


girls = [10,20,20,40]
boys = [20,30,25,30]

X_axis = np.arange(len(X))
width = 0.25

plt.bar(X_axis - 0.2, girls, width, label = 'Girls')


plt.bar(X_axis + 0.2, boys, width, label = 'Boys')

plt.xticks(X_axis, X)
plt.xlabel("Groups")
plt.ylabel("Number of Students")
plt.title("Number of Students in each group")
plt.legend()
plt.show()



 Importing required libraries such as numpy for performing numerical calculations with
arrays and matplotlib for visualization of data.
 The data for plotting multiple bar charts are taken into the list.
 The np.arange( ) function from numpy library is used to create a range of values. We are
creating the X-axis values depending on the number of groups in our example.
 Plotting the multiple bars using plt.bar( ) function.
 To avoid overlapping of bars in each group, the bars are shifted -0.2 units and +0.2 units
from the X-axis.
 The width of the bars of each group is taken as 0.25 units.
 Finally, the multiple bar charts for both boys and girls are plotted in each group.
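The shifting step generalizes to any number of series. The sketch below is one possible way to compute the shifts, not the only one: it centres n bars of equal width around each group position. The third series ('Staff') and its numbers are hypothetical additions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

groups = ['Group A', 'Group B', 'Group C', 'Group D']
# 'Girls' and 'Boys' come from the example; 'Staff' is a hypothetical extra series
series = {
    'Girls': [10, 20, 20, 40],
    'Boys': [20, 30, 25, 30],
    'Staff': [5, 8, 6, 9],
}

x_axis = np.arange(len(groups))
width = 0.25
n = len(series)

fig, ax = plt.subplots()
offsets = []
for i, (label, values) in enumerate(series.items()):
    # Centre the n bars of a group around its x position
    offset = (i - (n - 1) / 2) * width
    offsets.append(offset)
    ax.bar(x_axis + offset, values, width, label=label)

ax.set_xticks(x_axis)
ax.set_xticklabels(groups)
ax.legend()
print(offsets)
```

With this choice the bars of a group touch each other; a shift step slightly larger than the width leaves a gap between them, as in the two-series example.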

numpy.linspace() function
The linspace() function returns evenly spaced numbers over a specified interval [start, stop].
The endpoint of the interval can optionally be excluded.
Syntax: numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
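A short sketch of the endpoint and retstep options in the syntax above; the interval 0 to 10 and the count of 5 are chosen only for illustration.

```python
import numpy as np

# 5 evenly spaced values, both ends of the interval included
a = np.linspace(0, 10, 5)

# Same interval with the stop value excluded
b = np.linspace(0, 10, 5, endpoint=False)

# retstep=True additionally returns the spacing between values
c, step = np.linspace(0, 10, 5, retstep=True)

print(a)
print(b)
print(step)
```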

# Both Linear Plotting & Scatter plot Output :

import numpy as np
import matplotlib.pyplot as plt
A = 5
y = np.zeros(A)
a1 = np.linspace(0, 10, 5)
plt.plot(a1, y)
plt.scatter(a1, y, c ="red", marker ="s",
edgecolor ="green", s =50)
# Python Program to illustrate Linear Plotting Output :
import matplotlib.pyplot as plt

year = [1972, 1982, 1992, 2002, 2012]


e_india = [100, 158, 305, 394, 724]
e_bangladesh = [10, 25, 58, 119, 274]

plt.plot(year, e_india, color ='red',


marker ='o', markersize = 12,
label ='India')
plt.plot(year, e_bangladesh, color ='g',
linestyle ='dashed', linewidth = 2,
label ='Bangladesh')

plt.xlabel('Years')
plt.ylabel('Power consumption in kWh')
plt.title('Electricity consumption')
plt.legend()
plt.show()



Contour Plot using Matplotlib – Python
 Contour plots, also called level plots, are a tool for doing multivariate analysis and visualizing 3-D plots in 2-D space. If we consider X and Y as our variables then the response Z will be plotted as slices on the X-Y plane. That's why contours are sometimes referred to as Z-slices or iso-response.
 Contour plots are widely used to visualize density and altitudes such as the heights of mountains, and are also common in meteorology. matplotlib.pyplot provides a method contour to draw contour plots.

matplotlib.pyplot.contour
 matplotlib.pyplot.contour() draws the contour lines of a response usually written as Z = f(X, Y), i.e. Z changes as a function of the inputs X and Y.
 contourf() is also available which allows us to draw filled contours.

Syntax: matplotlib.pyplot.contour([X, Y, ] Z, [levels])

 Parameters:
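X, Y : Coordinates of the values in Z. Either both 2-D arrays with the same shape as Z (e.g. from np.meshgrid), or both 1-D with len(X)==N & len(Y)==M [M = rows, N = columns of Z]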
Z : The height values over which the contour is drawn. Shape is (M, N)
levels : Determines the number and positions of the contour lines / regions
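The relationship between the shapes of X, Y and Z, and the effect of levels, can be checked with a small sketch. The feature ranges mirror the first contour example in this section; the three level values are an arbitrary choice for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

feature_x = np.arange(0, 50, 2)   # 25 values, one per column of Z
feature_y = np.arange(0, 50, 3)   # 17 values, one per row of Z
X, Y = np.meshgrid(feature_x, feature_y)
Z = np.cos(X / 2) + np.sin(Y / 4)
print(X.shape, Y.shape, Z.shape)

fig, ax = plt.subplots()
# 1-D coordinates are accepted as well as the 2-D grids X and Y;
# levels lists the Z values at which contour lines are drawn
cs = ax.contour(feature_x, feature_y, Z, levels=[-1.0, 0.0, 1.0])
print(list(cs.levels))
```

Passing an integer for levels instead asks Matplotlib to pick roughly that many levels automatically.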

Example #1: Plotting of Contour using contour() which only plots contour lines.

/* Python program to implement Contour Plot */ Output :

import matplotlib.pyplot as plt


import numpy as np

feature_x = np.arange(0, 50, 2)


feature_y = np.arange(0, 50, 3)

# Creating 2-D grid of features


[X, Y] = np.meshgrid(feature_x, feature_y)
fig, ax = plt.subplots(1, 1)
Z = np.cos(X / 2) + np.sin(Y / 4)

# plots contour lines


ax.contour(X, Y, Z)
ax.set_title('Contour Plot')
ax.set_xlabel('feature_x')
ax.set_ylabel('feature_y')
plt.show()
Example #2: Plotting of contour using contourf() which plots filled contours.
/* Python program to implement Contourf Plot */
Output :
# Implementation of matplotlib function
import matplotlib.pyplot as plt
import numpy as np
feature_x = np.linspace(-5.0, 3.0, 70)
feature_y = np.linspace(-5.0, 3.0, 70)
# Creating 2-D grid of features
[X, Y] = np.meshgrid(feature_x, feature_y)
fig, ax = plt.subplots(1, 1)
Z = X ** 2 + Y ** 2

# plots filled contour plot


ax.contourf(X, Y, Z)
ax.set_title('Filled Contour Plot')
ax.set_xlabel('feature_x')
ax.set_ylabel('feature_y')
plt.show()

Example #3: Plotting of contour using contourf() which plots filled contours.
/* Python program to implement Contourf Plot */
Output :
import numpy as np
import matplotlib.pyplot as plt
xlist = np.linspace(-3.0, 3.0, 100)
ylist = np.linspace(-3.0, 3.0, 100)
X, Y = np.meshgrid(xlist, ylist)
Z = np.sqrt(X**2 + Y**2)
fig,ax=plt.subplots(1,1)

cp = ax.contourf(X, Y, Z)
fig.colorbar(cp)
ax.set_title('Filled Contours Plot')
#ax.set_xlabel('x (cm)')
ax.set_ylabel('y (cm)')
plt.show()



Matplotlib Histogram
 A histogram is a graph showing frequency distributions.
 It is a graph showing the number of observations within each given interval.
Example: Histogram for the height of 250 people is

We can read from the histogram that there are approximately:


2 people from 140 to 145cm, 5 people from 145 to 150cm , 15 people from 151 to 156cm
31 people from 157 to 162cm, 46 people from 163 to 168cm, 53 people from 168 to 173cm
45 people from 173 to 178cm, 28 people from 179 to 184cm, 21 people from 185 to 190cm
4 people from 190 to 195cm
Example
A Normal Data Distribution by NumPy:

import numpy as np
x = np.random.normal(170, 10, 250)
print(x)

/* Python program to implement Histogram */ Output :

import matplotlib.pyplot as plt


import numpy as np
x = np.random.normal(170, 10, 250)
plt.hist(x)
plt.show()
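To read the number of observations in each interval as numbers rather than from the picture, the counts can be computed with np.histogram, which uses the same binning that hist() applies. This is a sketch under one assumption that is not in the original example: a fixed random seed, so that repeated runs give the same data.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)          # fixed seed (assumption) so runs are repeatable
x = rng.normal(170, 10, 250)            # 250 heights, mean 170, standard deviation 10

# Counts per interval and the 11 edges that bound the 10 intervals
counts, edges = np.histogram(x, bins=10)

fig, ax = plt.subplots()
n, bins, patches = ax.hist(x, bins=10)  # hist returns the same counts and edges

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f} cm: {count} people")
```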

/* Python program to implement Histogram */ Output :

import matplotlib.pyplot as plt


x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
75, 77, 81, 83, 84, 87, 89, 90, 90, 91]
plt.style.use('ggplot')
plt.hist(x, bins = 10)
plt.show()



Matplotlib Pie Charts
A Pie Chart is a circular statistical plot that can display only one series of data. The area of the chart
is the total percentage of the given data. The area of slices of the pie represents the percentage of
the parts of the data. The slices of pie are called wedges. The area of the wedge is determined by
the length of the arc of the wedge. Pie charts are commonly used in business presentations like
sales, operations, survey results, resources, etc as they provide a quick summary.
With Pyplot, you can use the pie() function to draw pie charts. By default the plotting of the first
wedge starts from the x-axis and moves counterclockwise
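Since each wedge represents a share of the total, it can help to print that share on the chart. The sketch below computes the percentages by hand and also lets pie() write them on the wedges through its autopct argument; the fruit data is the same as in the examples of this section.

```python
import matplotlib.pyplot as plt

y = [15, 35, 10, 40]
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]

# Share of each value in the total, as a percentage
percentages = [100 * value / sum(y) for value in y]

fig, ax = plt.subplots()
# autopct is a format string applied to each wedge's percentage
wedges, texts, autotexts = ax.pie(y, labels=mylabels, autopct='%1.1f%%')
shown = [t.get_text() for t in autotexts]
print(percentages)
print(shown)
```

Calling plt.show() afterwards displays the labelled pie chart.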

/* Python program to implement Pie Chart */ Output :

import matplotlib.pyplot as plt


y = [15, 35, 10, 40]
mylabels = ["Apples", "Bananas", "Cherries",
"Dates"]
plt.pie(y, labels = mylabels)
plt.show()

/* Python program to change start angle */ Output :

import matplotlib.pyplot as plt


y = [15, 35, 10, 40]
mylabels = ["Apples", "Bananas", "Cherries",
"Dates"]
plt.pie(y, labels = mylabels, startangle =
90 )
plt.show()

/* Python program to include Explode */ Output :

import matplotlib.pyplot as plt


import numpy as np
y = np.array([15, 35, 10, 40])
mylabels = ["Apples", "Bananas", "Cherries",
"Dates"]
myexplode = [0.2, 0, 0, 0]
plt.pie(y, labels = mylabels, startangle =
90, explode = myexplode, shadow = True )
plt.show()

/* Python program to include Explode & adding legends */ Output :

import matplotlib.pyplot as plt


import numpy as np

y = np.array([15, 35, 10, 40])


mylabels = ["Apples", "Bananas", "Cherries",
"Dates"]
myexplode = [0.2, 0, 0, 0]
mycolors=["red","hotpink","green","darkblue"]
plt.pie(y, labels = mylabels, colors = mycolors,
startangle = 90, explode = myexplode )
plt.legend(title = "Four Fruits:")
plt.show()

Matplotlib.pyplot.annotate() in Python
 Matplotlib is a plotting library for Python and a numerical – mathematical extension of the NumPy library. Pyplot is a state-based interface to Matplotlib which provides a MATLAB-like interface.
Syntax: matplotlib.pyplot.annotate(text, xy, xytext=None, xycoords='data', textcoords=None, arrowprops=None, annotation_clip=None)
The annotate() function in the pyplot module of the matplotlib library is used to annotate the point xy with the given text.
Terms used
 text: This parameter is the text of the annotation (it was named s in Matplotlib versions before 3.3).
 xy: This parameter is the point (x, y) to annotate.
 xytext: This parameter is an optional parameter. It is the position (x, y) to place the text at.
 xycoords: This parameter is also an optional parameter and contains a string value naming the coordinate system that xy is given in.
 textcoords: This parameter contains a string value naming the coordinate system that xytext is given in, which may be different from the coordinate system used for xy.
 arrowprops: This parameter is also an optional parameter and contains a dict describing the arrow drawn between the text and the point. Its default value is None.
 annotation_clip: This parameter is also an optional parameter and contains a boolean value. Its default value is None which behaves as True.
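As a sketch of how xy, xytext, xycoords and textcoords work together, the snippet marks the lowest point of one period of a sine wave. The label text and the 40-point offset are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0.0, 1.0, 201)
s = np.sin(2 * np.pi * t)

# Locate the lowest sample of the curve
i_min = int(np.argmin(s))
x_min, y_min = float(t[i_min]), float(s[i_min])

fig, ax = plt.subplots()
ax.plot(t, s, lw=2)
note = ax.annotate('minimum',
                   xy=(x_min, y_min),            # point being annotated
                   xycoords='data',              # xy is given in data coordinates
                   xytext=(0, 40),               # text sits 40 points above xy
                   textcoords='offset points',   # xytext is an offset from xy
                   ha='center',
                   arrowprops=dict(arrowstyle='->'))
ax.set_ylim(-1.5, 1.5)
print(note.get_text(), note.xy)
```

Because the text position is an offset in points rather than a data coordinate, the label stays at the same distance from the marked point when the axis limits change.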

/* Sine waveform */ Output :
import matplotlib.pyplot as plt
import numpy as np
fig, ppool = plt.subplots()
t = np.arange(0.0, 1.0, 0.001)
s = np.sin(2 * np.pi * t)
line = ppool.plot(t, s, lw=2)
ppool.annotate('Max value II Year', xy=(.25, 1), xytext=(1, 1),
               arrowprops=dict(facecolor='green', shrink=0.05),
               xycoords="data",)

ppool.set_ylim(-1.5, 1.5)
plt.show()

/* Cosine waveform */ Output :


import matplotlib.pyplot as plt
import numpy as np
fig, ppool = plt.subplots()
t = np.arange(0.0, 5.0, 0.001)
s = np.cos(3 * np.pi * t)
line = ppool.plot(t, s, lw=2)
ppool.annotate('Max value II Year', xy=(3.3, 1), xytext=(3, 1.5),
               arrowprops=dict(facecolor='green', shrink=0.05),
               xycoords="data",)

ppool.set_ylim(-2, 2)
plt.show()

Annotate Scatter Plot - We annotate a scatter plot using this method


import matplotlib.pyplot as plt Output :
y = [3.2, 3.9, 3.7, 3.5, 3.02199]
x = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [155, "outliner", 293, 230, 670]
fig, ax = plt.subplots()
ax.scatter(x, y, c ="yellow", marker ="^",
edgecolor ="red", s =100)
for i, txt in enumerate(n):
    ax.annotate(txt, (x[i], y[i]))
plt.show()

Annotate Bar chart Output :


import matplotlib.pyplot as plt
import numpy as np
labels = ['Mon', 'Tue', 'Wed','Thu','Fri']
shop_a = [20, 33, 30, 28, 27]
shop_b = [25, 32, 33, 20, 25]
x = np.arange(len(labels))
width = 0.35 # the width of the bars

fig, ax = plt.subplots()



rects1=ax.bar(x- width/2, shop_a, width,
label ='Sales-a')
rects2 = ax.bar(x + width/2, shop_b, width
, label='sales-b')

ax.set_ylabel('Sales')
ax.set_title('Sales report of 2 shops')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()

def autolabel(rects):
    # write each bar's height just above the bar
    for rect in rects:
        height = rect.get_height()
        ax.annotate('{}'.format(height),
                    xy=(rect.get_x() + rect.get_width() / 2, height),
                    xytext=(0, 3),
                    textcoords="offset points", size=16, color="Green",
                    ha='center', va='bottom')

autolabel(rects1)
autolabel(rects2)
fig.tight_layout()
plt.show()



Three-Dimensional Plotting in Matplotlib
Matplotlib was initially designed with only two-dimensional plotting in mind. Around the
time of the 1.0 release, some three-dimensional plotting utilities were built on top of Matplotlib's
two-dimensional display, and the result is a convenient (if somewhat limited) set of tools for three-
dimensional data visualization.
Three-dimensional plots are enabled by importing the mplot3d toolkit, included with the
main Matplotlib installation:
from mpl_toolkits import mplot3d

Once this submodule is imported, a three-dimensional axes can be created by passing the keyword
projection='3d' to any of the normal axes creation routines:

#Python program to create axis in 3DPlotting Output :

import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = plt.axes(projection ='3d')

With the above syntax, three-dimensional axes are enabled and data can be plotted in three dimensions.
In an interactive window, a 3-D graph can be rotated and viewed from different angles, which makes the
data easier to explore. As with 2-D graphs, there are different ways to represent a 3-D graph: a line plot,
scatter plot, contour plot, surface plot, etc. Let's have a look at different 3-D plots.
 Graphs with lines and points are the simplest three-dimensional graphs.
 ax.plot3D and ax.scatter3D are the functions used to plot line and point graphs respectively.

Example 1: Three dimensional line graph


# importing mplot3d tool, numpy and matplotlib
from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()

# syntax for 3-D projection
ax = plt.axes(projection ='3d')

# defining all 3 axes
z = np.linspace(0, 1, 100)
x = z * np.sin(25 * z)
y = z * np.cos(25 * z)

# plotting
ax.plot3D(x, y, z, 'darkblue')
ax.set_title('3D Line Plot by II CSE B Students')
plt.show()

Example 2: Three dimensional Scatter plot
# importing mplot3d tool, numpy and matplotlib
from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()

# syntax for 3-D projection
ax = plt.axes(projection ='3d')

# defining all 3 axes
z = np.linspace(0, 1, 100)
x = z * np.sin(25 * z)
y = z * np.cos(25 * z)

# plotting
ax.scatter3D(x, y, z, c ='green')
ax.set_title('3D scatter Plot by II CSE B Students')
plt.show()

Example 3: Three dimensional Scatter plot with color mapping

# importing mplot3d, numpy and matplotlib
from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()

# syntax for 3-D projection
ax = plt.axes(projection ='3d')

# defining all 3 axes
z = np.linspace(0, 1, 100)
x = z * np.sin(25 * z)
y = z * np.cos(25 * z)

# one color value per point, mapped through the default colormap
col = x + y

# plotting
ax.scatter3D(x, y, z, c = col)
ax.set_title('3D scatter Plot by II CSE B Students')
plt.show()

Example 4: Three dimensional Contour plot


# importing mplot3d, numpy and matplotlib
from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

feature_x = np.linspace(-6, 6, 30)
feature_y = np.linspace(-6, 6, 30)

X, Y = np.meshgrid(feature_x, feature_y)
Z = f(X, Y)

fig = plt.figure()
ax = plt.axes(projection='3d')
# 50 contour levels; other colormaps such as cmap='RdGy' or
# cmap='viridis' can be passed instead of 'binary'
ax.contour3D(X, Y, Z, 50, cmap='binary')

# plots contour lines
ax.contour(X, Y, Z)
ax.set_title('Contour Plot')
ax.set_xlabel('feature_x')
ax.set_ylabel('feature_y')
plt.show()
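
The surface plot listed earlier among the 3-D plot types has no example of its own, so here is a minimal sketch. It reuses the function f and the grid from Example 4 and draws it with plot_surface; the colormap, the colorbar and the Agg backend line are choices made for this illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to open a window
from mpl_toolkits import mplot3d  # registers the '3d' projection on older Matplotlib
import matplotlib.pyplot as plt
import numpy as np

def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

feature_x = np.linspace(-6, 6, 30)
feature_y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(feature_x, feature_y)
Z = f(X, Y)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')

# plot_surface fills the grid cells with colored polygons
surf = ax.plot_surface(X, Y, Z, cmap='viridis', edgecolor='none')
fig.colorbar(surf, ax=ax, shrink=0.5)
ax.set_title('Surface Plot')
# call plt.show() here when running with an interactive backend
```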

Geographic Data with Basemap


One common type of visualization in data science is that of geographic data. Matplotlib's
main tool for this type of visualization is the Basemap toolkit. Modern solutions such as Leaflet or the
Google Maps API may be a better choice for more intensive map visualizations. Still, Basemap is a
useful tool for Python users to have in their virtual toolbelts.
Installation of Basemap
pip install basemap

Once you have the Basemap toolkit installed and imported, geographic plots can be easily drawn.

# Python program to create Geographic Data with Basemap

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=-100)
m.bluemarble(scale=0.5);

# Python program to create Geographic Data with Basemap

from mpl_toolkits.basemap import Basemap
from itertools import chain
import matplotlib.pyplot as plt
import numpy as np

def draw_map(m, scale=0.2):
    # draw a shaded-relief image
    m.shadedrelief(scale=scale)

    # lats and longs are returned as a dictionary
    lats = m.drawparallels(np.linspace(-90, 90, 13))
    lons = m.drawmeridians(np.linspace(-180, 180, 13))

    # the dictionary values contain the plt.Line2D instances
    lat_lines = chain(*(tup[1][0] for tup in lats.items()))
    lon_lines = chain(*(tup[1][0] for tup in lons.items()))
    all_lines = chain(lat_lines, lon_lines)

    # cycle through these lines and set the desired style
    for line in all_lines:
        line.set(linestyle='-', alpha=0.3, color='w')

fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None,
            lat_0=50, lon_0=0)
draw_map(m);

# Output panels in the original notes show draw_map() with other
# projections: projection='cyl', 'moll', 'lcc' and 'merc'

# Python program to create Geographic Data with Basemap

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
# set up orthographic map projection with
# perspective of satellite looking down at 45N, 100W.
# use low resolution coastlines.
map = Basemap(projection='ortho', lat_0=45, lon_0=-100, resolution='l')
# draw coastlines, country boundaries, fill continents.
map.drawcoastlines(linewidth=0.25)
map.drawcountries(linewidth=0.25)
map.fillcontinents(color='coral', lake_color='aqua')
# draw the edge of the map projection region (the projection limb)
map.drawmapboundary(fill_color='aqua')
# draw lat/lon grid lines every 30 degrees.
map.drawmeridians(np.arange(0, 360, 30))
map.drawparallels(np.arange(-90, 90, 30))
# make up some data on a regular lat/lon grid.
nlats = 73; nlons = 145; delta = 2.*np.pi/(nlons-1)
lats = (0.5*np.pi - delta*np.indices((nlats, nlons))[0, :, :])
lons = (delta*np.indices((nlats, nlons))[1, :, :])
wave = 0.75*(np.sin(2.*lats)**8*np.cos(4.*lons))
mean = 0.5*np.cos(2.*lats)*((np.sin(2.*lats))**2 + 2.)
# compute native map projection coordinates of lat/lon grid.
x, y = map(lons*180./np.pi, lats*180./np.pi)
# contour data over the map.
cs = map.contour(x, y, wave+mean, 15, linewidths=1.5)
plt.title('contour lines over filled continent background')
plt.show()

Map Projections
The first thing to decide when using maps is what projection to use. Depending on the intended use
of the map projection, there are certain map features (e.g., direction, area, distance, shape, or other
considerations) that are useful to maintain. The Basemap package implements several dozen such
projections, all referenced by a short format code.
 Cylindrical projections
 Pseudo-cylindrical projections
 Perspective projections
 Conic projections
Cylindrical projections
The simplest of map projections are cylindrical projections, in which lines of constant latitude and
longitude are mapped to horizontal and vertical lines, respectively. This type of mapping
represents equatorial regions quite well, but results in extreme distortions near the poles. The
spacing of latitude lines varies between different cylindrical projections, leading to different
conservation properties, and different distortion near the poles.
Projections used:
projection='merc' & projection='cea'
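
To see why the spacing of latitude lines matters, the sketch below compares the vertical coordinate of two cylindrical projections on a unit sphere. It uses the textbook formulas for the Mercator and the cylindrical equal-area projection (standard parallel at the equator), not Basemap's internal code; the function names are hypothetical.

```python
import math

def mercator_y(lat_deg):
    """Mercator on a unit sphere: y = ln(tan(pi/4 + lat/2))."""
    lat = math.radians(lat_deg)
    return math.log(math.tan(math.pi / 4 + lat / 2))

def equal_area_y(lat_deg):
    """Cylindrical equal-area on a unit sphere: y = sin(lat)."""
    return math.sin(math.radians(lat_deg))

# print latitude, Mercator y and equal-area y side by side
for lat in (0, 30, 60, 85):
    print(lat, round(mercator_y(lat), 3), round(equal_area_y(lat), 3))
```

In the Mercator case the latitude lines move further apart towards the poles (shapes are kept, areas are inflated), while in the equal-area case they move closer together (areas are kept, shapes are squashed).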

Pseudo-cylindrical projections
Pseudo-cylindrical projections relax the requirement that meridians (lines of constant longitude)
remain vertical. This can give better properties near the poles of the projection. The Mollweide projection
(projection='moll') is one common example of this, in which all meridians are elliptical arcs.
Other pseudo-cylindrical projections are the sinusoidal (projection='sinu') and Robinson
(projection='robin') projections.
Projections used:
projection='moll' & projection='sinu' & projection='robin'

Perspective projections
Perspective projections are constructed using a particular choice of perspective point, as if
you photographed the Earth from a particular point in space. One common example is the
orthographic projection (projection='ortho'), which shows one side of the globe as seen from a
viewer at a very long distance. Other perspective-based projections are the gnomonic projection
(projection='gnom') and the stereographic projection (projection='stere'). These are often the most
useful for showing small portions of the map.
Projections used:
projection='ortho' & projection='gnom' & projection='stere'

Conic projections
A Conic projection projects the map onto a single cone, which is then unrolled. This can lead to very
good local properties, but regions far from the focus point of the cone may become very distorted.
One example of this is the Lambert Conformal Conic projection. It projects the map onto a cone
arranged in such a way that two standard parallels (specified in Basemap by lat_1 and lat_2) have
well-represented distances, with scale decreasing between them and increasing outside of them.
Other useful conic projections are the equidistant conic projection (projection='eqdc') and the Albers
equal-area projection (projection='aea'). Conic projections, like perspective projections, tend to be
good choices for representing small to medium patches of the globe.
Projections used:
projection='lcc' & projection='eqdc' & projection='aea'

Drawing a Map Background
Earlier we saw the bluemarble() and shadedrelief() methods for projecting global images on the
map, as well as the drawparallels() and drawmeridians() methods for drawing lines of constant
latitude and longitude. The Basemap package contains a range of useful functions for drawing
borders of physical features like continents, oceans, lakes, and rivers, as well as political
boundaries such as countries and US states and counties. The following are some of the available
drawing functions that you may wish to explore using IPython's help features:
Physical boundaries and bodies of water
 drawcoastlines(): Draw continental coast lines
 drawlsmask(): Draw a mask between the land and sea, for use with projecting images on
one or the other
 drawmapboundary(): Draw the map boundary, including the fill color for oceans.
 drawrivers(): Draw rivers on the map
 fillcontinents(): Fill the continents with a given color; optionally fill lakes with another
color
Political boundaries
 drawcountries(): Draw country boundaries
 drawstates(): Draw US state boundaries
 drawcounties(): Draw US county boundaries
Map features
 drawgreatcircle(): Draw a great circle between two points
 drawparallels(): Draw lines of constant latitude
 drawmeridians(): Draw lines of constant longitude
 drawmapscale(): Draw a linear scale on the map
Whole-globe images
 bluemarble(): Project NASA's blue marble image onto the map
 shadedrelief(): Project a shaded relief image onto the map
 etopo(): Draw an etopo relief image onto the map
 warpimage(): Project a user-provided image onto the map

Two-dimensional Histograms & Binnings

import matplotlib.pyplot as plt
import numpy as np

mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 10000).T
plt.hist2d(x, y, bins=30, cmap='Blues')
cb = plt.colorbar()
cb.set_label('counts in bin')
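
The counts that plt.hist2d colors can also be computed without plotting, using np.histogram2d. The sketch below bins the same kind of data into a 30 x 30 grid; the fixed random seed is an assumption added only to make the run reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = rng.multivariate_normal(mean, cov, 10000).T

# counts[i, j] is the number of points in x-bin i and y-bin j
counts, xedges, yedges = np.histogram2d(x, y, bins=30)

print(counts.shape)       # (30, 30)
print(int(counts.sum()))  # 10000, every point falls in exactly one bin
```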

Python Seaborn Tutorial
Seaborn is a library mostly used for statistical plotting in Python. It is built on top of Matplotlib
and provides beautiful default styles and color palettes to make statistical plots more attractive.
Installation:
Seaborn can be installed using pip:
pip install seaborn

Note:
Seaborn has the following dependencies:
 Python 3 (older seaborn releases also supported Python 2.7)
 numpy
 scipy
 pandas
 matplotlib

Categories of Plots in Python's seaborn library


Plots are generally used to visualize the relationships between the given variables.
These variables can either be categorical (such as a group, division, or class) or completely
numerical. The seaborn library provides the following categories of plots:
 Distribution plots: used for examining both types of distributions, i.e., univariate and
bivariate distributions.
 Relational plots: used to understand the relation between two given variables.
 Regression plots: primarily intended to add a visual guide that helps to emphasize patterns
in a dataset during exploratory data analysis.
 Categorical plots: used to deal with categorical variables and how we can visualize them.
 Multi-plot grids: used to draw multiple instances of the same plot on different subsets of a
single dataset.
 Matrix plots: used to display data as a color-encoded matrix, for example a heatmap or a
cluster map.

Dist plot :
Seaborn dist plot is used to plot a histogram, with some other variations like kdeplot and rugplot.
Note that distplot() is deprecated in recent seaborn releases in favor of histplot() and displot().
# Importing libraries
import numpy as np
import seaborn as sns

# Selecting style as white, dark, whitegrid,
# darkgrid or ticks
sns.set(style="white")

# Generate a random univariate dataset
rs = np.random.RandomState(10)
d = rs.normal(size=100)

# Plot a simple histogram and kde
# with binsize determined automatically
sns.distplot(d, kde=True, color="m")
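
To make clear what the histogram bars and the kde curve in such a plot represent, the sketch below computes both by hand with NumPy on the same data. The Gaussian kernel and the Scott's-rule bandwidth are assumptions of this sketch; seaborn's own defaults may differ in detail.

```python
import numpy as np

rs = np.random.RandomState(10)
d = rs.normal(size=100)

# Histogram normalised so that the bar areas sum to 1
heights, edges = np.histogram(d, bins=10, density=True)

# Gaussian kernel density estimate: one small bell curve per data
# point, averaged. Bandwidth h from Scott's rule (an assumption).
h = d.std(ddof=1) * len(d) ** (-1 / 5)
grid = np.linspace(d.min() - 4 * h, d.max() + 4 * h, 400)
z = (grid[:, None] - d[None, :]) / h
density = np.exp(-0.5 * z ** 2).sum(axis=1) / (len(d) * h * np.sqrt(2 * np.pi))

# Both are density estimates, so each encloses an area of about 1
dx = grid[1] - grid[0]
area_hist = (heights * np.diff(edges)).sum()
area_kde = (density * dx).sum()
print(round(area_hist, 3), round(area_kde, 3))
```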
Line plot : The line plot is one of the most basic plots in the
seaborn library. It is mainly used to visualize data that varies
continuously, such as a time series.

import seaborn as sns

sns.set(style="dark")
fmri = sns.load_dataset("fmri")

# Plot the responses for different
# events and regions
sns.lineplot(x="timepoint",
             y="signal",
             hue="region",
             style="event",
             data=fmri)

Lmplot : The lmplot is another basic plot. It shows a line representing
a linear regression model along with the data points in 2D space; the x
and y parameters name the variables plotted on the horizontal and
vertical axes respectively.
import seaborn as sns

sns.set(style="ticks")

# Loading the dataset
df = sns.load_dataset("anscombe")

# Show the results of a linear regression
sns.lmplot(x="x", y="y", data=df)

seaborn.lineplot()
x, y: Input data variables; must be numeric. Can pass data directly or reference columns in data.
hue: Grouping variable that will produce lines with different colors. Can be either categorical or
numeric, although color mapping will behave differently in the latter case.
style: Grouping variable that will produce lines with different dashes and/or markers. Can have a
numeric dtype but will always be treated as categorical.
data: Tidy ("long-form") dataframe where each column is a variable and each row is an observation.
markers: Object determining how to draw the markers for different levels of the style variable.
legend: How to draw the legend. If "brief", numeric hue and size variables will be represented with a
sample of evenly spaced values.
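
The "long-form" layout expected by the data parameter can be produced from a wide table with pandas. The sketch below is illustrative only: the small table and its values are hypothetical, with column names borrowed from the fmri example.

```python
import pandas as pd

# Hypothetical wide-form table: one column per region
wide = pd.DataFrame({
    "timepoint": [0, 1, 2],
    "frontal": [0.10, 0.25, 0.15],
    "parietal": [0.05, 0.30, 0.20],
})

# Long form: one row per (timepoint, region) observation
long_df = wide.melt(id_vars="timepoint", var_name="region", value_name="signal")
print(long_df)
```

The resulting frame could then be passed as sns.lineplot(x="timepoint", y="signal", hue="region", data=long_df).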

Plotting categorical scatter plots with Seaborn
Explanation: This is one kind of scatter plot of categorical data, drawn with the help of seaborn.
 Categorical data is represented on the x-axis and the values corresponding to them are
represented on the y-axis.
 The .stripplot() function is used to define the type of the plot and to draw it on the canvas.
 The .set() function is used to set the labels of the x-axis and y-axis.
 The .title() function is used to give a title to the graph.
 To view the plot we use the .show() function.

# Python program to illustrate
# plotting categorical scatter
# plots with Seaborn

# importing the required modules
import matplotlib.pyplot as plt
import seaborn as sns

# x axis values
x = ['sun', 'mon', 'fri', 'sat', 'tue', 'wed', 'thu']

# y axis values
y = [5, 6.7, 4, 6, 2, 4.9, 1.8]

# plotting strip plot with seaborn
# (recent seaborn releases require keyword arguments here)
ax = sns.stripplot(x=x, y=y);

# giving labels to x-axis and y-axis
ax.set(xlabel='Days', ylabel='Amount_spend')

# giving title to the plot
plt.title('My first graph');

# function to show plot
plt.show()

Stripplot
It basically creates a scatter plot based on the category.
Syntax:
stripplot([x, y, hue, data, order, …])

# import the seaborn library
import seaborn as sns

# reading the dataset
df = sns.load_dataset('tips')
sns.stripplot(x='day', y='total_bill', data=df,
              jitter=True, hue='smoker', dodge=True)

Explanation:
 One problem with a strip plot is that you can't really tell which points are stacked on top of
each other, and hence we use the jitter parameter to add some random noise.
 The jitter parameter adds an amount of jitter (only along the categorical axis), which is useful
when many points overlap, because it makes the distribution easier to see.
 hue is used to provide an additional categorical separation.
 Setting dodge=True (called split in older seaborn releases) draws separate strip plots for
each level of the hue parameter.

Barplot
A barplot is used to aggregate categorical data according to some method, which by default is the
mean. It can also be understood as a visualization of a group-by operation. To use this plot we
choose a categorical column for the x-axis and a numerical column for the y-axis, and it draws one
bar per category whose height is the mean of the numerical column.
Syntax:
barplot([x, y, hue, data, order, hue_order, …])

# import the seaborn library
import seaborn as sns

# reading the dataset
df = sns.load_dataset('tips')

# plot the mean total_bill per category
# (mean is the default estimator)
sns.barplot(x='sex', y='total_bill', data=df,
            palette='plasma')

Explanation:
Looking at the plot we can say that the average total_bill for males is higher than for females.
 palette is used to set the colors of the plot.
 The estimator parameter is the statistical function used for estimation within each
categorical bin.
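
The group-by reading of a barplot can be checked directly with pandas. The sketch below uses a small hypothetical table (the values are invented, not taken from the tips dataset) and computes the per-category mean that the bar heights would show.

```python
import pandas as pd

# Hypothetical miniature of a tips-like table (invented values)
df = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female", "Male"],
    "total_bill": [20.0, 15.0, 30.0, 21.0, 10.0],
})

# One mean per category: the same aggregation a barplot draws by default
bar_heights = df.groupby("sex")["total_bill"].mean()
print(bar_heights)
```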

Countplot
A countplot counts the categories and shows the number of occurrences of each. It is one of the
simplest plots provided by the seaborn library.
Syntax:
countplot([x, y, hue, data, order, …])

# import the seaborn library
import seaborn as sns

# reading the dataset
df = sns.load_dataset('tips')

sns.countplot(x ='sex', data = df)

Explanation:
Looking at the plot we can say that the number of males is more than the number of females in
the dataset. As it only returns the count based on a categorical column, we need to specify only
the x parameter.

Boxplot
A box plot is a visual representation of groups of numerical data through their quartiles. A
boxplot is also used to detect outliers in a data set.
Syntax:
boxplot([x, y, hue, data, order, hue_order, …])

# import the seaborn library
import seaborn as sns

# reading the dataset
df = sns.load_dataset('tips')

sns.boxplot(x='day', y='total_bill', data=df, hue='smoker')

Explanation:
x takes the categorical column and y is a numerical column. Hence we can see the total bill
spent each day. The hue parameter is used to further add a categorical separation. By
looking at the plot we can say that the people who do not smoke had a higher bill on Friday
as compared to the people who smoked.
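
The quartiles and outliers that a box plot displays can be computed by hand. The sketch below uses NumPy on a small hypothetical list of bills and applies the common 1.5 x IQR rule for the whiskers; the data values are invented for illustration.

```python
import numpy as np

# Hypothetical bill amounts (invented values)
bills = np.array([10, 12, 14, 15, 16, 18, 20, 22, 24, 60])

# The box spans Q1..Q3 and the line inside it marks the median
q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]

print(q1, median, q3)   # 14.25 17.0 21.5
print(outliers)         # [60]
```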

Violinplot
It is similar to the boxplot except that it shows more detail: it uses kernel density estimation to
give a better description of the data distribution.
Syntax:
violinplot([x, y, hue, data, order, …])

# import the seaborn library
import seaborn as sns

# reading the dataset
df = sns.load_dataset('tips')
sns.violinplot(x='day', y='total_bill', data=df,
               hue='sex', split=True)

Explanation:
 hue is used to separate the data further using the sex category.
 Setting split=True will draw half of a violin for each level. This can make it easier to
directly compare the distributions.

Seaborn | Style And Color
Seaborn is a statistical plotting library in Python. It has beautiful default styles.
This section deals with the ways of styling the different kinds of plots in seaborn.

Seaborn Figure Styles

The figure style affects things like the color of the axes, whether a grid is enabled by
default, and other aesthetic elements.
The available style themes are as follows:
 white
 dark
 whitegrid
 darkgrid
 ticks

Set the background to be white:
import seaborn as sns
import matplotlib.pyplot as plt

# load the tips dataset present by default in seaborn
tips = sns.load_dataset('tips')
# style must be one of white, dark, whitegrid, darkgrid, ticks
sns.set_style('white')

# make a countplot
sns.countplot(x ='sex', data = tips)

Removing Axes Spines
The despine() function removes the spines from the right and upper portion of the plot by default.
sns.despine(left = True) additionally removes the spine from the left.

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset('tips')
sns.set_style('white')
sns.countplot(x ='sex', data = tips)
sns.despine()

Size and aspect

Non grid plot: figure() is a matplotlib function used to create a figure. Its figsize argument
sets the size of the figure (width, height) in inches.
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset('tips')
plt.figure(figsize =(12, 3))
sns.countplot(x ='sex', data = tips)

Scale and Context


The set_context() function allows us to override default parameters. This affects things
like the size of the labels, lines, and other elements of the plot, but not the
overall style.
The contexts are:
 poster
 paper
 notebook
 talk
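
One way to see what each context changes is to inspect the parameters it would set, without drawing anything. The sketch below is an illustration using seaborn's plotting_context() function and assumes the returned settings include the "font.size" key, as in current seaborn releases.

```python
import seaborn as sns

# Base font size that each context would apply
sizes = {name: sns.plotting_context(name)["font.size"]
         for name in ("paper", "notebook", "talk", "poster")}
print(sizes)
```

The sizes grow from paper to notebook to talk to poster, which is why the same countplot looks progressively larger in the examples.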

Example 1: Using poster.
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset('tips')
sns.set_context('poster', font_scale = 2)
sns.countplot(x ='sex', data = tips, palette ='coolwarm')

Example 2: Using paper.
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset('tips')
sns.set_context('paper', font_scale = 2)
sns.countplot(x ='sex', data = tips, palette ='coolwarm')

Example 3: Using notebook.
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset('tips')
sns.set_context('notebook', font_scale = 2)
sns.countplot(x ='sex', data = tips, palette ='coolwarm')

Example 4: Using talk.
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset('tips')
sns.set_context('talk', font_scale = 2)
sns.countplot(x ='sex', data = tips, palette ='coolwarm')

seaborn color_palette()
In this section, we look at seaborn color_palette(), which can be used for coloring a plot.
Using a palette, we can draw points in different colors. The palette argument
determines which colors are generated.
Syntax: seaborn.color_palette(palette=None, n_colors=None, desat=None)
Parameters:
 palette: Name of palette or None to return current palette.
 n_colors: Number of colors in the palette.
 desat: Proportion to desaturate each color.
Returns: list of RGB tuples or matplotlib.colors.Colormap
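
A short sketch of the two return types follows. The palette names "husl" and "viridis" are example choices, and the as_cmap argument assumes seaborn 0.11 or newer.

```python
import seaborn as sns

# A list-like palette of three RGB tuples, each value between 0 and 1
palette = sns.color_palette("husl", 3)
for r, g, b in palette:
    print(round(r, 2), round(g, 2), round(b, 2))

# The same colors as hex strings, handy for other tools
hex_codes = palette.as_hex()

# Requesting a continuous colormap instead of a list
cmap = sns.color_palette("viridis", as_cmap=True)
```

The list form can be passed to the palette parameter of plotting functions such as countplot.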
