Data Visualization Using Matplotlib and Seaborn
Category | Function | Description
Customization | xlabel(), ylabel() | Sets labels for the X and Y axes.
Visualization Control | xlim(), ylim() | Sets limits for the X and Y axes.
Figure Management | figure() | Creates or activates a figure.
3. Histogram
A histogram represents data grouped into ranges called bins. It is a type of bar plot in which the X-axis shows the bin ranges and the Y-axis shows the frequency, i.e. how many values fall into each bin. The hist() function computes and draws the histogram of x.
Syntax:
matplotlib.pyplot.hist(x, bins=None, range=None, density=False,
weights=None, cumulative=False, bottom=None, histtype='bar',
align='mid', orientation='vertical', rwidth=None, log=False,
color=None, label=None, stacked=False, *, data=None, **kwargs)
Example:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('tips.csv')
x = data['total_bill']
plt.hist(x)
plt.title("Tips Dataset")
plt.ylabel('Frequency')
plt.xlabel('Total Bill')
plt.show()
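The example above reads a local tips.csv. As a minimal, self-contained sketch, the code below uses synthetic, normally distributed bill amounts (an assumption, not the real tips data) and shows what hist() returns: the count in each bin, the bin edges, and the drawn bars. With bins=10 there are 11 edges, and the counts add up to the number of values.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the 'total_bill' column (assumed data, not tips.csv)
rng = np.random.default_rng(0)
bills = rng.normal(loc=20, scale=8, size=200)

fig, ax = plt.subplots()
# hist() returns the count per bin, the bin edges, and the bar artists
counts, edges, patches = ax.hist(bills, bins=10, edgecolor="black")
ax.set_title("Synthetic total bills")
ax.set_xlabel("Total Bill")
ax.set_ylabel("Frequency")
fig.savefig("histogram.png")

print(len(edges))    # 11: ten bins need eleven edges
print(counts.sum())  # every value lands in some bin
```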
4. Scatter Plot
Scatter plots are used to observe the relationship between two variables: each observation is drawn as one point. The scatter() function in matplotlib.pyplot draws a scatter plot.
Syntax:
matplotlib.pyplot.scatter(x_axis_data, y_axis_data, s=None, c=None,
marker=None, cmap=None, vmin=None, vmax=None, alpha=None,
linewidths=None, edgecolors=None)
Example:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('tips.csv')
x = data['day']
y = data['total_bill']
plt.scatter(x, y)
plt.title("Tips Dataset")
plt.ylabel('Total Bill')
plt.xlabel('Day')
plt.show()
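A sketch of the optional scatter() parameters listed in the syntax, using synthetic bill and tip values (assumed data): c maps each point's color through cmap, s sets the marker area, alpha sets transparency, and edgecolors outlines the markers. The Axes-level ax.scatter() used here takes the same arguments as plt.scatter().

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic bill/tip pairs (an assumption for illustration)
rng = np.random.default_rng(1)
total_bill = rng.uniform(5, 50, size=50)
tip = 0.15 * total_bill + rng.normal(0, 1, size=50)

fig, ax = plt.subplots()
# c maps each point's colour to a value through cmap; s sets the marker area
points = ax.scatter(total_bill, tip, c=tip, cmap="viridis", s=40,
                    alpha=0.8, edgecolors="black")
fig.colorbar(points, ax=ax, label="Tip")
ax.set_xlabel("Total Bill")
ax.set_ylabel("Tip")
ax.set_title("Synthetic tips data")
fig.savefig("scatter.png")
```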
5. Pie Chart
A pie chart is a circular chart that displays a single series of data. The area of each slice represents that part's percentage of the whole. The slices of the pie are called wedges. A pie chart is created with the pie() function.
Syntax:
matplotlib.pyplot.pie(data, explode=None, labels=None, colors=None,
autopct=None, shadow=False)
Example:
import matplotlib.pyplot as plt
# Labels and values for each wedge (no CSV file is needed for this chart)
cars = ['AUDI', 'BMW', 'FORD',
        'TESLA', 'JAGUAR']
data = [23, 10, 35, 15, 12]
plt.pie(data, labels=cars)
plt.title("Car data")
plt.show()
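The autopct and explode parameters are common additions to the example above. In this sketch, autopct='%1.1f%%' writes each wedge's share of the total on the chart, and explode offsets the FORD wedge; the returned autotexts hold the printed percentages.

```python
import matplotlib.pyplot as plt

cars = ['AUDI', 'BMW', 'FORD', 'TESLA', 'JAGUAR']
data = [23, 10, 35, 15, 12]
explode = (0, 0, 0.1, 0, 0)  # pull the FORD wedge out slightly

fig, ax = plt.subplots()
# With autopct set, pie() also returns the percentage text objects
wedges, texts, autotexts = ax.pie(data, labels=cars, explode=explode,
                                  autopct='%1.1f%%', shadow=True)
ax.set_title("Car data")
fig.savefig("pie.png")

print([t.get_text() for t in autotexts])
# ['24.2%', '10.5%', '36.8%', '15.8%', '12.6%']
```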
6. Box Plot
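A Box Plot, also known as a box-and-whisker plot, is a standardized way of displaying the distribution of data based on a five-number summary: minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. It can also show outliers: by default, Matplotlib's boxplot() extends the whiskers only to the most extreme data points within 1.5 × IQR (IQR = Q3 − Q1) of the box and draws any points beyond them individually.
Let’s see an example of how to create a Box Plot using Matplotlib in Python: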
Example:
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(10)
data = [np.random.normal(0, std, 100) for std in range(1, 4)]
plt.boxplot(data, vert=True, patch_artist=True,
boxprops=dict(facecolor='skyblue'),
medianprops=dict(color='red'))
plt.xlabel('Data Set')
plt.ylabel('Values')
plt.title('Example of Box Plot')
plt.show()
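To connect the box plot to its five-number summary, the sketch below computes the quartiles with NumPy and applies the 1.5 × IQR rule that boxplot() uses by default (whis=1.5), then checks the result against the artists that boxplot() returns. The single normal sample is illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(10)
sample = rng.normal(0, 1, 100)

# Quartiles and interquartile range
q1, median, q3 = np.percentile(sample, [25, 50, 75])
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Whiskers stop at the most extreme points still inside the fences
inside = sample[(sample >= low_fence) & (sample <= high_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = sample[(sample < low_fence) | (sample > high_fence)]

fig, ax = plt.subplots()
bp = ax.boxplot(sample)                # default whis=1.5
print(bp['medians'][0].get_ydata())    # both ends sit at the median
print(bp['whiskers'][0].get_ydata())   # [q1, whisker_low]
print(len(outliers), "outlier(s)")
```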
7. Heatmap
A Heatmap is a data visualization technique that represents
data in a matrix form, where individual values are represented as
colors. Heatmaps are particularly useful for visualizing the
magnitude of data across a two-dimensional surface and identifying
patterns, correlations, and concentrations.
Example:-
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(0)
data = np.random.rand(10, 10)
plt.imshow(data, cmap='viridis', interpolation='nearest')
# Add a color bar to show the scale
plt.colorbar()
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Example of Heatmap')
plt.show()
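Heatmaps are often used for correlation matrices. A minimal sketch with synthetic variables (a and b are constructed to be correlated, c is independent; all three are assumptions for illustration) computes the matrix with np.corrcoef(), draws it with imshow(), and writes each coefficient into its cell with ax.text().

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + rng.normal(scale=0.5, size=200)  # built to correlate with a
c = rng.normal(size=200)                 # independent of a and b
corr = np.corrcoef([a, b, c])            # 3 x 3 correlation matrix
names = ["a", "b", "c"]

fig, ax = plt.subplots()
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
fig.colorbar(im, ax=ax)
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names)
ax.set_yticks(range(len(names)))
ax.set_yticklabels(names)
# Write each coefficient in its cell (column j is x, row i is y)
for i in range(len(names)):
    for j in range(len(names)):
        ax.text(j, i, f"{corr[i, j]:.2f}", ha="center", va="center")
ax.set_title("Correlation heatmap")
fig.savefig("heatmap.png")
```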
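Errorbar:-
Error bars are a graphical enhancement that shows the variability (or uncertainty) of the plotted data on a Cartesian graph. Adding them to a plot gives an additional layer of detail about the data presented. In Matplotlib they are drawn with the errorbar() function; the examples below add vertical (yerr), horizontal (xerr), and both kinds of error bars.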
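# Vertical error bars (yerr)
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 1, 2, 1, 2, 1]
y_error = 0.2  # the same error above and below every point
plt.plot(x, y)
# fmt='o' draws the data points as circles with no connecting line
plt.errorbar(x, y,
             yerr=y_error,
             fmt='o')
plt.show()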
# Horizontal error bars (xerr)
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 1, 2, 1, 2, 1]
x_error = 0.5
plt.plot(x, y)
plt.errorbar(x, y,
             xerr=x_error,
             fmt='o')
plt.show()
# Vertical and horizontal error bars together
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 1, 2, 1, 2, 1]
x_error = 0.5
y_error = 0.3
plt.plot(x, y)
plt.errorbar(x, y,
             yerr=y_error,
             xerr=x_error,
             fmt='o')
plt.show()
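Error values do not have to be a single number. As a sketch with illustrative per-point errors, passing yerr an array of shape (2, N) gives asymmetric bars: the first row is the error below each point and the second row the error above it. capsize adds caps to the bar ends.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 8)
y = np.array([1, 2, 1, 2, 1, 2, 1], dtype=float)
# Illustrative per-point errors: row 0 is below each point, row 1 above
lower = np.array([0.1, 0.2, 0.1, 0.3, 0.1, 0.2, 0.1])
upper = 2 * lower
asym_err = np.vstack([lower, upper])  # shape (2, N)

fig, ax = plt.subplots()
ax.plot(x, y, color="lightgray")
# capsize draws short caps at both ends of each error bar
container = ax.errorbar(x, y, yerr=asym_err, fmt='o', capsize=4, ecolor="gray")
ax.set_title("Asymmetric vertical error bars")
fig.savefig("errorbar_asymmetric.png")
```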
3D Scatter Plotting:-
A 3D scatter plot is the most basic form of three-dimensional plotting: each observation is drawn as a point whose Cartesian coordinates are the values of three variables of the dataset. Matplotlib's mplot3d toolkit enables three-dimensional plotting. A 3D scatter plot is usually created with the ax.scatter3D() method of a 3D Axes, which accepts the X, Y, and Z data; its other arguments are the same as for the two-dimensional scatter plot.
Example:-
# Import libraries
from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt
z = np.random.randint(100, size =(50))
x = np.random.randint(80, size =(50))
y = np.random.randint(60, size =(50))
fig = plt.figure(figsize = (10, 7))
ax = plt.axes(projection ="3d")
ax.scatter3D(x, y, z, color = "green")
plt.title("simple 3D scatter plot")
plt.show()
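A second, deterministic sketch: points along a helix (an illustrative dataset, not from the original) coloured by height. It uses fig.add_subplot(projection='3d'), which, assuming Matplotlib 3.2 or newer, works without importing mplot3d explicitly.

```python
import numpy as np
import matplotlib.pyplot as plt

# Points along a helix: x and y trace a circle while z rises steadily
t = np.linspace(0, 4 * np.pi, 100)
x, y, z = np.cos(t), np.sin(t), t

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(projection="3d")  # creates a 3D Axes
# Colour each point by its height through the colormap
points = ax.scatter3D(x, y, z, c=z, cmap="viridis")
fig.colorbar(points, ax=ax, shrink=0.6, label="z")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")
ax.set_title("3D scatter of a helix")
fig.savefig("scatter3d.png")
```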
Example:-
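# Import libraries
from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt
# Random data; z is wrapped in parentheses so the expression can span two lines
z = (4 * np.tan(np.random.randint(10, size=(500)))
     + np.random.randint(100, size=(500)))
x = 4 * np.cos(z) + np.random.normal(size=500)
y = 4 * np.sin(z) + 4 * np.random.normal(size=500)
fig = plt.figure(figsize=(16, 9))
ax = plt.axes(projection="3d")
# Pass the on/off flag positionally: the old b= keyword was renamed in newer Matplotlib
ax.grid(True, color='grey',
        linestyle='-.', linewidth=0.3,
        alpha=0.2)
my_cmap = plt.get_cmap('hsv')
# Colour each point by x + y + z using the hsv colormap
sctt = ax.scatter3D(x, y, z,
                    alpha=0.8,
                    c=(x + y + z),
                    cmap=my_cmap,
                    marker='^')
fig.colorbar(sctt, ax=ax, shrink=0.5, aspect=5)
plt.title("3D scatter plot with a colormap")
plt.show()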
3D Surface plotting
A Surface Plot is a representation of a three-dimensional dataset. It describes a functional relationship between two independent variables, X and Y, and a designated dependent variable, Z, rather than showing the individual data points. It is a companion plot of the contour plot. It is similar to the wireframe plot, but each face of the wireframe is a filled polygon, which helps show the topology of the surface being visualized.
Example:
# Import libraries
from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt
# 32 x 32 grid of points on [-3, 3] x [-3, 3]
x = np.outer(np.linspace(-3, 3, 32), np.ones(32))
y = x.copy().T  # transpose
z = (np.sin(x ** 2) + np.cos(y ** 2))
fig = plt.figure(figsize=(14, 9))
ax = plt.axes(projection='3d')
# Keep the returned surface and give it a colormap so colorbar() can use it
surf = ax.plot_surface(x, y, z, cmap='viridis')
fig.colorbar(surf, ax=ax,
             shrink=0.5, aspect=5)
ax.set_title('Surface plot')
plt.show()
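The np.outer() construction above builds the same coordinate grids that np.meshgrid() returns with indexing='ij'. The sketch below checks that, then draws the same function as a surface and as a wireframe side by side to show the difference described above: the surface fills each face, while the wireframe draws only the edges.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Line3DCollection, Poly3DCollection

grid = np.linspace(-3, 3, 32)
# indexing="ij" gives X[i, j] = grid[i] and Y[i, j] = grid[j],
# the same arrays as the np.outer(...) construction
X, Y = np.meshgrid(grid, grid, indexing="ij")
Z = np.sin(X ** 2) + np.cos(Y ** 2)

x_outer = np.outer(grid, np.ones(32))
same_grid = np.array_equal(X, x_outer) and np.array_equal(Y, x_outer.T)

fig = plt.figure(figsize=(12, 5))
ax1 = fig.add_subplot(1, 2, 1, projection="3d")
surf = ax1.plot_surface(X, Y, Z, cmap="viridis")  # filled faces
ax1.set_title("plot_surface")
ax2 = fig.add_subplot(1, 2, 2, projection="3d")
wire = ax2.plot_wireframe(X, Y, Z, rstride=2, cstride=2)  # edges only
ax2.set_title("plot_wireframe")
fig.savefig("surface_vs_wireframe.png")
print(same_grid)  # True
```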
Triangular plotting
Matplotlib is a plotting library for Python and its numerical mathematics extension, NumPy. Pyplot is a state-based interface to Matplotlib that provides a MATLAB-like way of plotting. Pyplot supports many kinds of plots, including line plots, contour plots, histograms, scatter plots, and 3D plots.
matplotlib.pyplot.triplot() Function:
The triplot() function in the pyplot module of the matplotlib library draws an unstructured triangular grid as lines and/or markers.
Syntax: matplotlib.pyplot.triplot(*args, **kwargs)
Parameters: This method accepts the following parameters:
x, y: The x and y coordinates of the points to plot.
triangulation: A matplotlib.tri.Triangulation object, which can be passed instead of x and y.
**kwargs: Line2D properties (for example color, linestyle, linewidth, or marker) that control the appearance of the lines and markers.
All remaining args and kwargs are the same as for matplotlib.pyplot.plot().
Returns: A list of two Line2D objects:
The lines plotted for the triangle edges.
The markers plotted for the triangle nodes.
The examples below illustrate the matplotlib.pyplot.triplot() function:
Example 1:
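import matplotlib.pyplot as plt
import matplotlib.tri as tri
import numpy as np
# Create triangulation.
x = np.asarray([0, 1, 2, 3, 0.5, 1.5,
                2.5, 1, 2, 1.5])
y = np.asarray([0, 0, 0, 0, 1.0,
                1.0, 1.0, 2, 2, 3.0])
# Delaunay triangulation of the points (triangles are computed automatically)
triang = tri.Triangulation(x, y)
# Illustrative values at each node, used to colour the filled contours
z = np.cos(1.5 * x) * np.cos(1.5 * y)
plt.tricontourf(triang, z)
plt.triplot(triang, 'go-')
plt.title('matplotlib.pyplot.triplot() Example')
plt.show()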
Example 2:
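# Implementation of matplotlib function
import matplotlib.pyplot as plt
import matplotlib.tri as tri
import numpy as np
n_angles = 24
n_radii = 9
min_radius = 0.5
radii = np.linspace(min_radius, 0.9,
                    n_radii)
# Angles for each ring, shifting every other ring by half a step
angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
angles = np.repeat(angles[..., np.newaxis], n_radii, axis=1)
angles[:, 1::2] += np.pi / n_angles
x = (radii * np.cos(angles)).flatten()
y = (radii * np.sin(angles)).flatten()
triang = tri.Triangulation(x, y)
# Mask triangles whose centroid falls inside the central hole
triang.set_mask(np.hypot(x[triang.triangles].mean(axis=1),
                         y[triang.triangles].mean(axis=1))
                < min_radius)
plt.triplot(triang, 'go-', lw=1)
plt.title('matplotlib.pyplot.triplot() Example')
plt.show()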
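triplot() either computes a Delaunay triangulation from x and y or uses triangles you supply. The sketch below (four corners of a unit square plus its centre, chosen for illustration) draws both and keeps the list of two Line2D objects that triplot() returns, as described in the Returns section.

```python
import matplotlib.pyplot as plt
import matplotlib.tri as tri
import numpy as np

# Corners of a unit square plus its centre (illustrative points)
x = np.array([0.0, 1.0, 1.0, 0.0, 0.5])
y = np.array([0.0, 0.0, 1.0, 1.0, 0.5])

auto = tri.Triangulation(x, y)  # Delaunay triangulation computed for us
manual = tri.Triangulation(x, y, triangles=[[0, 1, 4], [1, 2, 4],
                                            [2, 3, 4], [3, 0, 4]])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
lines = ax1.triplot(auto, 'go-')  # [edge lines, node markers]
ax1.set_title(f"Delaunay: {len(auto.triangles)} triangles")
ax2.triplot(manual, 'b.-')
ax2.set_title("Explicit triangles")
fig.savefig("triplot_compare.png")
```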
Python Seaborn – Catplot
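What is Seaborn:-
Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Its catplot() function is a figure-level interface for drawing categorical plots onto a FacetGrid.
Parameters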
x, y, hue: names of variables in data
Inputs for plotting long-form data.
data: DataFrame
Long-form (tidy) dataset for plotting. Each column should
correspond to a variable, and each row should correspond to an
observation.
row, col: names of variables in data, optional
Categorical variables that will determine the faceting of the grid.
kind: str, optional
The kind of plot to draw, corresponds to the name of a categorical
axes-level plotting function. Options are: “strip”, “swarm”, “box”,
“violin”, “boxen”, “point”, “bar”, or “count”.
color: matplotlib color, optional
Color for all of the elements, or seed for a gradient palette.
palette: palette name, list, or dict
Colors to use for the different levels of the hue variable. Should
be something that can be interpreted by color_palette(), or a
dictionary mapping hue levels to matplotlib colors.
kwargs: key, value pairings
Other keyword arguments are passed through to the underlying
plotting function.
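A self-contained sketch of these parameters, using a small synthetic DataFrame whose column names mimic the exercise dataset (the values are random assumptions, so no download is needed): kind selects the plot type, hue colours the levels of kind, col creates one facet per diet, and palette sets the colours. catplot() returns a FacetGrid whose axes array has one row and one column per diet level.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(7)
n = 90
df = pd.DataFrame({
    "time": rng.choice(["1 min", "15 min", "30 min"], size=n),
    "kind": rng.choice(["rest", "walking", "running"], size=n),
    "diet": rng.choice(["low fat", "no fat"], size=n),
})
# Make running raise the pulse so the hue levels differ visibly
df["pulse"] = 90 + 10 * (df["kind"] == "running") + rng.normal(0, 5, size=n)

# kind picks the axes-level function (here a box plot); col facets by diet
g = sns.catplot(data=df, x="time", y="pulse", hue="kind", col="diet",
                kind="box", palette="Set2",
                order=["1 min", "15 min", "30 min"])
g.savefig("catplot_box.png")
print(g.axes.shape)  # (1, 2): one row, one column per diet level
```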
Examples:
If you are working with data that involves categorical variables, such as survey responses, categorical plots are the best tools for visualizing and comparing different features of your data. Seaborn makes categorical plots easy to draw. In the first example below, x, y, and hue take the names of columns in the data; the hue parameter colors the points according to the levels of that variable.
Example:-
import matplotlib.pyplot as plt
import seaborn as sns
exercise = sns.load_dataset("exercise")
sns.catplot(x="time", y="pulse",
hue="kind",
data=exercise)
plt.show()
Example:-
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="ticks")
exercise = sns.load_dataset("exercise")
g = sns.catplot(x="time",
kind="count",
data=exercise)
plt.show()
Example:-
import seaborn as sns
import matplotlib.pyplot as plt
exercise = sns.load_dataset("exercise")
g = sns.catplot(x="time",
y="pulse",
kind="bar",
data=exercise)
plt.show()
Example:-
import matplotlib.pyplot as plt
import seaborn as sns
exercise = sns.load_dataset("exercise")
g = sns.catplot(x="pulse",
y="time",
kind="bar",
data=exercise)
plt.show()
Example:-
import matplotlib.pyplot as plt
import seaborn as sns
exercise = sns.load_dataset("exercise")
g = sns.catplot(x="time",
y="pulse",
hue="kind",
data=exercise,
kind="violin")
plt.show()
Example:-
import matplotlib.pyplot as plt
import seaborn as sns
exercise = sns.load_dataset("exercise")
g = sns.catplot(x="time",
y="pulse",
hue="kind",
col="diet",
data=exercise)
plt.show()
Make many column facets and wrap them onto the rows of the grid with col_wrap. The aspect parameter changes the width of each facet (width = height × aspect) while keeping its height constant.
Example:-
import matplotlib.pyplot as plt
import seaborn as sns
titanic = sns.load_dataset("titanic")
g = sns.catplot(x="alive", col="deck", col_wrap=4,
                data=titanic[titanic.deck.notnull()],
                kind="count", height=2.5, aspect=.8)
plt.show()
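A sketch of how col_wrap and aspect shape the grid, with a synthetic deck/alive DataFrame standing in for the Titanic data (an assumption so the example runs offline). With seven deck levels and col_wrap=4, the grid wraps onto two rows, and each facet is height × aspect = 2.5 × 0.8 = 2 inches wide. When col_wrap is used, g.axes is a flat array with one Axes per facet.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(3)
decks = list("ABCDEFG")  # 7 facet levels (illustrative)
df = pd.DataFrame({
    "deck": rng.choice(decks, size=300),
    "alive": rng.choice(["yes", "no"], size=300),
})

height, aspect = 2.5, 0.8
g = sns.catplot(data=df, x="alive", col="deck", col_order=decks,
                col_wrap=4, kind="count", height=height, aspect=aspect)
# With col_wrap, g.axes is a flat array: one Axes per deck
print(g.axes.shape)  # (7,)
g.savefig("catplot_wrap.png")
```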
Example:-
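import matplotlib.pyplot as plt
import seaborn as sns
titanic = sns.load_dataset("titanic")
# bw is deprecated in seaborn 0.13+, where bw_method=.2 is the equivalent
g = sns.catplot(x="age", y="embark_town",
                hue="sex", row="class",
                data=titanic[titanic.embark_town.notnull()],
                orient="h", height=2, aspect=3, palette="Set3",
                kind="violin", dodge=True, cut=0, bw=.2)
plt.show()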
Box plots can be a little difficult to read at first, but they summarize the distribution of data compactly. It is best to start the explanation with an example of a box plot. I am going to use one of the common built-in datasets in Seaborn, tips:
Example:-
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
sns.catplot(x='day',
            y='total_bill',
            data=tips,
            kind='box')
plt.show()
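The boxes in a catplot(kind='box') figure are drawn from per-category quartiles. This sketch computes those quartiles with pandas for a synthetic day/total_bill DataFrame (gamma-distributed bills are an assumption, used so no download is needed) and then draws the matching box plot.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(5)
days = ["Thur", "Fri", "Sat", "Sun"]
df = pd.DataFrame({
    "day": np.repeat(days, 40),
    "total_bill": rng.gamma(shape=4, scale=5, size=160),
})

# Each box spans Q1..Q3 with a line at the median
summary = df.groupby("day")["total_bill"].quantile([0.25, 0.5, 0.75]).unstack()
summary.columns = ["Q1", "median", "Q3"]
print(summary.round(2))

g = sns.catplot(data=df, x="day", y="total_bill", kind="box", order=days)
g.savefig("catplot_box_days.png")
```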
Density Plot
A density plot is a continuous, smoothed version of a histogram estimated from the data. It is usually estimated through kernel density estimation (KDE).
In this method a kernel (a smooth, continuous curve, typically Gaussian) is placed at every individual data point, and all these curves are added together and normalized to give a single smooth density estimate. Histograms become hard to read when we want to compare the distribution of a single variable across multiple categories; in that case density plots are more useful for visualizing the data.
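The kernel density estimate described above can be written directly in NumPy. This sketch uses a Gaussian kernel and Scott's rule of thumb for the bandwidth (h = standard deviation × n^(-1/5)); seaborn and SciPy have their own, more refined implementations. Averaging the kernels and dividing by the bandwidth makes the curve integrate to approximately 1.

```python
import numpy as np
import matplotlib.pyplot as plt

def gaussian_kde_1d(data, grid, bandwidth):
    """Place a Gaussian kernel on every data point and sum them on `grid`."""
    data = np.asarray(data, dtype=float)
    # u[i, j] = distance from grid point i to data point j, in bandwidths
    u = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the curve integrates to 1
    return kernels.sum(axis=1) / (len(data) * bandwidth)

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=300)

# Scott's rule of thumb for the bandwidth: h = std * n ** (-1/5)
h = sample.std(ddof=1) * len(sample) ** (-1 / 5)
grid = np.linspace(sample.min() - 3 * h, sample.max() + 3 * h, 1000)
density = gaussian_kde_1d(sample, grid, h)

# Riemann-sum check: the area under the estimate should be close to 1
area = density.sum() * (grid[1] - grid[0])

fig, ax = plt.subplots()
ax.hist(sample, bins=20, density=True, alpha=0.4, label="histogram")
ax.plot(grid, density, label="KDE")
ax.legend()
fig.savefig("kde_from_scratch.png")
print(round(area, 3))
```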
Approach:
Import the necessary libraries.
Create a dataset or load one from the seaborn library.
Select the column for which we want to make a plot.
To make the plot, pass that column to the distplot() function provided by the seaborn library, which draws a histogram and a density plot together.
We can also make the histogram or the density plot on its own with distplot():
For a histogram only, pass kde=False as a parameter to distplot().
For a density plot only, pass hist=False as a parameter to distplot().
After making the plot, display it with the show() function provided by the matplotlib.pyplot library.
Note that distplot() is deprecated since seaborn 0.11; newer code uses histplot() for histograms, kdeplot() for density plots, or the figure-level displot().
Example 1:
import seaborn as sns
df = sns.load_dataset('diamonds')
print(df)
Joint Plot
o Draw a plot of two variables with bivariate and univariate graphs. This function provides a convenient interface to the JointGrid class, with several canned plot kinds. It is intended to be a fairly lightweight wrapper; if you need more flexibility, use the JointGrid class directly.
o Syntax: seaborn.jointplot(x, y, data=None, kind='scatter',
stat_func=None, color=None, height=6, ratio=5, space=0.2,
dropna=True, xlim=None, ylim=None, joint_kws=None,
marginal_kws=None, annot_kws=None, **kwargs)
o Parameters: The main parameters are described below:
o x, y: Data, or names of variables in “data”.
o data: (optional) A DataFrame, used when “x” and “y” are variable names.
o kind: (optional) The kind of plot to draw.
o color: (optional) The color used for the plot elements.
o dropna: (optional) A boolean; if True, observations that are missing from “x” or “y” are removed.
o Return: A JointGrid object with the plot on it.
Example
import seaborn as sns
import matplotlib.pyplot as plt
data = sns.load_dataset("exercise")
sns.jointplot(x = "id", y = "pulse",
kind = "kde", data = data)
plt.show()
Example