DATA VISUALIZATION -21AD71 NOTES

DATA VISUALIZATION
21AD71
MODULE-3
Simplifying Visualizations using Seaborn

Visualizations Using Seaborn

Introduction, Advantages of Seaborn; Controlling Figure Aesthetics: Seaborn Figure Styles,
Removing Axes Spines, Contexts; Color Palettes: Categorical Color Palettes, Sequential
Color Palettes, Diverging Color Palettes; Interesting Plots in Seaborn: Bar Plots, Kernel
Density Estimation, Plotting Bivariate Distributions, Visualizing Pairwise Relationships,
Violin Plots.

Seaborn is a library for making statistical graphics in Python. It builds on top
of Matplotlib and integrates closely with pandas data structures.

Seaborn helps you explore and understand your data. Its plotting functions
operate on data frames and arrays containing whole datasets and internally
perform the necessary semantic mapping and statistical aggregation to produce
informative plots.

 Unlike Matplotlib, Seaborn is not a standalone Python library.
 It is built on top of Matplotlib and provides a higher-level abstraction for
making visually appealing statistical visualizations.
 Seaborn attempts to make visualization a central part of data exploration
and understanding.
 Internally, Seaborn operates on DataFrames and arrays that contain the
complete dataset.
 This enables it to perform the semantic mappings and statistical aggregation
that are essential for displaying informative visualizations.
 Seaborn can also be used simply to change the style and appearance of
Matplotlib visualizations.

The most prominent features of Seaborn are as follows:

S.VINUTHA, RNSIT CSE-DS 1


 Beautiful out-of-the-box plots with different themes
 Built-in color palettes that can be used to reveal patterns in the dataset
 A dataset-oriented interface
 A high-level abstraction that still allows for complex visualizations

Advantages of Seaborn
Seaborn is built to operate on DataFrames and full dataset arrays, which makes
plotting from such data simpler than with Matplotlib alone. It internally performs
the necessary semantic mappings and statistical aggregation to produce informative plots.

 Seaborn uses Matplotlib for plotting.
 Many tasks can be done solely with Seaborn, but advanced customization
may require Matplotlib.
 Users only need to specify variable names and their roles, without
translating them into visualization parameters.
 Seaborn's default parameters improve visualizations compared to
Matplotlib's defaults.
 Users familiar with Matplotlib will find Seaborn's concepts similar and
easy to understand.
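
The point about specifying only variable names and their roles can be illustrated with a minimal sketch. The DataFrame and its column names (hours, score, group) are made up for illustration and are not part of the course datasets.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data: column names and values are invented for this sketch
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 58, 61, 70, 74, 81],
    "group": ["A", "B", "A", "B", "A", "B"],
})

sns.set()  # switch to the Seaborn default theme
# Only the column names and their roles (x, y, hue) are given;
# Seaborn maps the values to positions and colors internally.
ax = sns.scatterplot(data=df, x="hours", y="score", hue="group")
```

In a notebook the figure appears automatically; in a script, call plt.show() from matplotlib.pyplot afterwards.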
1. Controlling Figure Aesthetics
 Matplotlib is highly customizable but can be inconvenient and time-
consuming to adjust parameters.
 Seaborn offers customized themes to simplify the visualization process.
 Seaborn provides a high-level interface for controlling the appearance of
Matplotlib figures.

The following code snippet creates a simple line plot in Matplotlib:

%matplotlib inline
import matplotlib.pyplot as plt
plt.figure()
x1 = [10, 20, 5, 40, 8]
x2 = [30, 43, 9, 7, 20]
plt.plot(x1, label='Group A')
plt.plot(x2, label='Group B')
plt.legend()
plt.show()

This is what the plot looks like with Matplotlib's default parameters.

To switch to the Seaborn defaults, simply call the set() function:


%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
plt.figure()
x1 = [10, 20, 5, 40, 8]
x2 = [30, 43, 9, 7, 20]
plt.plot(x1, label='Group A')
plt.plot(x2, label='Group B')
plt.legend()
plt.show()

Seaborn Figure Styles

 Seaborn offers two methods for controlling plot style: set_style(style, [rc])
and axes_style(style, [rc]).
 seaborn.set_style(style, [rc]) sets the aesthetic style of plots.
 Parameters:
 style: Accepts a dictionary of parameters or names of preconfigured
styles:
o darkgrid
o whitegrid
o dark
o white
o ticks
 rc (optional): Allows overriding values in the preset Seaborn-style
dictionaries.
Here is an example:

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
plt.figure()
x1 = [10, 20, 5, 40, 8]
x2 = [30, 43, 9, 7, 20]
plt.plot(x1, label='Group A')
plt.plot(x2, label='Group B')
plt.legend()
plt.show()

 seaborn.axes_style(style, [rc]) returns a parameter dictionary for plot
aesthetics.
 This function can be used within a with statement to temporarily change
style parameters.
 Parameters:
 style: A dictionary of parameters or a name from the pre-configured
styles:
o darkgrid
o whitegrid
o dark
o white
o ticks
 rc (optional): Allows overriding values in the preset Seaborn-style
dictionaries.
Example :

%matplotlib inline
import matplotlib.pyplot as plt

import seaborn as sns
sns.set()
plt.figure()
x1 = [10, 20, 5, 40, 8]
x2 = [30, 43, 9, 7, 20]
# The 'dark' style applies only to axes created inside the with block
with sns.axes_style('dark'):
    plt.plot(x1, label='Group A')
    plt.plot(x2, label='Group B')
plt.legend()
plt.show()

The aesthetics are only changed temporarily. The result is shown in the
following diagram:

For further customization, you can pass a dictionary of parameters to the rc
argument. You can only override parameters that are part of the style definition.
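
As a minimal sketch of the rc argument, the following overrides one entry of the whitegrid style and checks that a with block changes the parameters only temporarily. The choice of grid.linestyle as the overridden key is just an example.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# axes_style() returns the style as a dictionary; rc overrides entries in it
style = sns.axes_style("whitegrid", rc={"grid.linestyle": "--"})
print(style["grid.linestyle"])

# The same override applied temporarily inside a with block
sns.set_style("white")                 # the 'white' style has no grid
before = plt.rcParams["axes.grid"]
with sns.axes_style("whitegrid", rc={"grid.linestyle": "--"}):
    inside = plt.rcParams["axes.grid"]  # grid is switched on here
after = plt.rcParams["axes.grid"]       # restored when the block ends
print(before, inside, after)
```

Calling sns.axes_style() without arguments returns the current style dictionary, which is a convenient way to see which keys can be overridden.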

Removing Axes Spines

The despine() function removes the top and right axes spines from a plot.
Parameters:
 fig (optional): Figure object to apply despine to.
 ax (optional): Axes object to apply despine to.

 top (default=True): Remove the top spine if set to True.
 right (default=True): Remove the right spine if set to True.
 left (default=False): Remove the left spine if set to True.
 bottom (default=False): Remove the bottom spine if set to True.
 offset (optional): Adjusts the position of the remaining spines.
 trim (default=False): Trims the remaining spines to the axes limits if set
to True.

seaborn.despine(fig=None, ax=None, top=True, right=True,
                left=False, bottom=False, offset=None, trim=False)

The following code helps to remove the axes spines:


%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("white")
plt.figure()
x1 = [10, 20, 5, 40, 8]
x2 = [30, 43, 9, 7, 20]
plt.plot(x1, label='Group A')
plt.plot(x2, label='Group B')
sns.despine()
plt.legend()
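
The offset and trim parameters are not used in the example above. A minimal sketch of how they could be combined, with an arbitrarily chosen offset of 10 points:

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("ticks")
fig, ax = plt.subplots()
ax.plot([10, 20, 5, 40, 8], label="Group A")
ax.plot([30, 43, 9, 7, 20], label="Group B")
# Remove the top and right spines (the defaults), move the remaining
# spines 10 points outward, and trim them to the outermost major ticks
sns.despine(ax=ax, offset=10, trim=True)
ax.legend()
```

Passing ax explicitly limits the change to one Axes object; without it, despine() acts on the current figure.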

Controlling the Scale of Plot Elements

 A separate set of parameters controls the scale of plot elements. This is a
handy way to use the same code to create plots suited to contexts where
larger or smaller plots are necessary. To control the context, two
functions can be used.

 seaborn.set_context(context, [font_scale], [rc]) sets the plotting context
parameters. This does not change the overall style of the plot but affects
things such as the size of the labels and lines. The base context is
notebook, and the other contexts are paper, talk, and poster, which are
versions of the notebook parameters scaled by 0.8, 1.3, and 1.6, respectively
(these are the factors of older Seaborn releases; recent releases scale
talk and poster by 1.5 and 2).

 Here are the parameters:
 context: A dictionary of parameters or the name of one of the following
preconfigured sets: paper, notebook, talk, or poster
 font_scale (optional): A scaling factor to independently scale the size of
font elements
 rc (optional): Parameter mappings to override the values in the preset
Seaborn context dictionaries

The following code helps set the context

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_context("poster")
plt.figure()
x1 = [10, 20, 5, 40, 8]
x2 = [30, 43, 9, 7, 20]
plt.plot(x1, label='Group A')
plt.plot(x2, label='Group B')
plt.legend()
plt.show()

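 seaborn.plotting_context(context, [font_scale], [rc]) returns a parameter
dictionary to scale Figure elements.
 This function can be used within a with statement to temporarily change the
context parameters.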
Parameters:
 context: A dictionary or name from preconfigured sets: paper, notebook,
talk, or poster.
 font_scale (optional): A scaling factor for font size.
 rc (optional): Overrides values in preset Seaborn context dictionaries.
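
A minimal sketch of a temporary context change follows. It assumes the notebook context as the baseline; the font_scale value of 1.2 is chosen arbitrarily.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# plotting_context() returns the scaled parameters as a dictionary
paper = sns.plotting_context("paper")
poster = sns.plotting_context("poster")
print(paper["font.size"], poster["font.size"])

sns.set_context("notebook")
x1 = [10, 20, 5, 40, 8]
# Inside the with block the poster scaling is active;
# the notebook context is restored when the block ends
with sns.plotting_context("poster", font_scale=1.2):
    fig, ax = plt.subplots()
    ax.plot(x1, label="Group A")
    ax.legend()
    size_inside = plt.rcParams["font.size"]
size_after = plt.rcParams["font.size"]
```

Only the figure created inside the block uses the larger fonts and lines; later figures fall back to the notebook scaling.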

The following exercise uses a box plot to compare IQ scores of different test
groups.

Exercise 4.01: Comparing IQ Scores for Different Test Groups by Using a Box Plot

 The exercise involves generating a box plot using Seaborn to compare IQ
scores among different test groups.
 Demonstrates the ease and efficiency of creating plots with Seaborn,
given a proper DataFrame.
 Highlights how to quickly change the style and context of a Figure using
Seaborn’s pre-configurations.

1. Create an Exercise4.01.ipynb Jupyter Notebook in the
Chapter04/Exercise4.01 folder to implement this exercise.
2. Import the necessary modules and enable plotting within the
Exercise4.01.ipynb file:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
3. Use the pandas read_csv() function to read the data located in the Datasets
folder:
mydata = pd.read_csv("../../Datasets/iq_scores.csv")
4. Access the data of each test group in the column. Convert this into a list using
the tolist() method. Once the data of each
test group has been converted into a list, assign this list to variables of each
respective test group:
group_a = mydata[mydata.columns[0]].tolist()
group_b = mydata[mydata.columns[1]].tolist()
group_c = mydata[mydata.columns[2]].tolist()
group_d = mydata[mydata.columns[3]].tolist()
5. Print the values of each group to check whether the data inside it is converted
into a list. This can be done with the help of the
print() function:
print(group_a)
print(group_b)
print(group_c)
print(group_d)

Each call prints the IQ scores of one test group as a Python list.

6. Once we have the data for each test group, we need to construct a DataFrame
from this data. This can be done with the help of
the pd.DataFrame() function, which is provided by pandas:
data = pd.DataFrame({
    'Groups': ['Group A'] * len(group_a) + ['Group B'] * len(group_b)
              + ['Group C'] * len(group_c) + ['Group D'] * len(group_d),
    'IQ score': group_a + group_b + group_c + group_d,
})
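
The same long-format DataFrame can also be built with the pandas melt() function. The sketch below is an alternative to the list-based construction, not the method used in the exercise. The small wide table stands in for iq_scores.csv, and its column names and values are invented.

```python
import pandas as pd

# Hypothetical wide-format table: one column per test group
wide = pd.DataFrame({
    "group_a": [118, 103, 125],
    "group_b": [126, 89, 90],
    "group_c": [108, 89, 114],
    "group_d": [93, 99, 91],
})

# melt() stacks the columns into (Groups, IQ score) rows
long = wide.melt(var_name="Groups", value_name="IQ score")
print(long.head())
```

With this approach the group labels come from the column names of the wide table, so they would need renaming if labels such as 'Group A' are wanted on the axis.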

If you don't create your own DataFrame, it is often helpful to print the column
names, which is done by calling print(data.columns). The output is as follows:

You can see that our DataFrame has two variables with the labels Groups and
IQ score. This is useful because we can use them to specify which variable to
plot on the x-axis and which one on the y-axis.

7. Create a box plot from this DataFrame using Seaborn's boxplot() function,
setting the x-axis variable to "Groups" and the y-axis variable to "IQ score".

plt.figure(dpi=150)
# Set style
sns.set_style('whitegrid')
# Create boxplot
sns.boxplot(x='Groups', y='IQ score', data=data)
# Despine
sns.despine(left=True, right=True, top=True)
# Add title
plt.title('IQ scores for different test groups')
# Show plot
plt.show()

The despine() function helps in removing the top and right spines from the plot
by default (without passing any arguments
to the function). Here, we have also removed the left spine. Using the title()
function, we have set the title for our plot. The
show() function visualizes the plot.
After executing the preceding steps, the final output should be as follows:

2. Color Palettes

 Color is crucial in visualizations, as it can reveal or obscure data patterns.
 Seaborn simplifies the selection of color palettes suitable for various
tasks.
 The color_palette() function allows for generating color palettes.
 Command: seaborn.color_palette([palette], [n_colors], [desat]) returns a
list of colors defining a color palette.
 Parameters:
o palette (optional): Name of the palette or None to return the
current palette.
o n_colors (optional): Number of colors in the palette; if it exceeds
the number of colors the palette defines, the colors cycle.
o desat (optional): Proportion to desaturate each color.
 Use set_palette() to apply a palette for all plots, accepting the same
arguments as color_palette().
 Choosing an effective color palette is subjective and depends on the data
characteristics.
 There are three main groups of color palettes: categorical, sequential, and
diverging, which will be explored in detail later.
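
The parameters of color_palette() and the effect of set_palette() can be sketched as follows. The palette names are Seaborn built-ins; the requested sizes and the desat value are arbitrary choices for illustration.

```python
import seaborn as sns

# Ask for two more colors than the "deep" palette defines:
# the extra colors cycle back to the start of the palette
deep = sns.color_palette("deep")
n = len(deep)
longer = sns.color_palette("deep", n + 2)
print(longer[n] == longer[0])

# desat scales the saturation of every color (1 leaves it unchanged)
soft = sns.color_palette("deep", 3, desat=0.5)

# set_palette() makes a palette the default color cycle for later plots
sns.set_palette("colorblind")
current = sns.color_palette()  # no argument: returns the current palette
```

Each palette is a list of RGB tuples, so it can be indexed, sliced, or passed to sns.palplot() for a quick preview.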
Categorical Color Palettes

Categorical palettes (or qualitative color palettes) are best suited for
distinguishing categorical data that does not have an inherent ordering.

The color palette should have colors as distinct from one another as possible.

 A rule of thumb is that if you have double-digit categories, it is advisable
to divide the categories into groups. Different shades of one color could be
used within a group.
 Another way to keep groups apart could be to use hues that are close
together in the color wheel within a group and hues that are far apart for
different groups.
 There are six default categorical palettes in Seaborn:
o deep,
o muted,
o bright,
o pastel,
o dark,
o colorblind
 The following is the code to create a deep color palette:
import seaborn as sns
palette1 = sns.color_palette("deep")
sns.palplot(palette1)

The following diagram shows the output of the code:

The following code creates a muted color palette:

palette2 = sns.color_palette("muted")
sns.palplot(palette2)

The following is the output of the code:

The following code creates a bright color palette:

palette3 = sns.color_palette("bright")
sns.palplot(palette3)

The following code creates a pastel color palette:

palette4 = sns.color_palette("pastel")
sns.palplot(palette4)

Here is the output showing a pastel color palette:

The following code creates a dark color palette:

palette5 = sns.color_palette("dark")
sns.palplot(palette5)

The following diagram shows a dark color palette:

The following code creates a colorblind palette:

palette6 = sns.color_palette("colorblind")
sns.palplot(palette6)

Here is the output of the code:

Sequential Color Palettes

 Sequential color palettes are appropriate for sequential data that ranges
from low to high values, or vice versa.
 It is recommended to use light colors for low values and dark ones for
high values.
 One of the sequential color palettes that Seaborn offers is the cubehelix
palette. It has a linear increase or decrease in brightness and some
variation in hue, meaning that the information is preserved even when
converted to black and white.
 The default palette returned by cubehelix_palette() is illustrated in the
following diagram. To customize the cubehelix palette, the hue at the
start of the helix can be set with start (a value between 0 and 3), and the
number of rotations around the hue wheel can be set with rot:
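
The notes do not show the code for the cubehelix palette, so here is a minimal sketch. The values chosen for start and rot are arbitrary examples.

```python
import seaborn as sns

# Default cubehelix palette: a list of RGB tuples running from light to dark
default_pal = sns.cubehelix_palette()
sns.palplot(default_pal)

# Customized: 8 colors, a different starting hue and rotation direction
custom_pal = sns.cubehelix_palette(8, start=0.5, rot=-0.75)
sns.palplot(custom_pal)
```

A negative rot value rotates around the hue wheel in the opposite direction.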

 Custom sequential palettes that run from a light or dark desaturated color
to a specified color can be created with light_palette() or
dark_palette().
 Two examples are given in the following:
custom_palette2 = sns.light_palette("magenta")
sns.palplot(custom_palette2)

The preceding palette can also be reversed by setting the reverse parameter to
True in the following code:

custom_palette3 = sns.light_palette("magenta", reverse=True)
sns.palplot(custom_palette3)

The following diagram shows the output of the code:

By default, creating a color palette only returns a list of colors. If you want to
use it as a colormap object, for example, in combination
with a heatmap, set the as_cmap=True argument, as demonstrated in the
following example:
import numpy as np
import seaborn as sns
x = np.arange(25).reshape(5, 5)
ax = sns.heatmap(x, cmap=sns.cubehelix_palette(as_cmap=True))
This creates the following heatmap:

Diverging Color Palettes


 Diverging color palettes are used for data that has a well-defined
midpoint. An emphasis is placed on both high and low values.
 The following code snippet and output illustrate a diverging palette,
using the coolwarm colormap that is built into Matplotlib:
custom_palette4 = sns.color_palette("coolwarm", 7)
sns.palplot(custom_palette4)
The following diagram shows the output of the code:

You can use the diverging_palette() function to create custom diverging
palettes. We can pass two hues in degrees as parameters, along with the total
number of colors in the palette. The following code snippet and output
illustrate this:
custom_palette5 = sns.diverging_palette(120, 300, n=7)
sns.palplot(custom_palette5)
The following diagram shows the output of the code:
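
A diverging palette is most useful when the midpoint of the data is tied to the middle of the colormap. The following sketch uses randomly generated values around zero as a stand-in for real data and combines diverging_palette() with a heatmap.

```python
import numpy as np
import seaborn as sns

# Hypothetical data: deviations from zero, so 0 is the natural midpoint
rng = np.random.default_rng(0)
values = rng.normal(loc=0.0, scale=1.0, size=(6, 6))

# as_cmap=True returns a Matplotlib colormap instead of a list of colors
cmap = sns.diverging_palette(120, 300, as_cmap=True)
# center=0 pins the middle of the colormap to the value 0
ax = sns.heatmap(values, cmap=cmap, center=0)
```

Without center=0, the neutral color would sit at the middle of the observed value range rather than at zero.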

Exercise: Surface Temperature Analysis


Advanced Plots in Seaborn

 Creating bar plots with subgroups in Matplotlib is quite tedious, but
Seaborn offers a very convenient way to create various bar plots.
 In Seaborn, bar plots can also represent estimates of central tendency
with the height of each bar, while uncertainty is indicated by error bars
at the top of the bar.
import pandas as pd
import seaborn as sns
data = pd.read_csv("../Datasets/salary.csv")
sns.set(style="whitegrid")
sns.barplot(x="Education", y="Salary", hue="District", data=data)
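
Since salary.csv is not included in these notes, the sketch below uses a small invented DataFrame with the same column names to show what the bar heights represent. All values are made up.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for salary.csv: column names follow the example,
# the values are invented
data = pd.DataFrame({
    "Education": ["Bachelor", "Bachelor", "Master", "Master"] * 3,
    "District": ["North", "South"] * 6,
    "Salary": [40, 42, 55, 58, 44, 41, 57, 60, 43, 45, 54, 59],
})

sns.set(style="whitegrid")
# Each bar shows the mean Salary of one Education/District subgroup;
# the error bar on top is a bootstrapped confidence interval
ax = sns.barplot(x="Education", y="Salary", hue="District", data=data)

# The bar heights correspond to these group means
means = data.groupby(["Education", "District"])["Salary"].mean()
print(means)
```

The mean is only the default estimate; barplot() accepts an estimator argument for other measures of central tendency, such as the median.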

Activity: Movie Comparison Revisited


Kernel Density Estimation
 It is often useful to visualize how variables of a dataset are distributed.
Seaborn offers handy functions to examine univariate and bivariate
distributions.
 One possible way to look at a univariate distribution in Seaborn is by
using the distplot() function. This will draw a histogram and fit a kernel
density estimate (KDE), as illustrated in the following example
(distplot() has been deprecated since Seaborn 0.11 in favor of
histplot() and displot()):

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_csv('../../Datasets/age_salary_hours.csv')
sns.distplot(data.loc[:, 'Age'])
plt.xlabel('Age')
plt.ylabel('Density')

To just visualize the KDE, Seaborn provides the kdeplot() function:

# fill=True shades the area under the curve (older releases used shade=True)
sns.kdeplot(data.loc[:, 'Age'], fill=True)
plt.xlabel('Age')
plt.ylabel('Density')

The KDE plot is shown in the following diagram, along with a shaded
area under the curve:
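
To make the idea behind the KDE concrete, the estimate can be computed by hand: every observation contributes a small Gaussian bump, and the density is the average of these bumps, f(x) = 1/(n h) * sum of K((x - x_i) / h). The sketch below uses NumPy only. The sample values and the bandwidth rule (Scott's rule of thumb) are assumptions for illustration, and Seaborn's own implementation may differ in detail.

```python
import numpy as np

# Hypothetical sample standing in for the 'Age' column
ages = np.array([22, 25, 31, 35, 36, 41, 47, 52, 58, 63], dtype=float)
n = ages.size

# Bandwidth from Scott's rule of thumb: h = std * n^(-1/5)
h = ages.std(ddof=1) * n ** (-1 / 5)

def gaussian_kernel(u):
    """Standard normal density."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

# Evaluate the estimate on a grid that extends beyond the data
grid = np.linspace(ages.min() - 4 * h, ages.max() + 4 * h, 500)
# One row per grid point, one column per observation
bumps = gaussian_kernel((grid[:, None] - ages[None, :]) / h)
density = bumps.sum(axis=1) / (n * h)

# A density must integrate to (approximately) 1
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller bandwidth h gives a more wiggly curve that follows individual observations, while a larger h gives a smoother curve.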

Plotting Bivariate Distributions

 For visualizing bivariate distributions, we will introduce three different
plots.
 The first two plots use the jointplot() function, which creates a
multi-panel figure that shows both the joint relationship between the two
variables and the corresponding marginal distributions.
 A scatter plot shows each observation as a point on the x and y axes.
Additionally, a histogram for each variable is shown:
import pandas as pd
import seaborn as sns
data = pd.read_csv('../../Datasets/age_salary_hours.csv')
sns.set(style="white")
sns.jointplot(x="Annual Salary", y="Age", data=data)
The scatter plot with marginal histograms is shown in the following diagram:

It is also possible to use the KDE procedure to visualize bivariate distributions.
The joint distribution is shown as a contour plot, as
demonstrated in the following code:

# subdata is assumed to be a DataFrame derived from data;
# its definition is not shown in these notes
sns.jointplot(x='Annual Salary', y='Age', data=subdata,
              kind='kde', xlim=(0, 500000), ylim=(0, 100))

The joint distribution is shown as a contour plot in the center of the diagram.
The darker the color, the higher the density. The marginal distributions are
visualized on the top and on the right.
Visualizing Pairwise Relationships
 For visualizing multiple pairwise relationships in a dataset, Seaborn
offers the pairplot() function
 This function creates a matrix where off-diagonal elements visualize the
relationship between each pair of variables and the diagonal elements
show the marginal distributions.
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_csv('../../Datasets/age_salary_hours.csv')
sns.set(style="ticks", color_codes=True)

g = sns.pairplot(data, hue='Education')

 A pair plot, also called a correlogram, is shown in the following
diagram. Scatter plots are shown for all variable pairs on the off-diagonal,
while KDEs are shown on the diagonal. Groups are highlighted by
different colors:

Violin Plots
 A different approach to visualizing statistical measures is by using violin
plots. They combine box plots with the kernel density estimation
procedure that we described previously.
 Violin plots provide a richer description of the variable's distribution.
Additionally, the quartile and whisker values from the box plot are shown
inside the violin.
The following example demonstrates the usage of violin plots:
import pandas as pd
import seaborn as sns
data = pd.read_csv("../../Datasets/salary.csv")
sns.set(style="whitegrid")
sns.violinplot(x='Education', y='Salary', hue='Gender',
               data=data, split=True, cut=0)

The result is shown in the following figure.
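
For readers without access to salary.csv, the violin plot can be tried on generated data. The sketch below keeps the column names of the example; the values are random and carry no meaning.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for salary.csv; values are randomly generated
rng = np.random.default_rng(3)
n = 120
data = pd.DataFrame({
    "Education": np.tile(["Bachelor", "Master", "PhD"], n // 3),
    "Gender": np.repeat(["Female", "Male"], n // 2),
    "Salary": rng.normal(50000, 9000, size=n),
})

sns.set(style="whitegrid")
# split=True draws one half-violin per Gender level;
# cut=0 stops the density at the smallest and largest observed values
ax = sns.violinplot(x="Education", y="Salary", hue="Gender",
                    data=data, split=True, cut=0)
```

Without cut=0 the KDE extends past the extreme data points, which can suggest salaries that never occur in the data.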

Activity: Comparing IQ Scores for Different Test Groups by Using a Violin Plot
