Advanced Plot Types With Seaborn

The document discusses advanced plotting techniques using the Seaborn library for data visualization, emphasizing its ability to create informative statistical graphics. It covers various statistical plotting functions such as distplot, boxplot, violinplot, scatter plots, and heatmaps, providing examples and code snippets for visualizing univariate distributions and pairwise relationships. Seaborn's features facilitate the exploration of complex datasets, making it an essential tool for effective data analysis.

Uploaded by

priya verma

Data visualization is a crucial aspect of data analysis, enabling us to uncover
patterns, trends, and relationships within our datasets. Seaborn, a powerful
data visualization library built on top of Matplotlib, offers a high-level interface
for creating aesthetically pleasing and informative statistical graphics. In this
section, you will explore Seaborn's statistical plotting functions, which
provide an easy-to-use yet flexible way to visualise various types of data.

Seaborn is particularly adept at visualising complex datasets with multiple
variables. Its default themes and colour palettes make it simple to create
visually appealing plots with minimal effort. Let's delve into some key features
of Seaborn's statistical plotting functions.

Introduction to Seaborn's Statistical Plotting Functions


Seaborn extends beyond basic plotting by offering specialised functions for
statistical analysis.

These include:

a. histplot and kdeplot for univariate distribution plots (the older distplot
is deprecated since Seaborn 0.11).

b. boxplot for creating box plots to show distributions with respect to
categories.

c. violinplot for combining box plots and kernel density plots.

d. scatterplot for visualizing relationships between pairs of variables.

e. heatmap for visualizing matrices such as correlation matrices.
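
As a minimal sketch of two of the functions listed above, the following draws a box plot and a violin plot of the same data side by side. The DataFrame, its column names (meal, bill), and the random values are made up for illustration and do not come from the examples in this section.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical data: 50 bill amounts for each of two meal types
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "meal": np.repeat(["lunch", "dinner"], 50),
    "bill": np.concatenate([rng.normal(15, 3, 50), rng.normal(22, 5, 50)]),
})

# Draw both plot types on a shared y-axis so they can be compared directly
fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(9, 4), sharey=True)
sns.boxplot(data=df, x="meal", y="bill", ax=ax_box)
sns.violinplot(data=df, x="meal", y="bill", ax=ax_violin)
ax_box.set_title("Box plot")
ax_violin.set_title("Violin plot")
fig.tight_layout()
# Call plt.show() to display the figure when running this as a script.
```

The box plot summarises each group with its quartiles, while the violin plot adds a density outline, so placing them next to each other shows what each one emphasises.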

Creating Informative Visualizations for Statistical Analysis


Seaborn excels at creating visualizations that convey valuable insights in
statistical analysis. Its functions are designed to handle complex relationships
between variables, making it an ideal choice for exploring and presenting data.
To get started, let us consider a basic example using Seaborn to visualise the
distribution of a univariate dataset.

Example 1: Visualizing Univariate Distribution

import seaborn as sns
import matplotlib.pyplot as plt

# Load a sample dataset
tips = sns.load_dataset("tips")

# Create a histogram using Seaborn
sns.histplot(tips["total_bill"], kde=True, bins=30, color='skyblue')

# Set plot labels and title
plt.xlabel("Total Bill Amount")
plt.ylabel("Frequency")
plt.title("Distribution of Total Bill Amount")

# Show the plot
plt.show()

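Output: a histogram of total_bill in 30 bins with a kernel density curve overlaid (figure not reproduced here).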

In this example, we use the histplot function to create a histogram of
the "total_bill" column from the "tips" dataset. The kde=True parameter
overlays a kernel density estimate, a smoothed curve that approximates the
shape of the distribution. This simple yet informative plot gives us a quick
overview of the distribution of total bill amounts.

Creating Pair Plots to Visualize Pairwise Relationships

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Example DataFrame
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [1, 3, 5, 3, 1]
}
df = pd.DataFrame(data)

# Create pair plot
sns.pairplot(df)
plt.show()

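Output: a 3×3 grid with histograms of A, B, and C on the diagonal and scatter plots of each pair of columns elsewhere (figure not reproduced here).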

Here is the breakdown of the code step-by-step:

1. Import Libraries:
• seaborn as sns: Seaborn is a statistical data visualization library
based on Matplotlib. It provides a high-level interface for drawing
attractive and informative statistical graphics.
• pandas as pd: Pandas is a powerful data manipulation and analysis
library for Python. It provides data structures like DataFrames, which
are perfect for handling tabular data.
• matplotlib.pyplot as plt: Matplotlib is a plotting library for Python.
Pyplot is a module in Matplotlib that provides a MATLAB-like
interface for plotting.

2. Create a DataFrame:
• A dictionary data is defined with three keys ('A', 'B', and 'C') each
associated with a list of numbers.
• pd.DataFrame(data): This converts the dictionary into a Pandas
DataFrame. The DataFrame df will have three columns ('A', 'B', and
'C') with the corresponding values.

3. Generate and Display the Pair Plot:


• sns.pairplot(df): This function creates a grid of pairwise plots for the
DataFrame df. It automatically creates scatter plots for each pair of
numerical variables and histograms for the univariate distributions
along the diagonal.
• plt.show(): This function from Matplotlib displays the generated plots.

Here are some additional parameters you can use to customize the pairplot:

• hue: Adds a categorical variable for colour encoding.
• markers: Specifies different markers for different levels of the hue
variable.
• diag_kind: Specifies the kind of plot for the diagonal elements ('auto',
'hist', 'kde', or None).
• plot_kws: Additional keyword arguments for the off-diagonal plots (scatter
plots by default).
• diag_kws: Additional keyword arguments for the diagonal plots.

Generating Heatmaps for Correlation and Categorical Data

Heatmaps are effective for visualizing the relationships between variables
in a dataset, especially when dealing with correlation matrices or
categorical data. Seaborn's heatmap function makes it easy to create
visually appealing and informative heatmaps.

Correlation Heatmap: This heatmap shows the pairwise correlations
between numerical variables in your dataset. It helps in understanding how
variables are related to each other in terms of their numerical values (e.g.,
how changes in one variable relate to changes in another). Here is an
example of a correlation heatmap:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Example DataFrame for correlation
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [1, 3, 5, 3, 1]
}
df = pd.DataFrame(data)

# Compute the correlation matrix
correlation_matrix = df.corr()

# Create a heatmap
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm',
            vmin=-1, vmax=1)
plt.title('Correlation Heatmap')
plt.show()

Output: a 3×3 annotated heatmap of the correlations between A, B, and C (figure not reproduced here).

Here is the breakdown of the code step-by-step.

1. Import Libraries: Same as before, we import necessary libraries
(seaborn, pandas, and matplotlib.pyplot).

2. Example DataFrame: Define a sample DataFrame (df) with columns 'A',
'B', and 'C' containing numerical data.

3. Compute the Correlation Matrix:


• df.corr(): Computes the pairwise correlation of columns in df. This
gives us a matrix where each element represents the correlation
coefficient between two variables.

4. Create the Heatmap:

• sns.heatmap(): This function creates a heatmap using Seaborn.

o correlation_matrix: Pass the correlation matrix computed earlier.

o annot=True: Displays the correlation values in each cell.

o cmap='coolwarm': Color map for the heatmap; here 'coolwarm'
represents a spectrum from blue (negative correlation) to red
(positive correlation).

o vmin=-1, vmax=1: Set the range of values for the color map to
cover the full correlation range, from -1 (perfect negative
correlation) to 1 (perfect positive correlation).

• plt.title('Correlation Heatmap'): Adds a title to the heatmap plot.

• plt.show(): Displays the heatmap.
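
A correlation matrix is symmetric, so half of its cells repeat the other half. The sketch below reuses the same example DataFrame and, as one possible refinement rather than part of the original example, hides the upper triangle with heatmap's mask parameter. For this data A and B are perfectly negatively correlated (-1), while C is uncorrelated with both (0).

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [5, 4, 3, 2, 1],
    "C": [1, 3, 5, 3, 1],
})
corr = df.corr()

# True above the diagonal: those cells duplicate the ones below it
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

fig, ax = plt.subplots()
# Cells where mask is True are left blank
sns.heatmap(corr, mask=mask, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation Heatmap (lower triangle)")
# Call plt.show() to display the figure when running this as a script.
```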

Categorical Data Heatmap: This type of heatmap is used when you have
categorical data and want to visualize how different categories are
distributed across numerical values. It shows counts or frequencies of
categories for each value, providing insights into distribution patterns.

Here is an example of a categorical heatmap:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Example DataFrame for categorical data
data_cat = {
    'Category': ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B'],
    'Value': [1, 2, 3, 4, 5, 6, 7, 8]
}
df_cat = pd.DataFrame(data_cat)

# Pivot table to prepare data for heatmap.
# pivot_table needs a column to aggregate, so add a helper column 'n'
# and count its entries with len.
pivot_table = df_cat.assign(n=1).pivot_table(
    index='Category', columns='Value', values='n', aggfunc=len, fill_value=0)

# Create a categorical heatmap
sns.heatmap(pivot_table, annot=True, cmap='viridis')
plt.title('Categorical Heatmap')
plt.show()

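Output: an annotated heatmap with one row per category and one column per value (figure not reproduced here).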

Here is the breakdown of the code step-by-step:

1. Example DataFrame for Categorical Data: data_cat contains a
categorical column 'Category' and a numerical column 'Value'.

2. Prepare Data for Heatmap:


a. df_cat.pivot_table(): Pivot the DataFrame to aggregate the counts of
each category ('A' and 'B') against each value in 'Value'. Here, len is
used as the aggregation function to count occurrences.

b. fill_value=0: Replace missing values with 0.

3. Create the Categorical Heatmap:


a. sns.heatmap(): Similar to the correlation heatmap, but here it
visualizes counts of categories against numerical values.
i. pivot_table: Pass the pivoted DataFrame.

ii. annot=True: Displays the counts in each cell.

iii. cmap='viridis': Color map for the heatmap.

4. Display the Heatmap:


a. plt.title('Categorical Heatmap'): Sets the title for the plot.

b. plt.show(): Displays the categorical heatmap.
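
The same table of counts can also be built with pandas' crosstab function, which is a different technique from the pivot_table call described above. The sketch below is an alternative under that assumption; the variable name counts is illustrative.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df_cat = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "A", "B", "A", "B"],
    "Value": [1, 2, 3, 4, 5, 6, 7, 8],
})

# crosstab counts how often each (Category, Value) pair occurs;
# combinations that never occur are filled with 0
counts = pd.crosstab(df_cat["Category"], df_cat["Value"])

fig, ax = plt.subplots()
# fmt="d" prints the annotations as integers
sns.heatmap(counts, annot=True, fmt="d", cmap="viridis", ax=ax)
ax.set_title("Categorical Heatmap")
# Call plt.show() to display the figure when running this as a script.
```

crosstab needs no aggregation function or fill value for plain counting, which keeps the preparation step to a single call.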

Seaborn's statistical plotting functions offer a versatile and efficient
way to visualize data for statistical analysis. Whether you're exploring
univariate distributions, analysing pairwise relationships, or
examining correlation matrices, Seaborn provides an intuitive and
aesthetically pleasing interface. By incorporating these functions into
your data analysis workflow, you can gain valuable insights and
communicate your findings effectively. Experiment with different
datasets and customization options to discover the full potential of
Seaborn's visualization capabilities.
