Python Pandas - Density Plot



A Density Plot, also known as a Kernel Density Estimate (KDE) plot, is a non-parametric way to estimate the Probability Density Function (PDF) of a random variable.

It is commonly used to visualize the distribution of data. It is closely related to a histogram, but instead of using bars to represent the distribution, a density plot draws a smooth curve that estimates the PDF of the dataset by placing a Gaussian kernel on each data point and combining them.
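To make the idea of "applying Gaussian kernels" concrete, here is a minimal NumPy-only sketch that averages one Gaussian bump per data point. The function name gaussian_kde_manual and the fixed bandwidth of 0.8 are illustrative assumptions, not part of Pandas; the library chooses the bandwidth for you.

```python
import numpy as np

def gaussian_kde_manual(samples, grid, bandwidth):
    """Hypothetical helper: average one Gaussian kernel per sample on a grid."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance between every grid point and every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    # The density estimate is the mean of the per-sample kernels
    return kernels.mean(axis=1)

samples = [1, 2, 2.5, 3, 3.5, 4, 5]
grid = np.linspace(-5, 11, 1601)
density = gaussian_kde_manual(samples, grid, bandwidth=0.8)

# A valid density is non-negative and its area is approximately 1
step = grid[1] - grid[0]
area = density.sum() * step
print(area)
```

The curve is highest where the data points cluster, which is the same information a histogram conveys with bar heights.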

In this tutorial, we will learn about creating and customizing density plots using the Pandas library with different examples.

Density Plot in Pandas

In Pandas, you can easily create density plots using the plot.kde() or plot.density() methods available for both Series and DataFrame objects. These methods use Matplotlib internally and return either a matplotlib.axes.Axes object or a NumPy array (np.ndarray) of Axes objects. They also require SciPy to be installed.
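The two return types can be checked directly. The sketch below assumes SciPy is installed and uses the general subplots=True plotting option to get one Axes per column; the seeded random data is only for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.axes

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 2)), columns=["Col1", "Col2"])

# Default: all columns are drawn as lines on one shared Axes
ax = df.plot.kde()
print(isinstance(ax, matplotlib.axes.Axes), len(ax.get_lines()))

# subplots=True: each column gets its own Axes, returned as a NumPy array
axes = df.plot.kde(subplots=True)
print(type(axes).__name__, axes.shape)

# plt.show()  # import matplotlib.pyplot as plt and uncomment to display
```

Keeping the returned object is useful when you want to set labels or titles afterwards through the Axes methods.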

Syntax

Following is the syntax of the plot.kde() method (plot.density() is an alias that accepts the same parameters) −

plot.kde(bw_method=None, ind=None, **kwargs)

Where,

  • bw_method: Specifies the method used to calculate the kernel bandwidth. This can be 'scott', 'silverman', a scalar constant, or a callable. If None (the default), 'scott' is used.

  • ind: Specifies the evaluation points for the estimated PDF. If it is a NumPy array, the KDE is evaluated at exactly those points; if it is an integer, that many equally spaced points are used. If None (the default), 1000 equally spaced points are used.

  • **kwargs: Additional arguments for plot customization.
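To see what the bw_method options mean numerically, the following sketch calls scipy.stats.gaussian_kde directly, which the Pandas documentation names as the function used internally. Inspecting the factor attribute is an illustration added here, and the sample data is an assumption.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = rng.normal(size=100)
n = data.size

# kde.factor is the multiplier applied to the sample standard deviation
scott = gaussian_kde(data, bw_method="scott")
silverman = gaussian_kde(data, bw_method="silverman")
scalar = gaussian_kde(data, bw_method=0.3)

print(scott.factor)      # n ** (-1/5) for one-dimensional data
print(silverman.factor)  # (3 * n / 4) ** (-1/5) for one-dimensional data
print(scalar.factor)     # a scalar is used as the factor directly
```

Because a scalar replaces the rule-based factor, values below roughly 0.4 are narrower than the default for 100 samples, and values above it are wider.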

Density Plots for Series

To create a density plot for a single Pandas Series object, you can use the Series.plot.kde() or Series.plot.density() method.

Example

Here's an example of how to create a simple density plot for a Pandas Series object. This plot visualizes the data distribution as a smooth curve.

import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = [7, 4]

# Sample data
series = pd.Series([1, 2, 2.5, 3, 3.5, 4, 5])

# Create density plot
densityplot = series.plot.kde()

# Set the title and display the plot
plt.title("Simple Density Plot")
plt.show()

Following is the output of the above code −

Density Plot for Series
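Since a density plot is a smoothed counterpart of a histogram, it can help to draw both on the same Axes. This is a sketch under the assumption that the histogram is normalized with density=True so that both share the same vertical scale; the data and bin count are arbitrary.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
series = pd.Series(rng.normal(size=200))

fig, ax = plt.subplots(figsize=(7, 4))

# Normalized histogram: the bar areas sum to 1, like the area under the KDE
series.plot.hist(bins=20, density=True, alpha=0.4, ax=ax)

# Density curve drawn on top of the bars
series.plot.kde(ax=ax)

ax.set_title("Histogram with Density Curve")
# plt.show()  # uncomment to display the figure interactively
```

The curve follows the tops of the bars but does not depend on where the bin edges happen to fall.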

Density Plots for DataFrame

You can also create a density plot for an entire DataFrame or specific columns of the DataFrame by using the DataFrame.plot.kde() or DataFrame.plot.density() methods.

Example

The following example demonstrates how to generate a density plot for a specific column of a DataFrame.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

plt.rcParams["figure.figsize"] = [7, 4]

# Generate random data
df = pd.DataFrame(np.random.normal(size=(10, 4)), columns=["Col1", "Col2", "Col3", "Col4"])

# Create density plot for a single column of the DataFrame
df["Col1"].plot.kde()

# Set the title and display the plot
plt.title("Density Plot for a DataFrame column")
plt.show()

On executing the above code we will get the following output −

Density Plot for DataFrame

Multiple Density Plots on the Same Axes

You can overlay multiple density plots on the same axes, which is useful for comparing the distributions of multiple columns.

Example

The following example demonstrates creating multiple density plots on the same axes using the DataFrame.plot.kde() method.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = [7, 4]

# Create a DataFrame
df = pd.DataFrame(np.random.normal(size=(100, 4)), columns=["Col1", "Col2", "Col3", "Col4"])

# Plot density plots
ax = df.plot.kde()

# Set the title and display the plot
plt.title("Multiple Density Plots")
plt.show()

Following is the output of the above code −

Multiple Density Plot
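Overlaid curves are easier to compare when they are styled explicitly. The sketch below selects two columns and passes styling options through **kwargs; the column names "A" and "B", the distributions, and the colors are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.normal(loc=0.0, scale=1.0, size=200),
    "B": rng.normal(loc=3.0, scale=0.5, size=200),
    "C": rng.normal(loc=-2.0, scale=2.0, size=200),
})

# Plot only columns A and B, forwarding style options to Matplotlib
ax = df[["A", "B"]].plot.kde(
    color=["tab:blue", "tab:orange"],
    linestyle="--",
    linewidth=2,
)
ax.set_xlabel("Value")
ax.set_title("Comparing Two Columns")

# Locate the peak of the curve for column B
line_b = ax.get_lines()[1]
peak_b = line_b.get_xdata()[np.argmax(line_b.get_ydata())]
print(peak_b)
# plt.show()  # uncomment to display the figure interactively
```

The narrower, taller curve belongs to the column with the smaller spread, and its peak sits near that column's mean.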

Adjusting Bandwidth of the Density Plot

The bw_method parameter controls the smoothness of the density plot. Smaller values produce a wiggly curve that may over-fit the data, while larger values produce an over-smoothed curve that under-fits it.

Example: Density plot for Small Bandwidth

This example sets bw_method to 0.3, which is below the default Scott factor of about 0.4 for 100 samples, so the curves follow the data more closely.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = [7, 4]

# Create a DataFrame
df = pd.DataFrame(np.random.normal(size=(100, 4)), columns=["Col1", "Col2", "Col3", "Col4"])

# Small bandwidth
df.plot.kde(bw_method=0.3)
plt.title("Density Plot with Small Bandwidth (DataFrame)")
plt.show()

Following is the output of the above code −

Density Plot for Small Bandwidth

Example: Density plot for Large Bandwidth

This example sets bw_method to 3, which is far above the default, so the curves become much smoother and flatter.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = [7, 4]

# Create a DataFrame
df = pd.DataFrame(np.random.normal(size=(100, 4)), columns=["Col1", "Col2", "Col3", "Col4"])

# Large bandwidth
df.plot.kde(bw_method=3)
plt.title("Density Plot with Large Bandwidth (DataFrame)")
plt.show()

Following is the output of the above code −

Density Plot for Large Bandwidth
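The effect of the bandwidth is easiest to judge when the same data is drawn with several settings on one Axes. The sketch below does this for a single Series; the helper count_peaks is hypothetical and only gives a rough measure of how wiggly each curve is.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
series = pd.Series(rng.normal(size=100), name="Col1")

fig, ax = plt.subplots(figsize=(7, 4))
for bw in (0.1, "scott", 3):
    # Same data each time; only the bandwidth setting changes
    series.plot.kde(bw_method=bw, ax=ax, label=f"bw_method={bw!r}")
ax.legend()
ax.set_title("Same Data, Three Bandwidths")

def count_peaks(y):
    """Hypothetical helper: count strict local maxima of a sampled curve."""
    y = np.asarray(y)
    return int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))

peaks = [count_peaks(line.get_ydata()) for line in ax.get_lines()]
print(peaks)
# plt.show()  # uncomment to display the figure interactively
```

A very small bandwidth tends to produce several spurious peaks, while a very large one collapses the curve into a single broad bump.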

Customizing Evaluation Points

To customize the evaluation points, you can use the ind parameter. This allows you to control the specific points at which the KDE is calculated.

Example

The following example demonstrates customizing the evaluation points of the density plot by using the ind parameter of the plot.kde() method.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = [7, 4]

# Generate random data
df = pd.DataFrame(np.random.normal(size=(10, 4)), columns=["Col1", "Col2", "Col3", "Col4"])

# Create density plot, evaluating the KDE only at these six x-values
df.plot.kde(ind=[-2, -1, 0, 1, 2, 3])
plt.title("Density Plot with Custom Evaluation Points (DataFrame)")
plt.show()

On executing the above code we will get the following output −

Density Plot with Custom Evaluation
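The ind parameter also accepts an integer or a NumPy array. The sketch below, using assumed sample data and an assumed grid from -4 to 4, plots both variants side by side and reads back the x-values of the drawn lines to confirm where the KDE was evaluated.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
series = pd.Series(rng.normal(size=100))

fig, (ax_int, ax_arr) = plt.subplots(1, 2, figsize=(7, 4))

# Integer: that many equally spaced evaluation points
series.plot.kde(ind=50, ax=ax_int)
ax_int.set_title("ind=50")

# Array: the KDE is evaluated at exactly these points
grid = np.linspace(-4, 4, 200)
series.plot.kde(ind=grid, ax=ax_arr)
ax_arr.set_title("ind=np.linspace(-4, 4, 200)")

x_int = ax_int.get_lines()[0].get_xdata()
x_arr = ax_arr.get_lines()[0].get_xdata()
print(len(x_int), len(x_arr))
# plt.show()  # uncomment to display the figure interactively
```

With only a handful of evaluation points the curve is drawn as a coarse polyline, so a denser grid is usually preferable when a smooth appearance matters.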