Python Pandas - Density Plot



A Density Plot, also known as a Kernel Density Estimate (KDE) plot, is a non-parametric way to estimate the Probability Density Function (PDF) of a random variable.

It is commonly used to visualize the distribution of data and is closely related to the histogram. Instead of using bars to represent the distribution, a density plot draws a smooth curve that estimates the Probability Density Function (PDF) of a dataset by placing a Gaussian kernel on each data point and combining the results.
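
To make the idea of "applying Gaussian kernels" concrete, the following is a minimal sketch that builds the estimate by hand with NumPy and compares it with scipy.stats.gaussian_kde, the estimator Pandas relies on. The helper name manual_gaussian_kde, the evaluation grid, and the sample values are assumptions chosen for illustration only.

```python
import numpy as np
from scipy.stats import gaussian_kde


def manual_gaussian_kde(data, grid, bandwidth):
    """Hypothetical helper: average one Gaussian bump per data point."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance between every grid point and every data point
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    # The density estimate is the mean of the individual kernels
    return kernels.mean(axis=1)


data = np.array([1, 2, 2.5, 3, 3.5, 4, 5])
grid = np.linspace(-2, 8, 500)  # illustrative evaluation grid

# Scott's rule for one-dimensional data: n**(-1/5) times the sample standard deviation
bandwidth = len(data) ** (-1 / 5) * data.std(ddof=1)

density_manual = manual_gaussian_kde(data, grid, bandwidth)
density_scipy = gaussian_kde(data)(grid)

print(np.allclose(density_manual, density_scipy))
```

Each data point contributes one bell-shaped curve, and the bandwidth sets how wide each curve is. The plotting methods described in this tutorial do the same computation and draw the result.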

In this tutorial, we will learn how to create and customize density plots using the Pandas library, with several examples.

Density Plot in Pandas

In Pandas, you can easily create density plots using the plot.kde() or plot.density() methods, which are available for both Series and DataFrame objects. These methods use Matplotlib for drawing and SciPy for the density estimate, so both libraries must be installed. They return either a matplotlib.axes.Axes object or a NumPy array (np.ndarray) of Axes objects.
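
As a quick illustration of the two possible return types, the sketch below plots a single column and then all columns with subplots=True. The column names and the random data are assumptions made for this example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=["Col1", "Col2", "Col3"])

# A Series gives back a single Axes object
single_ax = df["Col1"].plot.kde()

# With subplots=True, each column gets its own Axes, returned in a NumPy array
axes_array = df.plot.kde(subplots=True)

print(type(single_ax).__name__)
print(type(axes_array).__name__, len(axes_array))
```

Keeping the returned object is useful when you want to set titles, labels, or limits afterwards. Call plt.show() to display the figures.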

Syntax

Following is the syntax of the plot.kde() or plot.density() method −

plot.kde(bw_method=None, ind=None, **kwargs)

Where,

  • bw_method: Specifies the method used to calculate the kernel bandwidth. This can be 'scott', 'silverman', a scalar constant, or a callable. The default (None) uses 'scott'.

  • ind: Specifies the evaluation points for the KDE. This can be a NumPy array of custom evaluation points or an integer giving the number of equally spaced points. By default, 1000 equally spaced points are used.

  • **kwargs: Additional arguments for plot customization.
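
The parameters above can be tried out side by side. The following sketch draws the same Series with each accepted form of bw_method. The function half_scott is a hypothetical callable written for this example; it assumes, as SciPy documents, that the callable receives the gaussian_kde instance and returns a scalar.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
series = pd.Series(rng.normal(size=200))

fig, ax = plt.subplots()

# Named rules of thumb
series.plot.kde(bw_method="scott", ax=ax, label="scott")
series.plot.kde(bw_method="silverman", ax=ax, label="silverman")

# A scalar is used directly as the bandwidth factor
series.plot.kde(bw_method=0.2, ax=ax, label="scalar 0.2")


# Hypothetical callable: takes the gaussian_kde object, returns a scalar factor
def half_scott(kde):
    return 0.5 * kde.scotts_factor()


# ind=300 evaluates this curve at 300 equally spaced points
series.plot.kde(bw_method=half_scott, ind=300, ax=ax, label="callable")

ax.legend()
```

Here ax and label are examples of the extra keyword arguments covered by **kwargs. Call plt.show() to display the figure.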

Density Plots for Series

To create a density plot for a single Pandas Series object, you can use the Series.plot.kde() or Series.plot.density() method.

Example

Here's an example of how to create a simple density plot for a Pandas Series object. This plot visualizes the data distribution as a smooth curve.

import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = [7, 4]

# Sample data
series = pd.Series([1, 2, 2.5, 3, 3.5, 4, 5])

# Create density plot
densityplot = series.plot.kde()

# Set the title and display the plot
plt.title("Simple Density Plot")
plt.show()

Following is the output of the above code −

[Figure: Density Plot for Series]

Density Plots for DataFrame

You can also create a density plot for an entire DataFrame or specific columns of the DataFrame by using the DataFrame.plot.kde() or DataFrame.plot.density() methods.

Example

The following example demonstrates how to generate a density plot for a specific column of a DataFrame.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

plt.rcParams["figure.figsize"] = [7, 4]

# Generate random data
df = pd.DataFrame(np.random.normal(size=(10, 4)),
                  columns=["Col1", "Col2", "Col3", "Col4"])

# Create density plot for a specific column of the DataFrame
df["Col1"].plot.kde()

# Set the title and display the plot
plt.title("Density Plot for a DataFrame column")
plt.show()

On executing the above code we will get the following output −

[Figure: Density Plot for DataFrame]

Multiple Density Plots on the Same Axes

You can overlay multiple density plots on the same axes, which is useful for comparing the distributions of multiple columns.

Example

The following example demonstrates creating multiple density plots on the same axes using the DataFrame.plot.kde() method.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = [7, 4]

# Create a DataFrame
df = pd.DataFrame(np.random.normal(size=(100, 4)),
                  columns=["Col1", "Col2", "Col3", "Col4"])

# Plot one density curve per column on shared axes
ax = df.plot.kde()

# Set the title and display the plot
plt.title("Multiple Density Plots")
plt.show()

Following is the output of the above code −

[Figure: Multiple Density Plot]
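
Overlaying also works for separate Series that do not share a DataFrame, for instance samples of different lengths. The sketch below passes the same Axes to each call through the ax keyword. The names group_a and group_b and their distributions are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Two illustrative samples with different sizes, centres, and spreads
group_a = pd.Series(rng.normal(loc=0.0, scale=1.0, size=200))
group_b = pd.Series(rng.normal(loc=2.0, scale=0.5, size=120))

# Create one Axes and draw both curves on it
fig, ax = plt.subplots()
group_a.plot.kde(ax=ax, label="group_a")
group_b.plot.kde(ax=ax, label="group_b", linestyle="--")

ax.set_title("Two Series on the Same Axes")
ax.legend()
```

Because each curve is a separate density estimate, every curve integrates to one regardless of the sample size. Call plt.show() to display the figure.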

Adjusting Bandwidth of the Density Plot

The bw_method parameter controls the smoothness of the density plot. Smaller values produce a more jagged curve that may over-fit the data, while larger values produce a smoother curve that may under-fit it.
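
To see what a scalar bw_method means numerically, the following sketch uses scipy.stats.gaussian_kde directly, which is what Pandas calls internally. It assumes one-dimensional data, where the standard deviation of each kernel equals the bandwidth factor multiplied by the sample standard deviation.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
data = rng.normal(size=100)

default_kde = gaussian_kde(data)                 # Scott's rule
narrow_kde = gaussian_kde(data, bw_method=0.3)   # small factor
wide_kde = gaussian_kde(data, bw_method=3)       # large factor

sample_std = data.std(ddof=1)
for name, kde in [("scott", default_kde), ("0.3", narrow_kde), ("3", wide_kde)]:
    # Width of each Gaussian bump placed on a data point
    kernel_std = kde.factor * sample_std
    print(f"bw_method={name}: factor={kde.factor:.3f}, kernel std={kernel_std:.3f}")
```

For 100 one-dimensional observations, Scott's rule gives a factor of 100 ** (-1/5), about 0.398, so 0.3 is somewhat narrower than the default and 3 is several times wider.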

Example: Density plot for Small Bandwidth

This example passes a small value (0.3) to the bw_method parameter, which gives a narrow bandwidth and a curve that follows the data closely.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = [7, 4]

# Create a DataFrame
df = pd.DataFrame(np.random.normal(size=(100, 4)),
                  columns=["Col1", "Col2", "Col3", "Col4"])

# Small bandwidth
df.plot.kde(bw_method=0.3)

plt.title("Density Plot with Small Bandwidth (DataFrame)")
plt.show()

Following is the output of the above code −

[Figure: Density Plot for Small Bandwidth]

Example: Density plot for Large Bandwidth

This example passes a large value (3) to the bw_method parameter, which gives a wide bandwidth and a much smoother, flatter curve.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = [7, 4]

# Create a DataFrame
df = pd.DataFrame(np.random.normal(size=(100, 4)),
                  columns=["Col1", "Col2", "Col3", "Col4"])

# Large bandwidth
df.plot.kde(bw_method=3)

plt.title("Density Plot with Large Bandwidth (DataFrame)")
plt.show()

Following is the output of the above code −

[Figure: Density Plot for Large Bandwidth]

Customizing Evaluation Points

To customize the evaluation points, you can use the ind parameter. This lets you control the specific x-values at which the KDE is calculated and drawn.

Example

The following example demonstrates customizing the evaluation points of the density plot by using the ind parameter of the plot.kde() method.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = [7, 4]

# Generate random data
df = pd.DataFrame(np.random.normal(size=(10, 4)),
                  columns=["Col1", "Col2", "Col3", "Col4"])

# Create density plot, evaluating the KDE only at these six points
df.plot.kde(ind=[-2, -1, 0, 1, 2, 3])

plt.title("Density Plot with Custom Evaluation Points (DataFrame)")
plt.show()

On executing the above code we will get the following output −

[Figure: Density Plot with Custom Evaluation]
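
The ind parameter also accepts an integer or a NumPy array. The sketch below draws the same Series three ways and reads the x-values back from the plotted lines to confirm how many points were used. The grid limits and sample data are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
series = pd.Series(rng.normal(size=100))

fig, ax = plt.subplots()

# Default: 1000 equally spaced points
series.plot.kde(ax=ax, label="default")

# Integer: that many equally spaced points
series.plot.kde(ax=ax, ind=50, label="ind=50")

# Array: the KDE is evaluated exactly at these x-values
grid = np.linspace(-4, 4, 200)
series.plot.kde(ax=ax, ind=grid, label="custom grid")

ax.legend()

# Read back the x-values that were actually plotted
default_x, coarse_x, custom_x = (line.get_xdata() for line in ax.get_lines())
print(len(default_x), len(coarse_x), len(custom_x))
```

A dense grid gives a smooth curve, while a handful of points, as in the previous example, produces straight segments between them. Call plt.show() to display the figure.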