Adding Legend to Boxplot with Multiple Plots
Last Updated: 23 Jul, 2025
Boxplots are an effective way to visualize the distribution of a dataset. When analyzing multiple datasets simultaneously, it can become challenging to differentiate between them without a clear legend. This article will guide you through the process of adding a legend to a Matplotlib boxplot with multiple plots on the same axis, ensuring clarity and effectiveness in your visualizations.
Understanding Boxplots in Matplotlib
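Before diving into the specifics of adding legends, it's important to understand how boxplots are created in Matplotlib. A boxplot is a graphical representation of the distribution of a set of data based on a five-number summary: minimum, first quartile, median, third quartile, and maximum. Note that with Matplotlib's default settings the whiskers extend only to the most extreme data points within 1.5 times the interquartile range from the box, and any points beyond that are drawn individually as outliers.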
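As a rough illustration, the five-number summary can be computed directly with NumPy. This is only a sketch: the sample list and the `summary` dictionary are illustrative names, and `np.percentile` uses linear interpolation by default, so the quartiles may differ slightly from other quartile conventions.

```python
import numpy as np

# Illustrative sample data (same values as a simple 1..10 dataset)
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Quartiles via linear interpolation (NumPy's default method)
q1, median, q3 = np.percentile(data, [25, 50, 75])

summary = {
    "minimum": min(data),
    "first quartile": float(q1),
    "median": float(median),
    "third quartile": float(q3),
    "maximum": max(data),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The box in a boxplot spans the first to third quartile, with a line at the median.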
When dealing with multiple boxplots on the same axis, legends are crucial for identifying which boxplot corresponds to which dataset. Without legends, the plot can be confusing and difficult to interpret.
Here is a simple example of creating a boxplot in Matplotlib:
Python
import matplotlib.pyplot as plt
data1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data2 = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
plt.boxplot([data1, data2])
plt.title("Multiple Boxplots")
plt.show()
Output:
Boxplots in Matplotlib
Boxplot with Legend - Customizing the Appearance
To make the plot informative, adding a legend is essential, especially when multiple datasets are represented on the same axes.
# Create custom labels
labels = ['Dataset 1', 'Dataset 2', 'Dataset 3']
# Add a legend
ax.legend([box['boxes'][i] for i in range(len(colors))], labels)
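The `box['boxes']` lookup works because `ax.boxplot` returns a dictionary of the artists it drew. The sketch below inspects that dictionary. The sample data is made up, and the `Agg` backend line is only an assumption so the snippet runs without a display; drop it in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Two small made-up datasets, drawn as filled boxes
box = ax.boxplot([[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]], patch_artist=True)

# The returned dictionary groups the drawn artists by role
keys = sorted(box.keys())
print(keys)

# With patch_artist=True each box is a patch, which can be filled
# with a color and passed to ax.legend as a handle
box_type = type(box['boxes'][0]).__name__
print(box_type)

# One box per dataset, two whiskers per box
print(len(box['boxes']), len(box['whiskers']))
```

Because each entry in `box['boxes']` is an ordinary artist, it can be passed to `ax.legend` as a handle alongside a matching label.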
Let's implement the complete code for visualizing a boxplot with a legend:
Python
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(10)
data1 = np.random.normal(loc=20, scale=5, size=100)
data2 = np.random.normal(loc=30, scale=10, size=100)
data3 = np.random.normal(loc=25, scale=7, size=100)
# Combine data into a list
data = [data1, data2, data3]
# Create a figure and axis
fig, ax = plt.subplots()
# Create the boxplot
box = ax.boxplot(data, patch_artist=True)
# Set labels for x-axis
ax.set_xticklabels(['Dataset 1', 'Dataset 2', 'Dataset 3'])
# Define colors for each dataset
colors = ['lightblue', 'lightgreen', 'lightcoral']
# Apply colors to each boxplot
for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)
# Create custom labels
labels = ['Dataset 1', 'Dataset 2', 'Dataset 3']
# Add a legend
ax.legend([box['boxes'][i] for i in range(len(colors))], labels)
plt.title('Boxplot with Legend for Multiple Datasets')
plt.show()
Output:
Boxplots in Matplotlib
Modifying Legend Position in Boxplot
By default, Matplotlib uses loc='best', which places the legend where it overlaps the plotted data the least. You can set the position explicitly with the loc parameter for better visibility.
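Besides the named positions, a legend can also be moved outside the plotting area by combining `loc` with `bbox_to_anchor`. The following is a minimal sketch of that option, not the only way to do it: the random data, the anchor coordinates `(1.02, 1.0)`, and the `Agg` backend line are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(10)
data = [rng.normal(20, 5, 100), rng.normal(30, 10, 100), rng.normal(25, 7, 100)]
labels = ['Dataset 1', 'Dataset 2', 'Dataset 3']
colors = ['lightblue', 'lightgreen', 'lightcoral']

fig, ax = plt.subplots()
box = ax.boxplot(data, patch_artist=True)
for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)

# loc names the corner of the legend box that is pinned to bbox_to_anchor,
# which is given in axes coordinates: (1.02, 1.0) is just right of the top edge
legend = ax.legend(
    box['boxes'],
    labels,
    loc='upper left',
    bbox_to_anchor=(1.02, 1.0),
    borderaxespad=0.0,
)

# Make room for the legend next to the axes
fig.tight_layout()
# plt.show()  # uncomment in an interactive session
```

Placing the legend outside the axes avoids covering any boxes or outliers, at the cost of a narrower plotting area.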
Python
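import matplotlib.pyplot as plt
import numpy as np
np.random.seed(10)
# Same three datasets as in the previous example
data = [
    np.random.normal(loc=20, scale=5, size=100),
    np.random.normal(loc=30, scale=10, size=100),
    np.random.normal(loc=25, scale=7, size=100),
]
fig, ax = plt.subplots()
box = ax.boxplot(data, patch_artist=True)
# Set labels for x-axis
ax.set_xticklabels(['Dataset 1', 'Dataset 2', 'Dataset 3'])
# Define colors for each dataset
colors = ['lightblue', 'lightgreen', 'lightcoral']
# Apply colors to each boxplot
for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)
# Create custom labels
labels = ['Dataset 1', 'Dataset 2', 'Dataset 3']
# Add a legend and position it in the upper left corner
ax.legend([box['boxes'][i] for i in range(len(colors))], labels, loc='upper left')
# Show the plot
plt.title('Boxplot with Legend for Multiple Datasets')
plt.show()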
Output:
Boxplots in Matplotlib
Handling Multiple Boxplots in a Loop
If you are plotting multiple boxplots in a loop, you can manage the legend labels and handles within the loop. Here’s an example:
Python
import matplotlib.pyplot as plt
data1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data2 = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
data3 = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
datasets = [data1, data2, data3]
labels = ['Dataset 1', 'Dataset 2', 'Dataset 3']
colors = ['blue', 'orange', 'green']
fig, ax = plt.subplots()
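for position, (data, label, color) in enumerate(zip(datasets, labels, colors), start=1):
    # Draw each dataset at its own x position so the boxes do not overlap
    bp = ax.boxplot(data, positions=[position], patch_artist=True)
    for patch in bp['boxes']:
        patch.set_facecolor(color)
        patch.set_label(label)
# Label the ticks once all boxes are drawn
ax.set_xticks(range(1, len(labels) + 1))
ax.set_xticklabels(labels)
ax.legend()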
plt.title("Multiple Boxplots with Legend")
plt.show()
Output:
Boxplots in Matplotlib
This method ensures that each boxplot is correctly labeled and colored, and the legend reflects these labels.
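An alternative way to manage the legend inside the loop is to collect the handles explicitly and pass them to `ax.legend` afterwards, rather than relying on the labels being picked up automatically. This is a sketch under a few assumptions: the `handles` list and the `positions` argument are choices made for this illustration, and the `Agg` backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

datasets = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    [3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
]
labels = ['Dataset 1', 'Dataset 2', 'Dataset 3']
colors = ['blue', 'orange', 'green']

fig, ax = plt.subplots()
handles = []  # one legend handle per dataset

for position, (data, color) in enumerate(zip(datasets, colors), start=1):
    # Each call draws a single box at its own x position
    bp = ax.boxplot(data, positions=[position], patch_artist=True)
    box_patch = bp['boxes'][0]
    box_patch.set_facecolor(color)
    handles.append(box_patch)

# Pass handles and labels explicitly so the legend order is under our control
legend = ax.legend(handles, labels, loc='upper left')

ax.set_xticks(range(1, len(labels) + 1))
ax.set_xticklabels(labels)
ax.set_title("Multiple Boxplots with Legend")
# plt.show()  # uncomment in an interactive session
```

Collecting the handles by hand makes it easy to skip or reorder entries, for example when several boxes should share a single legend entry.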
Boxplot Using Custom Legend Handles
For older versions of Matplotlib or for more customized control, you can create custom legend handles. Here’s an example using matplotlib.patches:
Python
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
data1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data2 = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
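colors = ['blue', 'orange']
bp = plt.boxplot([data1, data2], patch_artist=True)
# Color the boxes so they match the custom legend handles
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)
legend_handle1 = mpatches.Patch(color='blue', label='Dataset 1')
legend_handle2 = mpatches.Patch(color='orange', label='Dataset 2')
plt.legend(handles=[legend_handle1, legend_handle2])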
plt.title("Boxplot with Custom Legend")
plt.show()
Output:
Boxplots in Matplotlib
Conclusion
Adding legends to boxplots in Matplotlib is a crucial step in ensuring that your data visualizations are clear and interpretable. By passing the box artists to the legend, labeling them with set_label (or with the label parameter in newer versions of Matplotlib), or creating custom legend handles, you can effectively manage and customize the legends for your boxplots.