Pandas Scatter Plot – DataFrame.plot.scatter()
Last Updated: 03 Apr, 2025
A scatter plot is a data visualization that shows the relationship between two numerical variables by drawing one point per observation. In Pandas, we can create a scatter plot using the DataFrame.plot.scatter() method. This method helps in visualizing how one variable varies with another. Example:
Python
import pandas as pd
import matplotlib.pyplot as plt
data = {'Height': [150, 160, 170, 180, 190],
        'Weight': [50, 65, 75, 85, 95]}
df = pd.DataFrame(data)
# Creating a scatter plot
df.plot.scatter(x='Height', y='Weight')
plt.show()
Output
Basic scatter plot
Explanation: This scatter plot shows how Weight changes with Height. As height increases, weight also tends to increase, indicating a positive correlation.
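The positive correlation seen in the plot can also be checked numerically. The following is a minimal sketch, not part of the original example, that uses the Pandas Series.corr() method to compute the Pearson correlation coefficient for the same data:

```python
import pandas as pd

df = pd.DataFrame({'Height': [150, 160, 170, 180, 190],
                   'Weight': [50, 65, 75, 85, 95]})

# Pearson correlation coefficient between the two columns
# (Series.corr() uses the Pearson method by default)
r = df['Height'].corr(df['Weight'])
print(f"Pearson r = {r:.3f}")
```

For this data the coefficient is about 0.996, very close to 1, which matches the nearly straight upward trend of the points.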
Syntax of DataFrame.plot.scatter()
DataFrame.plot.scatter(x, y, s=None, c=None, **kwargs)
Other commonly used options, such as colormap, alpha, figsize and grid, are passed as keyword arguments through **kwargs.
Parameters:
Parameter | Description |
---|---|
x (Required) | Column name to be used for x-axis values. |
y (Required) | Column name to be used for y-axis values. |
s (Optional) | Size of the markers (default is None). Can be a single value, a column name or an array. |
c (Optional) | Color of the markers. Can be a column name, color string or an array. |
colormap (Optional) | Colormap to use for coloring points. |
alpha (Optional) | Transparency level of points (range: 0 to 1). |
figsize (Optional) | Tuple (width, height) to define figure size. |
grid (Optional) | Boolean (True or False) to display a grid. |
**kwargs | Additional keyword arguments, forwarded to DataFrame.plot() and from there to Matplotlib’s scatter() function. |
Returns: A Matplotlib Axes object (matplotlib.axes.Axes) containing the scatter plot.
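The optional parameters and the returned Axes object can be combined in a single call. The following is a minimal sketch: the 'Age' column is hypothetical data made up for this illustration, and 'viridis' is just one of Matplotlib's built-in colormaps.

```python
import pandas as pd
import matplotlib.pyplot as plt

# 'Age' is a hypothetical extra column, invented for this sketch
df = pd.DataFrame({'Height': [150, 160, 170, 180, 190],
                   'Weight': [50, 65, 75, 85, 95],
                   'Age': [21, 35, 28, 42, 30]})

ax = df.plot.scatter(
    x='Height', y='Weight',
    s=80,                # one size for every marker
    c='Age',             # color each point by the value in the 'Age' column
    colormap='viridis',  # colormap that maps 'Age' values to colors
    alpha=0.7,           # slightly transparent markers
    figsize=(6, 4),      # figure size in inches (width, height)
    grid=True,           # draw grid lines
)

# The return value is a Matplotlib Axes, so it can be customized further
ax.set_title("Height vs. Weight, colored by Age")
plt.show()
```

Because c refers to a numeric column here, Pandas also draws a colorbar that shows how colors map to Age values.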
Examples of scatter plot
Example 1: In this example, we plot the age of each individual. The size of each point is scaled by Age and the color of all points is set to red.
Python
import pandas as pd
import matplotlib.pyplot as plt
data = {'Name': ['Dhanashri', 'Smita', 'Rutuja', 'Sunita', 'Poonam', 'Srushti'],
        'Age': [20, 18, 27, 50, 12, 15]}
df = pd.DataFrame(data)
# scatter plot with size determined by age
df.plot.scatter(x='Name', y='Age', s=df['Age']*10, c='red')
plt.show()
Output
Customized scatter plot
Explanation: A scatter plot where each person's name is plotted on the x-axis, and their age on the y-axis. The marker size is proportional to the age, making older individuals more prominent in the plot.
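One detail worth knowing when sizing markers: Matplotlib interprets s as the marker area in points squared, so s=df['Age']*10 makes the area, not the width, proportional to age. The sketch below compares the two scalings side by side; the 0.5 factor and the variable names are arbitrary choices for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'Name': ['Dhanashri', 'Smita', 'Rutuja', 'Sunita', 'Poonam', 'Srushti'],
        'Age': [20, 18, 27, 50, 12, 15]}
df = pd.DataFrame(data)

# Matplotlib reads s as marker AREA in points**2
area_sizes = df['Age'] * 10            # area proportional to age (as in Example 1)
width_sizes = (df['Age'] * 0.5) ** 2   # width proportional to age (0.5 is an arbitrary scale)

# Draw both variants side by side on two Axes
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
df.plot.scatter(x='Name', y='Age', s=area_sizes, c='red', ax=ax1)
ax1.set_title("Area proportional to Age")
df.plot.scatter(x='Name', y='Age', s=width_sizes, c='red', ax=ax2)
ax2.set_title("Width proportional to Age")
plt.tight_layout()
plt.show()
```

Squaring exaggerates differences between small and large values, so the right-hand plot makes the oldest individual stand out far more than the left-hand one.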
Example 2: In this example, we analyze how the population of different countries correlates with their CO₂ emissions. The size of the markers is determined by the country's population, making larger countries more prominent.
Python
import pandas as pd
import matplotlib.pyplot as plt
data = {'Country': ['USA', 'China', 'India', 'Germany', 'Brazil', 'Australia'],
        'Population': [331, 1441, 1393, 83, 213, 26],  # in millions
        'CO2_Emissions': [5000, 12000, 2500, 800, 1300, 400]}  # in megatonnes
df = pd.DataFrame(data)
# Creating scatter plot
df.plot.scatter(x='Population', y='CO2_Emissions', s=df['Population'] * 2, c='blue')
plt.xlabel("Population (in millions)")
plt.ylabel("CO₂ Emissions (megatonnes)")
plt.grid(True)
plt.show()
Output
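Population vs. CO₂ Emissions
Explanation: A scatter plot showing the relationship between a country's population and its CO₂ emissions. The marker size scales with population, so populous countries stand out. In this sample data, larger populations often come with higher emissions, but not uniformly: India has a population close to China's with far lower emissions.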
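Since the plot itself does not show which marker belongs to which country, it can help to label the points. The following sketch, an addition to the original example, uses the returned Axes and Matplotlib's annotate() method; the label offset and the alpha value are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'Country': ['USA', 'China', 'India', 'Germany', 'Brazil', 'Australia'],
        'Population': [331, 1441, 1393, 83, 213, 26],  # in millions
        'CO2_Emissions': [5000, 12000, 2500, 800, 1300, 400]}  # in megatonnes
df = pd.DataFrame(data)

ax = df.plot.scatter(x='Population', y='CO2_Emissions',
                     s=df['Population'] * 2, c='blue', alpha=0.5)

# Write each country's name next to its marker,
# shifted 5 points right and up so it does not sit on the marker center
for row in df.itertuples(index=False):
    ax.annotate(row.Country,
                xy=(row.Population, row.CO2_Emissions),
                xytext=(5, 5), textcoords='offset points')

ax.set_xlabel("Population (in millions)")
ax.set_ylabel("CO₂ Emissions (megatonnes)")
ax.grid(True)
plt.show()
```

With the labels in place, it is easy to see that China and India sit at similar positions on the x-axis but far apart on the y-axis.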
Example 3: In this example, we analyze how years of experience relate to salary. The size of each marker is determined by Job_Level, so higher job levels appear as larger markers.
Python
import pandas as pd
import matplotlib.pyplot as plt
data = {'Experience': [1, 3, 5, 7, 10, 12, 15],
        'Salary': [40000, 60000, 80000, 110000, 140000, 180000, 220000],  # in $
        'Job_Level': [1, 2, 3, 4, 5, 6, 7]}  # Job level (higher = senior)
df = pd.DataFrame(data)
# Creating scatter plot
df.plot.scatter(x='Experience', y='Salary', s=df['Job_Level'] * 50, c='green')
plt.xlabel("Years of Experience")
plt.ylabel("Salary ($)")
plt.grid(True)
plt.show()
Output
Experience vs. Salary Growth
Explanation: A scatter plot where salary increases as experience grows. Higher job levels are represented with larger markers, making it easy to see how senior positions impact salary.
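To put a rough number on how salary grows with experience, a straight line can be fitted to the points and drawn over the scatter plot. This is a sketch that goes beyond the original example: it assumes a simple linear relationship and uses NumPy's polyfit() for an ordinary least-squares fit.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = {'Experience': [1, 3, 5, 7, 10, 12, 15],
        'Salary': [40000, 60000, 80000, 110000, 140000, 180000, 220000],  # in $
        'Job_Level': [1, 2, 3, 4, 5, 6, 7]}  # Job level (higher = senior)
df = pd.DataFrame(data)

# Least-squares straight line: Salary ≈ slope * Experience + intercept
slope, intercept = np.polyfit(df['Experience'], df['Salary'], deg=1)
print(f"Estimated salary increase per year of experience: ${slope:,.0f}")

ax = df.plot.scatter(x='Experience', y='Salary', s=df['Job_Level'] * 50, c='green')

# Overlay the fitted line on the same Axes
ax.plot(df['Experience'], slope * df['Experience'] + intercept,
        color='black', linestyle='--', label='Linear fit')
ax.set_xlabel("Years of Experience")
ax.set_ylabel("Salary ($)")
ax.legend()
ax.grid(True)
plt.show()
```

For this sample data the fitted slope is roughly $13,000 per additional year of experience. With only seven points, it is best read as a summary of the plot rather than a prediction.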