How to Get the Common Index of Two Pandas DataFrames
Last Updated: 15 Sep, 2024
When working with large datasets in Python Pandas, having multiple DataFrames with overlapping or related data is common. In many cases, we may want to identify the common indices between two DataFrames to perform further analysis, such as merging, filtering, or comparison.
This article will guide us through the process of finding the common index between two Pandas DataFrames using simple and efficient methods.
Finding the common index is helpful when:
- We want to focus on the shared data between two DataFrames.
- We must perform operations like intersection or filtering based on the indices.
- We want to merge DataFrames on their indices but only for common elements.
Let’s explore different methods to achieve this in Python.
Pandas DataFrames with Common Index
In Pandas, the index represents the labels for rows in a DataFrame. It helps to uniquely identify rows, providing more control during merging, filtering, or aligning data across DataFrames. If two DataFrames share some indices, we may want to retrieve those common indices to perform meaningful operations like merging or analyzing similar subsets.
Here, let us first create two Pandas DataFrames that share part of their index.
Install Pandas
First, let’s install Pandas by writing the following command in the terminal.
pip install pandas
Import Pandas
Next, import the installed Pandas library.
import pandas as pd
Create two DataFrames that contain common index values
df1 = pd.DataFrame({
'A': [1, 2, 3, 4],
}, index=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame({
'B': [5, 6, 7, 8],
}, index=['b', 'c', 'd', 'e'])
Now, let’s see the full code execution. Here, df1 has indices ['a', 'b', 'c', 'd'], while df2 has indices ['b', 'c', 'd', 'e']. The common indices between df1 and df2 are ['b', 'c', 'd'].
Python
# import pandas module
import pandas as pd
# Create the first DataFrame
df1 = pd.DataFrame({
'A': [1, 2, 3, 4],
}, index=['a', 'b', 'c', 'd'])
# Create the second DataFrame
df2 = pd.DataFrame({
'B': [5, 6, 7, 8],
}, index=['b', 'c', 'd', 'e'])
print("DataFrame 1:\n", df1)
print("\nDataFrame 2:\n", df2)
Output:
DataFrame 1:
    A
a  1
b  2
c  3
d  4

DataFrame 2:
    B
b  5
c  6
d  7
e  8
Methods to Get the Common Index of Pandas DataFrame
Now, let us see a few different ways to get the common index of two Pandas DataFrames in Python.
1. Using intersection()
Pandas provides a straightforward way to get the common indices using the intersection() method. This method returns the intersection of two Index objects, which represents the common elements between them.
Python
# import pandas module
import pandas as pd
# Create the first DataFrame
df1 = pd.DataFrame({
'A': [1, 2, 3, 4],
}, index=['a', 'b', 'c', 'd'])
# Create the second DataFrame
df2 = pd.DataFrame({
'B': [5, 6, 7, 8],
}, index=['b', 'c', 'd', 'e'])
# finding common index using intersection()
common_index = df1.index.intersection(df2.index)
print("Common Index:", common_index)
Output:
Common Index: Index(['b', 'c', 'd'], dtype='object')
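The same method can be chained when more than two DataFrames are involved. The following is a minimal sketch, assuming a hypothetical third DataFrame df3 that is not part of the original example; it folds intersection() over all the indices with functools.reduce.

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame({'B': [5, 6, 7, 8]}, index=['b', 'c', 'd', 'e'])
# df3 is a hypothetical extra DataFrame, added only for illustration
df3 = pd.DataFrame({'C': [9, 10, 11]}, index=['c', 'd', 'f'])

# Fold Index.intersection() over the indices of all DataFrames
common_all = reduce(
    lambda left, right: left.intersection(right),
    [df.index for df in (df1, df2, df3)],
)

# Sort the labels before printing so the displayed order is stable
print("Common Index:", sorted(common_all))
```

Only the labels present in every DataFrame survive, so here the result contains 'c' and 'd'.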
2. Using Set Operations
Another approach is to convert the indices to Python sets and use the set intersection operator (&). This is convenient if we're already familiar with Python's set operations. It finds the same labels as intersection(), but the result is a Python set rather than a Pandas Index object, so the order of its elements is not guaranteed.
Python
# import pandas module
import pandas as pd
# Create the first DataFrame
df1 = pd.DataFrame({
'A': [1, 2, 3, 4],
}, index=['a', 'b', 'c', 'd'])
# Create the second DataFrame
df2 = pd.DataFrame({
'B': [5, 6, 7, 8],
}, index=['b', 'c', 'd', 'e'])
# finding common index using set intersection (&)
common_index_set = set(df1.index) & set(df2.index)
print("Common Index (Set Operation):", common_index_set)
Output:
Common Index (Set Operation): {'d', 'c', 'b'}
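Because a set has no defined order, it can help to turn the result back into an ordered collection before using it for label-based selection. The sketch below is one possible way to do that, assuming sorted order is acceptable; the names common_labels and common_index are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame({'B': [5, 6, 7, 8]}, index=['b', 'c', 'd', 'e'])

common_index_set = set(df1.index) & set(df2.index)

# Sets are unordered, so sort the labels to get a repeatable order
common_labels = sorted(common_index_set)

# Wrap the labels in a pandas Index if an Index object is needed later
common_index = pd.Index(common_labels)

print(common_labels)  # prints ['b', 'c', 'd']

# A list of labels can be passed to .loc for row selection
print(df1.loc[common_labels])
```

Converting to a list or an Index first also avoids passing a set directly to .loc, which recent Pandas versions may reject.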
Filtering DataFrames by Common Index
Once we have the common index, we might want to filter the DataFrames so that they include only the rows that share this index. This can be done with the df.loc[] label indexer: passing common_index to df1.loc[] selects the rows of df1 whose index labels appear in common_index, and the same works for df2.
Python
# import pandas module
import pandas as pd
# Create the first DataFrame
df1 = pd.DataFrame({
'A': [1, 2, 3, 4],
}, index=['a', 'b', 'c', 'd'])
# Create the second DataFrame
df2 = pd.DataFrame({
'B': [5, 6, 7, 8],
}, index=['b', 'c', 'd', 'e'])
# finding common index using intersection()
common_index = df1.index.intersection(df2.index)
# Filter df1 by common index
df1_filtered = df1.loc[common_index]
# Filter df2 by common index
df2_filtered = df2.loc[common_index]
print("Filtered DataFrame 1:\n", df1_filtered)
print("\nFiltered DataFrame 2:\n", df2_filtered)
Output:
Filtered DataFrame 1:
    A
b  2
c  3
d  4

Filtered DataFrame 2:
    B
b  5
c  6
d  7
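The same filtering can also be expressed with a boolean mask instead of an explicit common index. This is a minimal sketch using Index.isin(); the variable names mask1 and mask2 are illustrative and not part of the original example.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame({'B': [5, 6, 7, 8]}, index=['b', 'c', 'd', 'e'])

# True for each row label of df1 that also appears in df2's index
mask1 = df1.index.isin(df2.index)
# True for each row label of df2 that also appears in df1's index
mask2 = df2.index.isin(df1.index)

# Boolean masks keep the original row order of each DataFrame
df1_filtered = df1[mask1]
df2_filtered = df2[mask2]

print(df1_filtered)
print(df2_filtered)
```

A mask keeps each DataFrame's own row order, which can be handy when the two indices are ordered differently.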
Merging on the Common Index
We can also combine the data from both DataFrames based on their shared indices. The Pandas merge() method does this directly: with left_index=True and right_index=True it joins on the index of each DataFrame, and how='inner' keeps only the rows whose index appears in both, so the common index does not have to be computed beforehand.
Python
# import pandas module
import pandas as pd
# Create the first DataFrame
df1 = pd.DataFrame({
'A': [1, 2, 3, 4],
}, index=['a', 'b', 'c', 'd'])
# Create the second DataFrame
df2 = pd.DataFrame({
'B': [5, 6, 7, 8],
}, index=['b', 'c', 'd', 'e'])
# merging dataframe based on common index
merged_df = pd.merge(df1, df2, left_index=True, right_index=True, how='inner')
print("Merged DataFrame:\n", merged_df)
Output:
Merged DataFrame:
    A  B
b  2  5
c  3  6
d  4  7
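As an alternative not covered in the example above, DataFrame.join() also aligns on the index by default. The sketch below assumes the same df1 and df2 and uses how='inner' to keep only the shared labels.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame({'B': [5, 6, 7, 8]}, index=['b', 'c', 'd', 'e'])

# join() matches rows on the index; how='inner' drops unshared labels
joined_df = df1.join(df2, how='inner')

# Equivalent inner merge on the index, for comparison
merged_df = pd.merge(df1, df2, left_index=True, right_index=True, how='inner')

print("Joined DataFrame:\n", joined_df)
print("Same as merge():", joined_df.equals(merged_df))
```

join() is shorter to write when the join key is the index on both sides, while merge() is more flexible when columns are involved.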
Conclusion
Identifying the common index between two Pandas DataFrames is a valuable technique when comparing, merging, or aligning data. Whether we want to merge DataFrames based on shared indices or simply retrieve the common indices for further analysis, Pandas provides efficient ways to achieve this using functions like merge() and intersection().