How to combine two DataFrames in Pandas?
Last Updated: 05 May, 2025
When working with data, you often need to combine data from multiple sources. For example, one DataFrame may hold information about customers, while another holds their transaction history. To analyze this data together, you need to combine the two DataFrames. The two main ways to do this in Pandas are concat() and merge().
In this article, we will implement and compare both methods to show when each one fits best.
1. Using concat() to Combine DataFrames
The concat() function allows you to stack DataFrames by adding rows on top of each other or columns side by side.
Stacking DataFrames Vertically
Python
import pandas as pd
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})
c_df = pd.concat([df1, df2])
print(c_df)
Output:
Stacking Dataframes Vertically
Note that the original index labels are kept, so the labels 0 and 1 each appear twice in the result. If you want a clean, sequential index, pass ignore_index=True:
Python
c_df = pd.concat([df1, df2], ignore_index=True)
print(c_df)
Output:
Writing Index in Order
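To make the index behavior concrete, here is a minimal sketch that reuses the same two DataFrames and compares the index labels with and without ignore_index=True. The variable names kept and reset are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})

# Default: each DataFrame keeps its own labels, so 0 and 1 repeat.
kept = pd.concat([df1, df2])
print(kept.index.tolist())             # [0, 1, 0, 1]

# ignore_index=True discards the old labels and numbers the rows 0..n-1.
reset = pd.concat([df1, df2], ignore_index=True)
print(reset.index.tolist())            # [0, 1, 2, 3]

# With repeated labels, a label lookup returns every matching row.
print(kept.loc[0, 'Name'].tolist())    # ['Alice', 'Charlie']
```

Repeated labels are not an error, but they make label-based lookups ambiguous, which is usually a good reason to reset the index after stacking.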
Stacking DataFrames Horizontally
Python
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'City': ['New York', 'Los Angeles'], 'Salary': [70000, 80000]})
c_df = pd.concat([df1, df2], axis=1)
print(c_df)
Output:
Stacking Dataframes Horizontally
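One point worth illustrating: with axis=1, concat() lines rows up by index label, not by position. The sketch below gives the second DataFrame a hypothetical index of [1, 2] (an assumption made only for this example) to show the effect, and uses the join='inner' option to keep only the labels present in both.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
# Assumed for illustration: df2 carries index labels 1 and 2 instead of 0 and 1.
df2 = pd.DataFrame({'City': ['New York', 'Los Angeles'],
                    'Salary': [70000, 80000]}, index=[1, 2])

# Default join='outer': union of the labels, NaN where one side has no row.
wide = pd.concat([df1, df2], axis=1)
print(wide)

# join='inner': only labels found in both DataFrames survive.
common = pd.concat([df1, df2], axis=1, join='inner')
print(common)
```

If the rows are meant to pair up by position instead, calling reset_index(drop=True) on each DataFrame before concatenating avoids the NaN rows.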
2. Using merge() to Combine DataFrames
The merge() function works like joining tables in SQL. It combines DataFrames based on common columns or indexes.
Basic Merge (Inner Join)
The default join is an "inner join," meaning only rows whose value in the shared column appears in both DataFrames are kept:
Python
df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
df2 = pd.DataFrame({'Name': ['Alice', 'Bob', 'David'], 'Salary': [50000, 60000, 70000]})
m_df = pd.merge(df1, df2, on='Name')
print(m_df)
Output:
Inner Join of Dataframes
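Since merge() can also match on indexes, here is a small sketch of that variant. It assumes Name has been moved into the index of both DataFrames with set_index(); the names ages, salaries and by_index are illustrative.

```python
import pandas as pd

ages = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                     'Age': [25, 30, 35]}).set_index('Name')
salaries = pd.DataFrame({'Name': ['Alice', 'Bob', 'David'],
                         'Salary': [50000, 60000, 70000]}).set_index('Name')

# Match rows on the index labels instead of a column (still an inner join).
by_index = pd.merge(ages, salaries, left_index=True, right_index=True)
print(by_index)
```

Only Alice and Bob appear in the result, because they are the only labels present in both indexes.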
Types of Joins in merge()
- Inner Join: Only rows with matching values in both DataFrames.
- Outer Join: Includes all rows from both DataFrames. Where there's no match, it fills in NaN for missing values.
- Left Join: All rows from the left DataFrame and matching rows from the right.
- Right Join: All rows from the right DataFrame and matching rows from the left.
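The left and right joins from the list above can be sketched with the same sample data as the inner join example. The variable names left_df and right_df are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
df2 = pd.DataFrame({'Name': ['Alice', 'Bob', 'David'], 'Salary': [50000, 60000, 70000]})

# Left join: keep every row of df1; Charlie has no salary, so it becomes NaN.
left_df = pd.merge(df1, df2, on='Name', how='left')
print(left_df)

# Right join: keep every row of df2; David has no age, so it becomes NaN.
right_df = pd.merge(df1, df2, on='Name', how='right')
print(right_df)
```

Because NaN is a floating-point value, an integer column that gains missing values (Salary in the left join, Age in the right join) is converted to float.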
Example of an outer join:
Python
outer_m_df = pd.merge(df1, df2, on='Name', how='outer')
print(outer_m_df)
Output:
Outer Join of DataFrames
When to Use: concat() vs merge()
concat():
- When you want to stack DataFrames (add rows or columns).
- When the DataFrames have similar structures.
merge():
- When you need to join DataFrames based on shared columns or indices.
- When you need different types of joins (inner, outer, etc.).
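In practice the two functions are often used together. The sketch below follows the customer and transaction scenario from the introduction; all of the data, the CustomerID column and the variable names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: two monthly transaction batches with identical columns.
jan = pd.DataFrame({'CustomerID': [1, 2], 'Amount': [120.0, 80.0]})
feb = pd.DataFrame({'CustomerID': [2, 3], 'Amount': [45.0, 60.0]})

# Same structure, so stack the rows with concat().
transactions = pd.concat([jan, feb], ignore_index=True)

customers = pd.DataFrame({'CustomerID': [1, 2, 3],
                          'Name': ['Alice', 'Bob', 'Charlie']})

# Different structure, linked by a key column, so use merge().
report = pd.merge(transactions, customers, on='CustomerID', how='left')
print(report)
```

The left join keeps every transaction and attaches the matching customer name, so a customer with two transactions appears on two rows.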
Comparison Table - concat() vs merge()
Feature | concat() | merge() |
---|---|---|
Purpose | Stack/concatenate along an axis | Combine DataFrames based on columns or index |
Axis | Can stack along rows or columns | Joins based on common columns or index |
Join Types | Inner or outer only (join parameter, applied to the labels of the other axis) | Supports inner, outer, left, and right joins |
Flexibility | Simple stacking | More complex merging with conditions |
Use Case | Stacking DataFrames row-wise or column-wise | Joining datasets based on shared columns or indices |