Combining DataFrames in Pandas is a fundamental operation that allows users to merge, concatenate, or join data from multiple sources into a single DataFrame.
This article explores the different techniques we can use to combine DataFrames in Pandas, focusing on concatenation, merging and joining.
Python
import pandas as pd
data1 = {'name': ['Alice', 'Bob'], 'age': [25, 30]}
data2 = {'name': ['Alice', 'Charlie'], 'age': [26, 35]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df3 = df1.merge(df2, on='name', how='outer')
print(df3)
Output
      name  age_x  age_y
0    Alice   25.0   26.0
1      Bob   30.0    NaN
2  Charlie    NaN   35.0
Method 1: Concatenating DataFrames
The concat() function in Pandas combines multiple DataFrames along rows (vertically) or columns (horizontally). It is most useful when we have DataFrames with the same columns that we want to stack on top of each other.
- Use pd.concat() to combine DataFrames vertically or horizontally.
Python
import pandas as pd
df1 = pd.DataFrame({
'Name': ['John', 'Jane'],
'Age': [28, 34],
'Gender': ['Male', 'Female']
})
df2 = pd.DataFrame({
'Name': ['Alice', 'Bob'],
'Age': [25, 23],
'Gender': ['Female', 'Male']
})
# Concatenate the two DataFrames along rows (vertically)
df3 = pd.concat([df1, df2], ignore_index=True)
print(df3)
For a detailed explanation, refer to this article: How to concatenate two DataFrames
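As a small sketch of what ignore_index changes, the example below concatenates two frames with and without it. The frames and the 'City' column are made up for illustration; they also show that columns missing from one frame are filled with NaN.

```python
import pandas as pd

# Illustrative frames; 'City' exists only in the second one
df1 = pd.DataFrame({'Name': ['John', 'Jane'], 'Age': [28, 34]})
df2 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'City': ['Paris', 'Oslo']})

# Default: each frame keeps its own row labels, so labels repeat
kept = pd.concat([df1, df2])

# ignore_index=True: the result gets a fresh 0..n-1 index
fresh = pd.concat([df1, df2], ignore_index=True)

print(kept.index.tolist())   # [0, 1, 0, 1]
print(fresh.index.tolist())  # [0, 1, 2, 3]
print(fresh)                 # 'Age' and 'City' contain NaN where a frame lacked the column
```

Repeated row labels are easy to miss and can cause surprises with label-based lookups later, which is why ignore_index=True is often a reasonable default when the original labels carry no meaning.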
Method 2: Merging DataFrames
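The merge() function in Pandas combines DataFrames based on one or more common columns (similar to SQL JOINs). It is helpful when we need to match rows from different DataFrames based on key columns. The how argument controls which keys are kept: with how='inner', only names present in both DataFrames appear in the result.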
- Use merge() to combine DataFrames based on a shared column.
Python
import pandas as pd
df1 = pd.DataFrame({
'Name': ['John', 'Jane', 'Alice'],
'Age': [28, 34, 25]
})
df2 = pd.DataFrame({
'Name': ['John', 'Alice', 'Bob'],
'Department': ['HR', 'IT', 'Finance']
})
# Merge the two DataFrames on the 'Name' column
df3 = pd.merge(df1, df2, on='Name', how='inner')
print(df3)
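To see how the how argument affects which rows survive, here is a minimal sketch that reuses the same two frames with three join types. The indicator=True option adds a _merge column recording where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['John', 'Jane', 'Alice'], 'Age': [28, 34, 25]})
df2 = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                    'Department': ['HR', 'IT', 'Finance']})

# inner: only names found in both frames
inner = pd.merge(df1, df2, on='Name', how='inner')

# left: every row of df1; Department is NaN where df2 has no match
left = pd.merge(df1, df2, on='Name', how='left')

# outer: every name from either frame; _merge marks the source of each row
outer = pd.merge(df1, df2, on='Name', how='outer', indicator=True)

print(inner)
print(left)
print(outer)
```

Here 'Jane' appears only in df1 and 'Bob' only in df2, so both are dropped by the inner merge and flagged as left_only and right_only in the outer merge.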
Method 3: Joining DataFrames
The join() method combines two DataFrames based on their index or on a key column. It is convenient when the DataFrames share an index, and it performs a left join by default.
- Use join() to combine DataFrames based on their index or a key column.
Python
import pandas as pd
df1 = pd.DataFrame({
'Age': [28, 34, 25],
'Gender': ['Male', 'Female', 'Female']
}, index=['John', 'Jane', 'Alice'])
df2 = pd.DataFrame({
'Department': ['HR', 'IT', 'Finance']
}, index=['John', 'Jane', 'Alice'])
# Join the two DataFrames based on their index
df3 = df1.join(df2)
print(df3)
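The example above joins on the index of both frames. As a sketch of the key-column case, the on argument matches a column of the calling DataFrame against the index of the other one. The employees and locations frames below are hypothetical.

```python
import pandas as pd

# Hypothetical data: 'Dept' is an ordinary column here
employees = pd.DataFrame({
    'Name': ['John', 'Jane', 'Alice', 'Bob'],
    'Dept': ['HR', 'IT', 'HR', 'Sales']
})

# The lookup table is indexed by department name
locations = pd.DataFrame({'Floor': [1, 2]}, index=['HR', 'IT'])

# Match employees['Dept'] against locations.index (left join by default)
result = employees.join(locations, on='Dept')
print(result)
```

Because join() keeps every row of the calling DataFrame by default, 'Bob' stays in the result with a missing Floor value, since 'Sales' has no entry in locations.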
Method 4: Combining DataFrames Horizontally
We can also combine DataFrames horizontally (along columns) using concat(). This is useful when we want to join different features or attributes of the same observations.
- Use pd.concat() with the axis=1 argument to combine DataFrames horizontally.
Python
import pandas as pd
df1 = pd.DataFrame({
'Name': ['John', 'Jane', 'Alice'],
'Age': [28, 34, 25]
})
df2 = pd.DataFrame({
'Gender': ['Male', 'Female', 'Female'],
'Department': ['HR', 'IT', 'Finance']
})
# Concatenate the two DataFrames horizontally (along columns)
df3 = pd.concat([df1, df2], axis=1)
print(df3)
Output
    Name  Age  Gender Department
0   John   28    Male         HR
1   Jane   34  Female         IT
2  Alice   25  Female    Finance
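One detail worth noting is that concat() with axis=1 lines rows up by index label, not by position. The sketch below uses made-up frames whose indexes only partly overlap to show the difference.

```python
import pandas as pd

df1 = pd.DataFrame({'Age': [28, 34]}, index=['John', 'Jane'])
df2 = pd.DataFrame({'Department': ['IT', 'Finance']}, index=['Jane', 'Alice'])

# Default (join='outer'): union of labels, NaN where a frame has no such row
aligned = pd.concat([df1, df2], axis=1)

# join='inner': keep only labels present in every frame
common = pd.concat([df1, df2], axis=1, join='inner')

# To pair rows purely by position, drop the labels first
by_position = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1
)

print(aligned)
print(common)
print(by_position)
```

The example in this section works row by row only because both frames share the default 0, 1, 2 index.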
Method 5: Using combine_first() for Combining DataFrames
The combine_first() method combines two DataFrames so that values from the first DataFrame are kept unless they are missing (NaN), in which case the corresponding values from the second DataFrame are used. Values are matched by index and column labels.
- Use combine_first() to fill missing values in one DataFrame with values from another.
Python
import pandas as pd
df1 = pd.DataFrame({
'Name': ['John', 'Jane', 'Alice'],
'Age': [28, None, 25],
'Department': [None, 'IT', 'Finance']
})
df2 = pd.DataFrame({
'Name': ['John', 'Jane', 'Alice'],
'Age': [30, 34, 26],
'Department': ['HR', 'IT', 'Finance']
})
# Combine the DataFrames using combine_first()
df3 = df1.combine_first(df2)
print(df3)
Output
    Name   Age Department
0   John  28.0         HR
1   Jane  34.0         IT
2  Alice  25.0    Finance
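Because combine_first() matches values by label, the two frames do not need the same shape. The following sketch, with hypothetical primary and backup frames indexed by name, shows that rows and columns found only in the second frame are added to the result.

```python
import pandas as pd

# Hypothetical frames indexed by name; 'primary' has a gap and fewer rows/columns
primary = pd.DataFrame({'Age': [28, None]}, index=['John', 'Jane'])
backup = pd.DataFrame(
    {'Age': [30, 34, 26], 'Department': ['HR', 'IT', 'Finance']},
    index=['John', 'Jane', 'Alice']
)

result = primary.combine_first(backup)
print(result)

# John's age stays 28 (present in primary); Jane's gap is filled with 34
print(result.loc['John', 'Age'], result.loc['Jane', 'Age'])
```

The result covers the union of both indexes and both sets of columns, so 'Alice' and the Department column come entirely from backup.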