Merge, Join and Concatenate DataFrames using Pandas
Last Updated :
25 Nov, 2024
Dataframe is a two-dimensional data structure having multiple rows and columns. In a Pandas DataFrame, the data is aligned in the form of rows and columns only. A dataframe can perform arithmetic as well as conditional operations. It has a mutable size. This article will show how to join, concatenate, and merge in Pandas.
Python Merge, Join, and Concatenate DataFrames Using Pandas
Below are the different ways and approaches by which we can merge, join, and concatenate in Pandas in Python:
Concatenate DataFrames
concat()
function does all of the heavy lifting of performing concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes. In this example, three DataFrames (df1
, df2
, and df3
) with identical columns but different indices are created. Using pd.concat()
, these Pandas DataFrames are vertically stacked, resulting in a combined DataFrame where rows from each original DataFrame follow one another.
Python
# Creating first dataframe
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
# Creating second dataframe
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7'],
'D': ['D4', 'D5', 'D6', 'D7']},
index=[4, 5, 6, 7])
# Creating third dataframe
df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
'B': ['B8', 'B9', 'B10', 'B11'],
'C': ['C8', 'C9', 'C10', 'C11'],
'D': ['D8', 'D9', 'D10', 'D11']},
index=[8, 9, 10, 11])
# Concatenating the dataframes
pd.concat([df1, df2, df3])
Output:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11
Merge DataFrames
DataFrames Merge Pandas provides a single function, merge(), as the entry point for all standard database join operations between DataFrame objects. In this example, two DataFrames (left
and right
) are created with a common key column 'Key'. The pd.merge()
function merges these DataFrames based on their common 'Key' column using an inner join, resulting in a combined DataFrame containing only the rows where the key values match in both DataFrames.
Python
# Dataframe created
left = pd.DataFrame({'Key': ['K0', 'K1', 'K2', 'K3'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'Key': ['K0', 'K1', 'K2', 'K3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})
# Merging the dataframes
pd.merge(left, right, how='inner', on='Key')
Output:
Inner Merge:
Key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
Left Merge
The pd.merge()
function with how='left'
performs a left join, merging the left
DataFrame with the right
DataFrame on the 'Key' column. All rows from the left
DataFrame are retained, and matching rows from the right
DataFrame are appended with their respective values.
Python
left_merged = pd.merge(left, right, how='left', on='Key')
print(left_merged)
Output:
Left Merge:
Key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 NaN NaN
3 K3 A3 B3 NaN NaN
Right Merge
The pd.merge()
function with how='right'
performs a right join, merging the left
DataFrame with the right
DataFrame on the 'Key' column. All rows from the right
DataFrame are retained, and matching rows from the left
DataFrame are appended with their respective values.
Python
right_merged = pd.merge(left, right, how='right', on='Key')
print(right_merged)
Output:
Right Merge:
Key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K4 NaN NaN C4 D4
3 K5 NaN NaN C5 D5
Outer Merge
The pd.merge()
function with how='outer'
performs a full outer join, merging the left
DataFrame with the right
DataFrame on the 'Key' column. This results in a DataFrame that contains all rows from both the left
and right
DataFrames, filling in missing values with NaN where there isn't a match.
Python
outer_merged = pd.merge(left, right, how='outer', on='Key')
print(outer_merged)
Output:
Outer Merge:
Key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 NaN NaN
3 K3 A3 B3 NaN NaN
4 K4 NaN NaN C4 D4
5 K5 NaN NaN C5 D5
Join DataFrames
In this example, both left
and right
DataFrames are indexed with keys 'K0', 'K1', 'K2', and 'K3'. The left.join(right)
operation merges the DataFrames on their indices, combining columns 'A' and 'B' from left
with columns 'C' and 'D' from right
based on the shared index.
Python
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']},
index=['K0', 'K1', 'K2', 'K3'])
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=['K0', 'K1', 'K2', 'K3'])
# Joining the dataframes
left.join(right)
Output
A B C D
K0 A0 B0 C0 D0
K1 A1 B1 C1 D1
K2 A2 B2 C2 D2
K3 A3 B3 C3 D3
Similar Reads
Joining two Pandas DataFrames using merge() The merge() function is designed to merge two DataFrames based on one or more columns with matching values. The basic idea is to identify columns that contain common data between the DataFrames and use them to align rows. Let's understand the process of joining two pandas DataFrames using merge(), e
4 min read
How to Join Pandas DataFrames using Merge? Joining and merging DataFrames is that the core process to start  out with data analysis and machine learning tasks. It's one of the toolkits which each Data Analyst or Data Scientist should master because in most cases data comes from multiple sources and files. In this tutorial, you'll how to join
3 min read
How To Concatenate Two or More Pandas DataFrames? In real-world data the information is often spread across multiple tables or files. To analyze it properly we need to bring all that data together. This is where the pd.concat() function in Pandas comes as it allows you to combine two or more DataFrames in: Vertically (stacking rows on top of each o
3 min read
How to Concatenate Column Values in Pandas DataFrame? Many times we need to combine values in different columns into a single column. There can be many use cases of this, like combining first and last names of people in a list, combining day, month, and year into a single column of Date, etc. Now we'll see how we can achieve this with the help of some
2 min read
Append or Concatenate Two DataFrames in Python Polars Polars is a fast Data Frame library implemented in Rust, providing efficient ways to work with large datasets. Whether we need to append rows or concatenate columns, Polars offers multiple methods to handle these tasks effectively.Setting Up Your EnvironmentBefore diving into the examples, ensure yo
3 min read