
Pandas Combine Dataframe

Last Updated : 15 Feb, 2025

Combining DataFrames in Pandas is a fundamental operation that allows users to merge, concatenate, or join data from multiple sources into a single DataFrame.

This article explores the different techniques we can use to combine DataFrames in Pandas, focusing on concatenation, merging and joining.

Python
import pandas as pd

data1 = {'name': ['Alice', 'Bob'], 'age': [25, 30]}
data2 = {'name': ['Alice', 'Charlie'], 'age': [26, 35]}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

df3 = df1.merge(df2, on='name', how='outer')
print(df3)

Output
      name  age_x  age_y
0    Alice   25.0   26.0
1      Bob   30.0    NaN
2  Charlie    NaN   35.0

Method 1: Concatenating DataFrames

The concat() function in Pandas combines multiple DataFrames along rows (vertically) or columns (horizontally). It is most useful when we have DataFrames with the same columns that we want to stack on top of each other.

  • Use pd.concat() to combine DataFrames vertically or horizontally.
Python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['John', 'Jane'],
    'Age': [28, 34],
    'Gender': ['Male', 'Female']
})

df2 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 23],
    'Gender': ['Female', 'Male']
})

# Concatenate the two DataFrames along rows (vertically)
df3 = pd.concat([df1, df2], ignore_index=True)

print(df3)

Output
    Name  Age  Gender
0   John   28    Male
1   Jane   34  Female
2  Alice   25  Female
3    Bob   23    Male

For a detailed explanation, refer to this article: How to concatenate two DataFrames
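When stacking rows, it can help to see what happens to the index. The following is a minimal sketch, using small hypothetical DataFrames named a and b, that compares the default behaviour of pd.concat() with the ignore_index and keys arguments.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
a = pd.DataFrame({'Name': ['John', 'Jane'], 'Age': [28, 34]})
b = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 23]})

# Default: each DataFrame keeps its own index labels, so labels repeat
kept = pd.concat([a, b])

# ignore_index=True discards the old labels and renumbers the rows
renumbered = pd.concat([a, b], ignore_index=True)

# keys adds an outer index level that records where each row came from
labelled = pd.concat([a, b], keys=['first', 'second'])

print(kept.index.tolist())
print(renumbered.index.tolist())
print(labelled.loc['second'])
```

Passing ignore_index=True is usually the simpler choice when the original row labels carry no meaning, while keys is useful when we later need to tell the sources apart.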

Method 2: Merging DataFrames

The merge() function in Pandas combines DataFrames based on one or more common columns, similar to SQL JOINs. It is helpful when we need to match rows from different DataFrames on key columns. The how argument selects the join type: inner, left, right or outer.

  • Use merge() to combine DataFrames based on a shared column.
Python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['John', 'Jane', 'Alice'],
    'Age': [28, 34, 25]
})

df2 = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Department': ['HR', 'IT', 'Finance']
})

# Merge the two DataFrames on the 'Name' column
df3 = pd.merge(df1, df2, on='Name', how='inner')

print(df3)

Output
    Name  Age Department
0   John   28         HR
1  Alice   25         IT
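An inner merge drops names that appear in only one DataFrame. As a rough sketch of the other join types, the example below reuses the same sample data with how='left' and how='outer'. The indicator=True argument adds a _merge column showing which side each row came from. The variable names left_join and outer_join are chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['John', 'Jane', 'Alice'],
    'Age': [28, 34, 25]
})

df2 = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Department': ['HR', 'IT', 'Finance']
})

# Left join: keep every row of df1; unmatched rows get NaN for Department
left_join = pd.merge(df1, df2, on='Name', how='left')

# Outer join: keep rows from both sides; _merge records the origin of each row
outer_join = pd.merge(df1, df2, on='Name', how='outer', indicator=True)

print(left_join)
print(outer_join)
```

Here Jane has no department in the left join, and the outer join additionally keeps Bob, who has no age.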

Method 3: Joining DataFrames

The join() method combines two DataFrames based on their index, or matches a key column of the calling DataFrame against the index of the other. It performs a left join by default and is convenient when the DataFrames share an index.

  • Use join() to combine DataFrames based on their index or a key column.
Python
import pandas as pd

df1 = pd.DataFrame({
    'Age': [28, 34, 25],
    'Gender': ['Male', 'Female', 'Female']
}, index=['John', 'Jane', 'Alice'])

df2 = pd.DataFrame({
    'Department': ['HR', 'IT', 'Finance']
}, index=['John', 'Jane', 'Alice'])

# Join the two DataFrames based on their index
df3 = df1.join(df2)

print(df3)

Output
       Age  Gender Department
John    28    Male         HR
Jane    34  Female         IT
Alice   25  Female    Finance
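The example above joins on the index. For the key-column case, a minimal sketch could look like the following. The employees and departments DataFrames are hypothetical. The on argument names a column in the calling DataFrame, which is matched against the index of the other DataFrame.

```python
import pandas as pd

# Hypothetical data: 'Name' is an ordinary column here
employees = pd.DataFrame({
    'Name': ['John', 'Jane', 'Alice'],
    'Age': [28, 34, 25]
})

# The lookup table is indexed by name and has no entry for Jane
departments = pd.DataFrame({
    'Department': ['HR', 'IT']
}, index=['John', 'Alice'])

# Match employees['Name'] against the index of departments (left join by default)
joined = employees.join(departments, on='Name')

print(joined)
```

Because join() defaults to a left join, Jane is kept and her Department is NaN.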

Method 4: Combining DataFrames Horizontally

We can also combine DataFrames horizontally (along columns) using concat(). This is useful when we want to join different features or attributes of the same observations.

  • Use pd.concat() with the axis=1 argument to combine DataFrames horizontally.
Python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['John', 'Jane', 'Alice'],
    'Age': [28, 34, 25]
})

df2 = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female'],
    'Department': ['HR', 'IT', 'Finance']
})

# Concatenate the two DataFrames horizontally (along columns)
df3 = pd.concat([df1, df2], axis=1)

print(df3)

Output
    Name  Age  Gender Department
0   John   28    Male         HR
1   Jane   34  Female         IT
2  Alice   25  Female    Finance
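One point worth keeping in mind is that pd.concat() with axis=1 aligns rows by index label, not by position. The sketch below uses two hypothetical DataFrames with different index labels to show the effect, along with one possible way to combine by position using reset_index(drop=True).

```python
import pandas as pd

# Hypothetical data whose index labels only partly overlap
left = pd.DataFrame({'Name': ['John', 'Jane']}, index=[0, 1])
right = pd.DataFrame({'Gender': ['Female', 'Male']}, index=[1, 2])

# Rows are matched on index labels; labels missing on one side become NaN
aligned = pd.concat([left, right], axis=1)

# Resetting both indexes first makes the rows line up by position instead
by_position = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)],
    axis=1
)

print(aligned)
print(by_position)
```

If the DataFrames come from filtering or sorting steps, checking their indexes before a horizontal concat can avoid unexpected NaN values.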

Method 5: Using combine_first() for Combining DataFrames

The combine_first() method combines two DataFrames by keeping the values of the first DataFrame and, wherever the first has a missing value (NaN), taking the corresponding value from the second. Values are matched by index and column labels.

  • Use combine_first() to fill missing values in one DataFrame with values from another.
Python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['John', 'Jane', 'Alice'],
    'Age': [28, None, 25],
    'Department': [None, 'IT', 'Finance']
})

df2 = pd.DataFrame({
    'Name': ['John', 'Jane', 'Alice'],
    'Age': [30, 34, 26],
    'Department': ['HR', 'IT', 'Finance']
})

# Combine the DataFrames using combine_first()
df3 = df1.combine_first(df2)

print(df3)

Output
    Name   Age Department
0   John  28.0         HR
1   Jane  34.0         IT
2  Alice  25.0    Finance
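combine_first() also works when the two DataFrames do not share the same rows or columns, since the result covers the union of both sets of labels. The following is a small sketch with hypothetical primary and fallback DataFrames indexed by name.

```python
import numpy as np
import pandas as pd

# Hypothetical data: primary is incomplete and has fewer rows and columns
primary = pd.DataFrame({
    'Age': [28, np.nan]
}, index=['John', 'Jane'])

fallback = pd.DataFrame({
    'Age': [30, 34, 26],
    'Department': ['HR', 'IT', 'Finance']
}, index=['John', 'Jane', 'Alice'])

# Values from primary win; gaps, missing rows and missing columns come from fallback
result = primary.combine_first(fallback)

print(result)
```

John keeps his age of 28 from primary, Jane's missing age is filled with 34, and the row for Alice and the Department column are taken entirely from fallback.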
