Skip to content

ENH: Add anti-joins to pandas.merge #42916

@rj-global

Description

@rj-global

Is your feature request related to a problem?

Pandas does not allow anti-joins. It would be helpful to have these added
Pyspark has implemented this on its 'join' command.
https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html

Describe the solution you'd like

Add anti-left and anti-right joins to 'how'
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html

API breaking implications

None

Describe alternatives you've considered

You can replicate an anti-join by doing an outer join, and filtering.

My code does work but sometimes gives too many columns - not sure why. Also my code relies on the column name not already containing '_drop' - which obviously could be an issue.

Additional context

[add any other context, code examples, or references to existing implementations about the feature request here]

import pandas as pd
def left_anti_join(df1,df2,left_on,right_on):
   """ Perfoms left anti join of two data frames"""
   df=pd.merge(df1,df2,left_on=left_on,right_on=right_on,suffixes=('','_drop'), how='outer', indicator=True)
   df=df[df['_merge']=='left_only']
   df.drop([col for col in df.columns if '_drop' in col],axis=1, inplace=True)
   df.drop(columns="_merge", inplace=True)
   return df

Metadata

Metadata

Labels

EnhancementReshapingConcat, Merge/Join, Stack/Unstack, Explode

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions