-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
Closed
Labels
EnhancementReshapingConcat, Merge/Join, Stack/Unstack, ExplodeConcat, Merge/Join, Stack/Unstack, Explode
Description
Is your feature request related to a problem?
Pandas does not allow anti-joins. It would be helpful to have these added
Pyspark has implemented this on its 'join' command.
https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html
Describe the solution you'd like
Add anti-left and anti-right joins to 'how'
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html
API breaking implications
None
Describe alternatives you've considered
You can replicate an anti-join by doing an outer join, and filtering.
My code does work but sometimes gives too many columns - not sure why. Also my code relies on the column name not already containing '_drop' - which obviously could be an issue.
Additional context
[add any other context, code examples, or references to existing implementations about the feature request here]
import pandas as pd
def left_anti_join(df1,df2,left_on,right_on):
""" Perfoms left anti join of two data frames"""
df=pd.merge(df1,df2,left_on=left_on,right_on=right_on,suffixes=('','_drop'), how='outer', indicator=True)
df=df[df['_merge']=='left_only']
df.drop([col for col in df.columns if '_drop' in col],axis=1, inplace=True)
df.drop(columns="_merge", inplace=True)
return df
ehsantn, ilanfri and Mouli04
Metadata
Metadata
Assignees
Labels
EnhancementReshapingConcat, Merge/Join, Stack/Unstack, ExplodeConcat, Merge/Join, Stack/Unstack, Explode