I have the following 2 columns, from a Pandas DataFrame:
antecedents consequents
apple orange
orange apple
apple water
apple pineapple
water lemon
lemon water
I would like to remove pairs that appear in both orders (once as antecedent/consequent and once as consequent/antecedent), keeping only the first occurrence, and thus obtain:
antecedents consequents
apple orange
apple water
apple pineapple
water lemon
How can I achieve that using Pandas?
>Solution :
Convert each row of the two columns to a frozenset, so that the order of the values no longer matters, and test for duplicates with Series.duplicated:
df2 = df[~df[['antecedents','consequents']].apply(frozenset,axis=1).duplicated()]
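For reference, here is a minimal self-contained sketch of the frozenset approach. It assumes the sample data from the question is rebuilt by hand as `df`; the intermediate name `keys` is only illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (assumption: default RangeIndex).
df = pd.DataFrame({
    "antecedents": ["apple", "orange", "apple", "apple", "water", "lemon"],
    "consequents": ["orange", "apple", "water", "pineapple", "lemon", "water"],
})

# One order-insensitive key per row: frozenset({'apple', 'orange'}) is equal
# and hashes the same whichever column each value came from.
keys = df[["antecedents", "consequents"]].apply(frozenset, axis=1)

# duplicated() flags every repeat after the first occurrence; ~ keeps the rest.
df2 = df[~keys.duplicated()]
print(df2)
```

With this data the rows at index 1 and 5 are dropped, because their pairs already appeared (in the opposite order) at index 0 and 4.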
Alternatively, sort the values within each row with numpy.sort and test the sorted frame for duplicates:
import numpy as np
import pandas as pd
# sort each row so that (a, b) and (b, a) become identical
df1 = pd.DataFrame(np.sort(df[['antecedents','consequents']], axis=1), index=df.index)
df2 = df[~df1.duplicated()]
print(df2)
antecedents consequents
0 apple orange
2 apple water
3 apple pineapple
4 water lemon
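If this is needed in more than one place, the sorting variant could be wrapped in a small function. The sketch below is only an illustration: `drop_unordered_duplicates` is a hypothetical helper name, not a pandas function, and it assumes the values in the chosen columns can be compared with each other (for example, all strings).

```python
import numpy as np
import pandas as pd


def drop_unordered_duplicates(df, cols):
    """Hypothetical helper: keep the first row for each unordered combination of `cols`."""
    # Sort the values within each row so (a, b) and (b, a) become identical.
    sorted_vals = np.sort(df[cols].to_numpy(), axis=1)
    # Reuse the original index so the boolean mask aligns with df.
    mask = pd.DataFrame(sorted_vals, index=df.index).duplicated()
    return df[~mask]


# Sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "antecedents": ["apple", "orange", "apple", "apple", "water", "lemon"],
    "consequents": ["orange", "apple", "water", "pineapple", "lemon", "water"],
})

result = drop_unordered_duplicates(df, ["antecedents", "consequents"])
print(result)
```

Called this way, it keeps the same rows as the two snippets above (index 0, 2, 3 and 4).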