Python Dataframe Duplicates Removal based on combination of two columns

September 27, 2022

I have a pandas DataFrame that contains duplicates, but not the ordinary kind that a plain df.drop_duplicates() removes. For my purposes, the combination of columns 1 and 2 is order-insensitive: AB and BA count as the same, so the 5th row should be eliminated.

1 2
---
A B ===> AB
C D
E F
G H
B A ===> BA
I J
K L

I want to remove these duplicates, keeping only the first occurrence of each combination. This would lead to:

1 2
---
A B
C D
E F
G H
I J
K L

I cannot figure out how to do that. Any help would be appreciated.

Solution:

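One solution is to sort the values within each row, so that AB and BA produce the same key, and then keep only the rows whose key has not appeared before: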

import pandas as pd

data = {'1': {0: 'A', 1: 'C', 2: 'E', 3: 'G', 4: 'B', 5: 'I', 6: 'K'}, 
        '2': {0: 'B', 1: 'D', 2: 'F', 3: 'H', 4: 'A', 5: 'J', 6: 'L'}}
df = pd.DataFrame(data)

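# Sort each row's values so (A, B) and (B, A) give the same key;
# converting to a tuple keeps the key hashable for duplicated().
keys = df.apply(lambda row: tuple(sorted(row)), axis=1)

# duplicated() flags every repeat after the first occurrence; ~ keeps the rest.
out = df.loc[~keys.duplicated()]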

print(out)

   1  2
0  A  B
1  C  D
2  E  F
3  G  H
5  I  J
6  K  L
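
If the DataFrame has other columns besides the two being compared, the same idea can be limited to a subset of columns. The following is a minimal sketch, not the answer's original method: it swaps the row-wise apply for NumPy's np.sort, and the helper name drop_unordered_duplicates and the extra 'value' column are made up for illustration.

```python
import numpy as np
import pandas as pd


def drop_unordered_duplicates(df, cols):
    """Drop rows whose values in `cols` repeat an earlier row in any order.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Sort the values within each row so (A, B) and (B, A) become the same key.
    keys = pd.DataFrame(np.sort(df[cols].to_numpy(), axis=1), index=df.index)
    # duplicated() marks every repeat after the first occurrence.
    return df.loc[~keys.duplicated()]


df = pd.DataFrame({
    '1': ['A', 'C', 'E', 'G', 'B', 'I', 'K'],
    '2': ['B', 'D', 'F', 'H', 'A', 'J', 'L'],
    'value': [10, 20, 30, 40, 50, 60, 70],  # hypothetical extra column
})

out = drop_unordered_duplicates(df, ['1', '2'])
print(out)
```

Only columns 1 and 2 take part in the comparison, so the row with index 4 (B, A) is dropped even though its 'value' differs from that of row 0. This assumes the values in the compared columns can be ordered against each other, for example all strings.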