I have a pandas DataFrame that contains duplicates, but not the kind that a plain df.drop_duplicates() can remove. For my purposes, the order of the values in columns 1 and 2 does not matter, so AB and BA count as the same pair and the 5th row should be eliminated.
1 2
---
A B ===> AB
C D
E F
G H
B A ===> BA
I J
K L
I want to remove these duplicates, keeping only the first occurrence of each pair. This would lead to:
1 2
---
A B
C D
E F
G H
I J
K L
I cannot figure out how to do that. Any help would be appreciated.
>Solution :
One solution could be:
import pandas as pd
data = {'1': {0: 'A', 1: 'C', 2: 'E', 3: 'G', 4: 'B', 5: 'I', 6: 'K'},
'2': {0: 'B', 1: 'D', 2: 'F', 3: 'H', 4: 'A', 5: 'J', 6: 'L'}}
df = pd.DataFrame(data)
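# Sort each row's values into a hashable tuple so that (A, B) and (B, A) compare equal,
# then keep only the rows whose sorted pair has not appeared earlier.
out = df.loc[~df.apply(lambda row: tuple(sorted(row)), axis=1).duplicated()]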
print(out)
1 2
0 A B
1 C D
2 E F
3 G H
5 I J
6 K L
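For larger frames, a row-wise apply can get slow. A possible alternative, sketched here under the assumption that the values in the two columns can be compared with each other (for example, all strings), is to sort within each row using NumPy and call duplicated() on the result. The name sorted_pairs is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"1": list("ACEGBIK"), "2": list("BDFHAJL")})

# Sort the values within each row, so (B, A) becomes (A, B).
# Assumes the values in both columns are mutually comparable.
sorted_pairs = pd.DataFrame(
    np.sort(df[["1", "2"]].to_numpy(), axis=1),
    index=df.index,  # keep the original index so the mask aligns with df
)

# duplicated() flags every repeat after the first occurrence;
# negating the mask keeps only the first row of each unordered pair.
out = df.loc[~sorted_pairs.duplicated()]
print(out)
```

With the sample data this keeps the rows at index 0, 1, 2, 3, 5 and 6, the same as the apply-based version. The original column values and their order are left untouched, since the sorted copy is only used to build the mask.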