Python Dataframe Duplicates Removal based on combination of two columns

I have a pandas DataFrame that contains duplicates, but not the regular kind that a plain df.drop_duplicates() call can remove. For my purposes, when the values in columns 1 and 2 are combined, AB and BA count as the same pair, so the 5th row should be eliminated.

1 2
---
A B ===> AB
C D
E F
G H
B A ===> BA
I J
K L

I want to remove these duplicates, keeping only the first occurrence of each pair. This would lead to:

1 2
---
A B
C D
E F
G H
I J
K L

I cannot figure out how to do that. Any help would be appreciated.

>Solution :

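One solution is to sort the values within each row, so that (B, A) becomes (A, B), and then drop every row whose sorted pair has already appeared:

import pandas as pd

data = {'1': {0: 'A', 1: 'C', 2: 'E', 3: 'G', 4: 'B', 5: 'I', 6: 'K'},
        '2': {0: 'B', 1: 'D', 2: 'F', 3: 'H', 4: 'A', 5: 'J', 6: 'L'}}
df = pd.DataFrame(data)

# Build an order-independent key per row; a tuple is hashable, so duplicated() can compare it
key = df.apply(lambda row: tuple(sorted(row)), axis=1)

# Keep only the first row for each key
out = df.loc[~key.duplicated()]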

print(out)

   1  2
0  A  B
1  C  D
2  E  F
3  G  H
5  I  J
6  K  L
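
If the DataFrame is large, a row-wise apply can be slow. A possible vectorized variant of the same idea is sketched below, assuming the two columns hold mutually comparable values (for example, all strings). The name `sorted_pairs` is only illustrative and does not come from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'1': list('ACEGBIK'), '2': list('BDFHAJL')})

# Sort the two values inside each row, so (B, A) and (A, B) become identical rows.
# The original index is kept so the boolean mask lines up with df.
sorted_pairs = pd.DataFrame(
    np.sort(df[['1', '2']].to_numpy(), axis=1),
    index=df.index,
)

# duplicated() marks every repeat after the first occurrence; invert it to keep the firsts
out = df[~sorted_pairs.duplicated()]
print(out)
```

This keeps the original column order of each surviving row (row 0 stays A, B) because the sorted copy is used only to build the mask, not as the result.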