Pandas: remove duplicates based on substring

April 21, 2022

I have the following 2 columns, from a Pandas DataFrame:

antecedents        consequents
  apple               orange
  orange              apple

  apple               water
  apple               pineapple

  water               lemon
  lemon               water

I would like to remove rows whose pair of values already appeared with antecedents and consequents swapped, keeping only the first occurrence, and thus obtain:

antecedents        consequents
  apple               orange

  apple               water
  apple               pineapple

  water               lemon

How can I achieve that using Pandas?

>Solution :

Convert each row's values from both columns to a frozenset, so that order is ignored, and flag repeats with Series.duplicated:

df2 = df[~df[['antecedents', 'consequents']].apply(frozenset, axis=1).duplicated()]
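
As a minimal, self-contained sketch of the frozenset approach, the snippet below rebuilds the sample data from the question and applies the same mask. The DataFrame construction and the `pair_key` name are assumptions added for illustration.

```python
import pandas as pd

# Sample data copied from the question; the construction itself is assumed.
df = pd.DataFrame({
    "antecedents": ["apple", "orange", "apple", "apple", "water", "lemon"],
    "consequents": ["orange", "apple", "water", "pineapple", "lemon", "water"],
})

# A frozenset ignores order, so (apple, orange) and (orange, apple)
# produce the same key.
pair_key = df[["antecedents", "consequents"]].apply(frozenset, axis=1)

# duplicated() marks every repeat after the first occurrence; ~ keeps the rest.
df2 = df[~pair_key.duplicated()]
print(df2)
```

Series.duplicated defaults to keep='first', so the earliest row of each pair survives. Passing keep='last' would keep the latest one instead.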

Alternatively, sort the values within each row using numpy.sort and test the sorted rows for duplicates:

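import numpy as np
import pandas as pd

# Sort each row's values so that swapped pairs become identical rows
df1 = pd.DataFrame(np.sort(df[['antecedents', 'consequents']], axis=1), index=df.index)
# Keep only the rows whose sorted pair has not been seen before
df2 = df[~df1.duplicated()]

print(df2)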
  antecedents consequents
0       apple      orange
2       apple       water
3       apple   pineapple
4       water       lemon
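
The sorting variant can also be wrapped in a small helper. This is only a sketch: the function name `drop_unordered_duplicates` and its `cols` parameter are hypothetical, and the sample data is rebuilt from the question.

```python
import numpy as np
import pandas as pd


def drop_unordered_duplicates(df, cols):
    """Keep the first row for each unordered combination of values in `cols`."""
    # Sort the values within each row so swapped pairs become identical rows.
    sorted_rows = pd.DataFrame(np.sort(df[cols].to_numpy(), axis=1), index=df.index)
    # duplicated() flags repeats after the first occurrence; ~ keeps the rest.
    return df[~sorted_rows.duplicated()]


# Sample data copied from the question; the construction itself is assumed.
df = pd.DataFrame({
    "antecedents": ["apple", "orange", "apple", "apple", "water", "lemon"],
    "consequents": ["orange", "apple", "water", "pineapple", "lemon", "water"],
})

result = drop_unordered_duplicates(df, ["antecedents", "consequents"])
print(result)
```

For two columns this keeps the same rows as the frozenset version. With more than two columns the results can differ, since a frozenset also discards repeated values within a row while sorting keeps them.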

by MR
