Drop duplicates from a pandas dataframe based on all columns starting from the third one

January 18, 2022

I have a dataframe with more than 50 columns, and the first 2 are unique IDs. For some reason, rows with different IDs can contain exactly the same data from the third column onward.

What I want to achieve is to delete the duplicates from the dataframe based on all columns starting from the third one. If more than one row has different IDs but the same data from the third column onward, it does not matter which row we keep; it can be the last one or the first one, whichever is easier to do.

I am fairly new to pandas. What I tried is something like this:

df.drop_duplicates(subset=df.iloc[2:], keep="last")

Solution:

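df.drop_duplicates expects column labels for the subset argument. df.iloc[2:] slices rows (everything from the third row onward), not columns, so pass the column labels from the third one onward instead: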

df.drop_duplicates(subset=df.columns[2:], keep="last")
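
To see the effect end to end, here is a minimal sketch on a small made-up dataframe. The column names (id_a, id_b, col1, col2) and the values are hypothetical and only stand in for the real 50+ columns. Note that drop_duplicates returns a new dataframe by default, so the result has to be assigned to keep it.

```python
import pandas as pd

# Hypothetical sample data: two ID columns followed by the data columns.
df = pd.DataFrame({
    "id_a": [1, 2, 3, 4],
    "id_b": ["w", "x", "y", "z"],
    "col1": [10, 10, 20, 10],
    "col2": ["a", "a", "b", "a"],
})

# df.columns[2:] holds the labels of every column from the third one onward.
data_cols = df.columns[2:]

# drop_duplicates does not modify df; it returns a new dataframe,
# so the result is assigned to a new name here.
deduped = df.drop_duplicates(subset=data_cols, keep="last")

print(deduped)
```

Rows 1, 2 and 4 share the same values in col1 and col2, so with keep="last" only the row with id_a equal to 4 survives from that group, alongside the unique row with id_a equal to 3. Passing keep="first" would keep the row with id_a equal to 1 instead.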

pandas

byMR

Published January 18, 2022
