
How to delete duplicated clients based on value in datetime column in Data Frame in Python Pandas?

I have a DataFrame in Python Pandas like the one below:

date_col   | ID  | Phone
-----------|-----|--------
2020-05-17 | 111 | Apple
2020-06-11 | 111 | Sony
2021-12-28 | 222 | Sony

As you can see, ID "111" is duplicated. Whenever an ID is duplicated, I need to keep only the row with the latest date in column "date_col" (this column has dtype datetime64).
So the result should look like the table below: ID "111" is duplicated, and 2020-06-11 is later than 2020-05-17, so the 2020-06-11 row is kept:

date_col   | ID  | Phone
-----------|-----|--------
2020-06-11 | 111 | Sony
2021-12-28 | 222 | Sony
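To try out answers, the sample input can be rebuilt as a DataFrame. This is a minimal sketch: the values are copied from the first table, and parsing the dates with pd.to_datetime gives date_col the datetime64 dtype described in the question.

```python
import pandas as pd

# Sample input from the question; ID 111 appears twice
df = pd.DataFrame({
    "date_col": pd.to_datetime(["2020-05-17", "2020-06-11", "2021-12-28"]),
    "ID": [111, 111, 222],
    "Phone": ["Apple", "Sony", "Sony"],
})

print(df.dtypes)
```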

How can I do that in Python Pandas?


Solution:

Try:

# Sort oldest to newest, then keep only the last (i.e. latest) row for each ID
df = df.sort_values(by="date_col").drop_duplicates(subset="ID", keep="last")
print(df)

Prints:

    date_col   ID Phone
1 2020-06-11  111  Sony
2 2021-12-28  222  Sony
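An alternative, not part of the original answer, is to look up the index label of the latest date within each ID using groupby(...).idxmax() and then select those rows with .loc. This sketch assumes date_col has no missing values and that the DataFrame's index labels are unique; if an ID has two rows sharing the same latest date, idxmax picks the first of them.

```python
import pandas as pd

df = pd.DataFrame({
    "date_col": pd.to_datetime(["2020-05-17", "2020-06-11", "2021-12-28"]),
    "ID": [111, 111, 222],
    "Phone": ["Apple", "Sony", "Sony"],
})

# For each ID, the index label of the row holding its latest date
latest_idx = df.groupby("ID")["date_col"].idxmax()

# Select those rows; the original index labels (1 and 2) are preserved
result = df.loc[latest_idx]
print(result)
```

As with the sort-based answer, the original index labels are kept; chain .reset_index(drop=True) if a fresh 0..n-1 index is preferred.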