Remove duplicated rows and collect a column into a list or tuple

I have a DataFrame as follows (Name is the index):

Name Age year
Tom 20 2020
Tom 20 2021
Nick 19 2019
Jack 18 2018

My goal is to remove the duplicated Name rows and collect the year column into a tuple or list, like below:

Name Age year
Tom 20 (2020, 2021)
Nick 19 2019
Jack 18 2018

How can I do that efficiently, given that my df has more than 800,000 rows?


Solution:

Group by the index and aggregate each column with np.unique (NumPy imported as np). Assuming Name is already the index:

>>> df.groupby(level=0).agg(np.unique)
      Age          year
Name                   
Jack   18          2018
Nick   19          2019
Tom    20  [2020, 2021]
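
If every row should hold the same kind of value (a list even for a single year), one possible alternative is to drop exact duplicate rows and then group on Name and Age. This is only a sketch, not part of the original answer: the sample frame is rebuilt from the question, and the names deduped and result are illustrative.

```python
import pandas as pd

# Sample data from the question; Name is the index.
df = pd.DataFrame(
    {"Age": [20, 20, 19, 18], "year": [2020, 2021, 2019, 2018]},
    index=pd.Index(["Tom", "Tom", "Nick", "Jack"], name="Name"),
)

# Move Name into a column so that drop_duplicates compares it too,
# then remove rows that are identical in Name, Age, and year.
deduped = df.reset_index().drop_duplicates()

# Collect the years of each (Name, Age) pair into a list.
# sort=False keeps the groups in order of first appearance.
result = (
    deduped.groupby(["Name", "Age"], sort=False)["year"]
    .agg(list)
    .reset_index()
    .set_index("Name")
)
print(result)
```

Unlike np.unique, this keeps the years in their original row order rather than sorting them, and a name with a single year still gets a one-element list. Swap list for tuple in agg if tuples are preferred.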