Grouping a dataframe and conserving the same number of rows

I’m trying to group a dataframe by Id while keeping the same number of rows. (The original post showed the expected result in an image.)

I wrote the code below, but unfortunately it doesn’t give the result I’m looking for:

import pandas as pd

df = pd.DataFrame({'Id': ['Id001', 'Id002', 'Id002', 'Id003', 'Id003', 'Id003', 'Id004', 'Id004'],
                   'Values': ['red', 'brown','white','blue', 'green', 'yellow', 'rose', 'purple']})

out = (df['Values']
      .astype(str)
      .groupby(df['Id'])
      .agg('|'.join)
      .reset_index())

Do you have any suggestions, please?

Solution:

You’re close. You just need to use out to assign the aggregated result back to df. It’s better not to call reset_index() in this case, so that out stays indexed by Id and each row’s Id can be looked up directly:

import pandas as pd

df = pd.DataFrame({'Id': ['Id001', 'Id002', 'Id002', 'Id003', 'Id003', 'Id003', 'Id004', 'Id004'],
                   'Values': ['red', 'brown', 'white', 'blue', 'green', 'yellow', 'rose', 'purple']})

# One joined string per Id; the result is a Series indexed by Id
out = (df['Values']
      .astype(str)
      .groupby(df['Id'])
      .agg('|'.join))

# Number of rows per Id, also indexed by Id
counts = df['Id'].value_counts()

# Look up each row's Id so the new columns have one entry per original row
df['Id_occurrences'] = [counts.loc[key] for key in df['Id']]
df['Values_grouped'] = [out.loc[key] for key in df['Id']]
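
As a possible alternative, the same two columns can probably be built without the intermediate out and counts objects by using groupby(...).transform, which returns a result aligned to the original rows. This is only a minimal sketch, not part of the original answer. It reuses the column names Id_occurrences and Values_grouped from above and assumes the Values column has no missing entries.

```python
import pandas as pd

df = pd.DataFrame({'Id': ['Id001', 'Id002', 'Id002', 'Id003', 'Id003', 'Id003', 'Id004', 'Id004'],
                   'Values': ['red', 'brown', 'white', 'blue', 'green', 'yellow', 'rose', 'purple']})

# Group the Values column by Id once and reuse the grouped object
grouped = df['Values'].astype(str).groupby(df['Id'])

# transform broadcasts each group's result back to every row of that group,
# so both new columns keep the original number of rows
df['Id_occurrences'] = grouped.transform('size')
df['Values_grouped'] = grouped.transform(lambda s: '|'.join(s))

print(df)
```

Here every row with Id003 should get 3 in Id_occurrences and 'blue|green|yellow' in Values_grouped, while df still has its original 8 rows.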