Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas groupby with strings – return single string

I have a dataframe that looks like the following:

    ID  Type    Size
0   123 Red     5
1   456 Blue    7
2   789 Yellow  12
3   789 Yellow  4

I now want to aggregate by ID and take the mean of the size for duplicates. However, I wish to only return the same string for Type, not concatenate it. I have attempted to capture this using agg:

df = pd.DataFrame({'ID' : [123, 456, 789, 789], 'Type' : ['Red', 'Blue', 'Yellow', 'Yellow'], 'Size' : [5, 7, 12, 4]})

def identity(x):
    return x

special_columns = ['Type']
aggfuncs = {col: statistics.mean for col in df.columns}
aggfuncs.update({col:identity for col in special_columns})
df.groupby(['ID'], as_index=False).agg(aggfuncs)

However, this still turns into an array of the repeated string:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    ID  Type              Size
0   123 Red                 5
1   456 Blue                7
2   789 [Yellow, Yellow]    8

The end result I wanted was:

    ID  Type              Size
0   123 Red                 5
1   456 Blue                7
2   789 Yellow              8

How can this be achieved?

>Solution :

If each ID has one corresponding type, this should work

# use both ID and Type as grouper
res = df.groupby(["ID", "Type"], as_index=False)["Size"].mean()
res

enter image description here

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading