Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to make an aggregation that depends on another one?

My input is this dataframe :

df = pd.DataFrame({'category': ['A', 'A', 'A', 'B', 'B', 'C'],
 'id': ['id002', 'id001', 'id003', 'id004', 'id005', 'id006'],
 'total': [None, 25.0, 10.0, 20.0, None, 5.0],
 'origin': ['x', 'z', 'y', 'y', 'x', 'y']})

print(df)

  category     id  total origin
0        A  id002    NaN      x
1        A  id001   25.0      z
2        A  id003   10.0      y
3        B  id004   20.0      y
4        B  id005    NaN      x
5        C  id006    5.0      y

I want in each category to get the minimum id, the sum of total column and the origin that correspond to the minimum id that was choosen.

For that I made the code below :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

result = df.groupby('category', as_index=False).agg(
    {'id': 'min', 'total': 'sum', 'origin': 'first'}
)

print(result)

  category     id  total origin
0        A  id001   35.0      x
1        B  id004   20.0      y
2        C  id006    5.0      y

Unfortunately there is two problems. The origin is wrong for A and B and the index is gone.

My expected output is this :

  category     id  total origin
1        A  id001   35.0      z
3        B  id004   20.0      y
5        C  id006    5.0      y

Have you guys dealed with a situation like this ? Is there a solution ?

>Solution :

Use DataFrame.sort_values with DataFrame.drop_duplicates for rows by minimal id per groups and rewrite total column with mappping aggregation sum:

s = df.groupby('category')['total'].sum()

result = (df.sort_values(['category','id'])
            .drop_duplicates('category')
            .assign(total=lambda x: x['category'].map(s)))
print(result)
  category     id  total origin
1        A  id001   35.0      z
3        B  id004   20.0      y
5        C  id006    5.0      y
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading