Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Get last value from pandas groupeddataframe, summed by another column

I have the following dataframe

x = pd.DataFrame(
  {
    'FirstGroupCriterium': [1,1,2,2,3],
    'SortingCriteria': [1,1,1,2,1],
    'Value': [10,20,30,40,50]
  }
)
x.sort_values('SortingCriteria').groupby('FirstGroupCriterium').agg(last_value=('Value', 'last'))

The latter outputs:

FirstGroupCriterium last_value
1 20
2 40
3 50

What I would like to have, is to sum up the last value, based on the last SortingCriteria. So in this case:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

FirstGroupCriterium last_value
1 10+20 = 30
2 40
3 50

My initial idea was to call a custom aggregator function that groups the data yet again, but that fails.

def last_value(group):
  return group.groupby('SortingCriteria')['Value'].sum().tail(1)

Do you have any idea how to get this to work? Thank you!

>Solution :

Sorting by both columns first, then filter last rows per FirstGroupCriterium in GroupBy.transform and aggregate sum:

df = x.sort_values(['FirstGroupCriterium','SortingCriteria'])

df1 = df[df['SortingCriteria'].eq(df.groupby('FirstGroupCriterium')['SortingCriteria'].transform('last'))]
print (df1)
   FirstGroupCriterium  SortingCriteria  Value
0                    1                1     10
1                    1                1     20
3                    2                2     40
4                    3                1     50
   
   
df2 = df1.groupby(['FirstGroupCriterium'],as_index=False)['Value'].sum()
print (df2)
   FirstGroupCriterium  Value
0                    1     30
1                    2     40
2                    3     50

Anoter idea is aggregate sum by both columns and then remove duplicates with keep last row by DataFrame.drop_duplicates:

df2 = (df.groupby(['FirstGroupCriterium','SortingCriteria'],as_index=False)['Value'].sum()
         .drop_duplicates(['FirstGroupCriterium'], keep='last'))
print (df2)

   FirstGroupCriterium  SortingCriteria  Value
0                    1                1     30
2                    2                2     40
3                    3                1     50
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading