Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas simplify adding aggregate columns (average, sum, count, max) in one step like with groupby.agg()

I have some data similar to:

df = pd.DataFrame({'ID':[1, 2, 3, 4, 5],
                   '1' :[7, 8, 2, 3, 0],
                   '2' :[1, 0, 4, 9, 9],
                   '3' :[3, 1, 1, 6, 8]})

   ID  1  2  3
0   1  7  1  3
1   2  8  0  1
2   3  2  4  1
3   4  3  9  6
4   5  0  9  8

and I want to add an Av, Sum, Count and Max column for each row for the values in columns 1, 2 and 3. Currently I am doing this like this:

cols = ['1', '2', '3']
df['Av'] = df[cols].mean(axis=1)
df['Sum'] = df[cols].sum(axis=1).fillna(0)
df['Count'] = df[cols].count(axis=1).fillna(0)
df['Max'] = df[cols].max(axis=1).fillna(0)

   ID  1  2  3        Av  Sum  Count  Max
0   1  7  1  3  3.666667   11      3    7
1   2  8  0  1  3.000000    9      3    8
2   3  2  4  1  2.333333    7      3    4
3   4  3  9  6  6.000000   18      3    9
4   5  0  9  8  5.666667   17      3    9

I was wondering if there is a way to do this neater/in 1 or 2 lines like with groupby().agg()?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Use DataFrame.agg with list of functions:

df = df.join(df[cols].agg(['mean','sum','count','max'], axis=1).fillna(0).convert_dtypes())
print (df)
   ID  1  2  3      mean  sum  count  max
0   1  7  1  3  3.666667   11      3    7
1   2  8  0  1       3.0    9      3    8
2   3  2  4  1  2.333333    7      3    4
3   4  3  9  6       6.0   18      3    9
4   5  0  9  8  5.666667   17      3    9

With rename columns:

d = {'mean':'Av', 'sum':'Sum','count':'Count', 'max':'Max'}
df = (df.join(df[cols].agg(list(d.keys()), axis=1).fillna(0)
                                  .convert_dtypes().rename(columns=d)))
print (df)
   ID  1  2  3        Av  Sum  Count  Max
0   1  7  1  3  3.666667   11      3    7
1   2  8  0  1       3.0    9      3    8
2   3  2  4  1  2.333333    7      3    4
3   4  3  9  6       6.0   18      3    9
4   5  0  9  8  5.666667   17      3    9

Another idea:

d = {'mean':'Av', 'sum':'Sum','count':'Count', 'max':'Max'}
df = df.join(pd.concat({v: df[cols].agg(k, axis=1).fillna(0) 
                                      for k, v in d.items()}, axis=1))
print (df)
   ID  1  2  3        Av  Sum  Count  Max
0   1  7  1  3  3.666667   11      3    7
1   2  8  0  1  3.000000    9      3    8
2   3  2  4  1  2.333333    7      3    4
3   4  3  9  6  6.000000   18      3    9
4   5  0  9  8  5.666667   17      3    9
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading