Python – aggregate groupby multiple columns, skipping nan cells

I’m trying to aggregate a dataframe across multiple columns, grouped by Date. Some cells contain NaN, which I would like to keep. So my df looks like this:

import random
import numpy as np
import pandas as pd

rng = ['2015-02-24','2015-02-24','2015-02-24','2015-02-24','2015-02-24','2015-02-24',
       '2015-02-25','2015-02-25','2015-02-25','2015-02-25','2015-02-25','2015-02-25']
rng = pd.to_datetime(rng)
Predicted = [random.randrange(-1, 50, 1) for i in range(12)]
Actual_data = [5,3,8,2,9,20, np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]
Category = ['A','A','A','B','B','B','A','A','A','B','B','B']

df = pd.DataFrame({ 'Date': rng, 'Predicted' : Predicted, 'Actual': Actual_data, 'Category': Category})

df

          Date  Predicted  Actual Category
0   2015-02-24         0      5.0        A
1   2015-02-24         36     3.0        A
2   2015-02-24         30     8.0        A
3   2015-02-24         33     2.0        B
4   2015-02-24         49     9.0        B
5   2015-02-24         42    20.0        B
6   2015-02-25         25     NaN        A
7   2015-02-25          9     NaN        A
8   2015-02-25         21     NaN        A
9   2015-02-25         39     NaN        B
10  2015-02-25         17     NaN        B
11  2015-02-25         11     NaN        B

I want to group by Date, summing Predicted and Actual and leaving Category out. I also want the groups that contain only NaN to keep the NaN value, rather than have it turned into 0. So in the end I would like to have this:


        Date  Predicted  Actual
0 2015-02-24        190    47.0
1 2015-02-25        122     NaN

I tried things like:

df = df.groupby(['Date'])[['Predicted', 'Actual']].agg('sum').reset_index()

which looked like it worked, but then I saw that it turns NaN into 0, which I don’t want. I tried adding skipna=False, but it isn’t accepted there. I updated to the latest version of pandas, as I read somewhere that the version might be the problem, but it made no difference. Could someone help me?

>Solution :

Perhaps you could write a lambda function that returns the sum if any of the values are not null, and otherwise returns NaN:

df.groupby(['Date'])[['Predicted', 'Actual']].agg(lambda x: x.sum() if x.notna().any() else np.nan).reset_index()

Output

        Date  Predicted  Actual
0 2015-02-24        190    47.0
1 2015-02-25        122     NaN
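As an aside: pandas also has a built-in way to get this behaviour without a lambda. The `min_count` parameter of `GroupBy.sum` makes the sum return NaN when a group has fewer than `min_count` valid (non-NaN) values. A minimal sketch, using fixed Predicted values in place of the question's random ones so the output is reproducible:

```python
import numpy as np
import pandas as pd

# Same frame as in the question, with fixed Predicted values for reproducibility
df = pd.DataFrame({
    'Date': pd.to_datetime(['2015-02-24'] * 6 + ['2015-02-25'] * 6),
    'Predicted': [0, 36, 30, 33, 49, 42, 25, 9, 21, 39, 17, 11],
    'Actual': [5, 3, 8, 2, 9, 20] + [np.nan] * 6,
    'Category': ['A', 'A', 'A', 'B', 'B', 'B'] * 2,
})

# min_count=1 means: a group needs at least one non-NaN value,
# otherwise the sum for that group is NaN instead of 0
out = df.groupby('Date')[['Predicted', 'Actual']].sum(min_count=1).reset_index()
print(out)
#         Date  Predicted  Actual
# 0 2015-02-24        190    47.0
# 1 2015-02-25        122     NaN
```

This avoids iterating each group through a Python lambda, so it also tends to be faster on large frames.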