How to calculate cumulative missing values group by an ID (in python)?

a) given the following "id" and "freq"

df = pd.DataFrame({'id':[1,1,1,1,1,1,2,2,2,2,2,3,3,3],'freq':[1,2,np.NaN, np.NaN, np.NaN, 6,7,8,9,10,np.NaN,np.NaN,13,14]})

df

b) how to calculate cumulative missing of "freq" group by "id"? with a reset to zero when freq > 0

so that the result ‘cum_null’ should look like –
print(df([‘cum_null’])
0 0 1 2 3 0 0 0 0 0 1 1 0 0

c) I’ve tried this. Very close, but cannot reset to zero when freq > 0

df['cum_null'] = id_grp['freq'].apply(lambda x:x.isnull().astype(int).cumsum())

df

>Solution :

If you case you can do groupby with mask

df['cum_null'] = df.freq.isnull().groupby(df['id']).cumsum().where(df.freq.isnull(),0)
0     0
1     0
2     1
3     2
4     3
5     0
6     0
7     0
8     0
9     0
10    1
11    1
12    0
13    0
Name: freq, dtype: int64

Leave a Reply