Without further ado, my input (s1) & expected-output (df) are below :
#INPUT
s1 = pd.Series(['a', np.nan, 'b', 'c', np.nan, np.nan, 'd', np.nan]).rename('col1')
#EXPECTED-OUTPUT
s2 = pd.Series([1, 2, 3, 3, 4, 4, 5, 6]).rename('col2') # flag the transition null>notnull or vice-versa
s3 = pd.Series([0, 1, 0, 0, 2, 3, 0, 4]).rename('col3') # counter of the null values
df = pd.concat([s1, s2, s3], axis=1)
col1 col2 col3
0 a 1 0
1 NaN 2 1
2 b 3 0
3 c 3 0
4 NaN 4 2
5 NaN 4 3
6 d 5 0
7 NaN 6 4
I tried plenty of weird combinations of cumsum and masks but without any success. Likely that’s because I don’t have the basics of the logical thinking. What questions do I need to ask myself before starting to build the chain that will give me my series ?
Any help would be greately appreciated, guys !
>Solution :
You can use isna with cumsum and where for "col3". For "col2" a classical ne+shift/cumsum:
m = df['col1'].isna()
# if the flag is different from the previous one, increment
df['col2'] = m.ne(m.shift()).cumsum()
# increment on each True, mask the False
df['col3'] = m.cumsum().where(m, 0)
Output:
col1 col2 col3
0 a 1 0
1 NaN 2 1
2 b 3 0
3 c 3 0
4 NaN 4 2
5 NaN 4 3
6 d 5 0
7 NaN 6 4
Intermediates:
col1 col2 col3 m m.shift() m.ne(m.shift()) m.cumsum()
0 a 1 0 False NaN True 0
1 NaN 2 1 True False True 1
2 b 3 0 False True True 1
3 c 3 0 False False False 1
4 NaN 4 2 True False True 2
5 NaN 4 3 True True False 3
6 d 5 0 False True True 3
7 NaN 6 4 True False True 4