Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

What's the logic behind "cumsum" to make flags, compute counts and form groups?

Without further ado, my input (s1) & expected-output (df) are below :

#INPUT
s1 = pd.Series(['a', np.nan, 'b', 'c', np.nan, np.nan, 'd', np.nan]).rename('col1')

#EXPECTED-OUTPUT
s2 = pd.Series([1, 2, 3, 3, 4, 4, 5, 6]).rename('col2') # flag the transition null>notnull or vice-versa
s3 = pd.Series([0, 1, 0, 0, 2, 3, 0, 4]).rename('col3') # counter of the null values
df = pd.concat([s1, s2, s3], axis=1)

  col1  col2  col3
0    a     1     0
1  NaN     2     1
2    b     3     0
3    c     3     0
4  NaN     4     2
5  NaN     4     3
6    d     5     0
7  NaN     6     4

I tried plenty of weird combinations of cumsum and masks but without any success. Likely that’s because I don’t have the basics of the logical thinking. What questions do I need to ask myself before starting to build the chain that will give me my series ?

Any help would be greately appreciated, guys !

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You can use isna with cumsum and where for "col3". For "col2" a classical ne+shift/cumsum:

m = df['col1'].isna()

# if the flag is different from the previous one, increment
df['col2'] = m.ne(m.shift()).cumsum()

# increment on each True, mask the False
df['col3'] = m.cumsum().where(m, 0)

Output:

  col1  col2  col3
0    a     1     0
1  NaN     2     1
2    b     3     0
3    c     3     0
4  NaN     4     2
5  NaN     4     3
6    d     5     0
7  NaN     6     4

Intermediates:

  col1  col2  col3      m m.shift()  m.ne(m.shift())  m.cumsum()
0    a     1     0  False       NaN             True           0
1  NaN     2     1   True     False             True           1
2    b     3     0  False      True             True           1
3    c     3     0  False     False            False           1
4  NaN     4     2   True     False             True           2
5  NaN     4     3   True      True            False           3
6    d     5     0  False      True             True           3
7  NaN     6     4   True     False             True           4
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading