Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to stop and restart cumsum using a marker in another column

I have a pandas dataframe with values that needs to be totalized inside a period for each device, but the periods ends are marked in another column (an easy identifiable event line). The cumsum must go until it finds this end of period marker and then start again from zero(the first value on the next line).

  device_name value end
0   A5         1    False
1   A5         7    False
2   A5         2    True
3   A5         1    False
4   A5         1    False
5   A5         1    False
6   A5         1    True
7   A6         2    False
8   A6         4    False
9   A6         2    False
10  A6         2    True
11  A6         2    False
12  A6         2    False

the result should be something like

  device_name value end     total
0   A5         1    False    1  
1   A5         7    False    8
2   A5         2    True     10
3   A5         1    False    1
4   A5         1    False    2
5   A5         1    False    3
6   A5         1    True     4
7   A6         2    False    2
8   A6         4    False    6
9   A6         2    False    8
10  A6         2    True     10
11  A6         2    False    2
12  A6         2    False    4

I tried cumsum but I can’t separate the end of a period from the other, I can iterate over rows and if i find a True on df.end.shift(1) i ignore the last value, but there are at least 60000 and probably there are faster methods, Any suggestion?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

IIUC, you want to cumsum per group until you reach a True. Then, after this row, restart the count.

You can use an extra group based on the "end" value (also using a cumsum):

df['total'] = (df.groupby(['device_name',
                           df['end'].shift(1, fill_value=0).cumsum()])
               ['value'].cumsum())

output:

   device_name  value    end  total
0           A5      1  False      1
1           A5      7  False      8
2           A5      2   True     10
3           A5      1  False      1
4           A5      1  False      2
5           A5      1  False      3
6           A5      1   True      4
7           A6      2  False      2
8           A6      4  False      6
9           A6      2  False      8
10          A6      2   True     10
11          A6      2  False      2
12          A6      2  False      4

NB. note that I get a different value for row #2

NB.2. for purists, the extra group could also be computed using a groupby. It doesn’t really matter in this case. The internal groups will just not start from zero after the first group, but their name will not be used anywhere in the output.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading