I have a table like this:

| Unit | status | date |
|---|---|---|
| One | 1 | 1 |
| One | 1 | 2 |
| One | 1 | 3 |
| One | 0 | 4 |
| One | 0 | 5 |
| One | 1 | 6 |
| One | 1 | 7 |

I want to create a new column that holds the length of each run of consecutive zeros in the status column. For this example, the output would be:

| Unit | status | date | gap |
|---|---|---|---|
| One | 1 | 1 | 0 |
| One | 1 | 2 | 0 |
| One | 1 | 3 | 0 |
| One | 0 | 4 | 2 |
| One | 0 | 5 | 2 |
| One | 1 | 6 | 0 |
| One | 1 | 7 | 0 |

This has to be done for every unit in the DataFrame. I based my attempt on this question, but I'm stuck on the part where I assign the total run length to every row that belongs to the gap.
>Solution :
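The usual way to label blocks of consecutive equal values is to take a cumulative sum over the other values. Here, `df['status'].cumsum()` only increases on rows where status is 1, so every row in a run of zeros shares the same cumulative value and lands in the same group. Given that your data is sorted by Unit: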

    df['gap'] = (df.groupby(['Unit', 'status', df['status'].cumsum()])
                   ['status'].transform('size')
                   .where(df['status'].eq(0), other=0)
                )

Output:

      Unit  status  date  gap
    0  One       1     1    0
    1  One       1     2    0
    2  One       1     3    0
    3  One       0     4    2
    4  One       0     5    2
    5  One       1     6    0
    6  One       1     7    0
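
To see why this works across several units, here is a minimal, self-contained sketch that splits the one-liner into its steps. The second unit (`Two`) and its values are made up for illustration and are not from the question; the intermediate names `block` and `run_size` are also just for this example. It assumes the rows are already sorted by Unit and date.

```python
import pandas as pd

# Hypothetical sample data: unit "One" is the example from the question,
# unit "Two" is invented to show the multi-unit case.
df = pd.DataFrame({
    "Unit":   ["One"] * 7 + ["Two"] * 5,
    "status": [1, 1, 1, 0, 0, 1, 1,   0, 1, 0, 0, 0],
    "date":   [1, 2, 3, 4, 5, 6, 7,   1, 2, 3, 4, 5],
})

# The running sum only increases on rows where status == 1, so all rows
# in one consecutive run of zeros share the same value of this key.
block = df["status"].cumsum().rename("block")

# Size of each (Unit, status, block) group, broadcast back to every row.
# Including "Unit" keeps a run at the end of one unit from merging with
# a run at the start of the next unit.
run_size = df.groupby(["Unit", "status", block])["status"].transform("size")

# Keep the run length on zero rows; rows with status 1 get 0.
df["gap"] = run_size.where(df["status"].eq(0), other=0)

print(df)
```

With this data, the two zeros in `One` get a gap of 2, the single leading zero in `Two` gets 1, and the three trailing zeros in `Two` get 3. If the frame is not sorted, sorting by Unit and date first (for example with `df.sort_values(['Unit', 'date'])`) should make the cumulative sum behave as described.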