I'm looking for the pandas syntax to group rows that fall between True values in a boolean column.
DataTable
| distance | time | new |
|---|---|---|
| 0 | 2020-10-11 1:30:15 | true |
| 5 | 2020-10-11 1:31:15 | false |
| 10 | 2020-10-11 1:32:15 | false |
| 1 | 2020-10-11 2:30:15 | true |
| 2 | 2020-10-11 2:31:15 | false |
| 30 | 2020-10-11 2:32:15 | false |
| 31 | 2020-10-11 2:33:15 | false |
| 0 | 2020-10-12 1:31:15 | true |
| 5 | 2020-10-12 1:32:15 | false |
Question
I want to group all the rows that are part of each activity. A new activity is marked by a True value in the new column. Based on the data above, there should be 3 groups. Each group should start with a row where new is True and end with the last False row before the next True.
How can this be achieved using groupby?
Code
Here is how I currently use groupby to group by date. As described above, I want to change this to group by the "new" column instead, so that the rows are grouped into activities. I plan to keep using it in a for loop like this:
for idx, day in df.groupby(df['time'].dt.date):
    print(idx)
    print(day)
Solution
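Just group by df['new'].cumsum(). True counts as 1 and False as 0, so the cumulative sum increases by one at each True row and stays constant until the next one, which gives every row of an activity the same group key: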
for idx, day in df.groupby(df['new'].cumsum()):
    print('-----')
    print(idx)
    print(day)
Output:
-----
1
   distance                time    new
0         0  2020-10-11 1:30:15   True
1         5  2020-10-11 1:31:15  False
2        10  2020-10-11 1:32:15  False
-----
2
   distance                time    new
3         1  2020-10-11 2:30:15   True
4         2  2020-10-11 2:31:15  False
5        30  2020-10-11 2:32:15  False
6        31  2020-10-11 2:33:15  False
-----
3
   distance                time    new
7         0  2020-10-12 1:31:15   True
8         5  2020-10-12 1:32:15  False