This is my dataframe:
import pandas as pd
df = pd.DataFrame({'a': [1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0]})
And my desired outcome, with the rows split into groups, is:
a
0 1
1 1
2 1
3 0
4 1
5 0
6 1
7 1
8 0
10 1
11 1
12 0
Basically, I want to group the rows by each streak of 1s plus the one row after the streak ends.
For example, for the first group I want the first three rows plus the row after them.
I have tried the solutions of these posts: post1, post2.
And also this code:
df.groupby(df.a.diff().cumsum().eq(1))
But it didn’t work.
>Solution :
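You can use a reverse cumsum to form the groups: counting the zeros from the bottom up gives each streak of 1s the same label as the 0 that follows it: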
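group_ids = df.loc[::-1, 'a'].eq(0).cumsum()
# keep only groups with more than one row (here, single-row groups are zeros not preceded by a 1)
out = [grp for _, grp in df.groupby(group_ids, sort=False) if len(grp) > 1]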
Output:
[ a
0 1
1 1
2 1
3 0,
a
4 1
5 0,
a
6 1
7 1
8 0,
a
10 1
11 1
12 0]
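To see why this works, it can help to look at the intermediate labels. The sketch below is only an illustration on the sample data from the question; the names `group_ids`, `labels` and `groups` are my own and not part of any API.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0]})

# Count the zeros from the bottom up: every 0 bumps the counter, so a
# streak of 1s ends up with the same label as the 0 directly below it.
group_ids = df.loc[::-1, 'a'].eq(0).cumsum()

# Put the labels back in the original row order to inspect them.
labels = group_ids.sort_index().tolist()
print(labels)
# [6, 6, 6, 6, 5, 5, 4, 4, 4, 3, 2, 2, 2, 1]

# groupby aligns group_ids on the index, so the reversed order does not matter.
# Labels 3 and 1 cover a single row each (rows 9 and 13) and are dropped.
groups = [grp for _, grp in df.groupby(group_ids, sort=False) if len(grp) > 1]
print([grp.index.tolist() for grp in groups])
# [[0, 1, 2, 3], [4, 5], [6, 7, 8], [10, 11, 12]]
```

Note that the `len(grp) > 1` filter only checks the group size. If the column ended with a streak of two or more 1s and no 0 after it, that streak would still be kept, so add an extra check if that case matters for your data.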