Pandas dataframe – groupby() blocks of constant value over multiple columns

May 29, 2023

I have the following pandas dataframe:

df = pd.DataFrame({
   'A': [1,1,1,1,2,2,2,1,1,3,3,3],
   'B': [0,0,1,1,0,0,0,1,1,0,0,0],
});
df.index.names = ['Index']
df
        A   B
Index       
0       1   0
1       1   0
2       1   1
3       1   1
4       2   0
5       2   0
6       2   0
7       1   1
8       1   1
9       3   0
10      3   0
11      3   0

I can group this dataframe into blocks of constant ‘A’ like so:

df = df.groupby(df['A'].diff().ne(0).cumsum()).apply(lambda x: x)
df.index.names = ['Block', 'Index']
df
                A   B
Block   Index       
1       0       1   0
        1       1   0
        2       1   1
        3       1   1
2       4       2   0
        5       2   0
        6       2   0
3       7       1   1
        8       1   1
4       9       3   0
        10      3   0
        11      3   0

How do I instead group this dataframe into blocks of constant ‘A’ AND constant ‘B’? My desired result is:

                A   B
Block   Index       
1       0       1   0
        1       1   0
2       2       1   1
        3       1   1
3       4       2   0
        5       2   0
        6       2   0
4       7       1   1
        8       1   1
5       9       3   0
        10      3   0
        11      3   0

>Solution :

Use the same logic with any (df.diff().ne(0).any(axis=1).cumsum()) as grouper:

out = df.groupby(df.diff().ne(0).any(axis=1).cumsum(), group_keys=True).apply(lambda x: x)
out.index.names = ['Block', 'Index']

Or:

out = (df.assign(Block=df.diff().ne(0).any(axis=1).cumsum())
         .groupby('Block', group_keys=True)
         .apply(lambda x: x)
       )

Output:

             A  B
Block Index      
1     0      1  0
      1      1  0
2     2      1  1
      3      1  1
3     4      2  0
      5      2  0
      6      2  0
4     7      1  1
      8      1  1
5     9      3  0
      10     3  0
      11     3  0