Process dataframe as blocks with the same column value

January 5, 2022

I have a dataframe that looks like this:

x | time |  zone
1     10     a
3     11     a
5     12     b
7     13     b
8     14     a
9     18     a
10    20     a
11    22     c
12    24     c

Imagine that zone is a state that changes over time. I would like to process each run of a state individually so that I can calculate some metrics for it.

Basically, I want to divide the data frame into blocks, like this:
1st block:

x | time |  zone
1     10     a
3     11     a

2nd block:

5     12     b
7     13     b

3rd block:

8     14     a
9     18     a
10    20     a

and so on. With these blocks I can calculate metrics like time spent in a state, the difference in x, etc.

How can I accomplish this using pandas?

Thanks!

>Solution :

The classic approach is to use the following formula to generate groups of consecutive values.

This works by flagging each row whose value differs from the previous row (True), then using cumsum to increment a counter at each change.

group = df['zone'].ne(df['zone'].shift()).cumsum()

output:

0    1
1    1
2    2
3    2
4    3
5    3
6    3
7    4
8    4
Name: zone, dtype: int64
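
To make the one-liner easier to follow, here is a minimal sketch that rebuilds the example dataframe from the question and splits the formula into its intermediate steps. The variable names previous, changed and steps are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "x": [1, 3, 5, 7, 8, 9, 10, 11, 12],
    "time": [10, 11, 12, 13, 14, 18, 20, 22, 24],
    "zone": list("aabbaaacc"),
})

# Zone of the previous row (NaN for the first row)
previous = df["zone"].shift()

# True wherever the zone differs from the previous row, i.e. a new block starts
changed = df["zone"].ne(previous)

# Running count of block starts, which serves as a block id
group = changed.cumsum()

# Side-by-side view of the intermediate steps (illustrative only)
steps = pd.DataFrame({
    "zone": df["zone"],
    "previous": previous,
    "changed": changed,
    "group": group,
})
print(steps)
```

The first row is always flagged as a change because its zone is compared with NaN, which is why the block ids start at 1.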

Then you can pass it to groupby to split your data into blocks (shown here as a dictionary for the example):

dict(list(df.groupby(group)))

output:

{1:    x  time zone
 0  1    10    a
 1  3    11    a,
 2:    x  time zone
 2  5    12    b
 3  7    13    b,
 3:     x  time zone
 4   8    14    a
 5   9    18    a
 6  10    20    a,
 4:     x  time zone
 7  11    22    c
 8  12    24    c}
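
For the metrics mentioned in the question, one possible follow-up is to aggregate per block instead of materialising a dictionary. The sketch below assumes that "time spent in state" means last time minus first time within a block, and that "x difference" means last x minus first x. The output column names (start, end, duration, x_diff, and so on) and the index name block are hypothetical choices, not something from the original answer.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "x": [1, 3, 5, 7, 8, 9, 10, 11, 12],
    "time": [10, 11, 12, 13, 14, 18, 20, 22, 24],
    "zone": list("aabbaaacc"),
})

# Block id per row; renamed so the index does not clash with the "zone" column
group = df["zone"].ne(df["zone"].shift()).cumsum().rename("block")

# One row per block, using named aggregation
metrics = df.groupby(group).agg(
    zone=("zone", "first"),    # state of the block
    start=("time", "first"),   # first timestamp in the block
    end=("time", "last"),      # last timestamp in the block
    x_first=("x", "first"),
    x_last=("x", "last"),
    rows=("x", "count"),       # number of rows in the block
)

# Assumed definitions: duration and x difference measured within the block
metrics["duration"] = metrics["end"] - metrics["start"]
metrics["x_diff"] = metrics["x_last"] - metrics["x_first"]

print(metrics)
```

If the time spent in a state should instead run until the start of the next block, the duration could be computed from metrics["start"].shift(-1) minus metrics["start"]; which definition fits depends on how the data was recorded.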