I have a dataframe that looks like this:
x | time | zone
1 | 10 | a
3 | 11 | a
5 | 12 | b
7 | 13 | b
8 | 14 | a
9 | 18 | a
10 | 20 | a
11 | 22 | c
12 | 24 | c
Think of zone as a state that changes over time. I would like to process each consecutive run of a state individually so I can calculate some metrics for each one.
Basically, I want to split the DataFrame into blocks of consecutive rows that share the same zone, like this:
1st block:
x | time | zone
1 | 10 | a
3 | 11 | a
2nd block:
5 | 12 | b
7 | 13 | b
3rd block:
8 | 14 | a
9 | 18 | a
10 | 20 | a
and so on. With these blocks I can calculate metrics such as the time spent in each state, the difference in x, etc.
How can I accomplish this using pandas?
Thanks!
Solution:
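The classic approach is to use the following idiom to label groups of consecutive identical values.
It works by comparing each value with the previous one (via shift), which yields True wherever the value changes, and then taking the cumulative sum of those booleans with cumsum, so the group number increases by one at each change.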
group = df['zone'].ne(df['zone'].shift()).cumsum()
output:
0 1
1 1
2 2
3 2
4 3
5 3
6 3
7 4
8 4
Name: zone, dtype: int64
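To see why this works, the intermediate steps can be laid out side by side. This is a minimal sketch that rebuilds the example data from the question; the helper column names `prev`, `changed` and `group` are only illustrative.

```python
import pandas as pd

# Rebuild the example data from the question
df = pd.DataFrame({
    "x":    [1, 3, 5, 7, 8, 9, 10, 11, 12],
    "time": [10, 11, 12, 13, 14, 18, 20, 22, 24],
    "zone": ["a", "a", "b", "b", "a", "a", "a", "c", "c"],
})

steps = pd.DataFrame({
    "zone": df["zone"],
    # zone of the previous row (missing for the first row)
    "prev": df["zone"].shift(),
    # True where the zone differs from the previous row;
    # the missing value in row 0 compares as "not equal", so row 0 is True
    "changed": df["zone"].ne(df["zone"].shift()),
})
# Running count of changes = block id, starting at 1
steps["group"] = steps["changed"].cumsum()
print(steps)
```

Note that zone `a` appears in both block 1 and block 3. Grouping on `zone` alone would merge those two runs; the cumulative counter keeps them apart because it only depends on where the value changes.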
You can then use it as the key for groupby (shown here as a dictionary of DataFrames for illustration):
dict(list(df.groupby(group)))
output:
{1: x time zone
0 1 10 a
1 3 11 a,
2: x time zone
2 5 12 b
3 7 13 b,
3: x time zone
4 8 14 a
5 9 18 a
6 10 20 a,
4: x time zone
7 11 22 c
8 12 24 c}
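With the group labels in hand, metrics like the ones mentioned in the question can be computed in one pass with `groupby(...).agg` and named aggregation. This is a sketch under assumptions: "time spent in state" is taken as the last time minus the first time within a block, "x difference" as the last x minus the first x, and the output column names (`start`, `end`, `duration`, `x_diff`, etc.) are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "x":    [1, 3, 5, 7, 8, 9, 10, 11, 12],
    "time": [10, 11, 12, 13, 14, 18, 20, 22, 24],
    "zone": ["a", "a", "b", "b", "a", "a", "a", "c", "c"],
})

# Same block labels as above; renamed so the index name does not clash with the "zone" column
group = df["zone"].ne(df["zone"].shift()).cumsum().rename("block")

# One row per block: first/last values of the columns we need
metrics = df.groupby(group).agg(
    zone=("zone", "first"),
    start=("time", "first"),
    end=("time", "last"),
    x_first=("x", "first"),
    x_last=("x", "last"),
)
# Derived per-block metrics (definitions assumed, see above)
metrics["duration"] = metrics["end"] - metrics["start"]
metrics["x_diff"] = metrics["x_last"] - metrics["x_first"]
print(metrics[["zone", "duration", "x_diff"]])
```

With this definition a single-row block gets a duration of 0. If you would rather measure the time until the next state begins, subtract each block's start from the following block's start instead. If you need to process each block as its own DataFrame, you can also iterate directly with `for block_id, block in df.groupby(group):`.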