How to groupby continous records (e.g., "gaps and islands") in pandas?

February 10, 2022

My problems is the same as, How to group by continuous records in SQL, only I need a solution in Pandas.

Given a df like

ID  Colour
------------
 1   Red
 2   Red
 3   Red
 4   Red
 5   Red
 6   Green
 7   Green
 8   Green
 9   Green
10   Red
11   Red
12   Red
13   Red
14   Green
15   Green
16   Green
17   Blue
18   Blue
19   Red
20   Blue

I want it grouped into

color  minId
------------
Red     1
Green   6
Red    10
Green  14
Blue   17
Red    19
Blue   20

It is okay to change the name of the colors (e.g., Green1)

The solution should generalize into other aggregations other than just min

>Solution :

You can grouping by consecutive values by helper Series created by compared shifted values and cumsum and then aggregate first and min:

g = df['Colour'].ne(df['Colour'].shift()).cumsum()
df = df.groupby(g).agg(color=('Colour','first'), minId=('ID','min')).reset_index(drop=True)
print (df)
   color  minId
0    Red      1
1  Green      6
2    Red     10
3  Green     14
4   Blue     17
5    Red     19
6   Blue     20