Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to groupby continous records (e.g., "gaps and islands") in pandas?

My problems is the same as, How to group by continuous records in SQL, only I need a solution in Pandas.

Given a df like

ID  Colour
------------
 1   Red
 2   Red
 3   Red
 4   Red
 5   Red
 6   Green
 7   Green
 8   Green
 9   Green
10   Red
11   Red
12   Red
13   Red
14   Green
15   Green
16   Green
17   Blue
18   Blue
19   Red
20   Blue

I want it grouped into

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

color  minId
------------
Red     1
Green   6
Red    10
Green  14
Blue   17
Red    19
Blue   20

It is okay to change the name of the colors (e.g., Green1)

The solution should generalize into other aggregations other than just min

>Solution :

You can grouping by consecutive values by helper Series created by compared shifted values and cumsum and then aggregate first and min:

g = df['Colour'].ne(df['Colour'].shift()).cumsum()
df = df.groupby(g).agg(color=('Colour','first'), minId=('ID','min')).reset_index(drop=True)
print (df)
   color  minId
0    Red      1
1  Green      6
2    Red     10
3  Green     14
4   Blue     17
5    Red     19
6   Blue     20
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading