Count consecutive boolean values in Python/pandas array for whole subset

I am looking for a way to group a pandas DataFrame by runs of consecutive equal values and apply an aggregation such as count or max to each run.

For example, if I have one column in df:

    my_column
0        0  
1        0  
2        1  
3        1  
4        1  
5        0  
6        0  
7        0  
8        0  
9        1  
10       1  
11       0

The result needs to be:

    result
0        2  
1        2  
2        3  
3        3  
4        3  
5        4  
6        4  
7        4  
8        4  
9        2  
10       2  
11       1

Why: there are two 0s at the beginning, then three 1s, …

What I need is similar to this answer, but every element in a group should receive the same value.

The preferred answer would show how to group consecutive equal elements and apply an aggregation function to each group, so that I could also compute, for example, the max:

    my_column    other_value
0        0           7
1        0           4
2        1           1
3        1           0
4        1           5
5        0           1
6        0           1
7        0           2
8        0           8
9        1           1
10       1           0
11       0           2

and the result would be:

    result
0        7  
1        7  
2        5  
3        5  
4        5  
5        8  
6        8  
7        8  
8        8  
9        1  
10       1  
11       2

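>Solution:

You can use:

# Label each run: a new run starts wherever the value differs from the previous row.
g = df["my_column"].ne(df["my_column"].shift()).cumsum()

# Broadcast each run's count back onto every row of that run.
out = df.groupby(g)["my_column"].transform("count")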

Output:

print(out)

    my_column
0           2
1           2
2           3
3           3
4           3
5           4
6           4
7           4
8           4
9           2
10          2
11          1
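
To see why the grouping key works, it can help to look at the intermediate values. The following is a minimal sketch that rebuilds the sample column from the question; the names `changed`, `run_id` and `run_length` are illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample column from the question.
df = pd.DataFrame({"my_column": [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0]})

# True wherever a value differs from the row above it. The first row is
# always True because shift() places NaN there, and NaN never compares equal.
changed = df["my_column"].ne(df["my_column"].shift())

# A running total of the change flags gives every run its own integer label.
run_id = changed.cumsum()
print(run_id.tolist())
# [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5]

# Group by the run label and broadcast each group's count to all of its rows.
run_length = df.groupby(run_id)["my_column"].transform("count")
print(run_length.tolist())
# [2, 2, 3, 3, 3, 4, 4, 4, 4, 2, 2, 1]
```

Note that "count" only counts non-missing values in the selected column. If the column may contain NaN and every row should be counted, transform("size") is probably the safer choice.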

NB: to get the max, use df.groupby(g)["other_value"].transform("max").
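
As a sketch of how the same key can serve any aggregation, the example below wraps the idea in a small helper and applies it to the two-column table from the question. The helper name `transform_by_run` and its parameters are hypothetical, introduced here only for illustration; they are not part of pandas.

```python
import pandas as pd

# Two-column sample data from the question.
df = pd.DataFrame({
    "my_column":   [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0],
    "other_value": [7, 4, 1, 0, 5, 1, 1, 2, 8, 1, 0, 2],
})

def transform_by_run(frame, key, value, func):
    """Apply `func` to `value` within each run of consecutive equal `key` values.

    Hypothetical helper: `func` is anything GroupBy.transform accepts,
    such as "count", "max", "min" or "sum".
    """
    run_id = frame[key].ne(frame[key].shift()).cumsum()
    return frame.groupby(run_id)[value].transform(func)

df["result"] = transform_by_run(df, "my_column", "other_value", "max")
print(df["result"].tolist())
# [7, 7, 5, 5, 5, 8, 8, 8, 8, 1, 1, 2]
```

Because transform returns a Series aligned with the original index, the result can be assigned straight back as a new column.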
