Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

count sets of consecutive true values in a column

Let’s say that I have a dataframe as follow:

df = pd.DataFrame({'A':[1,1,1,1,1,0,0,1,1,0,1,1,1,1,1,0,0,0,0,0,1,1]})

Then, I convert it into a boolean form:

df.eq(1)
Out[213]: 
        A
0    True
1    True
2    True
3    True
4    True
5   False
6   False
7    True
8    True
9   False
10   True
11   True
12   True
13   True
14   True
15  False
16  False
17  False
18  False
19  False
20   True
21   True

What I want is to count consecutive sets of True values in the column. In this example, the output would be:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    df
Out[215]: 
    A  count
0   1    5.0
1   1    2.0
2   1    5.0
3   1    2.0
4   1    NaN
5   0    NaN
6   0    NaN
7   1    NaN
8   1    NaN
9   0    NaN
10  1    NaN
11  1    NaN
12  1    NaN
13  1    NaN
14  1    NaN
15  0    NaN
16  0    NaN
17  0    NaN
18  0    NaN
19  0    NaN
20  1    NaN
21  1    NaN

My progress has been by using tools as ‘groupby’ and ‘cumsum’ but honestly, I can not figure out how to solve it. Thanks in advance

>Solution :

You can use df['A'].diff().ne(0).cumsum() to generate a grouper that will group each consecutive group of zeros/ones:

# A side-by-side comparison:
>>> pd.concat([df['A'], df['A'].diff().ne(0).cumsum()], axis=1)
    A  A
0   1  1
1   1  1
2   1  1
3   1  1
4   1  1
5   0  2
6   0  2
7   1  3
8   1  3
9   0  4
10  1  5
11  1  5
12  1  5
13  1  5
14  1  5
15  0  6
16  0  6
17  0  6
18  0  6
19  0  6
20  1  7
21  1  7

Thus, group by that grouper, calculate sums, replace zero with NaN + dropna, and reset the index:

df['count'] = df.groupby(df['A'].diff().ne(0).cumsum()).sum().replace(0, np.nan).dropna().reset_index(drop=True)

Output:

>>> df
    A    B
0   1  5.0
1   1  2.0
2   1  5.0
3   1  2.0
4   1  NaN
5   0  NaN
6   0  NaN
7   1  NaN
8   1  NaN
9   0  NaN
10  1  NaN
11  1  NaN
12  1  NaN
13  1  NaN
14  1  NaN
15  0  NaN
16  0  NaN
17  0  NaN
18  0  NaN
19  0  NaN
20  1  NaN
21  1  NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading