pandas – find index distance between batches of equal values in a row

I would like to find the "distance" between the starting points of consecutive batches of 1s in a row, or in other words the length of each batch of "1s followed by 0s" (separated by extra spaces below).

So I start with the following series:

import pandas as pd

df = pd.Series([0,0, 1,1,1,0,0,  1,1,0,  1,1,1,0,0,0,0,  1,1,1,0,0,0,  1,1,0,0])

and would like to get the following output:

0    NaN
1    5.0
2    3.0
3    7.0
4    6.0
5    NaN

I know how to count the number of consecutive 1s or the number of consecutive 0s, but I don't know how to treat the pattern "1s followed by 0s" as a unit of its own.

Having NaNs at the beginning and end would be ideal but is not necessary.

Solution:

Use diff() to compute the difference between consecutive values; a difference of 1 marks the start of a new batch of 1s. Then apply np.diff to the index labels of those starts:

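import numpy as np

# True where the value jumps from 0 to 1, i.e. where a new batch of 1s starts
s = df.diff().eq(1)
# Differences between the index labels of consecutive batch starts
np.diff(s.index[s])

# or a one-liner (uses positions, which equal the labels for the default index)
# np.diff(np.where(df.diff().eq(1))[0])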

Output:

array([5, 3, 7, 6])
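For reference, here is a self-contained version of the answer that also pads the result with NaN at both ends, to match the layout the question asked for. The NaN padding and the names starts, distances and result are my own illustrative additions on top of the answer's diff()/np.diff approach, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.Series([0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0,
                1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

# Boolean mask: True where the value jumps from 0 to 1 (start of a batch of 1s)
s = df.diff().eq(1)
# Index labels of the batch starts: 2, 7, 10, 17, 23
starts = s[s].index

# Distances between consecutive starts
distances = np.diff(starts.to_numpy())

# Pad with NaN at both ends to mirror the desired output in the question
result = pd.Series(np.concatenate(([np.nan], distances, [np.nan])))
print(result)
```

Printing result gives a six-row float Series with NaN in rows 0 and 5 and 5.0, 3.0, 7.0, 6.0 in between, which is the output the question describes.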

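Note: there is an edge case when the series starts with a 1. diff() returns NaN for the first element, so eq(1) is False there and that first batch start is missed.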
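One way to handle that edge case, sketched here as an assumption rather than as part of the original answer: define a batch start as a 1 whose predecessor is 0, and use shift(fill_value=0) so that a leading 1 counts as a start. The helper name batch_start_distances and the sample series ser are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


def batch_start_distances(ser: pd.Series) -> np.ndarray:
    """Distances between the starts of consecutive batches of 1s.

    Hypothetical helper: pretends there is a 0 before the first element,
    so a series that begins with 1 still counts position 0 as a start.
    """
    # A batch starts where the value is 1 and the previous value is 0
    starts = ser.eq(1) & ser.shift(fill_value=0).eq(0)
    return np.diff(starts[starts].index.to_numpy())


# Sample series that begins with a batch of 1s
ser = pd.Series([1, 1, 0, 0, 1, 0, 1, 1, 1, 0])

# The diff()-based version misses the batch starting at position 0
naive = ser.diff().eq(1)
print(np.diff(naive[naive].index.to_numpy()))  # [2]

# The shift()-based version counts it
print(batch_start_distances(ser))  # [4 2]
```

An equivalent variation would be df.diff().fillna(df).eq(1), which fills the leading NaN with the first value itself; the shift() form just makes the "previous value is 0" condition explicit.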
