How to find a group midpoint of a pandas or python array?

I have an array that looks like this (it's actually a column in a pandas DataFrame, but suggestions for plain Python would also work):

[0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1,0]

For each subsequence of 1s I need to find its midpoint position: the index of the element in the middle of the subsequence, or the closest one to it. For the example above, these would be 6 for the first subsequence, 18 for the second, and so on.

This can easily be done with a naive loop, but I wonder if there is a more efficient way (maybe a built-in pandas function?).
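For reference, the naive loop mentioned above might look roughly like the sketch below. The function name `run_midpoints` is made up for illustration, and it assumes that for a run of even length the lower of the two middle indices is wanted.

```python
def run_midpoints(values):
    """Return the middle index of every run of 1s.

    Assumption: for an even-length run, the lower middle index is used.
    """
    midpoints = []
    start = None  # index where the current run of 1s began, or None outside a run
    for i, v in enumerate(values):
        if v == 1 and start is None:
            start = i  # a new run begins here
        elif v != 1 and start is not None:
            # the run ended at i - 1; take the average of its first and last index
            midpoints.append((start + i - 1) // 2)
            start = None
    if start is not None:
        # the last run extends to the end of the input
        midpoints.append((start + len(values) - 1) // 2)
    return midpoints


data = [0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1,0]
print(run_midpoints(data))  # [6, 18, 25]
```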

>Solution :

Try with groupby:

  1. Group the series (i.e. column) index by runs of consecutive 0s and 1s, using srs.ne(srs.shift()).cumsum() as the run label
  2. Take the average of the first and last index of each run
  3. Keep only the unique values where the original column value is 1

import pandas as pd

srs = pd.Series([0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1,0])
# Label each run of equal values, then group the index by that run label
g = srs.index.to_series().groupby(srs.ne(srs.shift()).cumsum())

>>> g.transform("first").add(g.transform("last")).floordiv(2).where(srs.eq(1)).dropna().unique()
array([ 6., 18., 25.])
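
If pandas is not strictly required, the same idea can be sketched with NumPy alone by locating the edges of each run with `np.diff`. This is an alternative to the groupby approach above, not part of it. The helper name `run_midpoints_np` is hypothetical, and the function returns positions (0-based offsets) rather than index labels, which only match when the column has a default RangeIndex.

```python
import numpy as np


def run_midpoints_np(values):
    """Return the middle position of every run of 1s as an integer array."""
    arr = np.asarray(values)
    # Pad with a zero on both sides so runs touching either end still
    # produce both a rising and a falling edge.
    padded = np.concatenate(([0], (arr == 1).astype(np.int8), [0]))
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)      # first position of each run
    ends = np.flatnonzero(edges == -1) - 1   # last position of each run
    # Floor division picks the lower middle for even-length runs.
    return (starts + ends) // 2


data = [0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1,0]
print(run_midpoints_np(data).tolist())  # [6, 18, 25]
```

For a DataFrame column, something like `run_midpoints_np(df["col"].to_numpy())` should work, where `"col"` stands in for the actual column name. Unlike the groupby version, the result stays an integer array because no NaN values are introduced along the way.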