How to find the midpoint of each group in a pandas or Python array?

March 29, 2022

I have an array that looks like this (it is actually a column in a pandas DataFrame, but suggestions for plain Python would also work):

[0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1,0]

For each subsequence of 1s I need to find the midpoint position: the index of the element in the middle of the subsequence, or the one closest to it. For the example above, these would be 6 for the first subsequence, 18 for the second, and so on.

This is easy to do with a naive loop, but I wonder whether there is a more efficient way (maybe a built-in pandas function?).
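For reference, the naive loop might look like the sketch below. The function name `run_midpoints` and its `target` parameter are made up for illustration, and for runs of even length it picks the lower of the two middle indices, which is only one possible reading of "closest".

```python
def run_midpoints(values, target=1):
    """Return the middle index of every run of `target` in `values`."""
    midpoints = []
    start = None  # index where the current run began, or None outside a run
    for i, v in enumerate(values):
        if v == target and start is None:
            start = i  # a new run begins here
        elif v != target and start is not None:
            # the run ended at i - 1; take the floor of the average of its ends
            midpoints.append((start + i - 1) // 2)
            start = None
    if start is not None:  # a run that reaches the end of the list
        midpoints.append((start + len(values) - 1) // 2)
    return midpoints


data = [0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1,0]
print(run_midpoints(data))  # [6, 18, 25]
```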

Solution:

Try with groupby:

1. Group the index of the series (i.e. the column) by runs of consecutive equal values, labelling the runs with srs.ne(srs.shift()).cumsum()
2. Take the floor of the average of the first and last index of each run
3. Keep only the unique values at positions where the original column value is 1
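To see why the `ne`/`shift`/`cumsum` expression labels the runs, it can help to look at the intermediate values on a shorter series. This is only an illustrative sketch; the names `small`, `changed` and `run_id` are not part of the answer.

```python
import pandas as pd

small = pd.Series([0, 0, 1, 1, 1, 0, 1])

# True wherever a value differs from its predecessor
# (the first element is compared with NaN, so it is always True)
changed = small.ne(small.shift())

# The running count of changes gives every run its own label
run_id = changed.cumsum()

print(changed.tolist())  # [True, False, True, False, False, True, True]
print(run_id.tolist())   # [1, 1, 2, 2, 2, 3, 4]
```

Both the runs of 0s and the runs of 1s get a label, which is why the last step has to filter on the original values.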

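import pandas as pd

srs = pd.Series([0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1,0])
# Group the index values by run; the run label increases each time the value changes
g = srs.index.to_series().groupby(srs.ne(srs.shift()).cumsum())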

>>> g.transform("first").add(g.transform("last")).floordiv(2).where(srs.eq(1)).dropna().unique()
array([ 6., 18., 25.])
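The result above comes back as floats because `where` fills the non-matching rows with NaN. If integer positions are preferred, one alternative is to find the run boundaries directly with NumPy. The following is a sketch of a different technique, not the groupby approach above, and it assumes the data contains only the integers 0 and 1; the variable names are illustrative.

```python
import numpy as np

values = np.array([0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1,0])

# Pad with a zero on each side so runs touching either end still have both edges
padded = np.concatenate(([0], values, [0]))
edges = np.diff(padded)

starts = np.flatnonzero(edges == 1)      # first index of each run of 1s
ends = np.flatnonzero(edges == -1) - 1   # last index of each run of 1s

# Floor of the average, matching floordiv(2) in the pandas version
midpoints = (starts + ends) // 2
print(midpoints.tolist())  # [6, 18, 25]
```

For a DataFrame column, `values` could be obtained with `srs.to_numpy()`; the positions returned are then positional offsets rather than index labels, which only coincide when the index is the default 0..n-1 range.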