Consider the following array:
import numpy as np

arr = np.array(
    [
        [10, np.nan],
        [20, np.nan],
        [np.nan, 50],
        [15, 20],
        [np.nan, 30],
        [np.nan, np.nan],
        [10, np.nan],
    ]
)
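For every cell in each column of arr, I need the distance (in rows) to the next non-NaN value below it in the same column. The cell itself is not counted, and a cell with no non-NaN value below it should be NaN.
That is, the expected outcome should look like this: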
expected = np.array(
    [
        [1, 2],
        [2, 1],
        [1, 1],
        [3, 1],
        [2, np.nan],
        [1, np.nan],
        [np.nan, np.nan],
    ]
)
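To pin down the requirement, here is a minimal loop-based reference, offered as a sketch rather than an efficient solution. The function name distance_to_next_valid is hypothetical and only used for illustration. It walks each column from the bottom up and remembers the row of the most recent non-NaN value.

```python
import numpy as np


def distance_to_next_valid(arr):
    """Reference sketch: rows from each cell to the next non-NaN value below it."""
    n_rows, n_cols = arr.shape
    out = np.full(arr.shape, np.nan)
    for col in range(n_cols):
        next_valid = None  # row index of the nearest non-NaN value found below
        for row in range(n_rows - 1, -1, -1):
            if next_valid is not None:
                out[row, col] = next_valid - row
            # Update only after writing, so the cell itself is never counted.
            if not np.isnan(arr[row, col]):
                next_valid = row
    return out


arr = np.array(
    [
        [10, np.nan],
        [20, np.nan],
        [np.nan, 50],
        [15, 20],
        [np.nan, 30],
        [np.nan, np.nan],
        [10, np.nan],
    ]
)
result = distance_to_next_valid(arr)
```

Comparing result against expected with np.array_equal(result, expected, equal_nan=True) is a convenient way to check any faster approach.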
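Solution:
Using pandas, you can reverse each column's non-NaN mask, compute a cumcount within groups that start at each non-NaN value, mask the rows that have no non-NaN value below them, and shift by one row: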
import pandas as pd

# Work on the reversed non-NaN mask so that "next" becomes "previous".
out = (pd.DataFrame(arr).notna()[::-1]
       .apply(lambda s: s.groupby(s.cumsum()).cumcount().add(1)
                         .where(s.cummax()).shift()[::-1])
       .to_numpy()
      )
Output:
array([[ 1.,  2.],
       [ 2.,  1.],
       [ 1.,  1.],
       [ 3.,  1.],
       [ 2., nan],
       [ 1., nan],
       [nan, nan]])
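To see why the chain works, it may help to run the same steps on a single column. The sketch below uses the first column of arr. The intermediate names (groups, steps, dist) are only illustrative and do not appear in the one-liner.

```python
import numpy as np
import pandas as pd

# First column of arr.
col = pd.Series([10, 20, np.nan, 15, np.nan, np.nan, 10])

s = col.notna()[::-1]                        # non-NaN mask, read bottom to top
groups = s.cumsum()                          # a new group starts at every non-NaN value
steps = s.groupby(groups).cumcount().add(1)  # 1-based position inside each group
steps = steps.where(s.cummax())              # NaN until the first non-NaN value is seen
dist = steps.shift()[::-1]                   # shift so a cell does not count itself, then restore order

print(dist.to_numpy())  # distances: 1, 2, 1, 3, 2, 1, NaN
```

The shift is what turns "rows since the last non-NaN value, including this one" into "rows to the next non-NaN value, excluding this one". The where(s.cummax()) mask matters for columns that end in NaN, such as the second column of arr.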