Get maximum previous nonmissing value within group in pandas dataframe

I have a pandas dataframe with a group structure where the value of interest, val, is guaranteed to be sorted within the group. However, there are missing values in val which I need to bound. The data I have looks like this:

group_id    id_within_group     val
1           1                   3.2  
1           2                   4.8
1           3                   5.2
1           4                   NaN
1           5                   7.5
2           1                   1.8
2           2                   2.8
2           3                   NaN
2           4                   5.4
2           5                   6.2

I now want to create a lower bound, max_prev, which is the maximum value within the group among the rows before the current row, and an upper bound, min_next, which is the minimum value within the group among the rows after the current row. Looking just one row back and ahead is not enough, because there can be runs of consecutive NaN values. I don’t need to handle the edge cases of the first and last row within a group. The desired output is therefore:

group_id    id_within_group     val     max_prev    min_next
1           1                   3.2     NaN         4.8
1           2                   4.8     3.2         5.2
1           3                   5.2     4.8         7.5
1           4                   NaN     5.2         7.5
1           5                   7.5     5.2         NaN
2           1                   1.8     NaN         2.8
2           2                   2.8     1.8         5.4
2           3                   NaN     2.8         5.4
2           4                   5.4     2.8         6.2
2           5                   6.2     5.4         NaN

How can I achieve this in a reasonably fast way?

Solution:

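Because val is sorted within each group, the maximum of the previous values is simply the last non-missing value before the current row, and the minimum of the following values is the first non-missing value after it. You could therefore use a custom groupby.transform with ffill/bfill followed by shift: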

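g = df.groupby('group_id')['val']

# last non-missing value before the current row: forward-fill, then shift down by one
df['max_prev'] = g.transform(lambda x: x.ffill().shift())
# first non-missing value after the current row: same idea on the reversed group
# (this relies on transform aligning the returned Series on the group's index)
df['min_next'] = g.transform(lambda x: x[::-1].ffill().shift())

# or, equivalently, for min_next: backward-fill, then shift up by one
df['min_next'] = g.transform(lambda x: x.bfill().shift(-1))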
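To try the approach above end to end, here is a minimal, self-contained sketch. It rebuilds the sample data from the question by hand (the construction of `df` is an assumption about how the data is loaded) and uses the `bfill` variant for `min_next`.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; val is sorted within each group.
df = pd.DataFrame({
    "group_id": [1] * 5 + [2] * 5,
    "id_within_group": [1, 2, 3, 4, 5] * 2,
    "val": [3.2, 4.8, 5.2, np.nan, 7.5, 1.8, 2.8, np.nan, 5.4, 6.2],
})

g = df.groupby("group_id")["val"]

# Last non-missing value strictly before the current row, per group.
df["max_prev"] = g.transform(lambda x: x.ffill().shift())
# First non-missing value strictly after the current row, per group.
df["min_next"] = g.transform(lambda x: x.bfill().shift(-1))

print(df)
```

The first row of each group has no previous value and the last row has no next value, so those cells stay NaN, which matches the edge cases the question says can be ignored.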

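If your values are not sorted within each group, add a cummax/cummin so that the running maximum of all earlier values (or the running minimum of all later values) is used instead of the nearest non-missing one: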

g = df.groupby('group_id')['val']

df['max_prev'] = g.transform(lambda x: x.ffill().cummax().shift())
df['min_next'] = g.transform(lambda x: x[::-1].ffill().cummin().shift())
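To see what cummax/cummin changes, the sketch below applies both variants to a small unsorted group. The data is hypothetical (it is not from the question), and the trailing `[::-1]` in `min_next` is an addition here that restores the original row order explicitly, so the sketch does not depend on how transform aligns the returned Series.

```python
import numpy as np
import pandas as pd

# Hypothetical unsorted data, invented for illustration: a single group.
df = pd.DataFrame({
    "group_id": [1, 1, 1, 1, 1],
    "val": [4.0, 2.0, np.nan, 9.0, 5.0],
})
g = df.groupby("group_id")["val"]

# Without cummax: only the nearest previous non-missing value.
df["nearest_prev"] = g.transform(lambda x: x.ffill().shift())
# nearest_prev: NaN, 4, 2, 2, 9

# With cummax: running maximum over all previous non-missing values.
df["max_prev"] = g.transform(lambda x: x.ffill().cummax().shift())
# max_prev: NaN, 4, 4, 4, 9

# Reverse, take the running minimum, shift, then reverse back to row order.
df["min_next"] = g.transform(
    lambda x: x[::-1].ffill().cummin().shift()[::-1]
)
# min_next: 2, 5, 5, 5, NaN

print(df)
```

The third row shows the difference: the nearest previous value is 2, but the maximum of all previous values is 4.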

Output:

   group_id  id_within_group  val  max_prev  min_next
0         1                1  3.2       NaN       4.8
1         1                2  4.8       3.2       5.2
2         1                3  5.2       4.8       7.5
3         1                4  NaN       5.2       7.5
4         1                5  7.5       5.2       NaN
5         2                1  1.8       NaN       2.8
6         2                2  2.8       1.8       5.4
7         2                3  NaN       2.8       5.4
8         2                4  5.4       2.8       6.2
9         2                5  6.2       5.4       NaN
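
As a possible variation for the sorted case, the same bounds can be computed with the built-in groupby `ffill`, `bfill` and `shift` methods instead of a lambda. This is a sketch, not part of the original answer. It may be faster with many groups because no Python function is called per group, but that has not been benchmarked here.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; val is sorted within each group.
df = pd.DataFrame({
    "group_id": [1] * 5 + [2] * 5,
    "id_within_group": [1, 2, 3, 4, 5] * 2,
    "val": [3.2, 4.8, 5.2, np.nan, 7.5, 1.8, 2.8, np.nan, 5.4, 6.2],
})

key = df["group_id"]

# Fill within each group first, then shift within each group,
# so that values never leak from one group into the next.
filled_fwd = df.groupby("group_id")["val"].ffill()
filled_bwd = df.groupby("group_id")["val"].bfill()

df["max_prev"] = filled_fwd.groupby(key).shift()
df["min_next"] = filled_bwd.groupby(key).shift(-1)

print(df)
```

Both the fill and the shift are grouped on purpose: a plain `shift()` on the filled column would carry the last value of group 1 into the first row of group 2.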