How to take the cumulative maximum of a column based on another column

April 28, 2023

I have a DataFrame like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "realization_id": np.repeat([0, 1], 6),
    "sample_size": np.tile([0, 1, 2], 4),
    "num_obs": np.tile(np.repeat([25, 100], 3), 2),
    "accuracy": [0.8, 0.7, 0.8, 0.6, 0.7, 0.5, 0.6, 0.7, 0.8, 0.7, 0.9, 0.7],
    "prob": [0.94, 0.96, 0.95, 0.98, 0.93, 0.92, 0.90, 0.92, 0.95, 0.9, 0.91, 0.92]
})

df["accum_max_prob"] = df.groupby(["realization_id", "num_obs"])["prob"].cummax()

And I want to know how to create a column with this output:

df["desired_accuracy"] = [0.8, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6, 0.7, 0.8, 0.7, 0.9, 0.7]

Each entry of desired_accuracy equals the accuracy value that corresponds to the row where the highest prob has been achieved so far by group (that is why I create accum_max_prob).

So, for example: the first value is 0.8 because there is no data prior to that, but then the next one is 0.7 because the prob of the second row is greater than the first. The third value stays the same, because the third prob is lower than the second one, so it does not update desired_accuracy. For each pair (realization_id, num_obs) the criteria resets.

How can I do it in a vectorized fashion using Pandas?

>Solution :

It looks like:

df['desired_accuracy'] = df['accuracy'].mask(df['prob'].lt(df['accum_max_prob'])).ffill()

Output:

    realization_id  sample_size  num_obs  accuracy  prob  accum_max_prob  desired_accuracy
0                0            0       25       0.8  0.94            0.94               0.8
1                0            1       25       0.7  0.96            0.96               0.7
2                0            2       25       0.8  0.95            0.96               0.7
3                0            0      100       0.6  0.98            0.98               0.6
4                0            1      100       0.7  0.93            0.98               0.6
5                0            2      100       0.5  0.92            0.98               0.6
6                1            0       25       0.6  0.90            0.90               0.6
7                1            1       25       0.7  0.92            0.92               0.7
8                1            2       25       0.8  0.95            0.95               0.8
9                1            0      100       0.7  0.90            0.90               0.7
10               1            1      100       0.9  0.91            0.91               0.9
11               1            2      100       0.7  0.92            0.92               0.7