Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to take the cumulative maximum of a column based on another column

I have a DataFrame like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "realization_id": np.repeat([0, 1], 6),
    "sample_size": np.tile([0, 1, 2], 4),
    "num_obs": np.tile(np.repeat([25, 100], 3), 2),
    "accuracy": [0.8, 0.7, 0.8, 0.6, 0.7, 0.5, 0.6, 0.7, 0.8, 0.7, 0.9, 0.7],
    "prob": [0.94, 0.96, 0.95, 0.98, 0.93, 0.92, 0.90, 0.92, 0.95, 0.9, 0.91, 0.92]
})

df["accum_max_prob"] = df.groupby(["realization_id", "num_obs"])["prob"].cummax()

And I want to know how to create a column with this output:

df["desired_accuracy"] = [0.8, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6, 0.7, 0.8, 0.7, 0.9, 0.7]

Each entry of desired_accuracy equals the accuracy value that corresponds to the row where the highest prob has been achieved so far by group (that is why I create accum_max_prob).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

So, for example: the first value is 0.8 because there is no data prior to that, but then the next one is 0.7 because the prob of the second row is greater than the first. The third value stays the same, because the third prob is lower than the second one, so it does not update desired_accuracy. For each pair (realization_id, num_obs) the criteria resets.

How can I do it in a vectorized fashion using Pandas?

>Solution :

It looks like:

df['desired_accuracy'] = df['accuracy'].mask(df['prob'].lt(df['accum_max_prob'])).ffill()

Output:

    realization_id  sample_size  num_obs  accuracy  prob  accum_max_prob  desired_accuracy
0                0            0       25       0.8  0.94            0.94               0.8
1                0            1       25       0.7  0.96            0.96               0.7
2                0            2       25       0.8  0.95            0.96               0.7
3                0            0      100       0.6  0.98            0.98               0.6
4                0            1      100       0.7  0.93            0.98               0.6
5                0            2      100       0.5  0.92            0.98               0.6
6                1            0       25       0.6  0.90            0.90               0.6
7                1            1       25       0.7  0.92            0.92               0.7
8                1            2       25       0.8  0.95            0.95               0.8
9                1            0      100       0.7  0.90            0.90               0.7
10               1            1      100       0.9  0.91            0.91               0.9
11               1            2      100       0.7  0.92            0.92               0.7
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading