Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filter grouped data based on stage ordering

My data consist of rows recording what stage an ID is in on a certain date. Ideally, each ID should proceed to stage 4 directionally, though it need not go through every stage. For example ID "A" below goes throgh stages 1->2->4. Sometimes (for a variety of possible reasons) an ID is returned to a previous stage. The data are sorted by date within each ID:

df = pd.DataFrame( {"ID": ["A","A","A","B","B","B","B","B","B","C","C","C","C","C","C"], "Stage":[4,2,1,4,3,4,3,2,1,4,3,2,1,2,1],\
        "Date":['2022-09-18','2022-09-17','2022-09-16','2022-09-20','2022-09-19','2022-09-18','2022-09-17','2022-09-16','2022-09-15',\
            '2022-09-20','2022-09-19','2022-09-18','2022-09-17','2022-09-16','2022-09-15']} )
    print(df)
       ID  Stage        Date
    0   A      4  2022-09-18
    1   A      2  2022-09-17
    2   A      1  2022-09-16
    3   B      4  2022-09-20
    4   B      3  2022-09-19
    5   B      4  2022-09-18
    6   B      3  2022-09-17
    7   B      2  2022-09-16
    8   B      1  2022-09-15
    9   C      4  2022-09-20
    10  C      3  2022-09-19
    11  C      2  2022-09-18
    12  C      1  2022-09-17
    13  C      2  2022-09-16
    14  C      1  2022-09-15

I would like to drop all rows of data for each ID that occur before the most recent time that it is sent back to a previous stage. I would get something like this:

print(df_filtered)
  ID  Stage        Date
0  A      4  2022-09-18
1  A      2  2022-09-17
2  A      1  2022-09-16
3  B      4  2022-09-20
4  B      3  2022-09-19
5  C      4  2022-09-20
6  C      3  2022-09-19
7  C      2  2022-09-18
8  C      1  2022-09-17

Notice that A has not changed as it moved directionally through the process, B has dropped all data before it was sent back to Stage 3 on 2022-09-19, and C has dropped the two rows before it was returned to Stage 1 on 2022-09-17. This is a greatly simplified case , but in the true data an ID might be returned to a previous stage several times, including within the same stage. For example, an ID might be sent from stage 2 back to stage 2.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Is there a clean way to get from df to df_filtered without using for loops?

>Solution :

IIUC, sort the dates, then per group, check if there is an increase in Stage and drop the values afterwards:

m = (df
  # ensure the dates are in decreasing order
  # optional if the dates are already in descending order
  .sort_values(by=['ID', 'Date'], ascending=[True, False])
  # for each group, if Stage increases, flag this and the successive rows
  .groupby('ID')['Stage'].apply(lambda x: x.diff().gt(0).cummax())
)

# select the non-flagged rows
out = df[~m]

output:

   ID  Stage        Date
0   A      4  2022-09-18
1   A      2  2022-09-17
2   A      1  2022-09-16
3   B      4  2022-09-20
4   B      3  2022-09-19
9   C      4  2022-09-20
10  C      3  2022-09-19
11  C      2  2022-09-18
12  C      1  2022-09-17

intermediates:

   ID  Stage        Date  diff  gt(0)  cummax
0   A      4  2022-09-18   NaN  False   False
1   A      2  2022-09-17  -2.0  False   False
2   A      1  2022-09-16  -1.0  False   False
3   B      4  2022-09-20   NaN  False   False
4   B      3  2022-09-19  -1.0  False   False
5   B      4  2022-09-18   1.0   True    True
6   B      3  2022-09-17  -1.0  False    True
7   B      2  2022-09-16  -1.0  False    True
8   B      1  2022-09-15  -1.0  False    True
9   C      4  2022-09-20   NaN  False   False
10  C      3  2022-09-19  -1.0  False   False
11  C      2  2022-09-18  -1.0  False   False
12  C      1  2022-09-17  -1.0  False   False
13  C      2  2022-09-16   1.0   True    True
14  C      1  2022-09-15  -1.0  False    True
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading