Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas – filtering on a transition within a chronologically ordered dataframe

I have a dataframe with the format shown below, where each row represents a snapshot in time of a particular specimen.

As time moves forward, a specimen can move from type 1 to type 2, but not from type 2 to type 1. There are other types such as 3, 4, and 5, but I figured if I know how to deal with 1 and 2, I can make it work for the others as well.

The data contains errors where there are transitions from 2 to 1, and my goal is to find them and produce a set of ID.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

For example, the output should be 456, since it went from Type 2 to Type 1 as time passes, which is an error.

ID (not unique) Snapshot Month (YYYYMMDD) Type (1, 2, 3, 4, 5)
123 20210131 1
123 20210521 2
456 20210131 2
456 20210521 1

What I have tried is to sort by ID, then by Snapshot Month, and thought about slicing the dataframe by Type and with a loop, find the maximum date for each ID where Type is 1 and minimum date for each ID where Type is 2, and merge the two on ID and check to see if the maximum is greater than the minimum.

But not only this is unpythonic, but also inefficient (loops). I wonder if there are better ways?

>Solution :

Grouping your data by ID and snaphot then doing a shift might help you :

# sort the dataframe by ID and Snapshot
df.sort_values(['ID', 'Snapshot'], inplace=True)

# then group by ID
grouped = df.groupby('ID')

# Create a previous type column
df['Previous Type'] = grouped['Type'].shift(1)

# get the lines that contains an error by comparing the value and the previous one
errors = df[(df['Type'] == 1) & (df['Previous Type'] == 2)]

# keep only the unique ID
error_ids = errors['ID'].unique()
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading