Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas – drop records if on specific column value is equal to previous record

I have a dataframe like following (sorted by id, time and status):

id status timestamp
111 A 29.08.2023 12:39
111 A 29.08.2023 12:45
111 B 29.08.2023 12:47
111 C 29.08.2023 12:50
111 A 29.08.2023 12:50
112 A 29.08.2023 12:50
112 B 29.08.2023 13:09
112 C 29.08.2023 13:40
112 B 29.08.2023 13:50
112 A 29.08.2023 13:55

I need to remove remove repeats in status sequences but only for cases when status repeats in next to each other records. Meaning that id 111 can have multiple statuses == ‘A’, but I need to remove second ‘A’ if previous status in time also was ‘A’.
So new table should be like this (remove second line with same status):

id status timestamp
111 A 29.08.2023 12:39
111 B 29.08.2023 12:47
111 C 29.08.2023 12:50
111 A 29.08.2023 12:50
112 A 29.08.2023 12:50
112 B 29.08.2023 13:09
112 C 29.08.2023 13:40
112 B 29.08.2023 13:50
112 A 29.08.2023 13:55

So in the end we have a unique sequences of statuses for each id without repeats.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Appreciate any help because I stuck with some very slow and blunt approach comparing each row with previous, but dataset is very huge, > 5M records.

>Solution :

Get difference of multiple columns shifted values and get rows by boolean indexing:

out = df[df[['id','status']].ne(df[['id','status']].shift()).any(axis=1)]
print (out)
    id status         timestamp
0  111      A  29.08.2023 12:39
2  111      B  29.08.2023 12:47
3  111      C  29.08.2023 12:50
4  111      A  29.08.2023 12:50
5  112      A  29.08.2023 12:50
6  112      B  29.08.2023 13:09
7  112      C  29.08.2023 13:40
8  112      B  29.08.2023 13:50
9  112      A  29.08.2023 13:55
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading