I have data like this:
| Task | ID | Status |
|---|---|---|
| Task1 | 123 | Open |
| Task2 | 123 | Closed |
| Task3 | 211 | Closed |
| Task4 | 211 | Closed |
| Task5 | 564 | Closed |
| Task6 | 994 | Open |
I want to delete rows with the same IDs that have ‘Open’ status. In other words I want to delete all IDs that have ‘Open’ status.
The end result will be like this:
| Task | ID | Status |
|---|---|---|
| Task3 | 211 | Closed |
| Task4 | 211 | Closed |
| Task5 | 564 | Closed |
Data:
{'Task': ['Task1', 'Task2', 'Task3', 'Task4', 'Task5', 'Task6'],
'ID': [123, 123, 211, 211, 564, 994],
'Status': ['Open', 'Closed', 'Closed', 'Closed', 'Closed', 'Open']}
>Solution :
We could take the Open Statuses and groupby + cummax to create a boolean filter.
The idea is if a Status is Open, we flag its corresponding ID as True for all rows it appears, then we filter out all such rows:
out = df[~df['Status'].eq('Open').groupby(df['ID']).cummax()]
Output:
Task ID Status
2 Task3 211 Closed
3 Task4 211 Closed
4 Task5 564 Closed