I have a dataset and I need to remove IDs entirely that are flagged before a certain date and beyond. I am having trouble getting a start on this one.
df =
ID Date Flagged
101 6/4/2023 0
101 7/23/2023 0
102 4/28/2023 1
102 5/2/2023 1
102 6/30/2023 1
102 7/11/2023 1
103 6/23/2023 1
103 7/12/2023 1
104 4/17/2023 0
104 5/12/2023 1
104 6/17/2023 1
104 7/22/2023 1
I would like to remove the IDs all together that are Flagged before May 1 2023.
I tried
today = datetime.datetime.today()
x_days = today - dt(days=90)`
filtered_df = df{(df['Flagged'] == 1) & (df['Date' >= x_days)]
When I run this I still have IDs that I would like to have completely removed. Below is a desired output:
df =
ID Date Flagged
103 6/23/2023 1
103 7/12/2023 1
104 5/12/2023 1
104 6/17/2023 1
104 7/22/2023 1
Any help with this would be great thank you!
>Solution :
Use boolean indexing:
#convert to datetime if needed
df["Date"] = pd.to_datetime(df["Date"],format="%m/%d/%Y")
#get flagged IDs to ignore
flagged = df[df["Flagged"].eq(1)&df["Date"].lt(pd.Timestamp.today()-pd.DateOffset(90))]
>>> df[~df["ID"].isin(flagged["ID"])&df["Flagged"].eq(1)]
ID Date Flagged
6 103 2023-06-23 1
7 103 2023-07-12 1
9 104 2023-05-12 1
10 104 2023-06-17 1
11 104 2023-07-22 1