I have a dataframe which looks like this:
ID Date EoM_Val Val
---------------------------------------
AAA 2021-06-30 3946750 3
AAA 2021-06-30 3946750 2
BBB 2021-06-30 2792182 6
BBB 2021-06-30 2792182 7
And basically, I want to only keep the the last EoM_Val based on the ID and Date columns, such that the final output looks like this:
ID Date EoM_Val Val
---------------------------------------
AAA 2021-06-30 3
AAA 2021-06-30 3946750 2
BBB 2021-06-30 6
BBB 2021-06-30 2792182 7
Is there a good way of doing this? Any help would be greatly appreciated, thanks!
>Solution :
I try this code, I hope It may use
df=pd.DataFrame({"ID":["AAA","AAA","BBB","BBB"],"Date":["2021-06-30","2021-06-30","2021-06-30","2021-06-30"],"EoM_Val":[3946750,3946750,2792182,2792182],"Val":[3,2,6,7]})
df2=df.drop_duplicates(["EoM_Val"],keep="first")
indices = list(df2.index)
df.loc[indices,'EoM_Val'] = ""
print(df)
I got output like this
ID Date EoM_Val Val
---------------------------------------
AAA 2021-06-30 3
AAA 2021-06-30 3946750 2
BBB 2021-06-30 6
BBB 2021-06-30 2792182 7