Say I have the following dataframe, in which IDs 1 and 3 are duplicated:
ID   Name   ALT_ID
1    Jack   111
1    James  222
2    Joe    333
3    Jim    444
3    Jen    555
How do I replace each duplicate ID (every occurrence after the first) with the ALT_ID from the same row? I would want the final dataframe to look like this:
ID   Name   ALT_ID
1    Jack   111
222  James  222
2    Joe    333
3    Jim    444
555  Jen    555
This will be a massive dataframe, but a long running time is not really an issue. Please let me know if there is any more information I can provide, thanks!
I've been using pandas so far, so any functions from that library would be a big bonus!
>Solution :
Use pandas.Series.duplicated on your "ID" column to find the duplicates. By default it flags every occurrence after the first, so the first row for each ID keeps its value. Then, for those same rows, assign the value from "ALT_ID":
>>> dupes = df["ID"].duplicated()  # True for each repeat after the first occurrence
>>> df.loc[dupes, "ID"] = df.loc[dupes, "ALT_ID"]
>>> df
    ID   Name  ALT_ID
0    1   Jack     111
1  222  James     222
2    2    Joe     333
3    3    Jim     444
4  555    Jen     555
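For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It rebuilds the sample dataframe from the question; the variable name is_repeat is just illustrative, and the sketch assumes that no ALT_ID value collides with an ID already in use (the question does not say whether that can happen).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3],
    "Name": ["Jack", "James", "Joe", "Jim", "Jen"],
    "ALT_ID": [111, 222, 333, 444, 555],
})

# Boolean mask: True for every repeat of an ID after its first occurrence
# (keep="first" is the default for Series.duplicated).
is_repeat = df["ID"].duplicated()

# Overwrite only the flagged rows with the ALT_ID from the same row.
# Assumption: ALT_ID values do not clash with existing IDs.
df.loc[is_repeat, "ID"] = df.loc[is_repeat, "ALT_ID"]

print(df["ID"].tolist())  # [1, 222, 2, 3, 555]
```

Computing the mask once and reusing it avoids scanning the column twice, which may matter a little on a very large dataframe.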