Home Replace duplicate value with the value from another column on each row where a duplicate is located

Questions

Replace duplicate value with the value from another column on each row where a duplicate is located

March 1, 2022

Say I have the following dataframe (Duplicate ID 1 and 3):

ID    Name    ALT_ID
1     Jack    111
1     James   222
2     Joe     333
3     Jim     444
3     Jen     555

How do I replace duplicate ID with ALT_ID for each occurrence? I would want the final dataframe to look like this:

ID    Name    ALT_ID
1     Jack    111
222   James   222
2     Joe     333
3     Jim     444
555   Jen     555

This will be a massive dataframe but long running time is not really an issue. Please let me know if there is any more information I can provide, thanks!

I’ve been using ‘pandas’ so far so any functions that would help me from that library would be a big bonus!

>Solution :

Just use pandas.DataFrame.duplicated, a method of your dataframe to locate which values are dupes in your "ID" column. Then use the same rows but take the value in "ALT_ID":

>>> df.loc[df["ID"].duplicated(), "ID"] = df.loc[df["ID"].duplicated(), "ALT_ID"]
>>> df
    ID   Name  ALT_ID
0    1   Jack     111
1  222  James     222
2    2    Joe     333
3    3    Jim     444
4  555    Jen     555