Replace duplicate value with the value from another column on each row where a duplicate is located

Say I have the following dataframe, in which IDs 1 and 3 are duplicated:

ID    Name    ALT_ID
1     Jack    111
1     James   222
2     Joe     333
3     Jim     444
3     Jen     555

How do I replace duplicate ID with ALT_ID for each occurrence? I would want the final dataframe to look like this:

ID    Name    ALT_ID
1     Jack    111
222   James   222
2     Joe     333
3     Jim     444
555   Jen     555

This will be a massive dataframe but long running time is not really an issue. Please let me know if there is any more information I can provide, thanks!

I’ve been using pandas so far, so any functions from that library would be a big bonus!

Solution:

Use pandas.Series.duplicated on the "ID" column to locate the duplicate values. With the default keep="first", it marks every occurrence of a value except the first. Then, for those same rows, assign the value from "ALT_ID":

>>> dupes = df["ID"].duplicated()  # True for every repeat after the first occurrence
>>> df.loc[dupes, "ID"] = df.loc[dupes, "ALT_ID"]
>>> df
    ID   Name  ALT_ID
0    1   Jack     111
1  222  James     222
2    2    Joe     333
3    3    Jim     444
4  555    Jen     555
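
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The helper name replace_duplicate_ids and its id_col/alt_col parameters are made up for illustration and are not part of pandas; the sample data is the dataframe from the question. It works on a copy so the original frame is left untouched.

```python
import pandas as pd


def replace_duplicate_ids(df, id_col="ID", alt_col="ALT_ID"):
    """Return a copy of df where each repeated ID (every occurrence
    after the first) is replaced by the alternate ID from the same row.

    Hypothetical helper for illustration; not a pandas function.
    """
    out = df.copy()
    # Boolean mask: False for the first occurrence of each ID, True for repeats.
    dupes = out[id_col].duplicated(keep="first")
    # Overwrite only the flagged rows with the value from the alternate column.
    out.loc[dupes, id_col] = out.loc[dupes, alt_col]
    return out


# Sample data taken from the question.
df = pd.DataFrame(
    {
        "ID": [1, 1, 2, 3, 3],
        "Name": ["Jack", "James", "Joe", "Jim", "Jen"],
        "ALT_ID": [111, 222, 333, 444, 555],
    }
)

result = replace_duplicate_ids(df)
print(result)
```

Two things to keep in mind: an ID that appears three or more times has every occurrence after the first replaced by that row's own ALT_ID, and nothing here checks whether an ALT_ID happens to collide with an existing ID, so it may be worth re-running duplicated() on the result if that can occur in your data.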