
How to check similarity between name pairs regardless of order and give them a unique identifier in a Python pandas DataFrame?

I have a dataframe as follows:

import pandas as pd

df_names = pd.DataFrame({'last_name':['Williams','Henry','XYX','Smith','David','Freeman','Walter','Test_A'],
                        'first_name':['Henry','Williams','ABC','David','Smith','Walter','Freeman','Test_B']})

I add a new column, full name, by joining the last and first names (the original post showed the result as an image).
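
Since the image is not available, here is a minimal sketch of how such a full name column might be built. The column name `full_name` and the single-space separator are assumptions, not taken from the original post.

```python
import pandas as pd

df_names = pd.DataFrame({
    "last_name": ["Williams", "Henry", "XYX", "Smith", "David", "Freeman", "Walter", "Test_A"],
    "first_name": ["Henry", "Williams", "ABC", "David", "Smith", "Walter", "Freeman", "Test_B"],
})

# Assumption: the column is named "full_name" and the parts are joined with one space.
df_names["full_name"] = df_names["last_name"] + " " + df_names["first_name"]
print(df_names)
```

Note that "Williams Henry" and "Henry Williams" are different strings, so comparing this column directly will not treat the swapped pairs as equal.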


Here I would like to check how similar the full names are. Williams Henry and Henry Williams should be considered the same and given a unique identifier, such as some random code.

Similarly, Smith David and David Smith should also share one unique identifier.

The final expected output assigns one identifier to each unordered name pair (the original post showed it as an image).

Solution:

Use:

res = (df_names.assign(group=df_names[["last_name", "first_name"]].apply(frozenset, axis=1))
               .groupby("group")
               .ngroup() + 1)

df_names["unique_identifier"] = "A-" + res.astype("string")
print(df_names)

Output

  last_name first_name unique_identifier
0  Williams      Henry               A-1
1     Henry   Williams               A-1
2       XYX        ABC               A-2
3     Smith      David               A-3
4     David      Smith               A-3
5   Freeman     Walter               A-4
6    Walter    Freeman               A-4
7    Test_A     Test_B               A-5

The idea is to use frozenset to map each row to an object in which the order of the elements is irrelevant. It has to be a frozenset rather than a plain set because a frozenset is hashable, and groupby needs hashable keys.
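
To make the frozenset idea concrete, here is a small sketch that checks order-independence and hashability directly. It then numbers the pairs with a plain dictionary, which relies on the same hashability. The dictionary-based numbering and the names `seen` and `codes` are illustrative assumptions, not part of the answer above.

```python
import pandas as pd

df_names = pd.DataFrame({
    "last_name": ["Williams", "Henry", "XYX", "Smith", "David", "Freeman", "Walter", "Test_A"],
    "first_name": ["Henry", "Williams", "ABC", "David", "Smith", "Walter", "Freeman", "Test_B"],
})

# Element order does not matter for a frozenset, and equal frozensets hash equally.
a = frozenset(["Williams", "Henry"])
b = frozenset(["Henry", "Williams"])
print(a == b, hash(a) == hash(b))

# A plain set is mutable, so Python refuses to hash it.
try:
    hash({"Williams", "Henry"})
    set_is_hashable = True
except TypeError:
    set_is_hashable = False
print("plain set hashable:", set_is_hashable)

# Hypothetical alternative to groupby: use the frozenset as a dict key and
# number each new pair in order of first appearance.
seen = {}
codes = [
    seen.setdefault(frozenset(pair), len(seen) + 1)
    for pair in zip(df_names["last_name"], df_names["first_name"])
]
df_names["unique_identifier"] = ["A-" + str(code) for code in codes]
print(df_names)
```

One caveat of any set-based key: a row whose first and last names are identical collapses to a one-element frozenset, which still works as a key but no longer records that there were two names.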
