Python: match two DataFrames by name where one holds the full name and the other the first-name initial and last name

I’m trying to merge two DataFrames by name, where one column holds first name + last name and the other holds only the first-name initial + last name.

Example:

df1:

    name
John Doe

df2:

  name
J. Doe

Solution:

You can use a regex to change the format of df1['name'] and feed this to merge:

df1.merge(df2,
          left_on=df1['name'].str.replace(r'^(.)\w+', r'\1.', regex=True),
          right_on='name'
         )

NB. Check the documentation of merge for further parameters that control the output, in particular suffixes, which customizes the names of overlapping columns.

output:

     name    name_x  name_y
0  J. Doe  John Doe  J. Doe
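
As a sketch of the suffixes parameter mentioned in the note, the abbreviated name can also be stored in an explicit helper column so that the output column names are easier to predict. The column name key and the suffix strings are assumptions chosen for illustration, not part of the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['John Doe']})
df2 = pd.DataFrame({'name': ['J. Doe']})

# 'key' is a hypothetical helper column holding "initial. last name"
left = df1.assign(key=df1['name'].str.replace(r'^(.)\w+', r'\1.', regex=True))
right = df2.assign(key=df2['name'])

# suffixes renames the overlapping 'name' columns coming from each side
out = left.merge(right, on='key', suffixes=('_full', '_short'))
print(out)
```

Here the two original name columns should come out as name_full and name_short, with key as the shared merge column.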

example input:

import pandas as pd

df1 = pd.DataFrame({'name': ['John Doe']})
df2 = pd.DataFrame({'name': ['J. Doe']})
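
With more than one row, it can help to see which full names found no abbreviated counterpart. The following is a minimal sketch under stated assumptions: the extra rows, the city column, and the short_name helper column are hypothetical, and a left join with indicator=True is just one way to flag unmatched rows:

```python
import pandas as pd

# Hypothetical data: the extra rows and the 'city' column are illustrative only
df1 = pd.DataFrame({'name': ['John Doe', 'Alice Smith', 'Bob Stone']})
df2 = pd.DataFrame({'name': ['J. Doe', 'A. Smith'], 'city': ['Paris', 'Oslo']})

# Same regex as in the answer: keep the first letter of the first word, add a dot
short = df1['name'].str.replace(r'^(.)\w+', r'\1.', regex=True)

merged = (
    df1.assign(short_name=short)
       .merge(df2.rename(columns={'name': 'short_name'}),
              on='short_name', how='left', indicator=True)
)

# Rows marked 'left_only' had no match in df2
unmatched = merged.loc[merged['_merge'] == 'left_only', 'name'].tolist()
print(merged)
print(unmatched)
```

Note that abbreviated names are not necessarily unique: "John Doe" and "Jane Doe" both reduce to "J. Doe", so such rows would each match the same entry in df2.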