Python Match 2 data frames by name where one is full name and other first name initial and last name

I’m trying to merge 2 dataframes by name where one column is name + last name and the other only first name initial + last name.

Example:

df1:

    name
John Doe

df2:

  name
J. Doe

>Solution :

You can use a regex to change the format of df1['name'] and feed this to merge:

df1.merge(df2,
          left_on=df1['name'].str.replace(r'^(.)\w+', r'\1.', regex=True),
          right_on='name'
         )

NB. check the documentation of merge for more parameters to format the output, in particular suffixes to customize the column names.

output:

     name    name_x  name_y
0  J. Doe  John Doe  J. Doe

example input:

df1 = pd.DataFrame({'name': ['John Doe']})
df2 = pd.DataFrame({'name': ['J. Doe']})

Leave a Reply