Adding values to a dataframe based on lookup in the same dataframe

April 19, 2024

I am trying to work out how to look up values within a pandas dataframe and use them to fill an additional column, for readability.

The following (truncated) data is available, where manager_id holds the user_id of each user's manager:

last_name	first_name	user_id	manager_id
scorsese	martin	1	2
wenders	wim	2	2
kurosawa	akira	3	3
sabu	sabu	4	3

The outcome should be:

last_name	first_name	user_id	manager_id	manager_name
scorsese	martin	1	2	wim wenders
wenders	wim	2	2	wim wenders
kurosawa	akira	3	3	akira kurosawa
sabu	sabu	4	3	akira kurosawa

So far I have been struggling to find a concise solution that uses only pandas methods. I have a working solution, but it is a dirty hack: it iterates over the rows of the same dataframe as dictionaries and looks up each row index by name. Very ugly.

    dictionary_of_kantoku = df_kantoku.to_dict(orient="records")
    for kantoku in dictionary_of_kantoku:
        row_index = df_kantoku.loc[
            (df_kantoku['last_name'].str.contains(kantoku['last_name'])
             & df_kantoku['first_name'].str.contains(kantoku['first_name']))].index[0]
        manager_id = df_kantoku[(df_kantoku['last_name'].str.contains(kantoku['last_name'])
             & df_kantoku['first_name'].str.contains(kantoku['first_name']))]['manager_id'].values[0]
        manager_name = df_kantoku[df_kantoku['user_id'] == manager_id]['first_name'].values[0] + ' ' + df_kantoku[df_kantoku['user_id'] == manager_id]['last_name'].values[0]
        if row_index != 0:
            resultset.loc[row_index, 'manager_name'] = manager_name

Can someone please shed some light on how this can be done efficiently, without the dictionary hack and the iteration?

Thanks a lot.

Solution:

You can build a Series of full names indexed by user_id and then map manager_id through it:

    import pandas as pd

    data = {
        'last_name': ['Scorsese', 'Wenders', 'Kurosawa', 'Sabu'],
        'first_name': ['Martin', 'Wim', 'Akira', 'Sabu'],
        'user_id': [1, 2, 3, 4],
        'manager_id': [2, 2, 3, 3]
    }

    df = pd.DataFrame(data)

    # Index by user_id once, then build a Series mapping user_id -> "first last".
    users = df.set_index('user_id')
    name_map = users['first_name'] + ' ' + users['last_name']

    # Look up each manager_id in that Series; ids with no match become NaN.
    df['manager_name'] = df['manager_id'].map(name_map)

    print(df)

This gives:

      last_name first_name  user_id  manager_id    manager_name
    0  Scorsese     Martin        1           2     Wim Wenders
    1   Wenders        Wim        2           2     Wim Wenders
    2  Kurosawa      Akira        3           3  Akira Kurosawa
    3      Sabu       Sabu        4           3  Akira Kurosawa
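
If you prefer a join over a map, the same lookup can probably also be written as a self-merge. The following is only a sketch using the sample data from the question; the names `managers` and `result` are made up for illustration, and `validate="many_to_one"` is an optional safety check, not something the original answer uses.

```python
import pandas as pd

df = pd.DataFrame({
    "last_name": ["scorsese", "wenders", "kurosawa", "sabu"],
    "first_name": ["martin", "wim", "akira", "sabu"],
    "user_id": [1, 2, 3, 4],
    "manager_id": [2, 2, 3, 3],
})

# Hypothetical lookup table: one row per user, keyed under the name
# "manager_id" so it lines up with the column we want to resolve.
managers = pd.DataFrame({
    "manager_id": df["user_id"],
    "manager_name": df["first_name"] + " " + df["last_name"],
})

# A left join keeps every row of df; a manager_id with no matching user_id
# would end up with NaN in manager_name. validate= makes pandas raise if
# user_id turns out not to be unique.
result = df.merge(managers, on="manager_id", how="left", validate="many_to_one")

print(result)
```

For a single looked-up column, `map` is usually the shorter option; a merge tends to pay off when you want to pull in several manager columns at once.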