Adding values to a dataframe based on lookup in the same dataframe

I am trying to find a way to look up values in a pandas dataframe and fill an additional column of the same dataframe with them, for readability.

The following (truncated) data is available, where manager_id holds the user_id of the corresponding manager:

    last_name  first_name  user_id  manager_id
    scorsese   martin      1        2
    wenders    wim         2        2
    kurosawa   akira       3        3
    sabu       sabu        4        3

The outcome should be:

    last_name  first_name  user_id  manager_id  manager_name
    scorsese   martin      1        2           wim wenders
    wenders    wim         2        2           wim wenders
    kurosawa   akira       3        3           akira kurosawa
    sabu       sabu        4        3           akira kurosawa

So far I have been struggling to find a concise, clean solution that uses only pandas methods. I have a working solution, but it is a dirty hack: it iterates through a list of dictionaries built from the same dataframe and looks up each row index by name. Very ugly.

    # Turn every row into a dict, then find that row again by name
    dictionary_of_kantoku = df_kantoku.to_dict(orient="records")
    for kantoku in dictionary_of_kantoku:
        # Boolean mask selecting the current row by last and first name
        mask = (df_kantoku['last_name'].str.contains(kantoku['last_name'])
                & df_kantoku['first_name'].str.contains(kantoku['first_name']))
        row_index = df_kantoku.loc[mask].index[0]
        manager_id = df_kantoku[mask]['manager_id'].values[0]
        # Find the manager's row and build "first_name last_name"
        manager = df_kantoku[df_kantoku['user_id'] == manager_id]
        manager_name = manager['first_name'].values[0] + ' ' + manager['last_name'].values[0]
        if row_index != 0:
            resultset.loc[row_index, 'manager_name'] = manager_name

Can someone please shed some light on how this can be done efficiently, without the dictionary hack and the iteration?

Thanks a lot.

>Solution :

You can build a Series that maps each user_id to the full name, then pass manager_id through it with map:

    import pandas as pd

    data = {
        'last_name': ['Scorsese', 'Wenders', 'Kurosawa', 'Sabu'],
        'first_name': ['Martin', 'Wim', 'Akira', 'Sabu'],
        'user_id': [1, 2, 3, 4],
        'manager_id': [2, 2, 3, 3]
    }

    df = pd.DataFrame(data)

    # Build a Series mapping each user_id to "first_name last_name"
    users = df.set_index('user_id')
    name_map = users['first_name'] + ' ' + users['last_name']

    # Look up each manager_id in that Series
    df['manager_name'] = df['manager_id'].map(name_map)

    print(df)

which gives

      last_name first_name  user_id  manager_id    manager_name
    0  Scorsese     Martin        1           2     Wim Wenders
    1   Wenders        Wim        2           2     Wim Wenders
    2  Kurosawa      Akira        3           3  Akira Kurosawa
    3      Sabu       Sabu        4           3  Akira Kurosawa
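
If you prefer a join over map, the same lookup can probably also be written as a self-merge. This is only a sketch and not part of the answer above. It assumes user_id is unique, and the manager_id 99 and the 'unknown' placeholder are made up here to show what happens when a manager_id has no matching user.

```python
import pandas as pd

df = pd.DataFrame({
    'last_name': ['Scorsese', 'Wenders', 'Kurosawa', 'Sabu'],
    'first_name': ['Martin', 'Wim', 'Akira', 'Sabu'],
    'user_id': [1, 2, 3, 4],
    'manager_id': [2, 2, 3, 99],  # 99 is a hypothetical id with no matching user
})

# Lookup table: one row per user, with user_id renamed to manager_id
# so it can serve as the join key.
managers = pd.DataFrame({
    'manager_id': df['user_id'],
    'manager_name': df['first_name'] + ' ' + df['last_name'],
})

# A left join keeps every row of df; validate raises if the lookup
# table contains the same manager_id more than once.
merged = df.merge(managers, on='manager_id', how='left', validate='many_to_one')

# Rows whose manager_id matches no user come back as NaN;
# 'unknown' is just an illustrative placeholder.
merged['manager_name'] = merged['manager_name'].fillna('unknown')

print(merged)
```

For a single looked-up column, map is shorter. A merge tends to pay off when you want to pull in several columns from the manager's row at once.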