I'm trying to find a clean way to look up values and fill an additional column in a pandas DataFrame, for readability.
The following (truncated) data is available, where manager_id holds the user_id of each person's manager:
| last_name | first_name | user_id | manager_id |
|---|---|---|---|
| scorsese | martin | 1 | 2 |
| wenders | wim | 2 | 2 |
| kurosawa | akira | 3 | 3 |
| sabu | sabu | 4 | 3 |
The outcome should be:
| last_name | first_name | user_id | manager_id | manager_name |
|---|---|---|---|---|
| scorsese | martin | 1 | 2 | wim wenders |
| wenders | wim | 2 | 2 | wim wenders |
| kurosawa | akira | 3 | 3 | akira kurosawa |
| sabu | sabu | 4 | 3 | akira kurosawa |
So far I have been struggling to find a concise solution that uses only pandas methods. I have a working solution, but it is a dirty hack that iterates through a dictionary of the same dataframe and looks up the row index based on name etc. Very ugly.
dictionary_of_kantoku = df_kantoku.to_dict(orient="records")
for kantoku in dictionary_of_kantoku:
    row_index = df_kantoku.loc[
        (df_kantoku['last_name'].str.contains(kantoku['last_name'])
         & df_kantoku['first_name'].str.contains(kantoku['first_name']))].index[0]
    manager_id = df_kantoku[(df_kantoku['last_name'].str.contains(kantoku['last_name'])
                             & df_kantoku['first_name'].str.contains(kantoku['first_name']))]['manager_id'].values[0]
    manager_name = df_kantoku[df_kantoku['user_id'] == manager_id]['first_name'].values[0] + ' ' + df_kantoku[df_kantoku['user_id'] == manager_id]['last_name'].values[0]
    if row_index != 0:
        resultset.loc[row_index, 'manager_name'] = manager_name
Can someone please shed some light on how this can be done efficiently, without the dictionary hack and the iteration?
Thanks a lot.
>Solution :
You can build a Series that maps each user_id to a full name, then map manager_id through it:
import pandas as pd
data = {
'last_name': ['Scorsese', 'Wenders', 'Kurosawa', 'Sabu'],
'first_name': ['Martin', 'Wim', 'Akira', 'Sabu'],
'user_id': [1, 2, 3, 4],
'manager_id': [2, 2, 3, 3]
}
df = pd.DataFrame(data)
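# Series indexed by user_id whose values are "first_name last_name"
indexed = df.set_index('user_id')
name_map = indexed['first_name'] + ' ' + indexed['last_name']
# Look up each manager_id in that Series
df['manager_name'] = df['manager_id'].map(name_map)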
print(df)
which gives
  last_name first_name  user_id  manager_id    manager_name
0  Scorsese     Martin        1           2     Wim Wenders
1   Wenders        Wim        2           2     Wim Wenders
2  Kurosawa      Akira        3           3  Akira Kurosawa
3      Sabu       Sabu        4           3  Akira Kurosawa
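
One thing worth checking on real data is what happens when a manager_id has no matching user_id, or when user_id is not unique. The sketch below wraps the same `map` lookup in a helper; the function name `add_manager_name` and the id 99 are made up for illustration and are not from the question. It assumes user_id is meant to be unique and uses `verify_integrity=True` so that duplicates raise an error instead of going unnoticed.

```python
import pandas as pd


def add_manager_name(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a manager_name column looked up via user_id."""
    # Index by user_id; verify_integrity=True raises if user_id has duplicates.
    indexed = df.set_index("user_id", verify_integrity=True)
    name_map = indexed["first_name"] + " " + indexed["last_name"]

    out = df.copy()
    # map() leaves a missing value where manager_id has no matching user_id.
    out["manager_name"] = out["manager_id"].map(name_map)
    return out


df = pd.DataFrame({
    "last_name": ["scorsese", "wenders", "kurosawa", "sabu"],
    "first_name": ["martin", "wim", "akira", "sabu"],
    "user_id": [1, 2, 3, 4],
    "manager_id": [2, 2, 3, 99],  # 99 is a hypothetical id with no matching user
})

result = add_manager_name(df)
print(result)
```

The last row ends up with a missing manager_name, so you can decide explicitly how to handle it, for example with `fillna`, rather than getting a wrong name silently.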