I have the following pandas dataframe:
import pandas as pd
foo_dt = pd.DataFrame({'var_1': ['filter coffee', 'american cheesecake', 'espresso coffee', 'latte tea'],
'var_2': ['coffee', 'coffee black', 'tea', 'strawberry cheesecake']})
and the following dictionary:
foo_colors = {'coffee': 'brown', 'cheesecake': 'white', 'tea': 'green'}
I want to add two columns in foo_dt (color_var_1 and color_var_2), the values of which will be the respective value of the foo_colors dictionary which corresponds to the key depending if the key is in the value of the column var_1 or var_2 respectively.
EDIT
In other words, for every key in foo_colors , check where "it is contained" in both columns var_1 & var_2, and then give as value of the respective column (color_var_1 & color_var_2) the respective value of the dictionary
My resulting dataframe looks like this:
var_1 var_2 color_var_1 color_var_2
0 filter coffee coffee brown brown
1 american cheesecake coffee black white brown
2 espresso coffee tea brown green
3 latte tea strawberry cheesecake green white
Any idea how can I do this ?
>Solution :
Use Series.str.extract for get first matched substring created by join by | for regex or of keys in dict with Series.map by dict:
pat = '|'.join(r"\b{}\b".format(x) for x in foo_colors)
for c in ['var_1','var_2']:
foo_dt[f'color_{c}'] = foo_dt[c].str.extract(f'({pat})', expand=False).map(foo_colors)
print(foo_dt)
var_1 var_2 color_var_1 color_var_2
0 filter coffee coffee brown brown
1 american cheesecake coffee black white brown
2 espresso coffee tea brown green
3 latte tea strawberry cheesecake green white