Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Add new column based on values of another column from a dictionary in pandas

I have the following pandas dataframe:

import pandas as pd
foo_dt = pd.DataFrame({'var_1': ['filter coffee', 'american cheesecake', 'espresso coffee', 'latte tea'],
                   'var_2': ['coffee', 'coffee black', 'tea', 'strawberry cheesecake']})

and the following dictionary:

foo_colors = {'coffee': 'brown', 'cheesecake': 'white', 'tea': 'green'}

I want to add two columns in foo_dt (color_var_1 and color_var_2), the values of which will be the respective value of the foo_colors dictionary which corresponds to the key depending if the key is in the value of the column var_1 or var_2 respectively.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

EDIT

In other words, for every key in foo_colors , check where "it is contained" in both columns var_1 & var_2, and then give as value of the respective column (color_var_1 & color_var_2) the respective value of the dictionary

My resulting dataframe looks like this:

           var_1                     var_2                  color_var_1   color_var_2
0          filter coffee             coffee                 brown         brown
1          american cheesecake       coffee black           white         brown
2          espresso coffee           tea                    brown         green
3          latte tea                 strawberry cheesecake  green         white

Any idea how can I do this ?

>Solution :

Use Series.str.extract for get first matched substring created by join by | for regex or of keys in dict with Series.map by dict:

pat = '|'.join(r"\b{}\b".format(x) for x in foo_colors)

for c in ['var_1','var_2']:
    foo_dt[f'color_{c}'] = foo_dt[c].str.extract(f'({pat})', expand=False).map(foo_colors)
print(foo_dt)
                 var_1                  var_2 color_var_1 color_var_2
0        filter coffee                 coffee       brown       brown
1  american cheesecake           coffee black       white       brown
2      espresso coffee                    tea       brown       green
3            latte tea  strawberry cheesecake       green       white
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading