Detecting Keys in a Column of Strings

October 25, 2022

I have a dictionary of key-value pairs. I also have a data frame with a column of strings, each of which may contain one or more of the keys. If a key appears in a string in that column, I’d like to put the corresponding value in the adjacent column, as a list when several keys match.

my_dict = {'elon' : 'is awesome', 'jeff' : 'is not so awesome, but hes ok, ig', 'mustard' : 'is gross', 'pigs' : 'can fly'}
my_dict

import pandas as pd
import numpy as np
pd.DataFrame({'Name (Key)' : ['elon musk', 'jeff bezos and elon musk', 'jeff bezos', 'she bought mustard for elon'], 'Corresponding Value(s)' : [np.nan, np.nan, np.nan, np.nan]})

Desired output:

pd.DataFrame({'Name (Key)' : ['elon musk', 'jeff bezos and elon musk', 'jeff bezos', 'she bought mustard for elon'], 
              'Corresponding Value(s)' : [['is awesome'], ['is not so awesome, but hes ok, ig', 'is awesome'], ['is not so awesome, but hes ok, ig'], ['is gross', 'is awesome']]})

I am new to Python, but I assume this will involve the apply function, or perhaps map()? Would an if statement work here, or is there a better way to approach this?

>Solution :

Below is an approach that uses .apply() to create the additional column. An if statement alone is not enough: you also need to loop over the words in each Name (Key) value, so that a string containing several keys produces a list with several items in the new column.

my_dict = {'elon' : 'is awesome', 
           'jeff' : 'is not so awesome, but hes ok, ig', 
           'mustard' : 'is gross', 
           'pigs' : 'can fly'}
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name (Key)' : ['elon musk', 'jeff bezos and elon musk', 'jeff bezos', 'she bought mustard for elon'], 'Corresponding Value(s)' : [np.nan, np.nan, np.nan, np.nan]})

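def create_corr_vals_column(row_value):
    # Collect the value of every dictionary key that occurs as a word in the string.
    cvc = []
    for word in row_value.split():
        if word in my_dict:
            cvc.append(my_dict[word])
    return cvc

# Build the new column by applying the function to each string in 'Name (Key)'.
df['Corresponding Value(s)'] = df['Name (Key)'].apply(create_corr_vals_column)
print(df)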
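
Note that row_value.split() only separates on whitespace, so a string such as 'Elon, and jeff.' would yield the tokens 'Elon,' and 'jeff.', neither of which is a key. If your real data has capital letters or punctuation, a variant along the following lines may help. This is only a sketch: the function name lookup_values and its mapping parameter are made up for illustration, and it assumes that all dictionary keys are lowercase single words.

```python
import re

my_dict = {'elon': 'is awesome',
           'jeff': 'is not so awesome, but hes ok, ig',
           'mustard': 'is gross',
           'pigs': 'can fly'}

def lookup_values(text, mapping=my_dict):
    # Hypothetical helper (not part of the original answer).
    # Lowercase the text and extract runs of word characters, so that
    # 'Elon,' and 'elon' both produce the token 'elon'.
    words = re.findall(r"\w+", text.lower())
    # Keep the values in the order in which the keys appear in the text.
    return [mapping[word] for word in words if word in mapping]

print(lookup_values('Jeff Bezos, then Elon!'))
```

It can be applied in the same way as the function above, i.e. df['Name (Key)'].apply(lookup_values). Like the original, it returns an empty list for rows where no key is found.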