Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Detecting Keys in a Column of Strings

I have a dictionary with key and value pairs. I also have a data frame with a column containing strings that contain the various keys. If a key appears in the column in the data frame, I’d like to append the corresponding value in the adjacent column

my_dict = {'elon' : 'is awesome', 'jeff' : 'is not so awesome, but hes ok, ig', 'mustard' : 'is gross', 'pigs' : 'can fly'}
my_dict

import pandas as pd
import numpy as np
pd.DataFrame({'Name (Key)' : ['elon musk', 'jeff bezos and elon musk', 'jeff bezos', 'she bought mustard for elon'], 'Corresponding Value(s)' : [np.nan, np.nan, np.nan, np.nan]})

Desired output:

# Desired output:

pd.DataFrame({'Name (Key)' : ['elon musk', 'jeff bezos and elon musk', 'jeff bezos', 'she bought mustard for elon'], 
              'Corresponding Value(s)' : [['is awesome'], ['is not so awesome, but hes ok, ig', 'is awesome'], ['is not so awesome, but hes ok, ig'], ['is gross', 'is awesome']]})

I am new to python, but assume there will be the apply function used in this. Or perhaps map()? Would an if statement be plausible, or is there a better way to approach this?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Below an approach using .apply() for creating the additional column. In addition to if also looping over the words of Name (Key) column values is necessary to create multiple items in the lists being values of the new DataFrame column.

my_dict = {'elon' : 'is awesome', 
           'jeff' : 'is not so awesome, but hes ok, ig', 
           'mustard' : 'is gross', 
           'pigs' : 'can fly'}
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name (Key)' : ['elon musk', 'jeff bezos and elon musk', 'jeff bezos', 'she bought mustard for elon'], 'Corresponding Value(s)' : [np.nan, np.nan, np.nan, np.nan]})

def create_corr_vals_column(row_value):
    cvc = []
    for word in row_value.split():
        if word in my_dict:
            cvc.append(my_dict[word])
    return cvc
df['Corresponding Value(s)'] = df['Name (Key)'].apply( create_corr_vals_column )
print(df)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading