Extract keys from string representations of dictionaries stored in a pandas DataFrame

I have the following dataframe, which contains a string representation of a dictionary in every row of the columns summary_in and summary_out:

import pandas as pd
    
df_vals = [[0,
  'Person1',
  "['xyz', 'abc', 'Jim']",
  "['jkl', 'efg', 'Smith']",
  1134,
  1180,
  46,
  'sample text',
  "{'xyz_key': ['xyz', 756.0], 'abc_key': ['abc', 378.0], 'Jim_key': ['Jim', 0]}",
  "{'jkl_key': ['jkl', 395.0], 'efg_key': ['efg', 785.0], 'Smith_key': ['Smith', 0]}"],
 [1,
  'Person2',
  "['lmn', 'opq', 'Mick']",
  "['rst', 'uvw', 'Smith']",
  1134,
  1180,
  46,
  'sample tex2',
  "{'lmn_key': ['lmn', 756.0], 'opq_key': ['opq', 378.0], 'Mick_key': ['Mick', 0]}",
  "{'rst_key': ['rst', 395.0], 'uvw_key': ['uvw', 785.0], 'Smith_key': ['Smith', 0]}"]]

df = pd.DataFrame(data=df_vals, columns =['row','Person','in','out','val1','val2','diff','note','summary_in','summary_out'] )
df

What I am trying to do is iterate over every row in the dataframe and print each key that exists in summary_in for each Person row.

After running this code to test datatypes:

#create dict of column
dict_from_dataframe = df['summary_in'].to_dict()
print(type(dict_from_dataframe))

for k in dict_from_dataframe.items():
    d = k[1]
    print(type(d))
    print(d)

I get the following output, which shows that once I go one level down, the dictionary (d) is actually a string and cannot be accessed the way a dictionary normally would be:

<class 'dict'>
<class 'str'>
{'xyz_key': ['xyz', 756.0], 'abc_key': ['abc', 378.0], 'Jim_key': ['Jim', 0]}
<class 'str'>
{'lmn_key': ['lmn', 756.0], 'opq_key': ['opq', 378.0], 'Mick_key': ['Mick', 0]}

Any ideas on what I have done wrong here?

My expected output is to loop over the df and print the following:

Person1
xyz_key
abc_key
Jim_key
Person2
lmn_key
opq_key
Mick_key

Any help would be much appreciated! Thanks

Solution:

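IIUC, nothing is wrong with your loop: the values in summary_in are strings that only look like dictionaries, so they have to be parsed before you can access their keys. You can convert each string representation to a dictionary with ast.literal_eval and wrap the printing in a custom function: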

from ast import literal_eval

def print_infos(s):
    print(s['Person'])
    d = literal_eval(s['summary_in'])
    for k in d:
        print(k)

for _, r in df.iterrows():
    print_infos(r)

output:

Person1
xyz_key
abc_key
Jim_key
Person2
lmn_key
opq_key
Mick_key
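
If the keys are needed for more than printing, one possible variant is to parse the column once and keep the result. The sketch below is not part of the original answer: it rebuilds only the Person and summary_in columns from the question, and the names summary_in_parsed and keys_by_person are hypothetical, chosen for illustration.

```python
from ast import literal_eval

import pandas as pd

# Stand-in for the question's dataframe, reduced to the two columns used here.
df = pd.DataFrame({
    "Person": ["Person1", "Person2"],
    "summary_in": [
        "{'xyz_key': ['xyz', 756.0], 'abc_key': ['abc', 378.0], 'Jim_key': ['Jim', 0]}",
        "{'lmn_key': ['lmn', 756.0], 'opq_key': ['opq', 378.0], 'Mick_key': ['Mick', 0]}",
    ],
})

# Parse every string into a real dict once.
# "summary_in_parsed" is a hypothetical column name, not from the question.
df["summary_in_parsed"] = df["summary_in"].map(literal_eval)

# Collect the keys per person instead of printing them straight away.
keys_by_person = {
    person: list(parsed)
    for person, parsed in zip(df["Person"], df["summary_in_parsed"])
}

for person, keys in keys_by_person.items():
    print(person)
    for key in keys:
        print(key)
```

This prints the same lines as the loop above. Note that literal_eval raises an exception if a cell does not contain a valid Python literal, so real data with missing or malformed values may need extra handling.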