
Parse a list of dictionaries with apply/lambda

I have a huge dataframe in which one column holds, for each person, a list of dictionaries stored as a string (their school history). I'm trying to parse this data into a new dataframe, because the relationship is one person to many schools.

My first attempt was to loop over the dataframe with itertuples(), but it was too slow.

Here is a sample of the data:


import pandas as pd

list_of_dicts = {
    0: '[]',
    1: "[{'name': 'USA Health', 'subject': 'Residency, Internal Medicine, 2006 - 2009'}, {'name': 'Ross University School of Medicine', 'subject': 'Class of 2005'}]",
    2: "[{'name': 'Physicians Medical Center Carraway', 'subject': 'Residency, Surgery, 1957 - 1960'}, {'name': 'Physicians Medical Center Carraway', 'subject': 'Internship, Transitional Year, 1954 - 1955'}, {'name': 'University of Alabama School of Medicine', 'subject': 'Class of 1954'}]"
}

df_dict = pd.DataFrame.from_dict(list_of_dicts, orient='index', columns=['school_history'])
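Although the variable is called list_of_dicts, each cell in the sample holds the string representation of a list rather than an actual Python list, which is why some kind of eval step is needed before anything can be expanded. A minimal check on the sample data (a sketch; parsed and cell are illustrative names) might look like this:

```python
from ast import literal_eval

import pandas as pd

list_of_dicts = {
    0: '[]',
    1: "[{'name': 'USA Health', 'subject': 'Residency, Internal Medicine, 2006 - 2009'}, {'name': 'Ross University School of Medicine', 'subject': 'Class of 2005'}]",
    2: "[{'name': 'Physicians Medical Center Carraway', 'subject': 'Residency, Surgery, 1957 - 1960'}, {'name': 'Physicians Medical Center Carraway', 'subject': 'Internship, Transitional Year, 1954 - 1955'}, {'name': 'University of Alabama School of Medicine', 'subject': 'Class of 1954'}]",
}
df_dict = pd.DataFrame.from_dict(list_of_dicts, orient='index', columns=['school_history'])

# Each cell is a plain string, not a Python list
cell = df_dict.loc[1, 'school_history']
print(type(cell))  # <class 'str'>

# literal_eval turns every string into a real list of dicts
parsed = df_dict['school_history'].map(literal_eval)

# Number of schools per person: the '[]' row parses to an empty list
print(parsed.map(len).tolist())  # [0, 2, 3]
```

The empty first row matters: any code that indexes the parsed list with [0] will fail on it.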

My idea was to write a function and then apply it to the dataframe:

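def parse_item(row):
    # eval turns the string into a list of dicts, but [0] keeps only the first school
    # (and raises IndexError for an empty list such as '[]')
    eval_dict = eval(row)[0]
    school_df = pd.DataFrame.from_dict(eval_dict, orient='index').T
    return school_df

# Even without the empty row, this gives a Series of one-row DataFrames,
# not a single flat DataFrame
df_dict['school_history'].apply(lambda x: parse_item(x))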

However, I can't figure out how to produce a dataframe with more rows than the original, since one person can have several schools. Any ideas?

From those 3 rows, the goal is this dataframe, with 5 rows coming from the 2 non-empty rows (row 0 has no schools):
[Image: expected output dataframe]
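One way to get more rows than the original, sketched here as an alternative to building one small DataFrame per person, is pandas' Series.explode(), which gives each element of a list its own row and repeats the original index label. This sketch assumes the strings are first parsed with ast.literal_eval; the names parsed, exploded and schools are illustrative. Keeping the original row label as the index preserves the one-person-to-many-schools link:

```python
from ast import literal_eval

import pandas as pd

list_of_dicts = {
    0: '[]',
    1: "[{'name': 'USA Health', 'subject': 'Residency, Internal Medicine, 2006 - 2009'}, {'name': 'Ross University School of Medicine', 'subject': 'Class of 2005'}]",
    2: "[{'name': 'Physicians Medical Center Carraway', 'subject': 'Residency, Surgery, 1957 - 1960'}, {'name': 'Physicians Medical Center Carraway', 'subject': 'Internship, Transitional Year, 1954 - 1955'}, {'name': 'University of Alabama School of Medicine', 'subject': 'Class of 1954'}]",
}
df_dict = pd.DataFrame.from_dict(list_of_dicts, orient='index', columns=['school_history'])

# 1. Parse every string into a real list of dicts
parsed = df_dict['school_history'].map(literal_eval)

# 2. explode gives each list element its own row and repeats the
#    original index label, so row 2 becomes three rows labelled 2
exploded = parsed.explode()

# 3. An empty list explodes to a single NaN row; drop it
exploded = exploded.dropna()

# 4. Expand each dict into columns, keeping the person's label as the index
schools = pd.DataFrame(exploded.tolist(), index=exploded.index)
print(schools)
```

The resulting index (1, 1, 2, 2, 2) can be turned into an ordinary column with reset_index() if a foreign key to the original dataframe is needed.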

Solution:

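Iterate over the column, converting each string into a Python list with ast.literal_eval(). The result is a nested list (one list of dicts per person), which can be flattened inside the same comprehension and passed straight to the DataFrame constructor, giving one row per school.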

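import pandas as pd
from ast import literal_eval

# literal_eval safely parses each string into a list of dicts, and the
# nested comprehension flattens those lists into a single list of dicts
res = pd.DataFrame([x for row in df_dict['school_history'] for x in literal_eval(row)])
res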
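The comprehension above produces the 5 school rows but drops the link back to the person each school belongs to. If that link matters, a small variation (a sketch; the person column name is hypothetical) carries the original row label along via Series.items():

```python
from ast import literal_eval

import pandas as pd

list_of_dicts = {
    0: '[]',
    1: "[{'name': 'USA Health', 'subject': 'Residency, Internal Medicine, 2006 - 2009'}, {'name': 'Ross University School of Medicine', 'subject': 'Class of 2005'}]",
    2: "[{'name': 'Physicians Medical Center Carraway', 'subject': 'Residency, Surgery, 1957 - 1960'}, {'name': 'Physicians Medical Center Carraway', 'subject': 'Internship, Transitional Year, 1954 - 1955'}, {'name': 'University of Alabama School of Medicine', 'subject': 'Class of 1954'}]",
}
df_dict = pd.DataFrame.from_dict(list_of_dicts, orient='index', columns=['school_history'])

rows = [
    # 'person' is a hypothetical column name holding the original row label;
    # **school unpacks the 'name' and 'subject' keys into the same dict
    {'person': idx, **school}
    for idx, history in df_dict['school_history'].items()
    for school in literal_eval(history)  # '[]' yields no rows at all
]
res = pd.DataFrame(rows)
print(res)
```

Because everything happens in one pass of plain Python over the strings, this keeps the same shape as the accepted answer and avoids constructing an intermediate DataFrame per person.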
