I have a huge dataframe in which one column holds a list of dictionaries per row (the school history of each person). I'm trying to parse this data into a new dataframe, because the relation is going to be 1 person to many schools.
My first attempt was to loop over the dataframe with itertuples(), but that was too slow.
Each list looks like this:
import pandas as pd

list_of_dicts = {
    0: '[]',
    1: "[{'name': 'USA Health', 'subject': 'Residency, Internal Medicine, 2006 - 2009'}, {'name': 'Ross University School of Medicine', 'subject': 'Class of 2005'}]",
    2: "[{'name': 'Physicians Medical Center Carraway', 'subject': 'Residency, Surgery, 1957 - 1960'}, {'name': 'Physicians Medical Center Carraway', 'subject': 'Internship, Transitional Year, 1954 - 1955'}, {'name': 'University of Alabama School of Medicine', 'subject': 'Class of 1954'}]"
}
df_dict = pd.DataFrame.from_dict(list_of_dicts, orient='index', columns=['school_history'])
My idea was to write a function and then apply it to the dataframe:
def parse_item(row):
    eval_dict = eval(row)[0]
    school_df = pd.DataFrame.from_dict(eval_dict, orient='index').T
    return school_df
df['column'].apply(lambda x: parse_item(x))
However, I can't figure out how to generate a dataframe with more rows than the original (since one person can have multiple schools). Any ideas?
From those 3 rows, the idea is to get this dataframe (5 rows, built from the 2 non-empty rows):

>Solution :
Iterate over the column and convert each string into a Python list with ast.literal_eval(). The result is a nested list, which can be flattened inside the same comprehension.
from ast import literal_eval
pd.DataFrame([x for row in df_dict['school_history'] for x in literal_eval(row)])
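The one-liner above drops the link between each school and the person it belongs to. If that link is needed for the 1-to-many relation, a minimal sketch along the same lines could carry the original index into each record. The column name person_id and the shortened sample data are assumptions made for illustration; they are not part of the original question.

```python
from ast import literal_eval

import pandas as pd

# Shortened sample data in the same shape as the question's column
# (each cell is a string holding a list of dicts).
df_dict = pd.DataFrame(
    {
        "school_history": [
            "[]",
            "[{'name': 'USA Health', 'subject': 'Residency'}, "
            "{'name': 'Ross University School of Medicine', 'subject': 'Class of 2005'}]",
            "[{'name': 'Physicians Medical Center Carraway', 'subject': 'Residency'}, "
            "{'name': 'Physicians Medical Center Carraway', 'subject': 'Internship'}, "
            "{'name': 'University of Alabama School of Medicine', 'subject': 'Class of 1954'}]",
        ]
    }
)

# Pair every parsed school dict with the index label of its source row.
# "person_id" is a hypothetical column name chosen for this sketch.
records = [
    {"person_id": idx, **school}
    for idx, row in df_dict["school_history"].items()
    for school in literal_eval(row)
]

# Passing columns explicitly keeps the layout stable even if no row has schools.
schools = pd.DataFrame(records, columns=["person_id", "name", "subject"])
print(schools)
```

Rows with an empty list contribute no records, so the person at index 0 simply does not appear in the result; the person_id column can then be used to join back to the original dataframe.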
