Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Write list of dictionaries to multiple rows in a Pandas Dataframe

So I have a list of dictionaries, that itself has lists of dictionaries within it like this:

myDict = [{'First_Name': 'Jack', 'Last_Name': 'Smith', 'Job_Data': [{'Company': 'Amazon'}, {'Hire_Date': '2011-04-01', 'Company': 'Target'}]}, 
 {'First_Name': 'Jill', 'Last_Name': 'Smith', 'Job_Data': [{'Hire_Date': '2009-11-16', 'Company': 'Sears'}, {'Hire_Date': '2011-04-01'}]}]

However, as you can see, some of the key values are the same, and sometimes data elements will be missing like Jack missing a Hire Date and Jill missing a company. So what I want to do is preserve the data and write it to multiple rows so that my final output looks like this:

      First_Name    Last_Name    Hire_Date     Company
0     Jack          Smith        NaN           Amazon  
1     Jack          Smith        2011-04-01    Target
2     Jill          Smith        2009-11-16    Sears
3     Jill          Smith        2011-04-01    NaN

Edit: Follow-up question. Say now that I have a dictionary that looks like this that adds in an extract key and I want to produce a similar output but with the new data included:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

 myDict = [{'First_Name': 'Jack', 'Last_Name': 'Smith', 'Job_Data': [{'Company': 'Amazon'}, {'Hire_Date': '2011-04-01', 'Company': 'Target'}, 'Dependent_data': [{'Dependent': 'Susan Smith'}, {'Dependent': 'Will Smith'}]}, 
     {'First_Name': 'Jill', 'Last_Name': 'Smith', 'Job_Data': [{'Hire_Date': '2009-11-16', 'Company': 'Sears'}, {'Hire_Date': '2011-04-01'}]}]

Output:

      First_Name    Last_Name    Hire_Date     Company    Dependent
0     Jack          Smith        NaN           Amazon     Susan Smith  
1     Jack          Smith        2011-04-01    Target     Will Smith
2     Jill          Smith        2009-11-16    Sears      NaN
3     Jill          Smith        2011-04-01    NaN        NaN

>Solution :

Using json_normalize

df = pd.json_normalize(data=myDict, meta=["First_Name", "Last_Name"], record_path="Job_Data")
print(df)

  Company   Hire_Date First_Name Last_Name
0  Amazon         NaN       Jack     Smith
1  Target  2011-04-01       Jack     Smith
2   Sears  2009-11-16       Jill     Smith
3     NaN  2011-04-01       Jill     Smith
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading