Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How do you unnest JSON, column with a list of dict in a dataframe to a new Dataframe?

I have a dataframe, df_object that has a column that contains a list of dict items.

ObjectID research
392 {‘researchID’: ‘10003’, ‘research.type’: ‘x-ray’, ‘research.date’: ‘2004’}, {‘researchID’: ‘10006’, ‘research.type’: ‘document’, ‘research.date’: ‘2005’
394 {‘researchID’: ‘10012’, ‘research.type’: ‘x-ray’, ‘research.date’: ‘2005’}

My goal is to have a new dataframe from the research column, that includes the ObjectID, but uses the researchID as the index:

researchID (index) ObjectID research.type research.date
10003 392 x-ray 2004
10006 392 document 2005
10012 394 x-ray 2005

When I use this code: df = pd.DataFrame(df_object['research'][392])
the columns are correct and it looks like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

researchID (index) ObjectID research.type research.date
10003 392 x-ray 2004
10006 392 document 2005

The issue is that I don’t know how to add the rest of the objectID data on. I thought I could iterate all the rows in df_object and append the rows but wasn’t sure if that’s the best way because I read that you shouldn’t keep appending hundreds of rows to dataframes like that because it’s slow.

When I tried df = pd.DataFrame(df_object['research']), then the index stays on ObjectID and it’s nested.

I tried a few flatten JSON functions from around StackOverflow, but I kept getting errors and I’m not sure if that’s what I needed to be searching for.

ADDED
Output from: df_object.head(2).to_dict()

{'research': {392: [{'researchID': '10003', 'research.type': ['x-ray'], 'research.date': '2004'},{'researchID': '10006', 'research.type': 'document', 'research.date': '2005'}], 393: {'researchID': '10012', 'research.type': 'x-ray', 'research.date': '2005'}} 

>Solution :

I’m assuming that in research column you have list of dict items (as you stated in your question). Then:

df = df.explode("research")
df = pd.concat([df, df.pop("research").apply(pd.Series)], axis=1)
print(df)

Prints:

   ObjectID researchID research.type research.date
0       392      10003         x-ray          2004
0       392      10006      document          2005
1       394      10012         x-ray          2005

To set researchID as index:

print(df.set_index("researchID"))

Prints:

            ObjectID research.type research.date
researchID                                      
10003            392         x-ray          2004
10006            392      document          2005
10012            394         x-ray          2005
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading