
Pandas: Explode Nested JSON and Retain Row ID

How might I retain a row ID mapping after exploding a nested JSON?

Consider this example:

import pandas as pd

df = pd.DataFrame({'id': [1, 2], 'xd': [
    [
        {
            "status": "pass",
            "desc": "desc",
            "actionable": False,
            "err_code": "None",
            "err_msg": "None"
        },
        {
            "status": "pass",
            "desc": "desc",
            "actionable": False,
            "err_code": "err",
            "err_msg": "not found"
        }
    ],
    [
        {
            "status": "fail",
            "desc": "desc",
            "actionable": True,
            "err_code": "None",
            "err_msg": "None"
        },
        {
            "status": "pass",
            "desc": "desc",
            "actionable": True,
            "err_code": "err",
            "err_msg": "found"
        }
    ]
]})

>>> df
    id  xd
0   1   [{'status': 'pass', 'desc': 'desc', 'actionabl...
1   2   [{'status': 'fail', 'desc': 'desc', 'actionabl...

Now explode it:


pd.json_normalize(df['xd'].explode())
    status  desc    actionable  err_code    err_msg
0   pass    desc    False       None        None
1   pass    desc    False       err         not found
2   fail    desc    True        None        None
3   pass    desc    True        err         found

OK, great. But now I want a way to link the first two rows to id 1 and the last two rows to id 2, and it should work for an arbitrarily deeply nested JSON in xd.

Solution:

Explode the column, then pipe the result into json_normalize and set the index of the normalized frame to the exploded index.
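The exploded index works as a link because Series.explode repeats each original row label once per list element. A minimal check with a toy Series (hypothetical data, not the question's df):

```python
import pandas as pd

# Toy Series of lists of dicts (hypothetical data, not the question's df)
s = pd.Series([[{"a": 1}, {"a": 2}], [{"a": 3}]])

exploded = s.explode()

# Each original row label (0, 1) is repeated once per element of its list
print(exploded.index.tolist())  # [0, 0, 1]
print(exploded.tolist())        # [{'a': 1}, {'a': 2}, {'a': 3}]
```

Applied to the question's df, the same idea gives: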

new_df = df['xd'].explode().pipe(lambda x: pd.json_normalize(x).set_index(x.index))

Output:

>>> new_df
  status  desc  actionable err_code    err_msg
0   pass  desc       False     None       None
0   pass  desc       False      err  not found
1   fail  desc        True     None       None
1   pass  desc        True      err      found
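Note that the index above holds df's row labels (0 and 1), not the id values themselves. A minimal sketch of one way to carry the actual ids through, assuming id is unique per row: make id the index before exploding. The nested "err" dict below is a hypothetical addition to the question's data, included to show that json_normalize flattens deeper nesting into dotted column names.

```python
import pandas as pd

# Reduced version of the question's data; the nested "err" dict is a
# hypothetical addition to show how deeper nesting gets flattened.
df = pd.DataFrame({
    "id": [1, 2],
    "xd": [
        [{"status": "pass", "err": {"code": "None", "msg": "None"}},
         {"status": "pass", "err": {"code": "err", "msg": "not found"}}],
        [{"status": "fail", "err": {"code": "None", "msg": "None"}},
         {"status": "pass", "err": {"code": "err", "msg": "found"}}],
    ],
})

# Use 'id' as the index so explode repeats the id values instead of 0, 1
exploded = df.set_index("id")["xd"].explode()

# Flatten the dicts (nested keys become 'err.code' and 'err.msg'),
# then attach the repeated id labels as the index
new_df = pd.json_normalize(exploded.tolist()).set_index(exploded.index)

# Turn the 'id' index back into an ordinary column
new_df = new_df.reset_index()
print(new_df)
```

Passing exploded.tolist() rather than the Series itself means json_normalize only ever sees a plain list of dicts, so the id link comes solely from the explicit set_index call.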