Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

JSON not splitted in different cells

I have a problem. I have a json that contain logs. Each logs could have a change but
does not have to. So I want to show the logs with the changes in a pandas. But unfortunately the changes is not splitted in different cells (e.g. id and operation). So the complete changes is one cell. How could I get the following output like below?

{'model': 
     {'logs': 
          [{'id': '001',
            'changes': 
                      [{'id': '101','operation': 'UPDATE'}]},
                        {'id': '0000000017b068fc',
             'changes': 
                      [{'id': '201','operation': 'UPDATE'}]},
           {'id': '002',
              'changes': 
                      [{'id': '102','operation': 'UPDATE'},
                       {'id': '202','operation': 'UPDATE'},
                       {'id': '302','operation': 'UPDATE'}]},
          {'id': '003', 
               'changes': 
                      []},
          {'id': '004', 
                'changes': 
                       []},

Code

import pandas as pd
df=pd.json_normalize(result['model']['logs'])

What I got

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

[OUT]
id  changes
0   001 [{'id': '101'...
1   002 [{'id': '102'...
3   003 []

What I want


[OUT]

     id     changes.id    changes.operation
0    001    101           UPDATE
1    001    201           UPDATE
2    002    102           UPDATE
3    002    202           UPDATE
4    002    302           UPDATE
5    003    None          None
6    004    None          None

>Solution :

Try:

out = pd.json_normalize(result['model']['logs'], 'changes', 'id', 
                        record_prefix='changes.', errors='ignore')
print(out)

# Output
  changes.id changes.operation                id
0        101            UPDATE               001
1        201            UPDATE  0000000017b068fc
2        102            UPDATE               002
3        202            UPDATE               002
4        302            UPDATE               002

Update

If you need to keep NaN rows, use:

out = pd.json_normalize(result['model']['logs']).explode('changes')
changes = pd.DataFrame(out.pop('changes').to_dict()).T.add_prefix('changes.')
out = pd.concat([out, changes], axis=1)
print(out)

# Output
                 id changes.id changes.operation
0               001        101            UPDATE
1  0000000017b068fc        201            UPDATE
2               002        302            UPDATE
2               002        302            UPDATE
2               002        302            UPDATE
3               003        NaN               NaN
4               004        NaN               NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading