Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Skip level in nested JSON and convert to Pandas dataframe

I have json data that is structured like this, which I want to turn into a data frame:

{
    "data": {
        "1": {
            "Conversion": {
                "id": "1",
                "datetime": "2024-03-26 08:30:00"
            }
        },        
        "50": {
            "Conversion": {
                "id": "50",
                "datetime": "2024-03-27 09:00:00"
            }
        }
    }
}

My usual approach would be to use json_normalize, like this:

df = pd.json_normalize(input['data'])

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

My goal is to have a table/dataframe with just the columns "id" and "datetime".

How do I skip the numbering level below data and go straight to Conversion? I would imagine something like this (which clearly doesn’t work):

df = pd.json_normalize(input['data'][*]['Conversion'])

What is the best way to achieve this? Any hints are greatly appreciated!

>Solution :

You have to manually change data in double list comprehension:

L = [b['Conversion'] for k, v in input['data'].items() for a, b in v.items()]
print (L)
[{'id': '1', 'datetime': '2024-03-26 08:30:00'}, 
 {'id': '50', 'datetime': '2024-03-27 09:00:00'}]


out = pd.json_normalize(L)
print (out)
   id             datetime
0   1  2024-03-26 08:30:00
1  50  2024-03-27 09:00:00

Here is json_normalize not necessary, working DataFrame constructor:

out = pd.DataFrame(L)
print (out)
   id             datetime
0   1  2024-03-26 08:30:00
1  50  2024-03-27 09:00:00

Thank you chepner for another idea with .values:

out = pd.json_normalize((b['Conversion'] for v in input['data'].values() 
                                         for b in v.values()))
print (out)
   id             datetime
0   1  2024-03-26 08:30:00
1  50  2024-03-27 09:00:00

out = pd.DataFrame((b['Conversion'] for v in input['data'].values() 
                                    for b in v.values()))
print (out)
   id             datetime
0   1  2024-03-26 08:30:00
1  50  2024-03-27 09:00:00

In json_normalize is parameter max_level, but working different:

Max number of levels(depth of dict) to normalize. if None, normalizes all levels.

out = pd.json_normalize(input['data'], max_level=1)
print (out)
                                              data.1  \
0  {'Conversion': {'id': '1', 'datetime': '2024-0...   

                                             data.50  
0  {'Conversion': {'id': '50', 'datetime': '2024-...  
                   
out = pd.json_normalize(input['data'], max_level=2)
print (out)
                                data.1.Conversion  \
0  {'id': '1', 'datetime': '2024-03-26 08:30:00'}   

                                data.50.Conversion  
0  {'id': '50', 'datetime': '2024-03-27 09:00:00'}  

out = pd.json_normalize(input['data'], max_level=3)
print (out)
  data.1.Conversion.id data.1.Conversion.datetime data.50.Conversion.id  \
0                    1        2024-03-26 08:30:00                    50   

  data.50.Conversion.datetime  
0         2024-03-27 09:00:00  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading