Skip level in nested JSON and convert to Pandas dataframe

Advertisements

I have json data that is structured like this, which I want to turn into a data frame:

{
    "data": {
        "1": {
            "Conversion": {
                "id": "1",
                "datetime": "2024-03-26 08:30:00"
            }
        },        
        "50": {
            "Conversion": {
                "id": "50",
                "datetime": "2024-03-27 09:00:00"
            }
        }
    }
}

My usual approach would be to use json_normalize, like this:

df = pd.json_normalize(input['data'])

My goal is to have a table/dataframe with just the columns "id" and "datetime".

How do I skip the numbering level below data and go straight to Conversion? I would imagine something like this (which clearly doesn’t work):

df = pd.json_normalize(input['data'][*]['Conversion'])

What is the best way to achieve this? Any hints are greatly appreciated!

>Solution :

You have to manually change data in double list comprehension:

L = [b['Conversion'] for k, v in input['data'].items() for a, b in v.items()]
print (L)
[{'id': '1', 'datetime': '2024-03-26 08:30:00'}, 
 {'id': '50', 'datetime': '2024-03-27 09:00:00'}]


out = pd.json_normalize(L)
print (out)
   id             datetime
0   1  2024-03-26 08:30:00
1  50  2024-03-27 09:00:00

Here is json_normalize not necessary, working DataFrame constructor:

out = pd.DataFrame(L)
print (out)
   id             datetime
0   1  2024-03-26 08:30:00
1  50  2024-03-27 09:00:00

Thank you chepner for another idea with .values:

out = pd.json_normalize((b['Conversion'] for v in input['data'].values() 
                                         for b in v.values()))
print (out)
   id             datetime
0   1  2024-03-26 08:30:00
1  50  2024-03-27 09:00:00

out = pd.DataFrame((b['Conversion'] for v in input['data'].values() 
                                    for b in v.values()))
print (out)
   id             datetime
0   1  2024-03-26 08:30:00
1  50  2024-03-27 09:00:00

In json_normalize is parameter max_level, but working different:

Max number of levels(depth of dict) to normalize. if None, normalizes all levels.

out = pd.json_normalize(input['data'], max_level=1)
print (out)
                                              data.1  \
0  {'Conversion': {'id': '1', 'datetime': '2024-0...   

                                             data.50  
0  {'Conversion': {'id': '50', 'datetime': '2024-...  
                   
out = pd.json_normalize(input['data'], max_level=2)
print (out)
                                data.1.Conversion  \
0  {'id': '1', 'datetime': '2024-03-26 08:30:00'}   

                                data.50.Conversion  
0  {'id': '50', 'datetime': '2024-03-27 09:00:00'}  

out = pd.json_normalize(input['data'], max_level=3)
print (out)
  data.1.Conversion.id data.1.Conversion.datetime data.50.Conversion.id  \
0                    1        2024-03-26 08:30:00                    50   

  data.50.Conversion.datetime  
0         2024-03-27 09:00:00  

Leave a ReplyCancel reply