I have json data that is structured like this, which I want to turn into a data frame:
{
"data": {
"1": {
"Conversion": {
"id": "1",
"datetime": "2024-03-26 08:30:00"
}
},
"50": {
"Conversion": {
"id": "50",
"datetime": "2024-03-27 09:00:00"
}
}
}
}
My usual approach would be to use json_normalize, like this:
df = pd.json_normalize(input['data'])
My goal is to have a table/dataframe with just the columns "id" and "datetime".
How do I skip the numbering level below data and go straight to Conversion? I would imagine something like this (which clearly doesn’t work):
df = pd.json_normalize(input['data'][*]['Conversion'])
What is the best way to achieve this? Any hints are greatly appreciated!
>Solution :
You have to manually change data in double list comprehension:
L = [b['Conversion'] for k, v in input['data'].items() for a, b in v.items()]
print (L)
[{'id': '1', 'datetime': '2024-03-26 08:30:00'},
{'id': '50', 'datetime': '2024-03-27 09:00:00'}]
out = pd.json_normalize(L)
print (out)
id datetime
0 1 2024-03-26 08:30:00
1 50 2024-03-27 09:00:00
Here is json_normalize
not necessary, working DataFrame
constructor:
out = pd.DataFrame(L)
print (out)
id datetime
0 1 2024-03-26 08:30:00
1 50 2024-03-27 09:00:00
Thank you chepner for another idea with .values
:
out = pd.json_normalize((b['Conversion'] for v in input['data'].values()
for b in v.values()))
print (out)
id datetime
0 1 2024-03-26 08:30:00
1 50 2024-03-27 09:00:00
out = pd.DataFrame((b['Conversion'] for v in input['data'].values()
for b in v.values()))
print (out)
id datetime
0 1 2024-03-26 08:30:00
1 50 2024-03-27 09:00:00
In json_normalize
is parameter max_level
, but working different:
Max number of levels(depth of dict) to normalize. if None, normalizes all levels.
out = pd.json_normalize(input['data'], max_level=1)
print (out)
data.1 \
0 {'Conversion': {'id': '1', 'datetime': '2024-0...
data.50
0 {'Conversion': {'id': '50', 'datetime': '2024-...
out = pd.json_normalize(input['data'], max_level=2)
print (out)
data.1.Conversion \
0 {'id': '1', 'datetime': '2024-03-26 08:30:00'}
data.50.Conversion
0 {'id': '50', 'datetime': '2024-03-27 09:00:00'}
out = pd.json_normalize(input['data'], max_level=3)
print (out)
data.1.Conversion.id data.1.Conversion.datetime data.50.Conversion.id \
0 1 2024-03-26 08:30:00 50
data.50.Conversion.datetime
0 2024-03-27 09:00:00