Say I have a dict defined as:
dict = {'1': [{'name': 'Hospital 0',
               'students': 5,
               'grad': 71},
              {'name': 'Hospital 1',
               'students': 8,
               'grad': 74}],
        '2': [{'name': 'Hospital 0',
               'students': 11,
               'grad': 72},
              {'name': 'Hospital 1',
               'students': 10,
               'grad': 78}]}
Suppose I want to build a pandas DataFrame from it, with the outer keys as a step column, formatted as follows:
| step | name | students | grad |
|---|---|---|---|
| 1 | Hospital 0 | 5 | 71 |
| 1 | Hospital 1 | 8 | 74 |
| 2 | Hospital 0 | 11 | 72 |
| 2 | Hospital 1 | 10 | 78 |
Do you guys have any ideas?
>Solution :
Here is an approach using json_normalize(). Note: I am using data as the variable name instead of dict, because dict is a Python built-in type and assigning to it would shadow it.
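import pandas as pd

# data is the dictionary from the question (renamed from dict).
# Build one DataFrame per step and tag its rows with the step key;
# steps whose first record has no "name" field are skipped.
dfs = [pd.json_normalize(data[key]).assign(step=key) for key in data if "name" in data[key][0]]
df = pd.concat(dfs, ignore_index=True)
# Move step to the front to match the requested column order.
df = df[["step", "name", "students", "grad"]]
print(df)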
Output:
  step        name  students  grad
0    1  Hospital 0         5    71
1    1  Hospital 1         8    74
2    2  Hospital 0        11    72
3    2  Hospital 1        10    78
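For comparison, here is a minimal sketch of the same reshaping without json_normalize(), assuming every record is a flat dictionary as in the question. It rebuilds the sample data under the name data so that it runs on its own; the name rows is only an illustrative choice.

```python
import pandas as pd

# Sample data from the question, stored as data to avoid shadowing dict.
data = {
    "1": [
        {"name": "Hospital 0", "students": 5, "grad": 71},
        {"name": "Hospital 1", "students": 8, "grad": 74},
    ],
    "2": [
        {"name": "Hospital 0", "students": 11, "grad": 72},
        {"name": "Hospital 1", "students": 10, "grad": 78},
    ],
}

# Flatten to one dictionary per record, carrying the outer key along as "step".
rows = [
    {"step": step, **record}
    for step, records in data.items()
    for record in records
]

# Passing columns fixes the column order in the resulting DataFrame.
df = pd.DataFrame(rows, columns=["step", "name", "students", "grad"])
print(df)
```

Because the records are not nested, a plain list of dictionaries is enough here; json_normalize() becomes more useful when the records themselves contain nested dictionaries. Note that step holds the original string keys in both versions.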