I am trying to think out of the box here, and my idea might be really bad. Feel free to point out better alternatives. I want to create a nested column pandas dataframe, both for visualization and analysis purposes. The output should look like this –
Marks
Physics | Biology
Theo|Prac | Theo|Prac
99 | 100 | 89 | 100
My data is stored in the form of a nested dictionary –
nested_dict = {
'Marks': {
'Physics': {
'Theo': 99,
'Prac': 100
},
'Biology': {
'Theo': 89,
'Prac': 100
}
}
}
I think the above table looks great for visualization but I am not sure if it’ll make things easy for analysis. For analysis, I’d need to do operations on the subgroups, For eg: Physics percentage = (Theo + Prac)/200*100. Is a nested panda dataframe the best way to do the analysis?
Is there a way I can do that? Doing pd.DataFrame.from_dict(nested_dict) doesn’t seem to work. This is what I get –
Marks
Biology {'Theo': 89, 'Prac': 100}
Physics {'Theo': 99, 'Prac': 100}
>Solution :
Un-nesting can be achieved manually with concat and a dictionary comprehension, then unstack+to_frame and transposition:
out = (pd.concat({k: pd.DataFrame(v)
for k, v in nested_dict.items()},
axis=1)
.unstack().to_frame().T
)
You could also use json_normalize and convert the columns to MultiIndex:
out = pd.json_normalize(nested_dict)
out.columns = out.columns.str.split('.', expand=True)
Output:
Marks
Physics Biology
Theo Prac Theo Prac
0 99 100 89 100