I have a pandas dataframe with 8 columns, one of which is made up of cells containing nested dictionaries and lists in the following format: {'x1': {'y1': ['z1', 'z2'], 'y2': ['z3', 'z4', 'z5']}, 'x2': {'y1': ['z6', 'z7', 'z8'], 'y2': ['z9', z10']}}
How can I transform this column so that the x-variables are melted with the y-variables and become column headers (next to all the remaining columns in the rest of the data frame) and the z-variables become the data within the cells?
As this is not JSON format, the json_normalize() does not work. I have tried different ways to .stack and .unstack, with not much success.
>Solution :
json_normalize
seems to be doing what you expect:
df = pd.DataFrame({'col': [{'x1': {'y1': ['z1', 'z2'], 'y2': ['z3', 'z4', 'z5']}, 'x2': {'y1': ['z6', 'z7', 'z8'], 'y2': ['z9', 'z10']}}],
'other': '...'
})
out = df.join(pd.json_normalize(df.pop('col')))
NB. you can control the way the levels are joined with the sep
parameter`.
Output:
other x1.y1 x1.y2 x2.y1 x2.y2
0 ... [z1, z2] [z3, z4, z5] [z6, z7, z8] [z9, z10]