Flatten column within data frame containing nested dictionaries and lists


I have a pandas dataframe with 8 columns, one of which is made up of cells containing nested dictionaries and lists in the following format: {'x1': {'y1': ['z1', 'z2'], 'y2': ['z3', 'z4', 'z5']}, 'x2': {'y1': ['z6', 'z7', 'z8'], 'y2': ['z9', z10']}}

How can I transform this column so that the x-variables are melted with the y-variables and become column headers (next to all the remaining columns in the rest of the data frame) and the z-variables become the data within the cells?

As this is not JSON format, the json_normalize() does not work. I have tried different ways to .stack and .unstack, with not much success.

>Solution :

json_normalize seems to be doing what you expect:

df = pd.DataFrame({'col': [{'x1': {'y1': ['z1', 'z2'], 'y2': ['z3', 'z4', 'z5']}, 'x2': {'y1': ['z6', 'z7', 'z8'], 'y2': ['z9', 'z10']}}],
                   'other': '...'

out = df.join(pd.json_normalize(df.pop('col')))

NB. you can control the way the levels are joined with the sep parameter`.


  other     x1.y1         x1.y2         x2.y1      x2.y2
0   ...  [z1, z2]  [z3, z4, z5]  [z6, z7, z8]  [z9, z10]

Leave a ReplyCancel reply