Home Exploding multiple dict columns and concatenating with original Pandas data frame

Questions

Exploding multiple dict columns and concatenating with original Pandas data frame

May 13, 2022

I export a Postgres SQL query to create a Pandas data frame df similar to the following:

df = pd.DataFrame({
    'employee_id' : [123, 456, 789],
    'country_code' : ['US', 'CAN', 'MEX'],
    'sales' : [{'foo': 2, 'bar': 0, 'baz': 1},
               {'foo': 3, 'bar': 1, 'baz': 2},
               {'foo': 7, 'bar': 0, 'baz': 4}],
    'expenses' : [{'red': 1, 'white': 0, 'blue': 3},
               {'red': 1, 'white': 0, 'blue': 1},
               {'red': 2, 'white': 2, 'blue': 2}]
})

df
 
    employee_id   country_code      sales                             expenses
0   123           US                {'foo': 2, 'bar': 0, 'baz': 1}    {'red': 1, 'white': 0, 'blue': 3}
1   456           CAN               {'foo': 3, 'bar': 1, 'baz': 2}    {'red': 1, 'white': 0, 'blue': 1}
2   789           MEX               {'foo': 7, 'bar': 0, 'baz': 4}    {'red': 2, 'white': 2, 'blue': 2}

I would like to be able to explode both the sales and expenses columns so that their keys are separate columns. Currently, I’m only able to explode one of these columns, as follows:

df = pd.json_normalize(df['sales'])
df
    foo bar baz
0   2   0   1
1   3   1   2
2   7   0   4

I’m not able to pass a list of columns to pd.json.normalize().

Questions:

How do I explode both the sales and expenses columns?
After exploding both columns, how do I add back the other two columns (employee_id and country_code) from the original data frame?

The desired output is:

    employee_id   country_code   foo   bar   baz   red   white   blue
0   123           US             2     0     1     1     0       3
1   456           CAN            3     1     2     1     0       1
2   789           MEX            7     0     4     2     2       2

Thank you!

>Solution :

You may use concat along axis=1 with json_normalize:

json_cols = ['sales','expenses']
result = pd.concat([pd.json_normalize(df[col]) for col in json_cols],axis=1)
result = pd.concat([df.drop(json_cols,axis=1),result],axis=1)

Output:

result

    employee_id country_code    foo bar baz red white   blue
0   123         US               2   0   1   1   0       3 
1   456         CAN              3   1   2   1   0       1
2   789         MEX              7   0   4   2   2       2