Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Exploding multiple dict columns and concatenating with original Pandas data frame

I export a Postgres SQL query to create a Pandas data frame df similar to the following:

df = pd.DataFrame({
    'employee_id' : [123, 456, 789],
    'country_code' : ['US', 'CAN', 'MEX'],
    'sales' : [{'foo': 2, 'bar': 0, 'baz': 1},
               {'foo': 3, 'bar': 1, 'baz': 2},
               {'foo': 7, 'bar': 0, 'baz': 4}],
    'expenses' : [{'red': 1, 'white': 0, 'blue': 3},
               {'red': 1, 'white': 0, 'blue': 1},
               {'red': 2, 'white': 2, 'blue': 2}]
})

df
 
    employee_id   country_code      sales                             expenses
0   123           US                {'foo': 2, 'bar': 0, 'baz': 1}    {'red': 1, 'white': 0, 'blue': 3}
1   456           CAN               {'foo': 3, 'bar': 1, 'baz': 2}    {'red': 1, 'white': 0, 'blue': 1}
2   789           MEX               {'foo': 7, 'bar': 0, 'baz': 4}    {'red': 2, 'white': 2, 'blue': 2}

I would like to be able to explode both the sales and expenses columns so that their keys are separate columns. Currently, I’m only able to explode one of these columns, as follows:

df = pd.json_normalize(df['sales'])
df
    foo bar baz
0   2   0   1
1   3   1   2
2   7   0   4

I’m not able to pass a list of columns to pd.json.normalize().

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Questions:

  1. How do I explode both the sales and expenses columns?
  2. After exploding both columns, how do I add back the other two columns (employee_id and country_code) from the original data frame?

The desired output is:

    employee_id   country_code   foo   bar   baz   red   white   blue
0   123           US             2     0     1     1     0       3
1   456           CAN            3     1     2     1     0       1
2   789           MEX            7     0     4     2     2       2

Thank you!

>Solution :

You may use concat along axis=1 with json_normalize:

json_cols = ['sales','expenses']
result = pd.concat([pd.json_normalize(df[col]) for col in json_cols],axis=1)
result = pd.concat([df.drop(json_cols,axis=1),result],axis=1)

Output:

result

    employee_id country_code    foo bar baz red white   blue
0   123         US               2   0   1   1   0       3 
1   456         CAN              3   1   2   1   0       1
2   789         MEX              7   0   4   2   2       2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading