Convert Pandas dataframe to nested JSON (without nesting as lists)

May 26, 2022

I’m aware there are other threads on this topic, but I’m running into an issue that no other answer seems to address.

Given the following Pandas dataframe:

a  b  c  d
a1 b1 c1 d1
a2 b2 c2 d2

I would like to create a nested JSON object with the following structure:

[
    {
        "a": "a1",
        "b": "b1",
        "nested_group":
            {
                "c": "c1",
                "d": "d1"
            }
    },
    {
        "a": "a2",
        "b": "b2",
        "nested_group":
            {
                "c": "c2",
                "d": "d2"
            }
    }
]

The solution I’ve found in other threads is to use the following code:

j = (df.groupby(['a','b']) 
      .apply(lambda x: x[['c','d']].to_dict('records')) 
      .reset_index() 
      .rename(columns={0:'nested_group'}) 
      .to_json(orient='records'))

However, the issue I’m running into is that the above code places each nested_group object inside square brackets, like so:

"nested_group": [
    {
        "c": "c2",
        "d": "d2"
    }
]

The object I’m trying to generate is intended to be the payload for an API call, and unfortunately the square brackets around each inner dictionary cause the API to return an error. (The outermost brackets at the very top/bottom of the object are fine.) How can I make Python NOT treat the inner dictionaries as lists?

Code to reproduce the example dataframe:

import numpy as np
import pandas as pd

array = np.array([['a1', 'b1', 'c1', 'd1'], ['a2', 'b2', 'c2', 'd2']])
df = pd.DataFrame(data=array, columns=['a','b','c','d'])

Thank you in advance 🙂

>Solution :

Looking at the docs for to_dict it seems like we still have to use the records option, and if we assume it will always be a list of 1 element, just take the 0th element using your original code

>>> import numpy as np
>>> import pandas as pd
>>> array = np.array([['a1', 'b1', 'c1', 'd1'], ['a2', 'b2', 'c2', 'd2']])
>>> df = pd.DataFrame(data=array, columns=['a','b','c','d'])
>>> (df.groupby(['a','b']) 
      .apply(lambda x: x[['c','d']].to_dict('records')[0]) 
      .reset_index() 
      .rename(columns={0:'nested_group'}) 
      .to_json(orient='records'))
'[{"a":"a1","b":"b1","nested_group":{"c":"c1","d":"d1"}},{"a":"a2","b":"b2","nested_group":{"c":"c2","d":"d2"}}]'