Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas JSON Normalize – Choose Correct Record Path

I am trying to figure out how to normalize the nested JSON response sampled below.

Right now, json_normalize(res,record_path=['data']) is giving me MOST of the data I need but what I would really like is the detail in the "session_pageviews" list/dict with the attributes of the data list/dic included.

I tried json_normalize(res,record_path=['data', ['session_pageviews']], meta = ['data']) but I get an error: ValueError: operands could not be broadcast together with shape (32400,) (180,)

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I also tried json_normalize(res,record_path=['data'], max_level = 1) but that does not unnest session_pageviews

Any help would be appreciated!

enter image description here

>Solution :

You can try to apply the following function to your json:

def flatten_nested_json_df(df):
    df = df.reset_index()
    s = (df.applymap(type) == list).all()
    list_columns = s[s].index.tolist()
    
    s = (df.applymap(type) == dict).all()
    dict_columns = s[s].index.tolist()

    
    while len(list_columns) > 0 or len(dict_columns) > 0:
        new_columns = []

        for col in dict_columns:
            horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
            horiz_exploded.index = df.index
            df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
            new_columns.extend(horiz_exploded.columns) # inplace

        for col in list_columns:
            #print(f"exploding: {col}")
            df = df.drop(columns=[col]).join(df[col].explode().to_frame())
            new_columns.append(col)

        s = (df[new_columns].applymap(type) == list).all()
        list_columns = s[s].index.tolist()

        s = (df[new_columns].applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()
    return df

by doing this:

df1= flatten_nested_json_df(df)

where

df = pd.json_normalize(json)

That should give you all the information contained in your json.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading