CSV data format to nested dictionary

by MR

June 19, 2024

I have the CSV below and want to convert it into a nested dict.

name,columns,tests
ABC_ESTIMATE_REFINED,cntquota,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,cntquota,not_null
ABC_ESTIMATE_REFINED,is_purged,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,is_purged,not_null

Expected Output

{
    "name": "ABC_ESTIMATE_REFINED",
    "columns": [
        {
            "name": "cntquota",
            "tests": [
                "dbt_expectations.expect_column_to_exist",
                "not_null"
            ]
        },
        {
            "name": "is_purged",
            "tests": [
                "dbt_expectations.expect_column_to_exist",
                "not_null"
            ]
        }
    ]
}

My attempt is below, but it doesn't come anywhere close:

import pandas as pd

df = pd.read_csv('data.csv')
print(df)
nested_dict = df.groupby(['name','columns']).apply(lambda x: x[['tests']].to_dict(orient='records')).to_dict()

print(nested_dict)

Solution:

If I understand correctly (IIUC), you can use nested groupby calls:

# one dict per name, each holding one dict per column with its list of tests
out = [{'name': k1, 'columns': [{'name': k2, 'tests': g2['tests'].tolist()}
                                for k2, g2 in g1.groupby('columns')]}
       for k1, g1 in df.groupby('name')]
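A self-contained sketch of the nested groupby version, assuming the CSV is embedded as a string through io.StringIO instead of being read from data.csv. Since out holds one entry per distinct name, the single dict from the question is out[0], and json.dumps(..., indent=4) prints it in the same JSON layout as the expected output.

```python
import io
import json

import pandas as pd

# Sample data from the question; with a real file use pd.read_csv('data.csv')
CSV_TEXT = """name,columns,tests
ABC_ESTIMATE_REFINED,cntquota,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,cntquota,not_null
ABC_ESTIMATE_REFINED,is_purged,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,is_purged,not_null
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Outer loop: one dict per table name; inner loop: one dict per column
out = [{'name': k1,
        'columns': [{'name': k2, 'tests': g2['tests'].tolist()}
                    for k2, g2 in g1.groupby('columns')]}
       for k1, g1 in df.groupby('name')]

# This data contains a single table, so out[0] is the expected dict
print(json.dumps(out[0], indent=4))
```

If the CSV ever contains several tables, keep out as a list (or dump the whole list) instead of taking only out[0].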

Since each level of nesting is built from a pair of consecutive columns, you could also use a recursive approach:

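def group(df, keys):
    if len(keys) > 1:
        # group on the first key; the next key names the nested list
        key1, key2 = keys[:2]
        return [{key1: k, key2: group(g, keys[1:])}
                for k, g in df.groupby(key1)]
    else:
        # last key: return that column's values as a plain list
        return df[keys[0]].tolist()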

out = group(df, ['name', 'columns', 'tests'])
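As written, group labels every nested dict with its own column name, so the column entries come out as {'columns': 'cntquota', 'tests': [...]} rather than the {'name': 'cntquota', 'tests': [...]} the question asks for. The sketch below is a hypothetical tweak, group_named (not part of the original answer), that always stores the group value under a fixed label, assumed here to be 'name', while still using the next column's name as the key for the children. The CSV is again embedded via io.StringIO for the sake of a runnable example.

```python
import io

import pandas as pd

CSV_TEXT = """name,columns,tests
ABC_ESTIMATE_REFINED,cntquota,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,cntquota,not_null
ABC_ESTIMATE_REFINED,is_purged,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,is_purged,not_null
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))


def group_named(df, keys, label='name'):
    """Hypothetical variant of group(): each level stores its group value
    under `label` and its children under the next column's name."""
    if len(keys) > 1:
        key1, key2 = keys[:2]
        return [{label: k, key2: group_named(g, keys[1:], label)}
                for k, g in df.groupby(key1)]
    # Last key: return that column's values as a plain list
    return df[keys[0]].tolist()


out = group_named(df, ['name', 'columns', 'tests'])
print(out)
```

With this change the recursive version produces exactly the nested-groupby structure, while still accepting an arbitrary number of key columns.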

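Output (of the nested groupby version; the recursive version yields the same structure, except that the inner dictionaries use the key 'columns' instead of 'name', e.g. {'columns': 'cntquota', 'tests': [...]}):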

[{'name': 'ABC_ESTIMATE_REFINED',
  'columns': [{'name': 'cntquota',
               'tests': ['dbt_expectations.expect_column_to_exist', 'not_null']},
              {'name': 'is_purged',
               'tests': ['dbt_expectations.expect_column_to_exist', 'not_null']}]}]

Demo of the recursive approach with a different order of the keys:

group(df, ['name', 'tests', 'columns'])

[{'name': 'ABC_ESTIMATE_REFINED',
  'tests': [{'tests': 'dbt_expectations.expect_column_to_exist',
             'columns': ['cntquota', 'is_purged']},
            {'tests': 'not_null', 'columns': ['cntquota', 'is_purged']}]}]
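If pandas is only being used for this conversion, the same nesting can also be built with the standard library alone. The sketch below swaps in csv.DictReader plus plain dicts, which is a different technique from the groupby calls above; it keeps first-seen order, whereas groupby sorts the keys by default. csv_to_nested is a hypothetical helper name, and it assumes the name,columns,tests header from the question.

```python
import csv
import io


def csv_to_nested(fileobj):
    """Hypothetical helper: build [{'name': ..., 'columns': [{'name': ...,
    'tests': [...]}]}] from rows with a name,columns,tests header,
    keeping tables and columns in first-seen order."""
    tables = {}  # table name -> {column name -> list of tests}
    for row in csv.DictReader(fileobj):
        columns = tables.setdefault(row['name'], {})
        columns.setdefault(row['columns'], []).append(row['tests'])
    return [{'name': table,
             'columns': [{'name': col, 'tests': tests}
                         for col, tests in cols.items()]}
            for table, cols in tables.items()]


CSV_TEXT = """name,columns,tests
ABC_ESTIMATE_REFINED,cntquota,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,cntquota,not_null
ABC_ESTIMATE_REFINED,is_purged,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,is_purged,not_null
"""

# With a real file: with open('data.csv', newline='') as f: out = csv_to_nested(f)
out = csv_to_nested(io.StringIO(CSV_TEXT))
print(out)
```

Relying on insertion-ordered dicts (guaranteed since Python 3.7) avoids a sort step, which may matter if the test order in the CSV is meaningful.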
