Unnest hierarchical JSON with Python?

I have a JSON document like this:

{
    "department":"Data & Analytics",
    "child":[
        {
            "department":"Data Enginnering",
            "child": [
                {"department":"AWS Squad"},
                {"department":"GCP Squad"}
                ..
                    ..so..
                        ..on..
                            ..so..
                                ..forth..
                                    ..
            ]
        },
        {
            "department":"Data Science"
        }
    ]
}

I need to load it into BigQuery, so I want to transform it into something like this first:

[
    {
        "department":"Data & Analytics",
        "child":["Data Enginnering", "Data Science"]
    },
    {
        "department":"Data Enginnering",
        "child":["AWS Squad", "GCP Squad"]
    },
    {
        "department":"Data Science"
    },
    {
        "department": "AWS Squad"
    },
    {
        "department": "GCP Squad"
    }
]

But I got stuck trying.

Solution:

Since the data is recursive, this can be solved using recursion.

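def convert(data, output):
    """Append a flat record for `data` and all its descendants to `output`.

    Returns the department name so the caller can list it as a child.
    """
    department = data["department"]
    children = data.get("child")

    # Append the parent before recursing so it precedes its descendants.
    new_object = {"department": department}
    output.append(new_object)

    if children:
        # Each recursive call appends the child's record and returns its name.
        new_object["child"] = [convert(child, output) for child in children]

    return department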

It would be used like this:

test_data = {
    "department":"Data & Analytics",
    "child":[
        {
            "department":"Data Enginnering",
            "child": [
                {"department":"Other"},
                {"department":"Sales"}
            ]
        },
        {
            "department":"Data Science"
        }
    ]
}

output = []
convert(test_data, output)
# convert output to json and send to BigQuery...

For the above example, the result is:

[
    {
        "department": "Data & Analytics",
        "child": [
            "Data Enginnering",
            "Data Science"
        ]
    },
    {
        "department": "Data Enginnering",
        "child": [
            "Other",
            "Sales"
        ]
    },
    {
        "department": "Other"
    },
    {
        "department": "Sales"
    },
    {
        "department": "Data Science"
    }
]

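It is not quite the same as your example output: every department still gets its own object, but the order differs. This function walks the tree depth-first, so a department's descendants appear before its siblings, whereas your example lists departments level by level.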
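
If the level-by-level order from the question matters, a breadth-first traversal with a queue is one way to get it. The following is only a sketch under that assumption: `convert_level_order` and `to_ndjson` are hypothetical helper names, not part of the answer above, and the newline-delimited output reflects the JSON format BigQuery load jobs expect (one object per line).

```python
import json
from collections import deque


def convert_level_order(root):
    """Flatten a nested department tree level by level (breadth-first)."""
    output = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        record = {"department": node["department"]}
        children = node.get("child")
        if children:
            # Store only the child names; the children get their own records later.
            record["child"] = [child["department"] for child in children]
            queue.extend(children)
        output.append(record)
    return output


def to_ndjson(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(record) for record in records)


tree = {
    "department": "Data & Analytics",
    "child": [
        {
            "department": "Data Enginnering",
            "child": [{"department": "AWS Squad"}, {"department": "GCP Squad"}],
        },
        {"department": "Data Science"},
    ],
}

rows = convert_level_order(tree)
print(to_ndjson(rows))
```

Because the queue is first-in, first-out, all departments at one depth are emitted before any department at the next depth. The iterative form also avoids Python's recursion limit on very deep hierarchies.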
