Automate fractal like nested JSON normalization

August 30, 2022

The problem :

I have 100+ JSON files with a fractal-like structure of lists of dicts. The width and the height of the data structure vary a lot from one JSON to another. Each label is part of a sentence.

test = [
    {
        "label": "I",
        "children": [
            {
                "label": "want",
                "children": [
                    {
                        "label": "a",
                        "children": [
                            {"label": "coffee"},
                            {"label": "big", "children": [{"label": "piece of cake"}]},
                        ],
                    }
                ],
            },
            {"label": "need", "children": [{"label": "time"}]},
            {"label": "like",
                "children": [{"label": "italian", "children": [{"label": "pizza"}]}],
            },
        ],
    },
    {
        "label": "We",
        "children": [
            {"label": "are", "children": [{"label": "ok"}]},
            {"label": "will", "children": [{"label": "rock you"}]},
        ],
    },
]

I want to automate the normalization of JSON to obtain this type of output :

sentences = [
'I want a coffee', 
'I want a big piece of cake', 
'I need time', 
'I like italian pizza', 
'We are ok',
'We will rock you',
]

It’s really like the os.walk function that returns each "path".

What I tried :

pandas.json_normalize, but it needs predefined meta and record_path arguments to work with complex hierarchies;
jsonpath_ng with parse('[*]..label'), but I couldn't find a way to make it work;
a flatten function like the one in this post, which produces:

{'0label': 'I',
 '0children_0label': 'want',
 '0children_0children_0label': 'a',
 '0children_0children_0children_0label': 'coffee',
 '0children_0children_0children_1label': 'big',
 '0children_0children_0children_1children_0label': 'piece of cake',
 '0children_1label': 'need',
 '0children_1children_0label': 'time',
 '0children_2label': 'like',
 '0children_2children_0label': 'italian',
 '0children_2children_0children_0label': 'pizza',
 '1label': 'We',
 '1children_0label': 'are',
 '1children_0children_0label': 'ok',
 '1children_1label': 'will',
 '1children_1children_0label': 'rock you'}

I tried to split the keys to identify the hierarchy, but I have an indexing problem. For example, I don't understand why some keys like '1children_0label' contain '0label' and not the '1label' index that should refer to {'1label': 'We'}.

while loops that output a list of 'levels', each one a list of tuples containing the number of children (at level n+1) and the label. It was meant to be the first step towards recreating the final output, but I couldn't work this out either.

levels = []
stack = list(test)  # nodes at the current depth, starting with the roots
while stack:
    idx = []       # (number of children, label) for each node at this depth
    children = []  # all nodes at the next depth
    for d in stack:
        kids = d.get('children', [])
        idx.append((len(kids), d['label']))
        children.extend(kids)

    stack = children  # move one level down
    levels.append(idx)

print(levels)

Output :

[[(3, 'I'), (2, 'We')], [(1, 'want'), (1, 'need'), (1, 'like'), (1, 'are'), (1, 'will')], [(2, 'a'), (0, 'time'), (1, 'italian'), (0, 'ok'), (0, 'rock you')], [(0, 'coffee'), (1, 'big'), (0, 'pizza')], [(0, 'piece of cake')]]
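A note on the flattened keys above, as a minimal sketch rather than a definitive fix: assuming the key format shown (each number is the position of a node inside its own parent's list, not a global index), the parent of '1children_0label' is found by dropping the trailing 'children_0' segment, which leaves '1label'. Under that assumption the sentences can be rebuilt from the index paths. The names flat, path_of, labels and is_leaf are hypothetical, and flat is a shortened copy of the dict above.

```python
import re

# Shortened copy of the flattened dict shown above (assumed key format).
flat = {
    "0label": "I",
    "0children_0label": "want",
    "0children_0children_0label": "a",
    "0children_0children_0children_0label": "coffee",
    "0children_1label": "need",
    "0children_1children_0label": "time",
    "1label": "We",
    "1children_0label": "are",
    "1children_0children_0label": "ok",
}


def path_of(key):
    """Turn '1children_0label' into the index path (1, 0)."""
    return tuple(int(n) for n in re.findall(r"\d+", key))


# Map each index path to its label, e.g. (1,) -> 'We', (1, 0) -> 'are'.
labels = {path_of(key): value for key, value in flat.items()}


def is_leaf(path):
    """A path is a leaf if no other path extends it."""
    return not any(
        len(p) > len(path) and p[: len(path)] == path for p in labels
    )


# For each leaf, join the labels of all its prefixes (its ancestors).
sentences_from_flat = [
    " ".join(labels[path[:i]] for i in range(1, len(path) + 1))
    for path in sorted(labels)
    if is_leaf(path)
]

print(sentences_from_flat)
```

Sorting the index paths as tuples keeps the sentences in the same order as the original nesting.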

Please help.

Solution :

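You can try a recursive generator. For a dict with children, it prepends the dict's label to every sentence produced by its children; for a dict without children (a leaf), it yields the label alone; for a list, it yields the sentences of each element in turn: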

def get_sentences(o):
    if isinstance(o, dict):
        if "children" in o:
            for item in get_sentences(o["children"]):
                yield o["label"] + " " + item
        else:
            yield o["label"]
    elif isinstance(o, list):
        for v in o:
            yield from get_sentences(v)


print(list(get_sentences(test)))

Prints:

[
    "I want a coffee",
    "I want a big piece of cake",
    "I need time",
    "I like italian pizza",
    "We are ok",
    "We will rock you",
]
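If some of the files are nested deeply enough to worry about Python's recursion limit, the same traversal can be sketched with an explicit stack instead of recursion. This is an alternative under stated assumptions, not part of the answer above: iter_sentences and sample are hypothetical names, and sample is a shortened copy of the test data from the question.

```python
def iter_sentences(nodes):
    """Yield one sentence per root-to-leaf path, without recursion."""
    # Each stack entry pairs a node with the labels of its ancestors.
    # Reversed so that popping from the end visits nodes in document order.
    stack = [(node, []) for node in reversed(nodes)]
    while stack:
        node, prefix = stack.pop()
        words = prefix + [node["label"]]
        children = node.get("children", [])
        if children:
            stack.extend((child, words) for child in reversed(children))
        else:
            # No children: the path from the root to here is a full sentence.
            yield " ".join(words)


# Shortened copy of the question's test data.
sample = [
    {"label": "I", "children": [
        {"label": "want", "children": [{"label": "a", "children": [
            {"label": "coffee"},
            {"label": "big", "children": [{"label": "piece of cake"}]},
        ]}]},
        {"label": "need", "children": [{"label": "time"}]},
    ]},
    {"label": "We", "children": [
        {"label": "are", "children": [{"label": "ok"}]},
    ]},
]

print(list(iter_sentences(sample)))
```

One behavioural difference: a node whose "children" list is empty is treated as a leaf here, whereas the recursive version yields nothing for such a node.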

data-cleaning

by MR
