Convert a list of JSONs into a dataframe step by step

May 10, 2022

I have a problem. I have a list that contains 2,549,150 elements. However, I don’t want to convert the whole list into a dataframe at once with pd.json_normalize.

I would like to convert the list into a dataframe step by step: first the first 100,000 elements, then the next 100,000 starting from element 100,001, and so on.
However, my dataframe ends up with 2,500,000 rows instead of 2,549,150, so 49,150 elements are missing. How can I fix this?

In summary, I would like to convert the list into a dataframe in steps of 100,000 elements.

import pandas as pd
my_Dict = {
'_key': '1',
 'group': 'test',
 'data': {},
 'type': '',
 'code': '007',
 'conType': '1',
 'flag': None,
 'createdAt': '2021',
 'currency': 'EUR',
 'detail': {
        'selector': {
            'number': '12312',
            'isTrue': True,
            'requirements': [{
                'type': 'customer',
                'requirement': '1'}]
            }
        }   
 }
a1D= [my_Dict] * 2549150
size = 25 # Didn't want to calculate this myself, but didn't know how else to solve it.
df_complete = pd.DataFrame()
for i in range(0,len(a1D),len(a1D)//size):
    #print(i)
    df = pd.json_normalize(a1D[i:i+100000], sep='_')
    #print(df.shape)
    df_complete= pd.concat([df_complete, df])
df_complete.shape
>>> [OUT]
>>> (2500000, 11)

>Solution :

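Your loop steps by len(a1D)//size, which is 101,966, but each slice takes only 100,000 elements. Every iteration therefore skips 1,966 elements, and the 25 iterations yield 25 × 100,000 = 2,500,000 rows. Rather than deriving the step from a guess at how many chunks there should be, step by the chunk size up to the length of the list instead: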

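df_complete = pd.DataFrame()
chunk = 100000  # number of list elements to normalize per step
for i in range(0, len(a1D), chunk):
    # The slice width equals the step, so no elements are skipped;
    # the final slice is simply shorter than the others.
    df = pd.json_normalize(a1D[i:i+chunk], sep='_')
    df_complete = pd.concat([df_complete, df])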

df_complete.shape

Output:

(2549150, 11)
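
If the repeated pd.concat inside the loop becomes slow, one possible variation is to collect the per-chunk dataframes in a list and concatenate once at the end. The sketch below assumes that approach; the helper name normalize_in_chunks and the small sample list are made up for illustration and are not part of the original code. Passing ignore_index=True also gives the result a continuous row index, whereas concatenating without it keeps each chunk's own 0-based index.

```python
import pandas as pd


def normalize_in_chunks(records, chunk=100_000, sep="_"):
    """Hypothetical helper: normalize `records` chunk by chunk, then concatenate once."""
    frames = []
    for start in range(0, len(records), chunk):
        # Slice width equals the step, so every element is covered exactly once.
        frames.append(pd.json_normalize(records[start:start + chunk], sep=sep))
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame()
    # ignore_index=True renumbers the rows 0..n-1 across all chunks.
    return pd.concat(frames, ignore_index=True)


# Small-scale check: 2,549 records in chunks of 100 (the last chunk holds 49).
sample = [{"_key": "1", "detail": {"selector": {"number": "12312"}}}] * 2549
df_small = normalize_in_chunks(sample, chunk=100)
print(df_small.shape)
```

With the full list, the call would be normalize_in_chunks(a1D), which uses the same 100,000-element step as the loop above.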