Convert a list of JSON objects into a dataframe step by step

I have a list that contains 2,549,150 elements. I don't want to convert the whole list into a dataframe at once with pd.json_normalize.

Instead, I would like to convert it step by step: first the first 100,000 elements, then the next 100,000 elements starting at element 100,001, and so on.
However, my dataframe ends up with 2,500,000 rows instead of 2,549,150, so some elements are missing. How can I fix the error?

In summary, I would like to convert the list into a dataframe in chunks of 100,000 elements.

import pandas as pd
my_Dict = {
    '_key': '1',
    'group': 'test',
    'data': {},
    'type': '',
    'code': '007',
    'conType': '1',
    'flag': None,
    'createdAt': '2021',
    'currency': 'EUR',
    'detail': {
        'selector': {
            'number': '12312',
            'isTrue': True,
            'requirements': [{
                'type': 'customer',
                'requirement': '1'}]
        }
    }
}
a1D = [my_Dict] * 2549150
size = 25  # Didn't want to calculate this myself, but didn't know how else to solve it.
df_complete = pd.DataFrame()
for i in range(0, len(a1D), len(a1D)//size):
    #print(i)
    df = pd.json_normalize(a1D[i:i+100000], sep='_')
    #print(df.shape)
    df_complete = pd.concat([df_complete, df])
df_complete.shape
>>> [OUT]
>>> (2500000, 11)

Solution:

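Your loop steps by len(a1D)//size, which is 101,966, but each slice takes only 100,000 elements. The 1,966 elements between the end of one slice and the start of the next are skipped, so 25 iterations yield 25 × 100,000 = 2,500,000 rows. Rather than deriving the step from a guessed number of chunks, step by the chunk size itself, up to the length of the list: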

df_complete = pd.DataFrame()
chunk = 100000
for i in range(0, len(a1D), chunk):
    df = pd.json_normalize(a1D[i:i+chunk], sep='_')
    df_complete = pd.concat([df_complete, df])

df_complete.shape

Output:

(2549150, 11)
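
Calling pd.concat inside the loop copies the growing dataframe on every iteration. A common alternative is to collect the chunks in a list and concatenate once at the end. The following is a minimal sketch of that variant, not part of the original answer: the helper name normalize_in_chunks, its parameters, and the small sample list are hypothetical and chosen only for illustration. It also passes ignore_index=True, an assumption about what is wanted, so that the result has one continuous index instead of one that restarts for every chunk.

```python
import pandas as pd


def normalize_in_chunks(records, chunk_size=100_000, sep="_"):
    """Flatten `records` chunk by chunk and return a single dataframe."""
    frames = []
    for start in range(0, len(records), chunk_size):
        # The slice end may run past len(records); Python clips it,
        # so the last chunk simply holds the remaining elements.
        chunk = records[start:start + chunk_size]
        frames.append(pd.json_normalize(chunk, sep=sep))
    if not frames:
        return pd.DataFrame()
    # Concatenate once at the end; ignore_index=True renumbers rows 0..n-1.
    return pd.concat(frames, ignore_index=True)


# Small stand-in for the real list: 25 records in chunks of 10
# gives chunks of 10, 10 and 5 rows.
sample = [{"_key": "1", "detail": {"selector": {"number": "12312"}}}] * 25
df_small = normalize_in_chunks(sample, chunk_size=10)
print(df_small.shape)  # (25, 2)
```

With the list from the question, the call would be normalize_in_chunks(a1D, chunk_size=100000).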