Trying to stream my (very large) JSON file with ijson – is it formatted wrong?

I’m trying to stream through a large JSON file using ijson in Python. This is my first time trying this.

My code is really simple right now:

import ijson

with open('file.json', 'rb') as f:
    j = ijson.items(f, 'item')

    # iterate while the file is still open; ijson reads lazily
    for item in j:
        print('x')

This raises a "trailing garbage" error. Essentially, the second item in the file is treated as garbage, I think because of the file format.

My JSON file is this one from Kaggle, and it is formatted like this:

{"_id":{"$oid":"6457879fd1187d621cbbba9c"},"sourceCC":"us",...etc...}
{"_id":{"$oid":"6457879fd1187d621cbddd8a"},"sourceCC":"us",...etc...}

It is about 3 GB in size, so I’m unable to open it.

If I use ‘multiple_values=True’, I believe it treats all the items as multiple values for the same item, so it does not raise an error, but it also does not return anything.

What can I do?

Thanks.

Solution:

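That’s not actually a single JSON document. It is a series of JSON documents separated by newlines (a layout often called JSON Lines or NDJSON), which is why the parser reports everything after the first object as trailing garbage. You don’t need ijson to read it; you can instead read it line by line and use the built-in json module: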

import json

# Each line is a complete JSON document, so only one line is in memory at a time
with open('myfile.json', encoding='utf-8') as fd:
    for line in fd:
        obj = json.loads(line)
        # do something with obj here
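
If the file might contain blank lines or an occasional malformed record, the loop above could be wrapped in a small generator. The following is only a sketch under those assumptions: the name iter_json_lines and the choice to raise ValueError with the line number are illustrative, not part of the json module or of the original answer.

```python
import json
from typing import Any, Iterator


def iter_json_lines(path: str) -> Iterator[Any]:
    """Yield one parsed object per non-blank line of a newline-delimited JSON file.

    Hypothetical helper for illustration; the file is read lazily, so a
    multi-gigabyte file is never loaded into memory as a whole.
    """
    with open(path, encoding="utf-8") as fd:
        for line_number, line in enumerate(fd, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. an extra trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report which line failed instead of a bare parser error
                raise ValueError(
                    f"{path}: invalid JSON on line {line_number}: {exc}"
                ) from exc
```

It would be used the same way as the plain loop, for example "for obj in iter_json_lines('myfile.json'):", with the per-record work going in the loop body.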