"trailing data" error when reading json to Pandas dataframe

I have a Python 3.8.5 script that fetches JSON from an API, saves it to disk, and then reads the file into a DataFrame. This works:

df = pd.io.json.read_json('json_file', orient='records')

I want to use an in-memory buffer instead so I don’t have to write to and read from disk, but I am getting an error. The code looks like this:

import json
import pandas as pd
from io import StringIO

io = StringIO()
json_out = []
# some code to append API results to json_out
json.dump(json_out, io)
df = pd.io.json.read_json(io.getvalue())

On that last line I get this error:


  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 199, in wrapper
    return func(*args, **kwargs)

  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)

  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 618, in read_json
    result = json_reader.read()

  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 755, in read
    obj = self._get_object_parser(self.data)

  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 777, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()

  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 886, in parse
    self._parse_no_numpy()

  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 1119, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None

ValueError: Trailing data

The JSON is a list of objects. This is not the actual data, but it looks like this when I write it to disk:

json = [
    {"state": "North Dakota",
     "address": "123 30th st E #206",
     "account": "123"
     },
    {"state": "North Dakota",
     "address": "456 30th st E #206",
     "account": "456"
     }
]

Given that it worked in the first case (write/read from disk), I don’t know how to troubleshoot. How do I troubleshoot something in the buffer? The actual data is mostly text but has some number fields.

Solution:

I don’t know what’s going wrong for you; this works for me:

import json
import pandas as pd
from io import StringIO

json_out = [
    {"state": "North Dakota",
     "address": "123 30th st E #206",
     "account": "123"
     },
    {"state": "North Dakota",
     "address": "456 30th st E #206",
     "account": "456"
     }
]

io = StringIO()
json.dump(json_out, io)
df = pd.io.json.read_json(io.getvalue())
print(df)

This leads me to believe there’s something wrong with the code that appends the API data.
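
One way to troubleshoot the buffer is to parse its contents with the standard library before handing them to pandas, since `json.loads` reports where parsing stopped. The sketch below also reproduces one possible cause. This is only an assumption, because the appending code is not shown: calling `json.dump` on the same buffer more than once (for example, once per API page) writes several JSON documents back to back, and a strict parser rejects everything after the first one. The `pages` data and the variable names are made up for illustration.

```python
import json
from io import StringIO

# Hypothetical API results, one list per page (not from the original question).
pages = [
    [{"state": "North Dakota", "account": "123"}],
    [{"state": "North Dakota", "account": "456"}],
]

# Suspected bug pattern: dumping inside the loop writes "[...][...]" to the buffer.
bad = StringIO()
for page in pages:
    json.dump(page, bad)

try:
    json.loads(bad.getvalue())
    bad_error = None
except json.JSONDecodeError as exc:
    # exc.msg says what went wrong, exc.pos says where in the text.
    bad_error = (exc.msg, exc.pos)

# Collecting everything into one list and dumping once gives a single document.
json_out = [record for page in pages for record in page]
good = StringIO()
json.dump(json_out, good)
parsed = json.loads(good.getvalue())

print(bad_error)
print(len(parsed))
```

If the first parse fails, printing a slice of the buffer text around the reported position usually shows what the extra content is.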

However, if you have a list of dictionaries, you don’t need the IO step. You can just do:

pd.DataFrame(json_out)
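
As a minimal sketch of that shortcut, using the sample records from the question: building the frame directly from the list of dictionaries skips JSON parsing altogether, so string fields such as `account` stay strings. By contrast, `read_json` tries to infer dtypes by default and may convert numeric-looking strings to numbers.

```python
import pandas as pd

# Sample records copied from the question; the real API data is assumed
# to have the same list-of-dicts shape.
json_out = [
    {"state": "North Dakota", "address": "123 30th st E #206", "account": "123"},
    {"state": "North Dakota", "address": "456 30th st E #206", "account": "456"},
]

# One row per dictionary, one column per key.
df = pd.DataFrame(json_out)
accounts = df["account"].tolist()
print(df)
```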

EDIT: I think I remember getting this error when there was a trailing comma in my JSON, like so:

[
  {
    "hello":"world",
  },
]
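
Trailing commas are not valid JSON, so a quick check with the standard library's `json.loads` can confirm or rule out this cause. This is only a sketch: the exact error pandas reports for a trailing comma may differ from "Trailing data", and the standard library's message also varies between Python versions.

```python
import json

# The snippet above, written on one line.
text = '[{"hello": "world",},]'

try:
    json.loads(text)
    is_valid = True
except json.JSONDecodeError as exc:
    is_valid = False
    reason = exc.msg  # wording varies between Python versions

# The same data without trailing commas parses fine.
cleaned = json.loads('[{"hello": "world"}]')
print(is_valid, cleaned)
```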