Parsing data containing escaped quotes and separators in python

I have data that is structured like this:

1661171420, foo="bar", test="This, is a \"TEST\"", count=5, com="foo, bar=blah"

It always starts with a Unix timestamp, but I can't know in advance how many other fields follow or what they are called.

The goal is to parse this into a dictionary like this:

{"timestamp": 1661171420,
 "foo": "bar",
 "test": 'This, is a "TEST"',
 "count": 5,
 "com": "foo, bar=blah"}

I'm having trouble parsing this, especially the escaped quotes and the commas inside values.
What would be the best way to parse this correctly, preferably without any third-party modules?
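One standard-library alternative, sketched here rather than taken from the answer below, handles the escapes inside the regular expression itself: the quoted-value branch consumes a backslash together with the character after it, so an escaped quote can never close the string. It assumes that `\"` and `\\` are the only escape sequences, that every field after the timestamp has the form `key=value`, and that unquoted values never contain commas. `FIELD_RE`, `UNESCAPE_RE`, `convert` and `parse_line` are hypothetical names.

```python
import re

# Hypothetical pattern, assuming each field is key=value where the value is
# either "double-quoted" (with backslash escapes) or a bare token ending at a comma.
#   group 1: key
#   group 2: contents of a quoted value (None for a bare value)
#   group 3: a bare value (None for a quoted value)
FIELD_RE = re.compile(r'(\w+)=(?:"((?:[^"\\]|\\.)*)"|([^,]*))')

# A backslash followed by any character; used to drop the backslash.
UNESCAPE_RE = re.compile(r'\\(.)')


def convert(bare):
    """Return bare as an int or float if it parses as one, else as a string."""
    for cast in (int, float):
        try:
            return cast(bare)
        except ValueError:
            pass
    return bare


def parse_line(line):
    """Parse 'timestamp, key=value, ...' into a dict (sketch, see assumptions above)."""
    timestamp, _, rest = line.partition(",")
    result = {"timestamp": int(timestamp)}
    # finditer resumes after each match, so text inside an already-consumed
    # quoted value is never re-scanned for keys.
    for match in FIELD_RE.finditer(rest):
        key, quoted, bare = match.groups()
        if quoted is not None:
            result[key] = UNESCAPE_RE.sub(r"\1", quoted)
        else:
            result[key] = convert(bare.strip())
    return result


line = r'1661171420, foo="bar", test="This, is a \"TEST\"", count=5, com="foo, bar=blah"'
print(parse_line(line))
```

Quoted values are always returned as strings, so `n="5"` stays `"5"`; only bare values go through `int`/`float`.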

Solution:

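If changing the format of the input data is not an option (JSON would be much easier to handle, but if the data comes from an API, as you say, you might be stuck with this format), the following works as long as the input more or less follows the structure shown. It temporarily swaps escaped quotes for a placeholder, pulls out quoted string values and numeric values with two regular expressions, and then restores the quotes. It is not the cleanest solution, I agree, but it does the job.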

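import re

# Temporarily replace escaped quotes (\") with a placeholder (''') so that
# the string pattern below can treat every remaining " as a delimiter.
d = r'''1661171420, foo="bar", test="This, is a \"TEST\"", count=5, com="foo, bar=blah", fraction=-0.11'''.replace(r"\"", "'''")

# key="value" pairs; the value contains no unescaped double quotes.
string_pattern = re.compile(r'''(\w+)="([^"]*)"''')

parsed_data = {}
# The timestamp is everything before the first ", ".
parsed_data['timestamp'] = int(d.partition(", ")[0])
for match in string_pattern.finditer(d):
    # Turn the placeholder back into a literal double quote.
    parsed_data[match.group(1)] = match.group(2).replace("'''", "\"")

# key=number pairs (integers or decimals, optionally signed).
number_pattern = re.compile(r'''(\w+)=([+-]?\d+(?:\.\d+)?)''')

# Search only the text outside the quoted strings, so that a value such as
# com="x=5" is not also picked up as a numeric field x.
unquoted = string_pattern.sub("", d)
for match in number_pattern.finditer(unquoted):
    try:
        parsed_data[match.group(1)] = int(match.group(2))
    except ValueError:
        parsed_data[match.group(1)] = float(match.group(2))

print(parsed_data)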
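Another standard-library route, swapped in here and not part of the answer above, is `shlex` in POSIX mode: with commas and spaces configured as separators, it splits only outside double quotes, removes the quotes, and turns `\"` inside them into `"`. The name `parse_with_shlex` is hypothetical, and the sketch assumes the same `timestamp, key=value, ...` layout as the question.

```python
import shlex


def parse_with_shlex(line):
    """Parse 'timestamp, key=value, ...' with shlex (sketch, see trade-offs below)."""
    lexer = shlex.shlex(line, posix=True)
    lexer.whitespace = ", "        # commas and spaces separate tokens outside quotes
    lexer.whitespace_split = True  # keep '=', '-', '.' etc. inside a token
    lexer.quotes = '"'             # only double quotes start a quoted section
    lexer.commenters = ""          # do not treat '#' as the start of a comment
    tokens = list(lexer)           # POSIX mode removes quotes and the \ before \"

    result = {"timestamp": int(tokens[0])}
    for token in tokens[1:]:
        key, _, value = token.partition("=")  # split on the first '=' only
        for cast in (int, float):
            try:
                value = cast(value)
                break
            except ValueError:
                pass
        result[key] = value
    return result


line = r'1661171420, foo="bar", test="This, is a \"TEST\"", count=5, com="foo, bar=blah"'
print(parse_with_shlex(line))
```

The trade-off is that the quotes are gone by the time the values are converted, so `n="5"` and `n=5` both become the integer `5`, and an unquoted value containing a space would be split into two tokens. If the quoted/unquoted distinction matters, the regex-based parsing is the safer choice.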