Parse data spanning multiple lines where a pattern is found

I need to parse the file below, where each row starts with a date and any row can span multiple lines. In other words, the row delimiter should be the date rather than the newline.

2021-01-01 INFO Workflow successful
2021-02-02 ERROR Workflow Failed due to below error:
    Data Type mismatch
    at Line number 30
2021-03-03 INFO Workflow successful 

Code:

import json
import re
result = []
with open(r"C:\DUMMY\log\a1.txt", "r") as f:
    lines = f.readlines()
    for line in lines:
        data = line.split(' ')
        x = re.search(r'^\d{4}-\d{2}-\d{2}.*?', data[0])
        if x is not None:
            result.append({'Date':data[0], 'Severity':data[1], 'Message':' '.join(data[2:])})
        
data = json.dumps(result)
jsondata = json.loads(data)
print(jsondata)

Actual Output:

Because the second row spans multiple lines, its continuation lines are not parsed. I need help capturing everything up to the next row that starts with a date.

[{'Date': '2021-01-01',
  'Severity': 'INFO',
  'Message': 'Workflow successful\n'},
 {'Date': '2021-02-02',
  'Severity': 'ERROR',
  'Message': 'Workflow Failed due to below error:\n'},
 {'Date': '2021-03-03',
  'Severity': 'INFO',
  'Message': 'Workflow successful\n'}]

Expected Output:

[{'Date': '2021-01-01',
  'Severity': 'INFO',
  'Message': 'Workflow successful'},
 {'Date': '2021-02-02',
  'Severity': 'ERROR',
  'Message': 'Workflow Failed due to below error: Data Type mismatch at Line number 30'},
 {'Date': '2021-03-03',
  'Severity': 'INFO',
  'Message': 'Workflow successful'}]

Solution:

Does this fix your issue?

if x is not None:
    # line starts with a date: begin a new row
    result.append({'Date':data[0], 'Severity':data[1], 'Message':' '.join(data[2:]).strip()})
else:
    # continuation line: append it to the previous row's message
    result[-1]['Message'] += ' ' + line.strip()

Note that I’ve made the following assumption: each row is a line that starts with a date, optionally followed by additional lines that describe the row or error in more detail. If the assumption does not hold, the output may be incorrect; in particular, if the first line of the file does not start with a date, result is still empty and result[-1] raises an IndexError.
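
For reference, here is a minimal self-contained sketch of the same idea that also guards against the empty-result case. The function name parse_log, the RECORD_START pattern, and the in-memory sample list are illustrative assumptions, not part of the original code. The sketch also assumes that every dated line carries a severity token, and that continuation lines appearing before any dated line can simply be dropped.

```python
import re

# Assumed record format: "<YYYY-MM-DD> <severity> <message>"
RECORD_START = re.compile(r'^(\d{4}-\d{2}-\d{2})\s+(\S+)\s*(.*)$')

def parse_log(lines):
    """Group lines into rows; a line starting with a date opens a new row."""
    result = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        match = RECORD_START.match(text)
        if match:
            date, severity, message = match.groups()
            result.append({'Date': date, 'Severity': severity, 'Message': message})
        elif result:
            # continuation line: append it to the most recent row
            result[-1]['Message'] += ' ' + text
        # otherwise: continuation text before any dated line is dropped (assumption)
    return result

# Sample data copied from the question
sample = [
    "2021-01-01 INFO Workflow successful\n",
    "2021-02-02 ERROR Workflow Failed due to below error:\n",
    "    Data Type mismatch\n",
    "    at Line number 30\n",
    "2021-03-03 INFO Workflow successful \n",
]
records = parse_log(sample)
print(records)
```

Because parse_log accepts any iterable of lines, an open file object can be passed to it directly in place of the sample list.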
