Parsing a log file and ignoring text between two targets

This question is a follow-up to my previous question here: Parsing text and JSON from a log file and keeping them together

I have a log file, your_file.txt, with the following structure, and I would like to extract the timestamp, run, user, and JSON:

A whole bunch of irrelevant text
2022-12-15 12:45:06 garbage, run: 1, user: james json:
[{"value": 30, "error": 8}]

Another Stack Overflow user kindly provided this abridged code to extract the relevant pieces:

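import re

# (?m): ^ and $ match at every line; (?s): . also matches newlines
pat = re.compile(
    r'(?ms)^([^,\n]+),\s*run:\s*(\S+),\s*user:\s*(.*?)\s*json:\n(.*?)$'
)

with open('your_file.txt', 'r') as f_in:
    # One tuple per entry: (text before the first comma, run, user, JSON text)
    print(pat.findall(f_in.read()))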

This returns the following value, which is then processed further:

[('2022-12-15 12:45:06 garbage', '1', 'james', '[{"value": 30, "error": 8}]')]
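The question does not show what "processed further" involves. As a minimal sketch, assuming the goal is one dictionary per log entry with the run converted to an integer and the JSON text decoded with json.loads, each tuple could be handled like this. The record_from_match helper and the dictionary keys are hypothetical, not part of the original code.

```python
import json


def record_from_match(match):
    """Turn one (timestamp, run, user, json_text) tuple into a dict (hypothetical helper)."""
    timestamp, run, user, json_text = match
    return {
        "timestamp": timestamp,
        "run": int(run),                # the regex captures the run number as a string
        "user": user,
        "data": json.loads(json_text),  # decode the JSON array captured after "json:"
    }


# Output of pat.findall shown above (timestamp still contains "garbage")
matches = [('2022-12-15 12:45:06 garbage', '1', 'james', '[{"value": 30, "error": 8}]')]
records = [record_from_match(m) for m in matches]
print(records[0]["data"][0]["value"])  # → 30
```

Note that the timestamp field still carries the trailing word, which is exactly what the question below asks to avoid.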

How can I amend the regex so that it ignores the word "garbage" after the timestamp and that word is no longer included in the output of pat.findall?

Solution:

You can use a date-time pattern to capture only the timestamp, and then match (without capturing) the rest of the substring up to the comma:

(?ms)^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})[^,\n]*,\s*run:\s*(\S+),\s*user:\s*(.*?)\s*json:\n(.*?)$
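To check the pattern without a file on disk, a small sketch can run it against the sample log from the question, held in a string (log_text is an illustrative name, standing in for f_in.read()):

```python
import re

# Sample log content from the question
log_text = (
    "A whole bunch of irrelevant text\n"
    "2022-12-15 12:45:06 garbage, run: 1, user: james json:\n"
    '[{"value": 30, "error": 8}]\n'
)

# Group 1 now captures only the date-time; [^,\n]* skips the rest up to the comma
pat = re.compile(
    r'(?ms)^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})[^,\n]*,\s*run:\s*(\S+),\s*user:\s*(.*?)\s*json:\n(.*?)$'
)

print(pat.findall(log_text))
# [('2022-12-15 12:45:06', '1', 'james', '[{"value": 30, "error": 8}]')]
```

The only change from the question's pattern is the first group, so the remaining tuple fields are unchanged.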

The ([^,\n]+) group is replaced with (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})[^,\n]*, which matches:

  • (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) – Group 1: four digits followed by two occurrences of a hyphen and two digits (the date), then a space, two digits, and two occurrences of a colon and two digits (the time)
  • [^,\n]* – zero or more characters other than a comma or a newline; this consumes " garbage" without capturing it
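The breakdown above maps one-to-one onto pieces of the pattern. As a sketch, the same regex can be written with re.VERBOSE so each piece carries its own comment; the named groups (timestamp, run, user, json) are illustrative choices, not part of the original answer. re.finditer with groupdict() then returns labelled fields instead of positional tuples.

```python
import re

# Same pattern as the answer, split up with re.VERBOSE so each piece can be commented.
# In verbose mode unescaped whitespace is ignored, so the literal space is written as "\ ".
pat = re.compile(
    r"""
    ^(?P<timestamp>\d{4}-\d{2}-\d{2}\ \d{2}:\d{2}:\d{2})  # date and time only
    [^,\n]*                                                # skip the rest (e.g. garbage) up to the comma
    ,\s*run:\s*(?P<run>\S+)                                # run number
    ,\s*user:\s*(?P<user>.*?)\s*json:\n                    # user name before the json: marker
    (?P<json>.*?)$                                         # JSON text up to the end of that line
    """,
    re.MULTILINE | re.DOTALL | re.VERBOSE,
)

# Sample log content from the question
log_text = (
    "A whole bunch of irrelevant text\n"
    "2022-12-15 12:45:06 garbage, run: 1, user: james json:\n"
    '[{"value": 30, "error": 8}]\n'
)

for m in pat.finditer(log_text):
    print(m.groupdict())
# {'timestamp': '2022-12-15 12:45:06', 'run': '1', 'user': 'james', 'json': '[{"value": 30, "error": 8}]'}
```

Named groups make the downstream processing independent of group order, which helps if more fields are added to the pattern later.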