Use regex to capture only some of the lines inside a specific section of a log file

I have the following structure in a log file:

message0
Begin
message1
Error: message2
message3
Error: message4
End
Error: message5

Note that this is a simplified version; the real file can contain many more lines. The messages I want to match are the Error messages that meet both of these conditions:

  1. The line is between Begin and End
  2. The line starts with Error:

So for this example, the regex should return message2 and message4.

I tried using this pattern:
Begin\n((Error: (?<message>.*)\n)|(.*\n))*End

But the message group returns only message4. Why is that? And how should I fix the regex to capture both messages?

Here is a link to what I tried: https://regex101.com/r/tfDs8t/1

>Solution:

You have shared a link to a JavaScript sample. Your current pattern repeats a group until no further match is found, and a repeated capture group keeps only its most recent capture, so the message group ends up holding just the last Error line it matched. I’m not aware of any fancy JS-style pattern other than to assert that ‘End’ still follows and that we do not cross over ‘Begin’.
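To make that behaviour concrete, here is a minimal JavaScript sketch that runs the pattern from the question against the sample log. The variable names (sampleLog, originalPattern, lastCapture) are made up for illustration, and the log is assumed to use \n line endings.

```javascript
// Sample log from the question, joined with "\n" (an assumption about line endings).
const sampleLog = [
  "message0",
  "Begin",
  "message1",
  "Error: message2",
  "message3",
  "Error: message4",
  "End",
  "Error: message5",
].join("\n");

// The pattern from the question: the named group sits inside a repeated group.
const originalPattern = /Begin\n((Error: (?<message>.*)\n)|(.*\n))*End/;

const result = sampleLog.match(originalPattern);

// Only the capture from the last iteration that matched an Error line survives.
const lastCapture = result.groups.message;
console.log(lastCapture); // message4
```

The whole Begin ... End block is matched once, so there is only one slot for the message group, and message2 is overwritten along the way.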

^Error: (?<message>.*)(?=(?:\n(?!Begin$).*)*\nEnd$)

See an online demo, which I altered a little (I added a first error message) to demonstrate the pattern. You can also see that I used the multiline regex option, so that ^ and $ match at the start and end of every line and a match can begin on any line.


  • ^Error: (?<message>.*) – Start-of-line anchor, followed by the literal ‘Error: ’ and a named capture group holding 0+ characters (greedy), i.e. the rest of the line;
  • (?= – Open a positive lookahead;
    • (?:\n(?!Begin$).*)* – Match a newline followed by 0+ characters (only when the line is not exactly ‘Begin’), and repeat this 0+ times;
    • \nEnd$) – Match another newline and the word ‘End’ before the end-of-line anchor, then close the lookahead.
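
The pattern broken down above can be tried out with a short JavaScript sketch. It assumes the sample log from the question with \n line endings; the names sampleLog, errorInBlock and messages are hypothetical. The g flag is required by matchAll, and the m flag makes ^ and $ anchor at every line.

```javascript
// Sample log from the question, joined with "\n" (an assumption about line endings).
const sampleLog = [
  "message0",
  "Begin",
  "message1",
  "Error: message2",
  "message3",
  "Error: message4",
  "End",
  "Error: message5",
].join("\n");

// Each Error line is matched on its own; the lookahead checks that "End"
// follows later without a "Begin" line in between.
const errorInBlock = /^Error: (?<message>.*)(?=(?:\n(?!Begin$).*)*\nEnd$)/gm;

// matchAll yields one match per qualifying Error line.
const messages = [...sampleLog.matchAll(errorInBlock)].map(
  (match) => match.groups.message
);

console.log(messages); // [ 'message2', 'message4' ]
```

Because every Error line is now a separate match, each one gets its own message capture instead of overwriting the previous one. The trailing "Error: message5" is skipped because no "End" line follows it.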