I have the following structure in a log file:
message0
Begin
message1
Error: message2
message3
Error: message4
End
Error: message5
Note that this is a simplified version; the real file can contain many more lines. The messages I want to match are the Error messages, meaning:
- The line is between Begin and End
- The line starts with Error:
So for this example, the regex should return message2 and message4.
I tried using this pattern:
Begin\n((Error: (?<message>.*)\n)|(.*\n))*End
But the message group returns only message4. Why is that? And how should I fix the regex to capture both messages?
Here is a link to what I tried: https://regex101.com/r/tfDs8t/1
>Solution :
You have shared a link to a JavaScript sample. Your current pattern repeats a capture group until no further match is found, and a repeated capture group keeps only its last capture, which is why you get only message4. I'm not aware of any fancy JS-style pattern other than matching each Error line separately and asserting with a lookahead that 'End' still follows and that we do not cross over 'Begin':
^Error: (?<message>.*)(?=(?:\n(?!Begin$).*)*\nEnd$)
See an online demo, which I altered a little (I added an extra error message at the start) to demonstrate the pattern. I also used the multiline regex option, so that ^ and $ match at the start and end of every line and a match can begin on any line.
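A minimal sketch of running this pattern in JavaScript, assuming the whole log is already in a string. The sample text copies the example from the question plus one extra Error line before Begin; the variable names are only for illustration.

```javascript
// Sample log: the question's example plus an extra Error line before Begin.
const logText = [
  "message0",
  "Error: before",
  "Begin",
  "message1",
  "Error: message2",
  "message3",
  "Error: message4",
  "End",
  "Error: message5",
].join("\n");

// g: find every match (required by matchAll); m: ^ and $ match per line.
const errorInBlock = /^Error: (?<message>.*)(?=(?:\n(?!Begin$).*)*\nEnd$)/gm;

// Collect the named group from each match.
const messages = [...logText.matchAll(errorInBlock)].map(
  (m) => m.groups.message
);

console.log(messages); // [ 'message2', 'message4' ]
```

The Error lines before Begin and after End are skipped because the lookahead cannot reach a line equal to End without first hitting a line equal to Begin or running out of text.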
- ^Error: (?<message>.*) - Start-line anchor, followed by the literal 'Error: ' and a named capture group that holds 0+ characters (greedy);
- (?= - Open a positive lookahead;
- (?:\n(?!Begin$).*)* - Match a newline character followed by 0+ characters (as long as the line does not equal 'Begin'), and repeat this 0+ times;
- \nEnd$) - Match another newline and the word 'End' before the end-line anchor, then close the lookahead.
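For comparison, here is a small sketch of why the original pattern from the question yields only message4. It assumes the question's sample text without a trailing newline; the variable names are hypothetical.

```javascript
// The sample log from the question.
const sample = [
  "message0",
  "Begin",
  "message1",
  "Error: message2",
  "message3",
  "Error: message4",
  "End",
  "Error: message5",
].join("\n");

// The original attempt: one match spanning Begin..End, with the
// named group sitting inside a repeated outer group.
const original = /Begin\n((Error: (?<message>.*)\n)|(.*\n))*End/;

const match = original.exec(sample);

// There is a single match, so there is a single value for the named group.
// Each repetition overwrites it, and only the value from the final
// repetition (the line right before End) is kept.
console.log(match.groups.message); // message4
```

Since one match can expose only one value per group, getting both messages means producing one match per Error line, which is what the lookahead version does.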