I was getting quite far with regex101, but now I am stuck.
I want to extract a string between "markers" using a regex in Python 3.9.
From each of the following example lines I want to get foobar back. The "marker" is =, but that marker has some edge cases.
1. lore =foobar= ipsum (there is a space before the first = and after the second =)
2. lore =foobar=.
3. =foobar= ipsum
4. lore =foobar=
This should not match, because =x is not allowed:
lore =foobar=x
This is the regex I am using (Python 3.9):
 =(.*?)=[ .]
(Note that the pattern begins with a space before the first =!)
I can already handle the characters after the second marker: a space or a period is allowed there.
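Examples 1 and 2 work, but 3 and 4 are not matched.
What is missing is the case where no character follows the second marker, i.e. the end of the line (example 4).
Also, at the beginning, I don't know how to check for "no character before =" OR "a space before =" (example 3).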
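One way to reproduce the problem is to run the question's pattern over the four examples and the line that must not match. This is a minimal sketch: the helper name extract and the EXAMPLES list are hypothetical, introduced only for illustration, and re.search is used so that the pattern may match anywhere in the line.

```python
import re

# The pattern from the question; note that it starts with a literal space.
QUESTION_PATTERN = re.compile(r" =(.*?)=[ .]")

# Hypothetical test data: the numbered examples plus the line that must not match.
EXAMPLES = [
    ("1", "lore =foobar= ipsum"),
    ("2", "lore =foobar=."),
    ("3", "=foobar= ipsum"),
    ("4", "lore =foobar="),
    ("must not match", "lore =foobar=x"),
]


def extract(pattern, line):
    """Return capture group 1 of the first match in line, or None."""
    match = pattern.search(line)
    return match.group(1) if match else None


for label, line in EXAMPLES:
    print(f"{label}: {line!r} -> {extract(QUESTION_PATTERN, line)}")
```

Run as is, it reports foobar for examples 1 and 2 and None for examples 3 and 4: example 3 has no space before the first =, and example 4 has nothing after the second =, so the required [ .] cannot match. The =x line is correctly rejected.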
>Solution :
You could write the pattern as:
(?:^| )=(.*?)=(?:[ .]|$)
- (?:^| ) Non-capture group with an alternation (|) matching either a space or the start of the string (asserted with ^)
- = Match literally
- (.*?) Capture group 1, match any character except a newline, as few times as possible
- = Match literally
- (?:[ .]|$) Match either a space or a period, or assert the end of the string
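A short check of this pattern against the question's examples might look like the sketch below. It assumes each line is searched as its own string: without re.MULTILINE, ^ and $ anchor to the start and end of the whole string, so if you search one multi-line string instead, you would pass flags=re.MULTILINE. The list names are illustrative only.

```python
import re

# First pattern from the answer:
#   (?:^| )     start of the string or a single space
#   =(.*?)=     the markers, capturing as little as possible in between
#   (?:[ .]|$)  a space, a period, or the end of the string
PATTERN = re.compile(r"(?:^| )=(.*?)=(?:[ .]|$)")

SHOULD_MATCH = [
    "lore =foobar= ipsum",
    "lore =foobar=.",
    "=foobar= ipsum",
    "lore =foobar=",
]
SHOULD_NOT_MATCH = ["lore =foobar=x"]

for line in SHOULD_MATCH:
    match = PATTERN.search(line)
    print(repr(line), "->", match.group(1) if match else None)

for line in SHOULD_NOT_MATCH:
    # search() returns None when there is no match anywhere in the line.
    print(repr(line), "->", PATTERN.search(line))
```

All four examples yield foobar, and the =x line yields None.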
If there cannot be any equals sign in between, you might also write the pattern as:
(?<!\S)=([^=\n]*)=(?:[ .]|$)
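- (?<!\S) Assert a whitespace boundary to the left (the = is at the start of the string or preceded by a whitespace character)
- = Match literally
- ([^=\n]*) Capture group 1, match zero or more characters other than = or a newline
- = Match literally
- (?:[ .]|$) Match either a space or a period, or assert the end of the string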
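The sketch below compares this stricter pattern with the first one from this answer. The two extra input lines are hypothetical, chosen only to show where the patterns differ: one has an = between the markers, and one has a tab instead of a space before the first =. The names STRICT, LAZY and extract are illustrative.

```python
import re

# Second pattern from the answer: the first "=" must not be preceded by a
# non-whitespace character, and the captured text may not contain "=" or "\n".
STRICT = re.compile(r"(?<!\S)=([^=\n]*)=(?:[ .]|$)")
# First pattern from the answer, for comparison.
LAZY = re.compile(r"(?:^| )=(.*?)=(?:[ .]|$)")


def extract(pattern, line):
    """Return capture group 1 of the first match in line, or None."""
    match = pattern.search(line)
    return match.group(1) if match else None


LINES = [
    "lore =foobar= ipsum",
    "lore =foobar=.",
    "=foobar= ipsum",
    "lore =foobar=",
    "lore =foobar=x",        # must not match
    "lore =foo=bar= ipsum",  # an "=" between the markers
    "lore\t=foobar=",        # a tab, not a space, before the first "="
]

for line in LINES:
    print(f"{line!r}: strict={extract(STRICT, line)!r} lazy={extract(LAZY, line)!r}")
```

Both patterns return foobar for the four examples and None for the =x line. On the last two lines they differ: the stricter pattern rejects the line with an inner = (the first pattern captures foo=bar), and it accepts the tab, because (?<!\S) allows any whitespace while (?:^| ) only allows a space.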