Regex with multiple cues: find all shortest options

I have a problem closely related to this question:
Regex find match within a string

In that case, the goal was to capture Warner Music Group rather than XYZ becomes Chief Digital Officer and EVP, Business Development of Warner Music Group for

Ole Abraham  of XYZ becomes Chief Digital Officer and EVP, Business Development of Warner Music Group.

which is solved using .*\bof\s+([^.]+)
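As a minimal sketch of why that pattern picks the last "of", here it is run with Python's re module (the choice of Python and the variable names are assumptions for illustration; the linked question does not specify a language). The greedy .* runs to the end of the line and backtracks to the final "of", so the group captures only what follows it.

```python
import re

# Sentence from the linked question.
sentence = ("Ole Abraham  of XYZ becomes Chief Digital Officer and EVP, "
            "Business Development of Warner Music Group.")

# Greedy .* consumes as much as possible, so \bof matches the LAST "of".
match = re.search(r".*\bof\s+([^.]+)", sentence)
company = match.group(1) if match else None
print(company)  # Warner Music Group
```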

Now I have a very similar problem, with the difference that I want all matches, and the previous solution returns only one. Here you have my basic setup with the solution above: https://regex101.com/r/bIbFaW/1

The problem is that for the string

This is a test with a string with punctuation, and an end. Then test words, and more text. And here whith more text with more punctuation, like that.

the pattern .*\bwith(.*?), only returns more punctuation (a good match) and misses the earlier candidate punctuation from the first sentence.

Is it possible to do this, or should I approach it differently? For example, with(.*?), finds all matches, but they are the longer options (a string with punctuation instead of punctuation). I could then search for matches within those matches, but at the moment that would add unrelated overhead that I would like to avoid if possible.
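To make the two behaviours concrete, here is a small sketch in Python (an assumption; the regex101 link does not tie the question to a particular language, and the variable names are only illustrative). It runs both patterns over the test string with re.findall.

```python
import re

text = ("This is a test with a string with punctuation, and an end. "
        "Then test words, and more text. "
        "And here whith more text with more punctuation, like that.")

# Greedy prefix: .* backtracks to the last "with", so only one match is found.
last_only = re.findall(r".*\bwith(.*?),", text)

# No prefix: every match is found, but each starts at the FIRST "with"
# before a comma, so the first capture is the longer option.
longer = re.findall(r"with(.*?),", text)

print(last_only)  # [' more punctuation']
print(longer)     # [' a string with punctuation', ' more punctuation']
```

Note that the lazy (.*?) only minimises how far the match extends to the right; it does not move the starting "with" closer to the comma, which is why the first capture is still the long one.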

[Image: example text, with colours highlighting different parts of the string]

>Solution :

Match the word with, then repeatedly match any character except a comma using the negated character class [^,], checking before each character with a negative lookahead that the word with does not start again at that position. This construct is known as a tempered greedy token.

Then match the comma at the end.

\bwith\b((?:(?!\bwith\b)[^,])*),
  • \bwith\b Match the word with
  • ( Capture group 1
    • (?: Non capture group to repeat as a whole part
      • (?!\bwith\b)[^,] Match any char except a comma, provided the word "with" does not start at the current position
    • )* Close the non capture group and repeat it zero or more times
  • ) Close group 1
  • , Match a comma
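The breakdown above can be checked with a short sketch in Python (the language and the variable names are assumptions, not part of the original answer). Because the repeated group can never step onto another "with", an attempt that starts at the first "with" fails, and the engine moves on to the nearest "with" before each comma.

```python
import re

text = ("This is a test with a string with punctuation, and an end. "
        "Then test words, and more text. "
        "And here whith more text with more punctuation, like that.")

# Tempered greedy token: consume non-comma characters one at a time,
# but only while the word "with" does not begin at the current position.
pattern = re.compile(r"\bwith\b((?:(?!\bwith\b)[^,])*),")

# Group 1 keeps its leading space, so strip it for display.
shortest = [m.group(1).strip() for m in pattern.finditer(text)]
print(shortest)  # ['punctuation', 'more punctuation']
```

The misspelled "whith" in the test string is left as is; it does not contain the word "with", so the pattern skips over it.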
