Regex Unexpected Behavior with optional groups

So I have this expression

#(?<category>.+)(?:\/(?<id>.+))?

Which is supposed to capture the foo of #foo or capture both foo and bar of #foo/bar

However, it seems to match the entire rest of the string and capture all of it as category, leaving id empty.
[Image: RegexTester bad]

Removing the last ? works as expected, but, of course, the last part is then no longer optional.
[Image: RegexTester good]

I don’t understand why this happens. (It also happens without capture groups.)

>Solution :

It’s because .+ is greedy, and optional groups don’t have to match.

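First, #(?<category>.+) consumes the whole rest of the string. Then there’s nothing left for (?:\/(?<id>.+))? to match, but it’s not required to match anything, so the whole expression still succeeds. A backtracking engine only gives characters back when the overall match would otherwise fail, so the greedy group never has a reason to release the /bar part.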
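
The behaviour described above can be reproduced in a few lines. This is a minimal sketch that assumes Python's re module, which spells named groups (?P<name>...) rather than (?<name>...); the question does not say which regex flavour it uses, and the input string #foo/bar is taken from the question.

```python
import re

# Assumption: Python's re module, which writes named groups as (?P<name>...)
# instead of (?<name>...). The rest of the pattern is unchanged.
greedy = re.compile(r"#(?P<category>.+)(?:\/(?P<id>.+))?")

m = greedy.search("#foo/bar")

# The greedy .+ runs to the end of the input, the optional group matches
# nothing, and the overall match still succeeds.
print(m.group("category"))  # foo/bar
print(m.group("id"))        # None
```

The id group reports None because it did not take part in the match at all.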

There isn’t a general technique for rewriting every regex that suffers from this issue, but there is a general approach to preventing it: write the preceding group so that it stops before the point where the optional group should start matching. In this instance, since the "id" group is introduced by a forward slash, you can have "category" not match forward slashes:

#(?<category>[^/]+)(?:\/(?<id>.+))?
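
To check the corrected pattern against the inputs from the question, here is a small sketch, again assuming Python's re module and its (?P<name>...) spelling for named groups. The helper name split_tag is hypothetical and exists only for this illustration.

```python
import re

# Assumption: Python syntax for named groups; the pattern is otherwise the
# corrected one, with "category" restricted to non-slash characters.
fixed = re.compile(r"#(?P<category>[^/]+)(?:\/(?P<id>.+))?")


def split_tag(text):
    """Return (category, id) for the first tag in text; id is None if absent."""
    m = fixed.search(text)
    if m is None:
        return None
    return m.group("category"), m.group("id")


print(split_tag("#foo"))          # ('foo', None)
print(split_tag("#foo/bar"))      # ('foo', 'bar')
print(split_tag("#foo/bar/baz"))  # ('foo', 'bar/baz')
```

Because "category" can no longer cross a slash, everything after the first slash, including any further slashes, ends up in "id".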

You might be tempted to make the quantifier lazy (.+?), but this still won’t work: "category" will then match the shortest possible substring ("f"), and the optional group, which finds no slash at that position, is simply skipped.
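
The lazy variant can be checked the same way. This sketch assumes Python's re module and its (?P<name>...) group syntax. The second pattern adds an end anchor, which is an assumption the question does not state: it only applies if the tag is known to run to the end of the input.

```python
import re

# Lazy quantifier, otherwise the question's pattern (Python group syntax).
lazy = re.compile(r"#(?P<category>.+?)(?:\/(?P<id>.+))?")
m = lazy.search("#foo/bar")

# The lazy .+? stops after one character and the optional group is skipped.
print(m.group(0), m.group("category"), m.group("id"))  # #f f None

# Assumption: the tag ends the input. The trailing $ forces the engine to
# keep extending the lazy group until the rest of the pattern fits.
anchored = re.compile(r"#(?P<category>.+?)(?:\/(?P<id>.+))?$")
a = anchored.search("#foo/bar")
print(a.group("category"), a.group("id"))  # foo bar
```

When the tag can appear in the middle of a longer string, the anchor is not an option and the negated character class remains the simpler fix.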
