Regex Unexpected Behavior with optional groups

So I have this expression

#(?<category>.+)(?:\/(?<id>.+))?

Which is supposed to capture the foo of #foo or capture both foo and bar of #foo/bar

However, it seems to match the entire rest of the string (foo/bar) as the category and capture it

Removing the last ? works as expected, but, of course, the last part is then no longer optional

I don’t understand why this happens. (This still happens without capture groups too)

>Solution :

It’s because .+ is greedy, and optional groups don’t have to match.

First, #(?<category>.+) consumes the whole rest of the string, including the /bar part. Then there’s nothing left for (?:\/(?<id>.+))? to match, but it’s not required to match anything, so the whole expression still succeeds and the engine never needs to backtrack.
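To see this in action, here is a minimal sketch in Python. Note the assumptions: Python's re module spells named groups (?P<name>...) rather than (?<name>...), and the capture helper is just a hypothetical convenience for this demo, not part of the original question.

```python
import re

# Same pattern as in the question, using Python's (?P<name>...) spelling.
GREEDY = re.compile(r"#(?P<category>.+)(?:/(?P<id>.+))?")


def capture(pattern, text):
    """Return the (category, id) pair from the first match, or None if no match."""
    m = pattern.search(text)
    return (m.group("category"), m.group("id")) if m else None


# The greedy .+ swallows "foo/bar", so the optional group is skipped.
print(capture(GREEDY, "#foo/bar"))  # ('foo/bar', None)
print(capture(GREEDY, "#foo"))      # ('foo', None)
```

The id group comes back as None because the optional group never took part in the match.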

There isn’t a general technique to rewrite every regex that suffers from this issue, but there is a general approach to preventing it: write the preceding group so that it stops before the point where the optional group would start matching. In this instance, since the "id" group is introduced by a forward slash, you can have "category" not match forward slashes:

#(?<category>[^/]+)(?:\/(?<id>.+))?
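A quick check of the corrected pattern, again as a sketch in Python (assuming the (?P<name>...) spelling for named groups; parse_tag is a hypothetical helper name used only for this example):

```python
import re

# category may not contain "/", so it has to stop where the optional group begins.
FIXED = re.compile(r"#(?P<category>[^/]+)(?:/(?P<id>.+))?")


def parse_tag(text):
    """Return (category, id) for the first tag found in text, or None if no match."""
    m = FIXED.search(text)
    return (m.group("category"), m.group("id")) if m else None


print(parse_tag("#foo"))          # ('foo', None)
print(parse_tag("#foo/bar"))      # ('foo', 'bar')
print(parse_tag("#foo/bar/baz"))  # ('foo', 'bar/baz')
```

Note that id still uses a greedy .+, so any further slashes end up inside id.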

You might be tempted to use a lazy quantifier (.+?) instead, but on its own this still won’t work: it will then match the shortest possible substring ("f"), and the optional group, which can’t match at that position, will simply be skipped.
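The lazy behaviour can be reproduced with the sketch below (Python, so named groups are written (?P<name>...)). The second pattern is an aside that goes beyond the original question: it assumes the tag is the entire input, so ^ and $ anchors can force the lazy group to keep expanding. The capture helper is hypothetical.

```python
import re

# Lazy category without anchors: stops after a single character.
LAZY = re.compile(r"#(?P<category>.+?)(?:/(?P<id>.+))?")

# Assumption: the tag is the whole string, so the match must reach the end.
LAZY_ANCHORED = re.compile(r"^#(?P<category>.+?)(?:/(?P<id>.+))?$")


def capture(pattern, text):
    """Return the (category, id) pair from the first match, or None if no match."""
    m = pattern.search(text)
    return (m.group("category"), m.group("id")) if m else None


print(capture(LAZY, "#foo/bar"))           # ('f', None)
print(capture(LAZY_ANCHORED, "#foo/bar"))  # ('foo', 'bar')
print(capture(LAZY_ANCHORED, "#foo"))      # ('foo', None)
```

If the tag can appear in the middle of a longer string, the anchors are not an option and the negated character class is the safer choice.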
