Ignoring the leading space captured in a repeated group?

The following pattern matches a line that starts with ‘v’ followed by one or more whitespace-separated floats:

    using System;
    using System.IO;
    using System.Text.RegularExpressions;

    const RegexOptions options = RegexOptions.Compiled | RegexOptions.Singleline | RegexOptions.CultureInvariant;

    var regex = new Regex(@"^\s*v((?:\s+)[-+]?\b\d*\.?\d+\b)+$", options);

    const string text = @"
v +0.5 +0.5 +0.5 0.0 1.0 1.0
v +0.5 -0.5 -0.5 1.0 0.0 1.0
v -0.5 +0.5 -0.5 1.0 1.0 0.0
v -0.5 -0.5 +0.5 0.0 0.0 0.0
";

    using var reader = new StringReader(text);

    for (var s = reader.ReadLine(); s != null; s = reader.ReadLine())
    {
        if (string.IsNullOrWhiteSpace(s))
            continue;

        var match = regex.Match(s);

        if (match.Success)
        {
            foreach (Capture capture in match.Groups[1].Captures)
            {
                Console.WriteLine($"'{capture.Value}'");
            }
        }
    }

It works as expected, except that each capture also includes the space in front of the number:

    ' +0.5'
    ' +0.5'
    ' +0.5'
    ' 0.0'
    ' 1.0'
    ' 1.0'
    ...

Question:

How can I ignore the leading space for each captured number?

Solution:

You could change the regex so that the whitespace characters are matched outside the capture group instead of being captured with the number.

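The part (?:\s+) is the same as plain \s+. And because every number must be followed either by the whitespace that starts the next repetition or by the end of the line ($), the word boundary \b at the end of the number is redundant and can be omitted.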
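To check those two simplifications, the sketch below runs a "verbose" pattern (capture moved after the whitespace, but keeping the redundant (?:\s+) wrapper and the trailing \b) and the simplified pattern over a few lines and compares the captures. The helper CaptureNumbers and the extra test lines are hypothetical, added only for illustration; it assumes .NET 5 or later for top-level statements and for LINQ over CaptureCollection.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every capture of group 1 for one line (empty if no match).
static string[] CaptureNumbers(Regex regex, string line)
{
    var match = regex.Match(line);
    return match.Success
        ? match.Groups[1].Captures.Select(c => c.Value).ToArray()
        : Array.Empty<string>();
}

// Verbose form: redundant (?:\s+) wrapper and a trailing \b.
var verbose = new Regex(@"^\s*v(?:(?:\s+)([-+]?\b\d*\.?\d+\b))+$");

// Simplified form from the answer.
var simple = new Regex(@"^\s*v(?:\s+([-+]?\b\d*\.?\d+))+$");

string[] lines =
{
    "v +0.5 +0.5 +0.5 0.0 1.0 1.0",
    "v -0.5 -0.5 +0.5 0.0 0.0 0.0",
    "v 1 22 333",   // integers without a decimal point
    "v 1.5.3",      // malformed: expected to fail with both patterns
};

foreach (var line in lines)
{
    var a = CaptureNumbers(verbose, line);
    var b = CaptureNumbers(simple, line);
    Console.WriteLine($"{line,-30} same captures: {a.SequenceEqual(b)}");
}
```

Both patterns reject the malformed line and produce identical captures for the others, which is what you would expect if the wrapper and the trailing \b add nothing.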

Note that in C# (.NET), \d matches any Unicode decimal digit, not only [0-9]. If only ASCII digits should be accepted, use [0-9] instead.
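A small sketch of that difference, using U+0663 ARABIC-INDIC DIGIT THREE as an example of a non-ASCII decimal digit (the choice of character is just an illustration). It also shows RegexOptions.ECMAScript, which restricts \d to [0-9]; be aware that ECMAScript cannot be combined with RegexOptions.Singleline, which the question uses, so replacing \d with [0-9] is likely the simpler fix there.

```csharp
using System;
using System.Text.RegularExpressions;

// U+0663 ARABIC-INDIC DIGIT THREE is a Unicode decimal digit (category Nd).
const string arabicThree = "\u0663";

// Default .NET behaviour: \d matches any Unicode decimal digit.
bool unicodeDigit = Regex.IsMatch(arabicThree, @"^\d$");

// An explicit ASCII class only matches 0-9.
bool asciiClass = Regex.IsMatch(arabicThree, @"^[0-9]$");

// With RegexOptions.ECMAScript, \d is equivalent to [0-9].
bool ecmaDigit = Regex.IsMatch(arabicThree, @"^\d$", RegexOptions.ECMAScript);

// Prints: \d: True, [0-9]: False, \d with ECMAScript: False
Console.WriteLine($"\\d: {unicodeDigit}, [0-9]: {asciiClass}, \\d with ECMAScript: {ecmaDigit}");
```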

    ^\s*v(?:\s+([-+]?\b\d*\.?\d+))+$

The line in C# would be:

    var regex = new Regex(@"^\s*v(?:\s+([-+]?\b\d*\.?\d+))+$", options);

Output:

    '+0.5'
    '+0.5'
    '+0.5'
    '0.0'
    '1.0'
    '1.0'
    '+0.5'
    '-0.5'
    '-0.5'
    '1.0'
    '0.0'
    '1.0'
    '-0.5'
    '+0.5'
    '-0.5'
    '1.0'
    '1.0'
    '0.0'
    '-0.5'
    '-0.5'
    '+0.5'
    '0.0'
    '0.0'
    '0.0'
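For reference, here is a self-contained sketch of the question's loop with the answer's regex dropped in. The vertices list, which keeps the captured strings per line, is a hypothetical addition for illustration; it assumes .NET 5 or later (top-level statements, LINQ directly over CaptureCollection).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

const RegexOptions options = RegexOptions.Compiled | RegexOptions.Singleline | RegexOptions.CultureInvariant;

// \s+ sits outside group 1, so each capture contains only the number.
var regex = new Regex(@"^\s*v(?:\s+([-+]?\b\d*\.?\d+))+$", options);

const string text = @"
v +0.5 +0.5 +0.5 0.0 1.0 1.0
v +0.5 -0.5 -0.5 1.0 0.0 1.0
v -0.5 +0.5 -0.5 1.0 1.0 0.0
v -0.5 -0.5 +0.5 0.0 0.0 0.0
";

// Hypothetical: one entry per matching line, holding the captured strings.
var vertices = new List<string[]>();

using var reader = new StringReader(text);

for (var s = reader.ReadLine(); s != null; s = reader.ReadLine())
{
    if (string.IsNullOrWhiteSpace(s))
        continue;

    var match = regex.Match(s);

    if (!match.Success)
        continue;

    // Captures holds every iteration of the repeated group, not just the last one.
    var values = match.Groups[1].Captures.Select(c => c.Value).ToArray();
    vertices.Add(values);

    foreach (var value in values)
        Console.WriteLine($"'{value}'");
}
```

Collecting the captures per line like this keeps the grouping of the six numbers of each vertex, which the flat console output loses.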
