Optimize regex to read date

I have developed a regex for a .NET Web API that extracts two dates and a control code from an input string that is already in its final format.

I chose regex to avoid multiple string splits.

I’ve been testing my expression on Regex101, and I have one that already works as expected, but I think it’s too long for what it does.

Expression:
^([0-9]{2})+([0-9]{2})+([0-9]{2})[0-9](M|F)([0-9]{2})+([0-9]{2})+([0-9]{2})

// Get in format Year, Month, Day, Code(M|F), Year, Month, Day

Input:
7603259M2209058PRT<<<<<<<<<<<8

Do you have any suggestions to simplify it?

Solution:

There is one issue with your regex: you quantified the two-digit capturing groups with a + quantifier, so each of them matches one or more times. ([0-9]{2})+ matches one or more sequences of two ASCII digits, and the corresponding group keeps only the last captured value. See Repeating a Capturing Group vs. Capturing a Repeated Group.
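A minimal sketch of the difference, assuming a .NET console app with top-level statements (C# 9 or later); the test string "123456" is only an illustration. A +-quantified group matches every pair, but Group.Value reports only the last one, whereas three separate groups each get their own number.

```csharp
using System;
using System.Text.RegularExpressions;

// ([0-9]{2})+ repeats ONE capturing group: it matches "12", "34" and "56",
// but the group's Value only holds the last capture.
Match repeated = Regex.Match("123456", @"^([0-9]{2})+$");
string lastValue = repeated.Groups[1].Value;          // "56"
int captureCount = repeated.Groups[1].Captures.Count; // 3

// Writing three separate groups (no +) gives each pair its own group number.
Match separate = Regex.Match("123456", @"^([0-9]{2})([0-9]{2})([0-9]{2})$");
string first = separate.Groups[1].Value; // "12"
string third = separate.Groups[3].Value; // "56"

Console.WriteLine($"{lastValue} {captureCount} {first} {third}"); // 56 3 12 56
```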

You need to remove all the + quantifiers from your pattern. Then you can also apply the following improvements:

  • Use \d to match a digit, and pass the RegexOptions.ECMAScript option to the Regex constructor so that \d matches only ASCII digits (otherwise, \d is equivalent to \p{Nd} and matches any Unicode decimal digit; see \d less efficient than [0-9]).
  • Instead of an alternation of single characters, (M|F), use a character class, ([MF]); it is more efficient (see Why is a character class faster than alternation?).
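Both points can be checked directly. This sketch uses U+0663 (ARABIC-INDIC DIGIT THREE) as an example of a non-ASCII Unicode decimal digit and the extra sample "X" as a non-matching code; neither comes from the question. It shows that \d matches that digit by default but not under RegexOptions.ECMAScript, and that (M|F) and [MF] accept the same single characters.

```csharp
using System;
using System.Text.RegularExpressions;

string arabicIndicThree = "\u0663"; // a Unicode decimal digit (category Nd), not ASCII

// Default .NET behavior: \d is equivalent to \p{Nd}, so non-ASCII digits match.
bool defaultMatches = Regex.IsMatch(arabicIndicThree, @"^\d$");
// With RegexOptions.ECMAScript, \d is restricted to [0-9].
bool ecmaMatches = Regex.IsMatch(arabicIndicThree, @"^\d$", RegexOptions.ECMAScript);
// [0-9] is ASCII-only regardless of options.
bool rangeMatches = Regex.IsMatch(arabicIndicThree, @"^[0-9]$");

// An alternation of single characters and a character class accept the same input.
string[] samples = { "M", "F", "X" };
bool sameResults = true;
foreach (string s in samples)
{
    sameResults &= Regex.IsMatch(s, "^(M|F)$") == Regex.IsMatch(s, "^[MF]$");
}

Console.WriteLine($"{defaultMatches} {ecmaMatches} {rangeMatches} {sameResults}"); // True False False True
```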

You can then use:

var pattern = new Regex(@"^(\d{2})(\d{2})(\d{2})\d([MF])(\d{2})(\d{2})(\d{2})", RegexOptions.ECMAScript);

See the .NET regex demo.
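As a usage sketch for the pattern above, the fields can be read by group number. The names firstDate, code and secondDate are illustrative only; the sketch keeps the two-digit years as they are and does not assume a century.

```csharp
using System;
using System.Text.RegularExpressions;

var pattern = new Regex(@"^(\d{2})(\d{2})(\d{2})\d([MF])(\d{2})(\d{2})(\d{2})", RegexOptions.ECMAScript);
Match match = pattern.Match("7603259M2209058PRT<<<<<<<<<<<8");

if (!match.Success)
{
    Console.WriteLine("No match");
    return;
}

// Groups 1-3: first date (yy, MM, dd); group 4: code; groups 5-7: second date.
string firstDate = $"{match.Groups[1].Value}-{match.Groups[2].Value}-{match.Groups[3].Value}";
string code = match.Groups[4].Value;
string secondDate = $"{match.Groups[5].Value}-{match.Groups[6].Value}-{match.Groups[7].Value}";

Console.WriteLine($"{firstDate} {code} {secondDate}"); // 76-03-25 M 22-09-05
```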

If you want an even shorter regex, you can use:

using System;
using System.Text.RegularExpressions;

var pattern = new Regex(@"^(?:(\d{2})){3}\d([MF])(?:(\d{2})){3}", RegexOptions.ECMAScript);
var match = pattern.Match("7603259M2209058PRT<<<<<<<<<<<8");
if (match.Success)
{
    Console.WriteLine(match.Groups[1].Captures[0].Value); // => 76
    Console.WriteLine(match.Groups[1].Captures[1].Value); // => 03
    Console.WriteLine(match.Groups[1].Captures[2].Value); // => 25
    Console.WriteLine(match.Groups[2].Value);             // => M
    Console.WriteLine(match.Groups[3].Captures[0].Value); // => 22
    Console.WriteLine(match.Groups[3].Captures[1].Value); // => 09
    Console.WriteLine(match.Groups[3].Captures[2].Value); // => 05
}

See the C# demo and this regex demo.

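Note that this works because .NET keeps every capture of a repeated group in its Captures collection (the group's capture stack), not only the last one, so all three two-digit values remain accessible.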
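Because the repeated group keeps all its captures, they can be joined back into a six-digit yyMMdd string. JoinCaptures is a hypothetical helper name used only for this sketch, which assumes C# 9 or later top-level statements.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

var pattern = new Regex(@"^(?:(\d{2})){3}\d([MF])(?:(\d{2})){3}", RegexOptions.ECMAScript);
Match match = pattern.Match("7603259M2209058PRT<<<<<<<<<<<8");

// Hypothetical helper: concatenates every capture of a group, in match order.
static string JoinCaptures(Group group)
{
    var sb = new StringBuilder();
    foreach (Capture capture in group.Captures)
    {
        sb.Append(capture.Value);
    }
    return sb.ToString();
}

string firstDate = match.Success ? JoinCaptures(match.Groups[1]) : "";  // "760325"
string code = match.Success ? match.Groups[2].Value : "";              // "M"
string secondDate = match.Success ? JoinCaptures(match.Groups[3]) : ""; // "220905"

Console.WriteLine($"{firstDate} {code} {secondDate}");
```

Iterating with foreach over group.Captures works on both .NET Framework and modern .NET, so no LINQ support on CaptureCollection is assumed.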
