Parsing this special format file

October 6, 2022

I have a file that is formatted this way:

{2000}000000012199{3100}123456789*{3320}110009558*{3400}9876
54321*{3600}CTR{4200}D2343984*JOHN DOE*1232 STREET*DALLAS TX
78302**{5000}D9210293*JANE DOE*1234 STREET*SUITE 201*DALLAS
TX 73920**

Basically, the number in curly brackets denotes the field, and it is followed by the value for that field. For example, {2000} is the field for "Amount", and its value is 121.99 (implied decimal). {3100} is the field for "AccountNumber", and its value is 123456789*.

I am trying to figure out a way to split the file into "records" and each record would contain the record type (the value in the curly brackets) and record value, but I don’t see how.

How do I do this without a loop going through each character in the input?

>Solution :

This regular expression should get you going:

Match a literal {
Match 1 or more digits ("a number")
Match a literal }
Match 1 or more characters that are not an opening {

\{\d+\}[^{]+

It assumes that the values themselves cannot contain an opening curly brace. If they can, you need to be more clever. For example, if braces inside a value are escaped with a backslash, something like @"\{\d+\}(?:\\{|[^{])+" would work (there are likely better ways).
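To make the escaped-brace variant concrete, here is a minimal sketch. It assumes that a brace inside a value is written as \{, which the question does not confirm. The class and method names are made up for illustration, and the pattern is the one above with the brace escaped explicitly.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EscapedBraceDemo
{
    // After the {digits} header, each step first tries the two-character
    // sequence \{ and otherwise takes any single character other than '{'.
    private static readonly Regex EscapedFieldRegex =
        new Regex(@"\{\d+\}(?:\\\{|[^{])+", RegexOptions.Compiled);

    // Returns each field (header plus value) as one string.
    public static List<string> Split(string text)
    {
        var fields = new List<string>();
        foreach (Match match in EscapedFieldRegex.Matches(text))
        {
            fields.Add(match.Value);
        }
        return fields;
    }
}
```

With the input @"{123}a\{b{456}xyz" this should yield two fields, {123}a\{b and {456}xyz. A value that ends in a backslash right before the next field would be misread, so this only works if the escaping rule is applied consistently.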

Create a Regex instance and have it match against the text. Each "field" will be a separate match.

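using System;
using System.Text.RegularExpressions;

var text = @"{123}abc{456}xyz";
var regex = new Regex(@"\{\d+\}[^{]+", RegexOptions.Compiled);
// Declare the loop variable as Match: on .NET Framework, MatchCollection only
// implements the non-generic IEnumerable, so "var" would infer object.
foreach (Match match in regex.Matches(text)) {
  Console.WriteLine(match.Value); // the whole field, e.g. {123}abc
}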
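Since the question asks for the record type and the record value separately, one option is to add named groups to the same pattern. The following is only a sketch: FieldParser, Parse and ParseImpliedDecimal are hypothetical names, and the two implied decimal places are assumed from the "Amount" example in the question.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class FieldParser
{
    // "type" captures the digits inside the braces;
    // "value" captures everything up to the next '{'.
    private static readonly Regex FieldRegex =
        new Regex(@"\{(?<type>\d+)\}(?<value>[^{]+)", RegexOptions.Compiled);

    // Returns (type, value) pairs in the order they appear in the text.
    public static List<KeyValuePair<string, string>> Parse(string text)
    {
        var records = new List<KeyValuePair<string, string>>();
        foreach (Match match in FieldRegex.Matches(text))
        {
            records.Add(new KeyValuePair<string, string>(
                match.Groups["type"].Value,
                match.Groups["value"].Value));
        }
        return records;
    }

    // Assumption: amounts carry two implied decimal places.
    public static decimal ParseImpliedDecimal(string digits)
    {
        return decimal.Parse(digits, CultureInfo.InvariantCulture) / 100m;
    }
}
```

For the sample in the question, the first pair would be 2000 and 000000012199, and ParseImpliedDecimal turns that value into 121.99. Note that [^{] also matches line breaks, so a value that wraps onto the next line stays in one match, line break included. Whether to strip those depends on whether the breaks are really in the file.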