Recognize a block of data as one block using regex in VBA

I am trying to create a pattern that matches the block in the following text:

not included

 468049876 
some text some text ffgg   
 30905103300638 
 1
other text other text

not included

Here's my attempt:

^\s*\d{6,10}(?:\n(?!\s*\d{1,}\n).*){5}

I will be using this pattern in VBA. The expected match (five lines) is:

468049876 
some text some text ffgg   
 30905103300638 
 1
other text other text

Update: I have run into a problem. Suppose the text looks like this:

not included

 468041476 
some text some text ffgg   
 30905103300638 
 1
other text other text
extra line
 416524332 
some text some text ffgg   
 30905103300638 
 1
other text other text
extra line
6354422
not included

Here I need each block to follow this sequence:
1- A number of 6 to 12 digits
2- One line of text
3- A number of exactly 14 digits
4- A number of 1 to 3 digits
5- Text (this is the problem: this text may span two lines instead of one, and I need the extra line joined onto it as a single line)
So for the example text, the output should be:

 468041476 
some text some text ffgg   
 30905103300638 
 1
other text other text extra line

and

 416524332 
some text some text ffgg   
 30905103300638 
 1
other text other text extra line

That is, the text should yield only two blocks (each of five lines).

>Solution :

It seems to me you should check for the 6-10 digit number in the negative lookahead, and to match whitespace other than line breaks you can use [^\S\r\n]:

^[^\S\r\n]*\d{6,10}[^\S\r\n]*(?:(?:\r\n?|\n)(?![^\S\r\n]*\d{6,10}[^\S\r\n]*[\r\n]).+)*

If we assume line breaks are always \n and whitespace is only regular spaces, you could write it as

^ *\d{6,10} *(?:\n(?! *\d{6,10} *\n).+)*
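A minimal sketch of applying the first (CRLF-aware) pattern from VBA, assuming late binding to the VBScript.RegExp object (Microsoft VBScript Regular Expressions 5.5) and that the text is already in a String. The function name GetBlocks is hypothetical; Global makes Execute return every block, and MultiLine lets ^ match at the start of each line rather than only at the start of the string.

```vba
' Hypothetical helper: returns every block matched by the pattern above.
' Assumes the VBScript.RegExp engine, late-bound (no project reference needed).
Function GetBlocks(ByVal txt As String) As Collection
    Dim re As Object, m As Object
    Dim result As New Collection

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "^[^\S\r\n]*\d{6,10}[^\S\r\n]*(?:(?:\r\n?|\n)(?![^\S\r\n]*\d{6,10}[^\S\r\n]*[\r\n]).+)*"
    re.Global = True      ' return all blocks, not only the first one
    re.MultiLine = True   ' ^ matches at the start of every line

    ' Each Match object's Value is the full multi-line block
    For Each m In re.Execute(txt)
        result.Add m.Value
    Next m
    Set GetBlocks = result
End Function
```

For example, GetBlocks(someText).Count gives the number of blocks found, and GetBlocks(someText)(1) is the first block as a single string with its line breaks intact.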

See the regex demo. Details:

  • ^ – start of a line (remember to set the RegExp object's MultiLine property to True)
  • [^\S\r\n]* – zero or more horizontal whitespace
  • \d{6,10} – six to ten digits
  • [^\S\r\n]* – zero or more horizontal whitespace
  • (?: – start of a non-capturing group:
    • (?:\r\n?|\n) – a CRLF, LF or CR line ending
    • (?![^\S\r\n]*\d{6,10}[^\S\r\n]*[\r\n]) – not immediately followed by zero or more horizontal whitespace, six to ten digits, zero or more horizontal whitespace and the end of a line
    • .+ – a non-empty line (one or more chars other than line break chars, as many as possible)
  • )* – end of the grouping, zero or more occurrences.
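On the updated sample, the pattern above would also return the trailing 6354422 line plus the line after it as a third block, because 6354422 is also a 6-10 digit number. A minimal sketch of a stricter alternative follows, encoding the asker's five-step sequence directly (6-12 digits, one text line, exactly 14 digits, 1-3 digits, then one or two text lines) and joining the optional second text line onto the fifth with a space. It assumes VBScript.RegExp and Lf or CrLf line breaks; GetStrictBlocks and the constant H are hypothetical names. [^\r\n]+ is used instead of .+ so a stray \r is never captured as text.

```vba
' Hypothetical helper: extracts blocks that follow the five-step sequence
' and joins an optional second text line onto line 5.
' Assumes VBScript.RegExp (late-bound) and Lf or CrLf line breaks.
Function GetStrictBlocks(ByVal txt As String) As Collection
    Const H As String = "[^\S\r\n]*"   ' horizontal whitespace only
    Dim re As Object, m As Object, sm As Object
    Dim lastLine As String
    Dim result As New Collection

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.MultiLine = True
    re.Pattern = "^" & H & "(\d{6,12})" & H & "\r?\n" & _
                 "([^\r\n]+)\r?\n" & _
                 H & "(\d{14})" & H & "\r?\n" & _
                 H & "(\d{1,3})" & H & "\r?\n" & _
                 "([^\r\n]+)" & _
                 "(?:\r?\n(?!" & H & "\d{6,12}" & H & "(?:[\r\n]|$))([^\r\n]+))?"

    For Each m In re.Execute(txt)
        Set sm = m.SubMatches
        lastLine = Trim(sm(4))
        ' Group 6 is optional; & "" guards against an Empty submatch
        If Len(sm(5) & "") > 0 Then lastLine = lastLine & " " & Trim(sm(5))
        ' Rebuild the block as exactly five Lf-separated lines
        result.Add sm(0) & vbLf & Trim(sm(1)) & vbLf & sm(2) & vbLf & sm(3) & vbLf & lastLine
    Next m
    Set GetStrictBlocks = result
End Function
```

One design caveat: if a block's text really is only one line and the following line is ordinary text (not a 6-12 digit number), that following line will be absorbed as the "extra line", since the regex has no way to tell the two cases apart.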
