I am trying to create a regex pattern for the following text:
not included
468049876
some text some text ffgg
30905103300638
1
other text other text
no included
Here’s my attempt:
^\s*\d{6,10}(?:\n(?!\s*\d{1,}\n).*){5}
I will be using this pattern in VBA.
The expected output to be highlighted (five lines):
468049876
some text some text ffgg
30905103300638
1
other text other text
Update: I have updated the question because I ran into a problem.
Suppose the text looks like this:
not included
468041476
some text some text ffgg
30905103300638
1
other text other text
extra line
416524332
some text some text ffgg
30905103300638
1
other text other text
extra line
6354422
no included
Here I need each block to follow this sequence:
1- A number of 6 to 12 digits
2- Then some text on one line
3- A number of exactly 14 digits
4- A number of 1 to 3 digits
5- Text (this is the problem: this text may span two lines instead of one, and I need to include the extra line so that the two are output as a single line)
So the expected output for the example text is:
468041476
some text some text ffgg
30905103300638
1
other text other text extra line
and
416524332
some text some text ffgg
30905103300638
1
other text other text extra line
I mean that the text would contain only two blocks (each of five lines).
>Solution :
It seems to me you should check for a 6-10 digit number in the negative lookahead, and to match any whitespace except line breaks you can use [^\S\r\n]:
^[^\S\r\n]*\d{6,10}[^\S\r\n]*(?:(?:\r\n?|\n)(?![^\S\r\n]*\d{6,10}[^\S\r\n]*[\r\n]).+)*
If we assume line breaks are \n and the only whitespace is spaces, you could write it as:
^ *\d{6,10} *(?:\n(?! *\d{6,10} *\n).+)*
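The simplified pattern can be tried outside VBA before wiring it in. Below is a minimal sketch in Python, used here only as a stand-in for testing; the names SAMPLE, PATTERN and blocks are made up for illustration, and the sample is the second text from the question with plain \n line breaks assumed. In VBA, the RegExp object would need Global and MultiLine set to True to behave similarly.

```python
import re

# Second sample text from the question, assuming plain "\n" line breaks.
SAMPLE = "\n".join([
    "not included",
    "468041476",
    "some text some text ffgg",
    "30905103300638",
    "1",
    "other text other text",
    "extra line",
    "416524332",
    "some text some text ffgg",
    "30905103300638",
    "1",
    "other text other text",
    "extra line",
    "6354422",
    "no included",
])

# The simplified pattern; re.MULTILINE makes ^ match at every line start.
PATTERN = re.compile(r"^ *\d{6,10} *(?:\n(?! *\d{6,10} *\n).+)*", re.MULTILINE)

# Each match is one block; split it back into lines for inspection.
blocks = [m.group(0).split("\n") for m in PATTERN.finditer(SAMPLE)]

for block in blocks:
    print(len(block), "lines, starting with", block[0])
```

On this sample the first two matches are the two six-line blocks. The trailing 6354422 line also starts a short match of its own, so a caller that wants only full blocks would still need to filter the results, for example by line count.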
See the regex demo. Details of the first pattern:
^ – start of a line (remember to enable multiline mode; in VBA that is MultiLine = True on the RegExp object)
[^\S\r\n]* – zero or more horizontal whitespace characters
\d{6,10} – six to ten digits
[^\S\r\n]* – zero or more horizontal whitespace characters
(?: – start of a non-capturing group:
(?:\r\n?|\n) – a CRLF, LF or CR line ending
(?![^\S\r\n]*\d{6,10}[^\S\r\n]*[\r\n]) – not immediately followed by zero or more horizontal whitespace, six to ten digits, zero or more horizontal whitespace and a line break
.+ – a non-empty line (one or more characters other than line break characters, as many as possible)
)* – end of the group, repeated zero or more times.
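If the block has to follow the exact five-step sequence from the question, with a wrapped fifth line folded into one, a stricter pattern with capture groups may be easier to post-process. The following is a rough Python sketch, not the pattern from the answer above: BLOCK, extract_blocks, HS, NL and LINE are hypothetical names, and it assumes that any line after the fifth one is a continuation unless it consists of digits only. That assumption cannot tell a genuine continuation apart from unrelated trailing text.

```python
import re

HS = r"[^\S\r\n]*"      # horizontal whitespace, as in the answer
NL = r"(?:\r\n?|\n)"    # CRLF, CR or LF
LINE = r"([^\r\n]+)"    # one non-empty line of arbitrary text

# Hypothetical stricter pattern following the sequence in the question.
BLOCK = re.compile(
    rf"^{HS}(\d{{6,12}}){HS}{NL}"   # 1: number of 6 to 12 digits
    rf"{LINE}{NL}"                  # 2: one line of text
    rf"{HS}(\d{{14}}){HS}{NL}"      # 3: number of exactly 14 digits
    rf"{HS}(\d{{1,3}}){HS}{NL}"     # 4: number of 1 to 3 digits
    rf"{LINE}"                      # 5: text
    # Assumption: the next line continues step 5 unless it is digits only.
    rf"(?:{NL}(?!{HS}\d+{HS}(?![^\r\n])){LINE})?",
    re.MULTILINE,
)

def extract_blocks(text):
    """Return each block as five strings, joining a wrapped fifth line."""
    blocks = []
    for m in BLOCK.finditer(text):
        number, label, long_id, count, tail, extra = m.groups()
        if extra is not None:
            tail = f"{tail.strip()} {extra.strip()}"
        blocks.append([number, label.strip(), long_id, count, tail.strip()])
    return blocks

sample = "\n".join([
    "not included", "468041476", "some text some text ffgg",
    "30905103300638", "1", "other text other text", "extra line",
    "416524332", "some text some text ffgg", "30905103300638", "1",
    "other text other text", "extra line", "6354422", "no included",
])

for block in extract_blocks(sample):
    print(block)
```

The pattern sticks to constructs the answer already relies on (non-capturing groups and negative lookahead), but it is written in Python here, so its behavior in VBA would still need checking.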