How to use a regex to match text between two values that may sit on one line or span several lines

I have a long text from which I need to extract data. I am trying to use RegEx but with little success. I did my research, tried a lot of things, but it is not working.

The pattern should:

  • Find the string: "Adónem számlaszáma: "
  • Return the account number after that
  • Go backwards until the first word made of exactly 3 digits
  • Return that 3-digit code
  • Return the text between this code and the first string

Part of the text:

Időszak: 2021.01.01-2021.11.24
101 Társasági adó   Adónem számlaszáma: 10032000-01076019

Pattern used:

*Flags used: global, single line*

(\b\d\d\d\b)( .*?)Adónem számlaszáma: (.*?)\n

The match is good.

Another part of the text:

-13 000

    101 adónemen többlet:   5 000 Ft
104 Általános forgalmi adó  Adónem számlaszáma: 10032000-01076868

Same pattern used.

The match is not good.

This is the full file I am working with: samplefile.txt

What am I missing? I have the lazy quantifier, dot matches newline etc… Thank you in advance.

Solution:

If the text between the 3-digit code and the fixed string never spans lines, turn off the single line (dotall) flag so that . stops at line breaks, and use

\b\d{3}\b\s*(.*?)\s*Adónem számlaszáma: (\S*)
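A minimal JavaScript sketch of this first pattern, run against the two excerpts from the question. Two things here are assumptions rather than part of the answer: the extra capture group around \d{3} (added so the 3-digit code is returned too, as the question asks) and the variable names. Only the g flag is set, so . cannot cross a line break.

```javascript
// Sample text assembled from the two excerpts in the question.
const text = [
  "Időszak: 2021.01.01-2021.11.24",
  "101 Társasági adó   Adónem számlaszáma: 10032000-01076019",
  "",
  "-13 000",
  "",
  "    101 adónemen többlet:   5 000 Ft",
  "104 Általános forgalmi adó  Adónem számlaszáma: 10032000-01076868",
].join("\n");

// Only the g flag: without s (dotAll), "." stops at line breaks,
// so the lazy group can never start on an earlier line.
// The parentheses around \d{3} are an addition to the answer's pattern.
const singleLine = /\b(\d{3})\b\s*(.*?)\s*Adónem számlaszáma: (\S*)/g;

// Collect code, tax name and account number for every match.
const rows = [...text.matchAll(singleLine)].map(
  ([, code, name, account]) => ({ code, name, account })
);

console.log(rows);
```

With this input, the "101 adónemen többlet" line is skipped because the fixed string does not appear on it, so the second match starts at 104 rather than 101.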


Otherwise, you would need to make sure there are no other 3-digit numbers between a 3-digit number and your fixed string:

\b\d{3}\b\s*((?:(?!\b\d{3}\b)[^])*?)\s*Adónem számlaszáma: (\S*)

The second pattern is more specific, so here is how it works:

  • \b\d{3}\b – three digits enclosed with word boundaries
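  • \s* – zero or more whitespace characters (line breaks included)
  • ((?:(?!\b\d{3}\b)[^])*?) – Group 1: any character, including line breaks ([^], an ECMAScript-only construct; [\s\S] is the usual equivalent in other flavors), repeated zero or more times but as few as possible (*?), where no consumed character starts a 3-digit number enclosed with word boundaries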
  • Adónem számlaszáma: – a fixed string
  • (\S*) – Group 2: zero or more non-whitespace chars.
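The breakdown above can be checked with a short JavaScript sketch. It first runs the asker's pattern with the g and s flags on the second excerpt to show where it goes wrong, then runs the second pattern on the same text. The capture group around \d{3}, the variable names, and the "wrapped" sample (a made-up variant in which the tax name breaks across two lines) are assumptions added for illustration, not part of the original post.

```javascript
// Second excerpt from the question, with a trailing newline so that the
// asker's pattern (which ends in \n) is able to match at all.
const excerpt = [
  "-13 000",
  "",
  "    101 adónemen többlet:   5 000 Ft",
  "104 Általános forgalmi adó  Adónem számlaszáma: 10032000-01076868",
].join("\n") + "\n";

// The asker's pattern with the s (dotAll) flag: "." crosses line breaks,
// so the lazy group keeps growing from the first 3-digit word that is
// followed by a space, which is 101 on the previous line.
const original = /(\b\d\d\d\b)( .*?)Adónem számlaszáma: (.*?)\n/gs;
const [loose] = [...excerpt.matchAll(original)];
console.log(loose[1]); // "101" (not the wanted code)

// The second pattern: the negative lookahead stops the group as soon as
// another 3-digit word begins, so only the nearest code can match.
const tempered =
  /\b(\d{3})\b\s*((?:(?!\b\d{3}\b)[^])*?)\s*Adónem számlaszáma: (\S*)/g;
const [strict] = [...excerpt.matchAll(tempered)];
console.log(strict[1], "|", strict[2], "|", strict[3]);

// Hypothetical input: the tax name wraps onto a second line.
const wrapped =
  "104 Általános\nforgalmi adó  Adónem számlaszáma: 10032000-01076868";
const [multi] = [...wrapped.matchAll(tempered)];
console.log(JSON.stringify(multi[2])); // "Általános\nforgalmi adó"
```

The lazy quantifier alone only decides where a match ends. It does not stop the regex engine from picking the earliest possible start, which is why the lookahead is needed once . (or [^]) is allowed to span lines.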
