Regex to match everything after a pattern occurrence until the next pattern occurs and so on

I’d like to extract everything that follows a "line break and integer" up to the next "line break and integer", then capture everything that follows that one, and so on.
For example, for the following string:

"\na \n1 b\nc \n2 b\nc \n3 b\nc"

I’d like to capture the following groups:

["\n1 b\nc ", "\n2 b\nc ", "\n3 b\nc"]

This is what I’ve tried:

re.findall("\n\d[\s\S]*(?=\n\d)*","\na \n1 b\nc \n2 b\nc \n3 b\nc")

But it doesn’t split the matches. I think I need to make it "non-greedy", but I’m not sure how. It returns:

['\n1 b\nc \n2 b\nc \n3 b\nc']

Solution:

You can use this regex in DOTALL (single-line) mode:

(?s)\n\d.*?(?=\n\d|\Z)

RegEx Details:

  • (?s): Enable single-line (DOTALL) mode so that . also matches line breaks
  • \n: Match a line break
  • \d: Match a digit
  • .*?: Match 0 or more characters of any kind, as few as possible (lazy)
  • (?=\n\d|\Z): Lookahead to assert that we have either another line break and digit or end of input
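
Since the question already used [\s\S], which matches any character including a newline, the (?s) flag can also be dropped by keeping that class and simply making it lazy. The other key change from the original attempt is that the lookahead is no longer quantified with *, so it must actually succeed. A minimal sketch of this equivalent variant:

```python
import re

s = "\na \n1 b\nc \n2 b\nc \n3 b\nc"

# [\s\S] matches any character, newlines included, so no DOTALL flag is needed.
# *? makes it lazy: it stops at the first position where the lookahead holds.
# The lookahead is mandatory (not repeated with *): the match must end right
# before the next "\n<digit>" or at the very end of the string (\Z).
chunks = re.findall(r"\n\d[\s\S]*?(?=\n\d|\Z)", s)
print(chunks)  # ['\n1 b\nc ', '\n2 b\nc ', '\n3 b\nc']
```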

Code:

>>> import re
>>> s = "\na \n1 b\nc \n2 b\nc \n3 b\nc"
>>> re.findall(r'(?s)\n\d.*?(?=\n\d|\Z)', s)
['\n1 b\nc ', '\n2 b\nc ', '\n3 b\nc']
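
A different technique (not the one used in the answer) that gives the same result is to split on the zero-width position in front of each "\n<digit>" and keep only the pieces that start with it. A sketch, assuming Python 3.7 or newer, since earlier versions do not split on empty matches:

```python
import re

s = "\na \n1 b\nc \n2 b\nc \n3 b\nc"

# Split at the empty position just before every "\n<digit>".
parts = re.split(r"(?=\n\d)", s)
# parts == ['\na ', '\n1 b\nc ', '\n2 b\nc ', '\n3 b\nc']

# Discard any leading text that does not itself start with "\n<digit>".
chunks = [p for p in parts if re.match(r"\n\d", p)]
print(chunks)  # ['\n1 b\nc ', '\n2 b\nc ', '\n3 b\nc']
```

This avoids the lazy quantifier and the \Z alternative entirely, at the cost of a second filtering pass.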