Match a range in a long text that contains no more than one special symbol

The problem, simplified:

There is an article (long text)

Extract the content between start (included) and end (included)

Requirement: There cannot be more than one \n between start and end

Find all matches

Use python re only

The code:

import re

# pattern is the regex in question; text is the article
lines = re.findall(pattern, text, re.DOTALL)
for line in lines:
    print(line)
    print('===')

So, how can I fix my pattern?

Patterns I have tried:

  1. start[^\n]*\n?[^\n]*end
    with text:
...
start just me and python regex 1 end
start just me and python regex 2 end
start just me and python regex 3 end
...

wrong:

start just me and python regex 1 end
start just me and python regex 2 end --> should be split from the line before
===
start just me and python regex 3 end
===
  2. start(?:(?!\n\n).)*?end and start(?:[^\n]|\n(?!\n))*?end
    with text:
start just 
me and python 
regex 1 end
start just me and python regex 2 end
start just me and python regex 3 end

wrong:

start just 
me and python 
regex 1 end --> should not match, because there are two `\n` inside
===
start just me and python regex 2 end
===
start just me and python regex 3 end
===
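
Both wrong results above can be reproduced with a short script. This is only a sketch: the names text_1, text_2, attempt_1, attempt_2, result_1 and result_2 are made up for illustration, and only the first of the two patterns from attempt 2 is exercised.

```python
import re

# Text for attempt 1: every start/end pair sits on its own line.
text_1 = (
    "start just me and python regex 1 end\n"
    "start just me and python regex 2 end\n"
    "start just me and python regex 3 end\n"
)
# Text for attempt 2: the first pair spans three lines (two "\n" inside).
text_2 = (
    "start just \n"
    "me and python \n"
    "regex 1 end\n"
    "start just me and python regex 2 end\n"
    "start just me and python regex 3 end\n"
)

attempt_1 = r"start[^\n]*\n?[^\n]*end"
attempt_2 = r"start(?:(?!\n\n).)*?end"

result_1 = re.findall(attempt_1, text_1, re.DOTALL)
result_2 = re.findall(attempt_2, text_2, re.DOTALL)

for label, matches in (("attempt 1", result_1), ("attempt 2", result_2)):
    print(label)
    for match in matches:
        print(match)
        print("===")
```

Attempt 1 merges the first two lines into one match, and attempt 2 accepts the three-line range because it only rejects two consecutive `\n`, not two `\n` anywhere in the range.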

>Solution :

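You can use: start[^\n]*?\n?[^\n]*?end

The quantifiers in your first pattern are greedy, so the first [^\n]* runs to the end of the line, \n? consumes the newline, and the second [^\n]* carries on through the next line before backtracking to the last end on it. Making both quantifiers lazy (*?) stops the match at the first end it can reach, while \n? still allows at most one newline in between.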
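
As a quick check, here is a minimal sketch that runs the suggested pattern against the two sample texts from the question plus a one-newline case. The helper name find_ranges and the sample variable names are hypothetical, chosen only for this example.

```python
import re

# Lazy quantifiers stop at the first reachable "end"; "\n?" permits
# at most one newline between "start" and "end".
PATTERN = re.compile(r"start[^\n]*?\n?[^\n]*?end")


def find_ranges(text):
    """Return every start...end range containing at most one newline."""
    return PATTERN.findall(text)


single_line = (
    "start just me and python regex 1 end\n"
    "start just me and python regex 2 end\n"
    "start just me and python regex 3 end\n"
)
two_breaks = (
    "start just \n"
    "me and python \n"
    "regex 1 end\n"
    "start just me and python regex 2 end\n"
    "start just me and python regex 3 end\n"
)
one_break = "start just \nme and python regex 1 end\n"

for sample in (single_line, two_breaks, one_break):
    for match in find_ranges(sample):
        print(match)
        print("===")
```

The pattern contains no `.`, so re.DOTALL has no effect on it and can be dropped. Note that a range may still begin at a start on one line and finish at an end on the next line, even if that next line has its own start, because a single newline is allowed.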
