Python extract repeating substring(s) between equal markers

Let’s say I have a text file as follows:

    1. MarkerOne
    Some text
    EndMarkerOne
    2. Something else
    Some more text
    EndSomethingElse
    3. MarkerTwo
    Some Text
    EndMarkerTwo

where MarkerOne and MarkerTwo are the same string, as are EndMarkerOne and EndMarkerTwo. For example:

    1. Notice 
    Some text 
    End Notice
    2. Blabla 
    Some other text 
    End Blabla
    3. Notice 
    Some more text
    End Notice

Now I want to extract "Some text" and "Some more text" from the file as two separate strings in a list.

I tried:

    import re
    pattern = r"\d+. Notice[\S\t\n\v ]*End Notice"
    result = re.findall(pattern, text)
    print(result)

Unfortunately this gives me all text between the first "Notice" and the last "End Notice" and not two separate results.

What I need is for each match to stop at the first "End Notice" it reaches, and for the search to then resume from there.

Any idea?

Solution:

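Use a non-greedy (lazy) quantifier: change * to *? so that each match stops at the nearest "End Notice" instead of the last one. See "What is the difference between .*? and .* regular expressions?" for details.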

    import re

    # \. matches the literal dot after the number; *? stops at the nearest "End Notice"
    ptn = re.compile(r"\d+\. Notice[\S\t\n\v ]*?End Notice")
    result = ptn.findall(text)
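
Note that the pattern above returns each whole block, markers included. If only the text between the markers is wanted, one option is to add a capture group. The following is a minimal sketch, not the original answer's code: the sample text and the names body_pattern and bodies are assumptions made for illustration, and it assumes each marker sits on its own line.

```python
import re

# Sample input mirroring the question's example (an assumption about the real file).
text = """1. Notice
Some text
End Notice
2. Blabla
Some other text
End Blabla
3. Notice
Some more text
End Notice
"""

# (.*?) lazily captures the body between the markers.
# re.DOTALL lets "." match newlines, so multi-line bodies are captured too.
# re.MULTILINE lets "^" match at the start of every line.
body_pattern = re.compile(
    r"^\s*\d+\. Notice\s*\n(.*?)\n\s*End Notice",
    re.DOTALL | re.MULTILINE,
)

# With exactly one capture group, findall returns only the captured text.
bodies = [body.strip() for body in body_pattern.findall(text)]
print(bodies)  # ['Some text', 'Some more text']
```

Because the pattern has a single capture group, findall returns the captured bodies rather than the full matches, and strip() removes any trailing spaces left over from the file.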