I am trying to match the following text using this regular expression:
ABC: ((?:.+\n?)+|.+)(?=DE:)
My sample text is:
ABC: Lorem ipsum dolor
sit amet. Lorem ipsum dolor DE: Lorem
Other Text1: 1Lorem ipsum dolor sit amet
Other Text2: 2Lorem ipsum dolor sit amet
Other Text3: 3Lorem ipsum dolor sit amet
Other Text4: 4Lorem ipsum dolor sit amet
But the regex backtracks so many times that the search appears to hang forever.
Here is the full code if you want to test it:
import re
text = """ABC: Lorem ipsum dolor
sit amet. Lorem ipsum dolor DE: Lorem
Other Text1: 1Lorem ipsum dolor sit amet
Other Text2: 2Lorem ipsum dolor sit amet
Other Text3: 3Lorem ipsum dolor sit amet
Other Text4: 4Lorem ipsum dolor sit amet
"""
aux = re.search(r"ABC: ((?:.+\n?)+(?=DE:)|.+)",text,re.M|re.U)
if aux:
    print(aux.group(1))
else:
    print("Could not be found")
>Solution :
Maybe you could try:
aux = re.findall(r'\bABC:\s*(.+?)\s*\bDE:', text, re.DOTALL)[0]
Or:
aux = re.findall(r'\bABC:\s*([\w\W]+?)\s*\bDE:', text)[0]
Both print:
Lorem ipsum dolor
sit amet. Lorem ipsum dolor
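As for why the original hangs: in `(?:.+\n?)+` the `\n?` is optional, so the outer `+` can split each line into `.+` chunks in a huge number of ways, and the engine tries them all whenever the rest of the pattern fails. The lazy `.+?` avoids that nesting. Note also that `findall(...)[0]` raises an `IndexError` when nothing matches. Here is a minimal sketch that keeps the "Could not be found" branch from the question; the names `SECTION_RE` and `extract_section` are just made up for illustration:

```python
import re

# Lazy match between the two markers; DOTALL lets "." cross line breaks.
# SECTION_RE and extract_section are hypothetical names, not from the question.
SECTION_RE = re.compile(r"\bABC:\s*(.+?)\s*\bDE:", re.DOTALL)


def extract_section(text):
    """Return the text between 'ABC:' and 'DE:', or None if there is no match."""
    match = SECTION_RE.search(text)
    return match.group(1) if match else None


sample = (
    "ABC: Lorem ipsum dolor\n"
    "sit amet. Lorem ipsum dolor DE: Lorem\n"
    "Other Text1: 1Lorem ipsum dolor sit amet\n"
)

result = extract_section(sample)
print(result if result is not None else "Could not be found")
```

Using `re.search` instead of `findall` stops at the first match and gives you a plain `None` to test when either marker is missing.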