I am reading text from a file that contains start and end markers. I want to replace each start ... end block with the text between the markers, but with any newlines in that text removed (joined with spaces).
I tried to do it as follows:
import re
start = '---'
end = '==='
text = '''\
Some text
---line 1
line 2
line 3===
More text
...
Some more text
---line 4
line 5===
and even more text\
'''
modified = re.sub(pattern=rf'{start}(.+){end}', repl=re.sub(r'\n', ' ', r'\1'), string=text, flags=re.DOTALL)
print(modified)
This prints:
Some text
line 1
line 2
line 3===
More text
...
Some more text
---line 4
line 5
and even more text
There are a couple of issues with this: 1. it matches the longest possible span (from the first start marker to the last end marker) instead of each start/end pair separately, and 2. it does not remove the newlines.
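Both issues can be reproduced in isolation. This is a minimal sketch on a shortened copy of the input; the variable names are illustrative only:

```python
import re

text = "Some text\n---line 1\nline 2\nline 3===\nMore text\n---line 4\nline 5===\nend"

# Greedy (.+) runs from the first '---' to the LAST '===': one big match.
greedy = re.findall(r"---(.+)===", text, flags=re.DOTALL)
# Non-greedy (.+?) stops at the first '===' after each '---': one match per pair.
lazy = re.findall(r"---(.+?)===", text, flags=re.DOTALL)
print(len(greedy), len(lazy))  # 1 2

# The inner re.sub is evaluated once, before the outer re.sub is called.
# Its input is the literal two-character string backslash + '1', which has
# no newline, so it comes back unchanged and the outer call only sees r'\1'.
repl = re.sub(r"\n", " ", r"\1")
print(repl == r"\1")  # True
```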
I am expecting the output to be:
Some text
line 1 line 2 line 3
More text
...
Some more text
line 4 line 5
and even more text
Any help will be appreciated. Thank you!
Solution:
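Use the non-greedy quantifier +? in the capturing group so that each start/end pair is matched separately. Also, pass a function as the replacement and do a simple str.replace inside it. In your version, re.sub(r'\n', ' ', r'\1') is evaluated only once, on the literal string \1, before any matching happens, so it never sees the captured text: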
import re
start = "---"
end = "==="
text = """\
Some text
---line 1
line 2
line 3===
More text
...
Some more text
---line 4
line 5===
and even more text\
"""
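modified = re.sub(
    pattern=rf"{start}(.+?){end}",  # +? stops at the first end marker after each start
    repl=lambda g: g.group(1).replace("\n", " "),  # called once per match with the match object
    string=text,
    flags=re.DOTALL,  # let . match newlines too
)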
print(modified)
Prints:
Some text
line 1 line 2 line 3
More text
...
Some more text
line 4 line 5
and even more text
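If the markers might contain regex metacharacters (for example [[ and ]]), they can be passed through re.escape first. The helper below is a hypothetical sketch of that idea, not part of the answer above; the name join_between and the sep parameter are illustrative assumptions:

```python
import re


def join_between(text: str, start: str, end: str, sep: str = " ") -> str:
    """Hypothetical helper: replace each start...end span with its inner
    text, with newlines replaced by sep. Markers are treated literally."""
    # re.escape makes markers such as '[[' or '***' safe inside a pattern.
    # *? (rather than +?) lets an empty span like '[[]]' match on its own.
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), flags=re.DOTALL)
    # The replacement function runs once per match, on that match's group 1.
    return pattern.sub(lambda m: m.group(1).replace("\n", sep), text)


sample = "intro\n[[a\nb]]\nmiddle\n[[c\nd\ne]]\noutro"
print(join_between(sample, "[[", "]]"))
```

To apply it to a file, read the file in text mode first, e.g. Path("input.txt").read_text() from pathlib (the filename is a placeholder). Text mode translates \r\n line endings to \n by default, so replacing "\n" still catches Windows-style newlines.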