How to exclude some text between two patterns?

by MR

May 21, 2022

I’d like to match every block of text between <PDF> and </PDF> inside a string:

import re

lines = """
hello
<PDF>
bla1
</PDF>
test
<PDF>
bla2
</PDF>
"""

matches = re.findall(r"<PDF>.*</PDF>", lines, re.DOTALL)
print(matches)

Output:

['<PDF>\nbla1\n</PDF>\ntest\n<PDF>\nbla2\n</PDF>']

Expected Output:

['<PDF>\nbla1\n</PDF>', '<PDF>\nbla2\n</PDF>']

What’s going wrong here? How can I ensure that no text between </PDF> and <PDF> gets matched?

Solution:

The * quantifier is greedy, so .* matches as much as possible: it runs from the first <PDF> all the way to the last </PDF> in the string.

Use *? instead. From the Python documentation for the re module:

Adding ? after the quantifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched.

matches = re.findall(r"<PDF>.*?</PDF>", lines, re.DOTALL)
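To see the difference side by side, here is a small self-contained sketch that reuses the sample string from the question. The variable names greedy, lazy and inner are only illustrative, and the capture-group variant at the end is an optional extra, assuming you want just the text inside the tags rather than the tags themselves.

```python
import re

lines = """
hello
<PDF>
bla1
</PDF>
test
<PDF>
bla2
</PDF>
"""

# Greedy: .* runs to the last </PDF>, so everything collapses into one match.
greedy = re.findall(r"<PDF>.*</PDF>", lines, re.DOTALL)

# Non-greedy: .*? stops at the first </PDF> that follows each <PDF>.
lazy = re.findall(r"<PDF>.*?</PDF>", lines, re.DOTALL)

# With a capture group, findall returns only the text inside the tags.
inner = [text.strip() for text in re.findall(r"<PDF>(.*?)</PDF>", lines, re.DOTALL)]

print(len(greedy))  # 1
print(lazy)         # ['<PDF>\nbla1\n</PDF>', '<PDF>\nbla2\n</PDF>']
print(inner)        # ['bla1', 'bla2']
```

re.DOTALL is still needed in every variant, because without it . does not match the newlines between the tags.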
