Remove a pattern if the text does not contain specific words

I need to remove everything in a given text from a specific pattern onward if the text doesn't include specific words. For example, I need to remove a number and everything after it if the text doesn't include "key1" and "key2".

txt1 = "this is a number 123456789 and there aren't any keys here. we might have a lot of words here as well but no key words"

Neither key1 nor key2 appears in this text, so the output for txt1 should be:

out1 = "this is a number"
txt2 = "this is a number 123456789 but we have their key1 here. key2 might be in the second or the third sentence. hence we can't remove everything after the given number"

Both key1 and key2 appear in the above text, so the output for txt2 should be the unchanged text:

out2 = "this is a number 123456789 but we have their key1 here. key2 might be in the second or the third sentence. hence we can't remove everything after the given number"

I tried to use a negative lookahead, as below, but it didn't work:

re.sub(r'\d+.*(?!key1|key2).*', '', txt)

>Solution:

(?=^(?:(?!key[12]).)*$)^.*?(?=\s\d+)

Short Explanation

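  • (?=^(?:(?!key[12]).)*$) Assert that the string contains neither key1 nor key2. The inner (?!key[12]). consumes one character at a time, and only where key1 or key2 does not start at that position.
  • ^.*?(?=\s\d+) Lazily match from the start of the string up to the whitespace that precedes the first run of digits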
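
To make the two parts easier to follow, here is a small sketch that compiles each one separately and tries it on two short strings. The variable names and sample strings are made up for illustration and are not part of the original answer.

```python
import re

# Part 1: a zero-width check that succeeds only when neither key occurs anywhere.
no_keys = re.compile(r"(?=^(?:(?!key[12]).)*$)")
# Part 2: lazily match from the start up to the whitespace before the first digit run.
prefix = re.compile(r"^.*?(?=\s\d+)")

# Illustrative sample strings (assumptions, not from the question).
with_keys = "a number 42 then key1 and key2"
without_keys = "a number 42 and nothing else"

print(bool(no_keys.match(with_keys)))      # False
print(bool(no_keys.match(without_keys)))   # True
print(prefix.match(without_keys).group())  # a number
```

Putting part 1 in front of part 2 means the prefix is only matched when the check passes, which is what the combined pattern does.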

See the regex demo
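
As for why the attempt in the question does not work: in \d+.*(?!key1|key2).* the first .* is greedy and runs to the end of the line, so the negative lookahead is only tested at the very end, where nothing follows and it trivially succeeds. The pattern therefore always matches from the first digit onward, whether or not the keys are present. A quick check, using a shortened sample string chosen here for illustration:

```python
import re

# The pattern tried in the question.
attempt = r"\d+.*(?!key1|key2).*"

# Illustrative sample (an assumption, not from the question); it contains both keys.
text = "a number 42 then key1 and key2"

# The keys are present, yet everything from the number onward is still removed.
print(repr(re.sub(attempt, "", text)))  # 'a number '
```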

Python Example

import re

strings = [
    "this is a number 123456789 and there aren't any keys here. we might have a lot of words here as well but no key words",
    "this is a number 123456789 but we have their key1 here. key2 might be in the second or the third sentence. hence we can't remove everything after the given number",
]

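for string in strings:
    # The leading lookahead rejects any string that contains key1 or key2;
    # otherwise the lazy match stops at the whitespace before the first number.
    match = re.search(r"(?=^(?:(?!key[12]).)*$)^.*?(?=\s\d+)", string)
    # No match means a key was found (or there is no number), so keep the text as-is.
    output = match.group() if match else string
    print(output)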
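
If the list of key words varies, one possible alternative is to do the key-word check with a plain substring test and keep the regex only for the "up to the number" part. This is a different technique from the single-regex answer above, and the function name truncate_before_number, its parameters, and the sample strings are hypothetical. Like the regex, this sketch assumes the text is kept unchanged when any one of the key words appears.

```python
import re

def truncate_before_number(text, keywords):
    """Cut text before its first number unless any keyword occurs in it.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # Keep the text untouched as soon as one keyword is found (substring test).
    if any(keyword in text for keyword in keywords):
        return text
    # Otherwise keep only what precedes the whitespace before the first digit run.
    match = re.search(r"^.*?(?=\s\d+)", text)
    return match.group() if match else text

print(truncate_before_number("order 1001 shipped today", ["refund", "cancel"]))
# order
print(truncate_before_number("order 1001 was a refund", ["refund", "cancel"]))
# order 1001 was a refund
```

Splitting the check from the match avoids building a pattern from user-supplied words, so no escaping is needed, at the cost of scanning the text once per key word.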