Exclude any digits from match but keep specific digits within brackets

string: abc keyword1 ddd 111 ddd (ddd 99/ddd) 1 ddd (ddd) ddd 11 ddd keyword2 abc

regex: re.compile(r'(?:keyword1)(.*)(?:keyword2)', flags = re.DOTALL | re.MULTILINE)

goal: exclude all digits except the ones within brackets from match

desired output: 'ddd ddd (ddd 99/ddd) ddd (ddd) ddd ddd'

approach1: Any digit within brackets is always 99, but digits outside of brackets can also be 99. So I could remove every digit from the match except 99, and then use a second, negating regex to remove the remaining 99s outside of brackets?!

approach2: match ddd (basically everything, including 99s) except all other digits, using some variant of the related help below. I played around with (\([^)]*\)|\S)* but failed, probably because it's Java 😀

Question: Which approach makes sense? How can I modify my regex to reach my goal?

related help
Exclude strings within parentheses from a regular expression?
(\([^)]*\)|\S)*
where one balanced set of parentheses is treated as if it were a single character, and so the regex as a whole matches a single word, where a word can contain these parenthesized groups.
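
The quoted pattern is not tied to Java; Python's re module accepts the same syntax. As a minimal sketch (the sample string and variable names are made up for illustration, and the group is written as non-capturing), it matches one whitespace-delimited word and lets a parenthesized part containing spaces count as part of that word. On its own it only matches words and does not remove any digits:

```python
import re

# The pattern from the related answer, written with a non-capturing group
# because only the overall match is of interest here.
word_with_parens = re.compile(r'(?:\([^)]*\)|\S)*')

# Hypothetical sample text: the parenthesized part contains a space,
# yet it is consumed as if it were a single character of the word.
sample = "foo(bar baz)qux rest"
first_word = word_with_parens.match(sample).group(0)
print(first_word)  # foo(bar baz)qux
```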

>Solution :

Without any additional packages, you can use a two step approach: get the string between keywords and then remove all digit chunks that are not inside parentheses:

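import re

s = "abc keyword1 ddd 111 ddd (ddd 99/ddd) 1 ddd (ddd) ddd 11 ddd keyword2 abc"
# Step 1: lazily capture everything between the keywords (re.S lets . match newlines)
m = re.search(r'keyword1(.*?)keyword2', s, re.I | re.S)
if m:
    # Step 2: keep parenthesized groups (Group 1) and drop digit chunks elsewhere;
    # strip() trims the spaces left next to the keywords
    print( re.sub(r'(\([^()]*\))|\s*\d+', r'\1', m.group(1)).strip() )

## => ddd ddd (ddd 99/ddd) ddd (ddd) ddd ddd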

See the Python demo.

Notes:

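  • keyword1(.*?)keyword2 extracts all contents between keyword1 and keyword2 into Group 1
  • re.sub(r'(\([^()]*\))|\s*\d+', r'\1', m.group(1)) removes any digit chunks, together with the optional whitespace before them, from the Group 1 value while keeping all strings between ( and ) intact: a parenthesized string is matched first and put back via \1, so the digits inside it are never seen by the \d+ branch.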
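
The two steps can also be wrapped in a small helper for reuse. This is a minimal sketch rather than part of the original answer: the function name strip_digits_outside_parens and its start/end parameters are hypothetical. It passes a replacement function instead of r'\1', which gives the same result on current Python and also avoids the "unmatched group" error that Python versions before 3.5 raise when Group 1 does not take part in a match. It additionally trims surrounding whitespace, and like the pattern above it assumes parentheses are not nested.

```python
import re

# Same alternation as in the answer: Group 1 is a non-nested parenthesized
# string; the second branch is a digit chunk with optional leading whitespace.
DIGITS_OUTSIDE_PARENS = re.compile(r'(\([^()]*\))|\s*\d+')

def strip_digits_outside_parens(text, start="keyword1", end="keyword2"):
    """Hypothetical helper: return the text between start and end with digit
    chunks removed, except those inside parentheses. Returns None when the
    keywords are not found."""
    between = re.escape(start) + r'(.*?)' + re.escape(end)
    m = re.search(between, text, re.I | re.S)
    if m is None:
        return None
    # Put a parenthesized string back unchanged; replace a digit chunk with ''.
    cleaned = DIGITS_OUTSIDE_PARENS.sub(lambda mo: mo.group(1) or '', m.group(1))
    return cleaned.strip()

s = "abc keyword1 ddd 111 ddd (ddd 99/ddd) 1 ddd (ddd) ddd 11 ddd keyword2 abc"
result = strip_digits_outside_parens(s)
print(result)  # ddd ddd (ddd 99/ddd) ddd (ddd) ddd ddd
```

Escaping the keywords with re.escape keeps the helper safe if a keyword ever contains regex metacharacters such as a dot or a parenthesis.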