Python regex to substitute all digits except when they are part of a substring

I want to remove all digits, except when the digits are part of one of several special substrings. In the example below, the special substrings that should be skipped by the digit removal are 1s, 2s, s4 and 3s. I think I need to use a negative lookahead:

import re
s = "a61s8sa92s3s3as4s4af3s"
pattern = r"(?!1s|2s|s4|3s)[0-9\.]"
re.sub(pattern, ' ', s)

To my understanding, the pattern above is:

  • starting from the end ([]), match all digits, including decimal points
  • only do that if we have not matched the pattern after ?!
  • which are 1s, 2s, s4, OR 3s (| = OR)

It all makes sense until you try it. For the sample s above, the call returns a 1s sa 2s3s as s af3s, which suggests that the exclusion patterns work except when the digit is at the end of the special substring (as in s4), in which case the digit still gets matched.

I believe this operation should return a 1s sa 2s3s as4s4af3s. How can I fix my pattern?
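To see where the lookahead goes wrong, it can help to list exactly which characters the pattern matches. The following is a small diagnostic sketch, not part of the original post; the variable names `original` and `removed` are only illustrative. It shows that both 4s are matched even though each one follows an s: the lookahead is evaluated at the position of the digit itself, so the alternative s4 can never apply there.

```python
import re

s = "a61s8sa92s3s3as4s4af3s"
original = r"(?!1s|2s|s4|3s)[0-9\.]"

# The negative lookahead is checked at the position of the candidate digit.
# Text starting at a digit can never begin with "s", so the "s4" alternative
# is dead: it never prevents a match.
removed = [(m.start(), m.group()) for m in re.finditer(original, s)]
print(removed)
# => [(1, '6'), (4, '8'), (7, '9'), (12, '3'), (15, '4'), (17, '4')]

print(re.sub(original, ' ', s))
# => a 1s sa 2s3s as s af3s
```

The digits at indices 15 and 17 are the two 4s that should have been protected by s4.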

Solution:

You can use

import re
s = "a61s8sa92s3s3as4s4af3s"
pattern = r"(1s|2s|s4|3s)|[\d.]"
print( re.sub(pattern, lambda x: x.group(1) or ' ', s) )
# => a 1s sa 2s3s as4s4af3s

Details:

  • (1s|2s|s4|3s) – Group 1: 1s, 2s, s4 or 3s
  • | – or
  • [\d.] – a digit or dot.

If Group 1 matches, the match is replaced with the Group 1 value (so the special substring is kept as is); otherwise, the matched digit or dot is replaced with a space.
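If the list of protected substrings is not fixed, the same capture-or-replace idea can be wrapped in a helper that builds the pattern at run time. This is a minimal sketch under some assumptions: the function name `remove_digits_except` and its parameters are hypothetical, the substrings are escaped with `re.escape` in case they contain regex metacharacters, and they are sorted longest first so that a longer substring is tried before a shorter one that is its prefix.

```python
import re

def remove_digits_except(text, keep, replacement=" "):
    """Replace digits and dots in text, leaving the substrings in keep intact.

    Hypothetical helper for illustration; keep is an iterable of literal strings.
    """
    keep = [k for k in keep if k]
    if not keep:
        # Nothing to protect: replace every digit or dot.
        return re.sub(r"[\d.]", replacement, text)
    # Try longer protected substrings first so they win over shorter prefixes.
    alternatives = "|".join(
        re.escape(k) for k in sorted(keep, key=len, reverse=True)
    )
    pattern = re.compile(rf"({alternatives})|[\d.]")
    # Group 1 set: keep the protected substring. Otherwise: use the replacement.
    return pattern.sub(lambda m: m.group(1) or replacement, text)

print(remove_digits_except("a61s8sa92s3s3as4s4af3s", ["1s", "2s", "s4", "3s"]))
# => a 1s sa 2s3s as4s4af3s
```

Because the regex engine scans left to right and consumes each protected substring as a whole, overlapping candidates are resolved in favour of the one that starts first.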
