How to replace a dot (.) in a sentence, except when it appears in an abbreviation, using a regular expression

I want to replace every dot in a sentence with a space, except when the dot is part of an abbreviation. In an abbreviation, I want to replace the dot with an empty string ('').

Here, an abbreviation means a dot surrounded by at least two capital letters.

My regexes mostly work, except that they catch U.S.

r1 = r'\b((?:[A-Z]\.){2,})\s*'
r2 = r'(?:[A-Z]\.){2,}'

'U.S.A is abbr  x.y  is not. But I.I.T. is also valid ABBVR and so is M.Tech'

should become

'USA is abbr  x y  is not But IIT is also valid ABBVR and so is MTech'
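
As a quick check (a sketch, not part of the original question), the second pattern above can be run against the sample sentence to see what it picks up. Because it requires at least two "letter plus dot" pairs, it stops at U.S. in U.S.A, skips M.Tech entirely, and never touches the ordinary dots:

```python
import re

r2 = r'(?:[A-Z]\.){2,}'
s = 'U.S.A is abbr  x.y  is not. But I.I.T. is also valid ABBVR and so is M.Tech'

# findall returns the whole match here because the group is non-capturing.
matches = re.findall(r2, s)
print(matches)  # ['U.S.', 'I.I.T.']
```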

>Solution :

You can use

import re

s = 'U.S.A is abbr  x.y  is not. But I.I.T. is also valid ABBVR and so is M.Tech'
# If Group 1 matched (an abbreviation), strip its dots; any other dot becomes a space.
print(re.sub(r'\b((?:[A-Z]\.)+)\.?|\.', lambda x: x.group(1).replace('.', '') if x.group(1) else ' ', s))
# => USA is abbr  x y  is not  But IIT is also valid ABBVR and so is MTech

The pattern matches

  • \b((?:[A-Z]\.)+)\.? – a word boundary, then Group 1 capturing one or more occurrences of an uppercase ASCII letter followed by a ., and then one optional extra dot (Group 1 already includes the dot after each letter, so this only consumes a second, consecutive dot)
  • | – or
  • \. – a dot (in any other context)
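
To see which alternative fires for a given dot, the matches can be listed together with their Group 1 value. This is only an illustrative sketch; the helper name describe is made up for the example:

```python
import re

pattern = re.compile(r'\b((?:[A-Z]\.)+)\.?|\.')

def describe(text):
    # Return (full match, Group 1) for every match; Group 1 is None for a plain dot.
    return [(m.group(0), m.group(1)) for m in pattern.finditer(text)]

print(describe('I.I.T. x.y'))  # [('I.I.T.', 'I.I.T.'), ('.', None)]
print(describe('U.S..'))       # [('U.S..', 'U.S.')]
```

In the second call, the trailing \.? consumes the second of the two consecutive dots, while Group 1 holds only U.S.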

If Group 1 matched, the replacement is the Group 1 value with all dots removed via .replace('.', ''); otherwise, the replacement is a space.
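
The same replacement logic can also be written with a named callback instead of a lambda, which some may find easier to read and test. This is a minimal sketch; the names PATTERN, _replace and strip_abbreviation_dots are illustrative and do not come from the original answer:

```python
import re

# Same pattern as above: Group 1 is a run of "uppercase letter + dot" pairs.
PATTERN = re.compile(r'\b((?:[A-Z]\.)+)\.?|\.')

def _replace(match):
    abbreviation = match.group(1)
    if abbreviation:
        # Abbreviation: keep the letters, drop the dots.
        return abbreviation.replace('.', '')
    # Any other dot becomes a single space.
    return ' '

def strip_abbreviation_dots(text):
    return PATTERN.sub(_replace, text)

print(strip_abbreviation_dots('But I.I.T. is valid and so is M.Tech'))
# But IIT is valid and so is MTech
```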

To make it Unicode-aware, install the PyPI regex library (pip install regex) and use

import regex  # third-party package: pip install regex

s = 'U.S.A is abbr  x.y  is not. But I.I.T. is also valid ABBVR and so is M.Tech'
print(regex.sub(r'\b((?:\p{Lu}\.)+)\.?|\.', lambda x: x.group(1).replace('.', '') if x.group(1) else ' ', s))

\p{Lu} matches any Unicode uppercase letter.
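
If installing a third-party package is not an option, one possible workaround (an assumption on my part, not part of the original answer) is to build the uppercase character class by hand from the standard unicodedata module and keep using re. The names UPPER, UPPER_CLASS and strip_dots_unicode are hypothetical:

```python
import re
import sys
import unicodedata

# Collect every code point whose Unicode general category is Lu (uppercase letter).
UPPER = ''.join(
    ch for ch in map(chr, range(sys.maxunicode + 1))
    if unicodedata.category(ch) == 'Lu'
)
# One big character class standing in for \p{Lu}; re.escape guards any special characters.
UPPER_CLASS = '[' + re.escape(UPPER) + ']'

PATTERN = re.compile(r'\b((?:' + UPPER_CLASS + r'\.)+)\.?|\.')

def strip_dots_unicode(text):
    # Abbreviation dots are removed; every other dot becomes a space.
    return PATTERN.sub(
        lambda m: m.group(1).replace('.', '') if m.group(1) else ' ',
        text,
    )

# '\u00c9' is the precomposed uppercase letter E with acute accent.
print(strip_dots_unicode('\u00c9.U. et x.y'))
```

Building the class scans the whole Unicode range once at import time, so it is slower to start than the regex-based version, and it only reflects the Unicode data bundled with the running Python.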
