Match all punctuation not surrounded by alphanumeric characters?

I am trying to write a regular expression that removes all non-alphanumeric characters from a string, except for those that are surrounded by alphanumeric characters.

Consider the following three examples.

1.

it's -> it's

2.

its. -> its

3.

It's a: beautiful day? I'm =sure it is. The coca-cola (is frozen right?

It's a beautiful day I'm sure it is The coca-cola is frozen right

I am using Python’s re module, and can match the opposite of what I am looking for with the following expression.

(?<=[a-zA-Z])[^a-zA-Z ](?=[a-zA-Z])

Any ideas?

Solution:

Use

[^a-zA-Z\s](?!(?<=[a-zA-Z].)[a-zA-Z])

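EXPLANATION

[^a-zA-Z\s] matches a single character that is neither an ASCII letter nor whitespace.
(?!(?<=[a-zA-Z].)[a-zA-Z]) rejects that match when the character is both preceded and followed by a letter. The lookbehind is evaluated just after the matched character, so the . covers the matched character itself and [a-zA-Z] checks the character before it.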
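
As a quick check, here is a minimal sketch that applies the pattern with re.sub to the three examples from the question. The names PUNCT_NOT_BETWEEN_LETTERS and strip_outer_punctuation are made up for illustration and are not part of the original answer.

```python
import re

# Pattern from the answer: a non-letter, non-whitespace character that is
# NOT both preceded and followed by an ASCII letter.
PUNCT_NOT_BETWEEN_LETTERS = re.compile(r"[^a-zA-Z\s](?!(?<=[a-zA-Z].)[a-zA-Z])")


def strip_outer_punctuation(text: str) -> str:
    """Remove punctuation unless it sits directly between two letters."""
    return PUNCT_NOT_BETWEEN_LETTERS.sub("", text)


examples = [
    "it's",
    "its.",
    "It's a: beautiful day? I'm =sure it is. The coca-cola (is frozen right?",
]
for s in examples:
    print(repr(strip_outer_punctuation(s)))
```

Note that digits are not in any of the character classes, so a standalone number would be stripped along with the punctuation. If digits should count as alphanumeric, adding 0-9 to each of the three classes would be one way to keep them.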