Match one or more free-standing characters that are not Unicode alphanumerics

I’m looking for a single pattern that matches free-standing runs of characters that are not Unicode alphanumerics. I will eventually replace each match with a single space.

Prerequisites

  • For alphabetic characters, the Unicode category \p{L} is required
  • For numeric characters, \d is adequate
  • Whitespace counts as part of a match
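
To make these prerequisites concrete, here is a minimal sketch. It assumes a JavaScript/TypeScript regex engine with the `u` flag, which is only an assumption for illustration since the question does not name a regex flavor. It shows that an ASCII range misses Ă while \p{L} accepts it, and that the negated class treats whitespace and punctuation alike.

```typescript
// Assumption: JavaScript/TypeScript regex engine; the "u" flag enables \p{L}.
const asciiLetter = new RegExp("[A-Za-z]");
const unicodeLetter = new RegExp("\\p{L}", "u");
const notLetterOrDigit = new RegExp("[^\\p{L}\\d]", "u");

const aBreve = "\u0102"; // Ă, a single precomposed code point

console.log(asciiLetter.test(aBreve));      // false: outside the ASCII range
console.log(unicodeLetter.test(aBreve));    // true: Ă is in category L
console.log(notLetterOrDigit.test(" "));    // true: whitespace is included
console.log(notLetterOrDigit.test("/"));    // true
console.log(notLetterOrDigit.test("7"));    // false: digits are excluded
```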

Match Examples

‘/’ stands for any character that is not a Unicode alphanumeric

aĂ a 111 /
   ^   ^^
aĂ a / 111
   ^^^
aĂ a /// 111
   ^^^^^
aĂ a/// 111
   ^^^^
aĂ a ///111
   ^^^^
aĂ a *&^#* 111
   ^^^^^^^
)(*)* 111
^^^^^^
Ă - 1
 ^^
Ă  -1
 ^^

Unmatched Examples

aĂ a///111
aĂ a-111
aĂ -/*&^*-a-1-1-1

What I have so far

  • The pattern [^\p{L}\d] matches any single non-alphanumeric character.
  • Zero-width negative lookahead and lookbehind on word boundaries get closer, e.g. (?<!\b)[^\p{L}\d](?!\b)

However, a pattern that handles all of the above examples has been elusive.
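
A quick sketch of why the bare character class is not enough on its own. It assumes a JavaScript/TypeScript regex engine with the `u` flag, and `replaceEveryNonAlnum` is a hypothetical helper name. The class has no notion of "free-standing", so it also rewrites the first of the unmatched examples.

```typescript
// Assumption: JavaScript/TypeScript regex engine with the "u" flag.
const anyNonAlnum = new RegExp("[^\\p{L}\\d]", "gu");

// Hypothetical helper: replaces every single non-alphanumeric character with a space.
function replaceEveryNonAlnum(input: string): string {
  return input.replace(anyNonAlnum, " ");
}

// "\u0102" is Ă. This input is listed under "Unmatched Examples" and should stay untouched,
// but each "/" is replaced because the class ignores the surrounding context.
console.log(replaceEveryNonAlnum("a\u0102 a///111")); // "aĂ a   111"
```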

Note: my spidey senses tell me this is likely possible with a single pattern. If two separate patterns would be more efficient or practical, so be it.

Solution:

\b word boundaries are problematic because they match the boundary between \w and \W, but your character classes are built from \p{L} and \d, not \w and \W.
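
A small sketch of that mismatch, assuming a JavaScript/TypeScript regex engine (an assumption for illustration only). The underscore case should hold in most flavors, since `_` belongs to \w but is neither a letter nor a digit. The Ă case is specific to engines where \w is ASCII-only, as it is in JavaScript, and other flavors may behave differently.

```typescript
// Assumption: JavaScript/TypeScript regex engine, where \w is [A-Za-z0-9_].
const boundaryBeforeUnderscore = new RegExp("a\\b_"); // requires a \b between "a" and "_"
const underscoreIsNonAlnum = new RegExp("[^\\p{L}\\d]", "u");

console.log(boundaryBeforeUnderscore.test("a_1")); // false: "a" and "_" are both \w
console.log(underscoreIsNonAlnum.test("_"));       // true: "_" is not in \p{L} or \d

// Engine-specific: in JavaScript Ă is not a \w character, so \b does not see
// a word start in front of it even though \p{L} treats it as a letter.
const boundaryBeforeBreve = new RegExp("\\b\u0102");
console.log(boundaryBeforeBreve.test("\u0102 1")); // false in this engine
```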

It looks like you always want whitespace on one side or the other of a match, so that needs to be worked in. Give this a try. It matches [^\p{L}\d\n]* either preceded or followed by [ \t]+. Excluding \n keeps a match from running across line breaks.

[ \t]+[^\p{L}\d\n]*|[^\p{L}\d\n]*[ \t]+
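
The pattern can be tried against the question's examples with a sketch like the one below. It assumes a JavaScript/TypeScript regex engine with the `u` flag, and `collapseFreeStanding` is a hypothetical helper name. The pattern itself is copied from the line above, with only the string escaping added.

```typescript
// Assumption: JavaScript/TypeScript regex engine with the "u" flag.
const freeStanding = new RegExp(
  "[ \\t]+[^\\p{L}\\d\\n]*|[^\\p{L}\\d\\n]*[ \\t]+",
  "gu"
);

// Hypothetical helper: replaces each match with a single space.
function collapseFreeStanding(input: string): string {
  return input.replace(freeStanding, " ");
}

// "\u0102" is Ă (precomposed).
console.log(collapseFreeStanding("a\u0102 a / 111"));        // "aĂ a 111"
console.log(collapseFreeStanding("a\u0102 a *&^#* 111"));    // "aĂ a 111"
console.log(collapseFreeStanding("\u0102 - 1"));             // "Ă 1"
console.log(collapseFreeStanding("a\u0102 a///111"));        // "aĂ a///111" (unchanged)
console.log(collapseFreeStanding("a\u0102 a-111"));          // "aĂ a-111" (unchanged)
console.log(collapseFreeStanding("a\u0102 -/*&^*-a-1-1-1")); // "aĂ a-1-1-1"
```

Single spaces between words also match, but replacing a space with a space leaves the text unchanged. In this sketch the last input, which the question lists as unmatched, is altered, because its run of symbols is preceded by a space. If that line must stay untouched, the pattern would need a further constraint.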

Demo:

Regex101.com demo
