Find ASCII and/or non-ASCII characters in the same string with a Notepad++ regex

Looking for the proper Notepad++ regex search string that will find both regular ASCII and non-ASCII characters in the same string.

Currently using 2-3 finds to track down these non-ASCII characters.

The text to search for always sits between square brackets [ ].

[Alfirimë Nóri]

[Sadoc panting]

[Eärien shouting]

[Queen Míriel] One, two, three,

[The Stranger gasping]

Need a SINGLE regex search string to find both ASCII and non-ASCII. Also needs to find the strings that have NO non-ASCII characters.

Find 1: \[([A-Z]*(?:(?:\h*|-)[A-Z0-9.#&',íáéíóôúüñÁÉÍÓÚÜÑÇçåø][a-z]*)*)\]

This only finds strings that have non-ASCII characters in the string:

Find 2: \[([A-Z]*(?:(?:\h*|-)[^\x00-\x7F][a-z]*)*)\]

I noticed that Find 2 above did not find [Queen Míriel], even though "Míriel" contains "í": the name mixes an ASCII-only word and a non-ASCII word.

Is it possible to have a SINGLE regex search string that finds both ASCII and non-ASCII characters, and also finds strings that have NO non-ASCII characters?

Any improvements in the above would be greatly appreciated.

Thanks in advance

Edit: Find 2 didn’t find "Míriel" because the first name/word encountered didn’t have non-ASCII characters in it.

Solution:

You can use

\[(?=[^][A-Za-z]*[A-Za-z])(?=[^][]*[^[:^alpha:]A-Za-z])[^][]*]

See the regex demo. NOTE: to match the strings between [...] that do not contain non-ASCII letters, replace the second (?= with (?!:

\[(?=[^][A-Za-z]*[A-Za-z])(?![^][]*[^[:^alpha:]A-Za-z])[^][]*]

See this regex demo.
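
For readers who want to try the two patterns outside Notepad++, here is a minimal Python sketch. It rests on one assumption: Python's `re` module has no POSIX classes, so the Boost class `[^[:^alpha:]A-Za-z]` is approximated by `[^\W\d_A-Za-z]` (a word character that is not a digit, underscore, or ASCII letter). The names `NON_ASCII_LETTER`, `INSIDE`, `WITH_NON_ASCII`, and `WITHOUT_NON_ASCII` are made up for this illustration, and the brackets inside character classes are escaped for readability.

```python
import re

# Assumption: stand-in for Boost's [^[:^alpha:]A-Za-z], i.e. a letter that is
# not an ASCII letter. Python's re does not support POSIX classes.
NON_ASCII_LETTER = r"[^\W\d_A-Za-z]"
INSIDE = r"[^\[\]]"  # any character other than [ and ]

# Bracketed text with at least one ASCII letter AND one non-ASCII letter.
WITH_NON_ASCII = re.compile(
    r"\[(?=[^\[\]A-Za-z]*[A-Za-z])"
    rf"(?={INSIDE}*{NON_ASCII_LETTER})"
    rf"{INSIDE}*\]"
)

# Same pattern with the second lookahead negated: no non-ASCII letter allowed.
WITHOUT_NON_ASCII = re.compile(
    r"\[(?=[^\[\]A-Za-z]*[A-Za-z])"
    rf"(?!{INSIDE}*{NON_ASCII_LETTER})"
    rf"{INSIDE}*\]"
)

sample = "\n".join([
    "[Alfirimë Nóri]",
    "[Sadoc panting]",
    "[Eärien shouting]",
    "[Queen Míriel] One, two, three,",
    "[The Stranger gasping]",
])

with_matches = WITH_NON_ASCII.findall(sample)
without_matches = WITHOUT_NON_ASCII.findall(sample)
print(with_matches)
print(without_matches)
```

On the sample lines from the question, the first pattern should pick up the three names containing accented letters and the second the two ASCII-only ones. Accented characters stored in decomposed form (base letter plus combining mark) would not count as non-ASCII letters under this approximation.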

Details:

  • \[ – a [ char
  • (?=[^][A-Za-z]*[A-Za-z]) – after zero or more chars other than ASCII letters, [ and ], there must be an ASCII letter
  • (?=[^][]*[^[:^alpha:]A-Za-z]) – after zero or more chars other than [ and ], there must be a non-ASCII letter
  • [^][]* – zero or more chars other than [ and ]
  • ] – a ] char.
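
The question also asks for a single search that covers both kinds of string. One possible reading, which is an assumption about intent and not part of the answer above, is to keep only the first lookahead so that any bracketed text with at least one ASCII letter matches, and then classify each hit afterwards. The sketch below does that in Python; `BRACKETED` and `classify` are hypothetical names, and the classification uses `str.isascii()`, which flags any non-ASCII character, not only letters.

```python
import re

# Hypothetical single pattern: only the first lookahead from the answer is kept,
# so [...] matches whenever it contains at least one ASCII letter.
BRACKETED = re.compile(r"\[(?=[^\[\]A-Za-z]*[A-Za-z])[^\[\]]*\]")


def classify(text):
    """Return (match, has_non_ascii) pairs for each bracketed string in text."""
    return [
        (m.group(), not m.group().isascii())  # isascii() requires Python 3.7+
        for m in BRACKETED.finditer(text)
    ]


result = classify("[Sadoc panting] [Queen Míriel] One, two, three,")
print(result)
```

In Notepad++ itself the equivalent would be the first pattern from the answer with the second lookahead removed; that is untested here and offered only as a starting point.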
