Find certain colons in string using Regex

I’m trying to find colons in a given string so that I can split the string at each colon for preprocessing, subject to the following conditions:

  1. Preceded or followed by a word, e.g. A Book: Chapter 1 or A Book :Chapter 1
  2. Do not match if it is part of an emoticon, e.g. :( or ): or :/ or :-)
  3. Do not match if it is part of a time, e.g. 16:00

I’ve come up with the following regex:

(\:)(?=\w)|(?<=\w)(\:)

This satisfies conditions 1 and 2 but fails condition 3, because it also matches the colon in a time such as 16:00. How do I fix this?
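
The behaviour described above can be reproduced with a short check. This is only a sketch using Python's `re` module (the question does not name a regex flavour, so Python is an assumption), and the sample strings other than those quoted in the conditions are made up for illustration.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(\:)(?=\w)|(?<=\w)(\:)")

# Illustrative inputs: two titles, one emoticon, one time.
samples = ["A Book: Chapter 1", "A Book :Chapter 1", "so sad :(", "16:00"]

# Record whether the pattern finds a colon in each sample.
matched = {s: bool(ORIGINAL.search(s)) for s in samples}

for s, hit in matched.items():
    print(f"{s!r}: {'match' if hit else 'no match'}")
```

The time string is reported as a match because its colon has a word character (a digit) on both sides, which is exactly the case condition 3 is meant to exclude.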

Edit: it has to be a single regex, if possible.

>Solution :

You can use

(:\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)

See the regex demo. Details:

  • (:\b|\b:) – Group 1: a : that is either followed or preceded by a word character
  • (?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b) – a negative lookahead that fails the match when the : is preceded by exactly one or two digits (starting at a word boundary) and is also followed by one or two digits (ending at a word boundary), i.e. when the : sits inside a time such as 9:30 or 16:00.
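
As a rough illustration of the pattern described above, the following sketch splits a string at the accepted colons. It assumes Python's `re` module, which the original post does not specify, and `split_at_colons` is a hypothetical helper name introduced here. The pattern works in Python because each lookbehind has a fixed width.

```python
import re

# The pattern from the answer, unchanged.
COLON = re.compile(r"(:\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)")

def split_at_colons(text):
    """Split text at every colon the pattern accepts, dropping the colon.

    Hypothetical helper: re.split is avoided because the pattern has a
    capturing group, so re.split would also return the colons themselves.
    """
    parts, last = [], 0
    for m in COLON.finditer(text):
        parts.append(text[last:m.start()])
        last = m.end()
    parts.append(text[last:])
    return parts

print(split_at_colons("Agenda: starts 16:00"))  # ['Agenda', ' starts 16:00']
print(split_at_colons("oh no :-)"))             # ['oh no :-)']
```

One limitation of this approach is that an emoticon containing a letter, such as :D, is still matched, because its colon is followed by a word character.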

Note that :\b is equivalent to :(?=\w) and \b: is equivalent to (?<=\w):, because : is itself a non-word character.
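
The equivalence in the note can be spot-checked on a few inputs. The sketch below assumes Python's `re` module, and both the sample strings and the `starts` helper are made up for illustration. It compares match positions only, so it is a sanity check and not a proof.

```python
import re

# Illustrative inputs covering words, digits, emoticons and doubled colons.
samples = ["A Book: Chapter 1", "A Book :Chapter 1", "16:00", ":(", "a:b", "::"]

def starts(pattern, text):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

# Each pair of patterns should select exactly the same colons.
for s in samples:
    assert starts(r":\b", s) == starts(r":(?=\w)", s)
    assert starts(r"\b:", s) == starts(r"(?<=\w):", s)

print("all samples agree")
```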

If you need to get the same capturing groups as in your original pattern, replace (:\b|\b:) with (?:(:)\b|\b(:)).
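
To show what the two-group variant reports, the sketch below checks which group captured the colon. It again assumes Python's `re` module, and `which_group` is a hypothetical helper introduced for illustration.

```python
import re

# Variant from the answer with the two original capturing groups restored.
GROUPED = re.compile(r"(?:(:)\b|\b(:))(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)")

def which_group(text):
    """Return 1 or 2 for the group that captured the first accepted colon,
    or None if no colon is accepted."""
    m = GROUPED.search(text)
    if m is None:
        return None
    return 1 if m.group(1) is not None else 2

print(which_group("A Book :Chapter 1"))  # 1: colon followed by a word character
print(which_group("A Book: Chapter 1"))  # 2: colon preceded by a word character
print(which_group("16:00"))              # None: colon inside a time
```

When a colon has word characters on both sides, as in a:b, the first alternative is tried first, so group 1 captures it.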
