Regular expression negative lookbehind

I want a regex that matches addresses ending in four or more digits, except when those digits come right after 'APT', 'BOX', 'APT ', or 'BOX '.
It should match these cases:

HITME 1234
HITME 12345
HITME1234

but not these:

BOX 1234
BOX 12345
BOX4044
APT 1234
APT 12345
NONHIT123
NONHIT 123

I came up with this pattern:

(?<!(APT |BOX ))([0-9]{4,})$

but it doesn't work: it still matches some of the cases that should be rejected.
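To reproduce the problem, here is a small sketch that runs the original pattern against the strings that should be rejected and prints what it matches. It assumes Python's re module (the question doesn't name a regex engine); the names ORIGINAL, should_not_match and first_match are just for illustration. Python only allows fixed-width lookbehinds, and this one is accepted because 'APT ' and 'BOX ' are both four characters long.

```python
import re

# The pattern from the question. Python's lookbehind must be fixed-width;
# this one is accepted because "APT " and "BOX " are both 4 characters.
ORIGINAL = re.compile(r"(?<!(APT |BOX ))([0-9]{4,})$")

# Test strings from the question that should NOT match.
should_not_match = ["BOX 1234", "BOX 12345", "BOX4044",
                    "APT 1234", "APT 12345", "NONHIT123", "NONHIT 123"]

def first_match(pattern, text):
    """Return the text matched by pattern.search, or None if nothing matched."""
    m = pattern.search(text)
    return m.group(0) if m else None

for s in should_not_match:
    print(f"{s!r:14} -> {first_match(ORIGINAL, s)}")
```

Three of the strings still produce a match: 'BOX 12345' and 'APT 12345' match '2345', and 'BOX4044' matches '4044'.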

Solution:

TL;DR: use ^(?!APT|BOX).*?([0-9]{4,})$


Your regex (?<!(APT |BOX ))([0-9]{4,})$ incorrectly matches:

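  • BOX 12345 on 2345, because 2345 is not preceded by 'APT ' or 'BOX '; it is preceded by 'BOX 1' instead. Nothing stops the match from starting at the second digit, so the lookbehind never sees 'BOX ' directly in front of it.
  • BOX4044 on 4044, because the lookbehind only rejects 'APT ' and 'BOX ' with a trailing space, and here the digits follow 'BOX' directly.
  • APT 12345 on 2345, for the same reason as BOX 12345.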
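If you would rather keep a lookbehind approach, both failures above can be closed off: add (?<![0-9]) so the match cannot start in the middle of a run of digits, and also exclude the no-space forms 'APT' and 'BOX'. This is a sketch of that variant, not the answer's own solution, and it again assumes Python's re; because Python rejects a lookbehind whose alternatives have different widths, it uses four separate lookbehinds instead of one alternation. Unlike the TL;DR pattern, it checks the text directly in front of the digits rather than the start of the string, so it would also reject a hypothetical input like '12 MAIN ST APT 1234'. Whether that is the behaviour you want depends on your data.

```python
import re

# Repaired lookbehind variant (a sketch; the names FIXED and trailing_number
# are illustrative). Adjacent raw strings are concatenated into one pattern.
FIXED = re.compile(
    r"(?<![0-9])"   # start at the first digit of the run, not midway through
    r"(?<!APT)"     # the digits must not directly follow "APT"
    r"(?<!BOX)"     # ...or "BOX"
    r"(?<!APT )"    # ...or "APT " (with a space)
    r"(?<!BOX )"    # ...or "BOX " (with a space)
    r"[0-9]{4,}$"   # four or more digits running to the end of the string
)

def trailing_number(address):
    """Return the trailing 4+ digit number, or None if the address is rejected."""
    m = FIXED.search(address)
    return m.group(0) if m else None

for s in ["HITME 1234", "HITME 12345", "HITME1234",
          "BOX 1234", "BOX 12345", "BOX4044", "APT 1234",
          "APT 12345", "NONHIT123", "NONHIT 123"]:
    print(f"{s!r:14} -> {trailing_number(s)}")
```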

The regex you’re looking for is ^(?!APT|BOX).*?([0-9]{4,})$, which is broken down like so:

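  • ^(?!APT|BOX) – the string must not start with APT or BOX (with or without a following space). That is enough for every BOX and APT case in the question, but note that it only checks the start of the string, not the text directly before the digits.
  • .*? – whatever sits in between, matched lazily so it takes as few characters as possible (e.g. 'HITME ' in your test cases)
  • ([0-9]{4,})$ – the run of four or more digits at the end of the string, captured in group 1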
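A quick way to check the pattern against both lists from the question is a small test loop. This sketch assumes Python's re module; ANSWER, digits and the two lists are illustrative names.

```python
import re

# The answer's pattern: reject strings that *start* with APT or BOX, then
# lazily skip ahead to a run of 4+ digits that ends the string.
ANSWER = re.compile(r"^(?!APT|BOX).*?([0-9]{4,})$")

should_match = ["HITME 1234", "HITME 12345", "HITME1234"]
should_not_match = ["BOX 1234", "BOX 12345", "BOX4044",
                    "APT 1234", "APT 12345", "NONHIT123", "NONHIT 123"]

def digits(address):
    """Return the captured trailing digits (group 1), or None if no match."""
    m = ANSWER.search(address)
    return m.group(1) if m else None

for s in should_match + should_not_match:
    print(f"{s!r:14} -> {digits(s)}")
```

The lazy .*? matters for the captured group: with a greedy .* instead, the same search on 'HITME 12345' still succeeds, but group 1 holds only '2345', because .* swallows the leading 1 and backtracks just far enough to leave four digits.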