I want a regex pattern that matches every address ending in 4 or more digits, unless those digits come right after 'APT', 'BOX', 'APT ', or 'BOX '.
So it should match these cases
HITME 1234
HITME 12345
HITME1234
but not the following cases
BOX 1234
BOX 12345
BOX4044
APT 1234
APT 12345
NONHIT123
NONHIT 123
I have made this one
(?<!(APT |BOX ))([0-9]{4,})$
but it does not work correctly: it still matches some of the cases that should be rejected.
>Solution :
TL;DR use ^(?!APT|BOX).*?([0-9]{4,})$
Your regex (?<!(APT |BOX ))([0-9]{4,})$ incorrectly matches:
BOX 12345, on 2345, because 2345 is not preceded by 'APT ' or 'BOX '. Instead, it is preceded by 'BOX 1'.
BOX4044, on 4044, because 4044 is not preceded by 'APT ' or 'BOX '. Instead, it is preceded by 'BOX'.
APT 12345, on 2345, for a similar reason.
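To see these partial matches directly, here is a minimal sketch. It assumes Python's `re` module (the question does not name a regex flavor), and the helper name `original_tail` is made up for illustration.

```python
import re

# The pattern from the question, unchanged.
# Group 1 sits inside the negative lookbehind; group 2 holds the digits.
ORIGINAL = re.compile(r"(?<!(APT |BOX ))([0-9]{4,})$")

def original_tail(address):
    """Return the digits the original pattern captures, or None if no match."""
    m = ORIGINAL.search(address)
    return m.group(2) if m else None

for address in ["BOX 1234", "BOX 12345", "BOX4044", "APT 12345"]:
    print(repr(address), "->", original_tail(address))
```

Only `BOX 1234` is rejected: skipping its first digit leaves just three digits, so there is no later position where the match can start. In the other strings the engine finds a later start position where the lookbehind no longer sees 'APT ' or 'BOX '.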
The regex you’re looking for is ^(?!APT|BOX).*?([0-9]{4,})$, which is broken down like so:
^(?!APT|BOX) : the beginning of the string cannot be followed by APT or BOX
.*? : a bunch of garbage in the middle of the string, taking as few characters as possible (i.e. HITME in your test cases)
([0-9]{4,})$ : the matched digits at the end of the string
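As a quick check, the pattern can be run against the test cases from the question. This is a sketch assuming Python's `re` module; the helper name `trailing_digits` is hypothetical.

```python
import re

# The suggested pattern: reject strings that start with APT or BOX,
# then capture a run of 4+ digits at the very end of the string.
FIXED = re.compile(r"^(?!APT|BOX).*?([0-9]{4,})$")

def trailing_digits(address):
    """Return the captured trailing digits, or None if the address is rejected."""
    m = FIXED.match(address)
    return m.group(1) if m else None

should_match = ["HITME 1234", "HITME 12345", "HITME1234"]
should_not_match = ["BOX 1234", "BOX 12345", "BOX4044", "APT 1234",
                    "APT 12345", "NONHIT123", "NONHIT 123"]

for address in should_match + should_not_match:
    print(repr(address), "->", trailing_digits(address))
```

Because the lookahead is anchored at the start of the string, it rejects any input that begins with APT or BOX, including a longer word such as BOXER. If APT or BOX can appear in the middle of an address, the pattern would need to be adjusted.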