Using this text as an example: foo 34abcd bar 7890xyz 123
Please help me find a regex that finds all "orphan" numbers (here, 123) and a regex that finds all "orphan" texts (here, foo and bar).
Finding every number immediately followed by text is simple; I use this regex:
([0-9]+)([a-z]+)
But how do I find all the non-conforming segments/words?
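For reference, here is a minimal Java sketch of that simple regex applied to the sample text. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberUnitPairs {
    // The regex from the question: digits immediately followed by lowercase letters.
    private static final Pattern PAIR = Pattern.compile("([0-9]+)([a-z]+)");

    // Returns each match as "number:text", e.g. "34:abcd".
    static List<String> findPairs(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.add(m.group(1) + ":" + m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(findPairs("foo 34abcd bar 7890xyz 123"));
        // prints [34:abcd, 7890:xyz]
    }
}
```

The words foo and bar and the number 123 are skipped, which is exactly the part that needs reporting.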
To find a number not followed by text (the "orphan" number), I tried a negative look-ahead:
([0-9]+)(?![a-z]+)
but that returns false positives such as 3 from 34abcd, because 3 is not followed by a letter. How can I make the regex treat 34 as a whole, rather than as 3 and 4?
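The false positive comes from backtracking: when the look-ahead fails after 34, the engine gives back one digit and retries with 3, which is followed by 4 rather than a letter. A small Java sketch that reproduces this on the sample text (the helper name findAll is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadBacktracking {
    // Collects every full match of the given regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String sample = "foo 34abcd bar 7890xyz 123";
        // The attempted negative look-ahead from the question.
        System.out.println(findAll("([0-9]+)(?![a-z]+)", sample));
        // prints [3, 789, 123]
    }
}
```

Only 123 is a genuine orphan; 3 and 789 are shortened prefixes of 34 and 7890.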
To find texts not immediately preceded by a number (the "orphan" texts), I tried a negative look-behind:
(?<![0-9]+)([a-z]+)
But it produces a false positive on 34abcd, returning the text bcd.
The look-behind seems to work only for exact texts, like (min|sec|year|years), but not for an arbitrary string of letters.
Background:
I’m trying to write a regex-based parser for Java Duration, something more human-friendly than the default parser in Duration.parse. It should parse "1h 30min 5sec" but also detect and report any unparsable bits; for example, "foo bar 30min" should complain about not recognizing "foo bar".
There is an existing question on this, "Parsing time strings like "1h 30min"", but the suggested solutions are not satisfactory.
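To make the goal concrete, here is a rough sketch of the intended behaviour. It sidesteps lookarounds by splitting on whitespace and testing each word as a whole. The unit set (h, min, sec), the class name, and the method signature are assumptions for illustration only.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LenientDurationSketch {
    // Assumed token grammar: digits followed by one of the units h, min, sec.
    private static final Pattern TOKEN = Pattern.compile("([0-9]+)(h|min|sec)");

    // Sums all recognized tokens; unrecognized words are appended to 'rejected'.
    static Duration parse(String input, List<String> rejected) {
        Duration total = Duration.ZERO;
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input yields a single empty word
            }
            Matcher m = TOKEN.matcher(word);
            if (!m.matches()) {
                rejected.add(word);
                continue;
            }
            long amount = Long.parseLong(m.group(1));
            switch (m.group(2)) {
                case "h":   total = total.plusHours(amount); break;
                case "min": total = total.plusMinutes(amount); break;
                default:    total = total.plusSeconds(amount); break; // "sec"
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> rejected = new ArrayList<>();
        Duration d = parse("foo bar 30min", rejected);
        System.out.println(d + " rejected=" + rejected);
    }
}
```

With "foo bar 30min" the result is a 30-minute Duration and the rejected list holds foo and bar.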
>Solution :
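You need to include the letter pattern in the character class inside the lookbehind:
(?<![0-9a-z])([a-z]+)
With only digits in the lookbehind, a match can start at b in 34abcd, because b is preceded by a letter and not by a digit. Adding a-z to the class rejects any start position in the middle of a word.
Note that the capturing group around [a-z]+ is superfluous, and should be removed.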
See the regex demo.
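A minimal Java sketch of the lookbehind fix on the sample text follows. The mirrored look-ahead for orphan numbers is an assumption derived by analogy and is not part of the answer above; the class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrphanFinder {
    // Letters not preceded by a digit or another letter (the fix from the answer).
    static final Pattern ORPHAN_TEXT = Pattern.compile("(?<![0-9a-z])[a-z]+");

    // Assumption by analogy: digits not followed by a letter or another digit,
    // so backtracking cannot shorten 34 to 3.
    static final Pattern ORPHAN_NUMBER = Pattern.compile("[0-9]+(?![0-9a-z])");

    // Collects every full match of the pattern in the input.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String sample = "foo 34abcd bar 7890xyz 123";
        System.out.println(findAll(ORPHAN_TEXT, sample));   // prints [foo, bar]
        System.out.println(findAll(ORPHAN_NUMBER, sample)); // prints [123]
    }
}
```

Both patterns use a single-character lookaround, so they avoid the variable-length lookbehind from the original attempt.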