Regex to find text segments where text is not preceded by a number

Using this text as an example: foo 34abcd bar 7890xyz 123.
Please help me find a regex to find all "orphan" numbers, that would be 123;
and regex to find all "orphan" texts, that would be foo and bar.

Finding every number immediately followed by text is simple: I use the regex
([0-9]+)([a-z]+). But how do I find all the non-conforming segments/words?

To find a number not followed by text (the "orphan" number), I tried a negative look-ahead, ([0-9]+)(?![a-z]+), but that returns false positives such as 3 from 34abcd, because 3 is not followed by a letter. How can I make the regex treat 34 as a whole number rather than as 3 and 4?

To find texts not immediately preceded by a number (the "orphan" texts),
I tried a negative look-behind, (?<![0-9]+)([a-z]+).
But it produces a false positive on 34abcd, returning the text bcd.
The look-behind seems to work only for exact texts, like (min|sec|year|years), but not for an arbitrary string of letters.

Background:
I’m trying to write a regex-based parser for Java Duration, something more human-friendly than the default parser in Duration.parse. It should parse "1h 30min 5sec" but also detect and report back any unparsable bits; for example, "foo bar 30min" should complain about not recognizing "foo bar".

There is an existing question about this,
Parsing time strings like "1h 30min",
but the suggested solutions are not satisfactory.

>Solution :

You need to include the letter pattern in the character class inside the lookbehind, so that a match cannot start in the middle of a run of letters:

(?<![0-9a-z])([a-z]+)

Note that the capturing group around [a-z]+ is superfluous, and should be removed.
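To try the pattern above on the sample text from the question, here is a minimal Java sketch. The class name OrphanFinder and the helper findAll are made up for illustration. The second pattern, [0-9]+(?![0-9a-z]), is not part of the original answer; it is an assumed mirror image of the same idea for the "orphan" numbers, putting the digits into the lookahead class so that backtracking from 34 to 3 no longer produces a match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrphanFinder {
    // Letters that are not preceded by a digit or by another letter.
    public static final Pattern ORPHAN_TEXT = Pattern.compile("(?<![0-9a-z])[a-z]+");

    // Assumption (not from the answer): digits that are not followed by
    // another digit or by a letter, i.e. the same trick applied to a lookahead.
    public static final Pattern ORPHAN_NUMBER = Pattern.compile("[0-9]+(?![0-9a-z])");

    // Collects every match of the pattern in the input, in order.
    public static List<String> findAll(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "foo 34abcd bar 7890xyz 123.";
        System.out.println(findAll(ORPHAN_TEXT, input));   // [foo, bar]
        System.out.println(findAll(ORPHAN_NUMBER, input)); // [123]
    }
}
```

Both patterns only cover lowercase ASCII letters, as in the question; pass Pattern.CASE_INSENSITIVE or widen the character classes if the input can contain uppercase letters.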

See the regex demo.
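For the use case described in the Background of the question, the same lookaround idea could be combined into one token pattern, with everything between the tokens reported as unparsed. The following is only a sketch under stated assumptions: the class name HumanDuration, the method parse, and the unit names h, min and sec are hypothetical choices, not something the question or the answer specifies.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HumanDuration {
    // A number immediately followed by a known unit. The lookarounds stop a
    // token from starting or ending in the middle of a longer word or number.
    // The unit list (h, min, sec) is an assumption for this sketch.
    private static final Pattern TOKEN =
            Pattern.compile("(?<![0-9a-z])([0-9]+)(h|min|sec)(?![0-9a-z])");

    // Sums all recognized tokens; text between tokens is added to 'unparsed'.
    // Very large numbers would make Long.parseLong throw NumberFormatException.
    public static Duration parse(String input, List<String> unparsed) {
        Duration total = Duration.ZERO;
        Matcher m = TOKEN.matcher(input);
        int last = 0;
        while (m.find()) {
            collect(input.substring(last, m.start()), unparsed);
            long amount = Long.parseLong(m.group(1));
            switch (m.group(2)) {
                case "h":   total = total.plusHours(amount);   break;
                case "min": total = total.plusMinutes(amount); break;
                default:    total = total.plusSeconds(amount); break; // "sec"
            }
            last = m.end();
        }
        collect(input.substring(last), unparsed);
        return total;
    }

    // Records a gap between tokens unless it is only whitespace.
    private static void collect(String gap, List<String> unparsed) {
        String trimmed = gap.trim();
        if (!trimmed.isEmpty()) {
            unparsed.add(trimmed);
        }
    }

    public static void main(String[] args) {
        List<String> unparsed = new ArrayList<>();
        System.out.println(parse("1h 30min 5sec", unparsed)); // PT1H30M5S
        System.out.println(unparsed);                         // []

        unparsed.clear();
        System.out.println(parse("foo bar 30min", unparsed)); // PT30M
        System.out.println(unparsed);                         // [foo bar]
    }
}
```

Collecting the gaps between matches, rather than matching the orphans with separate patterns, keeps adjacent unrecognized words together, so "foo bar" is reported as one piece.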
