
Extract the last sequence of digits from a string along with everything that precedes it

Consider the following string:

AB01CD03

What I want to do is break it down into two tokens, namely AB01CD and 03.

In my string the number of digits following the last alpha character is unknown. There is always a sequence of digits at the end of the string.


Now, I can do this:

import re
S = 'AB01CD03'
v, = re.findall(r'(\d+)$', S)
assert v == '03'

…and because I now know the length of v I can deduce how to acquire the preamble using a slice – e.g.,

preamble = S[:len(S)-len(v)]
assert preamble == 'AB01CD'

Bearing in mind that the preamble may contain digits, what I’m looking for is a single RE that will reveal the two separate tokens – i.e.,

a, b = re.findall(MAGIC_EXPRESSION, S)

Is this possible?

Solution:

Yes, like this:

import re
s = 'AB01CD03'
m = re.match(r'^(.+?)(\d+)$', s)
print(m.group(1), m.group(2))

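This works because the group (.+?) is non-greedy (lazy): it consumes as few characters as it can, so the second group (\d+) captures the entire run of digits at the end rather than just the last one. $ anchors the second group at the end of the string. ^ anchors the first group at the start; re.match already anchors there, so ^ is redundant but harmless. Note that (.+?) requires at least one character before the digits.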
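To see why the laziness matters, here is a small side-by-side comparison. The greedy pattern is not from the answer; it is included only as a contrast, and the variable names are illustrative.

```python
import re

s = 'AB01CD03'

# Lazy first group: stops as early as possible, so group 2 gets
# the whole trailing run of digits.
lazy = re.match(r'^(.+?)(\d+)$', s).groups()

# Greedy first group (for contrast): takes as much as possible and
# gives back only the single digit that \d+ needs in order to match.
greedy = re.match(r'^(.+)(\d+)$', s).groups()

print(lazy)    # ('AB01CD', '03')
print(greedy)  # ('AB01CD0', '3')
```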

Result:

AB01CD 03

Closer to the syntax you were asking for:

a, b = re.match(r'^(.+?)(\d+)$', s).groups()
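If the input might not end in digits, calling .groups() on a failed match raises AttributeError because re.match returns None. The sketch below guards against that. The helper name split_trailing_digits is hypothetical, and it assumes you would rather accept an empty preamble (hence .*? instead of .+?) when the string is all digits. It also shows the re.findall form from the question, which returns a list containing one tuple when the pattern has two groups.

```python
import re

# Hypothetical helper, not part of the original answer.
# fullmatch anchors at both ends, so ^ and $ are not needed here.
_TRAILING_DIGITS = re.compile(r'(.*?)(\d+)')

def split_trailing_digits(s):
    """Return (preamble, digits), or None if s does not end in a digit."""
    m = _TRAILING_DIGITS.fullmatch(s)
    return m.groups() if m else None

print(split_trailing_digits('AB01CD03'))  # ('AB01CD', '03')
print(split_trailing_digits('2024'))      # ('', '2024')
print(split_trailing_digits('ABCD'))      # None

# findall yields a list of tuples for a multi-group pattern,
# so unpack the single tuple it contains.
(a, b), = re.findall(r'^(.+?)(\d+)$', 'AB01CD03')
print(a, b)  # AB01CD 03
```

Because . does not match a newline by default, a string containing line breaks would need the re.DOTALL flag for the preamble group to span them.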