Using Python re and findall to match complex combination of digits in string

I'm trying to use Python's re library to analyze a string containing a street name followed by one or more numbers separated by forward slashes.

example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'

I want to match every number, including any decimal part after a dot and any adjacent letters. If a hyphen joins a number with a letter suffix to a second number or letter, the whole range should count as one match.


Expected output:

['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']

I'm trying the following:

numbers = re.findall(r'\d+\.*\d*\w[-\w]*', example)

This finds everything except a lone single digit (i.e. '1'):

print(numbers)

['2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']

How do I need to tweak my regex in order to achieve the desired output?

Solution:

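The pattern does not match the single 1 because \d+\.*\d*\w[-\w]* requires at least 2 characters: at least 1 digit for \d+ and then 1 more word character for \w. A value such as 10 still matches only because \d+ backtracks to a single digit and \w takes the second one.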
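
The two-character minimum can be checked directly. The snippet below is a small sketch that runs the pattern from the question against a few individual tokens with re.fullmatch; the variable names original and tokens are only illustrative.

```python
import re

# The pattern from the question
original = r'\d+\.*\d*\w[-\w]*'

# Individual house-number tokens taken from the example string
tokens = ['1', '10', '3a', '2.1']

for token in tokens:
    # fullmatch succeeds only when the entire token fits the pattern
    matched = re.fullmatch(original, token) is not None
    print(f"{token!r}: {matched}")
```

Only '1' fails here, since nothing is left for the mandatory \w once \d+ has taken the single digit.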

If the number should not end with a hyphen and only the letters a-z may follow the digits (use a case-insensitive match, such as the re.I flag, to accept uppercase letters as well):

\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*
  • \b A word boundary
  • \d+(?:\.\d+)? Match digits with an optional decimal part
  • [a-z]* Match optional chars a-z
  • (?:-\w+)* optionally repeat matching - and 1 or more word characters
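
For readers who prefer the explanation next to the pattern itself, the same expression can be written with re.VERBOSE so each part carries its own comment. This is only an alternative layout of the pattern above, not a different technique; the name pattern and the choice to compile with re.IGNORECASE are assumptions made for this sketch.

```python
import re

# Same pattern as above, laid out with re.VERBOSE (whitespace and # comments are ignored)
pattern = re.compile(r"""
    \b              # a word boundary
    \d+             # one or more digits
    (?:\.\d+)?      # optional decimal part
    [a-z]*          # optional letters directly after the digits
    (?:-\w+)*       # zero or more groups of a hyphen plus word characters
""", re.VERBOSE | re.IGNORECASE)

example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
print(pattern.findall(example))
```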

Note that matching an address can be hard because there are many different notations. This pattern only targets the format shown in the example string.
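
To illustrate that caveat, here is a short sketch of how the pattern behaves on a few other notations. The input strings are hypothetical and do not come from the question; they are only meant to show where the pattern's assumptions stop holding.

```python
import re

pattern = r"\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*"

# Hypothetical inputs, not taken from the question
trailing_hyphen = re.findall(pattern, 'Examplestreet 12-')     # the hyphen is left out
upper_plain = re.findall(pattern, 'Examplestreet 3A')          # 'A' is dropped without re.I
upper_flag = re.findall(pattern, 'Examplestreet 3A', re.I)     # 'A' is kept with re.I
ordinal = re.findall(pattern, '5th Avenue 7')                  # a digit in the street name also matches

print(trailing_hyphen, upper_plain, upper_flag, ordinal)
```

The last case shows that a street name containing a digit yields an extra match, so real data may need the street part separated from the numbers first.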

import re

example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
pattern = r"\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*"
# re.I makes [a-z] accept uppercase letters too (case-insensitive match)
print(re.findall(pattern, example, re.I))

Output

['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']