Regex Trying To Grab Specific Part Of String Python

I am pretty new to regex and I am trying to grab part of this string. I want the match to start at the first digit in the string and continue all the way to the five digits at the end. Example below.

import re

string = "['Today is the open house of 1234 High Drive, Denver, COLORADO 80204; open to the Public "

property_address = re.findall('\d-\d\d\d\d\d', str(string))

print(property_address)

The code above does not work. I'm a bit confused about how to tell regex to start at the first digit it finds and grab everything until it reaches a five-digit sequence.

Thanks for all the help or examples.

>Solution :

You can use:

import re

s = """
aldjfladjfa alsdjflaksjdf 1234 High Drive, Denver, COLORADO 80204 aldjfladjfa alsdjflaksjdf 
aldjfladjfa alsdjflaksjdf 1234 High Drive, Denver, COLORADO 80204 - 1829
aldjfladjfa alsdjflaksjdf  1234 High Drive, Denver, COLORADO 00204 - 1829
aldjfladjfa alsdjflaksjdf aldjfladjfa alsdjflaksjdf aldjfladjfa alsdjflaksjdf 
aldjfladjfa alsdjflaksjdf 1234 High Drive, 3rd, 4th phone number 1391713917 Denver, COLORADO 00204 - 1829 aldfjald

"""

p = r'\b[1-9].*[0-9]{5}(?:-[0-9]{4}\b)?'

find_address = re.findall(p, s)

print(find_address)

Prints

['1234 High Drive, Denver, COLORADO 80204', '1234 High Drive, Denver, COLORADO 80204', '1234 High Drive, Denver, COLORADO 00204', '1234 High Drive, 3rd, 4th phone number 1391713917 Denver, COLORADO 00204']
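As a quick check, the same pattern can be applied to the string from the question. This is only a minimal sketch: it assumes there is a single address per string, so it uses re.search instead of re.findall, and the variable names text and address are just illustrative.

```python
import re

# The string from the question, unchanged.
text = "['Today is the open house of 1234 High Drive, Denver, COLORADO 80204; open to the Public "

# Same pattern as in the answer.
pattern = r'\b[1-9].*[0-9]{5}(?:-[0-9]{4}\b)?'

# re.search returns the first match or None, so guard before calling group().
match = re.search(pattern, text)
address = match.group(0) if match else None

print(address)  # 1234 High Drive, Denver, COLORADO 80204
```

The match stops at 80204 because the greedy .* backs up to the last run of five digits in the string, and the semicolon that follows does not start the optional group.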

Notes

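  • A ZIP code is sometimes followed by a - and four more digits, so the pattern allows for that. The optional part only matches when the - comes directly after the five digits. In the sample text the - has spaces around it (80204 - 1829), which is why the extra four digits do not show up in the output above.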

\b[1-9].*[0-9]{5}(?:-[0-9]{4}\b)?:

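  • \b is a word boundary, so the match has to begin at the start of a word or number.
  • [1-9] assumes that the address starts with a digit from 1 to 9 and not 0. If you want to allow 0, then use \b[0-9].*[0-9]{5}(?:-[0-9]{4}\b)?.
  • .* matches any characters except a newline. It is greedy, so the match runs up to the last five-digit sequence on the line.
  • [0-9]{5} matches exactly five digits in a row.
  • (?:-[0-9]{4}\b)? is an optional non-capturing group. If a - and four digits directly follow the five digits, they are included in the match; otherwise the group is skipped.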
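
If the four-digit extension should also be captured when the - has spaces around it, one possible variation is to allow optional spaces or tabs on each side of the hyphen. This is a sketch and not part of the original pattern; the [ \t]* pieces and the sample lines below are assumptions made for illustration.

```python
import re

# Hypothetical variant: [ \t]* on each side of the hyphen lets the optional
# group match "80204 - 1829" as well as "80204-1829".
pattern = r'\b[1-9].*[0-9]{5}(?:[ \t]*-[ \t]*[0-9]{4}\b)?'

# Made-up sample lines modelled on the ones in the answer.
lines = [
    "open house at 1234 High Drive, Denver, COLORADO 80204 - 1829",
    "open house at 1234 High Drive, Denver, COLORADO 80204-1829",
    "open house at 1234 High Drive, Denver, COLORADO 80204; open to the public",
]

addresses = []
for line in lines:
    match = re.search(pattern, line)
    if match:  # skip lines with no address-like match
        addresses.append(match.group(0))

for address in addresses:
    print(address)
```

Using [ \t]* rather than \s* keeps the optional group from reaching across a line break when the pattern is run over multi-line text with re.findall.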