I am pretty new to regex and I am trying to grab part of this string. I want the match to start at the first digit in the string and copy everything up to and including the five-digit sequence at the end (the ZIP code). Example below.
import re
string = "['Today is the open house of 1234 High Drive, Denver, COLORADO 80204; open to the Public "
property_address = re.findall('\d-\d\d\d\d\d', str(string))
print(property_address)
The code above does not work. I'm a bit confused about how to tell the regex: start at the first digit you find and grab everything until you reach a 5-digit sequence.
Thanks for any help or examples.
Solution:
You can use:
import re
s = """
aldjfladjfa alsdjflaksjdf 1234 High Drive, Denver, COLORADO 80204 aldjfladjfa alsdjflaksjdf
aldjfladjfa alsdjflaksjdf 1234 High Drive, Denver, COLORADO 80204 - 1829
aldjfladjfa alsdjflaksjdf 1234 High Drive, Denver, COLORADO 00204 - 1829
aldjfladjfa alsdjflaksjdf aldjfladjfa alsdjflaksjdf aldjfladjfa alsdjflaksjdf
aldjfladjfa alsdjflaksjdf 1234 High Drive, 3rd, 4th phone number 1391713917 Denver, COLORADO 00204 - 1829 aldfjald
"""
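# first non-zero digit, then everything up to the last five consecutive digits on the line (the ZIP code), plus an optional -NNNN extension
p = r'\b[1-9].*[0-9]{5}(?:-[0-9]{4}\b)?'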
find_address = re.findall(p, s)
print(find_address)
Prints:
['1234 High Drive, Denver, COLORADO 80204', '1234 High Drive, Denver, COLORADO 80204', '1234 High Drive, Denver, COLORADO 00204', '1234 High Drive, 3rd, 4th phone number 1391713917 Denver, COLORADO 00204']
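As a quick check against the question itself, here is a minimal sketch that applies the same pattern to the asker's string. It reuses the question's variable names and passes the string to re.findall directly, since it is already a str and the str() call is not needed.

```python
import re

# The string from the question, unchanged (including its leading "['").
string = "['Today is the open house of 1234 High Drive, Denver, COLORADO 80204; open to the Public "

# Same pattern as in the answer: first non-zero digit, then everything up to the
# last five consecutive digits, with an optional -NNNN extension.
p = r'\b[1-9].*[0-9]{5}(?:-[0-9]{4}\b)?'

property_address = re.findall(p, string)
print(property_address)  # ['1234 High Drive, Denver, COLORADO 80204']
```

The "; open to the Public" tail is dropped because the optional group only continues past the ZIP code when a hyphen follows it.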
Notes
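- Occasionally, there is a hyphen and four more digits after the ZIP code (ZIP+4), right? The pattern makes that part optional. As written, it only picks up the extension when the hyphen directly follows the ZIP code (80204-1829); when there are spaces, as in 80204 - 1829 in the sample text, the match stops at 80204.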
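If the extension may be written with spaces around the hyphen, one option (a sketch, not the only way) is to allow spaces inside the optional group. The sample lines below are made up for illustration. Spaces are matched with a literal " *" rather than \s*, so the optional group cannot run onto the next line.

```python
import re

# Made-up sample lines: ZIP+4 with spaces, ZIP+4 without spaces, plain ZIP.
s = """
Open house at 1234 High Drive, Denver, COLORADO 80204 - 1829 today
Open house at 1234 High Drive, Denver, COLORADO 80204-1829 today
Open house at 1234 High Drive, Denver, COLORADO 80204; open to the public
"""

# Same as the original pattern, except " *- *" also accepts spaces around the hyphen.
p = r'\b[1-9].*[0-9]{5}(?: *- *[0-9]{4}\b)?'

addresses = re.findall(p, s)
print(addresses)
```

This should print the first address with " - 1829", the second with "-1829", and the third ending at 80204.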
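\b[1-9].*[0-9]{5}(?:-[0-9]{4}\b)?:
- \b is a word boundary.
- [1-9] assumes that the address starts with a digit from 1 to 9, not 0. If you want to allow 0, use \b[0-9].*[0-9]{5}(?:-[0-9]{4}\b)? instead.
- .* matches any characters except a newline. It is greedy, so the match extends to the last five consecutive digits on the line rather than the first.
- [0-9]{5} means exactly five digits.
- (?:-[0-9]{4}\b)? is an optional non-capturing group: if a hyphen and four digits follow the ZIP code, they are included; otherwise they are skipped. Because the group is non-capturing, re.findall still returns the whole match.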
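The pattern relies on .* being greedy. A quick comparison with the lazy form .*? on the last sample line from the answer shows why: the lazy version stops at the first five consecutive digits, which on this line sit inside the phone number.

```python
import re

# The last sample line from the answer above.
line = "aldjfladjfa alsdjflaksjdf 1234 High Drive, 3rd, 4th phone number 1391713917 Denver, COLORADO 00204 - 1829 aldfjald"

greedy = r'\b[1-9].*[0-9]{5}(?:-[0-9]{4}\b)?'   # .*  : longest match, ends at the last 5 digits
lazy = r'\b[1-9].*?[0-9]{5}(?:-[0-9]{4}\b)?'    # .*? : shortest match, ends at the first 5 digits

greedy_result = re.findall(greedy, line)
lazy_result = re.findall(lazy, line)
print(greedy_result)  # ['1234 High Drive, 3rd, 4th phone number 1391713917 Denver, COLORADO 00204']
print(lazy_result)    # ['1234 High Drive, 3rd, 4th phone number 13917']
```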