I have a pandas column that looks like this:
0 LOT 8 BLK 32 MISSIONARY HEIGHTS 1 LTS 31-39 & PT 30, 15-22 LEWIS 2 LT 2 DEPOT SUB ....
I want to extract out the lot number from this but sometimes they are spelled "LOT", "LTS" or "LT". And I want to extract out the lot number into a new column. So, for the example above, I would have:
df['Lot'] 0 8 1 31-39 2 2
How can I do this? Please advise.
We can use
df["Lot"] = df["Address"].str.extract(r'\b(?:LOT|LTS?) (\d+(?:-\d+)*)')
Here is a demo showing that the regex extraction logic is working.
To remove these lot values from the address, you may use
df["Other"] = df["Address"].str.replace(r'\s*(?:LOT|LTS?) \d+(?:-\d+)*\s*', ' ').str.strip()
The final call to
strip() is to remove dangling leading/trailing whitespace, which would occur for those addresses either starting or ending with a lot quantity.