I simply want to extract all words before a certain word from a pandas df column. For example if I have a df column:
County
Salt Lake County
San Juan County
Dover County
I want to get:
Salt Lake
San Juan
Dover
I have tried:
df['new_county'] = df['County'].str.lower().str.extract(r'\w+(?=\s+county)')
But this only extracts the one word right before "County", and I couldn't figure out how to get all the words. All the other questions on SO are a lot more complicated. Please help.
>Solution :
I might actually express your problem as just stripping off County from the end of the county names:
df["new_county"] = df["County"].str.replace(r'\s+County$', '', regex=True)
Note that this approach is also robust for county names that do not end in County. In those cases, the replacement leaves the text unchanged.
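For comparison, here is a minimal sketch that runs the replacement next to an extract-based variant closer to the original attempt. The sample frame and the extra "Baltimore" row are made up for illustration, and the column names `stripped` and `extracted` are just placeholders. Note that `regex=True` has to be passed explicitly on pandas 2.0 and later, where `str.replace` treats the pattern as a literal string by default.

```python
import pandas as pd

# Hypothetical sample data; "Baltimore" is added to show a value without the suffix
df = pd.DataFrame(
    {"County": ["Salt Lake County", "San Juan County", "Dover County", "Baltimore"]}
)

# Option 1: remove a trailing " County"; values without the suffix are left as-is
df["stripped"] = df["County"].str.replace(r"\s+County$", "", regex=True)

# Option 2: capture everything before a trailing " County";
# str.extract needs a capture group, and expand=False returns a Series
df["extracted"] = df["County"].str.extract(r"^(.*?)\s+County$", expand=False)

print(df)
```

The difference shows up on the last row: the replace version keeps "Baltimore" as it is, while the extract version yields NaN because the pattern does not match.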