I have a pandas DataFrame with a column named store. It contains store entries that look like this:
H-E-B 721:1101 W STAN SCHLUETER LOOP,KILLEEN,TX
H-E-B PLUS 39:2509 N MAIN ST,BELTON,TX
I want the store numbers, which are 721 and 39 in the above examples.
Here is my process for getting it:
- Find the position of the colon.
- Slice backwards until reaching a space.
How do I do this in Python/Pandas? I’m guessing that I need to use regex, but I have no idea how to start.
>Solution :
You can use str.extract with the (\d+): regex:
df['number'] = df['store'].str.extract(r'(\d+):', expand=False).astype(int)
Output:
                                             store  number
0  H-E-B 721:1101 W STAN SCHLUETER LOOP,KILLEEN,TX     721
1           H-E-B PLUS 39:2509 N MAIN ST,BELTON,TX      39
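If you would rather follow the two steps from the question literally (find the colon, then walk back to the preceding space), here is a minimal sketch using only pandas string methods. It assumes every row contains a colon and that the token right before it is purely numeric; the sample DataFrame is rebuilt from the two rows in the question, and before_colon is just an illustrative name.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "store": [
        "H-E-B 721:1101 W STAN SCHLUETER LOOP,KILLEEN,TX",
        "H-E-B PLUS 39:2509 N MAIN ST,BELTON,TX",
    ]
})

# Step 1: keep everything before the first colon.
before_colon = df["store"].str.split(":", n=1).str[0]

# Step 2: take the last whitespace-separated token of that prefix,
# which is the text between the preceding space and the colon.
df["number"] = before_colon.str.split().str[-1].astype(int)

print(df)
```

If some rows might not follow this pattern, astype(int) would raise on the non-numeric token; wrapping the token in pd.to_numeric(..., errors="coerce") instead is one way to get NaN for those rows.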