Regular expression in Pandas: Get substring between a space and a colon

I have a Pandas DataFrame with a column named store. Each row holds a store string like these:

H-E-B 721:1101 W STAN SCHLUETER LOOP,KILLEEN,TX
H-E-B PLUS 39:2509 N MAIN ST,BELTON,TX

I want the store numbers, which are 721 and 39 in the examples above.

Here is my process for getting it:

  1. Find the position of the colon.
  2. Slice backwards until reaching a space.

How do I do this in Python/Pandas? I’m guessing that I need to use regex, but I have no idea how to start.
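For comparison, the two steps listed above translate directly into plain string methods, with no regex involved. This is a minimal sketch: store_number is a hypothetical helper name, and it assumes the store number is the text between the first colon and the nearest space before it.

```python
def store_number(store: str) -> int:
    """Hypothetical helper: return the digits between the first colon
    and the nearest space before it."""
    colon = store.find(":")  # step 1: position of the colon
    if colon == -1:
        raise ValueError(f"no colon in {store!r}")
    space = store.rfind(" ", 0, colon)  # step 2: walk back to the nearest space
    return int(store[space + 1:colon])  # the text in between is the number


print(store_number("H-E-B 721:1101 W STAN SCHLUETER LOOP,KILLEEN,TX"))
print(store_number("H-E-B PLUS 39:2509 N MAIN ST,BELTON,TX"))
```

It can be applied to the column with df['number'] = df['store'].map(store_number). The regex version is shorter, and when a row doesn't match it yields NaN instead of raising.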

Solution:

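You can use str.extract with the regex (\d+):. Its capture group (\d+) grabs the run of digits immediately before the colon, and expand=False returns a Series instead of a DataFrame: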

df['number'] = df['store'].str.extract(r'(\d+):', expand=False).astype(int)

Output:

                                             store  number
0  H-E-B 721:1101 W STAN SCHLUETER LOOP,KILLEEN,TX     721
1           H-E-B PLUS 39:2509 N MAIN ST,BELTON,TX      39
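If some rows might lack a number followed by a colon, str.extract returns NaN for them and .astype(int) raises. The sketch below assumes such rows can occur (the third row is a hypothetical example, not from the question). It converts through pd.to_numeric to pandas' nullable "Int64" dtype so unmatched rows become <NA>. It also puts a literal space before the group, so the digits must sit between a space and a colon, as the title asks:

```python
import pandas as pd

df = pd.DataFrame({"store": [
    "H-E-B 721:1101 W STAN SCHLUETER LOOP,KILLEEN,TX",
    "H-E-B PLUS 39:2509 N MAIN ST,BELTON,TX",
    "WALMART SUPERCENTER,TEMPLE,TX",  # hypothetical row with no store number
]})

# Raw string so "\d" is not treated as a (invalid) string escape.
# The leading space anchors the match between a space and a colon.
numbers = df["store"].str.extract(r" (\d+):", expand=False)

# .astype(int) would fail on the NaN from the unmatched row;
# the nullable "Int64" dtype keeps it as <NA> instead.
df["number"] = pd.to_numeric(numbers).astype("Int64")
print(df)
```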
