Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Regex to match multiple numbers within string

I have a regex that looks like this to extract order numbers from columns:

df["Orders"].str.extract('([0-9]{9,10}[/+ #_;.-]?)')

The orders column can look like this:

12
123456789
1234567890
123456789/1234567890
123456789/1/123456789
123456789+1234567890

The resulting new column in the dataframe after the regex should look like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

NaN
123456789
1234567890
123456789/1234567890
123456789/123456789
123456789+1234567890

However, with my current regex I’m getting the following result:

NaN
123456789
1234567890
123456789/
123456789/
123456789+

How can I get the result that I’m looking for?

>Solution :

You can use

import pandas as pd
df = pd.DataFrame({'Orders':['12','123456789','1234567890','123456789/1234567890','123456789/1/123456789','123456789+1234567890', 'Order number: 6508955960_000010_1005500']})
df["Result"] = df["Orders"].str.findall(r'[/+ #_;.-]?(?<![0-9])[0-9]{9,10}(?![0-9])').str.join('').str.lstrip('/+ #_;.-')
df.loc[df['Result'] == '', 'Result'] = np.nan

See the regex demo. Details

  • [/+ #_;.-]?(?<![0-9])[0-9]{9,10}(?![0-9]) – matches an optional /, +, space, #, _, ;, . or - char, and then none or ten digit number not enclosed with other digits
  • Series.str.findall extracts all occurrences
  • .str.join('') concatenates the matches into a single string
  • .str.lstrip('/+ #_;.-') – removes the special chars that were matched with the number at the beginning of the string
  • df.loc[df['Result'] == '', 'Result'] = np.nan – if needed – replaces empty strings with np.nan values in the Result column.

Output:

>>> df
                  Orders                Result
0                    NaN                   NaN
1              123456789             123456789
2             1234567890            1234567890
3   123456789/1234567890  123456789/1234567890
4  123456789/1/123456789   123456789/123456789
5   123456789+1234567890  123456789+1234567890
>>> 
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading