Python Regex to extract text between numbers

I’d like to extract the text between numbers. For example, if I have text such as the following:

1964 ORDINARY shares
EXECUTORS OF JOANNA C RICHARDSON
100 ORDINARY shares 
TG MARTIN
C MARTIN
7500 ORDINARY shares 
ARCO LIMITED

I want to produce a list of 3 elements, where each element starts at a number and runs up to, but does not include, the next number. The final element has no following number, so it runs to the end of the text:

[
'1964 ORDINARY shares \nEXECUTORS OF JOANNA C RICHARDSON',
'100 ORDINARY shares \nTG MARTIN\nC MARTIN\n',
'7500 ORDINARY shares\nARCO LIMITED'
]

I tried doing this

regex = r'\d(.+?)\d'
re.findall(regex, a, re.DOTALL)

but it returned

['9',
 ' ORDINARY shares\nEXECUTORS OF JOANNA C RICHARDSON\n',
 '0 ORDINARY shares\nTG MARTIN\nC MARTIN\n',
 '0']

>Solution :

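Your pattern fails for two reasons: `\d` matches a single digit rather than a whole number, and the closing `\d` consumes the first digit of the next number, so the next match starts one digit too late. Match the whole number with `\d+` and replace the closing `\d` with a lookahead, `(?=\d|$)`, which checks for the next digit (or the end of the string) without consuming it: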

import re

text = """1964 ORDINARY shares
EXECUTORS OF JOANNA C RICHARDSON
100 ORDINARY shares 
TG MARTIN
C MARTIN
7500 ORDINARY shares 
ARCO LIMITED"""

# \d+ matches the leading number; .*? then lazily takes everything
# up to (but not including) the next digit or the end of the string
pattern = r'\d+.*?(?=\d|$)'
matches = re.findall(pattern, text, flags=re.DOTALL)

print(matches)
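
One caveat: the pattern above starts a new element at any digit, so a digit inside a name or address line would split a record in two. If each record always begins with a number at the start of a line (an assumption about the data, not something stated in the question), a line-anchored variant may be more robust. The sketch below is one possible approach; the helper name `split_records` and the `FLAT 2` sample line are hypothetical, added only for illustration.

```python
import re

# Hypothetical sample: same shape as the question's data, plus a digit
# inside a name line ("FLAT 2") to show the difference.
text = """1964 ORDINARY shares
EXECUTORS OF JOANNA C RICHARDSON
100 ORDINARY shares
TG MARTIN
C MARTIN
7500 ORDINARY shares
FLAT 2 ARCO LIMITED"""


def split_records(raw):
    """Split text into records that each begin with a number at line start."""
    # With re.MULTILINE, ^ matches at the start of every line, so a record
    # ends only where a following line begins with a digit, or at the very
    # end of the string (\Z). re.DOTALL lets . match newlines.
    pattern = r'^\d+.*?(?=^\d|\Z)'
    found = re.findall(pattern, raw, flags=re.DOTALL | re.MULTILINE)
    # Strip the trailing newline left at the end of each record.
    return [record.strip() for record in found]


records = split_records(text)
print(records)
```

Here the `2` in `FLAT 2` does not start a new element because it is not at the beginning of a line, whereas `\d+.*?(?=\d|$)` would split the last record there.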