Python Regex to extract text between numbers

March 31, 2023

I’d like to extract the text between numbers. For example, if I have text such as the following:

1964 ORDINARY shares
EXECUTORS OF JOANNA C RICHARDSON
100 ORDINARY shares 
TG MARTIN
C MARTIN
7500 ORDINARY shares 
ARCO LIMITED

I want to produce a list of 3 elements, where each element is the text between two numbers, including the starting number but not the ending one. The final element has no ending number, so it should run to the end of the text:

[
'1964 ORDINARY shares \nEXECUTORS OF JOANNA C RICHARDSON',
'100 ORDINARY shares \nTG MARTIN\nC MARTIN\n',
'7500 ORDINARY shares\nARCO LIMITED'
]

I tried this:

regex = r'\d(.+?)\d'
re.findall(regex, a, re.DOTALL)

but it returned

['9',
 ' ORDINARY shares\nEXECUTORS OF JOANNA C RICHARDSON\n',
 '0 ORDINARY shares\nTG MARTIN\nC MARTIN\n',
 '0']

>Solution :

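Your attempt fails because each \d matches only a single digit, and the closing \d consumes the digit that should start the next match. Instead, match the whole leading number with \d+, take what follows lazily with .*?, and stop with a lookahead (?=\d|$), which checks for the next digit or the end of the string without consuming it. The re.DOTALL flag lets . match newlines.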

import re

text = """1964 ORDINARY shares
EXECUTORS OF JOANNA C RICHARDSON
100 ORDINARY shares 
TG MARTIN
C MARTIN
7500 ORDINARY shares 
ARCO LIMITED"""

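# \d+ matches the leading number, .*? lazily takes the text after it,
# and the lookahead stops just before the next digit or at the end of the string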
pattern = r'\d+.*?(?=\d|$)'
matches = re.findall(pattern, text, flags=re.DOTALL)

print(matches)
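
Note that the pattern above stops at any digit, so a name containing a number would be split in two, and each match keeps its trailing newline. If every entry starts with its number at the beginning of a line (an assumption about your data, not something the question states), a variant anchored to line starts may be more robust. This is only a sketch; split_records is a hypothetical helper name.

```python
import re

def split_records(text):
    """Split text into records that each begin with a number at the start of a line."""
    # (?m) lets ^ match at every line start; (?s) lets . match newlines.
    # Each match runs lazily until the next line that starts with a digit,
    # or until the end of the string (\Z).
    pattern = r'(?ms)^\d.*?(?=^\d|\Z)'
    # rstrip() drops the trailing newline and spaces left on each record.
    return [match.rstrip() for match in re.findall(pattern, text)]

text = (
    "1964 ORDINARY shares\n"
    "EXECUTORS OF JOANNA C RICHARDSON\n"
    "100 ORDINARY shares\n"
    "TG MARTIN\n"
    "C MARTIN\n"
    "7500 ORDINARY shares\n"
    "ARCO LIMITED"
)

records = split_records(text)
print(records)
```

With this version, a digit in the middle of a line does not start a new element; only a digit at the start of a line does.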