I’d like to extract the text between numbers. For example, if I have text such as the following:
1964 ORDINARY shares
EXECUTORS OF JOANNA C RICHARDSON
100 ORDINARY shares
TG MARTIN
C MARTIN
7500 ORDINARY shares
ARCO LIMITED
I want to produce a list of 3 elements, where each element runs from one number up to, but not including, the next number. The last element has no number after it, so it should run to the end of the text:
[
'1964 ORDINARY shares\nEXECUTORS OF JOANNA C RICHARDSON\n',
'100 ORDINARY shares\nTG MARTIN\nC MARTIN\n',
'7500 ORDINARY shares\nARCO LIMITED'
]
I tried this:
regex = r'\d(.+?)\d'
re.findall(regex, a, re.DOTALL)
but it returned:
['9',
' ORDINARY shares\nEXECUTORS OF JOANNA C RICHARDSON\n',
'0 ORDINARY shares\nTG MARTIN\nC MARTIN\n',
'0']
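To see what goes wrong, it can help to print the whole match (group 0) rather than only the captured group. The sketch below reuses the sample text under the name a, matching the variable in the attempt above. Each full match ends with the digit that the closing \d consumed, so the next search starts after that digit and the following number gets split (for example, "100" is broken into "1" and "00").

```python
import re

a = """1964 ORDINARY shares
EXECUTORS OF JOANNA C RICHARDSON
100 ORDINARY shares
TG MARTIN
C MARTIN
7500 ORDINARY shares
ARCO LIMITED"""

# finditer exposes the full match (group 0), not just the captured group,
# which shows that every match also swallows the closing digit.
full_matches = [m.group(0) for m in re.finditer(r'\d(.+?)\d', a, re.DOTALL)]
print(full_matches)
# ['196', '4 ORDINARY shares\nEXECUTORS OF JOANNA C RICHARDSON\n1', '00 ORDINARY shares\nTG MARTIN\nC MARTIN\n7', '500']
```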
Solution:
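Match the leading number with \d+, then end each match with a lookahead, (?=\d|$). A lookahead checks that the next character is a digit (or that the end of the text has been reached) without consuming it, so the next number is still available to start the following match: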
import re
text = """1964 ORDINARY shares
EXECUTORS OF JOANNA C RICHARDSON
100 ORDINARY shares
TG MARTIN
C MARTIN
7500 ORDINARY shares
ARCO LIMITED"""
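# \d+       the leading number, kept in the match
# .*?       as little text as possible; re.DOTALL lets . match newlines too
# (?=\d|$)  stop before the next digit or at the end of the text,
#           without consuming the digit, so it can start the next match
pattern = r'\d+.*?(?=\d|$)'
matches = re.findall(pattern, text, flags=re.DOTALL)
print(matches)
# ['1964 ORDINARY shares\nEXECUTORS OF JOANNA C RICHARDSON\n', '100 ORDINARY shares\nTG MARTIN\nC MARTIN\n', '7500 ORDINARY shares\nARCO LIMITED']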
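The pattern above treats any digit as a boundary. If a holder's name or address line can itself contain a digit (an assumption about the real data; the sample here has none), a variant that only treats a number at the start of a line as a boundary may be safer. The sketch below adds re.MULTILINE so that ^ matches at every line start, and uses \Z instead of $ for the end of the text, because under MULTILINE $ would also match at every line end. The sample string is made up for illustration.

```python
import re

# Hypothetical sample: the second line contains a digit that is not a shareholding.
text = """1964 ORDINARY shares
FLAT 2 HOLDINGS LTD
100 ORDINARY shares
TG MARTIN"""

# Original pattern: any digit ends the current match.
any_digit = re.findall(r'\d+.*?(?=\d|$)', text, flags=re.DOTALL)

# Line-anchored variant: a block starts only with a number at the start of a line
# and ends just before the next such line, or at the very end of the text (\Z).
line_start = re.findall(r'^\d+.*?(?=^\d|\Z)', text, flags=re.DOTALL | re.MULTILINE)

print(any_digit)
# ['1964 ORDINARY shares\nFLAT ', '2 HOLDINGS LTD\n', '100 ORDINARY shares\nTG MARTIN']
print(line_start)
# ['1964 ORDINARY shares\nFLAT 2 HOLDINGS LTD\n', '100 ORDINARY shares\nTG MARTIN']
```

On the original sample text, both patterns return the same three elements; they only differ when a digit appears somewhere other than the start of a line.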