I have a string containing two phrases separated by an uppercase word:
c="Text is here. TEST . More text here also"
I want to split the string into the two phrases and drop the uppercase word (TEST), so that the output looks like:
["Text is here.","More text here also"]
What I did:
import re
c="Text is here. TEST . More text here also"
s=re.split(r'[A-Z][A-Z\d]+',c)
t=[re.sub('[^A-Za-z0-9]',' ',i) for i in s]
But I still get unwanted spaces, and the period is lost:
['Text is here  ', '   More text here also']
Is there a cleaner, more Pythonic way to generate t?
>Solution :
>>> re.split(r'\s*[A-Z]{2,}[\s.]*', c)
['Text is here.', 'More text here also']
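The pattern matches optional whitespace, then at least two consecutive uppercase letters, then any run of whitespace or dots. Because the surrounding spaces and the stray dot are consumed as part of the separator, no second clean-up pass is needed.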
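If you need this in more than one place, the same pattern can be wrapped in a small helper. This is only a sketch: the names `SEPARATOR` and `split_on_uppercase_word` are made up for illustration, and dropping empty pieces (for example when the string starts with the uppercase word) is an assumption about what you want.

```python
import re

# Optional whitespace, a run of 2+ uppercase letters,
# then any run of whitespace or dots.
SEPARATOR = re.compile(r'\s*[A-Z]{2,}[\s.]*')

def split_on_uppercase_word(text):
    """Split text on all-caps runs and drop empty pieces (hypothetical helper)."""
    return [part for part in SEPARATOR.split(text) if part]

c = "Text is here. TEST . More text here also"
print(split_on_uppercase_word(c))
# ['Text is here.', 'More text here also']
```

Note that `[A-Z]{2,}` also matches an uppercase run inside a longer word. If the separator must always be a standalone word, writing it as `\b[A-Z]{2,}\b` is one option.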