I have a string like the one below.
10. Title text
text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2
text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2
text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2
11. Title text
text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2
text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2
12. Title text
text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2
13. Title text
text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2
text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2
text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2
I want to split it into chunks, each made up of a title and the content under it, and put those chunks in a list.
result = [10. Title text\ntext1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2\n\ntext1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2\ntext1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2, 11. Title text\ntext1text2text1text2text1text2text1text2text1text2text1text2text1 text2text1 text2text1 text2text1 text2text1text2\n\ntext1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2, ........]
I’ve tried this, but honestly I have no idea what to do. Any help would be appreciated.
la_text = []
num = 1
for a in range(3):
    sepa = re.findall(r"\d*(.*)\d*", text)[num]
    la_text.append(sepa)
    num += 1
print(la_text)
>Solution :
If s contains the string from your question, you can do:
import re
pat = re.compile(r"^(\d+\.\s+.*?)(?=\n^\d+\.|\Z)", flags=re.M | re.S)
print(pat.findall(s))
Prints:
[
"10. Title text\ntext1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2\n\n\ntext1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2\n\ntext1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2\n",
"11. Title text\ntext1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2\n\n\ntext1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2\n",
"12. Title text\ntext1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2\n",
"13. Title text\ntext1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2\n\n\ntext1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2\n\n\ntext1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2text1 text2\n",
]
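To see how the pattern works, here is a minimal, self-contained sketch that runs it on a shortened sample. The sample text and the helper name split_numbered_sections are made up for illustration and are not part of the original question. It assumes that every title starts a line with digits followed by a dot, and that no body line starts the same way (such a line would be treated as a new title).

```python
import re

# Same pattern as in the answer above, compiled once.
#   ^(\d+\.\s+      a chunk starts at the beginning of a line with "<digits>. "
#   .*?)            then lazily consumes everything, newlines included (re.S)
#   (?=\n^\d+\.|\Z) until the next line starts with "<digits>." or the string ends
SECTION_PATTERN = re.compile(r"^(\d+\.\s+.*?)(?=\n^\d+\.|\Z)", flags=re.M | re.S)


def split_numbered_sections(text):
    """Return one chunk per numbered title, including the body lines under it.

    Hypothetical helper for illustration only.
    """
    return SECTION_PATTERN.findall(text)


# Shortened, made-up sample with the same shape as the string in the question.
sample = "\n".join([
    "10. Title text",
    "body a",
    "",
    "body b",
    "11. Title text",
    "body c",
    "12. Title text",
    "body d",
])

chunks = split_numbered_sections(sample)
for chunk in chunks:
    print(repr(chunk))
```

re.M lets ^ match at the start of every line, and re.S lets . match newlines, so the lazy .*? can span several body lines and stops as soon as the lookahead sees the next numbered title. If the input has blank lines before a title or a trailing newline at the end, those newlines stay at the end of the chunk, as in the output above; calling chunk.strip() on each item is one way to drop them.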