So I have a string that goes like this:
...<p><noop><fademusic:23,0><26:1><wait:30> <speed:10><30:2><5D:1><color:3>August 3, 9:47 AM<b>District Court<b>Defendant Lobby No. 2<color:0><p><hidetextbox:1><5D:0> <speed:255><music:8,0><wait:30><26:0><bgcolor:513,1,31><wait:7> <person:0,0,0><bg:2><bgcolor:258,1,31><wait:15><wait:30><hidetextbox:0> <name:512><shake:30,0><color:2>(Boy am I nervous!)<color:0><p> <hidetextbox:1><wait:45><name:1792><hidetextbox:0><bgcolor:769,8,31> Wright!<p>...
What do I need: find everything between <p>s. (Note that the ending one is also a starting one for the next.)
My code:
...
filetext = open(fn).read()
tag = '<p>'
result = re.findall(tag+"(.*?)"+tag,filetext,re.DOTALL)
print(result)
...
Expected output:
['<noop><fademusic:23,0><26:1><wait:30>\n<speed:10><30:2><5D:1><color:3>August 3, 9:47 AM<b>District \nCourt<b>Defendant Lobby No. 2<color:0>', '<hidetextbox:1><5D:0>\n<speed:255><music:8,0><wait:30><26:0><bgcolor:513,1,31><wait:7>\n<person:0,0,0><bg:2><bgcolor:258,1,31><wait:15><wait:30><hidetextbox:0>\n<name:512><shake:30,0><color:2>(Boy am I nervous!)<color:0>', '\n<hidetextbox:1><wait:45><name:1792><hidetextbox:0><bgcolor:769,8,31>\nWright!']
Resulting output:
['<noop><fademusic:23,0><26:1><wait:30>\n<speed:10><30:2><5D:1><color:3>August 3, 9:47 AM<b>District \nCourt<b>Defendant Lobby No. 2<color:0>', '\n<hidetextbox:1><wait:45><name:1792><hidetextbox:0><bgcolor:769,8,31>\nWright!']
>Solution :
No need for re module, just use str.split('<p>'). You may not want the empty strings in the result if <p> starts or ends a string so here is a solution if so:
s = '<p><noop><fademusic:23,0><26:1><wait:30> <speed:10><30:2><5D:1><color:3>August 3, 9:47 AM<b>District Court<b>Defendant Lobby No. 2<color:0><p><hidetextbox:1><5D:0> <speed:255><music:8,0><wait:30><26:0><bgcolor:513,1,31><wait:7> <person:0,0,0><bg:2><bgcolor:258,1,31><wait:15><wait:30><hidetextbox:0> <name:512><shake:30,0><color:2>(Boy am I nervous!)<color:0><p> <hidetextbox:1><wait:45><name:1792><hidetextbox:0><bgcolor:769,8,31> Wright!<p>'
result = s.split('<p>')
for n in (0, -1):
if result and not result[n]:
del result[n]
print(result)
Output:
['<noop><fademusic:23,0><26:1><wait:30> <speed:10><30:2><5D:1><color:3>August 3, 9:47 AM<b>District Court<b>Defendant Lobby No. 2<color:0>', '<hidetextbox:1><5D:0> <speed:255><music:8,0><wait:30><26:0><bgcolor:513,1,31><wait:7> <person:0,0,0><bg:2><bgcolor:258,1,31><wait:15><wait:30><hidetextbox:0> <name:512><shake:30,0><color:2>(Boy am I nervous!)<color:0>', ' <hidetextbox:1><wait:45><name:1792><hidetextbox:0><bgcolor:769,8,31> Wright!']
If you don’t want any empty strings, e.g., 'abc<p><p>def' would return ['abc', '', 'def'], then use:
result = [n for n in s.split('<p>') if n]