I have a list of strings, some of which start with a specific pattern. My aim is to trim that pattern from every string that contains it. Each string is a comment that can be a reply to a comment, a reply to a reply, and so on, and I would like to remove the pattern "Quote from: …. ". Let me show you what I mean.
import re
texts = ['Quote from: cdog on July 27, 2017, 04:32:00 AMQuote from: karanggatak on July 27, 2017, 03:42:38 AMyes they can, just keep in personal.', 'Quote from: doublebit21 on July 29, 2017, 03:39:53 AMPossible but imposible they will do that because.', 'Quote from: denny27 on August 01, 2017, 04:46:58 AMIts already happened, there is ample evidence that many wallets have been hacked.']
What I would like to get:
texts = ['yes they can, just keep in personal.', 'Possible but imposible they will do that because.', 'Its already happened, there is ample evidence that many wallets have been hacked.']
I want to remove every "Quote from: … {timestamp}" header from each comment. A string can contain several "Quote from" patterns, because a comment can be a reply to a chain of replies; in that case the headers are stacked one after another at the beginning of the string. I suspect this can be done with a regex, but I have not made much progress so far.
Solution:
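What about this:
# [AP] matches "A" or "P" (a "|" inside square brackets would be matched literally);
# the greedy .* runs up to the LAST "AM"/"PM" in the string.
[re.sub(r"Quote from.*[AP]M", "", string) for string in texts]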
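Because `.*` is greedy, the pattern above removes everything up to the last "AM" or "PM" in the string, so a comment body such as "I AM sure" would be cut as well. A stricter sketch anchors at the start of the string and repeats one full header, including the timestamp, as many times as it occurs. It assumes every header has the exact shape "Quote from: <user> on <Month> <DD>, <YYYY>, <HH:MM:SS> <AM|PM>" with no space before the reply text, as in the sample data; `QUOTE_HEADER` is just an illustrative name.

```python
import re

# Assumed header shape (taken from the sample data):
#   "Quote from: <user> on <Month> <DD>, <YYYY>, <HH:MM:SS> <AM|PM>"
# The non-capturing group (?:...)+ consumes one or more stacked headers,
# and ^ keeps the match at the start of the string, so "AM"/"PM" inside the
# comment body is left alone.
QUOTE_HEADER = re.compile(
    r"^(?:Quote from: .+? on [A-Z][a-z]+ \d{1,2}, \d{4}, \d{2}:\d{2}:\d{2} [AP]M)+"
)

texts = [
    'Quote from: cdog on July 27, 2017, 04:32:00 AMQuote from: karanggatak on July 27, 2017, 03:42:38 AMyes they can, just keep in personal.',
    'Quote from: doublebit21 on July 29, 2017, 03:39:53 AMPossible but imposible they will do that because.',
    'Quote from: denny27 on August 01, 2017, 04:46:58 AMIts already happened, there is ample evidence that many wallets have been hacked.',
]

# Strip the leading quote headers from every comment; strings without a
# header are returned unchanged.
cleaned = [QUOTE_HEADER.sub("", text) for text in texts]
print(cleaned)
```

If the real data uses a different date or time format, the timestamp part of the pattern has to be adjusted accordingly.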