I am learning regular expressions and am getting pretty frustrated with this. I have the following text:
From: sender name
To: the recepient
Subject: well done!
Body: lorem ipsum lorem ipsum
I am trying to extract the text in the lines "From" and "To". I wrote the following regex:
(^From: [a-zA-Z]*)+|(^To: [a-zA-Z]*)+|(^Subject: [a-zA-Z])+
and I’m matching it using this code:
regex = re.compile(pattern, flags=re.IGNORECASE | re.MULTILINE)
result = regex.match(text).groups()
but this only matches the first line. I can’t figure out what’s wrong, nor do I understand how to write regular expressions correctly.
>Solution :
Trying to stay close to your approach, the pattern ^From: ([ a-zA-Z]*)\nTo: ([ a-zA-Z]*) results in:
>>> result
('sender name', 'the recepient')
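For completeness, here is a minimal runnable sketch of the above. The text variable is my reconstruction of your input (I'm assuming the four lines are separated by plain \n newlines), and the flags are the ones from your question:

```python
import re

# Sample input from the question, assumed to use "\n" line endings.
text = (
    "From: sender name\n"
    "To: the recepient\n"
    "Subject: well done!\n"
    "Body: lorem ipsum lorem ipsum\n"
)

# The space inside each character class lets the groups span several words.
pattern = r"^From: ([ a-zA-Z]*)\nTo: ([ a-zA-Z]*)"
regex = re.compile(pattern, flags=re.IGNORECASE | re.MULTILINE)

# match() returns None when nothing matches, so guard before calling groups().
match = regex.match(text)
result = match.groups() if match else None
print(result)
```

Note that this relies on the To: line coming directly after the From: line, and that the character class would stop at digits or punctuation in a name.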
Now, why doesn’t your pattern work?
- (^From: [a-zA-Z]*) would never capture "sender name" because you’re not allowing any whitespace with [a-zA-Z]
- Using the A|B pattern makes it so the engine matches either A OR B, so it wouldn’t look for your To: pattern after encountering From:
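To see both points in action, the sketch below runs your original pattern against the same text (again assuming \n line endings). It also shows a different technique from the one above, not a fix of your pattern: re.findall with one capture group for the header name and one for the value. The names original_groups, per_line, and headers are just illustrative.

```python
import re

text = (
    "From: sender name\n"
    "To: the recepient\n"
    "Subject: well done!\n"
    "Body: lorem ipsum lorem ipsum\n"
)

original = r"(^From: [a-zA-Z]*)+|(^To: [a-zA-Z]*)+|(^Subject: [a-zA-Z])+"
regex = re.compile(original, flags=re.IGNORECASE | re.MULTILINE)

# match() tries only at the start of the string and returns a single match,
# so only the From: alternative takes part; it stops at the first space.
original_groups = regex.match(text).groups()
print(original_groups)

# finditer() keeps scanning, so each alternative gets its turn on a later line,
# but every match is still cut short by the character class.
per_line = [m.group(0) for m in regex.finditer(text)]
print(per_line)

# Alternative technique: capture the header name and the rest of the line.
headers = dict(re.findall(r"^(From|To): (.*)$", text, flags=re.MULTILINE))
print(headers)
```

The findall version does not depend on the order of the lines, and .* accepts any characters up to the end of the line, which may or may not be what you want for your data.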