Use regex to extract recipient and sender from an email text in Python

I am learning regular expressions and am getting pretty frustrated with this. I have the following text:

From: sender name
To: the recepient
Subject: well done!
Body: lorem ipsum lorem ipsum

I am trying to extract the text in the lines "From" and "To". I wrote the following regex:

(^From: [a-zA-Z]*)+|(^To: [a-zA-Z]*)+|(^Subject: [a-zA-Z])+

and I’m matching it using this code:

regex = re.compile(pattern, flags=re.IGNORECASE | re.MULTILINE)
result = regex.match(text).groups() 

but this only matches the first line. I can’t figure out what’s wrong, and I don’t seem to understand how to write regular expressions correctly.

>Solution :

Trying to stay close to your approach, the pattern ^From: ([ a-zA-Z]*)\nTo: ([ a-zA-Z]*) results in:

>>> result
('sender name', 'the recepient')
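
For reference, here is a minimal runnable sketch of that approach. It assumes the sample email from the question is stored in a variable named text; the names text, pattern and match are illustrative choices, not something the question specifies.

```python
import re

# Sample text from the question (the variable name is an assumption).
text = (
    "From: sender name\n"
    "To: the recepient\n"
    "Subject: well done!\n"
    "Body: lorem ipsum lorem ipsum\n"
)

# Each header gets its own capture group; the space inside the
# character class is what lets multi-word names match.
pattern = r"^From: ([ a-zA-Z]*)\nTo: ([ a-zA-Z]*)"
regex = re.compile(pattern, flags=re.IGNORECASE | re.MULTILINE)

match = regex.match(text)
# match is None when the text does not start with a From:/To: pair,
# so guard before calling .groups().
result = match.groups() if match else None
print(result)  # ('sender name', 'the recepient')
```

Note that this pattern relies on the To: line immediately following the From: line, as it does in the sample text.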

Now, why doesn’t your pattern work?

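  1. (^From: [a-zA-Z]*) never captures the full sender name, because [a-zA-Z] does not allow whitespace: the match stops at the first space, right after sender. The group also wraps the From: prefix, so the prefix ends up in the captured text.
  2. A|B makes the engine match either A or B, never both in a single match, so it doesn’t go on to look for your To: pattern after matching From:. On top of that, match() only tries at the very start of the string, even with re.MULTILINE, so it can only ever return the From: line.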
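
Both points can be checked directly. The sketch below first runs the original pattern through match(), then shows one possible alternative that keeps an alternation but scans the whole text with finditer(). The names text, original, header and headers are illustrative and do not come from the question.

```python
import re

# Sample text from the question (the variable name is an assumption).
text = (
    "From: sender name\n"
    "To: the recepient\n"
    "Subject: well done!\n"
    "Body: lorem ipsum lorem ipsum\n"
)
flags = re.IGNORECASE | re.MULTILINE

# The pattern from the question, unchanged.
original = re.compile(
    r"(^From: [a-zA-Z]*)+|(^To: [a-zA-Z]*)+|(^Subject: [a-zA-Z])+", flags
)
# match() only tries at the start of the string, so only the first
# alternative can succeed, and it stops at the first space.
first_only = original.match(text).groups()
print(first_only)  # ('From: sender', None, None)

# Alternative: one pattern for either header, applied to every line.
header = re.compile(r"^(From|To): ([ a-zA-Z]*)$", flags)
headers = {m.group(1): m.group(2) for m in header.finditer(text)}
print(headers)  # {'From': 'sender name', 'To': 'the recepient'}
```

Unlike the single-pattern solution, the finditer() variant does not depend on the To: line directly following the From: line.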