Regular expression: Match everything after a particular word until multiple occurrences of carriage return and new line

I am using Python and would like to match all the words after "Examination(s):" till one or more empty lines occur.

text = "Examination(s): Mathematics 2nd Paper\r\n\r\nTimeTable"
text = "Examination(s):\r\n\r\nMathematics 2nd Paper\r\nblahblah"
text = "Examination(s):\r\nMathematics 2nd Paper\r\n\r\n\r\nmarks"

In all the above examples, my output should be "Mathematics 2nd Paper". Here is what I tried:

import re
pat = re.compile(r'(?:Examination\(s\):)[^\r\n]*')
re.search(pat, text)

The above snippet works fine for example 2 (one occurrence of \r\n), but is not working for examples 1 and 3.

I got an error when I tried to apply your pattern, @Wiktor.

Updating the question to capture the missed scenario: there can be either a space or a newline after the colon.

>Solution :

To get the line after Examination(s): you can use

re.search(r'Examination\(s\):\s*([^\r\n]+)', text)

See the regex demo. Details:

  • Examination\(s\): – a literal Examination(s): string
  • \s* – zero or more whitespaces
  • ([^\r\n]+) – Group 1: one or more chars other than CR and LF chars.
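
As a minimal sketch of why `\s*` and the capturing group matter, the snippet below runs the pattern from the question and the pattern above against one sample string. The names `sample`, `attempt`, and `fixed` are illustrative only and do not come from the original post.

```python
import re

# Sample input: the label is followed directly by a CRLF line break.
sample = "Examination(s):\r\nMathematics 2nd Paper\r\n\r\nmarks"

# Pattern from the question: [^\r\n]* cannot cross the CR right after the colon,
# so the match is just the label itself.
attempt = re.search(r'(?:Examination\(s\):)[^\r\n]*', sample)
print(repr(attempt.group(0)))  # 'Examination(s):'

# Pattern from the answer: \s* consumes the line break(s) (or spaces),
# and Group 1 captures the rest of the first non-empty line.
fixed = re.search(r'Examination\(s\):\s*([^\r\n]+)', sample)
print(repr(fixed.group(1)))  # 'Mathematics 2nd Paper'
```

Note that a non-capturing group `(?:...)` still contributes to the overall match, so the label would be part of `group(0)` either way; reading `group(1)` is what isolates the wanted text.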

See the Python demo:

import re
texts = ["Examination(s):\r\nMathematics 2nd Paper\r\n\r\nTimeTable",
    "Examination(s):\r\nMathematics 2nd Paper\r\nblahblah",
    "Examination(s):\r\nMathematics 2nd Paper\r\n\r\n\r\nmarks"]
 
for text in texts:
    m = re.search(r'Examination\(s\):\s*([^\r\n]+)', text)
    print(f'--- {repr(text)} ---')
    if m:
        print(m.group(1))

Output:

--- 'Examination(s):\r\nMathematics 2nd Paper\r\n\r\nTimeTable' ---
Mathematics 2nd Paper
--- 'Examination(s):\r\nMathematics 2nd Paper\r\nblahblah' ---
Mathematics 2nd Paper
--- 'Examination(s):\r\nMathematics 2nd Paper\r\n\r\n\r\nmarks' ---
Mathematics 2nd Paper
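
The demo above only exercises a line break after the colon. As a hedged sketch, the same pattern can be wrapped in a small helper and checked against the space-after-colon case and the double line break case from the question, since `\s*` matches spaces as well as CR and LF. The helper name `extract_examination` and the constant `EXAM_RE` are hypothetical, introduced here for illustration.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
EXAM_RE = re.compile(r'Examination\(s\):\s*([^\r\n]+)')

def extract_examination(text):
    """Return the first non-empty line after 'Examination(s):', or None if absent."""
    m = EXAM_RE.search(text)
    return m.group(1) if m else None

# Space after the colon.
print(extract_examination("Examination(s): Mathematics 2nd Paper\r\n\r\nTimeTable"))
# Two line breaks after the colon.
print(extract_examination("Examination(s):\r\n\r\nMathematics 2nd Paper\r\nblahblah"))
# No label at all: the helper returns None instead of raising.
print(extract_examination("No label here"))
```

Returning `None` on a failed search avoids the `AttributeError` that calling `.group(1)` on a missing match would raise.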