
Regex end with a character or end of line with lookahead

I have this text:

Book Release Date: 2 June, 2010 [Edition#5]

Book Release Date: 24 October, 1996


I want to use a regex to extract only the dates, like this:

2 June, 2010

24 October, 1996

I have tried these patterns, which are close to what I want:

import re

# text holds the two "Book Release Date" lines shown above

# Result of this pattern:
# 2 June, 2010 [Edition#5]
# 24 October, 1996
date = re.findall(r"(?<=(Book Release Date: ))(.*?)(?=(\[|\n))", text)

# Result of this pattern:
# 2 June, 2010
# (no match for the second line)
date = re.findall(r"(?<=(Book Release Date: ))(.*?)(?=\[)", text)
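A quick way to see why these attempts behave oddly is to print exactly what re.findall returns. This is a minimal sketch that assumes text is the two lines above joined by a single newline, with no trailing newline. When a pattern has more than one capture group, re.findall returns one tuple of groups per match instead of a string, and here the lookbehind itself contains a group. The second line is not matched at all because "." does not match a newline and no "[" follows on that line.

```python
import re

# Assumed sample text: the two lines from the question, no trailing newline
text = ("Book Release Date: 2 June, 2010 [Edition#5]\n"
        "Book Release Date: 24 October, 1996")

# Two groups (one inside the lookbehind, one for .*?): findall returns tuples.
# Note the trailing space left in the date, and the missing second line.
two_groups = re.findall(r"(?<=(Book Release Date: ))(.*?)(?=\[)", text)
print(two_groups)  # [('Book Release Date: ', '2 June, 2010 ')]

# Exactly one group: findall returns just that group's text for each match
one_group = re.findall(r"Book Release Date: (\d+ [A-Z][a-z]+, \d{4})", text)
print(one_group)  # ['2 June, 2010', '24 October, 1996']
```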

Solution:

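You don’t need any lookaround assertions, just a single capture group. When a pattern contains exactly one capture group, re.findall returns only the text of that group for each match. (Your patterns contain several groups, including one inside the lookbehind, so re.findall returns tuples.)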

\bBook Release Date: (\d+ [A-Z][a-z]+, \d{4})\b

Explanation

  • \bBook Release Date: Match a word boundary, then the literal text "Book Release Date:" followed by a space
  • ( Capture group 1
    • \d+ [A-Z][a-z]+ Match 1+ digits, a space, an uppercase char A-Z and 1+ lowercase chars
    • , \d{4} Match a comma, a space and 4 digits
  • ) Close group 1
  • \b A word boundary to prevent a partial word match
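The breakdown above can also be written directly into the pattern with the re.VERBOSE flag, so each part carries its own comment. This is a sketch of the same regex, not a different technique; the only assumption is the variable naming. Because VERBOSE ignores unescaped whitespace, the literal spaces have to be written as "\ ".

```python
import re

# Same pattern as above, spelled out with re.VERBOSE.
# Unescaped whitespace is ignored in VERBOSE mode, so literal spaces are "\ ".
verbose_pattern = re.compile(r"""
    \bBook\ Release\ Date:\     # word boundary, the literal label and a space
    (                           # open capture group 1
        \d+\ [A-Z][a-z]+        # 1+ digits, a space, capitalized month name
        ,\ \d{4}                # a comma, a space, 4 digits
    )                           # close capture group 1
    \b                          # word boundary after the year
""", re.VERBOSE)

s = ("Book Release Date: 2 June, 2010 [Edition#5]\n"
     "Book Release Date: 24 October, 1996")

dates = verbose_pattern.findall(s)
print(dates)  # ['2 June, 2010', '24 October, 1996']
```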


Example

import re
 
pattern = r"\bBook Release Date: (\d+ [A-Z][a-z]+, \d{4})\b"
 
s = ("Book Release Date: 2 June, 2010 [Edition#5]\n"
    "Book Release Date: 24 October, 1996")
 
print(re.findall(pattern, s))

Output

['2 June, 2010', '24 October, 1996']
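If you would rather keep the lookahead approach from the question title (stop at a character or at the end of the line), a minimal sketch is to drop the capture groups and put $ in the lookahead together with re.MULTILINE, so $ matches at the end of every line, including the last one. The attempt with (?=(\[|\n)) cannot match the last line when no newline follows it. This sketch assumes the date is followed either by optional whitespace and a "[", or by the end of the line.

```python
import re

text = ("Book Release Date: 2 June, 2010 [Edition#5]\n"
        "Book Release Date: 24 October, 1996")

# The lookbehind has no capture group, so findall returns the whole match.
# The lookahead stops before optional whitespace + "[" OR at end of line;
# re.MULTILINE makes $ match before each "\n" as well as at the very end.
pattern = r"(?<=Book Release Date: ).*?(?=\s*\[|$)"
dates = re.findall(pattern, text, flags=re.MULTILINE)
print(dates)  # ['2 June, 2010', '24 October, 1996']
```

The single-group pattern in the answer is still stricter, since it only accepts text shaped like a date, while this version accepts anything between the label and the "[" or line end.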