Regex pattern matches part of a number with more than 4 digits

import re
text = """State of California that the foregoing is true and correct. (For California sheriff or marshal use only) 1950-24-12 I certify that the foregoing is true and correct. Date: (SIGNATURE) SUBP-010 [Rev. January 1,2012] PROOF OF SERVICE OF DEPOSITION SUBPOENA FOR PRODUCTION OF BUSINESS RECORDS 055826-00-07 Page 2 of 2"""
pattern = re.findall(r"\d{2,4}[-]\d{1,2}[-]\d{1,2}", text)
print(pattern)

Required output: 1950-24-12

The pattern also returns 5826-00-07, even though it is part of 055826-00-07, whose first group has more than 4 digits. Is there a way to exclude it?

Solution:

What you want is called a negative lookbehind. It matches a pattern only when the text directly before the match does not match a given sequence. For example, (?<!something)abc matches any occurrence of "abc" that is not directly preceded by "something".
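As a small illustration of the (?<!something)abc example, here is a sketch in Python. The helper name find_unprefixed_abc and the sample string are made up for this example; only the re module calls come from the standard library.

```python
import re

# Hypothetical helper for illustration: returns the start index of every
# "abc" that is NOT directly preceded by "something".
def find_unprefixed_abc(s):
    return [m.start() for m in re.finditer(r"(?<!something)abc", s)]

# The first "abc" follows "something" and is skipped; the other two are kept.
print(find_unprefixed_abc("somethingabc abc xabc"))
```

Note that Python's re module only accepts fixed-width patterns inside a lookbehind, which both "something" and \d satisfy.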

So in your case, add (?<!\d) to the beginning of your regex so that it only matches when the first digit group is not preceded by another digit.

Also, [-] matches only the character -, so you don't need the brackets. After these changes, the new regex is (?<!\d)\d{2,4}-\d{1,2}-\d{1,2}.
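Applied to the text from the question (shortened here), the revised regex could be used as in the sketch below. The names DATE_LIKE and find_date_like are assumptions chosen for this example, not part of the original code.

```python
import re

# Revised pattern: the negative lookbehind (?<!\d) rejects a match whose
# first digit group is directly preceded by another digit.
DATE_LIKE = re.compile(r"(?<!\d)\d{2,4}-\d{1,2}-\d{1,2}")

# Hypothetical helper for illustration.
def find_date_like(text):
    return DATE_LIKE.findall(text)

# Shortened version of the text from the question.
text = (
    "(For California sheriff or marshal use only) 1950-24-12 I certify "
    "SUBP-010 [Rev. January 1,2012] BUSINESS RECORDS 055826-00-07 Page 2 of 2"
)
print(find_date_like(text))  # ['1950-24-12']
```

If a match followed by extra digits (for example 1950-24-123) should be rejected too, a negative lookahead (?!\d) at the end of the pattern works the same way. That goes beyond what the question asked, so it is left out here.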
