Regex doesn't match pattern in Python

I’m using the following pattern to identify values in a document in Python:

\d{2}\/\d{4}\s{1,}\d{1,}(\.?)\d{1,},\d{1,}

I tested this pattern on https://regexr.com/ with this string:

11/2003 480,00 12/2003 480,00 12/2003 480.00,00

There it matches all three date/value pairs, but when I run it in Python on the same string, I get this result:

['.', '.', '.']

Only dots.

What could I possibly be missing?

Solution:

Assumption: You’re using re.findall.

If your pattern contains capturing groups, findall returns only what those groups captured, not the whole match. Your pattern has one such group:

\d{2}\/\d{4}\s{1,}\d{1,}(\.?)\d{1,},\d{1,}
                        ^^^^^
       Capturing group for an optional dot
11/2003 480,00 12/2003 480,00 12/2003 480.00,00
                                         ^
                              This gets matched

Hence, this gives us:

import re

re.findall(
    r"\d{2}\/\d{4}\s{1,}\d{1,}(\.?)\d{1,},\d{1,}",
    "11/2003 480,00 12/2003 480,00 12/2003 480.00,00")
-> ['', '', '.']

Removing the capture group:

re.findall(
    r"\d{2}\/\d{4}\s{1,}\d{1,}\.?\d{1,},\d{1,}", 
    "11/2003 480,00 12/2003 480,00 12/2003 480.00,00")
-> ['11/2003 480,00', '12/2003 480,00', '12/2003 480.00,00']
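The same rule holds for any pattern. A minimal sketch with shorter, illustrative patterns on the same sample string shows the three shapes findall can return: whole matches with no groups, one string per match with one group, and one tuple per match with two or more groups.

```python
import re

text = "11/2003 480,00 12/2003 480,00 12/2003 480.00,00"

# No capturing groups: each list item is the whole match
no_groups = re.findall(r"\d{2}/\d{4}", text)

# One capturing group: each list item is only that group's text
one_group = re.findall(r"(\d{2})/\d{4}", text)

# Two capturing groups: each list item is a tuple of the captured texts
two_groups = re.findall(r"(\d{2})/(\d{4})", text)

print(no_groups)   # ['11/2003', '12/2003', '12/2003']
print(one_group)   # ['11', '12', '12']
print(two_groups)  # [('11', '2003'), ('12', '2003'), ('12', '2003')]
```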

Sidenote 1:

You can shorten x{1,} to x+, since both mean "one or more x":

\d{2}\/\d{4}\s+\d+\.?\d+,\d+
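A quick way to confirm that the shortened pattern behaves like the long form is to run both on the sample string and compare the results:

```python
import re

text = "11/2003 480,00 12/2003 480,00 12/2003 480.00,00"

# Long form from the question, with the capturing group removed
long_form = r"\d{2}\/\d{4}\s{1,}\d{1,}\.?\d{1,},\d{1,}"
# Same pattern using the + shorthand ({1,} and + both mean "one or more")
short_form = r"\d{2}\/\d{4}\s+\d+\.?\d+,\d+"

long_matches = re.findall(long_form, text)
short_matches = re.findall(short_form, text)

print(long_matches == short_matches)  # True
```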

Sidenote 2:

I assume you added this group to make the dot separator and the digits after it optional as a unit, without allowing input like 123.,45. For that, use an optional non-capturing group (?:...). It groups without capturing, so findall still returns the whole matches:

re.findall(
    r"\d{2}\/\d{4}\s+\d+(?:\.\d+)?,\d+",
    "11/2003 480,00 12/2003 480,00 12/2003 480.00,00")
-> ['11/2003 480,00', '12/2003 480,00', '12/2003 480.00,00']
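To illustrate how the two forms differ, here is a sketch comparing \d+\.?\d+ with \d+(?:\.\d+)? on two made-up inputs: a value with a one-digit integer part (5,00) and the malformed 123.,45. Both reject 123.,45, but \d+\.?\d+ needs at least two digits before the comma, so only the non-capturing-group version accepts 5,00. The sample strings are hypothetical, chosen only to show this behavior.

```python
import re

optional_dot = r"\d{2}/\d{4}\s+\d+\.?\d+,\d+"
optional_group = r"\d{2}/\d{4}\s+\d+(?:\.\d+)?,\d+"

# Hypothetical inputs: a one-digit integer part, and a dot with no digits after it
samples = ["01/2004 5,00", "01/2004 123.,45"]

# fullmatch requires the entire sample string to match the pattern
results = {
    name: [bool(re.fullmatch(pattern, s)) for s in samples]
    for name, pattern in (("optional_dot", optional_dot), ("optional_group", optional_group))
}

print(results)  # {'optional_dot': [False, False], 'optional_group': [True, False]}
```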

Sidenote 3:

If you want to use capturing groups and still get the whole match each time, use re.finditer instead of re.findall. It returns an iterator of Match objects rather than just the captured strings, so you can read the whole match with m.group(0) and each group with m.group(1), m.group(2), and so on.
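For example, keeping the question's original pattern (capturing group included) and switching to re.finditer:

```python
import re

pattern = r"\d{2}\/\d{4}\s{1,}\d{1,}(\.?)\d{1,},\d{1,}"
text = "11/2003 480,00 12/2003 480,00 12/2003 480.00,00"

pairs = []
for m in re.finditer(pattern, text):
    # group(0) is the whole match; group(1) is the optional-dot capturing group
    pairs.append((m.group(0), m.group(1)))

for whole, dot in pairs:
    print(repr(whole), repr(dot))
# '11/2003 480,00' ''
# '12/2003 480,00' ''
# '12/2003 480.00,00' '.'
```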
