re.findall() function python

Can you please help me to understand the following line of the code:

import re
a = re.findall(r'[А-Яа-я-\s]+', string)

I am a bit confused by the pattern that is being searched for in the string. As I understand it, a match should start with А and end with any character between А and я, with the parts separated by - and a space, but what does the second term, Яа, stand for?

Solution:

[         ]      any one of the characters in here
 А-Я             any character between А and Я, inclusive
    а-я          any character between а and я, inclusive
       -         a literal hyphen (it works here because it follows a complete range, but it is clearer to put it first or last in the class, or to escape it as \-)
        \s       any whitespace character
           +     one or more of the preceding class, matched consecutively

[А-Яа-я-\s]+     one or more consecutive characters, each an uppercase or lowercase letter in А-Я or а-я, a hyphen, or whitespace
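
To check the note about the hyphen, here is a small sketch comparing three ways of writing it. The sample text and the variable names are made up for illustration; the only assumption is that the goal is a literal hyphen inside the class.

```python
import re

# Hypothetical sample text, chosen only to contain hyphenated Cyrillic words.
text = "кто-то, где-то"

# Hyphen in the middle of the class, as in the question.
original = re.findall(r'[А-Яа-я-\s]+', text)
# Hyphen moved to the end of the class.
hyphen_last = re.findall(r'[А-Яа-я\s-]+', text)
# Hyphen escaped explicitly.
hyphen_escaped = re.findall(r'[А-Яа-я\-\s]+', text)

print(original)                                   # ['кто-то', ' где-то']
print(original == hyphen_last == hyphen_escaped)  # True
```

All three spellings behave the same for this input, so moving or escaping the hyphen is purely a readability choice.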

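The [] is called a "character class" in regex, and it basically means "any one of the characters inside here is valid". The + then means "one or more occurrences of the preceding character or class". So Яа is not a term of its own: Я is the end of the range А-Я, and а is the start of the range а-я. The pattern is also not anchored, so a match does not have to start with А; re.findall() returns every non-overlapping run of these characters as a list.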
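
As a rough illustration of how the class and the + work together, this sketch runs the pattern from the question on a made-up sentence (the sample text and the name `matches` are assumptions, not part of the original code):

```python
import re

# Hypothetical input mixing Cyrillic words, punctuation, digits and a hyphen.
text = "Привет, мир! 123 как-то так."

# Each match is the longest possible run of Cyrillic letters, hyphens and whitespace.
# Commas, digits, '!' and '.' are not in the class, so they end the current run.
matches = re.findall(r'[А-Яа-я-\s]+', text)

print(matches)  # ['Привет', ' мир', ' ', ' как-то так']
```

Note that whitespace is part of the class, so leading spaces stay attached to the words and a lone space between excluded characters is returned as its own match.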
The Python documentation includes a Regular Expression HOWTO that you might find useful to read through.
