Use regex to find a repeated pattern in Python

My question is: can I repeat the pattern of interest when using a regex?
For example, I am looking for words in a file (each line is just one word, which makes it easier) that consist only of a consonant followed by a vowel, repeated any number of times.
This means ‘banana’ is allowed, but ‘bananas’, ‘banaana’, ‘bananna’ and so on are not allowed.
‘ba’ is also allowed, and so is ‘bana’, and so on.
Basically, the pattern I want to repeat is:

[bcdfghjklmnpqrstvwxyz]{1}[aeiouy]{1}

What I did was this (the pattern is the same as above, but with Greek letters):

import re
def f(x):
    res_count = 0
    regex_list = ['^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  '^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  '^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  '^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  '^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  '^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  '^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  '^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  '^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  ]
    with open(x) as greek_words:
        for words in greek_words:
            for w_pat in regex_list:
                result = re.findall(w_pat,words)
                if result:
                    res_count += 1
                    corrected = str(result).strip('[]\'')
                    with open('easy_words_for_children.txt', 'a') as g:
                        g.write(f'{corrected}\n')
                    result = False
        return res_count 
f('words_greek_normalized.txt')

So I am just manually repeating the intended pattern, but I wanted to see if there is another way to get the same output. The rest of the code just writes the results to another file.

Solution:

You’re just looking to repeat a pattern, so this works:

import re

# first two match, the rest don't
some_words = ['banana', 'cola', 'cocoa', 'hear', 'agape', 'letter']

# y is not technically a vowel, but that's not an issue here
expression = '^(?:[bcdfghjklmnpqrstvwxyz][aeiouy])+$'

for word in some_words:
    if re.match(expression, word):
        print(word)

Output:

banana
cola

So, just wrap the part of the pattern that needs to be repeated in (?:...)+. The + means "one or more times", the parentheses group what you’re repeating, and the ?: means you’re interested in the grouping but not in capturing the grouped part separately; you just want to match the whole thing.
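The difference between a capturing and a non-capturing group matters most with re.findall, which the question's code uses. The following is a small sketch rather than part of the original answer; the variable names are made up for illustration. When a pattern contains exactly one capturing group, re.findall returns what that group captured, and a repeated group only keeps its last repetition.

```python
import re

# Hypothetical helper names, used only for this illustration.
consonants = "bcdfghjklmnpqrstvwxyz"
vowels = "aeiouy"

capturing = f"^([{consonants}][{vowels}])+$"
non_capturing = f"^(?:[{consonants}][{vowels}])+$"

# With a capturing group, findall returns the group's last repetition.
print(re.findall(capturing, "banana"))       # ['na']

# With a non-capturing group, findall returns the whole match.
print(re.findall(non_capturing, "banana"))   # ['banana']

# A trailing consonant breaks the consonant+vowel pairing, so nothing matches.
print(re.findall(non_capturing, "bananas"))  # []
```

So if the matched word itself is what gets written to the output file, the non-capturing form is probably the one to use.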

Note that you don’t need the {1} – the default is to match it just once unless you tell the regex engine otherwise.
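Applied to the Greek character classes from the question, the same idea might look like the sketch below. It is only one possible arrangement: the names CONSONANTS, VOWELS, SYLLABLES and easy_words are hypothetical, and it swaps the ^...$ anchors for re.fullmatch on the stripped line, which is an assumption about how the input should be handled rather than something the original answer prescribes.

```python
import re

# Character classes copied from the question; the names are illustrative.
CONSONANTS = "βγδζθκλμνξπρστφχψ"
VOWELS = "αεηιουω"

# One consonant+vowel pair, repeated one or more times, without {1}.
SYLLABLES = re.compile(f"(?:[{CONSONANTS}][{VOWELS}])+")


def easy_words(lines):
    """Return the words that consist only of consonant+vowel pairs."""
    matches = []
    for line in lines:
        word = line.strip()  # drop the trailing newline from each line
        if SYLLABLES.fullmatch(word):  # the whole word must match
            matches.append(word)
    return matches


# Sample lines standing in for the contents of the word file.
sample = ["μαμα\n", "νερο\n", "αυτο\n", "μπαλα\n", "καλημερα\n"]
print(easy_words(sample))  # ['μαμα', 'νερο', 'καλημερα']
```

Because iterating over an open text file yields its lines, a file object could be passed to easy_words in place of the sample list, and the returned list written to the output file in a single pass instead of reopening it for every match.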
