How to get rid of words with special characters in Python using regular expressions?

January 23, 2022

I am trying to remove words that contain any non-alphabetic characters from a list. I’ve tried for hours to understand why my regex attempts fail.

import re

lst = ["acted", "30-30", "adage", "fatal", "tested", "abcd-ef", "g'day"]

# get rid of words that have any non-alphabet characters
pattern = r"\W"
# pattern = r"[a-z|A-Z]{5}" # tried this as well
for word in lst:
    if re.findall(pattern, word):
        print(word + " not valid")
        lst.remove(word)
    else:
        print(word + " valid")
print(lst)

Why is adage never printed as valid, yet it is not removed from the list either? Why is g'day not removed even though it contains '? Ideally, I was hoping to check for five-letter words, but just getting the words with special characters out is eluding me, and I don’t want to get more confused.

Solution:

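The regex pattern works for these inputs; the problem is the loop. As @JCaeser has mentioned, storing the valid words in a new list works fine. Calling lst.remove(word) while iterating over lst shifts every later element one position to the left, but the loop's internal index still moves forward, so the element right after each removed word is skipped. That is why adage (right after 30-30) is never checked, and why g'day, which slides into the position of the removed abcd-ef, is never reached before the loop ends. (Note that \W does not flag digits or underscores, so a word such as abc1 would still count as valid.)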
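To see the skipping directly, here is a minimal sketch that records which words the loop actually visits while removing items from the same list it iterates over. The visited list is a hypothetical addition for illustration only, and re.search is used instead of re.findall because only a yes/no match is needed.

```python
import re

lst = ["acted", "30-30", "adage", "fatal", "tested", "abcd-ef", "g'day"]
visited = []  # hypothetical helper list: records every word the loop sees

for word in lst:
    visited.append(word)
    if re.search(r"\W", word):
        # Removing shifts every later element one slot to the left,
        # but the for loop's index still advances, so one word is skipped.
        lst.remove(word)

print(visited)  # ['acted', '30-30', 'fatal', 'tested', 'abcd-ef']
print(lst)      # ['acted', 'adage', 'fatal', 'tested', "g'day"]
```

Neither adage nor g'day appears in visited. Iterating over a copy (for word in lst[:]) would also avoid the skip, but building a separate list of valid words is simpler: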

import re

lst = ["acted", "30-30", "adage", "fatal", "tested", "abcd-ef", "g'day"]
new_lst = []
# get rid of words that have any non-alphabet characters
pattern = r"\W"
for word in lst:
    if re.findall(pattern, word):
        print(word + " not valid")
    else:
        print(word + " valid")
        new_lst.append(word)
print(new_lst)

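Output:

acted valid
30-30 not valid
adage valid
fatal valid
tested valid
abcd-ef not valid
g'day not valid
['acted', 'adage', 'fatal', 'tested']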
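A stricter variant, sketched under the assumption that a valid word consists of ASCII letters only: re.fullmatch requires the whole word to match, so digits, underscores, hyphens and apostrophes are all rejected. The same idea covers the five-letter goal from the question. In the attempted pattern [a-z|A-Z]{5}, the | inside the brackets is a literal pipe rather than an alternation, and without anchoring the pattern also matches five letters inside a longer word such as tested. The helper name valid_words and its length parameter are hypothetical.

```python
import re

def valid_words(words, length=None):
    """Hypothetical helper: return words made only of ASCII letters,
    optionally of exactly `length` letters. Builds a new list instead of
    removing items from the list being iterated over."""
    if length is None:
        pattern = re.compile(r"[A-Za-z]+")
    else:
        # {{ and }} are literal braces in the f-string, so this builds [A-Za-z]{5}
        pattern = re.compile(rf"[A-Za-z]{{{length}}}")
    # fullmatch succeeds only if the entire word matches the pattern
    return [w for w in words if pattern.fullmatch(w)]

lst = ["acted", "30-30", "adage", "fatal", "tested", "abcd-ef", "g'day"]
print(valid_words(lst))            # ['acted', 'adage', 'fatal', 'tested']
print(valid_words(lst, length=5))  # ['acted', 'adage', 'fatal']
```

A list comprehension never mutates the input list, so the skipping problem cannot occur here.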