I am trying to eliminate words from a list that contain any non-alphabet characters. I’ve tried for hours to understand why my regex attempts fail.
import re

lst = ["acted", "30-30", "adage", "fatal", "tested", "abcd-ef", "g'day"]

# get rid of words that have any non-alphabet characters
pattern = r"\W"
# pattern = r"[a-z|A-Z]{5}"  # tried this as well

for word in lst:
    if re.findall(pattern, word):
        print(word + " not valid")
        lst.remove(word)
    else:
        print(word + " valid")

print(lst)
Why is adage never printed as valid, yet it is not removed from the list either? Why is g'day not removed even though it contains a '? Ideally, I was hoping to check for 5-letter words, but even getting the words with special characters out is eluding me, and I don’t want to get more confused.
>Solution :
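The regex pattern works for this list. The problem is that the loop removes items from lst while iterating over it. Each removal shifts the remaining items one position to the left, so the loop skips whichever item follows the removed one: adage is skipped after 30-30 is removed, and g'day is never reached after abcd-ef is removed. As @JCaeser has mentioned, storing the valid words in a new list works fine. One caveat about the pattern: \W treats digits and the underscore as word characters, so a word such as abc_1 would still count as valid.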
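To make the skipping visible, here is a minimal sketch that records which words the original loop actually visits. The `visited` list is a hypothetical helper added for illustration; it is not part of the question's code.

```python
import re

lst = ["acted", "30-30", "adage", "fatal", "tested", "abcd-ef", "g'day"]
visited = []  # hypothetical helper: every word the loop actually looks at

for word in lst:
    visited.append(word)
    if re.search(r"\W", word):
        # Removing during iteration shifts later items left by one,
        # so the item right after the removed one is never visited.
        lst.remove(word)

print(visited)  # ['acted', '30-30', 'fatal', 'tested', 'abcd-ef']
print(lst)      # ['acted', 'adage', 'fatal', 'tested', "g'day"]
```

Neither adage nor g'day ever appears in `visited`, which is why nothing is printed for them and neither is removed. Building a separate list, as in the code that follows, sidesteps the problem.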
import re

lst = ["acted", "30-30", "adage", "fatal", "tested", "abcd-ef", "g'day"]
new_lst = []

# get rid of words that have any non-alphabet characters
pattern = r"\W"

for word in lst:
    if re.findall(pattern, word):
        print(word + " not valid")
    else:
        print(word + " valid")
        new_lst.append(word)

print(new_lst)
Output (last line):
['acted', 'adage', 'fatal', 'tested']
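For the stricter goal mentioned in the question (letters only, and ideally exactly 5 of them), a possible sketch uses re.fullmatch with an explicit character class. The commented-out attempt r"[a-z|A-Z]{5}" likely failed for two reasons: inside a character class, | is a literal pipe rather than "or", and re.findall matches any five-character run inside a longer word, so tested would also match. The names `words`, `letters_only` and `exactly_five` below are chosen for illustration, and the class [A-Za-z] assumes only ASCII letters should count.

```python
import re

words = ["acted", "30-30", "adage", "fatal", "tested", "abcd-ef", "g'day"]

# fullmatch succeeds only if the whole word matches the pattern,
# so digits, underscores, hyphens and apostrophes are all rejected.
letters_only = [w for w in words if re.fullmatch(r"[A-Za-z]+", w)]

# {5} requires exactly five letters, which excludes "tested".
exactly_five = [w for w in words if re.fullmatch(r"[A-Za-z]{5}", w)]

print(letters_only)  # ['acted', 'adage', 'fatal', 'tested']
print(exactly_five)  # ['acted', 'adage', 'fatal']
```

A list comprehension builds a new list instead of mutating the one being iterated, so it avoids the skipping problem as well.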