I've been trying to remove filler words such as {'by', 'the', 'and', 'of', 'a'} from text, and the best way I have found so far is this:
Code:
import re

def clean_text(text):
    """
    Takes the text and removes punctuation and some words.
    """
    stopwords = {'by', 'the', 'and', 'of', 'a'}
    # split on runs of non-word characters, then keep only the non-empty, non-stop words
    result = [word for word in re.split(r"\W+", text)
              if word and word.lower() not in stopwords]
    result = ' '.join(result)
    print(result)
    return result

# dummy text
long_string = "one Groups are marked by the ()meta-characters. two They group together the expressions contained one inside them, and you can one repeat the contents of a group with a repeating qualifier, such as there"
clean_text(long_string)
My question: is there a better way to do this without the for loop (the list comprehension)? Does regex have a method for removing a set of words from text directly?
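>Solution:
You could use a regex substitution instead: join the stop words into an alternation, wrap it in word boundaries, and replace every match with an empty string.
import re

long_string = "one Groups are marked by the ()meta-characters. two They group together the expressions contained one inside them, and you can one repeat the contents of a group with a repeating qualifier, such as there"
words = ["by", "the", "and", "of", "a"]
# \b...\b matches whole words only; the trailing \s* also removes the space after
# each stop word, so adjacent stop words ("by the", "of a") leave no double spaces.
# re.IGNORECASE mirrors the word.lower() check in the question's version.
regex = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b\s*'
output = re.sub(regex, '', long_string, flags=re.IGNORECASE).strip()
print(output)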
This prints:
one Groups are marked ()meta-characters. two They group together expressions contained one inside them, you can one repeat contents group with repeating qualifier, such as there
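If the same stop-word list is applied to many strings, the substitution can be wrapped in a small reusable function that builds and compiles the pattern once. This is a minimal sketch, assuming case-insensitive whole-word matching is wanted (as with the question's `word.lower()` check); the helper name `make_stopword_remover` is hypothetical, and `re.escape` is used so that any entry containing regex metacharacters is matched literally.

```python
import re


def make_stopword_remover(stopwords):
    """Hypothetical helper: return a function that deletes the given whole words."""
    # One alternation of all stop words, escaped so they are matched literally.
    alternation = "|".join(map(re.escape, stopwords))
    # \b...\b restricts matches to whole words; \s* also consumes the whitespace
    # that follows each removed word, so no double spaces are left behind.
    pattern = re.compile(r"\b(?:" + alternation + r")\b\s*", re.IGNORECASE)

    def remove(text):
        # Delete every match, then trim any leftover leading/trailing whitespace.
        return pattern.sub("", text).strip()

    return remove


remove_stopwords = make_stopword_remover({"by", "the", "and", "of", "a"})
print(remove_stopwords("The rest of the text, by and large"))  # prints: rest text, large
```

Words that merely contain a stop word, such as "They" or "there", are left intact because of the word boundaries.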