How to remove a list of words from a text ONLY IF it is a whole word, not a part of a word

I have a list of words that I want to remove from a given text. With my limited Python knowledge, I tried replacing each word in the list with an empty string in a loop. It works, but it replaces every matching substring, even when the match is only part of a larger word. Please see the code and output below:

word_list = {'the', 'mind', 'pen'}
def remove_w(text):
  for word in word_list:
    text = text.replace(word, '')
  return text
remove_w('A pencil is over a thermometer with mind itself.')

The output is:

'A cil is over a rmometer with itself.'

It removed parts of some words. What I actually wanted is the following output:

A pencil is over a thermometer with itself.

How can I remove the words in such a list from a text only if they appear as whole words, not as part of another word? (Since I will use this on large articles, please suggest a fast approach.) Thank you.

>Solution :

You can use a regular expression with word boundaries.

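import re

# Compile once; \b matches only at word boundaries, so 'pen' does not match inside 'pencil'.
pattern = re.compile('|'.join(rf'\b{re.escape(w)}\b' for w in word_list))
def remove_w(text):
    return pattern.sub('', text)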
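
As a self-contained sketch of the regex approach, the version below assumes the `word_list` from the question. Two details are additions rather than part of the answer above: the alternatives are grouped inside a single pair of `\b` anchors, and leftover runs of spaces are collapsed after the substitution. Matching is case-sensitive here; passing `re.IGNORECASE` to `re.compile` would be one way to change that.

```python
import re

word_list = {'the', 'mind', 'pen'}

# One alternation wrapped in a single pair of word boundaries.
# Sorting longest-first is a precaution in case entries overlap.
alternatives = '|'.join(
    re.escape(w) for w in sorted(word_list, key=len, reverse=True)
)
pattern = re.compile(r'\b(?:' + alternatives + r')\b')

def remove_w(text):
    # Remove whole-word matches, then collapse the runs of spaces left behind.
    stripped = pattern.sub('', text)
    return re.sub(r' {2,}', ' ', stripped).strip()

result = remove_w('A pencil is over a thermometer with mind itself.')
print(result)
```

Because the pattern is compiled once at module level, each call on a large article is a single pass over the text rather than one pass per word.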

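Alternatively, use str.split to separate the text into whitespace-delimited tokens, drop the tokens that exactly match a word in the set, then join the rest back together. Note that a token with attached punctuation (for example 'mind.') is not an exact match and is kept, and that runs of whitespace are collapsed to single spaces.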

def remove_w(text):
    return ' '.join(w for w in text.split() if w not in word_list)
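
To illustrate how the split-based approach treats punctuation, here is a minimal sketch that again assumes the `word_list` from the question. The function names `remove_w_split` and `remove_w_split_punct` are hypothetical, and the second function is a variant added for comparison, not part of the answer above.

```python
import string

word_list = {'the', 'mind', 'pen'}

def remove_w_split(text):
    # Split-based approach: a token is dropped only on an exact match.
    return ' '.join(w for w in text.split() if w not in word_list)

def remove_w_split_punct(text):
    # Variant: compare each token with surrounding punctuation stripped.
    # The whole token is dropped, including any attached punctuation.
    return ' '.join(
        w for w in text.split()
        if w.strip(string.punctuation) not in word_list
    )

plain = remove_w_split('Keep the pen, mind.')
strict = remove_w_split_punct('Keep the pen, mind.')
print(plain)   # 'pen,' and 'mind.' survive the exact-match test
print(strict)  # both are removed along with their punctuation
```

If punctuation has to be preserved while the word itself is removed, the regular-expression approach is likely the better fit.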