Remove entire rows from a DataFrame if a stopword occurs

List of stopwords:

stop_w = ["in", "&", "the", "|", "and", "is", "of", "a", "an", "as", "for", "was"]

df:

words         frequency
the company   10
green energy  9
founded in    8
gases for     8
electricity   5

I would like to remove an entire row if it contains ANY of the given stopwords. In this example, the output should be:

words         frequency
green energy  9
electricity   5

Solution:

In a regular expression, the | character means "or" (alternation), and str.contains treats its pattern as a regular expression by default. To match | literally as one of your stopwords, you need to escape it with a backslash (\).
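To see why the escape matters, here is a minimal sketch that uses only the standard re module, independent of pandas. The test strings are made up for illustration. It assumes the stopwords are joined with | exactly as in the question, and it uses re.escape as one way to add the backslash automatically.

```python
import re

stop_w = ["in", "&", "the", "|", "and", "is", "of", "a", "an", "as", "for", "was"]

# Joined as-is, the literal "|" entry leaves an empty alternative in the
# pattern ("...the|||and..."), and an empty alternative matches any string.
naive = re.compile("|".join(stop_w))
print(bool(naive.search("electricity")))  # True, although no stopword occurs

# re.escape() backslash-escapes regex metacharacters, so "|" becomes "\|".
escaped = re.compile("|".join(re.escape(w) for w in stop_w))
print(bool(escaped.search("electricity")))  # False
print(bool(escaped.search("x | y")))        # True, matches the literal "|"
```

With the unescaped list, every row would count as containing a stopword, so the filter would drop the whole DataFrame.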

Having said that, you can do:

# Raw string (r"\|") so the backslash reaches the regex engine unchanged.
stop_w = ["in", "&", "the", r"\|", "and", "is", "of", "a", "an", "as", "for", "was"]
df.loc[~df['words'].str.contains('|'.join(stop_w))]

prints:

          words  frequency
1  green energy          9
4   electricity          5
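Note that str.contains matches substrings, so a stopword such as "a" also matches inside a longer word. That gives the expected result for the sample data, but it may drop rows that contain no stopword as a separate word. If whole-word matching is what you want, one possible sketch is to split each row on whitespace and test the tokens against a set. This assumes words are separated by whitespace and that case matters; the "solar panels" row is hypothetical and added only to show the difference.

```python
import pandas as pd

stop_w = ["in", "&", "the", "|", "and", "is", "of", "a", "an", "as", "for", "was"]

# Sample data from the question, plus one hypothetical row ("solar panels")
# that contains the letter "a" but no stopword as a whole word.
df = pd.DataFrame({
    "words": ["the company", "green energy", "founded in", "gases for",
              "electricity", "solar panels"],
    "frequency": [10, 9, 8, 8, 5, 3],
})

stop_set = set(stop_w)

# True for rows where at least one whitespace-separated token is a stopword.
has_stopword = df["words"].str.split().apply(
    lambda tokens: any(token in stop_set for token in tokens)
)

result = df.loc[~has_stopword]
print(result)
```

Because no regular expression is involved here, "|" and "&" need no escaping. The substring approach would have removed "solar panels" as well, since it contains "a".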