List of stop words:
stop_w = ["in", "&", "the", "|", "and", "is", "of", "a", "an", "as", "for", "was"]
df:
| words | frequency |
|---|---|
| the company | 10 |
| green energy | 9 |
| founded in | 8 |
| gases for | 8 |
| electricity | 5 |
I would like to remove the entire row if it contains ANY of the given stop words. In this example the output should be:
| words | frequency |
|---|---|
| green energy | 9 |
| electricity | 5 |
>Solution :
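The | character has a special meaning in a regular expression: it means "or" (alternation). str.contains treats its pattern as a regex by default, so a bare | in your stop words list leaves an empty alternative in the joined pattern, and that alternative matches every row. To match a literal | you need to escape it with a backslash: \|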
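As a small illustration of the escaping point, here is a sketch that uses the standard library's re.escape instead of escaping by hand. The names unescaped and pattern are only for this example, and the test strings are taken from the question's data.

```python
import re

stop_w = ["in", "&", "the", "|", "and", "is", "of", "a", "an", "as", "for", "was"]

# Joining the raw list leaves an empty alternative ("...the|||and..."),
# and an empty alternative matches any string.
unescaped = "|".join(stop_w)
print(re.search(unescaped, "electricity") is not None)

# Escaping each entry first makes "|" (and "&") match literally.
pattern = "|".join(re.escape(w) for w in stop_w)
print(re.search(pattern, "electricity") is not None)
print(re.search(pattern, "a | b") is not None)
```

With re.escape the stop words list can stay exactly as written in the question, and only the joined pattern carries the backslashes.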
Having said that you can do:
stop_w = ["in", "&", "the", r"\|", "and", "is", "of", "a", "an", "as", "for", "was"]
df.loc[~df['words'].str.contains('|'.join(stop_w))]
prints:
words frequency
1 green energy 9
4 electricity 5
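One caveat: str.contains with a joined pattern does substring matching, so a short stop word such as "a" would also drop a phrase like "solar panel", which merely contains that letter. If whole-word matching is what you want, a possible alternative (a sketch, assuming the words in each phrase are separated by whitespace) is to split each phrase into tokens and test them against a set. The names stop_set, has_stop and result are made up for this example.

```python
import pandas as pd

stop_w = ["in", "&", "the", "|", "and", "is", "of", "a", "an", "as", "for", "was"]
df = pd.DataFrame({
    "words": ["the company", "green energy", "founded in", "gases for", "electricity"],
    "frequency": [10, 9, 8, 8, 5],
})

stop_set = set(stop_w)

# True where at least one whitespace-separated token is a stop word.
has_stop = df["words"].str.split().apply(
    lambda tokens: not stop_set.isdisjoint(tokens)
)

result = df.loc[~has_stop]
print(result)
```

Because no regex is involved here, "|" and "&" need no escaping; they are compared as ordinary tokens.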