There is a dataset in this form:
company_url     Name          Revenue
mackter.com     Mack Sander   NaN
nientact.com    Neient Dan    321
ventienty.com   Richard       NaN
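To try the approaches in this thread, the sample table can be rebuilt as a DataFrame. This is a minimal sketch that assumes the frame is called df and that Revenue is numeric, with NaN for missing values; neither detail is stated in the original.

```python
import numpy as np
import pandas as pd

# Sample data from the question; Revenue is assumed numeric with NaN for missing values
df = pd.DataFrame({
    'company_url': ['mackter.com', 'nientact.com', 'ventienty.com'],
    'Name': ['Mack Sander', 'Neient Dan', 'Richard'],
    'Revenue': [np.nan, 321, np.nan],
})
print(df)
```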
My task is to remove every row in which the string 'tac', 'bux' or 'mvy' appears in either the company_url or the Name column. For example, 'tac' appears in nientact.com, so that row should be deleted. I first tried this on the company_url column only, with the code below:
lists = ['tac', 'bux', 'mvy']
for i in lists:
    df = df[~df['company_url'].str.contains(i)]
However, it raises:
TypeError: unhashable type: 'list'
Solution:
You can build a single regex that matches any of the substrings, apply str.contains with it to each column, combine the per-column results with any(axis=1), invert the mask with ~, and use it for boolean indexing:
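import re

lists = ['tac', 'bux', 'mvy']
# escape each substring and join them into one alternation: 'tac|bux|mvy'
pattern = '|'.join(map(re.escape, lists))

# True where either column contains one of the substrings (case-insensitive);
# ~ keeps only the rows with no match
out = df[~df[['company_url', 'Name']]
          .apply(lambda s: s.str.contains(pattern, case=False))
          .any(axis=1)
        ]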
Output:
     company_url         Name  Revenue
0    mackter.com  Mack Sander      NaN
2  ventienty.com      Richard      NaN
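A self-contained version of the same approach, run on the sample data from the question, is sketched below. It additionally passes na=False to str.contains, an assumption added here so that a missing company_url or Name counts as "no match" and the mask stays purely boolean.

```python
import re

import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'company_url': ['mackter.com', 'nientact.com', 'ventienty.com'],
    'Name': ['Mack Sander', 'Neient Dan', 'Richard'],
    'Revenue': [np.nan, 321, np.nan],
})

lists = ['tac', 'bux', 'mvy']
# Escape each substring so characters like '.' match literally, then OR them together
pattern = '|'.join(map(re.escape, lists))

# One boolean column per searched column; na=False treats missing cells as "no match"
matches = df[['company_url', 'Name']].apply(
    lambda s: s.str.contains(pattern, case=False, na=False)
)

# Keep only the rows where neither column matched
out = df[~matches.any(axis=1)]
print(out)
```

Only nientact.com contains one of the substrings, so rows 0 and 2 remain.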
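For reference, here is a fixed version of your loop. It is less efficient, because it rebuilds the mask and filters the frame once per substring instead of once overall: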
lists = ['tac', 'bux', 'mvy']
for i in lists:
    df = df[~df[['company_url', 'Name']]
              .apply(lambda s: s.str.contains(i))
              .any(axis=1)]
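As a quick sanity check, the loop fix and the single-regex mask can be compared on the sample data from the question. In this sketch the names loop_df and single are hypothetical, chosen for illustration, and both versions are kept case-sensitive so that they compare like with like.

```python
import re

import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'company_url': ['mackter.com', 'nientact.com', 'ventienty.com'],
    'Name': ['Mack Sander', 'Neient Dan', 'Richard'],
    'Revenue': [np.nan, 321, np.nan],
})
lists = ['tac', 'bux', 'mvy']

# Loop fix: one filtering pass per substring, shrinking the frame each time
loop_df = df
for i in lists:
    loop_df = loop_df[~loop_df[['company_url', 'Name']]
                      .apply(lambda s: s.str.contains(i))
                      .any(axis=1)]

# Single pass: one alternation regex, one mask
pattern = '|'.join(map(re.escape, lists))
single = df[~df[['company_url', 'Name']]
            .apply(lambda s: s.str.contains(pattern))
            .any(axis=1)]

# DataFrame.equals compares index, columns and values (NaNs in the same place count as equal)
print(loop_df.equals(single))
```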