Unable to Filter based on Substring in Pandas

There is a dataset in this form:

company_url         Name                  Revenue
mackter.com         Mack Sander           NaN
nientact.com        Neient Dan            321
ventienty.com       Richard               NaN

My task is to remove every row where the string 'tac', 'bux' or 'mvy' appears in either the 'company_url' or the 'Name' column. For example, 'tac' is present in nientact.com, so that row should be deleted. The same applies to any row where one of these three strings appears in company_url or Name. Initially I tried this for the company_url column only and wrote the code below, but it raises an error.

lists = ['tac', 'bux', 'mvy']
for i in lists:
    df = df[~df['company_url'].str.contains(i)]

but it raises:
TypeError: unhashable type: 'list'

Solution:

You can craft a regex to use with str.contains, then aggregate with any, invert with ~, and perform boolean indexing:

import re

lists = ['tac', 'bux', 'mvy']
pattern = '|'.join(map(re.escape, lists))
# 'tac|bux|mvy'

out = df[~df[['company_url', 'Name']]
          .apply(lambda s: s.str.contains(pattern, case=False))
          .any(axis=1)
        ]

Output:

     company_url         Name  Revenue
0    mackter.com  Mack Sander      NaN
2  ventienty.com      Richard      NaN
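
To try this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question. The `na=False` argument is an addition that is not in the code above: it assumes rows with a missing URL or name should be kept, and it makes `str.contains` return False instead of NaN for missing values. The variable name `mask` is also just illustrative.

```python
import re

import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "company_url": ["mackter.com", "nientact.com", "ventienty.com"],
    "Name": ["Mack Sander", "Neient Dan", "Richard"],
    "Revenue": [np.nan, 321, np.nan],
})

lists = ["tac", "bux", "mvy"]
# Escape each substring so it is matched literally, then join into one regex
pattern = "|".join(map(re.escape, lists))

# True for rows where either column contains any of the substrings.
# na=False (an assumption, not in the original answer) treats missing
# values as "no match" so the mask stays strictly boolean.
mask = (
    df[["company_url", "Name"]]
    .apply(lambda s: s.str.contains(pattern, case=False, na=False))
    .any(axis=1)
)

out = df[~mask]
print(out)
```

Keeping the mask in its own variable makes it easy to inspect which rows matched before dropping them.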

For reference, here is a fixed version of your loop. It is less efficient than the single regex because it filters the DataFrame once per substring:

lists=['tac', 'bux', 'mvy']
for i in lists:
    df = df[~df[['company_url', 'Name']]
               .apply(lambda s: s.str.contains(i))
               .any(axis=1)]
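
If you prefer to keep a loop, one possible variant is to accumulate a single boolean mask and filter only once at the end. This is a sketch under a few assumptions: the names `mask`, `cols` and `filtered` are illustrative, `regex=False` is used so each substring is searched literally, and `na=False` treats missing values as non-matching. Unlike the regex version with `case=False`, this search is case-sensitive.

```python
import pandas as pd

# Sample data copied from the question (Revenue omitted for brevity)
df = pd.DataFrame({
    "company_url": ["mackter.com", "nientact.com", "ventienty.com"],
    "Name": ["Mack Sander", "Neient Dan", "Richard"],
})

lists = ["tac", "bux", "mvy"]
cols = ["company_url", "Name"]

# Start with "no row matches" and OR in the matches for each substring/column
mask = pd.Series(False, index=df.index)
for sub in lists:
    for col in cols:
        # regex=False: plain substring search; na=False: missing values never match
        mask |= df[col].str.contains(sub, regex=False, na=False)

# Filter once, after all substrings have been checked
filtered = df[~mask]
print(filtered)
```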