Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Search with a list of terms in a dataframe column where rows are lists of eligible terms

I would like to filter a dataframe by using a list of search terms. I need to check if any of the terms in the list filter_by are present in one of the rows of my dataframe. Each row in the dataframe contains a list of eligible terms.

I tried several hacks and more verbose solutions that kind of work. I wonder if there is a simple and possibly vectorized solutions to this.

Example list of search terms:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

filter_by = ["CSV", "JPG", "TXT"]

Example dataframe:

    data
0   [JPG, XML, PDF, TXT, XLS, XLSX]
1   [XLS, JPG, PDF, XLSX]
2   [JPG, PDF, TXT]
3   [XLS, PDF, XML, JPG]
4   [CSV]
5   [PDF, TXT, XLS]
6   [XML]
7   [JPG]
8   [PDF, CSV, XML, XLS, TXT, JPG]
9   [TXT, CSV, XLS, XLSX, PDF, JPG]

The resulting dataframe should only contain the rows where one or more of the elements of filter_by is present (OR condition).

I tried:

  • Flattening each dataframe row to a string and using str.contains("|".join(filter_by). This works. But it seems strange to me that I have to change the data structure fundamentally to achieve such a seemingly simple task.
  • .apply() and a custom function. This works but seems verbose.
  • .query() doesn’t seem to be a solution. Neither is something like df[df.data.isin(filter_by)].
  • I also searched for similar questions on SO, to no avail.

Thanks in advance. Any help is much appreciated.

>Solution :

You may use not set(filter_by).isdisjoint inside apply:

filter_by = ["CSV", "JPG", "TXT"]
df[~df.data.apply(set(filter_by).isdisjoint)]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading