Select string based on certain word's presence only and exclude everything else

I have a table like this:

Column1  Column2  Text
1        2        Apple Orange Car
2        5        Apple Tree
3        8        Apple Orange
4        7        Sun Orange
5        8        Orange
6        7        Apple Orange Apple

I want to filter this DataFrame on the Text column, keeping only the rows whose text contains Apple, Orange, or both, and no other words.

So the output should look like this:

Column1  Column2  Text
3        8        Apple Orange
5        8        Orange
6        7        Apple Orange Apple

How can I achieve this?

Solution:

This splits each Text value into a list of words, converts the list to a set, and then uses the set comparison <= (written as .le in the code) to ask, for each row:

  • "Is the set of words in Text a subset of {'Apple', 'Orange'}?"
df[df.Text.str.split().apply(set).le({'Apple', 'Orange'})]

Output:

   Column1  Column2                Text
2        3        8        Apple Orange
4        5        8              Orange
5        6        7  Apple Orange Apple
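
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same subset idea. It rebuilds the example table from the question and spells the comparison out in a small helper instead of calling .le on a Series of sets. The names allowed and only_allowed_words are made up for illustration, and the check that rejects empty strings is an added assumption that the one-liner above does not make (an empty set counts as a subset of anything).

```python
import pandas as pd

# Rebuild the example table from the question.
df = pd.DataFrame({
    "Column1": [1, 2, 3, 4, 5, 6],
    "Column2": [2, 5, 8, 7, 8, 7],
    "Text": [
        "Apple Orange Car",
        "Apple Tree",
        "Apple Orange",
        "Sun Orange",
        "Orange",
        "Apple Orange Apple",
    ],
})

# Hypothetical name: the only words a row is allowed to contain.
allowed = {"Apple", "Orange"}


def only_allowed_words(text: str) -> bool:
    """Return True when every whitespace-separated word is in `allowed`."""
    words = set(text.split())
    # Assumption: require at least one word, so an empty string is rejected.
    return bool(words) and words <= allowed


# Boolean mask: one True/False per row, then keep the True rows.
mask = df["Text"].apply(only_allowed_words)
result = df[mask]
print(result)
```

This should keep the same three rows as the output above. Matching is case-sensitive and splits on whitespace only, so "apple" or "Apple," would not count as Apple unless the text is normalized first.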