I have a table like this:
| Column1 | Column2 | Text |
|---|---|---|
| 1 | 2 | Apple Orange Car |
| 2 | 5 | Apple Tree |
| 3 | 8 | Apple Orange |
| 4 | 7 | Sun Orange |
| 5 | 8 | Orange |
| 6 | 7 | Apple Orange Apple |
Now I want to filter this DataFrame on the Text column, keeping only the rows whose text contains Apple and/or Orange and nothing else.
So the output should look like this:
| Column1 | Column2 | Text |
|---|---|---|
| 3 | 8 | Apple Orange |
| 5 | 8 | Orange |
| 6 | 7 | Apple Orange Apple |
What would be the way to achieve it?
>Solution :
This splits each Text value into a list of words, turns that list into a set, and then uses a set comparison (le is <=, which for sets means "is a subset of") to ask of every row:
- "Is the `Text` set a subset of `{'Apple', 'Orange'}`?"
df[df.Text.str.split().apply(set).le({'Apple', 'Orange'})]
Output:
Column1 Column2 Text
2 3 8 Apple Orange
4 5 8 Orange
5 6 7 Apple Orange Apple
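For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same subset idea. The DataFrame is rebuilt from the table in the question; the names `allowed`, `mask` and `result` are just illustrative, and the comparison is written with an explicit lambda instead of `.le`, which should be equivalent for this data.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "Column1": [1, 2, 3, 4, 5, 6],
    "Column2": [2, 5, 8, 7, 8, 7],
    "Text": [
        "Apple Orange Car",
        "Apple Tree",
        "Apple Orange",
        "Sun Orange",
        "Orange",
        "Apple Orange Apple",
    ],
})

# Words a row is allowed to contain (illustrative name).
allowed = {"Apple", "Orange"}

# Split each text on whitespace, then test whether the set of
# words in the row is a subset of the allowed words.
mask = df["Text"].str.split().apply(lambda words: set(words) <= allowed)

result = df[mask]
print(result)
```

One thing to watch: an empty Text would also pass this check, because the empty set is a subset of every set, and missing values (NaN) would need to be handled before the split.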