As an example, I have the following dataset (fake random data):
| Index | category | value |
|---|---|---|
| 1 | dog | 5 |
| 2 | cat | 22 |
| 3 | Tasselled Wobbegong | 44 |
| 4 | cat | 66 |
| 5 | Tasselled Wobbegong | 5 |
| 6 | dog | 23 |
I have this in a vaex dataframe.
Now imagine I have 10,000 categories, not only 3.
I want to filter my vaex dataframe by a list of categories, like so:
```python
filter_category_list = ['cat', 'dog']
df = df[df.category in filter_category_list]
```
The code above doesn't work, since Python's `in` always returns a single `True`/`False` rather than a per-row mask, but I imagine the solution would look similar.
I expect my output to be:
| Index | category | value |
|---|---|---|
| 1 | dog | 5 |
| 2 | cat | 22 |
| 4 | cat | 66 |
| 6 | dog | 23 |
Any idea how to achieve that with vaex?
Thanks for taking the time to read!
Solution:
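Here are some solutions. On a vaex DataFrame, `isin` is the one to prefer: it is a vectorized membership test, whereas `apply` calls a Python function for every row and is much slower on large data. The `query` form with `@` variables is pandas syntax; as far as I know vaex does not provide it, so it only helps if you are working with a pandas DataFrame.
```python
df.query("category in @filter_category_list")  # pandas syntax; may not work on a vaex DataFrame
df[df['category'].apply(lambda x: x in filter_category_list)]  # per-row Python call, slow
df[df['category'].isin(filter_category_list)]  # vectorized, preferred with vaex
```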