I have a DataFrame and I want to remove rows whose value in a certain column is not one of several values that I have stored in a list.
So I have a list holding the values I want to keep:
allowed_values = ["value1", "value2", "value3"]
And I am trying to remove rows from my DataFrame if a certain column does not contain one of the allowed_values. At first I was using a for loop and an if statement like this:
for index, row in df.iterrows():
    if row["Type"] not in allowed_values:
        pass  # TODO: drop this row
I was about to look up how to drop the row, but then I found out about the .loc indexer and thought it would be better to use that instead.
So using .loc, I can do something like this to keep only the rows that have a Type value equal to "value1":
df = df.loc[df["Type"] == "value1"]
But I want to keep all rows whose Type is in the allowed_values list. I tried this:
df = df.loc[df["Type"] in allowed_values]
but this gives me the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I expected this to work, since in and not in normally evaluate to a boolean, so I'm not sure why .loc doesn't accept these operators.
What exactly is wrong with using the in or not in operators inside .loc, and how can I write a condition that drops rows whose Type value is not in the allowed_values list?
EDIT: I found this question about the same error, and the answer was that inside a pandas filter you have to use operators that pandas applies element-wise (e.g. ==, !=, &, |), whereas not and in are not element-wise and need a single truth value. So I think the only way to get the functionality I want is a long chain of | comparisons, something like:
df = df.loc[(df["Type"] == "value1") | (df["Type"] == "value2") | (df["Type"] == "value3")]
Is there really no other way to check whether each value is in the allowed_values list? That would make my code a lot neater (I have more than three values in the list, so this line gets long).
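The chain of | comparisons does not have to be typed out by hand. As a sketch (the sample rows below are made up for illustration), functools.reduce can fold one == comparison per entry in allowed_values into a single boolean mask, giving the same rows as the hand-written version:

```python
import functools
import operator

import pandas as pd

# Hypothetical sample data; "Type" matches the column name used in the question.
df = pd.DataFrame({"Type": ["value1", "value4", "value2", "value3", "value5"]})
allowed_values = ["value1", "value2", "value3"]

# Build one boolean Series per allowed value, then OR them together.
# This is equivalent to writing
# (df["Type"] == "value1") | (df["Type"] == "value2") | (df["Type"] == "value3")
mask = functools.reduce(
    operator.or_,
    (df["Type"] == value for value in allowed_values),
)

df = df.loc[mask]
print(df)
```

This still performs one comparison per allowed value, and reduce raises a TypeError if allowed_values is empty (no initial value is given). Series.isin performs the whole membership check in a single call.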
Solution:
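Use Series.isin, which checks each value in the column against the list and returns a boolean mask you can filter with: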
import pandas as pd
allowed_values = ['White', 'Green', 'Red']
df = pd.DataFrame({'color': ['White', 'Black', 'Green', 'White']})
df = df[df['color'].isin(allowed_values)]
df
   color
0  White
2  Green
3  White
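isin returns a boolean Series with one entry per row, so the mask can be inspected directly or negated with ~ (element-wise "not"), for example to see which rows the filter throws away. A short sketch using the same example data:

```python
import pandas as pd

allowed_values = ['White', 'Green', 'Red']
df = pd.DataFrame({'color': ['White', 'Black', 'Green', 'White']})

# One True/False per row: True where the color is in allowed_values
mask = df['color'].isin(allowed_values)
print(mask.tolist())  # [True, False, True, True]

# Keep only the rows whose color is in the list
kept = df[mask]

# ~ negates the mask element-wise, selecting the rows NOT in the list
dropped = df[~mask]
print(dropped)
```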
If you prefer to use .loc, pass the same isin mask to it:
df = df.loc[df['color'].isin(allowed_values)]
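As for why df["Type"] in allowed_values raises: Python's in on a list compares each list element with the left operand using == and needs a single True/False from every comparison. With a Series involved, == returns another Series (one boolean per row), and pandas refuses to collapse that into one truth value, hence the ValueError. A small sketch reproducing this (the sample data is hypothetical):

```python
import pandas as pd

allowed_values = ["value1", "value2", "value3"]
df = pd.DataFrame({"Type": ["value1", "value4", "value2"]})

# == on a Series is element-wise: it returns a Series of booleans, not one bool
comparison = df["Type"] == "value1"
print(comparison.tolist())  # [True, False, False]

# `in` on a list needs a single bool from each == comparison, so it calls
# bool() on the resulting Series, which pandas rejects as ambiguous.
error_message = None
try:
    df["Type"] in allowed_values
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

# isin performs the membership test row by row instead
print(df["Type"].isin(allowed_values).tolist())  # [True, False, True]
```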