Pandas .loc() method using "not" and "in" operators

I have a DataFrame and I want to remove rows whose value in a certain column is not one of the values I have stored in a list.

So I have a list variable stating the values of objects I want to keep:

allowed_values = ["value1", "value2", "value3"]

And I am attempting to remove rows from my dataframe if a certain column does not contain one of the allowed_values. At first I was using a for loop and an if statement like this:

for index, row in df.iterrows():
    if row["Type"] not in allowed_values:
        # Drop the row here. I was about to find out how to do this, but then
        # I found out about the `.loc()` method and thought it would be better
        # to use that.
        pass

So using the .loc() method, I can do something like this to only keep objects that have a Type value equal to value1:

df = df.loc[df["Type"] == "value1"]

But I want to keep all objects that have a Type in the allowed_values list. I tried to do this:

df = df.loc[df["Type"] in allowed_values]

but this gives me the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I expected this to work, since in and not in still produce a boolean, so I’m not sure why the .loc() method doesn’t accept these operators.

What exactly is wrong with using the in or not operators in the .loc() method, and how can I create a logical statement that will drop rows if their Type value is not in the allowed_values list?

EDIT: I found this question asking about the same error I got. The answer was that the condition has to be built from element-wise operators (comparisons such as == and !=, combined with the bitwise &, | and ~), whereas not and in are not element-wise and need a single truth value. So I think the only way to get the functionality I want is to chain the comparisons into one lengthy expression, something like:

df = df.loc[(df["Type"] == "value1") | (df["Type"] == "value2") | (df["Type"] == "value3")]

Is there no other way to check each value is in the allowed_values list? This would make my code a lot neater (I have more than 3 values in the list, so this is a lengthy line).

Solution:

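Use Series.isin, which tests each element of the column against the list and returns a boolean Series you can use as a mask: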

import pandas as pd

allowed_values = ['White', 'Green', 'Red']
df = pd.DataFrame({'color': ['White', 'Black', 'Green', 'White']})
# Keep only the rows whose 'color' is one of allowed_values
df = df[df['color'].isin(allowed_values)]
df

   color
0  White
2  Green
3  White
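
As for why the original attempt fails: the in operator asks the list whether it contains the whole Series. The list compares the Series with its items using ==, which produces a boolean Series, and Python then needs a single True or False from that Series, which pandas refuses to give. A minimal sketch of this, reusing the column name and list from the question with made-up rows (the data here is an assumption for illustration only):

```python
import pandas as pd

allowed_values = ["value1", "value2", "value3"]
# Hypothetical rows, invented for this illustration
df = pd.DataFrame({"Type": ["value1", "other", "value3"]})

# `in` is evaluated once for the whole Series, not once per row,
# so pandas cannot reduce the comparison to a single True/False.
try:
    df["Type"] in allowed_values
    raised = False
except ValueError:
    raised = True

# isin does the membership test element by element instead.
mask = df["Type"].isin(allowed_values)

print(raised)
print(mask.tolist())
```

The mask has one entry per row, which is the shape that boolean indexing and .loc expect.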

If you must use .loc then you can use:

df = df.loc[df['color'].isin(allowed_values)]
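
For the "not in" case from the question, the mask can be inverted with the ~ operator, which negates a boolean Series element by element. A short sketch using the same example data as above (the variable names in_list, kept and dropped are only illustrative):

```python
import pandas as pd

allowed_values = ["White", "Green", "Red"]
df = pd.DataFrame({"color": ["White", "Black", "Green", "White"]})

# One boolean per row: True where the color is in allowed_values
in_list = df["color"].isin(allowed_values)

kept = df.loc[in_list]      # rows whose color is in the list
dropped = df.loc[~in_list]  # rows whose color is not in the list

print(kept.index.tolist())
print(dropped["color"].tolist())
```

Python's not keyword would hit the same ValueError as in, because it also needs a single truth value, so ~ is the operator to use on a mask.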