Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to filter dataframe by columns values combinations?

I have a dataframe:

col1   col2   col3
a       b       b
a       b       c
k       l       o
b       l       b

I want to keep only rows where col1 is "a", col2 is "b" and col3 is "b" or col1 is "k", col2 is "l" and col3 is "o". So desired result is:

col1   col2   col3
a       b       b
k       l       o

How to do that? i can write dt[(dt["col1"]=="a")&(dt["col2"]=="b")&(dt["col1"]=="b")] but what about second case? should i put it with or?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

abb = (df["col1"]=="a") & (df["col2"]=="b") & (df["col3"]=="b")
klo = (df["col1"]=="k") & (df["col2"]=="l") & (df["col3"]=="o")

df[(abb) | (klo)]

  col1 col2 col3
0    a    b    b
2    k    l    o

Alternatively, you could write something like this, just to avoid all those conditionals:

abb = 'abb'
klo = 'klo'

strings = [abb, klo]

def f(x):
    if ''.join(x) in strings:
        return True
    return False

df[df.apply(lambda x: f(x), axis=1)]

  col1 col2 col3
0    a    b    b
2    k    l    o

So, here we are applying a custom function to each row with df.apply. Inside the function we turn the row into a single string with str.join and we check if this string exists in our predefined list of strings. Finally, we use the resulting pd.Series with booleans to select from our df.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading