I have a dataframe:
col1 col2 col3
a b b
a b c
k l o
b l b
I want to keep only rows where col1 is "a", col2 is "b" and col3 is "b" or col1 is "k", col2 is "l" and col3 is "o". So desired result is:
col1 col2 col3
a b b
k l o
How to do that? i can write dt[(dt["col1"]=="a")&(dt["col2"]=="b")&(dt["col1"]=="b")] but what about second case? should i put it with or?
>Solution :
abb = (df["col1"]=="a") & (df["col2"]=="b") & (df["col3"]=="b")
klo = (df["col1"]=="k") & (df["col2"]=="l") & (df["col3"]=="o")
df[(abb) | (klo)]
col1 col2 col3
0 a b b
2 k l o
Alternatively, you could write something like this, just to avoid all those conditionals:
abb = 'abb'
klo = 'klo'
strings = [abb, klo]
def f(x):
if ''.join(x) in strings:
return True
return False
df[df.apply(lambda x: f(x), axis=1)]
col1 col2 col3
0 a b b
2 k l o
So, here we are applying a custom function to each row with df.apply. Inside the function we turn the row into a single string with str.join and we check if this string exists in our predefined list of strings. Finally, we use the resulting pd.Series with booleans to select from our df.