I have a DataFrame that looks a bit like this:
A B C D ... G H I J
0 First First First First ... 0.412470 0.758011 0.066926 0.877992
1 First First First Third ... 0.007162 0.957042 0.601337 0.636086
2 First First Third First ... 0.956398 0.640909 0.602861 0.679656
3 First First Third Third ... 0.905421 0.199685 0.471300 0.975808
4 First Third First First ... 0.378181 0.498606 0.865298 0.914407
5 First Third First Third ... 0.387706 0.247412 0.339593 0.431647
6 First Third Third First ... 0.582202 0.046199 0.496258 0.533133
7 First Third Third Third ... 0.877199 0.011512 0.338528 0.938252
8 Third First First First ... 0.446433 0.175686 0.115796 0.985400
9 Third First First Third ... 0.315839 0.252855 0.142463 0.929233
10 Third First Third First ... 0.192566 0.600732 0.434166 0.933182
11 Third First Third Third ... 0.380029 0.511411 0.672583 0.807731
12 Third Third First First ... 0.915590 0.507470 0.390135 0.303314
13 Third Third First Third ... 0.977414 0.062521 0.909845 0.314432
14 Third Third Third First ... 0.608958 0.384802 0.193425 0.689283
15 Third Third Third Third ... 0.496223 0.478222 0.076192 0.695453
[16 rows x 10 columns]
I also have a list (coming from elsewhere) with the values for A, B, C & D that I’m looking for, something like this:
expected = ['First', 'Third', 'First', 'Third']
I’d like to filter to find a row matching a certain set of ABCD values, where the expected values are in a list. Something like this (which doesn’t work):
# This looks neat, but doesn't work
rows = df[df[['A', 'B', 'C', 'D'] == expected]]
rows
# Not what I was hoping for!
Out[17]:
A B C D E F G H I J
0 First NaN First NaN NaN NaN NaN NaN NaN NaN
1 First NaN First Third NaN NaN NaN NaN NaN NaN
2 First NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 First NaN NaN Third NaN NaN NaN NaN NaN NaN
4 First Third First NaN NaN NaN NaN NaN NaN NaN
5 First Third First Third NaN NaN NaN NaN NaN NaN
6 First Third NaN NaN NaN NaN NaN NaN NaN NaN
7 First Third NaN Third NaN NaN NaN NaN NaN NaN
8 NaN NaN First NaN NaN NaN NaN NaN NaN NaN
9 NaN NaN First Third NaN NaN NaN NaN NaN NaN
10 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
11 NaN NaN NaN Third NaN NaN NaN NaN NaN NaN
12 NaN Third First NaN NaN NaN NaN NaN NaN NaN
13 NaN Third First Third NaN NaN NaN NaN NaN NaN
14 NaN Third NaN NaN NaN NaN NaN NaN NaN NaN
15 NaN Third NaN Third NaN NaN NaN NaN NaN NaN
I could use dropna(subset=['A', 'B', 'C', 'D']) to get the relevant rows, then extract the index and use it on the original table, but that’s getting quite long-winded.
I know I can do this long-hand like this, but I’m wondering whether there’s a neater way:
# This works, but is clunky:
rows = df[(df['A'] == expected[0]) & (df['B'] == expected[1]) & (df['C'] == expected[2]) & (df['D'] == expected[3])]
rows
# This is what I want:
Out[19]:
A B C D ... G H I J
5 First Third First Third ... 0.387706 0.247412 0.339593 0.431647
[1 rows x 10 columns]
Is there a simpler way of doing this? My searching for filtering by lists just seems to come up with lots of isin suggestions, which aren’t relevant.
>Solution :
You need select columns by [[]] first for subset, compare by list and test if all values are Trues by DataFrame.all:
expected = ['First', 'Third', 'First', 'Third']
rows = df[(df[['A', 'B', 'C', 'D']] == expected).all(axis=1)]
print (rows)
A B C D ... G H I J
5 First Third First Third ... 0.387706 0.247412 0.339593 0.431647
How it working:
print (df[['A', 'B', 'C', 'D']])
A B C D
0 First First First First
1 First First First Third
2 First First Third First
3 First First Third Third
4 First Third First First
5 First Third First Third
6 First Third Third First
7 First Third Third Third
8 Third First First First
9 Third First First Third
10 Third First Third First
11 Third First Third Third
12 Third Third First First
13 Third Third First Third
14 Third Third Third First
15 Third Third Third Third
print ((df[['A', 'B', 'C', 'D']] == expected))
A B C D
0 True False True False
1 True False True True
2 True False False False
3 True False False True
4 True True True False
5 True True True True
6 True True False False
7 True True False True
8 False False True False
9 False False True True
10 False False False False
11 False False False True
12 False True True False
13 False True True True
14 False True False False
15 False True False True
print ((df[['A', 'B', 'C', 'D']] == expected).all(axis=1))
0 False
1 False
2 False
3 False
4 False
5 True
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
dtype: bool