Home Pandas: filter one dataframe by multiple, simultaneous column values of another dataframe

Questions

Pandas: filter one dataframe by multiple, simultaneous column values of another dataframe

February 7, 2022

I have a filtered dataframe called correct_df and a raw dataframe example_df.

example_df = pd.DataFrame({'Test': ['Test_1', 'Test_1', 'Test_1', 'Test_2', 'Test_2', 'Test_2', 'Test_3', 'Test_3', 'Test_3'], 'A': [1, 2, 3, 1, 2, 3, 1, 2, 3]})
other_df = pd.DataFrame({'Test': ['Test_1', 'Test_1', 'Test_3', 'Test_3'], 'A': [1, 2, 1, 3]})

Desired result:

I want the indexes of example_df where ‘Test’ and ‘A’ column values both match the combined row value of correct_df.

I have tried:

result = example_df.loc[ (example_df['Test'].isin(other_df['Test'])) & (example_df['A'].isin(other_df['A'])) ]
result

    Test    A
0   Test_1  1
1   Test_1  2
2   Test_1  3
6   Test_3  1
7   Test_3  2
8   Test_3  3

But as these two conditions are separated, the resulting value only applies the conditions on single columns without chaining them i.e. A then also B, not A and B. How do I get the .loc of both column conditions?

>Solution :

Use DataFrame.reset_index for avoid lost indices and then DataFrame.merge:

result = example_df.reset_index().merge(other_df, on=['Test','A'])
print (result)
   index    Test  A
0      0  Test_1  1
1      1  Test_1  2
2      6  Test_3  1
3      8  Test_3  3

result = (example_df.reset_index()
                    .merge(other_df, on=['Test','A'])
                    .set_index('index')
                    .rename_axis(None))
print (result)
     Test  A
0  Test_1  1
1  Test_1  2
6  Test_3  1
8  Test_3  3

Another idea with MultiIndex with Index.isin and filtering in boolean indexing:

result = example_df[example_df.set_index(['Test','A']).index
                              .isin(other_df.set_index(['Test','A']).index)]
print (result)
     Test  A
0  Test_1  1
1  Test_1  2
6  Test_3  1
8  Test_3  3