Dataframe filter on two columns as pair from lists or dictionary

March 8, 2022

Having a DataFrame like below

df = pd.DataFrame({'COUNTRY':['PL', 'FR','NO','SE', 'IT', 'PL'], 'REGION':[1,3,1,2,2,3], SUBREGION':[1,2,3,4,6,8]})

and two lists:

country_list = ['IT', 'PL', 'FR']
region_list = [1,1,3]

I would like to filter the DataFrame on both lists where each item of each list is a pair like a {'PL':1} ensuring that only that combination is selected. That is why a regular filter like:

df[(df['COUNTRY'].isin(country_list)) & (df['REGION'].isin(region_list))]

will present a result that is unsatisfactory:

How I go about it for the moment is a very Excel like solution where I create a temp column and merge two lists together to have one key for filtering as below:

df = df.astype({'REGION':'str'})
region_list = list(map(str,region_list))
merged_list = list(zip(country_list,region_list))
merged_list =[''.join(i) for i in merged_list]
df['TEMP COL'] = df['COUNTRY'] + df['REGION']
df[df['TEMP COL'].isin(merged_list)]

This works but I wanted to check with the community if there is a more elegant approach you are using? Let me just note that having an option to filter on one dictionary instead of two lists would be even superior.

>Solution :

Create helper DataFrame and for filter use default inner join by default both columns:

df = pd.DataFrame({'COUNTRY': country_list, 'REGION': region_list}).merge(df)
print (df)
  COUNTRY  REGION  SUBREGION
0      PL       1          1
1      FR       3          2

Or use Index.isin by MultiIndex and filter in boolean indexing:

df = df[df.set_index(['COUNTRY','REGION']).index.isin(zip(country_list, region_list))]
print (df)
  COUNTRY  REGION  SUBREGION
0      PL       1          1
1      FR       3          2