Home Remove rows from one dataframe that is present in another dataframe depending on specific columns

Questions

Remove rows from one dataframe that is present in another dataframe depending on specific columns

September 28, 2022

I have two dataframes as shown below:

df1 =
      A     B    val1     val2   val3
     
0     0    45     20      200      100       
1     3    100    10      80       50      
2     4    30     30      60       10  
3     6    150    50      34       11

df2 =
      A     B    val1     val2   val3
     
0     4    30     80      145      90      
1     0    45     10      80       50      
2     1    78     10      90       18

I want to remove the rows from df1 that are present in df2 only depending on the columns A and B. From df1, the values 0 & 45 and 4 & 30 are present in df2 in columns A and B. So I want the entire row from df1 to be removed. The final dataframe should be as follows:

final = 
      A     B    val1     val2   val3
     
       
0     3    100    10      80       50        
1     6    150    50      34       11

The index should be reset to 0 and 1 in the final dataframe
Is there a way to do this?

>Solution :

Filter columns names for df2 and add indicator parameter to DataFrame.merge with left join, test column _merge and filter in boolean indexing, last add DataFrame.reset_index:

df = (df1[df1.merge(df2[['A','B']], indicator=True, how='left')['_merge'].eq('left_only')]
         .reset_index(drop=True))
print (df)
   A    B  val1  val2  val3
1  3  100    10    80    50
3  6  150    50    34    11

Or create MultiIndex and use Index.isin with invert mask:

df = (df1[~df1.set_index(['A','B']).index.isin(df2.set_index(['A','B']).index)]
           .reset_index(drop=True)))
print (df)
   A    B  val1  val2  val3
1  3  100    10    80    50
3  6  150    50    34    11