I have two dataframes as shown below:
df1 =
A B val1 val2 val3
0 0 45 20 200 100
1 3 100 10 80 50
2 4 30 30 60 10
3 6 150 50 34 11
df2 =
A B val1 val2 val3
0 4 30 80 145 90
1 0 45 10 80 50
2 1 78 10 90 18
I want to remove the rows from df1 that are present in df2 only depending on the columns A and B. From df1, the values 0 & 45 and 4 & 30 are present in df2 in columns A and B. So I want the entire row from df1 to be removed. The final dataframe should be as follows:
final =
A B val1 val2 val3
0 3 100 10 80 50
1 6 150 50 34 11
The index should be reset to 0 and 1 in the final dataframe
Is there a way to do this?
>Solution :
Filter columns names for df2 and add indicator parameter to DataFrame.merge with left join, test column _merge and filter in boolean indexing, last add DataFrame.reset_index:
df = (df1[df1.merge(df2[['A','B']], indicator=True, how='left')['_merge'].eq('left_only')]
.reset_index(drop=True))
print (df)
A B val1 val2 val3
1 3 100 10 80 50
3 6 150 50 34 11
Or create MultiIndex and use Index.isin with invert mask:
df = (df1[~df1.set_index(['A','B']).index.isin(df2.set_index(['A','B']).index)]
.reset_index(drop=True)))
print (df)
A B val1 val2 val3
1 3 100 10 80 50
3 6 150 50 34 11