Having a couple of dataframes like that (df, df2),
df
D R1 R2 R3
0 D1 1 1 1
1 D1 1 1 1
2 D2 1 2 1
3 D2 1 2 1
4 D3 1 0 1
df2
D R1 R2 R3
0 D1 1 1 1
1 D1 1 1 1
2 D2 1 3 1
3 D2 1 3 1
4 D3 1 1 1
5 D3 2 2 2
6 D3 2 2 2
Is it possible to merge them and create an additional column called "new_values" with those values that only exist in one of the two dataframes ?
Expected result :
D R1 R2 R3 _merge new_values
0 D1 1 1 1 both False
1 D1 1 1 1 both False
2 D1 1 1 1 both False
3 D1 1 1 1 both False
4 D2 1 2 1 left_only False
5 D2 1 2 1 left_only False
6 D3 1 0 1 left_only False
7 D2 1 3 1 right_only False
8 D2 1 3 1 right_only False
9 D3 1 1 1 right_only False
10 D3 2 2 2 right_only True
11 D3 2 2 2 right_only True
>Solution :
Use outer join with merge and test memebrship of indices in Series.isin:
df = (df.reset_index()
.merge(df2.reset_index(),
how='outer',
indicator=True,
on=df.columns.tolist()))
df['new_values'] = ~df.pop('index_y').isin(df.pop('index_x'))
print (df)
D R1 R2 R3 _merge new_values
0 D1 1 1 1 both False
1 D1 1 1 1 both False
2 D1 1 1 1 both False
3 D1 1 1 1 both False
4 D2 1 2 1 left_only False
5 D2 1 2 1 left_only False
6 D3 1 0 1 left_only False
7 D2 1 3 1 right_only False
8 D2 1 3 1 right_only False
9 D3 1 1 1 right_only False
10 D3 2 2 2 right_only True
11 D3 2 2 2 right_only True