Let’s say I have DataFrames df1 and df2:
>>> df1 = pd.DataFrame({'A': [0, 2, 4], 'B': [2, 17, 7], 'C': [4, 9, 11]})
>>> df1
A B C
0 0 2 4
1 2 17 9
2 4 7 11
>>> df2 = pd.DataFrame({'A': [9, 2, 32], 'B': [1, 3, 8], 'C': [6, 2, 41]})
>>> df2
A B C
0 9 1 6
1 2 3 2
2 32 8 41
What I want is the 3rd DataFrame that will have minimal rows (min is calculated based on column B), that is:
>>> df3
A B C
0 9 1 6
1 2 3 2
2 4 7 11
I really don’t want to do this by iterating over all rows and comparing them one by one, is there a faster and compact way to do this?
>Solution :
You can mask df1 with df2 when df2['B'] < df1['B']:
out = df1.mask(df2['B'].lt(df1['B']), df2)
Output:
A B C
0 9 1 6
1 2 3 2
2 4 7 11