Pandas : How to compare 2 dataframes for multiple columns and get the non matching column names

Advertisements

Sample Code:

import pandas as pd

data_list1 = [[1, "p1", 10.0, 20.0],[2, "p2", 15.0, 25.0],[3, "p3", 20.0, 30.0]]
df1 = pd.DataFrame(data_list1, columns=['id', 'p_name', 'qty1', 'qty2'])

data_list2 = [[1, "p1", 12.0, 22.0],[2, "p2", 16.0, 25.0],[3, "p3", 20.0, 34.0], [4, "p4", 32.0, 32.0]]
df2 = pd.DataFrame(data_list2, columns=['id', 'p_name', 'qty1', 'qty2'])

print(df1)
print(df2)

for example, if have 2 pandas dataframe like this and i want to add a new column with what columns have been changed.
Basically comparing new data with old one based on qty1,qty2 column and see if something is changed.

Input1:

   id p_name  qty1  qty2
0   1     p1  10.0  20.0
1   2     p2  15.0  25.0
2   3     p3  20.0  30.0

Input2:

   id p_name  qty1  qty2
0   1     p1  12.0  22.0
1   2     p2  16.0  25.0
2   3     p3  20.0  34.0
3   4     p4  32.0  32.0

expected output:

   id p_name  qty1  qty2 update_reason
0   1     p1  12.0  22.0 qty1,qty2
1   2     p2  16.0  25.0 qty1
2   3     p3  20.0  34.0 qty2
3   4     p4  32.0  32.0 

>Solution :

Assuming the DataFrames are already aligned on their index, you can use:

m = df1.ne(df2)
out = df2.assign(update_reason=m.dot(df1.columns+',').str[:-1])

If you first want to align on a subset of columns:

cols = ['id']
m = df2[cols].merge(df1, how='left').ne(df2)
out = df2.assign(update_reason=m.dot(df1.columns+',').str[:-1])

Output:

   id p_name  qty1  qty2 update_reason
0   1     p1  11.0  22.0     qty1,qty2
different number of rows, aligned on "id"
cols = ['id']
compare = ['qty1', 'qty2']
m = df2[cols].merge(df1, how='left').ne(df2)[compare]
out = df2.assign(update_reason=m.dot(m.columns+',').str[:-1].where(df2['id'].isin(df1['id'])))

Output:

   id p_name  qty1  qty2 update_reason
0   1     p1  12.0  22.0     qty1,qty2
1   2     p2  16.0  25.0          qty1
2   3     p3  20.0  34.0          qty2
3   4     p4  32.0  32.0           NaN

Leave a ReplyCancel reply