Sample Code:
import pandas as pd
data_list1 = [[1, "p1", 10.0, 20.0],[2, "p2", 15.0, 25.0],[3, "p3", 20.0, 30.0]]
df1 = pd.DataFrame(data_list1, columns=['id', 'p_name', 'qty1', 'qty2'])
data_list2 = [[1, "p1", 12.0, 22.0],[2, "p2", 16.0, 25.0],[3, "p3", 20.0, 34.0], [4, "p4", 32.0, 32.0]]
df2 = pd.DataFrame(data_list2, columns=['id', 'p_name', 'qty1', 'qty2'])
print(df1)
print(df2)
for example, if have 2 pandas dataframe like this and i want to add a new column with what columns have been changed.
Basically comparing new data with old one based on qty1,qty2 column and see if something is changed.
Input1:
id p_name qty1 qty2
0 1 p1 10.0 20.0
1 2 p2 15.0 25.0
2 3 p3 20.0 30.0
Input2:
id p_name qty1 qty2
0 1 p1 12.0 22.0
1 2 p2 16.0 25.0
2 3 p3 20.0 34.0
3 4 p4 32.0 32.0
expected output:
id p_name qty1 qty2 update_reason
0 1 p1 12.0 22.0 qty1,qty2
1 2 p2 16.0 25.0 qty1
2 3 p3 20.0 34.0 qty2
3 4 p4 32.0 32.0
>Solution :
Assuming the DataFrames are already aligned on their index, you can use:
m = df1.ne(df2)
out = df2.assign(update_reason=m.dot(df1.columns+',').str[:-1])
If you first want to align on a subset of columns:
cols = ['id']
m = df2[cols].merge(df1, how='left').ne(df2)
out = df2.assign(update_reason=m.dot(df1.columns+',').str[:-1])
Output:
id p_name qty1 qty2 update_reason
0 1 p1 11.0 22.0 qty1,qty2
different number of rows, aligned on "id"
cols = ['id']
compare = ['qty1', 'qty2']
m = df2[cols].merge(df1, how='left').ne(df2)[compare]
out = df2.assign(update_reason=m.dot(m.columns+',').str[:-1].where(df2['id'].isin(df1['id'])))
Output:
id p_name qty1 qty2 update_reason
0 1 p1 12.0 22.0 qty1,qty2
1 2 p2 16.0 25.0 qty1
2 3 p3 20.0 34.0 qty2
3 4 p4 32.0 32.0 NaN