With dataframes df_a and df_b, how do I return the difference (meaning the data in "other" that differs from "self", as DataFrame.compare labels them) as complete rows (i.e., all columns)? If I do:
import pandas as pd

first = {
    'Name': ['Bob', 'Mike', 'Alex'],
    'Job': ['Forklift Operator', 'Forklift Operator', 'Master Forklift Operator']
}
second = {
    'Name': ['Bob', 'Mike', 'Allen'],
    'Job': ['Forklift Operator', 'Forklift Operator', 'Master Forklift Operator']
}
df_a = pd.DataFrame(first)
df_b = pd.DataFrame(second)
# compare() keeps only the rows and columns where the two frames differ
df_c = df_a.compare(df_b)
print(df_c)
That gives me:
   Name       
   self  other
2  Alex  Allen
What I would like to get instead is the entire row from "other" (df_b) that does not match the left frame (df_a):
    Name                       Job
2  Allen  Master Forklift Operator
Solution:
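The result of compare keeps the original row labels of the rows that differ, so you can use that index to select the full rows from df_b with .loc: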
df_b.loc[df_a.compare(df_b).index]
Output:
    Name                       Job
2  Allen  Master Forklift Operator
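A self-contained version of the answer is sketched below, wrapped in two helper functions whose names (changed_rows and changed_rows_mask) are hypothetical and not part of the pandas API. The first uses DataFrame.compare exactly as in the answer; the second swaps in an element-wise inequality mask, a common alternative. Both assume df_a and df_b share the same row and column labels, which compare requires (it raises ValueError otherwise); compare itself needs pandas 1.1 or later.

```python
import pandas as pd


def changed_rows(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return the full rows of `right` that differ from `left` (hypothetical helper).

    Assumes identically labeled frames; DataFrame.compare raises ValueError otherwise.
    """
    # compare() returns one row per index label with at least one differing cell
    diff = left.compare(right)
    # Reuse those labels to pull every column of the matching rows from `right`
    return right.loc[diff.index]


def changed_rows_mask(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Same selection via a boolean mask (hypothetical helper).

    Caveat: NaN != NaN is True, so a cell holding NaN in both frames
    marks the row as changed here, unlike compare().
    """
    # True for each row where any column differs between the two frames
    mask = left.ne(right).any(axis=1)
    return right[mask]


df_a = pd.DataFrame({
    "Name": ["Bob", "Mike", "Alex"],
    "Job": ["Forklift Operator", "Forklift Operator", "Master Forklift Operator"],
})
df_b = pd.DataFrame({
    "Name": ["Bob", "Mike", "Allen"],
    "Job": ["Forklift Operator", "Forklift Operator", "Master Forklift Operator"],
})

print(changed_rows(df_a, df_b))
print(changed_rows_mask(df_a, df_b))
```

On the sample data both helpers select only the row labeled 2 (Allen). They diverge on missing values: compare does not report matching NaNs as a difference, while the mask version does. If the frames are not aligned row for row, neither sketch applies; a content-based approach such as merge(..., how='left', indicator=True) filtered on _merge == 'left_only' is one option.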