What I want to do
- I have two pandas.DataFrame objects, df1 and df2. Both have the same columns.
- All indices in df2 are also found in df1, but there are some indices that only df1 has.
- For rows whose index appears in both df1 and df2, use the rows of df2.
- For rows whose index appears only in df1, use the rows of df1.
In short: "replace the values of df1 with the values of df2, matched on the MultiIndex".
import pandas as pd
index_names = ['index1', 'index2']
columns = ['column1', 'column2']
data1 = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
index1 = [['i1', 'i1', 'i1', 'i2', 'i2'], ['A', 'B', 'C', 'B', 'C']]
df1 = pd.DataFrame(data1, index=pd.MultiIndex.from_arrays(index1, names=index_names), columns=columns)
print(df1)
## OUTPUT
#               column1  column2
#index1 index2
#i1     A             1        2
#       B             2        3
#       C             3        4
#i2     B             4        5
#       C             5        6
data2 = [[11, 12], [12, 13]]
index2 = [['i2', 'i1'], ['C', 'C']]
df2 = pd.DataFrame(data2, index=pd.MultiIndex.from_arrays(index2, names=index_names), columns=columns)
print(df2)
## OUTPUT
#               column1  column2
#index1 index2
#i2     C            11       12
#i1     C            12       13
## DO SOMETHING!
## EXPECTED OUTPUT
#               column1  column2
#index1 index2
#i1     A             1        2
#       B             2        3
#       C            12       13  # REPLACED!
#i2     B             4        5
#       C            11       12  # REPLACED!
Environment
Python 3.10.5
Pandas 1.4.3
>Solution :
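You can use either direct assignment via .loc or a call to .update. Note that .update modifies the DataFrame in place and returns None, so apply it to a copy if df1 must stay unchanged: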
>>> df3 = df1.copy()
>>> df3.update(df2)
>>> df3
               column1  column2
index1 index2
i1     A           1.0      2.0
       B           2.0      3.0
       C          12.0     13.0
i2     B           4.0      5.0
       C          11.0     12.0
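The answer mentions direct assignment via .loc but only demonstrates .update. Here is a minimal sketch of the .loc variant, rebuilding df1 and df2 from the question so it runs on its own. The name df3 and the use of a copy are carried over from the .update example. This relies on every index label in df2 also existing in df1, which the question guarantees.

```python
import pandas as pd

index_names = ['index1', 'index2']
columns = ['column1', 'column2']

# Same data as in the question.
df1 = pd.DataFrame(
    [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]],
    index=pd.MultiIndex.from_arrays(
        [['i1', 'i1', 'i1', 'i2', 'i2'], ['A', 'B', 'C', 'B', 'C']],
        names=index_names),
    columns=columns)
df2 = pd.DataFrame(
    [[11, 12], [12, 13]],
    index=pd.MultiIndex.from_arrays(
        [['i2', 'i1'], ['C', 'C']], names=index_names),
    columns=columns)

# Work on a copy so df1 itself is left untouched.
df3 = df1.copy()

# Select the rows of df3 whose index labels appear in df2 and overwrite them.
# The right-hand side is matched to those rows by index label.
df3.loc[df2.index] = df2

print(df3)
```

The .update output above shows the values as floats (1.0, 12.0, ...). Because the .loc assignment writes integers into integer columns, it should leave the dtype as it was, though it is worth checking df3.dtypes on your pandas version. If you stay with .update, casting back with df3.astype(df1.dtypes) is one way to restore the original column types.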