Replace values in pandas.DataFrame using MultiIndex

What I want to do

  • I have two pandas.DataFrame objects, df1 and df2, with the same columns.
  • Every index in df2 also appears in df1, but some indices appear only in df1.
  • For rows whose index appears in both df1 and df2, use the row from df2.
  • For rows whose index appears only in df1, keep the row from df1.

In short: replace values in df1 with values from df2, matching rows on the MultiIndex.

import pandas as pd

index_names = ['index1', 'index2']
columns = ['column1', 'column2']

data1 = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]] 
index1 = [['i1', 'i1', 'i1', 'i2', 'i2'], ['A', 'B', 'C', 'B', 'C']]
df1 = pd.DataFrame(data1, index=pd.MultiIndex.from_arrays(index1, names=index_names), columns=columns)
print(df1)
## OUTPUT
#               column1  column2
#index1 index2                  
#i1     A             1        2
#       B             2        3
#       C             3        4
#i2     B             4        5
#       C             5        6

data2 = [[11, 12], [12, 13]]
index2 = [['i2', 'i1'], ['C', 'C']]
df2 = pd.DataFrame(data2, index=pd.MultiIndex.from_arrays(index2, names=index_names), columns=columns)
print(df2)
## OUTPUT
#               column1  column2
#index1 index2                  
#i2     C            11       12
#i1     C            12       13

## DO SOMETHING!

## EXPECTED OUTPUT
#               column1  column2
#index1 index2                  
#i1     A             1        2
#       B             2        3
#       C            12       13 # REPLACED!
#i2     B             4        5
#       C            11       12 # REPLACED!

Environment

Python 3.10.5
Pandas 1.4.3

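Solution:

You can use direct assignment via .loc or a call to .update. The example below uses .update on a copy so that df1 is left unchanged. Note that .update modifies the frame in place and, as the output shows, the integer columns come back as floats in this pandas version.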

>>> df3 = df1.copy()
>>> df3.update(df2)
>>> df3
               column1  column2
index1 index2                  
i1     A           1.0      2.0
       B           2.0      3.0
       C          12.0     13.0
i2     B           4.0      5.0
       C          11.0     12.0
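
The answer mentions .loc but only shows .update, so here is a minimal sketch of both routes that keeps the original integer dtypes. It rebuilds df1 and df2 from the question; the names df3 and df4 and the cast back with astype are illustrative choices, not part of the original answer. The .loc variant assigns raw values positionally, which assumes every index in df2 exists in df1 (as the question states).

```python
import pandas as pd

index_names = ["index1", "index2"]
columns = ["column1", "column2"]

# Same frames as in the question.
df1 = pd.DataFrame(
    [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]],
    index=pd.MultiIndex.from_arrays(
        [["i1", "i1", "i1", "i2", "i2"], ["A", "B", "C", "B", "C"]],
        names=index_names,
    ),
    columns=columns,
)
df2 = pd.DataFrame(
    [[11, 12], [12, 13]],
    index=pd.MultiIndex.from_arrays([["i2", "i1"], ["C", "C"]], names=index_names),
    columns=columns,
)

# Precondition from the question: every index of df2 is present in df1.
assert df2.index.isin(df1.index).all()

# Option 1: select the matching rows with .loc and overwrite them.
# .to_numpy() hands over plain values, so rows are written in df2's order.
df3 = df1.copy()
df3.loc[df2.index, df2.columns] = df2.to_numpy()

# Option 2: .update works in place; cast back in case the columns became float.
df4 = df1.copy()
df4.update(df2)
df4 = df4.astype(df1.dtypes.to_dict())

print(df3)
```

Both df3 and df4 keep the row order of df1. One difference to keep in mind: .update skips NaN values in df2, while the .loc assignment copies them over as they are.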