Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas concat compare 2 dataframes with same column names how to build a differ column?

I’m using pandas concat to compare 2 dataframes have same columns and rows:

import pandas as pd

df=pd.read_csv(r'C:\Users\compare\T1.csv')
df2=pd.read_csv(r'C:\Users\compare\T2.csv')

index=['contract','RB','payee','fund']

df_all = pd.concat([df.set_index(index), df2.set_index(index)], 
                   axis='columns', keys=['First', 'Second'])

df_final = df_all.swaplevel(axis='columns')[df.columns[54:56]]

df_final

The output is:

                              SD1                SD2
                             First     Second    First   Second
contract    RB  payee   fund                
AG72916Z    2   1   W42      15622.9    15622.9  15622.9    15489.2
            4   1   W44      14697.8    14697.8  14697.8    14572.1
            8   1   W48      7388.56    7388.56  7388.56    7325.37
AL0024AZ    C3  1   202      226.585    226.59   220.366    220.37
            S3  1   204      804.059    804.06   781.99     781.99

My question is how can I add a differ column after each Second ,so that I can easily tell the comparison result,the output should looks like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

                              SD1                         SD2
                             First     Second   differ    First     Second   differ
contract    RB  payee   fund                
AG72916Z    2   1   W42      15622.9    15622.9    0       15622.9  15489.2  133.7
            4   1   W44      14697.8    14697.8    0       14697.8  14572.1  125.7
            8   1   W48      7388.56    7388.56    0       7388.56  7325.37  63.19
AL0024AZ    C3  1   202      226.585    226.59     0.05    220.366  220.37  -0.004
            S3  1   204      804.059    804.06     0.01    781.99   781.99   0

>Solution :

A bit tricky but necessary to keep ordering:

out = df_final.stack(level=0).assign(Diff=lambda x: x['First'] - x['Second']) \
              .stack().unstack(level=[-2, -1])
print(out)

# Output
                              SD1                         SD2                   
                            First    Second   Diff      First    Second     Diff
contract RB payee fund                                                          
AG72916Z 2  1     W42   15622.900  15622.90  0.000  15622.900  15489.20  133.700
         4  1     W44   14697.800  14697.80  0.000  14697.800  14572.10  125.700
         8  1     W48    7388.560   7388.56  0.000   7388.560   7325.37   63.190
AL0024AZ C3 1     202     226.585    226.59 -0.005    220.366    220.37   -0.004
         S3 1     204     804.059    804.06 -0.001    781.990    781.99    0.000
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading