Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to make difference between two pandas dataframes?

I have two pandas dataframes:

      import pandas as pd

      df_1 = pd.DataFrame({'ID': [1, 2, 4, 7, 30], 
                           'Instrument': ['temp', 'temp_sensor', 'temp_sensor',
                                'sensor', 'sensor'],
                           'Value': [1000, 0, 1000, 0, 1000]})

      print(df_1)

                 ID    Instrument       Value
                  1      temp           1000
                  2    temp_sensor         0
                  4    temp_sensor       1000
                  7      sensor            0
                 30      sensor          1000

      df_2 = pd.DataFrame({'ID': [1, 30], 
                           'Instrument': ['temp', 'sensor'],
                           'Value': [1000, 1000]})

      print(df_2)

                    ID    Instrument    Value
                     1      temp        1000
                    30     sensor       1000

I need to exclude from df_1 the lines that also exist in df_2. So I made the code:

      combined = df_1.append(df_2)
      combined[~combined.index.duplicated(keep=False)]

The (wrong) output is:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

                  ID    Instrument      Value
                   4    temp_sensor     1000
                   7    sensor             0
                  30    sensor          1000

I would like the output to be:

                   ID      Instrument       Value
                  2    temp_sensor         0
                  4    temp_sensor       1000
                  7      sensor            0
                

I relied on what was explained in: How to remove a pandas dataframe from another dataframe

>Solution :

Use DataFrame.merge by all columns names with left join and parameter indicator=True and filter rows with left_only values:

s = df_1.merge(df_2, on=list(df_1.columns), how='left', indicator=True)['_merge']
df = df_1.loc[s == 'left_only']
print(df)
   ID   Instrument  Value
1   2  temp_sensor      0
2   4  temp_sensor   1000
3   7       sensor      0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading