Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to do an inverse operation of a merge between two pandas dataframes by id column?

I am working on a project for university in which I received two dataframes:

      import pandas as pd

      df1 = pd.DataFrame({'id': ['0','1','2','3'],
                          'name': ['sensor', 'actuador', 'sensor', 'sensor'],
                          'type':['analog', 'analog', 'digital', 'analog']})

      print(df1)

      id    name        type
       0    sensor      analog
       1    actuador    analog
       2    sensor      digital
       3    sensor      analog

      df2 = pd.DataFrame({'inst': ['0','10','2','143'],
                          'number': [100, 200, 300, 400]})

     print(df2)

     inst   number
      0      100
     10      200
      2      300
    143      400

I would like to check if the ‘id’ of df1 is different from the ‘inst’ of df2.
I would only need to return lines from df1 where the ‘id’ is not contained in the ‘inst’ of df2.

I thought of doing a merge between the dataframes using the ‘id’ and ‘inst’ columns as follows:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

      merged_df = df1.merge(df2, left_on=['id'], right_on=['inst'])

The output is:

     print(merged_df)

     id     name      type    inst    number
     0     sensor   analog     0      100
     2     sensor   digital    2      300

However, the desired output is the opposite of this. I would like it to return the df1 with only the lines where the ‘id’ is not in ‘inst’. Also, I would like it to return only the df1 columns.

The desired output is:

       print(df_result)

     id     name    type
      1   actuador  analog
      3   sensor    analog

>Solution :

If there is only one column use boolean indexing with inverted mask by Series.isin:

df_result = df1[~df1['id'].isin(df2['inst'])]
print (df_result)
  id      name    type
1  1  actuador  analog
3  3    sensor  analog

If possible check per multiple columns is possible use alternative solution with indicator parameter and filtering left_only rows:

df_result = (df1.merge(df2, left_on=['id'], right_on=['inst'], how='left', indicator=True)
                .query("_merge == 'left_only'")
                .reindex(df1.columns, axis=1))

print (df_result)
  id      name    type
1  1  actuador  analog
3  3    sensor  analog
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading