Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Compare the values of 2 dataframe and create a new dataframe based on the values match and mismatch result

I have a dataframe df1.

df1 = pd.DataFrame([["A","X",5,4,1],["A","Y",3,1,3],["B","X",4,7,4],["B","W",3,9,3],["C","Z",7,4,5],["C","Y",1,0,6],["D","P",8,4,7],["D","Q",7,2,2]], columns=['col1', 'col2', 'col3', 'col4','col5'])
  col1 col2  col3  col4  col5
0    A    X     5     4     1
1    A    Y     3     1     3
2    B    X     4     7     4
3    B    W     3     9     3
4    C    Z     7     4     5
5    C    Y     1     0     6
6    D    P     8     4     7
7    D    Q     7     2     2

I have another dataframe df2.

df2 = pd.DataFrame([["B","W",3,7,3],["B","X",4,7,5],["C","Z",8,4,6],["C","Y",1,0,6]], columns=['col1', 'col2', 'col3', 'col4','col5'])

All the rows present in df1 are not present in df2 and their row order is different.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  col1 col2  col3  col4  col5
0    B    W     3     7     3
1    B    X     4     7     5
2    C    Z     8     4     6
3    C    Y     1     0     6

I want to compare the values of particular rows of 2 dataframe. If the value present in both the dataframe is the same, make it True otherwise False.

Expected Output:

Out = pd.DataFrame([["B","W",True,False,True],["B","X",True,True,False],["C","Z",False,True,False],["C","Y",True,True,True]], columns=['col1', 'col2', 'col3', 'col4','col5'])
  col1 col2   col3   col4   col5
0    B    W   True  False   True
1    B    X   True   True  False
2    C    Y   True   True   True
3    C    Z  False   True  False

How to do it?

>Solution :

IIUC, you could do:

# set reference columns
cols = ['col1', 'col2']

# set references as index to align the data and compare
Out = df1.set_index(cols).eq(df2.set_index(cols))

# keep only rows where there is at least one True
# and restore the references as columns
Out = Out[Out.any(axis=1)].reset_index()

output:

  col1 col2   col3   col4   col5
0    B    W   True  False   True
1    B    X   True   True  False
2    C    Y   True   True   True
3    C    Z  False   True  False
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading