Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to compare 2 dataframes and then output an ID to inform if a row has changed?

I have 2 dataframes:

df1 = pd.DataFrame({"id1": ["A", "B", "C", "D"], "id2": ["1", "2", "2", "1"]})

df2 = pd.DataFrame({"id1": ["A", "B", "F", "C", "D", "E"], "id2": ["1", "1", "2", "1", "1", "2"]})

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I want to find the rows which are different, here is the code I tried:

vals = set(df2.iloc[:, :2].apply(tuple, 1)) \ - set(df1.apply(tuple, 1))

The result is the following which gives me the rows which are different between the 2 dataframes:

{('B', '1'), ('E', '2'), ('C', '1'), ('F', '2')}

I would like to get an additional output which tells me which rows of df2 have been modified or not (Y for yes and N for No) :

id1 Modified_ID
0 A N
1 B Y
2 F N
3 C Y
4 D N
5 E N

>Solution :

You can merge and build a boolean mask that returns True if a value didn’t change (or was expanded) and map Y/N values according to it:

df2 = df2.merge(df1, on='id1', how='left', suffixes=('_',''))
df2['Modified_ID'] = (df2['id2_'].eq(df2['id2']) | df2['id2'].isna()).map({True:'N', False:'Y'})
df2 = df2.drop(columns=['id2_','id2'])

Output:

  id1 Modified_ID
0   A           N
1   B           Y
2   F           N
3   C           Y
4   D           N
5   E           N
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading