Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Merge and find the diff in df

Compare if A column is present in C, if not the type Vnot found. if present then compare B and D. if not present type T not found else NA.

import pandas as pd
data1 = {'A': [1, 2, 3],
         'B': ['p', 's', 'r']}

data2 = {'C': ['3', '2', '5'],
         'D': ['p', 'r', 'q']}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Function to compare and generate new column
def compare_columns(row):
    if row['A'] in df2['C'].astype(int).values:
        idx = df2[df2['C'].astype(int) == row['A']].index[0]
        if row['B'] == df2.loc[idx, 'D']:
            return 'NA'
        else:
            return 'T Not Found'
    else:
        return 'V Not Found'

# Apply the function to generate the new column 'E'
df1['E'] = df1.apply(compare_columns, axis=1)
df1.loc[df1['E'] == 'V Not Found', 'E'] = 'V not found'
df1.loc[df1['E'] == 'T Not Found', 'E'] = 'T Not Found'
df1.loc[df1['E'] == 'NA', 'E'] = 'NA'

print(df1)

expected output:

`   A  B            E
0  1  p  V not found
1  2  s  T Not Found
2  3  r  NA

`
i want the expected output as given above but i get

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

`   A  B            E
0  1  p  V not found
1  2  s  T Not Found
2  3  r  T Not Found`

>Solution :

It seems to me that you can use essentially the same code to test if "A" in "C" and if "B" in "D"

import pandas as pd

df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['p', 's', 'r']
})

df2 = pd.DataFrame({
    'C': ['3', '2', '5'],
    'D': ['p', 'r', 'q']
})

def compare_columns(row):
    if row['A'] not in df2['C'].astype(int).values:
        return 'V Not Found'
    
    if row['B'] not in df2['D'].astype(str).values:
        return 'T Not Found'

    return 'NA'

df1['E'] = df1.apply(compare_columns, axis=1)

print(df1)

That should give you:

   A  B            E
0  1  p  V Not Found
1  2  s  T Not Found
2  3  r           NA
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading