Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

List the differences between two dataframe columns ignoring case

I am trying to compare columns from 2 dataframes and return the difference, ignoring case. Here is what I have so far:

import pandas as pd

if __name__ == "__main__":
    data1={'Name':['Karan','Rohit','Sahil','Aryan']}
    data2={'Name':['karan','Rohit','Sahil']}

    df1=pd.DataFrame(data1)
    df2=pd.DataFrame(data2)

    print(list(set(df1['Name']).difference(df2['Name'])))

This code prints ['Karan', 'Aryan']. How do I modify this to ignore case so that karan and Karan are recognized as a match and only Aryan is returned?

I don’t want to use the following because it returns aryan and I want to maintain the capitalization of the row. In my real case, they are not first names so it isn’t as easy as making the first letter capitalized again after taking the difference.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    print(list(set(df1['Name'].str.lower()).difference(df2['Name'].str.lower())))

>Solution :

To perform a case insensitive comparison, use str.casefold:

print(list(set(df1['Name'].str.casefold()).difference(df2['Name'].str.casefold())))

If you want to keep the original case use boolean indexing with isin:

df1.loc[~df1['Name'].str.casefold().isin(df2['Name'].str.casefold()), 'Name'].unique()

Output:

array(['Aryan'], dtype=object)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading