Compare two Pandas columns row by row

I want to compare two columns row by row. If the values in a row are the same, the result should be True; if not, False, like this:

filtering         lemmatization     check
[hello, world]    [hello, world]    True
[grape, durian]   [apple, grape]    False

The output from my code is False for every row, even though only some of the rows actually differ. Why?

You can get my data on GitHub.

import pandas as pd

dc = pd.read_excel('./data clean (spaCy).xlsx')
dc['check'] = dc['filtering'].equals(dc['lemmatization'])

>Solution :

The two columns are stored differently: in one column the quotes ('') around the strings are missing. One possible solution is to convert both columns to lists and then compare them with Series.eq (which works like ==):

import ast

import pandas as pd

dc = pd.read_excel('data clean (spaCy).xlsx')

# strip the surrounding [] and split on ', ' (the items are not quoted here)
dc['filtering'] = dc['filtering'].str.strip('[]').str.split(', ')
# the items are quoted strings here, so ast.literal_eval can parse the list
dc['lemmatization'] = dc['lemmatization'].apply(ast.literal_eval)

# compare row by row
dc['check'] = dc['filtering'].eq(dc['lemmatization'])
print(dc.head())
   label                                          filtering  \
0      2                                         [ppkm, ya]   
1      2  [mohon, informasi, pgs, pasar, turi, ppkm, buk...   
2      2                                      [rumah, ppkm]   
3      1  [pangkal, penanganan, pandemi, indonesia, terk...   
4      1                              [ppkm, mikro, anjing]   

                                       lemmatization  check  
0                                         [ppkm, ya]   True  
1  [mohon, informasi, pgs, pasar, turi, ppkm, buk...   True  
2                                      [rumah, ppkm]   True  
3  [pangkal, tangan, pandemi, indonesia, kesan, s...  False  
4                              [ppkm, mikro, anjing]   True  
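
The Excel file is not included here, so the following is a minimal sketch of the same conversion on made-up in-memory data. The two sample rows and the name `df` are assumptions for illustration only; they mimic the quoting difference described above (unquoted items in `filtering`, quoted items in `lemmatization`).

```python
import ast

import pandas as pd

# Hypothetical sample data: both columns hold list-like *strings*,
# but only 'lemmatization' has quotes around the items.
df = pd.DataFrame({
    "filtering": ["[hello, world]", "[grape, durian]"],
    "lemmatization": ["['hello', 'world']", "['apple', 'grape']"],
})

# Unquoted items: strip the brackets and split on ', '.
filtering = df["filtering"].str.strip("[]").str.split(", ")
# Quoted items: the string is a valid Python literal, so parse it.
lemmatization = df["lemmatization"].apply(ast.literal_eval)

# Element-wise comparison of the two columns of lists.
df["check"] = filtering.eq(lemmatization)
print(df["check"].tolist())
```

Once both columns hold real Python lists, the comparison no longer depends on how the strings were quoted in the spreadsheet.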

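The reason every row was False in the original code is that Series.equals returns a single scalar (True only if the two Series are entirely identical). Here that scalar is False, and assigning it to dc['check'] fills the whole column with it.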
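
To make the difference concrete, here is a small sketch contrasting the two methods. The Series `left` and `right` are made-up examples, not the data from the question.

```python
import pandas as pd

# Hypothetical data: the first row matches, the second does not.
left = pd.Series([["hello", "world"], ["grape", "durian"]])
right = pd.Series([["hello", "world"], ["apple", "grape"]])

# Series.equals answers "are the two Series identical as a whole?"
whole = left.equals(right)

# Series.eq compares element by element and returns a boolean Series.
per_row = left.eq(right)

# Assigning the scalar to a column repeats it on every row.
df = pd.DataFrame({"left": left, "right": right})
df["check_equals"] = whole
df["check_eq"] = per_row

print(whole)
print(df[["check_equals", "check_eq"]])
```

The `check_equals` column is False on both rows, while `check_eq` is True for the first row and False for the second.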
