Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

how to ignore null values in DataFrame when comparing columns

I am new to Pandas and learning. I am reading excel to DataFrame and comparing columns and highlight the column that’s not same. For example if Column A is not same as Column B then highlight the Column B. However I have some null values in Column A and Column B. When I execute the code, I don’t want to highlight the null values in Column B. How can I do that?
Below is my code:

file = Path(path to excel)
df = pd.read_excel(file)
def color(x):
   c1 = 'background-color: red'
   m1 = x['AMOUNT A'] != x['AMOUNT B']
   m2 = x['AMOUNT C'] != x['AMOUNT D']
   df = pd.DataFrame('',index=x.index, columns=x.columns)
   df['AMOUNT B'] = np.select([m1],[c1], default='')
   df['AMOUNT D'] = np.select([m2],[c1], default='')
   return df
writer = pd.ExcelWriter(path to excel)
df.style.apply(color,axis=None).to_excel(writer, 'data', index=False)

df before color function:

    AMOUNT A  AMOUNT B  AMOUNT C  AMOUNT D
0     100.0     200.0       NaN       NaN
1     200.0     200.0      45.0      25.0
2     100.0     500.0       NaN       NaN
3       NaN       NaN       NaN       NaN
4       NaN       NaN       NaN       NaN
5     200.0       1.0       NaN       NaN

Output after running script:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

enter image description here

Expected output:

enter image description here

>Solution :

The issue here is NaN == NaN will return False

def color(x):
   c1 = 'background-color: red'
   m1 = x['AMOUNT A'].fillna('') != x['AMOUNT B'].fillna('')
   m2 = x['AMOUNT C'].fillna('') != x['AMOUNT D'].fillna('')
   df = pd.DataFrame('',index=x.index, columns=x.columns)
   df['AMOUNT B'] = np.select([m1],[c1], default='')
   df['AMOUNT D'] = np.select([m2],[c1], default='')
   return df

More Info

np.nan == np.nan
Out[527]: False
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading