Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas handling NaN values as 'nan' & 'NaN'

This is going to sound like a really stupid issue, but I’m importing file from excel, and noticed that pandas is handling values that are supposed to be NaN differently in VS Code. Here’s a visual below…

City State
‘Detroit’ ‘MI’
‘Chicago’ NaN
nan ‘CA’

This is obviously a small snapshot, but notice that row 3 in the City column is listed as nan and row 2 in the State column is NaN.

The issue is that when I use df.isna() or df.isnull() it only picks up on the NaN value in the State column, but not the city column. Unfortunately I can’t use keep_na_values = True in this situation because I need those values to be labeled as NaN so that a .isna() function can identify them later on. What’s more odd is that the cells in the excel file are empty, and on top of that I don’t seem to have this issue in Jupyter Notebook.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I’ve tried using

df.City.replace(to_replace = {'nan': 'NaN'},inplace = True) 

but no luck. I also came across this post below which suggests using pd.read_excel(na_values = ['nan']) however that doesn’t allow the .isna() or .isnull() to pick up on the values.

https://stackoverflow.com/questions/50685107/pandas-dataframe-nan-values-not-replacing

Does anyone have any ideas on how I can get pandas to read all NaN values in this file in one manner rather than multiple? If it’s not able to be done then is what it is.

>Solution :

Are you sure that those nan you are referencing are not simply a string of nan? If they are could do a replace and manually set them to np.nan using numpy

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Column_1' : ['A', 'nan', 'C'],
    'Column_2' : ['1', '2', np.nan]
})

df = df.replace('nan', np.nan)
df.isna()
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading