Pandas handling NaN values as 'nan' & 'NaN'

March 2, 2023

This is going to sound like a really stupid issue, but I’m importing file from excel, and noticed that pandas is handling values that are supposed to be NaN differently in VS Code. Here’s a visual below…

City	State
‘Detroit’	‘MI’
‘Chicago’	NaN
nan	‘CA’

This is obviously a small snapshot, but notice that row 3 in the City column is listed as nan and row 2 in the State column is NaN.

The issue is that when I use df.isna() or df.isnull() it only picks up on the NaN value in the State column, but not the city column. Unfortunately I can’t use keep_na_values = True in this situation because I need those values to be labeled as NaN so that a .isna() function can identify them later on. What’s more odd is that the cells in the excel file are empty, and on top of that I don’t seem to have this issue in Jupyter Notebook.

I’ve tried using

df.City.replace(to_replace = {'nan': 'NaN'},inplace = True)

but no luck. I also came across this post below which suggests using pd.read_excel(na_values = ['nan']) however that doesn’t allow the .isna() or .isnull() to pick up on the values.

https://stackoverflow.com/questions/50685107/pandas-dataframe-nan-values-not-replacing

Does anyone have any ideas on how I can get pandas to read all NaN values in this file in one manner rather than multiple? If it’s not able to be done then is what it is.

>Solution :

Are you sure that those nan you are referencing are not simply a string of nan? If they are could do a replace and manually set them to np.nan using numpy

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Column_1' : ['A', 'nan', 'C'],
    'Column_2' : ['1', '2', np.nan]
})

df = df.replace('nan', np.nan)
df.isna()