This is going to sound like a really stupid issue, but I’m importing file from excel, and noticed that pandas is handling values that are supposed to be NaN differently in VS Code. Here’s a visual below…
| City | State |
|---|---|
| ‘Detroit’ | ‘MI’ |
| ‘Chicago’ | NaN |
| nan | ‘CA’ |
This is obviously a small snapshot, but notice that row 3 in the City column is listed as nan and row 2 in the State column is NaN.
The issue is that when I use df.isna() or df.isnull() it only picks up on the NaN value in the State column, but not the city column. Unfortunately I can’t use keep_na_values = True in this situation because I need those values to be labeled as NaN so that a .isna() function can identify them later on. What’s more odd is that the cells in the excel file are empty, and on top of that I don’t seem to have this issue in Jupyter Notebook.
I’ve tried using
df.City.replace(to_replace = {'nan': 'NaN'},inplace = True)
but no luck. I also came across this post below which suggests using pd.read_excel(na_values = ['nan']) however that doesn’t allow the .isna() or .isnull() to pick up on the values.
https://stackoverflow.com/questions/50685107/pandas-dataframe-nan-values-not-replacing
Does anyone have any ideas on how I can get pandas to read all NaN values in this file in one manner rather than multiple? If it’s not able to be done then is what it is.
>Solution :
Are you sure that those nan you are referencing are not simply a string of nan? If they are could do a replace and manually set them to np.nan using numpy
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Column_1' : ['A', 'nan', 'C'],
'Column_2' : ['1', '2', np.nan]
})
df = df.replace('nan', np.nan)
df.isna()