
How to handle "NAN" (in capital letters) with pandas?

Do you know if there is a way to handle "NAN" values, written entirely in capital letters, in a data file with pandas?

I have some data files in this format:

"2020-08-14 14:00:00",10,154.9554,153.6879,154.3988,158.5282,"NAN","NAN",158.43,"NAN",155.2103

The .isnull() and .isna() functions don't detect "NAN" when it is capitalized, but they do detect it when it is written as "NaN" or "nan".


Thank you in advance. I looked through other topics but found nothing on this specific subject.

>Solution :

isnull and isna do NOT return True for strings, no matter the case. They only flag actual missing values such as NaN or None.

Most likely you have a mix of real NaN values and strings:

import pandas as pd

s = pd.Series([float('nan'), 'NAN', 'nan', 'NaN'])
df = pd.concat({'s': s, 'isnull': s.isnull(), 'isna': s.isna()}, axis=1)
print(df)

output:

     s  isnull   isna
0  NaN    True   True
1  NAN   False  False
2  nan   False  False
3  NaN   False  False
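If the data is already loaded, one alternative (not part of the original answer) is to convert the strings afterwards with Series.replace. This is a minimal sketch assuming the same mixed Series as above. The first call matches only the exact string "NAN". The second uses a case-insensitive regular expression, which is an illustrative choice that also catches "nan" and "NaN" strings.

```python
import numpy as np
import pandas as pd

# Same mixed Series as above: one real NaN plus three strings
s = pd.Series([float('nan'), 'NAN', 'nan', 'NaN'])

# Exact match: only the literal string "NAN" becomes a real missing value
cleaned = s.replace('NAN', np.nan)
print(cleaned.isna().tolist())

# Case-insensitive regex: every string spelled n-a-n (any case) becomes missing
any_case = s.replace(r'(?i)^nan$', np.nan, regex=True)
print(any_case.isna().tolist())
```

Fixing the values at read time, as described next, is usually simpler because the columns then get a numeric dtype directly.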

Now, by default, read_csv recognizes the following strings as NaN:

'', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN',
'-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A',
'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'
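Because "NAN" is not in that list, read_csv keeps it as text. Here is a small sketch that feeds the sample line from the question through io.StringIO. It assumes the file has no header row, which is why it uses header=None.

```python
import io
import pandas as pd

# The sample line from the question, read without any na_values option
raw = '"2020-08-14 14:00:00",10,154.9554,153.6879,154.3988,158.5282,"NAN","NAN",158.43,"NAN",155.2103\n'
df = pd.read_csv(io.StringIO(raw), header=None)

# Columns holding "NAN" stay as text columns, so isna() finds no missing values
print(df.dtypes)
print(df.isna().sum().sum())
```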

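You can add "NAN" with the na_values option. By default (keep_default_na=True), the values you pass are added to the list above rather than replacing it: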

df = pd.read_csv(..., na_values=['NAN'])
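This is a self-contained version of that call using the sample line from the question. As before, it assumes there is no header row and uses io.StringIO in place of the real file path:

```python
import io
import pandas as pd

raw = '"2020-08-14 14:00:00",10,154.9554,153.6879,154.3988,158.5282,"NAN","NAN",158.43,"NAN",155.2103\n'

# "NAN" is added to the default NA strings, so it now parses as a real missing value
df = pd.read_csv(io.StringIO(raw), header=None, na_values=['NAN'])

print(df.isna().sum().sum())  # 3 missing values (columns 6, 7 and 9)
print(df.dtypes[6])           # float64
```

Since the "NAN" columns now contain real NaN values, isna() and isnull() detect them and the columns get a numeric dtype.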