Pandas dataframe: convert column to number with default value

I’m surprised I couldn’t find anything about this:

Q: How can I convert the values of a column to numbers, falling back to a default value when the conversion fails?

I need all values to be numbers so that I can perform arithmetic on them. The raw input data contains non-convertible values like "", "**", "not available", ….

E.g.

1
2
*
4

Should become:

1
2
0
4

What I’ve tried:

  • pd.to_numeric: does not allow setting a default value. errors='coerce' sets the failing values to NaN, which I cannot use in the arithmetic that follows.
  • df[COLUMN].apply(makeNumber) with def makeNumber(value): return 0.0 apparently does nothing to the string values.
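
The makeNumber helper from the question is not shown in full, so the following is only a sketch of how an apply-based conversion could look. The name make_number, its default parameter, and the sample data are assumptions made for illustration. Note that apply returns a new Series, so the result has to be assigned back to the column.

```python
import pandas as pd

def make_number(value, default=0.0):
    """Return value as a float, or default if it cannot be converted."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # Raised for strings such as "*", "" or "not available", and for None
        return default

# Hypothetical sample data modelled on the values named in the question
df = pd.DataFrame({"col": ["1", "2", "*", "4", "", "not available"]})

# apply does not modify the column in place; assign the result back
df["col"] = df["col"].apply(make_number)
```

This runs a Python function once per row, so on large columns it is usually slower than a vectorized conversion.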

Solution:

If the original column contains no missing values, use:

df['col'] = pd.to_numeric(df['col'], errors='coerce').fillna(0)
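
As a self-contained sketch of the line above, the following builds a small frame with the example values from the question and checks that arithmetic works afterwards. The sample data and the variable name total are assumptions for illustration.

```python
import pandas as pd

# Example values from the question: "*" cannot be converted
df = pd.DataFrame({"col": ["1", "2", "*", "4"]})

# Unconvertible values become NaN, then NaN is replaced by the default 0
df["col"] = pd.to_numeric(df["col"], errors="coerce").fillna(0)

# The column is numeric now, so arithmetic works
total = df["col"].sum()
```

Because NaN is a float, the column ends up with a float dtype even though every remaining value is a whole number.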

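If the result should be downcast to integers, as suggested in a comment (note that the downcast argument of fillna is deprecated in recent pandas releases):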

df['col'] = pd.to_numeric(df['col'], errors='coerce').fillna(0, downcast="infer")
print(df)
   col
0    1
1    2
2    0
3    4
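
An explicit cast is one way to get an integer column without relying on downcast. This is a sketch under the assumption that every convertible value in the column is a whole number; the sample data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col": ["1", "2", "*", "4"]})

# Coerce to numbers, fill failures with 0, then cast the float result to int
df["col"] = pd.to_numeric(df["col"], errors="coerce").fillna(0).astype(int)
```

Unlike downcast="infer", astype(int) always casts, so a value such as 2.5 would be truncated to 2 rather than left as a float.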

If the original column may contain missing values that should stay missing rather than be replaced:

print(df)
   col
0    1
1    2
2    *
3    4
4  NaN

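# Convert what can be converted; everything else becomes NaN
s = pd.to_numeric(df['col'], errors='coerce')

# Replace only values that failed conversion (NaN in s but not missing in the original)
df['col'] = s.mask(s.isna() & df['col'].notna(), 0)
print(df)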
   col
0  1.0
1  2.0
2  0.0
3  4.0
4  NaN
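
The same idea can be wrapped in a small reusable helper. The function name to_numeric_with_default and its default parameter are hypothetical, not part of pandas; this is a sketch of the masking approach above, not a definitive implementation.

```python
import numpy as np
import pandas as pd

def to_numeric_with_default(column, default=0):
    """Convert a Series to numbers, using default only where conversion fails.

    Values that were already missing in the input stay NaN.
    """
    converted = pd.to_numeric(column, errors="coerce")
    # True where conversion produced NaN but the original value was present
    failed = converted.isna() & column.notna()
    return converted.mask(failed, default)

df = pd.DataFrame({"col": ["1", "2", "*", "4", np.nan]})
df["col"] = to_numeric_with_default(df["col"])
```

Keeping genuinely missing entries as NaN lets later steps distinguish "no data" from "bad data replaced by the default".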