Filling NaNs with the string '0' and then converting to int reverts to float


What I want is for the NaNs to ultimately become integer values. My dataset has thousands of columns, so I can't just convert a couple of columns to integer by hand. When I tried df = df.astype('int') in Dask after changing the values to float zeros, it didn't work for some reason.

In pandas, the values below have all reverted to floats; in Dask, the zero values reverted to floats in only some of the columns. I figure that if I can solve this in pandas, the same fix will likely work in Dask (fingers crossed).

import pandas as pd
import numpy as np

data = [['tom', 10, 15000], ['nick', 15, 12000], ['juli', 5, 20000]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age', 'salary'])

# Introduce NaNs; this turns the affected integer columns into float64
df = df.replace(5, np.nan)
df = df.replace(12000, np.nan)

# Attempted fix: NaN -> string '0' -> integer 0 (the columns still end up as floats)
expanded = df.replace(np.nan, '0')
expanded = expanded.replace('0', 0)

>Solution :


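from dask.dataframe import from_pandas

# Wrap the pandas DataFrame from the question in a Dask DataFrame
ddf = from_pandas(df, npartitions=2)

# Keep only the numeric columns, fill NaN with 0, then cast to int64
out = ddf.select_dtypes('number').fillna(0).astype('int64')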


>>> out.compute()
   Age  salary
0   10   15000
1   15       0
2    0   20000
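
For the pandas side of the question, the same idea carries over. A column holding NaN is stored as float64 because NaN is itself a float, and `fillna(0)` keeps that dtype, so the explicit cast is what turns the zeros into integers. Below is a minimal sketch, assuming a small frame shaped like the one in the question; the names `num_cols`, `filled` and `nullable` are only illustrative. The second variant uses pandas' nullable `Int64` dtype, in case keeping the missing entries as `<NA>` is preferable to replacing them with 0.

```python
import numpy as np
import pandas as pd

# Small frame shaped like the question's data, with the NaNs already in place
df = pd.DataFrame({
    "Name": ["tom", "nick", "juli"],
    "Age": [10, 15, np.nan],
    "salary": [15000, np.nan, 20000],
})

# Pick the numeric columns by dtype instead of listing them by hand
num_cols = df.select_dtypes("number").columns

# Option 1: fill NaN with 0 in the numeric columns, then cast them to int64
filled = (
    df.fillna({c: 0 for c in num_cols})
      .astype({c: "int64" for c in num_cols})
)

# Option 2: keep the missing values, but store the columns as nullable integers
nullable = df.astype({c: "Int64" for c in num_cols})

print(filled.dtypes)
print(nullable.dtypes)
```

Selecting by dtype keeps the non-numeric `Name` column out of the cast, which is likely why a blanket `df.astype('int')` fails on a frame that also contains strings.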
