What I want is for the NaNs to end up as integer values. My dataset has thousands of columns, so I can't convert them to integers a few at a time, and when I tried `df = df.astype('int')` in Dask after replacing the NaNs with float zeros, it didn't work for some reason.
In Pandas, the values below have all reverted to floats; in Dask, only some of the columns' zero values reverted to floats. I figure that if I can solve this in Pandas, the same fix will likely work in Dask (fingers crossed).
    import pandas as pd
    import numpy as np

    data = [['tom', 10, 15000], ['nick', 15, 12000], ['juli', 5, 20000]]

    # Create the pandas DataFrame
    df = pd.DataFrame(data, columns=['Name', 'Age', 'salary'])

    df = df.replace(5, np.nan)
    df = df.replace(12000, np.nan)
    expanded = df.replace(np.nan, '0')
    expanded = expanded.replace('0', 0)
    expanded
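For what it's worth, here is a minimal sketch of where the floats come from, assuming a recent pandas version. The variable names `with_nan`, `filled`, and `as_int` are made up for illustration. An integer column that receives a NaN is upcast to `float64`, and filling the NaN with 0 afterwards does not change the dtype back, so an explicit cast is still needed.

```python
import numpy as np
import pandas as pd

data = [['tom', 10, 15000], ['nick', 15, 12000], ['juli', 5, 20000]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'salary'])

# Introducing NaN turns the integer columns into float64 columns.
with_nan = df.replace(5, np.nan).replace(12000, np.nan)
print(with_nan.dtypes)

# Filling the NaNs with 0 keeps the float64 dtype (the zeros are 0.0).
filled = with_nan.fillna(0)
print(filled.dtypes)

# Only an explicit cast on the numeric columns gives integers again.
as_int = filled[['Age', 'salary']].astype('int64')
print(as_int.dtypes)
```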
    from dask.dataframe import from_pandas

    ddf = from_pandas(df, npartitions=2)
    out = ddf.select_dtypes('number').fillna(0).astype('int64')

    >>> out.compute()
       Age  salary
    0   10   15000
    1   15       0
    2    0   20000
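Note that the result above only contains the numeric columns, so `Name` is dropped. If the non-numeric columns need to stay, one possible variation is to build a per-column mapping and pass it to `fillna` and `astype`. This is a pandas-only sketch; `num_cols` and `result` are illustrative names, and I have not checked that the same calls behave identically on a Dask DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['tom', 'nick', 'juli'],
    'Age': [10, 15, np.nan],
    'salary': [15000, np.nan, 20000],
})

# Find the numeric columns once, so this scales to thousands of columns.
num_cols = df.select_dtypes('number').columns

# Fill NaNs only in the numeric columns, then cast only those columns.
filled = df.fillna({col: 0 for col in num_cols})
result = filled.astype({col: 'int64' for col in num_cols})

print(result)
print(result.dtypes)
```

The `Name` column is left untouched because it never appears in either mapping.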