Why am I getting a 'float' has no attribute 'fillna' error when using fillna inside a function in Pandas?

Why don’t fillna and other functions work inside a function?

I have a DataFrame with 10 columns. I would like to write a function that takes each column and creates several derived columns from it, so that my final DataFrame has 50 columns.

def newVars(df,col='my_var'):
    df[col+'_filled'] = df[col].fillna(0)
    df[col+'_rank'] = df[col].fillna(0).rank()
    df[col+'_percentile'] = df[col].fillna(0).rank(pct=True)
    df[col+'_halved'] = df[col]/2
    return df

new_df = df.apply(newVars, axis=1)

I get the error: ‘float’ has no attribute ‘fillna’

I am expecting a DataFrame with 5 times the columns of my initial DataFrame. If I take the line outside of the function it works fine:

df['my_var_filled'] = df['my_var'].fillna(0)

Solution:

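apply doesn’t really make sense in your context. With axis=1 it calls the function once per row, so inside newVars the df argument is a single row (a Series) and df[col] is a scalar float, which has no fillna method.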
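To make the cause concrete, here is a minimal sketch of what apply hands to the function when axis=1. The sample data (a single float column containing a NaN) is an assumption for illustration, not the asker's actual DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one float column with a missing value.
df = pd.DataFrame({'my_var': [1.0, np.nan, 20.0]})

# With axis=1, apply calls the function once per row; each row arrives as a Series.
row_types = df.apply(lambda row: type(row).__name__, axis=1)
print(row_types.tolist())

# Indexing a single row by column label yields one scalar, not a column.
first_row = df.iloc[0]
value = first_row['my_var']
print(isinstance(value, float))   # a float scalar
print(hasattr(value, 'fillna'))   # scalars have no fillna method
```

Because the value is a plain scalar, the first line of newVars fails before any new column is created.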

Instead, pass the whole DataFrame to the function directly:

import pandas as pd

df = pd.DataFrame({'my_var': [1,3,20]})

def newVars(df,col='my_var'):
    df[col+'_filled'] = df[col].fillna(0)
    df[col+'_rank'] = df[col].fillna(0).rank()
    df[col+'_percentile'] = df[col].fillna(0).rank(pct=True)
    df[col+'_halved'] = df[col]/2
    return df

new_df = newVars(df)

Or use pipe:

import pandas as pd

df = pd.DataFrame({'my_var': [1,3,20]})

def newVars(df,col='my_var'):
    df[col+'_filled'] = df[col].fillna(0)
    df[col+'_rank'] = df[col].fillna(0).rank()
    df[col+'_percentile'] = df[col].fillna(0).rank(pct=True)
    df[col+'_halved'] = df[col]/2
    return df

new_df = df.pipe(newVars)

Output:

   my_var  my_var_filled  my_var_rank  my_var_percentile  my_var_halved
0       1              1          1.0           0.333333            0.5
1       3              3          2.0           0.666667            1.5
2      20             20          3.0           1.000000           10.0

Note that in both cases your function mutates df in place and also returns it, so new_df and df end up being the same object. I would recommend doing one or the other, not both.
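As one possible way to follow that advice, the sketch below returns a new DataFrame via assign instead of mutating the input, and loops over every column to get the 5x column count the question asks for. The function name new_vars and the two-column sample data are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

def new_vars(df, col):
    """Return a new DataFrame with derived columns for `col`; the input is not modified."""
    filled = df[col].fillna(0)
    return df.assign(**{
        f'{col}_filled': filled,
        f'{col}_rank': filled.rank(),
        f'{col}_percentile': filled.rank(pct=True),
        f'{col}_halved': df[col] / 2,
    })

# Hypothetical sample data with two columns and some missing values.
df = pd.DataFrame({'a': [1.0, np.nan, 20.0], 'b': [4.0, 5.0, np.nan]})

# Iterate over the original columns only; df itself never changes.
new_df = df
for col in df.columns:
    new_df = new_df.pipe(new_vars, col=col)

print(new_df.shape)  # (3, 10): 2 original columns plus 4 derived columns each
```

Since assign returns a copy, the original df keeps its two columns and can be reused safely after the loop.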
