A line in a self-defined function is not working in pandas

I defined a function to preprocess a DataFrame (adding columns and converting all column headers to lower case) before doing the analysis. Every line works fine except the one that rearranges the column order.

The function looks like this:

import numpy as np
import pandas as pd

def cleanDf(df):
    df.columns = df.columns.str.replace(' ','_')
    df.columns = df.columns.str.lower()
    df['date1'] = pd.to_datetime(df['date'].astype(str) + ' ' + df['time'].astype(str))
    df['weekday'] = df['date1'].dt.day_name()

    business_hour_mask = (df['date1'].dt.hour >= 9) & (df['date1'].dt.hour <= 18)
    df['business_hour'] = np.where(business_hour_mask, "Yes", "No")
    df['week_number'] = df['date1'].dt.isocalendar().week  # .dt.week was removed in pandas 2.0

    df = df.reindex(['date1','week_number','weekday','business_hour','changed_by','customer','field_name','new_value','old_value','new_value.1','old_value.1','date','time','company_code','sales_organization','distribution_channel','division'], axis=1)
    # problem line: I've tried both with and without "df = " in front of this line

    return df

My current workaround is to run that line after calling the function, and then it works:

cleanDf(df)

df = df.reindex(['date1','week_number','weekday','business_hour','changed_by','customer','field_name','new_value','old_value','new_value.1','old_value.1','date','time','company_code','sales_organization','distribution_channel','division'], axis=1)

df.head()

I would appreciate it if you could explain why the line does not work inside the function but works when executed separately.

Thank you very much.

Solution:

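It’s because df = df.reindex(...) inside the function only rebinds the local name df to the new DataFrame that reindex returns; the DataFrame you passed in is left untouched. Without the df = in front, the result is simply discarded, because reindex returns a new object rather than modifying the original. The earlier lines (renaming columns, adding columns) modify the passed-in object in place, which is why they appear to work. Since you’re already returning df, the fix is simple. Just write df = cleanDf(df) instead of just cleanDf(df):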

df = cleanDf(df)
df.head()
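
The difference between modifying the passed-in DataFrame and rebinding the parameter can be seen in a small sketch. The function names and the toy columns a, b, c below are made up for illustration and are not part of the original question:

```python
import pandas as pd

# Hypothetical helper: mutates the caller's frame, then rebinds the local name.
def reorder_without_return(frame):
    frame["c"] = frame["a"] + frame["b"]            # in-place: the caller sees column "c"
    frame = frame.reindex(["c", "a", "b"], axis=1)  # rebinds the local name only

# Hypothetical helper: returns the reindexed frame so the caller can keep it.
def reorder_with_return(frame):
    frame = frame.reindex(["c", "a", "b"], axis=1)
    return frame

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

reorder_without_return(df)
cols_after_call = list(df.columns)    # new column is there, order is unchanged

df = reorder_with_return(df)
cols_after_assign = list(df.columns)  # order changes once the result is assigned

print(cols_after_call)
print(cols_after_assign)
```

The first call adds the column but leaves the order alone; only assigning the returned value back to df picks up the new order.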

Per @mozway’s comment, you should also start your cleanDf function with a copy, so that it no longer modifies the caller's DataFrame in place:

def cleanDf(df):
    df = df.copy()
    # ... do your stuff ...
    return df
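
Put together, a version of the function along these lines might look like the sketch below. The name clean_df, the sample rows, and the shortened column list are assumptions chosen to keep the example small; the real data has more columns.

```python
import numpy as np
import pandas as pd

def clean_df(df):
    # Work on a copy so the caller's DataFrame is never modified.
    df = df.copy()
    df.columns = df.columns.str.replace(" ", "_").str.lower()
    df["date1"] = pd.to_datetime(df["date"].astype(str) + " " + df["time"].astype(str))
    df["weekday"] = df["date1"].dt.day_name()
    in_hours = df["date1"].dt.hour.between(9, 18)  # inclusive on both ends
    df["business_hour"] = np.where(in_hours, "Yes", "No")
    df["week_number"] = df["date1"].dt.isocalendar().week
    # Shortened, hypothetical column order for this example.
    column_order = ["date1", "week_number", "weekday", "business_hour",
                    "changed_by", "date", "time"]
    return df.reindex(columns=column_order)

# Made-up sample rows, not taken from the question.
raw = pd.DataFrame({
    "Date": ["2023-01-02", "2023-01-07"],
    "Time": ["10:30:00", "20:15:00"],
    "Changed By": ["alice", "bob"],
})

cleaned = clean_df(raw)
print(cleaned)
print(list(raw.columns))  # the original frame keeps its own headers
```

Because of the copy, the function has no side effects on raw, so forgetting to assign the result fails in an obvious way (nothing changes) instead of half-applying the cleanup.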