How to create a calculated column function based on conditions on another column

I am trying to create a new calculated date column by subtracting two date columns, but only if a condition on a third column (column_name) is met (its value cannot be NaN). This is the code I have so far, but it is not working (see the error below). Thank you in advance; I am a beginner.

def function(column_name):
    if df[df.column_name.notna()] == True:
        return (df['date1']-df['date2']).astype('timedelta64[s]')
    else:
        return 'Null'

df['New_Calculated_Column'] = df['columnname'].apply(function)

Received this error:

Unexpected exception formatting exception. Falling back to standard exception

>Solution :

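You can do this without apply by working on whole columns at once. Here I calculate the time difference as in your code, then multiply it by a boolean Series that indicates whether col is NaN or not. Rows where col is NaN come out as zero (displayed as -0.0 in the output) rather than 'Null':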

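import numpy as np
import pandas as pd

# dummy init data: 'col' is randomly NaN or 2, so your rows will differ
choices = [np.nan, 2]
df = pd.DataFrame({'date1': pd.date_range('2022-01-01', '2022-01-10'),
                   'date2': pd.date_range('2023-01-01', '2023-01-10'),
                   'col': np.random.choice(choices, 10)})

# difference in seconds, times a boolean mask (False counts as 0) so NaN rows become zero
df['new_col'] = (df['date1'] - df['date2']).dt.total_seconds() * df['col'].notna()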

Output:

       date1      date2  col     new_col
0 2022-01-01 2023-01-01  2.0 -31536000.0
1 2022-01-02 2023-01-02  2.0 -31536000.0
2 2022-01-03 2023-01-03  2.0 -31536000.0
3 2022-01-04 2023-01-04  2.0 -31536000.0
4 2022-01-05 2023-01-05  2.0 -31536000.0
5 2022-01-06 2023-01-06  NaN        -0.0
6 2022-01-07 2023-01-07  NaN        -0.0
7 2022-01-08 2023-01-08  2.0 -31536000.0
8 2022-01-09 2023-01-09  NaN        -0.0
9 2022-01-10 2023-01-10  NaN        -0.0
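
If you would rather have a missing value than zero in the rows where col is NaN (closer to the 'Null' in your function), one possible variant is to mask the result with Series.where instead of multiplying. This is only a sketch: the four-row sample frame below is made up for illustration, and the column names simply follow the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the example above.
df = pd.DataFrame({
    'date1': pd.date_range('2022-01-01', periods=4),
    'date2': pd.date_range('2023-01-01', periods=4),
    'col': [2.0, np.nan, 2.0, np.nan],
})

# Difference in seconds for every row, computed on whole columns (no apply).
seconds = (df['date1'] - df['date2']).dt.total_seconds()

# Keep the value where 'col' is present; all other rows become NaN.
df['new_col'] = seconds.where(df['col'].notna())
print(df)
```

With where, the rows that fail the condition hold NaN, so they are skipped by default in later aggregations such as mean() instead of being counted as zero.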
