I am trying to create new a calculated date column by subtracting two date columns, but only if a condition in a 3rd column (column_name) is met (it cannot be NaN). This is the code I have so far but it is not working (see error below). Thank you in advance as I am a beginner.
def function(column_name):
if df[df.column_name.notna()] == True:
return (df['date1']-df['date2']).astype('timedelta64[s]')
else:
return 'Null'
df['New_Calculated_Column'] = df['columnname'].apply(function)
Received this error:
Unexpected exception formatting exception. Falling back to standard exception
>Solution :
You can use something like this, here I am calculating the time difference similar to your calculations but multiplying with a series that indicates if the column col
is NaN
or not:
# dummy init data:
choices = [np.NaN, 2]
df = pd.DataFrame({'date1':pd.date_range('2022-01-01', '2022-01-10'), 'date2':pd.date_range('2023-01-01', '2023-01-10'), 'col': np.random.choice(choices, 10)} )
df
df['new_col'] = (df['date1']-df['date2']).astype('timedelta64[s]') * (~df['col'].isna())
Output:
date1 date2 col new_col
0 2022-01-01 2023-01-01 2.0 -31536000.0
1 2022-01-02 2023-01-02 2.0 -31536000.0
2 2022-01-03 2023-01-03 2.0 -31536000.0
3 2022-01-04 2023-01-04 2.0 -31536000.0
4 2022-01-05 2023-01-05 2.0 -31536000.0
5 2022-01-06 2023-01-06 NaN -0.0
6 2022-01-07 2023-01-07 NaN -0.0
7 2022-01-08 2023-01-08 2.0 -31536000.0
8 2022-01-09 2023-01-09 NaN -0.0
9 2022-01-10 2023-01-10 NaN -0.0