I am checking whether the difference between two values is greater than 0.5 AND whether they occurred on different dates; if both are true, the row gets a flag.
Sample Data:
import pandas as pd

df = pd.DataFrame({'date1' : ['2023-05-11', '2023-02-24', '2023-07-9', '2023-01-19', '2023-02-10'],
                   'date2' : ['2023-05-11', '2023-02-24', '2023-07-8', '2023-01-17', '2023-02-10'],
                   'value1' : [9.11, .12, 49.1, 2.25, 6.22],
                   'value2' : [2.12, .86, 0.03, .17, 4.71]})
df
date1 date2 value1 value2
0 2023-05-11 2023-05-11 9.11 2.12
1 2023-02-24 2023-02-24 0.12 0.86
2 2023-07-09 2023-07-08 49.1 0.03
3 2023-01-19 2023-01-17 2.25 0.17
4 2023-02-10 2023-02-10 6.22 4.71
df['date1'] = pd.to_datetime(df['date1'])
df['date2'] = pd.to_datetime(df['date2'])
When I try it with the apply function:
df.apply(lambda x : 'yes' if (abs(x['value1'] - x['value2']) > .5) & (x['date1'].date != x['date2'].date) else 'no', axis = 1)
0 yes
1 yes
2 yes
3 yes
4 yes
dtype: object
Without the apply function:
(abs(df['value1'] - df['value2']) > .5) & (df['date1'].dt.date != df['date2'].dt.date)
0 False
1 False
2 True
3 True
4 False
dtype: bool
As shown above, the direct approach without apply gives the expected output, whereas the apply version does not. Why is that the case?
Solution:
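To get the same output, add () after date. Inside apply each value is a scalar Timestamp, so you need to call Timestamp.date(), not use Series.dt.date. Without (), x['date1'].date != x['date2'].date compares the two bound-method objects themselves rather than the dates they return; the methods are bound to different Timestamp objects, so they never compare equal and that condition is always True. Also prefer `and` over `&` when combining scalar booleans (`&` happens to work on them too, but `and` is the idiomatic operator):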
# without (), x['date1'].date is the bound method itself, not a date
print(df.apply(lambda x: x['date1'].date, axis=1))
0 <built-in method date of Timestamp object at 0...
1 <built-in method date of Timestamp object at 0...
2 <built-in method date of Timestamp object at 0...
3 <built-in method date of Timestamp object at 0...
4 <built-in method date of Timestamp object at 0...
dtype: object
# with (), the method is called and returns the actual dates
print(df.apply(lambda x: x['date1'].date(), axis=1))
0 2023-05-11
1 2023-02-24
2 2023-07-09
3 2023-01-19
4 2023-02-10
dtype: object
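The difference can also be reproduced on two standalone Timestamps. This is a minimal sketch, not part of the original answer: both Timestamps fall on the same day, yet comparing their `.date` attributes without () reports them as different, while comparing the results of `.date()` does not.

```python
import pandas as pd

# Two separate Timestamp objects on the same calendar day
t1 = pd.Timestamp('2023-05-11')
t2 = pd.Timestamp('2023-05-11')

# Without (): compares two bound method objects tied to different Timestamps
methods_differ = t1.date != t2.date
# With (): compares two datetime.date values
dates_differ = t1.date() != t2.date()

print(callable(t1.date))  # True: .date is a method, not a value
print(methods_differ)     # True, even though the days are equal
print(dates_differ)       # False
```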
# fixed version: call .date() on each Timestamp and combine the scalar booleans with `and`
df.apply(lambda x: (abs(x['value1'] - x['value2']) > .5) and
         (x['date1'].date() != x['date2'].date()), axis=1)
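To check the fix end to end, the sketch below rebuilds the question's sample data, confirms that the value part of the condition is True for every row (which is why the always-True method comparison made every row "yes" in the original), and compares the fixed apply result with the vectorized one. The variable names are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'date1': ['2023-05-11', '2023-02-24', '2023-07-09', '2023-01-19', '2023-02-10'],
                   'date2': ['2023-05-11', '2023-02-24', '2023-07-08', '2023-01-17', '2023-02-10'],
                   'value1': [9.11, .12, 49.1, 2.25, 6.22],
                   'value2': [2.12, .86, 0.03, .17, 4.71]})
df['date1'] = pd.to_datetime(df['date1'])
df['date2'] = pd.to_datetime(df['date2'])

# Every row passes the value check, so the date check alone decides the flag
value_check = (df['value1'] - df['value2']).abs() > .5
print(value_check.all())  # True

# Fixed row-wise version: .date() is called and scalar booleans are combined with `and`
row_wise = df.apply(lambda x: (abs(x['value1'] - x['value2']) > .5)
                    and (x['date1'].date() != x['date2'].date()), axis=1)

# Vectorized version from the question
vectorized = (abs(df['value1'] - df['value2']) > .5) & (df['date1'].dt.date != df['date2'].dt.date)

print(row_wise.tolist())    # [False, False, True, True, False]
print(vectorized.tolist())  # [False, False, True, True, False]
```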
Your second, vectorized approach is the better choice, because it avoids a Python-level loop over the rows:
(abs(df['value1'] - df['value2']) > .5) & (df['date1'].dt.date != df['date2'].dt.date)
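A variation on the vectorized check, offered as an alternative rather than as part of the original answer: `Series.dt.normalize()` sets each timestamp's time to midnight, so two timestamps on the same calendar day compare equal while the columns stay datetime64 instead of being converted to Python `date` objects. This sketch uses the question's timezone-naive sample data.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'date1': ['2023-05-11', '2023-02-24', '2023-07-09', '2023-01-19', '2023-02-10'],
                   'date2': ['2023-05-11', '2023-02-24', '2023-07-08', '2023-01-17', '2023-02-10'],
                   'value1': [9.11, .12, 49.1, 2.25, 6.22],
                   'value2': [2.12, .86, 0.03, .17, 4.71]})
df['date1'] = pd.to_datetime(df['date1'])
df['date2'] = pd.to_datetime(df['date2'])

# Floor both columns to midnight; equal results mean the same calendar day
different_day = df['date1'].dt.normalize() != df['date2'].dt.normalize()
large_gap = (df['value1'] - df['value2']).abs() > .5

flag = large_gap & different_day
print(flag.tolist())  # [False, False, True, True, False]
```

Compared with `.dt.date`, this skips building an object-dtype column of Python dates; for five rows the difference is irrelevant, but it may matter on large frames.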
If you need to return yes/no instead of True/False:
import numpy as np
np.where((abs(df['value1'] - df['value2']) > .5) & (df['date1'].dt.date != df['date2'].dt.date), 'yes', 'no')
# or, with the fixed apply version:
df.apply(lambda x: 'yes' if (abs(x['value1'] - x['value2']) > .5) and (x['date1'].date() != x['date2'].date()) else 'no', axis=1)
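If the labels should live in the DataFrame, the `np.where` result can be assigned straight to a new column. This is a minimal sketch on the question's sample data; the column name `flag` is only an illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'date1': ['2023-05-11', '2023-02-24', '2023-07-09', '2023-01-19', '2023-02-10'],
                   'date2': ['2023-05-11', '2023-02-24', '2023-07-08', '2023-01-17', '2023-02-10'],
                   'value1': [9.11, .12, 49.1, 2.25, 6.22],
                   'value2': [2.12, .86, 0.03, .17, 4.71]})
df['date1'] = pd.to_datetime(df['date1'])
df['date2'] = pd.to_datetime(df['date2'])

# Boolean mask from the vectorized condition
mask = (abs(df['value1'] - df['value2']) > .5) & (df['date1'].dt.date != df['date2'].dt.date)

# np.where returns a NumPy array that lines up with the rows of df by position
df['flag'] = np.where(mask, 'yes', 'no')
print(df[['date1', 'date2', 'flag']])
```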