Lambda function returns different output from direct code

I am checking a condition: if the difference between two values is greater than 0.5 AND they occurred on different dates, then the row is flagged.

Sample Data:

import pandas as pd

df = pd.DataFrame({'date1' : ['2023-05-11', '2023-02-24', '2023-07-9', '2023-01-19', '2023-02-10'],
                  'date2' : ['2023-05-11', '2023-02-24', '2023-07-8', '2023-01-17', '2023-02-10'],
                  'value1' : [9.11, .12, 49.1, 2.25, 6.22],
                  'value2' : [2.12, .86, 0.03, .17, 4.71]})

df
    date1       date2       value1  value2
0   2023-05-11  2023-05-11  9.11    2.12
1   2023-02-24  2023-02-24  0.12    0.86
2   2023-07-09  2023-07-08  49.1    0.03
3   2023-01-19  2023-01-17  2.25    0.17
4   2023-02-10  2023-02-10  6.22    4.71

df['date1'] = pd.to_datetime(df['date1'])
df['date2'] = pd.to_datetime(df['date2'])

When I try with the apply function:

df.apply(lambda x : 'yes' if (abs(x['value1'] - x['value2']) > .5) & (x['date1'].date != x['date2'].date) else 'no', axis = 1)

0    yes
1    yes
2    yes
3    yes
4    yes
dtype: object

Without the apply function:

(abs(df['value1'] - df['value2']) > .5) & (df['date1'].dt.date != df['date2'].dt.date)

0    False
1    False
2     True
3     True
4    False
dtype: bool

As we can see above, the direct approach without the apply function gives the expected output, whereas the apply function does not. Could you please let me know why that is the case?

>Solution :

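To get the same output, add () after date: inside apply each value is a scalar Timestamp, so you are using the method Timestamp.date(), not the Series.dt.date accessor. Without the parentheses you compare two method objects rather than two dates, and those are never equal here, so the date condition is always True. Also use and instead of &, because you are comparing boolean scalars rather than Series.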
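Why every row came out as yes can be reproduced without a DataFrame. The following is a minimal sketch, assuming CPython 3.8 or later. It uses datetime.datetime from the standard library, since pandas.Timestamp is a subclass of it and inherits date from there. The variable names are made up for illustration.

```python
from datetime import datetime

# Two distinct objects holding the same moment,
# like x['date1'] and x['date2'] in row 0 of the question.
a = datetime(2023, 5, 11)
b = datetime(2023, 5, 11)

# Without (): this compares two bound-method objects, not two dates.
# Methods bound to different objects are unequal, so != gives True.
methods_differ = a.date != b.date

# With (): this calls the methods and compares the datetime.date values.
dates_differ = a.date() != b.date()

print(methods_differ, dates_differ)
```

So the date half of the original condition is always True, and since all five value differences in the sample data exceed 0.5, every row is flagged as yes.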

# if you omit (), you get the bound built-in methods, not dates
print (df.apply(lambda x : x['date1'].date, axis = 1))
0    <built-in method date of Timestamp object at 0...
1    <built-in method date of Timestamp object at 0...
2    <built-in method date of Timestamp object at 0...
3    <built-in method date of Timestamp object at 0...
4    <built-in method date of Timestamp object at 0...
dtype: object

# with (), you get the dates
print (df.apply(lambda x : x['date1'].date(), axis = 1))
0    2023-05-11
1    2023-02-24
2    2023-07-09
3    2023-01-19
4    2023-02-10
dtype: object

df.apply(lambda x : (abs(x['value1'] - x['value2']) > .5) and 
                    (x['date1'].date() != x['date2'].date()), axis = 1)

Your second, vectorized approach is better:

(abs(df['value1'] - df['value2']) > .5) & (df['date1'].dt.date != df['date2'].dt.date)

If you need to return yes/no:

import numpy as np

np.where((abs(df['value1'] - df['value2']) > .5) & (df['date1'].dt.date != df['date2'].dt.date), 'yes', 'no')

Or, with the corrected apply:

df.apply(lambda x : 'yes' if (abs(x['value1'] - x['value2']) > .5) and (x['date1'].date() != x['date2'].date()) else 'no', axis = 1)
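For reference, here is a self-contained sketch that puts the pieces together and checks that the vectorized and apply versions agree. It assumes pandas and NumPy are installed. The column name flag and the variable names mask and via_apply are chosen for illustration, and the sample dates are written zero-padded.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date1': pd.to_datetime(['2023-05-11', '2023-02-24', '2023-07-09', '2023-01-19', '2023-02-10']),
    'date2': pd.to_datetime(['2023-05-11', '2023-02-24', '2023-07-08', '2023-01-17', '2023-02-10']),
    'value1': [9.11, .12, 49.1, 2.25, 6.22],
    'value2': [2.12, .86, 0.03, .17, 4.71],
})

# Row-wise version: scalar booleans, so use `and`, and call .date() on each Timestamp.
via_apply = df.apply(
    lambda x: 'yes'
    if (abs(x['value1'] - x['value2']) > .5) and (x['date1'].date() != x['date2'].date())
    else 'no',
    axis=1,
)

# Vectorized version: boolean Series, so use `&`, and the .dt.date accessor (no parentheses).
mask = (abs(df['value1'] - df['value2']) > .5) & (df['date1'].dt.date != df['date2'].dt.date)
df['flag'] = np.where(mask, 'yes', 'no')

print(df['flag'].tolist())
print((df['flag'] == via_apply).all())
```

Only rows 2 and 3 have different dates, so those are the only rows flagged as yes, matching the boolean output of the vectorized expression in the question.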
