Lambda function returns different output from direct code

I am checking two conditions: whether the difference between two values is greater than 0.5 AND whether they occurred on different dates. If both hold, the row is flagged.

Sample Data:

import pandas as pd

df = pd.DataFrame({'date1' : ['2023-05-11', '2023-02-24', '2023-07-9', '2023-01-19', '2023-02-10'],
                  'date2' : ['2023-05-11', '2023-02-24', '2023-07-8', '2023-01-17', '2023-02-10'],
                  'value1' : [9.11, .12, 49.1, 2.25, 6.22],
                  'value2' : [2.12, .86, 0.03, .17, 4.71]})

df
    date1       date2       value1  value2
0   2023-05-11  2023-05-11  9.11    2.12
1   2023-02-24  2023-02-24  0.12    0.86
2   2023-07-09  2023-07-08  49.1    0.03
3   2023-01-19  2023-01-17  2.25    0.17
4   2023-02-10  2023-02-10  6.22    4.71

df['date1'] = pd.to_datetime(df['date1'])
df['date2'] = pd.to_datetime(df['date2'])

When I try with apply function:

df.apply(lambda x : 'yes' if (abs(x['value1'] - x['value2']) > .5) & (x['date1'].date != x['date2'].date) else 'no', axis = 1)

0    yes
1    yes
2    yes
3    yes
4    yes
dtype: object

Without apply function:

(abs(df['value1'] - df['value2']) > .5) & (df['date1'].dt.date != df['date2'].dt.date)

0    False
1    False
2     True
3     True
4    False
dtype: bool

As shown above, the direct approach without apply gives the expected output, whereas the apply version does not. Could you please explain why that is the case?

Solution:

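To get the same output, use and instead of & (inside the lambda you are comparing boolean scalars, not Series), and add () after date. Each value in the row is a Timestamp, so date is the method Timestamp.date(), not the Series.dt.date accessor. Without the parentheses, the lambda compares two bound method objects rather than two dates, and those are never equal here, so every row is flagged: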

# without (), you get the bound method itself, not the date
print (df.apply(lambda x : x['date1'].date, axis = 1))
 0    <built-in method date of Timestamp object at 0...
 1    <built-in method date of Timestamp object at 0...
 2    <built-in method date of Timestamp object at 0...
 3    <built-in method date of Timestamp object at 0...
 4    <built-in method date of Timestamp object at 0...
 dtype: object

# with (), you get the dates
print (df.apply(lambda x : x['date1'].date(), axis = 1))
0    2023-05-11
1    2023-02-24
2    2023-07-09
3    2023-01-19
4    2023-02-10
dtype: object

df.apply(lambda x : (abs(x['value1'] - x['value2']) > .5) and 
                    (x['date1'].date() != x['date2'].date()), axis = 1)
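
The effect of the missing parentheses can be reproduced on two standalone Timestamps. This is a minimal sketch, not part of the original question; the names a, b, methods_differ and dates_differ are made up for illustration.

```python
import pandas as pd

# Two separate Timestamp objects that represent the same calendar day.
a = pd.Timestamp("2023-05-11")
b = pd.Timestamp("2023-05-11")

# Without (): this compares two bound methods that belong to two
# different objects, so they are reported as unequal even though
# the dates themselves are identical.
methods_differ = a.date != b.date

# With (): this compares two datetime.date values.
dates_differ = a.date() != b.date()

print(methods_differ, dates_differ)
```

Since the first comparison is True regardless of the dates, the original lambda effectively tested only the value difference, which exceeds 0.5 in every sample row.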

Your second, vectorized approach is better:

(abs(df['value1'] - df['value2']) > .5) & (df['date1'].dt.date != df['date2'].dt.date)

If you need to return yes/no (np is numpy, imported with import numpy as np):

np.where((abs(df['value1'] - df['value2']) > .5) & (df['date1'].dt.date != df['date2'].dt.date), 'yes', 'no')

Or, keeping apply:

df.apply(lambda x : 'yes' if (abs(x['value1'] - x['value2']) > .5) and (x['date1'].date() != x['date2'].date()) else 'no', axis = 1)
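
For completeness, here is a self-contained sketch of the vectorized yes/no version on the sample data. It makes two assumptions that are not in the original answer: the result is stored in a new column named flag, and dt.normalize() is used in place of dt.date, so the comparison stays in datetime64 rather than producing Python date objects. The date string '2023-07-9' is also written zero-padded here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date1": pd.to_datetime(["2023-05-11", "2023-02-24", "2023-07-09",
                             "2023-01-19", "2023-02-10"]),
    "date2": pd.to_datetime(["2023-05-11", "2023-02-24", "2023-07-08",
                             "2023-01-17", "2023-02-10"]),
    "value1": [9.11, 0.12, 49.1, 2.25, 6.22],
    "value2": [2.12, 0.86, 0.03, 0.17, 4.71],
})

# Element-wise conditions on whole columns, combined with & (Series, not scalars).
value_gap = (df["value1"] - df["value2"]).abs() > 0.5
# normalize() sets the time part to midnight, so only the calendar day is compared.
different_day = df["date1"].dt.normalize() != df["date2"].dt.normalize()

# "flag" is a hypothetical column name chosen for this example.
df["flag"] = np.where(value_gap & different_day, "yes", "no")
print(df["flag"].tolist())
```

On this data only rows 2 and 3 are flagged, which matches the boolean result of the direct approach in the question.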