Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

New pandas DataFrame column from datetime calculation

I am trying to calculate the number of days that have elapsed since the launch of a marketing campaign. I have one row per date for each marketing campaign in my DataFrame (df) and all dates start from the same day (though there is not a data point for each day for each campaign). In column ‘b’ I have the date relating to the data points of interest (dateime64[ns]) and in column ‘c’ I have the launch date of the marketing campaign (dateime64[ns]). I would like the resulting calculation to return n/a (or np.NaN or a suitable alternative) when column ‘b’ is earlier than column ‘c’, else I would like the calculation to return the difference the two dates.

Campaign Date Launch Date Desired Column
A 2019-09-01 2022-12-01 n/a
A 2019-09-02 2022-12-01 n/a
B 2019-09-01 2019-09-01 0
B 2019-09-25 2019-09-01 24

When I try:

df['Days Since Launch'] = df['Date'] - df['Launch Date']

What I would hope returns a negative value actually returns a positive one, thus leading to duplicate values when I have dates that are 10 days prior and 10 days after the launch date.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

When I try:

df['Days Since Launch'] = np.where(df['Date'] < df['Launch Date'], XXX, df['Date'] - df['Launch Date'])

Where XXX has to be the same data type as the two input columns, so I can’t enter np.NaN because the calculation will fail, nor can I enter a date as this will still leave the same issue that i want to solve. IF statements do not work as the "truth value of a Series is ambiguous". Any ideas?

>Solution :

You can use a direct subtraction and conversion to days with dt.days, then mask the negative values with where:

s = pd.to_datetime(df['Date']).sub(pd.to_datetime(df['Launch Date'])).dt.days
# or, if already datetime:
#s = df['Date'].sub(df['Launch Date']).dt.days

df['Desired Column'] = s.where(s.ge(0))

Alternative closer to your initial attempt, using mask:

df['Desired Column'] = (df['Date'].sub(df['Launch Date'])
                         .mask(df['Date'] < df['Launch Date'])
                       )

Output:

  Campaign        Date Launch Date  Desired Column
0        A  2019-09-01  2022-12-01             NaN
1        A  2019-09-02  2022-12-01             NaN
2        B  2019-09-01  2019-09-01             0.0
3        B  2019-09-25  2019-09-01            24.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading