Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python, Can't Transform Date Array to days

I have this csv file "rfm_data.csv":

CustomerID PurchaseDate  TransactionAmount ProductInformation
8814       11-04-23             943.31          Product C
2188       11-04-23             463.70          Product A
4608       11-04-23              80.28          Product A
2559       11-04-23             221.29          Product A

I read and transform data with this code:

    data = pd.read_csv("rfm_data.csv")
    data['PurchaseDate'] = pd.to_datetime(data['PurchaseDate'],  format='%d-%m-%y')
    data['Recency'] = (datetime.now().date() - data['PurchaseDate'].dt.date).dt.days

When I print (data) I get this error message:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

AttributeError: Can only use .dt accessor with datetimelike values. Did you mean: 'at'?

If I delete the dt.day from the last line of code I got this result:

CustomerID PurchaseDate  TransactionAmount ProductInformation Recency
8814       2023-04-11             943.31          Product C   140 days, 0:00:00
2188       2023-04-11             463.70          Product A   140 days, 0:00:00
4608       2023-04-11              80.28          Product A   140 days, 0:00:00
2559       2023-04-11             221.29          Product A   140 days, 0:00:00

But what I want in [Recency] is only the number of days to make further calculations.

>Solution :

Your problem lies in calling .dt.date, which returns a vanilla Python date object column – that has no dt accessor. Since your input only has dates, normalizing to the date is not needed. If you need to do it anyways (other use case maybe), use .dt.floor("d").

EX:

from io import StringIO
import pandas as pd

s = """CustomerID PurchaseDate TransactionAmount ProductInformation
8814 11-04-23 943.31 Product-C
2188 11-04-23 463.70 Product-A
4608 11-04-23 80.28 Product-A
2559 11-04-23 221.29 Product-A"""

data = pd.read_csv(StringIO(s), sep=" ")
data['PurchaseDate'] = pd.to_datetime(data['PurchaseDate'],  format='%d-%m-%y')
data['Recency'] = (pd.Timestamp("now").floor("d") - data['PurchaseDate']).dt.days

print(data)
   CustomerID PurchaseDate  TransactionAmount ProductInformation  Recency
0        8814   2023-04-11             943.31          Product-C      140
1        2188   2023-04-11             463.70          Product-A      140
2        4608   2023-04-11              80.28          Product-A      140
3        2559   2023-04-11             221.29          Product-A      140

Note that you can also use pd.Timestamp("now").floor("d") to get today’s date, which makes the code a bit more clean since you use pandas exclusively.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading