I have a DataFrame like the one below:
cust_id,purchase_date
1,10/01/1998
1,10/12/1999
2,13/05/2016
3,14/02/2018
3,15/03/2019
I would like to display the time between each customer's first and last purchase as text, such as 5 years and 9 months, instead of a decimal number such as 5.93244.
I tried the following:
import pandas as pd
from datetime import timedelta
df['purchase_date'] = pd.to_datetime(df['purchase_date'])
gb = df.groupby(['cust_id'])
df_cust_age = gb['purchase_date'].agg(min_date='min', max_date='max').reset_index()
df_cust_age['diff_in_days'] = df_cust_age['max_date'] - df_cust_age['min_date']
df_cust_age['years_diff'] = df_cust_age['diff_in_days']/timedelta(days=365)
However, this code gives the difference as a decimal number.
I expect my output to look like this:
cust_id,years_diff
1, 1 years and 11 months and 0 day
2, 0 years
3, 1 year and 1 month and 1 day
>Solution :
If it is acceptable to approximate a month as 30 days (and a year as 365 days), you can use this custom function:
#https://stackoverflow.com/a/13756038/2901002
def td_format(td_object):
    # Total length of the timedelta in whole seconds
    seconds = int(td_object.total_seconds())
    # Units from largest to smallest; a year is 365 days and a month is 30 days here
    periods = [
        ('year', 60*60*24*365),
        ('month', 60*60*24*30),
        ('day', 60*60*24),
        ('hour', 60*60),
        ('minute', 60),
        ('second', 1)
    ]
    strings = []
    for period_name, period_seconds in periods:
        # >= so that an exact multiple (e.g. exactly 365 days) counts as that unit
        if seconds >= period_seconds:
            period_value, seconds = divmod(seconds, period_seconds)
            has_s = 's' if period_value > 1 else ''
            strings.append("%s %s%s" % (period_value, period_name, has_s))
    return ", ".join(strings) if len(strings) > 0 else '0 year'
df_cust_age['years_diff'] = df_cust_age['diff_in_days'].apply(td_format)
print(df_cust_age)
   cust_id   min_date   max_date diff_in_days       years_diff
0        1 1998-10-01 1999-10-12     376 days  1 year, 11 days
1        2 2016-05-13 2016-05-13       0 days           0 year
2        3 2018-02-14 2019-03-15     394 days  1 year, 29 days
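The function above treats a month as 30 days and a year as 365 days, so its result can differ from a calendar count of years, months and days like the one in the expected output. As a rough sketch of a calendar-based alternative using only the standard library (the helper names add_months, calendar_diff and format_diff are hypothetical, and leaving out zero-valued parts is an assumption about the desired wording):

```python
import calendar
from datetime import date


def add_months(d, n):
    """Shift d by n months, clamping the day to the end of the target month."""
    year, month0 = divmod(d.year * 12 + d.month - 1 + n, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return d.replace(year=year, month=month0 + 1, day=min(d.day, last_day))


def calendar_diff(start, end):
    """Return (years, months, days) between two dates, counted on the calendar."""
    if end < start:
        start, end = end, start
    # Count whole months first, stepping back one if that overshoots `end`.
    total_months = (end.year - start.year) * 12 + (end.month - start.month)
    if add_months(start, total_months) > end:
        total_months -= 1
    # Whatever is left after the whole months is the day remainder.
    days = (end - add_months(start, total_months)).days
    years, months = divmod(total_months, 12)
    return years, months, days


def format_diff(start, end):
    """Format the difference as text, skipping parts that are zero."""
    values = zip(calendar_diff(start, end), ("year", "month", "day"))
    parts = [
        f"{value} {name}{'' if value == 1 else 's'}"
        for value, name in values
        if value
    ]
    return " and ".join(parts) if parts else "0 years"


print(format_diff(date(2018, 2, 14), date(2019, 3, 15)))
# 1 year and 1 month and 1 day
```

Since a pandas Timestamp is a datetime subclass, this should also work row-wise on the frame above, for example df_cust_age.apply(lambda r: format_diff(r['min_date'], r['max_date']), axis=1). Note that the sample dates appear to be day-first, so reproducing the expected output would likely also need pd.to_datetime(df['purchase_date'], dayfirst=True).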