Pandas convert numeric year to text value till day resolution

I have a dataframe like the one below:

cust_id,purchase_date
   1,10/01/1998
   1,10/12/1999
   2,13/05/2016
   3,14/02/2018
   3,15/03/2019

I would like to do the following:

a) display the difference between each customer's first and last purchase date as text, e.g. "5 years and 9 months", instead of a decimal such as 5.93244.

I tried the following:

from datetime import timedelta
import pandas as pd

df['purchase_date'] = pd.to_datetime(df['purchase_date'])
# group by customer and take the first and last purchase date
gb = df.groupby(['cust_id'])
df_cust_age = gb['purchase_date'].agg(min_date='min', max_date='max').reset_index()
df_cust_age['diff_in_days'] = df_cust_age['max_date'] - df_cust_age['min_date']
# dividing by a 365-day timedelta yields a float number of years
df_cust_age['years_diff'] = df_cust_age['diff_in_days']/timedelta(days=365)

but the above code gives the output as decimal numbers.

I expect my output to look like this:

cust_id,years_diff
  1, 1 years and 11 months and 0 day
  2, 0 years
  3, 1 year and 1 month and 1 day

>Solution :

If it is acceptable to approximate a month as 30 days (and a year as 365 days), you can use this custom function:

#https://stackoverflow.com/a/13756038/2901002

def td_format(td_object):
    seconds = int(td_object.total_seconds())
    periods = [
        ('year',        60*60*24*365),
        ('month',       60*60*24*30),
        ('day',         60*60*24),
        ('hour',        60*60),
        ('minute',      60),
        ('second',      1)
    ]

    strings=[]
    for period_name, period_seconds in periods:
        if seconds >= period_seconds:
            period_value , seconds = divmod(seconds, period_seconds)
            has_s = 's' if period_value > 1 else ''
            strings.append("%s %s%s" % (period_value, period_name, has_s))

    return ", ".join(strings) if len(strings) > 0 else '0 year'


df_cust_age['years_diff'] = df_cust_age['diff_in_days'].apply(td_format)
print (df_cust_age)
   cust_id   min_date   max_date diff_in_days       years_diff
0        1 1998-10-01 1999-10-12     376 days  1 year, 11 days
1        2 2016-05-13 2016-05-13       0 days           0 year
2        3 2018-02-14 2019-03-15     394 days  1 year, 29 days
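
The function above treats every month as 30 days and every year as 365 days, and the output shows that 10/01/1998 was read as 1 October. The sample data looks day-first (13/05/2016 can only be 13 May), so if calendar-accurate months are needed, a minimal sketch along the following lines may help. It uses only the standard library; the names `parse_dayfirst`, `calendar_diff` and `format_diff` are hypothetical helpers introduced here, and the day/month/year format is an assumption about the data. Unlike the expected output in the question, this sketch omits units whose value is zero.

```python
import calendar
from datetime import date, datetime


def parse_dayfirst(text: str) -> date:
    """Parse a 'dd/mm/YYYY' string (assumed format of purchase_date)."""
    return datetime.strptime(text.strip(), "%d/%m/%Y").date()


def calendar_diff(start: date, end: date) -> tuple:
    """Return (years, months, days) between two dates using calendar months."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    # whole months elapsed; drop one if the final month is not yet complete
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    years, months = divmod(months, 12)
    # shift start forward by the whole years and months, clamping the day
    # to the length of the target month (e.g. 31 Jan -> 28 Feb)
    total = (start.month - 1) + years * 12 + months
    anchor_year = start.year + total // 12
    anchor_month = total % 12 + 1
    last_day = calendar.monthrange(anchor_year, anchor_month)[1]
    anchor = date(anchor_year, anchor_month, min(start.day, last_day))
    return years, months, (end - anchor).days


def format_diff(years: int, months: int, days: int) -> str:
    """Join the non-zero units with 'and'; fall back to '0 years'."""
    parts = []
    for value, name in ((years, "year"), (months, "month"), (days, "day")):
        if value:
            parts.append(f"{value} {name}{'s' if value != 1 else ''}")
    return " and ".join(parts) if parts else "0 years"


# first and last purchase dates per customer, taken from the question
samples = {
    1: ("10/01/1998", "10/12/1999"),
    2: ("13/05/2016", "13/05/2016"),
    3: ("14/02/2018", "15/03/2019"),
}
for cust_id, (first, last) in samples.items():
    diff = calendar_diff(parse_dayfirst(first), parse_dayfirst(last))
    print(cust_id, format_diff(*diff))
# prints:
# 1 1 year and 11 months
# 2 0 years
# 3 1 year and 1 month and 1 day
```

With the dataframe from the question, the same helpers could presumably be applied row by row after parsing with `pd.to_datetime(df['purchase_date'], dayfirst=True)`, passing `row['min_date'].date()` and `row['max_date'].date()` to `calendar_diff`.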