I have a "date" column and a "group" column in my dataset. I want to get the difference between the min and max date in the "date" column per group. How can I do that? Here is an example of my data:
group date
main 2024-01-01
main 2024-01-03
main 2024-01-05
second 2024-02-05
second 2024-02-20
Desired result:
group date_diff
main 4
second 15
I tried using groupby, but then I don't know how to get the differences.
>Solution :
Get the minimal and maximal datetimes per group with GroupBy.agg, then take the difference by subtracting and convert the resulting timedeltas to a number of days with Series.dt.days:
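import pandas as pd
# Parse the strings as datetimes so min/max and subtraction work on dates
df['date'] = pd.to_datetime(df['date'])
# Per-group min and max, then their difference expressed in whole days
out = (df.groupby('group')['date'].agg(['min','max'])
         .assign(date_diff=lambda x: x['max'].sub(x['min']).dt.days)['date_diff']
         .reset_index())
print(out)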
    group  date_diff
0    main          4
1  second         15
Or use a lambda function; on a large DataFrame this should be slower:
out = (df.groupby('group')['date'].agg(lambda x: (x.max() - x.min()).days)
         .reset_index(name='date_diff'))
print(out)
    group  date_diff
0    main          4
1  second         15
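For anyone who wants to run both variants end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question. The names out_agg and out_lambda are only illustrative, and the first variant subtracts the two aggregated columns directly instead of going through assign, which should give the same numbers:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "group": ["main", "main", "main", "second", "second"],
    "date": ["2024-01-01", "2024-01-03", "2024-01-05",
             "2024-02-05", "2024-02-20"],
})
df["date"] = pd.to_datetime(df["date"])

# Variant 1: built-in min/max aggregations, then subtract the columns
agg = df.groupby("group")["date"].agg(["min", "max"])
out_agg = (agg["max"] - agg["min"]).dt.days.reset_index(name="date_diff")

# Variant 2: one lambda per group (a Python-level call for each group)
out_lambda = (df.groupby("group")["date"]
                .agg(lambda x: (x.max() - x.min()).days)
                .reset_index(name="date_diff"))

print(out_agg)
print(out_lambda)
```

Both frames should hold 4 for main and 15 for second, matching the desired result in the question.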