Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Average time between delivered_dates for each product_id – pandas

How to find average time between delivered_date for each product_id?

Dataset Sample

    data = {
    'product_id':[27036,1748, 8849, 10280, 28712, 27036, 1748, 8849, 10280, 28712, 27036, 1748, 8849, 10280, 28712, 27036, 1748, 8849, 10280,28712],
    'delivered_datetime':pd.to_datetime(["2021-11-20 04:17:07 UTC", "2020-11-27 00:23:33 UTC", "2020-11-28 22:51:57 UTC", "2022-02-08 07:15:09 UTC","2022-01-21 09:10:38 UTC", "2020-11-12 12:57:58 UTC","2021-10-26 23:03:33 UTC","2022-01-01 09:00:10 UTC", "2022-02-14 05:34:04 UTC", "2022-02-13 05:52:01 UTC", "2022-03-08 10:50:06 UTC", "2020-10-08 09:53:50 UTC", "2021-03-23 14:33:21 UTC", "2021-03-21 13:19:59 UTC", "2021-12-31 13:41:11 UTC", "2021-05-17 03:09:10 UTC", "2021-05-17 01:06:48 UTC", "2021-06-10 15:47:14 UTC", "2022-03-07 07:23:15 UTC","2022-02-23 14:58:57 UTC"]),
    'price':[12.55, 44.7, 6.84, 15.73, 2.12, 27.11, 8.73, 18.17, 4.26, 13.64, 21.62, 64.26, 14.72, 11.6, 15.05, 14.79, 16.73, 3.14, 39.82, 25.56
],

}
df =pd.DataFrame(data)
|id | product_id| delivered_date            | price |
|-- | ----------| ------------------------- | ----- |
| 0 | 27036     | 2021-11-20 04:17:07+00:00 | 12.55 |
| 1 | 1748      | 2020-11-27 00:23:33+00:00 | 44.70 |
| 2 | 8849      | 2020-11-28 22:51:57+00:00 | 6.84  |
| 3 | 10280     | 2022-02-08 07:15:09+00:00 | 15.73 |
| 4 | 28712     | 2022-01-21 09:10:38+00:00 | 2.12  |
| 5 | 27036     | 2020-11-12 12:57:58+00:00 | 27.11 |
| 6 | 1748      | 2021-10-26 23:03:33+00:00 | 8.73  |
| 7 | 8849      | 2022-01-01 09:00:10+00:00 | 18.17 |
| 8 | 10280     | 2022-02-14 05:34:04+00:00 | 4.2   |
| 9 | 28712     | 2022-02-13 05:52:01+00:00 | 13.64 |
| 10| 27036     | 2022-03-08 10:50:06+00:00 | 21.62 |
| 11| 1748      | 2020-10-08 09:53:50+00:00 | 64.26 |

Average time between delivered_dates for each product_id?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    df['interval'] = df.groupby('product_id')['delivered_datetime']\
                  .diff().mean()

This produces one time interval for all rows, rather than separate averages for each product.

Expected output

|id | product_id| delivered_date            | price |Exp_out|
|-- | ----------| ------------------------- | ----- |----- |
| 0 | 27036     | 2021-11-20 04:17:07+00:00 | 12.55 |a     |
| 1 | 1748      | 2020-11-27 00:23:33+00:00 | 44.70 |b     |
| 2 | 8849      | 2020-11-28 22:51:57+00:00 | 6.84  |c     |
| 3 | 10280     | 2022-02-08 07:15:09+00:00 | 15.73 |d     |
| 4 | 28712     | 2022-01-21 09:10:38+00:00 | 2.12  |e     |
| 5 | 27036     | 2020-11-12 12:57:58+00:00 | 27.11 |a     |
| 6 | 1748      | 2021-10-26 23:03:33+00:00 | 8.73  |b     |
| 7 | 8849      | 2022-01-01 09:00:10+00:00 | 18.17 |c     |
| 8 | 10280     | 2022-02-14 05:34:04+00:00 | 4.2   |d     |
| 9 | 28712     | 2022-02-13 05:52:01+00:00 | 13.64 |e     |
| 10| 27036     | 2022-03-08 10:50:06+00:00 | 21.62 |a     |
| 11| 1748      | 2020-10-08 09:53:50+00:00 | 64.26 |e     |

Where a,b,c,d and e refer to the average timedelta for each product

>Solution :

You should probably sort on the date first. Then groupby after calculating the differences to get the mean as a new column:

#calculate the average time difference
df["interval"] = df.groupby("product_id")["delivered_datetime"].transform(lambda x: x.sort_values().diff().mean())

#convert interval to days if needed
df["interval"] = df["interval"].dt.total_seconds().div(86400)

>>> df
    product_id        delivered_datetime  price    interval
0        27036 2021-11-20 04:17:07+00:00  12.55  160.303735
1         1748 2020-11-27 00:23:33+00:00  44.70  127.849471
2         8849 2020-11-28 22:51:57+00:00   6.84  132.807458
3        10280 2022-02-08 07:15:09+00:00  15.73  116.917423
4        28712 2022-01-21 09:10:38+00:00   2.12   18.018002
5        27036 2020-11-12 12:57:58+00:00  27.11  160.303735
6         1748 2021-10-26 23:03:33+00:00   8.73  127.849471
7         8849 2022-01-01 09:00:10+00:00  18.17  132.807458
8        10280 2022-02-14 05:34:04+00:00   4.26  116.917423
9        28712 2022-02-13 05:52:01+00:00  13.64   18.018002
10       27036 2022-03-08 10:50:06+00:00  21.62  160.303735
11        1748 2020-10-08 09:53:50+00:00  64.26  127.849471
12        8849 2021-03-23 14:33:21+00:00  14.72  132.807458
13       10280 2021-03-21 13:19:59+00:00  11.60  116.917423
14       28712 2021-12-31 13:41:11+00:00  15.05   18.018002
15       27036 2021-05-17 03:09:10+00:00  14.79  160.303735
16        1748 2021-05-17 01:06:48+00:00  16.73  127.849471
17        8849 2021-06-10 15:47:14+00:00   3.14  132.807458
18       10280 2022-03-07 07:23:15+00:00  39.82  116.917423
19       28712 2022-02-23 14:58:57+00:00  25.56   18.018002
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading