How to compute the mean over consecutive datetimes, and is there a "simple" SQL statement for this?

I have a MySQL database with records associated with a datetime. When several values fall within a time range of 3 minutes, I want to compute the mean of those values. I made a fake file to illustrate.

#dataSample.csv
;y;datetime
0;1.885539280369374;2020-12-18 00:16:59
1;88.87944658745302;2020-12-18 00:18:26
2;5.4934801892366645;2020-12-18 00:21:47
3;27.481240675960745;2020-12-22 02:22:43
4;78.20955112191257;2021-03-12 00:01:45
5;69.20174844202616;2021-03-12 00:03:01
6;92.452056802478;2021-03-12 00:04:10
7;65.44391665410022;2021-03-12 00:06:12
8;40.59036279552053;2021-03-13 11:07:40
9;97.28850548113896;2021-03-13 11:08:46
10;94.73214209590618;2021-03-13 11:09:52
11;15.032038741334246;2021-03-14 00:50:10
12;26.96629037360529;2021-03-14 00:51:17
13;57.257554884427755;2021-03-14 00:52:20
14;18.845976481042804;2021-03-17 13:52:00
15;57.19160644979182;2021-03-17 13:53:48
16;3.81419643210113;2021-03-17 13:54:50
17;46.65212265222033;2021-03-17 20:00:06
18;78.99788944141437;2021-03-17 20:01:28
19;72.57950242929162;2021-03-17 20:02:18
20;31.953619913660063;2021-03-20 16:40:04
21;71.03880579866258;2021-03-20 16:41:14
22;80.07721218822367;2021-03-20 16:42:03
23;84.4974927845413;2021-03-23 23:51:04
24;23.332882564418554;2021-03-23 23:52:37
25;24.84651458538292;2021-03-23 23:53:44
26;3.2905723920299073;2021-04-13 01:07:13
27;95.00543057651691;2021-04-13 01:08:53
28;46.02579988887248;2021-04-13 01:10:03
29;71.73362449536457;2021-04-13 07:54:22
30;93.17353939667422;2021-04-13 07:56:03
31;28.06669274690586;2021-04-13 07:57:04
32;10.733532291051478;2021-04-21 23:52:19
33;92.92374999199961;2021-04-21 23:53:02
34;59.68694726616824;2021-04-21 23:54:12
35;30.01172074266929;2021-11-29 00:21:09
36;34.905022198511915;2021-11-29 00:23:09
37;25.149590827473055;2021-11-29 00:24:13
38;82.09740354280564;2021-12-01 08:30:00
39;25.58339148753002;2021-12-01 08:32:00
40;72.7009145748645;2021-12-01 08:34:00
41;8.43474445404563;2021-12-01 13:18:58
42;57.95936012084567;2021-12-01 13:19:45
43;31.118114587376713;2021-12-01 13:21:19
44;42.082098854369576;2021-12-01 20:24:46
45;75.8402567179772;2021-12-01 20:25:45
46;55.29546227636972;2021-12-01 20:26:20
47;72.52918512264547;2021-12-02 08:35:42
48;77.81077056479849;2021-12-02 08:36:35
49;34.63717484559066;2021-12-02 08:37:22
50;71.65724478546072;2021-12-06 00:05:00
51;19.54082334014094;2021-12-06 00:08:00
52;48.28967362303979;2021-12-06 00:10:00
53;34.894095185290105;2021-12-03 08:36:00
54;58.187428474357375;2021-12-03 08:40:00
55;94.53441120864328;2021-12-03 08:45:00
56;12.272217150555866;2021-12-03 13:10:00
57;87.21292441413424;2021-12-03 13:11:00
58;86.35470090744712;2021-12-03 13:12:00
59;50.23396755270806;2021-12-06 23:46:00
60;73.30424413459407;2021-12-06 23:48:00
61;60.48531615320234;2021-12-06 23:49:00
62;56.10336877052336;2021-12-06 23:51:00
63;87.6451368964707;2021-12-07 08:37:00
64;11.902048844734905;2021-12-07 10:48:00
65;57.596744167099494;2021-12-07 10:58:00
66;61.77125104854312;2021-12-07 11:05:00
67;21.542193987296695;2021-12-07 11:28:00
68;91.64520146457525;2021-12-07 11:29:00
69;78.42486998655676;2021-12-07 16:06:00
70;79.51721853991806;2021-12-07 16:08:00
71;54.46969194684532;2021-12-07 16:09:00
72;56.092025088935785;2021-12-07 16:12:00
73;2.546437552510464;2021-12-07 18:35:00
74;11.598686235757118;2021-12-07 18:40:00
75;40.26003639570842;2021-12-07 18:45:00
76;30.697636730470848;2021-12-07 23:39:00
77;66.3177096178856;2021-12-07 23:42:00
78;73.16870525875022;2021-12-07 23:47:00
79;61.68994018242363;2021-12-08 13:47:00
80;38.06598256433572;2021-12-08 13:48:00
81;43.91268499464372;2021-12-08 13:49:00
82;33.166594417250735;2021-12-15 00:23:00
83;52.68422837459157;2021-12-15 00:24:00
84;86.01398356923765;2021-12-15 00:26:00
85;21.444108620566542;2021-12-15 00:31:00
86;86.6839608035921;2021-12-18 01:09:00
87;43.83047571188636;2022-01-06 00:24:00

Here is my code:

import pandas as pd
from datetime import timedelta

fileName = "dataSample.csv"

# parse_dates converts the datetime column on read, so no manual
# datetime.fromisoformat conversion is needed
df = pd.read_csv(fileName, sep=";", index_col=0, parse_dates=["datetime"])

def define_mask(d, delta_minutes):
    # True for every row within delta_minutes after timestamp d
    return (d <= df["datetime"]) & (df["datetime"] <= d + timedelta(minutes=delta_minutes))

group = []
i = 0
while i < len(df):
    d = df.loc[i, "datetime"]
    mask = define_mask(d, 3)

    # every row in the window gets the same group id
    group.extend([i] * mask.sum())
    i += mask.sum()

df["group"] = group
df_new = df.groupby("group").mean(numeric_only=True)

It works well, but I am wondering whether this is good pandas practice.


I have 2 questions:

  • Is there another way to do this with pandas?
  • Is there an SQL command to do this directly?

Solution:

You can use resample, which groups the rows into fixed 3-minute bins:

df = pd.read_csv('data.csv', sep=';', index_col=0, parse_dates=['datetime'])
out = df.resample('3min', on='datetime').mean().dropna().reset_index()
print(out)

# Output
              datetime          y
0  2020-12-18 00:15:00   1.885539
1  2020-12-18 00:18:00  88.879447
2  2020-12-18 00:21:00   5.493480
3  2020-12-22 02:21:00  27.481241
4  2021-03-12 00:00:00  78.209551
..                 ...        ...
59 2021-12-15 00:21:00  33.166594
60 2021-12-15 00:24:00  69.349106
61 2021-12-15 00:30:00  21.444109
62 2021-12-18 01:09:00  86.683961
63 2022-01-06 00:24:00  43.830476

[64 rows x 2 columns]
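By default resample aligns its 3-minute bins to the clock (00:00, 00:03, ...), which is why the bin labels above differ from the original timestamps. If the bin grid should start at the first record instead, the origin parameter can shift it; a minimal sketch, using a few made-up rows in the same format as dataSample.csv:

```python
import pandas as pd
from io import StringIO

# Hypothetical sample in the same format as dataSample.csv
csv = """;y;datetime
0;1.0;2020-12-18 00:16:59
1;2.0;2020-12-18 00:18:26
2;3.0;2020-12-18 00:21:47
"""
df = pd.read_csv(StringIO(csv), sep=';', index_col=0, parse_dates=['datetime'])

# origin='start' anchors the 3-minute bin grid at the first timestamp
# instead of at a round clock time
out = df.resample('3min', on='datetime', origin='start').mean().dropna().reset_index()
print(out)
```

Here the first bin runs from 00:16:59 to 00:19:59 and averages the first two rows; the third row falls into the next bin.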

Another way, which keeps the first datetime value of each 3-minute group:

out = df.groupby(pd.Grouper(freq='3min', key='datetime'), as_index=False) \
        .agg({'y': 'mean', 'datetime': 'first'}) \
        .dropna(how='all').reset_index(drop=True)
print(out)

# Output
            y            datetime
0    1.885539 2020-12-18 00:16:59
1   88.879447 2020-12-18 00:18:26
2    5.493480 2020-12-18 00:21:47
3   27.481241 2020-12-22 02:22:43
4   78.209551 2021-03-12 00:01:45
..        ...                 ...
59  33.166594 2021-12-15 00:23:00
60  69.349106 2021-12-15 00:24:00
61  21.444109 2021-12-15 00:31:00
62  86.683961 2021-12-18 01:09:00
63  43.830476 2022-01-06 00:24:00

[64 rows x 2 columns]

Or

out = df.resample('3min', on='datetime') \
        .agg({'y': 'mean', 'datetime': 'first'}) \
        .dropna(how='all').reset_index(drop=True)
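Note that resample's fixed bins are not identical to the loop in the question, which anchors each window at the first record of a group and starts the next window at the first record falling outside it. If that exact behaviour is wanted, the same logic can be written without the repeated masking; a sketch with made-up rows:

```python
import pandas as pd
from io import StringIO

# Hypothetical sample in the same format as dataSample.csv
csv = """;y;datetime
0;1.0;2020-12-18 00:16:59
1;2.0;2020-12-18 00:18:26
2;3.0;2020-12-18 00:21:47
3;4.0;2020-12-22 02:22:43
"""
df = pd.read_csv(StringIO(csv), sep=';', index_col=0, parse_dates=['datetime'])

# Start a new group whenever a row falls more than 3 minutes
# after the first row of the current group
ids, start, gid = [], None, -1
for t in df['datetime']:
    if start is None or t > start + pd.Timedelta(minutes=3):
        gid += 1
        start = t
    ids.append(gid)

out = df.groupby(ids).agg({'datetime': 'first', 'y': 'mean'})
print(out)
```

This yields three groups: the first two rows are averaged together, and each remaining row forms its own group.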