Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Calculating 15mins data from Cumulative Sums

I came up with a problem.
I have real-time production data with Machine Codes [M] column, which has DateTime stamps [DateTime] with cumulative sums [Cumulative] column. From the original data, I created 15mins time intervals by flooring the DateTime column.

To problem is, I’d like to take latest CumulativeSum of the each machine for each TimeBins, and take the difference of each 15mins. time intervals. So, I will be able to calculate the 15mins. real-time production data of each machine with the new data frame, with a new column, named [Diff].

Please find the sample code for to represent my problem.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Original Data:

df=pd.DataFrame({'M':['18','18','18','19','19','19','18','18','18','19','19','19','19'],
             'Cumulative':['8','10','11','5','8','9','13','16','17','14','19','20','22'],
             'DateTime': ['2022-08-01 07:14:28','2022-08-01 07:25:58','2022-08-01 07:29:19',
                         '2022-08-01 07:13:17','2022-08-01 07:28:58','2022-08-01 07:29:01',
                         '2022-08-01 07:34:54','2022-08-01 07:36:02','2022-08-01 07:38:17',
                         '2022-08-01 07:33:46','2022-08-01 07:37:09','2022-08-01 07:38:17','2022-08-01 07:41:38']})

I created 15mins. TimeBins by floor the DateTime to "15T" buckets.
And change the formats of the each column.

df['DateTime'] = pd.to_datetime(df['DateTime'])
df['TimeBins'] = df['DateTime'].dt.floor(freq='15T')
df['Cumulative'] = df['Cumulative'].astype('int32')

The new dataframe that I’d like to have is as follows,

pd.DataFrame({'M':['18','18','18',
                   '19','19','19'],
             'DateTime':['2022-08-01 07:14:28 ','2022-08-01 07:29:19','2022-08-01 07:38:17',
                        '2022-08-01 07:13:17','2022-08-01 07:29:01','2022-08-01 07:41:38'],
             'TimeBins':['2022-08-01 07:00:00','2022-08-01 07:15:00 ','2022-08-01 07:30:00',
                        '2022-08-01 07:00:00','2022-08-01 07:15:00','2022-08-01 07:30:00'],
             'Cumulative':['8','11','17',
                          '5','9','22'],
             'Diff':['NaN','3','6',
                    'NaN','4','11']})

    M   DateTime                TimeBins        Cumulative       Diff
0   18  2022-08-01 07:14:28 2022-08-01 07:00:00        8        NaN
1   18  2022-08-01 07:29:19 2022-08-01 07:15:00       11        3
2   18  2022-08-01 07:38:17 2022-08-01 07:30:00       17        6
3   19  2022-08-01 07:13:17 2022-08-01 07:00:00        5        NaN
4   19  2022-08-01 07:29:01 2022-08-01 07:15:00        9        4
5   19  2022-08-01 07:41:38 2022-08-01 07:30:00       22        11

>Solution :

Use GroupBy.last and DataFrameGroupBy.diff:

df['DateTime'] = pd.to_datetime(df['DateTime'])
df['TimeBins'] = df['DateTime'].dt.floor(freq='15T')
df['Cumulative'] = df['Cumulative'].astype('int32')

cols = ['M','DateTime','TimeBins','Cumulative']
df = df.groupby(['M','TimeBins'], as_index=False).last()[cols]
df['Diff'] = df.groupby('M')['Cumulative'].diff()
print (df)
    M            DateTime            TimeBins  Cumulative  Diff
0  18 2022-08-01 07:14:28 2022-08-01 07:00:00           8   NaN
1  18 2022-08-01 07:29:19 2022-08-01 07:15:00          11   3.0
2  18 2022-08-01 07:38:17 2022-08-01 07:30:00          17   6.0
3  19 2022-08-01 07:13:17 2022-08-01 07:00:00           5   NaN
4  19 2022-08-01 07:29:01 2022-08-01 07:15:00           9   4.0
5  19 2022-08-01 07:41:38 2022-08-01 07:30:00          22  13.0

Another solution with Grouper:

df['DateTime'] = pd.to_datetime(df['DateTime'])
df['Cumulative'] = df['Cumulative'].astype('int32')


cols = ['M','DateTime','TimeBins','Cumulative']

df = (df.groupby(['M',pd.Grouper(freq='15T', key='DateTime')])[['DateTime','Cumulative']]
       .last().rename_axis(['M','TimeBins'])
       .reset_index()[cols])
df['Diff'] = df.groupby('M')['Cumulative'].diff()
print (df)
    M            DateTime            TimeBins  Cumulative  Diff
0  18 2022-08-01 07:14:28 2022-08-01 07:00:00           8   NaN
1  18 2022-08-01 07:29:19 2022-08-01 07:15:00          11   3.0
2  18 2022-08-01 07:38:17 2022-08-01 07:30:00          17   6.0
3  19 2022-08-01 07:13:17 2022-08-01 07:00:00           5   NaN
4  19 2022-08-01 07:29:01 2022-08-01 07:15:00           9   4.0
5  19 2022-08-01 07:41:38 2022-08-01 07:30:00          22  13.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading