How to use the pandas resample method?

I want to downsample a pandas DataFrame with a datetime index using the resample method, but I don't understand the output I get.
I expected the values to be aggregated into '5s' bins, but I get 17460145 rows from an original DataFrame of 100 rows. What is the correct way to use resample?

import numpy as np
import pandas as pd

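def random_dates(start, end, n=100):
    # Timestamp.value is in nanoseconds; convert both bounds to whole seconds
    start_u = start.value // 10**9
    end_u = end.value // 10**9
    # Draw n random second offsets in [start_u, end_u) and convert them back to datetimes
    return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')

start = pd.to_datetime('2022-01-01')
end = pd.to_datetime('2023-01-01')
rd = random_dates(start, end)
clas = np.random.choice(['A', 'B', 'C'], size=100)
value = np.random.randint(0, 100, size=100)
# Index the frame by timestamp and sort it chronologically
df = pd.DataFrame.from_dict({'ts': rd, 'cl': clas, 'vl': value}).set_index('ts').sort_index()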

df
Out[48]: 
                    cl  vl
ts                        
2022-01-04 17:25:10  B  27
2022-01-06 19:17:35  C  34
2022-01-17 22:55:25  B   1
2022-01-23 00:33:25  A  20
2022-01-27 18:26:56  A  55
                ..  ..
2022-12-14 07:46:50  C  22
2022-12-18 02:33:52  C  52
2022-12-22 17:35:10  A  52
2022-12-28 04:55:20  A  57
2022-12-29 03:19:00  A  60

[100 rows x 2 columns]

df.groupby(by='cl').resample('5s').mean()
Out[49]: 
                          vl
cl ts                       
A  2022-01-23 00:33:25  20.0
   2022-01-23 00:33:30   NaN
   2022-01-23 00:33:35   NaN
   2022-01-23 00:33:40   NaN
   2022-01-23 00:33:45   NaN
                     ...
C  2022-12-18 02:33:30   NaN
   2022-12-18 02:33:35   NaN
   2022-12-18 02:33:40   NaN
   2022-12-18 02:33:45   NaN
   2022-12-18 02:33:50  52.0

[17460145 rows x 1 columns]

Solution:

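Use pd.Grouper. groupby(...).resample('5s') creates a row for every 5-second bin between the first and last timestamp of each group, so with data spread over most of a year almost all of the 17460145 rows are empty (NaN) bins. Grouping by the class together with a pd.Grouper of the same frequency returns only the bins that actually contain data: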

>>> df.groupby(['cl', pd.Grouper(freq='5s')]).mean()
                          vl
cl ts                       
A  2022-01-22 11:53:30  31.0
   2022-02-01 21:24:55  60.0
   2022-03-20 06:01:05  24.0
   2022-04-03 00:04:05  55.0
   2022-04-03 06:30:10  81.0
...                      ...
C  2022-11-23 23:17:20  92.0
   2022-11-25 07:07:45  27.0
   2022-12-07 00:18:05  88.0
   2022-12-25 10:37:25  77.0
   2022-12-28 14:29:25  33.0

[100 rows x 1 columns]
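
To see the difference on something small enough to inspect, here is a minimal sketch using a hypothetical four-row frame (the timestamps and values are made up for illustration, not taken from the question). It compares the two approaches and also shows that dropping the empty bins from the resample result should give the same values as pd.Grouper.

```python
import pandas as pd

# Hypothetical, deterministic data: three rows for class A, one for class B.
idx = pd.to_datetime([
    "2022-01-01 00:00:01",
    "2022-01-01 00:00:03",
    "2022-01-01 00:01:00",
    "2022-01-01 00:00:02",
])
df = pd.DataFrame({"cl": ["A", "A", "A", "B"], "vl": [10, 20, 30, 40]}, index=idx)
df.index.name = "ts"
df = df.sort_index()

# resample builds every 5-second bin between each group's first and last
# timestamp, so bins without data appear as NaN rows.
dense = df.groupby("cl")["vl"].resample("5s").mean()

# pd.Grouper bins the index the same way but keeps only bins that contain data.
sparse = df.groupby(["cl", pd.Grouper(freq="5s")])["vl"].mean()

# Dropping the empty bins from the resample result leaves the same values.
dense_nonempty = dense.dropna()

print(len(dense), len(sparse), len(dense_nonempty))
```

If the regular 5-second grid is actually wanted (for example, to interpolate or forward-fill gaps), resample is the right tool; pd.Grouper is the better fit when only the non-empty bins matter.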