How to count events in predefined time ranges

I want to count the events in each 1-second interval of a CSV data file and draw a histogram of the results, but I don't understand how to get the number of events per second.
Can someone please help me with this?

The code is:

from matplotlib import pyplot as pl
import pandas as pd
import numpy as np


def read_data():
    df = pd.read_csv("test.csv", usecols=['time', 'unix_time', 'name'])
    df['time'] = pd.to_datetime(df['time'])
    df['unix_time'] = (df['unix_time']).astype(int)
    df.info()

    i = 1

    time_counts = df.groupby((3600 * df.time.dt.minute + df.time.dt.second) // i * i)['time'].count()
    print(time_counts)


if __name__ == "__main__":
    read_data()

The output looks strange:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   time       33 non-null     datetime64[ns]
 1   unix_time  33 non-null     int32         
 2   name       33 non-null     object        
dtypes: datetime64[ns](1), int32(1), object(1)
memory usage: 788.0+ bytes

time
18        1
25217     1
43209     1
43219     1
46804     1
54047     1
61241     1
64815     1
64833     1
68402     1
75620     1
79235     1
82806     1
82837     2
86407     1
86446     1
93625     1
97254     1
104446    1
140438    1
144050    1
162025    1
169250    1
180050    1
183623    1
183658    1
194404    1
194412    2
194433    1
194438    1
205219    1
Name: time, dtype: int64

The data in the CSV is:

time                    unix_time       name
2022-12-15 08:00:18.034 1671091218034   apple
2022-12-15 08:07:17.376 1671091637376   apple
2022-12-15 08:12:09.648 1671091929648   apple
2022-12-15 08:12:19.320 1671091939320   apple
2022-12-15 08:13:04.623 1671091984623   apple
2022-12-15 08:15:47.103 1671092147103   apple
2022-12-15 08:17:41.878 1671092261878   apple
2022-12-15 08:18:15.842 1671092295842   apple
2022-12-15 08:18:33.786 1671092313786   apple
2022-12-15 08:19:02.022 1671092342022   apple
2022-12-15 08:21:20.350 1671092480350   apple
2022-12-15 08:22:35.603 1671092555603   apple
2022-12-15 08:23:06.009 1671092586009   apple
2022-12-15 08:23:37.101 1671092617101   apple
2022-12-15 08:23:37.334 1671092617334   apple
2022-12-15 08:24:07.645 1671092647645   apple
2022-12-15 08:24:46.978 1671092686978   apple
2022-12-15 08:26:25.430 1671092785430   apple
2022-12-15 08:27:54.027 1671092874027   apple
2022-12-15 08:29:46.712 1671092986712   apple
2022-12-15 08:39:38.742 1671093578742   apple
2022-12-15 08:40:50.310 1671093650310   apple
2022-12-15 08:45:25.007 1671093925007   apple
2022-12-15 08:47:50.770 1671094070770   apple
2022-12-15 08:50:50.856 1671094250856   apple
2022-12-15 08:51:23.914 1671094283914   apple
2022-12-15 08:51:58.572 1671094318572   apple
2022-12-15 08:54:04.959 1671094444959   apple
2022-12-15 08:54:12.424 1671094452424   apple
2022-12-15 08:54:12.807 1671094452807   apple
2022-12-15 08:54:33.562 1671094473562   apple
2022-12-15 08:54:38.531 1671094478531   apple
2022-12-15 08:57:19.777 1671094639777   apple

Solution:

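The grouping key in the question multiplies the minute by 3600 and ignores the hour, so its values are not seconds. Rather than repairing the arithmetic, use pd.Grouper with a one-second frequency, which also reports the seconds that contain no events: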

df['time'] = pd.to_datetime(df['time'])

time_counts = df.groupby(pd.Grouper(freq='1s', key='time'))['time'].count()
print(time_counts)
time
2022-12-15 08:00:18    1
2022-12-15 08:00:19    0
2022-12-15 08:00:20    0
2022-12-15 08:00:21    0
2022-12-15 08:00:22    0
                      ..
2022-12-15 08:57:15    0
2022-12-15 08:57:16    0
2022-12-15 08:57:17    0
2022-12-15 08:57:18    0
2022-12-15 08:57:19    1
Freq: S, Name: time, Length: 3422, dtype: int64
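
Since the question also asks for a histogram, the per-second counts from Grouper can be plotted directly. The following is only a sketch under a few assumptions: it uses four timestamps copied from the CSV above instead of reading test.csv, it selects the non-interactive Agg backend so it runs without a display, and the output file name is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# A few timestamps copied from the CSV in the question (sample only, not the full file).
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2022-12-15 08:23:06.009",
        "2022-12-15 08:23:37.101",
        "2022-12-15 08:23:37.334",
        "2022-12-15 08:24:07.645",
    ])
})

# One bin per second between the first and last event; empty seconds count as 0.
time_counts = df.groupby(pd.Grouper(freq="1s", key="time"))["time"].count()

fig, ax = plt.subplots()
# Draw the per-second counts as a step plot, which reads like a histogram over time.
ax.step(time_counts.index, time_counts.to_numpy(), where="post")
ax.set_xlabel("time")
ax.set_ylabel("events per second")
fig.savefig("events_per_second.png")  # hypothetical output file name
```

Changing freq to a wider value such as "1min" gives one bin per minute with the same code, which may be easier to read when most seconds are empty.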

Or use Series.dt.floor to strip the milliseconds, which keeps only the seconds that contain at least one event:

df['time'] = pd.to_datetime(df['time'])

time_counts = df.groupby(df['time'].dt.floor('s'))['time'].count()

print(time_counts)
time
2022-12-15 08:00:18    1
2022-12-15 08:07:17    1
2022-12-15 08:12:09    1
2022-12-15 08:12:19    1
2022-12-15 08:13:04    1
2022-12-15 08:15:47    1
2022-12-15 08:17:41    1
2022-12-15 08:18:15    1
2022-12-15 08:18:33    1
2022-12-15 08:19:02    1
2022-12-15 08:21:20    1
2022-12-15 08:22:35    1
2022-12-15 08:23:06    1
2022-12-15 08:23:37    2
2022-12-15 08:24:07    1
2022-12-15 08:24:46    1
2022-12-15 08:26:25    1
2022-12-15 08:27:54    1
2022-12-15 08:29:46    1
2022-12-15 08:39:38    1
2022-12-15 08:40:50    1
2022-12-15 08:45:25    1
2022-12-15 08:47:50    1
2022-12-15 08:50:50    1
2022-12-15 08:51:23    1
2022-12-15 08:51:58    1
2022-12-15 08:54:04    1
2022-12-15 08:54:12    2
2022-12-15 08:54:33    1
2022-12-15 08:54:38    1
2022-12-15 08:57:19    1
Name: time, dtype: int64
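
If an integer key like the one in the question is still wanted, the arithmetic can be repaired: a seconds-since-midnight value is 3600 * hour + 60 * minute + second. The sketch below assumes a small inline sample taken from the CSV above, and the variable names are only illustrative. It compares the repaired key with the dt.floor grouping.

```python
import pandas as pd

# A few timestamps copied from the CSV in the question (sample only, not the full file).
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2022-12-15 08:00:18.034",
        "2022-12-15 08:23:37.101",
        "2022-12-15 08:23:37.334",
        "2022-12-15 08:57:19.777",
    ])
})

t = df["time"].dt
# Seconds since midnight: hours and minutes must both be converted to seconds.
seconds_of_day = 3600 * t.hour + 60 * t.minute + t.second

# Group by the integer key and, for comparison, by the timestamp floored to the second.
counts_by_int = df.groupby(seconds_of_day)["time"].count()
counts_by_floor = df.groupby(df["time"].dt.floor("s"))["time"].count()

print(counts_by_int)
print(counts_by_floor)
```

Both groupings give the same counts for data from a single day. The integer key drops the date, so events at the same clock time on different days would land in the same group, which is one reason the datetime-based key is usually the safer choice.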