I understand Pandas has a pd.Grouper where we can specify a time frequency. However, it uses the frequency as the border for each sample, similar to how resample does it.
For example:
df.groupby(pd.Grouper(key='Timestamp', freq='1s')).agg({...})
will create a grouped dataframe whose index values are 1s apart.
However, I want to group all consecutive rows where the difference between each row and the previous one is less than 1s. For example, my timestamps might be (showing only the seconds for simplicity; assume everything above second precision is identical) 1.1s, 1.8s, 2.4s, 5s, 5.9s, 9s, in which case I want (1.1s, 1.8s, 2.4s), (5s, 5.9s), and (9s) grouped together, and the index of the grouped dataframe to be (1.1s, 5s, 9s).
How can I achieve this?
>Solution :
You can group by the cumulative sum of (diff >= threshold), which starts a new block whenever the gap to the previous row is at least 1s:
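import pandas as pd
threshold = pd.Timedelta('1s')
# True where the gap to the previous row is >= threshold; cumsum turns these flags into block ids
blocks = df['Timestamp'].diff().ge(threshold).cumsum()
df.groupby(blocks).agg({...})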
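To check this against the timestamps from the question, here is a minimal sketch. The base date, the value column, and the aggregations (start, total, rows) are made up for illustration; only the diff/ge/cumsum step comes from the answer. It also keeps the first timestamp of each block as the index, which is what the question asks for.

```python
import pandas as pd

# Hypothetical sample data: the seconds from the question, on an arbitrary base date.
base = pd.Timestamp("2023-01-01 00:00:00")
ms = [1100, 1800, 2400, 5000, 5900, 9000]
df = pd.DataFrame({
    "Timestamp": base + pd.to_timedelta(ms, unit="ms"),
    "value": [1, 2, 3, 4, 5, 6],  # made-up payload column
})

threshold = pd.Timedelta("1s")
# A new block starts wherever the gap to the previous row is >= threshold.
# The first diff is NaT, which compares as False, so the first row opens block 0.
blocks = df["Timestamp"].diff().ge(threshold).cumsum().rename("block")

# Named aggregation: keep the first timestamp of each block and use it as the index.
result = (
    df.groupby(blocks)
      .agg(start=("Timestamp", "first"),
           total=("value", "sum"),
           rows=("value", "count"))
      .set_index("start")
)
print(result)
```

With this data the block ids are 0, 0, 0, 1, 1, 2, so the rows are grouped as (1.1s, 1.8s, 2.4s), (5s, 5.9s), (9s), and the resulting index holds the 1.1s, 5s, and 9s timestamps.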
Note: the question is inherently problematic. For example, if Timestamp is something like pd.date_range('2023-01-01', '2023-01-02', freq='.9s'), every consecutive gap is below the threshold, so you'd have just one big group.
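A small sketch of that caveat, using a shortened range of 10 stamps and freq="900ms" (assumed here to be equivalent to the '.9s' spelling): because the grouping chains from row to row, one block can span far more than 1s.

```python
import pandas as pd

# Ten evenly spaced stamps, 0.9s apart (shortened, hypothetical version of the range above).
stamps = pd.Series(pd.date_range("2023-01-01", periods=10, freq="900ms"))

threshold = pd.Timedelta("1s")
# Every gap is 0.9s < 1s, so no row ever starts a new block.
blocks = stamps.diff().ge(threshold).cumsum()

n_groups = blocks.nunique()               # number of distinct blocks
span = stamps.iloc[-1] - stamps.iloc[0]   # time covered by the data
print(n_groups, span)
```

Here all ten rows fall into a single block even though the first and last stamps are 8.1s apart.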