Pandas compute time delta on filtered datasets

If I make up some time series data:

import pandas as pd
import numpy as np
from numpy.random import seed

# seed random number generator
seed(1)

time = pd.date_range('6/28/2021', periods=100, freq='1min')
df = pd.DataFrame(np.random.randint(100, size=100), index=time, columns=['data'])

df.plot(figsize=(25,8))

This will plot:

[figure: line plot of the random 'data' column over time]

And then filter the data to keep only the rows where data is above 50:

df = df.loc[df['data'] > 50]

How do I compute the time delta for when the data is above the value of 50? ONLY above the value of 50. For example, if I do this:

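# Compute the time delta between consecutive rows, in minutes
# (pandas 2.x rejects astype('timedelta64[m]'), so convert via seconds instead)
df['time_delta'] = df.index.to_series().diff().dt.total_seconds() / 60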

df.time_delta.sum()

I don't think the sum is correct, as it will include time deltas that span periods when the data was below 50. Hopefully that makes sense: I ONLY want the time delta for when the value was above 50.
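The concern can be reproduced on a tiny hand-made frame. This is only a minimal sketch, assuming six one-minute samples with made-up values (not the random data above); the names `filtered` and `gaps` are illustrative:

```python
import pandas as pd

# Six one-minute samples; the values are invented for illustration.
time = pd.date_range("2021-06-28", periods=6, freq="1min")
df = pd.DataFrame({"data": [10, 60, 70, 20, 30, 80]}, index=time)

# Filter first, then diff the remaining timestamps.
filtered = df.loc[df["data"] > 50]
gaps = filtered.index.to_series().diff()

print(gaps)
print(gaps.sum())
```

Only three rows are above 50, yet the sum comes out as 4 minutes: the step from 00:02 to 00:05 is 3 minutes long and covers two rows that were below 50.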

>Solution :

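If I understand correctly, you want the following, applied to the original DataFrame before filtering out the rows at or below 50: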

df["timedelta"] = df.index.to_series().diff().where(df["data"].gt(50))

>>> df["timedelta"].sum()
Timedelta('0 days 00:44:00')

This should be correct because there are exactly 44 rows where "data" is above 50, and each of these corresponds to a 1-minute time difference from the previous row:

>>> df["data"].gt(50).sum()
44
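
The same idea can be checked by hand on a small frame. A minimal sketch, assuming six one-minute samples with made-up values; the names `above`, `step` and `total` are illustrative and not part of the original answer:

```python
import pandas as pd

# Six one-minute samples; the values are invented for illustration.
time = pd.date_range("2021-06-28", periods=6, freq="1min")
df = pd.DataFrame({"data": [10, 60, 70, 20, 30, 80]}, index=time)

# Diff the full, unfiltered index so every step is one sample interval.
step = df.index.to_series().diff()

# Keep a step only where the row itself is above the threshold.
above = df["data"].gt(50)
df["timedelta"] = step.where(above)

total = df["timedelta"].sum()
print(total)
print(above.sum())
```

Here three rows are above 50 and the total is 3 minutes. Note that the first row always has a NaT diff, so if the first sample were above 50 it would not contribute to the sum.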