Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

find the earliest time between timestamps where a column value is below a certain value

I have a pandas data frame with the following structure:

timestamp,y
2019-08-01 00:00:00,872.0
2019-08-01 00:15:00,668.0
2019-08-01 00:30:00,604.0
2019-08-01 00:45:00,788.0
2019-08-01 01:00:00,608.0
2019-08-01 01:15:00,692.0
2019-08-01 01:30:00,716.0
2019-08-01 01:45:00,692.0
2019-08-01 02:00:00,672.0
2019-08-01 02:15:00,636.0
2019-08-01 02:30:00,596.0
2019-08-01 02:45:00,748.0
...

What I would like to do is for every time period in this dataframe which is between 6 PM and 5 AM, I would like to know the timestamp when the yvalue drops below a certain threshold.

I was thinking of doing the following pseudocode:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

timestamps = list()
for _, row in df.iterrows():
    found = False
    current = row['timestamp']
    val = row['y']
    if current is between 6 PM and 5 AM:
        if not found and value < threshold:
            found = True
            timestamps.append(current)  

But this seems quite ugly and prone to errors and I wonder if there is a more succint pandaish way to do this?

>Solution :

Set up to use df.between_time, by making it a DatetimeIndex and add whatever optional filters you’d like:

df.timestamp = pd.to_datetime(df.timestamp)
df = df.set_index('timestamp')

threshold = 700
out = df.between_time('01:00', '02:00')[lambda x: x.y < threshold]
print(out)

Output:

                         y
timestamp
2019-08-01 01:00:00  608.0
2019-08-01 01:15:00  692.0
2019-08-01 01:45:00  692.0
2019-08-01 02:00:00  672.0

Then you can resample, and take the first line from each. This would be the timestamp where it first dropped below the threshold. Here it is daily, I’m not sure how you’d do 6pm to 5am, but I’m sure it’s possible.

out.resample('d').first()

# Output:
                y
timestamp
2019-08-01  608.0

df.timestamp = pd.to_datetime(df.timestamp)
df = (df.set_index('timestamp')
        .between_time('18:00', '05:00')
        [lambda x: x.y < threshold]
        .resample('d')
        .first())
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading