Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Subtracting times in a csv for a row by row basis in Python

I have a CSV that’s few thousand rows long. It contains data sent from various devices. They should transmit frequently (every 10 minutes) however sometimes there is a lag. I’m trying to write a program that will highlight all instances where the delay between two readings is greater than 15 minutes

I’ve made a functional code that works, but with this code I first have to manually edit the CSV to change the "eventTime" variable from time format (e.g. 03:22:00) to a float value based on 1/24 (e.g. 03:22:00 becomes 0.14027). Similarly, the 15 minute interval becomes 0.01042 (15/(60*24))

import pandas as pd

df = pd.read_csv('file.csv')
df2 = pd.DataFrame()
deviceID = df["deviceId"].unique().tolist()
threshold = 0.01042
for id_no in range(0, len(deviceID)):
    subset = df[df.deviceId == deviceID[id_no]]
    for row in range(len(subset)-1):
        difference = subset.iloc[row, 1] - subset.iloc[row+1, 1]
        if difference > threshold:
            df2 = df2.append(subset.iloc[row])
            df2 = df2.append(subset.iloc[row+1])

df2.to_csv('file2.csv)

This works, and I can open the CSV in excel and manually change the float values back to time format, but when I might be dealing with a few hundred CSV files, this becomes impractical,

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I’ve tried this below

import pandas as pd
from datetime import datetime

df = pd.read_csv('file.csv')
df2 = pd.DataFrame()
deviceID = df["deviceId"].unique().tolist()
df['eventTime'].apply(lambda x: datetime.strptime(x, "%H:%M:%S"))
threshold = datetime.strptime("00:15:00", '%H:%M:%S')
for id_no in range(0, len(deviceID)):
    subset = df[df.deviceId == deviceID[id_no]]
    for row in range(len(subset)-1):
        difference = datetime.strptime(subset.iloc[row, 1],'%H:%M:%S') - datetime.strptime(subset.iloc[row+1, 1], '%H:%M:%S')
        if difference > threshold:
            df2 = df2.append(subset.iloc[row])
            df2 = df2.append(subset.iloc[row+1])

df2.to_csv('file2.csv')

but I get the following error:

if difference > threshold:
TypeError: '>' not supported between instances of 'datetime.timedelta' and 'datetime.datetime'

The data looks like this:

| eventTime|  deviceId|
| -------- | -------- |
| 15:30:00 |  11234889|
| 15:45:00 |  11234889|
| 16:00:00 |  11234889|

and for different IDs

| eventTime|  deviceId|
| -------- | -------- |
| 15:30:00 |  11234890|
| 15:45:00 |  11234890|
| 16:00:00 |  11234890|

>Solution :

threshold is datetime and you compare it to timedelta object (difference). Did you mean:

from datetime import timedelta
...
threshold = datetime.timedelta(minutes=15)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading