Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to use Pandas diff() with other columns value as period?

I have a dataframe looking like this:

Timestamp description
0 Parser starts
12 parsing
24 parsing
26 Parsing finished
28 Parser starts
45 Parsing finished

I want to calculate the how long each parse took. I therefore want the difference between timestamps where (df['description'] == 'Parsing finished') and (df['description'] == 'Parser starts'). I know I can use pd.diff() but I can only find how to use it with a set period. I want to set the period based on the description value.

Expected output:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Timestamp description difference
0 Parser starts NaN
12 parsing NaN
24 parsing NaN
26 Parsing finished 26
28 Parser starts NaN
45 Parsing finished 17

I thought of looping over each row but this seems counterintuitive when using Pandas.

EDIT: updated wrong value thanks to comment of @mozway. Made myself more clear with below table:

Timestamp description
0 Parser starts
12 parsing
24 parsing
26 Parsing finished
27 Uploading results
28 Parser starts
45 Parsing finished

I do not want the timestamp of uploading results (or other values in between parser starts and parsing finished) to be part of the diff. Therefore grouping on parser starts does not provide the result Im looking for.

>Solution :

You can use a groupby:

import numpy as np

# make groups starting with "Parser starts"
group = df['description'].eq('Parser starts').cumsum()

# set up the grouper
g = df.groupby(group)

# update last value with ptp (= max - min)
df.loc[g.cumcount(ascending=False).eq(0),
       'difference'] = g['Timestamp'].transform(np.ptp)

output:

   Timestamp       description  difference
0          0     Parser starts         NaN
1         12           parsing         NaN
2         24           parsing         NaN
3         26  Parsing finished        26.0
4         28     Parser starts         NaN
5         45  Parsing finished        17.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading