Count successive occurrences in a time series without df.iterrows()

Given a dataframe with a time series like this:

time event
2020-01-01 12:00:00 1
2020-01-01 12:00:01 NaN
2020-01-01 12:00:02 1
2020-01-01 12:00:03 1
2020-01-01 12:00:04 NaN
2020-01-01 12:00:05 NaN
2020-01-01 12:00:06 1
2020-01-01 12:00:07 NaN

I would like to get a summary-dataframe like:

event_id time_start time_stop
1 2020-01-01 12:00:00 2020-01-01 12:00:01
2 2020-01-01 12:00:02 2020-01-01 12:00:04
3 2020-01-01 12:00:06 2020-01-01 12:00:07

In a step-by-step approach, I think I should first add an empty column 'event_i',
then fill in the index of the events (1, 2, 3, …).
Once this works, I can try to create the summary dataframe.
I am already stuck at assigning the index to the events.

I could work something out with df.iterrows(), but that is not recommended.
How can I vectorize this indexing procedure?

import pandas as pd
import numpy as np

# define a mini-dataset as an example
data = {'time': ['2020-01-01 12:00:00', '2020-01-01 12:00:01', '2020-01-01 12:00:02',
                 '2020-01-01 12:00:03', '2020-01-01 12:00:04', '2020-01-01 12:00:05',
                 '2020-01-01 12:00:06', '2020-01-01 12:00:07', '2020-01-01 12:00:08',
                 '2020-01-01 12:00:09', '2020-01-01 12:00:10'],
        'event': [1, np.nan, 1, 1, np.nan, np.nan, 1, np.nan, 1, 1, np.nan]}
df = pd.DataFrame(data)
df['time'] = pd.to_datetime(df['time'])

# give a sequential number to each event
df['event_i'] = np.nan

# for each event-number, group by and stack: event_id,  time_start time_stop
# ...
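One common vectorized pattern for the numbering step the question is stuck on (a sketch, not taken from the post itself): mark the rows where a run of events begins, take a cumulative sum of those start markers, and keep the numbering only on event rows.

```python
import pandas as pd
import numpy as np

data = {'time': pd.date_range('2020-01-01 12:00:00', periods=11, freq='s'),
        'event': [1, np.nan, 1, 1, np.nan, np.nan, 1, np.nan, 1, 1, np.nan]}
df = pd.DataFrame(data)

# A new event starts wherever 'event' is non-NaN and the previous row was NaN
is_event = df['event'].notna()
starts = is_event & ~is_event.shift(fill_value=False)

# The cumulative sum of the start markers numbers the events 1, 2, 3, ...;
# where() keeps that numbering only on the event rows
df['event_i'] = starts.cumsum().where(is_event)
```

This yields event_i = 1, NaN, 2, 2, NaN, NaN, 3, NaN, 4, 4, NaN for the example data, without any explicit iteration.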

>Solution:

Code

# Create a grouper to mark the intervals of successive events
m = df['event'].isna()
b = m.cumsum().mask(m).ffill(limit=1)

# group the time column by the grouper and aggregate with first and last
df1 = df['time'].groupby(b).agg(['first', 'last']).reset_index(drop=True)

# Create event id column
df1['event_id'] = df1.index + 1

                first                last  event_id
0 2020-01-01 12:00:00 2020-01-01 12:00:01         1
1 2020-01-01 12:00:02 2020-01-01 12:00:04         2
2 2020-01-01 12:00:06 2020-01-01 12:00:07         3
3 2020-01-01 12:00:08 2020-01-01 12:00:10         4
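To match the column names requested in the question (event_id, time_start, time_stop), the same pipeline can be wrapped up as, for example, the following self-contained sketch. Note how ffill(limit=1) pulls the first NaN row after each run into that run, so its timestamp becomes the run's stop time:

```python
import pandas as pd
import numpy as np

data = {'time': pd.date_range('2020-01-01 12:00:00', periods=11, freq='s'),
        'event': [1, np.nan, 1, 1, np.nan, np.nan, 1, np.nan, 1, 1, np.nan]}
df = pd.DataFrame(data)

# Grouper: each run of NaNs raises the cumsum, mask() blanks the NaN rows,
# and ffill(limit=1) attaches only the first NaN after a run to that run
m = df['event'].isna()
b = m.cumsum().mask(m).ffill(limit=1)

summary = (df['time'].groupby(b).agg(['first', 'last'])
           .reset_index(drop=True)
           .rename(columns={'first': 'time_start', 'last': 'time_stop'}))
summary.insert(0, 'event_id', summary.index + 1)
```

With the example data this produces four rows, one per run of successive events, in the event_id / time_start / time_stop layout the question asked for.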