Given a dataframe with a time series like this:
| time | event |
|---|---|
| 2020-01-01 12:00:00 | 1 |
| 2020-01-01 12:00:01 | NaN |
| 2020-01-01 12:00:02 | 1 |
| 2020-01-01 12:00:03 | 1 |
| 2020-01-01 12:00:04 | NaN |
| 2020-01-01 12:00:05 | NaN |
| 2020-01-01 12:00:06 | 1 |
| 2020-01-01 12:00:07 | Nan |
I would like to get a summary-dataframe like:
| event_id | time_start | time_stop |
|---|---|---|
| 1 | 2020-01-01 12:00:00 | 2020-01-01 12:00:01 |
| 2 | 2020-01-01 12:00:02 | 2020-01-01 12:00:04 |
| 3 | 2020-01-01 12:00:06 | 2020-01-01 12:00:07 |
In a step-by-step approach I think I should first add an empty column ‘event_i’,
then fill in the index of the events (1,2,3,…)
Once this works, I can try to create a summary-dataframe.
I am already stuck at giving the index to the events.
I could work something out with df.iterrows() but that is not recommended.
How can I vectorize this indexing-procedure?
import pandas as pd
import numpy as np
# define mini-dataset as an example
data= {'time': ['2020-01-01 12:00:00', '2020-01-01 12:00:01', '2020-01-01 12:00:02','2020-01-01 12:00:03',
'2020-01-01 12:00:04','2020-01-01 12:00:05', '2020-01-01 12:00:06', '2020-01-01 12:00:07',
'2020-01-01 12:00:08', '2020-01-01 12:00:09','2020-01-01 12:00:10'],
'event': [1,np.nan,1,1,np.nan,np.nan,1,np.nan,1,1,np.nan]}
df = pd.DataFrame(data)
df['time']=pd.to_datetime((df['time']))
# give a sequential number to each event
df['event_i'] = np.nan
# for each event-number, group by and stack: event_id, time_start time_stop
# ...
>Solution :
Code
# Create a grouper to mark the intervals of successive events
m = df['event'].isna()
b = m.cumsum().mask(m).ffill(limit=1)
# group the time column by the grouper and agregate with first and last
df1 = df['time'].groupby(b).agg(['first', 'last']).reset_index(drop=True)
# Create event id column
df1['event_id'] = df1.index + 1
first last event_id
0 2020-01-01 12:00:00 2020-01-01 12:00:01 1
1 2020-01-01 12:00:02 2020-01-01 12:00:04 2
2 2020-01-01 12:00:06 2020-01-01 12:00:07 3
3 2020-01-01 12:00:08 2020-01-01 12:00:10 4