I have a dataframe that looks like this:
import pandas as pd
df = pd.DataFrame({'id': [1,1,1,1,2,2,2,2],
                   'time': [1,2,3,4,1,2,5,6],
                   'is': [0,1,0,0,0,1,0,0]})
   id  time  is
0   1     1   0
1   1     2   1
2   1     3   0
3   1     4   0
4   2     1   0
5   2     2   1
6   2     5   0
7   2     6   0
The dataframe is sorted by id and time.
For each id, I want to select only the rows that satisfy at least one of these two conditions:
- is==1
- the row comes right after a row where is==1, and time has no gap between these 2 rows.
The resulting dataframe should look like this:
pd.DataFrame({'id': [1,1,2],
'time': [2,3,2],
'is': [1,0,1]})
How could I do that?
Solution:
You can use groupby.shift together with groupby.diff:
# form groups
g = df.groupby('id')
# select rows with is==1 and the next ones (per group)
m1 = df['is'].eq(1) | g['is'].shift().eq(1)
# select rows whose time is exactly 1 more than the previous row's time (per group)
m2 = g['time'].diff().eq(1)
# keep the rows that satisfy both masks
out = df[m1 & m2]
Output:
   id  time  is
1   1     2   1
2   1     3   0
5   2     2   1
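As a self-contained check, here is a minimal sketch that rebuilds the sample data and compares the result with the expected frame from the question. It also covers one case the sample does not exercise: with the mask `m1 & m2`, a row with is==1 is dropped when it is the first row of its id or follows a gap in time. If such rows should be kept (one reading of "at least one of the two conditions", which is an assumption here), the gap test can be limited to the "row after" part. The helper name `select_rows` and the `edge` frame are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id':   [1, 1, 1, 1, 2, 2, 2, 2],
                   'time': [1, 2, 3, 4, 1, 2, 5, 6],
                   'is':   [0, 1, 0, 0, 0, 1, 0, 0]})

def select_rows(frame):
    """Hypothetical helper: keep rows with is == 1, plus rows directly after
    an is == 1 row (same id) whose time is exactly 1 more than that row's."""
    g = frame.groupby('id')
    is_one = frame['is'].eq(1)          # condition 1
    after_one = g['is'].shift().eq(1)   # previous row in the group has is == 1
    no_gap = g['time'].diff().eq(1)     # time advanced by exactly 1
    return frame[is_one | (after_one & no_gap)]

# On the sample data this selects the same rows as m1 & m2.
out = select_rows(df).reset_index(drop=True)
expected = pd.DataFrame({'id': [1, 1, 2],
                         'time': [2, 3, 2],
                         'is': [1, 0, 1]})
print(out.equals(expected))

# Made-up edge case: is == 1 on the first row, and again right after a gap.
edge = pd.DataFrame({'id': [1, 1, 1],
                     'time': [1, 2, 4],
                     'is': [1, 0, 1]})
g = edge.groupby('id')
m1 = edge['is'].eq(1) | g['is'].shift().eq(1)
m2 = g['time'].diff().eq(1)
edge_answer = edge[m1 & m2]        # keeps only the middle row
edge_variant = select_rows(edge)   # keeps all three rows
```

Which of the two behaviours is right depends on what should happen to an is==1 row that has no directly preceding time step. On the question's sample data both give the same rows.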