Select rows that come after a condition, under a condition in pandas

I have a dataframe that looks like this:

import pandas as pd
df = pd.DataFrame({'id': [1,1,1,1,2,2,2,2],
                   'time': [1,2,3,4,1,2,5,6],
                   'is': [0,1,0,0,0,1,0,0]})

   id  time  is
0   1     1   0
1   1     2   1
2   1     3   0
3   1     4   0
4   2     1   0
5   2     2   1
6   2     5   0
7   2     6   0

The dataframe is sorted by id and time.

For each id, I want to select only the rows that satisfy at least one of these two conditions:

  • is==1
  • the row comes immediately after a row where is==1, and there is no gap in time between the two rows (time increases by exactly 1).

The resulting dataframe should look like this:

pd.DataFrame({'id': [1,1,2], 
             'time': [2,3,2],
             'is': [1,0,1]})

How could I do that?

Solution:

You can use groupby.shift:

# form groups
g = df.groupby('id')

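# select rows with is==1 and the row right after each of them (per group)
m1 = df['is'].eq(1) | g['is'].shift().eq(1)
# select rows whose time is exactly 1 more than the previous row's (per group)
m2 = g['time'].diff().eq(1)

# keep rows that satisfy both masks
out = df[m1 & m2]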

Output:

   id  time  is
1   1     2   1
2   1     3   0
5   2     2   1
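
One edge case is worth noting: because the final mask is m1 & m2, a row with is==1 is kept only if its own time is consecutive with the previous row in its group. A row with is==1 that is first in its group, or that follows a gap, would be dropped, although the question says is==1 alone should qualify. The following is a minimal sketch of a variant that applies the time check only to the row after the flag. The helper name select_rows and the edge dataframe are hypothetical, introduced here purely for illustration.

```python
import pandas as pd

def select_rows(df):
    """Keep rows with is==1, plus the next row per id when time has no gap.

    Hypothetical helper; assumes df is sorted by id and time.
    """
    g = df.groupby('id')
    # condition 1: the row itself is flagged
    is_flag = df['is'].eq(1)
    # condition 2: the previous row (same id) is flagged and time advanced by 1
    follows_flag = g['is'].shift().eq(1) & g['time'].diff().eq(1)
    return df[is_flag | follows_flag]

# data from the question
df = pd.DataFrame({'id': [1,1,1,1,2,2,2,2],
                   'time': [1,2,3,4,1,2,5,6],
                   'is': [0,1,0,0,0,1,0,0]})
out = select_rows(df)

# made-up edge case: a flagged row opens the group, another follows a gap
edge = pd.DataFrame({'id': [3,3,3],
                     'time': [1,2,4],
                     'is': [1,0,1]})
edge_out = select_rows(edge)
```

On the data from the question this selects the same rows as the answer above (index 1, 2 and 5). On the edge dataframe it keeps all three rows, whereas m1 & m2 would keep only the middle one.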