Delete all dataframe rows that are associated with condition

Say I have this dataframe:

import pandas as pd

df = {'ID': [1, 1, 1, 1, 1, 1, 1, 2, 2],
      'x': [76.551, 79.529, 78.336, 77, 76.02, 79.23, 77.733, 79.249, 76.077],
      'y': [151.933, 152.945, 153.970, 119.369, 120.615, 118.935, 119.115, 152.004, 153.027],
      'position': ['start', 'end', 'start', 'NA', 'NA', 'NA', 'end', 'start', 'end']}
df = pd.DataFrame(df)
df
   ID       x        y position
0   1  76.551  151.933    start
1   1  79.529  152.945      end
2   1  78.336  153.970    start
3   1  77.000  119.369       NA
4   1  76.020  120.615       NA
5   1  79.230  118.935       NA
6   1  77.733  119.115      end
7   2  79.249  152.004    start
8   2  76.077  153.027      end

I want to delete every row of any trajectory whose end point has an x value within a given range. I can select the end points that meet the condition with:

df[(df['position'] == 'end') & (df['x'] > 75) & (df['x'] < 78)]

But how do I remove all the rows that belong to the same trajectories as those end points?

The expected output would look like:

   ID       x        y position
0   1  76.551  151.933    start
1   1  79.529  152.945      end

EDIT: For context, these are trajectories from different animals (each with a particular ID). If an animal's end coordinate lies between particular x-axis values, I want to remove that animal's whole trajectory from the model.

Solution:

You can use a boolean mask:

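# True for end points whose x lies strictly between 75 and 78
m = (df['position'] == 'end') & (df['x'] > 75) & (df['x'] < 78)
# Spread that flag over each trajectory, then keep only the unflagged rows
out = df[~m.groupby(df['position'].eq('start').cumsum()).transform('max')]
print(out)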

# Output
   ID       x        y position
0   1  76.551  151.933    start
1   1  79.529  152.945      end

As in my answer to your previous question, df['position'].eq('start').cumsum() creates virtual groups that identify the individual trajectories: each 'start' row opens a new group.
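To make the grouping easier to follow, here is a minimal sketch that splits the one-liner into its intermediate steps. The names traj and drop are introduced here for illustration only, and the sketch assumes that every trajectory begins with a 'start' row, as in the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 1, 1, 2, 2],
    'x': [76.551, 79.529, 78.336, 77, 76.02, 79.23, 77.733, 79.249, 76.077],
    'y': [151.933, 152.945, 153.970, 119.369, 120.615, 118.935, 119.115, 152.004, 153.027],
    'position': ['start', 'end', 'start', 'NA', 'NA', 'NA', 'end', 'start', 'end'],
})

# Step 1: flag the end points whose x lies strictly between 75 and 78.
m = (df['position'] == 'end') & (df['x'] > 75) & (df['x'] < 78)

# Step 2: label each row with its trajectory. Every 'start' row increments
# the running count, so the labels here are 1, 1, 2, 2, 2, 2, 2, 3, 3.
traj = df['position'].eq('start').cumsum()

# Step 3: within each trajectory, the max of the boolean flags is True if
# any row was flagged; transform broadcasts that value back to every row.
drop = m.groupby(traj).transform('max')

# Step 4: keep only the rows of trajectories that were not flagged.
out = df[~drop]
print(out)
```

Note that the grouping is by trajectory rather than by ID: ID 1 contains two trajectories in the sample data, and only the second one is removed.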
