Delete all DataFrame rows associated with a condition

March 28, 2022

Say I have this dataframe:

import pandas as pd

df = {'ID': [1, 1, 1, 1, 1, 1, 1, 2, 2],
      'x': [76.551, 79.529, 78.336, 77, 76.02, 79.23, 77.733, 79.249, 76.077],
      'y': [151.933, 152.945, 153.970, 119.369, 120.615, 118.935, 119.115, 152.004, 153.027],
      'position': ['start', 'end', 'start', 'NA', 'NA', 'NA', 'end', 'start', 'end']}
df = pd.DataFrame(df)
df
   ID       x        y position
0   1  76.551  151.933    start
1   1  79.529  152.945      end
2   1  78.336  153.970    start
3   1  77.000  119.369       NA
4   1  76.020  120.615       NA
5   1  79.230  118.935       NA
6   1  77.733  119.115      end
7   2  79.249  152.004    start
8   2  76.077  153.027      end

I want to delete every row of a trajectory whose end point has an x value within a certain range. I can select the end points that should trigger the removal with:

df[(df['position'] == 'end') & (df['x'] > 75) & (df['x'] < 78)]

but how do I remove all the rows that belong to the same trajectories as those end points?

Output would look like:

   ID       x        y position
0   1  76.551  151.933    start
1   1  79.529  152.945      end

EDIT: For context, these are trajectories of different animals, each identified by its ID. If an animal's end coordinate lies within a particular x-axis range, I want to remove that animal's whole trajectory from the model.
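Negating the row-level mask on its own is not enough, because it only drops the matching end rows and leaves the start and intermediate rows of those trajectories behind. A minimal sketch with the sample data above (the names `mask` and `naive` are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 1, 1, 2, 2],
    'x': [76.551, 79.529, 78.336, 77, 76.02, 79.23, 77.733, 79.249, 76.077],
    'y': [151.933, 152.945, 153.970, 119.369, 120.615, 118.935, 119.115, 152.004, 153.027],
    'position': ['start', 'end', 'start', 'NA', 'NA', 'NA', 'end', 'start', 'end'],
})

# Flags only the end rows whose x lies strictly between 75 and 78 (rows 6 and 8)
mask = (df['position'] == 'end') & (df['x'] > 75) & (df['x'] < 78)

# Negating this mask removes just those two end rows; the start and
# intermediate rows of the same trajectories (2-5 and 7) survive
naive = df[~mask]
print(naive.index.tolist())  # [0, 1, 2, 3, 4, 5, 7]
```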

Solution:

You can build a boolean mask of the offending end points and spread it to every row of their trajectory:

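# Flag the end points whose x lies strictly between 75 and 78
m = (df['position'] == 'end') & (df['x'] > 75) & (df['x'] < 78)
# Label trajectories with a running count of 'start' rows, mark every row of a
# trajectory that contains a flagged end point, and keep the unmarked rows
out = df[~m.groupby(df['position'].eq('start').cumsum()).transform('max')]
print(out)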

# Output
   ID       x        y position
0   1  76.551  151.933    start
1   1  79.529  152.945      end
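If you need to run this for different x-ranges, the same idea can be wrapped in a small function. This is a sketch only: `drop_trajectories`, `x_min` and `x_max` are hypothetical names, and it assumes every trajectory begins with a 'start' row. It uses `transform('any')` instead of `transform('max')`; for a boolean mask both mark a trajectory as soon as one of its rows is True.

```python
import pandas as pd


def drop_trajectories(df: pd.DataFrame, x_min: float, x_max: float) -> pd.DataFrame:
    """Drop every trajectory whose 'end' row has x strictly between x_min and x_max.

    Assumes each trajectory begins with a 'start' row, so a running count of
    'start' rows gives one label per trajectory.
    """
    # One integer label per trajectory: increments at each 'start' row
    trajectory = df['position'].eq('start').cumsum()
    # True only on the end rows that fall inside the excluded x-range
    bad_end = df['position'].eq('end') & df['x'].gt(x_min) & df['x'].lt(x_max)
    # Copy the flag to every row of the same trajectory
    bad_traj = bad_end.groupby(trajectory).transform('any')
    return df[~bad_traj]


df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 1, 1, 2, 2],
    'x': [76.551, 79.529, 78.336, 77, 76.02, 79.23, 77.733, 79.249, 76.077],
    'y': [151.933, 152.945, 153.970, 119.369, 120.615, 118.935, 119.115, 152.004, 153.027],
    'position': ['start', 'end', 'start', 'NA', 'NA', 'NA', 'end', 'start', 'end'],
})

out = drop_trajectories(df, 75, 78)
print(out)
```

With the range (75, 78) this keeps rows 0 and 1, matching the output above; a range that contains no end point leaves the DataFrame unchanged.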

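As in your previous question, I use df['position'].eq('start').cumsum() to create virtual groups that identify the different trajectories: the running count goes up by one at every 'start' row, so a start row and all the rows up to the next start share the same label. transform('max') then copies the flag of a trajectory's end point to all of its rows, and ~ keeps only the trajectories that were not flagged.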
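To make the intermediate steps visible, here is a small sketch on the sample data that prints the trajectory labels and applies the same transform('max') step (`groups` and `spread` are illustrative names, not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame({
    'x': [76.551, 79.529, 78.336, 77, 76.02, 79.23, 77.733, 79.249, 76.077],
    'position': ['start', 'end', 'start', 'NA', 'NA', 'NA', 'end', 'start', 'end'],
})

# True on every 'start' row; cumsum counts True as 1, so the label
# increases by one at the beginning of each trajectory
groups = df['position'].eq('start').cumsum()
print(groups.tolist())  # [1, 1, 2, 2, 2, 2, 2, 3, 3]

# Row-level flag on the offending end points (rows 6 and 8)
m = (df['position'] == 'end') & (df['x'] > 75) & (df['x'] < 78)

# The max of a boolean group is True as soon as any member is True,
# so trajectories 2 and 3 are flagged on every one of their rows
spread = m.groupby(groups).transform('max')
```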