
Consecutive rows meeting a condition in pandas

I have a pandas dataframe like this:

   col_name
0        -1
1        -1
2        -3
3         2
4         1
5        -3
6        -2
7         4
8         3
9         5

that could be created with the code:

import pandas as pd

df = pd.DataFrame(
    {
        'col_name': [-1, -1, -3, 2, 1, -3, -2, 4, 3, 5]
    }
)

I want to find the rows where the row itself and the x rows before it have positive values, the y rows before those x rows have negative values, and the last of these y rows (which is actually y rows before the current row) has the least value compared to the k rows before it.

So, for x=1, y=2 and k=2 the output is:

    col_name
4       1

(Index 8 is not in the output: although it and the row before it are positive, and the two rows before those are negative, the last negative row, index 6 (-2), does not have the least value compared to the two rows before it (1 and -3).)
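To make the definition concrete, here is a deliberately loop-based reference check, meant only for testing a vectorized solution against, not as the answer itself. `row_matches` is a hypothetical helper, and it assumes the "last" negative row is the one immediately before the positive run (x + 1 rows back), which is how index 6 is treated for index 8 above.

```python
import pandas as pd


def row_matches(s: pd.Series, i: int, x: int, y: int, k: int) -> bool:
    """Reference check for one positional index i (hypothetical helper)."""
    # Not enough history for the positive run, the negative run, or the k rows.
    if i - x - y < 0 or i - x - 1 - k < 0:
        return False
    vals = s.to_numpy()
    positive = vals[i - x:i + 1]             # the row itself and the x rows before it
    negative = vals[i - x - y:i - x]         # the y rows before those
    last_neg = vals[i - x - 1]               # last of the y negative rows (assumed x + 1 back)
    earlier = vals[i - x - 1 - k:i - x - 1]  # the k rows before last_neg
    return bool((positive > 0).all() and (negative < 0).all()
                and (last_neg < earlier).all())


df = pd.DataFrame({'col_name': [-1, -1, -3, 2, 1, -3, -2, 4, 3, 5]})
hits = [i for i in range(len(df)) if row_matches(df['col_name'], i, x=1, y=2, k=2)]
print(hits)  # [4]
```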

Also, I would prefer a solution that does not use any for-loops.

Does anyone have an idea how to do this?

Solution:

Your explanation is not entirely clear, so here is a base solution that you can adapt to your needs; it should not be hard to adjust.

We can achieve this by shifting the series and combining several boolean masks.

First, create your shifts:

x, y, k = 1, 2, 2
m = df.assign(**{f'col_name_shift_{i}': df.col_name.shift(i)
                 for i in range(1, x+y+1)})

The only loop here is a small comprehension over the shift amounts (x + y = 3 iterations), not over the rows. This gives:

   col_name  col_name_shift_1  col_name_shift_2  col_name_shift_3
0        -1               NaN               NaN               NaN
1        -1              -1.0               NaN               NaN
2        -3              -1.0              -1.0               NaN
3         2              -3.0              -1.0              -1.0
4         1               2.0              -3.0              -1.0
5        -3               1.0               2.0              -3.0
6        -2              -3.0               1.0               2.0
7         4              -2.0              -3.0               1.0
8         3               4.0              -2.0              -3.0
9         5               3.0               4.0              -2.0

Now it is just a matter of checking, row-wise, which rows meet your requirements.

For example,

I want to find the rows that x rows before them and the row itself have positive values

m1 = m.iloc[:, range(x+1)] > 0

and y rows before those x rows have negative values

m2 = m.iloc[:, range(x+1, x+y+1)] < 0
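As a variation not taken from the answer, the first two conditions can also be written with rolling windows, so the window lengths do not dictate how many shift columns you build: a 0/1 series summed over a window of length n equals n exactly when every row in that window meets the condition. The names `pos_run` and `neg_run` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col_name': [-1, -1, -3, 2, 1, -3, -2, 4, 3, 5]})
x, y = 1, 2
s = df['col_name']

# The row itself and the x rows before it are positive:
# the window of length x + 1 ending at each row contains only positives.
pos_run = s.gt(0).astype(int).rolling(x + 1).sum().eq(x + 1)

# The y rows before those are negative: an all-negative window of length y,
# shifted down by x + 1 so that it ends just before the positive run.
neg_run = (s.lt(0).astype(int).rolling(y).sum().eq(y)
           .shift(x + 1, fill_value=False))

rows = df[pos_run & neg_run].index.tolist()
print(rows)  # [4, 8]
```

Index 8 still passes at this stage; it is the third condition that removes it.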

and also the last row of these y rows which is actually y rows before the current row has the least value compared to k rows before of it.

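last_neg = df.col_name.shift(x + 1)  # last negative row, x + 1 rows back (index 6 for row 8)
m3 = pd.concat([df.col_name.shift(x + 1 + j) for j in range(1, k + 1)], axis=1).gt(last_neg, axis=0)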
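The third condition can similarly be sketched with a rolling minimum, again assuming the last negative row is x + 1 rows back. `s.rolling(k).min()` gives the minimum of each length-k window, and shifting it by x + 2 aligns the k rows preceding the last negative row with the current row.

```python
import pandas as pd

df = pd.DataFrame({'col_name': [-1, -1, -3, 2, 1, -3, -2, 4, 3, 5]})
x, k = 1, 2
s = df['col_name']

# Value of the last negative row, assumed to be x + 1 rows back.
last_neg = s.shift(x + 1)

# Minimum of the k rows just before it (rows x + 2 .. x + k + 1 back).
prev_min = s.rolling(k).min().shift(x + 2)

# Strictly less than every one of those k rows; comparisons against NaN give False.
least = last_neg.lt(prev_min)
rows = df[least].index.tolist()
print(rows)  # [4, 7]
```

Index 7 passes this check but fails the positivity condition, so combining it with the two run masks leaves only index 4.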

Then concatenate all the boolean masks,

mask = pd.concat([m1, m2, m3], axis=1)

and select the rows where every condition holds:

df.loc[mask.all(axis=1)]
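Putting the pieces together, here is a sketch of the whole shift-based approach with enough shift columns for the k comparison rows and the last negative row taken x + 1 rows back (an assumption based on the worked example). `find_rows` is a hypothetical name; it uses integer column labels so each condition can be written as a label slice instead of an `iloc` range.

```python
import pandas as pd


def find_rows(df: pd.DataFrame, x: int, y: int, k: int, col: str = 'col_name') -> pd.DataFrame:
    """Shift-based version of the approach above (hypothetical helper).

    Assumes the "last" negative row is x + 1 rows before the current row.
    """
    s = df[col]
    # Shifts up to x + y cover the runs; up to x + 1 + k cover the k rows
    # before the last negative row. Column label i holds s shifted by i.
    max_shift = max(x + y, x + 1 + k)
    m = pd.concat({i: s.shift(i) for i in range(max_shift + 1)}, axis=1)

    m1 = m.loc[:, 0:x].gt(0)                             # row itself and x rows before
    m2 = m.loc[:, x + 1:x + y].lt(0)                     # the y rows before those
    m3 = m.loc[:, x + 2:x + 1 + k].gt(m[x + 1], axis=0)  # k rows before last negative
    mask = pd.concat([m1, m2, m3], axis=1).all(axis=1)
    return df.loc[mask]


df = pd.DataFrame({'col_name': [-1, -1, -3, 2, 1, -3, -2, 4, 3, 5]})
print(find_rows(df, x=1, y=2, k=2).index.tolist())  # [4]
```

The only Python-level iteration is the dict comprehension over shift amounts, whose length grows with x, y and k rather than with the number of rows.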

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading