Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Finding the first row that meets condtions of a mask and select one row after it

This is my dataframe:

import pandas as pd 
df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
        'c': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    }
)

And this is the output that I want. I want to create column x:

     a      b  c    x
0   100   1000  a  NaN
1  1123  11123  b  NaN
2   123   1123  c  NaN
3   100      0  d  NaN
4     1     55  e  e
5     0      0  f  NaN
6     1      1  g  NaN

By using a mask:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

mask = (
    (df.a > df.b)
)

First of all I need to find the first occurrence of this mask which in my example is row number 3. Then I want to move one row below it and use the value in column c to create column x.

So in my example, the first occurrence of mask is row 3. One row after it is row 4. That is why e is selected for column x.

Note that in row 4 which is one row after the mask, no condition is needed. For example for row 4, It is NOT necessary that df.a > df.b.

This is what I have tried:

df.loc[mask.cumsum().eq(1) & mask, 'x'] = df.c.shift(-1)

I provide some additional dfs for convenience to test whether the code works in other examples. For instance what if there are no cases that meet the conditions of mask. In that case I just want a column of NaN for x.

df = pd.DataFrame({'a': [1000, 11230, 12300, 10000, 1000, 10000, 100000], 'b': [1000, 11123, 1123, 0, 55, 0, 1], 'c': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
df = pd.DataFrame({'a': [1, 1, 1, -1, -1, -1, -1], 'b': [1000, 11123, 1123, 0, 55, 0, 1], 'c': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
df = pd.DataFrame({'a': [-1, -1, -1, -1, -1, -1, 100000], 'b': [1000, 11123, 1123, 0, 55, 0, 1], 'c': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})

>Solution :

You can generate a mask that indicates one location past where the first value of a is greater than b:

mask = (df.a > df.b).shift(fill_value=False)
mask = mask & ~mask.cumsum().shift().astype(bool)

You can then use that mask to set the value of x equal to c:

df.loc[mask, 'x'] = df['c']

Output for each of your dfs:

      a      b  c    x
0   100   1000  a  NaN
1  1123  11123  b  NaN
2   123   1123  c  NaN
3   100      0  d  NaN
4     1     55  e    e
5     0      0  f  NaN
6     1      1  g  NaN

        a      b  c    x
0    1000   1000  a  NaN
1   11230  11123  b  NaN
2   12300   1123  c    c
3   10000      0  d  NaN
4    1000     55  e  NaN
5   10000      0  f  NaN
6  100000      1  g  NaN

   a      b  c    x
0  1   1000  a  NaN
1  1  11123  b  NaN
2  1   1123  c  NaN
3 -1      0  d  NaN
4 -1     55  e  NaN
5 -1      0  f  NaN
6 -1      1  g  NaN

        a      b  c    x
0      -1   1000  a  NaN
1      -1  11123  b  NaN
2      -1   1123  c  NaN
3      -1      0  d  NaN
4      -1     55  e  NaN
5      -1      0  f  NaN
6  100000      1  g  NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading