Finding the first row that meets condtions of a mask and select one row after it

Advertisements

This is my dataframe:

import pandas as pd 
df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
        'c': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    }
)

And this is the output that I want. I want to create column x:

     a      b  c    x
0   100   1000  a  NaN
1  1123  11123  b  NaN
2   123   1123  c  NaN
3   100      0  d  NaN
4     1     55  e  e
5     0      0  f  NaN
6     1      1  g  NaN

By using a mask:

mask = (
    (df.a > df.b)
)

First of all I need to find the first occurrence of this mask which in my example is row number 3. Then I want to move one row below it and use the value in column c to create column x.

So in my example, the first occurrence of mask is row 3. One row after it is row 4. That is why e is selected for column x.

Note that in row 4 which is one row after the mask, no condition is needed. For example for row 4, It is NOT necessary that df.a > df.b.

This is what I have tried:

df.loc[mask.cumsum().eq(1) & mask, 'x'] = df.c.shift(-1)

I provide some additional dfs for convenience to test whether the code works in other examples. For instance what if there are no cases that meet the conditions of mask. In that case I just want a column of NaN for x.

df = pd.DataFrame({'a': [1000, 11230, 12300, 10000, 1000, 10000, 100000], 'b': [1000, 11123, 1123, 0, 55, 0, 1], 'c': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
df = pd.DataFrame({'a': [1, 1, 1, -1, -1, -1, -1], 'b': [1000, 11123, 1123, 0, 55, 0, 1], 'c': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
df = pd.DataFrame({'a': [-1, -1, -1, -1, -1, -1, 100000], 'b': [1000, 11123, 1123, 0, 55, 0, 1], 'c': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})

>Solution :

You can generate a mask that indicates one location past where the first value of a is greater than b:

mask = (df.a > df.b).shift(fill_value=False)
mask = mask & ~mask.cumsum().shift().astype(bool)

You can then use that mask to set the value of x equal to c:

df.loc[mask, 'x'] = df['c']

Output for each of your dfs:

      a      b  c    x
0   100   1000  a  NaN
1  1123  11123  b  NaN
2   123   1123  c  NaN
3   100      0  d  NaN
4     1     55  e    e
5     0      0  f  NaN
6     1      1  g  NaN

        a      b  c    x
0    1000   1000  a  NaN
1   11230  11123  b  NaN
2   12300   1123  c    c
3   10000      0  d  NaN
4    1000     55  e  NaN
5   10000      0  f  NaN
6  100000      1  g  NaN

   a      b  c    x
0  1   1000  a  NaN
1  1  11123  b  NaN
2  1   1123  c  NaN
3 -1      0  d  NaN
4 -1     55  e  NaN
5 -1      0  f  NaN
6 -1      1  g  NaN

        a      b  c    x
0      -1   1000  a  NaN
1      -1  11123  b  NaN
2      -1   1123  c  NaN
3      -1      0  d  NaN
4      -1     55  e  NaN
5      -1      0  f  NaN
6  100000      1  g  NaN

Leave a ReplyCancel reply