This is my dataframe:
import pandas as pd
df = pd.DataFrame(
{
'a': [100, 1123, 123, 100, 1, 0, 1],
'b': [1000, 11123, 1123, 0, 55, 0, 1],
'c': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
}
)
And this is the output that I want. I want to create column x
:
a b c x
0 100 1000 a NaN
1 1123 11123 b NaN
2 123 1123 c NaN
3 100 0 d NaN
4 1 55 e e
5 0 0 f NaN
6 1 1 g NaN
By using a mask:
mask = (
(df.a > df.b)
)
First of all I need to find the first occurrence of this mask which in my example is row number 3
. Then I want to move one row below it and use the value in column c
to create column x
.
So in my example, the first occurrence of mask
is row 3. One row after it is row 4. That is why e
is selected for column x
.
Note that in row 4 which is one row after the mask
, no condition is needed. For example for row 4, It is NOT necessary that df.a > df.b
.
This is what I have tried:
df.loc[mask.cumsum().eq(1) & mask, 'x'] = df.c.shift(-1)
I provide some additional df
s for convenience to test whether the code works in other examples. For instance what if there are no cases that meet the conditions of mask
. In that case I just want a column of NaN
for x
.
df = pd.DataFrame({'a': [1000, 11230, 12300, 10000, 1000, 10000, 100000], 'b': [1000, 11123, 1123, 0, 55, 0, 1], 'c': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
df = pd.DataFrame({'a': [1, 1, 1, -1, -1, -1, -1], 'b': [1000, 11123, 1123, 0, 55, 0, 1], 'c': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
df = pd.DataFrame({'a': [-1, -1, -1, -1, -1, -1, 100000], 'b': [1000, 11123, 1123, 0, 55, 0, 1], 'c': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
>Solution :
You can generate a mask that indicates one location past where the first value of a
is greater than b
:
mask = (df.a > df.b).shift(fill_value=False)
mask = mask & ~mask.cumsum().shift().astype(bool)
You can then use that mask to set the value of x
equal to c
:
df.loc[mask, 'x'] = df['c']
Output for each of your dfs:
a b c x
0 100 1000 a NaN
1 1123 11123 b NaN
2 123 1123 c NaN
3 100 0 d NaN
4 1 55 e e
5 0 0 f NaN
6 1 1 g NaN
a b c x
0 1000 1000 a NaN
1 11230 11123 b NaN
2 12300 1123 c c
3 10000 0 d NaN
4 1000 55 e NaN
5 10000 0 f NaN
6 100000 1 g NaN
a b c x
0 1 1000 a NaN
1 1 11123 b NaN
2 1 1123 c NaN
3 -1 0 d NaN
4 -1 55 e NaN
5 -1 0 f NaN
6 -1 1 g NaN
a b c x
0 -1 1000 a NaN
1 -1 11123 b NaN
2 -1 1123 c NaN
3 -1 0 d NaN
4 -1 55 e NaN
5 -1 0 f NaN
6 100000 1 g NaN