Advertisements
I have created a pandas dataframe using the following code:
ds = {"col1":[1,1,1,1,2,3,4,4,4,4,4,5,4,4,5,6]}
df = pd.DataFrame(data=ds)
The dataframe looks like this:
print(df)
col1
0 1
1 1
2 1
3 1
4 2
5 3
6 4
7 4
8 4
9 4
10 4
11 5
12 4
13 4
14 5
15 6
I need to calculate a new dataframe column (called col2
) which contains the next value from col1
which is different than the current value.
Example.
Let’s take a look at row 0.
The value in col1
is 1.
What is the next value in col1
that is different than 1? It is the value 2 (at row 4
).
Let’s now take a look at row 1.
The value in col1 is 1.
What is the next value in col1
that is different than 1? It is the value 2 (at row 4
).
And so on.
The resulting dataframe would look like this:
What is the python code to produce such dataframe?
>Solution :
You can use a mask with where
, shift
the masked values up, and bfill
:
# identify the first value of each stretch
m = df['col1'].ne(df['col1'].shift())
# mask the other, shift up, backfill
df['col2'] = df['col1'].where(m).shift(-1).bfill()
Output:
col1 col2
0 1 2.0
1 1 2.0
2 1 2.0
3 1 2.0
4 2 3.0
5 3 4.0
6 4 5.0
7 4 5.0
8 4 5.0
9 4 5.0
10 4 5.0
11 5 4.0
12 4 5.0
13 4 5.0
14 5 6.0
15 6 NaN
Intermediates to understand the logic;
col1 col2 m where(m) shift(-1)
0 1 2.0 True 1.0 NaN
1 1 2.0 False NaN NaN
2 1 2.0 False NaN NaN
3 1 2.0 False NaN 2.0
4 2 3.0 True 2.0 3.0
5 3 4.0 True 3.0 4.0
6 4 5.0 True 4.0 NaN
7 4 5.0 False NaN NaN
8 4 5.0 False NaN NaN
9 4 5.0 False NaN NaN
10 4 5.0 False NaN 5.0
11 5 4.0 True 5.0 4.0
12 4 5.0 True 4.0 NaN
13 4 5.0 False NaN 5.0
14 5 6.0 True 5.0 6.0
15 6 NaN True 6.0 NaN