How to identify the next different value in a pandas column

I have created a pandas dataframe using the following code:

ds = {"col1":[1,1,1,1,2,3,4,4,4,4,4,5,4,4,5,6]}

df = pd.DataFrame(data=ds)

The dataframe looks like this:

print(df)

    col1
0      1
1      1
2      1
3      1
4      2
5      3
6      4
7      4
8      4
9      4
10     4
11     5
12     4
13     4
14     5
15     6

I need to calculate a new dataframe column (called col2) which contains the next value from col1 which is different than the current value.

Example.

Let’s take a look at row 0.
The value in col1 is 1.
What is the next value in col1 that is different than 1? It is the value 2 (at row 4).

Let’s now take a look at row 1.
The value in col1 is 1.
What is the next value in col1 that is different than 1? It is the value 2 (at row 4).

And so on.

The resulting dataframe would look like this:

enter image description here

What is the python code to produce such dataframe?

>Solution :

You can use a mask with where, shift the masked values up, and bfill:

# identify the first value of each stretch
m = df['col1'].ne(df['col1'].shift())
# mask the other, shift up, backfill
df['col2'] = df['col1'].where(m).shift(-1).bfill()

Output:

    col1  col2
0      1   2.0
1      1   2.0
2      1   2.0
3      1   2.0
4      2   3.0
5      3   4.0
6      4   5.0
7      4   5.0
8      4   5.0
9      4   5.0
10     4   5.0
11     5   4.0
12     4   5.0
13     4   5.0
14     5   6.0
15     6   NaN

Intermediates to understand the logic;

    col1  col2      m  where(m)  shift(-1)
0      1   2.0   True       1.0        NaN
1      1   2.0  False       NaN        NaN
2      1   2.0  False       NaN        NaN
3      1   2.0  False       NaN        2.0
4      2   3.0   True       2.0        3.0
5      3   4.0   True       3.0        4.0
6      4   5.0   True       4.0        NaN
7      4   5.0  False       NaN        NaN
8      4   5.0  False       NaN        NaN
9      4   5.0  False       NaN        NaN
10     4   5.0  False       NaN        5.0
11     5   4.0   True       5.0        4.0
12     4   5.0   True       4.0        NaN
13     4   5.0  False       NaN        5.0
14     5   6.0   True       5.0        6.0
15     6   NaN   True       6.0        NaN

Leave a Reply