Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to identify the next different value in a pandas column

I have created a pandas dataframe using the following code:

ds = {"col1":[1,1,1,1,2,3,4,4,4,4,4,5,4,4,5,6]}

df = pd.DataFrame(data=ds)

The dataframe looks like this:

print(df)

    col1
0      1
1      1
2      1
3      1
4      2
5      3
6      4
7      4
8      4
9      4
10     4
11     5
12     4
13     4
14     5
15     6

I need to calculate a new dataframe column (called col2) which contains the next value from col1 which is different than the current value.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Example.

Let’s take a look at row 0.
The value in col1 is 1.
What is the next value in col1 that is different than 1? It is the value 2 (at row 4).

Let’s now take a look at row 1.
The value in col1 is 1.
What is the next value in col1 that is different than 1? It is the value 2 (at row 4).

And so on.

The resulting dataframe would look like this:

enter image description here

What is the python code to produce such dataframe?

>Solution :

You can use a mask with where, shift the masked values up, and bfill:

# identify the first value of each stretch
m = df['col1'].ne(df['col1'].shift())
# mask the other, shift up, backfill
df['col2'] = df['col1'].where(m).shift(-1).bfill()

Output:

    col1  col2
0      1   2.0
1      1   2.0
2      1   2.0
3      1   2.0
4      2   3.0
5      3   4.0
6      4   5.0
7      4   5.0
8      4   5.0
9      4   5.0
10     4   5.0
11     5   4.0
12     4   5.0
13     4   5.0
14     5   6.0
15     6   NaN

Intermediates to understand the logic;

    col1  col2      m  where(m)  shift(-1)
0      1   2.0   True       1.0        NaN
1      1   2.0  False       NaN        NaN
2      1   2.0  False       NaN        NaN
3      1   2.0  False       NaN        2.0
4      2   3.0   True       2.0        3.0
5      3   4.0   True       3.0        4.0
6      4   5.0   True       4.0        NaN
7      4   5.0  False       NaN        NaN
8      4   5.0  False       NaN        NaN
9      4   5.0  False       NaN        NaN
10     4   5.0  False       NaN        5.0
11     5   4.0   True       5.0        4.0
12     4   5.0   True       4.0        NaN
13     4   5.0  False       NaN        5.0
14     5   6.0   True       5.0        6.0
15     6   NaN   True       6.0        NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading