subset first and last consecutive value from pandas df col – python

Advertisements

I want to subset a df by returning the first and last consecutive value from a pandas col. Drop_duplciates won’t work because it doesn’t account for consecutive groupings. I’m using .shift() (below) but this only returns the last consecutive value, where I want the first and last.

import pandas as pd

df = pd.DataFrame({"Item":['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'], 
           "Val1":[-20, -21, -20, -20, -20, -21, -20, -23, -22], 
           "Val2":[150, 151, 150, 148, 149, 150, 151, 150, 148]
           })

df1 = df[df['Item'].ne(df['Item'].shift())]

print(df1)

intended output:

  Item  Val1  Val2
0    A   -20   150
2    A   -20   150
3    B   -20   148
6    B   -20   151
7    A   -23   150
8    A   -22   148

>Solution :

You need to compare against both the forward and backward shifted values so that you can find the start and finish of each group:

df1 = df[(df['Item'].ne(df['Item'].shift())) | 
         (df['Item'].ne(df['Item'].shift(-1)))]

Output:

  Item  Val1  Val2
0    A   -20   150
2    A   -20   150
3    B   -20   148
6    B   -20   151
7    A   -23   150
8    A   -22   148

Leave a ReplyCancel reply