Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

subset first and last consecutive value from pandas df col – python

I want to subset a df by returning the first and last consecutive value from a pandas col. Drop_duplciates won’t work because it doesn’t account for consecutive groupings. I’m using .shift() (below) but this only returns the last consecutive value, where I want the first and last.

import pandas as pd

df = pd.DataFrame({"Item":['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'], 
           "Val1":[-20, -21, -20, -20, -20, -21, -20, -23, -22], 
           "Val2":[150, 151, 150, 148, 149, 150, 151, 150, 148]
           })

df1 = df[df['Item'].ne(df['Item'].shift())]

print(df1)

intended output:

  Item  Val1  Val2
0    A   -20   150
2    A   -20   150
3    B   -20   148
6    B   -20   151
7    A   -23   150
8    A   -22   148

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You need to compare against both the forward and backward shifted values so that you can find the start and finish of each group:

df1 = df[(df['Item'].ne(df['Item'].shift())) | 
         (df['Item'].ne(df['Item'].shift(-1)))]

Output:

  Item  Val1  Val2
0    A   -20   150
2    A   -20   150
3    B   -20   148
6    B   -20   151
7    A   -23   150
8    A   -22   148
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading