Advertisements
I want to subset a df by returning the first and last consecutive value from a pandas col. Drop_duplciates
won’t work because it doesn’t account for consecutive groupings. I’m using .shift()
(below) but this only returns the last consecutive value, where I want the first and last.
import pandas as pd
df = pd.DataFrame({"Item":['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
"Val1":[-20, -21, -20, -20, -20, -21, -20, -23, -22],
"Val2":[150, 151, 150, 148, 149, 150, 151, 150, 148]
})
df1 = df[df['Item'].ne(df['Item'].shift())]
print(df1)
intended output:
Item Val1 Val2
0 A -20 150
2 A -20 150
3 B -20 148
6 B -20 151
7 A -23 150
8 A -22 148
>Solution :
You need to compare against both the forward and backward shifted values so that you can find the start and finish of each group:
df1 = df[(df['Item'].ne(df['Item'].shift())) |
(df['Item'].ne(df['Item'].shift(-1)))]
Output:
Item Val1 Val2
0 A -20 150
2 A -20 150
3 B -20 148
6 B -20 151
7 A -23 150
8 A -22 148