I need to remove consecutive duplicates of “instance” where the session is the same. In this example, the instance of 5 in session 2 should not be removed because it is in its own distinct session.
Input:
| session | instance |
|---|---|
| 1 | 3 |
| 1 | 5 |
| 1 | 5 |
| 2 | 5 |
| 3 | 2 |
| 3 | 2 |
| 3 | 5 |
| 3 | 2 |
Desired Output:
| session | instance |
|---|---|
| 1 | 3 |
| 1 | 5 |
| 2 | 5 |
| 3 | 2 |
| 3 | 5 |
| 3 | 2 |
What I am currently using removes all consecutive duplicates, even when the session is different. How can I add another condition so that consecutive duplicates are removed only within each session? It is important that not all duplicates are removed: non-consecutive duplicates within the same session should be retained.
My current code is:
df = df.loc[df['instance'].shift(-1) != df['instance']]
>Solution :
I think your idea can be generalized quite easily to compare any number of columns. Just compare the whole df with its shifted copy, not only df['instance'], and drop a row only when every column matches the next row.
df.loc[~(df.shift(-1) == df).all(axis=1)]
Output:
session instance
0 1 3
2 1 5
3 2 5
5 3 2
6 3 5
7 3 2
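For reference, here is a minimal, self-contained sketch of the approach above, run on the sample data from the question. The names `key_cols`, `same_as_next`, `same_as_prev`, `result`, and `first_kept` are only illustrative and do not come from the original post. Restricting the comparison to `key_cols` is an assumption that may help if the real DataFrame has extra columns that should not take part in the duplicate check.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "session":  [1, 1, 1, 2, 3, 3, 3, 3],
    "instance": [3, 5, 5, 5, 2, 2, 5, 2],
})

# Columns that together define a duplicate (illustrative name).
key_cols = ["session", "instance"]

# True where the NEXT row matches the current row on every key column.
# The last row is compared against NaN, so it is never flagged.
same_as_next = (df[key_cols].shift(-1) == df[key_cols]).all(axis=1)

# Keeps the LAST row of each consecutive run, as in the answer above.
result = df.loc[~same_as_next]
print(result)

# Variant: compare with the PREVIOUS row to keep the FIRST row of each run.
same_as_prev = (df[key_cols].shift() == df[key_cols]).all(axis=1)
first_kept = df.loc[~same_as_prev]
print(first_kept)
```

Both variants give the same values for session and instance as the desired output; they differ only in which original index labels survive. If a clean 0..n-1 index is wanted, `reset_index(drop=True)` can be chained afterwards.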