pandas: add column whose value is available in previous row but not in current, of another column

June 21, 2023

Suppose this is my df:

import pandas as pd

df = pd.DataFrame({'accuracy': [0.773, 0.841, 0.862, 0.874, 0.883, 0.913],
                   'code': [('D',), ('D', 'F'), ('B', 'D', 'F'),
                            ('B', 'F', 'K'), ('B', 'F', 'I', 'K'),
                            ('F', 'I', 'K')]})

df
   accuracy         code
0   0.773           (D,)
1   0.841         (D, F)
2   0.862      (B, D, F)
3   0.874      (B, F, K)
4   0.883   (B, F, I, K)
5   0.913      (F, I, K)

I would like to add a column dropped that contains the item(s) of code that were present in the previous row but are missing from the current row.

Expected:

    accuracy        code    dropped
0   0.773           (D,)      -
1   0.841         (D, F)      -
2   0.862      (B, D, F)      -
3   0.874      (B, F, K)      D
4   0.883   (B, F, I, K)      -
5   0.913      (F, I, K)      B

Solution:

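It's easy with sets and shift. Convert each tuple to a set, then subtract each row's set from the previous row's set. What remains is exactly the items that were dropped: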

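# one set per row, so that '-' means set difference
s = df['code'].apply(set)

# previous row's set (an empty set for row 0) minus the current row's set
df['dropped'] = s.shift(fill_value=set()) - s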

Output:

   accuracy          code dropped
0     0.773          (D,)      {}
1     0.841        (D, F)      {}
2     0.862     (B, D, F)      {}
3     0.874     (B, F, K)     {D}
4     0.883  (B, F, I, K)      {}
5     0.913     (F, I, K)     {B}
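To cross-check the shift-based result, here is the same row-by-row rule in plain Python, with no pandas. This is a sketch only. dropped_items is a hypothetical helper name, and the data is just the code column from the question written as a list of tuples.

```python
# The 'code' column from the question, as a plain list of tuples
codes = [('D',), ('D', 'F'), ('B', 'D', 'F'),
         ('B', 'F', 'K'), ('B', 'F', 'I', 'K'), ('F', 'I', 'K')]


def dropped_items(rows):
    """Hypothetical helper: for each row, the set of items that were in the
    previous row but are missing from the current row."""
    if not rows:
        return []
    # The first row has no previous row, so nothing can have been dropped
    result = [set()]
    # Walk over consecutive (previous, current) pairs
    for prev, cur in zip(rows, rows[1:]):
        result.append(set(prev) - set(cur))
    return result


print(dropped_items(codes))
# [set(), set(), set(), {'D'}, set(), {'B'}]
```

The result matches the dropped column in the output above: {D} for row 3, {B} for row 5, and empty sets everywhere else. The empty set for row 0 plays the same role as fill_value=set() in the pandas version.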

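If you need the exact format from the question, with '-' when nothing was dropped, the following works as long as at most one item is dropped per row. If several items are dropped, .str[0] keeps just one of them, in no guaranteed order: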

s = df['code'].apply(set)

# set difference -> list of dropped items -> first item (NaN if empty) -> '-'
df['dropped'] = (s.shift(fill_value=set()).sub(s)
                  .apply(list).str[0].fillna('-')
                )

Output:

   accuracy          code dropped
0     0.773          (D,)       -
1     0.841        (D, F)       -
2     0.862     (B, D, F)       -
3     0.874     (B, F, K)       D
4     0.883  (B, F, I, K)       -
5     0.913     (F, I, K)       B
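If a row can lose more than one item, it may be more useful to join all dropped items than to take .str[0]. The sketch below assumes pandas is installed. format_dropped is a hypothetical helper name, and the ', ' separator and the sorted order are choices made for this example, not part of the answer above.

```python
import pandas as pd


def format_dropped(codes: pd.Series) -> pd.Series:
    """Hypothetical helper: items present in the previous row but missing
    from the current one, joined into a string ('-' when nothing was dropped)."""
    s = codes.apply(set)
    # Row-wise set difference; row 0 is compared against an empty set
    diff = s.shift(fill_value=set()) - s
    # Sets are unordered, so sort before joining to get a stable result
    return diff.apply(lambda items: ', '.join(sorted(items)) or '-')


df = pd.DataFrame({
    'accuracy': [0.773, 0.841, 0.862, 0.874, 0.883, 0.913],
    'code': [('D',), ('D', 'F'), ('B', 'D', 'F'),
             ('B', 'F', 'K'), ('B', 'F', 'I', 'K'), ('F', 'I', 'K')],
})
df['dropped'] = format_dropped(df['code'])
print(df)
```

On the question's data this gives the same '-', D and B values as the output above. A row that loses several items, such as ('A', 'B', 'C') followed by ('C',), would get 'A, B' instead of a single arbitrary item.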