Group values and check if the values ​from another column cancel each other out in python pandas dataframe

I need to check if the value inside "CP" columns of a dataframe, while my dataframe is grouped by "Centre" column, are canceling each others.

Here is an example :
Sample :

Centre CP
3N7D 10
3N1F 2
3N90 -9
3N7D -1
3N7D -10
8D92 198
3N7D -10
3N90 2
8D92 -195
8D92 -3

result excepted :

Centre CP Cancel
3N7D 10 Canceled
3N1F 2
3N90 -9
3N7D -1
3N7D -10 Canceled
8D92 198 (not canceled with -195 and -3 because I need to cancel it only if there is 1 value that cancel, like -198 in one time)
3N7D -10 (not canceled because the only 10 is already canceled)
3N90 2
8D92 -195
8D92 -3

Here is what I tried :

grouped_df = df.groupby("Centre")
for name, group in grouped_df:
    cp_values = group["CP"].values
    for i in range(len(cp_values)):
        for j in range(i+1, len(cp_values)):
            if cp_values[i] == -cp_values[j]:
                print("i = ",i)
                print("j = ",j)
                print("CP = ",cp_values[i])
                print("CP2 = ",cp_values[j])

Thanks for your help.

>Solution :

You can use a custom pivot to identify the matches of a number with its negative counterpart (obtained with numpy.sign. You will need to deduplicate the values using groupby.cumcount for the pivot:

tmp = (
 df.assign(sign=np.sign(df['CP']),
           n=df.groupby(['Centre', 'CP']).cumcount(),
           abs=df['CP'].abs()
          ).reset_index()
   .pivot(index=['Centre', 'abs', 'n'], columns='sign', values='index')
)

keep = tmp.where(tmp.notna().all(axis=1)).stack().values

df.loc[keep, 'Cancel'] = 'Canceled'

Output:

  Centre   CP    Cancel
0   3N7D   10  Canceled
1   3N1F    2       NaN
2   3N90   -9       NaN
3   3N7D   -1       NaN
4   3N7D  -10  Canceled
5   8D92  198       NaN
6   3N7D  -10       NaN
7   3N90    2       NaN
8   8D92 -195       NaN
9   8D92   -3       NaN

Intermediate tmp:

sign           -1    1
Centre abs n          
3N1F   2   0  NaN  1.0
3N7D   1   0  3.0  NaN
       10  0  4.0  0.0  # indices 0 and 4 cancel each other
           1  6.0  NaN  # indice 6 is alone
3N90   2   0  NaN  7.0
       9   0  2.0  NaN
8D92   3   0  9.0  NaN
       195 0  8.0  NaN
       198 0  NaN  5.0

Leave a Reply