Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Group values and check if the values ​from another column cancel each other out in python pandas dataframe

I need to check if the value inside "CP" columns of a dataframe, while my dataframe is grouped by "Centre" column, are canceling each others.

Here is an example :
Sample :

Centre CP
3N7D 10
3N1F 2
3N90 -9
3N7D -1
3N7D -10
8D92 198
3N7D -10
3N90 2
8D92 -195
8D92 -3

result excepted :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Centre CP Cancel
3N7D 10 Canceled
3N1F 2
3N90 -9
3N7D -1
3N7D -10 Canceled
8D92 198 (not canceled with -195 and -3 because I need to cancel it only if there is 1 value that cancel, like -198 in one time)
3N7D -10 (not canceled because the only 10 is already canceled)
3N90 2
8D92 -195
8D92 -3

Here is what I tried :

grouped_df = df.groupby("Centre")
for name, group in grouped_df:
    cp_values = group["CP"].values
    for i in range(len(cp_values)):
        for j in range(i+1, len(cp_values)):
            if cp_values[i] == -cp_values[j]:
                print("i = ",i)
                print("j = ",j)
                print("CP = ",cp_values[i])
                print("CP2 = ",cp_values[j])

Thanks for your help.

>Solution :

You can use a custom pivot to identify the matches of a number with its negative counterpart (obtained with numpy.sign. You will need to deduplicate the values using groupby.cumcount for the pivot:

tmp = (
 df.assign(sign=np.sign(df['CP']),
           n=df.groupby(['Centre', 'CP']).cumcount(),
           abs=df['CP'].abs()
          ).reset_index()
   .pivot(index=['Centre', 'abs', 'n'], columns='sign', values='index')
)

keep = tmp.where(tmp.notna().all(axis=1)).stack().values

df.loc[keep, 'Cancel'] = 'Canceled'

Output:

  Centre   CP    Cancel
0   3N7D   10  Canceled
1   3N1F    2       NaN
2   3N90   -9       NaN
3   3N7D   -1       NaN
4   3N7D  -10  Canceled
5   8D92  198       NaN
6   3N7D  -10       NaN
7   3N90    2       NaN
8   8D92 -195       NaN
9   8D92   -3       NaN

Intermediate tmp:

sign           -1    1
Centre abs n          
3N1F   2   0  NaN  1.0
3N7D   1   0  3.0  NaN
       10  0  4.0  0.0  # indices 0 and 4 cancel each other
           1  6.0  NaN  # indice 6 is alone
3N90   2   0  NaN  7.0
       9   0  2.0  NaN
8D92   3   0  9.0  NaN
       195 0  8.0  NaN
       198 0  NaN  5.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading