what is an easy way to count the number of combinations in across two columns? Given this dataframe:
df =
id testA testB
1 3 NA
1 1 3
2 2 NA
2 NA 1
2 0 0
3 NA NA
3 1 1
I would like to know the different combinations independently of the score. For example:
Both tests: 3
A but not B: 2
B but not A: 1
>Solution :
For two columns, can query each condition separately:
a_exists = df["testA"].notna()
b_exists = df["testB"].notna()
# both
>>> (a_exists & b_exists).sum()
3
# A, but not B
>>> (a_exists & ~b_exists).sum()
2
# B, but not A
>>> (~a_exists & b_exists).sum()
1
But can be automated with some itertools:
from itertools import compress, repeat
cols = ["A", "B"]
for ma, mb in product([0, 1], repeat=2):
if ma == mb == 1: continue
ab_info = "".join(compress(cols, (1 - ma, 1 - mb)))
counts = ((a_exists ^ ma) & (b_exists ^ mb)).sum()
print(ab_info, counts)
- get "selector"s over [0, 1] x 2
- if both are 1s, i.e., neither doesn’t exist, skip the selector
- otherwise
- get the selected parties with
compress - see whether to negate or not through bitwise XOR and AND the results
- sum to get the total count
- get the selected parties with
which prints
AB 3
A 2
B 1