I have this dataset called x for example:
| client_id | a | b | c |
|---|---|---|---|
| 1 | 1 | 0 | 0 |
| 2 | 0 | 1 | 1 |
| 3 | 0 | 0 | 1 |
| 4 | 1 | 1 | 1 |
I then want to create another table, that calculates the number of times each column from a to c had 1, in combination of other columns having 1 too.
So the result would be this table:
| a | b | c | |
|---|---|---|---|
| a | 1 | 1 | 1 |
| b | 1 | 0 | 2 |
| c | 1 | 2 | 1 |
So the first line of the second table would read like:
How many times there was only column a having a 1? it’s once only.
How many times there was 1 in both a and b? it’s only once.
How many times there was 1 in both a and c? it happened only once.
How can I do this easily in R? note that I only want to focus on when the column had 1.
>Solution :
You can use crossprod to build the off-diagonal entries. For later referencing, this is sometimes called a co-occurrence matrix.
To create the diagonal, it’s a bit more tricky: find rows that sums up to 1 and apply a column-wise sum.
mat <- as.matrix(df[-1])
out <- crossprod(mat) # Same as: t(mat) %*% mat
diag(out) <- colSums(mat[rowSums(mat) == 1, ])
out
# a b c
# a 1 1 1
# b 1 0 2
# c 1 2 1