Create a table that record the number of row pairs that are not zero in R

April 7, 2022

Apologies if the title is confusing, but below is what I would like to accomplish. Let’s say I have a table as seen below.

df <- data.frame(
  patient = paste0("patient",seq(1:6)),
  gene_1 = c(10,5,0,0,1,0),
  gene_2 = c(0,26,4,5,6,1),
  gene_3 = c(1,3,5,12,44,1)
)

patient	gene_1	gene_2	gene_3
patient1	10	0	1
patient2	5	26	3
patient3	0	4	5
patient4	0	5	12
patient5	1	6	44
patient6	0	1	1

What I want is another table that records the total number of pairs only if both values are non-zero. The table would look like so:

col1	col2	number-of-pairs
gene1	gene2	2
gene1	gene3	3
gene2	gene3	5

Any help is appreciated. Thank you.

>Solution :

We can do this by pivoting your data to a long format, doing a self-join, and then filtering:

library(tidyr)
library(dplyr)
## Long format, keep only non-zeros
long_data = pivot_longer(df, -patient) %>%
  filter(value != 0) %>%
  select(-value)

## Self join on patient,
## Remove exact matches (can't pair with yourself)
## And use < to remove doublecounts
long_data %>%
  left_join(long_data, by = "patient") %>%
  filter(name.x != name.y & name.x < name.y) %>%
  count(name.x, name.y)
# # A tibble: 3 × 3
#   name.x name.y     n
#   <chr>  <chr>  <int>
# 1 gene_1 gene_2     2
# 2 gene_1 gene_3     3
# 3 gene_2 gene_3     5