Matrix that counts how many times the combination of a row and column of that matrix is present in a dataframe

I am still quite new to R, so please bear with me 🙂

I need to create a matrix that counts how many times the combination of a row and a column of that matrix is present in a dataframe.

As my description is probably quite vague, I have given an example set below. In reality, my dataset will contain many more fruits in the matrix and many more juices in the dataframe, so I’m looking for an efficient way to tackle this problem.

#Stackoverflow example
#Create empty matrix ----
newMatrix <- matrix(0, nrow = 5, ncol = 5)
colnames(newMatrix) <- c("Apple", "Pear", "Orange", "Mango", "Banana")
rownames(newMatrix) <- c("Apple", "Pear", "Orange", "Mango", "Banana")

#Create dataframe ----
newDf <- data.frame(c("Juice 1", "Juice 2", "Juice 3", "Juice 4","Juice 5"),
                    c("Banana", "Banana", "Orange", "Pear", "Apple"),
                    c("Pear", "Orange", "Pear", "Apple", "Pear"),
                    c("Orange", "Mango", NA, NA, NA))
colnames(newDf) <- c("Juice", "Fruit 1", "Fruit 2", "Fruit 3")

I want to create a for loop that goes over every element of newMatrix and adds 1 whenever the row fruit and the column fruit both occur in the same row of newDf.
In essence: how many juices contain both Apple and Pear, how many contain both Apple and Mango, and so forth.

The output should look like this:

       Apple Pear Orange Mango Banana
Apple      0    2      0     0      0
Pear       2    0      2     0      1
Orange     0    2      0     1      2
Mango      0    0      1     0      1
Banana     0    1      2     1      0

I started by trying to create a for loop but I got stuck at the if part:

for (i in 1:nrow(newMatrix)){
  for (j in 1:ncol(newMatrix)) {
    if (???)
      newMatrix[i,j] <- newMatrix[i,j] + 1
  }
}

Can somebody help me with this? Would be highly appreciated!

>Solution :

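With base R, you can build all pairwise combinations of the fruits in each juice, and then use igraph to turn those pairs into an adjacency matrix. The rows and columns come out in order of first appearance rather than in the order of newMatrix, but the counts are the same: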

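library(igraph)

# One column per fruit pair; pairs that still contain an NA are dropped by na.omit() below
m <- do.call(cbind, apply(newDf[-1], 1, \(x) if (sum(!is.na(x)) >= 2) combn(x, m = 2) else NULL, simplify = FALSE))
g <- graph_from_data_frame(na.omit(t(m)), directed = FALSE)
as_adjacency_matrix(g, sparse = FALSE)  # get.adjacency() is the older, deprecated name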

       Banana Pear Orange Apple Mango
Banana      0    1      2     0     1
Pear        1    0      2     2     0
Orange      2    2      0     0     1
Apple       0    2      0     0     0
Mango       1    0      1     0     0
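
If adding igraph as a dependency is not wanted, the same counts can also be obtained in base R alone. The following is only a sketch of a different technique (a juice-by-fruit incidence matrix multiplied with its own transpose via crossprod), not the method used above. The names `fruits`, `long`, `inc`, `adj` and `expected` are made up for this illustration, and the data is rebuilt from the question so the snippet runs on its own.

```r
# Example data, copied from the question
newDf <- data.frame(
  Juice     = paste("Juice", 1:5),
  `Fruit 1` = c("Banana", "Banana", "Orange", "Pear", "Apple"),
  `Fruit 2` = c("Pear", "Orange", "Pear", "Apple", "Pear"),
  `Fruit 3` = c("Orange", "Mango", NA, NA, NA),
  check.names = FALSE
)
fruits <- c("Apple", "Pear", "Orange", "Mango", "Banana")

# Long format: one (juice, fruit) pair per cell of the fruit columns.
# unlist() walks the data frame column by column, so the juice names
# are repeated once per fruit column.
long <- data.frame(
  juice = rep(newDf$Juice, times = ncol(newDf) - 1),
  fruit = factor(unlist(newDf[-1], use.names = FALSE), levels = fruits)
)

# Incidence matrix: rows are juices, columns are fruits, 1 = fruit is in the juice.
# table() drops NA by default; "> 0" guards against a fruit listed twice in one juice.
inc <- (unclass(table(long$juice, long$fruit)) > 0) * 1

# t(inc) %*% inc counts, for every pair of fruits, the juices containing both
adj <- crossprod(inc)
dimnames(adj) <- list(fruits, fruits)
diag(adj) <- 0  # a fruit paired with itself is not a combination

adj
```

Because the factor levels are set to `fruits`, the result is already in the row and column order requested in the question, so no reordering is needed afterwards.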

It might be a bit convoluted, but you can also use igraph together with the tidyverse packages:

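library(tidyverse)
library(igraph)

newDf %>% 
  pivot_longer(-Juice) %>% 
  group_by(Juice) %>% 
  # paste all fruit pairs of a juice into one "A-B-A-C-..." string
  summarise(new = ifelse(n() > 1, paste(combn(na.omit(value), 2), collapse = "-"), value)) %>% 
  # split that string at every second hyphen, giving one "A-B" pair per row
  separate_rows(new, sep = "(?:[^-]*(?:-[^-]*){1})\\K-") %>% 
  separate(new, into = c("X1", "X2")) %>% 
  select(-Juice) %>% 
  graph_from_data_frame(directed = FALSE) %>% 
  as_adjacency_matrix(sparse = FALSE)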

       Banana Pear Orange Apple Mango
Banana      0    1      2     0     1
Pear        1    0      2     2     0
Orange      2    2      0     0     1
Apple       0    2      0     0     0
Mango       1    0      1     0     0
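
The nested loop from the question can also be completed directly. This is a minimal sketch under a few assumptions, not part of the answer above: instead of adding 1 juice by juice, each cell is set to the number of juices that contain both the row fruit and the column fruit, and the diagonal is left at 0. The helper names `fruits`, `fruitCols`, `hasRow` and `hasCol` are introduced here for illustration only.

```r
# Example data and empty matrix, copied from the question
newDf <- data.frame(
  Juice     = paste("Juice", 1:5),
  `Fruit 1` = c("Banana", "Banana", "Orange", "Pear", "Apple"),
  `Fruit 2` = c("Pear", "Orange", "Pear", "Apple", "Pear"),
  `Fruit 3` = c("Orange", "Mango", NA, NA, NA),
  check.names = FALSE
)
fruits <- c("Apple", "Pear", "Orange", "Mango", "Banana")
newMatrix <- matrix(0, nrow = 5, ncol = 5, dimnames = list(fruits, fruits))

# Character matrix of the fruit columns: one row per juice
fruitCols <- as.matrix(newDf[-1])

for (i in seq_len(nrow(newMatrix))) {
  for (j in seq_len(ncol(newMatrix))) {
    if (i != j) {
      # TRUE for every juice that contains the row fruit / the column fruit;
      # na.rm = TRUE makes the empty (NA) fruit slots count as "not present"
      hasRow <- rowSums(fruitCols == rownames(newMatrix)[i], na.rm = TRUE) > 0
      hasCol <- rowSums(fruitCols == colnames(newMatrix)[j], na.rm = TRUE) > 0
      newMatrix[i, j] <- sum(hasRow & hasCol)
    }
  }
}

newMatrix
```

The loop visits every cell, so each pair is computed twice (once per triangle). For a large number of fruits, a vectorised approach is likely to be faster, but the loop keeps the logic close to what the question describes.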