Count co-occurrences of values in a data frame column in R

April 24, 2022

I have a sample data frame as follows and I need to explore the Category column.

> df
  Id  Category
  1 [1,2,3,4]
  2   [2,3,5]
  3   [1,4,5]
  4   [1,2,5]
  5       [4]
  6   [2,3,5]
  7     [1,5]
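
To make the example reproducible, the data frame above can be rebuilt with the sketch below. It assumes that Category is stored as a character string such as "[1,2,3,4]"; the printed output does not show the actual column type, so this is only a guess.

```r
# Assumption: each Category value is a character string like "[1,2,3,4]".
df <- data.frame(
  Id = 1:7,
  Category = c("[1,2,3,4]", "[2,3,5]", "[1,4,5]", "[1,2,5]",
               "[4]", "[2,3,5]", "[1,5]"),
  stringsAsFactors = FALSE
)
str(df)
```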

In reality, the data frame has thousands of rows, with categories ranging from 1 to 25. I want to count how many times each pair of categories occurs together in the same row of that column. The output should be a matrix or a data frame.

Matrix Output:

     [,1] [,2] [,3] [,4] [,5]
[1,]    0    2    1    2    3
[2,]    2    0    3    1    3
[3,]    1    3    0    1    2
[4,]    2    1    1    0    1
[5,]    3    3    2    1    0

Dataframe Output:

     C1 C2     Count
     1  2      2
     1  3      1
     1  4      2
     1  5      3
     2  3      3
     2  4      1
     2  5      3
     3  4      1
     3  5      2
     4  5      1

Can anybody help me in this regard?

Solution:

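Another option, using dplyr, tidyr and reticulate (which needs a working Python installation to parse the strings):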

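library(dplyr)   # %>%, group_by(), mutate(), filter()
library(tidyr)   # unnest()

df %>%
  group_by(Id) %>%
  # parse each "[1,2,3]" string as a Python list, one row at a time
  mutate(Category = list(reticulate::py_eval(Category))) %>%
  unnest(Category) %>%              # one row per (Id, Category) pair
  table()  %>%                      # Id x Category incidence table
  crossprod() %>%                   # Category x Category co-occurrence counts
  as.data.frame.table() %>%
  filter(Category!=Category.1)      # drop the diagonal (self-pairs)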

   Category Category.1 Freq
1         2          1    2
2         3          1    1
3         4          1    2
4         5          1    3
5         1          2    2
6         3          2    3
7         4          2    1
8         5          2    3
9         1          3    1
10        2          3    3
11        4          3    1
12        5          3    2
13        1          4    2
14        2          4    1
15        3          4    1
16        5          4    1
17        1          5    3
18        2          5    3
19        3          5    2
20        4          5    1
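
The result above lists every pair in both orders (20 rows), while the question asks for a matrix and for a table with each pair once. A minimal base R sketch that avoids the Python dependency is given below. It assumes that Category is a character column such as "[1,2,3,4]" and that no category is repeated within a row; the names cats, long, inc, co and pairs are illustrative only.

```r
# Sample data; Category is assumed to be a character column like "[1,2,3,4]".
df <- data.frame(
  Id = 1:7,
  Category = c("[1,2,3,4]", "[2,3,5]", "[1,4,5]", "[1,2,5]",
               "[4]", "[2,3,5]", "[1,5]"),
  stringsAsFactors = FALSE
)

# Extract the integers from each string: one integer vector per row.
cats <- lapply(regmatches(df$Category, gregexpr("[0-9]+", df$Category)),
               as.integer)

# Long format: one (Id, Category) pair per line.
long <- data.frame(Id = rep(df$Id, lengths(cats)),
                   Category = unlist(cats))

# Id x Category incidence table. crossprod(inc) is t(inc) %*% inc, so entry
# (i, j) counts the rows that contain both category i and category j.
inc <- table(long$Id, long$Category)
co  <- crossprod(inc)
diag(co) <- 0   # a category paired with itself is not a co-occurrence

# Keep each unordered pair once (C1 < C2), sorted by C1 and then C2.
idx <- which(upper.tri(co), arr.ind = TRUE)
pairs <- data.frame(C1 = as.integer(rownames(co)[idx[, "row"]]),
                    C2 = as.integer(colnames(co)[idx[, "col"]]),
                    Count = co[idx])
pairs <- pairs[order(pairs$C1, pairs$C2), ]
rownames(pairs) <- NULL
```

Here co should correspond to the matrix output requested in the question and pairs to the 10-row data frame output. If a category can appear more than once in the same row, the incidence table would need to be reduced to 0/1 values before calling crossprod().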