I have a sample data frame as follows and I need to explore the Category column.
> df
Id Category
1 [1,2,3,4]
2 [2,3,5]
3 [1,4,5]
4 [1,2,5]
5 [4]
6 [2,3,5]
7 [1,5]
In reality, there are thousands of rows in the data frame with categories between 1 to 25. I want to see how many times these categories co-occurred in that column. The output should be as a matrix or data frame.
Matrix Output:
[,1] [,2] [,3] [,4] [,5]
[1,] 0 2 1 2 3
[2,] 2 0 3 1 3
[3,] 1 3 0 1 2
[4,] 2 1 1 0 1
[5,] 3 3 2 1 0
Dataframe Output:
C1 C2 Count
1 2 2
1 3 1
1 4 2
1 5 3
2 3 3
2 4 1
2 5 3
3 4 1
3 5 2
4 5 1
Can anybody help me in this regard?
>Solution :
another option:
df %>%
group_by(Id) %>%
mutate(Category = list(reticulate::py_eval(Category))) %>%
unnest(Category) %>%
table() %>%
crossprod() %>%
as.data.frame.table() %>%
filter(Category!=Category.1)
Category Category.1 Freq
1 2 1 2
2 3 1 1
3 4 1 2
4 5 1 3
5 1 2 2
6 3 2 3
7 4 2 1
8 5 2 3
9 1 3 1
10 2 3 3
11 4 3 1
12 5 3 2
13 1 4 2
14 2 4 1
15 3 4 1
16 5 4 1
17 1 5 3
18 2 5 3
19 3 5 2
20 4 5 1