Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R output matrix index with values in dataframe

I am trying to find the "matrix index" from values of dataframe in the Position column. The "matrix" that I would like to reference to is either a 3 x 3 or 4 x 4 matrix, depending on the length of the Position column (1:9 for 3 x 3 and 1:16 for 4 x 4). Different groups in col1 would have different length of Position.

Here’s a dummy dataframe to demonstrate my problem.

df <- structure(list(col1 = c("group1", "group1", "group1", "group1", 
"group1", "group1", "group1", "group1", "group1", "group2", "group2", 
"group2", "group2", "group2", "group2", "group2", "group2", "group2", 
"group2", "group2", "group2", "group2", "group2", "group2", "group2", 
"group3", "group3", "group3", "group3", "group3", "group3", "group3", 
"group3", "group3", "group3", "group3", "group3", "group3"), 
    col2 = c("A", "Q", NA, "A", "K", "L", "O", "R", "J", "S", 
    "C", "S", "H", "O", "T", "Z", "D", "Y", "J", "V", "Z", "P", 
    "L", "X", "D", "K", "M", "X", "E", "P", "U", "Z", "Z", "L", 
    "W", "X", "F", "K"), Position = c(1L, 2L, 3L, 4L, 5L, 6L, 
    7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 
    12L, 13L, 14L, 15L, 16L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
    9L, 10L, 11L, 12L, 13L)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -38L))

Rules

From this dataframe, I would like to get a new column Position_ij specifying the ith and jth Position if it were in a matrix.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

For example, "group1" has a Position of length 9, therefore, it should reference a 3 x 3 matrix, and the Position_ij should be 1 = "[1, 1]", 2 = "[1, 2]", 3 = "[1, 3]", 4 = "[2, 1]" …, 9 = "[3, 3]".

For "group2", it has a Position length of 16, therefore it should reference a 4 x 4 matrix, and the Position_ij should be 1 = "[1, 1]", …, 4 = "[1, 4]", 5 = "[2, 1]" …, 16 = "[4, 4]".

For "group3", it has a Position length of 13, which is greater than 9, therefore it should reference a 4 x 4 matrix.

Current attempt (failed)

My current method uses %/% and %% to get the quotient and remainder of Position divided by matrix length, however, when Position == matrix length, the remainder is 0 but instead I want 3 or 4.

library(dplyr)

df %>% group_by(col1) %>% 
  mutate(Position_ij = if (n() == 9) {
      paste0("[", (Position %/% 3) + 1, ", ", Position %% 3, "]")
    } else {
      paste0("[", (Position %/% 4) + 1, ", ", Position %% 4, "]")
      })
# A tibble: 38 × 4
# Groups:   col1 [3]
   col1   col2  Position Position_ij
   <chr>  <chr>    <int> <chr>      
 1 group1 A            1 [1, 1]     
 2 group1 Q            2 [1, 2]     
 3 group1 NA           3 [2, 0]     # this should be [1, 3]
 4 group1 A            4 [2, 1]     
 5 group1 K            5 [2, 2]     
 6 group1 L            6 [3, 0]     # this should be [2, 3]
 7 group1 O            7 [3, 1]     
 8 group1 R            8 [3, 2]     
 9 group1 J            9 [4, 0]     # this should be [3, 3]
10 group2 S            1 [1, 1]     
# … with 28 more rows

Desired output

   col1   col2  Position Position_ij
   <chr>  <chr>    <int> <chr>      
 1 group1 A            1 [1, 1]     
 2 group1 Q            2 [1, 2]     
 3 group1 NA           3 [1, 3]     
 4 group1 A            4 [2, 1]     
 5 group1 K            5 [2, 2]     
 6 group1 L            6 [2, 3]     
 7 group1 O            7 [3, 1]     
 8 group1 R            8 [3, 2]     
 9 group1 J            9 [3, 3]     
10 group2 S            1 [1, 1]     
11 group2 C            2 [1, 2]     
12 group2 S            3 [1, 3]     
13 group2 H            4 [1, 4]     
14 group2 O            5 [2, 1]     
15 group2 T            6 [2, 2]     
16 group2 Z            7 [2, 3]     
17 group2 D            8 [2, 4]     
18 group2 Y            9 [3, 1]     
19 group2 J           10 [3, 2]     
20 group2 V           11 [3, 3]     
21 group2 Z           12 [3, 4]     
22 group2 P           13 [4, 1]     
23 group2 L           14 [4, 2]     
24 group2 X           15 [4, 3]     
25 group2 D           16 [4, 4]     
26 group3 K            1 [1, 1]     
27 group3 M            2 [1, 2]     
28 group3 X            3 [1, 3]     
29 group3 E            4 [1, 4]     
30 group3 P            5 [2, 1]     
31 group3 U            6 [2, 2]     
32 group3 Z            7 [2, 3]     
33 group3 Z            8 [2, 4]     
34 group3 L            9 [3, 1]     
35 group3 W           10 [3, 2]     
36 group3 X           11 [3, 3]     
37 group3 F           12 [3, 4]     
38 group3 K           13 [4, 1]        

FYI, my reference matrix should either be 9 x 9 or 10 x 10 in reality.

>Solution :

Subtract 1 from ‘Position’ before the %/%/%% and then add 1 on the result

library(dplyr)
out <- df %>% 
  group_by(col1) %>% 
  mutate(Position_ij = if (n() == 9) {
      paste0("[", ((Position-1) %/% 3) + 1, ", ", (Position-1) %% 3 + 1, "]")
    } else {
      paste0("[", ((Position-1) %/% 4) + 1, ", ", (Position-1) %% 4 + 1, "]")
      }) %>%
  ungroup

-output

> as.data.frame(out)
     col1 col2 Position Position_ij
1  group1    A        1      [1, 1]
2  group1    Q        2      [1, 2]
3  group1 <NA>        3      [1, 3]
4  group1    A        4      [2, 1]
5  group1    K        5      [2, 2]
6  group1    L        6      [2, 3]
7  group1    O        7      [3, 1]
8  group1    R        8      [3, 2]
9  group1    J        9      [3, 3]
10 group2    S        1      [1, 1]
11 group2    C        2      [1, 2]
12 group2    S        3      [1, 3]
13 group2    H        4      [1, 4]
14 group2    O        5      [2, 1]
15 group2    T        6      [2, 2]
16 group2    Z        7      [2, 3]
17 group2    D        8      [2, 4]
18 group2    Y        9      [3, 1]
19 group2    J       10      [3, 2]
20 group2    V       11      [3, 3]
21 group2    Z       12      [3, 4]
22 group2    P       13      [4, 1]
23 group2    L       14      [4, 2]
24 group2    X       15      [4, 3]
25 group2    D       16      [4, 4]
26 group3    K        1      [1, 1]
27 group3    M        2      [1, 2]
28 group3    X        3      [1, 3]
29 group3    E        4      [1, 4]
30 group3    P        5      [2, 1]
31 group3    U        6      [2, 2]
32 group3    Z        7      [2, 3]
33 group3    Z        8      [2, 4]
34 group3    L        9      [3, 1]
35 group3    W       10      [3, 2]
36 group3    X       11      [3, 3]
37 group3    F       12      [3, 4]
38 group3    K       13      [4, 1]

Or use gl/rowid

library(data.table)
out2 <- df %>%
   group_by(col1) %>% 
   mutate(Position_ij = sprintf('[%d, %d]', 
      as.integer(gl(n(), c(4, 3)[1 + !n()%%3], n())),
      rowid(as.integer(gl(n(), c(4, 3)[1 + !n()%%3], n()))))) %>% 
   ungroup

-testing

> identical(out2, out)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading