Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to do rowwise matching in data.table between list-column and external vector?

I’m trying to do a rowwise matching for vector elements nested inside a data.table list-column.

library(tibble)
library(data.table)

my_dt <-
  tibble::tribble(
    ~col_x,   ~col_y,
    "a",      c(1, 2, 3),
    "b",      c(4, 5, 6),
    "c",      c(7, 8, 9)
  ) %>%
  as.data.table()

external_vec <- 1:9

I want to match each element from col_y against external_vec such that I will get a new column in my_dt that is equivalent to external_vec %in% col_y, such as:

# desired output
##    col_x col_y                                 new_col
## 1:     a 1,2,3  TRUE, TRUE, TRUE,FALSE,FALSE,FALSE,...
## 2:     b 4,5,6 FALSE,FALSE,FALSE, TRUE, TRUE, TRUE,...
## 3:     c 7,8,9 FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,...

EDIT

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel


I know how to implement a solution based on tidyverse’s purrr, like the following. However, I’m looking for a data.table native functionality.

# this works but relies on purrr and dplyr, which I hope to avoid in this problem
my_dt %>%
  tibble::add_column(ext_vec_as_col = list(external_vec)) %>%
  mutate(new_col = purrr::map2(.x = ext_vec_as_col, .y = col_y, .f = ~.x %in% .y) )
#>    col_x col_y  ext_vec_as_col                                 new_col
#> 1:     a 1,2,3 1,2,3,4,5,6,...  TRUE, TRUE, TRUE,FALSE,FALSE,FALSE,...
#> 2:     b 4,5,6 1,2,3,4,5,6,... FALSE,FALSE,FALSE, TRUE, TRUE, TRUE,...
#> 3:     c 7,8,9 1,2,3,4,5,6,... FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,...

>Solution :

I’ve got the tidyverse-solution:

library(tibble)
library(dplyr)
library(tidyr)
library(purrr)
library(magrittr)

my_dt <-
  tibble::tribble(
    ~col_x,   ~col_y,
    "a",      c(1, 2, 3),
    "b",      c(4, 5, 6),
    "c",      c(7, 8, 9)
  )
external_vec <- 1:9

my_dt %>% 
  mutate(new_col = col_y %>% 
           map(~external_vec %in% .x)) %>% 
  unnest_wider(col_y, names_sep = "_") %>%  # this is just to see the result..
  unnest_wider(new_col, names_sep = "_") # this is just to see the result..

Usually, library(tidyverse) includes a lot of these, but I just thought that it would be nicer to know exactly what you need to make this work.

data.table solution

I think this is the equivalent:

my_dt[, new_col := lapply(col_y, \(x) external_vec %in% x)]
my_dt

Performance

Since that is important, here is some performance benchmark


bench::mark(
  tidy = 
    my_dt %>% 
    mutate(new_col = col_y %>% 
             map(~external_vec %in% .x)),
  dt = my_dt[, new_col := lapply(col_y, \(x) external_vec %in% x)],
  op_tidy = my_dt %>%
    tibble::add_column(ext_vec_as_col = list(external_vec)) %>%
    mutate(new_col = purrr::map2(.x = ext_vec_as_col, .y = col_y, .f = ~.x %in% .y)),
  check = FALSE
) %>% 
  plot()

tidy is the tidyverse way of doing this, op-tidy is the method in the edit, and dt is the proposed data.table-solution.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading