Efficiently convert a two-column data.table to a list

I'm looking for a fast yet readable solution to this problem in R. Preferably, the solution should use the data.table package or no additional packages at all, although I'd like to hear about other options.

I have a data.table with two columns like this one:

dat
    go_id gene_id
 1:     A       a
 2:     A       b
 3:     B       c
 4:     B       d
 5:     B       e
 6:     C       f
 7:     C       g
 8:     C       h
 9:     C       i
10:     C       j

You can reproduce it with:

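library(data.table)

# Three GO ids with 2, 3 and 5 genes respectively
dat <- data.table(
    go_id=rep(LETTERS[1:3], times=c(2,3,5))
)
# One distinct gene letter per row (.N is the number of rows)
dat[, gene_id := letters[seq_len(.N)]]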

I need to convert it to a named list in which each name (the "key") is a go_id and each element (the "value") is the vector of genes assigned to that go_id. That is, the output should be this list:

$A
[1] "a" "b"

$B
[1] "c" "d" "e"

$C
[1] "f" "g" "h" "i" "j"
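Written out literally, the target is a plain named list of character vectors. Having it as an object is handy for checking any candidate solution with `identical()`; the name `expected` is just an illustrative choice.

```r
# The desired result for the small example, written out by hand
expected <- list(
  A = c("a", "b"),
  B = c("c", "d", "e"),
  C = c("f", "g", "h", "i", "j")
)
str(expected)
```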

If it matters, genes can be associated with multiple go_ids. The real data has about 280,000 rows, with 17,000 distinct go_ids and 16,000 distinct genes.
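To compare candidate solutions at a realistic scale, the sketch below simulates data of roughly the stated size. The identifier formats (`GO:0000001`, `gene00001`) and the uniform random sampling are assumptions for illustration only; real GO annotations are much more skewed, so timings on this data are only indicative.

```r
library(data.table)

# Hypothetical benchmark data at roughly the scale described in the question.
# Ids are drawn uniformly with replacement (an assumption), so one gene can
# appear under several GO ids, as it can in the real data.
set.seed(1)
n_rows    <- 280000L
go_pool   <- sprintf("GO:%07d", seq_len(17000L))
gene_pool <- sprintf("gene%05d", seq_len(16000L))

sim <- data.table(
  go_id   = sample(go_pool, n_rows, replace = TRUE),
  gene_id = sample(gene_pool, n_rows, replace = TRUE)
)

# Drop repeated (go_id, gene_id) pairs so each association is listed once;
# this leaves slightly fewer than 280,000 rows.
sim <- unique(sim)

# Time one candidate; substitute any other approach here.
system.time(res <- split(sim$gene_id, sim$go_id))
```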

This is the solution I have so far. Is there anything better in terms of speed and/or readability?

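# Collapse to one row per go_id, with gene_id as a list column
dat <- dat[, list(gene_id=list(gene_id)), by=go_id]
go_ids <- dat$go_id
go_list <- list()
# Copy each group's gene vector into a named list element, one row at a time
for(i in 1:nrow(dat)) {
    go <- go_ids[i]
    genes <- dat[i,]$gene_id   # a length-1 list holding the gene vector
    go_list[go] <- genes
}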
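The loop is not actually needed: the grouped aggregation already produces a list column whose elements are the per-group gene vectors, so naming that column with `setNames()` gives the target list in one step. This is a suggested data.table-only alternative, not part of the original question or answer. One difference to keep in mind: `by=` keeps groups in order of first appearance, whereas `split()` and `tapply()` order them by sorted go_id.

```r
library(data.table)

dat <- data.table(go_id = rep(LETTERS[1:3], times = c(2, 3, 5)))
dat[, gene_id := letters[seq_len(.N)]]

# Aggregate once: one row per go_id, gene_id becomes a list column
agg <- dat[, list(gene_id = list(gene_id)), by = go_id]

# The list column is already a plain list; just name it by go_id
go_list <- setNames(agg$gene_id, agg$go_id)
go_list
```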

Solution:

Without data.table or any other package, base R's tapply() does the job:

tapply(dat$gene_id, dat$go_id, FUN = c)
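A related base R option is `split()`, which returns a plain named list directly. By contrast, `tapply()` with `FUN = c` returns a one-dimensional array of mode `"list"`: it prints like a list and indexes like one, but it carries a `dim` attribute. Also, if every go_id had exactly one gene, `tapply()` would simplify the result to an atomic array instead of a list. The sketch below contrasts the two on the example data.

```r
library(data.table)

dat <- data.table(go_id = rep(LETTERS[1:3], times = c(2, 3, 5)))
dat[, gene_id := letters[seq_len(.N)]]

# tapply(): a 1-d list-mode array indexed by go_id
res_tapply <- tapply(dat$gene_id, dat$go_id, FUN = c)

# split(): a plain named list, one element per go_id (sorted by go_id)
res_split <- split(dat$gene_id, dat$go_id)

str(res_tapply)
str(res_split)
```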