Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Making lists by group

I have data as follows:

library(data.table)
dat <- structure(list(year2006 = c("1110", "1110", "1110", "1110", "1120", 
"1120", "1120", "1120"), group2006 = c("1", "2", "3", "4", "1", 
"2", "3", "4"), min2006 = c("1.35", "2", "3.7", 
"4.25", "5.6", "4.45", "3.09", "1.13"), 
    year2007 = c("1110", "1110", "1110", "1110", "1120", "1120", 
    "1120", "1120"), group2007 = c("1", "2", "3", "4", "1", 
    "2", "3", "4"), min2007 = c("5", "5.05", "5", 
    "1.59", "2.3", "3", "4.05", "5.16"
    )), row.names = c(NA, -8L), class = c("data.table", "data.frame"
))
dat

   year2006  group2006 min2006 year2007  group2007 min2007
1:     1110          1    1.35     1110          1       5
2:     1110          2       2     1110          2    5.05
3:     1110          3     3.7     1110          3       5
4:     1110          4    4.25     1110          4    1.59
5:     1120          1     5.6     1120          1     2.3
6:     1120          2    4.45     1120          2       3
7:     1120          3    3.09     1120          3    4.05
8:     1120          4    1.13     1120          4    5.16

What I would like to do, is to create a list of the numbers in min200x, per category in year200x.

Desired output:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

   cat       year2006                         year2007
1:     1110  c("1.35", "2", "3.7", "4.25")    c("5", "5.05", "5", "1.59") 
2:     1120  c("5.6", "4.45", "3.09", "1.13") c("2.3", "3", "4.05", "5.16")

I thought I could do something like:

setDT(dat)[, cat := list(min2006), by=year2006]

But that does not work (it just puts the min2006 item in a new colum cat). And even if it did, it would only provide a solution for the year 2006. How should I go about this?

>Solution :

I’m not sure why your columns in your test data are all character but the columns in your desired output are numeric. Also, you ask for a list of numbers by group but your expected output shows a vector.

Nevertheless, here’s a tidyverse solution that creates list columns.

library(tidyverse)

x <- dat %>% 
  mutate(across(everything(), as.numeric)) %>% 
  group_by(year2006) %>% 
  select(year2006, starts_with("min")) %>% 
  summarise(across(everything(), lst))
x
# A tibble: 2 × 3
  year2006 min2006      min2007     
     <dbl> <named list> <named list>
1     1110 <dbl [4]>    <dbl [4]>   
2     1120 <dbl [4]>    <dbl [4]>

and, for example,

x$min2006
$min2006
[1] 1.35 2.00 3.70 4.25

$min2006
[1] 5.60 4.45 3.09 1.13

If your inputs are actually numeric, you can lose the mutate.

Edit

… and to get the correct name for the grouping column, you can add %>% rename(cat=year2006) to the pipe. Apologies for the omission.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading