Aggregate a list column by group

Consider the following sample data:

library(tidyverse)
df <- tibble(group = c("a", "b", "b"), val = list(1:3, 4:6, 7:12))
## A tibble: 3 × 2
#  group val      
#  <chr> <list>   
#1 a     <int [3]>
#2 b     <int [3]>
#3 b     <int [6]>

I would like to combine the entries in column val by group, giving the expected output:

df_out <- tibble(group = c("a", "b"), val = list(1:3, 4:12))

I’m looking for a tidyverse solution but have been unsuccessful. For example,

df %>% group_by(group) %>% summarise(val = map(val, c), .groups = "drop")

does not concatenate the val entries from the two "b" rows and instead produces a warning:

# A tibble: 3 × 2
  group val      
  <chr> <list>   
1 a     <int [3]>
2 b     <int [3]>
3 b     <int [6]>
Warning message:
Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame and adjust
  accordingly.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated. 

I understand the warning but I don’t understand why "more than 1 row per summarise() group" is returned. Can somebody please explain and offer a solution?

I was hoping for a simple two step group_by + summarise solution (i.e. avoiding nesting etc.).


To clarify: the numbers in val are neither sorted nor a contiguous sequence. A different set of val numbers for one group might be c(1, 10, 2) and c(4, 7, 7). The expected combined output would be c(1, 10, 2, 4, 7, 7). So:

df <- tibble(group = c("a", "b", "b"), val = list(1:3, c(1, 10, 2), c(4, 7, 7)))
df_out <- tibble(group = c("a", "b"), val = list(1:3, c(1, 10, 2, 4, 7, 7)))

Solution:

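map(val, c) returns a list with one element per row, so group "b" still yields two rows, which is what triggers the warning. Instead, we can unlist the val column within each group and then wrap the result in list(), so that each group returns exactly one list element: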

library(dplyr)

df <- df |> 
    group_by(group) |> 
    summarise(val = list(unlist(val))) |> 
    ungroup()

pull(df, val)

[[1]]
[1] 1 2 3

[[2]]
[1]  4  5  6  7  8  9 10 11 12
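
As a quick check against the clarified data from the question, the following sketch applies the same group_by + summarise pattern to unordered values and also shows why map() keeps two elements for group "b". The names df2, b_vals and res are illustrative only and do not appear in the original post.

```r
library(dplyr)
library(purrr)

# Data from the clarification in the question (df2 is an illustrative name)
df2 <- tibble(
  group = c("a", "b", "b"),
  val = list(1:3, c(1, 10, 2), c(4, 7, 7))
)

# map() returns one list element per input element, so the two "b" rows
# stay as two elements; list(unlist()) collapses them into one
b_vals <- df2$val[df2$group == "b"]
length(map(b_vals, c))        # 2
length(list(unlist(b_vals)))  # 1

# Same pattern as the solution, with the grouping dropped in summarise()
res <- df2 |>
  group_by(group) |>
  summarise(val = list(unlist(val)), .groups = "drop")

# Combined values for group "b", in their original row order
res$val[[2]]
```

Note that unlist() flattens recursively and drops attributes such as the Date class, so for list columns holding dates or factors a type-preserving combiner (for example purrr::list_c(), assuming purrr 1.0.0 or later) may be a better fit.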