Aggregate a list column by group

June 14, 2023

Consider the following sample data:

library(tidyverse)
df <- tibble(group = c("a", "b", "b"), val = list(1:3, 4:6, 7:12))
## A tibble: 3 × 2
#  group val      
#  <chr> <list>   
#1 a     <int [3]>
#2 b     <int [3]>
#3 b     <int [6]>

I would like to combine the entries in column val by group, so that the expected output is:

df_out <- tibble(group = c("a", "b"), val = list(1:3, 4:12))

I’m looking for a tidyverse solution but have been unsuccessful. For example,

df %>% group_by(group) %>% summarise(val = map(val, c), .groups = "drop")

does not concatenate the val entries of the two "b" rows. Instead, it returns three rows and produces a warning:

# A tibble: 3 × 2
  group val      
  <chr> <list>   
1 a     <int [3]>
2 b     <int [3]>
3 b     <int [6]>
Warning message:
Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame and adjust
  accordingly.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

I understand the warning, but I don’t understand why more than one row per summarise() group is returned. Can somebody please explain and offer a solution?

I was hoping for a simple two-step group_by + summarise solution (i.e., one that avoids nesting etc.).

To clarify: the numbers in val are neither ordered nor a sequence. For example, the two val entries for one group might be c(1, 10, 2) and c(4, 7, 7), and the expected combined output would be c(1, 10, 2, 4, 7, 7). So:

df <- tibble(group = c("a", "b", "b"), val = list(1:3, c(1, 10, 2), c(4, 7, 7)))
df_out <- tibble(group = c("a", "b"), val = list(1:3, c(1, 10, 2, 4, 7, 7)))

Solution:

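summarise() returns one row per group only when each expression evaluates to a single value. In the attempt above, map(val, c) returns a list with one element per row of the group, so group "b" (two rows) yields two elements and therefore two output rows, which is what triggers the warning. Instead, unlist val within each group, which concatenates its vectors in row order, and then wrap the result back in list() so that each group contributes exactly one list element: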

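library(dplyr)

df_out <- df |> 
    group_by(group) |> 
    # unlist() joins all vectors of the group; list() makes it a single list element
    summarise(val = list(unlist(val))) |> 
    # summarise() already drops the only grouping level; ungroup() just makes that explicit
    ungroup()

pull(df_out, val)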

[[1]]
[1] 1 2 3

[[2]]
[1]  4  5  6  7  8  9 10 11 12
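As a quick sanity check, the sketch below first counts how many elements map(val, c) returns per group, which confirms why the original attempt produced two rows for "b", and then applies the same unlist() pipeline to the second data set from the question. It assumes dplyr (1.1.0 or later) and purrr are installed and R 4.1 or later for the native pipe; the names n_per_group, df2 and df2_out are hypothetical and only used for illustration.

```r
library(dplyr)
library(purrr)

# Original sample data from the question
df <- tibble(group = c("a", "b", "b"), val = list(1:3, 4:6, 7:12))

# Diagnosis: map(val, c) returns one list element per input row,
# so group "b" yields two elements (hence two rows in summarise())
n_per_group <- df |>
  group_by(group) |>
  summarise(n = length(map(val, c)))
n_per_group$n  # 1 2

# Second data set: values are unordered and contain duplicates
df2 <- tibble(group = c("a", "b", "b"),
              val = list(1:3, c(1, 10, 2), c(4, 7, 7)))

# Same approach: concatenate within each group, then wrap back in a list
df2_out <- df2 |>
  group_by(group) |>
  summarise(val = list(unlist(val)))

# unlist() keeps the row order and the duplicates
df2_out$val[[2]]  # 1 10 2 4 7 7
```

Because unlist() simply concatenates the group's vectors in the order the rows appear, no sorting or de-duplication happens, which matches the expected c(1, 10, 2, 4, 7, 7).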