Consider the following sample data:
library(tidyverse)
df <- tibble(group = c("a", "b", "b"), val = list(1:3, 4:6, 7:12))
# # A tibble: 3 × 2
#   group val
#   <chr> <list>
# 1 a     <int [3]>
# 2 b     <int [3]>
# 3 b     <int [6]>
I would like to combine the entries in column val by group, giving this expected output:
df_out <- tibble(group = c("a", "b"), val = list(1:3, 4:12))
I’m looking for a tidyverse solution but have been unsuccessful. For example,
df %>% group_by(group) %>% summarise(val = map(val, c), .groups = "drop")
does not concatenate the val entries from the two "b" rows. Instead, it returns one row per input row and emits a warning:
# A tibble: 3 × 2
  group val
  <chr> <list>
1 a     <int [3]>
2 b     <int [3]>
3 b     <int [6]>
Warning message:
Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame and adjust accordingly.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
I understand the warning, but I don’t understand why "more than 1 row per summarise() group" is returned. Can somebody please explain and offer a solution?
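A quick way to see where the extra rows come from is to measure, per group, how long the result of map(val, c) is. The sketch below assumes dplyr >= 1.1.0 (for reframe()) and purrr; the names lens and rf are only illustrative. Inside summarise(), val is the group's slice of the list column, and map() returns one element per row of that slice.

```r
library(dplyr)
library(purrr)

df <- tibble(group = c("a", "b", "b"), val = list(1:3, 4:6, 7:12))

# Per group, count how many elements map(val, c) returns.
# map() iterates over the rows in the group, so it returns one list
# element per input row rather than one combined vector.
lens <- df |>
  group_by(group) |>
  summarise(n_out = length(map(val, c)), .groups = "drop")
lens
# group "a" -> 1, group "b" -> 2

# reframe() accepts multi-row results without a warning, but it still
# returns one row per input row, so nothing gets concatenated.
rf <- df |>
  group_by(group) |>
  reframe(val = map(val, c))
nrow(rf)
# 3
```

Because group "b" yields a length-2 result, summarise() has to emit two rows for it, which is exactly what the deprecation warning complains about. Switching to reframe() as the warning suggests only silences it; it does not merge the vectors.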
I was hoping for a simple two-step group_by + summarise solution (i.e. avoiding nesting etc.).
To clarify: the numbers in val are neither sorted nor sequential. For example, the two val entries for one group might be c(1, 10, 2) and c(4, 7, 7), and the expected combined output would be c(1, 10, 2, 4, 7, 7), keeping the original order. So:
df <- tibble(group = c("a", "b", "b"), val = list(1:3, c(1, 10, 2), c(4, 7, 7)))
df_out <- tibble(group = c("a", "b"), val = list(1:3, c(1, 10, 2, 4, 7, 7)))
Solution:
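map(val, c) iterates over the rows of each group and returns one list element per row, so for group "b" summarise() receives a length-2 result and emits two rows, which is what triggers the warning. Instead, we can unlist the val entries within each group into a single vector, then wrap that vector back in a list so summarise() gets exactly one value per group: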
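library(dplyr)
df <- df |>
  group_by(group) |>
  # Flatten all of the group's vectors into one, then wrap it in a
  # length-1 list so summarise() returns exactly one row per group
  summarise(val = list(unlist(val))) |>
  ungroup()
pull(df, val)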
[[1]]
[1] 1 2 3
[[2]]
[1] 4 5 6 7 8 9 10 11 12
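The same two-step pattern can be checked against the clarified, unsorted data from the question. This is a sketch assuming dplyr; it uses .groups = "drop" in place of a separate ungroup() call, and the name res is only illustrative.

```r
library(dplyr)

# Clarified sample data from the question: unsorted, non-sequential values
df <- tibble(group = c("a", "b", "b"), val = list(1:3, c(1, 10, 2), c(4, 7, 7)))
df_out <- tibble(group = c("a", "b"), val = list(1:3, c(1, 10, 2, 4, 7, 7)))

res <- df |>
  group_by(group) |>
  # unlist() concatenates the group's vectors in row order (no sorting),
  # and list() turns the result into a single list-column cell
  summarise(val = list(unlist(val)), .groups = "drop")

res$val[[2]]
# 1 10 2 4 7 7
```

Note that unlist() coerces the pieces of a group to a common type (here group "a" stays integer while group "b" is double) and would also combine any element names, so check those details if your real data relies on them.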