I’m exploring the across() function introduced in recent versions of dplyr, and I’m trying to understand how to use it to apply a custom function that returns multiple columns. Specifically, I want to apply a function that calculates both the mean and standard deviation for selected numeric columns in my data frame and returns these as separate columns.
For example, given the following data frame:
library(dplyr)

df <- data.frame(
  Group  = rep(letters[1:3], each = 4),
  Value1 = rnorm(12, mean = 10, sd = 2),
  Value2 = rnorm(12, mean = 5, sd = 1)
)
I want to create a new data frame that includes the mean and standard deviation for each value column, something like this:
  Group Mean_Value1 SD_Value1 Mean_Value2 SD_Value2
1     a       9.812     2.034       4.955     1.085
2     b      10.231     1.987       5.023     0.923
3     c      10.032     2.121       4.998     1.098
I’ve tried the following approach but I’m not sure how to make it work properly with across():
df_summary <- df %>%
  group_by(Group) %>%
  summarise(across(starts_with("Value"), ~ c(mean = mean(.), sd = sd(.))))
This throws an error because across() doesn’t seem to naturally handle functions that return multiple columns.
My specific questions are:
- How can I modify this approach to properly use across() for functions that return multiple values?
- Is there a better way to achieve this using dplyr or another package in R?
- What are the limitations of across() when dealing with custom functions like this?
Any guidance on how to accomplish this would be greatly appreciated!
Solution:
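This case is covered by an example on the documentation page for across().
Instead of one function that returns a vector, pass a named list of functions to across(). Each function is applied to each selected column, and by default the result columns are named {.col}_{.fn}, that is, the input column name followed by the name given in the list.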
library(dplyr)

df %>%
  group_by(Group) %>%
  summarise(across(starts_with("Value"), list(mean = mean, sd = sd)))
# A tibble: 3 × 5
  Group Value1_mean Value1_sd Value2_mean Value2_sd
  <chr>       <dbl>     <dbl>       <dbl>     <dbl>
1 a            8.61     0.837        5.57     0.581
2 b            8.90     2.08         5.22     0.479
3 c           10.3      1.98         4.36     0.465
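To get the exact column names asked for in the question (Mean_Value1, SD_Value1, ...), the .names argument of across() can reorder the pieces. If you would rather keep a single custom function that returns several values, one option is to have it return a one-row tibble and set .unpack = TRUE. The sketch below shows both. It assumes dplyr 1.1.0 or later for .unpack; set.seed(42) and the helper name mean_sd are illustrative additions, not part of the original code.

```r
library(dplyr)

# Fixed seed (an assumption for reproducibility; the values are arbitrary).
set.seed(42)
df <- data.frame(
  Group  = rep(letters[1:3], each = 4),
  Value1 = rnorm(12, mean = 10, sd = 2),
  Value2 = rnorm(12, mean = 5, sd = 1)
)

# Option 1: named list of functions, with .names controlling the output names.
# "{.fn}" is the name in the list (Mean, SD); "{.col}" is the input column.
by_list <- df %>%
  group_by(Group) %>%
  summarise(across(
    starts_with("Value"),
    list(Mean = mean, SD = sd),
    .names = "{.fn}_{.col}"
  ))

# Option 2: a single custom function that returns a one-row tibble.
# mean_sd is a hypothetical helper. .unpack = TRUE (dplyr >= 1.1.0) expands
# the returned data frame into columns named "<column>_<field>".
mean_sd <- function(x) tibble(Mean = mean(x), SD = sd(x))

by_unpack <- df %>%
  group_by(Group) %>%
  summarise(across(starts_with("Value"), mean_sd, .unpack = TRUE))

names(by_list)    # Group, Mean_Value1, SD_Value1, Mean_Value2, SD_Value2
names(by_unpack)  # Group, Value1_Mean, Value1_SD, Value2_Mean, Value2_SD
```

Both versions give one row per group with the same numbers. The list form is the simpler choice when the statistics are independent of each other. The tibble-returning form is mainly useful when the values come out of one shared computation.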