dplyr::mutate when a custom function returns a vector

I am trying to use dplyr::mutate on data grouped with group_by to create new columns from a custom function that returns a vector. The function is expensive: it takes a long time to bootstrap.

I know this can be implemented in base R, but is there a more elegant way in dplyr?

Example:

iris %>% 
  group_by(Species) %>% 
  mutate(t1 = f(iris$Sepal.Length)[1], t2 = f(iris$Sepal.Length)[2])

f <- function(x) {
  return(c(2*x, x+1))
}

Is it possible to create both columns while calling the function only once per group?


I made a mistake in the previous example. Please check this example instead:

f <- function(x) {
  return(c(x*2, x+1))
}

iris %>% 
  group_by(Species) %>% 
  
  group_modify(~ {
    .x %>% 
      mutate(t1 := f(mean(.x$Sepal.Length))[1], t2 := f(mean(.x$Sepal.Length))[2])
  })

Thanks to Darren Tsai for the answer! The problem in the new example is solved using unnest_wider:

library(dplyr)
library(tidyr)

iris %>% 
  group_by(Species) %>% 
  group_modify(~ {
    .x %>% 
      mutate(t = list(f(mean(.x$Sepal.Length)))) %>% 
      unnest_wider(t, names_sep = "")
  })

# A tibble: 150 × 7
# Groups:   Species [3]
   Species Sepal.Length Sepal.Width Petal.Length Petal.Width    t1    t2
   <fct>          <dbl>       <dbl>        <dbl>       <dbl> <dbl> <dbl>
 1 setosa           5.1         3.5          1.4         0.2  10.0  6.01
 2 setosa           4.9         3            1.4         0.2  10.0  6.01
 3 setosa           4.7         3.2          1.3         0.2  10.0  6.01
 4 setosa           4.6         3.1          1.5         0.2  10.0  6.01
 5 setosa           5           3.6          1.4         0.2  10.0  6.01
 6 setosa           5.4         3.9          1.7         0.4  10.0  6.01
 7 setosa           4.6         3.4          1.4         0.3  10.0  6.01
 8 setosa           5           3.4          1.5         0.2  10.0  6.01
 9 setosa           4.4         2.9          1.4         0.2  10.0  6.01
10 setosa           4.9         3.1          1.5         0.1  10.0  6.01
# … with 140 more rows
# ℹ Use `print(n = ...)` to see more rows

Solution:

The issue with your code is that it passes a vector to f, so the result probably isn’t what you’re expecting:

f(1 : 5)
# [1]  2  4  6  8 10  2  3  4  5  6                                        

Your calling code will have to disentangle that.

You can do that, e.g. using the following helper:

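# Reshape a flat vector into a tibble with one column per name.
# matrix() fills column by column, so the first half of x becomes the first column.
to_tibble <- function (x, colnames) {
    x %>%
        matrix(ncol = length(colnames), dimnames = list(NULL, colnames)) %>%
        as_tibble()
}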
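To see why the reshaping works, here is a minimal sketch that applies the helper to the f(1:5) call from above. It assumes dplyr is attached (which provides %>% and as_tibble); the name res is only for illustration.

```r
library(dplyr)

f <- function(x) {
  c(x * 2, x + 1)
}

to_tibble <- function(x, colnames) {
  x %>%
    matrix(ncol = length(colnames), dimnames = list(NULL, colnames)) %>%
    as_tibble()
}

# f(1:5) returns 10 values: the five doubled values, then the five incremented ones.
# matrix() fills column-wise, so the two halves land in separate columns.
res <- to_tibble(f(1:5), c("t1", "t2"))
res$t1  # 2 4 6 8 10
res$t2  # 2 3 4 5 6
```

Each row of the result lines up with one element of the input, which is what lets mutate bind the columns back onto the original rows.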

With that, you can now call your f inside mutate and provide target column names:

iris %>%
    group_by(Species) %>%
    mutate(to_tibble(f(Sepal.Length), c("t1", "t2")))

The advantage of this method is that it simplifies the calling code and harnesses mutate’s built-in support for producing multiple columns: an unnamed data frame passed to mutate is unpacked into separate columns, so no manual unnesting is required.
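The multi-column behaviour can be seen in isolation with a small sketch. The toy data frame df and the column names lo and hi are made up for illustration, and this assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data: two groups, one numeric column.
df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3))

res <- df %>%
  group_by(g) %>%
  # The unnamed one-row tibble is unpacked into columns `lo` and `hi`
  # and recycled to the size of each group.
  mutate(tibble(lo = min(x), hi = max(x))) %>%
  ungroup()

res$lo  # 1 1 3
res$hi  # 2 2 3
```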


Regarding your updated code/requirement, you can simplify that too using the helper function:

iris %>%
    group_by(Species) %>%
    group_modify(
        ~ mutate(.x, to_tibble(f(mean(Sepal.Length)), c("t1", "t2")))
    )
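Since the original concern was the cost of calling f, one way to check how often it runs is to count invocations. The sketch below is only an illustration: the calls counter and the instrumented copy of f are additions for this check, not part of the question's code.

```r
library(dplyr)

calls <- 0  # hypothetical counter, used only for this check

f <- function(x) {
  calls <<- calls + 1  # record each invocation
  c(x * 2, x + 1)
}

to_tibble <- function(x, colnames) {
  x %>%
    matrix(ncol = length(colnames), dimnames = list(NULL, colnames)) %>%
    as_tibble()
}

res <- iris %>%
  group_by(Species) %>%
  group_modify(
    ~ mutate(.x, to_tibble(f(mean(Sepal.Length)), c("t1", "t2")))
  )

calls  # expected to equal the number of Species groups
```

Because f is evaluated once and its result is split into both columns, the counter should match the number of groups rather than twice that number.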
