dplyr::mutate when a custom function returns a vector

I am trying to use dplyr::mutate on grouped data (group_by) to create new columns with a custom function that returns a vector. The function takes a long time to bootstrap.

I know this can be implemented in base R, but is there a more elegant way in dplyr?

Example:

iris %>% 
  group_by(Species) %>% 
  mutate(t1 = f(iris$Sepal.Length)[1], t2 = f(iris$Sepal.Length)[2])

f <- function(x) {
  return(c(2*x, x+1))
}

Is it possible to create the two columns while calling the function only once in each group?


I made a mistake in the previous example. Please check this example instead:

f <- function(x) {
  return(c(x*2, x+1))
}

iris %>%
  group_by(Species) %>%
  group_modify(~ {
    .x %>%
      mutate(t1 = f(mean(.x$Sepal.Length))[1], t2 = f(mean(.x$Sepal.Length))[2])
  })

Thanks to Darren Tsai for the answer! The problem is solved using unnest_wider in the new example:

library(dplyr)
library(tidyr)

iris %>% 
  group_by(Species) %>% 
  group_modify(~ {
    .x %>% 
      mutate(t = list(f(mean(.x$Sepal.Length)))) %>% 
      unnest_wider(t, names_sep = "")
  })

# A tibble: 150 × 7
# Groups:   Species [3]
   Species Sepal.Length Sepal.Width Petal.Length Petal.Width    t1    t2
   <fct>          <dbl>       <dbl>        <dbl>       <dbl> <dbl> <dbl>
 1 setosa           5.1         3.5          1.4         0.2  10.0  6.01
 2 setosa           4.9         3            1.4         0.2  10.0  6.01
 3 setosa           4.7         3.2          1.3         0.2  10.0  6.01
 4 setosa           4.6         3.1          1.5         0.2  10.0  6.01
 5 setosa           5           3.6          1.4         0.2  10.0  6.01
 6 setosa           5.4         3.9          1.7         0.4  10.0  6.01
 7 setosa           4.6         3.4          1.4         0.3  10.0  6.01
 8 setosa           5           3.4          1.5         0.2  10.0  6.01
 9 setosa           4.4         2.9          1.4         0.2  10.0  6.01
10 setosa           4.9         3.1          1.5         0.1  10.0  6.01
# … with 140 more rows
# ℹ Use `print(n = ...)` to see more rows

Solution:

The issue with your code is that it passes a vector to f, so the result probably isn’t what you’re expecting:

f(1 : 5)
# [1]  2  4  6  8 10  2  3  4  5  6

Your calling code will have to disentangle that.
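
To make the layout of that result concrete, here is a minimal sketch that splits it by hand. It assumes f always returns the doubled values followed by the incremented values, each as long as the input. The names n, doubled and incremented are only illustrative and do not appear in the question.

```r
# f is repeated from the question so the snippet runs on its own.
f <- function(x) {
  return(c(x * 2, x + 1))
}

x <- 1:5
res <- f(x)
n <- length(x)

# The first n elements are x * 2, the remaining n elements are x + 1.
doubled <- res[seq_len(n)]
incremented <- res[n + seq_len(n)]
```

Indexing like this works, but every caller has to know how f lays out its result, which is the bookkeeping a reusable helper can hide.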

You can do that, e.g. using the following helper:

to_tibble <- function (x, colnames) {
    x %>%
        matrix(ncol = length(colnames), dimnames = list(NULL, colnames)) %>%
        as_tibble()
}

With that, you can now call your f inside mutate and provide target column names:

iris %>%
    group_by(Species) %>%
    mutate(to_tibble(f(Sepal.Length), c("t1", "t2")))

The advantage of this method is that it simplifies the calling code and harnesses mutate’s built-in support for producing multiple columns — no manual unnesting required.
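
The built-in support referred to here is that mutate treats an unnamed argument that evaluates to a data frame as a set of new columns (in dplyr 1.0.0 or later, as far as I know). A small sketch on a toy tibble shows the behaviour in isolation. The data and the column names g, x, t1 and t2 are made up for illustration.

```r
library(dplyr)

# Toy data, not from the question.
df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3))

out <- df %>%
  group_by(g) %>%
  # The unnamed tibble is unpacked: each of its columns becomes a column of out.
  mutate(tibble(t1 = x * 2, t2 = x + 1)) %>%
  ungroup()

print(out)
```

Because the argument has no name, no nested data-frame column is created, so there is nothing to unnest afterwards.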


Regarding your updated code/requirement, you can simplify that too using the helper function:

iris %>%
    group_by(Species) %>%
    group_modify(
        ~ mutate(.x, to_tibble(f(mean(Sepal.Length)), c("t1", "t2")))
    )
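
Since the original concern was calling a slow f only once per group, one rough way to check that is to count the calls. In the sketch below, f_counted and n_calls are hypothetical names added for illustration, while f and to_tibble are repeated from above so the snippet runs on its own.

```r
library(dplyr)

f <- function(x) {
  return(c(x * 2, x + 1))
}

to_tibble <- function(x, colnames) {
  x %>%
    matrix(ncol = length(colnames), dimnames = list(NULL, colnames)) %>%
    as_tibble()
}

n_calls <- 0

# Hypothetical wrapper: returns the same result as f but records each call.
f_counted <- function(x) {
  n_calls <<- n_calls + 1
  f(x)
}

res <- iris %>%
  group_by(Species) %>%
  group_modify(
    ~ mutate(.x, to_tibble(f_counted(mean(Sepal.Length)), c("t1", "t2")))
  ) %>%
  ungroup()

# Compare the number of calls with the number of groups.
n_calls
n_distinct(iris$Species)
```

If the two numbers match, f ran once per group, and both t1 and t2 were filled from that single call.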