Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

how to apply this function columnwise with cross() in tidyverse?

I need to create new columns based on existing ones using a function in tidyverse. I planned to use across() because it allows dynamic renaming of new variables which is increasingly important in my case and spares a lot of time especially if you have many variables in your data to modify. The function below could not be applied column-wise as expected, it behaves strangely and by changing the values of the P argument I got unexpected output each time, particularly when I set some values to 1, as if the function was applied element-wise but not column-wise.

I wonder how could this code be written in a more efficient way to achieve the above goal, with efficient I mean, shorter, faster, and neater.

Reprex

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

set.seed (123)
df <- tibble(id = 1:10,
               rosa = runif(10, min = 20.8, max = 36.5),
               lila = runif(10, min = 17, max = 37),
               blaue = runif(10, min = 23.3, max = 32.7))
df[c (2, 5, 8), c (2:4)] <- NA

Code

myfun <- function(x, P = 2, na.rm = FALSE){
    P ^ (min (x, na.rm = na.rm) - x)
}

P <- c(2, 1.5, 1.1) # fiddle with numbers here and see the output each time changes 
names <- c ("rosa", "lila", "blaue")
df %>%
    select(!!names) %>%
    mutate(across(.cols = !!names,
                  .fns = ~myfun(.x, P, na.rm = TRUE),
                  .names = "{.col}_P"))

Output

   + # A tibble: 10 × 6
    rosa  lila blaue    rosa_P     lila_P  blaue_P
   <dbl> <dbl> <dbl>     <dbl>      <dbl>    <dbl>
 1  25.3  36.1  31.7  0.0718    0.0000526  0.00793
 2  NA    NA    NA   NA        NA         NA      
 3  27.2  30.6  29.3  0.581     0.439      0.643  
 4  34.7  28.5  32.6  0.000110  0.0108     0.00401
 5  NA    NA    NA   NA        NA         NA      
 6  21.5  35.0  30.0  1         0.288      0.605  
 7  29.1  21.9  28.4  0.00524   1          0.0753 
 8  NA    NA    NA   NA        NA         NA      
 9  29.5  23.6  26.0  0.469     0.856      0.881  
10  28.0  36.1  24.7  0.0114    0.0000543  1      
Warning messages:
1: Problem while computing `..1 = across(...)`.
ℹ longer object length is not a multiple of shorter object length 
2: Problem while computing `..1 = across(...)`.
ℹ longer object length is not a multiple of shorter object length 
3: Problem while computing `..1 = across(...)`.
ℹ longer object length is not a multiple of shorter object length 

Expected Output

df %>%
select(!!names) %>%
mutate(rosa_P =  2^(min (rosa, na.rm = TRUE) - rosa)) %>%
mutate(lila_P =  1.5^(min (lila, na.rm = TRUE) - lila)) %>%
mutate(blaue_P = 1.1^(min (blaue, na.rm = TRUE) - blaue))

   + # A tibble: 10 × 6
    rosa  lila blaue    rosa_P   lila_P blaue_P
   <dbl> <dbl> <dbl>     <dbl>    <dbl>   <dbl>
 1  25.3  36.1  31.7  0.0718    0.00314   0.514
 2  NA    NA    NA   NA        NA        NA    
 3  27.2  30.6  29.3  0.0192    0.0302    0.643
 4  34.7  28.5  32.6  0.000110  0.0708    0.468
 5  NA    NA    NA   NA        NA        NA    
 6  21.5  35.0  30.0  1         0.00498   0.605
 7  29.1  21.9  28.4  0.00524   1         0.701
 8  NA    NA    NA   NA        NA        NA    
 9  29.5  23.6  26.0  0.00407   0.515     0.881
10  28.0  36.1  24.7  0.0114    0.00320   1    

>Solution :

Your problem is the P vector as across won’t understand which number belongs to what call but will pass all three numbers to myfun. Instead, you could name it, and use cur_column().

It might also be safer to use all_of/any_of instead of !!, and to use another name for the vector of names than names as it’s also a base function. This might lead to confusion.

library(dplyr)

P <- c(rosa = 2, lila = 1.5, blaue = 1.1)
colournames <- names(P) #c("rosa", "lila", "blaue")

df |>
  #select(all_of(colournames)) |>
  mutate(across(all_of(colournames),
                ~ P[cur_column()] ^ (min(., na.rm = TRUE) - .), # ~ my_fun(., P[cur_column()], na.rm = TRUE)
                .names = "{.col}_P"))

Output:

# A tibble: 10 × 6
    rosa  lila blaue    rosa_P   lila_P blaue_P
   <dbl> <dbl> <dbl>     <dbl>    <dbl>   <dbl>
 1  25.3  36.1  31.7  0.0718    0.00314   0.514
 2  NA    NA    NA   NA        NA        NA    
 3  27.2  30.6  29.3  0.0192    0.0302    0.643
 4  34.7  28.5  32.6  0.000110  0.0708    0.468
 5  NA    NA    NA   NA        NA        NA    
 6  21.5  35.0  30.0  1         0.00498   0.605
 7  29.1  21.9  28.4  0.00524   1         0.701
 8  NA    NA    NA   NA        NA        NA    
 9  29.5  23.6  26.0  0.00407   0.515     0.881
10  28.0  36.1  24.7  0.0114    0.00320   1    

Update, changed as OP example changed.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading